Scalable big data analytics for protein bioinformatics. Efficient computational solutions for protein structures (Q1793982)

The term "big data" covers the techniques used to analyze very large amounts of data efficiently. Biological data are collected to study particular aspects of living organisms, and their volume keeps growing. Data that expand in so many directions and dimensions have become difficult to analyze quickly and efficiently on typical desktop computers, which creates a need for high-performance cloud computing. In this book, the author presents techniques for handling and efficiently analyzing data in computational processes that require a great deal of time and effort, for example, structure similarity searching, protein structure modeling, and protein structure alignment and superposition. \par The author gives a brief introduction to proteins and presents a formal model of 3D protein structures for functional genomics, comparative bioinformatics, and molecular modeling. He then describes some techniques for exploring protein structures and elaborates on cloud computing and related concepts. He stresses the benefits of multi-threading in cloud computing, that is, running multiple threads concurrently on one core of the processor. The author emphasizes that such multi-threaded processing is used to speed up time-consuming data processing tasks, and he explores it further later in the book. \par The author explains the scalability of a system, that is, its ability to keep performing efficiently as the load on it increases, and points to some cloud services that provide this scalability and flexibility.
First, using Microsoft Azure Cloud Services, the author builds the Cloud4PSi system for 3D protein structure similarity searching and then the CloudPSP system for modeling 3D protein structures, thus improving the efficiency of the search and prediction processes. \par Further, the author uses scalable big data computational frameworks, such as Hadoop and Spark, to explore 3D protein structure alignment. He introduces the foundations of big data and the big data frameworks Apache Hadoop and Apache Spark, two platforms that allow scalable data processing and analysis. The author indicates that these systems speed up certain calculations and ease the development of solutions. He uses Hadoop and the MapReduce processing model for the efficient mining of 3D protein structures and for structural superposition. He proposes using such systems to find protein structures, from unstructured data, for protein sequences that are already known, which could give insights into the molecular basis of many diseases. \par The author concludes the book with chapters on the use of multi-threading and GPUs to find protein structure similarities faster and more efficiently.
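
To make the map-and-reduce pattern mentioned in the review concrete, the following is a minimal Python sketch of a multi-threaded structure similarity search. It is not the author's Cloud4PSi or Hadoop implementation. All names (`centered`, `rmsd`, `map_score`, `reduce_best`, `search`) and the toy coordinates are hypothetical, and the score is a deliberately simplified RMSD computed after aligning centroids only, with no optimal rotation as a real superposition method would apply.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from math import sqrt

# Hypothetical toy representation: a structure is a list of (x, y, z) atom positions.
Coords = list[tuple[float, float, float]]


def centered(coords: Coords) -> Coords:
    """Translate a structure so that its centroid sits at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def rmsd(a: Coords, b: Coords) -> float:
    """RMSD after centroid alignment only (assumption: rotation is ignored)."""
    if len(a) != len(b):
        raise ValueError("structures must have the same number of atoms")
    a, b = centered(a), centered(b)
    squared = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return sqrt(squared / len(a))


def map_score(query: Coords, entry: tuple[str, Coords]) -> tuple[str, float]:
    """Map step: score one database entry against the query."""
    name, coords = entry
    return name, rmsd(query, coords)


def reduce_best(best: tuple[str, float], candidate: tuple[str, float]) -> tuple[str, float]:
    """Reduce step: keep the entry with the lower RMSD."""
    return candidate if candidate[1] < best[1] else best


def search(query: Coords, database: dict[str, Coords], workers: int = 4) -> tuple[str, float]:
    """Score all entries on a thread pool, then reduce to the best match."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(lambda entry: map_score(query, entry), database.items()))
    return reduce(reduce_best, scored)


query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
database = {
    "shifted": [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (5.0, 6.0, 5.0)],
    "stretched": [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
}
best_name, best_score = search(query, database)
```

Here the "shifted" entry is a pure translation of the query, so it is the best match. In standard CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock, so the thread pool only illustrates the structure of the computation. A process pool, or a cluster framework such as those the book covers, would be the natural choice when real speedups are needed.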

Reviewed by Jasbir Kaur
