A topological approach for protein classification

Publication:326629

DOI: 10.1515/MLBMB-2015-0009; zbMATH Open: 1347.92054; arXiv: 1510.00953; OpenAlex: W2481594651; Wikidata: Q62727751; Scholia: Q62727751; MaRDI QID: Q326629; FDO: Q326629


Authors: Zixuan Cang, Lin Mu, Kedi Wu, Kristopher Opron, G. W. Wei, Kelin Xia


Publication date: 12 October 2016

Published in: Molecular Based Mathematical Biology

Abstract: A protein's function and dynamics are closely related to its sequence and structure. However, predicting protein function and dynamics from sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done by measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward understanding protein function and dynamics. Persistent homology is a new branch of algebraic topology that has found success in topological data analysis in a variety of disciplines, including molecular biology. The present work explores the potential of using persistent homology as an independent tool for protein classification. To this end, we propose a molecular topological fingerprint based support vector machine (MTF-SVM) classifier. Specifically, we construct machine learning feature vectors solely from protein topological fingerprints, which are topological invariants generated during the filtration process. To validate the present MTF-SVM approach, we consider four types of problems. First, we study protein-drug binding by using the M2 channel protein of influenza A virus. We achieve 96% accuracy in discriminating drug-bound and unbound M2 channels. Additionally, we examine the use of MTF-SVM for the classification of hemoglobin molecules in their relaxed and taut forms and obtain about 80% accuracy. The identification of all-alpha, all-beta, and alpha-beta protein domains is carried out in our next study using 900 proteins. We achieve an 85% success rate in this identification. Finally, we apply the present technique to 55 classification tasks of protein superfamilies over 1357 samples. An average accuracy of 82% is attained. The present study establishes computational topology as an independent and effective alternative for protein classification.
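
The abstract describes building feature vectors from topological fingerprints produced during a filtration. The following is a minimal sketch of that idea only, not the authors' implementation: it computes the finite 0-dimensional (connected component) bars of a Vietoris-Rips filtration over a point cloud, taking edge length as the filtration value, and summarizes them in a small vector. The function names, the toy coordinates, and the three summary features (bar count, longest bar, mean bar length) are assumptions chosen for illustration; the paper's actual feature set is not reproduced here.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return death values of the finite H0 bars of a Vietoris-Rips filtration.

    Every point is born at filtration value 0. A connected component dies
    when an edge merges it into another one (Kruskal-style union-find).
    Assumption: an edge enters the filtration at its Euclidean length.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    # n - 1 finite bars; the one essential bar never dies and is omitted.
    return deaths


def fingerprint_features(points):
    """Hypothetical feature vector: count, longest, and mean finite H0 bar."""
    deaths = h0_barcode(points)
    if not deaths:
        return [0, 0.0, 0.0]
    return [len(deaths), max(deaths), sum(deaths) / len(deaths)]


# Toy "atom" coordinates, purely illustrative (not real protein data).
toy_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0), (5.0, 2.0, 0.0)]
features = fingerprint_features(toy_atoms)
```

In a full pipeline, one such vector per protein, together with its class label, would be passed to a support vector machine implementation. Higher-dimensional bars (loops and cavities) would need a persistent homology library and are outside the scope of this sketch.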


Full work available at URL: https://arxiv.org/abs/1510.00953




Cited In (20)

