A topological approach for protein classification
Publication:326629
DOI: 10.1515/MLBMB-2015-0009; zbMATH Open: 1347.92054; arXiv: 1510.00953; OpenAlex: W2481594651; Wikidata: Q62727751; Scholia: Q62727751; MaRDI QID: Q326629; FDO: Q326629
Authors: Zixuan Cang, Lin Mu, Kedi Wu, Kristopher Opron, G. W. Wei, Kelin Xia
Publication date: 12 October 2016
Published in: Molecular Based Mathematical Biology
Abstract: Protein function and dynamics are closely related to protein sequence and structure. However, predicting protein function and dynamics from sequence and structure remains a fundamental challenge in molecular biology. Protein classification, which is typically done by measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward understanding protein function and dynamics. Persistent homology is a new branch of algebraic topology that has found success in topological data analysis in a variety of disciplines, including molecular biology. The present work explores the potential of using persistent homology as an independent tool for protein classification. To this end, we propose a molecular topological fingerprint based support vector machine (MTF-SVM) classifier. Specifically, we construct machine learning feature vectors solely from protein topological fingerprints, which are topological invariants generated during the filtration process. To validate the present MTF-SVM approach, we consider four types of problems. First, we study protein-drug binding by using the M2 channel protein of influenza A virus. We achieve 96% accuracy in discriminating drug-bound and unbound M2 channels. Additionally, we examine the use of MTF-SVM for the classification of hemoglobin molecules in their relaxed and taut forms and obtain about 80% accuracy. The identification of all-alpha, all-beta, and alpha-beta protein domains is carried out in our next study using 900 proteins. We have found an 85% success rate in this identification. Finally, we apply the present technique to 55 classification tasks of protein superfamilies over 1357 samples. An average accuracy of 82% is attained. The present study establishes computational topology as an independent and effective alternative for protein classification.
Full work available at URL: https://arxiv.org/abs/1510.00953
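Illustrative sketch (not taken from the paper): the abstract describes building feature vectors from topological fingerprints produced during a filtration. The Python below is a minimal illustration of that idea under stated assumptions. It computes only the dimension-0 barcode of a distance-based filtration on a point cloud (for example, atom coordinates) using union-find, then summarizes the bars into a small feature vector. The function names `h0_barcode` and `fingerprint_features` and the choice of summary statistics are hypothetical and are not the features used in the paper. Higher-dimensional bars and the SVM training step are omitted.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Finite dimension-0 bar lengths of a distance-based filtration.

    Every point is born at filtration value 0. A connected component dies
    at the length of the edge that first merges it into another component.
    The single infinite bar (the last surviving component) is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


def fingerprint_features(points):
    """Hypothetical feature vector: [longest bar, total bar length, mean bar length]."""
    bars = h0_barcode(points)
    if not bars:
        return [0.0, 0.0, 0.0]
    return [max(bars), sum(bars), sum(bars) / len(bars)]


if __name__ == "__main__":
    # Three collinear points, used only as a toy input.
    toy_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(fingerprint_features(toy_points))
```

Vectors of this kind, one per protein structure, could then be passed to any support vector machine implementation for training and classification.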
Cites Work
- Javaplex: a research software package for persistent (co)homology
- The theory of multidimensional persistence
- Support-vector networks
- A topological view of unsupervised learning from noisy data
- Topology and data
- Computational topology. An introduction
- Persistence-based clustering in Riemannian manifolds
- Barcodes: The persistent topology of data
- Stability of persistence diagrams
- Computing persistent homology
- Topological persistence and simplification
- Persistent cohomology and circular coordinates
- On the local behavior of spaces of natural images
- Working set selection using second order information for training support vector machines
- Zigzag persistence
- Sliding windows and persistence: an application of topological methods to signal analysis
- Proximity of persistence modules and their diagrams
- A logical calculus of the ideas immanent in nervous activity
- Persistence barcodes for shapes
- Computational homology
- Extending persistence using Poincaré and Lefschetz duality
- A topological measurement of protein compressibility
- Zigzag persistent homology and real-valued functions
- Topological methods in data analysis and visualization III. Theory, algorithms, and applications. Based on the 5th workshop on topology-based methods in data analysis and visualization, TopoInVis 2013, Davis, CA, USA, March 4--6, 2013
- Biomolecular surface construction by PDE transform
- Quantum dynamics in continuum for proton transport. II: Variational solvent-solute interface
- Persistent homology of complex networks
- Clear and compress: computing persistent homology in chunks
- A distance for similarity classes of submanifolds of a Euclidean space
- Computing multidimensional persistence
- Morse theory for filtrations and efficient computation of persistent homology
- A fast algorithm for constructing topological structure in large data
- Title not available
- Persistent homology for kernels, images, and cokernels
- Computing topological persistence for simplicial maps (extended abstract)
- Variational multiscale models for charge transport
- Zigzag zoology
- Persistent intersection homology
- A Fast Learning Algorithm for Deep Belief Nets
- Topological and statistical methods for complex data. Tackling large-scale, high-dimensional, and multivariate data spaces. Selected papers based on the presentations at the workshop on the analysis of large-scale, high-dimensional, and multivariate data using topology and statistics, Le Barp, France, June 12--14, 2013
- Differential geometry based solvation model. I: Eulerian formulation
- A Mayer-Vietoris formula for persistent homology with an application to shape recognition in the presence of occlusions
- Differential geometry based solvation model II: Lagrangian formulation
- Differential geometry based multiscale models
- Geometric and potential driving formation and evolution of biomolecular surfaces
- Extreme elevation on a 2-manifold
Cited In (20)
- Protein classification with improved topological data analysis
- Canonical labels for protein spots of proteomics maps
- Centralities in simplicial complexes. Applications to protein interaction networks
- On the expectation of a persistence diagram by the persistence weighted kernel
- Using persistent homology and dynamical distances to analyze protein binding
- Evolutionary de Rham-Hodge method
- Evolutionary homology on coupled dynamical systems with applications to protein flexibility analysis
- Parameter estimation in systems exhibiting spatially complex solutions via persistent homology and machine learning
- PERCEPT: A New Online Change-Point Detection Method using Topological Data Analysis
- Analyzing Protein Data with the Generative Topographic Mapping Approach
- A topological data analysis approach on predicting phenotypes from gene expression data
- Kernel method for persistence diagrams via kernel embedding and weight factor
- Topological analysis of U.S. city demographics
- Atom-specific persistent homology and its application to protein flexibility analysis
- Homotopy continuation for the spectra of persistent Laplacians
- Persistent topology of protein space
- Multiscale persistent functions for biomolecular structure characterization
- Geometric metrics for topological representations
- Biomolecular topology: modelling and analysis
- Protein classification using texture descriptors extracted from the protein backbone image