Safety and completeness in flow decompositions for RNA assembly
Publication:2170145
DOI: 10.1007/978-3-031-04749-7_11; zbMATH Open: 1494.92088; arXiv: 2201.10372; OpenAlex: W4285161644; MaRDI QID: Q2170145; FDO: Q2170145
Authors: Shahbaz Khan, Milla Kortelainen, Manuel O. Cáceres, Lucia Williams, Alexandru I. Tomescu
Publication date: 30 August 2022
Abstract: Decomposing a network flow into weighted paths has numerous applications. Some applications require any decomposition that is optimal with respect to some property, such as number of paths, robustness, or length. Many bioinformatic applications require a specific decomposition where the paths correspond to some underlying data that generated the flow. For real inputs, no optimization criterion is guaranteed to uniquely identify the correct decomposition. Therefore, we propose to report safe paths, i.e., subpaths of at least one path in every flow decomposition. Ma, Zheng, and Kingsford [WABI 2020] addressed the existence of multiple optimal solutions in a probabilistic framework, i.e., non-identifiability. Later [RECOMB 2021], they gave a quadratic-time algorithm based on a global criterion for solving a problem called AND-Quant, which generalizes the problem of reporting whether a given path is safe. We give the first local characterization of safe paths for flow decompositions in directed acyclic graphs (DAGs), leading to a practical algorithm for finding the complete set of safe paths. We evaluated our algorithms against the trivial safe algorithms (unitigs, extended unitigs) and the popularly used heuristic (greedy-width) for flow decomposition on RNA transcript datasets. Despite maintaining perfect precision, our algorithm reports significantly higher coverage than the trivial safe algorithms. The greedy-width algorithm, though reporting better coverage, has significantly lower precision on complex graphs. Overall, our algorithm outperforms greedy-width on a unified metric (F-Score) when the dataset has a significant number of complex graphs. Moreover, it has superior time and space efficiency, resulting in a better and more practical approach for bioinformatics applications of flow decomposition.
Full work available at URL: https://arxiv.org/abs/2201.10372
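To make the abstract's terms concrete, the following is a minimal Python sketch of the greedy-width heuristic it mentions as a baseline: repeatedly extract the source-to-sink path with the largest bottleneck and subtract that bottleneck from the flow. This is not the paper's safe-path algorithm. The function names (`topological_order`, `widest_path`, `greedy_width`), the edge-dictionary representation, and the toy graph are all assumptions made for illustration.

```python
from collections import defaultdict


def topological_order(edges):
    """Return the nodes of a DAG in topological order (Kahn's algorithm)."""
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    order = [n for n in sorted(nodes) if indeg[n] == 0]
    for u in order:  # the list grows while we iterate over it
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)
    return order


def widest_path(flow, order, s, t):
    """Return (bottleneck, path) for the s-t path with the largest bottleneck,
    using only edges that still carry positive flow."""
    best = {s: (float("inf"), None)}  # node -> (width so far, predecessor)
    for u in order:
        if u not in best:
            continue
        for (a, b), f in flow.items():
            if a == u and f > 0:
                width = min(best[u][0], f)
                if b not in best or width > best[b][0]:
                    best[b] = (width, u)
    if t not in best:
        return 0, []
    path, node = [], t
    while node is not None:
        path.append(node)
        node = best[node][1]
    return best[t][0], path[::-1]


def greedy_width(flow, s, t):
    """Decompose a flow on a DAG into weighted s-t paths, widest path first."""
    flow = dict(flow)  # work on a copy so the caller's flow is untouched
    order = topological_order(flow)
    paths = []
    while True:
        width, path = widest_path(flow, order, s, t)
        if width == 0:
            break
        for edge in zip(path, path[1:]):
            flow[edge] -= width
        paths.append((width, path))
    return paths


# Hypothetical toy flow: edge (u, v) -> flow value, conserved at every inner node.
example_flow = {
    ("s", "a"): 4, ("s", "b"): 2, ("a", "c"): 4, ("b", "c"): 2,
    ("c", "d"): 4, ("c", "e"): 2, ("d", "t"): 4, ("e", "t"): 2,
}
paths = greedy_width(example_flow, "s", "t")
print(paths)
```

On this toy input the heuristic returns two paths, s-a-c-d-t with weight 4 and s-b-c-e-t with weight 2. The same flow also decomposes into s-a-c-e-t, s-a-c-d-t, and s-b-c-d-t with weight 2 each, so the reported path s-b-c-e-t need not be part of the true decomposition. By contrast, the subpath a-c-d appears in both decompositions: 4 units enter c from a and only 2 can leave towards e, so some path through a-c-d exists in every decomposition, which is the sense of "safe" used in the abstract.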
Recommendations
- Flow Decomposition with Subpath Constraints
- A practical fpt algorithm for Flow Decomposition and transcript assembly
- Fast, flexible, and exact minimum flow decompositions via ILP
- Safety in \(s\)-\(t\) paths, trails and walks
- Simple bounds and greedy algorithms for decomposing a flow into a minimal set of paths
Cites Work
- Network flows. Theory, algorithms, and applications.
- Efficient string matching
- Persistency in maximum cardinality bipartite matchings
- Simple bounds and greedy algorithms for decomposing a flow into a minimal set of paths
- Integral flow decomposition with minimum longest path length
- Flow Decomposition with Subpath Constraints
- Combinatorial algorithms for DNA sequence assembly
- An Eulerian path approach to DNA fragment assembly
- Persistency in combinatorial optimization problems on matroids
- Safe and complete contig assembly via omnitigs
- An optimal \(O(nm)\) algorithm for enumerating all walks common to all closed edge-covering walks of a graph
- A study on flow decomposition methods for scheduling of electric buses in public transport based on aggregated time-space network models
- A practical fpt algorithm for Flow Decomposition and transcript assembly
- Exact transcript quantification over splice graphs
Cited In (6)
- Optimal Omnitig Listing for Safe and Complete Contig Assembly
- A practical fpt algorithm for Flow Decomposition and transcript assembly
- Feasibility of flow decomposition with subpath constraints in linear time
- Simplicity in Eulerian circuits: uniqueness and safety
- Flow Decomposition with Subpath Constraints
- Fast, flexible, and exact minimum flow decompositions via ILP