An external-memory algorithm for string graph construction
From MaRDI portal
Abstract: Some recent results have introduced external-memory algorithms to compute self-indexes of a set of strings, mainly via computing the Burrows-Wheeler Transform (BWT) of the input strings. The motivations for those results stem from Bioinformatics, where a large number of short strings (called reads) are routinely produced and analyzed. In that field, a fundamental problem is to assemble a genome from a large set of much shorter samples extracted from the unknown genome. The approaches that are currently used to tackle this problem are memory-intensive. This fact does not bode well with the ongoing increase in the availability of genomic data. A data structure that is used in genome assembly is the string graph, where vertices correspond to samples and arcs represent two overlapping samples. In this paper we address an open problem: to design an external-memory algorithm to compute the string graph.
Recommendations
- scientific article; zbMATH DE number 5499319
- Publication:4886043
- scientific article; zbMATH DE number 1522938
- External memory algorithms for finding disjoint paths in undirected graphs
- Design and Engineering of External Memory Traversal Algorithms for General Graphs
- On the complexity of string matching for graphs
- On the Complexity of String Matching for Graphs
- Publication:4952710
- scientific article; zbMATH DE number 1559569
Cites work
- scientific article; zbMATH DE number 1003281 (Why is no real title available?)
- scientific article; zbMATH DE number 1142307 (Why is no real title available?)
- scientific article; zbMATH DE number 1424324 (Why is no real title available?)
- scientific article; zbMATH DE number 2203240 (Why is no real title available?)
- Algorithms for parallel memory, I: Two-level memories
- Comparing DNA sequence collections by direct comparison of compressed text indexes
- Computing the multi-string BWT and LCP array in external memory
- Covering pairs in directed acyclic graphs
- Indexing compressed text
- Lightweight LCP Construction for Next-Generation Sequencing Datasets
- Lightweight algorithms for constructing and inverting the BWT of string collections
- Linear approximation of shortest superstrings
- Physical mapping of chromosomes: A combinatorial problem in molecular biology
- Replacing suffix trees with enhanced suffix arrays
- Scaling metagenome sequence assembly with probabilistic de Bruijn graphs
- The Burrows-Wheeler transform between data compression and combinatorics on words
- Trading off space for passes in graph streaming problems
Cited in
(7)- Construction of Fundamental Data Structures for Strings
- On the longest common prefix of suffixes in an inverse Lyndon factorization and other properties
- scientific article; zbMATH DE number 5499319 (Why is no real title available?)
- Computing the multi-string BWT and LCP array in external memory
- External memory BWT and LCP computation for sequence collections with applications
- Can formal languages help pangenomics to represent and analyze multiple genomes?
- A data structure for a sequence of string accesses in external memory
This page was built for publication: An external-memory algorithm for string graph construction
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2362353)