Indexing weighted sequences: neat and efficient
From MaRDI portal
Abstract: In a emph{weighted sequence}, for every position of the sequence and every letter of the alphabet a probability of occurrence of this letter at this position is specified. Weighted sequences are commonly used to represent imprecise or uncertain data, for example, in molecular biology where they are known under the name of Position-Weight Matrices. Given a probability threshold , we say that a string of length occurs in a weighted sequence at position if the product of probabilities of the letters of at positions in is at least . In this article, we consider an emph{indexing} variant of the problem, in which we are to preprocess a weighted sequence to answer multiple pattern matching queries. We present an -time construction of an -sized index for a weighted sequence of length over a constant-sized alphabet that answers pattern matching queries in optimal, time, where is the number of occurrences reported. The cornerstone of our data structure is a novel construction of a family of special strings that carries the information about all the strings that occur in the weighted sequence with a sufficient probability. We obtain a weighted index with the same complexities as in the most efficient previously known index by Barton et al. (CPM 2016), but our construction is significantly simpler. The most complex algorithmic tool required in the basic form of our index is the suffix tree which we use to develop a new, more straightforward index for the so-called property matching problem. We provide an implementation of our data structure. Our construction allows us also to obtain a significant improvement over the complexities of the approximate variant of the weighted index presented by Biswas et al. (EDBT 2016) and an improvement of the space complexity of their general index.
Recommendations
Cites work
- scientific article; zbMATH DE number 2119724 (Why is no real title available?)
- scientific article; zbMATH DE number 6792413 (Why is no real title available?)
- Algorithms on Strings
- Dynamic text and static pattern matching
- Errata for ``Faster index for property matching
- Fast Algorithms for Finding Nearest Common Ancestors
- Fast Average-Case Pattern Matching on Weighted Sequences
- Faster index for property matching
- Linear-time computation of prefix table for weighted strings {\&} applications
- Log-logarithmic worst-case range queries are possible in space theta(N)
- Lowest common ancestors in trees and directed acyclic graphs
- On the sorting-complexity of suffix tree construction
- On-Line Pattern Matching on Uncertain Sequences and Applications
- On-line construction of suffix trees
- On-line weighted pattern matching
- Pattern matching and consensus problems on weighted sequences and profiles
- Probability Inequalities for Sums of Bounded Random Variables
- Property matching and weighted matching
- Property suffix array with applications
- Storing a Sparse Table with 0 (1) Worst Case Access Time
- The property suffix tree with dynamic properties
- The weighted suffix tree: an efficient data structure for handling molecular weighted sequences and its applications
- Time-optimal top-\(k\) document retrieval
- Weighted ancestors in suffix trees
Cited in
(14)- Computing the Antiperiod(s) of a String
- String Covering: A Survey
- Efficient Computation of 2-Covers of a String.
- Property matching and weighted matching
- Streaming \(k\)-mismatch with error correcting and applications
- Property Suffix Array with Applications in Indexing Weighted Sequences
- Property suffix array with applications
- The weighted suffix tree: an efficient data structure for handling molecular weighted sequences and its applications
- String covers of a tree
- Experimental evaluation of algorithms for computing quasiperiods
- scientific article; zbMATH DE number 6792413 (Why is no real title available?)
- Alignment-free sequence comparison using absent words
- Property Matching and Weighted Matching
- Weighted shortest common supersequence problem revisited
This page was built for publication: Indexing weighted sequences: neat and efficient
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2288210)