The complexity of the co-occurrence problem
From MaRDI portal
Publication:6166971
Abstract: Let be a string of length over an alphabet and let be a subset of of size . The 'co-occurrence problem' is to construct a compact data structure that supports the following query: given an integer return the number of length- substrings of that contain each character of at least once. This is a natural string problem with applications to, e.g., data mining, natural language processing, and DNA analysis. The state of the art is an space data structure that -- with some minor additions -- supports queries in time [CPM 2021]. Our contributions are as follows. Firstly, we analyze the problem in terms of a new, natural parameter , giving a simple data structure that uses space and supports queries in time. The preprocessing algorithm does a single pass over , runs in expected time, and uses space in addition to the input. Furthermore, we show that space is optimal and that -time queries are optimal given optimal space. Secondly, we bound , giving clean bounds in terms of and that match the state of the art. Furthermore, we prove that bits of space is necessary in the worst case, meaning that the upper bound is tight to within polylogarithmic factors. All of our results are based on simple and intuitive combinatorial ideas that simplify the state of the art.
Recommendations
- Implicit complexity for coinductive data: a characterization of corecurrence
- The Complexity of Problems in P Given Correlated Instances
- Kolmogorov complexity cores
- A combinatorial approach to complexity
- scientific article; zbMATH DE number 62483
- On the complexity of the cogrowth sequence
- The complexity of counting problems
- The complexity of some complementation problems
- scientific article; zbMATH DE number 403953
Cites work
- scientific article; zbMATH DE number 6381684 (Why is no real title available?)
- scientific article; zbMATH DE number 1947405 (Why is no real title available?)
- AWLCO: All-Window Length Co-Occurrence
- Log-logarithmic worst-case range queries are possible in space theta(N)
- Universal classes of hash functions
Cited in
(3)
This page was built for publication: The complexity of the co-occurrence problem
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6166971)