An algorithm to compute the character access count distribution for pattern matching algorithms (Q1736492)

From MaRDI portal

scientific article; zbMATH DE number 7042109

      Statements

      An algorithm to compute the character access count distribution for pattern matching algorithms (English)
      26 March 2019
      Summary: We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer-Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm's running time cost (such as the number of text character accesses) for any given pattern in a random text model. Text models range from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). Furthermore, we provide an algorithm to compute the exact distribution of differences in running time cost of two pattern matching algorithms. Methodologically, we use extensions of finite automata which we call deterministic arithmetic automata (DAAs) and probabilistic arithmetic automata (PAAs) [the authors, Lect. Notes Comput. Sci. 5029, 95-106 (2008; Zbl 1143.68440)]. Given an algorithm, a pattern, and a text model, a PAA is constructed from which the sought distributions can be derived using dynamic programming. To our knowledge, this is the first time that substring- or suffix-based pattern matching algorithms are analyzed exactly by computing the whole distribution of running time cost. Experimentally, we compare Horspool's algorithm, Backward DAWG Matching, and Backward Oracle Matching on prototypical patterns of short length and provide statistics on the size of minimal DAAs for these computations.
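      To make the quantity in the summary concrete, the following minimal Python sketch counts the text character accesses made by Horspool's algorithm and derives their exact distribution under a uniform i.i.d. text model. It is not the DAA/PAA construction of the article: it simply enumerates every text of a given length, which is only feasible for very short texts and small alphabets. The function names horspool_access_count and access_count_distribution are hypothetical, and counting one access per character comparison is an assumption about the cost measure.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def horspool_access_count(pattern, text):
    """Run Horspool's algorithm; return the number of text character accesses.

    Assumption: each character comparison costs exactly one text access.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m, n = len(pattern), len(text)
    # Bad-character shift table built from all pattern characters except the
    # last one; later (rightmost) occurrences overwrite earlier ones.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    accesses = 0
    pos = 0  # left end of the current window
    while pos + m <= n:
        j = m - 1
        # Compare right to left; every comparison reads one text character.
        while j >= 0:
            accesses += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        # The shift depends on the last character of the window, which the
        # first comparison above has already read.
        pos += shift.get(text[pos + m - 1], m)
    return accesses


def access_count_distribution(pattern, alphabet, n):
    """Exact cost distribution over all texts of length n (uniform i.i.d. model).

    Brute-force enumeration, exponential in n: an illustration only, not the
    automaton-based dynamic programming described in the summary.
    """
    counts = Counter(
        horspool_access_count(pattern, "".join(chars))
        for chars in product(alphabet, repeat=n)
    )
    total = len(alphabet) ** n
    return {cost: Fraction(k, total) for cost, k in sorted(counts.items())}


if __name__ == "__main__":
    for cost, prob in access_count_distribution("ab", "ab", 6).items():
        print(cost, prob)
```

      For the pattern "ab" over the alphabet {a, b} and texts of length 2, this sketch assigns probability 1/2 each to a cost of 1 and a cost of 2. The point of the automaton-based approach in the summary is to obtain such distributions without enumerating all texts.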
      pattern matching
      analysis of algorithms
      finite automaton
      minimization
      deterministic arithmetic automaton
      probabilistic arithmetic automaton

      Identifiers