La reconnaissance des facteurs d'un langage fini dans un texte en temps linéaire. (Recognition of the factors of a finite language in a text in linear time) (Q1115203)

From MaRDI portal
scientific article
Language Label Description Also known as
English
La reconnaissance des facteurs d'un langage fini dans un texte en temps linéaire. (Recognition of the factors of a finite language in a text in linear time)
scientific article

    Statements

    La reconnaissance des facteurs d'un langage fini dans un texte en temps linéaire. (Recognition of the factors of a finite language in a text in linear time) (English)
    0 references
    1988
    0 references
    First we give here an on-line construction of a transducer \({\mathfrak F}(L)\) which recognizes all the factors of a finite language L and positions each factor as a factor of a word from L. \({\mathfrak F}(L)\) can be twice smaller than the partial automaton introduced by \textit{A. Blumer}, \textit{J. Blumer}, \textit{D. Haussler}, \textit{R. McConnel} and \textit{A. Ehrenfeucht} [J. Assoc. Comput. Mach. 34, 578-595 (1987)] which recognizes the same words. Though, the complexity of the construction of \({\mathfrak F}(L)\) is in \(O(\| L\| \cdot (| A| +\min (| L|,\quad lg\max)))\) where \(| L|\) and \(| A|\) are respectively the cardinality of L and of its alphabet A, \(\| L\|\) is the sum of the lengths of the words from L and lgmax is the maximal length of these words and not in \(O(\| L\|).\) Then we build a second transducer \({\mathfrak F}'(L)\) which has the same states as \({\mathfrak F}(L)\) and which finds, for each factor u of L and each letter a of A such that ua is not a factor of L, the largest right factor of ua which is a factor of L. \({\mathfrak F}'(L)\) generalizes the transducer we have introduced for a unique word [Theor. Comput. Sci. 48, 35-52 (1986; Zbl 0626.68058)]. The determination of \({\mathfrak F}'(L)\) is in \(O(\| L\| \cdot | A|).\) By using the transducers \({\mathfrak F}(L)\) and \({\mathfrak F}'(L)\), we obtain an algorithm which finds all the occurrences of the factors of L in a text in time linear in the length of the text and independently of the cardinality of the alphabet of this text. This algorithm can be used in computing to find and modify a family of identifiers in a program. Linguists can also determine all the words of a same family or related to a same concept - paronym words may be eliminated.
    0 references

    Identifiers