La reconnaissance des facteurs d'un langage fini dans un texte en temps linéaire. (Recognition of the factors of a finite language in a text in linear time) (Q1115203)

scientific article

Language	Label	Description	Also known as
English	La reconnaissance des facteurs d'un langage fini dans un texte en temps linéaire. (Recognition of the factors of a finite language in a text in linear time)	scientific article

Statements

instance of

scholarly article

title

La reconnaissance des facteurs d'un langage fini dans un texte en temps linéaire. (Recognition of the factors of a finite language in a text in linear time) (English)

author

Jean-Claude Spehner

published in

Theoretical Computer Science

publication date

1988

review text

First, we give an on-line construction of a transducer \({\mathfrak F}(L)\) that recognizes all the factors of a finite language L and locates each factor as a factor of a word of L. \({\mathfrak F}(L)\) can be half the size of the partial automaton introduced by \textit{A. Blumer}, \textit{J. Blumer}, \textit{D. Haussler}, \textit{R. McConnell} and \textit{A. Ehrenfeucht} [J. Assoc. Comput. Mach. 34, 578-595 (1987)], which recognizes the same words. However, the construction of \({\mathfrak F}(L)\) takes time \(O(\| L\| \cdot (| A| +\min (| L|, \mathrm{lgmax})))\) rather than \(O(\| L\|)\), where \(| L|\) and \(| A|\) are respectively the cardinality of L and of its alphabet A, \(\| L\|\) is the sum of the lengths of the words of L, and \(\mathrm{lgmax}\) is the maximal length of these words. Next, we build a second transducer \({\mathfrak F}'(L)\), which has the same states as \({\mathfrak F}(L)\) and which finds, for each factor u of L and each letter a of A such that ua is not a factor of L, the longest right factor (suffix) of ua that is a factor of L. \({\mathfrak F}'(L)\) generalizes the transducer we introduced for a single word [Theor. Comput. Sci. 48, 35-52 (1986; Zbl 0626.68058)]. \({\mathfrak F}'(L)\) can be determined in time \(O(\| L\| \cdot | A|)\). Using the transducers \({\mathfrak F}(L)\) and \({\mathfrak F}'(L)\), we obtain an algorithm that finds all the occurrences of the factors of L in a text in time linear in the length of the text, independently of the size of the text's alphabet. In programming, this algorithm can be used to find and modify a family of identifiers in a program. Linguists can also use it to find all the words of the same family or related to the same concept, and paronyms can be eliminated.
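To make the matching step concrete, here is a minimal Python sketch under stated assumptions. It does not implement Spehner's transducers \({\mathfrak F}(L)\) and \({\mathfrak F}'(L)\). Instead it swaps in a generalized suffix automaton (a DAWG of the kind studied by Blumer et al.), which recognizes the same set of factors, and uses its suffix links as a stand-in for \({\mathfrak F}'(L)\): when ua is not a factor, following links from u yields the longest suffix of ua that is a factor of L. The names `FactorAutomaton`, `accepts` and `longest_factor_suffixes` are hypothetical. Transitions are Python dicts, so each letter costs expected O(1) hashing, which is not the same as the alphabet-independent bound claimed above.

```python
from typing import Dict, List


class FactorAutomaton:
    """Hypothetical sketch: generalized suffix automaton (DAWG) of a finite
    language L. It accepts exactly the factors of the words of L. This is a
    stand-in for the transducer F(L), not Spehner's construction."""

    def __init__(self, words: List[str]) -> None:
        self.trans: List[Dict[str, int]] = [{}]  # state 0 is the root
        self.link: List[int] = [-1]              # suffix links
        self.length: List[int] = [0]             # longest string of each state
        for w in words:
            last = 0  # restart from the root for each word of L
            for c in w:
                last = self._extend(last, c)

    def _new_state(self, length: int, trans: Dict[str, int], link: int) -> int:
        self.trans.append(dict(trans))
        self.length.append(length)
        self.link.append(link)
        return len(self.trans) - 1

    def _split(self, p: int, q: int, c: str) -> int:
        # Clone q with length[p] + 1 and redirect the c-edges of p's link chain.
        clone = self._new_state(self.length[p] + 1, self.trans[q], self.link[q])
        while p != -1 and self.trans[p].get(c) == q:
            self.trans[p][c] = clone
            p = self.link[p]
        self.link[q] = clone
        return clone

    def _extend(self, last: int, c: str) -> int:
        q = self.trans[last].get(c)
        if q is not None:
            # An earlier word already created this transition.
            if self.length[q] == self.length[last] + 1:
                return q
            return self._split(last, q, c)
        cur = self._new_state(self.length[last] + 1, {}, 0)
        p = last
        while p != -1 and c not in self.trans[p]:
            self.trans[p][c] = cur
            p = self.link[p]
        if p != -1:
            q = self.trans[p][c]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                self.link[cur] = self._split(p, q, c)
        return cur

    def accepts(self, u: str) -> bool:
        """True iff u is a factor of some word of L."""
        state = 0
        for c in u:
            if c not in self.trans[state]:
                return False
            state = self.trans[state][c]
        return True

    def longest_factor_suffixes(self, text: str) -> List[int]:
        """For each position i, the length of the longest suffix of
        text[:i+1] that is a factor of L (amortized linear in len(text))."""
        result: List[int] = []
        state, matched = 0, 0
        for c in text:
            # Suffix links play the role of F'(L): shorten the current match
            # until it can be extended by c.
            while state != 0 and c not in self.trans[state]:
                state = self.link[state]
                matched = self.length[state]
            if c in self.trans[state]:
                state = self.trans[state][c]
                matched += 1
            else:
                state, matched = 0, 0
            result.append(matched)
        return result


fa = FactorAutomaton(["abc", "bcd"])
print(fa.longest_factor_suffixes("xabcdy"))  # [0, 1, 2, 3, 3, 0]
```

At the letter d, the current match abc cannot be extended, so the suffix link cuts it back to bc, which then extends to bcd. Because the factors of L are closed under taking suffixes, every occurrence of a factor of L ending at position i is a suffix of length at most `result[i]`, so this list of lengths describes all occurrences.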

MaRDI profile type

MaRDI publication profile

full work available at URL

https://doi.org/10.1016/0304-3975(88)90116-8

cites work

Efficient string matching

Q3859267