From regular expressions to deterministic automata (Q580983)

The main theorem allows an elegant algorithm to be refined into an efficient one. The elegant algorithm for constructing a finite automaton from a regular expression is based on `derivatives of ' regular expressions; the efficient algorithm is based on `marking of' regular expressions. Derivatives of regular expressions correspond to state transitions in finite automata. When a finite automaton makes a transition under input symbol a, a leading a is stripped from the remaining input. Correspondingly, if the input string is generated by a regular expression E, then the derivative of E by a generates the remaining input after a leading a is stripped. \textit{J. A. Brzozowski} [J. Assoc. Comput. Mach. 11, 481-494 (1964; Zbl 0225.94044)] used derivatives to construct finite automata; the state for expression E has a transition under a to the state for the derivative of E by a. This approach extends to regular expressions with new operators, including intersection and complement; however, explicit computation of derivatives can be expensive. Marking of regular expressions yields an expression with distinct input symbols. Following \textit{R. McNaughton} and \textit{H. Yamada} [IRE Trans. Electron. Comput. EC-9, 38-47 (1960; Zbl 0156.255)], we attach subscripts to each input symbol in an expression; \((ab+b)^*ba\) becomes \((a_ 1b_ 2+b_ 3)^*b_ 4a_ 5\). Conceptually, the efficient algorithm constructs an automaton for the marked expression. The marks on the transitions are then erased, resulting in a nondeterministic automaton for the original unmarked expression. This approach works for the usual operations of union, concatenation, and iteration; however, intersection and complement cannot be handled because marking and unmarking do not preserve the languages generated by regular expressions with these operators.

0 references

zbMATH Keywords

algorithm for constructing a finite automaton from a regular expression

0 references

efficient algorithm

0 references

Derivatives of regular expressions

0 references

marked expression

0 references

describes a project that uses

Esterel