Java II: Finite Automata I
Bernd Kiefer, Bernd.Kiefer@dfki.de
Deutsches Forschungszentrum für Künstliche Intelligenz
Processing Regular Expressions
- We have already learned about Java's regular expression functionality.
- Now we get to know the machinery behind the Pattern and Matcher classes.
- Compiling a regular expression into a Pattern object produces a finite automaton.
- This automaton is then used to perform the matching tasks.
- We will see how to construct a finite automaton that recognizes an input string, i.e., tries to find a full match.
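As a reminder of what a "full match" looks like from the API side, here is a minimal sketch. The class name FullMatchDemo and the helper fullMatch are made up for illustration; only Pattern.compile, Pattern.matcher and Matcher.matches are actual library calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FullMatchDemo {
    // Returns true only if the regular expression matches the entire input.
    static boolean fullMatch(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // compile once; a Pattern can be reused
        Matcher matcher = pattern.matcher(input);
        return matcher.matches();                  // matches() demands a full match
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("(a|b)*ab", "abab"));  // prints true
        System.out.println(fullMatch("(a|b)*ab", "aba"));   // prints false
    }
}
```

Matcher.find(), in contrast, would also succeed on a matching substring; the automata discussed here correspond to the matches() behaviour.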
Definition: Finite Automaton
A finite automaton (FA) is a tuple $A = \langle Q, \Sigma, \delta, q_0, F \rangle$ with
- $Q$: a finite non-empty set of states
- $\Sigma$: a finite alphabet of input letters
- $\delta$: a (total) transition function $Q \times \Sigma \rightarrow Q$
- $q_0 \in Q$: the initial state
- $F \subseteq Q$: the set of final (accepting) states
[Figure: transition graph (diagram) with states $q_0$, $q_1$, $q_2$, $q_3$ and transitions labeled g, d, o, showing the notation for states, transitions, the initial state and final states]
Finite Automata: Matching
A finite automaton accepts a given input string $s$ if there is a sequence of states $p_1, p_2, \ldots, p_{|s|+1} \in Q$ such that
1. $p_1 = q_0$, the start state
2. $\delta(p_i, s_i) = p_{i+1}$ for $1 \le i \le |s|$, where $s_i$ is the $i$-th character in $s$
3. $p_{|s|+1} \in F$, i.e., the last state is a final state
- A string is successfully matched if we have found the appropriate sequence of states.
- Imagine the string on an input tape with a pointer that is advanced whenever a $\delta$ transition is used.
- The set of strings accepted by an automaton is the accepted language, analogous to regular expressions.
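The acceptance condition can be sketched in a few lines of Java. The automaton below is a hypothetical example (not taken from the slides): a three-state DFA over {a, b} that accepts exactly the strings ending in "ab". The names DfaDemo, DELTA, START and FINAL are illustrative assumptions.

```java
class DfaDemo {
    // Hypothetical DFA over {a, b} accepting all strings that end in "ab".
    // DELTA[state][symbol] with symbol index 0 = 'a', 1 = 'b'; the function is total.
    static final int[][] DELTA = {
        {1, 0},  // state 0: no useful suffix seen yet
        {1, 2},  // state 1: last symbol was 'a'
        {1, 0},  // state 2: last two symbols were "ab" (final)
    };
    static final int START = 0;
    static final boolean[] FINAL = {false, false, true};

    static boolean accepts(String s) {
        int state = START;                           // p_1 = q_0
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != 'a' && c != 'b') return false;  // symbol outside the alphabet
            state = DELTA[state][c - 'a'];           // p_{i+1} = delta(p_i, s_i)
        }
        return FINAL[state];                         // the last state must be in F
    }

    public static void main(String[] args) {
        System.out.println(accepts("abab"));  // prints true
        System.out.println(accepts("aba"));   // prints false
    }
}
```

The loop index plays the role of the pointer on the input tape: it advances by one for every transition taken.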
(Non)deterministic Automata
- In the definition of automata, $\delta$ was a total function, so given an input string, the path through the automaton is uniquely determined.
- Those automata are therefore called deterministic.
- For nondeterministic FA, $\delta$ is a transition relation $\delta : Q \times (\Sigma \cup \{\epsilon\}) \rightarrow \mathcal{P}(Q)$, where $\mathcal{P}(Q)$ is the powerset of $Q$. It
  - allows transitions from one state into several states with the same input symbol
  - need not be total
  - can have transitions labeled $\epsilon$ (not in $\Sigma$), which represents the empty string
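RegExps → Automata
Construct nondeterministic automata from regular expressions.
[Figure: transition diagrams for the three constructions, concatenation $(\alpha\beta)$, alternation $(\alpha \mid \beta)$ and iteration $(\alpha)^*$; the sub-automata for $\alpha$ and $\beta$ (start states $q_{0\alpha}$, $q_{0\beta}$, final states $q_{f\alpha}$, $q_{f\beta}$) are connected to each other and to new states $q_0$ and $q_f$ by $\epsilon$-transitions]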
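A minimal sketch of such a construction in Java, following the common Thompson-style wiring. The exact placement of ε-transitions in the slide's diagrams may differ from this variant, and all names (ThompsonSketch, symbol, concat, alt, star, matchesExample) are assumptions made for illustration. A small set-based simulation is included only so that the result can be tried out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ThompsonSketch {
    static final int EPS = -1;                        // label used for epsilon transitions
    final List<int[]> edges = new ArrayList<>();      // each edge is {from, label, to}
    int stateCount = 0;

    int newState() { return stateCount++; }
    void edge(int from, int label, int to) { edges.add(new int[] {from, label, to}); }

    // Every fragment is {start, end}: a sub-automaton with one start and one final state.
    int[] symbol(char c) {
        int[] f = {newState(), newState()};
        edge(f[0], c, f[1]);
        return f;
    }
    int[] concat(int[] a, int[] b) {                  // (alpha beta)
        edge(a[1], EPS, b[0]);
        return new int[] {a[0], b[1]};
    }
    int[] alt(int[] a, int[] b) {                     // (alpha | beta)
        int[] f = {newState(), newState()};
        edge(f[0], EPS, a[0]); edge(f[0], EPS, b[0]);
        edge(a[1], EPS, f[1]); edge(b[1], EPS, f[1]);
        return f;
    }
    int[] star(int[] a) {                             // (alpha)*
        int[] f = {newState(), newState()};
        edge(f[0], EPS, a[0]); edge(f[0], EPS, f[1]); // enter alpha, or skip it entirely
        edge(a[1], EPS, a[0]); edge(a[1], EPS, f[1]); // repeat alpha, or leave
        return f;
    }

    // All states reachable from the given ones via epsilon transitions.
    Set<Integer> closure(Set<Integer> states) {
        Set<Integer> result = new HashSet<>(states);
        List<Integer> todo = new ArrayList<>(states);
        while (!todo.isEmpty()) {
            int t = todo.remove(todo.size() - 1);
            for (int[] e : edges)
                if (e[0] == t && e[1] == EPS && result.add(e[2])) todo.add(e[2]);
        }
        return result;
    }

    boolean accepts(int[] frag, String input) {
        Set<Integer> current = closure(Collections.singleton(frag[0]));
        for (char c : input.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int[] e : edges)
                if (e[1] == c && current.contains(e[0])) next.add(e[2]);
            current = closure(next);
        }
        return current.contains(frag[1]);
    }

    // Builds the NFA for (a|b)*ab and runs it on the input.
    static boolean matchesExample(String input) {
        ThompsonSketch t = new ThompsonSketch();
        int[] loop = t.star(t.alt(t.symbol('a'), t.symbol('b')));
        int[] nfa = t.concat(t.concat(loop, t.symbol('a')), t.symbol('b'));
        return t.accepts(nfa, input);
    }
}
```

Each constructor adds only a constant number of states and edges, which is why the resulting NFA stays linear in the length of the regular expression.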
NFA vs. DFA
- Traversing a DFA is easy given the input string: the path is uniquely determined.
- In contrast, traversing an NFA requires keeping track of a set of (current) states, starting with the set $\{q_0\}$ (plus all states reachable from $q_0$ via $\epsilon$-transitions).
- Processing the next input symbol means taking all possible outgoing transitions from this set and collecting the new set.
- From every NFA, an equivalent DFA (one that accepts the same language) can be computed.
- Basic idea: track the subsets that can be reached for every possible input.
Traversing an NFA
[Figure: NFA with states 0 to 9 and transitions labeled a, b and $\epsilon$, traversed step by step on the input abab; the current states are highlighted in each step]
Highlighted (current) state sets while reading abab:
- at the start: {0, 1, 2, 4, 7}
- after a: {1, 2, 3, 4, 6, 7, 8}
- after ab: {1, 2, 4, 5, 6, 7, 9}
- after aba: {1, 2, 3, 4, 6, 7, 8}
- after abab: {1, 2, 4, 5, 6, 7, 9}
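The traversal can be sketched in Java as follows. The edge list is an assumption: it is the standard construction result for (a|b)*ab, chosen because it reproduces the state sets of the example, but the slides themselves do not spell out the edges. Treating state 9 as the only final state is likewise an assumption, and the names NfaTraversal, EDGES, statesAfter and accepts are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

class NfaTraversal {
    static final int EPS = -1;  // label for epsilon transitions
    // Assumed edge list {from, label, to} for the example NFA with states 0 to 9.
    static final int[][] EDGES = {
        {0, EPS, 1}, {0, EPS, 7}, {1, EPS, 2}, {1, EPS, 4},
        {2, 'a', 3}, {4, 'b', 5}, {3, EPS, 6}, {5, EPS, 6},
        {6, EPS, 1}, {6, EPS, 7}, {7, 'a', 8}, {8, 'b', 9},
    };
    static final int START = 0, FINAL = 9;  // assumed initial and final state

    // The given states plus everything reachable from them via epsilon edges.
    static Set<Integer> epsilonClosure(Set<Integer> states) {
        Set<Integer> closure = new TreeSet<>(states);
        Deque<Integer> toCheck = new ArrayDeque<>(states);
        while (!toCheck.isEmpty()) {
            int t = toCheck.pop();
            for (int[] e : EDGES)
                if (e[0] == t && e[1] == EPS && closure.add(e[2])) toCheck.push(e[2]);
        }
        return closure;
    }

    // States reachable from some state in the set by one edge labeled with the symbol.
    static Set<Integer> move(Set<Integer> states, char symbol) {
        Set<Integer> result = new TreeSet<>();
        for (int[] e : EDGES)
            if (e[1] == symbol && states.contains(e[0])) result.add(e[2]);
        return result;
    }

    // The set of current states after reading the whole input.
    static Set<Integer> statesAfter(String input) {
        Set<Integer> current = epsilonClosure(Collections.singleton(START));
        for (char c : input.toCharArray())
            current = epsilonClosure(move(current, c));
        return current;
    }

    static boolean accepts(String input) {
        return statesAfter(input).contains(FINAL);
    }

    public static void main(String[] args) {
        System.out.println(statesAfter("a"));   // prints [1, 2, 3, 4, 6, 7, 8]
        System.out.println(accepts("abab"));    // prints true
    }
}
```

A TreeSet is used only so that the printed sets come out sorted; any Set implementation would do for the algorithm itself.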
NFA → DFA: Subset Construction
- Simulate "in parallel" all possible moves the automaton can make.
- The states of the resulting DFA will represent sets of states of the NFA, i.e., elements of $\mathcal{P}(Q)$.
- We use two operations on states/state sets of the NFA:
  - $\epsilon$-closure$(T)$: set of states reachable from any state $s$ in $T$ on $\epsilon$-transitions (including the states in $T$ themselves)
  - move$(T, a)$: set of states to which there is a transition from one state in $T$ on input symbol $a$
- The final states of the DFA are those where the corresponding NFA subset contains a final state.
Algorithm: Subset Construction
proc SubsetConstruction(s_0) ≡
  DFAStates := { ε-closure({s_0}) }, unmarked
  while there is an unmarked state T in DFAStates do
    mark T
    for each input symbol a do
      U := ε-closure(move(T, a))
      DFADelta[T, a] := U
      if U ∉ DFAStates then add U as unmarked state to DFAStates

proc ε-closure(T) ≡
  ε-closure := T; to_check := T
  while to_check not empty do
    get (and remove) some state t from to_check
    for each state u with an edge labeled ε from t to u do
      if u ∉ ε-closure then add u to ε-closure and to_check
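The pseudocode translates fairly directly into Java. The following is a sketch under assumptions: the NFA edge list is the standard construction result for (a|b)*ab with states 0 to 9 (the slides do not list the edges), state 9 is taken to be the only final state, and the names SubsetConstruction, build, startState and isFinal are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class SubsetConstruction {
    static final int EPS = -1;  // label for epsilon transitions
    // Assumed edge list {from, label, to} of the example NFA.
    static final int[][] EDGES = {
        {0, EPS, 1}, {0, EPS, 7}, {1, EPS, 2}, {1, EPS, 4},
        {2, 'a', 3}, {4, 'b', 5}, {3, EPS, 6}, {5, EPS, 6},
        {6, EPS, 1}, {6, EPS, 7}, {7, 'a', 8}, {8, 'b', 9},
    };
    static final int START = 0, FINAL = 9;
    static final char[] ALPHABET = {'a', 'b'};

    static Set<Integer> epsilonClosure(Set<Integer> states) {
        Set<Integer> closure = new TreeSet<>(states);
        Deque<Integer> toCheck = new ArrayDeque<>(states);
        while (!toCheck.isEmpty()) {
            int t = toCheck.pop();
            for (int[] e : EDGES)
                if (e[0] == t && e[1] == EPS && closure.add(e[2])) toCheck.push(e[2]);
        }
        return closure;
    }

    static Set<Integer> move(Set<Integer> states, char symbol) {
        Set<Integer> result = new TreeSet<>();
        for (int[] e : EDGES)
            if (e[1] == symbol && states.contains(e[0])) result.add(e[2]);
        return result;
    }

    static Set<Integer> startState() {
        return epsilonClosure(Collections.singleton(START));
    }

    // A DFA state is final if its NFA subset contains a final NFA state.
    static boolean isFinal(Set<Integer> dfaState) {
        return dfaState.contains(FINAL);
    }

    // Returns DFADelta: DFA state (a set of NFA states) -> symbol -> DFA state.
    static Map<Set<Integer>, Map<Character, Set<Integer>>> build() {
        Map<Set<Integer>, Map<Character, Set<Integer>>> delta = new LinkedHashMap<>();
        Deque<Set<Integer>> unmarked = new ArrayDeque<>();
        delta.put(startState(), new TreeMap<>());
        unmarked.add(startState());
        while (!unmarked.isEmpty()) {
            Set<Integer> t = unmarked.remove();      // "mark" T by taking it off the worklist
            for (char a : ALPHABET) {
                Set<Integer> u = epsilonClosure(move(t, a));
                delta.get(t).put(a, u);
                if (!delta.containsKey(u)) {         // U is new: add it as unmarked
                    delta.put(u, new TreeMap<>());
                    unmarked.add(u);
                }
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        build().forEach((state, row) -> System.out.println(state + " -> " + row));
    }
}
```

If move(T, a) is ever empty, the empty set simply becomes an ordinary (non-final) DFA state that loops to itself, which keeps the resulting transition function total. For this particular NFA that case does not arise.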
Example: Subset Construction
[Figure: the NFA with states 0 to 9 and transitions labeled a, b and $\epsilon$; below it, the DFA is built step by step, with the NFA states of the current subset highlighted]
DFA states (sets of NFA states) and transitions after the last step:
  DFA state                on a                     on b
  {0, 1, 2, 4, 7}          {1, 2, 3, 4, 6, 7, 8}    {1, 2, 4, 5, 6, 7}
  {1, 2, 3, 4, 6, 7, 8}    {1, 2, 3, 4, 6, 7, 8}    {1, 2, 4, 5, 6, 7, 9}
  {1, 2, 4, 5, 6, 7}       {1, 2, 3, 4, 6, 7, 8}    {1, 2, 4, 5, 6, 7}
  {1, 2, 4, 5, 6, 7, 9}    {1, 2, 3, 4, 6, 7, 8}    {1, 2, 4, 5, 6, 7}
Time/Space Considerations
- DFA traversal is linear in the length of the input string $x$.
- An NFA needs $O(n)$ space (states + transitions), where $n$ is the length of the regular expression.
- NFA traversal may need time $O(n \times |x|)$, so why use NFAs?
- There are regular expressions whose DFA needs exponentially many states (on the order of $2^n$)!
- Solution 1: "lazy" construction of the DFA: construct DFA states on the fly, up to a certain number, and cache them.
- Solution 2: try to minimize the DFA: there is a unique (modulo state names) minimal automaton for every regular language!
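Solution 1 can be sketched as an NFA simulation that remembers every DFA transition it has already computed. This is an illustration under assumptions, not a description of any particular library: the NFA is the assumed (a|b)*ab example with states 0 to 9 and final state 9, and the names LazyDfa, MAX_CACHED, cache and step are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class LazyDfa {
    static final int EPS = -1;  // label for epsilon transitions
    // Assumed edge list {from, label, to} of the example NFA.
    static final int[][] EDGES = {
        {0, EPS, 1}, {0, EPS, 7}, {1, EPS, 2}, {1, EPS, 4},
        {2, 'a', 3}, {4, 'b', 5}, {3, EPS, 6}, {5, EPS, 6},
        {6, EPS, 1}, {6, EPS, 7}, {7, 'a', 8}, {8, 'b', 9},
    };
    static final int START = 0, FINAL = 9;
    static final int MAX_CACHED = 1000;  // hypothetical bound on cached DFA states

    // DFA transitions computed so far: DFA state -> symbol -> DFA state.
    final Map<Set<Integer>, Map<Character, Set<Integer>>> cache = new HashMap<>();

    static Set<Integer> epsilonClosure(Set<Integer> states) {
        Set<Integer> closure = new TreeSet<>(states);
        Deque<Integer> toCheck = new ArrayDeque<>(states);
        while (!toCheck.isEmpty()) {
            int t = toCheck.pop();
            for (int[] e : EDGES)
                if (e[0] == t && e[1] == EPS && closure.add(e[2])) toCheck.push(e[2]);
        }
        return closure;
    }

    static Set<Integer> move(Set<Integer> states, char symbol) {
        Set<Integer> result = new TreeSet<>();
        for (int[] e : EDGES)
            if (e[1] == symbol && states.contains(e[0])) result.add(e[2]);
        return result;
    }

    // One DFA step: use the cached transition if present, otherwise compute it.
    Set<Integer> step(Set<Integer> state, char symbol) {
        Map<Character, Set<Integer>> row = cache.get(state);
        if (row != null && row.containsKey(symbol)) return row.get(symbol);  // cache hit
        Set<Integer> next = epsilonClosure(move(state, symbol));
        if (row == null && cache.size() < MAX_CACHED) {  // room for a new DFA state
            row = new HashMap<>();
            cache.put(state, row);
        }
        if (row != null) row.put(symbol, next);          // when full, fall back to plain NFA steps
        return next;
    }

    boolean accepts(String input) {
        Set<Integer> current = epsilonClosure(Collections.singleton(START));
        for (char c : input.toCharArray()) current = step(current, c);
        return current.contains(FINAL);
    }
}
```

Only the DFA states actually visited by some input are ever built, so the exponential worst case is paid only if the inputs really exercise that many subsets. Once the bound is reached, the sketch simply stops caching; evicting old entries would be another option.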
Minimization Algorithm by Hopcroft
proc Minimize() ≡
  B_1 = F; B_2 = Q \ F
  E = {B_1, B_2}
  k = 3
  for a ∈ Σ do
    a(i) = {s ∈ Q | s ∈ B_i ∧ ∃t : δ(t, a) = s}
    L = the smaller of the a(i)
  while L ≠ ∅ do
    take some i ∈ L and delete it
    for j < k s.th. ∃t ∈ B_j ...
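To make the idea of minimization concrete, here is a sketch of a simpler technique: plain partition refinement in the style of Moore. It is explicitly not Hopcroft's algorithm and lacks its better running time, but it starts from the same initial partition (final and non-final states). The example DFA is an assumption: it encodes the four-state DFA for (a|b)*ab obtained by subset construction, with its states numbered 0 to 3; the names MinimizeSketch, minimize, DELTA and FINAL are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MinimizeSketch {
    // Assumed example DFA: DELTA[state] = {target on 'a', target on 'b'}.
    static final int[][] DELTA = {
        {1, 2},  // 0 = {0,1,2,4,7}
        {1, 3},  // 1 = {1,2,3,4,6,7,8}
        {1, 2},  // 2 = {1,2,4,5,6,7}
        {1, 2},  // 3 = {1,2,4,5,6,7,9} (final)
    };
    static final boolean[] FINAL = {false, false, false, true};

    // Returns, for every state, the index of its equivalence class.
    static int[] minimize(int[][] delta, boolean[] isFinal) {
        int n = delta.length;
        int[] block = new int[n];
        for (int s = 0; s < n; s++) block[s] = isFinal[s] ? 0 : 1;  // B_1 = F, B_2 = Q \ F
        int previousCount = -1;
        while (true) {
            // Two states stay together only if they are in the same block and
            // all their successors are in the same blocks as well.
            Map<List<Integer>, Integer> ids = new HashMap<>();
            int[] next = new int[n];
            for (int s = 0; s < n; s++) {
                List<Integer> signature = new ArrayList<>();
                signature.add(block[s]);
                for (int target : delta[s]) signature.add(block[target]);
                Integer id = ids.get(signature);
                if (id == null) {
                    id = ids.size();
                    ids.put(signature, id);
                }
                next[s] = id;
            }
            if (ids.size() == previousCount) return next;  // no block was split: stable
            previousCount = ids.size();
            block = next;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(minimize(DELTA, FINAL)));  // prints [0, 1, 0, 2]
    }
}
```

For the example, states 0 and 2 end up in the same class, so the minimal DFA has three states instead of four.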