COMP3630/6360: Theory of Computation
Semester 1, 2020
The Australian National University

Regular Expressions

This Lecture Covers Chapter 3 of HMU: Regular Expressions and Languages
- Introduction to regular expressions and regular languages
- Equivalence of the class of regular languages and the class of languages accepted by finite automata
- Algebraic laws of (abstract) regular expressions
Additional Reading: Chapter 3 of HMU.

Regular Expressions and Languages

Regular Expressions: Overview
- So far: DFAs and NFAs were given machine-like descriptions.
- Regular expressions are a user-friendly, declarative formulation.
- Regular expressions find extensive use:
  - Searching, pattern matching, and conformance checking in text-formatting systems (e.g., UNIX grep, egrep, fgrep)
  - Lexical analyzers (in compilers) use regular expressions to identify tokens (e.g., Lex, Flex)
  - Web forms use them to (structurally) validate entries (passwords, dates, email IDs)
- A regular expression over an alphabet Σ is a string consisting of:
  - symbols from Σ
  - constants: ∅, ε
  - operators: +, ∗
  - parentheses: (, )
- Regular expressions are defined inductively.

Regular Expressions: Definition
- Regular expressions are defined inductively as follows:
- Basis:
  B1  ∅ and ε are regular expressions.
  B2  For each a ∈ Σ, a is a regular expression.
- Induction: If E and F are regular expressions, then:

  Option 1: HMU Approach        Option 2: A 'Precise' Approach
  I1  so is E∗                  I1'  so is (E∗)
  I2  so is E + F               I2'  so is (E + F)
  I3  so is EF                  I3'  so is (EF)
  I4  so is (E)

- Only expressions generated by the above induction are regular expressions.
- Remark 1: Some authors/texts use | instead of +. HMU uses +.
- Remark 2: All expressions generated by Option 2 are also generated by Option 1 (I1 + I4 ⇒ I1'; I2 + I4 ⇒ I2'; I3 + I4 ⇒ I3').
- Remark 3: Some expressions are regular according to Option 1 but not Option 2, e.g., ((0)) and 0 + 11∗.

Regular Expressions: Examples
- Let Σ = {0, 1}.
- ((((0 + 1)1)∗)0) is a regular expression (Option 2):

  Expression          Rule
  0                   (B2)
  1                   (B2)
  (0+1)               (I2')
  ((0+1)1)            (I3')
  (((0+1)1)∗)         (I1')
  ((((0+1)1)∗)0)      (I3')

- 0 + 11∗0 is a regular expression (Option 1):

  Expression          Rule
  0                   (B2)
  1                   (B2)
  0 + 1               (I2)
  0 + 11              (I3)
  0 + 11∗             (I1)
  0 + 11∗0            (I3)
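
The inductive definition (Option 2) maps naturally onto a recursive data type. The following is a minimal Python sketch, not something taken from HMU: the class names (Empty, Eps, Sym, Star, Union, Concat) and the function show are illustrative assumptions, and the star is printed as the ASCII character *.

```python
from dataclasses import dataclass


class Regex:
    """Base class for regular-expression syntax trees (hypothetical name)."""


@dataclass(frozen=True)
class Empty(Regex):      # B1: the constant ∅
    pass


@dataclass(frozen=True)
class Eps(Regex):        # B1: the constant ε
    pass


@dataclass(frozen=True)
class Sym(Regex):        # B2: a single symbol a ∈ Σ
    a: str


@dataclass(frozen=True)
class Star(Regex):       # I1': (E*)
    e: Regex


@dataclass(frozen=True)
class Union(Regex):      # I2': (E + F)
    e: Regex
    f: Regex


@dataclass(frozen=True)
class Concat(Regex):     # I3': (EF)
    e: Regex
    f: Regex


def show(r: Regex) -> str:
    """Render r as a fully parenthesized (Option 2) expression."""
    if isinstance(r, Empty):
        return "∅"
    if isinstance(r, Eps):
        return "ε"
    if isinstance(r, Sym):
        return r.a
    if isinstance(r, Star):
        return "(" + show(r.e) + "*)"
    if isinstance(r, Union):
        return "(" + show(r.e) + "+" + show(r.f) + ")"
    if isinstance(r, Concat):
        return "(" + show(r.e) + show(r.f) + ")"
    raise TypeError("not a regular expression")


# The Option 2 example over Σ = {0, 1}, built bottom-up exactly as in its derivation.
example = Concat(Star(Concat(Union(Sym("0"), Sym("1")), Sym("1"))), Sym("0"))
print(show(example))  # ((((0+1)1)*)0)
```

Each constructor corresponds to one rule of the definition, so every value of this type is a properly parenthesized expression by construction.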

What do Regular Expressions Stand for?
- Each properly parenthesized regular expression E (i.e., a regular expression generated by Option 2) is shorthand for a language L(E).
- A language is said to be regular if it corresponds to a regular expression. This correspondence is defined by the following induction:
- Basis:
  B1  L(∅) = ∅ (the empty language); L(ε) = {ε} (the language containing only the empty string)
  B2  L(a) = {a} for a ∈ Σ (the language containing only the one-symbol string a)
- Induction: For any regular expressions E and F,
  I1'  L((E∗)) = (L(E))∗ = {ε} ∪ L(E) ∪ L(E)^2 ∪ ··· (Kleene-∗ closure of L(E))
  I2'  L((E + F)) = L(E) ∪ L(F) (union)
  I3'  L((EF)) = L(E)L(F) (concatenation)
What if a regular expression is generated by Option 1?
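
Since L(E) may be infinite, one way to experiment with the semantics is to compute only the strings of L(E) up to a length bound. The sketch below follows rules B1, B2, I1', I2', I3' case by case. The nested-tuple encoding of expressions and the name lang are assumptions made for this illustration.

```python
def lang(e, n):
    """Return the strings of L(e) that have length at most n.

    e is a nested tuple (an assumed encoding, not from the slides):
    ("empty",), ("eps",), ("sym", a), ("star", E), ("union", E, F), ("concat", E, F).
    """
    tag = e[0]
    if tag == "empty":                        # B1: L(∅) = ∅
        return set()
    if tag == "eps":                          # B1: L(ε) = {ε}
        return {""}
    if tag == "sym":                          # B2: L(a) = {a}
        return {e[1]} if n >= 1 else set()
    if tag == "union":                        # I2': L(E) ∪ L(F)
        return lang(e[1], n) | lang(e[2], n)
    if tag == "concat":                       # I3': L(E)L(F)
        return {u + v for u in lang(e[1], n) for v in lang(e[2], n)
                if len(u + v) <= n}
    if tag == "star":                         # I1': {ε} ∪ L(E) ∪ L(E)^2 ∪ ...
        base = lang(e[1], n)
        result = {""}
        frontier = {""}
        while frontier:                       # each round appends one more factor from L(E)
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError("unknown tag: %r" % (tag,))


# (0 + (1(1*))) over Σ = {0, 1}
e = ("union", ("sym", "0"), ("concat", ("sym", "1"), ("star", ("sym", "1"))))
print(sorted(lang(e, 3)))  # ['0', '1', '11', '111']
```

Truncating at length n is safe here because every string of length at most n in L(E)L(F) or (L(E))∗ is built from factors that are themselves no longer than n.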

What if an Expression is not Bracketed Properly?
- Improperly parenthesized expressions are ambiguous, e.g., is 0 + 11 the same as (0 + 1)1 or (0 + (11))?
- Improperly parenthesized regular expressions (generated by Option 1) must be converted to properly parenthesized expressions:
  - Remove unwanted parentheses by replacing ((E)) with (E) inductively.
  - Additionally, if E is a symbol or a constant, replace (E) with (E + ∅), e.g., ((0)) ≡ (0 + ∅), ((0 + 1)) ≡ (0 + 1)
- Apply precedence rules:
  - First: ∗ applies to the smallest (properly bracketable) expression preceding it, e.g., 01∗ ≡ (0(1∗))
  - Second: concatenation applies from left to right, e.g., 010 ≡ ((01)0)
  - Third: + applies from left to right, e.g., a + b + c ≡ ((a + b) + c)
Examples
- 0 + 11∗ ≡ (0 + (1(1∗)))
- ((0)) + 11∗ ≡ ((0 + ∅) + (1(1∗)))
- L(0 + 11∗) = L(((0)) + 11∗) = L(0) ∪ (L(1)(L(1))∗) = {0, 1, 11, 111, 1111, ...}
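
The three precedence rules can be sketched as a small recursive-descent parser that rewrites an Option 1 expression into fully parenthesized form. This is an illustration under stated assumptions: the function name parenthesize is hypothetical, symbols are single characters, the star is written as ASCII *, and redundant parentheses are simply dropped rather than padded with "+ ∅" (both conventions denote the same language).

```python
def parenthesize(s):
    """Fully parenthesize a regular expression written with +, * and juxtaposition."""
    s = s.replace(" ", "")
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def atom():
        # A single symbol, or a parenthesized sub-expression.
        nonlocal pos
        c = peek()
        if c is None or c in "+*)":
            raise ValueError("unexpected token at position %d" % pos)
        pos += 1
        if c == "(":
            inner = union()
            if peek() != ")":
                raise ValueError("missing ')' at position %d" % pos)
            pos += 1
            return inner
        return c

    def star():
        # First: * binds to the smallest expression preceding it.
        nonlocal pos
        e = atom()
        while peek() == "*":
            pos += 1
            e = "(" + e + "*)"
        return e

    def concat():
        # Second: concatenation groups from left to right.
        e = star()
        while peek() is not None and peek() not in "+)":
            e = "(" + e + star() + ")"
        return e

    def union():
        # Third: + groups from left to right and binds loosest.
        nonlocal pos
        e = concat()
        while peek() == "+":
            pos += 1
            e = "(" + e + "+" + concat() + ")"
        return e

    result = union()
    if pos != len(s):
        raise ValueError("unexpected token at position %d" % pos)
    return result


print(parenthesize("0 + 11*"))   # (0+(1(1*)))
print(parenthesize("010"))       # ((01)0)
print(parenthesize("a + b + c")) # ((a+b)+c)
```

The nesting of the helper functions mirrors the precedence order: union calls concat, which calls star, which calls atom, so the tightest-binding operator is handled deepest in the recursion.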

DFAs and Regular Languages

Regular Languages: Some Basic Properties

Theorem 3.2.1
Let w ∈ Σ∗. Then {w} is regular.

Proof of Theorem 3.2.1
- The languages {ε} and {a} for a ∈ Σ are regular (B1, B2). By a straightforward induction argument, for any k ∈ N and w = s_1 ··· s_k ∈ Σ^k, {w} = L(s_1 s_2 ··· s_k).

Theorem 3.2.2
Let L_1 and L_2 be regular languages. Then L_1∗, L_1 ∪ L_2 and L_1 L_2 are also regular.

Proof of Theorem 3.2.2
- Let L_i = L(E_i) for i = 1, 2. Then L_1∗ = L((E_1∗)), L_1 ∪ L_2 = L((E_1 + E_2)) and L_1 L_2 = L((E_1 E_2)). Since (E_1∗), (E_1 + E_2) and (E_1 E_2) are regular expressions, the claim holds.
- Corollary 1: The class of regular languages is closed under finite union and finite concatenation, i.e., if L_1, ..., L_k are regular languages for any k ∈ N, then L_1 ∪ ··· ∪ L_k and L_1 ··· L_k are also regular languages.
- Corollary 2: Any finite language is regular (∅ = L(∅), and a non-empty finite language is a finite union of singleton languages {w}, each regular by Theorem 3.2.1).
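
The argument behind Corollary 2 is constructive: each word is the concatenation of its symbols (Theorem 3.2.1), and the words are joined by + (Corollary 1). A minimal sketch follows; the function name finite_language_to_regex is a hypothetical choice, and the result is an Option 1 expression read with the usual precedence rules.

```python
def finite_language_to_regex(words):
    """Return a regular expression (as a string) whose language is the finite set `words`."""
    if not words:
        return "∅"                       # the empty language is L(∅)
    # Each word w = s_1...s_k is the expression s_1 s_2 ... s_k; the empty word is ε.
    return "+".join(w if w else "ε" for w in sorted(words))


print(finite_language_to_regex({"01", "", "1"}))  # ε+01+1
```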

DFAs and Regular Languages

Theorem 3.2.3
For every regular language M, there exists a DFA A such that M = L(A).

Proof of Theorem 3.2.3
- WLOG, let Σ = {0, 1}. Let M be a regular language. Then M = L(E) for some regular expression E.
- For each regular expression, we will devise an ε-NFA. Since every ε-NFA can be converted into an equivalent DFA, this suffices.
- Basis: automata for the expressions ∅, ε, 0 and 1.
  [Figure: four automata A, one for each basis expression ∅, ε, 0 and 1, drawn over states q_0, q_1 (and q_2) with transitions labelled 0 and 1.]

Proof of Theorem 3.2.3 (Cont'd)
- Induction I1': (E∗)
  [Figure: automaton for (E∗), built from the automaton for E with additional ε-transitions.]

Proof of Theorem 3.2.3 (Cont'd)
- Induction I2': (E + F)
  [Figure: automaton for (E + F), built from the automata for E and F with additional ε-transitions.]

Proof of Theorem 3.2.3 (Cont'd)
- Induction I3': (EF)
  [Figure: automaton for (EF), built by chaining the automata for E and F.]
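
The basis and the three induction cases can be sketched in code. The automaton shapes below follow the standard textbook construction (one start and one accepting state per sub-automaton, glued together with ε-transitions); they are an assumption and may differ in detail from the figures on these slides. The tuple encoding of expressions and the names build, eps_closure and accepts are likewise illustrative.

```python
import itertools


def build(e, fresh=None):
    """Return (start, accept, transitions) of an ε-NFA for the expression e.

    e is a nested tuple: ("empty",), ("eps",), ("sym", a), ("star", E),
    ("union", E, F) or ("concat", E, F). A transition is (state, label, state),
    where label None stands for ε.
    """
    if fresh is None:
        counter = itertools.count()
        fresh = lambda: next(counter)     # hands out unused state numbers
    tag = e[0]
    if tag in ("union", "concat"):
        s1, t1, d1 = build(e[1], fresh)
        s2, t2, d2 = build(e[2], fresh)
        if tag == "concat":               # I3': run E, then jump by ε into F
            return s1, t2, d1 + d2 + [(t1, None, s2)]
        s, t = fresh(), fresh()           # I2': branch by ε into E or F
        return s, t, d1 + d2 + [(s, None, s1), (s, None, s2),
                                (t1, None, t), (t2, None, t)]
    if tag == "star":                     # I1': skip E, or run it and loop back
        s1, t1, d1 = build(e[1], fresh)
        s, t = fresh(), fresh()
        return s, t, d1 + [(s, None, s1), (t1, None, s1),
                           (t1, None, t), (s, None, t)]
    s, t = fresh(), fresh()
    if tag == "empty":                    # basis: no path from s to t
        return s, t, []
    if tag == "eps":                      # basis: a single ε-transition
        return s, t, [(s, None, t)]
    if tag == "sym":                      # basis: a single transition on the symbol
        return s, t, [(s, e[1], t)]
    raise ValueError("unknown tag: %r" % (tag,))


def eps_closure(states, delta):
    """All states reachable from `states` using ε-transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for a, label, b in delta:
            if a == p and label is None and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def accepts(e, w):
    """Simulate the ε-NFA built for e on the input string w."""
    start, accept, delta = build(e)
    current = eps_closure({start}, delta)
    for c in w:
        moved = {b for a, label, b in delta if a in current and label == c}
        current = eps_closure(moved, delta)
    return accept in current


# (0 + (1(1*))) over Σ = {0, 1}
e = ("union", ("sym", "0"), ("concat", ("sym", "1"), ("star", ("sym", "1"))))
print(accepts(e, "111"), accepts(e, "01"))  # True False
```

Simulating the set of currently reachable states, as accepts does, is the on-the-fly version of converting the ε-NFA to a DFA.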

So Far...
Finite languages ⊆ Regular languages ⊆ Languages accepted by DFAs, NFAs, ε-NFAs
- Is the inclusion strict?
- Are there languages accepted by DFAs that are not regular?

Theorem 3.2.4
For every DFA A, there is a regular expression E such that L(A) = L(E).

Proof of Theorem 3.2.4
- Let a DFA A = (Q, Σ, δ, q_0, F) be given.
- Rename the states so that Q = {q_0, q_1, q_2, ..., q_{n−1}}.
- For any string s_1 ... s_k ∈ L(A), there is a path
    q_0 --s_1--> q_{i_1} --s_2--> q_{i_2} ··· --s_k--> q_{i_k} ∈ F
- Define R(i, j, k) to be the set of all input strings that move the internal state of A from q_i to q_j along paths whose intermediate nodes consist only of states q_ℓ with ℓ < k.
  [Figure: a path from q_i to q_j whose intermediate nodes lie among states q_0, ..., q_{k−1} and avoid states q_k, ..., q_{n−1}.]
- Idea: prove that (a) each R(i, j, k) is regular, and (b) L(A) is a union of R(i, j, k)'s.

Proof of Theorem 3.2.4 (Cont'd)
- Note that L(A) = ⋃_{j : q_j ∈ F} R(0, j, n), i.e., the strings whose paths start in q_0 and end in an accepting state, with intermediate nodes among q_0, q_1, ..., q_{n−1} (all nodes).
- By Corollary 1, L(A) is regular if each R(i, j, k) is regular. We now proceed by induction on k to show that each R(i, j, k) is regular.
- Basis: Consider R(i, j, 0) for i, j ∈ {0, 1, ..., n−1}.
  - R(i, j, 0) consists of strings whose corresponding paths start in q_i and end in q_j with intermediate nodes q_ℓ, ℓ < 0.
    ⇒ No intermediate nodes
    ⇒ R(i, j, 0) contains only strings that change state q_i to q_j directly
    ⇒ R(i, j, 0) ⊆ {ε} ∪ Σ
    ⇒ R(i, j, 0) is a regular language [Corollary 2]
- Induction: Let R(i, j, ℓ) be regular for i, j ∈ {0, ..., n−1} and 0 ≤ ℓ < k. Consider R(i, j, k) for i, j ∈ {0, ..., n−1}.
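
The basis sets can be read directly off the transition function. The sketch below assumes δ is stored as a Python dictionary mapping (state index, symbol) to a state index; the function name basis and the two-state example DFA (which tracks the parity of the number of 1s) are hypothetical and chosen only for illustration.

```python
def basis(delta, i, j):
    """R(i, j, 0): strings that take q_i to q_j with no intermediate states."""
    # Single symbols a with δ(q_i, a) = q_j.
    strings = {a for (p, a), q in delta.items() if p == i and q == j}
    if i == j:
        strings.add("")      # ε: the empty path leaves the DFA in q_i
    return strings


# Hypothetical DFA over Σ = {0, 1}: reading a 1 toggles between q_0 and q_1.
delta = {(0, "0"): 0, (0, "1"): 1,
         (1, "0"): 1, (1, "1"): 0}

print(sorted(basis(delta, 0, 0)))  # ['', '0']
print(sorted(basis(delta, 0, 1)))  # ['1']
```

Every set returned is a subset of {ε} ∪ Σ, hence finite, which is what the appeal to Corollary 2 requires.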

Proof of Theorem 3.2.4 (Cont'd)
- The strings in R(i, j, k) correspond to paths whose intermediate nodes belong to {q_0, ..., q_{k−1}}.
- Partition R(i, j, k) as follows:
  Case (a): Strings whose paths do not have q_{k−1} as an intermediate node.
  Case (b): Strings whose paths do pass through q_{k−1} as an intermediate node.
  [Figure: a case (b) path from q_i to q_j that passes through q_{k−1}, with its other intermediate nodes among states q_0, ..., q_{k−2}.]
- R(i, j, k) = {Case (a) strings} ∪ {Case (b) strings}.
- Case (a) strings are exactly those in R(i, j, k−1).
- Hence, R(i, j, k) = R(i, j, k−1) ∪ {Case (b) strings}.
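
The partition can be checked by brute force on a small example, restricted to strings up to a length bound. This is a sanity check under stated assumptions, not part of the proof: the names runs, R and case_b are hypothetical, δ is a dictionary from (state index, symbol) to state index, and the two-state DFA (tracking the parity of the number of 1s) is made up for the illustration.

```python
from itertools import product


def runs(delta, sigma, i, max_len):
    """Yield (w, set of intermediate states, final state) for every w with |w| <= max_len, starting in q_i."""
    for n in range(max_len + 1):
        for w in map("".join, product(sigma, repeat=n)):
            path = [i]
            for c in w:
                path.append(delta[(path[-1], c)])
            yield w, set(path[1:-1]), path[-1]


def R(delta, sigma, i, j, k, max_len):
    """Strings of length <= max_len in R(i, j, k): all intermediate states have index < k."""
    return {w for w, mid, end in runs(delta, sigma, i, max_len)
            if end == j and all(q < k for q in mid)}


def case_b(delta, sigma, i, j, k, max_len):
    """Strings of R(i, j, k) whose paths pass through q_{k-1} as an intermediate node."""
    return {w for w, mid, end in runs(delta, sigma, i, max_len)
            if end == j and all(q < k for q in mid) and (k - 1) in mid}


# Hypothetical DFA over Σ = {0, 1}: reading a 1 toggles between q_0 and q_1.
delta = {(0, "0"): 0, (0, "1"): 1,
         (1, "0"): 1, (1, "1"): 0}

for i, j, k in product(range(2), range(2), range(1, 3)):
    a = R(delta, "01", i, j, k - 1, 4)        # Case (a) strings
    b = case_b(delta, "01", i, j, k, 4)       # Case (b) strings
    assert R(delta, "01", i, j, k, 4) == a | b and not (a & b)

print(sorted(R(delta, "01", 0, 0, 1, 3)))  # ['', '0', '00', '000']
```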