
Text Search and Closure Properties
CSCI 3130 Formal Languages and Automata Theory
Siu On CHAN, Fall 2018
Chinese University of Hong Kong

Text Search

grep program
grep -E regex file.txt
Searches for an occurrence of patterns matching a regular expression
regex       language                  meaning
cat|12      { cat, 12 }               union
[abc]       { a, b, c }               shorthand for a|b|c
[ab][12]    { a1, a2, b1, b2 }        concatenation
(ab)*       { ε, ab, abab, ... }      star
[ab]?       { ε, a, b }               zero or one
(cat)+      { cat, catcat, ... }      one or more
[ab]{2}     { aa, ab, ba, bb }        n copies
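One way to check the "language" column is to enumerate short strings and keep the ones a pattern matches in full. The sketch below uses Python's `re` module instead of grep, which is an assumption made for easy experimentation (these particular patterns are written the same way in both). The helper name `language` is hypothetical.

```python
import re
from itertools import product


def language(pattern, alphabet, max_len):
    """List every string over `alphabet`, up to `max_len` symbols, that the
    whole pattern matches (fullmatch, unlike grep's substring search)."""
    matched = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            s = "".join(symbols)
            if re.fullmatch(pattern, s):
                matched.append(s)
    return matched


if __name__ == "__main__":
    for pattern, alphabet in [
        (r"cat|12", "cat12"),
        (r"[ab][12]", "ab12"),
        (r"(ab)*", "ab"),
        (r"[ab]?", "ab"),
        (r"[ab]{2}", "ab"),
    ]:
        print(pattern, language(pattern, alphabet, 4))
```

Infinite languages such as that of `(ab)*` are only listed up to the chosen length bound.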

Searching with grep
cd /usr/share/dict/
grep -E 'savou?r' words
Words containing savor or savour:
savor, savor's, savored, savorier, savories, savoriest, savoring, savors, savory, savory's, unsavory
grep -E '[abAB]{5}' words
Words with 5 consecutive a or b:
Babbage

More grep commands
.        any symbol
[a-d]    anything in a range
^        beginning of line
$        end of line
grep -E '^a.pl.$' words

How do you look for
Words that start in go and have another go
grep -E '^go.*go' words
Words with at least ten vowels?
grep -iE '([aeiouy].*){10}' words
Words without any vowels?
grep -iE '^[^aeiouy]*$' words
[^R] means “does not contain”: it matches any one symbol that is not listed in R
Words with exactly ten vowels?
grep -iE '^[^aeiouy]*([aeiouy][^aeiouy]*){10}$' words
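The three vowel patterns can be tried without a dictionary file. This is a minimal sketch that uses Python's `re` in place of grep (an assumption; `re.IGNORECASE` stands in for `-i` and `search` for grep's per-line substring search). The `grep` helper and the four-word `SAMPLE` list are hypothetical stand-ins for the real command and for /usr/share/dict/words.

```python
import re

# The three patterns from the slide; re.IGNORECASE plays the role of grep -i.
AT_LEAST_TEN = re.compile(r"([aeiouy].*){10}", re.IGNORECASE)
NO_VOWELS = re.compile(r"^[^aeiouy]*$", re.IGNORECASE)
EXACTLY_TEN = re.compile(r"^[^aeiouy]*([aeiouy][^aeiouy]*){10}$", re.IGNORECASE)


def grep(pattern, words):
    """Keep the words in which the pattern occurs somewhere."""
    return [w for w in words if pattern.search(w)]


# Hypothetical sample: 0, 1, 10 and 11 vowels respectively (y counts as a vowel).
SAMPLE = ["nth", "go", "institutionalization", "overintellectualization"]

if __name__ == "__main__":
    print("no vowels:   ", grep(NO_VOWELS, SAMPLE))
    print("at least ten:", grep(AT_LEAST_TEN, SAMPLE))
    print("exactly ten: ", grep(EXACTLY_TEN, SAMPLE))
```

The "exactly ten" pattern works because each of the ten groups consumes one vowel followed only by non-vowels, and the anchors leave no room for an eleventh.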

How grep (could) work
[Figure: regular expression → NFA → DFA, with the input text file fed to the DFA]
differences              in class               in grep
[ab]?, a+, (cat){3}      not allowed            allowed
input handling           matches whole input    looks for substring
output                   accept/reject          finds substring
Regular expressions are also supported in modern languages (C, Java, Python, etc.)

Implementation of grep
How do you handle expressions like
[ab]?        zero or one       R? → ε | R                  [ab]? → ()|[ab]
(cat)+       one or more       R+ → RR*                    (cat)+ → (cat)(cat)*
a{3}         n copies          R{n} → RR...R (n times)     a{3} → aaa
[^aeiouy]    not containing    ?
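The rewriting rules for `?`, `+` and `{n}` can be sketched as string-to-string functions and then compared against the original shorthand on all short inputs. This is an illustration under assumptions: Python's `re` stands in for grep, the function names are hypothetical, and extra parentheses are added around every R so that the rewrite stays correct for any sub-expression (the slide writes the simpler `aaa`).

```python
import re
from itertools import product


def optional(r):
    """R? -> ε | R, writing ε as the empty group ()."""
    return f"(()|({r}))"


def one_or_more(r):
    """R+ -> R R*."""
    return f"(({r})({r})*)"


def copies(r, n):
    """R{n} -> R R ... R (n times)."""
    return "".join(f"({r})" for _ in range(n))


def same_language(p, q, alphabet, max_len):
    """Compare two patterns on every string over `alphabet` up to max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            s = "".join(symbols)
            if bool(re.fullmatch(p, s)) != bool(re.fullmatch(q, s)):
                return False
    return True


if __name__ == "__main__":
    print(optional("[ab]"), same_language("[ab]?", optional("[ab]"), "ab", 4))
    print(one_or_more("cat"), same_language("(cat)+", one_or_more("cat"), "cat", 6))
    print(copies("a", 3), same_language("a{3}", copies("a", 3), "a", 6))
```

The comparison is a bounded check, not a proof of equivalence. The `[^aeiouy]` case is left open here, as it is on the slide.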

Closure properties

Example
The language L of strings that end in 101 is regular
(0+1)*101
How about the language L̄ of strings that do not end in 101?
Hint: a string does not end in 101 if and only if it ends in 000, 001, 010, 011, 100, 110 or 111, or has length 0, 1, or 2
So L̄ can be described by the regular expression
(0+1)*(000+001+010+011+100+110+111) + ε + (0+1) + (0+1)(0+1)
[Figure: state diagram with states labelled 000, 001, 010, 011, 100, 110, 111 and transitions on 0 and 1]
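The hint can be sanity-checked by brute force: every binary string up to some length should match exactly one of the two expressions. The sketch below transcribes both into Python `re` syntax, which is an assumption of this example (the slide's union `+` becomes `|`, and ε becomes the empty group `()`). The helper names are hypothetical.

```python
import re
from itertools import product

# The slide's "+" (union) is written "|" here, and ε as the empty group ().
ENDS_IN_101 = re.compile(r"(0|1)*101")
NOT_ENDS_IN_101 = re.compile(
    r"(0|1)*(000|001|010|011|100|110|111)|()|(0|1)|(0|1)(0|1)"
)


def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def is_complement(p, q, max_len):
    """True if every binary string up to max_len matches exactly one of p, q."""
    return all(
        bool(p.fullmatch(w)) != bool(q.fullmatch(w))
        for w in binary_strings(max_len)
    )


if __name__ == "__main__":
    print(is_complement(ENDS_IN_101, NOT_ENDS_IN_101, 10))
```

A bounded check like this can only catch mistakes in the expression; the argument that it is correct for all lengths is the case analysis in the hint.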

Complement
The complement L̄ of a language L contains those strings that are not in L
L̄ = { w ∈ Σ* | w ∉ L }
Examples (Σ = { 0, 1 })
L_1 = lang. of all strings that end in 101
L̄_1 = lang. of all strings that do not end in 101
    = lang. of all strings that end in 000, …, 111 (but not 101) or have length 0, 1, or 2
L_2 = lang. of 1* = { ε, 1, 11, 111, ... }
L̄_2 = lang. of all strings that contain at least one 0
    = lang. of the regular expression (0+1)*0(0+1)*

Example
The language L of strings that contain 101 is regular
(0+1)*101(0+1)*
How about the language L̄ of strings that do not contain 101?
You can write a regular expression, but it is a lot of work!

Closure under complement
If L is a regular language, so is L̄
To argue this, we can use any of the equivalent definitions of regular languages:
regular expression, NFA, DFA
The DFA definition will be the most convenient here
We assume L has a DFA, and show L̄ also has a DFA

Arguing closure under complement
Suppose L is regular, then it has a DFA M
M accepts L
Now consider the DFA M′ with the accepting and rejecting states of M reversed
M′ accepts strings not in L
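The construction amounts to one line once a DFA is written down as data. The sketch below is an illustration, not the course's code: the `DFA` tuple, the `accepts` and `complement` helpers, and the four-state machine for "ends in 101" are all assumptions chosen for this example (each state records how much of 101 has just been read).

```python
from typing import NamedTuple


class DFA(NamedTuple):
    states: frozenset
    delta: dict          # (state, symbol) -> state, defined for every pair
    start: str
    accepting: frozenset


def accepts(m, w):
    """Run the DFA on w and report whether it stops in an accepting state."""
    state = m.start
    for symbol in w:
        state = m.delta[(state, symbol)]
    return state in m.accepting


def complement(m):
    """Same DFA with accepting and rejecting states swapped."""
    return m._replace(accepting=m.states - m.accepting)


# Assumed example: q0 = nothing useful read, q1 = "1", q2 = "10", q3 = "101".
ENDS_IN_101 = DFA(
    states=frozenset({"q0", "q1", "q2", "q3"}),
    delta={
        ("q0", "0"): "q0", ("q0", "1"): "q1",
        ("q1", "0"): "q2", ("q1", "1"): "q1",
        ("q2", "0"): "q0", ("q2", "1"): "q3",
        ("q3", "0"): "q2", ("q3", "1"): "q1",
    },
    start="q0",
    accepting=frozenset({"q3"}),
)

if __name__ == "__main__":
    m_bar = complement(ENDS_IN_101)
    for w in ["", "101", "0101", "1010"]:
        print(repr(w), accepts(ENDS_IN_101, w), accepts(m_bar, w))
```

The swap is sound because a DFA ends in exactly one state on each input, so every string is accepted by exactly one of M and M′.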

Can we do the same with an NFA?
[Figure: NFA with states q_0, q_1, q_2 accepting (0+1)*10]
[Figure: the same NFA with accepting and rejecting states swapped, accepting (0+1)*]
Not the complement!
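The failure is easy to reproduce by simulating the NFA on sets of states. The transition table below is an assumed reconstruction of the slide's three-state NFA for (0+1)*10 (q0 loops on 0 and 1, moves to q1 on 1, and q1 moves to q2 on 0); the helper name `nfa_accepts` is hypothetical.

```python
def nfa_accepts(delta, start, accepting, w):
    """Track the set of states the NFA could be in after each symbol."""
    current = {start}
    for symbol in w:
        current = {t for s in current for t in delta.get((s, symbol), set())}
    return bool(current & accepting)


# Assumed reconstruction of the NFA for (0+1)*10.
STATES = {"q0", "q1", "q2"}
DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"},
}
ACCEPTING = {"q2"}
SWAPPED = STATES - ACCEPTING  # accepting and rejecting states exchanged

if __name__ == "__main__":
    for w in ["", "0", "10", "110", "101"]:
        print(repr(w),
              nfa_accepts(DELTA, "q0", ACCEPTING, w),
              nfa_accepts(DELTA, "q0", SWAPPED, w))
```

The string 10 is accepted by both machines: some run ends in q2 and another run stays in q0. An NFA accepts when at least one run accepts, so swapping the states does not negate the answer.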

Intersection
The intersection L ∩ L′ is the set of strings that are in both L and L′
Examples:
L           L′    L ∩ L′
(0+1)*11    1*    1*11
(0+1)*10    1*    ∅
If L and L′ are regular, is L ∩ L′ also regular?

Closure under intersection
If L and L′ are regular languages, so is L ∩ L′
To argue this, we can use any of the equivalent definitions of regular languages:
regular expression, NFA, DFA
Suppose L and L′ have DFAs, call them M and M′
Goal: construct a DFA (or NFA) for L ∩ L′

Example
L (even number of 0s)
[Figure: DFA M with states r_0, r_1]
L′ (odd number of 1s)
[Figure: DFA M′ with states s_0, s_1]
L ∩ L′ = lang. of even number of 0s and odd number of 1s
[Figure: DFA with states (r_0, s_0), (r_0, s_1), (r_1, s_0), (r_1, s_1)]

Closure under intersection
states
    M and M′: Q = { r_1, ..., r_n }, Q′ = { s_1, ..., s_m }
    DFA for L ∩ L′: Q × Q′ = { (r_1, s_1), (r_1, s_2), ..., (r_2, s_1), ..., (r_n, s_m) }
start states
    M and M′: r_i for M, s_j for M′
    DFA for L ∩ L′: (r_i, s_j)
accepting states
    M and M′: F for M, F′ for M′
    DFA for L ∩ L′: F × F′ = { (r_i, s_j) | r_i ∈ F, s_j ∈ F′ }
Whenever M is in state r_i and M′ is in state s_j, the DFA for L ∩ L′ will be in state (r_i, s_j)

Closure under intersection
transitions
    M and M′: r_i → r_j on a in M, s_k → s_ℓ on a in M′
    DFA for L ∩ L′: (r_i, s_k) → (r_j, s_ℓ) on a
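The product construction can be sketched directly from the states, start state, accepting states and transitions listed for it. In the code below a DFA is a plain tuple `(states, delta, start, accepting)`; that encoding, the function names, and the choice of accepting states for the two example machines (r0 for "even number of 0s", s1 for "odd number of 1s") are assumptions made for this illustration.

```python
from itertools import product


def product_dfa(m1, m2):
    """Build the DFA for the intersection; both inputs share one alphabet."""
    states1, delta1, start1, acc1 = m1
    states2, delta2, start2, acc2 = m2
    alphabet = {a for (_, a) in delta1}
    states = set(product(states1, states2))          # Q x Q'
    delta = {
        ((r, s), a): (delta1[(r, a)], delta2[(s, a)])  # move both machines
        for (r, s) in states
        for a in alphabet
    }
    accepting = {(r, s) for (r, s) in states if r in acc1 and s in acc2}  # F x F'
    return states, delta, (start1, start2), accepting


def accepts(m, w):
    """Run a DFA given as (states, delta, start, accepting) on w."""
    _, delta, state, accepting = m
    for a in w:
        state = delta[(state, a)]
    return state in accepting


EVEN_ZEROS = (
    {"r0", "r1"},
    {("r0", "0"): "r1", ("r1", "0"): "r0", ("r0", "1"): "r0", ("r1", "1"): "r1"},
    "r0",
    {"r0"},
)
ODD_ONES = (
    {"s0", "s1"},
    {("s0", "1"): "s1", ("s1", "1"): "s0", ("s0", "0"): "s0", ("s1", "0"): "s1"},
    "s0",
    {"s1"},
)
BOTH = product_dfa(EVEN_ZEROS, ODD_ONES)

if __name__ == "__main__":
    for w in ["1", "001", "0101", "01"]:
        print(w, accepts(BOTH, w))
```

Replacing the `and` in the accepting-state rule with `or` would give a DFA for the union instead, since the rest of the construction only tracks both machines in parallel.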

Reversal
The reversal w^R of a string w is w written backwards
w = dog
w^R = god
The reversal L^R of a language L is the language obtained by reversing all its strings
L = { dog, war, level }
L^R = { god, raw, level }

Reversal of regular languages
L = language of all strings that end in 01
L is regular and has regex
(0+1)*01
How about L^R?
This is the language of all strings beginning in 10
It is regular and represented by
10(0+1)*

Closure under reversal
If L is a regular language, so is L^R
How do we argue?
regular expression, NFA, DFA

Arguing closure under reversal
Take a regular expression E for L
We will find a regular expression E^R representing L^R
A regular expression can be of the following types:
• special symbols ∅ and ε
• alphabet symbols like a and b
• union, concatenation, or star of simpler expressions

Inductive proof of closure under reversal
Regular expression E    reversal E^R
∅                       ∅
ε                       ε
a                       a
E_1 + E_2               E_1^R + E_2^R
E_1 E_2                 E_2^R E_1^R
E_1*                    (E_1^R)*
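The inductive table translates almost line for line into a recursive function on expression trees. The sketch below is one possible encoding, not the course's: regular expressions are nested tuples, `reverse` follows the table, and `to_pattern` converts a tree to Python `re` syntax so the result can be tested. All names and the tuple encoding are assumptions of this example.

```python
import re

# Hypothetical encoding of regular expressions as nested tuples:
# ("empty",)  ("eps",)  ("sym", a)  ("union", e1, e2)  ("cat", e1, e2)  ("star", e1)


def reverse(e):
    """Return an expression for the reversal, one case per row of the table."""
    kind = e[0]
    if kind in ("empty", "eps", "sym"):
        return e
    if kind == "union":
        return ("union", reverse(e[1]), reverse(e[2]))
    if kind == "cat":
        return ("cat", reverse(e[2]), reverse(e[1]))  # note the swapped order
    if kind == "star":
        return ("star", reverse(e[1]))
    raise ValueError(f"unknown expression type: {kind}")


def to_pattern(e):
    """Translate an expression tree into Python re syntax."""
    kind = e[0]
    if kind == "empty":
        return "(?!)"  # a lookahead that always fails, so nothing matches
    if kind == "eps":
        return "()"
    if kind == "sym":
        return re.escape(e[1])
    if kind == "union":
        return f"(?:{to_pattern(e[1])}|{to_pattern(e[2])})"
    if kind == "cat":
        return f"(?:{to_pattern(e[1])}{to_pattern(e[2])})"
    if kind == "star":
        return f"(?:{to_pattern(e[1])})*"
    raise ValueError(f"unknown expression type: {kind}")


ANY = ("star", ("union", ("sym", "0"), ("sym", "1")))
ENDS_IN_01 = ("cat", ANY, ("cat", ("sym", "0"), ("sym", "1")))  # (0+1)*01

if __name__ == "__main__":
    print(to_pattern(ENDS_IN_01))
    print(to_pattern(reverse(ENDS_IN_01)))
```

For (0+1)*01 the function returns a tree for 10(0+1)*, the language of strings beginning in 10.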

Duplication?
L^DUP = { ww | w ∈ L }
Example:
L = { cat, dog }
L^DUP = { catcat, dogdog }
If L is regular, is L^DUP also regular?

Attempts
Let’s try regular expression
L^DUP ?= L²
L = { a, b }
L^DUP = { aa, bb }
LL = { aa, ab, ba, bb }
Let’s try NFA
[Figure: two copies of the NFA for L in sequence between states q_0 and q_1, linked by ε-transitions]
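For a finite language the two candidates can be computed directly, which shows why squaring the regular expression does not give duplication. A small sketch with Python sets; the function names are hypothetical.

```python
def duplication(language):
    """L^DUP: each string followed by a copy of itself."""
    return {w + w for w in language}


def concatenation(l1, l2):
    """L1 L2: any string of l1 followed by any string of l2."""
    return {u + v for u in l1 for v in l2}


L = {"a", "b"}

if __name__ == "__main__":
    print(sorted(duplication(L)))
    print(sorted(concatenation(L, L)))
```

Concatenation picks the two halves independently, so LL also contains ab and ba. Duplication requires the second half to equal the first, which is the part a regular expression or the ε-linked NFA has no way to enforce.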

An example
L = language of 0*1 (L is regular)
L = { 1, 01, 001, 0001, ... }
L^DUP = { 11, 0101, 001001, 00010001, ... } = { 0^n 1 0^n 1 | n ≥ 0 }
Let’s design an NFA for L^DUP

An example
L^DUP = { 11, 0101, 001001, 00010001, ... } = { 0^n 1 0^n 1 | n ≥ 0 }
[Figure: partial state diagram with states labelled 01, 001, 0001, …]
Seems to require infinitely many states!
Next lecture: will show that languages like L^DUP are not regular

Backreferences in grep
Advanced feature in grep and other “regular expression” libraries
grep -E '^(.*)\1$' words
the special expression \1 refers to the substring specified by (.*)
(.*)\1 looks for a repeated substring, e.g. mama
^(.*)\1$ accepts the language L^DUP
Standard “regular expression” libraries can accept irregular languages (as defined in this course)!
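Python's `re` module supports the same backreference syntax, so the pattern can be tried without grep. In this sketch `fullmatch` takes the place of the `^` and `$` anchors; the helper name `is_duplicate` is hypothetical.

```python
import re

# (.*) captures some string w, and \1 demands the same w again.
DUP = re.compile(r"(.*)\1")


def is_duplicate(word):
    """True if word has the form ww for some string w."""
    return DUP.fullmatch(word) is not None


if __name__ == "__main__":
    for word in ["mama", "dogdog", "dog", "00010001", "0010001"]:
        print(word, is_duplicate(word))
```

The matcher has to remember an arbitrarily long captured substring, which a fixed, finite set of DFA states cannot do. That is why such patterns fall outside the regular languages of this course.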
