


  1. BBM 202 - ALGORITHMS
 DEPT. OF COMPUTER ENGINEERING
 REGULAR EXPRESSIONS
 Acknowledgement: The course slides are adapted from the slides prepared by R. Sedgewick and K. Wayne of Princeton University.

  2. TODAY ‣ Regular Expressions ‣ REs and NFAs ‣ NFA simulation ‣ NFA construction ‣ Applications

  3. Pattern matching
 Substring search. Find a single string in text.
 Pattern matching. Find one of a specified set of strings in text.
 Ex. [genomics]
 • Fragile X syndrome is a common cause of mental retardation.
 • Human genome contains triplet repeats of CGG or AGG, bracketed by GCG at the beginning and CTG at the end.
 • Number of repeats is variable, and correlated with syndrome.
 pattern: GCG(CGG|AGG)*CTG
 text: GCGGCGTGTGTGCGAGAGAGTGGGTTTAAAGCTGGCGCGGAGGCGGCTGGCGCGGAGGCTG
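 The genomics example above can be tried with Java's built-in java.util.regex package. This is a minimal sketch, not the NFA-based matcher the slides go on to develop; the class name RepeatFinder and the method findAll are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the course code.
class RepeatFinder {
    // Returns every non-overlapping substring of text that matches regex,
    // scanning left to right.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String pattern = "GCG(CGG|AGG)*CTG";
        String text = "GCGGCGTGTGTGCGAGAGAGTGGGTTTAAAGCTGGCGCGGAGGCGGCTGGCGCGGAGGCTG";
        for (String hit : findAll(pattern, text)) {
            // The GCG and CTG brackets take 6 characters; each repeat takes 3.
            int repeats = (hit.length() - 6) / 3;
            System.out.println(hit + " (" + repeats + " repeats)");
        }
    }
}
```

 Counting the triplets inside each hit gives the number of repeats, which is the quantity the slide says is correlated with the syndrome.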

  4. Syntax highlighting
 input:

 /*************************************************************************
  *  Compilation:  javac NFA.java
  *  Execution:    java NFA regexp text
  *  Dependencies: Stack.java Bag.java Digraph.java DirectedDFS.java
  *
  *  % java NFA "(A*B|AC)D" AAAABD
  *  true
  *
  *  % java NFA "(A*B|AC)D" AAAAC
  *  false
  *
  *************************************************************************/

 public class NFA {
     private Digraph G;       // digraph of epsilon transitions
     private String regexp;   // regular expression
     private int M;           // number of characters in regular expression

     // Create the NFA for the given RE
     public NFA(String regexp) {
         this.regexp = regexp;
         M = regexp.length();
         Stack<Integer> ops = new Stack<Integer>();
         G = new Digraph(M+1);
         ⋮

 output formats: HTML, XHTML, LATEX, MediaWiki, ODF, TEXINFO, ANSI, DocBook
 input languages: Ada, Asm, Applescript, Awk, Bat, Bib, Bison, C/C++, C#, Cobol, Caml, Changelog, Css, D, Erlang, Flex, Fortran, GLSL, Haskell, Html, Java, Javalog, Javascript, Latex, Lisp, Lua, ⋮
 GNU source-highlight 3.1.4

  5. Google code search
 http://code.google.com/p/chromium/source/search

  6. Pattern matching: applications
 Test if a string matches some pattern.
 • Process natural language.
 • Scan for virus signatures.
 • Specify a programming language.
 • Access information in digital libraries.
 • Search genome using PROSITE patterns.
 • Filter text (spam, NetNanny, Carnivore, malware).
 • Validate data-entry fields (dates, email, URL, credit card).
 ...
 Parse text files.
 • Compile a Java program.
 • Crawl and index the Web.
 • Read in data stored in ad hoc input file format.
 • Create Java documentation from Javadoc comments.
 ...

  7. Regular expressions
 A regular expression is a notation to specify a set of strings (a “language”).

 operation       order   example RE    matches               does not match
 concatenation   3       AABAAB        AABAAB                every other string
 or              4       AA | BAAB     AA, BAAB              every other string
 closure         2       AB*A          AA, ABBBBBBBBA        AB, ABABA
 parentheses     1       A(A|B)AAB     AAAAB, ABAAB          every other string
                         (AB)*A        A, ABABABABABA        AA, ABBA
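 The rows of the table above can be checked mechanically. The sketch below assumes Java's java.util.regex as the matcher (not the NFA class from the slides) and uses Pattern.matches, which succeeds only when the whole string matches. The class name BasicOps and the method matchesWhole are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration; not part of the course code.
class BasicOps {
    // True only if the entire string s is in the language described by re.
    static boolean matchesWhole(String re, String s) {
        return Pattern.matches(re, s);
    }

    public static void main(String[] args) {
        // Each row: regular expression followed by strings to test against it.
        String[][] rows = {
            {"AABAAB",    "AABAAB", "AAB"},
            {"AA|BAAB",   "AA", "BAAB", "AAB"},
            {"AB*A",      "AA", "ABBBBBBBBA", "AB", "ABABA"},
            {"A(A|B)AAB", "AAAAB", "ABAAB", "ACAAB"},
            {"(AB)*A",    "A", "ABABABABABA", "AA", "ABBA"},
        };
        for (String[] row : rows) {
            for (int i = 1; i < row.length; i++) {
                System.out.println(row[0] + "  " + row[i] + "  "
                        + matchesWhole(row[0], row[i]));
            }
        }
    }
}
```

 The strings AAB and ACAAB are not in the table; they are added here as examples of the "every other string" entries.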

  8. Regular expression shortcuts
 Additional operations are often added for convenience.

 operation         example RE           matches                   does not match
 wildcard          .U.U.U.              CUMULUS, JUGULUM          SUCCUBUS, TUMULTUOUS
 character class   [A-Za-z][a-z]*       word, Capitalized         camelCase, 4illegal
 at least 1        A(BC)+DE             ABCDE, ABCBCDE            ADE, BCDE
 exactly k         [0-9]{5}-[0-9]{4}    08540-1321, 19072-5541    111111111, 166-54-111
 complement        [^AEIOU]{6}          RHYTHM                    DECADE

 Ex. [A-E]+ is shorthand for (A|B|C|D|E)(A|B|C|D|E)*
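 One way to convince yourself that a shortcut adds convenience rather than power is to compare it with its expansion on every short string. The sketch below does this by brute force with java.util.regex; the class ShorthandCheck, its methods, and the choice of test alphabet and length bound are all assumptions made for illustration. Agreement on short strings is evidence, not a proof of equivalence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration; not part of the course code.
class ShorthandCheck {
    // All strings over the given alphabet with length 0..maxLen.
    static List<String> allStrings(String alphabet, int maxLen) {
        List<String> out = new ArrayList<>();
        out.add("");
        int start = 0;                       // first index of the previous length
        for (int len = 1; len <= maxLen; len++) {
            int end = out.size();
            for (int i = start; i < end; i++) {
                for (char c : alphabet.toCharArray()) {
                    out.add(out.get(i) + c);
                }
            }
            start = end;
        }
        return out;
    }

    // True if re1 and re2 accept exactly the same strings among those generated.
    static boolean agree(String re1, String re2, String alphabet, int maxLen) {
        Pattern p1 = Pattern.compile(re1);
        Pattern p2 = Pattern.compile(re2);
        for (String s : allStrings(alphabet, maxLen)) {
            if (p1.matcher(s).matches() != p2.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // F is included so that some test strings fall outside [A-E].
        System.out.println(agree("[A-E]+", "(A|B|C|D|E)(A|B|C|D|E)*", "ABCDEF", 3));
    }
}
```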

  9. Regular expression examples
 RE notation is surprisingly expressive.

 regular expression                                         matches                                  does not match
 .*SPB.* (substring search)                                 RASPBERRY, CRISPBREAD                    SUBSPACE, SUBSPECIES
 [0-9]{3}-[0-9]{2}-[0-9]{4} (Social Security numbers)       166-11-4433, 166-45-1111                 11-55555555, 8675309
 [a-z]+@([a-z]+\.)+(edu|com) (email addresses)              wayne@princeton.edu, rs@princeton.edu    spam@nowhere
 [$_A-Za-z][$_A-Za-z0-9]* (Java identifiers)                ident3, PatternMatcher                   3a, ident#3

 REs play a well-understood role in the theory of computation.
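 The four patterns above can be wrapped into a small field validator. This is a sketch under stated assumptions: it uses java.util.regex, the class name FieldValidator and the kind labels are hypothetical, and the patterns are the simplified ones from the slide rather than complete validation rules for real email addresses or identifiers. Note that the backslash in the email pattern must be doubled inside a Java string literal.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration; not part of the course code.
class FieldValidator {
    // Patterns are compiled once and looked up by a label chosen for this sketch.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("substring", Pattern.compile(".*SPB.*"));
        PATTERNS.put("ssn", Pattern.compile("[0-9]{3}-[0-9]{2}-[0-9]{4}"));
        PATTERNS.put("email", Pattern.compile("[a-z]+@([a-z]+\\.)+(edu|com)"));
        PATTERNS.put("identifier", Pattern.compile("[$_A-Za-z][$_A-Za-z0-9]*"));
    }

    // True if the whole input matches the pattern registered under kind.
    static boolean isValid(String kind, String input) {
        Pattern p = PATTERNS.get(kind);
        if (p == null) {
            throw new IllegalArgumentException("unknown kind: " + kind);
        }
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("email", "wayne@princeton.edu"));
        System.out.println(isValid("email", "spam@nowhere"));
        System.out.println(isValid("identifier", "ident#3"));
    }
}
```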

  10. Can the average web surfer learn to use REs?
 Google. Supports * for full word wildcard and | for union.

  11. Regular expressions to the rescue
 http://xkcd.com/208
