Approximate Search of Regular Expressions Using Bit-Parallel - PowerPoint PPT Presentation

Approximate Search of Regular Expressions Using Bit-Parallel Algorithms
Kristo Tammeoja, Jaak Vilo
Teooriapäevad Rõuges, 2007

Contents
• Regular expression (RE) syntax
• Glushkov's automaton
• Existing bit-parallel algorithms
  • Exact matching
  • Approximate matching
• New feature added
  • Error-free regions

Regular expression
• Syntax
  • Grouping: ( )
  • Alternation: |
  • Quantifiers: *, +, ?, {m,n}, {m,}
  • Character classes (example: [a-z])
• Matching as used in this presentation (the whole string must match)
  • Regular expression A*
  • AAAAA: match
  • BAAAC: no match
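The matching convention above (the whole string has to match, not just a substring) can be reproduced with Python's standard `re` module. This is only an illustration of the convention; the helper name `matches` is made up for this sketch and is not part of the presentation.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """True if the whole string s matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

# The two examples from the slide.
print(matches(r"A*", "AAAAA"))   # whole string consists of A's
print(matches(r"A*", "BAAAC"))   # B and C are not covered by A*

# A quantifier and a character class from the syntax list.
print(matches(r"[a-z]{2,3}", "ab"))
print(matches(r"[a-z]{2,3}", "abcd"))
```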

Regular expression, 1 error allowed
Base expression: R(E|G)(EX)*
• 1:R(E|G)(EX)*   (1 error allowed anywhere)
• 1:R<E|G>(EX)*   (1 error allowed, but none inside <E|G>)
• 1:R(E|G)<EX>*   (1 error allowed, but none inside <EX>)

Exact matches of R(E|G)(EX)*: RE, RG, REEX, RGEX, REEXEX

Pattern          Exact string   Error    String with error   Result
1:R<E|G>(EX)*    RE             subst.   RR                  no match
1:R<E|G>(EX)*    RG             del.     R                   no match
1:R<E|G>(EX)*    REEX           del.     REX                 match
1:R(E|G)<EX>*    RGEX           ins.     REGEX               match
1:R(E|G)<EX>*    REEXEX         ins.     REEEXEX             match
1:R(E|G)<EX>*    REEXEX         ins.     REERXEX             no match
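For the unrestricted variant 1:R(E|G)(EX)* the meaning of "1 error allowed" can be checked by brute force: a string matches if some string at edit distance at most 1 from it matches the expression exactly. The sketch below is not the bit-parallel method of the presentation and does not handle the error-free regions; the names `neighbours` and `matches_with_one_error` are hypothetical.

```python
import re

PATTERN = re.compile(r"R(E|G)(EX)*")
# Only letters that occur in the expression can appear in an exact match,
# so it is enough to try these when inserting or substituting.
ALPHABET = "REGX"

def neighbours(s):
    """All strings at edit distance at most 1 from s (over ALPHABET)."""
    out = {s}
    for i in range(len(s) + 1):
        for a in ALPHABET:
            out.add(s[:i] + a + s[i:])           # insertion at position i
        if i < len(s):
            out.add(s[:i] + s[i + 1:])           # deletion of s[i]
            for a in ALPHABET:
                out.add(s[:i] + a + s[i + 1:])   # substitution of s[i]
    return out

def matches_with_one_error(s):
    """True if s is within one edit of a string matched exactly by PATTERN."""
    return any(PATTERN.fullmatch(w) for w in neighbours(s))

for s in ["RR", "R", "REX", "REGEX", "REEEXEX", "REERXEX", "RRR"]:
    print(s, matches_with_one_error(s))
```

With no error-free region every erroneous string from the slide is accepted, which is why the restricted variants give different answers for some of them.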

Glushkov's automaton
Example: R ( E | G ) ( E X ) *
• Each character in the RE = one state in the automaton, plus one state for the beginning of the RE
• Transitions show which characters/positions can follow each other
• All arrows entering a node are labeled by the same character; for example, after reading character 'E' only states with label 'E' can be active
[Figure: Glushkov automaton for R(E|G)(EX)* with states (start), R, E, G, E, X, built up step by step by tracing the prefixes R..., RE..., RG..., REE..., RGE..., RGEX..., RGEXE...]
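The automaton for R(E|G)(EX)* is small enough to write down by hand. The sketch below hard-codes its positions and follow sets (derived from the expression, not computed by a general construction) and simulates it with plain Python sets; the state numbering and the names `LABEL`, `FOLLOW`, `FINAL`, `step` and `accepts` are assumptions of this sketch.

```python
# State 0 is the extra initial state; states 1..5 are the character
# positions of R(E|G)(EX)* read left to right.
LABEL = {1: "R", 2: "E", 3: "G", 4: "E", 5: "X"}
FOLLOW = {0: {1}, 1: {2, 3}, 2: {4}, 3: {4}, 4: {5}, 5: {4}}
FINAL = {2, 3, 5}   # a match may end after E|G or after any X

def step(active, ch):
    """Move from the active states on character ch.

    Every transition into state t is labeled LABEL[t], so the result
    contains only states whose label is ch.
    """
    return {t for s in active for t in FOLLOW[s] if LABEL[t] == ch}

def accepts(word):
    active = {0}
    for ch in word:
        active = step(active, ch)
    return bool(active & FINAL)

print(accepts("RGEX"), accepts("REX"))
print(step({0, 1, 2, 3, 5}, "E"))   # only states labeled 'E' survive
```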

Exact search
• Simulation of an NFA = changing the active states based on the character read from the text
• We use bit-vectors (one bit for each state) to hold the active states
• δ(D, a)
  • D: bit-vector of active states
  • a: character read
  • Returns the new bit-vector
• 2^|D| · |Σ| different sets of parameters
  • |D|: number of states in the automaton
  • |Σ|: alphabet size

Exact search
• "After reading character 'E' only states with label 'E' can be active", so ...
• δ(D, a) = T[D] & B[a]
  • T[D]: states that can be reached from the states in D by any character
  • B[a]: states that can be reached by character a

Exact search
• δ(D, a) = T[D] & B[a]
Example: AA|AB|AC (states: initial, A, A, A, B, A, C; the leftmost bit is the initial state)

a      B[a]
'A'    0111010
'B'    0000100
'C'    0000001

D          T[D]
1000000    0101010
0100000    0010000
...
0101010    0010101
...

δ(0101010, 'A') = T[0101010] & B['A'] = 0010101 & 0111010 = 0010000
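The tables for AA|AB|AC can be rebuilt in a few lines, using Python integers as bit-vectors. This is a minimal sketch that fills T for all 2^7 state sets, which only works for tiny automata; the state numbering (0 = initial, then A, A, A, B, A, C) follows the bit patterns on the slide, while the names `LABELS`, `FOLLOW`, `bit`, `show` and `delta` are assumptions of the sketch.

```python
# Positions of AA|AB|AC; state 0 is the initial state.
LABELS = ["", "A", "A", "A", "B", "A", "C"]
FOLLOW = {0: [1, 3, 5], 1: [2], 2: [], 3: [4], 4: [], 5: [6], 6: []}
N = len(LABELS)

def bit(state):
    """Bit of a state; the leftmost of the N bits is state 0."""
    return 1 << (N - 1 - state)

def show(vector):
    return format(vector, f"0{N}b")

# B[a]: states that can be reached by character a.
B = {}
for s in range(1, N):
    B[LABELS[s]] = B.get(LABELS[s], 0) | bit(s)

# T[D]: states reachable from the states in D by any character.
T = [0] * (1 << N)
for D in range(1 << N):
    for s in range(N):
        if D & bit(s):
            for t in FOLLOW[s]:
                T[D] |= bit(t)

def delta(D, a):
    return T[D] & B.get(a, 0)

print(show(B["A"]), show(B["B"]), show(B["C"]))
print(show(T[0b1000000]), show(T[0b0100000]), show(T[0b0101010]))
print(show(delta(0b0101010, "A")))
```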

Exact search
D ← 100..00                      // initial state active
F ← bit-vector of final states
For pos ∈ 1 ... n Do             // scanning text
    D ← T[D] & B[t_pos]
    If D & F ≠ 000..00 Then match
End of For
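The loop above translates almost line by line into Python. The sketch below runs it on the automaton for R(E|G)(EX)* and reports the text positions (1-based) where a match ends. One assumption goes beyond the pseudocode: the initial state is switched back on after every character, so that a match may start anywhere in the text. The function name and its parameters are hypothetical.

```python
def exact_search(text, labels, follow, finals):
    """Bit-parallel NFA simulation; returns 1-based end positions of matches."""
    n = len(labels)

    def bit(state):
        return 1 << (n - 1 - state)      # leftmost bit = initial state 0

    B = {}                               # B[a]: states labeled a
    for s in range(1, n):
        B[labels[s]] = B.get(labels[s], 0) | bit(s)

    T = [0] * (1 << n)                   # T[D]: states reachable from D
    for D in range(1 << n):
        for s in range(n):
            if D & bit(s):
                for t in follow[s]:
                    T[D] |= bit(t)

    F = 0                                # bit-vector of final states
    for s in finals:
        F |= bit(s)

    init = bit(0)
    D = init
    ends = []
    for pos, ch in enumerate(text, start=1):
        # Assumption: "| init" keeps the initial state active for searching.
        D = (T[D] & B.get(ch, 0)) | init
        if D & F:
            ends.append(pos)
    return ends

# Glushkov automaton of R(E|G)(EX)*, written out by hand.
LABELS = ["", "R", "E", "G", "E", "X"]
FOLLOW = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: [5], 5: [4]}
FINALS = [2, 3, 5]

print(exact_search("XREEXEXG", LABELS, FOLLOW, FINALS))
```

Each text character costs one table lookup and one AND, independent of how many states are active, which is the point of the bit-parallel formulation.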

Approximate search
Errors
• Insertion
• Deletion
• Substitution

Approximate search
• When searching with k errors we make k+1 replicas of the automaton, one for each error level
• Plus we need transitions for errors
[Figure: two copies of the automaton for R(E|G)(EX)*, labeled "No errors" and "Up to 1 error"; the transitions between the two copies are still marked "?"]

Approximate search
• R0, R1: current bit-vectors
• R0', R1': bit-vectors after processing character c
R0' = T[R0] & B[c]
R1' = (T[R1] & B[c]) | R0 | T[R0'] | ...
• T[R1] & B[c] (no errors): same as in exact search
• R0 (del): active states remain the same
• T[R0'] (ins): insert a new character after the current one; just one step in the automaton
[Figure: the "No errors" and "Up to 1 error" copies of the automaton for R(E|G)(EX)*; Σ-labeled transitions between the copies for del, ε-labeled transitions between the copies for ins]
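A minimal sketch of the k-error search built from these terms follows. The formula on the slide ends in "...", so the sketch makes several assumptions that are not in the excerpt: the substitution term is taken to be T[R_{i-1}], as in the usual bit-parallel formulation of approximate NFA search; each error level initially also contains the states reachable by skipping characters; and the initial state is kept active at every level so that a match can start anywhere in the text. All names are hypothetical.

```python
def approx_search(text, labels, follow, finals, k):
    """Bit-parallel search with up to k errors; returns 1-based end positions."""
    n = len(labels)

    def bit(state):
        return 1 << (n - 1 - state)

    B = {}
    for s in range(1, n):
        B[labels[s]] = B.get(labels[s], 0) | bit(s)
    T = [0] * (1 << n)
    for D in range(1 << n):
        for s in range(n):
            if D & bit(s):
                for t in follow[s]:
                    T[D] |= bit(t)
    F = 0
    for s in finals:
        F |= bit(s)
    init = bit(0)

    # R[i]: active states using at most i errors (assumed initialisation).
    R = [init]
    for i in range(1, k + 1):
        R.append(R[i - 1] | T[R[i - 1]])

    ends = []
    for pos, ch in enumerate(text, start=1):
        mask = B.get(ch, 0)
        new = [(T[R[0]] & mask) | init]
        for i in range(1, k + 1):
            new.append((T[R[i]] & mask)   # no error, as in exact search
                       | R[i - 1]         # "del": active states stay the same
                       | T[new[i - 1]]    # "ins": one step without reading
                       | T[R[i - 1]]      # substitution (assumed term)
                       | init)
        R = new
        if R[k] & F:
            ends.append(pos)
    return ends

LABELS = ["", "R", "E", "G", "E", "X"]          # R(E|G)(EX)*
FOLLOW = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: [5], 5: [4]}
FINALS = [2, 3, 5]

print(approx_search("RR", LABELS, FOLLOW, FINALS, k=0))
print(approx_search("RR", LABELS, FOLLOW, FINALS, k=1))
```

With k = 0 the loop reduces to the exact search; with k = 1 the text "RR" already yields matches, because "R" and "RR" are each one edit away from "RE". Error-free regions are not handled in this sketch.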
