
Approximate Search of Regular Expressions Using Bit-Parallel Algorithms



  1. Approximate Search of Regular Expressions Using Bit-Parallel Algorithms
     Kristo Tammeoja, Jaak Vilo
     Teooriapäevad, Rõuge, 2007

  2. Contents
     • Regular expression (RE) syntax
     • Glushkov’s automaton
     • Existing bit-parallel algorithms
       • Exact matching
       • Approximate matching
     • New feature added
       • Error-free regions

  3-4. Regular expression
     • Syntax
       • Grouping: (, )
       • Alternation: |
       • Quantifiers: *, +, ?, {m,n}, {m,}
       • Character classes (for example [a-z])
     • Matching as used in this presentation
       • Regular expression: A*
       • AAAAA: match
       • BAAAC: no match
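The matching convention above (the whole string has to match, not just a part of it) can be illustrated with Python's standard `re` module. This is only a sketch of the convention; the helper name `matches_whole` is made up for this example and is not from the presentation.

```python
import re


def matches_whole(pattern: str, text: str) -> bool:
    """Return True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches_whole("A*", "AAAAA"))  # True
print(matches_whole("A*", "BAAAC"))  # False: B and C are not covered by A*
```

An unanchored search such as `re.search("A*", "BAAAC")` would succeed, because it accepts a match of any part of the text; that is the behaviour the slide rules out.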

  5-9. Regular expression, 1 error allowed
     • Regular expression: R(E|G)(EX)*
     • Patterns with 1 error allowed: 1:R(E|G)(EX)*, 1:R<E|G>(EX)*, 1:R(E|G)<EX>*
     • Strings that match exactly: RE, RG, REEX, RGEX, REEXEX
     • [Figure: variants of these strings with one substitution (subst.), deletion (del.) or insertion (ins.), for example REGEX, REEEXEX and REERXEX, each marked "match" or "no match"]

  10-22. Glushkov’s automaton
     • Example: R(E|G)(EX)*
     • Each character in the RE is a state in the automaton, plus one state for the beginning of the RE
     • Transitions show which characters/positions can follow each other
     • [Figure: the automaton for R(E|G)(EX)* built step by step, with an initial state and the states R, E, G, E, X, and with labeled transitions between them]

  23-25. Glushkov’s automaton
     • All transitions entering a node are labeled by the same character
     • For example, after reading character ‘E’ only states with label ‘E’ can be active
     • [Figure: the automaton for R(E|G)(EX)*]
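The construction described on these slides can be sketched with the usual first/last/follow sets. The code below is a minimal illustration and not the authors' implementation: the tuple-based syntax tree, the function name `glushkov` and the position numbering (0 for the initial state, 1..m for the characters from left to right) are assumptions made for this example.

```python
def glushkov(ast):
    """Build a Glushkov automaton from a tiny regex syntax tree.

    Nodes (hypothetical format): ("chr", c), ("cat", a, b),
    ("alt", a, b), ("star", a).
    Returns (labels, follow, final): labels maps a position to its
    character, follow maps a state to the states reachable in one step,
    final is the set of accepting states. State 0 is the initial state.
    """
    labels = {}
    follow = {}

    def walk(node):
        # Returns (first positions, last positions, matches empty string?)
        kind = node[0]
        if kind == "chr":
            pos = len(labels) + 1
            labels[pos] = node[1]
            follow[pos] = set()
            return {pos}, {pos}, False
        if kind == "star":
            first, last, _ = walk(node[1])
            for p in last:              # loop back for repetition
                follow[p] |= first
            return first, last, True
        f1, l1, n1 = walk(node[1])
        f2, l2, n2 = walk(node[2])
        if kind == "alt":
            return f1 | f2, l1 | l2, n1 or n2
        for p in l1:                    # kind == "cat"
            follow[p] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return first, last, n1 and n2

    first, last, nullable = walk(ast)
    follow[0] = set(first)
    final = set(last) | ({0} if nullable else set())
    return labels, follow, final


# R(E|G)(EX)* written as a syntax tree by hand
AST = ("cat",
       ("cat", ("chr", "R"), ("alt", ("chr", "E"), ("chr", "G"))),
       ("star", ("cat", ("chr", "E"), ("chr", "X"))))

labels, follow, final = glushkov(AST)
print(labels)  # {1: 'R', 2: 'E', 3: 'G', 4: 'E', 5: 'X'}
print(follow[0], follow[1], follow[5])  # {1} {2, 3} {4}
```

Every transition into state q is taken on the character `labels[q]`, which is the property stated on the slides: all arrows entering a node carry the same label.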

  26. Exact search
     • Simulation of an NFA = changing the active states based on the character read from the text
     • We use bit-vectors (one bit for each state) to hold the active states
     • δ(D, a)
       • D: bit-vector of active states
       • a: character read
       • Returns the new bit-vector
     • 2^|D| · |Σ| different sets of parameters
       • |D|: number of states in the automaton
       • |Σ|: size of the alphabet

  27. Exact search
     • “After reading character ‘E’ only states with label ‘E’ can be active”, so ...
     • δ(D, a) = T[D] & B[a]
       • T[D]: states that can be reached from states in D by any character
       • B[a]: states that can be reached by character a

  28-35. Exact search
     • δ(D, a) = T[D] & B[a]
     • Example: AA|AB|AC
     • [Figure: Glushkov automaton for AA|AB|AC]

       a     B[a]
       ‘A’   0111010
       ‘B’   0000100
       ‘C’   0000001

       D         T[D]
       1000000   0101010
       0100000   0010000
       ...
       0101010   0010101
       ...

     • δ(0101010, ‘A’) = T[0101010] & B[‘A’] = 0010101 & 0111010 = 0010000
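The table values for AA|AB|AC can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the state numbering (0 for the initial state, 1..6 for the characters from left to right), the hand-written `FOLLOW` sets and all helper names are choices made for this example, and state 0 is taken to be the leftmost bit so that the printed vectors read like the ones on the slides.

```python
N = 7  # initial state 0 plus one state per character of AA|AB|AC

# Assumed numbering: A1 A2 | A3 B4 | A5 C6
LABELS = {1: "A", 2: "A", 3: "A", 4: "B", 5: "A", 6: "C"}
FOLLOW = {0: {1, 3, 5}, 1: {2}, 2: set(), 3: {4}, 4: set(), 5: {6}, 6: set()}


def bit(state):
    """Bit mask of one state; state 0 is the leftmost (highest) bit."""
    return 1 << (N - 1 - state)


def show(mask):
    """Format a bit-vector the way the slides print it."""
    return format(mask, f"0{N}b")


# B[a]: states that can be reached by character a
B = {}
for state, ch in LABELS.items():
    B[ch] = B.get(ch, 0) | bit(state)

# T[D]: states reachable from the states in D by any character
T = [0] * (1 << N)
for d in range(1 << N):
    for state in range(N):
        if d & bit(state):
            for target in FOLLOW[state]:
                T[d] |= bit(target)


def delta(d, ch):
    """One step of the automaton: delta(D, a) = T[D] & B[a]."""
    return T[d] & B.get(ch, 0)


print(show(B["A"]), show(B["B"]), show(B["C"]))  # 0111010 0000100 0000001
print(show(T[0b1000000]))                        # 0101010
print(show(delta(0b0101010, "A")))               # 0010000
```

Note that T has one entry per subset of states (2^N entries), which is practical only for small automata such as this one.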

  36. Exact search
       D ← 100..00                        // initial state active
       F ← bit-vector of final states
       For pos ∈ 1 ... n Do               // scanning text
           D ← T[D] & B[t_pos]
           If D & F ≠ 000..00 Then match
       End of For
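The loop above translates almost line by line into Python. The sketch below runs it on the earlier example R(E|G)(EX)*; the state numbering, the hand-written `LABELS` and `FOLLOW` tables and the function names are assumptions for this example. As in the pseudocode, only the initial state is active at the start, so the function reports the positions at which a prefix of the text matches the RE.

```python
def build_tables(labels, follow, n):
    """Build T[D] and B[a] as integers; state s is bit n-1-s."""
    def bit(s):
        return 1 << (n - 1 - s)

    B = {}
    for s, ch in labels.items():
        B[ch] = B.get(ch, 0) | bit(s)
    T = [0] * (1 << n)
    for d in range(1 << n):
        for s in range(n):
            if d & bit(s):
                for q in follow[s]:
                    T[d] |= bit(q)
    return T, B


def exact_search(text, T, B, initial, final):
    """Return the 1-based positions where the automaton is in a final state."""
    D = initial                        # initial state active
    ends = []
    for pos, ch in enumerate(text, start=1):
        D = T[D] & B.get(ch, 0)        # unknown characters reach no state
        if D & final:
            ends.append(pos)
    return ends


# R(E|G)(EX)*: state 0 is initial, states 1..5 are R, E, G, E, X
N = 6
LABELS = {1: "R", 2: "E", 3: "G", 4: "E", 5: "X"}
FOLLOW = {0: {1}, 1: {2, 3}, 2: {4}, 3: {4}, 4: {5}, 5: {4}}
INITIAL = 0b100000
FINAL = 0b001101                       # states 2, 3 and 5
T, B = build_tables(LABELS, FOLLOW, N)

print(exact_search("REEXEX", T, B, INITIAL, FINAL))  # [2, 4, 6]
print(exact_search("RR", T, B, INITIAL, FINAL))      # []
```

For the whole-string matching used in the presentation, a text of length n matches exactly when n is the last reported position.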

  37. Approximate search
     Errors
     • Insertion
     • Deletion
     • Substitution

  38. Approximate search
     • When searching with k errors we make k+1 replicas of the automaton, one for each error level
     • Plus we need transitions for errors
     • [Figure: two copies of the automaton R E G E X, labeled "No errors" and "Up to 1 error"; the transitions between the copies are still marked "?"]

  39. Approximate search
     • R_0, R_1: current bit-vectors
     • R_0’, R_1’: bit-vectors after processing character c
       R_0’ = T[R_0] & B[c]
       R_1’ = ?

  40. Approximate search
       R_1’ = (T[R_1] & B[c]) | ...
     • T[R_1] & B[c] is the "no errors" term: same as in exact search
     • [Figure: the "No errors" and "Up to 1 error" copies of the automaton R E G E X]

  41. Approximate search
       R_1’ = (T[R_1] & B[c]) | R_0 | ...
     • R_0 is the "del" term: active states remain the same
     • [Figure: the two copies of the automaton, with Σ transitions added between them]

  42. Approximate search
       R_1’ = (T[R_1] & B[c]) | R_0 | T[R_0’] | ...
     • T[R_0’] is the "ins" term: insert new character after the current one
     • Just one step in automaton
     • [Figure: the two copies of the automaton, with ε and Σ transitions added between them]
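The recurrence built up on these slides can be sketched for a general number of errors k. The code below is an illustration, not the authors' implementation. The first three terms follow the slides; the slide formula ends in "...", so the fourth term `T[R[i-1]]` (substitution) and the initialisation of the higher error levels are assumptions taken from the standard bit-parallel scheme for approximate search. The tables for R(E|G)(EX)* are written by hand, and the function checks a whole-string match, as in the rest of the presentation.

```python
def build_tables(labels, follow, n):
    """Build T[D] and B[a] as integers; state s is bit n-1-s."""
    def bit(s):
        return 1 << (n - 1 - s)

    B = {}
    for s, ch in labels.items():
        B[ch] = B.get(ch, 0) | bit(s)
    T = [0] * (1 << n)
    for d in range(1 << n):
        for s in range(n):
            if d & bit(s):
                for q in follow[s]:
                    T[d] |= bit(q)
    return T, B


def approx_fullmatch(text, k, T, B, initial, final):
    """True if the whole text matches the RE with at most k errors."""
    # Assumption: level i starts with everything reachable in <= i steps,
    # i.e. up to i pattern characters may be skipped before reading text.
    R = [initial]
    for _ in range(k):
        R.append(R[-1] | T[R[-1]])

    for ch in text:
        new = [T[R[0]] & B.get(ch, 0)]
        for i in range(1, k + 1):
            new.append(
                (T[R[i]] & B.get(ch, 0))   # no error, as in exact search
                | R[i - 1]                 # slide 41: states stay the same
                | T[new[i - 1]]            # slide 42: one extra step
                | T[R[i - 1]]              # assumed: substitution
            )
        R = new
    return (R[k] & final) != 0


# R(E|G)(EX)*: state 0 is initial, states 1..5 are R, E, G, E, X
N = 6
LABELS = {1: "R", 2: "E", 3: "G", 4: "E", 5: "X"}
FOLLOW = {0: {1}, 1: {2, 3}, 2: {4}, 3: {4}, 4: {5}, 5: {4}}
INITIAL = 0b100000
FINAL = 0b001101                           # states 2, 3 and 5
T, B = build_tables(LABELS, FOLLOW, N)

print(approx_fullmatch("RR", 1, T, B, INITIAL, FINAL))   # True (RE, 1 subst.)
print(approx_fullmatch("RRR", 1, T, B, INITIAL, FINAL))  # False
```

This sketch allows errors anywhere in the pattern, so it corresponds to 1:R(E|G)(EX)*; the error-free regions announced in the contents are not modelled here.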
