Parsing: Episode I. Matthew Might, University of Utah, matt.might.net


  1. Parsing: Episode I Matthew Might University of Utah matt.might.net ucombinator.org

  2. Administrivia • Project 1: Use the source!

  3. Agenda • What is parsing? • Context-free languages • Context-free grammars • Recursive descent parsing • Properties of grammars

  4. What is parsing? A parser converts a token stream from the lexer into a parse tree.

  5. Example f x = x

  6. Example f x = x ID(f) ID(x) EQUAL ID(x)
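The lexing step in this example can be sketched in Python. This is a minimal illustration, not the course's lexer: the token names `ID` and `EQUAL` come from the slide, while the regular expressions, the `SKIP` class, and the `tokenize` function are assumptions.

```python
import re
from typing import List, Tuple

# Hypothetical token spec: ID and EQUAL follow the slide; the patterns are assumed.
TOKEN_SPEC = [("ID", r"[A-Za-z_]\w*"), ("EQUAL", r"="), ("SKIP", r"\s+")]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text: str) -> List[Tuple[str, str]]:
    """Return (token type, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("f x = x"))
```

The parser then consumes this token list rather than raw characters.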

  7. Example f x = x ID(f) ID(x) EQUAL ID(x)
     [Parse tree: Dec → FunDef; FunDef → ID(f) ArgList EQUAL Expr; ArgList → Arg → ID(x); Expr → Ref → ID(x)]

  8. Parsing methods • LALR( k ) • Nondet. rec. descent • LR( k ) • Predictive rec. descent • SLR( k ) • PEG/Packrat • LL( k ) • Combinators • Back-tracking search • Earley

  9. Context-free languages

  10. Context-free languages • Natural choice for describing syntax • Like regular expressions plus recursion

  11. Example • The language of balanced parentheses • This language is context-free • But it is not regular

  12. As formal language • Context-free languages are formal languages • Two operations allowed: catenation, union • Recursive equations are allowed as well

  13. Example
      $$L_B = \{\epsilon\} \cup (\{\texttt{(}\} \cdot L_B \cdot \{\texttt{)}\} \cdot L_B)$$

  14. Problem: Recursion! How do we assign meaning to recursive definitions?

  15. Fixed points!

  16. Fixed points If $x = f(x)$, then the point $x$ is a fixed point of the function $f$.

  17. Fixed points
      $$\operatorname{Fix}(f) = \{L : L = f(L)\}$$

  18. Algebra • $x = x^2 - 1$ is a recursive definition of $x$ • If $f(v) = v^2 - 1$, then $x = f(x)$. • Solutions are the fixed points of $f$.
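As a worked instance of the algebra above, the fixed points of $f(v) = v^2 - 1$ can be found by moving everything to one side and applying the quadratic formula:

```latex
\begin{align*}
x = x^2 - 1 &\iff x^2 - x - 1 = 0 \\
            &\iff x = \frac{1 \pm \sqrt{5}}{2}
\end{align*}
```

So this recursive definition has two solutions, which is why a recursive definition of a language also needs a rule for choosing among its fixed points.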

  19. [Plot: empty axes, horizontal axis $x$, vertical axis $f(x)$]

  20. [Plot: the curve $f(x) = x^2 - 1$]

  21. [Plot: the curve $f(x) = x^2 - 1$ together with the "fixed line" $f(x) = x$]

  22. Refactoring
      $$L_B = f(L_B) \qquad f(L) = \{\epsilon\} \cup (\{\texttt{(}\} \cdot L \cdot \{\texttt{)}\} \cdot L)$$

  23. Candidates
      $$L_B \in \operatorname{Fix}(f)$$

  24. Sensible choices
      $$\operatorname{lfp}(f) = \bigcap_{L \in \operatorname{Fix}(f)} L \qquad \operatorname{gfp}(f) = \bigcup_{L \in \operatorname{Fix}(f)} L$$

  25. Greatest fixed point • Includes infinitely long strings! • Example: ()()()()()()() ...

  26. Kleene’s theorem (specialized) If a function $f$ is continuous, then:
      $$\operatorname{lfp}(f) = \bigcup_{n \ge 1}^{\infty} f^n(\emptyset)$$

  27. Continuous The function $f$ is continuous only if:
      $$f\Bigl(\bigcup_i x_i\Bigr) = \bigcup_i f(x_i)$$

  28. Constructive observation
      $$\emptyset \subseteq f(\emptyset) \subseteq f^2(\emptyset) \subseteq f^3(\emptyset) \subseteq \cdots$$
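The chain above suggests a direct computation. The sketch below iterates $f$ for the balanced-parentheses language starting from the empty set. Because the real language is infinite, the code assumes a length cap `MAX_LEN` (not part of the slides) so that every set stays finite and the iteration terminates.

```python
from typing import FrozenSet

MAX_LEN = 6  # assumption: truncate strings so every set stays finite

def f(L: FrozenSet[str]) -> FrozenSet[str]:
    """f(L) = {ε} ∪ ({(} · L · {)} · L), restricted to strings of length <= MAX_LEN."""
    out = {""}
    for u in L:
        for v in L:
            s = "(" + u + ")" + v
            if len(s) <= MAX_LEN:
                out.add(s)
    return frozenset(out)

def lfp_bounded() -> FrozenSet[str]:
    """Iterate ∅, f(∅), f²(∅), ... until the truncated chain stops growing."""
    current: FrozenSet[str] = frozenset()
    while True:
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt

print(sorted(lfp_bounded(), key=lambda s: (len(s), s)))
```

Each pass adds the balanced strings that need one more level of unfolding, mirroring the inclusions $\emptyset \subseteq f(\emptyset) \subseteq f^2(\emptyset) \subseteq \cdots$.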

  29. Excursion

  30. In general
      In general, for a set of recursive equations over the languages $L_1, \ldots, L_n$, if
      $$\begin{aligned} L_1 &= f_1(L_1, \ldots, L_n) \\ L_2 &= f_2(L_1, \ldots, L_n) \\ &\;\;\vdots \\ L_n &= f_n(L_1, \ldots, L_n), \end{aligned}$$
      then these languages are a fixed point of the function $F : \mathcal{P}(A^*)^n \to \mathcal{P}(A^*)^n$:
      $$F(L_1, \ldots, L_n) = (f_1(L_1, \ldots, L_n),\; f_2(L_1, \ldots, L_n),\; \ldots,\; f_n(L_1, \ldots, L_n)),$$
      and by default, the least fixed point of this function:
      $$(L_1, \ldots, L_n) = \operatorname{lfp}(F).$$

  31. Context-free grammars

  32. Context-free grammars A context-free grammar is a quadruple $(A, N, R, n_0)$, where:
      • the set $A$ contains the terminal symbols of the language (its alphabet); and
      • the set $N$ contains the non-terminal symbols of the language; and
      • the set $R \subseteq N \times (A \cup N)^*$ contains the substitution rules, each rewriting a non-terminal to a string of terminals and non-terminals; and
      • the symbol $n_0 \in N$ is the top-level “start” symbol.

  33. Example
      $A = \{\texttt{(}, \texttt{)}\}$
      $N = \{B\}$
      $R \ni B \to \texttt{(}\,B\,\texttt{)}\,B$
      $R \ni B \to \epsilon$
      $n_0 = B$

  34. Recognizing strings
      $$\frac{w\,n\,w' \in L(A, N, R, n_0) \qquad (n \to s_1 \ldots s_n) \in R}{w\,s_1 \ldots s_n\,w' \in L(A, N, R, n_0)}$$

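  35. Example
      Given $B = n_0$, $(B \to \texttt{(}B\texttt{)}B) \in R$ and $(B \to \epsilon) \in R$:
      $B \in L(G_B)$, hence $\texttt{(}B\texttt{)}B \in L(G_B)$, hence $\texttt{()} \in L(G_B)$ (applying $B \to \epsilon$ to each remaining $B$).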
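The derivation above can be mechanized as a search over sentential forms. The following is a rough sketch rather than a practical recognizer: the dictionary encoding of the rules and the function name `derives` are assumptions, and the pruning step relies on the fact that rewriting never removes a terminal.

```python
from collections import deque
from typing import Dict, List

# Grammar G_B from the slides; "" stands for ε. The dict encoding is an assumption.
RULES: Dict[str, List[str]] = {"B": ["(B)B", ""]}
START = "B"

def derives(target: str) -> bool:
    """Search the sentential forms reachable from START by rewriting non-terminals."""
    seen = {START}
    queue = deque([START])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for i, sym in enumerate(form):
            for rhs in RULES.get(sym, []):
                new = form[:i] + rhs + form[i + 1:]
                terminals = sum(1 for c in new if c not in RULES)
                # Prune: terminals never disappear, so longer forms cannot reach target.
                if terminals <= len(target) and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return False

print(derives("()"), derives(")("))
```

Exhaustive search like this is exponential in general; it is only meant to make the inference rule concrete.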

  36. Parse trees • Convenient diagrammatic notation • Demonstrates membership in language • Simultaneously shows structure of string

  37. Example
      [Parse tree for (): root B with children (, B, ), B; each child B derives ǫ]

  38. Example: Regexes
      $A = \{\texttt{(}, \texttt{)}, \texttt{a}, \ldots, \texttt{z}, \texttt{|}, \texttt{*}\}$
      $N = \{E, T, F, K\}$
      $R \ni E \to T\,\texttt{|}\,E$
      $R \ni E \to T$
      $R \ni T \to F\,T$
      $R \ni T \to F$
      $R \ni F \to K\,\texttt{*}$
      $R \ni F \to K$
      $R \ni K \to \texttt{(}\,E\,\texttt{)}$
      $R \ni K \to a$, for every $a \in \{\texttt{a}, \ldots, \texttt{z}\}$
      $n_0 = E$

  39. Parse tree: (a|b)*
      [Parse tree under the grammar of slide 38: E → T → F → K *, where K → ( E ); inside the parentheses E → T | E, with T → F → K → a on the left and E → T → F → K → b on the right]

  40. Ambiguous grammars A grammar is ambiguous if there is at least one string that has two or more parse trees.

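  41. Example: Ambiguity
      $A = \{\texttt{(}, \texttt{)}, \texttt{+}, \texttt{*}\} \cup \mathbb{Z}$
      $N = \{E\}$
      $R \ni E \to E\,\texttt{+}\,E$
      $R \ni E \to E\,\texttt{*}\,E$
      $R \ni E \to z$, for every $z \in \mathbb{Z}$
      $n_0 = E$

  42. Example: 3 + 4 * 9
      [Two parse trees: one with + at the root, grouping as 3 + (4 * 9); the other with * at the root, grouping as (3 + 4) * 9]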
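One way to check ambiguity for a particular string is to count its parse trees. The sketch below does this for the grammar E → E + E, E → E * E, E → z by trying every operator as the root of the tree. The function name `count_trees` and the pre-split token tuple are assumptions made for illustration.

```python
from functools import lru_cache
from typing import Tuple

def count_trees(tokens: Tuple[str, ...]) -> int:
    """Count parse trees of `tokens` under E -> E + E | E * E | z."""
    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of ways E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i].lstrip("-").isdigit() else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the root operator; multiply the counts of both sides.
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(tokens))

print(count_trees(("3", "+", "4", "*", "9")))
```

Any count above one means the string witnesses the grammar's ambiguity.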

  43. Left-recursion A grammar is left-recursive if a non-terminal symbol can derive a new string with itself in leftmost position.

  44. Example: Left-recursion
      S → S , x
      S → x

  45. Example: Factoring
      S → x , S
      S → x
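The practical difference between the two grammars shows up as soon as each rule is transcribed into a recursive function. In this sketch (function names are hypothetical, tokens are plain strings), the left-recursive version calls itself before consuming any input and never terminates, while the rewritten version consumes an `x` first.

```python
from typing import List

def parse_left(tokens: List[str], pos: int = 0) -> int:
    """Naive transcription of S -> S , x: recurses at the same position forever."""
    pos = parse_left(tokens, pos)  # no token is consumed before this call
    return pos + 2                 # never reached

def parse_right(tokens: List[str], pos: int = 0) -> int:
    """S -> x , S | x. Returns the position after S, or raises SyntaxError."""
    if pos >= len(tokens) or tokens[pos] != "x":
        raise SyntaxError(f"expected x at position {pos}")
    pos += 1
    if pos < len(tokens) and tokens[pos] == ",":
        return parse_right(tokens, pos + 1)
    return pos

def accepts(tokens: List[str]) -> bool:
    """True if the whole token list is an S under the rewritten grammar."""
    try:
        return parse_right(tokens) == len(tokens)
    except SyntaxError:
        return False

print(accepts(["x", ",", "x", ",", "x"]))
```

In Python the runaway recursion in `parse_left` surfaces as a `RecursionError`; both grammars describe the same comma-separated lists of `x`.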

  46. Exercise: Nondeterministic recursive descent

  47. Grammar
      X → ( X∗ )
      X → num
      X → sym
      X∗ → X X∗
      X∗ → ǫ
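One possible approach to the exercise is a backtracking recognizer in which each non-terminal becomes a generator that yields every position where it could end. This is a sketch under assumptions: tokens are plain strings, and `is_num` and `is_sym` are hypothetical classifiers standing in for the lexer's token types.

```python
from typing import Iterator, List

def is_num(tok: str) -> bool:
    return tok.isdigit()

def is_sym(tok: str) -> bool:
    return tok not in ("(", ")") and not tok.isdigit()

def parse_x(toks: List[str], i: int) -> Iterator[int]:
    """Yield every position where an X starting at i can end."""
    if i < len(toks) and toks[i] == "(":      # X -> ( X* )
        for j in parse_xs(toks, i + 1):
            if j < len(toks) and toks[j] == ")":
                yield j + 1
    if i < len(toks) and is_num(toks[i]):     # X -> num
        yield i + 1
    if i < len(toks) and is_sym(toks[i]):     # X -> sym
        yield i + 1

def parse_xs(toks: List[str], i: int) -> Iterator[int]:
    """Yield every position where an X* starting at i can end."""
    for j in parse_x(toks, i):                # X* -> X X*
        yield from parse_xs(toks, j)
    yield i                                   # X* -> ε

def recognizes(toks: List[str]) -> bool:
    return any(j == len(toks) for j in parse_x(toks, 0))

print(recognizes(["(", "add", "1", "(", "mul", "2", "3", ")", ")"]))
```

Every alternative is tried, so no lookahead is needed, at the cost of potentially exploring many dead ends.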

  48. Exercise: Predictive recursive descent

  49. Lexer API • next() : Token • eat(t : TokenType) • peek(k : Int) : TokenType
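A predictive parser for the grammar X → ( X∗ ) | num | sym can be written against this API. The sketch below is one interpretation, not the course's reference solution: the `Lexer` class, the token-type strings (`LPAREN`, `RPAREN`, `NUM`, `SYM`, `EOF`), and the convention that `peek(0)` is the next token are all assumptions.

```python
from typing import List, Tuple, Union

Token = Tuple[str, str]  # (token type, lexeme); this encoding is an assumption

class Lexer:
    def __init__(self, tokens: List[Token]):
        self.tokens = tokens
        self.pos = 0

    def peek(self, k: int = 0) -> str:
        """Type of the token k positions ahead (k = 0 is the next one); 'EOF' past the end."""
        i = self.pos + k
        return self.tokens[i][0] if i < len(self.tokens) else "EOF"

    def next(self) -> Token:
        """Consume and return the next token."""
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def eat(self, t: str) -> None:
        """Consume the next token, which must have type t."""
        if self.peek() != t:
            raise SyntaxError(f"expected {t}, found {self.peek()}")
        self.pos += 1

Tree = Union[str, list]

def parse_x(lx: Lexer) -> Tree:
    """X -> ( X* ) | num | sym, choosing the rule from one token of lookahead."""
    t = lx.peek()
    if t == "LPAREN":
        lx.eat("LPAREN")
        items = parse_xs(lx)
        lx.eat("RPAREN")
        return items
    if t in ("NUM", "SYM"):
        return lx.next()[1]
    raise SyntaxError(f"unexpected {t}")

def parse_xs(lx: Lexer) -> List[Tree]:
    """X* -> X X* | ε: keep going while the lookahead can start an X."""
    items = []
    while lx.peek() in ("LPAREN", "NUM", "SYM"):
        items.append(parse_x(lx))
    return items

lx = Lexer([("LPAREN", "("), ("SYM", "add"), ("NUM", "1"), ("NUM", "2"), ("RPAREN", ")")])
print(parse_x(lx))
```

Unlike the backtracking approach, each decision here is made once, using only `peek`.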

  50. CFG properties

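  51. Nullability The nullability function, $\delta : (A \cup N) \to \{\{\epsilon\}, \emptyset\}$, returns the set $\{\epsilon\}$ if the provided symbol can derive the empty string, and $\emptyset$ otherwise:
      $$\begin{aligned} \delta(a) &= \emptyset \\ \delta(n) &\supseteq \delta(s_1) \cdot \ldots \cdot \delta(s_n) && \text{if } (n \to s_1 \ldots s_n) \in R \\ \delta(n) &\supseteq \{\epsilon\} && \text{if } (n \to \epsilon) \in R \end{aligned}$$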
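The nullability constraints can be solved by iterating until nothing changes. In this sketch the grammar encoding (a dict from each non-terminal to a list of right-hand sides, with `[]` for ǫ) and the names `Grammar` and `nullable` are assumptions; the returned set holds exactly the non-terminals n with δ(n) = {ǫ}.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]  # non-terminal -> list of right-hand sides

def nullable(grammar: Grammar) -> Set[str]:
    """Return the non-terminals that can derive the empty string."""
    result: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for n, rules in grammar.items():
            if n in result:
                continue
            # all() over an empty right-hand side is True, which covers n -> ε.
            # Terminals are never added to result, so δ(a) = ∅ holds implicitly.
            if any(all(s in result for s in rhs) for rhs in rules):
                result.add(n)
                changed = True
    return result

SEXP: Grammar = {"X": [["(", "Xs", ")"], ["num"], ["sym"]], "Xs": [["X", "Xs"], []]}
print(nullable(SEXP))
```

Here `Xs` plays the role of the non-terminal X∗ from the earlier grammar.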

  53. Inclusion constraints
      $$\begin{aligned} X_1 &\supseteq f_1(X_1, \ldots, X_n) \\ &\;\;\vdots \\ X_n &\supseteq f_n(X_1, \ldots, X_n) \end{aligned}$$

  55. Solving inclusions
      X_i ← ∅ for all i
      changed ← true
      while (changed)
        changed ← false
        for each i
          X′_i ← f_i(X_1, ..., X_n)
          if (X_i ≠ X′_i)
            X_i ← X′_i
            changed ← true
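The loop above translates almost line for line into Python. This sketch assumes each f_i is monotone and that all sets are finite, so the iteration terminates; it also unions each new value into X_i rather than overwriting it, a small deviation that guarantees X_i never shrinks. The names `solve` and `SetFn` are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence

SetFn = Callable[[Sequence[FrozenSet[str]]], FrozenSet[str]]

def solve(fs: List[SetFn]) -> List[FrozenSet[str]]:
    """Least solution of X_i ⊇ f_i(X_1, ..., X_n) for monotone f_i over finite sets."""
    xs: List[FrozenSet[str]] = [frozenset() for _ in fs]
    changed = True
    while changed:
        changed = False
        for i, f in enumerate(fs):
            new = xs[i] | f(xs)  # union keeps X_i from shrinking
            if new != xs[i]:
                xs[i] = new
                changed = True
    return xs

# Example system: X_0 ⊇ {a} ∪ X_1 and X_1 ⊇ {b}.
constraints: List[SetFn] = [
    lambda xs: frozenset({"a"}) | xs[1],
    lambda xs: frozenset({"b"}),
]
print(solve(constraints))
```

Nullability, first sets, and follow sets are all instances of this pattern with different constraint functions.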

  56. First sets In context-free grammars, first sets are easily computed with subset-inclusion constraints; for every rule $(n \to s_1 \ldots s_m) \in R$:
      $$\mathit{first}(n) \supseteq \bigcup_{i \ge 1}^{m} \delta(s_1 \ldots s_{i-1}) \cdot \mathit{first}(s_i)$$
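The first-set constraint can be solved with the same iterate-until-stable loop. The sketch below computes nullability and first sets together; the grammar encoding is an assumption, and it also assumes that any symbol without rules is a terminal a with first(a) = {a}, a convention the slide leaves implicit.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]  # non-terminal -> list of right-hand sides

def first_sets(grammar: Grammar) -> Dict[str, Set[str]]:
    """first(n) for each non-terminal; symbols that are not keys count as terminals."""
    nullable: Set[str] = set()
    first: Dict[str, Set[str]] = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for n, rules in grammar.items():
            for rhs in rules:
                prefix_nullable = True  # are all symbols seen so far nullable?
                for s in rhs:
                    add = first[s] if s in grammar else {s}
                    if not add <= first[n]:
                        first[n] |= add
                        changed = True
                    if s not in nullable:
                        prefix_nullable = False
                        break  # δ(s_1 ... s_i) = ∅, so later symbols contribute nothing
                if prefix_nullable and n not in nullable:
                    nullable.add(n)
                    changed = True
    return first

SEXP: Grammar = {"X": [["(", "Xs", ")"], ["num"], ["sym"]], "Xs": [["X", "Xs"], []]}
print(first_sets(SEXP))
```

The early `break` is what implements the factor δ(s_1 … s_{i−1}): a symbol contributes only while everything before it can vanish.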

  58. Follow sets The function $\mathit{follow} : (A \cup N) \to \mathcal{P}(A)$; for every rule $n \to s_1 \ldots s_n$:
      $$\mathit{follow}(s_i) \supseteq \bigcup_{j \ge i}^{n-1} \delta(s_{i+1} \ldots s_j) \cdot \mathit{first}(s_{j+1}) \;\cup\; \delta(s_{i+1} \ldots s_n) \cdot \mathit{follow}(n)$$
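A sketch of the follow-set computation, again by iterating the constraints to a fixed point. Several choices here are assumptions rather than part of the slides: the grammar encoding, tracking follow sets for non-terminals only, and seeding the start symbol with an end-of-input marker `$`. The helper `nullable_and_first` is included only so the block runs on its own.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[List[str]]]  # non-terminal -> list of right-hand sides

def nullable_and_first(g: Grammar) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Helper: nullable non-terminals and first sets, computed together."""
    nullable: Set[str] = set()
    first: Dict[str, Set[str]] = {n: set() for n in g}
    changed = True
    while changed:
        changed = False
        for n, rules in g.items():
            for rhs in rules:
                all_nullable = True
                for s in rhs:
                    add = first[s] if s in g else {s}
                    if not add <= first[n]:
                        first[n] |= add
                        changed = True
                    if s not in nullable:
                        all_nullable = False
                        break
                if all_nullable and n not in nullable:
                    nullable.add(n)
                    changed = True
    return nullable, first

def follow_sets(g: Grammar, start: str, eof: str = "$") -> Dict[str, Set[str]]:
    """follow(n) for non-terminals; the eof marker on `start` is an assumption."""
    nullable, first = nullable_and_first(g)
    follow: Dict[str, Set[str]] = {n: set() for n in g}
    follow[start].add(eof)
    changed = True
    while changed:
        changed = False
        for n, rules in g.items():
            for rhs in rules:
                for i, s in enumerate(rhs):
                    if s not in g:
                        continue  # only non-terminals are tracked in this sketch
                    add: Set[str] = set()
                    rest_nullable = True
                    for t in rhs[i + 1:]:
                        add |= first[t] if t in g else {t}
                        if t not in nullable:
                            rest_nullable = False
                            break
                    if rest_nullable:
                        add |= follow[n]  # everything after s can vanish
                    if not add <= follow[s]:
                        follow[s] |= add
                        changed = True
    return follow

SEXP: Grammar = {"X": [["(", "Xs", ")"], ["num"], ["sym"]], "Xs": [["X", "Xs"], []]}
print(follow_sets(SEXP, "X"))
```

The inner loop over `rhs[i + 1:]` corresponds to the union over j, and the `rest_nullable` branch corresponds to the final δ(s_{i+1} … s_n) · follow(n) term.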

  60. CFL trivia • Are regular languages context-free? • Are CFLs closed under complement? • Is the intersection of CFLs context-free? • Does a CFG accept no strings? • Does a CFG accept a finite set? • Does a CFG accept every string? • Is one CFL a subset of another CFL?
