Parsing: Episode I Matthew Might University of Utah matt.might.net ucombinator.org
Administrivia • Project 1: Use the source!
Agenda • What is parsing? • Context-free languages • Context-free grammars • Recursive descent parsing • Properties of grammars
What is parsing? A parser converts a token stream from the lexer into a parse tree.
Example: the input f x = x lexes to the token stream ID(f) ID(x) EQUAL ID(x), which parses to the tree Dec(FunDef(ID(f), ArgList(Arg(ID(x))), EQUAL, Expr(Ref(ID(x))))).
Parsing methods • LALR(k) • LR(k) • SLR(k) • LL(k) • Back-tracking search • Nondeterministic recursive descent • Predictive recursive descent • PEG/Packrat • Combinators • Earley
Context-free languages
Context-free languages • Natural choice for describing syntax • Like regular expressions plus recursion
Example • The language of balanced parentheses • This language is context-free • But it is not regular
As formal language • Context-free languages are formal languages • Two operations allowed: catenation, union • Recursive equations are allowed as well
Example $L_B = \{\epsilon\} \cup (\{\texttt{(}\} \cdot L_B \cdot \{\texttt{)}\} \cdot L_B)$.
Problem: Recursion! How do we assign meaning to recursive definitions?
Fixed points!
Fixed points If x = f ( x ), then the point x is a fixed point of the function f .
Fixed points Fix ( f ) = { L : L = f ( L ) } .
Algebra • $x = x^2 - 1$ is a recursive definition of $x$ • If $f(v) = v^2 - 1$, then $x = f(x)$. • Solutions are the fixed points of $f$.
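For this particular equation the fixed points can be found directly with the quadratic formula. The short derivation below is added as a worked example:

```latex
\begin{align*}
x &= x^2 - 1 \\
0 &= x^2 - x - 1 \\
x &= \frac{1 \pm \sqrt{5}}{2} \approx 1.618 \text{ or } -0.618
\end{align*}
```

So $f$ has exactly two fixed points, and the recursive definition of $x$ alone does not say which one is meant.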
[Figure: plot of $f(x) = x^2 - 1$ with the "fixed line" overlaid; horizontal axis $x$, vertical axis $f(x)$.]
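Refactoring $L_B = f(L_B)$, where $f(L) = \{\epsilon\} \cup (\{\texttt{(}\} \cdot L \cdot \{\texttt{)}\} \cdot L)$.
Candidates $L_B \in \mathrm{Fix}(f)$
Sensible choices $\mathrm{lfp}(f) = \bigcap_{L \in \mathrm{Fix}(f)} L$ and $\mathrm{gfp}(f) = \bigcup_{L \in \mathrm{Fix}(f)} L$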
Greatest fixed point • Includes infinitely long strings! • Example: ()()()()()()() ...
Kleene's theorem (specialized) If a function $f$ is continuous, then: $\mathrm{lfp}(f) = \bigcup_{n \ge 1}^{\infty} f^n(\emptyset)$
Continuous The function $f$ is continuous if and only if: $f\left(\bigcup_i x_i\right) = \bigcup_i f(x_i)$
Constructive observation $\emptyset \subseteq f(\emptyset) \subseteq f^2(\emptyset) \subseteq f^3(\emptyset) \subseteq \cdots$
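The ascending chain above can be computed directly for the balanced-parentheses function $f$. The following is a minimal sketch, not part of the original slides: the names `f`, `lfp_bounded`, and `max_len` are illustrative, and strings are cut off at `max_len` characters (an assumption made here) so that the iteration reaches a fixed point after finitely many steps.

```python
def f(lang, max_len):
    """One application of f(L) = {ε} ∪ ({(} · L · {)} · L), keeping only
    strings of at most max_len characters (a truncation assumed here)."""
    result = {""}
    for u in lang:
        for v in lang:
            s = "(" + u + ")" + v
            if len(s) <= max_len:
                result.add(s)
    return result


def lfp_bounded(max_len):
    """Iterate ∅, f(∅), f(f(∅)), ... until the set stops growing."""
    current = set()
    while True:
        nxt = f(current, max_len)
        if nxt == current:
            return current
        current = nxt


balanced = lfp_bounded(6)
```

With `max_len = 6`, `balanced` holds every balanced string of up to six characters, such as `""`, `"()"`, and `"(())()"`, and nothing unbalanced.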
Excursion
In general For a set of recursive equations over the languages $L_1, \ldots, L_n$, if
$L_1 = f_1(L_1, \ldots, L_n)$
$L_2 = f_2(L_1, \ldots, L_n)$
$\vdots$
$L_n = f_n(L_1, \ldots, L_n)$,
then these languages are a fixed point of the function $F : \mathcal{P}(A^*)^n \to \mathcal{P}(A^*)^n$:
$F(L_1, \ldots, L_n) = (f_1(L_1, \ldots, L_n), f_2(L_1, \ldots, L_n), \ldots, f_n(L_1, \ldots, L_n))$,
and by default, the least fixed point of this function: $(L_1, \ldots, L_n) = \mathrm{lfp}(F)$.
Context-free grammars
Context-free grammars A context-free grammar is a quadruple $(A, N, R, n_0)$, where: • the set $A$ contains the terminal symbols of the language (its alphabet); and • the set $N$ contains the non-terminal symbols of the language; and • the set $R \subseteq N \times (A \cup N)^*$ contains the substitution rules, each rewriting a non-terminal into a string of terminals and non-terminals; and • the symbol $n_0 \in N$ is the top-level "start" symbol.
Example $A = \{\texttt{(}, \texttt{)}\}$; $N = \{B\}$; $R \ni B \to \texttt{(}\, B\, \texttt{)}\, B$; $R \ni B \to \epsilon$; $n_0 = B$.
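Recognizing strings $\dfrac{w\,n\,w' \in L(A, N, R, n_0) \qquad (n \to s_1 \ldots s_m) \in R}{w\,s_1 \ldots s_m\,w' \in L(A, N, R, n_0)}$
Example Since $B = n_0$, we have $B \in L(G_B)$. Since $(B \to \texttt{(}\, B\, \texttt{)}\, B) \in R$, we have $\texttt{(}\, B\, \texttt{)}\, B \in L(G_B)$. Since $(B \to \epsilon) \in R$, applying it to each remaining $B$ gives $\texttt{()} \in L(G_B)$.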
Parse trees • Convenient diagrammatic notation • Demonstrates membership in language • Simultaneously shows structure of string
Example Parse tree for (): B( "(", B(ε), ")", B(ε) )
Example: Regexes
$A = \{\texttt{(}, \texttt{)}, \texttt{a}, \ldots, \texttt{z}, \texttt{|}, \texttt{*}\}$
$N = \{E, T, F, K\}$
$R \ni E \to T\ \texttt{|}\ E$
$R \ni E \to T$
$R \ni T \to F\ T$
$R \ni T \to F$
$R \ni F \to K\ \texttt{*}$
$R \ni F \to K$
$R \ni K \to \texttt{(}\ E\ \texttt{)}$
$R \ni K \to a$, for every $a \in \{\texttt{a}, \ldots, \texttt{z}\}$
$n_0 = E$.
Parse tree: (a|b)* E( T( F( K( "(", E( T(F(K(a))), "|", E(T(F(K(b)))) ), ")" ), "*" ) ) )
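Ambiguous grammars A grammar is ambiguous if there is at least one string that has more than one parse tree.
Example: Ambiguity $A = \{\texttt{(}, \texttt{)}, \texttt{+}, \texttt{*}\} \cup \mathbb{Z}$; $N = \{E\}$; $R \ni E \to E\ \texttt{+}\ E$; $R \ni E \to E\ \texttt{*}\ E$; $R \ni E \to z$, for every $z \in \mathbb{Z}$; $n_0 = E$.
Example: 3 + 4 * 9 has two parse trees: E( E(3), "+", E( E(4), "*", E(9) ) ) and E( E( E(3), "+", E(4) ), "*", E(9) ).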
Left-recursion A grammar is left-recursive if a non-terminal symbol can derive a new string with itself in leftmost position.
Example: Left-recursion S → S , x S → x
Example: Factoring S → x , S S → x
Exercise: Nondeterministic recursive descent
Grammar $X \to \texttt{(}\ X^*\ \texttt{)}$; $X \to \mathit{num}$; $X \to \mathit{sym}$; $X^* \to X\ X^*$; $X^* \to \epsilon$.
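One way to approach the nondeterministic exercise for this grammar is to have each parsing function yield every position at which its non-terminal could end, trying all alternatives in turn. This is a hedged sketch rather than the course's reference solution: the token representation (a list of strings), the helpers `is_num` and `is_sym`, and the name `recognizes` are assumptions made for illustration.

```python
def is_num(tok):
    return tok.isdigit()


def is_sym(tok):
    return tok not in ("(", ")") and not tok.isdigit()


def parse_X(tokens, i):
    """Yield every index at which an X starting at index i could end."""
    # Alternative: X -> ( X* )
    if i < len(tokens) and tokens[i] == "(":
        for j in parse_Xs(tokens, i + 1):
            if j < len(tokens) and tokens[j] == ")":
                yield j + 1
    # Alternative: X -> num
    if i < len(tokens) and is_num(tokens[i]):
        yield i + 1
    # Alternative: X -> sym
    if i < len(tokens) and is_sym(tokens[i]):
        yield i + 1


def parse_Xs(tokens, i):
    """Yield every index at which an X* starting at index i could end."""
    # Alternative: X* -> X X*
    for j in parse_X(tokens, i):
        yield from parse_Xs(tokens, j)
    # Alternative: X* -> ε (consumes nothing)
    yield i


def recognizes(tokens):
    """True if some way of parsing an X consumes the whole input."""
    return any(j == len(tokens) for j in parse_X(tokens, 0))
```

For instance, `recognizes("( add 1 ( mul 2 3 ) )".split())` is true, while `recognizes(["(", "x"])` is false because no alternative finds the closing parenthesis.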
Exercise: Predictive recursive descent
Lexer API • next() : Token • eat(t : TokenType) • peek(k : Int) : TokenType
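A predictive parser for the grammar X → ( X* ) | num | sym, X* → X X* | ε can be written against these three operations by looking one token ahead. The sketch below is an assumption-laden illustration: the `Lexer` class, its whitespace-based tokenization, the token-type names (`LPAREN`, `RPAREN`, `NUM`, `SYM`, `EOF`), and the choice that `peek(0)` means "the very next token" are all hypothetical, since the slide only lists the method signatures.

```python
class Lexer:
    """Hypothetical lexer exposing next(), eat(t), and peek(k)."""

    def __init__(self, text):
        self.tokens = []
        for word in text.replace("(", " ( ").replace(")", " ) ").split():
            if word == "(":
                kind = "LPAREN"
            elif word == ")":
                kind = "RPAREN"
            elif word.isdecimal():
                kind = "NUM"
            else:
                kind = "SYM"
            self.tokens.append((kind, word))
        self.pos = 0

    def peek(self, k=0):
        """Type of the token k positions ahead, without consuming it."""
        i = self.pos + k
        return self.tokens[i][0] if i < len(self.tokens) else "EOF"

    def next(self):
        """Consume and return the next (type, text) token."""
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def eat(self, kind):
        """Consume the next token, insisting that it has the given type."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        return self.next()


FIRST_X = ("LPAREN", "NUM", "SYM")  # token types that can begin an X


def parse_X(lx):
    kind = lx.peek()
    if kind == "LPAREN":            # X -> ( X* )
        lx.eat("LPAREN")
        items = parse_Xs(lx)
        lx.eat("RPAREN")
        return items
    if kind == "NUM":               # X -> num
        return int(lx.next()[1])
    if kind == "SYM":               # X -> sym
        return lx.next()[1]
    raise SyntaxError(f"unexpected {kind}")


def parse_Xs(lx):
    if lx.peek() in FIRST_X:        # X* -> X X*
        head = parse_X(lx)
        return [head] + parse_Xs(lx)
    return []                       # X* -> ε


def parse(text):
    lx = Lexer(text)
    tree = parse_X(lx)
    if lx.peek() != "EOF":
        raise SyntaxError("trailing input")
    return tree
```

Unlike a backtracking parser, each function here commits to one rule after a single `peek`, which works because the alternatives for X and X* begin with distinct token types.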
CFG properties
Nullability The nullability function, $\delta : (A \cup N) \to \{\{\epsilon\}, \emptyset\}$, returns the set $\{\epsilon\}$ if the provided symbol can derive the empty string, and $\emptyset$ otherwise: $\delta(a) = \emptyset$; $\delta(n) \supseteq \delta(s_1) \cdot \ldots \cdot \delta(s_m)$ if $(n \to s_1 \ldots s_m) \in R$; $\delta(n) \supseteq \{\epsilon\}$ if $(n \to \epsilon) \in R$.
Inclusion constraints
$X_1 \supseteq f_1(X_1, \ldots, X_n)$
$\vdots$
$X_n \supseteq f_n(X_1, \ldots, X_n)$
Solving inclusions
X_i ← ∅ for all i
changed ← true
while (changed):
    changed ← false
    for each i:
        X'_i ← f_i(X_1, ..., X_n)
        if (X_i ≠ X'_i):
            X_i ← X'_i
            changed ← true
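The loop above translates almost line for line into Python. The sketch below is illustrative only: the names `solve`, `RULES`, `EPS`, and `delta_constraint` are assumptions, and each update takes the union with the old value (a small deviation from the pseudocode) so that every `X[name]` can only grow. As a usage example it solves the nullability constraints for two small grammars from these slides, with `"Xs"` standing in for the non-terminal X*.

```python
def solve(constraints):
    """Least solution of X_i ⊇ f_i(X_1, ..., X_n).

    constraints maps each variable name to a function that takes the
    current assignment (a dict of name -> frozenset) and returns a frozenset.
    """
    X = {name: frozenset() for name in constraints}
    changed = True
    while changed:
        changed = False
        for name, f in constraints.items():
            new = X[name] | f(X)
            if new != X[name]:
                X[name] = new
                changed = True
    return X


EPS = frozenset({""})  # the set {ε}

# B -> ( B ) B | ε      X -> ( Xs ) | num | sym      Xs -> X Xs | ε
RULES = {
    "B": [["(", "B", ")", "B"], []],
    "X": [["(", "Xs", ")"], ["num"], ["sym"]],
    "Xs": [["X", "Xs"], []],
}


def delta_constraint(n):
    """Right-hand side of the nullability constraints for non-terminal n."""
    def f(X):
        out = frozenset()
        for body in RULES[n]:
            # δ(s_1) · ... · δ(s_m) is {ε} exactly when every δ(s_i) is {ε};
            # terminals are not in X, so their δ counts as ∅.
            if all(s in X and X[s] for s in body):
                out |= EPS
        return out
    return f


delta = solve({n: delta_constraint(n) for n in RULES})
```

Here `delta["B"]` and `delta["Xs"]` come out as `{ε}`, while `delta["X"]` stays empty because every rule for X contains a terminal.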
First sets In context-free grammars, first sets are easily computed with subset-inclusion constraints; for every rule $(n \to s_1 \ldots s_m) \in R$: $\mathit{first}(n) \supseteq \bigcup_{i=1}^{m} \delta(s_1 \ldots s_{i-1}) \cdot \mathit{first}(s_i)$.
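The first-set constraints can be solved with the same iterate-until-unchanged loop used for other inclusion constraints. The sketch below is one possible rendering, under stated assumptions: the grammar is a dict from non-terminal to a list of rule bodies, `"Xs"` stands in for X*, terminals are plain strings, and the first set of a terminal is taken to be the terminal itself (standard, though not stated on the slide). The names `GRAMMAR`, `nullable_set`, and `first_sets` are illustrative.

```python
GRAMMAR = {
    "X": [["(", "Xs", ")"], ["num"], ["sym"]],
    "Xs": [["X", "Xs"], []],
}


def nullable_set(grammar):
    """Non-terminals n with δ(n) = {ε}."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for n, bodies in grammar.items():
            if n not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(n)
                changed = True
    return nullable


def first_sets(grammar):
    nullable = nullable_set(grammar)
    first = {n: set() for n in grammar}

    def first_of(sym):
        # Assumption: first(a) = {a} for a terminal a.
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:
        changed = False
        for n, bodies in grammar.items():
            for body in bodies:
                for sym in body:
                    new = first_of(sym) - first[n]
                    if new:
                        first[n] |= new
                        changed = True
                    # first(s_i) only counts while s_1 ... s_{i-1} are all nullable.
                    if sym not in nullable:
                        break
    return first


FIRST = first_sets(GRAMMAR)
```

For this grammar both `FIRST["X"]` and `FIRST["Xs"]` are the set of `"("`, `"num"`, and `"sym"`.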
Follow sets The function $\mathit{follow} : (A \cup N) \to \mathcal{P}(A)$ satisfies, for every rule $n \to s_1 \ldots s_m$ and every position $i$: $\mathit{follow}(s_i) \supseteq \bigcup_{j=i}^{m-1} \delta(s_{i+1} \ldots s_j) \cdot \mathit{first}(s_{j+1}) \;\cup\; \delta(s_{i+1} \ldots s_m) \cdot \mathit{follow}(n)$.
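The follow-set constraints fit the same fixed-point pattern once nullability and first sets are available. The following is a sketch under assumptions, not a definitive implementation: it computes follow sets for non-terminals only, uses no end-of-input marker (the slide does not mention one), writes X* as `"Xs"`, and treats the first set of a terminal as the terminal itself. All function names are illustrative.

```python
GRAMMAR = {
    "X": [["(", "Xs", ")"], ["num"], ["sym"]],
    "Xs": [["X", "Xs"], []],
}


def nullable_and_first(grammar):
    nullable, first = set(), {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for n, bodies in grammar.items():
            for body in bodies:
                all_nullable = True
                for sym in body:
                    new = first[sym] if sym in grammar else {sym}
                    if not new <= first[n]:
                        first[n] |= new
                        changed = True
                    if sym not in nullable:
                        all_nullable = False
                        break
                if all_nullable and n not in nullable:
                    nullable.add(n)
                    changed = True
    return nullable, first


def follow_sets(grammar):
    nullable, first = nullable_and_first(grammar)
    follow = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for n, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue  # only non-terminals are tracked here
                    new = set()
                    rest_nullable = True
                    for nxt in body[i + 1:]:
                        # first(s_{j+1}) counts while s_{i+1} ... s_j are nullable
                        new |= first[nxt] if nxt in grammar else {nxt}
                        if nxt not in nullable:
                            rest_nullable = False
                            break
                    if rest_nullable:
                        # everything after s_i can vanish, so follow(n) applies
                        new |= follow[n]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR)
```

Here `FOLLOW["Xs"]` contains only `")"`, and `FOLLOW["X"]` contains `"("`, `"num"`, `"sym"`, and `")"`: an X can be followed by the start of another X, or by the closing parenthesis once the remaining X* derives ε.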
CFL trivia • Are regular languages context-free? • Are CFLs closed under complement? • Is the intersection of CFLs context-free? • Does a CFG accept no strings? • Does a CFG accept a finite set? • Does a CFG accept every string? • Is one CFL a subset of another CFL?