1/31 LR(0) Parsers CSCI 3130 Formal Languages and Automata Theory Siu On CHAN Chinese University of Hong Kong Fall 2016
2/31 Parsing computer programs
if (n == 0) { return x; }
First phase of the javac compiler: lexical analysis
if ( ID == INT_LIT ) { return ID ; }
The alphabet of the Java CFG consists of tokens like
Σ = { if, return, (, ), {, }, ;, ==, ID, INT_LIT, ... }
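The lexical-analysis step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token table `TOKEN_SPEC`, the set `KEYWORDS`, and the function `tokenize` are hypothetical names covering just the tokens on this slide, not how javac is actually implemented.

```python
import re

# Hypothetical token table for illustration; a real Java lexer has many more token kinds.
KEYWORDS = {"if", "return"}
TOKEN_SPEC = [
    ("INT_LIT", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"==|[(){};]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Turn source text into a list of token names from the alphabet Σ."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue                 # whitespace carries no token
        if kind == "OP" or (kind == "ID" and text in KEYWORDS):
            tokens.append(text)      # keywords and punctuation are their own tokens
        else:
            tokens.append(kind)      # identifiers and literals collapse to ID / INT_LIT
    return tokens

print(" ".join(tokenize("if (n == 0) { return x; }")))
# prints: if ( ID == INT_LIT ) { return ID ; }
```

The parser never sees the names n and x or the value 0; it only sees the token kinds, which is why Σ is a finite alphabet.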
3/31 Parsing computer programs
Parse tree of a Java statement: if (n == 0) { return x; }
[Figure: parse tree whose leaves, left to right, are the tokens if ( ID == INT_LIT ) { return ID ; } and whose internal nodes are labelled Statement, ParExpression, Expression, ExpressionRest, Infixop, Primary, Identifier, Literal, Block, BlockStatements, and BlockStatement.]
4/31 CFG of the Java programming language
Identifier:
    IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
Literal:
    IntegerLiteral
    FloatingPointLiteral
    BooleanLiteral
    CharacterLiteral
    StringLiteral
    NullLiteral
Expression:
    LambdaExpression
    AssignmentExpression
AssignmentOperator: (one of)
    = *= /= %= += -= <<= >>= >>>= &= ^= |=
from http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html#52996
5/31 Parsing Java programs
Simple Java program: about 1000 tokens
class Point2d {
    /* The X and Y coordinates of the point--instance variables */
    private double x;
    private double y;
    private boolean debug; // A trick to help with debugging

    public Point2d (double px, double py) { // Constructor
        x = px;
        y = py;
        debug = false; // turn off debugging
    }

    public Point2d () { // Default constructor
        this (0.0, 0.0); // Invokes 2 parameter Point2D constructor
    }

    // Note that a this() invocation must be the BEGINNING of
    // statement body of constructor
    public Point2d (Point2d pt) { // Another constructor
        x = pt.getX();
        y = pt.getY();
    }
    ...
}
6/31 Parsing algorithms
How long would it take to parse this program?
    try all parse trees: 10^80 years
    CYK algorithm: hours
Can we parse faster?
CYK is the fastest known general-purpose parsing algorithm for CFGs
Luckily, some CFGs can be rewritten to allow for a faster parsing algorithm!
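For reference, here is a minimal CYK sketch in Python for the small parenthesis grammar S → SA | A, A → (S) | () used later in these slides. CYK needs Chomsky normal form, so the tables `BINARY` and `TERMINAL` encode a hand-converted grammar; that conversion and the helper variables L, R, T are assumptions made for this example, not part of the slides.

```python
# CNF version of S -> SA | A, A -> (S) | () (conversion done by hand for this sketch):
#   S -> S A | L R | L T      A -> L R | L T      T -> S R      L -> (      R -> )
BINARY = {
    ("S", "A"): {"S"},
    ("L", "R"): {"S", "A"},
    ("L", "T"): {"S", "A"},
    ("S", "R"): {"T"},
}
TERMINAL = {"(": {"L"}, ")": {"R"}}

def cyk(word, start="S"):
    """Return True if the CNF grammar above derives word."""
    n = len(word)
    if n == 0:
        return False  # this grammar does not derive the empty string
    # table[i][l] = set of variables deriving word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = set(TERMINAL.get(ch, ()))
    for length in range(2, n + 1):              # substring length
        for i in range(n - length + 1):         # substring start
            for split in range(1, length):      # where to cut the substring
                for left in table[i][split]:
                    for right in table[i + split][length - split]:
                        table[i][length] |= BINARY.get((left, right), set())
    return start in table[0][n]

print(cyk("()()"), cyk("(())"), cyk("(()"))
```

The three nested loops over length, start and split give the cubic dependence on input length, which is the cost the LR(0) approach is meant to avoid.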
7/31 Hierarchy of context-free grammars
LR(0) grammars ⊂ LR(1) grammars ⊂ LR(∞) grammars ⊂ context-free grammars
Java, Python, etc. have LR(1) grammars
We will describe the LR(0) parsing algorithm
A grammar is LR(0) if the LR(0) parser works correctly for it
8/31 LR(0) parser: overview
S → SA | A
A → (S) | ()
input: ()()
1 •()()
2 (•)()
3 ()•()
4 A•()
5 S•()
6 S(•)
7 S()•
8 SA•
9 S•
[Figure: from step 4 onwards, the partial parse tree built so far is drawn above the buffer.]
9/31 LR(0) parser: overview
S → SA | A
A → (S) | ()
input: ()()
3 ()•()  ⇒  4 A•()  ⇒  5 S•()
Features of the LR(0) parser:
◮ Greedily reduce the most recently completed rule into a variable
◮ Unique choice of reduction at any time
10/31 LR(0) parsing using a PDA
To speed up parsing, keep track of partially completed rules in a PDA P
In fact, the PDA will be a simple modification of an NFA N
The NFA accepts if a rule B → β has just been completed, and the PDA will then reduce β to B
… ⇒ 2 (•)() ⇒ 3 ()•() ✓ ⇒ 4 A•() ✓ ⇒ 5 S•() ⇒ …
✓: NFA N accepts
11/31 NFA acceptance condition
S → SA | A
A → (S) | ()
A rule B → β has just been completed if
Case 1: the input/buffer so far is exactly β
    Examples: 3 ()•() and 4 A•()
Case 2: or the buffer so far is αβ and there is another rule C → αBγ
    Example: 7 S()•
    This case can be chained
12/31 Designing NFA for Case 1
S → SA | A
A → (S) | ()
Design an NFA N′ to accept the right-hand side of some rule B → β
q0 -ε-> S → •SA -S-> S → S•A -A-> S → SA•
q0 -ε-> S → •A -A-> S → A•
q0 -ε-> A → •(S) -(-> A → (•S) -S-> A → (S•) -)-> A → (S)•
q0 -ε-> A → •() -(-> A → (•) -)-> A → ()•
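13/31 Designing NFA for Cases 1 & 2
S → SA | A
A → (S) | ()
Design an NFA N to accept αβ for some rules C → αBγ, B → β, and for longer chains
For every rule C → αBγ, B → β, add an ε-transition from C → α•Bγ to B → •β
q0 -ε-> S → •SA -S-> S → S•A -A-> S → SA•
q0 -ε-> S → •A -A-> S → A•
q0 -ε-> A → •(S) -(-> A → (•S) -S-> A → (S•) -)-> A → (S)•
q0 -ε-> A → •() -(-> A → (•) -)-> A → ()•
Added ε-transitions (drawn as blue arrows; all blue arrows are ε-transitions):
S → •SA -ε-> S → •SA, S → •A
S → •A -ε-> A → •(S), A → •()
S → S•A -ε-> A → •(S), A → •()
A → (•S) -ε-> S → •SA, S → •A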
14/31 Summary of the NFA
The NFA N will accept whenever a rule has just been completed
For every rule B → β, add the transition q0 -ε-> B → •β
For every rule B → αXβ (X may be terminal or variable), add the transition B → α•Xβ -X-> B → αX•β
Every completed rule B → β is accepting: B → β•
For every rule C → αBγ, B → β, add the transition C → α•Bγ -ε-> B → •β
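The four construction rules in this summary can be turned into code almost line by line. The following Python sketch is one possible encoding, not the lecture's own: an NFA state B → α•β is represented as a tuple (head, body, dot), and the names `build_nfa`, `eps_closure` and `completed_rules` are hypothetical.

```python
# Grammar from the slides: S -> SA | A, A -> (S) | ()
RULES = [("S", "SA"), ("S", "A"), ("A", "(S)"), ("A", "()")]
START = "q0"  # the NFA's start state

def build_nfa(rules):
    """Return (edges, accepting); edges maps (state, label) to a set of states, label "" is ε."""
    edges, accepting = {}, set()

    def add(src, label, dst):
        edges.setdefault((src, label), set()).add(dst)

    for head, body in rules:
        add(START, "", (head, body, 0))                  # q0 -ε-> B -> •β
        for dot, symbol in enumerate(body):
            item = (head, body, dot)
            add(item, symbol, (head, body, dot + 1))     # B -> α•Xβ -X-> B -> αX•β
            for head2, body2 in rules:
                if head2 == symbol:
                    add(item, "", (head2, body2, 0))     # C -> α•Bγ -ε-> B -> •β
        accepting.add((head, body, len(body)))           # B -> β• is accepting
    return edges, accepting

def eps_closure(states, edges):
    """All states reachable from the given ones using ε-transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in edges.get((stack.pop(), ""), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def completed_rules(buffer, rules=RULES):
    """Simulate the NFA on the buffer; return the completed rules (head, body) it ends in."""
    edges, accepting = build_nfa(rules)
    current = eps_closure({START}, edges)
    for symbol in buffer:
        moved = set()
        for state in current:
            moved |= edges.get((state, symbol), set())
        current = eps_closure(moved, edges)
    return {(h, b) for (h, b, _) in current & accepting}

print(completed_rules("()"), completed_rules("S()"), completed_rules("S"))
```

On the buffers from the earlier slides, "()" and "S()" both end in the completed rule A → () (Case 1 and Case 2 respectively), while "S" ends in no completed rule.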
15/31 Equivalent DFA D for the NFA N
States of D (each is a set of NFA states):
{ S → •SA, S → •A, A → •(S), A → •() }
{ S → S•A, A → •(S), A → •() }
{ S → SA• }
{ S → A• }
{ A → (•S), A → (•), S → •SA, S → •A, A → •(S), A → •() }
{ A → (S•), S → S•A, A → •(S), A → •() }
{ A → (S)• }
{ A → ()• }
[Figure: transition diagram of D, with edges labelled S, A, ( and ).]
Dead state (empty set) not shown for clarity
Observation: every accepting state contains only one rule: a completed rule B → β•, and such rules appear only in accepting states
16/31 LR(0) grammars
A grammar G is LR(0) if its corresponding DFA D_G satisfies:
Every accepting state contains only one rule: a completed rule of the form B → β•, and completed rules appear only in accepting states
Shift state: no completed rule, e.g. { S → S•A, A → •(S), A → •() }
Reduce state: has (unique) completed rule, e.g. { A → (S)• }
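The criterion above can be checked mechanically. The sketch below, in Python, builds the reachable DFA states directly as sets of items (head, body, dot) and then tests that any state holding a completed rule holds nothing else. The function names `closure`, `build_dfa` and `is_lr0` are hypothetical, and the start state is taken to be the ε-closure of q0 as in the NFA of the earlier slides.

```python
RULES = [("S", "SA"), ("S", "A"), ("A", "(S)"), ("A", "()")]

def closure(items, rules):
    """Add B -> •β for every item whose dot stands before the variable B (the ε-moves)."""
    result, todo = set(items), list(items)
    while todo:
        _, body, dot = todo.pop()
        if dot < len(body):
            for head2, body2 in rules:
                item = (head2, body2, 0)
                if head2 == body[dot] and item not in result:
                    result.add(item)
                    todo.append(item)
    return frozenset(result)

def build_dfa(rules):
    """Subset construction restricted to reachable, non-empty states."""
    start = closure({(h, b, 0) for h, b in rules}, rules)
    states, delta, todo = {start}, {}, [start]
    while todo:
        state = todo.pop()
        symbols = {body[dot] for _, body, dot in state if dot < len(body)}
        for x in symbols:
            moved = {(h, b, d + 1) for h, b, d in state if d < len(b) and b[d] == x}
            target = closure(moved, rules)
            delta[state, x] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    return start, states, delta

def is_lr0(rules):
    """Slide criterion: a state holding a completed rule must hold nothing else."""
    _, states, _ = build_dfa(rules)
    for state in states:
        completed = [item for item in state if item[2] == len(item[1])]
        if completed and len(state) > 1:
            return False
    return True

_, states, _ = build_dfa(RULES)
print(len(states), is_lr0(RULES))
# prints: 8 True
```

For the parenthesis grammar this finds the eight non-dead states of D and confirms that each reduce state consists of a single completed rule.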
17/31 Simulating DFA D
Our parser P simulates state transitions in DFA D
(()•)  ⇒  (A•)
After reducing () to A, what is the new state?
Solution: keep track of previous states in a stack; go back to the correct state by looking at the stack
18/31 Let's label D's states
q1 = { S → •SA, S → •A, A → •(S), A → •() }
q2 = { S → S•A, A → •(S), A → •() }
q3 = { S → SA• }
q4 = { S → A• }
q5 = { A → (•S), A → (•), S → •SA, S → •A, A → •(S), A → •() }
q6 = { A → (S•), S → S•A, A → •(S), A → •() }
q7 = { A → (S)• }
q8 = { A → ()• }
Transitions:
q1 -S-> q2, q1 -A-> q4, q1 -(-> q5
q2 -A-> q3, q2 -(-> q5
q5 -S-> q6, q5 -A-> q4, q5 -(-> q5, q5 -)-> q8
q6 -A-> q3, q6 -(-> q5, q6 -)-> q7
19/31 LR(0) parser: a "PDA" P simulating DFA D
P's stack contains labels of D's states to remember progress of partially completed rules
At D's non-accepting state q_i:
1. P simulates D's transition upon reading terminal or variable X
2. P pushes current state label q_i onto its stack
At D's accepting state with completed rule B → X_1 ... X_k:
1. P pops k labels q_k, ..., q_1 from its stack
2. P constructs part of the parse tree (a node B with children X_1, X_2, ..., X_k)
3. P goes to state q_1 (the last label popped earlier) and pretends the next input symbol is B
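The shift and reduce steps above can be sketched as a short Python loop. Several choices here are assumptions of this sketch rather than statements from the slides: DFA states are computed on demand as item sets instead of being labelled q1 to q8, the B-transition from q_1 is taken immediately after a reduction (which has the same effect as pretending B is the next input symbol), the parser accepts when the whole input has been reduced to the start variable, and the grammar is assumed to be LR(0) with no ε-rules. The names `closure`, `step` and `parse` are hypothetical.

```python
RULES = [("S", "SA"), ("S", "A"), ("A", "(S)"), ("A", "()")]

def closure(items, rules):
    """Add B -> •β for every item whose dot stands before the variable B."""
    result, todo = set(items), list(items)
    while todo:
        _, body, dot = todo.pop()
        if dot < len(body):
            for head2, body2 in rules:
                item = (head2, body2, 0)
                if head2 == body[dot] and item not in result:
                    result.add(item)
                    todo.append(item)
    return frozenset(result)

def step(state, x, rules):
    """DFA transition on symbol x; the empty frozenset plays the role of the dead state."""
    return closure({(h, b, d + 1) for h, b, d in state if d < len(b) and b[d] == x}, rules)

def parse(word, rules=RULES, start_variable="S"):
    """Return a parse tree (variable, children) or None if the word is rejected."""
    state = closure({(h, b, 0) for h, b in rules}, rules)
    stack, trees, pos = [], [], 0       # previous DFA states, partial parse trees, input position
    while True:
        completed = [(h, b) for h, b, d in state if d == len(b)]
        if completed:                   # reduce state
            head, body = completed[0]   # unique by the LR(0) assumption
            k = len(body)
            origin = stack[-k]          # q_1, the last label popped
            del stack[-k:]
            node = (head, trees[-k:])   # new tree node B with children X_1 ... X_k
            del trees[-k:]
            trees.append(node)
            if not stack and head == start_variable and pos == len(word):
                return node             # whole input reduced to the start variable
            stack.append(origin)        # read B from q_1: push q_1, follow the B-transition
            state = step(origin, head, rules)
        elif pos < len(word):           # shift state with input left
            stack.append(state)
            trees.append(word[pos])
            state = step(state, word[pos], rules)
            pos += 1
        else:                           # shift state but input exhausted
            return None
        if not state:                   # dead state: no rule can continue
            return None

print(parse("()()"))
print(parse("(()"))
```

On the input ()() the stack contents evolve as $, $1, $15, $1, $1, $12, $125, $12, matching the state labels used in the slides if the item sets are numbered q1 to q8.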
20/31 Example
step  buffer  stack  state
1     •()()   $      q1
2     (•)()   $1     q5
3     ()•()   $15    q8
      •A()    $      q1
4     A•()    $1     q4
      •S()    $      q1
5     S•()    $1     q2
6     S(•)    $12    q5
[Figure: the partial parse tree built so far is drawn above each buffer.]
21/31 Example
step  buffer  stack  state
7     S()•    $125   q8
      S•A     $1     q2
8     SA•     $12    q3
      •S      $      q1
9     S•      $1     q2
The parser's output is the parse tree
[Figure: the partial parse tree built so far is drawn above each buffer.]
22/31 Another LR(0) grammar
L = { w#w^R | w ∈ {a, b}* }
C → aCa | bCb | #
NFA N:
q0 -ε-> C → •aCa -a-> C → a•Ca -C-> C → aC•a -a-> C → aCa•
q0 -ε-> C → •# -#-> C → #•
q0 -ε-> C → •bCb -b-> C → b•Cb -C-> C → bC•b -b-> C → bCb•
ε-transitions from each of C → a•Ca and C → b•Cb to each of C → •aCa, C → •bCb and C → •#
23/31 Another LR(0) grammar
C → aCa | bCb | #
DFA states:
1 = { C → •aCa, C → •bCb, C → •# }
2 = { C → #• }
3 = { C → a•Ca, C → •aCa, C → •bCb, C → •# }
4 = { C → b•Cb, C → •aCa, C → •bCb, C → •# }
5 = { C → aC•a }
6 = { C → bC•b }
7 = { C → aCa• }
8 = { C → bCb• }
Transitions:
1 -a-> 3, 1 -b-> 4, 1 -#-> 2
3 -a-> 3, 3 -b-> 4, 3 -#-> 2, 3 -C-> 5
4 -a-> 3, 4 -b-> 4, 4 -#-> 2, 4 -C-> 6
5 -a-> 7, 6 -b-> 8
input: ba#ab
stack   state  action
$       1      S
$1      4      S
$14     3      S
$143    2      R
$143    5      S
$1435   7      R
$14     6      S
$146    8      R
(S = shift, R = reduce)
24/31 Deterministic PDAs
PDA for LR(0) parsing is deterministic
Some CFLs require non-deterministic PDAs, such as L = { ww^R | w ∈ {a, b}* }
What goes wrong when we do LR(0) parsing on L?
25/31 Example 2
L = { ww^R | w ∈ {a, b}* }
C → aCa | bCb | ε
NFA N:
q0 -ε-> C → •aCa -a-> C → a•Ca -C-> C → aC•a -a-> C → aCa•
q0 -ε-> C → •
q0 -ε-> C → •bCb -b-> C → b•Cb -C-> C → bC•b -b-> C → bCb•
ε-transitions from each of C → a•Ca and C → b•Cb to each of C → •aCa, C → •bCb and C → •
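One way to see a problem with this grammar, assuming the criterion from the LR(0) grammars slide, is to look at the start state of the equivalent DFA, which is the ε-closure of q0. The Python sketch below lists its items; the tuple encoding (head, body, dot) and the helper `show` are hypothetical, and "" stands for ε.

```python
RULES = [("C", "aCa"), ("C", "bCb"), ("C", "")]   # "" stands for ε

def show(item):
    """Render an item (head, body, dot) in the slides' dot notation."""
    head, body, dot = item
    return f"{head} → {body[:dot]}•{body[dot:]}"

# DFA start state: the ε-closure of q0, which contains B → •β for every rule B → β
start_state = {(head, body, 0) for head, body in RULES}

completed = sorted(show(i) for i in start_state if i[2] == len(i[1]))
unfinished = sorted(show(i) for i in start_state if i[2] < len(i[1]))
has_conflict = bool(completed) and len(start_state) > 1

print(completed)     # ['C → •']
print(unfinished)    # ['C → •aCa', 'C → •bCb']
print(has_conflict)  # True
```

The start state holds the completed rule C → • together with two unfinished rules, so it is neither a pure shift state nor a pure reduce state: without looking ahead, the parser cannot tell whether it has reached the middle of ww^R.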