LR(0) Parsers: CSCI 3130 Formal Languages and Automata Theory, Siu On CHAN

1/31 LR(0) Parsers
CSCI 3130 Formal Languages and Automata Theory
Siu On CHAN, Chinese University of Hong Kong, Fall 2016

2/31 Parsing computer programs
Program text: if (n == 0) { return x; }
The first phase of the javac compiler, lexical analysis, turns it into a sequence of tokens: if ( ID == INT_LIT ) { return ID ; }
The alphabet of the Java CFG consists of tokens like Σ = { if, return, (, ), {, }, ;, ==, ID, INT_LIT, ... }
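As a rough illustration of lexical analysis, the sketch below maps the statement above to its token sequence. It is not javac's lexer: the token set (keywords `if` and `return`, `ID`, `INT_LIT`, `==` and a few punctuation marks) is an assumption covering only this one example, and `tokenize` is a hypothetical helper.

```python
import re

# Hypothetical token specification: only the tokens needed for this example.
# A real Java lexer recognizes many more token kinds.
KEYWORDS = {"if", "return"}
TOKEN_RE = re.compile(r"\s*(?:(==)|([A-Za-z_]\w*)|(\d+)|([(){};]))")


def tokenize(src):
    """Map source text to a list of terminal symbols of the CFG."""
    tokens, pos = [], 0
    src = src.rstrip()  # trailing whitespace would otherwise fail to match
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}: {src[pos]!r}")
        op, ident, num, punct = m.groups()
        if op:
            tokens.append(op)
        elif ident:
            # keywords are their own terminals; other identifiers become ID
            tokens.append(ident if ident in KEYWORDS else "ID")
        elif num:
            tokens.append("INT_LIT")
        else:
            tokens.append(punct)
        pos = m.end()
    return tokens


print(" ".join(tokenize("if (n == 0) { return x; }")))
```

The parser then only ever sees these terminals, never the original spelling of `n` or `0`.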

3/31 Parsing computer programs
[Figure: parse tree of the Java statement if (n == 0) { return x; }, whose leaves are the tokens if ( ID == INT_LIT ) { return ID ; } and whose internal nodes include Statement, ParExpression, Expression, ExpressionRest, Infixop, Primary, Identifier, Literal, Block, BlockStatements and BlockStatement.]

4/31 CFG of the Java programming language
Identifier:
    IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
Literal:
    IntegerLiteral
    FloatingPointLiteral
    BooleanLiteral
    CharacterLiteral
    StringLiteral
    NullLiteral
Expression:
    LambdaExpression
    AssignmentExpression
AssignmentOperator: (one of)
    = *= /= %= += -= <<= >>= >>>= &= ^= |=
from http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html#52996

5/31 Parsing Java programs
A simple Java program has about 1000 tokens:

    class Point2d {
      /* The X and Y coordinates of the point--instance variables */
      private double x;
      private double y;
      private boolean debug; // A trick to help with debugging

      public Point2d (double px, double py) { // Constructor
        x = px;
        y = py;
        debug = false; // turn off debugging
      }

      public Point2d () { // Default constructor
        this (0.0, 0.0); // Invokes 2 parameter Point2D constructor
      }
      // Note that a this() invocation must be the BEGINNING of
      // statement body of constructor

      public Point2d (Point2d pt) { // Another constructor
        x = pt.getX();
        y = pt.getY();
      }
      ...
    }

6/31 Parsing algorithms
How long would it take to parse this program?
Try all parse trees: ≥ 10^80 years
CYK algorithm: hours
Can we parse faster?
CYK is the fastest known general-purpose parsing algorithm for CFGs.
Luckily, some CFGs can be rewritten to allow for a faster parsing algorithm!
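To make the cost of CYK concrete, here is a minimal CYK recognizer for the small grammar S → SA | A, A → (S) | () used later in these slides. CYK needs Chomsky normal form, so the rules below are our own hand conversion (with hypothetical helper variables L, R and X), not something taken from the slides. The three nested loops over length, start and split point are what make CYK cubic in the input length.

```python
from itertools import product

# Our own Chomsky-normal-form version of S -> SA | A, A -> (S) | ():
#   S -> S A | L X | L R      (the unit rule S -> A is replaced by A's right-hand sides)
#   A -> L X | L R
#   X -> S R                  (hypothetical helper variable for "S )")
#   L -> (        R -> )
BINARY = {
    ("S", "A"): {"S"},
    ("L", "X"): {"S", "A"},
    ("L", "R"): {"S", "A"},
    ("S", "R"): {"X"},
}
TERMINAL = {"(": {"L"}, ")": {"R"}}


def cyk(w, start="S"):
    """Return True iff the string w is derivable from start."""
    n = len(w)
    if n == 0:
        return False  # this grammar has no ε-rule
    # table[i][j] = set of variables that derive the substring w[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, c in enumerate(w):
        table[i][i + 1] = set(TERMINAL.get(c, set()))
    for length in range(2, n + 1):          # substring length
        for i in range(n - length + 1):     # start position
            j = i + length
            for k in range(i + 1, j):       # split point: O(n^3) iterations overall
                for B, C in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((B, C), set())
    return start in table[0][n]


print(cyk("()()"), cyk("(()"))
```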

7/31 Hierarchy of context-free grammars
[Figure: nested classes of grammars, from innermost to outermost: LR(0) grammars ⊂ LR(1) grammars ⊂ LR(∞) grammars ⊂ context-free grammars.]
Java, Python, etc. have LR(1) grammars.
We will describe the LR(0) parsing algorithm.
A grammar is LR(0) if the LR(0) parser works correctly for it.

8/31 LR(0) parser: overview
S → SA | A
A → (S) | ()
Input: ()()
[Figure: the parser builds the parse tree bottom-up in nine snapshots; each shows the symbols built so far, then • and the unread input:
1 •()()   2 (•)()   3 ()•()   4 A•()   5 S•()   6 S(•)   7 S()•   8 SA•   9 S•]

9/31 LR(0) parser: overview
S → SA | A
A → (S) | ()
Input: ()()
[Figure: snapshots 3 ()•() ⇒ 4 A•() ⇒ 5 S•()]
Features of the LR(0) parser:
◮ Greedily reduce the most recently completed rule into a variable
◮ Unique choice of reduction at any time

10/31 LR(0) parsing using a PDA
To speed up parsing, keep track of partially completed rules in a PDA P.
In fact, the PDA will be a simple modification of an NFA N.
The NFA accepts if a rule B → β has just been completed, and then the PDA will reduce β to B.
[Figure: … ⇒ 2 (•)() ⇒ 3 ()•() ⇒ 4 A•() ⇒ 5 S•() ⇒ …, with ✓ marking the steps at which N accepts.]

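11/31 Example: NFA acceptance condition
S → SA | A
A → (S) | ()
A rule B → β has just been completed if
Case 1: the input/buffer so far is exactly β, e.g. 3 ()•() (rule A → ()) or 4 A•() (rule S → A);
Case 2: or the buffer so far is αβ and there is another rule C → αBγ, e.g. 7 S()• (rule A → (), with S → SA playing the role of C → αBγ).
This case can be chained: for example, the buffer (() completes A → () through the chain A → (S), S → A.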


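12/31 Designing NFA for Case 1
S → SA | A
A → (S) | ()
Design an NFA N′ to accept the right-hand side of some rule B → β.
From the start state q0 there is an ε-transition to the initial item of every rule; each item moves to the next by reading the symbol after the dot, and the completed items (dot at the end) are accepting:
q0 --ε--> S → •SA, S → •A, A → •(S), A → •()
S → •SA --S--> S → S•A --A--> S → SA•
S → •A --A--> S → A•
A → •(S) --(--> A → (•S) --S--> A → (S•) --)--> A → (S)•
A → •() --(--> A → (•) --)--> A → ()•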
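The NFA N′ above can be simulated directly by tracking the set of items it could be in. In this sketch (our own encoding, not from the slides) an item is a pair (rule index, dot position), and the hypothetical helper `n_prime_completed` returns the completed items, so a non-empty result means N′ accepts. It accepts exactly the right-hand sides of rules, which is why nested input such as (() is not yet handled.

```python
# Running example grammar: S -> SA | A, A -> (S) | ()
# An item of N' is a pair (rule index, dot position).
GRAMMAR = [("S", ("S", "A")), ("S", ("A",)), ("A", ("(", "S", ")")), ("A", ("(", ")"))]


def show(item):
    """Render an item such as (2, 1) as 'A -> (•S)'."""
    r, d = item
    lhs, rhs = GRAMMAR[r]
    return f"{lhs} -> {''.join(rhs[:d])}•{''.join(rhs[d:])}"


def n_prime_completed(buffer):
    """Simulate N' on a buffer of terminals/variables; return the completed items."""
    current = {(r, 0) for r in range(len(GRAMMAR))}  # q0 --ε--> B -> •β
    for X in buffer:
        # B -> α•Xβ --X--> B -> αX•β ; items that cannot read X die
        current = {(r, d + 1) for r, d in current
                   if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == X}
    return sorted(show(i) for i in current if i[1] == len(GRAMMAR[i[0]][1]))


print(n_prime_completed("()"), n_prime_completed("(()"))
```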


13/31 Designing NFA for Cases 1 & 2
S → SA | A
A → (S) | ()
Design an NFA N to accept αβ for some rules C → αBγ, B → β (and for longer chains).
For every rule C → αBγ, B → β, add an ε-transition C → α•Bγ --ε--> B → •β.
On top of N′, this adds the following ε-transitions (drawn in blue on the slide):
S → •SA --ε--> S → •SA, S → •A
S → S•A --ε--> A → •(S), A → •()
S → •A --ε--> A → •(S), A → •()
A → (•S) --ε--> S → •SA, S → •A

14/31 Summary of the NFA
The NFA N will accept whenever a rule has just been completed.
◮ For every rule B → β, add an ε-transition q0 --ε--> B → •β
◮ For every rule B → αXβ (X may be a terminal or a variable), add a transition B → α•Xβ --X--> B → αX•β
◮ For every rule C → αBγ and B → β, add an ε-transition C → α•Bγ --ε--> B → •β
◮ Every completed rule B → β• is accepting
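The four construction rules above can be turned into a small simulation of N on a buffer of terminals and variables. This is a sketch under our own encoding (items as (rule index, dot position) pairs; `eps_closure` and `n_accepts` are hypothetical helper names). Unlike N′, it accepts (() through the chain A → (S), S → A, A → (), and it accepts S() as in snapshot 7 of the running example.

```python
# Running example grammar: S -> SA | A, A -> (S) | ()
# An item is a pair (rule index, dot position).
GRAMMAR = [("S", ("S", "A")), ("S", ("A",)), ("A", ("(", "S", ")")), ("A", ("(", ")"))]
VARIABLES = {lhs for lhs, _ in GRAMMAR}


def eps_closure(items):
    """Follow every ε-transition C -> α•Bγ --ε--> B -> •β until nothing new appears."""
    closed, todo = set(items), list(items)
    while todo:
        r, d = todo.pop()
        rhs = GRAMMAR[r][1]
        if d < len(rhs) and rhs[d] in VARIABLES:
            for r2, (lhs2, _) in enumerate(GRAMMAR):
                if lhs2 == rhs[d] and (r2, 0) not in closed:
                    closed.add((r2, 0))
                    todo.append((r2, 0))
    return closed


def n_accepts(buffer):
    """Simulate N by tracking its set of live states.
    Accept iff some rule B -> β has just been completed."""
    # q0 --ε--> B -> •β for every rule, then close under the other ε-transitions
    current = eps_closure({(r, 0) for r in range(len(GRAMMAR))})
    for X in buffer:
        # B -> α•Xβ --X--> B -> αX•β
        moved = {(r, d + 1) for r, d in current
                 if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == X}
        current = eps_closure(moved)
    return any(d == len(GRAMMAR[r][1]) for r, d in current)


print(n_accepts("(()"), n_accepts("()("))
```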

15/31 Equivalent DFA D for the NFA N
[Figure: DFA D obtained from N by the subset construction. Each state is a set of items; for example, the start state is {S → •SA, S → •A, A → •(S), A → •()} and reading ( from it leads to {A → (•S), A → (•), S → •SA, S → •A, A → •(S), A → •()}. The dead state (empty set) is not shown for clarity.]
Observation: every accepting state contains only one rule, a completed rule B → β•, and such rules appear only in accepting states.
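The subset construction can be checked mechanically for the running example. In this sketch (our own encoding; `closure`, `goto` and `subset_construction` are hypothetical names) a DFA state is a frozenset of items, the dead state is skipped as on the slide, and the observation about accepting states is tested at the end. It finds eight non-dead states, four of which are accepting.

```python
# Running example grammar: S -> SA | A, A -> (S) | ()
# An item (state of N) is a pair (rule index, dot position).
GRAMMAR = [("S", ("S", "A")), ("S", ("A",)), ("A", ("(", "S", ")")), ("A", ("(", ")"))]
VARIABLES = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted({x for _, rhs in GRAMMAR for x in rhs})


def closure(items):
    """ε-closure in N: from C -> α•Bγ add every B -> •β."""
    closed, todo = set(items), list(items)
    while todo:
        r, d = todo.pop()
        rhs = GRAMMAR[r][1]
        if d < len(rhs) and rhs[d] in VARIABLES:
            for r2, (lhs2, _) in enumerate(GRAMMAR):
                if lhs2 == rhs[d] and (r2, 0) not in closed:
                    closed.add((r2, 0))
                    todo.append((r2, 0))
    return frozenset(closed)


def goto(state, X):
    """DFA transition: move the dot over X wherever possible, then take the ε-closure."""
    return closure({(r, d + 1) for r, d in state
                    if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == X})


def show(item):
    r, d = item
    lhs, rhs = GRAMMAR[r]
    return f"{lhs} -> {''.join(rhs[:d])}•{''.join(rhs[d:])}"


def subset_construction():
    """States of D reachable from the ε-closure of q0; the dead state is omitted."""
    start = closure({(r, 0) for r in range(len(GRAMMAR))})
    states, todo = [start], [start]
    while todo:
        state = todo.pop(0)
        for X in SYMBOLS:
            target = goto(state, X)
            if target and target not in states:
                states.append(target)
                todo.append(target)
    return states


states = subset_construction()
accepting = [s for s in states if any(d == len(GRAMMAR[r][1]) for r, d in s)]
# The slide's observation: each accepting state holds exactly one (completed) item
observation_holds = all(len(s) == 1 for s in accepting)

for s in states:
    print(", ".join(sorted(show(i) for i in s)))
```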

16/31 LR(0) grammars
A grammar G is LR(0) if its corresponding DFA D_G satisfies:
Every accepting state contains only one rule: a completed rule of the form B → β•, and completed rules appear only in accepting states.
Shift state: no completed rule, e.g. {S → S•A, A → •(S), A → •()}
Reduce state: has a (unique) completed rule, e.g. {A → (S)•}

17/31 Simulating DFA D
Our parser P simulates state transitions in DFA D.
(()•) ⇒ (A•)
After reducing () to A, what is the new state?
Solution: keep track of previous states in a stack, and go back to the correct state by looking at the stack.

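18/31 Let's label D's states
q1: S → •SA, S → •A, A → •(S), A → •()
q2: S → S•A, A → •(S), A → •()
q3: S → SA•
q4: S → A•
q5: A → (•S), A → (•), S → •SA, S → •A, A → •(S), A → •()
q6: A → (S•), S → S•A, A → •(S), A → •()
q7: A → (S)•
q8: A → ()•
Transitions (dead state omitted):
q1 --S--> q2, q1 --A--> q4, q1 --(--> q5
q2 --A--> q3, q2 --(--> q5
q5 --S--> q6, q5 --A--> q4, q5 --(--> q5, q5 --)--> q8
q6 --A--> q3, q6 --(--> q5, q6 --)--> q7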

19/31 LR(0) parser: a "PDA" P simulating DFA D
P's stack contains labels of D's states, to remember the progress of partially completed rules.
At D's non-accepting state q_i:
1. P simulates D's transition upon reading a terminal or variable X.
2. P pushes the current state label q_i onto its stack.
At D's accepting state with completed rule B → X_1 … X_k:
1. P pops k labels q_k, …, q_1 from its stack.
2. P constructs part of the parse tree: a node B with children X_1, …, X_k.
3. P goes to state q_1 (the last label popped) and pretends the next input symbol is B.
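The sketch below puts the construction and the parser P together for S → SA | A, A → (S) | (): it builds D by the subset construction and then follows the shift and reduce steps above. Assumptions that are ours, not the slides': states are numbered in discovery order rather than q1 to q8, parse-tree building is left out, and P accepts when a reduction to S empties the stack at the end of the input (the moment just before it would move to q2 with stack $1 in the example trace).

```python
# Running example grammar: S -> SA | A, A -> (S) | ()
GRAMMAR = [("S", ("S", "A")), ("S", ("A",)), ("A", ("(", "S", ")")), ("A", ("(", ")"))]
START = "S"
VARIABLES = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted({x for _, rhs in GRAMMAR for x in rhs})


def closure(items):
    """ε-closure in N: from C -> α•Bγ add every B -> •β."""
    closed, todo = set(items), list(items)
    while todo:
        r, d = todo.pop()
        rhs = GRAMMAR[r][1]
        if d < len(rhs) and rhs[d] in VARIABLES:
            for r2, (lhs2, _) in enumerate(GRAMMAR):
                if lhs2 == rhs[d] and (r2, 0) not in closed:
                    closed.add((r2, 0))
                    todo.append((r2, 0))
    return frozenset(closed)


def goto(state, X):
    """D's transition on X: move the dot over X, then take the ε-closure."""
    return closure({(r, d + 1) for r, d in state
                    if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == X})


def build_dfa():
    start = closure({(r, 0) for r in range(len(GRAMMAR))})
    states, delta, todo = [start], {}, [start]
    while todo:
        state = todo.pop(0)
        for X in SYMBOLS:
            target = goto(state, X)
            if target:  # skip the dead state
                if target not in states:
                    states.append(target)
                    todo.append(target)
                delta[(states.index(state), X)] = states.index(target)
    return states, delta


def parse(tokens):
    """LR(0) parser P: return True iff tokens derive START."""
    states, delta = build_dfa()
    stack, state, pos = [], 0, 0  # stack holds labels (indices) of D's states
    while True:
        done = [(r, d) for r, d in states[state] if d == len(GRAMMAR[r][1])]
        if done:  # reduce state: the LR(0) property makes the completed rule unique
            lhs, rhs = GRAMMAR[done[0][0]]
            if rhs:
                popped = stack[-len(rhs):]  # labels q_1, ..., q_k
                del stack[-len(rhs):]
                state = popped[0]           # go back to q_1, the last label popped
            if not stack and pos == len(tokens) and lhs == START:
                return True
            X = lhs  # pretend the next input symbol is lhs
        elif pos < len(tokens):
            X, pos = tokens[pos], pos + 1   # shift a terminal
        else:
            return False
        if (state, X) not in delta:
            return False
        stack.append(state)                 # push the current state label
        state = delta[(state, X)]


print(parse("()()"), parse("(()"))
```

Because every state of D has at most one completed rule and no shift items alongside it, each step has exactly one choice, so the loop never backtracks.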

20/31 Example
position   stack   state
•()()      $       q1
(•)()      $1      q5
()•()      $15     q8     reduce A → ()
•A()       $       q1
A•()       $1      q4     reduce S → A
•S()       $       q1
S•()       $1      q2
S(•)       $12     q5

21/31 Example (continued)
position   stack   state
S()•       $125    q8     reduce A → ()
S•A        $1      q2
SA•        $12     q3     reduce S → SA
•S         $       q1
S•         $1      q2
The parser's output is the parse tree.

22/31 Another LR(0) grammar
L = { w#w^R | w ∈ {a, b}* }
C → aCa | bCb | #
NFA N:
q0 --ε--> C → •aCa, C → •bCb, C → •#
C → •aCa --a--> C → a•Ca --C--> C → aC•a --a--> C → aCa•
C → •bCb --b--> C → b•Cb --C--> C → bC•b --b--> C → bCb•
C → •# --#--> C → #•
C → a•Ca --ε--> C → •aCa, C → •bCb, C → •#
C → b•Cb --ε--> C → •aCa, C → •bCb, C → •#

23/31 Another LR(0) grammar
C → aCa | bCb | #
Input: ba#ab
States of D:
q1: C → •aCa, C → •bCb, C → •#
q2: C → #•
q3: C → a•Ca, C → •aCa, C → •bCb, C → •#
q4: C → b•Cb, C → •aCa, C → •bCb, C → •#
q5: C → aC•a
q6: C → bC•b
q7: C → aCa•
q8: C → bCb•
position   stack   state   action
•ba#ab     $       q1      shift
b•a#ab     $1      q4      shift
ba•#ab     $14     q3      shift
ba#•ab     $143    q2      reduce C → #
baC•ab     $143    q5      shift
baCa•b     $1435   q7      reduce C → aCa
bC•b       $14     q6      shift
bCb•       $146    q8      reduce C → bCb
•C         $       q1      (whole input reduced to C)
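The trace above can be replayed with a table-driven version of P. Here the transition table `DELTA` and the reduce table `REDUCE` are written out by hand from the state labels q1 to q8 as we read them off the trace (an assumption), and the acceptance test (a reduction to C empties the stack at the end of the input) is also ours. Running it on ba#ab visits states 1, 4, 3, 2, 5, 7, 6, 8, matching the state column.

```python
# Hand-written tables for C -> aCa | bCb | #, with states labelled as in the trace:
# q1: C->•aCa, C->•bCb, C->•#     q2: C->#•
# q3: C->a•Ca + initial items     q4: C->b•Cb + initial items
# q5: C->aC•a   q6: C->bC•b   q7: C->aCa•   q8: C->bCb•
DELTA = {
    (1, "a"): 3, (1, "b"): 4, (1, "#"): 2,
    (3, "a"): 3, (3, "b"): 4, (3, "#"): 2, (3, "C"): 5,
    (4, "a"): 3, (4, "b"): 4, (4, "#"): 2, (4, "C"): 6,
    (5, "a"): 7, (6, "b"): 8,
}
# Reduce states: completed rule as (left-hand side, length of right-hand side)
REDUCE = {2: ("C", 1), 7: ("C", 3), 8: ("C", 3)}


def parse(word, start_symbol="C"):
    """Run P on word; return (accepted, trace of (stack, state, action))."""
    stack, state, pos, trace = [], 1, 0, []
    while True:
        label = "$" + "".join(map(str, stack))
        if state in REDUCE:
            lhs, k = REDUCE[state]
            trace.append((label, state, f"reduce to {lhs}"))
            popped = stack[len(stack) - k:]   # labels q_1, ..., q_k
            del stack[len(stack) - k:]
            state = popped[0]                 # go back to q_1
            if not stack and pos == len(word) and lhs == start_symbol:
                return True, trace
            symbol = lhs                      # pretend the next input is lhs
        else:
            if pos == len(word):
                return False, trace
            symbol = word[pos]
            pos += 1
            trace.append((label, state, f"shift {symbol}"))
        if (state, symbol) not in DELTA:
            return False, trace
        stack.append(state)                   # push the current state label
        state = DELTA[(state, symbol)]


ok, trace = parse("ba#ab")
for row in trace:
    print(*row)
```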

24/31 Deterministic PDAs
The PDA for LR(0) parsing is deterministic.
Some CFLs require non-deterministic PDAs, such as L = { ww^R | w ∈ {a, b}* }.
What goes wrong when we do LR(0) parsing on L?

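25/31 Example 2
L = { ww^R | w ∈ {a, b}* }
C → aCa | bCb | ε
NFA N (the item C → • is both an initial item and a completed, accepting item):
q0 --ε--> C → •aCa, C → •bCb, C → •
C → •aCa --a--> C → a•Ca --C--> C → aC•a --a--> C → aCa•
C → •bCb --b--> C → b•Cb --C--> C → bC•b --b--> C → bCb•
C → a•Ca --ε--> C → •aCa, C → •bCb, C → •
C → b•Cb --ε--> C → •aCa, C → •bCb, C → •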
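What goes wrong can be checked mechanically. The hypothetical helper `lr0_conflicts` below (our own sketch, not from the slides) builds D for a grammar given as (left-hand side, right-hand-side tuple) pairs and returns the states that break the LR(0) condition of slide 16. For C → aCa | bCb | ε, the completed item C → • already sits in the start state next to items that still want to shift a or b, so the parser cannot tell whether it has reached the middle of ww^R. The grammar with # has no such state.

```python
def lr0_conflicts(grammar):
    """Return the states of D in which a completed item B -> β• shares the state
    with some other item (so the LR(0) condition fails)."""
    variables = {lhs for lhs, _ in grammar}
    symbols = sorted({x for _, rhs in grammar for x in rhs})

    def closure(items):
        # ε-transitions C -> α•Bγ --ε--> B -> •β
        closed, todo = set(items), list(items)
        while todo:
            r, d = todo.pop()
            rhs = grammar[r][1]
            if d < len(rhs) and rhs[d] in variables:
                for r2, (lhs2, _) in enumerate(grammar):
                    if lhs2 == rhs[d] and (r2, 0) not in closed:
                        closed.add((r2, 0))
                        todo.append((r2, 0))
        return frozenset(closed)

    def goto(state, X):
        return closure({(r, d + 1) for r, d in state
                        if d < len(grammar[r][1]) and grammar[r][1][d] == X})

    start = closure({(r, 0) for r in range(len(grammar))})
    states, todo = [start], [start]
    while todo:
        state = todo.pop()
        for X in symbols:
            target = goto(state, X)
            if target and target not in states:  # skip the dead state
                states.append(target)
                todo.append(target)
    return [s for s in states
            if len(s) > 1 and any(d == len(grammar[r][1]) for r, d in s)]


# C -> aCa | bCb | #   (slide 22)   and   C -> aCa | bCb | ε   (this slide)
W_HASH_WR = [("C", ("a", "C", "a")), ("C", ("b", "C", "b")), ("C", ("#",))]
WW_R = [("C", ("a", "C", "a")), ("C", ("b", "C", "b")), ("C", ())]

print(len(lr0_conflicts(W_HASH_WR)), len(lr0_conflicts(WW_R)))
```

The three conflicting states for ww^R are the start state and the states reached after reading a or b: in each, the parser must choose between reducing ε to C and shifting another symbol.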
