
Shift-Reduce Parsers for Transition Networks

Shift-Reduce Parsers for Transition Networks Luca Breveglieri Stefano Crespi Reghizzi Angelo Morzenti Politecnico di Milano LATA 2014 - 10-14 March - Madrid Breveglieri, Crespi, Morzenti (PoliMi) S-R Parsers for Transition Networks LATA 2014


  1. Shift-Reduce Parsers for Transition Networks
     Luca Breveglieri, Stefano Crespi Reghizzi, Angelo Morzenti
     Politecnico di Milano
     LATA 2014, 10-14 March, Madrid

  2. Introduction: Aim of the Work
     Problem statement and research objectives

     On the status of LR (bottom-up) syntax analysis:
     • LR (bottom-up) parsing is an established methodology for syntax analysis.
     • The theory is mostly developed for grammars in Backus-Naur Form (BNF).
     • Automated compiler-design tools use it (e.g., Bison).
     • Extended BNF (EBNF) grammars, whose rules contain regular expressions, are widely used for specifying technical languages of all sorts.
     • Usually EBNF rules are first reduced to BNF rules and only then analyzed.

     Objectives of the present research work:
     • Develop an Extended LR (ELR) methodology that generalizes the LR one.
     • Make it applicable to EBNF grammars represented as Transition Networks (TN).

  3. Introduction: Contents Outline
     Table of contents

     1 Introduction
     2 Transition Network
     3 Parser Control
     4 Main Theorem
     5 Parser Construction
     6 Experimentation
     7 Conclusion

  4. Introduction: State of the Art
     State of the art in LR syntax analysis

     • Classical LR(k) theory for BNF grammars is well developed.
     • Compiler-design tools for LR(k) parsers exist (e.g., Bison).
     • EBNF grammars are popular for describing technical languages (e.g., syntax charts), but are seldom used directly to obtain the parser.
     • More recently, attention has focused on representing an EBNF grammar in the equivalent form of a Transition Network (TN).

     For EBNF grammars (or their TNs) there have been many attempts to apply LR analysis, but no simple and standard solution:
     • regular expressions are annotated and manipulated directly ⇒ this approach is somewhat distant from practical parsing
     • EBNF is turned into BNF ⇒ the grammar is obscured and becomes larger
     • EBNF rules are processed directly ⇒ the parser is complicated by the reduction move (at least in the current solutions)

     Some of the proposed solutions are also incomplete or even wrong.

  5. Transition Network: Definition and Example
     EBNF grammar and recursive transition network

     An EBNF grammar may have a regular expression (r.e.) in the right part of a rule, for instance

     $$A \to (a \mid b\, B^{*}\, c)^{+} \qquad \text{and in general} \qquad A \to \text{r.e. over } \{a, b, \ldots, A, B, \ldots\}$$

     Such an extended rule is interpreted as infinitely many BNF rules:

     $$A \to a \mid b\,c \mid b\,B\,c \mid b\,B\,B\,c \mid \ldots \mid a\,b\,c \mid b\,c\,a \mid \ldots$$

     • Thus we stipulate that an EBNF grammar has exactly one rule per nonterminal.
     • A grammar is represented by a Transition Network (TN), which is a set of DFAs.
     • Each DFA is equivalent to the regular expression in the right part of a rule.
     • The TN has a single DFA (called a machine) per nonterminal.
     • A transition labeled with a nonterminal is a call site for another machine.
     • So any machine can invoke any other machine recursively (even itself).

  6. Transition Network: Definition and Example
     Sample transition network

     EBNF grammar G of a simple language of expressions (axiom S):

     $$\Sigma = \{\, a,\ \text{`('},\ \text{`)'} \,\} \qquad V = \{\, S,\ T \,\}$$
     $$S \to T^{*} \qquad T \to \text{`('}\ S\ \text{`)'} \mid a$$

     Transition network of G, with one machine for S (axiomatic) and one for T:
     • machine S: $0_S \xrightarrow{T} 1_S$ and $1_S \xrightarrow{T} 1_S$ (both arcs are call sites for machine T); $0_S$ and $1_S$ are final
     • machine T: $0_T \xrightarrow{\text{`('}} 1_T \xrightarrow{S} 2_T \xrightarrow{\text{`)'}} 3_T$ and $0_T \xrightarrow{a} 3_T$ (the arc labeled S is a call site for machine S); $3_T$ is final

     • A machine of the TN is a DFA over the union of the alphabets Σ and V.
     • The initial state of a machine must not have any ingoing arcs.
     • A machine may be in minimal form (except for the initial state).
     • In the BNF case, a machine is modeled as a tree with no loops or confluent paths.
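The sample network can be written down as plain data to make the call-site idea concrete. The sketch below is illustrative only: the state names (`"0S"`, `"1T"`, ...), the tables `ARCS`, `FINALS`, `INITIAL` and the function `accepts` are assumptions of this example, not notation from the talk. The recognizer is a naive top-down backtracking walk of the machines, not the shift-reduce method presented here, and it would loop forever on a left-recursive network.

```python
# Sample TN for G:  S -> T*    T -> '(' S ')' | a
# Assumed encoding for this sketch: (state, label) -> next state.
ARCS = {
    ("0S", "T"): "1S", ("1S", "T"): "1S",   # machine S
    ("0T", "("): "1T", ("0T", "a"): "3T",   # machine T
    ("1T", "S"): "2T", ("2T", ")"): "3T",
}
FINALS = {"0S", "1S", "3T"}
INITIAL = {"S": "0S", "T": "0T"}            # one machine per nonterminal


def initial_states_have_no_ingoing_arcs():
    """Check the structural condition required of every machine."""
    return not (set(ARCS.values()) & set(INITIAL.values()))


def ends(nonterminal, text, start):
    """Yield every position j such that the machine accepts text[start:j]."""
    def walk(state, pos):
        if state in FINALS:
            yield pos
        for (src, label), dst in ARCS.items():
            if src != state:
                continue
            if label in INITIAL:
                # Call site: run the callee machine, then resume at dst.
                for after_call in ends(label, text, pos):
                    yield from walk(dst, after_call)
            elif pos < len(text) and text[pos] == label:
                yield from walk(dst, pos + 1)
    yield from walk(INITIAL[nonterminal], start)


def accepts(text):
    """True if the axiomatic machine S accepts the whole text."""
    return len(text) in ends("S", text, 0)
```

For instance, `accepts("a(a)")` holds because S calls T twice, and the second call of T calls S again between the parentheses.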

  7. Transition Network: Alternative Representation
     Right linearized grammar of a TN

     • A right linearized grammar is a piecewise right linear grammar.
     • Its rules are partitioned into right linear groups, which call one another.
     • Each such group has only terminal or right recursive linear rules.
     • So a right linearized grammar maps a TN (1), but is purely BNF.
     • It is a useful theoretical representation, yet unfit for parsing.

     Right linearized grammar $G_{RL}$ of the sample TN (axiom $0_S$):

     $$0_S \to 0_T\ 1_S \mid \varepsilon \qquad 1_S \to 0_T\ 1_S \mid \varepsilon$$
     $$0_T \to \text{`('}\ 1_T \mid a\ 3_T \qquad 1_T \to 0_S\ 2_T \qquad 2_T \to \text{`)'}\ 3_T \qquad 3_T \to \varepsilon$$

     (1) Heilbrunner defined $G_{RL}$ ('79), unrelated to TN.
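The correspondence between arcs and right linear rules is mechanical, as the following sketch suggests. It assumes a hypothetical encoding of the sample TN (the tables `ARCS`, `FINALS`, `INITIAL` and the function `right_linearized` are names invented for this example): every arc yields one rule whose head is the source state, and every final state yields an ε-rule.

```python
# Assumed encoding of the sample TN: (state, label) -> next state.
ARCS = {
    ("0S", "T"): "1S", ("1S", "T"): "1S",
    ("0T", "("): "1T", ("0T", "a"): "3T",
    ("1T", "S"): "2T", ("2T", ")"): "3T",
}
FINALS = {"0S", "1S", "3T"}
INITIAL = {"S": "0S", "T": "0T"}


def right_linearized(arcs, finals, initial):
    """Map each state to its set of right parts; the empty tuple stands for epsilon."""
    rules = {}
    for (src, label), dst in arcs.items():
        # A nonterminal label is replaced by the initial state of its machine;
        # a terminal label is kept as it is.
        head = initial.get(label, label)
        rules.setdefault(src, set()).add((head, dst))
    for state in finals:
        rules.setdefault(state, set()).add(())
    return rules


G_RL = right_linearized(ARCS, FINALS, INITIAL)
```

Under this encoding `G_RL["0T"]` contains the right parts `("(", "1T")` and `("a", "3T")`, which mirror the rule for $0_T$ on the slide.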

  8. Parser Control: Analysis Item
     Item structure and its meaning

     An item is a pair $\langle p, \pi \rangle$ (called state and look-ahead), such that:
     • $p$ is a state of (a machine of) the transition network
     • $\pi \subseteq \Sigma \cup \{\dashv\}$ is a non-empty subset of terminals ($\pi \neq \emptyset$)

     An item represents an analysis point reached by the parser:
     • a machine (i.e., a rule) matches the input as far as state $p$
     • $\pi$ contains the terminals expected after the machine ends

     Example, with two machines:
     • machine A: $\cdots\ o_A \xrightarrow{a} r_A \xrightarrow{B} s_A \xrightarrow{d} t_A\ \cdots$, where the arc $r_A \xrightarrow{B} s_A$ is a call site
     • machine B: $0_B \xrightarrow{b} p_B \xrightarrow{c} q_B$
     • item: $\langle p_B, \{d\} \rangle$

     If the string to parse is $\ldots a\, b\, c\, d \ldots$, then the item means that machine B (called at site $r_A \xrightarrow{B} s_A$) has matched symbol $b$ and is now at state $p_B$, and that symbol $d$ is expected when it ends.

  9. Parser Control: Analysis Item
     Item shift: evolving an existing item

     Suppose $\langle p, \pi \rangle$ is an existing item, with state $p$ and look-ahead $\pi$. Define an item shift (partial) function

     $$\mathrm{shift} : (\text{set of all items}) \times (\Sigma \cup V) \to (\text{set of all items})$$

     which works on an item as follows:

     $$\mathrm{shift}(\langle p, \pi \rangle, X) = \langle q, \pi \rangle \quad \text{if arc } p \xrightarrow{X} q \text{ is in the TN}$$

     where $X$ is any grammar symbol (terminal or nonterminal).

     • Using the TN, the shift function matches an item against a grammar symbol and goes to the next item on the same machine.
     • Since the machine of the TN remains the same for the shifted item, the shift function does not change the item look-ahead.
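A minimal sketch of the item shift, assuming items are encoded as `(state, look_ahead)` tuples and the sample TN as a dictionary of arcs. The names `ARCS` and `shift`, the state names and the use of `None` for "undefined" are choices of this example only.

```python
# Assumed encoding of the sample TN: (state, label) -> next state.
ARCS = {
    ("0S", "T"): "1S", ("1S", "T"): "1S",
    ("0T", "("): "1T", ("0T", "a"): "3T",
    ("1T", "S"): "2T", ("2T", ")"): "3T",
}


def shift(item, symbol):
    """Partial function: return the shifted item, or None where shift is undefined."""
    state, look_ahead = item
    target = ARCS.get((state, symbol))
    if target is None:
        return None
    # Same machine, hence the look-ahead is carried over unchanged.
    return (target, look_ahead)


item = ("0T", frozenset({")"}))
shifted = shift(item, "(")        # follows arc 0T --(--> 1T
undefined = shift(item, ")")      # no such arc leaves 0T
```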

  10. Parser Control: Analysis Item
     Item closure: creating a new look-ahead

     Closure of a (non-empty) set $I$ of items:

     $$\mathrm{closure}(I) = I \cup \left\{ \langle 0_B, \pi \rangle \;\middle|\; \exists \text{ item } \langle r, \rho \rangle \in \mathrm{closure}(I) \ \text{and}\ \exists \text{ arc } r \xrightarrow{B} s \in TN \ \text{and}\ \pi = \mathrm{initials}\big(L(s) \cdot \rho\big) \right\}$$

     Closure example (sample TN of G):
     • set $I$ of items: $\langle 1_T, \{\ldots\} \rangle$
     • new items added to $I$ by closure: $\langle 0_S, \{\text{`)'}\} \rangle$ (through the call site $1_T \xrightarrow{S} 2_T$) and $\langle 0_T, \{\text{`('}, a, \text{`)'}\} \rangle$ (through the call site $0_S \xrightarrow{T} 1_S$)

     Closure may create items with an initial TN state and a new look-ahead.
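The closure can be sketched as a worklist computation once $\mathrm{initials}(L(s) \cdot \rho)$ is available. The code below is one possible reading under stated assumptions: the TN encoding, the helper `first_and_nullable` (a fixpoint that computes, per state, the terminals that can start $L(s)$ and whether $L(s)$ contains the empty string) and all names are introduced for this example and do not come from the talk.

```python
# Assumed encoding of the sample TN: (state, label) -> next state.
ARCS = {
    ("0S", "T"): "1S", ("1S", "T"): "1S",
    ("0T", "("): "1T", ("0T", "a"): "3T",
    ("1T", "S"): "2T", ("2T", ")"): "3T",
}
FINALS = {"0S", "1S", "3T"}
INITIAL = {"S": "0S", "T": "0T"}


def first_and_nullable():
    """Fixpoint over the arcs: initials of L(s) and emptiness test for every state s."""
    states = {s for s, _ in ARCS} | set(ARCS.values())
    first = {s: set() for s in states}
    nullable = {s: s in FINALS for s in states}
    changed = True
    while changed:
        changed = False
        for (src, label), dst in ARCS.items():
            if label in INITIAL:                      # nonterminal arc (call site)
                callee = INITIAL[label]
                starts = set(first[callee])
                if nullable[callee]:                  # callee may match the empty string
                    starts |= first[dst]
                empty = nullable[callee] and nullable[dst]
            else:                                     # terminal arc
                starts, empty = {label}, False
            if not starts <= first[src] or (empty and not nullable[src]):
                first[src] |= starts
                nullable[src] = nullable[src] or empty
                changed = True
    return first, nullable


FIRST, NULLABLE = first_and_nullable()


def initials(state, rho):
    """Terminals that can start a string of L(state) followed by a terminal of rho."""
    return frozenset(FIRST[state] | (rho if NULLABLE[state] else set()))


def closure(items):
    """Add an item <0B, pi> for every call site r --B--> s reachable in the set."""
    result, work = set(items), list(items)
    while work:
        r, rho = work.pop()
        for (src, label), s in ARCS.items():
            if src == r and label in INITIAL:
                new_item = (INITIAL[label], initials(s, rho))
                if new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)
```

In this sketch items that share a state but differ in look-ahead are kept as separate items; merging them into one item is a design choice left open here.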

  11. Parser Control: Graph Construction
     Macro-state (m-state) and parser pilot

     A macro-state (m-state) is a non-empty set of items, which represent possible analysis points reached by the parser.

     The pilot of a TN (grammar) is a finite directed graph, where:
     • the nodes are the m-states reachable by the parser
     • the arcs connect m-states through grammar symbols

     Extend the item shift function to m-states. For an m-state $I = \{ \langle p, \pi \rangle, \langle r, \rho \rangle, \ldots \}$:

     $$\mathrm{shift}(I, X) = \{\, \mathrm{shift}(\langle p, \pi \rangle, X),\ \mathrm{shift}(\langle r, \rho \rangle, X),\ \ldots \,\} = \{\, \langle q, \pi \rangle,\ \langle s, \rho \rangle,\ \ldots \,\}$$

     In graphic form, an m-state $I$ is drawn as a table with one row per item: the state ($p$, $r$, ...) in the left column and the look-ahead ($\pi$, $\rho$, ...) in the right column.

     For BNF grammars, items are often denoted as marked rules:

     $$B \to \beta \bullet \gamma,\ \pi \quad \Longleftrightarrow \quad \langle p_B, \pi \rangle$$

     String $\beta$ is the path from state $0_B$ to state $p_B$ in the machine B.
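Lifting the shift from items to m-states amounts to applying it to every item and discarding the items on which it is undefined. A minimal sketch, again with an assumed dictionary encoding of the sample TN and an invented function name `shift_mstate`:

```python
# Assumed encoding of the sample TN: (state, label) -> next state.
ARCS = {
    ("0S", "T"): "1S", ("1S", "T"): "1S",
    ("0T", "("): "1T", ("0T", "a"): "3T",
    ("1T", "S"): "2T", ("2T", ")"): "3T",
}


def shift_mstate(mstate, symbol):
    """Shift every item of the m-state that has an outgoing arc labeled `symbol`."""
    return frozenset(
        (ARCS[(state, symbol)], look_ahead)
        for (state, look_ahead) in mstate
        if (state, symbol) in ARCS
    )


# An m-state with three items; the look-ahead {"x"} is an arbitrary placeholder.
I = frozenset({
    ("1T", frozenset({"x"})),
    ("0S", frozenset({")"})),
    ("0T", frozenset({"(", "a", ")"})),
})
on_open = shift_mstate(I, "(")    # only the item on state 0T moves
on_S = shift_mstate(I, "S")       # only the item on state 1T moves
on_close = shift_mstate(I, ")")   # empty result: no arc labeled ')' leaves these states
```

An empty result is not an m-state, since m-states are non-empty by definition, so a caller would normally skip it.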

  12. Parser Control: Graph Construction
     Algorithm for building the pilot graph

     • pilot DFA: $P = (\Sigma \cup V,\ R,\ \vartheta,\ I_0)$
     • m-state set: $R = \{ I_0, I_1, \ldots \}$
     • transition function: $\vartheta : R \times (\Sigma \cup V) \to R$

     Pilot graph algorithm (computes $R$ and $\vartheta$ of $P$):

         R := { closure( { ⟨0_S, {⊣}⟩ } ) }        -- initial m-state I_0
         repeat
             for each m-state I ∈ R and symbol X ∈ Σ ∪ V do
                 I′ := closure( shift(I, X) )
                 add m-state I′ to the m-state set R
                 add arc I --X--> I′ to the transition function ϑ
             end for
         until R does not change any more
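The whole construction can be tried on the sample TN with a short script. This is a sketch under assumptions, not the authors' implementation: the TN encoding, every function name, and the string `"-|"` used as end-of-text mark are invented for the example. It uses a worklist instead of the repeat-until loop (the same m-states are reached), keeps same-state items with different look-aheads separate, and skips empty shift results because an m-state is non-empty by definition.

```python
from collections import deque

END = "-|"  # ASCII stand-in for the end-of-text mark
ARCS = {("0S", "T"): "1S", ("1S", "T"): "1S",
        ("0T", "("): "1T", ("0T", "a"): "3T",
        ("1T", "S"): "2T", ("2T", ")"): "3T"}
FINALS = {"0S", "1S", "3T"}
INITIAL = {"S": "0S", "T": "0T"}
SYMBOLS = ["a", "(", ")", "S", "T"]  # terminals and nonterminals


def first_and_nullable():
    """Fixpoint: initials of L(s) and emptiness test for every state s."""
    states = {s for s, _ in ARCS} | set(ARCS.values())
    first = {s: set() for s in states}
    nullable = {s: s in FINALS for s in states}
    changed = True
    while changed:
        changed = False
        for (src, label), dst in ARCS.items():
            if label in INITIAL:
                callee = INITIAL[label]
                starts = set(first[callee])
                if nullable[callee]:
                    starts |= first[dst]
                empty = nullable[callee] and nullable[dst]
            else:
                starts, empty = {label}, False
            if not starts <= first[src] or (empty and not nullable[src]):
                first[src] |= starts
                nullable[src] = nullable[src] or empty
                changed = True
    return first, nullable


FIRST, NULLABLE = first_and_nullable()


def closure(items):
    """Add <0B, initials(L(s).rho)> for each call site r --B--> s in the set."""
    result, work = set(items), list(items)
    while work:
        r, rho = work.pop()
        for (src, label), s in ARCS.items():
            if src == r and label in INITIAL:
                pi = frozenset(FIRST[s] | (rho if NULLABLE[s] else set()))
                new_item = (INITIAL[label], pi)
                if new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def shift_mstate(mstate, symbol):
    """Item shift applied to every item where it is defined."""
    return frozenset((ARCS[(p, symbol)], pi)
                     for (p, pi) in mstate if (p, symbol) in ARCS)


def pilot():
    """Return the initial m-state, the m-state set R and the transition map."""
    i0 = closure({(INITIAL["S"], frozenset({END}))})
    mstates, theta, queue = {i0}, {}, deque([i0])
    while queue:
        current = queue.popleft()
        for symbol in SYMBOLS:
            base = shift_mstate(current, symbol)
            if not base:               # an m-state must be non-empty
                continue
            target = closure(base)
            theta[(current, symbol)] = target
            if target not in mstates:
                mstates.add(target)
                queue.append(target)
    return i0, mstates, theta


I0, R, THETA = pilot()
```

Printing `len(R)` and `len(THETA)` gives a quick view of the size of the pilot obtained for the sample network under this encoding.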

  13. Parser Control: Graph Construction
     Algorithm for building the pilot graph (same slide, with the graphic form added)

     Graphic form of an m-state $I'$ reached through an arc $I \xrightarrow{X} I'$, where $I' = \mathrm{closure}(\mathrm{shift}(I, X))$:
     • m-state base: the items obtained through shift
     • m-state closure: the new items (if any) added to the m-state through closure
