Parsing Expression Grammars: A Recognition-Based Syntactic Foundation


  1. Parsing Expression Grammars: A Recognition-Based Syntactic Foundation. Bryan Ford, Massachusetts Institute of Technology, January 14, 2004

  2. Designing a Language Syntax

  3-4. Designing a Language Syntax
     Textbook Method:
       1. Formalize syntax via context-free grammar
       2. Write a YACC parser specification
       3. Hack on grammar until “near-LALR(1)”
       4. Use generated parser
     Pragmatic Method:
       1. Specify syntax informally
       2. Write a recursive descent parser

  5-7. What exactly does a CFG describe?
     Short answer: a rule system to generate language strings
     Example CFG:
       S → aa S
       S → ε
     Start symbol: S
     Output strings: ε, aa, aaaa, ...
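To make the generative reading concrete, here is a minimal Python sketch that enumerates the strings derivable from the example CFG (S → aa S, S → ε) by breadth-first expansion of the leftmost nonterminal. The function name `generate` and the dictionary encoding of the grammar are illustrative assumptions, not something the slides define.

```python
from collections import deque

# Example CFG from the slide: S -> "aa" S | "" (epsilon).
# Assumption for this sketch: any character that is a key of the
# grammar dictionary is a nonterminal; everything else is a terminal.
GRAMMAR = {"S": ["aaS", ""]}

def generate(grammar, start, limit):
    """Return the first `limit` terminal strings, derived breadth-first."""
    results = []
    queue = deque([start])
    while queue and len(results) < limit:
        form = queue.popleft()
        # Find the leftmost nonterminal in the sentential form.
        index = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if index is None:
            results.append(form)  # no nonterminals left: an output string
            continue
        # Expand that nonterminal with every rule body.
        for body in grammar[form[index]]:
            queue.append(form[:index] + body + form[index + 1:])
    return results

print(generate(GRAMMAR, "S", 4))  # ['', 'aa', 'aaaa', 'aaaaaa']
```

The grammar says nothing about how to decide whether a given input belongs to the language; it only describes how strings are produced.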

  8-10. What exactly do we want to describe?
     Proposed answer: a rule system to recognize language strings
     Parsing Expression Grammar (PEG) models recursive descent parsing practice
     Example PEG:
       S ← aa S / ε
     Input string: a a a a
     Derive structure: S [ a a S [ a a S [ ε ] ] ]
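The recognition-based reading of the example PEG S ← aa S / ε maps directly onto a recursive descent function. The sketch below is a hand-written illustration; the name `parse_s` and the convention of returning the position after the consumed prefix are assumptions of this example.

```python
def parse_s(text, pos=0):
    """Recognize S <- "aa" S / epsilon starting at `pos`.

    Returns the position just past the consumed prefix. This particular
    rule can never fail, because the epsilon alternative always succeeds.
    """
    # First alternative: the terminal "aa" followed by S.
    if text.startswith("aa", pos):
        return parse_s(text, pos + 2)
    # Second alternative: epsilon, which consumes nothing.
    return pos

print(parse_s("aaaa"))  # 4
print(parse_s("aaa"))   # 2
print(parse_s("b"))     # 0
```

On "aaa" the function consumes only the first two characters: a parsing expression matches a prefix of its input rather than the whole string.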

  11. Take-Home Points
     Key benefits of PEGs:
     ● Simplicity, formalism, analyzability of CFGs
     ● Closer match to syntax practices
       – More expressive than deterministic CFGs (LL/LR)
       – More of the “right kind” of expressiveness: prioritized choice, greedy rules, syntactic predicates
       – Unlimited lookahead, backtracking
     ● Linear-time parsing for any PEG

  12. What kind of recursive descent parsing?
     Key assumptions:
     ● Parsing functions are stateless: they depend only on the input string
     ● Parsing functions make decisions locally: they return at most one result (success/failure)

  13. Parsing Expression Grammars
     Consists of: (Σ, N, R, e_S)
     – Σ: finite set of terminals (character set)
     – N: finite set of nonterminals
     – R: finite set of rules of the form “A ← e”, where A ∈ N and e is a parsing expression
     – e_S: a parsing expression called the start expression

  14. Parsing Expressions
       ε               the empty string
       a               terminal (a ∈ Σ)
       A               nonterminal (A ∈ N)
       e1 e2           a sequence of parsing expressions
       e1 / e2         prioritized choice between alternatives
       e?, e*, e+      optional, zero-or-more, one-or-more
       &e, !e          syntactic predicates
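Each kind of parsing expression can be modelled as a small function from (text, position) to either a new position or failure. The following Python sketch is one possible encoding, not the presentation's implementation: all function names (`empty`, `term`, `seq`, `choice`, `opt`, `star`, `plus`, `and_`, `not_`) are hypothetical, and `None` is assumed to mean failure.

```python
def empty():
    """epsilon: always succeeds, consumes nothing."""
    return lambda text, pos: pos

def term(literal):
    """A terminal (here generalized to a literal string)."""
    return lambda text, pos: (
        pos + len(literal) if text.startswith(literal, pos) else None
    )

def seq(*parsers):
    """e1 e2 ...: every part must match, one after the other."""
    def parse(text, pos):
        for parser in parsers:
            pos = parser(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """e1 / e2 ...: the first alternative that succeeds wins."""
    def parse(text, pos):
        for parser in parsers:
            end = parser(text, pos)
            if end is not None:
                return end
        return None
    return parse

def star(parser):
    """e*: match greedily until e fails."""
    def parse(text, pos):
        while True:
            end = parser(text, pos)
            # Guard (an assumption of this sketch): stop if e matched
            # without consuming input, to avoid looping forever.
            if end is None or end == pos:
                return pos
            pos = end
    return parse

def opt(parser):
    """e?: same as e / epsilon."""
    return choice(parser, empty())

def plus(parser):
    """e+: same as e e*."""
    return seq(parser, star(parser))

def and_(parser):
    """&e: succeed if e matches, but consume nothing."""
    return lambda text, pos: pos if parser(text, pos) is not None else None

def not_(parser):
    """!e: succeed if e fails, and consume nothing."""
    return lambda text, pos: pos if parser(text, pos) is None else None

# A nonterminal is simply a name bound to a parsing function.
S = term("bad")
print(S("badder", 0))  # 3
print(S("abad", 0))    # None
```

Because every function depends only on the input text and a position, this encoding respects the stateless, locally deciding style of recursive descent described earlier in the deck.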

  15. How PEGs Express Languages
     Given input string s, a parsing expression either:
     – Matches and consumes a prefix s' of s.
     – Fails on s.
     Example:
       S ← bad
     S matches “badder”
     S matches “baddest”
     S fails on “abad”
     S fails on “babe”

  16. Prioritized Choice with Backtracking
     S ← A / B means: “To parse an S, first try to parse an A. If A fails, then backtrack and try to parse a B.”
     Example:
       S ← if C then S else S / if C then S
     S matches “if C then S foo”
     S matches “if C then S1 else S2”
     S fails on “if C else S”
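The backtracking behaviour of prioritized choice can be sketched with a hand-written recognizer for a simplified version of the if/else rule. To keep the example runnable, this sketch assumes a fixed condition `c` and adds a third alternative, the base statement `x`, which is not on the slide; the helper `expect` and the function `statement` are hypothetical names.

```python
def expect(text, pos, literal):
    """Return the position after `literal`, or None if it is not at `pos`."""
    if pos is not None and text.startswith(literal, pos):
        return pos + len(literal)
    return None

def statement(text, pos=0):
    """S <- "if c then " S " else " S / "if c then " S / "x"."""
    # Alternative 1: if ... then S else S
    end = expect(text, pos, "if c then ")
    end = statement(text, end) if end is not None else None
    end = expect(text, end, " else ")
    end = statement(text, end) if end is not None else None
    if end is not None:
        return end
    # Alternative 1 failed: backtrack to `pos` and try if ... then S
    end = expect(text, pos, "if c then ")
    end = statement(text, end) if end is not None else None
    if end is not None:
        return end
    # Alternative 3 (added for this sketch): the base statement "x"
    return expect(text, pos, "x")

print(statement("if c then x"))                    # 11
print(statement("if c then x else x"))             # 18
print(statement("if c then if c then x else x"))   # 28
print(statement("if c else x"))                    # None
```

In the third input the inner `if` claims the `else`, because the inner call tries the if/else alternative first and succeeds before the outer call looks for its own `else`.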

  17. Prioritized Choice with Backtracking
     S ← A / B means: “To parse an S, first try to parse an A. If A fails, then backtrack and try to parse a B.”
     Example from the C++ standard: “An expression-statement ... can be indistinguishable from a declaration ... In those cases the statement is a declaration.”
       statement ← declaration / expression-statement

  18. Greedy Option and Repetition
       A ← e?   equivalent to   A ← e / ε
       A ← e*   equivalent to   A ← e A / ε
       A ← e+   equivalent to   A ← e e*
     Example:
       I ← L+
       L ← a / b / c / ...
     I matches “foobar”
     I matches “foo(bar)”
     I fails on “123”
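The equivalences for e* and e+ can be transcribed almost literally. The Python sketch below assumes that L means "any ASCII lowercase letter" (the slide elides the full list) and uses hypothetical names throughout. It also shows a consequence of greediness: a* followed by a can never match, since a* has already consumed every a.

```python
import string

def letter(text, pos):
    """L <- a / b / c / ... (assumed here: any ASCII lowercase letter)."""
    if pos < len(text) and text[pos] in string.ascii_lowercase:
        return pos + 1
    return None

def star(parser):
    """e* written as the recursive rule A <- e A / epsilon."""
    def parse(text, pos):
        end = parser(text, pos)
        if end is None:
            return pos            # epsilon alternative
        return parse(text, end)   # e matched: keep going greedily
    return parse

def plus(parser):
    """e+ written as e e*."""
    def parse(text, pos):
        end = parser(text, pos)
        return None if end is None else star(parser)(text, end)
    return parse

identifier = plus(letter)         # I <- L+
print(identifier("foobar", 0))    # 6
print(identifier("foo(bar)", 0))  # 3
print(identifier("123", 0))       # None

def a(text, pos):
    """The single terminal a."""
    return pos + 1 if text.startswith("a", pos) else None

def star_a_then_a(text, pos):
    """a* a: the repetition leaves no 'a' for the final terminal."""
    return a(text, star(a)(text, pos))

print(star_a_then_a("aaa", 0))    # None
```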

  19. Syntactic Predicates
     And-predicate: &e succeeds whenever e does, but consumes no input [Parr '94, '95]
     Not-predicate: !e succeeds whenever e fails
     Example:
       A ← foo &(bar)
       B ← foo !(bar)
     A matches “foobar”
     A fails on “foobie”
     B matches “foobie”
     B fails on “foobar”
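As a rough illustration of the two predicates, the sketch below implements the rules A ← foo &(bar) and B ← foo !(bar) in Python. The helper names (`lit`, `seq`, `and_pred`, `not_pred`) and the use of `None` for failure are assumptions of this example.

```python
def lit(literal):
    """Match a literal string at the current position."""
    return lambda text, pos: (
        pos + len(literal) if text.startswith(literal, pos) else None
    )

def seq(first, second):
    """Match `first`, then `second` from where `first` stopped."""
    def parse(text, pos):
        mid = first(text, pos)
        return None if mid is None else second(text, mid)
    return parse

def and_pred(parser):
    """&e: succeed without consuming input if e would match here."""
    return lambda text, pos: pos if parser(text, pos) is not None else None

def not_pred(parser):
    """!e: succeed without consuming input if e would fail here."""
    return lambda text, pos: pos if parser(text, pos) is None else None

A = seq(lit("foo"), and_pred(lit("bar")))
B = seq(lit("foo"), not_pred(lit("bar")))

print(A("foobar", 0))  # 3
print(A("foobie", 0))  # None
print(B("foobie", 0))  # 3
print(B("foobar", 0))  # None
```

Both successful matches stop at position 3: the predicate inspects the text after "foo" but leaves it unconsumed.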

  20-27. Syntactic Predicates
     And-predicate: &e succeeds whenever e does, but consumes no input [Parr '94, '95]
     Not-predicate: !e succeeds whenever e fails
     Example:
       C ← B I* E           begin marker, internal elements, end marker
       I ← !E ( C / T )     only if an end marker doesn't start here, consume a nested comment, or else consume any single character
       B ← (*
       E ← *)
       T ← [any terminal]
     C matches “(*ab*)cd”
     C matches “(*a(*b*)c*)”
     C fails on “(*a(*b*)”
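The nested-comment grammar (C ← B I* E, I ← !E ( C / T ), B ← (*, E ← *), T ← any terminal) translates into a short recursive descent recognizer. This is a sketch under the assumption that failure is represented by `None`; the function names are illustrative.

```python
def begin_marker(text, pos):
    """B <- "(*" """
    return pos + 2 if text.startswith("(*", pos) else None

def end_marker(text, pos):
    """E <- "*)" """
    return pos + 2 if text.startswith("*)", pos) else None

def any_char(text, pos):
    """T <- [any terminal]"""
    return pos + 1 if pos < len(text) else None

def inner(text, pos):
    """I <- !E (C / T)"""
    # !E: only proceed if an end marker does not start here.
    if end_marker(text, pos) is not None:
        return None
    # C / T: prefer a nested comment, else consume any single character.
    nested = comment(text, pos)
    if nested is not None:
        return nested
    return any_char(text, pos)

def comment(text, pos=0):
    """C <- B I* E"""
    pos = begin_marker(text, pos)
    if pos is None:
        return None
    # I*: consume internal elements greedily.
    while True:
        following = inner(text, pos)
        if following is None:
            break
        pos = following
    return end_marker(text, pos)

print(comment("(*ab*)cd"))     # 6
print(comment("(*a(*b*)c*)"))  # 11
print(comment("(*a(*b*)"))     # None
```

The first input matches only its six-character prefix, leaving "cd" unconsumed. The last input fails because the outer comment never reaches its end marker.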

  28. Unified Grammars
     PEGs can express both lexical and hierarchical syntax of realistic languages in one grammar
     ● Example (in paper): Complete self-describing PEG in 2/3 column
     ● Example (on web): Unified PEG for Java language

  29-33. Lexical/Hierarchical Interplay
     Unified grammars create new design opportunities
     Example:
       E ← S / ( E ) / ...        general-purpose expression syntax
       S ← " C* "                 string literals
       C ← \( E ) / !" !\ T       quotable characters
       T ← [any terminal]
     To get Unicode “∀”, instead of “\u2200”, write “\(0x2200)” or “\(8704)” or “\(FOR_ALL)”

  34. Formal Properties of PEGs
     ● Express all deterministic languages - LR(k)
     ● Closed under union, intersection, complement
     ● Some non-context-free languages, e.g., a^n b^n c^n
     ● Undecidable whether L(G) = ∅
     ● Predicate operators can be eliminated
       – ...but the process is non-trivial!
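To illustrate the claim about non-context-free languages, here is a Python sketch of a recognizer for a^n b^n c^n (n ≥ 1). It follows one commonly cited PEG for this language, S ← &(A c) a+ B !. with A ← a A? b and B ← b B? c. The slide does not show a grammar for this language, so both the grammar and the function names are assumptions of this example.

```python
def rule_a(text, pos):
    """A <- "a" A? "b" : matches a^k b^k for k >= 1."""
    if not text.startswith("a", pos):
        return None
    pos += 1
    nested = rule_a(text, pos)      # A? : optional, greedy
    if nested is not None:
        pos = nested
    return pos + 1 if text.startswith("b", pos) else None

def rule_b(text, pos):
    """B <- "b" B? "c" : matches b^k c^k for k >= 1."""
    if not text.startswith("b", pos):
        return None
    pos += 1
    nested = rule_b(text, pos)      # B? : optional, greedy
    if nested is not None:
        pos = nested
    return pos + 1 if text.startswith("c", pos) else None

def recognize(text):
    """S <- &(A "c") "a"+ B !. : True if text is a^n b^n c^n, n >= 1."""
    # &(A "c"): look ahead for a^n b^n followed by a "c", consuming nothing.
    lookahead = rule_a(text, 0)
    if lookahead is None or not text.startswith("c", lookahead):
        return False
    # "a"+ : skip over the run of a's (at least one exists, since A matched).
    pos = 0
    while text.startswith("a", pos):
        pos += 1
    # B !. : the rest must be b^n c^n and reach the end of the input.
    return rule_b(text, pos) == len(text)

print(recognize("aabbcc"))   # True
print(recognize("abc"))      # True
print(recognize("aabbc"))    # False
print(recognize("aabbbcc"))  # False
```

The and-predicate checks that the a's and b's balance, and B then checks that the b's and c's balance. Combining the two checks amounts to an intersection, which a single context-free rule cannot express.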
