
Compiler Development (CMPSC 401): Syntax Analysis, Janyl Jumadinova


  1. Compiler Development (CMPSC 401): Syntax Analysis. Janyl Jumadinova, February 14, 2019.

  2. Syntax Analysis (Parsing)

  6. What is Syntax Analysis? After lexical analysis (scanning), we have a series of tokens. In syntax analysis (or parsing), we want to interpret what those tokens mean. Goal: Recover the structure described by that series of tokens. Goal: Report errors if those tokens do not properly encode a structure.

  9. Formal Languages: An alphabet is a set Σ of symbols that act as letters. A language over Σ is a set of strings made from symbols in Σ. When scanning, our alphabet was ASCII or Unicode characters. We produced tokens. When parsing, our alphabet is the set of tokens produced by the scanner.
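The change of alphabet between scanning and parsing can be sketched in a few lines of Python. This is only an illustration: the token names (`NUM`, `ID`, `OP`, `SKIP`) and their regular expressions are assumptions made for the example, not definitions from the course. The function reads a string over the character alphabet and returns a list of tokens, which is the "string" the parser then works on.

```python
import re

# Hypothetical token definitions: (token name, regular expression).
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Turn a string over the character alphabet into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace is not passed on to the parser
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For instance, `scan("x + 42")` yields three tokens: an `ID`, an `OP`, and a `NUM`. From the parser's point of view each of them is a single symbol of its alphabet.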

  12. Regular Expressions: When scanning, we used regular expressions to define each token. Unfortunately, regular expressions are (usually) too weak to define programming languages. No regular expression can match all expressions with properly balanced parentheses, and none can match all functions with properly nested block structure. We need a more powerful formalism.
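One way to see why balanced parentheses are out of reach: a regular expression corresponds to a finite automaton with a fixed number of states, while checking balance requires remembering an arbitrarily large nesting depth. The sketch below (the function name `balanced` is just a label chosen for this example) does the check with an unbounded counter, which is exactly the piece of memory a finite automaton lacks.

```python
def balanced(text):
    """Return True if the parentheses in text are properly balanced."""
    depth = 0  # current nesting depth; can grow without bound
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0  # every '(' must have been closed
```

All other characters are ignored, so `balanced("(a+(b*c))")` is accepted while `balanced(")(")` is rejected.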

  13. Context-Free Grammar: A context-free grammar (or CFG) is a formalism for defining languages. CFGs can define the context-free languages, a strict superset of the regular languages. Unlike in regular grammars, the right-hand side of a production rule may be any string of terminals and nonterminals.

  15. CFG Example: One possible CFG for describing all legal arithmetic expressions using addition, subtraction, multiplication, and division. [Figure: grammar for arithmetic expressions]

  16. Context-Free Grammar: Formally, a context-free grammar (like a regular grammar) is a collection of four objects:
      - a set of nonterminal symbols (or variables),
      - a set of terminal symbols,
      - a set of production rules saying how each nonterminal can be replaced by a string of terminals and nonterminals, and
      - a start symbol that begins the derivation.
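The four objects map directly onto a small data structure. The sketch below is illustrative only: the class name `Grammar`, the helper `derive_leftmost`, and the tiny example grammar (E → E + T | T, T → int) are assumptions made for this example and do not come from the slides. The helper applies a chosen sequence of productions to the leftmost nonterminal and records each intermediate sentential form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset
    terminals: frozenset
    productions: tuple  # pairs (head, body); body is a tuple of symbols
    start: str


# Hypothetical example grammar: E -> E + T | T,  T -> int
G = Grammar(
    nonterminals=frozenset({"E", "T"}),
    terminals=frozenset({"+", "int"}),
    productions=(
        ("E", ("E", "+", "T")),
        ("E", ("T",)),
        ("T", ("int",)),
    ),
    start="E",
)


def derive_leftmost(grammar, choices):
    """Apply productions (by index) to the leftmost nonterminal; return every sentential form."""
    form = [grammar.start]
    steps = [tuple(form)]
    for index in choices:
        head, body = grammar.productions[index]
        # Locate the leftmost nonterminal in the current sentential form.
        position = next(
            (i for i, sym in enumerate(form) if sym in grammar.nonterminals), None
        )
        if position is None or form[position] != head:
            raise ValueError(f"production {index} does not apply here")
        form[position:position + 1] = body
        steps.append(tuple(form))
    return steps
```

Calling `derive_leftmost(G, [0, 1, 2, 2])` walks from the start symbol E to the terminal string `int + int`, one production at a time.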

  18. Ambiguity: A context-free grammar is said to be ambiguous if some string has more than one parse tree (equivalently, more than one leftmost derivation). Consider:
      (1) S → ASB
      (2) S → ε
      (3) A → a
      (4) B → b
      The string ab can be derived by expanding A and B in either order, but both orders build the same parse tree, so this grammar is not ambiguous.

  19. Ambiguity: Consider:
      (1) Expr → Expr + Expr
      (2) Expr → Expr * Expr
      (3) Expr → ( Expr )
      (4) Expr → var
      (5) Expr → const
      There are two different derivation trees for the string var+var*var.
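The two trees can be found mechanically. The following sketch enumerates every parse tree of a token list under the Expr grammar by trying each `+` or `*` as the root of the tree. The function name `parse_trees` and the nested-tuple representation of trees are choices made for this illustration only.

```python
from functools import lru_cache


def parse_trees(tokens):
    """Return every parse tree of tokens under the ambiguous Expr grammar."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(lo, hi):
        # All trees that derive tokens[lo:hi] from Expr.
        found = []
        if hi - lo == 1 and tokens[lo] in ("var", "const"):
            found.append(tokens[lo])  # Expr -> var | const
        if hi - lo >= 3 and tokens[lo] == "(" and tokens[hi - 1] == ")":
            # Expr -> ( Expr )
            found.extend(("paren", inner) for inner in trees(lo + 1, hi - 1))
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                # Expr -> Expr + Expr | Expr * Expr, with tokens[mid] as the root
                for left in trees(lo, mid):
                    for right in trees(mid + 1, hi):
                        found.append((tokens[mid], left, right))
        return tuple(found)

    return list(trees(0, len(tokens)))
```

For `["var", "+", "var", "*", "var"]` the function returns two trees: one with `+` at the root (multiplication grouped first) and one with `*` at the root (addition grouped first). These are the two readings the slide refers to.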

  22. Ambiguity: We need unambiguous grammars for parsing. The derivation determines the shape of the parse tree / abstract syntax tree, which in turn determines meaning. If a grammar can be made unambiguous at all, it is usually made unambiguous through layering:
      - Have exactly one way to build each piece of the string.
      - Have exactly one way of combining those pieces back together.
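As an illustration of layering, the ambiguous Expr grammar can be split into one nonterminal per precedence level. The layered grammar below and the nonterminal names Term and Factor are an assumed, commonly used arrangement, not necessarily the one presented in the lecture:

    Expr → Expr + Term | Term
    Term → Term * Factor | Factor
    Factor → ( Expr ) | var | const

A minimal recursive-descent sketch of this grammar follows, with the left-recursive rules written as loops. It returns a simplified abstract syntax tree as nested tuples.

```python
def parse(tokens):
    """Parse tokens with the layered grammar; return a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} but found {peek()!r}")
        pos += 1

    def expr():  # Expr -> Term ('+' Term)*
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())  # left-associative
        return node

    def term():  # Term -> Factor ('*' Factor)*
        node = factor()
        while peek() == "*":
            expect("*")
            node = ("*", node, factor())  # left-associative
        return node

    def factor():  # Factor -> '(' Expr ')' | var | const
        nonlocal pos
        tok = peek()
        if tok == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        if tok in ("var", "const"):
            pos += 1
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Because multiplication can only be built inside a Term, and Terms can only be combined by addition, `var + var * var` now has a single reading, with the multiplication grouped first.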

  24. Resolving Ambiguity: With grammar: If you can redesign the language, you can avoid the problem entirely, e.g., by introducing an end keyword that matches the closest if. With tools: Most parser tools can cope with ambiguous grammars.
      - Typically one can specify operator precedence and associativity.
      - This allows a simpler, ambiguous grammar with fewer nonterminals to serve as the basis for the generated parser, without creating problems.

  26. Precedence Declaration: If we leave the world of pure CFGs, we can often resolve ambiguities through precedence declarations, e.g., multiplication has higher precedence than addition, but lower precedence than exponentiation. This allows for unambiguous parsing of ambiguous grammars.
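One way a hand-written parser can use such declarations is precedence climbing: the grammar stays flat and ambiguous, and a table decides how operators bind. The sketch below is an illustration under assumptions. The table `PRECEDENCE`, the `^` token for exponentiation, and its right associativity are choices made for this example and do not come from the slides.

```python
# Hypothetical precedence table: operator -> (precedence level, associativity).
PRECEDENCE = {
    "+": (1, "left"),
    "*": (2, "left"),
    "^": (3, "right"),
}


def parse_expression(tokens):
    """Parse a flat token list, resolving ambiguity with the PRECEDENCE table."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def atom():  # '(' expression ')' | var | const
        nonlocal pos
        tok = peek()
        if tok == "(":
            pos += 1
            node = climb(1)
            if peek() != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return node
        if tok in ("var", "const"):
            pos += 1
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    def climb(min_level):
        nonlocal pos
        node = atom()
        # Keep absorbing operators that bind at least as tightly as min_level.
        while peek() in PRECEDENCE and PRECEDENCE[peek()][0] >= min_level:
            op = peek()
            level, assoc = PRECEDENCE[op]
            pos += 1
            # A left-associative operator needs a strictly tighter right operand.
            right = climb(level + 1 if assoc == "left" else level)
            node = (op, node, right)
        return node

    tree = climb(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Changing an entry in the table changes how expressions group without touching the parsing code, which is the convenience that precedence declarations in parser generators offer.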
