
CSE 3341: Principles of Programming Languages: Syntax (Jeremy Morris)



  1. CSE 3341: Principles of Programming Languages: Syntax (Jeremy Morris)

  2. Syntax vs. Semantics
     - Syntax: What kinds of symbols are allowed in a language?
     - Semantics: What do the symbols in a language mean?

  3. Language Terminology
     - Alphabet: a finite set of symbols
     - String: a sequence of symbols
     - Language: a set of strings over an alphabet
     - Grammar: rules that define which strings over an alphabet are in the language and which ones are not

  4. Terminology Example
     Consider the Java programming language:
     - Alphabet: the tokens in the Java language
       - if, else, while, do, >, <, String, variable names, etc.
       - Note: not the individual characters, so not your intuitive understanding of the term “alphabet”
     - String: a sequence of tokens from the alphabet
     - Language: the set of all syntactically correct Java programs
     - Grammar: the rules for producing syntactically correct Java programs
       - https://docs.oracle.com/javase/specs/jls/se8/html/index.html
       - (It’s a nearly 800-page book; you don’t need to read it)

  5. Language Terminology
     We typically talk about languages in mathematical terms, as sets:
     - Alphabet: a finite set of symbols, often denoted Σ
     - String: a finite sequence of symbols from Σ
       - Empty string: ε, a sequence of length 0
       - Σ*: the set of all strings over Σ (including ε)
         - The * represents the “Kleene closure”; we’ll discuss this more later
       - Σ+: the set of all non-empty strings over Σ
         - The + represents “one or more” where the * represents “zero or more”
     - Language: a set of strings
       - Language L ⊆ Σ*
       - Defined by a grammar
       - Probably will not contain everything in Σ*
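
Σ* is infinite for any non-empty alphabet, so it cannot be listed in full, but a finite slice of it can. The sketch below enumerates every string up to a chosen length over a small example alphabet. The function name `strings_up_to` and the alphabet {a, b} are illustrative assumptions, not part of the slides.

```python
from itertools import product

def strings_up_to(alphabet, max_len, include_empty=True):
    """Yield every string over `alphabet` of length <= max_len, shortest first."""
    start = 0 if include_empty else 1
    for length in range(start, max_len + 1):
        # product(..., repeat=0) yields one empty tuple, which joins to ε ("")
        for symbols in product(sorted(alphabet), repeat=length):
            yield "".join(symbols)

sigma = {"a", "b"}                                          # example alphabet (assumption)
kleene_star_slice = list(strings_up_to(sigma, 2))           # finite slice of Σ*
kleene_plus_slice = list(strings_up_to(sigma, 2, False))    # finite slice of Σ+

print(kleene_star_slice)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The only difference between the two slices is the empty string, which is in Σ* but not in Σ+.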

  6. Syntax - Specification
     - We use syntax rules to specify the syntax of a language
       - Language: the set of all strings that the rules can produce
     - Some rules for non-negative integers:
         number → digit digit*
         digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
     - With these we can specify any non-negative integer.

  7. Syntax Rule Terminology
     - Terminal symbol
       - Any symbol that represents a member of the alphabet for the language, i.e. any symbol that is in the set of all possible tokens for the language
       - Will only appear on the right-hand side of a syntax rule (at least for our purposes; not strictly true)
     - Non-terminal symbol
       - Any symbol that represents a rule to be expanded
       - Non-terminal, meaning “we need to keep going”
       - Can appear on either the left-hand or the right-hand side of a syntax rule
     - Meta-symbols
       - Symbols used to write the rules, but not part of the alphabet or the non-terminals
       - →, |, *, etc.

  8. Terminology Example
         number → digit digit*
         digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
     Which of these are terminal symbols? Non-terminal? Meta?

  9. Syntax - Types of Grammars
     - Chomsky Hierarchy
       - Outlines how complex formal languages are, based on their rules
         - Type-0: Unrestricted (aka Recursively enumerable)
         - Type-1: Context-sensitive
         - Type-2: Context-free
         - Type-3: Regular
     - We will focus on those last two

  10. Regular Languages (described by Regular Expressions)
      - The simplest kind of grammar
      - Requires only 3 kinds of rules:
        - Concatenation: join two things together
        - Alternation: select between two choices
        - “Kleene closure”: repeat something zero or more times
      - No recursion is allowed
        - If we allow recursion, then we get Context-free grammars

  11. Regular Languages (described by Regular Expressions)
      Assume an alphabet Σ. A regular expression over Σ is:
      - ∅: the empty set
      - ε: the empty string
      - Any member of Σ: each symbol a ∈ Σ is a regular expression denoting the set { a }
      - Concatenation
        - If R and S are both regular expressions over Σ, then so is RS
        - RS = { rs | r ∈ R and s ∈ S }
      - Alternation
        - If R and S are both regular expressions over Σ, then so is R ∪ S
        - Written as R|S: choose between R or S
      - “Kleene closure”
        - If R is a regular expression over Σ, then so is R*
        - R repeated 0 or more times: R concatenated with itself
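
Reading R and S as the sets of strings they describe, the three operations become ordinary set operations. The sketch below is one way to express that in Python. The function names are illustrative assumptions, and `star` takes a repetition bound because the true Kleene closure is an infinite set.

```python
def concat(r, s):
    """Concatenation: every string of r followed by every string of s."""
    return {x + y for x in r for y in s}

def alternate(r, s):
    """Alternation: a string from r or a string from s (set union)."""
    return r | s

def star(r, max_reps):
    """Bounded Kleene closure: r concatenated with itself 0..max_reps times."""
    result = {""}        # zero repetitions gives the empty string ε
    current = {""}
    for _ in range(max_reps):
        current = concat(current, r)
        result |= current
    return result

R = {"a"}
S = {"b", "c"}
print(sorted(concat(R, S)))      # ['ab', 'ac']
print(sorted(alternate(R, S)))   # ['a', 'b', 'c']
print(sorted(star({"ab"}, 2)))   # ['', 'ab', 'abab']
```

Note that concatenating with the empty set gives the empty set, while concatenating with {ε} leaves a set unchanged, which is why ∅ and ε are listed as separate base cases.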

  12. Regular Languages
      - In syntax rules we can define a regular language like this:
          number → digit digit*
          digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
      - Another way of saying:
          Σ = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
          number = { d d* | d ∈ Σ }
      - (There might be a problem with this definition of a natural number. Can you spot it?)
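
The `number` rule translates almost symbol for symbol into a regular expression in a programming language. The sketch below uses Python's `re` module. The helper name `is_number` is an assumption for illustration. Trying a few inputs is one way to explore the question the slide poses, without claiming which issue the slide has in mind.

```python
import re

# digit  -> 0 | 1 | ... | 9   becomes the character class [0-9]
# number -> digit digit*      becomes [0-9][0-9]*
NUMBER = re.compile(r"[0-9][0-9]*")

def is_number(text):
    """Return True if the whole of `text` matches the number rule."""
    return NUMBER.fullmatch(text) is not None

for candidate in ["0", "42", "007", "", "4a2"]:
    print(repr(candidate), is_number(candidate))
```

`fullmatch` is used rather than `match` so that trailing characters such as the `a2` in `4a2` cause a rejection.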

  13. Regular Languages
      - Another example (from the textbook): numeric constants
          number → integer | real
          integer → digit digit*
          real → integer exp | decimal (exp | ε)
          decimal → digit* (. digit | digit .) digit*
          exp → (e | E) (+ | - | ε) integer
          digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0

  14. Derivations
      - Using syntax rules we can derive strings that are in our language
      - Using the previous set of rules, can we show that 655 is in our language of “numeric constants”?
          number ⇒ integer
                 ⇒ digit digit*
                 ⇒ 6 digit*
                 ⇒ 6 5 digit*
                 ⇒ 6 5 5 digit*
                 ⇒ 6 5 5
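
The same derivation can be replayed mechanically for any string of digits. The sketch below builds the list of sentential forms, expanding the leftmost `digit` at each step and dropping `digit*` at the end. The function name `derive_integer` is a hypothetical helper, and it only covers the `number → integer` branch of the grammar.

```python
def derive_integer(digits):
    """Return the sentential forms of a derivation of `digits` from `number`."""
    if not digits or not all(ch in "0123456789" for ch in digits):
        raise ValueError("expected a non-empty string of digits")
    steps = ["number", "integer", "digit digit*"]
    prefix = []
    for ch in digits:
        prefix.append(ch)
        # one more terminal fixed, with digit* still available to expand
        steps.append(" ".join(prefix) + " digit*")
    steps.append(" ".join(prefix))   # digit* expands to nothing
    return steps

print(" ⇒ ".join(derive_integer("655")))
```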

  15. Derivations Example
      - Using the rules on the previous slide, determine if the following strings are in the language for numeric constants:
        - 10e5
        - .65e30
        - .65e0.30
        - 10.0e5.0
        - 10.0e-5
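
Because the numeric-constant rules use no recursion, they can be transcribed rule by rule into a single regular expression. The sketch below does that with Python's `re` module. The constant names and the helper `is_numeric_constant` are assumptions for illustration. Running it is one way to check answers worked out by hand.

```python
import re

DIGIT = r"[0-9]"                                          # digit
INTEGER = rf"{DIGIT}{DIGIT}*"                             # integer -> digit digit*
EXP = rf"[eE][+-]?{INTEGER}"                              # exp -> (e|E) (+|-|ε) integer
DECIMAL = rf"{DIGIT}*(?:\.{DIGIT}|{DIGIT}\.){DIGIT}*"     # decimal
REAL = rf"(?:{INTEGER}{EXP}|{DECIMAL}(?:{EXP})?)"         # real
NUMBER = re.compile(rf"(?:{INTEGER}|{REAL})")             # number -> integer | real

def is_numeric_constant(text):
    """Return True if the whole of `text` is derivable from `number`."""
    return NUMBER.fullmatch(text) is not None

for candidate in ["10e5", ".65e30", ".65e0.30", "10.0e5.0", "10.0e-5"]:
    print(candidate, is_numeric_constant(candidate))
```

Each Python constant mirrors one grammar rule, so a disagreement between the program and a hand derivation points at a specific rule to re-read.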

  16. Context-Free Languages
      - The Chomsky Hierarchy mentioned above is a containment hierarchy
        - All Regular Languages are also Context-Free, but not all Context-Free Languages are Regular
      - Consider the language L = { a^n b^n | n ≥ 0 }
        - The empty string, ab, aabb, aaabbb, etc. are all in this language
        - aabbb, aaabb, a, etc. are not
      - Can we derive the rules for this language using only the rules set out for regular languages?
        - No, as it turns out.
        - You can prove this mathematically using a theorem known as the pumping lemma, but that’s outside the scope of this class (see CSE 3321, Formal Languages and Automata)
      - But if we allow recursion we can do it easily
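
With recursion allowed, one grammar for this language is S → a S b | ε. That rule is a standard textbook choice rather than something taken from the slides. The sketch below mirrors it directly as a recursive function. The name `in_anbn` is an assumption for illustration.

```python
def in_anbn(text):
    """Return True if `text` has the form a^n b^n for some n >= 0."""
    if text == "":
        return True                       # S -> ε
    if text.startswith("a") and text.endswith("b"):
        return in_anbn(text[1:-1])        # S -> a S b: strip one a and one b
    return False

for candidate in ["", "ab", "aabb", "aaabbb", "aabbb", "aaabb", "a"]:
    print(repr(candidate), in_anbn(candidate))
```

The function recurses once per matched pair of a and b, which is the kind of unbounded counting the regular-language rules alone cannot express.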

  17. Context-Free Grammars (CFGs)
      - A grammar that defines a Context-Free language has the same properties as a Regular grammar…
        - Concatenation, Alternation, Kleene Closure
      - …but allows for recursion in its rules
        - Either immediate recursion: the same non-terminal appears on both the left-hand and right-hand side of one rule
          - We’ll see an example of this on the next slide
        - Or mutual recursion: a non-terminal on the left expands to a rule that eventually expands back to that same non-terminal
          - We’ll see an example of this in a moment, so hang in there

  18. Context-Free Grammars (CFGs)
      - The following grammar is not Regular, but is Context-Free:
          expr → number | expr op expr | ( expr )
          op → + | - | / | *
          number → digit digit*
          digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
      - Note the recursion in the rule for expanding expr
      - This grammar is problematic…
        - Let’s derive 1+3*2 using these rules

  19. Context-Free Grammars
      - We can represent a derivation graphically as a parse tree or syntax tree
        - The root of the tree is the start symbol for the grammar
        - The internal nodes are non-terminal symbols
        - The leaf nodes are terminal symbols
      - Parse tree for 1+3*2 (the expression from the previous slide):
          expr
          ├── expr
          │   └── number
          │       └── 1
          ├── op
          │   └── +
          └── expr
              ├── expr
              │   └── number
              │       └── 3
              ├── op
              │   └── *
              └── expr
                  └── number
                      └── 2

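  20. Context-Free Grammars
      - Consider these two trees, both derived from the above expr grammar for 1+3*2:
        - Tree 1, grouping as (1+3)*2:
            expr
            ├── expr
            │   ├── expr
            │   │   └── number
            │   │       └── 1
            │   ├── op
            │   │   └── +
            │   └── expr
            │       └── number
            │           └── 3
            ├── op
            │   └── *
            └── expr
                └── number
                    └── 2
        - Tree 2, grouping as 1+(3*2):
            expr
            ├── expr
            │   └── number
            │       └── 1
            ├── op
            │   └── +
            └── expr
                ├── expr
                │   └── number
                │       └── 3
                ├── op
                │   └── *
                └── expr
                    └── number
                        └── 2
      - One string with two different parse trees means the grammar is ambiguous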
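
The two trees matter because they lead to different results once the expression is evaluated. The sketch below encodes each tree as nested tuples of the form (left, op, right) and evaluates both. The tuple encoding and the names `evaluate`, `grouped_left`, and `grouped_right` are assumptions for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(tree):
    """Evaluate a parse tree given as an int leaf or a (left, op, right) tuple."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))

grouped_left = ((1, "+", 3), "*", 2)     # the tree that groups 1+3 first
grouped_right = (1, "+", (3, "*", 2))    # the tree that groups 3*2 first

print(evaluate(grouped_left))   # 8
print(evaluate(grouped_right))  # 7
```

Both trees are valid under the ambiguous grammar, so the grammar alone does not say whether 1+3*2 should be 8 or 7.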

  21. Context-Free Grammars
      - A better, unambiguous grammar:
          expr → term | expr add_op term
          term → factor | term mult_op factor
          factor → number | ( expr )
          mult_op → * | /
          add_op → + | -
          number → digit digit*
          digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
      - Still not Regular, but Context-Free
        - Recursion is still there
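
A small hand-written parser shows how this grammar fixes the grouping. The sketch below is a recursive-descent parser, which is a technique swapped in here and not one the slides prescribe. The left-recursive rules `expr → expr add_op term` and `term → term mult_op factor` cannot be coded as direct recursive calls without looping forever, so each is written as a loop that builds the same left-leaning tree. The function name `parse` and the tuple-based tree are assumptions.

```python
import re

def parse(text):
    """Parse an arithmetic expression into nested (left, op, right) tuples."""
    # Sketch-level tokenizing: characters outside the grammar are ignored.
    tokens = re.findall(r"[0-9]+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():                       # expr -> term | expr add_op term
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (node, op, term())
        return node

    def term():                       # term -> factor | term mult_op factor
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (node, op, factor())
        return node

    def factor():                     # factor -> number | ( expr )
        token = peek()
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token is not None and token.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token: {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token: {peek()!r}")
    return tree

print(parse("1+3*2"))    # (1, '+', (3, '*', 2))
print(parse("(1+3)*2"))  # ((1, '+', 3), '*', 2)
print(parse("8-3-2"))    # ((8, '-', 3), '-', 2)
```

There is now only one tree for 1+3*2, and multiplication binds tighter because `term` sits below `expr` in the grammar.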

  22. Languages in Compilers & Interpreters
      Stream of characters → Tokenizer/Scanner → Stream of tokens → Parser → Parse tree → Next steps
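
The first stage of that pipeline can be sketched with regular expressions, since token shapes are usually regular. The example below scans arithmetic expressions into (kind, text) pairs. The token names, the `TOKEN_SPEC` table, and the function `scan` are assumptions for illustration and are not taken from any particular compiler.

```python
import re

TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("OP", r"[-+*/]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),      # whitespace separates tokens but is not one itself
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def scan(text):
    """Turn a stream of characters into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(scan("12 + 3*(4)"))
```

The parser then works on these tokens as its alphabet, which matches the earlier point that the alphabet of a programming language is its tokens, not its characters.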

  23. Syntax - Specification
      - The previous syntax rules are one type of notation for a syntax:
          number → digit digit*
          digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
      - Here’s another:
          <number> ::= <digit> | <digit> <number>
          <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
        - Backus-Naur Form (aka Backus normal form, aka BNF)
      - Note that pure BNF does not use Kleene-star or Kleene-plus
        - Other extensions provide shorthand to allow these, but leaving them out does not change the expressiveness (the recursive <number> rule above shows how to replace the Kleene star)

  24. BNF Specification
          <number> ::= <digit> | <digit> <number>
          <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
      - Special symbols: <, >, | and ::=
        - Reserved (or ‘meta’) symbols
      - Non-terminals
        - Wrapped in <> tags: <digit> or <number>
        - Indicate rules that need to be expanded
      - Terminals
        - Not wrapped in <> tags
        - Indicate “terminal” symbols: no more expansion
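
The recursive BNF rule for `<number>` does the job of the Kleene star, and a recursive function makes that concrete. The sketch below follows the two alternatives of `<number>` one for one. The function names `is_digit` and `is_number` are assumptions for illustration.

```python
DIGITS = set("0123456789")

def is_digit(text):
    """<digit> ::= 0 | 1 | ... | 9"""
    return len(text) == 1 and text in DIGITS

def is_number(text):
    """<number> ::= <digit> | <digit> <number>"""
    if is_digit(text):
        return True                                   # first alternative
    # second alternative: one digit, then a shorter <number>
    return len(text) > 1 and is_digit(text[0]) and is_number(text[1:])

print(is_number("655"))  # True
print(is_number("6a5"))  # False
print(is_number(""))     # False
```

Each recursive call consumes one digit, so the recursion plays the role that `digit*` plays in the starred notation.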
