
Chapter 6: Syntax (PowerPoint presentation)



  1. Chapter 6: Syntax

  2. Syntax  Syntax is the structure of a language.  Earlier, both syntax and semantics were described using lengthy English language explanations.  Although semantics are still described in English, syntax is described using a formal system. 2

  3. Syntax  In the 1950s, Noam Chomsky developed the idea of context-free grammars.  John Backus, with contributions by Peter Naur, developed a notational system for describing context-free grammars: The Backus-Naur Forms (BNF) 3

  4. Syntax  BNF was first used to describe the syntax of Algol60.  Later used to describe C, Java, and Ada. Every modern programmer and computer scientist must know how to read, interpret, and apply BNF descriptions of language syntax. 4

  5. Syntax  BNFs occur in three basic forms:  Original BNF  Extended BNF (EBNF) (Popularized by Niklaus Wirth)  Syntax Diagrams 5

  6. Lexical Structure  The lexical structure of a programming language is the structure of its words.  Can be considered separate from syntax, but is VERY closely related to it. 6

  7. Lexical Structure  Typically, the scanning phase of a translator collects sequences of characters from the input program into tokens.  Tokens are then processed by a parsing phase, which determines the syntactic structure.  Tokens can be defined using either grammar or regular expressions (to describe text patterns). 7

  8. Lexical Structure
     • Tokens fall into several distinct categories:
       – Reserved words (keywords): if, while, else (main, despite its special role, is an ordinary identifier in C and Java)
       – Literals or constants: 42, 27.5, "Hello", 'A'
       – Special symbols: > >= < ; , +
       – Identifiers: X24, var1, balance
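As a rough illustration of these categories, the Python sketch below classifies single lexemes. The patterns and the three-word reserved-word set are simplified assumptions chosen to cover only the examples on this slide, not the token rules of any particular language.

```python
import re

# Assumed reserved-word set: only the examples from the slide.
RESERVED = {"if", "while", "else"}

# Simplified, hypothetical patterns, tried in order.
CATEGORIES = [
    ("literal", r"\d+(?:\.\d+)?|\"[^\"]*\"|'.'"),  # numbers, "strings", 'c'
    ("special", r">=|[<>;,+]"),                    # special symbols
    ("name", r"[A-Za-z_][A-Za-z0-9_]*"),           # keywords or identifiers
]


def classify(lexeme: str) -> str:
    """Return the token category of a single lexeme."""
    for category, pattern in CATEGORIES:
        if re.fullmatch(pattern, lexeme):
            if category == "name":
                # A name that is a reserved word is a keyword, not an identifier.
                return "reserved" if lexeme in RESERVED else "identifier"
            return category
    raise ValueError(f"unknown lexeme: {lexeme!r}")


for lexeme in ["if", "42", "27.5", '"Hello"', "'A'", ">=", ";", "X24", "balance"]:
    print(lexeme, "->", classify(lexeme))
```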

  9. Lexical Structure  Java reserved words: abstract default if private this boolean implements protected throw do break double import public throws byte else instanceof return transient case extends int short try catch final interface static void char finally long strictfp volatile class float native super while const for new switch continue goto package synchronized 9

  10. Lexical Structure  Identifiers may not be names as keywords.  Keywords may also be called predefined identifiers.  In some languages, identifiers have a fixed maximum length. 10

  11. Lexical Structure  Some programming languages allow arbitrary length of identifiers, but only the first six or eight characters may be guaranteed to be significant (very confusing for programmers). 11

  12. Lexical Structure
     • What about doif?
       – Is it an identifier called "doif"?
       – Or is it the keywords do and if?

  13. Lexical Structure  Principle of longest substring (Principle of maximum munch):  At each point, the longest possible string of characters is collected into a single token. 13

  14. Lexical Structure  The principle of longest substring requires that certain tokens be separated by token delimiters or white space.  End of lines may be significant, indentation may also be significant. 14

  15. Lexical Structure  A free format language is one where the format does not affect the program structure (Except to satisfy the principle of longest substring of course). Example:  Put as many blank lines as you want.  Put as many spaces as you want between identifiers.  Most modern languages are free format. 15

  16. Lexical Structure  FORTRAN is a primary example of a language violating the free format conventions.  As pre-processing, FORTRAN totally ignores white spaces. They are removed before processing starts.  FORTRAN has no reserved words at all. 16

  17. Lexical Structure  Regular expressions:  Are descriptions of patterns of characters.  Composed of three basic operations:  Concatenation  Repetition  Choice (selection) 17

  18. Lexical Structure  Regular expressions:  Example, describe using a regular expression the occurrence of: Repetition Choice  0 or more repetitions of either a or b  Followed by the single character c (concatenation)  Such as: Concatenation  aaaaabbbbbbc  abbbbbbbbbc  abaaaabbbbaaaaabc  c  abaaabbbbc  bbbbbbc 18

  19. Lexical Structure  Regular expressions:  Example, describe using a regular expression the occurrence of:  0 or more repetitions of either a or b  Followed by the single character c (concatenation)  Example of rejected strings:  bca  cabbbb  b  a  aaaabbb 19

  20. Lexical Structure  Regular expressions:  Example, describe using a regular expression the occurrence of:  0 or more repetitions of either a or b  Followed by the single character c (concatenation)  The regular expression is:  (a | b)* c  The | means OR  The * means zero or more occurrences 20

  21. Lexical Structure
     • Regular expressions:
       – Regular expression notation is often extended with additional operators such as "+".
       – (a | b)+
         • Means ONE or more occurrences of either a or b.
         • Equivalent to (a | b) (a | b)*
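One way to gain confidence in the stated equivalence is to compare the two patterns on every short string. The Python sketch below checks all strings over the alphabet {a, b, c} up to length 4; an exhaustive check over a finite range is evidence, not a proof.

```python
import re
from itertools import product

plus = re.compile(r"(a|b)+")
expanded = re.compile(r"(a|b)(a|b)*")


def agree(max_len: int = 4) -> bool:
    """Compare both patterns on every string over {a, b, c} of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product("abc", repeat=n):
            s = "".join(chars)
            if bool(plus.fullmatch(s)) != bool(expanded.fullmatch(s)):
                return False
    return True


print(agree())                   # True
print(bool(plus.fullmatch("")))  # False: + needs at least one a or b
```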

  22. Lexical Structure  Regular expressions:  Example: write a regular expression for integer constants: i.e. one or more digits.  Note [a-b] means a range 22

  23. Lexical Structure  Regular expressions:  Example: write a regular expression for integer constants  [0-9]+ 23

  24. Lexical Structure  Regular expressions:  Example: write a regular expression for floating point constants: One or more digits followed by an optional decimal point then one or more digits.  [0-9]+(\.[0-9]+)? Escape Optional Sequence 24

  25. Lexical Structure  Regular expressions:  Most modern text editors allow for defining regular expressions to perform searching.  Search utilities such as UNIX grep also uses it.  Lex can also be used to turn regular expressions into an automatic scanner! 25

  26. Lexical Structure  Regular expressions:  Can you write a small lexical analyzer to recognize certain tokens.  Can you write a small scanner to accept a simple expression consisting of the tokens you previously recognized?. 26

  27. Parsing Techniques and Tools
     • A scanner that identifies tokens described by regular expressions can be generated automatically from those regular expressions.
     • Lex is a famous scanner generator.
     • Its free, open-source counterpart is called Flex (Fast Lex).
     • To be covered in detail in a compiler course.

  28. Context-Free Grammars and BNFs  Grammar of a Simple English Sentence Example: sentence -> noun_phrase verb_phrase . noun_phrase -> article noun article -> a | the noun -> girl | dog verb_phrase -> verb noun_phrase verb -> sees | pets OR 28

  29. Context-Free Grammars and BNFs
     • Grammar of a simple English sentence, for example:
       – One can alternatively use a different notation, such as:
         <sentence> ::= <noun_phrase> <verb_phrase> '.'
       – But the quotation marks around the full stop then become metasymbols themselves.

  30. Context-Free Grammars and BNFs  There is an ISO standard format for BNF notation.  ISO 14977 [1996] 30

  31. Context-Free Grammars and BNFs
     • Question: does the sentence "The girl sees a dog." belong to the language of the grammar given earlier?
     • We go through a process of derivation to see whether the grammar accepts this sentence.
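As a worked example, assuming the lowercase sentence the girl sees a dog . (the grammar contains no capitalized The), a leftmost derivation replaces the leftmost non-terminal at each step using exactly one grammar rule:

```latex
\begin{aligned}
\mathit{sentence}
&\Rightarrow \mathit{noun\_phrase}\ \mathit{verb\_phrase}\ \texttt{.}\\
&\Rightarrow \mathit{article}\ \mathit{noun}\ \mathit{verb\_phrase}\ \texttt{.}\\
&\Rightarrow \texttt{the}\ \mathit{noun}\ \mathit{verb\_phrase}\ \texttt{.}\\
&\Rightarrow \texttt{the girl}\ \mathit{verb\_phrase}\ \texttt{.}\\
&\Rightarrow \texttt{the girl}\ \mathit{verb}\ \mathit{noun\_phrase}\ \texttt{.}\\
&\Rightarrow \texttt{the girl sees}\ \mathit{noun\_phrase}\ \texttt{.}\\
&\Rightarrow \texttt{the girl sees}\ \mathit{article}\ \mathit{noun}\ \texttt{.}\\
&\Rightarrow \texttt{the girl sees a}\ \mathit{noun}\ \texttt{.}\\
&\Rightarrow \texttt{the girl sees a dog .}
\end{aligned}
```

The derivation ends in a string made only of terminals, so the lowercase sentence is in the language; no derivation can produce the capitalized The.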

  32. Context-Free Grammars and BNFs  Exercise: Is it possible to derive:  The girl sees a dog.  From the following grammar? sentence -> noun_phrase verb_phrase . noun_phrase -> article noun article -> a | the noun -> girl | dog verb_phrase -> verb noun_phrase verb -> sees | pets 32

  33. Context-Free Grammars and BNFs
     • There are two primary problems with the previous grammar:
       – thegirlseesadog. is also an acceptable sentence.
         • The grammar itself says nothing about spaces; it is up to the scanner to deal with them.
       – The grammar does not specify that an article at the beginning of a sentence should be capitalized.
         • Such "positional" properties are often hard to express with context-free grammars.

  34. Context-Free Grammars and BNFs  Terminology: Start Symbol Metasymbol Non-Terminal sentence -> noun_phrase verb_phrase . noun_phrase -> article noun article -> a | the Terminal noun -> girl | dog verb_phrase -> verb noun_phrase verb -> sees | pets Production (Grammar Rule) 34

  35. Context-Free Grammars and BNFs
     • Definitions:
       – A context-free grammar consists of a series of grammar rules:
         • Each rule has a left-hand side that is a single structure name (a non-terminal),
         • followed by the metasymbol "->",
         • followed by a right-hand side consisting of a sequence of terminals and non-terminals, with alternatives separated by |.
       – Productions are in BNF if they use only the metasymbols:
         • ->
         • |
         • sometimes parentheses

  36. Context-Free Grammars and BNFs  Definitions:  A context-free language:  Defines the language of the grammar.  This language is the set of all strings of terminals for which there exists a derivation beginning with the start symbol and ending with the string of terminals. 36
