Semantic Analysis

Outline
• The role of semantic analysis in a compiler
  – A laundry list of tasks
• Scope
  – Static vs. Dynamic scoping
  – Implementation: symbol tables
• Types
  – Static analyses that detect type errors
  – Statically vs. Dynamically typed languages

Where We Are: The Compiler Front-End
Lexical analysis: program is lexically well-formed
  – Tokens are legal
    • e.g. identifiers have valid names, no stray characters, etc.
  – Detects inputs with illegal tokens
Parsing: program is syntactically well-formed
  – Declarations have correct structure, expressions are syntactically valid, etc.
  – Detects inputs with ill-formed syntax
Semantic analysis:
  – Last “front end” compilation phase
  – Catches all remaining errors
Beyond Syntax Errors
• What’s wrong with this C code? (Note: it parses correctly)

    foo(int a, char * s){...}
    int bar() {
      int f[3];
      int i, j, k;
      char q, *p;
      float k;
      foo(f[6], 10, j);
      break;
      i->val = 42;
      j = m + k;
      printf("%s,%s.\n",p,q);
      goto label42;
    }

• Undeclared identifier
• Multiply declared identifier
• Index out of bounds
• Wrong number or types of arguments to function call
• Incompatible types for operation
• break statement outside switch/loop
• goto with no label

Why Have a Separate Semantic Analysis?
Parsing cannot catch some errors
Some language constructs are not context-free
  – Example: Identifier declaration and use
  – An abstract version of the problem is:
      L = { wcw | w ∈ (a + b)* }
  – The 1st w represents the identifier’s declaration; the 2nd w represents a use of the identifier
  – This language is not context-free

What Does Semantic Analysis Do?
Performs checks beyond syntax of many kinds ...
Examples:
  1. All used identifiers are declared
  2. Identifiers declared only once
  3. Types
  4. Procedures and functions defined only once
  5. Procedures and functions used with the right number and type of arguments
And others . . .
The requirements depend on the language

What’s Wrong?
Example 1
    let string y ← "abc" in y + 42
Example 2
    let integer y in x + 42
Attributes of an Identifier
name: character string (obtained from scanner)
scope: program region in which identifier is valid
type:
  – integer
  – array:
    • number of dimensions
    • upper and lower bounds for each dimension
    • type of elements
  – function:
    • number and type of parameters (in order)
    • type of returned value
    • size of stack frame

Semantic Processing: Syntax-Directed Translation
Basic idea: Associate information with language constructs by attaching attributes to the grammar symbols that represent these constructs
  – Values for attributes are computed using semantic rules associated with grammar productions
  – An attribute can represent anything (reasonable) that we choose; e.g. a string, number, type, etc.
  – A parse tree showing the values of attributes at each node is called an annotated parse tree

Scope
• The scope of an identifier (a binding of a name to the entity it names) is the textual part of the program in which the binding is active
• Scope matches identifier declarations with uses
  – Important static analysis step in most languages

Scope (Cont.)
• The scope of an identifier is the portion of a program in which that identifier is accessible
• The same identifier may refer to different things in different parts of the program
  – Different scopes for same name don’t overlap
• An identifier may have restricted scope
Static vs. Dynamic Scope
• Most languages have static (lexical) scope
  – Scope depends only on the physical structure of program text, not its run-time behavior
  – The determination of scope is made by the compiler
  – C, Java, ML have static scope; so do most languages
• A few languages are dynamically scoped
  – Lisp, SNOBOL
  – Lisp has changed to mostly static scoping
  – Scope depends on execution of the program

Static Scoping Example

    let integer x ← 0 in
    {
      x;
      let integer x ← 1 in
        x;
      x;
    }

Uses of x refer to closest enclosing definition

Dynamic Scope
• A dynamically-scoped variable refers to the closest enclosing binding in the execution of the program
Example

    g(y) = let integer a ← 42 in f(3);
    f(x) = a;

  – When invoking g(54) the result will be 42

Static vs. Dynamic Scope

    Program scopes (input, output);
    var a: integer;
    procedure first;
      begin a := 1; end;
    procedure second;
      var a: integer;
      begin first; end;
    begin
      a := 2; second; write(a);
    end.

With static scope rules, it prints 1
With dynamic scope rules, it prints 2
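To see why the Pascal-style `scopes` program prints 1 under static scope and 2 under dynamic scope, the following Python sketch simulates both lookup rules. It is only an illustration: the `run` function, the explicit `call_stack`, and the dictionary frames are assumptions introduced here, not part of the slides.

```python
def run(dynamic):
    """Simulate the 'scopes' program and return the value passed to write(a)."""
    global_env = {"a": None}      # var a: integer (program level)
    call_stack = [global_env]     # one frame of local variables per active call

    def resolve_a(defining_env):
        # Static scope: use the environment in which the procedure was declared.
        if not dynamic:
            return defining_env
        # Dynamic scope: search active frames, most recent call first.
        for frame in reversed(call_stack):
            if "a" in frame:
                return frame
        raise NameError("a")

    def first():
        call_stack.append({})             # first declares no locals
        resolve_a(global_env)["a"] = 1    # a := 1
        call_stack.pop()

    def second():
        call_stack.append({"a": None})    # var a: integer (local to second)
        first()
        call_stack.pop()

    global_env["a"] = 2   # a := 2 in the main body refers to the global a
    second()
    return global_env["a"]  # write(a)


print(run(dynamic=False))  # static scope rules
print(run(dynamic=True))   # dynamic scope rules
```

Under static rules, `first` always assigns to the global `a`. Under dynamic rules, it assigns to the `a` of its caller `second`, so the global keeps the value 2.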
Dynamic Scope (Cont.)
• With dynamic scope, bindings cannot always be resolved by examining the program because they are dependent on calling sequences
• Dynamic scope rules are usually encountered in interpreted languages
• These languages also usually do not have static type checking:
  – type determination is not always possible when dynamic rules are in effect

Scope of Identifiers
• In most programming languages identifier bindings are introduced by
  – Function declarations (introduce function names)
  – Procedure definitions (introduce procedure names)
  – Identifier declarations (introduce identifiers)
  – Formal parameters (introduce identifiers)

Scope of Identifiers (Cont.)
• Not all kinds of identifiers follow the most-closely nested scope rule
• For example, function declarations
  – often cannot be nested
  – are globally visible throughout the program
• In other words, a function name can be used before it is defined

Example: Use Before Definition

    foo (integer x)
    {
      integer y
      y ← bar(x)
      ...
    }
    bar (integer i): integer
    {
      ...
    }
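One common way to accept a use before definition such as `foo` calling `bar` is to gather all function names first and check calls afterwards. The Python sketch below contrasts that with a single pass. The program encoding (a list of `(name, calls)` pairs) and both function names are hypothetical, chosen only for illustration.

```python
# Hypothetical encoding of a program: (function name, [names it calls]),
# listed in source order.
program = [
    ("foo", ["bar"]),   # foo calls bar before bar is defined
    ("bar", []),
]


def check_one_pass(functions):
    """Check calls while scanning; only names seen so far count as defined."""
    defined, errors = set(), []
    for name, calls in functions:
        defined.add(name)
        for callee in calls:
            if callee not in defined:
                errors.append(f"{name}: undefined function {callee}")
    return errors


def check_two_pass(functions):
    """Pass 1 gathers every function name; pass 2 checks the calls."""
    defined = {name for name, _ in functions}          # pass 1
    errors = []
    for name, calls in functions:                      # pass 2
        for callee in calls:
            if callee not in defined:
                errors.append(f"{name}: undefined function {callee}")
    return errors
```

On `program`, the single pass wrongly reports `bar` as undefined, while the two-pass version reports nothing and still flags calls to names that are never defined.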
Other Kinds of Scope
• In O-O languages, method and attribute names have more sophisticated (static) scope rules
• A method need not be defined in the class in which it is used, but in some parent class
• Methods may also be redefined (overridden)

Implementing the Most-Closely Nested Rule
• Much of semantic analysis can be expressed as a recursive descent of an AST
  – Process an AST node n
  – Process the children of n
  – Finish processing the AST node n
• When performing semantic analysis on a portion of the AST, we need to know which identifiers are defined

Implementing Most-Closely Nesting (Cont.)
• Example:
  – the scope of variable declarations is one subtree
      let integer x ← 42 in E
  – x can be used in subtree E

Symbol Tables
Purpose: To hold information about identifiers that is computed at some point and looked up at later times during compilation
Examples:
  – type of a variable
  – entry point for a function
Operations: insert, lookup, delete
Common implementations: linked lists, hash tables
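The recursive descent described for the most-closely nested rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the tuple-based AST encoding, the `check` function, and the choice to check a let initializer outside the scope of the new variable are all introduced here rather than taken from the slides.

```python
def check(node, defined, errors):
    """Walk a tuple-encoded AST, recording uses of undeclared identifiers.

    Assumed node shapes (hypothetical):
      ("int", value)
      ("var", name)
      ("plus", left, right)
      ("let", name, init_or_None, body)
    """
    kind = node[0]
    if kind == "int":
        return
    if kind == "var":
        if node[1] not in defined:
            errors.append(f"undeclared identifier {node[1]}")
    elif kind == "plus":
        check(node[1], defined, errors)
        check(node[2], defined, errors)
    elif kind == "let":
        _, name, init, body = node
        if init is not None:
            check(init, defined, errors)   # assumption: init cannot see the new name
        was_defined = name in defined      # start processing node n
        defined.add(name)
        check(body, defined, errors)       # process the children of n
        if not was_defined:                # finish processing n: leave the scope
            defined.discard(name)
    else:
        raise ValueError(f"unknown node kind {kind}")


# let integer y in x + 42   (x is never declared)
example2 = ("let", "y", None, ("plus", ("var", "x"), ("int", 42)))
errors = []
check(example2, set(), errors)
print(errors)
```

A set of names is enough here because only declaredness is checked; keeping types or other attributes per identifier calls for a symbol table.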
Symbol Tables
• Assuming static scope, consider again:
    let integer x ← 42 in E
• Idea:
  – Before processing E, add definition of x to current definitions, overriding any other definition of x
  – After processing E, remove definition of x and, if needed, restore old definition of x
• A symbol table is a data structure that tracks the current bindings of identifiers

A Simple Symbol Table Implementation
• Structure is a stack
• Operations
    add_symbol(x)     push x and associated info, such as x’s type, on the stack
    find_symbol(x)    search stack, starting from top, for x. Return first x found or NULL if none found
    remove_symbol()   pop the stack
• Why does this work?

Limitations
• The simple symbol table works for variable declarations
  – Symbols added one at a time
  – Declarations are perfectly nested
• Doesn’t work for
    foo(x: integer, x: float);
• Other problems?

A Fancier Symbol Table
• enter_scope()    start/push a new nested scope
• find_symbol(x)   finds current x (or null)
• add_symbol(x)    add a symbol x to the table
• check_scope(x)   true if x defined in current scope
• exit_scope()     exits/pops the current scope
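The five operations of the fancier symbol table could be realized as a stack of dictionaries, one per scope. The Python sketch below is one possible implementation, not the only one. The class name `SymbolTable`, the `info` argument, and the helper `declare_params` are assumptions made for illustration.

```python
class SymbolTable:
    """A stack of scopes; each scope maps an identifier to its info (e.g. type)."""

    def __init__(self):
        self.scopes = [{}]               # start with one outermost scope

    def enter_scope(self):
        self.scopes.append({})           # start/push a new nested scope

    def exit_scope(self):
        self.scopes.pop()                # exit/pop the current scope

    def add_symbol(self, name, info):
        self.scopes[-1][name] = info     # add to the current scope only

    def find_symbol(self, name):
        # Search from the innermost scope outward; None if not found.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

    def check_scope(self, name):
        return name in self.scopes[-1]   # defined in the current scope?


def declare_params(table, params):
    """Open a scope for formal parameters and report duplicates (hypothetical helper).

    The caller is expected to call table.exit_scope() after the function body.
    """
    errors = []
    table.enter_scope()
    for name, type_name in params:
        if table.check_scope(name):
            errors.append(f"parameter {name} declared twice")
        else:
            table.add_symbol(name, type_name)
    return errors
```

With `check_scope`, a declaration like `foo(x: integer, x: float)` is rejected because the second `x` is already defined in the current scope, while an `x` in an enclosing scope is merely shadowed.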
Function/Procedure Definitions
• Function names can be used prior to their definition
• We can’t check that for function names
  – using a symbol table
  – or even in one pass
• Solution
  – Pass 1: Gather all function/procedure names
  – Pass 2: Do the checking
• Semantic analysis requires multiple passes
  – Probably more than two

Types
• What is a type?
  – This is a subject of some debate
  – The notion varies from language to language
• Consensus
  – A type is a set of values and
  – A set of operations on those values
• Type errors arise when operations are performed on values that do not support that operation

Why Do We Need Type Systems?
Consider the assembly language fragment
    addi $r1, $r2, $r3
What are the types of $r1, $r2, $r3 ?

Types and Operations
• Certain operations are legal for values of each type
  – It doesn’t make sense to add a function pointer and an integer in C
  – It does make sense to add two integers
  – But both have the same assembly language implementation!
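The idea that each type supports only certain operations can be sketched as a lookup table from operator and operand types to a result type. The Python fragment below is a minimal illustration. The table `LEGAL_OPERATIONS`, the tuple-based expression encoding, and the function `type_of` are all assumptions introduced here.

```python
# Hypothetical table of legal operations: (operator, left type, right type) -> result type.
LEGAL_OPERATIONS = {
    ("+", "integer", "integer"): "integer",
}


def type_of(expr, env):
    """Return the type of a tuple-encoded expression, or raise TypeError.

    Assumed shapes: ("int", n), ("str", s), ("var", name), ("op", operator, left, right).
    env maps declared identifiers to their type names.
    """
    kind = expr[0]
    if kind == "int":
        return "integer"
    if kind == "str":
        return "string"
    if kind == "var":
        return env[expr[1]]
    if kind == "op":
        _, operator, left, right = expr
        left_type = type_of(left, env)
        right_type = type_of(right, env)
        key = (operator, left_type, right_type)
        if key not in LEGAL_OPERATIONS:
            raise TypeError(
                f"operator {operator} not defined for {left_type} and {right_type}"
            )
        return LEGAL_OPERATIONS[key]
    raise ValueError(f"unknown expression kind {kind}")
```

For example, `type_of(("op", "+", ("var", "y"), ("int", 42)), {"y": "string"})` raises `TypeError`, mirroring `let string y ← "abc" in y + 42`, whereas adding two integers yields `"integer"`.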