CS502: Compiler Design Semantic Analysis (Cont.) Manas Thakur Fall 2020
Recap ● Syntax analysis can only find, well, syntax errors. ● We are interested in being able to find various other kinds of errors: bar(int a, char* s) {...} int foo() { int f[3]; int i, j, k; char q, *p; float k; bar(f[6], 10, x); break; i->val = 5; q = k + p; printf("%s, %s.\n", p, k); goto label2; } Manas Thakur CS502: Compiler Design 2
Program checking ● When are checks performed? ● Static checking – At compile-time – Detect and report errors by analyzing the program offline ● Dynamic checking – At run-time – Detect and report/handle errors as they occur ● Pros and cons? – Efficiency? – Completeness? – Developer and user experience? – Language flexibility?
What all can be checked statically? ● Uniqueness checks – Certain names must be unique – Many languages require variable declarations ● Control-flow checks – Match control-flow operators with structures – Example: break applies to innermost loop/switch ● Type checks – Check compatibility of operators and operands – Example: Does 3.5 + “foobar” make sense? ● What kind of check is “array bounds”?
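The control-flow check for break can be sketched as a walk over a statement tree that tracks how many loops or switches enclose the current statement. This is only a minimal sketch: the `Stmt` structure, the `StmtKind` names, and `count_stray_breaks` are hypothetical and do not come from the course material.

```c
#include <stddef.h>

/* Hypothetical statement kinds for a toy AST; not from the slides. */
typedef enum { STMT_BREAK, STMT_LOOP, STMT_SWITCH, STMT_BLOCK, STMT_OTHER } StmtKind;

typedef struct Stmt {
    StmtKind kind;
    struct Stmt *body;   /* first child statement, or NULL */
    struct Stmt *next;   /* next sibling statement, or NULL */
} Stmt;

/* Counts `break` statements that have no enclosing loop or switch.
 * `depth` is the number of loops/switches enclosing `s`. */
int count_stray_breaks(const Stmt *s, int depth) {
    int errors = 0;
    for (; s != NULL; s = s->next) {
        if (s->kind == STMT_BREAK && depth == 0) {
            errors++;
        }
        /* Children of a loop or switch are one level deeper. */
        int inner = depth + (s->kind == STMT_LOOP || s->kind == STMT_SWITCH);
        errors += count_stray_breaks(s->body, inner);
    }
    return errors;
}
```

Calling `count_stray_breaks(root, 0)` on a function body such as foo() in the recap would flag its top-level break, since no loop or switch encloses it.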
Uniqueness checks ● What does a name in a program denote? – Variable – Function – Class – Label ● Information maintained in bindings – A binding from the name to the corresponding entity – Bindings have scope: ● the region of the program in which they are valid ● Uniqueness checks – Analyze the bindings – Make sure they obey the rules
Namespace abstractions ● What is a function/procedure/method? What is a class? – Do they exist at the machine-code level? – Not really! ● Functions/procedures/methods and classes essentially define namespaces. ● Helpful in – Identifying scopes – Defining bindings
Procedures as namespaces ● Each procedure creates its own namespace – Names can be declared locally – Local names hide identical non-local (global) names (shadowing) – Local names cannot be seen outside the procedure ● Such a set of rules is called lexical (or static) scoping. – So there must also be dynamic scoping! ● Ask those who have taken CS302! ● Example: C has global, static, local, and block scopes – Blocks can be nested, procedures cannot.
Lexical scoping ● Why is it good? – Flexibility for programmer (reuse of variable names) { for (int i = 0; i < 100; ++i) { ... } for (Iterator i = list.iterator(); i.hasNext();) { ... } } (the two declarations of i are different because of lexical scoping) – Easy to “see” a binding! ● Compiler’s headache to differentiate same-name variables at different points – Implementation: Lexically scoped symbol tables
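The shadowing and name-reuse rules can be observed directly in C. The function below is an illustrative sketch (the names `x`, `seen`, and `scoping_demo` are made up for this example); each comment says which binding the name resolves to at that point.

```c
int x = 10;                      /* global binding */

int scoping_demo(void) {
    int seen = x;                /* still the global x: 10 */
    int x = 20;                  /* local x shadows the global from here on */
    seen += x;                   /* local x: +20 */
    {
        int x = 30;              /* block-local x shadows the function-level x */
        seen += x;               /* block x: +30 */
    }
    seen += x;                   /* back to the function-level x: +20 */
    for (int i = 0; i < 3; ++i) { seen += i; }       /* i is an int here: +0 +1 +2 */
    for (char i = 'a'; i < 'c'; ++i) { seen += 1; }  /* a different i, now a char: +2 */
    return seen;
}
```

Every use of `x` or `i` is resolved purely from the program text, which is what makes the binding easy to "see" and what the compiler's symbol table has to reproduce.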
Symbol Table [Figure: phases of a compiler. Character stream → Lexical Analyzer → token stream → Syntax Analyzer → syntax tree → Semantic Analyzer → syntax tree → Intermediate Code Generator → intermediate representation → Machine-Independent Code Optimizer → intermediate representation → Code Generator → target machine code → Machine-Dependent Code Optimizer → target machine code. The Symbol Table is drawn alongside the phases.]
Lexically scoped symbol tables ● Tasks at hand – Keep track of names – At the use of a name, find its information (e.g., which one?) ● The problem – Compiler needs a distinct entry for each declaration – Nested lexical scopes allow duplicate entries ● Let’s see an example.
Scopes
● Source program:
    class p {
        int a, b, c;
        method q {
            int v, b, x, w;
            for (r = 0; ...) {
                int x, y, z;
                ...
            }
            while (s) {
                int x, a, v;
                ...
            }
            ... r ... s
        }
        ... q ...
    }
● Corresponding scopes:
    S_p: {
        int a, b, c;
        S_q: {
            int v, b, x, w;
            S_r: { int x, y, z; ... }
            S_s: { int x, a, v; ... }
            ... r ... s
        }
        ... q ...
    }
Chained implementation ● Create a new table for each scope ● Chain tables together for lookup ● enter() creates a new table ● insert() adds at current level ● lookup() walks chain of tables and returns first occurrence of name ● exit() throws away the table for the current level ● How would one implement the individual tables? [Figure: chained tables for the Scopes example, r (x, y, z) → q (v, b, x, w) → p (a, b, c)]
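The four operations above could be sketched in C roughly as follows. This is one possible design, not the course's reference implementation: the functions are named `scope_enter`, `scope_insert`, `scope_lookup`, and `scope_exit` (a plain `exit` would clash with the C library), each table is a simple linked list chosen only for brevity, and storing the type as a string is an assumption made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* One declared name; entries within a scope form a linked list. */
typedef struct Entry {
    const char *name;
    const char *type;
    struct Entry *next;
} Entry;

/* One table per scope, chained to the enclosing scope's table. */
typedef struct Scope {
    Entry *entries;
    struct Scope *parent;
} Scope;

/* enter(): create a new, empty table for a nested scope. */
Scope *scope_enter(Scope *parent) {
    Scope *s = malloc(sizeof *s);
    if (s == NULL) abort();
    s->entries = NULL;
    s->parent = parent;
    return s;
}

/* Search a single table only; used for the uniqueness check. */
static Entry *find_local(const Scope *s, const char *name) {
    for (Entry *e = s->entries; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;
}

/* insert(): add at the current level; returns 0 on a duplicate declaration. */
int scope_insert(Scope *s, const char *name, const char *type) {
    if (find_local(s, name) != NULL) return 0;
    Entry *e = malloc(sizeof *e);
    if (e == NULL) abort();
    e->name = name;
    e->type = type;
    e->next = s->entries;
    s->entries = e;
    return 1;
}

/* lookup(): walk the chain outward; return the first match's type, or NULL. */
const char *scope_lookup(const Scope *s, const char *name) {
    for (; s != NULL; s = s->parent) {
        Entry *e = find_local(s, name);
        if (e != NULL) return e->type;
    }
    return NULL;
}

/* exit(): throw away the current table and return the enclosing one. */
Scope *scope_exit(Scope *s) {
    Scope *parent = s->parent;
    while (s->entries != NULL) {
        Entry *e = s->entries;
        s->entries = e->next;
        free(e);
    }
    free(s);
    return parent;
}
```

With this layout, a redeclaration in the same scope (such as the second k in the recap example) is rejected by `scope_insert`, while a redeclaration in a nested scope simply shadows the outer entry. A hash table per scope would be a natural replacement for the linked list when scopes grow large.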
Tomorrow ● Extensions to symbol tables for OO languages – Classes – Objects – Object fields – Inheritance ● Implementation: – Your compiler is taking shape now. ● Poll on Teams for doubt session.
Virtual White Board ● Designing a symbol table ● Extending for new scopes ● Classes and inheritance ● Assignment 2: Not overweight, but under-tall – Try feeding lasagne to Garfield – Deadline: Oct 18th
Uniqueness checks: More complications ● Forward references – need multiple passes ● includes, packages, modules, interfaces – need to import/export ● Various coding conveniences – int a = sizeof(a); ● Declare “a” in the namespace before parsing the initializer – int b, c[sizeof(b)]; ● Declare “b” with a type before parsing “c” ● Multiple inheritance? ● Summary: Language features complicate the life of compiler designers even for a seemingly simple check!
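Both coding conveniences are legal C, which can be confirmed with a short function. The wrapper `self_reference_demo` is a hypothetical name introduced for this sketch; the two declarations inside it are taken from the slide.

```c
/* Returns 1 if both declarations behave as the slide describes. */
int self_reference_demo(void) {
    int a = sizeof(a);        /* `a` is already in scope inside its own initializer */
    int b, c[sizeof(b)];      /* `b` has its type before the declarator for `c` is parsed */
    return a == (int)sizeof(int)
        && sizeof(c) / sizeof(c[0]) == sizeof(int);
}
```

Because sizeof on a plain variable only needs the operand's type and does not read its value, the uninitialized `a` and `b` are never actually read here.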
Type checking ● Big topic – Type expressions – Type equivalence – Type systems – Type inference ● What is a type? – A collection of values and the set of operations on those values. – Remember why you said a door can’t kick or a ship can’t die? ● Types define capabilities.
Purpose of types ● Identify and prevent errors – Avoid meaningless or harmful computations – Meaningless: (x < 6) + 1 - “bathtub” – Harmful? ● Program organization and documentation – Separate types for separate concepts – Types indicate programmers’ intent ● Support implementation – Allocate right amount of space for variables – Select right machine operands – Optimization: e.g., use fewer bits when possible ● Key idea: types can be checked
Type errors ● Problem: – Underlying memory has no concept of type – Everything is just a string of bits: 0100 0000 0101 1000 0000 0000 0000 0000 – The floating point number: 3.375 – The 32-bit integer: 1,079,508,992 – Two 16-bit integers: 16472 and 0 – Four ASCII characters: @, X, NUL and NUL ● Without type checking: – Machine will let you store 3.375 and later load 1,079,508,992 – Violates the intended semantics of the program
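The four readings of that bit pattern can be reproduced with a few helper functions. This is a sketch under the assumption that float is a 32-bit IEEE 754 value on the target; the helper names are invented for illustration, and the halves and bytes are extracted with shifts so the result does not depend on byte order.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float; assumes float is 32-bit IEEE 754. */
float bits_as_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);   /* copies the bytes without converting the value */
    return f;
}

/* Upper and lower 16-bit halves of the pattern. */
uint16_t high_half(uint32_t bits) { return (uint16_t)(bits >> 16); }
uint16_t low_half(uint32_t bits)  { return (uint16_t)(bits & 0xFFFFu); }

/* The i-th byte counting from the most significant end (i = 0..3). */
unsigned char byte_at(uint32_t bits, int i) {
    return (unsigned char)(bits >> (8 * (3 - i)));
}
```

The pattern on the slide is 0x40580000 in hexadecimal, so `bits_as_float(0x40580000u)` yields 3.375 while the same 32 bits read as an unsigned integer are 1,079,508,992.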
Type system ● Idea: – Provide clear interpretation for bits in memory – Impose constraints on the use of variables and data – Expressed as a set of rules – Automatically check the rules – Report errors to programmers ● Key questions: – What types are built into the language? – Can the programmer build new types? – What are the typing rules? – When does type checking occur? – How strictly are the rules enforced?
When are checks performed? ● Statically typed languages – Types of all the variables are determined ahead of time – Examples? ● C, C++, Java ● Dynamically typed languages – Type of a variable can vary at run-time – Examples? ● Python, JavaScript, bash, Scheme ● Our focus: – Static typing – corresponds to standard static compilation
Expressiveness ● Consider this Scheme function: (define myfunc (lambda (x) (if (list? x) (myfunc (car x)) (+ x 1)))) ● What is the type of x? – Sometimes a list, sometimes an atom – Downside? ● What would happen in static typing? – Cannot assign a type to x at compile-time – Cannot write this function – Static typing is conservative
Types and Compilers ● Suppose the task is to generate code for: a = b + c * d; arr[i] = *p + 2; – What does the compiler need to know? ● Duties of a compiler: – Enforce type rules of the language – Choose operations to be performed ● Can a certain computation be done in one machine instruction? – Provide concrete representation (bits) ● What if a check can’t be performed at compile-time?
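The first two duties, enforcing a type rule and then choosing an operation, can be sketched for a single `+` node. Everything here is a toy assumption: the `Type` and `Opcode` enums, the promotion rule (any int/float mix yields float), and the function names are invented for illustration and are not the rules of any particular language.

```c
/* Hypothetical toy types and instruction names; not from the slides. */
typedef enum { TY_INT, TY_FLOAT, TY_STRING, TY_ERROR } Type;
typedef enum { OP_IADD, OP_FADD, OP_NONE } Opcode;

/* Result type of `lhs + rhs` under a simple rule set:
 * int + int -> int, any int/float mix -> float, anything else is an error. */
Type check_add(Type lhs, Type rhs) {
    int lhs_num = (lhs == TY_INT || lhs == TY_FLOAT);
    int rhs_num = (rhs == TY_INT || rhs == TY_FLOAT);
    if (!lhs_num || !rhs_num) return TY_ERROR;
    return (lhs == TY_FLOAT || rhs == TY_FLOAT) ? TY_FLOAT : TY_INT;
}

/* Pick the machine-level operation once the result type is known. */
Opcode select_add(Type lhs, Type rhs) {
    switch (check_add(lhs, rhs)) {
    case TY_INT:   return OP_IADD;
    case TY_FLOAT: return OP_FADD;
    default:       return OP_NONE;   /* type error: nothing to emit */
    }
}
```

Under these rules an expression like 3.5 + “foobar” is rejected by `check_add`, and an int plus a float selects the floating-point add, which in a real compiler would also require inserting a conversion for the int operand.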
Strong vs weak typing ● A strongly typed language does not allow variables to be used in a way inconsistent with their types (no loopholes). – Example: Java. ● A weakly typed language allows many ways to bypass/violate the type system. – Classic example: C. How? ● Pointer arithmetic. ● C’s motto: just trust the programmer!
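One small illustration of such a loophole: C lets a program change a float without ever performing a float operation, by casting its address to a byte pointer and stepping over it with pointer arithmetic. The function name `clobber_via_bytes` is hypothetical, and the final value assumes an IEEE 754 float, where all-zero bits represent 0.0.

```c
#include <stddef.h>

/* Zeroes a float through a byte pointer, never using a float operation. */
float clobber_via_bytes(float f) {
    unsigned char *p = (unsigned char *)&f;   /* the cast steps around the float type */
    for (size_t i = 0; i < sizeof f; ++i) {
        *(p + i) = 0;                         /* pointer arithmetic over the object's bytes */
    }
    return f;   /* all-zero bits read back as 0.0f on IEEE 754 targets (assumed) */
}
```

The compiler accepts this without complaint; nothing in the type system records that the value stored in `f` was produced by byte writes.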