The constants-table
▶ The constants-table is a compile-time data structure:
  ▶ It exists until your compiler is done generating code
  ▶ It does not exist when the [generated] code is running
▶ The constants-table serves several purposes:
  ▶ Lays out constants where constants shall reside in memory when your program executes
  ▶ Helps to pre-compute the locations of the constants in your program
    ▶ The locations are needed to lay out other constants in memory (constants that contain other constants)
    ▶ The locations are needed during code generation
▶ Constants are compiled into a single mov instruction
  ▶ The size/depth/complexity of the constant are of no significance
  ▶ The run-time behaviour for constants is always the same, always efficient
Compiler Construction
Mayer Goldberg \ Ben-Gurion University

The constants-table (continued)
Issue: Constants can be nested
▶ A sub-constant is also a constant
  ▶ It must be allocated at compile-time
  ▶ Its address needs to be known at compile-time
▶ Relevant data types:
  ▶ Pairs
  ▶ Vectors
  ▶ Symbols
▶ Symbols are special; we shall discuss symbols later

Example in C: The linked list (4 9 6 3 5 1)

  typedef struct LL {
    int value;
    const struct LL *next;  /* const, so that addresses of const objects can be stored */
  } LL;

  const LL c1 = {1, (const struct LL *)0};
  const LL c51 = {5, &c1};
  const LL c351 = {3, &c51};
  const LL c6351 = {6, &c351};
  const LL c96351 = {9, &c6351};
  const LL c496351 = {4, &c96351};

☞ The constants c1, c51, c351, c6351, c96351 are all sub-constants of c496351
▶ They need to be defined, laid out in memory, and their address known before we can define c496351

The constants-table (continued)
Issue: Sharing sub-constants
▶ Since constants are, by definition, immutable, we can save space by factoring out & sharing common sub-constants:
  ▶ That side-effects on constants are undefined means that we cannot assume any specific behaviour when performing side effects on them
  ▶ This gives us license to share sub-constants
▶ Most Scheme implementations do not factor-out & share sub-constants
☞ Our implementation shall factor-out & share sub-constants

The constants-table (continued)
Interactivity
▶ Most Scheme systems are interactive
▶ Interactivity is not the same as being interpreted
  ▶ Chez Scheme is interactive
  ▶ Chez Scheme has no interpreter
  ▶ Expressions are compiled & executed on-the-fly

The constants-table (continued)
Interactivity
▶ Interactive systems are conversational
▶ The “conversation” takes place at the REPL
▶ REPL stands for Read-Eval-Print-Loop
  ▶ Read: Read an expression from an input channel (scan, read, tag-parse)
  ▶ Eval: Compute the value of the expression, either by interpreting it, or by compiling & executing it on-the-fly
  ▶ Print: Print the value of the expression (unless it’s #<void>)
  ▶ Loop: Return to the start of the REPL
▶ The REPL executes until the end-of-file is reached or (exit) is evaluated

The constants-table (continued)
Interactivity (continued)
▶ Interactive systems need to create constants on-the-fly, as expressions are entered at the REPL
▶ Creating constants on-the-fly is not conducive to sharing sub-constants:
  ▶ There would be a great performance penalty to “looking up constants” at run-time
  ▶ Some constants would/should be garbage-collected, which would make this process imperfect
▶ So interactive systems do not factor & share constants

The constants-table (continued)
Interactivity (continued)
☞ But we are not writing an interactive compiler!
▶ Writing interactive compilers requires generating machine-code on-the-fly, which is harder than generating and writing assembly-instructions (which are just text) to a text file
▶ Interactive compilers are harder to debug, since we cannot invoke a system debugger (such as gdb) on an executable
▶ While interactive compilers are fun, the exercise would be time-consuming, and would not offer a great added benefit to the course

The constants-table (continued)
▶ We are writing an offline/batch compiler
  ▶ It’s not conversational
  ▶ We see all the source code at compile-time
▶ We won’t be implementing the load procedure
  ▶ So code cannot be loaded during run-time
  ▶ This would require the compiler to be available during run-time too, which would be about as difficult as writing an interactive compiler
▶ In particular, we get to see all the constants in our code, ahead of time
▶ So it makes sense that we factor/share sub-constants

The constants-table (continued)
Constructing the constants-table
① Scan the AST (one recursive pass) & collect the sexprs in all Const records
  ▶ The result is a list of sexprs
② Convert the list to a set (removing duplicates)
③ Expand the list to include all sub-constants
  ▶ The list should be sorted topologically
  ▶ Each sub-constant should appear in the list before the const of which it is a part
  ▶ For example, (2 3) should appear before (1 2 3)
④ Convert the resulting list into a set (remove all duplicates, again)

The constants-table (continued)
Constructing the constants-table
⑤ Go over the list, from first to last, and create the constants-table:
  ① For each sexpr in the list, create a 3-tuple:
    ▶ The address of the constant sexpr
    ▶ The constant sexpr itself
    ▶ The representation of the constant sexpr as a list of bytes
  ② The first constant should start at address zero (0)
    ▶ The TAs will instruct you how to make use of this address in your code
  ③ The constant sexpr is used as a key for looking up constants in the constants-table
  ④ The representation of a constant is a list of numbers:
    ▶ Each number is a byte, that is, a non-negative integer that is less than 256

The constants-table (continued)
Constructing the constants-table
⑤ Go over the list, from first to last, and create the constants-table:
  ⑤ As you construct the constants-table, you shall need to consult it, in its intermediate state, to look-up the addresses of sub-constants
    ▶ The list of 3-tuples contains all the information needed to lookup & extend the constants-table

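The construction steps above can be sketched in Python. This is only an illustration under loud assumptions: the sexpr model (int, None for the empty list, 2-tuples for pairs), the helper names, and above all the byte layout (a 1-byte tag followed by 8-byte little-endian fields) are hypothetical. The real layout and RTTI encoding are the ones the TAs provide.

```python
# Hypothetical sexpr model: int = integer, None = empty list, (car, cdr) = pair.
TAG_NIL, TAG_INT, TAG_PAIR = 1, 2, 3  # made-up RTTI tags, NOT the course layout

def scheme_list(*items):
    """Build a proper list out of nested pairs, e.g. (1 2 3)."""
    result = None
    for item in reversed(items):
        result = (item, result)
    return result

def dedupe(sexprs):
    """List -> set, keeping the first occurrence so the order stays topological."""
    seen, out = set(), []
    for sexpr in sexprs:
        if sexpr not in seen:
            seen.add(sexpr)
            out.append(sexpr)
    return out

def expand(sexpr):
    """Return the sub-constants of sexpr first, then sexpr itself."""
    if isinstance(sexpr, tuple):
        car, cdr = sexpr
        return expand(car) + expand(cdr) + [sexpr]
    return [sexpr]

def encode(sexpr, address_of):
    """Byte representation; pairs consult the partially built table."""
    if sexpr is None:
        return [TAG_NIL]
    if isinstance(sexpr, int):
        return [TAG_INT] + list(sexpr.to_bytes(8, "little", signed=True))
    car, cdr = sexpr
    return ([TAG_PAIR]
            + list(address_of[car].to_bytes(8, "little"))
            + list(address_of[cdr].to_bytes(8, "little")))

def build_constants_table(constants):
    """constants: the sexprs collected from all Const records (step 1)."""
    ordered = dedupe([sub for c in dedupe(constants) for sub in expand(c)])
    table, address_of, address = [], {}, 0  # first constant at address 0
    for sexpr in ordered:
        representation = encode(sexpr, address_of)
        table.append((address, sexpr, representation))
        address_of[sexpr] = address
        address += len(representation)
    return table
```

Because a sub-constant always precedes the constants that contain it, `encode` never looks up an address that has not been assigned yet.
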
The constants-table (continued)
How the constants-table is used
① The representations of the constants initialize the memory of your program
  ▶ They are laid out in memory by the code-generator
  ▶ They are allocated in assembly-language by the compiler
  ▶ They are assembled into data stored in a data segment
  ▶ They are loaded by the system loader when you run your program
  ▶ They are available in memory before the program starts to run
② The addresses of the constants are used to determine the representation of other constants that contain them
③ The addresses of the constants are used by the code generator to create and issue the mov instructions that evaluate constants at run-time

The constants-table (continued)
How/where sharing of sub-constants takes place in the constants-table
▶ When constructing the constants-table, we twice converted lists to sets, i.e., removed duplicates
▶ This means that any constant sexpr S will appear only once
▶ All sexprs that contain S will use the same address of the one and only occurrence of the constant sexpr S
▶ So S has been “factored out” of all constant sexprs in which it appeared, and is now shared by all of them

The constants-table (continued)
You still need some information…
The code to generate the constants-table is straightforward to write, but please don’t start on it just yet. The TAs will give you some additional information:
▶ The TAs will give you the layout, i.e., the schema for representing the various constants in memory
▶ In particular, you need to know how to encode the RTTI for the various data types
▶ For Strings, Pairs, Vectors, you need to know how to handle sub-constants
▶ Symbols are complicated (will be covered later on)

Chapter 6
Roadmap
Code Generation:
🗹 Constants
▶ Symbols & Free Variables
▶ The Code Generator

Symbols & Free Vars
▶ Just as the implementation of constants is different in interactive systems vs batch systems, the implementation of symbols & free variables is different too
▶ The implementation of symbols & free variables in Scheme (and similar languages) developed over decades, and is by now a fundamental aspect of these languages, so understanding the implementation is essential
▶ We first consider how symbols & free variables are implemented in a standard, interactive system
▶ Then we consider how batch systems are different
▶ Finally, we detail what you should implement in your system

Symbols & Free Vars (continued)
Interactive Systems
▶ Symbols are hashed strings
▶ The hash table is also known as the symbol table
▶ Each symbol has a representative string that serves as a key and is known as its print name
▶ To see the print names, use the procedure symbol->string:
  > (symbol->string 'moshe)
  "moshe"

Symbols & Free Vars (continued)
Interactive Systems (continued)
▶ Symbols are hashed strings
▶ DrRacket returns a duplicate of the representative string
▶ Chez Scheme returns the exact, identical string
▶ This is one area where Chez’s behaviour is problematic:
  ▶ If the original representative string is returned, it can be modified using string-set!
  ▶ This shall render the symbol inaccessible, since the hash function will no longer map to it
  ▶ This [mis-]behaviour was intentional in Chez, and was used as a hackish way to “hide” data. Today it’s an unnecessary anachronism…

Symbols & Free Vars (continued)
Interactive Systems (continued)
▶ In interactive Scheme, new symbols are added all the time as new expressions get typed at the REPL or loaded from files
▶ The scanner is in charge of
  ▶ Recognizing the symbol token
  ▶ Hashing the symbol string to obtain a bucket (whether original or pre-existing)
  ▶ Creating the symbol object: A symbol is a tagged object containing the address of the corresponding bucket
▶ The bucket contains 2 cells:
  ▶ The print cell, pointing to the representative string
  ▶ The value cell, holding the value of the free variable by the same name

Symbols & Free Vars (continued)
Interactive Systems (continued)
▶ The symbol-table serves two purposes:
  ▶ Managing the symbol data structure as a collection of hashed strings
  ▶ Managing the global-variable bindings via the top-level
▶ These two purposes may appear unrelated, but, in fact, they are closely related in interactive systems:
  ▶ Every free variable was once a symbol…
  ▶ Every symbol is hashed
  ▶ Free variables and symbols can be loaded during run-time

The value-cell
▶ The view of the symbol-table across the dimension of the value cells is known as the top-level
▶ The top-level holds the global bindings in Scheme
▶ For example, the procedures car, cdr, cons, and other builtins are defined at the top-level
▶ Modern versions of Scheme (R6RS) & modern dialects of LISP come with namespaces, packages, modules, as ways of aggregating groups of functions and variables
▶ The top-level, in such systems, is just a system namespace that is exported by default

n-LISP
▶ Scheme buckets come with a name-cell and a value-cell
▶ Some dialects of LISP come with more cells
  ▶ A value-cell & a function-cell
  ▶ Such systems are known as 2-LISP systems, because beyond the name-cell, they contain 2 additional cells
▶ In this sense, Scheme is a 1-LISP

n-LISP (continued)
What does it mean to have a value-cell & function-cell?
▶ The same variable name can refer both to a procedure and a value
  (x x)
▶ This does not mean you cannot store a procedure in a value cell
▶ To apply a procedure in a value cell, you need to use the procedure funcall
▶ To obtain the closure in the function-cell (to be passed as data), you need to use the special form function (which has the reader-macro form #')

n-LISP (continued)
▶ What are the advantages of 2-LISP languages?
☞ There are NONE!
▶ A long time ago, some people thought the ability to overload a name adds power to the language
▶ So why bother with 2-LISP languages??
▶ Well, there’s this hardly-known, esoteric, programming language by the name of Perl, which is a 5-LISP language… 😊
▶ Every name in Perl can be used for
  ▶ A function
  ▶ A scalar
  ▶ An array
  ▶ A hash table
  ▶ A file handle
▶ So knowing about this nonsense might reduce your learning curve if you ever need to learn Perl!

Symbols & Free Vars (continued)
Part of the symbol-table & top-level for the code:
  > (define x 34)
  > (define foo 'foo)
[Figure: The Symbol Table & Top-Level. The hash table holds one hash bucket per symbol, each with a print cell and a value cell. For x, the print cell points to the string 'x' (length 1) and the value cell holds the integer 34. For foo, the print cell points to the string 'f' 'o' 'o' (length 3) and the value cell holds a symbol.]

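The bucket structure for these two definitions can be modelled in a few lines of Python. This is a sketch of the idea only, not of how any particular Scheme system lays out its symbol table; the class names, `UNDEFINED`, and `eq` are hypothetical stand-ins.

```python
UNDEFINED = object()  # stands in for #<undefined>

class Bucket:
    """One hash bucket: a print cell and a value cell."""
    def __init__(self, print_name):
        self.print_cell = print_name   # the representative string
        self.value_cell = UNDEFINED    # value of the free variable by this name

class Symbol:
    """A tagged object that only holds the address of its bucket."""
    def __init__(self, bucket):
        self.bucket = bucket

class SymbolTable:
    def __init__(self):
        self.buckets = {}              # print name -> Bucket

    def intern(self, name):
        """What the scanner does: hash the name, reuse or create the bucket."""
        bucket = self.buckets.get(name)
        if bucket is None:
            bucket = self.buckets[name] = Bucket(name)
        return Symbol(bucket)

    def define(self, name, value):
        self.intern(name).bucket.value_cell = value

    def lookup(self, name):
        value = self.intern(name).bucket.value_cell
        if value is UNDEFINED:
            raise NameError(f"variable {name} is not bound")
        return value

def eq(sym_a, sym_b):
    """Two symbols are the same symbol when they share a bucket."""
    return sym_a.bucket is sym_b.bucket

top_level = SymbolTable()
top_level.define("x", 34)                         # (define x 34)
top_level.define("foo", top_level.intern("foo"))  # (define foo 'foo)
```

The same table serves both purposes named earlier: the dictionary keys manage the hashed strings, and the value cells form the top-level.
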
Symbols & Free Vars (continued)
▶ There is a strict grammar for literal symbols, but any string can become the print name for a symbol:
  > (string->symbol "a234")
  a234
  > (string->symbol "A234")
  A234
  > (string->symbol "A 234")
  A\x20;234
  > (string->symbol "this is a bad symbol!")
  this\x20;is\x20;a\x20;bad\x20;symbol!

Symbols & Free Vars (continued)
▶ Because a symbol can be created from any string, symbols that do not resemble literal symbols are printed in peculiar ways, using hexadecimal character escapes, so as to avoid confusion
▶ The string->symbol procedure may be thought of as the API to the hash function for the symbol-table
▶ As of R6RS, Scheme supports hash tables as first-class objects, so programmers may use them as freely as dictionaries in other languages

Symbols & Free Vars (continued)
▶ When a new symbol is hashed, a bucket for it is created, and the initial value is #<undefined> (a special object that signifies that the global variable hasn’t been defined & holds no value)
▶ Global variables are defined by means of the define-expression
▶ An attempt to assign an undefined variable is defined to be an error, although Chez Scheme is tolerant of this, and tacitly defines the variable before setting it
▶ Re-defining a variable changes its value & is somewhat similar to set!

Symbols & Free Vars (continued)
▶ When expressions are read, either at the REPL, or from a file, either in textual or compiled form, each expr is first read as an sexpr:
  ▶ At this stage, symbols are hashed & the corresponding symbol objects are created
  ▶ If, upon parsing, such a symbol turns out to denote a free variable, the variable cell is accessible for definition/set/get via the hash bucket of the symbol
  ▶ Thus free variable access is but a pointer dereference away from the symbol & the hash bucket

Symbols & Free Vars (continued)
▶ Because symbols are hashed by the scanner, it makes no sense to ask whether a given symbol is in the symbol-table: The answer is always affirmative.
▶ The list of print-names from the symbol-table is available via the procedure oblist:
  > (oblist)
  (entry-row-set! entry-col-set! entry-screen-cols-set!
   entry-top-line asmop-add entry-bot-line
   ...
   record-constructor entry-mark Effect)
  > (length (oblist))
  9420

Symbols & Free Vars (continued)
▶ Symbols for which there are buckets in the hash-table are said to be interned symbols (in the sense that they are internal to the hash table)
▶ Another kind of symbols are the uninterned symbols, which are a special kind of symbols that are not hashed
▶ The vast majority of symbols in the system are interned and hashed
▶ Uninterned symbols are a hack that was added intentionally to break the definition of symbols:
  ▶ Uninterned symbols are used in situations where we require a fresh symbol that does not appear anywhere in the system, and is not equal to any other symbol (in the sense of eq?)
  ▶ Such symbols are used when we need unique names, such as for variable names in hygienic macro-expanders
  ▶ Uninterned symbols are created by means of the procedure gensym, and are also known as gensyms

Symbols & Free Vars (continued)
Uninterned symbols are supported via the following API:
▶ gensym generates uninterned symbols, usually with numbered names, such as g1, g2, g3, etc
  > (gensym)
  g0
▶ The symbol? predicate returns #t for an uninterned symbol
▶ The eq? predicate returns #f whenever an uninterned symbol is compared to anything but itself:
  ▶ Either an interned symbol, including a symbol by the same name:
    > (eq? 'g1 (gensym))
    #f
  ▶ Or another gensym:
    > (eq? (gensym) (gensym))
    #f

Symbols & Free Vars (continued)
Uninterned symbols are supported via the following API:
▶ The symbol->string procedure will generate a string of the form "g1", "g2", etc., that may look like the one generated for an uninterned symbol:
  > (list (symbol->string 'g1) (symbol->string (gensym)))
  ("g1" "g1")
▶ Uninterned symbols can be identified via the gensym? procedure

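The difference between interned and uninterned symbols can be illustrated with a small Python model, where object identity (`is`) plays the role of eq?. All names here (`intern`, `gensym`, `is_gensym`, the counter starting at 0) are assumptions made for the sketch, not the API of any Scheme system.

```python
import itertools

class Symbol:
    def __init__(self, name, interned):
        self.name = name          # the print name
        self.interned = interned  # False for gensyms

_symbol_table = {}                   # print name -> interned Symbol
_gensym_counter = itertools.count()  # numbering of generated names

def intern(name):
    """Interned symbols: one object per print name, found via the table."""
    if name not in _symbol_table:
        _symbol_table[name] = Symbol(name, interned=True)
    return _symbol_table[name]

def gensym():
    """Uninterned symbol: a fresh object that is never entered in the table."""
    return Symbol(f"g{next(_gensym_counter)}", interned=False)

def symbol_to_string(symbol):
    return symbol.name

def is_gensym(symbol):
    return not symbol.interned
```

Since a gensym never enters the table, interning a string that happens to match its print name yields a different object, which is why the two can print alike and still fail eq?.
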
Symbols & Free Vars (continued)
Our implementation of symbols
▶ The implementation of symbols is simplified by the fact that ours is a static compiler, and therefore symbols shall not be loaded during run-time
▶ To simplify matters further, you should not implement the procedure string->symbol, so that new symbol objects cannot be created, at run-time, from strings
▶ This means that all symbols in our system are static, literal constants, and this simplifies matters considerably
▶ All symbols shall have the respective representative string as a sub-constant
☞ This affects the way you construct the constants-table

Symbols & Free Vars (continued)
Our implementation of symbols
▶ The symbol data-structure is a tagged data structure that points to the representative string, which itself is a tagged, constant, string-object
▶ The eq? procedure should compare the address fields of the two symbol objects
▶ The symbol->string procedure shall create and return a copy of the representative string of the symbol

Symbols & Free Vars (continued)
Our implementation of free variables
▶ Just as before, our implementation of free variables is simplified by the fact that ours is a static compiler, so free variables are not loaded during run-time
▶ Global variables in our system are not much more than names that serve as shorthand for assembly-language labels that point to global storage in the data section
▶ Your goal is to create a free-variables-table to serve during the code-generation phase of the compiler pipeline

Symbols & Free Vars (continued)
Our implementation of free variables (continued)
▶ Just as you collected a list of constants, by traversing the AST of the user-code, so you collect a list of strings that are the names of all the free variables that occur in the AST of the user code
▶ Create a set from the above list of strings, by removing duplicate strings
▶ Create a list of pairs, based on the above set, by associating with each name-string of a free variable a unique, indexed string for a label in the x86/64 assembly language: "v1", "v2", "v3", etc

Symbols & Free Vars (continued)
Our implementation of free variables (continued)
▶ The list of pairs is the free-variables-table:
  ▶ This table must be available to the code generator
▶ Here is how the code-generator uses it:
  ▶ For a get to a free variable, the code-generator issues a mov instruction from the respective label/variable v_n
  ▶ For a set to a free variable, the code-generator issues a mov instruction to the respective label/variable v_n

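The three steps for building the free-variables-table can be sketched as follows. The tuple-based AST shape (`("VarFree", name)` nested inside other tuples and lists) and the function names are hypothetical, chosen only so that the example is self-contained.

```python
def collect_free_vars(node):
    """Traverse a (hypothetical) tuple/list AST; return names, with duplicates."""
    if isinstance(node, tuple) and node and node[0] == "VarFree":
        return [node[1]]
    if isinstance(node, (tuple, list)):
        return [name for child in node for name in collect_free_vars(child)]
    return []

def make_fvar_table(names):
    """Remove duplicates (keeping first-seen order) and attach labels v1, v2, ..."""
    unique = list(dict.fromkeys(names))
    return [(name, f"v{i}") for i, name in enumerate(unique, start=1)]

def label_in_fvar_table(table, name):
    return dict(table)[name]

def gen_free_get(table, name):
    """A get is a mov FROM the variable's label."""
    return f"mov rax, qword [{label_in_fvar_table(table, name)}]\n"

def gen_free_store(table, name):
    """A set is a mov TO the variable's label (the value is assumed to be in rax)."""
    return f"mov qword [{label_in_fvar_table(table, name)}], rax\n"
```

For example, an AST that mentions f, x, x, f in that order yields the table [("f", "v1"), ("x", "v2")].
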
Chapter 6
Roadmap
Code Generation:
🗹 Constants
🗹 Symbols & Free Variables
▶ The Code Generator

Code Generation
[Figure: a “Compilër*” drawn as a compiling program that takes a source language (lang src) to a destination language (lang dst) and runs on some language. *Some assembly required]

Code Generation
▶ The code generator is a function expr′ → string
▶ We look at expr′ after the semantic analysis phase is complete
▶ After the constants-table and free-vars-table have been set up
▶ The string returned is x86/64 assembly language code, line by line…

Code Generation (continued)
Assumptions about the code-generator
We make several assumptions concerning our code-generator, that we shall have to satisfy:
▶ Notation: The notation 〚·〛 stands for the code-generator
▶ The induction hypothesis of the code-generator: For any expression E, 〚E〛 is a string of instructions in x86/64 assembly language, that evaluate E, and place its value in register rax
  ▶ We need this assumption to convince ourselves that for each node in the AST of expr′, we generate code that has the correct behaviour
  ▶ The relative correctness of each part of the code-generator is then combined to form a proof of correctness for the entire code-generator, and consequently, for the compiler

Code Generation (continued)
Assumptions about the code-generator (continued)
▶ The calling conventions on the x86/64 architecture (the System V AMD64 ABI) specify that the first 6 non-floating-point arguments are passed through 6 general-purpose registers, 8 floating point arguments are passed through 8 SSE registers, and any additional arguments are passed on the system-stack.
▶ This calling convention is very nice for C, because most procedures take far fewer than 6 arguments
▶ This calling convention is not very nice for Scheme, because of the extensive use of apply, variadic procedures & procedures with optional arguments, and the relatively little use of floating-point numbers

Code Generation (continued)
Assumptions about the code-generator (continued)
▶ The calling conventions on the x86/64 architecture specify that the first 6 non-floating-point arguments are passed through 6 general-purpose registers, 8 floating point arguments are passed through 8 SSE registers, and any additional arguments are passed on the system-stack.
☞ The code we generate shall not adhere to these calling conventions, but shall use the system-stack, organized into activation frames, to pass all the arguments, regardless of their number, type, & size

Code Generation (continued)
Assumptions about the code-generator (continued)
▶ We shall assume the availability of four singleton, literal constants, that are always present in the run-time system, even if they are not present statically in the code:
  ▶ The void object #<void>
    ▶ located at label sob_void
  ▶ The empty list ()
    ▶ located at label sob_nil
  ▶ The Boolean value false #f
    ▶ located at label sob_false
  ▶ The Boolean value true #t
    ▶ located at label sob_true

Code Generation (continued)
Assumptions about the code-generator (continued)
▶ We assume the following structure for all activation frames:

  System Stack (higher addresses first)
  ⋯                            lex env, ret addr, old rbp of the previous frame
  qword [rbp + 8 * (3 + n)]    An-1
  ⋯                            ⋯
  qword [rbp + 8 * 4]          A0
  qword [rbp + 8 * 3]          n
  qword [rbp + 8 * 2]          lex env
  qword [rbp + 8 * 1]          ret addr
  qword [rbp]                  old rbp

  The stack frame spans An-1 down to old rbp.

Code Generation (continued)
Assumptions about the code-generator (continued)
▶ We shall assume there is always at least one activation frame
  ▶ This means that the code-generator assumes that we are within the body of some lambda-expression that has been applied. For example, within the body of a null let-expression: (let () ... )
  ▶ We will need to support/maintain this assumption by setting up an initial dummy frame at the start of the program

Code Generation (continued)
We shall now go over each of the nodes in the expr′ AST, and describe in pseudo-code, what the code-generator returns for each and every node.

Code Generation (continued)
Constants
〚Const'(c)〛 =
  mov rax, AddressInConstTable(c)

Code Generation (continued)
Parameters / get
〚Var'(VarParam'(_, minor))〛 =
  mov rax, qword [rbp + 8 * (4 + minor)]

Code Generation (continued)
Parameters / set
〚Set(Var'(VarParam'(_, minor)), E)〛 =
  〚E〛
  mov qword [rbp + 8 * (4 + minor)], rax
  mov rax, sob_void

Code Generation (continued)
Bound vars / get
〚Var'(VarBound'(_, major, minor))〛 =
  mov rax, qword [rbp + 8 * 2]
  mov rax, qword [rax + 8 * major]
  mov rax, qword [rax + 8 * minor]

Code Generation (continued)
Bound vars / set
〚Set(Var'(VarBound'(_, major, minor)), E)〛 =
  〚E〛
  mov rbx, qword [rbp + 8 * 2]
  mov rbx, qword [rbx + 8 * major]
  mov qword [rbx + 8 * minor], rax
  mov rax, sob_void

Code Generation (continued)
Free vars / get
〚Var'(VarFree'(v))〛 =
  mov rax, qword [LabelInFVarTable(v)]

Code Generation (continued)
Free vars / set
〚Set(Var'(VarFree'(v)), E)〛 =
  〚E〛
  mov qword [LabelInFVarTable(v)], rax
  mov rax, sob_void

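The get/set rules for the three kinds of variables can be collected into two small functions. This is a sketch under assumptions: variables are modelled as hypothetical tuples `("VarParam", name, minor)`, `("VarBound", name, major, minor)` and `("VarFree", name)`, the code for E is passed in as an already-generated list of lines, and `fvar_labels` is a plain dictionary standing in for the free-variables-table.

```python
def gen_get(var, fvar_labels):
    """Lines that load the variable's value into rax."""
    kind = var[0]
    if kind == "VarParam":
        return [f"mov rax, qword [rbp + 8 * (4 + {var[2]})]"]
    if kind == "VarBound":
        return ["mov rax, qword [rbp + 8 * 2]",         # lexical environment
                f"mov rax, qword [rax + 8 * {var[2]}]",  # major: which rib
                f"mov rax, qword [rax + 8 * {var[3]}]"]  # minor: slot in the rib
    if kind == "VarFree":
        return [f"mov rax, qword [{fvar_labels[var[1]]}]"]
    raise ValueError(f"unknown variable kind: {kind}")

def gen_set(var, value_code, fvar_labels):
    """value_code leaves the value of E in rax; the store follows it."""
    kind = var[0]
    if kind == "VarParam":
        store = [f"mov qword [rbp + 8 * (4 + {var[2]})], rax"]
    elif kind == "VarBound":
        store = ["mov rbx, qword [rbp + 8 * 2]",
                 f"mov rbx, qword [rbx + 8 * {var[2]}]",
                 f"mov qword [rbx + 8 * {var[3]}], rax"]
    elif kind == "VarFree":
        store = [f"mov qword [{fvar_labels[var[1]]}], rax"]
    else:
        raise ValueError(f"unknown variable kind: {kind}")
    return list(value_code) + store + ["mov rax, sob_void"]  # a set returns #<void>
```

Note that the bound-variable set walks the environment through rbx, so that the value of E, which is still in rax, is not clobbered before it is stored.
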
Code Generation (continued)
Sequences
〚Seq([E1; E2; ⋯; En])〛 =
  〚E1〛
  〚E2〛
  ⋯
  〚En〛

Code Generation (continued)
Or
〚Or'([E1; E2; ⋯; En])〛 =
  〚E1〛
  cmp rax, sob_false
  jne Lexit
  〚E2〛
  cmp rax, sob_false
  jne Lexit
  ⋯
  〚En〛
  Lexit:

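A possible rendering of the Or rule as a function over already-generated operand code. One detail is an assumption added here rather than something the slide states: since a program may contain many or-expressions, the sketch appends a running index to Lexit so that labels do not collide. The function name and the counter are hypothetical.

```python
import itertools

_or_counter = itertools.count(1)  # source of unique label suffixes (an assumption)

def gen_or(operand_codes):
    """operand_codes: non-empty list; each item is the list of lines for one Ei."""
    exit_label = f"Lexit{next(_or_counter)}"
    lines = []
    for code in operand_codes[:-1]:
        # Any value other than #f ends the or-expression with that value in rax.
        lines += list(code) + ["cmp rax, sob_false", f"jne {exit_label}"]
    # The last operand needs no test: its value is the result either way.
    lines += list(operand_codes[-1]) + [f"{exit_label}:"]
    return lines
```
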
Code Generation (continued)
If
〚If'(Q, T, E)〛 =
  〚Q〛
  cmp rax, sob_false
  je Lelse
  〚T〛
  jmp Lexit
  Lelse:
  〚E〛
  Lexit:

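The If rule can be sketched the same way. As an assumption of this sketch, Lelse and Lexit get a shared numeric suffix so that several if-expressions, including nested ones, can coexist in one assembly file; the function name and counter are hypothetical.

```python
import itertools

_if_counter = itertools.count(1)  # source of unique label suffixes (an assumption)

def gen_if(test_code, then_code, else_code):
    """Each argument is the list of assembly lines generated for Q, T, E."""
    index = next(_if_counter)
    else_label, exit_label = f"Lelse{index}", f"Lexit{index}"
    return (list(test_code)
            + ["cmp rax, sob_false", f"je {else_label}"]  # only #f is false
            + list(then_code)
            + [f"jmp {exit_label}", f"{else_label}:"]
            + list(else_code)
            + [f"{exit_label}:"])
```

Because the function only concatenates the code of its sub-expressions with a few fixed lines, nesting works without special cases: the output of one call can be passed as `then_code` or `else_code` of another.
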
Code Generation (continued)
Boxes
▶ Boxes provide one extra level of indirection to the value
▶ Boxes can be implemented as untagged arrays of size 1
▶ That they are untagged means that boxes do not contain RTTI
▶ This is probably the simplest implementation, but nevertheless, only one of many possible implementations

Code Generation (continued)
Box / get
〚BoxGet'(Var'(v))〛 =
  〚Var'(v)〛
  mov rax, qword [rax]

Code Generation (continued)
Box / set
〚BoxSet'(Var'(v), E)〛 =
  〚E〛
  push rax
  〚Var'(v)〛
  pop qword [rax]
  mov rax, sob_void

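A short sketch of the two box rules, together with a Python model of a box as an array of size 1. The function names are hypothetical, and the code for the variable and for E is passed in as already-generated lists of lines.

```python
def gen_box_get(var_code):
    """var_code leaves the box's address in rax; one more dereference gets the value."""
    return list(var_code) + ["mov rax, qword [rax]"]

def gen_box_set(var_code, value_code):
    """Evaluate E first, park it on the stack, then store it through the box."""
    return (list(value_code) + ["push rax"]
            + list(var_code) + ["pop qword [rax]", "mov rax, sob_void"])

# Model of the run-time behaviour: a box is a one-element array, so every
# holder of the box sees an update made through any other holder.
def make_box(value):
    return [value]

def box_get(box):
    return box[0]

def box_set(box, value):
    box[0] = value
```

The push/pop pair in the set rule is needed because generating the code for Var'(v) overwrites rax, which held the value of E.
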
Code Generation (continued)
LambdaSimple
〚LambdaSimple′([p1; ⋯; pm], body)〛 in pseudo-code:
▶ Allocate the ExtEnv (the size of which is known statically, and is 1 + |Env|)
▶ Copy pointers of minor vectors from Env (on the stack) to ExtEnv (with offset of 1):
    for (i = 0, j = 1; i < |Env|; ++i, ++j) {
      ExtEnv[j] = Env[i];
    }

Outline
  Closure-Creation Code
    Create ExtEnv
    Allocate closure object
    Closure → Env ≔ ExtEnv
    Closure → Code ≔ Lcode
    jmp Lcont
  Lcode: (closure body)
    push rbp
    mov rbp, rsp
    〚body〛
    leave
    ret
  Lcont:

Code Generation (continued)
LambdaSimple (continued)
〚LambdaSimple′([p1; ⋯; pm], body)〛 in pseudo-code:
▶ Allocate ExtEnv[0] to point to a vector where to store the parameters
▶ Copy the parameters off of the stack:
    for (i = 0; i < n; ++i)
      ExtEnv[0][i] = Param_i;
▶ Allocate the closure object; Address in rax
▶ Set rax → env = ExtEnv
▶ Set rax → code = Lcode
▶ jmp Lcont

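What the generated closure-creation code does to the environment can be modelled with Python lists. This is a model of the run-time effect only, not of the assembly the code-generator emits; `extend_env` and its arguments are hypothetical names.

```python
def extend_env(env, params):
    """env: list of minor vectors (ribs); params: the current frame's arguments."""
    ext_env = [None] * (1 + len(env))  # size is known statically: 1 + |Env|
    i, j = 0, 1
    while i < len(env):                # copy the pointers, shifted by one
        ext_env[j] = env[i]
        i += 1
        j += 1
    ext_env[0] = list(params)          # fresh vector holding the parameters
    return ext_env
```

Only pointers to the existing minor vectors are copied, so the extended environment shares them with the old one, while the parameters are copied off the stack into a new vector at index 0.
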
Code Generation (continued)
LambdaSimple (continued)

▶ Lcode:
        push rbp
        mov rbp, rsp
        〚body〛
        leave
        ret
  Lcont:

Code Generation (continued)
LambdaSimple (continued)

▶ During the creation of the closure, we perform only the code in blue (the closure-creation code)
▶ During the application of the closure, only the code in orange (the closure body, from Lcode up to Lcont) executes
▶ The code in orange is embedded within the code in blue
▶ This makes our code-generator compositional
▶ We can combine the output of the code-generator
▶ The downside is that we need to pay with the jmp Lcont instruction

Code Generation (continued)
LambdaSimple (continued)

▶ We could have saved the jmp Lcont instruction at the expense of making our code-generator non-compositional
▶ We could not have combined the output of the code-generator
▶ We would have needed some place, “out of the way”, where to place the code generated for the body of a lambda-expression, so that the normal program-flow would not reach it by mistake, and it would not execute unless the closure was applied

Code Generation (continued)
Application

〚Applic'(proc, [Arg1; ···; Argn])〛 in pseudo-code:
    〚Argn〛
    push rax
    ⋮
    〚Arg1〛
    push rax
    push n
    〚proc〛
    Verify that rax has type closure
    push rax → env
    call rax → code

Code Generation (continued)
Application (continued)

〚Applic'(proc, [Arg1; ···; Argn])〛 in pseudo-code (after the call returns):
    add rsp, 8*1 ; pop env
    pop rbx      ; pop arg count
    shl rbx, 3   ; rbx = rbx * 8
    add rsp, rbx ; pop args

▶ Notice that upon return, we consult the argument count on the stack before popping off the arguments
▶ This takes into account the fact that the number we need to pop might be different from the number originally pushed:
  ▶ lambda-expressions with optional arguments
  ▶ The tail-call optimization

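To make the caller-side protocol concrete, here is a small C simulation of the pushes before the call and the cleanup after the return. It is a sketch under assumptions: `Stack`, `push_frame` and `cleanup_after_return` are hypothetical names, the stack is an array of 8-byte words with no bounds checking, and the return address is left out because `call` pushes it and `ret` pops it again before the cleanup runs.

```c
#include <stddef.h>

typedef long Word;              /* one 8-byte stack word (assumption) */
enum { STACK_WORDS = 64 };

/* Simulated stack: grows toward lower indices; sp indexes the top word.
   An empty stack has sp == STACK_WORDS. */
typedef struct {
    Word mem[STACK_WORDS];
    size_t sp;
} Stack;

static void push(Stack *s, Word w)
{
    s->mem[--s->sp] = w;
}

/* Caller, before the call: push Arg n ... Arg 1, then n, then env. */
void push_frame(Stack *s, const Word *args, size_t n, Word env)
{
    for (size_t i = n; i > 0; --i)
        push(s, args[i - 1]);
    push(s, (Word)n);
    push(s, env);
}

/* Caller, after the return: the count is read from the stack,
   not remembered from the time of the pushes. */
void cleanup_after_return(Stack *s)
{
    s->sp += 1;                              /* add rsp, 8*1 ; pop env   */
    size_t count = (size_t)s->mem[s->sp++];  /* pop rbx      ; arg count */
    s->sp += count;                          /* shl rbx, 3 / add rsp, rbx */
}
```

Reading the count back from the stack is what keeps the cleanup correct when the callee has changed the number of argument words in the frame.
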
Code Generation (continued)
Lambda with optional args

Outline of 〚LambdaOpt'([p1; ···; pm], opt, body)〛 in pseudo-code:

    ClosureOpt-Creation Code:
        Create ExtEnv
        Allocate closure object
        Closure → Env ≔ ExtEnv
        Closure → Code ≔ Lcode
        jmp Lcont
    Lcode:                      (closure body)
        Adjust stack for opt args
        push rbp
        mov rbp, rsp
        〚body〛
        leave
        ret
    Lcont:

▶ The code is essentially the same as for LambdaSimple'
▶ The difference occurs in the body of the procedure, i.e., at Lcode

Code Generation (continued)
Lambda with optional args

▶ Lcode:
        Adjust the stack for the optional arguments
        push rbp
        mov rbp, rsp
        〚body〛
        leave
        ret
  Lcont:

Code Generation (continued)
Lambda with optional args

▶ Suppose (lambda (a b c . d) ... ) is applied to the arguments 1, 1, 2, 3, 5, 8
▶ Six arguments are passed
▶ The body of the procedure expects 4 arguments
▶ The last argument, d, is supposed to point to the list (3 5 8)
▶ The stack needs to be adjusted
▶ Notice that the number of arguments must change too!

    The stack as it is      The stack as expected
    8                       (3 5 8)
    5                       2
    3                       1
    2                       1
    1                       4
    1                       env
    6                       ret
    env
    ret

Code Generation (continued)
Lambda with optional args

▶ Suppose (lambda (a b c . d) ... ) is applied to the arguments 1, 1, 2
▶ Three arguments are passed
▶ The body of the procedure expects 4 arguments
▶ The last argument, d, is supposed to point to the empty list ()
▶ The stack needs to be adjusted
▶ Notice that the number of arguments must change too!

    The stack as it is      The stack as expected
    2                       ()
    1                       2
    1                       1
    3                       1
    env                     4
    ret                     env
                            ret

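Both examples (six arguments and three arguments) can be handled by one run-time adjustment, sketched here in C on a simulated stack. Everything named below is an assumption made for illustration: `Word`, `Pair`, `adjust_for_opt`, the use of array indices in place of rsp, and `NULL` standing for the empty list. The sketch also assumes that at least the m required arguments were passed and that one free word exists below the frame.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef intptr_t Word;          /* one 8-byte stack word (assumption) */

typedef struct Pair {
    Word car;
    struct Pair *cdr;           /* NULL models the empty list () */
} Pair;

/*
 * Layout on entry (lower index = lower address):
 *   stack[sp] = ret addr, stack[sp+1] = env, stack[sp+2] = n,
 *   stack[sp+3+i] = A_i for 0 <= i < n.
 * m is the number of required parameters. Returns the new sp.
 */
size_t adjust_for_opt(Word *stack, size_t sp, size_t m)
{
    size_t n = (size_t)stack[sp + 2];

    /* Collect A_m .. A_(n-1) into a list, building it back to front. */
    Pair *opt = NULL;
    for (size_t i = n; i > m; --i) {
        Pair *p = malloc(sizeof *p);
        if (p == NULL)
            abort();
        p->car = stack[sp + 3 + (i - 1)];
        p->cdr = opt;
        opt = p;
    }

    /* n > m + 1 shrinks the frame; n == m enlarges it by one word. */
    size_t new_sp = sp + n - (m + 1);

    /* Move ret addr, env, count and the m required arguments. */
    memmove(&stack[new_sp], &stack[sp], (3 + m) * sizeof(Word));

    stack[new_sp + 2] = (Word)(m + 1);   /* the count changes too */
    stack[new_sp + 3 + m] = (Word)opt;   /* the list becomes the last argument */
    return new_sp;
}
```

`memmove` is used because the source and destination ranges overlap in both directions of the shift.
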
Code Generation (continued)
Lambda with optional args

▶ As you can see
  ▶ Sometimes we need to shrink the top frame
  ▶ Sometimes we need to enlarge the top frame by one to make room for opt
▶ When the number of arguments matches precisely the number of required parameters, there is no room in the frame to place the empty list
  ▶ We shift the contents of the frame down by one [8-byte] word
▶ We can test during run-time and decide what to do
  ▶ This is the basic approach
▶ We can also use magic to save us from having to test and shift down… 🧚
▶ To use magic, we need to change the structure of our activation frame:

Code Generation (continued)

The frame on the system stack, without and with magic (higher addresses first):

    Without Magic                        With Magic
    ⋯                                    ⋯
    lex env                              lex env
    ret addr                             ret addr
    old rbp                              old rbp
                                         magic
    An-1                                 An-1
    ⋯                                    ⋯
    A0        qword [rbp + 8 * 4]        A0        qword [rbp + 8 * 4]
    n         qword [rbp + 8 * 3]        n         qword [rbp + 8 * 3]
    lex env   qword [rbp + 8 * 2]        lex env   qword [rbp + 8 * 2]
    ret addr  qword [rbp + 8 * 1]        ret addr  qword [rbp + 8 * 1]
    old rbp   qword [rbp]                old rbp   qword [rbp]

Code Generation (continued)
Lambda with optional args with magic

▶ Suppose (lambda (a b c . d) ... ) is applied to the arguments 1, 1, 2, 3, 5, 8
▶ Six arguments are passed
▶ The body of the procedure expects 4 arguments
▶ The last argument, d, is supposed to point to the list (3 5 8)
▶ The stack needs to be adjusted
▶ Notice that the number of arguments must change too!

    The stack as it is      The stack as expected
    magic                   magic
    8                       (3 5 8)
    5                       2
    3                       1
    2                       1
    1                       4
    1                       env
    6                       ret
    env
    ret

Code Generation (continued)
Lambda with optional args with magic

▶ Suppose (lambda (a b c . d) ... ) is applied to the arguments 1, 1, 2
▶ Three arguments are passed
▶ The body of the procedure expects 4 arguments
▶ The last argument, d, is supposed to point to the empty list ()
▶ The stack needs to be adjusted
▶ Notice that the number of arguments must change too!

    The stack as it is      The stack as expected
    magic                   ()
    2                       2
    1                       1
    1                       1
    3                       3
    env                     env
    ret                     ret

Code Generation (continued)
Lambda with optional args (continued)

To summarize:
▶ Using magic means reserving a word at the start of each frame
▶ All frames grow by one word, regardless of whether or not they belong to procedures with optional arguments!
▶ We do not include magic in the argument count on the stack!
▶ If you choose to use magic you need to remember to remove it from the frame after returning from an application
☞ You are free to use either the basic approach or magic, depending on your taste/style…

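A sketch of the callee-side adjustment when the magic word is present, again in C on a simulated stack. The names `Word`, `Pair` and `adjust_for_opt_magic` are hypothetical, `NULL` stands for the empty list, and at least m arguments are assumed to have been passed. One further assumption follows the stack diagram for the exact-match case: when the empty list is stored in the magic slot, the count word is left as it was, so that a caller that pops the counted arguments plus one magic word still removes the whole frame.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef intptr_t Word;          /* one 8-byte stack word (assumption) */

typedef struct Pair {
    Word car;
    struct Pair *cdr;           /* NULL models the empty list () */
} Pair;

/*
 * Layout on entry (lower index = lower address):
 *   stack[sp] = ret addr, stack[sp+1] = env, stack[sp+2] = n,
 *   stack[sp+3+i] = A_i for 0 <= i < n, stack[sp+3+n] = magic.
 * m is the number of required parameters. Returns the new sp.
 */
size_t adjust_for_opt_magic(Word *stack, size_t sp, size_t m)
{
    size_t n = (size_t)stack[sp + 2];

    if (n == m) {
        /* Exact match: the magic slot already sits where the optional
           parameter belongs, so nothing has to be shifted down. */
        stack[sp + 3 + n] = (Word)NULL;
        return sp;
    }

    /* Extra arguments: collect A_m .. A_(n-1) into a list. */
    Pair *opt = NULL;
    for (size_t i = n; i > m; --i) {
        Pair *p = malloc(sizeof *p);
        if (p == NULL)
            abort();
        p->car = stack[sp + 3 + (i - 1)];
        p->cdr = opt;
        opt = p;
    }

    /* Shrink the frame by n - m - 1 words; the magic word stays put. */
    size_t new_sp = sp + (n - m - 1);
    memmove(&stack[new_sp], &stack[sp], (3 + m) * sizeof(Word));
    stack[new_sp + 2] = (Word)(m + 1);
    stack[new_sp + 3 + m] = (Word)opt;
    return new_sp;
}
```

Compared with the basic approach, the frame is never moved toward lower addresses, so no free word below the frame is needed.
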