Assemblers and Linkers CS 2253 Owen Kaser, UNBSJ
Contents
● Review of assembler tasks
● A look at linker tasks
● Assembler implementation
● The location counter and symbol table
● Two-pass assembler
● Macros and conditional compilation
Review of Assemblers
● An assembler takes commands and translates them into what will be the contents of some areas.
● Assembler commands can be
  – directives, such as
    ● AREA foo, data   [change the area being generated]
    ● DCB "hello"   [generate some byte contents in the current area]
  – instructions
    ● ADD R1,R2,R3   [generate machine code bits in the current area]
  – labels
    ● blah ….   [record the current position in the current area as “blah”]
Linkers
● The assembler typically generates one “object code” (.OBJ) file, containing the contents of the various areas.
● One source code file → one object code file.
● Libraries are also object code files.
● The linker's overall job is to combine the various areas from all the object files into an executable file that is ready to load into memory and run.
Relocation
● Consider the following situation in area AAA:
    foo DCD foo        ; say this is 40 bytes into AAA
        …              ; 100 instructions later
        LDR R1,=foo
● The machine code for the LDR is really for LDR R1, [PC,#-408], which is PC-relative and hence “relocatable”.
● But the content of the variable foo is supposed to depend on where foo ends up in memory. The assembler does not know this address; it just knows that foo will be 40 bytes into its area.
● Only the linker knows where foo will be located. Say that, at link time, AAA starts at 3000. The linker will fill in 3040 as part of its relocation.
● The assembler generated a “fix me” note in the .OBJ file, recording something like “fixme: start of AAA + 40” for the linker.
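The fix-up step described above can be sketched in a few lines of Python. This is only an illustration: the object-file layout (a dict of areas plus a list of fix-up tuples) and the names `link`, `fixups` and `base_addresses` are invented here, and real object formats are considerably richer.

```python
import struct

def link(areas, fixups, base_addresses):
    """Apply "fix me" notes once area addresses are known (hypothetical format).

    areas:          area name -> bytearray of contents from the assembler
    fixups:         list of (area, offset, target_area, target_offset)
    base_addresses: area name -> address chosen by the linker
    """
    for area, offset, target_area, target_offset in fixups:
        value = base_addresses[target_area] + target_offset
        # Overwrite the 4-byte placeholder with the final address (little-endian).
        struct.pack_into("<I", areas[area], offset, value)
    return areas

# foo DCD foo, with foo 40 bytes into AAA: the assembler left a zero placeholder.
aaa = bytearray(44)
fixups = [("AAA", 40, "AAA", 40)]          # "fixme: start of AAA + 40"
linked = link({"AAA": aaa}, fixups, {"AAA": 3000})
foo_value = struct.unpack_from("<I", linked["AAA"], 40)[0]
print(foo_value)  # 3040
```

The assembler never needs to know 3000; it only records which word to patch and what the word should be relative to.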
Externals
● A related job for the assembler is to handle cases where one source-code file refers to something that is defined in another source-code file.
● At the assembly-language level, when you intend to use something defined in a different source-code file, you declare that thing to be external.
● When you define something that you want to be usable from another file, you declare it global.
Crossware's Example
    include xstdsys.h
    extern _main,__initiostreams,__init_cvars,__HP
    global __cstart,_exit
;*****************************************************************************************
; These sections are required by the Crossware C Compiler:
    area __STACK,4,data,high    ; Linker places this at highest available ram location
    space1
    org __LowestRomLocation     ← new directive for you
    global __START
__START                         * give the linker the start address
    dcdu __STACK                ; Initialise supervisor stack for C compiler
    dcdu __cstart+1             ; Jump to __cstart on power up
    …...
Assembler Implementation
● Key data structures:
  – a “location counter”. (Some assemblers let you use its value as a constant, and they call it $.)
  – an array of area information
    ● a saved location counter
    ● a buffer of all code/data generated so far into that area
  – a symbol table, mapping labels to their addresses.
    ● address: probably an offset within a specified area
    ● the symbol table may also record a type, or whether the entry is global or external, etc.
Assembler: Rough Sketch
Area ← default; $ ← 0; buffer ← empty buffer
Repeat
    get a line of text, parse it and discard any comments
    if line has label L, then SymTab.put(L, Area, $)
    if line has directive AREA nm, type (where nm is new)
        Areas.put(Area, $, buffer); Area ← nm; $ ← 0; buffer ← empty buffer
    else if line has directive DCB <someexpression>
        x = evaluate_constant_expression(<someexpression>, SymTab)
        buffer.add_byte(x); $ ← $+1
    else if line has instruction ADD rX, rY, rZ
        x = figure_out_machine_code(“ADD”, rX, rY, rZ)
        buffer.add_word(x); $ ← $+4
    else if line has instruction B <someexpression>
        if <someexpression> is a label in SymTab whose area matches Area
            distance = (SymTab.getValue(<someexpression>) - ($+8))/4
            x = figure_out_machine_code(“B”, distance)
            buffer.add_word(x); $ ← $+4
        else ???
    else if line has …..
Until line has directive END
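The B case can be made concrete. On ARM (A32) the PC reads as the address of the current instruction plus 8, and the branch stores a signed 24-bit word distance from there to the target. The function name `encode_b` below is made up for this sketch, and only the unconditional form is handled.

```python
def encode_b(instr_offset, target_offset):
    """Encode an unconditional ARM (A32) B instruction.

    Both arguments are byte offsets within the same area, so the
    difference is known at assembly time without help from the linker.
    """
    distance = (target_offset - (instr_offset + 8)) // 4   # in words, relative to PC
    if not -(1 << 23) <= distance < (1 << 23):
        raise ValueError("branch target out of range")
    # 0xEA = condition "always" + the B opcode bits; the low 24 bits hold the distance.
    return 0xEA000000 | (distance & 0x00FFFFFF)

self_loop = encode_b(8, 8)     # a branch to itself
backward = encode_b(4, 0)      # branch back to the previous instruction
forward = encode_b(0, 8)       # target is exactly where the PC already points
print(hex(self_loop))  # 0xeafffffe
```

A branch to itself needs a distance of -2 words because the PC is already two instructions ahead.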
Problem 1: External References
● What if the assembler processes a line like
    B foo
  where foo is a label in a different source code file?
● Solution:
    buffer.add(figure_out_machine_code(B, 0))
    buffer.add(<fixme note for linker: value foo>)
    $ ← $ + 4   // even if fixme note is large
● The final object code file will have a linked list of the various “fixme” places.
● At link time, the linker will know where “foo” will really be, and it will replace the offset of 0 with the actual distance.
Problem 2: Forward References
● Contrast this:
    foo add r1, r2, r3
        bne foo            (foo is in the SymTab already)
● to this:
        add r1, r2, r3
        bne foo            (foo may not be in the SymTab yet)
        add r4, r5, r6
    foo add r7, r7, r7
2-Pass Assemblers
● A fairly easy solution to the forward-reference problem: process the source code twice.
● First pass: pretend to generate code for the areas, but when something (i.e., a forward reference) is unknown, just stick some padding (of the appropriate size) into the buffer. The point of this pass is to create the symbol table.
● Second pass: run through the code again, using the symbol table made in the first pass to generate the correct code for forward references.
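A minimal sketch of the two passes, under simplifying assumptions that are not from the slides: there is one area, every instruction is 4 bytes, a line starting in column 0 begins with a label, and the "machine code" is just a tuple rather than real bits. The name `assemble` is hypothetical.

```python
def assemble(lines):
    """Toy two-pass assembler. Returns (symbol table, resolved listing)."""
    def split(line):
        label = None
        if line and not line[0].isspace():      # column 0 means a label
            label, _, line = line.partition(" ")
        return label, line.strip()

    # Pass 1: only advance the location counter and build the symbol table.
    symtab, loc = {}, 0
    for line in lines:
        label, instr = split(line)
        if label:
            symtab[label] = loc
        if instr:
            loc += 4

    # Pass 2: every label is now known, so forward references resolve.
    out, loc = [], 0
    for line in lines:
        _, instr = split(line)
        if not instr:
            continue
        op, _, operand = instr.partition(" ")
        if op in ("b", "bne", "beq"):
            distance = (symtab[operand.strip()] - (loc + 8)) // 4
            out.append((loc, op, distance))
        else:
            out.append((loc, op, operand.strip()))
        loc += 4
    return symtab, out

program = [
    "    add r1, r2, r3",
    "    bne foo",              # forward reference
    "    add r4, r5, r6",
    "foo add r7, r7, r7",
]
symtab, listing = assemble(program)
print(symtab)       # {'foo': 12}
print(listing[1])   # (4, 'bne', 0)
```

Pass 1 works only because instruction sizes are known without knowing label values; that is what "padding of the appropriate size" relies on.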
Conditional Assembly
● At assembly time, you can test a condition (usually based on textual or constant equality) and prevent the assembler from seeing a block of code if the condition fails.
● Like an if-statement at assembly time. It affects what code is actually placed into your object file.
● One set of source code can generate machine code for slightly different platforms.
● C and C++ have this feature too.
Example Crossware Code
    ifeq __NoRom
    ...
    dcdu HardFault+1
    ….
    elsec
    ….
    dcdu HardFault+27
    endc
Other Conditional Directives
● ifeq <expr> checks whether <expr> is zero
● ifge <expr> checks whether it’s >= 0
● iflt <expr> checks whether it’s < 0
● ifc <string1>,<string2> checks whether the two strings are equal to each other
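One way to picture conditional assembly is as a filter that runs before the assembler proper. The sketch below is an assumption-laden toy, not Crossware's implementation: `preprocess` is a made-up name, `<expr>` is limited to a single constant name or integer literal, and `ifc` is left out.

```python
def preprocess(lines, constants):
    """Return only the lines the assembler proper should see."""
    tests = {
        "ifeq": lambda v: v == 0,
        "ifge": lambda v: v >= 0,
        "iflt": lambda v: v < 0,
    }
    stack = []   # one bool per open if-block: is its current branch active?
    out = []
    for line in lines:
        words = line.split()
        head = words[0] if words else ""
        if head in tests:
            name = words[1]
            value = constants[name] if name in constants else int(name, 0)
            stack.append(tests[head](value))
        elif head == "elsec":
            stack[-1] = not stack[-1]        # switch to the other branch
        elif head == "endc":
            stack.pop()
        elif all(stack):                     # kept only if every enclosing branch is active
            out.append(line)
    return out

source = [
    "    ifeq __NoRom",
    "    dcdu HardFault+1",
    "    elsec",
    "    dcdu HardFault+27",
    "    endc",
]
with_zero = preprocess(source, {"__NoRom": 0})
with_one = preprocess(source, {"__NoRom": 1})
print(with_zero)  # ['    dcdu HardFault+1']
```

The same source yields different object code depending on a constant that is fixed before any instruction is encoded.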
Macros (textbook p 73)
● Macros allow a programmer to assign a mnemonic name to a bunch of assembler lines.
● When the mnemonic is then used, the associated assembler lines are “copy-pasted” into the source code. (That is the assembler's viewpoint; your actual source code file is unchanged.)
● Macros can have parameters that are substituted in the copy-paste process.
● Macros can function like assembly-time methods.
● Not all assemblers support macros. Certain HLLs, notably C and C++, also have macros.
Macro in Crossware
● Macros and conditional compilation in Crossware’s ARM assembler and their 8051 assembler appear similar. The 8051 is documented...(see course website for link)
● foobar macr
      ….. some lines of assembly (body of macro)…
      endm
● To invoke: foobar R0, hello, 35   ← comma-separated args
● First param is \0 in body of macro (here, it is R0)
● Also, \1, \2, …
● Labels in the macro body should be \.0 to \.9 (on each invocation of the macro, a different label will be used)
Crossware Macro
    silly macr
        ifc \0,R0            ← no space
        mov \1, \0
        elsec
        cmp R7, #\0
        beq \.0
        mov \1, #\0
    \.0 add R0, R0, R0
        endc
        endm
● silly R0, R5 expands to
        mov R5,R0
● silly 18, R5 expands to
        cmp R7, #18
        beq temp000
        mov R5, #18
    temp000 add R0,R0,R0
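The parameter and label substitution can be sketched as plain text replacement. Everything here is illustrative: `expand` is a hypothetical helper, the `temp…` naming scheme is guessed from the label on the slide, and the `ifc`/`elsec`/`endc` handling is omitted, so only the second branch of the macro body is shown.

```python
import re

def expand(body, args, invocation):
    """Expand one macro invocation.

    body:       list of lines from the macro definition
    args:       list of argument strings from the invocation
    invocation: counter that makes local labels unique per use
    """
    def substitute(line):
        # \.0 .. \.9 become labels unique to this invocation.
        line = re.sub(r"\\\.(\d)",
                      lambda m: f"temp{invocation:02d}{m.group(1)}", line)
        # \0, \1, ... become the corresponding arguments.
        return re.sub(r"\\(\d)", lambda m: args[int(m.group(1))], line)
    return [substitute(line) for line in body]

body = [
    r"    cmp R7, #\0",
    r"    beq \.0",
    r"    mov \1, #\0",
    r"\.0 add R0, R0, R0",
]
expanded = expand(body, ["18", "R5"], invocation=0)
for line in expanded:
    print(line)
```

A second invocation with `invocation=1` would produce a different label, which is why two uses of the macro in one file do not clash.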
Use of Macros
● Skilled assembler programmers (there are a few left…) often develop a library of macros that generate code for a variety of fiddly tasks.
● The include assembler directive requests that the assembler read a named file (perhaps with lots of juicy macro definitions) and act as if the contents had been pasted into this source code file.
● The C language uses a similar mechanism. Java has a higher-level import idea.
Repetition at Assembly Time
● Sometimes, you want the assembler to process the same block of code a bunch of times
● But you don't want to type it yourself
● Some assemblers (dunno about Crossware's) allow you to put a REPT <n> at the start of a block of lines, and ENDREPT at the end.
● Like an assembly-time FOR loop running n times.
    REPT 5
    ADD R1, R2, R3    ← assembler sees this 5 times
    ENDREPT
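REPT can also be modelled as a text transformation done before assembly. The sketch below assumes the count is an integer literal and allows nesting; `expand_rept` is a made-up name, and real assemblers may differ in both syntax and rules.

```python
def expand_rept(lines):
    """Expand REPT <n> ... ENDREPT blocks, including nested ones."""
    def expand_block(pos):
        out = []
        while pos < len(lines):
            words = lines[pos].split()
            head = words[0].upper() if words else ""
            if head == "REPT":
                # Expand the body first, then copy it n times.
                inner, pos = expand_block(pos + 1)
                out.extend(inner * int(words[1]))
            elif head == "ENDREPT":
                return out, pos + 1
            else:
                out.append(lines[pos])
                pos += 1
        return out, pos
    return expand_block(0)[0]

repeated = expand_rept([
    "    REPT 5",
    "    ADD R1, R2, R3",
    "    ENDREPT",
])
print(len(repeated))  # 5
```

The assembler proper then sees five ordinary ADD lines and advances the location counter by 20 bytes.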
“Macro Language”
● Together, macros, conditional assembly and maybe repetition essentially form a little programming language that runs at assembly time.
● If unlimited repetition (or recursive macros) is allowed, the macro language can be “Turing Complete” (wait till you finish CS2333).
● In the 1990s, Shaw/McNally challenged me to implement numerical integration in the TASM macro language.
● Usefulness: compute a table of values to use in the “real” program.
● N.B. The macro language for C is not Turing Complete.