CS429: Computer Organization and Architecture
Linking I & II

Dr. Bill Young
Department of Computer Sciences
University of Texas at Austin

Last updated: April 5, 2018 at 09:23

CS429 Slideset 23: 1 Linking I

A Simplistic Translation Scheme

    m.c   (ASCII source file)
     |  Compiler
    m.s
     |  Assembler
    p     (binary executable object file; memory image on disk)

Problems:
- Efficiency: small change requires complete re-compilation.
- Modularity: hard to share common functions (e.g., printf).

Solution: Static linker (or linker).

Better Scheme Using a Linker

    m.c            a.c           (ASCII source files)
     | Compiler     | Compiler
    m.s            a.s
     | Assembler    | Assembler
    m.o            a.o           (separately compiled relocatable object files)
       \          /
        Linker (ld)
            |
            p                    (executable object file: code and data for
                                  all functions defined in m.c and a.c)

Linking is the process of combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed.

Linking could happen at:
- compile time;
- load time;
- run time.

Must somehow tell a module about symbols from other modules.

Linking

A linker takes representations of separate program modules and combines them into a single executable. This involves two primary steps:
1. Symbol resolution: associate each symbol reference throughout the set of modules with a single symbol definition.
2. Relocation: associate a memory location with each symbol definition, and modify each reference to point to that location.
Translating the Example Program

A compiler driver coordinates all steps in the translation and linking process.
- Typically included with each compilation system (e.g., gcc).
- Invokes the preprocessor (cpp), compiler (cc1), assembler (as), and linker (ld).
- Passes command line arguments to the appropriate phases.

Example: Create an executable p from m.c and a.c:

    > gcc -O2 -v -o p m.c a.c
    cpp [args] m.c /tmp/cca07630.i
    cc1 /tmp/cca07630.i m.c -O2 [args] -o /tmp/cca07630.s
    as [args] -o /tmp/cca076301.o /tmp/cca07630.s
    <similar process for a.c>
    ld -o p [system obj files] /tmp/cca076301.o /tmp/cca076302.o
    >

Role of the Assembler
- Translate assembly code (compiled or hand generated) into machine code.
- Translate data into binary code (using directives).
- Resolve symbols: translate them into relocatable offsets.
- Error checking:
  - syntax checking;
  - ensure that constants are not too large for fields.

What Does a Linker Do?

Merges object files
- Merges multiple relocatable (.o) object files into a single executable object file that can be loaded and executed.

Resolves external references
- As part of the merging process, resolves external references.
- External reference: a reference to a symbol defined in another object file.

Relocates symbols
- Relocates symbols from their relative locations in the .o files to new absolute positions in the executable.
- Updates all references to these symbols to reflect their new positions.
- References can be in either code or data:

    code:  a();           /* reference to symbol a */
    data:  int *xp = &x;  /* reference to symbol x */

Why Linkers?

Modularity
- Programs can be written as a collection of smaller source files, rather than one monolithic mass.
- Can build libraries of common functions shared by multiple programs (e.g., math library, standard C library).

Efficiency
- Time: Change one source file, recompile, and then relink. No need to recompile other source files.
- Space: Libraries of common functions can be aggregated into a single file. Yet executable files and running machine images contain only code for the functions they actually use.
Example C Program

m.c:

    int e = 7;

    int main() {
        int r = a();
    }

a.c:

    extern int e;

    int *ep = &e;
    int x = 15;
    int y;

    int a() {
        return *ep + x + y;
    }

Merging Relocatable Object Files

Relocatable object files are merged into an executable by the linker. Both are in ELF (Executable and Linkable Format).

Relocatable object files (inputs):
- system code (.text) and system data (.data)
- m.o: main() (.text); int e = 7 (.data)
- a.o: a() (.text); int *ep = &e and int x = 15 (.data); int y (.bss)

Executable object file (output):
- headers
- .text: system code, main(), a(), more system code
- .data: system data, int e = 7, int *ep = &e, int x = 15
- .bss: uninitialized data
- .symtab
- .debug

Relocating Symbols and Resolving External References
- Symbols are lexical entities that name functions and variables.
- Each symbol has a value (typically a memory address).
- Code consists of symbol definitions and references.
- References can be either local or external.

Relocating Symbols and Resolving External References (2)

m.c:

    int e = 7;           // def of global e

    int main() {
        int r = a();     // ref to external symbol a
        exit(0);         // ref to external symbol exit
                         // (defined in libc.so)
    }

a.c:

    extern int e;

    int *ep = &e;        // def of global ep, ref to
                         // external symbol e
    int x = 15;          // def of global x
    int y;               // def of global y

    int a() {            // def of global a
        return *ep+x+y;  // refs of globals ep, x, y
    }

Note that e is locally defined, but global in that it is visible to all modules. Declaring a variable static limits its scope to the current file module.
m.o Relocation Info

m.c:

    int e = 7;

    int main() {
        int r = a();
        exit(0);
    }

Disassembly of section .text

    00000000 <main>:
       0: 55                pushl %ebp
       1: 89 e5             movl %esp,%ebp
       3: e8 fc ff ff ff    call 4 <main+0x4>
                      4: R_386_PC32 a
       8: 6a 00             pushl $0x0
       a: e8 fc ff ff ff    call b <main+0xb>
                      b: R_386_PC32 exit
       f: 90                nop

Disassembly of section .data

    00000000 <e>:
       0: 07 00 00 00

Source: objdump

a.o Relocation Info (.text)

a.c:

    extern int e;

    int *ep = &e;
    int x = 15;
    int y;

    int a() {
        return *ep + x + y;
    }

Disassembly of section .text

    00000000 <a>:
       0: 55                pushl %ebp
       1: 8b 15 00 00 00    movl 0x0,%edx
       6: 00
                      3: R_386_32 ep
       7: a1 00 00 00 00    movl 0x0,%eax
                      8: R_386_32 x
       c: 89 e5             movl %esp,%ebp
       e: 03 02             addl (%edx),%eax
      10: 89 ec             movl %ebp,%esp
      12: 03 05 00 00 00    addl 0x0,%eax
      17: 00
                      14: R_386_32 y
      18: 5d                popl %ebp
      19: c3                ret

a.o Relocation Info (.data)

a.c:

    extern int e;

    int *ep = &e;
    int x = 15;
    int y;

    int a() {
        return *ep + x + y;
    }

Disassembly of section .data

    00000000 <ep>:
       0: 00 00 00 00
                      0: R_386_32 e
    00000004 <x>:
       4: 0f 00 00 00

Strong and Weak Symbols

Program symbols are either strong or weak.
- strong: procedures and initialized globals
- weak: uninitialized globals

This doesn't apply to purely local variables.

p1.c:

    int foo = 5;   // foo: strong

    p1() {         // p1: strong
        ...
    }

p2.c:

    int foo;       // foo: weak here

    p2() {         // p2: strong
        ...
    }
Linker Symbol Rules

Rule 1: A strong symbol can only appear once.

Rule 2: A weak symbol can be overridden by a strong symbol of the same name. References to the weak symbol resolve to the strong symbol.

Rule 3: If there are multiple weak symbols, the linker can pick one arbitrarily.

Linker Puzzles

What happens in each case?

    File 1                       File 2               Result
    int x; p1() {}               p1() {}
    int x; p1() {}               int x; p2() {}
    int x; int y; p1() {}        double x; p2() {}
    int x=7; int y=5; p1() {}    double x; p2() {}
    int x=7; p1() {}             int x; p2() {}

Linker Puzzles

Think carefully about each of these.
- File 1 "int x; p1() {}", File 2 "p1() {}": Link time error: two strong symbols (p1).
- File 1 "int x; p1() {}", File 2 "int x; p2() {}": References to x will refer to the same uninitialized int. Is that what you wanted?
- File 1 "int x; int y; p1() {}", File 2 "double x; p2() {}": Writes to x in p2 might overwrite y! That's just evil!
- File 1 "int x=7; int y=5; p1() {}", File 2 "double x; p2() {}": Writes to x in p2 might overwrite y! Very nasty!
- File 1 "int x=7; p1() {}", File 2 "int x; p2() {}": References to x will refer to the same initialized variable.

Nightmare scenario: two identical weak structs, compiled by different compilers with different alignment rules.

The Complete Picture
- m.c and a.c are each processed by the translators (cc1, as), producing m.o and a.o.
- The linker (ld) combines m.o, a.o, and the static library libwhatever.a into the executable p.
- The loader/dynamic linker (ld-linux.so) combines p with the shared libraries libc.so and libm.so, producing p'.