Code Generation: Intro
Sebastian Hack
Saarland University
Compiler Construction W2015
Code Generation

Consists (roughly) of three parts:
1. Instruction Selection: select processor instructions for the IR instructions.
2. Instruction Scheduling: linearize the data-dependence graph of each basic block.
3. Register Allocation: for each program point, decide which IR variable resides in which register or in memory.

Properties:
• All three influence each other (phase-ordering problem)
• For reasonably realistic scenarios, each one is an NP-hard optimization problem
• Compilers usually attack them heuristically (which works OK, often well)
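To make the three phases concrete, here is a minimal, hypothetical walk-through for a single statement on a MIPS-like target; the virtual registers v0–v2 and the chosen register mapping are illustrative, not taken from the slides.

# Source: a = b + c   (a, b, c are locals held in memory)

# 1. Instruction selection: cover the IR with MIPS instructions,
#    still using an unbounded supply of virtual registers v0, v1, v2.
lw   v0, b        # v0 <- M[b]
lw   v1, c        # v1 <- M[c]
addu v2, v0, v1   # v2 <- v0 + v1
sw   v2, a        # M[a] <- v2

# 2. Instruction scheduling: the two loads are independent, so the
#    second load sits between "lw v0" and the addu, hiding load latency.

# 3. Register allocation: map virtual to physical registers, e.g.
#    v0 -> $t0, v1 -> $t1, v2 -> $t0 (v0 is dead after the addu).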
Target Properties that Compilers Have to Care About

• Instruction set architecture (ISA) of the CPU
  – How to “talk” to the processor
  – Affects several optimizations and transformations
• Aspects of the CPU’s implementation
  – Organization of instruction execution (pipeline)
  – Memory hierarchy topology (cache sizes, associativity, sharing among cores)
  – Core topology (for automatic parallelization)
• Conventions of the runtime / operating system (see the sketch below)
  – Parameter passing for subroutines in libraries
  – How to address global data
  – Interface to the garbage collector
  – ...
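As a concrete instance of such a runtime convention, consider parameter passing. A minimal sketch, assuming the System V x86-64 calling convention (first two integer arguments arrive in edi and esi, the result is returned in eax); the function itself is a made-up example:

# int add(int a, int b) { return a + b; }
add:
    lea eax, [rdi + rsi]   # eax = a + b (lea used as a three-operand add)
    ret                    # result returned in eax, as the ABI requires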
Instruction Set Architectures

• RISC
  – Many registers, typically 32
  – Few, simple addressing modes
  – Load/store architecture
  – Three-address code: Rz ← Rx ⊕ Ry
  – Constant-length instruction encoding, typically 4 bytes
  – VLIW: like RISC, but the compiler packs instructions into bundles and manages their parallel execution
• CISC
  – Fewer registers, 8–16
  – Complex addressing modes
  – Memory operands
  – Two-address code: Rx ← Rx ⊕ Ry (illustrated below)
  – Variable-length instruction encoding (x86: from 1 to 15 bytes)

Beware of the classical RISC/CISC debate! Today, most CPUs are RISC inside but might have a CISC ISA: the processor translates CISC instructions into RISC-like instructions internally.
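To see what the two-address constraint costs in practice (register names are illustrative): compiling x = y + z, where y is still needed afterwards, requires an extra copy on a two-address machine.

# Three-address (RISC): the destination is independent of the sources.
addu $t2, $t0, $t1   # x = y + z; y ($t0) and z ($t1) survive

# Two-address (CISC, x86-style): the destination is also a source, so
# y must be copied first if it is still live after the addition.
mov eax, ebx         # copy y
add eax, ecx         # eax = y + z; ebx (y) survives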
ISA Examples: MIPS

• Prototypical RISC ISA
• 32 registers
• Minimal core instruction set

Source:
int *A;
...
A[i+2] += 100;

MIPS code:
# $a0 = A, $a1 = i
sll   $t0, $a1, 2     # $t0 = i * 4
addu  $t0, $a0, $t0   # $t0 = &A[i]
lw    $t1, 8($t0)     # $t1 = A[i+2]
addiu $t1, $t1, 100
sw    $t1, 8($t0)

= 20 bytes (5 instructions of 4 bytes each)
ISA Examples: x86

• CISC ISA
• 8 registers (16 registers in 64-bit mode)
• Powerful addressing modes: base register + {1, 2, 4, 8} * index register + constant
• For many instructions, one operand can be a memory cell (instead of a register)
• Inhomogeneous register usage: some registers only work with some instructions
• Hundreds of instructions in the vector extensions

Source:
int *A;
...
A[i+2] += 100;

x86 code:
# ebx = A, ecx = i
mov eax, 100
add [ebx + ecx*4 + 8], eax   # memory operand with scaled index

= 9 bytes (5-byte mov + 4-byte add)
ISA Examples: ARM

• RISC-style: load/store architecture, fixed-size instructions, three-address code
• CISC-style: rich addressing modes (barrel shifter, pre-/post-increment/decrement)
• 15 registers (register 15 is the PC)
• Every instruction can be predicated (takes effect only under a certain condition)

Addressing modes:
RSB r9, r5, r5, LSL #3     ; r9 = r5 * 8 - r5, i.e. r9 = r5 * 7
SUB r3, r9, r8, LSR #4     ; r3 = r9 - r8 / 16
ADD r9, r5, r5, LSL #3     ; r9 = r5 + r5 * 8, i.e. r9 = r5 * 9
LDR r2, [r0, r1, LSL #2]   ; r2 = M[r0 + 4 * r1]
LDR r2, [r1], #4           ; r2 = M[r1], r1 = r1 + 4

Predication (branching version on the left, predicated version on the right):
    CMP r3, #0          CMP   r3, #0
    BEQ skip            ADDNE r0, r1, r2
    ADD r0, r1, r2
skip:
Hardware Properties Relevant to the Compiler

• In-order execution:
  – The compiler has to manage instruction-level parallelism
  – Instruction scheduling is very important: it directly influences code latency (see the sketch below)
  – Cores have different functional units / pipes; not every instruction can go into each pipe
  – VLIW processors allow the compiler to pack instructions into bundles
• Out-of-order execution:
  – The processor schedules instructions to functional units dynamically by analyzing the data dependences of the instruction stream
  – It resolves false dependences by register renaming: internally, the processor has far more registers than the ISA exposes
  – Instruction scheduling is less important because it is done by the CPU
  – The instruction list is merely a “data structure” that communicates the data-dependence graph to the processor
  – Avoiding spill code is more important (critical)
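A hypothetical sketch of why scheduling matters on an in-order core (MIPS-like, assuming a one-cycle load-use delay); an out-of-order core would perform this reordering by itself:

# Naive order: each addu stalls waiting for the load just before it.
lw   $t0, 0($a0)
addu $t2, $t2, $t0   # stalls one cycle on $t0
lw   $t1, 4($a0)
addu $t2, $t2, $t1   # stalls one cycle on $t1

# Scheduled: the independent second load fills the first load's delay.
lw   $t0, 0($a0)
lw   $t1, 4($a0)     # independent of $t0, hides its latency
addu $t2, $t2, $t0   # $t0 is ready now
addu $t2, $t2, $t1   # $t1 is ready now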
Out-of-order vs. In-order

• OOO costs more energy
• OOO allows for worse compilers
• OOO goes well with speculation
• Modern OOO processors speculate across several loop iterations to keep the functional units busy
• It is hard to imagine that something similar can be done statically
• The Itanium (Intel's high-performance VLIW CPU from the 2000s) is considered a failure
• It is unclear whether the same performance can be achieved with less energy by an in-order architecture and better compilers