Compilation 2016
Instruction Selection
Aslan Askarov
aslan@cs.au.dk
Partially based on slides by E. Ernst
Where are we?
High-level source code → Lexing/Parsing → Semantic analysis → Translation to LLVM-- IR → Instruction selection → Register allocation → Low-level target code
Instruction selection — translating IR elements into the target language
• How to pick instructions for different IR elements?
• When the IR is relatively simple, such as LLVM--, the process is relatively straightforward
• most of the hard work is done by the codegen
• When the IR is a bit more complex, such as the textbook's Tree IR language, there is more work to be done in this phase
• Maximal Munch algorithm
Tree IR language (from the textbook)
• A simple tree expression language:

signature TREE =
sig
  type label = Temp.label

  datatype stm = MOVE of exp * exp
               | EXP of exp
               | JUMP of exp * label list
               | CJUMP of relop * exp * exp * label * label
               | SEQ of stm * stm
               | LABEL of label

       and exp = CONST of int
               | NAME of label
               | TEMP of Temp.temp
               | BINOP of binop * exp * exp
               | MEM of exp
               | CALL of exp * exp list
               | ESEQ of stm * exp

       and binop = PLUS | MINUS | MUL | DIV
                 | AND | OR | LSHIFT | RSHIFT | ARSHIFT | XOR

       and relop = EQ | NE | LT | GT | LE | GE
                 | ULT | ULE | UGT | UGE
  ...
end
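For experimentation, the SML datatype above can be transcribed as Python classes. This is an illustrative sketch, not the course's actual code; the field names and the example frame offsets (a at FP+8, x at FP+12, word size 4) are hypothetical, only the constructor names mirror the signature.

```python
# Python sketch of the textbook Tree IR (constructors mirror the SML datatype).
from dataclasses import dataclass

class Exp: pass
class Stm: pass

@dataclass
class CONST(Exp): value: int          # CONST of int

@dataclass
class TEMP(Exp): name: str            # TEMP of Temp.temp

@dataclass
class BINOP(Exp):                     # BINOP of binop * exp * exp
    op: str                           # "PLUS" | "MINUS" | "MUL" | ...
    left: Exp
    right: Exp

@dataclass
class MEM(Exp): addr: Exp             # MEM of exp

@dataclass
class MOVE(Stm):                      # MOVE of exp * exp
    dst: Exp
    src: Exp

# The deck's running example a[i] := x as one IR tree
# (offsets 8 and 12 are made-up frame offsets for a and x):
FP, i = TEMP("FP"), TEMP("i")
tree = MOVE(
    MEM(BINOP("PLUS",
              MEM(BINOP("PLUS", FP, CONST(8))),     # address of a
              BINOP("MUL", i, CONST(4)))),          # i * word size
    MEM(BINOP("PLUS", FP, CONST(12))))              # value of x
```

Building the tree explicitly like this makes it easy to see what the later tiling slides are covering.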
Instruction Selection for the Tree IR language
• Each IR node does one thing; real machine instructions typically do several things
• Ex: a typical memory access corresponds to the subtree MEM(BINOP(PLUS, e, CONST c))
• This is good: the IR should be primitive
• Instruction selection = finding ways to express IR trees using instructions
• NB: using the shorthand notation MEM(+(e, CONST c))
Describing Instructions
• Basic device: the tree pattern
• Matching idea:
• A tree pattern is a partial tree, a tile
• From the top: concrete nodes
• At the bottom: blanks standing for subtrees, called leaves
• Repeated matching, tiling, reconstructs an IR tree
• Read off the instruction sequence: top-down traversal = reverse order
For illustration: Jouette
• We need a concrete instruction set
• Hypothetical (RISC) CPU architecture 'Jouette'
• Instructions:
  ADD   ri ← rj + rk
  MUL   ri ← rj * rk
  SUB   ri ← rj - rk
  DIV   ri ← rj / rk
  ADDI  ri ← rj + c
  SUBI  ri ← rj - c
  LOAD  ri ← M[rj + c]
• Three-address format: flexible locations
• Arithmetic operations: only in registers
• Addressing modes: only one address, fixed offset
Jouette Tiles
• Two categories:
• 'Expression tile': produces a result in a register
• 'Statement tile': creates a side-effect
• Special case: a register is an atomic expression, the single-node tile TEMP t, with the shorthand ri for a TEMP with no particular name
Jouette Expression Tiles
• Main arithmetic operations: unique patterns
  +(e1, e2)   ADD  ri ← rj + rk
  -(e1, e2)   SUB  ri ← rj - rk
  *(e1, e2)   MUL  ri ← rj * rk
  /(e1, e2)   DIV  ri ← rj / rk
Jouette Expression Tiles
• Arithmetic operations involving an immediate: multiple interpretations, so multiple patterns
  +(e, CONST c)   ADDI  ri ← rj + c
  +(CONST c, e)   ADDI  ri ← rj + c
  CONST c         ADDI  ri ← r0 + c   (r0 always contains 0)
  -(e, CONST c)   SUBI  ri ← rj - c
Jouette Expression Tiles
• Reading from memory: many interpretations of LOAD ri ← M[rj + c]
  MEM(+(e, CONST c))
  MEM(+(CONST c, e))
  MEM(CONST c)
  MEM(e)
Jouette Statement Tiles
• Storing to memory: larger tiles, STORE M[ri + c] ← rj
  MOVE(MEM(+(e, CONST c)), e')
  MOVE(MEM(+(CONST c, e)), e')
  MOVE(MEM(CONST c), e')
  MOVE(MEM(e), e')
Jouette Statement Tiles
• Memory-to-memory move: MOVEM M[ri] ← M[rj], the tile MOVE(MEM(e1), MEM(e2))
• (Not a typical RISC instruction, but illustrative)
• NB: store tiles always match the two nodes MOVE(MEM, _) simultaneously
Example Tilings
• Consider the IR tree for a[i] := x:
  MOVE(
    MEM(+(MEM(+(FP, CONST a)),
          *(TEMP i, CONST 4))),
    MEM(+(FP, CONST x)))
• Discuss how this tree can specify that assignment!
Example Tilings
• One way to tile this IR tree for a[i] := x:
  LOAD  r1 ← M[FP + a]
  ADDI  r2 ← r0 + 4
  MUL   r2 ← ri * r2
  ADD   r1 ← r1 + r2
  LOAD  r2 ← M[FP + x]
  STORE M[r1 + 0] ← r2
Example Tilings
• Another way to tile this IR tree for a[i] := x:
  LOAD  r1 ← M[FP + a]
  ADDI  r2 ← r0 + 4
  MUL   r2 ← ri * r2
  ADD   r1 ← r1 + r2
  ADDI  r2 ← FP + x
  MOVEM M[r1] ← M[r2]
Example Tilings
• An "anti-optimal" tiling of the tree for a[i] := x:
  ADDI  r1 ← r0 + a
  ADD   r1 ← FP + r1
  LOAD  r1 ← M[r1 + 0]
  ADDI  r2 ← r0 + 4
  MUL   r2 ← ri * r2
  ADD   r1 ← r1 + r2
  ADDI  r2 ← r0 + x
  ADD   r2 ← FP + r2
  LOAD  r2 ← M[r2 + 0]
  STORE M[r1 + 0] ← r2
Optimal vs Optimum Tilings
• What's the "best" tiling?
• Minimal number of instructions?
• Best performance at runtime?
• Compositionality assumption: we can compute the "best" tiling from the best tiling of each subtree (in reality, cost is not additive!)
• Choice here: minimal number of instructions
• Optimal: no gain from combining two neighboring tiles
• Optimum: no tiling has lower cost
• Optimality is a local property; being optimum is global
• Note that optimum ⇒ optimal, but not vice versa
Comparing Criteria
• Obviously, optimal is easier to achieve than optimum
• Then, how valuable is optimum?
• RISC CPU architectures: not terribly important
• each tile is small; optimal and optimum tilings are often identical
• CISC CPU architectures: more important
• larger tiles, many choices everywhere
Algorithm: Maximal Munch
• A greedy algorithm: fast, easy to understand
• Idea:
• Start from the root of the IR tree, work downward
• At each node N, choose the biggest tile that matches
• Recur on the leaves of the chosen tile (not the children of N!)
• Note: it never gets stuck, provided all single-node tiles exist
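The greedy steps above can be sketched in Python. This is an illustrative sketch of Maximal Munch over a toy Jouette-like tile set, not the textbook's code: trees are nested tuples, the emitted assembly strings and the helper names (munch_exp, new_temp) are made up, and only a few expression tiles are included. Trying the biggest tile first is exactly the "maximal" part.

```python
def munch_exp(e, emit, new_temp):
    """Tile expression e greedily; return the register holding its value."""
    # Biggest tile first: MEM(+(e', CONST c)) covers three nodes at once.
    if e[0] == "MEM" and e[1][0] == "+" and e[1][2][0] == "CONST":
        r = munch_exp(e[1][1], emit, new_temp)
        t = new_temp()
        emit(f"LOAD {t} <- M[{r} + {e[1][2][1]}]")
        return t
    if e[0] == "MEM":                        # single-node fallback tile
        r = munch_exp(e[1], emit, new_temp)
        t = new_temp()
        emit(f"LOAD {t} <- M[{r} + 0]")
        return t
    if e[0] == "+" and e[2][0] == "CONST":   # ADDI tile
        r = munch_exp(e[1], emit, new_temp)
        t = new_temp()
        emit(f"ADDI {t} <- {r} + {e[2][1]}")
        return t
    if e[0] == "+":                          # plain ADD tile
        r1 = munch_exp(e[1], emit, new_temp)
        r2 = munch_exp(e[2], emit, new_temp)
        t = new_temp()
        emit(f"ADD {t} <- {r1} + {r2}")
        return t
    if e[0] == "CONST":                      # r0 is assumed to hold 0
        t = new_temp()
        emit(f"ADDI {t} <- r0 + {e[1]}")
        return t
    if e[0] == "TEMP":
        return e[1]
    raise ValueError(f"no tile matches {e!r}")

code = []
counter = iter(range(1, 100))
new_temp = lambda: f"r{next(counter)}"
dst = munch_exp(("MEM", ("+", ("TEMP", "fp"), ("CONST", 8))), code.append, new_temp)
print(dst, code)   # prints: r1 ['LOAD r1 <- M[fp + 8]']
```

Because the big MEM tile is tried before the ADDI and small-MEM tiles, the example emits a single LOAD rather than an ADDI followed by a LOAD.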
Maximal Munch Example
• The second tiling for a[i] := x:
  LOAD  r1 ← M[FP + a]
  ADDI  r2 ← r0 + 4
  MUL   r2 ← ri * r2
  ADD   r1 ← r1 + r2
  ADDI  r2 ← FP + x
  MOVEM M[r1] ← M[r2]
Optimum Algorithm
• An algorithm based on dynamic programming, a bit more complex than Maximal Munch
• Idea:
• Start from the bottom of the IR tree, work upward (recursion: process children, then the current node)
• Concept: assign a cost to each node (bottom-up)
• At each node, compute the cost for each matching tile T by adding the cost of T to the costs of T's leaves; keep the cheapest
• The solution is optimum
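The bottom-up cost computation can be sketched as follows. This is an illustrative sketch with a Jouette-like toy tile set where every tile costs one instruction; it is not the textbook's code, and it only computes the optimum cost, not the chosen tiles (a real implementation would record the best tile at each node too).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cost(e):
    """Minimum number of instructions to compute tree e into a register."""
    if e[0] == "TEMP":
        return 0                                       # already in a register
    if e[0] == "CONST":
        return 1                                       # ADDI t <- r0 + c
    best = float("inf")
    if e[0] == "MEM":
        # Big tile LOAD t <- M[e' + c]: covers MEM, +, and CONST at once.
        if e[1][0] == "+" and e[1][2][0] == "CONST":
            best = min(best, 1 + cost(e[1][1]))
        best = min(best, 1 + cost(e[1]))               # small tile: LOAD t <- M[e' + 0]
    if e[0] == "+":
        if e[2][0] == "CONST":
            best = min(best, 1 + cost(e[1]))           # ADDI tile
        best = min(best, 1 + cost(e[1]) + cost(e[2]))  # plain ADD tile
    return best

tree = ("MEM", ("+", ("TEMP", "fp"), ("CONST", 8)))
print(cost(tree))   # prints: 1  (the big LOAD tile beats ADDI + LOAD, cost 2)
```

Every tile that matches at a node is considered, so unlike the greedy Maximal Munch, the cheapest combination over the whole tree is found.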
Algorithm Complexity
• Parameters:
• N: number of nodes in the given IR tree
• T: number of tiles
• K: average number of non-leaf nodes per tile
• K': maximum number of nodes to check to see which tiles match
• T': average number of tiles matching at a node
• Maximal Munch: (N/K)(K' + T')
• Optimum (dyn. pgm.) algorithm: N(K' + T')
• But both are linear in the size of the IR tree!
• "No problem!"
Tree Grammars
• Motivation: some CPUs, e.g., the Motorola 68000, have register classes: data vs. address registers
• Problem: with the previous algorithms, sub-tiling may produce a result in the wrong class of register
• Idea: specify tiles as CFG rules
  d → MEM(+(a, CONST))
  d → MEM(+(CONST, a))
  d → MEM(CONST)
  d → MEM(a)
  d → a
  a → d
• A non-terminal indicates the register class
• A derivation creates an IR tree
• Ambiguity = alternative tilings
• Tools exist (code-generator generators); usage is not unlike parser generators
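A tiny labeller for such rules can be sketched in Python. This is an illustrative sketch, not a real code-generator generator: the rule set, costs, and tuple encoding (a TEMP carries its native class, e.g. ("TEMP", "p", "a")) are all made up; it computes, for each register class, the cheapest derivation of a tree, handling the chain rules d → a and a → d as register-to-register moves.

```python
# Tiles as grammar rules over two register classes, d (data) and a (address).
# Each rule: (result nonterminal, pattern, cost).  A string leaf in a pattern
# is a nonterminal, matching any subtree derivable to that class.
RULES = [
    ("d", ("MEM", ("+", "a", ("CONST",))), 1),   # d -> MEM(+(a, CONST))
    ("d", ("MEM", "a"), 1),                      # d -> MEM(a)
]
CHAIN = [("d", "a", 1), ("a", "d", 1)]           # moves between classes

def match(pat, tree):
    """Return [(nonterminal, subtree)] bindings if pat matches, else None."""
    if isinstance(pat, str):
        return [(pat, tree)]
    if pat[0] != tree[0]:
        return None
    if pat[0] == "CONST":
        return []                                # any constant matches
    subs = []
    for p, t in zip(pat[1:], tree[1:]):
        m = match(p, t)
        if m is None:
            return None
        subs += m
    return subs

def label(tree):
    """Map each register class to the min cost of deriving `tree` to it."""
    costs = {}
    if tree[0] == "TEMP":
        costs[tree[2]] = 0                       # a temp lives in its class
    else:
        for result, pat, c in RULES:
            m = match(pat, tree)
            if m is None:
                continue
            total, ok = c, True
            for nt, sub in m:
                sub_costs = label(sub)
                if nt not in sub_costs:
                    ok = False
                    break
                total += sub_costs[nt]
            if ok:
                costs[result] = min(costs.get(result, total), total)
    for _ in CHAIN:                              # close under chain rules
        for dst, src, c in CHAIN:
            if src in costs:
                costs[dst] = min(costs.get(dst, costs[src] + c), costs[src] + c)
    return costs

t = ("MEM", ("+", ("TEMP", "p", "a"), ("CONST", 4)))
print(label(t))   # prints: {'d': 1, 'a': 2}  -- in d directly, in a via a move
```

The ambiguity mentioned above shows up as several rules matching the same node; the min over all of them picks the cheapest derivation per class, which is what BURG-style tools automate.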
CPU Architecture Issues
• RISC was mostly invented to fit well with modern code generation
• RISC features, good and bad:
• many registers (e.g., 32)
• every register can do everything (just one class)
• arithmetic operations only on registers (no MUL?)
• three-address instructions (flexible placement)
• just one memory addressing mode (M[reg + const])
• uniform instruction size (e.g., 32 bits)
• every instruction has a single effect/result
Summary
• IR nodes do one thing; instructions do many
• Tree patterns, tiles, 'leaves' of tiles
• Instruction selection: cover the IR tree with tiles
• Jouette architecture, instruction set
• Jouette statement tiles, expression tiles
• Example tilings
• Optimum vs. optimal tilings
• Algorithms: Maximal Munch; dynamic programming
• Tree grammars