Compilation 2016
Instruction Selection
Aslan Askarov
aslan@cs.au.dk
Partially based on slides by E. Ernst
Where are we?
High-level source code → Lexing/Parsing → Semantic analysis → Translation to LLVM-- IR → Instruction selection → Register allocation → Low-level target code
Instruction selection — translating IR elements into the target language
• How to pick instructions for different IR elements?
• When the IR is relatively simple, such as LLVM--, the process is relatively straightforward
• most of the hard work is done by the codegen
• When the IR is a bit more complex, such as the textbook's Tree IR language, there is more work to be done in this phase
• Maximal Munch algorithm
Tree IR language (from the textbook)
• A simple tree expression language:

signature TREE =
sig
  type label = Temp.label

  datatype stm = MOVE of exp * exp
               | EXP of exp
               | JUMP of exp * label list
               | CJUMP of relop * exp * exp * label * label
               | SEQ of stm * stm
               | LABEL of label

       and exp = CONST of int
               | NAME of label
               | TEMP of Temp.temp
               | BINOP of binop * exp * exp
               | MEM of exp
               | CALL of exp * exp list
               | ESEQ of stm * exp

       and binop = PLUS | MINUS | MUL | DIV
                 | AND | OR | LSHIFT | RSHIFT | ARSHIFT | XOR

       and relop = EQ | NE | LT | GT | LE | GE
                 | ULT | ULE | UGT | UGE
  ...
end
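For experimentation, the SML datatype above can be transcribed as Python classes. This is an illustrative sketch, not the course's actual code; the field names and the example frame offsets (a at FP+8, x at FP+12, word size 4) are hypothetical, only the constructor names mirror the signature.

```python
# Python sketch of the textbook Tree IR (constructors mirror the SML datatype).
from dataclasses import dataclass

class Exp: pass
class Stm: pass

@dataclass
class CONST(Exp): value: int          # CONST of int

@dataclass
class TEMP(Exp): name: str            # TEMP of Temp.temp

@dataclass
class BINOP(Exp):                     # BINOP of binop * exp * exp
    op: str                           # "PLUS" | "MINUS" | "MUL" | ...
    left: Exp
    right: Exp

@dataclass
class MEM(Exp): addr: Exp             # MEM of exp

@dataclass
class MOVE(Stm):                      # MOVE of exp * exp
    dst: Exp
    src: Exp

# The deck's running example a[i] := x as one IR tree
# (offsets 8 and 12 are made-up frame offsets for a and x):
FP, i = TEMP("FP"), TEMP("i")
tree = MOVE(
    MEM(BINOP("PLUS",
              MEM(BINOP("PLUS", FP, CONST(8))),     # address of a
              BINOP("MUL", i, CONST(4)))),          # i * word size
    MEM(BINOP("PLUS", FP, CONST(12))))              # value of x
```

Building the tree explicitly like this makes it easy to see what the later tiling slides are covering.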
Instruction Selection for the Tree IR language
• Each IR node does one thing; real machine instructions typically do several things
• Ex: a typical memory access corresponds to the subtree MEM(BINOP(PLUS, e, CONST c))
• This is good: the IR should be primitive
• Instruction selection = finding ways to express IR trees using instructions
• NB: using the shorthand notation MEM(+(e, CONST c))
Describing Instructions
• Basic device: the tree pattern
• Matching idea:
• A tree pattern is a partial tree, a tile
• From the top: concrete nodes
• At the bottom: blanks standing for subtrees, called leaves
• Repeated matching, tiling, reconstructs an IR tree
• Read off the instruction sequence: top-down traversal = reverse order
For illustration: Jouette
• We need a concrete instruction set
• Hypothetical (RISC) CPU architecture 'Jouette'
• Instructions:
  ADD   ri ← rj + rk
  MUL   ri ← rj * rk
  SUB   ri ← rj - rk
  DIV   ri ← rj / rk
  ADDI  ri ← rj + c
  SUBI  ri ← rj - c
  LOAD  ri ← M[rj + c]
• Three-address format: flexible locations
• Arithmetic operations: only in registers
• Addressing modes: only one address, fixed offset
Jouette Tiles
• Two categories:
• 'Expression tile': produces a result in a register
• 'Statement tile': creates a side-effect
• Special case: a register is an atomic expression, the single-node tile TEMP t, with the shorthand ri for a TEMP with no particular name
Jouette Expression Tiles
• Main arithmetic operations: unique patterns
  +(e1, e2)   ADD  ri ← rj + rk
  -(e1, e2)   SUB  ri ← rj - rk
  *(e1, e2)   MUL  ri ← rj * rk
  /(e1, e2)   DIV  ri ← rj / rk
Jouette Expression Tiles
• Arithmetic operations involving an immediate: multiple interpretations, so multiple patterns
  +(e, CONST c)   ADDI  ri ← rj + c
  +(CONST c, e)   ADDI  ri ← rj + c
  CONST c         ADDI  ri ← r0 + c   (r0 always contains 0)
  -(e, CONST c)   SUBI  ri ← rj - c
Jouette Expression Tiles
• Reading from memory: many interpretations of LOAD ri ← M[rj + c]
  MEM(+(e, CONST c))
  MEM(+(CONST c, e))
  MEM(CONST c)
  MEM(e)
Jouette Statement Tiles
• Storing to memory: larger tiles, STORE M[ri + c] ← rj
  MOVE(MEM(+(e, CONST c)), e')
  MOVE(MEM(+(CONST c, e)), e')
  MOVE(MEM(CONST c), e')
  MOVE(MEM(e), e')
Jouette Statement Tiles
• Memory-to-memory move: MOVEM M[ri] ← M[rj], the tile MOVE(MEM(e1), MEM(e2))
• (Not a typical RISC instruction, but illustrative)
• NB: store tiles always match the two nodes MOVE(MEM, _) simultaneously
Example Tilings
• Consider the IR tree for a[i] := x:
  MOVE(
    MEM(+(MEM(+(FP, CONST a)),
          *(TEMP i, CONST 4))),
    MEM(+(FP, CONST x)))
• Discuss how this tree can specify that assignment!
Example Tilings
• One way to tile this IR tree for a[i] := x:
  LOAD  r1 ← M[FP + a]
  ADDI  r2 ← r0 + 4
  MUL   r2 ← ri * r2
  ADD   r1 ← r1 + r2
  LOAD  r2 ← M[FP + x]
  STORE M[r1 + 0] ← r2
Example Tilings
• Another way to tile this IR tree for a[i] := x:
  LOAD  r1 ← M[FP + a]
  ADDI  r2 ← r0 + 4
  MUL   r2 ← ri * r2
  ADD   r1 ← r1 + r2
  ADDI  r2 ← FP + x
  MOVEM M[r1] ← M[r2]
Example Tilings
• An "anti-optimal" tiling of the tree for a[i] := x:
  ADDI  r1 ← r0 + a
  ADD   r1 ← FP + r1
  LOAD  r1 ← M[r1 + 0]
  ADDI  r2 ← r0 + 4
  MUL   r2 ← ri * r2
  ADD   r1 ← r1 + r2
  ADDI  r2 ← r0 + x
  ADD   r2 ← FP + r2
  LOAD  r2 ← M[r2 + 0]
  STORE M[r1 + 0] ← r2
Optimal vs Optimum Tilings
• What's the "best" tiling?
• Minimal number of instructions?
• Best performance at runtime?
• Compositionality assumption: we can compute the "best" tiling from the best tiling of each subtree (in reality, cost is not additive!)
• Choice here: minimal number of instructions
• Optimal: no gain from combining two neighboring tiles
• Optimum: no tiling has lower cost
• Optimality is a local property; being optimum is global
• Note that optimum ⇒ optimal, but not vice versa
Comparing Criteria
• Obviously, optimal is easier to achieve than optimum
• Then, how valuable is optimum?
• RISC CPU architectures: not terribly important
• each tile is small; optimal and optimum tilings are often identical
• CISC CPU architectures: more important
• larger tiles, many choices everywhere
Algorithm: Maximal Munch
• A greedy algorithm: fast, easy to understand
• Idea:
• Start from the root of the IR tree, work downward
• At each node N, choose the biggest tile that matches
• Recur on the leaves of the chosen tile (not the children of N!)
• Note: it never gets stuck, provided all single-node tiles exist
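The greedy steps above can be sketched in Python. This is an illustrative sketch of Maximal Munch over a toy Jouette-like tile set, not the textbook's code: trees are nested tuples, the emitted assembly strings and the helper names (munch_exp, new_temp) are made up, and only a few expression tiles are included. Trying the biggest tile first is exactly the "maximal" part.

```python
def munch_exp(e, emit, new_temp):
    """Tile expression e greedily; return the register holding its value."""
    # Biggest tile first: MEM(+(e', CONST c)) covers three nodes at once.
    if e[0] == "MEM" and e[1][0] == "+" and e[1][2][0] == "CONST":
        r = munch_exp(e[1][1], emit, new_temp)
        t = new_temp()
        emit(f"LOAD {t} <- M[{r} + {e[1][2][1]}]")
        return t
    if e[0] == "MEM":                        # single-node fallback tile
        r = munch_exp(e[1], emit, new_temp)
        t = new_temp()
        emit(f"LOAD {t} <- M[{r} + 0]")
        return t
    if e[0] == "+" and e[2][0] == "CONST":   # ADDI tile
        r = munch_exp(e[1], emit, new_temp)
        t = new_temp()
        emit(f"ADDI {t} <- {r} + {e[2][1]}")
        return t
    if e[0] == "+":                          # plain ADD tile
        r1 = munch_exp(e[1], emit, new_temp)
        r2 = munch_exp(e[2], emit, new_temp)
        t = new_temp()
        emit(f"ADD {t} <- {r1} + {r2}")
        return t
    if e[0] == "CONST":                      # r0 is assumed to hold 0
        t = new_temp()
        emit(f"ADDI {t} <- r0 + {e[1]}")
        return t
    if e[0] == "TEMP":
        return e[1]
    raise ValueError(f"no tile matches {e!r}")

code = []
counter = iter(range(1, 100))
new_temp = lambda: f"r{next(counter)}"
dst = munch_exp(("MEM", ("+", ("TEMP", "fp"), ("CONST", 8))), code.append, new_temp)
print(dst, code)   # prints: r1 ['LOAD r1 <- M[fp + 8]']
```

Because the big MEM tile is tried before the ADDI and small-MEM tiles, the example emits a single LOAD rather than an ADDI followed by a LOAD.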
Maximal Munch Example
• The second tiling for a[i] := x:
  LOAD  r1 ← M[FP + a]
  ADDI  r2 ← r0 + 4
  MUL   r2 ← ri * r2
  ADD   r1 ← r1 + r2
  ADDI  r2 ← FP + x
  MOVEM M[r1] ← M[r2]
Optimum Algorithm
• An algorithm based on dynamic programming, a bit more complex than Maximal Munch
• Idea:
• Start from the bottom of the IR tree, work upward (recursion: process children, then the current node)
• Concept: assign a cost to each node (bottom-up)
• At each node, compute the cost for each matching tile T by adding the cost of T to the costs of T's leaves; keep the cheapest
• The solution is optimum
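The bottom-up cost computation can be sketched as follows. This is an illustrative sketch with a Jouette-like toy tile set where every tile costs one instruction; it is not the textbook's code, and it only computes the optimum cost, not the chosen tiles (a real implementation would record the best tile at each node too).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cost(e):
    """Minimum number of instructions to compute tree e into a register."""
    if e[0] == "TEMP":
        return 0                                       # already in a register
    if e[0] == "CONST":
        return 1                                       # ADDI t <- r0 + c
    best = float("inf")
    if e[0] == "MEM":
        # Big tile LOAD t <- M[e' + c]: covers MEM, +, and CONST at once.
        if e[1][0] == "+" and e[1][2][0] == "CONST":
            best = min(best, 1 + cost(e[1][1]))
        best = min(best, 1 + cost(e[1]))               # small tile: LOAD t <- M[e' + 0]
    if e[0] == "+":
        if e[2][0] == "CONST":
            best = min(best, 1 + cost(e[1]))           # ADDI tile
        best = min(best, 1 + cost(e[1]) + cost(e[2]))  # plain ADD tile
    return best

tree = ("MEM", ("+", ("TEMP", "fp"), ("CONST", 8)))
print(cost(tree))   # prints: 1  (the big LOAD tile beats ADDI + LOAD, cost 2)
```

Every tile that matches at a node is considered, so unlike the greedy Maximal Munch, the cheapest combination over the whole tree is found.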
Algorithm Complexity
• Parameters:
• N: number of nodes in the given IR tree
• T: number of tiles
• K: average number of non-leaf nodes per tile
• K': maximum number of nodes to check to see which tiles match
• T': average number of tiles matching at a node
• Maximal Munch: (N/K)(K' + T')
• Optimum (dyn. pgm.) algorithm: N(K' + T')
• But both are linear in the size of the IR tree!
• "No problem!"
Tree Grammars
• Motivation: some CPUs, e.g., the Motorola 68000, have register classes: data vs. address registers
• Problem: with the previous algorithms, sub-tiling may produce a result in the wrong class of register
• Idea: specify tiles as CFG rules
  d → MEM(+(a, CONST))
  d → MEM(+(CONST, a))
  d → MEM(CONST)
  d → MEM(a)
  d → a
  a → d
• A non-terminal indicates the register class
• A derivation creates an IR tree
• Ambiguity = alternative tilings
• Tools exist (code-generator generators); usage is not unlike parser generators
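A tiny labeller for such rules can be sketched in Python. This is an illustrative sketch, not a real code-generator generator: the rule set, costs, and tuple encoding (a TEMP carries its native class, e.g. ("TEMP", "p", "a")) are all made up; it computes, for each register class, the cheapest derivation of a tree, handling the chain rules d → a and a → d as register-to-register moves.

```python
# Tiles as grammar rules over two register classes, d (data) and a (address).
# Each rule: (result nonterminal, pattern, cost).  A string leaf in a pattern
# is a nonterminal, matching any subtree derivable to that class.
RULES = [
    ("d", ("MEM", ("+", "a", ("CONST",))), 1),   # d -> MEM(+(a, CONST))
    ("d", ("MEM", "a"), 1),                      # d -> MEM(a)
]
CHAIN = [("d", "a", 1), ("a", "d", 1)]           # moves between classes

def match(pat, tree):
    """Return [(nonterminal, subtree)] bindings if pat matches, else None."""
    if isinstance(pat, str):
        return [(pat, tree)]
    if pat[0] != tree[0]:
        return None
    if pat[0] == "CONST":
        return []                                # any constant matches
    subs = []
    for p, t in zip(pat[1:], tree[1:]):
        m = match(p, t)
        if m is None:
            return None
        subs += m
    return subs

def label(tree):
    """Map each register class to the min cost of deriving `tree` to it."""
    costs = {}
    if tree[0] == "TEMP":
        costs[tree[2]] = 0                       # a temp lives in its class
    else:
        for result, pat, c in RULES:
            m = match(pat, tree)
            if m is None:
                continue
            total, ok = c, True
            for nt, sub in m:
                sub_costs = label(sub)
                if nt not in sub_costs:
                    ok = False
                    break
                total += sub_costs[nt]
            if ok:
                costs[result] = min(costs.get(result, total), total)
    for _ in CHAIN:                              # close under chain rules
        for dst, src, c in CHAIN:
            if src in costs:
                costs[dst] = min(costs.get(dst, costs[src] + c), costs[src] + c)
    return costs

t = ("MEM", ("+", ("TEMP", "p", "a"), ("CONST", 4)))
print(label(t))   # prints: {'d': 1, 'a': 2}  -- in d directly, in a via a move
```

The ambiguity mentioned above shows up as several rules matching the same node; the min over all of them picks the cheapest derivation per class, which is what BURG-style tools automate.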
CPU Architecture Issues
• RISC was mostly invented to fit well with modern code generation
• RISC features, good and bad:
• many registers (e.g., 32)
• every register can do everything (just one class)
• arithmetic operations only on registers (no MUL?)
• three-address instructions (flexible placement)
• just one memory addressing mode (M[reg + const])
• uniform instruction size (e.g., 32 bits)
• every instruction has a single effect/result
Summary
• IR nodes do one thing; instructions do many
• Tree patterns, tiles, 'leaves' of tiles
• Instruction selection: cover the IR tree with tiles
• Jouette architecture, instruction set
• Jouette statement tiles, expression tiles
• Example tilings
• Optimum vs. optimal tilings
• Algorithms: Maximal Munch; dynamic programming
• Tree grammars