Back-end missing pieces
Simone Campanoni
simonec@eecs.northwestern.edu
Instruction selection is part of the backend

IR -> Back-end -> Assembly
Back-end stages: Tracing and data layout -> Instruction selection -> Register allocation -> Code generation -> .....
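Register allocation after instruction selection

Tiling result: Optimum! Total cost: 5
  [Figure: expression tree with nodes move, mem, +=, *= and leaves v1, v2, v3, 0, 4, 5]

Instructions shown alongside the tree:
  v1 *= 4
  v2 <- v1
  v2 += 5
  v3 <- mem v1 0

After instruction selection:
  lea (5+%v1*4), %v2
  subq %v2, %v1
  movq 0(%v1), %v3

A register allocation:
  v1 -> rax
  v2 -> rbx
  v3 -> r8

After register allocation:
  lea (5+%rax*4), %rbx
  subq %rbx, %rax
  movq 0(%rax), %r8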
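The mapping step on this slide can be sketched as a plain substitution over the selected instructions. This is a minimal illustration, not the course's implementation: the function name `apply_allocation` and the string representation of instructions are assumptions made for the example, and the operand notation is copied from the slide as-is.

```python
import re

def apply_allocation(instructions, assignment):
    """Replace each virtual register (%v1, %v2, ...) with its assigned physical register."""
    def substitute(match):
        # match.group(1) is the virtual register name without the leading '%'
        return "%" + assignment[match.group(1)]
    return [re.sub(r"%(v\d+)", substitute, inst) for inst in instructions]

# Instructions after instruction selection, as on the slide
selected = [
    "lea (5+%v1*4), %v2",
    "subq %v2, %v1",
    "movq 0(%v1), %v3",
]

# The register allocation from the slide
assignment = {"v1": "rax", "v2": "rbx", "v3": "r8"}

allocated = apply_allocation(selected, assignment)
for inst in allocated:
    print(inst)
```

This only works when every variable gets a register; a variable assigned to the stack needs extra instructions.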
Register allocation after instruction selection

After instruction selection:
  lea (5+%v1*4), %v2
  subq %v2, %v1
  movq 0(%v1), %v3

A register allocation:
  v1 -> rax
  v2 -> rbx
  v3 -> stack O

After register allocation (%r10 is a temporary register):
  lea (5+%rax*4), %rbx
  subq %rbx, %rax
  movq 0(%rax), %r10
  movq %r10, O(%rsp)
Register allocation after instruction selection

After instruction selection:
  lea (5+%v1*4), %v2
  subq %v2, %v1
  movq 0(%v1), %v3
  movq %v3, %v4

A register allocation:
  v1 -> rax
  v2 -> rbx
  v3 -> stack O
  v4 -> r8

After register allocation:
  lea (5+%rax*4), %rbx
  subq %rbx, %rax
  movq 0(%rax), %r10
  movq %r10, O(%rsp)
  movq O(%rsp), %r8

Wait, I thought we found the optimum …
Peephole matching
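The spill handling on the last two slides can be sketched as follows. This is a simplified illustration under stated assumptions: instructions are hypothetical `(opcode, source, destination)` tuples, the name `lower` is made up for the example, `%r10` is the temporary register as on the slides, and a spilled destination is assumed to be write-only (true for `movq`, not for `subq`).

```python
import re

TEMP = "%r10"  # temporary register reserved for spill code, as on the slides

def lower(instructions, registers, stack_slots):
    """Map virtual registers to physical ones, adding spill code for stack variables.

    instructions: list of (opcode, source, destination) operand strings
    registers:    virtual register name -> physical register name
    stack_slots:  virtual register name -> stack operand such as "O(%rsp)"
    Assumption: a spilled destination is only written, never read.
    """
    def to_physical(operand):
        return re.sub(r"%(v\d+)", lambda m: "%" + registers[m.group(1)], operand)

    out = []
    for op, src, dst in instructions:
        src_var, dst_var = src.lstrip("%"), dst.lstrip("%")
        if src_var in stack_slots:
            src = stack_slots[src_var]      # read the spilled value from its slot
        else:
            src = to_physical(src)
        if dst_var in stack_slots:
            out.append(f"{op} {src}, {TEMP}")                    # compute into the temporary
            out.append(f"movq {TEMP}, {stack_slots[dst_var]}")   # then store it to the slot
        else:
            out.append(f"{op} {src}, {to_physical(dst)}")
    return out

selected = [
    ("lea", "(5+%v1*4)", "%v2"),
    ("subq", "%v2", "%v1"),
    ("movq", "0(%v1)", "%v3"),
    ("movq", "%v3", "%v4"),
]
registers = {"v1": "rax", "v2": "rbx", "v4": "r8"}
stack_slots = {"v3": "O(%rsp)"}  # "O" is the symbolic stack offset used on the slides

lowered = lower(selected, registers, stack_slots)
for inst in lowered:
    print(inst)
```

Because each instruction is rewritten in isolation, the store to `O(%rsp)` is immediately followed by a load from the same location, which is the redundancy the slide points at.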
Instruction selection is part of the backend

IR -> Back-end -> Assembly
Back-end stages: Tracing and data layout -> Instruction selection -> Register allocation -> Code generation
Added stage: Peephole matching
Peephole matching
• Basic idea: the compiler can discover improvements by examining the code locally
• Look at a small set of adjacent operations
• Move a “peephole” over the code and search for improvements
• Example: store followed by load

  Before:
    movq %r10, O(%rsp)
    movq O(%rsp), %r8

  After peephole matching:
    movq %r10, O(%rsp)
    movq %r10, %r8
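The store-followed-by-load pattern above can be sketched as a two-instruction window sliding over the code. This is a minimal illustration, assuming instructions are hypothetical `(opcode, source, destination)` tuples and that any operand starting with `%` is a register; the function names are made up for the example.

```python
def is_register(operand):
    """Assumption for this sketch: registers start with '%', memory operands do not."""
    return operand.startswith("%")

def peephole_store_load(instructions):
    """Rewrite 'movq R, M ; movq M, R2' into 'movq R, M ; movq R, R2'."""
    out = list(instructions)
    for i in range(len(out) - 1):
        first, second = out[i], out[i + 1]  # the two-instruction peephole
        if len(first) != 3 or len(second) != 3:
            continue
        op1, src1, dst1 = first
        op2, src2, dst2 = second
        is_store = op1 == "movq" and is_register(src1) and not is_register(dst1)
        is_reload = op2 == "movq" and src2 == dst1 and is_register(dst2)
        if is_store and is_reload:
            # The value is still in src1, so copy it register to register
            out[i + 1] = ("movq", src1, dst2)
    return out

before = [
    ("movq", "%r10", "O(%rsp)"),
    ("movq", "O(%rsp)", "%r8"),
]
after = peephole_store_load(before)
for op, src, dst in after:
    print(f"{op} {src}, {dst}")
```

The store itself is kept, since later code may still read the stack location; only the reload is replaced.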
Are we happy now with the generated assembly? Of course NOT!
The problem left

Before instruction scheduling:
  lea (5+%rax*4), %rbx
  subq %rbx, %rax
  movq 0(%rax), %r10
  movq %r10, O(%rsp)
  movq %r10, %r8
  subq %r9, %r10
  movq %r10, 0(%r11)

After instruction scheduling (better schedule of instructions):
  lea (5+%rax*4), %rbx
  subq %r9, %r10
  subq %rbx, %rax
  movq %r10, 0(%r11)
  movq 0(%rax), %r10
  movq %r10, O(%rsp)
  movq %r10, %r8
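A scheduler may only place one instruction ahead of another when neither depends on the other. A minimal sketch of that check, assuming each instruction is a hypothetical dictionary whose read and write register sets were filled in by hand; condition flags and memory accesses are deliberately ignored here, which a real scheduler could not do.

```python
def independent(first, second):
    """Return True if the two instructions can be reordered (registers only)."""
    read_after_write = first["writes"] & second["reads"]
    write_after_read = first["reads"] & second["writes"]
    write_after_write = first["writes"] & second["writes"]
    return not (read_after_write or write_after_read or write_after_write)

# Read/write sets written by hand for three instructions from the slide
lea = {"text": "lea (5+%rax*4), %rbx", "reads": {"rax"}, "writes": {"rbx"}}
sub_a = {"text": "subq %rbx, %rax", "reads": {"rbx", "rax"}, "writes": {"rax"}}
sub_b = {"text": "subq %r9, %r10", "reads": {"r9", "r10"}, "writes": {"r10"}}

print(independent(lea, sub_a))    # subq reads %rbx, which lea writes
print(independent(sub_a, sub_b))  # the two subq instructions touch disjoint registers
```

Under these assumptions, `subq %r9, %r10` can be placed between the `lea` and `subq %rbx, %rax`, which is what lets it fill the wait for the `lea` result.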
Instruction selection is part of the backend

IR -> Back-end -> Assembly
Back-end stages: Tracing and data layout -> Instruction selection -> Register allocation -> Code generation
Added stages: Peephole matching, Instruction scheduling