Higgs, an Experimental JIT Compiler written in D
DConf 2013
Maxime Chevalier-Boisvert
Université de Montréal
Introduction
● PhD research: compilers, optimizing dynamic languages, type analysis, JIT compilation
● Higgs: experimental optimizing JIT for JS
● The core of Higgs is written in D
● This talk will be about
  ● Dynamic language optimization
  ● Higgs, JIT compilation, my research
  ● Experience implementing a JIT in D
  ● A JIT for D's CTFE
Dynamic Languages
● Dynamic typing
  ● Types associated with values
  ● Variables can change type over time
  ● No type annotations
● Late binding
  ● Symbols resolved dynamically (e.g.: globals)
  ● Dynamic loading of code (eval, load)
● Dynamic growth of objects
  ● Objects as dictionaries
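As an illustration, the following plain JavaScript (runnable in Node.js) exercises each feature listed above. The names greet, callGreet, square and point are made up for the example.

```javascript
// Dynamic typing: the type belongs to the value, not to the variable.
let v = 42;
const typeBefore = typeof v; // "number"
v = "forty-two";
const typeAfter = typeof v;  // "string"

// Late binding: the global is looked up at call time, so redefining it
// changes the behavior of code that was written earlier.
globalThis.greet = function () { return "hello"; };
function callGreet() { return globalThis.greet(); }
const greetingBefore = callGreet(); // "hello"
globalThis.greet = function () { return "bonjour"; };
const greetingAfter = callGreet();  // "bonjour"

// Dynamic loading of code: a function compiled from a string at run time.
const square = new Function("n", "return n * n;");

// Dynamic growth: objects behave like dictionaries whose keys can be
// added or deleted after creation.
const point = { x: 1 };
point.y = 2;
point["z" + 1] = 3; // computed key "z1"
delete point.x;
const pointKeys = Object.keys(point); // ["y", "z1"]
```

Each of these defeats a purely static view of the program: a compiler cannot assume v is a number, that greet keeps its definition, or that point has a fixed layout.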
Why so Slow?
● Reputation for being slow
● Easiest to implement in an interpreter
● Naive implementations have big overhead
  ● Values are usually “boxed”
    – Values as pairs: datum + type tag
    – Values as objects: CPython's numbers
  ● Basic operators (+, -, *, ...) have dynamic dispatch
  ● Global and field accesses as hash table lookups
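To make the overhead concrete, here is a minimal sketch of a naive boxed representation, assuming values are { tag, datum } pairs as in "Values as pairs: datum + type tag". The tag constants and the box, boxedAdd and getField helpers are hypothetical, not taken from any particular implementation. Note how a single + needs several run-time tag tests, and how every field read becomes a hash table (Map) lookup.

```javascript
// Hypothetical tags: every value is a { tag, datum } pair allocated on the heap.
const TAG_INT = 0, TAG_FLOAT = 1, TAG_STRING = 2, TAG_UNDEF = 3;

function box(tag, datum) {
  return { tag, datum };
}

// Every '+' inspects both tags at run time before doing any arithmetic.
function boxedAdd(a, b) {
  if (a.tag === TAG_STRING || b.tag === TAG_STRING) {
    return box(TAG_STRING, String(a.datum) + String(b.datum));
  }
  const sum = a.datum + b.datum;
  // Integer result only if both inputs are ints and the sum fits in int32.
  if (a.tag === TAG_INT && b.tag === TAG_INT && (sum | 0) === sum) {
    return box(TAG_INT, sum);
  }
  return box(TAG_FLOAT, sum);
}

// Objects as hash tables: every field read is a lookup by name.
function getField(obj, name) {
  return obj.has(name) ? obj.get(name) : box(TAG_UNDEF, undefined);
}

const obj = new Map([["x", box(TAG_INT, 3)]]);
const result = boxedAdd(getField(obj, "x"), box(TAG_FLOAT, 0.5)); // tag TAG_FLOAT, datum 3.5
```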
Making it Fast
● Make the code more static
● Remove dynamic behavior where possible
● Requires type information
  ● Profiling
  ● Type analysis
● Prove that specific variables have a given type
  ● e.g.: x is always an integer
  ● e.g.: the function foo will never be redefined
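A minimal sketch of profiling-driven specialization, assuming a per-site type profile. TypeProfile, tagOf, genericIsEven and specializeIsEven are hypothetical names. Because profiling only observes past behavior, the specialized code keeps a cheap guard; a type analysis that proves x is always an int32 would let the compiler drop that guard as well.

```javascript
// Classify a value into the coarse types a profiler might track.
function tagOf(v) {
  if (typeof v === "number") {
    return (v | 0) === v && !Object.is(v, -0) ? "int32" : "float";
  }
  return typeof v;
}

// Hypothetical per-site profile: how often each type was observed.
class TypeProfile {
  constructor() {
    this.counts = new Map();
  }
  record(value) {
    const t = tagOf(value);
    this.counts.set(t, (this.counts.get(t) || 0) + 1);
  }
  // The single observed type, or null if the site saw several (or none).
  monomorphicType() {
    return this.counts.size === 1 ? [...this.counts.keys()][0] : null;
  }
}

// Full JS semantics: works for any value.
function genericIsEven(x) {
  return x % 2 === 0;
}

// If profiling only ever saw int32, build a specialized version.
// Profiling is evidence, not proof, so the fast path keeps a guard.
function specializeIsEven(profile) {
  if (profile.monomorphicType() !== "int32") return genericIsEven;
  return function isEvenInt32(x) {
    if (tagOf(x) !== "int32") return genericIsEven(x); // guard failed: slow path
    return (x & 1) === 0;                            // single bit test
  };
}
```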
Harder than it seems
● JS, Python, Ruby not designed with performance in mind
  ● Python: (re)write critical parts in C
● Dynamic code loading, eval
  ● Can break your assumptions
● Numerical towers, overflow checks
  ● Hard to prove overflows won't happen
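The overflow check itself can be sketched as follows, assuming both operands are already known to be int32 values. addInt32Checked and sumArray are hypothetical helpers; the sign-bit test is a standard way to detect signed overflow after a wrapping add.

```javascript
const INT32_MIN = -2147483648;
const INT32_MAX = 2147483647;

// Add two int32 values the way compiled code might: a wrapping 32-bit add
// followed by an overflow check. On overflow, JS semantics require the
// exact result, so we fall back to a float (double).
function addInt32Checked(a, b) {
  const sum = (a + b) | 0; // wraps modulo 2^32, like a machine ADD
  // Signed overflow iff a and b have the same sign and sum's sign differs.
  const overflowed = ((a ^ sum) & (b ^ sum)) < 0;
  if (overflowed) return { type: "float", value: a + b }; // exact as a double
  return { type: "int32", value: sum };
}

// Why proving "no overflow" is hard: whether each partial sum stays in
// int32 range depends on run-time data, not on the code alone.
function sumArray(arr) {
  let acc = { type: "int32", value: 0 };
  for (const x of arr) {
    acc = acc.type === "int32"
      ? addInt32Checked(acc.value, x)
      : { type: "float", value: acc.value + x };
  }
  return acc;
}
```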
Higgs
● Two main components:
  ● Interpreter
  ● JIT compiler
● Moderate complexity:
  ● D: ~23 KLOC
  ● JS: ~11 KLOC
  ● Python: ~2 KLOC
● JS support:
  ● ~ES5, no property attributes, no with statement
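[Figure: Higgs architecture. Source -> Lexer -> Tokens -> Parser -> AST -> IR gen -> IR (CFG) -> Interpreter. The runtime and stdlib are JS source that goes through the same front end. The interpreter collects profiling data, which the JIT uses, together with the x86 assembler (ASM), to generate machine code.]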
Building Higgs
● Lexer and parser written from scratch, in D
● Designed IR, began implementing AST->IR
● Began implementing basic interpreter
● Grew interpreter, runtime to cover more JS
● Built an x86 assembler, in D
● Implemented basic JIT compiler
● Currently:
  ● Implementing research ideas into JIT
  ● Icing on the cake: FFI, library support
● Added new unit tests at every step
The Interpreter
● Interpreter is used:
  ● For profiling
  ● As a fallback for unimplemented JIT features
  ● To start executing code faster
● Designed to be:
  ● Simple, easy to maintain
  ● Quick to extend and experiment with
  ● "JIT-friendly"
● Interpreter is quite slow, 1000 cycles/instr
[Figure: Higgs interpreter state. An instruction pointer (ip) points into a sequence of IRInstr objects; wsp and tsp point into the word and type stacks; alloc and limit delimit the free space in the heap.]
JIT-Friendly
● Register-based VM, not stack-based
  ● Easier to analyze/optimize
● IR based on a control-flow graph, not AST
  ● Closer to machine code
  ● Easier to reason about
● Interpreter stack is an array of values/words
  ● Directly reused by the JIT
  ● Not recursive
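To show what "register-based" means, here is a toy VM; it is not Higgs's actual instruction set, and the opcodes and slot layout are assumptions. Instructions name their operands by index into a flat array of slots, so a frame is just a contiguous range of values, the kind of layout compiled code can also read and write directly.

```javascript
// Toy register VM: operands are indices into a flat array of slots,
// instead of values pushed on and popped off an operand stack.
function runRegisterVM(code, slots) {
  let ip = 0;
  for (;;) {
    const ins = code[ip++];
    switch (ins.op) {
      case "const":  slots[ins.dst] = ins.value; break;
      case "add":    slots[ins.dst] = slots[ins.a] + slots[ins.b]; break;
      case "lt":     slots[ins.dst] = slots[ins.a] < slots[ins.b]; break;
      case "jmp":    ip = ins.target; break;
      case "jfalse": if (!slots[ins.a]) ip = ins.target; break;
      case "ret":    return slots[ins.a];
      default:       throw new Error("unknown op: " + ins.op);
    }
  }
}

// sum = 0 + 1 + ... + (n - 1). Slots: 0 = n, 1 = i, 2 = sum, 3 = one, 4 = cond.
const SUM_BELOW = [
  { op: "const", dst: 1, value: 0 },  // 0: i = 0
  { op: "const", dst: 2, value: 0 },  // 1: sum = 0
  { op: "const", dst: 3, value: 1 },  // 2: one = 1
  { op: "lt", dst: 4, a: 1, b: 0 },   // 3: cond = i < n
  { op: "jfalse", a: 4, target: 8 },  // 4: if !cond goto 8
  { op: "add", dst: 2, a: 2, b: 1 },  // 5: sum = sum + i
  { op: "add", dst: 1, a: 1, b: 3 },  // 6: i = i + one
  { op: "jmp", target: 3 },           // 7: goto 3
  { op: "ret", a: 2 },                // 8: return sum
];

function sumBelow(n) {
  const slots = [n, 0, 0, 0, false]; // the whole frame is one array
  return runRegisterVM(SUM_BELOW, slots);
}
```

A stack VM would need separate push, add and pop steps for each addition; here one instruction reads two slots and writes a third, which maps more directly onto machine instructions.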
fib(n)
ENTRY:
    if (n < 2) goto BASE else REC
BASE:
    return n
REC:
    t0 = n - 1
    t1 = call fib(t0), return to CONT1
CONT1:
    t2 = n - 2
    t3 = call fib(t2), return to CONT2
CONT2:
    t4 = t1 + t3
    return t4
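This control-flow graph can be executed without host-level recursion. In the sketch below, block and variable names follow the fib(n) graph, while the instruction encoding and runFib are hypothetical. Frames live on an explicit stack array: a call records the continuation block (CONT1 or CONT2) and the destination variable in the caller's frame, then pushes a new frame; a return pops it and resumes the caller at its continuation.

```javascript
// The fib CFG as data. "call" names the continuation block where the
// caller resumes once the callee returns.
const FIB_CFG = {
  ENTRY: [{ op: "ifLt", a: "n", k: 2, ifTrue: "BASE", ifFalse: "REC" }],
  BASE: [{ op: "ret", a: "n" }],
  REC: [
    { op: "sub", dst: "t0", a: "n", k: 1 },
    { op: "call", dst: "t1", arg: "t0", cont: "CONT1" },
  ],
  CONT1: [
    { op: "sub", dst: "t2", a: "n", k: 2 },
    { op: "call", dst: "t3", arg: "t2", cont: "CONT2" },
  ],
  CONT2: [
    { op: "add", dst: "t4", a: "t1", b: "t3" },
    { op: "ret", a: "t4" },
  ],
};

// Control instructions (ifLt, call, ret) are always last in their block.
function runFib(n) {
  const stack = [{ vars: { n }, block: "ENTRY", retDst: null }];
  for (;;) {
    const frame = stack[stack.length - 1];
    const v = frame.vars;
    let target = null; // next block within this frame, if a branch chose one
    for (const ins of FIB_CFG[frame.block]) {
      switch (ins.op) {
        case "sub": v[ins.dst] = v[ins.a] - ins.k; break;
        case "add": v[ins.dst] = v[ins.a] + v[ins.b]; break;
        case "ifLt": target = v[ins.a] < ins.k ? ins.ifTrue : ins.ifFalse; break;
        case "call":
          // Remember where to resume and where the result goes; push the callee.
          frame.block = ins.cont;
          frame.retDst = ins.dst;
          stack.push({ vars: { n: v[ins.arg] }, block: "ENTRY", retDst: null });
          break;
        case "ret": {
          const result = v[ins.a];
          stack.pop();
          if (stack.length === 0) return result;
          const caller = stack[stack.length - 1];
          caller.vars[caller.retDst] = result; // caller resumes at its continuation
          break;
        }
      }
    }
    if (target !== null) frame.block = target;
  }
}
```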
Low-level Instructions
● Higgs interprets a low-level IR
● Simplifies the interpreter
  ● Deals with simple, low-level ops
    – e.g.: imul, fmul, load, store, call, ret
  ● Knows little about JS semantics
● Simplifies the JIT
  ● Less duplicated functionality in interpreter and JIT
  ● Avoids implicit dynamic dispatch in IR ops
    – e.g.: the + operator in JS has lots of implicit branches!
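To see why the JS + operator is a poor fit for a single IR op, here is a simplified sketch of the decisions x + y has to make at run time. jsAddSketch and toPrimitiveSketch are hypothetical and deliberately incomplete: toPrimitiveSketch only approximates the specification's ToPrimitive (it ignores Symbol.toPrimitive and the string hint used by Date objects), and BigInt and Symbol operands are not handled.

```javascript
// Rough approximation of ToPrimitive with no hint: try valueOf, then toString.
function toPrimitiveSketch(v) {
  const isPrimitive = (p) => p === null || (typeof p !== "object" && typeof p !== "function");
  if (isPrimitive(v)) return v;
  const viaValueOf = v.valueOf(); // may run arbitrary user code
  if (isPrimitive(viaValueOf)) return viaValueOf;
  return v.toString();            // may also run arbitrary user code
}

// The implicit branches behind a single JS x + y.
function jsAddSketch(x, y) {
  // Common case: two numbers. A real engine splits this further into
  // int32 + int32 (with an overflow check) and float paths.
  if (typeof x === "number" && typeof y === "number") return x + y;
  // Objects are converted to primitives first.
  const px = toPrimitiveSketch(x);
  const py = toPrimitiveSketch(y);
  // If either side is now a string, + means concatenation.
  if (typeof px === "string" || typeof py === "string") return String(px) + String(py);
  // Otherwise both sides are converted to numbers.
  return Number(px) + Number(py);
}
```

Expressing these branches as ordinary code built from low-level ops, rather than hiding them inside one IR instruction, lets the compiler see and remove the ones that cannot be taken.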
Self-hosting
● Runtime and standard library are self-hosted
● JS primitives (e.g.: JS add operator) are implemented in an extended dialect of JS
  ● Exposes low-level operations
● Primitives are compiled/inlined/optimized like any other JS code
  ● Avoids opaque calls into C or D code
● Easy to extend/change runtime
● Higher compilation times
  ● Inlining is critical
// JS less-than operator (x < y)
function $rt_lt(x, y) {
    // If x is integer
    if ($ir_is_int32(x)) {
        if ($ir_is_int32(y))
            return $ir_lt_i32(x, y);
        if ($ir_is_float(y))
            return $ir_lt_f64($ir_i32_to_f64(x), y);
    }

    // If x is float
    if ($ir_is_float(x)) {
        if ($ir_is_int32(y))
            return $ir_lt_f64(x, $ir_i32_to_f64(y));
        if ($ir_is_float(y))
            return $ir_lt_f64(x, y);
    }

    …
}
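The $ir_ primitives only exist inside Higgs. To experiment with the same dispatch structure in ordinary JavaScript, the sketch below substitutes hypothetical stand-ins (isInt32, isFloat, ltI32, ltF64, i32ToF64) and replaces the elided remainder of $rt_lt with the host's own < operator. It also shows why inlining matters: once the primitive is inlined where x and y are known to be int32, every type test folds away.

```javascript
// Hypothetical stand-ins for the $ir_ primitives. Plain JS has a single
// number type, so the int32/float split is approximated by value.
function isInt32(v) {
  return typeof v === "number" && (v | 0) === v && !Object.is(v, -0);
}
function isFloat(v) {
  return typeof v === "number" && !isInt32(v);
}
function ltI32(a, b) { return a < b; } // stands for one integer compare
function ltF64(a, b) { return a < b; } // stands for one float compare
function i32ToF64(a) { return a; }     // stands for an int-to-double conversion

// Same structure as $rt_lt.
function rtLtSketch(x, y) {
  if (isInt32(x)) {
    if (isInt32(y)) return ltI32(x, y);
    if (isFloat(y)) return ltF64(i32ToF64(x), y);
  }
  if (isFloat(x)) {
    if (isInt32(y)) return ltF64(x, i32ToF64(y));
    if (isFloat(y)) return ltF64(x, y);
  }
  // Non-number operands: delegated to the host's < in this sketch,
  // since the original elides this part.
  return x < y;
}

// After inlining at a site where x and y are known to be int32, the
// type tests are dead and only the integer compare remains.
function ltKnownInt32(x, y) {
  return ltI32(x, y);
}
```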
The Higgs Heap
● Higgs manages its own heap for JS objects
● GC is copying, semi-space, stop-the-world
  ● Extremely simple
  ● Allocation by incrementing a pointer
● References to D objects must be maintained
  ● i.e.: Function IR/AST
● Interpreter manipulates references to JS heap
  ● Higgs GC might invalidate these
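A minimal semi-space copying collector in the same spirit, assuming a heap of plain words where each object is a header (its field count) followed by its fields, and references are { ref: address } values. SemiSpaceHeap and its methods are hypothetical and use Cheney's breadth-first scan, which may differ from Higgs's actual traversal. The roots array stands in for the interpreter's references: a reference not registered there is left stale after a collection, which is the hazard the last bullet describes.

```javascript
// Toy semi-space heap. An object occupies consecutive words:
// [header = number of fields, field0, field1, ...]. A field holds either a
// plain number or a reference of the form { ref: address }.
class SemiSpaceHeap {
  constructor(sizeInWords) {
    this.size = sizeInWords;
    this.from = new Array(sizeInWords).fill(undefined); // current space
    this.to = new Array(sizeInWords).fill(undefined);   // reserve space
    this.alloc = 0;  // bump pointer: next free word in from-space
    this.roots = []; // references held outside the heap (e.g. by the interpreter)
  }

  // Allocation just increments a pointer; collect only when space runs out.
  allocate(numFields) {
    const words = 1 + numFields;
    if (this.alloc + words > this.size) {
      this.collect();
      if (this.alloc + words > this.size) throw new Error("out of memory");
    }
    const addr = this.alloc;
    this.from[addr] = numFields;
    for (let i = 1; i <= numFields; i++) this.from[addr + i] = 0;
    this.alloc += words;
    return { ref: addr };
  }

  getField(obj, i) {
    return this.from[obj.ref + 1 + i];
  }

  setField(obj, i, value) {
    this.from[obj.ref + 1 + i] = value;
  }

  // Stop-the-world copying collection (Cheney's algorithm).
  collect() {
    let free = 0; // bump pointer in to-space
    const copy = (value) => {
      if (typeof value !== "object" || value === null) return value; // not a reference
      const old = value.ref;
      const header = this.from[old];
      if (typeof header === "object") return { ref: header.forward }; // already copied
      for (let i = 0; i <= header; i++) this.to[free + i] = this.from[old + i];
      const addr = free;
      free += 1 + header;
      this.from[old] = { forward: addr }; // forwarding pointer for later visitors
      return { ref: addr };
    };
    // Copy everything directly reachable from the roots...
    this.roots = this.roots.map(copy);
    // ...then scan the copied objects, copying whatever their fields reference.
    let scan = 0;
    while (scan < free) {
      const n = this.to[scan];
      for (let i = 1; i <= n; i++) this.to[scan + i] = copy(this.to[scan + i]);
      scan += 1 + n;
    }
    // Swap spaces; anything left behind in the old space is garbage.
    [this.from, this.to] = [this.to, this.from];
    this.to.fill(undefined);
    this.alloc = free;
  }
}
```

After a collection, code must re-read its references from heap.roots; a local variable still holding an old { ref } points into the discarded space.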
[Figure: a closure object in the Higgs heap references an IRFunction in the D heap, which in turn holds its IRInstr objects; the interpreter keeps track of the live functions.]
The JIT Compiler
● Targets x86-64 only, for simplicity
● Kicks in once functions have been found hot enough (worth compiling)
  ● Execution counters on basic blocks
● Currently fairly basic
  ● No inlining, bulk of code is function calls
● Speedups of 5 to 20x
  ● Expected to soon reach 100x+ speedups
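The counter-based trigger can be sketched as follows. makeTiered, the threshold of 100 and the sum example are assumptions for illustration, not Higgs's actual policy. In this sketch the activation that crosses the threshold finishes in the interpreter; only later calls run the compiled version.

```javascript
// Wraps an interpreted function with per-block execution counters. Once any
// block has run `threshold` times, the function is compiled (once).
function makeTiered(interpret, compile, threshold) {
  const counters = new Map(); // basic block id -> execution count
  let compiled = null;

  // Called by the interpreter each time it enters a basic block.
  function countBlock(blockId) {
    const count = (counters.get(blockId) || 0) + 1;
    counters.set(blockId, count);
    if (count >= threshold && compiled === null) compiled = compile();
  }

  return function (...args) {
    if (compiled !== null) return compiled(...args); // fast path: compiled code
    return interpret(countBlock, ...args);           // slow path: interpreter
  };
}

// Example: sum of 0..n-1, with ENTRY and LOOP as its "basic blocks".
function sumInterpreted(countBlock, n) {
  countBlock("ENTRY");
  let s = 0;
  for (let i = 0; i < n; i++) {
    countBlock("LOOP");
    s += i;
  }
  return s;
}

let compileCount = 0;
// Stand-in for JIT output: the same loop, without counters.
function compileSum() {
  compileCount++;
  return function (n) {
    let s = 0;
    for (let i = 0; i < n; i++) s += i;
    return s;
  };
}

const tieredSum = makeTiered(sumInterpreted, compileSum, 100);
```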
Current Research
● Context-driven basic block versioning
  ● Similar idea to procedure cloning
● Specializing based on:
  ● Low-level type information
  ● Register allocation state
  ● Accumulated facts
● Integrating this in the JIT
● Similarities with trace compilation
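for (i = 0; i < k; ++i) {
    x = f1(x,y,z);
    y = f2(x,y,z);
    z = f3(x,y,z);
}
[Figure: control-flow graph of this loop. LOOP_TEST (i < k) branches to LOOP_BODY (x = f1(x,y,z); y = f2(x,y,z); z = f3(x,y,z);) or to LOOP_EXIT; LOOP_BODY falls through to LOOP_INCR (++i), which jumps back to LOOP_TEST.]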
[Figure: the same loop's CFG (LOOP_TEST, LOOP_BODY, LOOP_INCR, LOOP_EXIT), annotated with the register allocation state on entry to LOOP_TEST: x: RAX, y: RCX, z: stack slot 10, i: R9.]
[Figure: the same CFG, now also showing the state left after LOOP_BODY and carried through LOOP_INCR back to LOOP_TEST: x: RBX, y: R11, z: stack slot 12, i: R9. On entry, LOOP_TEST holds x: RAX, y: RCX, z: stack slot 10, i: R9.]
[Figure: the same CFG; moves inserted on the back edge to LOOP_TEST convert the second state (x: RBX, y: R11, z: stack slot 12) back into the one LOOP_TEST expects (x: RAX, y: RCX, z: stack slot 10); i stays in R9:]
    mov RAX, RBX
    mov RCX, R11
    mov RSI, [RSP + 12 * 8]
    mov [RSP + 10 * 8], RSI
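These back-edge moves can be derived mechanically from the two allocation states. The sketch below assumes locations are register names or "slot:N" stack slots, that RSI is free for memory-to-memory moves (as in the moves above), and that RDI is free for breaking cycles; reconcileMoves is a hypothetical helper implementing a standard parallel-move sequentialization. For the loop's two states it produces exactly the four instructions listed.

```javascript
// Locations are register names ("RAX") or stack slots ("slot:10").
const SCRATCH_MEM = "RSI";   // assumed free, for memory-to-memory moves
const SCRATCH_CYCLE = "RDI"; // assumed free, to break move cycles

function isMem(loc) {
  return loc.startsWith("slot:");
}
function fmt(loc) {
  return isMem(loc) ? `[RSP + ${loc.slice(5)} * 8]` : loc;
}

// Moves that turn fromState (variable -> location) into toState.
function reconcileMoves(fromState, toState) {
  const out = [];
  const emit = (dst, src) => {
    if (isMem(dst) && isMem(src)) {
      // x86 has no memory-to-memory mov: go through a scratch register.
      out.push(`mov ${SCRATCH_MEM}, ${fmt(src)}`, `mov ${fmt(dst)}, ${SCRATCH_MEM}`);
    } else {
      out.push(`mov ${fmt(dst)}, ${fmt(src)}`);
    }
  };
  let pending = Object.keys(toState)
    .map((v) => ({ dst: toState[v], src: fromState[v] }))
    .filter((m) => m.dst !== m.src);
  while (pending.length > 0) {
    // A move is safe if no other pending move still needs to read its destination.
    const i = pending.findIndex((m) => !pending.some((o) => o !== m && o.src === m.dst));
    if (i >= 0) {
      emit(pending[i].dst, pending[i].src);
      pending.splice(i, 1);
    } else {
      // Only cycles remain: save one destination's value, redirect its readers.
      const d = pending[0].dst;
      emit(SCRATCH_CYCLE, d);
      pending = pending.map((m) => (m.src === d ? { dst: m.dst, src: SCRATCH_CYCLE } : m));
    }
  }
  return out;
}
```

Usage: reconcileMoves({ x: "RBX", y: "R11", z: "slot:12", i: "R9" }, { x: "RAX", y: "RCX", z: "slot:10", i: "R9" }) yields the four moves; swapping two registers yields three moves through RDI.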
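[Figure: block versioning of the loop, with two allocation states. The original LOOP_TEST, LOOP_BODY and LOOP_INCR are compiled for x: RAX, y: RCX, z: stack slot 10, i: R9; new versions LOOP_TEST_V2, LOOP_BODY_V2 and LOOP_INCR_V2 are compiled for x: RBX, y: R11, z: stack slot 12, i: R9, so the back edge can jump to a matching version instead of executing moves. Both versions lead to LOOP_EXIT.]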
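A minimal sketch of the bookkeeping behind block versioning, assuming the context is just the register allocation state and that the loop body always leaves the second state (x: RBX, y: R11, z: stack slot 12, i: R9) on the back edge. ctxKey, VersionCache and simulateLoop are hypothetical; the per-block version limit models one way of bounding code blowup.

```javascript
// Canonical string for a context (here: variable -> location map).
function ctxKey(ctx) {
  return Object.keys(ctx).sort().map((v) => `${v}:${ctx[v]}`).join(",");
}

// Versions of each block, keyed by the context they were compiled for.
class VersionCache {
  constructor(maxVersionsPerBlock) {
    this.max = maxVersionsPerBlock;
    this.versions = new Map(); // block name -> Map(context key -> version label)
  }

  // Returns the version of `block` specialized for `ctx`, creating it on
  // demand. Past the limit it returns null; a real compiler would then
  // reuse an existing version and insert moves instead (not modeled here).
  get(block, ctx) {
    if (!this.versions.has(block)) this.versions.set(block, new Map());
    const byCtx = this.versions.get(block);
    const key = ctxKey(ctx);
    if (byCtx.has(key)) return byCtx.get(key);
    if (byCtx.size >= this.max) return null;
    const label = byCtx.size === 0 ? block : `${block}_V${byCtx.size + 1}`;
    byCtx.set(key, label);
    return label;
  }
}

const STATE_A = { x: "RAX", y: "RCX", z: "slot:10", i: "R9" }; // on loop entry
const STATE_B = { x: "RBX", y: "R11", z: "slot:12", i: "R9" }; // on the back edge

// Walks the loop, asking for the block version matching the current state.
function simulateLoop(cache, iterations) {
  const trace = [];
  let ctx = STATE_A;
  for (let n = 0; n < iterations; n++) {
    for (const block of ["LOOP_TEST", "LOOP_BODY", "LOOP_INCR"]) {
      trace.push(cache.get(block, ctx));
    }
    ctx = STATE_B; // assumption: the body always leaves state B behind
  }
  return trace;
}
```

The first iteration runs the original blocks and every later one runs the _V2 versions, which is the loop peeling effect listed among the advantages.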
Advantages
● Automatically do loop peeling (when useful)
● Automatically do tail duplication
● Register allocation
  ● Fewer move operations
  ● Make simpler allocators more efficient
● Similar to trace compilation
  ● Accumulate knowledge
  ● Specialize based on types, constants
A “Multi-world” View
● Traditional control-flow analysis
  ● Compute a fixed point (least or greatest: LFP or GFP)
  ● At each basic block, solution must agree
  ● Pessimistic answer agrees with all inputs
● Block versioning
  ● Multiple solutions possible for a block
  ● Don't necessarily have to sacrifice precision
  ● Shifting fixed point to versioning of blocks
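The difference can be shown on a single merge point. The sketch assumes a tiny type lattice (int32 and float below number, number below any); join, analyzeMergeTraditional and analyzeMergeVersioned are hypothetical names. With one int32 and one float predecessor, the traditional analysis must settle on number for the whole block, while versioning keeps one precise version per incoming fact.

```javascript
// Tiny type lattice for one variable: int32 and float lie below number,
// and everything lies below any.
function join(a, b) {
  if (a === b) return a;
  const numeric = new Set(["int32", "float", "number"]);
  if (numeric.has(a) && numeric.has(b)) return "number";
  return "any";
}

// Traditional analysis: a block gets a single fact, the join of all
// incoming facts, so it must be correct for every predecessor at once.
function analyzeMergeTraditional(incoming) {
  return [incoming.reduce(join)];
}

// Block versioning: one version of the block per distinct incoming fact,
// each of which can be compiled with precise type information.
function analyzeMergeVersioned(incoming) {
  return [...new Set(incoming)];
}
```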
Research Questions
● How much code blowup can we expect?
  ● Will we have to limit block versioning?
  ● What can we do to reduce code blowup?
● What performance gains can we expect?
● What kind of info should we version with?
  ● Constant propagation
  ● Granularity of type info used
  ● How much is too much?
● What is the effect on compilation time?
Why did you choose D?
JIT Compilers
● Need access to low-level operations
  ● Manual memory management
  ● Raw memory access
  ● System libraries
● Are very complex pieces of software
  ● Pipeline of code transformations
  ● Several interacting components
● Want to mitigate complexity
  ● Expressive language
  ● Garbage collection
I like C++, but...
● C++ is very verbose
● Header files are frustrating
  ● Redundant declarations
  ● Poor organization of code
  ● Annoying constraints
● C macros are messy and weak
● C++ templates still feel limited
● No standard GC implementation
Other Options
● Google's Go
  ● No templates/generics
  ● No pointer arithmetic (without casting)
  ● Very minimalist and very opinionated
● Mozilla's Rust
  ● Very young, still in flux
  ● Not an option when I started
D to the rescue!
● Garbage collection by default
  ● But manual memory management is still possible
● Has been around for over a decade
  ● More mature than newer systems languages
● Attractive collection of features
  ● mixins, CTFE, templates, closures
● Freedom to choose
● Community is active, responsive
Recommend
More recommend