Pycket: A Tracing JIT for a Functional Language


  1. Pycket: A Tracing JIT for a Functional Language
     APLS, December 16, 2015
     Spenser Bauman¹, Carl Friedrich Bolz², Robert Hirschfeld³, Vasily Kirilichev³, Tobias Pape³, Jeremy G. Siek¹, Sam Tobin-Hochstadt¹
     ¹ Indiana University Bloomington, USA
     ² King’s College London, UK
     ³ Hasso-Plattner-Institut, University of Potsdam, Germany

  2. Problem: Racket is slow on generic code

     Generic code:

         (define (dot v1 v2)
           (for/sum ([e1 v1] [e2 v2])
             (* e1 e2)))

         (time (dot v1 v2)) ;; 3864 ms

     Hand optimized:

         (define (dot-fast v1 v2)
           (define len (flvector-length v1))
           (unless (= len (flvector-length v2))
             (error 'fail))
           (let loop ([n 0] [sum 0.0])
             (if (unsafe-fx= len n)
                 sum
                 (loop (unsafe-fx+ n 1)
                       (unsafe-fl+ sum (unsafe-fl* (unsafe-flvector-ref v1 n)
                                                   (unsafe-flvector-ref v2 n)))))))

         (time (dot-fast v1 v2)) ;; 268 ms

  3. Problem: Racket is slow on contracts

         (define/contract (dot-safe v1 v2)
           ((vectorof flonum?) (vectorof flonum?) . -> . flonum?)
           (for/sum ([e1 v1] [e2 v2])
             (* e1 e2)))

         (time (dot-safe v1 v2)) ;; 8888 ms

  4. Problem: Racket is slow with respect to gradual typing
     Is Sound Gradual Typing Dead? Takikawa et al. POPL 2016
     [Figure: four panels, one per benchmark, with overhead from 1x to 20x on the x-axis. Summary statistics per panel:]
     ▶ kcfa (7 modules): typed/untyped ratio 1.00x, max. overhead 22.67x, mean overhead 9.23x, 3-deliverable 32 (25%), 3/10-usable 48 (38%)
     ▶ tetris (9 modules): typed/untyped ratio 0.97x, max. overhead 117.28x, mean overhead 33.34x, 3-deliverable 128 (25%), 3/10-usable 0 (0%)
     ▶ snake (8 modules): typed/untyped ratio 0.92x, max. overhead 121.51x, mean overhead 32.30x, 3-deliverable 4 (2%), 3/10-usable 28 (11%)
     ▶ synth (10 modules): typed/untyped ratio 1.03x, max. overhead 85.90x, mean overhead 39.69x, 3-deliverable 15 (1%), 3/10-usable 73 (7%)

  5. Pycket is a tracing JIT compiler which reduces the need for manual specialization and reduces contract overhead.

         (time (dot v1 v2))      ;; 74 ms
         (time (dot-fast v1 v2)) ;; 74 ms (268 ms on Racket)
         (time (dot-safe v1 v2)) ;; 95 ms

  6. Pycket tames overhead from gradual typing
     [Figure: four panels (synth, snake, kcfa, tetris). x-axis: slowdown factor (0 to 20); y-axis: number below. Legend: racket, pycket, hidden.]

  7. Idea: Apply a dynamic-language JIT compiler to Racket
     Take: Racket. Apply: the RPython project.
     Racket + RPython = Pycket

  8. Background: Tracing JIT Compilation
     [Diagram: a program and its input enter the virtual machine, which interprets and profiles. A hot loop triggers tracing, optimization, and code generation, followed by native execution.]

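  9. Background: Tracing JIT Compilation
     [Diagram: a program with blocks A, B, C, D under interpreter execution, next to a trace through A, B, D. The trace contains a guard; a failing guard takes a side exit back to the interpreter.]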
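The guard and side-exit mechanism on this slide can be illustrated with a small Python sketch. It is not RPython's tracer: the block table `PROGRAM`, the `SideExit` exception, and the functions `record_trace` and `run_trace` are hypothetical names invented for this example. One loop iteration is recorded as a linear list of blocks, and each recorded branch outcome becomes a guard when the trace is replayed.

```python
class SideExit(Exception):
    """Raised when a guard fails; carries the block the interpreter resumes at."""
    def __init__(self, resume_block):
        super().__init__(resume_block)
        self.resume_block = resume_block

def block_a(s): s["i"] += 1
def block_b(s): s["total"] += s["i"]
def block_c(s): s["total"] = 0
def block_d(s): pass

# Hypothetical program: A branches to B or C, both continue to D,
# and D jumps back to A (None marks the end of one iteration).
PROGRAM = {
    "A": (block_a, lambda s: "B" if s["i"] % 4 else "C"),
    "B": (block_b, lambda s: "D"),
    "C": (block_c, lambda s: "D"),
    "D": (block_d, lambda s: None),
}

def record_trace(program, state, entry="A"):
    """Interpret one iteration, logging each block and the successor taken."""
    trace, block = [], entry
    while block is not None:
        action, successor = program[block]
        action(state)
        taken = successor(state)
        trace.append((block, taken))
        block = taken
    return trace

def run_trace(program, trace, state):
    """Replay the linear trace; every recorded successor acts as a guard."""
    for block, expected in trace:
        action, successor = program[block]
        action(state)
        actual = successor(state)
        if actual != expected:
            raise SideExit(actual)  # guard failed: leave the trace

state = {"i": 0, "total": 0}
trace = record_trace(PROGRAM, state)   # records the path A, B, D
run_trace(PROGRAM, trace, state)       # stays on the trace
```

In this sketch the trace keeps running as long as `i` is not a multiple of 4; on the iteration where block A would branch to C, the guard fails and `SideExit` reports C as the resume point.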

  10. Background: The PyPy Meta-Tracing JIT
      [Diagram: a Python program and its input are run by a Python interpreter written in RPython. The resulting virtual machine interprets and profiles; a hot loop triggers tracing, optimization, and code generation, followed by native execution.]

  11. The Pycket Meta-Tracing JIT
      [Diagram: a Racket program and its input are run by a Racket interpreter written in RPython. The resulting virtual machine interprets and profiles; a hot loop triggers tracing, optimization, and code generation, followed by native execution.]

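  12. Our Racket Interpreter: The CEK Machine
      Programming Languages and Lambda Calculi. Flatt and Felleisen. 2007

      $$
      \begin{aligned}
      e &::= x \mid \lambda x.\, e \mid (e\ e) \mid \mathsf{letcc}\ x.\, e \mid e \mathbin{@} e \\
      \kappa &::= [\,] \mid \mathsf{arg}(e, \rho) :: \kappa \mid \mathsf{fun}(v, \rho) :: \kappa \mid \mathsf{ccarg}(e, \rho) :: \kappa \mid \mathsf{cc}(\kappa) :: \kappa \\
      v &::= \lambda x.\, e \mid \kappa
      \end{aligned}
      $$

      $$
      \begin{aligned}
      \langle x, \rho, \kappa \rangle &\longmapsto \langle \rho(x), \rho, \kappa \rangle \\
      \langle (e_1\ e_2), \rho, \kappa \rangle &\longmapsto \langle e_1, \rho, \mathsf{arg}(e_2, \rho) :: \kappa \rangle \\
      \langle v_1, \rho, \mathsf{arg}(e_2, \rho') :: \kappa \rangle &\longmapsto \langle e_2, \rho', \mathsf{fun}(v_1, \rho) :: \kappa \rangle \\
      \langle v_2, \rho, \mathsf{fun}(\lambda x.\, e, \rho') :: \kappa \rangle &\longmapsto \langle e, \rho'[x \mapsto v_2], \kappa \rangle \\
      \langle \mathsf{letcc}\ x.\, e, \rho, \kappa \rangle &\longmapsto \langle e, \rho[x \mapsto \kappa], \kappa \rangle \\
      \langle (e_1 \mathbin{@} e_2), \rho, \kappa \rangle &\longmapsto \langle e_1, \rho, \mathsf{ccarg}(e_2, \rho) :: \kappa \rangle \\
      \langle \kappa_1, \rho, \mathsf{ccarg}(e_2, \rho') :: \kappa \rangle &\longmapsto \langle e_2, \rho', \mathsf{cc}(\kappa_1) :: \kappa \rangle \\
      \langle v_2, \rho, \mathsf{cc}(\kappa_1) :: \kappa \rangle &\longmapsto \langle v_2, \rho, \kappa_1 \rangle
      \end{aligned}
      $$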
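The transition rules above can be turned into a small interpreter loop. The following Python sketch is an illustration only, not Pycket's RPython code: every class and function name is invented for this example, and it departs from the slide by storing values as explicit `Closure` objects (a lambda paired with its environment), so that variable lookup returns a self-contained value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

@dataclass(frozen=True)
class LetCC:
    name: str
    body: object

@dataclass(frozen=True)
class Throw:            # the slide's e @ e
    kont: object
    arg: object

@dataclass
class Closure:          # value: a lambda with the environment it closed over
    lam: Lam
    env: dict

@dataclass
class Kont:             # value: a captured continuation
    frames: object

def evaluate(expr):
    """Run the machine; continuations are linked tuples (tag, payload, env, rest)."""
    control, env, kont = expr, {}, None
    while True:
        if isinstance(control, Var):
            control = env[control.name]
        elif isinstance(control, Lam):
            control = Closure(control, env)
        elif isinstance(control, App):
            kont = ("arg", control.arg, env, kont)
            control = control.fn
        elif isinstance(control, LetCC):
            env = {**env, control.name: Kont(kont)}
            control = control.body
        elif isinstance(control, Throw):
            kont = ("ccarg", control.arg, env, kont)
            control = control.kont
        elif kont is None:
            return control                      # a value and nothing left to do
        else:
            tag, payload, saved_env, rest = kont
            if tag == "arg":                    # operator done, evaluate operand
                kont = ("fun", control, None, rest)
                control, env = payload, saved_env
            elif tag == "fun":                  # operand done, enter the body
                env = {**payload.env, payload.lam.param: control}
                control, kont = payload.lam.body, rest
            elif tag == "ccarg":                # continuation done, evaluate argument
                kont = ("cc", control, None, rest)
                control, env = payload, saved_env
            else:                               # "cc": discard rest, resume capture
                kont = payload.frames

identity = Lam("y", Var("y"))
result = evaluate(App(Lam("x", Var("x")), identity))
```

The sketch assumes well-formed programs: applying a continuation with ordinary application, or throwing to a non-continuation, fails with an attribute error rather than a tidy message.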

  13. Challenges particular to Racket
      ▶ Detect loops for trace compilation in a higher-order language without explicit loop constructs
      ▶ Reduce the need for manual specialization
      ▶ Reduce the overhead imposed by contracts

  14. Loop finding: cyclic paths
      Default RPython strategy: record cycles in control flow.
      [Diagram: program counters pc1 through pc5 forming a cycle.]

  15. Tracing cycles in the control flow is insufficient
      The CEK machine has no notion of a program counter; we can try to use AST nodes instead.
      Begin tracing at a hot node and continue until that node is reached again.

          (define (my-add a b) (+ a b))
          (define (loop a b)
            (my-add a b)
            (my-add a b)
            (loop a b))

      [Diagram nodes: (+ a b)]

  16. Tracing cycles in the control flow is insufficient (same text and code as slide 15)
      [Diagram nodes: (+ a b), (loop a b)]

  17. Tracing cycles in the control flow is insufficient (same text and code as slide 15)
      [Diagram nodes: (+ a b), (loop a b), (my-add a b)₁]

  18. Tracing cycles in the control flow is insufficient (same text and code as slide 15)
      [Diagram nodes: (+ a b), (loop a b), (my-add a b)₁, (+ a b), (my-add a b)₂]
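A short Python sketch can show why "trace from a hot node until it is reached again" goes wrong on this program. The node labels, the visit sequence, and the helper `first_cycle` are assumptions made for illustration; the sequence is simply the order in which one iteration of `loop` would visit the AST nodes named on the slides.

```python
from collections import Counter

def first_cycle(visits, threshold):
    """Start tracing at the first node whose visit count reaches `threshold`
    and stop as soon as that node is seen again."""
    counts = Counter()
    for index, node in enumerate(visits):
        counts[node] += 1
        if counts[node] == threshold:
            trace = [node]
            for later in visits[index + 1:]:
                if later == node:
                    return trace        # cycle closed
                trace.append(later)
            return None                 # ran out of input before closing
    return None

# One iteration of loop: call my-add twice (each runs its body), then recur.
ITERATION = ["(my-add a b)#1", "(+ a b)", "(my-add a b)#2", "(+ a b)", "(loop a b)"]
trace = first_cycle(ITERATION * 3, threshold=2)
```

Because `(+ a b)` runs twice per iteration, it becomes hot first, and the recorded cycle closes at the next visit to `(+ a b)`. The trace therefore covers only part of the loop body: here it never includes the second call to `my-add`.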

  19. The Callgraph
      Newer definition: a loop is a cycle in the program’s call graph.
      1. Build the callgraph during execution
      2. Mark functions in a cycle as a loop
      [Diagram: call graph with nodes loop and my-add, shown beside the AST nodes (loop a b), (my-add a b)₁, (my-add a b)₂, and (+ a b).]
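The two steps on this slide can be sketched in Python as follows. This is a minimal illustration, not Pycket's implementation: the `CallGraph` class and its methods are hypothetical, and marking only the callee that closes a cycle is an assumption of this sketch.

```python
from collections import defaultdict

class CallGraph:
    def __init__(self):
        self.edges = defaultdict(set)   # caller -> set of callees
        self.loops = set()              # functions marked as loop headers

    def record_call(self, caller, callee):
        """Step 1: add the edge while the program runs.
        Step 2: if the new edge closes a cycle, mark the callee as a loop."""
        if callee in self.edges[caller]:
            return                      # edge already known, nothing new
        self.edges[caller].add(callee)
        if self._reaches(callee, caller):
            self.loops.add(callee)

    def _reaches(self, src, dst):
        """Depth-first search: is there a path from src to dst?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges.get(node, ()))
        return False

graph = CallGraph()
graph.record_call("loop", "my-add")     # first call site
graph.record_call("loop", "my-add")     # second call site, same edge
graph.record_call("loop", "loop")       # the recursive call closes a cycle
```

With the program from the preceding slides, only `loop` ends up in `graph.loops`; `my-add` is called repeatedly but sits on no cycle, so it is never treated as a loop.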

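  20. Data Structure Specialization
      Unbox small, fixed-size arrays of Racket values.
      [Diagram: a generic Env holds a Vals pointer to a List3 whose slots 0, 1, 2 point to boxed values Fixnum: 1, Flonum: 3.14, and Symbol: 'a. The specialized EnvSize3 stores Val0 = 1 and Val1 = 3.14 inline and Val2 as a pointer to Symbol: 'a.]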
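The layout idea can be approximated in Python by generating one class per environment size, with a fixed field for each value instead of a pointer to a separate array. The factory `make_env_class` and the field names are hypothetical, and Python cannot actually unbox numbers, so this sketch only models the "fields inline, no intermediate array" half of the optimization.

```python
def make_env_class(size):
    """Build a class with exactly `size` value fields and no per-instance dict."""
    slots = tuple(f"val{i}" for i in range(size))

    class SizedEnv:
        __slots__ = slots

        def __init__(self, *values):
            if len(values) != size:
                raise ValueError(f"expected {size} values, got {len(values)}")
            for name, value in zip(slots, values):
                setattr(self, name, value)

        def lookup(self, index):
            # A fixed field access replaces the array indirection.
            return getattr(self, slots[index])

    SizedEnv.__name__ = f"EnvSize{size}"
    return SizedEnv

EnvSize3 = make_env_class(3)
env = EnvSize3(1, 3.14, "a")   # mirrors the slide's Fixnum, Flonum, Symbol
```

Generating the classes from a factory keeps the sketch short; a fixed family of hand-written classes for the common small sizes would serve the same purpose.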

  21. Specialized Mutable Objects
      Optimistically specialize the representation of homogeneous containers.
      When a mutating operation invalidates the current strategy, the storage is rewritten; this is fortunately infrequent [Bolz et al., OOPSLA 2013].
      [Diagram: a FixnumCons cell holding 2; a Vector with strategy and storage fields, where the strategy is FloatVectorStrategy and the storage is an array holding 2, 1.4, 5.5.]
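The strategy switch described on this slide can be sketched in Python with the standard `array` module standing in for unboxed float storage. The class `StrategyVector` and its two strategy names are assumptions for illustration; the sketch implements only a flonum strategy and a generic fallback.

```python
from array import array

class StrategyVector:
    def __init__(self, items):
        items = list(items)
        if all(type(x) is float for x in items):
            # Optimistic case: homogeneous floats, stored unboxed.
            self.strategy = "flonum"
            self.storage = array("d", items)
        else:
            self.strategy = "object"
            self.storage = items

    def ref(self, index):
        return self.storage[index]

    def set(self, index, value):
        if self.strategy == "flonum" and type(value) is not float:
            # The write invalidates the strategy: rewrite the storage once.
            self.strategy = "object"
            self.storage = list(self.storage)
        self.storage[index] = value

vec = StrategyVector([2.0, 1.4, 5.5])
vec.set(0, 3.0)       # still homogeneous, storage stays unboxed
vec.set(1, "a")       # forces the switch to generic storage
```

The rewrite costs time proportional to the vector length, which is why the approach depends on such transitions being rare in practice, as the slide notes.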

  22. Pycket: What Works?
      ▶ File IO: (open-input-file "list.txt") (open-output-file "brain.dat")
      ▶ Numeric tower: number? complex? real? rational? integer? ...
      ▶ Contracts: (define-contract ...)
      ▶ Typed Racket: #lang typed/racket
      ▶ Primitive Functions (∼900/1400)

  23. Pycket: What Doesn’t Work?
      ▶ FFI
      ▶ Scribble: #lang scribble/base
      ▶ DrRacket
      ▶ Web: #lang web-server/insta
      ▶ Threads: (thread (λ () ...))
      ▶ Lesser used primitives

  24. Performance Caveats
      Fast:
      ▶ Tight loops
      ▶ Numeric computations
      ▶ Interpreters
      Slow:
      ▶ Branchy/irregular control flow
      ▶ Code not easily expressed as loops
      ▶ Short-running programs

  25. Benchmarks

  26. Overall Performance
      [Figure: geomean runtime (0.0 to 1.0) per system. Larceny benchmarks: racket, larceny, gambit, bigloo, pycket. Shootout benchmarks: racket, pycket.]

  27. Specialization: Despecialization Slowdown
      [Figure: % slowdown (0 to 30) per system: racket, pycket.]
