
Core_bench: micro-benchmarking for OCaml

Christopher S. Hardin and Roshan P. James, Jane Street. OUD Workshop, September 24, 2013.


  1. Core_bench: micro-benchmarking for OCaml

     Christopher S. Hardin and Roshan P. James, Jane Street. September 24, 2013, OUD Workshop.

  2. Micro-benchmarking

     Precise measurement is essential for writing performance-sensitive code. Objective: measure the execution cost of functions that are relatively cheap, with execution times ranging from nanoseconds to tens or hundreds of milliseconds. A 3.4 GHz CPU runs several simple instructions per nanosecond.

  3. Micro-benchmarking: Timing

         let t1 = Time.now () in
         f ();
         let t2 = Time.now () in
         report (Time.diff t2 t1)

     Time.now is often too imprecise (about 1 µs). Asking for the current time also takes time.

  5. Micro-benchmarking: Batch sizes

         let t1 = Time.now () in
         for _i = 1 to batch_size do
           f ()
         done;
         let t2 = Time.now () in
         report batch_size (Time.diff t2 t1)

     Compute a batch size to account for the timer, as Criterion does for Haskell. Use the mean and standard deviation to account for system noise.
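The batching idea on this slide can be sketched with the OCaml standard library alone. This is a rough illustration, not Core_bench's implementation: `Sys.time` (processor time) stands in for `Time.now`, and `time_batch` and `per_run_estimate` are hypothetical helper names.

```ocaml
(* Time [batch_size] consecutive calls to [f] and return the elapsed
   processor time in seconds. [Sys.time] is a stand-in for [Time.now]. *)
let time_batch ~batch_size f =
  let t1 = Sys.time () in
  for _i = 1 to batch_size do
    (* [Sys.opaque_identity] keeps the compiler from discarding the call. *)
    ignore (Sys.opaque_identity (f ()))
  done;
  let t2 = Sys.time () in
  t2 -. t1

(* Naive per-run estimate: total batch time divided by the batch size.
   Timer overhead is still included, only spread over the whole batch. *)
let per_run_estimate ~batch_size f =
  time_batch ~batch_size f /. float_of_int batch_size

let () =
  let f () = sqrt 2.0 in
  Printf.printf "approx. %.3e s/run\n"
    (per_run_estimate ~batch_size:1_000_000 f)
```

Dividing by the batch size only dilutes the constant timer cost; it does not remove it, which is the gap the regression approach on the following slides addresses.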

  7. Micro-benchmarking: Noise

     System noise comes from other processes and OS activity. More importantly, there are delayed costs due to GC. Variance in execution times is influenced by batch size.

     [Figure: runtime (ms) versus batch size.]

  8. Core_bench: Linear regression

     Core_bench treats micro-benchmarking as a linear regression. Simple case: fit execution time to batch size. Data from larger batch sizes have smaller percentage error. Batch sizes are sampled geometrically to get a better linear fit.

     [Figure: runtime (ms) versus batch size.]
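A minimal sketch of the simple case, assuming synthetic data: batch sizes are generated geometrically and an ordinary least-squares line is fitted to (batch size, runtime) pairs. The function names and the growth factor of 1.5 are illustrative assumptions, not values taken from Core_bench.

```ocaml
(* Batch sizes growing geometrically by [scale], starting at 1. Each step
   advances by at least 1 so that rounding never repeats a size. *)
let geometric_batch_sizes ~scale ~count =
  let rec go acc current n =
    if n = 0 then List.rev acc
    else
      let scaled = int_of_float (float_of_int current *. scale) in
      go (current :: acc) (max (current + 1) scaled) (n - 1)
  in
  go [] 1 count

(* Ordinary least-squares fit of y = slope * x + intercept over a list
   of (x, y) points. *)
let fit_line points =
  let n = float_of_int (List.length points) in
  let sum f = List.fold_left (fun acc p -> acc +. f p) 0.0 points in
  let sx = sum fst and sy = sum snd in
  let sxx = sum (fun (x, _) -> x *. x) in
  let sxy = sum (fun (x, y) -> x *. y) in
  let slope = (n *. sxy -. sx *. sy) /. (n *. sxx -. sx *. sx) in
  let intercept = (sy -. slope *. sx) /. n in
  (slope, intercept)

let () =
  (* Synthetic runtimes: 2 time units per run plus a constant overhead of 5. *)
  let points =
    List.map
      (fun b -> let x = float_of_int b in (x, 2.0 *. x +. 5.0))
      (geometric_batch_sizes ~scale:1.5 ~count:10)
  in
  let slope, intercept = fit_line points in
  Printf.printf "per-run cost %.3f, constant overhead %.3f\n" slope intercept
```

In this framing the slope is the per-run cost and the intercept absorbs constant overheads such as the timer call.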

  9. Core_bench: Linear regression

     There is no need to estimate the clock and other constant errors: constant overheads are accounted for in the y-intercept. Other costs are predicted in the same way: memory allocations and promotions are estimated using batch size, and so is garbage collection. The user specifies how much sampling time is allowed; more data allows better estimates. Error estimation and goodness of fit use bootstrapping and R^2.
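To regress allocation and GC activity against batch size, each batch needs those counters recorded alongside its runtime. The sketch below shows one way to collect them with the standard library's `Gc.quick_stat`; the `sample` record and `measure_batch` are hypothetical names, and Core_bench's own bookkeeping may differ.

```ocaml
(* Counters recorded for one batch. The record and its field names are
   illustrative only. *)
type sample = {
  runs : int;
  minor_words : float;
  promoted_words : float;
  minor_collections : int;
  compactions : int;
}

(* Run [f] [batch_size] times and report how much each GC counter moved. *)
let measure_batch ~batch_size f =
  let before = Gc.quick_stat () in
  for _i = 1 to batch_size do
    ignore (Sys.opaque_identity (f ()))
  done;
  let after = Gc.quick_stat () in
  { runs = batch_size;
    minor_words = after.Gc.minor_words -. before.Gc.minor_words;
    promoted_words = after.Gc.promoted_words -. before.Gc.promoted_words;
    minor_collections =
      after.Gc.minor_collections - before.Gc.minor_collections;
    compactions = after.Gc.compactions - before.Gc.compactions }

let () =
  let s = measure_batch ~batch_size:10_000 (fun () -> Array.make 10 0) in
  Printf.printf "runs=%d minor_words=%.0f promoted=%.0f minor_gcs=%d compactions=%d\n"
    s.runs s.minor_words s.promoted_words s.minor_collections s.compactions
```

Each counter can then serve as a response variable in its own fit against batch size, in the same way as runtime.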

  10. Example source (basic)

          open Core.Std
          open Core_bench.Std

          let t1 = Bench.Test.create ~name:"id"
            (fun () -> ())

          let t2 = Bench.Test.create ~name:"Time.now"
            (fun () -> ignore (Time.now ()))

          let t3 = Bench.Test.create ~name:"Array.create300"
            (fun () -> ignore (Array.create ~len:300 0))

          let () = Command.run (Bench.make_command [t1; t2; t3])

      Output:

          Name                Time/Run   Minor   Major
          ----------------- ---------- ------- -------
          id                      3.08
          Time.now                 843    2.00
          Array.create300        3_971             301

  11. Some functions have strange execution times

          let benchmark = Bench.Test.create ~name:"List.init"
            (fun () -> ignore (List.init 100_000 ~f:id))

      [Figure: observed runtime (ms) versus batch size, with the 1-predictor model fit.]

  12. Multiple predictors

      [Figure: observed runtime (milliseconds), runs, promoted words, and compactions versus batch size.]

  13. Multiple predictors: fit

      Using runs, compactions, and promoted words as predictors.

      [Figure: observed runtime (ms) versus batch size, with the 1-predictor and 3-predictor model fits.]
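A multi-predictor fit can be sketched as ordinary least squares solved through the normal equations. This is an assumption about one workable method, not a description of Core_bench's solver; `solve` and `least_squares` are hypothetical names and the data is synthetic.

```ocaml
(* Solve the square system [a] * x = [b] by Gaussian elimination with
   partial pivoting. Both arguments are copied, not modified. *)
let solve a b =
  let n = Array.length b in
  let a = Array.map Array.copy a and b = Array.copy b in
  for col = 0 to n - 1 do
    (* Pick the row with the largest pivot to limit rounding error. *)
    let pivot = ref col in
    for row = col + 1 to n - 1 do
      if abs_float a.(row).(col) > abs_float a.(!pivot).(col) then pivot := row
    done;
    let tmp_row = a.(col) in
    a.(col) <- a.(!pivot);
    a.(!pivot) <- tmp_row;
    let tmp = b.(col) in
    b.(col) <- b.(!pivot);
    b.(!pivot) <- tmp;
    for row = col + 1 to n - 1 do
      let factor = a.(row).(col) /. a.(col).(col) in
      for k = col to n - 1 do
        a.(row).(k) <- a.(row).(k) -. factor *. a.(col).(k)
      done;
      b.(row) <- b.(row) -. factor *. b.(col)
    done
  done;
  (* Back substitution on the upper-triangular system. *)
  let x = Array.make n 0.0 in
  for row = n - 1 downto 0 do
    let s = ref b.(row) in
    for k = row + 1 to n - 1 do
      s := !s -. a.(row).(k) *. x.(k)
    done;
    x.(row) <- !s /. a.(row).(row)
  done;
  x

(* Coefficients beta minimizing |X beta - y|^2, from the normal equations
   (X^T X) beta = X^T y. Each row of [xs] is one observation. *)
let least_squares xs ys =
  let p = Array.length xs.(0) in
  let xtx = Array.make_matrix p p 0.0 and xty = Array.make p 0.0 in
  Array.iteri
    (fun i row ->
      for j = 0 to p - 1 do
        xty.(j) <- xty.(j) +. row.(j) *. ys.(i);
        for k = 0 to p - 1 do
          xtx.(j).(k) <- xtx.(j).(k) +. row.(j) *. row.(k)
        done
      done)
    xs;
  solve xtx xty

let () =
  (* Synthetic observations: columns are runs, compactions, promoted words. *)
  let xs = [| [| 1.; 0.; 2. |]; [| 2.; 1.; 3. |];
              [| 4.; 1.; 9. |]; [| 8.; 3.; 15. |] |] in
  let ys = [| 11.0; 121.5; 144.5; 387.5 |] in
  Array.iter (Printf.printf "%.4f ") (least_squares xs ys);
  print_newline ()
```

The normal equations are compact but can be numerically fragile when predictors are nearly collinear; a QR-based solver would be a more robust choice for real measurements.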

  14. Runtime cost decomposition example

      Let $X = [\,x,\ \text{minor GCs},\ \text{compactions}\,]$, where $x$ is the batch size, and let $y$ be the runtime (ns). Solve $X\beta = y$ and $x\gamma = X$. Suppose we get

      $$\beta = \begin{pmatrix} 1.06 \times 10^{4} \\ 1.04 \times 10^{6} \\ 2.25 \times 10^{6} \end{pmatrix}, \qquad \gamma = \begin{pmatrix} 1 & 0.00299 & 0.00149 \end{pmatrix}.$$

      Then the (predicted) runtime is

      $$\gamma\beta = \underbrace{(1.06 \times 10^{4})(1)}_{\text{nominal}} + \underbrace{\overbrace{(1.04 \times 10^{6})}^{\text{ns/mGC}}\,\overbrace{(0.00299)}^{\text{mGCs/run}}}_{\text{minor GC cost}} + \underbrace{\overbrace{(2.25 \times 10^{6})}^{\text{ns/cmp}}\,\overbrace{(0.00149)}^{\text{cmps/run}}}_{\text{compaction cost}}$$

      $$= 10.6\,\mu\text{s} + 3.1\,\mu\text{s} + 3.4\,\mu\text{s} \approx 17.1\,\mu\text{s}.$$

      (Note: just solving $xm = y$ gives 17.4 µs.)
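The decomposition is a dot product of the two coefficient vectors, which the following sketch evaluates term by term. The numbers are the rounded coefficients shown on the slide; the variable names are illustrative.

```ocaml
(* Coefficients copied from the slide, rounded as shown there.
   beta: ns per run, ns per minor GC, ns per compaction. *)
let beta = [| 1.06e4; 1.04e6; 2.25e6 |]

(* gamma: runs, minor GCs and compactions per run. *)
let gamma = [| 1.0; 0.00299; 0.00149 |]

let labels = [| "nominal"; "minor GC"; "compaction" |]

(* Contribution of each predictor to the predicted runtime of one run (ns). *)
let contributions = Array.map2 (fun g b -> g *. b) gamma beta

let total_ns = Array.fold_left ( +. ) 0.0 contributions

let () =
  Array.iteri
    (fun i c -> Printf.printf "%-12s %8.1f ns\n" labels.(i) c)
    contributions;
  Printf.printf "%-12s %8.1f ns\n" "total" total_ns
```

With these rounded inputs the three terms come to 10,600 ns, about 3,110 ns, and about 3,353 ns; unrounded coefficients would shift the total slightly.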

  15. Conclusion and Future Work

      Install with: opam install core_bench

      Expose more predictors: measure the effect of live words on performance, and add counters for major collection work per minor GC.

      Accuracy of results: ordinary least squares is susceptible to outliers, so the fit should incorporate the fact that measurement error is heavy-tailed (on the positive side).

      Automatically select execution time based on error. Automatically pick predictors from a set.

  16. Thank you.
