1. bencherl: A scalability benchmark suite for Erlang/OTP
   Stavros Aronis [1], Nikolaos Papaspyrou [2], Katerina Roukounaki [2], Konstantinos Sagonas [1,2], Yiannis Tsiouris [2], Ioannis Venetis [2]
   [1] Department of Information Technology, Uppsala University, Sweden
   [2] School of Electrical and Computer Engineering, National Technical University of Athens, Greece
   Erlang Workshop 2012, Copenhagen

2. Motivation
   A frustrated Erlang programmer: "I thought my Erlang program was 100% parallelizable, but when I made it parallel and ran it on a machine with N CPU cores, I got a speedup that was much lower than N. Why?"

3. bencherl
   - Serves both as a tool to run and analyze benchmarks and as an extensible benchmark repository
   - Focuses on scalability, rather than on throughput or latency
   - Examines how the following factors influence the scalability of Erlang applications:
     - number of Erlang nodes
     - number of CPU cores
     - number of schedulers
     - Erlang/OTP releases and flavors
     - command-line arguments to erl
   - Can be used to study the performance of any Erlang application, as well as of Erlang/OTP itself

4. Definitions
   Application: the piece of software whose execution behaviour we intend to measure and analyze.
   Benchmark: a specific use case of the application, which includes setting up the environment, calling specific functions and using specific data.
   Runtime environment: a specific combination of values for the scalability factors. E.g.:
     - 8 Erlang nodes
     - each node runs on a machine with 8 CPU cores
     - each node uses 8 schedulers
     - each node runs the R15B02 release of Erlang/OTP
     - each node passes "+sbt db" as a command-line argument to erl
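As a concrete illustration of "runtime environment", here is a minimal sketch (not part of bencherl; the module name env_check is made up) that reports the relevant factors of the node it runs on, using standard erlang:system_info/1 queries:

    %% Report the scalability factors of the current node's runtime
    %% environment. All three system_info/1 items are standard Erlang/OTP.
    -module(env_check).
    -export([report/0]).

    report() ->
        io:format("OTP release: ~s~n", [erlang:system_info(otp_release)]),
        io:format("Schedulers:  ~p~n", [erlang:system_info(schedulers)]),
        io:format("Bind type:   ~p~n", [erlang:system_info(scheduler_bind_type)]).

For example, a node started with "erl +S 8 +sbt db" would report 8 schedulers and the chosen scheduler bind type.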

5. Architecture
   [Figure: the bencherl architecture]

6. Coordinator
   The module that coordinates everything during a bencherl run. It:
   - determines the benchmarks that should be executed
   - determines the runtime environments in which each benchmark should be executed (see the sketch below)
   - sets up each runtime environment before a benchmark is executed
   - prepares instruction files for the executor
   - performs any benchmark-specific pre- and post-execution actions
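The cross-product nature of runtime environments can be made concrete with a small sketch. This is not bencherl's actual code; the module name and the hard-coded factor lists are illustrative only:

    %% Enumerate runtime environments as the cross product of a few
    %% scalability factors (illustrative values, not bencherl defaults).
    -module(coord_sketch).
    -export([environments/0]).

    environments() ->
        [{Otp, Schedulers, ErlArgs} ||
            Otp        <- ["R14B04", "R15B01"],  % Erlang/OTP releases
            Schedulers <- [1, 2, 4, 8, 16],      % numbers of schedulers
            ErlArgs    <- ["", "+sbt db"]].      % extra erl arguments

Each resulting tuple corresponds to one runtime environment in which the coordinator would have a benchmark executed.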

7. Executor
   The module that executes a particular benchmark in a particular runtime environment. It:
   - receives detailed instructions from the coordinator about what to do
   - starts any necessary Erlang slave nodes
   - executes the benchmark in a new process
   - stops the Erlang slave nodes it started
   - makes sure that the output produced by the benchmark during its execution is written to an output file
   - makes sure that the measurements collected during the execution of the benchmark are written to a measurement file
   It times executions using erlang:now/0 and timer:now_diff/2 (see the sketch below).
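The timing mechanics can be sketched as follows; this is an assumed simplification, not the executor's real code (the module name exec_sketch and the function time_run/3 are made up):

    %% Start slave nodes, time one benchmark run with erlang:now/0 and
    %% timer:now_diff/2, stop the slaves, and return the elapsed seconds.
    -module(exec_sketch).
    -export([time_run/3]).

    time_run(Mod, Args, SlaveHosts) ->
        Slaves = [begin {ok, Node} = slave:start(Host, bench), Node end
                  || Host <- SlaveHosts],
        T0 = erlang:now(),
        ok = Mod:run(Args, Slaves, []),    % the handler's run/3 callback
        T1 = erlang:now(),
        [slave:stop(Node) || Node <- Slaves],
        timer:now_diff(T1, T0) / 1000000.  % now_diff/2 returns microseconds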

8. Sanity checker
   The module that checks whether all executions of a particular benchmark produced the same output. It:
   - runs after a benchmark has executed in all desired runtime environments
   - examines the output produced by the benchmark in all runtime environments
   - decides whether the benchmark was successfully executed in all runtime environments
   It is based on the assumption that if a benchmark produces any output during its execution, then this output should be the same across all runtime environments in which the benchmark was executed. It uses diff (see the sketch below).
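The check itself amounts to pairwise file comparison. A hedged sketch, assuming the caller passes the list of output file names (sanity_sketch and check/1 are illustrative names):

    %% Return true iff every output file is identical to the first one.
    %% 'diff -q' prints nothing when the files match, so os:cmd/1
    %% returns the empty string (which in Erlang is the empty list).
    -module(sanity_sketch).
    -export([check/1]).

    check([Reference | Rest]) ->
        lists:all(fun(File) ->
                      os:cmd("diff -q " ++ Reference ++ " " ++ File) =:= []
                  end, Rest).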

9. Graph plotter
   The module that plots scalability graphs based on the collected measurements. It:
   - runs after a benchmark has executed in all desired runtime environments
   - processes the measurements that were collected during the execution of the benchmark
   - plots a set of scalability graphs
   It uses gnuplot (see the sketch below).
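A minimal sketch of the gnuplot step (the file names, labels and the plot_sketch module are assumptions, not bencherl's actual output format):

    %% Write (schedulers, time) pairs to a data file and plot them via
    %% gnuplot's -e flag. Points :: [{Schedulers :: integer(), Time :: number()}].
    -module(plot_sketch).
    -export([plot/2]).

    plot(DatFile, Points) ->
        Lines = [io_lib:format("~p ~p~n", [S, T]) || {S, T} <- Points],
        ok = file:write_file(DatFile, Lines),
        Script = io_lib:format(
            "set terminal png; set output 'time.png'; "
            "set xlabel '# Schedulers'; set ylabel 'Time'; "
            "plot '~s' with linespoints", [DatFile]),
        os:cmd("gnuplot -e \"" ++ lists:flatten(Script) ++ "\"").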

10. Scalability graphs
   Both time and speedup graphs:
   - graphs that show how benchmarks scale when executed with a specific version of Erlang/OTP and specific command-line arguments, but with a different number of schedulers (nodes)
   - graphs that show how benchmarks scale when executed with a specific version of Erlang/OTP, but with a different number of schedulers (nodes) and different runtime options
   - graphs that show how benchmarks scale when executed with specific runtime options, but with a different number of schedulers (nodes) and different versions of Erlang/OTP
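(Speedup here is presumably the usual ratio S(n) = T(1) / T(n), where T(n) is the measured execution time with n schedulers or nodes; ideal scaling would give S(n) = n.)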

11. Benchmarks
   bencherl comes with an initial collection of benchmarks:
   Synthetic: bang, big, ehb, ets_test, genstress, mbrot, orbit_int, parallel, pcmark, ran, serialmsg, timer_wheel
   Real-world: dialyzer_bench, scalaris_bench
   This collection can be extended in two simple steps.

12. Step 1: Add to bencherl everything that the benchmark needs for its execution.
   - The sources of the Erlang application that it benchmarks (e.g. dialyzer)
   - Any scripts to run before or after its execution (e.g. a script that starts scalaris)
   - Any data that it needs for its execution (e.g. for dialyzer_bench, the BEAM files)
   - Any specific configuration settings that it requires (e.g. a specific cookie that nodes should share)

13. Step 2: Write the handler for the benchmark.
   A benchmark handler is a standard Erlang module that exports two functions.
   bench_args: a function that returns the different argument sets that should be used for running a specific version of the benchmark

       bench_args(Vrsn, Conf) -> Args
           when Vrsn :: 'short' | 'intermediate' | 'long',
                Conf :: [{Key :: atom(), Val :: term()}, ...],
                Args :: [[term()]].

   run: a function that runs the benchmark on specific Erlang nodes, with specific arguments and configuration settings

       run(Args, Slaves, Conf) -> 'ok' | {'error', Reason}
           when Args :: [term()],
                Slaves :: [node()],
                Conf :: [{Key :: atom(), Val :: term()}, ...],
                Reason :: term().

14. A benchmark handler example

    -module(scalaris_bench).

    -include_lib("kernel/include/inet.hrl").

    -export([bench_args/2, run/3]).

    bench_args(Version, Conf) ->
        {_, Cores} = lists:keyfind(number_of_cores, 1, Conf),
        [F1, F2, F3] = case Version of
                           short -> [1, 1, 0.5];
                           intermediate -> [1, 8, 0.5];
                           long -> [1, 16, 0.5]
                       end,
        [[T, I, V] || T <- [F1 * Cores], I <- [F2 * Cores],
                      V <- [trunc(F3 * Cores)]].

    run([T, I, V | _], _, _) ->
        {ok, N} = inet:gethostname(),
        {ok, #hostent{h_name = H}} = inet:gethostbyname(N),
        Node = list_to_atom("firstnode@" ++ H),
        rpc:block_call(Node, api_vm, add_nodes, [V]),
        io:format("~p~n", [rpc:block_call(Node, bench, quorum_read, [T, I])]),
        ok.
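For context, a handler like this would presumably be driven along the following lines (an illustrative call sequence, not bencherl's actual interface; it also assumes a scalaris node named firstnode is already running on the local host):

    %% Generate the argument sets for the 'short' version on an
    %% 8-core machine, then run the benchmark once per argument set.
    Conf = [{number_of_cores, 8}],
    ArgSets = scalaris_bench:bench_args(short, Conf),   % yields [[8, 8, 4]]
    [scalaris_bench:run(Args, [], Conf) || Args <- ArgSets].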

15. Experience #1: Some benchmarks scale well.
   [Speedup graph: big, R15B01, default arguments, curve ([1536]); y-axis: Speedup (0 to 45), x-axis: # Schedulers (0 to 70)]

16. Experience #2: Some benchmarks do not scale well on more than one node.
   [Speedup graph: orbit_int, R15B01, default arguments, curves ([true,#Fun<bench.g1245.1>,10048,128]) and ([false,#Fun<bench.g1245.1>,10048,128]); y-axis: Speedup (0 to 35), x-axis: # Schedulers (0 to 70)]

17. Experience #2 (continued): Some benchmarks do not scale well on more than one node.
   [Speedup graph: orbit_int, R15B01, default arguments, same two curves; y-axis: Speedup (0 to 4.5), x-axis: # Nodes (0 to 70)]

18. Experience #3: Some benchmarks do not scale.
   [Speedup graph: parallel, R15B01, default arguments, curve ([70016,640]); y-axis: Speedup (0.2 to 1), x-axis: # Schedulers (0 to 70)]

19. Experience #4: Some benchmarks scale better with specific runtime options.
   [Speedup graph: dialyzer_bench, R15B01, curves (TNNPS,[plt]), (TNNPS,[otp]), (U,[plt]) and (U,[otp]); y-axis: Speedup (0 to 8), x-axis: # Schedulers (2 to 16)]

20. Experience #5: Some benchmarks scale better with specific Erlang/OTP releases.
   [Speedup graph: scalaris_bench, default arguments, curves (R14B04,[64,1024,32]), (R15B01,[64,1024,32]) and (R15B,[64,1024,32]); y-axis: Speedup (0 to 8), x-axis: # Schedulers (0 to 70)]

21. Conclusions
   - bencherl is a publicly available scalability benchmark suite for Erlang/OTP ⇒ http://release.softlab.ntua.gr/bencherl
   - It examines how nodes, cores, schedulers, Erlang/OTP versions and erl command-line options affect the scalability of Erlang applications
   - It collects scalability measurements
   - It plots scalability graphs
   - It serves as a benchmark repository to which people can add their own benchmarks, so that others can access and use them

22. Future work
   - bencherl currently collects only execution times ⇒ collect more information during the execution of a benchmark (e.g. heap size)
   - bencherl can currently only answer questions like "Does this application scale well for this scenario?" ⇒ try to answer questions like "Why doesn't this application scale well for this scenario?", for example by using DTrace

23. Thank you!
