Use of a Levy Distribution for Modeling Best Case Execution Time Variation


  1. Use of a Levy Distribution for Modeling Best Case Execution Time Variation. Jonathan Beard, Roger Chamberlain. SBS Stream Based Supercomputing Lab, http://sbs.wustl.edu. Work also supported by: [sponsor logos not preserved]

  2. Outline
      • Motivation
      • Stream Processing
      • Optimization Goals
      • Methodology
      • Distributions
      • Results

  3. Streaming Computing

  4. Streaming Computing [Figure label: Kernel]

  5. Streaming Computing [Figure: Kernel 1, two instances of Kernel 2, and Kernel 3, connected by four streams]

  6. Streaming Languages: StreamIt, Auto-Pipe, Brook, Cg, S-Net, Scala-Pipe, Streams-C, and many others

  7. Optimization [Figure: a kernel with the labels Slow, Medium, Fast, and Super Fast]

  8. Optimization [Figure: Kernel 1, Kernel 2 (two instances), and Kernel 3 placed on cores 1-4 of multi-core A and multi-core B]. More allocation choices: NUMA node A or B to allocate the stream.

  11. Optimization: a “stream” is modeled as a queue. [Figure: kernels A, B, and C connected by queues Q1 and Q2]

  13. Streaming on Multi-core Systems
      We want good models for streaming systems on shared multi-core systems (i.e., a cluster).
      Problem: accurate measurement is very difficult. Is there a way to decide on a model without it?
      • Commodity multi-core timer availability and latency
      • Frequency scaling and core migration
      • Measuring modifies the application behavior

  14. Derived Information [Figure: expected versus observed]

  15. Derived Information [Figure: expected versus observed]
      Is there a pattern of minimal variation within the systems we’re running on?
      Avg. Service Time = E[X] + Error

  16. Goal
      Find a distribution that characterizes the minimum expected variation of a hardware and software system.
      Use this characterization to accept or reject models.

  17. Process
      • Measurement
      • Workload definition
      • Find a distribution
      • Utilize the distribution to aid model selection

  18. Timer Mechanism [Figure: the code asks the timer thread for the time and receives the time]

  19. Timer Mechanism: two time sources for the timer thread
      • rdtsc: x86 assembly; varying methods to serialize; relatively fast; multiple drift issues
      • clock_gettime: POSIX standard; relatively accurate; portable; slower than rdtsc

  20. Two Timing Choices

  21. NUMA Node Variations

  22. Minimize Variation
      • Restrict the timer to a single core
      • Use the x86 rdtsc instruction with the processor-recommended serializers for each processor type
      • Keep processes under test on the same NUMA node as the timer
      • Run the timer thread with altered priority to minimize core context swaps

  23. Best Case Execution Time Variation (BCETV)
      • The no-op instruction is implemented in most processors
      • It usually takes exactly 1 cycle
      • No real functional units are involved, so it is the least taxing workload
      • Variation observed in execution time should be external to the process

  24. Data Collection
      • No-op loops calibrated for various nominal times, tied to a single core, and run thousands of times
      • Execution time measured end to end for each run; environment collected
      • Parameters include:
        number of processes executing on the core;
        number of context swaps (voluntary, involuntary);
        many others
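The collection procedure on this slide can be sketched roughly as follows. This is a minimal illustration and not the authors' test harness: the function names are hypothetical, a Python `pass` loop stands in for a hardware no-op instruction, `time.perf_counter_ns` stands in for the rdtsc-based timer thread, and core pinning assumes Linux (`os.sched_setaffinity`).

```python
import os
import resource
import time


def time_noop_loop(iterations):
    """Return elapsed nanoseconds for one empty loop of `iterations` passes."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        pass  # stand-in for a hardware no-op instruction (assumption)
    return time.perf_counter_ns() - start


def collect_runs(iterations, runs, pin=True):
    """Time the loop `runs` times and record context switches for each run."""
    if pin and hasattr(os, "sched_setaffinity"):
        # Tie the process to one core it is already allowed to use (Linux only).
        core = min(os.sched_getaffinity(0))
        os.sched_setaffinity(0, {core})
    records = []
    for _ in range(runs):
        before = resource.getrusage(resource.RUSAGE_SELF)
        elapsed_ns = time_noop_loop(iterations)
        after = resource.getrusage(resource.RUSAGE_SELF)
        records.append({
            "elapsed_ns": elapsed_ns,
            "voluntary_switches": after.ru_nvcsw - before.ru_nvcsw,
            "involuntary_switches": after.ru_nivcsw - before.ru_nivcsw,
        })
    return records


def errors_from_mean(records):
    """Execution time error for each run, defined as observed minus mean."""
    times = [r["elapsed_ns"] for r in records]
    mean = sum(times) / len(times)
    return [t - mean for t in times]
```

Calling `errors_from_mean(collect_runs(100_000, 1000))` would give one error sample per run; a histogram of those samples is the kind of data a noise distribution could then be fitted to.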

  25. Levy Distribution [Figure: distribution of execution time error (obs - mean)]

  26. Levy Distribution [Figure: Normal Distribution]

  27. Levy Distribution [Figure: Gumbel Distribution]

  28. Levy Distribution [Figure: Levy Distribution]

  30. Levy Distribution
      • Truncation enables mean calculation, but requires fitting to each dataset to find where to truncate
      • The truncation parameters are correlated with both the number of processes per core and the expected execution time
      • A roughly linear relationship gives an approximate solution for the truncation parameters without refitting
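To see why truncation makes the mean computable: an untruncated Levy distribution has an infinite mean, but conditioning on an upper bound leaves a finite one. The sketch below assumes the standard Levy parameterization (location `loc`, scale `scale`) and truncation at a single upper bound `upper`; the slides refer to truncation parameters in the plural, so the authors' exact scheme may differ, and the function names here are illustrative.

```python
import math


def levy_pdf(x, loc, scale):
    """Levy density with location `loc` and scale `scale`; zero for x <= loc."""
    y = x - loc
    if y <= 0:
        return 0.0
    return math.sqrt(scale / (2 * math.pi)) * math.exp(-scale / (2 * y)) / y ** 1.5


def levy_cdf(x, loc, scale):
    """Levy cumulative distribution function."""
    y = x - loc
    if y <= 0:
        return 0.0
    return math.erfc(math.sqrt(scale / (2 * y)))


def truncated_levy_mean(loc, scale, upper):
    """Mean of a Levy variable conditioned on loc < X <= upper (closed form)."""
    b = upper - loc
    if b <= 0 or scale <= 0:
        raise ValueError("need upper > loc and scale > 0")
    mass = math.erfc(math.sqrt(scale / (2 * b)))  # probability kept by truncation
    if mass == 0.0:
        raise ValueError("truncation point too close to loc for this scale")
    # Integral of (x - loc) * pdf(x) over (loc, upper].
    partial = (math.sqrt(2 * scale * b / math.pi) * math.exp(-scale / (2 * b))
               - scale * mass)
    return loc + partial / mass
```

The closed form follows from integrating `(x - loc) * levy_pdf(x)` up to `upper`. Moving `upper` outward increases the mean without bound, which is why the choice of truncation point matters.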

  31. Levy Fit [Figure: four panels of fitted distributions, titled 1 - 5 processes, 6 - 10 processes, 11 - 15 processes, and 16 - 20 processes]

  32. Test Setup [Figure: kernels A and B connected by queue Q1]
      Question: Can we use an M/M/1 queueing model to estimate the mean queue occupancy of this system?
      Hypothesis: Lower Kullback-Leibler (KL) divergence between the expected and realized distributions is associated with higher model accuracy.
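For reference, the M/M/1 estimate in question comes from the standard textbook formulas, sketched below. The slides do not say whether "queue occupancy" includes the item in service, so this hypothetical helper returns both quantities; the arrival and service rates are inputs the reader would have to measure.

```python
def mm1_mean_occupancy(arrival_rate, service_rate):
    """Standard M/M/1 steady-state means for the given rates (items per unit time)."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    rho = arrival_rate / service_rate  # server utilization
    if rho >= 1:
        raise ValueError("queue is unstable when arrival_rate >= service_rate")
    return {
        "utilization": rho,
        "mean_in_system": rho / (1 - rho),         # waiting plus in service
        "mean_waiting": rho * rho / (1 - rho),     # waiting only
    }
```

For example, `mm1_mean_occupancy(3.0, 4.0)` gives a utilization of 0.75, a mean of 3 items in the system, and a mean of 2.25 items waiting.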

  33. Test Setup [Figure: kernels A and B connected by queue Q1]
      1. A dedicated thread of execution monitors queue occupancy
      2. Calculate the estimated mean queue occupancy using the M/M/1 model
      3. Calculate the KL divergence for the arrival process distribution using the truncated Levy distribution noise model
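The KL divergence in step 3 can be computed from binned data roughly as follows. This is a generic discrete-distribution sketch and not the authors' code: the helper names and the choice of bin edges are assumptions, and the divergence is reported in nats.

```python
import math


def histogram(samples, edges):
    """Normalized histogram; samples outside [edges[0], edges[-1]) are dropped."""
    counts = [0] * (len(edges) - 1)
    kept = 0
    for s in samples:
        for k in range(len(counts)):
            if edges[k] <= s < edges[k + 1]:
                counts[k] += 1
                kept += 1
                break
    if kept == 0:
        raise ValueError("no samples fall inside the bin range")
    return [c / kept for c in counts]


def kl_divergence(p, q):
    """D(P || Q) in nats for two discrete distributions over the same bins."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # a bin with zero mass in P contributes nothing
        if qi == 0.0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total
```

Here `p` would be the histogram of observed values and `q` the bin probabilities predicted by the model. KL divergence is not symmetric, so the order of the arguments matters.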

  34. Convolution with Exponential
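One way to read this slide: the exponential distribution assumed by the M/M/1 model is convolved with the truncated Levy noise model to get the distribution one would expect to observe. The sketch below does this numerically on a uniform grid. The bin width, bin count, rate, and scale are illustrative assumptions, and truncation is applied simply by cutting both distributions off at the end of the grid and renormalizing.

```python
import math


def exponential_cdf(rate):
    """Return the CDF of an exponential distribution with the given rate."""
    def cdf(t):
        return 1.0 - math.exp(-rate * t) if t > 0 else 0.0
    return cdf


def levy_cdf(scale):
    """Return the CDF of a Levy distribution with location 0 and the given scale."""
    def cdf(t):
        return math.erfc(math.sqrt(scale / (2.0 * t))) if t > 0 else 0.0
    return cdf


def discretize(cdf, step, bins):
    """Mass in each bin [k*step, (k+1)*step), truncated at bins*step and renormalized."""
    masses = [cdf((k + 1) * step) - cdf(k * step) for k in range(bins)]
    total = sum(masses)
    return [m / total for m in masses]


def convolve(a, b):
    """Discrete convolution: distribution of the sum of two independent binned variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


service = discretize(exponential_cdf(2.0), 0.1, 50)  # ideal exponential times
noise = discretize(levy_cdf(0.05), 0.1, 50)          # truncated Levy noise
expected_observed = convolve(service, noise)
```

The result `expected_observed` is a probability vector over the summed bins, which could then be compared against a histogram of measured values on the same grid.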

  35. Conclusions
      • The truncated Levy distribution can be used to approximate BCETV
      • The distribution of BCETV can be used as a tool to accept or reject a stochastic queueing model based on distributional assumptions
      • KL divergence between the expected and convolved distributions correlates highly with queue model accuracy

  36. Parting Notes
      • Slides available here: sbs.wustl.edu
      • Timer C++ template code: http://goo.gl/ItJ3jP
      • Test harness used to collect data: http://goo.gl/U1VG6N
