cs626 data analysis and simulation

CS626 Data Analysis and Simulation, Instructor: Peter Kemper

CS626 Data Analysis and Simulation. Instructor: Peter Kemper, R 104A, phone 221-3462, email: kemper@cs.wm.edu. Today: Trace-Driven Simulation. Reference: Law/Kelton, Simulation Modeling and Analysis, Ch. 5.6.


  1. CS626 Data Analysis and Simulation
     Instructor: Peter Kemper, R 104A, phone 221-3462, email: kemper@cs.wm.edu
     Today: Trace-Driven Simulation
     Reference: Law/Kelton, Simulation Modeling and Analysis, Ch. 5.6

  2. What is trace-driven simulation?
     General idea
       - Replace the sequence of pseudo-random numbers in a simulation run with measurement data from the real system (historical data).
     Examples
       - Workload model: arrival times and types of requests/tasks
       - Machine model:
           - service times for tasks
           - failures and repair times
     Purpose
       - Model validation: Does the model represent the real system well enough? We need to compare the output of a simulation with the measurement data.
     Why is trace-driven simulation not used for production runs?
       - It can only reproduce what happened historically
       - There is seldom enough data for all scenarios of interest
       - Data is limited to the set of observed values (a finite set of discrete values)
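The general idea can be illustrated with a minimal sketch, assuming a single-server FIFO queue whose delays follow the Lindley recursion. The same model code is run once with pseudo-random input and once with a recorded trace. The function names and the trace values are hypothetical and only stand in for real measurement data.

```python
import random
from itertools import islice

def delays_in_queue(interarrivals, services):
    """Delay in queue per customer for a single-server FIFO queue.

    Lindley recursion: D[i+1] = max(0, D[i] + S[i] - A[i+1]).
    The first customer finds an empty system, so its interarrival value is ignored.
    """
    delays = []
    delay = 0.0
    previous_service = None
    for a, s in zip(interarrivals, services):
        if previous_service is not None:
            delay = max(0.0, delay + previous_service - a)
        delays.append(delay)
        previous_service = s
    return delays

def random_stream(rate, rng):
    """Endless stream of exponential variates (the ordinary, non-trace input)."""
    while True:
        yield rng.expovariate(rate)

# Hypothetical measured data (made-up values standing in for a real trace).
trace_interarrivals = [0.4, 1.3, 0.2, 0.9, 2.1, 0.3]
trace_services = [0.5, 0.7, 0.6, 0.4, 0.8, 0.5]

# Ordinary run: input is drawn from distributions.
rng = random.Random(1)
random_run = delays_in_queue(islice(random_stream(1.0, rng), 6),
                             islice(random_stream(2.0, rng), 6))

# Trace-driven run: the same model, but input comes from the recorded data.
trace_run = delays_in_queue(trace_interarrivals, trace_services)
print(random_run)
print(trace_run)
```

The trace-driven run is fully deterministic: repeating it reproduces exactly the same delays, which is also why it can only replay what was observed.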

  3. Trace-driven simulation
     Idea: feed measurement data into the simulation model
     Example from MAP fitting work by Casale, Smirni et al.

  4. Trace-Driven Simulation
     Advantages
       - Credibility
       - Easy validation: Compare simulation results with measured data
       - Accurate workload: Models correlation and dependencies
       - Detailed workload: Can study the effect of small changes
       - Less randomness: Input is deterministic
       - Fair comparison: Better than random input
     Disadvantages
       - Complexity: May be too detailed for the simulation model
       - Representativeness: Historical data may be outdated, or may refer to a very particular load situation and system configuration
       - Finiteness: Simulation must stop at the end of the data
       - Space: May take an enormous amount of space
       - Single point of validation: One particular scenario in the design space
       - Parameterization: Workload data is difficult to parameterize/adjust

  5. Comparing simulated and measured behavior
     Basic Inspection Approach
       - To compare simulated and measured behavior, run the simulation with input values sampled from a distribution and compare the output to the measurement data.
       - This seems a classical area for statistical tests
           - Are both sets of samples from the same distribution?
           - But: tests assume the samples are i.i.d., whereas simulated output is usually correlated and NOT independent.
       - What if we compare estimates of performance measures?
     Law/Kelton compares 2 M/M/1 systems
       - System X is M/M/1 with λ = 1, ρ = 0.6
       - Model Y is M/M/1 with λ = 1, ρ = 0.5
       - Observation: sequence of delays in queue D_i. Compare the estimated mean delay of the first 200 customers; the true values are E(X) = 0.87 and E(Y) = 0.49, a difference of 0.38.

           Exp   estimate of µ_X   estimate of µ_Y   estimate of µ_X - µ_Y
           1     0.90              0.70               0.20
           2     0.70              0.71              -0.01
           3     1.08              0.35               0.73
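A rough way to see why three independent experiments can scatter so widely is to repeat the comparison in code. The sketch below assumes an initially empty queue and uses the Lindley recursion for the delays; the function name, seed, and number of experiments are illustrative choices, not taken from Law/Kelton.

```python
import random

def mean_delay_mm1(arrival_rate, utilization, customers, rng):
    """Average delay in queue of the first `customers` customers of an M/M/1 queue."""
    service_rate = arrival_rate / utilization  # rho = lambda / mu
    total, delay = 0.0, 0.0                    # first customer waits 0
    for _ in range(customers):
        total += delay
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        delay = max(0.0, delay + service - interarrival)
    return total / customers

rng = random.Random(626)  # arbitrary seed
for experiment in range(1, 4):
    x = mean_delay_mm1(1.0, 0.6, 200, rng)  # "system" X
    y = mean_delay_mm1(1.0, 0.5, 200, rng)  # "model" Y, independent input
    print(f"Exp {experiment}: X={x:.2f}  Y={y:.2f}  X-Y={x - y:.2f}")
```

Because X and Y are driven by independent random input, a single difference can land far from the true value, so one experiment says little about whether the model is adequate.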

  6. Trace-driven Simulation, Correlated Inspection Approach
     Correlation is good ...
       - If system and model face exactly the same observations from the input RVs, then the comparison should be more precise due to correlation.
     Why is that?
       - Say RV X corresponds to the system, Y to the model
       - Recall: Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X,Y)
       - If X and Y are independent because the simulation draws from a distribution to produce values for Y, then Cov(X,Y) = 0 and Var(X - Y) = Var(X) + Var(Y)
       - If the model follows the measured input data of the system, then we can expect X and Y to be positively correlated, so that Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X,Y) and Var(X - Y) is reduced.
     Law/Kelton contains an illustrative example showing that
       - Trace-driven and ordinary simulation both produce comparable estimates for the mean of a performance measure, but trace-driven simulation results in a smaller variance of the estimate.
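The variance argument can be checked empirically. The following sketch is not the Law/Kelton example itself: it assumes two M/M/1 queues with λ = 1 whose service times are a common mean-1 "work" value scaled by 0.6 (system X) and 0.5 (model Y), and it compares Var(X - Y) when the model draws its own input against the case where it reuses the system's input. All names and the replication count are illustrative.

```python
import random
from statistics import variance

def mean_delay(interarrivals, services):
    """Average delay in queue via the Lindley recursion, starting empty."""
    total, delay = 0.0, 0.0
    for a, s in zip(interarrivals, services):
        total += delay
        delay = max(0.0, delay + s - a)
    return total / len(interarrivals)

def draw_inputs(rng, customers=200):
    """Interarrival times (rate 1) and mean-1 service work for one run."""
    interarrivals = [rng.expovariate(1.0) for _ in range(customers)]
    work = [rng.expovariate(1.0) for _ in range(customers)]
    return interarrivals, work

def difference(rng, shared):
    """One observation of X - Y; `shared` makes the model reuse the system's input."""
    a_sys, w_sys = draw_inputs(rng)
    a_mod, w_mod = (a_sys, w_sys) if shared else draw_inputs(rng)
    x = mean_delay(a_sys, [0.6 * w for w in w_sys])  # system, rho = 0.6
    y = mean_delay(a_mod, [0.5 * w for w in w_mod])  # model,  rho = 0.5
    return x - y

rng = random.Random(626)  # arbitrary seed
independent = [difference(rng, shared=False) for _ in range(500)]
correlated = [difference(rng, shared=True) for _ in range(500)]
print("Var(X-Y), independent input:", variance(independent))
print("Var(X-Y), shared input:     ", variance(correlated))
```

With shared input the Cov(X,Y) term is positive, so the second variance should come out clearly smaller than the first.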

  7. Technical Issues
     Given: sequence of interarrival times for tasks/requests
     How to incorporate data into a model (here Mobius)?
       - If not directly supported, we need to find a work-around ...
       - Problem 1: Need to make the data in a file accessible
           - Output gates of an activity allow us to write C++ code segments
           - Open the file and load the data into some internal data structure like an array
           - Use an immediate, one-time activity to load the data from the file into the array
       - Problem 2: Store the data such that an activity can access it
           - Define an extended place to hold an array of floating point values
           - State variables are accessible in activities since behavior can be state-dependent
       - Problem 3: Make the activity fire according to the given interarrival times
           - Define a timed activity with a deterministic delay
           - Define the parameter of that delay to be the value at the current position in the array of interarrival times
           - Increment the current position in the output gate of that activity
       - Problem 4: Check if the dynamic behavior is as expected (with trace)
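The four steps can be mirrored outside of Mobius in a few lines. The sketch below is plain Python, not Mobius code and not its C++ API: the class `TraceArrivals` plays the role of the extended place (array plus current position), `current_delay` stands for the deterministic delay of the timed activity, and `fire` stands for the output gate. All names and the file contents are hypothetical.

```python
import os
import tempfile
from itertools import accumulate

class TraceArrivals:
    """State of the sketch: the loaded trace and the current read position."""

    def __init__(self, path):
        # Problems 1 and 2: one-time load of the file into an array kept as state.
        with open(path) as handle:
            self.interarrivals = [float(line) for line in handle if line.strip()]
        self.position = 0

    def has_next(self):
        return self.position < len(self.interarrivals)

    def current_delay(self):
        # Problem 3: the deterministic delay is the value at the current position.
        return self.interarrivals[self.position]

    def fire(self):
        # Counterpart of the output gate: advance the current position.
        self.position += 1

def run(path):
    """Replay the trace and return the times at which the arrival activity fires."""
    state = TraceArrivals(path)
    clock = 0.0
    firing_times = []
    while state.has_next():
        clock += state.current_delay()
        state.fire()
        firing_times.append(clock)
    return firing_times

# Hypothetical trace file with made-up interarrival times, one value per line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("0.5\n1.25\n0.25\n2.0\n")
firing_times = run(tmp.name)
os.remove(tmp.name)

# Problem 4: firing times must equal the cumulative sums of the trace.
expected = list(accumulate([0.5, 1.25, 0.25, 2.0]))
print(firing_times == expected)
```

Keeping the position as explicit state is what lets the delay depend on "where we are" in the trace, which is the same reason the slide stores the array in a place.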

  8. Improvements
     Change the array into a ring buffer
       - load more data on demand as necessary
       - uses less space
       - requires less configuration effort
     Encapsulate this aspect in a separate atomic model
       - Reuse the same model to read multiple files for different input streams
       - Requires some way to assign filenames appropriately
     Note: Many more ways to do this
       - Mobius supports user-defined libraries with C++ code
       - Possible to implement file access with particular methods in a library
       - Provide an iterator concept to access numerical entries in a file
       - Memory-mapped files as an alternative to arrays
       - Have a robust parser for file access with appropriate exception handling
       - ...
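Several of these ideas (ring buffer, on-demand loading, iterator access, parser errors, one reader per input stream) can be combined in a small sketch. It is written in Python rather than as a Mobius C++ library, and the class name `TraceReader`, its `capacity` parameter, and the file contents are assumptions made for illustration.

```python
import os
import tempfile

class TraceReader:
    """Iterator over numeric entries of a file, buffered in a fixed-size ring."""

    def __init__(self, path, capacity=4):
        self._handle = open(path)
        self._buffer = [0.0] * capacity
        self._head = 0      # index of the next value to hand out
        self._count = 0     # number of unread values in the ring
        self._line_no = 0

    def _refill(self):
        """Load more data on demand until the ring is full or the file ends."""
        if self._handle.closed:
            return
        while self._count < len(self._buffer):
            line = self._handle.readline()
            if not line:            # end of file
                self._handle.close()
                return
            self._line_no += 1
            text = line.strip()
            if not text:            # skip blank lines
                continue
            try:
                value = float(text)
            except ValueError:
                self._handle.close()
                raise ValueError(f"line {self._line_no}: not a number: {text!r}") from None
            self._buffer[(self._head + self._count) % len(self._buffer)] = value
            self._count += 1

    def __iter__(self):
        return self

    def __next__(self):
        if self._count == 0:
            self._refill()
        if self._count == 0:
            raise StopIteration
        value = self._buffer[self._head]
        self._head = (self._head + 1) % len(self._buffer)
        self._count -= 1
        return value

def write_temp(text):
    """Helper for the demo: write text to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(text)
    return tmp.name

# Two hypothetical input streams read by two instances of the same reader.
arrival_path = write_temp("0.5\n1.25\n\n0.25\n2.0\n0.75\n1.5\n")
service_path = write_temp("0.3\n0.4\n")
arrivals = list(TraceReader(arrival_path))
services = list(TraceReader(service_path))
os.remove(arrival_path)
os.remove(service_path)
print(arrivals, services)
```

Memory use is bounded by `capacity` regardless of file size, and a malformed entry surfaces as a `ValueError` that names the offending line instead of silently corrupting the input stream.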

  9. Furthermore
     Law/Kelton, Section 5.6.2
       - If we can obtain
           - m independent sets of system data
           - n independent sets of simulation data
         we can take advantage of the independence and calculate confidence intervals for µ_X - µ_Y
       - Options
           - paired-t approach: requires n = m, but the X and Y values within a pair may be correlated
           - Welch approach: any values of n, m > 1, but X and Y must be independent
       - Need to check whether 0 is contained in the interval between the lower and upper bound
       - Statistically significant vs. practically significant
           - Practically significant: the magnitude of the difference invalidates any inference about the system that would be derived from the model
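Both options can be sketched with the standard library. As illustrative data the sketch reuses the three experiment pairs quoted earlier in this deck (estimates 0.90, 0.70, 1.08 for X and 0.70, 0.71, 0.35 for Y). The function names are hypothetical, and the critical value 4.303 is the usual t-table entry for 2 degrees of freedom at the 95% level. For Welch, only the point estimate, standard error, and estimated degrees of freedom are computed; the matching t quantile still has to be looked up.

```python
from math import sqrt
from statistics import mean, variance

def paired_t_interval(x, y, t_crit):
    """Confidence interval for mu_X - mu_Y from paired observations (n = m)."""
    diffs = [a - b for a, b in zip(x, y)]
    center = mean(diffs)
    half_width = t_crit * sqrt(variance(diffs) / len(diffs))
    return center - half_width, center + half_width

def welch_parts(x, y):
    """Point estimate, standard error, and estimated degrees of freedom (Welch)."""
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    std_error = sqrt(vx + vy)
    dof = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return mean(x) - mean(y), std_error, dof

# Estimates from the three experiments quoted in this deck.
x = [0.90, 0.70, 1.08]
y = [0.70, 0.71, 0.35]

low, high = paired_t_interval(x, y, t_crit=4.303)  # t quantile for 2 dof, 95%
print("paired-t interval:", (low, high), "contains 0:", low <= 0.0 <= high)

estimate, std_error, dof = welch_parts(x, y)
print("Welch: estimate", estimate, "std. error", std_error, "dof", dof)
```

With only three observations the interval is wide and includes 0, so these data alone do not show a statistically significant difference even though the true means differ; more independent data sets would be needed before judging practical significance.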
