CS626 Data Analysis and Simulation
Instructor: Peter Kemper, R 104A, phone 221-3462, email: kemper@cs.wm.edu
Today: Trace-Driven Simulation
Reference: Law/Kelton, Simulation Modeling and Analysis, Ch. 5.6

What is trace-driven simulation?
General idea
- Replace the sequence of pseudo-random numbers with measurement data from the real system (historical data) in a simulation run.
Examples
- Workload model: arrival times and types of requests/tasks
- Machine model: service times for tasks, failure and repair times
Purpose
- Model validation: Does the model represent the real system well enough?
- We need to compare the output of a simulation with the measurement data.
Why is trace-driven simulation not used for production runs?
- Can only reproduce what happened historically
- Seldom enough data for all scenarios of interest
- Data limited to the set of observed values (finite set of discrete values)

Trace-driven simulation
- Idea: feed measurement data into the simulation model
- Example from MAP fitting work by Casale, Smirni et al.

Trace-Driven Simulation
Advantages
- Credibility
- Easy validation: Compare simulation results with measured data
- Accurate workload: Models correlation and dependencies
- Detailed workload: Can study the effect of small changes
- Less randomness: Input is deterministic
- Fair comparison: Better than random input
Disadvantages
- Complexity: May be too detailed for the simulation model
- Representativeness: Historical data may be outdated, or may refer to a very particular load situation and system configuration
- Finiteness: Simulation must stop at the end of the data
- Space: May take an enormous amount of space
- Single point of validation: One particular scenario in the design space
- Parameterization: Workload data is difficult to parameterize/adjust

Comparing simulated and measured behavior
Basic Inspection Approach
- To compare simulated and measured behavior, run the simulation with input values sampled from a distribution and compare to the measurement data.
- Seems a classical area for statistical tests: Are both sets of samples from the same distribution?
- But: Tests assume samples are i.i.d., while simulated output is usually correlated and NOT independent.
- What if we compare estimates of performance measures?
Law/Kelton compares 2 M/M/1 systems
- System X is M/M/1 with λ = 1, ρ = 0.6
- Model Y is M/M/1 with λ = 1, ρ = 0.5
- Observation: sequence of delays in queue D_i
- Let's compare estimated means for the first 200 customers; the correct values are E(X) = 0.87, E(Y) = 0.49

         µ_X     µ_Y     µ_X - µ_Y
Exp 1    0.90    0.70     0.20
Exp 2    0.70    0.71    -0.01
Exp 3    1.08    0.35     0.73
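
The spread of such independent comparisons can be reproduced with a small sketch. It assumes (this is not stated on the slide) that delays in queue are generated with the Lindley recursion D_k = max(0, D_{k-1} + S_{k-1} - A_k), starting from an empty system. The names `avg_delay` and `independent_experiment` and the seed are illustrative choices, and the numbers it prints will not match the table above.

```python
import random


def avg_delay(interarrivals, services):
    """Average delay in queue of the first len(services) customers."""
    delay = 0.0  # the first customer finds an empty system
    total = 0.0
    for k in range(len(services)):
        if k > 0:
            # Lindley recursion: D[k] = max(0, D[k-1] + S[k-1] - A[k])
            delay = max(0.0, delay + services[k - 1] - interarrivals[k])
        total += delay
    return total / len(services)


def independent_experiment(rng, n=200, lam=1.0, mean_sx=0.6, mean_sy=0.5):
    """One experiment: system X and model Y each draw their own inputs."""
    def run(mean_service):
        arrivals = [rng.expovariate(lam) for _ in range(n)]
        services = [rng.expovariate(1.0 / mean_service) for _ in range(n)]
        return avg_delay(arrivals, services)
    return run(mean_sx), run(mean_sy)


if __name__ == "__main__":
    rng = random.Random(626)  # arbitrary seed, assumption for repeatability
    for exp in range(1, 4):
        x, y = independent_experiment(rng)
        print(f"Exp {exp}: X={x:.2f}  Y={y:.2f}  X-Y={x - y:.2f}")
```

With λ = 1, a mean service time of 0.6 or 0.5 gives ρ = 0.6 or 0.5, as in the example. Running the loop a few times with different seeds shows how much the difference of the two estimates fluctuates when X and Y are independent.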

Trace-driven Simulation, Correlated Inspection Approach
Correlation is good ...
- If system and model face exactly the same observations from the input RVs, then the comparison should be more precise due to correlation.
Why is that?
- Say RV X corresponds to the system, Y to the model.
- Recall: Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X,Y)
- If X and Y are independent because the simulation draws from a distribution to produce values for Y, then Cov(X,Y) = 0 and Var(X - Y) = Var(X) + Var(Y).
- If the model follows the measured input data of the system, then we can expect X and Y to be positively correlated, such that Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X,Y), and Var(X - Y) is reduced.
Law/Kelton contains an example illustrating that trace-driven and ordinary simulation both produce comparable estimates for the mean of a performance measure, but trace-driven simulation results in a smaller variance of the estimate.
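
The variance argument can be checked empirically with a minimal sketch. It makes several assumptions that are not taken from Law/Kelton: the "system" is itself simulated as an M/M/1 queue with mean service time 0.6, the model (mean service time 0.5) reuses the system's interarrival times and the same unit-mean service draws scaled by 0.5, and delays follow the Lindley recursion. The function names are illustrative.

```python
import random
import statistics


def avg_delay(interarrivals, services):
    """Average delay in queue, Lindley recursion, empty system at start."""
    delay, total = 0.0, 0.0
    for k in range(len(services)):
        if k > 0:
            delay = max(0.0, delay + services[k - 1] - interarrivals[k])
        total += delay
    return total / len(services)


def draw_inputs(rng, n):
    """Interarrival times (rate 1) and unit-mean service draws."""
    arrivals = [rng.expovariate(1.0) for _ in range(n)]
    unit_services = [rng.expovariate(1.0) for _ in range(n)]
    return arrivals, unit_services


def var_of_difference(reps, correlated, seed=1, n=200):
    """Sample variance of X - Y over `reps` replications."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        arrivals, unit = draw_inputs(rng, n)
        x = avg_delay(arrivals, [0.6 * s for s in unit])  # system
        if not correlated:
            # Ordinary simulation: the model draws fresh, independent inputs.
            arrivals, unit = draw_inputs(rng, n)
        # Correlated case: the model is driven by the system's own inputs.
        y = avg_delay(arrivals, [0.5 * s for s in unit])  # model
        diffs.append(x - y)
    return statistics.variance(diffs)


if __name__ == "__main__":
    print("Var(X-Y), independent inputs:", var_of_difference(300, False))
    print("Var(X-Y), shared inputs:     ", var_of_difference(300, True))
```

Under these assumptions the shared-input variant should report a clearly smaller variance, which is the effect of the -2 Cov(X,Y) term.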

Technical Issues
Given: Sequence of interarrival times for tasks/requests
How to incorporate data into a model (here Mobius)?
- If not directly supported, we need to find a work-around ...
Problem 1: Need to make the data in a file accessible
- Output gates of an activity allow us to write C++ code segments
- Open the file and load the data into some internal data structure like an array
- Use an immediate, one-time activity to load the data from the file into the array
Problem 2: Store the data such that an activity can access it
- Define an extended place to hold an array of floating point values
- State variables are accessible in activities since behavior can be state-dependent
Problem 3: Make the activity fire according to the given interarrival times
- Define a timed activity with a deterministic delay
- Define the parameter of that delay to be the value at the current position in the array of interarrival times
- Increment the current position in the output gate of that activity
Problem 4: Check if the dynamic behavior is as expected (with a trace)
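
The logic behind Problems 1 to 4 can be sketched independently of the tool. The following is not Mobius code and uses none of its API; it is a plain Python illustration that assumes a text file with one interarrival time per line. `load_trace` and `TraceArrivals` are hypothetical names.

```python
import os
import tempfile


def load_trace(path):
    """Problems 1 and 2: read the file once and keep the values in a list."""
    with open(path) as fh:
        return [float(line) for line in fh if line.strip()]


class TraceArrivals:
    """Problem 3: an arrival process whose delays come from the trace."""

    def __init__(self, trace):
        self.trace = trace
        self.pos = 0      # current position in the array
        self.clock = 0.0  # simulation time of the last arrival

    def has_next(self):
        return self.pos < len(self.trace)

    def fire(self):
        delay = self.trace[self.pos]  # deterministic delay = current entry
        self.clock += delay
        self.pos += 1                 # advance, as the output gate would
        return self.clock


if __name__ == "__main__":
    # Problem 4: check the behavior against a tiny, hand-made trace.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("0.5\n1.5\n2.0\n")
    arrivals = TraceArrivals(load_trace(tmp.name))
    while arrivals.has_next():
        print("arrival at", arrivals.fire())
    os.remove(tmp.name)
```

The check for Problem 4 is that the arrival times equal the running sums of the trace entries and that the process stops when the data ends.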

Improvements
Change the array into a ring buffer
- load more data on demand as necessary
- uses less space
- requires less configuration effort
Encapsulate this aspect into a separate atomic model
- Reuse the same model to read multiple files for different input streams
- Requires some way to assign filenames appropriately
Note: Many more ways to do this
- Mobius supports user-defined libraries with C++ code
- Possible to implement file access with particular methods in a library
- Provide an iterator concept to access numerical entries in a file
- Memory-mapped files as an alternative to arrays
- Have a robust parser for file access with appropriate exception handling
- ...
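
A rough sketch of the on-demand loading and iterator ideas, again in plain Python rather than as a Mobius library. It assumes one numeric entry per line, and it uses a bounded `collections.deque` as a simple stand-in for a ring buffer. The class name `BufferedTrace` and its `capacity` parameter are hypothetical.

```python
from collections import deque


class BufferedTrace:
    """Iterate over numeric entries of a file, holding few in memory."""

    def __init__(self, path, capacity=1024):
        self._fh = open(path)
        self._buf = deque()
        self._capacity = capacity
        self._lineno = 0

    def _refill(self):
        # Load more data on demand, never more than `capacity` entries.
        while self._fh is not None and len(self._buf) < self._capacity:
            line = self._fh.readline()
            if not line:  # end of file reached
                self._fh.close()
                self._fh = None
                break
            self._lineno += 1
            if not line.strip():
                continue  # tolerate blank lines
            try:
                self._buf.append(float(line))
            except ValueError:
                self._fh.close()
                self._fh = None
                raise ValueError(
                    f"line {self._lineno}: not a number: {line.strip()!r}")

    def __iter__(self):
        return self

    def __next__(self):
        if not self._buf:
            self._refill()
        if not self._buf:
            raise StopIteration  # trace exhausted, simulation must stop
        return self._buf.popleft()
```

Usage would look like `for delay in BufferedTrace("arrivals.txt", capacity=256): ...`, where the filename is a placeholder. One instance per file gives the reuse for multiple input streams mentioned above.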

Furthermore
Law/Kelton, Section 5.6.2
If we can obtain
- m independent sets of system data
- n independent sets of simulation data
we can take advantage of the independence and calculate a confidence interval for µ_X - µ_Y
Options
- paired-t approach: n = m, but pairs can be correlated
- Welch approach: any values of n, m > 1, but X, Y must be independent
Need to check if 0 is contained in the interval between lower and upper bound
Statistically significant vs practically significant
- Practically significant: Magnitude of the difference invalidates any inference about the system that would be derived from the model
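
Both options can be sketched with the Python standard library. The sketch uses the usual textbook formulas for the paired-t interval and for Welch's standard error and estimated degrees of freedom. The critical value of the t distribution is passed in by the caller (looked up in a table), since the standard library has no t quantile function. The sample numbers are the three experiment results listed earlier in these slides, reused purely as illustrative data; the function names are hypothetical.

```python
import math
import statistics


def paired_t_interval(xs, ys, t_crit):
    """CI for mu_X - mu_Y from paired data; needs len(xs) == len(ys)."""
    if len(xs) != len(ys):
        raise ValueError("paired-t approach requires n = m")
    diffs = [x - y for x, y in zip(xs, ys)]
    mean = statistics.mean(diffs)
    half = t_crit * math.sqrt(statistics.variance(diffs) / len(diffs))
    return mean - half, mean + half


def welch_statistics(xs, ys):
    """Point estimate, standard error and estimated degrees of freedom."""
    m, n = len(xs), len(ys)
    vx = statistics.variance(xs) / m
    vy = statistics.variance(ys) / n
    diff = statistics.mean(xs) - statistics.mean(ys)
    df = (vx + vy) ** 2 / (vx ** 2 / (m - 1) + vy ** 2 / (n - 1))
    return diff, math.sqrt(vx + vy), df


def contains_zero(interval):
    lower, upper = interval
    return lower <= 0.0 <= upper


if __name__ == "__main__":
    xs = [0.90, 0.70, 1.08]  # system estimates (illustrative)
    ys = [0.70, 0.71, 0.35]  # model estimates (illustrative)
    # 2.920 is the 0.95 quantile of t with 2 degrees of freedom (90% CI).
    ci = paired_t_interval(xs, ys, t_crit=2.920)
    print("paired-t 90% CI:", ci, "contains 0:", contains_zero(ci))
    diff, se, df = welch_statistics(xs, ys)
    print("Welch: diff", diff, "std. error", se, "est. d.f.", df)
```

For the Welch approach, the caller looks up the critical value for the estimated degrees of freedom and forms diff ± t_crit · se. If 0 lies inside the interval, the observed difference is not statistically significant at that level; whether it is practically significant remains a separate judgment.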