Computer Science, Informatik 4 Communication and Distributed Systems Simulation “Discrete-Event System Simulation” Dr. Mesut Güneş
Chapter 8. Input Modeling
Purpose & Overview
• Input models provide the driving force for a simulation model.
• The quality of the output is no better than the quality of the inputs.
• This chapter discusses the four steps of input model development:
  1) Collect data from the real system
  2) Identify a probability distribution to represent the input process
  3) Choose parameters for the distribution
  4) Evaluate the chosen distribution and parameters for goodness of fit
Data Collection
• One of the biggest tasks in solving a real problem
  • GIGO: Garbage-In-Garbage-Out
• [Figure: system input (raw data) feeds the simulation, which yields performance output data]
• Even when the model structure is valid, simulation results can be misleading if the input data are
  • inaccurately collected
  • inappropriately analyzed
  • not representative of the environment
Data Collection
• Suggestions that may enhance and facilitate data collection:
  • Plan ahead: begin with a practice or pre-observation session, and watch for unusual circumstances
  • Analyze the data as it is being collected: check that it is adequate
  • Combine homogeneous data sets: e.g., successive time periods, or the same time period on successive days
  • Be aware of data censoring: the quantity of interest is not observed in its entirety, so there is a danger of leaving out long process times
  • Check for relationships between variables: build a scatter diagram
  • Check for autocorrelation
  • Collect input data, not performance data
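The autocorrelation check suggested above can be sketched as follows. This is a minimal illustration, not a method prescribed by the slides: the function name `lag_autocorrelation` and the sample sequences are hypothetical, and the estimator is the common sample autocorrelation (lagged covariance divided by the overall sum of squares).

```python
def lag_autocorrelation(data, lag=1):
    """Sample autocorrelation of `data` at the given lag (hypothetical helper)."""
    n = len(data)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(data)")
    mean = sum(data) / n
    # Sum of squared deviations over the whole sequence
    total = sum((x - mean) ** 2 for x in data)
    if total == 0:
        raise ValueError("data must not be constant")
    # Sum of products of deviations that are `lag` positions apart
    lagged = sum((data[i] - mean) * (data[i + lag] - mean) for i in range(n - lag))
    return lagged / total


# Made-up sequences for illustration only
alternating = [1.0, 5.0] * 10        # short/long values alternate
trend = [float(i) for i in range(20)]  # steadily increasing values

print(lag_autocorrelation(alternating))  # strongly negative
print(lag_autocorrelation(trend))        # strongly positive
```

Values near zero are consistent with independent observations; values far from zero suggest that treating the data as an independent sample may be misleading.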
Identifying the Distribution
• Histograms
• Scatter diagrams
• Selecting families of distributions
• Parameter estimation
• Goodness-of-fit tests
• Fitting a non-stationary process
Histograms
• A frequency distribution or histogram is useful in determining the shape of a distribution
• The number of class intervals depends on:
  • The number of observations
  • The dispersion of the data
  • Suggested number of intervals: the square root of the sample size
• For continuous data:
  • The histogram corresponds to the probability density function of a theoretical distribution
• For discrete data:
  • The histogram corresponds to the probability mass function
• If few data points are available:
  • Combine adjacent cells to eliminate the ragged appearance of the histogram
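As a rough sketch of the square-root rule above, the following builds an equal-width histogram. The helper names `suggested_interval_count` and `build_histogram` are hypothetical, and equal-width intervals spanning the minimum to the maximum observation are an assumption rather than something the slides require.

```python
import math


def suggested_interval_count(n_observations):
    """Square-root rule: number of intervals is about sqrt(sample size)."""
    return max(1, round(math.sqrt(n_observations)))


def build_histogram(data, n_intervals=None):
    """Return a list of (lower, upper, count) tuples for equal-width intervals."""
    if not data:
        raise ValueError("data must not be empty")
    k = n_intervals or suggested_interval_count(len(data))
    low, high = min(data), max(data)
    width = (high - low) / k
    if width == 0:
        width = 1.0  # all observations identical; any positive width works
    counts = [0] * k
    for x in data:
        # The maximum observation is placed in the last interval
        index = min(int((x - low) / width), k - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i]) for i in range(k)]


print(suggested_interval_count(100))
for lower, upper, count in build_histogram(list(range(9)), n_intervals=3):
    print(f"[{lower:.2f}, {upper:.2f}): {count}")
```

Trying a few values of `n_intervals` on the same data is a quick way to see how the interval size changes the apparent shape.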
Histograms
• Vehicle Arrival Example: The number of vehicles arriving at an intersection between 7:00 am and 7:05 am was monitored for 100 random workdays.

Arrivals per Period   Frequency
0                     12
1                     10
2                     19
3                     17
4                     10
5                     8
6                     7
7                     5
8                     5
9                     3
10                    3
11                    1

[Figure: histograms of the same data with different interval sizes]
• There are ample data, so the histogram may have a cell for each possible value in the data range
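The frequency table above can be turned into relative frequencies (an empirical probability mass function) and summary statistics with a few lines of code. This is only a sketch: the variable names are hypothetical, and the sample variance uses the usual n - 1 divisor, which is an assumption about the intended estimator.

```python
# Frequencies copied from the vehicle arrival table (arrivals per period -> count)
arrival_frequency = {
    0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
    6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1,
}

total_periods = sum(arrival_frequency.values())

# Relative frequency of each value: the empirical probability mass function
relative_frequency = {
    arrivals: count / total_periods
    for arrivals, count in arrival_frequency.items()
}

# Sample mean and sample variance computed from the grouped counts
sample_mean = sum(a * c for a, c in arrival_frequency.items()) / total_periods
sample_variance = sum(
    c * (a - sample_mean) ** 2 for a, c in arrival_frequency.items()
) / (total_periods - 1)

print(f"periods observed: {total_periods}")
print(f"sample mean:      {sample_mean:.2f}")
print(f"sample variance:  {sample_variance:.2f}")
```

Comparing the sample mean with the sample variance is a common first check when a Poisson model is being considered, since a Poisson distribution has equal mean and variance.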
Histograms: Example
• Life tests were performed on electronic components at 1.5 times the nominal voltage, and their lifetimes were recorded

Component Life (x)   Frequency
0 ≤ x < 3            23
3 ≤ x < 6            10
6 ≤ x < 9            5
9 ≤ x < 12           1
12 ≤ x < 15          1
…
42 ≤ x < 45          1
…
144 ≤ x < 147        1
Histograms: Example
• Stanford University Mobile Activity Traces (SUMATRA)
  • Target community: cellular network research community
  • Traces contain mobility as well as connection information
• Available traces
  • SULAWESI (S.U. Local Area Wireless Environment Signaling Information)
  • BALI (Bay Area Location Information)
• BALI characteristics
  • San Francisco Bay Area
  • Trace length: 24 hours
  • Number of cells: 90
  • Persons per cell: 1,100
  • Persons in total: 99,000
  • Active persons: 66,550
  • Move events: 243,951
  • Call events: 1,570,807
• Question: How can the BALI information be transformed so that it is usable with a network simulator, e.g., ns-2?
  • The number of nodes as well as the number of connections is too high for ns-2
Histograms: Example
• Analysis of the BALI trace
  • Goal: reduce the amount of data by identifying user groups
• User group
  • Lies between 2 local minima of the histogram
  • The communication characteristic is kept within the group
  • One user represents a group
• Groups with different mobility characteristics
  • Intra- and inter-group communication
• Interesting characteristic
  • The number of people with an odd number of movements is negligible!
[Figure: 3-D histogram of the number of people by calls and movements]
[Figure: histogram of the number of people versus the number of movements]
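The idea of delimiting user groups by local minima can be illustrated with a one-dimensional sketch. Everything here is an assumption made for illustration: the counts are made up (they are not BALI data), the helper names are hypothetical, a local minimum is taken to be a cell strictly lower than both neighbours, and the minimum cell itself is assigned to the group on its left.

```python
def local_minima(counts):
    """Indices of interior cells that are strictly lower than both neighbours."""
    return [
        i for i in range(1, len(counts) - 1)
        if counts[i] < counts[i - 1] and counts[i] < counts[i + 1]
    ]


def split_into_groups(counts):
    """Split cell indices into groups, cutting after each local minimum."""
    groups, start = [], 0
    for cut in local_minima(counts):
        groups.append(list(range(start, cut + 1)))
        start = cut + 1
    groups.append(list(range(start, len(counts))))
    return groups


# Made-up histogram counts, for illustration only
people_per_cell = [5, 20, 8, 3, 12, 30, 9, 2, 6, 15]

print(local_minima(people_per_cell))
for group in split_into_groups(people_per_cell):
    size = sum(people_per_cell[i] for i in group)
    print(f"cells {group}: {size} people")
```

Each resulting group could then be represented by a single user in the simulation, which is the data reduction the slide describes.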
Scatter Diagrams
• A scatter diagram is a quality tool that can show the relationship between paired data
  • Random variable X = Data 1
  • Random variable Y = Data 2
  • Plot random variable X on the x-axis and Y on the y-axis
[Figure: three scatter diagrams showing strong correlation, moderate correlation, and no correlation]
Scatter Diagrams
• Linear relationship
  • Correlation: measures how well the data line up
  • Slope: measures the steepness of the data
  • Direction
  • Y intercept
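The quantities listed above can be computed directly from paired data. The sketch below assumes the Pearson correlation coefficient and an ordinary least-squares line, which the slides do not name explicitly; the function name `linear_summary` and the sample data are hypothetical.

```python
import math


def linear_summary(xs, ys):
    """Return (correlation, slope, intercept) for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("neither variable may be constant")
    correlation = sxy / math.sqrt(sxx * syy)  # how well the data line up
    slope = sxy / sxx                         # steepness; its sign is the direction
    intercept = mean_y - slope * mean_x       # value of the line at x = 0
    return correlation, slope, intercept


# Made-up paired data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
correlation, slope, intercept = linear_summary(xs, ys)
print(correlation, slope, intercept)
```

A correlation close to +1 or -1 indicates a strong linear relationship, while a value near 0 matches the "no correlation" picture.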
Selecting the Family of Distributions
• A family of distributions is selected based on:
  • The context of the input variable
  • The shape of the histogram
• Frequently encountered distributions:
  • Easier to analyze: Exponential, Normal, and Poisson
  • Harder to analyze: Beta, Gamma, and Weibull
Selecting the Family of Distributions
• Use the physical basis of the distribution as a guide, for example:
  • Binomial: number of successes in n trials
  • Poisson: number of independent events that occur in a fixed amount of time or space
  • Normal: distribution of a process that is the sum of a number of component processes
  • Exponential: time between independent events, or a process time that is memoryless
  • Weibull: time to failure for components
  • Discrete or continuous uniform: models complete uncertainty
  • Triangular: a process for which only the minimum, most likely, and maximum values are known
  • Empirical: resamples from the actual data collected
Selecting the Family of Distributions
• Remember the physical characteristics of the process
  • Is the process naturally discrete or continuous valued?
  • Is it bounded?
• There is no "true" distribution for any stochastic input process
• Goal: obtain a good approximation
Quantile-Quantile Plots
• A Q-Q plot is a useful tool for evaluating distribution fit
• If $X$ is a random variable with CDF $F$, then the $q$-quantile of $X$ is the value $\gamma$ such that
  $F(\gamma) = P(X \le \gamma) = q$, for $0 < q < 1$
  • When $F$ has an inverse, $\gamma = F^{-1}(q)$
• Let $\{x_i, i = 1, 2, \ldots, n\}$ be a sample of data from $X$ and $\{y_j, j = 1, 2, \ldots, n\}$ be the observations in ascending order. Then
  $y_j \approx F^{-1}\left(\frac{j - 0.5}{n}\right)$
  • where $j$ is the ranking or order number
Quantile-Quantile Plots
• The plot of $y_j$ versus $F^{-1}\left(\frac{j - 0.5}{n}\right)$ is
  • Approximately a straight line if $F$ is a member of an appropriate family of distributions
  • The line has slope 1 if $F$ is a member of an appropriate family of distributions with appropriate parameter values
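The points of a Q-Q plot follow directly from the definition: sort the sample and pair each $y_j$ with $F^{-1}((j - 0.5)/n)$. The sketch below is illustrative only. It assumes an exponential candidate distribution, whose inverse CDF is $F^{-1}(q) = -\text{mean} \cdot \ln(1 - q)$, with the mean set to the sample mean; the function names and the sample values are hypothetical.

```python
import math


def exponential_inverse_cdf(mean):
    """Return F^{-1} for an exponential distribution with the given mean."""
    def inverse(q):
        return -mean * math.log(1.0 - q)
    return inverse


def qq_points(data, inverse_cdf):
    """Pair each theoretical quantile F^{-1}((j - 0.5) / n) with the ordered value y_j."""
    ordered = sorted(data)
    n = len(ordered)
    return [
        (inverse_cdf((j - 0.5) / n), y)
        for j, y in enumerate(ordered, start=1)
    ]


# Made-up sample, for illustration only
sample = [0.5, 0.1, 2.0, 1.0]
sample_mean = sum(sample) / len(sample)
points = qq_points(sample, exponential_inverse_cdf(sample_mean))

for theoretical, observed in points:
    print(f"theoretical {theoretical:.3f}   observed {observed:.3f}")
```

Plotting the observed values against the theoretical quantiles and comparing the result with a straight line of slope 1 gives the visual check described above. Any other candidate distribution can be tried by passing a different `inverse_cdf`.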