Input Distributions
Reading: Chapter 6 in Law
Peter J. Haas
CS 590M: Simulation, Spring Semester 2020

Outline
◮ Overview
◮ Probability Theory for Choosing Distributions
◮ Data-Driven Approaches
◮ Other Approaches to Input Distributions

Overview

To specify a simulation model, we need to define the clock-setting “input” sequences. Examples:
◮ Interarrival sequences
◮ Processing-time sequences for a production system
◮ Asset-value sequences for a financial model

Even if we assume i.i.d. sequences for simplicity...
◮ What type of distribution should we use (gamma, Weibull, normal, exponential, ...)?
◮ Given a type of probability distribution (i.e., a “distribution family”), what parameter values should we use?

Two approaches: probability theory and historical data

Probability Theory for Choosing Distributions
Theoretical Justification for Normal Random Variables

Suppose that X can be expressed as a sum of random variables: X = Y_1 + Y_2 + · · · + Y_n.
◮ In great generality, versions of the CLT imply that X ≈ N(μ, σ²) in distribution for large n, where μ = E[X] and σ² = Var[X]
◮ The Y_i's need not be i.i.d., just not “too dependent” and not “too non-identical”
◮ Q: Examples where the CLT breaks down?

Moral: If X is the sum of a large number of other random quantities, then X can be approximately modeled as a normal random variable.

Theoretical Justification for Lognormal Random Variables

Suppose that X can be expressed as a product of random variables: X = Z_1 Z_2 · · · Z_n.
◮ Example: Value of a financial asset
◮ Then log(X) = Y_1 + Y_2 + · · · + Y_n, where Y_i = log(Z_i)
◮ By the prior discussion, log(X) ≈ N(μ, σ²) in distribution for large n, i.e., X ≈ exp(N(μ, σ²)), so that X is approximately lognormally distributed

Moral: If X is the product of a large number of other random quantities, then X can be approximately modeled as a lognormal random variable.

Theoretical Justification for Poisson Arrival Process

Suppose that the arrival process is a superposition of arrivals from a variety of statistically independent sources.
[Figure: n independent sources (source 1, source 2, ..., source n) superposed into a single arrival stream feeding a queue]
◮ The Palm-Khintchine Theorem says that the superposition of n i.i.d. sources looks approximately Poisson as n becomes large
◮ Can relax the i.i.d. assumption

Moral: A Poisson process is often a reasonable model of an arrival process. (But beware of long-range dependence.)

Theoretical Justification for Weibull Distribution

Suppose that X can be expressed as a minimum of nonnegative random variables: X = min_{1 ≤ i ≤ n} Y_i.
◮ Example: Lifetime of a complex system
◮ Extreme-value theory (Gnedenko's Theorem) says that if the Y_i's are i.i.d., then X has approximately a Weibull distribution when n is large: P(X ≤ x) = 1 − e^{−(λx)^α}
◮ α is the shape parameter and λ is the scale parameter

Moral: If X is the lifetime of a complicated component, then X can be approximately modeled as a Weibull random variable.
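The four morals above are easy to check empirically. The following is a minimal illustrative sketch (not from the lecture; it assumes NumPy and SciPy, and all sample sizes and component distributions are arbitrary choices) that simulates a sum, a product, a minimum, and a superposition, and inspects whether the limiting shapes emerge.

```python
# Illustrative check of the limit-theorem "morals" (arbitrary example distributions).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, reps = 50, 10_000          # n terms per replication, reps replications

# Sum of many (here i.i.d. uniform) terms -> approximately normal
sums = rng.uniform(0.0, 1.0, size=(reps, n)).sum(axis=1)
print("sum:          skewness (near 0 for normal) =", round(stats.skew(sums), 3))

# Product of many positive terms -> approximately lognormal
# (equivalently, the log of the product is approximately normal)
prods = rng.uniform(0.5, 1.5, size=(reps, n)).prod(axis=1)
print("product:      skewness of log(product)     =", round(stats.skew(np.log(prods)), 3))

# Minimum of many nonnegative lifetimes -> approximately Weibull
# (for these gamma(2) components, Gnedenko's theorem suggests shape roughly 2)
mins = rng.gamma(shape=2.0, scale=1.0, size=(reps, n)).min(axis=1)
shape, loc, scale = stats.weibull_min.fit(mins, floc=0.0)
print("minimum:      fitted Weibull shape          =", round(shape, 2))

# Superposition of many sparse independent arrival streams -> approximately Poisson:
# counts per unit time should have variance close to the mean (dispersion index ~ 1)
counts = rng.binomial(1, 0.02, size=(reps, 500)).sum(axis=1)   # 500 rarely-firing sources
print("superposition: dispersion index             =", round(counts.var() / counts.mean(), 3))
```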
Controlling the Variability

Squared coefficient of variation: ρ²(X) = Var(X) / (E[X])²

Case 1: X ∼ exp(λ):
◮ ρ²(X) = 1

Case 2: X ∼ gamma(λ, α), with density f_X(x) = λ e^{−λx} (λx)^{α−1} / Γ(α):
◮ ρ²(X) = (α/λ²) / (α/λ)² = 1/α

Three scenarios to play with:
◮ α > 1: less variable than exponential
◮ α = 1: exponential
◮ α < 1: more variable than exponential

Data-Driven Approaches

Goodness-of-Fit Software

GoF software applies a set of goodness-of-fit tests to select the distribution family with the “best fit” to the data:
◮ chi-square, Kolmogorov-Smirnov, Epps-Singleton, Anderson-Darling, ...

GoF software must be used with caution:
◮ Low power: poor discrimination between different distributions
◮ Sequential testing: test properties mathematically ill-defined
◮ Discourages sensitivity analysis: unwary users stop with the “best” fit
◮ Can obscure non-i.i.d. features: e.g., trends, autocorrelation
◮ Fails on big data: all tests fail on large real-world datasets
◮ Over-reliance on summary statistics: should also plot the data
◮ Ex: Q-Q plots [Law, pp. 339-344] better indicate departures from a candidate distribution (lower/upper, heavy/light tails)

Feature Matching to Data

Pragmatic approach: match key features in the data.

Ex: Use a gamma distribution and match the first two empirical moments (see the sketch below)
◮ Hence match the empirical coefficient of variation (see “Controlling the Variability” above)
◮ Note: nothing about the “process physics” implies a gamma distribution; we use it for its convenience and flexibility in modeling a range of variability

Can use even more flexible distribution families:
◮ Ex: Johnson translation system (4 parameters)
◮ Ex: Generalized lambda distribution (4 parameters)
◮ Useful when extreme values are not important to the simulation [Nelson (2013), pp. 113-116]
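As a concrete illustration of feature matching, the sketch below (assuming NumPy/SciPy; the `data` array is a synthetic placeholder standing in for an observed i.i.d. sample) matches a gamma(λ, α) distribution to the first two empirical moments and then runs a Kolmogorov-Smirnov check, keeping in mind the caveats listed for GoF software.

```python
# Feature matching: fit gamma(lambda, alpha) by matching the first two empirical moments.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.gamma(shape=0.7, scale=2.0, size=200)   # placeholder "historical data"

xbar = data.mean()
s2 = data.var(ddof=1)
cv2 = s2 / xbar**2                   # squared coefficient of variation rho^2
alpha_hat = xbar**2 / s2             # shape; equals 1 / rho^2 for the gamma family
lam_hat = xbar / s2                  # rate parameter (the slides' lambda)
print(f"rho^2 = {cv2:.3f}, alpha_hat = {alpha_hat:.3f}, lambda_hat = {lam_hat:.3f}")

# Goodness-of-fit check -- use with caution: low power for small samples, and the
# p-value is only approximate because the parameters were estimated from the same data.
ks = stats.kstest(data, "gamma", args=(alpha_hat, 0.0, 1.0 / lam_hat))
print(f"K-S statistic = {ks.statistic:.3f}, p-value = {ks.pvalue:.3f}")
```

A Q-Q plot of `data` against the fitted gamma (e.g., via `scipy.stats.probplot`) is usually more informative about tail departures than the single test statistic.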
Estimating Parameters: Maximum Likelihood Method

Ex: Poisson arrival process to a queue
◮ Exponential interarrivals: 3.0, 1.0, 4.0, 3.0, 8.0
◮ Goal: estimate λ
◮ For a continuous distribution, the likelihood of an observation = pdf
◮ Given a value of λ, the likelihood of observation x_i is λ e^{−λ x_i}
◮ Joint likelihood: L(λ) = λ^5 e^{−λ(3.0 + 1.0 + 4.0 + 3.0 + 8.0)} = λ^5 e^{−19λ}
◮ Joint log-likelihood: L̃(λ) = 5 log λ − 19λ
◮ Estimate: λ̂ = 5/19 ≈ 0.263, i.e., the reciprocal of the sample mean

Q: Why is this a reasonable estimator?

Maximum Likelihood, Continued

Ex: Geometric distribution
◮ Number of failures before first success in Bernoulli trials, success probability = p
◮ P{X = k} = (1 − p)^k p for k ≥ 0 [Law, Problem 6.26]
◮ How to estimate p given four observations of X? X = 3, 5, 2, 8
◮ Given a value of p, the likelihoods are obs1: (1 − p)^3 p, obs2: (1 − p)^5 p, obs3: (1 − p)^2 p, obs4: (1 − p)^8 p
◮ Joint likelihood for all four observations: L(p) = (1 − p)^18 p^4
◮ Choose the estimator p̂ to maximize L(p) (it “best explains the data”)
◮ Equivalently, maximize L̃(p) = log L(p) = 18 log(1 − p) + 4 log p, giving p̂ = 4/22 ≈ 0.18

Maximum Likelihood, Continued

General setup (continuous i.i.d. case)
◮ Given X_1, ..., X_n i.i.d. samples from pdf f(·; α_1, ..., α_k)
◮ The MLEs α̂_1, ..., α̂_k maximize the likelihood function
  L_n(α_1, ..., α_k) = ∏_{i=1}^n f(X_i; α_1, ..., α_k)
  or, equivalently, the log-likelihood function
  L̃_n(α_1, ..., α_k) = ∑_{i=1}^n log f(X_i; α_1, ..., α_k)
◮ For the discrete case, use the pmf instead of the pdf

Maximum Likelihood, Continued

Maximizing the likelihood function
◮ Simple case: solve ∂L̃_n(α̂_1, ..., α̂_k)/∂α_i = 0 for i = 1, 2, ..., k
◮ Harder cases: the maximum occurs on a boundary, or there are constraints (Kuhn-Tucker conditions)
◮ Hardest cases (typical in practice): solve numerically (see the sketch below)

Why bother?
◮ MLEs maximize asymptotic statistical efficiency
◮ For large n, MLEs “squeeze maximal info from the sample” (i.e., smallest variance ⇒ narrowest confidence intervals)
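The sketch below (an illustration, not part of the slides; it assumes NumPy/SciPy) reproduces the two closed-form MLEs worked out above and shows the “solve numerically” route on the same exponential data as a sanity check.

```python
# Closed-form MLEs from the two examples, plus a generic numerical maximization
# of the log-likelihood for cases with no closed form.
import numpy as np
from scipy import stats, optimize

# Exponential interarrival times: lambda_hat = n / sum(x) = 1 / xbar
x = np.array([3.0, 1.0, 4.0, 3.0, 8.0])
lam_hat = len(x) / x.sum()
print("exponential MLE:", lam_hat)            # 5/19 ~ 0.263

# Geometric (failures before first success): p_hat = n / (n + sum(k))
k = np.array([3, 5, 2, 8])
p_hat = len(k) / (len(k) + k.sum())
print("geometric MLE:  ", p_hat)              # 4/22 ~ 0.182

# "Hardest case" route: maximize the log-likelihood numerically.  Here we simply
# re-derive the exponential MLE numerically as a check of the machinery.
neg_loglik = lambda lam: -np.sum(stats.expon.logpdf(x, scale=1.0 / lam))
res = optimize.minimize_scalar(neg_loglik, bounds=(1e-6, 10.0), method="bounded")
print("numerical MLE:  ", res.x)              # agrees with 5/19
```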
Estimating Parameters: Method of Moments

Idea: equate k sample moments to k true moments and solve.

Example: Exponential distribution
◮ k = 1 and E[X] = 1/λ
◮ So equate first moments: X̄_n = 1/λ̂ ⇒ λ̂ = 1/X̄_n

Example: Gamma distribution
◮ k = 2, E[X] = α/λ, and Var[X] = α/λ²
◮ Equate moments: X̄_n = α̂/λ̂ and s²_n = α̂/λ̂² ⇒ α̂ = X̄²_n / s²_n and λ̂ = X̄_n / s²_n

Can match other statistics as well, e.g., quantiles [Nelson 2013, p. 115]

Bayesian Parameter Estimation

View the unknown parameter θ as a random variable with prior distribution f(θ)
◮ Encapsulates prior knowledge about θ before seeing the data
◮ Ex: If we know that θ ∈ [5, 10], set f = Uniform(5, 10)

After seeing data Y, use Bayes' Rule to compute the posterior
  f(θ | Y) = f(Y | θ) f(θ) / c,
where f(Y | θ) is the likelihood of Y under θ and c is a normalizing constant.

Estimate θ by the mean (or mode) of the posterior
◮ f(θ | Y) is usually very complex
◮ Use Markov chain Monte Carlo (MCMC) to estimate the mean — see HW 2

Bayesian Parameter Estimation, Continued

Simple example (see refresher handout); a code sketch follows below.
◮ Goal: estimate the success probability θ for a Bernoulli distribution
◮ Assume a Beta(α, β) prior on θ
◮ Observe n Bernoulli trials with Y successes
◮ Y | θ has a Binomial(n, θ) likelihood
◮ The posterior, given Y = y, is Beta(α + y, β + n − y) (the Beta is the “conjugate prior” to the binomial)
◮ θ̂ = mean of posterior = (α + y) / (α + β + n)

Note: Maximum likelihood is a special case of Bayesian estimation (with a uniform prior and MAP estimation).
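Here is a small sketch of the conjugate Beta-Binomial update from the simple example above; SciPy is assumed, and the prior hyperparameters and the data are hypothetical values chosen only for illustration. (The gamma moment-matching formulas from the method-of-moments example are the same ones used in the earlier feature-matching sketch.)

```python
# Conjugate Beta-Binomial update: prior Beta(alpha, beta) + Binomial data -> Beta posterior.
from scipy import stats

alpha0, beta0 = 2.0, 2.0      # Beta prior on theta (hypothetical choice)
n, y = 20, 7                  # hypothetical data: 7 successes in 20 Bernoulli trials

alpha_post, beta_post = alpha0 + y, beta0 + n - y    # Beta(alpha + y, beta + n - y)
posterior = stats.beta(alpha_post, beta_post)

theta_bayes = posterior.mean()          # (alpha + y) / (alpha + beta + n)
theta_mle = y / n                       # maximum likelihood, for comparison
lo, hi = posterior.interval(0.95)       # central 95% credible interval

print(f"posterior mean        = {theta_bayes:.3f}")
print(f"MLE                   = {theta_mle:.3f}")
print(f"95% credible interval = ({lo:.3f}, {hi:.3f})")
```

Because the prior is conjugate, no MCMC is needed here; MCMC becomes necessary when the posterior f(θ | Y) has no closed form, as noted above.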
Other Approaches to Input Distributions