Draft 1 Density estimation by Monte Carlo and randomized quasi-Monte Carlo (RQMC) Pierre L’Ecuyer Joint work with Amal Ben Abdellah, Art B. Owen, and Florian Puchhammer NUS, Singapore, October 2019
Draft 2 What this talk is about Monte Carlo (MC) simulation is widely used to estimate the expectation E [ X ] of a random variable X and compute a confidence interval on E [ X ]. MSE = Var [ ¯ X n ] = O ( n − 1 ).
Draft 2 What this talk is about Monte Carlo (MC) simulation is widely used to estimate the expectation E [ X ] of a random variable X and compute a confidence interval on E [ X ]. MSE = Var [ ¯ X n ] = O ( n − 1 ). But simulation usually provides information to do much more! The output data can be used to estimate the entire distribution of X , e.g., the cumulative distribution function (cdf) F of X , defined by F ( x ) = P [ X ≤ x ], or its density f defined by f ( x ) = F ′ ( x ).
Draft 2 What this talk is about Monte Carlo (MC) simulation is widely used to estimate the expectation E [ X ] of a random variable X and compute a confidence interval on E [ X ]. MSE = Var [ ¯ X n ] = O ( n − 1 ). But simulation usually provides information to do much more! The output data can be used to estimate the entire distribution of X , e.g., the cumulative distribution function (cdf) F of X , defined by F ( x ) = P [ X ≤ x ], or its density f defined by f ( x ) = F ′ ( x ). If X 1 , . . . , X n are n indep. realizations of X , the empirical cdf n F n ( x ) = 1 ˆ � I [ X i ≤ x ] n i =1 is unbiased for F ( x ) at all x , and Var [ ˆ F n ( x )] = O ( n − 1 ).
Draft 2 What this talk is about Monte Carlo (MC) simulation is widely used to estimate the expectation E [ X ] of a random variable X and compute a confidence interval on E [ X ]. MSE = Var [ ¯ X n ] = O ( n − 1 ). But simulation usually provides information to do much more! The output data can be used to estimate the entire distribution of X , e.g., the cumulative distribution function (cdf) F of X , defined by F ( x ) = P [ X ≤ x ], or its density f defined by f ( x ) = F ′ ( x ). If X 1 , . . . , X n are n indep. realizations of X , the empirical cdf n F n ( x ) = 1 ˆ � I [ X i ≤ x ] n i =1 is unbiased for F ( x ) at all x , and Var [ ˆ F n ( x )] = O ( n − 1 ). For a continuous r.v. X , the density f provides a better visual idea of the distribution. Here we focus on estimating f over [ a , b ] ⊂ R .
Draft 3 Setting Classical density estimation was developed in the context where independent observations X 1 , . . . , X n of X are given and one wishes to estimate the density f of X from that. Best rate: Var [ˆ f n ( x )] = O ( n − 4 / 5 ). Here we assume that X 1 , . . . , X n are generated by simulation from a stochastic model. We can choose n and we have some freedom on how the simulation is performed. The X i ’s are realizations of a random variable X = g ( U ) ∈ R with density f , where U = ( U 1 , . . . , U s ) ∼ U (0 , 1) s and g ( u ) can be computed easily for any u ∈ (0 , 1) s .
Draft 3 Setting Classical density estimation was developed in the context where independent observations X 1 , . . . , X n of X are given and one wishes to estimate the density f of X from that. Best rate: Var [ˆ f n ( x )] = O ( n − 4 / 5 ). Here we assume that X 1 , . . . , X n are generated by simulation from a stochastic model. We can choose n and we have some freedom on how the simulation is performed. The X i ’s are realizations of a random variable X = g ( U ) ∈ R with density f , where U = ( U 1 , . . . , U s ) ∼ U (0 , 1) s and g ( u ) can be computed easily for any u ∈ (0 , 1) s . 1. Can we obtain a better estimate of f with RQMC instead of MC? How much better? What about taking a stratified sample over [0 , 1) s ? 2. Is it possible to obtain unbiased density estimators that converge as O ( n − 1 ) or faster, using clever sampling strategies?
Draft 4 Small example: A stochastic activity network Gives precedence relations between activities. Activity k has random duration Y k (also length of arc k ) with known cdf F k ( y ) := P [ Y k ≤ y ] . Project duration X = (random) length of longest path from source to sink. Can look at deterministic equivalent of X , E [ X ], cdf, density, ... sink 5 8 Y 11 Y 6 Y 10 Y 13 Want to estimate the density of X , 2 4 7 Y 5 Y 9 f ( x ) = F ′ ( x ) = d d x P [ X ≤ x ]. Y 2 Y 3 Y 8 Y 12 0 1 3 6 Y 1 Y 5 Y 7 source
Draft 4 Small example: A stochastic activity network Gives precedence relations between activities. Activity k has random duration Y k (also length of arc k ) with known cdf F k ( y ) := P [ Y k ≤ y ] . Project duration X = (random) length of longest path from source to sink. Can look at deterministic equivalent of X , E [ X ], cdf, density, ... The sample cdf sink F n ( x ) = 1 ˆ � n i =1 I [ X i ≤ x ] 5 8 n Y 11 is an unbiased estimator of the cdf Y 6 F ( x ) = P [ X ≤ x ] . Y 10 Y 13 Want to estimate the density of X , 2 4 7 Y 5 Y 9 f ( x ) = F ′ ( x ) = d d x P [ X ≤ x ]. Y 2 Y 3 Y 8 Y 12 0 1 3 6 Y 1 Y 5 Y 7 source
Draft 4 Small example: A stochastic activity network Gives precedence relations between activities. Activity k has random duration Y k (also length of arc k ) with known cdf F k ( y ) := P [ Y k ≤ y ] . Project duration X = (random) length of longest path from source to sink. Can look at deterministic equivalent of X , E [ X ], cdf, density, ... The sample cdf sink F n ( x ) = 1 ˆ � n i =1 I [ X i ≤ x ] 5 8 n Y 11 is an unbiased estimator of the cdf Y 6 F ( x ) = P [ X ≤ x ] . Y 10 Y 13 Want to estimate the density of X , 2 4 7 Y 5 Y 9 f ( x ) = F ′ ( x ) = d d x P [ X ≤ x ]. Y 2 Y 3 Y 8 Y 12 The sample derivative ˆ F ′ n ( x ) is useless 0 1 3 6 fo estimate f ( x ), because it is 0 almost Y 1 Y 5 Y 7 everywhere. source
Draft 5 Numerical illustration from Elmaghraby (1977): Y k ∼ N ( µ k , σ 2 k ) for k = 1 , 2 , 4 , 11 , 12, and Y k ∼ Expon(1 /µ k ) otherwise. µ 1 , . . . , µ 13 : 13.0, 5.5, 7.0, 5.2, 16.5, 14.7, 10.3, 6.0, 4.0, 20.0, 3.2, 3.2, 16.5. Results of an experiment with n = 100 000. Note: X is not normal! Frequency X det = 48 . 2 10000 5000 0 X 0 25 50 75 100 125 150 175 200
Draft 5 Numerical illustration from Elmaghraby (1977): Y k ∼ N ( µ k , σ 2 k ) for k = 1 , 2 , 4 , 11 , 12, and Y k ∼ Expon(1 /µ k ) otherwise. µ 1 , . . . , µ 13 : 13.0, 5.5, 7.0, 5.2, 16.5, 14.7, 10.3, 6.0, 4.0, 20.0, 3.2, 3.2, 16.5. Results of an experiment with n = 100 000. Note: X is not normal! Frequency mean = 64 . 2 X det = 48 . 2 10000 5000 ˆ ξ 0 . 99 = 131 . 8 0 X 0 25 50 75 100 125 150 175 200
Draft 6 Density Estimation Suppose we estimate the density f over a finite interval [ a , b ]. Let ˆ f n ( x ) denote the density estimator at x , with sample size n . We use the following measures of error: � b E [(ˆ f n ( x ) − f ( x )) 2 ] d x = mean integrated squared error = MISE a = IV + ISB � b Var [ˆ IV = integrated variance = f n ( x )] d x a � b ( E [ˆ f n ( x )] − f ( x )) 2 d x ISB = integrated squared bias = a To minimize the MISE, we need to balance the IV and ISB.
Draft 7 Density Estimation Simple histogram: Partition [ a , b ] in m intervals of size h = ( b − a ) / m and define f n ( x ) = n j ˆ nh for x ∈ I j = [ a + ( j − 1) h , a + jh ) , j = 1 , ..., m where n j is the number of observations X i that fall in interval j . Kernel Density Estimator (KDE) : Select kernel k (unimodal symmetric density centered at 0) and bandwidth h > 0 (horizontal stretching factor for the kernel). The KDE is n n f n ( x ) = 1 � x − X i � = 1 � x − g ( U i ) � ˆ � � k k . nh h nh h i =1 i =1
Draft 8 Example of a KDE in s = 1 dimension KDE (blue) vs true density (red) with n = 2 19 : Here we take U 1 , . . . , U n in (0 , 1) and put X i = F − 1 ( U i ). midpoint rule (left) stratified sample of U = F ( X ) (right). (10 − 5 ) (10 − 5 ) 80 80 70 70 60 60 50 50 40 40 30 30 20 20 10 10 0 0 -5.1 -4.6 -4.1 -3.6 -5.1 -4.6 -4.1 -3.6
Draft 9 Asymptotic convergence with Monte Carlo for smooth f Here we assume independent random samples (Monte Carlo or given data). For histograms and KDEs, when n → ∞ and h → 0: C nh + Bh α . AMISE = AIV + AISB ∼ For any g : R → R , define � b � ∞ ( g ( x )) 2 d x , −∞ x r g ( x ) d x , R ( g ) = µ r ( g ) = for r = 0 , 1 , 2 , . . . a α C B R ( f ′ ) / 12 Histogram 1 2 ( µ 2 ( k )) 2 R ( f ′′ ) / 4 µ 0 ( k 2 ) KDE 4
Draft 10 The asymptotically optimal h is � C � 1 / ( α +1) h ∗ = B α n and it gives AMISE = Kn − α/ (1+ α ) . h ∗ C B α AMISE R ( f ′ ) ( nR ( f ′ ) / 6) − 1 / 3 O ( n − 2 / 3 ) Histogram 1 2 12 ( µ 2 ( k )) 2 R ( f ′′ ) � 1 / 5 µ 0 ( k 2 ) � µ 0 ( k 2 ) O ( n − 4 / 5 ) KDE 4 ( µ 2 ( k )) 2 R ( f ′′ ) n 4 To estimate h ∗ , one can estimate R ( f ′ ) and R ( f ′′ ) via KDE (plugin). This is true under the simplifying assumption that h must be the same all over [ a , b ].
Draft 11 Can we take the stochastic derivative of an estimator of F ? Can we estimate the density f ( x ) = F ′ ( x ) by the derivative of an estimator of F ( x ). A simple candidate cdf estimator is the empirical cdf n F n ( x ) = 1 ˆ � I [ X i ≤ x ] . n i =1 However d ˆ F n ( x ) / d x = 0 almost everywhere, so this cannot be a useful density estimator! We need a smoother estimator of F .
Recommend
More recommend