Density estimation by Monte Carlo and randomized quasi-Monte Carlo
Pierre L'Ecuyer
Joint work with Amal Ben Abdellah, Art B. Owen, and Florian Puchhammer
Seminar at GERAD, Montréal, March 27, 2019
What this talk is about

Quasi-Monte Carlo (QMC) and randomized QMC (RQMC) methods have been studied extensively for estimating an integral, say E[X], and for approximating a function from its evaluation at a finite number of points. How can we use them to estimate the entire distribution of X? Here we will focus on estimating the density of X over [a, b] \subset R.

People often look at empirical distributions via histograms, for example. More refined methods: kernel density estimators (KDEs). Can RQMC improve such density estimators, and by how much? Are there other types of density estimators than KDEs that work better with RQMC?

We will discuss an alternative that takes the sample derivative of a smoothed estimator of the cumulative distribution function (cdf). The smoothing can be achieved via conditional Monte Carlo, for example (a schematic sketch follows).
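As a schematic illustration of the conditional Monte Carlo (CMC) idea (not detailed on this slide; the notation \mathcal{G}_i for the conditioning information of replicate i is an assumption made here for exposition): if we condition on partial information so that the conditional cdf F(x | \mathcal{G}_i) = P[X \le x | \mathcal{G}_i] is a smooth computable function of x, then the sample derivative of the averaged conditional cdf is an unbiased density estimator with no smoothing bias:

  \hat F_{\mathrm{cmc},n}(x) = \frac{1}{n} \sum_{i=1}^{n} F(x \mid \mathcal{G}_i),
  \qquad
  \hat f_{\mathrm{cmc},n}(x) = \frac{d}{dx} \hat F_{\mathrm{cmc},n}(x) = \frac{1}{n} \sum_{i=1}^{n} f(x \mid \mathcal{G}_i).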
Small example: A stochastic activity network

The network gives precedence relations between activities. Activity k has random duration Y_k (also the length of arc k) with known cdf F_k(y) := P[Y_k \le y]. The project duration X is the (random) length of the longest path from the source to the sink.
[Figure: directed acyclic network with nodes 0 to 8, node 0 = source and node 8 = sink, and arcs labeled Y_0, ..., Y_12.]

We want to estimate the density of X,
  f(x) = F'(x) = \frac{d}{dx} P[X \le x].

The sample cdf
  \hat F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I[X_i \le x]
is an unbiased estimator of the cdf F(x) = P[X \le x]. But its derivative \hat F_n'(x) is not a meaningful estimator of f(x), because it is 0 almost everywhere.
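A minimal sketch (my own illustration, not code from the talk) of how one realization X_i = g(U_i) could be computed for a small SAN: generate each duration Y_k by inversion from a uniform, then compute the longest path with one pass over the arcs in topological order. The tiny network and the means below are illustrative assumptions, not the exact network of the figure.

import numpy as np
from scipy.stats import expon

# Hypothetical small stochastic activity network (NOT the one in the figure):
# arcs[k] = (from_node, to_node); Y_k is the duration of activity k.
arcs = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
mu = [13.0, 5.5, 7.0, 5.2, 16.5]          # illustrative means

def g(u):
    """Longest (critical) path length from node 0 to the last node,
    for one point u in (0,1)^s, generating each Y_k by inversion."""
    y = [expon.ppf(u[k], scale=mu[k]) for k in range(len(arcs))]
    longest = np.full(4, -np.inf)          # longest path length to each node
    longest[0] = 0.0
    for k, (a, b) in enumerate(arcs):      # arcs are listed in topological order
        longest[b] = max(longest[b], longest[a] + y[k])
    return longest[-1]

# One Monte Carlo realization:
rng = np.random.default_rng(123)
x = g(rng.random(len(arcs)))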
Numerical illustration from Elmaghraby (1977): Y_k ~ N(\mu_k, \sigma_k^2) for k = 0, 1, 3, 10, 11, and Y_k ~ Expon(1/\mu_k) otherwise.
\mu_0, ..., \mu_{12}: 13.0, 5.5, 7.0, 5.2, 16.5, 14.7, 10.3, 6.0, 4.0, 20.0, 3.2, 3.2, 16.5.
Results of an experiment with n = 100 000.
[Figure: histogram of the n realizations of X (frequency vs. T), with sample mean = 64.2, X_det = 48.2, and estimated 0.99 quantile \hat\xi_{0.99} = 131.8.]
Setting

Classical density estimation was developed in the context where independent observations X_1, ..., X_n of X are given and one wishes to estimate the density f of X from them. Here we assume that X_1, ..., X_n are generated by simulation from a stochastic model. We can choose n and we have some freedom in how the simulation is performed. The X_i's are realizations of a random variable X = g(U) \in R with density f, where U = (U_1, ..., U_s) ~ U(0,1)^s and g(u) can be computed easily for any u \in (0,1)^s.

Can we obtain a better estimate of f with RQMC instead of MC? How much better? What about taking a stratified sample over [0,1)^s? (A sketch of such a stratified sample follows.)
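A minimal sketch (my own illustration, not from the slides) of one simple way to stratify [0,1)^s: partition the cube into m^s congruent subcubes and draw one uniform point in each, giving n = m^s points to feed into g.

import numpy as np

def stratified_sample(m, s, rng=None):
    """One point drawn uniformly in each of the m^s congruent subcubes of
    [0,1)^s, so n = m^s points in total."""
    rng = np.random.default_rng(rng)
    # Lower-left corners of all subcubes, as an (m^s, s) array.
    grid = np.stack(np.meshgrid(*([np.arange(m)] * s), indexing="ij"), axis=-1)
    corners = grid.reshape(-1, s) / m
    return corners + rng.random(corners.shape) / m   # jitter within each subcube

# Example: n = 8^2 = 64 stratified points in [0,1)^2.
points = stratified_sample(8, 2, rng=42)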
Density Estimation

Suppose we estimate the density f over a finite interval [a, b]. Let \hat f_n(x) denote the density estimator at x, with sample size n. We use the following measures of error:

  MISE = mean integrated squared error = \int_a^b E[(\hat f_n(x) - f(x))^2] dx = IV + ISB,
  IV  = integrated variance = \int_a^b Var[\hat f_n(x)] dx,
  ISB = integrated squared bias = \int_a^b (E[\hat f_n(x)] - f(x))^2 dx.
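The decomposition MISE = IV + ISB follows from the usual pointwise bias-variance decomposition (a standard step, spelled out here for completeness):

  E[(\hat f_n(x) - f(x))^2] = Var[\hat f_n(x)] + (E[\hat f_n(x)] - f(x))^2,

integrated over [a, b].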
Density Estimation

Simple histogram: Partition [a, b] into m intervals of length h = (b - a)/m and define
  \hat f_n(x) = \frac{n_j}{nh}  for x \in I_j = [a + (j-1)h, a + jh),  j = 1, ..., m,
where n_j is the number of observations X_i that fall in interval j.

Kernel density estimator (KDE): Select a kernel k (a unimodal symmetric density centered at 0) and a bandwidth h > 0 (horizontal stretching factor for the kernel). The KDE is
  \hat f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} k\left(\frac{x - X_i}{h}\right).
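A minimal sketch (my own, not from the talk) of the KDE formula above with a Gaussian kernel; the placeholder data are only there to make the snippet self-contained.

import numpy as np

def kde(x, data, h):
    """Gaussian-kernel KDE: (1/(n h)) * sum_i k((x - X_i)/h),
    evaluated at the points in array x for observations `data`."""
    z = (np.asarray(x)[:, None] - np.asarray(data)[None, :]) / h
    k = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)   # standard normal kernel
    return k.mean(axis=1) / h

# Example use with a placeholder sample of observations X_i:
rng = np.random.default_rng(0)
data = rng.normal(size=1000)
xs = np.linspace(-3, 3, 200)
fhat = kde(xs, data, h=0.2)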
KDE bandwidth selection: an illustration in s = 1 dimension

KDE (blue) vs. true density (red) with RQMC point sets of size n = 2^{19}: lattice + shift (left), stratified sample of U = F(X) (right).
[Figure: two panels showing the KDE and the true density over the same horizontal range, vertical axis in units of 10^{-5}.]
Asymptotic convergence with Monte Carlo for smooth f

For any g: R -> R, define
  R(g) = \int_a^b (g(x))^2 dx,
  \mu_r(g) = \int_{-\infty}^{\infty} x^r g(x) dx,  for r = 0, 1, 2, ...

For histograms and KDEs, when n -> \infty and h -> 0:
  AMISE = AIV + AISB ~ \frac{C}{nh} + B h^{\alpha}.

              C             B                          \alpha
  Histogram   1             R(f')/12                   2
  KDE         \mu_0(k^2)    (\mu_2(k))^2 R(f'')/4      4
The asymptotically optimal h is
  h^* = \left(\frac{C}{B \alpha n}\right)^{1/(\alpha+1)}
and it gives AMISE = K n^{-\alpha/(1+\alpha)}.

              C             B                          \alpha   h^*                                                AMISE
  Histogram   1             R(f')/12                   2        (n R(f')/6)^{-1/3}                                 O(n^{-2/3})
  KDE         \mu_0(k^2)    (\mu_2(k))^2 R(f'')/4      4        (\mu_0(k^2) / ((\mu_2(k))^2 R(f'') n))^{1/5}       O(n^{-4/5})

To estimate h^*, one can estimate R(f') and R(f'') via a KDE (plug-in). This is under the simplifying assumption that h must be the same all over [a, b].
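A minimal sketch (my own, under the assumption of a Gaussian kernel, for which \mu_0(k^2) = 1/(2\sqrt{\pi}) and \mu_2(k) = 1) of computing h^* for the KDE from the table above, given a plug-in value for R(f''):

import numpy as np

def h_star_kde(n, Rf2, mu0_k2=1.0 / (2.0 * np.sqrt(np.pi)), mu2_k=1.0):
    """h* = (C / (B * alpha * n))^(1/(alpha+1)) with C = mu0(k^2),
    B = (mu2(k))^2 * Rf2 / 4, alpha = 4 (Gaussian-kernel defaults).
    Rf2 is a plug-in estimate of R(f'')."""
    C = mu0_k2
    B = (mu2_k ** 2) * Rf2 / 4.0
    alpha = 4
    return (C / (B * alpha * n)) ** (1.0 / (alpha + 1))

# Example: for a standard normal f, R(f'') = 3/(8*sqrt(pi)); this recovers
# the familiar h* ~ 1.06 n^{-1/5} (Silverman's rule of thumb).
h = h_star_kde(n=10_000, Rf2=3.0 / (8.0 * np.sqrt(np.pi)))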
Elementary quasi-Monte Carlo (QMC) bounds (recall)

Integration error for g: [0,1)^s -> R with point set P_n = {u_0, ..., u_{n-1}} \subset [0,1)^s:
  E_n = \frac{1}{n} \sum_{i=0}^{n-1} g(u_i) - \int_{[0,1)^s} g(u) du.

Koksma-Hlawka inequality: |E_n| \le V_{HK}(g) D^*(P_n), where
  V_{HK}(g) = \sum_{\emptyset \ne v \subseteq S} \int_{[0,1)^s} \left| \frac{\partial^{|v|} g}{\partial u_v}(u) \right| du   (Hardy-Krause (HK) variation),
  D^*(P_n) = \sup_{u \in [0,1)^s} \left| \mathrm{vol}[0, u) - \frac{|P_n \cap [0, u)|}{n} \right|   (star discrepancy).

There are explicit point sets for which D^*(P_n) = O((\log n)^{s-1}/n) = O(n^{-1+\epsilon}), for all \epsilon > 0.
There are explicit RQMC constructions for which E[E_n] = 0 and Var[E_n] = O(n^{-2+\epsilon}), for all \epsilon > 0.
With ordinary Monte Carlo (MC), one has Var[E_n] = O(n^{-1}).
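A minimal sketch (my own, not from the talk) of two standard RQMC point sets in Python: a randomly shifted rank-1 lattice (the generating vector below is an arbitrary illustrative choice, not a recommended one) and scrambled Sobol' points via scipy.stats.qmc.

import numpy as np
from scipy.stats import qmc

def shifted_lattice(n, z, rng=None):
    """Rank-1 lattice {i*z/n mod 1 : i = 0..n-1} with one common random shift
    modulo 1. z is the generating vector (illustrative choice only)."""
    rng = np.random.default_rng(rng)
    i = np.arange(n)[:, None]
    shift = rng.random(len(z))
    return (i * np.asarray(z)[None, :] / n + shift) % 1.0

n, s = 2**10, 5
lattice_pts = shifted_lattice(n, z=[1, 101, 449, 181, 307], rng=1)   # arbitrary z
sobol_pts = qmc.Sobol(d=s, scramble=True, seed=1).random(n)          # scrambled Sobol'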
Asymptotic convergence of KDE with RQMC

Idea: Replace U_1, ..., U_n by RQMC points. RQMC does not change the bias. For a KDE with smooth k, one could hope (perhaps) to get AIV = C' n^{-\beta} h^{-1} for \beta > 1, instead of C n^{-1} h^{-1}. If the IV is reduced, the optimal h can be taken smaller to reduce the ISB as well (re-balance), and then reduce the MISE.

Unfortunately, things are not so simple. Roughly, decreasing h increases the variation of the function inside the KDE estimator. So we rather have something like
  AIV = C' n^{-\beta} h^{-\delta},  or  IV \approx C' n^{-\beta} h^{-\delta} in some bounded region,
for \beta > 1 and \delta > 1.
Bounding the AIV under RQMC for a KDE

KDE density estimator at a single point x:
  \hat f_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} k\left(\frac{x - g(U_i)}{h}\right) = \frac{1}{n} \sum_{i=1}^{n} \tilde g(U_i).

With RQMC points U_i, this is an RQMC estimator of E[\tilde g(U)] = \int_{[0,1)^s} \tilde g(u) du = E[\hat f_n(x)].
RQMC does not change the bias, but may reduce Var[\hat f_n(x)], and then the IV.
To get RQMC variance bounds, we need bounds on the variation of \tilde g.
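Putting the pieces together, a minimal sketch (my own illustration) of the KDE at one point x viewed as the RQMC average of \tilde g(U_i), using scrambled Sobol' points and a Gaussian kernel; the function g below is an assumed placeholder model, not the SAN of the talk.

import numpy as np
from scipy.stats import qmc, norm

def g(u):
    """Placeholder model X = g(U): a sum of transformed uniforms (assumption)."""
    return norm.ppf(u).sum(axis=1)

def kde_at_point_rqmc(x, n=2**14, s=3, h=0.05, seed=0):
    """\hat f_n(x) = (1/n) sum_i (1/h) k((x - g(U_i))/h) with scrambled Sobol' U_i."""
    u = qmc.Sobol(d=s, scramble=True, seed=seed).random(n)
    z = (x - g(u)) / h
    return np.mean(np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)) / h

# Replicating over independent scramblings estimates Var[\hat f_n(x)]:
reps = [kde_at_point_rqmc(0.5, seed=r) for r in range(10)]
var_estimate = np.var(reps, ddof=1)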