Density estimation by Randomized Quasi-Monte Carlo

Pierre L’Ecuyer
Joint work with Amal Ben Abdellah, Art B. Owen, and Florian Puchhammer

SAMSI, Raleigh-Durham, North Carolina, May 2018
What this talk is about

Quasi-Monte Carlo (QMC) and randomized QMC (RQMC) methods have been studied extensively for estimating an integral, E[X]. Can they be useful for estimating the entire distribution of X, e.g., a density, a cdf, or some quantiles?

When hours or days of computing time are required to perform the simulation runs, reporting only a confidence interval on the mean is a waste of information! People routinely look at empirical distributions, via histograms for example. A more refined method is the kernel density estimator (KDE). Can RQMC improve such density estimators, and by how much?
Setting

Classical density estimation was developed in the context where independent observations X_1, ..., X_n are given and one wishes to estimate the density f from which they come.

Here we assume that X_1, ..., X_n are generated by simulation from a stochastic model. We can choose n and we have some freedom in how the simulation is performed. The X_i's are realizations of a random variable X = g(U) ∈ R with density f, where U = (U_1, ..., U_s) ~ U(0,1)^s and g(u) can be computed easily for any u ∈ (0,1)^s.

Can we obtain a better estimate of f with RQMC instead of MC? How much better?
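For instance (an illustrative example, not from the talk), X could be the total of s independent exponential service times, each generated from one uniform by inversion:

```python
import numpy as np

rates = np.array([1.0, 2.0, 4.0])      # assumed service rates; s = 3 in this toy example
def g(U):
    # U has shape (n, s); inversion of the exponential cdf: X_j = -log(1 - U_j) / rate_j
    return (-np.log(1.0 - U) / rates).sum(axis=1)

n = 10_000
X = g(np.random.rand(n, len(rates)))   # n realizations of X under plain Monte Carlo
```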
Density Estimation

Suppose we estimate the density f over a finite interval (a, b). Let \(\hat f_n(x)\) denote the density estimator at x, with sample size n. We use the following measures of error:
\[
\mathrm{MISE} = \text{mean integrated squared error} = \int_a^b E\big[(\hat f_n(x) - f(x))^2\big]\, dx = \mathrm{IV} + \mathrm{ISB},
\]
\[
\mathrm{IV} = \text{integrated variance} = \int_a^b \mathrm{Var}\big[\hat f_n(x)\big]\, dx,
\]
\[
\mathrm{ISB} = \text{integrated squared bias} = \int_a^b \big(E[\hat f_n(x)] - f(x)\big)^2\, dx.
\]
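Not on the slide: in a simulation study these quantities are typically estimated empirically from replicated runs. A minimal sketch (the function and variable names are illustrative, not from the talk):

```python
import numpy as np

def error_measures(fhat, f_true, x_grid):
    """Empirical IV, ISB and MISE from R replications of a density estimator.

    fhat   : (R, m) array, estimates from R independent replications at m grid points
    f_true : (m,) array, true density at the grid points
    x_grid : (m,) array, equally spaced evaluation points spanning (a, b)
    """
    dx = x_grid[1] - x_grid[0]
    mean_est = fhat.mean(axis=0)                  # estimate of E[f_n(x)] at each grid point
    var_est = fhat.var(axis=0, ddof=1)            # estimate of Var[f_n(x)] at each grid point
    iv = var_est.sum() * dx                       # integrated variance (Riemann sum)
    isb = ((mean_est - f_true) ** 2).sum() * dx   # integrated squared bias
    return iv, isb, iv + isb
```

When the true f is unknown (the usual case in practice), only the IV part can be estimated this way.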
Simple histogram: Partition [a, b] into m intervals of size h = (b − a)/m and define
\[
\hat f_n(x) = \frac{n_j}{nh} \quad \text{for } x \in I_j = [a + (j-1)h,\; a + jh), \quad j = 1, \dots, m,
\]
where n_j is the number of observations X_i that fall in interval j.

Kernel density estimator (KDE): Select a kernel k (a unimodal symmetric density centered at 0) and a bandwidth h > 0 (which serves as a horizontal stretching factor for the kernel). The KDE is defined by
\[
\hat f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} k\!\left(\frac{x - X_i}{h}\right).
\]
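For concreteness, here is a minimal sketch of both estimators with a Gaussian kernel (a common choice; the function names are illustrative):

```python
import numpy as np

def histogram_density(x, X, a, b, m):
    """Histogram density estimate at evaluation points x, from sample X, with m bins on [a, b]."""
    h = (b - a) / m
    counts, _ = np.histogram(X, bins=m, range=(a, b))
    j = np.clip(((x - a) / h).astype(int), 0, m - 1)   # index of the interval containing each x
    return counts[j] / (len(X) * h)

def kde_gaussian(x, X, h):
    """Kernel density estimate at evaluation points x, Gaussian kernel, bandwidth h."""
    z = (x[:, None] - X[None, :]) / h                  # (x - X_i) / h for all pairs
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(X) * h * np.sqrt(2.0 * np.pi))
```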
Asymptotic convergence with Monte Carlo for smooth f

For g: R → R, define
\[
R(g) = \int_a^b (g(x))^2\, dx, \qquad \mu_r(g) = \int_{-\infty}^{\infty} x^r g(x)\, dx \quad \text{for } r = 0, 1, 2, \dots
\]
For histograms and KDEs, when n → ∞ and h → 0,
\[
\mathrm{AMISE} = \mathrm{AIV} + \mathrm{AISB} \sim \frac{C}{nh} + B h^{\alpha},
\]
with

              α     C           B
  Histogram   2     1           R(f')/12
  KDE         4     µ_0(k²)     (µ_2(k))² R(f'')/4
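The constants in this table come from standard expansions that the slide leaves implicit. For the KDE, assuming f is twice continuously differentiable and k is a symmetric density (a sketch, not spelled out in the talk):
\[
E[\hat f_n(x)] - f(x) = \frac{h^2}{2}\,\mu_2(k)\, f''(x) + o(h^2), \qquad
\mathrm{Var}[\hat f_n(x)] = \frac{f(x)\,\mu_0(k^2)}{nh} + o\!\left(\frac{1}{nh}\right),
\]
so integrating over (a, b) gives AISB ≈ h⁴ (µ_2(k))² R(f'')/4 and AIV ≈ µ_0(k²)/(nh) when essentially all the mass of f lies in (a, b), matching α = 4, C = µ_0(k²), and B = (µ_2(k))² R(f'')/4 above.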
The asymptotically optimal h is
\[
h^* = \left(\frac{C}{B \alpha n}\right)^{1/(\alpha+1)}
\]
and it gives AMISE = K n^{−α/(1+α)}:

              α     C          B                       h^*                                     AMISE
  Histogram   2     1          R(f')/12                (n R(f')/6)^{−1/3}                      O(n^{−2/3})
  KDE         4     µ_0(k²)    (µ_2(k))² R(f'')/4      (µ_0(k²)/((µ_2(k))² R(f'') n))^{1/5}    O(n^{−4/5})

To estimate h^*, one can estimate R(f') and R(f'') via a KDE (plug-in approach).
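As a concrete check (not on the slide): for the Gaussian kernel and a standard normal target density, the constants are available in closed form, µ_0(k²) = 1/(2√π), µ_2(k) = 1, and R(f'') = 3/(8√π), so
\[
h^* = \left(\frac{1/(2\sqrt{\pi})}{(3/(8\sqrt{\pi}))\, n}\right)^{1/5} = \left(\frac{4}{3n}\right)^{1/5} \approx 1.06\, n^{-1/5},
\]
which is Silverman's rule of thumb for a unit-variance target.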
Asymptotic convergence with RQMC for smooth f

Idea: Replace U_1, ..., U_n by RQMC points. RQMC does not change the bias. For a KDE with a smooth kernel k, one could hope (perhaps) to get AIV = C' n^{−β} h^{−1} for some β > 1, instead of C n^{−1} h^{−1}. If the IV is reduced, the optimal h can be taken smaller to reduce the ISB as well (re-balance), and then the MISE is reduced.

Unfortunately, things are not so simple. Roughly, decreasing h increases the variation of the function that is averaged in the estimator. So we may have something like AIV = C' n^{−β} h^{−δ}, or IV ≈ C' n^{−β} h^{−δ} in some bounded region.
Elementary QMC Bounds (Recall)

Integration error for g: [0,1)^s → R with point set P_n = {u_0, ..., u_{n−1}} ⊂ [0,1)^s:
\[
E_n = \frac{1}{n} \sum_{i=0}^{n-1} g(\mathbf{u}_i) - \int_{[0,1)^s} g(\mathbf{u})\, d\mathbf{u}.
\]
Koksma-Hlawka inequality: |E_n| ≤ V_HK(g) D*(P_n), where
\[
V_{\mathrm{HK}}(g) = \sum_{\emptyset \neq v \subseteq S} \int_{[0,1)^s} \left| \frac{\partial^{|v|} g}{\partial \mathbf{u}_v} \right| d\mathbf{u} \quad \text{(Hardy-Krause (HK) variation)},
\]
\[
D^*(P_n) = \sup_{\mathbf{u} \in [0,1)^s} \left| \mathrm{vol}\,[\mathbf{0}, \mathbf{u}) - \frac{|P_n \cap [\mathbf{0}, \mathbf{u})|}{n} \right| \quad \text{(star discrepancy)}.
\]
There are explicit point sets for which D*(P_n) = O((log n)^{s−1}/n) = O(n^{−1+ε}), and explicit RQMC constructions for which E[E_n] = 0 and Var[E_n] = O(n^{−2+ε}).
Also |E_n| ≤ V_2(g) D_2(P_n), where
\[
V_2^2(g) = \sum_{\emptyset \neq v \subseteq S} \int_{[0,1)^s} \left| \frac{\partial^{|v|} g}{\partial \mathbf{u}_v} \right|^2 d\mathbf{u} \quad \text{(square } L_2 \text{ variation)},
\]
\[
D_2^2(P_n) = \int_{[0,1)^s} \left| \mathrm{vol}\,[\mathbf{0}, \mathbf{u}) - \frac{|P_n \cap [\mathbf{0}, \mathbf{u})|}{n} \right|^2 d\mathbf{u} \quad \text{(square } L_2 \text{ star discrepancy)}.
\]
There are explicit constructions for which D_2(P_n) = O(n^{−1+ε}). Moreover, if P_n is a digital net randomized by a nested uniform scramble (NUS) and V_2(g) < ∞, then E[E_n] = 0 and Var[E_n] = O(V_2^2(g) n^{−3+ε}) for all ε > 0.
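In practice, randomized digital nets of this kind are readily available in standard libraries. A minimal sketch using SciPy's scrambled Sobol' points (SciPy applies a linear matrix scramble with a digital shift rather than full NUS, but the result is still a valid RQMC point set, and independent randomizations yield an unbiased estimator whose variance can be estimated):

```python
import numpy as np
from scipy.stats import qmc

s, n, R = 2, 2**10, 20                 # dimension, points per set, number of randomizations

def g(u):                              # toy integrand with known integral 1, just for illustration
    return np.prod(1.0 + 0.5 * (u - 0.5), axis=1)

estimates = []
for r in range(R):                     # R independent randomized Sobol' point sets
    U = qmc.Sobol(d=s, scramble=True, seed=1000 + r).random(n)
    estimates.append(g(U).mean())      # one RQMC estimate of the integral per randomization

print("mean:", np.mean(estimates), " variance of the RQMC estimator:", np.var(estimates, ddof=1))

# SciPy can also compute the L2-star discrepancy of a point set, for comparison with i.i.d. points:
print(qmc.discrepancy(qmc.Sobol(d=s, scramble=True, seed=1).random(n), method='L2-star'))
print(qmc.discrepancy(np.random.rand(n, s), method='L2-star'))
```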
Bounding the AIV under RQMC for a KDE

KDE density estimator at a single point x:
\[
\hat f_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h}\, k\!\left(\frac{x - g(\mathbf{U}_i)}{h}\right) = \frac{1}{n} \sum_{i=1}^{n} \tilde g(\mathbf{U}_i).
\]
With RQMC points U_i, this is an RQMC estimator of \(E[\tilde g(\mathbf{U})] = \int_{[0,1)^s} \tilde g(\mathbf{u})\, d\mathbf{u} = E[\hat f_n(x)]\). RQMC does not change the bias, but it may reduce \(\mathrm{Var}[\hat f_n(x)]\), and hence the IV. To get RQMC variance bounds, we need bounds on the variation of \(\tilde g\).

The partial derivatives are
\[
\frac{\partial^{|v|}}{\partial \mathbf{u}_v}\, \tilde g(\mathbf{u}) = \frac{\partial^{|v|}}{\partial \mathbf{u}_v}\, \frac{1}{h}\, k\!\left(\frac{x - g(\mathbf{u})}{h}\right).
\]
We assume that they exist and are uniformly bounded; this holds, e.g., for a Gaussian kernel k. Expanding via the chain rule gives terms in h^{−j} for j = 2, ..., |v| + 1. One of the terms for v = S grows as \(h^{-s-1} k^{(s)}((g(\mathbf{u}) - x)/h) \prod_{j=1}^{s} g_j(\mathbf{u}) = O(h^{-s-1})\) when h → 0, so the resulting AIV bound grows as h^{−2s−2}. Not so good!
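The talk goes on to show when RQMC nevertheless reduces the IV. As a purely empirical illustration (not from the talk; the model, bandwidth, and sample sizes below are arbitrary choices), one can compare the IV of a Gaussian KDE under i.i.d. Monte Carlo points and under scrambled Sobol' points:

```python
import numpy as np
from scipy.stats import qmc

def g(U):                                   # toy model: X = (U1 + U2)/2, with a density on (0, 1)
    return U.mean(axis=1)

def kde(x_grid, X, h):                      # Gaussian KDE evaluated on a grid
    z = (x_grid[:, None] - X[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(X) * h * np.sqrt(2.0 * np.pi))

s, n, R, h = 2, 2**12, 50, 0.05             # dimension, sample size, replications, bandwidth
x_grid = np.linspace(0.05, 0.95, 128)       # estimate the density over (a, b) = (0.05, 0.95)
dx = x_grid[1] - x_grid[0]

est_mc = np.array([kde(x_grid, g(np.random.rand(n, s)), h) for _ in range(R)])
est_rqmc = np.array([kde(x_grid, g(qmc.Sobol(d=s, scramble=True, seed=r).random(n)), h)
                     for r in range(R)])

iv_mc = est_mc.var(axis=0, ddof=1).sum() * dx      # empirical integrated variance, MC
iv_rqmc = est_rqmc.var(axis=0, ddof=1).sum() * dx  # empirical integrated variance, RQMC
print(f"IV with MC:   {iv_mc:.3e}")
print(f"IV with RQMC: {iv_rqmc:.3e}")
```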
Improvement by a Change of Variable, in One Dimension

Suppose g: [0,1] → R is monotone. Make the change of variable w = (x − g(u))/h. In one dimension (s = 1), we have dw/du = −g'(u)/h, so
\[
V_{\mathrm{HK}}(\tilde g) = \frac{1}{h} \int_0^1 \left| k'\!\left(\frac{x - g(u)}{h}\right) \frac{-g'(u)}{h} \right| du
= \frac{1}{h} \int_{-\infty}^{\infty} |k'(w)|\, dw = O(h^{-1}).
\]
Then, if k and g are continuously differentiable, with RQMC points having D*(P_n) = O(n^{−1+ε}), we obtain AIV = O(n^{−2+ε} h^{−2}). With h = Θ(n^{−1/3}), this gives AMISE = O(n^{−4/3}).

A similar argument gives
\[
V_2^2(\tilde g) = \frac{1}{h^2} \int_0^1 \left( k'\!\left(\frac{x - g(u)}{h}\right) \frac{-g'(u)}{h} \right)^2 du
\le \frac{L_g}{h^3} \int_{-\infty}^{\infty} (k'(w))^2\, dw = O(h^{-3})
\]
if |g'| ≤ L_g, and then with NUS: AIV = O(n^{−3+ε} h^{−3}). With h = Θ(n^{−3/7}), this gives AMISE = O(n^{−12/7}).
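Not on the slide, but the rates quoted here follow from balancing the AIV above against the (unchanged) AISB = O(h⁴) of the KDE:
\[
\min_h \big\{ n^{-2} h^{-2} + h^4 \big\} \ \Rightarrow\ h \propto n^{-1/3},\ \mathrm{AMISE} = O(n^{-4/3}); \qquad
\min_h \big\{ n^{-3} h^{-3} + h^4 \big\} \ \Rightarrow\ h \propto n^{-3/7},\ \mathrm{AMISE} = O(n^{-12/7}).
\]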
Higher Dimensions

Let s = 2 and v = {1, 2}. With the change of variable (u_1, u_2) → (w, u_2), the Jacobian is |dw/du_1| = |g_1(u_1, u_2)/h|, where g_j = ∂g/∂u_j. If |g_2| and |g_{12}/g_1| are bounded by a constant L, then
\[
\int_{[0,1)^2} \left| \frac{\partial^{|v|} \tilde g}{\partial \mathbf{u}_v} \right| d\mathbf{u}
= \int_{[0,1)^2} \left| \frac{\partial^2}{\partial u_1 \partial u_2}\, \frac{1}{h}\, k\!\left(\frac{x - g(\mathbf{u})}{h}\right) \right| du_1\, du_2
\]
\[
= \int_{[0,1)^2} \left| k''\!\left(\frac{x - g(\mathbf{u})}{h}\right) \frac{g_1(\mathbf{u})}{h} \frac{g_2(\mathbf{u})}{h} \frac{1}{h}
+ k'\!\left(\frac{x - g(\mathbf{u})}{h}\right) \frac{g_{12}(\mathbf{u})}{h} \frac{1}{h} \right| du_1\, du_2
\]
\[
= \int_0^1 \int_{-\infty}^{\infty} \frac{1}{h} \left| k''(w)\, \frac{g_2(\mathbf{u})}{h} + k'(w)\, \frac{g_{12}(\mathbf{u})}{g_1(\mathbf{u})} \right| dw\, du_2
\le \frac{L}{h} \left[ \mu_0(|k''|)/h + \mu_0(|k'|) \right] = O(h^{-2}).
\]
This provides a bound of O(h^{−2}) for \(V_{\mathrm{HK}}(\tilde g)\), and then AIV = O(n^{−2+ε} h^{−4}). Generalizing to s ≥ 2 gives \(V_{\mathrm{HK}}(\tilde g) = O(h^{-s})\), AIV = O(n^{−2+ε} h^{−2s}), and, with h = Θ(n^{−1/(s+2)}), MISE = O(n^{−4/(2+s)}). This beats MC for s < 3 and gives the same rate for s = 3. Not very satisfactory.