Randomized quasi-Monte Carlo (RQMC)

To estimate $\mu = \int_{(0,1)^s} f(\mathbf{u})\, d\mathbf{u}$, the RQMC estimator is
$$\hat\mu_{n,\mathrm{rqmc}} = \frac{1}{n} \sum_{i=0}^{n-1} f(\mathbf{U}_i),$$
with $P_n = \{\mathbf{U}_0, \ldots, \mathbf{U}_{n-1}\} \subset (0,1)^s$ an RQMC point set:
(i) each point $\mathbf{U}_i$ has the uniform distribution over $(0,1)^s$;
(ii) $P_n$ as a whole is a low-discrepancy point set.

Then
$$\mathbb{E}[\hat\mu_{n,\mathrm{rqmc}}] = \mu \quad \text{(unbiased)}, \qquad \mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}] = \frac{\mathrm{Var}[f(\mathbf{U}_i)]}{n} + \frac{2}{n^2} \sum_{i<j} \mathrm{Cov}[f(\mathbf{U}_i), f(\mathbf{U}_j)].$$
We want to make the last sum as negative as possible. Weak attempts: antithetic variates ($n = 2$), Latin hypercube sampling, ...
Variance estimation: Compute $m$ independent realizations $X_1, \ldots, X_m$ of $\hat\mu_{n,\mathrm{rqmc}}$, then estimate $\mu$ and $\mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}]$ by their sample mean $\bar X_m$ and sample variance $S_m^2$. This can be used to compute a confidence interval.
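As a concrete illustration of this replication scheme, here is a minimal Python sketch using SciPy's scrambled Sobol' generator (the integrand $f$ and all parameter values are illustrative choices, not from the talk):

```python
import numpy as np
from scipy.stats import qmc

s, n, m = 4, 2**10, 32   # dimension, points per replicate, replicates

def f(u):
    # Illustrative integrand with known mean: E[prod_j 2*U_j] = 1.
    return np.prod(2.0 * u, axis=1)

# m independent randomizations, each giving one realization of mu_hat_rqmc.
X = np.array([f(qmc.Sobol(d=s, scramble=True, seed=r).random(n)).mean()
              for r in range(m)])
mu_hat, var_hat = X.mean(), X.var(ddof=1)       # sample mean and variance
ci = (mu_hat - 1.96 * np.sqrt(var_hat / m),     # normal-approximation CI for mu
      mu_hat + 1.96 * np.sqrt(var_hat / m))
print(mu_hat, var_hat, ci)
```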
Stratification of the unit hypercube

Partition axis $j$ into $k_j \ge 1$ equal parts, for $j = 1, \ldots, s$. Draw $n = k_1 \cdots k_s$ random points, one per box, independently. Example: $s = 2$, $k_1 = 12$, $k_2 = 8$, $n = 12 \times 8 = 96$. [Figure: the 96 stratified points over the unit square, one per box.]
Stratified estimator:
$$X_{s,n} = \frac{1}{n} \sum_{j=0}^{n-1} f(\mathbf{U}_j).$$
The crude MC variance with $n$ points can be decomposed as
$$\mathrm{Var}[\bar X_n] = \mathrm{Var}[X_{s,n}] + \frac{1}{n} \sum_{j=0}^{n-1} (\mu_j - \mu)^2,$$
where $\mu_j$ is the mean over box $j$. The more the $\mu_j$ differ, the more the variance is reduced.

If $f'$ is continuous and bounded, and all $k_j$ are equal, then $\mathrm{Var}[X_{s,n}] = O(n^{-1-2/s})$.

For large $s$, this is not practical. For small $s$, it is not really better than a midpoint rule with a grid when $f$ is smooth. But it can still be applied to a few important random variables. It gives an unbiased estimator, and the variance can be estimated by replicating $m \ge 2$ times, as in the sketch below.
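A minimal sketch of this stratified estimator for the $s = 2$, $k_1 = 12$, $k_2 = 8$ example, with an illustrative integrand (the helper names are ours):

```python
import numpy as np

def stratified_points(k, rng):
    """One uniform point per box of a k[0] x ... x k[s-1] grid on (0,1)^s."""
    grids = np.meshgrid(*[np.arange(kj) for kj in k], indexing="ij")
    corners = np.stack([g.ravel() for g in grids], axis=1)   # lower box corners
    return (corners + rng.random(corners.shape)) / np.asarray(k)

def f(u):
    return np.prod(2.0 * u, axis=1)   # illustrative integrand, true mean 1

# Replicate m >= 2 times to estimate the variance of the stratified estimator.
rng = np.random.default_rng(0)
X = np.array([f(stratified_points([12, 8], rng)).mean() for _ in range(20)])
print(X.mean(), X.var(ddof=1))
```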
Randomly-shifted lattice

Example: lattice with $s = 2$, $n = 101$, $\mathbf{v}_1 = (1, 12)/101$. [Figures: the lattice points over the unit square, then the same points after adding a random shift $\mathbf{U}$ modulo 1.]
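A minimal sketch of this randomly-shifted rank-1 lattice (the helper name is ours):

```python
import numpy as np

def shifted_lattice(n, a, rng):
    """Rank-1 lattice {i * v_1 mod 1 : i = 0, ..., n-1} with v_1 = a/n,
    randomized by a single random shift U modulo 1."""
    points = np.outer(np.arange(n), np.asarray(a) / n)
    return (points + rng.random(len(a))) % 1.0    # same shift for all points

pts = shifted_lattice(101, (1, 12), np.random.default_rng(7))  # the slide's example
```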
Variance bounds

We can obtain various Cauchy–Schwarz-type inequalities of the form
$$\mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}] \le V^2(f) \cdot D^2(P_n)$$
for all $f$ in some Hilbert space or Banach space $\mathcal{H}$, where $V(f) = \|f - \mu\|_{\mathcal{H}}$ is the variation of $f$, and $D(P_n)$ is the discrepancy of $P_n$ (defined by an expectation in the RQMC case).

Lattice rules: For certain Hilbert spaces of smooth periodic functions $f$ with square-integrable partial derivatives of order up to $\alpha$:
$$D(P_n) = O(n^{-\alpha+\epsilon}) \quad \text{for all } \epsilon > 0.$$
This gives $\mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}] = O(n^{-2\alpha+\epsilon})$ for all $\epsilon > 0$. Non-periodic functions can be made periodic via a baker's transformation (easy); a sketch follows.
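For concreteness, a one-line sketch of the baker's transformation, applied coordinate-wise to the points before evaluating $f$ (a standard formula; the function name is ours):

```python
import numpy as np

def baker(u):
    """Baker's transformation, coordinate-wise: u -> 2u for u < 1/2,
    and u -> 2(1 - u) for u >= 1/2. Evaluating f(baker(u)) amounts to
    integrating a periodized version of f."""
    return np.where(u < 0.5, 2.0 * u, 2.0 * (1.0 - u))
```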
Example of a digital net in base 2

The first $n = 64 = 2^6$ Sobol' points in $s = 2$ dimensions. [Figure: the 64 points over the unit square.] They form a $(0, 6, 2)$-net in two dimensions.
Another example of a digital net in base 2

Hammersley point set (or Sobol' + 1 coordinate), $n = 64$, $s = 2$. [Figure: the 64 points over the unit square.] Also a $(0, 6, 2)$-net in two dimensions.
Digital net with random digital shift

Equidistribution in digital boxes is lost with a random shift modulo 1, but it can be kept with a random digital shift in base $b$. In base 2: generate $\mathbf{U} \sim U(0,1)^s$ and XOR it bitwise with each $\mathbf{u}_i$. Example for $s = 2$:
$$\mathbf{u}_i = (0.01100100\ldots,\ 0.10011000\ldots)_2$$
$$\mathbf{U} = (0.01001010\ldots,\ 0.11101001\ldots)_2$$
$$\mathbf{u}_i \oplus \mathbf{U} = (0.00101110\ldots,\ 0.01110001\ldots)_2.$$
Each point has the $U(0,1)^s$ distribution.

Preservation of the equidistribution ($k_1 = 3$, $k_2 = 5$): with
$$\mathbf{u}_i = (0.{*}{*}{*},\ 0.{*}{*}{*}{*}{*}), \qquad \mathbf{U} = (0.101,\ 0.01011)_2,$$
we get $\mathbf{u}_i \oplus \mathbf{U} = (0.C{*}C,\ 0.{*}C{*}CC)$, where $*$ means the bit is copied and $C$ means it is complemented; every relevant bit is treated the same way for all points, so points in distinct boxes stay in distinct boxes.
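A minimal sketch of the base-2 digital shift via integer XOR, truncating each coordinate to $B$ bits (the helper name and the value of $B$ are our choices):

```python
import numpy as np

B = 30  # number of bits retained per coordinate (our choice)

def digital_shift(points, rng):
    """Base-2 random digital shift: XOR the first B bits of every
    coordinate of every point with the bits of one U ~ U(0,1)^s."""
    scale = 1 << B
    ipts = (points * scale).astype(np.int64)                          # bits of each u_i
    ishift = (rng.random(points.shape[1]) * scale).astype(np.int64)   # bits of U
    return (ipts ^ ishift) / scale                                    # u_i XOR U, back in (0,1)^s
```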
[Figures: Hammersley points under a digital shift with $\mathbf{U} = (0.10100101\ldots,\ 0.0101100\ldots)_2$, shown after shifting the first bit, the second bit, the third bit, and all bits (final).]
Variance bounds

Digital nets: the "classical" Koksma-Hlawka inequality for QMC requires $f$ to have finite variation in the sense of Hardy and Krause (which implies no discontinuity that is not aligned with the axes). Popular constructions achieve
$$D(P_n) = O(n^{-1}(\ln n)^s) = O(n^{-1+\epsilon}) \quad \text{for all } \epsilon > 0,$$
which gives $\mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}] = O(n^{-2+\epsilon})$ for all $\epsilon > 0$. More recent constructions (polynomial lattice rules) offer better rates for smooth functions.

With nested uniform scrambling (NUS) by Owen, one has $\mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}] = O(n^{-3+\epsilon})$ for all $\epsilon > 0$.

These bounds are conservative and too hard to compute in practice. The hidden constant and the variation often increase quickly with the dimension $s$. But it still often works very well empirically!
Classical Randomized Quasi-Monte Carlo (RQMC) for Markov chains

One RQMC point for each sample path. Put $\mathbf{V}_i = (\mathbf{U}_{i,1}, \ldots, \mathbf{U}_{i,\tau}) \in (0,1)^s = (0,1)^{d\tau}$. Estimate $\mu$ by
$$\hat\mu_{\mathrm{rqmc},n} = \frac{1}{n} \sum_{i=0}^{n-1} g(X_{i,\tau}),$$
where $P_n = \{\mathbf{V}_0, \ldots, \mathbf{V}_{n-1}\} \subset (0,1)^s$ is an RQMC point set:
(a) each point $\mathbf{V}_i$ has the uniform distribution over $(0,1)^s$;
(b) $P_n$ covers $(0,1)^s$ very evenly (i.e., has low discrepancy).
The dimension $s = d\tau$ is often very large!
Array-RQMC for Markov chains

L., Lécot, Tuffin, et al. [2004, 2006, 2008, etc.]; earlier deterministic versions by Lécot et al.

Simulate an "array" of $n$ chains in "parallel." At each step, use an RQMC point set $P_n$ to advance all the chains by one step. Seek global negative dependence across the chains.

Goal: small discrepancy (or "distance") between the empirical distribution of $S_{n,j} = \{X_{0,j}, \ldots, X_{n-1,j}\}$ and the theoretical distribution of $X_j$. If we succeed, we have an unbiased estimator with small variance, for any $j$:
$$\mu_j = \mathbb{E}[g(X_j)] \approx \hat\mu_{\mathrm{arqmc},j,n} = \frac{1}{n} \sum_{i=0}^{n-1} g(X_{i,j}).$$
Some RQMC insight: to simplify the discussion, suppose $X_j \sim U(0,1)^\ell$. This can be achieved (in principle) by a change of variable. We estimate (taking a single $j$ here)
$$\mu_j = \mathbb{E}[g(X_j)] = \mathbb{E}[g(\varphi_j(X_{j-1}, \mathbf{U}))] = \int_{[0,1)^{\ell+d}} g(\varphi_j(\mathbf{x}, \mathbf{u}))\, d\mathbf{x}\, d\mathbf{u}$$
by
$$\hat\mu_{\mathrm{arqmc},j,n} = \frac{1}{n} \sum_{i=0}^{n-1} g(X_{i,j}) = \frac{1}{n} \sum_{i=0}^{n-1} g(\varphi_j(X_{i,j-1}, \mathbf{U}_{i,j})).$$
This is (roughly) RQMC with the point set $Q_n = \{(X_{i,j-1}, \mathbf{U}_{i,j}),\ 0 \le i < n\}$. We want $Q_n$ to have low discrepancy (LD), i.e., be highly uniform, over $[0,1)^{\ell+d}$.

We do not choose the $X_{i,j-1}$'s in $Q_n$: they come from the simulation. To construct the (randomized) $\mathbf{U}_{i,j}$, select a LD point set
$$\tilde Q_n = \{(\mathbf{w}_0, \mathbf{U}_{0,j}), \ldots, (\mathbf{w}_{n-1}, \mathbf{U}_{n-1,j})\},$$
where the $\mathbf{w}_i \in [0,1)^\ell$ are fixed and each $\mathbf{U}_{i,j} \sim U(0,1)^d$. Permute the states so that $X_{\pi_j(i),j-1}$ is "close" to $\mathbf{w}_i$ for each $i$ (LD between the two sets), and compute $X_{i,j} = \varphi_j(X_{\pi_j(i),j-1}, \mathbf{U}_{i,j})$ for each $i$.

Example: if $\ell = 1$, one can take $w_i = (i + 0.5)/n$ and just sort the states. For $\ell > 1$, there are various ways to define the matching (multivariate sorts).
Array-RQMC algorithm

$X_{i,0} \leftarrow x_0$ (or $X_{i,0} \leftarrow x_{i,0}$) for $i = 0, \ldots, n-1$;
for $j = 1, 2, \ldots, \tau$ do
  compute the permutation $\pi_j$ of the states (for the matching);
  randomize afresh $\{\mathbf{U}_{0,j}, \ldots, \mathbf{U}_{n-1,j}\}$ in $\tilde Q_n$;
  $X_{i,j} = \varphi_j(X_{\pi_j(i),j-1}, \mathbf{U}_{i,j})$ for $i = 0, \ldots, n-1$;
  $\hat\mu_{\mathrm{arqmc},j,n} = \bar Y_{n,j} = \frac{1}{n} \sum_{i=0}^{n-1} g(X_{i,j})$;
end for
Estimate $\mu$ by the average $\bar Y_n = \hat\mu_{\mathrm{arqmc},\tau,n}$.

Proposition: (i) The average $\bar Y_n$ is an unbiased estimator of $\mu$. (ii) The empirical variance of $m$ independent realizations gives an unbiased estimator of $\mathrm{Var}[\bar Y_n]$.
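A minimal Python sketch of this loop for a one-dimensional state ($\ell = d = 1$), using a simple Latin-hypercube-style point set $\tilde Q_n$ (the talk's experiments use Sobol' or Korobov points instead; the transition and cost function below are illustrative):

```python
import numpy as np

def array_rqmc_1d(phi, g, x0, n, tau, rng):
    """Array-RQMC sketch for ell = d = 1. At each step: sort the states
    (matching state i with w_i = (i+0.5)/n), draw a fresh randomization of
    the point set, and advance all n chains at once."""
    X = np.full(n, x0, dtype=float)
    means = []
    for j in range(1, tau + 1):
        X.sort()                          # pi_j: match sorted states to sorted w_i
        sigma = rng.permutation(n)        # fresh randomization at each step
        U = (sigma + rng.random(n)) / n   # each U_{i,j} ~ U(0,1), stratified overall
        X = phi(X, U, j)                  # X_{i,j} = phi_j(X_{pi_j(i),j-1}, U_{i,j})
        means.append(g(X).mean())         # unbiased estimate of E[g(X_j)]
    return means

# Illustrative chain (not from the talk): X_j = (X_{j-1} + U_j) mod 1, g(x) = x - 1/2.
rng = np.random.default_rng(1)
phi = lambda x, u, j: (x + u) % 1.0
g = lambda x: x - 0.5
print(array_rqmc_1d(phi, g, x0=0.3, n=2**12, tau=10, rng=rng)[-1])
```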
Key issues:
1. How can we preserve the LD of $S_{n,j} = \{X_{0,j}, \ldots, X_{n-1,j}\}$ as $j$ increases?
2. Can we prove that $\mathrm{Var}[\hat\mu_{\mathrm{arqmc},\tau,n}] = O(n^{-\alpha})$ for some $\alpha > 1$? How? For what $\alpha$?
3. How does it behave empirically for moderate $n$?

Intuition: write the discrepancy measure of $S_{n,j}$ as the mean square integration error (or variance) when integrating some function $\psi : [0,1)^{\ell+d} \to \mathbb{R}$ using $Q_n$. Use RQMC theory to show it is small if $Q_n$ has LD. Then use induction.
Convergence results and applications

L., Lécot, and Tuffin [2006, 2008]: special cases: convergence at the MC rate, one-dimensional case, stratification, etc. Variance in $O(n^{-3/2})$.
Lécot and Tuffin [2004]: deterministic, one-dimensional, discrete state space.
El Haddad, Lécot, L. [2008, 2010]: deterministic, multidimensional.
Fakhereddine, El Haddad, Lécot [2012, 2013, 2014]: LHS, stratification, Sudoku sampling, ...
Wächter and Keller [2008]: applications in computer graphics.
Gerber and Chopin [2015]: sequential QMC (particle filters), Owen nested scrambling and Hilbert sort. Variance in $o(n^{-1})$.
Some generalizations

L., Lécot, and Tuffin [2008]: $\tau$ can be a random stopping time with respect to the filtration $\mathcal{F}\{(j, X_j),\ j \ge 0\}$.
L., Demers, and Tuffin [2006, 2007]: combination with splitting techniques (multilevel and without levels), with importance sampling, and with weight windows; covers particle filters.
L. and Sanvido [2010]: combination with coupling from the past for exact sampling.
Dion and L. [2010]: combination with approximate dynamic programming, for optimal stopping problems.
Gerber and Chopin [2015]: sequential QMC.
Mapping chains to points when $\ell > 1$

1. Multivariate batch sort: sort the states (chains) by the first coordinate into $n_1$ packets of size $n/n_1$; sort each packet by the second coordinate into $n_2$ packets of size $n/(n_1 n_2)$; $\cdots$; at the last level, sort each packet of size $n_\ell$ by the last coordinate. How to choose $n_1, n_2, \ldots, n_\ell$? A sketch follows.
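A sketch of the batch sort as an index computation (the function name and interface are ours; only $n_1, \ldots, n_{\ell-1}$ are needed as parameters, since the last level is a plain sort within each packet):

```python
import numpy as np

def batch_sort(states, packets):
    """Order of the rows of `states` (n x ell) under a multivariate batch sort:
    sort by coordinate 0 into packets[0] packets, each of those by coordinate 1
    into packets[1] packets, ..., plain sort by the last coordinate at the end.
    `packets` needs ell - 1 entries; a sketch, not the talk's exact code."""
    def rec(idx, coord):
        idx = idx[np.argsort(states[idx, coord], kind="stable")]
        if coord == states.shape[1] - 1:
            return idx                     # last coordinate: plain sort
        parts = np.array_split(idx, packets[coord])
        return np.concatenate([rec(p, coord + 1) for p in parts])
    return rec(np.arange(len(states)), 0)

# Example: the (4,4) mapping for n = 16 chains with a 2-dimensional state;
# chain order[i] is then paired with RQMC point i.
states = np.random.default_rng(2).random((16, 2))
order = batch_sort(states, (4,))
```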
[Figures: a $(4,4)$ mapping. Left panels: Sobol' net in 2 dimensions after a random digital shift; right panels: states of the chains, with matched pairs highlighted. A final figure numbers the chains and the points from 0 to 15 to show the resulting pairing.]
Mapping chains to points when $\ell > 1$

2. Multivariate split sort: $n_1 = n_2 = \cdots = 2$. Sort by the first coordinate into 2 packets; sort each packet by the second coordinate into 2 packets; etc., cycling through the coordinates (see the sketch below).
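The same recursion specializes to the split sort (a sketch; the function name is ours):

```python
import numpy as np

def split_sort(states, idx=None, coord=0):
    """Split sort: the batch sort with n_1 = n_2 = ... = 2, cycling through
    the coordinates until each packet holds a single point."""
    if idx is None:
        idx = np.arange(len(states))
    if len(idx) <= 1:
        return idx
    idx = idx[np.argsort(states[idx, coord], kind="stable")]
    half = len(idx) // 2
    nxt = (coord + 1) % states.shape[1]
    return np.concatenate([split_sort(states, idx[:half], nxt),
                           split_sort(states, idx[half:], nxt)])
```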
[Figures: mapping by split sort. Left: Sobol' net in 2 dimensions after a random digital shift; right: states of the chains, with the resulting pairing highlighted step by step.]
Mapping by batch sort and split sort

One advantage: the state space does not have to be $[0,1)^\ell$. [Figures: Sobol' net + digital shift (left) matched to states of the chains in $(-\infty,\infty)^2$ (right).]
Lowering the state dimension

For $\ell > 1$: define a transformation $h : \mathcal{X} \to [0,1)^c$ for $c < \ell$, and sort the transformed points $h(X_{i,j})$ in $c$ dimensions. Then we only need RQMC point sets in $c + d$ dimensions: $c$ for the mapping and $d$ to advance the chain.

Choice of $h$: states mapped to nearby values should be nearly equivalent. For $c = 1$, the state is mapped to $[0,1)$, which leads to a one-dimensional sort. The mapping $h$ with $c = 1$ can be based on a space-filling curve: Wächter and Keller [2008] use a Lebesgue Z-curve and mention others; Gerber and Chopin [2015] use a Hilbert curve and prove $o(n^{-1})$ convergence for the variance when used with digital nets and Owen nested scrambling. A Peano curve would also work, in base 3.

Reality check: we only need a good pairing between states and RQMC points. Any good way of doing this is welcome! Machine learning to the rescue?
Sorting by a Hilbert curve

Suppose the state space is $\mathcal{X} = [0,1)^\ell$. Partition this cube into $2^{m\ell}$ subcubes of equal size. When a subcube contains more than one point (a collision), we could split it again in $2^\ell$; but in practice we rather fix $m$ and neglect collisions.

The Hilbert curve defines a way to enumerate (order) the subcubes so that successive subcubes are always adjacent. This gives a way to sort the points; colliding points are ordered arbitrarily. We precompute and store the map from the point coordinates (first $m$ bits) to its position in the list. Then we can map states to points as if the state had one dimension.

We use RQMC points in $1 + d$ dimensions, ordered by the first coordinate, which is used to match the states; the $d$ (randomized) remaining coordinates are used to advance the chains. A sketch of the index computation for $\ell = 2$ follows.
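A sketch using the classic bit-twiddling conversion from cell coordinates to position along the Hilbert curve (helper names are ours):

```python
import numpy as np

def hilbert_index(m, x, y):
    """Position along the Hilbert curve of integer cell (x, y) on a
    2^m x 2^m grid (classic conversion; ell = 2 only)."""
    d = 0
    s = 1 << (m - 1)
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                        # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_sort(states, m=10):
    """Order n states in [0,1)^2 by the Hilbert index of their subcube
    (first m bits of each coordinate); collisions are ordered arbitrarily."""
    cells = np.minimum((states * (1 << m)).astype(int), (1 << m) - 1)
    keys = [hilbert_index(m, int(cx), int(cy)) for cx, cy in cells]
    return np.argsort(keys, kind="stable")
```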
Hilbert curve sort: map the state to $[0,1]$, then sort. [Figures: states of the chains in the unit square, with the Hilbert curve ordering shown progressively.]
What if the state space is not $[0,1)^\ell$? E.g., for the Asian option, $\mathcal{X} = [0,\infty)^2$. Then one must define a transformation $\psi : \mathcal{X} \to [0,1)^\ell$ so that the transformed state is approximately uniformly distributed over $[0,1)^\ell$. It is not easy to find a good $\psi$ in general! Gerber and Chopin [2015] propose a logistic transformation for each coordinate, combined with trial and error. A poor choice can damage efficiency.
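A minimal sketch of such a coordinate-wise logistic transformation; the centering and scaling constants below are guesses to be tuned by trial and error, as Gerber and Chopin suggest:

```python
import numpy as np

def logistic_transform(states):
    """Coordinate-wise logistic map from an unbounded state space to (0,1).
    Centering/scaling by the sample mean and 2 sd is our guess, not a
    prescription from the talk."""
    mu = states.mean(axis=0)
    sd = states.std(axis=0) + 1e-12        # avoid division by zero
    return 1.0 / (1.0 + np.exp(-(states - mu) / (2.0 * sd)))
```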
Hilbert curve batch sort

Perform a multivariate batch sort, or a split sort, and then enumerate the boxes as in the Hilbert curve sort. Advantage: the state space can be $\mathbb{R}^\ell$. [Figure: states in $(-\infty,\infty)^2$ with the boxes enumerated along the curve.]
Convergence results and proofs

For $\ell = 1$, $O(n^{-3/2})$ variance has been proved under some conditions. For $\ell > 1$, a worst-case error of $O(n^{-1/(\ell+1)})$ has been proved in deterministic settings under strong conditions on $\varphi_j$, using a batch sort (El Haddad, Lécot, L'Ecuyer 2008, 2010). Gerber and Chopin (2015) proved $o(n^{-1})$ variance, for the Hilbert sort and digital nets with nested scrambling.
Proved convergence results

L., Lécot, Tuffin [2008] + some extensions. Simple case: suppose $\ell = d = 1$, $\mathcal{X} = [0,1]$, and $X_j \sim U(0,1)$. Define
$$\Delta_j = \sup_{x \in \mathcal{X}} |\hat F_j(x) - F_j(x)| \quad \text{(star discrepancy of the states)},$$
$$V_\infty(g) = \int_0^1 \left| \frac{dg(x)}{dx} \right| dx \quad \text{(corresponding variation of } g\text{)},$$
$$D_j^2 = \int_0^1 (\hat F_j(x) - F_j(x))^2\, dx = \frac{1}{12 n^2} + \frac{1}{n} \sum_{i=0}^{n-1} \left( \frac{i + 0.5}{n} - F_j(X_{(i),j}) \right)^2,$$
$$V_2^2(g) = \int_0^1 \left( \frac{dg(x)}{dx} \right)^2 dx \quad \text{(corresponding square variation of } g\text{)}.$$
We have
$$\left| \bar Y_{n,j} - \mathbb{E}[g(X_j)] \right| \le \Delta_j\, V_\infty(g), \qquad \mathrm{Var}[\bar Y_{n,j}] = \mathbb{E}[(\bar Y_{n,j} - \mathbb{E}[g(X_j)])^2] \le \mathbb{E}[D_j^2]\, V_2^2(g).$$
Convergence results and proofs, $\ell = 1$

Assumption 1. $\varphi_j(x, u)$ is non-decreasing in $u$; $n = k^2$ for some integer $k$; and each square of the $k \times k$ grid contains exactly one RQMC point. Let $\Lambda_j = \sup_{0 \le z \le 1} V(F_j(z \mid \cdot))$.

Proposition (worst-case error). Under Assumption 1,
$$\Delta_j \le n^{-1/2} \sum_{k=1}^{j} (\Lambda_k + 1) \prod_{i=k+1}^{j} \Lambda_i.$$

Corollary. If $\Lambda_j \le \rho < 1$ for all $j$, then
$$\Delta_j \le \frac{1+\rho}{1-\rho}\, n^{-1/2}.$$
Convergence results and proofs, $\ell = 1$

Assumption 2 (stratification). Assumption 1 holds, $\varphi_j$ is also non-decreasing in $x$, and the randomized parts of the points are uniformly distributed in the cubes and pairwise independent (or negatively dependent) conditional on the cubes in which they lie.

Proposition (variance bound). Under Assumption 2,
$$\mathbb{E}[D_j^2] \le \frac{1}{4}\, n^{-3/2} \sum_{\ell=1}^{j} (\Lambda_\ell + 1) \prod_{i=\ell+1}^{j} \Lambda_i^2.$$

Corollary. If $\Lambda_j \le \rho < 1$ for all $j$, then
$$\mathbb{E}[D_j^2] \le \frac{1+\rho}{4(1-\rho^2)}\, n^{-3/2} = \frac{1}{4(1-\rho)}\, n^{-3/2}, \qquad \mathrm{Var}[\bar Y_{n,j}] \le \frac{1}{4(1-\rho)}\, V_2^2(g)\, n^{-3/2}.$$
These bounds are uniform in $j$.
Convergence results and proofs, $\ell > 1$

A worst-case error of $O(n^{-1/(\ell+1)})$ has been proved in a deterministic setting for a discrete state space $\mathcal{X} \subseteq \mathbb{Z}^\ell$, and for a continuous state space $\mathcal{X} \subseteq \mathbb{R}^\ell$ under strong conditions on $\varphi_j$, using a batch sort (El Haddad, Lécot, L'Ecuyer 2008, 2010). Gerber and Chopin (2015) proved $o(n^{-1})$ for the variance, for the Hilbert sort and digital nets with nested scrambling.
Example: Asian call option

$S(0) = 100$, $K = 100$, $r = 0.05$, $\sigma = 0.15$, $t_j = j/52$, $j = 0, \ldots, \tau = 13$. RQMC: Sobol' points with linear scrambling + random digital shift. Similar results for a randomly-shifted lattice + baker's transform. [Figure: $\log_2 \mathrm{Var}[\hat\mu_{\mathrm{RQMC},n}]$ vs $\log_2 n$ for $\log_2 n = 8$ to $20$: crude MC decreases as $n^{-1}$, sequential RQMC does better, and array-RQMC with split sort decreases roughly as $n^{-2}$.]
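For reference, a sketch of this chain's transition under the slide's parameters: the state is $(S(t_j), \bar S_j)$, and one uniform per chain advances the geometric Brownian motion by one observation date (the pairing of states with RQMC points happens outside this function; the plain-MC driver is a placeholder):

```python
import numpy as np
from scipy.stats import norm

S0, K, r, sigma = 100.0, 100.0, 0.05, 0.15
t = np.arange(14) / 52.0                 # t_j = j/52, j = 0, ..., tau = 13

def phi(S, Sbar, j, u):
    """Advance n chains from t_{j-1} to t_j with one uniform per chain:
    a standard GBM step, then update the running average of S(t_1..t_j)."""
    dt = t[j] - t[j - 1]
    S = S * np.exp((r - 0.5 * sigma**2) * dt
                   + sigma * np.sqrt(dt) * norm.ppf(u))
    return S, ((j - 1) * Sbar + S) / j

def payoff(Sbar):
    return np.exp(-r * t[-1]) * np.maximum(Sbar - K, 0.0)

# Plain-MC driver as a placeholder; array-RQMC would sort/match the 2-D
# states (S, Sbar) to RQMC points before each call to phi.
n, rng = 2**10, np.random.default_rng(0)
S, Sbar = np.full(n, S0), np.zeros(n)
for j in range(1, 14):
    S, Sbar = phi(S, Sbar, j, rng.random(n))
print(payoff(Sbar).mean())
```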
Example: Asian call option

$S(0) = 100$, $K = 100$, $r = \ln(1.09)$, $\sigma = 0.2$, $t_j = (230 + j)/365$ for $j = 1, \ldots, \tau = 10$. Var $\approx O(n^{-\alpha})$; the table reports the fitted slope of $\log_2 \mathrm{Var}[\bar Y_{n,j}]$ vs $\log_2 n$ (i.e., $-\hat\alpha$), the variance reduction factor (VRF) for $n = 2^{20}$, and the CPU time (sec) for $m = 100$ replications.

Sort                       RQMC points     slope    VRF          CPU (sec)
Split sort                 SS              -1.38    2.0 x 10^2   3093
                           Sobol           -2.04    4.0 x 10^6   1116
                           Sobol+NUS       -2.03    2.6 x 10^6   1402
                           Korobov+baker   -2.00    2.2 x 10^6    903
Batch sort (n_1 = n_2)     SS              -1.38    2.0 x 10^2    744
                           Sobol           -2.03    4.2 x 10^6    532
                           Sobol+NUS       -2.03    2.8 x 10^6   1035
                           Korobov+baker   -2.04    4.4 x 10^6    482
Hilbert sort               SS              -1.55    2.4 x 10^3    840
(logistic map)             Sobol           -2.03    2.6 x 10^6    534
                           Sobol+NUS       -2.02    2.8 x 10^6    724
                           Korobov+baker   -2.01    3.3 x 10^6    567
The small Markov chain

RQMC points    slope of log_2 MISE vs log_2 n
Monte Carlo    -1.00
Sobol+DS       -1.88
Lattice        -1.91
A small example with a one-dimensional state

Let $\theta \in [0,1)$ and let $G_\theta$ be the cdf of $Y = \theta U + (1-\theta) V$, where $U, V$ are independent $U(0,1)$. We define a Markov chain by
$$X_0 = U_0 \sim U(0,1); \qquad Y_j = \theta X_{j-1} + (1-\theta) U_j; \qquad X_j = G_\theta(Y_j) = \varphi_j(X_{j-1}, U_j), \quad j \ge 1,$$
where $U_j \sim U(0,1)$. Then $X_j \sim U(0,1)$ for all $j$.

We consider various functions $g$, all with $\mathbb{E}[g(X_j)] = 0$: $g(x) = x - 1/2$, $g(x) = x^2 - 1/3$, $g(x) = \sin(2\pi x)$, $g(x) = e^x - e + 1$ (all smooth), $g(x) = (x - 1/2)^+ - 1/8$ (kink), $g(x) = \mathbb{I}[x \le 1/3] - 1/3$ (step).

We pretend we do not know $\mathbb{E}[g(X_j)]$ and see how well we can estimate it by simulation. We also want to see how well we can estimate the exact (uniform) distribution of $X_j$ by the empirical distribution of $X_{0,j}, \ldots, X_{n-1,j}$.
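A sketch of $G_\theta$ and the transition $\varphi_j$ (assuming $0 < \theta < 1$; the trapezoidal cdf follows from writing $Y$ as the sum of $U(0,a)$ and $U(0,b)$ variables):

```python
import numpy as np

def G_theta(y, theta):
    """cdf of Y = theta*U + (1-theta)*V, U, V independent U(0,1), 0 < theta < 1:
    the trapezoidal cdf of the sum of U(0,a) and U(0,b), where
    a = min(theta, 1-theta) and b = max(theta, 1-theta)."""
    a, b = min(theta, 1.0 - theta), max(theta, 1.0 - theta)
    y = np.asarray(y, dtype=float)
    out = np.where(y < a, y**2 / (2 * a * b),
          np.where(y < b, (2 * y - a) / (2 * b),
                   1.0 - (1.0 - y)**2 / (2 * a * b)))
    return np.clip(out, 0.0, 1.0)

def phi(x, u, theta=0.3):
    """One step of the chain: X_j = G_theta(theta * X_{j-1} + (1-theta) * U_j)."""
    return G_theta(theta * x + (1.0 - theta) * u, theta)
```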
One-dimensional example

We take $\rho = 0.3$ and $j = 5$. For array-RQMC, we take $X_{i,0} = w_i = (i - 1/2)/n$. We tried different array-RQMC variants, for $n = 2^9$ to $n = 2^{21}$, with $m = 200$ independent replications for each $n$, and fitted a linear regression of $\log_2 \mathrm{Var}[\bar Y_{n,j}]$ vs $\log_2 n$, for various $g$.

We also looked at uniformity measures of the set of $n$ states at step $j$: for example, the Kolmogorov-Smirnov (KS) and Cramér-von Mises (CvM) test statistics, denoted $KS_j$ and $D_j$. With ordinary MC, $\mathbb{E}[KS_j]$ and $\mathbb{E}[D_j]$ converge as $O(n^{-1})$ for any $j$. For stratification, we have a proof that
$$\mathbb{E}[D_j^2] \le \frac{1}{4(1-\rho)}\, n^{-3/2} = \frac{1-\theta}{4(1-2\theta)}\, n^{-3/2}.$$
Some MC and RQMC point sets:

MC: crude Monte Carlo
LHS: Latin hypercube sampling
SS: stratified sampling
SSA: stratified sampling with antithetic variates in each stratum
Sobol: Sobol' points, left matrix scrambling + digital random shift
Sobol+baker: add a baker transformation
Sobol+NUS: Sobol' points with Owen's nested uniform scrambling
Korobov: Korobov lattice in 2 dimensions with a random shift modulo 1
Korobov+baker: add a baker transformation