
Lattice Rules for Quasi-Monte Carlo. Pierre L'Ecuyer. SAMSI Opening Workshop on QMC, Duke University, August 2017.

Draft 2: Monte Carlo integration. Want to estimate
$$\mu = \mu(f) = \int_{[0,1)^s} f(\mathbf{u})\, d\mathbf{u} = \mathbb{E}[f(\mathbf{U})],$$
where $f : [0,1)^s \to \mathbb{R}$ and $\mathbf{U}$ is uniformly distributed over $[0,1)^s$.


Draft 15: Randomly-shifted lattice. Example: lattice with $s = 2$, $n = 101$, $\mathbf{v}_1 = (1, 12)/101$. [Figure: the $n$ points $(u_{i,1}, u_{i,2})$ in the unit square.]
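A minimal sketch of how such a randomly-shifted rank-1 lattice can be generated, in Python with NumPy (the function name and seed are illustrative assumptions, not from the talk):

```python
import numpy as np

def shifted_lattice(n, a, rng):
    """Rank-1 lattice points u_i = (i * a / n) mod 1, all moved by one
    uniform random shift modulo 1 (a Cranley-Patterson shift)."""
    i = np.arange(n).reshape(-1, 1)           # column of indices 0..n-1
    points = (i * np.asarray(a) / n) % 1.0    # n x s array of lattice points
    shift = rng.random(len(a))                # a single uniform shift U
    return (points + shift) % 1.0

# The slide's example: s = 2, n = 101, v1 = (1, 12)/101.
rng = np.random.default_rng(2017)
pts = shifted_lattice(101, [1, 12], rng)
```

Every point receives the same shift, so the lattice structure (and hence the error analysis below) is preserved.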

Draft 16: A small lattice shifted by the red vector, modulo 1. [Figure: the eight labeled lattice points $(u_{i,1}, u_{i,2})$ on the left and their shifted versions $(U_{i,1}, U_{i,2})$ on the right.]

Draft 17: Can generate the shift uniformly in the parallelotope determined by the basis vectors, or in any shifted copy of it. [Figure: the unit square with the parallelotope and the resulting shifted points $(U_{i,1}, U_{i,2})$.]

Draft 18: Perhaps less obvious: can generate it in any of the colored shapes below. [Figure: the unit square partitioned into colored tiles of volume $1/n$.]

Draft 19: Generating the shift uniformly in one tile. Proposition. Let $R \subset [0,1)^s$ be such that $\{R_i = (R + \mathbf{u}_i) \bmod 1,\ i = 0, \dots, n-1\}$ is a partition of $[0,1)^s$ into $n$ regions of volume $1/n$. Then sampling the random shift $\mathbf{U}$ uniformly in any given $R_i$ is equivalent to sampling it uniformly in $[0,1)^s$. The error function $g_n(\mathbf{U}) = \hat\mu_{n,\mathrm{rqmc}} - \mu$ over any $R_i$ is the same as over $R$.

Draft 20: Error function $g_n(\mathbf{u})$ for $f(u_1, u_2) = (u_1 - 1/2)(u_2 - 1/2)$. [Surface plot of $g_n$ over the unit square.]

Draft 21: Error function $g_n(\mathbf{u})$ for $f(u_1, u_2) = (u_1 - 1/2) + (u_2 - 1/2)$. [Surface plot of $g_n$ over the unit square.]

Draft 22: Error function $g_n(\mathbf{u})$ for $f(u_1, u_2) = u_1 u_2 (u_1 - 1/2)(u_2 - 1/2)$. [Surface plot of $g_n$ over the unit square; vertical scale $\times 10^{-2}$.]

Draft 23: Variance for a randomly-shifted lattice. Suppose $f$ has Fourier expansion
$$f(\mathbf{u}) = \sum_{\mathbf{h} \in \mathbb{Z}^s} \hat f(\mathbf{h})\, e^{2\pi i\, \mathbf{h}^{\mathsf t} \mathbf{u}}.$$
For a randomly shifted lattice, the exact variance is (always)
$$\operatorname{Var}[\hat\mu_{n,\mathrm{rqmc}}] = \sum_{\mathbf{0} \neq \mathbf{h} \in L_s^*} |\hat f(\mathbf{h})|^2,$$
where $L_s^*$ is the dual lattice. From the viewpoint of variance reduction, an optimal lattice for a given $f$ minimizes the square "discrepancy" $D^2(P_n) = \operatorname{Var}[\hat\mu_{n,\mathrm{rqmc}}]$.
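The formula is exact but involves the unknown Fourier coefficients; in practice the variance is estimated empirically from $m$ independent random shifts. A sketch, reusing `np` and `rng` from the earlier block:

```python
def rqmc_variance(f, n, a, m, rng):
    """Empirical variance of the RQMC estimator over m independent
    Cranley-Patterson shifts of the same rank-1 lattice."""
    i = np.arange(n).reshape(-1, 1)
    base = (i * np.asarray(a) / n) % 1.0
    estimates = []
    for _ in range(m):
        pts = (base + rng.random(len(a))) % 1.0
        estimates.append(f(pts).mean())        # one RQMC estimate of mu
    return np.var(estimates, ddof=1)

# Product test function from Draft 20; its true integral is 0.
f = lambda u: (u[:, 0] - 0.5) * (u[:, 1] - 0.5)
print(rqmc_variance(f, 101, [1, 12], m=100, rng=rng))
```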

Draft 24:
$$\operatorname{Var}[\hat\mu_{n,\mathrm{rqmc}}] = \sum_{\mathbf{0} \neq \mathbf{h} \in L_s^*} |\hat f(\mathbf{h})|^2.$$
Let $\alpha > 0$ be an even integer. If $f$ has square-integrable mixed partial derivatives up to order $\alpha/2 > 0$, and the periodic continuation of its derivatives up to order $\alpha/2 - 1$ is continuous across the unit-cube boundaries, then
$$|\hat f(\mathbf{h})|^2 = O\big((\max(1, |h_1|) \cdots \max(1, |h_s|))^{-\alpha}\big).$$
Moreover, there is a vector $\mathbf{v}_1 = \mathbf{v}_1(n)$ such that
$$P_\alpha := \sum_{\mathbf{0} \neq \mathbf{h} \in L_s^*} (\max(1, |h_1|) \cdots \max(1, |h_s|))^{-\alpha} = O(n^{-\alpha + \epsilon}).$$
This $P_\alpha$ is the variance for a worst-case $f$ having
$$|\hat f(\mathbf{h})|^2 = (\max(1, |h_1|) \cdots \max(1, |h_s|))^{-\alpha}.$$
A larger $\alpha$ means a smoother $f$ and a faster convergence rate.

Draft 25: If $\alpha$ is an even integer, this worst-case $f$ is
$$f^*(\mathbf{u}) = \sum_{\mathfrak{u} \subseteq \{1,\dots,s\}} \prod_{j \in \mathfrak{u}} \frac{(2\pi)^{\alpha/2}}{(\alpha/2)!}\, B_{\alpha/2}(u_j),$$
where $B_{\alpha/2}$ is the Bernoulli polynomial of degree $\alpha/2$. In particular, $B_1(u) = u - 1/2$ and $B_2(u) = u^2 - u + 1/6$. Easy to compute $P_\alpha$ and to search for good lattices in this case!
However: this worst-case function is not necessarily representative of what happens in applications. Also, the hidden factor in the $O$ increases quickly with $s$, so this result is not very useful for large $s$. To get a bound that is uniform in $s$, the Fourier coefficients must decrease rapidly with the dimension and "size" of the vectors $\mathbf{h}$; that is, $f$ must be "smoother" in high-dimensional projections. This is typically what happens in applications for which RQMC is effective!

Draft 26: A very general weighted $P_\alpha$. $P_\alpha$ can be generalized by giving different weights $w(\mathbf{h})$ to the vectors $\mathbf{h}$:
$$\tilde P_\alpha := \sum_{\mathbf{0} \neq \mathbf{h} \in L_s^*} w(\mathbf{h})\, (\max(1, |h_1|) \cdots \max(1, |h_s|))^{-\alpha}.$$
But how do we choose these weights? There are too many! The optimal weights to minimize the variance are
$$w(\mathbf{h}) = (\max(1, |h_1|) \cdots \max(1, |h_s|))^{\alpha}\, |\hat f(\mathbf{h})|^2.$$

Draft 27: ANOVA decomposition. The Fourier expansion has too many terms to handle. As a cruder expansion, we can write $f(\mathbf{u}) = f(u_1, \dots, u_s)$ as
$$f(\mathbf{u}) = \sum_{\mathfrak{u} \subseteq \{1,\dots,s\}} f_{\mathfrak{u}}(\mathbf{u}) = \mu + \sum_{i=1}^s f_{\{i\}}(u_i) + \sum_{i,j=1}^s f_{\{i,j\}}(u_i, u_j) + \cdots,$$
where
$$f_{\mathfrak{u}}(\mathbf{u}) = \int_{[0,1)^{|\bar{\mathfrak{u}}|}} f(\mathbf{u})\, d\mathbf{u}_{\bar{\mathfrak{u}}} - \sum_{\mathfrak{v} \subset \mathfrak{u}} f_{\mathfrak{v}}(\mathbf{u}_{\mathfrak{v}}),$$
and the Monte Carlo variance decomposes as $\sigma^2 = \sum_{\mathfrak{u} \subseteq \{1,\dots,s\}} \sigma^2_{\mathfrak{u}}$, where $\sigma^2_{\mathfrak{u}} = \operatorname{Var}[f_{\mathfrak{u}}(\mathbf{U})]$. The $\sigma^2_{\mathfrak{u}}$'s can be estimated (perhaps very roughly) by MC or RQMC, as sketched below. Intuition: make sure the projections $P_n(\mathfrak{u})$ are very uniform for subsets $\mathfrak{u}$ with large $\sigma^2_{\mathfrak{u}}$.
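One standard way to estimate the variance captured by a subset of coordinates is the pick-freeze (Sobol') estimator; the sketch below is an illustrative plain-MC version I am adding, which estimates the "closed" sum $\sum_{\emptyset \neq \mathfrak{v} \subseteq \mathfrak{u}} \sigma^2_{\mathfrak{v}}$ rather than a single $\sigma^2_{\mathfrak{u}}$:

```python
def pick_freeze(f, u, s, n, rng):
    """Estimate the sum of ANOVA variances over all nonempty subsets
    of u, via Cov[f(U), f(W)]: W copies U on the coordinates in u and
    is resampled independently elsewhere."""
    U = rng.random((n, s))
    W = rng.random((n, s))
    W[:, list(u)] = U[:, list(u)]             # freeze coordinates in u
    fU, fW = f(U), f(W)
    return np.mean(fU * fW) - np.mean(fU) * np.mean(fW)

# For the test f above this is about 0: that f is a pure two-factor
# interaction, so coordinate {0} alone captures no variance.
print(pick_freeze(f, [0], 2, 100_000, rng))
```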

Draft 28: Weighted $P_{\gamma,\alpha}$ with projection-dependent weights $\gamma_{\mathfrak{u}}$. Denote by $\mathfrak{u}(\mathbf{h}) = \mathfrak{u}(h_1, \dots, h_s)$ the set of indices $j$ for which $h_j \neq 0$.
$$P_{\gamma,\alpha} = \sum_{\mathbf{0} \neq \mathbf{h} \in L_s^*} \gamma_{\mathfrak{u}(\mathbf{h})} (\max(1, |h_1|) \cdots \max(1, |h_s|))^{-\alpha}.$$
For $\alpha/2$ a positive integer, with $\mathbf{u}_i = (u_{i,1}, \dots, u_{i,s}) = i \mathbf{v}_1 \bmod 1$,
$$P_{\gamma,\alpha} = \sum_{\emptyset \neq \mathfrak{u} \subseteq \{1,\dots,s\}} \gamma_{\mathfrak{u}} \left[\frac{-(-4\pi^2)^{\alpha/2}}{\alpha!}\right]^{|\mathfrak{u}|} \frac{1}{n} \sum_{i=0}^{n-1} \prod_{j \in \mathfrak{u}} B_\alpha(u_{i,j}),$$
and the corresponding variation is
$$V_\gamma^2(f) = \sum_{\emptyset \neq \mathfrak{u} \subseteq \{1,\dots,s\}} \frac{1}{\gamma_{\mathfrak{u}}\, (4\pi^2)^{\alpha |\mathfrak{u}|/2}} \int_{[0,1]^{|\mathfrak{u}|}} \left| \frac{\partial^{\alpha |\mathfrak{u}|/2}}{\partial \mathbf{u}^{\alpha/2}} f_{\mathfrak{u}}(\mathbf{u}) \right|^2 d\mathbf{u},$$
for $f : [0,1)^s \to \mathbb{R}$ smooth enough. Then
$$\operatorname{Var}[\hat\mu_{n,\mathrm{rqmc}}] = \sum_{\mathfrak{u} \subseteq \{1,\dots,s\}} \operatorname{Var}[\hat\mu_{n,\mathrm{rqmc}}(f_{\mathfrak{u}})] \leq V_\gamma^2(f)\, P_{\gamma,\alpha}.$$
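For $\alpha = 2$ this sum is cheap to evaluate directly; a sketch, assuming product weights (the general projection-dependent case would sum over all $2^s - 1$ subsets):

```python
import math

def P_gamma_2(n, a, gamma):
    """P_{gamma,alpha} for alpha = 2 with product weights
    gamma_u = prod_{j in u} gamma[j]; uses B_2(u) = u^2 - u + 1/6 and
    the per-coordinate constant -(-4 pi^2)^(alpha/2)/alpha! = 2 pi^2."""
    i = np.arange(n)
    prod = np.ones(n)
    for a_j, g_j in zip(a, gamma):
        u_ij = (i * a_j % n) / n
        prod *= 1.0 + g_j * 2.0 * math.pi**2 * (u_ij**2 - u_ij + 1.0 / 6.0)
    return (prod - 1.0).mean()

print(P_gamma_2(101, [1, 12], [1.0, 0.5]))
```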

Draft 29: Weighted $P_{\gamma,\alpha}$:
$$P_{\gamma,\alpha} = \sum_{\mathbf{0} \neq \mathbf{h} \in L_s^*} \gamma_{\mathfrak{u}(\mathbf{h})} (\max(1, |h_1|) \cdots \max(1, |h_s|))^{-\alpha}.$$
This is the variance for a worst-case function whose square Fourier coefficients are
$$|\hat f(\mathbf{h})|^2 = \gamma_{\mathfrak{u}(\mathbf{h})} (\max(1, |h_1|) \cdots \max(1, |h_s|))^{-\alpha},$$
i.e., the RQMC variance for the function
$$f^*(\mathbf{u}) = \sum_{\mathfrak{u} \subseteq \{1,\dots,s\}} \sqrt{\gamma_{\mathfrak{u}}} \prod_{j \in \mathfrak{u}} \frac{(2\pi)^{\alpha/2}}{(\alpha/2)!}\, B_{\alpha/2}(u_j).$$
We also have, for this $f^*$,
$$\sigma^2_{\mathfrak{u}} = \gamma_{\mathfrak{u}} \left[\frac{(4\pi^2)^{\alpha/2} \operatorname{Var}[B_{\alpha/2}(U)]}{((\alpha/2)!)^2}\right]^{|\mathfrak{u}|} = \gamma_{\mathfrak{u}} \left[\frac{(4\pi^2)^{\alpha/2}\, |B_\alpha(0)|}{\alpha!}\right]^{|\mathfrak{u}|}.$$
For $\alpha = 2$, we should take $\gamma_{\mathfrak{u}} = (3/\pi^2)^{|\mathfrak{u}|} \sigma^2_{\mathfrak{u}} \approx (0.30396)^{|\mathfrak{u}|} \sigma^2_{\mathfrak{u}}$.
For $\alpha = 4$, we should take $\gamma_{\mathfrak{u}} = (45/\pi^4)^{|\mathfrak{u}|} \sigma^2_{\mathfrak{u}} \approx (0.46197)^{|\mathfrak{u}|} \sigma^2_{\mathfrak{u}}$.
For $\alpha \to \infty$, we have $\gamma_{\mathfrak{u}} \to (0.5)^{|\mathfrak{u}|} \sigma^2_{\mathfrak{u}}$.
The ratios weight/variance should decrease exponentially with $|\mathfrak{u}|$.

Draft 30: Heuristics for choosing the weights. For $f^*$, we should take $\gamma_{\mathfrak{u}} = \rho^{|\mathfrak{u}|} \sigma^2_{\mathfrak{u}}$ for some constant $\rho$. But there are still $2^s - 1$ subsets $\mathfrak{u}$ to consider! One could define a simple parametric model for the square variations and then estimate the parameters by matching the ANOVA variances $\sigma^2_{\mathfrak{u}}$ [Wang and Sloan 2006; L. and Munger 2012]. For example:
Product weights: $\gamma_{\mathfrak{u}} = \prod_{j \in \mathfrak{u}} \gamma_j$ for some constants $\gamma_j \geq 0$.
Order-dependent weights: $\gamma_{\mathfrak{u}}$ depends only on $|\mathfrak{u}|$. Example: $\gamma_{\mathfrak{u}} = 1$ for $|\mathfrak{u}| \leq d$ and $\gamma_{\mathfrak{u}} = 0$ otherwise; Wang (2007) suggests this with $d = 2$.
Mixture: POD weights (Kuo et al. 2011).
Note that all one-dimensional projections (before the random shift) are the same, so the weights $\gamma_{\mathfrak{u}}$ for $|\mathfrak{u}| = 1$ are irrelevant.

Draft 31: Weighted $R_{\gamma,\alpha}$. When $\alpha$ is not even, one can take
$$R_{\gamma,\alpha}(P_n) = \sum_{\emptyset \neq \mathfrak{u} \subseteq \{1,\dots,s\}} \gamma_{\mathfrak{u}}\, \frac{1}{n} \sum_{i=0}^{n-1} \prod_{j \in \mathfrak{u}} \left[ \sum_{h = -\lfloor (n-1)/2 \rfloor}^{\lfloor n/2 \rfloor} \max(1, |h|)^{-\alpha}\, e^{2\pi i h u_{i,j}} - 1 \right].$$
Upper bounds on $P_{\gamma,\alpha}$ can be computed in terms of $R_{\gamma,\alpha}$. It can be computed for any $\alpha > 0$ (finite sum); for example, one can take $\alpha = 1$. We can compute it using the FFT.

Draft 32: Figure of merit based on the spectral test. Compute the length $\ell_{\mathfrak{u}}(P_n)$ of the shortest vector in the dual lattice for each projection $\mathfrak{u}$ and normalize by an upper bound $\ell^*_{|\mathfrak{u}|}(n)$ (with Euclidean length):
$$D_{\mathfrak{u}}(P_n) = \frac{\ell^*_{|\mathfrak{u}|}(n)}{\ell_{\mathfrak{u}}(P_n)} \geq 1.$$
L. and Lemieux (2000), etc., maximize
$$M_{t_1,\dots,t_d} = \min\left[\ \min_{2 \leq r \leq t_1} \frac{\ell_{\{1,\dots,r\}}(P_n)}{\ell^*_r(n)},\ \ \min_{2 \leq r \leq d}\ \min_{\substack{\mathfrak{u} = \{j_1,\dots,j_r\} \\ 1 = j_1 < \cdots < j_r \leq t_r}} \frac{\ell_{\mathfrak{u}}(P_n)}{\ell^*_r(n)}\ \right].$$
The computing time of $\ell_{\mathfrak{u}}(P_n)$ is almost independent of $n$, but exponential in $|\mathfrak{u}|$. Poor lattices can be eliminated quickly. One can use a different norm, compute the shortest vector in the primal lattice, etc.
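For intuition, in $s = 2$ the shortest dual-lattice vector of a rank-1 rule with $\mathbf{a} = (1, a)$ can be found by brute force. A toy sketch (real searches use lattice basis reduction; the enumeration bound here is an arbitrary assumption):

```python
def shortest_dual_2d(n, a, bound=64):
    """Shortest nonzero vector (Euclidean length) of the dual lattice
    {h in Z^2 : h1 + a*h2 = 0 (mod n)}, by exhaustive enumeration."""
    best = float("inf")
    for h2 in range(-bound, bound + 1):
        for h1 in range(-bound, bound + 1):
            if (h1, h2) != (0, 0) and (h1 + a * h2) % n == 0:
                best = min(best, math.hypot(h1, h2))
    return best

# n = 101, a = 12: a short dual vector means the points lie on few,
# widely spaced parallel lines.
print(shortest_dual_2d(101, 12))
```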

Draft 33: Search methods.
Korobov lattices: search over all admissible $a$, with $\mathbf{a} = (1, a, a^2, \dots) \bmod n$.
Random Korobov: try $r$ random values of $a$.
Rank 1, exhaustive search.
Pure random search: try admissible vectors $\mathbf{a}$ at random.
Component-by-component (CBC) construction (Sloan, Kuo, etc.): let $a_1 = 1$; for $j = 2, 3, \dots, s$, find $z \in \{1, \dots, n-1\}$ with $\gcd(z, n) = 1$ such that $(a_1, a_2, \dots, a_j = z)$ minimizes $D(P_n(\{1, \dots, j\}))$. A sketch follows after this list.
Fast CBC construction for $P_{\gamma,\alpha}$: use the FFT (Nuyens, Cools).
Randomized CBC construction: let $a_1 = 1$; for $j = 2, \dots, s$, try $r$ random $z \in \{1, \dots, n-1\}$ with $\gcd(z, n) = 1$, and retain the $(a_1, a_2, \dots, a_j = z)$ that minimizes $D(P_n(\{1, \dots, j\}))$.
Filters can be added to eliminate poor lattices more quickly.
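A naive CBC sketch for the $P_{\gamma,2}$ criterion with product weights (product weights are my illustrative assumption; the fast CBC of Nuyens and Cools replaces the inner loop by FFTs to avoid the $O(s n^2)$ cost here):

```python
def cbc_search(n, s, gamma):
    """Naive CBC: greedily extend the generating vector a one coordinate
    at a time, choosing the z that minimizes P_{gamma,2}. Minimizing
    mean(prod) is equivalent, since P = mean(prod) - 1."""
    i = np.arange(n)
    def factor(z, g):
        u = (i * z % n) / n
        return 1.0 + g * 2.0 * math.pi**2 * (u * u - u + 1.0 / 6.0)
    a, running = [1], factor(1, gamma[0])
    for j in range(1, s):
        cands = [z for z in range(1, n) if math.gcd(z, n) == 1]
        scores = [(running * factor(z, gamma[j])).mean() for z in cands]
        best = cands[int(np.argmin(scores))]
        a.append(best)
        running = running * factor(best, gamma[j])
    return a

# Example: n = 101 points in s = 5 dimensions, geometric product weights.
print(cbc_search(101, 5, [0.5 ** j for j in range(5)]))
```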

Draft 34: Embedded lattices. $P_{n_1} \subset P_{n_2} \subset \cdots \subset P_{n_m}$ with $n_1 < n_2 < \cdots < n_m$, for some $m > 0$. Usually $n_k = b^{c+k}$ for integers $c \geq 0$ and $b \geq 2$ (typically $b = 2$), $a_k = a_{k+1} \bmod n_k$ for all $k < m$, and the same random shift. We need a measure that accounts for the quality of all $m$ lattices. We standardize the merit at all levels $k$ so they have a comparable scale:
$$E_q(P_n) = D_q(P_n) / D^*_q(n),$$
where $D^*_q(n)$ is a normalization factor, e.g., a bound on $D_q(P_n)$ or a bound on its average over all $(a_1, \dots, a_s)$ under consideration. For $P_{\gamma,\alpha}$, bounds by Sinescu and L. (2012) and Dick et al. (2008). For CBC, we do this for each coordinate $j = 1, \dots, s$ (replace $s$ by $j$). Then we can take as a global measure (with sum or max):
$$\left[\bar E_{q,m}(P_{n_1}, \dots, P_{n_m})\right]^q = \sum_{k=1}^m w_k \left[E_q(P_{n_k})\right]^q.$$

Draft 35: Available software tools. Construction: Nuyens (2012) provides Matlab code for fast-CBC construction of lattice rules based on $P_{\gamma,\alpha}$, with product and order-dependent weights. Precomputed tables for fixed criteria: Maisonneuve (1972), Sloan and Joe (1994), L. and Lemieux (2000), Kuo (2012), etc. Software for using (randomized) lattice rules in simulations is also available in many places (e.g., in SSJ).

Draft 36: Lattice Builder. Implemented as a C++ library; modular, object-oriented, accessible from a program via an API. Various choices of figures of merit, arbitrary weights, construction methods, etc. Easily extensible. For better run-time efficiency, it uses static polymorphism (via templates) rather than dynamic polymorphism, plus several other techniques to reduce computations and improve speed. Offers a precompiled program with a Unix-like command-line interface, and also a graphical interface. Available for download on GitHub, with source code, documentation, and precompiled executables for Linux or Windows, in 32-bit and 64-bit versions. Coming very soon: construction of polynomial lattice rules as well. [Demo of the graphical interface.]

Draft 37: Baker's (or tent) transformation. Purpose: to make the periodic continuation of $f$ continuous. If $f(0) \neq f(1)$, define $\tilde f$ by $\tilde f(u) = \tilde f(1 - u) = f(2u)$ for $0 \leq u \leq 1/2$. This $\tilde f$ has the same integral as $f$, and $\tilde f(0) = \tilde f(1)$. [Figure: $f$ over $[0, 1]$ compressed into $[0, 1/2]$ and mirrored onto $[1/2, 1]$.] For smooth $f$, this can reduce the variance to $O(n^{-4+\epsilon})$ (Hickernell 2002). The resulting $\tilde f$ is symmetric with respect to $u = 1/2$. In practice, we transform the points $\mathbf{U}_i$ instead of $f$.

Draft 38: One-dimensional case. Random shift followed by baker's transformation: along each coordinate, stretch everything by a factor of 2 and fold. Same as replacing $U_j$ by $\min[2 U_j,\ 2(1 - U_j)]$. [Figure: $n$ equally spaced points on $[0, 1]$ before and after the fold.] This gives locally antithetic points in intervals of size $2/n$, which implies that linear pieces over these intervals are integrated exactly. Intuition: when $f$ is smooth, it is well approximated by a piecewise-linear function, which is integrated exactly, so the error is small.
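The point transformation is one line in practice; a sketch, reusing the earlier `shifted_lattice` helper:

```python
def baker(points):
    """Tent transformation, applied coordinate-wise after the random
    shift: U -> min(2U, 2(1 - U))."""
    return np.minimum(2.0 * points, 2.0 * (1.0 - points))

pts_baker = baker(shifted_lattice(101, [1, 12], rng))
```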

Draft 39: Example: a stochastic activity network. The network gives precedence relations between activities. Activity $k$ has random duration $Y_k$ (also the length of arc $k$) with known cumulative distribution function (cdf) $F_k(y) := P[Y_k \leq y]$. The project duration $T$ is the (random) length of the longest path from source to sink. We may want to estimate $E[T]$, $P[T > x]$, a quantile, the density of $T$, etc. [Figure: a directed acyclic network with nodes $0$ (source) to $8$ (sink) and thirteen arcs with durations $Y_0, \dots, Y_{12}$.]

Draft 40: Simulation. Algorithm to generate $T$:
for $k = 0, \dots, 12$: generate $U_k \sim U(0, 1)$ and let $Y_k = F_k^{-1}(U_k)$;
compute $X = T = h(Y_0, \dots, Y_{12}) = f(U_0, \dots, U_{12})$.
Monte Carlo: repeat $n$ times independently to obtain $n$ realizations $X_1, \dots, X_n$ of $T$; estimate $E[T] = \int_{(0,1)^s} f(\mathbf{u})\, d\mathbf{u}$ by $\bar X_n = \frac{1}{n} \sum_{i=0}^{n-1} X_i$ (a code sketch follows below). To estimate $P(T > x)$, take $X = \mathbb{I}[T > x]$ instead.
RQMC: replace the $n$ independent points by an RQMC point set of size $n$.
Numerical illustration from Elmaghraby (1977): $Y_k \sim N(\mu_k, \sigma_k^2)$ for $k = 0, 1, 3, 10, 11$, and $Y_k \sim \text{Expon}(1/\mu_k)$ otherwise, with $\mu_0, \dots, \mu_{12}$: 13.0, 5.5, 7.0, 5.2, 16.5, 14.7, 10.3, 6.0, 4.0, 20.0, 3.2, 3.2, 16.5.
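A sketch of the plain MC estimator. The slide's network figure is only partly recoverable, so the arc list below is a hypothetical 9-node, 13-arc topology (chosen so that the cut used later is valid); `inv_cdfs` stands for a list of 13 inverse-cdf callables matching the distributions above:

```python
# Hypothetical topology: (tail, head, k) triples, sorted by tail,
# with every arc going from a lower- to a higher-numbered node.
ARCS = [(0, 1, 0), (0, 2, 1), (1, 2, 2), (1, 3, 3), (2, 4, 4),
        (2, 5, 5), (3, 4, 8), (3, 5, 9), (3, 6, 6), (4, 7, 7),
        (5, 7, 10), (6, 8, 11), (7, 8, 12)]

def project_duration(y):
    """Longest path from node 0 (source) to node 8 (sink); one pass
    over ARCS works because arcs are sorted in topological order."""
    longest = [0.0] + [-math.inf] * 8
    for tail, head, k in ARCS:
        longest[head] = max(longest[head], longest[tail] + y[k])
    return longest[8]

def san_mc(n, rng, inv_cdfs):
    """Plain MC: n realizations of T, with Y_k = F_k^{-1}(U_k)."""
    U = rng.random((n, 13))
    return np.array([project_duration([inv_cdfs[k](u[k]) for k in range(13)])
                     for u in U])
```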

Draft 41: Naive idea: replace each $Y_k$ by its expectation. This gives $T = 48.2$. Results of an experiment with $n = 100\,000$: a histogram of the values of $T$ is a density estimator that gives more information than a confidence interval on $E[T]$ or $P[T > x]$. Values range from 14.4 to 268.6; 11.57% exceed $x = 90$. RQMC can also reduce the error (e.g., the MISE) of a density estimator! [Histogram of $T$: mean $= 64.2$; deterministic approximation $T = 48.2$; threshold $x = 90$; $\hat\xi_{0.99} = 131.8$.]

Draft 42: Alternative estimator of $P[T > x] = E[\mathbb{I}(T > x)]$ for the SAN. Naive estimator: generate $T$ and compute $X = \mathbb{I}[T > x]$; repeat $n$ times and average. [Figure: the activity network again.]

Draft 43: Conditional Monte Carlo estimator of $P[T > x]$. Generate the $Y_j$'s only for the 8 arcs that do not belong to the cut $\mathcal{L} = \{4, 5, 6, 8, 9\}$, and replace $\mathbb{I}[T > x]$ by its conditional expectation given those $Y_j$'s,
$$X_{\mathrm{e}} = P[T > x \mid \{Y_j,\ j \notin \mathcal{L}\}].$$
This makes the integrand continuous in the $U_j$'s.
To compute $X_{\mathrm{e}}$: for each $l \in \mathcal{L}$, say from $a_l$ to $b_l$, compute the length $\alpha_l$ of the longest path from the source to $a_l$, and the length $\beta_l$ of the longest path from $b_l$ to the sink. The longest path that passes through arc $l$ does not exceed $x$ iff $\alpha_l + Y_l + \beta_l \leq x$, which occurs with probability $P[Y_l \leq x - \alpha_l - \beta_l] = F_l[x - \alpha_l - \beta_l]$. Since the $Y_l$ are independent, we obtain
$$X_{\mathrm{e}} = 1 - \prod_{l \in \mathcal{L}} F_l[x - \alpha_l - \beta_l].$$
This can be faster to compute than $X$, and always has less variance. A sketch follows below.
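A sketch of the CMC estimator under the same hypothetical `ARCS` topology as above; the cut indices come from the slide, while `cdfs` stands for a list of 13 cdf callables:

```python
CUT = {4, 5, 6, 8, 9}                     # the cut L from the slide

def cmc_estimate(y, x, cdfs):
    """X_e = 1 - prod_{l in L} F_l(x - alpha_l - beta_l), where y holds
    sampled durations for the 8 arcs NOT in the cut (others unused)."""
    alpha = [0.0] + [-math.inf] * 8       # longest path source -> node
    beta = [-math.inf] * 8 + [0.0]        # longest path node -> sink
    free = [arc for arc in ARCS if arc[2] not in CUT]
    for t, h, k in free:                  # forward pass, tails ascending
        alpha[h] = max(alpha[h], alpha[t] + y[k])
    for t, h, k in reversed(free):        # backward pass
        beta[t] = max(beta[t], beta[h] + y[k])
    prob = 1.0
    for t, h, k in ARCS:
        if k in CUT:
            prob *= cdfs[k](x - alpha[t] - beta[h])
    return 1.0 - prob
```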

Draft 44: ANOVA variances for the estimators of $P[T > x]$ in the stochastic activity network. [Bar chart: % of total variance per cardinality of $\mathfrak{u}$, for the naive estimator with $x = 64$ and $x = 100$, and for CMC with $x = 64$ and $x = 100$.]

Draft 45: Variance of the estimator of $P[T > x]$ for the SAN, $x = 64$. [Log-log plot of variance vs. $n$: MC, Sobol', and lattice ($P_2$) + baker, with an $n^{-2}$ reference slope.]

Draft 46: Variance of the estimator of $P[T > x]$ with CMC, $x = 64$. [Log-log plot of variance vs. $n$: MC, Sobol', and lattice ($P_2$) + baker, with an $n^{-2}$ reference slope.]

Draft 47: Histograms, with $n = 8191$ and $m = 10{,}000$ replications ($x = 100$). [Three histograms: a single MC draw, taking values in $\{0, 1\}$; the MC estimator, spread over roughly 0.06 to 0.07; and the RQMC estimator, concentrated over roughly 0.065 to 0.07.]

Draft 48: Histograms, with $n = 8191$ and $m = 10{,}000$ replications (CMC, $x = 100$). [Three histograms: a single CMC draw over $[0, 1]$; the MC estimator, spread over roughly 0.06 to 0.07; and the RQMC estimator, concentrated over roughly 0.064 to 0.067.]

Draft 49: Effective dimension (Caflisch, Morokoff, and Owen 1997). A function $f$ has effective dimension $d$ in proportion $\rho$ in the superposition sense if
$$\sum_{|\mathfrak{u}| \leq d} \sigma^2_{\mathfrak{u}} \geq \rho \sigma^2.$$
It has effective dimension $d$ in the truncation sense if
$$\sum_{\mathfrak{u} \subseteq \{1,\dots,d\}} \sigma^2_{\mathfrak{u}} \geq \rho \sigma^2.$$
High-dimensional functions with low effective dimension are frequent. One may change $f$ to make this happen.
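Given (even rough) estimates of the ANOVA variances, the definition translates directly into code; a sketch assuming a dict from coordinate subsets to estimated $\sigma^2_{\mathfrak{u}}$:

```python
def effective_dim_superposition(sigma2, rho=0.99):
    """Smallest d with sum_{|u| <= d} sigma2_u >= rho * sigma2, where
    sigma2 maps frozenset(u) -> estimated ANOVA variance sigma2_u."""
    total = sum(sigma2.values())
    by_card = {}
    for u, v in sigma2.items():
        by_card[len(u)] = by_card.get(len(u), 0.0) + v
    acc = 0.0
    for d in range(1, max(by_card) + 1):
        acc += by_card.get(d, 0.0)
        if acc >= rho * total:
            return d
    return max(by_card)
```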

Draft 50: Example: function of a multinormal vector. Let $\mu = E[f(\mathbf{U})] = E[g(\mathbf{Y})]$ where $\mathbf{Y} = (Y_1, \dots, Y_s) \sim N(\mathbf{0}, \boldsymbol{\Sigma})$. For example, if the payoff of a financial derivative is a function of the values taken by a $c$-dimensional geometric Brownian motion (GBM) at $d$ observation times $0 < t_1 < \cdots < t_d = T$, then we have $s = cd$. To generate $\mathbf{Y}$: decompose $\boldsymbol{\Sigma} = \mathbf{A}\mathbf{A}^{\mathsf t}$, generate $\mathbf{Z} = (Z_1, \dots, Z_s) \sim N(\mathbf{0}, \mathbf{I})$, where the (independent) $Z_j$'s are generated by inversion, $Z_j = \Phi^{-1}(U_j)$, and return $\mathbf{Y} = \mathbf{A}\mathbf{Z}$. Choice of $\mathbf{A}$? Cholesky factorization: $\mathbf{A}$ is lower triangular.

Draft 51: Principal component decomposition (PCA) (Acworth et al. 1998): $\mathbf{A} = \mathbf{P}\mathbf{D}^{1/2}$, where $\mathbf{D} = \operatorname{diag}(\lambda_s, \dots, \lambda_1)$ (eigenvalues of $\boldsymbol{\Sigma}$ in decreasing order) and the columns of $\mathbf{P}$ are the corresponding unit-length eigenvectors. With this $\mathbf{A}$, $Z_1$ accounts for the maximum amount of variance of $\mathbf{Y}$, then $Z_2$ the maximum amount of variance conditional on $Z_1$, etc.
Function of a Brownian motion (or other Lévy process): the payoff depends on a $c$-dimensional Brownian motion $\{\mathbf{X}(t),\ t \geq 0\}$ observed at times $0 = t_0 < t_1 < \cdots < t_d = T$.
Sequential (or random walk) method: generate $\mathbf{X}(t_1)$, then $\mathbf{X}(t_2) - \mathbf{X}(t_1)$, then $\mathbf{X}(t_3) - \mathbf{X}(t_2)$, etc.
Bridge sampling (Moskowitz and Caflisch 1996): suppose $d = 2^m$; generate $\mathbf{X}(t_d)$, then $\mathbf{X}(t_{d/2})$ conditional on $(\mathbf{X}(0), \mathbf{X}(t_d))$, then $\mathbf{X}(t_{d/4})$ conditional on $(\mathbf{X}(0), \mathbf{X}(t_{d/2}))$, and so on. The first few $N(0, 1)$ random variables already sketch the path trajectory.
Each of these methods corresponds to some matrix $\mathbf{A}$. The choice has a large impact on the ANOVA decomposition of $f$. A sketch of the PCA choice follows below.
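A sketch of the PCA factorization for a one-dimensional Brownian motion observed at equally spaced times, using the standard covariance $\Sigma_{jk} = \min(t_j, t_k)$ (numpy.linalg.eigh returns eigenvalues in increasing order, so we reverse them):

```python
def pca_factor(Sigma):
    """A = P D^(1/2) with eigenvalues in decreasing order, so that Z_1
    explains the largest share of Var[Y] for Y = A Z."""
    lam, P = np.linalg.eigh(Sigma)        # ascending eigenvalues
    order = np.argsort(lam)[::-1]
    return P[:, order] * np.sqrt(lam[order])   # scale columns

d = 8
t = np.arange(1, d + 1) / d               # observation times t_j = j/d
Sigma = np.minimum.outer(t, t)            # Cov[B(t_j), B(t_k)] = min(t_j, t_k)
A = pca_factor(Sigma)
# Share of the total variance explained by the first coordinate Z_1:
print((A[:, 0] ** 2).sum() / np.trace(Sigma))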

Draft 52: Example: pricing an Asian basket option. We have $c$ assets and $d$ observation times. We want to estimate $E[f(\mathbf{U})]$, where
$$f(\mathbf{U}) = e^{-rT} \max\left(0,\ \frac{1}{cd} \sum_{i=1}^{c} \sum_{j=1}^{d} S_i(t_j) - K\right)$$
is the net discounted payoff and $S_i(t_j)$ is the price of asset $i$ at time $t_j$. Suppose $(S_1(t), \dots, S_c(t))$ obeys a geometric Brownian motion. Then $f(\mathbf{U}) = g(\mathbf{Y})$ where $\mathbf{Y} = (Y_1, \dots, Y_s) \sim N(\mathbf{0}, \boldsymbol{\Sigma})$ and $s = cd$. Even with a Cholesky decomposition of $\boldsymbol{\Sigma}$, the two-dimensional projections often account for more than 99% of the variance: low effective dimension in the superposition sense. With PCA or bridge sampling, we get low effective dimension in the truncation sense. In realistic examples, the first two coordinates $Z_1$ and $Z_2$ often account for more than 99.99% of the variance!
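A single-asset ($c = 1$) sketch of the estimator, combining the pieces above: PCA sampling of the Brownian path, inversion via the normal quantile (scipy.stats.norm.ppf is an assumed dependency), and the discounted payoff. The generating vector in the example call is a made-up placeholder, not a tabulated one:

```python
from scipy.stats import norm

def asian_rqmc(n, a, d, S0, K, r, sigma, T, rng):
    """RQMC estimate of an Asian option price with PCA path sampling."""
    t = np.arange(1, d + 1) * (T / d)
    A = pca_factor(np.minimum.outer(t, t))   # from the sketch above
    U = shifted_lattice(n, a, rng)           # n x d RQMC points
    Z = norm.ppf(U)                          # inversion: N(0, I) rows
    B = Z @ A.T                              # Brownian path values at t_j
    S = S0 * np.exp((r - 0.5 * sigma**2) * t + sigma * B)
    payoff = np.exp(-r * T) * np.maximum(0.0, S.mean(axis=1) - K)
    return payoff.mean()

print(asian_rqmc(8191, [1, 2431, 1945, 3775, 2048, 700, 3694, 3184],
                 8, 100.0, 100.0, 0.04, 0.2, 1.0, rng))
```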

Draft 53: Numerical experiment with $c = 10$ and $d = 25$. This gives a 250-dimensional integration problem. Let $\rho_{i,j} = 0.4$ for all $i \neq j$, $T = 1$, $\sigma_i = 0.1 + 0.4(i - 1)/9$ for all $i$, $r = 0.04$, $S(0) = 100$, and $K = 100$ (Imai and Tan 2002). Variance-reduction factors, Cholesky / PCA (experiment from 2003):

Korobov lattice rules      n = 16381 (a = 5693)   n = 65521 (a = 944)   n = 262139 (a = 21876)
Lattice + shift                18 /  878              18 / 1504              9 / 2643
Lattice + shift + baker        50 / 4553              46 / 3657             43 / 7553

Sobol' nets                n = 2^14               n = 2^16               n = 2^18
Sobol + shift                  10 / 1299              17 / 3184             32 /  6046
Sobol + LMS + shift             6 / 4232               4 / 9219             35 / 16557

Note: the payoff function is not smooth, and is also unbounded!

Draft 54: ANOVA variances for an ordinary Asian option with $S(0) = 100$, $K = 100$, $r = 0.05$, $\sigma = 0.5$. [Bar chart: % of total variance per cardinality of $\mathfrak{u}$, for $s = 3, 6, 12$ under sequential, Brownian-bridge (BB), and PCA sampling.]

Draft 55: Total variance per coordinate for the Asian option, $s = 6$, with $S(0) = 100$, $K = 100$, $r = 0.05$, $\sigma = 0.5$. [Bar chart: % of total variance carried by coordinates 1 to 6 under sequential, BB, and PCA sampling.]
