Long time scales and unlikely events: sampling and coarse graining strategies (PowerPoint PPT presentation)


  1. Long time scales and unlikely events: sampling and coarse graining strategies. Jonathan Weare, University of Chicago. October 26, 2012.

  2. Example: the Kuroshio. Figure: Top: mean flow paths in the large meander state. Bottom: mean flow paths in the small meander state.

  3. In the case of the meander transition of the Kuroshio we might like to know: How does this event occur, i.e. what rearrangements have to happen to trigger it? How does the frequency or severity of the event depend on various environmental parameters? Can we predict the event from data in real time? Similar questions are relevant for chemical reactions, the failure of a reliable electronic device, or dramatic swings in a stock price.

  4. Answering any of these questions requires not only that we can simulate the underlying system but that we can simulate the rare event itself. In the case of the Kuroshio one simulates on roughly the seconds-to-minutes time scale and the event occurs every 5–10 years. In the case of a chemical reaction one simulates on the femtosecond time scale and the event might occur, for example, on the time scale of seconds, a separation of roughly $10^{15}$ time steps. Certain memory devices are expected to carry out $10^{15}$ or so write operations without failure.

  5. So what are our options?
      1. We could build a faster, bigger computer (e.g. Anton), build faster models (e.g. dimensional reduction), or develop better parallelization strategies (e.g. parallel-in-time) to reach longer time scales with direct simulation.
      2. We could try to directly sample the rare events of interest, for example by importance sampling methods based on careful low temperature analysis, or by highly parallelizable ensemble methods that use particle interaction (e.g. cloning and killing) to bias the dynamics.
      3. We could try to find (analytically or computationally) a smaller/cheaper coarse grained model that describes the large scale features of the system we are really interested in. This may require modeling the memory, or at least quantifying its effect.

  8. Diffusion Monte Carlo (with M. Hairer). We'll begin with perhaps the simplest and most widely used ensemble sampling algorithm (though its use in rare event simulation is recent). The original motivation for DMC was to compute the ground state energy of the Hamiltonian $H\psi = -\tfrac{1}{2}\Delta\psi + u\,\psi$. But it is a very general idea.
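
For orientation (a standard Feynman-Kac observation, not spelled out on this slide): the weighted expectation that DMC computes,
$$\phi(x,t) \;=\; E_x\!\left[ f(W(t))\, e^{-\int_0^t u(W(s))\,ds} \right],$$
where $W$ is a Brownian motion started at $x$, solves $\partial_t \phi = \tfrac{1}{2}\Delta\phi - u\,\phi = -H\phi$, so $\phi(\cdot,t) = e^{-tH} f$. Expanding $f$ in eigenfunctions of $H$, the ground state dominates for large $t$, and the decay rate $-\tfrac{1}{t}\log\|\phi(\cdot,t)\|$ converges to the ground state energy.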

  9. Diffusion Monte Carlo generates an ensemble of points $X_i(t)$ so that
$$E\left[\,\sum_{i=1}^{N(t_k)} f(X_i(t_k))\right] = E\left[f(X(t_k))\, e^{-\sum_{j=1}^{k} v(X(t_{j-1}),\,X(t_j))}\right]$$
for any reasonable observable $f$, where $X(t_0), X(t_1), X(t_2), \ldots$ is some underlying process. The ensemble of points evolves in two steps:
      1. Evolve each point according to the underlying dynamics for one increment.
      2. To incorporate the additional "weight" factor $e^{-v(X(s-1),\,X(s))}$, copy particles with large weight and kill those with low weight.
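
To make the two steps concrete, here is a minimal DMC iteration in Python (a sketch, not the talk's code; the dynamics `step`, the weight function `v`, and the stochastic-rounding branching rule are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def dmc_step(ensemble, step, v):
    """One DMC iteration: propagate every walker, then branch on its weight.

    ensemble : array of current walker positions X_i(t_{k-1})
    step     : function advancing one walker by one increment of the dynamics
    v        : pairwise weight function v(x, y); each walker gets weight e^{-v}
    """
    new_ensemble = []
    for x in ensemble:
        y = step(x)                      # 1. evolve under the underlying dynamics
        w = np.exp(-v(x, y))             # incremental weight e^{-v(x, y)}
        # 2. branch: leave floor(w) or floor(w)+1 copies so that the expected
        #    number of copies equals w (a walker with 0 copies is killed).
        n_copies = int(np.floor(w + rng.uniform()))
        new_ensemble.extend([y] * n_copies)
    return np.array(new_ensemble)

# Toy usage: Gaussian random-walk dynamics with a quadratic "potential" weight.
step = lambda x: x + 0.1 * rng.standard_normal()
v = lambda x, y: 0.05 * (y**2 - x**2)    # placeholder choice of v
ensemble = np.zeros(1000)                # walkers started at the origin
for _ in range(50):
    ensemble = dmc_step(ensemble, step, v)
```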

  10. [Figure slide.]

  11. Over the last 40 years or so the basic DMC idea has spread. For example, in Sequential Monte Carlo (e.g. particle filters) one transforms $N$ samples $X_i(t_{k-1})$ approximately drawn from some density $p_{k-1}(x)$ into $N$ samples $X_i(t_k)$ approximately drawn from
$$p_k(y) \;\propto\; e^{-v(y)} \int p(y \mid x)\, p_{k-1}(x)\, dx,$$
where $e^{-v(y)}$ is some "weight" function (e.g. from data) and $p(y \mid x)$ is a transition density. This is done by:
      1. Sampling $X_i(t_k) \sim p(\,\cdot \mid X_i(t_{k-1}))$.
      2. Weighting each sample by $e^{-v(X_i(t_k))}$.
      3. Resampling from the weighted distribution.
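
A minimal sketch of this propagate/weight/resample step in Python (not from the slides; `transition` and `neg_log_w` are placeholders for the problem-specific $p(y \mid x)$ and $v$):

```python
import numpy as np

rng = np.random.default_rng(1)

def smc_update(samples, transition, neg_log_w):
    """One Sequential Monte Carlo step: propagate, weight, resample (N is fixed).

    samples    : array of N samples approximately drawn from p_{k-1}
    transition : function drawing y ~ p(. | x) for one sample x
    neg_log_w  : function v(y); each proposed sample gets weight e^{-v(y)}
    """
    # 1. sample X_i(t_k) ~ p(. | X_i(t_{k-1}))
    proposals = np.array([transition(x) for x in samples])
    # 2. weight each sample by e^{-v(X_i(t_k))}
    weights = np.exp(-np.array([neg_log_w(y) for y in proposals]))
    weights /= weights.sum()
    # 3. resample N indices from the weighted (multinomial) distribution
    idx = rng.choice(len(proposals), size=len(proposals), p=weights)
    return proposals[idx]

# Toy usage: random-walk transitions, weights favoring samples near zero.
transition = lambda x: x + 0.5 * rng.standard_normal()
neg_log_w = lambda y: 0.5 * y**2          # placeholder v(y)
samples = rng.standard_normal(500)        # N = 500 samples from p_0
samples = smc_update(samples, transition, neg_log_w)
```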

  13. In the Quantum Monte Carlo application $v$ is essentially specified by the potential in the Hamiltonian. In other applications we may want to choose $v$ to achieve some goal. What if we choose $v(x,y) = G(y) - G(x)$? Then if $G$ decreases in a step, more copies will be created. By choosing $G$ to be relatively small in a region, we can sample that region more thoroughly. DMC can compute
$$e^{-G(X(0))}\, E\!\left[\,\sum_{i=1}^{N(t_k)} f(X_i(t_k))\, e^{G(X_i(t_k))}\right] = E\big[f(X(t_k))\big].$$
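
To see where this comes from (filling in the telescoping step): with $v(x,y) = G(y) - G(x)$ the exponent in the DMC identity collapses,
$$\sum_{j=1}^{k} v\big(X(t_{j-1}), X(t_j)\big) = G(X(t_k)) - G(X(0)),$$
so
$$E\!\left[\,\sum_{i=1}^{N(t_k)} f(X_i(t_k))\right] = E\!\left[f(X(t_k))\, e^{-G(X(t_k)) + G(X(0))}\right].$$
Replacing $f$ by $f\,e^{G}$ and pulling the factor $e^{G(X(0))}$ out of the expectation (assuming a deterministic initial condition, as the displayed formula does) gives the identity above.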

  14. We point out that DMC is very general and is already the central ingredient in SMC. But DMC has an instability in the continuous time limit for certain choices of $v$ (e.g. rare event simulation or continuous time filtering). We suggest an improvement that stabilizes this limit. The improvement results in a method that is unconditionally superior to DMC (e.g. it improves QMC), but for rare event simulation and continuous time filtering it is crucial. The continuous time limit (the Brownian fan) of the modified method is interesting in its own right.

  15. A rare event example:
$$dX(t) = -\nabla V(X(t))\,dt + \sqrt{2\mu}\,dW(t)$$
Figure: (Left) Initial configuration $x_A$. (Right) A typical snapshot after the transition. The 7 discs interact with each other through
$$V(x) = 4 \sum_{i<j} \left( \|x_i - x_j\|^{-12} - \|x_i - x_j\|^{-6} \right).$$
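
A minimal sketch of how one might simulate this diffusion with an Euler-Maruyama discretization (the step size, number of steps, and hexagonal initial configuration are illustrative choices, not taken from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)

def lj_gradient(x):
    """Gradient of V(x) = 4 * sum_{i<j} (r_ij^-12 - r_ij^-6) for discs x of shape (7, 2)."""
    grad = np.zeros_like(x)
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            d = x[i] - x[j]
            r2 = np.dot(d, d)
            # contribution of the pair (i, j): 4 * (-12 r^-14 + 6 r^-8) * (x_i - x_j)
            coeff = 4.0 * (-12.0 * r2**-7 + 6.0 * r2**-4)
            grad[i] += coeff * d
            grad[j] -= coeff * d
    return grad

def euler_maruyama(x, mu, dt, n_steps):
    """Integrate dX = -grad V(X) dt + sqrt(2*mu) dW with a simple Euler-Maruyama scheme."""
    for _ in range(n_steps):
        noise = np.sqrt(2.0 * mu * dt) * rng.standard_normal(x.shape)
        x = x - lj_gradient(x) * dt + noise
    return x

# Illustrative initial configuration x_A: one disc at the center, six around it
# at the Lennard-Jones equilibrium distance 2^(1/6).
angles = np.linspace(0.0, 2.0 * np.pi, 6, endpoint=False)
x_A = np.vstack([[0.0, 0.0],
                 2**(1/6) * np.column_stack([np.cos(angles), np.sin(angles)])])
x = euler_maruyama(x_A, mu=0.1, dt=1e-5, n_steps=10_000)
```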

  16. Using $v(x,y) = G(y) - G(x)$ with
$$G(x) = \frac{\lambda}{\mu}\, \min_{i \ge 2} \left\| x_i - \frac{1}{7} \sum_j x_j \right\|,$$
where $x_1$ is the point at the center, we approximate $P(X(2) \in B)$, where $B$ is the event that
$$\min_{i \ge 2} \left\| x_i - \frac{1}{7} \sum_j x_j \right\| < 0.1.$$

      estimate          variance          brute force variance   µ       λ      workload
      1.125 × 10^-2     4.732 × 10^-3     1.112 × 10^-2          0.4     1.9      6.451
      2.340 × 10^-3     2.344 × 10^-4     2.335 × 10^-3          0.2     1.3      5.818
      7.723 × 10^-5     7.473 × 10^-7     7.722 × 10^-5          0.1     1        6.936
      9.290 × 10^-8     1.002 × 10^-11    9.290 × 10^-8          0.05    0.85    15.42
      1.129 × 10^-13    1.311 × 10^-21    1.129 × 10^-13         0.025   0.8    102.4

λ is adjusted so that the expected number of particles ending in B is close to 1.
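
For concreteness, the bias function $G$ and the event $B$ on this slide could be evaluated as follows (a sketch assuming the $(7, 2)$ configuration array layout of the previous snippet, with the centered disc, the slide's $x_1$, stored in row 0):

```python
import numpy as np

def min_dist_to_center_of_mass(x):
    """min over i >= 2 of || x_i - (1/7) * sum_j x_j || for a (7, 2) configuration x,
    where row 0 is the disc that starts at the center."""
    center_of_mass = x.mean(axis=0)
    return np.linalg.norm(x[1:] - center_of_mass, axis=1).min()

def G(x, lam, mu):
    """Bias function G(x) = (lambda / mu) * min_{i>=2} || x_i - (1/7) sum_j x_j ||."""
    return (lam / mu) * min_dist_to_center_of_mass(x)

def in_B(x):
    """The rare event B: some outer disc is within 0.1 of the center of mass."""
    return min_dist_to_center_of_mass(x) < 0.1
```

With $v(x,y) = G(y) - G(x)$ plugged into the DMC step sketched earlier, walkers that move an outer disc toward the center of mass branch more often, which is the mechanism that makes the small-µ probabilities in the table accessible.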

  17. Another data assimilation example: the solution to
$$\begin{aligned}
dX_1(t) &= 10\,\big(X_2(t) - X_1(t)\big)\,dt + \sqrt{2}\,dW_1(t) \\
dX_2(t) &= \big(X_1(t)(28 - X_3(t)) - X_2(t)\big)\,dt + \sqrt{2}\,dW_2(t) \\
dX_3(t) &= \big(X_1(t)X_2(t) - \tfrac{8}{3}X_3(t)\big)\,dt + \sqrt{2}\,dW_3(t)
\end{aligned}$$
is hidden from us. [Figure: a trajectory of this system plotted in (x, y, z).]
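
One way to generate such a hidden signal path is a direct Euler-Maruyama discretization (a sketch; the step size and initial condition are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def lorenz_drift(x):
    """Drift of the stochastic Lorenz 63 system on the slide, state x = (x1, x2, x3)."""
    x1, x2, x3 = x
    return np.array([10.0 * (x2 - x1),
                     x1 * (28.0 - x3) - x2,
                     x1 * x2 - (8.0 / 3.0) * x3])

def simulate_signal(x0, dt, n_steps):
    """Euler-Maruyama path of dX = drift(X) dt + sqrt(2) dW."""
    path = np.empty((n_steps + 1, 3))
    path[0] = x0
    for k in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(3)
        path[k + 1] = path[k] + lorenz_drift(path[k]) * dt + np.sqrt(2.0) * dW
    return path

# t in [0, 20] with step 1e-3
signal = simulate_signal(x0=np.array([1.0, 1.0, 25.0]), dt=1e-3, n_steps=20_000)
```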

  18. We see only
$$dH(t) = \begin{pmatrix} X_1(t) \\ X_2(t) \\ X_3(t) \end{pmatrix} dt + 0.1\, dB(t).$$
[Figure: the three observed components over time 0 to 20.]
Our task is to estimate the hidden signal by computing
$$E\!\left[\, (X_1(t), X_2(t), X_3(t)) \;\middle|\; \mathcal{F}^H_t \right].$$
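
Combining the pieces, here is a sketch of how the SMC update from earlier might be specialized to this problem, with the observation discretized as increments $\Delta H_k \approx X(t_k)\,\Delta t + 0.1\,\Delta B_k$ (the Gaussian increment likelihood and all helper names are illustrative assumptions, not the talk's algorithm):

```python
import numpy as np

rng = np.random.default_rng(4)
dt, sigma_obs = 1e-2, 0.1                 # observation noise level 0.1 as on the slide

def lorenz_drift(x):
    # Same Lorenz 63 drift as in the previous sketch.
    return np.array([10.0 * (x[1] - x[0]),
                     x[0] * (28.0 - x[2]) - x[1],
                     x[0] * x[1] - (8.0 / 3.0) * x[2]])

def propagate(x):
    """One Euler-Maruyama step of a particle under the signal dynamics."""
    return x + lorenz_drift(x) * dt + np.sqrt(2.0 * dt) * rng.standard_normal(3)

def neg_log_weight(x, dH):
    """v(x) = -log Gaussian likelihood of the observed increment dH given state x,
    using dH ~ N(x * dt, (0.1)^2 * dt * I) (additive constants dropped)."""
    residual = dH - x * dt
    return residual @ residual / (2.0 * sigma_obs**2 * dt)

def filter_step(particles, dH):
    """Propagate, weight by the observed increment, and resample (N stays fixed)."""
    particles = np.array([propagate(p) for p in particles])
    logw = -np.array([neg_log_weight(p, dH) for p in particles])
    w = np.exp(logw - logw.max())         # subtract the max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

# Given observed increments dH[k], the conditional mean E[X(t_k) | F^H_{t_k}] is
# approximated by the particle average after each update:
#   particles = filter_step(particles, dH[k]); estimate = particles.mean(axis=0)
```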

  19. With 10 particles: [Figure: the filtering results for x, y, and z plotted over time 0 to 20.]

  20. What if we'd done the same thing without the tickets (i.e. standard DMC)? [Figure: the corresponding results for x, y, and z over time 0 to 20.]
