Long time scales and unlikely events: sampling and coarse graining strategies
Jonathan Weare
University of Chicago
October 26, 2012
Example: the Kuroshio

Figure: Top: mean flow paths in the large meander state. Bottom: mean flow paths in the small meander state.
In the case of the meander transition of the Kuroshio we might like to know: How does this event occur, i.e. what rearrangements have to happen to trigger the event? How does the frequency or severity of the event depend on various environmental parameters? Can we predict the event from data in real time? Similar questions are relevant for chemical reactions, the failure of a reliable electronic device, or dramatic swings in a stock price.
Answering any of these questions requires not only that we can simulate the underlying system but that we can simulate the rare event itself. In the case of the Kuroshio one simulates on about the seconds–minutes time scale and the event occurs every 5–10 years. In the case of a chemical reaction one simulates on about the femtosecond time scale and the event might occur, for example, on the seconds time scale. Certain memory devices are expected to carry out $10^{15}$ or so write operations without failure.
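To put the chemical reaction numbers in perspective, a direct simulation at femtosecond resolution that waits for an event on the seconds time scale needs on the order of

$$ \frac{1\ \text{s between events}}{10^{-15}\ \text{s per time step}} = 10^{15} $$

time steps per observed event.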
So what are our options?
1. We could build a faster, bigger computer (e.g. Anton), build faster models (e.g. dimensional reduction), or develop better parallelization strategies (e.g. parallel-in-time) to reach longer time scales with direct simulation.
2. We could try to directly sample the rare events of interest, for example by importance sampling methods based on careful low temperature analysis, or by highly parallelizable ensemble methods that use particle interaction (e.g. cloning and killing) to bias dynamics.
3. We could try to find (analytically or computationally) a smaller/cheaper coarse grained model to describe the large scale features of the system that we are really interested in. This may require modeling the memory or at least quantifying its effect.
Diffusion Monte Carlo (with M. Hairer)

We'll begin with perhaps the simplest and most widely used ensemble sampling algorithm (though its use in rare event simulation is recent). The original motivation for DMC was to compute the ground state energy of the Hamiltonian

$$ H\psi = -\tfrac{1}{2}\Delta\psi + u\,\psi. $$

But it is a very general idea.
Diffusion Monte Carlo generates an ensemble of points $X_i(t)$ so that

$$ \mathbf{E}\left[\sum_{i=1}^{N(t_k)} f(X_i(t_k))\right] = \mathbf{E}\left[ f(X(t_k))\, e^{-\sum_{j=1}^{k} v(X(t_{j-1}),\, X(t_j))} \right] $$

for any reasonable observable $f$, where $X(t_0), X(t_1), X(t_2), \dots$ is some underlying process. The ensemble of points evolves in two steps:
1. Evolve each point according to the underlying dynamics for one increment.
2. To incorporate the additional "weight" factor $e^{-v(X(t_{s-1}),\, X(t_s))}$, copy particles with large weight and kill those with low weight.
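A minimal sketch of one iteration of this two-step evolution (the names `propagate` and `v` are placeholders for the problem-specific underlying dynamics and weight exponent; the branching rule shown is one common choice, not necessarily the one used in the talk):

```python
import numpy as np

def dmc_step(ensemble, propagate, v, rng):
    """One DMC iteration on a list of walker states.

    `propagate(x, rng)` advances a walker one increment of the underlying
    dynamics, and `v(x, y)` is the weight exponent; both are hypothetical,
    problem-specific callables.
    """
    new_ensemble = []
    for x in ensemble:
        y = propagate(x, rng)            # step 1: evolve the walker
        w = np.exp(-v(x, y))             # weight for this increment
        # step 2: clone/kill -- make floor(w) copies plus one more
        # with probability equal to the fractional part of w
        n_copies = int(np.floor(w + rng.uniform()))
        new_ensemble.extend([y] * n_copies)
    return new_ensemble
```

Sums over the surviving walkers then estimate the weighted expectations above.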
Over the last 40 years or so the basic DMC idea has spread. For example, in Sequential Monte Carlo (e.g. particle filters) one transforms $N$ samples $X_i(t_{k-1})$ approximately drawn from some density $p_{k-1}(x)$ into $N$ samples $X_i(t_k)$ approximately drawn from

$$ p_k(y) \propto \int e^{-v(y)}\, p(y \mid x)\, p_{k-1}(x)\, dx $$

where $e^{-v(y)}$ is some "weight" function (e.g. from data) and $p(y \mid x)$ is a transition density. This is done by:
1. Sampling $X_i(t_k) \sim p(\,\cdot \mid X_i(t_{k-1}))$.
2. Weighting each sample by $e^{-v(X_i(t_k))}$.
3. Resampling from the weighted distribution.
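For concreteness, a minimal sketch of this three-step update (assuming hypothetical problem-specific callables `sample_transition` for $p(\cdot \mid x)$ and `v` for the negative log-weight; plain multinomial resampling is used here for simplicity):

```python
import numpy as np

def smc_step(particles, sample_transition, v, rng):
    """One Sequential Monte Carlo update: propagate, weight, resample.

    `sample_transition(x, rng)` draws from p(. | x) and `v(y)` is the
    negative log-weight (e.g. from data); both are hypothetical,
    problem-specific callables.
    """
    N = len(particles)
    proposals = [sample_transition(x, rng) for x in particles]   # step 1: sample
    weights = np.exp(-np.array([v(y) for y in proposals]))       # step 2: weight
    weights /= weights.sum()
    idx = rng.choice(N, size=N, p=weights)                       # step 3: multinomial resampling
    return [proposals[i] for i in idx]
```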
In the Quantum Monte Carlo application $v$ is essentially specified by the potential in the Hamiltonian. In other applications we may want to choose $v$ to achieve some goal. What if we choose $v(x, y) = G(y) - G(x)$? Then if $G$ decreases in a step, more copies will be created, so by choosing $G$ to be relatively small in a region we can sample that region more thoroughly. With this choice DMC can compute

$$ e^{-G(X(0))}\, \mathbf{E}\left[ \sum_{i=1}^{N(t_k)} f(X_i(t_k))\, e^{G(X_i(t_k))} \right] = \mathbf{E}\big[ f(X(t_k)) \big]. $$
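A quick check of where this identity comes from (assuming a deterministic initial condition $X(0)$): with $v(x, y) = G(y) - G(x)$ the exponent in the general DMC identity telescopes,

$$ \sum_{j=1}^{k} v(X(t_{j-1}), X(t_j)) = G(X(t_k)) - G(X(0)), $$

so applying the identity to the observable $f e^{G}$ gives

$$ \mathbf{E}\Big[\sum_{i=1}^{N(t_k)} f(X_i(t_k))\, e^{G(X_i(t_k))}\Big] = \mathbf{E}\Big[ f(X(t_k))\, e^{G(X(t_k))}\, e^{-(G(X(t_k)) - G(X(0)))}\Big] = e^{G(X(0))}\, \mathbf{E}\big[ f(X(t_k)) \big]. $$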
We point out that DMC is very general and is already the central ingredient in SMC. But DMC has an instability in the continuous time limit for certain choices of $v$ (e.g. in rare event simulation or continuous time filtering). We suggest an improvement that stabilizes this limit. The improvement results in a method that is unconditionally superior to DMC (e.g. it improves QMC), but for rare event simulation and continuous time filtering it is crucial. The continuous time limit (the Brownian fan) of the modified method is interesting in its own right.
A rare event example:

$$ dX(t) = -\nabla V(X(t))\, dt + \sqrt{2\mu}\, dW(t) $$

Figure: (Left) Initial configuration $x_A$. (Right) A typical snapshot after the transition.

The 7 discs interact with each other through

$$ V(x) = 4 \sum_{i<j} \left( \|x_i - x_j\|^{-12} - \|x_i - x_j\|^{-6} \right). $$
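A minimal sketch of how one might simulate this system directly (assuming a two-dimensional configuration stored as a 7×2 float array; the finite-difference gradient is a simple stand-in, not the implementation behind the results below):

```python
import numpy as np

def lj_potential(x):
    """V(x) = 4 * sum_{i<j} (||xi - xj||^-12 - ||xi - xj||^-6) for x of shape (7, 2)."""
    V = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(x[i] - x[j])
            V += 4.0 * (r ** -12 - r ** -6)
    return V

def grad_lj(x, eps=1e-6):
    """Central finite-difference gradient of the potential (a simple stand-in)."""
    g = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        xp, xm = x.copy(), x.copy()
        xp[idx] += eps
        xm[idx] -= eps
        g[idx] = (lj_potential(xp) - lj_potential(xm)) / (2.0 * eps)
    return g

def euler_maruyama_step(x, dt, mu, rng):
    """One step of dX = -grad V(X) dt + sqrt(2 mu) dW."""
    return x - grad_lj(x) * dt + np.sqrt(2.0 * mu * dt) * rng.standard_normal(x.shape)
```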
Using

$$ v(x, y) = G(y) - G(x), \qquad G(x) = \frac{\lambda}{\mu}\, \min_{i \ge 2}\, \Big\| x_i - \frac{1}{7}\sum_{j} x_j \Big\|, $$

where $x_1$ is the point at the center, we approximate $\mathbf{P}(X(2) \in B)$, where $B$ is the event that

$$ \min_{i \ge 2}\, \Big\| x_i - \frac{1}{7}\sum_{j} x_j \Big\| < 0.1. $$

| estimate | variance | brute force variance × workload | µ | λ | workload |
|---|---|---|---|---|---|
| 1.125 × 10⁻² | 4.732 × 10⁻³ | 1.112 × 10⁻² | 0.4 | 1.9 | 6.451 |
| 2.340 × 10⁻³ | 2.344 × 10⁻⁴ | 2.335 × 10⁻³ | 0.2 | 1.3 | 5.818 |
| 7.723 × 10⁻⁵ | 7.473 × 10⁻⁷ | 7.722 × 10⁻⁵ | 0.1 | 1 | 6.936 |
| 9.290 × 10⁻⁸ | 1.002 × 10⁻¹¹ | 9.290 × 10⁻⁸ | 0.05 | 0.85 | 15.42 |
| 1.129 × 10⁻¹³ | 1.311 × 10⁻²¹ | 1.129 × 10⁻¹³ | 0.025 | 0.8 | 102.4 |

λ is adjusted so that the expected number of particles ending in $B$ is close to 1.
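A sketch of the corresponding biasing function and event indicator, which could be plugged into the `dmc_step` sketch above; configurations are assumed stored as a 7×2 array with the central disc first, and the λ/µ scaling in `G` is an assumption:

```python
import numpy as np

def G(x, lam, mu):
    """Biasing function: (lam / mu) times the minimal distance of any outer
    disc from the configuration's center of mass.  The lam/mu scaling is an
    assumption; x has shape (7, 2) with the central disc stored first."""
    center = x.mean(axis=0)
    return (lam / mu) * np.linalg.norm(x[1:] - center, axis=1).min()

def in_B(x):
    """Event B: some outer disc is within 0.1 of the center of mass."""
    center = x.mean(axis=0)
    return np.linalg.norm(x[1:] - center, axis=1).min() < 0.1
```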
Another data assimilation example: The solution to

$$ dX_1(t) = 10\,\big(X_2(t) - X_1(t)\big)\, dt + \sqrt{2}\, dW_1(t) $$
$$ dX_2(t) = \big(X_1(t)(28 - X_3(t)) - X_2(t)\big)\, dt + \sqrt{2}\, dW_2(t) $$
$$ dX_3(t) = \big(X_1(t) X_2(t) - \tfrac{8}{3} X_3(t)\big)\, dt + \sqrt{2}\, dW_3(t) $$

is hidden from us.

Figure: a three-dimensional plot (x, y, z) of the hidden signal.
We see only

$$ dH(t) = \begin{pmatrix} X_1(t) \\ X_2(t) \\ X_3(t) \end{pmatrix} dt + 0.1\, dB(t). $$

Figure: the components of the observation path plotted against time.

Our task is to estimate the hidden signal by computing

$$ \mathbf{E}\left[ (X_1(t), X_2(t), X_3(t)) \,\middle|\, \mathcal{F}^H_t \right]. $$
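For context (standard filtering background, not from the slides): in this continuous-time setting the role of the weight $e^{-v}$ is typically played by the Girsanov (Kallianpur–Striebel) likelihood of the observed increment. With observation noise $\sigma = 0.1$ it takes the form

$$ \exp\!\left( \frac{1}{\sigma^{2}} \int_{t_{k-1}}^{t_k} X(s)^{\mathsf T}\, dH(s) \;-\; \frac{1}{2\sigma^{2}} \int_{t_{k-1}}^{t_k} \|X(s)\|^{2}\, ds \right), \qquad X = (X_1, X_2, X_3)^{\mathsf T}. $$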
With 10 particles:

Figure: estimates of x, y, and z plotted against time.
What if we'd done the same thing without the tickets (i.e. standard DMC)?

Figure: estimates of x, y, and z plotted against time.