+ 2D Ising Model (3)
- P(s) = e^(−β·H(s)) / Z, where H(s) is the energy of configuration s and Z is the normalizing constant
- If β = 0, all spin configurations have the same probability
- If β > 0, lower-energy configurations are preferred
+ Exact Inference is Hard
- Posterior distribution over s: if y is an observed variable, P(s | y) = P(s, y) / P(y)
- Intractable computations:
- Joint probability distribution: 2^N possible configurations of s
- Marginal probability distribution at a site
- MAP estimation
+ The Big Question
- Given a distribution π on S = {s_1, s_2, …, s_k}, simulate a random object with distribution π
+ Methods
- Generate random samples to estimate a quantity
- Samples are generated "Markov-chain style"
- Markov Chain Monte Carlo (MCMC)
- Propp-Wilson simulation: a Las Vegas variant
- Sandwiching: an improvement on Propp-Wilson
+ Markov Chain Monte Carlo Method (MCMC)
Presented by Yamilet R. Serrano Ll.
+ Recall
Given a probability distribution π on S = {s_1, …, s_k}, how do we simulate a random object with distribution π?
+ Intuition
Given a probability distribution π on S = {s_1, …, s_k}, how do we simulate a random object with distribution π when S is a high-dimensional space?
+ MCMC
1. Construct an irreducible and aperiodic Markov chain X_0, X_1, …, whose stationary distribution is π.
2. If we run the chain with an arbitrary initial distribution, the Markov chain convergence theorem guarantees that the distribution of the chain at time n converges to π.
3. Hence, if we run the chain for a sufficiently long time n, the distribution of X_n will be very close to π, so X_n can be used as a sample.
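As a quick illustration of this recipe, here is a minimal sketch (the two-state chain and its transition probabilities are assumptions for illustration, not from the slides): run an irreducible, aperiodic chain and check that the empirical distribution of X_n approaches its stationary distribution π = (0.75, 0.25).

```python
# Minimal MCMC sketch: simulate a small irreducible, aperiodic chain and
# check that long-run state frequencies approach the stationary distribution.
import random

P = {0: [(0, 0.9), (1, 0.1)],   # transitions from state 0
     1: [(0, 0.3), (1, 0.7)]}   # transitions from state 1; stationary pi = (0.75, 0.25)

def step(x):
    r, acc = random.random(), 0.0
    for y, p in P[x]:
        acc += p
        if r < acc:
            return y
    return x

counts, x = [0, 0], 0
for n in range(100_000):
    x = step(x)
    if n >= 1_000:                     # discard a burn-in period
        counts[x] += 1
total = sum(counts)
print([c / total for c in counts])     # close to (0.75, 0.25)
```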
+ MCMC (2)
- Generally, two types of MCMC algorithms:
- Metropolis-Hastings
- Gibbs sampling
+ Metropolis-Hastings Algorithm
- Original method: Metropolis, Rosenbluth, Rosenbluth, Teller and Teller (1953)
- Generalized by Hastings in 1970
- Rediscovered by Tanner and Wong (1987) and Gelfand and Smith (1990)
- One way to implement MCMC
[Photo: Nicholas Metropolis]
+ Metropolis-Hastings Algorithm: Basic Idea
GIVEN: a probability distribution π on S = {s_1, …, s_k}
GOAL: approximately sample from π
- Start with a proposal distribution Q(x, y), where x is the current state and y is the new proposal
- Q(x, y) specifies the transitions of the Markov chain; it plays the role of the transition matrix
- By accepting/rejecting the proposals, MH simulates a Markov chain whose stationary distribution is π
+ Metropolis-Hastings Algorithm: Algorithm
GIVEN: a probability distribution π on S = {s_1, …, s_k}
GOAL: approximately sample from π
Given the current sample x:
- Draw y from the proposal distribution Q(x, ·)
- Draw U ~ Uniform(0, 1) and move to y if U ≤ α(x, y), otherwise stay at x,
where the acceptance probability is α(x, y) = min(1, [π(y)·Q(y, x)] / [π(x)·Q(x, y)])
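A minimal sketch of one MH step and its use (the target weights and the neighbour proposal are illustrative assumptions; the proposal is symmetric, so the Hastings ratio reduces to π(y)/π(x)):

```python
# Metropolis-Hastings sketch: sample from an unnormalized target pi on
# {0, ..., k-1} with a symmetric random-walk proposal.
import random

weights = [1.0, 2.0, 4.0, 2.0, 1.0]        # unnormalized target pi
k = len(weights)

def mh_step(x):
    y = (x + random.choice([-1, 1])) % k   # propose a neighbouring state
    # symmetric proposal, so accept with probability min(1, pi(y)/pi(x))
    if random.random() < min(1.0, weights[y] / weights[x]):
        return y                           # accept the proposal
    return x                               # reject: keep the current state

x, counts = 0, [0] * k
for _ in range(200_000):
    x = mh_step(x)
    counts[x] += 1
print([c / sum(counts) for c in counts])   # approaches weights / sum(weights)
```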
+ Metropolis-Hastings Algorithm: The Ising Model
- Consider m sites around a circle; each site i can have one of two spins x_i ∈ {−1, 1}
- The target distribution: π(x) ∝ exp(β Σ_i x_i x_{i+1}), with indices taken around the circle
+ Metropolis-Hastings Algorithm: The Ising Model
- Target distribution: π(x) ∝ exp(β Σ_i x_i x_{i+1})
- Proposal distribution: 1. randomly pick one out of the m spins; 2. flip its sign
- Acceptance probability (say the i-th spin is flipped): α = min(1, π(x′)/π(x)) = min(1, exp(−2β x_i (x_{i−1} + x_{i+1})))
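A sketch of this single-spin-flip chain on the circle (the values of m and β are illustrative assumptions):

```python
# Single-spin-flip Metropolis-Hastings for m spins on a circle with target
# pi(x) proportional to exp(beta * sum_i x_i * x_{i+1}).
import math, random

m, beta = 20, 0.5
x = [random.choice([-1, 1]) for _ in range(m)]

def mh_flip(x):
    i = random.randrange(m)                      # proposal: pick a spin uniformly
    left, right = x[(i - 1) % m], x[(i + 1) % m]
    # flipping x[i] only changes the terms x_{i-1}x_i and x_i x_{i+1}, so
    # pi(x')/pi(x) = exp(-2 * beta * x[i] * (left + right))
    if random.random() < min(1.0, math.exp(-2 * beta * x[i] * (left + right))):
        x[i] = -x[i]                             # accept the flip
    return x

for _ in range(10_000):
    x = mh_flip(x)
print(x, sum(x))                                 # a (approximate) sample and its total spin
```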
+ Metropolis-Hastings Algorithm: The Ising Model
[Image taken from "Monte Carlo Investigation of the Ising Model" (2006), Tobin Fricke]
+ Disadvantages of MCMC
- No matter how large n is taken to be in the MCMC algorithm, there will still be some discrepancy between the distribution of the output and the target distribution π
- To make this error small, we need to figure out how large n needs to be
+ Bounds on Convergence of MCMC
Presented by Aditya Kulkarni
+ Seminar on Probabilities (Strasbourg, 1983)
- If it is difficult to obtain asymptotic bounds on the convergence time of MCMC algorithms for Ising models, use quantitative bounds
- Use characteristics of the Ising-model MCMC algorithm to obtain quantitative bounds
[Photo: David Aldous]
+ What We Need to Ensure
- As time passes, we approach the stationary distribution in a monotonically decreasing fashion: d(t) → 0 as t → ∞
- When we stop, the sample should follow a distribution that is no further from the stationary distribution than some tolerance ε: d(t) ≤ ε
+ Total Variation Distance ||·||_TV
- Given two distributions μ and ν over a finite set of states S, the total variation distance ||·||_TV is
||μ − ν||_TV = (1/2)·||μ − ν||_1 = (1/2)·Σ_{x∈S} |μ(x) − ν(x)|
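The definition translates directly into code; a small sketch:

```python
# Total variation distance between two distributions given as probability
# vectors over the same state space: half the L1 distance.
def tv_distance(mu, nu):
    return 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))

print(tv_distance([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]))  # 0.5
```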
+ Convergence Time
- Let X = X_0, X_1, … be a Markov chain on a state space S with stationary distribution π
- Define d(t) as the worst total variation distance at time t:
d(t) = max_{x∈S} ||P(X_t = · | X_0 = x) − π||_TV
- The mixing time τ(ε) is the minimum time t such that d(t) is at most ε:
τ(ε) = min{t : d(t) ≤ ε}
- Define τ = τ(1/(2e)) = min{t : d(t) ≤ 1/(2e)}
+ Quantitative Results
[Figure: d(t) decreasing from 1 towards 0 as t grows, with the tolerance ε and the mixing time τ(ε) marked on the axes]
+ Lemma
- Consider two random walks started from state i and state j; define d̄(t) as the worst total variation distance between their respective probability distributions at time t
a. d(t) ≤ d̄(t) ≤ 2·d(t)
b. d̄(t) is submultiplicative, d̄(s + t) ≤ d̄(s)·d̄(t), and in particular decreasing
+ Upper Bound on d(t): Proof
- From part (a) and the definition of τ = τ(1/(2e)) = min{t : d(t) ≤ 1/(2e)} we get
d̄(τ) ≤ 2·d(τ) ≤ e^(−1)
- From part (b), an upper bound on d̄ at a particular time t_0 gives upper bounds for later times:
d̄(t) ≤ (d̄(t_0))^n for n·t_0 ≤ t ≤ (n + 1)·t_0, i.e. d̄(t) ≤ (d̄(t_0))^(t/t_0 − 1)
- Substituting τ for t_0:
d(t) ≤ d̄(t) ≤ (d̄(τ))^(t/τ − 1) ≤ exp(1 − t/τ), t ≥ 0
+ Upper Bound on τ(ε): Proof
- Algebraic calculations, starting from d(t) ≤ exp(1 − t/τ):
log d(t) ≤ 1 − t/τ
t/τ ≤ 1 − log d(t) = 1 + log(1/d(t))
- So d(t) ≤ ε is guaranteed once t ≥ τ·(1 + log(1/ε)), giving
τ(ε) ≤ τ·(1 + log(1/ε))
τ(ε) ≤ 2e·t·(1 + log(1/ε)) for any t with d(t) ≤ 1/(2e), since then τ ≤ t
+ Upper Bounds
d(t) ≤ min(1, exp(1 − t/τ)), t ≥ 0
τ(ε) ≤ τ·(1 + log(1/ε)), 0 < ε < 1
τ(ε) ≤ 2e·t·(1 + log(1/ε)), 0 < ε < 1, for t with d(t) ≤ 1/(2e)
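A numerical sketch of the first bound (the two-state chain is an assumed example): compute d(t) exactly by matrix powers, find τ = τ(1/(2e)) by search, and compare d(t) with exp(1 − t/τ):

```python
# Compute the worst-case TV distance d(t) exactly for a small chain and
# compare it against the upper bound d(t) <= exp(1 - t/tau).
import numpy as np

P = np.array([[0.9, 0.1], [0.3, 0.7]])
pi = np.array([0.75, 0.25])                        # stationary distribution

def d(t):                                          # worst-case TV distance at time t
    Pt = np.linalg.matrix_power(P, t)
    return max(0.5 * np.abs(Pt[i] - pi).sum() for i in range(2))

tau = next(t for t in range(1, 100) if d(t) <= 1 / (2 * np.e))
for t in [tau, 2 * tau, 4 * tau]:
    print(t, d(t), np.exp(1 - t / tau))            # d(t) stays below the bound
```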
+ Entropy of the Initial Distribution
- Measure of randomness:
ent(μ) = −Σ_{x∈S} μ(x) log μ(x)
where x is an initial state and μ is the initial probability distribution
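In code, with the usual convention 0·log 0 = 0, a small sketch:

```python
# Entropy of a probability vector, skipping zero-probability states.
import math

def ent(mu):
    return -sum(p * math.log(p) for p in mu if p > 0)

print(ent([0.5, 0.5]))   # log 2 ~ 0.693: maximal randomness on 2 states
print(ent([1.0, 0.0]))   # 0: no randomness at all
```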
+ A Few More Lemmas
1. Let (X_t) be the random walk associated with step distribution μ, and let μ_t be the distribution of X_t; then ent(μ_t) ≤ t · ent(μ)
2. If ν is a distribution on S such that ||ν − π||_TV ≤ ε, then ent(ν) ≥ (1 − ε) log |S| (here π is uniform, so ent(π) = log |S|)
+ Lower Bound on d(t): Proof
- Apply lemma 2 with ν = μ_t and ε = d(t):
ent(μ_t) ≥ (1 − d(t)) log |S|
d(t) ≥ 1 − ent(μ_t) / log |S|
- Then by lemma 1, ent(μ_t) ≤ t · ent(μ), so
d(t) ≥ 1 − t · ent(μ) / log |S|
+ Lower Bound on τ(ε): Proof
- From lemma 1, ent(μ_t) ≤ t · ent(μ), so t ≥ ent(μ_t) / ent(μ)
- From lemma 2, if ||μ_t − π||_TV ≤ ε then ent(μ_t) ≥ (1 − ε) log |S|, so
t ≥ (1 − ε) log |S| / ent(μ)
τ(ε) ≥ (1 − ε) log |S| / ent(μ)
+ Lower Bounds
d(t) ≥ 1 − t · ent(μ) / log |S|
τ(ε) ≥ (1 − ε) log |S| / ent(μ)
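A sketch of evaluating the τ(ε) lower bound on an assumed example: the lazy random walk on the hypercube {−1, +1}^n (a random walk on a group, so lemma 1 applies), whose step distribution holds with probability 1/2 and otherwise flips one of the n coordinates uniformly at random:

```python
# Lower bound tau(eps) >= (1 - eps) * log|S| / ent(mu) for the lazy walk on
# the hypercube {-1,+1}^n: |S| = 2^n, and the step distribution puts mass
# 1/2 on "hold" and 1/(2n) on each of the n coordinate flips.
import math

n, eps = 100, 0.25
ent_mu = 0.5 * math.log(2) + 0.5 * math.log(2 * n)   # entropy of the step distribution
log_S = n * math.log(2)                              # log of the number of states
print((1 - eps) * log_S / ent_mu)                    # ~ 17.4: at least that many steps
```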
+ Propp-Wilson
Presented by Tobias Bertelsen
+ An Exact Version of MCMC
- Problems with MCMC:
A. It has an accuracy error, which depends on the starting state
B. We must know the number of iterations in advance
- James Propp and David Wilson proposed Coupling From The Past (1996)
- A.k.a. the Propp-Wilson algorithm
- Idea: solve both problems by running the chain "infinitely"
+ An Exact Version of MCMC
- Theoretical: runs all configurations infinitely; literally takes infinite time; impossible
- Coupling from the past: runs all configurations for a finite time; might take thousands of years; infeasible
- Sandwiching: runs a few configurations for a finite time; takes seconds; practicable
+ Theoretical Exact Sampling
- Recall the convergence theorem: we approach the stationary distribution as the number of steps goes to infinity
- Intuitive approach: to sample perfectly, start a chain and run it forever; start at t = 0, sample at t = ∞
- Problem: we never get a sample
- Alternative approach: to sample perfectly, take a chain that has already been running for an infinite amount of time; start at t = −∞, sample at t = 0
+ Theoretical Independence of the Starting State
- A sample from a Markov chain in MCMC depends solely on:
- the starting state X_{−∞}
- the sequence of random numbers U
- We want to be independent of the starting state
- For a given sequence of random numbers U_{−∞}, …, U_{−1}, we want to ensure that the starting state X_{−∞} has no effect on X_0
+ Theoretical Independence of the Starting State
- Collisions: for a given U_{−∞}, …, U_{−1}, if two Markov chains are in the same state at some time t′, they will continue on together
+ Coupling From the Past
- At some finite past time t = −T, all past chains have already run infinitely long and have coupled into one (−∞ − T = −∞)
- We want to continue that coupled chain to t = 0
- But we don't know in which state the chains will be at t = −T
- Run chains from all states at −T instead of from −∞
+ Coupling From the Past
1. Let U = U_{−1}, U_{−2}, U_{−3}, … be a sequence of independent uniform random numbers
2. For N_j ∈ {1, 2, 4, 8, …}:
   1. Extend U to length N_j, keeping U_{−1}, …, U_{−N_{j−1}} the same
   2. Start one chain from each state at t = −N_j
   3. For t from −N_j to zero: simulate the chains using U_t
   4. If all chains have converged to the same state x at t = 0, return x
   5. Else repeat the loop
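A runnable sketch of this loop; the update rule used here is the ladder walk that reappears in the sandwiching section (an assumed stand-in, with 0-indexed states):

```python
# Coupling from the past: double the run length, reuse the same random
# numbers, start one chain from every state, and return the common state
# once all chains have coalesced at time 0.
import random

k = 5

def update(state, u):                     # monotone ladder-walk update
    if u < 0.5:
        return max(state - 1, 0)          # step down (hold at the bottom)
    return min(state + 1, k - 1)          # step up (hold at the top)

def cftp():
    us = []                               # us[t-1] is U_{-t}, reused across runs
    n = 1
    while True:
        while len(us) < n:                # extend, keeping earlier numbers fixed
            us.append(random.random())
        states = list(range(k))           # one chain per state, started at t = -n
        for t in range(n, 0, -1):         # apply U_{-n}, ..., U_{-1}
            u = us[t - 1]
            states = [update(s, u) for s in states]
        if len(set(states)) == 1:         # all chains coalesced at t = 0
            return states[0]
        n *= 2                            # otherwise restart further in the past

print([cftp() for _ in range(10)])        # exact samples from the uniform pi
```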
+ Coupling From the Past
[Figure: chains started from all states at increasingly distant past times, coalescing before t = 0]
+ Questions
- Why do we double the lengths?
- Worst case < 4·N_conv steps, where N_conv is the minimal N at which we can achieve convergence
- Compare to N ∈ [1, 2, 3, 4, 5, …]: O(N_conv²) steps
- Why do we have to use the same random numbers?
- Different samples might take longer or shorter to converge
- We must evaluate the same sample in each iteration
- The sample should depend only on U, not on the different N
+ Using the Same U
- We have k states with the following update function:
f(x_i, u) = x_1 if u < 1/2, otherwise x_{i+1} (for i < k)
f(x_k, u) = x_1 if u < 1/2, otherwise x_k
- The stationary distribution is π(x_i) = 2^(−i) for i < k, and π(x_k) = 2^(−(k−1))
+ Using the Same U
- The probability of ending in x_1 depends only on the last random number: P(X_0 = x_1) = P(U_{−1} < 1/2) = 1/2
- Suppose we generate a new U for each run: U^1, U^2, U^3, …
N = 1, U^1 = [u^1_{−1}]: return x_1 if u^1_{−1} < 1/2; accumulated P(x_1) = 50%
N = 2, U^2 = [u^2_{−2}, u^2_{−1}]: return x_1 if u^1_{−1} < 1/2 ∨ u^2_{−1} < 1/2; accumulated P(x_1) = 75%
N = 4, U^3 = [u^3_{−4}, …, u^3_{−1}]: accumulated P(x_1) = 81.25%
N = 8, U^4 = [u^4_{−8}, …, u^4_{−1}]: accumulated P(x_1) = 81.64%
- The output is biased towards x_1, although it should occur with probability 50%
+ Using the Same U
- The probability of ending in x_1 depends only on the last random number: P(X_0 = x_1) = P(U_{−1} < 1/2) = 1/2
- Let's instead use the same U for each run:
N = 1, U = [u_{−1}]: return x_1 if u_{−1} < 1/2; accumulated P(x_1) = 50%
N = 2, U = [u_{−2}, u_{−1}]: still return x_1 iff u_{−1} < 1/2; accumulated P(x_1) = 50%
N = 4, U = [u_{−4}, …, u_{−1}]: accumulated P(x_1) = 50%
N = 8, U = [u_{−8}, …, u_{−1}]: accumulated P(x_1) = 50%
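A simulation sketch of the two previous slides (k = 4 is an assumption; state 0 plays the role of x_1): regenerating U on every restart biases the output towards x_1, while reusing U keeps P(x_1) at 1/2:

```python
# Compare coupling from the past with fresh randomness per restart (biased)
# against reused randomness (exact), for the k-state example chain:
# jump to x_1 with probability 1/2, otherwise move one state up (x_k holds).
import random

k = 4

def update(state, u):
    if u < 0.5:
        return 0                       # jump to x_1
    return min(state + 1, k - 1)       # otherwise move up; x_k stays put

def run(fresh_randomness):
    us, n = [], 1
    while True:
        if fresh_randomness:
            us = [random.random() for _ in range(n)]   # WRONG: regenerate all
        else:
            while len(us) < n:                         # correct: extend only
                us.append(random.random())
        states = list(range(k))
        for t in range(n, 0, -1):
            states = [update(s, us[t - 1]) for s in states]
        if len(set(states)) == 1:
            return states[0]
        n *= 2

for fresh in (True, False):
    samples = [run(fresh) for _ in range(20_000)]
    # fresh: ~0.81 for k = 4; reused: ~0.50, matching pi(x_1) = 1/2
    print("fresh" if fresh else "reused", samples.count(0) / len(samples))
```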
+ Problem
- In each step we update up to k chains
- Total execution time is O(N_conv · k)
- BUT k = 2^(L²) in the Ising model
- Worse than the naïve approach O(k)
+ Sandwiching
Presented by Nirandika Wanigasekara
+ Sandwiching
- Many vertices: running k Propp-Wilson chains will take too much time
- The algorithm is impractical for large k
- Can we choose a relatively small set of chains and still get the same results?
- Try sandwiching
+ Sandwiching
- Idea: find two chains bounding all other chains
- If we have two such boundary chains:
- Check whether those two chains converge
- Then all other chains have also converged
+ Sandwiching
- To come up with the boundary chains:
- We need a way to order the states: s_1 ≤ s_2 ≤ s_3 ≤ …
- A chain in a higher state does not cross a chain in a lower state
- This results in a Markov chain obeying certain monotonicity properties
+ Sandwiching
- Let's consider:
- A fixed set of states: state space S = {1, …, k}
- A transition matrix P with:
P_{1,1} = P_{1,2} = 1/2
P_{k,k} = P_{k,k−1} = 1/2
P_{i,i−1} = P_{i,i+1} = 1/2 for i = 2, …, k − 1
- All other entries are 0
+ Sandwiching
- What is this Markov chain doing?
- At each integer time it takes one step up or one step down the ladder, each with probability 1/2
- At the top (state k) or the bottom (state 1), it stays where it is with probability 1/2
- This is the ladder walk on k vertices
- The stationary distribution π of this Markov chain is uniform: π(i) = 1/k for i = 1, …, k
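A quick numerical sketch checking that the stationary distribution is uniform: build the transition matrix described above and look at a high matrix power:

```python
# Ladder walk on k states: the chain is doubly stochastic, so its
# stationary distribution is uniform; verify by iterating the matrix.
import numpy as np

k = 5
P = np.zeros((k, k))
P[0, 0] = P[0, 1] = 0.5                   # bottom: hold or step up
P[k - 1, k - 1] = P[k - 1, k - 2] = 0.5   # top: hold or step down
for i in range(1, k - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5       # interior: step down or up

print(np.linalg.matrix_power(P, 200)[0])  # ~ [0.2, 0.2, 0.2, 0.2, 0.2]
```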
+ Sandwiching
- Propp-Wilson for this Markov chain with:
- A valid update function φ: if u < 1/2 then step down, if u ≥ 1/2 then step up
- Negative starting times N_1, N_2, … = 1, 2, 4, 8, …
- Number of states k = 5
+ Sandwiching
[Figure: Propp-Wilson run for the ladder walk with k = 5]
+ Sandwiching
- The update function preserves the ordering between states:
for all u ∈ [0, 1] and all i, j ∈ {1, …, k} such that i ≤ j, we have φ(i, u) ≤ φ(j, u)
- It is therefore sufficient to run only 2 chains rather than k
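A sketch of the resulting sandwiched Propp-Wilson run for the ladder walk, tracking only the bottom and top chains:

```python
# Monotone coupling from the past: because the update preserves the order
# of states, tracking the chains started from the bottom and top states
# is enough; when they meet, every intermediate chain has met too.
import random

k = 5

def update(state, u):
    if u < 0.5:
        return max(state - 1, 0)      # step down (hold at the bottom)
    return min(state + 1, k - 1)      # step up (hold at the top)

def monotone_cftp():
    us, n = [], 1
    while True:
        while len(us) < n:            # reuse and extend the random numbers
            us.append(random.random())
        low, high = 0, k - 1          # only the two boundary chains
        for t in range(n, 0, -1):
            low, high = update(low, us[t - 1]), update(high, us[t - 1])
        if low == high:               # the sandwich has closed
            return low
        n *= 2

samples = [monotone_cftp() for _ in range(10_000)]
print([samples.count(i) / len(samples) for i in range(k)])  # ~ uniform
```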
+ Sandwiching
- Are these conditions always met?
- No, not always
- But there are frequent instances where these conditions are met
- Especially useful when k is large
- The Ising model is a good example of this
+ Ising Model
Presented by Malay Singh
+ 2D Ising Model
- A grid with sites numbered from 1 to L², where L is the grid size
- Each site x can have spin s(x) ∈ {−1, +1}
- V is the set of sites; S = {−1, 1}^V defines all possible configurations (states)
- The magnetisation m of a state s is m(s) = (Σ_{x∈V} s(x)) / L²
- The energy of a state s is H(s) = −Σ_{⟨x,y⟩} s(x)·s(y), summing over pairs of neighbouring sites
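These definitions translate directly into code; a sketch (free boundary conditions are assumed):

```python
# Magnetisation and energy of an L x L spin configuration, stored as a
# list of lists with entries in {-1, +1}; each neighbour pair counted once.
def magnetisation(s, L):
    return sum(map(sum, s)) / L**2

def energy(s, L):
    H = 0
    for i in range(L):
        for j in range(L):
            if i + 1 < L:
                H -= s[i][j] * s[i + 1][j]   # vertical neighbour pair
            if j + 1 < L:
                H -= s[i][j] * s[i][j + 1]   # horizontal neighbour pair
    return H
```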
+ Ordering of States
- For two states s, s′ we say s ≤ s′ if s(x) ≤ s′(x) for all x ∈ V
- The maximum is the all-plus state (m = 1): s_max(x) = +1 for all x ∈ V
- The minimum is the all-minus state (m = −1): s_min(x) = −1 for all x ∈ V
- Hence s_min ≤ s ≤ s_max for all s
+ The Update Function
- We use the sequence of random numbers U_n, U_{n−1}, …, U_0
- To update the state s at step n, we choose a site x among the L² sites uniformly at random
- Writing δ₊(x, s) and δ₋(x, s) for the number of neighbours of x with spin +1 and −1 respectively:
X_{n+1}(x) = +1 if U_{n+1} < exp(2β(δ₊(x, s) − δ₋(x, s))) / (exp(2β(δ₊(x, s) − δ₋(x, s))) + 1), and −1 otherwise
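A sketch of this heat-bath update in code (free boundary conditions and the parameter values are assumptions):

```python
# Single-site heat-bath update for the 2D Ising model: site (i, j) becomes
# +1 with probability exp(2*beta*(d_plus - d_minus)) /
# (exp(2*beta*(d_plus - d_minus)) + 1), where d_plus/d_minus count
# neighbours with spin +1/-1.
import math, random

L, beta = 16, 0.3

def neighbours(i, j):
    return [(a, b) for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            if 0 <= a < L and 0 <= b < L]

def heat_bath_update(s, site, u):
    i, j = site
    d_plus = sum(s[a][b] == +1 for a, b in neighbours(i, j))
    d_minus = sum(s[a][b] == -1 for a, b in neighbours(i, j))
    w = math.exp(2 * beta * (d_plus - d_minus))
    s[i][j] = +1 if u < w / (w + 1) else -1
    return s

s = [[random.choice([-1, 1]) for _ in range(L)] for _ in range(L)]
for _ in range(50_000):
    site = (random.randrange(L), random.randrange(L))
    s = heat_bath_update(s, site, random.random())
print(sum(map(sum, s)) / L**2)   # magnetisation of the current state
```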
+ Maintaining Ordering
- Ordering after an update (from X_n to X_{n+1}):
- We choose the same site x to update in both chains
- The new spin of x in X_{n+1} depends on the update function
- For s ≤ s′ we want to check that
exp(2β(δ₊(x, s) − δ₋(x, s))) / (exp(2β(δ₊(x, s) − δ₋(x, s))) + 1) ≤ exp(2β(δ₊(x, s′) − δ₋(x, s′))) / (exp(2β(δ₊(x, s′) − δ₋(x, s′))) + 1)
- That is equivalent to checking
δ₊(x, s) − δ₋(x, s) ≤ δ₊(x, s′) − δ₋(x, s′)
+ Maintaining Ordering
- As s ≤ s′ we have:
δ₊(x, s) ≤ δ₊(x, s′)
δ₋(x, s) ≥ δ₋(x, s′)
- Subtracting the second inequality from the first:
δ₊(x, s) − δ₋(x, s) ≤ δ₊(x, s′) − δ₋(x, s′)
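Putting the pieces together, a sketch of sandwiched coupling from the past for this model, running only the two boundary chains s_min and s_max (L, β, and free boundaries are illustrative assumptions):

```python
# Monotone coupling from the past for the 2D Ising model: by the ordering
# argument above, it suffices to run the all-minus and all-plus chains with
# shared randomness, a (site, u) pair per time step.
import math, random

L, beta = 4, 0.3

def neighbours(i, j):
    return [(a, b) for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            if 0 <= a < L and 0 <= b < L]

def update(s, move):
    (i, j), u = move
    diff = sum(s[a][b] for a, b in neighbours(i, j))    # d_plus - d_minus
    p_plus = math.exp(2 * beta * diff) / (math.exp(2 * beta * diff) + 1)
    s[i][j] = +1 if u < p_plus else -1

def ising_cftp():
    moves, n = [], 1                   # moves[t-1] is the randomness for time -t
    while True:
        while len(moves) < n:          # reuse and extend the randomness
            moves.append(((random.randrange(L), random.randrange(L)),
                          random.random()))
        low = [[-1] * L for _ in range(L)]     # s_min: all spins -1
        high = [[+1] * L for _ in range(L)]    # s_max: all spins +1
        for t in range(n, 0, -1):
            update(low, moves[t - 1])
            update(high, moves[t - 1])
        if low == high:                # sandwich closed: an exact sample
            return low
        n *= 2

sample = ising_cftp()
print(sum(map(sum, sample)) / L**2)    # magnetisation of an exact sample
```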
+ Ising Model, L = 4
[Figures: sample configurations for T = 3.5 with N = 512 steps, and T = 4.8 with N = 128 steps]
+ Ising Model, L = 8
[Figure: sample configuration for T = 5.9 with N = 512 steps]
+ Ising Model, L = 16
[Figure: sample configuration for T = 5.3 with N = 16384 steps]
+ Summary
- Exact sampling
- Markov chain Monte Carlo: but when do we converge?
- Propp-Wilson with sandwiching to the rescue
+ Questions?