
+ Advanced Sampling Algorithms + Mobashir Mohammad Hirak Sarkar - PowerPoint PPT Presentation

+ Advanced Sampling Algorithms + Presenters: Mobashir Mohammad, Hirak Sarkar, Parvathy Sudhir, Yamilet Serrano Llerena, Aditya Kulkarni, Tobias Bertelsen, Nirandika Wanigasekara, Malay Singh


  1. + 2D Ising Model (3) The probability of a spin configuration s is P(s) = exp(β Σ_{(i,j)∈E} s_i s_j) / Z = exp(−β H(s)) / Z, where the sum runs over neighbouring pairs and Z is the normalizing constant. If β = 0, all spin configurations have the same probability; if β > 0, lower-energy configurations are preferred.
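On a grid small enough to enumerate, this distribution can be computed exhaustively. A minimal sketch, assuming a 2×2 grid with the four nearest-neighbour edges and free boundaries (the helper names are illustrative, not from the slides):

```python
import itertools
import math

# Edges of a 2x2 grid with free boundaries (sites 0..3, row-major).
EDGES = [(0, 1), (2, 3), (0, 2), (1, 3)]

def weight(spins, beta):
    """Unnormalized Boltzmann weight exp(beta * sum over edges of s_i * s_j)."""
    return math.exp(beta * sum(spins[i] * spins[j] for i, j in EDGES))

def ising_probs(beta):
    """Exact P(s) for every configuration s in {-1,+1}^4."""
    configs = list(itertools.product([-1, 1], repeat=4))
    weights = [weight(s, beta) for s in configs]
    z = sum(weights)  # normalizing constant Z
    return {s: w / z for s, w in zip(configs, weights)}

probs_flat = ising_probs(beta=0.0)  # beta = 0: all 16 configurations equally likely
probs_cold = ising_probs(beta=0.5)  # beta > 0: aligned (low-energy) states preferred
```

At β = 0 every configuration gets probability 1/16; at β > 0 the two fully aligned configurations carry the largest probability, matching the slide's claim.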

  2. + Exact Inference is Hard Posterior distribution over s: if z is any observed variable, P(s | z) = p(s, z) / Σ_s p(s, z). The computation is intractable: the joint probability distribution ranges over 2^N possible configurations of s, and the same problem affects the marginal probability distribution at a single site and MAP estimation.

  3. + The Big Question Given a probability distribution π on S = {s_1, s_2, …, s_k}, simulate a random object with distribution π.

  4. + Methods Generate random samples to estimate a quantity; the samples are generated "Markov-chain style". Approaches covered: Markov Chain Monte Carlo (MCMC); the Propp-Wilson simulation, a Las Vegas variant; Sandwiching, an improvement on Propp-Wilson.

  5. + MARKOV CHAIN MONTE CARLO METHOD (MCMC) YAMILET R. SERRANO LL.

  6. + Recall Given a probability distribution π on S = {s_1, …, s_k}, how do we simulate a random object with distribution π?

  7. + Intuition Given a probability distribution π on S = {s_1, …, s_k}, how do we simulate a random object with distribution π? The difficulty: S is a high-dimensional space.

  8. + MCMC 1. Construct an irreducible and aperiodic Markov chain [X_0, X_1, …] whose stationary distribution is π. 2. Run the chain with an arbitrary initial distribution; the Markov chain convergence theorem guarantees that the distribution of the chain at time n converges to π. 3. Hence, if we run the chain for a sufficiently long time n, the distribution of X_n will be very close to π, so X_n can be used as a sample.

  9. + MCMC (2) Generally, two types of MCMC algorithms: Metropolis-Hastings; Gibbs sampling.

  10. + Metropolis Hastings Algorithm Original method: Metropolis, Rosenbluth, Rosenbluth, Teller and Teller (1953). Generalized by Hastings in 1970. Rediscovered by Tanner and Wong (1987) and by Gelfand and Smith (1990). It is one way to implement MCMC. [Photo: Nicholas Metropolis]

  11. + Metropolis Hastings Algorithm Basic idea. GIVEN: a probability distribution π on S = {s_1, …, s_k}. GOAL: approximately sample from π. Start with a proposal distribution Q(x, y), where x is the current state and y the new proposal; Q(x, ·) specifies the transitions of the Markov chain and plays the role of the transition matrix. By accepting/rejecting proposals, MH simulates a Markov chain whose stationary distribution is π.

  12. + Metropolis Hastings Algorithm Algorithm. GIVEN: a probability distribution π on S = {s_1, …, s_k}. GOAL: approximately sample from π. Given the current sample x: draw y from the proposal distribution Q(x, ·); draw U ~ Uniform(0, 1) and accept the move (X_{n+1} = y) if U < α(x, y), otherwise keep X_{n+1} = x, where the acceptance probability is α(x, y) = min(1, [π(y) Q(y, x)] / [π(x) Q(x, y)]).
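The accept/reject step above can be sketched for a small discrete target; the three-point distribution and the uniform proposal below are illustrative choices, not from the slides (with a symmetric proposal the ratio reduces to π(y)/π(x)):

```python
import random

def metropolis_hastings(pi, steps, seed=0):
    """Simulate a chain whose stationary distribution is pi, using a
    symmetric uniform proposal Q(x, y) = 1/len(pi); returns visit frequencies."""
    rng = random.Random(seed)
    k = len(pi)
    x = 0                # arbitrary initial state
    counts = [0] * k
    for _ in range(steps):
        y = rng.randrange(k)               # draw y from Q(x, .)
        alpha = min(1.0, pi[y] / pi[x])    # acceptance prob (Q is symmetric)
        if rng.random() < alpha:           # U < alpha: accept the proposal
            x = y
        counts[x] += 1
    return [c / steps for c in counts]

freqs = metropolis_hastings([0.5, 0.3, 0.2], steps=50000)
```

By the ergodic theorem the visit frequencies approach the target, so `freqs` should be close to `[0.5, 0.3, 0.2]`.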

  13. + Metropolis Hastings Algorithm The Ising Model. Consider m sites around a circle. Each site i can have one of two spins x_i ∈ {−1, 1}. The target distribution is the Ising distribution π(x) ∝ exp(β Σ_i x_i x_{i+1}), with indices taken mod m.

  14. + Metropolis Hastings Algorithm The Ising Model. Target distribution: π(x) ∝ exp(β Σ_i x_i x_{i+1}). Proposal distribution: 1. randomly pick one of the m spins; 2. flip its sign. Acceptance probability (say the i-th spin is flipped): α = min(1, exp(−2β x_i (x_{i−1} + x_{i+1}))).
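A sketch of that acceptance probability: flipping spin i on the circle changes the exponent β Σ x_j x_{j+1} only through the two edges incident to i, so the ratio π(x′)/π(x) is computable locally.

```python
import math

def flip_acceptance(x, i, beta):
    """MH acceptance probability min(1, pi(x')/pi(x)) for flipping spin i
    of the circular configuration x (list of +1/-1 values)."""
    m = len(x)
    neighbours = x[(i - 1) % m] + x[(i + 1) % m]
    # pi(x')/pi(x) = exp(-2 * beta * x[i] * (left + right))
    return min(1.0, math.exp(-2.0 * beta * x[i] * neighbours))
```

At β = 0 every flip is accepted; flipping a spin that agrees with both neighbours is accepted with probability exp(−4β), while a flip that lowers the energy is always accepted.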

  15. + Metropolis Hastings Algorithm The Ising Model. [Image taken from "Monte Carlo Investigation of the Ising Model" (2006), Tobin Fricke]

  16. + Disadvantages of MCMC No matter how large n is taken to be in the MCMC algorithm, there will still be some discrepancy between the distribution of the output and the target distribution π. In order to make this error small, we need to figure out how large n needs to be.

  17. + Bounds on Convergence of MCMC Aditya Kulkarni

  18. + Seminar on Probabilities (Strasbourg, 1983) If it is difficult to obtain asymptotic bounds on the convergence time of MCMC algorithms for Ising models, use quantitative bounds. Use characteristics of the Ising-model MCMC algorithm to obtain quantitative bounds. [Photo: David Aldous]

  19. + What we need to ensure As we go along in time, we approach the stationary distribution in a monotonically decreasing fashion: d(t) → 0 as t → ∞. When we stop, the sample should follow a distribution that is no further from the stationary distribution than some factor ε: d(t) ≤ ε.

  20. + Total Variation Distance || · ||_TV Given two distributions p and q over a finite set of states S, the total variation distance || · ||_TV is ||p − q||_TV = (1/2) ||p − q||_1 = (1/2) Σ_{x∈S} |p(x) − q(x)|.
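The definition translates directly into code (a minimal sketch; distributions are given as equal-length lists over the same state ordering):

```python
def tv_distance(p, q):
    """Total variation distance between two distributions on the same
    finite state space: half the L1 distance."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))
```

Disjoint distributions are at distance 1, identical ones at distance 0.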

  21. + Convergence Time Let X = (X_0, X_1, …) be a Markov chain on a state space S with stationary distribution π. We define d(t) as the worst total variation distance at time t: d(t) = max_{x∈S} ||P(X_t | X_0 = x) − π||_TV. The mixing time τ(ε) is the minimum time t such that d(t) is at most ε: τ(ε) = min {t : d(t) ≤ ε}. Define τ = τ(1/2e) = min {t : d(t) ≤ 1/2e}.
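For a chain small enough to write down, d(t) and τ(ε) can be computed exactly from powers of the transition matrix. A sketch for a two-state chain that flips with probability 1/4 (an illustrative chain, not one from the slides; here d(t) = (1/2)·(1/2)^t):

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def worst_tv(p_mat, t, pi):
    """d(t): worst-case TV distance to pi over all starting states."""
    n = len(pi)
    pt = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(t):
        pt = mat_mul(pt, p_mat)
    return max(0.5 * sum(abs(row[j] - pi[j]) for j in range(n)) for row in pt)

def mixing_time(p_mat, pi, eps):
    """tau(eps) = min { t : d(t) <= eps }."""
    t = 0
    while worst_tv(p_mat, t, pi) > eps:
        t += 1
    return t

P = [[0.75, 0.25], [0.25, 0.75]]  # flip with probability 1/4
PI = [0.5, 0.5]                   # stationary distribution
```

For this chain d(3) = 1/16 and τ(0.01) = 6, and d(t) is indeed decreasing.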

  22. + Quantitative results [Plot of d(t) against t, marking the levels 1/2e and ε and the corresponding times τ and τ(ε).]

  23. + Lemma Consider two random walks started from state i and state j, and define d̄(t) as the worst total variation distance between their respective probability distributions at time t. Then: (a) d̄(t) ≤ 2 d(t); (b) d(t) is decreasing.

  24. + Upper bound on d(t): proof From part (a) and the definition τ = τ(1/2e) = min {t : d(t) ≤ 1/2e} we get d̄(τ) ≤ e^{−1}. From part (b) and submultiplicativity of d̄, an upper bound on d̄ at a particular time t_0 gives upper bounds for later times: d̄(t) ≤ (d̄(t_0))^n for n·t_0 ≤ t ≤ (n+1)·t_0, i.e. d̄(t) ≤ (d̄(t_0))^(t/t_0 − 1). Substituting τ for t_0: d̄(t) ≤ (d̄(τ))^(t/τ − 1), hence d(t) ≤ exp(1 − t/τ) for t ≥ 0.

  25. + Upper bound on τ(ε): proof Algebraic calculations: from d(t) ≤ exp(1 − t/τ) we get log d(t) ≤ 1 − t/τ. Requiring exp(1 − t/τ) ≤ ε and solving gives t ≥ τ·(1 + log(1/ε)); any such t already satisfies d(t) ≤ ε, so τ(ε) ≤ τ·(1 + log(1/ε)) ≤ 2eτ·(1 + log(1/ε)).

  26. + Upper bounds d(t) ≤ min(1, exp(1 − t/τ)) for t ≥ 0; τ(ε) ≤ τ·(1 + log(1/ε)) for 0 < ε < 1; τ(ε) ≤ 2eτ·(1 + log(1/ε)) for 0 < ε < 1.

  27. + Entropy of initial distribution Measure of randomness: Ent(μ) = − Σ_{x∈S} μ(x) log μ(x), where x is the initial state and μ is the initial probability distribution.
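A sketch of this measure (natural log, with 0·log 0 taken as 0): a point mass has entropy 0, and the uniform distribution on k states attains the maximum, log k.

```python
import math

def entropy(mu):
    """Ent(mu) = -sum over x of mu(x) * log mu(x), skipping zero-mass states."""
    return -sum(p * math.log(p) for p in mu if p > 0)
```

For example, `entropy([0.25] * 4)` equals log 4, while `entropy([1.0, 0, 0, 0])` is 0.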

  28. + A few more lemmas 1. Let (X_t) be a random walk with initial distribution μ, and let μ_t be the distribution of X_t; then Ent(μ_t) ≤ t · Ent(μ). 2. If ν is a distribution on S such that ||ν − π|| ≤ ε, then Ent(ν) ≥ (1 − ε) log |S|.

  29. + Lower bound on d(t): proof From lemma 2, Ent(ν) ≥ (1 − ε) log |S|, so Ent(ν) / log |S| ≥ 1 − ε ≥ 1 − d(t) when ν = μ_t. Hence d(t) ≥ 1 − Ent(μ_t) / log |S| ≥ 1 − t·Ent(μ) / log |S|.

  30. + Lower bound on τ(ε): proof From lemma 1, Ent(μ_t) ≤ t · Ent(μ), so t ≥ Ent(μ_t) / Ent(μ). From lemma 2, Ent(μ_t) ≥ (1 − ε) log |S|, so t ≥ (1 − ε) log |S| / Ent(μ). Hence τ(ε) ≥ (1 − ε) log |S| / Ent(μ).

  31. + Lower bounds d(t) ≥ 1 − t · Ent(μ) / log |S|; τ(ε) ≥ (1 − ε) log |S| / Ent(μ).

  32. + Propp Wilson Tobias Bertelsen

  33. + An exact version of MCMC Problems with MCMC: (A) it has an accuracy error, which depends on the starting state; (B) we must know the number of iterations. James Propp and David Wilson proposed Coupling From The Past (1996), a.k.a. the Propp-Wilson algorithm. Idea: solve both problems by, in effect, running the chain infinitely long.

  34. + An exact version of MCMC Theoretical: runs all configurations infinitely; literally takes infinite time; impossible. Coupling from the past: runs all configurations for finite time; might take thousands of years; infeasible. Sandwiching: runs a few configurations for finite time; takes seconds; practicable.

  35. + Theoretical exact sampling Recall the convergence theorem: we approach the stationary distribution as the number of steps goes to infinity. Intuitive approach: to sample perfectly, start a chain and run it forever; start at t = 0, sample at t = ∞. Problem: we never get a sample. Alternative approach: to sample perfectly, take a chain that has already been running for an infinite amount of time; start at t = −∞, sample at t = 0.

  36. + Theoretical independence of starting state The sample from a Markov chain in MCMC depends solely on the starting state X_{−∞} and the sequence of random numbers U. We want to be independent of the starting state: for a given sequence of random numbers U_{−∞}, …, U_{−1}, we want to ensure that the starting state X_{−∞} has no effect on X_0.

  37. + Theoretical independence of starting state Collisions: for a given U_{−∞}, …, U_{−1}, if two Markov chains are in the same state at some time t′, they will continue on together.

  38. + Coupling from the past At some finite past time t = −N, all past chains have already run infinitely long (∞ − N = ∞) and have coupled into one. We want to continue that coupled chain to t = 0, but we don't know which state it will be in at t = −N. So: run a chain from every state starting at −N instead of −∞.

  39. + Coupling from the past 1. Let U = (U_{−1}, U_{−2}, U_{−3}, …) be a sequence of independent uniformly random numbers. 2. For N_k ∈ (1, 2, 4, 8, …): extend U to length N_k, keeping U_{−1}, …, U_{−N_{k−1}} the same. 3. Start one chain from each state at t = −N_k; for t from −N_k to zero, simulate the chains using U_t. 4. If all chains have converged to the same state s_i at t = 0, return s_i. 5. Else repeat the loop.
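The loop above can be sketched in code. As the concrete chain I use the k-state update function discussed on the following slides (jump to s_1 when U < 1/2, otherwise step up, capped at s_k); the state labels 1…k are illustrative.

```python
import random

def phi(state, u, k):
    """Update function: collapse to state 1 if u < 1/2, else step up (capped at k)."""
    return 1 if u < 0.5 else min(state + 1, k)

def cftp(rng, k=4):
    """Coupling from the past. For this chain the stationary distribution is
    pi(i) = 2**-i for i < k and pi(k) = 2**-(k-1)."""
    us = []   # us[j] holds U_{-(j+1)}; reused (never regenerated) across rounds
    n = 1
    while True:
        while len(us) < n:
            us.append(rng.random())        # extend U, keeping earlier values
        states = set(range(1, k + 1))      # one chain from every state at t = -n
        for t in range(n, 0, -1):
            states = {phi(s, us[t - 1], k) for s in states}
        if len(states) == 1:               # all chains coalesced at t = 0
            return states.pop()
        n *= 2                             # double the starting time and retry

rng = random.Random(0)
samples = [cftp(rng) for _ in range(20000)]
```

Unlike plain MCMC, each returned value is an exact draw from π: for k = 4 the frequencies of states 1 and 2 should be close to 1/2 and 1/4.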

  40. + Coupling from the past

  41. + Questions Why do we double the lengths? Worst case is < 4·N_opt steps, where N_opt is the minimal N at which we can achieve convergence. Compare with N ∈ (1, 2, 3, 4, 5, …), which takes O(N_opt²) steps. Why do we have to use the same random numbers? Different samples might take longer or shorter to converge; we must evaluate the same sample in each iteration. The sample should depend only on U, not on the different N.

  42. + Using the same U We have k states with the following update function: φ(s_i, U) = s_1 if U < 1/2, otherwise s_{i+1}; φ(s_k, U) = s_1 if U < 1/2, otherwise s_k. The stationary distribution is π(s_i) = 2^{−i} for i < k and π(s_k) = 2^{−(k−1)}.

  43. + Using the same U The probability of s_1 should depend only on the last random number: P(s_1) = P(U_{−1} < 1/2) = 1/2. Suppose instead we generate a new U for each run: U^1, U^2, U^3, … Then the accumulated probability of returning s_1 grows with each doubling: 50% after [U_{−1}], 75% after [U_{−2}, U_{−1}], 81.25% after [U_{−4}, …, U_{−1}], 81.64% after [U_{−8}, …, U_{−1}]. The output is biased toward s_1.

  44. + Using the same U The probability of s_1 depends only on the last random number: P(s_1) = P(U_{−1} < 1/2) = 1/2. If we instead reuse the same U for each run, the accumulated probability stays correct: 50% for [U_{−1}], 50% for [U_{−2}, U_{−1}], 50% for [U_{−4}, …, U_{−1}], 50% for [U_{−8}, …, U_{−1}].
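The effect of violating this rule can be simulated. Below, a correct run reuses the random numbers when N doubles, while a "fresh" run regenerates them; on the chain from the previous slides (with k = 4) the fresh variant biases the output toward s_1, about 81% instead of 50%, in line with the table.

```python
import random

def run(rng, k=4, reuse=True):
    """One Propp-Wilson run for the chain phi(i, u) = 1 if u < 1/2 else min(i+1, k).
    With reuse=False the random numbers are (incorrectly) regenerated each time
    the starting time doubles."""
    us, n = [], 1
    while True:
        if reuse:
            us += [rng.random() for _ in range(n - len(us))]   # extend, keep old values
        else:
            us = [rng.random() for _ in range(n)]              # WRONG: fresh randomness
        states = set(range(1, k + 1))
        for t in range(n, 0, -1):
            u = us[t - 1]                                      # us[j] is U_{-(j+1)}
            states = {1 if u < 0.5 else min(s + 1, k) for s in states}
        if len(states) == 1:
            return states.pop()
        n *= 2

rng = random.Random(1)
p_reuse = sum(run(rng) == 1 for _ in range(20000)) / 20000                 # about 0.50
p_fresh = sum(run(rng, reuse=False) == 1 for _ in range(20000)) / 20000    # about 0.81
```

Only the reusing variant returns exact samples; regenerating U makes early-coalescing outcomes (here s_1) overrepresented.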

  45. + Problem In each step we update up to k chains, so the total execution time is O(N_opt · k). BUT k = 2^{L²} in the Ising model. Worse than the naïve approach, O(k).

  46. + Sandwiching Nirandika Wanigasekara

  47. + Sandwiching With many vertices, running k Propp-Wilson chains will take a long time; the algorithm is impractical for large k. Can we choose a relatively small part of the state space and still get the same results? Try sandwiching.

  48. + Sandwiching Idea: find two chains bounding all other chains. If we have two such boundary chains, check whether those two chains converge; if they do, all other chains have also converged.

  49. + Sandwiching To come up with the boundary chains we need a way to order the states, s_1 ≤ s_2 ≤ s_3 ≤ …, such that a chain in a higher state does not cross a chain in a lower state. This results in a Markov chain obeying certain monotonicity properties.

  50. + Sandwiching Let's consider a fixed set of k states, state space S = {1, …, k}, and a transition matrix with P_{1,1} = P_{1,2} = 1/2; P_{k,k} = P_{k,k−1} = 1/2; for i = 2, …, k−1, P_{i,i−1} = P_{i,i+1} = 1/2; all other entries are 0.

  51. + Sandwiching What is this Markov chain doing? At each integer time it takes one step up or one step down the ladder, each with probability 1/2; at the top (state k) or the bottom (state 1) the blocked step leaves it where it is. This is the Ladder Walk on k vertices. The stationary distribution π of this Markov chain is uniform: π(i) = 1/k for i = 1, …, k.

  52. + Sandwiching Propp-Wilson for this Markov chain with: a valid update function φ (if U < 1/2 then step down, if U ≥ 1/2 then step up); negative starting times N_1, N_2, … = 1, 2, 4, 8, …; number of states k = 5.

  53. + Sandwiching

  54. + Sandwiching The update function preserves the ordering between states: for all U ∈ [0, 1] and all i, j ∈ {1, …, k} such that i ≤ j, we have φ(i, U) ≤ φ(j, U). It is then sufficient to run only 2 chains rather than k.
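For the ladder walk this means tracking only the bottom chain (started at 1) and the top chain (started at k); monotonicity traps every other chain between them. A sketch:

```python
import random

def ladder_step(state, u, k):
    """Valid update function: step down if u < 1/2, else step up (clamped to 1..k)."""
    return max(state - 1, 1) if u < 0.5 else min(state + 1, k)

def sandwich_cftp(rng, k=5):
    """Propp-Wilson with sandwiching: run only the chains started from the
    minimal and maximal states, reusing the random numbers across rounds."""
    us, n = [], 1
    while True:
        us += [rng.random() for _ in range(n - len(us))]
        lo, hi = 1, k
        for t in range(n, 0, -1):
            lo = ladder_step(lo, us[t - 1], k)
            hi = ladder_step(hi, us[t - 1], k)
        if lo == hi:        # the sandwich has collapsed: every chain coalesced
            return lo
        n *= 2

rng = random.Random(0)
samples = [sandwich_cftp(rng) for _ in range(20000)]
```

The output is an exact sample from the ladder walk's uniform stationary distribution, so each of the 5 states should appear with frequency near 1/5, while each run touches only 2 chains instead of k.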

  55. + Sandwiching Are these conditions always met? No, not always. But there are frequent instances where they are met, and the technique is especially useful when k is large. The Ising model is a good example.

  56. + Ising Model Malay Singh

  57. + 2D Ising Model A grid with sites numbered from 1 to L², where L is the grid size. Each site i can have spin x_i ∈ {−1, +1}. V is the set of sites; S = {−1, 1}^V defines all possible configurations (states). The magnetisation m of a state s is m(s) = (Σ_i s_i) / L². The energy of a state s is H(s) = −Σ_{⟨ij⟩} s_i s_j, summing over neighbouring pairs.
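These two quantities in code, as a sketch for a small L×L grid; the slides do not fix a boundary convention, so the free-boundary choice below is an assumption.

```python
def neighbour_pairs(L):
    """Nearest-neighbour pairs <ij> of an L x L grid with free boundaries."""
    pairs = []
    for r in range(L):
        for c in range(L):
            if c + 1 < L:
                pairs.append(((r, c), (r, c + 1)))   # horizontal edge
            if r + 1 < L:
                pairs.append(((r, c), (r + 1, c)))   # vertical edge
    return pairs

def magnetisation(s):
    """m(s): average spin over the L*L sites (s is a list of rows of +1/-1)."""
    L = len(s)
    return sum(sum(row) for row in s) / L**2

def energy(s):
    """H(s) = - sum over neighbouring pairs of s_i * s_j."""
    return -sum(s[r1][c1] * s[r2][c2]
                for (r1, c1), (r2, c2) in neighbour_pairs(len(s)))

up = [[1, 1], [1, 1]]        # all spins +1: m = 1, minimal energy
mixed = [[1, -1], [-1, 1]]   # checkerboard: m = 0, maximal energy
```

On the 2×2 grid the aligned state has m = 1 and H = −4, while the checkerboard has m = 0 and H = +4.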

  58. + Ordering of states For two states s, s′ we say s ≤ s′ if s(x) ≤ s′(x) for all x ∈ V. The maximal state (m = 1) is s_max with s_max(x) = +1 for all x ∈ V; the minimal state (m = −1) is s_min with s_min(x) = −1 for all x ∈ V. Hence s_min ≤ s ≤ s_max for all s.

  59. + The update function We use the sequence of random numbers U_n, U_{n−1}, …, U_0. Updating the state at time n, we choose a site x in {1, …, L²} uniformly and set X_{n+1}(x) = +1 if U_{n+1} < exp(2β(k₊(x, s) − k₋(x, s))) / (exp(2β(k₊(x, s) − k₋(x, s))) + 1), and −1 otherwise, where k₊(x, s) and k₋(x, s) count the neighbours of x with spin +1 and −1 respectively.
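The probability in the update rule, as a sketch; the neighbour counts k₊ and k₋ are passed in directly rather than computed from a grid.

```python
import math

def prob_up(k_plus, k_minus, beta):
    """Probability that the chosen site is set to +1:
    exp(2*beta*(k+ - k-)) / (exp(2*beta*(k+ - k-)) + 1)."""
    z = math.exp(2.0 * beta * (k_plus - k_minus))
    return z / (z + 1.0)

def update_site(u, k_plus, k_minus, beta):
    """New spin for the chosen site, given the uniform random number u."""
    return 1 if u < prob_up(k_plus, k_minus, beta) else -1
```

The probability is increasing in k₊ − k₋, which is exactly the monotonicity the next slides rely on: a configuration with more +1 neighbours is at least as likely to set the site to +1.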

  60. + Maintaining ordering Ordering after the update (from X_n to X_{n+1}): we choose the same site x to update in both chains, and the spin of x at X_{n+1} depends on the update function. We want to check that exp(2β(k₊(x, s) − k₋(x, s))) / (exp(2β(k₊(x, s) − k₋(x, s))) + 1) ≤ exp(2β(k₊(x, s′) − k₋(x, s′))) / (exp(2β(k₊(x, s′) − k₋(x, s′))) + 1). That is equivalent to checking k₊(x, s) − k₋(x, s) ≤ k₊(x, s′) − k₋(x, s′).

  61. + Maintaining ordering As s ≤ s′ we have k₊(x, s) ≤ k₊(x, s′) and k₋(x, s) ≥ k₋(x, s′). Subtracting the second inequality from the first: k₊(x, s) − k₋(x, s) ≤ k₊(x, s′) − k₋(x, s′).

  62. + Ising Model L = 4 [Samples at T = 3.5 with N = 512, and at T = 4.8 with N = 128.]

  63. + Ising Model L = 8 [Sample at T = 5.9 with N = 512.]

  64. + Ising Model L = 16 [Sample at T = 5.3 with N = 16 384.]

  65. + Summary Exact sampling: Markov chain Monte Carlo works, but when do we converge? Propp-Wilson with Sandwiching to the rescue.

  66. + Questions?
