Sampling from distributive lattices: the Markov chain approach

  1. Sampling from distributive lattices: the Markov chain approach
     Graduiertenkolleg MDS, TU Berlin, April 20, 2009
     Stefan Felsner, Technische Universität Berlin, felsner@math.tu-berlin.de

  2. Topics
     • Markov Chain Monte Carlo
     • Coupling and CFTP
     • Distributive Lattices
     • α-Orientations and Heights
     • Block Coupling for Heights

  4. The Sampling Problem
     • $\Omega$ is a (large) finite set.
     • $\mu : \Omega \to [0, 1]$ is a probability distribution.
     Problem. Sample from $\Omega$ according to $\mu$, i.e., $\Pr(\text{output} = \omega) = \mu(\omega)$.
     There are many hard instances of the sampling problem.
     Relaxation: approximate sampling, i.e., $\Pr(\text{output} = \omega) = \tilde\mu(\omega)$ for some $\tilde\mu \approx \mu$.

  5. Applications of Sampling
     • Get hold of typical examples from $\Omega$.
     • Approximate counting.

  7. Preliminaries on Markov Chains
     $M$ transition matrix:
     • size $|\Omega| \times |\Omega|$
     • entries $\in [0, 1]$
     • row sums $= 1$ (stochastic)
     Intuition: $M$ specifies a random walk.
     [Figure: transition matrix $M$ of a three-state chain on a, b, c, drawn as a weighted digraph]

  8. Instance of a Markov Chain
     $(X_0, X_1, X_2, \ldots, X_r, \ldots)$ is an instance of $M$:
     • $X_i$ is a random variable with values in $\Omega$
     • $\Pr(X_{i+1} = x \mid X_i = s) = M(s, x)$
     Proposition. The probability distribution of $X_t$ is $\mu_t$ with $\mu_t = \mu_0 M^t$.
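The proposition on slide 8 can be checked numerically on a toy chain. The following is a minimal sketch; the 3-state matrix `M` and the helper names `step` and `distribution_after` are illustrative assumptions, not taken from the talk.

```python
from fractions import Fraction as Fr

# Hypothetical 3-state chain on states a, b, c (each row sums to 1).
M = [
    [Fr(1, 2), Fr(1, 2), Fr(0)],
    [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
    [Fr(0), Fr(1, 2), Fr(1, 2)],
]

def step(mu, M):
    """One step of the chain: return the row vector mu * M."""
    n = len(M)
    return [sum(mu[s] * M[s][x] for s in range(n)) for x in range(n)]

def distribution_after(mu0, M, t):
    """Return mu_t = mu_0 M^t by applying M t times."""
    mu = list(mu0)
    for _ in range(t):
        mu = step(mu, M)
    return mu

mu0 = [Fr(1), Fr(0), Fr(0)]           # start in state a
mu2 = distribution_after(mu0, M, 2)   # distribution of X_2
```

Exact fractions are used so that the row vectors stay probability distributions without rounding error.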

  10. Ergodic Markov Chains
      $M$ is ergodic (i.e., irreducible and aperiodic)
      $\implies$ multiplicity of eigenvalue 1 is one
      $\implies$ unique $\pi$ with $\pi = \pi M$.
      Fundamental Theorem. $M$ ergodic $\implies \lim_{t \to \infty} \mu_0 M^t = \pi$.
      $M$ symmetric and ergodic $\implies M^T \mathbf{1}^T = M \mathbf{1}^T = \mathbf{1}^T$, hence $\mathbf{1} M = \mathbf{1}$ $\implies \pi$ is the uniform distribution.

  11. Example: Linear Extensions
      A Markov chain for linear extensions. $L_t = x_1, x_2, \ldots, x_n$ is the state at time $t$.
      • Choose $i \in \{1, 2, \ldots, n-1\}$ uniformly.
      • If $x_i$ and $x_{i+1}$ are incomparable, then $L_{t+1} = x_1, x_2, \ldots, x_{i-1}, x_{i+1}, x_i, x_{i+2}, \ldots, x_n$ (otherwise $L_{t+1} = L_t$).
      Proposition. The chain is ergodic and symmetric.
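The adjacent-transposition chain of slide 11 is short enough to write out. This is a sketch under assumptions: the poset (two disjoint 2-chains), the representation of the order as a set of pairs `less_than`, and all function names are hypothetical choices for illustration.

```python
import random

def linear_extension_step(L, less_than, rng):
    """One move of the chain: pick an adjacent pair, swap it if incomparable."""
    i = rng.randrange(len(L) - 1)      # uniform position among the n-1 adjacent pairs
    a, b = L[i], L[i + 1]
    # a precedes b in a linear extension, so b < a cannot hold;
    # the swap is blocked only when a < b.
    if (a, b) not in less_than:
        L = L[:i] + [b, a] + L[i + 2:]
    return L

def is_linear_extension(L, less_than):
    """Check that every relation a < b is respected by the list order."""
    pos = {x: k for k, x in enumerate(L)}
    return all(pos[a] < pos[b] for a, b in less_than)

# Hypothetical poset on {1, 2, 3, 4} with relations 1 < 2 and 3 < 4.
less_than = {(1, 2), (3, 4)}
rng = random.Random(0)
L = [1, 2, 3, 4]
seen = set()
for _ in range(2000):
    L = linear_extension_step(L, less_than, rng)
    seen.add(tuple(L))
```

The set `less_than` is assumed to be transitively closed; for this example the two relations already are.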

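  13. Measuring Convergence
      Variation distance: $\|\mu - \mu'\|_{VD} := \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \mu'(x)|$
      Lemma. $\|\mu - \mu'\|_{VD} = \max_{A \subset \Omega} (\mu(A) - \mu'(A))$
      [Figure: graphs of $\mu$ and $\mu'$ with the two regions $A$ and $B$ enclosed between them]
      $\int \mu = \int \mu' = 1 \implies \int A = \int B$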
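Both expressions for the variation distance on slide 13 can be compared by brute force on a tiny ground set. A minimal sketch follows; the two distributions `mu` and `nu` and the function names are made up for illustration.

```python
from fractions import Fraction as Fr
from itertools import combinations

def variation_distance(mu, nu):
    """Half the L1 distance between two distributions given as dicts."""
    keys = set(mu) | set(nu)
    return sum(abs(mu.get(x, 0) - nu.get(x, 0)) for x in keys) / 2

def max_event_gap(mu, nu):
    """Brute-force maximum of mu(A) - nu(A) over all events A (tiny Omega only)."""
    keys = sorted(set(mu) | set(nu))
    best = Fr(0)
    for r in range(len(keys) + 1):
        for A in combinations(keys, r):
            gap = sum(mu.get(x, 0) - nu.get(x, 0) for x in A)
            best = max(best, gap)
    return best

mu = {"a": Fr(1, 2), "b": Fr(1, 3), "c": Fr(1, 6)}
nu = {"a": Fr(1, 4), "b": Fr(1, 4), "c": Fr(1, 2)}
```

For these two distributions both functions return 1/3, attained by the event A = {a, b}.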

  14. Mixing Time
      $\mu_t^x = \delta_x M^t$ is the distribution after $t$ steps starting in $x$.
      $\Delta(t) := \max(\|\mu_t^x - \pi\|_{VD} : x \in \Omega)$
      $\tau(\varepsilon) = \min(t : \Delta(t) \le \varepsilon)$
      • $\tau(\varepsilon)$ is the mixing time.
      • $M$ is rapidly mixing $\iff$ $\tau(\varepsilon)$ is a polynomial function of the problem size and $\log(\varepsilon^{-1})$.
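For a chain small enough to store as a matrix, $\tau(\varepsilon)$ can be computed directly from the definition. The sketch below assumes a hypothetical lazy walk on a path with four states (symmetric, so $\pi$ is uniform); the names `mixing_time` and `t_max` are illustrative.

```python
def step(mu, M):
    """Return the row vector mu * M."""
    n = len(M)
    return [sum(mu[s] * M[s][x] for s in range(n)) for x in range(n)]

def mixing_time(M, pi, eps, t_max=10_000):
    """Smallest t with max_x ||delta_x M^t - pi||_VD <= eps."""
    n = len(M)
    # rows[x] holds the distribution mu_t^x; at t = 0 it is delta_x.
    rows = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    for t in range(t_max + 1):
        delta = max(0.5 * sum(abs(r[j] - pi[j]) for j in range(n)) for r in rows)
        if delta <= eps:
            return t
        rows = [step(r, M) for r in rows]
    raise RuntimeError("not mixed within t_max steps")

# Hypothetical lazy walk on the path 0 - 1 - 2 - 3 (symmetric matrix).
M = [
    [0.75, 0.25, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.25, 0.75],
]
pi = [0.25] * 4
t_quarter = mixing_time(M, pi, 0.25)
t_small = mixing_time(M, pi, 0.01)
```

This brute-force computation costs $|\Omega|^2$ work per step, so it only illustrates the definition; it is not a way to bound mixing times for large state spaces.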

  15. Mixing Time and Eigenvalues
      • $M$ stochastic $\implies |\lambda| \le 1$ for all eigenvalues $\lambda$.
      • $M$ lazy (i.e., $m_{i,i} \ge 1/2$ for all $i$) $\implies \lambda \ge 0$ for all eigenvalues $\lambda$.
      • $M$ ergodic $\implies$ multiplicity of eigenvalue 1 is one.
      • $M$ symmetric $\implies$ ONB of eigenvectors.
      Proposition. The mixing time, i.e., the convergence rate to $\pi$, depends on the second largest eigenvalue.
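One standard way to make the proposition quantitative is sketched below, under the extra assumption that $M$ is symmetric and ergodic (so $\pi$ is uniform and the eigenvalues are real). The symbol $\lambda_*$ and the constants are not from the slides.

```latex
% lambda_* is the largest modulus of an eigenvalue other than 1.
\[
  \lambda_* := \max\{\, |\lambda| : \lambda \text{ eigenvalue of } M,\ \lambda \neq 1 \,\} < 1 .
\]
% delta_x - pi sums to zero, hence is orthogonal to the all-ones vector,
% and M contracts that subspace by a factor lambda_* in the Euclidean norm.
\[
  \|(\delta_x - \pi) M^t\|_2 \;\le\; \lambda_*^{\,t}\, \|\delta_x - \pi\|_2 \;\le\; \lambda_*^{\,t} .
\]
% Cauchy-Schwarz: ||v||_1 <= sqrt(|Omega|) ||v||_2, and VD is half the 1-norm.
\[
  \|\mu_t^x - \pi\|_{VD} \;=\; \tfrac12 \|(\delta_x - \pi) M^t\|_1
  \;\le\; \tfrac12 \sqrt{|\Omega|}\; \lambda_*^{\,t}
  \quad\Longrightarrow\quad
  \tau(\varepsilon) \;\le\; \left\lceil \frac{\ln\!\big(\sqrt{|\Omega|}/(2\varepsilon)\big)}{\ln(1/\lambda_*)} \right\rceil
  \quad (0 < \lambda_*).
\]
```

For a lazy chain all eigenvalues are nonnegative, so $\lambda_*$ is exactly the second largest eigenvalue.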

  16. Topics
      • Markov Chain Monte Carlo
      • Coupling and CFTP
      • Distributive Lattices
      • α-Orientations and Heights
      • Block Coupling for Heights

  17. Coupling for Distributions
      $\mu$, $\nu$ distributions on $\Omega$.
      A distribution $\omega$ on $\Omega \times \Omega$ is a coupling of $\mu$ and $\nu$ $\iff$ $\omega$ has $\mu$ and $\nu$ as marginals, i.e.,
      $\sum_y \omega(x, y) = \mu(x)$ for all $x$ and $\sum_x \omega(x, y) = \nu(y)$ for all $y$.
      Coupling Lemma. If $\omega$ is a coupling of $\mu$ and $\nu$ and $(X, Y)$ is chosen from $\omega$, then $\|\mu - \nu\|_{VD} \le \Pr(X \ne Y)$.

  18. Coupling for Distributions
      Lemma. $\|\mu - \nu\|_{VD} \le \Pr(X \ne Y)$.
      Proof. We use $\mu(z) = \sum_y \omega(z, y) \ge \omega(z, z)$ and $\nu(z) = \sum_x \omega(x, z) \ge \omega(z, z)$.
      \[
      \begin{aligned}
      \Pr(X \ne Y) &= 1 - \Pr(X = Y) \\
                   &= \sum_z \mu(z) - \sum_z \omega(z, z) \\
                   &\ge \sum_z \mu(z) - \sum_z \min(\mu(z), \nu(z)) \\
                   &= \sum_{z :\, \nu(z) \le \mu(z)} \big(\mu(z) - \nu(z)\big) \\
                   &= \max_{A \subset \Omega} \big(\mu(A) - \nu(A)\big) = \|\mu - \nu\|_{VD}
      \end{aligned}
      \]

  19. Coupling for Markov Chains
      A coupling for $M$ is a sequence $(Z_0, Z_1, Z_2, \ldots)$ with $Z_i = (X_i, Y_i)$ such that $(X_0, X_1, X_2, \ldots)$ and $(Y_0, Y_1, Y_2, \ldots)$ are instances of $M$.
      In particular, $\Pr(X_{i+1} = x' \mid Z_i = (x, y)) = \Pr(X_{i+1} = x' \mid X_i = x) = M(x, x')$.

  20. Coupling and Mixing Times
      $Z_i = (X_i, Y_i)$ a coupling for $M$.
      Theorem [Döblin 1938]. If $\Pr(X_T \ne Y_T \mid Z_0 = (x_0, y_0)) < \varepsilon$ for every initial $(x_0, y_0)$ after $T$ steps, then $\tau(\varepsilon) \le T$.
      Proof.
      • Choose $y_0$ from the stationary distribution $\pi$ $\implies$ $Y_t$ is in the stationary distribution $\pi$ for all $t$.
      • $X_t$ is in distribution $\mu_t^{x_0}$.
      • $\Pr(X_T \ne Y_T \mid Z_0 = (x_0, y_0)) < \varepsilon$ $\implies$ (Coupling Lemma) $\max_x \|\mu_T^x - \pi\|_{VD} < \varepsilon$ $\implies$ (definition of $\tau$) $\tau(\varepsilon) \le T$.

  21. Example: Linear Extensions of Width 2 Orders
      [Figure: a width 2 order on the elements 1, ..., 8 and the lattice path of one of its linear extensions]
      Linear extensions are paths.
      The Markov chain and the coupling:
      • Choose position $k$ and $s \in \{\uparrow, \downarrow\}$.
      • Flip the path at position $k$ in direction $s$ (if possible).

  22. Linear Extensions of Width 2 Orders: the Analysis
      • $\mathrm{dist}(X, Y)$ = area between the paths $\le n^2$
      • $E(\mathrm{dist}(X_{i+1}, Y_{i+1})) \le \mathrm{dist}(X_i, Y_i)$
      The distance is a projection to a random walk on the line
      $\implies$ expected coupling time $O(n^4 \log n)$
      $\implies \tau(\varepsilon) \in O(n^4 \log n \, \log \varepsilon^{-1})$.

  24. Coupling From the Past
      $M$ a Markov chain on $\Omega$.
      $\mathcal{F}$ a family of maps $f : \Omega \to \Omega$ such that for random $f \in \mathcal{F}$: $\Pr(f(x) = x') = M(x, x')$.
      Coupling-FTP
        $F \leftarrow \mathrm{id}_\Omega$
        repeat
          choose $f \in \mathcal{F}$ at random
          $F \leftarrow F \circ f$
        until $F$ is a constant map
        return $F(x)$
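The Coupling-FTP loop translates almost line by line into code when $\Omega$ is small enough to store a map as a dictionary. The following is a sketch; the lazy walk on four states used as the example chain and the names `cftp` and `random_path_map` are assumptions for illustration.

```python
import random

def cftp(states, random_map, rng):
    """Coupling from the past: compose random maps backwards until constant.

    random_map(rng) must return a dict f with Pr(f[x] == y) == M(x, y).
    """
    F = {x: x for x in states}                  # F <- identity
    while len(set(F.values())) > 1:             # until F is a constant map
        f = random_map(rng)                     # fresh map, one step further back
        F = {x: F[f[x]] for x in states}        # F <- F o f (f is applied first)
    return next(iter(F.values()))               # F(x) for an arbitrary x

def random_path_map(rng, n=4):
    """Hypothetical chain: lazy walk on 0..n-1, same increment for every state."""
    s = rng.choice([-1, 0, 0, 1])               # stay with probability 1/2
    return {x: min(max(x + s, 0), n - 1) for x in range(n)}

sample = cftp(range(4), random_path_map, random.Random(0))
```

The order of composition matters: each new map is applied before the ones drawn earlier, which is what distinguishes coupling from the past from simply running the chain forward until coalescence.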

  26. Coupling From the Past
      Theorem. The state returned by Coupling-FTP is distributed exactly (!) according to the stationary distribution.

  28. Monotone Coupling From the Past: An Example
      The problem with CFTP is the need to handle functions $f$ on all of $\Omega$.
      Order relation $\le_\Omega$ on $\Omega$ with minimum $\hat{0}$ and maximum $\hat{1}$:
      • $x \le_\Omega x' \implies f(x) \le_\Omega f(x')$ for all $f \in \mathcal{F}$
      Example.
      Objects: lattice paths in a grid.
      $\mathcal{F} = \{ f_{k,s} : \text{apply position } k \text{ and direction } s \text{ to all paths} \}$
      This family is monotone!
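With a monotone family it suffices to track the images of $\hat{0}$ and $\hat{1}$. The sketch below uses the common doubling scheme (reuse the stored moves, go twice as far into the past when the two extremes have not met), which is an assumption and not spelled out on the slides; paths are encoded as 0/1 tuples (0 = right, 1 = up), and all names are illustrative.

```python
import random

def flip(path, move):
    """Apply f_{k,s}: flip the corner at position k in direction s if possible."""
    k, s = move
    pair = path[k:k + 2]
    if s == "up" and pair == (0, 1):            # right-then-up corner moves up
        return path[:k] + (1, 0) + path[k + 2:]
    if s == "down" and pair == (1, 0):          # up-then-right corner moves down
        return path[:k] + (0, 1) + path[k + 2:]
    return path

def draw_flip(rng, n=4):
    """Uniform position k in 0..n-2 and uniform direction."""
    return (rng.randrange(n - 1), rng.choice(["up", "down"]))

def monotone_cftp(bottom, top, apply_move, draw_move, rng):
    """Monotone CFTP: only the minimum and maximum state are simulated."""
    moves = []                                  # moves[j] is used at time -(j+1)
    T = 1
    while True:
        while len(moves) < T:
            moves.append(draw_move(rng))        # extend further into the past
        lo, hi = bottom, top
        for j in range(T - 1, -1, -1):          # run from time -T up to time 0
            lo = apply_move(lo, moves[j])
            hi = apply_move(hi, moves[j])
        if lo == hi:                            # everything in between coalesced too
            return lo
        T *= 2

bottom, top = (0, 0, 1, 1), (1, 1, 0, 0)        # lowest and highest path in a 2 x 2 grid
path = monotone_cftp(bottom, top, flip, draw_flip, random.Random(0))
```

Moves already drawn are kept and reused when `T` doubles; drawing fresh randomness for the recent past on each attempt would bias the output.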

  29. Topics
      • Markov Chain Monte Carlo
      • Coupling and CFTP
      • Distributive Lattices
      • α-Orientations and Heights
      • Block Coupling for Heights

  30. Distributive Lattices
      Fact. $L$ is a finite distributive lattice $\iff$ there is a poset $P$ such that $L$ is isomorphic to the inclusion order on downsets of $P$.
      [Figure: a poset $P$ on the elements 1, ..., 6 and its lattice of downsets $L_P$]
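The elements of $L_P$ can be listed by brute force for a small poset. A minimal sketch follows; the three-element poset (1 < 3 and 2 < 3) is a hypothetical example, not the poset from the slide's figure.

```python
from itertools import combinations

def downsets(elements, less_than):
    """All downsets of a poset; less_than holds the pairs (a, b) with a < b."""
    result = []
    for r in range(len(elements) + 1):
        for S in combinations(elements, r):
            S = frozenset(S)
            # S is a downset if b in S and a < b always force a in S.
            if all(a in S for (a, b) in less_than if b in S):
                result.append(S)
    return result

# Hypothetical poset: 3 lies above both 1 and 2.
elements = [1, 2, 3]
less_than = {(1, 3), (2, 3)}
lattice = downsets(elements, less_than)
```

Unions and intersections of downsets are again downsets, which is why the inclusion order on them is a distributive lattice.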

  31. Markov Chains on Distributive Lattices
      A natural Markov chain on $L_P$ (lattice walk). Identify a state with a downset $D$:
      • choose $x \in P$
      • choose $s \in \{\uparrow, \downarrow\}$
      • depending on $s$, move to $D + x$ or $D - x$ (if possible)
      Fact. The chain is ergodic and symmetric, i.e., $\pi$ is uniform.

  33. Monotone Coupling on Distributive Lattices
      The coupling family $\mathcal{F}$:
      $f_{x,s}$: use element $x$ and direction $s$ for all $D$.
      It is monotone!
      $\implies$ uniform sampling from distributive lattices is easy.
      Q: Is it fast (rapidly mixing)?
      A: In most cases not.
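Combining the lattice walk with monotone CFTP gives an exact uniform sampler for downsets. The sketch below makes several assumptions: the doubling scheme for going into the past, the encoding of the order as cover pairs `less_than`, the example poset (1 < 3, 2 < 3), and all function names.

```python
import random

def apply_move(D, move, less_than):
    """Apply f_{x,s} to the downset D (a frozenset); stay put if not possible."""
    x, s = move
    if s == "up" and x not in D:
        # D + x is a downset iff everything below x is already in D.
        if all(a in D for (a, b) in less_than if b == x):
            return D | {x}
    if s == "down" and x in D:
        # D - x is a downset iff nothing above x is in D.
        if all(b not in D for (a, b) in less_than if a == x):
            return D - {x}
    return D

def sample_downset(elements, less_than, rng):
    """Monotone CFTP started from the empty downset and from all of P."""
    moves, T = [], 1                            # moves[j] is used at time -(j+1)
    while True:
        while len(moves) < T:
            moves.append((rng.choice(elements), rng.choice(["up", "down"])))
        lo, hi = frozenset(), frozenset(elements)
        for move in reversed(moves):            # oldest move first
            lo = apply_move(lo, move, less_than)
            hi = apply_move(hi, move, less_than)
        if lo == hi:
            return lo
        T *= 2

# Hypothetical poset: 3 lies above both 1 and 2; it has 5 downsets.
elements = [1, 2, 3]
less_than = {(1, 3), (2, 3)}
D = sample_downset(elements, less_than, random.Random(0))
```

The sampler is exact whenever it terminates; as the slide notes, the open question in general is how long that takes.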

  34. Slow Mixing
      • On distributive lattices based on Kleitman-Rothschild posets, the mixing time of the lattice walk is exponential.
      • The mixing time of the lattice walk is exponential for random bipartite graphs with degrees $\ge 6$ (Dyer, Frieze and Jerrum).

  35. Fast Mixing
      • The mixing time of the lattice walk is polynomial for random bipartite graphs with max-degree $\le 4$ (Dyer and Greenhill).
      In several situations where planarity plays a role, rapid mixing could be proven:
      • Monotone paths in the grid.
      • Lozenge tilings of an $a \times b \times c$ hexagon.
      • Domino tilings of a rectangle.

  36. Topics
      • Markov Chain Monte Carlo
      • Coupling and CFTP
      • Distributive Lattices
      • α-Orientations and Heights
      • Block Coupling for Heights

  37. α-Orientations
      Definition. Given $G = (V, E)$ and $\alpha : V \to \mathbb{N}$, an α-orientation of $G$ is an orientation with $\mathrm{outdeg}(v) = \alpha(v)$ for all $v$.
      Example. Two orientations for the same $\alpha$.
      [Figure: a graph shown with two different α-orientations]
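Checking the definition is a matter of counting outgoing edges. In the sketch below the graph (a 4-cycle), the function $\alpha \equiv 1$, and the function names are hypothetical; the example graph from the slide is not reproduced.

```python
def outdegrees(vertices, oriented_edges):
    """Count outgoing edges per vertex; each edge is a (tail, head) pair."""
    out = {v: 0 for v in vertices}
    for tail, head in oriented_edges:
        out[tail] += 1
    return out

def is_alpha_orientation(vertices, oriented_edges, alpha):
    """True iff outdeg(v) == alpha[v] for every vertex v."""
    return outdegrees(vertices, oriented_edges) == alpha

# Hypothetical example: the 4-cycle with alpha(v) = 1 for all v.
vertices = [0, 1, 2, 3]
alpha = {v: 1 for v in vertices}
clockwise = [(0, 1), (1, 2), (2, 3), (3, 0)]
counterclockwise = [(head, tail) for tail, head in clockwise]
not_alpha = [(0, 1), (2, 1), (2, 3), (3, 0)]    # vertex 1 has outdegree 0
```

Reversing a directed cycle leaves every outdegree unchanged, so `clockwise` and `counterclockwise` are two orientations for the same $\alpha$.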

  38. Potentials and Lattice Structure
      Definition. An α-potential for $G$ is a mapping $\wp : \mathrm{Faces}(G) \to \mathbb{Z}$ such that $\wp(\text{outer}) = 0$ and
      • $|\wp(C) - \wp(C')| \le 1$ if $C$ and $C'$ share an edge $e$,
      • $\wp(C_{l(e)}) \le \wp(C_{r(e)})$ for all $e$ (the faces left and right of $e$), relative to some fixed α-orientation.
      Lemma. There is a bijection between α-potentials and α-orientations.
