Sampling from distributive lattices – the Markov chain approach
Graduiertenkolleg MDS, TU Berlin, April 20, 2009
Stefan Felsner, Technische Universität Berlin, felsner@math.tu-berlin.de

Topics
• Markov Chain Monte Carlo
• Coupling and CFTP
• Distributive Lattices
• α-Orientations and Heights
• Block Coupling for Heights

The Sampling Problem
• $\Omega$ a (large) finite set
• $\mu : \Omega \to [0, 1]$ a probability distribution
Problem. Sample from $\Omega$ according to $\mu$, i.e., $\Pr(\text{output} = \omega) = \mu(\omega)$.
There are many hard instances of the sampling problem.
Relaxation: approximate sampling, i.e., $\Pr(\text{output} = \omega) = \tilde{\mu}(\omega)$ for some $\tilde{\mu} \approx \mu$.

Applications of Sampling
• Get hold of typical examples from $\Omega$.
• Approximate counting.

Preliminaries on Markov Chains
$M$ transition matrix
• size $\Omega \times \Omega$
• entries $\in [0, 1]$
• row sums $= 1$ (stochastic)
Intuition: $M$ specifies a random walk.
[Figure: an example transition matrix $M$ and its weighted transition graph on the states a, b, c.]

Instance of a Markov Chain
$(X_0, X_1, X_2, \ldots, X_r, \ldots)$ an instance of $M$
• $X_i$ random variable with values in $\Omega$
• $\Pr(X_{i+1} = x \mid X_i = s) = M(s, x)$
Proposition. The probability distribution of $X_t$ is $\mu_t$ with $\mu_t = \mu_0 M^t$.

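Ergodic Markov Chains
$M$ is ergodic (i.e., irreducible and aperiodic)
$\Longrightarrow$ multiplicity of eigenvalue 1 is one
$\Longrightarrow$ unique $\pi$ with $\pi = \pi M$.
Fundamental Theorem. $M$ ergodic $\Longrightarrow$ $\lim_{t \to \infty} \mu_0 M^t = \pi$.
$M$ symmetric and ergodic $\Longrightarrow$ $M^T \mathbf{1}^T = M \mathbf{1}^T = \mathbf{1}^T$, hence $\mathbf{1} M = \mathbf{1}$ $\Longrightarrow$ $\pi$ is the uniform distribution.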
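The Fundamental Theorem can be checked numerically on a toy chain. The sketch below iterates $\mu_{t+1} = \mu_t M$ for a hypothetical symmetric, ergodic 3-state matrix (chosen for illustration; it is not the matrix from the slides) and arrives at the uniform distribution. The name `step` is an assumption of this sketch.

```python
def step(mu, M):
    """One step on distributions: return the row vector mu * M."""
    n = len(M)
    return [sum(mu[s] * M[s][x] for s in range(n)) for x in range(n)]

# Hypothetical symmetric stochastic matrix (illustrative only).
M = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]

mu = [1.0, 0.0, 0.0]  # mu_0: start deterministically in state 0
for _ in range(50):   # mu now approximates mu_0 * M^50
    mu = step(mu, M)

print(mu)  # each entry is close to 1/3
```

Since this $M$ is symmetric and ergodic, the limit is uniform regardless of the starting distribution.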

Example: Linear Extensions
A Markov chain for linear extensions
$L_t = x_1, x_2, \ldots, x_n$ the state at time $t$.
• Choose $i \in \{1, 2, \ldots, n-1\}$ uniformly.
• If $x_i$ and $x_{i+1}$ are incomparable, then $L_{t+1} = x_1, x_2, \ldots, x_{i-1}, x_{i+1}, x_i, x_{i+2}, \ldots, x_n$; otherwise $L_{t+1} = L_t$.
Proposition. The chain is ergodic and symmetric.
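A minimal sketch of this chain follows, assuming the order is given as a set `less_than` of pairs `(a, b)` with $a < b$. The example poset (two disjoint chains $1 < 2$ and $3 < 4$) and the names `chain_step` and `is_linear_extension` are assumptions made for illustration.

```python
import random

def is_linear_extension(L, less_than):
    """Check that a comes before b in L for every relation a < b."""
    pos = {x: k for k, x in enumerate(L)}
    return all(pos[a] < pos[b] for (a, b) in less_than)

def chain_step(L, less_than, rng):
    """Pick an adjacent pair uniformly; swap it if the two elements are incomparable."""
    i = rng.randrange(len(L) - 1)      # 0-based position, n - 1 choices
    a, b = L[i], L[i + 1]
    if (a, b) in less_than:            # comparable: stay in the current state
        return L
    return L[:i] + (b, a) + L[i + 2:]  # incomparable: transpose x_i and x_{i+1}

# Hypothetical width 2 order: two chains 1 < 2 and 3 < 4.
LESS_THAN = {(1, 2), (3, 4)}

rng = random.Random(0)
state = (1, 2, 3, 4)
visited = {state}
for _ in range(1000):
    state = chain_step(state, LESS_THAN, rng)
    visited.add(state)
```

Every visited state is a linear extension, and this example order has exactly $\binom{4}{2} = 6$ of them.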

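Measuring Convergence
Variation distance
$\|\mu - \mu'\|_{VD} := \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \mu'(x)|$
Lemma. $\|\mu - \mu'\|_{VD} = \max_{A \subset \Omega} (\mu(A) - \mu'(A))$
[Figure: the graphs of $\mu$ and $\mu'$ with the regions $A$ and $B$ between them.] Both distributions have total mass 1, so the regions $A$ and $B$ have equal area.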
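The lemma can be verified by brute force on a small example. The sketch below computes the variation distance from the definition and compares it with the maximum of $\mu(A) - \mu'(A)$ over all subsets $A$. The two distributions and the function names are hypothetical choices for illustration.

```python
from itertools import combinations

def tv_distance(mu, nu):
    """Variation distance: half the L1 distance between mu and nu."""
    return 0.5 * sum(abs(mu[x] - nu[x]) for x in mu)

def tv_by_max(mu, nu):
    """Brute-force max over all subsets A of mu(A) - nu(A)."""
    xs = list(mu)
    best = 0.0  # the empty set gives 0
    for r in range(1, len(xs) + 1):
        for A in combinations(xs, r):
            best = max(best, sum(mu[x] - nu[x] for x in A))
    return best

# Hypothetical distributions on a three-element set.
mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}

d_def = tv_distance(mu, nu)
d_max = tv_by_max(mu, nu)
```

Here the maximum is attained at $A = \{a\}$, the set where $\mu$ exceeds $\mu'$, and both values equal 0.3 up to rounding.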

Mixing Time
$\mu_t^x = \delta_x M^t$ the distribution after $t$ steps starting in $x$
$\Delta(t) := \max(\|\mu_t^x - \pi\|_{VD} : x \in \Omega)$
$\tau(\varepsilon) = \min(t : \Delta(t) \le \varepsilon)$
• $\tau(\varepsilon)$ is the mixing time.
• $M$ is rapidly mixing $\iff$ $\tau(\varepsilon)$ is a polynomial function of the problem size and $\log(\varepsilon^{-1})$.
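For a chain small enough to store explicitly, $\tau(\varepsilon)$ can be computed directly from these definitions. The sketch below does this for a hypothetical 3-state matrix; the names `mixing_time` and `t_max` are assumptions of this example. Brute force of this kind is only feasible for tiny $\Omega$.

```python
def tv(mu, nu):
    """Variation distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

def step(mu, M):
    """Return the row vector mu * M."""
    n = len(M)
    return [sum(mu[s] * M[s][x] for s in range(n)) for x in range(n)]

def mixing_time(M, pi, eps, t_max=10_000):
    """Smallest t with Delta(t) <= eps, where Delta(t) = max_x ||delta_x M^t - pi||."""
    n = len(M)
    # rows[x] holds delta_x M^t for the current t
    rows = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    for t in range(t_max + 1):
        if max(tv(r, pi) for r in rows) <= eps:
            return t
        rows = [step(r, M) for r in rows]
    raise ValueError("eps not reached within t_max steps")

# Hypothetical symmetric ergodic chain; its stationary distribution is uniform.
M = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]
pi = [1.0 / 3.0] * 3

tau = mixing_time(M, pi, 0.01)
```

For this matrix $\Delta(t) = \frac{2}{3} \cdot 4^{-t}$, so $\tau(0.01) = 4$.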

Mixing Time and Eigenvalues
• $M$ stochastic $\Longrightarrow$ $|\lambda| \le 1$ for all eigenvalues $\lambda$.
• $M$ lazy (i.e., $m_{i,i} \ge 1/2$ for all $i$) $\Longrightarrow$ $\lambda \ge 0$ for all eigenvalues $\lambda$.
• $M$ ergodic $\Longrightarrow$ multiplicity of eigenvalue 1 is one.
• $M$ symmetric $\Longrightarrow$ orthonormal basis (ONB) of eigenvectors.
Proposition. The mixing time, i.e., the convergence rate to $\pi$, depends on the second largest eigenvalue.

Coupling for Distributions
$\mu$, $\nu$ distributions on $\Omega$.
A distribution $\omega$ on $\Omega \times \Omega$ is a coupling of $\mu$ and $\nu$ $\iff$ $\omega$ has $\mu$ and $\nu$ as marginals, i.e.,
$\sum_y \omega(x, y) = \mu(x)$ for all $x$ and $\sum_x \omega(x, y) = \nu(y)$ for all $y$.
Coupling Lemma. If $\omega$ is a coupling of $\mu$ and $\nu$ and $(X, Y)$ is chosen from $\omega$, then $\|\mu - \nu\|_{VD} \le \Pr(X \ne Y)$.

Coupling for Distributions
Lemma. $\|\mu - \nu\|_{VD} \le \Pr(X \ne Y)$.
Proof. We use
$\mu(z) = \sum_y \omega(z, y) \ge \omega(z, z)$ and $\nu(z) = \sum_x \omega(x, z) \ge \omega(z, z)$.
$\Pr(X \ne Y) = 1 - \Pr(X = Y)$
$= \sum_z \mu(z) - \sum_z \omega(z, z)$
$\ge \sum_z \mu(z) - \sum_z \min(\mu(z), \nu(z))$
$= \sum_{z : \nu(z) \le \mu(z)} (\mu(z) - \nu(z))$
$= \max_{A \subset \Omega} (\mu(A) - \nu(A)) = \|\mu - \nu\|_{VD}$
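To make the Coupling Lemma concrete, the sketch below builds two couplings of a hypothetical pair of distributions: the independent coupling and a coupling that puts mass $\min(\mu(z), \nu(z))$ on the diagonal. The second construction is a standard one, not taken from the slides, and all function names are illustrative. It shows that the bound can hold with equality.

```python
def tv_distance(mu, nu):
    """Variation distance: half the L1 distance between mu and nu."""
    return 0.5 * sum(abs(mu[x] - nu[x]) for x in mu)

def independent_coupling(mu, nu):
    """omega(x, y) = mu(x) * nu(y): always a coupling, usually far from optimal."""
    return {(x, y): mu[x] * nu[y] for x in mu for y in nu}

def maximal_coupling(mu, nu):
    """Put min(mu, nu) on the diagonal; spread the leftover mass off the diagonal."""
    d = tv_distance(mu, nu)
    omega = {(z, z): min(mu[z], nu[z]) for z in mu}
    if d > 0:
        for x in mu:
            for y in nu:
                if x != y:
                    rx = max(mu[x] - nu[x], 0.0)  # excess of mu at x
                    ry = max(nu[y] - mu[y], 0.0)  # excess of nu at y
                    omega[(x, y)] = rx * ry / d
    return omega

def prob_differ(omega):
    """Pr(X != Y) for (X, Y) drawn from omega."""
    return sum(p for (x, y), p in omega.items() if x != y)

# Hypothetical distributions on a three-element set.
mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}

d = tv_distance(mu, nu)
p_indep = prob_differ(independent_coupling(mu, nu))
p_max = prob_differ(maximal_coupling(mu, nu))
```

In this example `p_indep` is 0.71 while `p_max` equals the variation distance 0.3, up to rounding.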

Coupling for Markov Chains
A coupling for $M$ is a sequence $(Z_0, Z_1, Z_2, \ldots)$ with $Z_i = (X_i, Y_i)$ such that $(X_0, X_1, X_2, \ldots)$ and $(Y_0, Y_1, Y_2, \ldots)$ are instances of $M$.
In particular
$\Pr(X_{i+1} = x' \mid Z_i = (x, y)) = \Pr(X_{i+1} = x' \mid X_i = x) = M(x, x')$

Coupling and Mixing Times
$Z_i = (X_i, Y_i)$ a coupling for $M$.
Theorem [Döblin 1938]. If $\Pr(X_T \ne Y_T \mid Z_0 = (x_0, y_0)) < \varepsilon$ for every initial $(x_0, y_0)$ and $T$ steps $\Longrightarrow$ $\tau(\varepsilon) \le T$.
Proof.
• Choose $y_0$ from the stationary distribution $\pi$.
• $Y_t$ is in stationary distribution $\pi$ for all $t$.
• $X_t$ is in distribution $\mu_t^{x_0}$.
$\Pr(X_T \ne Y_T \mid Z_0 = (x_0, y_0)) < \varepsilon$
$\Longrightarrow$ (Coupling Lemma) $\max_x \|\mu_T^x - \pi\|_{VD} < \varepsilon$
$\Longrightarrow$ (definition of $\tau$) $\tau(\varepsilon) \le T$

Example: Linear Extensions of Width 2 Orders
[Figure: a width 2 order on the elements 1 to 8 and a linear extension drawn as a lattice path.]
Linear extensions are paths.
The Markov chain and the coupling
• Choose position $k$ and $s \in \{\uparrow, \downarrow\}$.
• Flip the path at position $k$ in direction $s$ (if possible).

Linear Extensions of Width 2 Orders: the Analysis
• $\mathrm{dist}(X, Y)$ = area between paths $\le n^2$
• $E(\mathrm{dist}(X_{i+1}, Y_{i+1})) \le \mathrm{dist}(X_i, Y_i)$
The distance is a projection to a random walk on the line
$\Longrightarrow$ expected coupling time $O(n^4 \log n)$
$\Longrightarrow$ $\tau(\varepsilon) \in O(n^4 \log n \log \varepsilon^{-1})$.

Coupling From the Past
$M$ a Markov chain on $\Omega$
$\mathcal{F}$ a family of maps $f : \Omega \to \Omega$ such that for random $f \in \mathcal{F}$: $\Pr(f(x) = x') = M(x, x')$
Coupling-FTP
    $F \leftarrow \mathrm{id}_\Omega$
    repeat
        choose $f \in \mathcal{F}$ at random
        $F \leftarrow F \circ f$
    until $F$ is a constant map
    return $F(x)$
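The Coupling-FTP loop translates almost line by line into code when $\Omega$ is small enough to tabulate $F$. The sketch below is one such rendering under stated assumptions: the chain is a hypothetical walk on $\{0, 1, 2, 3\}$ that moves up or down with probability 1/2 and stays put at the ends, and the names `cftp` and `random_map` are chosen for illustration.

```python
import random
from collections import Counter

def cftp(states, random_map, rng):
    """Coupling from the past with the composed map F stored as a table."""
    F = {x: x for x in states}              # F <- identity on Omega
    while len(set(F.values())) > 1:         # until F is a constant map
        f = random_map(rng)                 # choose f at random
        F = {x: F[f(x)] for x in states}    # F <- F o f (f is applied first)
    return next(iter(F.values()))           # F(x), the same for every x

N = 4
STATES = range(N)

def random_map(rng):
    """Hypothetical family: shift every state by d = +1 or -1, clamped to [0, N-1]."""
    d = rng.choice((-1, 1))
    return lambda x: min(max(x + d, 0), N - 1)

rng = random.Random(1)
samples = [cftp(STATES, random_map, rng) for _ in range(4000)]
counts = Counter(samples)
```

The order of composition matters: each new map is applied before the maps chosen earlier, which is what pushes the start further into the past. This example chain is symmetric, so the counts should be close to 1000 per state.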

Coupling From the Past
Theorem. The state returned by Coupling-FTP is exactly (!) in the stationary distribution.

Monotone Coupling From the Past: An Example
The problem with CFTP is the need for functions $f$ on $\Omega$.
Order relation $\le_\Omega$ on $\Omega$ with $\hat{0}$ and $\hat{1}$
• $x \le_\Omega x' \Longrightarrow f(x) \le_\Omega f(x')$ for all $f \in \mathcal{F}$
Example:
Objects: lattice paths in a grid
$\mathcal{F} = \{ f_{k,s} : \text{apply position } k \text{ and direction } s \text{ to all paths} \}$
This family is monotone!

Distributive Lattices
Fact. $L$ is a finite distributive lattice $\iff$ there is a poset $P$ such that $L$ is isomorphic to the inclusion order on downsets of $P$.
[Figure: a poset $P$ on the elements 1 to 6 and its lattice of downsets $L_P$.]

Markov Chains on Distributive Lattices
A natural Markov chain on $L_P$ (lattice walk):
Identify state with downset $D$
• choose $x \in P$, choose $s \in \{\uparrow, \downarrow\}$
• depending on $s$ move to $D + x$ or $D - x$ (if possible)
Fact. The chain is ergodic and symmetric, i.e., $\pi$ is uniform.
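One step of the lattice walk can be sketched as follows, assuming the poset is given by its elements and a set `covers` of pairs `(a, b)` with $a < b$. The example poset ($1 < 3$, $2 < 3$) and the names `apply_move`, `lattice_walk_step` and `is_downset` are assumptions of this sketch.

```python
import random

def is_downset(D, covers):
    """D is a downset if b in D implies a in D for every relation a < b."""
    return all(a in D for (a, b) in covers if b in D)

def apply_move(D, x, up, covers):
    """Move to D + x (up) or D - x (down) if the result is a downset, else stay."""
    if up:
        # D + x is a downset iff every element below x is already in D
        if x not in D and all(a in D for (a, b) in covers if b == x):
            return D | {x}
    else:
        # D - x is a downset iff no element above x is in D
        if x in D and not any(b in D for (a, b) in covers if a == x):
            return D - {x}
    return D

def lattice_walk_step(D, elements, covers, rng):
    """Choose x in P and a direction uniformly at random, then apply the move."""
    x = rng.choice(elements)
    up = rng.random() < 0.5
    return apply_move(D, x, up, covers)

# Hypothetical poset: 1 < 3 and 2 < 3; its downsets are {}, {1}, {2}, {1,2}, {1,2,3}.
ELEMENTS = [1, 2, 3]
COVERS = {(1, 3), (2, 3)}

rng = random.Random(0)
D = frozenset()
visited = {D}
for _ in range(500):
    D = lattice_walk_step(D, ELEMENTS, COVERS, rng)
    visited.add(D)
```

A move from $D$ to $D + x$ and the reverse move both have probability $1/(2|P|)$, which is the symmetry stated in the Fact.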

Monotone Coupling on Distributive Lattices
The coupling family $\mathcal{F}$:
$f_{x,s}$: use element $x$ and direction $s$ for all $D$.
This family is monotone!
$\Longrightarrow$ uniform sampling from distributive lattices is easy.
Q: Is it fast (rapidly mixing)?
A: In most cases not.
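Monotonicity means that only the images of the bottom element $\emptyset$ and the top element $P$ have to be tracked: once they agree, the composed map is constant. The sketch below uses the common variant that doubles the look-back time and reuses the earlier random choices. That variant is an assumption of this example rather than something spelled out on the slides, and the poset and all names are hypothetical.

```python
import random
from collections import Counter

def apply_move(D, x, up, covers):
    """The map f_{x,s}: add x (up) or remove x (down) if the result is a downset."""
    if up:
        if x not in D and all(a in D for (a, b) in covers if b == x):
            return D | {x}
    else:
        if x in D and not any(b in D for (a, b) in covers if a == x):
            return D - {x}
    return D

def monotone_cftp(elements, covers, rng):
    """Sample a downset uniformly by tracking only the bottom and top states."""
    moves = []   # moves[k] is the random choice (x, s) used at time -(k + 1)
    T = 1
    while True:
        while len(moves) < T:          # extend further into the past, keep old choices
            moves.append((rng.choice(elements), rng.random() < 0.5))
        lo, hi = frozenset(), frozenset(elements)
        for x, up in reversed(moves[:T]):   # run from time -T up to time -1
            lo = apply_move(lo, x, up, covers)
            hi = apply_move(hi, x, up, covers)
        if lo == hi:                   # bottom and top coalesced: the map is constant
            return lo
        T *= 2

# Hypothetical poset: 1 < 3 and 2 < 3, with five downsets.
ELEMENTS = [1, 2, 3]
COVERS = {(1, 3), (2, 3)}

rng = random.Random(2)
samples = [monotone_cftp(ELEMENTS, COVERS, rng) for _ in range(3000)]
counts = Counter(samples)
```

With five downsets and 3000 samples, each downset should appear roughly 600 times.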

Slow Mixing
• On distributive lattices based on Kleitman-Rothschild posets the mixing time of the lattice walk is exponential.
• The mixing time of the lattice walk is exponential for random bipartite graphs with degrees $\ge 6$. (Dyer, Frieze and Jerrum)

Fast Mixing
• The mixing time of the lattice walk is polynomial for random bipartite graphs with max-degree $\le 4$. (Dyer and Greenhill)
In several situations where planarity plays a role, rapid mixing could be proven:
• Monotone paths in the grid.
• Lozenge tilings of an $a \times b \times c$ hexagon.
• Domino tilings of a rectangle.

α-Orientations
Definition. Given $G = (V, E)$ and $\alpha : V \to \mathbb{N}$. An $\alpha$-orientation of $G$ is an orientation with $\mathrm{outdeg}(v) = \alpha(v)$ for all $v$.
Example. Two orientations for the same $\alpha$.
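The definition is easy to check mechanically. The sketch below represents an orientation as a list of arcs `(u, v)` meaning $u \to v$ and tests the out-degree condition. The 4-cycle with $\alpha \equiv 1$ is a hypothetical example (not the one pictured on the slide) that admits exactly two $\alpha$-orientations: clockwise and counterclockwise.

```python
def is_alpha_orientation(arcs, alpha):
    """Check outdeg(v) == alpha(v) for every vertex v; arcs are pairs (tail, head)."""
    outdeg = {v: 0 for v in alpha}
    for (u, _v) in arcs:
        outdeg[u] += 1
    return outdeg == alpha

# Hypothetical example: the 4-cycle a-b-c-d with alpha(v) = 1 for all v.
ALPHA = {"a": 1, "b": 1, "c": 1, "d": 1}

clockwise = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
counterclockwise = [(v, u) for (u, v) in clockwise]
# Same underlying graph, but vertex a has out-degree 2 and b has out-degree 0.
not_alpha = [("a", "b"), ("c", "b"), ("c", "d"), ("a", "d")]

results = [is_alpha_orientation(o, ALPHA)
           for o in (clockwise, counterclockwise, not_alpha)]
```

Reversing the directed cycle turns one $\alpha$-orientation into the other without changing any out-degree.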

Potentials and Lattice Structure
Definition. An $\alpha$-potential for $G$ is a mapping $\wp : \mathrm{Faces}(G) \to \mathbb{Z}$ such that $\wp(\text{outer}) = 0$ and
• $|\wp(C) - \wp(C')| \le 1$, if $C$ and $C'$ share an edge $e$.
• $\wp(C_{l(e)}) \le \wp(C_{r(e)})$ for all $e$ relative to some fixed $\alpha$-orientation.
Lemma. There is a bijection between $\alpha$-potentials and $\alpha$-orientations.
