

  1. Modern Discrete Probability VI – Spectral Techniques Background
Sébastien Roch, UW–Madison Mathematics
December 1, 2014

  2. Outline
1. Review
2. Bounding the mixing time via the spectral gap
3. Applications: random walk on cycle and hypercube
4. Infinite networks

  3. Mixing time I

Theorem (Convergence to stationarity)
Consider a finite state space V. Suppose the transition matrix P is irreducible, aperiodic and has stationary distribution π. Then, for all x, y, P^t(x, y) → π(y) as t → +∞.

For probability measures µ, ν on V, let their total variation distance be
‖µ − ν‖_TV := sup_{A ⊆ V} |µ(A) − ν(A)|.

Definition (Mixing time)
The mixing time is t_mix(ε) := min{t ≥ 0 : d(t) ≤ ε}, where d(t) := max_{x ∈ V} ‖P^t(x, ·) − π(·)‖_TV.
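As a numerical illustration of these definitions (a minimal sketch assuming NumPy; the lazy walk on a 4-cycle is a toy example of our own, not one from the slides), d(t) and t_mix(ε) can be computed directly from powers of P:

```python
import numpy as np

def tv_distance(mu, nu):
    """Total variation distance: half the L1 distance between the measures."""
    return 0.5 * np.abs(mu - nu).sum()

def mixing_time(P, pi, eps=0.25, t_max=10_000):
    """Smallest t with d(t) = max_x ||P^t(x,.) - pi||_TV <= eps."""
    Pt = np.eye(len(pi))
    for t in range(t_max + 1):
        d_t = max(tv_distance(Pt[x], pi) for x in range(len(pi)))
        if d_t <= eps:
            return t
        Pt = Pt @ P
    raise RuntimeError("did not mix within t_max steps")

# Lazy simple random walk on a 4-cycle (laziness ensures aperiodicity).
n = 4
P = np.zeros((n, n))
for x in range(n):
    P[x, x] = 0.5
    P[x, (x - 1) % n] = 0.25
    P[x, (x + 1) % n] = 0.25
pi = np.full(n, 1.0 / n)  # uniform stationary distribution

print(mixing_time(P, pi, eps=0.25))
```

Brute-force matrix powers are only viable for small state spaces; the point of the spectral bounds later in the deck is to avoid exactly this computation.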

  4. Mixing time II

Definition (Separation distance)
The separation distance is defined as
s_x(t) := max_{y ∈ V} [1 − P^t(x, y)/π(y)],
and we let s(t) := max_{x ∈ V} s_x(t).

Because both {π(y)} and {P^t(x, y)} are non-negative and sum to 1, we have that s_x(t) ≥ 0.

Lemma (Separation distance v. total variation distance)
d(t) ≤ s(t).
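The lemma can be checked numerically (a sketch assuming NumPy; the lazy walk on a 3-vertex path is a hypothetical example, not from the slides):

```python
import numpy as np

def separation(Pt_row, pi):
    """s_x(t) = max_y (1 - P^t(x,y) / pi(y))."""
    return np.max(1.0 - Pt_row / pi)

def tv(Pt_row, pi):
    """||P^t(x,.) - pi||_TV as half the L1 distance."""
    return 0.5 * np.abs(Pt_row - pi).sum()

# Lazy simple random walk on the path 0-1-2.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.5, 0.25])  # stationary: proportional to degrees (1,2,1)

Pt = np.eye(3)
for t in range(10):
    d_t = max(tv(Pt[x], pi) for x in range(3))
    s_t = max(separation(Pt[x], pi) for x in range(3))
    assert d_t <= s_t + 1e-12  # Lemma: d(t) <= s(t)
    Pt = Pt @ P
```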

  5. Mixing time III

Proof: Because 1 = Σ_y π(y) = Σ_y P^t(x, y),
Σ_{y : P^t(x,y) < π(y)} [π(y) − P^t(x, y)] = Σ_{y : P^t(x,y) ≥ π(y)} [P^t(x, y) − π(y)].
So
‖P^t(x, ·) − π(·)‖_TV = (1/2) Σ_y |π(y) − P^t(x, y)|
= Σ_{y : P^t(x,y) < π(y)} [π(y) − P^t(x, y)]
= Σ_{y : P^t(x,y) < π(y)} π(y) [1 − P^t(x, y)/π(y)]
≤ s_x(t).

  6. Reversible chains

Definition (Reversible chain)
A transition matrix P is reversible w.r.t. a measure η if
η(x) P(x, y) = η(y) P(y, x) for all x, y ∈ V.
By summing over y, such a measure is necessarily stationary.
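The detailed-balance condition is easy to check and exploit in code (a sketch assuming NumPy; the 3-state birth-death chain is a hypothetical example, not from the slides). For a birth-death chain, solving the detailed-balance equations along the path yields a reversing measure:

```python
import numpy as np

def is_reversible(P, eta, tol=1e-12):
    """Check detailed balance: eta(x) P(x,y) == eta(y) P(y,x) for all x, y."""
    F = eta[:, None] * P  # flow matrix F(x,y) = eta(x) P(x,y)
    return np.allclose(F, F.T, atol=tol)

# A 3-state birth-death chain (only nearest-neighbor moves).
P = np.array([[0.7, 0.3, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.6, 0.4]])

# Solve detailed balance along the path: eta(y) = eta(x) P(x,y) / P(y,x).
eta = np.array([1.0, 0.0, 0.0])
eta[1] = eta[0] * P[0, 1] / P[1, 0]
eta[2] = eta[1] * P[1, 2] / P[2, 1]
eta /= eta.sum()  # normalize to a probability distribution

assert is_reversible(P, eta)
# Summing detailed balance over y shows eta is stationary:
assert np.allclose(eta @ P, eta)
```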

  7. Example I

Recall:

Definition (Random walk on a graph)
Let G = (V, E) be a finite or countable, locally finite graph. Simple random walk on G is the Markov chain on V, started at an arbitrary vertex, which at each time picks a uniformly chosen neighbor of the current state.

Let (X_t) be simple random walk on a connected graph G. Then (X_t) is reversible w.r.t. η(v) := δ(v), where δ(v) is the degree of vertex v.

  8. Example II

Definition (Random walk on a network)
Let G = (V, E) be a finite or countable, locally finite graph. Let c : E → ℝ_+ be a positive edge weight function on G. We call N = (G, c) a network. Random walk on N is the Markov chain on V, started at an arbitrary vertex, which at each time picks a neighbor of the current state proportionally to the weight of the corresponding edge.

Any countable, reversible Markov chain can be seen as a random walk on a network (not necessarily locally finite) by setting c(e) := π(x) P(x, y) = π(y) P(y, x) for all e = {x, y} ∈ E.

Let (X_t) be random walk on a network N = (G, c). Then (X_t) is reversible w.r.t. η(v) := c(v), where c(v) := Σ_{x ∼ v} c(v, x).
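Building the walk from the weights makes the reversibility claim transparent (a sketch assuming NumPy; the weighted triangle is a hypothetical example, not from the slides). Simple random walk is the special case c ≡ 1, recovering η(v) = δ(v):

```python
import numpy as np

# A weighted triangle network: c is the symmetric edge-weight matrix.
c = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])

c_v = c.sum(axis=1)       # c(v) = sum of weights of edges incident to v
P = c / c_v[:, None]      # P(x, y) = c(x, y) / c(x)

# Detailed balance w.r.t. eta(v) = c(v): c(x) P(x,y) = c(x,y) = c(y) P(y,x),
# i.e. the flow matrix c(x) P(x,y) is symmetric.
assert np.allclose(c_v[:, None] * P, (c_v[:, None] * P).T)

# Normalizing eta gives the stationary distribution.
pi = c_v / c_v.sum()
assert np.allclose(pi @ P, pi)
```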

  9. Eigenbasis I

We let n := |V| < +∞. Assume that P is irreducible and reversible w.r.t. its stationary distribution π > 0. Define
⟨f, g⟩_π := Σ_{x ∈ V} π(x) f(x) g(x),   ‖f‖²_π := ⟨f, f⟩_π,
(Pf)(x) := Σ_y P(x, y) f(y).
We let ℓ²(V, π) be the Hilbert space of real-valued functions on V equipped with the inner product ⟨·, ·⟩_π (equivalent to the vector space (ℝⁿ, ⟨·, ·⟩_π)).

Theorem
There is an orthonormal basis of ℓ²(V, π) formed of eigenfunctions {f_j}_{j=1}^n of P with real eigenvalues {λ_j}_{j=1}^n.

  10. Eigenbasis II

Proof: We work over (ℝⁿ, ⟨·, ·⟩_π). Let D_π be the diagonal matrix with π on the diagonal. By reversibility,
M(x, y) := √(π(x)/π(y)) P(x, y) = √(π(y)/π(x)) P(y, x) =: M(y, x).
So M = (M(x, y))_{x,y} = D_π^{1/2} P D_π^{−1/2}, as a symmetric matrix, has real eigenvalues {λ_j}_{j=1}^n with corresponding eigenvectors {φ_j}_{j=1}^n forming an orthonormal basis of ℝⁿ. Define f_j := D_π^{−1/2} φ_j. Then
P f_j = P D_π^{−1/2} φ_j = D_π^{−1/2} D_π^{1/2} P D_π^{−1/2} φ_j = D_π^{−1/2} M φ_j = λ_j D_π^{−1/2} φ_j = λ_j f_j,
and
⟨f_i, f_j⟩_π = ⟨D_π^{−1/2} φ_i, D_π^{−1/2} φ_j⟩_π = Σ_x π(x) [π(x)^{−1/2} φ_i(x)][π(x)^{−1/2} φ_j(x)] = ⟨φ_i, φ_j⟩.
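The symmetrization in the proof translates directly into code (a sketch assuming NumPy; the lazy walk on the 3-vertex path is a hypothetical running example, not from the slides):

```python
import numpy as np

def reversible_eigenbasis(P, pi):
    """Real eigenvalues and eigenfunctions f_j = D_pi^{-1/2} phi_j of a
    reversible P, via the symmetric matrix M = D_pi^{1/2} P D_pi^{-1/2}."""
    d = np.sqrt(pi)
    M = d[:, None] * P / d[None, :]  # M(x,y) = sqrt(pi(x)/pi(y)) P(x,y)
    lam, Phi = np.linalg.eigh(M)     # M symmetric -> real spectrum
    F = Phi / d[:, None]             # column j is f_j = D_pi^{-1/2} phi_j
    return lam, F

# Lazy simple random walk on the path 0-1-2.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.5, 0.25])

lam, F = reversible_eigenbasis(P, pi)
for j in range(3):
    # Eigenfunction property: P f_j = lambda_j f_j
    assert np.allclose(P @ F[:, j], lam[j] * F[:, j])
# Orthonormality in l2(V, pi): <f_i, f_j>_pi = delta_ij
assert np.allclose(F.T @ np.diag(pi) @ F, np.eye(3))
```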

  11. Eigenbasis III

Lemma
For all j ≠ 1, Σ_x π(x) f_j(x) = 0.

Proof: By orthonormality, ⟨f_1, f_j⟩_π = 0. Now use the fact that f_1 ≡ 1.

Let δ_x(y) := 1{x = y}.

Lemma
For all x, y, Σ_{j=1}^n f_j(x) f_j(y) = π(x)^{−1} δ_x(y).

Proof: Using the notation of the theorem, the matrix Φ whose columns are the φ_j's is unitary, so ΦΦ′ = I. That is, Σ_{j=1}^n φ_j(x) φ_j(y) = δ_x(y), or
√(π(x) π(y)) Σ_{j=1}^n f_j(x) f_j(y) = δ_x(y).
Rearranging gives the result.

  12. Eigenbasis IV

Lemma
Let g ∈ ℓ²(V, π). Then g = Σ_{j=1}^n ⟨g, f_j⟩_π f_j.

Proof: By the previous lemma, for all x,
Σ_{j=1}^n ⟨g, f_j⟩_π f_j(x) = Σ_{j=1}^n Σ_y π(y) g(y) f_j(y) f_j(x) = Σ_y π(y) g(y) [π(x)^{−1} δ_x(y)] = g(x).

Lemma
Let g ∈ ℓ²(V, π). Then ‖g‖²_π = Σ_{j=1}^n ⟨g, f_j⟩²_π.

Proof: By the previous lemma,
‖g‖²_π = ‖Σ_{j=1}^n ⟨g, f_j⟩_π f_j‖²_π = ⟨Σ_{i=1}^n ⟨g, f_i⟩_π f_i, Σ_{j=1}^n ⟨g, f_j⟩_π f_j⟩_π = Σ_{i,j=1}^n ⟨g, f_i⟩_π ⟨g, f_j⟩_π ⟨f_i, f_j⟩_π = Σ_{j=1}^n ⟨g, f_j⟩²_π,
where the last equality uses orthonormality of the f_j.
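Both lemmas (the eigenfunction expansion and the Parseval identity) can be verified numerically (a sketch assuming NumPy; the lazy walk on the 3-vertex path and the test function g are hypothetical examples, not from the slides):

```python
import numpy as np

# Eigenbasis of the lazy walk on the path 0-1-2, via symmetrization.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.5, 0.25])
d = np.sqrt(pi)
lam, Phi = np.linalg.eigh(d[:, None] * P / d[None, :])
F = Phi / d[:, None]  # columns are the eigenfunctions f_j

def inner(f, g):
    """<f, g>_pi = sum_x pi(x) f(x) g(x)."""
    return np.sum(pi * f * g)

g = np.array([1.0, -2.0, 3.0])  # an arbitrary test function
coeffs = np.array([inner(g, F[:, j]) for j in range(3)])

# Expansion: g = sum_j <g, f_j>_pi f_j
assert np.allclose(F @ coeffs, g)
# Parseval: ||g||_pi^2 = sum_j <g, f_j>_pi^2
assert np.isclose(inner(g, g), np.sum(coeffs ** 2))
```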

  13. Eigenvalues I

Let P be finite, irreducible and reversible.

Lemma
Any eigenvalue λ of P satisfies |λ| ≤ 1.

Proof: Pf = λf ⟹ |λ| ‖f‖_∞ = ‖Pf‖_∞ = max_x |Σ_y P(x, y) f(y)| ≤ ‖f‖_∞.

We order the eigenvalues 1 ≥ λ_1 ≥ ··· ≥ λ_n ≥ −1. In fact:

Lemma
We have λ_1 = 1 and λ_2 < 1. Also we can take f_1 ≡ 1.

Proof: Because P is stochastic, the all-one vector is a right eigenvector with eigenvalue 1. Any eigenfunction with eigenvalue 1 is P-harmonic. By Corollary 3.22, for a finite, irreducible chain the only harmonic functions are the constant functions, so the eigenspace corresponding to 1 is one-dimensional. Since all eigenvalues are real, we must have λ_2 < 1.
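These spectral facts are easy to confirm on a concrete chain (a sketch assuming NumPy; the lazy walk on the 3-vertex path is a hypothetical example, not from the slides):

```python
import numpy as np

# Lazy simple random walk on the path 0-1-2, with its stationary distribution.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.5, 0.25])

# Spectrum of P = spectrum of the symmetrized M = D^{1/2} P D^{-1/2}.
d = np.sqrt(pi)
lam = np.linalg.eigvalsh(d[:, None] * P / d[None, :])
lam = np.sort(lam)[::-1]  # order as 1 = lam_1 >= ... >= lam_n >= -1

assert np.all(np.abs(lam) <= 1 + 1e-12)  # |lambda| <= 1
assert np.isclose(lam[0], 1.0)           # lam_1 = 1
assert lam[1] < 1.0                      # lam_2 < 1 for an irreducible chain
# f_1 = 1 is an eigenfunction with eigenvalue 1 (P is stochastic):
assert np.allclose(P @ np.ones(3), np.ones(3))
```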
