Complexity Results for MCMC derived from Quantitative Bounds
Jun Yang (joint work with Jeffrey S. Rosenthal)
Department of Statistical Sciences, University of Toronto
SSC 2018, McGill University
MCMC Complexity Bounds (J. Yang) 1
Motivation
◮ Quantitative bounds for MCMC, e.g. drift and minorization (Rosenthal, JASA, 1995).
◮ Big data (high-dimensional setting): "large p and large n" or "large p and small n".
◮ Convergence complexity of MCMC, e.g. "little is known for MCMC complexity" (Yang, Wainwright, and Jordan, Ann. Statist., 2016).
◮ Directly translating the quantitative bounds into complexity bounds is problematic, e.g. Rajaratnam and Sparks (arXiv, 2015): "We therefore hope that one consequence of our work will be to motivate the proposal and development of new ideas analogous to those of Rosenthal that are suitable for high-dimensional settings."
A Realistic MCMC Model
Consider the following example:
$$
\begin{aligned}
Y_i \mid \theta_i &\sim N(\theta_i, 1), & 1 \le i \le n,\\
\theta_i \mid \mu, A &\sim N(\mu, A), & 1 \le i \le n,\\
\mu &\sim \text{flat prior on } \mathbb{R},\\
A &\sim IG(a, b).
\end{aligned}
$$
◮ $n$ observed data points: $(Y_1, \ldots, Y_n)$;
◮ $p = n + 2$ unknowns, with state $x = (A, \mu, \theta_1, \ldots, \theta_n)$;
◮ Posterior distribution: $\pi(\cdot) = \mathcal{L}(A, \mu, \theta_1, \ldots, \theta_n \mid Y_1, \ldots, Y_n)$.
◮ A Gibbs sampler for this model was originally analyzed by Rosenthal in 1996.
◮ Complexity bound obtained by directly translating that analysis: $\Omega(\exp(p))$.
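Tight Complexity Bound for the Gibbs Sampler
Consider the Gibbs sampler which, given $(\theta^{(0)}, A^{(0)})$, draws
$$
\begin{aligned}
\mu^{(1)} &\sim N\left(\bar{\theta}^{(0)}, \frac{A^{(0)}}{n}\right),\\
\theta_i^{(1)} &\sim N\left(\frac{\mu^{(1)} + Y_i A^{(0)}}{1 + A^{(0)}}, \frac{A^{(0)}}{1 + A^{(0)}}\right), \quad 1 \le i \le n,\\
A^{(1)} &\sim IG\left(a + \frac{n - 1}{2},\ b + \frac{1}{2}\sum_{i=1}^{n} \left(\theta_i^{(1)} - \bar{\theta}^{(1)}\right)^2\right).
\end{aligned}
$$
We can show using the new approach:
◮ The mixing time is $O(1)$ if the initial state is chosen as
$$
\theta^{(0)} = \bar{Y}, \qquad A^{(0)} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1} - 1.
$$
◮ The mixing time is $O(\log p)$ if the initial state is not "too bad".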
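The three conditional updates of this Gibbs sampler can be sketched in Python as follows. This is a minimal illustration assuming NumPy; the function name `gibbs_sampler`, its arguments, and the synthetic data (true values $\mu = 0$, $A = 2$) are hypothetical choices for the example, not code from the talk. By default it starts from the initial state $\theta^{(0)} = \bar{Y}$, $A^{(0)} = \sum_i (Y_i - \bar{Y})^2/(n-1) - 1$, and it draws the inverse gamma variate as scale divided by a unit-rate gamma variate.

```python
import numpy as np


def gibbs_sampler(y, a, b, n_iter, rng, theta0=None, A0=None):
    """Hypothetical helper: run the three-step Gibbs sampler for
    Y_i | theta_i ~ N(theta_i, 1), theta_i | mu, A ~ N(mu, A),
    flat prior on mu, A ~ IG(a, b).
    Returns an (n_iter, n + 2) array whose rows are (A, mu, theta_1..theta_n)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    y_bar = y.mean()
    # Default start: every theta_i set to Y-bar (only the mean of theta^(0)
    # enters the first update) and A^(0) = sample variance of Y minus 1.
    if theta0 is None:
        theta = np.full(n, y_bar)
    else:
        theta = np.asarray(theta0, dtype=float).copy()
    A = (np.sum((y - y_bar) ** 2) / (n - 1) - 1.0) if A0 is None else float(A0)
    if A <= 0:
        raise ValueError("A^(0) must be positive; pass A0 explicitly")
    draws = np.empty((n_iter, n + 2))
    for t in range(n_iter):
        # mu ~ N(mean of current theta, A / n)
        mu = rng.normal(theta.mean(), np.sqrt(A / n))
        # theta_i ~ N((mu + Y_i * A) / (1 + A), A / (1 + A)), all i at once
        theta = rng.normal((mu + y * A) / (1.0 + A), np.sqrt(A / (1.0 + A)))
        # A ~ IG(a + (n - 1)/2, b + sum_i (theta_i - theta_bar)^2 / 2);
        # if G ~ Gamma(shape, 1) then scale / G ~ IG(shape, scale).
        shape = a + (n - 1) / 2.0
        scale = b + 0.5 * np.sum((theta - theta.mean()) ** 2)
        A = scale / rng.gamma(shape)
        draws[t, 0], draws[t, 1], draws[t, 2:] = A, mu, theta
    return draws


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    # Synthetic data with hypothetical true values mu = 0, A = 2.
    theta_true = rng.normal(0.0, np.sqrt(2.0), size=n)
    y = rng.normal(theta_true, 1.0)
    draws = gibbs_sampler(y, a=1.0, b=1.0, n_iter=200, rng=rng)
    print("posterior mean of A over last 100 sweeps:", draws[-100:, 0].mean())
```

Since $\operatorname{Var}(Y_i) = A + 1$ under the model, the default $A^{(0)}$ is a moment-based estimate of $A$, which is why it is usually positive for simulated data; the sketch raises an error rather than guessing when it is not.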
Drift and Minorization
Drift Condition
For some function $f : \mathcal{X} \to \mathbb{R}_+$, some $0 < \lambda < 1$, and $b < \infty$,
$$
E[f(X_1) \mid X_0 = x] \le \lambda f(x) + b, \quad \forall x \in \mathcal{X}.
$$
The KEY to a good complexity bound is that $b$ and $\lambda$ have small complexity order.
Generalized Drift Condition (Y. and Rosenthal, 2017)
Let $R' \subseteq \mathcal{X}$ be a large set; the function $f(\cdot)$ satisfies
$$
E[f(X_1) \mid X_0 = x, X_1 \in R'] \le \lambda f(x) + b, \quad \forall x \in R'.
$$
Compared with the standard drift condition, the expectation is conditioned on $X_1 \in R'$ and the inequality need only hold for $x \in R'$.
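For intuition, both drift conditions can be checked numerically on a hypothetical toy chain that is not part of the talk: the AR(1) chain $X_1 = \rho X_0 + \sqrt{1 - \rho^2}\, Z$ with $Z \sim N(0, 1)$ and $f(x) = x^2$. There $E[f(X_1) \mid X_0 = x] = \rho^2 x^2 + (1 - \rho^2)$, so the standard condition holds with equality for $\lambda = \rho^2$, $b = 1 - \rho^2$. Because $f$ increases in $|X_1|$, conditioning on $X_1 \in R' = [-M, M]$ can only lower the expectation, so the generalized condition holds with the same constants. The sketch below assumes NumPy; `drift_estimates`, `rho`, and `M` are illustrative names.

```python
import numpy as np


def drift_estimates(x, rho, n_samples, rng, M):
    """Hypothetical toy chain: X_1 = rho * X_0 + sqrt(1 - rho^2) * Z, Z ~ N(0, 1),
    with drift function f(x) = x^2. Returns Monte Carlo estimates of
    E[f(X_1) | X_0 = x] and E[f(X_1) | X_0 = x, X_1 in R'], R' = [-M, M],
    both computed from the same draws."""
    x1 = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_samples)
    f1 = x1**2
    inside = np.abs(x1) <= M  # draws that stay in R'
    return f1.mean(), f1[inside].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho, M = 0.9, 4.0
    lam, b = rho**2, 1.0 - rho**2  # exact: E[X_1^2 | X_0 = x] = lam * x^2 + b
    for x in [0.0, 1.0, 3.0]:      # starting points inside R'
        uncond, cond = drift_estimates(x, rho, 200_000, rng, M)
        print(f"x={x}: lam*f(x)+b={lam * x**2 + b:.3f}, "
              f"E[f(X1)]~{uncond:.3f}, E[f(X1) | X1 in R']~{cond:.3f}")
```

Computing both estimates from the same draws guarantees the conditional average never exceeds the unconditional one, since every discarded draw has $f \ge M^2$, which is at least every retained value.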
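Modified Drift and Minorization
New Quantitative Bound (Y. and Rosenthal, 2017)
Here $\epsilon$ is established by the associated minorization condition, and $\alpha$ and $\Lambda$ are functions of $\lambda$ and $b$. For all $0 < r < 1$,
$$
\|P^k(x_0, \cdot) - \pi(\cdot)\| \le (1 - \epsilon)^{rk} + \frac{(\alpha\Lambda)^{rk}}{\alpha^{k}}\left(1 + \frac{b}{1 - \lambda} + f(x_0)\right) + k\,\pi((R')^c) + \sum_{i=1}^{k} P^i(x_0, (R')^c).
$$
The large set $R'$ should be chosen to balance the two parts: the first two terms come from drift and minorization on $R'$, while the last two terms measure the chance of leaving $R'$. Enlarging $R'$ shrinks the last two terms, typically at the cost of worse constants $\epsilon$, $\alpha$, $\Lambda$.
The Gibbs sampler example
For large enough $n$, we have
$$
\|P^k(x_0, \cdot) - \pi(\cdot)\| \le C_1 \gamma^k + C_2 \frac{k}{\sqrt{p}} + C_3 \frac{(1 + k)k}{p},
$$
which implies the mixing time is $O(1)$: with $0 < \gamma < 1$ and $C_1, C_2, C_3$ free of $p$, a fixed $k$ makes the first term small while the other two vanish as $p \to \infty$.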
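To see why a bound of the shape $C_1 \gamma^k + C_2 k/\sqrt{p} + C_3 (1 + k)k/p$ gives an $O(1)$ mixing time, one can compute the smallest $k$ at which the bound certifies a distance of at most a threshold `eps`. This is a minimal sketch; the constants $C_1 = C_2 = C_3 = 1$, $\gamma = 0.5$, `eps = 0.1`, and the function names are hypothetical placeholders, not values derived in the paper.

```python
import math


def gibbs_bound(k, p, C1=1.0, C2=1.0, C3=1.0, gamma=0.5):
    """Right-hand side C1*gamma^k + C2*k/sqrt(p) + C3*(1 + k)*k/p.
    The default constants are hypothetical placeholders, not values from the paper."""
    return C1 * gamma**k + C2 * k / math.sqrt(p) + C3 * (1 + k) * k / p


def mixing_time(p, eps=0.1, k_max=1000, **consts):
    """Smallest k <= k_max whose bound is at most eps, or None if there is none."""
    for k in range(1, k_max + 1):
        if gibbs_bound(k, p, **consts) <= eps:
            return k
    return None


if __name__ == "__main__":
    for p in [10**2, 10**4, 10**6, 10**8]:
        print(p, mixing_time(p))
```

With these placeholder constants the loop finds no admissible $k$ for $p = 100$ (the term $k/\sqrt{p}$ alone already reaches $0.1$), then $k = 5$, $4$, $4$ for the larger $p$: once $p$ is large enough, the certified mixing time stops depending on $p$, mirroring the "for large enough $n$" qualifier.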
References
◮ Rosenthal, Minorization conditions and convergence rates for Markov chain Monte Carlo, JASA, 1995.
◮ Rajaratnam and Sparks, MCMC-based inference in the era of big data: A fundamental analysis of the convergence complexity of high-dimensional chains, arXiv:1508.00947, 2015.
◮ Yang, Wainwright, and Jordan, On the computational complexity of high-dimensional Bayesian variable selection, Ann. Statist., 2016.
◮ Yang and Rosenthal, Complexity results for MCMC derived from quantitative bounds, arXiv:1708.00829, 2017.