Metropolis-Hastings algorithm
Dr. Jarad Niemi
STAT 544 - Iowa State University
April 2, 2019
Outline

- Metropolis-Hastings algorithm
- Independence proposal
- Random-walk proposal
- Optimal tuning parameter
- Binomial example
- Normal example
- Binomial hierarchical example
Metropolis-Hastings algorithm

Let p(θ|y) be the target distribution and θ^(t) be the current draw from p(θ|y). The Metropolis-Hastings algorithm performs the following:

1. propose θ* ~ g(θ|θ^(t))
2. accept θ^(t+1) = θ* with probability min{1, r} where

\[
r = r(\theta^{(t)}, \theta^*)
  = \frac{p(\theta^* \mid y)/g(\theta^* \mid \theta^{(t)})}{p(\theta^{(t)} \mid y)/g(\theta^{(t)} \mid \theta^*)}
  = \frac{p(\theta^* \mid y)\, g(\theta^{(t)} \mid \theta^*)}{p(\theta^{(t)} \mid y)\, g(\theta^* \mid \theta^{(t)})};
\]

otherwise, set θ^(t+1) = θ^(t).
Metropolis-Hastings algorithm (unnormalized target)

Suppose we only know the target up to a normalizing constant, i.e. p(θ|y) = q(θ|y)/q(y) where we only know q(θ|y). The Metropolis-Hastings algorithm performs the following:

1. propose θ* ~ g(θ|θ^(t))
2. accept θ^(t+1) = θ* with probability min{1, r} where

\[
r = r(\theta^{(t)}, \theta^*)
  = \frac{p(\theta^* \mid y)/g(\theta^* \mid \theta^{(t)})}{p(\theta^{(t)} \mid y)/g(\theta^{(t)} \mid \theta^*)}
  = \frac{q(\theta^* \mid y)/q(y)}{q(\theta^{(t)} \mid y)/q(y)} \cdot \frac{g(\theta^{(t)} \mid \theta^*)}{g(\theta^* \mid \theta^{(t)})}
  = \frac{q(\theta^* \mid y)\, g(\theta^{(t)} \mid \theta^*)}{q(\theta^{(t)} \mid y)\, g(\theta^* \mid \theta^{(t)})};
\]

otherwise, set θ^(t+1) = θ^(t). The unknown normalizing constant q(y) cancels, so the algorithm only ever needs q(θ|y).
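The two steps above can be written directly in code. A minimal sketch in Python (the slides contain no code, so all names here are illustrative), working on the log scale for numerical stability:

```python
import math
import random

def mh_step(theta, log_q, log_g, propose):
    """One Metropolis-Hastings update targeting p(theta | y) proportional to q(theta | y).

    log_q(theta)   -- log of the unnormalized target q(theta | y)
    log_g(to, frm) -- log proposal density g(to | frm)
    propose(frm)   -- draw theta* ~ g(. | frm)
    """
    theta_star = propose(theta)
    # log r = [log q(theta*) + log g(theta | theta*)] - [log q(theta) + log g(theta* | theta)]
    log_r = (log_q(theta_star) + log_g(theta, theta_star)
             - log_q(theta) - log_g(theta_star, theta))
    # accept with probability min{1, r}
    if math.log(random.random()) < min(0.0, log_r):
        return theta_star
    return theta  # reject: the chain stays at the current value
```

Working with log q avoids numerical overflow, and since only the ratio r is needed, the normalizing constant q(y) never has to be computed.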
Two standard Metropolis-Hastings algorithms

Independence Metropolis-Hastings: independent proposal, i.e. g(θ|θ^(t)) = g(θ).

Random-walk Metropolis: symmetric proposal, i.e. g(θ|θ^(t)) = g(θ^(t)|θ) for all θ, θ^(t).
Independence Metropolis-Hastings

Let p(θ|y) ∝ q(θ|y) be the target distribution, θ^(t) be the current draw from p(θ|y), and g(θ|θ^(t)) = g(θ), i.e. the proposal is independent of the current value. The independence Metropolis-Hastings algorithm performs the following:

1. propose θ* ~ g(θ)
2. accept θ^(t+1) = θ* with probability min{1, r} where

\[
r = \frac{q(\theta^* \mid y)/g(\theta^*)}{q(\theta^{(t)} \mid y)/g(\theta^{(t)})}
  = \frac{q(\theta^* \mid y)\, g(\theta^{(t)})}{q(\theta^{(t)} \mid y)\, g(\theta^*)};
\]

otherwise, set θ^(t+1) = θ^(t).
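With an independence proposal, r is a ratio of importance weights q/g evaluated at the proposed and current values. A hedged sketch of the whole sampler (names are mine, not the course's):

```python
import math
import random

def independence_mh(log_q, log_g, draw_g, n, theta0):
    """Independence Metropolis-Hastings: the proposal ignores the current state.

    log_q(theta) -- log unnormalized target; log_g(theta) -- log proposal density;
    draw_g()     -- draw theta* ~ g(.)
    """
    theta, draws = theta0, []
    for _ in range(n):
        theta_star = draw_g()
        # log r = log[q(theta*)/g(theta*)] - log[q(theta)/g(theta)]
        log_r = (log_q(theta_star) - log_g(theta_star)) - (log_q(theta) - log_g(theta))
        if math.log(random.random()) < min(0.0, log_r):
            theta = theta_star
        draws.append(theta)
    return draws
```

The importance-weight form of r is why, as discussed later, the proposal g needs tails at least as heavy as the target's.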
Intuition through examples

[Figure: proposal and target densities in a 3x3 grid of panels, with current value in {-1, 0, 1} (rows) and proposed value in {-1, 0, 1} (columns), indicating whether each proposal is accepted.]
Example: Normal-Cauchy model

Let Y ~ N(θ, 1) with θ ~ Ca(0, 1) such that the posterior is

\[
p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta) \propto \frac{\exp(-(y-\theta)^2/2)}{1+\theta^2}.
\]

Use N(y, 1) as the proposal; then the Metropolis-Hastings acceptance probability is min{1, r} with

\[
r = \frac{q(\theta^* \mid y)\, g(\theta^{(t)})}{q(\theta^{(t)} \mid y)\, g(\theta^*)}
  = \frac{\exp(-(y-\theta^*)^2/2)\,/\,[1+(\theta^*)^2]}{\exp(-(y-\theta^{(t)})^2/2)\,/\,[1+(\theta^{(t)})^2]}
    \cdot \frac{\exp(-(\theta^{(t)}-y)^2/2)}{\exp(-(\theta^*-y)^2/2)}
  = \frac{1+(\theta^{(t)})^2}{1+(\theta^*)^2}.
\]
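The cancellation of the normal kernels above is easy to verify numerically; a small check (the helper names are mine, not the slides'):

```python
import math

def log_q(theta, y):
    # log unnormalized Normal-Cauchy posterior: exp(-(y-theta)^2/2) / (1 + theta^2)
    return -0.5 * (y - theta) ** 2 - math.log(1.0 + theta ** 2)

def log_g(theta, y):
    # independence proposal N(y, 1), constants dropped
    return -0.5 * (theta - y) ** 2

def log_r_full(theta_star, theta_t, y):
    # log r computed from the general independence-MH formula
    return (log_q(theta_star, y) + log_g(theta_t, y)
            - log_q(theta_t, y) - log_g(theta_star, y))

def log_r_simplified(theta_star, theta_t):
    # after cancellation: r = (1 + theta_t^2) / (1 + theta*^2)
    return math.log(1.0 + theta_t ** 2) - math.log(1.0 + theta_star ** 2)
```

The two versions agree for any (θ*, θ^(t), y), confirming that r here depends only on the Cauchy prior evaluated at the two points.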
Example: Normal-Cauchy model

[Figure: proposal density N(y, 1) and target Normal-Cauchy posterior density plotted against θ.]
Example: Normal-Cauchy model

[Figure: trace plots of θ over 100 iterations for the independence Metropolis-Hastings sampler, one with a reasonable starting value and one with a poor starting value near θ^(0) = 10.]
Need heavy tails

Recall that rejection sampling requires the proposal to have heavier tails than the target, and that importance sampling is efficient only when the proposal has heavy tails. Independence Metropolis-Hastings also requires heavy-tailed proposals for efficiency: if θ^(t) is in a region where p(θ^(t)|y) >> g(θ^(t)), i.e. the target has heavier tails than the proposal, then any proposal θ* such that p(θ*|y) ≈ g(θ*), i.e. in the center of the target and proposal, will result in

\[
r = \frac{p(\theta^* \mid y)}{g(\theta^*)} \cdot \frac{g(\theta^{(t)})}{p(\theta^{(t)} \mid y)}
  \approx \frac{g(\theta^{(t)})}{p(\theta^{(t)} \mid y)} \approx 0
\]

and few proposals will be accepted, so the chain gets stuck in the tail.
Need heavy tails - example

Suppose θ|y ~ Ca(0, 1) and we use a standard normal as a proposal.

[Figure: target (Cauchy) and proposal (standard normal) densities, and the ratio target/proposal on a log scale, which grows without bound in the tails.]
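A small numerical check of this failure mode for the Ca(0,1) target and standard normal proposal (illustrative code, not from the slides):

```python
import math

def cauchy_kernel(t):
    # Ca(0,1) target density, up to the 1/pi constant
    return 1.0 / (1.0 + t * t)

def normal_kernel(t):
    # N(0,1) proposal density, up to the 1/sqrt(2 pi) constant
    return math.exp(-0.5 * t * t)

def accept_ratio(theta_star, theta_t):
    # independence-MH ratio r = [q(theta*)/g(theta*)] / [q(theta_t)/g(theta_t)]
    w = lambda t: cauchy_kernel(t) / normal_kernel(t)  # importance weight q/g
    return w(theta_star) / w(theta_t)
```

From the tail at θ^(t) = 4, a central proposal θ* = 0 has r = 17/exp(8), under 1%, so the chain almost never escapes; moving from the center into the tail, by contrast, has r > 1 and is always accepted once proposed.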
Need heavy tails

[Figure: trace plot of θ over 1000 iterations for the independence sampler with the light-tailed normal proposal.]
Random-walk Metropolis

Let p(θ|y) ∝ q(θ|y) be the target distribution, θ^(t) be the current draw from p(θ|y), and g(θ*|θ^(t)) = g(θ^(t)|θ*), i.e. the proposal is symmetric. The Metropolis algorithm performs the following:

1. propose θ* ~ g(θ|θ^(t))
2. accept θ^(t+1) = θ* with probability min{1, r} where

\[
r = \frac{q(\theta^* \mid y)\, g(\theta^{(t)} \mid \theta^*)}{q(\theta^{(t)} \mid y)\, g(\theta^* \mid \theta^{(t)})}
  = \frac{q(\theta^* \mid y)}{q(\theta^{(t)} \mid y)};
\]

otherwise, set θ^(t+1) = θ^(t). This is also referred to as random-walk Metropolis.
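Because the symmetric proposal density cancels from r, the sampler never needs to evaluate g at all, only simulate from it. A minimal sketch in Python (illustrative names, not the course's code):

```python
import math
import random

def rw_metropolis(log_q, v, n, theta0):
    """Random-walk Metropolis with N(theta, v^2) proposals.

    The symmetric proposal cancels, so log r = log q(theta* | y) - log q(theta | y).
    """
    theta, draws = theta0, []
    for _ in range(n):
        theta_star = random.gauss(theta, v)
        if math.log(random.random()) < min(0.0, log_q(theta_star) - log_q(theta)):
            theta = theta_star          # accept the proposed value
        draws.append(theta)             # on rejection the current value repeats
    return draws
```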
Stochastic hill climbing

Notice that r = q(θ*|y)/q(θ^(t)|y), and thus the algorithm always accepts whenever the target density is larger at the proposed value than at the current value; when it is smaller, the proposal is still accepted with probability r, which is what keeps the chain exploring rather than purely hill climbing. Suppose θ|y ~ N(0, 1), θ^(t) = 1, and θ* ~ N(θ^(t), 1).

[Figure: target N(0, 1) density and proposal N(1, 1) density.]
Example: Normal-Cauchy model

Let Y ~ N(θ, 1) with θ ~ Ca(0, 1) such that the posterior is

\[
p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta) \propto \frac{\exp(-(y-\theta)^2/2)}{1+\theta^2}.
\]

Use N(θ^(t), v^2) as the proposal; then the acceptance probability is min{1, r} with

\[
r = \frac{q(\theta^* \mid y)}{q(\theta^{(t)} \mid y)}
  = \frac{p(y \mid \theta^*)\, p(\theta^*)}{p(y \mid \theta^{(t)})\, p(\theta^{(t)})}.
\]

For this example, let v^2 = 1.
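Putting the pieces together for this example, a sketch of the random-walk sampler with v^2 = 1, taking y = 1 as an assumed concrete data point (the slide does not fix y here; the later tuning example uses y = 1):

```python
import math
import random

def log_q(theta, y=1.0):
    # log unnormalized Normal-Cauchy posterior: exp(-(y-theta)^2/2) / (1 + theta^2)
    return -0.5 * (y - theta) ** 2 - math.log(1.0 + theta * theta)

random.seed(1)
theta, draws = 0.0, []
for _ in range(20000):
    theta_star = random.gauss(theta, 1.0)   # symmetric N(theta, 1) proposal (v^2 = 1)
    if math.log(random.random()) < min(0.0, log_q(theta_star) - log_q(theta)):
        theta = theta_star
    draws.append(theta)
post_mean = sum(draws) / len(draws)
```

The posterior mean lands between the prior median 0 and the data y = 1, reflecting the shrinkage induced by the Cauchy prior.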
Example: Normal-Cauchy model

[Figure: trace plots of θ over 100 iterations for random-walk Metropolis, one with a reasonable starting value and one with a poor starting value.]
Optimal tuning parameter

Random-walk tuning parameter

Let p(θ|y) be the target distribution, suppose the proposal is symmetric with scale v^2, and θ^(t) is (approximately) distributed according to p(θ|y).

If v^2 ≈ 0, then θ* ≈ θ^(t) and

\[
r = \frac{q(\theta^* \mid y)}{q(\theta^{(t)} \mid y)} \approx 1,
\]

so nearly all proposals are accepted, but θ* ≈ θ^(t) and the chain moves very slowly.

As v^2 → ∞, then q(θ*|y) ≈ 0 since θ* will be far from the mass of the target distribution, and

\[
r = \frac{q(\theta^* \mid y)}{q(\theta^{(t)} \mid y)} \approx 0,
\]

so almost all proposed values are rejected.

So there is an optimal v^2 somewhere in between. For normal targets, the optimal random-walk proposal variance is 2.4^2 Var(θ|y)/d, where d is the dimension of θ, which results in an acceptance rate of roughly 44% for d = 1 decreasing toward roughly 23% as d → ∞.
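This tradeoff can be seen by measuring the acceptance rate across proposal scales for a N(0,1) target; a sketch (illustrative code, not from the slides):

```python
import math
import random

def acceptance_rate(v, n=20000, theta0=0.0):
    """Fraction of accepted random-walk proposals for a N(0,1) target."""
    random.seed(0)
    theta, accepted = theta0, 0
    for _ in range(n):
        theta_star = random.gauss(theta, v)
        # log r = log q(theta*) - log q(theta) with log q(t) = -t^2/2
        if math.log(random.random()) < min(0.0, 0.5 * theta ** 2 - 0.5 * theta_star ** 2):
            theta = theta_star
            accepted += 1
    return accepted / n
```

A tiny scale (v = 0.1) accepts nearly everything while barely moving, a huge scale (v = 25) rejects nearly everything, and v = 2.4 (the optimal scale for this d = 1 normal target) lands near the recommended acceptance rate.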
Random-walk with tuning parameter that is too big and too small

Let y|θ ~ N(θ, 1), θ ~ Ca(0, 1), and y = 1.

[Figure: trace plots of θ over 100 iterations for proposal scales v = 0.1 and v = 10; the small scale moves in tiny steps while the large scale rarely accepts.]