Part 7: Bayesian hierarchical modelling, simulation and MCMC
by Gero Walter

Wednesday 14:00-17:30

Outline:
Bayesian hierarchical modelling / Bayesian networks / graphical models
Exercises I
Simulation & MCMC
Exercises II


Factorization of the joint: Example

[Figure: the linear regression DAG; hyperparameters m₁, t₁ → β₁; m₂, t₂ → β₂; a, b → τ; together with the covariates xᵢ, the parameters β₁, β₂, τ determine the observations yᵢ, i = 1, …, n.]

Omitting the fixed values in notation, the joint distribution is
$$f(y_1, \ldots, y_n, \beta_1, \beta_2, \tau) = \underbrace{f(\beta_1)\, f(\beta_2)\, f(\tau)}_{\text{prior}} \, \underbrace{\prod_{i=1}^{n} f(y_i \mid \beta_1, \beta_2, \tau)}_{\text{likelihood}},$$
and the posterior is proportional to it:
$$f(\beta_1, \beta_2, \tau \mid y_1, \ldots, y_n) \propto f(y_1, \ldots, y_n, \beta_1, \beta_2, \tau).$$
◮ it would be really useful to get posterior estimates based on the non-normalized density f(y₁, …, yₙ, β₁, β₂, τ)!
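Since MCMC only ever needs the joint up to its normalization constant, it is enough to code the log of the right-hand side above. A minimal Python sketch, where the normal/gamma prior shapes and the linear mean β₁ + β₂xᵢ are illustrative assumptions rather than something the slide fixes:

```python
import math

# A minimal sketch of the non-normalized joint for the regression example.
# The concrete distributional choices are stand-ins not fixed by the slide:
# beta_j ~ N(m_j, 1/t_j), tau ~ Gamma(a, b), y_i ~ N(beta1 + beta2*x_i, 1/tau).
# All additive constants that do not depend on (beta1, beta2, tau) are dropped.
def log_joint(beta1, beta2, tau, x, y,
              m1=0.0, t1=1.0, m2=0.0, t2=1.0, a=1.0, b=1.0):
    if tau <= 0:
        return float("-inf")
    # log priors (up to additive constants)
    lp = -0.5 * t1 * (beta1 - m1) ** 2
    lp += -0.5 * t2 * (beta2 - m2) ** 2
    lp += (a - 1.0) * math.log(tau) - b * tau
    # log likelihood: sum_i log N(y_i | beta1 + beta2*x_i, 1/tau)
    ll = sum(0.5 * math.log(tau) - 0.5 * tau * (yi - beta1 - beta2 * xi) ** 2
             for xi, yi in zip(x, y))
    return lp + ll

value = log_joint(0.0, 1.0, 1.0, [0.0, 1.0, 2.0], [0.1, 1.1, 1.9])
```

Any sampler that only uses differences of such log values targets the posterior without ever computing the normalization constant.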

Bayesian Networks: Inference
◮ Markov Chain Monte Carlo: simulate samples from the joint ∝ posterior (→ next block!)
◮ can get the distribution of any (set of) variables in the graph by conditioning and marginalizing the joint
◮ for a set of M samples from the joint, {β₁ᵐ, β₂ᵐ, τᵐ}, m = 1, …, M:
  ◮ marginalizing = use only, e.g., {β₁ᵐ}, m = 1, …, M
  ◮ conditioning = use only the samples m with the right value of the conditioning parameter(s) (or redo the sampling with the conditioned values held fixed)
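In code, both operations are one-liners once joint samples exist. A sketch with made-up placeholder draws standing in for real MCMC output (the Gaussian/exponential shapes are arbitrary):

```python
import random

# Fake draws from the joint of (beta1, beta2, tau); in practice these
# would come from an MCMC run rather than independent sampling.
random.seed(1)
samples = [{"beta1": random.gauss(2.0, 0.5),
            "beta2": random.gauss(-1.0, 0.5),
            "tau": random.expovariate(1.0)} for _ in range(10_000)]

# marginalizing = keep only the coordinate(s) of interest
beta1_marginal = [s["beta1"] for s in samples]

# conditioning = keep only samples where the conditioning variable falls in
# a small neighbourhood of the conditioning value; with continuous
# parameters an exact match has probability zero, hence the tolerance
conditioned = [s["beta1"] for s in samples if abs(s["tau"] - 1.0) < 0.05]
```

The tolerance window is the pragmatic shortcut; the clean alternative from the slide is to redo the sampling with the conditioned values held fixed.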

Bayesian Networks with Imprecise Probability
◮ use sets of conditional distributions at nodes: credal networks (see, e.g., [2, §10], [17])
◮ specific algorithms for discrete credal networks (see, e.g., [2, §10.5.3] or [14])
◮ conditional independence with IP gets very non-trivial (see, e.g., [2, §4] for the gory details)
◮ here: do sensitivity analysis by varying prior distributions in sets: f(β₁) ∈ M_{β₁}, …
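The sensitivity-analysis idea can be previewed with a model whose posterior is available in closed form; for the regression example each prior in the set would instead need its own MCMC run. A sketch using a Beta-Binomial model, where the data and the interval [0.2, 0.8] for the prior mean t are invented for illustration:

```python
# Conjugate Beta-Binomial sketch of prior sensitivity analysis:
# prior theta ~ Beta(s*t, s*(1-t)), so after k successes in n trials the
# posterior mean is (s*t + k) / (s + n).  Varying the prior mean t over a
# set yields an interval of posterior means instead of a single number.
def posterior_mean(k, n, s, t):
    return (s * t + k) / (s + n)

k, n, s = 12, 20, 4                            # invented data, prior strength 4
t_grid = [i / 100 for i in range(20, 81)]      # prior means t in [0.2, 0.8]
post_means = [posterior_mean(k, n, s, t) for t in t_grid]
lower, upper = min(post_means), max(post_means)
```

Because the posterior mean is monotone in t, the bounds here come from the endpoints of the set; in non-conjugate models that monotonicity must be checked rather than assumed.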

Other Graph-Based Methods: SEM, Path Analysis

[Figure: path diagram of a customer satisfaction model with latent constructs Image, Expectation, Quality, Value, Satisfaction, Complaints and Loyalty, each measured by survey indicators (IMAG1-5, CUEX1-3, PERQ1-7, PERV1-2, CUSA1-3, CUSCO, CUSL1-3).]

◮ Structural Equation Modeling (SEM, a.k.a. path modeling) uses graphs like BNs, but is something different
◮ used to estimate latent constructs by assuming linear relationships with measurements (measurement / outer model) and relationships between latent constructs (structural model)
◮ example: customer satisfaction is measured by survey questions 1 and 2 (measurement model); brand loyalty is a function of customer satisfaction (structural model)
◮ estimation of factor loadings (= regression coefficients)
◮ likelihood-based (R package lavaan): models expectations and (co)variances, not full distributions (→ multivariate normal)
◮ Bayesian SEM: R package blavaan
◮ partial least squares (R package semPLS): iterative fitting of latent variable values and regression coefficients via least squares
◮ path analysis: special case where a measurement can be linked to only one construct
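To make the measurement-model/structural-model split concrete, here is a deliberately crude sketch in the spirit of PLS: each latent construct is proxied by the mean of its indicators, and the structural coefficient is then a least-squares slope. All data, indicator names and the true coefficient 0.8 are invented:

```python
import random

# Crude PLS-flavoured sketch (not the semPLS algorithm): proxy the latent
# construct by the mean of its indicators, then fit the structural
# coefficient by ordinary least squares.
random.seed(0)
n = 2000
satisfaction = [random.gauss(0, 1) for _ in range(n)]             # latent construct
q1 = [s + random.gauss(0, 0.3) for s in satisfaction]             # survey question 1
q2 = [s + random.gauss(0, 0.3) for s in satisfaction]             # survey question 2
loyalty = [0.8 * s + random.gauss(0, 0.3) for s in satisfaction]  # structural relation

proxy = [(a + b) / 2 for a, b in zip(q1, q2)]                     # measurement model
mx = sum(proxy) / n
my = sum(loyalty) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(proxy, loyalty))
         / sum((x - mx) ** 2 for x in proxy))                     # structural model
```

The estimated slope is biased toward zero because the proxy still contains measurement error; full SEM (e.g. lavaan) models that error explicitly instead of averaging it away.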

Outline: Bayesian hierarchical modelling / Bayesian networks / graphical models · Exercises I · Simulation & MCMC · Exercises II

Exercise 1: Factorization of a Joint

[Figure: a DAG over the variables x₁, …, x₇.]

Which factorization of f(x₁, …, x₇) does this graph encode?

Exercise 2: Naive Bayes Classifier

The naive Bayes classifier from Part 6 assumes that the joint distribution of class c and attributes a₁, …, a_k can be factorized as
$$p(c, a) = p(c)\, p(a \mid c) = p(c) \prod_{i=1}^{k} p(a_i \mid c).$$
Draw the corresponding DAG! (Hint: use either a plate or consider two attributes a₁ and a₂ only.)

Exercise 3: Naive Bayes Classifier with Dirichlet Priors

We can introduce parameters for p(c) and p(aᵢ | c):
$$(n(c))_{c \in \mathcal{C}} \sim \text{Multinomial}(\theta_c;\, c \in \mathcal{C}) \quad (36)$$
$$\forall c \in \mathcal{C}: \quad (n(a_i, c))_{a_i \in \mathcal{A}_i \mid c} \sim \text{Multinomial}(\theta_{a_i \mid c};\, a_i \in \mathcal{A}_i) \quad (37)$$
where 𝒞 denotes the set of all possible class values, and 𝒜ᵢ denotes the set of all possible values of attribute i. The θ parameters can be estimated using a Dirichlet prior:
$$(\theta_c)_{c \in \mathcal{C}} \sim \text{Dir}(s, (t(c))_{c \in \mathcal{C}}) \quad (38)$$
$$\forall c \in \mathcal{C}: \quad (\theta_{a_i \mid c})_{a_i \in \mathcal{A}_i \mid c} \sim \text{Dir}(s, (t(a_i, c))_{a_i \in \mathcal{A}_i}) \quad (39)$$
where we must have that $\sum_{a_i \in \mathcal{A}_i} t(a_i, c) = t(c)$. [Note that t(c) is the prior expectation of θ_c and t(aᵢ, c)/t(c) is the prior expectation of θ_{aᵢ|c}.]

Draw the corresponding graph!

Exercise 4: Sensitivity Analysis

[Figure: the linear regression DAG from before, with m₁, t₁ → β₁; m₂, t₂ → β₂; a, b → τ; and xᵢ, β₁, β₂, τ → yᵢ, i = 1, …, n.]

In the linear regression example there are 6 hyperparameters: m₁, t₁, m₂, t₂, a, b. How would you do sensitivity analysis over the prior in that example? What problems do you foresee?

Outline: Bayesian hierarchical modelling / Bayesian networks / graphical models · Exercises I · Simulation & MCMC · Exercises II

Simulation & Markov Chain Monte Carlo: What? Why?
◮ BNs allow us to formulate complex models
  ◮ complex variance structures, …
◮ joint ∝ posterior usually intractable: how to do inference?
◮ simulate samples from the joint / posterior: approximate …
  ◮ … posterior cdf by empirical cdf (density: kernel density estimate)
  ◮ … posterior expectation by sample mean
  ◮ … any function of posterior parameters by its sample equivalent
◮ first: quick look at sampling from univariate distributions
◮ then: MCMC for sampling from multivariate distributions

Monte Carlo Estimation: Why does it work?
◮ want to estimate $E(g(X)) = \int g(x)\, f(x \mid \ldots)\, dx$
◮ Monte Carlo sample x₁, …, x_M (M samples drawn from f(x | …))
◮ estimate E(g(X)) by $\hat{E}(g(X)) = \frac{1}{M} \sum_{i=1}^{M} g(x_i)$
◮ unbiased: $E\big(\hat{E}(g(X))\big) = E(g(X))$
◮ variance: $\text{Var}\big(\hat{E}(g(X))\big) = \frac{1}{M}\text{Var}(g(X))$ (for independent samples only!)
◮ precision of the MC estimate increases with M, independent of parameter dimension! (numeric integration: number of evaluation points increases exponentially with dimension)
◮ $\hat{E}(g(X)) \xrightarrow{\text{a.s.}} E(g(X))$ as $M \to \infty$ (strong law of large numbers)
◮ $\hat{E}(g(X)) \overset{\text{approx.}}{\sim} N\big(E(g(X)), \frac{1}{M}\text{Var}(g(X))\big)$ (central limit theorem)
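A minimal numerical check of these claims, estimating E(X²) = 1/3 for X ∼ U([0, 1]):

```python
import random
import statistics

# Monte Carlo estimate of E(g(X)) for X ~ U([0,1]) and g(x) = x**2,
# whose exact value is the integral of x**2 over [0,1], i.e. 1/3.
random.seed(42)
M = 100_000
xs = [random.random() for _ in range(M)]
gs = [x ** 2 for x in xs]
estimate = statistics.fmean(gs)
# estimated standard error of the MC estimate: sqrt(Var(g(X)) / M)
std_error = statistics.stdev(gs) / M ** 0.5
```

Per the variance bullet, the standard error shrinks at rate 1/√M regardless of how many dimensions x has; only Var(g(X)) enters.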

Simulation & MCMC: Univariate Sampling
◮ assumption for all sampling algorithms: we can sample from the uniform U([0, 1])
◮ done by a pseudo-random number generator (PRNG); in R: ?RNG
◮ inverse cdf method: draw u ∼ U([0, 1]) and return x = F⁻¹(u)
  [Figure: the cdf F with u on the vertical axis mapped back to x; for a discrete distribution, the jumps at a₁, …, a₅ determine which value is returned]
◮ but this approach:
  ◮ does not work well in dimensions > 1
  ◮ needs F⁻¹(·)
  ◮ needs the normalization factor
◮ → rejection sampling
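A sketch of the inverse-cdf method for a case where F⁻¹ is available in closed form: the Exponential(rate) distribution, with F(x) = 1 − e^(−rate·x) and F⁻¹(u) = −log(1 − u)/rate. (The exponential target is an illustrative choice, not from the slide.)

```python
import math
import random

# Inverse-cdf sampling: push a uniform draw u through F^{-1}.
# For the Exponential(rate) distribution, F^{-1}(u) = -log(1 - u) / rate.
def sample_exponential(rate, rng):
    u = rng.random()              # u ~ U([0,1)), from the PRNG
    return -math.log(1.0 - u) / rate

rng = random.Random(7)
draws = [sample_exponential(2.0, rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)    # should approach 1/rate = 0.5
```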

Simulation & MCMC: Rejection Sampling

[Figure: unnormalized target density p̃(z) ∝ p(z) and proposal density q(z), scaled by a constant k so that kq(z) ≥ p̃(z) everywhere; a sampled point (z₀, u) lies below kq(z₀).]

1. sample z (horizontal axis) from q(z)
2. sample u (vertical axis) from U([0, kq(z)])
   (steps 1 and 2 sample points uniformly from the union of the white and grey areas, i.e. from the area under kq)
3. reject all points in the grey area, i.e. those with u > p̃(z)
4. forget about u: the retained z are distributed ∝ p̃(z)!
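The four steps can be sketched directly. The unnormalized target p̃(z) = z²(1 − z) on [0, 1] (a Beta(3, 2) kernel, true mean 3/5) and the uniform proposal are illustrative choices; k is set to the maximum of p̃, which is 4/27 at z = 2/3.

```python
import random

# Rejection sampling from an unnormalized target with a U([0,1]) proposal,
# i.e. q(z) = 1 on [0,1], so k*q(z) = k is a flat envelope over p_tilde.
def p_tilde(z):
    return z ** 2 * (1.0 - z)

random.seed(3)
k = 4.0 / 27.0                                # max of p_tilde, so k*q >= p_tilde
accepted = []
while len(accepted) < 20_000:
    z = random.random()                       # 1. sample z from q(z)
    u = random.uniform(0.0, k)                # 2. sample u from U([0, k*q(z)])
    if u <= p_tilde(z):                       # 3. reject points in the grey area
        accepted.append(z)                    # 4. forget u, keep z
mean = sum(accepted) / len(accepted)          # should approach 3/5
```

Note that only the unnormalized p̃ is ever evaluated; the acceptance rate here is the normalization constant divided by k.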

Markov Chain Monte Carlo: General Idea
◮ need to sample from high-dimensional distributions
◮ idea: produce samples by a Markov chain: a random walk over the parameter space
◮ the random walk spends more time in high-probability regions
◮ if in each step we move in one dimension only, we only need to sample from a one-dimensional distribution, and can use the previous algorithms for that!
◮ but: the samples are not independent!

Markov Chain Monte Carlo: Algorithms
◮ Metropolis-Hastings:
  ◮ propose a step (draw from an easy-to-sample-from proposal distribution)
  ◮ accept the step with a certain probability (tailored to make the chain approach the target distribution)
  ◮ Stan uses an improved variant, Hamiltonian Monte Carlo
◮ Gibbs sampler:
  ◮ loop over the parameter vector (θ₁, θ₂, …)
  ◮ draw from the full conditionals f(θᵢ | everything else) ∝ joint
  ◮ special case of MH where proposals are always accepted
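A random-walk Metropolis sketch, targeting the unnormalized density p̃(x) = exp(−x²/2) (a standard normal kernel); the step size 1.0 and burn-in length are arbitrary tuning choices. Note the acceptance rule only uses ratios of p̃, so the normalization constant never appears.

```python
import math
import random

# Random-walk Metropolis: propose x + N(0, step), accept with probability
# min(1, p_tilde(proposal) / p_tilde(current)), done in log space.
def metropolis(log_p_tilde, x0, n_steps, step, rng):
    xs, x = [], x0
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        if math.log(rng.random()) < log_p_tilde(proposal) - log_p_tilde(x):
            x = proposal                     # accepted: move
        xs.append(x)                         # rejected: repeat current state
    return xs

rng = random.Random(11)
chain = metropolis(lambda x: -0.5 * x * x, 0.0, 50_000, 1.0, rng)
chain = chain[5_000:]                        # discard burn-in
mean = sum(chain) / len(chain)               # should approach 0
var = sum((x - mean) ** 2 for x in chain) / len(chain)   # should approach 1
```

Because successive states are correlated, the effective sample size is smaller than len(chain), exactly the caveat on the previous slides about non-independent samples.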

Markov Chain Monte Carlo: Why does this work?

The algorithms create a
◮ stationary (f(x₂ | x₁) = ⋯ = f(xₙ | xₙ₋₁)),
◮ irreducible and
◮ aperiodic
Markov chain which has the joint as its limiting (invariant) distribution.

[Figures: a chain X₁ → X₂ → ⋯ → X_{T−1} → X_T, and small example transition graphs (with transition probabilities such as 0.4, 0.5, …) illustrating irreducibility and aperiodicity.]
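The "limiting (invariant) distribution" claim can be checked numerically on a small chain. The 2-state transition matrix below is made up for illustration (it is not the one from the slide's figure); iterating it from any starting distribution converges to the invariant π solving π P = π, here π = (5/6, 1/6).

```python
# A made-up 2-state transition matrix: P[i][j] = P(next = j | current = i).
P = [[0.9, 0.1],
     [0.5, 0.5]]

# One step of the chain on distributions: dist_new = dist * P (row vector).
def step(dist, P):
    return [sum(dist[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

dist = [1.0, 0.0]            # start deterministically in state 0
for _ in range(200):
    dist = step(dist, P)
# the invariant distribution solves pi[0]*0.1 = pi[1]*0.5, i.e. pi = (5/6, 1/6)
```

Stationarity (one fixed P), irreducibility (both states reachable) and aperiodicity (self-loops) are exactly what guarantee this convergence from any start.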
