Factorization of the Joint: Example

[Graph: hyperparameters m_1, t_1, m_2, t_2, a, b; parameters β_1, β_2, τ; data x_i, y_i, i = 1, ..., n.]

Omitting the fixed values in the notation, the joint distribution is

f(y_1, ..., y_n, β_1, β_2, τ) = [ ∏_{i=1}^{n} f(y_i | β_1, β_2, τ) ] · f(β_1) f(β_2) f(τ)
                                  \________ likelihood ________/      \_______ prior _______/

∝ posterior f(β_1, β_2, τ | y_1, ..., y_n)

◮ it would be really useful to get posterior estimates based on the non-normalized density f(y_1, ..., y_n, β_1, β_2, τ)!
Bayesian Networks: Inference

◮ Markov Chain Monte Carlo: simulate samples from the joint ∝ posterior (→ next block!)
◮ can get distributions for any (set of) variables in the graph by conditioning and marginalizing the joint
◮ for a set of M samples from the joint, {β_1^m, β_2^m, τ^m}, m = 1, ..., M:
  ◮ marginalizing = use only, e.g., {β_1^m}, m = 1, ..., M
  ◮ conditioning = use only the samples m with the right value of the conditioning parameter(s) (or redo the sampling with the conditioned values held fixed)
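Given joint samples, marginalizing and conditioning reduce to simple array operations. A minimal Python sketch (the draws below are synthetic stand-ins for MCMC output; all distributions and numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint samples (beta1, beta2, tau), m = 1..M,
# standing in for draws from the posterior.
M = 10_000
beta1 = rng.normal(2.0, 0.5, M)
beta2 = rng.normal(-1.0, 0.3, M)
tau = rng.gamma(2.0, 1.0, M)

# Marginalizing: simply use only the beta1 component of each sample.
beta1_marginal_mean = beta1.mean()

# Conditioning on tau ~ 2: for a continuous parameter, keep samples
# in a small window around the conditioning value.
mask = np.abs(tau - 2.0) < 0.1
beta1_given_tau = beta1[mask].mean()
```

For continuous conditioning parameters the window trick wastes most samples, which is why redoing the sampling with the conditioned values held fixed is often the better option.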
Bayesian Networks with Imprecise Probability

◮ use sets of conditional distributions at the nodes: credal networks (see, e.g., [2, §10], [17])
◮ specific algorithms exist for discrete credal networks (see, e.g., [2, §10.5.3] or [14])
◮ conditional independence with IP gets very non-trivial (see, e.g., [2, §4] for the gory details)
◮ here: do sensitivity analysis by varying the prior distributions in sets: f(β_1) ∈ M_{β_1}, ...
Other Graph-Based Methods: SEM, Path Analysis

[Path diagram of a customer satisfaction model: latent constructs Image, Expectation, Quality, Value, Satisfaction, Complaints, and Loyalty, each linked to survey indicators (IMAG1–5, CUEX1–3, PERQ1–7, PERV1–2, CUSA1–3, CUSCO, CUSL1–3).]
Other Graph-Based Methods: SEM, Path Analysis

◮ Structural Equation Modeling (SEM, a.k.a. path modeling) uses graphs like BNs, but is something different
◮ used to estimate latent constructs by assuming linear relationships with measurements (measurement / outer model) and relationships between latent constructs (structural model)
◮ example: customer satisfaction is measured by survey questions 1 and 2 (measurement model); brand loyalty is a function of customer satisfaction (structural model)
◮ estimation of factor loadings (= regression coefficients)
◮ likelihood-based (R package lavaan): models expectations and (co)variances, not full distributions (→ multivariate normal)
◮ Bayesian SEM: R package blavaan
◮ partial least squares (R package semPLS): iterative fitting of latent variable values and regression coefficients via least squares
◮ path analysis: special case where a measurement can be linked to only one construct
Bayesian hierarchical modelling, simulation and MCMC: Outline

Bayesian hierarchical modelling / Bayesian networks / graphical models
Exercises I
Simulation & MCMC
Exercises II
Exercise 1: Factorization of a Joint

Which factorization of f({x_i}_{i ∈ {1, ..., 7}}) does this graph encode?

[DAG over the nodes x_1, ..., x_7 as shown on the slide.]
Exercise 2: Naive Bayes Classifier

The naive Bayes classifier from Part 6 assumes that the joint distribution of class c and attributes a_1, ..., a_k can be factorized as

p(c, a) = p(c) p(a | c) = p(c) ∏_{i=1}^{k} p(a_i | c).

Draw the corresponding DAG! (Hint: use either a plate or consider two attributes a_1 and a_2 only.)
Exercise 3: Naive Bayes Classifier with Dirichlet Priors

We can introduce parameters for p(c) and p(a_i | c):

(n(c))_{c ∈ C} ∼ Multinomial(θ_c; c ∈ C)   (36)
∀ c ∈ C: (n(a_i, c))_{a_i ∈ A_i} ∼ Multinomial(θ_{a_i | c}; a_i ∈ A_i)   (37)

where C denotes the set of all possible class values, and A_i denotes the set of all possible values of attribute i. The θ parameters can be estimated using a Dirichlet prior:

(θ_c)_{c ∈ C} ∼ Dir(s, (t(c))_{c ∈ C})   (38)
∀ c ∈ C: (θ_{a_i | c})_{a_i ∈ A_i} ∼ Dir(s, (t(a_i, c))_{a_i ∈ A_i})   (39)

where we must have that ∑_{a_i ∈ A_i} t(a_i, c) = t(c). [Note that t(c) is the prior expectation of θ_c and t(a_i, c)/t(c) is the prior expectation of θ_{a_i | c}.]

Draw the corresponding graph!
Exercise 4: Sensitivity Analysis

[Graph: hyperparameters m_1, t_1, m_2, t_2, a, b; parameters β_1, β_2, τ; data x_i, y_i, i = 1, ..., n.]

In the linear regression example there are 6 hyperparameters: m_1, t_1, m_2, t_2, a, b.

How would you do sensitivity analysis over the prior in that example? What problems do you foresee?
Bayesian hierarchical modelling, simulation and MCMC: Outline

Bayesian hierarchical modelling / Bayesian networks / graphical models
Exercises I
Simulation & MCMC
Exercises II
Simulation & Markov Chain Monte Carlo: What? Why?

◮ BNs allow us to formulate complex models
  ◮ complex variance structures, ...
◮ joint ∝ posterior is usually intractable: how to do inference?
◮ simulate samples from the joint / posterior: approximate ...
  ◮ ... the posterior cdf by the empirical cdf (density: kernel density estimate)
  ◮ ... the posterior expectation by the sample mean
  ◮ ... any function of the posterior parameters by its sample equivalent
◮ first: a quick look at sampling from univariate distributions
◮ then: MCMC for sampling from multivariate distributions
Monte Carlo Estimation: Why does it work?

◮ want to estimate E(g(X)) = ∫ g(x) f(x | ...) dx
◮ Monte Carlo sample x_1, ..., x_M (M samples drawn from f(x | ...))
◮ estimate E(g(X)) by Ê(g(X)) = (1/M) ∑_{i=1}^{M} g(x_i)
◮ unbiased: E(Ê(g(X))) = E(g(X))
◮ variance: Var(Ê(g(X))) = (1/M) Var(g(X)) (for independent samples only!)
◮ precision of the MC estimate increases with M, independent of the parameter dimension! (numeric integration: the number of evaluation points increases exponentially with dimension)
◮ Ê(g(X)) → E(g(X)) almost surely as M → ∞ (strong law of large numbers)
◮ Ê(g(X)) approximately ∼ N(E(g(X)), (1/M) Var(g(X))) (central limit theorem)
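A minimal Python sketch of plain Monte Carlo estimation (the target and g are chosen for illustration, not taken from the slides): estimate E(g(X)) for g(x) = x² with X ∼ N(0, 1), whose true value is 1.

```python
import numpy as np

rng = np.random.default_rng(42)

# Monte Carlo estimate of E(g(X)) = E(X^2) = 1 for X ~ N(0, 1):
# draw M independent samples and average g over them.
def mc_estimate(M):
    x = rng.normal(0.0, 1.0, M)
    return np.mean(x**2)

# The estimation error shrinks like 1/sqrt(M), regardless of
# the dimension of X (here 1-d, but the same holds in general).
est_small = mc_estimate(100)
est_large = mc_estimate(1_000_000)
```

With M = 10⁶ the estimate should sit within a few thousandths of the true value 1, while the M = 100 estimate is noticeably noisier.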
Simulation & MCMC: Univariate Sampling

◮ assumption for all sampling algorithms: we can sample from the uniform U([0, 1])
◮ done by a pseudo-random number generator (PRNG), in R: ?RNG

[Figure: cdf F(x) with a uniform draw u on the vertical axis mapped back to x = F⁻¹(u), i.e. the inversion method.]

◮ does not work well in dimensions > 1
◮ needs F⁻¹(·)
◮ needs the normalization factor
◮ → rejection sampling
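The inversion idea in the figure can be sketched in a few lines of Python for a distribution whose inverse cdf is available in closed form (the exponential distribution here is my choice of example, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Inversion method: for the Exponential(rate) distribution,
# F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -log(1 - u) / rate.
def sample_exponential(rate, M):
    u = rng.uniform(0.0, 1.0, M)   # uniform draws from the PRNG
    return -np.log1p(-u) / rate    # map each u through the inverse cdf

x = sample_exponential(rate=2.0, M=100_000)
# The sample mean should be close to 1/rate = 0.5.
```

When F⁻¹ has no closed form, or the density is only known up to a normalization factor (as with a posterior), this route is blocked, which motivates rejection sampling.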
Simulation & MCMC: Rejection Sampling

[Figure: non-normalized target density p̃(z) and proposal density q(z), scaled by k so that k q(z) ≥ p̃(z) everywhere; a point (z_0, u) drawn below k q(z_0); the area between p̃(z) and k q(z) is grey.]

1. sample z from q(z)
2. sample u from U([0, k q(z)])
   → the points (z, u) are uniform on the union of the white and grey areas
3. reject all points in the grey area (where u > p̃(z))
4. forget about u: the accepted z are distributed ∝ p̃(z)!
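The four steps above can be sketched directly in Python. As an illustrative target (my choice, not from the slides) take p̃(z) = z(1 − z) on [0, 1] (a non-normalized Beta(2, 2) density) with a uniform proposal:

```python
import numpy as np

rng = np.random.default_rng(7)

def p_tilde(z):
    return z * (1.0 - z)          # non-normalized Beta(2, 2) density

k = 0.25                          # max of p_tilde on [0, 1], so k*q(z) >= p_tilde(z)

def rejection_sample(M):
    samples = []
    while len(samples) < M:
        z = rng.uniform(0.0, 1.0)      # 1. draw z from the proposal q
        u = rng.uniform(0.0, k)        # 2. draw a height under k*q(z)
        if u <= p_tilde(z):            # 3. reject points in the grey area
            samples.append(z)          # 4. accepted z is distributed ∝ p_tilde
    return np.array(samples)

x = rejection_sample(50_000)
# Beta(2, 2) is symmetric, so the sample mean should be close to 0.5.
```

Note that the normalization constant of the target is never needed; only the ratio p̃(z) / (k q(z)) matters. The acceptance rate, however, drops quickly when k q(z) is a loose envelope, especially in higher dimensions.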
Markov Chain Monte Carlo: General Idea

◮ need to sample from high-dimensional distributions
◮ idea: produce samples by a Markov chain: a random walk over the parameter space
◮ the random walk spends more time in high-probability regions
◮ if in each step we move in one dimension only, we need to sample from a one-dimensional distribution only, and can use the previous algorithms for that!
◮ but: the samples are not independent!
Markov Chain Monte Carlo: Algorithms

◮ Metropolis-Hastings:
  ◮ propose a step (draw from an easy-to-sample-from proposal distribution)
  ◮ accept the step with a certain probability (tailored to make the chain approach the target distribution)
  ◮ Stan uses an improved variant called Hamiltonian MH
◮ Gibbs sampler:
  ◮ loop over the parameter vector (θ_1, θ_2, ...)
  ◮ draw from the full conditionals f(θ_i | everything else) ∝ joint
  ◮ special case of MH where the proposals are always accepted
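A bare-bones random-walk Metropolis-Hastings sketch in Python, for a 1-d target known only up to a constant (the standard normal here is my illustrative choice; real use cases are high-dimensional posteriors):

```python
import numpy as np

rng = np.random.default_rng(3)

# Target known up to a constant: p_tilde(x) ∝ exp(-x^2 / 2).
def log_p_tilde(x):
    return -0.5 * x * x

def metropolis_hastings(M, step=1.0):
    x = 0.0
    chain = np.empty(M)
    for m in range(M):
        prop = x + rng.normal(0.0, step)   # symmetric random-walk proposal
        # accept with probability min(1, p_tilde(prop) / p_tilde(x)),
        # done on the log scale for numerical stability
        if np.log(rng.uniform()) < log_p_tilde(prop) - log_p_tilde(x):
            x = prop
        chain[m] = x                        # keep the current state either way
    return chain

chain = metropolis_hastings(50_000)
post = chain[1_000:]   # discard some burn-in samples
```

After burn-in, the chain's sample mean and variance should be close to 0 and 1, but because consecutive samples are correlated, the effective sample size is smaller than the nominal M.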
Markov Chain Monte Carlo: Why does this work?

The algorithms create a
◮ stationary, f(x_2 | x_1) = ... = f(x_T | x_{T−1})
◮ irreducible and
◮ aperiodic
Markov chain which has the joint as its limiting (invariant) distribution.

[Figures: a chain X_1 → X_2 → ... → X_{T−1} → X_T illustrating stationarity; transition graphs over states 1–4 (and a variant with an extra state 5) illustrating (ir)reducibility and (a)periodicity.]
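For a finite-state chain, the limiting behaviour is easy to check numerically. The transition matrix below is made up for illustration (its entries are not the ones from the slide's figure), but it is irreducible and aperiodic, so powers of it converge to a matrix with the invariant distribution in every row:

```python
import numpy as np

# Illustrative 4-state transition matrix (each row sums to 1);
# the numbers are invented, not those from the slide's figure.
P = np.array([
    [0.1, 0.4, 0.5, 0.0],
    [0.0, 0.3, 0.0, 0.7],
    [0.2, 0.0, 0.3, 0.5],
    [0.0, 0.6, 0.4, 0.0],
])

# An irreducible, aperiodic chain forgets its starting state:
# P^n converges, and every row of the limit is the invariant
# distribution pi, which satisfies pi P = pi.
Pn = np.linalg.matrix_power(P, 100)
pi = Pn[0]
```

MCMC algorithms are constructed so that the target (the joint ∝ posterior) plays the role of this invariant distribution, over a continuous state space instead of a finite one.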