MCMC — convergence

[Figure: scatter of 1000 MCMC draws of (µ, σ), each point labelled by its iteration number, after 1000 iterations; µ on the horizontal axis (roughly −2 to 8), σ on the vertical axis.]
MCMC — convergence

[Figure: traceplots of two parallel chains (Chain 1, Chain 2) over 500 iterations, distinguishing the burn-in period from the sample retained after convergence.]
MCMC — convergence

[Figure: traceplots of α and β over 500 iterations under the uncentred and the centred model parameterisations.]

• Formal assessment of convergence: the potential scale reduction
  \[ \hat{R} = \sqrt{\frac{\widehat{\mathrm{Var}}(\theta_k \mid y)}{W(\theta_k)}} \]
  where W(θ_k) is the within-chain variance and the numerator is the pooled (within + between chain) estimate of the posterior variance; values close to 1 suggest convergence
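As a quick illustration of this diagnostic, here is a minimal R sketch of the potential scale reduction for a list of chains (variable names are mine; in practice one would typically rely on something like coda::gelman.diag):

```r
# Minimal sketch: potential scale reduction (R-hat) from a list of chains of equal length
rhat <- function(chains) {
  S  <- length(chains[[1]])                 # draws per chain
  cm <- sapply(chains, mean)                # chain means
  W  <- mean(sapply(chains, var))           # within-chain variance
  B  <- S * var(cm)                         # between-chain variance
  var_hat <- (S - 1) / S * W + B / S        # pooled estimate of Var(theta | y)
  sqrt(var_hat / W)                         # close to 1 => chains have mixed
}

set.seed(1)
rhat(list(rnorm(500), rnorm(500)))          # two well-mixed chains: R-hat ~ 1
```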
MCMC — autocorrelation

[Figure: autocorrelation functions for α (lags 0 to 30) under the uncentred and the centred models.]

• Formal assessment of autocorrelation: the effective sample size
  \[ n_{\mathrm{eff}} = \frac{S}{1 + 2\sum_{t=1}^{\infty} \rho_t} \]
  where S is the total number of draws and ρ_t is the lag-t autocorrelation of the chain
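A rough R sketch of this quantity, truncating the sum at the first non-positive estimated autocorrelation (a crude rule; coda::effectiveSize implements a more careful version):

```r
# Minimal sketch: effective sample size from the empirical autocorrelation function
ess <- function(x, max_lag = 100) {
  S   <- length(x)
  rho <- acf(x, lag.max = max_lag, plot = FALSE)$acf[-1]   # drop lag 0
  cut <- which(rho <= 0)[1]                                 # crude truncation point
  if (!is.na(cut)) rho <- rho[seq_len(cut - 1)]
  S / (1 + 2 * sum(rho))
}

set.seed(2)
x <- as.numeric(arima.sim(model = list(ar = 0.8), n = 1000))  # strongly autocorrelated chain
ess(x)                                                        # far fewer than 1000 effective draws
```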
MCMC — brute force

[Figure: traceplots of α and β and the autocorrelation function for α, for the uncentred model after thinning the chain.]
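Thinning itself is trivial to do by hand; a one-line sketch keeping every 10th draw (the thinning interval is an arbitrary choice here):

```r
set.seed(3)
chain   <- as.numeric(arima.sim(model = list(ar = 0.8), n = 5000))
thinned <- chain[seq(1, length(chain), by = 10)]   # keep every 10th draw to reduce autocorrelation
```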
MCMC — pros & cons • “Standard” MCMC sampler are generally easy-ish to program and are in fact implemented in readily available software • However, depending on the complexity of the problem, their efficiency might be limited Gianluca Baio ( UCL) Introduction to INLA Bayes 2013, 21 May 2013 11 / 92
MCMC — pros & cons • “Standard” MCMC sampler are generally easy-ish to program and are in fact implemented in readily available software • However, depending on the complexity of the problem, their efficiency might be limited • Possible solutions 1 More complex model specification • Blocking • Overparameterisation Gianluca Baio ( UCL) Introduction to INLA Bayes 2013, 21 May 2013 11 / 92
MCMC — pros & cons • “Standard” MCMC sampler are generally easy-ish to program and are in fact implemented in readily available software • However, depending on the complexity of the problem, their efficiency might be limited • Possible solutions 1 More complex model specification • Blocking • Overparameterisation 2 More complex sampling schemes • Hamiltonian Monte Carlo • No U-turn sampling (eg stan — more on this later!) Gianluca Baio ( UCL) Introduction to INLA Bayes 2013, 21 May 2013 11 / 92
MCMC — pros & cons • “Standard” MCMC sampler are generally easy-ish to program and are in fact implemented in readily available software • However, depending on the complexity of the problem, their efficiency might be limited • Possible solutions 1 More complex model specification • Blocking • Overparameterisation 2 More complex sampling schemes • Hamiltonian Monte Carlo • No U-turn sampling (eg stan — more on this later!) 3 Alternative methods of inference • Approximate Bayesian Computation (ABC) • INLA Gianluca Baio ( UCL) Introduction to INLA Bayes 2013, 21 May 2013 11 / 92
MCMC — pros & cons • “Standard” MCMC sampler are generally easy-ish to program and are in fact implemented in readily available software • However, depending on the complexity of the problem, their efficiency might be limited • Possible solutions 1 More complex model specification • Blocking • Overparameterisation 2 More complex sampling schemes • Hamiltonian Monte Carlo • No U-turn sampling (eg stan — more on this later!) 3 Alternative methods of inference • Approximate Bayesian Computation (ABC) • INLA Gianluca Baio ( UCL) Introduction to INLA Bayes 2013, 21 May 2013 11 / 92
Basics of INLA

The basic ideas revolve around
• Formulating the model using a specific characterisation
  – All models that can be formulated in this way have certain features in common, which facilitate the computational aspects
  – The characterisation is still quite general and covers a wide range of possible models (more on that later!)
  – NB: This implies less flexibility with respect to MCMC — but in many cases this is not a huge limitation!
• Using some basic probability results to approximate the relevant distributions
• Computing the relevant quantities, typically using numerical methods
Latent Gaussian models (LGMs)

• The general problem of (parametric) inference is posited by assuming a probability model for the observed data, as a function of some relevant parameters
  \[ y \mid \theta, \psi \sim p(y \mid \theta, \psi) = \prod_{i=1}^{n} p(y_i \mid \theta, \psi) \]
• Often (in fact for a surprisingly large range of models!), we can assume that the parameters are described by a Gaussian Markov random field (GMRF)
  \[ \theta \mid \psi \sim \mathrm{Normal}(0, \Sigma(\psi)), \qquad \theta_l \perp\!\!\!\perp \theta_m \mid \theta_{-lm} \]
  where
  – the notation “−lm” indicates all the other elements of the parameter vector, excluding elements l and m
  – the covariance matrix Σ depends on some hyper-parameters ψ
• This kind of model is often referred to as a latent Gaussian model
LGMs as a general framework

• In general, we can partition ψ = (ψ₁, ψ₂) and re-express an LGM as
  \[ \psi \sim p(\psi) \qquad \text{(“hyperprior”)} \]
  \[ \theta \mid \psi \sim p(\theta \mid \psi) = \mathrm{Normal}(0, \Sigma(\psi_1)) \qquad \text{(“GMRF prior”)} \]
  \[ y \mid \theta, \psi \sim \prod_i p(y_i \mid \theta, \psi_2) \qquad \text{(“data model”)} \]
  ie ψ₁ are the hyper-parameters and ψ₂ are nuisance parameters
• The dimension of θ can be very large (eg 10²–10⁵)
• Conversely, because of the conditional independence properties, the dimension of ψ is generally small (eg 1–5)
LGMs as a general framework

• A very general way of specifying the problem is by modelling the mean for the i-th unit through an additive linear predictor, defined on a suitable scale (eg the logit scale for binomial data)
  \[ \eta_i = \alpha + \sum_{m=1}^{M} \beta_m x_{mi} + \sum_{l=1}^{L} f_l(z_{li}) \]
  where
  – α is the intercept;
  – β = (β₁, …, β_M) quantify the effect of the covariates x = (x₁, …, x_M) on the response;
  – f = {f₁(·), …, f_L(·)} is a set of functions defined in terms of some covariates z = (z₁, …, z_L)
  and then assume
  \[ \theta = (\alpha, \beta, f) \sim \mathrm{GMRF}(\psi) \]
• NB: This of course implies some form of Normally-distributed marginals for α, β and f
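In R-INLA this kind of predictor is written directly as a model formula, with fixed effects entered as usual and each f_l(·) term specified through f(); a hedged sketch (data frame and variable names are made up, and the latent model choice is illustrative):

```r
library(INLA)

# eta_i = alpha + beta1*x1_i + beta2*x2_i + f(z_i), with f an exchangeable ("iid") effect
formula <- y ~ x1 + x2 + f(z, model = "iid")

# For binomial data the predictor is on the logit scale, eg:
# fit <- inla(formula, family = "binomial", Ntrials = n_trials, data = my_data)
# summary(fit)
```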
LGMs as a general framework — examples

Upon varying the form of the functions f_l(·), this formulation can accommodate a wide range of models
• Standard regression
  – f_l(·) = NULL
• Hierarchical models
  – f_l(·) ∼ Normal(0, σ²_f) (exchangeable)
  – σ²_f | ψ ∼ some common distribution
• Spatial and spatio-temporal models (two components)
  – f₁(·) ∼ CAR (spatially structured effects)
  – f₂(·) ∼ Normal(0, σ²_{f₂}) (unstructured residual)
• Spline smoothing
  – f_l(·) ∼ AR(φ, σ²_ε)
• Survival models / log-Gaussian Cox processes
  – More complex specification in theory, but relatively easy to fit within the INLA framework!
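Extending the formula sketch above, these choices correspond (roughly) to different f() specifications in R-INLA; the model and variable names below are illustrative assumptions rather than a complete recipe:

```r
# Hierarchical (exchangeable) random effect
form_hier    <- y ~ x + f(group, model = "iid")

# Spatial model with structured + unstructured components (BYM-type);
# R-INLA needs distinct index names, so region2 would be a copy of region,
# and adjacency_graph is a placeholder for the neighbourhood structure
form_spatial <- y ~ x + f(region,  model = "besag", graph = adjacency_graph) +
                        f(region2, model = "iid")

# Temporal / spline-like smoothing via an autoregressive or random-walk latent effect
form_smooth  <- y ~ f(time, model = "ar1")   # or model = "rw2"
```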
Gaussian Markov Random Field

In order to preserve the underlying conditional independence structure in a GMRF, it is necessary to constrain the parameterisation
• Generally, it is complicated to do this in terms of the covariance matrix Σ
  – Typically, Σ is dense (ie many of the entries are non-zero)
  – If it happens to be sparse, this implies (marginal) independence among the relevant elements of θ — this is generally too stringent a requirement!
• Conversely, it is much simpler when using the precision matrix Q := Σ⁻¹
  – As it turns out, it can be shown that θ_l ⊥⊥ θ_m | θ_{−lm} ⇔ Q_{lm} = 0
  – Thus, under conditional independence (which is a less restrictive constraint), the precision matrix is typically sparse
  – We can use existing numerical methods to deal with sparse matrices (eg the R package Matrix)
  – Most computations in GMRFs are performed in terms of Cholesky factorisations (see the sketch below)
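A minimal sketch of these two points with the Matrix package: build the (sparse, tridiagonal) precision matrix of an AR(1)-type chain, so Q_{lm} = 0 whenever |l − m| > 1, and simulate a GMRF draw through its Cholesky factor. The autoregressive parameter and dimensions are arbitrary illustrative choices.

```r
library(Matrix)

n   <- 200
phi <- 0.8   # illustrative autoregressive parameter

# Sparse precision matrix of a stationary AR(1) process:
# only the main diagonal and the first off-diagonals are non-zero,
# ie theta_l is conditionally independent of theta_m given the rest whenever |l - m| > 1
Q <- bandSparse(n, k = c(0, 1),
                diagonals = list(c(1, rep(1 + phi^2, n - 2), 1),   # main diagonal
                                 rep(-phi, n - 1)),                # first off-diagonal
                symmetric = TRUE)
image(Q)   # sparsity pattern

# Simulate theta ~ Normal(0, Q^{-1}) via the Cholesky factor:
# if Q = R'R with R upper triangular, then theta = R^{-1} z with z ~ N(0, I) has variance Q^{-1}
R     <- chol(as.matrix(Q))      # dense factorisation here, for simplicity
z     <- rnorm(n)
theta <- backsolve(R, z)
plot(theta, type = "l")
```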
Precision matrix and conditional independence

[Figure: illustration of the correspondence between the zero entries of the precision matrix Q and the conditional independence graph of the GMRF.]
MCMC and LGMs

• (Standard) MCMC methods tend to perform poorly when applied to (non-trivial) LGMs. This is due to several factors
  – The components of the latent Gaussian field θ tend to be highly correlated, thus impacting on convergence and autocorrelation
  – Especially when the number of observations is large, θ and ψ also tend to be highly correlated

[Figure: samples of (θ₁, θ₂) under strong correlation (ρ = 0.95) and weak correlation (ρ = 0.20).]

• Again, blocking and overparameterisation can alleviate, but rarely eliminate, the problem
Summary so far

• Bayesian computation (especially for LGMs) is in general difficult
• MCMC can be used efficiently in many simple cases, but becomes trickier for complex models
  – Issues with convergence
  – Run times can be very long
• A wide class of statistical models can be represented as LGMs
• This allows us to take advantage of nice computational properties
  – GMRFs
  – Sparse precision matrices
• This is at the heart of the INLA approach
Introduction to INLA
Gianluca Baio (UCL), Bayes 2013, 21 May 2013
Integrated Nested Laplace Approximation (INLA)

• The starting point for understanding the INLA approach is the definition of conditional probability, which holds for any pair of variables (x, z) — and, technically, provided p(z) > 0
  \[ p(x \mid z) := \frac{p(x, z)}{p(z)} \]
  which can be re-written as
  \[ p(z) = \frac{p(x, z)}{p(x \mid z)} \]
• In particular, a conditional version can be obtained by further considering a third variable w:
  \[ p(z \mid w) = \frac{p(x, z \mid w)}{p(x \mid z, w)} \]
  which is particularly relevant to the Bayesian case
Integrated Nested Laplace Approximation (INLA)

• The second “ingredient” is the Laplace approximation
• Main idea: approximate log g(x) by a quadratic function, via a Taylor series expansion around the mode x̂
  \[ \log g(x) \approx \log g(\hat{x}) + \frac{\partial \log g(\hat{x})}{\partial x}(x - \hat{x}) + \frac{1}{2}\,\frac{\partial^2 \log g(\hat{x})}{\partial x^2}(x - \hat{x})^2 = \log g(\hat{x}) + \frac{1}{2}\,\frac{\partial^2 \log g(\hat{x})}{\partial x^2}(x - \hat{x})^2 \]
  since \( \frac{\partial \log g(\hat{x})}{\partial x} = 0 \) at the mode
• Setting \( \hat{\sigma}^2 = -1 \Big/ \frac{\partial^2 \log g(\hat{x})}{\partial x^2} \) we can re-write
  \[ \log g(x) \approx \log g(\hat{x}) - \frac{(x - \hat{x})^2}{2\hat{\sigma}^2} \]
  or equivalently
  \[ \int g(x)\,dx = \int e^{\log g(x)}\,dx \approx \mathrm{const} \times \int \exp\left\{ -\frac{(x - \hat{x})^2}{2\hat{\sigma}^2} \right\} dx \]
• Thus, under the Laplace approximation, \( g(x) \approx \mathrm{Normal}(\hat{x}, \hat{\sigma}^2) \)
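A hedged R sketch of this recipe for a generic (unnormalised) one-dimensional log-density: locate the mode numerically and take minus the reciprocal of the second derivative (here approximated by finite differences) as the Gaussian variance.

```r
# Minimal sketch: Laplace approximation of an unnormalised density g(x), given log g
laplace_approx <- function(log_g, lower, upper, eps = 1e-4) {
  opt   <- optimize(log_g, c(lower, upper), maximum = TRUE)
  x_hat <- opt$maximum
  # second derivative of log g at the mode, by central finite differences
  d2    <- (log_g(x_hat + eps) - 2 * log_g(x_hat) + log_g(x_hat - eps)) / eps^2
  list(mean = x_hat, var = -1 / d2)
}

# Example: unnormalised Gamma(shape = 5, rate = 1) density, g(x) = x^4 exp(-x)
log_g <- function(x) 4 * log(x) - x
laplace_approx(log_g, lower = 0.01, upper = 50)   # mode 4, variance 4
```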
Laplace approximation — example

• Consider a χ² distribution: \( p(x) = \dfrac{g(x)}{c} = \dfrac{x^{k/2 - 1} e^{-x/2}}{c} \)
  1. \( l(x) = \log g(x) = \left(\tfrac{k}{2} - 1\right)\log x - \tfrac{x}{2} \)
  2. \( l'(x) = \dfrac{\partial \log g(x)}{\partial x} = \left(\tfrac{k}{2} - 1\right)x^{-1} - \tfrac{1}{2} \)
  3. \( l''(x) = \dfrac{\partial^2 \log g(x)}{\partial x^2} = -\left(\tfrac{k}{2} - 1\right)x^{-2} \)
• Then
  – Solving l'(x) = 0 we find the mode: x̂ = k − 2
  – Evaluating −1/l''(x) at the mode gives σ̂² = 2(k − 2)
• Consequently, we can approximate p(x) as
  \[ p(x) \approx \tilde{p}(x) = \mathrm{Normal}(k - 2,\, 2(k - 2)) \]
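This is easy to check numerically; a short R sketch comparing the χ²(k) density with its Laplace approximation (the same kind of comparison shown in the figure that follows):

```r
# Compare the chi-squared(k) density with its Laplace (Normal) approximation
k <- 10
curve(dchisq(x, df = k), from = 0, to = 30, ylab = "density",
      main = "Chi-squared(10) vs Normal(8, 16)")
curve(dnorm(x, mean = k - 2, sd = sqrt(2 * (k - 2))), add = TRUE, lty = 2)
```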
Laplace approximation — example

[Figure: χ²(k) densities for k = 3, 6, 10 and 20 (solid) compared with their Laplace approximations Normal(1, 2), Normal(4, 8), Normal(8, 16) and Normal(18, 36) respectively (dashed).]
Integrated Nested Laplace Approximation (INLA)

• The general idea is that, using the fundamental probability equations, we can approximate a generic conditional (posterior) distribution as
  \[ \tilde{p}(z \mid w) = \frac{p(x, z \mid w)}{\tilde{p}(x \mid z, w)} \]
  where \( \tilde{p}(x \mid z, w) \) is the Laplace approximation to the conditional distribution of x given z, w
• This idea can be used to approximate any generic required posterior distribution
Integrated Nested Laplace Approximation (INLA)

Objective of Bayesian estimation
• In a Bayesian LGM, the required distributions are
  \[ p(\theta_j \mid y) = \int p(\theta_j, \psi \mid y)\, d\psi = \int p(\psi \mid y)\, p(\theta_j \mid \psi, y)\, d\psi \]
  \[ p(\psi_k \mid y) = \int p(\psi \mid y)\, d\psi_{-k} \]
• Thus we need to estimate:
  (1.) p(ψ | y), from which all the relevant marginals p(ψ_k | y) can also be obtained;
  (2.) p(θ_j | ψ, y), which is needed to compute the marginal posteriors for the parameters
Integrated Nested Laplace Approximation (INLA)

(1.) can be easily estimated as
\[ p(\psi \mid y) = \frac{p(\theta, \psi \mid y)}{p(\theta \mid \psi, y)} = \frac{p(y \mid \theta, \psi)\, p(\theta, \psi)}{p(y)} \, \frac{1}{p(\theta \mid \psi, y)} = \frac{p(y \mid \theta)\, p(\theta \mid \psi)\, p(\psi)}{p(y)} \, \frac{1}{p(\theta \mid \psi, y)} \]
\[ \propto \frac{p(\psi)\, p(\theta \mid \psi)\, p(y \mid \theta)}{p(\theta \mid \psi, y)} \approx \left. \frac{p(\psi)\, p(\theta \mid \psi)\, p(y \mid \theta)}{\tilde{p}(\theta \mid \psi, y)} \right|_{\theta = \hat{\theta}(\psi)} =: \tilde{p}(\psi \mid y) \]
where
– \( \tilde{p}(\theta \mid \psi, y) \) is the Laplace approximation of \( p(\theta \mid \psi, y) \)
– \( \theta = \hat{\theta}(\psi) \) is its mode
Integrated Nested Laplace Approximation (INLA)

(2.) is slightly more complex, because in general there will be many more elements in θ than in ψ, and thus this computation is more expensive
• One easy possibility is to approximate p(θ_j | ψ, y) directly using a Normal distribution, with a precision matrix based on the Cholesky decomposition of the precision matrix Q. While this is very fast, the approximation is generally not very good
• Alternatively, we can write θ = {θ_j, θ_{−j}}, use the definition of conditional probability and again apply a Laplace approximation to obtain
  \[ p(\theta_j \mid \psi, y) = \frac{p(\{\theta_j, \theta_{-j}\} \mid \psi, y)}{p(\theta_{-j} \mid \theta_j, \psi, y)} = \frac{1}{p(\psi \mid y)} \, \frac{p(\{\theta_j, \theta_{-j}\}, \psi \mid y)}{p(\theta_{-j} \mid \theta_j, \psi, y)} \propto \frac{p(\theta, \psi \mid y)}{p(\theta_{-j} \mid \theta_j, \psi, y)} \]
  \[ \propto \frac{p(\psi)\, p(\theta \mid \psi)\, p(y \mid \theta)}{p(\theta_{-j} \mid \theta_j, \psi, y)} \approx \left. \frac{p(\psi)\, p(\theta \mid \psi)\, p(y \mid \theta)}{\tilde{p}(\theta_{-j} \mid \theta_j, \psi, y)} \right|_{\theta_{-j} = \hat{\theta}_{-j}(\theta_j, \psi)} =: \tilde{p}(\theta_j \mid \psi, y) \]
Integrated Nested Laplace Approximation (INLA)

• Because (θ_{−j} | θ_j, ψ, y) are reasonably Normal, the approximation generally works well
• However, this strategy can be computationally expensive
• The most efficient algorithm is the “Simplified Laplace Approximation”
  – Based on a Taylor series expansion up to the third order of both the numerator and the denominator of \( \tilde{p}(\theta_j \mid \psi, y) \)
  – This effectively “corrects” the Gaussian approximation for location and skewness, to increase the fit to the required distribution
• This is the algorithm implemented by default in R-INLA, but the choice can be modified
  – If extra precision is required, it is possible to run the full Laplace approximation — of course at the expense of running time!
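In R-INLA the approximation strategy is selected through the control.inla argument; a hedged sketch with simulated toy data (the data-generating choices are arbitrary, and the strategy names reflect my understanding of the package options):

```r
library(INLA)

# Toy data: Gaussian response with one covariate and an exchangeable group effect
set.seed(4)
dat   <- data.frame(x = rnorm(100), g = rep(1:10, each = 10))
dat$y <- 1 + 2 * dat$x + rnorm(10)[dat$g] + rnorm(100, sd = 0.5)

form <- y ~ x + f(g, model = "iid")

# Default: simplified Laplace approximation
fit_sla <- inla(form, data = dat, control.inla = list(strategy = "simplified.laplace"))

# Full Laplace approximation (more accurate, slower); "gaussian" would be the fastest/crudest
fit_la  <- inla(form, data = dat, control.inla = list(strategy = "laplace"))

summary(fit_sla)
```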
Integrated Nested Laplace Approximation (INLA)

Operationally, the INLA algorithm proceeds with the following steps:
i. Explore the marginal joint posterior for the hyper-parameters, p̃(ψ | y)
  – Locate the mode ψ̂ by optimising log p̃(ψ | y), eg using Newton-like algorithms
  – Compute the Hessian at ψ̂ and change co-ordinates to standardise the variables; this corrects for scale and rotation and simplifies integration
  – Explore log p̃(ψ | y) and produce a grid of H points {ψ*_h} associated with the bulk of the mass, together with a corresponding set of area weights {Δ_h}

[Figure: exploration of the joint posterior of (ψ₁, ψ₂): locating the mode, re-expressing it in standardised co-ordinates z = (z₁, z₂) with E[z] = 0 and V[z] = σ²I, and building the grid of integration points.]
Integrated Nested Laplace Approximation (INLA)

ii. For each element ψ*_h in the grid,
  – obtain the marginal posterior p̃(ψ*_h | y), using interpolation and possibly correcting for (probable) skewness by using log-splines;
  – evaluate the conditional posteriors p̃(θ_j | ψ*_h, y) on a grid of selected values for θ_j
iii. Marginalise over the ψ*_h to obtain the marginal posteriors p̃(θ_j | y) by numerical integration
  \[ \tilde{p}(\theta_j \mid y) \approx \sum_{h=1}^{H} \tilde{p}(\theta_j \mid \psi^*_h, y)\, \tilde{p}(\psi^*_h \mid y)\, \Delta_h \]
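A stripped-down sketch of step iii on a made-up toy problem: given a grid of hyper-parameter values with weights, the marginal of θ_j is the weighted mixture of the conditional densities. All the ingredients below (grid, weights, densities) are invented purely to illustrate the mechanics.

```r
# Toy illustration of the final integration step:
# p(theta | y) ~= sum_h p(theta | psi*_h, y) * p(psi*_h | y) * Delta_h

psi_grid <- seq(0.5, 2, length.out = 15)          # grid of hyper-parameter values {psi*_h}
delta_h  <- rep(diff(psi_grid)[1], 15)            # equal area weights {Delta_h}
post_psi <- dgamma(psi_grid, shape = 4, rate = 4) # plays the role of p~(psi*_h | y)

theta     <- seq(-4, 4, length.out = 200)
cond_dens <- sapply(psi_grid, function(p) dnorm(theta, 0, sqrt(1 / p)))  # p~(theta | psi*_h, y)

post_theta <- as.numeric(cond_dens %*% (post_psi * delta_h))   # weighted mixture over the grid
post_theta <- post_theta / sum(post_theta * diff(theta)[1])    # renormalise

plot(theta, post_theta, type = "l", ylab = "approximate marginal density")
```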
Integrated Nested Laplace Approximation (INLA)

So, it’s all in the name... Integrated Nested Laplace Approximation
• Laplace, because the Laplace approximation is the basis for estimating the unknown distributions
• Nested, because the Laplace approximations are nested within one another
  – since (2.) is needed to estimate (1.)
  – NB: consequently the estimation of (1.) might not be good enough, but it can be refined
• Integrated, because the required marginal posterior distributions are obtained by (numerical) integration
INLA — example

• Suppose we want to make inference on a very simple model
  \[ y_{ij} \mid \theta_j, \psi \sim \mathrm{Normal}(\theta_j, \sigma_0^2) \qquad (\sigma_0^2 \text{ assumed known}) \]
  \[ \theta_j \mid \psi \sim \mathrm{Normal}(0, \tau) \qquad (\psi = \tau^{-1} \text{ is the precision}) \]
  \[ \psi \sim \mathrm{Gamma}(a, b) \]
• So, the model is made of a three-level hierarchy:
  1 Data y = (y_{ij}), for i = 1, …, n_j and j = 1, …, J
  2 Parameters θ = (θ_1, …, θ_J)
  3 Hyper-parameter ψ
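This toy model maps directly onto an R-INLA call; a hedged sketch with simulated data (the Gamma(a, b) prior on the random-effect precision is passed through the hyper option of f(), and the known observation variance is fixed via control.family; this reflects my understanding of the interface rather than the slides themselves):

```r
library(INLA)

# Simulate data: J groups, n_j observations per group, known sigma_0
set.seed(5)
J <- 10; n_j <- 20; sigma0 <- 1
theta_true <- rnorm(J, 0, sqrt(2))
dat   <- data.frame(j = rep(1:J, each = n_j))
dat$y <- theta_true[dat$j] + rnorm(J * n_j, sd = sigma0)

# Gamma(a, b) prior on the precision psi of the exchangeable effects theta_j
a <- 1; b <- 0.01
fit <- inla(y ~ -1 + f(j, model = "iid",
                       hyper = list(prec = list(prior = "loggamma", param = c(a, b)))),
            family = "gaussian", data = dat,
            # fix the observation precision at the known value 1 / sigma0^2
            control.family = list(hyper = list(prec = list(initial = log(1 / sigma0^2),
                                                           fixed = TRUE))))

fit$summary.random$j   # posterior summaries for theta_1, ..., theta_J
fit$summary.hyperpar   # posterior summary for the precision psi
```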