Workshop 7.2b: Introduction to Bayesian models
Murray Logan, 07 Feb 2017
  1. Workshop 7.2b: Introduction to Bayesian models Murray Logan 07 Feb 2017

  2. Section 1 Frequentist vs Bayesian

  3. Frequentist • P(D|H) • long-run frequency • simple analytical methods to solve roots • conclusions pertain to data, not parameters or hypotheses • compared to theoretical distribution when NULL is true • probability of obtaining observed data or MORE EXTREME data

  4. Frequentist • P-value ◦ probability of rejecting the NULL ◦ NOT a measure of the magnitude of an effect or degree of significance! ◦ a measure of whether the sample size is large enough • 95% CI ◦ NOT about the parameter, it is about the interval ◦ does not tell you the range of values likely to contain the true mean
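The CI point above can be checked with a quick simulation (a sketch, not from the slides; the true mean 100, SD 15, and n = 30 are illustrative): "95%" is a property of the interval-building procedure over repeated sampling, not of any one interval.

```python
import math
import random
import statistics

# Repeatedly sample, build a 95% CI each time, and count how often the
# interval captures the (known) true mean. All numbers are illustrative.
random.seed(1)
true_mu, sigma, n, reps = 100.0, 15.0, 30, 2000
covered = 0
for _ in range(reps):
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    # normal critical value 1.96; the exact t value for n = 30 is slightly larger
    if m - 1.96 * se <= true_mu <= m + 1.96 * se:
        covered += 1

coverage = covered / reps
print(round(coverage, 3))  # close to 0.95 in the long run
```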

  5. Frequentist vs Bayesian

                      Frequentist            Bayesian
      Probability     Long-run frequency     Degree of belief
                      P(D|H)                 P(H|D)
      Parameters      Fixed, true            Random, distribution
      Obs. data       One possible           Fixed, true
      Inferences      pertain to data        pertain to parameters

  6. Frequentist vs Bayesian [three scatterplots of y vs x with fitted lines; panel statistics:]
      n: 10, slope: -0.1022, t: -2.3252, p: 0.0485
      n: 10, slope: -10.2318, t: -2.2115, p: 0.0579
      n: 100, slope: -10.4713, t: -6.6457, p: 1.7101362 × 10⁻⁹

  7. Frequentist vs Bayesian [two scatterplots of y vs x; summary:]
                            Population A    Population B
      Percentage change     0.46            45.46
      Prob. >5% decline     0               0.86

  8. Section 2 Bayesian Statistics

  9. Bayesian: Bayes' rule

      P(H|D) = P(D|H) × P(H) / P(D)

      posterior probability = (likelihood × prior probability) / normalizing constant

  10. Bayesian: Bayes' rule

      P(H|D) = P(D|H) × P(H) / P(D)

      posterior probability = (likelihood × prior probability) / normalizing constant

      The normalizing constant is required to turn a frequency distribution into a probability distribution.
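The role of the normalizing constant can be sketched numerically on a grid (the coin example, 7 heads in 10 tosses, and the flat prior are illustrative, not from the slides): likelihood × prior is only proportional to a probability until it is divided by P(D).

```python
import math

# Grid sketch of Bayes' rule for a coin's head probability p.
heads, tosses = 7, 10
grid = [i / 100 for i in range(1, 100)]  # candidate values of p
prior = [1.0] * len(grid)                # flat prior
likelihood = [math.comb(tosses, heads) * p ** heads * (1 - p) ** (tosses - heads)
              for p in grid]
unnormalized = [l * pr for l, pr in zip(likelihood, prior)]
norm_const = sum(unnormalized)           # plays the role of P(D) on this grid
posterior = [u / norm_const for u in unnormalized]
print(round(sum(posterior), 6), grid[posterior.index(max(posterior))])
```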

  11. Estimation: OLS
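OLS estimation can be sketched directly from its closed form (the normal equations); the small data set below is made up for illustration.

```python
# Least-squares slope and intercept for y = b0 + b1*x: minimize the sum of
# squared residuals. Data values are illustrative.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
print(round(b1, 2), round(b0, 2))
```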

  12. Estimation: Likelihood P(D|H)
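The likelihood P(D|H) scores how probable the observed data are under a hypothesized model; as a sketch (data values and candidate means are made up for illustration), for a normal model the sample mean maximizes it.

```python
import math

# Log-likelihood of the data under a normal model with mean mu and SD sigma.
data = [4.8, 5.1, 5.3, 4.9, 5.2]

def log_likelihood(mu, sigma):
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

# the sample mean (5.06) should win among these candidate hypotheses
candidates = [4.0, 4.5, 5.06, 5.5]
best = max(candidates, key=lambda mu: log_likelihood(mu, 0.2))
print(best)
```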

  13. Bayesian • conclusions pertain to hypotheses • computationally robust (sample size, balance, collinearity) • inferential flexibility - derive any number of inferences

  14. Bayesian • subjectivity? • intractable normalizing constant

      P(H|D) = P(D|H) × P(H) / P(D)

      P(D) - probability of the data from all possible hypotheses
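Why P(D) is the hard part can be sketched on a grid (the coin example, 7 heads in 10 tosses with a flat prior, is illustrative): it sums P(D|H) × P(H) over every possible hypothesis, which is feasible for one parameter but explodes with many parameters, motivating MCMC.

```python
import math

# Approximate P(D) by summing likelihood x prior over a grid of hypotheses.
heads, tosses = 7, 10
for g in [10, 100, 1000]:
    ps = [(i + 0.5) / g for i in range(g)]  # grid over the one parameter
    p_d = sum(math.comb(tosses, heads) * p ** heads * (1 - p) ** (tosses - heads) / g
              for p in ps)  # flat prior: P(H) = 1/g per cell
    print(g, round(p_d, 4))
```

With d parameters a grid of g points per axis needs g**d evaluations, which is why this route is abandoned for sampling.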

  15. MCMC sampling Markov Chain Monte Carlo sampling • draw samples proportional to the likelihood • with two parameters and infinitely vague priors the posterior is the likelihood only (a multivariate normal)

  16. MCMC sampling Markov Chain Monte Carlo sampling • draw samples proportional to the likelihood

  17. MCMC sampling Markov Chain Monte Carlo sampling • draw samples proportional to the likelihood

  18. MCMC sampling Markov Chain Monte Carlo sampling • chain of samples

  19. MCMC sampling Markov Chain Monte Carlo sampling • 1000 samples

  20. MCMC sampling Markov Chain Monte Carlo sampling • 10,000 samples

  21. MCMC sampling Markov Chain Monte Carlo sampling • Aim: samples reflect the posterior frequency distribution • samples used to construct the posterior probability distribution • the sharper the multidimensional "features", the more samples needed • the chain should have traversed the entire posterior • the initial location should not influence the result
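The sampling idea above can be sketched with a minimal Metropolis sampler (not the workshop's code; the standard-normal target and tuning values are illustrative): it draws samples from a distribution known only up to its normalizing constant.

```python
import math
import random

# Minimal Metropolis sampler: random-walk proposals, accepted with
# probability min(1, posterior ratio). The unnormalized target is N(0, 1).
random.seed(2)

def log_post(theta):
    return -0.5 * theta ** 2  # unnormalized log posterior

theta = 0.0
chain = []
for _ in range(20000):
    proposal = theta + random.gauss(0, 1)  # symmetric random-walk proposal
    if math.log(random.random()) < log_post(proposal) - log_post(theta):
        theta = proposal
    chain.append(theta)  # on rejection the current value is repeated

mean = sum(chain) / len(chain)
print(round(mean, 2))
```

The chain of kept values is exactly the "chain of samples" of the following slides; histogramming it reconstructs the posterior.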

  22. MCMC diagnostics: Trace plots

  23. MCMC diagnostics: Autocorrelation • Summary stats on non-independent values are biased • Thinning factor = 1

  24. MCMC diagnostics: Autocorrelation • Summary stats on non-independent values are biased • Thinning factor = 10

  25. MCMC diagnostics: Autocorrelation • Summary stats on non-independent values are biased • Thinning factor = 10, n = 10,000
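The autocorrelation diagnostic and the effect of thinning can be sketched as follows (the AR(1) series stands in for a correlated MCMC chain; its coefficient 0.9 and the thinning factor are illustrative):

```python
import random

# Lag-k autocorrelation of a chain, before and after thinning (keeping
# every 10th sample). Thinning reduces the dependence between kept samples.
random.seed(3)

def autocorr(xs, lag):
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    cov = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(n - lag)) / n
    return cov / var

chain = []
x = 0.0
for _ in range(50000):
    x = 0.9 * x + random.gauss(0, 1)  # strongly autocorrelated draws
    chain.append(x)

thinned = chain[::10]  # thinning factor = 10
print(round(autocorr(chain, 1), 2), round(autocorr(thinned, 1), 2))
```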

  26. MCMC diagnostics: Plots of distributions

  27. Sampler types Metropolis-Hastings http://twiecki.github.io/blog/2014/01/02/visualizing-mcmc/

  28. Sampler types Gibbs

  29. Sampler types NUTS

  30. Sampling • thinning • burn-in (warmup) • chains
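Burn-in and multiple chains can be sketched together (a toy example, not the workshop's code; the N(0, 1) target, start values, and tuning numbers are illustrative): chains started far apart should agree once the warmup samples are discarded.

```python
import math
import random

# Three Metropolis chains from very different start values; discarding the
# burn-in removes the influence of the starting location.
def run_chain(seed, start, n=5000):
    rng = random.Random(seed)
    theta = start
    chain = []
    for _ in range(n):
        prop = theta + rng.gauss(0, 1)
        if math.log(rng.random()) < -0.5 * prop ** 2 + 0.5 * theta ** 2:
            theta = prop
        chain.append(theta)
    return chain

chains = [run_chain(seed, start) for seed, start in [(1, -50.0), (2, 0.0), (3, 50.0)]]
burned = [c[1000:] for c in chains]  # discard 1000 warmup (burn-in) samples
means = [sum(c) / len(c) for c in burned]
print([round(m, 1) for m in means])
```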

  31. Bayesian software (for R) • MCMCpack • WinBUGS (R2WinBUGS) • JAGS (R2jags) • Stan (rstan, brms)

  32. BRMS

      Extractor              Description
      summary()              Model output
      residuals()            Residuals
      fitted()               Predicted values
      predict()              Predict new responses
      coef()                 Extract model coefficients
      plot()                 Diagnostic plots
      stanplot(, type=)      More diagnostic plots
      marginal_effects()     Partial effects
      logLik()               Extract log-likelihood
      LOO() and WAIC()       Calculate WAIC and LOO
      influence.measures()   Leverage, Cook's D
      stancode()             Model passed to Stan
      standata()             Data list passed to Stan

  33. Section 3 Worked Examples

  34. Worked Examples

      > fert <- read.csv('../data/fertilizer.csv', strip.white=T)
      > head(fert)
        FERTILIZER YIELD
      1         25    84
      2         50    80
      3         75    90
      4        100   154
      5        125   148
      6        150   169
      > str(fert)
      'data.frame': 10 obs. of 2 variables:
       $ FERTILIZER: int 25 50 75 100 125 150 175 200 225 250
       $ YIELD     : int 84 80 90 154 148 169 206 244 212 248
      > summary(fert)
         FERTILIZER         YIELD
       Min.   : 25.00   Min.   : 80.0
       1st Qu.: 81.25   1st Qu.:104.5
       Median :137.50   Median :161.5
       Mean   :137.50   Mean   :163.5
       3rd Qu.:193.75   3rd Qu.:210.5
       Max.   :250.00   Max.   :248.0

  35. Worked Examples

      Question: is there a relationship between fertilizer concentration and grass yield?

      Linear model:

      Frequentist:
      y_i = β₀ + β₁x_i + ε_i,  ε_i ∼ N(0, σ²)

      Bayesian:
      y_i ∼ N(η_i, σ²)
      η_i = β₀ + β₁x_i
      β₀ ∼ N(0, 1000)
      β₁ ∼ N(0, 1000)
      σ² ∼ Cauchy(0, 4)
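As a sketch of what the sampler does with this model (the workshop itself uses brms/Stan, not this code), a simple Metropolis sampler can be run on the fertilizer data from the previous slide. Two simplifying assumptions, labeled here and in the comments: sigma is held fixed near the residual SD (~19) instead of being sampled, and N(0, 1000) is read as a standard deviation of 1000.

```python
import math
import random

# Metropolis sketch of the Bayesian linear model y ~ N(b0 + b1*x, sigma^2).
# Assumption: sigma fixed at ~19 (roughly the residual SD) for brevity;
# the slide's model samples it with a Cauchy prior.
x = [25, 50, 75, 100, 125, 150, 175, 200, 225, 250]
y = [84, 80, 90, 154, 148, 169, 206, 244, 212, 248]
sigma = 19.0
random.seed(4)

def log_post(b0, b1):
    ll = sum(-(yi - (b0 + b1 * xi)) ** 2 / (2 * sigma ** 2) for xi, yi in zip(x, y))
    lp = -(b0 ** 2 + b1 ** 2) / (2 * 1000 ** 2)  # N(0, 1000) priors, read as SD 1000
    return ll + lp

b0, b1 = 50.0, 1.0
chain = []
for _ in range(20000):
    p0, p1 = b0 + random.gauss(0, 5), b1 + random.gauss(0, 0.05)
    if math.log(random.random()) < log_post(p0, p1) - log_post(b0, b1):
        b0, b1 = p0, p1
    chain.append((b0, b1))

kept = chain[2000:]  # discard burn-in
slope = sum(b for _, b in kept) / len(kept)
print(round(slope, 2))  # posterior mean slope, near the OLS estimate of ~0.81
```

With these effectively flat priors the posterior mean slope sits close to the least-squares answer, which is the point of the frequentist-vs-Bayesian comparison above.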
