05 Model comparison and hypothesis testing
Shravan Vasishth
September 03, 2019


  1. 05 Model comparison and hypothesis testing. Shravan Vasishth. September 03, 2019. (Slide 1 of 64)

  2. Introduction. Bayes’ rule can be written with reference to a specific statistical model M1. D refers to the data; θ is the parameter, or vector of parameters.

     \[ P(\theta \mid D, M_1) = \frac{P(D \mid \theta, M_1)\, P(\theta \mid M_1)}{P(D \mid M_1)} \tag{1} \]

  3. Introduction. P(D | M1) is the marginal likelihood: a single number that tells you the likelihood of the observed data D given the model M1 (and, only in the discrete case, it tells you the probability of the observed data D given the model).

  4. Introduction. Obviously, you would prefer a model that gives the data a higher likelihood. For example, speaking informally: if you have data that were generated from a Normal(0,1) distribution, then the likelihood of the data given μ = 0 will be higher than the likelihood given some other value, such as μ = 10.

  5. Introduction. The higher likelihood is telling us that the underlying model is more likely to have produced the data. So we would prefer the model with the higher likelihood: we would prefer Normal(0,1) over Normal(10,1) as the presumed distribution that generated the data.

  6. Introduction. Assume for simplicity that σ = 1.

```r
## sample 100 iid data points:
x <- rnorm(100)

## compute the log likelihood under mu = 0:
(loglikmu0 <- sum(dnorm(x, mean = 0, sd = 1, log = TRUE)))
## [1] -154.63

## compute the log likelihood under mu = 10:
(loglikmu10 <- sum(dnorm(x, mean = 10, sd = 1, log = TRUE)))
## [1] -5018

## the likelihood ratio is a difference of log likelihoods
## on the log scale:
loglikmu0 - loglikmu10
## [1] 4863.4
```

  7. Introduction. One way to compare two models M1 and M2 is to use the Bayes factor:

     \[ BF_{12} = \frac{P(D \mid M_1)}{P(D \mid M_2)} \tag{2} \]

     The Bayes factor is similar to the frequentist likelihood ratio test (or ANOVA), with the difference that in the Bayes factor, the likelihood is integrated over the parameter space, not maximized (shown below).

  8. Introduction. How do we compute the likelihood? Consider the simple binomial case where a subject answers 10 questions and gets 9 right. That’s our data.

  9. Introduction: Discrete example. Assuming a binomial likelihood function, Binomial(n, θ), the two models we will compare are: M1, where the parameter has a point value θ = 0.5 with probability 1 (a very sharp prior); and M2, where the parameter has a vague prior θ ∼ Beta(1, 1). Recall that the Beta(1, 1) distribution is Uniform(0, 1).

  10. Introduction: Discrete example. The likelihood under M1 is:

     \[ \binom{n}{k} \theta^{k} (1-\theta)^{n-k} = \binom{10}{9} 0.5^{9} (1-0.5)^{1} = \binom{10}{9}\, 0.5^{10} \tag{3} \]

     We already know how to compute this:

```r
(probDataM1 <- dbinom(9, p = 0.5, size = 10))
## [1] 0.0097656
```

  11. Introduction: Discrete example. The marginal likelihood under M2 involves solving the following integral:

     \[ P(D \mid M_2) = \int P(D \mid \theta, M_2)\, P(\theta \mid M_2)\, d\theta \tag{4} \]

     The integral is simply integrating out (“summing over”) all possible values of the parameter θ.

  12. Introduction: Discrete example. To see what summing over all possible values means, first consider a discrete version of this: suppose we say that our θ can take on only these three values: θ1 = 0, θ2 = 0.5, θ3 = 1, and each has probability 1/3. Then, the marginal likelihood of the data given this prior specification of θ would be:

     \[ P(D \mid M) = P(\theta_1) P(D \mid \theta_1) + P(\theta_2) P(D \mid \theta_2) + P(\theta_3) P(D \mid \theta_3) = \sum_i P(D \mid \theta_i, M)\, P(\theta_i \mid M) \tag{5} \]

  13. Introduction: Discrete example. In our discrete example, this evaluates to:

```r
res <- (1/3) * (choose(10, 9) * 0^9   * (1 - 0)^1) +
       (1/3) * (choose(10, 9) * 0.5^9 * (1 - 0.5)^1) +
       (1/3) * (choose(10, 9) * 1^9   * (1 - 1)^1)
res
## [1] 0.0032552
```

     This may be easier to read in mathematical form:

     \[ P(D \mid M) = \frac{1}{3}\binom{10}{9} 0^9 (1-0)^1 + \frac{1}{3}\binom{10}{9} 0.5^9 (1-0.5)^1 + \frac{1}{3}\binom{10}{9} 1^9 (1-1)^1 = 0.003 \tag{6} \]

  14. Introduction: Discrete example. Essentially, we are computing the marginal likelihood P(D | M) by averaging the likelihood across possible parameter values (here, only three possible values), with the prior probabilities for each parameter value serving as a weight.
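The prior-weighted averaging described above can be sketched in Python (a translation of the slides' R computation; the function names here are illustrative, not from the slides):

```python
from math import comb

# likelihood of the data (9 successes in 10 trials) for a given theta
def binom_lik(theta, k=9, n=10):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# discrete prior: three possible values of theta, each with probability 1/3
thetas = [0.0, 0.5, 1.0]
priors = [1/3, 1/3, 1/3]

# marginal likelihood: prior-weighted average of the likelihoods
marg_lik = sum(w * binom_lik(t) for w, t in zip(priors, thetas))
print(round(marg_lik, 7))  # 0.0032552
```

Only θ = 0.5 contributes here, since the likelihood is exactly zero at θ = 0 and θ = 1.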

  15. Introduction: Discrete example. The Bayes factor for Model 1 vs Model 2 would then be:

```r
0.0097656 / 0.0032552
## [1] 3
```

     Model 1, which assumes that θ has the point value 0.5, is three times more likely than Model 2 with the discrete prior over θ (θ1 = 0, θ2 = 0.5, θ3 = 1, each with probability 1/3). This is as expected: M2’s marginal likelihood is just M1’s likelihood averaged with two zero-likelihood parameter values, i.e., divided by 3.

  16. Introduction: Continuous example. The integral shown above does essentially the calculation we show above, but summing over the entire continuous space that is the range of possible values of θ:

     \[ P(D \mid M_2) = \int P(D \mid \theta, M_2)\, P(\theta \mid M_2)\, d\theta \tag{7} \]

  17. Introduction: Continuous example. Let’s solve this integral analytically. We need to know only one small detail from integral calculus:

     \[ \int_a^b x^9\, dx = \left[ \frac{x^{10}}{10} \right]_a^b \tag{8} \]

     Similarly:

     \[ \int_a^b x^{10}\, dx = \left[ \frac{x^{11}}{11} \right]_a^b \tag{9} \]

     Having reminded ourselves of how to solve this simple integral, we proceed as follows.

  18. Introduction: Continuous example. Our prior for θ is Beta(α = 1, β = 1):

     \[ P(\theta \mid M_2) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \theta^{\alpha-1} (1-\theta)^{\beta-1} = \frac{\Gamma(2)}{\Gamma(1)\Gamma(1)} \theta^{1-1} (1-\theta)^{1-1} = 1 \tag{10} \]

  19. Introduction: Continuous example. So, our integral simplifies to:

     \[
     \begin{aligned}
     P(D \mid M_2) &= \int_0^1 P(D \mid \theta, M_2)\, d\theta \\
     &= \int_0^1 \binom{10}{9} \theta^9 (1-\theta)^1\, d\theta \\
     &= \int_0^1 \binom{10}{9} (\theta^9 - \theta^{10})\, d\theta \\
     &= 10 \left[ \frac{\theta^{10}}{10} - \frac{\theta^{11}}{11} \right]_0^1 \\
     &= 10 \times \frac{1}{110} = \frac{1}{11}
     \end{aligned} \tag{11}
     \]
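As a sanity check on the analytic result 1/11, the integral can be approximated numerically; a minimal Python sketch using a midpoint Riemann sum (a hand-rolled check, not the slides' method):

```python
from math import comb

# likelihood of the data (9/10 correct) as a function of theta;
# the Beta(1,1) prior density is 1 on [0,1], so the integrand is
# just the likelihood
def lik(theta):
    return comb(10, 9) * theta**9 * (1 - theta)

# midpoint Riemann sum over [0, 1]
N = 100_000
marg_lik = sum(lik((i + 0.5) / N) for i in range(N)) / N
print(marg_lik)  # approximately 1/11 = 0.0909...
```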

  20. Introduction: Continuous example. So, when Model 1 assumes that the θ parameter is 0.5, and Model 2 has a vague prior Beta(1, 1) on the θ parameter, our Bayes factor will be:

     \[ BF_{12} = \frac{P(D \mid M_1)}{P(D \mid M_2)} = \frac{0.00977}{1/11} = 0.107 \tag{12} \]

  21. Introduction: Continuous example. Thus, the model with the vague prior (M2) is about 9 times more likely than the model with θ = 0.5:

     \[ \frac{1}{0.10742} = 9.309 \tag{13} \]
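The full Bayes factor computation above fits in a few lines of Python (a translation of the slides' numbers, using the analytic marginal likelihood 1/11):

```python
from math import comb

# P(D | M1): binomial likelihood of 9/10 at the point value theta = 0.5
# (the same number as dbinom(9, size = 10, p = 0.5) in R)
lik_m1 = comb(10, 9) * 0.5**9 * (1 - 0.5)

# P(D | M2): analytic marginal likelihood under the Beta(1,1) prior
lik_m2 = 1 / 11

bf12 = lik_m1 / lik_m2
print(round(bf12, 5))      # 0.10742
print(round(1 / bf12, 3))  # 9.309
```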

  22. Introduction: Continuous example. We could conclude that we have some evidence against the guessing model M1 in this case. Jeffreys (n.d.) has suggested the following decision criterion using Bayes factors. Here, we are comparing two models, labeled 1 and 2.

     BF12 > 100: Decisive evidence
     BF12 = 32–100: Very strong
     BF12 = 10–32: Strong
     BF12 = 3–10: Substantial
     BF12 = 2–3: Not worth more than a bare mention
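The decision criterion above is easy to encode; a hypothetical helper (not from the slides) that maps a Bayes factor to the Jeffreys labels listed, for the case BF12 > 1:

```python
# illustrative helper: classify a Bayes factor BF12 (evidence for
# model 1 over model 2) using the Jeffreys thresholds listed above
def jeffreys_label(bf12):
    if bf12 > 100:
        return "Decisive evidence"
    if bf12 > 32:
        return "Very strong"
    if bf12 > 10:
        return "Strong"
    if bf12 > 3:
        return "Substantial"
    return "Not worth more than a bare mention"

# 1/0.10742 = 9.309: the evidence for M2 over M1 is "Substantial"
print(jeffreys_label(9.309))  # Substantial
```

For BF12 < 1 one would classify the reciprocal 1/BF12 as evidence for model 2, as done with 9.309 here.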

  23. Introduction: Prior sensitivity. The Bayes factor is sensitive to the choice of prior. It is therefore important to do a sensitivity analysis with different priors.

  24. Introduction: Prior sensitivity. For the model M2 above, consider the case where we have a prior on θ such that there are 10 possible values for θ, 0.1, 0.2, 0.3, . . . , 1, and the probabilities of each value of θ are 1/10.

```r
theta <- seq(0.1, 1, by = 0.1)
w <- rep(1/10, 10)
prob <- rep(NA, length(w))
for (i in 1:length(theta)) {
  ## prior-weighted likelihood for each value of theta:
  prob[i] <- w[i] * choose(10, 9) * theta[i]^9 * (1 - theta[i])
}
## Marginal likelihood for model M2 with
## the new prior on theta:
sum(prob)
## [1] 0.082871
```
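One way to carry out the sensitivity analysis the slides call for: under a Beta(a, b) prior, the marginal likelihood of binomial data has a closed form (by beta-binomial conjugacy), so we can sweep over priors and watch the Bayes factor change. A Python sketch; the particular (a, b) values swept here are illustrative choices, not from the slides:

```python
from math import comb, gamma

# beta function B(a, b) via the gamma function
def beta_fn(a, b):
    return gamma(a) * gamma(b) / gamma(a + b)

# closed-form marginal likelihood of k successes in n trials under a
# Beta(a, b) prior on theta: P(D) = C(n, k) * B(k + a, n - k + b) / B(a, b)
def marg_lik(a, b, k=9, n=10):
    return comb(n, k) * beta_fn(k + a, n - k + b) / beta_fn(a, b)

# P(D | M1): likelihood at the point value theta = 0.5
lik_m1 = comb(10, 9) * 0.5**10

# BF12 under several priors: flat, mildly informative, sharp around 0.5,
# and skewed toward high theta
for a, b in [(1, 1), (2, 2), (10, 10), (9, 1)]:
    print(a, b, round(lik_m1 / marg_lik(a, b), 3))
```

The flat Beta(1, 1) prior reproduces the marginal likelihood 1/11 and hence BF12 = 0.107 from the continuous example; the other priors give different Bayes factors, illustrating the sensitivity.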
