
Data-driven Model Selection for Approximate Bayesian Computation via Multiple Logistic Regression
Ben Rohrlach, Prof. Nigel Bean, Dr Jonathan Tuke
University of Adelaide, November 6, 2014


  1. ABC Using Summary Statistics. What are sufficient summary statistics? Sufficient summary statistics contain all of the information about a parameter that is available in a sample (e.g. $\bar{X}$ is sufficient for $\mu$). A summary statistic $S(X)$ is sufficient if the likelihood can be written in Fisher–Neyman factorised form: $L(X \mid \theta) = g(X)\, h(S(X) \mid \theta)$.
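
As a worked example of this factorisation (not from the slides), assume $X_1, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$ with $\sigma^2$ known; then $\bar{X}$ is sufficient for $\mu$:

```latex
% Fisher–Neyman factorisation showing that S(X) = \bar{X} is sufficient for \mu,
% assuming X_1, ..., X_n iid N(\mu, \sigma^2) with \sigma^2 known.
\begin{align*}
L(X \mid \mu)
  &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{(X_i - \mu)^2}{2\sigma^2}\right) \\
  &= \underbrace{(2\pi\sigma^2)^{-n/2}
     \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i - \bar{X})^2\right)}_{g(X)}
     \,\underbrace{\exp\!\left(-\frac{n(\bar{X} - \mu)^2}{2\sigma^2}\right)}_{h(S(X) \mid \mu)}
\end{align*}
```

The data enter the $\mu$-dependent factor only through $S(X) = \bar{X}$, which is exactly the factorised form above.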

  2. ABC Using Summary Statistics. It can be shown that $P(\theta \mid X_{obs}) = P(\theta \mid S(X_{obs}))$.

  3. ABC Using Summary Statistics. It can be shown that $P(\theta \mid X_{obs}) = P(\theta \mid S(X_{obs}))$. That is, we can compare sufficient summary statistics to obtain the exact posterior distribution for $\theta$.

  4. The Modified Rejection-Acceptance Algorithm. For some distance function $\rho(S(X), S(Y))$ and some 'tolerance' parameter $\epsilon$, the algorithm now becomes:
     1: Set i = 0
     2: while i < ℓ do
     3:   Sample θ* from π(θ)
     4:   Simulate X* from f(X | θ*)
     5:   if ρ(S(X*), S(X_obs)) < ε then
     6:     accept θ*
     7:     i = i + 1
     8:   end if
     9: end while
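
A minimal Python sketch of this rejection step, assuming user-supplied `prior_sample`, `simulate` and `summarise` functions (hypothetical placeholders; the slides themselves give only pseudocode):

```python
import numpy as np

def abc_rejection(x_obs, prior_sample, simulate, summarise, eps, n_keep):
    """Rejection ABC: retain n_keep draws theta* whose simulated summary
    statistics fall within distance eps of the observed summaries."""
    s_obs = np.asarray(summarise(x_obs))
    accepted = []
    while len(accepted) < n_keep:               # while i < l do
        theta_star = prior_sample()             # sample theta* from pi(theta)
        x_star = simulate(theta_star)           # simulate X* from f(X | theta*)
        s_star = np.asarray(summarise(x_star))
        # Euclidean distance stands in for rho(S(X*), S(X_obs)).
        if np.linalg.norm(s_star - s_obs) < eps:
            accepted.append(theta_star)         # accept theta*
    return np.asarray(accepted)
```

Smaller values of eps give retained draws closer to the exact posterior, at the cost of a lower acceptance rate.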

  5. ABC Using Summary Statistics. This gives the same posterior distribution $\hat{P}(\theta \mid S(X_{obs}))$ if $S(X)$ is sufficient.

  6. ABC Using Summary Statistics. This gives the same posterior distribution $\hat{P}(\theta \mid S(X_{obs}))$ if $S(X)$ is sufficient. Again, $\hat{P}(\theta \mid S(X_{obs})) \to P(\theta \mid X_{obs})$ as $\epsilon \to 0$.

  7. ABC Using Summary Statistics. This gives the same posterior distribution $\hat{P}(\theta \mid S(X_{obs}))$ if $S(X)$ is sufficient. Again, $\hat{P}(\theta \mid S(X_{obs})) \to P(\theta \mid X_{obs})$ as $\epsilon \to 0$. Convergence can now be faster.

  8. ABC Using Summary Statistics. This gives the same posterior distribution $\hat{P}(\theta \mid S(X_{obs}))$ if $S(X)$ is sufficient. Again, $\hat{P}(\theta \mid S(X_{obs})) \to P(\theta \mid X_{obs})$ as $\epsilon \to 0$. Convergence can now be faster. Sufficient summary statistics are rarely available when required.

  9. ABC Using Summary Statistics. This gives the same posterior distribution $\hat{P}(\theta \mid S(X_{obs}))$ if $S(X)$ is sufficient. Again, $\hat{P}(\theta \mid S(X_{obs})) \to P(\theta \mid X_{obs})$ as $\epsilon \to 0$. Convergence can now be faster. Sufficient summary statistics are rarely available when required. Choosing a 'best summary statistic' was the focus of my Masters [2].

  10. Approximately Sufficient Summary Statistics. We have insufficient summary statistics $S = \{S_1, \cdots, S_T\}$.

  11. Approximately Sufficient Summary Statistics. We have insufficient summary statistics $S = \{S_1, \cdots, S_T\}$. We have parameters of interest $\Phi = \{\phi_1, \cdots, \phi_P\}$.

  12. Approximately Sufficient Summary Statistics. We have insufficient summary statistics $S = \{S_1, \cdots, S_T\}$. We have parameters of interest $\Phi = \{\phi_1, \cdots, \phi_P\}$. Create Γ simulations, which gives Γ × T summary statistics with known input parameters (call this TrainDat).

  13. Approximately Sufficient Summary Statistics. We have insufficient summary statistics $S = \{S_1, \cdots, S_T\}$. We have parameters of interest $\Phi = \{\phi_1, \cdots, \phi_P\}$. Create Γ simulations, which gives Γ × T summary statistics with known input parameters (call this TrainDat). For each $n \in \{1, \cdots, P\}$, perform linear regression on TrainDat so that we can get predictions $\hat{\phi}_n = \hat{\beta}^{(n)}_0 + \sum_{j=1}^{T} \hat{\beta}^{(n)}_j s_j$.

  14. Approximately Sufficient Summary Statistics. We have insufficient summary statistics $S = \{S_1, \cdots, S_T\}$. We have parameters of interest $\Phi = \{\phi_1, \cdots, \phi_P\}$. Create Γ simulations, which gives Γ × T summary statistics with known input parameters (call this TrainDat). For each $n \in \{1, \cdots, P\}$, perform linear regression on TrainDat so that we can get predictions $\hat{\phi}_n = \hat{\beta}^{(n)}_0 + \sum_{j=1}^{T} \hat{\beta}^{(n)}_j s_j$. We now have a 'best predicted parameter value' whenever we have a set of summary statistics.
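
An illustrative Python sketch of this per-parameter regression step (the use of scikit-learn and the array layout are assumptions, not from the talk):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_summary_regressions(train_stats, train_params):
    """Fit one linear regression per parameter phi_n on the Gamma x T training
    summary statistics, i.e. phi_hat_n = beta0 + sum_j beta_j * s_j."""
    return [LinearRegression().fit(train_stats, train_params[:, n])
            for n in range(train_params.shape[1])]

def predict_parameters(models, stats):
    """'Best predicted parameter values' for one or more summary-statistic vectors."""
    stats = np.atleast_2d(stats)
    return np.column_stack([m.predict(stats) for m in models])
```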

  15. Model Selection in ABC. How do we choose which model we might wish to simulate data under?

  16. Model Selection in ABC. Consider models $M = \{M_1, \cdots, M_q\}$.

  17. Model Selection in ABC. Consider models $M = \{M_1, \cdots, M_q\}$. We can add a step which selects which model we simulate under.

  18. The Very Modified Rejection-Acceptance Algorithm. Let $R(M_k)$ be the prior probability of Model k, and $\pi_k(\theta)$ be the prior distribution for the parameters under Model k.

  19. The Very Modified Rejection-Acceptance Algorithm. Let $R(M_k)$ be the prior probability of Model k, and $\pi_k(\theta)$ be the prior distribution for the parameters under Model k. Consider obtaining $\ell$ posterior samples from a possible q models, using some observed data $X_{obs}$:

  20. The Very Modified Rejection-Acceptance Algorithm. Let $R(M_k)$ be the prior probability of Model k, and $\pi_k(\theta)$ be the prior distribution for the parameters under Model k. Consider obtaining $\ell$ posterior samples from a possible q models, using some observed data $X_{obs}$:
     1: Set i = 0
     2: while i < ℓ do
     10: end while

  21. The Very Modified Rejection-Acceptance Algorithm. Let $R(M_k)$ be the prior probability of Model k, and $\pi_k(\theta)$ be the prior distribution for the parameters under Model k. Consider obtaining $\ell$ posterior samples from a possible q models, using some observed data $X_{obs}$:
     1: Set i = 0
     2: while i < ℓ do
     3:   Randomly select some model k to simulate via R(·)
     10: end while

  22. The Very Modified Rejection-Acceptance Algorithm. Let $R(M_k)$ be the prior probability of Model k, and $\pi_k(\theta)$ be the prior distribution for the parameters under Model k. Consider obtaining $\ell$ posterior samples from a possible q models, using some observed data $X_{obs}$:
     1: Set i = 0
     2: while i < ℓ do
     3:   Randomly select some model k to simulate via R(·)
     4:   Sample θ* from π_k(θ)
     10: end while

  23. The Very Modified Rejection-Acceptance Algorithm. Let $R(M_k)$ be the prior probability of Model k, and $\pi_k(\theta)$ be the prior distribution for the parameters under Model k. Consider obtaining $\ell$ posterior samples from a possible q models, using some observed data $X_{obs}$:
     1: Set i = 0
     2: while i < ℓ do
     3:   Randomly select some model k to simulate via R(·)
     4:   Sample θ* from π_k(θ)
     5:   Simulate X* from f_k(X | θ*)
     6:   if ρ(S(X*), S(X_obs)) < ε then
     7:     accept θ*
     8:     i = i + 1
     9:   end if
     10: end while
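
A minimal Python sketch of this model-choice version, assuming per-model sampling and simulation callables and a vector of model probabilities R (names are illustrative, not from the talk):

```python
import numpy as np

def abc_model_choice(x_obs, model_probs, prior_samples, simulators,
                     summarise, eps, n_keep, rng=None):
    """Rejection ABC with a model-selection step: each iteration first picks a
    model k with probability R(M_k), then samples theta* from pi_k and keeps
    (k, theta*) if the simulated summaries land within eps of the observed ones."""
    rng = rng or np.random.default_rng()
    s_obs = np.asarray(summarise(x_obs))
    kept = []
    while len(kept) < n_keep:
        k = rng.choice(len(model_probs), p=model_probs)    # model k ~ R(.)
        theta_star = prior_samples[k]()                    # theta* ~ pi_k(theta)
        x_star = simulators[k](theta_star)                 # X* ~ f_k(X | theta*)
        s_star = np.asarray(summarise(x_star))
        if np.linalg.norm(s_star - s_obs) < eps:           # rho(S(X*), S(X_obs)) < eps
            kept.append((k, theta_star))
    return kept
```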

  24. Model Selection in ABC. How can we choose which $M_i$ best fits our data?

  25. Model Selection in ABC. How can we choose which $M_i$ best fits our data? A common approach is to use 'Bayes Factors' $B_{ij}$, $i \neq j \in \{1, \cdots, q\}$.

  26. Bayes Factors. The Bayes Factor for Models i and j is:

  27. Bayes Factors. The Bayes Factor for Models i and j is: $B_{ij} = \dfrac{P(X \mid M_i)}{P(X \mid M_j)}$

  28. Bayes Factors. The Bayes Factor for Models i and j is: $B_{ij} = \dfrac{P(X \mid M_i)}{P(X \mid M_j)} = \dfrac{P(M_i \mid X)\, P(X) / R(M_i)}{P(M_j \mid X)\, P(X) / R(M_j)}$

  29. Bayes Factors. The Bayes Factor for Models i and j is: $B_{ij} = \dfrac{P(X \mid M_i)}{P(X \mid M_j)} = \dfrac{P(M_i \mid X)\, P(X) / R(M_i)}{P(M_j \mid X)\, P(X) / R(M_j)} = \dfrac{P(M_i \mid X)}{P(M_j \mid X)}$, if $R(\cdot)$ has a uniform distribution.

  30. Bayes Factors. The Bayes Factor for Models i and j is $B_{ij} = \dfrac{P(M_i \mid X)}{P(M_j \mid X)}$.

  31. Bayes Factors. The Bayes Factor for Models i and j is $B_{ij} = \dfrac{P(M_i \mid X)}{P(M_j \mid X)}$. This is just the 'posterior ratio' for Models i and j.

  32. Bayes Factors. The Bayes Factor for Models i and j is $B_{ij} = \dfrac{P(M_i \mid X)}{P(M_j \mid X)}$. This is just the 'posterior ratio' for Models i and j. Imagine that, out of 300 retained posterior parameter samples, 200 are from Model i and 100 are from Model j $\Rightarrow B_{ij}$

  33. Bayes Factors. The Bayes Factor for Models i and j is $B_{ij} = \dfrac{P(M_i \mid X)}{P(M_j \mid X)}$. This is just the 'posterior ratio' for Models i and j. Imagine that, out of 300 retained posterior parameter samples, 200 are from Model i and 100 are from Model j $\Rightarrow B_{ij} = \dfrac{200/300}{100/300}$

  34. Bayes Factors. The Bayes Factor for Models i and j is $B_{ij} = \dfrac{P(M_i \mid X)}{P(M_j \mid X)}$. This is just the 'posterior ratio' for Models i and j. Imagine that, out of 300 retained posterior parameter samples, 200 are from Model i and 100 are from Model j $\Rightarrow B_{ij} = \dfrac{200/300}{100/300} = 2$.
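
In code, this count-based estimate is one line; a small Python illustration (reusing the model labels retained by a sampler such as the `abc_model_choice` sketch above):

```python
import numpy as np

def bayes_factor_estimate(kept_models, i, j):
    """Estimate B_ij as the ratio of acceptance counts for Models i and j,
    i.e. the posterior ratio when R(.) is uniform over models."""
    kept_models = np.asarray(kept_models)
    return np.sum(kept_models == i) / np.sum(kept_models == j)

# e.g. 200 retained samples from Model i and 100 from Model j give B_ij = 2.0
```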

  35. A Fundamental Flaw of Bayes Factors. It can be shown that [3]: $B_{ij} = \dfrac{P(M_i \mid X)}{P(M_j \mid X)} \times \dfrac{h_j(X \mid S(X))}{h_i(X \mid S(X))}$

  36. A Fundamental Flaw of Bayes Factors. It can be shown that [3]: $B_{ij} = \dfrac{P(M_i \mid X)}{P(M_j \mid X)} \times \dfrac{h_j(X \mid S(X))}{h_i(X \mid S(X))} = \dfrac{P(M_i \mid X)}{P(M_j \mid X)}$

  37. A Fundamental Flaw of Bayes Factors. It can be shown that [3]: $B_{ij} = \dfrac{P(M_i \mid X)}{P(M_j \mid X)} \times \dfrac{h_j(X \mid S(X))}{h_i(X \mid S(X))} = \dfrac{P(M_i \mid X)}{P(M_j \mid X)} \iff h_j(X \mid S(X)) = h_i(X \mid S(X))$

  38. A Fundamental Flaw of Bayes Factors. It can be shown that [3]: $B_{ij} = \dfrac{P(M_i \mid X)}{P(M_j \mid X)} \times \dfrac{h_j(X \mid S(X))}{h_i(X \mid S(X))} = \dfrac{P(M_i \mid X)}{P(M_j \mid X)} \iff h_j(X \mid S(X)) = h_i(X \mid S(X))$. That is, $B_{ij}$ will be biased unless the probability of seeing the data, given the observed summary statistics, is equal for each model.

  39. Post-Hoc Model Comparison. Consider other problems with $B_{ij}$ (and any post-hoc model comparison method).

  40. Post-Hoc Model Comparison. Consider other problems with $B_{ij}$ (and any post-hoc model comparison method). Posterior distributions are sensitive to choices of prior distributions.

  41. Post-Hoc Model Comparison. Consider other problems with $B_{ij}$ (and any post-hoc model comparison method). Posterior distributions are sensitive to choices of prior distributions. A particularly poor choice of $\pi_j(\theta)$ may reduce the number of retained simulations under Model j, and hence inflate $B_{ij}$.

  42. Post-Hoc Model Comparison. We would like a model selection algorithm that avoids comparing posterior distributions.

  43. Post-Hoc Model Comparison. We would like a model selection algorithm that avoids comparing posterior distributions. Given that our 'semi-automatic summary selection' version of ABC is an example of supervised learning, we could consider a similar method for model selection.

  44. Multiple Logistic Regression. Let X be our data (the collection of Γ × T summary statistics).

  45. Multiple Logistic Regression. Let X be our data (the collection of Γ × T summary statistics). Let $x_m = (s^m_1, \cdots, s^m_T)$ be the m-th row of X (the summary statistics from the m-th simulation).

  46. Multiple Logistic Regression. Let X be our data (the collection of Γ × T summary statistics). Let $x_m = (s^m_1, \cdots, s^m_T)$ be the m-th row of X (the summary statistics from the m-th simulation). Let $Y_m$ be the category of the m-th observation (the model used for the m-th simulation).

  47. Multiple Logistic Regression. Let X be our data (the collection of Γ × T summary statistics). Let $x_m = (s^m_1, \cdots, s^m_T)$ be the m-th row of X (the summary statistics from the m-th simulation). Let $Y_m$ be the category of the m-th observation (the model used for the m-th simulation). Let $\beta^c = (\beta^c_0, \cdots, \beta^c_T)$ be the vector of coefficients for category c.

  48. Multiple Logistic Regression. Let X be our data (the collection of Γ × T summary statistics). Let $x_m = (s^m_1, \cdots, s^m_T)$ be the m-th row of X (the summary statistics from the m-th simulation). Let $Y_m$ be the category of the m-th observation (the model used for the m-th simulation). Let $\beta^c = (\beta^c_0, \cdots, \beta^c_T)$ be the vector of coefficients for category c. We aim to best fit the model $\ln\!\left(\dfrac{P(Y_m = c \mid X)}{P(Y_m = q \mid X)}\right) = \beta^c \cdot x_m$, for $c = 1, \cdots, q - 1$.
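
As a brief gap-filler, inverting these $q - 1$ log-odds equations gives the usual softmax-style category probabilities (with Model q as the reference category):

```latex
% Category probabilities implied by the multiple logistic regression model,
% taking category q as the baseline.
P(Y_m = c \mid x_m) = \frac{\exp(\beta^c \cdot x_m)}{1 + \sum_{c'=1}^{q-1} \exp(\beta^{c'} \cdot x_m)},
\quad c = 1, \ldots, q - 1,
\qquad
P(Y_m = q \mid x_m) = \frac{1}{1 + \sum_{c'=1}^{q-1} \exp(\beta^{c'} \cdot x_m)}.
```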

  49. Multiple Logistic Regression. We end up with a predictive model such that, for new data $X_{NEW}$, we can predict $P(Y_m = c \mid X_{NEW}) = p_c$ for each $c \in \{1, \cdots, q\}$, such that $\sum_{i=1}^{q} p_i = 1$.
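
An illustrative Python sketch of fitting and using such a classifier (scikit-learn is an assumption; the talk does not name a package):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_model_classifier(train_stats, train_models):
    """Multiple logistic regression of the model label on the Gamma x T
    training summary statistics. With more than two classes, the default
    'lbfgs' solver fits a full multinomial (softmax) model."""
    return LogisticRegression(max_iter=1000).fit(train_stats, train_models)

def model_probabilities(clf, s_obs):
    """Predicted p_c = P(Y = c | X_NEW) for the observed summary statistics;
    the q probabilities sum to one."""
    return clf.predict_proba(np.atleast_2d(s_obs))[0]
```

The vector returned by `model_probabilities` is what is then used to judge which candidate model best fits the observed data.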

  50. Multiple Logistic Regression Example. Consider two opposing models of population dynamics: [Figure: effective population size $N_e(t)$ against generations before present (t = 0 to 16,000), under the Bottleneck and Exponential models.]

  51. Multiple Logistic Regression Example. The Bottleneck Model: a sudden reduction to between 20% and 40% of the effective population size occurs before the species dies out. The Exponential Model: there was no sudden population-size reduction; the species just died out (relatively) slowly over 3000 generations.

  52. Multiple Logistic Regression Example. However, we don't know which model fits our data best. If the data came from the Bottleneck Model, my prior belief is that: N(16000) = 150,000, N(15500) ~ U(30,000, 75,000) and N(12000) ~ U(300, 12,500). If the data came from the Exponential Model, my prior belief is that: N(16000) = 150,000, N(15500) = 150,000 and N(12000) ~ U(300, 7,500).
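
A hedged Python sketch of drawing from these two priors (producing the actual training data would additionally require a demographic simulator to turn each draw into summary statistics, which is not shown here):

```python
import numpy as np

def draw_prior(model, rng=None):
    """Draw effective population sizes N(16000), N(15500) and N(12000)
    (generations before present) under the stated prior for each model."""
    rng = rng or np.random.default_rng()
    if model == "bottleneck":
        return {16000: 150_000,
                15500: rng.uniform(30_000, 75_000),
                12000: rng.uniform(300, 12_500)}
    if model == "exponential":
        return {16000: 150_000,
                15500: 150_000,
                12000: rng.uniform(300, 7_500)}
    raise ValueError(f"unknown model: {model!r}")
```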

  53. Multiple Logistic Regression Example. I produced training data of this form with only 10,000 simulations (5,000 from each model, taking ≈ 2 minutes), and fit the MLR (call this trainDat).
