ABC Using Summary Statistics.

What are sufficient summary statistics? Sufficient summary statistics contain all of the information about a parameter that is available in a sample (e.g. $\bar{X}$ is sufficient for $\mu$). A summary statistic $S(X)$ is sufficient if the likelihood can be written in Fisher–Neyman factorised form:
$$L(X \mid \theta) = g(X)\, h(S(X) \mid \theta).$$
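As a concrete illustration of the factorisation (a standard Normal-sample example, not taken from these slides): for i.i.d. $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ with $\sigma^2$ known,
\begin{align*}
L(X \mid \mu) &= (2\pi\sigma^2)^{-n/2}\exp\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\Big)\\
 &= \underbrace{(2\pi\sigma^2)^{-n/2}\exp\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\bar{X})^2\Big)}_{g(X)}\;\underbrace{\exp\Big(-\tfrac{n}{2\sigma^2}(\bar{X}-\mu)^2\Big)}_{h(S(X)\mid\mu)},
\end{align*}
so the factorisation holds with $S(X) = \bar{X}$, which is why the sample mean is sufficient for $\mu$.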
ABC Using Summary Statistics.

It can be shown that $P(\theta \mid X_{\mathrm{obs}}) = P(\theta \mid S(X_{\mathrm{obs}}))$. That is, we can compare sufficient summary statistics to obtain the exact posterior distribution for $\theta$.
The Modified Rejection-Acceptance Algorithm.

For some distance function $\rho(S(X), S(Y))$ and some 'tolerance' parameter $\epsilon$, the algorithm now becomes:

1: Set i = 0
2: while i < ℓ do
3:   Sample θ* from π(θ)
4:   Simulate X* from f(X | θ*)
5:   if ρ(S(X*), S(X_obs)) < ε then
6:     accept θ*
7:     i = i + 1
8:   end if
9: end while
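A minimal Python sketch of this loop, assuming user-supplied prior sampler, simulator, and summary functions (all names here are hypothetical, not from the slides):

```python
import numpy as np

def abc_rejection(prior_sample, simulate, summarise, x_obs, eps, n_samples):
    """Rejection ABC on summary statistics.

    prior_sample() draws theta* from the prior pi(theta);
    simulate(theta) draws a data set X* from f(X | theta);
    summarise(X) maps a data set to a vector of summary statistics S(X).
    """
    s_obs = summarise(x_obs)
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_sample()                            # theta* ~ pi(theta)
        x_sim = simulate(theta)                           # X* ~ f(X | theta*)
        dist = np.linalg.norm(summarise(x_sim) - s_obs)   # rho(S(X*), S(X_obs))
        if dist < eps:                                    # keep theta* if within tolerance
            accepted.append(theta)
    return np.array(accepted)
```

Here the Euclidean norm plays the role of $\rho$; any other distance on the summary statistics could be substituted.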
ABC Using Summary Statistics.

Gives the same posterior distribution $\hat{P}(\theta \mid S(X_{\mathrm{obs}}))$ if $S(X)$ is sufficient. Again, $\hat{P}(\theta \mid S(X_{\mathrm{obs}})) \to P(\theta \mid X_{\mathrm{obs}})$ as $\epsilon \to 0$. Convergence can now be faster. Unfortunately, sufficient summary statistics are rarely available when required. Choosing a 'best' summary statistic was the focus of my Masters [2].
Approximately Sufficient Summary Statistics.

We have insufficient summary statistics $S = \{S_1, \cdots, S_T\}$. We have parameters of interest $\Phi = \{\phi_1, \cdots, \phi_P\}$. Create $\Gamma$ simulations, which gives $\Gamma \times T$ summary statistics with known input parameters (call this TrainDat). For each $n \in \{1, \cdots, P\}$, perform a linear regression on TrainDat so that we can get predictions
$$\hat{\phi}_n = \hat{\beta}^{(n)}_0 + \sum_{j=1}^{T} \hat{\beta}^{(n)}_j\, s_j.$$
We now have a 'best predicted parameter value' whenever we have a set of summary statistics; see the sketch below.
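A minimal sketch of this regression step, assuming the $\Gamma \times T$ training summaries are stored in an array `S_train` and the known input parameters in a $\Gamma \times P$ array `phi_train` (both names are hypothetical):

```python
import numpy as np

def fit_summary_regressions(S_train, phi_train):
    """Fit one linear regression per parameter: phi_n on the summary statistics.

    Returns a (T + 1) x P coefficient matrix: the first row holds the
    intercepts beta_0^(n), the remaining rows hold the beta_j^(n).
    """
    n_sims = S_train.shape[0]
    X = np.column_stack([np.ones(n_sims), S_train])        # prepend an intercept column
    coefs, *_ = np.linalg.lstsq(X, phi_train, rcond=None)  # least-squares fit for all parameters at once
    return coefs

def predict_parameters(coefs, s_new):
    """Map a vector of summary statistics to its 'best predicted' parameter values."""
    return coefs[0] + np.asarray(s_new) @ coefs[1:]
```

The fitted predictions $\hat{\phi}_n$ can then stand in for the original $T$ summary statistics when computing $\rho$.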
Model Selection in ABC.

How do we choose which model we might wish to simulate data under? Consider models $M = \{M_1, \cdots, M_q\}$. We can add a step which selects which model we simulate under.
The Very Modified Rejection-Acceptance Algorithm.

Let $R(M_k)$ be the prior probability of Model $k$, and $\pi_k(\theta)$ be the prior distribution for the parameters under Model $k$. Consider obtaining $\ell$ posterior samples from a possible $q$ models using some observed data $X_{\mathrm{obs}}$:

1: Set i = 0
2: while i < ℓ do
3:   Randomly select some model k to simulate via R(·)
4:   Sample θ* from π_k(θ)
5:   Simulate X* from f_k(X | θ*)
6:   if ρ(S(X*), S(X_obs)) < ε then
7:     accept θ* (recording model k)
8:     i = i + 1
9:   end if
10: end while
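A minimal Python sketch of this model-selection loop, extending the hypothetical rejection sampler above (again, all function names are assumptions for illustration):

```python
import numpy as np

def abc_model_selection(model_priors, prior_samplers, simulators, summarise,
                        x_obs, eps, n_samples):
    """Rejection ABC with a model-selection step.

    model_priors is a length-q vector of probabilities R(M_k);
    prior_samplers[k]() draws theta* from pi_k(theta);
    simulators[k](theta) draws X* from f_k(X | theta).
    Returns (model index, theta*) pairs for the accepted draws.
    """
    s_obs = summarise(x_obs)
    accepted = []
    while len(accepted) < n_samples:
        k = np.random.choice(len(model_priors), p=model_priors)  # pick a model via R(.)
        theta = prior_samplers[k]()                               # theta* ~ pi_k(theta)
        x_sim = simulators[k](theta)                              # X* ~ f_k(X | theta*)
        if np.linalg.norm(summarise(x_sim) - s_obs) < eps:
            accepted.append((k, theta))                           # record the model label with theta*
    return accepted
```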
Model Selection in ABC.

How can we choose which $M_i$ best fits our data? A common approach is to use 'Bayes Factors' $B_{ij}$, for $i \neq j \in \{1, \cdots, q\}$.
Bayes Factors.

The Bayes Factor for Models $i$ and $j$ is:
$$B_{ij} = \frac{P(X \mid M_i)}{P(X \mid M_j)} = \frac{P(M_i \mid X)\, P(X)/R(M_i)}{P(M_j \mid X)\, P(X)/R(M_j)} = \frac{P(M_i \mid X)}{P(M_j \mid X)},$$
where the last equality holds if $R(\cdot)$ is a uniform distribution over the models.
Bayes Factors.

The Bayes Factor for Models $i$ and $j$ is
$$B_{ij} = \frac{P(M_i \mid X)}{P(M_j \mid X)}.$$
This is just the 'posterior ratio' for Models $i$ and $j$. Imagine that, out of 300 retained posterior parameter samples, 200 are from Model $i$ and 100 are from Model $j$:
$$B_{ij} = \frac{200/300}{100/300} = 2.$$
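Continuing the hypothetical sketch above, this estimate of $B_{ij}$ falls straight out of the retained model labels:

```python
import numpy as np

def estimate_bayes_factor(accepted_models, i, j):
    """Estimate B_ij as the ratio of posterior model probabilities,
    approximated by the acceptance counts for models i and j."""
    labels = np.asarray(accepted_models)
    return np.sum(labels == i) / np.sum(labels == j)

# Toy numbers matching the slide: 200 retained draws from model i (label 0),
# 100 from model j (label 1).
labels = [0] * 200 + [1] * 100
print(estimate_bayes_factor(labels, 0, 1))  # 2.0
```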
A Fundamental Flaw of Bayes Factors.

It can be shown that [3]:
$$B_{ij} = \frac{P(M_i \mid X)}{P(M_j \mid X)} \times \frac{h_j(X \mid S(X))}{h_i(X \mid S(X))},$$
and so
$$B_{ij} = \frac{P(M_i \mid X)}{P(M_j \mid X)} \iff h_j(X \mid S(X)) = h_i(X \mid S(X)).$$
That is, $B_{ij}$ will be biased unless the probability of seeing the data, given the observed summary statistics, is equal under each model.
Post-Hoc Model Comparison.

Consider other problems with $B_{ij}$ (and with any post-hoc model comparison method). Posterior distributions are sensitive to the choice of prior distributions. A particularly poor choice of $\pi_j(\theta)$ may reduce the number of retained simulations under Model $j$, and hence inflate $B_{ij}$.
Post-Hoc Model Comparison.

We would like a model selection algorithm that avoids comparing posterior distributions. Given that our 'semi-automatic summary selection' version of ABC is an example of 'supervised learning', we could consider a similar method for model selection.
Multiple Logistic Regression.

Let $X$ be our data (the collection of $\Gamma \times T$ summary statistics). Let $x_m = (s^m_1, \cdots, s^m_T)$ be the $m$th row of $X$ (the summary statistics from the $m$th simulation). Let $Y_m$ be the category of the $m$th observation (the model used for the $m$th simulation). Let $\beta_c = (\beta^c_0, \cdots, \beta^c_T)$ be the vector of coefficients for category $c$. We aim to best fit the model
$$\ln\!\left(\frac{P(Y_m = c \mid X)}{P(Y_m = q \mid X)}\right) = \beta^c_0 + \sum_{j=1}^{T}\beta^c_j\, s^m_j,$$
for $c = 1, \cdots, q - 1$ (with Model $q$ as the reference category).
Multiple Logistic Regression.

We end up with a predictive model, so that for new data $X_{\mathrm{NEW}}$ we can predict
$$P(Y = c \mid X_{\mathrm{NEW}}) = p_c \quad \text{for each } c \in \{1, \cdots, q\}, \quad \text{such that } \sum_{i=1}^{q} p_i = 1.$$
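A minimal sketch of this classification step, assuming the $\Gamma \times T$ training summaries `S_train`, their model labels `model_labels`, and the observed summaries `s_obs` (all hypothetical names), using scikit-learn's logistic regression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_model_classifier(S_train, model_labels):
    """Fit a logistic regression of model label on summary statistics;
    with more than two labels this becomes a multinomial fit."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(S_train, model_labels)
    return clf

def model_probabilities(clf, s_obs):
    """Return p_c = P(Y = c | s_obs) for each candidate model c; the p_c sum to 1."""
    return clf.predict_proba(np.atleast_2d(s_obs))[0]
```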
Multiple Logistic Regression Example.

Consider two opposing models of population dynamics:

[Figure: effective population size $N_e(t)$ against generations before present $t$ (0 to 16,000), for the Bottleneck and Exponential models.]
Multiple Logistic Regression Example.

The Bottleneck Model: a sudden reduction to between 20% and 40% of the effective population size occurs before the species dies out. The Exponential Model: there was no sudden population-size reduction; the species simply died out (relatively) slowly over 3,000 generations.
Multiple Logistic Regression Example.

However, we don't know which model fits our data best. If the data came from the Bottleneck Model, my prior belief is that
$$N(16000) = 150{,}000, \quad N(15500) \sim U(30{,}000,\, 75{,}000), \quad N(12000) \sim U(300,\, 12{,}500).$$
If the data came from the Exponential Model, my prior belief is that
$$N(16000) = 150{,}000, \quad N(15500) = 150{,}000, \quad N(12000) \sim U(300,\, 7{,}500).$$
Multiple Logistic Regression Example.

I produced training data of this form with only 10,000 simulations (5,000 from each model, ≈ 2 minutes) and fit the MLR (call this trainDat).
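A rough sketch of how such training draws could be generated from the stated priors (the variable names are illustrative assumptions, and the population simulator that turns each draw into summary statistics is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_prior_parameters(model):
    """Draw (N(16000), N(15500), N(12000)) under the stated prior for each model."""
    if model == "bottleneck":
        return 150_000, rng.uniform(30_000, 75_000), rng.uniform(300, 12_500)
    else:  # exponential decline: no sudden reduction at generation 15500
        return 150_000, 150_000, rng.uniform(300, 7_500)

# 5,000 parameter draws per model; each draw would then be passed to the
# population simulator (not shown) to produce one row of trainDat.
training_params = [(m, draw_prior_parameters(m))
                   for m in ("bottleneck", "exponential")
                   for _ in range(5_000)]
```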