ABC Using Summary Statistics.

What are sufficient summary statistics? Sufficient summary statistics contain all of the information about a parameter that is available in a sample (e.g. $\bar{X}$ is sufficient for $\mu$). A summary statistic $S(X)$ is sufficient if the likelihood can be written in Fisher–Neyman factorised form:
$$L(X \mid \theta) = g(X)\, h(S(X) \mid \theta).$$
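As a concrete illustration of the factorisation (a standard Normal-sample example, not taken from these slides): for i.i.d. $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ with $\sigma^2$ known,
\begin{align*}
L(X \mid \mu) &= (2\pi\sigma^2)^{-n/2}\exp\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\Big)\\
 &= \underbrace{(2\pi\sigma^2)^{-n/2}\exp\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\bar{X})^2\Big)}_{g(X)}\;\underbrace{\exp\Big(-\tfrac{n}{2\sigma^2}(\bar{X}-\mu)^2\Big)}_{h(S(X)\mid\mu)},
\end{align*}
so the factorisation holds with $S(X) = \bar{X}$, which is why the sample mean is sufficient for $\mu$.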
ABC Using Summary Statistics.

It can be shown that $P(\theta \mid X_{\mathrm{obs}}) = P(\theta \mid S(X_{\mathrm{obs}}))$. That is, we can compare sufficient summary statistics to obtain the exact posterior distribution for $\theta$.
The Modified Rejection-Acceptance Algorithm.

For some distance function $\rho(S(X), S(Y))$ and some 'tolerance' parameter $\epsilon$, the algorithm now becomes:

1: Set i = 0
2: while i < ℓ do
3:   Sample θ* from π(θ)
4:   Simulate X* from f(X | θ*)
5:   if ρ(S(X*), S(X_obs)) < ε then
6:     accept θ*
7:     i = i + 1
8:   end if
9: end while
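A minimal Python sketch of this loop, assuming user-supplied prior sampler, simulator, and summary functions (all names here are hypothetical, not from the slides):

```python
import numpy as np

def abc_rejection(prior_sample, simulate, summarise, x_obs, eps, n_samples):
    """Rejection ABC on summary statistics.

    prior_sample() draws theta* from the prior pi(theta);
    simulate(theta) draws a data set X* from f(X | theta);
    summarise(X) maps a data set to a vector of summary statistics S(X).
    """
    s_obs = summarise(x_obs)
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_sample()                            # theta* ~ pi(theta)
        x_sim = simulate(theta)                           # X* ~ f(X | theta*)
        dist = np.linalg.norm(summarise(x_sim) - s_obs)   # rho(S(X*), S(X_obs))
        if dist < eps:                                    # keep theta* if within tolerance
            accepted.append(theta)
    return np.array(accepted)
```

Here the Euclidean norm plays the role of $\rho$; any other distance on the summary statistics could be substituted.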
ABC Using Summary Statistics.

Gives the same posterior distribution $\hat{P}(\theta \mid S(X_{\mathrm{obs}}))$ if $S(X)$ is sufficient. Again, $\hat{P}(\theta \mid S(X_{\mathrm{obs}})) \to P(\theta \mid X_{\mathrm{obs}})$ as $\epsilon \to 0$. Convergence can now be faster. Unfortunately, sufficient summary statistics are rarely available when required. Choosing a 'best' summary statistic was the focus of my Masters [2].
Approximately Sufficient Summary Statistics.

We have insufficient summary statistics $S = \{S_1, \cdots, S_T\}$. We have parameters of interest $\Phi = \{\phi_1, \cdots, \phi_P\}$. Create $\Gamma$ simulations, which gives $\Gamma \times T$ summary statistics with known input parameters (call this TrainDat). For each $n \in \{1, \cdots, P\}$, perform a linear regression on TrainDat so that we can get predictions
$$\hat{\phi}_n = \hat{\beta}^{(n)}_0 + \sum_{j=1}^{T} \hat{\beta}^{(n)}_j\, s_j.$$
We now have a 'best predicted parameter value' whenever we have a set of summary statistics; see the sketch below.
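A minimal sketch of this regression step, assuming the $\Gamma \times T$ training summaries are stored in an array `S_train` and the known input parameters in a $\Gamma \times P$ array `phi_train` (both names are hypothetical):

```python
import numpy as np

def fit_summary_regressions(S_train, phi_train):
    """Fit one linear regression per parameter: phi_n on the summary statistics.

    Returns a (T + 1) x P coefficient matrix: the first row holds the
    intercepts beta_0^(n), the remaining rows hold the beta_j^(n).
    """
    n_sims = S_train.shape[0]
    X = np.column_stack([np.ones(n_sims), S_train])        # prepend an intercept column
    coefs, *_ = np.linalg.lstsq(X, phi_train, rcond=None)  # least-squares fit for all parameters at once
    return coefs

def predict_parameters(coefs, s_new):
    """Map a vector of summary statistics to its 'best predicted' parameter values."""
    return coefs[0] + np.asarray(s_new) @ coefs[1:]
```

The fitted predictions $\hat{\phi}_n$ can then stand in for the original $T$ summary statistics when computing $\rho$.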
Model Selection in ABC.

How do we choose which model we might wish to simulate data under? Consider models $M = \{M_1, \cdots, M_q\}$. We can add a step which selects which model we simulate under.
The Very Modified Rejection-Acceptance Algorithm.

Let $R(M_k)$ be the prior probability of Model $k$, and $\pi_k(\theta)$ be the prior distribution for the parameters under Model $k$. Consider obtaining $\ell$ posterior samples from a possible $q$ models using some observed data $X_{\mathrm{obs}}$:

1: Set i = 0
2: while i < ℓ do
3:   Randomly select some model k to simulate via R(·)
4:   Sample θ* from π_k(θ)
5:   Simulate X* from f_k(X | θ*)
6:   if ρ(S(X*), S(X_obs)) < ε then
7:     accept θ* (recording model k)
8:     i = i + 1
9:   end if
10: end while
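A minimal Python sketch of this model-selection loop, extending the hypothetical rejection sampler above (again, all function names are assumptions for illustration):

```python
import numpy as np

def abc_model_selection(model_priors, prior_samplers, simulators, summarise,
                        x_obs, eps, n_samples):
    """Rejection ABC with a model-selection step.

    model_priors is a length-q vector of probabilities R(M_k);
    prior_samplers[k]() draws theta* from pi_k(theta);
    simulators[k](theta) draws X* from f_k(X | theta).
    Returns (model index, theta*) pairs for the accepted draws.
    """
    s_obs = summarise(x_obs)
    accepted = []
    while len(accepted) < n_samples:
        k = np.random.choice(len(model_priors), p=model_priors)  # pick a model via R(.)
        theta = prior_samplers[k]()                               # theta* ~ pi_k(theta)
        x_sim = simulators[k](theta)                              # X* ~ f_k(X | theta*)
        if np.linalg.norm(summarise(x_sim) - s_obs) < eps:
            accepted.append((k, theta))                           # record the model label with theta*
    return accepted
```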
Model Selection in ABC.

How can we choose which $M_i$ best fits our data? A common approach is to use 'Bayes Factors' $B_{ij}$, for $i \neq j \in \{1, \cdots, q\}$.
Bayes Factors.

The Bayes Factor for Models $i$ and $j$ is:
$$B_{ij} = \frac{P(X \mid M_i)}{P(X \mid M_j)} = \frac{P(M_i \mid X)\, P(X)/R(M_i)}{P(M_j \mid X)\, P(X)/R(M_j)} = \frac{P(M_i \mid X)}{P(M_j \mid X)},$$
where the last equality holds if $R(\cdot)$ is a uniform distribution over the models.
Bayes Factors.

The Bayes Factor for Models $i$ and $j$ is
$$B_{ij} = \frac{P(M_i \mid X)}{P(M_j \mid X)}.$$
This is just the 'posterior ratio' for Models $i$ and $j$. Imagine that, out of 300 retained posterior parameter samples, 200 are from Model $i$ and 100 are from Model $j$:
$$B_{ij} = \frac{200/300}{100/300} = 2.$$
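Continuing the hypothetical sketch above, this estimate of $B_{ij}$ falls straight out of the retained model labels:

```python
import numpy as np

def estimate_bayes_factor(accepted_models, i, j):
    """Estimate B_ij as the ratio of posterior model probabilities,
    approximated by the acceptance counts for models i and j."""
    labels = np.asarray(accepted_models)
    return np.sum(labels == i) / np.sum(labels == j)

# Toy numbers matching the slide: 200 retained draws from model i (label 0),
# 100 from model j (label 1).
labels = [0] * 200 + [1] * 100
print(estimate_bayes_factor(labels, 0, 1))  # 2.0
```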
A Fundamental Flaw of Bayes Factors.

It can be shown that [3]:
$$B_{ij} = \frac{P(M_i \mid X)}{P(M_j \mid X)} \times \frac{h_j(X \mid S(X))}{h_i(X \mid S(X))},$$
and so
$$B_{ij} = \frac{P(M_i \mid X)}{P(M_j \mid X)} \iff h_j(X \mid S(X)) = h_i(X \mid S(X)).$$
That is, $B_{ij}$ will be biased unless the probability of seeing the data, given the observed summary statistics, is equal under each model.
Post-Hoc Model Comparison.

Consider other problems with $B_{ij}$ (and with any post-hoc model comparison method). Posterior distributions are sensitive to the choice of prior distributions. A particularly poor choice of $\pi_j(\theta)$ may reduce the number of retained simulations under Model $j$, and hence inflate $B_{ij}$.
Post-Hoc Model Comparison.

We would like a model selection algorithm that avoids comparing posterior distributions. Given that our 'semi-automatic summary selection' version of ABC is an example of 'supervised learning', we could consider a similar method for model selection.
Multiple Logistic Regression.

Let $X$ be our data (the collection of $\Gamma \times T$ summary statistics). Let $x_m = (s^m_1, \cdots, s^m_T)$ be the $m$th row of $X$ (the summary statistics from the $m$th simulation). Let $Y_m$ be the category of the $m$th observation (the model used for the $m$th simulation). Let $\beta_c = (\beta^c_0, \cdots, \beta^c_T)$ be the vector of coefficients for category $c$. We aim to best fit the model
$$\ln\!\left(\frac{P(Y_m = c \mid X)}{P(Y_m = q \mid X)}\right) = \beta^c_0 + \sum_{j=1}^{T}\beta^c_j\, s^m_j,$$
for $c = 1, \cdots, q - 1$ (with Model $q$ as the reference category).
Multiple Logistic Regression.

We end up with a predictive model, so that for new data $X_{\mathrm{NEW}}$ we can predict
$$P(Y = c \mid X_{\mathrm{NEW}}) = p_c \quad \text{for each } c \in \{1, \cdots, q\}, \quad \text{such that } \sum_{i=1}^{q} p_i = 1.$$
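A minimal sketch of this classification step, assuming the $\Gamma \times T$ training summaries `S_train`, their model labels `model_labels`, and the observed summaries `s_obs` (all hypothetical names), using scikit-learn's logistic regression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_model_classifier(S_train, model_labels):
    """Fit a logistic regression of model label on summary statistics;
    with more than two labels this becomes a multinomial fit."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(S_train, model_labels)
    return clf

def model_probabilities(clf, s_obs):
    """Return p_c = P(Y = c | s_obs) for each candidate model c; the p_c sum to 1."""
    return clf.predict_proba(np.atleast_2d(s_obs))[0]
```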
Multiple Logistic Regression Example.

Consider two opposing models of population dynamics:

[Figure: effective population size $N_e(t)$ against generations before present $t$ (0 to 16,000), for the Bottleneck and Exponential models.]
Multiple Logistic Regression Example.

The Bottleneck Model: a sudden reduction to between 20% and 40% of the effective population size occurs before the species dies out. The Exponential Model: there was no sudden population-size reduction; the species simply died out (relatively) slowly over 3,000 generations.
Multiple Logistic Regression Example.

However, we don't know which model fits our data best. If the data came from the Bottleneck Model, my prior belief is that
$$N(16000) = 150{,}000, \quad N(15500) \sim U(30{,}000,\, 75{,}000), \quad N(12000) \sim U(300,\, 12{,}500).$$
If the data came from the Exponential Model, my prior belief is that
$$N(16000) = 150{,}000, \quad N(15500) = 150{,}000, \quad N(12000) \sim U(300,\, 7{,}500).$$
Multiple Logistic Regression Example.

I produced training data of this form with only 10,000 simulations (5,000 from each model, ≈ 2 minutes) and fit the MLR (call this trainDat).
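A rough sketch of how such training draws could be generated from the stated priors (the variable names are illustrative assumptions, and the population simulator that turns each draw into summary statistics is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_prior_parameters(model):
    """Draw (N(16000), N(15500), N(12000)) under the stated prior for each model."""
    if model == "bottleneck":
        return 150_000, rng.uniform(30_000, 75_000), rng.uniform(300, 12_500)
    else:  # exponential decline: no sudden reduction at generation 15500
        return 150_000, 150_000, rng.uniform(300, 7_500)

# 5,000 parameter draws per model; each draw would then be passed to the
# population simulator (not shown) to produce one row of trainDat.
training_params = [(m, draw_prior_parameters(m))
                   for m in ("bottleneck", "exponential")
                   for _ in range(5_000)]
```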