

  1. Meta-Bayesian Analysis
  Jun Yang, joint work with Daniel M. Roy
  Department of Statistical Sciences, University of Toronto
  ISBA 2016, June 16, 2016

  2. Motivation
  "All models are wrong, some are useful." (George Box)
  "Truth [...] is much too complicated to allow anything but approximations." (John von Neumann)
  ◮ Subjective Bayesianism: alluring, but impossible to practice when the model is wrong.
  ◮ Prior probability = degree of belief... in what? What is a prior?
  ◮ Is there any role for (subjective) Bayesianism?
  Our proposal: a more inclusive and pragmatic definition of "prior".
  Our approach: Bayesian decision theory.

  3. Example: Grossly Misspecified Model
  Setting: machine learning; the data are a collection of documents.
  ◮ Model: Latent Dirichlet Allocation (LDA), aka "topic modeling".
  ◮ Prior belief: π̃ ≡ 0, i.e., no setting of LDA is faithful to our true beliefs about the data.
  ◮ Conjugate priors: π(dθ) ∼ Dirichlet(α).
  What is the meaning of a prior on LDA parameters?
  Pragmatic question: if we use an LDA model (for whatever reason), how should we choose our "prior"?

  4. Example: Accurate but Still Misspecified Model
  Setting: careful science; the data are experimental measurements.
  ◮ Model: (Q_θ)_{θ ∈ Θ}, painstakingly produced after years of effort.
  ◮ Prior belief: π̃ ≡ 0, i.e., no Q_θ is 100% faithful to our true beliefs about the data.
  What is the meaning of a prior in a misspecified model? (All models are misspecified.)
  Pragmatic question: how should we choose a "prior"?

  5. Standard Bayesian Analysis for Prediction
  ◮ Q_θ(·): model on X × Y given the parameter θ, where X is what you will observe and Y is what you will then predict.
  ◮ π(·): prior on θ.
  ◮ (πQ)(·) = ∫ Q_θ(·) π(dθ): marginal distribution on X × Y.
  Believe (X, Y) ∼ πQ.
  The Task: 1. Observe X. 2. Take action Ŷ. 3. Suffer loss L(Ŷ, Y).
  The Goal: minimize expected loss.
  The Bayes optimal action minimizes expected loss under the conditional distribution of Y given X = x, written πQ(dy | x):
  BayesOptAction(πQ, x) = arg min_a ∫ L(a, y) πQ(dy | x).
  ◮ Quadratic loss → posterior mean.
  ◮ Self-information loss (log loss) → posterior πQ(· | x).
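  To make the loss-to-action mapping concrete, here is a minimal sketch of BayesOptAction for the quadratic-loss case. The Beta-Bernoulli setup and the next-toss prediction task are illustrative assumptions, not fixed by the slide; under quadratic loss the Bayes-optimal action is the posterior mean, which here equals the posterior predictive probability of the next 1.

```python
import numpy as np

def bayes_opt_action(x, a=1.0, b=1.0):
    """arg min over actions of E[(action - Y)^2 | X = x], for Y the next
    toss, under a Beta(a, b) prior on theta: the posterior mean."""
    return (a + np.sum(x)) / (a + b + len(x))

x = [1, 0, 0, 0, 1, 0, 0, 1, 1]
print(bayes_opt_action(x))  # posterior mean after 4 ones in 9 tosses
```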

  6. Meta-Bayesian Analysis
  ◮ (Q_θ)_{θ ∈ Θ}: the model, i.e., a family of distributions on X × Y.
  ◮ Don't believe Q_θ, i.e., the model is misspecified.
  ◮ P: represents our true belief on X × Y.
  Believe (X, Y) ∼ P, but we will use Q_θ.
  The Task: 1. Choose a (surrogate) prior π. 2. Observe X. 3. Take action Ŷ = BayesOptAction(πQ, x). 4. Suffer loss L(Ŷ, Y).
  The Goal: minimize expected loss with respect to P, not πQ.

  7. Meta-Bayesian Analysis
  Key ideas:
  ◮ Believe (X, Y) ∼ P.
  ◮ But predict using πQ(· | X = x) for some prior π.
  ◮ The prior π is itself a choice/decision/action.
  ◮ The loss associated with π and (x, y) is L*(π, (x, y)) = L(BayesOptAction(πQ, x), y).
  Meta-Bayesian risk:
  ◮ the Bayes risk under P of doing Bayesian analysis under πQ,
  R(P, π) = ∫ L*(π, (x, y)) P(dx × dy).
  ◮ A meta-Bayesian optimal prior minimizes the meta-Bayesian risk:
  inf_{π ∈ F} R(P, π),
  where F is some set of priors under consideration.
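  Since R(P, π) is just an expectation under P, it can be estimated by simulation whenever we can sample from P. A minimal sketch under assumed choices (quadratic loss, i.i.d. Bernoulli surrogate model Q_θ, Beta(a, b) priors; the helper names sample_P and meta_bayes_risk are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def meta_bayes_risk(sample_P, a, b, n, reps=50_000):
    """Monte Carlo estimate of R(P, pi) for pi = Beta(a, b), quadratic
    loss, and an i.i.d. Bernoulli surrogate model: observe n tosses,
    act with the posterior mean, suffer squared error on the next toss."""
    total = 0.0
    for _ in range(reps):
        x, y = sample_P(n)                      # (x, y) ~ P
        action = (a + x.sum()) / (a + b + n)    # BayesOptAction(piQ, x)
        total += (action - y) ** 2              # L*(pi, (x, y))
    return total / reps

def sample_P(n):
    """Example true belief: exchangeable coin with theta ~ Beta(3, 3)."""
    theta = rng.beta(3.0, 3.0)
    seq = (rng.random(n + 1) < theta).astype(float)
    return seq[:n], seq[n]

print(meta_bayes_risk(sample_P, 1.0, 1.0, n=10))   # uniform surrogate prior
print(meta_bayes_risk(sample_P, 3.0, 3.0, n=10))   # the well-specified Beta(3, 3)
```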

  8. Meta-Bayesian Analysis
  Recipe
  ◮ Step 1: state P and (Q_θ), and select a loss function L.
  ◮ Step 2: choose the prior π that minimizes the meta-Bayesian risk.
  Examples
  ◮ Log loss: minimize the conditional relative entropy,
  inf_π ∫ KL(P_2(x, ·) || πQ(· | x)) P_1(dx),
  where P(dx, dy) = P_1(dx) P_2(x, dy).
  ◮ Quadratic loss: minimize the expected quadratic distance between the posterior means of πQ(· | x) and P_2(x, ·),
  inf_π ∫ ‖m_{πQ}(x) − m_{P_2}(x)‖₂² P_1(dx).
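  For the log-loss case, minimizing the conditional relative entropy is the same as minimizing the expected negative log predictive probability, since the entropy of P_2(x, ·) does not involve π. Step 2 of the recipe can then be run as a brute-force search. A sketch over a grid of Beta(a, b) surrogate priors, reusing the same assumed Bernoulli setup as above (grid, reps, and sample_P are illustrative choices):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

def log_loss_risk(sample_P, a, b, n, reps=5_000):
    """Estimate E_P[-log piQ(y | x)]; equivalent, up to a constant in pi,
    to the conditional relative entropy objective on the slide."""
    total = 0.0
    for _ in range(reps):
        x, y = sample_P(n)
        p1 = (a + x.sum()) / (a + b + n)        # predictive piQ(y = 1 | x)
        total -= np.log(p1 if y == 1 else 1.0 - p1)
    return total / reps

def sample_P(n):                                 # assumed truth: theta ~ Beta(3, 3)
    theta = rng.beta(3.0, 3.0)
    seq = (rng.random(n + 1) < theta).astype(float)
    return seq[:n], seq[n]

grid = [0.5, 1.0, 2.0, 3.0, 5.0]
best = min(product(grid, grid), key=lambda ab: log_loss_risk(sample_P, *ab, n=5))
print(best)   # should land near (3, 3), the well-specified choice
```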

  9. Meta-Bayesian Analysis
  High-level Goals
  ◮ Meta-Bayesian analysis for Q_θ under P is generally no easier than doing Bayesian analysis under P directly.
  ◮ But P serves only as a placeholder for an impossible-to-express true belief.
  ◮ Our theoretical approach is to attempt to prove general theorems true of broad classes of "true beliefs" P.
  ◮ The hope is that this will tell us something deep about subjective Bayesianism.
  The remaining slides present some of our key findings.

  10. Meta-Bayesian 101: the Optimal Prior Depends on the Loss
  The data are coin tosses: 10001001100001000100100.
  ◮ Model: i.i.d. Bernoulli(θ) sequence, unknown θ.
  ◮ True prior belief: π̃(dθ).
  Problem Setting
  ◮ X = {0, 1}^n, Y = {0, 1}^k.
  ◮ P: [Bernoulli(θ)]^{n+k}, θ ∼ π̃(dθ).
  ◮ Q_θ: [Bernoulli(θ)]^{n+k}, θ ∼ π(dθ).
  Results from Meta-Bayesian Analysis
  ◮ Log loss: π should match the first n + k moments of π̃.
  The optimal prior usually depends on n and k!
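  Why only the first n + k moments matter under log loss: the πQ-probability of a length-m binary sequence with h ones is E_π[θ^h (1 − θ)^{m−h}], which expands into moments of θ of order at most m = n + k. The check below verifies this numerically for an illustrative moment-matched pair (Beta(2, 2) versus a two-point prior on {0.5 − d, 0.5 + d} with d chosen to match its first two moments); these specific priors are assumptions for the demo, not from the talk.

```python
import numpy as np
from itertools import product

m = 2                        # n + k
d = np.sqrt(0.05)            # matches Var(Beta(2, 2)) = 0.05, so first 2 moments agree

def seq_prob_beta(seq, a=2.0, b=2.0):
    """P(sequence) = E[theta^h (1-theta)^(m-h)] under Beta(a, b),
    via B(a+h, b+m-h) / B(a, b)."""
    h = sum(seq)
    num = den = 1.0
    for j in range(h):
        num *= a + j
    for j in range(m - h):
        num *= b + j
    for j in range(m):
        den *= a + b + j
    return num / den

def seq_prob_twopoint(seq):
    h = sum(seq)
    return np.mean([t**h * (1 - t)**(m - h) for t in (0.5 - d, 0.5 + d)])

for seq in product([0, 1], repeat=m):
    print(seq, seq_prob_beta(seq), seq_prob_twopoint(seq))
# All rows agree: moment-matched priors are indistinguishable at horizon n + k.
```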

  11. Meta-Bayesian Analysis for the i.i.d. Bernoulli Model
  Example
  ◮ True belief P: a two-state {0, 1} discrete Markov chain with transition matrix
  [ 1 − p    p   ]
  [   q    1 − q ]
  ◮ Model: Q_θ^k = Bernoulli(θ)^k.
  ◮ True prior belief: ν̃(dp, dq) = π̃(dθ) κ̃(dψ | θ), where θ = p / (p + q) is the limiting relative frequency of 1's (LRF).

  12. What does a prior on an i.i.d. Bernoulli model mean?
  Conjecture: the optimal prior for the model Q_θ^k is our true belief π̃(dθ) on the LRF.
  Theorem (Y.–Roy): False.
  Example for n = 1 and k = 1:
  ◮ Sticky Markov chain: 0000001111111100000011111111
  ◮ i.i.d. model: 0010011101001011001001001001
  ◮ Beta(0.01, 0.01) is better.
  [Figure: density f_π(θ) of the Beta(0.01, 0.01) prior, concentrating near θ = 0 and θ = 1.]
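  The counterexample can be checked by simulation. A sketch under quadratic loss (an assumption; the slide does not fix the loss) for a symmetric sticky chain with small flip probability, so the LRF is 1/2 almost surely and the "true LRF belief" is a point mass there:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_sticky(eps, length):
    """A path of the symmetric sticky chain: flip state with prob eps."""
    s, path = rng.integers(2), []
    for _ in range(length):
        path.append(int(s))
        if rng.random() < eps:
            s = 1 - s
    return path

def risk(prior_action, eps=0.05, reps=50_000):
    """Meta-Bayesian risk for n = 1, k = 1 under quadratic loss."""
    total = 0.0
    for _ in range(reps):
        x0, y = sample_sticky(eps, 2)
        total += (prior_action(x0) - y) ** 2
    return total / reps

lrf_prior = lambda x0: 0.5                          # point mass at theta = 1/2
beta_prior = lambda x0: (0.01 + x0) / (0.02 + 1)    # Beta(0.01, 0.01) posterior mean

print(risk(lrf_prior))    # = 0.25: always predicts 1/2
print(risk(beta_prior))   # much smaller: bets the next symbol repeats
```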

  13. What does a prior on an i.i.d. Bernoulli model mean?
  Theorem (Y.–Roy)
  1. Let Q_θ^k be the i.i.d. Bernoulli model.
  2. Let P be the true belief, and assume P believes in the LRF.
  3. Let π̃(dθ) be the true belief about the LRF, and assume π̃ is absolutely continuous.
  4. Let π*_k = arg min_π R(P, π) be an optimal surrogate prior.
  Then
  | KL(P^(k) || π*_k Q^k) − KL(P^(k) || π̃ Q^k) | → 0 as k → ∞,
  where the first term is R(P, π*_k) and the second is R(P, π̃).
  The true belief about the limiting relative frequency is an asymptotically optimal (surrogate) prior.

  14. General Results when P is a Mixture of i.i.d.
  Theorem (Berk 1966): the posterior distribution of θ concentrates asymptotically on the point minimizing the KL divergence.
  [Figure: sketch of a posterior piling up at arg min_θ KL(P̃_ψ || Q_θ).]
  Problem Setting
  ◮ P = ∫ P̃_ψ ν̃(dψ), where each P̃_ψ is i.i.d.
  ◮ Let Ψ_θ be the set of ψ such that Q_θ is the closest model to P̃_ψ.
  ◮ Define P_θ = ∫_{Ψ_θ} P̃_ψ ν̃(dψ | θ) and π̃(dθ) = ∫_{Ψ_θ} ν̃(dψ).
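  Berk's concentration result is easy to see numerically. A sketch with an assumed misspecified pair, not from the talk: a Poisson(λ) truth fit with a Geometric(θ) model on {0, 1, 2, ...} with pmf θ(1 − θ)^x, under a flat prior. The KL-minimizing pseudo-true parameter works out to θ* = 1/(1 + λ) (maximize log θ + λ log(1 − θ) in θ), and the posterior mode heads there as n grows.

```python
import numpy as np

rng = np.random.default_rng(3)

lam = 2.0                                    # truth: Poisson(2), so theta* = 1/3
theta_grid = np.linspace(0.01, 0.99, 500)

for n in (10, 100, 1000):
    x = rng.poisson(lam, size=n)
    # log-likelihood of the (wrong) geometric model on a grid of theta
    loglik = n * np.log(theta_grid) + x.sum() * np.log(1 - theta_grid)
    post = np.exp(loglik - loglik.max())     # unnormalized posterior, flat prior
    post /= post.sum()
    print(n, theta_grid[post.argmax()])      # posterior mode -> 1/(1 + lam)
```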

  15. General Results when P is a Mixture of i.i.d.
  Theorem (Y.–Roy)
  1. P^(k) = ∫ P_θ^(k) π̃(dθ), where each P_θ^(k) is i.i.d.
  2. For every θ ∈ Θ, the point θ is the unique point in Θ achieving the infimum inf_{θ′ ∈ Θ} KL(Q_{θ′}^(k) || P_θ^(k)) for k = 1.
  Then
  | KL(P^(k) || π*_k Q^k) − KL(P^(k) || π̃ Q^k) | → 0 as k → ∞,
  where the first term is R(P, π*_k) and the second is R(P, π̃).
  The true belief about the asymptotic "location" of the posterior distribution is an asymptotically optimal (surrogate) prior.

  16. Conclusion and Future Work
  Conclusion
  ◮ The standard definition of a (subjective) prior is too restrictive.
  ◮ A more useful definition comes from Bayesian decision theory.
  ◮ A meta-Bayesian prior is one you believe will lead to the best results.
  Future Work
  ◮ Beyond choosing priors: general meta-Bayesian analysis (optimal prediction algorithms).
  ◮ Analysis of the rationality of non-subjective procedures (e.g., switching, empirical Bayes).
