On Some Geometrical Aspects of Bayesian Inference, Miguel de Carvalho

On Some Geometrical Aspects of Bayesian Inference
Miguel de Carvalho†, School of Mathematics
† Joint with B. J. Barney and G. L. Page; Brigham Young University, US
M. de Carvalho, On the Geometry of Bayesian Inference, 1 / 37

ISBA 2018: Edinburgh, June 24–29
World Meeting of the International Society for Bayesian Analysis

Introduction: Motivation
Bayesian methodologies have become mainstream. Because of this, there is a need to develop methods, accessible to ‘non-experts’, that assess the influence of model choices on inference. These methods will need to be:
1. Easy to interpret.
2. Easy to calculate.
Ideally, they should also provide a unified treatment of all pieces of Bayes theorem.

Introduction: Motivation
Much work has been devoted to developing methods to assess the sensitivity of the posterior to changes in the prior and likelihood. The so-called prior–data conflict is another subject that has been attracting attention (Evans and Moshonov, 2006; Walter and Augustin, 2009; Al Labadi and Evans, 2016). Others have compared two competing priors in order to specify so-called weakly informative priors (Evans and Jang, 2011; Gelman et al., 2011).

Introduction: Goals
The novel contribution we intend to make is to provide a metric that is able to carry out comparisons between the:
- prior and likelihood: to assess the prior–data agreement;
- prior and posterior: to assess the influence that the prior has on inference;
- prior and prior: to compare information available on competing priors.
To be useful, this metric should be:
1. Easy to interpret.
2. Easy to calculate.
Ideally, it should also provide a unified treatment of all pieces of Bayes theorem.

Introduction: Line of Attack
To this end, we view each of the components of Bayes theorem as if they belonged to a geometry, and seek to provide intuitively appealing interpretations of the norms of, and angles between, the vectors of this geometry.
We will show that calculating these quantities is very straightforward and can be done online.
Interpretations are similar to those that accompany the correlation coefficient for continuous random variables.

Introduction: On-the-Job Drug Usage Toy Example
Example (Christensen et al., 2011, pp. 26–27). Suppose interest lies in estimating the proportion $\theta \in [0, 1]$ of US transportation industry workers that use drugs on the job. Suppose $y = (0, 1, 0, 0, 0, 0, 1, 0, 0, 0)$ and that
$y \mid \theta \overset{\text{iid}}{\sim} \text{Bern}(\theta), \quad \theta \sim \text{Beta}(a, b), \quad \theta \mid y \sim \text{Beta}(a^\star, b^\star),$
with $a^\star = \sum_{i=1}^{n} Y_i + a$ and $b^\star = n - \sum_{i=1}^{n} Y_i + b$. The authors conduct the analysis picking $(a, b) = (3.44, 22.99)$.
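The conjugate update in the toy example can be sketched in a few lines of Python. This is only an illustration of the formulas for $a^\star$ and $b^\star$ stated above; the variable names are assumptions made for this sketch and do not come from the slides.

```python
# Conjugate Beta-Bernoulli update for the toy example.
y = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]   # data as given on the slide
a, b = 3.44, 22.99                    # prior hyperparameters as given on the slide

n = len(y)                            # sample size
s = sum(y)                            # number of ones in the sample

a_star = s + a                        # a* = sum(Y_i) + a
b_star = n - s + b                    # b* = n - sum(Y_i) + b

prior_mean = a / (a + b)
posterior_mean = a_star / (a_star + b_star)

print(f"posterior: Beta({a_star:.2f}, {b_star:.2f})")
print(f"prior mean = {prior_mean:.4f}, posterior mean = {posterior_mean:.4f}")
```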

Introduction: Natural Questions
Some key questions:
- How compatible is the likelihood with this prior choice?
- How similar are the posterior and prior distributions?
- How does the choice of Beta$(a, b)$ compare to other possible prior distributions?
We provide a unified treatment to answer the questions above.

Storyboard: Plan of this Talk
1. Introduction (Done)
2. Bayes Geometry (Next)
3. Posterior and Prior Mean-Based Estimators of Compatibility
4. Discussion

Bayes Geometry: Primitive Structures of Interest
Suppose the inference of interest is over a parameter $\theta$ in $\Theta \subseteq \mathbb{R}^p$. We work in $L_2(\Theta)$ and use the geometry of the Hilbert space $\mathcal{H} = (L_2(\Theta), \langle \cdot, \cdot \rangle)$, with inner product
$\langle g, h \rangle = \int_\Theta g(\theta) h(\theta) \, d\theta, \quad g, h \in L_2(\Theta),$
and norm $\| \cdot \| = (\langle \cdot, \cdot \rangle)^{1/2}$. The fact that $\mathcal{H}$ is a Hilbert space is often known as the Riesz–Fischer theorem (Cheney, 2001, p. 411).
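As a rough numerical illustration of the inner product and norm, the sketch below approximates $\langle g, h \rangle$ with a midpoint rule. It assumes $\Theta = [0, 1]$ (so $p = 1$), and the helper names `inner` and `norm` as well as the Beta(2, 2) test density are choices made for this sketch, not part of the talk.

```python
import math

def inner(g, h, m=20_000):
    """Midpoint-rule approximation of <g, h> = integral over [0, 1] of g * h."""
    step = 1.0 / m
    return step * sum(g((i + 0.5) * step) * h((i + 0.5) * step) for i in range(m))

def norm(g, m=20_000):
    """Norm induced by the inner product: ||g|| = <g, g>^(1/2)."""
    return math.sqrt(inner(g, g, m))

def beta22(theta):
    # Beta(2, 2) density: 6 * theta * (1 - theta)
    return 6.0 * theta * (1.0 - theta)

def uniform(theta):
    # Uniform density on [0, 1]
    return 1.0

print(inner(beta22, uniform))   # close to 1: a density integrates to one
print(norm(beta22) ** 2)        # close to 36 * B(3, 3) = 1.2
```

The midpoint rule is used here only because it avoids evaluating the integrand at the endpoints; any standard quadrature rule would serve the same purpose.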

Bayes Geometry: A Geometric View of Bayes Theorem
Bayes theorem:
$p(\theta \mid y) = \frac{\pi(\theta) f(y \mid \theta)}{\int_\Theta \pi(\theta) f(y \mid \theta) \, d\theta} = \frac{\pi(\theta) \ell(\theta)}{\langle \pi, \ell \rangle}.$
[Figure: vectors labelled $\pi$, $\ell$ and $p$.]
The likelihood vector is used to enlarge or reduce the magnitude, and suitably tilt the direction, of the prior vector.
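The identity $p = \pi \ell / \langle \pi, \ell \rangle$ can be checked numerically on the toy example. The sketch below is an assumption-laden illustration: it takes $\Theta = [0, 1]$, uses a midpoint rule for the inner product, and compares the result with the conjugate Beta posterior; the helper `beta_pdf` and the evaluation points are hypothetical choices.

```python
import math

def beta_pdf(theta, a, b):
    """Beta(a, b) density at theta in (0, 1), computed on the log scale."""
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

a, b = 3.44, 22.99        # prior hyperparameters from the toy example
n, s = 10, 2              # sample size and number of ones in y

def prior(theta):
    return beta_pdf(theta, a, b)

def lik(theta):
    # Bernoulli likelihood of the observed sample
    return theta ** s * (1 - theta) ** (n - s)

# <pi, l>: the normalising constant in Bayes theorem (midpoint rule on [0, 1])
m = 20_000
step = 1.0 / m
evidence = step * sum(prior((i + 0.5) * step) * lik((i + 0.5) * step) for i in range(m))

def posterior(theta):
    # Bayes theorem written as p = pi * l / <pi, l>
    return prior(theta) * lik(theta) / evidence

# Compare with the conjugate posterior Beta(a + s, b + n - s) at a few points
points = (0.05, 0.1, 0.2, 0.4)
max_gap = max(abs(posterior(t) - beta_pdf(t, a + s, b + n - s)) for t in points)
print(f"largest gap to the conjugate posterior: {max_gap:.2e}")
```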

Bayes Geometry: A Geometric View of Bayes Theorem
Define the angle measure between the prior and the likelihood as
$\pi \angle \ell = \arccos \frac{\langle \pi, \ell \rangle}{\|\pi\| \, \|\ell\|}.$
Since $\pi$ and $\ell$ are nonnegative, $\pi \angle \ell \in [0, 90^\circ]$.
Bayes theorem is incompatible with a prior being orthogonal to the likelihood, as $\pi \angle \ell = 90^\circ \Rightarrow \langle \pi, \ell \rangle = 0$, thus leading to a division by zero.
Our first target object of interest is given by a standardized inner product
$\kappa_{\pi, \ell} = \frac{\langle \pi, \ell \rangle}{\|\pi\| \, \|\ell\|},$
which quantifies how much an expert’s opinion agrees with the data, thus providing a natural measure of prior–data agreement.
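A minimal sketch of how $\kappa_{\pi, \ell}$ and the angle $\pi \angle \ell$ might be computed for the toy example follows. It assumes $\Theta = [0, 1]$ and approximates the integrals with a midpoint rule; the function names `inner` and `kappa` are illustrative and not taken from the talk.

```python
import math

def beta_pdf(theta, a, b):
    """Beta(a, b) density at theta in (0, 1), computed on the log scale."""
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

def inner(g, h, m=20_000):
    """Midpoint-rule approximation of <g, h> on [0, 1]."""
    step = 1.0 / m
    return step * sum(g((i + 0.5) * step) * h((i + 0.5) * step) for i in range(m))

def kappa(g, h):
    """Standardized inner product <g, h> / (||g|| ||h||)."""
    return inner(g, h) / math.sqrt(inner(g, g) * inner(h, h))

a, b = 3.44, 22.99        # prior hyperparameters from the toy example
n, s = 10, 2              # sample size and number of ones in y

prior = lambda theta: beta_pdf(theta, a, b)
lik = lambda theta: theta ** s * (1 - theta) ** (n - s)

k = kappa(prior, lik)
# Guard against rounding slightly above 1 before taking the arccosine
angle = math.degrees(math.acos(min(1.0, k)))
print(f"kappa = {k:.4f}, angle = {angle:.2f} degrees")
```

Because $\kappa$ is a ratio, multiplying the likelihood by any positive constant leaves the result unchanged, which is why the unnormalised likelihood can be used directly.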

Bayes Geometry: A Geometric View of Bayes Theorem
Definition (Millman and Parker, 1991, p. 17). An abstract geometry $\mathcal{A}$ consists of a pair $\{\mathcal{P}, \mathcal{L}\}$, where the elements of the set $\mathcal{P}$ are designated as points, and the elements of the collection $\mathcal{L}$ are designated as lines, such that:
1. For every two points $A, B \in \mathcal{P}$, there is a line $l \in \mathcal{L}$ containing both $A$ and $B$.
2. Every line has at least two points.
Our abstract geometry of interest is $\mathcal{A} = \{\mathcal{P}, \mathcal{L}\}$, where $\mathcal{P} = L_2(\Theta)$ and $\mathcal{L} = \{g + kh, \ k \in \mathbb{R} : g, h \in L_2(\Theta)\}$.
In our setting points are, for example, prior densities, posterior densities, or likelihoods, as long as they are in $L_2(\Theta)$.

Bayes Geometry: A Geometric View of Bayes Theorem
Lines are elements of $\mathcal{L}$, so that, for example, if $g$ and $h$ are densities, line segments in our geometry consist of all possible mixture distributions which can be obtained from $g$ and $h$, i.e.:
$\{\lambda g + (1 - \lambda) h : \lambda \in [0, 1]\}.$
Vectors in $\mathcal{A} = \{\mathcal{P}, \mathcal{L}\}$ are defined through the difference of elements in $\mathcal{P} = L_2(\Theta)$.
If $g, h \in L_2(\Theta)$ are vectors, then we say that $g$ and $h$ are collinear if there exists $k \in \mathbb{R}$ such that $g(\theta) = k h(\theta)$.
Put differently, we say $g$ and $h$ are collinear if $g(\theta) \propto h(\theta)$, for all $\theta \in \Theta$.

Bayes Geometry: A Geometric View of Bayes Theorem
Two different densities $\pi_1$ and $\pi_2$ cannot be collinear: if $\pi_1 = k \pi_2$, then $k = 1$, otherwise $\int \pi_2(\theta) \, d\theta \neq 1$.
[Figure: vectors labelled $\pi_1$ and $\pi_2$.]
A density can be collinear to a likelihood: if the prior is uniform, $p(\theta \mid y) \propto \ell(\theta)$.
[Figure: vectors labelled $p$ and $\ell$.]
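Both collinearity statements can be illustrated numerically: the standardized inner product equals one for collinear vectors and is strictly below one for two different densities. The sketch assumes $\Theta = [0, 1]$, reuses the toy example's data summary ($n = 10$, two ones), and picks Beta(2, 2) and Beta(3, 2) as an arbitrary pair of distinct densities; none of these helper names come from the slides.

```python
import math

def beta_pdf(theta, a, b):
    """Beta(a, b) density at theta in (0, 1), computed on the log scale."""
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

def inner(g, h, m=20_000):
    """Midpoint-rule approximation of <g, h> on [0, 1]."""
    step = 1.0 / m
    return step * sum(g((i + 0.5) * step) * h((i + 0.5) * step) for i in range(m))

def kappa(g, h):
    """Standardized inner product <g, h> / (||g|| ||h||)."""
    return inner(g, h) / math.sqrt(inner(g, g) * inner(h, h))

n, s = 10, 2
lik = lambda theta: theta ** s * (1 - theta) ** (n - s)

# Under a uniform prior the posterior is Beta(s + 1, n - s + 1), proportional to the likelihood
post_uniform = lambda theta: beta_pdf(theta, s + 1, n - s + 1)
k_collinear = kappa(post_uniform, lik)

# Two different densities (arbitrary illustrative pair) are never collinear
k_distinct = kappa(lambda t: beta_pdf(t, 2, 2), lambda t: beta_pdf(t, 3, 2))

print(f"posterior vs likelihood (uniform prior): kappa = {k_collinear:.6f}")
print(f"Beta(2, 2) vs Beta(3, 2): kappa = {k_distinct:.6f}")
```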

Bayes Geometry: A Geometric View of Bayes Theorem
Our geometry is compatible with having two likelihoods being collinear.
[Figure: vectors labelled $\ell$ and $\ell^*$.]
This can be used to rethink the strong likelihood principle, which states that if
$\ell(\theta) = f(y \mid \theta) \propto f(y^* \mid \theta) = \ell^*(\theta),$
then the same inference should be drawn from both samples.
According to our geometry the strong likelihood principle reads: “Likelihoods with the same direction should yield the same inference.”
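A textbook illustration of collinear likelihoods, which is not taken from the slides and is offered only as a sketch, compares binomial sampling (a fixed number of 10 trials, 2 successes observed) with negative binomial sampling (sample until the 2nd success, which arrives on trial 10). Both likelihoods are proportional to $\theta^2 (1 - \theta)^8$, so the angle between them should be zero. The midpoint-rule helpers are assumptions of this sketch.

```python
import math

def inner(g, h, m=20_000):
    """Midpoint-rule approximation of <g, h> on [0, 1]."""
    step = 1.0 / m
    return step * sum(g((i + 0.5) * step) * h((i + 0.5) * step) for i in range(m))

def kappa(g, h):
    """Standardized inner product <g, h> / (||g|| ||h||)."""
    return inner(g, h) / math.sqrt(inner(g, g) * inner(h, h))

def lik_binomial(theta):
    # 10 trials fixed in advance, 2 successes observed
    return math.comb(10, 2) * theta ** 2 * (1 - theta) ** 8

def lik_negative_binomial(theta):
    # Sampling stops at the 2nd success, observed on trial 10
    return math.comb(9, 1) * theta ** 2 * (1 - theta) ** 8

k = kappa(lik_binomial, lik_negative_binomial)
# Guard against rounding slightly above 1 before taking the arccosine
angle = math.degrees(math.acos(min(1.0, k)))
print(f"kappa = {k:.6f}, angle = {angle:.4f} degrees")
```

The two likelihoods differ only in their combinatorial constants, which change the length of the vector but not its direction.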

Bayes Geometry: A Geometric View of Bayes Theorem
Definition (Compatibility). The compatibility between points in the geometry under consideration is the mapping $\kappa : L_2(\Theta) \times L_2(\Theta) \to [0, 1]$ defined as
$\kappa_{g, h} = \frac{\langle g, h \rangle}{\|g\| \, \|h\|}, \quad g, h \in L_2(\Theta).$
Pearson correlation coefficient vs. compatibility: the Pearson correlation coefficient is built on the inner product
$\langle X, Y \rangle = \int_\Omega X Y \, dP, \quad X, Y \in L_2(\Omega, \mathcal{B}_\Omega, P),$
instead of the inner product on $L_2(\Theta)$ used for compatibility.
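When both points are Beta densities on $[0, 1]$, the compatibility can be written in closed form, because $\int_0^1 \theta^{a_1 + a_2 - 2} (1 - \theta)^{b_1 + b_2 - 2} \, d\theta = B(a_1 + a_2 - 1, b_1 + b_2 - 1)$ and the normalising constants cancel in the ratio. The sketch below uses this identity, which is derived here from the definition rather than quoted from the slides, and it requires all shape parameters to exceed 1/2 so that the densities are square integrable. The function name `kappa_beta` and the competing Beta(1, 1) prior are hypothetical choices.

```python
import math

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def kappa_beta(a1, b1, a2, b2):
    """Compatibility between Beta(a1, b1) and Beta(a2, b2) densities.

    Assumes every shape parameter is larger than 1/2.
    """
    log_num = log_beta_fn(a1 + a2 - 1, b1 + b2 - 1)
    log_den = 0.5 * (log_beta_fn(2 * a1 - 1, 2 * b1 - 1) + log_beta_fn(2 * a2 - 1, 2 * b2 - 1))
    return math.exp(log_num - log_den)

a, b = 3.44, 22.99                 # prior from the toy example
n, s = 10, 2                       # sample size and number of ones in y
a_star, b_star = a + s, b + n - s  # conjugate posterior parameters

# Prior vs likelihood: the likelihood is proportional to a Beta(s + 1, n - s + 1) density
print("prior vs likelihood:", kappa_beta(a, b, s + 1, n - s + 1))
# Prior vs posterior
print("prior vs posterior: ", kappa_beta(a, b, a_star, b_star))
# Prior vs a hypothetical competing uniform prior, Beta(1, 1)
print("prior vs Beta(1, 1):", kappa_beta(a, b, 1.0, 1.0))
```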
