
Bayesian Methods for Parameter Estimation: Bayesian vs Frequentist - PowerPoint PPT Presentation

  1. Bayesian Methods for Parameter Estimation
  Chris Williams, Division of Informatics, University of Edinburgh

  Overview
  • Introduction to Bayesian statistics: learning a probability
  • Learning the mean of a Gaussian
  • Readings: Tipping chapter 8; Jordan ch 5; Heckerman tutorial section 2

  Bayesian vs Frequentist Inference

  Frequentist
  • Assumes that there is an unknown but fixed parameter θ
  • Estimates θ with some confidence
  • Prediction by using the estimated parameter value

  Bayesian
  • Represents uncertainty about the unknown parameter
  • Uses probability to quantify this uncertainty; unknown parameters are treated as random variables
  • Prediction follows the rules of probability

  Frequentist method
  • Model p(x | θ, M), data D = {x_1, ..., x_n}
  • θ̂ = argmax_θ p(D | θ, M)
  • Prediction for x_{n+1} is based on p(x_{n+1} | θ̂, M)

  Bayesian method
  • Prior distribution p(θ | M)
  • Posterior distribution p(θ | D, M):
    p(θ | D, M) = p(D | θ, M) p(θ | M) / p(D | M)
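To make the two recipes concrete, here is a minimal sketch in Python (assuming NumPy is available) that applies Bayes' rule numerically for a single probability parameter θ and compares the frequentist point estimate with the Bayesian posterior mean. The grid resolution, the flat prior, and the example counts are illustrative choices, not from the slides:

```python
import numpy as np

# Discretise theta and apply Bayes' rule directly:
# p(theta | D, M) = p(D | theta, M) p(theta | M) / p(D | M)
theta = np.linspace(1e-6, 1 - 1e-6, 1000)   # candidate parameter values
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)                  # p(theta | M): flat prior (an assumption)

n_h, n_t = 7, 3                              # illustrative data: 7 heads, 3 tails
likelihood = theta**n_h * (1 - theta)**n_t   # p(D | theta, M)

unnorm = likelihood * prior
evidence = unnorm.sum() * dtheta             # p(D | M), by numerical integration
posterior = unnorm / evidence                # p(theta | D, M)

theta_hat = n_h / (n_h + n_t)                        # frequentist point estimate
posterior_mean = (theta * posterior).sum() * dtheta  # Bayesian summary
print(theta_hat, posterior_mean)
```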

  2. Bayes, MAP and Maximum Likelihood

  • Making predictions:
    p(x_{n+1} | D, M) = ∫ p(x_{n+1}, θ | D, M) dθ
                      = ∫ p(x_{n+1} | θ, D, M) p(θ | D, M) dθ
                      = ∫ p(x_{n+1} | θ, M) p(θ | D, M) dθ
    Interpretation: an average of the predictions p(x_{n+1} | θ, M), weighted by the posterior p(θ | D, M)
  • Maximum a posteriori value of θ:
    θ_MAP = argmax_θ p(θ | D, M)
    Note: not invariant to reparameterization (cf. the ML estimator)
  • If the posterior is sharply peaked about the most probable value θ_MAP, then
    p(x_{n+1} | D, M) ≃ p(x_{n+1} | θ_MAP, M)
  • In the limit n → ∞, θ_MAP converges to θ̂ (as long as the prior p(θ̂) ≠ 0)
  • Marginal likelihood (important for model comparison):
    p(D | M) = ∫ p(D | θ, M) p(θ | M) dθ
  • The Bayesian approach is most effective when data is limited, i.e. when n is small

  Learning probabilities: the thumbtack example

  Frequentist approach
  • The probability of heads θ is unknown
  • Given iid data, estimate θ using an estimator with good properties (e.g. the ML estimator)
  [Figure: a thumbtack landing heads or tails]

  Likelihood
  • Likelihood for a sequence of heads and tails:
    p(hhth...tth | θ) = θ^{n_h} (1 − θ)^{n_t}
  • MLE: θ̂ = n_h / (n_h + n_t)
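The plug-in approximation on this slide can be checked numerically. Below is a minimal sketch, assuming SciPy and a Beta posterior (the conjugate form derived on the next slide); the prior pseudo-counts and data counts are illustrative:

```python
from scipy.stats import beta
from scipy.integrate import quad

a_h, a_t = 2.0, 2.0       # assumed Beta prior pseudo-counts (illustrative)
n_h, n_t = 3, 1           # observed heads / tails (illustrative)
post = beta(a_h + n_h, a_t + n_t)   # p(theta | D, M)

# Fully Bayesian prediction:
# p(x_{n+1} = heads | D, M) = integral of theta * p(theta | D, M) d theta
p_heads_bayes, _ = quad(lambda th: th * post.pdf(th), 0.0, 1.0)

# Plug-in approximation using the posterior mode theta_MAP
theta_map = (a_h + n_h - 1) / (a_h + n_h + a_t + n_t - 2)
p_heads_map = theta_map   # p(x_{n+1} = heads | theta_MAP) = theta_MAP

print(p_heads_bayes, p_heads_map)  # the two differ when the posterior is broad
```

With so little data the posterior is broad, so the plug-in value (about 0.67) overshoots the averaged prediction (0.625); with large n the two agree, as the slide states.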

  3. Learning probabilities: the thumbtack example

  Bayesian approach: (a) the prior
  • Prior density p(θ); use the beta distribution:
    p(θ) = Beta(α_h, α_t) ∝ θ^{α_h − 1} (1 − θ)^{α_t − 1}   for α_h, α_t > 0
  • Properties of the beta distribution:
    E[θ] = ∫ θ p(θ) dθ = α_h / (α_h + α_t)

  [Figure: examples of the Beta distribution: Beta(0.5, 0.5), Beta(1, 1), Beta(3, 2), Beta(15, 10)]

  Bayesian approach: (b) the posterior
  • p(θ | D) ∝ p(θ) p(D | θ)
            ∝ θ^{α_h − 1} (1 − θ)^{α_t − 1} · θ^{n_h} (1 − θ)^{n_t}
            ∝ θ^{α_h + n_h − 1} (1 − θ)^{α_t + n_t − 1}
  • The posterior is also a beta distribution: θ | D ∼ Beta(α_h + n_h, α_t + n_t)
  • The beta prior is conjugate to the binomial likelihood (i.e. prior and posterior have the same parametric form)
  • α_h and α_t can be thought of as imaginary counts, with α = α_h + α_t as the equivalent sample size

  Bayesian approach: (c) making predictions
  [Figure: graphical model with observed nodes x_1, x_2, ..., x_n and prediction node x_{n+1}]
  • p(X_{n+1} = heads | D, M) = ∫ p(X_{n+1} = heads | θ) p(θ | D, M) dθ
                              = ∫ θ Beta(α_h + n_h, α_t + n_t) dθ
                              = (α_h + n_h) / (α + n)
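The conjugate update and the closed-form predictive are a few lines of code. A minimal sketch, assuming SciPy; the prior parameters and counts are illustrative:

```python
from scipy.stats import beta

alpha_h, alpha_t = 1.0, 1.0      # Beta(1, 1) = uniform prior (an assumption)
n_h, n_t = 7, 3                  # observed heads / tails (illustrative)

# Posterior is Beta(alpha_h + n_h, alpha_t + n_t): same parametric form as the prior
posterior = beta(alpha_h + n_h, alpha_t + n_t)

# Closed-form predictive: p(heads | D) = (alpha_h + n_h) / (alpha + n)
alpha, n = alpha_h + alpha_t, n_h + n_t
p_heads = (alpha_h + n_h) / (alpha + n)

print(posterior.mean(), p_heads)  # identical: the predictive is the posterior mean
```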

  4. Beyond Conjugate Priors
  • Suppose the thumbtack came from a magic shop → use a mixture prior:
    p(θ) = 0.4 Beta(20, 0.5) + 0.2 Beta(2, 2) + 0.4 Beta(0.5, 20)

  Generalization to multinomial variables
  • Dirichlet prior:
    p(θ_1, ..., θ_r) = Dir(α_1, ..., α_r) ∝ ∏_{i=1}^{r} θ_i^{α_i − 1}
    with Σ_i θ_i = 1 and α_i > 0
  • Properties: E(θ_i) = α_i / α
  • The α_i's are imaginary counts, and α = Σ_i α_i is the equivalent sample size
  • Posterior distribution:
    p(θ | n_1, ..., n_r) ∝ ∏_{i=1}^{r} θ_i^{α_i + n_i − 1}
  • The Dirichlet distribution is conjugate to the multinomial likelihood
  • Marginal likelihood:
    p(D | M) = [Γ(α) / Γ(α + n)] ∏_i [Γ(α_i + n_i) / Γ(α_i)]
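The Dirichlet posterior update and the marginal likelihood both have one-line implementations. A minimal sketch, assuming NumPy/SciPy, with illustrative prior parameters and counts; the log-domain form uses gammaln to avoid overflow in the Gamma functions:

```python
import numpy as np
from scipy.special import gammaln

alphas = np.array([1.0, 1.0, 1.0])   # Dirichlet prior Dir(alpha_1, ..., alpha_r)
counts = np.array([5, 2, 3])         # observed counts n_1, ..., n_r (illustrative)

post_alphas = alphas + counts                 # posterior is Dir(alpha_i + n_i)
post_mean = post_alphas / post_alphas.sum()   # E[theta_i | D]

# log p(D | M) = log G(alpha) - log G(alpha + n)
#              + sum_i [log G(alpha_i + n_i) - log G(alpha_i)]
alpha, n = alphas.sum(), counts.sum()
log_marginal = (gammaln(alpha) - gammaln(alpha + n)
                + np.sum(gammaln(post_alphas) - gammaln(alphas)))

print(post_mean, log_marginal)
```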

  5. Inferring the mean of a Gaussian
  • Likelihood: p(x | μ) ∼ N(μ, σ²)
  • Prior: p(μ) ∼ N(μ_0, σ_0²)
  • Given data D = {x_1, ..., x_n}, what is p(μ | D)?
    p(μ | D) ∼ N(μ_n, σ_n²)
    with x̄ = (1/n) Σ_{i=1}^{n} x_i,
    μ_n = [n σ_0² / (n σ_0² + σ²)] x̄ + [σ² / (n σ_0² + σ²)] μ_0
    1/σ_n² = n/σ² + 1/σ_0²
  • See Tipping §8.3.1 for details

  Comparing Bayesian and Frequentist approaches
  • Frequentist: fix θ, consider all possible data sets generated with θ fixed
  • Bayesian: fix D, consider all possible values of θ
  • One view is that the Bayesian and Frequentist approaches have different definitions of what it means to be a good estimator
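The updates for μ_n and σ_n² translate directly to code. A minimal sketch, assuming NumPy; the prior parameters, noise level, and synthetic data are illustrative:

```python
import numpy as np

# Posterior over the mean of a Gaussian with known variance sigma^2
# and a Gaussian prior N(mu_0, sigma_0^2), per the slide's formulas.
rng = np.random.default_rng(0)
sigma = 1.0                           # known observation noise std (assumption)
mu0, sigma0 = 0.0, 2.0                # prior mean and std for mu (assumption)
x = rng.normal(1.5, sigma, size=20)   # synthetic data D = {x_1, ..., x_n}

n, xbar = len(x), x.mean()
# mu_n = (n sigma_0^2 xbar + sigma^2 mu_0) / (n sigma_0^2 + sigma^2)
mu_n = (n * sigma0**2 * xbar + sigma**2 * mu0) / (n * sigma0**2 + sigma**2)
# 1 / sigma_n^2 = n / sigma^2 + 1 / sigma_0^2
sigma_n2 = 1.0 / (n / sigma**2 + 1.0 / sigma0**2)

print(mu_n, sigma_n2)   # posterior p(mu | D) = N(mu_n, sigma_n^2)
```

As n grows, μ_n is pulled toward the sample mean x̄ and σ_n² shrinks, which is the sense in which the prior matters most when data is limited.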
