Bayesian Econometrics Primer

Stéphane Adjemian
stephane.adjemian@univ-lemans.fr

March, 2016
Introduction

◮ In this chapter we present the Bayesian approach to econometrics.
◮ Essentially, this approach allows us to incorporate prior knowledge about the model and its parameters into the inference procedure.
◮ We will only deal with problems for which closed-form solutions exist (linear models).
◮ In general, DSGE models do not admit closed-form solutions for the posterior distribution. We will deal with these models in the next chapter.
Outline

Introduction
Maximum likelihood estimation
Prior and posterior beliefs
Joint, conditional and marginal posterior distributions
Point estimate
Marginal density of the sample
Forecasts
Asymptotic properties
Non-informative priors
ML Estimation

◮ A model (M) defines a joint probability distribution, parameterized by θ_M, over a sample of variables (say Y_T):

    f(\mathcal{Y}_T \mid \theta_{\mathcal{M}}, \mathcal{M})    (1)

◮ The parameters θ_M can be estimated by confronting the model with the data through:
  – some moments of the DGP, or
  – the probability density function of the DGP (all the moments).
◮ The first approach is a method of moments; the second one corresponds to the Maximum Likelihood (ML) approach.
◮ Basically, an ML estimate of θ_M is obtained by maximizing the density of the sample with respect to the parameters: we seek the value of θ_M that maximizes the "probability of occurrence" of the sample delivered by Nature.
◮ In the sequel, we will denote by L(θ) = f(\mathcal{Y}_T \mid θ) the likelihood function, omitting the indexation with respect to the model when it is not necessary.
ML Estimation
A simple static model

◮ As a first example, we consider the following model:

    y_t = \mu_0 + \epsilon_t    (2-a)

  where \epsilon_t \sim \text{iid } \mathcal{N}(0, 1) and \mu_0 is an unknown finite real parameter.
◮ According to this model, y_t is normally distributed, y_t \mid \mu_0 \sim \mathcal{N}(\mu_0, 1), and \mathrm{Cov}(y_t, y_s) = 0 for all s \neq t.
◮ Suppose that a sample \mathcal{Y}_T = \{y_1, \dots, y_T\} is available. The likelihood is defined by:

    L(\mu) = f(y_1, \dots, y_T \mid \mu)

◮ Because the y's are iid, the joint conditional density is equal to a product of conditional densities:

    L(\mu) = \prod_{t=1}^{T} g(y_t \mid \mu)
ML Estimation
A simple static model

◮ Because the model is Gaussian:

    L(\mu) = \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(y_t-\mu)^2}{2}}

◮ Finally we have:

    L(\mu) = (2\pi)^{-\frac{T}{2}}\, e^{-\frac{1}{2}\sum_{t=1}^{T}(y_t-\mu)^2}    (2-b)

◮ Note that the likelihood function depends on the data.
◮ Suppose that T = 1 (only one observation in the sample). We can graphically determine the ML estimator of \mu in this case.
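As a quick numerical illustration (not part of the original slides), the log of the likelihood in (2-b) can be evaluated on a grid of candidate values for µ; the maximizer essentially coincides with the sample mean, anticipating the result stated two slides below. The true value µ_0 = 1.5, the sample size T = 50 and the grid bounds are arbitrary choices for this sketch.

    import numpy as np

    # Simulate a hypothetical sample from the static model y_t = mu_0 + eps_t, eps_t ~ N(0, 1)
    rng = np.random.default_rng(0)
    mu_0, T = 1.5, 50
    y = mu_0 + rng.standard_normal(T)

    def log_likelihood(mu, y):
        # Log of equation (2-b): -T/2 * log(2*pi) - 1/2 * sum((y_t - mu)^2)
        return -0.5 * len(y) * np.log(2 * np.pi) - 0.5 * np.sum((y - mu) ** 2)

    # Evaluate the log-likelihood on a grid of candidate values for mu
    grid = np.linspace(0.0, 3.0, 1001)
    values = np.array([log_likelihood(m, y) for m in grid])

    print("argmax over the grid:", grid[values.argmax()])
    print("sample mean         :", y.mean())  # the two numbers should (almost) coincide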
ML Estimation
A simple static model (cont'd)

[Figure: the likelihood L(\mu), i.e. the density of the single observation y_1 conditional on \mu, with the value f(y_1 \mid \mu = \bar\mu) marked for some \bar\mu \neq y_1.]

Clearly, the value of the density of y_1 conditional on \mu, i.e. the likelihood, is maximized for \mu = y_1: for any \bar\mu \neq y_1 we have f(y_1 \mid \mu = \bar\mu) < f(y_1 \mid \mu = y_1).
ML Estimation
A simple static model (cont'd)

⇒ If we have only one observation, y_1, the Maximum Likelihood estimator is the observation itself: \widehat{\mu} = y_1.
◮ This estimator is unbiased and its variance is 1.
◮ More generally, one can show that the maximum likelihood estimator is equal to the sample mean:

    \widehat{\mu}_T = \frac{1}{T}\sum_{t=1}^{T} y_t    (2-c)

◮ This estimator is unbiased and its variance is given by:

    \mathbb{V}[\widehat{\mu}_T] = \frac{1}{T}    (2-d)

◮ Because \mathbb{V}[\widehat{\mu}_T] goes to zero as the sample size goes to infinity, we know that this estimator converges in probability to the true value \mu_0 of the unknown parameter:

    \widehat{\mu}_T \xrightarrow[\;T\to\infty\;]{\text{proba}} \mu_0
The ML estimator of \mu must satisfy the following first order condition (considering the log of the likelihood):

    \sum_{t=1}^{T} \left(y_t - \widehat{\mu}_T\right) = 0
    \;\Leftrightarrow\; T\,\widehat{\mu}_T = \sum_{t=1}^{T} y_t
    \;\Leftrightarrow\; \widehat{\mu}_T = \frac{1}{T}\sum_{t=1}^{T} y_t

We establish that this estimator is unbiased by showing that its unconditional expectation is equal to the true value of \mu. We have:

    \mathbb{E}\left[\widehat{\mu}_T\right]
      = \mathbb{E}\left[\frac{1}{T}\sum_{t=1}^{T} y_t\right]
      = \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\left[\mu_0 + \epsilon_t\right]
      = \frac{1}{T}\,T\,\mu_0 + 0
      = \mu_0

where the second equality is obtained by linearity of the unconditional expectation and by substituting the DGP. Following the same steps,
we easily obtain the variance of the ML estimator:

    \mathbb{V}\left[\widehat{\mu}_T\right]
      = \frac{1}{T^2}\,\mathbb{V}\left[\sum_{t=1}^{T} y_t\right]
      = \frac{1}{T^2}\sum_{t=1}^{T}\mathbb{V}\left[\mu_0 + \epsilon_t\right]
      = \frac{1}{T^2}\,T\,\mathbb{V}\left[\epsilon_t\right]
      = \frac{1}{T}

where the second equality is a consequence of the independence of the y's. If the variance of \epsilon is not unitary we obtain \mathbb{V}[\widehat{\mu}_T] = \sigma^2_\epsilon / T instead. The smaller the size of the perturbation (or the larger the sample), the more precise the ML estimator of \mu. This result is intuitive: the more noise we have in the sample (the larger the variance of \epsilon), the more difficult it is to extract the true value of \mu.
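The unbiasedness and the 1/T variance can also be checked by simulation. The sketch below is not part of the original notes; the true value µ_0 = 1.5, the noise standard deviation σ_ε = 1 and the number of replications are arbitrary illustration choices.

    import numpy as np

    # Monte Carlo check of E[mu_hat_T] = mu_0 and V[mu_hat_T] = sigma_eps^2 / T
    rng = np.random.default_rng(1)
    mu_0, sigma_eps, T, replications = 1.5, 1.0, 100, 20000

    estimates = np.empty(replications)
    for r in range(replications):
        y = mu_0 + sigma_eps * rng.standard_normal(T)
        estimates[r] = y.mean()  # the ML estimator is the sample mean

    print("mean of the estimates    :", estimates.mean())  # close to mu_0
    print("variance of the estimates:", estimates.var())   # close to sigma_eps**2 / T
    print("theoretical variance     :", sigma_eps**2 / T)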
ML Estimation
A simple dynamic model

◮ Suppose that the data are generated by an AR(1) model:

    y_t = \varphi\, y_{t-1} + \epsilon_t

  with |\varphi| < 1 and \epsilon_t \sim \text{iid } \mathcal{N}(0, \sigma^2_\epsilon).
◮ In this case, y_t depends (directly) on y_{t-1} and also on y_{t-2}, y_{t-3}, ...
◮ It is no longer legitimate to write the likelihood as a product of the marginal densities of the observations.

Ex. 1 Show that the density of y \equiv (y_t, y_{t+1}, \dots, y_{t+H-1})' is given by:

    f(y) = (2\pi)^{-\frac{H}{2}}\, |\Sigma_y|^{-\frac{1}{2}}\, e^{-\frac{1}{2} y' \Sigma_y^{-1} y}

with

    \Sigma_y = \frac{\sigma^2_\epsilon}{1-\varphi^2}
    \begin{pmatrix}
      1 & \varphi & \varphi^2 & \dots & \varphi^{H-1} \\
      \varphi & 1 & \varphi & \dots & \varphi^{H-2} \\
      \vdots & & \ddots & & \vdots \\
      \varphi^{H-1} & \varphi^{H-2} & \dots & \varphi & 1
    \end{pmatrix}

under the assumption of stationarity.
ML Estimation
A simple dynamic model

Ex. 2 Let \mathcal{Y}_T = \{y_1, y_2, \dots, y_T\} be the sample. Write the likelihood function of the AR(1) model under the assumption of stationarity. Admitting that the inverse of the covariance matrix \Sigma_y can be factorized as \Sigma_y^{-1} = \sigma^{-2}_\epsilon L'L, with

    L = \begin{pmatrix}
      \sqrt{1-\varphi^2} & 0 & 0 & \dots & 0 & 0 \\
      -\varphi & 1 & 0 & \dots & 0 & 0 \\
      0 & -\varphi & 1 & \dots & 0 & 0 \\
      \vdots & & & \ddots & & \vdots \\
      0 & 0 & 0 & \dots & -\varphi & 1
    \end{pmatrix}

a T \times T matrix, show that the likelihood function can be written as:

    L(\varphi, \sigma^2_\epsilon)
      = (2\pi)^{-\frac{T}{2}}
        \left(\frac{\sigma^2_\epsilon}{1-\varphi^2}\right)^{-\frac{1}{2}}
        \frac{1}{\sigma_\epsilon^{T-1}}\;
        e^{-\frac{1-\varphi^2}{2\sigma^2_\epsilon}\, y_1^2}\;
        e^{-\frac{1}{2\sigma^2_\epsilon}\sum_{t=2}^{T}\left(y_t - \varphi y_{t-1}\right)^2}
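The equivalence between the full multivariate Gaussian density of Ex. 1 and the factorized expression of Ex. 2 can be checked numerically. The sketch below is not part of the exercises; the values of φ, σ_ε and the simulated sample are arbitrary, and it builds Σ_y explicitly rather than going through the L matrix.

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(2)
    phi, sigma_eps, T = 0.7, 1.0, 200

    # Simulate a stationary AR(1) path, drawing y_1 from its stationary distribution
    y = np.empty(T)
    y[0] = rng.normal(scale=sigma_eps / np.sqrt(1.0 - phi**2))
    for t in range(1, T):
        y[t] = phi * y[t - 1] + sigma_eps * rng.standard_normal()

    # Covariance matrix Sigma_y of (y_1, ..., y_T) under stationarity: Toeplitz in phi^|i-j|
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    Sigma_y = sigma_eps**2 / (1.0 - phi**2) * phi**lags

    loglik_full = multivariate_normal(mean=np.zeros(T), cov=Sigma_y).logpdf(y)

    # Log of the factorized likelihood from Ex. 2
    loglik_factorized = (-0.5 * T * np.log(2.0 * np.pi)
                         + 0.5 * np.log(1.0 - phi**2) - T * np.log(sigma_eps)
                         - (1.0 - phi**2) * y[0]**2 / (2.0 * sigma_eps**2)
                         - np.sum((y[1:] - phi * y[:-1])**2) / (2.0 * sigma_eps**2))

    print(loglik_full, loglik_factorized)  # the two numbers should agree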
Bayes theorem

◮ Let A and B be two events.
◮ Let P(A) and P(B) be the marginal probabilities of these events.
◮ Let P(A ∩ B) be the joint probability of events A and B.
◮ The Bayes theorem states that the probability of B conditional on A is given by:

    P(B \mid A) = \frac{P(A \cap B)}{P(A)}

◮ Equivalently, since the joint probability can also be expressed as the product of a conditional probability and a marginal probability, P(A \cap B) = P(A \mid B)\, P(B), we obtain:

    P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}

◮ The same holds for continuous random variables, with densities in place of probabilities.
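A short numerical check of the theorem, using a hypothetical joint distribution over two binary events (the probabilities below are arbitrary and only serve to illustrate the identity):

    import numpy as np

    # joint[a, b] = P(A = a, B = b); arbitrary numbers that sum to one
    joint = np.array([[0.10, 0.30],
                      [0.15, 0.45]])

    P_A = joint[1, :].sum()          # P(A), i.e. P(A = 1)
    P_B = joint[:, 1].sum()          # P(B)
    P_B_given_A = joint[1, 1] / P_A  # P(B | A) from the definition
    P_A_given_B = joint[1, 1] / P_B  # P(A | B)

    # Bayes theorem: P(B | A) = P(A | B) P(B) / P(A)
    print(P_B_given_A, P_A_given_B * P_B / P_A)  # identical numbers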
Prior and posterior beliefs

◮ We assume that we are able to characterize our prior knowledge about a parameter with a probability density function.
◮ Let p_0(\theta) be the prior density characterizing our beliefs about the vector of parameters \theta.
◮ Our aim is to update our (prior) beliefs about \theta with the sample information (\mathcal{Y}_T) embodied in the likelihood function, L(\theta) = f(\mathcal{Y}_T \mid \theta).
◮ We define the posterior density, p_1(\theta \mid \mathcal{Y}_T), which represents our updated beliefs.
◮ By the Bayes theorem we have:

    p(\theta \mid \mathcal{Y}_T) = \frac{g(\theta, \mathcal{Y}_T)}{p(\mathcal{Y}_T)}
    \quad\text{and}\quad
    p(\theta \mid \mathcal{Y}_T) = \frac{f(\mathcal{Y}_T \mid \theta)\, p_0(\theta)}{p(\mathcal{Y}_T)}

  where g is the joint density of the sample and the parameters.
Prior and posterior beliefs (cont'd)

◮ The posterior density is given by:

    p(\theta \mid \mathcal{Y}_T) = \frac{L(\theta)\, p_0(\theta)}{p(\mathcal{Y}_T)}

◮ Noting that the denominator does not depend on the parameters, the posterior density is proportional (w.r.t. \theta) to the product of the likelihood and the prior density:

    p(\theta \mid \mathcal{Y}_T) \propto L(\theta)\, p_0(\theta)

◮ All the posterior inference about the parameters can be done with the posterior kernel L(\theta)\, p_0(\theta).
◮ The denominator is the marginal density of the sample. Because a density has to integrate to one, we have:

    p(\mathcal{Y}_T) = \int f(\mathcal{Y}_T \mid \theta)\, p_0(\theta)\, \mathrm{d}\theta

  The marginal density is a weighted average of the likelihood function (with the prior as weights) → it will be used later for model comparison.
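A grid-based sketch, not part of the original slides, makes these objects concrete for the static model of equation (2-b): the marginal density p(Y_T) is obtained by integrating the posterior kernel over the parameter, and dividing the kernel by it yields a proper posterior density. The Gaussian prior N(0, 2²) and the simulated sample are arbitrary illustration choices.

    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import trapezoid

    # Hypothetical sample from the static model y_t = mu + eps_t, eps_t ~ N(0, 1)
    rng = np.random.default_rng(3)
    y = 1.5 + rng.standard_normal(50)

    grid = np.linspace(-5.0, 5.0, 2001)
    likelihood = np.array([np.prod(norm.pdf(y, loc=m, scale=1.0)) for m in grid])
    prior = norm.pdf(grid, loc=0.0, scale=2.0)

    kernel = likelihood * prior         # posterior kernel L(mu) * p_0(mu)
    marginal = trapezoid(kernel, grid)  # p(Y_T), the marginal density of the sample
    posterior = kernel / marginal       # p(mu | Y_T)

    print("marginal density of the sample:", marginal)
    print("posterior integrates to       :", trapezoid(posterior, grid))  # should be 1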
Prior and posterior beliefs
A simple static model (cont'd)

◮ For the sake of simplicity (we will see why later), we choose a Gaussian prior for the parameter \mu, with prior expectation \mu_0 and prior variance \sigma^2_\mu:

    p_0(\mu) = \frac{1}{\sigma_\mu \sqrt{2\pi}}\, e^{-\frac{(\mu-\mu_0)^2}{2\sigma^2_\mu}}

◮ The smaller the prior variance \sigma^2_\mu, the more informative the prior.
◮ The posterior density is proportional to the product of the prior density and the likelihood:

    p_1(\mu \mid \mathcal{Y}_T) \propto \frac{1}{\sigma_\mu \sqrt{2\pi}}\, e^{-\frac{(\mu-\mu_0)^2}{2\sigma^2_\mu}}\; (2\pi)^{-\frac{T}{2}}\, e^{-\frac{1}{2}\sum_{t=1}^{T}(y_t-\mu)^2}

◮ One can show that the right-hand side expression is proportional to a Gaussian density; a sketch of this step is given below.
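As a sketch of the "one can show" step, a standard completion of the square in the exponent does the job (the labels \mu_1 and \sigma^2_1 for the posterior mean and variance are introduced here only for convenience and are not taken from the slide):

    \begin{aligned}
    p_1(\mu \mid \mathcal{Y}_T)
      &\propto \exp\left\{-\frac{1}{2}\left[\frac{(\mu-\mu_0)^2}{\sigma^2_\mu}
               + \sum_{t=1}^{T}(y_t-\mu)^2\right]\right\} \\
      &\propto \exp\left\{-\frac{1}{2}\left[\left(\sigma_\mu^{-2}+T\right)\mu^2
               - 2\mu\left(\sigma_\mu^{-2}\mu_0 + T\bar{y}\right)\right]\right\} \\
      &\propto \exp\left\{-\frac{(\mu-\mu_1)^2}{2\sigma^2_1}\right\}
    \end{aligned}

    \text{with}\quad
    \mu_1 = \frac{\sigma_\mu^{-2}\mu_0 + T\bar{y}}{\sigma_\mu^{-2}+T}
    \qquad\text{and}\qquad
    \sigma^2_1 = \frac{1}{\sigma_\mu^{-2}+T}

where \bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t is the sample mean. The posterior of \mu is therefore Gaussian with mean \mu_1 (a precision-weighted average of the prior expectation and the sample mean) and variance \sigma^2_1.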