Microeconometrics Module A: Non-continuous outcomes I Alexander Ahammer Department of Economics, Johannes Kepler University, Linz, Austria Christian Doppler Laboratory Ageing, Health, and the Labor Market, Linz, Austria This version: March 22, 2019 Alexander Ahammer (JKU) Non-continuous outcomes I 1 / 42
Non-continuous outcomes

Often outcome variables are limited, for example, they have only a finite and small number of possible realizations. In this case it does not make sense to treat them as roughly continuous, so we have to look for alternatives to OLS.

- How to model
  - binary choices
  - multiple choices
- Maximum likelihood
A.1 Binary choices
Binary choices

We have a sample of individuals $i = 1, 2, \ldots, N$. For each $i$ we observe a binary variable

$Y = \begin{cases} 1 & \text{with probability } P(Y=1) = P \\ 0 & \text{with probability } P(Y=0) = 1-P \end{cases}$  (1)

$X$ is a row vector of $k$ potential factors that explain which outcome prevails. For individual $i$ we observe the vector $X_i$.

We are interested in the estimated effects of the factors $X$ on the probability of observing $Y = 1$,

$\gamma = \frac{\partial P}{\partial X'}$  (2)

where $\gamma$ is a vector of $k$ marginal effects. Note that

$E(Y) = 1 \cdot P + 0 \cdot (1-P) = P$  (3)
The linear probability model (LPM)

The LPM assumes that

$P = F(X, \beta) = X\beta$  (4)

where $\beta$ is a column vector of $k$ parameters and $X \in \mathbb{R}^{n \times (k+1)}$ includes a constant.

Because of linearity, and using eq (3),

$Y = E(Y) + [Y - E(Y)] = P + [Y - E(Y)] = X\beta + \varepsilon$  (5)

with

$\varepsilon = \begin{cases} 1 - X\beta & \text{with probability } P \\ -X\beta & \text{with probability } 1-P \end{cases}$  (6)
The linear probability model (LPM)

The marginal effect of $X$ on $P$ is

$\gamma = \frac{\partial P}{\partial X'} = \beta$  (7)

which we can estimate using OLS.

How are partial effects $\beta_j$ for some $x_j$ interpreted?
- If $x_j$ is non-binary, $\beta_j$ is the change in the probability of success given a one-unit increase in $x_j$.
- If $x_j$ is binary, $\beta_j$ is the difference in the probability of success when $x_j$ switches from zero to one.
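The interpretation above can be checked with a minimal sketch: OLS on a binary outcome with a single binary regressor, using the closed-form bivariate formulas. The data below are invented purely for illustration.

```python
# LPM sketch: regress a binary outcome on one binary regressor by OLS.
# The data are invented for illustration.
x = [0, 0, 1, 1]   # binary regressor (e.g., a treatment indicator)
y = [0, 1, 1, 1]   # binary outcome

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Closed-form OLS slope and intercept for a single regressor
beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
       sum((xi - xbar) ** 2 for xi in x)
alpha = ybar - beta * xbar

# Fitted P(Y=1 | x) = alpha + beta * x
print(alpha, beta)  # 0.5 0.5
```

With a binary $x_j$, the slope is exactly the difference in estimated success probabilities: here $\hat P(Y=1 \mid x=1) - \hat P(Y=1 \mid x=0) = 1.0 - 0.5 = 0.5$.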
The linear probability model (LPM)

The LPM has several shortcomings:
- Predictions are not bounded between $[0, 1]$ ⇒ may yield nonsensical predictions
- Heteroskedasticity is present by construction ⇒ easy fix: robust standard errors
- LPM implicitly assumes that the partial effects of $X$ on $P$ are constant, regardless of the initial levels of the $X$
- Errors are not normally distributed

But also many advantages:
- Easy to compute and interpret.
- Estimates are easily comparable with linear estimates of continuous outcomes.
- Widely accepted in the applied econometrics literature.
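The boundedness problem is easy to see numerically. The coefficients below are invented for illustration; extrapolating the fitted line beyond the data produces a "probability" above one.

```python
# Illustration of the LPM boundedness problem.
# Coefficients are invented for illustration.
alpha, beta = 0.5, 0.5      # hypothetical fitted LPM: P_hat = alpha + beta * x
p_hat = alpha + beta * 2    # prediction at x = 2
print(p_hat)                # 1.5 -- not a valid probability
```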
Nonlinear probability models
General class of nonlinear binary choice models

Instead of assuming that $P$ is linear in parameters, we can consider a class of binary response models of the form

$P = P(Y=1) = F(X\beta)$  (8)

where $F(\cdot)$ is a symmetric cumulative distribution function taking on values strictly between 0 and 1. That is, $0 < F(z) < 1$ for all $z \in \mathbb{R}$.

Let's introduce an unobservable index function,

$Y^* = X\beta + \varepsilon$  (9)

with

$Y = \begin{cases} 1 & \text{if } Y^* \ge 0 \\ 0 & \text{if } Y^* < 0 \end{cases}$  (10)

(note that the choice of the threshold is irrelevant)
Nonlinear probability models
General class of nonlinear binary choice models

$Y^* = X\beta + \varepsilon, \qquad Y = \begin{cases} 1 & \text{if } Y^* \ge 0 \\ 0 & \text{if } Y^* < 0 \end{cases}$  (11)

How can we express the probabilities of the two outcomes in such a model?

$P(Y=1) = P(Y^* > 0) = P(X\beta + \varepsilon > 0) = P(\varepsilon > -X\beta) = P(\varepsilon < X\beta)$  (12)

where the last step uses the symmetry of the distribution of $\varepsilon$, and eq (12) is the cdf of $\varepsilon$ evaluated at $X\beta$.
Nonlinear probability models
General class of nonlinear binary choice models

Under these assumptions, the marginal effect of $X$ on $P$ is

$\gamma = \frac{\partial P}{\partial X'} = F'(X\beta)\,\beta = f(X\beta)\,\beta$  (13)

where $f$ is the density function of $F$. Note that $F'$ and $f$ are scalar functions of $X\beta$.

Unlike the LPM, in this model $\beta$ is not sufficient to estimate a marginal effect; $\gamma$ has to be evaluated at some realization of $X$.
Maximum likelihood

The maximum likelihood estimator (MLE) $\hat\theta_{ML} = \hat\theta$ of a parameter $\theta$ is given by maximizing the likelihood

$\hat\theta = \arg\max_\theta L(\theta)$  (14)

where $L(\theta) = f(X, \theta)$ is the density of the data viewed as a function of $\theta$. Typically it is more convenient to maximize the log-likelihood $l(\theta) = \log L(\theta)$. Because of the monotonicity of the log,

$\hat\theta = \arg\max_\theta L(\theta) = \arg\max_\theta l(\theta)$  (15)

For an iid sample $i = 1, \ldots, n$ with probability density $f(X, \theta)$ of $X$,

$L(\theta) = \prod_{i=1}^n f(X_i, \theta) \quad\text{and}\quad l(\theta) = \sum_{i=1}^n \log f(X_i, \theta)$  (16)
Maximum likelihood
Example

Let $X \sim \text{Binomial}(n, \pi)$ with realizations $x$. Then,

$L(\pi) = \binom{n}{x} \pi^x (1-\pi)^{n-x} \propto \pi^x (1-\pi)^{n-x}$  (17)

because the multiplicative constant can be ignored. The log-likelihood is

$l(\pi) = \log L(\pi) = x \log \pi + (n-x) \log(1-\pi)$  (18)

and the maximum likelihood estimator solves

$l'(\pi) = \frac{x}{\pi} - \frac{n-x}{1-\pi} = 0$  (19)

$\Rightarrow \quad \hat\pi_{ML} = \frac{x}{n}$  (20)
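The analytic result $\hat\pi_{ML} = x/n$ can be verified numerically: a sketch with invented numbers ($n = 10$, $x = 7$) that maximizes the log-likelihood (18) over a grid and recovers $x/n$.

```python
import math

# Numerical check of the binomial ML example: with n trials and x
# successes, l(pi) = x*log(pi) + (n-x)*log(1-pi) peaks at pi = x/n.
# The numbers are invented for illustration.
n, x = 10, 7

def loglik(pi):
    return x * math.log(pi) + (n - x) * math.log(1 - pi)

# Grid search over the interior of (0, 1)
grid = [i / 100 for i in range(1, 100)]
pi_hat = max(grid, key=loglik)
print(pi_hat)  # 0.7, i.e. x/n
```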
Maximum likelihood
Properties

In plain English, MLE selects coefficients $\theta$ so as to maximize the joint likelihood of the sample data, i.e., 'maximize the likelihood that the process described by the model produced the data that we actually observe'.

Unlike OLS, MLE generally has no analytic solution; it is an extremum estimator, and statistical software uses iterative numerical procedures to find a vector of coefficients $\hat\theta$ that solves the maximization problem in eq (15).

In finite samples MLE may perform poorly, but as $n \to \infty$ MLE is both consistent and efficient.

Please refer to the other resources for more details, especially on the derivation of test statistics.
Nonlinear probability models

Let's return to binary choice models. We assumed $P = F(X\beta)$ and $\gamma = f\beta$.

The ML function is

$L = P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n) = \prod_{y_i=0} [1 - F(X_i\beta)] \prod_{y_i=1} F(X_i\beta) = \prod_{i=1}^n [1 - F(X_i\beta)]^{1-y_i} F(X_i\beta)^{y_i}$  (21)

Taking logs,

$l = \sum_{i=1}^n \left[ (1-y_i)\ln(1 - F(X_i\beta)) + y_i \ln(F(X_i\beta)) \right]$  (22)
Nonlinear probability models

The foc for maximization is

$\frac{\partial l}{\partial \beta'} = \sum_{i=1}^n \left[ \frac{y_i f(X_i\beta)}{F(X_i\beta)} - \frac{(1-y_i) f(X_i\beta)}{1 - F(X_i\beta)} \right] X_i = 0$  (23)

The solution of the system in eq (23) gives the vector of ML estimates $\hat\beta$.

The asymptotic covariance matrix $V$ of the $\beta$ is the inverse of the negative Hessian

$V = -H^{-1} = \left[ -\frac{\partial^2 \ln L}{\partial \beta \partial \beta'} \right]^{-1}$  (24)

which is a $k \times k$ matrix.
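A minimal sketch of solving the foc (23) numerically: a one-parameter logit (no intercept, scalar $\beta$) estimated by Newton-Raphson, iterating on the score and Hessian. The data are invented for illustration; for logit, the score in (23) simplifies to $\sum_i x_i (y_i - \Lambda(x_i\beta))$.

```python
import math

# Sketch: solve the first-order condition for a one-parameter logit
# by Newton's method.  Data are invented for illustration.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 1, 0, 1]

def lam(z):                       # logistic cdf Lambda(z)
    return 1.0 / (1.0 + math.exp(-z))

def score(b):                     # dl/db = sum x_i (y_i - Lambda(x_i b))
    return sum(x * (y - lam(x * b)) for x, y in zip(xs, ys))

def hessian(b):                   # d2l/db2 = -sum x_i^2 Lambda (1 - Lambda)
    return -sum(x * x * lam(x * b) * (1 - lam(x * b)) for x in xs)

b = 0.0
for _ in range(50):               # Newton-Raphson iterations
    b -= score(b) / hessian(b)

print(b, score(b))                # score is ~0 at the ML estimate
```

Note that $-1/\text{hessian}(\hat\beta)$ is the scalar analogue of the covariance formula (24).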
Nonlinear probability models
Coefficients and interpretation

Recall that

$\gamma = \frac{\partial P}{\partial X'} = F'\beta = f\beta$  (25)

and that $f$ is a function of $X\beta$. Thus, to estimate the probability of the outcome and the marginal effects we need an estimate of $\beta$ and some realization of $X$. Where should we evaluate the estimates of $P$ and $\gamma$?

1. Compute $\hat P$ and $\hat\gamma$ for each $i$ and then take averages over all observations
2. for the sample mean of the observations $X_i$
3. for a particularly interesting value of $X$ (e.g., median)
4. for an artificially created individual with values of $X$ defined by us

where solutions 1 and 2 are asymptotically equivalent but may differ in small samples.
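Strategies 1 and 2 can be contrasted in a small sketch: the average marginal effect (average of $f(x_i\beta)\beta$ over observations) versus the marginal effect at the sample mean ($f(\bar x\beta)\beta$) for a one-regressor logit. The coefficient and data are invented for illustration.

```python
import math

# Average marginal effect (strategy 1) vs. marginal effect at the
# sample mean (strategy 2) for a one-regressor logit.
# Coefficient and data are invented for illustration.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
b = 1.0

def logit_pdf(z):                 # f(z) = Lambda(z) * (1 - Lambda(z))
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1 - p)

ame = sum(logit_pdf(x * b) * b for x in xs) / len(xs)   # strategy 1
xbar = sum(xs) / len(xs)
mem = logit_pdf(xbar * b) * b                           # strategy 2
print(ame, mem)                   # the two differ in this small sample
```

Here the mean of $x$ is 0, where $f$ peaks, so the effect at the mean (0.25) exceeds the average effect; in a small sample the two strategies need not agree.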
Nonlinear probability models

Now we discuss the two most famous nonlinear probability models:
- Probit
- Logit
Probit

Probit simply assumes that $F$ is the standard normal cdf,

$P(Y=1) = F(X\beta) = \int_{-\infty}^{X\beta} \phi(t)\,dt = \Phi(X\beta)$  (26)

Log-likelihood:

$l = \sum_{i=1}^n \left[ (1-y_i)\ln(1 - \Phi(X_i\beta)) + y_i \ln(\Phi(X_i\beta)) \right]$  (27)

Marginal effect:

$\gamma = \frac{\partial P}{\partial X'} = \phi(X\beta)\,\beta$  (28)
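A minimal sketch of the probit pieces (26) and (28), building $\Phi$ from the error function in the standard library. The index value and coefficient are invented for illustration; at $X\beta = 0$, $\Phi(0) = 0.5$ and the marginal effect is $\phi(0)\beta = \beta/\sqrt{2\pi}$.

```python
import math

# Probit building blocks from eqs (26) and (28).
# The index value and coefficient are invented for illustration.
def norm_cdf(z):                  # Phi(z), via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):                  # phi(z), standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

xb, beta = 0.0, 0.8
p = norm_cdf(xb)                  # P(Y=1) = Phi(X beta)
gamma = norm_pdf(xb) * beta       # marginal effect phi(X beta) * beta
print(p, gamma)
```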
Logit

Logit assumes that $F$ is logistic,

$P(Y=1) = F(X\beta) = \frac{e^{X\beta}}{1 + e^{X\beta}} = \Lambda(X\beta)$  (29)

Note that

$F'(X\beta) = f(X\beta) = \Lambda(X\beta)[1 - \Lambda(X\beta)]$  (30)

Log-likelihood:

$l = \sum_{i=1}^n \left[ (1-y_i)\ln(1 - \Lambda(X_i\beta)) + y_i \ln(\Lambda(X_i\beta)) \right]$  (31)

Marginal effect:

$\gamma = \frac{\partial P}{\partial X'} = \Lambda(X\beta)[1 - \Lambda(X\beta)]\,\beta$  (32)
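The logit pieces (29), (30), and (32) in a minimal sketch. Values are invented for illustration; at $X\beta = 0$, $\Lambda = 0.5$, so the marginal effect simplifies to $0.25\,\beta$.

```python
import math

# Logit building blocks from eqs (29), (30), and (32).
# The index value and coefficient are invented for illustration.
def lam(z):                        # Lambda(z), logistic cdf
    return 1.0 / (1.0 + math.exp(-z))

xb, beta = 0.0, 0.8
p = lam(xb)                        # P(Y=1) = Lambda(X beta)
gamma = p * (1 - p) * beta         # Lambda * (1 - Lambda) * beta
print(p, gamma)                    # at X beta = 0: 0.5 and 0.25 * beta
```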