ECON 626: Applied Microeconomics Lecture 11: Maximum Likelihood Estimation Professors: Pamela Jakiela and Owen Ozier
Maximum Likelihood: Motivation

So far, we've been thinking about average treatment effects, but the ATE may or may not be the main quantity of interest in a given study
• Imperfect compliance ⇒ LATE/TOT estimates
• Outcomes may be censored (as in a tobit model)
  ◮ OLS estimates of the treatment effect are inconsistent
• Treatments may impact specific parameters in a structural or theoretical model; we may want to know how much those parameters change
  ◮ Theory can provide a framework for estimating treatment effects
ML approaches can help to translate treatment effects into “economics”

UMD Economics 626: Applied Microeconomics Lecture 11: Maximum Likelihood, Slide 2
Maximum Likelihood: Overview

In ML estimation, the data-generating process is the theoretical model
• First key decision: what is your theoretical model?
  ◮ Examples: utility function, production function, hazard model
• Second key decision: continuous vs. discrete outcome variable
  ◮ Censoring, extensions lead to intermediate cases
• Third key decision: structure of the error term
  ◮ Typically additive, but distribution matters
OLS in a Maximum Likelihood Framework

Consider a linear model:
$$y_i = X_i'\beta + \varepsilon_i \quad \text{where } \varepsilon_i \mid X_i \sim N(0, \sigma^2) \;\Rightarrow\; y_i \mid X_i \sim N(X_i'\beta, \sigma^2)$$
The normal error term characterizes the distribution of $y$:
$$f(y \mid X; \theta) = \frac{1}{\sigma\sqrt{2\pi}} \cdot e^{-\left(\frac{y - X'\beta}{\sigma}\right)^2 / 2} = \frac{1}{\sigma}\,\phi\!\left(\frac{y - X'\beta}{\sigma}\right) = L(\theta)$$
where $\theta = (\beta, \sigma)$
OLS in a Maximum Likelihood Framework

Knowing $f(y \mid X; \theta)$, we can write down the log-likelihood function for $\theta$:
$$\ell(\theta) = \sum_i \ln\left[ f(y_i \mid X_i; \theta) \right] = \sum_i \ln\left[ \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - X_i'\beta}{\sigma}\right) \right]$$
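The link between this log-likelihood and OLS can be checked numerically. The sketch below is illustrative only: it uses Python rather than the Stata used elsewhere in these slides, a single regressor plus a constant, and made-up data. The function names (`normal_loglik`, `ols_fit`) are hypothetical. It evaluates the log-likelihood at the OLS coefficients, with sigma set to the ML value sqrt(SSR/n), so that nearby parameter values can be compared against it.

```python
import math

def normal_loglik(beta0, beta1, sigma, xs, ys):
    """Sum over i of ln[(1/sigma) * phi((y_i - beta0 - beta1 * x_i) / sigma)]."""
    total = 0.0
    for x, y in zip(xs, ys):
        u = (y - beta0 - beta1 * x) / sigma
        log_phi = -0.5 * u * u - 0.5 * math.log(2.0 * math.pi)  # ln phi(u)
        total += log_phi - math.log(sigma)
    return total

def ols_fit(xs, ys):
    """Closed-form OLS for one regressor and a constant, plus the ML sigma."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    sigma_ml = math.sqrt(ssr / n)  # ML divides by n, not n - k
    return b0, b1, sigma_ml

# Made-up illustrative data (an assumption, not from the lecture)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

b0, b1, s = ols_fit(xs, ys)
best = normal_loglik(b0, b1, s, xs, ys)
```

Perturbing any of the three parameters away from `(b0, b1, s)` lowers the log-likelihood, which is the sense in which OLS is the ML estimator under normal errors.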
ML Estimation in Stata

Estimating $\hat{\beta}$ in Stata:

    capture program drop myols
    program myols
        * lnf: observation-level log-likelihood; beta: linear index; sigma: error s.d.
        args lnf beta sigma
        quietly replace `lnf' = log((1/`sigma')*normalden(($ML_y1-`beta')/`sigma'))
    end
    * method lf: one equation for the linear index, one constant-only equation for sigma
    ml model lf myols (beta: y = x) /sigma
    ml maximize

where $ML_y1 is the dependent variable
• By default, Stata imposes a linear structure on the independent variables: the argument beta holds the linear index for each observation
Tobit Estimation

Suppose we only observe $y_i^*$ if $y_i^* > 0$:
$$C_i = \begin{cases} 0 & \text{if } y_i^* > 0 \\ 1 & \text{if } y_i^* \le 0 \end{cases}$$
So, we observe $(X_i,\; y_i^* \cdot (1 - C_i),\; C_i)$ for each observation $i$

With censoring of $y_i^*$ at 0, the likelihood function takes the form:
$$L_i(\theta) = \left[ f(y_i^* \mid X_i; \theta) \right]^{1 - C_i} \cdot \left[ \Pr(y_i^* \le 0 \mid X_i; \theta) \right]^{C_i}$$
Tobit Estimation

Since $\varepsilon_i = y_i^* - X_i'\beta$, we know that:
$$\Pr(y_i^* \le 0 \mid X_i; \theta) = \Pr(\varepsilon_i \le -X_i'\beta) = \Phi\!\left(\frac{-X_i'\beta}{\sigma}\right) = 1 - \Phi\!\left(\frac{X_i'\beta}{\sigma}\right)$$
We can re-write the likelihood as:
$$L_i(\theta) = \left[ \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - X_i'\beta}{\sigma}\right) \right]^{1 - C_i} \cdot \left[ 1 - \Phi\!\left(\frac{X_i'\beta}{\sigma}\right) \right]^{C_i}$$
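The two pieces of the tobit likelihood can be written as a single observation-level function. This is a minimal sketch in Python (the slides themselves use Stata), using only the standard library. The function names are hypothetical, `xb` stands for the linear index X_i'β, and `y_obs` stands for the observed outcome y*·(1 − C). The censored branch uses the identity 1 − Φ(u) = 0.5·erfc(u/√2), which is algebraically the same quantity but numerically safer in the tail.

```python
import math

def log_normal_density(y, mean, sigma):
    """ln[(1/sigma) * phi((y - mean) / sigma)]."""
    u = (y - mean) / sigma
    return -0.5 * u * u - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)

def upper_tail(u):
    """1 - Phi(u) for a standard normal, computed via erfc."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def tobit_loglik_i(y_obs, xb, sigma):
    """Log-likelihood of one observation censored from below at 0."""
    if y_obs > 0:
        # Uncensored (C_i = 0): log density of y_i given the index
        return log_normal_density(y_obs, xb, sigma)
    # Censored (C_i = 1): ln[1 - Phi(xb / sigma)]
    return math.log(upper_tail(xb / sigma))

def tobit_loglik(ys, xbs, sigma):
    """Sample log-likelihood: sum of the observation-level terms."""
    return sum(tobit_loglik_i(y, xb, sigma) for y, xb in zip(ys, xbs))
```

For instance, a censored observation with index 0 contributes ln(0.5), since half of the latent distribution lies at or below zero.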
Tobit ML Estimation in Stata

Modifying the Stata likelihood function to adjust for censoring:

    capture program drop mytobit
    program mytobit
        args lnf beta sigma
        * uncensored observations: log of the normal density
        quietly replace `lnf' = log((1/`sigma')*normalden(($ML_y1-`beta')/`sigma'))
        * censored observations (recorded as 0): log of Pr(y* <= 0)
        quietly replace `lnf' = log(1-normal(`beta'/`sigma')) if $ML_y1==0
    end
    ml model lf mytobit (beta: ystar = x) /sigma
    ml maximize
Why Use Maximum Likelihood?

Many economic applications start from a non-linear model of an individual decision rule or some other underlying structural process
• Impacts on preferences (e.g. risk, time)
• Duration of unemployment spells
• CES production, utility functions
Maximum likelihood in Stata vs. Matlab:
• Stata is fast and (relatively) easy, if it converges
• No restrictions on the functional form of the likelihood in Matlab
• Broader range of optimization options in Matlab
Maximum Likelihood Estimation

Let $y_i$ be the observed decision in choice situation $i$ for $i = 1, \ldots, I$:
$$y_i = g(x_i; \theta) + \varepsilon_i$$
where $x_i$ denotes the exogenous parameters of the situation (e.g. price), $\theta$ denotes the preference/structural parameters, and $\varepsilon_i \sim N(0, \sigma^2)$
• Space of outcomes/choices is continuous (i.e. not discrete)
• $g(x_i; \theta) + \varepsilon_i$ is the structural model (e.g. demand function)
  ◮ Often derived by solving for utility-maximizing choice
Because $\varepsilon_i \sim N(0, \sigma^2)$, we know that
$$\underbrace{y_i - g(x_i; \theta)}_{\varepsilon_i} \sim N(0, \sigma^2) \;\Rightarrow\; y_i \sim N(g(x_i; \theta), \sigma^2)$$
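Written out, the normality assumption implies the sample log-likelihood below. This is a sketch using only the symbols defined on this slide, reading the second argument of N(·,·) as the error variance and assuming independent errors across choice situations:

```latex
\[
\ell(\theta, \sigma)
  = \sum_{i=1}^{I} \ln\left[ \frac{1}{\sigma}\,
      \phi\!\left( \frac{y_i - g(x_i; \theta)}{\sigma} \right) \right]
  = -\frac{I}{2} \ln\left( 2\pi\sigma^2 \right)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{I} \left( y_i - g(x_i; \theta) \right)^2
\]
```

For any fixed σ, maximizing this expression over θ is equivalent to minimizing the sum of squared residuals, so without censoring the ML estimate of θ coincides with nonlinear least squares.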
Maximum Likelihood Estimation: CRRA Example

Assume utility over income takes the constant relative risk aversion (CRRA) form given risk aversion parameter $\rho > 0$:
$$u(x) = \frac{x^{1-\rho}}{1-\rho}$$
Agent chooses an amount, $z \in [0, b]$, to invest in a risky security that yields payoff of 0 with probability $\frac{1}{2}$ and payoff of $\lambda z$ with probability $\frac{1}{2}$:
$$\max_{z \in [0, b]} \; \frac{1}{2(1-\rho)} \left[ (b - z)^{1-\rho} + (b + \lambda z)^{1-\rho} \right]$$
The optimal interior allocation to the risky security is given by
$$z^*(b, \lambda) = b \left[ \frac{\lambda^{1/\rho} - 1}{\lambda^{1/\rho} + \lambda} \right]$$
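The closed-form allocation can be checked against a brute-force maximization of expected utility. The sketch below is in Python for illustration (the slides use Stata). The function names and the values b = 100, λ = 3, ρ = 0.5 are assumptions chosen for the example, and the utility function as coded requires ρ ≠ 1 and positive income.

```python
def crra_utility(x, rho):
    """CRRA utility x^(1 - rho) / (1 - rho); assumes x > 0 and rho != 1."""
    return x ** (1.0 - rho) / (1.0 - rho)

def expected_utility(z, b, lam, rho):
    """Expected utility of investing z: income b - z or b + lam * z, each w.p. 1/2."""
    return 0.5 * crra_utility(b - z, rho) + 0.5 * crra_utility(b + lam * z, rho)

def z_star(b, lam, rho):
    """Closed-form interior optimum from the first-order condition."""
    k = lam ** (1.0 / rho)
    return b * (k - 1.0) / (k + lam)

# Illustrative parameter values (assumptions, not from the lecture)
b, lam, rho = 100.0, 3.0, 0.5

z_closed = z_star(b, lam, rho)

# Brute-force check on a grid with step 0.01, keeping b - z strictly positive
grid = [b * j / 10000.0 for j in range(10000)]
z_grid = max(grid, key=lambda z: expected_utility(z, b, lam, rho))
```

Because expected utility is concave in z when ρ > 0, the grid maximizer lands within one grid step of `z_closed`.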
Maximum Likelihood Estimation: CRRA Example

People implement their choices with error:
$$z_i = z^*(b, \lambda) + \varepsilon_i \quad \text{where } \varepsilon_i \mid b, \lambda \sim N(0, \sigma^2)$$
The normal error term characterizes the distribution of $z_i$:
$$f(z_i \mid b, \lambda; \theta) = \frac{1}{\sigma\sqrt{2\pi}} \cdot e^{-\left(\frac{z_i - z_i^*(b, \lambda)}{\sigma}\right)^2 / 2} = \frac{1}{\sigma}\,\phi\!\left(\frac{z_i - z_i^*(b, \lambda)}{\sigma}\right)$$
where $\theta = (\rho, \sigma)$
Maximum Likelihood Estimation: CRRA Example

We only observe $z_i^*$ if $z_i^* < b$:
$$C_i = \begin{cases} 0 & \text{if } z_i^* < b \\ 1 & \text{if } z_i^* \ge b \end{cases}$$
With censoring, the likelihood function takes the form:
$$L_i(\theta) = \left[ f(z_i \mid b, \lambda; \theta) \right]^{1 - C_i} \cdot \left[ \Pr(z_i \ge b; \theta) \right]^{C_i}$$
Log likelihood takes the form:
$$\ell_i(\theta) = (1 - C_i) \ln\left[ f(z_i \mid b, \lambda; \theta) \right] + C_i \ln\left[ \Pr(z_i \ge b \mid \theta) \right]$$
Maximum Likelihood Estimation: CRRA Example

Because we know that $\varepsilon_i \mid b, \lambda \sim N(0, \sigma^2)$, we can calculate:
$$\Pr(z_i \ge b \mid \theta) = \Pr\left( z_i^*(b, \lambda) + \varepsilon_i \ge b \mid \theta \right) = 1 - \Pr\left( \varepsilon_i \le b - z_i^*(b, \lambda) \mid \theta \right) = 1 - \Phi\!\left(\frac{b - z_i^*(b, \lambda)}{\sigma}\right)$$
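Putting the density and the censoring probability together gives a log-likelihood that can be maximized over ρ. The sketch below is in Python for illustration (the slides use Stata) and makes several simplifying assumptions: the data are made up, σ is held fixed at an arbitrary value instead of being estimated, and a coarse grid search stands in for a numerical optimizer. All function names are hypothetical. An observation is treated as censored when the recorded choice equals the budget b.

```python
import math

def z_star(b, lam, rho):
    """Optimal interior allocation under CRRA utility."""
    k = lam ** (1.0 / rho)
    return b * (k - 1.0) / (k + lam)

def loglik_i(z, b, lam, rho, sigma):
    """Log-likelihood of one choice z, censored from above at b."""
    mu = z_star(b, lam, rho)
    if z >= b:
        # Censored (C_i = 1): ln[1 - Phi((b - z*) / sigma)], via erfc for tail accuracy
        u = (b - mu) / sigma
        return math.log(0.5 * math.erfc(u / math.sqrt(2.0)))
    # Uncensored (C_i = 0): ln[(1/sigma) * phi((z - z*) / sigma)]
    u = (z - mu) / sigma
    return -0.5 * u * u - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)

def crra_loglik(rho, sigma, data):
    """Sample log-likelihood over (b, lambda, z) triples."""
    return sum(loglik_i(z, b, lam, rho, sigma) for b, lam, z in data)

def grid_search_rho(data, sigma, grid):
    """Return the grid value of rho with the highest log-likelihood."""
    return max(grid, key=lambda rho: crra_loglik(rho, sigma, data))

# Made-up choices (b, lambda, z); the last one is censored at z = b
data = [(100.0, 2.0, 48.0), (100.0, 3.0, 70.0), (100.0, 4.0, 73.0), (100.0, 5.0, 100.0)]
rho_grid = [j / 20.0 for j in range(2, 41)]  # 0.10, 0.15, ..., 2.00
rho_hat = grid_search_rho(data, sigma=10.0, grid=rho_grid)
print(rho_hat)
```

In an actual application one would maximize over (ρ, σ) jointly with a numerical optimizer; the grid search only shows how the censored and uncensored terms enter the objective.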