Microeconometrics Module A: Non-continuous outcomes I Alexander Ahammer Department of Economics, Johannes Kepler University, Linz, Austria Christian Doppler Laboratory Ageing, Health, and the Labor Market, Linz, Austria This version: March 22, 2019 Alexander Ahammer (JKU) Non-continuous outcomes I 1 / 42
Non-continuous outcomes

Often outcome variables are limited, for example, they have only a finite and small number of possible realizations. In this case it does not make sense to treat them as roughly continuous, so we have to look for alternatives to OLS.

- How to model
  - binary choices
  - multiple choices
- Maximum likelihood
A.1 Binary choices
Binary choices

We have a sample of individuals $i = 1, 2, \ldots, N$. For each $i$ we observe a binary variable

$Y = \begin{cases} 1 & \text{with probability } P(Y=1) = P \\ 0 & \text{with probability } P(Y=0) = 1-P \end{cases}$  (1)

$X$ is a row vector of $k$ potential factors that explain which outcome prevails. For individual $i$ we observe the vector $X_i$.

We are interested in the estimated effects of the factors $X$ on the probability of observing $Y = 1$,

$\gamma = \frac{\partial P}{\partial X'}$  (2)

where $\gamma$ is a vector of $k$ marginal effects. Note that

$E(Y) = 1 \cdot P + 0 \cdot (1-P) = P$  (3)
The linear probability model (LPM)

The LPM assumes that

$P = F(X, \beta) = X\beta$  (4)

where $\beta$ is a column vector of $k$ parameters and $X \in \mathbb{R}^{n \times (k+1)}$ includes a constant.

Because of linearity, and using eq (3),

$Y = E(Y) + [Y - E(Y)] = P + [Y - E(Y)] = X\beta + \varepsilon$  (5)

with

$\varepsilon = \begin{cases} 1 - X\beta & \text{with probability } P \\ -X\beta & \text{with probability } 1-P \end{cases}$  (6)
The linear probability model (LPM)

The marginal effect of $X$ on $P$ is

$\gamma = \frac{\partial P}{\partial X'} = \beta$  (7)

which we can estimate using OLS.

How are partial effects $\beta_j$ for some $x_j$ interpreted?
- If $x_j$ is non-binary, $\beta_j$ is the change in the probability of success given a one-unit increase in $x_j$.
- If $x_j$ is binary, $\beta_j$ is the difference in the probability of success when $x_j$ switches from zero to one.
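The interpretation above can be checked with a minimal sketch: OLS on a binary outcome with a single binary regressor, using the closed-form bivariate formulas. The data below are invented purely for illustration.

```python
# LPM sketch: regress a binary outcome on one binary regressor by OLS.
# The data are invented for illustration.
x = [0, 0, 1, 1]   # binary regressor (e.g., a treatment indicator)
y = [0, 1, 1, 1]   # binary outcome

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Closed-form OLS slope and intercept for a single regressor
beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
       sum((xi - xbar) ** 2 for xi in x)
alpha = ybar - beta * xbar

# Fitted P(Y=1 | x) = alpha + beta * x
print(alpha, beta)  # 0.5 0.5
```

With a binary $x_j$, the slope is exactly the difference in estimated success probabilities: here $\hat P(Y=1 \mid x=1) - \hat P(Y=1 \mid x=0) = 1.0 - 0.5 = 0.5$.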
The linear probability model (LPM)

The LPM has several shortcomings:
- Predictions are not bounded between $[0, 1]$ ⇒ may yield nonsensical predictions
- Heteroskedasticity is present by construction ⇒ easy fix: robust standard errors
- LPM implicitly assumes that the partial effects of $X$ on $P$ are constant, regardless of the initial levels of the $X$
- Errors are not normally distributed

But also many advantages:
- Easy to compute and interpret.
- Estimates are easily comparable with linear estimates of continuous outcomes.
- Widely accepted in the applied econometrics literature.
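The boundedness problem is easy to see numerically. The coefficients below are invented for illustration; extrapolating the fitted line beyond the data produces a "probability" above one.

```python
# Illustration of the LPM boundedness problem.
# Coefficients are invented for illustration.
alpha, beta = 0.5, 0.5      # hypothetical fitted LPM: P_hat = alpha + beta * x
p_hat = alpha + beta * 2    # prediction at x = 2
print(p_hat)                # 1.5 -- not a valid probability
```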
Nonlinear probability models
General class of nonlinear binary choice models

Instead of assuming that $P$ is linear in parameters, we can consider a class of binary response models of the form

$P = P(Y=1) = F(X\beta)$  (8)

where $F(\cdot)$ is a symmetric cumulative distribution function taking on values strictly between 0 and 1. That is, $0 < F(z) < 1$ for all $z \in \mathbb{R}$.

Let's introduce an unobservable index function,

$Y^* = X\beta + \varepsilon$  (9)

with

$Y = \begin{cases} 1 & \text{if } Y^* \ge 0 \\ 0 & \text{if } Y^* < 0 \end{cases}$  (10)

(note that the choice of the threshold is irrelevant)
Nonlinear probability models
General class of nonlinear binary choice models

$Y^* = X\beta + \varepsilon, \qquad Y = \begin{cases} 1 & \text{if } Y^* \ge 0 \\ 0 & \text{if } Y^* < 0 \end{cases}$  (11)

How can we express the probabilities of the two outcomes in such a model?

$P(Y=1) = P(Y^* > 0) = P(X\beta + \varepsilon > 0) = P(\varepsilon > -X\beta) = P(\varepsilon < X\beta)$  (12)

where the last step uses the symmetry of the distribution of $\varepsilon$, and eq (12) is the cdf of $\varepsilon$ evaluated at $X\beta$.
Nonlinear probability models
General class of nonlinear binary choice models

Under these assumptions, the marginal effect of $X$ on $P$ is

$\gamma = \frac{\partial P}{\partial X'} = F'(X\beta)\,\beta = f(X\beta)\,\beta$  (13)

where $f$ is the density function of $F$. Note that $F'$ and $f$ are scalar functions of $X\beta$.

Unlike the LPM, in this model $\beta$ is not sufficient to estimate a marginal effect; $\gamma$ has to be evaluated at some realization of $X$.
Maximum likelihood

The maximum likelihood estimator (MLE) $\hat\theta_{ML} = \hat\theta$ of a parameter $\theta$ is given by maximizing the likelihood

$\hat\theta = \arg\max_\theta L(\theta)$  (14)

where $L(\theta) = f(X, \theta)$ is the density of the data viewed as a function of $\theta$. Typically it is more convenient to maximize the log-likelihood $l(\theta) = \log L(\theta)$. Because of the monotonicity of the log,

$\hat\theta = \arg\max_\theta L(\theta) = \arg\max_\theta l(\theta)$  (15)

For an iid sample $i = 1, \ldots, n$ with probability density $f(X, \theta)$ of $X$,

$L(\theta) = \prod_{i=1}^n f(X_i, \theta) \quad\text{and}\quad l(\theta) = \sum_{i=1}^n \log f(X_i, \theta)$  (16)
Maximum likelihood
Example

Let $X \sim \text{Binomial}(n, \pi)$ with realizations $x$. Then,

$L(\pi) = \binom{n}{x} \pi^x (1-\pi)^{n-x} \propto \pi^x (1-\pi)^{n-x}$  (17)

because the multiplicative constant can be ignored. The log-likelihood is

$l(\pi) = \log L(\pi) = x \log \pi + (n-x) \log(1-\pi)$  (18)

and the maximum likelihood estimator solves

$l'(\pi) = \frac{x}{\pi} - \frac{n-x}{1-\pi} = 0$  (19)

$\Rightarrow \quad \hat\pi_{ML} = \frac{x}{n}$  (20)
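The analytic result $\hat\pi_{ML} = x/n$ can be verified numerically: a sketch with invented numbers ($n = 10$, $x = 7$) that maximizes the log-likelihood (18) over a grid and recovers $x/n$.

```python
import math

# Numerical check of the binomial ML example: with n trials and x
# successes, l(pi) = x*log(pi) + (n-x)*log(1-pi) peaks at pi = x/n.
# The numbers are invented for illustration.
n, x = 10, 7

def loglik(pi):
    return x * math.log(pi) + (n - x) * math.log(1 - pi)

# Grid search over the interior of (0, 1)
grid = [i / 100 for i in range(1, 100)]
pi_hat = max(grid, key=loglik)
print(pi_hat)  # 0.7, i.e. x/n
```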
Maximum likelihood
Properties

In plain English, MLE selects coefficients $\theta$ so as to maximize the joint likelihood of the sample data, i.e., 'maximize the likelihood that the process described by the model produced the data that we actually observe'.

Unlike OLS, MLE generally has no analytic solution; it is an extremum estimator, and statistical software uses iterative numerical procedures to find a vector of coefficients $\hat\theta$ that solves the maximization problem in eq (15).

In finite samples MLE may perform poorly, but as $n \to \infty$ MLE is both consistent and efficient.

Please refer to the other resources for more details, especially on the derivation of test statistics.
Nonlinear probability models

Let's return to binary choice models. We assumed $P = F(X\beta)$ and $\gamma = f\beta$.

The ML function is

$L = P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n) = \prod_{y_i=0} [1 - F(X_i\beta)] \prod_{y_i=1} F(X_i\beta) = \prod_{i=1}^n [1 - F(X_i\beta)]^{1-y_i} F(X_i\beta)^{y_i}$  (21)

Taking logs,

$l = \sum_{i=1}^n \left[ (1-y_i)\ln(1 - F(X_i\beta)) + y_i \ln(F(X_i\beta)) \right]$  (22)
Nonlinear probability models

The foc for maximization is

$\frac{\partial l}{\partial \beta'} = \sum_{i=1}^n \left[ \frac{y_i f(X_i\beta)}{F(X_i\beta)} - \frac{(1-y_i) f(X_i\beta)}{1 - F(X_i\beta)} \right] X_i = 0$  (23)

The solution of the system in eq (23) gives the vector of ML estimates $\hat\beta$.

The asymptotic covariance matrix $V$ of the $\beta$ is the inverse of the negative Hessian

$V = -H^{-1} = \left[ -\frac{\partial^2 \ln L}{\partial \beta \partial \beta'} \right]^{-1}$  (24)

which is a $k \times k$ matrix.
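A minimal sketch of solving the foc (23) numerically: a one-parameter logit (no intercept, scalar $\beta$) estimated by Newton-Raphson, iterating on the score and Hessian. The data are invented for illustration; for logit, the score in (23) simplifies to $\sum_i x_i (y_i - \Lambda(x_i\beta))$.

```python
import math

# Sketch: solve the first-order condition for a one-parameter logit
# by Newton's method.  Data are invented for illustration.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 1, 0, 1]

def lam(z):                       # logistic cdf Lambda(z)
    return 1.0 / (1.0 + math.exp(-z))

def score(b):                     # dl/db = sum x_i (y_i - Lambda(x_i b))
    return sum(x * (y - lam(x * b)) for x, y in zip(xs, ys))

def hessian(b):                   # d2l/db2 = -sum x_i^2 Lambda (1 - Lambda)
    return -sum(x * x * lam(x * b) * (1 - lam(x * b)) for x in xs)

b = 0.0
for _ in range(50):               # Newton-Raphson iterations
    b -= score(b) / hessian(b)

print(b, score(b))                # score is ~0 at the ML estimate
```

Note that $-1/\text{hessian}(\hat\beta)$ is the scalar analogue of the covariance formula (24).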
Nonlinear probability models
Coefficients and interpretation

Recall that

$\gamma = \frac{\partial P}{\partial X'} = F'\beta = f\beta$  (25)

and that $f$ is a function of $X\beta$. Thus, to estimate the probability of the outcome and the marginal effects we need an estimate of $\beta$ and some realization of $X$. Where should we evaluate the estimates of $P$ and $\gamma$?

1. Compute $\hat P$ and $\hat\gamma$ for each $i$ and then take averages over all observations
2. for the sample mean of the observations $X_i$
3. for a particularly interesting value of $X$ (e.g., median)
4. for an artificially created individual with values of $X$ defined by us

where solutions 1 and 2 are asymptotically equivalent but may differ in small samples.
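Strategies 1 and 2 can be contrasted in a small sketch: the average marginal effect (average of $f(x_i\beta)\beta$ over observations) versus the marginal effect at the sample mean ($f(\bar x\beta)\beta$) for a one-regressor logit. The coefficient and data are invented for illustration.

```python
import math

# Average marginal effect (strategy 1) vs. marginal effect at the
# sample mean (strategy 2) for a one-regressor logit.
# Coefficient and data are invented for illustration.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
b = 1.0

def logit_pdf(z):                 # f(z) = Lambda(z) * (1 - Lambda(z))
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1 - p)

ame = sum(logit_pdf(x * b) * b for x in xs) / len(xs)   # strategy 1
xbar = sum(xs) / len(xs)
mem = logit_pdf(xbar * b) * b                           # strategy 2
print(ame, mem)                   # the two differ in this small sample
```

Here the mean of $x$ is 0, where $f$ peaks, so the effect at the mean (0.25) exceeds the average effect; in a small sample the two strategies need not agree.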
Nonlinear probability models

Now we discuss the two most famous nonlinear probability models:
- Probit
- Logit
Probit

Probit simply assumes that $F$ is the standard normal cdf,

$P(Y=1) = F(X\beta) = \int_{-\infty}^{X\beta} \phi(t)\,dt = \Phi(X\beta)$  (26)

Log-likelihood:

$l = \sum_{i=1}^n \left[ (1-y_i)\ln(1 - \Phi(X_i\beta)) + y_i \ln(\Phi(X_i\beta)) \right]$  (27)

Marginal effect:

$\gamma = \frac{\partial P}{\partial X'} = \phi(X\beta)\,\beta$  (28)
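A minimal sketch of the probit pieces (26) and (28), building $\Phi$ from the error function in the standard library. The index value and coefficient are invented for illustration; at $X\beta = 0$, $\Phi(0) = 0.5$ and the marginal effect is $\phi(0)\beta = \beta/\sqrt{2\pi}$.

```python
import math

# Probit building blocks from eqs (26) and (28).
# The index value and coefficient are invented for illustration.
def norm_cdf(z):                  # Phi(z), via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):                  # phi(z), standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

xb, beta = 0.0, 0.8
p = norm_cdf(xb)                  # P(Y=1) = Phi(X beta)
gamma = norm_pdf(xb) * beta       # marginal effect phi(X beta) * beta
print(p, gamma)
```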
Logit

Logit assumes that $F$ is logistic,

$P(Y=1) = F(X\beta) = \frac{e^{X\beta}}{1 + e^{X\beta}} = \Lambda(X\beta)$  (29)

Note that

$F'(X\beta) = f(X\beta) = \Lambda(X\beta)[1 - \Lambda(X\beta)]$  (30)

Log-likelihood:

$l = \sum_{i=1}^n \left[ (1-y_i)\ln(1 - \Lambda(X_i\beta)) + y_i \ln(\Lambda(X_i\beta)) \right]$  (31)

Marginal effect:

$\gamma = \frac{\partial P}{\partial X'} = \Lambda(X\beta)[1 - \Lambda(X\beta)]\,\beta$  (32)
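The logit pieces (29), (30), and (32) in a minimal sketch. Values are invented for illustration; at $X\beta = 0$, $\Lambda = 0.5$, so the marginal effect simplifies to $0.25\,\beta$.

```python
import math

# Logit building blocks from eqs (29), (30), and (32).
# The index value and coefficient are invented for illustration.
def lam(z):                        # Lambda(z), logistic cdf
    return 1.0 / (1.0 + math.exp(-z))

xb, beta = 0.0, 0.8
p = lam(xb)                        # P(Y=1) = Lambda(X beta)
gamma = p * (1 - p) * beta         # Lambda * (1 - Lambda) * beta
print(p, gamma)                    # at X beta = 0: 0.5 and 0.25 * beta
```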