
Classical Discrete Choice Theory
James J. Heckman, University of Chicago, Econ 312, Spring 2019

Classical regression model:
  y = xβ + ε,  E(ε | x) = 0,  ε ~ N(0, σ²I)


  1. Debreu (1960) Criticism of the Luce Model
• "Red Bus–Blue Bus Problem"
• Suppose the (N+1)-th alternative is identical to the first:
  Pr(choose 1 or N+1 | s, B′) = 2 e^{θ(s)′x_{N+1}} / Σ_{l=1}^{N+1} e^{θ(s)′x_l}
• ⇒ Introducing an identical good changes the probability of riding a bus.
• Not an attractive result.
• It comes from the need to impose the iid assumption on the new alternative.
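The IIA property behind Debreu's criticism is easy to see numerically. A minimal sketch (hypothetical utilities; plain Luce/MNL probabilities):

```python
import numpy as np

def mnl_probs(v):
    """Luce/MNL choice probabilities: P(i) = e^{v_i} / sum_l e^{v_l}."""
    e = np.exp(v - np.max(v))        # subtract max for numerical stability
    return e / e.sum()

# Car and bus equally attractive: each chosen with probability 1/2.
p_two = mnl_probs(np.array([1.0, 1.0]))          # [car, bus]

# Split the bus into two identical alternatives (red bus, blue bus):
# the model now gives the car only 1/3, even though nothing real changed.
p_three = mnl_probs(np.array([1.0, 1.0, 1.0]))   # [car, red bus, blue bus]
```

Duplicating the bus raises the total bus share from 1/2 to 2/3, which is exactly the unattractive substitution pattern the slide describes.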

  2. Debreu (1960) Criticism of the Luce Model: Some Alternative Assumptions
1 Could let v_i = ln(θ(s)′x_i), so that
  Pr(j | s, B) = θ(s)′x_j / Σ_{l=1}^{N} θ(s)′x_l
  If we also imposed Σ_{l=1}^{N} θ(s)′x_l = 1, we would get the linear probability model, but this could violate IIA.
2 Could consider a model of the form
  Pr(j | s, B) = e^{θ_j(s)′x_j} / Σ_{l=1}^{N} e^{θ_l(s)′x_l}
  but here we have lost our forecasting ability (cannot predict demand for a new good).
3 Universal logit model:
  Pr(i | s, x_1, …, x_N) = e^{φ_i(x_1,…,x_N)′β(s)} / Σ_{l=1}^{N} e^{φ_l(x_1,…,x_N)′β(s)}
  Here we lose both IIA and forecasting (Bernstein polynomial).

  3. Criteria for a Good PCS
Goal: We want a probabilistic choice model that
1 has a flexible functional form,
2 is computationally practical,
3 allows for flexibility in representing substitution patterns among choices,
4 is consistent with a random utility model (RUM) ⇒ has a structural interpretation.

  4. How Do You Verify That a Candidate PCS Is Consistent with a RUM?
(a) Either start with a RUM, u_i = v(s, x_i) + ε(s, x_i), and solve the integral
  Pr(u_i > u_l, ∀ l ≠ i) = Pr( i = argmax_l (v_l + ε_l) ),
(b) or start with a candidate PCS and verify that it is consistent with a RUM (easier).
• McFadden provides sufficient conditions.
• See the discussion of the Daly-Zachary-Williams theorem.

  5. Link to AIRUM Models

  6. Daly-Zachary-Williams Theorem
• Daly-Zachary (1976) and Williams (1977) provide a set of conditions that makes it easy to derive a PCS from a RUM for a class of models ("generalized extreme value" (GEV) models).
• Define G: (Y_1, …, Y_J) ↦ G(Y_1, …, Y_J).
• Suppose G satisfies the following:
1 nonnegative: G ≥ 0 for Y_1, …, Y_J ≥ 0;
2 homogeneous of degree one in its arguments;
3 lim_{Y_i→∞} G(Y_1, …, Y_i, …, Y_J) = ∞, ∀ i = 1, …, J;
4 the mixed partial ∂^k G / (∂Y_{i_1} ⋯ ∂Y_{i_k}) (distinct indices) is nonnegative if k is odd and nonpositive if k is even.  (1)

  7. • Then for a RUM with u_i = v_i + ε_i and
  F(ε_1, …, ε_J) = exp{ −G( e^{−ε_1}, …, e^{−ε_J} ) }
• this cdf has Weibull marginals but allows for more dependence among the ε's.
• The PCS is given by
  P_i = ∂ ln G / ∂ v_i = e^{v_i} G_i( e^{v_1}, …, e^{v_J} ) / G( e^{v_1}, …, e^{v_J} )
• Note: McFadden shows that under certain conditions on the form of the indirect utility function (it satisfies the AIRUM form), the DZW result can be seen as a form of Roy's identity.

  8. • Let's apply this result.
• Multinomial logit model (MNL):
  F(ε_1, …, ε_J) = e^{−e^{−ε_1}} ⋯ e^{−e^{−ε_J}} = e^{−Σ_{j=1}^{J} e^{−ε_j}}  ← product of iid Weibull cdfs
• Can verify that G( e^{v_1}, …, e^{v_J} ) = Σ_{j=1}^{J} e^{v_j} satisfies the DZW conditions, so
  P(j) = ∂ ln G / ∂ v_j = e^{v_j} / Σ_{l=1}^{J} e^{v_l} = MNL model

  9. • Another GEV model:
• Nested logit model (addresses, to a limited extent, the IIA criticism).
• Let
  G( e^{v_1}, …, e^{v_J} ) = Σ_{m=1}^{M} a_m ( Σ_{i∈B_m} e^{v_i/(1−σ_m)} )^{1−σ_m}
  (1/(1−σ_m) acts like an elasticity of substitution within branch m)

  10. • Idea: divide goods into branches.
• First choose a branch, then a good within the branch (e.g., car vs. bus; then red bus vs. blue bus).
• This allows for correlation between the errors within a branch (this is the role of σ_m).
• B_m ⊆ {1, …, J}, with ∪_{m=1}^{M} B_m = B; a branch may contain a single good—we need not have all choices on all branches.

  11. • Note: if σ_m = 0 for all m, we get the usual MNL form.
• Calculate:
  P_i = ∂ ln G / ∂ v_i
      = ∂ ln[ Σ_{m=1}^{M} a_m ( Σ_{j∈B_m} e^{v_j/(1−σ_m)} )^{1−σ_m} ] / ∂ v_i
      = Σ_{m: i∈B_m} a_m ( Σ_{j∈B_m} e^{v_j/(1−σ_m)} )^{−σ_m} e^{v_i/(1−σ_m)} / Σ_{m=1}^{M} a_m ( Σ_{j∈B_m} e^{v_j/(1−σ_m)} )^{1−σ_m}
      = Σ_{m=1}^{M} P(i | B_m) P(B_m)

  12. • where
  P(i | B_m) = e^{v_i/(1−σ_m)} / Σ_{j∈B_m} e^{v_j/(1−σ_m)} if i ∈ B_m, 0 otherwise
  P(B_m) = a_m ( Σ_{j∈B_m} e^{v_j/(1−σ_m)} )^{1−σ_m} / Σ_{m′=1}^{M} a_{m′} ( Σ_{j∈B_{m′}} e^{v_j/(1−σ_{m′})} )^{1−σ_{m′}}
• Note: if P(B_m) = 1, we get the logit form.
• Nested logit requires that the analyst make choices about the nesting structure.
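The decomposition P_i = Σ_m P(i | B_m) P(B_m) can be coded directly from the GEV generator. A sketch (illustrative utilities and branch structure; branch weights a_m = 1):

```python
import numpy as np

def nested_logit_probs(v, branches, sigmas):
    """Nested logit P_i = sum_m P(i|B_m) P(B_m), with a_m = 1 for all m.

    v: utilities; branches: list of index arrays B_m; sigmas: per-branch
    similarity parameters sigma_m in [0, 1)."""
    v = np.asarray(v, float)
    probs = np.zeros_like(v)
    # Inclusive values: a_m * (sum_{i in B_m} e^{v_i/(1-sigma_m)})^{1-sigma_m}
    S = [np.exp(v[B] / (1 - s)).sum() for B, s in zip(branches, sigmas)]
    G = sum(Sm ** (1 - s) for Sm, s in zip(S, sigmas))
    for B, s, Sm in zip(branches, sigmas, S):
        P_Bm = Sm ** (1 - s) / G                 # branch probability P(B_m)
        probs[B] += np.exp(v[B] / (1 - s)) / Sm * P_Bm   # P(i|B_m) P(B_m)
    return probs

# Red-bus/blue-bus: car alone on one branch, two identical buses on another.
v = [1.0, 1.0, 1.0]                              # [car, red bus, blue bus]
p = nested_logit_probs(v, [np.array([0]), np.array([1, 2])], [0.0, 0.99])
# With sigma near 1, the car keeps ~1/2 and each bus gets ~1/4.
```

Setting both σ's to 0 recovers the plain MNL shares of 1/3 each, illustrating how σ controls the within-branch substitution pattern.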

  13. • How does nested logit solve the red-bus/blue-bus problem?
• Suppose Y_i = e^{v_i} and
  G = Y_1 + ( Y_2^{1/(1−σ)} + Y_3^{1/(1−σ)} )^{1−σ}

  14.
  P(1 | {1,2,3}) = ∂ ln G / ∂ v_1 = e^{v_1} / [ e^{v_1} + ( e^{v_2/(1−σ)} + e^{v_3/(1−σ)} )^{1−σ} ]
  P(2 | {1,2,3}) = ∂ ln G / ∂ v_2 = e^{v_2/(1−σ)} ( e^{v_2/(1−σ)} + e^{v_3/(1−σ)} )^{−σ} / [ e^{v_1} + ( e^{v_2/(1−σ)} + e^{v_3/(1−σ)} )^{1−σ} ]

  15. • As v_3 → −∞,
  P(1 | {1,2,3}) → e^{v_1} / ( e^{v_1} + e^{v_2} )  (get logistic)
• As v_1 → −∞,
  P(2 | {1,2,3}) → e^{v_2/(1−σ)} / ( e^{v_2/(1−σ)} + e^{v_3/(1−σ)} )

  16. What Role Does σ Play?
• σ is the degree-of-substitutability parameter.
• Recall F(ε_1, ε_2, ε_3) = exp{ −G( e^{−ε_1}, e^{−ε_2}, e^{−ε_3} ) }.
• Here
  σ = cov(ε_2, ε_3) / √( var ε_2 · var ε_3 ) = correlation coefficient
• Thus we require −1 ≤ σ ≤ 1, but it turns out we also need σ > 0 for the DZW conditions to be satisfied. This is unfortunate because it does not allow the ε's to be negatively correlated.
• Can show that
  lim_{σ→1} P(1 | {1,2,3}) = e^{v_1} / ( e^{v_1} + max(e^{v_2}, e^{v_3}) )  (L'Hôpital's rule)

  17. • If v_2 = v_3, then
  P(2 | {1,2,3}) = e^{v_2/(1−σ)} ( 2 e^{v_2/(1−σ)} )^{−σ} / [ e^{v_1} + ( 2 e^{v_2/(1−σ)} )^{1−σ} ]
                = 2^{−σ} e^{v_2} / ( e^{v_1} + 2^{1−σ} e^{v_2} )
  lim_{σ→1} P(2 | {1,2,3}) = (1/2) e^{v_2} / ( e^{v_1} + e^{v_2} )
  ↑ introducing a third identical alternative cuts the probability of choosing 2 in half
• Solves the red-bus/blue-bus problem.
• The probability is cut in half with two identical alternatives.

  18. (Branches: car alone; red bus and blue bus together.)
• σ is a measure of similarity between the red and the blue bus.
• When σ is close to one, the conditional choice probability selects, with high probability, the alternative with the higher utility within the branch.

  19. • Remark: we can expand the nested logit to accommodate multiple levels, e.g., with three levels
  G = Σ_{q=1}^{Q} a_q [ Σ_{m∈Q_q} a_m ( Σ_{i∈B_m} y_i^{1/(1−σ_m)} )^{1−σ_m} ]

  20. • Example: two nested choices
1 Neighborhood (m)
2 Transportation mode (t)
3 P(m): choice of neighborhood
4 P(t | B_m): probability of choosing the t-th mode, given neighborhood m

  21. • Not all modes are available in all neighborhoods (T_m modes in neighborhood m).
  P_{t|m} = e^{v(m,t)/(1−σ_m)} / Σ_{t′=1}^{T_m} e^{v(m,t′)/(1−σ_m)}
  P_m = ( Σ_{t=1}^{T_m} e^{v(m,t)/(1−σ_m)} )^{1−σ_m} / Σ_{j=1}^{M} ( Σ_{t=1}^{T_j} e^{v(j,t)/(1−σ_j)} )^{1−σ_j} = P(B_m)
  P_{m,t} = P_{t|m} P_m = e^{v(m,t)/(1−σ_m)} ( Σ_{t′=1}^{T_m} e^{v(m,t′)/(1−σ_m)} )^{−σ_m} / Σ_{j=1}^{M} ( Σ_{t′=1}^{T_j} e^{v(j,t′)/(1−σ_j)} )^{1−σ_j}

  22. • A standard type of utility function that people might use:
  v(m,t) = z_t′γ + x_{mt}′β + y_m′α

  23. • z_t is transportation-mode characteristics, x_{mt} is interactions, and y_m is neighborhood characteristics.
• Then
  P_{t|m} = e^{(z_t′γ + x_{mt}′β)/(1−σ_m)} / Σ_{t′=1}^{T_m} e^{(z_{t′}′γ + x_{mt′}′β)/(1−σ_m)}
  P_m = e^{y_m′α} ( Σ_{t=1}^{T_m} e^{(z_t′γ + x_{mt}′β)/(1−σ_m)} )^{1−σ_m} / Σ_{j=1}^{M} e^{y_j′α} ( Σ_{t=1}^{T_j} e^{(z_t′γ + x_{jt}′β)/(1−σ_j)} )^{1−σ_j}

  24. • Estimation (in two steps) (see Amemiya, Chapter 9).
• Let
  I_m = Σ_{t=1}^{T_m} e^{(z_t′γ + x_{mt}′β)/(1−σ_m)}

  25.
1 Within each neighborhood, get γ/(1−σ_m) and β/(1−σ_m) by logit.
2 Form Î_m.
3 Then estimate
  P_m = e^{y_m′α + (1−σ_m) ln Î_m} / Σ_{j=1}^{M} e^{y_j′α + (1−σ_j) ln Î_j}
  by MLE to get α̂, σ̂_m.
• Assume σ_m = σ_j ∀ j, m, or at least impose some restrictions across neighborhoods.
• Note: Î_m is an estimated regressor ("Durbin problem").
• Need to correct the standard errors.

  26. Multinomial Probit Models
• Also known as:
1 Thurstone Model (1929, 1930)
2 Thurstone-Quandt Model
3 Developed by Domencich-McFadden (1978) (on reading list)
  u_i = v_i + η_i,  i = 1, …, J
  v_i = Z_iβ  (linear-in-parameters form)
  u_i = Z_iβ + η_i
  MNL: (i) β fixed; (ii) η_i iid
  MNP: (i) β random coefficient, β ~ N(β̄, Σ_β); (ii) β independent of η, η ~ N(0, Σ_η)
• Allows general forms of correlation between the errors.

  27.
  u_i = Z_iβ̄ + Z_i( β − β̄ ) + η_i
• (β − β̄) = ε, and Z_iε + η_i is a composite heteroskedastic error term.
• β random ⇒ taste heterogeneity.
• η_i can be interpreted as unobserved attributes of goods.
• The main advantage of MNP over MNL is that it allows for a general error covariance structure.
• Note: to make computation easier, users sometimes set Σ_β = 0 (fixed-coefficient version).
• Allowing β to be random permits random taste variation—it allows for the possibility that different persons value characteristics differently.

  28. Problem of Identification and Normalization in the MNP Model
• Reference: David Bunch, "Estimability in the Multinomial Probit Model," Transportation Research
• Domencich and McFadden
• Let (J alternatives, K characteristics, β ~ N(β̄, Σ_β))
  Z̃β̄ = ( Z_1β̄, …, Z_Jβ̄ )′,  η = ( η_1, …, η_J )′  (2)

  29. Problem of Identification and Normalization in the MNP Model
• Pr(alternative j selected) = Pr( u_j > u_i, ∀ i ≠ j )
  = ∫_{u_j=−∞}^{∞} ∫_{u_1=−∞}^{u_j} ⋯ ∫_{u_J=−∞}^{u_j} φ( u | V_u, Σ_u ) du_1 ⋯ du_J
  where φ( u | V_u, Σ_u ) is the J-dimensional multivariate normal density with mean V_u and covariance Σ_u.
• Note: unlike the MNL, there is no closed-form expression for the integral.
• The integrals are often evaluated using simulation methods (we will work an example).
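Since the choice probability has no closed form, a crude frequency simulator illustrates how such integrals are evaluated in practice: draw utilities from the multivariate normal and count how often each alternative wins. A sketch with hypothetical means and covariance:

```python
import numpy as np

def mnp_choice_probs(V, Sigma, n_sims=200_000, seed=0):
    """Frequency simulator for MNP choice probabilities:
    draw u ~ N(V, Sigma) and count how often each alternative is the max."""
    rng = np.random.default_rng(seed)
    u = rng.multivariate_normal(V, Sigma, size=n_sims)
    winners = u.argmax(axis=1)
    return np.bincount(winners, minlength=len(V)) / n_sims

# Hypothetical 3-alternative example with correlated errors between 2 and 3.
V = np.array([0.5, 0.0, 0.0])
Sigma = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.8],
                  [0.0, 0.8, 1.0]])
p = mnp_choice_probs(V, Sigma)
```

Smoother simulators (e.g., GHK) are used in serious applications; the frequency simulator is shown only because it maps one-to-one onto the integral above.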

  30. How many parameters are there?
• β̄: K parameters
• Σ_β: K × K symmetric matrix, (K² − K)/2 + K = K(K+1)/2 parameters
• Σ_η: J(J+1)/2 parameters
• Note: when a person chooses j, all we know is relative utility, not absolute utility.
• This suggests that not all parameters in the model will be identified.
• Requires normalizations.

  31. Digression on Identification
• What does it mean to say a parameter is not identified in a model?
• A model with one parameterization is observationally equivalent to the same model with a different parameterization.

  32. Digression on Identification
• Example: binary probit model (fixed β)
  Pr( D = 1 | Z ) = Pr( v_1 + ε_1 > v_2 + ε_2 )
                 = Pr( x_1β + ε_1 > x_2β + ε_2 )
                 = Pr( (x_1 − x_2)β > ε_2 − ε_1 )
                 = Pr( (x̃β)/σ > (ε_2 − ε_1)/σ ),  x̃ = x_1 − x_2
                 = Φ( x̃β/σ )
• Φ( x̃β/σ ) is observationally equivalent to Φ( x̃β*/σ* ) for β/σ = β*/σ*.

  33. • β is not separately identified relative to σ, but the ratio is identified:
  Φ( x̃β/σ ) = Φ( x̃β*/σ* )
  Φ^{−1} Φ( x̃β/σ ) = Φ^{−1} Φ( x̃β*/σ* )
  ⇒ β/σ = β*/σ*
• The set { b : b = β·δ, δ any positive scalar } is identified (say "β is identified up to scale, and the sign is identified").
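The scale normalization can be checked numerically: parameter pairs with the same ratio β/σ imply identical probit probabilities at every x̃. A small sketch (all numbers hypothetical):

```python
import numpy as np
from scipy.stats import norm

# Two parameterizations with the same ratio beta/sigma are observationally
# equivalent: they imply identical choice probabilities at every x.
x_tilde = np.linspace(-3.0, 3.0, 7)   # x1 - x2 differences (hypothetical)
beta, sigma = 1.5, 2.0
beta_star, sigma_star = 3.0, 4.0      # both scaled by delta = 2, same ratio

p1 = norm.cdf(x_tilde * beta / sigma)
p2 = norm.cdf(x_tilde * beta_star / sigma_star)
# p1 == p2 everywhere: only beta/sigma is identified from choice data.
```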

  34. Identification in the MNP Model
  Pr( j selected | V_u, Σ_u ) = Pr( u_i − u_j < 0, ∀ i ≠ j )
• Define the (J−1) × J contrast matrix (each row has a 1 in one column i ≠ j and a −1 in column j):
  Δ_j = [ 1 0 ⋯ −1 ⋯ 0
          0 1 ⋯ −1 ⋯ 0
          ⋮       ⋮
          0 0 ⋯ −1 ⋯ 1 ]
  Δ_j ũ = ( u_1 − u_j, …, u_J − u_j )′  (excluding the j-th contrast)

  35. Identification in the MNP Model
  Pr( j selected | V_u, Σ_u ) = Pr( Δ_j ũ < 0 | V_u, Σ_u ) = Φ( 0 | V_Z, Σ_Z )
• where
1 V_Z = Δ_j Z̃β̄ is the mean of Δ_j ũ
2 Σ_Z = Δ_j Z̃ Σ_β Z̃′Δ_j′ + Δ_j Σ_η Δ_j′ is the variance of Δ_j ũ
3 V_Z is (J−1) × 1
4 Σ_Z is (J−1) × (J−1)
• We reduce the dimension of the integral by one.

  36. • This says that all of the information exists in the contrasts.
• We can't identify all the components because we only observe the contrasts.
• Now choose J as the reference alternative with contrast matrix Δ_J, and define Δ̃_j as Δ_j with the J-th column removed.
• Then one can verify that Δ_j = Δ̃_j · Δ_J.

  37. • For example, with three goods (j = 2, reference alternative J = 3):
  Δ̃_2 Δ_3 = [ 1 −1 ; 0 −1 ] × [ 1 0 −1 ; 0 1 −1 ] = [ 1 −1 0 ; 0 −1 1 ] = Δ_2
  (Δ̃_2: 3rd column removed; Δ_3: J = 3 is the reference alternative; Δ_2: 3rd column included)

  38. • Therefore, we can write
  V_Z = Δ_j Z̃β̄
  Σ_Z = Δ_j Z̃ Σ_β Z̃′Δ_j′ + Δ̃_j Δ_J Σ_η Δ_J′ Δ̃_j′ = Δ_j Z̃ Σ_β Z̃′Δ_j′ + Δ̃_j C_J Δ̃_j′
• where C_J = Δ_J Σ_η Δ_J′ is (J−1) × (J−1) symmetric, with ((J−1)² − (J−1))/2 + (J−1) = J(J−1)/2 parameters in total.
• Since the original model can always be expressed in terms of a model with ( β̄, Σ_β, C_J ), it follows that some of the parameters in the original model are not identified.

  39. How many parameters are not identified?
• Original model: K + K(K+1)/2 + J(J+1)/2
• Now: K + K(K+1)/2 + J(J−1)/2
  difference: J(J+1)/2 − J(J−1)/2 = J parameters not identified
• It turns out that one additional parameter is not identified.
• Total: J + 1
• Note: evaluation of Φ( 0 | kV_Z, k²Σ_Z ), k > 0, gives the same result as evaluating Φ( 0 | V_Z, Σ_Z ), so we can eliminate one more parameter by a suitable choice of k.

  40. Illustration
  J = 3,  Σ_η = [ σ_11 σ_12 σ_13 ; σ_21 σ_22 σ_23 ; σ_31 σ_32 σ_33 ]
  C_2 = Δ_2 Σ_η Δ_2′ = [ 1 −1 0 ; 0 −1 1 ] · Σ_η · [ 1 −1 0 ; 0 −1 1 ]′
      = [ σ_11 − 2σ_21 + σ_22 ,  σ_31 − σ_21 − σ_32 + σ_22 ;
          σ_31 − σ_21 − σ_32 + σ_22 ,  σ_33 − 2σ_32 + σ_22 ]

  41. Illustration
  C_2 = Δ̃_2 Δ_3 Σ_η Δ_3′ Δ̃_2′ = [ 1 −1 ; 0 −1 ] · C_3 · [ 1 −1 ; 0 −1 ]′
  with C_3 = Δ_3 Σ_η Δ_3′ = [ σ_11 − 2σ_31 + σ_33 ,  σ_21 − σ_31 − σ_32 + σ_33 ;
                              σ_21 − σ_31 − σ_32 + σ_33 ,  σ_22 − 2σ_32 + σ_33 ]
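The identities Δ_2 = Δ̃_2 Δ_3 and C_2 = Δ̃_2 C_3 Δ̃_2′ can be verified mechanically for any Σ_η. A quick sketch with an arbitrary symmetric positive-definite Σ_η:

```python
import numpy as np

# Contrast matrices for J = 3 from the slides above.
D2 = np.array([[1, -1, 0],
               [0, -1, 1]])         # contrasts u1 - u2, u3 - u2
D3 = np.array([[1, 0, -1],
               [0, 1, -1]])         # contrasts u1 - u3, u2 - u3
D2_tilde = np.array([[1, -1],
                     [0, -1]])      # D2 with the 3rd (reference) column removed

# Any symmetric positive-definite Sigma_eta will do for the check.
A = np.array([[1.0, 0.2, 0.1], [0.3, 1.0, 0.4], [0.2, 0.1, 1.0]])
Sigma_eta = A @ A.T

C2_direct = D2 @ Sigma_eta @ D2.T           # C_2 computed directly
C3 = D3 @ Sigma_eta @ D3.T                  # C_3 for the reference alternative
C2_via_C3 = D2_tilde @ C3 @ D2_tilde.T      # C_2 recovered from C_3
```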

  42. Normalization Approach of Albright, Lerman, and Manski (1978)
• Note: we need J + 1 restrictions on the VCV matrix.
• Fix J parameters by setting the last row and last column of Σ_η to 0.
• Fix the scale by constraining the diagonal elements of Σ_η so that trace(Σ_η)/J equals the variance of a standard Weibull (to compare estimates with MNL and independent probit).

  43. How do we solve the forecasting problem?
• Suppose that we have 2 goods and add a 3rd.
  Pr(1 chosen) = Pr( u_1 − u_2 ≥ 0 ) = Pr( (Z_1 − Z_2)β̄ ≥ ω_2 − ω_1 )
• where ω_1 = Z_1( β − β̄ ) + η_1 and ω_2 = Z_2( β − β̄ ) + η_2, so
  Pr(1 chosen) = ∫_{−∞}^{(Z_1−Z_2)β̄ / [ σ_11 + σ_22 − 2σ_12 + (Z_2−Z_1)Σ_β(Z_2−Z_1)′ ]^{1/2}} (1/√(2π)) e^{−t²/2} dt
• Now add a 3rd good:
  u_3 = Z_3β̄ + Z_3( β − β̄ ) + η_3

  44. • Problem: we don't know the correlation of η_3 with the other errors.
• Suppose that η_3 = 0 (i.e., only preference heterogeneity). Then
  Pr(1 chosen) = ∫_{−∞}^{a} ∫_{−∞}^{b} B.V.N. dt_1 dt_2
  where a = (Z_1 − Z_2)β̄ / [ σ_11 + σ_22 − 2σ_12 + (Z_2 − Z_1)Σ_β(Z_2 − Z_1)′ ]^{1/2}
  and b = (Z_1 − Z_3)β̄ / [ σ_11 + (Z_3 − Z_1)Σ_β(Z_3 − Z_1)′ ]^{1/2}
• We could also solve the forecasting problem if we make an assumption like η_2 = η_3.
• We solve the red-bus/blue-bus problem if η_2 = η_3 = 0 and Z_3 = Z_2.

  45.
  Pr(1 chosen) = Pr( u_1 − u_2 ≥ 0, u_1 − u_3 ≥ 0 )
• but { u_1 − u_2 ≥ 0 } and { u_1 − u_3 ≥ 0 } are then the same event.
• ∴ adding the third choice does not change the probability of choosing 1.

  46. Estimation Methods for MNP Models
• The models tend to be difficult to estimate because of high-dimensional integrals.
• The integrals need to be evaluated at each stage of estimating the likelihood.
• Simulation provides a means of estimating P_ij = Pr( i chooses j ).

  47. Computation and Estimation — Link to Appendix

  48. Classical Models for Estimating Models with Limited Dependent Variables
References:
• Amemiya, Ch. 10
• Different types of sampling (previously discussed):
(a) random sampling
(b) censored sampling
(c) truncated sampling
(d) other non-random (exogenous stratified, choice-based)

  49. Standard Tobit Model (Tobin, 1958): "Type I Tobit"
  y*_i = x_iβ + u_i
• Observe
  y_i = y*_i if y*_i ≥ y_0
  y_i = 0   if y*_i < y_0
  (equivalently, the censoring indicator is 1( y*_i ≥ y_0 ))
• Tobin's example: expenditure on a durable good, observed only if the good is purchased.

  50. Figure 1: scatter of expenditure against individuals, censored at y_0.
Note: censored observations might have bought the good if the price had been lower.
• Estimator. Assume
  u_i | x_i ~ N( 0, σ_u² ),  so  y*_i | x_i ~ N( x_iβ, σ_u² )

  51. Density of the Latent Variable
  g(y*) = π_0 Pr( y*_i < y_0 ) + π_1 f( y*_i | y*_i ≥ y_0 ) Pr( y*_i ≥ y_0 )
  Pr( y*_i < y_0 ) = Pr( x_iβ + u_i < y_0 ) = Pr( u_i/σ_u < (y_0 − x_iβ)/σ_u ) = Φ( (y_0 − x_iβ)/σ_u )
  f( y*_i | y*_i ≥ y_0 ) = (1/σ_u) φ( (y*_i − x_iβ)/σ_u ) / [ 1 − Φ( (y_0 − x_iβ)/σ_u ) ]   (why?)
  because Pr( y* = y*_i | y_0 ≤ y* ) = Pr( xβ + u = y*_i | y_0 ≤ xβ + u ) = Pr( u/σ_u = (y*_i − xβ)/σ_u | u/σ_u ≥ (y_0 − xβ)/σ_u )

  52. • Note that the likelihood can be written as:
  L = Π_0 Φ( (y_0 − x_iβ)/σ_u ) × Π_1 [ 1 − Φ( (y_0 − x_iβ)/σ_u ) ] · (1/σ_u) φ( (y_i − x_iβ)/σ_u ) / [ 1 − Φ( (y_0 − x_iβ)/σ_u ) ]
    = Π_0 Φ( (y_0 − x_iβ)/σ_u ) × Π_1 (1/σ_u) φ( (y_i − x_iβ)/σ_u )
  (the first factor is what you would get with just a simple probit; the second factor carries the additional information)
• You could estimate β up to scale using only the information on whether y_i exceeds y_0, but you will get a more efficient estimate using the additional information.
• If you know y_0, you can estimate σ_u.
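The two-part likelihood translates directly into code. A sketch (simulated data, censoring point y_0 = 0; the design and parameter values are illustrative):

```python
import numpy as np
from scipy.stats import norm

def tobit_loglik(params, y, X, y0=0.0):
    """Type I Tobit log-likelihood with censoring point y0:
    censored obs contribute log Phi((y0 - x b)/s);
    uncensored obs contribute log (1/s) phi((y - x b)/s)."""
    b, s = params[:-1], params[-1]
    xb = X @ b
    censored = y <= y0                 # observations recorded at the floor y0
    ll = norm.logcdf((y0 - xb[censored]) / s).sum()
    ll += (norm.logpdf((y[~censored] - xb[~censored]) / s) - np.log(s)).sum()
    return ll

# Simulated check on a hypothetical design.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y_star = X @ np.array([1.0, 2.0]) + rng.normal(scale=1.5, size=500)
y = np.maximum(y_star, 0.0)            # censor at y0 = 0
```

A sanity check on simulated data: the log-likelihood evaluated at the true parameters should exceed its value at a badly perturbed parameter vector.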

  53. Truncated Version of the Type I Tobit
• Observe y_i = y*_i if y*_i > 0.
• Observe nothing for censored observations.
• Example: only observe wages for workers.
  L = Π_1 (1/σ_u) φ( (y_i − x_iβ)/σ_u ) / Φ( x_iβ/σ_u )
  since Pr( y*_i > 0 ) = Pr( xβ + u > 0 ) = Pr( u/σ_u > −xβ/σ_u ) = Pr( u/σ_u < xβ/σ_u ) = Φ( xβ/σ_u )

  54. Different Ways of Estimating the Tobit Model
(a) If censored, can obtain estimates of β/σ_u by simple probit.
(b) Run OLS on the observations for which y*_i is observed (take y_0 = 0):
  E( y_i | x_iβ + u_i ≥ 0 ) = x_iβ + σ_u E( u_i/σ_u | u_i/σ_u > −x_iβ/σ_u )
• This is the conditional mean of a truncated normal r.v., and
  E( u_i/σ_u | u_i/σ_u > −x_iβ/σ_u ) = φ( x_iβ/σ_u ) / Φ( x_iβ/σ_u ) = λ( x_iβ/σ_u )
• λ is known as the (inverse) Mills ratio; the bias due to censoring can be viewed as an omitted-variables problem.

  55. Heckman Two-Step Procedure
• Step 1: estimate β/σ_u by probit.
• Step 2: form λ̂( x_iβ̂/σ̂ ) and regress
  y_i = x_iβ + σ λ̂( x_iβ/σ ) + v_i + ε_i
  v_i = σ [ λ( x_iβ/σ ) − λ̂( x_iβ/σ ) ]
  ε_i = u_i − E( u_i | u_i > −x_iβ )

  56. • Note: the errors (v + ε) will be heteroskedastic.
• Need to account for the fact that λ is estimated (Durbin problem).
• Ways of doing this:
(a) Delta method
(b) GMM (Newey, Economics Letters, 1984)
(c) Suppose you run OLS using all the data:
  E( y_i ) = Pr( y*_i ≤ 0 ) · 0 + Pr( y*_i > 0 ) [ x_iβ + σ_u E( u_i/σ_u | u_i/σ_u > −x_iβ/σ_u ) ]
           = Φ( x_iβ/σ ) [ x_iβ + σ_u λ( x_iβ/σ ) ]
  Could estimate the model by replacing Φ with Φ̂ and λ with λ̂.
  For both (b) and (c), the errors are heteroskedastic, meaning that you could use weights to improve efficiency.
  Also need to adjust for the estimated regressor.
(d) Estimate the model by Tobit maximum likelihood directly.

  57. Variations on the Standard Tobit Model (Type II Tobit)
  y*_1i = x_1iβ_1 + u_1i
  y*_2i = x_2iβ_2 + u_2i
  y_2i = y*_2i if y*_1i ≥ 0,  y_2i = 0 otherwise
• Example:
• y_2i: student test scores
• y*_1i: index representing parents' propensity to enroll students in school
• Test scores are only observed for the proportion enrolled.

  58.
  L = Π_1 [ Pr( y*_1i > 0 ) f( y_2i | y*_1i > 0 ) ] Π_0 [ Pr( y*_1i ≤ 0 ) ]
  f( y*_2i | y*_1i ≥ 0 ) = ∫_0^∞ f( y*_1i, y*_2i ) dy*_1i / ∫_0^∞ f( y*_1i ) dy*_1i
                        = f( y_2i ) ∫_0^∞ f( y*_1i | y*_2i ) dy*_1i / ∫_0^∞ f( y*_1i ) dy*_1i
                        = (1/σ_2) φ( (y*_2i − x_2iβ_2)/σ_2 ) · ∫_0^∞ f( y*_1i | y*_2i ) dy*_1i / Pr( y*_1i > 0 )
  with y_1i ~ N( x_1iβ_1, σ_1² ), y_2i ~ N( x_2iβ_2, σ_2² )

  59.
  y*_1i | y*_2i ~ N( x_1iβ_1 + (σ_12/σ_2²)( y_2i − x_2iβ_2 ),  σ_1² − σ_12²/σ_2² )
  E( y*_1i | u_2i = y*_2i − x_2iβ_2 ) = x_1iβ_1 + E( u_1i | u_2i = y*_2i − x_2iβ_2 )

  60. Estimation by MLE
  L = Π_0 [ 1 − Φ( x_1iβ_1/σ_1 ) ] × Π_1 (1/σ_2) φ( (y_2i − x_2iβ_2)/σ_2 ) · Φ( [ x_1iβ_1 + (σ_12/σ_2²)( y_2i − x_2iβ_2 ) ] / [ σ_1² − σ_12²/σ_2² ]^{1/2} )

  61. Estimation by a Two-Step Approach
• Using the data on y_2i for which y_1i > 0:
  E( y_2i | y_1i > 0 ) = x_2iβ_2 + E( u_2i | x_1iβ_1 + u_1i > 0 )
                      = x_2iβ_2 + σ_2 E( u_2i/σ_2 | u_1i/σ_1 > −x_1iβ_1/σ_1 )
                      = x_2iβ_2 + (σ_12/σ_1) E( u_1i/σ_1 | u_1i/σ_1 > −x_1iβ_1/σ_1 )
                      = x_2iβ_2 + (σ_12/σ_1) λ( x_1iβ_1/σ_1 )

  62. Example: Female Labor Supply Model
  max u( L, x )  s.t.  x = wH + v,  H = 1 − L
  where H: hours worked; v: asset income; w given; P_x = 1; L: time spent at home (e.g., for child care)
  (∂u/∂L) / (∂u/∂x) = w when L < 1
  reservation wage: w^R = MRS |_{H=0}

  63. Example: Female Labor Supply Model
• We don't observe w^R directly.
• Model:
  w^0 = xβ + u  (wage the person would earn if she worked)
  w^R = zγ + v
  w_i = w^0_i if w^R_i < w^0_i,  w_i = 0 otherwise
• This fits within the previous Tobit framework if we set
  y*_1i = w^0 − w^R = xβ − zγ + u − v,  y_2i = w_i
• Note: Gronau does not develop a model to explain hours of work.

  64. Incorporate the choice of H:
  w^0 = x_2iβ_2 + u_2i  (given)
  MRS = (∂u/∂L) / (∂u/∂x) = γ H_i + z_i′α + v_i
  (assume a functional form for the utility function that yields this)

  65.
  w^r( H_i = 0 ) = z_i′α + v_i
  work if w^0 = x_2iβ_2 + u_2i > z_i′α + v_i
  if work, then w^0 = MRS
  ⇒ x_2iβ_2 + u_2i = γ H_i + z_i′α + v_i
  ⇒ H_i = ( x_2iβ_2 − z_i′α + u_2i − v_i ) / γ = x_1iβ_1 + u_1i
  where x_1iβ_1 = ( x_2iβ_2 − z_i′α ) γ^{−1} and u_1i = ( u_2i − v_i ) γ^{−1}

  66. Type 3 Tobit Model
  y*_1i = x_1iβ_1 + u_1i  ← hours
  y*_2i = x_2iβ_2 + u_2i  ← wage
  y_1i = y*_1i if y*_1i > 0,  y_1i = 0 if y*_1i ≤ 0
  y_2i = y*_2i if y*_1i > 0,  y_2i = 0 if y*_1i ≤ 0

  67. • Here
  H_i = H*_i if H*_i > 0,  H_i = 0 if H*_i ≤ 0
  w_i = w^0_i if H*_i > 0,  w_i = 0 if H*_i ≤ 0
• Note: the Type IV Tobit simply adds
  y_3i = y*_3i if y*_1i > 0,  y_3i = 0 if y*_1i ≤ 0

  68. • Can estimate by:
(1) maximum likelihood
(2) the two-step method:
  E( w^0_i | H_i > 0 ) = γ H_i + z_iα + E( v_i | H_i > 0 )

  69. Type V Tobit Model of Heckman (1978)
  y*_1i = γ_1 y_2i + x_1iβ_1 + δ_1 w_i + u_1i
  y_2i = γ_2 y*_1i + x_2iβ_2 + δ_2 w_i + u_2i
• Analysis of an antidiscrimination law on the average income of African Americans in the i-th state.
• Observe x_1i, x_2i, y_2i, and
  w_i = 1 if y*_1i > 0,  w_i = 0 if y*_1i ≤ 0
• y_2i = average income of African Americans in the state
• y*_1i = unobservable sentiment toward African Americans
• w_i = 1 if the law is in effect

  70. • Adoption of the law is endogenous.
• Require the restriction γ_1δ_2 + δ_1 = 0 so that we can solve for y*_1i as a function that does not depend on w_i.
• This class of models is known as "dummy endogenous variable" models.
• Coherency problem (suppose not restricted?).

  71. Relaxing Parametric Assumptions in the Selection Model
References:
• Heckman (AER, 1990), "Varieties of Selection Bias"
• Heckman (1980), "Addendum to Sample Selection Bias as a Specification Error"
• Heckman and Robb (1985, 1986)
  y*_1 = xβ + u
  y*_2 = zγ + v
  y_1 = y*_1 if y*_2 > 0

  72. Relaxing Parametric Assumptions in the Selection Model
  E( y*_1 | observed ) = xβ + E( u | x, zγ + v > 0 ) + [ u − E( u | x, zγ + v > 0 ) ]
  E( u | x, zγ + v > 0 ) = ∫_{−∞}^{∞} ∫_{−zγ}^{∞} u f( u, v | x, z ) dv du / ∫_{−∞}^{∞} ∫_{−zγ}^{∞} f( u, v | x, z ) dv du
• Note: Pr( y*_2 > 0 | z ) = Pr( zγ + v > 0 | z ) = P(z) = 1 − F_v( −zγ )

  73.
  ⇒ F_v( −zγ ) = 1 − P(z)
  ⇒ −zγ = F_v^{−1}( 1 − P(z) )
• Can replace −zγ in the integrals by F_v^{−1}( 1 − P(z) ) if, in addition, f( u, v | x, z ) = f( u, v | zγ ) (index sufficiency).
• Then
  E( y*_1 | y*_2 > 0 ) = xβ + g( P(z) ) + ε
  where g( P(z) ) is the bias or "control function."
• Semiparametric selection model: approximate the bias function by a Taylor series in P(z)—a truncated power series.
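The truncated-power-series idea can be sketched as follows. The data-generating process, the known first-stage index, and the cubic order are all illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

# Control-function sketch: approximate the selection bias g(P) by a truncated
# power series in the selection probability P (here taken as known).
rng = np.random.default_rng(2)
n = 20_000
x = rng.normal(size=n)
z = rng.normal(size=n)
eps = rng.normal(size=n)
u = 0.7 * eps + 0.5 * rng.normal(size=n)      # outcome error, correlated with v
v = eps                                        # selection error
y = 1.0 + 2.0 * x + u                          # outcome equation, true slope = 2
sel = (x + z + v > 0)                          # selection depends on x too

P = norm.cdf(x + z)                            # selection probability P(w)

# Naive OLS on the selected sample: the slope on x is biased by E(u | sel, x).
Xn = np.column_stack([np.ones(n), x])[sel]
b_naive, *_ = np.linalg.lstsq(Xn, y[sel], rcond=None)

# Control-function OLS: add a cubic in P to absorb g(P).
Xc = np.column_stack([np.ones(n), x, P, P**2, P**3])[sel]
b_cf, *_ = np.linalg.lstsq(Xc, y[sel], rcond=None)
# b_cf[1] should be much closer to the true slope 2.0 than b_naive[1].
```

In practice P(z) itself must be estimated in a first stage (probit or nonparametrically), and the order of the power series is a tuning choice; the point of the sketch is only that the polynomial in P absorbs the selection bias term g(P(z)).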
