Cross Section Bias: Age, Period and Cohort Effects
James J. Heckman
University of Chicago
Econ 312, Spring 2019
$$\ln W_i = \alpha_0 + \alpha_1 a_i + \alpha_2 y + \alpha_3 e_i + \alpha_4 s_i + \alpha_5 c_i + u_i$$
where $a_i$ is age, $y$ is year, $e_i$ is experience, $s_i$ is schooling, and $c_i$ is vintage (birth cohort).
Two Identities
$$e_i = a_i - s_i \quad \text{(“experience”)} \tag{1}$$
$$y = a_i + c_i, \quad c_i = \text{birth year} \tag{2}$$
• Solve out for $c_i$ and $a_i$ to get estimable combinations.
• Take the simpler case first:
$$\ln W(a, y, c) = \beta_0 + \beta_1 a_i + \beta_2 y_i + \beta_3 c_i + u_i, \qquad y_i = a_i + c_i,$$
where $a_i$ is age, $y_i$ is the current year, and $c_i$ is the year of birth (cohort).
• Obviously, we get an exact linear dependence: $(\beta_0, \beta_1, \beta_2, \beta_3)$ cannot all be identified.
• Substitute $c_i = y_i - a_i$:
$$\ln W_i = \beta_0 + \beta_1 a_i + \beta_2 y_i + \beta_3 (y_i - a_i) + u_i = \beta_0 + (\beta_1 - \beta_3) a_i + (\beta_2 + \beta_3) y_i + u_i.$$
We can identify only combinations of coefficients.
• In a cross section, $y_i$ is the same for everyone. The intercept is $[\beta_0 + (\beta_2 + \beta_3) y_i]$.
• We can estimate $(\beta_1 - \beta_3)$: the age effect minus the cohort effect.
• If $\beta_3 > 0$, we underestimate the true $\beta_1$.
• Will longitudinal data rescue us? Not necessarily.
• With panels, $y_i$ moves with time. Recall that $y_i = a_i + c_i$.
• So we still have exact linear dependence. This remains true if we use dummy variables in place of continuous variables (verify). Panel data will rescue us only if we have no year effects.
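The "(verify)" above can be checked numerically. The following is a minimal sketch, assuming a made-up balanced panel (the cohorts, survey years, and the helper `rank` are illustrative choices, not part of the lecture). It computes the exact rank of the design matrix with continuous regressors, in a single cross section, and with full sets of age, year, and cohort dummies.

```python
from fractions import Fraction

def rank(rows):
    """Exact matrix rank via Gaussian elimination on Fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Hypothetical panel: four birth cohorts observed in three survey years.
cohorts = [1940, 1950, 1960, 1970]
years = [1990, 2000, 2010]
panel = [(y - c, y, c) for c in cohorts for y in years]  # (age, year, cohort)

# Continuous regressors: columns are 1, age, year, cohort.
X_panel = [[1, a, y, c] for a, y, c in panel]
X_cross = [[1, a, y, c] for a, y, c in panel if y == 2000]
rank_panel = rank(X_panel)   # 4 columns, but year = age + cohort
rank_cross = rank(X_cross)   # year is also constant in one cross section

# Dummy regressors: intercept plus indicators, dropping one reference level per set.
ages = sorted({a for a, _, _ in panel})
X_dummy = [
    [1]
    + [int(a == lev) for lev in ages[1:]]
    + [int(y == lev) for lev in years[1:]]
    + [int(c == lev) for lev in cohorts[1:]]
    for a, y, c in panel
]
rank_dummy = rank(X_dummy)

print(rank_panel, rank_cross, len(X_dummy[0]), rank_dummy)
```

In this example the panel design has 4 columns but rank 3, the cross section has rank 2, and the dummy design has 11 columns but rank 10, so one linear dependence survives the move to dummies.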
• We acquire similar problems in models with nonlinear terms. Since $y = a + c$, there are 3 linear dependencies among the second-order terms:
$$y^2 = a^2 + 2ac + c^2, \qquad ay = a^2 + ac, \qquad cy = ca + c^2.$$
• Thus when we write
$$\ln W = \beta_0 + \beta_1 a + \beta_2 y + \beta_3 c + \beta_4 a^2 + \beta_5 ac + \beta_6 ay + \beta_7 cy + \beta_8 c^2 + \beta_9 y^2 + u,$$
we cannot identify all of the parameters (only 3 of the 6 second-order parameters are estimable).
Theorem. In a model with interactions of order $k$ in $j$ variables and one linear restriction among the $j$ variables, of the $\binom{j+k-1}{k}$ coefficients of order $k$, only $\binom{j+k-2}{k}$ are estimable. (Heckman and Robb, in S. Fienberg and W. Mason, Age, Period and Cohort Effects: Beyond the Identification Problem, Springer, 1986.)
E.g. $k = 2$, $j = 3$: 6 coefficients and 3 are estimable, as in the preceding example.
Theorem. In a model with $\ell$ restrictions on the $j$ variables, $\binom{j+k-\ell-1}{k}$ $k$th-order coefficients are estimable (Heckman and Robb, 1986).
Question: Generalize this analysis for the case of polychotomous variables for age, period and cohort effects.
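The counting in the two theorems can be checked by brute force for small cases. This is a sketch, not the Heckman-Robb proof: it builds every order-$k$ monomial in $j$ variables on an integer grid, imposes $\ell$ linear restrictions with arbitrarily chosen weights (the helper names `rank` and `estimable_count` and the weights are assumptions made for illustration), and compares the exact rank with the binomial formula.

```python
from fractions import Fraction
from itertools import combinations_with_replacement, product
from math import comb, prod

def rank(rows):
    """Exact matrix rank via Gaussian elimination on Fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def estimable_count(j, k, n_restr):
    """Rank of the matrix of all order-k monomials in j variables,
    where the last n_restr variables are linear combinations of the others."""
    free = j - n_restr
    rows = []
    for x_free in product(range(-2, 3), repeat=free):  # 5 grid points per free variable
        x = list(x_free)
        for r in range(n_restr):
            # Illustrative restriction weights: 1, (r+1), (r+1)^2, ...
            x.append(sum((r + 1) ** i * v for i, v in enumerate(x_free)))
        rows.append([prod(x[i] for i in idx)
                     for idx in combinations_with_replacement(range(j), k)])
    return rank(rows)

# (j, k, l) -> (total order-k coefficients, rank found, formula value)
cases = [(3, 2, 1), (4, 2, 2), (3, 3, 1)]
results = {
    (j, k, l): (comb(j + k - 1, k), estimable_count(j, k, l), comb(j + k - l - 1, k))
    for j, k, l in cases
}
for case, triple in results.items():
    print(case, triple)
```

The case `(3, 2, 1)` is the age-year-cohort example with $y = a + c$: 6 second-order coefficients, of which 3 are estimable.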
• Return to the more general case. Substitute out for $c_i$ and $a_i$, using (1) and (2):
$$\ln W_i = \alpha_0 + (\alpha_2 + \alpha_5) y + (\alpha_1 + \alpha_3 - \alpha_5) e_i + (\alpha_1 + \alpha_4 - \alpha_5) s_i + u_i.$$
• In a single cross section, $y$ is the same for everyone. The intercept is then $\alpha_0 + (\alpha_2 + \alpha_5) y$, where $y$ is the year of the cross section.
• Experience coefficient $= \alpha_1 + \alpha_3 - \alpha_5 = \alpha_3 + (\alpha_1 - \alpha_5)$. If later vintages get higher skills (e.g. higher quality of schooling), $\alpha_5 > 0$ and the estimate of $\alpha_3$ is biased downward. If there is an aging effect ($\alpha_1 > 0$, e.g. maturation), it cannot be separated from $\alpha_3$ and produces an upward bias.
Schooling Coefficient
• $\alpha_1 + \alpha_4 - \alpha_5 = \alpha_4 + (\alpha_1 - \alpha_5)$
• Vintage (cohort) effects lead to downward bias.
• Age effects lead to upward bias.
• Observe that the experience coefficient minus the schooling coefficient is
$$(\alpha_1 + \alpha_3 - \alpha_5) - (\alpha_1 + \alpha_4 - \alpha_5) = \alpha_3 - \alpha_4.$$
• We can estimate the difference in “returns” to experience net of schooling.
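A noise-free numerical illustration of the two cross-section coefficients may help. The sketch below assumes arbitrary parameter values and a made-up sample (none of these numbers come from the lecture). It generates log wages from the full model in one survey year, regresses them on a constant, experience, and schooling with exact arithmetic, and compares the fitted coefficients with the combinations derived above.

```python
from fractions import Fraction as F

def ols(X, y):
    """Exact least squares: solve the normal equations (X'X) b = X'y by Gauss-Jordan."""
    k = len(X[0])
    A = [[sum(F(r[i]) * r[j] for r in X) for j in range(k)]
         + [sum(F(r[i]) * t for r, t in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        piv = next(i for i in range(col, k) if A[i][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for i in range(k):
            if i != col and A[i][col] != 0:
                f = A[i][col]
                A[i] = [v - f * w for v, w in zip(A[i], A[col])]
    return [A[i][k] for i in range(k)]

# Hypothetical "true" parameters alpha_0 .. alpha_5 (illustrative values only).
alpha = [F(1), F(1, 100), F(2, 100), F(3, 100), F(8, 100), F(1, 200)]
year = 2000  # a single cross section

X, lnW = [], []
for a in range(25, 61, 5):      # ages
    for s in (10, 12, 16):      # years of schooling
        e = a - s               # identity (1)
        c = year - a            # identity (2)
        X.append([1, e, s])
        lnW.append(alpha[0] + alpha[1] * a + alpha[2] * year
                   + alpha[3] * e + alpha[4] * s + alpha[5] * c)

intercept, b_exp, b_sch = ols(X, lnW)
print(intercept, b_exp, b_sch, b_exp - b_sch)
```

With these inputs the fitted experience and schooling coefficients equal $\alpha_1 + \alpha_3 - \alpha_5$ and $\alpha_1 + \alpha_4 - \alpha_5$ rather than $\alpha_3$ and $\alpha_4$, while their difference recovers $\alpha_3 - \alpha_4$.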
• Observe that even if $\alpha_1 = 0$ (no aging effect), we still cannot estimate these coefficients.
• Is the solution longitudinal data (observations on the same people over time) or repeated cross-section data (observations on the same population over time but sampling different persons)?
• If $\alpha_2 = 0$ (no year effects), we can estimate $\alpha_5$.
• Alternatively, for each $c_i$ we can estimate $\alpha_1 + \alpha_3$; comparing this with the cross-section experience coefficient $\alpha_1 + \alpha_3 - \alpha_5$ gives $\alpha_5$.
• We also know $\alpha_1 + \alpha_4$. If $\alpha_1 = 0$, then $\alpha_3$, $\alpha_4$, $\alpha_5$ are identified.
• Observe the weakness in the procedure.
• If year effects are present, there is no gain from going to longitudinal or repeated cross-section data.
• We gain a parameter when we move to panel or repeated cross-section data.
Solutions in Literature
(1) Redefine vintage (cohort), e.g. vintage fixed over a period of years (e.g. a cohort of Depression babies).
• Then
$$\ln W = (\alpha_0 + \alpha_5 c) + \alpha_1 a + \alpha_2 y + \alpha_3 e + \alpha_4 s + u.$$
• In a single cross section, $c$ and $y$ are fixed.
• Substitute for $e$: $e_i = a_i - s_i$.
• Then
$$\ln W = [\alpha_0 + \alpha_5 c + \alpha_2 y] + (\alpha_1 + \alpha_3) a_i + (\alpha_4 - \alpha_3) s_i + u.$$
• We can estimate $\alpha_1 + \alpha_3$ and $\alpha_4 - \alpha_3$, and thus $\alpha_1 + \alpha_4$.
• Successive time periods for the same vintage give us $\alpha_2$ directly (since $c$ doesn't move).
• If there is no age effect, we get $\alpha_3$, $\alpha_4$, $\alpha_2$, and from successive vintage estimations we get $\alpha_5$.
(2) If we measure experience directly, so that $a_i \neq e_i + s_i$ (non-market breaks), the linear dependence is broken.
• Cost: better proxies may be endogenous.
• E.g. experience = cumulated hours.
• Results carry over in an obvious way to nonlinear models.
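Both solutions work by breaking one of the two identities, which can be seen in the rank of the design matrix. The sketch below assumes a small invented panel (birth years, schooling, and years out of the labor market are hypothetical, as are the helper names). It groups birth years into decades for solution (1) and subtracts non-market years from potential experience for solution (2).

```python
from fractions import Fraction

def rank(rows):
    """Exact matrix rank via Gaussian elimination on Fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Hypothetical people: (birth year, years of schooling, years out of the labor market).
people = [(1941, 12, 0), (1946, 16, 3), (1953, 12, 5),
          (1958, 10, 0), (1962, 16, 1), (1967, 12, 2)]
years = [1990, 2000, 2010]

def design(group_vintage, measured_experience):
    """Rows of [1, age, year, experience, schooling, vintage] for every person-year."""
    rows = []
    for c, s, gap in people:
        for y in years:
            a = y - c
            e = a - s - (gap if measured_experience else 0)  # breaks e = a - s
            v = (c // 10) * 10 if group_vintage else c        # breaks y = a + c
            rows.append([1, a, y, e, s, v])
    return rows

rank_identities = rank(design(False, False))  # both identities hold
rank_grouped = rank(design(True, False))      # solution (1): decade vintages
rank_measured = rank(design(False, True))     # solution (2): measured experience
rank_both = rank(design(True, True))

print(rank_identities, rank_grouped, rank_measured, rank_both)
```

Each fix restores one dimension of the six-column design in this example. The rank check says nothing about the endogeneity cost noted above.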
Example of Interpretive Pitfall
(1) Johnson and Stafford (AER, 1974)
(2) Weiss and Lillard (JPE, 1979)
• Fact: Disparity in real wages between recent Ph.D. entrants and experienced workers rose in physics and mathematics in the late 60s and early 70s. Not observed in the social sciences.
• Why? The Johnson-Stafford story:
• Supplies of Ph.D.s were enlarged by federal grants while demand for scientific personnel declined. Wage rigidity at the top end is motivated by specific human capital. The spot market / entrant market bears the brunt of the burden.
• Weiss and Lillard: “experience-vintage” interaction ($ec$).
• Ignore the age effect:
$$\ln W(e, c, s, y) = \phi_0 + \phi_1 e + \phi_2 c + \phi_3 y + \phi_4 s + \phi_5 e^2 + \phi_6 c^2 + \phi_7 ec + \phi_8 ey + \phi_9 cy + \phi_{10} y^2$$
• Assume other powers and interactions are zero. Assume $\phi_{10} = 0$.
• Johnson-Stafford: $\phi_8 > 0$ or $\phi_9 < 0$
• Weiss-Lillard: $\phi_7 > 0$
• Recall that $y = e + s + c$.
• Weiss-Lillard ignore year effects.
• We get Weiss-Lillard by substituting for $y$:
$$\ln W(e, c, s) = \phi_0 + (\phi_1 + \phi_3) e + (\phi_3 + \phi_4) s + (\phi_2 + \phi_3) c + (\phi_5 + \phi_8) e^2 + \phi_8 es + (\phi_7 + \phi_8 + \phi_9) ec + (\phi_6 + \phi_9) c^2 + \phi_9 cs$$
• Note that if $\phi_7 = 0$ but $\phi_9 > 0$, we get an $ec$ interaction, but it is “really” a year effect. If entry-level wages fall relative to wages of experienced workers, the wage/experience profile is steeper in more recent cross-sections.
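The substitution $y = e + s + c$ is easy to get wrong by hand, so a numerical identity check is useful. This sketch uses arbitrary integer values for $\phi_0, \dots, \phi_9$ (illustrative only, with $\phi_{10} = 0$ as assumed above). It compares the original specification with a term-by-term collection in which $c^2$ carries $\phi_6 + \phi_9$ and a $cs$ term carries $\phi_9$.

```python
from itertools import product

def original(phi, e, s, c):
    """Specification in (e, c, s, y) with phi_10 = 0 and y = e + s + c imposed."""
    y = e + s + c
    return (phi[0] + phi[1] * e + phi[2] * c + phi[3] * y + phi[4] * s
            + phi[5] * e**2 + phi[6] * c**2 + phi[7] * e * c
            + phi[8] * e * y + phi[9] * c * y)

def collected(phi, e, s, c):
    """Same function after substituting for y and collecting terms in (e, s, c)."""
    return (phi[0]
            + (phi[1] + phi[3]) * e
            + (phi[3] + phi[4]) * s
            + (phi[2] + phi[3]) * c
            + (phi[5] + phi[8]) * e**2
            + phi[8] * e * s
            + (phi[7] + phi[8] + phi[9]) * e * c
            + (phi[6] + phi[9]) * c**2
            + phi[9] * c * s)

# Arbitrary illustrative coefficients phi_0 .. phi_9; phi_7 = 0 means no true ec effect.
phi = [3, -1, 4, 1, -5, 9, 2, 0, 5, 7]

grid = list(product(range(-2, 3), repeat=3))  # integer (e, s, c) points
identity_holds = all(original(phi, e, s, c) == collected(phi, e, s, c)
                     for e, s, c in grid)

# The coefficient on ec in the collected form, even though phi_7 = 0:
ec_coefficient = phi[7] + phi[8] + phi[9]
print(identity_holds, ec_coefficient)
```

With $\phi_7 = 0$, the $ec$ coefficient here is $\phi_8 + \phi_9 = 12$. A nonzero vintage-experience interaction therefore appears in the reduced form purely through the year interactions.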
• Looking at social scientists, where no interaction appears, favors Johnson-Stafford.
• Moral: auxiliary evidence and theory break the identification problem.
Cohort vs. Cross-Section Internal Rate of Return
• Take a cohort rate of return.
(1) $Y^h_{a,c}$ is the earnings of a high school graduate of cohort $c$ at age $a$.
(2) $Y^d_{a,c}$ is the earnings of a dropout of cohort $c$ at age $a$.
(3) $\rho_c = IRR_c$ (cohort internal rate of return).
(4) $\displaystyle \sum_{a=0}^{A} \frac{Y^h_{a,c} - Y^d_{a,c}}{(1 + \rho_c)^a} = 0.$
• The cross-section consists of a set of members of different cohorts.
• Start with $c = 1$ as the youngest age group and proceed.
• At a point in time, we have $a = 0 \implies c = 1$; $c + a = t$.
• The cross-section internal rate of return is
$$\sum_{a=0}^{A} \frac{Y^h_{a,1-a} - Y^d_{a,1-a}}{(1 + \rho_t)^a} = 0,$$
where $A + 1$ is the maximum age in the population.
• When can $\rho_c = \rho_t$?
• This can occur if the environment is stationary.
• Steady growth in differentials cannot produce $\rho_c = \rho_t$. Define
$$\Delta^{h,d}_{a,c} = Y^h_{a,c} - Y^d_{a,c}. \tag{3}$$
The case
$$\Delta^{h,d}_{a,c+j} = \Delta^{h,d}_{a,c} (1 + g)^j$$
will not work.
• With constant growth, $g$ cannot explain $\rho_t = \rho_c$ (!): take $c = 0, 1$ and $t = a + c$.
• Consider a model with 2 cohorts and focus on cohort $c = 0$. $\rho_c$ is the root of
$$0 = Y^h_{0,0} - Y^d_{0,0} + \frac{Y^h_{1,0} - Y^d_{1,0}}{1 + \rho_c}.$$
• The cross-section in the year cohort $c = 0$ enters ($t = a + c = 0$) is
$$0 = Y^h_{0,0} - Y^d_{0,0} + \frac{Y^h_{1,-1} - Y^d_{1,-1}}{1 + \rho_t}.$$
• In general, $\rho_c \neq \rho_t$. More generally, for cohort $\bar{c}$, the benchmark cohort, $\rho_{\bar{c}}$ is the IRR that solves
$$\sum_{a=0}^{A} \frac{Y^h_{a,\bar{c}} - Y^d_{a,\bar{c}}}{(1 + \rho_{\bar{c}})^a} = 0.$$
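• The cross section in year $t = \bar{c}$ produces the equation
$$\sum_{a=0}^{A} \frac{Y^h_{a,\bar{c}-a} - Y^d_{a,\bar{c}-a}}{(1 + \rho_t)^a} = 0,$$
where $\rho_t$ is the root.
• If growth rates across cohorts are benchmarked against $\bar{c}$, we obtain
$$\sum_{a=0}^{A} \frac{(1 + g)^{-a} \left( Y^h_{a,\bar{c}} - Y^d_{a,\bar{c}} \right)}{(1 + \rho_t)^a} = 0, \qquad \text{i.e.} \qquad \sum_{a=0}^{A} \frac{Y^h_{a,\bar{c}} - Y^d_{a,\bar{c}}}{[(1 + \rho_t)(1 + g)]^a} = 0.$$
Comparing with the cohort equation, $(1 + \rho_{\bar{c}}) = (1 + \rho_t)(1 + g)$, so for $g > 0$ clearly $\rho_t < \rho_{\bar{c}}$.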
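The relation between the two rates of return can be illustrated with a small numerical example. The sketch below assumes an invented stream of schooling differentials for the benchmark cohort and a growth rate of 2 percent (the numbers, the bracketing interval, and the bisection solver `irr` are all illustrative choices, not the lecture's). It builds the cross-section differentials by deflating each age-$a$ term by $(1+g)^a$ and solves for both roots.

```python
def irr(diffs, lo=0.0, hi=1.0, iters=200):
    """Internal rate of return by bisection.
    Assumes the present value is positive at `lo`, negative at `hi`,
    and decreasing in the rate (one up-front cost followed by gains)."""
    def present_value(rate):
        return sum(d / (1 + rate) ** a for a, d in enumerate(diffs))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if present_value(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical differentials Y^h - Y^d for the benchmark cohort at ages a = 0..4:
# a cost of schooling at a = 0, then earnings gains.
cohort_diffs = [-10.0, 3.0, 4.0, 5.0, 6.0]
g = 0.02  # assumed growth rate of differentials across successive cohorts

# In the cross section, the person aged a belongs to a cohort born a periods earlier,
# whose differential is smaller by the factor (1 + g)^(-a).
cross_section_diffs = [d / (1 + g) ** a for a, d in enumerate(cohort_diffs)]

rho_c = irr(cohort_diffs)
rho_t = irr(cross_section_diffs)

print(rho_c, rho_t, (1 + rho_t) * (1 + g) - (1 + rho_c))
```

In this example the cross-section rate falls short of the cohort rate, and the two satisfy $(1 + \rho_t)(1 + g) = (1 + \rho_c)$ up to floating-point error.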