seminar on longitudinal analysis

Seminar on Longitudinal Analysis James Heckman University of - PowerPoint PPT Presentation

Intro Review Models Covariates Diagnostic Check Duration Model Tests Seminar on Longitudinal Analysis James Heckman University of Chicago This draft, May 20, 2007 1 / 191 Intro Review Models Covariates Diagnostic Check Duration


  1. (6) Finally, a more complex hazard function with a concrete basis for the parametrization is the following. Let $t$ = elapsed calendar time and $x$ = age of an individual female. A hazard specification for the fertility of women who have had a child might be as follows:
     $$h(t, x) = \rho\,\eta(x)\,(\eta(x)\,t)^{\alpha-1}\exp[-\eta(x)\,t]$$

  2. Notice the relation between this function and the unimodal hazard of § 5. Here the parameter which governs the mode is $\eta(x)$. By specifying that $\eta(x)$ declines with the age of the female, the declining fertility of older women is incorporated into the hazard. An example of $\eta(x)$ would be
     $$\eta(x) = \begin{cases} 0 & \text{if } x < 14 \\ 1 & \text{if } 14 \le x < 25 \\ \left[1 - \left(\dfrac{x-25}{20}\right)^{\tau}\right]^{2} & \text{if } 25 \le x < 45 \\ 0 & \text{if } x \ge 45 \end{cases}$$
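The fertility hazard can be evaluated numerically. The following is a minimal sketch, not part of the original slides: the function names `eta` and `fertility_hazard` and the default values of `rho`, `alpha`, and `tau` are illustrative assumptions, and the third branch of `eta` follows the reading $[1 - ((x-25)/20)^{\tau}]^{2}$ of the piecewise definition.

```python
import math


def eta(x, tau=1.0):
    """Age profile eta(x); the 25-45 branch is an assumed reading of the slide."""
    if x < 14 or x >= 45:
        return 0.0
    if x < 25:
        return 1.0
    return (1.0 - ((x - 25.0) / 20.0) ** tau) ** 2


def fertility_hazard(t, x, rho=1.0, alpha=2.0, tau=1.0):
    """h(t, x) = rho * eta(x) * (eta(x) t)^(alpha - 1) * exp(-eta(x) t), for t > 0."""
    e = eta(x, tau)
    if e == 0.0:
        return 0.0  # no births outside the fertile age range
    return rho * e * (e * t) ** (alpha - 1.0) * math.exp(-e * t)


# Hazard one time unit after the previous birth, at several ages (illustrative values).
for age in (12, 20, 30, 40, 46):
    print(age, fertility_hazard(1.0, age))
```

With these assumed parameters the hazard at a fixed elapsed time is flat between ages 14 and 25 and zero outside ages 14 to 45.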

  3. [Figure: $\eta(x)$ plotted against age $x$, taking values between 0 and 1, with breakpoints at $x = 14$, $25$, and $45$.]

  4. Mixtures of constant hazard rates. Even in the simple model in which all individuals have some constant hazard rate, $\theta$, and these rates are distributed across the population according to some mixing density, $m(\theta)$, problems of identification arise. First, we show that such a mixing model always generates a declining proportional hazard rate, and then discuss distinguishing such a model from one in which each individual has a decreasing hazard rate.

  5. This form of heterogeneity, a mixture of constant hazard rates, is common in the duration analysis literature:
     $$S(t) = \int_0^{\infty} \exp(-\theta t)\, d\mu(\theta)$$
     where $d\mu(\theta)$ is equivalent to $m(\theta)\,d\theta$, with $m(\theta)$ = mixing density of $\theta$ in the population and $\theta$ = unobserved heterogeneous hazard rates.

  6. For a population with such a combination of individuals, the proportional, or population, hazard rate is always decreasing over time:
     $$h(t) = \frac{d}{dt}\left[-\ln S(t)\right] = \frac{\int_0^{\infty} \theta e^{-\theta t}\, d\mu(\theta)}{\int_0^{\infty} e^{-\theta t}\, d\mu(\theta)}$$
     $$\frac{d}{dt} h(t) = \frac{\left[\int_0^{\infty} \theta e^{-\theta t}\, d\mu(\theta)\right]^{2} - \int_0^{\infty} e^{-\theta t}\, d\mu(\theta) \int_0^{\infty} \theta^{2} e^{-\theta t}\, d\mu(\theta)}{\left[\int_0^{\infty} e^{-\theta t}\, d\mu(\theta)\right]^{2}}$$
     By the Cauchy-Schwarz inequality for $L^2$, the sign of $\frac{d}{dt} h(t)$ is negative:
     $$\int x(s)\,y(s)\, ds \le \left[\int x^{2}(s)\, ds\right]^{1/2} \left[\int y^{2}(s)\, ds\right]^{1/2}.$$
     Let $\int x^{2}(s)\, ds = \int_0^{\infty} e^{-\theta t}\, d\mu(\theta)$ and $\int y^{2}(s)\, ds = \int_0^{\infty} \theta^{2} e^{-\theta t}\, d\mu(\theta)$.
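The declining population hazard is easy to see numerically. The sketch below is an illustration under stated assumptions rather than anything from the slides: it takes a discrete mixing distribution with two hypothetical constant hazard rates and equal weights, and the names `mixture_survivor` and `mixture_hazard` are invented for the example.

```python
import math


def mixture_survivor(t, rates, weights):
    """S(t) = sum_k w_k * exp(-theta_k * t) for a discrete mixing distribution."""
    return sum(w * math.exp(-theta * t) for theta, w in zip(rates, weights))


def mixture_hazard(t, rates, weights):
    """Population hazard: sum_k w_k theta_k exp(-theta_k t) / S(t)."""
    num = sum(w * theta * math.exp(-theta * t) for theta, w in zip(rates, weights))
    return num / mixture_survivor(t, rates, weights)


# Two types with constant individual hazards 0.5 and 2.0 (assumed values).
rates, weights = [0.5, 2.0], [0.5, 0.5]
for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(t, mixture_hazard(t, rates, weights))
```

Every individual hazard here is constant, yet the population hazard starts at the mean rate and falls toward the smallest rate as high-rate individuals exit first.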

  7. This result is quite general but the converse is not true. One restriction that helps but does not completely resolve the difficulty of distinguishing between mixtures of constant hazards and mixtures of decreasing hazards is that of complete monotonicity. That is, mixtures of constant hazards must have derivatives which alternate in sign:
     $$(-1)^{n} \frac{d^{n} S(t)}{dt^{n}} \ge 0 \quad \text{for all } t \ge 0,\; n \in I^{+}.$$
     If the data are fine enough to support differentiation, then a simple sufficient condition (see Feller 1971, Vol. II) for this property to be violated is
     $$-h''(t_0) + 3h(t_0)h'(t_0) - h^{3}(t_0) > 0.$$

  8. However, it may be that an alternative specification with decreasing hazards,
     $$S(t) = \int_0^{\infty} e^{-\theta t^{\alpha}}\, d\mu(\theta), \quad 0 < \alpha < 1, \qquad (1)$$
     is identical to a constant hazard model
     $$\int_0^{\infty} e^{-\theta t}\, d\mu^{*}(\theta), \qquad (2)$$
     where the new mixture is properly defined. Both (1) and (2) are completely monotone, and of course “observationally equivalent.”

  9. This underidentification can be overcome only by further a priori assumptions. If only densities with finite means are allowed then (2) may be ruled out. For example, if $d\mu(\theta)$ is a gamma density, then $d\mu^{*}(\theta)$ must have a “fat” tail. Alternatively, if there is a theoretical restriction on the time dependence, which is presumed the same for all individuals, i.e., $\alpha$ is known, then the mixture $d\mu(\theta)$ is identified.

  10. Actuarial Estimators. Non-parametric estimates of $S(t)$ and $\int_0^{t} h(u)\, du$ will involve maximum likelihood estimators over a function space. No distributional assumptions regarding $m(\theta)$ are made. The first example of such a likelihood is an actuarial estimator on a single-state process.

  11. The actuarial estimator is based on the following observation: if failure times across a population are governed by the same distribution but the information on the process gives only the number of survivors at arbitrary time intervals $I_1, I_2, I_3, \ldots$ with endpoints $t_1, t_2, t_3, \ldots, t_k$, then the survivor function evaluated at each one of the time points is
      $$\begin{aligned} S(t_k) &= P(T > t_k) = P(T > t_1, T > t_2, \ldots, T > t_k) \\ &= P(T > t_k \mid T > t_1, \ldots, T > t_{k-1}) \times P(T > t_{k-1} \mid T > t_1, \ldots, T > t_{k-2}) \cdots \times P(T > t_1) \\ &= P(T > t_k \mid T > t_{k-1})\, P(T > t_{k-1} \mid T > t_{k-2}) \cdots P(T > t_1). \end{aligned}$$

  12. Notice that if $T > t_{k-1}$, then $T > t_j$ for $j < k-1$. If there are $\eta_j$ individuals at risk at the beginning of time interval $I_j$, then an estimated probability of surviving each interval is
      $$\hat{P}(T > t_j \mid T > t_{j-1}) = 1 - \frac{1}{\eta_j}.$$
      The estimated survivor function is
      $$\hat{S}(t_k) = \prod_{j=1}^{k} \left(1 - \frac{1}{\eta_j}\right).$$

  13. If the number of failures within an interval is greater than one, then the actuarial estimator may be modified. Let $d_i$ be the number of terminations in interval $I_i$:
      $$\hat{S}(t_k) = \prod_{j=1}^{k} \left(1 - \frac{d_j}{\eta_j}\right).$$
      The Kaplan-Meier non-parametric maximum likelihood estimator is an extension of this actuarial construct:
      $$\hat{S}(t) = \prod_{i:\, t_i < t} \left(1 - \frac{d_i}{\eta_i}\right).$$
      Here $t_1 < t_2 < \cdots < t$ are the actual times at which individuals experience the event, $d_i$ = number of individuals exiting at the $i$th event time, and $\eta_i$ = number of individuals at risk at the $i$th time.
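A product-limit calculation of this kind can be sketched in a few lines. This is an illustrative implementation, not code from the seminar: the function name `kaplan_meier`, the toy data, and the convention that an observation censored at an event time is still counted as at risk at that time are all assumptions of the example.

```python
def kaplan_meier(durations, observed):
    """Return [(event_time, S_hat just after that time)] for right-censored data.

    durations: observed times; observed: 1 if the event occurred, 0 if censored.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        exits = 0    # d_i: events at time t
        removed = 0  # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            exits += data[i][1]
            removed += 1
            i += 1
        if exits > 0:
            surv *= 1.0 - exits / at_risk  # factor (1 - d_i / eta_i)
            steps.append((t, surv))
        at_risk -= removed
    return steps


# Five spells; the second spell of length 2 and the spell of length 4 are censored.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Censored spells never generate a step in the estimate; they only shrink the risk set used for later event times.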

  14. Given the non-parametric likelihood, in order to perform hypothesis testing, we need a standard deviation as a function of time. Greenwood’s formula gives
      $$\hat{\gamma}(t) = \hat{S}(t) \sqrt{\sum_{j=1}^{k_t} \frac{\delta_j}{(\eta - j)(\eta - j + 1)}}$$
      where $k_t$ = value of $k$ such that $t \in [t_{(k)}, t_{(k+1)})$.

  15. The Aalen estimator of integrated hazard is a good descriptive device which uses the same technique used in procedures for estimating multi-state transition rates which may involve complicated time dependence. The survivor function is
      $$S(t) = \exp\left(-\int_0^{t} h(u)\, du\right).$$
      A procedure to estimate the integrated hazard sums the ratio of exiting individuals to the number remaining at risk at each event time:
      $$\widehat{\int_0^{t} h(u)\, du} = \sum_{i:\, t_i < t} \frac{d_i}{\eta_i}.$$

  16. Let $\hat{S}(t)$ be the Kaplan-Meier estimator. The relation of the Aalen estimate to $\hat{S}(t)$ may be seen by the following transformation:
      $$-\ln \prod_{i:\, t_i < t} \left(1 - \frac{d_i}{\eta_i}\right) = -\sum_{i:\, t_i < t} \ln\left(1 - \frac{d_i}{\eta_i}\right) = \widehat{\int_0^{t} h(u)\, du}.$$

  17. Each term of this Kaplan-Meier integrated hazard, when written as a series expansion, is
      $$-\ln\left(1 - \frac{d_i}{\eta_i}\right) = \frac{d_i}{\eta_i} + \frac{d_i^{2}}{2\eta_i^{2}} + \frac{d_i^{3}}{3\eta_i^{3}} + \cdots.$$
      The Aalen estimator ignores the higher order terms of this expansion. If the number at risk is large, most of the weight in this series does indeed fall on the first term alone. The Aalen estimator also tends to correct the bias introduced by the nonlinear transformation $-\ln(S(t))$.
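The size of the neglected higher order terms can be checked directly. The sketch below is an illustration with made-up counts: the function name `cumulative_hazards` is hypothetical, and the inputs are assumed to satisfy $d_i < \eta_i$ at every event time.

```python
import math


def cumulative_hazards(event_counts, at_risk):
    """Running (Aalen, Kaplan-Meier) integrated hazard estimates at each event time.

    Aalen sums d_i / eta_i; the Kaplan-Meier version sums -ln(1 - d_i / eta_i).
    """
    aalen = 0.0
    km = 0.0
    rows = []
    for d, n in zip(event_counts, at_risk):
        aalen += d / n
        km += -math.log(1.0 - d / n)
        rows.append((aalen, km))
    return rows


# Large risk set: the two estimates nearly coincide.
print(cumulative_hazards([1, 1, 1], [100, 99, 98])[-1])
# Small risk set: the higher order terms matter.
print(cumulative_hazards([1, 1], [3, 2])[-1])
```

Because every term of the logarithmic series is positive, the Aalen sum never exceeds the Kaplan-Meier integrated hazard, and the gap shrinks as the risk set grows.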

  18. In failure time models, once an individual experiences the event, he is out of the pool of individuals at risk for the rest of the survey. In the following example taken from economics, transitions between unemployment, employment and being out of the labor force may be repeated any number of times. The Aalen estimator may be used in this context as well in testing a hypothesis regarding the transition rates between states.

  19. Given three employment states, there are six possible transition rates. [Figure: transition diagram among the states U, E, and O.]

  20. Define the following event rates: $r_{U,E}(t)$ = expected number of unemployment to employment transitions per unit time, per individual at risk; $r_{O,E}(t)$ = expected number of transitions from out of the labor force to employment. Flinn and Heckman pose the question of whether unemployment and out of the labor force should be designated as separate classifications.

  21. To test this, they examine the hypothesis $H_0 : r_{U,E}(t) = r_{O,E}(t)$. First, the assumption that transitions depend on past history is made. Suppose data available gives a counting process over the entire population. The six cumulative counts are
      $$\tilde{N}(t) = \left(N_{U,E}(t),\, N_{N,E}(t),\, N_{E,U}(t),\, \ldots\right).$$
      Note that individuals may appear in these counts more than once.

  22. The Aalen estimator for the integrated transition rate is
      $$\widehat{\int_0^{t} r_{U,E}(u)\, du} = \sum_{k:\, 0 \le t_k < t} \frac{d_k}{Y_U\!\left(t^{U,E}_{(k)}\right)}$$
      where $Y_U$ = the number of individuals in state U at time $t$. Although $Y_U$ need not decline monotonically with multistate flows, it is still the number of individuals at risk in state U.

  23. The test relies on the query, “how frequently do events occur per time period per individual?” The event times of the two transition patterns do not have to match, but having complete event history data is crucial. [Figure: two timelines, one for O, E transitions at times $s_1, s_2, s_3, \ldots, s_k$ and one for U, E transitions at times $t_1, t_2, t_3, \ldots, t_k$.] The test asks whether
      $$\widehat{\int_0^{t} r_{U,E}(u)\, du} = \sum_{k:\, 0 \le t_k < t} \frac{d_k}{Y_U\!\left(t^{U,E}_{(k)}\right)}$$
      is equal to
      $$\widehat{\int_0^{t} r_{O,E}(u)\, du} = \sum_{k:\, 0 \le t_k < t} \frac{d_k}{Y_O\!\left(t^{O,E}_{(k)}\right)}.$$

  24. If event history were not available, and instead prospective point data were, then multiple intermediate transitions would be unobservable. To infer what jumps occurred between observed points, one might try to fit a Markov or semi-Markov process.

  25. Estimation of Separable Hazard Models. In this section, the problem of estimating the form of time dependence is addressed. Specifically, the sensitivity of the time dependence estimate to the form of the distribution of unobserved heterogeneity assumed and to the parametrization of the time dependency is examined. Finally, the nonparametric method of the EM algorithm is presented as an alternative to the standard maximum likelihood methods.

  26. If the survivor function is of the separable form
      $$S(t \mid x) = \left[S_0(t)\right]^{\exp(X\beta)}$$
      then partial likelihood estimation will yield a $\hat{\beta}$ estimate, and parametric specifications of the time dependence, such as $h_0(t) = \alpha t^{\alpha-1}$ or $h_0(t) = e^{\alpha t}$, where
      $$S_0(t) = \exp\left(-\int_0^{t} h_0(u)\, du\right),$$
      may be estimated with standard techniques. Given the variety of functional forms that might be chosen for the time dependence, one would perhaps like the data to speak for themselves, in the absence of any theory on time dependence.

  27. A nonparametric approach would proceed in the following fashion: choose a priori a set of time intervals that need not correspond to jump times associated with occurrences of transitions. They must satisfy the condition that they are long enough to be non-empty of events.
      $$h_0(t) = \begin{cases} e^{\alpha_1} & \text{for } 0 \le t < t_1 \\ e^{\alpha_2} & \text{for } t_1 \le t < t_2 \\ \;\vdots & \\ e^{\alpha_k} & \text{for } t_{k-1} \le t < t_k \end{cases}$$

  28. This places no restrictions of monotonicity or on the number of modes for the hazard function. The integrated hazard will be
      $$\int_0^{t_k} h_0(u)\, du = \sum_{j=1}^{k} e^{\alpha_j}\,(t_j - t_{j-1})$$
      for a stepwise hazard.
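The integrated stepwise hazard can be computed for any $t$, not only at interval endpoints. The following is a minimal sketch under assumptions: the function name `integrated_hazard`, the cutpoints, and the $\alpha_j$ values are invented for illustration, and $t_0 = 0$ is taken as the start of the first interval.

```python
import math


def integrated_hazard(t, cutpoints, alphas):
    """Integral of a stepwise hazard exp(alpha_j) on [t_{j-1}, t_j), with t_0 = 0.

    cutpoints: increasing interval endpoints t_1 < ... < t_k.
    alphas:    log-hazard levels alpha_1, ..., alpha_k.
    """
    total = 0.0
    lower = 0.0
    for upper, a in zip(cutpoints, alphas):
        if t <= lower:
            break
        # Time spent in this interval is capped at t.
        total += math.exp(a) * (min(t, upper) - lower)
        lower = upper
    return total


cutpoints = [1.0, 2.0, 4.0]
alphas = [0.0, math.log(2.0), math.log(0.5)]  # hazard levels 1, 2, 0.5 (assumed)
H = integrated_hazard(4.0, cutpoints, alphas)
print(H, math.exp(-H))  # integrated hazard and baseline survivor S_0(4)
```

The hazard levels here rise and then fall, which illustrates that the step specification imposes neither monotonicity nor a single mode.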

  29. [Figure: Stepwise Hazard Function.]

  30. Intro Review Models Covariates Diagnostic Check Duration Model Tests The estimated hazards are �� �� � α k = ln d k = ln δ i exp X e i β e i ∈ R k where d k = number of individuals who experience the event of interest in the half open interval [ t k − 1 , t k ) R k = set of individuals at risk in the interval [ t k − 1 , t k ) δ i =1 if the individual is uncensored. Again, δ i points out that non-censored individuals provide information regarding time dependence. 52 / 191

  31. If the estimated $\hat{h}_0(t)$ is of the form in the graph (A), which appears unimodal, this would support a specification of
      $$h_0(t) = \frac{\lambda\alpha(\lambda t)^{\alpha-1}}{1 + (\lambda t)^{\alpha}} \quad \text{for } \alpha > 1.$$

  32. [Figure (A): Estimated $\hat{h}_0(t)$, with $h(t)$ plotted against $t$.]

  33. [Figure (B): Estimated $\hat{h}_0(t)$ plotted against $t$.]

  34. Survivor Analysis and GMLE. A brief look at the theory underlying generalized maximum likelihood follows. (See Kiefer and Wolfowitz.) Define $d\mu(x)$ to be a dominating measure if, for $dP_0(x) = f(x)\, d\mu(x)$, $\mu(A) = 0$ implies that $P_0(A) = 0$. Let $\mathcal{P}$ be the class of all probability measures. For every pair of probability measures $P_1$ and $P_2$ in the class $\mathcal{P}$, define
      $$f(x; P_1, P_2) = \frac{dP_1(x)}{d(P_1 + P_2)(x)}.$$

  35. Radon-Nikodym derivative. Here, $f(x; P_1, P_2)$ plays the role of the likelihood. The measure $\hat{P}$ is a generalized maximum likelihood estimator if $f(x; \hat{P}, P) \ge f(x; P, \hat{P})$, or
      $$\frac{d\hat{P}(x)}{d(\hat{P} + P)(x)} \ge \frac{dP(x)}{d(\hat{P} + P)(x)}.$$

  36. Let $T_1, T_2, \ldots, T_n$ be the times an event of interest occurs in a population surveyed. To extend the analysis, we now allow a censoring of the data in a random manner. Let $C_1, C_2, \ldots, C_n$ be the censoring times: this is equivalent to an individual dropping out of a sample population before experiencing the event in question.

  37. The observable is $Y_i = \min(T_i, C_i)$. At each time $Y_i$, an individual either makes the transition out of the state having experienced the event or drops out from the sample’s observed population “prematurely.” Let the $\delta_i$ variable indicate whether $y_i$ is censored or not:
      $$\delta_i = \begin{cases} 0 & \text{if the individual is censored} \\ 1 & \text{if not} \end{cases}$$
      Consider the probability distributions on $x = ((y_1, \delta_1), (y_2, \delta_2), \ldots, (y_n, \delta_n))$.

  38. Finding a generalized maximum likelihood estimator of the hazard rate is the same as finding a $P(x)$ such that the observed events are given the maximum probability:
      $$P(x) = \prod_{i=1}^{n} \left\{\Pr(T = y_{(i)})\right\}^{\delta_i} \left\{\Pr(T > y_i)\right\}^{1-\delta_i}.$$
      The first bracketed term is the probability of an event occurring at exactly time $T$. It is given weight only if the event is uncensored, i.e., when $\delta_i = 1$. The second term is the probability that the event occurs any time after the time $T$, which is the most information that a censoring at $T$ yields. It receives weight only if $\delta_i = 0$.

  39. Intro Review Models Covariates Diagnostic Check Duration Model Tests More succinctly, the problem is to � � � n � n max p i · p j for p 1 , p 2 , . . . , p n ≥ 0 . i =1 j =1 When the number of individuals is finite, the sum of the number of events and censoring must also be finite. (The two are equal.) 61 / 191

  40. When the time partitioning is fine enough so that no two events occur exactly simultaneously, then the solution to the maximum problem is exactly the Kaplan-Meier estimate:
      $$\hat{P}_i = \frac{\delta_i}{\eta - i + 1} \prod_{j=1}^{i-1} \left(1 - \frac{\delta_j}{\eta - j + 1}\right).$$
      A comparison of time dependent rate estimates depends on the assumption of a homogeneous population. Finding differences across covariates requires further investigation.

  41. Duration Models with Covariates. Complicating the underlying process by postulating that it is affected by some covariates (which we assume are observable, for now) leads to gains in estimation efficiency, given some specification of the form of duration dependence. The issue which arises, on which statisticians and social scientists are divided, is whether to model the durations (i.e., waiting times) themselves, or to model the rates of exit (the hazard function). Application of a standard regression framework to durations implies complex hazard specifications. Statisticians tend to favor modeling curves to fit the hazard itself with more convenient parameterizations.

  42. Regression Framework. In order to apply linear regression to waiting times, a standard technique is to take a log transformation, mapping non-negative waiting times onto the entire real line. Then a symmetric disturbance term $\varepsilon_i$ may be applied to a regression of log durations:
      $$\ln t_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ji} + \varepsilon_i.$$
      Here $x_{ji}$ = the value of the $j$th covariate for individual $i$, and $\varepsilon_i$ is iid, $\sim \Phi(0, \delta^{2})$, where $\Phi$ is the normal cdf.

  43. The theorizing here is at the level of linking the covariates to the expected value of log duration. The hazard is subsumed in this specification, and is worth examining. Although the waiting times are straightforward, the hazard is complex (read “crazy”).

  44. Let $\ln T = \beta_0 + X'\beta + \varepsilon$, so that $T = \exp(\beta_0 + X'\beta + \varepsilon)$. The conditional survivor function is
      $$\begin{aligned} P(T > t \mid X) &= P\left(\exp(\beta_0 + X'\beta)\exp(\varepsilon) > t\right) \\ &= P\left(\exp(\varepsilon) > t \exp(-\beta_0 - X'\beta)\right) \\ &= P\left(\varepsilon > \ln t - \beta_0 - X'\beta\right) \\ &= 1 - \Phi\left(\ln t - \beta_0 - X'\beta\right). \end{aligned}$$

  45. The survivor function is
      $$S(t) = 1 - \Phi\left(\ln t - \beta_0 - X'\beta\right) = \exp\left(-\int_0^{t} h(u)\, du\right).$$
      The hazard may be retrieved as well by differentiation:
      $$\int_0^{t} h(u)\, du = -\ln\left[1 - \Phi\left(\ln t - \beta_0 - X'\beta\right)\right]$$
      $$h(t) = \frac{\frac{1}{t}\,\phi\left(\ln t - \beta_0 - X'\beta\right)}{1 - \Phi\left(\ln t - \beta_0 - X'\beta\right)}.$$
      Another drawback of this method is having to know the completed waiting times $t_i$. Traditionally, one may know only transition times censored in some fashion.
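The shape of the hazard implied by the log-duration regression can be inspected numerically. This is a hedged sketch: it assumes a unit-variance normal disturbance, collapses $\beta_0 + X'\beta$ into a single hypothetical argument `mu`, and the helper names are invented for the example.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_hazard(t, mu):
    """h(t) = (1/t) phi(ln t - mu) / [1 - Phi(ln t - mu)], with mu = beta_0 + X'beta."""
    z = math.log(t) - mu
    return normal_pdf(z) / (t * (1.0 - normal_cdf(z)))


for t in (0.05, 0.5, 1.0, 5.0, 20.0):
    print(t, lognormal_hazard(t, mu=0.0))
```

The printed values first rise and then fall, so a linear model for log durations implies a non-monotone hazard whose peak shifts with the covariates.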

  46. Consider a More General Approach. This approach postulates a general separable hazard specification where the time dependent portion is multiplicatively separable from the portion which varies with $X$, some vector of covariates, i.e.,
      $$h(t \mid X) = \psi(t)\, u(x),$$
      where $\psi(t)$, $u(x)$ are non-negative valued functions.

  47. An example of such a hazard is the Cox specification:
      $$h(t \mid X) = \psi(t)\exp(X'\beta)$$
      or,
      $$\exp\left(-\int_0^{t} h(u \mid X)\, du\right) = \left[S_0(t)\right]^{\exp(X'\beta)},$$
      where
      $$S_0(t) = \exp\left(-\int_0^{t} \psi(u)\, du\right).$$
      A full maximum likelihood estimator of the hazard and the coefficients on the covariates is constructed as follows.

  48. Define $D_i$ = indicator for individuals who experience an event of interest at time $i$, and $C_i$ = indicator for censored individuals: those who drop out of the sample before they experience an event. By looking at the conditional survivor function, we find the contribution to the likelihood for individuals experiencing the event is
      $$\left[S_0(t_{(i)})\right]^{\exp(X\beta)} - \left[S_0(t^{+}_{(i)})\right]^{\exp(X\beta)}$$
      where $t^{+}_{(i)}$ denotes the time just after $t_{(i)}$, i.e., $t_{(i)} + \Delta$.

  49. For a censored individual, the contribution is
      $$\left[S_0(t^{+}_{(i)})\right]^{\exp(X\beta)}.$$
      The likelihood is therefore
      $$L = \prod_{\ell \in D_i} \left\{\left[S_0(t_{(i)})\right]^{\exp(X_\ell\beta)} - \left[S_0(t_{(i)} + \Delta)\right]^{\exp(X_\ell\beta)}\right\} \times \prod_{\ell \in C_i} \left[S_0(t^{+}_{(i)})\right]^{\exp(X_\ell\beta)}.$$

  50. Following a method suggested by Cox, if one is interested in the covariates, then one might maximize the partial likelihood. This ignores the time-dependent part of the hazard, separating out the changes which enter through the covariates. There is an additional requirement that the covariates $X$ be time-invariant (see B. Efron). The Cox likelihood proposal is
      $$\max_{\beta} \prod_{i=1}^{n} \frac{\exp(X_i\beta)}{\sum_{\ell \in R(t_{(i)})} \exp(X_\ell\beta)}$$
      where $R(t_{(i)})$ is the set of individuals at risk at time $t_{(i)}$.
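The log of the partial likelihood is straightforward to evaluate for a given $\beta$. The sketch below is illustrative only and makes several assumptions not stated in the slides: a single scalar covariate, no tied event times, censored spells entering only through the risk sets, and a hypothetical function name `cox_partial_loglik`.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for one time-invariant covariate, assuming no ties.

    times: observed durations; events: 1 if the event occurred, 0 if censored;
    x: covariate values. The risk set at t_(i) is everyone with time >= t_(i).
    """
    loglik = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored spells contribute only through risk sets
        risk_sum = sum(
            math.exp(beta * x[l]) for l in range(len(times)) if times[l] >= times[i]
        )
        loglik += beta * x[i] - math.log(risk_sum)
    return loglik


times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
x = [1.0, 0.0, 1.0, 0.0]
for beta in (-1.0, 0.0, 1.0):
    print(beta, cox_partial_loglik(beta, times, events, x))
```

The baseline time dependence never appears in the function, which is the sense in which the partial likelihood ignores the time-dependent part of the hazard.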

  51. Heuristically, this term may be related to the full maximum likelihood by the following argument. The conditional probability that person $(i)$ experiences an event at $t_{(i)}$, given that $R(t_{(i)})$ individuals are at risk and that exactly one event occurs at time $t_{(i)}$, is
      $$h(t \mid X_i) = P(t < T < t + \delta \mid T > t) \sim \phi(t)\exp(X_i'\beta).$$

  52. By the definition of conditional probability, this is
      $$\frac{\text{the fraction with covariates } X_i \text{ who exit in interval } (t, t+z)}{\text{total population at risk as of time } t} = \frac{\phi(t)\exp(X_{(i)}\beta)}{\sum_{\ell \in R(t^{+})} \phi(t)\exp(X_\ell\beta)}.$$
      Making the assumption that the nature of time dependence is the same for all individuals, $\phi(t)$ cancels out, yielding one element of the Cox partial likelihood objective.

  53. A Diagnostic Check for Specific Hazards. Given the expense of generalized maximum likelihood estimation, it is useful to have simple diagnostic tests of the specification of the time dependence in a duration model. A test of proportional hazards under Weibull time dependence can be performed by a graphical technique. Let
      $$S(t \mid x) = P(T > t \mid X = x) = \left[S_0(t)\right]^{\exp(X\beta)}.$$
      Taking a log transformation twice over, we have
      $$-\ln S(t \mid x) = -\ln S_0(t)\exp(X\beta)$$
      $$\ln(-\ln S(t \mid x)) = \ln(-\ln S_0(t)) + X\beta. \qquad (3)$$

  54. When the time dependence part of the survivor function is of the Weibull family, then $S_0(t) = \exp(-t^{\alpha})$ and the first term of equation (3) is $\ln(-\ln S_0(t)) = \alpha \ln t$, so that
      $$\ln(-\ln S(t \mid x)) = \alpha \ln t + X\beta.$$

  55. Graphing $\ln(-\ln S(t \mid x))$, a “double” log transformation of the survivor function, against $\ln(t)$ should produce a family of straight lines of the same slope, $\alpha$. The intercepts should vary according to $X\beta$ under this separable specification of $S(t \mid x)$. If the graph does not conform then this convenient specification does not apply.
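The straight-line property behind this graphical check can be verified on exact Weibull survivor functions. The sketch below is an illustration under assumptions: the function names `double_log_survivor` and `slope` are hypothetical, the index $X\beta$ is passed in as a single number, and the parameter values are made up.

```python
import math


def double_log_survivor(t, x_beta, alpha):
    """ln(-ln S(t|x)) for S(t|x) = [exp(-t^alpha)]^exp(x_beta)."""
    s = math.exp(-(t ** alpha) * math.exp(x_beta))
    return math.log(-math.log(s))


def slope(t1, t2, x_beta, alpha):
    """Slope of the double-log curve against ln t between two time points."""
    rise = double_log_survivor(t2, x_beta, alpha) - double_log_survivor(t1, x_beta, alpha)
    return rise / (math.log(t2) - math.log(t1))


alpha = 1.5
for x_beta in (0.0, 0.7):
    # Same slope for each covariate value; only the intercept moves.
    print(x_beta, slope(0.5, 2.0, x_beta, alpha), double_log_survivor(1.0, x_beta, alpha))
```

With estimated survivor functions in place of the exact ones, curves that are not parallel straight lines would count against the separable Weibull specification.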

  56. [Figure: Double log Transformation of Survivor Function Against $\ln(t)$; vertical axis $\ln(-\ln(S(t)))$, horizontal axis $\ln(t)$, with the slope of the line marked.]

  57. More generally, unless parallel curves are generated by plots of $\ln[-\ln S(t \mid x)]$ against $\ln t$ for various chosen values of the covariates $x$, the assumption of separability cannot hold. For an example, see J. Menken, J. Trussell, D. Stempel, O. Babakol, Demography, Vol. 18, 1981, pp. 181-200 (on marital dissolution).

  58. A Duration Model from Economic Theory. In a search model of unemployment, the Poisson arrival of new job offers and a reservation wage strategy of income maximizing workers generate observed unemployment spells which have fundamentally non-separable hazards. Heterogeneity across workers is likely to exist either in costs of search or wage offer distributions. This model is outlined here as an example of a duration model based on optimizing behavior by economic agents. A thorough presentation may be found in Lippman & McCall (1976).

  59. Let $\lambda$ = Poisson encounter rate with new job offers; $V$ = value of search; $rV$ = reservation wage; $c$ = instantaneous cost of search; $r$ = instantaneous interest rate; $F(w)$ = distribution of wage offers, assumed to have finite mean.

  60. Agents maximize income subject to the following scheme: if search cost $c$ is incurred, job offers arrive at rate $\lambda$ independent of $c$. Wage offers are independently drawn without recall from distribution $F(w)$. Agents are infinite-lived and jobs last forever, having present discounted value $w/r$. The value of search is
      $$V = \begin{cases} \dfrac{-c\,\Delta t}{1 + r\Delta t} + \dfrac{1 - \lambda\Delta t}{1 + r\Delta t}\, V + \dfrac{\lambda\Delta t}{1 + r\Delta t}\, E\max\left[\dfrac{w}{r},\, V\right] + o(\Delta t) & \text{if } V > 0 \\ 0 & \text{otherwise.} \end{cases}$$

  61. Passing to the limit, we have
      $$c + rV = \frac{\lambda}{r} \int_{rV}^{\infty} (w - rV)\, dF(w)$$
      and reservation strategy:
      $$d = \begin{cases} 1 & \text{if } w > rV, \text{ job offer accepted} \\ 0 & \text{if } w \le rV, \text{ job offer rejected.} \end{cases}$$
      Then the probability that an unemployment spell exceeds duration $t_u$, given hazard rate of exit (acceptance of a job) $h_u = \lambda(1 - F(rV))$, is
      $$\Pr(T_u > t_u) = \exp\left(-\lambda(1 - F(rV))\, t_u\right).$$
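The reservation wage equation can be solved numerically once a wage offer distribution is chosen. The sketch below is a worked illustration under strong assumptions, none of which come from the slides: wage offers are exponential with mean `mean_wage`, so that $\int_{rV}^{\infty}(w - rV)\,dF(w) = m\,e^{-rV/m}$; the root is found by bisection; and the function names and parameter values are hypothetical.

```python
import math


def reservation_wage(c, r, lam, mean_wage):
    """Solve c + R = (lam / r) * m * exp(-R / m) for R = rV by bisection.

    Assumes exponential wage offers with mean m. Returns 0.0 when the equation
    has no positive root, i.e. when search is not worthwhile (V = 0).
    """
    def gap(R):
        return c + R - (lam / r) * mean_wage * math.exp(-R / mean_wage)

    if gap(0.0) >= 0.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while gap(hi) < 0.0:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):  # gap is increasing in R, so bisection converges
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exit_hazard(c, r, lam, mean_wage):
    """h_u = lam * (1 - F(rV)) under the exponential wage assumption."""
    R = reservation_wage(c, r, lam, mean_wage)
    return lam * math.exp(-R / mean_wage)


R = reservation_wage(c=1.0, r=0.05, lam=0.5, mean_wage=10.0)
h = exit_hazard(c=1.0, r=0.05, lam=0.5, mean_wage=10.0)
print(R, h, math.exp(-h * 4.0))  # reservation wage, hazard, Pr(T_u > 4)
```

Raising the search cost lowers the reservation wage and raises the exit hazard, which is how covariates that shift $c$ or $\lambda$ enter the duration distribution through the optimization rather than through a separable factor.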

  62. Returning to the entire population, with some observed characteristics $x$ and unobserved characteristics $\theta$, we have
      $$\Pr(T_u > t_u \mid x) = \int_{\Theta} \exp\left[-\lambda(x, \theta)\left(1 - F(rV(x, \theta))\right) t_u\right] d\mu(\theta)$$
      where
      $$\Theta = \left\{\theta \;\middle|\; 0 < \frac{\lambda}{r} \int_{rV}^{\infty} (w - rV(\theta, x))\, dF(w) - c(x, \theta) \le \omega\right\}.$$
      Obviously, there is no general separable hazard specification that emerges. Instead, the hazard and covariates are linked by the solution to the Bellman equation, in which the reservation wage $rV$ is implicitly defined. See Heckman and Singer (1984).

  63. Duration Models with Unobservables. In most applications, the analyst has data on some process, along with some observable characteristics regarding individuals included in the survey. In addition to these, there may be other characteristics of these individuals which are factors affecting the process but which are unmeasured. For example, in the Stanford study of heart transplant recipients, it is likely that each patient differed in some dimension, call it “frailty”, which affected their survival times after receiving transplants. Thus, the survivor function includes an unobservable $\theta$:
      $$P(T > t \mid x, \theta) = \exp\left(-H(t)\,U(x)\,V(\theta)\right) = \exp\left(-H(t)\exp(x'\beta + \theta)\right).$$

  64. The statistician has the following issues to address: what estimation strategy to use to obtain $\hat{\beta}$ in the presence of $\theta$, and what estimation can reveal regarding the distribution of $\theta$ itself. Even when this distribution of $\theta$ is of no interest to the analyst, its presence will affect the consistency of estimation of $\beta$. Assuming some form of the mixing distribution of $\theta$, $d\mu(\theta)$, the data may be confronted with the integrated survivor function, where $\theta$ has been integrated out:
      $$P(T > t \mid X) = \int \exp\left(-H(t)\exp(X'\beta + \theta)\right) d\mu(\theta).$$

  65. Commonly used functional forms for $d\mu(\theta)$ are gamma and normal distributions:
      $$\text{Gamma: } d\mu(\theta) = \frac{b^{a}\,\theta^{a-1}\exp(-b\theta)}{\Gamma(a)}\, d\theta$$
      $$\text{Normal: } d\mu(\theta) = \frac{\exp\left(-(\theta - a)^{2}/2b\right)}{\sqrt{2\pi b}}\, d\theta.$$
      Both of these offer a flexible, analytically tractable and computationally convenient family of distributions. Both are fully described by the specification of two parameters $a$, $b$ above.

  66. For all the claimed convenience of these specifications, along with lognormal variation, they do not all yield the same qualitative estimates for the coefficients on time dependence and observed covariates. Sensitivity of these estimators is apparent in one study, the Heckman and Singer analysis of labor earnings data, but not in the Manton, Stallard and Vaupel study of mortality risks among the aged.

  67. Mortality Risks Among the Aged

      I. Weibull Parameter Estimates: $\phi(t) = \alpha t^{\alpha-1}$

      | Age | Gamma Heterogeneity | Inverse Gaussian | No Heterogeneity |
      |-----|---------------------|------------------|------------------|
      | 65+ | 5.89 (.05) | 5.88 (.08) | 5.44 (.04) |
      | 67+ | 6.00 (.06) | 5.98 (.09) | 5.46 (.04) |
      | 70+ | 6.39 (.07) | 6.35 (.11) | 5.69 (.05) |

      II. Gompertz Parameter Estimates: $\phi(t) = e^{\gamma t}$

      | Age | Gamma Heterogeneity | Inverse Gaussian | No Heterogeneity |
      |-----|---------------------|------------------|------------------|
      | 65+ | $7.48 \cdot 10^{-2}$ ($.09 \cdot 10^{-2}$) | $7.72 \cdot 10^{-2}$ ($.21 \cdot 10^{-2}$) | $6.26 \cdot 10^{-2}$ ($.06 \cdot 10^{-2}$) |
      | 67+ | $7.56 \cdot 10^{-2}$ ($.10 \cdot 10^{-2}$) | $7.77 \cdot 10^{-2}$ ($.23 \cdot 10^{-2}$) | $6.11 \cdot 10^{-2}$ ($.06 \cdot 10^{-2}$) |
      | 70+ | $7.97 \cdot 10^{-2}$ ($.12 \cdot 10^{-2}$) | $8.13 \cdot 10^{-2}$ ($.25 \cdot 10^{-2}$) | $6.18 \cdot 10^{-2}$ ($.05 \cdot 10^{-2}$) |

  68. Although the labor economics study shows qualitative differences in the model and in the duration dependence, the mortality study shows duration dependence that is robust to alternative specifications of the mixing distribution. It has yet to be shown what characterizes a data set which will be robust to alternative heterogeneity specifications.

  69. Alternatively, in a study of child mortality, the specification of time dependence is shown to be sensitive to the inclusion of unobservables in the estimation strategy. Nonparametric maximum likelihood estimation has been shown in Monte Carlo studies to give unbiased estimates of covariate coefficients, provided one has chosen a particular form of time dependence. This is where theory must play an important role.

  70. [Figure: two panels, Weibull and Gompertz, each comparing estimates with unobservables ignored against estimates with unobservables estimated by NPMLE; vertical axes marked at 0.2, 0.4, and 0.6.]

  71. Without strong enough theory to suggest the “true” underlying functional form of time dependence, these studies suggest, the effect of covariates and time dependence cannot be distinguished, even with the use of a nonparametric estimation strategy.

  72. The following digression on an area of statistics on the extraction of “true test scores” focuses on the issues that underlie the reasons why duration models with underlying heterogeneity need strong restrictions a priori from theory to obtain identification. The demands on the data from models of this sort are far greater than those of regression models.

  73. True test scores are discussed in Lord and Novick, Statistical Theory of Mental Test Scores (1968). Let $X$ = observed test score; $\xi$ = errors, with density $h_\xi$; $u$ = unobserved true test score, with density $g_u$. Then $X = \xi + u$ and
      $$f(X) = \int_{-\infty}^{\infty} h_\xi(X - t)\, g_u(t)\, dt.$$

  74. Notice that a survivor function of form
      $$S(t) = \int k(t \mid \theta)\, d\mu(\theta)$$
      is a general form of $h_\xi(X - \theta)$. The density of true test scores $g_u(t)$ is the analog of the mixing distribution. The question of both problems is fundamentally the same: when can an observed histogram be decomposed and purged of some noise? J.W. Tukey does so with strong assumptions regarding the densities $h_\xi$ and $g_u$ in “Named and Faceless Values,” Sankhya 1974.

  75. Heuristically, you want to decompose observed from true test scores, something that can be accomplished only with some already known characteristics of the errors:
      $$\hat{\theta}_i = X_i + \frac{\sigma^{2}_{Z}}{\sigma^{2}_{X}}\left(\bar{X} - X_i\right).$$
      In this linear model, an empirical Bayes estimator $\hat{\theta}$ uses a variance obtained from a previous study by ETS on errors in test scores to extract an estimate of $u$.

  76. For the survivor analysis, we need a nonlinear version of this. For each individual, identify a normal density with mean $\theta_i$ and variance $\sigma^{2}_{i}$, where $\sigma^{2}_{i}$ is the variance of $\theta_i$. The estimated true score distribution must be extracted from an observed histogram, where each portion of that histogram is one observation on an underlying distribution for that one individual.


