Definitions and Some Examples of Biased Samples

The distribution of Y* is

G(y* | Y > c) = F(y* | Y > c) = F(y* | ∆ = 1) = F(y*) / [1 − F(c)],   y* > c,   (6a)

with a point mass at Y* = 0 (convention) for Y* = 0 (∆ = 0).   (6b)

22 / 125
Definitions and Some Examples of Biased Samples

Observe that (6a) is obtained from (1) by setting ω(y*) = 1 if y* > c and ω(y*) = 0 otherwise, and integrating up with respect to y*.

The distribution of ∆ is

Pr(∆ = δ) = [1 − F(c)]^δ [F(c)]^(1−δ),   δ ∈ {0, 1}.

The joint distribution of (Y*, ∆) for a censored sample:

F(y*, δ) = F(y* | δ) Pr(δ)                                                (7)
         = [F(y*) / (1 − F(c))]^δ (1)^(1−δ) [1 − F(c)]^δ [F(c)]^(1−δ)
         = [F(y*)]^δ [F(c)]^(1−δ).

23 / 125
Definitions and Some Examples of Biased Samples (7) is obtained from (4) by setting ω(y) = 0 for y < c and ω(y) = 1 otherwise, by setting i(y) = ω(y), and by integrating up with respect to y*. For normally distributed Y, (7) is the “Tobit” model. 24 / 125
Definitions and Some Examples of Biased Samples More information in a censored sample than in a truncated sample because one can obtain (6a) from (7) (by conditioning on ∆ = 1) but not vice versa. 25 / 125
Definitions and Some Examples of Biased Samples Inferences about the population distribution based on assuming that F ( y ∗ | Y > c ) closely approximates F ( y ) are potentially very misleading. A description of population income inequality based on a subsample of high income people may convey no information about the true population distribution. 26 / 125
Definitions and Some Examples of Biased Samples Without further information about F and its support, it is not possible to recover F from G(y*) using either a censored or a truncated sample. Access to a censored sample enables the analyst to recover F(y) for y > c but obviously does not provide any information on the shape of the true distribution for values of y ≤ c. 27 / 125
Definitions and Some Examples of Biased Samples The problem is routinely “solved” by assuming that F is of a known functional form. This solution strategy does not always work. If F is normal, then it can be recovered from a censored or truncated sample (Pearson, 1900). If F is Pareto, F cannot be recovered from either a truncated or a censored sample (see Flinn and Heckman, 1982b). Show this. If F is real analytic (i.e., possesses derivatives of all orders) and the support of Y is known, then F can be recovered (Heckman and Singer, 1986). 28 / 125
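To make the Pearson-type result concrete, the following minimal sketch (not part of the original notes) recovers a normal F from a truncated sample alone by maximum likelihood. The truncation point c is assumed known, and the parameter values, sample size, and function names are purely illustrative.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
mu, sigma, c = 1.0, 2.0, 0.5              # true parameters and truncation point (illustrative)

y = rng.normal(mu, sigma, 200_000)
y_trunc = y[y > c]                        # truncated sample: only draws with Y > c survive

def neg_loglik(theta):
    m, log_s = theta
    s = np.exp(log_s)                     # log-parameterize sigma so it stays positive
    # density of Y given Y > c:  phi((y - m)/s)/s divided by 1 - Phi((c - m)/s)
    return -(norm.logpdf(y_trunc, m, s) - norm.logsf(c, m, s)).sum()

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]), method="BFGS")
print("recovered mu, sigma:", res.x[0], np.exp(res.x[1]))   # close to (1.0, 2.0)
```

Note how heavily the recovery leans on the normality assumption: by the Pareto result cited above, the same exercise with a Pareto F would not recover the full distribution, which is the sense in which the strategy does not always work.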
Definitions and Some Examples of Biased Samples

Example 2. Expand the previous discussion to a linear regression setting. Let

Y = Xβ + U   (8)

be the population earnings function, where Y is earnings. “β”: suitably dimensioned parameter vector. X is a regressor vector assumed to be distributed independently of the mean-zero disturbance U: U ⊥⊥ X, E(XX′) full rank, E(U) = 0.

29 / 125
Definitions and Some Examples of Biased Samples Data are collected on incomes of persons for whom Y exceeds c. The weight depends solely on y: ω(y, x) = 0 for y ≤ c, ω(y, x) = 1 for y > c. We can identify the sample distribution of Y above c, the sample distribution of X for Y above c, and the proportion of the original random sample with income below c. We do not know Y below c. 30 / 125
Definitions and Some Examples of Biased Samples

As before, let Y* = Y if Y > c; define Y* = 0 otherwise. ∆ = 1 if Y > c, ∆ = 0 otherwise. The probability of the event ∆ = 1 given X = x is

Pr(∆ = 1 | X = x) = Pr(Y > c | X = x) = Pr(U > c − xβ | X = x).

31 / 125
Definitions and Some Examples of Biased Samples

Invoking independence between U and X and letting F_u denote the distribution of U,

Pr(∆ = 1 | X = x) = 1 − F_u(c − xβ)   (9a)

and

Pr(∆ = 0 | X = x) = F_u(c − xβ).   (9b)

32 / 125
Definitions and Some Examples of Biased Samples

The distribution of Y* conditional on X:

G(y* | Y > c, X = x) = F(y* | X = x, Y > c) = F(y* | X = x, ∆ = 1)
                     = F_u(y* − xβ) / [1 − F_u(c − xβ)],   y* > c,   (10a)

G(y* | Y ≤ c, X = x) = 1 for Y* = 0 (∆ = 0).   (10b)

33 / 125
Definitions and Some Examples of Biased Samples

The joint distribution of (Y*, ∆) given X = x is

F(y*, δ | X = x) = F(y* | δ, x) Pr(δ | x)   (11)
                 = {F_u(y* − xβ)}^δ {F_u(c − xβ)}^(1−δ).

In particular,

E(Y* | X = x, ∆ = 1) = xβ + E(U | X = x, ∆ = 1)   (12)
                     = xβ + ∫_{c−xβ}^∞ z dF_u(z) / [1 − F_u(c − xβ)],

where z is a dummy variable of integration.

34 / 125
Definitions and Some Examples of Biased Samples

The population mean regression function is

E(Y | X = x) = xβ.   (13)

The contrast between (12) and (13) is illuminating. When the theoretical model is estimated on a selected sample (∆ = 1), the true conditional expectation is (12), not (13).

35 / 125
Definitions and Some Examples of Biased Samples The conditional mean of U depends on x. In an omitted-variable interpretation, E(U | X = x, ∆ = 1) is omitted from the regression and is likely to be correlated with x. Least squares estimates of β obtained on selected samples which do not account for selection are therefore biased and inconsistent. 36 / 125
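As a concrete illustration (not in the original notes), the sketch below simulates the selection rule Y > c and compares least squares on a random sample with least squares on the selected sample. All parameter values are illustrative; the omitted term E(U | X = x, ∆ = 1) falls with x when β > 0, which is what flattens the fitted line.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, c = 100_000, 1.0, 0.0            # illustrative slope and selection threshold
x = rng.normal(2.0, 1.0, n)
u = rng.normal(0.0, 1.0, n)
y = beta * x + u                          # population model Y = X*beta + U

sel = y > c                               # sample inclusion rule: keep only Y > c

X_all = np.column_stack([np.ones(n), x])
b_random = np.linalg.lstsq(X_all, y, rcond=None)[0]

X_sel = np.column_stack([np.ones(sel.sum()), x[sel]])
b_selected = np.linalg.lstsq(X_sel, y[sel], rcond=None)[0]

print("slope, random sample:  ", b_random[1])    # close to beta = 1.0
print("slope, selected sample:", b_selected[1])  # noticeably below 1.0
```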
Definitions and Some Examples of Biased Samples To illustrate the nature of the bias, it is useful to draw on the work of Cain and Watts (1973). Suppose that X is a scalar random variable (e.g., education) and that its associated coefficient is positive (β > 0). Under conventional assumptions about U (e.g., mean zero, independently and identically distributed, and distributed independently of X), the population regression of Y on X is a straight line. The scatter about the regression line and the regression line are given in Figure 1. 37 / 125
Definitions and Some Examples of Biased Samples Figure 1: Population regression line and the regression line for the selected sample (selection rule Y > c). 38 / 125
Definitions and Some Examples of Biased Samples When Y > c is imposed as a sample inclusion requirement, lower population values of U are excluded from the sample in a way that systematically depends on x (Y > c requires U > c − xβ). As x increases and β > 0, the conditional mean of U, E(U | X = x, ∆ = 1), decreases. Regression estimates of β that do not correct for sample selection (i.e., that omit E(U | X = x, ∆ = 1)) are downward biased because of the negative correlation between x and E(U | X = x, ∆ = 1). Hence the flattened regression line for the selected sample in Figure 1. 39 / 125
Definitions and Some Examples of Biased Samples In models with more than one regressor, no sharp result is available on the sign of the bias in the regression estimates that results from ignoring the selected nature of the sample. It remains true, however, that conventional least squares estimates of β obtained from selected samples are biased and inconsistent. 40 / 125
Definitions and Some Examples of Biased Samples Fruitful to distinguish between the case of a truncated sample and the case of a censored sample. In the truncated sample case, no information is available about the fraction of the population that would be allocated to the truncated sample [Pr(∆ = 1)]. In the censored sample case, this fraction is known or can be consistently estimated. Fruitful to distinguish two further cases: Case (a), the case in which X is not observed when ∆ = 0; Case (b), the one most fully developed in the literature, in which X is observed when ∆ = 0. 41 / 125
Definitions and Some Examples of Biased Samples

The conditional mean E(U | X = x, ∆ = 1) is a function of c − xβ solely through Pr(∆ = 1 | x), since Pr(∆ = 1 | x) is monotonic in c − xβ. The conditional mean depends solely on Pr(∆ = 1 | x) and the parameters of F_u; i.e., since F_u^(−1)(1 − Pr(∆ = 1 | x)) = c − xβ,

E(U | X = x, ∆ = 1) = ∫_{F_u^(−1)[1 − Pr(∆=1 | x)]}^∞ z dF_u(z) / Pr(∆ = 1 | x) = K(P(∆ = 1 | x)).

As P(∆ = 1 | x) → 1, K(P(∆ = 1 | x)) → 0.

42 / 125
Definitions and Some Examples of Biased Samples This relationship demonstrates that the conditional mean is a function of the probability of selection. As the probability of selection goes to 1, the conditional mean goes to zero. For samples chosen so that the values of x are such that the observations are certain to be included in the sample, there is no problem in using ordinary least squares on selected samples to estimate β. Thus in Figure 1, ordinary least squares regressions fit on samples selected to have large x values closely approximate the true regression function and become arbitrarily close as x becomes large. 43 / 125
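A quick numerical check (not in the original notes) of K(P) for the special case of normal U: the direct integral in the display above is compared with the closed form σ φ(Φ^(−1)(1 − P)) / P, and both shrink toward zero as P → 1. The scale σ and the grid of P values are illustrative.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

sigma = 1.5                                     # illustrative scale of U ~ N(0, sigma^2)

def K_numeric(P):
    """E(U | U > F_u^{-1}(1 - P)), written as a function of the selection probability P alone."""
    lower = norm.ppf(1.0 - P, scale=sigma)      # threshold c - x*beta implied by this P
    val, _ = quad(lambda z: z * norm.pdf(z, scale=sigma), lower, np.inf)
    return val / P

for P in (0.1, 0.5, 0.9, 0.999):
    closed_form = sigma * norm.pdf(norm.ppf(1.0 - P)) / P   # normal case: sigma*phi(Phi^{-1}(1-P))/P
    print(P, round(K_numeric(P), 4), round(closed_form, 4)) # the two agree; both shrink toward 0 as P -> 1
```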
Definitions and Some Examples of Biased Samples The conditional mean in (12) is a surrogate for Pr(∆ = 1 | ① ) . As this probability goes to one, the problem of sample selection in regression analysis becomes negligibly small. Much more general idea Heckman (1976) demonstrates that β and F u are identified if U is normally distributed and standard conditions invoked in regression analysis are satisfied. In Newey; Gallant and Nycha, Powell, etc., F u is consistently nonparametrically estimated. 44 / 125
Definitions and Some Examples of Biased Samples Example 3 : censored random variables . This concept extends the notion of a truncated random variable by letting a more general rule than truncation on the outcome of interest generate the selected sample. Because the sample generating rule may be different from a simple truncation of the outcome being studied, the concept of a censored random variable in general requires at least two distinct random variables. 45 / 125
Definitions and Some Examples of Biased Samples Let Y_1 be the outcome of interest and let Y_2 be another random variable. Denote the observed Y_1 by Y*_1. If Y_2 < c, Y_1 is observed. Otherwise Y_1 is not observed and we can set Y*_1 = 0 or any other convenient value (assuming that Y_1 has no point mass at Y_1 = 0 or at the alternative convenient value). In terms of the weighting function ω: ω(y_1, y_2) = 0 if y_2 > c; ω(y_1, y_2) = 1 if y_2 ≤ c. 46 / 125
Definitions and Some Examples of Biased Samples Selection rule Y 2 < c does not necessarily restrict the range of Y 1 . Thus Y ∗ 1 is not in general a truncated random variable. Define ∆ = 1 if Y 2 < c ; ∆ = 0 otherwise. 47 / 125
Definitions and Some Examples of Biased Samples

If F(y_1, y_2) is the population distribution of (Y_1, Y_2), the distribution of ∆ is

Pr(∆ = δ) = [1 − F_2(c)]^(1−δ) [F_2(c)]^δ,   δ = 0, 1,

where F_2 is the marginal distribution of Y_2.

48 / 125
Definitions and Some Examples of Biased Samples

The distribution of Y*_1 is

G(y*_1) = F(y*_1; δ = 1) = F(y*_1; c) / F_2(c),   ∆ = 1,   (14a)
G(y*_1 = 0) = 1,   ∆ = 0.   (14b)

(14a) is the distribution function corresponding to the density in (1) when ω(y_1, y_2) = 1 if y_2 ≤ c and ω(y_1, y_2) = 0 otherwise.

49 / 125
Definitions and Some Examples of Biased Samples

The joint distribution of (Y*_1, ∆) is

G(y*_1, δ) = [F(y*_1; c)]^δ [1 − F_2(c)]^(1−δ).   (15)

This is the distribution function corresponding to density (4) for the special weighting rule of this example. In a censored sample, under general conditions it is possible to consistently estimate Pr(∆ = δ) and G(y*_1).

50 / 125
Definitions and Some Examples of Biased Samples In a truncated sample, only conditional distribution (14a) can be estimated. A degenerate version of this model has Y 1 ≡ Y 2 . In that case, censored random variable Y 1 is also a truncated random variable. Note that a censored random variable may be defined for a truncated or censored sample. 51 / 125
Definitions and Some Examples of Biased Samples Example 3 (continued): Let Y_1 be the wage of a woman. Wages of women are observed only if women work. Let Y_2 be an index of a woman's propensity to work. 52 / 125
Definitions and Some Examples of Biased Samples Y 2 is postulated as the difference between reservation wages (the value of time at home determined from household preference functions) and potential market wages Y 1 . Then if Y 2 < 0, the woman works. Otherwise, she does not. Y ∗ 1 = Y 1 if Y 2 < 0 is the observed wage. 53 / 125
Definitions and Some Examples of Biased Samples If Y 1 is the offered wage of an unemployed worker, and Y 2 is the difference between reservation wages (the return to searching) and offered market wages, Y ∗ 1 = Y 1 if Y 2 < 0 is the accepted wage for an unemployed worker (see Flinn and Heckman, 1982a). If Y 1 is the potential output of a firm and Y 2 is its profitability, Y ∗ 1 = Y 1 if Y 2 > 0. If Y 1 is the potential income in occupation one and Y 2 is the potential income in occupation two. 54 / 125
Definitions and Some Examples of Biased Samples Then Y*_1 = Y_1 if Y_1 − Y_2 < 0, while Y*_2 = Y_2 if Y_1 − Y_2 ≥ 0. 55 / 125
Definitions and Some Examples of Biased Samples

Example 4. Builds on Example 3 by introducing regressors. This produces the censored regression model of Heckman (1976, 1979). In Example 3 set

Y_1 = X_1 β_1 + U_1   (16a)
Y_2 = X_2 β_2 + U_2   (16b)

where (X_1, X_2) are distributed independently of (U_1, U_2), a mean-zero, finite-variance random vector.

56 / 125
Definitions and Some Examples of Biased Samples

Conventional assumptions are invoked to ensure that if Y_1 and Y_2 can be observed, least squares applied to a random sample of data on (Y_1, Y_2, X_1, X_2) would consistently estimate β_1 and β_2. Y*_1 = Y_1 if Y_2 < 0; if Y_2 < 0, ∆ = 1. The regression function for the selected sample is

E(Y*_1 | X_1 = x_1, Y_2 < 0) = E(Y*_1 | X_1 = x_1, ∆ = 1) = x_1 β_1 + E(U_1 | X_1 = x_1, ∆ = 1).   (17)

The regression function for the population is

E(Y_1 | X_1 = x_1) = x_1 β_1.   (18)

57 / 125
Definitions and Some Examples of Biased Samples The conditional mean is a surrogate for the probability of selection [Pr(∆ = 1 | x_2)]. As Pr(∆ = 1 | x_2) goes to one, the problem of sample selection bias becomes negligible. In the censored regression case, a new phenomenon appears. If there are variables in X_2 not in X_1, such variables may appear to be statistically important determinants of Y_1 when ordinary least squares is applied to data generated from censored samples. 58 / 125
Definitions and Some Examples of Biased Samples Example: suppose that survey statisticians use some variables extraneous to X_1 to determine sample enrollment. Such variables may appear to be important determinants of Y_1 when in fact they are not. They are important determinants of Y*_1. 59 / 125
Definitions and Some Examples of Biased Samples In an analysis of self-selection, let Y_1 be the wage that a potential worker could earn were they to accept a market offer. Let Y_2 be the difference between the best non-market opportunity available to the potential worker and Y_1. If Y_2 < 0, the agent works. The conditional expectation of observed wages (Y*_1 = Y_1 if Y_2 < 0) given x_1 and x_2 will be a non-trivial function of x_2. 60 / 125
Definitions and Some Examples of Biased Samples Thus variables determining non-market opportunities will determine Y*_1, even though they do not determine Y_1. For example, the number of children less than six may appear to be a significant determinant of Y_1 when inadequate account is taken of sample selection, even though the market does not place any value or penalty on small children in generating wage offers for potential workers. 61 / 125
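To make Example 4 concrete, here is a minimal sketch (not from the original notes) of the two-step idea in Heckman (1976, 1979) under joint normality: a probit for the selection probability, then least squares on the selected sample with the inverse Mills ratio added as a regressor. The sign convention (∆ = 1 when a latent index is positive), the coefficient values, and the variable names are all illustrative; x2 plays the role of a variable in X_2 that is excluded from X_1.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 50_000
x1 = rng.normal(size=n)                          # outcome regressor
x2 = rng.normal(size=n)                          # selection-only regressor (exclusion restriction)
u1, u2 = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], n).T
y1 = 1.0 + 1.5 * x1 + u1                         # outcome equation; coefficients are illustrative
d = (0.5 + 1.0 * x1 + 1.0 * x2 + u2) > 0         # selected (wage observed) when the index is positive

# Step 1: probit for the probability of selection.
Z = np.column_stack([np.ones(n), x1, x2])
def probit_nll(g):
    idx = Z @ g
    return -np.where(d, norm.logcdf(idx), norm.logcdf(-idx)).sum()
g_hat = minimize(probit_nll, np.zeros(3), method="BFGS").x

# Step 2: OLS of y1 on x1 and the inverse Mills ratio, selected observations only.
idx = Z @ g_hat
mills = norm.pdf(idx) / norm.cdf(idx)            # control function proportional to E(U1 | selected)
W = np.column_stack([np.ones(d.sum()), x1[d], mills[d]])
b_corrected = np.linalg.lstsq(W, y1[d], rcond=None)[0]

X_naive = np.column_stack([np.ones(d.sum()), x1[d]])
b_naive = np.linalg.lstsq(X_naive, y1[d], rcond=None)[0]
print("naive slope:", b_naive[1], "  corrected slope:", b_corrected[1])  # corrected is close to 1.5
```

A practical caveat: the second-step standard errors need to be adjusted for the estimated first step, which this sketch ignores.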
Definitions and Some Examples of Biased Samples Example 5. Length biased sampling. Let T be the duration of an event such as a completed unemployment spell or a completed duration of a job with an employer. The population distribution of T is F ( t ) with density f ( t ). The sampling rule is such that individuals are sampled at random. Data are recorded on a completed spell provided that at the time of the interview the individual is experiencing the event. Such sampling rules are in wide use in many national surveys of employment and unemployment. 62 / 125
Definitions and Some Examples of Biased Samples

In order to have a sampled completed spell, a person must be in the state at the time of the interview. Let “0” be the date of the survey. Decompose any completed spell T into a component that occurs before the survey, T_b, and a component that occurs after the survey, T_a. Then T = T_a + T_b. For a person to be sampled, T_b > 0. The density of T given T_b = t_b is

f(t | t_b) = f(t) / [1 − F(t_b)],   t ≥ t_b.   (19)

(Evaluated at t = t_b, this is the hazard rate.)

63 / 125
Definitions and Some Examples of Biased Samples

Suppose that the environment is stationary. The population entry rate into the state at each instant of time is k. From each vintage of entrants into the state, distinguished by their distance from the survey date t_b, only 1 − F(t_b) = Pr(T > t_b) survive. Aggregating over all cohorts of entrants, the population proportion in the state at the date of the interview is P, where

P = k ∫_0^∞ (1 − F(t_b)) dt_b,   (20)

which is assumed to exist. In a duration-of-unemployment example, P is the unemployment rate.

64 / 125
Definitions and Some Examples of Biased Samples

The density of T*_b, the sampled pre-survey duration, is

g(t*_b | t*_b > 0) = k (1 − F(t*_b)) / P.   (21)

The density of sampled completed durations is thus

g(t*) = ∫_0^{t*} f(t* | t*_b) g(t*_b | t*_b > 0) dt*_b
      = ∫_0^{t*} [f(t*) / (1 − F(t*_b))] · [k (1 − F(t*_b)) / P] dt*_b
      = k t* f(t*) / P.

Length biased sampling.

65 / 125
Definitions and Some Examples of Biased Samples

Integration by parts:

P = k ∫_0^∞ (1 − F(z)) dz = k ∫_0^∞ z dF(z) = k E(T).

Note that

g(t*) = t* f(t*) / E(T).   (22)

We know g(t*) and t* > 0, so we can form g(t*)/t*; hence we know f(t*)/E(T). Applying the analysis of (5):

∫_0^∞ [g(t*)/t*] dt* = ∫_0^∞ f(t*) dt* / E(T) = 1 / E(T),

∴ we know E(T), and hence f(t*).

66 / 125
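A small numerical sketch (not in the original notes) of the recovery argument around (22): spells are drawn with probability proportional to their length, the harmonic mean of the biased draws recovers E(T), and weighting each draw by 1/t* recovers probabilities under the true f. The Weibull parameters and sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, k = 1.0, 2.0                                 # illustrative Weibull scale and shape for f(t)
pop = lam * rng.weibull(k, 500_000)               # population spells drawn from f

# Length-biased sample: each spell is drawn with probability proportional to its length t,
# mimicking g(t*) = t* f(t*) / E(T) in (22).
p = pop / pop.sum()
t_star = pop[rng.choice(pop.size, size=100_000, replace=True, p=p)]

# Reverse the weighting: 1/E(T) = integral of g(t*)/t* dt* = E_g(1/T*), and weighting
# each draw by 1/t* recovers expectations taken under the true f.
E_T_hat = 1.0 / np.mean(1.0 / t_star)
w = 1.0 / t_star
w /= w.sum()
F_at_1 = np.sum(w * (t_star <= 1.0))              # estimate of F(1) under f

print("naive share of sampled spells <= 1:", np.mean(t_star <= 1.0))
print("reweighted estimate of F(1):", F_at_1, "   true:", np.mean(pop <= 1.0))
print("recovered E(T):", E_T_hat, "   true:", pop.mean())
```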
Definitions and Some Examples of Biased Samples In this form (22) is equivalent to (1) with ω(t) = t and normalizing constant E(T): length biased sampling. Intuitively, longer spells are oversampled when the requirement is imposed that a spell be in progress at the time the survey is conducted (T_b > 0). Suppose, instead, that individuals are randomly sampled and data are recorded on the next spell of the event (after the survey date). 67 / 125
Definitions and Some Examples of Biased Samples As long as successive spells are independent, such a sampling frame does not distort the sampled distribution because no requirement is imposed that the sampled spell be in progress at the date of the interview. It is important to notice that the source of the bias is the requirement that T b > 0 (i.e., sampled spells are in progress), not that only a fraction of the population experiences the event ( P < 1). 68 / 125
Definitions and Some Examples of Biased Samples The simple length weight (ω(t) = t) that produces (22) is an artefact of the stationarity assumption. Heckman and Singer (1986) treat non-stationarity and unobservables when there is selection on the event that a person be in the state at the time of the interview. They also demonstrate the bias that results from estimating parametric models on samples generated by length biased sampling rules when inadequate account is taken of the sampling plan. 69 / 125
Definitions and Some Examples of Biased Samples

The probability that a spell lasts until t_c, given that it has lasted t_b, is

f(t_c | t_b) = f(t_c) / [1 − F(t_b)].

So, using g(t_b) = (1 − F(t_b)) / m with m = E(T), the density of a spell that lasts for t_c is

g(t_c) = ∫_0^{t_c} f(t_c | t_b) g(t_b) dt_b = ∫_0^{t_c} [f(t_c) / m] dt_b = f(t_c) t_c / m.

70 / 125
Definitions and Some Examples of Biased Samples

Likewise, the density of a spell that lasts until t_a is

g(t_a) = ∫_0^∞ f(t_a + t_b | t_b) g(t_b) dt_b
       = ∫_0^∞ [f(t_a + t_b) / m] dt_b
       = (1/m) ∫_{t_a}^∞ f(t_b) dt_b
       = [1 − F(t_a)] / m.

So the functional form of g(t_b) is the same as that of g(t_a). Stationarity ⇒ backward and forward densities are the same. Mirror images. Back to the future.

71 / 125
Definitions and Some Examples of Biased Samples

Some useful results that follow from this model:

1. If f(t) = θ e^(−tθ), then g(t_b) = θ e^(−t_b θ) and g(t_a) = θ e^(−t_a θ).

Proof: f(t) = θ e^(−tθ) ⇒ m = 1/θ and F(t) = 1 − e^(−tθ), so

g(t_a) = [1 − F(t_a)] / m = θ e^(−t_a θ).

72 / 125
Definitions and Some Examples of Biased Samples

2. E(T_a) = (m/2)(1 + σ²/m²).

Proof:

E(T_a) = ∫ t_a g(t_a) dt_a = ∫ t_a [1 − F(t_a)] / m dt_a
       = (1/m) [ (t_a²/2)(1 − F(t_a)) |_0^∞ − ∫ (t_a²/2) d(1 − F(t_a)) ]
       = (1/m) ∫ (t_a²/2) f(t_a) dt_a
       = (1/(2m)) [var(T) + E(T)²]
       = (1/(2m)) [σ² + m²] = (m/2)(1 + σ²/m²).

73 / 125
Definitions and Some Examples of Biased Samples

3. E(T_b) = (m/2)(1 + σ²/m²).

Proof: See the proof of Proposition 2.

4. E(T_c) = m(1 + σ²/m²).

Proof:

E(T_c) = ∫ t_c g(t_c) dt_c = (1/m) ∫ t_c² f(t_c) dt_c = (1/m) [var(T) + E(T)²] = m(1 + σ²/m²).

⇒ E(T_c) = 2 E(T_a) = 2 E(T_b), and E(T_c) > m unless σ² = 0.

74 / 125
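A quick Monte Carlo check (not in the original notes) of Propositions 2–4, using the standard stationary-renewal representation: the completed spell in progress at the survey date is length-biased, and the survey date falls uniformly within it. The Weibull spell distribution and sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
lam, k = 1.0, 1.5                                  # illustrative Weibull spell distribution
pop = lam * rng.weibull(k, 400_000)
m, var = pop.mean(), pop.var()

# Under stationarity, the completed spell in progress at the survey date is length-biased,
# and the survey date falls uniformly within that spell.
p = pop / pop.sum()
t_c = pop[rng.choice(pop.size, size=200_000, replace=True, p=p)]
u = rng.uniform(size=t_c.size)
t_b, t_a = u * t_c, (1.0 - u) * t_c                # elapsed (backward) and remaining (forward) durations

print("E(T_c):", t_c.mean(), "   theory m(1 + s^2/m^2):", m * (1 + var / m**2))
print("E(T_a):", t_a.mean(), " E(T_b):", t_b.mean(),
      "   theory (m/2)(1 + s^2/m^2):", 0.5 * m * (1 + var / m**2))
```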
Definitions and Some Examples of Biased Samples Examples 75 / 125
Definitions and Some Examples of Biased Samples

Specification of the Distribution

Weibull distribution. Parameters: λ > 0, k > 0.

Probability density function (PDF): f(t) = (k/λ)(t/λ)^(k−1) exp(−(t/λ)^k).

Cumulative distribution function (CDF): F(t) = 1 − exp(−(t/λ)^k).

Sets of parameters (λ, k): (0.1, 0.5); (0.5, 1.0); (0.5, 2.0); (1.0, 3.0).

76 / 125
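For reference, a few helper functions (not in the original notes) for the Weibull parameterization above, evaluated at the four (λ, k) pairs used in the figures that follow; the evaluation grid is arbitrary.

```python
import numpy as np

def weibull_pdf(t, lam, k):
    return (k / lam) * (t / lam) ** (k - 1) * np.exp(-(t / lam) ** k)

def weibull_cdf(t, lam, k):
    return 1.0 - np.exp(-(t / lam) ** k)

def weibull_hazard(t, lam, k):
    return (k / lam) * (t / lam) ** (k - 1)        # pdf divided by the survivor function

def weibull_integrated_hazard(t, lam, k):
    return (t / lam) ** k

params = [(0.1, 0.5), (0.5, 1.0), (0.5, 2.0), (1.0, 3.0)]   # (lambda, k) pairs used in the figures
t = np.linspace(0.1, 2.0, 5)
for lam, k in params:
    print((lam, k), np.round(weibull_pdf(t, lam, k), 3), np.round(weibull_hazard(t, lam, k), 3))
```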
Definitions and Some Examples of Biased Samples Basic Distribution Graphs: PDFs and CDFs of the Weibull distribution for (λ, k) = (0.1, 0.5), (0.5, 1.0), (0.5, 2.0), and (1.0, 3.0), plotted for t between 0 and 2. 77 / 125
Definitions and Some Examples of Biased Samples Basic Duration Graphs: hazard functions and integrated hazard functions of the Weibull distribution for the same four (λ, k) parameter sets, plotted for t between 0 and 2. 78 / 125
Definitions and Some Examples of Biased Samples Observed and Original Distribution for T_b (Example 1): the observed PDF of the sampled spells T_b versus the original PDF (Weibull distribution, λ = 0.1, k = 0.5). 79 / 125
Definitions and Some Examples of Biased Samples Observed and Original Distribution for T_b (Example 2): the observed PDF of the sampled spells T_b versus the original PDF (Weibull distribution, λ = 0.5, k = 2.0). 80 / 125
Definitions and Some Examples of Biased Samples Observed and Original Distribution for T_b (Example 3): the observed PDF of the sampled spells T_b versus the original PDF (Weibull distribution, λ = 1.0, k = 3.0). 81 / 125
Definitions and Some Examples of Biased Samples Observed and Original Distribution for T_c (Example 1): the observed PDF of the sampled spells T_c versus the original PDF (Weibull distribution, λ = 0.1, k = 0.5). 82 / 125
Definitions and Some Examples of Biased Samples Observed and Original Distribution for T_c (Example 2): the observed PDF of the sampled spells T_c versus the original PDF (Weibull distribution, λ = 0.5, k = 1.0). 83 / 125
Definitions and Some Examples of Biased Samples Observed and Original Distribution for T_c (Example 3): the observed PDF of the sampled spells T_c versus the original PDF (Weibull distribution, λ = 0.5, k = 2.0). 84 / 125
Definitions and Some Examples of Biased Samples Observed and Original Distribution for T_c (Example 4): the observed PDF of the sampled spells T_c versus the original PDF (Weibull distribution, λ = 1.0, k = 3.0). 85 / 125
Definitions and Some Examples of Biased Samples Example 6. Choice based sampling. Let D be a discrete valued random variable which assumes a finite number of values I . Discrete choice model. D = i , i = 1 , . . . , I corresponds to the occurrence of state i . States are mutually exclusive. In the existing literature the states may be modes of transportation choice for commuters (Domencich and McFadden, 1975), occupations, migration destinations, financial solvency status of firms, schooling choices of students, etc. 86 / 125
Definitions and Some Examples of Biased Samples

Interest centers on estimating a population choice model

Pr(D = i | X = x),   i = 1, . . . , I.   (23)

The population density of (D, X) is

f(d, x) = Pr(D = d | X = x) h(x),   (24)

where, in this example, h(x) is the population density of X.

87 / 125
Definitions and Some Examples of Biased Samples For example, interviews about transportation preferences conducted at train stations tend to over-sample train riders and under-sample bus riders. Interviews about occupational choice preferences conducted at leading universities over-sample those who select professional occupations. 88 / 125
Definitions and Some Examples of Biased Samples

In choice based sampling, selection occurs solely on the D coordinate of (D, X). In terms of (1) (extended to allow for discrete random variables), ω(d, x) = ω(d). Then the sampled (D*, X*) has density

g(d*, x*) = ω(d*) f(d*, x*) / [ Σ_{i=1}^I ∫ ω(i) f(i, x*) dx* ].   (25)

89 / 125
Definitions and Some Examples of Biased Samples

Notice that the denominator can be simplified to Σ_{i=1}^I ω(i) f(i), where f(i) = ∫ f(i, x) dx is the marginal probability of D = i in the population, so that

g(d*, x*) = ω(d*) f(d*, x*) / Σ_{i=1}^I ω(i) f(i).   (26)

90 / 125
Definitions and Some Examples of Biased Samples

Integrating (25) with respect to x* and using (26) we obtain

g(d*) = ω(d*) f(d*) / Σ_{i=1}^I ω(i) f(i).   (27)

The sampling rule causes the sampled proportions to deviate from the population proportions.

91 / 125
Definitions and Some Examples of Biased Samples

Note further that, as a consequence of sampling only on D, the population conditional density

h(x* | d*) = f(d*, x*) / f(d*)   (28)

can be recovered from the choice based sample. The density of x in the sample is thus

g(x*) = Σ_{i=1}^I h(x* | i) g(i).   (29)

92 / 125
Definitions and Some Examples of Biased Samples

Then using (26)–(29) we reach

g(d* | x*) = f(d* | x*) × { ω(d*) / [ Σ_{i=1}^I ω(i) f(i) ] } × { 1 / [ Σ_{i=1}^I f(i | x*) g(i) / f(i) ] }.   (30)

The bias that results from using choice based samples to make inference about f(d* | x*) is a consequence of neglecting the terms in braces on the right-hand side of (30).

93 / 125
Definitions and Some Examples of Biased Samples Notice that if the data are generated by a random sampling rule, ω(d*) = 1, g(d*) = f(d*), and the terms in braces are equal to one. 94 / 125
Definitions and Some Examples of Biased Samples Further Discussion of Choice Based Samples 95 / 125
Definitions and Some Examples of Biased Samples

Pick D first (e.g., travel mode). The probability of selecting D is C(D). f(D, X) is the joint density of D and X in the population:

f(D, X | θ) = g(D | X, θ) h(X) = ϕ(X | D) f(D | θ),
f(D | θ) = ∫ g(D | X, θ) h(X) dX.

Given D we observe X (the implicit assumption is that we are sampling only on D, not on D and X). The probability of the sampled (X, D) is ϕ(X | D) C(D).

96 / 125
Definitions and Some Examples of Biased Samples

A fact we use later is

ϕ(X | D) C(D) = [ g(D | X) h(X) / f(D) ] C(D) = g(D | X) h(X) C(D) / ∫ g(D | X) h(X) dX.

When C(D) = f(D) = ∫ g(D | X) h(X) dX, choice based sampling is random sampling.

97 / 125
Definitions and Some Examples of Biased Samples

Note, the likelihood function in an exogenous sampling scheme is

L = Π_{i=1}^I f(D_i, X_i) = Π_{i=1}^I f(D_i | X_i, θ) h(X_i),
ln L = Σ_{i=1}^I [ ln f(D_i | X_i, θ) + ln h(X_i) ].

By exogeneity, the distribution of X does not depend on θ.

98 / 125
Definitions and Some Examples of Biased Samples

The likelihood function for a choice-based sampling scheme is

ln L = Σ_{i=1}^I [ ln g(D_i | X_i, θ) + ln h(X_i) − ln f(D_i | θ) + ln C(D_i) ].

Suppose f(D) depends on the parameters θ. Maximizing with respect to θ,

∂ ln L / ∂θ = Σ_{i=1}^I ∂ ln g(D_i | X_i, θ) / ∂θ − Σ_{i=1}^I ∂ ln f(D_i | θ) / ∂θ,

where the second term is the source of bias. We neglect the second term in forming the usual estimators, using only the first term. That is the source of the inconsistency.

99 / 125
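The slides identify the neglected term as the source of inconsistency; one standard remedy, not derived here, is the Manski and Lerman (1977) weighted exogenous sampling estimator, which weights each observation's log-likelihood contribution by the ratio of population to sample choice shares. The sketch below (not from the original notes) applies it to a logit with illustrative coefficients, population shares, and sampling design.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n_pop = 200_000
x = rng.normal(size=n_pop)
p = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))      # population logit Pr(D=1|x); coefficients illustrative
d = rng.uniform(size=n_pop) < p

# Choice-based sample: deliberately over-sample D = 1 by taking equal numbers from each choice group.
n_per = 5_000
idx1 = rng.choice(np.flatnonzero(d), n_per, replace=False)
idx0 = rng.choice(np.flatnonzero(~d), n_per, replace=False)
xs = np.concatenate([x[idx1], x[idx0]])
ds = np.concatenate([np.ones(n_per), np.zeros(n_per)])

Q = np.array([1 - d.mean(), d.mean()])           # population shares f(i) (assumed known here)
H = np.array([0.5, 0.5])                         # sample shares fixed by the sampling design
w = (Q / H)[ds.astype(int)]                      # Manski-Lerman weights Q(d)/H(d) per observation

def nll(theta, weights):
    z = theta[0] + theta[1] * xs
    ll = ds * z - np.logaddexp(0.0, z)           # logit log-likelihood contributions
    return -(weights * ll).sum()

b_unweighted = minimize(nll, np.zeros(2), args=(np.ones_like(w),), method="BFGS").x
b_weighted = minimize(nll, np.zeros(2), args=(w,), method="BFGS").x
print("unweighted:", b_unweighted)               # intercept shifted by the sampling design
print("weighted:  ", b_weighted)                 # close to the population values (-1, 2)
```

In this logit example only the intercept is visibly distorted by pure choice-based sampling, a known special property of the logit; with other functional forms the slope coefficients are generally affected as well.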
Definitions and Some Examples of Biased Samples

Further Analysis of Choice Based Samples: an example in discrete choice.

(c) Draw d by ϕ(d).
(d) Draw X by f(X | d = 1).

Joint density of the data:

ϕ(d = 1) f(X | d = 1, θ) = ϕ(d = 1) [ Pr(d = 1 | X, θ) f(X) / Pr(d = 1 | θ) ].

100 / 125