Generalized sample selection model
Małgorzata Wojtyś (1), Giampiero Marra (2)
(1) Plymouth University, (2) University College London
XLII Konferencja "Statystyka Matematyczna", Będlewo, November 29, 2016

Plan
Sample selection problem: classical Heckman model
Generalized model using GAM and copulae
Estimation approach
Real-life application example

Motivating example
Example: HIV prevalence
P(HIV positive) ∼ socio-economic and health characteristics
Some individuals in the sample refused to say whether they are HIV positive. They may differ in important characteristics from individuals who did answer the question.
If the link between the decision to provide an answer and being HIV positive exists and is not only through observables, then sample selection bias arises and a univariate equation model is not appropriate.

Sample selection
Regression of primary interest:
$$Y_i^* \sim x_i^{(1)}, \quad i = 1, \dots, n,$$
where $x_i^{(1)}$ is a row vector of predictors.
But: observations on some $Y_i^*$ are missing, based on a combination of observed and unobserved characteristics.
Observables: $Y_i = Y_i^* U_i$, where $U_i$ is a binary selection variable, $U_i \in \{0, 1\}$.
Selection mechanism:
$$P(U_i = 1) \sim x_i^{(2)},$$
where $x_i^{(2)}$ is a vector of covariates.

Classical Heckman (1979) model
For $i = 1, \dots, n$:
$$Y_i^* = x_i^{(1)} \beta^{(1)} + \varepsilon_{1i},$$
$$U_i^* = x_i^{(2)} \beta^{(2)} + \varepsilon_{2i},$$
where
$$\begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix} \right).$$
Latent variables: $Y_i^*$, $U_i^*$.
Observables:
$U_i = I(U_i^* > 0)$ (⇒ probit regression),
$Y_i = Y_i^* U_i$.
Modifications: e.g. bivariate t-distribution (Marchenko & Genton, 2012), Archimedean copulas (Smith, 2003).
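
To illustrate why the selection mechanism matters, the following is a minimal simulation sketch of the Heckman model above, assuming a single covariate shared by both equations plus intercepts. The function names (`simulate_heckman`, `ols_line`) and all parameter values are hypothetical choices for illustration, not taken from the talk.

```python
import random
from statistics import fmean

def simulate_heckman(n, beta1, beta2, sigma, rho, seed=0):
    """Draw (x, y_star, u) from a one-covariate Heckman model with intercepts."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)  # selection error, variance 1
        # outcome error: variance sigma^2, covariance rho * sigma with e2
        e1 = sigma * (rho * e2 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0))
        y_star = beta1[0] + beta1[1] * x + e1  # latent outcome Y*
        u_star = beta2[0] + beta2[1] * x + e2  # latent selection index U*
        rows.append((x, y_star, 1 if u_star > 0 else 0))
    return rows

def ols_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Illustrative parameter values (assumptions, not from the source).
rows = simulate_heckman(50_000, beta1=(1.0, 2.0), beta2=(0.0, 1.0), sigma=1.0, rho=0.8)

# Regression on all latent outcomes (not available in practice).
full = ols_line([r[0] for r in rows], [r[1] for r in rows])

# Naive regression on the selected subsample only (U = 1).
selected = [r for r in rows if r[2] == 1]
naive = ols_line([r[0] for r in selected], [r[1] for r in selected])

print("all latent outcomes:", full)
print("selected only:      ", naive)
```

With a positive correlation between the two errors, the fit on the selected subsample drifts away from the coefficients used to generate the data, while the fit on all latent outcomes does not.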

Generalized sample selection model
Random component: $Y^* \sim f_1$, where $f_1$ belongs to an exponential family of distributions:
$$f_1(y \mid \eta_1, \phi) = \exp\left\{ \frac{y \eta_1 - b(\eta_1)}{\phi} + c(y, \phi) \right\}$$
for some $b(\cdot)$ and $c(\cdot)$. It holds that $E(Y^*) = b'(\eta_1)$ and $Var(Y^*) = \phi\, b''(\eta_1)$.
Selection variable: $U = I(U^* > 0)$ and $U^* \sim f_2$, where
$$f_2(u \mid \eta_2) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{(u - \eta_2)^2}{2} \right\},$$
implying the probit regression model for $U$.
$F(y, u)$: joint cdf of $(Y^*, U^*)$; $F_1(y)$, $F_2(u)$: marginal cdfs.
$C_\theta$: the copula such that
$$F(y, u) = C_\theta(F_1(y), F_2(u)),$$
where $\theta$ is the dependence parameter of the copula.
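
The copula construction of the joint cdf can be sketched as follows. This is only an illustration: the Clayton copula and the normal outcome margin are assumed choices (the slides leave both the copula family and the exponential-family member open), and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def clayton_cdf(v, w, theta):
    """Clayton copula C_theta(v, w), for theta > 0 and v, w in (0, 1]."""
    return (v ** -theta + w ** -theta - 1.0) ** (-1.0 / theta)

def joint_cdf(y, u, eta1, eta2, theta, sigma=1.0):
    """F(y, u) = C_theta(F1(y), F2(u)) for the latent pair (Y*, U*)."""
    f1 = NormalDist(eta1, sigma).cdf(y)  # F1: normal outcome margin (assumed)
    f2 = STD_NORMAL.cdf(u - eta2)        # F2: N(eta2, 1) margin of U*
    return clayton_cdf(f1, f2, theta)

# P(Y* <= 0.2, U* <= 0) for illustrative parameter values.
print(joint_cdf(0.2, 0.0, eta1=0.0, eta2=0.5, theta=2.0))
```

Because a copula has uniform margins, letting y grow large recovers F2(0) = P(U = 0), which is the probit probability of non-selection.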

Likelihood
Likelihood of an observed outcome $(Y, U)$:
$$L = \begin{cases} P(U = 0) & \text{if } U = 0, \\ f_{Y|U}(Y \mid U = 1)\, P(U = 1) & \text{if } U = 1. \end{cases}$$
It holds that
$$f_{Y|U}(y \mid U = 1) = \frac{\partial}{\partial y} P(Y \le y \mid U = 1) = \frac{\partial}{\partial y} \frac{P(Y^* \le y, U^* > 0)}{P(U = 1)} = \frac{\partial}{\partial y} \frac{F_1(y) - F(y, 0)}{P(U = 1)} = \frac{1}{P(U = 1)} \left( f_1(y) - \frac{\partial}{\partial y} F(y, 0) \right).$$
So
$$L = \begin{cases} P(U = 0) = F_2(0) & \text{if } U = 0, \\ \left. f_1(y) - \frac{\partial}{\partial y} F(y, 0) \right|_{y = Y} & \text{if } U = 1. \end{cases}$$

Log-likelihood
So:
$$L(Y, U) = F_2(0)^{1 - U} \times \left( \left. f_1(y) - \frac{\partial}{\partial y} F(y, 0) \right|_{y = Y} \right)^{U}.$$
Using the copula representation, we obtain the log-likelihood:
$$\ell = (1 - U) \log F_2(0) + U \log\big( f_1(Y) \, (1 - z(Y, \eta_1, \eta_2)) \big),$$
where
$$z(y, \eta_1, \eta_2) = \left. \frac{\partial}{\partial v} C_\theta(v, F_2(0)) \right|_{v = F_1(y)}.$$
The function $z$ can also be expressed as
$$z(y, \eta_1, \eta_2) = P(U = 0) \, f_{Y^*|U}(y \mid U = 0) \, \big( f_1(y \mid \eta_1) \big)^{-1}.$$
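
A single observation's log-likelihood contribution can be sketched in code as below. The sketch assumes a normal outcome margin and offers two copula choices through their partial derivatives with respect to the first argument (Clayton and Gaussian); these choices and all function names are illustrative assumptions, not the authors' implementation. With the Gaussian copula, 1 − z reduces to Φ((η₂ + ρ(y − η₁)/σ)/√(1 − ρ²)), the familiar term of the classical Heckman likelihood.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def clayton_h(v, w, theta):
    """dC_theta(v, w)/dv for the Clayton copula, theta > 0."""
    inner = v ** -theta + w ** -theta - 1.0
    return v ** (-theta - 1.0) * inner ** (-1.0 / theta - 1.0)

def gaussian_h(v, w, rho):
    """dC_rho(v, w)/dv for the Gaussian copula with correlation rho."""
    num = STD_NORMAL.inv_cdf(w) - rho * STD_NORMAL.inv_cdf(v)
    return STD_NORMAL.cdf(num / math.sqrt(1.0 - rho ** 2))

def loglik_term(y, u, eta1, eta2, theta, h=clayton_h, sigma=1.0):
    """Contribution (1 - U) log F2(0) + U log(f1(Y) (1 - z)) of one observation.

    `theta` is the copula dependence parameter (the correlation rho when
    h=gaussian_h). The normal margin N(eta1, sigma^2) is an assumed choice.
    """
    p0 = STD_NORMAL.cdf(-eta2)  # F2(0) = P(U = 0) under the probit model
    if u == 0:
        return math.log(p0)     # y is unobserved and ignored
    margin = NormalDist(eta1, sigma)
    z = h(margin.cdf(y), p0, theta)  # z(y, eta1, eta2)
    return math.log(margin.pdf(y)) + math.log1p(-z)

# Illustrative values only.
print(loglik_term(0.7, 1, eta1=0.2, eta2=0.3, theta=2.0))
print(loglik_term(0.7, 1, eta1=0.2, eta2=0.3, theta=0.5, h=gaussian_h))
print(loglik_term(None, 0, eta1=0.2, eta2=0.3, theta=2.0))
```

Passing the derivative `h` as an argument keeps the likelihood code independent of the copula family, mirroring the way z enters the formula above.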

The fact that $E(Y^*) = b'(\eta_1)$ implies, for $\phi = 1$ and since $\ell$ depends on $\eta_1$ through $\log f_1(Y)$ and $\log(1 - z)$,
$$\frac{\partial \ell}{\partial \eta_1} = U (Y - \mu_1) - U \, \frac{\frac{\partial}{\partial \eta_1} z(Y, \eta_1, \eta_2)}{1 - z(Y, \eta_1, \eta_2)},$$
where $\mu_1 = E(Y^*)$. As $E\left( \frac{\partial \ell}{\partial \eta_1} \right) = 0$,
$$Cov(U, Y^*) = E\left( U \, \frac{\frac{\partial}{\partial \eta_1} z(Y, \eta_1, \eta_2)}{1 - z(Y, \eta_1, \eta_2)} \right),$$
which provides another interpretation for the function $z(Y, \eta_1, \eta_2)$.
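
The link between Cov(U, Y*) and the ratio (∂z/∂η₁)/(1 − z) can be checked by simulation in one special case. The sketch below assumes a Gaussian copula with a unit-variance normal outcome (so φ = 1), for which z = Φ(−a) with a = (η₂ + ρ(y − η₁))/√(1 − ρ²) and ∂z/∂η₁ = ρ φ(a)/√(1 − ρ²), where φ(·) here denotes the standard normal density. In this case both the covariance of U and Y* and the mean of U(∂z/∂η₁)/(1 − z) should be close to ρ φ(η₂). The function name and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def check_covariance_identity(eta1, eta2, rho, n=100_000, seed=1):
    """Monte Carlo estimates of Cov(U, Y*) and E[U (dz/d eta1) / (1 - z)].

    Assumes a Gaussian copula and a N(eta1, 1) outcome margin.
    Returns (covariance estimate, ratio estimate, rho * phi(eta2)).
    """
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho ** 2)
    us, ys, ratios = [], [], []
    for _ in range(n):
        e2 = rng.gauss(0.0, 1.0)
        e1 = rho * e2 + s * rng.gauss(0.0, 1.0)  # corr(e1, e2) = rho
        y = eta1 + e1                             # latent outcome Y*
        u = 1 if eta2 + e2 > 0 else 0             # selection indicator U
        a = (eta2 + rho * (y - eta1)) / s         # 1 - z = Phi(a)
        dz = rho * STD_NORMAL.pdf(a) / s          # dz/d(eta1) for z = Phi(-a)
        ratios.append(dz / STD_NORMAL.cdf(a) if u else 0.0)
        us.append(u)
        ys.append(y)
    cov = fmean(u * y for u, y in zip(us, ys)) - fmean(us) * fmean(ys)
    return cov, fmean(ratios), rho * STD_NORMAL.pdf(eta2)

cov_hat, ratio_hat, closed_form = check_covariance_identity(0.2, 0.3, 0.6)
print(cov_hat, ratio_hat, closed_form)
```

The check covers only this Gaussian special case; other copulas or margins would need their own expressions for z and its derivative.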