
Conditional density estimation in a censored single-index regression


  1. Conditional density estimation in a censored single-index regression model. Olivier Bouaziz (1) and Olivier Lopez (2). (1) Laboratoire de Statistique Théorique et Appliquée; (2) Crest-Ensai, Irmar, and Weierstrass Institute (Berlin). International Workshop on Applied Probability, Compiègne, 10-07-08. O. Bouaziz and O. Lopez, Estimation in a SIM with censored data, IWAP 10-07-08, 1 / 24

  2. Introduction. Stanford Heart Transplant Data: $Y_i$ is the response variable, the survival time of patient $i$; $X_i$ is the covariate vector (age and square of age). Censored data: for some patients, $Y_i$ is not observed. Possible causes: administrative censoring; the patient died of causes independent of the heart transplant; ... Regression models on these data: Miller and Halpern (1982), Wei et al. (1990), Stute et al. (2000)...

  3. Semiparametric model. Conditional density estimation of $Y$ given $X = x$: $f(y \mid x)$. Problem of the "curse of dimensionality". Semiparametric model for dimension reduction. S.I.M. assumption: $\exists\, \theta_0 \in \Theta \subset \mathbb{R}^d$ such that $f(y \mid x) = f_{\theta_0}(y, x'\theta_0)$, where $f_\theta(y, u)$ denotes the conditional density of $Y$ given $X'\theta = u$, evaluated at $Y = y$.

  4. Censored data. We look at $Y_1, \dots, Y_n$ (not observed). $C_1, \dots, C_n$ are censoring random variables. Observations, for $1 \le i \le n$: $Z_i = Y_i \wedge C_i$, $\delta_i = \mathbf{1}_{Y_i \le C_i}$, $X_i \in \chi \subset \mathbb{R}^d$. Assumptions of Koul et al. (1981), Stute (1996), Stute (1999), Stute et al. (2000), Sellero et al. (2005)...: for $i = 1, \dots, n$, $P(Y_i = C_i) = 0$; $Y_i \perp C_i$; $P(Y_i \le C_i \mid X_i, Y_i) = P(Y_i \le C_i \mid Y_i)$.
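
As a small illustration of the observation scheme on this slide, the following Python sketch builds $(Z_i, \delta_i)$ from latent pairs $(Y_i, C_i)$. The function name `censor` and the exponential laws used in the demo are assumptions made for illustration only; they do not come from the talk.

```python
import random

def censor(y, c):
    """Return (z, delta) with z_i = min(y_i, c_i) and delta_i = 1 if y_i <= c_i."""
    z = [min(yi, ci) for yi, ci in zip(y, c)]
    delta = [1 if yi <= ci else 0 for yi, ci in zip(y, c)]
    return z, delta

# Hypothetical demo: exponential survival and censoring times, drawn independently.
rng = random.Random(0)
n = 5
y = [rng.expovariate(1.0) for _ in range(n)]  # latent survival times (never observed directly)
c = [rng.expovariate(0.5) for _ in range(n)]  # censoring times
z, delta = censor(y, c)
```

Only `z`, `delta` and the covariates would be available to the statistician; `y` and `c` are kept here just to show how the observations arise.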

  5. Outline. 1: Estimation procedure. 2: Asymptotic results for $\hat\theta$. 3: Key ingredients of proof. 4: Simulation study and analysis on real data.

  6. Estimation procedure. Assume we know $f_\theta$ and define, for any function $J \ge 0$, $L(\theta, J) = E\left[\log f_\theta(Y, \theta'X)\, J(X)\right] = \int \log f_\theta(y, \theta'x)\, J(x)\, dF_{X,Y}(x, y)$, where $F_{X,Y}(x, y) = P(X \le x, Y \le y)$. Then $\theta_0 = \arg\max_{\theta \in \Theta} L(\theta, J)$. Problems: estimation of $F_{X,Y}(x, y)$; estimation of $f_\theta$.

  8. Estimation of $F_{X,Y}$. Estimator of $F_{X,Y}$ proposed by Stute (1993): $\hat F(x, y) = \sum_{i=1}^{n} \delta_i W_{in} \mathbf{1}_{Z_i \le y,\, X_i \le x}$, where $W_{in} = \frac{1}{n\,(1 - \hat G(Z_i-))}$ and $\hat G$ is the Kaplan-Meier estimator of $G(\cdot) = P(C \le \cdot)$.
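
The weights $W_{in}$ can be computed in one pass over the ordered observations. The sketch below is a minimal pure-Python version, assuming there are no ties among the $Z_i$ (consistent with $P(Y_i = C_i) = 0$ for continuous data); the function names `km_censoring_weights` and `stute_cdf` and the toy data are hypothetical.

```python
def km_censoring_weights(z, delta):
    """W_in = 1 / (n * (1 - G_hat(Z_i-))), where G_hat is the Kaplan-Meier
    estimator of the censoring distribution. Assumes no ties among z."""
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    w = [0.0] * n
    surv = 1.0  # value of 1 - G_hat just before the current ordered observation
    for rank, i in enumerate(order):
        w[i] = 1.0 / (n * surv)
        if delta[i] == 0:  # a censored point is an "event" for G_hat
            surv *= 1.0 - 1.0 / (n - rank)
    return w

def stute_cdf(x, y, X, z, delta, w):
    """F_hat(x, y) = sum_i delta_i * W_in * 1{Z_i <= y, X_i <= x} (componentwise)."""
    total = 0.0
    for xi, zi, di, wi in zip(X, z, delta, w):
        if di == 1 and zi <= y and all(a <= b for a, b in zip(xi, x)):
            total += wi
    return total

# Toy data: four patients, the second one censored.
z = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]
X = [(0.1,), (0.4,), (0.2,), (0.3,)]
w = km_censoring_weights(z, delta)
```

In this toy sample the mass of the censored observation is redistributed to the later uncensored ones, so the products $\delta_i W_{in}$ are 0.25, 0, 0.375 and 0.375 (up to rounding) and sum to one.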

  9. Estimation of $f_\theta$. We use a nonparametric kernel smoothing estimator. Let $K$ be a kernel and $h$ a bandwidth satisfying the classical assumptions. Estimator of $f_\theta$: $\hat f^h_\theta(z, \theta'x) = \frac{\int K_h(\theta'x - \theta'u)\, K_h(z - y)\, d\hat F(u, y)}{\int K_h(\theta'x - \theta'u)\, d\hat F_X(u)}$, where $K_h(\cdot) = h^{-1} K(\cdot / h)$ and $\hat F_X$ is the empirical estimator of $F_X$.
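
Because $\hat F$ is a step function, both integrals in the kernel estimator reduce to finite sums over the sample. The sketch below evaluates the ratio under some assumptions that are not in the talk: a Gaussian kernel, the same bandwidth `h` in both directions (as in the formula), and a list `jumps` holding the masses $\delta_i W_{in}$. The names `gauss_kh`, `index` and `cond_density` are hypothetical.

```python
import math

def gauss_kh(t, h):
    """K_h(t) = K(t / h) / h, with K the standard Gaussian density (an assumption)."""
    return math.exp(-0.5 * (t / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def index(theta, x):
    """Single index theta'x."""
    return sum(t * xj for t, xj in zip(theta, x))

def cond_density(zq, xq, theta, X, z, jumps, h):
    """Kernel estimate of f_theta(zq, theta'xq).
    jumps[i] stands for delta_i * W_in, the mass F_hat puts on (X_i, Z_i)."""
    n = len(z)
    uq = index(theta, xq)
    u = [index(theta, x) for x in X]
    num = sum(j * gauss_kh(uq - ui, h) * gauss_kh(zq - zi, h)
              for j, ui, zi in zip(jumps, u, z))
    den = sum(gauss_kh(uq - ui, h) for ui in u) / n  # integral against the empirical F_X
    return num / den

# Demo without censoring, so every jump equals 1/n (hypothetical toy data).
X = [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
z = [0.5, 1.0, 1.5, 2.0]
jumps = [0.25] * 4
theta = (1.0, 0.5)
h = 0.5
step = 0.01
# Riemann sum of the estimated density over z in [-5, 8).
mass = sum(cond_density(-5.0 + k * step, (0.5, 0.5), theta, X, z, jumps, h) * step
           for k in range(1300))
```

Without censoring the estimate integrates to one in its first argument, which is what `mass` checks numerically. With censored data the numerator uses the Kaplan-Meier-type masses while the denominator still uses the plain empirical distribution of the covariates.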

  11. First estimator of $\theta$. We use the following pseudo-likelihood: $L_n(\theta, \hat f^h_\theta, J) = \int \log \hat f^h_\theta(y, \theta'x)\, J(x)\, d\hat F_{X,Y}(x, y) = \sum_{i=1}^{n} \delta_i W_{in} \log \hat f^h_\theta(Z_i, \theta'X_i)\, J(X_i)$. We derive the following estimator of $\theta$: $\hat\theta(h) = \arg\max_{\theta \in \Theta} L_n(\theta, \hat f^h_\theta, J)$.

  12. First estimator of $\theta$. We use the following pseudo-likelihood: $L_n(\theta, \hat f^h_\theta, J) = \int \log \hat f^h_\theta(y, \theta'x)\, J(x)\, d\hat F_{X,Y}(x, y) = \sum_{i=1}^{n} \delta_i W_{in} \log \hat f^h_\theta(Z_i, \theta'X_i)\, J(X_i)$. We derive the following estimator of $\theta$, with the bandwidth $h$ replaced by $\hat h$: $\hat\theta(\hat h) = \arg\max_{\theta \in \Theta} L_n(\theta, \hat f^{\hat h}_\theta, J)$.
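
To make the pseudo-likelihood concrete, here is a minimal sketch that evaluates $L_n$ for a fixed bandwidth and maximizes it over a small grid. Several choices are assumptions made for illustration and are not taken from the talk: $J \equiv 1$, a Gaussian kernel, the normalization $\theta = (1, b)$, grid search instead of a numerical optimizer, and the simulated data. All function names are hypothetical.

```python
import math
import random

def gauss_kh(t, h):
    """K_h(t) = K(t / h) / h with a Gaussian K (an assumption)."""
    return math.exp(-0.5 * (t / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def km_jumps(z, delta):
    """delta_i * W_in with W_in = 1 / (n * (1 - G_hat(Z_i-))), assuming no ties."""
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    jumps, surv = [0.0] * n, 1.0
    for rank, i in enumerate(order):
        if delta[i] == 1:
            jumps[i] = 1.0 / (n * surv)
        else:  # censored point: update the Kaplan-Meier estimate of 1 - G
            surv *= 1.0 - 1.0 / (n - rank)
    return jumps

def pseudo_loglik(theta, X, z, jumps, h):
    """L_n(theta, f_hat, J) with J = 1: sum_i delta_i W_in log f_hat(Z_i, theta'X_i)."""
    n = len(z)
    u = [sum(t * xj for t, xj in zip(theta, x)) for x in X]
    total = 0.0
    for i in range(n):
        if jumps[i] == 0.0:  # censored observations carry no mass
            continue
        num = sum(jumps[k] * gauss_kh(u[i] - u[k], h) * gauss_kh(z[i] - z[k], h)
                  for k in range(n))
        den = sum(gauss_kh(u[i] - u[k], h) for k in range(n)) / n
        total += jumps[i] * math.log(num / den)
    return total

def argmax_theta(grid, X, z, jumps, h):
    """Grid-search maximizer of the pseudo-likelihood."""
    return max(grid, key=lambda th: pseudo_loglik(th, X, z, jumps, h))

# Simulated single-index data (hypothetical design, true theta0 = (1, 0.5)).
rng = random.Random(1)
n = 40
theta0 = (1.0, 0.5)
X = [(rng.random(), rng.random()) for _ in range(n)]
y = [1.0 + x[0] * theta0[0] + x[1] * theta0[1] + 0.1 * rng.gauss(0.0, 1.0) for x in X]
c = [rng.uniform(1.0, 4.0) for _ in range(n)]
z = [min(a, b) for a, b in zip(y, c)]
delta = [1 if a <= b else 0 for a, b in zip(y, c)]
jumps = km_jumps(z, delta)
grid = [(1.0, b / 4.0) for b in range(-4, 9)]  # theta = (1, b), b from -1 to 2
theta_hat = argmax_theta(grid, X, z, jumps, h=0.3)
```

Each observation contributes to its own density estimate here, so the criterion is only meaningful for a fixed, not too small, bandwidth; how the bandwidth itself is chosen is a separate question that this sketch does not address.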

  17. Adaptive choice of $\tau$. The Kaplan-Meier estimator does not behave well in the tail of the distribution. Truncation bound: we only keep observations lower than $\tau$. SIM assumption: for any $\tau$, $\mathcal{L}(Y \mid X, Y \le \tau) = \mathcal{L}(Y \mid X'\theta_0, Y \le \tau)$. How can we choose $\tau$ from the data? Asymptotic criterion: $E^2(\tau) := \lim_n E\left[\left\| \hat\theta^\tau(\hat h^\tau) - \theta_0 \right\|^2\right]$.
