Thèse de Doctorat de l'Université Pierre et Marie Curie


Thèse de Doctorat de l'Université Pierre et Marie Curie: Contributions à la Prévision Statistique. Olivier P. Faugeras, Université Pierre et Marie Curie - Paris VI, Laboratoire de Statistique Théorique et Appliquée.


1. Prevision versus Regression

Regression:
1. estimation step: from the data $D_n := \{(X_i, Y_i), i = 0, \dots, n\}$, estimate $r(x) = E[Y \mid X = x]$ by $\hat r(x, D_n)$;
2. prediction step: for a new $(X, Y)$, predict $Y$ by $\hat r(X, D_n)$.

If $(X, Y)$ were independent of $D_n$, then $E[Y \mid X, D_n] = E[Y \mid X]$ and
$$E_\theta\big[(r(X) - \hat r(X, D_n))^2\big] = \int E_\theta\big[(r(X) - \hat r(X, D_n))^2 \mid X = x\big]\, dP_X(x) = \int E_\theta\big[(r(x) - \hat r(x, D_n))^2\big]\, dP_X(x)$$
→ the prediction error is the same as the MISE regression error.

Prediction: for a Markov process, $(X_i, Y_i) = (X_i, X_{i+1})$ and $(X, Y) = (X_T, X_{T+1})$, hence $D_n$ is not independent of $X$.


7. Towards asymptotic independence

Issue: how to let $X$ be independent of $D_n$?

A solution: temporal separation. Let $\varphi(T) \to \infty$ and $k_T \to \infty$ such that $T - k_T - \varphi(T) \to \infty$. Split the data $(X_0, \dots, X_T)$:
1. estimate $\theta$ on $[0, \varphi(T)]$: $\hat\theta_{\varphi(T)}$;
2. predict on $[T - k_T, T]$: $\hat Y := r_{\hat\theta_{\varphi(T)}}(X_{T-k_T}^T)$,
by using an assumption of asymptotic independence (short memory) on the process.

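To make the scheme concrete, here is a minimal sketch of the temporal-separation predictor, assuming a toy AR(1) model and the illustrative choices $\varphi(T) = T^{0.8}$, $k_T = T^{0.3}$ (which satisfy $k_T^2/\varphi(T) \to 0$ and $T - k_T - \varphi(T) \to \infty$ from assumption H2 below); the model and these exponents are not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy AR(1) path (assumed model): X_t = theta * X_{t-1} + e_t.
theta, T = 0.6, 2000
X = np.zeros(T + 1)
for t in range(1, T + 1):
    X[t] = theta * X[t - 1] + rng.standard_normal()

# Temporal separation: phi(T), k_T -> infinity with k_T^2/phi(T) -> 0
# and T - k_T - phi(T) -> infinity (exponents are illustrative).
phi_T, k_T = int(T ** 0.8), int(T ** 0.3)

# Estimation step, on [0, phi(T)] only: OLS estimate of theta.
est = X[: phi_T + 1]
theta_hat = np.dot(est[1:], est[:-1]) / np.dot(est[:-1], est[:-1])

# Prediction step, from the recent block [T - k_T, T]; for an AR(1)
# the predictor only involves the last value X_T.
X_pred = theta_hat * X[T]
print(f"theta_hat = {theta_hat:.3f}, predicted X_(T+1) = {X_pred:.3f}")
```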

9. Outline

1 Introduction: The Statistical Prediction Problem; Prevision vs Regression; Towards asymptotic independence
2 Prediction by temporal separation: Model; Statistical Prediction and assumptions; Results: Consistency of the predictor; Example
3 Limit law of the Predictor: Assumptions; Result: Limit law of the predictor
Conclusions

10. Some notions on $\alpha$-mixing

Definition ($\alpha$-mixing coefficients, Rosenblatt [1956]): let $(\Omega, \mathcal{A}, P)$ be a probability space and $\mathcal{B}$, $\mathcal{C}$ two sub-sigma-fields of $\mathcal{A}$. The $\alpha$-mixing coefficient between $\mathcal{B}$ and $\mathcal{C}$ is defined by
$$\alpha(\mathcal{B}, \mathcal{C}) = \sup_{B \in \mathcal{B},\, C \in \mathcal{C}} |P(B \cap C) - P(B)P(C)|,$$
and the $\alpha$-mixing coefficient of order $k$ of a stochastic process $X = \{X_t, t \in \mathbb{N}\}$ defined on $(\Omega, \mathcal{A}, P)$ as
$$\alpha(k) = \sup_{t \in \mathbb{N}} \alpha\big(\sigma(X_s, s \le t),\, \sigma(X_s, s \ge t + k)\big).$$
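As a rough numerical illustration (not from the talk), the sketch below computes a crude Monte Carlo lower bound on $\alpha(k)$ for a stationary Gaussian AR(1), restricting the supremum to half-line events at times $0$ and $k$; the true coefficient takes the supremum over the full past and future sigma-fields, so this only bounds it from below.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, k, n_mc = 0.6, 5, 100_000

# Stationary Gaussian AR(1): X_t = rho*X_{t-1} + e_t, Var(X_t) = 1/(1-rho^2).
x0 = rng.standard_normal(n_mc) / np.sqrt(1.0 - rho ** 2)
xk = x0.copy()
for _ in range(k):
    xk = rho * xk + rng.standard_normal(n_mc)

# Restrict the sup to half-line events B = {X_0 <= a}, C = {X_k <= b}:
# this gives a crude Monte Carlo *lower bound* on alpha(k).
grid = np.linspace(-2.0, 2.0, 21)
alpha_lb = max(
    abs(np.mean((x0 <= a) & (xk <= b)) - np.mean(x0 <= a) * np.mean(xk <= b))
    for a in grid for b in grid
)
print(f"lower bound on alpha({k}) ~ {alpha_lb:.4f}")
```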

11. Model

Let $X = (X_t, t \in \mathbb{N})$ be a stochastic process. We assume that:
1. $X$ is a second order, square integrable, $\alpha$-mixing process;
2. the regression function $r_\theta(\cdot)$ depends approximately on the last $k_T$ values $(X_{T-i}, i = 0, \dots, k_T)$:
$$X^*_{T+1} := E_\theta\big[X_{T+1} \mid X_0^T\big] := \sum_{i=0}^{k_T} r_i(X_{T-i}, \theta) + \eta_{k_T}(X, \theta).$$

Assumptions H0 on the process:
(i) $\lim_{T \to \infty} E_\theta\big(\eta^2_{k_T}(X, \theta)\big) = 0$;
(ii) for all $i \in \mathbb{N}$, $\|r_i(X_{T-i}, \theta_1) - r_i(X_{T-i}, \theta_2)\| \le H_i(X_{T-i})\, \|\theta_1 - \theta_2\|$, $\forall \theta_1, \theta_2$;
(iii) there exists $r > 1$ such that $\sup_{i \in \mathbb{N}} \big(E_\theta H_i^{2r}(X_{T-i})\big)^{1/r} < \infty$.

This additive model is an extension of a model studied by Bosq [2007].

12. Statistical Prediction and assumptions

We assume we have an estimator $\hat\theta_T$ of $\theta$.

Assumptions H1 on the estimator $\hat\theta_T$:
(i) $\limsup_{T \to \infty} T \cdot E_\theta(\hat\theta_T - \theta)^2 < \infty$;
(ii) there exists $q > 1$ such that $\limsup_{T \to \infty} T^q\, E(\hat\theta_T - \theta)^{2q} < \infty$.

We build a statistical predictor: $\hat X_{T+1} := \sum_{i=0}^{k_T} r_i(X_{T-i}, \hat\theta_{\varphi(T)})$.

Assumptions H2 on the coefficients:
(i) $k_T^2 / \varphi(T) \to 0$ as $T \to \infty$;
(ii) $T - k_T - \varphi(T) \to \infty$ as $T \to \infty$.


14. Consistency of the predictor

Theorem 2.5: under the assumptions H0, H1, H2, we have that
$$\limsup_{T \to \infty} E_\theta\big(\hat X_{T+1} - X^*_{T+1}\big)^2 = 0.$$

Tool: Davydov's covariance inequality. Let $X \in L^q(P)$ and $Y \in L^r(P)$; if $q > 1$, $r > 1$ and $\frac{1}{q} + \frac{1}{r} = 1 - \frac{1}{p}$, then
$$|\mathrm{Cov}(X, Y)| \le 2p\, \big[2\alpha(\sigma(X), \sigma(Y))\big]^{1/p}\, \|X\|_q\, \|Y\|_r.$$


16. Example of process

For a linear, weakly stationary, centered, non-deterministic, invertible process in discrete time, the Wold decomposition writes
$$X_T = e_T + \sum_{i=1}^{k_T} \varphi_i(\theta) X_{T-i} + \sum_{i > k_T} \varphi_i(\theta) X_{T-i}, \qquad \text{with } \sum_{i=1}^{\infty} \varphi_i^2(\theta) < \infty.$$
Set $\eta_{k_T}(X, \theta) = \sum_{i > k_T + 1} \varphi_i(\theta) X_{T+1-i}$.

Proposition: if $X$ verifies the assumptions
1. for all $i$, $\varphi_i$ is differentiable and $\|\varphi'_i(\cdot)\|_\infty < \infty$;
2. there exists $r > 1$ such that $(X_t)$ has a moment of order $2r$;
3. $X$ is $\alpha$-mixing and such that $\sum_{i,j} \varphi_{i+1}(\theta)\, \varphi_{j+1}(\theta)\, \alpha^{1/p}(|i - j|) < \infty$;
then $X$ verifies the assumptions of Theorem 2.5.
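A minimal numerical sketch of this example, assuming an invertible MA(1) process, whose AR($\infty$) coefficients are $\varphi_i = -(-\beta)^i$; the parameter values and truncation level $k_T$ are illustrative, not the talk's.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy instance: invertible MA(1), X_t = e_t + beta*e_{t-1},
# whose AR(infinity) representation has coefficients phi_i = -(-beta)^i.
beta, T, k_T = 0.7, 5000, 25
e = rng.standard_normal(T + 3)
X = e[1:] + beta * e[:-1]                 # X[0], ..., X[T+1]

i = np.arange(1, k_T + 1)
phi = -(-beta) ** i                       # square-summable coefficients

# Truncated one-step predictor of X_{T+1} from the last k_T values;
# the neglected tail sum is the remainder eta_{k_T}(X, theta).
X_pred = phi @ X[T + 1 - i]
print(f"X_(T+1) = {X[T + 1]:.3f}, truncated predictor = {X_pred:.3f}")
```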


18. Assumptions for the limit law

Assumptions H′0 on the process:
(i) $\theta \mapsto r_i(X_{T-i}, \theta)$ is twice differentiable w.r.t. $\theta$;
(ii) $\sup_i \big\|\partial^2_\theta r_i(X_{T-i}, \cdot)\big\|_\infty = O_P(1)$;
(iii) $\eta_{k_T}(X, \theta) = o_P\big(1/\sqrt{\varphi(T)}\big)$;
(iv) $\sum_{i=0}^{+\infty} \partial_\theta r_i(X_{T-i}; \theta)$ exists and converges a.s. to a vector $V$ as $T \to +\infty$.

Assumption H′1 on the estimator $\hat\theta_T$:
(i) $\sqrt{T}\,(\hat\theta_T - \theta) \xrightarrow{L} N(0, \sigma^2(\theta))$.

Assumptions H′2 on the coefficients:
(i) $k_T = o(\varphi(T))$;
(ii) $T - k_T - \varphi(T) \to \infty$ as $T \to \infty$.

19. Limit law of the predictor

Theorem 2.10: if the assumptions H′0, H′1, H′2 are verified, then
$$\sqrt{\varphi(T)}\,\big(\hat X_{T+1} - X^*_{T+1}\big) \xrightarrow{L} \langle U, V \rangle,$$
where $U$ and $V$ are two independent random variables, $U$ with law $N(0, \sigma^2(\theta))$ and $V$ the limit of $\sum_{i=0}^{+\infty} \partial_\theta r_i(X_{T-i}; \theta)$ as $T \to \infty$.

20. Tool: an asymptotic independence lemma

Let $(X'_n)$ and $(X''_n)$ be two sequences of real-valued random variables with laws $P'_n$ and $P''_n$ respectively, defined on the probability space $(\Omega, \mathcal{A}, P)$. Assume that $(X'_n)$ and $(X''_n)$ are asymptotically mixing w.r.t. each other, in the sense that there exists a sequence of coefficients $\alpha(n)$ with $\alpha(n) \to 0$ such that, for all Borel sets $A$ and $B$ of $\mathbb{R}$,
$$\big|P(X'_n \in A,\, X''_n \in B) - P(X'_n \in A)\, P(X''_n \in B)\big| \le \alpha(n).$$
Then, if
1. $X'_n \xrightarrow{L} X'$ with law $P'$;
2. $X''_n \xrightarrow{L} X''$ with law $P''$;
it follows that $(X'_n, X''_n) \xrightarrow{L} (X', X'')$, and the law of $(X', X'')$ is $P' \otimes P''$.

21. Conclusions

Some limits of the temporal decoupling method:
1. heuristically under-efficient: it leaves a gap in the data;
2. the mixing coefficient is a single real number, which reduces the dependence structure of the process to a property of asymptotic independence;
3. practical applications are difficult to undertake.

Reference: Faugeras, O. (2007). Prévision statistique paramétrique par séparation temporelle. Accepted to Annales de l'ISUP.

22. Part II: A nonparametric quantile-copula approach to conditional density estimation.

23. Outline

4 Introduction: Why estimate the conditional density? Two classical approaches for estimation; The trouble with ratio shaped estimators
5 The Quantile-Copula estimator: The quantile transform; The copula representation; A product shaped estimator
6 Asymptotic results: Consistency and asymptotic normality; Sketch of the proofs
7 Comparison with competitors: Theoretical comparison; Finite sample simulation
8 Application to prediction and discussions: Application to prediction; Discussions
9 Summary and conclusions

24. Setup and Motivation

Objective:
- observe an i.i.d. sample $((X_i, Y_i);\, i = 1, \dots, n)$ of $(X, Y)$;
- predict the output $Y$ for an input $X$ at location $x$, with minimal assumptions on the law of $(X, Y)$ (nonparametric setup).

Notation:
- $(X, Y)$ → joint c.d.f. $F_{X,Y}$, joint density $f_{X,Y}$;
- $X$ → c.d.f. $F$, density $f$;
- $Y$ → c.d.f. $G$, density $g$.


27. Why estimate the conditional density?

What is a good prediction?
1. Classical approach ($L^2$ theory): the conditional mean or regression function $r(x) = E(Y \mid X = x)$;
2. Fully informative approach: the conditional density $f(y \mid x)$.


32. Estimating the conditional density - 1

A first, density-based approach:
$$f(y \mid x) = \frac{f_{X,Y}(x, y)}{f(x)} \leftarrow \frac{\hat f_{X,Y}(x, y)}{\hat f(x)},$$
with $\hat f_{X,Y}$, $\hat f$ Parzen-Rosenblatt kernel estimators with kernels $K$, $K'$ and bandwidths $h$, $h'$.

The double kernel estimator:
$$\hat f(y \mid x) = \frac{\sum_{i=1}^n K'_{h'}(X_i - x)\, K_h(Y_i - y)}{\sum_{i=1}^n K'_{h'}(X_i - x)} \quad \to \text{ ratio shaped.}$$

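For concreteness, a short sketch of the double kernel estimator, assuming Epanechnikov kernels and hand-picked bandwidths; the guard against a vanishing denominator hints at the ratio-shape trouble discussed below.

```python
import numpy as np

def double_kernel_cde(x, y, X, Y, h, hp):
    """Ratio-shaped double kernel estimate of f(y | x) (Epanechnikov kernels)."""
    K = lambda u: 0.75 * np.clip(1.0 - u ** 2, 0.0, None)
    wx = K((X - x) / hp) / hp        # K'_{h'}(X_i - x)
    wy = K((Y - y) / h) / h          # K_h(Y_i - y)
    denom = wx.sum()
    if denom == 0.0:                 # the weak spot of ratio shapes
        return np.nan
    return (wx * wy).sum() / denom

rng = np.random.default_rng(3)
X = rng.standard_normal(500)
Y = X + rng.standard_normal(500)     # Y | X=x ~ N(x, 1)
print(double_kernel_cde(0.5, 0.5, X, Y, h=0.4, hp=0.4))
```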

36. Estimating the conditional density - 2

A regression strategy. Fact:
$$E\big[\mathbf{1}_{|Y - y| \le h} \mid X = x\big] = F(y + h \mid x) - F(y - h \mid x) \approx 2h\, f(y \mid x),$$
which turns the conditional density estimation problem into a regression framework:
1. Transform the data: $Y_i \to Y'_i := (2h)^{-1} \mathbf{1}_{|Y_i - y| \le h}$, or $Y_i \to Y'_i := K_h(Y_i - y)$ (smoothed version);
2. Perform a nonparametric regression of the $Y'_i$ on the $X_i$'s by local averaging methods (Nadaraya-Watson, local polynomial, orthogonal series, ...).

Nadaraya-Watson estimator:
$$\hat f(y \mid x) = \frac{\sum_{i=1}^n K'_{h'}(X_i - x)\, K_h(Y_i - y)}{\sum_{i=1}^n K'_{h'}(X_i - x)} \quad \to \text{ (same) ratio shape.}$$

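A quick numerical check, with assumed kernels and bandwidths, that the regression route reproduces the double kernel estimator: Nadaraya-Watson regression of the synthetic responses $Y'_i = K_h(Y_i - y)$ on the $X_i$ gives exactly the same ratio.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal(500)
Y = X + rng.standard_normal(500)

K = lambda u: 0.75 * np.clip(1.0 - u ** 2, 0.0, None)   # Epanechnikov
x, y, h, hp = 0.5, 0.5, 0.4, 0.4

# Step 1: synthetic responses Y'_i = K_h(Y_i - y) (smoothed indicators).
Yp = K((Y - y) / h) / h

# Step 2: Nadaraya-Watson regression of Y'_i on X_i at the point x.
w = K((X - x) / hp) / hp
print((w * Yp).sum() / w.sum())   # coincides with the double kernel value
```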

39. Ratio shaped estimators: bibliography

1. Double kernel estimator: Rosenblatt [1969], Roussas [1969], Stute [1986], Hyndman, Bashtannyk and Grunwald [1996];
2. Local polynomial: Fan, Yao and Tong [1996], Fan and Yao [2005];
3. Local parametric and constrained local polynomial: Hyndman and Yao [2002]; Rojas, Genovese, Wasserman [2009];
4. Partitioning type estimate: Györfi and Kohler [2007];
5. Projection type estimate: Lacour [2007].

40. The trouble with ratio shaped estimators

Drawbacks:
- the quotient shape of the estimator is tricky to study;
- explosive behavior when the denominator is small → delicate numerical implementation (trimming);
- a minoration hypothesis on the marginal density is needed: $f(x) \ge c > 0$.

How to remedy these problems? → build on the idea of using synthetic data: find a representation of the data better adapted to the problem.



43. The quantile transform

What is the "best" transformation of the data in this context?

The quantile transform theorem:
- when $F$ is arbitrary, if $U$ is a uniformly distributed random variable on $(0, 1)$, then $X \stackrel{d}{=} F^{-1}(U)$;
- whenever $F$ is continuous, the random variable $U = F(X)$ is uniformly distributed on $(0, 1)$.

→ use the invariance property of the quantile transform to construct a pseudo-sample $(U_i, V_i)$ with a prescribed uniform marginal distribution:
$$(X_1, \dots, X_n) \to (U_1 = F(X_1), \dots, U_n = F(X_n)), \qquad (Y_1, \dots, Y_n) \to (V_1 = G(Y_1), \dots, V_n = G(Y_n)).$$

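A small sketch of the quantile transform on simulated data, assuming an exponential marginal so that $F$ is known in closed form; the rank-based version anticipates the empirical c.d.f. plug-in used in the construction below.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
X = rng.exponential(size=n)              # continuous marginal, F known here

# True quantile transform: U_i = F(X_i) is exactly Uniform(0,1).
U = 1.0 - np.exp(-X)                     # F(x) = 1 - e^{-x}

# Feasible version when F is unknown: normalized ranks F_n(X_i).
U_emp = (np.argsort(np.argsort(X)) + 1) / n

print(np.abs(U - U_emp).max())           # ~ n^{-1/2}: F_n approximates F
print(U.mean(), U.var())                 # ~ 1/2 and ~ 1/12 (uniform law)
```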

46. The copula representation

This leads naturally to the copula function.

Sklar's theorem [1959]: for any bivariate cumulative distribution function $F_{X,Y}$ on $\mathbb{R}^2$, with marginal c.d.f. $F$ of $X$ and $G$ of $Y$, there exists some function $C : [0,1]^2 \to [0,1]$, called the dependence or copula function, such that
$$F_{X,Y}(x, y) = C(F(x), G(y)), \qquad -\infty \le x, y \le +\infty.$$
If $F$ and $G$ are continuous, this representation is unique with respect to $(F, G)$. The copula function $C$ is itself a c.d.f. on $[0,1]^2$ with uniform marginals.

→ captures the dependence structure of the vector $(X, Y)$, irrespective of the marginals;
→ allows one to deal separately with the randomness of the dependence structure and the randomness of the marginals.


49. A product shaped estimator

Assume that the copula function $C(u, v)$ has a density $c(u, v) = \frac{\partial^2 C(u, v)}{\partial u\, \partial v}$, i.e. $c(u, v)$ is the density of the transformed r.v. $(U, V) = (F(X), G(Y))$.

A product form of the conditional density: by differentiating Sklar's formula,
$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f(x)} = g(y)\, c(F(x), G(y)).$$

A product shaped estimator:
$$\hat f_{Y|X}(y \mid x) = \hat g_n(y)\, \hat c_n(F_n(x), G_n(y)).$$

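As a sanity check of the product formula (not part of the talk), one can verify $f(y \mid x) = g(y)\, c(F(x), G(y))$ numerically for an assumed bivariate Gaussian with correlation $\rho$, whose copula density has a closed form.

```python
import numpy as np
from scipy.stats import norm

# Assumed toy case: standard bivariate Gaussian with correlation rho,
# to check f(y|x) = g(y) * c(F(x), G(y)) numerically.
rho, x, y = 0.6, 0.8, -0.3

def gaussian_copula_density(u, v, rho):
    z1, z2 = norm.ppf(u), norm.ppf(v)
    s = 1.0 - rho ** 2
    return np.exp(-(rho ** 2 * (z1 ** 2 + z2 ** 2) - 2 * rho * z1 * z2)
                  / (2 * s)) / np.sqrt(s)

lhs = norm.pdf(y, loc=rho * x, scale=np.sqrt(1 - rho ** 2))   # f(y | x)
rhs = norm.pdf(y) * gaussian_copula_density(norm.cdf(x), norm.cdf(y), rho)
print(lhs, rhs)   # the two sides agree
```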

51. Construction of the estimator - 1

→ get an estimator of the conditional density by plugging in estimators of each quantity:
- density of $Y$: $g \leftarrow$ kernel estimator $\hat g_n(y) := \frac{1}{n h_n} \sum_{i=1}^n K_0\big(\frac{y - Y_i}{h_n}\big)$;
- c.d.f.s: $F(x) \leftarrow$ empirical c.d.f. $F_n(x) = \frac{1}{n} \sum_{j=1}^n \mathbf{1}_{X_j \le x}$, and $G(y) \leftarrow G_n(y) := \frac{1}{n} \sum_{j=1}^n \mathbf{1}_{Y_j \le y}$;
- copula density: $c(u, v) \leftarrow c_n(u, v)$, a bivariate Parzen-Rosenblatt kernel density (pseudo) estimator
$$c_n(u, v) := \frac{1}{n a_n^2} \sum_{i=1}^n K\Big(\frac{u - U_i}{a_n}, \frac{v - V_i}{a_n}\Big), \qquad (1)$$
with kernel $K(u, v) = K_1(u) K_2(v)$ and bandwidth $a_n$.


55. Construction of the estimator - 2

But $F$ and $G$ are unknown: the random variables $(U_i = F(X_i), V_i = G(Y_i))$ are not observable, so $c_n$ is not a true statistic.

→ approximate the pseudo-sample $(U_i, V_i)$, $i = 1, \dots, n$, by its empirical counterpart $(F_n(X_i), G_n(Y_i))$, $i = 1, \dots, n$.

A genuine estimator of $c(u, v)$:
$$\hat c_n(u, v) := \frac{1}{n a_n^2} \sum_{i=1}^n K_1\Big(\frac{u - F_n(X_i)}{a_n}\Big)\, K_2\Big(\frac{v - G_n(Y_i)}{a_n}\Big).$$


60. The quantile-copula estimator

Recollecting all elements, we get the quantile-copula estimator
$$\hat f_n(y \mid x) := \hat g_n(y)\, \hat c_n(F_n(x), G_n(y)),$$
that is to say,
$$\hat f_n(y \mid x) := \Bigg(\frac{1}{n h_n} \sum_{i=1}^n K_0\Big(\frac{y - Y_i}{h_n}\Big)\Bigg) \Bigg(\frac{1}{n a_n^2} \sum_{i=1}^n K_1\Big(\frac{F_n(x) - F_n(X_i)}{a_n}\Big)\, K_2\Big(\frac{G_n(y) - G_n(Y_i)}{a_n}\Big)\Bigg).$$
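Putting the pieces together, a self-contained sketch of the quantile-copula estimator, assuming plain Epanechnikov product kernels (the simulations later use beta kernels for the copula part to limit boundary bias) and illustrative bandwidths of the orders $n^{-1/5}$ and $n^{-1/6}$ given in the asymptotic results below.

```python
import numpy as np

def quantile_copula_cde(x, y, X, Y, h, a):
    """Quantile-copula sketch: f_n(y|x) = g_n(y) * c_n(F_n(x), G_n(y))."""
    n = len(X)
    K = lambda u: 0.75 * np.clip(1.0 - u ** 2, 0.0, None)  # Epanechnikov

    g_n = np.mean(K((y - Y) / h)) / h              # kernel density of Y at y
    u0 = np.mean(X <= x)                           # F_n(x)
    v0 = np.mean(Y <= y)                           # G_n(y)

    # Pseudo-observations (F_n(X_i), G_n(Y_i)) = normalized ranks.
    U = (np.argsort(np.argsort(X)) + 1) / n
    V = (np.argsort(np.argsort(Y)) + 1) / n

    # Product-kernel copula density at (F_n(x), G_n(y)).
    c_n = np.mean(K((u0 - U) / a) * K((v0 - V) / a)) / a ** 2
    return g_n * c_n

rng = np.random.default_rng(5)
X = rng.standard_normal(500)
Y = X + rng.standard_normal(500)
# Bandwidths of the orders n^(-1/5) and n^(-1/6); the constants are ad hoc.
n = len(X)
print(quantile_copula_cde(0.5, 0.5, X, Y, h=0.5 * n ** -0.2, a=0.5 * n ** (-1 / 6)))
```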


62. Hypotheses

Assumptions on the densities:
(i) the c.d.f. $F$ of $X$ and $G$ of $Y$ are strictly increasing and differentiable;
(ii) the densities $g$ and $c$ are twice differentiable with continuous bounded second derivatives on their support.

Assumptions on the kernels:
(i) $K$ and $K_0$ are of bounded support and of bounded variation;
(ii) $0 \le K \le C$ and $0 \le K_0 \le C$ for some constant $C$;
(iii) $K$ and $K_0$ are second order kernels: $m_0(K) = 1$, $m_1(K) = 0$ and $m_2(K) < +\infty$, and the same for $K_0$;
(iv) $K$ is twice differentiable with bounded second partial derivatives.

→ classical regularity assumptions in the nonparametric literature.


64. Asymptotic results - 1

Under the above regularity assumptions, with $h_n \to 0$, $a_n \to 0$:

Pointwise consistency:
- weak consistency: $h_n \simeq n^{-1/5}$, $a_n \simeq n^{-1/6}$ entail $\hat f_n(y \mid x) = f(y \mid x) + O_P\big(n^{-1/3}\big)$;
- strong consistency: $h_n \simeq (\ln\ln n / n)^{1/5}$ and $a_n \simeq (\ln\ln n / n)^{1/6}$ entail $\hat f_n(y \mid x) = f(y \mid x) + O_{a.s.}\big((\ln\ln n / n)^{1/3}\big)$;
- asymptotic normality: $n h_n \to \infty$, $n a_n^4 \to \infty$, $n a_n^6 \to 0$, and $\sqrt{\ln\ln n}/(n a_n^3) \to 0$ entail
$$\sqrt{n a_n^2}\, \big(\hat f_n(y \mid x) - f(y \mid x)\big) \xrightarrow{d} N\big(0,\; g(y)\, f(y \mid x)\, \|K\|_2^2\big).$$

65. Asymptotic results - 2

Uniform consistency. Under the above regularity assumptions, with $h_n \to 0$, $a_n \to 0$, for $x$ in the interior of the support of $f$ and $[a, b]$ included in the interior of the support of $g$:
- weak consistency: $h_n \simeq (\ln n / n)^{1/5}$, $a_n \simeq (\ln n / n)^{1/6}$ entail
$$\sup_{y \in [a,b]} |\hat f_n(y \mid x) - f(y \mid x)| = O_P\big((\ln n / n)^{1/3}\big);$$
- strong consistency: $h_n \simeq (\ln n / n)^{1/5}$, $a_n \simeq (\ln n / n)^{1/6}$ entail
$$\sup_{y \in [a,b]} |\hat f_n(y \mid x) - f(y \mid x)| = O_{a.s.}\big((\ln n / n)^{1/3}\big).$$

66. Asymptotic mean square error

Asymptotic bias and variance for the quantile-copula estimator:
- Bias:
$$E\big(\hat f_n(y \mid x)\big) - f(y \mid x) = \frac{a_n^2}{2}\, g(y)\, m_2(K) \cdot \nabla^2 c(F(x), G(y)) + o(a_n^2),$$
with $m_2(K) = (m_2(K_1), m_2(K_2))$ and $\nabla^2 c(u, v) = \big(\frac{\partial^2 c(u,v)}{\partial u^2}, \frac{\partial^2 c(u,v)}{\partial v^2}\big)$;
- Variance:
$$\mathrm{Var}\big(\hat f_n(y \mid x)\big) = \frac{1}{n a_n^2}\, g(y)\, f(y \mid x)\, \|K\|_2^2 + o\Big(\frac{1}{n a_n^2}\Big).$$

67. Sketch of the proofs

Decomposition diagram:
$\hat g_n(y)\, \hat c_n(F_n(x), G_n(y))$
↓
$g(y)\, \hat c_n(F_n(x), G_n(y))$ → $g(y)\, \hat c_n(F(x), G(y))$ → $g(y)\, c_n(F(x), G(y))$
↓
$g(y)\, c(F(x), G(y))$

↓ : consistency results for the kernel density estimators;
→ : two approximation lemmas:
1. $\hat c_n$ from $(F_n(x), G_n(y))$ to $(F(x), G(y))$;
2. $\hat c_n \to c_n$.

Tools: results for the Kolmogorov-Smirnov statistics $\|F - F_n\|_\infty$ and $\|G - G_n\|_\infty$.
→ Heuristic: the rate of convergence of the density estimators is slower than the rate of approximation of the K-S statistic, so the rank approximation error is negligible.


69. Theoretical asymptotic comparison - 1

Competitor: e.g. the local polynomial estimator $\hat f^{(LP)}(y \mid x) := \hat\theta_0$, with
$$R(\theta, x, y) := \sum_{i=1}^n K'_{h_1}(X_i - x)\, \Big(K_{h_2}(Y_i - y) - \sum_{j=0}^r \theta_j (X_i - x)^j\Big)^2,$$
where $\hat\theta_{xy} := (\hat\theta_0, \hat\theta_1, \dots, \hat\theta_r)$ is the value of $\theta$ which minimizes $R(\theta, x, y)$.

Comparative bias:
$$B_{LP} = \frac{h_1^2}{2}\, m_2(K')\, \frac{\partial^2 f(y \mid x)}{\partial x^2} + \frac{h_2^2}{2}\, m_2(K)\, \frac{\partial^2 f(y \mid x)}{\partial y^2} + o(h_1^2 + h_2^2),$$
$$B_{QC} = \frac{a_n^2}{2}\, g(y)\, m_2(K) \cdot \nabla^2 c(F(x), G(y)) + o(a_n^2).$$

70. Theoretical asymptotic comparison - 2

Asymptotic bias comparison:
- all estimators have a bias of the same order $\approx h^2 \approx n^{-1/3}$;
- distribution dependent terms: difficult to compare; sometimes fewer unknown terms for the quantile-copula estimator;
- $c$ has compact support: the "classical" kernel method to estimate the copula density induces bias on the boundaries of $[0,1]^2$ → techniques to reduce the bias of the kernel estimator on the edges (boundary kernels, beta kernels, reflection and transformation methods, ...).


73. Theoretical asymptotic comparison - 3

Asymptotic variance comparison. Main terms in the asymptotic variance:
- ratio shaped estimators: $\mathrm{Var}^{(LP)} := \frac{f(y \mid x)}{f(x)}$ → explosive variance for small values of the density $f(x)$, e.g. in the tail of the distribution of $X$;
- quantile-copula estimator: $\mathrm{Var}^{(QC)} := g(y)\, f(y \mid x)$ → does not suffer from the unstable nature of its competitors.

Asymptotic relative efficiency (ratio of variances):
$$\frac{\mathrm{Var}^{(QC)}}{\mathrm{Var}^{(LP)}} := f(x)\, g(y)$$
→ the QC estimator has a lower asymptotic variance over a large range of $(x, y)$ values.

77. Finite sample simulation

Model: a sample of $n = 100$ i.i.d. pairs $(X_i, Y_i)$ drawn from the following model:
- $X$ and $Y$ are each marginally distributed as $N(0, 1)$;
- $X$ and $Y$ are linked via the Frank copula
  $$C(u, v; \theta) = \frac{\ln[(\theta + \theta^{u+v} - \theta^u - \theta^v)/(\theta - 1)]}{\ln \theta}$$
  with parameter $\theta = 100$.

Practical implementation: beta kernels for the copula estimator, Epanechnikov kernels for the others; a simple rule-of-thumb method for the bandwidths. (A sampling sketch is given below.)
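A minimal sketch of the data-generating step under the stated model: the sampler inverts the conditional distribution $\partial C/\partial u$ of this base-$\theta$ Frank copula in closed form, then applies the normal quantile transform to both coordinates. The function names and the conditional-inversion route are our assumptions, not the thesis implementation.

```python
# Simulate n i.i.d. pairs (X_i, Y_i) with N(0,1) margins linked by the
# Frank copula C(u,v;theta) = ln[(theta + theta^{u+v} - theta^u - theta^v)
# / (theta - 1)] / ln(theta).
import numpy as np
from scipy.stats import norm

def frank_sample(n, theta, rng):
    """Conditional-distribution method: draw U, W ~ Unif(0,1) and solve
    w = dC/du(v|u) for v, which gives
    theta^v = (theta^u + w (theta - theta^u)) / (theta^u - w (theta^u - 1))."""
    U = rng.uniform(size=n)
    W = rng.uniform(size=n)
    tU = theta**U
    V = np.log((tU + W * (theta - tU)) / (tU - W * (tU - 1.0))) / np.log(theta)
    return U, V

rng = np.random.default_rng(0)
U, V = frank_sample(n=100, theta=100.0, rng=rng)
X, Y = norm.ppf(U), norm.ppf(V)   # N(0,1) margins via the quantile transform
```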

78-81. [Figures: finite-sample simulation results comparing the quantile-copula estimator with its competitors; only the images appeared on these slides.]

82. Outline

4. Introduction: Why estimate the conditional density? Two classical approaches for estimation; The trouble with ratio-shaped estimators
5. The Quantile-Copula estimator: The quantile transform; The copula representation; A product-shaped estimator
6. Asymptotic results: Consistency and asymptotic normality; Sketch of the proofs
7. Comparison with competitors: Theoretical comparison; Finite sample simulation
8. Application to prediction and discussions: Application to prediction; Discussions
9. Summary and conclusions

83. Application to prediction - definitions

Point predictors: the conditional mode predictor.
- Definition of the mode: $\theta(x) := \arg\sup_y f(y|x)$ → plug-in predictor: $\hat\theta(x) := \arg\sup_y \hat f_n(y|x)$.

Set predictors: level sets.
- Predictive set $C_\alpha(x)$ such that $P(Y \in C_\alpha(x) \mid X = x) = \alpha$ → level set or highest density region $C_\alpha(x) := \{ y : f(y|x) \ge f_\alpha \}$, with $f_\alpha$ the largest value such that the prediction set has coverage probability $\alpha$.
- Plug-in level set: $C_{\alpha,n}(x) := \{ y : \hat f_n(y|x) \ge \hat f_\alpha \}$, where $\hat f_\alpha$ is an estimate of $f_\alpha$. (A grid-based sketch of both plug-in predictors follows below.)
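A grid-based sketch (our illustration, with assumed helper names) of both plug-in predictors: the mode maximizes $\hat f_n(\cdot|x)$ over a grid of $y$ values, and the level set accumulates grid points in order of decreasing density until the estimated coverage first reaches $\alpha$, which realizes the "largest $f_\alpha$" definition above.

```python
# Plug-in predictors computed from an estimated conditional density
# f_hat evaluated on an equally spaced grid y_grid.
import numpy as np

def mode_predictor(y_grid, f_hat):
    """Plug-in conditional mode: maximize the estimated density on the grid."""
    return y_grid[np.argmax(f_hat)]

def level_set(y_grid, f_hat, alpha):
    """Plug-in highest density region with nominal coverage alpha."""
    dy = y_grid[1] - y_grid[0]
    order = np.argsort(f_hat)[::-1]        # grid points, highest density first
    cum = np.cumsum(f_hat[order]) * dy     # estimated coverage as points accrue
    k = np.searchsorted(cum, alpha) + 1    # smallest region with coverage >= alpha
    keep = np.zeros_like(f_hat, dtype=bool)
    keep[order[:k]] = True
    return y_grid[keep]                    # y values inside the region
```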
