Prevision versus Regression

Regression
1. Estimation step: on the data $D_n := \{(X_i, Y_i),\ i = 0, \dots, n\}$, estimate $r(x) = E[Y \mid X = x]$ by $\hat r(x, D_n)$.
2. Prediction step: for a new $(X, Y)$, predict $Y$ by $\hat r(X, D_n)$.

If $(X, Y)$ were independent of $D_n$, then $E[Y \mid X, D_n] = E[Y \mid X]$ and
$$E_\theta\big[r(X) - \hat r(X, D_n)\big]^2 = \int E_\theta\big[(r(X) - \hat r(X, D_n))^2 \mid X = x\big]\, dP_X(x) = \int E_\theta\big[(r(x) - \hat r(x, D_n))^2\big]\, dP_X(x)$$
→ The prediction error is the same as the MISE regression error.

Prediction
For a Markov process, $(X_i, Y_i) = (X_i, X_{i+1})$ and $(X, Y) = (X_T, X_{T+1})$ ⇒ $D_n$ is not independent of $X$.
Towards asymptotic independence

Issue: how to make $X$ independent of $D_n$?

A solution: temporal separation. Let $\varphi(T) \to \infty$ and $k_T \to \infty$ such that $T - k_T - \varphi(T) \to \infty$. Split the data $(X_0, \dots, X_T)$:
1. estimate $\theta$ on $[0, \varphi(T)]$: $\hat\theta_{\varphi(T)}$;
2. predict on $[T - k_T, T]$: $\hat Y := r_{\hat\theta_{\varphi(T)}}(X_{T-k_T}^T)$, where $X_{T-k_T}^T = (X_{T-k_T}, \dots, X_T)$;
by using an assumption of asymptotic independence (short memory) on the process.
Outline

1. Introduction
   - The Statistical Prediction Problem
   - Prevision vs Regression
   - Towards asymptotic independence
2. Prediction by temporal separation
   - Model
   - Statistical Prediction and assumptions
   - Results: Consistency of the predictor
   - Example
3. Limit law of the Predictor
   - Assumptions
   - Result: Limit law of the predictor
   - Conclusions
Some notions on α-mixing

Definition: α-mixing coefficients, Rosenblatt [1956]
Let $(\Omega, \mathcal A, P)$ be a probability space and $\mathcal B$, $\mathcal C$ two sub-sigma-fields of $\mathcal A$. The α-mixing coefficient between $\mathcal B$ and $\mathcal C$ is defined by
$$\alpha(\mathcal B, \mathcal C) = \sup_{B \in \mathcal B,\ C \in \mathcal C} |P(B \cap C) - P(B)P(C)|$$
and the α-mixing coefficient of order $k$ for the stochastic process $X = \{X_t, t \in \mathbb N\}$ defined on the probability space $(\Omega, \mathcal A, P)$ as
$$\alpha(k) = \sup_{t \in \mathbb N} \alpha\big(\sigma(X_s, s \le t),\ \sigma(X_s, s \ge t + k)\big).$$
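The double supremum makes $\alpha(k)$ intractable in closed form for most processes, but its decay can be illustrated numerically. Below is a minimal sketch, not from the thesis: the AR(1) model, the particular events $B = \{X_t \le 0\}$, $C = \{X_{t+k} \le 0\}$, and all numerical values are illustrative assumptions. It Monte-Carlo estimates $|P(B \cap C) - P(B)P(C)|$, which is only a lower bound on $\alpha(k)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(phi, n, n_paths):
    """Simulate n_paths independent stationary AR(1) paths of length n."""
    x = np.zeros((n_paths, n))
    # Draw X_0 from the stationary law N(0, 1/(1 - phi^2)).
    x[:, 0] = rng.normal(scale=1.0 / np.sqrt(1.0 - phi**2), size=n_paths)
    for t in range(1, n):
        x[:, t] = phi * x[:, t - 1] + rng.normal(size=n_paths)
    return x

phi, t0 = 0.8, 50
paths = simulate_ar1(phi, n=t0 + 21, n_paths=100_000)
for k in (1, 5, 10, 20):
    b = paths[:, t0] <= 0.0        # an event in sigma(X_s, s <= t0)
    c = paths[:, t0 + k] <= 0.0    # an event in sigma(X_s, s >= t0 + k)
    lb = abs(np.mean(b & c) - np.mean(b) * np.mean(c))
    print(f"k = {k:2d}: |P(BC) - P(B)P(C)| ~ {lb:.4f}")
```

The printed lower bound shrinks as $k$ grows; this short-memory behavior is what the temporal separation method exploits.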
Model

Let $X = (X_t, t \in \mathbb N)$ be a stochastic process. We assume that:
1. $X$ is a second-order, square-integrable, α-mixing process;
2. the regression function $r_\theta(\cdot)$ depends approximately on the last $k_T$ values $(X_{T-i},\ i = 0, \dots, k_T)$:
$$X^*_{T+1} := E_\theta\big[X_{T+1} \mid X_0^T\big] = \sum_{i=0}^{k_T} r_i(X_{T-i}, \theta) + \eta_{k_T}(X, \theta).$$

Assumptions H0 on the process
(i) $\lim_{T \to \infty} E_\theta\big(\eta^2_{k_T}(X, \theta)\big) = 0$;
(ii) for all $i \in \mathbb N$, $\|r_i(X_{T-i}, \theta_1) - r_i(X_{T-i}, \theta_2)\| \le H_i(X_{T-i})\, \|\theta_1 - \theta_2\|$, for all $\theta_1, \theta_2$;
(iii) there exists $r > 1$ such that $\sup_{i \in \mathbb N} \big(E_\theta H_i^{2r}(X_{T-i})\big)^{1/r} < \infty$.

This additive model is an extension of a model studied by Bosq [2007].
Statistical Prediction and assumptions

We assume we have an estimator $\hat\theta_T$ of $\theta$.

Assumptions H1 on the estimator $\hat\theta_T$
(i) $\limsup_{T \to \infty} T \cdot E_\theta(\hat\theta_T - \theta)^2 < \infty$;
(ii) there exists $q > 1$ such that $\limsup_{T \to \infty} T^q\, E(\hat\theta_T - \theta)^{2q} < \infty$.

We build a statistical predictor: $\hat X_{T+1} := \sum_{i=0}^{k_T} r_i(X_{T-i}, \hat\theta_{\varphi(T)})$. A numerical sketch of this estimate-then-predict scheme follows below.

Assumptions H2 on the coefficients
(i) $k_T^2 / \varphi(T) \to_{T \to \infty} 0$;
(ii) $(T - k_T - \varphi(T)) \to_{T \to \infty} \infty$.
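To make the two-block scheme concrete, here is a toy sketch (my illustration, not the thesis code) on an AR(1) process, for which $r_0(x, \theta) = \theta x$ and $r_i \equiv 0$ for $i \ge 1$; the least-squares estimate of $\theta$, the block lengths, and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(1) path X_t = theta * X_{t-1} + e_t (geometrically alpha-mixing).
theta_true, T = 0.7, 10_000
x = np.zeros(T + 1)
for t in range(1, T + 1):
    x[t] = theta_true * x[t - 1] + rng.normal()

phi_T = int(T ** 0.75)  # estimation block [0, phi_T]
k_T = int(T ** 0.25)    # prediction window; k_T^2 / phi_T -> 0 as T grows

# Estimation step, on the initial block only: least squares for theta.
xe = x[: phi_T + 1]
theta_hat = np.dot(xe[1:], xe[:-1]) / np.dot(xe[:-1], xe[:-1])

# Prediction step, on [T - k_T, T]: for AR(1) only the last value enters.
x_pred = theta_hat * x[T]
print(f"theta_hat = {theta_hat:.3f}, predicted X_(T+1) = {x_pred:.3f}")
```

The gap $(\varphi(T), T - k_T)$ between the two blocks is never used: it is the price paid so that, under mixing, the estimator and the prediction window become asymptotically independent.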
Consistency of the predictor

Theorem 2.5
Under the assumptions H0, H1, H2, we have
$$\limsup_{T \to \infty} E_\theta\big(\hat X_{T+1} - X^*_{T+1}\big)^2 = 0.$$

Tool: Davydov's covariance inequality
Let $X \in L^q(P)$ and $Y \in L^r(P)$. If $q > 1$, $r > 1$ and $\frac{1}{q} + \frac{1}{r} = 1 - \frac{1}{p}$, then
$$|\operatorname{Cov}(X, Y)| \le 2p\, \big(2\alpha(\sigma(X), \sigma(Y))\big)^{1/p}\, \|X\|_q\, \|Y\|_r.$$
Example of process

For a linear, weakly stationary, centered, non-deterministic, invertible process in discrete time, its Wold decomposition writes:
$$X_T = e_T + \sum_{i=1}^{k_T} \varphi_i(\theta) X_{T-i} + \sum_{i > k_T} \varphi_i(\theta) X_{T-i}$$
with $\sum_{i=1}^{\infty} \varphi_i^2(\theta) < \infty$. Set $\eta_{k_T}(X, \theta) = \sum_{i > k_T + 1} \varphi_i(\theta) X_{T+1-i}$. (A truncation sketch follows below.)

Proposition
If $X$ verifies the assumptions:
1. for all $i$, $\varphi_i$ is differentiable and $\|\varphi_i'(\cdot)\|_\infty < \infty$;
2. there exists $r > 1$ such that $(X_t)$ has a moment of order $2r$;
3. $X$ is α-mixing and such that $\sum_{i,j} \varphi_{i+1}(\theta)\, \varphi_{j+1}(\theta)\, \alpha^{1/p}(|i - j|) < \infty$;
then $X$ verifies the assumptions of Theorem 2.5.
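As a small numerical companion, the sketch below truncates the linear predictor at lag $k_T$; the geometric weights $\varphi_i(\theta) = \theta^i$ are a hypothetical choice (square-summable for $|\theta| < 1$), not a claim about any specific ARMA model.

```python
import numpy as np

def truncated_linear_predictor(x, theta_hat, k_T):
    """Keep only the first k_T terms of the linear predictor;
    the discarded tail plays the role of eta_{k_T}."""
    recent = x[-k_T:][::-1]  # X_T, X_{T-1}, ..., X_{T-k_T+1}
    weights = theta_hat ** np.arange(1, k_T + 1)  # hypothetical phi_i(theta)
    return float(weights @ recent)

rng = np.random.default_rng(6)
x = rng.normal(size=500)
print(truncated_linear_predictor(x, theta_hat=0.6, k_T=20))
```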
Assumptions for the limit law

Assumptions H′0 on the process
(i) $\theta \mapsto r_i(X_{T-i}, \theta)$ is twice differentiable w.r.t. $\theta$;
(ii) $\sup_i \big\|\partial^2_\theta r_i(X_{T-i}, \cdot)\big\|_\infty = O_P(1)$;
(iii) $\eta_{k_T}(X, \theta) = o_P\big(1/\sqrt{\varphi(T)}\big)$;
(iv) $\sum_{i=0}^{+\infty} \partial_\theta r_i(X_{T-i}; \theta)$ exists and converges a.s. to a vector $V$ as $T \to +\infty$.

Assumption H′1 on the estimator $\hat\theta_T$
(i) $\sqrt{T}(\hat\theta_T - \theta) \xrightarrow{\mathcal L} N(0, \sigma^2(\theta))$.

Assumption H′2 on the coefficients
(i) $k_T = o(\varphi(T))$; (ii) $(T - k_T - \varphi(T)) \to_{T \to \infty} \infty$.
Limit law of the predictor

Theorem 2.10
If the assumptions H′0, H′1, H′2 are verified, then
$$\sqrt{\varphi(T)}\,\big(\hat X_{T+1} - X^*_{T+1}\big) \xrightarrow{\mathcal L} \langle U, V \rangle$$
where $U$ and $V$ are two independent random variables, $U$ with law $N(0, \sigma^2(\theta))$ and $V$ the limit of $\sum_{i=0}^{+\infty} \partial_\theta r_i(X_{T-i}; \theta)$ as $T \to \infty$.
Tool

An asymptotic independence lemma
Let $(X'_n)$ and $(X''_n)$ be two sequences of real-valued random variables with laws $P'_n$ and $P''_n$ respectively, defined on the probability space $(\Omega, \mathcal A, P)$. Assume that $(X'_n)$ and $(X''_n)$ are asymptotically mixing w.r.t. each other, in the sense that there exists a sequence of coefficients $\alpha(n)$ with $\alpha(n) \to_{n \to \infty} 0$ such that, for all Borel sets $A$ and $B$ of $\mathbb R$,
$$\big|P(X'_n \in A,\ X''_n \in B) - P(X'_n \in A)\, P(X''_n \in B)\big| \le \alpha(n).$$
Then, if
1. $X'_n \xrightarrow{\mathcal L} X'$ with law $P'$;
2. $X''_n \xrightarrow{\mathcal L} X''$ with law $P''$;
it follows that $(X'_n, X''_n) \xrightarrow{\mathcal L} (X', X'')$, and the law of $(X', X'')$ is $P' \otimes P''$.
Conclusions

Some limits of the temporal decoupling method:
1. heuristically under-efficient: a gap is left in the data;
2. the mixing coefficient is a single real number, which reduces the dependence structure of the process to a mere property of asymptotic independence;
3. practical applications are difficult to undertake.

References
Faugeras, O. (2007). Prévision statistique paramétrique par séparation temporelle. Accepted to Annales de l'ISUP.
Part II: A nonparametric quantile-copula approach to conditional density estimation.
Outline

4. Introduction
   - Why estimating the conditional density?
   - Two classical approaches for estimation
   - The trouble with ratio shaped estimators
5. The Quantile-Copula estimator
   - The quantile transform
   - The copula representation
   - A product shaped estimator
6. Asymptotic results
   - Consistency and asymptotic normality
   - Sketch of the proofs
7. Comparison with competitors
   - Theoretical comparison
   - Finite sample simulation
8. Application to prediction and discussions
   - Application to prediction
   - Discussions
9. Summary and conclusions
Setup and Motivation

Objective
- observe an i.i.d. sample $((X_i, Y_i);\ i = 1, \dots, n)$ of $(X, Y)$;
- predict the output $Y$ for an input $X$ at location $x$, with minimal assumptions on the law of $(X, Y)$ (nonparametric setup).

Notation
- $(X, Y)$ → joint c.d.f. $F_{X,Y}$, joint density $f_{X,Y}$;
- $X$ → c.d.f. $F$, density $f$;
- $Y$ → c.d.f. $G$, density $g$.
Why estimating the conditional density?

What is a good prediction?
1. Classical approach ($L^2$ theory): the conditional mean or regression function $r(x) = E(Y \mid X = x)$;
2. Fully informative approach: the conditional density $f(y \mid x)$.
Estimating the conditional density - 1

A first density-based approach:
$$f(y \mid x) = \frac{f_{X,Y}(x, y)}{f(x)} \leftarrow \frac{\hat f_{X,Y}(x, y)}{\hat f(x)}$$
where $\hat f_{X,Y}$ and $\hat f$ are Parzen-Rosenblatt kernel estimators with kernels $K$, $K'$ and bandwidths $h$, $h'$.

The double kernel estimator
$$\hat f(y \mid x) = \frac{\sum_{i=1}^n K'_{h'}(X_i - x)\, K_h(Y_i - y)}{\sum_{i=1}^n K'_{h'}(X_i - x)} \quad \to \text{ratio shaped}$$
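A compact sketch of the double kernel estimator (my illustration; the Gaussian kernels and the fixed bandwidths are assumptions, and no trimming is applied):

```python
import numpy as np

def gauss(u):
    """Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def double_kernel_cond_density(x, y, X, Y, h_prime, h):
    """Ratio-shaped estimate of f(y|x): weighted average of K_h(Y_i - y)."""
    wx = gauss((X - x) / h_prime) / h_prime  # K'_{h'}(X_i - x)
    wy = gauss((Y - y) / h) / h              # K_h(Y_i - y)
    denom = wx.sum()
    if denom < 1e-12:  # the small-denominator instability discussed below
        return 0.0
    return float((wx * wy).sum() / denom)

# Toy usage on a sample with Y = X + noise.
rng = np.random.default_rng(2)
X = rng.normal(size=500)
Y = X + 0.5 * rng.normal(size=500)
print(double_kernel_cond_density(0.0, 0.0, X, Y, h_prime=0.3, h=0.3))
```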
Estimating the conditional density - 2

A regression strategy
Fact: $E\big[\mathbf 1_{|Y - y| \le h} \mid X = x\big] = F(y + h \mid x) - F(y - h \mid x) \approx 2h \cdot f(y \mid x)$
→ the conditional density estimation problem becomes a regression framework.
1. Transform the data:
   - $Y_i \to Y'_i := (2h)^{-1}\, \mathbf 1_{|Y_i - y| \le h}$, or
   - $Y_i \to Y'_i := K_h(Y_i - y)$ (smoothed version).
2. Perform a nonparametric regression of the $Y'_i$ on the $X_i$ by local averaging methods (Nadaraya-Watson, local polynomial, orthogonal series, ...).

Nadaraya-Watson estimator
$$\hat f(y \mid x) = \frac{\sum_{i=1}^n K'_{h'}(X_i - x)\, K_h(Y_i - y)}{\sum_{i=1}^n K'_{h'}(X_i - x)} \quad \to \text{(same) ratio shape.}$$
Ratio shaped estimators

Bibliography
1. Double kernel estimator: Rosenblatt [1969], Roussas [1969], Stute [1986], Hyndman, Bashtannyk and Grunwald [1996];
2. Local polynomial: Fan, Yao and Tong [1996], Fan and Yao [2005];
3. Local parametric and constrained local polynomial: Hyndman and Yao [2002], Rojas, Genovese and Wasserman [2009];
4. Partitioning type estimate: Györfi and Kohler [2007];
5. Projection type estimate: Lacour [2007].
The trouble with ratio shaped estimators

Drawbacks
- the quotient shape of the estimator is tricky to study;
- explosive behavior when the denominator is small → delicate numerical implementation (trimming);
- requires a minorization hypothesis on the marginal density: $f(x) \ge c > 0$.

How to remedy these problems?
→ Build on the idea of using synthetic data: find a representation of the data better adapted to the problem.
The quantile transform

What is the "best" transformation of the data in this context?

The quantile transform theorem
- when $F$ is arbitrary, if $U$ is a uniformly distributed random variable on $(0, 1)$, then $X \stackrel{d}{=} F^{-1}(U)$;
- whenever $F$ is continuous, the random variable $U = F(X)$ is uniformly distributed on $(0, 1)$.

→ Use the invariance property of the quantile transform to construct a pseudo-sample $(U_i, V_i)$ with a prescribed uniform marginal distribution:
$$(X_1, \dots, X_n) \to (U_1 = F(X_1), \dots, U_n = F(X_n)), \qquad (Y_1, \dots, Y_n) \to (V_1 = G(Y_1), \dots, V_n = G(Y_n)).$$
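In practice the transform is applied with the empirical c.d.f.s, i.e. with ranks. A minimal sketch of this rank transform (my illustration rather than the thesis code):

```python
import numpy as np

def pseudo_uniforms(X, Y):
    """Empirical quantile transform: rank/n approximates F(X_i), G(Y_i)."""
    n = len(X)
    U = (np.argsort(np.argsort(X)) + 1) / n  # F_n(X_i)
    V = (np.argsort(np.argsort(Y)) + 1) / n  # G_n(Y_i)
    return U, V

# For a continuous sample (no ties), the pseudo-observations are exactly the
# grid {1/n, 2/n, ..., 1}: "as uniform as possible".
rng = np.random.default_rng(5)
U, V = pseudo_uniforms(rng.normal(size=1000), rng.exponential(size=1000))
print(np.allclose(np.sort(U), np.arange(1, 1001) / 1000))  # True
```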
The copula representation

→ leads naturally to the copula function.

Sklar's theorem [1959]
For any bivariate cumulative distribution function $F_{X,Y}$ on $\mathbb R^2$, with marginal c.d.f. $F$ of $X$ and $G$ of $Y$, there exists a function $C: [0,1]^2 \to [0,1]$, called the dependence or copula function, such that
$$F_{X,Y}(x, y) = C(F(x), G(y)), \quad -\infty \le x, y \le +\infty.$$
If $F$ and $G$ are continuous, this representation is unique with respect to $(F, G)$. The copula function $C$ is itself a c.d.f. on $[0,1]^2$ with uniform marginals.

→ captures the dependence structure of the vector $(X, Y)$, irrespective of the marginals;
→ allows one to deal with the randomness of the dependence structure and the randomness of the marginals separately.
A product shaped estimator

Assume that the copula function $C(u, v)$ has a density $c(u, v) = \frac{\partial^2 C(u, v)}{\partial u\, \partial v}$, i.e. $c(u, v)$ is the density of the transformed r.v. $(U, V) = (F(X), G(Y))$.

A product form of the conditional density
By differentiating Sklar's formula, $f_{X,Y}(x, y) = c(F(x), G(y))\, f(x)\, g(y)$, so that
$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f(x)} = g(y)\, c(F(x), G(y)).$$

A product shaped estimator
$$\hat f_{Y|X}(y \mid x) = \hat g_n(y)\, \hat c_n(F_n(x), G_n(y)).$$
Construction of the estimator - 1

→ Get an estimator of the conditional density by plugging in estimators of each quantity.

- Density of $Y$: $g \leftarrow$ kernel estimator $\hat g_n(y) := \frac{1}{nh_n} \sum_{i=1}^n K_0\left(\frac{y - Y_i}{h_n}\right)$;
- c.d.f.s: $F(x) \leftarrow$ empirical c.d.f. $F_n(x) := \frac{1}{n} \sum_{j=1}^n \mathbf 1_{X_j \le x}$, and $G(y) \leftarrow G_n(y) := \frac{1}{n} \sum_{j=1}^n \mathbf 1_{Y_j \le y}$;
- Copula density: $c(u, v) \leftarrow c_n(u, v)$, a bivariate Parzen-Rosenblatt kernel density (pseudo-)estimator
$$c_n(u, v) := \frac{1}{na_n^2} \sum_{i=1}^n K\left(\frac{u - U_i}{a_n}, \frac{v - V_i}{a_n}\right) \quad (1)$$
with kernel $K(u, v) = K_1(u) K_2(v)$ and bandwidth $a_n$.
Construction of the estimator - 2

But $F$ and $G$ are unknown: the random variables $(U_i = F(X_i), V_i = G(Y_i))$ are not observable ⇒ $c_n$ is not a true statistic.
→ Approximate the pseudo-sample $(U_i, V_i),\ i = 1, \dots, n$ by its empirical counterpart $(F_n(X_i), G_n(Y_i)),\ i = 1, \dots, n$.

A genuine estimator of $c(u, v)$
$$\hat c_n(u, v) := \frac{1}{na_n^2} \sum_{i=1}^n K_1\left(\frac{u - F_n(X_i)}{a_n}\right) K_2\left(\frac{v - G_n(Y_i)}{a_n}\right).$$
The quantile-copula estimator

Recollecting all the elements, we get:

The quantile-copula estimator
$$\hat f_n(y \mid x) := \hat g_n(y)\, \hat c_n(F_n(x), G_n(y)),$$
that is to say,
$$\hat f_n(y \mid x) := \left[\frac{1}{nh_n} \sum_{i=1}^n K_0\left(\frac{y - Y_i}{h_n}\right)\right] \left[\frac{1}{na_n^2} \sum_{i=1}^n K_1\left(\frac{F_n(x) - F_n(X_i)}{a_n}\right) K_2\left(\frac{G_n(y) - G_n(Y_i)}{a_n}\right)\right].$$
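Putting the pieces together, here is a self-contained sketch of the estimator (my illustration: Gaussian kernels for $K_0$, $K_1$, $K_2$, no boundary correction on $[0,1]^2$, and the rate-motivated bandwidths $h_n \simeq n^{-1/5}$, $a_n \simeq n^{-1/6}$ are all assumptions):

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def ecdf(sample, t):
    """Empirical c.d.f. of `sample` at point t."""
    return np.mean(sample <= t)

def quantile_copula_density(x, y, X, Y, h_n, a_n):
    """Product-shaped estimate f_hat(y|x) = g_hat(y) * c_hat(F_n(x), G_n(y))."""
    # Marginal density estimate g_hat_n(y).
    g_hat = np.mean(gauss((y - Y) / h_n)) / h_n
    # Pseudo-observations F_n(X_i), G_n(Y_i) and the evaluation point (u, v).
    n = len(X)
    U = (np.argsort(np.argsort(X)) + 1) / n
    V = (np.argsort(np.argsort(Y)) + 1) / n
    u, v = ecdf(X, x), ecdf(Y, y)
    # Product-kernel copula density estimate c_hat_n(u, v).
    c_hat = np.mean(gauss((u - U) / a_n) * gauss((v - V) / a_n)) / a_n**2
    return float(g_hat * c_hat)

# Toy usage.
rng = np.random.default_rng(3)
n = 400
X = rng.normal(size=n)
Y = X + 0.5 * rng.normal(size=n)
print(quantile_copula_density(0.0, 0.0, X, Y, h_n=n ** (-1 / 5), a_n=n ** (-1 / 6)))
```

Note that no division by an estimated density appears anywhere, which is precisely what the product shape buys.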
Hypotheses

Assumptions on the densities
(i) the c.d.f. $F$ of $X$ and $G$ of $Y$ are strictly increasing and differentiable;
(ii) the densities $g$ and $c$ are twice differentiable with continuous bounded second derivatives on their support.

Assumptions on the kernels
(i) $K$ and $K_0$ have bounded support and bounded variation;
(ii) $0 \le K \le C$ and $0 \le K_0 \le C$ for some constant $C$;
(iii) $K$ and $K_0$ are second order kernels: $m_0(K) = 1$, $m_1(K) = 0$ and $m_2(K) < +\infty$, and the same for $K_0$;
(iv) $K$ is twice differentiable with bounded second partial derivatives.

→ classical regularity assumptions in the nonparametric literature.
Asymptotic results - 1

Under the above regularity assumptions, with $h_n \to 0$, $a_n \to 0$:

Pointwise consistency
- Weak consistency: $h_n \simeq n^{-1/5}$, $a_n \simeq n^{-1/6}$ entail
$$\hat f_n(y \mid x) = f(y \mid x) + O_P\big(n^{-1/3}\big).$$
- Strong consistency: $h_n \simeq (\ln\ln n / n)^{1/5}$ and $a_n \simeq (\ln\ln n / n)^{1/6}$ entail
$$\hat f_n(y \mid x) = f(y \mid x) + O_{a.s.}\left(\left(\frac{\ln\ln n}{n}\right)^{1/3}\right).$$
- Asymptotic normality: $nh_n \to \infty$, $na_n^4 \to \infty$, $na_n^6 \to 0$, and $\sqrt{\ln\ln n}\,/\,(na_n^3) \to 0$ entail
$$\sqrt{na_n^2}\,\big(\hat f_n(y \mid x) - f(y \mid x)\big) \xrightarrow{d} N\big(0,\ g(y)\, f(y \mid x)\, \|K\|_2^2\big).$$
Asymptotic results - 2

Uniform consistency
Under the above regularity assumptions, with $h_n \to 0$, $a_n \to 0$, for $x$ in the interior of the support of $f$ and $[a, b]$ included in the interior of the support of $g$:
- Weak consistency: $h_n \simeq (\ln n / n)^{1/5}$, $a_n \simeq (\ln n / n)^{1/6}$ entail
$$\sup_{y \in [a,b]} |\hat f_n(y \mid x) - f(y \mid x)| = O_P\big((\ln n / n)^{1/3}\big).$$
- Strong consistency: with the same bandwidth choices,
$$\sup_{y \in [a,b]} |\hat f_n(y \mid x) - f(y \mid x)| = O_{a.s.}\left(\left(\frac{\ln n}{n}\right)^{1/3}\right).$$
Asymptotic mean square error

Asymptotic bias and variance for the quantile-copula estimator
- Bias:
$$E\big(\hat f_n(y \mid x)\big) - f(y \mid x) = g(y)\, m_2(K) \cdot \nabla^2 c(F(x), G(y))\, \frac{a_n^2}{2} + o(a_n^2)$$
with $m_2(K) = (m_2(K_1), m_2(K_2))$ and $\nabla^2 c(u, v) = \left(\frac{\partial^2 c(u,v)}{\partial u^2}, \frac{\partial^2 c(u,v)}{\partial v^2}\right)$.
- Variance:
$$\operatorname{Var}\big(\hat f_n(y \mid x)\big) = \frac{1}{na_n^2}\, g(y)\, f(y \mid x)\, \|K\|_2^2 + o\big(1/(na_n^2)\big).$$
Sketch of the proofs

Decomposition diagram:
$$\hat g(y)\, \hat c_n(F_n(x), G_n(y))$$
$$\downarrow$$
$$g(y)\, \hat c_n(F_n(x), G_n(y)) \to g(y)\, \hat c_n(F(x), G(y)) \to g(y)\, c_n(F(x), G(y))$$
$$\downarrow$$
$$g(y)\, c(F(x), G(y))$$

↓ : consistency results for the kernel density estimators;
→ : two approximation lemmas:
1. moving $\hat c_n$ from $(F_n(x), G_n(y))$ to $(F(x), G(y))$;
2. $\hat c_n \to c_n$.

Tools: results for the Kolmogorov-Smirnov statistics $\|F - F_n\|_\infty$ and $\|G - G_n\|_\infty$.
→ Heuristic: rate of convergence of the density estimators < rate of approximation of the K-S statistic.
Theoretical asymptotic comparison - 1

Competitor: e.g. the local polynomial estimator $\hat f^{(LP)}(y \mid x) := \hat\theta_0$, with
$$R(\theta, x, y) := \sum_{i=1}^n K'_{h_1}(X_i - x) \left[K_{h_2}(Y_i - y) - \sum_{j=0}^r \theta_j (X_i - x)^j\right]^2,$$
where $\hat\theta_{xy} := (\hat\theta_0, \hat\theta_1, \dots, \hat\theta_r)$ is the value of $\theta$ which minimizes $R(\theta, x, y)$. (A least-squares sketch follows below.)

Comparative bias
$$B_{LP} = \frac{h_1^2}{2}\, m_2(K')\, \frac{\partial^2 f(y \mid x)}{\partial x^2} + \frac{h_2^2}{2}\, m_2(K)\, \frac{\partial^2 f(y \mid x)}{\partial y^2} + o(h_1^2 + h_2^2)$$
$$B_{QC} = g(y)\, m_2(K) \cdot \nabla^2 c(F(x), G(y))\, \frac{a_n^2}{2} + o(a_n^2)$$
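For concreteness, a short sketch of this competitor (my illustration; Gaussian kernels and degree $r = 1$ are assumptions): $\hat\theta$ solves a weighted least-squares problem, and $\hat\theta_0$ is the density estimate.

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def local_poly_cond_density(x, y, X, Y, h1, h2, r=1):
    """Minimize R(theta, x, y) by weighted least squares; return theta_0."""
    w = gauss((X - x) / h1) / h1   # weights K'_{h1}(X_i - x)
    z = gauss((Y - y) / h2) / h2   # responses K_{h2}(Y_i - y)
    B = np.vander(X - x, N=r + 1, increasing=True)  # columns (X_i - x)^j
    A = B.T @ (B * w[:, None])     # B^T W B
    b = B.T @ (w * z)              # B^T W z
    theta = np.linalg.solve(A, b)
    return float(theta[0])

rng = np.random.default_rng(7)
X = rng.normal(size=500)
Y = X + 0.5 * rng.normal(size=500)
print(local_poly_cond_density(0.0, 0.0, X, Y, h1=0.3, h2=0.3))
```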
Theoretical asymptotic comparison - 2

Asymptotic bias comparison
- All estimators have a bias of the same order $\approx h^2 \approx n^{-1/3}$;
- Distribution-dependent terms: difficult to compare; sometimes fewer unknown terms for the quantile-copula estimator;
- $c$ has compact support: the "classical" kernel method to estimate the copula density induces a bias on the boundaries of $[0,1]^2$ → techniques to reduce the bias of the kernel estimator on the edges (boundary kernels, beta kernels, reflection and transformation methods, ...).
Theoretical asymptotic comparison - 3

Asymptotic variance comparison
Main terms in the asymptotic variance:
- Ratio shaped estimators: $\operatorname{Var}^{(LP)} := \frac{f(y \mid x)}{f(x)}$ → explosive variance for small values of the density $f(x)$, e.g. in the tail of the distribution of $X$;
- Quantile-copula estimator: $\operatorname{Var}^{(QC)} := g(y)\, f(y \mid x)$ → does not suffer from the unstable nature of its competitors;
- Asymptotic relative efficiency, as a ratio of variances:
$$\frac{\operatorname{Var}^{(QC)}}{\operatorname{Var}^{(LP)}} = f(x)\, g(y)$$
→ the quantile-copula estimator has a lower asymptotic variance over a large range of $(x, y)$ values.
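As a numerical illustration not taken from the slides, the sketch below evaluates the relative-efficiency ratio $f(x)\,g(y)$ on a grid for standard $N(0,1)$ marginals (the marginals used in the simulation section); every value below 1 marks a point where the quantile-copula estimator has the smaller asymptotic variance. All names here are ours, not from the thesis.

```python
import numpy as np
from scipy.stats import norm

# Relative efficiency Var(QC)/Var(LP) = f(x) * g(y), here assuming
# N(0,1) marginals for both X and Y (an illustrative choice).
x = np.linspace(-4, 4, 81)
y = np.linspace(-4, 4, 81)
are = np.outer(norm.pdf(x), norm.pdf(y))  # are[i, j] = f(x_i) * g(y_j)

# For standard normal densities, f(x) * g(y) <= 1/(2*pi) ~= 0.159 < 1
# everywhere, so the QC estimator wins at every (x, y) in this example.
print(are.max())         # ~0.159, attained at x = y = 0
print((are < 1).mean())  # 1.0: QC has lower variance on the whole grid
```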
Finite sample simulation
Model
Sample of $n = 100$ i.i.d. pairs $(X_i, Y_i)$ from the following model:
- $X$ and $Y$ are each marginally distributed as $N(0, 1)$;
- $X$ and $Y$ are linked via the Frank copula
  \[
  C(u, v; \theta) = \frac{\ln\!\left(\dfrac{\theta + \theta^{u+v} - \theta^{u} - \theta^{v}}{\theta - 1}\right)}{\ln \theta},
  \]
  with parameter $\theta = 100$.

Practical implementation:
- Beta kernels for the copula estimator, Epanechnikov kernels for the other estimators;
- a simple rule-of-thumb method for the bandwidths.

A simulation sketch for this model is given below.
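A minimal simulation sketch for this model, assuming the standard conditional-inversion sampler for the Frank copula derived from the $\theta$-parameterisation above; the thesis does not give its simulation code, so the function name and implementation details are ours.

```python
import numpy as np
from scipy.stats import norm

def sample_frank_model(n=100, theta=100.0, rng=None):
    """Draw n i.i.d. pairs (X, Y) with N(0,1) marginals linked by the
    Frank copula C(u,v;theta) = ln[(theta + theta^(u+v) - theta^u
    - theta^v)/(theta - 1)] / ln(theta), via conditional inversion."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(size=n)   # U ~ Uniform(0, 1)
    w = rng.uniform(size=n)   # W = C(v | u), inverted in v below
    a = theta ** u
    # Solving dC/du = a*(theta^v - 1) / ((theta - 1) + (a - 1)*(theta^v - 1)) = w
    # for theta^v gives the conditional quantile:
    b = 1.0 + w * (theta - 1.0) / (a * (1.0 - w) + w)
    v = np.log(b) / np.log(theta)
    # Push the uniforms through the standard normal quantile function.
    return norm.ppf(u), norm.ppf(v)

x, y = sample_frank_model(n=100, theta=100.0, rng=0)
```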
Outline
4. Introduction
   - Why estimate the conditional density?
   - Two classical approaches for estimation
   - The trouble with ratio-shaped estimators
5. The Quantile-Copula estimator
   - The quantile transform
   - The copula representation
   - A product-shaped estimator
6. Asymptotic results
   - Consistency and asymptotic normality
   - Sketch of the proofs
7. Comparison with competitors
   - Theoretical comparison
   - Finite sample simulation
8. Application to prediction and discussions
   - Application to prediction
   - Discussions
9. Summary and conclusions
Application to prediction - definitions
Point predictors: conditional mode predictor
- Definition of the mode: $\theta(x) := \arg\sup_y f(y|x)$
- → plug-in predictor: $\hat\theta(x) := \arg\sup_y \hat f_n(y|x)$

Set predictors: level sets
- Predictive set $C_\alpha(x)$ such that $P(Y \in C_\alpha(x) \mid X = x) = \alpha$
- → level set, or highest density region: $C_\alpha(x) := \{ y : f(y|x) \geq f_\alpha \}$, with $f_\alpha$ the largest value such that the prediction set has coverage probability $\alpha$
- → plug-in level set: $C_{\alpha,n}(x) := \{ y : \hat f_n(y|x) \geq \hat f_\alpha \}$, where $\hat f_\alpha$ is an estimate of $f_\alpha$ (a plug-in sketch is given after this list).
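A minimal plug-in sketch, assuming the estimated conditional density $\hat f_n(\cdot \mid x)$ has already been evaluated on an equispaced grid of $y$ values (the quantile-copula estimator itself is not reproduced here, and all names are ours):

```python
import numpy as np

def mode_and_level_set(y_grid, f_hat, alpha=0.90):
    """Plug-in conditional mode and highest-density-region level set,
    given values f_hat[i] ~ f_n(y_grid[i] | x) on an equispaced grid."""
    dy = y_grid[1] - y_grid[0]
    # Point predictor: conditional mode = argmax of the estimated density.
    mode = y_grid[np.argmax(f_hat)]
    # Level set: find the largest threshold f_alpha whose superlevel set
    # {y : f_hat(y) >= f_alpha} carries probability mass >= alpha.
    order = np.argsort(f_hat)[::-1]      # grid points, highest density first
    mass = np.cumsum(f_hat[order]) * dy  # accumulated coverage probability
    k = np.searchsorted(mass, alpha)     # first index reaching coverage alpha
    f_alpha = f_hat[order[min(k, len(order) - 1)]]
    level_set = y_grid[f_hat >= f_alpha]
    return mode, f_alpha, level_set

# Toy usage with a hand-made density estimate (illustration only):
y = np.linspace(-4, 4, 801)
f = np.exp(-0.5 * (y - 1.0) ** 2) / np.sqrt(2 * np.pi)
mode, f_alpha, hdr = mode_and_level_set(y, f, alpha=0.90)
```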