Conditional quantiles with functional covariates: an application to Ozone pollution forecasting
Hervé Cardot, Christophe Crambes & Pascal Sarda
Compstat - Prague, August 2004
Compstat 2004 - Prague – p.1/14

Presentation of the data (1)
Data (ORAMIP):
• 9 variables: NO, N2, O3, WD, WS, ... (hourly measurements)
• 6 stations
• 4 years: 1997-2000 (15th May - 15th Sept)

Presentation of the data (2)
[Figure: Ozone (vertical axis, 0 to 120) plotted against time in hours (horizontal axis, 0 to 70)]

Presentation of the data (3)
• variable of interest: daily maximum of O3: $Y = {}^t(Y_1, \dots, Y_n)$
• covariates: NO, N2, O3, WD or WS:

                      18h  ...  24h    1h  ...  17h
  day 0 / day 1       X_{1,1}   ...    ...   X_{1,24}
  ...                 ...                    ...
  day n-1 / day n     X_{n,1}   ...    ...   X_{n,24}

• $(X_i, Y_i)_{i=1,\dots,n}$: pairs of random variables with $Y_i \in \mathbb{R}$ and $X_i \in L^2(I)$
• $X_i$ is observed at $t_1, \dots, t_p \in I$ (equispaced)

Definition of the conditional quantiles
• $\alpha \in \,]0,1[$, $x \in L^2(I)$
• $\alpha$-conditional quantile: $P(Y \le g_\alpha(X) \mid X = x) = \alpha$
• property: $g_\alpha(x) = \arg\min_{a \in \mathbb{R}} E(l_\alpha(Y - a) \mid X = x)$ with $l_\alpha(u) = |u| + (2\alpha - 1)u$
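
The minimisation property can be checked numerically in the unconditional case. The sketch below is only an illustration, not part of the talk: it uses a simulated Gaussian sample, and the function names `check_loss` and `empirical_quantile_by_minimisation` are hypothetical. It searches the data points for the value a that minimises the empirical mean of l_alpha(Y - a), which should sit close to the usual sample quantile.

```python
import numpy as np

def check_loss(u, alpha):
    """l_alpha(u) = |u| + (2*alpha - 1)*u, the loss given on the slide."""
    u = np.asarray(u, dtype=float)
    return np.abs(u) + (2.0 * alpha - 1.0) * u

def empirical_quantile_by_minimisation(y, alpha):
    """Return the data point a minimising the empirical mean of l_alpha(y - a)."""
    y = np.asarray(y, dtype=float)
    # The objective is convex and piecewise linear in a, with kinks at the
    # data points, so a minimiser can always be found among the data points.
    losses = [check_loss(y - a, alpha).mean() for a in y]
    return y[int(np.argmin(losses))]

# Simulated sample (an assumption made for the illustration only).
rng = np.random.default_rng(0)
sample = rng.normal(size=501)
a_hat = empirical_quantile_by_minimisation(sample, 0.9)
print(a_hat, np.quantile(sample, 0.9))
```

For alpha = 0.5 the loss reduces to |u|, and the minimiser is the sample median.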

Presentation of the model
• model (cf. Koenker and Bassett, 1978):
$$g_\alpha(X) = c + \langle \Psi_\alpha, X \rangle = c + \int_I \Psi_\alpha(t)\, X(t)\, dt$$
• we want to estimate the function $\Psi_\alpha \in L^2(I)$: spline estimation
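
Since each curve is only observed on an equispaced grid, the integral in the model has to be approximated in practice. The following sketch assumes a trapezoidal rule on I = [0, 1] with 24 grid points. The coefficient function `psi`, the curve `x`, and the intercept `c` are made-up values for illustration, and the quadrature rule is an assumption rather than the authors' choice.

```python
import numpy as np

def linear_functional(psi_vals, x_vals, t):
    """Trapezoidal approximation of the integral of psi(t) * x(t) over the grid t."""
    f = np.asarray(psi_vals, dtype=float) * np.asarray(x_vals, dtype=float)
    dt = np.diff(np.asarray(t, dtype=float))
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * dt))

# Hypothetical example: 24 equispaced points, mimicking hourly measurements.
t = np.linspace(0.0, 1.0, 24)
psi = np.sin(2.0 * np.pi * t)   # made-up coefficient function
x = np.sin(2.0 * np.pi * t)     # made-up covariate curve
c = 1.0                         # made-up intercept

g = c + linear_functional(psi, x, t)   # approximates c + <psi, x>
print(g)
```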

Spline estimation of $\Psi_\alpha$
$k \in \mathbb{N}^\star$, $q \in \mathbb{N}$
[Diagram: the interval I divided into k sub-intervals $I_1, \dots, I_j, \dots, I_k$]
B-spline basis: $\mathbf{B}_{k,q} = {}^t(B_1, \dots, B_{k+q})$
• estimator: $\widehat{\Psi}_\alpha = {}^t\mathbf{B}_{k,q}\, \widehat{\theta} = \sum_{j=1}^{k+q} \widehat{\theta}_j B_j$
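
To make the count of k + q basis functions concrete, here is a minimal sketch that evaluates a B-spline basis of degree q on k equal sub-intervals with the Cox-de Boor recursion. The clamped knot vector (boundary knots repeated q times) and the function name `bspline_basis` are assumptions, since the slides do not specify the boundary treatment.

```python
import numpy as np

def bspline_basis(x, k, q, a=0.0, b=1.0):
    """Evaluate the k + q B-splines of degree q on k equal sub-intervals of [a, b].

    Returns an array of shape (len(x), k + q). Clamped knots are an assumption.
    """
    x = np.asarray(x, dtype=float)
    breaks = np.linspace(a, b, k + 1)
    t = np.concatenate([np.full(q, a), breaks, np.full(q, b)])
    n0 = len(t) - 1
    # Degree 0: indicator functions of the knot intervals [t_j, t_{j+1}).
    B = np.zeros((x.size, n0))
    for j in range(n0):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    B[x == b, q + k - 1] = 1.0   # close the last sub-interval on the right
    # Cox-de Boor recursion, raising the degree one step at a time.
    for d in range(1, q + 1):
        B_next = np.zeros((x.size, n0 - d))
        for j in range(n0 - d):
            left = t[j + d] - t[j]
            right = t[j + d + 1] - t[j + 1]
            if left > 0:
                B_next[:, j] += (x - t[j]) / left * B[:, j]
            if right > 0:
                B_next[:, j] += (t[j + d + 1] - x) / right * B[:, j + 1]
        B = B_next
    return B

grid = np.linspace(0.0, 1.0, 24)
basis = bspline_basis(grid, k=8, q=3)
theta = np.ones(basis.shape[1])   # hypothetical coefficient vector
psi_hat = basis @ theta           # values of sum_j theta_j B_j on the grid
print(basis.shape)
```

With all coefficients equal to one, `psi_hat` is constant, because B-splines on a clamped knot vector sum to one at every point of the interval.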

Determination of $\widehat{c}$ and $\widehat{\theta}$
• $\widehat{\theta}$ and $\widehat{c}$ are the solution of the minimisation problem:
$$\min_{\theta \in \mathbb{R}^{k+q}} \left\{ \frac{1}{n} \sum_{i=1}^{n} l_\alpha\!\left( Y_i - c - \langle {}^t\mathbf{B}_{k,q}\theta, X_i \rangle \right) + \rho \left\| \left( {}^t\mathbf{B}_{k,q}\theta \right)^{(m)} \right\|^2 \right\}$$
  first term: empirical version of $E(l_\alpha(Y - c - \langle s, X \rangle))$
  second term: penalization
• no explicit solution
• algorithm: Iterative Reweighted Least Squares
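
One common way to set up an iteratively reweighted least squares scheme for this kind of criterion is sketched below. It is not necessarily the authors' exact algorithm. At each step |u| is replaced by a quadratic upper bound built from the current residuals, so that the update has a closed form. The design matrix `Z` (one row per observation, which would hold an intercept column and the inner products of X_i with the basis functions), the penalty matrix `P`, and the smoothing constant `eps` are all assumptions of this illustration.

```python
import numpy as np

def irls_quantile(Z, y, alpha, P=None, rho=0.0, n_iter=300, eps=1e-6):
    """Approximately minimise (1/n) * sum_i l_alpha(y_i - Z_i beta) + rho * beta' P beta,

    where l_alpha(u) = |u| + (2*alpha - 1)*u and P is a symmetric penalty matrix.
    Each iteration bounds |u| by u**2 / (2*|u_old|) + |u_old| / 2, which turns
    the problem into a weighted least-squares problem.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = Z.shape
    P = np.zeros((d, d)) if P is None else np.asarray(P, dtype=float)
    shift = (2.0 * alpha - 1.0) * Z.sum(axis=0)     # from the linear part of l_alpha
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]     # ordinary least-squares start
    for _ in range(n_iter):
        r = y - Z @ beta
        v = 1.0 / np.sqrt(r**2 + eps**2)            # smoothed version of 1 / |r_i|
        A = Z.T @ (v[:, None] * Z) + 2.0 * n * rho * P
        b = Z.T @ (v * y) + shift
        beta = np.linalg.solve(A, b)
    return beta

# Sanity check on simulated data: with an intercept only and no penalty,
# the fitted constant should be close to the sample alpha-quantile.
rng = np.random.default_rng(1)
y_sim = rng.normal(size=500)
Z_sim = np.ones((500, 1))
q90 = irls_quantile(Z_sim, y_sim, alpha=0.9)[0]
q50 = irls_quantile(Z_sim, y_sim, alpha=0.5)[0]
print(q90, q50)
```

The constant `eps` keeps the weights finite when a residual is close to zero. Smaller values follow the original loss more closely but can slow the iterations down.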

Multiple conditional quantiles
• v covariates $X^1, \dots, X^v$
• model:
$$g_\alpha(X^1, \dots, X^v) = c + \int_I \Psi^1_\alpha(t)\, X^1(t)\, dt + \dots + \int_I \Psi^v_\alpha(t)\, X^v(t)\, dt$$
• algorithm: backfitting + Iterative Reweighted Least Squares
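
The backfitting idea can be sketched generically: cycle through the covariates and refit each one to the partial residuals left by the others. In the sketch below the single-covariate solver is passed in as a function. For brevity the demonstration plugs in ordinary least squares and omits the intercept, whereas the talk combines backfitting with Iterative Reweighted Least Squares for the quantile loss. The names `backfit` and `least_squares_block` are hypothetical.

```python
import numpy as np

def backfit(blocks, y, fit_block, n_sweeps=50):
    """Backfitting over a list of (n, d_j) design matrices, one per covariate.

    fit_block(Z, r) must return the coefficients fitted on design Z for the
    partial residual r.
    """
    y = np.asarray(y, dtype=float)
    coefs = [np.zeros(Z.shape[1]) for Z in blocks]
    for _ in range(n_sweeps):
        for j, Z in enumerate(blocks):
            # Remove the current fitted contribution of all other covariates.
            others = sum(blocks[m] @ coefs[m] for m in range(len(blocks)) if m != j)
            coefs[j] = fit_block(Z, y - others)
    return coefs

def least_squares_block(Z, r):
    """Stand-in solver for the demonstration (not the quantile solver)."""
    return np.linalg.lstsq(Z, r, rcond=None)[0]

# Simulated noiseless example with two covariates.
rng = np.random.default_rng(2)
Z1 = rng.normal(size=(200, 2))
Z2 = rng.normal(size=(200, 2))
true1 = np.array([1.0, -2.0])
true2 = np.array([0.5, 3.0])
y_sim = Z1 @ true1 + Z2 @ true2
est1, est2 = backfit([Z1, Z2], y_sim, least_squares_block)
print(est1, est2)
```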

Application to the pollution data
• learning sample: $(X_{l_i}, Y_{l_i})_{i=1,\dots,n_{learn}}$
• test sample: $(X_{t_i}, Y_{t_i})_{i=1,\dots,n_{test}}$
• number of knots: k = 8 (equispaced)
• degree of the spline functions: q = 3
• order of the derivative in the penalization: m = 2
• choice of $\rho$: Generalized Cross Validation

Quality criteria of the models
$$C_1 = \frac{\frac{1}{n_t} \sum_{i=1}^{n_t} (Y_{t_i} - \widehat{Y}_{t_i})^2}{\frac{1}{n_t} \sum_{i=1}^{n_t} (Y_{t_i} - \overline{Y}_l)^2}$$
$$C_2 = \frac{1}{n_t} \sum_{i=1}^{n_t} |Y_{t_i} - \widehat{Y}_{t_i}|$$
$$C_3 = \frac{\frac{1}{n_t} \sum_{i=1}^{n_t} l_\alpha(Y_{t_i} - \widehat{Y}_{t_i})}{\frac{1}{n_t} \sum_{i=1}^{n_t} l_\alpha(Y_{t_i} - q_\alpha(Y_l))}$$
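
The three criteria are straightforward to compute once test-sample predictions are available. The sketch below reads the reference terms as the empirical mean and the empirical alpha-quantile of the learning-sample responses, which is an interpretation of the notation rather than something the slide states. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def check_loss(u, alpha):
    """l_alpha(u) = |u| + (2*alpha - 1)*u."""
    u = np.asarray(u, dtype=float)
    return np.abs(u) + (2.0 * alpha - 1.0) * u

def quality_criteria(y_test, y_pred, y_learn, alpha):
    """Return (C1, C2, C3) for test responses, test predictions and learning responses."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_learn = np.asarray(y_learn, dtype=float)
    err = y_test - y_pred
    # C1: squared error relative to predicting the learning-sample mean.
    c1 = np.mean(err**2) / np.mean((y_test - y_learn.mean())**2)
    # C2: mean absolute error.
    c2 = np.mean(np.abs(err))
    # C3: quantile loss relative to predicting the learning-sample alpha-quantile.
    ref = y_test - np.quantile(y_learn, alpha)
    c3 = np.mean(check_loss(err, alpha)) / np.mean(check_loss(ref, alpha))
    return c1, c2, c3

# Toy numbers, for illustration only.
c1, c2, c3 = quality_criteria(
    y_test=[1.0, 2.0, 3.0], y_pred=[1.0, 2.0, 4.0], y_learn=[0.0, 2.0, 4.0], alpha=0.5
)
print(c1, c2, c3)
```

Values of C1 and C3 below 1 indicate that the model does better on the test sample than the corresponding constant predictor.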

Results (conditional median)

Models         Variables             C1      C2       C3
1 covariate    N2                    0.814   16.916   0.906
               O3                    0.414   12.246   0.656
               WS                    0.802   16.836   0.902
2 covariates   O3, NO                0.413   11.997   0.643
               O3, N2                0.413   11.880   0.637
               O3, WS                0.414   12.004   0.635
3 covariates   O3, NO, N2            0.412   12.127   0.644
               O3, N2, WD            0.409   12.004   0.645
               O3, N2, WS            0.410   11.997   0.642
4 covariates   O3, NO, N2, WS        0.400   11.718   0.634
5 covariates   O3, NO, N2, WD, WS    0.401   11.750   0.639

Forecasting (conditional median)