  1. On the Theory and Practice of Variable Selection for Functional Data
  José Luis Torrecilla, under the supervision of José Ramón Berrendero and Antonio Cuevas
  Departamento de Matemáticas, Universidad Autónoma de Madrid
  Thesis defense, Madrid - December 3, 2015
  J.L. Torrecilla (UAM) - Variable selection for functional data - 1 / 76

  2. Outline
  1 Introduction: FDA; Variable Selection; Functional classification
  2 RKHS: The RKHS approach; The absolutely continuous case; The singular case; Variable selection
  3 Variable selection: Variable selection and RKHS; mRMR-RD; Maxima hunting
  4 Experiments
  5 Conclusions and future work


  5. Functional Data Analysis
  What are functional data? Let (Ω, F, P) be a probability space and I ⊆ R an index set. A stochastic process is a collection of random variables {X(ω, t) : ω ∈ Ω, t ∈ I}, where X(·, t) is an F-measurable function on Ω. A functional datum is just a realization (often called a "trajectory") of a stochastic process for all t ∈ [0, T].
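The definition above can be made concrete with a small simulation. The sketch below generates discretized trajectories of standard Brownian motion on [0, 1] as stand-in functional data; the grid size and the choice of Brownian motion are illustrative, not anything prescribed by the slides.

```python
import math
import random

# One "functional datum" = one realization of a stochastic process,
# observed on a discrete grid. Here the process is standard Brownian
# motion: independent Gaussian increments with variance = step size.
def brownian_trajectory(n_grid, seed=None):
    rng = random.Random(seed)
    step = 1.0 / n_grid
    x, path = 0.0, [0.0]
    for _ in range(n_grid):
        x += rng.gauss(0.0, math.sqrt(step))  # increment ~ N(0, step)
        path.append(x)
    return path

traj = brownian_trajectory(100, seed=0)  # values of X(omega, t) on the grid
```

In practice, observed curves always arrive in this discretized form, which is what makes variable selection over the grid points meaningful.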

  6. Difficulties and particularities
  - No obvious order structure (distribution functions), nor closeness or centrality notions (outliers, depth).
  - Representation problems.
  - Function spaces are "difficult to fill": there is no natural translation-invariant measure playing the role of Lebesgue measure in R^n, hence no natural densities.
  - Redundancy: nearby variables are closely related (continuity), which causes failures in linear models.
  - High dimension: the curse of dimensionality, overfitting, computational cost...


  8. Variable selection
  Idea: choose the most informative subset among the original variables.
  Motivation:
  - Variable selection is a successful dimension-reduction technique in other fields.
  - It was an almost unexplored topic in FDA classification.
  - The dimension reduction is made in terms of the original variables (interpretability).
  Goals:
  - Remove useless and redundant variables, improving time and storage performance.
  - Improve classification accuracy by decreasing the overfitting risk.
  - Obtain theoretical and more interpretable models.

  9. What do we mean by "variable selection" in FDA?
  Given a sample of functions X_1(t), ..., X_n(t), t ∈ [0, 1], our aim is to replace every sample function X_j with a vector (X_j(t_1), ..., X_j(t_d)), for suitably chosen points t_1, ..., t_d. Then we apply multivariate methods (regression, classification, ...) to the "reduced" data. In our experience, the value of d is typically small (not much larger than 5, say).
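The reduction step described above is mechanically simple once the points are chosen. The sketch below assumes the functional data are given as callables on [0, 1]; the sample curves and the points t_1, ..., t_d are made-up placeholders, since the whole question of how to choose the points comes later.

```python
import math

# Variable selection in FDA: replace each function x by the vector
# (x(t_1), ..., x(t_d)) at a few chosen points.
def reduce_sample(functions, points):
    """Map each function x to its evaluations at the selected points."""
    return [[x(t) for t in points] for x in functions]

# Toy sample of trajectories (smooth functions standing in for curves).
sample = [lambda t, a=a: math.sin(2 * math.pi * t) + 0.1 * a
          for a in range(3)]
points = [0.2, 0.5, 0.8]  # d = 3 hypothetical selected points
reduced = reduce_sample(sample, points)
# Each row of `reduced` is a d-dimensional vector that any standard
# multivariate classifier or regression method can consume.
```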


  11. Relevance vs. Redundancy
  [Figure: variables selected by mRMR and by MaxRel, plotted over the 850-1050 range; classification errors err = 4.09% and err = 1.86%, in the order shown on the slide.]
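The contrast in the figure (pure relevance vs. relevance minus redundancy) can be sketched with a minimal greedy mRMR-style selector. This is only an illustration of the general idea: absolute correlation stands in for the association measures used in the thesis, and this is not the mRMR-RD criterion proposed later in the talk.

```python
# Greedy selection in the spirit of mRMR: at each step pick the variable
# maximizing (relevance to the label) - (average redundancy with the
# variables already selected). Correlation is an illustrative proxy.
def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    if su == 0 or sv == 0:
        return 0.0
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def mrmr_select(X, y, d):
    """X: list of sample rows, y: labels; returns d column indices."""
    p = len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    relevance = [abs(corr(c, y)) for c in cols]
    selected = [max(range(p), key=lambda j: relevance[j])]
    while len(selected) < d:
        def score(j):
            redundancy = sum(abs(corr(cols[j], cols[k])) for k in selected)
            return relevance[j] - redundancy / len(selected)
        rest = [j for j in range(p) if j not in selected]
        selected.append(max(rest, key=score))
    return selected
```

On data with a duplicated informative column, a pure MaxRel ranking would pick the duplicate second, while the redundancy penalty above steers the second pick toward a less correlated variable.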


  13. Functional classification problem
  [Figure: two panels of sample trajectories x(t), t ∈ [0, 1], one per class.]

  14. Functional classification problem (II)
  Which is the class of this trajectory?
  [Figure: a single unlabeled trajectory x(t), t ∈ [0, 1].]


  17. Statement of the problem
  Independent observations: (X_1, Y_1), ..., (X_n, Y_n), with X ∈ F[0, T] and Y ∈ {0, 1}.
  Optimal classification rule (Bayes rule): g*(X) = I{η(X) > 1/2}, where η(x) = E(Y | X = x).
  Bayes error: L* = P(g*(X) ≠ Y).
  With p = P(Y = 1),
  g*(X) = 1  ⇔  (dP_1/dP_0)(X) > (1 - p)/p.
  See Baíllo et al., Scand. J. Stat. (2011), Theorem 1.
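The likelihood-ratio form of the Bayes rule is easy to exercise in a toy finite-dimensional case. The sketch below uses two one-dimensional Gaussian class densities as illustrative stand-ins for dP_0 and dP_1 (nothing in the slides fixes these choices); the rule classifies as 1 exactly when the density ratio exceeds (1 - p)/p.

```python
import math

def gauss(x, mu, sigma=1.0):
    """N(mu, sigma^2) density at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def bayes_rule(x, p=0.5, mu0=0.0, mu1=1.0):
    """g*(x) = 1 iff (dP1/dP0)(x) > (1 - p)/p, with Gaussian stand-ins."""
    ratio = gauss(x, mu1) / gauss(x, mu0)  # likelihood ratio at x
    return 1 if ratio > (1 - p) / p else 0
```

With p = 1/2 and unit-variance Gaussians at 0 and 1, the decision boundary sits at x = 1/2; raising the prior p shifts the boundary toward class 0, as the threshold (1 - p)/p shrinks.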

  18. Our general approach
  We consider the functional data as trajectories drawn from a stochastic process, and we have tried to motivate our results and proposals in terms of this underlying process. This is somewhat in contrast with the mainstream research line in FDA, mostly centred on algorithmic aspects and real data analysis.
  "Curiously, despite a huge research activity in the field, few attempts have been made to connect the area of functional data analysis with the theory of stochastic processes." (Biau et al., 2015)

  19. Contributions
  a) A mathematical contribution to the functional classification problem (RKHS).
  b) Functional variable selection: a theoretical motivation and three different proposals.
  c) Large and replicable simulation studies.


  22. RKHS approach
  "It turns out, in my opinion, that reproducing kernel Hilbert spaces are the natural setting in which to solve problems of statistical inference on time processes." (Parzen, 1961)
  Why natural?
  - RKHS provides an intrinsic inner product depending on the covariance structure.
  - Explicit expressions of the Bayes rule (equivalent distributions).
  - Approximate optimal rule under mutually singular distributions.
  - Insight into the "near perfect classification" phenomenon (Delaigle and Hall, 2012).
  - A natural setting to formalize variable selection problems (RK-VS and the associated classifier).
  Berrendero, Cuevas and Torrecilla. On near perfect classification and functional Fisher rules via reproducing kernels. Manuscript, arXiv:1507.04398v2.

  23. Some background
  Definition: if X = {X_t, t ∈ [0, T]} is an L^2-process with covariance function K(s, t), define (H_0(K), ⟨·,·⟩_K) by
  H_0(K) := {f : f(s) = Σ_{i=1}^n a_i K(s, t_i), a_i ∈ R, t_i ∈ [0, T], n ∈ N},
  ⟨f, g⟩_K = Σ_{i,j} α_i β_j K(s_j, t_i),
  where f(x) = Σ_i α_i K(x, t_i) and g(x) = Σ_j β_j K(x, s_j).
  The RKHS associated with K, H(K), is defined as the completion of H_0(K). More precisely, H(K) is the set of functions f : [0, T] → R obtained as the pointwise limit of a Cauchy sequence {f_n} in H_0(K).
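For finite combinations the inner product above is just a double sum over the kernel, which is easy to compute directly. The sketch below uses the Brownian-motion covariance K(s, t) = min(s, t) purely as an example kernel; any valid covariance function could be substituted.

```python
# <f, g>_K = sum_{i,j} a_i b_j K(s_j, t_i) for
# f = sum_i a_i K(., t_i) and g = sum_j b_j K(., s_j).
def K(s, t):
    """Brownian-motion covariance, an illustrative choice of kernel."""
    return min(s, t)

def rkhs_inner(a, t, b, s):
    """H_0(K) inner product of f = sum a_i K(., t_i), g = sum b_j K(., s_j)."""
    return sum(ai * bj * K(sj, ti)
               for ai, ti in zip(a, t)
               for bj, sj in zip(b, s))
```

The reproducing property falls out of the definition: taking g = K(., s) gives ⟨f, K(., s)⟩_K = Σ_i a_i K(s, t_i) = f(s), and in particular ⟨K(., t), K(., t)⟩_K = K(t, t).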
