Carlos Ramos Carreño
Grupo de Aprendizaje Automático, Department of Computer Science, Universidad Autónoma de Madrid (UAM)
Who are we?
● Carlos Ramos Carreño (carlos.ramos@uam.es)¹
● José Luis Torrecilla Noguerales (joseluis.torrecilla@uam.es)²
● Alberto Suárez (alberto.suarez@uam.es)¹
● Miguel Carbajo Berrocal
● Pablo Marcos Manchón
● Amanda Hernando Bernabé
● Pablo Pérez Manso
¹ Department of Computer Science, Universidad Autónoma de Madrid (UAM)
² Department of Mathematics, Universidad Autónoma de Madrid (UAM)
What is scikit-fda?
● A software package for Functional Data Analysis (FDA)
● Preprocessing, exploration and machine learning tools
● Fully integrated into the scientific Python ecosystem
● Efficient, flexible and easy to use
Which other tools for FDA are available?
Mainly R software:
● General purpose: fda, fda.usc, tidyfun
● Representation: funData
● Registration: fdasrvf
● Robust analysis: roahd
● FPCA: fdapace, MFPCA
● Regression: refund, refund.wave, fdaPDE, sparseFLMM, FDboost
Which other tools for FDA are available?
Mainly R software:
● Visualization: rainbow
● Variable selection: RFgroove
● Time series: fds, ftsa
● Clustering: Funclustering, funcy, funFEM, funHDDC
Why Python?
● Powerful, easy-to-use, general-purpose programming language
● The SciPy ecosystem:
○ NumPy: N-dimensional arrays and linear algebra
○ SciPy: Utilities (statistics, integration, formats…)
○ Matplotlib: Plotting
○ Jupyter: Interactive notebooks
○ and much more...
Why scikit?
● SciPy Toolkits (SciKits)
● Specialized science packages
scikit-fda
● representation
● preprocessing
● exploratory analysis
● statistical inference
● machine learning
representation
● discretized representation
○ regularly sampled
○ irregularly sampled
● basis representation
Discretized representation
● Each curve is evaluated at the same points
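As a rough illustration of the discretized representation, the sketch below stores two curves as rows of a data matrix evaluated on one shared grid. It uses plain Python rather than the scikit-fda API, and the curves and variable names (`grid`, `data_matrix`) are made up for this example.

```python
import math

# Common grid: every curve is evaluated at these same points.
n_points = 5
grid = [j / (n_points - 1) for j in range(n_points)]  # 0.0, 0.25, 0.5, 0.75, 1.0

# Two illustrative curves (made-up data, not taken from the slides).
curves = [
    lambda t: math.sin(2 * math.pi * t),
    lambda t: t ** 2,
]

# Data matrix: one row per curve, one column per grid point.
data_matrix = [[f(t) for t in grid] for f in curves]
```

With irregular sampling, each curve would instead need its own list of evaluation points.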
Basis representation
● Expansion in a truncated basis of functions
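A minimal sketch of a basis expansion, assuming a truncated Fourier basis on [0, 1] and a regular grid covering one full period. Under those assumptions the basis functions are orthogonal on the grid, so the least-squares coefficients reduce to normalized inner products. The helper names (`fourier_basis`, `fit_coefficients`, `evaluate`) are hypothetical and are not scikit-fda functions.

```python
import math

def fourier_basis(n_basis):
    """First n_basis Fourier functions on [0, 1]: 1, sin, cos, sin, cos, ..."""
    funcs = [lambda t: 1.0]
    k = 1
    while len(funcs) < n_basis:
        funcs.append(lambda t, k=k: math.sin(2 * math.pi * k * t))
        if len(funcs) < n_basis:
            funcs.append(lambda t, k=k: math.cos(2 * math.pi * k * t))
        k += 1
    return funcs

def fit_coefficients(values, grid, basis):
    """Least-squares coefficients, valid because the basis is orthogonal
    on a regular grid that spans one full period."""
    coefs = []
    for phi in basis:
        num = sum(x * phi(t) for x, t in zip(values, grid))
        den = sum(phi(t) ** 2 for t in grid)
        coefs.append(num / den)
    return coefs

def evaluate(coefs, basis, t):
    """Evaluate the truncated expansion at a single point t."""
    return sum(c * phi(t) for c, phi in zip(coefs, basis))

n_points = 64
grid = [j / n_points for j in range(n_points)]  # regular grid on [0, 1)
values = [
    1.0 + 2.0 * math.sin(2 * math.pi * t) - 0.5 * math.cos(4 * math.pi * t)
    for t in grid
]

basis = fourier_basis(5)  # 1, sin(2πt), cos(2πt), sin(4πt), cos(4πt)
coefs = fit_coefficients(values, grid, basis)  # approximately [1, 2, 0, 0, -0.5]
```

For a non-orthogonal basis (for example B-splines) or an irregular grid, the coefficients would come from solving the full least-squares problem instead.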
preprocessing
● smoothing
● registration
● dimensionality reduction
Registration
● Alignment of the curves, so that common features (peaks, valleys...) are at the same points
● Typically, a warping function is used to transform the input
● Several methods
○ Shift registration
○ Landmark registration
○ Elastic registration
○ ...
Shift registration
● Warpings are translations
● The shifts are chosen to minimize a least squares criterion
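The idea can be sketched as a grid search: for one curve, try every shift and keep the one with the smallest sum of squared differences to a template. This simplified version assumes periodic curves on a regular grid and a fixed template. The function names are hypothetical and this is not the scikit-fda implementation.

```python
import math

def shift(values, s):
    """Circular shift by s grid steps: result[j] = values[(j + s) % n]."""
    n = len(values)
    return [values[(j + s) % n] for j in range(n)]

def best_shift(values, template):
    """Shift (in grid steps) minimizing the squared error to the template."""
    def sse(s):
        return sum((a - b) ** 2 for a, b in zip(shift(values, s), template))
    return min(range(len(values)), key=sse)

n = 100
grid = [j / n for j in range(n)]
template = [math.sin(2 * math.pi * t) for t in grid]
curve = [math.sin(2 * math.pi * (t - 0.2)) for t in grid]  # delayed by 0.2

s = best_shift(curve, template)  # 20 grid steps, i.e. a shift of 0.2
registered = shift(curve, s)
```

A common variant uses the sample mean as the template and re-estimates it after each round of shifts.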
Landmark registration
● Warping functions move the predefined landmarks to fixed positions
● Landmarks should be specified by the user
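A minimal sketch of landmark registration, assuming a piecewise-linear warping on [0, 1] that keeps the endpoints fixed. The warping `h` sends each target position to the curve's own landmark, so the registered curve `x(h(t))` shows the feature at the target. The helper names are hypothetical, and smoother interpolants are a reasonable alternative to the linear one used here.

```python
import math

def interp(x, xp, fp):
    """Piecewise-linear interpolation through the points (xp, fp), xp increasing."""
    for k in range(len(xp) - 1):
        if xp[k] <= x <= xp[k + 1]:
            w = (x - xp[k]) / (xp[k + 1] - xp[k])
            return fp[k] + w * (fp[k + 1] - fp[k])
    raise ValueError("x is outside the interpolation range")

def landmark_warping(landmarks, targets, domain=(0.0, 1.0)):
    """Warping h with h(target) = landmark and the domain endpoints fixed."""
    xp = [domain[0], *targets, domain[1]]
    fp = [domain[0], *landmarks, domain[1]]
    return lambda t: interp(t, xp, fp)

# Illustrative curve with its peak (the landmark) at t = 0.3.
def curve(t):
    return math.exp(-((t - 0.3) ** 2) / 0.01)

# Move the peak to the fixed position t = 0.5.
h = landmark_warping(landmarks=[0.3], targets=[0.5])

def registered(t):
    return curve(h(t))
```

After registration the peak sits at `t = 0.5`, since `h(0.5)` equals the original landmark 0.3.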
Elastic registration
● Uses the square root velocity framework (Srivastava et al., 2011 <arXiv:1103.3817> and Tucker et al., 2014 <doi:10.1016/j.csda.2012.12.001>)
● Also available in the R package fdasrvf
● Unsupervised method
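In the square root velocity framework, each curve f is replaced by the transform q(t) = sign(f'(t)) · sqrt(|f'(t)|) before curves are compared. The sketch below only approximates that transform with finite differences. It does not perform the alignment itself, and the function name `srsf` is a hypothetical label rather than a scikit-fda or fdasrvf function.

```python
import math

def srsf(values, grid):
    """Approximate q(t) = sign(f'(t)) * sqrt(|f'(t)|) with finite differences
    (central in the interior, one-sided at the two endpoints)."""
    n = len(values)
    q = []
    for j in range(n):
        lo, hi = max(j - 1, 0), min(j + 1, n - 1)
        deriv = (values[hi] - values[lo]) / (grid[hi] - grid[lo])
        q.append(math.copysign(math.sqrt(abs(deriv)), deriv))
    return q

n = 101
grid = [j / (n - 1) for j in range(n)]
values = [t ** 2 for t in grid]  # f(t) = t^2, so f'(t) = 2t

q = srsf(values, grid)  # q[50] is close to sqrt(2 * 0.5) = 1
```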
exploratory analysis
● descriptive statistics
● depth
● outliers
● visualization
Functional data boxplot
● Similar to the boxplot of univariate data
● A depth function must be chosen
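One way to see how the pieces fit together is the sketch below. It assumes the modified band depth as the depth function, takes the deepest curve as the median, builds the central envelope from the deepest half of the curves, and flags curves that leave the envelope inflated by 1.5 times its width. These choices and the function names are assumptions made for illustration, not a description of the scikit-fda implementation.

```python
import math
from itertools import combinations

def modified_band_depth(curves):
    """For each curve, the average fraction of grid points at which it lies
    inside the band spanned by a pair of curves, taken over all pairs."""
    n_points = len(curves[0])
    pairs = list(combinations(curves, 2))
    depths = []
    for x in curves:
        inside = sum(
            min(a[j], b[j]) <= x[j] <= max(a[j], b[j])
            for a, b in pairs
            for j in range(n_points)
        )
        depths.append(inside / (len(pairs) * n_points))
    return depths

def functional_boxplot(curves, factor=1.5):
    """Return the median curve, the central envelope and the outlier indices."""
    depths = modified_band_depth(curves)
    order = sorted(range(len(curves)), key=lambda i: depths[i], reverse=True)
    median = curves[order[0]]
    # Central region: pointwise envelope of the deepest half of the sample.
    central = [curves[i] for i in order[: math.ceil(len(curves) / 2)]]
    lower = [min(col) for col in zip(*central)]
    upper = [max(col) for col in zip(*central)]
    # A curve is an outlier if it leaves the inflated envelope at any point.
    outliers = [
        i for i, x in enumerate(curves)
        if any(
            v < lo - factor * (hi - lo) or v > hi + factor * (hi - lo)
            for v, lo, hi in zip(x, lower, upper)
        )
    ]
    return median, lower, upper, outliers

# Made-up sample: vertically shifted sine curves, the last one far away.
n_points = 20
grid = [j / (n_points - 1) for j in range(n_points)]
levels = [-3, -2, -1, 0, 1, 2, 10]
curves = [[lev + math.sin(2 * math.pi * t) for t in grid] for lev in levels]

median, lower, upper, outliers = functional_boxplot(curves)  # outliers == [6]
```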
statistical inference
● estimation
● statistical hypothesis testing
● confidence intervals
machine learning
● clustering
● classification
● regression
K-means clustering
● Predefined number of clusters
● Finds the best position of the centroids of the clusters
● A functional metric must be chosen
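A minimal sketch of K-means on discretized curves, assuming the L2 distance (approximated by a Riemann sum on a regular grid) as the functional metric and the pointwise mean as the centroid. The initial centroids are passed in explicitly to keep the example deterministic. All names are hypothetical and this is not the scikit-fda API.

```python
import math

def l2_distance(x, y, step):
    """Approximate L2 distance between two curves on a regular grid."""
    return math.sqrt(step * sum((a - b) ** 2 for a, b in zip(x, y)))

def k_means(curves, init_centroids, step, n_iter=20):
    """Alternate between assigning curves to the nearest centroid and
    recomputing each centroid as the pointwise mean of its members."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)),
                key=lambda k: l2_distance(x, centroids[k], step))
            for x in curves
        ]
        for k in range(len(centroids)):
            members = [x for x, lab in zip(curves, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up sample: three shifted sine curves and three shifted cosine curves.
n_points = 50
step = 1 / n_points
grid = [j * step for j in range(n_points)]
offsets = [-0.1, 0.0, 0.1]
curves = (
    [[math.sin(2 * math.pi * t) + c for t in grid] for c in offsets]
    + [[math.cos(2 * math.pi * t) + c for t in grid] for c in offsets]
)

labels, centroids = k_means(
    curves, init_centroids=[curves[0], curves[3]], step=step
)  # labels == [0, 0, 0, 1, 1, 1]
```

The pointwise mean is the natural centroid for the L2 metric; with a different metric the centroid update would need to change accordingly.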
Fuzzy K-means
● Fuzzy version of K-means
● Each observation does not necessarily belong to only one of the clusters: it has a degree of membership in each of them
● The degrees of membership add up to one
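The membership degrees can be illustrated with the standard fuzzy c-means update, in which the membership in cluster k is 1 / Σ_l (d_k / d_l)^(2 / (m − 1)) for distances d to the centroids and a fuzzifier m > 1. The slides do not state the formula, so both the formula and the function name `memberships` are assumptions for this sketch.

```python
def memberships(distances, fuzzifier=2.0):
    """Degrees of membership of one observation, given its distances to
    every centroid (standard fuzzy c-means update, fuzzifier > 1)."""
    # An observation lying exactly on one or more centroids belongs only to them.
    zeros = [d == 0 for d in distances]
    if any(zeros):
        return [z / sum(zeros) for z in zeros]
    exponent = 2.0 / (fuzzifier - 1.0)
    return [
        1.0 / sum((d_k / d_l) ** exponent for d_l in distances)
        for d_k in distances
    ]

# One observation at distance 1 from the first centroid and 3 from the second.
u = memberships([1.0, 3.0])  # approximately [0.9, 0.1]
```

The memberships sum to one by construction, and the closer centroid receives the larger share.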
Documentation
● Up to date and available online
● Easily searchable
● Cross referenced
● Detailed examples and interactive notebooks
● Examples downloadable as Python source files or Jupyter notebooks
Where can I find more?
PyPI: https://pypi.org/project/scikit-fda/
GitHub page: https://github.com/GAA-UAM/scikit-fda/
Documentation: https://fda.readthedocs.io
Thanks for your attention!!