

Introduction to Functional Data Analysis. Elia Liitiäinen (eliitiai@cc.hut.fi), Time Series Prediction Group, Adaptive Informatics Research Centre, Helsinki University of Technology, Finland. January 30, 2007.


  1. Introduction to Functional Data Analysis. Elia Liitiäinen (eliitiai@cc.hut.fi), Time Series Prediction Group, Adaptive Informatics Research Centre, Helsinki University of Technology, Finland. January 30, 2007.

  2. Introduction. Functional data occurs, for example, in time series analysis, chemometrics and econometrics. In many cases the number of samples available is small. Taking the structure of the inputs into account improves the results of statistical inference. FDA is a framework that provides tools for this purpose.

  3. Outline. (1) General Considerations; (2) Correlation Analysis; (3) Interpolation.

  4. Goals of Functional Data Analysis. Exploratory data analysis: the data provides new information and sheds light on known features. Confirmatory analysis: hypothesis testing. Prediction: forecasting future values.

  5. Functional Data. Real-world phenomena are usually continuous at a small enough time scale. The worst-case dimension of functional data is infinite (white noise). For smooth functions with a bounded derivative the intrinsic dimension is finite. For smooth functions the practical dimension is typically 10-20.

  6. Noise. Functional data typically contains noise. In mathematical terms,
     $x_i(t) = y_i(t) + \epsilon(t)$. (1)
     To make things worse, often $\mathrm{Cov}(\epsilon(t_2), \epsilon(t_1)) \neq 0$ for $t_2 \neq t_1$.

  7. Data Representation. The form of the curve is important. The first step in FDA is a transformation of the inputs to remove noise. Basic tools include smoothing and interpolation.

  8. Derivatives. Derivatives are important. Numerical differentiation amplifies noise. Interpolation or smoothing helps in this regard.
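
The claim that numerical differentiation amplifies noise can be illustrated with a minimal sketch. The sine curve, the noise level 0.01, the grid step 0.01 and the helper names are all assumptions chosen for illustration; they do not come from the slides.

```python
import math
import random


def finite_difference(values, step):
    """Forward differences of equally spaced samples."""
    return [(b - a) / step for a, b in zip(values, values[1:])]


def max_abs(values):
    """Largest absolute value in a sequence."""
    return max(abs(v) for v in values)


random.seed(0)
step = 0.01
grid = [i * step for i in range(101)]
clean = [math.sin(2 * math.pi * t) for t in grid]      # assumed smooth signal y(t)
noise = [random.gauss(0.0, 0.01) for _ in grid]         # assumed noise eps(t)
noisy = [c + e for c, e in zip(clean, noise)]           # observed x(t) = y(t) + eps(t)

# Part of the derivative estimate that is caused by the noise alone.
derivative_error = [
    dn - dc
    for dn, dc in zip(finite_difference(noisy, step), finite_difference(clean, step))
]

print("largest noise in the curve:     ", max_abs(noise))
print("largest noise in the derivative:", max_abs(derivative_error))
```

Because each difference of noise terms is divided by the step, the noise in the derivative estimate is larger than the noise in the curve by a factor on the order of 1/step, which is why smoothing before differentiating helps.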

  9. Covariance and Variance Functions. $\{x_i(t)\}_{i=1}^{N}$ is a sample of functions.
     Mean: $\bar{x}(t) = N^{-1} \sum_{i=1}^{N} x_i(t)$. (2)
     Variance function: $\mathrm{var}_X(t) = (N-1)^{-1} \sum_{i=1}^{N} [x_i(t) - \bar{x}(t)]^2$. (3)
     Covariance function: $\mathrm{cov}_X(t_1, t_2) = (N-1)^{-1} \sum_{i=1}^{N} \{x_i(t_1) - \bar{x}(t_1)\}\{x_i(t_2) - \bar{x}(t_2)\}$. (4)
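
The mean, variance and covariance functions can be sketched for curves sampled on a common grid of time points. This is an illustrative sketch only: the representation of each curve as a list of values and the function names are assumptions, not part of the slides.

```python
def mean_function(curves):
    """Pointwise sample mean: one value per grid point."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]


def covariance_function(curves):
    """Matrix whose (a, b) entry estimates cov_X(t_a, t_b) with the 1/(N-1) factor."""
    n = len(curves)
    mean = mean_function(curves)
    centered = [[x - m for x, m in zip(curve, mean)] for curve in curves]
    size = len(mean)
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(size)]
        for a in range(size)
    ]


def variance_function(curves):
    """Pointwise sample variance: the diagonal of the covariance function."""
    covariance = covariance_function(curves)
    return [covariance[j][j] for j in range(len(covariance))]


# Three toy curves observed at three time points (made-up numbers).
curves = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
print(mean_function(curves))        # [2.0, 4.0, 6.0]
print(variance_function(curves))    # [1.0, 4.0, 9.0]
print(covariance_function(curves))
```

The variance function is simply the covariance function evaluated at $t_1 = t_2$, which is why the sketch reads it off the diagonal.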

  10. Correlation. Correlation function:
      $\mathrm{corr}_X(t_1, t_2) = \dfrac{\mathrm{cov}_X(t_1, t_2)}{\sqrt{\mathrm{var}_X(t_1)\,\mathrm{var}_X(t_2)}}$. (5)
      It is often useful to examine a plot of the cross-correlation.

  11. Cross-correlation. Now we have pairs of functions $(x_i, y_i)$.
      Cross-covariance: $\mathrm{cov}_{X,Y}(t_1, t_2) = (N-1)^{-1} \sum_{i=1}^{N} \{x_i(t_1) - \bar{x}(t_1)\}\{y_i(t_2) - \bar{y}(t_2)\}$. (6)
      Cross-correlation: $\mathrm{corr}_{X,Y}(t_1, t_2) = \dfrac{\mathrm{cov}_{X,Y}(t_1, t_2)}{\sqrt{\mathrm{var}_X(t_1)\,\mathrm{var}_Y(t_2)}}$. (7)
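
A minimal sketch of the cross-covariance and cross-correlation for curves sampled on grids follows. The list-of-lists representation and the function names are assumptions; a scalar output is treated here as a function observed at a single point.

```python
import math


def center(curves):
    """Subtract the pointwise sample mean from every curve."""
    n = len(curves)
    mean = [sum(column) / n for column in zip(*curves)]
    return [[v - m for v, m in zip(curve, mean)] for curve in curves]


def cross_covariance(xs, ys):
    """Entry (a, b) estimates cov_{X,Y}(t_a, t_b) with the 1/(N-1) factor."""
    n = len(xs)
    cx, cy = center(xs), center(ys)
    return [
        [sum(x[a] * y[b] for x, y in zip(cx, cy)) / (n - 1) for b in range(len(cy[0]))]
        for a in range(len(cx[0]))
    ]


def cross_correlation(xs, ys):
    """Cross-covariance scaled by the square root of the two variance functions."""
    covariance = cross_covariance(xs, ys)
    var_x = [row[a] for a, row in enumerate(cross_covariance(xs, xs))]
    var_y = [row[b] for b, row in enumerate(cross_covariance(ys, ys))]
    return [
        [covariance[a][b] / math.sqrt(var_x[a] * var_y[b]) for b in range(len(var_y))]
        for a in range(len(var_x))
    ]


# Toy inputs at two grid points and a scalar output (made-up numbers).
xs = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]]
ys = [[1.0], [3.0], [5.0]]
print(cross_correlation(xs, ys))   # correlation of each input point with the output
print(cross_correlation(xs, xs))   # the ordinary correlation function of the inputs
```

Calling `cross_correlation(xs, xs)` recovers the correlation function of a single sample, so the same helper covers both cases.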

  12. Case Study: Tecator Data. 240 samples of absorbance spectra. In addition to the absorbance spectra we have the fat content as output. The cross-correlation with the output can be misleading.

  13. Figure: From left to right: the inputs, the variance function, the correlation function and the cross-correlation with the scalar output. Axis labels recovered from the plots: Wavelength (850 to 1050), Absorbance, Variance, Cross-correlation.

  14. Function Basis. A basis is a linearly independent set of functions $\{\omega_i\}_{i=1}^{\infty}$ that spans the function space. Example: the set of monomials $\{t^i\}_{i=0}^{\infty}$. Basis expansion: the functional inputs $\{x_i(t)\}_{i=1}^{N}$ are approximated as (for some finite $K > 0$)
      $x_i(t) \approx \sum_{k=1}^{K} c_k \omega_k(t)$. (8)
      The weights are solved by minimizing some cost function.
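
One common choice of cost function is the sum of squared errors at the observation points; the slides do not say which cost is intended, so least squares is an assumption here. The sketch below fits the weights of a monomial basis by solving the normal equations in plain Python. All function names and the toy curve are illustrative.

```python
def solve_linear_system(matrix, rhs):
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_basis_coefficients(times, values, basis):
    """Least-squares weights c_k for x(t) ~ sum_k c_k * omega_k(t)."""
    design = [[omega(t) for omega in basis] for t in times]
    size = len(basis)
    gram = [[sum(r[a] * r[b] for r in design) for b in range(size)] for a in range(size)]
    moment = [sum(r[a] * v for r, v in zip(design, values)) for a in range(size)]
    return solve_linear_system(gram, moment)


# Monomial basis 1, t, t^2 (K = 3) and a toy curve that lies exactly in its span.
monomials = [lambda t, p=p: t ** p for p in range(3)]
times = [i / 10 for i in range(11)]
values = [1.0 + 2.0 * t - 3.0 * t * t for t in times]
coefficients = fit_basis_coefficients(times, values, monomials)
print(coefficients)  # close to [1.0, 2.0, -3.0]
```

The normal equations are used only to keep the sketch dependency-free; for larger or ill-conditioned bases a QR-based least-squares solver from a numerical library would be a more robust choice.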

  15. Why use basis expansions? Dimension reduction. Reduced computational demand in later stages of the analysis. Noise removal.

  16. Fourier Basis. The Fourier basis on $[0, 1]$ is $\{\sin 2\pi j t, \cos 2\pi j t\}_{j=1}^{\infty}$ (together with the constant function). Sometimes good for periodic data. Lack of locality. Computational complexity $O(N \log N)$.
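
As a minimal sketch, the Fourier coefficients of a curve sampled on a uniform grid can be estimated with discrete inner products. The grid $t_i = i/N$, the toy curve and the function name are assumptions. This direct summation costs $O(NK)$ for $K$ frequencies; the $O(N \log N)$ figure on the slide corresponds to computing all coefficients with a fast Fourier transform, which is not implemented here.

```python
import math


def fourier_coefficients(samples, max_frequency):
    """Return [(a_j, b_j)] for j = 1..max_frequency on the grid t_i = i / N.

    a_j multiplies cos(2*pi*j*t) and b_j multiplies sin(2*pi*j*t).
    """
    n = len(samples)
    coefficients = []
    for j in range(1, max_frequency + 1):
        a_j = 2.0 / n * sum(
            x * math.cos(2 * math.pi * j * i / n) for i, x in enumerate(samples)
        )
        b_j = 2.0 / n * sum(
            x * math.sin(2 * math.pi * j * i / n) for i, x in enumerate(samples)
        )
        coefficients.append((a_j, b_j))
    return coefficients


# Toy periodic curve: 3 sin(2 pi t) + 0.5 cos(4 pi t), sampled at 64 points.
n_points = 64
grid = [i / n_points for i in range(n_points)]
curve = [3.0 * math.sin(2 * math.pi * t) + 0.5 * math.cos(4 * math.pi * t) for t in grid]
coefficients = fourier_coefficients(curve, max_frequency=3)
for j, (a_j, b_j) in enumerate(coefficients, start=1):
    print(j, round(a_j, 6), round(b_j, 6))
```

Every basis function is nonzero over the whole interval, so a change in one part of the curve affects all coefficients; this is the lack of locality mentioned on the slide.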

  17. Wavelets. Under some conditions, the functions
      $\psi_{jk}(t) = 2^{j/2} \psi(2^j t - k)$ (9)
      form a basis. Wavelets are local. Fast computation.
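
To make formula (9) concrete, the sketch below uses the Haar function as the mother wavelet $\psi$. The slides do not name a particular wavelet, so the Haar choice, the function names and the midpoint-rule inner product are all assumptions made for illustration.

```python
def haar_mother(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def wavelet(j, k, t):
    """psi_jk(t) = 2^(j/2) * psi(2^j * t - k), supported on [k / 2^j, (k + 1) / 2^j)."""
    return 2.0 ** (j / 2.0) * haar_mother(2.0 ** j * t - k)


def psi(j, k):
    """Return psi_jk as a function of t."""
    return lambda t: wavelet(j, k, t)


def inner_product(f, g, n_points=1024):
    """Midpoint-rule approximation of the integral of f * g over [0, 1]."""
    midpoints = [(i + 0.5) / n_points for i in range(n_points)]
    return sum(f(t) * g(t) for t in midpoints) / n_points


print(inner_product(psi(1, 0), psi(1, 0)))  # unit norm, approximately 1
print(inner_product(psi(1, 0), psi(1, 1)))  # disjoint supports, 0
print(inner_product(psi(0, 0), psi(1, 0)))  # different scales, approximately 0
```

Each $\psi_{jk}$ vanishes outside a short interval whose length shrinks as $j$ grows, which is the locality property that the Fourier basis lacks.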

  18. Splines (1). Consider the interval $[0, 1]$ and the breakpoints $\tau = \{\tau_l\}_{l=0}^{L}$ with $\tau_0 = 0$ and $\tau_L = 1$. A spline is a piecewise polynomial of degree $K$. At the breakpoints it is required that the values of the polynomials and their derivatives up to order $K - 1$ agree. Thus a spline is $K - 1$ times differentiable. For $K = 1$, the spline is a piecewise linear function.

  19. Splines (2). The number of intervals: $L$. Here $K$ denotes the order of the spline, that is, the polynomial degree plus one. Degrees of freedom:
      $LK - (L - 1)(K - 1) = K + L - 1$, (10)
      that is, the number of interior knots plus the order. It is not necessary to require the same smoothness at all the knots.

  20. Spline Basis. Splines can be represented using a basis expansion
      $S(t) = \sum_{k=1}^{K+L-1} c_k B_k(t)$. (11)
      The basis is not orthonormal; its locality is determined by $K$ (the computational complexity grows linearly with the number of data points). The coefficients can be used in regression and data analysis.
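
One standard way to evaluate such a basis is the Cox-de Boor recursion for B-splines. The slides do not state how $B_k$ is computed, so the recursion, the clamped knot vector (end breakpoints repeated) and the function names below are assumptions for illustration. Here `order` plays the role of $K$ as the spline order.

```python
def bspline_basis(i, order, knots, t):
    """Value at t of the i-th B-spline of the given order (Cox-de Boor recursion)."""
    if order == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    left_width = knots[i + order - 1] - knots[i]
    if left_width > 0.0:
        value += (t - knots[i]) / left_width * bspline_basis(i, order - 1, knots, t)
    right_width = knots[i + order] - knots[i + 1]
    if right_width > 0.0:
        value += (
            (knots[i + order] - t) / right_width
            * bspline_basis(i + 1, order - 1, knots, t)
        )
    return value


def clamped_knots(breakpoints, order):
    """Repeat the two end breakpoints so that each appears `order` times in total."""
    return (
        [breakpoints[0]] * (order - 1)
        + list(breakpoints)
        + [breakpoints[-1]] * (order - 1)
    )


def evaluate_basis(breakpoints, order, t):
    """Values of all K + L - 1 basis functions at a point t in [tau_0, tau_L)."""
    knots = clamped_knots(breakpoints, order)
    n_basis = len(knots) - order
    return [bspline_basis(i, order, knots, t) for i in range(n_basis)]


# Order 4 (cubic) splines on L = 5 intervals: 4 + 5 - 1 = 8 basis functions.
breakpoints = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
values = evaluate_basis(breakpoints, order=4, t=0.37)
print(len(values), sum(values))
print([round(v, 4) for v in values])
```

At any point only `order` of the basis functions are nonzero, so fitting the coefficients involves a banded system; this locality is what keeps the cost linear in the number of data points.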

  21. Figure: Spline basis for different orders. Panels: Order 2 Spline, Order 3 Spline, Order 4 Spline, Order 5 Spline; each plots B(x) (0 to 1) against x (0 to 20).

  22. Conclusion. Functional data occurs in the real world. Important tools include correlation plots, derivatives and basis expansions. Removal of noise is needed.
