clustering random walk time series

Clustering Random Walk Time Series: GSI 2015 - Geometric Science of Information

Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Clustering Random Walk Time Series GSI 2015 - Geometric Science of Information Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat 29 October


  1. Clustering Random Walk Time Series. GSI 2015 - Geometric Science of Information. Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat. 29 October 2015.

  2. Outline: (1) Introduction, (2) Geometry of Random Walk Time Series, (3) The Hierarchical Block Model, (4) Conclusion.

  3. Context (data from www.datagrapple.com)

  4. What is a clustering program?
     Definition: Clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to those in different groups.
     Example of a clustering program: we aim at finding $k$ groups by positioning $k$ group centers $\{c_1, \dots, c_k\}$ that, for data points $\{x_1, \dots, x_n\}$, solve
     $$\min_{c_1, \dots, c_k} \sum_{i=1}^{n} \min_{j=1}^{k} d(x_i, c_j)^2$$
     But what is the distance $d$ between two random walk time series?
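
The objective on this slide can be evaluated directly once a distance is fixed. The sketch below is only an illustration and assumes the plain Euclidean distance for $d$, which is exactly the choice the slides go on to question for random walk time series. The function name `kmeans_objective` and the toy points are made up for the example.

```python
import numpy as np

def kmeans_objective(points, centers):
    """Sum over points of the squared Euclidean distance to the nearest center."""
    # Pairwise differences, shape (n_points, n_centers, dim).
    diffs = points[:, None, :] - centers[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Inner min over centers j, outer sum over points i.
    return float(np.sum(np.min(sq_dists, axis=1)))

# Toy data: two well-separated pairs of points (illustrative values only).
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good_centers = np.array([[0.0, 0.5], [10.0, 0.5]])
bad_centers = np.array([[5.0, 0.0], [5.0, 1.0]])

good = kmeans_objective(points, good_centers)
bad = kmeans_objective(points, bad_centers)
print(good, bad)
```

Centers placed inside each pair give a much smaller objective than centers placed between the pairs, which is what the outer minimization over $c_1, \dots, c_k$ searches for.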

  5. What are clusters of Random Walk Time Series? French banks and building materials CDS over 2006-2015.

  7. Outline: (1) Introduction, (2) Geometry of Random Walk Time Series, (3) The Hierarchical Block Model, (4) Conclusion.

  8. Geometry of RW TS ≡ Geometry of Random Variables
     i.i.d. observations:
     $X_1 : X_1^1, X_1^2, \dots, X_1^T$
     $X_2 : X_2^1, X_2^2, \dots, X_2^T$
     $\dots$
     $X_N : X_N^1, X_N^2, \dots, X_N^T$
     Which distances $d(X_i, X_j)$ between dependent random variables?

  9. Pitfalls of a basic distance
     Let $(X, Y)$ be a bivariate Gaussian vector, with $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$, $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ and whose correlation is $\rho(X, Y) \in [-1, 1]$.
     $$E[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2 + 2 \sigma_X \sigma_Y (1 - \rho(X, Y))$$
     Now, consider the following values for correlation:
     $\rho(X, Y) = 0$, so $E[(X - Y)^2] = (\mu_X - \mu_Y)^2 + \sigma_X^2 + \sigma_Y^2$. Assume $\mu_X = \mu_Y$ and $\sigma_X = \sigma_Y$. For $\sigma_X = \sigma_Y \gg 1$, we obtain $E[(X - Y)^2] \gg 1$ instead of the distance 0, expected from comparing two equal Gaussians.
     $\rho(X, Y) = 1$, so $E[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2$.

  10. Pitfalls of a basic distance (continued)
     Figure: probability density functions of Gaussians $\mathcal{N}(-5, 1)$ and $\mathcal{N}(5, 1)$, Gaussians $\mathcal{N}(-5, 3)$ and $\mathcal{N}(5, 3)$, and Gaussians $\mathcal{N}(-5, 10)$ and $\mathcal{N}(5, 10)$. Green, red and blue Gaussians are equidistant using $L_2$ geometry on the parameter space $(\mu, \sigma)$.
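
The closed form for $E[(X - Y)^2]$ can be checked numerically. The following is a minimal sketch under assumed values ($\mu_X = \mu_Y = 0$, $\sigma_X = \sigma_Y = 10$); the helper names, sample size and seed are illustrative and not from the slides.

```python
import numpy as np

def expected_sq_diff(mu_x, mu_y, sigma_x, sigma_y, rho):
    """Closed form of E[(X - Y)^2] for a bivariate Gaussian (X, Y)."""
    return ((mu_x - mu_y) ** 2
            + (sigma_x - sigma_y) ** 2
            + 2.0 * sigma_x * sigma_y * (1.0 - rho))

def simulated_sq_diff(mu_x, mu_y, sigma_x, sigma_y, rho, n=200_000, seed=0):
    """Monte Carlo estimate of E[(X - Y)^2] from n joint draws."""
    rng = np.random.default_rng(seed)
    cov = [[sigma_x ** 2, rho * sigma_x * sigma_y],
           [rho * sigma_x * sigma_y, sigma_y ** 2]]
    xy = rng.multivariate_normal([mu_x, mu_y], cov, size=n)
    return float(np.mean((xy[:, 0] - xy[:, 1]) ** 2))

# Two identically distributed but uncorrelated Gaussians with large variance.
closed_uncorrelated = expected_sq_diff(0.0, 0.0, 10.0, 10.0, rho=0.0)
simulated_uncorrelated = simulated_sq_diff(0.0, 0.0, 10.0, 10.0, rho=0.0)

# Same margins, perfectly correlated: the closed form collapses to zero.
closed_correlated = expected_sq_diff(0.0, 0.0, 10.0, 10.0, rho=1.0)
print(closed_uncorrelated, simulated_uncorrelated, closed_correlated)
```

With equal margins the value depends only on the correlation and the scale, which is the pitfall named on the slide: two equal Gaussians are far apart as soon as they are uncorrelated and have large variance.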

  11. Sklar's Theorem
     Theorem (Sklar's Theorem, 1959): For any random vector $X = (X_1, \dots, X_N)$ having continuous marginal cdfs $P_i$, $1 \le i \le N$, its joint cumulative distribution $P$ is uniquely expressed as
     $$P(X_1, \dots, X_N) = C(P_1(X_1), \dots, P_N(X_N)),$$
     where $C$, the multivariate distribution of uniform marginals, is known as the copula of $X$.
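
Read constructively, Sklar's theorem says a joint distribution can be assembled from a copula plus any continuous margins. The sketch below illustrates this with an assumed Gaussian copula (correlation 0.7) and two arbitrarily chosen margins, exponential and Student-t. None of these choices come from the slides.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho = 0.7
cov = np.array([[1.0, rho], [rho, 1.0]])

# Step 1: draw from a bivariate Gaussian and push each coordinate through the
# standard normal cdf. The result (u1, u2) has uniform margins and carries the
# dependence structure only: it is a sample from the Gaussian copula.
z = rng.multivariate_normal([0.0, 0.0], cov, size=20_000)
u = stats.norm.cdf(z)

# Step 2: impose continuous margins through their quantile functions.
x1 = stats.expon.ppf(u[:, 0])    # exponential margin
x2 = stats.t.ppf(u[:, 1], df=3)  # Student-t margin with 3 degrees of freedom

# Rank correlation only sees the copula, so it is unchanged by step 2.
rho_s_copula, _ = stats.spearmanr(u[:, 0], u[:, 1])
rho_s_data, _ = stats.spearmanr(x1, x2)
print(rho_s_copula, rho_s_data)
```

The two rank correlations agree because the margins were applied through strictly increasing maps, which leave the ranks untouched.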

  13. The Copula Transform
     Definition (The Copula Transform): Let $X = (X_1, \dots, X_N)$ be a random vector with continuous marginal cumulative distribution functions (cdfs) $P_i$, $1 \le i \le N$. The random vector $U = (U_1, \dots, U_N) := P(X) = (P_1(X_1), \dots, P_N(X_N))$ is known as the copula transform.
     $U_i$, $1 \le i \le N$, are uniformly distributed on $[0, 1]$ (the probability integral transform): for $P_i$ the cdf of $X_i$, we have $x = P_i(P_i^{-1}(x)) = \Pr(X_i \le P_i^{-1}(x)) = \Pr(P_i(X_i) \le x)$, thus $P_i(X_i) \sim \mathcal{U}[0, 1]$.

  14. The Copula Transform (continued)
     Figure: left panel, $Y \sim \ln(X)$ against $X \sim \mathcal{U}[0, 1]$, with $\rho \approx 0.84$; right panel, $P_Y(Y)$ against $P_X(X)$, with $\rho = 1$. Caption: the copula transform is invariant to strictly increasing transformations.
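
The invariance shown in the figure can be reproduced in a few lines. This is a sketch under assumptions: the sample size and seed are arbitrary, and normalized ranks stand in for the unknown cdfs $P_X$ and $P_Y$. The exact correlation printed for the raw data depends on the sample drawn.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata

rng = np.random.default_rng(0)
x = rng.uniform(size=5_000)  # X ~ U[0, 1]
y = np.log(x)                # Y = ln(X), a strictly increasing function of X

# Pearson correlation on the raw values is below 1 even though Y is a
# deterministic function of X, because the relationship is not linear.
rho_raw, _ = pearsonr(x, y)

# After mapping each variable to its normalized ranks, the two coordinates
# coincide and the correlation is 1.
u = rankdata(x) / len(x)
v = rankdata(y) / len(y)
rho_copula, _ = pearsonr(u, v)
print(rho_raw, rho_copula)
```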

  15. Deheuvels' Empirical Copula Transform
     Let $(X_1^t, \dots, X_N^t)$, $1 \le t \le T$, be $T$ observations from a random vector $(X_1, \dots, X_N)$ with continuous margins. Since one cannot directly obtain the corresponding copula observations $(U_1^t, \dots, U_N^t) = (P_1(X_1^t), \dots, P_N(X_N^t))$, where $t = 1, \dots, T$, without knowing a priori $(P_1, \dots, P_N)$, one can instead:
     Definition (The Empirical Copula Transform): estimate the $N$ empirical margins $P_i^T(x) = \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}(X_i^t \le x)$, $1 \le i \le N$, to obtain the $T$ empirical observations $(\tilde{U}_1^t, \dots, \tilde{U}_N^t) = (P_1^T(X_1^t), \dots, P_N^T(X_N^t))$.
     Equivalently, since $\tilde{U}_i^t = R_i^t / T$, $R_i^t$ being the rank of observation $X_i^t$, the empirical copula transform can be considered as the normalized rank transform.
     In practice: x_transform = rankdata(x)/len(x)
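
The one-liner on this slide extends naturally to a whole set of series. The sketch below assumes `rankdata` is `scipy.stats.rankdata` and that the data are stored as an array with one row per variable and one column per observation. The function name `empirical_copula_transform` and the simulated increments are illustrative.

```python
import numpy as np
from scipy.stats import rankdata

def empirical_copula_transform(series):
    """Map each row of an (N, T) array to its normalized ranks R / T."""
    series = np.asarray(series, dtype=float)
    n_obs = series.shape[1]
    # Rank within each row: one row is one variable observed T times.
    ranks = np.vstack([rankdata(row) for row in series])
    return ranks / n_obs

rng = np.random.default_rng(0)
increments = rng.standard_normal((3, 250))  # N = 3 variables, T = 250 observations
u = empirical_copula_transform(increments)
print(u.shape, u.min(), u.max())
```

With continuous data there are no ties, so each row of the output is a permutation of $1/T, 2/T, \dots, 1$. With ties, `rankdata` assigns average ranks by default.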

  16. Generic Non-Parametric Distance
     $$d_\theta^2(X_i, X_j) = \theta \, 3 E\left[ |P_i(X_i) - P_j(X_j)|^2 \right] + (1 - \theta) \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda$$
     (i) $0 \le d_\theta \le 1$; (ii) for $0 < \theta < 1$, $d_\theta$ is a metric; (iii) $d_\theta$ is invariant under diffeomorphism.
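
One possible way to estimate $d_\theta^2$ from two samples of equal length is sketched below. It is not the authors' implementation. It assumes normalized ranks for the dependence term and a histogram on a shared grid for the Hellinger term (the bin count of 50 is arbitrary). All function names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

def dependence_term(x, y):
    """Estimate 3 * E[|P_i(X_i) - P_j(X_j)|^2] with normalized ranks."""
    u = rankdata(x) / len(x)
    v = rankdata(y) / len(y)
    return 3.0 * float(np.mean((u - v) ** 2))

def hellinger_sq(x, y, bins=50):
    """Histogram estimate of the squared Hellinger distance between margins."""
    lo = min(np.min(x), np.min(y))
    hi = max(np.max(x), np.max(y))
    edges = np.linspace(lo, hi, bins + 1)  # shared grid for both samples
    p = np.histogram(x, bins=edges)[0] / len(x)
    q = np.histogram(y, bins=edges)[0] / len(y)
    return 0.5 * float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def d_theta_sq(x, y, theta=0.5, bins=50):
    """Convex combination of the dependence and distribution terms."""
    return theta * dependence_term(x, y) + (1.0 - theta) * hellinger_sq(x, y, bins)

rng = np.random.default_rng(0)
x = rng.standard_normal(2_000)
y = 0.8 * x + 0.6 * rng.standard_normal(2_000)  # same margin, correlated with x
z = 5.0 * x                                     # comonotone with x, wider margin

print(d_theta_sq(x, x), d_theta_sq(x, y), d_theta_sq(x, z))
```

In this toy setup `x` and `z` are indistinguishable to the dependence term but differ in the Hellinger term, while `x` and `y` share a margin but are only partially dependent. The parameter $\theta$ sets how much each kind of difference counts.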

  17. Generic Non-Parametric Distance
     $$d_0^2 : \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda = \text{Hellinger}^2$$
     $$d_1^2 : 3 E\left[ |P_i(X_i) - P_j(X_j)|^2 \right] = \frac{1 - \rho_S}{2} = 2 - 6 \int_0^1 \int_0^1 C(u, v) \, du \, dv$$
     Remark: If $f(x, \theta) = c_\Phi(u_1, \dots, u_N; \Sigma) \prod_{i=1}^{N} f_i(x_i; \nu_i)$ then $ds^2 = ds^2_{\text{GaussCopula}} + \sum_{i=1}^{N} ds^2_{\text{margins}}$.
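
The chain of equalities for $d_1^2$ follows from standard facts about uniform variables. Writing $U = P_i(X_i)$ and $V = P_j(X_j)$, both uniform on $[0, 1]$, and taking the usual population form of Spearman's correlation, $\rho_S = 12\,E[UV] - 3$ (a standard identity that is not stated on the slide), a short derivation is:

```latex
\begin{align*}
E[U^2] &= E[V^2] = \int_0^1 u^2 \, du = \tfrac{1}{3}, \\
3\,E\big[|U - V|^2\big] &= 3\left(\tfrac{2}{3} - 2\,E[UV]\right) = 2 - 6\,E[UV], \\
E[UV] &= \int_0^1\!\!\int_0^1 C(u, v)\, du\, dv, \\
\rho_S = 12\,E[UV] - 3 \;&\Longrightarrow\; \frac{1 - \rho_S}{2} = 2 - 6\,E[UV].
\end{align*}
```

The third line uses $E[UV] = \int_0^1 \int_0^1 \Pr(U > u, V > v)\, du\, dv$ together with $\Pr(U > u, V > v) = 1 - u - v + C(u, v)$.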

  18. Outline: (1) Introduction, (2) Geometry of Random Walk Time Series, (3) The Hierarchical Block Model, (4) Conclusion.
