
Uncertain Time-Series Similarity: Return to the Basics



  1. Uncertain Time-Series Similarity: Return to the Basics Dallachiesa et al., VLDB 2012 Li Xiong, CS730

  2. Problem • Problem: uncertain time-series similarity • Applications: – location tracking of moving objects; traffic monitoring; remote sensing • Uncertain time series are pervasive – Imprecision of sensor observations – Privacy-preserving transformations • Similarity matching is the basis for many analysis and mining tasks – Clustering – Shapelet – Motif – …

  3. Overview • Review of 3 state-of-the-art techniques for similarity matching in uncertain time series – MUNICH, PROUD, DUST • Experimental comparison of the techniques for similarity matching on 17 real (perturbed) datasets • Two additional (simple) similarity measures which unexpectedly outperform the state of the art • Discussion of research directions

  4. Modeling/Representing uncertain time-series • Repeated measurements (samples) • Probability density function (pdf) over the uncertain values

  5. Modeling/Representing uncertain time-series • Repeated measurements (samples)

  6. Modeling/Representing uncertain time-series • Probability density function (pdf) over the uncertain values

  7. Similarity metrics • Euclidean Distance (ED) • Dynamic Time Warping (DTW)
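The two metrics named on this slide can be sketched for ordinary (certain) series as follows. This is a minimal illustration, not code from the paper: the function names `euclidean` and `dtw` are hypothetical, and the DTW variant shown (squared point costs, no warping-window constraint, square root of the accumulated cost) is one common formulation chosen here as an assumption.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length series."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dtw(x, y):
    """Dynamic Time Warping distance via the classic O(n*m) dynamic program.

    Assumption: squared point-wise costs and no warping-window constraint.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # x[i-1] repeats a match with y
                                 cost[i][j - 1],      # y[j-1] repeats a match with x
                                 cost[i - 1][j - 1])  # both series advance
    return math.sqrt(cost[n][m])
```

Because the diagonal path is one of the admissible warping paths, this DTW value never exceeds the Euclidean distance for equal-length inputs, and it can also compare series of different lengths.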

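  8. Similarity based range query • Range query: given a collection of time series C, a query sequence Q and a distance threshold ε, find every series S in C whose distance to Q is at most ε • Probabilistic range query: additionally given a probability threshold τ, find every S in C with Pr(distance(Q, S) ≤ ε) ≥ τ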

  9. State-of-the-Art • MUNICH – Repeated observation model • PROUD – Random variable model • DUST – Random variable model

  10. MUNICH • Repeated observation model • Euclidean distance (Lp-norm) and Dynamic Time Warping (DTW)

  11. MUNICH • Materialize uncertain sequences X and Y to all possible certain sequences • Define the set of distances between all possible sequences • Uncertain distance

  12. MUNICH • Naïve computation: exponential computation cost (note the typo) (CAO Chen, DB Group, CSE, HKUST, 21/2/2011)
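The naïve computation can be sketched directly: materialize every certain sequence, compute all pairwise distances, and take the fraction that falls within ε. The function name `munich_naive_probability` and the data layout (one list of observations per timestamp) are assumptions made for this illustration; treating all materializations as equally likely is also an assumption.

```python
import itertools
import math

def munich_naive_probability(x_samples, y_samples, eps):
    """Fraction of materialized sequence pairs with Euclidean distance <= eps.

    x_samples[i] / y_samples[i] hold the repeated observations at timestamp i.
    The loops visit every combination, so the cost grows exponentially
    with the number of timestamps.
    """
    hits = 0
    total = 0
    for xs in itertools.product(*x_samples):      # one certain sequence for X
        for ys in itertools.product(*y_samples):  # one certain sequence for Y
            total += 1
            if math.dist(xs, ys) <= eps:
                hits += 1
    return hits / total
```

With s samples per timestamp and n timestamps, each series has s^n materializations, which is why the bounding techniques on the following slides are needed.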

  13. MUNICH • Lower bounding and upper bounding the distance/probability • Approximate the samples using minimum bounding intervals

  14. MUNICH • Minimum bounding interval

  15. MUNICH • Compute upper bound and lower bound of distances between all possible interval sequences
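A simplified sketch of the bounding idea: replace the samples at each timestamp by their minimum bounding interval, then bound the Euclidean distance from the closest and farthest points of the two intervals. The helper names are hypothetical, and this single-interval version is an assumption for illustration; it is not claimed to match the exact approximation scheme used by MUNICH.

```python
import math

def bounding_intervals(samples):
    """Minimum bounding interval (min, max) of the samples at each timestamp."""
    return [(min(s), max(s)) for s in samples]

def interval_distance_bounds(x_iv, y_iv):
    """Lower and upper bound on the Euclidean distance between any two
    certain sequences drawn from the given interval sequences."""
    lo_sq = 0.0
    hi_sq = 0.0
    for (xl, xu), (yl, yu) in zip(x_iv, y_iv):
        gap = max(0.0, xl - yu, yl - xu)           # 0 when the intervals overlap
        far = max(abs(xu - yl), abs(yu - xl))      # farthest pair of endpoints
        lo_sq += gap ** 2
        hi_sq += far ** 2
    return math.sqrt(lo_sq), math.sqrt(hi_sq)
```

Every materialized distance lies between the two returned values, so a pair whose lower bound already exceeds ε can be discarded without enumerating any sequences.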

  16. MUNICH • Recall uncertain distance and probabilistic range query • Compute lower bound and upper bound for Pr

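  17. MUNICH • Pruning based on the lower and upper bounds of Pr – True hit: the lower bound already reaches τ – True drop: the upper bound stays below τ • Stepwise refinement of the remaining candidates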

  18. PROUD • Pdf model and Euclidean distance • Probabilistic distance model

  19. PROUD • Probabilistic distance model • The distance approaches a normal distribution when the number of time points is sufficiently large (central limit theorem)

  20. PROUD • Recall the probabilistic range query • Express the CDF of the normal distribution through the error function and compute the probability • Compute the normalized ε and test it against the limit implied by τ
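The normal approximation can be sketched as below. The function names are hypothetical. Assumptions made for this illustration: the observed values are the means of the random variables, every point of a series has the same error standard deviation, and the errors are normal (used only to obtain the variance of each squared difference).

```python
import math

def proud_probability(x, y, sigma_x, sigma_y, eps):
    """Approximate Pr(dist(X, Y) <= eps) with a normal model of dist^2."""
    mean_sq = 0.0  # E[dist^2]
    var_sq = 0.0   # Var[dist^2], summed over independent timestamps
    var = sigma_x ** 2 + sigma_y ** 2        # variance of one difference D_i
    for a, b in zip(x, y):
        mu = a - b                           # mean of D_i
        mean_sq += mu ** 2 + var             # E[D_i^2]
        var_sq += 2 * var ** 2 + 4 * mu ** 2 * var  # Var[D_i^2] for normal D_i
    eps_norm = (eps ** 2 - mean_sq) / math.sqrt(var_sq)  # normalized epsilon
    # Standard normal CDF written with the error function
    return 0.5 * (1.0 + math.erf(eps_norm / math.sqrt(2.0)))

def proud_range_match(x, y, sigma_x, sigma_y, eps, tau):
    """Probabilistic range query test: Pr(dist <= eps) >= tau."""
    return proud_probability(x, y, sigma_x, sigma_y, eps) >= tau
```

When ε² equals the expected squared distance, the normalized ε is 0 and the estimated probability is 0.5, which gives a quick sanity check of the model.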

  21. DUST • Probability model • DUST similarity metric • Bayesian probability computation

  22. DUST: A Generalized Notion of Similarity between Uncertain Time Series Smruti R. Sarangi and Karin Murthy IBM Research Labs, Bangalore, India

  23. Resolving the Question • Which series is closer to T1: T2 or T3? – Euclidean distance (EUCL) and Dynamic Time Warping (DTW): T3 – DUST: T2 • [Figure: time series T1, T2 and T3; x-axis: time, y-axis: value] • T2 should be closer to T1 than T3 – This is because it is possible that T2 and T1 are the same time series; T2 just has some additional error – T3 and T1 can never be the same time series because the last value has a very large divergence

  24. Extending Prior Work
      • Prior work: two time series are considered similar if $P(\mathrm{DIST}(T_1, T_2) \le \varepsilon) \ge \tau$, where
        $\mathrm{DIST}(T_1, T_2) = \sqrt{\sum_i \mathrm{dist}(T_1[i], T_2[i])^2}$ and $\mathrm{dist}(x, y) = |x - y|$
      • Assumption: $P(\mathrm{DIST}(T_1, T_2) \le \varepsilon) = p(\mathrm{DIST}(T_1, T_2) = 0)\,\varepsilon$ (irrespective of the size of $\varepsilon$)

  25. Some Algebra
      $P(\mathrm{DIST}(T_1, T_2) \le \varepsilon) > P(\mathrm{DIST}(T_1, T_3) \le \varepsilon)$
      $\approx\; p(\mathrm{DIST}(T_1, T_2) = 0) > p(\mathrm{DIST}(T_1, T_3) = 0)$
      $\prod_i p(\mathrm{dist}(T_1[i], T_2[i]) = 0) > \prod_i p(\mathrm{dist}(T_1[i], T_3[i]) = 0)$
      $\sum_i -\log p(\mathrm{dist}(T_1[i], T_2[i]) = 0) < \sum_i -\log p(\mathrm{dist}(T_1[i], T_3[i]) = 0)$
      • dist(x, y) is only dependent on |x − y| (proved in the paper), so each term is $-\log(\varphi(|T_1[i] - T_2[i]|))$ with $\varphi(x) = p(\mathrm{dist}(0, x) = 0)$
      • Definition: $\mathrm{dust}(x, y) = -\log(\varphi(|x - y|)) + \log(\varphi(0))$
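A numerical sketch of the point-wise definition as written on this slide. Everything specific here is an assumption for illustration: the names `normal_pdf`, `phi` and `dust`, normal error distributions, a uniform distribution of the true values, and trapezoidal integration. Under those assumptions φ(d) is the integral, over a shared true value v, of the product of the two error densities.

```python
import math

def normal_pdf(e, sigma):
    """Density of a zero-mean normal error with standard deviation sigma."""
    return math.exp(-0.5 * (e / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def phi(d, sigma_x, sigma_y, steps=4000):
    """phi(d): density of 'both observations share one true value' when the
    observations differ by d (uniform prior on the true value assumed)."""
    half = 10.0 * max(sigma_x, sigma_y) + abs(d)   # integration window
    lo, hi = -half, half
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        v = lo + k * h                              # candidate true value
        w = 0.5 if k in (0, steps) else 1.0         # trapezoid end weights
        total += w * normal_pdf(0.0 - v, sigma_x) * normal_pdf(d - v, sigma_y)
    return total * h

def dust(x, y, sigma_x, sigma_y):
    """dust(x, y) = -log(phi(|x - y|)) + log(phi(0)), as defined on the slide."""
    d = abs(x - y)
    return -math.log(phi(d, sigma_x, sigma_y)) + math.log(phi(0.0, sigma_x, sigma_y))
```

For normal errors the integral has a closed form, giving dust(x, y) = (x − y)² / (2(σx² + σy²)) under this definition, so `dust(0, 2, 1, 1)` should come out very close to 1. Other error distributions can be tried by swapping `normal_pdf`.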

  26. DUST • Compute • Bayes Theorem • Require – Data distribution (uniform) – Error distribution

  27. Comparison • Common assumption: the values at different timestamps are independent – Correlations are neglected

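  28. Comparison

      |                      | MUNICH                      | PROUD                       | DUST                                     |
      |----------------------|-----------------------------|-----------------------------|------------------------------------------|
      | Uncertainty modeling | Multiple observations       | Random variable             | Random variable                          |
      | A priori knowledge   | –                           | Mean and standard deviation | Data distribution and error distribution |
      | Distance metric      | Euclidean, DTW              | Euclidean                   | DUST, Euclidean, DTW                     |
      | Similarity queries   | Probabilistic range queries | Probabilistic range queries | kNN queries                              |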

  29. Experimental Study • Data – 17 real datasets from UCR: time series with exact values as ground truth – Synthetic (not real) perturbation with uniform, normal and exponential error distributions • Similarity matching: probabilistic range queries • Metric: F1 score • Baseline: Euclidean distance
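The F1 metric for one query can be sketched as below. The function name `f1_score` and the use of series identifiers are assumptions; here `relevant` stands for the answer set computed on the exact (ground-truth) series and `retrieved` for the answer set a technique returns on the perturbed series.

```python
def f1_score(retrieved, relevant):
    """Harmonic mean of precision and recall for one query's result set."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        return 0.0  # also covers empty result sets
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)
```

For example, returning series {1, 2, 3} when the ground truth is {2, 3, 4} gives precision and recall of 2/3 each, so F1 is also 2/3.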

  30. Moving average filters • Uncertain moving average (UMA) – give less weight to observations with a larger error standard deviation • Uncertain exponential moving average (UEMA) – give more weight to the nearest neighbors
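The two filters can be sketched as weighted moving averages. The function names, the window half-width `w`, the decay parameter `lam`, and the normalization by the sum of weights are assumptions made for this illustration; the paper's exact formulas may differ in their normalization.

```python
import math

def uma(values, sigmas, w):
    """Uncertain moving average: each observation in the window is weighted
    by 1 / sigma, so noisier observations count less."""
    n = len(values)
    out = []
    for i in range(n):
        window = range(max(0, i - w), min(n, i + w + 1))
        weights = [1.0 / sigmas[k] for k in window]
        smoothed = sum(wt * values[k] for wt, k in zip(weights, window))
        out.append(smoothed / sum(weights))   # assumption: normalize by weight sum
    return out

def uema(values, sigmas, w, lam):
    """Uncertain exponential moving average: the 1 / sigma weight is further
    multiplied by exp(-lam * |k - i|), so closer time points count more."""
    n = len(values)
    out = []
    for i in range(n):
        window = range(max(0, i - w), min(n, i + w + 1))
        weights = [math.exp(-lam * abs(k - i)) / sigmas[k] for k in window]
        smoothed = sum(wt * values[k] for wt, k in zip(weights, window))
        out.append(smoothed / sum(weights))
    return out
```

After filtering both series, the ordinary Euclidean distance can be applied to the smoothed values. With `lam = 0` the exponential filter reduces to the plain uncertain moving average.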

  31. Discussion • Experiment and Analysis track paper • Good analytical and experimental survey • Unexpected results

  32. Discussion • What’s realistic prior knowledge to assume? • How to model correlations between time points?
