Uncertain Time-Series Similarity: Return to the Basics Dallachiesa - PowerPoint PPT Presentation

Uncertain Time-Series Similarity: Return to the Basics Dallachiesa et al., VLDB 2012 Li Xiong, CS730

Problem • Problem: uncertain time-series similarity • Applications: – location tracking of moving objects; traffic monitoring; remote sensing • Uncertain time series are pervasive – Imprecision of sensor observations – Privacy-preserving transformations • Similarity matching is the basis for many analysis and mining tasks – Clustering – Shapelet – Motif – …

Overview • Review of 3 state-of-the-art techniques for similarity matching in uncertain time series – MUNICH, PROUD, DUST • Experimental comparison of the techniques for similarity matching on 17 real (perturbed) datasets • Two additional (simple) similarity measures which unexpectedly outperform the state of the art • Discussion of research directions

Modeling/Representing uncertain time-series • Repeated measurements (samples) • Probability density function (pdf) over the uncertain values

Similarity metrics • Euclidean Distance (ED) • Dynamic Time Warping (DTW)
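The two metrics named on this slide can be sketched for ordinary (certain) sequences as follows. This is a minimal illustration, not code from the paper: the function names `euclidean` and `dtw` are chosen here, and the DTW variant shown (squared point cost, no warping window, square root of the accumulated cost) is one common formulation among several.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw(x, y):
    """Dynamic Time Warping distance via the classic O(n*m) dynamic program."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best accumulated squared cost aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # x[i-1] matched again
                                 cost[i][j - 1],      # y[j-1] matched again
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])
```

With this formulation, DTW never exceeds the Euclidean distance on equal-length inputs, because the diagonal alignment is one of the warping paths it considers.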

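Similarity-based range query • Range query: given a collection of time series C, a query sequence Q and a distance threshold ε, find all series S in C whose distance to Q is at most ε • Probabilistic range query: find all series S in C for which Pr(DIST(Q, S) ≤ ε) ≥ τ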

State-of-the-Art • MUNICH – Repeated observation model • PROUD – Random variable model • DUST – Random variable model

MUNICH • Repeated observation model • Euclidean distance (Lp-norm) and Dynamic Time Warping (DTW)

MUNICH • Materialize uncertain sequences X and Y to all possible certain sequences • Define the set of distances between all possible sequences • Uncertain distance

MUNICH • Naïve computation: exponential computation cost (note the typo) [CAO Chen, DB Group, CSE, HKUST, 21/2/2011]
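The naïve computation can be sketched as below, assuming the repeated observation model is stored as one list of samples per timestamp and that every materialized sequence is equally likely. The function name `naive_match_probability` and this data layout are illustrative assumptions. The number of materialized sequences is the product of the per-timestamp sample counts, which is where the exponential cost comes from.

```python
import itertools
import math


def naive_match_probability(x_samples, y_samples, eps):
    """Fraction of materialized (x, y) sequence pairs within distance eps.

    x_samples[i] and y_samples[i] hold the repeated observations at
    timestamp i. Every combination of one sample per timestamp is one
    possible certain sequence.
    """
    worlds_x = list(itertools.product(*x_samples))
    worlds_y = list(itertools.product(*y_samples))
    hits = sum(
        1
        for wx in worlds_x
        for wy in worlds_y
        if math.dist(wx, wy) <= eps  # Euclidean distance between two worlds
    )
    return hits / (len(worlds_x) * len(worlds_y))
```

For example, with `x_samples = [[0, 1], [0, 1]]` and `y_samples = [[0], [0]]` there are four possible sequences for X and one for Y; three of the four pairs lie within distance 1, so a probabilistic range query with ε = 1 and τ ≤ 0.75 would accept the pair.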

MUNICH • Lower bounding and upper bounding the distance/probability • Approximate the samples using minimum bounding intervals

MUNICH • Minimum bounding interval

MUNICH • Compute upper bound and lower bound of distances between all possible interval sequences
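A minimal sketch of the interval-based bounds, assuming one minimum bounding interval per timestamp and Euclidean aggregation. The names `bounding_intervals`, `distance_bounds` and `prune` are hypothetical, and the published method is more refined than this (for instance, it refines undecided candidates stepwise).

```python
import math


def bounding_intervals(samples):
    """Minimum bounding interval (min, max) of the samples at each timestamp."""
    return [(min(s), max(s)) for s in samples]


def distance_bounds(x_samples, y_samples):
    """Lower and upper bound on the Euclidean distance over all possible worlds."""
    lo2 = hi2 = 0.0
    pairs = zip(bounding_intervals(x_samples), bounding_intervals(y_samples))
    for (xl, xh), (yl, yh) in pairs:
        gap = max(0.0, xl - yh, yl - xh)  # 0 when the intervals overlap
        span = max(xh - yl, yh - xl)      # farthest pair of interval endpoints
        lo2 += gap ** 2
        hi2 += span ** 2
    return math.sqrt(lo2), math.sqrt(hi2)


def prune(x_samples, y_samples, eps):
    """Decide a candidate from the bounds alone where that is possible."""
    lo, hi = distance_bounds(x_samples, y_samples)
    if hi <= eps:
        return "hit"     # every possible world is within eps
    if lo > eps:
        return "drop"    # no possible world is within eps
    return "refine"      # bounds are inconclusive
```

Only candidates that come back as `"refine"` need the expensive computation over materialized sequences.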

MUNICH • Recall uncertain distance and probabilistic range query • Compute lower bound and upper bound for Pr

MUNICH • Pruning based on lower and upper bound – True hit – True drop • Stepwise refinement

PROUD • Pdf model and Euclidean distance • Probabilistic distance model

PROUD • Probabilistic distance model • The distance approaches a normal distribution when the number of time points is sufficiently large (central limit theorem)

PROUD • Recall probabilistic range query • Express the CDF of the normal distribution through the error function and compute the probability • Compute the normalized epsilon and test it against the probability threshold
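The normal approximation can be sketched as follows. Several things here are assumptions made for illustration rather than details taken from the slides: each point is described by a mean and a standard deviation, the per-point differences are treated as independent Gaussians (which fixes the variance formula used below), and the function names are hypothetical.

```python
import math


def proud_match_probability(mu_x, sd_x, mu_y, sd_y, eps):
    """Approximate Pr(DIST(X, Y) <= eps) with a normal law on DIST^2."""
    mean = var = 0.0
    for mx, sx, my, sy in zip(mu_x, sd_x, mu_y, sd_y):
        d = mx - my              # difference of the expected values
        v = sx * sx + sy * sy    # variance of the difference
        mean += d * d + v                   # E[(X_i - Y_i)^2]
        var += 2 * v * v + 4 * d * d * v    # Var[(X_i - Y_i)^2], Gaussian case
    if var == 0.0:
        # No uncertainty at all: the distance is deterministic.
        return 1.0 if mean <= eps * eps else 0.0
    z = (eps * eps - mean) / math.sqrt(var)         # normalized epsilon
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # standard normal CDF


def proud_is_match(mu_x, sd_x, mu_y, sd_y, eps, tau):
    """Probabilistic range test: accept when the probability reaches tau."""
    return proud_match_probability(mu_x, sd_x, mu_y, sd_y, eps) >= tau
```

Comparing the probability with τ is equivalent to comparing the normalized epsilon `z` with the τ-quantile of the standard normal distribution, which avoids evaluating the CDF for every candidate.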

DUST • Probability model • DUST similarity metric • Bayesian probability computation

DUST: A Generalized Notion of Similarity between Uncertain Time Series Smruti R. Sarangi and Karin Murthy IBM Research Labs, Bangalore, India

Resolving the Question • Which is closer to T1: T2 or T3? – Euclidean distance (EUCL) and Dynamic Time Warping (DTW): T3 – DUST: T2 • [Figure: value over time for T1, T2 and T3] • T2 should be closer to T1 than T3 – This is because it is possible that T2 and T1 are the same time series. T2 just has some additional error. – T3 and T1 can never be the same time series because the last value has a very large divergence

Extending Prior Work • Prior work: two time series are considered similar if $P(\mathrm{DIST}(T_1, T_2) \le \varepsilon) \ge \tau$ – $\mathrm{DIST}(T_1, T_2) = \sqrt{\sum_i \mathrm{dist}(T_1[i], T_2[i])^2}$ – $\mathrm{dist}(x, y) = |x - y|$ • Assumption: $P(\mathrm{DIST}(T_1, T_2) \le \varepsilon) = p(\mathrm{DIST}(T_1, T_2) = 0)\,\varepsilon$ (irrespective of the size of $\varepsilon$)

Some Algebra • $P(\mathrm{DIST}(T_1, T_2) \le \varepsilon) > P(\mathrm{DIST}(T_1, T_3) \le \varepsilon)$ • $\approx\; p(\mathrm{DIST}(T_1, T_2) = 0) > p(\mathrm{DIST}(T_1, T_3) = 0)$ • $\prod_i p(\mathrm{dist}(T_1[i], T_2[i]) = 0) > \prod_i p(\mathrm{dist}(T_1[i], T_3[i]) = 0)$ • $\sum_i -\log p(\mathrm{dist}(T_1[i], T_2[i]) = 0) < \sum_i -\log p(\mathrm{dist}(T_1[i], T_3[i]) = 0)$ • $p(\mathrm{dist}(x, y) = 0)$ is only dependent on $|x - y|$ (proved in the paper), so each term is $-\log \varphi(|T_1[i] - T_2[i]|)$ with $\varphi(x) = p(\mathrm{dist}(0, x) = 0)$ • Definition: $\mathrm{dust}(x, y) = -\log \varphi(|x - y|) + \log \varphi(0)$

DUST • Compute φ • Bayes' theorem • Requires – Data distribution (uniform) – Error distribution
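One way to make the definition concrete is to evaluate φ numerically. The sketch below assumes a uniform data distribution and the same error density at both points, so that φ(d) is proportional to the integral over the true value r of `err_pdf(0 - r) * err_pdf(d - r)`. The constant of proportionality cancels in dust. All function names are hypothetical, and summing per-point dust values over a series follows the sum in the algebra slide, which may differ from the paper's exact normalization.

```python
import math


def gaussian_pdf(sigma):
    """Return a zero-mean normal error density with standard deviation sigma."""
    def pdf(e):
        return math.exp(-0.5 * (e / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return pdf


def phi(d, err_pdf, half_width=10.0, steps=4000):
    """Unnormalized phi(d): trapezoid rule over the shared true value r."""
    lo = min(0.0, d) - half_width
    hi = max(0.0, d) + half_width
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        r = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        total += w * err_pdf(0.0 - r) * err_pdf(d - r)
    return total * h


def dust(x, y, err_pdf):
    """dust(x, y) = -log(phi(|x - y|)) + log(phi(0))."""
    return -math.log(phi(abs(x - y), err_pdf)) + math.log(phi(0.0, err_pdf))


def dust_series(xs, ys, err_pdf):
    """Sum of per-point dust values (illustrative aggregation)."""
    return sum(dust(x, y, err_pdf) for x, y in zip(xs, ys))
```

Under these assumptions a normal error with standard deviation σ gives dust(x, y) = (x − y)² / (4σ²), so the measure grows with the squared difference, as squared Euclidean distance does. Heavier-tailed error densities penalize a single large divergence differently.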

Comparison • Common assumption: value at each timestamp independent – Correlations neglected

Comparison
| | MUNICH | PROUD | DUST |
| Uncertainty modeling | Multiple observations | Random variable | Random variable |
| A priori knowledge | – | Mean and standard deviation | Data distribution and error distribution |
| Distance metric | Euclidean, DTW | Euclidean | DUST, Euclidean, DTW |
| Similarity queries | Probabilistic range queries | Probabilistic range queries | kNN queries |

Experimental Study • Data – 17 real datasets from UCR: time series with exact values as ground truth – Perturbation (so the uncertainty itself is not real) with uniform, normal and exponential error distributions • Similarity matching: probabilistic range queries • Metric: F1 score • Baseline: Euclidean distance
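For reference, the F1 metric combines the precision and recall of a query answer against the ground-truth answer. The helper below is a generic sketch; the name `f1_score` and the use of sets of series identifiers are assumptions, not details from the study.

```python
def f1_score(returned, relevant):
    """F1 of a returned answer set against the ground-truth answer set."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)
    if true_positives == 0:
        return 0.0  # also covers empty answer sets
    precision = true_positives / len(returned)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)
```

Here `relevant` would be the result of the range query on the exact (unperturbed) series, and `returned` the result of the same query on the perturbed series under the technique being evaluated.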

Moving average filters • Uncertain moving average (UMA) – give less weight to observations with a larger error standard deviation • Uncertain exponential moving average (UEMA) – give more weight to the nearest neighbors
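The two filters can be sketched as weighted averages over a window of 2w + 1 points. This is a normalized variant chosen for illustration: weights are 1/std for UMA and exp(-lam * |j - i|)/std for UEMA, divided by their sum, and windows are truncated at the ends of the series. The exact formulas in the paper may differ, and the names `uma`, `uema`, `w` and `lam` are assumptions.

```python
import math


def _filtered(values, stds, w, weight):
    """Weighted average of each point's window; stds must be positive."""
    n = len(values)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - w), min(n, i + w + 1)):
            wt = weight(i, j) / stds[j]  # noisier observations count less
            num += wt * values[j]
            den += wt
        out.append(num / den)
    return out


def uma(values, stds, w=2):
    """Uncertain moving average: uniform window, inverse-std weights."""
    return _filtered(values, stds, w, lambda i, j: 1.0)


def uema(values, stds, w=2, lam=1.0):
    """Uncertain exponential moving average: nearer neighbors weigh more."""
    return _filtered(values, stds, w, lambda i, j: math.exp(-lam * abs(j - i)))
```

Once both series are filtered, they can be compared with the plain Euclidean distance, which keeps these measures simple.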

Discussion • Experiment and Analysis track paper • Good analytical and experimental survey • Unexpected results

Discussion • What’s realistic prior knowledge to assume? • How to model correlations between time points?
