FTW: Fast Similarity Search under the Time Warping Distance
Yasushi Sakurai (NTT Cyber Space Labs), Masatoshi Yoshikawa (Nagoya Univ.), Christos Faloutsos (Carnegie Mellon Univ.)

Motivation
- Time-series data
  - Many applications: computational biology, astrophysics, geology, meteorology, multimedia, economics
- Similarity search
  - Euclidean distance
  - DTW (Dynamic Time Warping)
    - Useful for different sequence lengths
    - Different sampling rates
    - Scaling along the time axis
PODS 2005, Y. Sakurai et al.

Mini-introduction to DTW
- DTW allows sequences to be stretched along the time axis
  - Minimize the distance of sequences
  - Insert 'stutters' into a sequence
  - THEN compute the (Euclidean) distance
[Figure: the original sequences, and the same sequences with 'stutters' inserted]

Mini-introduction to DTW
- DTW is computed by dynamic programming
  - Warping path: set of grid cells in the time warping matrix
[Figure: time warping matrix for data sequence P of length N ($p_1, \dots, p_i, \dots, p_N$) and query sequence Q of length M ($q_1, \dots, q_j, \dots, q_M$), showing the optimum warping path (the best alignment) with p-stutters and q-stutters marked]

Mini-introduction to DTW
- DTW is computed by dynamic programming

$$D_{dtw}(P, Q) = f(N, M)$$

$$f(i, j) = \lvert p_i - q_j \rvert + \min \begin{cases} f(i, j-1) & \text{($p$-stutter)} \\ f(i-1, j) & \text{($q$-stutter)} \\ f(i-1, j-1) & \text{(no stutter)} \end{cases}$$

Here $f(i, j)$ covers the prefixes $p_1, p_2, \dots, p_i$ and $q_1, q_2, \dots, q_j$.
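The recurrence can be tried out with a short dynamic-programming sketch. This is an illustration, not the authors' code: the function name `dtw_distance`, the boundary condition f(0, 0) = 0 with all other border cells set to infinity, and the absolute difference as the pointwise cost are assumptions, since the slide does not spell them out.

```python
import math


def dtw_distance(p, q):
    """Exact DTW distance f(N, M) for two numeric sequences (sketch)."""
    n, m = len(p), len(q)
    # f[i][j] is the warping distance between prefixes p[:i] and q[:j].
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0  # assumed boundary condition
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(p[i - 1] - q[j - 1])  # assumed pointwise cost
            f[i][j] = cost + min(
                f[i][j - 1],      # p-stutter
                f[i - 1][j],      # q-stutter
                f[i - 1][j - 1],  # no stutter
            )
    return f[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The two sequences in the usage line differ only by one repeated value, so a single stutter aligns them exactly. The table has (N+1)(M+1) cells, which is the O(NM) cost that motivates the rest of the talk.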

Mini-introduction to DTW
- Global constraints limit the warping scope
  - Warping scope: area that the warping path is allowed to visit
[Figure: two time warping matrices for P and Q, one showing the Sakoe-Chiba Band and one showing the Itakura Parallelogram]

Mini-introduction to DTW
- Width of the warping scope W is user-defined
[Figure: Sakoe-Chiba Band drawn on the time warping matrix of P and Q for two widths, $W_1$ and $W_2$]
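A band constraint can be sketched by skipping every cell outside the warping scope. The version below is a hedged illustration: the name `dtw_band` and the convention that cell (i, j) is allowed only when |i - j| <= w are assumptions, as is the absolute-difference cost. The slides only say that W is user-defined.

```python
import math


def dtw_band(p, q, w):
    """DTW restricted to cells with |i - j| <= w (assumed band convention)."""
    n, m = len(p), len(q)
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0  # assumed boundary condition
    for i in range(1, n + 1):
        # Only visit the columns of row i that lie inside the band.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(p[i - 1] - q[j - 1])  # assumed pointwise cost
            f[i][j] = cost + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    return f[n][m]


p = [0, 1, 1, 1]
q = [0, 0, 0, 1]
for w in (0, 1, 2):
    print(w, dtw_band(p, q, w))
```

Widening the band can only lower the distance, because every path allowed under a narrow band is still allowed under a wider one. Under this convention the result is infinite whenever w is smaller than the difference in sequence lengths, since no path can then reach the final cell.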

Motivation
- Similarity search for time-series data
  - DTW (Dynamic Time Warping)
    - Scaling along the time axis
But…
- High search cost: O(NM)
  - Prohibitive for long sequences

Our Solution, FTW
- Requirements:
  1. Fast
  2. No false dismissals
  3. No restriction on the sequence length
     - It should handle data sequences of different lengths
  4. Support for any, as well as for no, restriction on "warping scope"

Problem Definition
- Given
  - S time-series data sequences of unequal lengths $\{P_1, P_2, \dots, P_S\}$,
  - a query sequence Q,
  - an integer k,
  - (optionally) a warping scope W,
- Find the k-nearest neighbors of Q from the data sequence set by using DTW with W

Overview
- Introduction
- Related work
- Main ideas
- Experimental results
- Conclusions

Related Work
- Sequence indexing
  - Agrawal et al. (FODO 1998)
  - Keogh et al. (SIGMOD 2001)
  - …
- Subsequence matching
  - Faloutsos et al. (SIGMOD 1994)
  - Moon et al. (SIGMOD 2002)
  - …

Related Work
- Fast sequence matching for DTW
  - Yi et al. (ICDE 1998)
  - Kim et al. (ICDE 2001)
  - Chu et al. (SDM 2002)
  - Keogh (VLDB 2002)
  - Zhu et al. (SIGMOD 2003)
  - …
- None of the existing methods for DTW fulfills all the requirements

Main Idea (1) - LBS
- LBS (Lower Bounding distance measure with Segmentation)
- $P^A$: approximate sequence
  - $p_i^R$: segment range
  - $p_i^U$: upper value
  - $p_i^L$: lower value
  - $p_i^R = (p_i^L : p_i^U)$
  - t: length of time intervals
[Figure: a sequence approximated by segment ranges $p_1^R, p_2^R, p_3^R, p_4^R$, each covering a time interval of length t]
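One way to build such an approximate sequence is to take the minimum and maximum of each interval. This is a minimal sketch: the name `approximate` and the handling of a shorter final interval (when the length is not a multiple of t) are assumptions not stated on the slide.

```python
def approximate(seq, t):
    """Return P^A as a list of (lower, upper) ranges, one per interval of length t.

    The last interval may be shorter than t (an assumption of this sketch).
    """
    return [
        (min(seq[start:start + t]), max(seq[start:start + t]))
        for start in range(0, len(seq), t)
    ]


print(approximate([1, 3, 2, 5, 4, 4, 0], 3))
```

Each range is guaranteed to contain every original value of its interval, which is the property the lower bound relies on.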

Main Idea (1) - LBS
- Compute lower bounding distance
  - Distance of the two ranges $p_i^R$ and $q_j^R$: distance of their two closest points
[Figure: two Value-vs-Time panels showing ranges $p_i^R$ and $q_j^R$, one labeled "Lower bound" and one labeled "Lower bound = 0"]

Main Idea (1) - LBS details
- Compute lower bounding distance
  - Distance of the two ranges $p_i^R$ and $q_j^R$: distance of their two closest points

$$D_{seg}(p_i^R, q_j^R) = \begin{cases} p_i^L - q_j^U & (p_i^L > q_j^U) \\ q_j^L - p_i^U & (q_j^L > p_i^U) \\ 0 & (\text{otherwise}) \end{cases}$$
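The three cases translate directly into code. In this sketch the name `d_seg` and the representation of a range as a `(lower, upper)` tuple are assumptions made for illustration.

```python
def d_seg(p_range, q_range):
    """Distance between the closest points of two (lower, upper) ranges."""
    p_lower, p_upper = p_range
    q_lower, q_upper = q_range
    if p_lower > q_upper:   # p's range lies entirely above q's range
        return p_lower - q_upper
    if q_lower > p_upper:   # q's range lies entirely above p's range
        return q_lower - p_upper
    return 0                # the ranges overlap


print(d_seg((4, 5), (1, 3)), d_seg((1, 3), (4, 5)), d_seg((1, 4), (3, 6)))
```

Because every original value lies inside its range, no pair of points drawn from the two segments can be closer than this value.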

Main Idea (1) - LBS
- Exact DTW distance
[Figure: sequences P and Q and their time warping matrix]

Main Idea (1) - LBS
- Compute lower bounding distance from $P^A$ and $Q^A$
- Use a dynamic programming approach

$$D_{lbs}(P^A, Q^A) \le D_{dtw}(P, Q)$$

[Figure: approximate sequences $P^A$ and $Q^A$ and their coarse time warping matrix]
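A hedged sketch of the dynamic program over approximate sequences follows. It assumes each sequence is already given as a list of `(lower, upper)` ranges, uses the closest-point distance between ranges as the cell cost, and counts each coarse cell once. That unweighted form is a simplification chosen here because it is easy to verify as a lower bound under an absolute-difference cost: the exact warping path must visit at least one fine cell in every coarse cell it crosses. The slides do not show the exact recurrence, so the original method may use a tighter form; the name `d_lbs` is also an assumption.

```python
import math


def d_lbs(pa, qa):
    """Lower bound on DTW computed from lists of (lower, upper) ranges."""
    n, m = len(pa), len(qa)
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0  # assumed boundary condition
    for i in range(1, n + 1):
        p_lower, p_upper = pa[i - 1]
        for j in range(1, m + 1):
            q_lower, q_upper = qa[j - 1]
            # Closest-point distance between the two ranges (0 if they overlap).
            seg = max(p_lower - q_upper, q_lower - p_upper, 0)
            g[i][j] = seg + min(g[i][j - 1], g[i - 1][j], g[i - 1][j - 1])
    return g[n][m]


# Ranges for P = [1, 3, 2, 5, 4, 4] and Q = [0, 0, 1, 9, 9, 9] with t = 3.
print(d_lbs([(1, 3), (4, 5)], [(0, 1), (9, 9)]))
```

The coarse matrix here has 2 x 2 cells instead of 6 x 6, which is where the speed-up comes from when the bound is enough to discard a candidate.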

Main Idea (2) - EarlyStopping
- Exploit the fact that we have found k-near neighbors at distance $d_{cb}$
  - $d_{cb}$: k-nearest neighbor distance (the Current Best), i.e. the exact distance of the best k candidates so far

Main Idea (2) - EarlyStopping
- Exclude useless warping paths by using $d_{cb}$
  - Omit $g(1,3)$ if $g(1,2) > d_{cb}$
  - Omit $g(4,1)$ if $g(3,1) > d_{cb}$
[Figure: coarse time warping matrix over $P^A$ and $Q^A$ with cells g(1,2) and g(3,1) marked]
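The pruning rule can be sketched as a dynamic program that marks any cell whose cumulative value exceeds $d_{cb}$ as unreachable and gives up once a whole row is unreachable. This is an illustrative reading of the slide, not the authors' implementation: the name `d_lbs_early_stop`, the `(lower, upper)` range representation, and the unweighted cell cost are assumptions. Pruning is safe because cell costs are non-negative, so a path that already exceeds $d_{cb}$ can never come back under it.

```python
import math


def d_lbs_early_stop(pa, qa, d_cb):
    """Lower bound from range lists; returns inf as soon as it must exceed d_cb."""
    m = len(qa)
    prev = [0.0] + [math.inf] * m  # row 0 of the matrix
    for p_lower, p_upper in pa:
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            best = min(cur[j - 1], prev[j], prev[j - 1])
            if best == math.inf:
                continue  # every predecessor was pruned: omit this cell
            q_lower, q_upper = qa[j - 1]
            g = best + max(p_lower - q_upper, q_lower - p_upper, 0)
            if g <= d_cb:
                cur[j] = g  # cells above d_cb stay at inf (pruned)
        if all(value == math.inf for value in cur):
            return math.inf  # no path can stay within d_cb
        prev = cur
    return prev[m]


pa, qa = [(1, 3), (4, 5)], [(0, 1), (9, 9)]
print(d_lbs_early_stop(pa, qa, 10), d_lbs_early_stop(pa, qa, 3))
```

When the true value is at most `d_cb` the function returns it unchanged; otherwise it returns infinity, which is all the search needs in order to discard the candidate.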

Main Idea (3) - Refinement
- Q: How to choose t (length of time intervals)?
- A: Use multiple granularities, as follows:
[Figure: coarse time warping matrix over $P^A$ and $Q^A$ with interval length t, cells g(1,2) and g(3,1) marked]

Main Idea (3) - Refinement
- Compute the lower bounding distance from the coarsest sequences as the first refinement step
- Ignore P if $D_{lbs}(P^A, Q^A) > d_{cb}$, otherwise:
[Figure: coarsest time warping matrix over $P^A$ and $Q^A$, cells g(1,2) and g(3,1) marked]

Main Idea (3) - Refinement
- … compute the distance from more accurate sequences as the second refinement step
- … repeat
[Figure: finer time warping matrix over $P^A$ and $Q^A$]

Main Idea (3) - Refinement
- … until the finest granularity
- Update the list of k-nearest neighbors if $D_{dtw}(P, Q) \le d_{cb}$
[Figure: exact time warping matrix over P and Q]
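The refinement steps can be put together in a compact k-nearest-neighbor sketch. Everything named here (`warp`, `dtw`, `approximate`, `d_seg`, `d_lbs`, `ftw_knn`, and the small default `intervals`) is hypothetical and chosen for illustration. The sketch also assumes an absolute-difference cost, an unweighted lower bound, and no warping-scope constraint, and it leaves out EarlyStopping for brevity.

```python
import heapq
import math


def warp(n, m, cost):
    """Generic warping DP over an n-by-m grid; cost(i, j) uses 0-based indices."""
    prev = [0.0] + [math.inf] * m
    for i in range(n):
        cur = [math.inf] * (m + 1)
        for j in range(m):
            cur[j + 1] = cost(i, j) + min(cur[j], prev[j + 1], prev[j])
        prev = cur
    return prev[m]


def dtw(p, q):
    return warp(len(p), len(q), lambda i, j: abs(p[i] - q[j]))


def approximate(seq, t):
    return [(min(seq[s:s + t]), max(seq[s:s + t])) for s in range(0, len(seq), t)]


def d_seg(p_range, q_range):
    (p_lower, p_upper), (q_lower, q_upper) = p_range, q_range
    return max(p_lower - q_upper, q_lower - p_upper, 0)


def d_lbs(pa, qa):
    return warp(len(pa), len(qa), lambda i, j: d_seg(pa[i], qa[j]))


def ftw_knn(data, query, k, intervals=(8, 2)):
    """Return the k nearest sequences as sorted (distance, index) pairs."""
    best = []  # max-heap through negated distances: (-distance, index)
    query_approx = [approximate(query, t) for t in intervals]
    for index, p in enumerate(data):
        d_cb = -best[0][0] if len(best) == k else math.inf
        # Refinement: coarsest interval first, stop at the first bound above d_cb.
        if any(d_lbs(approximate(p, t), qa) > d_cb
               for t, qa in zip(intervals, query_approx)):
            continue
        d = dtw(p, query)  # finest granularity: the exact distance
        if d <= d_cb:
            if len(best) == k:
                heapq.heapreplace(best, (-d, index))
            else:
                heapq.heappush(best, (-d, index))
    return sorted((-neg, index) for neg, index in best)


data = [[0, 1, 2, 3, 4, 5, 6, 7], [10] * 8, [0, 1, 2, 3, 4, 5, 6, 8], [5, 5, 5, 5]]
print(ftw_knn(data, [0, 1, 2, 3, 4, 5, 6, 7], k=2))
```

Because every filter is a lower bound on the exact distance, a sequence is only skipped when it cannot beat the current k-th best, so the result matches a brute-force scan (no false dismissals). In the usage example the last sequence has a different length from the query and is discarded by the finer approximation without an exact DTW computation.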

Experimental results
- Setup
  - Intel Xeon 2.8GHz, 1GB memory, Linux
  - Datasets: Temperature, Fintime, RandomWalk
  - Four different time intervals (for n = 2048): $t_1 = 2$, $t_2 = 8$, $t_3 = 32$, $t_4 = 128$
- Evaluation
  - Compared FTW with LB_PAA (the best so far)
  - Mainly computation time

Outline of experiments
- Speed vs db size
- Speed vs warping scope W
- Effect of filtering
- Effect of varying-length data sequences

Search Performance
- Itakura Parallelogram
[Figure: Itakura Parallelogram warping scope on the time warping matrix of P and Q]

Search Performance
- Wall clock time as a function of data set size
- Temperature
FTW is up to 50 times faster!

Search Performance
- Wall clock time as a function of data set size
- Fintime
FTW is up to 40 times faster!

Search Performance
- Wall clock time as a function of data set size
- RandomWalk
FTW is up to 40 times faster! More effective as the size grows.

Search Performance
- Sakoe-Chiba Band
[Figure: Sakoe-Chiba Band warping scope on the time warping matrix of P and Q for two widths, $W_1$ and $W_2$]

Search Performance
- Wall clock time as a function of warping scope
- Temperature
FTW is up to 220 times faster!

Search Performance
- Wall clock time as a function of warping scope
- Fintime
FTW is up to 70 times faster!

Search Performance
- Wall clock time as a function of warping scope
- RandomWalk
FTW is up to 100 times faster!

Effect of filtering
- Most data sequences are excluded by the coarser approximations ($t_4 = 128$ and $t_3 = 32$)
  - Using multiple granularities has significant advantages
[Figure: frequency of approximation use]

Difference in Sequence Lengths
- 5 sequence data sets
  - Random (2048, 0): length 2048 +/- 0
  - Random (2048, 32): length 2048 +/- 16
  - Random (2048, 64), Random (2048, 128), Random (2048, 256)
[Figure annotations: "Outperform by 2+ orders of magnitude"; "LB_PAA can not handle"]
