FTW: Fast Similarity Search under the Time Warping Distance
Yasushi Sakurai (NTT Cyber Space Labs), Masatoshi Yoshikawa (Nagoya Univ.), Christos Faloutsos (Carnegie Mellon Univ.)
Motivation
- Time-series data
  - Many applications: computational biology, astrophysics, geology, meteorology, multimedia, economics
- Similarity search
  - Euclidean distance
  - DTW (Dynamic Time Warping)
    - Useful for different sequence lengths
    - Different sampling rates
    - Scaling along the time axis
PODS 2005, Y. Sakurai et al.
Mini-introduction to DTW
- DTW allows sequences to be stretched along the time axis
  - Minimize the distance of sequences
  - Insert 'stutters' into a sequence
  - THEN compute the (Euclidean) distance
[Figure: original sequence and the same sequence with 'stutters' inserted]
Mini-introduction to DTW
- DTW is computed by dynamic programming
  - Warping path: set of grid cells in the time warping matrix
[Figure: time warping matrix for data sequence P of length N ($p_1, \dots, p_i, \dots, p_N$) and query sequence Q of length M ($q_1, \dots, q_j, \dots, q_M$), showing the optimum warping path (the best alignment) with p-stutters and q-stutters]
Mini-introduction to DTW
- DTW is computed by dynamic programming
  - $f(i, j)$ covers the prefixes $p_1, p_2, \dots, p_i$ and $q_1, q_2, \dots, q_j$

$$D_{dtw}(P, Q) = f(N, M)$$

$$f(i, j) = |p_i - q_j| + \min \begin{cases} f(i, j-1) & \text{($p$-stutter)} \\ f(i-1, j) & \text{($q$-stutter)} \\ f(i-1, j-1) & \text{(no stutter)} \end{cases}$$
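As a minimal sketch of the recurrence on this slide, the Python function below fills the time warping matrix cell by cell. The name `dtw`, the absolute difference as the pointwise cost, and the boundary conditions (f(0, 0) = 0 and infinity on the rest of the border) are assumptions made for illustration; the slide does not spell them out.

```python
import math


def dtw(p, q):
    """DTW distance between sequences p and q in O(N*M) time and space."""
    n, m = len(p), len(q)
    # f[i][j] is the DTW distance of the prefixes p[:i] and q[:j].
    # Assumed boundary: f[0][0] = 0, every other border cell is infinite.
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(p[i - 1] - q[j - 1])  # assumed pointwise cost
            f[i][j] = cost + min(
                f[i][j - 1],      # p-stutter: p_i is matched again
                f[i - 1][j],      # q-stutter: q_j is matched again
                f[i - 1][j - 1],  # no stutter
            )
    return f[n][m]
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` is zero because repeating the middle element of the shorter sequence aligns the two exactly.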
Mini-introduction to DTW
- Global constraints limit the warping scope
  - Warping scope: area that the warping path is allowed to visit
[Figure: warping scope in the time warping matrix of P and Q for the Sakoe-Chiba Band and for the Itakura Parallelogram]
Mini-introduction to DTW
- Width of the warping scope W is user-defined
[Figure: Sakoe-Chiba Band in the time warping matrix of P and Q with two widths, $W_1$ and $W_2$]
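The effect of a warping scope can be sketched by restricting the dynamic program to a band around the diagonal. The version below is an illustration under assumptions: the name `dtw_band`, the integer width parameter `w`, the simple band rule |i - j| <= w, and the absolute-difference cost are not taken from the slides. With this rule the result is infinite whenever `w` is smaller than the difference in sequence lengths.

```python
import math


def dtw_band(p, q, w):
    """DTW restricted to cells with |i - j| <= w (a Sakoe-Chiba style band)."""
    n, m = len(p), len(q)
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit the columns inside the band for this row.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(p[i - 1] - q[j - 1])
            f[i][j] = cost + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    # Cells outside the band keep the value infinity.
    return f[n][m]
```

With `w = 0` and equal lengths no warping is possible, so the result is the sum of pointwise differences; widening the band lets the path stretch further along the time axis.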
Motivation
- Similarity search for time-series data
  - DTW (Dynamic Time Warping)
    - Scaling along the time axis
- But…
  - High search cost: O(NM)
  - Prohibitive for long sequences
Our Solution, FTW
- Requirements:
  1. Fast
  2. No false dismissals
  3. No restriction on the sequence length
     - It should handle data sequences of different lengths
  4. Support for any "warping scope", as well as for no restriction on it
Problem Definition
- Given
  - S time-series data sequences of unequal lengths $\{P_1, P_2, \dots, P_S\}$,
  - a query sequence Q,
  - an integer k,
  - (optionally) a warping scope W,
- Find the k-nearest neighbors of Q from the data sequence set by using DTW with W
Overview
- Introduction
- Related work
- Main ideas
- Experimental results
- Conclusions
Related Work
- Sequence indexing
  - Agrawal et al. (FODO 1993)
  - Keogh et al. (SIGMOD 2001)
  - …
- Subsequence matching
  - Faloutsos et al. (SIGMOD 1994)
  - Moon et al. (SIGMOD 2002)
  - …
Related Work
- Fast sequence matching for DTW
  - Yi et al. (ICDE 1998)
  - Kim et al. (ICDE 2001)
  - Chu et al. (SDM 2002)
  - Keogh (VLDB 2002)
  - Zhu et al. (SIGMOD 2003)
  - …
- None of the existing methods for DTW fulfills all the requirements
Main Idea (1) - LBS
- LBS (Lower Bounding distance measure with Segmentation)
- $P^A$: approximate sequences
  - $p_i^R$: segment range
  - $p_i^U$: upper value
  - $p_i^L$: lower value
  - $t$: length of time intervals*

$$p_i^R = (p_i^L : p_i^U)$$
[Figure: approximate sequence $P^A$ with segment ranges $p_1^R, p_2^R, p_3^R, p_4^R$, each covering a time interval of length $t$]
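One way to build such an approximate sequence is sketched below: cut the sequence into consecutive intervals of length t and keep the lower and upper value of each interval. The name `segment_ranges`, the tuple layout `(lower, upper)`, and the handling of a shorter final segment are assumptions for illustration.

```python
def segment_ranges(seq, t):
    """Return one (lower, upper) range per consecutive segment of length t.

    The final segment may be shorter than t when len(seq) is not a
    multiple of t (an assumption; the slide does not cover this case).
    """
    return [
        (min(seq[s:s + t]), max(seq[s:s + t]))
        for s in range(0, len(seq), t)
    ]
```

For instance, `segment_ranges([1, 3, 2, 5, 4, 4, 0], 2)` yields four ranges, the last one covering only the single value 0.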
Main Idea (1) - LBS
- Compute lower bounding distance
  - Distance of the two ranges $p_i^R$ and $q_j^R$: distance of their two closest points
[Figure: value vs. time for ranges $p_i^R$ and $q_j^R$; the lower bound is the gap between disjoint ranges and 0 when the ranges overlap]
Main Idea (1) - LBS details
- Compute lower bounding distance
  - Distance of the two ranges $p_i^R$ and $q_j^R$: distance of their two closest points

$$D_{seg}(p_i^R, q_j^R) = \begin{cases} p_i^L - q_j^U & (p_i^L > q_j^U) \\ q_j^L - p_i^U & (q_j^L > p_i^U) \\ 0 & (\text{otherwise}) \end{cases}$$
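The case distinction translates directly into code. In this sketch each range is assumed to be a `(lower, upper)` tuple, and the name `d_seg` is chosen only to mirror the slide's notation.

```python
def d_seg(p_range, q_range):
    """Distance between the two closest points of two value ranges."""
    p_lo, p_hi = p_range
    q_lo, q_hi = q_range
    if p_lo > q_hi:   # p's range lies entirely above q's range
        return p_lo - q_hi
    if q_lo > p_hi:   # q's range lies entirely above p's range
        return q_lo - p_hi
    return 0          # the ranges overlap
```

The function is symmetric in its arguments, and it returns 0 exactly when the two ranges share at least one value.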
Main Idea (1) - LBS
- Exact DTW distance
[Figure: time warping matrix of the original sequences P and Q]
Main Idea (1) - LBS
- Compute lower bounding distance from $P^A$ and $Q^A$
- Use a dynamic programming approach

$$D_{lbs}(P^A, Q^A) \le D_{dtw}(P, Q)$$
[Figure: coarse time warping matrix of the approximate sequences $P^A$ and $Q^A$]
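The lower-bounding property can be illustrated with a simplified variant: run the same dynamic program over segment ranges, using the range distance as the cell cost. This is a sketch, not necessarily the authors' exact formulation (it applies no weighting by segment length, for example). All function names and the absolute-difference cost are assumptions. The bound holds here because every cell of the exact warping path lies in some segment block, and its cost is at least the range distance of that block.

```python
import math


def segment_ranges(seq, t):
    """(lower, upper) range of each consecutive segment of length t."""
    return [(min(seq[s:s + t]), max(seq[s:s + t]))
            for s in range(0, len(seq), t)]


def d_seg(p_range, q_range):
    """Distance between the closest points of two ranges."""
    if p_range[0] > q_range[1]:
        return p_range[0] - q_range[1]
    if q_range[0] > p_range[1]:
        return q_range[0] - p_range[1]
    return 0


def warp(n, m, cost):
    """Warping dynamic program over an n x m grid with cell cost(i, j)."""
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = cost(i - 1, j - 1) + min(
                g[i][j - 1], g[i - 1][j], g[i - 1][j - 1])
    return g[n][m]


def dtw(p, q):
    """Exact DTW with absolute difference as the cell cost."""
    return warp(len(p), len(q), lambda i, j: abs(p[i] - q[j]))


def lbs(pa, qa):
    """Simplified lower bound computed on approximate sequences."""
    return warp(len(pa), len(qa), lambda i, j: d_seg(pa[i], qa[j]))
```

With `t = 1` every range collapses to a single value, so `lbs` coincides with `dtw`; larger `t` gives a cheaper but looser bound.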
Main Idea (2) - EarlyStopping
- Exploit the fact that we have already found k near neighbors within distance $d_{cb}$
  - $d_{cb}$: k-nearest neighbor distance (the Current Best), i.e. the largest exact distance among the best k candidates so far
Main Idea (2) - EarlyStopping
- Exclude useless warping paths by using $d_{cb}$
  - Omit $g(1,3)$ if $g(1,2) > d_{cb}$
  - Omit $g(4,1)$ if $g(3,1) > d_{cb}$
[Figure: matrix of $P^A$ and $Q^A$ with cells $g(1,2)$ and $g(3,1)$ marked]
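The pruning rule can be sketched on the plain DTW recurrence: a cell whose accumulated value exceeds the current best is treated as unreachable, and the computation stops once a whole row is unreachable. The name `dtw_early_stop`, the threshold parameter `d_cb`, and the absolute-difference cost are assumptions for illustration. The slide applies the same idea to the values g computed on approximate sequences.

```python
import math


def dtw_early_stop(p, q, d_cb):
    """Return the DTW distance if it is <= d_cb, otherwise infinity."""
    n, m = len(p), len(q)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            best = min(cur[j - 1], prev[j], prev[j - 1])
            if best == math.inf:
                continue  # no surviving path reaches this cell
            value = abs(p[i - 1] - q[j - 1]) + best
            if value <= d_cb:
                cur[j] = value  # cells above d_cb stay pruned
        if all(v == math.inf for v in cur):
            return math.inf  # every path already exceeds d_cb
        prev = cur
    return prev[m]
```

Because cell costs are non-negative, accumulated values never decrease along a path, so a pruned cell cannot be part of any path that ends within `d_cb`.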
Main Idea (3) - Refinement
- Q: How to choose $t$ (length of time intervals)?
- A: Use multiple granularities, as follows:
[Figure: matrix of $P^A$ and $Q^A$ with segments of length $t$ and cells $g(1,2)$ and $g(3,1)$ marked]
Main Idea (3) - Refinement
- Compute the lower bounding distance from the coarsest sequences as the first refinement step
- Ignore P if $D_{lbs}(P^A, Q^A) > d_{cb}$, otherwise:
[Figure: coarsest matrix of $P^A$ and $Q^A$ with cells $g(1,2)$ and $g(3,1)$ marked]
Main Idea (3) - Refinement
- … compute the distance from more accurate sequences as the second refinement step
- … repeat
[Figure: finer matrix of $P^A$ and $Q^A$]
Main Idea (3) - Refinement
- … until the finest granularity
- Update the list of k-nearest neighbors if $D_{dtw}(P, Q) \le d_{cb}$
[Figure: exact time warping matrix of P and Q]
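The refinement loop can be sketched end to end as a k-nearest-neighbor search that tries coarse bounds first and falls back to exact DTW only for survivors. Everything here is illustrative: the function names, the default granularities, the unweighted lower bound, and the absolute-difference cost are assumptions. A real system would presumably precompute the approximations of the data sequences instead of rebuilding them per query.

```python
import heapq
import math


def segment_ranges(seq, t):
    return [(min(seq[s:s + t]), max(seq[s:s + t]))
            for s in range(0, len(seq), t)]


def d_seg(pr, qr):
    if pr[0] > qr[1]:
        return pr[0] - qr[1]
    if qr[0] > pr[1]:
        return qr[0] - pr[1]
    return 0


def warp(n, m, cost):
    """Warping dynamic program over an n x m grid with cell cost(i, j)."""
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = cost(i - 1, j - 1) + min(
                g[i][j - 1], g[i - 1][j], g[i - 1][j - 1])
    return g[n][m]


def dtw(p, q):
    return warp(len(p), len(q), lambda i, j: abs(p[i] - q[j]))


def lbs(pa, qa):
    return warp(len(pa), len(qa), lambda i, j: d_seg(pa[i], qa[j]))


def knn_search(data, query, k, granularities=(128, 32, 8, 2)):
    """Return the k nearest sequences as sorted (distance, index) pairs."""
    q_approx = {t: segment_ranges(query, t) for t in granularities}
    best = []  # max-heap via negated distances: (-distance, index)
    for idx, p in enumerate(data):
        d_cb = -best[0][0] if len(best) == k else math.inf
        # Coarsest granularity first; any() stops at the first bound
        # that already exceeds the current best distance.
        if d_cb < math.inf and any(
                lbs(segment_ranges(p, t), q_approx[t]) > d_cb
                for t in granularities):
            continue
        d = dtw(p, query)  # finest level: exact distance
        if len(best) < k:
            heapq.heappush(best, (-d, idx))
        elif d <= d_cb:
            heapq.heapreplace(best, (-d, idx))
    return sorted((-neg_d, idx) for neg_d, idx in best)
```

Since each bound never exceeds the exact distance in this sketch, skipping a sequence whose bound is above `d_cb` cannot discard a true neighbor, which is the "no false dismissals" requirement.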
Experimental results
- Setup
  - Intel Xeon 2.8GHz, 1GB memory, Linux
  - Datasets: Temperature, Fintime, RandomWalk
  - Four different time intervals (for $n = 2048$): $t_1 = 2$, $t_2 = 8$, $t_3 = 32$, $t_4 = 128$
- Evaluation
  - Compared FTW with LB_PAA (the best so far)
  - Mainly computation time
Outline of experiments
- Speed vs db size
- Speed vs warping scope W
- Effect of filtering
- Effect of varying-length data sequences
Search Performance
- Itakura Parallelogram
[Figure: Itakura Parallelogram warping scope in the time warping matrix of P and Q]
Search Performance
- Wall clock time as a function of data set size
- Temperature
  - FTW is up to 50 times faster!
Search Performance
- Wall clock time as a function of data set size
- Fintime
  - FTW is up to 40 times faster!
Search Performance
- Wall clock time as a function of data set size
- RandomWalk
  - FTW is up to 40 times faster!
  - More effective as the size grows
Search Performance
- Sakoe-Chiba Band
[Figure: Sakoe-Chiba Band warping scope in the time warping matrix of P and Q with two widths, $W_1$ and $W_2$]
Search Performance
- Wall clock time as a function of warping scope
- Temperature
  - FTW is up to 220 times faster!
Search Performance
- Wall clock time as a function of warping scope
- Fintime
  - FTW is up to 70 times faster!
Search Performance
- Wall clock time as a function of warping scope
- RandomWalk
  - FTW is up to 100 times faster!
Effect of filtering
- Most data sequences are excluded by the coarser approximations ($t_4 = 128$ and $t_3 = 32$)
  - Using multiple granularities has significant advantages
[Figure: frequency of approximation use]
Difference in Sequence Lengths
- 5 sequence data sets
  - Random (2048,0): length 2048 +/- 0
  - Random (2048,32): length 2048 +/- 16
  - Random (2048,64), Random (2048,128), Random (2048,256)
[Figure annotations: "Outperform by 2+ orders of magnitude"; "LB_PAA can not handle"]