Exact Indexing of Dynamic Time Warping
Eamonn Keogh
Computer Science & Engineering Department, University of California - Riverside, Riverside, CA 92521
eamonn@cs.ucr.edu
Fair Use Agreement
If you use these slides (or any part thereof) for any lecture or class, please send me an email, if possible with a pointer to the relevant web page or document. eamonn@cs.ucr.edu
Outline of Talk
• Why do Time Series Similarity Matching?
• Limitations of Euclidean Distance
• Dynamic Time Warping
• Lower Bounding Dynamic Time Warping
• Indexing Dynamic Time Warping
• Experimental Evaluation
• Conclusions
• Questions
Why do Time Series Similarity Matching?
• Clustering
• Classification
• Rule Discovery [Figure: example rule labelled 10 ⇒, s = 0.5, c = 0.3]
• Query by Content
Euclidean Vs Dynamic Time Warping
• Euclidean Distance: sequences are aligned “one to one”.
• “Warped” Time Axis: nonlinear alignments are possible.
Limitations of Euclidean Distance I: Classification
Classification experiment on the Cylinder-Bell-Funnel dataset
• Training data consists of 10 exemplars from each class.
• (One) Nearest Neighbor Algorithm
• “Leaving-one-out” evaluation, averaged over 100 runs
• Euclidean Distance error rate: 26.10%
• Dynamic Time Warping error rate: 2.87%
Limitations of Euclidean Distance II: Clustering
[Figure: clusterings of the days Monday through Sunday under Euclidean distance and under Dynamic Time Warping]
Wednesday was a national holiday.
Because of the robustness of Dynamic Time Warping compared to Euclidean Distance, it is used in…
• Bioinformatics: Aach, J. and Church, G. (2001). Aligning gene expression time series with time warping algorithms. Bioinformatics. Volume 17, pp 495-508.
• Robotics: Schmill, M., Oates, T. & Cohen, P. (1999). Learned models for continuous planning. In 7th International Workshop on Artificial Intelligence and Statistics.
• Medicine: Caiani, E.G., et al. (1998). Warped-average template technique to track on a cycle-by-cycle basis the cardiac filling phases on left ventricular volume. IEEE Computers in Cardiology.
• Chemistry: Gollmer, K., & Posten, C. (1995). Detection of distorted pattern using dynamic time warping algorithm and application for supervision of bioprocesses. IFAC CHEMFAS-4.
• Gesture Recognition: Gavrila, D. M. & Davis, L. S. (1995). Towards 3-d model-based tracking and recognition of human movement: a multi-view approach. In IEEE IWAFGR.
• Meteorology / Tracking / Biometrics / Astronomy / Finance / Manufacturing …
How is DTW Calculated?
$$DTW(Q, C) = \min \left\{ \sqrt{\sum_{k=1}^{K} w_k} \Big/ K \right\}$$
$$\gamma(i, j) = d(q_i, c_j) + \min\{ \gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1) \}$$
[Figure: sequences Q and C with the warping path w]
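The recurrence for γ(i, j) can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes d(q_i, c_j) is the squared difference of the two points, it omits the division by the path length K, and the function name `dtw` is hypothetical.

```python
import math

def dtw(q, c):
    """Unconstrained DTW between two numeric sequences (illustrative sketch)."""
    n, m = len(q), len(c)
    # gamma[i][j] holds the cumulative cost of the cheapest warping path
    # that ends by aligning q[i-1] with c[j-1]; row/column 0 is padding.
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2  # assumed point distance
            gamma[i][j] = d + min(gamma[i - 1][j - 1],  # diagonal step
                                  gamma[i - 1][j],      # step in q only
                                  gamma[i][j - 1])      # step in c only
    return math.sqrt(gamma[n][m])
```

Both loops run over the full n × m matrix, which is where the O(n²) cost for equal-length sequences comes from. Two sequences that differ only by a shift in time, such as `[0, 0, 1, 2, 1, 0]` and `[0, 1, 2, 1, 0, 0]`, get a distance of zero here, while their Euclidean distance is 2.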
DTW is much better than Euclidean distance for classification, clustering, query by content etc. But is it not true that “dynamic time warping cannot be speeded up by indexing*”, and is $O(n^2)$?
* Agrawal, R., Lin, K. I., Sawhney, H. S., & Shim, K. (1995). Fast similarity search in the presence of noise, scaling, and translation in time-series databases. VLDB pp. 490-501.
Global Constraints
• Slightly speed up the calculations
• Prevent pathological warpings
[Figure: warping matrices for C and Q under the Sakoe-Chiba Band and the Itakura Parallelogram]
A global constraint constrains the indices of the warping path $w_k = (i, j)_k$ such that $j - r \le i \le j + r$, where r is a term defining the allowed range of warping for a given point in a sequence.
[Figure: r illustrated for the Sakoe-Chiba Band and the Itakura Parallelogram]
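The constraint j - r ≤ i ≤ j + r can be illustrated by restricting the DTW matrix to a band around the diagonal. The sketch below assumes a constant r (the Sakoe-Chiba Band case), a squared point distance, and no normalization by path length; `dtw_band` is a hypothetical name.

```python
import math

def dtw_band(q, c, r):
    """DTW restricted to cells with |i - j| <= r (Sakoe-Chiba style band)."""
    n, m = len(q), len(c)
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        # Only columns inside the band are filled; the rest stay infinite,
        # so no warping path can pass through them.
        for j in range(max(1, i - r), min(m, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2  # assumed point distance
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    return math.sqrt(gamma[n][m])
```

With r = 0 and equal-length sequences only the diagonal is reachable, so the result is the Euclidean distance. Larger r admits more warping at the cost of filling more cells.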
Lower Bounding
We can speed up similarity search under DTW by using a lower bounding function.
Intuition: Try to use a cheap lower bounding calculation as often as possible. Only do the expensive, full calculations when it is absolutely necessary.
Algorithm Lower_Bounding_Sequential_Scan(Q)
    best_so_far = infinity;
    for all sequences in database
        LB_dist = lower_bound_distance(C_i, Q);
        if LB_dist < best_so_far
            true_dist = DTW(C_i, Q);
            if true_dist < best_so_far
                best_so_far = true_dist;
                index_of_best_match = i;
            endif
        endif
    endfor
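The scan above translates almost line for line into Python. In this sketch the two distance functions are passed in as parameters so that the block stays self-contained; the parameter names and the extra counter of full computations are assumptions added for illustration.

```python
import math

def lower_bounding_sequential_scan(query, database, lower_bound_distance, true_distance):
    """Return (index_of_best_match, best_so_far, number of full computations)."""
    best_so_far = math.inf
    index_of_best_match = None
    full_computations = 0
    for i, candidate in enumerate(database):
        # Cheap test first: a candidate whose lower bound is already no
        # better than the best match so far cannot become the new best.
        lb_dist = lower_bound_distance(candidate, query)
        if lb_dist < best_so_far:
            # Only now pay for the expensive distance.
            true_dist = true_distance(candidate, query)
            full_computations += 1
            if true_dist < best_so_far:
                best_so_far = true_dist
                index_of_best_match = i
    return index_of_best_match, best_so_far, full_computations
```

The result is exact only if `lower_bound_distance` never exceeds `true_distance` for any pair. The tighter the bound, the more candidates are pruned without a full computation.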
Lower Bound of Kim et al.
LB_Kim: The maximum of the squared differences between the two sequences' first (A), last (D), minimum (B) and maximum (C) points is returned as the lower bound.
[Figure: two sequences with the points A, B, C and D marked]
Kim, S, Park, S, & Chu, W. An index-based approach for similarity search supporting time warping in large sequence databases. ICDE 01, pp 607-614.
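A minimal sketch of LB_Kim follows. It assumes that the largest of the four squared feature differences is the value returned, and that this value is compared against the cumulative squared DTW cost (before any square root). The function name `lb_kim` is hypothetical.

```python
def lb_kim(q, c):
    """Largest squared difference between the four summary features."""
    # Feature tuple per sequence: first point, last point, minimum, maximum.
    features_q = (q[0], q[-1], min(q), max(q))
    features_c = (c[0], c[-1], min(c), max(c))
    return max((a - b) ** 2 for a, b in zip(features_q, features_c))
```

Because only four values per sequence are used, the bound is cheap to compute but ignores everything between the extremes.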
Lower Bound of Yi et al.
LB_Yi: The sum of the squared lengths of the gray lines represents the minimum contribution of the corresponding points to the overall DTW distance, and thus can be returned as the lower bounding measure.
[Figure: a sequence with gray lines drawn from its points lying above max(Q) or below min(Q)]
Yi, B, Jagadish, H & Faloutsos, C. Efficient retrieval of similar time sequences under time warping. ICDE 98, pp 23-27.
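LB_Yi can be sketched as a single pass over the candidate sequence. The sketch assumes the gray lines run from each point of C that lies outside the range [min(Q), max(Q)] to the nearer of those two levels, and it returns the sum of squares without taking a square root. The name `lb_yi` is hypothetical.

```python
def lb_yi(q, c):
    """Sum of squared distances from points of c outside [min(q), max(q)]."""
    q_max, q_min = max(q), min(q)
    total = 0.0
    for x in c:
        if x > q_max:
            total += (x - q_max) ** 2  # point lies above every point of q
        elif x < q_min:
            total += (x - q_min) ** 2  # point lies below every point of q
        # Points inside the range contribute nothing to the bound.
    return total
```

Any point of C above max(Q) must be aligned with some point of Q, and every such point is at least that far away, which is why each term is a safe contribution.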
What we have seen so far…
• Dynamic Time Warping (DTW) is a very robust technique for measuring time series similarity.
• DTW is widely used in diverse fields.
• Since DTW is expensive to calculate, techniques to speed up similarity search have been introduced, including global constraints and two different lower bounding techniques.
A Novel Lower Bounding Technique I
$U_i = \max(q_{i-r} : q_{i+r})$
$L_i = \min(q_{i-r} : q_{i+r})$
[Figure: C and Q with the envelope U, L around Q, shown for the Sakoe-Chiba Band and the Itakura Parallelogram]
A Novel Lower Bounding Technique II
$$LB\_Keogh(Q, C) = \sqrt{\sum_{i=1}^{n} \begin{cases} (c_i - U_i)^2 & \text{if } c_i > U_i \\ (c_i - L_i)^2 & \text{if } c_i < L_i \\ 0 & \text{otherwise} \end{cases}}$$
[Figure: C against the envelope U, L of Q, with the LB_Keogh contributions shown for the Sakoe-Chiba Band and the Itakura Parallelogram]
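The envelope and the resulting bound can be sketched together. This illustration assumes a constant r (Sakoe-Chiba Band), equal-length sequences, windows clipped at the sequence ends, and a square root over the summed squares. The function names `envelope` and `lb_keogh` are hypothetical.

```python
import math

def envelope(q, r):
    """Upper and lower envelopes: U_i = max(q[i-r..i+r]), L_i = min(q[i-r..i+r])."""
    n = len(q)
    # Windows are clipped where they would run past either end of q.
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower

def lb_keogh(q, c, r):
    """Distance from c to the envelope of q; zero where c lies inside it."""
    upper, lower = envelope(q, r)
    total = 0.0
    for c_i, u_i, l_i in zip(c, upper, lower):
        if c_i > u_i:
            total += (c_i - u_i) ** 2
        elif c_i < l_i:
            total += (c_i - l_i) ** 2
    return math.sqrt(total)
```

With r = 0 the envelope collapses onto Q itself and the bound equals the Euclidean distance. As r grows the envelope widens, so the bound can only shrink, matching the wider set of warping paths it has to cover.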
The tightness of the lower bound for each technique is proportional to the length of gray lines used in the illustrations.
[Figure: panels for LB_Kim, LB_Yi, LB_Keogh (Sakoe-Chiba) and LB_Keogh (Itakura)]
Before we consider the problem of indexing, let us empirically evaluate the quality of the proposed lower bounding technique. This is a good idea, since it is an implementation-free measure of quality. First we must discuss our experimental philosophy…
Experimental Philosophy
• We tested on 32 datasets from such diverse fields as finance, medicine, biometrics, chemistry, astronomy, robotics, networking and industry. The datasets cover the complete spectrum of stationary/non-stationary, noisy/smooth, cyclical/non-cyclical, symmetric/asymmetric etc.
• Our experiments are completely reproducible. We saved every random number, every setting and all data.
• To ensure true randomness, we use random numbers created by a quantum mechanical process.
• We test with the Sakoe-Chiba Band, which is the worst case for us (the Itakura Parallelogram would give us much better results).
Tightness of Lower Bound Experiment
• We measured T (the larger the better):
$$T = \frac{\text{Lower Bound Estimate of Dynamic Time Warp Distance}}{\text{True Dynamic Time Warp Distance}}, \qquad 0 \le T \le 1$$
• For each dataset, we randomly extracted 50 sequences of length 256. We compared each sequence to the 49 others. (Query length of 256 is about the mean in the literature.)
• For each dataset we report T as the average ratio from the 1,225 (50*49/2) comparisons made.
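The averaging step can be sketched as follows. The function takes the two distance functions as parameters so that it stays self-contained; skipping pairs whose true distance is zero is an assumption of this sketch, and the name `mean_tightness` is hypothetical.

```python
from itertools import combinations

def mean_tightness(sequences, lower_bound, true_distance):
    """Average of lower_bound / true_distance over all unordered pairs."""
    ratios = []
    for a, b in combinations(sequences, 2):
        true_dist = true_distance(a, b)
        if true_dist > 0:  # assumed: identical pairs are left out of the average
            ratios.append(lower_bound(a, b) / true_dist)
    if not ratios:
        raise ValueError("no pair with a non-zero true distance")
    return sum(ratios) / len(ratios)
```

For 50 sequences, `combinations` yields 50*49/2 = 1,225 unordered pairs, the number of comparisons stated on the slide.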
[Figure: tightness of lower bound T (0 to 1.0) for LB_Keogh, LB_Yi and LB_Kim on each of the 32 datasets]
Effect of Query Length on Tightness of Lower Bounds
[Figure: tightness of lower bound T (0 to 1.0) against query length (16, 32, 64, 128, 256, 512, 1024) for LB_Keogh, LB_Yi and LB_Kim]