Stream Monitoring under the Time Warping Distance
Yasushi Sakurai (NTT Cyber Space Labs), Christos Faloutsos (Carnegie Mellon Univ.), Masashi Yamamuro (NTT Cyber Space Labs)
Introduction
• Data-stream applications
  ◦ Network analysis
  ◦ Sensor monitoring
  ◦ Financial data analysis
  ◦ Moving object tracking
• Goal
  ◦ Monitor numerical streams
  ◦ Find subsequences similar to a given query sequence
  ◦ Distance measure: Dynamic Time Warping (DTW)
ICDE 2007, Y. Sakurai et al.
Introduction
• DTW is computed by dynamic programming
  ◦ Stretches the sequences along the time axis to minimize the distance
  ◦ Warping path: the set of grid cells in the time warping matrix that aligns X with Y
[Figure: time warping matrix with X = (x_1, …, x_N) along one axis and Y = (y_1, …, y_M) along the other; the optimal warping path (the best alignment) runs through the grid]
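The dynamic program behind the time warping matrix can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper. It assumes squared differences (x_i - y_j)^2 as the local cost, which is the cost the worked example later in these slides uses. The function name `dtw` is our own.

```python
import math
from typing import Sequence


def dtw(x: Sequence[float], y: Sequence[float]) -> float:
    """DTW distance between x and y, using (x_i - y_j)^2 as the local cost.

    d[i][j] = (x_i - y_j)^2 + min(d[i-1][j], d[i][j-1], d[i-1][j-1]),
    with d[0][0] = 0 and every other cell of row 0 / column 0 set to infinity.
    """
    n, m = len(x), len(y)
    # (n+1) x (m+1) time warping matrix; row 0 and column 0 are the boundary.
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of the three neighbouring warping paths.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


print(dtw([12, 6, 10, 6], [11, 6, 9, 4]))  # 6.0
```

Filling the whole matrix costs O(NM) time for one pair of sequences. In a stream, a fresh start is possible at every time-tick, which is exactly the cost problem the following slides address.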
Related Work
• Sequence indexing, subsequence matching
  ◦ Agrawal et al. (FODO 1998)
  ◦ Keogh et al. (SIGMOD 2001)
  ◦ Faloutsos et al. (SIGMOD 1994)
  ◦ Moon et al. (SIGMOD 2002)
• Fast sequence matching for DTW
  ◦ Yi et al. (ICDE 1998)
  ◦ Keogh (VLDB 2002)
  ◦ Zhu et al. (SIGMOD 2003)
  ◦ Sakurai et al. (PODS 2005)
Related Work
• Data stream processing for pattern discovery
  ◦ Clustering for data streams: Guha et al. (TKDE 2003)
  ◦ Monitoring multiple streams: Zhu et al. (VLDB 2002)
  ◦ Forecasting: Papadimitriou et al. (VLDB 2003)
  ◦ Detecting lag correlations: Sakurai et al. (SIGMOD 2005)
• DTW has so far been studied only for finite, stored sequence sets
• We address a new problem: subsequence matching under DTW over data streams
Overview
• Introduction / Related work
• Problem definition
• Main ideas
• Experimental results
Problem Definition
• Subsequence matching for data streams
  ◦ (Fixed-length) query sequence $Y = (y_1, y_2, \ldots, y_m)$
  ◦ Sequence (data stream) $X = (x_1, x_2, \ldots, x_n)$
  ◦ Find all subsequences $X[t_s:t_e]$ such that $D(X[t_s:t_e], Y) \le \varepsilon$
Subsequence Matching
[Figure: query Y = (y_1, …, y_m) matched against the stream X = (x_1, …, x_n); besides the best match X[t_s:t_e], other similar subsequences overlap it and are redundant and useless]
Problem Definition
• The definition above yields multiple matches: subsequences that heavily overlap with the "local minimum" best match. They do double harm:
  ◦ They flood the user with redundant information
  ◦ They slow down the algorithm by forcing it to keep track of and report all these useless "solutions"
• Eliminate the redundant subsequences, and report only the "optimal" ones
Problem Definition
Problem (disjoint query): given a threshold ε, report all $X[t_s:t_e]$ such that
1. $D(X[t_s:t_e], Y) \le \varepsilon$, and
2. $D(X[t_s:t_e], Y)$ is the smallest value in the group of overlapping subsequences that satisfy the first condition (only the local minimum is reported)
Additional challenges for a streaming solution:
• Process each new value of X efficiently
• Guarantee no false dismissals
• Report each match as early as possible
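To pin down what the two conditions ask for, here is a brute-force, offline reference in Python. It is a sketch under our own assumptions, not the streaming method of the paper:

- the local cost is squared;
- time-ticks are 1-based and inclusive;
- "group of overlapping subsequences" means the groups formed by chaining overlapping qualifying subsequences.

The function names are hypothetical. Because the code enumerates every subsequence, it is far more expensive than any streaming approach.

```python
import math
from typing import List, Sequence, Tuple

Match = Tuple[float, int, int]  # (distance, t_s, t_e), 1-based and inclusive


def dtw(x: Sequence[float], y: Sequence[float]) -> float:
    # Plain DTW with squared local cost (same recurrence as the time warping matrix).
    n, m = len(x), len(y)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = (x[i - 1] - y[j - 1]) ** 2 + min(
                d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def disjoint_query_bruteforce(x: Sequence[float], y: Sequence[float],
                              eps: float) -> List[Match]:
    n = len(x)
    # Condition 1: every subsequence X[t_s:t_e] with D(X[t_s:t_e], Y) <= eps.
    hits = [(dtw(x[ts - 1:te], y), ts, te)
            for ts in range(1, n + 1) for te in range(ts, n + 1)]
    hits = sorted((h for h in hits if h[0] <= eps), key=lambda h: (h[1], h[2]))
    # Condition 2: chain overlapping hits into groups; keep each group's minimum.
    result: List[Match] = []
    group: List[Match] = []
    group_end = 0
    for h in hits:
        if group and h[1] > group_end:  # starts after the group ends: new group
            result.append(min(group))
            group = []
        if not group:
            group_end = h[2]
        group.append(h)
        group_end = max(group_end, h[2])
    if group:
        result.append(min(group))
    return result


print(disjoint_query_bruteforce([5, 12, 6, 10, 6, 5, 13], [11, 6, 9, 4], 20))
# [(6.0, 2, 5)]
```

On the example stream used later in the deck, several overlapping subsequences fall within ε. Only the local minimum X[2:5] survives condition 2.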
Why not 'naive'?
• Compute a time warping matrix starting from every time-tick
  ◦ Needs O(n) matrices and O(nm) time per time-tick
[Figure: a separate matrix is started at each time-tick of X = (x_1, …, x_n); the matrix started at t = t_s captures the optimal subsequence X[t_s:t_e] against Y]
• Disjoint query
  ◦ The naive approach must compute all possible subsequences first, and only then choose the optimal ones
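The naive scheme can be sketched as follows. It is a minimal illustration that assumes a squared local cost, and the class name `NaiveMonitor` is hypothetical. A new DTW matrix is opened at every time-tick, and each arriving value updates the last column of every open matrix. After n ticks there are n matrices, so each tick costs O(nm).

```python
import math
from typing import List, Sequence


class NaiveMonitor:
    """Naive streaming matcher: one DTW matrix per possible start time-tick."""

    def __init__(self, y: Sequence[float]) -> None:
        self.y = list(y)
        # columns[k] is the latest column of the matrix started at t_s = k + 1;
        # index 0 is the DTW boundary row, indices 1..m follow y_1..y_m.
        self.columns: List[List[float]] = []

    def update(self, x_t: float) -> float:
        """Consume x_t; return the best D(X[t_s:t], Y) over all start ticks t_s."""
        m = len(self.y)
        # Open a new matrix starting at the current tick (boundary column).
        self.columns.append([0.0] + [math.inf] * m)
        best = math.inf
        for k, prev in enumerate(self.columns):  # O(n) matrices ...
            cur = [math.inf] * (m + 1)
            for i in range(1, m + 1):  # ... times O(m) cells each
                cost = (x_t - self.y[i - 1]) ** 2
                cur[i] = cost + min(prev[i], cur[i - 1], prev[i - 1])
            self.columns[k] = cur
            best = min(best, cur[m])
        return best


mon = NaiveMonitor([11, 6, 9, 4])
print([mon.update(v) for v in [5, 12, 6, 10, 6, 5, 13]])
# [54.0, 110.0, 14.0, 38.0, 6.0, 7.0, 88.0]
```

The memory use grows with the length of the stream. That growth is the cost the single-matrix idea removes.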
Main idea (1)
• Star-padding
  ◦ Use only a single matrix (the naïve solution uses n matrices)
  ◦ Prefix Y with '*', which always gives zero distance
  ◦ Instead of $Y = (y_1, y_2, \ldots, y_m)$, compute distances with $Y' = (y_0, y_1, y_2, \ldots, y_m)$, where $y_0 = (-\infty : +\infty)$
  ◦ O(m) time and space per time-tick (the naïve solution requires O(nm))
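Star-padding fits in a single rolling column, as this minimal sketch shows. It assumes a squared local cost, and the class name `StarPaddedDTW` is ours. The star row y_0 matches any value at zero cost, so d(t, 0) = 0 at every tick. A warping path can therefore leave the star row, which starts a match, at any time-tick. The value d(t, m) is the best distance over all start positions.

```python
import math
from typing import Sequence


class StarPaddedDTW:
    """One matrix for the whole stream, with query Y' = (*, y_1, ..., y_m)."""

    def __init__(self, y: Sequence[float]) -> None:
        self.y = list(y)
        # Column for t = 0: d(0, 0) = 0 and d(0, i) = inf for i >= 1.
        self.prev = [0.0] + [math.inf] * len(self.y)

    def update(self, x_t: float) -> float:
        """Consume x_t; return d(t, m) = min over t_s of D(X[t_s:t], Y)."""
        m = len(self.y)
        # Row 0 is the star y_0: it matches any value at zero cost, so d(t, 0) = 0
        # and a warping path may leave it (i.e. start a match) at any time-tick.
        cur = [0.0] + [math.inf] * m
        for i in range(1, m + 1):
            cost = (x_t - self.y[i - 1]) ** 2
            cur[i] = cost + min(cur[i - 1], self.prev[i], self.prev[i - 1])
        self.prev = cur  # only the previous column is kept: O(m) space
        return cur[m]


sp = StarPaddedDTW([11, 6, 9, 4])
print([sp.update(v) for v in [5, 12, 6, 10, 6, 5, 13]])
# [54.0, 110.0, 14.0, 38.0, 6.0, 7.0, 88.0]
```

These are the same values as the top row (y_4) of the worked example in the algorithm slides. They are computed with O(m) work per tick.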
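SPRING
[Figure: a single star-padded matrix over Y′ and X. Every cell in the bottom row starts at zero distance, so a match may begin at any time-tick; X[t_s:t_e] is reported (ticks t = 1, t = t_s and t = t_e are marked), and a second subsequence is found later in the stream]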
Main idea (2)
• STWM (Subsequence Time Warping Matrix)
  ◦ Problem with star-padding alone: we lose the information about the starting time-tick of the match
  ◦ After the scan, we cannot tell which subsequence is the optimal one
• Each element of the STWM holds
  ◦ the distance value of the best subsequence reaching that cell
  ◦ its starting position
• Combining star-padding with the STWM
  ◦ Efficiently identifies the optimal subsequence in a streaming fashion
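The following minimal sketch adds the STWM idea to the star-padded column. Each cell carries a starting position alongside its distance, and that position is inherited from whichever neighbour supplied the minimum. This is not the authors' code and rests on three assumptions:

- the local cost is squared;
- the class name `SubsequenceTWM` is hypothetical;
- ties are broken in a fixed order: the same tick in the row below first, then the previous tick in the same row, then the diagonal.

With this tie order the sketch reproduces the starting positions shown in the worked example.

```python
import math
from typing import List, Sequence, Tuple


class SubsequenceTWM:
    """Streaming STWM: each cell keeps a distance d(t, i) and a start s(t, i)."""

    def __init__(self, y: Sequence[float]) -> None:
        self.y = list(y)
        m = len(self.y)
        self.t = 0
        self.d = [0.0] + [math.inf] * m  # previous column of distances
        self.s = [0] * (m + 1)           # previous column of starting positions

    def update(self, x_t: float) -> Tuple[List[float], List[int]]:
        self.t += 1
        m = len(self.y)
        d = [0.0] + [math.inf] * m
        s = [self.t] + [0] * m  # leaving the star row now means starting at t
        for i in range(1, m + 1):
            best = min(d[i - 1], self.d[i], self.d[i - 1])
            d[i] = (x_t - self.y[i - 1]) ** 2 + best
            # Inherit the start of whichever neighbour supplied `best`
            # (tie order: same tick below, previous tick same row, diagonal).
            if d[i - 1] == best:
                s[i] = s[i - 1]
            elif self.d[i] == best:
                s[i] = self.s[i]
            else:
                s[i] = self.s[i - 1]
        self.d, self.s = d, s
        return d[1:], s[1:]


stwm = SubsequenceTWM([11, 6, 9, 4])
for x_t in [5, 12, 6, 10, 6, 5, 13]:
    print(stwm.update(x_t))  # (distances for y_1..y_4, starts for y_1..y_4)
```

Each call returns one column of the matrix, so the caller can read off both the distance of the best match ending now (the last entry) and where that match began.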
Main idea (3)
• Algorithm for disjoint queries
• Designed to:
  ◦ Guarantee no false dismissals
  ◦ Report each match as early as possible
Algorithm for disjoint queries
1. At every time-tick, update the m elements (distance and starting position)
2. When a subsequence within ε is found, keep track of the minimum distance d_min
3. Report the subsequence that gives d_min once both (a) and (b) are satisfied:
   (a) the captured optimal subsequence cannot be replaced by the upcoming subsequences
   (b) the upcoming subsequences do not overlap with the captured optimal subsequence
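The three steps can be combined into one streaming class, sketched below in Python. It reuses the STWM update and adds the report rule. The sketch rests on these assumptions, all ours:

- the local cost is squared;
- ties are broken in the order used in the example;
- conditions (a) and (b) are read as a single per-cell test. The captured match is reported once every current cell i has either d_i ≥ d_min or s_i > t_e. In the first case the cell can no longer beat the match, because local costs are non-negative, so a path's distance never decreases. In the second case the cell cannot overlap the match.

After a report, cells whose paths start at or before t_e are reset to infinity. The class name `Spring` is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


class Spring:
    """Disjoint-query monitor sketch: star-padding + STWM + report rule."""

    def __init__(self, y: Sequence[float], eps: float) -> None:
        self.y, self.eps = list(y), eps
        m = len(self.y)
        self.t = 0
        self.d = [0.0] + [math.inf] * m  # previous column; index 0 is the star row
        self.s = [0] * (m + 1)           # previous starting positions
        self.d_min, self.ts, self.te = math.inf, 0, 0  # captured best match

    def update(self, x_t: float) -> Optional[Tuple[float, int, int]]:
        self.t += 1
        m = len(self.y)
        d = [0.0] + [math.inf] * m
        s = [self.t] + [0] * m
        # Step 1: update the m elements (distance and starting position).
        for i in range(1, m + 1):
            best = min(d[i - 1], self.d[i], self.d[i - 1])
            d[i] = (x_t - self.y[i - 1]) ** 2 + best
            if d[i - 1] == best:
                s[i] = s[i - 1]
            elif self.d[i] == best:
                s[i] = self.s[i]
            else:
                s[i] = self.s[i - 1]
        report = None
        # Step 3: report once no cell can beat d_min while overlapping the match.
        if self.d_min <= self.eps and all(
                d[i] >= self.d_min or s[i] > self.te for i in range(1, m + 1)):
            report = (self.d_min, self.ts, self.te)
            self.d_min = math.inf
            for i in range(1, m + 1):  # drop paths that overlap the reported match
                if s[i] <= self.te:
                    d[i] = math.inf
        # Step 2: keep track of the best subsequence within eps.
        if d[m] <= self.eps and d[m] < self.d_min:
            self.d_min, self.ts, self.te = d[m], s[m], self.t
        self.d, self.s = d, s
        return report


spring = Spring([11, 6, 9, 4], eps=20)
print([spring.update(v) for v in [5, 12, 6, 10, 6, 5, 13]])
# [None, None, None, None, None, None, (6.0, 2, 5)]
```

On the example stream with ε = 20, the class proceeds as follows:

- At t = 3 it captures X[2:3] (distance 14).
- At t = 5 it replaces that capture with X[2:5] (distance 6).
- At t = 7 it returns X[2:5], after which d_2, d_3 and d_4 are reset.

A match still pending when the stream ends would need an explicit flush, which this sketch omits.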
Algorithm for disjoint queries
Example: X = (5, 12, 6, 10, 6, 5, 13), Y = (11, 6, 9, 4), ε = 20
Each cell shows the distance, followed by the starting position in parentheses.

y_4 = 4    54 (1)   110 (2)  14 (2)   38 (2)   6 (2)    7 (2)    88 (2)
y_3 = 9    53 (1)   46 (2)   10 (2)   2 (2)    10 (4)   17 (4)   18 (4)
y_2 = 6    37 (1)   37 (2)   1 (2)    17 (4)   1 (4)    2 (4)    51 (4)
y_1 = 11   36 (1)   1 (2)    25 (3)   1 (4)    25 (5)   36 (6)   4 (7)
x_t        5        12       6        10       6        5        13
t          1        2        3        4        5        6        7
Algorithm for disjoint queries
• In this example, three cells of the top row (y_4) fall within ε = 20, and all of them start at t = 2:
  ◦ Optimal subsequence: X[2:5], distance 6 (t = 5)
  ◦ Redundant subsequences: X[2:3], distance 14 (t = 3), and X[2:6], distance 7 (t = 6), which overlap the optimal one
Algorithm for disjoint queries
• To guarantee that the reported subsequence is the optimal one, report only when both conditions hold:
  (a) the captured optimal subsequence cannot be replaced
  (b) the upcoming subsequences do not overlap with the captured optimal subsequence
(Same example STWM as on the previous slides.)
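Conditions (a) and (b) can be checked by hand against the example matrix. We read them together as one rule: report once every cell i satisfies $d_i \ge d_{\min}$ or $s_i > t_e$. That reading is our interpretation, not a quotation from the paper. The captured match is X[2:5], with $d_{\min} = 6$ and $t_e = 5$. The check uses the STWM columns at t = 6 and t = 7:

```latex
\begin{aligned}
t = 6:&\quad d_2 = 2 < d_{\min} = 6 \text{ and } s_2 = 4 \le t_e = 5
  \;\Rightarrow\; \text{condition fails: do not report yet}\\
t = 7:&\quad s_1 = 7 > 5,\quad d_2 = 51 \ge 6,\quad d_3 = 18 \ge 6,\quad d_4 = 88 \ge 6
  \;\Rightarrow\; \text{report } X[2:5]
\end{aligned}
```

At t = 6, cell y_2 still holds a path that starts inside X[2:5] and costs less than 6, so in principle a better overlapping match could still appear. By t = 7 no such cell remains.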
Algorithm for disjoint queries
• Reporting the optimal subsequence
  ◦ The optimal subsequence X[2:5] is finally reported at t = 7
  ◦ The distance values whose paths overlap it are then reset: d_2 = 51, d_3 = 18 and d_4 = 88 become ∞
Experimental Results
• Experiments with real and synthetic data sets
  ◦ MaskedChirp, Temperature, Kursk, Sunspots
• Evaluation
  ◦ Accuracy for pattern discovery
  ◦ Computation time
  ◦ (Memory space consumption)
Pattern Discovery
• MaskedChirp
[Figure: query sequence and data stream]
Pattern Discovery
• MaskedChirp
  ◦ SPRING identifies all sound parts, even though their time periods vary
  ◦ The output time of each captured subsequence is very close to its end position
[Figure: query sequence and data stream]
Pattern Discovery
• Temperature
[Figure: query sequence and data stream]