Distance Measure for Querying Arrangements of Temporal Intervals, by Panagiotis Papapetrou. Orestis Kostakis, Panagiotis Papapetrou, and Jaakko Hollmén, Department of Information and Computer Science, Aalto University. May 27, 2011
Motivation: sign language similarity search.
Motivation. An expression in sign language consists of a set of event channels that are on or off over time. Each event is characterized by a label (e.g., eyebrow raise) and a duration, defined by a start point and an end point. Figure: An example of a wh-question expressed in sign language.
Motivation. Problem: how can the similarity of such representations be assessed? Figure: Two examples, (a) and (b), each showing event-intervals labeled A, B, and C.
Outline: Background, Method, Experiments, Conclusions, Discussion.
Background: Definitions. Sequences of interval-based events can represent a wide range of real-world sequences. Formally, an e-sequence is an ordered set $S = \{S_1, \ldots, S_n\}$, where each $S_i = (E_i, t^i_{start}, t^i_{end})$ is called an event-interval and $E_i \in \sigma$ is its label.
Figure: $S = \{(A, 1, 10), (B, 5, 13), (C, 17, 30), (A, 20, 26), (D, 24, 30)\}$, drawn as labeled intervals along a time axis.
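The definition above can be sketched as a small data structure. This is only an illustration: the names `EventInterval` and `make_e_sequence` are hypothetical, and the ordering rule (by start, then end, then label) is an assumption, since the slide only says the set is ordered.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EventInterval:
    """One event-interval S_i = (E_i, t_start, t_end)."""
    label: str
    start: int
    end: int


def make_e_sequence(intervals: Iterable[EventInterval]) -> List[EventInterval]:
    """Validate the intervals and return them as an ordered e-sequence.

    Assumed ordering (not stated on the slide): start, then end, then label.
    """
    items = list(intervals)
    for iv in items:
        if iv.start > iv.end:
            raise ValueError(f"interval ends before it starts: {iv}")
    return sorted(items, key=lambda iv: (iv.start, iv.end, iv.label))


# The e-sequence from the figure caption, given here in arbitrary order.
S = make_e_sequence([
    EventInterval("D", 24, 30),
    EventInterval("A", 1, 10),
    EventInterval("C", 17, 30),
    EventInterval("B", 5, 13),
    EventInterval("A", 20, 26),
])
```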
Background: Related Work. Existing work (e.g., Papapetrou et al. 2009, Moerchen 2010, Hoeppner 2001) focuses mainly on mining frequent patterns of interval-based events, mining association rules involving interval-based events, and mining semi-interval partial-order events. So far, no robust distance or similarity measure of any type has been formulated.
Problem: Example. How can the similarity of two e-sequences be assessed? Figure: How similar are these two e-sequences? Panels (a) and (b) each show event-intervals labeled A, B, and C.
Problem: Formulation. Given two e-sequences $S$ and $T$, define a distance measure $D$ such that, for all $S$ and $T$:
$D(S, T) \geq 0$ (1)
$D(S, S) = 0$ (2)
$D(S, T) = D(T, S)$ (3)
The degree to which the two e-sequences differ should be reflected in the value of $D(S, T)$ and should agree with the knowledge obtained from domain experts.
Problem: Solutions. How can the similarity of two e-sequences be assessed? Figure: The two example e-sequences, (a) and (b). Some options: map them to traditional sequences of instantaneous events?
Problem: Solutions. Sequences of instantaneous events do not capture all the important information. Figure: Two different arrangements of two event-intervals labeled A. Transforming both into sequences of instantaneous events yields the same result: $A_{start}, A_{start}, A_{end}, A_{end}$.
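A minimal sketch of this information loss, assuming one arrangement in which the two A intervals overlap and another in which one contains the other (the slide's figure may show different arrangements; the concrete time points and the helper `to_instantaneous` are hypothetical):

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (label, start, end)


def to_instantaneous(seq: List[Interval]) -> List[str]:
    """Flatten intervals into a time-ordered list of start/end events."""
    events = []
    for label, start, end in seq:
        events.append((start, f"{label}_start"))
        events.append((end, f"{label}_end"))
    events.sort(key=lambda e: e[0])  # order by time point only
    return [name for _, name in events]


overlapping = [("A", 1, 5), ("A", 3, 8)]  # the first A overlaps the second
containing = [("A", 1, 8), ("A", 3, 5)]   # the first A contains the second

flat_overlapping = to_instantaneous(overlapping)
flat_containing = to_instantaneous(containing)
print(flat_overlapping == flat_containing)
```

Both arrangements flatten to the same list of events, so any distance defined on the flattened form cannot tell them apart.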
Problem: Solutions. How can the similarity of two e-sequences be assessed? Figure: The two example e-sequences, (a) and (b). Some options: map them to traditional sequences of instantaneous events? ×; compare event labels? √; compare event-interval relations? √
Problem: Solutions. How can the similarity of two e-sequences be assessed? Figure: The two example e-sequences, (a) and (b). What about event durations? For simplicity, we ignore them. Arrangement: an e-sequence where the start and end “tags” are dropped [Papapetrou et al. 2009].
Method: Key Idea. Our approach: focus on the relations between pairs of intervals. Figure: Allen’s temporal model [Allen et al. 1983], showing the relations Follow(A,B), Meet(A,B), Overlap(A,B), Match(A,B), Left-Contain(A,B), Right-Contain(A,B), and Contain(A,B).
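The seven relations can be told apart by comparing endpoints. The sketch below assumes the usual endpoint-based definitions with exact equality (no tolerance threshold), which the slide does not spell out; the function name `relation` is hypothetical.

```python
from typing import Tuple

Interval = Tuple[str, int, int]  # (label, start, end)


def relation(a: Interval, b: Interval) -> str:
    """Name the relation between a and b.

    Precondition (assumption): a starts no later than b and, when both
    start together, a ends no earlier than b.
    """
    _, a_start, a_end = a
    _, b_start, b_end = b
    if a_start == b_start:
        return "match" if a_end == b_end else "left-contain"
    if a_end < b_start:
        return "follow"          # b starts after a has ended
    if a_end == b_start:
        return "meet"            # b starts exactly where a ends
    if a_end < b_end:
        return "overlap"         # a ends inside b
    if a_end == b_end:
        return "right-contain"   # both end together, a started earlier
    return "contain"             # b lies strictly inside a


print(relation(("A", 1, 5), ("B", 3, 8)))
```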
Method: Relation Matrix. The solution: given an event-interval sequence S, create its relation matrix $M_A$, where A denotes the arrangement of S.
relation        {A,A}  {A,B}  {B,A}  {B,B}
meet              0      1      0      0
match             0      0      1      0
overlap           1      2      0      1
contain           0      0      0      0
left-contain      0      0      0      0
right-contain     0      0      0      0
follow            0      0      0      0
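One possible way to build such a matrix is sketched below as a sparse counter keyed by (first label, second label, relation). It assumes that every pair of intervals is counted once, in start-time order, and that relations are decided by exact endpoint comparison; both are assumptions, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (label, start, end)


def relation(a: Interval, b: Interval) -> str:
    """Relation between a and b, where a precedes b in (start, -end) order."""
    _, a_start, a_end = a
    _, b_start, b_end = b
    if a_start == b_start:
        return "match" if a_end == b_end else "left-contain"
    if a_end < b_start:
        return "follow"
    if a_end == b_start:
        return "meet"
    if a_end < b_end:
        return "overlap"
    if a_end == b_end:
        return "right-contain"
    return "contain"


def relation_matrix(seq: List[Interval]) -> Counter:
    """Count relations per ordered label pair; absent entries count as 0."""
    ordered = sorted(seq, key=lambda iv: (iv[1], -iv[2]))
    counts: Counter = Counter()
    for a, b in combinations(ordered, 2):  # a always precedes b
        counts[(a[0], b[0], relation(a, b))] += 1
    return counts


S = [("A", 1, 10), ("B", 5, 13), ("C", 17, 30), ("A", 20, 26), ("D", 24, 30)]
M = relation_matrix(S)
```

A dense $|\sigma|^2 \times |I|$ array would serve equally well; the sparse form simply avoids storing the many zero entries.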
Method: Distance. Arrangement distance:
$$\delta_p(A, B) = \left( \sum_{i=1}^{|\sigma|^2} \sum_{j=1}^{|I|} |M_A(i,j) - M_B(i,j)|^p \right)^{1/p}, \quad p \in \mathbb{N}^* \quad (4)$$
Question: what would be a suitable value for $p$?
Manhattan distance: for $p = 1$, Eq. 4 corresponds to the Manhattan distance.
$$\delta_1(A, B) = \sum_{i=1}^{|\sigma|^2} \sum_{j=1}^{|I|} |M_A(i,j) - M_B(i,j)| \quad (5)$$
Method: Distance. Arrangement distance:
$$\delta_p(A, B) = \left( \sum_{i=1}^{|\sigma|^2} \sum_{j=1}^{|I|} |M_A(i,j) - M_B(i,j)|^p \right)^{1/p}, \quad p \in \mathbb{N}^* \quad (6)$$
Question: what would be a suitable value for $p$?
Frobenius norm: for $p = 2$, Eq. 6 corresponds to the Frobenius norm of $M_A - M_B$:
$$\delta_2(A, B) = \sqrt{\sum_{i=1}^{|\sigma|^2} \sum_{j=1}^{|I|} |M_A(i,j) - M_B(i,j)|^2} \quad (7)$$
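The arrangement distance can be sketched for relation matrices stored as sparse dictionaries, where a missing entry counts as 0. The dictionary layout, the function name `arrangement_distance`, and the two toy matrices are assumptions made for illustration only.

```python
from typing import Dict, Tuple

Matrix = Dict[Tuple[str, str, str], int]  # (label, label, relation) -> count


def arrangement_distance(m_a: Matrix, m_b: Matrix, p: int = 1) -> float:
    """p-norm of the entrywise difference of two relation matrices."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    keys = set(m_a) | set(m_b)  # entries absent from both contribute 0
    total = sum(abs(m_a.get(k, 0) - m_b.get(k, 0)) ** p for k in keys)
    return total ** (1.0 / p)


# Toy matrices (hypothetical values).
m_a = {("A", "B", "overlap"): 2, ("A", "B", "meet"): 1}
m_b = {("A", "B", "overlap"): 1, ("B", "B", "follow"): 2}

manhattan = arrangement_distance(m_a, m_b, p=1)  # 1 + 1 + 2
frobenius = arrangement_distance(m_a, m_b, p=2)  # sqrt(1 + 1 + 4)
```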
Method: Distance. Normalized arrangement distance:
$$\delta_{norm}(A, B) = \sum_{i=1}^{|\sigma|^2} \sum_{j=1}^{|I|} \frac{|M_A(i,j) - M_B(i,j)|}{M_A(i,j) + M_B(i,j)} \quad (8)$$
It is based on the $L_1$ norm and normalized over the total possible number of relations where $A$ and $B$ can differ. It is non-metric: the identity of indiscernibles ($\delta_{norm}(A, B) = 0$ iff $A = B$) is violated.
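A minimal sketch of the normalized distance on sparse relation matrices follows. It assumes that entries where both matrices are zero are skipped (treating 0/0 as 0), a convention the slide does not state; the function name and the toy matrices are hypothetical.

```python
from typing import Dict, Tuple

Matrix = Dict[Tuple[str, str, str], int]  # (label, label, relation) -> count


def normalized_distance(m_a: Matrix, m_b: Matrix) -> float:
    """Sum of |a - b| / (a + b) over all matrix entries."""
    total = 0.0
    for key in set(m_a) | set(m_b):
        a, b = m_a.get(key, 0), m_b.get(key, 0)
        if a + b > 0:  # assumption: an entry that is 0 in both adds nothing
            total += abs(a - b) / (a + b)
    return total


# Toy matrices (hypothetical values).
m_a = {("A", "B", "overlap"): 2, ("A", "B", "meet"): 1}
m_b = {("A", "B", "overlap"): 1, ("B", "B", "follow"): 2}

d_norm = normalized_distance(m_a, m_b)  # 1/3 + 1 + 1
```

Each entry contributes at most 1, so a relation that appears in only one of the two arrangements weighs the same regardless of how often it occurs.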
Experiments: ASL Dataset. SignStream database, by the National Center for Sign Language and Gesture Resources at Boston University. Number of e-sequences: 873. Number of intervals: 15675. Min size: 4. Max size: 41. Average size: 18. Labels: 216. Classes: 5.
Experiments: Setup. We tested robustness against artificial noise and classification accuracy. Artificial noise is controlled by two parameters. Shift probability $s$: each event-interval in $S$ is shifted with probability $s$. Distortion level $d$: the start point of each event-interval is shifted by $\pm d\%$ of $|S|$.
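The noise model can be sketched as follows, under several assumptions that the slide leaves open: $|S|$ is taken to be the time span of the e-sequence, the sign of the shift is chosen uniformly at random, the whole interval moves so that its duration is preserved, and $d$ is passed as a fraction (0.1 for 10%). The function name `add_noise` is hypothetical.

```python
import random
from typing import List, Tuple

Interval = Tuple[str, float, float]  # (label, start, end)


def add_noise(seq: List[Interval], s: float, d: float,
              rng: random.Random) -> List[Interval]:
    """Shift each interval with probability s by +/- d * |S|.

    Assumptions: |S| is the time span of seq, the sign is random, and the
    end point moves together with the start point.
    """
    span = max(end for _, _, end in seq) - min(start for _, start, _ in seq)
    noisy = []
    for label, start, end in seq:
        if rng.random() < s:
            offset = rng.choice((-1, 1)) * d * span
            start, end = start + offset, end + offset
        noisy.append((label, start, end))
    return noisy


original = [("A", 1, 10), ("B", 5, 13), ("C", 17, 30)]
noisy = add_noise(original, s=0.6, d=0.1, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the distortion reproducible across runs.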
Experiments: Robustness. Robustness to noise: we compared the Normalized, Manhattan, and Frobenius distances in terms of:
(A) nearest-neighbor retrieval accuracy: the fraction of noisy queries for which the originating sequence is retrieved.
(B) rank of nearest neighbor: for each query, the number of database sequences with distance less than or equal to that of the originating counterpart.
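Both measures can be sketched generically over any distance function. In the illustration below, plain numbers with an absolute-difference distance stand in for e-sequences and the arrangement distance, and a query counts as retrieved only when its originating sequence has rank 1; that tie handling is an assumption, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rank_of_origin(query: T, origin: int, database: Sequence[T],
                   dist: Callable[[T, T], float]) -> int:
    """Number of database items at least as close to the query as its origin."""
    d_origin = dist(query, database[origin])
    return sum(1 for item in database if dist(query, item) <= d_origin)


def retrieval_accuracy(queries: List[Tuple[T, int]], database: Sequence[T],
                       dist: Callable[[T, T], float]) -> float:
    """Fraction of queries whose origin is the unique nearest neighbor."""
    hits = sum(1 for q, origin in queries
               if rank_of_origin(q, origin, database, dist) == 1)
    return hits / len(queries)


def abs_diff(x: float, y: float) -> float:
    return abs(x - y)


# Toy stand-in data: each query is paired with the index of its origin.
database = [0.0, 10.0, 20.0]
queries = [(1.0, 0), (16.0, 1), (19.0, 2)]

ranks = [rank_of_origin(q, o, database, abs_diff) for q, o in queries]
accuracy = retrieval_accuracy(queries, database, abs_diff)
```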
Experiments: Robustness. Figure: Retrieval accuracy: success ratio of matching the noisy sequences to their original counterpart, plotted against distortion (0.1 to 0.5) with one curve per shift probability (0.2, 0.4, 0.6, 0.8, 1.0). Panels: (a) Manhattan, (b) Normalized.
Experiments: Robustness. Figure: Comparison of the cumulative histograms for the rank of the 1-NN for each distance measure (Normalized, Frobenius, Manhattan). Ranks are denoted as a ratio of the database size. Axes: rank of NN as a ratio (0 to 0.1) and database ratio. Panels: (a) probability 0.6, distortion 50%; (b) probability 1.0, distortion 50%.
Experiments: 1-NN Classification Accuracy. 1-NN classification accuracy is the fraction of e-sequences whose class is the same as that of their 1-NN. Data: 5 classes, 873 e-sequences. 1-NN classification accuracy ≈ 88%.
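The accuracy measure can be sketched as a leave-one-out loop over the dataset. The leave-one-out protocol is an assumption about how the 1-NN of each e-sequence is found, plain numbers with an absolute-difference distance stand in for e-sequences and the arrangement distance, and the names are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def one_nn_accuracy(items: Sequence[T], labels: Sequence[str],
                    dist: Callable[[T, T], float]) -> float:
    """Fraction of items whose nearest other item has the same class."""
    correct = 0
    for i, item in enumerate(items):
        others = (j for j in range(len(items)) if j != i)  # leave item i out
        nearest = min(others, key=lambda j: dist(item, items[j]))
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / len(items)


def abs_diff(x: float, y: float) -> float:
    return abs(x - y)


# Toy stand-in data: the last item sits closer to the other class.
items = [0.0, 1.0, 10.0, 11.0, 5.0]
labels = ["a", "a", "b", "b", "b"]

accuracy = one_nn_accuracy(items, labels, abs_diff)
```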