Similarity Matching of Temporal Event-Interval Sequences


  1. Similarity Matching of Temporal Event-Interval Sequences. S. Mohammad Mirbagheri and Howard J. Hamilton, University of Regina, Regina, Canada

  2. Outline: 1. Introduction 2. Problem Statement 3. Similarity Matching 4. Experiments 5. Conclusion

  3. Introduction
     • Interval-based event sequences (e-sequences)
        ◦ Sequences of events that persist over time intervals of varying lengths
        ◦ Occur in many application domains, such as medicine, sensor networks, and sign languages
     • E-sequence datasets
        ◦ Contain longitudinal data: each instance is described by a series of event intervals
        ◦ Have no single-valued features
        ◦ Are not organized appropriately for standard machine learning algorithms

  4. Problem Statement
     • Event interval: a triple e = (l, b, f) consisting of an event label l, a beginning time b, and a finishing time f, e.g., (A, 4, 8)
     • E-sequence: a list of m event intervals in ascending order of their beginning times, e.g., s = <(A,4,8), (B,6,12), (C,14,18), (D,20,22)>
     • E-sequence dataset: a set of n e-sequences {s_1, …, s_n}, where each e-sequence s_i is associated with a unique identifier i
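The definitions on this slide can be written down as a small data structure. The following is a minimal sketch, not the authors' implementation: the names `EventInterval` and `make_e_sequence` are hypothetical, and breaking ties by finishing time and then label is an assumption, since the slide only specifies ordering by beginning time.

```python
from typing import NamedTuple


class EventInterval(NamedTuple):
    """An event interval e = (l, b, f): label, beginning time, finishing time."""
    label: str
    begin: int
    finish: int


def make_e_sequence(intervals):
    """Return the intervals in ascending order of beginning time.

    Ties are broken by finishing time, then label (an assumption; the
    slide only requires ordering by beginning time).
    """
    return sorted(intervals, key=lambda e: (e.begin, e.finish, e.label))


# The example e-sequence from the slide, given here in shuffled order.
s = make_e_sequence([
    EventInterval("C", 14, 18),
    EventInterval("A", 4, 8),
    EventInterval("D", 20, 22),
    EventInterval("B", 6, 12),
])

# An e-sequence dataset maps a unique identifier to each e-sequence.
dataset = {1: s}
```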

  5. Problem Statement: an example of an e-sequence dataset with 4 e-sequences (e.g., 4 patients) and 6 event labels (e.g., types of disease)

  6. Problem Statement
     • E-sequence sliced time: {4, 5, 10, 12, 14, 16, 18, 20, 22}
     • Coincidence: e.g., the coincidence for the time slice (5, 10) is {C, D, E}
     • Coincidence label sequence (L-sequence): the ordered list of coincidences, excluding gaps, e.g., <{E}, {C,D,E}, {C,E}, {E}, {B}, {B,F}, {B}>
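The sliced time, coincidences, and L-sequence on this slide can be computed as sketched below. Two things here are assumptions rather than content from the slides: the function names are hypothetical, and the `example` e-sequence is reconstructed so that it reproduces the sliced time and L-sequence shown (the original figure is not available). A coincidence is taken to be the set of labels whose intervals cover a whole time slice.

```python
def sliced_time(e_seq):
    """Sorted distinct beginning and finishing times of an e-sequence."""
    return sorted({t for _, begin, finish in e_seq for t in (begin, finish)})


def coincidence(e_seq, start, end):
    """Labels of all event intervals that cover the slice (start, end)."""
    return {label for label, begin, finish in e_seq
            if begin <= start and end <= finish}


def l_sequence(e_seq):
    """Ordered coincidences over consecutive time slices, skipping gaps."""
    times = sliced_time(e_seq)
    result = []
    for start, end in zip(times, times[1:]):
        labels = coincidence(e_seq, start, end)
        if labels:  # an empty coincidence is a gap and is excluded
            result.append(labels)
    return result


# Reconstructed example (assumption): (label, begin, finish) triples.
example = [("E", 4, 14), ("C", 5, 12), ("D", 5, 10), ("B", 16, 22), ("F", 18, 20)]

print(sliced_time(example))
print(coincidence(example, 5, 10))
print(l_sequence(example))
```

In this reconstruction the slice (14, 16) has no active event, so it is the gap that the L-sequence leaves out.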

  7. Problem Statement
     • Problem: similarity searching and matching of full-length e-sequences
     • Contributions:
        ◦ We propose and evaluate three novel approaches
        ◦ We intuitively view the similarity between two e-sequences in terms of:
           ▪ The presence of event intervals with the same event labels
           ▪ The order of occurrence of these event intervals
           ▪ The duration of the event intervals
           ▪ The temporal relations among these event intervals

  8. Similarity Matching: 1. Matching Using Relative Frequency
     • Relative frequency: [formula image; annotated terms: event labels, duration of an event interval, duration of the e-sequence, number of event labels]
     • Example: the duration of event interval E is d(E) = 14 - 4 = 10, and the duration of the e-sequence is d(s) = 22 - 4 = 18
     • A function maps an e-sequence to a vector of the relative frequencies of its event labels
     • The distance is taken between the relative frequency vectors of two e-sequences
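The exact relative-frequency formula is not recoverable from this slide, so the following is only an illustrative sketch under stated assumptions: the relative frequency of a label is taken to be the total duration of its intervals divided by the duration of the e-sequence, and the distance is Euclidean. The slide's formula may normalize differently (for instance by the number of event labels). All function names are hypothetical, and `example` is an e-sequence reconstructed to match the durations d(E) = 10 and d(s) = 18 quoted on the slide.

```python
import math


def duration(e_seq):
    """Duration of an e-sequence: latest finishing time minus earliest beginning time."""
    return max(f for _, _, f in e_seq) - min(b for _, b, _ in e_seq)


def relative_frequencies(e_seq, labels):
    """Vector of per-label duration shares (assumed definition, see lead-in).

    `labels` must contain every label that occurs in `e_seq`.
    """
    total = duration(e_seq)
    freq = {label: 0.0 for label in labels}
    for label, begin, finish in e_seq:
        freq[label] += (finish - begin) / total
    return [freq[label] for label in labels]


def erf_like_distance(s1, s2, labels):
    """Euclidean distance between the two relative-frequency vectors (assumption)."""
    return math.dist(relative_frequencies(s1, labels),
                     relative_frequencies(s2, labels))


labels = list("ABCDEF")
# Reconstructed example (assumption): (label, begin, finish) triples.
example = [("E", 4, 14), ("C", 5, 12), ("D", 5, 10), ("B", 16, 22), ("F", 18, 20)]
other = [("A", 4, 8), ("B", 6, 12), ("C", 14, 18), ("D", 20, 22)]

print(relative_frequencies(example, labels))
print(erf_like_distance(example, other, labels))
```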

  9. Similarity Matching: 2. Matching Using Position Code
     • Position code: [formula image; annotated terms: L-sequence, coincidence]
     • A function maps an e-sequence to a vector of the position codes of its event labels
     • The distance is taken between the position code vectors of two e-sequences

  10. Similarity Matching: 3. Matching Using Multiple Kernel Learning
     • The distance between two e-sequences is based on Multiple Kernel Learning [formula image; annotated terms: the weights of the functions, the functions (kernels), e.g., {ERF, EPC}, and the number of kernels]
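The annotations on this slide (weights, functions, number of kernels) suggest a weighted combination of several distance functions. The sketch below shows only that general idea and is not the authors' formula: the weights are fixed by hand rather than learned, and the two component distances are toy stand-ins for ERF and EPC. All names are hypothetical.

```python
def combined_distance(s1, s2, distance_fns, weights):
    """Weighted sum of several distance functions applied to the same pair."""
    if len(distance_fns) != len(weights):
        raise ValueError("need exactly one weight per distance function")
    return sum(w * fn(s1, s2) for w, fn in zip(weights, distance_fns))


def label_set_distance(s1, s2):
    """Toy stand-in: number of labels that occur in only one of the two e-sequences."""
    labels1 = {label for label, _, _ in s1}
    labels2 = {label for label, _, _ in s2}
    return len(labels1 ^ labels2)


def length_distance(s1, s2):
    """Toy stand-in: difference in the number of event intervals."""
    return abs(len(s1) - len(s2))


s1 = [("A", 4, 8), ("B", 6, 12)]
s2 = [("A", 1, 3), ("C", 2, 5), ("D", 6, 9)]

# Hand-picked weights (assumption); in the slides the weights come from learning.
d = combined_distance(s1, s2, [label_set_distance, length_distance], [0.5, 0.5])
print(d)
```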

  11. Experiments
     • Eight real-world datasets
     • Method of evaluation:
        ◦ Perform 1-NN classification for every e-sequence in each dataset
        ◦ Record the fraction of correct predictions (accuracy)
     • Three existing competitors:
        ◦ DTW-based method
        ◦ Artemis
        ◦ IBSM
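The evaluation procedure can be sketched as a leave-one-out 1-NN loop that works with any distance function. Whether the authors used leave-one-out or a different split is not stated on the slide, so that choice is an assumption, as are the function name and the toy numeric data used in place of e-sequences.

```python
def one_nn_accuracy(items, labels, distance):
    """Leave-one-out 1-NN accuracy: each item is classified by its nearest other item.

    Requires at least two items.
    """
    correct = 0
    for i, query in enumerate(items):
        # Nearest neighbour among all items except the query itself.
        nearest = min((j for j in range(len(items)) if j != i),
                      key=lambda j: distance(query, items[j]))
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / len(items)


# Toy data (assumption): plain numbers stand in for e-sequences.
items = [0.0, 0.1, 1.0, 1.1, 0.6]
classes = ["a", "a", "b", "b", "a"]

accuracy = one_nn_accuracy(items, classes, lambda x, y: abs(x - y))
print(accuracy)
```

Any of the distance functions discussed earlier could be passed as `distance`, with e-sequences as `items`.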

  12. Experiments
     • Results:
        ◦ EMKL outperforms the Artemis and DTW-based methods on all datasets
        ◦ EMKL vs. IBSM: outperforms on four datasets, ties on two datasets, and loses on two datasets

  13. Conclusion
     • We propose three distance functions for matching full-length event-interval sequences:
        1. The ERF function measures the distance between e-sequences based on the relative frequencies of the event intervals.
        2. The EPC function matches e-sequences based on the position codes of the event intervals.
        3. The EMKL function combines the ERF and EPC functions.
     • Experiments show that the EMKL method is an effective approach to matching full-length e-sequences and a better choice than the state-of-the-art methods.
