
Time-focused density-based clustering of trajectories of moving objects



  1. Time-focused density-based clustering of trajectories of moving objects
     Margherita D’Auria, Mirco Nanni, Dino Pedreschi

  2. Plan of the talk
     - Introduction
       - Motivations
       - Problem & context
     - Density-based clustering (OPTICS)
     - Density-based clustering on trajectories
       - Trajectory data model
       - Distance measure
       - Results
     - Temporal focusing
       - A clustering quality measure
       - Heuristics for the optimal temporal interval
     - Conclusions & future work

  3. Motivations
     - Plenty of current and future data sources for spatio-temporal data
     - Sophisticated analysis methods are required in order to fully exploit them
       - Data mining methods
       - Which kinds of patterns/models?
     - Main objectives:
       - A better understanding of the application domain
       - An improvement of private and public services

  4. Problem & context
     - A distinguishing case: mobile devices
       - PDAs
       - Mobile phones
       - LBS-enabled devices (may include the two above)
       - They (can) yield traces of their movement
     - An important problem:
       - Discovering groups of individuals that (approximately) move together in some period of time
       - E.g., detection of traffic jams during rush hours
     - A candidate data mining reformulation of the problem:
       - Clustering of individuals' trajectories

  5. Which kind of clustering?
     - Several alternatives are available
     - General requirements:
       - Non-spherical clusters should be allowed
         - E.g., a traffic jam along a road should be represented as a "snake-shaped" cluster of individuals
       - Tolerance to noise
       - Low computational cost
       - Applicability to complex, possibly non-vectorial data
     - A suitable candidate: density-based clustering
       - In particular, we adopt OPTICS

  6. A crash intro to OPTICS
     - A density threshold is defined through two parameters:
       - ε: a neighborhood radius
       - MinPts: the minimum number of points
     - Key concepts:
       - Core objects: objects whose ε-neighborhood contains at least MinPts objects
       - Reachability distance reach-d(p, q): (simplified definition) the distance between objects p and q
     - Example (figure: the ε-neighborhood of q):
       - Object q is a core object if MinPts = 2
       - Object p is not
       - Their reach-d(p, q) is shown
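The slide uses a simplified reachability distance. In the original OPTICS formulation, reach-d(p, q) is the maximum of q's core distance and d(p, q), and it is undefined when q is not a core object. The sketch below assumes 2-D points and Euclidean distance, counts an object as part of its own ε-neighborhood, and uses hypothetical helper names; it is an illustration, not the authors' code.

```python
import math

# Assumption: 2-D points, Euclidean distance; all helper names are hypothetical.

def eps_neighborhood(data, q, eps):
    """Indices of all objects within radius eps of data[q] (q itself included)."""
    return [i for i, o in enumerate(data) if math.dist(data[q], o) <= eps]

def is_core(data, q, eps, min_pts):
    """A core object has at least min_pts objects in its eps-neighborhood."""
    return len(eps_neighborhood(data, q, eps)) >= min_pts

def core_distance(data, q, eps, min_pts):
    """Distance to the min_pts-th closest object (counting q), or None if q is not core."""
    if not is_core(data, q, eps, min_pts):
        return None
    dists = sorted(math.dist(data[q], o) for o in data)
    return dists[min_pts - 1]

def reach_d(data, p, q, eps, min_pts):
    """Full OPTICS reachability distance of data[p] with respect to data[q]."""
    cd = core_distance(data, q, eps, min_pts)
    if cd is None:
        return None  # undefined when q is not a core object
    return max(cd, math.dist(data[p], data[q]))
```

With the points (0, 0), (0.5, 0), (3, 0), ε = 1 and MinPts = 2, object 0 plays the role of q (a core object) and object 2 the role of p (not a core object); reach-d(p, q) is then 3.0.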

  7. A crash intro to OPTICS
     The algorithm:
       1. Repeatedly choose a random non-visited object, until a core object is selected.
       2. Select the core object having the smallest reachability distance from all the visited core objects. If none can be found, go to step 1.
     Output: the reach-d() of all visited points, in order of visit (the reachability plot).
     [Figure: reachability plot with order of visit on the X axis and reachability on the Y axis. A reachability threshold separates Cluster 1 and Cluster 2; the "jump" marks the passage from the left-hand group (0-9) to the right-hand one (10-18).]
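A compact sketch of the ordering loop, assuming in-memory points, Euclidean distance and brute-force range queries (a real implementation would use a spatial index). It departs from the slide's simplified wording in two labeled ways: unvisited objects are scanned in index order rather than at random, and, as in standard OPTICS, any object (not only a core object) can be taken from the priority queue, while only core objects expand it. Objects reached without a core predecessor get an infinite reachability, which produces the "jumps" in the plot. The function name is hypothetical.

```python
import heapq
import math

def optics_order(data, eps, min_pts, dist=math.dist):
    """Return (order, plot): visiting order and the reachability of each visited object.

    Assumption: brute-force neighbourhood queries; reachability is
    max(core distance, distance) as in standard OPTICS.
    """
    n = len(data)
    processed = [False] * n
    reach = [math.inf] * n
    order = []

    def neighbors(i):
        return [j for j in range(n) if dist(data[i], data[j]) <= eps]

    def core_dist(i, nbrs):
        if len(nbrs) < min_pts:
            return None  # not a core object
        return sorted(dist(data[i], data[j]) for j in nbrs)[min_pts - 1]

    for start in range(n):  # index order instead of random choice
        if processed[start]:
            continue
        seeds = [(math.inf, start)]  # priority queue keyed by reachability
        while seeds:
            r, i = heapq.heappop(seeds)
            if processed[i] or r > reach[i]:
                continue  # stale queue entry
            processed[i] = True
            order.append(i)
            nbrs = neighbors(i)
            cd = core_dist(i, nbrs)
            if cd is None:
                continue  # only core objects expand the ordering
            for j in nbrs:
                if not processed[j]:
                    new_r = max(cd, dist(data[i], data[j]))
                    if new_r < reach[j]:
                        reach[j] = new_r
                        heapq.heappush(seeds, (new_r, j))
    return order, [reach[i] for i in order]
```

On two well-separated groups of three points each, the plot shows two low plateaus separated by an infinite "jump", which a reachability threshold then splits into two clusters.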

  8. Applying OPTICS to trajectories
     - Two key issues have to be solved:
       - A suitable representation for trajectories is needed
         - Which data model for trajectories?
       - A means of comparing trajectories has to be provided
         - Which distance between objects?
         - OPTICS needs one in order to perform range queries

  9. A trajectory data model
     - Raw input data:
       - Each trajectory is represented as a sequence of time-stamped coordinates
         $$T = (t_1, x_1, y_1), \ldots, (t_n, x_n, y_n)$$
         meaning that the object's position at time t_i was (x_i, y_i)
     - Data model:
       - Parametric spaghetti: linear interpolation between consecutive points
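The parametric-spaghetti model can be sketched as a function returning the interpolated position at any time within the trajectory's span. The (t, x, y) tuple layout follows the slide; the function name and the choice to reject times outside the span (no extrapolation) are assumptions.

```python
from bisect import bisect_right

def position_at(traj, t):
    """Linearly interpolate the (x, y) position of a trajectory at time t.

    traj is a list of (t_i, x_i, y_i) tuples sorted by t_i.
    Assumption: t must lie in [t_1, t_n]; no extrapolation is done.
    """
    times = [p[0] for p in traj]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t outside the trajectory's time span")
    k = bisect_right(times, t)  # first index with times[k] > t
    if k == len(times):         # t equals the last timestamp
        return traj[-1][1], traj[-1][2]
    (t0, x0, y0), (t1, x1, y1) = traj[k - 1], traj[k]
    a = (t - t0) / (t1 - t0)    # t1 > t >= t0, so no division by zero
    return x0 + a * (x1 - x0), y0 + a * (y1 - y0)
```

For example, halfway between the samples (10, 10, 0) and (20, 10, 10) the object is at (10.0, 5.0).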

  10. A distance between trajectories
     - Adopted distance: the average distance over the time interval T
       $$D(\tau_1, \tau_2)\big|_T = \frac{\int_T d\big(\tau_1(t), \tau_2(t)\big)\,dt}{|T|}$$
     - It is a metric => efficient indexing methods can be used
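A hedged sketch of D(τ1, τ2)|_T: the integral is approximated with the midpoint rule on a fixed number of sub-intervals (an assumption; the slides do not say how it is evaluated), d is taken to be Euclidean distance, and τ(t) is obtained by linear interpolation as in the data model above. Function names and the `steps` parameter are hypothetical.

```python
import math

def _pos(traj, t):
    """Position at time t by linear interpolation (traj sorted by time)."""
    for (t0, x0, y0), (t1, x1, y1) in zip(traj, traj[1:]):
        if t0 <= t <= t1:
            a = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
            return x0 + a * (x1 - x0), y0 + a * (y1 - y0)
    raise ValueError("t outside the trajectory's time span")

def avg_distance(tr1, tr2, t_start, t_end, steps=1000):
    """Approximate D(tau1, tau2)|_T = (1/|T|) * integral over T of d(tau1(t), tau2(t)) dt.

    Assumption: midpoint rule with `steps` sub-intervals over T = [t_start, t_end];
    both trajectories must be defined on the whole of T.
    """
    h = (t_end - t_start) / steps
    total = 0.0
    for k in range(steps):
        t = t_start + (k + 0.5) * h  # midpoint of the k-th sub-interval
        total += math.dist(_pos(tr1, t), _pos(tr2, t)) * h
    return total / (t_end - t_start)
```

For two objects moving in parallel 3 units apart, the result is 3 for any T; dividing by |T| is what makes distances computed on intervals of different lengths comparable.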

  11. A sample dataset
     - A set of trajectories forming 4 clusters + noise
     - Generated by the CENTRE system (KDDLab software)

  12. OPTICS vs. HAC & K-means
     [Figure: clustering results; panels: K-means, HAC-average, OPTICS]

  13. Temporal focusing
     - Different time intervals can show different behaviours
       - E.g., objects that are close to each other within one time interval can be far apart in other periods of time
     - The time interval becomes a parameter
       - E.g., rush hours vs. low-traffic times
     - Problem: significant time intervals are not always known a priori
       - An automated mechanism is needed to find them

  14. Temporal focusing: the proposed method
       1. Provide a notion of interestingness to be associated with time intervals
          - We define it in terms of the estimated quality of the clustering extracted on the given time interval
       2. Formalize the temporal focusing task as an optimization problem
          - Discover the time interval that maximizes the interestingness measure

  15. A quality measure for density-based clustering
     - General principle:
       - High-density clusters separated by low-density noise are preferred
     - The method:
       - High-density clusters correspond to low dents in the reachability plot
       - => Evaluate the global quality Q of the clustering output as the negated average reachability within clusters (noise is discarded), so that lower reachability means higher quality
     - Definition: given ε' and dataset D, compute Q as:
       $$Q_{D,\varepsilon'} = -R(D, \varepsilon') = -\operatorname{AVG}_{o \in D'} \text{reach-d}(o), \qquad D' = D \setminus \{\text{noise objects}\}$$
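Given a reachability plot (the list of reach-d values in visiting order), Q can be sketched as below. Treating an object as noise when its reachability exceeds the threshold ε', and returning -∞ when every object is noise, are simplifying assumptions; the function name is hypothetical.

```python
import math

def quality(reach, threshold):
    """Q = -(average reachability of non-noise objects).

    Assumption: an object is non-noise when its reachability value in the
    plot is <= threshold (the reachability threshold eps').
    """
    inside = [r for r in reach if r <= threshold]
    if not inside:
        return -math.inf  # assumption: no clusters at all gives the worst quality
    return -sum(inside) / len(inside)
```

Denser clusters have lower reachability values, so their Q is closer to zero: a plot with values [0.1, 0.1] scores higher than one with [0.9, 0.9].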

  16. FAQs
     - How is Q() computed for a given time interval I?
       - Step 1: trajectory segments outside I are clipped away
       - Step 2: OPTICS is run on the clipped trajectories
       - Step 3: Q(I) is computed on the output reachability plot
     - How is the reachability threshold set for each interval?
       - A reachability threshold is needed in order to locate clusters (and noise)
       - The threshold for the largest I is set manually by the user
       - Thresholds for other intervals I' ⊂ I are computed from the first one by rescaling it proportionally w.r.t. the average reachability
     - Is the optimal Q(I) biased towards tiny intervals?
       - Yes. The problem has been fixed by defining Q'(I) = Q(I) / log |I|
       - => A small decrease in Q(I) is accepted when it yields a much larger I
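The three FAQ answers can be sketched as small helpers. All names are hypothetical. Adding interpolated end points when clipping, reading "proportionally rescaling w.r.t. average reachability" as multiplying the user's threshold by the ratio of average reachabilities, measuring |I| as t_end - t_start, and using the natural logarithm are all assumptions, not details confirmed by the slides.

```python
import math

def clip(traj, t_start, t_end):
    """Step 1: keep only the part of a (t, x, y) trajectory inside I = [t_start, t_end].

    Assumption: interpolated end points are added wherever the trajectory covers them.
    """
    def pos(t):
        for (t0, x0, y0), (t1, x1, y1) in zip(traj, traj[1:]):
            if t0 <= t <= t1:
                a = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
                return (t, x0 + a * (x1 - x0), y0 + a * (y1 - y0))
        return None  # t not covered by the trajectory
    inner = [p for p in traj if t_start < p[0] < t_end]
    first, last = pos(t_start), pos(t_end)
    return ([first] if first else []) + inner + ([last] if last else [])

def rescaled_threshold(base_threshold, base_avg_reach, avg_reach):
    """Assumed reading of the slide: scale the threshold chosen for the largest
    interval by the ratio between the average reachabilities."""
    return base_threshold * avg_reach / base_avg_reach

def normalized_quality(q, t_start, t_end):
    """Q'(I) = Q(I) / log|I|; requires |I| > 1 so that log|I| > 0."""
    length = t_end - t_start
    if length <= 1:
        raise ValueError("|I| must be greater than 1")
    return q / math.log(length)
```

Because Q is never positive, dividing it by a larger log |I| moves it closer to zero, which is how the normalization rewards longer intervals.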

  17. Experiments
     - A more complex sample dataset (generated by CENTRE)
     - Clear clusters in the central time interval vs. dispersion on the borders

  18. Optimizing Q()
     - Find the optimal Q() by plotting its values for all time intervals
     - The optimum corresponds to the central time interval
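Plotting Q() for all time intervals amounts to an exhaustive search. A minimal sketch, assuming candidate interval end points are restricted to a caller-supplied grid of timestamps and that Q(I) is provided by a callable (in practice it would clip the trajectories, run OPTICS and compute Q as described on slides 15 and 16), returns the interval maximizing Q'(I) = Q(I) / log |I|. The function and parameter names are hypothetical.

```python
import itertools
import math

def best_interval(timestamps, q_of_interval):
    """Exhaustively evaluate Q'(I) = Q(I) / log|I| on every interval whose end
    points come from `timestamps`; return (best_q_prime, (start, end)).

    Assumption: q_of_interval(start, end) returns Q(I); intervals with
    |I| <= 1 are skipped because log|I| would not be positive.
    """
    best = (-math.inf, None)
    for start, end in itertools.combinations(sorted(timestamps), 2):
        if end - start <= 1:
            continue
        q_prime = q_of_interval(start, end) / math.log(end - start)
        if q_prime > best[0]:
            best = (q_prime, (start, end))
    return best
```

The number of candidate intervals grows quadratically with the grid size and each evaluation needs an OPTICS run, which is presumably why the plan of the talk lists heuristics for finding the optimal interval.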
