


1. Online Clustering of High-Dimensional Trajectories under Concept Drift
   2011-09-07, ECMLPKDD 2011, Athens, Greece
   Georg Krempl¹٬², Zaigham Siddiqui², Myra Spiliopoulou²
   ¹ University of Graz, ² University of Magdeburg
   georg.krempl@uni-graz.at, {myra,siddiqui,krempl}@iti.cs.uni-magdeburg.de

2. Outline
   ◮ Problem Description
     ◮ Motivation and Objectives
     ◮ Modelling Trajectories as Gaussian Mixtures
   ◮ Trajectory Clustering with Expectation Maximisation (offline)
   ◮ TRACER Algorithm (online)
     ◮ Overview
     ◮ Initialisation
     ◮ Update, Clustering and Prediction
   ◮ Experiments
     ◮ Settings
     ◮ Results
   ◮ Conclusion



5. ◮ CRM Application
     ◮ Customers are shopping online
     ◮ Money is spent on different product groups in a basket
     ◮ Multiple visits per customer
     ◮ Behaviour changes over time (recession, new product)
     ◮ Can we cluster customers? Can we predict values in the next basket?
   ◮ Trajectory Clustering Problem
     ◮ Customers: population of individuals
     ◮ Each visit: measurement; money spent in all product groups: measurement vector
     ◮ Customer history: trajectory
     ◮ Subpopulations of customers: clusters
     ◮ Multiple measurements per individual
     ◮ Measurements are not taken at equidistant times
     ◮ Distribution of measurements is subject to drift

6. Clustering Trajectories under Drift: Objective
   ◮ Cluster individuals
   ◮ Track clusters over time
   ◮ Predict/extrapolate cluster movements
   [Figure: clusters of trajectories evolving over time steps t₀, t₁, t₂, t₃]





11. Clustering Trajectories under Drift
    ◮ Formulation as Gaussian Mixture Model
    ◮ $z_i = z_{i1}, z_{i2}, \cdots, z_{in_i}$ are the $n_i$ observations of the $i$-th individual
    ◮ $K$ clusters, with
      ◮ mixing proportions $\alpha_k$
      ◮ distribution parameters $\theta_k$: the mean depends on time via regression coefficients $\beta_k$; the covariance matrix $\Sigma_k$ is static for the $k$-th cluster
    ◮ Likelihood of observing the trajectory of individual $i$:
      $p(z_i; \Theta) = \prod_{l=1}^{n_i} \sum_{k=1}^{K} \alpha_k \, p(z_{il}; \theta_k)$   (1)
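Equation (1) can be evaluated directly: the trajectory likelihood is the product over observations of a per-observation mixture density whose cluster means are polynomials in time. A minimal numpy sketch (the parameter layout and helper names are our assumptions, not from the paper):

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """Density of a multivariate normal N(mean, cov) at point x."""
    d = x - mean
    k = len(x)
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))
    return float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d) / norm)

def trajectory_likelihood(z, t, alphas, betas, sigmas):
    """Eq. (1): likelihood of one individual's trajectory.

    z      : (n_i, D) observations of individual i
    t      : (n_i,) measurement times
    alphas : (K,) mixing proportions alpha_k
    betas  : (K, O+1, D) regression coefficients beta_k per cluster
    sigmas : (K, D, D) static covariance matrices Sigma_k
    """
    lik = 1.0
    for z_il, t_il in zip(np.asarray(z), t):
        powers = t_il ** np.arange(betas.shape[1])  # 1, t, t^2, ..., t^O
        # inner sum over the K mixture components, outer product over time
        lik *= sum(a * gauss_pdf(z_il, powers @ b, s)
                   for a, b, s in zip(alphas, betas, sigmas))
    return lik
```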

12. EM Trajectory Clustering
    ◮ EM algorithm for the general likelihood-maximisation problem: Dempster et al., 1977
    ◮ Offline EM trajectory clustering algorithm: Gaffney and Smyth, 1999
      ◮ Provides an initial clustering
    ◮ Problem: it is an offline algorithm. How can it be used on a stream? How robust is it against sudden change (non-smooth trajectories)?

13. Outline
    ◮ Problem Description
      ◮ Motivation and Objectives
      ◮ Modelling Trajectories as Gaussian Mixtures
    ◮ Trajectory Clustering with Expectation Maximisation (offline)
    ◮ TRACER Algorithm (online)
      ◮ Overview
      ◮ Initialisation
      ◮ Update, Clustering and Prediction
    ◮ Experiments
      ◮ Settings
      ◮ Results
    ◮ Conclusion


15. TRACER Algorithm Overview
    ◮ Make an initial clustering using EM
    ◮ Update clustering:
      ◮ Estimate the new position of the clusters
      ◮ Assign new individuals to clusters
    ◮ Assumptions:
      ◮ Static number of clusters, $K$
      ◮ Static covariance matrices, $\Sigma_k$
    ◮ Approach: Kálmán filter (Kálmán, 1959)


17. Kálmán filter
    ◮ State transition: new state $x_s$
        $x_s = A x_{s-1} + w_s$   (2)
    ◮ State-to-signal: measurement $z \in \mathbb{R}^D$
        $z_s = H x_s + v_s$   (3)
    ◮ States: true (unobservable) cluster centroids, a vector of length $D \cdot (O+1)$
    ◮ The Kálmán filter computes at each discrete time step $s$:
      ◮ State estimate for each cluster: $\hat{x}_s$
      ◮ Error estimate on the cluster state: $P_s$
    ◮ Questions:
      ◮ How to choose $\hat{x}_0$, $A$, $Q$, $H$, $R$?
      ◮ How to assign individuals to clusters?
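Equations (2) and (3) drive the standard discrete Kalman predict/correct cycle. A self-contained sketch of one step, with generic matrix names matching the slide (the function name and argument order are our own):

```python
import numpy as np

def kalman_step(x, P, z, A, Q, H, R):
    """One discrete Kalman filter step (predict, then correct).

    x, P : previous state estimate and its error covariance
    z    : new measurement (here: an observed cluster centroid)
    A, Q : state transition matrix and process noise covariance
    H, R : measurement matrix and measurement noise covariance
    """
    # predict: propagate the state according to Eq. (2)
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # correct: blend in the measurement according to Eq. (3)
    S = H @ P_pred @ H.T + R            # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

With equal prior and measurement uncertainty and no process noise, a single step moves the estimate halfway toward the measurement, which is the expected optimal blend.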

18. TRACER Initialisation: Initial State of Each Cluster
    The state is initialised from the $\beta$-coefficients obtained via EM.
    ◮ State vector $\mu_0$ of size $(D \cdot (O+1) \times 1)$ at $t = 0$: $f(t) = (f_1(0), \cdots, f_D(0))$
    ◮ $d$-th coordinate estimate: $f_d(t) = \beta_{d0} + t \beta_{d1} + \cdots + t^O \beta_{dO}$
    ◮ Covariance matrix $\Sigma_0$: identity matrix
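A minimal sketch of this initialisation, under one assumption of ours: since the transition matrix on the next slide uses Taylor terms $\Delta^q/q!$, we take the state to stack, per derivative order $q$, the $D$ values $f_d^{(q)}(0) = q!\,\beta_{dq}$. This layout is inferred from the example matrix, not stated in the paper:

```python
import numpy as np
from math import factorial

def initial_state(betas):
    """Initial Kalman state vector for one cluster from EM coefficients.

    betas : (D, O+1) array with betas[d, q] = beta_{dq}.
    Returns a vector of length D*(O+1) stacking, for q = 0..O, the
    per-dimension derivatives f_d^(q)(0) = q! * beta_{dq}, so that the
    Taylor-style transition matrix A advances the polynomial mean.
    """
    D, O1 = betas.shape
    state = np.empty(D * O1)
    for q in range(O1):
        state[q * D:(q + 1) * D] = factorial(q) * betas[:, q]
    return state
```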


20. TRACER Initialisation: State Transition Matrix A
    ◮ Matrix $A = [a_{ij}]$ with
        $a_{i,j} = \delta_q = \Delta^q / q!$  if $\exists\, q \in \mathbb{N}_0 : i - j + D \cdot q = 0$
        $a_{i,j} = 0$  otherwise
    ◮ Example for $D = 2$ and $O = 2$:
        $A = \begin{pmatrix}
          a_0 & 0   & a_1 & 0   & a_2 & 0   \\
          0   & a_0 & 0   & a_1 & 0   & a_2 \\
          0   & 0   & a_0 & 0   & a_1 & 0   \\
          0   & 0   & 0   & a_0 & 0   & a_1 \\
          0   & 0   & 0   & 0   & a_0 & 0   \\
          0   & 0   & 0   & 0   & 0   & a_0
        \end{pmatrix}$
      with $a_0 = 1$, $a_1 = \Delta$, $a_2 = \Delta^2/2$
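The definition of $a_{i,j}$ translates directly into code: an entry is non-zero exactly when $j - i$ is a non-negative multiple of $D$, with the multiple $q$ giving the Taylor coefficient. A short sketch (the function name is ours):

```python
import numpy as np
from math import factorial

def transition_matrix(D, O, delta):
    """Build A = [a_ij] with a_ij = delta^q / q! when j - i = D*q for a
    non-negative integer q (i.e. i - j + D*q = 0), and 0 otherwise."""
    n = D * (O + 1)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):          # only j >= i can satisfy j - i = D*q
            if (j - i) % D == 0:
                q = (j - i) // D
                A[i, j] = delta ** q / factorial(q)
    return A
```

For D = 2 and O = 2 this reproduces the 6x6 example above, with a₀ = 1 on the diagonal, a₁ = Δ on the offset-2 band, and a₂ = Δ²/2 on the offset-4 band.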



23. TRACER Initialisation
    Process Noise Covariance Matrix Q
    ◮ Identity matrix multiplied by a process noise factor $\hat{q}$: $Q = I \cdot \hat{q}$
    Measurement (or State-to-Signal) Matrix H
    ◮ Set equal to the identity matrix: $H = I$
    Measurement Noise Covariance Matrix R
    ◮ Computed as the covariance matrix of the EM clustering

24. TRACER Update and Clustering
    [Flowchart] On a new measurement z:
    ◮ If z belongs to a known individual: calculate meta-features (speed, acceleration, etc.) and update the individual's cluster membership probability
    ◮ Otherwise: calculate the measurement's cluster membership probability
    ◮ Update matrices A and R, and estimate the new cluster positions

25. Outline
    ◮ Problem Description
      ◮ Motivation and Objectives
      ◮ Modelling Trajectories as Gaussian Mixtures
    ◮ Trajectory Clustering with Expectation Maximisation (offline)
    ◮ TRACER Algorithm (online)
      ◮ Overview
      ◮ Initialisation
      ◮ Update, Clustering and Prediction
    ◮ Experiments
      ◮ Settings
      ◮ Results
    ◮ Conclusion


27. Objective
    ◮ Similar clustering quality of EM and TRACER?
    ◮ Robustness against sudden shift
    ◮ Speed and suitability for online processing
    Synthetic Data Streams with Drift
    ◮ 5 types of synthetic data sets:
      ◮ Different state-transition noise (A: high, C: low)
      ◮ Different number of dimensions (A, ..., C: one; D, E: two)
    ◮ 10 data sets per type
    ◮ 1500 individuals, on average 2 measurements per individual; 1000 measurements for training, 1000 for testing before the shift, 1000 for testing after the shift

28. Update Strategies

    Method         Description
    EM     EM      Expectation Maximisation (multivariate variant of [Gaffney and Smyth, 1999])
    Kalman K-1     Confidence proportional to squared membership probability
           K-2     Confidence ∈ {0; 1}, winner-takes-all
           K-3     Confidence proportional to membership probability
           K-4     As K-1, but 10x higher ST noise factor estimate
           K-5     As K-1, but 10x smaller ST noise factor estimate
           K-6     As K-1, but using speed and acceleration as meta-features for membership probability estimation
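The three basic Kalman variants differ only in how a cluster membership probability is mapped to an update confidence. A sketch of our reading of that mapping (function and strategy handling are ours, not the paper's code):

```python
import numpy as np

def confidence(p, strategy):
    """Map membership probabilities p (one per cluster, summing to 1)
    to per-cluster update confidences for strategies K-1..K-3."""
    p = np.asarray(p, dtype=float)
    if strategy == "K-1":   # proportional to squared membership probability
        w = p ** 2
        return w / w.sum()
    if strategy == "K-2":   # winner-takes-all: confidence in {0, 1}
        w = np.zeros_like(p)
        w[np.argmax(p)] = 1.0
        return w
    if strategy == "K-3":   # proportional to membership probability
        return p / p.sum()
    raise ValueError(f"unknown strategy: {strategy}")
```

Squaring (K-1) sharpens the assignment relative to K-3, while K-2 is the hard-assignment extreme.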

29. Measure
    ◮ Cluster purity:
        $\text{purity} = \frac{1}{N} \sum_{j=1}^{K} \max_i C_{ij}$
      where $C_{ij}$ is the number of elements in the $i$-th true and $j$-th predicted cluster, and $N$ is the total number of elements
    ◮ Wilcoxon signed-rank test: significance of differences in clustering quality
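Purity as defined above is straightforward to compute from label arrays: for each predicted cluster, count its majority true class and normalise by the total. A short sketch with integer labels:

```python
import numpy as np

def purity(true_labels, pred_labels):
    """Cluster purity: (1/N) * sum over predicted clusters j of
    max_i C_ij, where C_ij counts elements in true cluster i and
    predicted cluster j."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    total = 0
    for j in np.unique(pred_labels):
        members = true_labels[pred_labels == j]
        # size of the majority true class within predicted cluster j
        total += np.bincount(members).max()
    return total / len(true_labels)
```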

30. Accuracy of State Estimation over Time
    [Figure: Feature 1 plotted over time across the Train, Valid 1, and Valid 2 phases; estimated components A, B, C tracking true clusters 1, 2, 3]
