Evolutionary Clustering
Presenter: Lei Tang
Evolutionary Clustering
• Processing time-stamped data to produce a sequence of clusterings.
• Each clustering should be similar to the history, while accurately reflecting the corresponding data.
• Trade-off between long-term concept drift and short-term variation.
Example I: Blogosphere
Blogosphere
• Community detection.
• The overall interest and friendship network drifts slowly.
• Short-term variation is triggered by external events.
Example II
• Moving objects equipped with GPS sensors are to be clustered (for traffic-jam prediction or animal-migration analysis).
• Each object follows a certain route in the long term.
• Its estimated coordinates at a given time may vary due to limitations on bandwidth and sensor accuracy.
The Goal
• Current clusters should mainly depend on the current data features.
• The data are expected not to change too quickly (temporal smoothness).
Related Work
• Online document clustering: mainly focused on novelty detection.
• Clustering data streams: scalability and one-pass access.
• Incremental clustering: efficiently applying dynamic updates.
• Constrained clustering: must-link/cannot-link constraints.
• Evolutionary clustering:
– The similarity among existing data points varies with time.
– How clusters evolve smoothly.
Basic Framework
• Snapshot quality sq(C_t, M_t): how well the clustering C_t represents the data M_t at time t.
• History cost hc(C_{t-1}, C_t): how far C_t deviates from the previous clustering C_{t-1}.
• The total quality of a cluster sequence trades these off over all time steps (see the reconstruction below).
• We try to find an optimal cluster sequence greedily, without knowing the future.
• At each step, find a clustering that maximizes the snapshot quality minus the weighted history cost.
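A plausible form of the total quality, reconstructed from the definitions above; the change parameter $cp$ weighting history against snapshot quality is an assumed name:

$$\text{Total quality} \;=\; \sum_{t} sq(C_t, M_t) \;-\; cp \sum_{t} hc(C_{t-1}, C_t)$$

and the greedy step at time $t$ picks $C_t = \arg\max_{C}\big[\, sq(C, M_t) - cp \cdot hc(C_{t-1}, C) \,\big]$.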
Construct the Similarity Matrix
• Local information similarity.
• Temporal similarity.
• Total similarity: a combination of the two (a sketch follows).
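A minimal sketch of one way to form the total similarity, assuming a simple convex blend of the current (local) similarity with the previous step's matrix; the weight `alpha` and the blending rule are assumptions, not the paper's exact definition:

```python
import numpy as np
from typing import Optional

def total_similarity(sim_now: np.ndarray,
                     sim_prev: Optional[np.ndarray],
                     alpha: float = 0.2) -> np.ndarray:
    """Blend the local (current-snapshot) similarity with temporal history.

    sim_now  -- pairwise similarity computed from the data at time t
    sim_prev -- total similarity from time t-1 (None at the first step)
    alpha    -- weight on history; an assumed knob, not the paper's notation
    """
    if sim_prev is None:
        return sim_now
    # Temporal smoothness: history enters as a convex combination.
    return (1.0 - alpha) * sim_now + alpha * sim_prev
```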
Instantiation I: K-means
• Snapshot quality: how compactly the current centroids fit the current data.
• History cost: the distance between each current centroid and its closest match from the previous time step.
• In each k-means iteration, the new centroid is interpolated between the centroid suggested by non-evolutionary k-means and its closest match from the previous time step (a sketch follows).
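A minimal sketch of the centroid-interpolation step just described; the change parameter `cp` and the nearest-centroid matching rule are assumptions about details the slide leaves implicit:

```python
import numpy as np

def evolve_centroids(new_centroids: np.ndarray,
                     prev_centroids: np.ndarray,
                     cp: float = 0.3) -> np.ndarray:
    """Pull each freshly computed centroid toward its closest match
    from the previous time step (temporal smoothness).

    cp -- assumed change parameter weighting history.
    """
    evolved = np.empty_like(new_centroids)
    for i, c in enumerate(new_centroids):
        # Closest match among the previous step's centroids.
        j = int(np.argmin(np.linalg.norm(prev_centroids - c, axis=1)))
        evolved[i] = (1.0 - cp) * c + cp * prev_centroids[j]
    return evolved
```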
Agglomerative Clustering
• This is more complicated: we need to define the similarity between two cluster trees (T, T').
• Snapshot quality: the sum of the qualities of all merges performed to create T.
• History cost: a distance between the merge structures of T and T'.
• Four greedy heuristics (skipped here):
– Squared
Experiment Setup
• Data: photo-tag pairs from flickr.com.
• Task: cluster the tags.
• Two tags are similar if they both occur on the same photo.
• However, the experiments in the paper don't make much sense to me.
Comments
• Pros:
– New problem.
– Effective heuristics.
– Temporal smoothness is incorporated in both the affinity matrix and the history cost.
• Cons:
– No global solution.
– Cannot handle changes in the number of clusters.
– The experiments seem unreasonable.
Evolutionary Spectral Clustering
• The idea is almost the same, but the focus here is on spectral clustering, which preserves nice properties (a global solution to a relaxed cut problem, connections to k-means).
• The idea is presented more clearly here.
• How to measure temporal smoothness?
– Measure the cluster quality on past data (PCQ).
– Compare the cluster memberships (PCM).
Spectral Clustering (1)
• K-way average association.
• Negated average association.
• Normalized cut.
• The basic objective is to minimize the normalized cut or the negated average association (definitions below).
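Standard definitions of the three objectives, written for a similarity matrix $W$, clusters $V_1,\dots,V_k$, and $assoc(A,B)=\sum_{i\in A,\, j\in B} W_{ij}$; the negated-average-association form is reconstructed from the trace identity used on a later slide:

$$AA = \sum_{l=1}^{k} \frac{assoc(V_l, V_l)}{|V_l|}, \qquad NA = \mathrm{tr}(W) - AA, \qquad NC = \sum_{l=1}^{k} \frac{cut(V_l, V \setminus V_l)}{assoc(V_l, V)}$$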
Spectral Clustering (2)
• Typical procedure (a sketch follows):
– Compute the eigenvectors X of some variant of the similarity matrix.
– Project all data points into span(X).
– Apply k-means to the projected data points to obtain the clustering result.
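A minimal sketch of that procedure, assuming the normalized variant $D^{-1/2} W D^{-1/2}$ and scikit-learn's `KMeans` for the rounding step; both are assumed choices, since the slide leaves the variant open:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W: np.ndarray, k: int) -> np.ndarray:
    """Cluster n points given an n x n symmetric similarity matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Normalized similarity D^{-1/2} W D^{-1/2} (normalized-cut variant).
    W_norm = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # The largest k eigenvectors span the relaxed solution.
    eigvals, eigvecs = np.linalg.eigh(W_norm)   # ascending eigenvalues
    X = eigvecs[:, -k:].copy()
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12  # row-normalize
    # Round the relaxed solution with k-means.
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)
```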
K-means Clustering
• Find a partition $\{V_1, V_2, \dots, V_k\}$ to minimize:

$$KM = \sum_{l=1}^{k} \sum_{v \in V_l} \| v - \mu_l \|^2, \qquad \mu_l = \frac{1}{|V_l|} \sum_{v \in V_l} v$$
Preserving Cluster Quality
• K-means: check how well the current clustering also fits the previous data.
• A hidden problem: we still need to find the mapping between current and previous clusters.
Negated Average Association (1)
• Similar to the k-means strategy:

$$NA = \mathrm{tr}(W) - \mathrm{tr}(Z^T W Z), \qquad \text{where } Z^T Z = I_k$$

• Since the first term is constant, we just need to maximize the second term.
Negated Average Association (2)
• The solutions to the relaxed problem are the largest k eigenvectors of the matrix.
• Notice that the solution is optimal only in terms of the relaxed problem.
• Connection to k-means: k-means can be reformulated in the same trace form (see below), so k-means is actually a special case of negated average association with a specific similarity definition.
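A worked form of the reformulation referred to above, using a data matrix $X$ whose rows are the points and the scaled partition indicator $Z$:

$$KM \;=\; \mathrm{tr}(X X^T) \;-\; \mathrm{tr}(Z^T X X^T Z)$$

so minimizing the k-means objective is exactly minimizing the negated average association with the inner-product similarity $W = X X^T$.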
Normalized Cut
• Normalized cut can also be represented in trace form, with certain constraints (a reconstruction follows).
• After a change of variables, we again have a trace maximization problem.
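A reconstruction of the substitution, with degree matrix $D = \mathrm{diag}(W\mathbf{1})$ and a scaled indicator $Y$; the exact notation is assumed:

$$NC \;=\; k - \mathrm{tr}(Y^T W Y), \qquad Y^T D Y = I_k$$

Substituting $\tilde{Y} = D^{1/2} Y$ gives the trace maximization $\max \mathrm{tr}(\tilde{Y}^T D^{-1/2} W D^{-1/2} \tilde{Y})$ subject to $\tilde{Y}^T \tilde{Y} = I_k$.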
Discussion on the PCQ Framework
• Very intuitive.
• The historic similarity matrix is scaled and combined with the current similarity matrix (a sketch follows).
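A minimal sketch of the PCQ combination, reusing the `spectral_clustering` routine sketched earlier; the weight names `alpha`/`beta` are assumptions:

```python
def pcq_spectral(W_now, W_prev, k, alpha=0.2):
    """Preserving Cluster Quality: spectral clustering on the blend
    beta * W_t + alpha * W_{t-1}, with beta = 1 - alpha (assumed weights)."""
    W_combined = (1.0 - alpha) * W_now + alpha * W_prev
    return spectral_clustering(W_combined, k)  # routine from the earlier sketch
```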
Preserving Cluster Membership
• Temporal cost is measured as the difference between the current partition and the historical partition.
• A chi-square statistic represents the distance between two partitions; the same distance instantiates the history cost for k-means.
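One commonly used matrix form of such a partition distance, consistent with the trace arguments elsewhere in the deck; this Frobenius-norm form is a reconstruction, not necessarily the paper's exact chi-square expression:

$$\mathrm{dist}(Z_t, Z_{t-1}) \;=\; \tfrac{1}{2}\,\big\| Z_t Z_t^T - Z_{t-1} Z_{t-1}^T \big\|_F^2$$

which is small when the two partitions assign points to clusters in nearly the same way.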
Negated Average Association (1)
• Distance: the partition distance above serves as the temporal cost.
• So the overall PCM cost again reduces to a trace maximization, now over a matrix combining $W_t$ with $Z_{t-1} Z_{t-1}^T$.
Negated Average Association (2)
• It can be shown that the unrelaxed partition problem takes the same form.
• So negated average association can be applied to solve the original evolutionary k-means.
Normalized Cut
• Straightforward.
Comparing PCQ & PCM
• As for the temporal cost:
– In PCQ, we need to maximize $\mathrm{tr}\big(Z^T(\beta W_t + \alpha W_{t-1})Z\big)$.
– In PCM, we need to maximize $\mathrm{tr}\big(Z^T(\beta W_t + \alpha Z_{t-1} Z_{t-1}^T)Z\big)$.
• Connection: in PCQ, all the eigenvectors of the historical matrix are considered and penalized according to their eigenvalues.
Real Blog Data
• 407 blogs over 63 consecutive weeks.
• 148,681 links.
• Two communities (ground truth, labeled manually based on content).
• The affinity matrix is constructed from the links.
Experiment Results
Comments
• Nice formulation with a global solution for the relaxed version.
• Strong connection between k-means and negated average association.
• Can handle new objects or a change in the number of clusters.
Any Questions?