K-means algorithm select K points (m 1 ,...,m K ) randomly do (w 1 - PowerPoint PPT Presentation

K-means algorithm select K points (m 1 ,...,m K ) randomly do (w 1 ,...,w K ) = (m 1 ,...,m K ) all clusters C i = {} for each row w in M find the closest point in (w 1 ,...,w K )to w assign w to the corresponding cluster: C i = C i ∪ {w} (if w i is closest point) end for each cluster C i calculate the mean point m i ≠ w i while exists m i

K-means clustering ● Input: M (set of points), K (number of clusters) m 1 ,...,m k (Initial centroids) ● Choosing K – Study the data – Measure how squared error decreases as more clusters are added ● Choosing centroids – Typically randomly

K-means clustering ● Pros: – Easy – Scalable ● Cons: – Works only for certain clusters – Sensitive to outliers and noise

K-means clustering

K-means clustering Bad initial points

K-means clustering Non-spherical clusters

Questions ● Using the euclidean distance one gets spherical clusters, what types of clusters does one get using the manhattan distance? ● If we assume that the K-means algorithm converges in I iterations, with N points and X characteristics for each point give an approximation of the complexity of the algorithm expressed in K,I,N and X ● Can the K-means algorithm be parallellized? if yes how?

Practical K-Means I want to cluster this class into 5 different clusters. Assume that I know: ● Your Age ● What row you are sitting in ● Wether you handed in the first assignment on time or not ● How many years you have studied at university Design a method to use K-means to create these clusters

DB Scan ● Density based clustering ● Connected regions with sufficiently high density ● Clusters with arbitrary shape ● Avoids outliers, noise

DB Scan - key concepts ●  -neighbourhood  – the neighbourhood within a radius of an object ● core object – an object is a core object iff there are more than MinPts  objects in its -neighbourhood ● directly density reachable (ddr) – An object p is ddr from q iff q is a core object and p  is inside the -neighbourhood of q

DB Scan - key concepts ● density reachable (dr) – an object q is dr from p iff there exists a chain of objects p 1 ,...,p n such that p 1 is ddr from p , p 2 is ddr from p 1 , p 3 is ddr from ... and q is ddr from p n . ● density connected (dc) – p is dc to q iff exist an object o such that p is dr from o and q is dr from o

DB Scan - How to use DB scan to cluster ● Idea: – If object p is density connected to q, then p and q should belong to the same cluster – If an object is not density connected to any other object it is considered as noise

DB Scan - How to use DB scan to cluster ● Naïve Algorithm: i = 0 How do you do this? do take a point p from M find the set of points P which are density connected to p if P = {} M = M / {p} else C i =P, i=i+1, M = M / P end ≠ while M {}

DB Scan - How to use DB scan to cluster ● More practical Algorithm: i = 0, Find the core points CP in M do How do you do this? take a point p from CP find the set of points P which are density reachable from p ∩ P) C i =P, i=i+1, CP = CP / (CP ≠ while CP {}

DB Scan - How to use DB scan to cluster find the set of points P which are density reachable from p C={p},P={p} do Remove a point p' from C Find all of the points X that are directly density reachable from p' C = C ∪ ( X \ ( P X)) ∩ P = P ∪ X ≠ while C {}

Questions ● Why is the density connected criterion useful to define a cluster, instead of density reachable or directly density reachable? ● For which points are density reachable symmetric? ● Express using only core objects and directly density reachable, which objects will belong to a cluster.

Practical db scan Try to use the db scan algorithm with the following parameters: MinPts: Eps: To determine if you are a core point, if you belong to a cluster.

K-means algorithm select K points (m 1 ,...,m K ) randomly do (w 1 - PowerPoint PPT Presentation

K-means algorithm select K points (m 1 ,...,m K ) randomly do (w 1 ,...,w K ) = (m 1 ,...,m K ) all clusters C i = {} for each row w in M find the closest point in (w 1 ,...,w K )to w assign w to the corresponding cluster: C i = C i {w} (if w

K-MEANS++ OPTIMAL INITIALIZATION ALGORITHM An Improved K-means Clustering Method OVERVIEW

1 K-means clustering The K-means clustering algorithm can be seen as applying the EM algorithm to

k -means++ seeding Have seen that the k -means algorithm can output arbitrarily poor solutions, if

Odds Algorithm An Online Algorithm Group Fibonado 20. Dec 2016 Group Fibonado Odds Algorithm

Lecture 23/Chapter 19 Diversity of Sample Means Means versus Proportions Behavior of

Cluster-based Segmentation Algorithm Why Shift What is Mean Idea K-means++ based Algorithm

Data Clustering: Data Clustering: 50 Years Beyond K means 50 Years Beyond K means 50 Years

K -means Clustering Ke Chen Reading: [7.3, EA], [9.1, CMB] COMP24111 Machine Learning Outline

Visible Surface Determination CS418 Computer Graphics John C. Hart Painters Algorithm

Algorithm Analysis October 12, 2016 CMPE 250 Algorithm Analysis October 12, 2016 1 / 66

A history of the k-means algorithm Hans-Hermann Bock, RWTH Aachen, Allemagne 1. Clustering with

LECTURE 7 Clustering The k-means algorithm Hierarchical Clustering The DBSCAN algorithm

Quantum algorithms Daniel J. Bernstein Quantum algorithm means an algorithm that a

What do quantum computers do? Daniel J. Bernstein Quantum algorithm means an algorithm

Multi-variable Optimization K-means clustering K-means clustering on points is finding K

11/11/2014 Chapter 22 INFERENCES ABOUT MEANS 1 SAMPLING DISTRIBUTION FOR MEANS Recall, the

Points, Distances, and Cellular Automata: Geometric and Spatial Algorithmics Luidnel Maignan

Informed search Lirong Xia Spring, 2017 Last class Search problems state space graph:

COMP9313: Big Data Management High Dimensional Similarity Search Similarity Search Problem

CS 472 Homework CS 472 - Homework 1 Perceptron Homework Assume a 3 input perceptron plus bias

CSC 1800 Organization of Programming Languages Scope 1 Scope and Names Scope determines

Types of Types Each type supports a set of valid operations. Types can be latent or

Processing Latent Variable Models and Signal Separation Bhiksha Raj Class 13. 15 Oct 2013

Business Statistics CONTENTS Comparing two samples Comparing two unrelated samples Comparing