Pattern Recognition 2019: Clustering, Mixture Models and EM
Ad Feelders, Universiteit Utrecht, December 13, 2019


  1. Pattern Recognition 2019: Clustering, Mixture Models and EM. Ad Feelders, Universiteit Utrecht, December 13, 2019.

  2. Objective of Clustering. Put objects (persons, images, web pages, ...) into a number of groups in such a way that the objects within the same group are similar, but the groups are dissimilar. [Figure: scatter plot of Variable 1 vs. Variable 2 illustrating objects grouped into clusters]

  3. Similarity between objects. Each object is described by a number of variables (also called features or attributes). The similarity between objects is determined on the basis of these variables. The measurement of similarity is central to many clustering methods.

  4. Clustering ≠ Classification. In classification the group to which an object belongs is given, and the task is to discriminate between groups on the basis of the variables used to describe the objects. In clustering the groups are not given, but the objective is to discover them. Clustering is sometimes called unsupervised learning, and classification supervised learning.

  5. Clustering Techniques. Many techniques have been developed to cluster objects into groups: hierarchical clustering (not discussed); partitioning methods (e.g. K-means, K-medoids); model-based clustering (mixture models).

  6. Data Matrix. We have observations on N objects that we want to cluster into a number of groups. For each object we observe D variables, numbered 1, 2, ..., D. Data matrix:

     X = \begin{pmatrix}
           x_{11} & \cdots & x_{1j} & \cdots & x_{1D} \\
           \vdots &        & \vdots &        & \vdots \\
           x_{n1} & \cdots & x_{nj} & \cdots & x_{nD} \\
           \vdots &        & \vdots &        & \vdots \\
           x_{N1} & \cdots & x_{Nj} & \cdots & x_{ND}
         \end{pmatrix}

     where x_{nj} denotes the value of object n for variable j.
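     A small R illustration of this layout (hypothetical values): rows are objects, columns are variables, so x_{nj} is element [n, j].

     # hypothetical data matrix with N = 4 objects and D = 2 variables
     > X <- matrix(c(1.2, 3.4,
                     0.8, 2.9,
                     5.1, 7.0,
                     4.9, 6.5),
                   nrow = 4, byrow = TRUE)
     > X[2, 1]   # x_{21}: value of object n = 2 on variable j = 1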

  7. Distance Measures: numeric variables. [Figure: Object 1 at (x_{11}, x_{12}) and Object 2 at (x_{21}, x_{22}) plotted against Variable 1 and Variable 2; the dashed line is the Euclidean distance, the solid line the Manhattan distance with legs x_{21} − x_{11} and x_{22} − x_{12}]

  8. Distance Measures: numeric variables. Manhattan distance between x_i and x_j:

     \sum_{d=1}^{D} | x_{id} - x_{jd} |.

     Squared Euclidean distance between x_i and x_j:

     \sum_{d=1}^{D} ( x_{id} - x_{jd} )^2 = \| x_i - x_j \|^2.
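     A quick check of these formulas in R (illustrative points; dist() is the standard base R function for pairwise distances):

     > xi <- c(1, 2, 3)
     > xj <- c(2, 0, 3)
     > sum(abs(xi - xj))      # Manhattan distance: 3
     > sum((xi - xj)^2)       # squared Euclidean distance: 5
     # dist() computes the same quantities for all pairs of rows
     > dist(rbind(xi, xj), method = "manhattan")
     > dist(rbind(xi, xj))^2  # default method is "euclidean"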

  9. Standardization. Units of measurement should not be important for cluster structure. Therefore variables are often standardized. For example:

     s_j = \sqrt{ \frac{1}{N-1} \sum_{n=1}^{N} ( x_{nj} - \bar{x}_j )^2 }

     Standardized measurement:

     x^*_{nj} = \frac{ x_{nj} - \bar{x}_j }{ s_j }

     x^*_j has mean zero and standard deviation 1.
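     In R, standardization is one call to scale(), which matches the formulas above (like sd(), it uses the N − 1 denominator):

     # standardize every column of a numeric data matrix X
     > X.std <- scale(X)
     # equivalently, by hand for a single column j
     > j <- 1
     > (X[, j] - mean(X[, j])) / sd(X[, j])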

  10. Partitioning methods. Search directly for a division of the N objects into K groups that maximizes the quality of the clustering. The number of distinct partitions P(N, K) of N objects into K non-empty groups is O(K^N). For example: P(100, 5) ≈ 10^{68}. Exhaustive search is not feasible.
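      The count P(N, K) is the Stirling number of the second kind; a minimal R sketch of its recurrence S(n, k) = k · S(n−1, k) + S(n−1, k−1) makes the explosive growth concrete (the function name is mine):

      # Stirling numbers of the second kind via dynamic programming
      > stirling2 <- function(N, K) {
          S <- matrix(0, N + 1, K + 1)   # S[n+1, k+1] holds S(n, k)
          S[1, 1] <- 1                   # base case S(0, 0) = 1
          for (n in 1:N)
            for (k in 1:K)
              S[n + 1, k + 1] <- k * S[n, k + 1] + S[n, k]
          S[N + 1, K + 1]
        }
      > stirling2(10, 3)     # 9330 partitions already for 10 objects
      > stirling2(100, 5)    # about 6.6e67, the 10^68 quoted above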

  11. K-means Clustering. There are many possibilities to measure the quality of a partition. In case of numeric data, one can use for example

      J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - \mu_k \|^2    (9.1)

      the sum of the squares of Euclidean distances of each data point to the center of the cluster to which it has been assigned. Here r_{nk} = 1 if x_n has been assigned to cluster k, and r_{nk} = 0 otherwise (1-of-K coding).
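      Written directly in R (a sketch; `labels` replaces the 1-of-K matrix r_{nk} with a vector of cluster indices in 1..K, and `mu` is the K × D matrix of cluster centers; both names are mine):

      # cost (9.1): squared Euclidean distance of each point to the
      # center of its assigned cluster, summed over all points
      > Jcost <- function(X, labels, mu) {
          sum(sapply(seq_len(nrow(X)),
                     function(n) sum((X[n, ] - mu[labels[n], ])^2)))
        }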

  12. Minimize J with respect to r_{nk} (E-step). Optimize for each point n separately by choosing r_{nk} = 1 for the value of k that gives the minimum distance \| x_n - \mu_k \|^2. More formally,

      r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \| x_n - \mu_j \|^2 \\ 0 & \text{otherwise} \end{cases}    (9.2)

  13. Minimize J with respect to \mu_j (M-step). Take the derivative of J with respect to \mu_j and equate it to zero:

      -2 \sum_{n=1}^{N} r_{nj} ( x_n - \mu_j ) = 0    (9.3)

      which gives

      \mu_j = \frac{ \sum_n r_{nj} x_n }{ \sum_n r_{nj} }    (9.4)

      i.e. the mean of the points that are assigned to cluster j.

  14. K-means algorithm.
      1. Partition the observations into K initial clusters.
      2. Calculate the mean of each cluster (M-step).
      3. Assign each observation to the cluster whose mean is nearest (E-step).
      4. If reassignments have taken place, return to step 2; otherwise stop.
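      A minimal R sketch of steps 1–4 (illustrative only: it assumes no cluster becomes empty during the iterations; base R's kmeans(), used on slide 19, is the robust implementation):

      # Lloyd's algorithm: alternate the M-step (9.4) and E-step (9.2)
      > kmeans_sketch <- function(X, K, max.iter = 100) {
          N <- nrow(X)
          labels <- sample(rep(1:K, length.out = N))  # step 1: initial partition
          for (it in 1:max.iter) {
            # step 2 (M-step): mean of each cluster
            mu <- t(sapply(1:K, function(k)
                      colMeans(X[labels == k, , drop = FALSE])))
            # step 3 (E-step): assign each point to the nearest mean
            d2 <- sapply(1:K, function(k) colSums((t(X) - mu[k, ])^2))
            new.labels <- max.col(-d2)                # row-wise arg min
            if (all(new.labels == labels)) break      # step 4: converged
            labels <- new.labels
          }
          list(cluster = labels, centers = mu)
        }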

  15. Old Faithful data set. [Figure: scatter plot of the Old Faithful data (eruption duration vs. waiting time)]

  16. Old Faithful data set. [Figure: panels (a)–(f), both axes from −2 to 2, showing successive E-steps and M-steps of K-means on the standardized Old Faithful data]

  17. Old Faithful data set. [Figure: panels (g)–(i), continuing the K-means iterations to convergence]

  18. Convergence of algorithm. [Figure: cost function J (vertical axis, 0 to 1000) plotted against iteration number (1 to 4)]

  19. How to do this in R

      # load library/package MASS
      > library(MASS)
      # scale data
      > faith.sc <- scale(faithful)
      # K-means with K=2 applied to faithful data
      > faithful.k2 <- kmeans(faith.sc, 2)
      # plot resulting clusters
      > plot(faith.sc[,1], faith.sc[,2], xlim=c(-2,2), type="n")
      > points(faith.sc[,1], faith.sc[,2],
               col=faithful.k2$cluster*2, pch=19)
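      One caveat worth adding to this recipe: kmeans() starts from a random partition, so results can differ between runs and may be a local minimum of J. Setting a seed and using the nstart argument (restart several times, keep the best J) are standard remedies:

      # reproducible run with 10 random restarts, keeping the best solution
      > set.seed(1)
      > faithful.k2 <- kmeans(faith.sc, centers = 2, nstart = 10)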

  20. Final clustering obtained. [Figure: standardized eruptions (horizontal axis, −2 to 2) vs. waiting (vertical axis, −2 to 2); points colored by the two clusters found]

  21. How many clusters? The required number of groups is usually not known in advance; determine the appropriate number from the data. Informal approach: plot the quality criterion against the number of groups and look for large jumps, as sketched below.
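      A sketch of this informal recipe in R, reusing faith.sc from slide 19 (tot.withinss is the value of J reported by kmeans(); substituting the Ruspini data, available as ruspini in the cluster package, reproduces the kind of plot on slide 23):

      # within-cluster sum of squares for K = 2, ..., 6
      > faith.sc <- scale(faithful)
      > wss <- sapply(2:6, function(k)
                 kmeans(faith.sc, centers = k, nstart = 10)$tot.withinss)
      > plot(2:6, wss, type = "b",
             xlab = "number of groups", ylab = "within sum of squares")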

  22. Example: Ruspini data. [Figure: scatter plot of the Ruspini data, Variable 1 (0 to 120) vs. Variable 2 (0 to 150)]

  23. Determining the number of groups. [Figure: within sum of squares (20000 to 80000) plotted against the number of groups (2 to 6)]

  24. K-medoids. Can be used with dissimilarity measures other than Euclidean distance. Uses a number of representative objects (called medoids) instead of means. Advantage: less sensitive to outliers than K-means (cf. the mean and median of a sample).

  25. K-medoids: cluster quality. Each object is assigned to the cluster corresponding to the nearest medoid. The K representative objects should minimize the sum of the dissimilarities of all objects to their nearest medoid, i.e.

      \tilde{J} = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, V( x_n , \mu_k )    (9.6)
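      In R, K-medoids is available as pam() ("partitioning around medoids") in the cluster package; unlike kmeans(), it accepts a Manhattan metric or a precomputed dissimilarity matrix:

      # K-medoids with K = 2 and Manhattan dissimilarities
      > library(cluster)
      > faith.pam <- pam(scale(faithful), k = 2, metric = "manhattan")
      > faith.pam$medoids      # the two representative objects
      > faith.pam$clustering   # cluster assignment of each observation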
