  1. Clustering: Models and Algorithms. Shikui Tu, 2019-02-28

  2. Outline • Clustering – K-means clustering, hierarchical clustering • Adaptive learning (online learning) – CL, FSCL, RPCL • Gaussian Mixture Models (GMM) • Expectation-Maximization (EM) for maximum likelihood

  3. What is clustering? [Figure: six malignant tumors (melanoma). Science, Vol. 352, Issue 6282, 8 April 2016]

  4. How to represent a cluster? • Represent each cluster by a single center (e.g., its mean $\mu$), as developed in the following slides

  5. How to define error? Squared distance from a center $\mu$ to a point $x_t$: $\|\mu - x_t\|^2$. Total error over the points: $\|\mu - x_1\|^2 + \|\mu - x_2\|^2 + \|\mu - x_3\|^2$. Which $\mu$ minimizes this error?

  6. Matrix derivatives (The Matrix Cookbook): http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3274/pdf/imm3274.pdf

  7. Clustering the data • We have the following data • We want to cluster the data into two clusters (red and blue) • How?

  8. Minimize the sum of squared distances $J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \|x_n - \mu_k\|^2$, where $r_{nk} = 1$ if and only if data point $x_n$ is assigned to cluster $k$, otherwise $r_{nk} = 0$; here $k = 1, 2$ ($K = 2$ clusters) and $n = 1, \ldots, N$, with $N$ the total number of points. We need to calculate the $\{r_{nk}\}$ and $\{\mu_k\}$ that minimize $J$.
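A minimal code sketch of this objective, assuming NumPy and illustrative names not taken from the slides: X is the (N, d) data matrix, R the (N, K) one-hot assignment matrix, and mu the (K, d) matrix of centers.

```python
import numpy as np

def kmeans_objective(X, R, mu):
    """J = sum_n sum_k r_nk * ||x_n - mu_k||^2 (sum of squared distances)."""
    diffs = X[:, None, :] - mu[None, :, :]   # (N, K, d) differences to every center
    sq_dists = np.sum(diffs ** 2, axis=2)    # (N, K) squared distances
    return np.sum(R * sq_dists)              # only assigned terms contribute
```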

  9. If we know $r_{n1}, r_{n2}$ for all $n = 1, \ldots, N$: since each point has been assigned to cluster 1 or cluster 2, we calculate $\mu_1$ = mean of the points in cluster 1 and $\mu_2$ = mean of the points in cluster 2. Formally, $\mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}$. We call it the M step.
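A sketch of this M step under the same assumed array conventions (NumPy; X is (N, d) data, R is (N, K) one-hot assignments):

```python
import numpy as np

def m_step(X, R):
    """mu_k = sum_n r_nk * x_n / sum_n r_nk: each center becomes the mean of its points."""
    counts = R.sum(axis=0)                        # number of points per cluster, shape (K,)
    sums = R.T @ X                                # per-cluster sums, shape (K, d)
    return sums / np.maximum(counts, 1)[:, None]  # guard against an empty cluster
```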

  10. If we know $\mu_1, \mu_2$: we should assign point $x_n$ to cluster 1 when $\|x_n - \mu_1\|^2 < \|x_n - \mu_2\|^2$, i.e., $r_{n1} = 1$ and $r_{n2} = 0$. Formally, $r_{nk} = 1$ if $k = \arg\min_j \|x_n - \mu_j\|^2$, otherwise $r_{nk} = 0$. We call it the E step.
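The corresponding E step, again as a NumPy sketch with the same illustrative names:

```python
import numpy as np

def e_step(X, mu):
    """r_nk = 1 iff k = argmin_j ||x_n - mu_j||^2: assign each point to its nearest center."""
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    R = np.zeros_like(sq_dists)
    R[np.arange(X.shape[0]), sq_dists.argmin(axis=1)] = 1           # one-hot winners
    return R
```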

  11. Initialization: choose initial centers $\mu_1$ and $\mu_2$.

  12. E step: given $\mu_1, \mu_2$, calculate $r_{n1}, r_{n2}$ for all $n = 1, \ldots, N$. Assign each point to the nearest cluster; the equal-distance line between the two centers separates the two assignments.

  13. M step: given $r_{n1}, r_{n2}$, calculate $\mu_1, \mu_2$ as the means of the points in each cluster.

  14. E step (repeated): given the updated $\mu_1, \mu_2$, recalculate $r_{n1}, r_{n2}$ for all $n = 1, \ldots, N$ by assigning each point to the nearest cluster.

  15. M step (repeated): given the updated $r_{n1}, r_{n2}$, recalculate $\mu_1, \mu_2$ as the means of the points in each cluster.

  16. Convergence: Initialization, then E step, M step, E step, M step, and so on. The algorithm converges when $J$ does not change, or when $\{\mu_1, \mu_2\}$ do not change.
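Putting the pieces together, a self-contained K-means loop that alternates the two steps and stops when J stops changing. Initializing by sampling K data points, and the names kmeans/X/mu/labels, are illustrative assumptions, not something the slides prescribe.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Alternate E and M steps; stop when the objective J no longer changes."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)  # initialization
    J_prev = np.inf
    for _ in range(n_iter):
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)     # (N, K) squared distances
        labels = sq.argmin(axis=1)                                    # E step: nearest center
        J = sq[np.arange(len(X)), labels].sum()                       # current objective
        if J == J_prev:                                               # convergence check
            break
        J_prev = J
        for k in range(K):                                            # M step: recompute means
            if np.any(labels == k):
                mu[k] = X[labels == k].mean(axis=0)
    return labels, mu, J
```

For example, `labels, mu, J = kmeans(X, K=2)` reproduces the two-cluster walk-through above on a NumPy array X; only a local optimum of J is guaranteed, which is the point raised on slide 19.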

  17. The K-means algorithm • Initialize the centers $\mu_1, \ldots, \mu_k$ • Repeat: assign each data point to its nearest center $\mu_i$; update each $\mu_i$ to the mean of the points assigned to it • Stop when the assignments (or the centers) no longer change

  18. Basic ingredients • Model or structure • Objective function • Algorithm • Convergence

  19. Questions for the K-means algorithm • Does it find the global optimum of J? No, only the nearest local optimum, depending on initialization • If Euclidean distance is not suitable for some data, do we have other choices? • Can we assign each data point to the clusters probabilistically? • If K (the total number of clusters) is unknown, can we estimate it from the data?

  20. Outline • Clustering – K-means clustering, hierarchical clustering • Adaptive learning (online learning) – CL, FSCL, RPCL • Gaussian Mixture Models (GMM) • Expectation-Maximization (EM) for maximum likelihood

  21. Hierarchical Clustering • k-means clustering requires: k, the positions of the initial centers, and a distance measure between points (e.g., Euclidean distance) • Hierarchical clustering requires a measure of distance between groups of data points. Adapted from Blei, D., Hierarchical Clustering [PowerPoint slides]. www.cs.princeton.edu/courses/archive/spr08/cos424/slides/clustering-2.pdf

  22. Hierarchical Clustering • Agglomerative clustering is a very simple procedure: assign each data point to its own group; repeatedly look for the two closest groups and merge them into one group; stop when all the data points are merged into a single cluster (a code sketch of this procedure follows the distance measures on the next slide). Adapted from Blei, D., Hierarchical Clustering [PowerPoint slides]. www.cs.princeton.edu/courses/archive/spr08/cos424/slides/clustering-2.pdf

  23. Distance Measure • Distance between data points a and b: $d(a, b)$ • Distance between groups A and B: – Single-linkage: $d(A, B) = \min_{a \in A, b \in B} d(a, b)$ – Complete-linkage: $d(A, B) = \max_{a \in A, b \in B} d(a, b)$ – Average-linkage: $d(A, B) = \frac{1}{|A| \cdot |B|} \sum_{a \in A, b \in B} d(a, b)$
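A short sketch of agglomerative clustering with these three linkages, assuming X is a NumPy array of points, Euclidean point distance, and a plain greedy merge loop; the function names are illustrative.

```python
import numpy as np

def linkage_dist(A, B, mode="single"):
    """Group distance d(A, B) for single, complete, or average linkage."""
    d = np.array([[np.linalg.norm(a - b) for b in B] for a in A])  # pairwise d(a, b)
    if mode == "single":
        return d.min()          # closest pair
    if mode == "complete":
        return d.max()          # farthest pair
    return d.mean()             # average over all |A|*|B| pairs

def agglomerative(X, mode="single"):
    """Merge the two closest groups until one cluster remains; return the merge history."""
    groups = [[x] for x in X]   # each data point starts in its own group
    merges = []
    while len(groups) > 1:
        pairs = [(i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))]
        i, j = min(pairs, key=lambda p: linkage_dist(groups[p[0]], groups[p[1]], mode))
        merges.append((i, j))   # record which groups were merged (for a dendrogram)
        groups[i] += groups[j]
        del groups[j]
    return merges
```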

  24. Dendrogram (tree of merges; the vertical axis shows the merge distance). Jain, A. K., Murty, M. N., Flynn, P. J. (1999). "Data Clustering: A Review". ACM Computing Surveys (CSUR), 31(3), pp. 264-323.

  25. Outline • Clustering – K-means clustering, hierarchical clustering • Adaptive learning (online learning) – CL, FSCL, RPCL • Gaussian Mixture Models (GMM) • Expectation-Maximization (EM) for maximum likelihood

  26. From batch to adaptive • Given a batch of data points • Data points come one by one: $x_1, x_2, \ldots, x_N$

  27. Competitive learning • Data points come one by one: $x_1, x_2, \ldots, x_N$
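The slide itself only shows the data stream; below is a sketch of the standard winner-take-all rule usually meant by competitive learning. The learning rate eta and the exact update form are assumptions, not taken from the slides.

```python
import numpy as np

def cl_step(x, mu, eta=0.05):
    """Winner-take-all: only the nearest center moves toward the incoming point x."""
    c = np.argmin(((mu - x) ** 2).sum(axis=1))  # winner = nearest center
    mu[c] += eta * (x - mu[c])                   # move the winner toward x
    return c
```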

  28. When starting with “bad initializations”

  29. A four-cluster case

  30. Frequency-sensitive competitive learning (FSCL) [Ahalt et al., 1990] • The idea is to penalize the frequent winners when selecting the winner for each incoming point.
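A hedged sketch of one common FSCL form, in which the winner is selected by a frequency-weighted distance so that frequent winners are handicapped; the exact weighting used in the slide's (missing) formula is an assumption here.

```python
import numpy as np

def fscl_step(x, mu, wins, eta=0.05):
    """Frequency-sensitive winner selection: minimize (relative win count) * squared distance."""
    sq = ((mu - x) ** 2).sum(axis=1)
    c = np.argmin((wins / wins.sum()) * sq)      # frequent winners are penalized
    mu[c] += eta * (x - mu[c])                   # only the winner moves toward x
    wins[c] += 1                                 # update its win count
    return c
```

The win counts would be initialized to ones, e.g. `wins = np.ones(K)`, so that every center can win at the start.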

  31. FSCL is not good when there are extra centers • When k is pre-assigned to 5, the frequency-sensitive mechanism also pulls the extra center into the data, disturbing the correct locations of the others.

  32. Rival penalized competitive learning (RPCL) (Xu, Krzyzak, & Oja, 1992, 1993) • RPCL differs from FSCL by implementing $p_{j,t}$ as follows: $p_{j,t} = 1$ if $j$ is the winner, $p_{j,t} = -\gamma$ if $j$ is the rival (the second winner), and $p_{j,t} = 0$ otherwise, where $\gamma$ takes a value roughly between 0.05 and 0.1 to control the penalizing strength.
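A sketch of the corresponding online update, following the $p_{j,t}$ values above: the winner learns toward x while the rival is de-learned with strength gamma. Sharing a single base learning rate eta is an assumed detail.

```python
import numpy as np

def rpcl_step(x, mu, wins, eta=0.05, gamma=0.05):
    """RPCL update: mu_j += eta * p_j * (x - mu_j), p = 1 (winner), -gamma (rival), 0 otherwise."""
    sq = ((mu - x) ** 2).sum(axis=1)
    order = np.argsort((wins / wins.sum()) * sq)  # frequency-sensitive ranking, as in FSCL
    c, r = order[0], order[1]                     # winner and rival (second winner)
    mu[c] += eta * (x - mu[c])                    # learning: winner moves toward x
    mu[r] -= gamma * eta * (x - mu[r])            # de-learning: rival pushed away
    wins[c] += 1
    return c, r
```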

  33. The rival-penalized mechanism drives the extra agents far away.

  34. Thank you!
