
TOWARDS AN OPTIMAL SUBSPACE FOR K-MEANS (Advisor: Jia-Ling Koh)



  1. Towards an Optimal Subspace for K-Means. Advisor: Jia-Ling Koh. Speaker: Yin-Hsiang Liao. 2018/01/30, from KDD 2017.

  2. Introduction. Which two attributes would you pick if you want to show the clusters? Petal_length and Petal_width. (Iris dataset)

  3. Introduction. Which two attributes would you pick if you want to show the clusters? No obvious pair. PCA?

  4. Introduction. PCA: an orthogonal linear transformation; components are ordered by descending variance.

  5. Introduction. Motivation: a problem of k-means is the "curse of dimensionality": clusterings in high-dimensional space are hard to interpret.

  6. Introduction. Goal: optimal dimensionality reduction for k-means.

  7. Method.

  8. Method: k-means. In terms of an objective function: minimize the sum of squared errors (Tan, pp. 499, 514).
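The blank on this slide held the standard k-means objective from Tan's textbook; written out (notation assumed: C_i is cluster i, mu_i its centroid):

```latex
% k-means minimizes the total within-cluster sum of squared errors
J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^{2}
```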

  9. Method. In short.

  10. Method. Objective function: a clustered-space term plus a noise-space term.
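A reconstruction of the objective the blanks stand for, as I read the SubKmeans paper (assumptions: $V$ is an orthonormal $d \times d$ rotation, $P_c$ and $P_n$ project onto the first $m$ "clustered" and the remaining $d-m$ "noise" coordinates, and $\mu_D$ is the mean of the whole dataset $D$):

```latex
J = \underbrace{\sum_{i=1}^{k} \sum_{x \in C_i}
      \lVert P_c^{\top} V^{\top} x - P_c^{\top} V^{\top} \mu_i \rVert^{2}}_{\text{clustered space}}
  + \underbrace{\sum_{x \in D}
      \lVert P_n^{\top} V^{\top} x - P_n^{\top} V^{\top} \mu_D \rVert^{2}}_{\text{noise space}}
```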

  11. Method. Intuition for having the noise term.

  12. Method. Minimize the objective function: rather than gradient descent, transform the problem into an eigendecomposition.
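My reading of where the derivation ends up (an assumption about the paper's result, with $S_i$ the scatter matrix of cluster $C_i$ and $S_D$ the scatter matrix of the whole dataset):

```latex
% V = eigenvectors of the symmetric matrix below (eigenvalues ascending);
% m = number of strictly negative eigenvalues
\sum_{i=1}^{k} S_i - S_D,
\qquad S_i = \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^{\top},
\qquad S_D = \sum_{x \in D} (x - \mu_D)(x - \mu_D)^{\top}
```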

  13. Method. Let's do the math: the clustered-space term.

  14. Method. Let's do the math.

  15. Method. Let's do the math.

  16. Method. Let's do the math (2).

  17. Method. Let's do the math: the cyclic permutation property.
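The cyclic permutation property of the trace, which is what lets the rotation $V$ be pulled out of the squared norm:

```latex
\operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB),
\quad\text{so}\quad
\lVert P_c^{\top} V^{\top} (x - \mu_i) \rVert^{2}
  = \operatorname{tr}\!\bigl( V P_c P_c^{\top} V^{\top} \, (x - \mu_i)(x - \mu_i)^{\top} \bigr)
```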

  18. Method. Let's do the math (2).

  19. Method. Let's do the math.

  20. Method. Let's do the math.

  21. Method. Let's do the math.

  22. Method.

  23. Return the nearest centroid and add the point to that centroid's cluster.

  24. Method. Repeat for all clusters.
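Slides 22-24 describe the alternating optimization: assign each point to the nearest centroid in the clustered subspace, update the centroids, then refresh the rotation. A minimal runnable sketch of one such iteration, assuming the eigendecomposition update outlined above (the function name, the scatter-matrix construction, and taking m as the number of negative eigenvalues are my reading of the paper, not the slides'):

```python
import numpy as np

def subkmeans_iteration(X, mu, V, m):
    """One SubKmeans-style iteration (hypothetical sketch):
    assign in the m-dim clustered subspace, update centroids,
    then refresh the rotation V via an eigendecomposition."""
    Pc = V[:, :m]                       # d x m: projection onto the clustered space
    Xc = X @ Pc                         # points mapped into the clustered space
    Mc = mu @ Pc                        # centroids mapped into the clustered space
    # Assignment: nearest centroid, measured only in the clustered space
    d2 = ((Xc[:, None, :] - Mc[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)
    # Centroid update in the full space (assumes no cluster became empty)
    k = mu.shape[0]
    mu = np.stack([X[labels == i].mean(axis=0) for i in range(k)])
    # Rotation update: eigenvectors of (sum of cluster scatters) - dataset scatter
    Xd = X - X.mean(axis=0)
    S_D = Xd.T @ Xd
    S_sum = sum((X[labels == i] - mu[i]).T @ (X[labels == i] - mu[i])
                for i in range(k))
    evals, evecs = np.linalg.eigh(S_sum - S_D)   # eigenvalues in ascending order
    V = evecs
    # Clustered dims = number of clearly negative eigenvalues (at least 1)
    m = max(1, int((evals < -1e-10).sum()))
    return labels, mu, V, m
```

The assignment step costs roughly O(n k m) and the eigendecomposition roughly O(d^3) per iteration.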

  25. Method. Diagonalization; complexity.

  26. Experiment. Compared against k-means with PCA and ICA, and against 4 algorithms that reduce dimensionality during clustering: LDA-k-means, FOSSCLU, ORCLUS, and 4C. Each run 40 times.

  27.

  28. Experiment.

  29. Experiment.

  30. Experiment. Problems of the competitors: LDA-k-means uses a fixed (k-1) dimensions and overfits in high dimensions; FOSSCLU is slow.

  31. Experiment. Limitations, shared with k-means: sensitive to outliers, non-globular clusters, and clusters of different sizes and densities; requires meaningful centroids.

  32. Conclusion. SubKmeans is: a k-means extension; dim(clustered space) is determined automatically; the only parameter is k; easy to implement and fast.
