TOWARDS AN OPTIMAL SUBSPACE FOR K-MEANS. ADVISOR: JIA-LING KOH. SPEAKER: YIN-HSIANG LIAO. 2018/01/30, FROM KDD 2017.
Introduction Which two attributes would you pick if you want to show the clusters? Petal_length, Petal_width. Iris dataset 2
Introduction Which two attributes would you pick if you want to show the clusters? No obvious pair. PCA? 3
Introduction PCA: An orthogonal linear transformation. Components are sorted in descending order of variance. 4
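The PCA description above can be illustrated with a minimal sketch using NumPy only. The function name `pca` and the synthetic data are assumptions for illustration, not part of the talk.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # re-sort: descending variance
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components, eigvals[order]

# Synthetic data with very different variances per attribute (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.1])
Z, variances = pca(X, 2)
```

The columns of the eigenvector matrix are orthonormal, so the transformation is orthogonal, and the variance of each projected column equals the corresponding eigenvalue.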
Introduction • Motivation: A problem of k-means: the “curse of dimensionality.” (Results are hard to interpret.) 5
Introduction • Goal: Optimal dimensionality reduction for k-means. 6
Method 7
Method K-means, in terms of an objective function: minimize the sum of squared errors (SSE) between each point and its cluster centroid. (Tan, pp. 499, 514) 8
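As a small illustration of the k-means objective, the sketch below evaluates the sum of squared distances between each point and its assigned centroid. The helper name `kmeans_sse` and the toy data are assumptions for illustration.

```python
import numpy as np

def kmeans_sse(X, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    diffs = X - centroids[labels]     # centroid of each point's cluster
    return float(np.sum(diffs ** 2))

# Two tight clusters, separated along the first attribute
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
sse = kmeans_sse(X, labels, centroids)
```

K-means alternates between reassigning points and recomputing centroids; neither step can increase this value.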
Method In short, 9
Method Objective function = (clustered space term) + (noise space term) 10
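A minimal sketch of how such a two-part cost could be evaluated. It assumes (this is not spelled out on the slide) an orthonormal matrix `V` whose first `m` columns span the clustered space and whose remaining columns span the noise space, with the noise term measured against the mean of the whole data set. The function name and arguments are hypothetical.

```python
import numpy as np

def subkmeans_cost(X, labels, V, m):
    """Clustered-space k-means cost plus noise-space scatter (illustrative)."""
    P_C = V[:, :m]                    # assumed: first m columns = clustered space
    P_N = V[:, m:]                    # assumed: remaining columns = noise space
    mu_D = X.mean(axis=0)             # mean of the whole data set
    cost = np.sum(((X - mu_D) @ P_N) ** 2)           # noise-space term
    for i in np.unique(labels):
        Xi = X[labels == i]
        cost += np.sum(((Xi - Xi.mean(axis=0)) @ P_C) ** 2)   # clustered-space term
    return float(cost)

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
identity = np.eye(2)                  # clustered space = first attribute
swapped = np.eye(2)[:, ::-1]          # clustered space = second attribute
cost_good = subkmeans_cost(X, labels, identity, 1)
cost_bad = subkmeans_cost(X, labels, swapped, 1)
```

In this toy data the clusters differ only in the first attribute, so placing that attribute in the clustered space gives a much lower cost than placing it in the noise space.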
Method Intuition of having noise term. 11
Method Minimize the objective function: Gradient descent? Instead, transform the problem into an eigendecomposition problem. 12
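One way the eigendecomposition step could look is sketched below. It assumes the objective reduces to a trace involving the matrix (sum of per-cluster scatter matrices) minus (scatter matrix of the whole data set), and that the clustered space is spanned by the eigenvectors with negative eigenvalues. Both assumptions and all names are illustrative, not taken from the slide.

```python
import numpy as np

def scatter(X):
    """Scatter matrix of X around its own mean."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc

def update_subspace(X, labels, tol=1e-10):
    """Return (V, m): eigenvectors sorted by ascending eigenvalue, and the
    number of clearly negative eigenvalues (assumed clustered-space size)."""
    S_D = scatter(X)
    S_clusters = sum(scatter(X[labels == i]) for i in np.unique(labels))
    sigma = S_clusters - S_D
    eigvals, V = np.linalg.eigh(sigma)    # symmetric input, ascending eigenvalues
    m = int(np.sum(eigvals < -tol))
    return V, m

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
V, m = update_subspace(X, labels)
```

For this toy data only the first attribute separates the clusters, so a single direction is selected for the clustered space.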
Method Let's do the math (clustered space) 13
Method Let's do the math 14
Method Let's do the math 15
Method Let's do the math (2) 16
Method Let's do the math: cyclic permutation property of the trace 17
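The cyclic permutation property of the trace, Tr(ABC) = Tr(BCA) = Tr(CAB), can be checked numerically. The second check shows how a squared norm of a projected vector can be rewritten as a trace; the specific matrices `V` and `P` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Cyclic permutation: the matrices rotate, the trace stays the same
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 5))
C = rng.normal(size=(5, 3))
t1 = np.trace(A @ B @ C)
t2 = np.trace(B @ C @ A)
t3 = np.trace(C @ A @ B)

# Squared norm as a trace: ||P^T V^T x||^2 = Tr(P P^T V^T x x^T V)
V, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # some orthonormal matrix
P = np.eye(4)[:, :2]                           # keeps the first two coordinates
x = rng.normal(size=4)
lhs = np.sum((P.T @ V.T @ x) ** 2)
rhs = np.trace(P @ P.T @ V.T @ np.outer(x, x) @ V)
```

A scalar equals its own trace, which is what lets the squared norm be moved inside `Tr(...)` before the factors are rotated.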
Method Let's do the math (2) 18
Method Let's do the math 19
Method Let's do the math 20
Method Let's do the math 21
Method 22
Return the nearest centroid; add the point to that centroid's cluster. 23
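A minimal sketch of this assignment step, assuming distances are measured only in the clustered space (taken here to be the first `m` columns of an orthonormal matrix `V`). The function name and the toy data are hypothetical.

```python
import numpy as np

def assign_in_clustered_space(X, centroids, V, m):
    """Assign each point to the nearest centroid, ignoring noise directions."""
    P = V[:, :m]                      # assumed: first m columns = clustered space
    Xp = X @ P                        # projected points
    Cp = centroids @ P                # projected centroids
    d2 = ((Xp[:, None, :] - Cp[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# The second attribute is pure noise with a large range
X = np.array([[1.0, 100.0], [9.0, 0.0]])
centroids = np.array([[0.0, 0.0], [10.0, 100.0]])
labels_sub = assign_in_clustered_space(X, centroids, np.eye(2), 1)
labels_full = assign_in_clustered_space(X, centroids, np.eye(2), 2)
```

With `m = 1` the noisy second attribute is ignored and the points go to centroids 0 and 1. With `m = 2` (plain k-means assignment) the noise dominates and the assignment flips.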
Method For all clusters 24
Method Diagonalization Complexity: 25
Experiment Compare to k-means with PCA and ICA. Compare to 4 algorithms that perform dimensionality reduction during clustering: LDA-k-means, FOSSCLU, ORCLUS, 4C. Run each 40 times. 26
27
Experiment 28
Experiment 29
Experiment Problems of the competitors: LDA-k-means: fixed to (k-1) dimensions, overfits in high dimensions. FOSSCLU: slow. 30
Experiment Limitations: Same as k-means: sensitive to outliers, non-globular clusters, and clusters of different sizes or densities. Clusters need to have “centroids.” 31
Conclusion SubKmeans is: An extension of k-means. dim(clustered space) is determined automatically. The only parameter is k. Easy to implement and fast. 32