K-means++: The Advantages of Careful Seeding


  1. K-means++: The Advantages of Careful Seeding. Sergei Vassilvitskii and David Arthur (Stanford University)

  2. Clustering. Given $n$ points in $\mathbb{R}^d$, split them into $k$ similar groups.

  3. Clustering. Given $n$ points in $\mathbb{R}^d$, split them into $k$ similar groups. This talk: k-means clustering. Find $k$ centers $C$ that minimize $\phi = \sum_{x \in X} \min_{c \in C} \|x - c\|^2$.

  4. Why Means? Objective: find $k$ centers $C$ that minimize $\sum_{x \in X} \min_{c \in C} \|x - c\|^2$. For one cluster: find $y$ that minimizes $\sum_{x \in X} \|x - y\|^2$. Easy! $y = \frac{1}{|X|} \sum_{x \in X} x$.
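
The "Easy!" step is one line of standard calculus, spelled out here for completeness: set the gradient of the one-cluster objective to zero,

    \nabla_y \sum_{x \in X} \|x - y\|^2 = 2|X|\,y - 2\sum_{x \in X} x = 0
    \;\Longrightarrow\; y = \frac{1}{|X|} \sum_{x \in X} x .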

  5. Lloyd’s Method: k-means Initialize with random clusters

  6. Lloyd’s Method: k-means Assign each point to nearest center

  7. Lloyd’s Method: k-means Recompute optimum centers (means)

  8. Lloyd’s Method: k-means Repeat: Assign points to nearest center

  9. Lloyd’s Method: k-means Repeat: Recompute centers

  10. Lloyd’s Method: k-means Repeat...

  11. Lloyd’s Method: k-means Repeat...Until clustering does not change
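
The looped slides above translate directly into code. A minimal NumPy sketch of Lloyd's method (my illustration, not the authors' implementation; X is an (n, d) array of points and centers a (k, d) array of initial centers):

    import numpy as np

    def lloyd(X, centers, max_iter=100):
        # Lloyd's method: alternate the assignment and mean steps
        # until the clustering no longer changes.
        for _ in range(max_iter):
            # Assign each point to its nearest center.
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Recompute each center as the mean of its assigned points
            # (keep the old center if its cluster went empty).
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(len(centers))
            ])
            if np.allclose(new_centers, centers):
                break  # clustering did not change
            centers = new_centers
        return centers, labels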

  12. Analysis. How good is this algorithm? It finds a local optimum, which can be arbitrarily worse than the optimal solution.

  13. Approximating k-means.
  • Mount et al.: $(9 + \epsilon)$-approximation in time $O(n^3 / \epsilon^d)$.
  • Har-Peled et al.: $(1 + \epsilon)$-approximation in time $O(n + k^{k+2} \epsilon^{-2dk} \log^k (n/\epsilon))$.
  • Kumar et al.: $(1 + \epsilon)$-approximation in time $2^{(k/\epsilon)^{O(1)}} nd$.

  14. Approximating k-means (cont.) Lloyd's method:
  • Worst-case time complexity: $2^{\Omega(\sqrt{n})}$.
  • Smoothed complexity: $n^{O(k)}$.

  15. Approximating k-means (cont.) Lloyd's method in practice, e.g. on the Digit Recognition dataset (UCI): $n = 60{,}000$, $d = 600$, convergence to a local optimum in 60 iterations.

  16. Challenge. Develop an approximation algorithm for k-means clustering that is competitive with the k-means method in speed and solution quality. Easiest line of attack: focus on the initial center positions. Classical k-means: pick $k$ points at random.

  17–18. k-means on Gaussians

  19–23. Easy Fix. Select centers using a furthest-point algorithm (a 2-approximation to k-center clustering). A sketch of this rule follows.
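
A sketch of the furthest-point seeding just described (my illustration; the function name and interface are made up). The greedy rule is exactly why the method chases outliers, as the next slides show:

    import numpy as np

    def furthest_point_seeds(X, k, rng=None):
        # Greedy furthest-point seeding (2-approximation for k-center).
        rng = rng or np.random.default_rng()
        centers = [X[rng.integers(len(X))]]            # arbitrary first center
        d2 = np.sum((X - centers[0]) ** 2, axis=1)     # squared dist. to nearest center
        for _ in range(k - 1):
            centers.append(X[int(d2.argmax())])        # furthest point becomes next center
            d2 = np.minimum(d2, np.sum((X - centers[-1]) ** 2, axis=1))
        return np.array(centers)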

  24–26. Sensitive to Outliers

  27. k-means++. Interpolate between the two methods: let $D(x)$ be the distance between $x$ and the nearest cluster center, and sample $x$ proportionally to $(D(x))^\alpha = D^\alpha(x)$.
  • Original Lloyd's: $\alpha = 0$.
  • Furthest point: $\alpha = \infty$.
  • k-means++: $\alpha = 2$, the contribution of $x$ to the overall error.
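
The $D^\alpha$ sampling rule is a few lines of NumPy. A sketch covering all three settings (assumed interface; alpha=0 is uniform seeding, alpha=2 is k-means++, and alpha → infinity degenerates to the furthest-point rule above):

    import numpy as np

    def d_alpha_seeds(X, k, alpha=2.0, rng=None):
        # Sample each new center with probability proportional to D(x)^alpha.
        rng = rng or np.random.default_rng()
        centers = [X[rng.integers(len(X))]]            # first center: uniform at random
        d = np.linalg.norm(X - centers[0], axis=1)     # D(x) for every point
        for _ in range(k - 1):
            p = d ** alpha
            p = p / p.sum()                            # Pr[x] = D(x)^alpha / sum_y D(y)^alpha
            centers.append(X[rng.choice(len(X), p=p)])
            d = np.minimum(d, np.linalg.norm(X - centers[-1], axis=1))
        return np.array(centers)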

  28. k-means++

  29. k-means++. Theorem: k-means++ is $\Theta(\log k)$-approximate in expectation. Ostrovsky et al. [06]: a similar method is $O(1)$-approximate under some data-distribution assumptions.
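
For reference, this seeding is widely available in standard libraries; scikit-learn's KMeans, for example, uses it by default (usage sketch, assuming X is an (n, d) array of points):

    from sklearn.cluster import KMeans

    # k-means++ seeding followed by Lloyd's iterations.
    km = KMeans(n_clusters=10, init="k-means++", n_init=10, random_state=0)
    labels = km.fit_predict(X)
    print(km.inertia_)  # final cost: sum of squared distances to nearest centers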

  30. Proof, 1st cluster. Fix an optimal clustering $C^*$. Pick the first center uniformly at random. Bound the total error of that cluster.

  31. Proof, 1st cluster. Let $A$ be the cluster. Each point $a_0 \in A$ is equally likely to be the chosen center. Expected error:
  $E[\phi(A)] = \frac{1}{|A|} \sum_{a_0 \in A} \sum_{a \in A} \|a - a_0\|^2 = 2 \sum_{a \in A} \|a - \bar{A}\|^2 = 2\phi^*(A)$.
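
The last equality uses a standard identity that the slide leaves implicit: for any $a_0$,

    \sum_{a \in A} \|a - a_0\|^2 = \sum_{a \in A} \|a - \bar{A}\|^2 + |A| \, \|a_0 - \bar{A}\|^2 ,

so averaging over the uniformly random choice of $a_0 \in A$ gives $\phi^*(A) + \sum_{a_0 \in A} \|a_0 - \bar{A}\|^2 = 2\phi^*(A)$.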

  32. Proof, other clusters. Suppose the next center comes from a new cluster of OPT. Bound the total error of that cluster.

  33. Other Clusters. Let $B$ be this cluster, and $b_0$ the point selected. Then:
  $E[\phi(B)] = \sum_{b_0 \in B} \frac{D^2(b_0)}{\sum_{b \in B} D^2(b)} \cdot \sum_{b \in B} \min(D(b), \|b - b_0\|)^2$.
  Key step (triangle inequality): $D(b_0) \le D(b) + \|b - b_0\|$.

  34. Cont. For any $b$: $D^2(b_0) \le 2D^2(b) + 2\|b - b_0\|^2$. Averaging over all $b$:
  $D^2(b_0) \le \frac{2}{|B|} \sum_{b \in B} D^2(b) + \frac{2}{|B|} \sum_{b \in B} \|b - b_0\|^2$.
  The first term is the same for all $b_0$; the second is the cost under uniform sampling.

  35. Cont. Recall:
  $E[\phi(B)] = \sum_{b_0 \in B} \frac{D^2(b_0)}{\sum_{b \in B} D^2(b)} \cdot \sum_{b \in B} \min(D(b), \|b - b_0\|)^2 \le \frac{4}{|B|} \sum_{b_0 \in B} \sum_{b \in B} \|b - b_0\|^2 = 8\phi^*(B)$.

  36. Wrap Up. If clusters are well separated, and we always pick a center from a new optimal cluster, the algorithm is 8-competitive.

  37. Wrap Up (cont.) Intuition: if no points from a cluster are picked, then it probably does not contribute much to the overall error.

  38. Wrap Up (cont.) Formally, an inductive proof shows this method is $\Theta(\log k)$-competitive.

  39. Experiments. Tested on several datasets:
  • Synthetic: 10k points, 3 dimensions
  • Cloud Cover (UCI Repository): 10k points, 54 dimensions
  • Color Quantization: 16k points, 16 dimensions
  • Intrusion Detection (KDD Cup): 500k points, 35 dimensions

  40. Typical Run. [Chart: Error (y-axis) vs. Stage (x-axis) for LLOYD, HYBRID, and KM++.]

  41. Experiments. Total error:

                     k-means       km-Hybrid     k-means++
      Synthetic      0.016         0.015         0.014
      Cloud Cover    6.06 x 10^5   6.02 x 10^5   5.95 x 10^5
      Color          741           712           670
      Intrusion      32.9 x 10^3   --            3.4 x 10^3

  Time: k-means++ is about 1% slower due to initialization.

  42. Final Message Friends don’t let friends use k-means.

  43. Thank You Any Questions?
