Clustering and K-means, Root Mean Square Error (RMS)


  1. Clustering and K-means

  2. Root Mean Square Error (RMS). Data: $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N \in \mathbb{R}^d$. Approximations: $\vec{z}_1, \vec{z}_2, \ldots, \vec{z}_N \in \mathbb{R}^d$. Root Mean Square error $= \sqrt{\frac{1}{N}\sum_{i=1}^{N} \|\vec{x}_i - \vec{z}_i\|_2^2}$.
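
A minimal NumPy sketch of this computation (the arrays `X` and `Z` are hypothetical stand-ins for the data and its approximations):

```python
import numpy as np

def rms_error(X, Z):
    """Root mean square error between data X and approximations Z.

    X, Z: arrays of shape (N, d); row i holds x_i and its approximation z_i.
    """
    # squared distance ||x_i - z_i||^2 for every i, then the mean, then the square root
    return np.sqrt(np.mean(np.sum((X - Z) ** 2, axis=1)))

# Hypothetical example: N = 4 points in R^2, each approximated by the overall mean
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
Z = np.tile(X.mean(axis=0), (len(X), 1))
print(rms_error(X, Z))  # ~0.707
```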

  3. PCA-based prediction. Data: $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N \in \mathbb{R}^d$. Mean vector: $\vec{\mu}$. Top $k$ eigenvectors: $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k$. Approximation of $\vec{x}_j$: $\vec{o}_j = \vec{\mu} + \sum_{i=1}^{k} \big((\vec{x}_j - \vec{\mu}) \cdot \vec{v}_i\big)\,\vec{v}_i$. RMS Error $= \sqrt{\frac{1}{N}\sum_{i=1}^{N} \|\vec{x}_i - \vec{o}_i\|_2^2}$. [Figure: scatter plot of the data points (x) and their PCA approximations (o).]
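
A NumPy sketch of this approximation; the eigenvectors are taken from the covariance matrix and the projection coefficients are computed on the centered data $(\vec{x}_j - \vec{\mu})$, as in the formula above. The function and array names are illustrative:

```python
import numpy as np

def pca_approximation(X, k):
    """Approximate each row of X using the mean vector and the top-k eigenvectors."""
    mu = X.mean(axis=0)                      # mean vector
    C = np.cov(X - mu, rowvar=False)         # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns eigenvalues in ascending order
    V = eigvecs[:, -k:]                      # columns = top-k eigenvectors v_1..v_k
    # o_j = mu + sum_i ((x_j - mu) . v_i) v_i for every j, done as one matrix product
    O = mu + (X - mu) @ V @ V.T
    rms = np.sqrt(np.mean(np.sum((X - O) ** 2, axis=1)))
    return O, rms
```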

  4. Regression-based prediction. Data: $(\vec{x}_1, y_1), (\vec{x}_2, y_2), \ldots, (\vec{x}_N, y_N)$ with $\vec{x}_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Input: $\vec{x} \in \mathbb{R}^d$. Output: $y \in \mathbb{R}$. Approximation of $y$ given $\vec{x}$: $\hat{y} = a_0 + \sum_{i=1}^{d} a_i x_i$. RMS Error $= \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$. [Figure: scatter plot of the data points (x) and the regression predictions (o).]
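
A least-squares sketch of this model in NumPy (illustrative names; the intercept $a_0$ is handled by prepending a column of ones):

```python
import numpy as np

def fit_linear(X, y):
    """Fit y ~ a_0 + sum_i a_i * x_i by least squares; return coefficients a of length d+1."""
    A = np.hstack([np.ones((len(X), 1)), X])   # column of ones gives the intercept a_0
    a, *_ = np.linalg.lstsq(A, y, rcond=None)
    return a

def predict(a, X):
    return a[0] + X @ a[1:]

def rms_error(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))
```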

  5. K-means clustering. Data: $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N \in \mathbb{R}^d$. Model: $k$ representatives $\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_k \in \mathbb{R}^d$. Approximation of $\vec{x}_j$: $\vec{o}_j = \arg\min_{\vec{r}_i} \|\vec{x}_j - \vec{r}_i\|_2^2$, i.e. the representative closest to $\vec{x}_j$. RMS Error $= \sqrt{\frac{1}{N}\sum_{i=1}^{N} \|\vec{x}_i - \vec{o}_i\|_2^2}$.
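
Given fixed representatives, the assignment and the RMS error can be computed as in this NumPy sketch (`X` holds the data points and `R` the representatives; both names are illustrative):

```python
import numpy as np

def kmeans_rms_error(X, R):
    """RMS error of approximating each x_j by its closest representative in R."""
    # distances[j, i] = ||x_j - r_i||^2 via broadcasting
    distances = ((X[:, None, :] - R[None, :, :]) ** 2).sum(axis=2)
    closest = distances.argmin(axis=1)   # index of the nearest representative per point
    O = R[closest]                       # o_j = representative closest to x_j
    return np.sqrt(np.mean(np.sum((X - O) ** 2, axis=1)))
```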

  6. K-means algorithm. Initialize $k$ representatives $\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_k \in \mathbb{R}^d$. Iterate until convergence: (a) associate each $\vec{x}_i$ with its closest representative $\vec{r}_j$; (b) replace each representative $\vec{r}_j$ with the mean of the points assigned to it. Both the (a) step and the (b) step reduce the RMS error.
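
A minimal NumPy sketch of the whole loop, assuming the simplest initialization (k distinct data points chosen at random) and stopping when the assignments no longer change:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # initialize the k representatives as k distinct data points chosen at random
    R = X[rng.choice(len(X), size=k, replace=False)].copy()
    assignment = np.full(len(X), -1)
    for _ in range(max_iter):
        # (a) associate each x_i with its closest representative
        d2 = ((X[:, None, :] - R[None, :, :]) ** 2).sum(axis=2)
        new_assignment = d2.argmin(axis=1)
        if np.array_equal(new_assignment, assignment):
            break                        # converged: assignments did not change
        assignment = new_assignment
        # (b) replace each representative by the mean of its assigned points
        for j in range(k):
            members = X[assignment == j]
            if len(members) > 0:         # keep the old representative if its cluster is empty
                R[j] = members.mean(axis=0)
    return R, assignment
```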

  7. Simple initialization. Simplest initialization: choose the representatives from the data points independently at random. – Problem: some representatives end up close to each other, while some parts of the data get no representative. – K-means is a local search method and can get stuck in local minima.

  8. K-means++ • A different method for initializing the representatives. • Spreads out the initial representatives. • Adds representatives one by one. • Before adding a representative, define a distribution over the unselected data points. Data: $\vec{x}_1, \ldots, \vec{x}_N$. Current representatives: $\vec{r}_1, \ldots, \vec{r}_j$. Distance of an example to the representatives: $d(\vec{x}, \{\vec{r}_1, \ldots, \vec{r}_j\}) = \min_{1 \le i \le j} \|\vec{x} - \vec{r}_i\|$. Probability of selecting example $\vec{x}$ as the next representative: $P(\vec{x}) = \frac{1}{Z}\, d(\vec{x}, \{\vec{r}_1, \ldots, \vec{r}_j\})$, where $Z$ is the normalizing constant.
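
A NumPy sketch of this initialization, following the selection rule written on the slide, i.e. probability proportional to the distance $d(\vec{x}, \cdot)$ itself (the original k-means++ paper weights by the squared distance); names are illustrative:

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """Choose k initial representatives, spreading them out: each new representative
    is drawn with probability proportional to its distance from those chosen so far."""
    rng = np.random.default_rng(seed)
    reps = [X[rng.integers(len(X))]]     # first representative: uniform at random
    for _ in range(k - 1):
        R = np.array(reps)
        # d(x, {r_1..r_j}) = min_i ||x - r_i|| for every data point x
        dist = np.sqrt(((X[:, None, :] - R[None, :, :]) ** 2).sum(axis=2)).min(axis=1)
        p = dist / dist.sum()            # P(x) = d(x, reps) / Z
        reps.append(X[rng.choice(len(X), p=p)])
    return np.array(reps)
```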

  9. Example for K-means++. This is an unlikely initialization for k-means++.

  10. Parallelized K-means • Suppose the data points are partitioned randomly across several machines. • We want to perform the (a) and (b) steps with minimal communication between the machines. 1. Choose initial representatives and broadcast them to all machines. 2. Each machine partitions its own data points according to the closest representative. This defines (key, value) pairs where key = index of the closest representative and value = the example. 3. Compute the mean for each set by performing reduceByKey (most of the summing is done locally on each machine). 4. Broadcast the new representatives to all machines.
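
A sketch of these broadcast / reduceByKey rounds with Spark's Python RDD API, following steps 1-4 above. Here `data` (a list of NumPy vectors), `initial_reps`, and `num_iterations` are assumed to exist; all names are illustrative:

```python
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="kmeans")
points = sc.parallelize(data)        # RDD of NumPy vectors, partitioned across machines

reps = initial_reps                  # current representatives, a k x d NumPy array
for _ in range(num_iterations):
    reps_b = sc.broadcast(reps)      # steps 1/4: broadcast representatives to all machines

    def closest(x):
        # step 2: key = index of the closest representative, value = (point, 1) for averaging
        j = int(np.argmin(((reps_b.value - x) ** 2).sum(axis=1)))
        return j, (x, 1)

    # step 3: sum points and counts per representative (mostly locally), then divide to get means
    sums = (points.map(closest)
                  .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
                  .collectAsMap())
    reps = np.array([sums[j][0] / sums[j][1] if j in sums else reps[j]
                     for j in range(len(reps))])
```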

  11. Clustering stability

  12. Clustering stability. [Figure panels: clustering using starting points 1; clustering using starting points 2; clustering using starting points 3.]

  13. Measuring clustering stability. The entry in row "clustering j", column "x_i" contains the index of the representative closest to x_i in clustering j.
                      x1  x2  x3  x4  x5  x6  ..  ..  xn
      Clustering 1:    1   1   3   1   3   2   2   2   3
      Clustering 2:    2   2   1   2   1   3   3   3   1
      Clustering 3:    2   2   3   2   3   1   1   1   3
      Clustering 4:    1   1   1   1   3   3   3   3   1
      The first three clusterings are completely consistent with each other (they group the examples identically, up to a relabeling of the representatives). The fourth clustering has a disagreement in x5.

  14. How to quantify stability? • We say that a clustering is stable if the examples are always grouped in the same way. • When we have thousands of examples, we cannot expect all of them to always be grouped the same way. • We need a way to quantify the stability. • Basic idea: measure how much the groupings differ between clusterings.

  15. Entropy. A partition $G$ of the data defines a distribution over the parts: $p_1 + p_2 + \cdots + p_k = 1$. The information in this partition is measured by the entropy: $H(G) = H(p_1, p_2, \ldots, p_k) = \sum_{i=1}^{k} p_i \log_2 \frac{1}{p_i}$. $H(G)$ is a number between $0$ (one part with probability 1) and $\log_2 k$ (when $p_1 = p_2 = \cdots = p_k = \frac{1}{k}$).
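
A small sketch of this quantity in Python, with the partition given by its part probabilities:

```python
import numpy as np

def entropy(p):
    """Entropy H(p_1,...,p_k) = sum_i p_i * log2(1/p_i), ignoring zero-probability parts."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

print(entropy([1.0]))        # 0.0 : one part with probability 1
print(entropy([0.25] * 4))   # 2.0 : log2(k) for k = 4 equal parts
```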

  16. Entropy of a combined partition. If clustering 1 and clustering 2 partition the data in the exact same way, then $G_1 = G_2$ and $H(G_1, G_2) = H(G_1) = H(G_2)$. If clustering 1 and clustering 2 are independent (they partition the data independently of each other), then $H(G_1, G_2) = H(G_1) + H(G_2)$. Suppose we produce many clusterings, using many starting points, and plot $H(G_1), H(G_1, G_2), \ldots, H(G_1, G_2, \ldots, G_i), \ldots$ as a function of $i$. If the graph increases like $i \log_2 k$, then the clustering is completely unstable. If the graph stops increasing after some $i$, then we have reached stability.
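
A sketch of this stability curve, using the label table from slide 13: the combined partition $G_1, \ldots, G_i$ groups two examples together exactly when they receive the same labels in all of the first $i$ clusterings, and we print $H(G_1, \ldots, G_i)$ as $i$ grows:

```python
import numpy as np
from collections import Counter

def entropy_of_labels(labels):
    """Entropy of the partition defined by one label (or tuple of labels) per example."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

# rows = clusterings, columns = examples (the table from slide 13)
L = np.array([[1, 1, 3, 1, 3, 2, 2, 2, 3],
              [2, 2, 1, 2, 1, 3, 3, 3, 1],
              [2, 2, 3, 2, 3, 1, 1, 1, 3],
              [1, 1, 1, 1, 3, 3, 3, 3, 1]])

for i in range(1, len(L) + 1):
    combined = [tuple(L[:i, j]) for j in range(L.shape[1])]  # labels of example j in G_1..G_i
    print(i, round(entropy_of_labels(combined), 3))
# the curve is flat over the first three clusterings (stable) and rises with the fourth
```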
