A Sober Look at Clustering Stability


  1. A Sober Look at Clustering Stability. Shai Ben-David (1), Ulrike von Luxburg (2), Dávid Pál (1). (1) School of Computer Science, University of Waterloo; (2) Fraunhofer IPSI, Darmstadt, Germany. COLT 2006.

  2. What is clustering? By clustering we mean grouping data according to some distance/similarity measure. Figure: the data.

  3. What is clustering? Figure: clusters found by a linkage algorithm.

  4. What is clustering? Figure: clusters found by a center-based algorithm.

  5. Correctness of clustering. Q: Clustering is not a well-defined problem. How do we know that we have clustered correctly? A: A common solution is stability.

  7. Stability: idea of our definition. Pick your favorite clustering algorithm A. Generate two independent samples S_1 and S_2. Stability asks: how much will the clusterings A(S_1) and A(S_2) differ? If, for large sample sizes, the clusterings A(S_1) and A(S_2) are almost identical, we say that A is stable; otherwise we say that A is unstable.
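The definition above can be tried out numerically. The following is a minimal sketch under stated assumptions, not the authors' procedure. It uses plain Lloyd iterations as a heuristic stand-in for a cost-minimizing algorithm A. The distribution consists of two well-separated uniform bumps chosen for illustration. The distance between clusterings is taken to be the fraction of point pairs on which the two clusterings disagree. All function names are hypothetical.

```python
import random


def lloyd_1d(points, k, rng, iters=100):
    """Heuristic 1D k-means (Lloyd iterations); returns the sorted centers."""
    centers = sorted(rng.sample(points, k))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # An empty group keeps its previous center.
        new_centers = sorted(
            sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)
        )
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def label(x, centers):
    """Index of the center nearest to x."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))


def clustering_distance(centers_a, centers_b, eval_points):
    """Fraction of point pairs grouped together by one clustering but not the other."""
    la = [label(x, centers_a) for x in eval_points]
    lb = [label(x, centers_b) for x in eval_points]
    n = len(eval_points)
    disagreements = 0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            if (la[i] == la[j]) != (lb[i] == lb[j]):
                disagreements += 1
    return disagreements / pairs


def draw_sample(rng, m):
    """Assumed distribution: equal-weight uniform bumps on [0, 1] and [9, 10]."""
    return [
        rng.uniform(0, 1) if rng.random() < 0.5 else rng.uniform(9, 10)
        for _ in range(m)
    ]


def estimate_instability(k, m=200, trials=5, seed=0):
    """Average distance between A(S_1) and A(S_2) over independent sample pairs."""
    rng = random.Random(seed)
    eval_points = draw_sample(rng, 150)
    total = 0.0
    for _ in range(trials):
        s1, s2 = draw_sample(rng, m), draw_sample(rng, m)
        total += clustering_distance(
            lloyd_1d(s1, k, rng), lloyd_1d(s2, k, rng), eval_points
        )
    return total / trials
```

Calling `estimate_instability(2)` on this assumed distribution compares clusterings that both separate the two bumps, so the estimate should be zero. With `k=3` the result depends on how the heuristic splits a bump, which is the kind of ambiguity the later slides discuss.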

  10. Example of stability. Figure: the probability distribution.

  11. Example of stability. Figure: sample S_1.

  12. Example of stability. Figure: clustering A(S_1).

  13. Example of stability. Figure: sample S_2.

  14. Example of stability. Figure: clustering A(S_2).

  15. Example of stability. The clusterings A(S_1) and A(S_2) are equivalent.

  16. Example of instability. Figure: the probability distribution.

  17. Example of instability. Figure: sample S_1.

  18. Example of instability. Figure: clustering A(S_1).

  19. Example of instability. Figure: sample S_2.

  20. Example of instability. Figure: clustering A(S_2).

  21. Example of instability. The clusterings A(S_1) and A(S_2) are different.

  22. Motivation. Why do people think stability is important? It is used for tuning the parameters of clustering algorithms, such as the number of clusters, and for verifying the meaningfulness of the clustering output by an algorithm.

  24. Motivation. Our intention: to provide theoretical justification. What we discovered: the popular belief is false.

  26. First example. 1D probability distribution. Figure: probability density over x, with two regions labeled 50% and 50%.

  27. First example. 2 centers: stable.

  28. First example. 3 centers: solution #1.

  29. First example. 3 centers: solution #2 ⇒ unstable.

  30. First example. Slightly asymmetric distribution. Figure: probability density over x, with two regions labeled (50 + ε)% and (50 − ε)%.

  31. First example. 2 centers: stable.

  32. First example. 3 centers: stable.
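The first example can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that the density consists of two well-separated uniform bumps of equal width; the slides do not specify the shape. Under that assumption, placing j centers inside a bump of width L and mass w costs w·(L/j)²/12 in expected squared distance. The function names are hypothetical.

```python
def bump_cost(mass, length, centers_in_bump):
    """k-means cost of covering one uniform bump with equally spaced centers.

    Each center serves a piece of width length / centers_in_bump, and the
    variance of a uniform distribution on a piece of width p is p**2 / 12.
    """
    piece = length / centers_in_bump
    return mass * piece ** 2 / 12


def three_center_costs(left_mass, length=1.0):
    """Costs of the two candidate 3-center solutions.

    Returns (two centers on the left bump, two centers on the right bump).
    """
    right_mass = 1.0 - left_mass
    two_left = bump_cost(left_mass, length, 2) + bump_cost(right_mass, length, 1)
    two_right = bump_cost(left_mass, length, 1) + bump_cost(right_mass, length, 2)
    return two_left, two_right


# Symmetric case: both solutions cost the same, so there are two optima.
symmetric = three_center_costs(0.5)

# Slightly asymmetric case: the heavier bump should get the extra center.
asymmetric = three_center_costs(0.51)
```

In the symmetric case the two costs tie, so which solution a sample produces depends on sampling noise. With the mass shifted to 51% on the left, the solution with two centers on the left is strictly cheaper, which matches the slide's claim that the asymmetric distribution is stable for 3 centers.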

  33. Second example. 1D probability distribution. Figure: probability density over x, with two regions labeled ∼90% and ∼10%.

  34. Second example. 2 centers: unstable.

  35. Second example. 3 centers: stable.

  36. Our results. Theorem: for a cost-based algorithm (e.g. k-means, k-medians): (1) if the optimization problem has a unique optimum, then the algorithm is stable; (2) if the underlying probability distribution is symmetric and the optimization problem has multiple symmetric optima, then the algorithm is unstable.
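One way to write the theorem down more formally is sketched here. The notation is an assumption and does not appear on the slides: P is the underlying distribution, m the sample size, d_P some distance between clusterings measured under P, and instab a hypothetical name for the limiting expected distance.

```latex
\[
  \operatorname{instab}(A, P)
  \;=\;
  \lim_{m \to \infty}
  \mathbb{E}_{S_1, S_2 \sim P^m}
  \bigl[\, d_P\bigl(A(S_1), A(S_2)\bigr) \bigr]
\]
\[
  \text{unique minimizer of the cost under } P
  \;\Longrightarrow\;
  \operatorname{instab}(A, P) = 0
\]
\[
  P \text{ symmetric, several symmetric minimizers}
  \;\Longrightarrow\;
  \operatorname{instab}(A, P) > 0
\]
```

Read this way, A is stable when the limit is zero and unstable otherwise.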

  39. Conclusion. Stability, contrary to common belief, does not measure the validity of a clustering or the meaningfulness of the choice of the number of clusters. Instead, it measures the number of solutions to the clustering optimization problem for the underlying probability distribution.

  40. Open problems. Q: Is symmetry really needed for instability? A: No! (Work in progress, together with Shai Ben-David & Hans Ulrich Simon.) Analyze finite sample sizes and give explicit bounds. Analyze other types of algorithms, e.g. linkage algorithms.
