  1. k-means++: few more steps yield constant approximation. Davin Choo, Christoph Grunau, Julian Portmann, Václav Rozhoň (ETH Zürich). ICML 2020.

  2. Clustering. Given unlabelled d-dimensional data points P = {p_1, ..., p_n}, group similar ones together into k clusters. Which is a better clustering into k = 3 groups?

  3. k-means metric
  ◮ Centers C = {c_1, ..., c_k}
  ◮ cost(p, C) = min_{c ∈ C} d(p, c)^2
  ◮ cost(P, C) = Σ_{p ∈ P} cost(p, C)
  (Figure: a point p and centers c_1, c_2, c_3, with cost(p, c_3) highlighted)
  ◮ Restricting C ⊆ P only loses a factor of 2 in cost(P, C)
  ◮ NP-hard to find an optimal solution [ADHP09, MNV09]

  4–7. k-means metric (animation frames)
  ◮ cost(p, C) = min_{c ∈ C} d(p, c)^2 and cost(P, C) = Σ_{p ∈ P} cost(p, C), as above
  ◮ Given k clusters, the optimal centers are the means/centroids, e.g. c_1 = (1/4)(p_1 + p_2 + p_3 + p_4) for a cluster {p_1, p_2, p_3, p_4}
  ◮ Given k centers, the optimal cluster assignment sends each point to its closest center
  (Figure: a point p with cost(p, c_1), cost(p, c_2), cost(p, c_3) drawn to the three centers)
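
The cost function on these slides is easy to state in code. Below is a minimal sketch, assuming numpy arrays; the name `cost` mirrors the slides' notation and is otherwise illustrative.

```python
import numpy as np

def cost(P, C):
    """k-means cost from the slides:
    cost(p, C) = min_{c in C} d(p, c)^2 and cost(P, C) = sum_p cost(p, C)."""
    # (n, k) matrix of squared Euclidean distances, point i to center j
    sq_dists = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # Each point pays the squared distance to its closest center
    return sq_dists.min(axis=1).sum()
```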

  8–9. Lloyd’s algorithm [Llo82]: heuristic alternating minimization
  Given k initial centers (remark: the centers are not necessarily from P), alternate:
  ◮ Optimal assignment ←→ Optimal clustering
  ◮ Lloyd’s algorithm never worsens cost(P, C) but has no performance guarantees (it can get stuck in local minima)
  ◮ One way to get theoretical guarantees: seed with provably good initial centers
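
As a concrete reference, here is a minimal sketch of the alternating minimization, assuming numpy; `iters` is a hypothetical stopping parameter (in practice one stops once the assignment is stable).

```python
import numpy as np

def lloyd(P, C, iters=50):
    """Sketch of Lloyd's algorithm: alternate the two optimality facts
    above. C need not be a subset of P."""
    C = C.astype(float).copy()
    for _ in range(iters):
        # Optimal assignment: each point goes to its closest center
        sq_dists = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # Optimal centers: the mean/centroid of each cluster
        for j in range(len(C)):
            if np.any(labels == j):
                C[j] = P[labels == j].mean(axis=0)
    return C, labels  # cost(P, C) never increases across iterations
```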

  10–15. k-means++ initialization [AV07]
  ◮ Chooses k points from P: O(log k) apx. (in expectation)
  ◮ 1st center chosen uniformly at random from P
  ◮ D^2-sampling: Pr[p] = cost(p, C) / Σ_{q ∈ P} cost(q, C), where C is the set of centers chosen so far (updated at each step)
  (Figure: points annotated with their current cost to C, e.g. a point p with cost(p, C) = 90)
  ◮ Practically efficient: O(dnk) running time
  ◮ There exist instances where running k-means++ yields an Ω(log k) apx. with high probability in k [BR13, BJA16]
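
A minimal sketch of this seeding in numpy; `kmeanspp_seed` is an illustrative name, not from the paper. The O(dnk) running time comes from one O(dn) cost update per new center.

```python
import numpy as np

def kmeanspp_seed(P, k, rng=None):
    """Sketch of k-means++ seeding [AV07]: first center uniform,
    remaining k-1 centers by D^2-sampling."""
    rng = rng or np.random.default_rng()
    n = len(P)
    centers = [P[rng.integers(n)]]                # 1st center: uniform
    d2 = ((P - centers[0]) ** 2).sum(axis=1)      # cost(p, C) for each p
    for _ in range(k - 1):
        # D^2-sampling: Pr[p] = cost(p, C) / sum_q cost(q, C)
        idx = rng.choice(n, p=d2 / d2.sum())
        centers.append(P[idx])
        # Update each point's cost against the newly added center
        d2 = np.minimum(d2, ((P - P[idx]) ** 2).sum(axis=1))
    return np.array(centers)
```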

  16–20. What is known?
  Practice:
  • Lloyd’s algorithm [Llo82]
  • k-means++ [AV07]: O(log k) apx. in O(dnk) time
  • LocalSearch++ [LS19]: O(1) apx. in O(dnk^2 log log k) time
  Theory:
  • Best known approximation factor [ANFSW19]: 6.357
  • PTAS for fixed k [KSS10]
  • PTAS for fixed d [CAKM19, FRS19]
  • Local search [KMN+04]: (9 + ε)-approximation in poly-time
  ◮ Bi-criteria approximation [Wei16, ADK09]: O(1)-approximation with O(k) cluster centers
  ◮ This work: O(1) approximation in O(dnk^2) running time

  21–22. Outline of talk
  ◮ What we have discussed
    ◮ Clustering as a motivation
    ◮ Lloyd’s heuristic and k-means++ initialization
    ◮ Prior work
  ◮ What’s next
    ◮ Idea of the bi-criteria algorithm and the notion of settledness
    ◮ Idea of local search
    ◮ LocalSearch++: combining k-means++ with local search
    ◮ Key idea behind how we tighten the analysis of LocalSearch++

  23–25. Bi-criteria [Wei16, ADK09] and settledness
  ◮ “Balls into bins” process
    ◮ k bins: the optimal k-clustering of the points, defined by OPT_k
    ◮ O(k) balls: sampled points in C
  ◮ A cluster Q is settled if cost(Q, C) ≤ 10 · cost(Q, OPT_k) (see the sketch after this slide)
  ◮ Can show (with constant success probabilities):
    ◮ If C is not yet a 20-apx., D^2-sampling chooses from an unsettled cluster
    ◮ If we sample p from an unsettled cluster Q, adding p makes Q settled
  ◮ After O(k) samples, cost(P, C) ≤ 20 · cost(P, OPT_k)
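
Settledness is purely an analysis device, since it compares against the (unknown) optimum; still, as a worked illustration, here is the check, reusing the `cost` sketch above. All names are hypothetical.

```python
def is_settled(Q, C, opt_centers):
    """A cluster Q (the points of one optimal cluster) is settled if its
    current cost is within a factor 10 of its optimal cost."""
    return cost(Q, C) <= 10 * cost(Q, opt_centers)
```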

  26–30. Local search [KMN+04]
  ◮ Initialize C with k arbitrary points (one swap step is sketched after this slide)
  ◮ Repeat:
    ◮ Pick an arbitrary point p ∈ P
    ◮ If ∃ q ∈ C such that cost(P, C \ {q} ∪ {p}) improves the cost, swap
  ◮ Polynomial number of iterations → O(1) approximation
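
One swap step of this scheme, as a sketch reusing the `cost` function from above; `local_search_step` is an illustrative name.

```python
import numpy as np

def local_search_step(P, C, p):
    """Try the swap C \\ {q} ∪ {p} for every current center q;
    keep the best swap if it improves the cost, else keep C."""
    C = list(C)
    best_C, best_cost = C, cost(P, np.array(C))
    for i in range(len(C)):
        trial = C[:i] + [p] + C[i + 1:]       # replace center i with p
        trial_cost = cost(P, np.array(trial))
        if trial_cost < best_cost:
            best_C, best_cost = trial, trial_cost
    return best_C
```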

  31. LocalSearch++ [LS19]
  ◮ Initialize C from the output of k-means++ (instead of k arbitrary points)
  ◮ Repeat for O(k log log k) iterations (instead of polynomially many):
    ◮ Pick a point p ∈ P using D^2-sampling (instead of arbitrarily)
    ◮ If ∃ q ∈ C such that cost(P, {p} ∪ C \ {q}) improves the cost, swap
  ◮ O(k log log k) iterations suffice for an O(1) approximation
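
Combining the pieces, a minimal sketch of LocalSearch++ that reuses the `kmeanspp_seed`, `cost`, and `local_search_step` sketches above; the caller supplies `steps`, for which the paper shows O(k log log k) suffice.

```python
import numpy as np

def local_search_pp(P, k, steps, rng=None):
    """Sketch of LocalSearch++ [LS19]: k-means++ seeding followed by
    `steps` D^2-sampled swap attempts."""
    rng = rng or np.random.default_rng()
    C = list(kmeanspp_seed(P, k, rng))
    for _ in range(steps):
        # D^2-sample the swap candidate w.r.t. the current centers
        sq = ((P[:, None, :] - np.array(C)[None, :, :]) ** 2).sum(axis=2)
        d2 = sq.min(axis=1)
        p = P[rng.choice(len(P), p=d2 / d2.sum())]
        C = local_search_step(P, C, p)        # swap only if cost improves
    return np.array(C)
```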

  32–33. LocalSearch++ [LS19]: one step of analysis
  ◮ Lemma: In each step, the cost decreases by a factor of 1 − Θ(1/k) with constant probability
  ◮ Implication: After O(k) steps, the approximation factor halves
  ◮ k-means++ is an O(log k) apx. in expectation, so: O(log k)-apx. → (O(k) steps) → O(log k / 2)-apx. → ... → (O(k) steps) → O(log k / 2^r)-apx. = O(1)-apx.
  ◮ r = O(log log k) phases, totaling O(k log log k) steps
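
Spelled out, the phase count is a one-line geometric-decay calculation:

```latex
\underbrace{O(\log k)}_{\text{after seeding}}
\;\xrightarrow{\;O(k)\text{ steps}\;}\;
O\!\Big(\tfrac{\log k}{2}\Big)
\;\xrightarrow{\;O(k)\text{ steps}\;}\;\cdots\;\longrightarrow\;
O\!\Big(\tfrac{\log k}{2^{r}}\Big) = O(1)
\quad\text{for } r = O(\log\log k),
\;\text{so } r \cdot O(k) = O(k \log\log k) \text{ steps in total.}
```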

  34. LocalSearch++ [LS19]: Bounding the cost decrease
  ◮ Match OPT centers c* ∈ C* to candidate centers c ∈ C
  (Figure: the C* clusters on one side and the C clusters on the other, connected by a matching M, with the remaining centers collected in a set L)
