k-means and k-medians under dimension reduction
Yury Makarychev, TTIC
Konstantin Makarychev, Northwestern
Ilya Razenshteyn, Microsoft Research
Simons Institute, November 2, 2018
Euclidean k-means and k-medians

Given a set of points X in ℝ^d: partition X into k clusters C_1, …, C_k and find a "center" c_i for each C_i so as to minimize the cost

  (k-median)  ∑_{i=1}^k ∑_{v ∈ C_i} d(v, c_i)
  (k-means)   ∑_{i=1}^k ∑_{v ∈ C_i} d(v, c_i)²
Dimension Reduction

A dimension reduction f: ℝ^d → ℝ^{d'} is a random map that preserves distances within a factor of 1 + ε with probability at least 1 − δ:

  (1/(1+ε)) ‖v − w‖ ≤ ‖f(v) − f(w)‖ ≤ (1 + ε) ‖v − w‖

[Johnson, Lindenstrauss '84] There exists a random linear dimension reduction with d' = O(log(1/δ) / ε²).
[Larsen, Nelson '17] The dependence of d' on ε and δ is optimal.
Dimension Reduction

JL preserves all distances between points in X whp when d' = Ω(log|X| / ε²). Numerous applications in computer science.

Dimension reduction constructions:
• [JL '84] Project on a random d'-dimensional subspace
• [Indyk, Motwani '98] Apply a random Gaussian matrix
• [Achlioptas '03] Apply a random matrix with ±1 entries
• [Ailon, Chazelle '06] Fast JL-transform
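As a toy numerical sketch of the Gaussian-matrix construction above (assuming numpy; the dimensions, point counts, and the name `jl_map` are illustrative choices, not from the talk):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def jl_map(points, d_prime, rng):
    """Indyk-Motwani-style reduction: multiply by a random
    d' x d Gaussian matrix scaled by 1/sqrt(d')."""
    d = points.shape[1]
    G = rng.normal(size=(d_prime, d)) / np.sqrt(d_prime)
    return points @ G.T

# 100 points in R^1000, reduced to d' = 300
X = rng.normal(size=(100, 1000))
Y = jl_map(X, 300, rng)

# distortion of every pairwise distance, concentrated around 1
ratios = [
    np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
    for i, j in combinations(range(len(X)), 2)
]
print(min(ratios), max(ratios))
```

With d' ~ log|X| / ε², all ~5000 pairwise distances here land within a small factor of their originals, matching the whp guarantee on the slide.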
k-means under dimension reduction

[Boutsidis, Zouzias, Drineas '10] Apply a dimension reduction f to our dataset X, then cluster f(X) in dimension d'.
k-means under dimension reduction

Want: optimal clusterings of X and f(X) have approximately the same cost.
Even better: the cost of every clustering is approximately preserved.
For what dimension d' can we get this?
k-means under dimension reduction

                                         distortion   d'
  Folklore                               1 + ε        ~ log n / ε²
  Boutsidis, Zouzias, Drineas '10        2 + ε        ~ k / ε²
  Cohen, Elder, Musco, Musco, Persu '15  1 + ε        ~ k / ε²
                                         9 + ε        ~ log k / ε²
  MMR '18                                1 + ε        ~ log(k/ε) / ε²
  Lower bound                            1 + ε        ~ log k / ε²
k-medians under dimension reduction

                     distortion   d'
  Prior work         —            —
  Kirszbraun Thm ⇒   1 + ε        ~ log n / ε²
  MMR '18            1 + ε        ~ log(k/ε) / ε²
  Lower bound        1 + ε        ~ log k / ε²
Plan

k-means:
• Challenges
• Warm-up: d' ~ log n / ε²
• Special case: "distortions" are everywhere sparse
• Remove outliers: the general case → the special case
• Outliers

k-medians:
• Overview of our approach
Our result for k-means

Let X ⊆ ℝ^d and let f: ℝ^d → ℝ^{d'} be a random dimension reduction with d' ≥ C log(k/(εδ)) / ε².
With probability at least 1 − δ:

  (1 − ε) cost(P) ≤ cost(f(P)) ≤ (1 + ε) cost(P)

for every clustering P = (C_1, …, C_k) of X.
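A small empirical sketch of what the theorem promises (assuming numpy; the specific dimensions, the number of trials, and the helper name `kmeans_cost` are my illustrative choices — and of course checking 20 random clusterings is not checking *every* clustering, which is exactly the hard part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans_cost(points, labels, k):
    """k-means cost of a clustering: optimal centers are centroids,
    cost is the total squared distance to them."""
    return sum(
        np.sum((points[labels == i] - points[labels == i].mean(axis=0)) ** 2)
        for i in range(k)
        if np.any(labels == i)
    )

X = rng.normal(size=(200, 1000))
G = rng.normal(size=(300, 1000)) / np.sqrt(300)   # Gaussian JL map, d' = 300
fX = X @ G.T

k = 5
ratios = []
for _ in range(20):
    labels = rng.integers(0, k, size=len(X))      # an arbitrary clustering
    ratios.append(kmeans_cost(fX, labels, k) / kmeans_cost(X, labels, k))
print(min(ratios), max(ratios))
```

Every sampled clustering's cost is preserved up to a small multiplicative error, consistent with (1 ± ε) cost(P).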
Challenges

Let P* be the optimal k-means clustering.
Easy: cost(P*) ≈ cost(f(P*)) with probability 1 − δ.
Hard: prove that there is no other clustering P′ s.t. cost(f(P′)) < (1 − ε) cost(P*), since there are exponentially many clusterings P′ (can't use the union bound).
Warm-up

Consider a clustering P = (C_1, …, C_k). Write the cost in terms of pairwise distances:

  cost(P) = ∑_{i=1}^k (1 / (2|C_i|)) ∑_{v,w ∈ C_i} ‖v − w‖²

All distances ‖v − w‖ are preserved within 1 + ε ⇒ cost(P) is preserved within (1 + ε)² = 1 + O(ε). Sufficient to have d' ~ log n / ε².
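The pairwise-distance form of the cost on this slide is an exact identity (the centroid is the optimal k-means center of a cluster, and the sum of squared distances to it equals the normalized sum over ordered pairs). A quick check, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(2)

C = rng.normal(size=(50, 8))      # one cluster of 50 points in R^8
mu = C.mean(axis=0)               # optimal k-means center = centroid

# cost via the center
center_cost = np.sum((C - mu) ** 2)

# cost via pairwise distances: (1 / (2|C|)) * sum over ordered pairs ||v - w||^2
diffs = C[:, None, :] - C[None, :, :]
pairwise_cost = np.sum(diffs ** 2) / (2 * len(C))

print(center_cost, pairwise_cost)   # equal up to floating-point error
```

This identity is why preserving pairwise distances within 1 + ε immediately preserves the k-means cost within (1 + ε)².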
Problem & Notation

Assume that P = (C_1, …, C_k) is a random clustering that depends on f.
Want to prove: cost(P) ≈ cost(f(P)) whp.

The distance between v and w is (1 + ε)-preserved or distorted depending on whether ‖f(v) − f(w)‖ ≈_{1+ε} ‖v − w‖.
Think of δ = poly(ε, 1/k) as sufficiently small.
Distortion graph

Connect v and w with an edge if the distance between them is distorted.
+ Every edge is present with probability at most δ.
− Edges are not independent.
− P depends on the set of edges.
− May have high-degree vertices.
− All distances in a cluster may be distorted.
Cost of a cluster

The cost of C_i is

  (1 / (2|C_i|)) ∑_{v,w ∈ C_i} ‖v − w‖²

+ Terms for non-edges (v, w) are (1 + ε)-preserved: ‖v − w‖ ≈ ‖f(v) − f(w)‖.
− Need to prove that

  ∑_{v,w ∈ C_i, (v,w) ∈ E} ‖v − w‖² = ∑_{v,w ∈ C_i, (v,w) ∈ E} ‖f(v) − f(w)‖² ± ε′ cost(P)
Everywhere-sparse edges

Assume every v ∈ C_i is connected to at most an α fraction of all w in C_i (where α ≪ ε).
Everywhere-sparse edges

+ Terms for non-edges (v, w) are (1 + ε)-preserved.
+ The contribution of terms for edges is small: for an edge (v, w) and any x ∈ C_i,

  ‖v − w‖ ≤ ‖v − x‖ + ‖x − w‖
  ‖v − w‖² ≤ 2(‖v − x‖² + ‖x − w‖²)
Everywhere-sparse edges

  ‖v − w‖² ≤ 2(‖v − x‖² + ‖x − w‖²)

• Replace the term for every edge with two terms ‖v − x‖², ‖x − w‖² for random x ∈ C_i.
• Each term is used at most 2α times, in expectation.

  ∑_{v,w ∈ C_i, (v,w) ∈ E} ‖v − w‖² ≤ 4α ∑_{v,w ∈ C_i} ‖v − w‖²
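The charging argument above gives a deterministic bound: whenever every vertex of the edge set touches at most an α fraction of its cluster, the squared lengths of the edges total at most 4α times the squared lengths of all pairs. A toy check (assuming numpy; the cluster size, α, and the edge-sampling rate are arbitrary illustrative choices):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)

n, alpha = 60, 0.1
C = rng.normal(size=(n, 5))       # one cluster of 60 points in R^5

# build a random "distortion" edge set that is everywhere alpha-sparse:
# never let any vertex exceed degree alpha * n
deg = np.zeros(n, dtype=int)
edges = []
for i, j in combinations(range(n), 2):
    if deg[i] < alpha * n and deg[j] < alpha * n and rng.random() < 0.03:
        edges.append((i, j))
        deg[i] += 1
        deg[j] += 1

sq = lambda i, j: np.sum((C[i] - C[j]) ** 2)
edge_sum = sum(sq(i, j) for i, j in edges)
all_sum = sum(sq(i, j) for i, j in combinations(range(n), 2))
print(edge_sum, 4 * alpha * all_sum)   # edge_sum is the smaller one
```

Since α ≪ ε, the edge terms contribute only an O(α) fraction of the cluster cost, which is what lets the non-edge terms dominate.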
Everywhere-sparse edges

  ∑_{v,w ∈ C_i, (v,w) ∉ E} ‖v − w‖² ≈ ∑_{v,w ∈ C_i} ‖v − w‖²
  ∑_{v,w ∈ C_i, (v,w) ∉ E} ‖f(v) − f(w)‖² ≈ ∑_{v,w ∈ C_i} ‖f(v) − f(w)‖²

But edges are not necessarily everywhere sparse!
(1 − δ) non-distorted core

Want: remove "outliers" so that in the remaining set X′ edges are everywhere sparse in every cluster.

Find a subset X′ ⊆ X (which depends on f) s.t.
• Edges are sparse in the obtained clusters: every v ∈ C_i ∩ X′ is connected to at most an α fraction of all w in C_i ∩ X′.
• Outliers are rare: for every v, Pr(v ∉ X′) ≤ δ.
All clusters are large

Assume all clusters are of size ~ n/k. Let α = δ^{1/4}.
Outliers = all vertices of degree at least ~ αn/k.
Every vertex has degree at most δn in expectation.
By Markov, Pr(v is an outlier) ≤ δk/α ≤ α.
Remove ~ αn ≪ n/k vertices in total, so all clusters still have size ~ n/k.
Crucially uses that all clusters are large!
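A minimal simulation of this degree-threshold step (assuming numpy; the graph size, edge probability, and the 10× threshold are illustrative — in the actual argument the threshold is ~ αn/k and the edges are not independent):

```python
import numpy as np

rng = np.random.default_rng(4)

n, delta = 2000, 0.01
# toy distortion graph: each edge appears independently with probability delta
A = rng.random((n, n)) < delta
A = np.triu(A, 1)
A = A + A.T                        # symmetrize; boolean + is logical or
deg = A.sum(axis=1)                # expected degree ~ delta * n = 20

threshold = 10 * delta * n         # declare outlier at 10x the mean degree
outliers = np.flatnonzero(deg >= threshold)

# Markov only promises Pr(outlier) <= 1/10, but binomial degrees
# concentrate far below the threshold, so almost nothing is removed
print(len(outliers), "outliers out of", n)
```

In the independent toy model concentration does much better than Markov; the point of the slide is that Markov alone already suffices, even without independence, because only an α fraction of each (large) cluster gets removed.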
Main Combinatorial Lemma

Idea: assign "weights" to vertices so that all clusters have a large weight.
There is a measure μ on X and a random set B s.t.
• μ(y) ≥ 1/|C_i ∖ B| for y ∈ C_i ∖ B (always)
• μ(X) ≤ 4k³/ε²
• Pr(y ∈ B) ≤ δ

All clusters C_i ∖ B are "large" w.r.t. the measure μ. Can apply a variant of the previous argument.
Edges Incident on Outliers

Need to take care of edges incident on outliers. Say, v is an outlier and w is not.
Consider a fixed optimal clustering C*_1, …, C*_k. Let c* be the optimal center for v.

  ‖v − w‖ = ‖w − c*‖ ± ‖c* − v‖
  ‖f(v) − f(w)‖ = ‖f(w) − f(c*)‖ ± ‖f(c*) − f(v)‖

May assume that the distances between non-outliers and the optimal centers are (1 + ε)-preserved.

  E[ ∑_{v ∉ X′} ‖v − c*_v‖² ] ≤ δ ∑_{v ∈ X} ‖v − c*_v‖² = δ · OPT

Taking care of ‖f(c*) − f(v)‖ is a bit more difficult. QED
k-medians under dimension reduction
k-medians

− No formula for the cost of the clustering in terms of pairwise distances.
− Not obvious even when d' ~ log n (then all pairwise distances are approximately preserved). [was asked by Ravi Kannan in a tutorial @ Simons]
+ Kirszbraun Theorem ⇒ the d' ~ log n case
+ Prove a Robust Kirszbraun Theorem

Our methods for k-means + Robust Kirszbraun ⇒ d' ~ log k for k-medians