k-means and k-medians under dimension reduction


  1. k-means and k-medians under dimension reduction. Yury Makarychev, TTIC; Konstantin Makarychev, Northwestern; Ilya Razenshteyn, Microsoft Research. Simons Institute, November 2, 2018

  2. Euclidean k-means and k-medians. Given a set of points X in ℝ^m, partition X into k clusters C_1, …, C_k and find a "center" c_j for each C_j so as to minimize the cost: Σ_{j=1}^{k} Σ_{u∈C_j} ‖u − c_j‖ (k-median) or Σ_{j=1}^{k} Σ_{u∈C_j} ‖u − c_j‖² (k-means).
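
Both objectives fit one formula, differing only in the exponent on the distance. A minimal numpy sketch (the function and variable names here are illustrative, not from the talk):

```python
import numpy as np

def clustering_cost(X, labels, centers, p=2):
    # Sum of ||u - c_j||^p over all points u, where c_j is the center of
    # u's cluster: p=2 gives the k-means cost, p=1 the k-medians cost.
    dists = np.linalg.norm(X - centers[labels], axis=1)
    return np.sum(dists ** p)
```

For k-means the optimal center of a cluster is its centroid; for k-medians it is the geometric median, which has no closed form.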

  3. Dimension Reduction. A dimension reduction φ: ℝ^m → ℝ^d is a random map that preserves distances within a factor of 1 + ε with probability at least 1 − δ: (1/(1 + ε)) ‖u − v‖ ≤ ‖φ(u) − φ(v)‖ ≤ (1 + ε) ‖u − v‖. [Johnson–Lindenstrauss '84] There exists a random linear dimension reduction with d = O(log(1/δ)/ε²). [Larsen, Nelson '17] This dependence of d on ε and δ is optimal.
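
As a concrete illustration, here is a Gaussian random projection with the JL target dimension; the constant inside O(·) is unspecified in the lemma, so setting it to 1 below is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)

eps, delta, m = 0.2, 0.01, 10_000
d = int(np.ceil(np.log(1 / delta) / eps**2))   # d = O(log(1/delta)/eps^2), constant omitted

# Entries N(0, 1/d), so squared lengths are preserved in expectation.
phi = rng.normal(scale=1 / np.sqrt(d), size=(d, m))

u, v = rng.normal(size=m), rng.normal(size=m)
ratio = np.linalg.norm(phi @ (u - v)) / np.linalg.norm(u - v)
print(ratio)   # typically close to 1; JL puts it in [1/(1+eps), 1+eps] whp
```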

  4. Dimension Reduction. JL preserves all distances between points in X whp when d = Ω(log|X|/ε²). Numerous applications in computer science. Dimension reduction constructions: • [JL '84] Project onto a random d-dimensional subspace • [Indyk, Motwani '98] Apply a random Gaussian matrix • [Achlioptas '03] Apply a random matrix with ±1 entries • [Ailon, Chazelle '06] Fast JL transform
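
For instance, the Achlioptas construction needs only random signs; a hedged sketch (the 1/√d scaling is my choice of normalization, not from the slides):

```python
import numpy as np

def sign_jl_matrix(d, m, rng):
    # Random +/-1 entries scaled by 1/sqrt(d) preserve distances like a
    # Gaussian matrix, but are cheaper to generate and to multiply by.
    return rng.choice([-1.0, 1.0], size=(d, m)) / np.sqrt(d)

phi = sign_jl_matrix(200, 10_000, np.random.default_rng(4))
```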

  5. k-means under dimension reduction [Boutsidis, Zouzias, Drineas '10]. Apply a dimension reduction φ to our dataset X; cluster φ(X) in dimension d.
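
In code, this pipeline is just "project, then cluster". The sketch below uses scikit-learn's KMeans purely for illustration (any k-means solver would do, and the sizes are arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5000))   # n points in R^m
k, d = 10, 200                      # target dimension d << m

phi = rng.normal(scale=1 / np.sqrt(d), size=(d, X.shape[1]))
labels = KMeans(n_clusters=k, n_init=10).fit_predict(X @ phi.T)
# 'labels' is then used as a clustering of the original high-dimensional X.
```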

  6. k-means under dimension reduction. Want: the optimal clusterings of X and φ(X) have approximately the same cost. Even better: the cost of every clustering is approximately preserved. For what dimension d can we get this?

  7. k-means under dimension reduction
     Result                                   d                  distortion
     Folklore                                 ~log n / ε²        1 + ε
     Boutsidis, Zouzias, Drineas '10          ~k / ε²            2 + ε
     Cohen, Elder, Musco, Musco, Persu '15    ~k / ε²            1 + ε
                                              ~log k / ε²        9 + ε
     MMR '18                                  ~log(k/ε) / ε²     1 + ε
     Lower bound                              ~log k / ε²        1 + ε

  8. k-medians under dimension reduction
     Result                d                  distortion
     Prior work            —                  —
     Kirszbraun Thm ⇒      ~log n / ε²        1 + ε
     MMR '18               ~log(k/ε) / ε²     1 + ε
     Lower bound           ~log k / ε²        1 + ε

  9. Plan. k-means: • Challenges • Warm-up: d ~ log n / ε² • Special case: "distortions" are everywhere sparse • Remove outliers: the general case → the special case • Outliers. k-medians: • Overview of our approach

  10. Our result for k-means. Let X ⊂ ℝ^m and let φ: ℝ^m → ℝ^d be a random dimension reduction with d ≥ c log(k/(εδ)) / ε². Then with probability at least 1 − δ: (1 − ε) cost(𝒞) ≤ cost(φ(𝒞)) ≤ (1 + ε) cost(𝒞) for every clustering 𝒞 = (C_1, …, C_k) of X.
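
To get a feel for the bound, here is a toy evaluation (the absolute constant c is unspecified in the theorem; c = 1 below is a placeholder):

```python
import numpy as np

k, eps, delta, c = 100, 0.1, 0.01, 1.0
d = c * np.log(k / (eps * delta)) / eps**2
print(int(np.ceil(d)))   # ~1152: independent of n and of the ambient dimension m
```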

  11. Challenges. Let 𝒞* be the optimal k-means clustering. Easy: cost(𝒞*) ≈ cost(φ(𝒞*)) with probability 1 − δ. Hard: prove that there is no other clustering 𝒞′ s.t. cost(φ(𝒞′)) < (1 − ε) cost(𝒞*), since there are exponentially many clusterings 𝒞′ (so we can't use the union bound).

  12. Warm-up. Consider a clustering 𝒞 = (C_1, …, C_k). Write the cost in terms of pairwise distances: cost(𝒞) = Σ_{j=1}^{k} (1 / (2|C_j|)) Σ_{u,v∈C_j} ‖u − v‖². If all distances ‖u − v‖ are preserved within 1 + ε, then cost(𝒞) is preserved within 1 + ε. So it is sufficient to have d ~ log n / ε².
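
The identity behind the warm-up is the standard variance decomposition: the centroid cost of a cluster equals its sum of pairwise squared distances divided by 2|C_j|. A quick numerical check (illustrative code, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)
C = rng.normal(size=(100, 5))                       # one cluster C_j

centroid_cost = np.sum((C - C.mean(axis=0)) ** 2)   # sum_u ||u - centroid||^2
pairwise = np.sum((C[:, None, :] - C[None, :, :]) ** 2)
print(np.isclose(centroid_cost, pairwise / (2 * len(C))))   # True
```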

  13. Problem & Notation. Now assume that 𝒞 = (C_1, …, C_k) is a random clustering that depends on φ. Want to prove: cost(𝒞) ≈ cost(φ(𝒞)) whp. The distance between u and v is (1 + ε)-preserved or distorted depending on whether ‖φ(u) − φ(v)‖ ≈ ‖u − v‖ within a factor of 1 + ε. Think of δ = poly(1/k, ε) as sufficiently small.

  14. Distortion graph. Connect u and v with an edge if the distance between them is distorted. + Every edge is present with probability at most δ. − Edges are not independent. − 𝒞 depends on the set of edges. − There may be high-degree vertices. − All distances within a cluster may be distorted.
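
Concretely, the distortion graph can be read off from φ; a brute-force sketch (quadratic in n, for illustration only):

```python
import numpy as np

def distortion_graph(X, phi, eps):
    # Edge (i, j) iff the distance between X[i] and X[j] is NOT
    # (1+eps)-preserved by the map x -> phi @ x.
    edges = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            orig = np.linalg.norm(X[i] - X[j])
            proj = np.linalg.norm(phi @ (X[i] - X[j]))
            if not (orig / (1 + eps) <= proj <= (1 + eps) * orig):
                edges.append((i, j))
    return edges
```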

  15. Cost of a cluster. The cost of C_j is (1 / (2|C_j|)) Σ_{u,v∈C_j} ‖u − v‖². + Terms for non-edges (u, v) are (1 + ε)-preserved: ‖u − v‖ ≈ ‖φ(u) − φ(v)‖. − Need to prove that Σ_{u,v∈C_j, (u,v)∈E} ‖u − v‖² = Σ_{u,v∈C_j, (u,v)∈E} ‖φ(u) − φ(v)‖² ± ε′ cost(𝒞).

  16. Everywhere-sparse edges. Assume every u ∈ C_j is connected to at most a θ fraction of all v in C_j (where θ ≪ ε).

  17. Everywhere-sparse edges. + Terms for non-edges (u, v) are (1 + ε)-preserved. + The contribution of terms for edges is small: for an edge (u, v) and any w ∈ C_j, ‖u − v‖ ≤ ‖u − w‖ + ‖w − v‖, so ‖u − v‖² ≤ 2(‖u − w‖² + ‖w − v‖²) [using (a + b)² ≤ 2a² + 2b²].

  18. Everywhere-sparse edges. ‖u − v‖² ≤ 2(‖u − w‖² + ‖w − v‖²). • Replace the term for every edge with the two terms ‖u − w‖², ‖w − v‖² for a random w ∈ C_j. • Each term is used at most 2θ times, in expectation. Hence Σ_{u,v∈C_j, (u,v)∈E} ‖u − v‖² ≤ 4θ Σ_{u,v∈C_j} ‖u − v‖².

  19. Everywhere-sparse edges. Σ_{u,v∈C_j} ‖u − v‖² ≈ Σ_{u,v∈C_j, (u,v)∉E} ‖u − v‖² ≈ Σ_{u,v∈C_j, (u,v)∉E} ‖φ(u) − φ(v)‖² ≈ Σ_{u,v∈C_j} ‖φ(u) − φ(v)‖².

  20. Everywhere-sparse edges. The same chain of approximations handles every cluster, which proves the special case. But edges are not necessarily everywhere sparse!

  21. Outliers. Want: remove "outliers" so that, in the remaining set X′, edges are everywhere sparse in every cluster.

  22. (1 − θ) non-distorted core. Want: remove "outliers" so that, in the remaining set X′, edges are everywhere sparse in every cluster.

  23. (1 − θ) non-distorted core. Find a subset X′ ⊂ X (which depends on 𝒞) s.t. • Edges are sparse in the resulting clusters: every u ∈ C_j ∩ X′ is connected to at most a θ fraction of all v in C_j ∩ X′. • Outliers are rare: for every u, Pr(u ∉ X′) ≤ θ.

  24. All clusters are large. Assume all clusters have size ~n/k. Let θ = δ^{1/4}. Outliers = all vertices of degree at least ~θn/k. Every vertex has degree at most δn in expectation. By Markov, Pr(u is an outlier) ≤ δk/θ ≤ θ. We remove θn ≪ n/k vertices in total, so all clusters still have size ~n/k. This crucially uses that all clusters are large!
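
In the equal-size-clusters case, the outlier-removal step is just a degree cutoff in the distortion graph; a sketch under that assumption (the exact threshold constant is left open on the slide):

```python
import numpy as np

def non_outliers(edges, n, k, theta):
    # Keep vertices whose degree in the distortion graph is at most ~theta*n/k;
    # by Markov's inequality each vertex survives with probability >= 1 - theta.
    deg = np.zeros(n, dtype=int)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return [v for v in range(n) if deg[v] <= theta * n / k]
```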

  25. Main Combinatorial Lemma. Idea: assign "weights" to vertices so that all clusters have large weight. • There is a measure μ on X and a random set R s.t. μ(x) ≥ 1/|C_j ∖ R| for x ∈ C_j ∖ R (always) • μ(X) ≤ 4k³/θ² • Pr(x ∈ R) ≤ θ. All clusters C_j ∖ R are "large" w.r.t. the measure μ, so we can apply a variant of the previous argument.

  26. Edges incident on outliers. Need to take care of edges incident on outliers. Say, u is an outlier and v is not. Consider a fixed optimal clustering C*_1, …, C*_k of X. Let c* be the optimal center for u. (Diagram: points u, v and the center c*.)

  27. Edges incident on outliers. By the triangle inequality, ‖u − v‖ = ‖v − c*‖ ± ‖c* − u‖ and ‖φ(u) − φ(v)‖ = ‖φ(v) − φ(c*)‖ ± ‖φ(c*) − φ(u)‖. We may assume that the distances between non-outliers and the optimal centers are (1 + ε)-preserved, so ‖v − c*‖ ≈ ‖φ(v) − φ(c*)‖.

  28. Edges incident on outliers. With the same decomposition, the term ‖c* − u‖ is controlled in expectation: 𝔼[Σ_{u∉X′} ‖c*_u − u‖²] ≤ θ Σ_{u∈X} ‖c*_u − u‖² = θ · OPT (where c*_u denotes the optimal center serving u).

  29. Edges incident on outliers. Taking care of the remaining term ‖φ(c*) − φ(u)‖ is a bit more difficult. QED

  30. k-medians under dimension reduction

  31. k-medians. − There is no formula for the cost of the clustering in terms of pairwise distances. − Not obvious even when d ~ log n (when all pairwise distances are approximately preserved). [This case was asked about by Ravi Kannan in a tutorial @ Simons.] + The Kirszbraun Theorem ⇒ the d ~ log n case. + We prove a Robust Kirszbraun Theorem. Our methods for k-means + Robust Kirszbraun ⇒ d ~ log k for k-medians.
