composable core sets for diversity and coverage



  1. Composable Core-sets for Diversity and Coverage Maximization
     Piotr Indyk (MIT), Sepideh Mahabadi (MIT), Mohammad Mahdian (Google), Vahab S. Mirrokni (Google)

  5. Core-Set Definition
     • Setup
       – Set of $n$ points $P$ in $d$-dimensional space
       – Optimize a function $f$
     • $c$-Core-set: small subset of points $S \subset P$ which suffices to $c$-approximate the optimal solution
     • Maximization: $\frac{1}{c} f_{opt}(P) \le f_{opt}(S) \le f_{opt}(P)$
     • Example
       – Optimization function: distance of the two farthest points
       – 1-Core-set: points on the convex hull
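The convex-hull example can be checked numerically. The following is a minimal sketch, assuming 2-D points and a standard monotone-chain hull; the helper names `cross`, `convex_hull`, and `diameter` are illustrative and do not come from the slides.

```python
import itertools
import math
import random


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices of 2-D points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def diameter(points):
    """Distance between the two farthest points (brute force)."""
    return max(math.dist(p, q) for p, q in itertools.combinations(points, 2))


random.seed(0)
P = [(random.random(), random.random()) for _ in range(200)]
S = convex_hull(P)  # the candidate 1-core-set

print(len(P), len(S))
print(diameter(P), diameter(S))
```

The two farthest points are always hull vertices, so the diameter computed on `S` should match the diameter computed on all of `P` while `S` is much smaller.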

  10. Composable Core-sets
      • Setup
        – $P_1, P_2, \dots, P_m$ are sets of points in $d$-dimensional space
        – Optimize a function $f$ over their union $P$
      • $c$-Composable core-sets: subsets of points $S_1 \subset P_1, S_2 \subset P_2, \dots, S_m \subset P_m$ such that the solution of the union of the core-sets approximates the solution of the union of the point sets
      • Maximization: $\frac{1}{c} f_{opt}(P_1 \cup \dots \cup P_m) \le f_{opt}(S_1 \cup \dots \cup S_m) \le f_{opt}(P_1 \cup \dots \cup P_m)$
      • Example: two farthest points
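A deliberately tiny instance of the definition, as a sketch: on the real line the "two farthest points" objective is just max minus min, so keeping the two extremes of each block gives a 1-composable core-set. The 1-D restriction and the names `core_set_1d` and `diameter_1d` are assumptions made for illustration.

```python
def core_set_1d(part):
    """Keep only the extremes of a block of 1-D points."""
    return [min(part), max(part)]


def diameter_1d(points):
    """Distance between the two farthest points on the line."""
    return max(points) - min(points)


parts = [[3.0, 7.5, 4.2], [-1.0, 0.5], [9.0, 8.1, 2.2, 6.0]]
core_sets = [core_set_1d(p) for p in parts]  # computed independently per block

all_points = [x for p in parts for x in p]
union_of_core_sets = [x for s in core_sets for x in s]

print(diameter_1d(all_points))          # 10.0
print(diameter_1d(union_of_core_sets))  # 10.0
```

Each core-set is computed without looking at the other blocks, yet the union of the core-sets still contains the global minimum and maximum, which is the composability property in its simplest form.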

  13. Applications – Streaming Computation
      • Streaming computation:
        – Processing a sequence of $n$ data elements "on the fly"
        – Limited storage
      • $c$-Composable core-set of size $k$
        – Chunks of size $\sqrt{nk}$, thus number of chunks $= \sqrt{n/k}$
        – Core-set for each chunk
        – Total space: $k\sqrt{n/k} + \sqrt{nk} = O(\sqrt{nk})$
        – Approximation factor: $c$
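The space bound is simple arithmetic and can be sanity-checked with a few lines. This is a sketch only; the function name `streaming_space` and the rounding to whole points and chunks are assumptions, not part of the slides.

```python
import math


def streaming_space(n, k):
    """Return (chunk size, number of chunks, points held in memory)
    for a stream of n items and per-chunk core-sets of size k."""
    chunk_size = math.isqrt(n * k)
    if chunk_size * chunk_size < n * k:
        chunk_size += 1                       # round sqrt(n*k) up
    num_chunks = -(-n // chunk_size)          # ceil(n / chunk_size), about sqrt(n/k)
    # One core-set of size k per chunk, plus the chunk currently being read.
    total = k * num_chunks + chunk_size
    return chunk_size, num_chunks, total


print(streaming_space(1_000_000, 100))  # (10000, 100, 20000)
```

With $n = 10^6$ and $k = 100$ the stream is cut into 100 chunks of 10,000 points, and the memory held is $2\sqrt{nk} = 20{,}000$ points rather than a million.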

  15. Applications – Distributed Systems
      • Streaming computation
      • Distributed system:
        – Each machine holds a block of data
        – A composable core-set is computed and sent to the server
      • Map-Reduce model:
        – One round of Map-Reduce
        – $\sqrt{n/k}$ mappers, each getting $\sqrt{nk}$ points
        – Each mapper computes a composable core-set of size $k$
        – The core-sets are passed to a single reducer
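The one-round Map-Reduce pattern can be simulated in a single process. The sketch below assumes the minimum-pairwise-distance objective and uses a greedy farthest-point rule (`gmm`) as the core-set construction; the slides above do not say which construction the mappers run, so treat that choice, and every name in the code, as an illustrative assumption.

```python
import itertools
import math
import random


def gmm(points, k):
    """Greedy farthest-point selection: start with the first point, then
    repeatedly add the point farthest from everything chosen so far."""
    chosen = [points[0]]
    while len(chosen) < min(k, len(points)):
        chosen.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in chosen))
        )
    return chosen


def remote_edge(points):
    """Minimum pairwise distance of a set of points."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def map_reduce_round(blocks, k):
    core_sets = [gmm(block, k) for block in blocks]   # map: one core-set per block
    union = [p for s in core_sets for p in s]          # shuffle: send to one reducer
    return gmm(union, k)                               # reduce: solve on the union


random.seed(1)
n, k = 36, 4
points = [(random.random(), random.random()) for _ in range(n)]
chunk = math.isqrt(n * k)                              # sqrt(n*k) = 12 points per mapper
blocks = [points[i:i + chunk] for i in range(0, n, chunk)]

result = map_reduce_round(blocks, k)
print(len(blocks), [len(b) for b in blocks])           # 3 [12, 12, 12]
print(remote_edge(result))
```

The reducer only ever sees $k$ points per mapper, so its input size is $k\sqrt{n/k} = \sqrt{nk}$, the same as a single mapper's block.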

  19. Applications – Similarity Search
      • Streaming computation
      • Distributed system
      • Similarity search: small output size
        – Good to have a result from each cluster: relevant and diverse
      • Diverse Near Neighbor Problem [Abbar, Amer-Yahia, Indyk, Mahabadi, WWW'13] [Abbar, Amer-Yahia, Indyk, Mahabadi, Varadarajan, SoCG'13]
        – Uses Locality Sensitive Hashing (LSH) and composable core-sets techniques

  23. Diversity Maximization Problem
      • A set of $n$ points $P$ in a metric space $(\Delta, dist)$
      • Optimization problem:
        – Find a subset of $k$ points $S$ which maximizes diversity
      • Diversity:
        – Minimum pairwise distance (Remote Edge)
        – Sum of pairwise distances (Remote Clique)
      • Long list of variants [Chandra and Halldorsson '01]
      (Figure: example with $k = 4$, $n = 6$.)
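To make the problem statement concrete, here is a brute-force sketch for the two diversity notions named on this slide. The six sample points are invented for illustration (chosen to match $n = 6$, $k = 4$), and exhaustive search is only viable for tiny inputs.

```python
import itertools
import math


def remote_edge(S):
    """Minimum pairwise distance."""
    return min(math.dist(p, q) for p, q in itertools.combinations(S, 2))


def remote_clique(S):
    """Sum of pairwise distances, counting each unordered pair once."""
    return sum(math.dist(p, q) for p, q in itertools.combinations(S, 2))


def most_diverse(P, k, diversity):
    """Exhaustive search over all k-subsets of P."""
    return max(itertools.combinations(P, k), key=diversity)


P = [(0, 0), (1, 0), (0, 1), (4, 0), (0, 4), (4, 4)]   # n = 6
k = 4

best_edge = most_diverse(P, k, remote_edge)
best_clique = most_diverse(P, k, remote_clique)
print(best_edge, remote_edge(best_edge))
print(best_clique, remote_clique(best_clique))
```

On this input both objectives pick the four corners of the 4-by-4 square and drop the two points crowded near the origin, with a minimum pairwise distance of 4.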

  24. Diversity Functions

      | Diversity function | Description (over a set $S$ of $k$ points) |
      |---|---|
      | Remote-edge | Minimum pairwise distance: $\min_{p, q \in S,\ p \ne q} dist(p, q)$ |
      | Remote-clique | Sum of pairwise distances: $\sum_{p, q \in S} dist(p, q)$ |
      | Remote-tree | Weight of the minimum spanning tree (MST) of the set $S$ |
      | Remote-cycle | Weight of the minimum traveling salesman tour (TSP) of the set $S$ |
      | Remote-star | Weight of the minimum star: $\min_{p \in S} \sum_{q \in S} dist(p, q)$ |
      | Remote-pseudoforest | Sum of the distance of each point to its nearest neighbor: $\sum_{p \in S} \min_{q \in S \setminus \{p\}} dist(p, q)$ |
      | Remote-matching | Weight of the minimum perfect matching of the set $S$ |
      | Max-coverage | How well the points cover each coordinate: $\sum_{i=1}^{d} \max_{p \in S} p_i$ |
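Several of these functions are short enough to write out directly. The sketch below covers remote-star, remote-pseudoforest, remote-tree, and max-coverage for points given as coordinate tuples with Euclidean distance; the function names and the sample rectangle are illustrative assumptions.

```python
import math


def remote_star(S):
    """Weight of the lightest star: best center, summed distance to the rest."""
    return min(sum(math.dist(p, q) for q in S) for p in S)


def remote_pseudoforest(S):
    """Sum over points of the distance to the nearest other point."""
    return sum(
        min(math.dist(p, q) for j, q in enumerate(S) if j != i)
        for i, p in enumerate(S)
    )


def remote_tree(S):
    """Weight of a minimum spanning tree, via Prim's algorithm in O(k^2)."""
    # best[i] = cheapest edge from point i to the tree built so far.
    best = {i: math.dist(S[0], S[i]) for i in range(1, len(S))}
    total = 0.0
    while best:
        i = min(best, key=best.get)
        total += best.pop(i)
        for j in best:
            best[j] = min(best[j], math.dist(S[i], S[j]))
    return total


def max_coverage(S):
    """Sum over coordinates of the largest value any point attains there."""
    return sum(max(p[i] for p in S) for i in range(len(S[0])))


S = [(0, 0), (3, 0), (3, 4), (0, 4)]   # corners of a 3-by-4 rectangle
print(remote_star(S), remote_pseudoforest(S), remote_tree(S), max_coverage(S))
```

Remote-cycle and remote-matching are left out because evaluating them requires solving a TSP or a minimum-weight perfect matching, which does not fit in a few lines.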
