Composable Core-sets for Diversity and Coverage Maximization
Piotr Indyk (MIT), Sepideh Mahabadi (MIT), Mohammad Mahdian (Google), Vahab S. Mirrokni (Google)
Core-Set Definition
• Setup
  – Set of $n$ points $P$ in $d$-dimensional space
  – Optimize a function $f$
• $\alpha$-Core-set: a small subset of points $S \subseteq P$ that suffices to $\alpha$-approximate the optimal solution
  – Maximization: $\frac{1}{\alpha} f_{opt}(P) \le f_{opt}(S) \le f_{opt}(P)$
• Example
  – Optimization function: distance between the two farthest points
  – 1-Core-set: the points on the convex hull
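The convex-hull example can be checked numerically. The sketch below is a minimal planar illustration, assuming 2D points given as tuples; the helper names `convex_hull` and `diameter` are hypothetical, and Andrew's monotone-chain algorithm is just one standard way to compute a hull. It checks that the diameter of the hull vertices equals the diameter of the whole point set, i.e., that the hull is a 1-core-set for this function.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order
    (collinear boundary points are dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]


def diameter(points: Sequence[Point]) -> float:
    """f(S): distance between the two farthest points (brute force over pairs)."""
    return max(math.dist(p, q) for p, q in itertools.combinations(points, 2))


if __name__ == "__main__":
    P = [(0, 0), (4, 0), (4, 3), (0, 3), (1, 1), (2, 2), (3, 1)]
    S = convex_hull(P)  # the 1-core-set
    print(S, diameter(S), diameter(P))
```

Only the four corners survive in `S`, yet the diameter is unchanged, so $f_{opt}(S) = f_{opt}(P)$ with $\alpha = 1$.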
Composable Core-sets
• Setup
  – $P_1, P_2, \ldots, P_m$ are sets of points in $d$-dimensional space
  – Optimize a function $f$ over their union $P = P_1 \cup \cdots \cup P_m$
• $\alpha$-Composable Core-sets: subsets of points $S_1 \subseteq P_1, S_2 \subseteq P_2, \ldots, S_m \subseteq P_m$ such that the solution on the union of the core-sets approximates the solution on the union of the point sets
  – Maximization: $\frac{1}{\alpha} f_{opt}(P_1 \cup \cdots \cup P_m) \le f_{opt}(S_1 \cup \cdots \cup S_m) \le f_{opt}(P_1 \cup \cdots \cup P_m)$
• Example: two farthest points
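For the two-farthest-points example, taking $S_i$ to be the convex hull of each $P_i$ gives a composable 1-core-set: every vertex of the hull of $P_1 \cup \cdots \cup P_m$ is extreme within the part that contains it, so it survives in that part's hull, and the farthest pair is always attained at hull vertices. A minimal 2D sketch of that check follows, assuming tuple points and a round-robin split into $m$ parts; `composable_hull_coreset` and the other helper names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain (strict hull vertices only)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def diameter(points: Sequence[Point]) -> float:
    return max(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def composable_hull_coreset(parts: Sequence[Sequence[Point]]) -> List[Point]:
    """S_1 u ... u S_m, where each S_i = hull(P_i) is computed independently."""
    union: List[Point] = []
    for part in parts:
        union.extend(convex_hull(part))
    return union


if __name__ == "__main__":
    points = [(float(x), float((7 * x) % 11)) for x in range(20)]
    m = 3
    parts = [points[i::m] for i in range(m)]  # P_1, ..., P_m
    core = composable_hull_coreset(parts)
    print(len(points), len(core), diameter(points), diameter(core))
```

Each part is summarized without looking at the others, which is what makes the construction usable in the streaming and distributed settings.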
Applications – Streaming Computation
• Streaming computation: processing a sequence of $n$ data elements "on the fly" with limited storage
• $\alpha$-Composable core-set of size $k$
  – Chunks of size $\sqrt{nk}$, thus the number of chunks is $n/\sqrt{nk} = \sqrt{n/k}$
  – Compute a core-set for each chunk
  – Total space: $k\sqrt{n/k}$ (core-sets) $+ \sqrt{nk}$ (current chunk) $= O(\sqrt{nk})$, since $k\sqrt{n/k} = \sqrt{nk}$
  – Approximation factor: $\alpha$
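The space bound can be illustrated with a small simulation. The sketch below assumes a generic core-set routine that returns at most $k$ points per chunk; as a stand-in it uses the greedy farthest-point heuristic (Gonzalez's algorithm), chosen only for illustration and not claimed to be the core-set construction of this work. The names `greedy_farthest` and `stream_coresets` are hypothetical. It buffers chunks of $\lfloor\sqrt{nk}\rfloor$ points, keeps only each chunk's core-set, and records the peak number of points held in memory.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_farthest(points: Sequence[Point], k: int) -> List[Point]:
    """Hypothetical stand-in core-set: greedy farthest-point selection of k points."""
    if len(points) <= k:
        return list(points)
    chosen = [points[0]]
    # gap[j] = distance from points[j] to its nearest already-chosen point
    gap = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        i = max(range(len(points)), key=gap.__getitem__)
        chosen.append(points[i])
        gap = [min(g, math.dist(p, points[i])) for g, p in zip(gap, points)]
    return chosen


def stream_coresets(stream: Iterable[Point], n: int, k: int):
    """Process n streamed points in chunks of about sqrt(nk), keeping only
    per-chunk core-sets. Returns (union of core-sets, peak points stored)."""
    chunk_size = max(1, math.isqrt(n * k))
    buffer: List[Point] = []
    kept: List[Point] = []
    peak = 0
    for p in stream:
        buffer.append(p)
        peak = max(peak, len(buffer) + len(kept))
        if len(buffer) == chunk_size:
            kept.extend(greedy_farthest(buffer, k))
            buffer = []  # the raw chunk is discarded
            peak = max(peak, len(kept))
    if buffer:  # last, possibly partial chunk
        kept.extend(greedy_farthest(buffer, k))
        peak = max(peak, len(kept))
    return kept, peak


if __name__ == "__main__":
    n, k = 10_000, 4
    stream = ((float(i % 100), float(i // 100)) for i in range(n))
    kept, peak = stream_coresets(stream, n, k)
    print(len(kept), peak)  # prints: 200 396
```

With $n = 10{,}000$ and $k = 4$ there are 50 chunks of 200 points; the peak is reached at the end of the last chunk, $200 + 49 \cdot 4 = 396 \le 2\sqrt{nk}$, compared with the $10{,}000$ points a non-streaming solver would hold.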
Applications – Distributed Systems
• Streaming computation
• Distributed system:
  – Each machine holds a block of data.
  – A composable core-set is computed and sent to the server.
• Map-Reduce model:
  – One round of Map-Reduce
  – $\sqrt{n/k}$ mappers, each getting $\sqrt{nk}$ points
  – Each mapper computes a composable core-set of size $k$
  – The core-sets are passed to a single reducer
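One round of this Map-Reduce scheme can be simulated sequentially. In the sketch below, `mapper`, `reducer`, and `one_round` are hypothetical names; the data is split into about $\sqrt{n/k}$ blocks of about $\sqrt{nk}$ points, and the greedy farthest-point heuristic stands in both for the core-set routine and for the reducer's solver, purely as an assumption for illustration. The reducer sees only the union of the mappers' core-sets and returns $k$ points with their minimum pairwise distance (the remote-edge objective).

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_farthest(points: Sequence[Point], k: int) -> List[Point]:
    """Hypothetical stand-in: greedy farthest-point selection of k points."""
    if len(points) <= k:
        return list(points)
    chosen = [points[0]]
    gap = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        i = max(range(len(points)), key=gap.__getitem__)
        chosen.append(points[i])
        gap = [min(g, math.dist(p, points[i])) for g, p in zip(gap, points)]
    return chosen


def mapper(block: Sequence[Point], k: int) -> List[Point]:
    """Runs on one machine: compress its block to a core-set of at most k points."""
    return greedy_farthest(block, k)


def reducer(coresets: Sequence[Sequence[Point]], k: int):
    """Single reducer: solve on the union of the core-sets only (needs k >= 2)."""
    union = [p for s in coresets for p in s]
    solution = greedy_farthest(union, k)
    value = min(math.dist(p, q) for p, q in itertools.combinations(solution, 2))
    return solution, value, len(union)


def one_round(points: Sequence[Point], k: int):
    n = len(points)
    block = max(1, math.isqrt(n * k))  # about sqrt(nk) points per mapper
    blocks = [points[i:i + block] for i in range(0, n, block)]  # ~sqrt(n/k) mappers
    coresets = [mapper(b, k) for b in blocks]
    solution, value, reducer_input = reducer(coresets, k)
    return solution, value, len(blocks), reducer_input


if __name__ == "__main__":
    pts = [(float(i % 100), float(i // 100)) for i in range(10_000)]
    solution, value, mappers, reducer_input = one_round(pts, k=4)
    print(mappers, reducer_input, solution, value)
```

The reducer's input is $k\sqrt{n/k} = \sqrt{nk}$ points, so neither the mappers nor the reducer ever hold more than about $\sqrt{nk}$ points.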
Applications – Similarity Search
• Streaming computation
• Distributed system
• Similarity search:
  – Small output size
  – Good to have results from each cluster: relevant and diverse
  – The Diverse Near Neighbor Problem [Abbar, Amer-Yahia, Indyk, Mahabadi, WWW'13] [Abbar, Amer-Yahia, Indyk, Mahabadi, Varadarajan, SoCG'13] uses Locality Sensitive Hashing (LSH) and composable core-set techniques.
Diversity Maximization Problem
• A set of $n$ points $P$ in a metric space with distance function $\mathrm{dist}$
• Optimization problem: find a subset $S \subseteq P$ of $k$ points that maximizes diversity
• Diversity:
  – Minimum pairwise distance (Remote-edge)
  – Sum of pairwise distances (Remote-clique)
• Long list of variants [Chandra and Halldorsson '01]
(Figure: example with $n = 6$ points, $k = 4$ of them selected.)
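For tiny inputs, both objectives can be optimized by exhaustive search over all $\binom{n}{k}$ subsets. The sketch below does exactly that for remote-edge and remote-clique; the six 1D points are a made-up example in the spirit of the $n = 6$, $k = 4$ figure (not the points drawn on the slide), and the function names are hypothetical. The search enumerates every $k$-subset, so it is only feasible for very small $n$.

```python
import itertools
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def remote_edge(S: Sequence[Point]) -> float:
    """Minimum pairwise distance within S."""
    return min(math.dist(p, q) for p, q in itertools.combinations(S, 2))


def remote_clique(S: Sequence[Point]) -> float:
    """Sum of distances over unordered pairs of S."""
    return sum(math.dist(p, q) for p, q in itertools.combinations(S, 2))


def best_subset(P: Sequence[Point], k: int,
                objective: Callable[[Sequence[Point]], float]) -> Tuple[Point, ...]:
    """Exhaustive search: the k-subset of P that maximizes the objective."""
    return max(itertools.combinations(P, k), key=objective)


if __name__ == "__main__":
    P = [(0.0,), (1.0,), (2.0,), (4.0,), (7.0,), (11.0,)]
    for name, f in [("remote-edge", remote_edge), ("remote-clique", remote_clique)]:
        S = best_subset(P, 4, f)
        print(name, S, f(S))
```

The two objectives pick different subsets here: remote-edge prefers evenly spread points such as $\{0, 4, 7, 11\}$ (value 3), while remote-clique picks $\{0, 1, 7, 11\}$ (value 39), tolerating two close points because the sum rewards the far-apart extremes.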
Diversity Functions (over a set $S$ of $k$ points)
• Remote-edge: minimum pairwise distance, $\min_{p, q \in S,\, p \neq q} \mathrm{dist}(p, q)$
• Remote-clique: sum of pairwise distances, $\sum_{\{p, q\} \subseteq S} \mathrm{dist}(p, q)$
• Remote-tree: weight of the minimum spanning tree (MST) of $S$
• Remote-cycle: weight of the minimum traveling salesman tour (TSP) of $S$
• Remote-star: weight of the minimum star, $\min_{p \in S} \sum_{q \in S} \mathrm{dist}(p, q)$
• Remote-pseudoforest: sum of the distances from each point to its nearest neighbor, $\sum_{p \in S} \min_{q \in S \setminus \{p\}} \mathrm{dist}(p, q)$
• Remote-matching: weight of the minimum perfect matching of $S$
• Max-coverage: how well the points cover each coordinate, $\sum_{i=1}^{d} \max_{p \in S} p_i$
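The functions listed above translate directly into code. The sketch below evaluates each of them on a given set $S$, assuming points are tuples in Euclidean space; the function names are hypothetical. Remote-cycle and remote-matching use brute force, so they are meant only for very small $k$, and remote-matching needs $k$ to be even. Remote-star follows the formula above, so its inner sum includes the zero term $\mathrm{dist}(p, p)$.

```python
import itertools
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]
Pairs = itertools.combinations


def remote_edge(S: Sequence[Point]) -> float:
    return min(math.dist(p, q) for p, q in Pairs(S, 2))


def remote_clique(S: Sequence[Point]) -> float:
    return sum(math.dist(p, q) for p, q in Pairs(S, 2))


def remote_tree(S: Sequence[Point]) -> float:
    """MST weight via Prim's algorithm, O(k^2)."""
    k = len(S)
    in_tree, best, total = [False] * k, [math.inf] * k, 0.0
    best[0] = 0.0
    for _ in range(k):
        i = min((j for j in range(k) if not in_tree[j]), key=best.__getitem__)
        in_tree[i], total = True, total + best[i]
        for j in range(k):
            if not in_tree[j]:
                best[j] = min(best[j], math.dist(S[i], S[j]))
    return total


def remote_cycle(S: Sequence[Point]) -> float:
    """Shortest tour through S (brute force, tours start at S[0])."""
    first, rest = S[0], list(S[1:])
    best = math.inf
    for perm in itertools.permutations(rest):
        tour = [first, *perm, first]
        best = min(best, sum(math.dist(a, b) for a, b in zip(tour, tour[1:])))
    return best


def remote_star(S: Sequence[Point]) -> float:
    return min(sum(math.dist(p, q) for q in S) for p in S)


def remote_pseudoforest(S: Sequence[Point]) -> float:
    return sum(min(math.dist(p, q) for j, q in enumerate(S) if j != i)
               for i, p in enumerate(S))


def remote_matching(S: Sequence[Point]) -> float:
    """Minimum perfect matching weight (brute force; len(S) must be even)."""
    if len(S) % 2:
        raise ValueError("a perfect matching needs an even number of points")

    def solve(idx: Tuple[int, ...]) -> float:
        if not idx:
            return 0.0
        i, rest = idx[0], idx[1:]
        # match i with each remaining j, then recurse on the other indices
        return min(math.dist(S[i], S[j]) + solve(rest[:t] + rest[t + 1:])
                   for t, j in enumerate(rest))

    return solve(tuple(range(len(S))))


def max_coverage(S: Sequence[Point]) -> float:
    return sum(max(p[i] for p in S) for i in range(len(S[0])))


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    for f in (remote_edge, remote_clique, remote_tree, remote_cycle,
              remote_star, remote_pseudoforest, remote_matching, max_coverage):
        print(f.__name__, round(f(square), 3))
```

On the unit square the values are easy to check by hand: the MST uses three sides (3), the tour uses all four (4), and the cheapest perfect matching pairs opposite sides (2).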