Rehashing Kernel Evaluation in High Dimensions
Paris Siminelakis* (Ph.D. Candidate), Kexin Rong*, Peter Bailis, Moses Charikar, Phillip Levis (Stanford University)
ICML, Long Beach, California, June 11, 2019. *Equal contribution.
Outline: Intro, Contribution, Sketching, Diagnostics, Evaluation, Conclusion.
Kernel Density Function

Given $P = \{x_1, \ldots, x_n\} \subset \mathbb{R}^d$, a kernel $k : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}_+$, weights $u \geq 0$, and a query point $q$:

$$\mathrm{KDF}^u_P(q) = \sum_{i=1}^{n} u_i \, k(x_i, q)$$

With uniform weights $u_i = 1/n$ this is the usual kernel density estimate:

$$\mathrm{KDF}_P(q) = \frac{1}{n} \sum_{i=1}^{n} k(x_i, q)$$

(Figure: $n$ points, with a kernel $k$ centered at each.)
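To make the cost concrete, here is a minimal sketch of direct evaluation; the Gaussian-kernel choice and all names are illustrative assumptions, not from the slides. Each query touches all $n$ points.

```python
import numpy as np

def gaussian_kernel(x, q, bandwidth=1.0):
    """Gaussian kernel k(x, q) = exp(-||x - q||^2 / bandwidth^2)."""
    return np.exp(-np.sum((x - q) ** 2) / bandwidth ** 2)

def kdf(P, q, u=None, kernel=gaussian_kernel):
    """Exact weighted kernel density KDF^u_P(q) = sum_i u_i k(x_i, q).

    With u = None, uses uniform weights u_i = 1/n (the kernel density
    estimate).  Cost: O(n) kernel evaluations per query.
    """
    n = len(P)
    if u is None:
        u = np.full(n, 1.0 / n)
    return sum(u_i * kernel(x_i, q) for u_i, x_i in zip(u, P))

# Example: density of 1,000 random points at the origin.
P = np.random.randn(1000, 5)
q = np.zeros(5)
print(kdf(P, q))
```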
Kernel Density Evaluation

Same setup: $P = \{x_1, \ldots, x_n\} \subset \mathbb{R}^d$, $k : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}_+$, $u \geq 0$, query point $q$, and

$$\mathrm{KDF}^u_P(q) = \sum_{i=1}^{n} u_i \, k(x_i, q)$$

Where is it used?
1. Non-parametric density estimation: $\mathrm{KDF}_P(q)$.
2. Kernel methods: $f(x) = \sum_i \alpha_i \, \phi(\|x - x_i\|)$.
3. Comparing point sets (distributions) with the "kernel distance".

Evaluating at a single point requires $O(n)$ time. How fast can we approximate $\mathrm{KDF}$?
Methods for Fast Kernel Evaluation

Given $P \subset \mathbb{R}^d$ and $\epsilon > 0$: compute a $(1 \pm \epsilon)$-approximation to $\mu := \mathrm{KDF}_P(q)$ for any $q \in \mathbb{R}^d$.

- Space partitions: $\log(1/\mu\epsilon)^{O(d)}$. FMM [Greengard, Rokhlin '87], Dual-Tree [Lee, Gray, Moore '06], FIGTree [Morariu et al., NeurIPS '09]. Slow in high dimensions.
- Random sampling: $1/\mu\epsilon^2$. Dimension-independent, but linear in $1/\mu$.
- Hashing: $O(1/\sqrt{\mu}\,\epsilon^2)$. Hashing-Based Estimators [Charikar, S. '17]; similar idea: Locality Sensitive Samplers [Spring, Shrivastava '17]. Sub-linear in $1/\mu$.

The key idea behind hashing: importance sampling via randomized space partitions.
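As a point of comparison, a hedged sketch of the plain random-sampling baseline from the list above (names are my own): average the kernel over $m$ uniformly drawn points. The estimate is unbiased, but since the mean $\mu$ can be tiny, roughly $m \approx 1/(\mu\epsilon^2)$ samples are needed for a $(1 \pm \epsilon)$-approximation.

```python
import numpy as np

def kdf_random_sampling(P, q, m, kernel, rng=None):
    """Unbiased Monte Carlo estimate of KDF_P(q) = (1/n) sum_i k(x_i, q):
    average the kernel over m points drawn uniformly from P.

    For kernels bounded by 1, the variance of the mean is <= mu/m, so
    the relative error is ~ 1/sqrt(m * mu): m ~ 1/(mu * eps^2) samples
    give a (1 +/- eps)-approximation -- linear in 1/mu.
    """
    rng = rng or np.random.default_rng()
    idx = rng.integers(0, len(P), size=m)
    return float(np.mean([kernel(P[i], q) for i in idx]))
```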
Randomized Space Partitions

A distribution $\mathcal{H}$ over partitions (hash functions) $h : \mathbb{R}^d \to [M]$.

(Figure: six independent draws $h_1, \ldots, h_6$ from $\mathcal{H}$, each partitioning the space differently.)
Locality Sensitive Hashing

A family of partitions $\mathcal{H}$ such that $\Pr_{h \sim \mathcal{H}}[h(x) = h(y)] = p(\|x - y\|)$.

Example: Euclidean LSH [Datar, Immorlica, Indyk, Mirrokni '04]. Concatenating $k$ hashes gives collision probability $p^k(\|x - y\|)$.
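A hedged sketch of the Euclidean LSH family of Datar et al. '04 (parameter names are my own): project onto a random Gaussian direction, shift, and quantize; concatenating $k$ such hashes sharpens the collision probability to $p^k(\|x - y\|)$.

```python
import numpy as np

class EuclideanLSH:
    """Euclidean LSH [Datar et al. '04]: h(x) = floor((a . x + b) / w)
    with a ~ N(0, I_d) and b ~ Uniform[0, w).  Closer points land in
    the same cell with higher probability p(||x - y||).
    """

    def __init__(self, d, w=1.0, k=1, rng=None):
        rng = rng or np.random.default_rng()
        self.a = rng.standard_normal((k, d))  # k random Gaussian directions
        self.b = rng.uniform(0.0, w, size=k)  # k random offsets in [0, w)
        self.w = w

    def __call__(self, x):
        # Concatenation of k quantized projections: a tuple-valued hash.
        return tuple(np.floor((self.a @ x + self.b) / self.w).astype(int))
```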
Hashing-Based Estimators [Charikar, S. FOCS '17]

Preprocess: sample $h_1, \ldots, h_m \sim \mathcal{H}$ and evaluate each on $P$ (one hash table per function).

Query: let $H_t(q)$ denote the hash bucket of $q$ in table $t$.

Estimator: sample a uniformly random point $X_t$ from $H_t(q)$ and return

$$Z_m = \frac{1}{m} \sum_{t=1}^{m} \frac{1}{n} \cdot \frac{k(X_t, q)}{p(X_t, q) / |H_t(q)|}$$

How many samples $m$? Which LSH?
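A hedged sketch of the two phases under stated assumptions: `collision_prob` is the hash family's collision probability $p(\|x - q\|)$, assumed available in closed form, and the hash functions can be, e.g., instances of the `EuclideanLSH` sketch above. All names are illustrative, not the authors' implementation.

```python
import numpy as np
from collections import defaultdict

def build_hbe_tables(P, hash_fns):
    """Preprocess: evaluate each sampled hash function on P, bucketing
    point indices by hash value (one table per function)."""
    tables = []
    for h in hash_fns:
        buckets = defaultdict(list)
        for i, x in enumerate(P):
            buckets[h(x)].append(i)
        tables.append(buckets)
    return tables

def hbe_estimate(P, q, tables, hash_fns, kernel, collision_prob, rng=None):
    """HBE estimator Z_m: for each table t, sample a random point X_t
    from q's bucket H_t(q) and average the importance-weighted terms
    (1/n) * k(X_t, q) / (p(X_t, q) / |H_t(q)|).
    """
    rng = rng or np.random.default_rng()
    n, total = len(P), 0.0
    for h, buckets in zip(hash_fns, tables):
        bucket = buckets.get(h(q), [])
        if not bucket:  # an empty bucket contributes 0
            continue
        x = P[rng.choice(bucket)]
        total += kernel(x, q) * len(bucket) / (n * collision_prob(x, q))
    return total / len(hash_fns)
```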
Hashing-Based Estimators have Practical Limitations

Theorem [Charikar, S. FOCS '17]. For certain kernels, HBE solves the kernel evaluation problem for $\mu \geq \tau$ using $O(1/\sqrt{\mu}\,\epsilon^2)$ samples and $O(n/\sqrt{\tau}\,\epsilon^2)$ space.

Kernel, LSH, and overhead:
- $e^{-\|x-y\|_2^2}$: Ball Carving [Andoni, Indyk '06], overhead $e^{\tilde{O}(\log^{2/3}(n))}$.
- $e^{-\|x-y\|_2}$: Euclidean [Datar et al. '04], overhead $\sqrt{e}$.
- $1/(1+\|x-y\|_2^t)$: Euclidean [Datar et al. '04], overhead $3^{t/2}$.

Practical limitations:
1. Super-linear space ⇒ not practical for massive datasets.
2. Uses an adaptive procedure to estimate the number of samples ⇒ large constants and stringent requirements on the hash functions.
3. For the Gaussian kernel, Ball-Carving LSH is very slow: $e^{\tilde{O}(\log^{2/3}(n))}$ overhead.