Understanding Sparse JL for Feature Hashing
Meena Jagadeesan (Harvard University, Class of 2020)
NeurIPS 2019 (Poster #59)
Dimensionality reduction (ℓ2-to-ℓ2)

A randomized map R^n → R^m (where m ≪ n) that preserves distances.

A pre-processing step in many applications: clustering, nearest neighbors.

Key question: What is the tradeoff between the dimension m, the performance in distance preservation, and the projection time?

This paper: A theoretical analysis of this tradeoff for a state-of-the-art dimensionality reduction scheme on feature vectors.
Feature hashing (Weinberger et al. ’09)

One standard dimensionality reduction scheme is feature hashing.

Use a hash function h : {1, ..., n} → {1, ..., m} on coordinates.
Use random signs to handle collisions: f(x)_i = Σ_{j ∈ h⁻¹(i)} σ_j x_j.
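To make the mapping concrete, here is a minimal sketch of feature hashing in NumPy (an illustration, not the paper's code; the hash function and signs are drawn as fully random arrays rather than from the structured hash families typically used in practice):

```python
import numpy as np

def feature_hash(x, m, rng=None):
    """Feature hashing: f(x)_i = sum_{j in h^{-1}(i)} sigma_j * x_j."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    h = rng.integers(0, m, size=n)            # hash function h : {1,...,n} -> {1,...,m}
    sigma = rng.choice([-1.0, 1.0], size=n)   # random signs sigma_j
    f = np.zeros(m)
    np.add.at(f, h, sigma * x)                # accumulate signed coordinates per bucket
    return f

x = np.random.default_rng(0).standard_normal(10_000)
fx = feature_hash(x, m=256, rng=np.random.default_rng(1))
print(np.linalg.norm(x), np.linalg.norm(fx))  # the two norms should be close for a well-spread x
```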
Sparse Johnson-Lindenstrauss transform (KN ’12)

Sparse JL is a state-of-the-art sparse dimensionality reduction.

Use many (anti-correlated) hash fns h_1, ..., h_s : {1, ..., n} → {1, ..., m}.
⟹ Each input coordinate is mapped to s output coordinates.
Use random signs to deal with collisions.
That is: f(x)_i = (1/√s) · Σ_{k=1}^{s} Σ_{j ∈ h_k⁻¹(i)} σ_j^k x_j.
(Alternate view: a random sparse matrix w/ s nonzero entries per column.)

The tradeoff: higher s preserves distances better, but takes longer.

This work: Analysis of the tradeoff for sparse JL between the number of hash functions s, the dimension m, and the performance in ℓ2-distance preservation.
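The following sketch illustrates the map (again an illustration rather than the paper's construction: it draws s independent uniform hash functions, whereas the KN ’12 construction uses anti-correlated hashes, e.g. exactly one nonzero per column within each of s blocks of rows):

```python
import numpy as np

def sparse_jl(x, m, s, rng=None):
    """Sketch of sparse JL: f(x)_i = (1/sqrt(s)) * sum_k sum_{j in h_k^{-1}(i)} sigma_j^k * x_j.

    Assumption: the s hash functions are drawn independently here; the actual
    KN '12 construction uses anti-correlated hash functions.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    f = np.zeros(m)
    for k in range(s):
        h_k = rng.integers(0, m, size=n)           # k-th hash function
        sigma_k = rng.choice([-1.0, 1.0], size=n)  # k-th sign function
        np.add.at(f, h_k, sigma_k * x)
    return f / np.sqrt(s)

x = np.random.default_rng(0).standard_normal(10_000)
for s in (1, 2, 4, 8):
    fx = sparse_jl(x, m=256, s=s, rng=np.random.default_rng(1))
    print(s, np.linalg.norm(fx) / np.linalg.norm(x))  # norm ratio for one random draw of f
```

Setting s = 1 recovers feature hashing; each additional hash function costs proportionally more projection time, which is the tradeoff the paper analyzes.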
Intuition for this paper

Analysis of sparse JL with respect to a performance measure:
Traditional mathematical framework

Consider a probability distribution F over linear maps f : R^n → R^m.

Geometry-preserving condition. For each x ∈ R^n:
    P_{f ∈ F}[ ‖f(x)‖₂ ∈ (1 ± ε)‖x‖₂ ] > 1 − δ,
for ε the target error and δ the target failure probability.
(Can apply to differences x = x_1 − x_2 since f is linear.)

Sparse JL can sometimes perform much better in practice on feature vectors than traditional theory suggests ...
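A quick way to see the condition in action is to estimate the failure probability empirically: draw many maps f from the sparse JL distribution and count how often ‖f(x)‖₂ falls outside (1 ± ε)‖x‖₂. The sketch below is a Monte Carlo illustration (not an analysis from the paper) and reuses the sparse_jl function defined above:

```python
import numpy as np

def empirical_failure_rate(x, m, s, eps, trials=500, seed=0):
    """Estimate P[ ||f(x)||_2 outside (1 +/- eps) * ||x||_2 ] over random draws of f.

    Assumes the sparse_jl sketch defined earlier is in scope.
    """
    rng = np.random.default_rng(seed)
    norm_x = np.linalg.norm(x)
    failures = 0
    for _ in range(trials):
        fx = sparse_jl(x, m, s, rng)
        ratio = np.linalg.norm(fx) / norm_x
        if not (1 - eps <= ratio <= 1 + eps):
            failures += 1
    return failures / trials

x = np.random.default_rng(0).standard_normal(5_000)
print(empirical_failure_rate(x, m=512, s=4, eps=0.1))  # small failure rate for a well-spread x
```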
Performance on feature vectors (Weinberger et al. ’09)

Consider vectors w/ small ℓ∞-to-ℓ2 norm ratio: S_v = { x ∈ R^n | ‖x‖∞ ≤ v‖x‖₂ }.

Let F_{s,m} be the distribution given by sparse JL with parameters s and m.

Definition. v(m, ε, δ, s) is the supremum over v ∈ [0, 1] such that
    P_{f ∈ F_{s,m}}[ ‖f(x)‖₂ ∈ (1 ± ε)‖x‖₂ ] > 1 − δ
holds for each x ∈ S_v.

◮ v(m, ε, δ, s) = 0 ⟹ poor performance
◮ v(m, ε, δ, s) = 1 ⟹ full performance
◮ v(m, ε, δ, s) ∈ (0, 1) ⟹ good performance on x ∈ S_{v(m,ε,δ,s)}
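One way to probe v(m, ε, δ, s) empirically is to test vectors at a prescribed ℓ∞-to-ℓ2 ratio v; a natural family is vectors with roughly 1/v² equal-magnitude nonzero coordinates. The helper vector_with_ratio below is purely illustrative (not from the paper), and the loop reuses empirical_failure_rate and sparse_jl from the sketches above:

```python
import numpy as np

def vector_with_ratio(n, v):
    """A unit vector in S_v: equal mass on ~1/v^2 coordinates, so ||x||_inf / ||x||_2 <= v."""
    k = max(1, int(np.ceil(1.0 / v**2)))
    x = np.zeros(n)
    x[:k] = 1.0
    return x / np.linalg.norm(x)

# Failure rate vs. the l_inf-to-l_2 ratio v (larger v = more concentrated vectors).
for v in (0.05, 0.1, 0.25, 0.5, 1.0):
    x = vector_with_ratio(10_000, v)
    print(v, empirical_failure_rate(x, m=512, s=2, eps=0.1, trials=200))
```

As v grows the test vectors become more concentrated and the empirical failure rate typically increases, which is exactly the regime that the definition of v(m, ε, δ, s) captures.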