Understanding Sparse JL for Feature Hashing
Meena Jagadeesan (Harvard University, Class of 2020)
NeurIPS 2019 (Poster #59)
Dimensionality reduction (ℓ2-to-ℓ2)

A randomized map R^n → R^m (where m ≪ n) that preserves distances.

A pre-processing step in many applications: clustering, nearest neighbors.

Key question: What is the tradeoff between the dimension m, the performance in distance preservation, and the projection time?

This paper: A theoretical analysis of this tradeoff for a state-of-the-art dimensionality reduction scheme on feature vectors.
Feature hashing (Weinberger et al. ’09)

One standard dimensionality reduction scheme is feature hashing.

Use a hash function h : {1, ..., n} → {1, ..., m} on coordinates.
Use random signs to handle collisions: f(x)_i = Σ_{j ∈ h⁻¹(i)} σ_j x_j.
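To make the mapping concrete, here is a minimal sketch of feature hashing in NumPy (an illustration, not the paper's code; the hash function and signs are drawn as fully random arrays rather than from the structured hash families typically used in practice):

```python
import numpy as np

def feature_hash(x, m, rng=None):
    """Feature hashing: f(x)_i = sum_{j in h^{-1}(i)} sigma_j * x_j."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    h = rng.integers(0, m, size=n)            # hash function h : {1,...,n} -> {1,...,m}
    sigma = rng.choice([-1.0, 1.0], size=n)   # random signs sigma_j
    f = np.zeros(m)
    np.add.at(f, h, sigma * x)                # accumulate signed coordinates per bucket
    return f

x = np.random.default_rng(0).standard_normal(10_000)
fx = feature_hash(x, m=256, rng=np.random.default_rng(1))
print(np.linalg.norm(x), np.linalg.norm(fx))  # the two norms should be close for a well-spread x
```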
Sparse Johnson-Lindenstrauss transform (KN ’12)

Sparse JL is a state-of-the-art sparse dimensionality reduction.

Use many (anti-correlated) hash fns h_1, ..., h_s : {1, ..., n} → {1, ..., m}.
⟹ Each input coordinate is mapped to s output coordinates.
Use random signs to deal with collisions.
That is: f(x)_i = (1/√s) · Σ_{k=1}^{s} Σ_{j ∈ h_k⁻¹(i)} σ_j^k x_j.
(Alternate view: a random sparse matrix w/ s nonzero entries per column.)

The tradeoff: higher s preserves distances better, but takes longer.

This work: Analysis of the tradeoff for sparse JL between the number of hash functions s, the dimension m, and the performance in ℓ2-distance preservation.
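The following sketch illustrates the map (again an illustration rather than the paper's construction: it draws s independent uniform hash functions, whereas the KN ’12 construction uses anti-correlated hashes, e.g. exactly one nonzero per column within each of s blocks of rows):

```python
import numpy as np

def sparse_jl(x, m, s, rng=None):
    """Sketch of sparse JL: f(x)_i = (1/sqrt(s)) * sum_k sum_{j in h_k^{-1}(i)} sigma_j^k * x_j.

    Assumption: the s hash functions are drawn independently here; the actual
    KN '12 construction uses anti-correlated hash functions.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    f = np.zeros(m)
    for k in range(s):
        h_k = rng.integers(0, m, size=n)           # k-th hash function
        sigma_k = rng.choice([-1.0, 1.0], size=n)  # k-th sign function
        np.add.at(f, h_k, sigma_k * x)
    return f / np.sqrt(s)

x = np.random.default_rng(0).standard_normal(10_000)
for s in (1, 2, 4, 8):
    fx = sparse_jl(x, m=256, s=s, rng=np.random.default_rng(1))
    print(s, np.linalg.norm(fx) / np.linalg.norm(x))  # norm ratio for one random draw of f
```

Setting s = 1 recovers feature hashing; each additional hash function costs proportionally more projection time, which is the tradeoff the paper analyzes.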
Intuition for this paper

Analysis of sparse JL with respect to a performance measure:
Traditional mathematical framework

Consider a probability distribution F over linear maps f : R^n → R^m.

Geometry-preserving condition. For each x ∈ R^n:
    P_{f ∈ F}[ ‖f(x)‖₂ ∈ (1 ± ε)‖x‖₂ ] > 1 − δ,
for ε the target error and δ the target failure probability.
(Can apply to differences x = x_1 − x_2 since f is linear.)

Sparse JL can sometimes perform much better in practice on feature vectors than traditional theory suggests ...
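A quick way to see the condition in action is to estimate the failure probability empirically: draw many maps f from the sparse JL distribution and count how often ‖f(x)‖₂ falls outside (1 ± ε)‖x‖₂. The sketch below is a Monte Carlo illustration (not an analysis from the paper) and reuses the sparse_jl function defined above:

```python
import numpy as np

def empirical_failure_rate(x, m, s, eps, trials=500, seed=0):
    """Estimate P[ ||f(x)||_2 outside (1 +/- eps) * ||x||_2 ] over random draws of f.

    Assumes the sparse_jl sketch defined earlier is in scope.
    """
    rng = np.random.default_rng(seed)
    norm_x = np.linalg.norm(x)
    failures = 0
    for _ in range(trials):
        fx = sparse_jl(x, m, s, rng)
        ratio = np.linalg.norm(fx) / norm_x
        if not (1 - eps <= ratio <= 1 + eps):
            failures += 1
    return failures / trials

x = np.random.default_rng(0).standard_normal(5_000)
print(empirical_failure_rate(x, m=512, s=4, eps=0.1))  # small failure rate for a well-spread x
```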
Performance on feature vectors (Weinberger et al. ’09)

Consider vectors w/ small ℓ∞-to-ℓ2 norm ratio: S_v = { x ∈ R^n | ‖x‖∞ ≤ v‖x‖₂ }.

Let F_{s,m} be the distribution given by sparse JL with parameters s and m.

Definition. v(m, ε, δ, s) is the supremum over v ∈ [0, 1] such that
    P_{f ∈ F_{s,m}}[ ‖f(x)‖₂ ∈ (1 ± ε)‖x‖₂ ] > 1 − δ
holds for each x ∈ S_v.

◮ v(m, ε, δ, s) = 0 ⟹ poor performance
◮ v(m, ε, δ, s) = 1 ⟹ full performance
◮ v(m, ε, δ, s) ∈ (0, 1) ⟹ good performance on x ∈ S_{v(m,ε,δ,s)}
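One way to probe v(m, ε, δ, s) empirically is to test vectors at a prescribed ℓ∞-to-ℓ2 ratio v; a natural family is vectors with roughly 1/v² equal-magnitude nonzero coordinates. The helper vector_with_ratio below is purely illustrative (not from the paper), and the loop reuses empirical_failure_rate and sparse_jl from the sketches above:

```python
import numpy as np

def vector_with_ratio(n, v):
    """A unit vector in S_v: equal mass on ~1/v^2 coordinates, so ||x||_inf / ||x||_2 <= v."""
    k = max(1, int(np.ceil(1.0 / v**2)))
    x = np.zeros(n)
    x[:k] = 1.0
    return x / np.linalg.norm(x)

# Failure rate vs. the l_inf-to-l_2 ratio v (larger v = more concentrated vectors).
for v in (0.05, 0.1, 0.25, 0.5, 1.0):
    x = vector_with_ratio(10_000, v)
    print(v, empirical_failure_rate(x, m=512, s=2, eps=0.1, trials=200))
```

As v grows the test vectors become more concentrated and the empirical failure rate typically increases, which is exactly the regime that the definition of v(m, ε, δ, s) captures.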