Understanding Sparse JL for Feature Hashing (Meena Jagadeesan) - PowerPoint PPT Presentation

  1. Understanding Sparse JL for Feature Hashing. Meena Jagadeesan, Harvard University (Class of 2020). NeurIPS 2019 (Poster #59).

  2. Dimensionality reduction (ℓ_2-to-ℓ_2): a randomized map R^n → R^m (where m ≪ n) that preserves distances.

  3. Dimensionality reduction (ℓ_2-to-ℓ_2): a randomized map R^n → R^m (where m ≪ n) that preserves distances. A pre-processing step in many applications: clustering, nearest neighbors.

  4. Dimensionality reduction (ℓ_2-to-ℓ_2): a randomized map R^n → R^m (where m ≪ n) that preserves distances. A pre-processing step in many applications: clustering, nearest neighbors. Key question: What is the tradeoff between the dimension m, the performance in distance preservation, and the projection time?

  5. Dimensionality reduction (ℓ_2-to-ℓ_2): a randomized map R^n → R^m (where m ≪ n) that preserves distances. A pre-processing step in many applications: clustering, nearest neighbors. Key question: What is the tradeoff between the dimension m, the performance in distance preservation, and the projection time? This paper: A theoretical analysis of this tradeoff for a state-of-the-art dimensionality reduction scheme on feature vectors.

  6. Feature hashing (Weinberger et al. ’09). One standard dimensionality reduction scheme is feature hashing.

  7. Feature hashing (Weinberger et al. ’09). One standard dimensionality reduction scheme is feature hashing. Use a hash function h : {1, ..., n} → {1, ..., m} on coordinates. Use random signs to handle collisions: f(x)_i = Σ_{j ∈ h^{-1}(i)} σ_j x_j.
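
A minimal Python sketch of this map (illustrative only: the function name, the fully random hash, and the parameter choices below are stand-ins, not the exact construction analyzed on these slides):

```python
import numpy as np

def feature_hashing(x, m, seed=0):
    """Feature hashing: coordinate j is hashed to a bucket h(j) in {0, ..., m-1},
    and f(x)_i = sum over j with h(j) = i of sigma_j * x_j."""
    rng = np.random.default_rng(seed)
    n = x.size
    h = rng.integers(0, m, size=n)            # hash function h on coordinates
    sigma = rng.choice([-1.0, 1.0], size=n)   # random signs to handle collisions
    fx = np.zeros(m)
    np.add.at(fx, h, sigma * x)               # accumulate signed values per bucket
    return fx

x = np.random.default_rng(1).standard_normal(10_000)
fx = feature_hashing(x, m=256)
print(np.linalg.norm(x), np.linalg.norm(fx))  # the two norms are typically close
```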

  10. Sparse Johnson-Lindenstrauss transform (KN ’12). Sparse JL is a state-of-the-art sparse dimensionality reduction scheme.

  11. Sparse Johnson-Lindenstrauss transform (KN ’12). Sparse JL is a state-of-the-art sparse dimensionality reduction scheme. Use many (anti-correlated) hash functions h_1, ..., h_s : {1, ..., n} → {1, ..., m}. ⇒ Each input coordinate is mapped to s output coordinates.

  12. Sparse Johnson-Lindenstrauss transform (KN ’12). Sparse JL is a state-of-the-art sparse dimensionality reduction scheme. Use many (anti-correlated) hash functions h_1, ..., h_s : {1, ..., n} → {1, ..., m}. ⇒ Each input coordinate is mapped to s output coordinates. Use random signs to deal with collisions. That is: f(x)_i = (1/√s) Σ_{k=1}^{s} Σ_{j ∈ h_k^{-1}(i)} σ_j^k x_j.

  13. Sparse Johnson-Lindenstrauss transform (KN ’12). Sparse JL is a state-of-the-art sparse dimensionality reduction scheme. Use many (anti-correlated) hash functions h_1, ..., h_s : {1, ..., n} → {1, ..., m}. ⇒ Each input coordinate is mapped to s output coordinates. Use random signs to deal with collisions. That is: f(x)_i = (1/√s) Σ_{k=1}^{s} Σ_{j ∈ h_k^{-1}(i)} σ_j^k x_j. (Alternate view: a random sparse matrix with s nonzero entries per column.)

  14. Sparse Johnson-Lindenstrauss transform (KN ’12). Sparse JL is a state-of-the-art sparse dimensionality reduction scheme. Use many (anti-correlated) hash functions h_1, ..., h_s : {1, ..., n} → {1, ..., m}. ⇒ Each input coordinate is mapped to s output coordinates. Use random signs to deal with collisions. That is: f(x)_i = (1/√s) Σ_{k=1}^{s} Σ_{j ∈ h_k^{-1}(i)} σ_j^k x_j. (Alternate view: a random sparse matrix with s nonzero entries per column.) The tradeoff: higher s preserves distances better, but takes longer.

  15. Sparse Johnson-Lindenstrauss transform (KN ’12). Sparse JL is a state-of-the-art sparse dimensionality reduction scheme. Use many (anti-correlated) hash functions h_1, ..., h_s : {1, ..., n} → {1, ..., m}. ⇒ Each input coordinate is mapped to s output coordinates. Use random signs to deal with collisions. That is: f(x)_i = (1/√s) Σ_{k=1}^{s} Σ_{j ∈ h_k^{-1}(i)} σ_j^k x_j. (Alternate view: a random sparse matrix with s nonzero entries per column.) The tradeoff: higher s preserves distances better, but takes longer. This work: Analysis of the tradeoff for sparse JL between the number of hash functions s, the dimension m, and the performance in ℓ_2-distance preservation.
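
A rough sketch of the matrix view in Python (a simplification, not the slides' construction: each column independently receives s distinct rows, which ignores the anti-correlation between the hash functions h_1, ..., h_s; the function name and parameter values are illustrative):

```python
import numpy as np

def sparse_jl_matrix(n, m, s, seed=0):
    """Random m x n matrix with exactly s nonzero entries per column,
    each equal to +-1/sqrt(s): the matrix view of sparse JL."""
    rng = np.random.default_rng(seed)
    A = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=s, replace=False)  # the s output coordinates of column j
        signs = rng.choice([-1.0, 1.0], size=s)      # the random signs sigma_j^k
        A[rows, j] = signs / np.sqrt(s)
    return A

# Higher s preserves distances better but costs s times more work per nonzero of x.
x = np.random.default_rng(2).standard_normal(2_000)
for s in (1, 4, 16):
    A = sparse_jl_matrix(n=x.size, m=256, s=s, seed=3)
    print(s, np.linalg.norm(A @ x) / np.linalg.norm(x))  # ratios should be near 1
```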

  16. Intuition for this paper. Analysis of sparse JL with respect to a performance measure (the measure is defined on the later slides).

  21. Traditional mathematical framework. Consider a probability distribution F over linear maps f : R^n → R^m.

  22. Traditional mathematical framework. Consider a probability distribution F over linear maps f : R^n → R^m. Geometry-preserving condition: for each x ∈ R^n, P_{f ∈ F}[ ‖f(x)‖_2 ∈ (1 ± ε) ‖x‖_2 ] > 1 − δ, for ε the target error and δ the target failure probability.

  23. Traditional mathematical framework. Consider a probability distribution F over linear maps f : R^n → R^m. Geometry-preserving condition: for each x ∈ R^n, P_{f ∈ F}[ ‖f(x)‖_2 ∈ (1 ± ε) ‖x‖_2 ] > 1 − δ, for ε the target error and δ the target failure probability. (Can apply to differences x = x_1 − x_2 since f is linear.)

  24. Traditional mathematical framework. Consider a probability distribution F over linear maps f : R^n → R^m. Geometry-preserving condition: for each x ∈ R^n, P_{f ∈ F}[ ‖f(x)‖_2 ∈ (1 ± ε) ‖x‖_2 ] > 1 − δ, for ε the target error and δ the target failure probability. (Can apply to differences x = x_1 − x_2 since f is linear.) Sparse JL can sometimes perform much better in practice on feature vectors than traditional theory suggests ...
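
The condition above can be made concrete with a Monte Carlo estimate of the failure probability δ for a fixed x: draw many independent maps and count how often ‖f(x)‖_2 leaves (1 ± ε)‖x‖_2. A sketch under stated assumptions (it uses feature hashing, i.e. sparse JL with s = 1, with a fully random hash; the test vectors and parameter values are illustrative):

```python
import numpy as np

def empirical_failure_rate(x, m, eps, trials=2000, seed=0):
    """Estimate P_f[ ||f(x)||_2 outside (1 +- eps) * ||x||_2 ] for
    feature hashing (sparse JL with s = 1) applied to a fixed vector x."""
    rng = np.random.default_rng(seed)
    n, failures = x.size, 0
    for _ in range(trials):
        h = rng.integers(0, m, size=n)            # fresh hash function per trial
        sigma = rng.choice([-1.0, 1.0], size=n)   # fresh random signs per trial
        fx = np.zeros(m)
        np.add.at(fx, h, sigma * x)
        ratio = np.linalg.norm(fx) / np.linalg.norm(x)
        failures += not (1 - eps <= ratio <= 1 + eps)
    return failures / trials

# A "spread out" vector (small l_inf-to-l_2 ratio) vs. a "spiky" one.
spread = np.ones(1000)
spiky = np.zeros(1000)
spiky[:2] = 1.0                                   # all mass on two coordinates
for name, x in [("spread", spread), ("spiky", spiky)]:
    print(name, empirical_failure_rate(x, m=64, eps=0.3))
# The spiky vector typically fails noticeably more often, matching the
# observation that spread-out feature vectors behave better than the worst case.
```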

  25. Performance on feature vectors (Weinberger et al. ’09). Consider vectors with small ℓ_∞-to-ℓ_2 norm ratio: S_v = { x ∈ R^n : ‖x‖_∞ ≤ v ‖x‖_2 }.

  26. Performance on feature vectors (Weinberger et al. ’09). Consider vectors with small ℓ_∞-to-ℓ_2 norm ratio: S_v = { x ∈ R^n : ‖x‖_∞ ≤ v ‖x‖_2 }. Let F_{s,m} be the distribution given by sparse JL with parameters s and m.

  27. Performance on feature vectors (Weinberger et al. ’09). Consider vectors with small ℓ_∞-to-ℓ_2 norm ratio: S_v = { x ∈ R^n : ‖x‖_∞ ≤ v ‖x‖_2 }. Let F_{s,m} be the distribution given by sparse JL with parameters s and m. Definition: v(m, ε, δ, s) is the supremum over v ∈ [0, 1] such that P_{f ∈ F_{s,m}}[ ‖f(x)‖_2 ∈ (1 ± ε) ‖x‖_2 ] > 1 − δ holds for each x ∈ S_v.

  28. Performance on feature vectors (Weinberger et al. ’09). Consider vectors with small ℓ_∞-to-ℓ_2 norm ratio: S_v = { x ∈ R^n : ‖x‖_∞ ≤ v ‖x‖_2 }. Let F_{s,m} be the distribution given by sparse JL with parameters s and m. Definition: v(m, ε, δ, s) is the supremum over v ∈ [0, 1] such that P_{f ∈ F_{s,m}}[ ‖f(x)‖_2 ∈ (1 ± ε) ‖x‖_2 ] > 1 − δ holds for each x ∈ S_v. ◮ v(m, ε, δ, s) = 0 ⇒ poor performance. ◮ v(m, ε, δ, s) = 1 ⇒ full performance. ◮ v(m, ε, δ, s) ∈ (0, 1) ⇒ good performance on x ∈ S_{v(m, ε, δ, s)}.
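
For intuition about which vectors land in S_v (an illustrative example, not from the slides): a binary bag-of-words vector with k nonzero entries has ‖x‖_∞ = 1 and ‖x‖_2 = √k, so its ℓ_∞-to-ℓ_2 ratio is 1/√k. It therefore lies in S_v for every v ≥ 1/√k and is covered by the guarantee whenever 1/√k ≤ v(m, ε, δ, s); the more active features, the easier this is to satisfy. A short check in Python (the vector below is hypothetical):

```python
import numpy as np

x = np.zeros(10_000)
x[:50] = 1.0                                           # bag-of-words vector with k = 50 active features
print(np.linalg.norm(x, np.inf) / np.linalg.norm(x))   # l_inf / l_2 = 1/sqrt(50), roughly 0.141
```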
