

  1. Categorical Feature Compression via Submodular Optimization Mohammad Hossein Bateni, Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab Mirrokni, and Afshin Rostamizadeh Pacific Ballroom #142

  2. Why Vocabulary Compression?

  3. Why Vocabulary Compression? The embedding layer is huge: the Video ID feature alone takes ~7 billion values, and the embedding layer makes up 99.9% of the neural net.

  4. How to Compress Vocabulary?

  5. How to Compress Vocabulary? Group similar feature values into one: e.g., U.S. and Canada become U.S./Canada; China, Japan, and Korea become Chn/Jpn/Kor. A good compression preserves most of the labels' information, so the grouping is supervised.

  6. Problem Formulation

  7. Problem Formulation

     Max I(f(X); C)  s.t. f(X) can take at most m values

     User ID   Feature   Compressed feature
     #1843     China     China/Japan/Korea
     #429      Japan     China/Japan/Korea
     ...
     #9077     Brazil    Brazil/Argentina

     Here X is the feature, a random variable with X ∈ {Afghanistan, Albania, …, Zimbabwe}; C is the label, a random variable such as favorite fruit with C ∈ {pear, apple, …, mango}; and f(X) is the compressed feature with f(X) ∈ {China/Japan/Korea, Brazil/Argentina, U.S./Canada}.
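The objective I(f(X); C) in the formulation above can be estimated empirically from paired samples of the feature and the label. Below is a minimal sketch of that estimate; the `mutual_information` helper and the toy country/fruit data are illustrative, not from the paper.

```python
from collections import Counter
from math import log2

def mutual_information(xs, cs, f):
    """Estimate I(f(X); C) in bits from paired samples (xs, cs),
    under a candidate compression map f."""
    n = len(xs)
    joint = Counter((f(x), c) for x, c in zip(xs, cs))  # joint counts of (f(X), C)
    p_fx = Counter(f(x) for x in xs)                    # marginal counts of f(X)
    p_c = Counter(cs)                                   # marginal counts of C
    mi = 0.0
    for (fx, c), cnt in joint.items():
        p = cnt / n
        mi += p * log2(p / ((p_fx[fx] / n) * (p_c[c] / n)))
    return mi

# Hypothetical data mirroring the slide's table: country feature, fruit label.
xs = ["China", "Japan", "Brazil", "China", "Brazil"]
cs = ["pear", "pear", "mango", "pear", "mango"]
bucket = {"China": "Chn/Jpn/Kor", "Japan": "Chn/Jpn/Kor",
          "Brazil": "Brazil/Argentina"}
print(mutual_information(xs, cs, bucket.get))
```

On this toy data the compression loses no label information, so the estimate equals the label entropy H(C).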

  8. Our Results

  9. Our Results
     ● There is a quasi-linear (O(n log n)) algorithm for Max I(f(X); C) s.t. f(X) can take at most m values that achieves 63% of OPT if the label is binary. Key idea: design a new submodular function after reparametrization.
     ● There is a log(n)-round distributed algorithm that achieves 63% of OPT with O(n/k) space per machine, where k is the number of machines.
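The 63% figure in the results above is the classic (1 − 1/e) guarantee of greedy maximization of a monotone submodular function under a cardinality constraint. A generic sketch of that greedy loop (not the paper's specialized quasi-linear algorithm), shown on a toy set-cover objective:

```python
def greedy_max(ground, g, m):
    """Greedily maximize a monotone submodular set function g over
    subsets of `ground` with |S| <= m; for such functions greedy
    achieves at least (1 - 1/e) ~ 63% of the optimum."""
    S = set()
    for _ in range(m):
        # Pick the element with the largest marginal gain g(S + e) - g(S).
        best = max((e for e in ground if e not in S),
                   key=lambda e: g(S | {e}) - g(S), default=None)
        if best is None:
            break
        S.add(best)
    return S

# Toy monotone submodular objective: coverage of a universe of items.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
cover = lambda S: len(set().union(*(sets[e] for e in S)) if S else set())
print(sorted(greedy_max(sets, cover, 2)))
```

The paper's contribution is showing that, after reparametrization, the compression objective fits this submodular framework while still admitting an O(n log n) implementation.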

  10. Reparametrization for Submodularity
      ● Sort feature values x according to P(X=x|C=0).
      ● Compression becomes a problem of placing separators in this sorted order.
      ● I(f(X); C) is a function of the set of separators.
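The separator view above can be made concrete for a binary label: once the feature values are sorted, a set of separator indices partitions them into buckets, and I(f(X); C) = H(C) − H(C | f(X)) depends only on where the separators sit. A minimal sketch, assuming binary labels, per-value label counts, and an already-sorted list; `mi_from_separators` and the example counts are illustrative names, not from the paper.

```python
from math import log2

def plogp(p):
    # One term of the binary entropy; 0 log 0 is taken as 0.
    return -p * log2(p) if p > 0 else 0.0

def mi_from_separators(counts, seps):
    """counts: list of (n0, n1) label counts per feature value, already
    sorted by the slide's criterion; seps: sorted separator indices that
    split the list into buckets. Returns I(f(X); C) = H(C) - H(C|f(X))."""
    n = sum(a + b for a, b in counts)
    n1 = sum(b for _, b in counts)
    h_c = plogp(n1 / n) + plogp(1 - n1 / n)  # label entropy H(C)
    h_c_given = 0.0
    bounds = [0] + list(seps) + [len(counts)]
    for lo, hi in zip(bounds, bounds[1:]):
        b0 = sum(a for a, _ in counts[lo:hi])
        b1 = sum(b for _, b in counts[lo:hi])
        tot = b0 + b1
        if tot == 0:
            continue
        # Weighted conditional entropy contribution of this bucket.
        h_c_given += (tot / n) * (plogp(b0 / tot) + plogp(b1 / tot))
    return h_c - h_c_given

# Four sorted feature values with (label-0, label-1) counts; one separator
# after index 2 splits them into two buckets.
print(mi_from_separators([(9, 1), (8, 2), (2, 8), (1, 9)], [2]))
```

With this parametrization, choosing f amounts to choosing at most m − 1 separators, which is the set-selection problem the submodularity argument is built on.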

  11. Experimental Results

  12. Pacific Ballroom #142. See you this evening!
