On the Downstream Performance of Compressed Word Embeddings

On the Downstream Performance of Compressed Word Embeddings. NeurIPS Spotlight 12/12/19. Avner May, Jian Zhang, Tri Dao, Chris Ré, Stanford University.

Word Embeddings: They are important for strong NLP performance, but they take a lot of memory.

Word Embedding Compression

What determines whether a compressed embedding matrix will perform well on downstream tasks? [Figure labels: Train model, ??, Train model.]

Motivating Observation: Existing ways of measuring compression quality often fail to explain relative downstream performance. An embedding with a better compression quality measure can have worse downstream performance.

Our Contributions: Outline
1. Define a new measure of compression quality.
2. Prove generalization bounds using this measure.
3. Show strong empirical correlation with downstream performance.
4. Use the measure to select compressed embeddings: up to 2x lower selection error rates than the next best measure.

Defining the Measure: Intuition from Linear Regression. Observation: Predictions are determined by the data matrix's left singular vectors. Take the singular value decomposition of the embedding matrix. Given a regression label vector y, the linear regressor's predictions are the projection of y onto the span of the left singular vectors.
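The observation above can be checked numerically. The following is a minimal sketch, not taken from the slides: it assumes ordinary least squares with no regularization, predictions evaluated on the training points, and a random stand-in for the embedding matrix. The names `X`, `y`, `U`, and `X_changed` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5                       # hypothetical sizes: n examples, d dimensions
X = rng.standard_normal((n, d))    # random stand-in for the data/embedding matrix
y = rng.standard_normal(n)         # regression labels

# Ordinary least-squares fit and its predictions on the training points.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_lstsq = X @ w

# Thin SVD: the columns of U are the left singular vectors of X.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Projection of y onto the span of the left singular vectors.
pred_proj = U @ (U.T @ y)

# Keep U but replace the singular values and right singular vectors.
# The span of the left singular vectors is unchanged, so the
# least-squares predictions should be unchanged as well.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
X_changed = U @ np.diag(rng.uniform(1.0, 2.0, size=d)) @ Q
w_changed, *_ = np.linalg.lstsq(X_changed, y, rcond=None)
pred_changed = X_changed @ w_changed

print(np.allclose(pred_lstsq, pred_proj))
print(np.allclose(pred_changed, pred_proj))
```

Both comparisons should agree up to floating-point tolerance, which is why a measure built only from the left singular vectors is a natural candidate.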

Defining the Measure: Eigenspace Overlap Score (EOS). Intuition: Measures the similarity between the spans of the left singular vectors of the compressed and uncompressed embedding matrices. [Equation: the eigenspace overlap score (EOS), computed from the SVD of the compressed embeddings and the SVD of the uncompressed embeddings.]
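The formula itself did not survive in this transcript. As a hedged sketch, the code below assumes the score is the squared Frobenius norm of the product of the two left-singular-vector matrices, normalized by the larger of the two subspace dimensions, so that it lies between 0 and 1. That normalization, the rank tolerance `tol`, and the function names are assumptions made for illustration, not details confirmed by the slides.

```python
import numpy as np

def left_singular_vectors(X, tol=1e-10):
    """Left singular vectors of X with non-negligible singular values."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, s > tol * s.max()]

def eigenspace_overlap_score(X, X_tilde):
    """Assumed form: ||U^T U_tilde||_F^2 / max(dim span(U), dim span(U_tilde))."""
    U = left_singular_vectors(X)
    U_tilde = left_singular_vectors(X_tilde)
    overlap = np.linalg.norm(U.T @ U_tilde, ord="fro") ** 2
    return overlap / max(U.shape[1], U_tilde.shape[1])

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))  # random stand-in for uncompressed embeddings

# An invertible linear map of the columns keeps the column span, so the score is 1.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
same_span = eigenspace_overlap_score(X, X @ Q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0]))

# Keeping 2 of the 5 columns gives a 2-dimensional subspace of the original span.
subspace = eigenspace_overlap_score(X, X[:, :2])

# An unrelated random matrix shares little of the span.
unrelated = eigenspace_overlap_score(X, rng.standard_normal((200, 5)))

print(same_span, subspace, unrelated)
```

Under this assumed definition, the score depends only on the two column spans, not on singular values or right singular vectors, which matches the intuition stated on the slide.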

Theoretical Results: Linear Regression. Theorem (informal): The expected difference in test mean-squared error attained by compressed vs. uncompressed embeddings is determined by the EOS. Higher EOS implies better downstream performance.

Empirical Correlation: Beyond Linear Regression. EOS attains strong correlation with downstream model accuracy. [Figure: downstream accuracy plotted against EOS and against negative PIP loss [1]; on each horizontal axis, larger values mean higher quality.]
[1] Yin and Shen, On the Dimensionality of Word Embeddings. NeurIPS 2018.

EOS as a Selection Criterion. EOS attains up to 2x lower selection error rates than the second-best measure. [Figure: selection error rate (%) across NLP tasks, with comparison measures cited as [1], [2], [3].]
[1] Avron et al., ICML 2017. [2] Yin and Shen, NeurIPS 2018. [3] Zhang et al., AISTATS 2019.
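The slide does not spell out how the selection error rate is computed. One plausible reading, sketched below as an assumption, is pairwise: for every pair of compressed embeddings, pick the one with the higher quality score, and count an error whenever the picked embedding has lower downstream accuracy. The function name and the toy numbers are made up for illustration and are not results from the talk.

```python
from itertools import combinations

def selection_error_rate(scores, accuracies):
    """Fraction of pairs where picking the higher-scoring embedding
    yields the lower downstream accuracy (assumed pairwise definition)."""
    errors = 0
    total = 0
    for i, j in combinations(range(len(scores)), 2):
        if accuracies[i] == accuracies[j]:
            continue  # a tie in accuracy cannot produce a wrong pick
        total += 1
        chosen, other = (i, j) if scores[i] >= scores[j] else (j, i)
        if accuracies[chosen] < accuracies[other]:
            errors += 1
    return errors / total if total else 0.0

# Made-up toy values: one quality score and one accuracy per embedding.
toy_scores = [0.9, 0.7, 0.5, 0.3]
toy_accuracies = [0.80, 0.82, 0.75, 0.70]

rate = selection_error_rate(toy_scores, toy_accuracies)
print(f"selection error rate: {rate:.3f}")
```

In the toy data only the first pair is ranked wrongly by the score, so one of the six pairs counts as an error. A measure that orders embeddings exactly as their accuracies do would score 0 under this definition.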

Our Contributions: Summary
1. Defined a new measure of compression quality.
2. Proved generalization bounds using this measure.
3. Showed strong empirical correlation with downstream performance.
