Automatic Selection of Context Configurations for Improved Class-Specific Word Representations
Ivan Vulić, Roy Schwartz, Ari Rappoport, Roi Reichart and Anna Korhonen
CoNLL 2017; Vancouver; August 3, 2017

Background
Distributional Semantics: What is a Context?
Example sentence: The nice people rode their horses bravely and rapidly
[Figure: dependency arcs drawn over the example sentence; arc labels shown across the slide builds: det, amod, nsubj, obj, adv, cc, conj]
◮ Bag-of-words: simplest approach
◮ Noisy
◮ Dependency links: more accurate contexts
◮ Are all dependency links useful for representing words?
◮ Different dependency links represent different word classes [Lin, 1998; Levy and Goldberg, 2014]
◮ Coordinations / symmetric patterns: more accurate and more efficient [Schwartz et al., 2015; Schwartz et al., 2016]
◮ But... valuable information gets lost

Main Contributions
◮ Detect which fine-grained context types are useful for different word classes
◮ Traverse the large space of context configurations efficiently to find the best context configuration
◮ Transfer the configurations learned for one task and one language to other tasks and languages without re-training

Context Types
(Universal) Labeled Dependency Edges
◮ (discovers, scientist_nsubj)
◮ (discovers, stars_dobj)
◮ (discovers, telescope_nmod)
◮ (stars, discovers_dobj-1)
◮ ...
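A minimal sketch of how (word, context) pairs like the ones above could be generated from dependency edges. The edge format, the helper name `extract_contexts`, and the `allowed_labels` filter are assumptions for illustration, not the authors' code; the `-1` suffix marks an inverse edge, where the head is seen as a context of its dependent.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical edge format: (head word, dependency label, dependent word).
Edge = Tuple[str, str, str]


def extract_contexts(
    edges: Iterable[Edge],
    allowed_labels: Optional[Set[str]] = None,
) -> List[Tuple[str, str]]:
    """Turn labeled dependency edges into (word, context) training pairs.

    Each kept edge yields two pairs: the head with its dependent as context,
    and the dependent with its head as an inverse ("-1") context.
    If allowed_labels is given, edges with other labels are skipped.
    """
    pairs: List[Tuple[str, str]] = []
    for head, label, dependent in edges:
        if allowed_labels is not None and label not in allowed_labels:
            continue
        pairs.append((head, f"{dependent}_{label}"))
        pairs.append((dependent, f"{head}_{label}-1"))
    return pairs


# Edges matching the slide's example pairs.
edges = [
    ("discovers", "nsubj", "scientist"),
    ("discovers", "dobj", "stars"),
    ("discovers", "nmod", "telescope"),
]

all_pairs = extract_contexts(edges)
dobj_pairs = extract_contexts(edges, allowed_labels={"dobj"})
```

Restricting `allowed_labels` to a subset of labels is one way to realize a context configuration: only the selected edge types contribute training pairs.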

Cross-Lingual Context Transfer?

Results: Individual Labels
[Figure: bar chart of Spearman's ρ (y-axis, 0 to 0.6) for individual context labels (prep, comp, conj, obj, amod, adv, nummod), with separate bars for adjectives, nouns and verbs]

Too many Context Configurations
[Table: label pools for adjectives, verbs and nouns, drawn from amod, prep, acl, comp, subj, obj, adv, appos, nmod, conjlr, conjll]
◮ Traversing a potentially huge space of context configurations may be intractable
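To make the size of the search space concrete: a pool of n labels has 2^n - 1 non-empty subsets, and each one would need its own trained model. The sketch below is only an illustration; the `labels` list is the set of distinct labels appearing on this slide, not a per-class pool, and the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence

# Distinct labels appearing on the slide (not a per-class pool).
labels = ["amod", "prep", "acl", "comp", "subj", "obj",
          "adv", "appos", "nmod", "conjlr", "conjll"]


def count_configurations(n_labels: int) -> int:
    """Number of non-empty label subsets for a pool of n_labels labels."""
    return 2 ** n_labels - 1


def enumerate_configurations(pool: Sequence[str]) -> List[FrozenSet[str]]:
    """Explicitly list every non-empty subset of a (small) label pool."""
    return [
        frozenset(subset)
        for size in range(1, len(pool) + 1)
        for subset in combinations(pool, size)
    ]


total = count_configurations(len(labels))       # 11 labels -> 2047 subsets
small = enumerate_configurations(labels[:4])    # 4 labels -> 15 subsets
```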

Searching for Context Configurations
An Adapted Beam-Search Algorithm
[Figure: lattice of context configurations, revealed level by level starting from the full label set]
4 labels: (l1, l2, l3, l4)
3 labels: (l2, l3, l4), (l1, l3, l4), (l1, l2, l4), (l1, l2, l3)
2 labels: (l2, l3), (l2, l4), (l3, l4), (l1, l2), (l1, l3), (l1, l4)
1 label: l1, l2, l3, l4
f(x): dev set evaluation
Comparisons shown on the slide: f(l1, l2, l3, l4) < f(l2, l3, l4) and f(l1, l2, l3, l4) > f(l2, l3, l4)
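The lattice above can be searched with a beam instead of visiting every node. The following is a sketch of plain beam search that removes one label per level; it is not claimed to reproduce the authors' adapted variant. The `beam_width` parameter, the function names, and the toy integer `toy_score` (standing in for training a model and evaluating it on the dev set) are all assumptions.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Config = FrozenSet[str]


def beam_search(
    pool: Iterable[str],
    score: Callable[[Config], float],
    beam_width: int = 2,
) -> Tuple[Config, float]:
    """Search the subset lattice top-down, keeping beam_width configs per level."""
    start: Config = frozenset(pool)
    cache: Dict[Config, float] = {start: score(start)}  # avoid re-evaluating
    best = start
    beam: List[Config] = [start]
    # All configurations in the beam have the same size at a given level.
    while len(beam[0]) > 1:
        # Children: every configuration reachable by dropping one label.
        candidates = {config - {label} for config in beam for label in config}
        for cand in candidates:
            if cand not in cache:
                cache[cand] = score(cand)
        # Highest score first; ties broken by label names for determinism.
        ranked = sorted(candidates, key=lambda c: (-cache[c], sorted(c)))
        beam = ranked[:beam_width]
        if cache[beam[0]] > cache[best]:
            best = beam[0]
    return best, cache[best]


# Toy stand-in for dev set evaluation: hypothetical per-label gains,
# where negative values play the role of noisy context types.
toy_gain = {"l1": -10, "l2": 30, "l3": 20, "l4": -5}


def toy_score(config: Config) -> float:
    return sum(toy_gain[label] for label in config)


best_config, best_score = beam_search(["l1", "l2", "l3", "l4"], toy_score)
```

With these toy gains the search settles on the two useful labels, l2 and l3. In a real setting each call to `score` would involve training an embedding model, which is why caching and a small beam matter.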

Experimental Setup
◮ Model: Skip-gram with negative sampling [Mikolov et al., 2013]
◮ Training data: Polyglot Wikipedia
◮ Evaluation: SimLex-999 word similarity dataset [Hill et al., 2015]
  ◮ 666 noun pairs, 222 verb pairs, 111 adjective pairs
  ◮ 2-fold cross validation
  ◮ Evaluation measure: Spearman's ρ
◮ Baselines: A variety of standard context types
  ◮ Bag-of-words (w/ and w/o positions); all dependency links, coordination dependency links, symmetric patterns
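A sketch of the evaluation measure named above: score each word pair by the cosine similarity of its vectors, then compute Spearman's ρ against the gold ratings. It is written in plain Python to stay self-contained. The vectors and ratings are made-up toy values, not SimLex-999 data, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def evaluate(vectors: Dict[str, Sequence[float]],
             gold_pairs: Sequence[Tuple[str, str, float]]) -> float:
    """Correlate model similarities with gold similarity ratings."""
    model = [cosine(vectors[a], vectors[b]) for a, b, _ in gold_pairs]
    gold = [rating for _, _, rating in gold_pairs]
    return spearman(model, gold)


# Toy vectors and made-up ratings, for illustration only.
toy_vectors = {"old": (1.0, 0.0), "ancient": (0.9, 0.1),
               "fresh": (0.1, 0.9), "new": (0.0, 1.0)}
toy_pairs = [("old", "ancient", 9.0), ("old", "fresh", 2.0), ("old", "new", 1.0)]
rho = evaluate(toy_vectors, toy_pairs)
```

In the toy data the model ranks the three pairs in the same order as the ratings, so ρ comes out at its maximum.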

Results: Context Configurations

Selected Contexts are Efficient
[Figure: bar chart of training time in minutes (y-axis, 0 to 200); bars labeled BoW BoW + Coord. SP Dep. All BEST A BEST N BEST V]

Transfer Results
◮ TOEFL
  ◮ 5% improvement over strongest baseline on verbs and nouns
◮ Other languages
  ◮ 0.02 to 0.08 ρ improvement on Italian and German across all three word classes
  ◮ DE and IT SimLex-999 [Leviant and Reichart, 2015]

Take-Home Messages
◮ Different word classes require different (finer-grained) context configurations
◮ An automatic framework for computationally tractable selection of optimal context configurations
◮ Design based on Universal Dependencies: context configurations transferable to other tasks and languages without retraining
◮ Future work → finer-grained contexts, other word classes, more sophisticated search algorithms, other representation models, context weighting, ...
Thank you!

References
Hill, F., Reichart, R., and Korhonen, A. (2015). SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics.
Leviant, I. and Reichart, R. (2015). Judgment language matters: Multilingual vector space models for judgment language aware lexical semantics. arXiv:1508.00106.
Levy, O. and Goldberg, Y. (2014). Dependency-based word embeddings. In Proc. of ACL.
Lin, D. (1998). Automatic retrieval and clustering of similar words. In Proc. of ACL.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv:1301.3781.
Schwartz, R., Reichart, R., and Rappoport, A. (2015). Symmetric pattern based word embeddings for improved word similarity prediction. In Proc. of CoNLL.
Schwartz, R., Reichart, R., and Rappoport, A. (2016). Symmetric patterns and coordinations: Fast and enhanced representations of verbs and adjectives. In Proc. of NAACL.
