
Derivational Smoothing for Syntactic Distributional Semantics
Sebastian Padó∗, Jan Šnajder†, and Britta Zeller∗
∗ Institute for Computational Linguistics, Heidelberg University
† Faculty of Electrical Engineering and Computing, Zagreb University
The 51st Annual Meeting of the Association for Computational Linguistics
August 6, 2013

Distributional Semantics
Representation of word meaning as vectors
Vector components: co-occurrences with context features
Firth (1957): "You shall know a word by the company it keeps"
Example: "Peter convinced himself to write reports" ⇒ vector for report: Peter 1, convince 1, write 1
Vector similarity approximates semantic similarity
Simple, unsupervised induction of word meaning
Used in a variety of tasks (Turney and Pantel, 2010)
Padó, Šnajder, Zeller (ACL 2013), Derivational Smoothing, Aug 6, 2013
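The vector for report on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' pipeline: the sentence-level context window, the lemmatized content-word input, the second toy sentence, and the helper names `context_vector` and `cosine` are all assumptions made for illustration.

```python
import math
from collections import Counter


def context_vector(target, sentences):
    """Count the words that co-occur with `target` within a sentence."""
    counts = Counter()
    for tokens in sentences:
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    return counts


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Assumed preprocessing: lemmatized content words only.
# The second sentence is a hypothetical addition so that two vectors can be compared.
sentences = [
    ["Peter", "convince", "write", "report"],
    ["Peter", "write", "letter"],
]
report_vec = context_vector("report", sentences)
letter_vec = context_vector("letter", sentences)
print(dict(report_vec))
print(round(cosine(report_vec, letter_vec), 3))
```

The two nouns share the contexts Peter and write, so their cosine is well above zero even in this tiny corpus.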

Main Context Choices
[Figure: two vector spaces containing hunter, deer, and grass. Lexical vector space with dimensions shoot and eat; syntactic vector space with dimensions subj-shoot and obj-eat]
Lexical (word) context captures topical similarity
Syntactic (word-relation) context captures relational similarity
  Can model fine-grained information (Baroni and Lenci, 2010)
  More appropriate for free word order languages
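The difference between the two context choices can be sketched from dependency triples. The triples below, the `(head, relation, dependent)` format, and the function name `build_spaces` are assumptions for illustration; they are modelled on the hunter, deer, and grass example but are not taken from the authors' data.

```python
from collections import Counter, defaultdict


def build_spaces(triples):
    """Build lexical and syntactic context vectors from dependency triples.

    Each triple is (head, relation, dependent). The lexical space uses the
    head word alone as a feature; the syntactic space uses relation-head.
    """
    lexical = defaultdict(Counter)
    syntactic = defaultdict(Counter)
    for head, rel, dep in triples:
        lexical[dep][head] += 1
        syntactic[dep][f"{rel}-{head}"] += 1
    return lexical, syntactic


# Hypothetical parsed input: hunters shoot deer, deer eat grass.
triples = [
    ("shoot", "subj", "hunter"),
    ("shoot", "obj", "deer"),
    ("eat", "subj", "deer"),
    ("eat", "obj", "grass"),
]
lexical, syntactic = build_spaces(triples)

# Number of distinct dimensions in each space.
lexical_dims = {f for vec in lexical.values() for f in vec}
syntactic_dims = {f for vec in syntactic.values() for f in vec}
print(len(lexical_dims), len(syntactic_dims))
```

In the lexical space hunter and deer share the feature shoot, while in the syntactic space subj-shoot and obj-shoot are distinct dimensions. The same counts are spread over more features, which is one way to see why syntactic spaces end up sparser.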

A problem for syntactic vector spaces: Sparsity
Syntactic vector spaces are very sparse
  Even if constructed from very large corpora
  Reason: fewer co-occurrences
[Figure: dependency parse of "Peter convinced himself to write reports" with relations ncsubj, ncsubj, ncmod, dobj, xcomp ⇒ vector for report: write 1]
Many word pairs receive semantic similarities of zero
  Real dissimilarity or missing data?

Derivational Smoothing
The question: Where can we get semantic relatedness information to smooth distributional similarity?
The answer: Derivational morphology
Consider derivational families: argue, argumentation, arguably, argumentative, argument
Words that are derived from one another have similar meaning
Available from resources like CatVar (Habash and Dorr, 2003)

Derivational Smoothing
If vectors are sparse, do not compute semantic similarity directly
Instead, back off to less sparse members of derivational families
  sim(arguably, debatably) = 0
  back-off: sim(argue, debate) > 0
  smoothed-sim(arguably, debatably) = f(arguably, debatably)
(Similar to back-off to less sparse (n−1)-grams in language models)
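The back-off idea can be written down as a short function. This is a sketch under stated assumptions, not the authors' implementation: the names `smoothed_sim`, `toy_sim`, and `toy_family`, the similarity value 0.6, and the two-member families are hypothetical, and the maximum over family pairs stands in for the combination function f.

```python
def smoothed_sim(w1, w2, sim, family, trigger="if_zero"):
    """Back off to derivational families when the direct similarity is unusable.

    sim:     function (word, word) -> float, or None if undefined
    family:  function word -> list of family members, including the word itself
    trigger: "if_zero" smooths only when sim is 0 or undefined;
             "always" smooths every pair
    """
    direct = sim(w1, w2)
    if trigger == "if_zero" and direct:
        return direct  # defined and non-zero: no smoothing needed
    # Assumed combination function: best-scoring pair across the two families.
    scores = [sim(a, b) for a in family(w1) for b in family(w2)]
    scores = [s for s in scores if s is not None]
    return max(scores, default=0.0)


# Hypothetical toy data mirroring the example on the slide.
TOY_SIM = {frozenset(("argue", "debate")): 0.6}
TOY_FAMILIES = {
    "arguably": ["arguably", "argue"],
    "debatably": ["debatably", "debate"],
}


def toy_sim(a, b):
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def toy_family(word):
    return TOY_FAMILIES.get(word, [word])


print(toy_sim("arguably", "debatably"))
print(smoothed_sim("arguably", "debatably", toy_sim, toy_family))
```

The direct similarity of the two adverbs is zero, so the trigger fires and the score of the less sparse verb pair is used instead.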

Derivational smoothing: Two parameters
1 Smoothing trigger: When is a vector considered too sparse?
    Smooth always
    Smooth only if sim(l1, l2) = 0 (or undefined)
2 Smoothing scheme: How to bring in the derivational family?
    maxSim: Consider most similar pair between families
    avgSim: Consider average similarity of all pairs
    centSim: Consider similarity of family centroids
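The three smoothing schemes differ only in how pairwise similarities between two families are combined. The sketch below is one possible reading of the slide, assuming sparse vectors stored as dictionaries and an unweighted mean for both the average and the centroid; the function names and the toy vectors are hypothetical.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def max_sim(fam1, fam2):
    """maxSim: similarity of the most similar pair between the families."""
    return max(cosine(u, v) for u, v in product(fam1, fam2))


def avg_sim(fam1, fam2):
    """avgSim: mean similarity over all cross-family pairs."""
    pairs = list(product(fam1, fam2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def centroid(fam):
    """Component-wise mean of the vectors in a (non-empty) family."""
    keys = set().union(*fam)
    return {k: sum(vec.get(k, 0.0) for vec in fam) / len(fam) for k in keys}


def cent_sim(fam1, fam2):
    """centSim: similarity of the two family centroids."""
    return cosine(centroid(fam1), centroid(fam2))


# Hypothetical families: each is a list of member vectors.
fam1 = [{"a": 1.0}, {"b": 1.0}]
fam2 = [{"a": 1.0}]
print(max_sim(fam1, fam2), avg_sim(fam1, fam2), round(cent_sim(fam1, fam2), 3))
```

On this toy input the schemes already disagree: one perfectly matching pair dominates maxSim, the unrelated family member halves avgSim, and centSim lands in between.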

Experiments
Language choice: German
  Resource situation comparable to English, but not quite as good
  Derivation is an important process of word formation
Distributional models
  Base model: German Distributional Memory Dm.De (Padó and Utt, 2012)
    900M-token SdeWaC web corpus (Faaß et al., 2010)
  DErivBase derivational families (Zeller et al., 2013)
    Rule-based resource for German, focus on precision
    18,000 non-singleton families covering 60,000 lemmas
  Baseline: Bag-of-words models (same corpus)

Evaluation
Task 1: Synonym choice
  980 targets with four candidates each (Reader's Digest)
  "Which term is antiquated most similar to? (a) venerable, (b) old, (c) unusable, (d) outdated?"
  Prediction: candidate with max cosine similarity to target
  Evaluation: Accuracy (%) + Coverage (%)
Task 2: Word similarity prediction
  350 pairwise judgments on 5-point scale (Zesch et al., 2007)
  (monkey, macaque) ⇒ 4
  (office, tiger) ⇒ 1
  Prediction: Cosine similarity
  Evaluation: Correlation (Pearson's r) + Coverage (%)
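The synonym choice evaluation can be sketched as follows. The slide does not define coverage, so this sketch assumes that an item counts as covered when at least one candidate has non-zero similarity to the target, and that accuracy is computed over covered items only. The item format, the function name, and the toy scores are hypothetical.

```python
def evaluate_synonym_choice(items, sim):
    """Return (accuracy %, coverage %) for a list of synonym choice items.

    items: list of (target, candidates, gold) tuples
    sim:   function (target, candidate) -> float
    """
    covered = 0
    correct = 0
    for target, candidates, gold in items:
        scores = {c: sim(target, c) for c in candidates}
        if max(scores.values()) <= 0.0:
            continue  # assumption: all-zero items are not covered
        covered += 1
        # Prediction: candidate with maximal similarity to the target.
        if max(scores, key=scores.get) == gold:
            correct += 1
    coverage = 100.0 * covered / len(items)
    accuracy = 100.0 * correct / covered if covered else 0.0
    return accuracy, coverage


# Hypothetical similarity scores for three toy items.
TOY_SCORES = {
    ("t1", "a"): 0.1, ("t1", "b"): 0.7, ("t1", "d"): 0.3,
    ("t3", "a"): 0.5, ("t3", "b"): 0.2,
}


def toy_sim(target, candidate):
    return TOY_SCORES.get((target, candidate), 0.0)


items = [
    ("t1", ["a", "b", "c", "d"], "b"),  # covered, predicted correctly
    ("t2", ["a", "b", "c", "d"], "a"),  # all similarities zero: not covered
    ("t3", ["a", "b", "c", "d"], "b"),  # covered, predicted incorrectly
]
accuracy, coverage = evaluate_synonym_choice(items, toy_sim)
print(accuracy, round(coverage, 1))
```

Under these assumptions smoothing can raise coverage by turning all-zero items into covered ones, while accuracy can drop if the newly covered items are answered less reliably.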

Results: Synonym choice
Model                       Scheme     Acc. %   Cov. %
Dm.De, unsmoothed                      53.7     80.8
Dm.De, smooth always        avgSim     46.0     86.6
                            maxSim     50.3     86.6
                            centSim    49.1     86.6
Dm.De, smooth if sim = 0    avgSim     52.6     86.6
                            maxSim     51.2     86.6
                            centSim    51.3     86.6
BoW "baseline"                         56.9     98.5
Gain in coverage (+6%), but small loss in accuracy (-1%)
BoW "baseline" performs best
Conservative trigger (smooth if necessary) works best

Results: Semantic similarity
Model                       Scheme     r      Cov. %
Dm.De, unsmoothed                      .44    58.9
Dm.De, smooth always        avgSim     .30    88.0
                            maxSim     .43    88.0
                            centSim    .44    88.0
Dm.De, smooth if sim = 0    avgSim     .43    88.0
                            maxSim     .42    88.0
                            centSim    .47    88.0
BoW baseline                           .36    94.9
Again, conservative trigger works best
Big increase in coverage (+30%), small increase in correlation

Task Comparison
Result change through smoothing
Task                   Quality        Coverage
Synonym choice         −0.09% Acc.    +6%
Semantic similarity    +0.03 Corr.    +30%
Semantic similarity benefits more from derivational smoothing than synonym choice
Derivational families contain related words, not synonyms
  argue, argumentation, arguably, argumentative, argument

Summary
Sparsity is a problem for syntax-based distributional models
"Derivational smoothing": Back off from rare word to derivational family
Initial experiments
  Conservative trigger (smooth only when sim = 0) works best
  Jury still out on smoothing scheme (combination method)
Future work
  More experiments on smoothing schemes
  Use richer information about derivational families

References I
Baroni, M. and Lenci, A. (2010). Distributional Memory: A General Framework for Corpus-Based Semantics. Computational Linguistics, 36(4).
Faaß, G., Heid, U., and Schmid, H. (2010). Design and application of a gold standard for morphological analysis: SMOR as an example of morphological evaluation. In Proceedings of the Seventh International Conference on Language Resources and Evaluation, Valletta, Malta.
Firth, J. R. (1957). Papers in linguistics 1934-1951. Oxford University Press.
Habash, N. and Dorr, B. (2003). A categorial variation database for English. In Proceedings of the NAACL/HLT, pages 17–23.
Padó, S. and Utt, J. (2012). A distributional memory for German. In Proceedings of KONVENS, Vienna, Austria.

References II
Turney, P. D. and Pantel, P. (2010). From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research, 37(1), 141–188.
Zeller, B., Šnajder, J., and Padó, S. (2013). DErivBase: Inducing and evaluating a derivational morphology resource for German. In Proceedings of ACL, Sofia, Bulgaria.
Zesch, T., Gurevych, I., and Mühlhäuser, M. (2007). Comparing Wikipedia and German Wordnet by Evaluating Semantic Relatedness on Multiple Datasets. In Proceedings of NAACL/HLT, pages 205–208.
