Incorporating Relational Knowledge into Word Representations using Subspace Regularization
Jun Araki (Carnegie Mellon University), joint work with Abhishek Kumar (IBM Research)
ACL 2016
Distributed word representations
• Low-dimensional dense word vectors learned from unstructured text
  – Based on the distributional hypothesis (Harris, 1954)
  – Capture semantic and syntactic regularities of words, encoding word relations
    • e.g., vec(king) − vec(man) + vec(woman) ≈ vec(queen)
  – Publicly available, well-developed software: word2vec and GloVe
  – Successfully applied to various NLP tasks
Underlying motivation
• Two variants of the word2vec algorithm by Mikolov et al. (2013)
  – Skip-gram maximizes $\frac{1}{T}\sum_{t=1}^{T}\sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$
  – Continuous bag-of-words (CBOW) maximizes $\frac{1}{T}\sum_{t=1}^{T} \log p(w_t \mid w_{t-c}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+c})$
• They rely on co-occurrence statistics only
• Motivation: combine word representation learning with lexical knowledge
Prior work (1): Grouping similar words
• Lexical knowledge: {(w_i, r, w_j)}
  – Words w_i and w_j are connected by relation type r
• Treats w_i and w_j as generic similar words
  – (Yu and Dredze, 2014; Faruqui et al., 2015; Liu et al., 2015)
  – Regularization effect: pulls $v_{w_i}$ and $v_{w_j}$ together by penalizing $\|v_{w_i} - v_{w_j}\|^2$ (sketched below)
  – Based on an (over-)generalized notion of word similarity
  – Ignores relation types
• Limitations
  – Places an implicit restriction on relation types
    • e.g., synonyms and paraphrases
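A minimal sketch of this similarity-only penalty, assuming a dict of numpy embeddings; the names `emb`, `pairs`, and `lam` are illustrative and not from the papers above. The point is that the relation type plays no role.

```python
import numpy as np

def similarity_regularizer(emb, pairs, lam=0.1):
    """Sum of squared distances between related words' vectors.

    emb   : dict mapping word -> np.ndarray embedding
    pairs : iterable of (w_i, w_j) pairs from the lexicon; the relation type is ignored
    lam   : regularization strength (illustrative value)
    """
    return lam * sum(np.sum((emb[wi] - emb[wj]) ** 2) for wi, wj in pairs)
```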
Prior work (2): Constant translation model
• CTM models each relation type r by a relation vector r
  – (Bordes et al., 2013; Xu et al., 2014; Fried and Duh, 2014)
  – Regularization effect: assumes $v_{w_i} + r \approx v_{w_j}$, i.e., penalizes $\|v_{w_i} + r - v_{w_j}\|^2$
  – Assumes that w_i can be translated into w_j by a simple sum with a single relation vector
• Limitations
  – The assumption can be very restrictive when word representations are learned from co-occurrence instances
  – Not suitable for modeling:
    • symmetric relations (e.g., antonymy), where the translation must hold in both directions (see the toy check below)
    • transitive relations (e.g., hypernymy)
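A toy check of why a single constant translation vector struggles with a symmetric relation; the vectors here are random stand-ins, not learned embeddings.

```python
import numpy as np

# If (a, b) and (b, a) are both antonym pairs, the constant r minimizing
# ||v_a + r - v_b||^2 + ||v_b + r - v_a||^2 is the average of (v_b - v_a)
# and (v_a - v_b), i.e. the zero vector: the relation vector collapses.
rng = np.random.default_rng(0)
v_a, v_b = rng.normal(size=50), rng.normal(size=50)
diffs = np.stack([v_b - v_a, v_a - v_b])   # required translations for both directions
r_best = diffs.mean(axis=0)                # least-squares solution for a constant r
print(np.allclose(r_best, 0.0))            # True
```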
Subspace-regularized word embeddings
• We model each relation type by a low-rank subspace
  – This relaxes the constant translation assumption
  – Suitable for both symmetric and transitive relations
• Formalization
  – Relational knowledge: triplets (w_i, r_k, w_j) grouped by relation type r_k
  – Difference vector: $d_{ij} = v_{w_i} - v_{w_j}$
  – Construct a matrix $D_k$ by stacking the difference vectors of relation type $r_k$
• Assumption: $D_k$ is approximately of low rank p, i.e., its difference vectors lie close to a p-dimensional subspace (see the sketch below)
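A small sketch of how the low-rank assumption could be checked empirically, assuming a dict of numpy embeddings; the function names are illustrative, not from the paper.

```python
import numpy as np

def difference_matrix(emb, rel_pairs):
    """Stack difference vectors v_i - v_j as columns for one relation type."""
    return np.stack([emb[wi] - emb[wj] for wi, wj in rel_pairs], axis=1)

def subspace_energy(D, p=1):
    """Fraction of the matrix energy captured by the top-p singular directions.

    A value near 1.0 means the difference vectors are well explained by a
    rank-p subspace, which is the modeling assumption above.
    """
    s = np.linalg.svd(D, compute_uv=False)
    return float(np.sum(s[:p] ** 2) / np.sum(s ** 2))
```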
Rank-1 subspace regularization
• p = 1: $d_{ij} \approx s_{ij}\, r_k$, where $r_k \in \mathbb{R}^d$ is a unit-norm direction and $s_{ij} \in \mathbb{R}$ is a pair-specific scalar
  – All difference vectors for the same relation type are collinear
• Minimizes a joint objective: the word2vec loss plus $\lambda \sum_{k} \sum_{(i,j)} \|v_{w_i} - v_{w_j} - s_{ij}\, r_k\|^2$ (sketched below)
• Example: relation "capital-of" with Berlin/Germany, Beijing/China, Cairo/Egypt
  – Our method: the three difference vectors share one direction but may differ in length
  – CTM: forces the three difference vectors to be one identical constant vector
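A minimal sketch of the rank-1 penalty for one relation type, assuming numpy embeddings; the names `r_k`, `s`, and `lam` are illustrative.

```python
import numpy as np

def rank1_regularizer(emb, rel_pairs, r_k, s, lam=0.1):
    """Rank-1 subspace penalty: each difference vector should be a scalar multiple of r_k.

    rel_pairs : list of (w_i, w_j) pairs for one relation type
    r_k       : unit-norm direction vector for that relation type
    s         : dict mapping (w_i, w_j) -> scalar coefficient s_ij
    """
    penalty = 0.0
    for wi, wj in rel_pairs:
        residual = emb[wi] - emb[wj] - s[(wi, wj)] * r_k
        penalty += np.sum(residual ** 2)
    return lam * penalty
```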
Optimization for word vectors
• We use parallel asynchronous SGD with negative sampling
  – Each thread works on a predefined segment of the text corpus by:
    • sampling a target word and its local context window, and
    • updating the parameters stored in a shared memory
  – Puts our regularizer on the input embeddings
• Gradient updates by regularization: for a pair of relation type $r_k$, the penalty contributes $2\lambda(v_{w_i} - v_{w_j} - s_{ij}\, r_k)$ to the gradient of $v_{w_i}$ and its negative to the gradient of $v_{w_j}$ (sketched below)
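A sketch of a single regularizer-only SGD step on the input embeddings, assuming the rank-1 penalty above; `lam` and `lr` are illustrative hyperparameters, and the word2vec negative-sampling updates would be applied separately in each thread.

```python
import numpy as np

def regularizer_sgd_step(emb, wi, wj, r_k, s_ij, lam=0.1, lr=0.025):
    """One SGD step from the rank-1 penalty for a single pair (w_i, r_k, w_j)."""
    residual = emb[wi] - emb[wj] - s_ij * r_k   # gradient of the penalty up to a factor of 2*lam
    emb[wi] -= lr * 2 * lam * residual          # update the input embedding of w_i in place
    emb[wj] += lr * 2 * lam * residual          # w_j receives the opposite update
```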
Optimization for relation parameters
• Optimizes the relation directions $r_k$ and the scalars $s_{ij}$ by solving the batch optimization problem
  – Launches a thread that keeps solving the problem
  – Alternates between two least-squares sub-problems, one for $\{r_k\}$ and one for $\{s_{ij}\}$ (sketched below)
  – Uses projected gradient descent with an asynchronous batch update
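An illustrative closed-form version of the two alternating least-squares sub-problems for one relation type. The procedure on the slide uses projected gradient descent with asynchronous batch updates, so treat this as a sketch of the same alternation, not the actual implementation.

```python
import numpy as np

def fit_rank1_direction(D, n_iters=20):
    """Alternate between solving for the scalars s and the unit direction r so that D ~= r s^T.

    D : (dim, n_pairs) matrix whose columns are difference vectors d_ij.
    Returns a unit-norm direction r and the per-pair scalars s.
    """
    r = D[:, 0] / (np.linalg.norm(D[:, 0]) + 1e-12)   # simple initialization
    for _ in range(n_iters):
        s = D.T @ r                                   # best scalars for a fixed unit-norm r
        r = D @ s                                     # best direction for fixed s (up to scale)
        r /= np.linalg.norm(r) + 1e-12                # project back onto the unit sphere
    return r, D.T @ r
```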
Data sets
• Text corpus
  – English Wikipedia: ~4.8M articles and ~2B tokens
• Relational knowledge data
  – WordRep (Gao et al., 2014)
    • 44,584 triplets (w_i, r, w_j) of 25 relation types from WordNet and other resources
  – Google word analogy (Mikolov et al., 2013)
    • 19,544 quadruplets a : b :: c : d derived from 550 triplets (w_i, r, w_j)
• Relations used for our training
  – Split the WordRep triplets randomly into <train>:<test> = 4:1
  – Remove from <train> the triplets containing words in the Google analogy data
Results (1): Knowledge-base completion
• Task
  – Complete (x, r, y) by predicting y* for the missing word y given x and r
• Inference by RELSUB
  – y* = the word whose vector is closest to the rank-1 subspace segment {v_x + s r : |s| ≤ c} (sketched below)
• Inference by RELCONST
  – y* = the word whose vector is closest to v_x + r
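A sketch of the two inference rules over a dict of numpy embeddings; `c` is an illustrative bound on the scalar s, and the brute-force loop over the vocabulary stands in for whatever nearest-neighbor search was actually used.

```python
import numpy as np

def relsub_predict(emb, x, r, c=2.0):
    """Predict y* as the word closest to the segment {v_x + s*r : |s| <= c}."""
    vx = emb[x]
    r_unit = r / np.linalg.norm(r)
    best, best_dist = None, np.inf
    for y, vy in emb.items():
        if y == x:
            continue
        s = np.clip(np.dot(vy - vx, r_unit), -c, c)   # closest point on the segment
        dist = np.linalg.norm(vy - (vx + s * r_unit))
        if dist < best_dist:
            best, best_dist = y, dist
    return best

def relconst_predict(emb, x, r):
    """Predict y* as the word closest to v_x + r (constant-translation inference)."""
    target = emb[x] + r
    return min((y for y in emb if y != x), key=lambda y: np.linalg.norm(emb[y] - target))
```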
Results (2): Word analogy
• Task
  – Complete a : b :: c : d by predicting d* for the missing word d given a, b, and c
• Inference by RELSUB and RELCONST
  – d* = the word closest to $v_c + v_b - v_a$ (sketched below)
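A sketch of the analogy inference rule, using cosine similarity as the notion of "closest"; that choice is an assumption, with Euclidean distance as the obvious alternative.

```python
import numpy as np

def analogy_predict(emb, a, b, c):
    """Predict d* for a : b :: c : d as the word closest to v_c + v_b - v_a."""
    target = emb[c] + emb[b] - emb[a]
    target /= np.linalg.norm(target)
    candidates = (w for w in emb if w not in (a, b, c))   # exclude the query words
    return max(candidates, key=lambda w: np.dot(emb[w], target) / np.linalg.norm(emb[w]))
```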
Conclusion and future work
• Conclusion
  – We present a novel approach for modeling relational knowledge based on rank-1 subspace regularization
  – We show the effectiveness of the approach on standard tasks
• Future work
  – Investigate the interplay between word frequencies and regularization strength
  – Study higher-rank subspace regularization
    • Formalization for word similarity
  – Evaluate our methods with other metrics, including downstream tasks
Thank you very much. Any questions?