équipe MELODI

Distributional Semantics
The unsupervised modeling of meaning on a large scale

Tim Van de Cruys
IRIT, Toulouse
Tuesday 17 November 2015
Distributional similarity

• The induction of meaning from text is based on the distributional hypothesis [Harris 1954]
• Take a word and its contexts:
  • tasty sooluceps
  • sweet sooluceps
  • stale sooluceps
  • freshly baked sooluceps
  ⇒ food
• By looking at a word's context, one can infer its meaning
Matrix

• captures co-occurrence frequencies of two entities
• counts accumulate as more and more text is processed; after a large corpus:

                second-hand   fast   tasty    red
  truck             293        393       0    104
  car               370        487       0    392
  strawberry          0          2     437   1035
  raspberry           0          1     592    728
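A minimal sketch of how such a count matrix can be collected. The toy corpus and the context rule (the word immediately preceding a target noun) are illustrative assumptions, not the extraction setup of the talk:

```python
from collections import Counter

corpus = [
    "he bought a tasty strawberry",
    "a fast car passed a fast truck",
    "she picked a red raspberry",
]

targets = {"truck", "car", "strawberry", "raspberry"}
counts = Counter()

for sentence in corpus:
    tokens = sentence.split()
    for i, token in enumerate(tokens):
        if token in targets and i > 0:
            counts[(tokens[i - 1], token)] += 1  # (context word, target) pair

print(counts[("fast", "car")])  # 1: 'fast' occurred once next to 'car'
```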
Vector space model

[Figure: car, raspberry, and strawberry plotted as vectors in a space whose axes are context features such as red and fast]
Word-context matrix

                context1   context2   context3   context4
  word1
  word2
  word3
  word4

• Different notions of context, illustrated on the same sentence (see the sketch below):
  • window around the word (e.g., 2 words):
    He drove [ his second-hand car a couple ] of miles down the road .
  • window = the full sentence:
    [ He drove his second-hand car a couple of miles down the road . ]
  • dependency-based features (extracted from parse trees):
    He drove his second-hand car a couple of miles down the road .
    e.g., car is the object (obj) of drove and is modified (mod) by second-hand
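A hedged sketch of the two window-based notions of context, using the tokenised example sentence from the slide:

```python
tokens = "He drove his second-hand car a couple of miles down the road .".split()

def window_contexts(tokens, target, size):
    """Return the words within `size` positions of each occurrence of target."""
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - size), min(len(tokens), i + size + 1)
            contexts.extend(tokens[lo:i] + tokens[i + 1:hi])
    return contexts

print(window_contexts(tokens, "car", 2))            # ['his', 'second-hand', 'a', 'couple']
print(window_contexts(tokens, "car", len(tokens)))  # the whole sentence as context
```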
Different kinds of semantic similarity

• 'tight', synonym-like similarity: (near-)synonymous or (co-)hyponymous words
• loosely related, topical similarity: looser relationships, such as association and meronymy

Example
• doctor: nurse, GP, physician, practitioner, midwife, dentist, surgeon
• doctor: medication, disease, surgery, hospital, patient, clinic, nurse, treatment, illness
Relation context – similarity

• Different contexts lead to different kinds of similarity
• Syntax-based contexts and small windows ↔ large windows and whole documents
• The former models induce tight, synonymous similarity
• The latter models induce topical relatedness
Computing similarity

• Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
• blackberry, blackcurrant, blueberry, raspberry, redcurrant, strawberry
• anthropologist, biologist, economist, linguist, mathematician, psychologist, physicist, sociologist, statistician
• drought, earthquake, famine, flood, flooding, storm, tsunami
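Such neighbour lists come from ranking all words by vector similarity. A minimal sketch using cosine similarity on the toy matrix from the earlier slide (real models rank over vectors with many thousands of context features):

```python
import numpy as np

words = ["truck", "car", "strawberry", "raspberry"]
M = np.array([[293, 393,   0,  104],   # rows: words
              [370, 487,   0,  392],   # columns: second-hand, fast, tasty, red
              [  0,   2, 437, 1035],
              [  0,   1, 592,  728]], dtype=float)

normed = M / np.linalg.norm(M, axis=1, keepdims=True)
sims = normed @ normed.T                # full cosine-similarity matrix

i = words.index("raspberry")
neighbours = sorted(zip(words, sims[i]), key=lambda p: -p[1])
print(neighbours)  # strawberry comes out as the nearest neighbour
```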
…on a large scale

• Frequency matrices are extracted from very large corpora
  • large collections of newspapers, Wikipedia, documents crawled from the web, …
  • > 100 billion words
• Large demands with regard to computing power and memory
• Matrices are very sparse → use of algorithms and storage formats that take advantage of the sparseness (see the sketch below)
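A sketch of why sparse formats matter: a 100,000 × 100,000 word-context matrix would need roughly 80 GB as a dense float64 array, while a sparse format stores only the non-zero co-occurrence counts. The indices below are illustrative:

```python
import numpy as np
from scipy.sparse import csr_matrix

rows = np.array([0, 0, 1, 2])          # word indices of non-zero cells
cols = np.array([1, 3, 1, 2])          # context indices of non-zero cells
vals = np.array([393., 104., 487., 437.])

M = csr_matrix((vals, (rows, cols)), shape=(100_000, 100_000))
print(M.data.nbytes)                   # only the non-zeros are stored
```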
…on a large scale

• Take advantage of parallel computations
• Many algorithms can be implemented within a map-reduce framework:
  • syntactic parsing
  • collection of frequency matrices
  • matrix transformations
• Make use of IRIT's high performance computing cluster OSIRIM (10 nodes, 640 cores in total)
• Huge speedup
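A minimal map-reduce-style sketch using Python's multiprocessing; the pipeline in the talk runs on the OSIRIM cluster, but the shape of the computation is the same: map over corpus shards, then reduce by summing partial counts.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def count_pairs(shard):
    """Map step: co-occurrence counts for one shard of the corpus."""
    counts = Counter()
    for sentence in shard:
        tokens = sentence.split()
        for i, tok in enumerate(tokens[:-1]):
            counts[(tok, tokens[i + 1])] += 1
    return counts

if __name__ == "__main__":
    shards = [["a fast car", "a red raspberry"], ["a fast truck"]]
    with Pool(2) as pool:
        partial = pool.map(count_pairs, shards)    # map
    total = reduce(lambda a, b: a + b, partial)    # reduce: Counters sum up
    print(total.most_common(3))
```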
Dimensionality reduction

Two reasons for performing dimensionality reduction:
• Intractable computations
  • when the number of elements and the number of features is too large, similarity computations may become intractable
  • reduction of the number of features makes computation tractable again
• Generalization capacity
  • the dimensionality reduction is able to describe the data better, or is able to capture intrinsic semantic features
  • dimensionality reduction is able to improve the results (counters data sparseness and noise)
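One common way to implement such a reduction is truncated SVD, shown below as a sketch with a random stand-in matrix; the talk itself uses non-negative matrix factorization, presented next:

```python
import numpy as np

M = np.random.rand(1000, 5000)          # stand-in word-context matrix
U, s, Vt = np.linalg.svd(M, full_matrices=False)
r = 100
reduced = U[:, :r] * s[:r]              # words described by r latent features
print(reduced.shape)                    # (1000, 100)
```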
Non-negative matrix factorization

• Given a non-negative matrix V, find non-negative matrix factors W and H such that:

  V (n × m) ≈ W (n × r) H (r × m)    (1)

• Choosing r ≪ n, m reduces the data
• Constraint on the factorization: all values in the three matrices need to be non-negative (≥ 0)
• The constraint brings about a parts-based representation: only additive, no subtractive relations are allowed
• Particularly useful for finding topical, thematic information
Graphical representation

[Figure: the word-context matrix V (nouns × context words) is factorized into W (nouns × k) times H (k × context words)]
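A hedged sketch of NMF using Lee and Seung's multiplicative updates for the Euclidean objective; the talk does not specify which update rule its implementation uses:

```python
import numpy as np

def nmf(V, r, iterations=200, eps=1e-9):
    """Factorize non-negative V (n × m) into W (n × r) and H (r × m)."""
    n, m = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, r))
    H = rng.random((r, m))
    for _ in range(iterations):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H; stays non-negative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W; stays non-negative
    return W, H

V = np.random.default_rng(1).random((40, 30))  # toy non-negative matrix
W, H = nmf(V, r=5)
print(np.linalg.norm(V - W @ H))               # reconstruction error
```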
Example dimensions

• dim 9:  via, logiciel, connexion, internet, html, fichiers, windows, serveur, messagerie, téléchargement  (internet)
• dim 12: pomme, saumon, canard, poire, fumé, veau, desserts, agneau, miel, boeuf  (food)
• dim 21: universitaires, scolarité, enseignant, étudiants, étudiant, formateurs, professeurs, cursus, enseignants, pédagogique  (education)
• dim 24: tumeurs, lésions, cardiaque, métabolisme, artérielle, infection, respiratoire, respiratoires, maladies, nerveux  (medical)
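Such example dimensions can be read off the factorization: each row of H scores every context word for one latent dimension, so the highest-scoring words characterise the dimension's topic. A sketch with a toy H and vocabulary:

```python
import numpy as np

def top_words(H, vocabulary, dim, k=10):
    order = np.argsort(H[dim])[::-1][:k]   # indices of the k largest scores
    return [vocabulary[i] for i in order]

vocabulary = ["internet", "html", "pomme", "miel"]
H = np.array([[0.9, 0.8, 0.0, 0.1],
              [0.0, 0.1, 0.7, 0.6]])
print(top_words(H, vocabulary, dim=0, k=2))  # ['internet', 'html']
```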
Word meaning in context

• Standard word space models are good at capturing general, 'global' word meaning
  ↔ words have different senses
  ↔ the meaning of individual word instances differs significantly
  (1) Jack is listening to a record
  (2) Jill updated the record
• Context is the determining factor for the construction of individual word meaning
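One simple way to operationalise this idea, shown as an illustrative assumption rather than the specific model presented in the talk: re-weight the target word's global latent vector by the latent profile of the words observed in its context.

```python
import numpy as np

def contextualize(target_vec, context_vecs):
    """Re-weight a word's latent dimensions by its observed context."""
    context_profile = np.mean(context_vecs, axis=0)
    weighted = target_vec * context_profile   # promote dims active in context
    return weighted / (np.linalg.norm(weighted) or 1.0)

# toy latent vectors over 3 dimensions (say: music, database, sports)
record = np.array([0.7, 0.6, 0.1])
listening = np.array([0.9, 0.05, 0.05])
updated = np.array([0.1, 0.8, 0.1])

print(contextualize(record, [listening]))  # music dimension dominates
print(contextualize(record, [updated]))    # database dimension dominates
```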