Semantics with Dense Vectors Dorota Glowacka - PowerPoint PPT Presentation

Semantics with Dense Vectors
Dorota Glowacka
dorota.glowacka@ed.ac.uk

Previous lectures:
- how to represent a word as a sparse vector with dimensions corresponding to the words in the vocabulary
- the values in the vector were a function of the count of the word co-occurring with each neighbouring word
- each word is thus represented with a vector that is long (with vocabularies of 20,000 to 50,000) and sparse (with most elements of the vector for each word equal to zero)

Today's Lecture
• How to represent a word with vectors that are short (with length of 50 to 1,000) and dense (most values are non-zero)
• Why short vectors?
  - easier to include as features in machine learning systems
  - because they contain fewer parameters, they generalize better and are less prone to overfitting
  - dense vectors may be better than sparse vectors at capturing synonymy, since in a sparse vector two near-synonymous context words are separate, unrelated dimensions

Singular Value Decomposition (SVD)
• SVD is a method for finding the most important dimensions of a dataset
• It can be applied to any rectangular matrix
• SVD belongs to a family of methods that can approximate an N-dimensional dataset using fewer dimensions, such as Principal Component Analysis (PCA) or Factor Analysis
• First applied to the task of generating embeddings from term-document matrices, in Latent Semantic Analysis (LSA)

Singular Value Decomposition (SVD)
• Dimensionality reduction methods first rotate the axes of the original dataset into a new space.
• The new space is chosen so that the highest order dimension captures the most variance in the original dataset, the next dimension captures the next most variance, and so on.
• While some information about the relationship between the original points is necessarily lost in the new transformation, the remaining dimensions preserve as much as possible of the original setting.
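The rotation described above can be sketched on a small synthetic dataset. This is only an illustration: the 2-D point cloud, the random seed and all variable names are assumptions made for the example, and NumPy's `np.linalg.svd` stands in for whatever implementation is used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for the example

# Hypothetical data: 200 points in 2-D, stretched along one axis and then
# rotated by 30 degrees, so that neither original axis is the "important" one.
stretched = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])
theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
data = stretched @ rotation.T

# Centre the data, then let SVD find the new axes (the rows of Vt).
centered = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Express every point in the new coordinate system.
rotated = centered @ Vt.T

# Variance captured by each new dimension, in decreasing order.
variances = rotated.var(axis=0)
print("variance per new dimension:", variances)
print("total variance before/after:", centered.var(axis=0).sum(), variances.sum())
```

The total variance is unchanged by the rotation; what changes is how it is distributed, with the first new dimension taking the largest share. Dropping the last dimension would therefore lose the least information.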

Latent Semantic Analysis (LSA)
• LSA is a particular application of SVD to a |V| × c term-document matrix X representing |V| words and their co-occurrence with c documents.
• SVD factorizes matrix X into the product of three matrices:
1. The |V| × m matrix W, where each row represents a word and each column represents one of m dimensions in a latent space.
   - The m column vectors are orthogonal to each other and are ordered by the amount of variance in the original dataset that each accounts for.
   - m = rank of X (number of linearly independent rows)

Latent Semantic Analysis (LSA)
2. Σ is a diagonal m × m matrix with singular values along the diagonal, expressing the importance of each dimension.
3. The m × c matrix C, where each row represents one of the latent dimensions and the m row vectors are orthogonal to each other.
• By using only the first k dimensions of W, Σ and C, the product of these 3 matrices becomes a least-squares approximation to the original X.
• Since the first dimensions encode the most variance, SVD models the most important information in the original X.

$$
\underbrace{X}_{|V| \times c}
=
\underbrace{W}_{|V| \times m}
\;
\underbrace{\begin{bmatrix}
\sigma_1 & 0 & 0 & \cdots & 0 \\
0 & \sigma_2 & 0 & \cdots & 0 \\
0 & 0 & \sigma_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma_m
\end{bmatrix}}_{m \times m}
\;
\underbrace{C}_{m \times c}
$$

Taking only the top k ≤ m dimensions after SVD is applied to the co-occurrence matrix X:

$$
\underbrace{X}_{|V| \times c}
\approx
\underbrace{W_k}_{|V| \times k}
\;
\underbrace{\begin{bmatrix}
\sigma_1 & 0 & 0 & \cdots & 0 \\
0 & \sigma_2 & 0 & \cdots & 0 \\
0 & 0 & \sigma_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma_k
\end{bmatrix}}_{k \times k}
\;
\underbrace{C_k}_{k \times c}
$$
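A minimal sketch of this truncation, assuming NumPy and a made-up 5 × 4 term-document count matrix (the counts, the choice k = 2 and the variable names are illustrative only). Note that `np.linalg.svd` with `full_matrices=False` returns min(|V|, c) dimensions rather than exactly rank-of-X dimensions; any extra dimensions simply have singular values of zero.

```python
import numpy as np

# Hypothetical term-document counts: 5 words (rows) by 4 documents (columns).
X = np.array([
    [2, 0, 1, 0],
    [1, 0, 2, 0],
    [0, 3, 0, 1],
    [0, 1, 0, 2],
    [1, 1, 1, 1],
], dtype=float)

# Full factorization: X = W @ diag(sigma) @ C, singular values sorted descending.
W, sigma, C = np.linalg.svd(X, full_matrices=False)

# Keep only the top k dimensions of each factor.
k = 2
W_k = W[:, :k]        # |V| x k: one k-dimensional row per word
sigma_k = sigma[:k]   # the k largest singular values
C_k = C[:k, :]        # k x c

# Rank-k least-squares approximation of X and its Frobenius-norm error.
X_k = W_k @ np.diag(sigma_k) @ C_k
error = np.linalg.norm(X - X_k)

print("shapes:", W_k.shape, sigma_k.shape, C_k.shape)
print("approximation error:", error)
print("norm of the discarded singular values:", np.sqrt((sigma[k:] ** 2).sum()))
```

The last two printed numbers agree: the approximation error is determined entirely by the singular values that were dropped, which is why discarding the smallest ones loses the least.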

SVD and LSA
• Using only the top k dimensions leads to a reduced W matrix, with one k-dimensional row per word
• This row acts as a dense k-dimensional vector (embedding) representing that word
• LSA embeddings generally set k = 300
• LSA applies a particular weighting for each co-occurrence cell that multiplies two weights: local and global

LSA term weighting
• The local weight of each term i in document j is its log frequency:

$$
\log f(i,j) + 1
$$

• The global weight of term i is a version of its entropy:

$$
1 + \frac{\sum_j p(i,j) \log p(i,j)}{\log D}
$$

where D is the number of documents.
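The weighting can be sketched as follows. Several details are assumptions rather than something the slide states: p(i, j) is taken to be the share of term i's total count that falls in document j, zero counts are left at weight zero (the log of zero is undefined), 0 · log 0 is treated as 0, and the function name `lsa_weight` is made up for the example.

```python
import numpy as np

def lsa_weight(counts):
    """Multiply a local log-frequency weight by a global entropy-based weight.

    counts: |V| x D array of raw term-document counts, with D > 1.
    """
    counts = np.asarray(counts, dtype=float)
    D = counts.shape[1]
    nonzero = counts > 0

    # Local weight: log f(i, j) + 1, applied to non-zero cells only (assumption).
    local = np.zeros_like(counts)
    local[nonzero] = np.log(counts[nonzero]) + 1.0

    # Assumed definition: p(i, j) = f(i, j) / sum over j of f(i, j).
    row_totals = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, row_totals,
                  out=np.zeros_like(counts), where=row_totals > 0)

    # Global weight: 1 + sum_j p(i, j) log p(i, j) / log D.
    plogp = np.zeros_like(counts)
    plogp[nonzero] = p[nonzero] * np.log(p[nonzero])
    global_weight = 1.0 + plogp.sum(axis=1) / np.log(D)

    return local * global_weight[:, np.newaxis]

# Term 0 is spread evenly over all documents; term 1 occurs in one document only.
counts = [[2, 2, 2],
          [4, 0, 0]]
weighted = lsa_weight(counts)
print(weighted)
```

Under these assumptions a term spread evenly over every document gets a global weight of 0, because it says nothing about which document it is in, while a term concentrated in a single document keeps its full local weight.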

SVD and word-context
• In LSA, SVD is applied to the term-document matrix.
• An alternative is to apply SVD to the word-word or word-context matrix, where the context dimensions are words (rather than documents as in LSA)
• Relies on a PPMI-weighted word-word matrix
• Only the top dimensions are used (truncated SVD)
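A small sketch of this pipeline, assuming NumPy. The PPMI definition used is the standard one, max(0, log2 of P(w, c) / (P(w) P(c))); the toy counts, the function names `ppmi` and `svd_embeddings`, and k = 2 are illustrative assumptions.

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information of a word-context count matrix."""
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()            # joint probabilities P(w, c)
    p_w = p_wc.sum(axis=1, keepdims=True)   # word marginals P(w)
    p_c = p_wc.sum(axis=0, keepdims=True)   # context marginals P(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0            # cells with zero counts
    return np.maximum(pmi, 0.0)             # clip negative PMI to zero

def svd_embeddings(matrix, k):
    """Rows of the truncated W matrix: one k-dimensional vector per word."""
    W, sigma, C = np.linalg.svd(matrix, full_matrices=False)
    return W[:, :k]

# Hypothetical word-word co-occurrence counts for a 4-word vocabulary.
counts = np.array([
    [0, 4, 1, 0],
    [4, 0, 0, 1],
    [1, 0, 0, 5],
    [0, 1, 5, 0],
])
weights = ppmi(counts)
embeddings = svd_embeddings(weights, k=2)
print(weights.round(2))
print(embeddings.round(2))
```

Here each row of `embeddings` plays the role that a row of the reduced W matrix plays in LSA; whether to also scale the rows by the singular values is a design choice this sketch leaves out.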

Skip-gram and CBOW
• Methods for generating dense embeddings inspired by neural network models
• Neural network language models are given a word and predict a context; this process can be used to learn word embeddings.
• The intuition is that words with similar meanings tend to occur near each other in text.
• The process for learning these embeddings has a strong relationship with SVD factorization and dot-product similarity metrics.

Skip-gram Model
• Learns two separate embeddings for each word w: the word embedding v and the context embedding c.
• The embeddings are encoded in two matrices: the word matrix W and the context matrix C.
• Each row i of the word matrix W is the 1 × d vector embedding v_i for word i in vocabulary V.
• Each column i of the context matrix C is the d × 1 vector embedding c_i for word i in vocabulary V.
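The layout of the two matrices can be sketched as below. This shows only the shapes and the dot product between a word embedding and a context embedding; it is not a training procedure. The vocabulary, the dimension d = 3, the random initial values and the helper names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["apricot", "jam", "pinch", "digital"]  # hypothetical vocabulary V
d = 3                                           # embedding dimension (assumed)

# Word matrix W: |V| x d, row i is the word embedding v_i.
W = rng.normal(scale=0.1, size=(len(vocab), d))
# Context matrix C: d x |V|, column i is the context embedding c_i.
C = rng.normal(scale=0.1, size=(d, len(vocab)))

def word_embedding(i):
    """Return v_i, the embedding of word i when it is the target word."""
    return W[i]

def context_embedding(i):
    """Return c_i, the embedding of word i when it appears as context."""
    return C[:, i]

# scores[i, j] is the dot product v_i . c_j for every word/context pair.
scores = W @ C

i, j = vocab.index("jam"), vocab.index("pinch")
print(word_embedding(i).shape, context_embedding(j).shape)
print(scores[i, j], word_embedding(i) @ context_embedding(j))
```

Because W is |V| × d and C is d × |V|, the single product `W @ C` yields all |V| × |V| word-context dot products at once, which is the link to the dot-product similarity and matrix factorization view mentioned earlier.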
