 
The Role of Dimensionality Reduction in Distributional Semantics
or: having fun with matrix algebra

Stefan Evert
Technische Universität Darmstadt, Germany
evert@linglit.tu-darmstadt.de

Leuven Statistics Days, 8 June 2012
wordspace.collocations.de
Outline

- Introduction
    - Definitions and notation
    - Sparse high-dimensional models
- Dimensionality reduction
    - Singular value decomposition (SVD)
    - Interpretations of SVD
    - Alternatives to SVD
- A case study
- Outlook and discussion
Introduction: Definitions and notation

General definition of DSMs

A distributional semantic model (DSM) is a scaled and/or transformed co-occurrence matrix $M$, such that each row $m$ represents the distribution of a target term across contexts.

            get      see      use      hear     eat      kill
knife       0.027   -0.024    0.206   -0.022   -0.044   -0.042
cat         0.031    0.143   -0.243   -0.015   -0.009    0.131
dog        -0.026    0.021   -0.212    0.064    0.013    0.014
boat       -0.022    0.009   -0.044   -0.040   -0.074   -0.042
cup        -0.014   -0.173   -0.249   -0.099   -0.119   -0.042
pig        -0.069    0.094   -0.158    0.000    0.094    0.265
banana      0.047   -0.139   -0.104   -0.022    0.267   -0.042

- Term = word form, lemma, phrase, morpheme, word pair, ...
- Targets = rows (terms whose distribution is represented)
- Features = columns (individual contexts or collocates)
Notation: term-context matrix

Frequency matrix $F \in \mathbb{R}^{k \times n}$ (term-context row vectors $f_i \in \mathbb{R}^n$):

$$
F = \begin{bmatrix}
\cdots & f_1^T & \cdots \\
\cdots & f_2^T & \cdots \\
       & \vdots &       \\
\cdots & f_k^T & \cdots
\end{bmatrix}
$$

          Felidae   Pet   Feral   Bloat   Philosophy   Kant   Back pain
cat          10      10     7       –         –          –        –
dog           –      10     4      11         –          –        –
animal        2      15    10       2         –          –        –
time          1       –     –       –         2          1        –
reason        –       1     –       –         1          4        1
cause         –       –     –       2         1          2        6
effect        –       –     –       1         –          1        –

Interpretation as a collection of row vectors:

- $F = (f_{ij})$, where $f_{ij} = (f_i)_j$ is the frequency count of target term $t_i$ in context $c_j$ (with respect to context tokens; here: Wikipedia articles)
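The definition of $f_{ij}$ can be illustrated with a minimal sketch in Python with NumPy. It assumes that each context is simply a list of tokens; the function name `term_context_matrix` and the toy contexts are hypothetical and are not taken from the slides.

```python
import numpy as np

def term_context_matrix(targets, contexts):
    """Return F with F[i, j] = number of times targets[i] occurs in contexts[j]."""
    F = np.zeros((len(targets), len(contexts)), dtype=int)
    row_of = {term: i for i, term in enumerate(targets)}
    for j, tokens in enumerate(contexts):
        for token in tokens:
            i = row_of.get(token)
            if i is not None:          # ignore tokens that are not target terms
                F[i, j] += 1
    return F

# Hypothetical toy contexts standing in for whole articles
contexts = [
    "the cat is a small animal and the cat hunts".split(),
    "a dog is a pet animal".split(),
    "the cause of the effect".split(),
]
targets = ["cat", "dog", "animal", "cause", "effect"]

F = term_context_matrix(targets, contexts)
f_cat = F[targets.index("cat")]        # row vector f_i for the target "cat"
```

Each row of `F` is one term-context vector $f_i$, and each column corresponds to one context $c_j$.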
Notation: term-term matrix

Co-occurrence matrix $M \in \mathbb{R}^{k \times n}$ (term-term row vectors $m_i \in \mathbb{R}^n$):

$$
M = \begin{bmatrix}
\cdots & m_1^T & \cdots \\
\cdots & m_2^T & \cdots \\
       & \vdots &       \\
\cdots & m_k^T & \cdots
\end{bmatrix}
$$

          breed   tail   feed   kill   important   explain   likely
cat         83     17      7     37        –          1         –
dog        561     13     30     60        1          2         4
animal      42     10    109    134       13          5         5
time        19      9     29    117       81         34       109
reason       1      –      2     14       68        140        47
cause        –      1      –      4       55         34        55
effect       –      –      1      6       60         35        17

Interpretation as a collection of row vectors:

- $M = (m_{ij})$, where $m_{ij} = (m_i)_j$ is the co-occurrence frequency of target term $t_i$ with feature term $\tau_j$ (a collocate of $t_i$)
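A minimal sketch of how the counts $m_{ij}$ might be collected is shown below. The slide does not say how co-occurrence is defined, so the symmetric token window (`window=2`), the function name `cooccurrence_matrix` and the toy token sequence are all assumptions made for illustration.

```python
import numpy as np

def cooccurrence_matrix(tokens, targets, features, window=2):
    """Return M with M[i, j] = number of times features[j] occurs within
    `window` tokens to the left or right of an occurrence of targets[i]."""
    M = np.zeros((len(targets), len(features)), dtype=int)
    row_of = {t: i for i, t in enumerate(targets)}
    col_of = {f: j for j, f in enumerate(features)}
    for pos, token in enumerate(tokens):
        i = row_of.get(token)
        if i is None:                  # not a target term
            continue
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos == pos:         # skip the target token itself
                continue
            j = col_of.get(tokens[ctx_pos])
            if j is not None:
                M[i, j] += 1
    return M

# Hypothetical toy corpus
tokens = "feed the dog and feed the cat or kill the cat".split()
targets = ["cat", "dog"]
features = ["kill", "feed"]

M = cooccurrence_matrix(tokens, targets, features, window=2)
```

The type and size of the context (here a window of two tokens on either side) is one of the DSM parameters, so other choices lead to different matrices.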
DSM parameters

1. Corpus with linguistic annotation
2. Term-context vs. term-term matrix
3. Type & size of context
4. Feature scaling
5. Similarity/distance measure & normalisation
6. Dimensionality reduction
7. Semantic distance, nearest neighbours, semantic maps, ...
Geometric interpretation and semantic distance

- The row vector $m_{\text{dog}}$ describes the usage of the word dog in the corpus
- It can be seen as the coordinates of a point in $n$-dimensional Euclidean space $\mathbb{R}^n$
- Illustrated for two dimensions: get and use
- $m_{\text{dog}} = (115, 10)$

[Figure: Two dimensions of English V-Obj DSM. Scatter plot with x-axis get and y-axis use (both from 0 to 120), showing the points knife, boat, dog and cat.]
Geometric interpretation and semantic distance (continued)

- Similarity = spatial proximity (Euclidean metric)
- Location depends on the frequency of the noun ($f_{\text{dog}} \approx 2.7 \cdot f_{\text{cat}}$)
- Direction is more important than location
- Normalise the "length" $\lVert m_{\text{dog}} \rVert$ of the vector
- Or use the angle $\alpha$ as a distance measure

[Figure: Two dimensions of English V-Obj DSM. Same scatter plot (x-axis get, y-axis use) with the points knife, boat, dog and cat. Two Euclidean distances are marked, d = 57.5 and d = 63.3, and after normalisation an angle of α = 54.3° is marked.]
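The three measures named above (Euclidean distance, length normalisation and angle) can be sketched as follows. Only $m_{\text{dog}} = (115, 10)$ is given on the slides; the coordinates for cat and knife are assumptions chosen so that the results come out close to the values marked in the figure. The extracted text does not say which pairs of points those values belong to.

```python
import numpy as np

m_dog = np.array([115.0, 10.0])    # (get, use), given on the slide
m_cat = np.array([52.0, 4.0])      # assumed for illustration
m_knife = np.array([51.0, 84.0])   # assumed for illustration

def euclidean(u, v):
    """Euclidean distance between two row vectors."""
    return float(np.linalg.norm(u - v))

def normalise(u):
    """Scale a vector to unit Euclidean length, keeping its direction."""
    return u / np.linalg.norm(u)

def angle_deg(u, v):
    """Angle between two vectors in degrees (independent of vector length)."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

d_dog_cat = euclidean(m_dog, m_cat)               # dominated by the frequency difference
d_unit = euclidean(normalise(m_dog), normalise(m_cat))  # small: similar direction
alpha = angle_deg(m_knife, m_cat)                 # large: different direction
```

With these assumed coordinates, `d_dog_cat` is about 63.3 and `alpha` is about 54.3°. For unit-length vectors the two measures are linked by $d^2 = 2 - 2\cos\alpha$, so ranking neighbours by angle or by Euclidean distance after normalisation gives the same order.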