  1. Machine Learning for Computational Linguistics: Distributed representations. Çağrı Çöltekin, University of Tübingen, Seminar für Sprachwissenschaft. June 14, 2016

  2. Representations of linguistic units
  ▶ Most ML methods we use depend on how we represent the objects of interest, such as
    ▶ words, morphemes
    ▶ sentences, phrases
    ▶ letters, phonemes
    ▶ documents
    ▶ speakers, authors
    ▶ …
  ▶ The way we represent these objects interacts with the ML methods used
  ▶ They also affect what can be learned

  3. Symbolic representations
  ▶ A common way to represent words (and other units) is to treat them as individual symbols: w1 = 'cat', w2 = 'dog', w3 = 'book'
  ▶ The symbols do not include any information about the use or meaning of the words, or their relation to each other
  ▶ They are useful in many NLP tasks, but distinctions between units and their relationships are categorical:
    ▶ 'cat' is as different from 'dog' as it is from 'book'
    ▶ The relationship between 'cat' and 'dog' is not different from the one between 'story' and 'tale'
  ▶ Some of these can be extracted from conventional lexicons or WordNets, but they will still be categorical/hard distinctions
  ▶ The similarity/difference decisions are typically made based on hand-annotated data

  4. Vector representations
  ▶ The idea is to represent the linguistic objects as vectors:
      cat  = (0.1, 0.3, 0.5, …, 0.4)
      dog  = (0.2, 0.3, 0.4, …, 0.3)
      book = (0.9, 0.1, 0.8, …, 0.3)
  ▶ The (syntactic/semantic) differences between the words correspond to distances in the high-dimensional vector space where the word vectors live
  ▶ Symbolic representations are equivalent to 1-of-K or one-hot vectors:
      cat  = (0, …, 1, 0, 0, …, 0)
      dog  = (0, …, 0, 1, 0, …, 0)
      book = (0, …, 0, 0, 1, …, 0)
    The distances in the symbolic/one-hot representation are not useful.
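To make the contrast concrete, here is a minimal Python sketch; the dense values below are made up for illustration, not learned from data. Under a one-hot encoding every pair of distinct words is equally far apart, while the made-up dense vectors place 'cat' closer to 'dog' than to 'book'.

import numpy as np

# One-hot vectors: one dimension per word, a single 1 in each vector.
onehot = {
    "cat":  np.array([1.0, 0.0, 0.0]),
    "dog":  np.array([0.0, 1.0, 0.0]),
    "book": np.array([0.0, 0.0, 1.0]),
}
# Dense vectors: invented numbers, purely for illustration.
dense = {
    "cat":  np.array([0.1, 0.3, 0.5, 0.4]),
    "dog":  np.array([0.2, 0.3, 0.4, 0.3]),
    "book": np.array([0.9, 0.1, 0.8, 0.3]),
}

def dist(v, w):
    return np.linalg.norm(v - w)   # Euclidean distance

# One-hot: every pair of distinct words is equally far apart (sqrt(2)).
print(dist(onehot["cat"], onehot["dog"]), dist(onehot["cat"], onehot["book"]))
# Dense: 'cat' comes out closer to 'dog' than to 'book'.
print(dist(dense["cat"], dense["dog"]), dist(dense["cat"], dense["book"]))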

  5. Where do the vector representations come from?
  ▶ The vectors are (almost certainly) learned from the data
  ▶ The idea goes back to Firth (1957): "You shall know a word by the company it keeps."
  ▶ In practice, we make use of the contexts where the words appear to determine their representations
  ▶ The words that appear in similar contexts are mapped to similar representations
  ▶ Context varies from a small window of words around the target word to a complete document

  6. How to calculate word vectors
  ▶ Typically we use unsupervised (or self-supervised) methods
  ▶ Common approaches:
    ▶ Obtain global counts of words in each context, and use techniques like SVD to assign vectors: the words with high covariances are assigned to similar vectors (LSA/LSI)
    ▶ Predict the words from their context (or the context from the target words), and update the vectors to minimize the prediction error (word2vec, GloVe, …); see the sketch below
    ▶ Model each word as a mixture of latent variables (LDA)
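As a rough illustration of the prediction-based approach, the sketch below trains a skip-gram word2vec model on the toy corpus used in the following slides. It assumes the gensim library is installed (parameter names follow gensim 4.x); the tiny corpus, vector size, and epoch count are for demonstration only, not meaningful training settings.

from gensim.models import Word2Vec

# Toy corpus, already tokenized and lowercased.
sentences = [
    ["she", "likes", "cats", "and", "dogs"],
    ["he", "likes", "dogs", "and", "cats"],
    ["she", "likes", "books"],
    ["he", "reads", "books"],
]

# sg=1 selects the skip-gram objective: predict context words from the target word.
model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["cats"])                # the learned 10-dimensional vector for 'cats'
print(model.wv.most_similar("cats"))   # nearest neighbours by cosine similarity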

  7. A toy example
  A four-sentence corpus with a bag of words (BOW) model.
  The corpus:
    S1: She likes cats and dogs
    S2: He likes dogs and cats
    S3: She likes books
    S4: He reads books
  Term-document (sentence) matrix:
            S1  S2  S3  S4
    she      1   0   1   0
    he       0   1   0   1
    likes    1   1   1   0
    reads    0   0   0   1
    cats     1   1   0   0
    dogs     1   1   0   0
    books    0   0   1   1
    and      1   1   0   0
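The matrix above can be reproduced with a few lines of Python. The sketch below assumes scikit-learn (≥ 1.0 for get_feature_names_out) and uses CountVectorizer, transposing its document-term output so that terms become rows; note that the rows come out in alphabetical order rather than the order shown on the slide.

from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "She likes cats and dogs",
    "He likes dogs and cats",
    "She likes books",
    "He reads books",
]

vectorizer = CountVectorizer(lowercase=True)
doc_term = vectorizer.fit_transform(corpus)   # sparse matrix, shape (4 sentences, 8 terms)
term_doc = doc_term.T.toarray()               # shape (8 terms, 4 sentences)

for term, row in zip(vectorizer.get_feature_names_out(), term_doc):
    print(f"{term:>6}", row)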

  8. A toy example (continued)
  The same four-sentence corpus, this time with a term-term (left-context) matrix: each cell counts how often the column word occurs immediately to the left of the row word; '#' marks the sentence-initial position.
             #  she  he  likes  reads  cats  dogs  books  and
    she      2   0    0     0      0     0     0      0    0
    he       2   0    0     0      0     0     0      0    0
    likes    0   2    1     0      0     0     0      0    0
    reads    0   0    1     0      0     0     0      0    0
    cats     0   0    0     1      0     0     0      0    1
    dogs     0   0    0     1      0     0     0      0    1
    books    0   0    0     1      1     0     0      0    0
    and      0   0    0     0      0     1     1      0    0
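A left-context matrix like the one above can be collected directly from the corpus. The following sketch counts, for each word, how often every other word (or the sentence-initial marker '#') appears immediately to its left; variable names are mine, for illustration only.

from collections import defaultdict

corpus = [
    "she likes cats and dogs",
    "he likes dogs and cats",
    "she likes books",
    "he reads books",
]

counts = defaultdict(int)
vocab = set()
for sentence in corpus:
    tokens = sentence.split()
    vocab.update(tokens)
    # Pair each token with the token immediately to its left ('#' at sentence start).
    for word, left in zip(tokens, ["#"] + tokens[:-1]):
        counts[(word, left)] += 1

words = sorted(vocab)
contexts = ["#"] + words
for w in words:
    print(f"{w:>6}", [counts[(w, c)] for c in contexts])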

  9. Term-document matrices
  (The term-document matrix of the toy example from slide 7 is shown again.)
  ▶ The rows are about the terms: similar terms appear in similar contexts
  ▶ The columns are about the context: similar contexts contain similar words
  ▶ The term-context matrices are typically sparse and large

  10. SVD (again)
  ▶ Singular value decomposition is a well-known method in linear algebra
  ▶ An n × m (n terms, m documents) term-document matrix X can be decomposed as
      X = UΣVᵀ
    ▶ U is an n × r unitary matrix, where r is the rank of X (r ⩽ min(n, m)); the columns of U are the eigenvectors of XXᵀ
    ▶ Σ is an r × r diagonal matrix of singular values (the square roots of the eigenvalues of XXᵀ and XᵀX)
    ▶ Vᵀ is an r × m unitary matrix; the columns of V are the eigenvectors of XᵀX
  ▶ One can consider U and V as PCA performed for reducing the dimensionality of the rows (terms) and the columns (documents)
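As a concrete check of the decomposition, the sketch below applies numpy's SVD to the toy term-document matrix from the previous slides and verifies that the factors reconstruct X; this is purely illustrative, not part of the original slides.

import numpy as np

# Toy term-document matrix
# (rows: she, he, likes, reads, cats, dogs, books, and; columns: S1..S4).
X = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 0, 0, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 1, 0, 0]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # s holds the singular values
print(U.shape, s.shape, Vt.shape)                  # (8, 4) (4,) (4, 4)
print("rank of X:", np.linalg.matrix_rank(X))

# The factors reconstruct X up to floating-point error.
assert np.allclose(X, U @ np.diag(s) @ Vt)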

  11. Truncated SVD
      X = UΣVᵀ
  ▶ Using the eigenvectors (from U and V) that correspond to the k largest singular values (k < r) allows reducing the dimensionality of the data with minimum loss
  ▶ The approximation X̂ = U_k Σ_k V_kᵀ is the best rank-k approximation of X, in the sense that ∥X̂ − X∥_F is minimum

  12. Truncated SVD (continued)
  ▶ Note that r may easily be in the millions (of words or contexts), while we choose k much smaller (at most a few hundred)
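A minimal sketch of the truncation step, again on the toy matrix: keep only the k largest singular values and compare the rank-k reconstruction with the original matrix in the Frobenius norm. The value k = 2 is an arbitrary choice for illustration.

import numpy as np

# Same toy term-document matrix as above.
X = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0]], dtype=float)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
X_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # best rank-k approximation of X
print("Frobenius error:", np.linalg.norm(X - X_hat, ord="fro"))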

  13. Truncated SVD (2)
  The n × m matrix X is approximated by the product of three smaller matrices:
      X (n × m)  ≈  U_k (n × k)  ×  Σ_k (k × k diagonal, with σ_1, …, σ_k on the diagonal)  ×  V_kᵀ (k × m)

  14. Truncated SVD (2, continued) Term 1 (the first row of X) can be represented using the first row of U_k.

  15. Truncated SVD (2, continued) Document 1 (the first column of X) can be represented using the first column of V_kᵀ.
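Putting the pieces together, the sketch below derives k-dimensional term vectors (rows of U_k) and document vectors (columns of V_kᵀ) for the toy corpus and compares a few of them with cosine similarity; the variable names and the choice k = 2 are mine, for illustration only.

import numpy as np

terms = ["she", "he", "likes", "reads", "cats", "dogs", "books", "and"]
X = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vecs = U[:, :k]       # row i represents term i
doc_vecs = Vt[:k, :].T     # row j holds column j of V_k^T, representing document j

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# 'cats' and 'dogs' have identical rows in X, so their reduced vectors coincide,
# while 'books' ends up elsewhere in the reduced space.
print(cosine(term_vecs[terms.index("cats")], term_vecs[terms.index("dogs")]))
print(cosine(term_vecs[terms.index("cats")], term_vecs[terms.index("books")]))

# Documents that share many words (S1, S2) end up close; S1 and S4 share none.
print(cosine(doc_vecs[0], doc_vecs[1]))
print(cosine(doc_vecs[0], doc_vecs[3]))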
