
Lecture 8: NLP and Word Embeddings - Alireza Akhavan Pour (CLASS.VISION, SRTTU)


  1. Lecture 8: NLP and Word Embeddings - Alireza Akhavan Pour, CLASS.VISION, SRTTU (Saturday, Aban 19, 1397 / November 10, 2018)

  2. NLP and Word Embeddings: Word representation. Vocabulary V = [a, aaron, ..., zulu, <UNK>], |V| = 10,000. One-hot representation: each word is a one-hot vector, e.g. O_5391 or O_9853. Example: "I want a glass of orange ______." vs. "I want a glass of apple ______." (both blanks should be "juice"). Problem: the Euclidean distance between any two distinct one-hot vectors is identical, so a model cannot generalize from the words it has seen during training (e.g. from "orange juice" to "apple juice").
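To make the one-hot problem concrete, here is a minimal NumPy sketch (a toy 6-word vocabulary standing in for the 10,000-word V of the slide): every pair of distinct one-hot vectors is equally far apart, so the representation carries no notion of word similarity.

```python
import numpy as np

# Toy vocabulary standing in for the full 10,000-word V used in the lecture.
vocab = ["a", "apple", "juice", "orange", "zulu", "<UNK>"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word, vocab_size=len(vocab)):
    """Return the one-hot vector O_i for a word."""
    o = np.zeros(vocab_size)
    o[word_to_index[word]] = 1.0
    return o

o_apple, o_orange, o_zulu = one_hot("apple"), one_hot("orange"), one_hot("zulu")

# Every pair of distinct one-hot vectors has the same Euclidean distance
# (sqrt(2)) and zero inner product, so "apple" is no closer to "orange"
# than it is to "zulu".
print(np.linalg.norm(o_apple - o_orange))  # 1.4142...
print(np.linalg.norm(o_apple - o_zulu))    # 1.4142...
print(o_apple @ o_orange)                  # 0.0
```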

  3. Featurized representation: word embedding. Instead of a one-hot vector, represent each word by roughly 300 learned features, e.g. gender, royal, age, food, size, cost, alive, verb, ... [The slide shows a table of feature values for several words.] This gives dense vectors such as e_456 (apple) and e_6257 (orange); because apple and orange now have similar representations, "I want a glass of orange ______." and "I want a glass of apple ______." can both be completed with "juice".
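As a rough sketch of the featurized idea (the feature values below are invented for illustration, not the slide's numbers), dense vectors put apple and orange close together while keeping unrelated words far apart:

```python
import numpy as np

# Hypothetical 4-feature vectors (gender, royal, age, food); the lecture
# uses ~300 learned features and different values.
e = {
    "man":    np.array([-1.00,  0.01,  0.03, 0.09]),
    "woman":  np.array([ 1.00,  0.02,  0.02, 0.01]),
    "apple":  np.array([ 0.00, -0.01,  0.03, 0.95]),
    "orange": np.array([ 0.01,  0.00, -0.02, 0.97]),
}

# Unlike one-hot vectors, featurized vectors are close for related words
# and far apart for unrelated ones, which is what enables generalizing
# from "orange juice" to "apple juice".
print(np.linalg.norm(e["apple"] - e["orange"]))  # small: ~0.06
print(np.linalg.norm(e["apple"] - e["man"]))     # large: ~1.3
```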

  4. Visualizing word embeddings [van der Maaten and Hinton, 2008. Visualizing data using t-SNE]

  5. Using word embeddings: Named entity recognition example. "Sally Johnson is an orange farmer." / "Robert Lin is an apple farmer." / "Robert Lin is a durian cultivator."

  6. Using word embeddings: Named entity recognition example. • If you now test your model with the sentence "Robert Lin is a durian cultivator", the network should still recognize the name, even though it has not seen the word "durian" during training. That is the power of word representations. • The algorithms used to learn word embeddings can examine billions of words of unlabeled text - for example, 100 billion words - and learn the representation from them.

  7. Transfer learning and word embeddings
     I. Learn word embeddings from a large text corpus (1-100 billion words), or download a pre-trained embedding online.
     II. Transfer the embedding to the new task, which has a smaller training set (say, 100k words). Another benefit is the reduction in input dimensionality: instead of a 10,000-dimensional one-hot vector, we will work with a 300-dimensional vector. (A Keras sketch of this step follows this slide.)
     III. Optional: continue to fine-tune the word embeddings with the new data. Only bother doing this if your smaller training set (from step II) is big enough.
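A minimal Keras sketch of step II. The sizes match the slide (10,000-word vocabulary, 300-dimensional vectors), but the pre-trained matrix here is random noise standing in for real downloaded vectors such as GloVe, and the Dense head is just a placeholder task:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embedding_dim = 10_000, 300

# Stand-in for a downloaded pre-trained embedding matrix (10,000 x 300).
pretrained_E = np.random.normal(size=(vocab_size, embedding_dim)).astype("float32")

model = keras.Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation="sigmoid"),  # small task head, e.g. a classifier
])

# Build the layers once so their weights exist, then transfer and freeze
# the embedding (step III: set trainable = True to fine-tune, but only if
# the new training set is big enough).
_ = model(np.zeros((1, 10), dtype="int32"))
model.layers[0].set_weights([pretrained_E])
model.layers[0].trainable = False

model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["acc"])
model.summary()
```

This mirrors the fchollet notebook listed in the references (slide 16), which initializes a frozen Embedding layer from GloVe vectors.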

  8. Relation to face encoding (Embeddings). • Word embeddings have an interesting relationship to the face recognition task: in that problem, we encode each face into a vector and then check how similar these vectors are; "encoding" and "embedding" mean roughly the same thing here. • In the word embeddings task, we learn a representation for each word in a fixed vocabulary, unlike in face encoding, where we must map any new image to some n-dimensional vector. [Taigman et al., 2014. DeepFace: Closing the gap to human-level performance]

  9. Properties of word embeddings: Analogies. Given the embeddings e_man, e_woman, e_king, e_queen, can we conclude the relation "Man is to Woman as King is to ??"? The difference vectors match: e_man - e_woman ≈ [-2, 0, 0, 0]^T ≈ e_king - e_queen, so the analogy completes to Queen. [Mikolov et al., 2013, Linguistic regularities in continuous space word representations]

  10. Analogies using word vectors. In a 2-D t-SNE plot, man/woman and king/queen appear as a parallelogram, but the embeddings e_w themselves are 300-dimensional and t-SNE is a non-linear mapping, so the parallelogram relationship holds in the original 300-D space rather than in the visualization. To answer the analogy, find the word w: argmax_w sim(e_w, e_king - e_man + e_woman).

  11. Cosine similarity. sim(e_w, e_king - e_man + e_woman). Cosine similarity: sim(u, v) = (u^T v) / (||u||_2 ||v||_2), i.e. the cosine of the angle between u and v (1 for the same direction, 0 for orthogonal vectors, -1 for opposite directions). The squared Euclidean distance ||u - v||^2 can also be used as a dissimilarity measure. Analogy examples: Man:Woman as Boy:Girl; Ottawa:Canada as Tehran:Iran; Big:Bigger as Tall:Taller; Yen:Japan as Ruble:Russia.
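A sketch of the argmax search from slides 10-11, assuming `embeddings` is a dict mapping each vocabulary word to its 300-dimensional NumPy vector (e.g. loaded from a pre-trained model):

```python
import numpy as np

def cosine_similarity(u, v):
    """sim(u, v) = u.v / (||u||_2 ||v||_2), as defined on the slide."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def find_analogy(a, b, c, embeddings):
    """Answer "a is to b as c is to ?" via argmax_w sim(e_w, e_b - e_a + e_c).

    The three query words themselves are excluded from the search.
    """
    target = embeddings[b] - embeddings[a] + embeddings[c]
    best_word, best_sim = None, -1.0
    for word, e_w in embeddings.items():
        if word in (a, b, c):
            continue
        s = cosine_similarity(e_w, target)
        if s > best_sim:
            best_word, best_sim = word, s
    return best_word

# Hypothetical usage with a pre-loaded embedding dictionary:
# find_analogy("man", "woman", "king", embeddings)  # -> "queen"
```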

  12. Embedding matrix. E is a 300 x 10,000 matrix whose columns are the embedding vectors of the 10,000 vocabulary words (a, aaron, ..., here, ..., orange (word 6257), ..., example, ..., <UNK>). [The slide shows the matrix E with the column for "orange" highlighted, next to the one-hot vector O_6257.]

  13. Embedding matrix (continued). Multiplying E by the one-hot vector O_6257 picks out the corresponding column: E · O_6257 = e_6257.

  14. Embedding matrix. E · O_6257 = e_6257, with shapes (300 x 10k) · (10k x 1) = (300 x 1). • If O_6257 is the one-hot encoding of the word "orange", of shape (10000, 1), then np.dot(E, O_6257) = e_6257, which has shape (300, 1). • Generally, np.dot(E, O_j) = e_j (the embedding for word j). In practice, frameworks implement this as a direct column lookup rather than a matrix multiplication (see the Keras Embedding layer on the next slide).
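A small NumPy check of this identity (E here is random, standing in for a learned 300 x 10,000 embedding matrix):

```python
import numpy as np

embedding_dim, vocab_size = 300, 10_000
E = np.random.normal(size=(embedding_dim, vocab_size))  # stand-in for learned E

j = 6257                        # index of "orange" in the lecture's vocabulary
O_j = np.zeros((vocab_size, 1))
O_j[j] = 1.0                    # one-hot column vector of shape (10000, 1)

e_j = np.dot(E, O_j)            # shape (300, 1): selects column j of E
assert e_j.shape == (embedding_dim, 1)
assert np.allclose(e_j[:, 0], E[:, j])  # identical to a direct column lookup

# The lookup E[:, j] is what embedding layers actually do -- it avoids a
# 300 x 10,000 matrix-vector product that mostly multiplies by zeros.
```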

  15. Embedding matrix. https://keras.io/layers/embeddings/ The Embedding layer is best understood as a dictionary mapping integer indices (which stand for specific words) to dense vectors. It takes integers as input, looks them up in an internal dictionary, and returns the associated vectors. It is effectively a dictionary lookup.
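A minimal sketch of that lookup behaviour with tf.keras (sizes chosen to match the lecture; the word indices are just examples):

```python
import numpy as np
from tensorflow.keras.layers import Embedding

# 10,000-word vocabulary mapped to 300-dimensional vectors.
embedding_layer = Embedding(input_dim=10_000, output_dim=300)

# Input: a batch of sequences of integer word indices, shape (batch, seq_len).
word_indices = np.array([[6257, 5391, 9853]])  # example indices, e.g. "orange", ...

# Output: the associated dense vectors, shape (batch, seq_len, 300).
vectors = embedding_layer(word_indices)
print(vectors.shape)  # (1, 3, 300)
```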

  16. References
     • https://www.coursera.org/specializations/deep-learning
     • https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/6.1-using-word-embeddings.ipynb
     • https://github.com/mbadry1/DeepLearning.ai-Summary/
