What do you mean? Inferring Word Meaning Using Computer Vision


  1. What do you mean? Inferring Word Meaning Using Computer Vision Shibamouli Lahiri

  2. Original paper ➔ Multimodal Distributional Semantics ◆ Bruni, et al. (JAIR 2014) 2

  3. Authors: Elia Bruni, Nam Khanh Tran, Marco Baroni 3

  4. What does this word mean? অর্থ 4

  5. What does this word mean? অর্থ means “meaning” in Bengali. 5

  6. What does this word mean? ★ অর্থ means “meaning” in Bengali. ★ It also means “money” or “wealth”. 6

  7. The importance of “grounding” ওবামা বুশ ক্লিনটন (Obama, Bush, Clinton) 7

  8. The importance of “grounding” ওবামা বুশ ক্লিনটন (Obama, Bush, Clinton) 8

  9. The importance of “grounding” ওবামা = আমেরিকার ৪৪তম রাষ্ট্রপতি (Obama = the 44th President of the United States) 9

  10. The importance of “grounding”
      Topic 1: state agent control set system systems states event learning model action problem agents task actions time algorithm knowledge events figure
      Topic 2: optimal problem function time probability set information game strategy model distribution case algorithm section number random cost theorem vol matrix
      Topic 3: data information learning features set work text language word number analysis words results table based research social semantic web system
      Topic 4: design circuit gate logic test delay input circuits fault gates error simulation number timing placement faults figure analysis techniques model
      Topic 5: system user data systems security users file time server application software information network applications key design mobile process access interface

  11. Grounding in the brain ● “kick” vs “lick” ○ Pulvermueller, 2005 11

  12. Distributional vs Perceptual: tropical, yellow, fruit, peel, edible, smooth 12

  13. Origins of Meaning Representation “You shall know a word by the company it keeps.” (Firth, 1957) 13

  14. Origins of Meaning Representation “The individual words in language name objects ... It is the object for which the word stands.” (Wittgenstein, Philosophical Investigations, quoting Augustine) 14

  15. Can we combine them? Yes! Example: car, automobile, vehicle 15

  16. Background
      Distributional    Perceptual
      Words             Visual words
      BoW               BoVW
      Documents         Images

  17. Background
      Distributional    Perceptual
      Words             Visual words
      BoW               BoVW
      Documents         Images
      Example terms: car, automobile, vehicle

  18. Visual words 18

  19. Overview 19

  20. Overview: SVD on joint matrix; feature-level fusion; scoring-level fusion 20

  21. Overview 21

  22. Text matrix
      ❏ Corpora: ukWaC and Wackypedia (1.9B and 820M tokens), both lemmatized and POS-tagged.
      ❏ Terms: most frequent 20K nouns, 5K adjectives, 5K verbs; adjustment leads to 20,515 terms.
      ❏ Context: Window2 and Window20.
      ❏ Association score: non-negative local mutual information (LMI).
      ❏ Matrix: 20,515 rows, 30K columns.
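As a rough sketch of the windowed co-occurrence counting behind this matrix (the tokenization, lemmatization, and target/context vocabularies below are placeholders, not the paper's exact preprocessing):

```python
from collections import defaultdict

def cooccurrence_counts(sentences, targets, contexts, window=2):
    """Count how often each target word co-occurs with each context word
    inside a symmetric window (a simplified stand-in for Window2/Window20)."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:                      # tokens: list of lemmatized words
        for i, w in enumerate(tokens):
            if w not in targets:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in contexts:
                    counts[w][tokens[j]] += 1
    return counts

# Toy usage with hypothetical data:
sents = [["car", "drive", "fast", "road"], ["banana", "yellow", "fruit", "peel"]]
raw = cooccurrence_counts(sents, {"car", "banana"}, {"drive", "road", "yellow", "fruit"})
print({w: dict(c) for w, c in raw.items()})
```

These raw counts are then reweighted with the non-negative LMI score described on the next slide.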

  23. Non-negative LMI 23
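The formula on this slide was lost in extraction. As a reconstruction of the standard non-negative local mutual information weighting named above (notation assumed, not copied from the slide):

```latex
\mathrm{LMI}(w, c) \;=\; \max\!\left(0,\; f(w, c)\,\log \frac{P(w, c)}{P(w)\,P(c)}\right)
```

Here f(w, c) is the raw co-occurrence count of term w with context c and the probabilities are relative frequencies; clipping at zero keeps the matrix non-negative.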

  24. Image matrix
      ❏ Corpus: ESP-Game dataset (100K tagged images, 14 tags on average; 20,515 distinct word types).
      ❏ Terms: 20,515 words.
      ❏ Context: visual words ✕ spatial bins (5K visual words, 16 spatial bins).
      ❏ Association score: non-negative LMI; a word is associated with the images labeled with it, and co-occurrence counts are summed.
      ❏ Matrix: 20,515 rows, 80K columns.
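A minimal sketch of the aggregation step described above, assuming each image already has a bag-of-visual-words count vector; `image_bovw` and `image_tags` are hypothetical inputs:

```python
import numpy as np

def word_image_vectors(image_bovw, image_tags, vocab):
    """Sum the visual-word count vectors of all images tagged with each word.

    image_bovw: dict image_id -> count vector (e.g. 5,000 x 16 = 80K dims)
    image_tags: dict image_id -> list of ESP-Game tags
    vocab: set of the 20,515 target words
    """
    dim = next(iter(image_bovw.values())).shape[0]
    rows = {w: np.zeros(dim) for w in vocab}
    for img_id, tags in image_tags.items():
        for tag in tags:
            if tag in rows:
                rows[tag] += image_bovw[img_id]   # co-occurrence counts are summed
    return rows
```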

  25. ESP-Game images 25

  26. Image matrix construction
      ★ Identify “keypoints”.
      ★ 128 SIFT features for each keypoint.
        ○ 4 ✕ 4 sampling regions ✕ 8 orientations
        ○ Averaged across three channels (HSV)
      ★ Cluster all keypoints from all images.
        ○ 5,000 clusters with k-means
        ○ Cluster centers are the “visual words”
      ★ Image representation:
        ○ Vector of “term frequencies” over visual words
      ★ 4 ✕ 4 spatial binning
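A sketch of this pipeline with OpenCV SIFT and scikit-learn k-means. It simplifies the paper's setup (grayscale SIFT instead of averaging descriptors over the HSV channels), and the 5,000-cluster default would need a large image collection to be meaningful:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_descriptors(image_paths):
    """Collect 128-dim SIFT descriptors and keypoint coordinates per image.
    Grayscale SIFT here is a simplification of the paper's HSV-channel averaging."""
    sift = cv2.SIFT_create()
    per_image = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        kps, descs = sift.detectAndCompute(img, None)
        if descs is None:                          # image with no detectable keypoints
            kps, descs = [], np.empty((0, 128), dtype=np.float32)
        coords = np.array([kp.pt for kp in kps], dtype=float).reshape(-1, 2)
        per_image.append((img.shape, coords, descs))
    return per_image

def build_bovw(per_image, n_words=5000, grid=4):
    """Quantize all descriptors into n_words visual words (k-means centers), then
    build one spatially binned term-frequency vector per image
    (length n_words * grid * grid, e.g. 5,000 x 16 = 80,000)."""
    all_descs = np.vstack([d for _, _, d in per_image if len(d)])
    km = KMeans(n_clusters=n_words, n_init=4, random_state=0).fit(all_descs)
    vectors = []
    for (h, w), coords, descs in per_image:
        vec = np.zeros(n_words * grid * grid)
        if len(descs):
            words = km.predict(descs)
            bx = np.minimum((coords[:, 0] / w * grid).astype(int), grid - 1)
            by = np.minimum((coords[:, 1] / h * grid).astype(int), grid - 1)
            for word, x, y in zip(words, bx, by):
                vec[(y * grid + x) * n_words + word] += 1   # count within spatial bin
        vectors.append(vec)
    return vectors
```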

  27. Visual words 27

  28. Image matrix construction 28

  29. Overview 29

  30. Latent Multimodal Mixing ➔ Singular value decomposition (SVD): ➔ Low-rank approximation: 30
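The two equations this slide points to did not survive extraction. In the usual notation (a reconstruction, not necessarily the slide's exact symbols), with M the concatenated text-plus-image matrix of rank r:

```latex
M = U \Sigma V^{\top},
\qquad
M_k = U_k \Sigma_k V_k^{\top}, \quad k < r,
```

where Σ_k keeps only the k largest singular values, so M_k is the best rank-k approximation of M and mixes the two modalities in a shared latent space.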

  31. Overview 31

  32. Multimodal Similarity
      ➢ Goal: measure similarity between word pairs
      ➢ Similarity function: cosine on latent vectors
      ➢ Feature-level fusion: F = α F_t ⊕ (1 − α) F_v
      ➢ Scoring-level fusion: S = α S_t + (1 − α) S_v
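A minimal sketch of the two fusion schemes above with cosine similarity; the vector names and dimensionalities are illustrative assumptions:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def feature_level(ft, fv, alpha=0.5):
    """Feature-level fusion: weight the text and image vectors, then concatenate."""
    return np.concatenate([alpha * ft, (1 - alpha) * fv])

def scoring_level(ft1, fv1, ft2, fv2, alpha=0.5):
    """Scoring-level fusion: combine the per-channel cosine scores."""
    return alpha * cosine(ft1, ft2) + (1 - alpha) * cosine(fv1, fv2)

# Hypothetical latent vectors for two words:
t1, v1 = np.random.rand(100), np.random.rand(100)
t2, v2 = np.random.rand(100), np.random.rand(100)
print(cosine(feature_level(t1, v1), feature_level(t2, v2)))   # FL similarity
print(scoring_level(t1, v1, t2, v2))                          # SL similarity
```

Setting alpha to 1 or 0 recovers the text-only and image-only models listed on the next slide; tuning alpha on development data gives the "Tuned" variants.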

  33. Fusion options
      k = r (no latent mixing):
        FL: α = 1 → Text only;  α = 0.5 → NaiveFL (Bruni et al., 2011);  α = 0 → Image only
        SL: α = 1 → Text only;  α = 0.5 → NaiveSL (Leong & Mihalcea, 2011);  α = 0 → Image only
      k < r (latent mixing):
        FL: α = 1 → Text mixed;  α ∈ (0, 1) → TunedFL;  α = 0 → Image mixed
        SL: α = 1 → Text mixed;  α ∈ (0, 1) → TunedSL;  α = 0 → Image mixed

  34. Experiments: Overview ➔ Differentiation between semantic relations ➔ Word relatedness ➔ Concrete vs Abstract words ➔ Concept categorization 34

  35. Experiments: Overview ➔ Differentiation between semantic relations ➔ Word relatedness ➔ Concrete vs Abstract words ➔ Concept categorization 35

  36. Semantic Relations ★ Goal is to explore which word relations are best captured by which model. 36

  37. Semantic Relations
      ★ Goal is to explore which word relations are best captured by which model.
      ★ BLESS benchmark (Baroni and Lenci, 2011).
      ★ 184 pivot words denoting concrete concepts.
      ★ For each pivot, there are related words (relata):
        ○ COORD (co-hyponym: alligator-lizard)
        ○ HYPER (hypernym: alligator-reptile)
        ○ MERO (meronym: alligator-teeth)
        ○ ATTRI (attribute: alligator-ferocious)
        ○ EVENT (alligator-swim)
        ○ RAN.N (random noun: alligator-trombone)
        ○ RAN.J (random adjective: alligator-electronic)
        ○ RAN.V (random verb: alligator-conclude)

  38. Semantic Relations
      ★ Goal is to explore which word relations are best captured by which model.
      ★ BLESS benchmark (Baroni and Lenci, 2011).
      ★ 184 pivot words denoting concrete concepts.
      ★ For each pivot, there are related words (relata):
        ○ COORD (co-hyponym: alligator-lizard)
        ○ HYPER (hypernym: alligator-reptile)
        ○ MERO (meronym: alligator-teeth)
        ○ ATTRI (attribute: alligator-ferocious)
        ○ EVENT (alligator-swim)
        ○ RAN.N (random noun: alligator-trombone)
        ○ RAN.J (random adjective: alligator-electronic)
        ○ RAN.V (random verb: alligator-conclude)
      ★ Represent pivots and relata with text and image vectors.
      ★ Pick the relatum with the highest cosine for each relation.
      ★ Convert cosines to z-scores (see the sketch after this list).
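A rough sketch of the procedure in the last three bullets; `vectors` (word to vector) and `relata` (pivot to relation to relatum list) are hypothetical data structures:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def bless_profile(vectors, relata):
    """For each pivot, keep the highest cosine per relation,
    then z-score those maxima across the pivot's relations."""
    profiles = {}
    for pivot, by_relation in relata.items():
        maxima = {rel: max(cosine(vectors[pivot], vectors[w]) for w in words)
                  for rel, words in by_relation.items()}
        vals = np.array(list(maxima.values()))
        mu, sd = vals.mean(), vals.std()
        profiles[pivot] = {rel: (v - mu) / sd for rel, v in maxima.items()}
    return profiles
```

Averaging these z-scores over pivots shows which relation each model (text, image, fused) ranks highest.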

  39. Semantic Relations 39

  40. Semantic Relations
      Pivot        Text          Image
      cabbage      leafy         white
      carrot       fresh         orange
      cherry       ripe          red
      deer         wild          brown
      dishwasher   electric      white
      elephant     wild          white
      glider       heavy         white
      gorilla      wild          black
      hat          white         old
      hatchet      sharp         short
      helicopter   heavy         old
      onion        fresh         white
      oven         electric      new
      plum         juicy         red
      sofa         comfortable   old
      sparrow      wild          little
      stove        electric      hot
      tanker       heavy         grey
      toaster      electric      new
      trout        fresh         old

  41. Experiments: Overview ➔ Differentiation between semantic relations ➔ Word relatedness ➔ Concrete vs Abstract words ➔ Concept categorization 41

  42. Word Relatedness ★ Goal is to predict relatedness between two words. 42

  43. Word Relatedness
      ★ Goal is to predict relatedness between two words.
      ★ WS (WordSim353) and MEN (Marco-Elia-Nam) benchmarks.
      ★ WS has 353 similarity-rated word pairs.
        ○ 252 were used in this study.
      ★ MEN has 3,000 similarity-rated word pairs.
        ○ Similarity scores obtained from Mechanical Turk.
        ○ 2,000 development pairs.
        ○ 1,000 test pairs.

  44. Word Relatedness
      ★ Goal is to predict relatedness between two words.
      ★ WS (WordSim353) and MEN (Marco-Elia-Nam) benchmarks.
      ★ WS has 353 similarity-rated word pairs.
        ○ 252 were used in this study.
      ★ MEN has 3,000 similarity-rated word pairs.
        ○ Similarity scores obtained from Mechanical Turk.
        ○ 2,000 development pairs.
        ○ 1,000 test pairs.
      ★ Models evaluated by correlation between human similarity and cosine similarity of word pairs.
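As a sketch of this evaluation step, using SciPy's Spearman correlation; the pair list, human scores, and vector dictionary are placeholders:

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate_relatedness(pairs, human_scores, vectors):
    """Spearman correlation between human ratings and model cosines
    over a list of (word1, word2) pairs."""
    model_scores = [cosine(vectors[w1], vectors[w2]) for w1, w2 in pairs]
    rho, _ = spearmanr(human_scores, model_scores)
    return rho
```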

  45. Word Relatedness (Spearman)
                                          Window2         Window20
      Model                               MEN     WS      MEN     WS
      Text                                0.73    0.70    0.68    0.70
      Image                               0.43    0.36    0.43    0.36
      NaiveFL                             0.75    0.67    0.73    0.67
      NaiveSL                             0.76    0.69    0.74    0.64
      MixLDA (Feng and Lapata, 2010)      0.30    0.23    0.30    0.23
      Text mixed                          0.77    0.73    0.74    0.75
      Image mixed                         0.55    0.52    0.57    0.51
      TunedFL                             0.78    0.72    0.76    0.75
      TunedSL                             0.78    0.71    0.77    0.72

  46. Word Relatedness (Spearman): same table as slide 45. 46

  47. Word Relatedness (Pearson)
      MixLDA (Feng and Lapata, 2010): 0.32
      Model                Window2   Window20
      Text mixed           0.47      0.49
      TunedFL              0.46      0.49
      TunedSL              0.46      0.47

  48. Qualitative Analysis
      Text (Window20)           TunedFL
      dawn - dusk               pet - puppy
      sunrise - sunset          candy - chocolate
      canine - dog              paw - pet
      grape - wine              bicycle - bike
      foliage - plant           apple - cherry
      foliage - petal           copper - metal
      skyscraper - tall         military - soldier
      cat - feline              paws - whiskers
      pregnancy - pregnant      stream - waterfall
      misty - rain              cheetah - lion

  49. Experiments: Overview ➔ Differentiation between semantic relations ➔ Word relatedness ➔ Concrete vs Abstract words ➔ Concept categorization 49

  50. Concrete vs Abstract Words ★ Goal is to see which model performs better on concrete/abstract words. 50
