What do you mean? Inferring Word Meaning Using Computer Vision
Shibamouli Lahiri
Original paper ➔ Multimodal Distributional Semantics ◆ Bruni et al. (JAIR 2014)
Authors: Elia Bruni, Nam Khanh Tran, Marco Baroni
What does this word mean? অর্থ
★ অর্থ means “meaning” in Bengali.
★ It also means “money” or “wealth”.
The importance of “grounding”
ওবামা বুশ ক্লিনটন (Obama, Bush, Clinton)
ওবামা = আমেরিকার ৪৪তম রাষ্ট্রপতি (Obama = the 44th president of America)
The importance of “grounding”
Topic 1: state agent control set system systems states event learning model action problem agents task actions time algorithm knowledge events figure
Topic 2: optimal problem function time probability set information game strategy model distribution case algorithm section number random cost theorem vol matrix
Topic 3: data information learning features set work text language word number analysis words results table based research social semantic web system
Topic 4: design circuit gate logic test delay input circuits fault gates error simulation number timing placement faults figure analysis techniques model
Topic 5: system user data systems security users file time server application software information network applications key design mobile process access interface
Grounding in the brain
● “kick” vs “lick”
  ○ Pulvermüller, 2005
Distributional vs Perceptual
tropical  yellow  fruit  peel  edible  smooth
Origins of Meaning Representation
“You shall know a word by the company it keeps.” (Firth, 1957)
Origins of Meaning Representation
“The individual words in language name objects ... It is the object for which the word stands.” (Wittgenstein, Philosophical Investigations §1)
Can we combine them? Yes!
(text: car, automobile, vehicle) + (images) = (combined representation)
Background
  Distributional  |  Perceptual
  Words           |  Visual words
  BoW             |  BoVW
  Documents       |  Images
Example terms: car, automobile, vehicle
Visual words
Overview
● SVD on joint matrix
● Feature-level fusion
● Scoring-level fusion
Text matrix
❏ Corpora: ukWaC and Wackypedia (1.9B and 820M tokens), both lemmatized and POS-tagged.
❏ Terms: most frequent 20K nouns, 5K adjectives, 5K verbs; adjustment leads to 20,515 terms.
❏ Context: Window2 and Window20.
❏ Association score: non-negative local mutual information (LMI).
❏ Matrix: 20,515 rows, 30K columns.
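As a rough illustration of the window-based counting step (a sketch, not the authors' code; the corpus handling and vocabularies are simplified placeholders):

```python
# Sketch of window-based co-occurrence counting for the text matrix.
# `corpus` is assumed to be an iterable of tokenized, lemmatized sentences;
# `target_vocab` holds the 20,515 terms, `context_vocab` the 30K context terms.
from collections import Counter

def cooccurrence_counts(corpus, target_vocab, context_vocab, window=2):
    counts = Counter()
    for sentence in corpus:
        for i, word in enumerate(sentence):
            if word not in target_vocab:
                continue
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i and sentence[j] in context_vocab:
                    counts[(word, sentence[j])] += 1
    return counts  # raw counts; non-negative LMI weighting is applied next
```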
Non-negative LMI
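The formula itself did not survive the export; the standard definition of (non-negative) local mutual information, consistent with how the slides use it, is:

\[
\mathrm{LMI}(w, c) \;=\; \max\!\left(0,\; \mathrm{count}(w, c)\cdot \log\frac{P(w, c)}{P(w)\,P(c)}\right)
\]

where count(w, c) is the co-occurrence count of term w with context c, and the probabilities are relative frequencies estimated from the corpus; negative values are clipped to zero.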
Image matrix
❏ Corpus: ESP-Game dataset (100K tagged images, 14 tags on average; 20,515 distinct word types).
❏ Terms: 20,515 words.
❏ Context: visual words ✕ spatial bins (5K visual words, 16 spatial bins).
❏ Association score: non-negative LMI; a word is associated with the images labeled with it, and co-occurrence counts are summed.
❏ Matrix: 20,515 rows, 80K columns.
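A minimal sketch of the aggregation step described above, assuming each image already has a bag-of-visual-words histogram (names and data structures are illustrative):

```python
# Each word's row is the sum of the BoVW histograms of all images tagged with it.
import numpy as np

def word_image_matrix(words, image_tags, image_bovw, n_features):
    """words: list of target terms; image_tags: image id -> tag list;
    image_bovw: image id -> histogram over (visual word x spatial bin) features."""
    row = {w: i for i, w in enumerate(words)}
    matrix = np.zeros((len(words), n_features))
    for img_id, tags in image_tags.items():
        for tag in tags:
            if tag in row:
                matrix[row[tag]] += image_bovw[img_id]  # summed co-occurrence counts
    return matrix  # non-negative LMI weighting is then applied, as for the text matrix
```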
ESP-Game images
Image matrix construction
★ Identify “keypoints”.
★ Compute 128 SIFT features for each keypoint.
  ○ 4 ✕ 4 sampling regions ✕ 8 orientations
  ○ Averaged across the three HSV channels
★ Cluster all keypoints from all images.
  ○ 5,000 clusters with k-means
  ○ Cluster centers are the “visual words”
★ Image representation
  ○ Vector of “term frequencies” over the visual words
★ 4 ✕ 4 spatial binning
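A compact sketch of this pipeline with off-the-shelf tools (OpenCV SIFT plus scikit-learn k-means); the paper uses dense SIFT averaged over HSV channels, 5,000 clusters, and 4✕4 spatial binning, all simplified here for illustration:

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def sift_descriptors(image_paths):
    """Collect 128-dimensional SIFT descriptors from each image."""
    sift = cv2.SIFT_create()
    per_image, pool = [], []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        desc = desc if desc is not None else np.empty((0, 128), np.float32)
        per_image.append(desc)
        pool.append(desc)
    return per_image, np.vstack(pool)

def bag_of_visual_words(image_paths, n_visual_words=100):
    """Cluster descriptors into visual words; return per-image histograms."""
    per_image, pool = sift_descriptors(image_paths)
    km = MiniBatchKMeans(n_clusters=n_visual_words, random_state=0).fit(pool)
    hists = np.zeros((len(per_image), n_visual_words))
    for i, desc in enumerate(per_image):
        if len(desc):
            hists[i] = np.bincount(km.predict(desc), minlength=n_visual_words)
    return hists, km
```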
Visual words
Image matrix construction
Overview
Latent Multimodal Mixing
➔ Singular value decomposition (SVD): M = U Σ Vᵀ
➔ Low-rank approximation: M_k = U_k Σ_k V_kᵀ (keep only the top k singular values)
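A minimal sketch of the mixing step, assuming `text_matrix` and `image_matrix` share the same row (word) order; splitting the reconstruction back into per-modality "mixed" blocks is my reading of the slides, not a verbatim reimplementation:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

def latent_mixing(text_matrix, image_matrix, k=256):
    """SVD on the column-wise concatenation of the two modality matrices."""
    joint = np.hstack([text_matrix, image_matrix])
    svd = TruncatedSVD(n_components=k, random_state=0)
    latent = svd.fit_transform(joint)            # U_k * Sigma_k: joint latent vectors
    smoothed = svd.inverse_transform(latent)     # rank-k reconstruction of the joint matrix
    text_mixed = smoothed[:, :text_matrix.shape[1]]   # "Text mixed" block
    image_mixed = smoothed[:, text_matrix.shape[1]:]  # "Image mixed" block
    return latent, text_mixed, image_mixed
```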
Overview
Multimodal Similarity
➢ Goal: measure similarity between word pairs
➢ Similarity function: cosine on latent vectors
➢ Feature-level fusion: F = α·F_t ⊕ (1 − α)·F_v
➢ Scoring-level fusion: S = α·S_t + (1 − α)·S_v
Fusion options
k = r (no mixing)
  FL:  α = 1 → Text only     α = 0.5 → NaiveFL (Bruni et al., 2011)       α = 0 → Image only
  SL:  α = 1 → Text only     α = 0.5 → NaiveSL (Leong & Mihalcea, 2011)   α = 0 → Image only
k < r (mixed)
  FL:  α = 1 → Text mixed    α ∈ (0, 1) → TunedFL                         α = 0 → Image mixed
  SL:  α = 1 → Text mixed    α ∈ (0, 1) → TunedSL                         α = 0 → Image mixed
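A sketch of the two fusion schemes above (function and variable names are illustrative; in the paper α and the latent dimensionality are tuned on development data):

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def feature_level(t1, v1, t2, v2, alpha=0.5):
    """FL: concatenate weighted text and image vectors, then take one cosine."""
    f1 = np.concatenate([alpha * t1, (1 - alpha) * v1])
    f2 = np.concatenate([alpha * t2, (1 - alpha) * v2])
    return cosine(f1, f2)

def scoring_level(t1, v1, t2, v2, alpha=0.5):
    """SL: cosine per modality, then mix the two scores."""
    return alpha * cosine(t1, t2) + (1 - alpha) * cosine(v1, v2)
```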
Experiments: Overview
➔ Differentiation between semantic relations
➔ Word relatedness
➔ Concrete vs Abstract words
➔ Concept categorization
Semantic Relations
★ Goal: explore which word relations are best captured by which model.
★ BLESS benchmark (Baroni and Lenci, 2011)
★ 184 pivot words denoting concrete concepts.
★ For each pivot, there are related words (relata):
  ○ COORD (co-hyponym: alligator-lizard)
  ○ HYPER (hypernym: alligator-reptile)
  ○ MERO (meronym: alligator-teeth)
  ○ ATTRI (attribute: alligator-ferocious)
  ○ EVENT (alligator-swim)
  ○ RAN.N (random noun: alligator-trombone)
  ○ RAN.J (random adjective: alligator-electronic)
  ○ RAN.V (random verb: alligator-conclude)
★ Represent pivots and relata with text and image vectors.
★ Pick the relatum with the highest cosine for each relation.
★ Convert cosines to z-scores.
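The selection-and-normalization step, sketched (data structures are illustrative, not from the paper's code):

```python
import numpy as np

def bless_profile(pivot_vec, relata_by_relation, cosine):
    """For one pivot: keep the highest-cosine relatum per relation,
    then z-score those cosines so profiles are comparable across pivots."""
    best = {rel: max(cosine(pivot_vec, v) for v in vecs)
            for rel, vecs in relata_by_relation.items()}
    scores = np.array(list(best.values()))
    z = (scores - scores.mean()) / scores.std()
    return dict(zip(best.keys(), z))
```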
Semantic Relations
Pivot        Text          Image      Pivot        Text          Image
cabbage      leafy         white      helicopter   heavy         old
carrot       fresh         orange     onion        fresh         white
cherry       ripe          red        oven         electric      new
deer         wild          brown      plum         juicy         red
dishwasher   electric      white      sofa         comfortable   old
elephant     wild          white      sparrow      wild          little
glider       heavy         white      stove        electric      hot
gorilla      wild          black      tanker       heavy         grey
hat          white         old        toaster      electric      new
hatchet      sharp         short      trout        fresh         old
Word Relatedness
★ Goal: predict relatedness between two words.
★ WS (WordSim353) and MEN (Marco-Elia-Nam) benchmarks.
★ WS has 353 similarity-rated word pairs; 252 were used in this study.
★ MEN has 3,000 similarity-rated word pairs.
  ○ Similarity scores obtained from Mechanical Turk.
  ○ 2,000 development pairs.
  ○ 1,000 test pairs.
★ Models evaluated by the correlation between human similarity ratings and cosine similarity of the word pairs.
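The evaluation loop, sketched (assumes a `vectors` lookup from word to latent vector; names are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate(pairs, human_scores, vectors):
    """pairs: list of (w1, w2); human_scores: gold ratings in the same order.
    Returns the Spearman correlation between human ratings and model cosines."""
    model_scores = []
    for w1, w2 in pairs:
        v1, v2 = vectors[w1], vectors[w2]
        model_scores.append(np.dot(v1, v2) /
                            (np.linalg.norm(v1) * np.linalg.norm(v2)))
    rho, _ = spearmanr(human_scores, model_scores)
    return rho
```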
Word Relatedness (Spearman)
                                  Window2         Window20
Model                             MEN     WS      MEN     WS
Text                              0.73    0.70    0.68    0.70
Image                             0.43    0.36    0.43    0.36
NaiveFL                           0.75    0.67    0.73    0.67
NaiveSL                           0.76    0.69    0.74    0.64
MixLDA (Feng and Lapata, 2010)    0.30    0.23    0.30    0.23
Text mixed                        0.77    0.73    0.74    0.75
Image mixed                       0.55    0.52    0.57    0.51
TunedFL                           0.78    0.72    0.76    0.75
TunedSL                           0.78    0.71    0.77    0.72
Word Relatedness (Pearson)
Model                             Window2    Window20
MixLDA (Feng and Lapata, 2010)    0.32
Text mixed                        0.47       0.49
TunedFL                           0.46       0.49
TunedSL                           0.46       0.47
Qualitative Analysis
Text (Window20)        TunedFL
dawn - dusk            pet - puppy
sunrise - sunset       candy - chocolate
canine - dog           paw - pet
grape - wine           bicycle - bike
foliage - plant        apple - cherry
foliage - petal        copper - metal
skyscraper - tall      military - soldier
cat - feline           paws - whiskers
pregnancy - pregnant   stream - waterfall
misty - rain           cheetah - lion
Concrete vs Abstract Words
★ Goal: see which model performs better on concrete vs abstract words.