Learning Concept Taxonomies from Multi-modal Data


  1. Learning Concept Taxonomies from Multi-modal Data. Hao Zhang, Zhiting Hu, Yuntian Deng, Mrinmaya Sachan, Zhicheng Yan, and Eric P. Xing. Carnegie Mellon University

  2. Outline • Problem • Taxonomy Induction Model • Features • Evaluation and Analysis

  3. Problem • Taxonomy induction: given a set of lexical terms, e.g. {consumer goods, fashion, uniform, neckpiece, handwear, finery, disguise, ...}, organize them into a taxonomy • Applications: human knowledge, question answering, interpretability, information extraction, computer vision

  4. Problem • Existing taxonomies – Knowledge- and time-intensive to build – Limited coverage – Often simply unavailable

  5. Related Work (NLP) • Automatic induction of taxonomies: Widdows [2003], Snow et al. [2006], Yang and Callan [2009], Poon and Domingos [2010], Kozareva and Hovy [2010], Navigli et al. [2011], Fu et al. [2014], Bansal et al. [2014]

  6. Problem • What evidence helps taxonomy induction? – Surface features: ends with, contains, suffix match, … (e.g. white shark ends with shark; bird of prey contains bird)

  7. Problem • What evidence helps taxonomy induction? – Semantics from text descriptions [Bansal 2014] • Parent-child relation: "seafish, such as shark…", "rays are a group of seafishes…" • Sibling relation: "Either shark or ray…", "Both shark and ray…" (example taxonomy: seafish → shark, ray)

  8. Problem • What evidence helps taxonomy induction? – Semantics from text descriptions [Bansal 2014] • Parent-child and sibling patterns (e.g. "seafish, such as shark…", "Either shark or ray…") extracted from Wikipedia abstracts, recording their presence and distance • Web n-grams • …

  9. Problem • What evidence helps taxonomy induction? – Word embeddings: d(w(king), w(queen)) ≈ d(w(man), w(woman))? Does w(seafish) − w(shark) ≈ w(human) − w(woman)? – Learned projections between parent and child embeddings [Fu 2014] (sketch below)
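A minimal numpy sketch of the Fu et al. [2014] projection idea: from word embeddings of known (child, parent) pairs, learn a linear map W by least squares so that W maps a child embedding near its parent's embedding. The toy 4-d vectors and the pair list here are placeholders, not the paper's data.

```python
import numpy as np

# Toy 4-d word embeddings (placeholders; a real system uses word2vec vectors).
emb = {
    "shark":   np.array([0.9, 0.1, 0.0, 0.2]),
    "seafish": np.array([0.8, 0.3, 0.1, 0.1]),
    "cat":     np.array([0.1, 0.9, 0.2, 0.0]),
    "feline":  np.array([0.2, 0.8, 0.3, 0.1]),
}

# Known hypernym pairs (child, parent) used as training data.
pairs = [("shark", "seafish"), ("cat", "feline")]

# Stack children as X, parents as Y, and solve min_W ||X W - Y||^2.
X = np.stack([emb[c] for c, _ in pairs])
Y = np.stack([emb[p] for _, p in pairs])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# A candidate parent scores well if W maps the child embedding near it.
pred = emb["shark"] @ W
print(np.linalg.norm(pred - emb["seafish"]))  # small distance => likely parent
```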

  10. Motivation • How about images? (example images of seafish, shark, and ray)

  11. Motivation • Our motivation – Images may include perceptual semantics – Jointly leverage text and visual information (from the web) • Problems to be addressed: – How to design visual features to capture the perceptual semantics? – How to design models to integrate visual and text information?

  12. Related Work (CV) • Building visual hierarchies: Griffin and Perona [2008], Sivic et al. [2008], Chen et al. [2013]

  13. Task Definition • Assume a set of N categories y = {y_1, y_2, …, y_N} – Each category has a name and a set of images • Goal: induce a taxonomy tree over y, using both text and visual features – e.g. y = {Animal, Fish, Shark, Cat, Tiger, Terrestrial animal, Seafish, Feline} • Setting: supervised learning of category hierarchies from data

  14. Model • Let z_n (1 ≤ z_n ≤ N) be the index of the parent of category y_n – The set z = {z_1, z_2, …, z_N} encodes the whole tree structure (sketch below) • Our goal: infer the conditional distribution p(z | y) – e.g. y = {Animal, Fish, Shark, Cat, Tiger, Terrestrial animal, Seafish, Feline}
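To make the encoding concrete, a small sketch (the names and indices are illustrative only): the whole tree is one parent index per category, and the children sets can be recovered from it.

```python
# Categories y_1..y_N; z[n] is the index of y_n's parent.
# Here index 0 is the root "Animal", whose parent points to itself.
names = ["Animal", "Fish", "Shark", "Cat", "Tiger",
         "Terrestrial animal", "Seafish", "Feline"]
z = [0, 0, 6, 7, 7, 0, 1, 5]  # e.g. Shark's parent is Seafish (index 6)

# Recover each node's children c_n from the parent indices.
children = {n: [] for n in range(len(names))}
for n, parent in enumerate(z):
    if n != parent:                  # skip the root's self-loop
        children[parent].append(n)

for n, kids in children.items():
    print(names[n], "->", [names[k] for k in kids])
```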

  15. Model Overview • Intuition: categories tend to be closely related to their parents and siblings – (text) hypernym-hyponym relation: shark → catshark – (visual) similarity: images of shark ⇔ images of ray • Method: induce features from distributed representations of images and text – images: deep convnet – text: word embeddings

  16. Taxonomy Induction Model • Notation: – c_n: child nodes of y_n – y_n^m ∈ c_n: the m-th child of y_n – h(·): consistency term depending on features – w: model weights to be learned • The model over parent indices factorizes into: a prior on the popularity (number of children) of each category, times the consistency of each child y_n^m with its parent y_n and siblings c_n \ y_n^m

  17. Taxonomy Induction Model • Looking into h: – h(y_n^m, y_n, c_n \ y_n^m) evaluates how consistent a parent-child group is, i.e. the child y_n^m with its parent y_n and siblings c_n \ y_n^m – The whole model is a factorization of the consistency terms of all local parent-child groups

  18. Model: Developing h • The consistency term is log-linear: h(y_n^m, y_n, c_n \ y_n^m) = exp(w^T f(y_n^m, y_n, c_n \ y_n^m)) – w: weight vector (to be learned) – f: feature vector of y_n^m with parent y_n and siblings c_n \ y_n^m (sketch below)
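Putting slides 16-18 together, a hedged sketch of the unnormalized score of a tree: each local parent-child group contributes exp(w^T f(·)), so the log-score is a sum of w·f terms. The feature function here is a stub that ignores the siblings, and the popularity prior from slide 16 is omitted for brevity; the real f concatenates the visual and text features of the following slides.

```python
import numpy as np

def f(child, parent, siblings):
    """Stub feature vector for a (child, parent, siblings) group.

    Placeholder only: returns pseudo-random features and ignores the
    siblings; the actual model uses the visual/text features described
    on the following slides.
    """
    rng = np.random.default_rng(hash((child, parent)) % 2**32)
    return rng.normal(size=5)

def log_score(z, w):
    """Unnormalized log p(z|y): sum of w.f over all parent-child groups."""
    children = {}
    for n, p in enumerate(z):
        if n != p:
            children.setdefault(p, []).append(n)
    total = 0.0
    for parent, kids in children.items():
        for c in kids:
            sibs = tuple(k for k in kids if k != c)
            total += w @ f(c, parent, sibs)
    return total

w = np.zeros(5)          # model weights, learned from gold taxonomies
z = [0, 0, 6, 7, 7, 0, 1, 5]
print(log_score(z, w))   # 0.0 with zero weights
```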

  19. Features: Developing f • Visual features: – Sibling similarity – Parent-child similarity – Parent prediction • Text features: – Parent prediction [Fu et al.] – Sibling similarity – Surface features [Bansal et al.]

  20. Features: Developing f • Visual features: sibling similarity (S-V1*) – Step 1: fit a Gaussian to the images of each category – Step 2: derive the pairwise similarity vissim(y_n, y_m) – Step 3: derive the groupwise similarity by averaging • S-V1 evaluates the visual similarity between siblings (sketch below) * S: sibling, V: visual
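A sketch of S-V1 under stated assumptions: the slide does not pin down the exact vissim measure, so diagonal Gaussians and an exp(-symmetric KL) similarity are our choices; the fake features stand in for convnet activations.

```python
import numpy as np

def fit_gaussian(feats):
    """Step 1: fit a diagonal Gaussian to a category's image features."""
    return feats.mean(axis=0), feats.var(axis=0) + 1e-6

def vissim(g1, g2):
    """Step 2: pairwise similarity; exp(-symmetric KL) is an assumption."""
    (m1, v1), (m2, v2) = g1, g2
    kl12 = 0.5 * np.sum(v1 / v2 + (m2 - m1) ** 2 / v2 - 1 + np.log(v2 / v1))
    kl21 = 0.5 * np.sum(v2 / v1 + (m1 - m2) ** 2 / v1 - 1 + np.log(v1 / v2))
    return np.exp(-0.5 * (kl12 + kl21))

def sibling_similarity(sib_feats):
    """Step 3: average pairwise similarity over all sibling pairs."""
    gs = [fit_gaussian(f) for f in sib_feats]
    sims = [vissim(gs[i], gs[j])
            for i in range(len(gs)) for j in range(i + 1, len(gs))]
    return float(np.mean(sims)) if sims else 0.0

rng = np.random.default_rng(0)
cats = [rng.normal(loc=i, size=(50, 8)) for i in range(3)]  # fake convnet feats
print(sibling_similarity(cats))
```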

  21. Features: Developing f • Visual features: parent-child similarity (PC-V1*) – Step 1: fit a Gaussian to the images of each child category – Step 2: fit a Gaussian to only the top-K images of the parent category (e.g. seafish vs. shark) – Steps 3-4: same as S-V1 (sketch below) * PC: parent-child, V: visual
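PC-V1 differs from S-V1 only in how the parent's Gaussian is fit. The slide does not specify how the top-K parent images are chosen; ranking by closeness to the parent's mean feature is our assumption. This reuses fit_gaussian and vissim from the S-V1 sketch above.

```python
import numpy as np

def parent_child_similarity(parent_feats, child_feats, k=20):
    # Step 1: Gaussian for the child category (as in S-V1).
    g_child = fit_gaussian(child_feats)
    # Step 2: Gaussian for only the top-K parent images; ranking by
    # distance to the parent mean is our assumption, not the paper's rule.
    mean = parent_feats.mean(axis=0)
    dists = np.linalg.norm(parent_feats - mean, axis=1)
    top_k = parent_feats[np.argsort(dists)[:k]]
    g_parent = fit_gaussian(top_k)
    # Steps 3-4: same similarity computation as S-V1.
    return vissim(g_parent, g_child)
```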

  22. Features: Developing f • Visual features: parent prediction (PC-V2*) – Step 1: learn a projection matrix that maps the mean image of a child category to the word embedding of its parent category – Step 2: calculate the distance between the projection and the candidate parent's embedding – Step 3: bin the distance into a feature vector (sketch below) * PC: parent-child, V: visual
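A sketch of PC-V2: learn a linear map from child mean-image features to parent word embeddings by least squares (as in the Fu-style sketch earlier), then turn the prediction error into a binned one-hot feature. The bin edges and the random stand-in data are placeholders.

```python
import numpy as np

def fit_projection(child_means, parent_embs):
    """Step 1: least-squares W so that child_mean @ W ~ parent embedding."""
    W, *_ = np.linalg.lstsq(child_means, parent_embs, rcond=None)
    return W

def pc_v2_feature(child_mean, parent_emb, W, edges=(0.5, 1.0, 2.0)):
    # Step 2: distance between the projected child and the candidate parent.
    d = np.linalg.norm(child_mean @ W - parent_emb)
    # Step 3: bin the distance into a one-hot feature vector.
    onehot = np.zeros(len(edges) + 1)
    onehot[np.digitize(d, edges)] = 1.0
    return onehot

rng = np.random.default_rng(0)
child_means = rng.normal(size=(10, 8))   # mean convnet feature per child
parent_embs = rng.normal(size=(10, 4))   # word embedding of each parent
W = fit_projection(child_means, parent_embs)
print(pc_v2_feature(child_means[0], parent_embs[0], W))
```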

  23. Features: Developing f • Text features: – Parent prediction [Fu et al.]: projection from child embedding to parent embedding – Sibling similarity: distance between word vectors – Surface features [Bansal et al.]: ends with (e.g. catshark is a sub-category of shark), LCS, capitalization, etc. (sketch below)
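A small sketch of the surface features; reading "LCS" as longest common substring is our interpretation of the slide.

```python
def surface_features(child, parent):
    """Surface cues for a candidate (child, parent) pair of category names."""
    c, p = child.lower(), parent.lower()
    ends_with = c.endswith(p)           # e.g. "catshark" ends with "shark"
    contains = p in c                   # parent name appears inside child name
    # Longest common substring length (simple O(|c|*|p|) DP).
    best = 0
    table = [[0] * (len(p) + 1) for _ in range(len(c) + 1)]
    for i in range(1, len(c) + 1):
        for j in range(1, len(p) + 1):
            if c[i - 1] == p[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                best = max(best, table[i][j])
    capitalized = child[:1].isupper() and parent[:1].isupper()
    return {"ends_with": ends_with, "contains": contains,
            "lcs": best, "both_capitalized": capitalized}

print(surface_features("catshark", "shark"))
# {'ends_with': True, 'contains': True, 'lcs': 5, 'both_capitalized': False}
```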

  24. Parameter Estimation • Inference: – Gibbs sampling (sketch below) • Learning: – Supervised learning from gold taxonomies in the training data – Gradient-descent-based maximum likelihood estimation • Output taxonomies: – Chu-Liu/Edmonds algorithm (maximum spanning tree)
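A hedged sketch of one Gibbs inference step: resample one parent index z[n] from its conditional while holding the rest fixed. log_score stands in for the model's factorized score (as in the earlier sketch); no acyclicity check is shown here, since the final tree is extracted separately with Chu-Liu/Edmonds.

```python
import numpy as np

def gibbs_step(z, n, log_score, rng):
    """Resample the parent z[n] of category n, holding all other z fixed."""
    candidates = [p for p in range(len(z)) if p != n]
    scores = []
    for p in candidates:
        z_try = list(z)
        z_try[n] = p
        scores.append(log_score(z_try))
    scores = np.array(scores)
    probs = np.exp(scores - scores.max())   # softmax over candidate parents
    probs /= probs.sum()
    z[n] = candidates[rng.choice(len(candidates), p=probs)]
    return z
```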

  25. Experiment Setup • Implementation: – Word vectors: Google word2vec – Convnet: VGG-16 • Evaluation metric: Ancestor-F1 = 2PR / (P + R), where P and R are ancestor precision and recall (sketch below) • Data: – Training set: ImageNet taxonomies
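Ancestor-F1, concretely: compare the sets of (ancestor, descendant) pairs in the predicted and gold trees, then take the harmonic mean of precision and recall. Parent-index encoding as in the model sketch, with the root's parent pointing to itself.

```python
def ancestor_pairs(z, root=0):
    """All (ancestor, descendant) pairs implied by parent indices z."""
    pairs = set()
    for n in range(len(z)):
        node = n
        while node != root:
            pairs.add((z[node], n))   # every node on the path up is an ancestor
            node = z[node]
    return pairs

def ancestor_f1(z_pred, z_gold, root=0):
    pred, gold = ancestor_pairs(z_pred, root), ancestor_pairs(z_gold, root)
    p = len(pred & gold) / len(pred) if pred else 0.0
    r = len(pred & gold) / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

z_gold = [0, 0, 6, 7, 7, 0, 1, 5]
z_pred = [0, 0, 1, 7, 7, 0, 1, 5]   # Shark attached to Fish instead of Seafish
print(round(ancestor_f1(z_pred, z_gold), 3))
```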

  26. Evaluation Results: Comparison to Baseline Methods • The embedding-based feature set (LV) is comparable to the state of the art • The full feature set (LVB) achieves the best results • L: language features – surface features – embedding features • V: visual features • B: Bansal2014 features – web n-grams etc. • E: embedding features

  27. Evaluation Results: How Much Do Visual Features Help? • Messages: – Visual similarity (S-V1, PC-V1) helps a lot – The complexity of the visual representation does not matter much

  28. Evaluation Results: Investigating PC-V1 • Images of a parent category are not necessarily all visually similar to images of a child category (e.g. seafish vs. shark), motivating the use of only the top-K parent images

  29. Evaluation Results: When and Where Do Visual Features Help? • Messages: – Shallow layers ↔ abstract categories ↔ text features more effective – Deep layers ↔ specific categories ↔ visual features more effective (figure: feature weights vs. depth in the taxonomy)

  30. Take-home Messages • Visual similarity helps taxonomy induction a lot – Sibling similarity – Parent-child similarity • Which features are more important? – Visual features are more indicative in near-leaf layers – Text features are more effective in near-root layers • Embedding features augment word-count features

  31. Thank You! Q & A

  32. Evaluation Results: Visualization
