multimodal knowledge graphs

Multimodal Knowledge Graphs: Generation Methods, Applications, and Challenges



  1. Multimodal Knowledge Graphs: Generation Methods, Applications, and Challenges. Shih-Fu Chang, Alireza Zareian, Hassan Akbari, Brian Chen, Svebor Karaman, Zhecan James Wang, and Haoxuan You (Columbia University); Prof. Heng Ji, Manling Li, Di Lu, and Spencer Whitehead (University of Illinois, Urbana-Champaign)

  2. Knowledge Graphs • Entities, events, relations, etc. Text IE example (graph nodes: Visit, Israel, Prince William): "The first-ever official visit by a British royal to Israel is underway. Prince William, the 36-year-old Duke of Cambridge and second in line to the throne, will meet with both Israeli and Palestinian leaders over the next three days."

  3. Knowledge Graphs • Entities, events, relations, etc. • Events describe what happens • Entities are characterized by the argument role they play in events. Text IE example (Visit event with Agent: Prince William, Destination: Israel): "The first-ever official visit by a British royal to Israel is underway. Prince William, the 36-year-old Duke of Cambridge and second in line to the throne, will meet with both Israeli and Palestinian leaders over the next three days."
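
The event structure on this slide can be sketched as a small data structure. This is an illustration only: the class names `Entity` and `Event` and the method `add_argument` are assumptions made for the example, not an API from the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str


@dataclass
class Event:
    trigger: str                                   # what happens, e.g. "Visit"
    arguments: dict = field(default_factory=dict)  # argument role -> Entity

    def add_argument(self, role: str, entity: Entity) -> None:
        """Attach an entity to this event under the given argument role."""
        self.arguments[role] = entity


# Slide example: Prince William (Agent) visits Israel (Destination).
visit = Event("Visit")
visit.add_argument("Agent", Entity("Prince William"))
visit.add_argument("Destination", Entity("Israel"))

for role, entity in visit.arguments.items():
    print(f"{visit.trigger} --{role}--> {entity.name}")
```

Keying arguments by role keeps the entity's function in the event explicit, which is the point the slide makes about argument roles.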

  4. Knowledge Graphs • Application: Question Answering, Reasoning, Hypothesis Verification and Discovery. Query: "Find recent visits of politicians to Israel." Answers come from the Text IE graph (Visit event with Agent: Prince William, Destination: Israel) extracted from: "The first-ever official visit by a British royal to Israel is underway. Prince William, the 36-year-old Duke of Cambridge and second in line to the throne, will meet with both Israeli and Palestinian leaders over the next three days."
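
A query like the one on this slide can be answered by filtering events on their trigger and argument roles. The sketch below is a minimal illustration; the function name `find_agents`, the tuple layout, and the second event (a made-up distractor) are assumptions, not part of the talk.

```python
# Each event is (trigger, {role: entity}). The first entry comes from the
# slide; the second is a made-up distractor for illustration only.
events = [
    ("Visit", {"Agent": "Prince William", "Destination": "Israel"}),
    ("Visit", {"Agent": "Person X", "Destination": "Country Y"}),
]


def find_agents(events, trigger, destination):
    """Return the agents of events with the given trigger and Destination."""
    return [
        args["Agent"]
        for trig, args in events
        if trig == trigger
        and args.get("Destination") == destination
        and "Agent" in args
    ]


answers = find_agents(events, "Visit", "Israel")
print(answers)
```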

  5. Knowledge Beyond Text • We communicate through multimedia • Our experiment shows 34% of news images contain event arguments that are not mentioned in text. Image example (labels: Fire, Stretcher): TransportPerson_Instrument = stretcher

  6. Why Multimodal? • Visual data contains complementary data used for: • Visual Illustration • Disambiguation • Additional Details. Figure labels: events Rally, Attack, and Person Transport; roles Agent, Instrument, Target, and Destination; entities Supporters, Protesters, Bus, Stone, and Wounded protester

  7. Challenges & Applications • Challenges: • Parsing images/videos to structures • Grounding event/entities across modalities • Extracting complementary multimodal arguments. Figure: Visual IE yields a scene graph and Text IE yields a text graph; the two feed a Multi-Modal Knowledge Graph, which feeds the Application
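
The third challenge, extracting complementary multimodal arguments, amounts to merging role fillers found in text with those found in images. The sketch below is one simple way to do that, under stated assumptions: the function name `merge_arguments`, the rule that text wins on conflict, and the text-side argument are all hypothetical. Only `Instrument = stretcher` comes from the deck (slide 5).

```python
def merge_arguments(text_args, visual_args):
    """Merge role -> entity maps from the two modalities.

    Assumption for this sketch: text wins when both modalities fill a
    role, and the image fills roles the text leaves empty.
    """
    merged = dict(visual_args)
    merged.update(text_args)
    # Roles that only the image supplied (the complementary arguments).
    visual_only = sorted(set(visual_args) - set(text_args))
    return merged, visual_only


text_args = {"Person": "protester"}        # hypothetical text extraction
visual_args = {"Instrument": "stretcher"}  # from the slide 5 image example

merged, visual_only = merge_arguments(text_args, visual_args)
print(merged, visual_only)
```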

  8. Challenge 1: Parsing Images to Scene Graphs • Extract structured representation of a scene • Entities and their semantic relationships. Figure label: Object Detection

  9. Parsing Images to Scene Graphs • Existing method • Extract object proposals • Contextualize features by RNN (or message passing) (Xu et al., CVPR 2017) • Classify all nodes and pairs of nodes • Limitations • Computationally exhaustive: O(n^2) in the number of proposals n (on the order of 100) • Difficult to model higher order relationships, e.g. "girl eating cake with fork" • Requires full supervision. Neural Motifs (Zellers, Yatskar, Thomson, Choi, CVPR 2018): one of the SOTA methods for scene graph generation
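
The quadratic cost named under Limitations follows from classifying every ordered pair of proposals. A short worked count, assuming 100 proposals and ordered (subject, object) pairs (the helper name `count_ordered_pairs` is made up for the example):

```python
from itertools import permutations


def count_ordered_pairs(n):
    """Number of ordered (subject, object) pairs among n proposals."""
    return n * (n - 1)


# Cross-check the formula by explicit enumeration on a small case.
assert count_ordered_pairs(5) == len(list(permutations(range(5), 2)))

pairs = count_ordered_pairs(100)
print(pairs)  # 9900
```

Each of those pairs needs its own relation classification, which is why the cost grows as O(n^2).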

  10. Reformulate as an Event-Centric Problem • Our work: Visual Semantic Parsing Network (Zareian et al., CVPR20) • Generalized formulation of scene graph generation • Entity-centric → bipartite representation of predicates & entities • Reduce computational complexity from O(n^2) to sub-quadratic • Model argument role relations beyond (subject, object), (agent, patient) relations. Figure: bipartite graph with entities Girl, Cake, Hand, and Fork; predicates eating, holding, and belongs; role edges agent, patient, and instrument

  11. Reformulate as an Event-Centric Problem • Our work: Visual Semantic Parsing Network (Zareian et al., CVPR20) • Generalized formulation of scene graph generation • Entity-centric → bipartite representation of predicates & entities • Reduce computational complexity from O(n^2) to sub-quadratic • Model argument role relations beyond (subject, object), (agent, patient) relations. Figure: bipartite graph with entities Girl, Cake, Hand, and Fork; predicates eating, holding, and belong; role edges agent, patient, and instrument
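
The bipartite representation can be illustrated with (predicate, role, entity) edges. This is a sketch under assumptions: the edge list is one plausible reading of the figure labels (the roles given to "belong" in particular are guesses), and the names `ROLE_EDGES`, `arguments_of`, and `candidate_edges` are made up for the example.

```python
# (predicate, role, entity) edges. An assumed reading of the slide's figure;
# the roles attached to "belong" are guesses made for illustration.
ROLE_EDGES = [
    ("eating", "agent", "Girl"),
    ("eating", "patient", "Cake"),
    ("eating", "instrument", "Fork"),
    ("holding", "agent", "Hand"),
    ("holding", "patient", "Fork"),
    ("belong", "agent", "Hand"),
    ("belong", "patient", "Girl"),
]


def arguments_of(predicate, edges=ROLE_EDGES):
    """Collect the role -> entity map of one predicate node."""
    return {role: entity for pred, role, entity in edges if pred == predicate}


def candidate_edges(num_entities, num_predicates, num_roles):
    """Entity-predicate-role candidates a bipartite model has to score."""
    return num_entities * num_predicates * num_roles


eating = arguments_of("eating")
print(eating)
```

Because "eating" is a node rather than an edge, it can carry three arguments at once, which a single (subject, object) pair cannot express. With a fixed number of predicate nodes and roles, `candidate_edges` grows linearly in the number of entities, which is one way to read the sub-quadratic claim.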

  12. Bipartite Embeddings for Entity & Predicate. Figure: an RPN with RoIAlign produces the entity node embeddings; a Trainable Predicate Embedding Bank supplies the predicate node embeddings

  13. Argument Role Prediction • Initialize entity and predicate nodes • Compute role-specific attention scores • Input: entity-predicate feature pairs • Output: scalar for each thematic role. Figure: FC layers score each entity-predicate pair for the roles agent, patient, and instrument
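
The scoring step described above can be sketched in plain Python. This is a minimal illustration, not the network from the talk: the layer is a single untrained fully connected layer with random weights, the sigmoid squashing is an assumption, and the names `make_fc` and `role_scores` are made up.

```python
import math
import random

ROLES = ["agent", "patient", "instrument"]


def make_fc(in_dim, out_dim, rng):
    """Random weights for one fully connected layer (untrained, for shape only)."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def role_scores(entity_vec, predicate_vec, fc):
    """Score one entity-predicate pair: one scalar in (0, 1) per role."""
    pair = entity_vec + predicate_vec  # concatenate the two feature lists
    return [sigmoid(sum(w * x for w, x in zip(row, pair))) for row in fc]


rng = random.Random(0)
dim = 4
entities = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
predicates = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2)]
fc = make_fc(2 * dim, len(ROLES), rng)

# scores[p][e] holds len(ROLES) attention scores for predicate p, entity e.
scores = [[role_scores(e, p, fc) for e in entities] for p in predicates]
print(len(scores), len(scores[0]), len(scores[0][0]))
```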

  14. Role-Dependent Message Passing • Bi-directional message passing • Entities → Roles → Predicates. Figure: role-specific FC layers (agent, patient, instrument) transform the messages passed between entity nodes and predicate nodes

  15. Role-Dependent Message Passing • Bi-directional message passing • Entities → Roles → Predicates. Figure: role-specific FC layers (agent, patient, instrument) transform the messages passed between entity nodes and predicate nodes
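
One round of role-dependent, bi-directional message passing can be sketched as follows. This is an illustration under assumptions, not the talk's exact update rule: messages are added to the current node state, each role has one linear map shared by both directions, and the names `send`, `matvec`, and the toy weights are made up.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def send(sources, targets, attn, role_fcs):
    """Update each target with attention-weighted, role-specific messages.

    attn[r][t][s] is the score that source s is linked to target t by role r.
    """
    updated = []
    for t, target in enumerate(targets):
        total = list(target)  # assumption: additive update of the old state
        for r, W in enumerate(role_fcs):
            for s, source in enumerate(sources):
                message = matvec(W, source)
                total = [a + attn[r][t][s] * m for a, m in zip(total, message)]
        updated.append(total)
    return updated


# Toy setup: two entities, one predicate, two roles with simple weights.
entities = [[1, 0], [0, 1]]
predicates = [[0, 0]]
role_fcs = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]   # role 0: identity, role 1: 2x
attn_e2p = [[[1, 0]], [[0, 1]]]                   # [role][predicate][entity]

# Entities -> predicates, then predicates -> entities with transposed scores.
new_predicates = send(entities, predicates, attn_e2p, role_fcs)
attn_p2e = [
    [[attn_e2p[r][p][e] for p in range(len(predicates))]
     for e in range(len(entities))]
    for r in range(len(role_fcs))
]
new_entities = send(new_predicates, entities, attn_p2e, role_fcs)
print(new_predicates, new_entities)
```

The figure suggests separate FC layers per role and per direction; the sketch reuses one set of weights for both directions only to stay short.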

  16. Visual Semantic Parsing Network • Bi-directional message passing • Repeat for u iterations • Classify nodes and edges. Figure: after message passing, FC layers classify the entity nodes (Girl, Cake, Hand, Fork) and predicate nodes (eating, holding, belong), and the role edges (agent, patient, instrument) are binarized
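
The final step, classifying nodes and binarizing role edges, can be sketched with two small helpers. All numbers below are made up for illustration, and the 0.5 threshold and the names `binarize` and `classify` are assumptions rather than values from the talk.

```python
def binarize(scores, threshold=0.5):
    """Keep an edge (1) when its role score reaches the threshold."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def classify(class_scores, labels):
    """Pick the label with the highest score for one node."""
    best = max(range(len(labels)), key=lambda i: class_scores[i])
    return labels[best]


# Made-up agent scores: rows are predicates (eating, holding),
# columns are entities (Girl, Cake, Hand, Fork).
agent_scores = [
    [0.9, 0.1, 0.2, 0.1],
    [0.3, 0.0, 0.8, 0.1],
]
agent_edges = binarize(agent_scores)

entity_labels = ["Girl", "Cake", "Hand", "Fork"]
node_label = classify([0.1, 0.2, 0.1, 0.6], entity_labels)
print(agent_edges, node_label)
```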

  17. Visual Semantic Parsing Network • Weakly supervised training • Unknown alignment between output and ground truth graphs. Figure: output nodes are aligned with ground truth nodes (entities Girl, Cake, Hand, Fork; predicates eating, holding, belong; roles agent, patient, instrument) and trained with three loss terms, L_E, L_R, and L_P
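
Because the alignment between output nodes and ground truth nodes is unknown, training first has to choose one. The slide does not say how; the sketch below swaps in an exhaustive search over assignments, which is only practical for tiny graphs. The probabilities are made up and the name `best_alignment` is hypothetical.

```python
from itertools import permutations


def best_alignment(pred_probs, gt_labels):
    """Match each ground truth node to a distinct output node.

    Returns (alignment, score), where alignment[j] is the index of the
    output node matched to ground truth node j, and score is the summed
    probability the output nodes give to their matched labels.
    """
    best, best_score = None, float("-inf")
    for perm in permutations(range(len(pred_probs)), len(gt_labels)):
        score = sum(pred_probs[i][label] for i, label in zip(perm, gt_labels))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score


# Made-up class probabilities for three output nodes.
pred_probs = [
    {"Girl": 0.1, "Cake": 0.7, "Fork": 0.2},
    {"Girl": 0.6, "Cake": 0.1, "Fork": 0.3},
    {"Girl": 0.2, "Cake": 0.2, "Fork": 0.6},
]
gt_labels = ["Girl", "Cake", "Fork"]

alignment, score = best_alignment(pred_probs, gt_labels)
print(alignment)  # (1, 0, 2)
```

Once an alignment is fixed, the classification losses can be computed between each ground truth node and its matched output node.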

  18. Visual Semantic Parsing Network

  19. Incorporate External KB (Zareian et al., ECCV20) • Link concepts in scene graphs to external knowledge bases such as ConceptNet • Pass messages over bridges between scene graphs and external graphs • Refine bridges between graphs
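
The idea of bridges between a scene graph and an external graph can be sketched with a toy knowledge base. Everything here is an assumption made for illustration: the entries in `TOY_KB` were not retrieved from ConceptNet, bridges are built by plain name matching, and the bridge refinement step from the slide is not modeled.

```python
# Toy stand-in for an external knowledge base: concept -> [(relation, concept)].
# These entries are illustrative and were not retrieved from ConceptNet.
TOY_KB = {
    "fork": [("UsedFor", "eating")],
    "cake": [("IsA", "dessert")],
}


def build_bridges(scene_nodes, kb):
    """Bridge each scene-graph node to a KB concept with the same name."""
    return {node: node.lower() for node in scene_nodes if node.lower() in kb}


def gather_messages(bridges, kb):
    """Collect, per scene node, the KB facts reachable over its bridge."""
    return {node: list(kb[concept]) for node, concept in bridges.items()}


scene_nodes = ["Girl", "Cake", "Fork", "Hand"]
bridges = build_bridges(scene_nodes, TOY_KB)
messages = gather_messages(bridges, TOY_KB)
print(bridges)
print(messages)
```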

  20. Scene Graph Examples of GB-Net. Panels compare Baseline (KERN) with Ours (GB-Net)

  21. Challenge 2: Text-Visual Grounding (Akbari et al., CVPR19) • Localize text query in image • Bridge visual and text knowledge graphs • Without using predefined classifiers • Challenges • Sensitive to domain variations • Abstract concept not groundable
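
Localizing a text query without predefined classifiers is commonly framed as comparing a query embedding with region embeddings in a shared space. The sketch below shows that idea with cosine similarity; it is not the method from the cited paper, the vectors are made up, and the names `cosine` and `ground` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ground(query_vec, region_vecs):
    """Return the name of the region most similar to the query, plus all scores."""
    sims = {name: cosine(query_vec, vec) for name, vec in region_vecs.items()}
    best = max(sims, key=sims.get)
    return best, sims


# Made-up embeddings in a shared 2-D space.
query = [1.0, 0.0]
regions = {"box_a": [0.9, 0.1], "box_b": [0.0, 1.0]}

best_region, sims = ground(query, regions)
print(best_region)  # box_a
```

A query for an abstract concept would have no region with a clearly higher similarity, which matches the last challenge listed on the slide.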
