
Distant Supervision and MultiR (Happy Mittal)



  1. Distant Supervision and MultiR
     Happy Mittal

  2. We will discuss
     • Distant Supervision [Mintz et al., 2009]
     • MultiR [Hoffmann et al., 2011]

  3. Relation Instance Extraction
     Example: "Hrithik Roshan’s movie Kaabil features a love affair between two blind people." yields Actor(Hrithik Roshan, Kaabil).
     • Fully Supervised Learning
       o Uses labeled corpora of sentences.
       o Suffers from small datasets and domain bias.
     • Unsupervised Learning
       o Clusters patterns to identify relations.
       o Large corpora are available.
       o Cannot give names to the relations it identifies.
     • Bootstrap Learning
       o Starts from initial seed patterns and facts.
       o Generates more facts and patterns.
       o Suffers from semantic drift.
     • Distant Supervision
       o Combines the advantages of the above approaches.

  4. Distant Supervision [Mintz et al., 2009]
     • Knowledge base (e.g. Freebase):
         Person       | Birth Place
         Edwin Hubble | Marshfield
         …            | …
     • Sentences (e.g. Wikipedia articles)
     • Knowledge base + sentences → generate training data. How?
     • Assumption: given a fact r(e1, e2), every sentence containing entities e1 and e2 expresses relation r.
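The labeling assumption on slide 4 can be sketched in a few lines of Python. This is only an illustration: the knowledge base is a toy dictionary, entity matching is plain substring search (a real pipeline would use named-entity recognition), and the second and third sentences are made up to show how the heuristic behaves.

```python
from typing import Dict, List, Tuple

# Toy knowledge base: (entity1, entity2) -> relation (hypothetical stand-in for Freebase).
KB: Dict[Tuple[str, str], str] = {
    ("Edwin Hubble", "Marshfield"): "Birthplace",
}

def label_sentences(sentences: List[str],
                    kb: Dict[Tuple[str, str], str]) -> List[Tuple[str, str, str, str]]:
    """Label every sentence that contains both entities of a KB fact with that fact's relation."""
    examples = []
    for sentence in sentences:
        for (e1, e2), relation in kb.items():
            # Assumption: substring match stands in for entity recognition.
            if e1 in sentence and e2 in sentence:
                examples.append((sentence, e1, e2, relation))
    return examples

sentences = [
    "Astronomer Edwin Hubble was born in Marshfield , Missouri.",
    "Edwin Hubble visited Marshfield .",   # made-up sentence: labeled Birthplace anyway (noise)
    "Marshfield is a city in Missouri.",   # only one entity: not labeled
]
training = label_sentences(sentences, KB)
for example in training:
    print(example)
```

The made-up second sentence is labeled Birthplace even though it does not express that relation, which is the kind of noise the strong assumption introduces.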

  5. Distant Supervision (Generating training data)
     • Astronomer Edwin Hubble was born in Marshfield , Missouri.
     • Features:
       • Lexical Features
         o Entity types of both entities.
           NE1 | NE2 | Label
           PER | LOC | Birthplace

  6. Distant Supervision (Generating training data)
     • Astronomer Edwin Hubble was born in Marshfield , Missouri.
     • Features:
       • Lexical Features
         o Words between the entities and their POS tags.
           NE1 | Middle                           | NE2 | Label
           PER | [was/VERB born/VERB in/CLOSED]   | LOC | Birthplace

  7. Distant Supervision (Generating training data)
     • Astronomer Edwin Hubble was born in Marshfield , Missouri.
     • Features:
       • Lexical Features
         o Window of k words to the left and right, k ∈ {0,1,2}
           Left window    | NE1 | Middle                         | NE2 | Right window | Label
           []             | PER | [was/VERB born/VERB in/CLOSED] | LOC | []           | Birthplace
           [Astronomer]   | PER | [was/VERB born/VERB in/CLOSED] | LOC | [,]          | Birthplace
           [#,Astronomer] | PER | [was/VERB born/VERB in/CLOSED] | LOC | [,Missouri]  | Birthplace
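A minimal sketch of how the three window features on slide 7 could be produced from a tagged sentence. The function name, the token-span convention (half-open token indices), and the POS tags of words outside the middle span are assumptions made for illustration; only the tags for "was", "born" and "in" come from the slide.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, coarse POS tag)

def lexical_features(tokens: List[Token], e1: Tuple[int, int], e2: Tuple[int, int],
                     e1_type: str, e2_type: str, label: str, max_k: int = 2):
    """Build one conjunctive lexical feature per window size k in 0..max_k.

    e1 and e2 are half-open token spans (start, end), with e1 before e2.
    """
    (s1, t1), (s2, t2) = e1, e2
    middle = " ".join(f"{w}/{p}" for w, p in tokens[t1:s2])
    rows = []
    for k in range(max_k + 1):
        left = [w for w, _ in tokens[max(0, s1 - k):s1]]
        left = ["#"] * (k - len(left)) + left       # pad with '#' past the sentence start
        right = [w for w, _ in tokens[t2:t2 + k]]
        right = right + ["#"] * (k - len(right))    # pad with '#' past the sentence end
        rows.append((left, e1_type, middle, e2_type, right, label))
    return rows

# Tags outside the middle span are placeholders; they are not used by the features.
tokens = [("Astronomer", "NOUN"), ("Edwin", "NOUN"), ("Hubble", "NOUN"),
          ("was", "VERB"), ("born", "VERB"), ("in", "CLOSED"),
          ("Marshfield", "NOUN"), (",", "CLOSED"), ("Missouri", "NOUN"), (".", "CLOSED")]

rows = lexical_features(tokens, e1=(1, 3), e2=(6, 7),
                        e1_type="PER", e2_type="LOC", label="Birthplace")
for row in rows:
    print(row)
```

Each row is kept as a single conjunctive feature, so two sentences share a feature only when the whole row matches.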

  8. Distant Supervision (Generating training data)
     • Astronomer Edwin Hubble was born in Marshfield , Missouri.
     • Features:
       • Syntactic Features
         o Dependency path between the entities.
         o Window node in the dependency path.

  9. Distant Supervision
     • Strong assumption: if a fact r(e1, e2) is seen in the KB, then every sentence containing e1 and e2 expresses relation r.
     • Relaxed assumption: at least one sentence containing e1 and e2 expresses relation r [Riedel et al., 2010].

  10. Relaxing the assumption [Riedel et al., 2010]
      • Relation variable: Y ∈ R (here Y = Founded)
      • Relation mention variables: Z1, Z2 ∈ {0,1}
        o X1: "Steve Jobs founded Apple", Z1 = 1
        o X2: "Steve Jobs is the CEO of Apple", Z2 = 0
      • Model the joint distribution $P(Y = y, Z = z \mid x)$

  11. Relaxing the assumption [Riedel et al., 2010]
      • Relation variable: Y ∈ R (here Y = Founded)
      • Relation mention variables: Z1, Z2 ∈ {0,1}
        o X1: "Steve Jobs founded Apple", Z1 = 1
        o X2: "Steve Jobs is the CEO of Apple", Z2 = 0
      • Model the joint distribution $P(Y = y, Z = z \mid x)$
      • Problem: does not allow overlapping relations.
      • MultiR solves that problem.

  12. MultiR [Hoffmann et al., 2011]
      • Relation variables $Y^r \in \{0,1\}$, one per relation r (capture aggregate-level prediction): …, CEO-of, Founded
      • Relation mention variables $Z_i \in R$ (capture sentence-level prediction):
        o X1: "Steve Jobs founded Apple", Z1 = Founded
        o X2: "Steve Jobs is the CEO of Apple", Z2 = CEO-of
        o X3: "Steve Jobs left Apple", Z3 = None

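  13. MultiR [Hoffmann et al., 2011]
      • Probability distribution
        $$P(Y = y, Z = z \mid x) = \frac{1}{Z_x} \prod_{r} \phi^{\text{join}}(y^r, z) \prod_{i} \phi^{\text{extract}}(z_i, x_i)$$
        o $\phi^{\text{join}}(y^r, z) = 1$ if at least one $z_i$ mentions relation $y^r$.
        o $\phi^{\text{extract}}(z_i, x_i)$ uses the [Mintz et al.] features.

  14. MultiR [Hoffmann et al., 2011]
      • Parameter learning
        $$P(Y = y, Z = z \mid x; \theta) = \frac{1}{Z_x} \prod_{r} \phi^{\text{join}}(y^r, z) \prod_{i} \phi^{\text{extract}}(z_i, x_i)$$
        $$P(Y = y, Z = z \mid x; \theta) = \frac{1}{Z_x} \prod_{r} \phi^{\text{join}}(y^r, z) \prod_{i} \exp\Big(\sum_{j} \theta_j \, \phi_j(z_i, x_i)\Big)$$
        o $\phi^{\text{join}}(y^r, z) = 1$ if at least one $z_i$ mentions relation $y^r$.
        o $\phi_j(z_i, x_i)$ are the [Mintz et al.] features.
      • Treat the Z variables as latent variables.
      • Interested in maximizing the likelihood, where i now indexes training entity pairs:
        $$L(\theta) = \prod_{i} P(y_i \mid x_i; \theta) = \prod_{i} \sum_{z} P(y_i, z \mid x_i; \theta)$$
        $$l(\theta) = \sum_{i} \log \sum_{z} P(y_i, z \mid x_i; \theta)$$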
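To make the sum over the latent z on slide 14 concrete, here is a small brute-force sketch that enumerates every sentence-level assignment for one entity pair. The scores are hypothetical stand-ins for $\sum_j \theta_j \phi_j(z_i, x_i)$, and the join factor is read as a deterministic OR ($y^r = 1$ exactly when some $z_i = r$), which is an assumption about how the slide's $\phi^{\text{join}}$ is meant.

```python
import itertools
import math

RELATIONS = ["Founded", "CEO-of"]
LABELS = RELATIONS + ["None"]

# Hypothetical extract scores sum_j theta_j * phi_j(z_i, x_i), one dict per sentence.
scores = [
    {"Founded": 2.0, "CEO-of": 0.5, "None": 0.0},
    {"Founded": 0.3, "CEO-of": 1.5, "None": 0.0},
]

def implied_y(z):
    """Assumed join factor (deterministic OR): y^r = 1 iff some z_i equals r."""
    return tuple(int(r in z) for r in RELATIONS)

def unnormalized(z):
    """Product of the extract factors, exp(sum of per-sentence scores)."""
    return math.exp(sum(s[label] for s, label in zip(scores, z)))

def marginal(y):
    """P(Y = y | x): sum over all z consistent with y, divided by the partition function."""
    all_z = list(itertools.product(LABELS, repeat=len(scores)))
    partition = sum(unnormalized(z) for z in all_z)
    consistent = sum(unnormalized(z) for z in all_z if implied_y(z) == tuple(y))
    return consistent / partition

# Contribution of this entity pair to l(theta) if the KB says both relations hold.
log_lik = math.log(marginal((1, 1)))
print(round(marginal((1, 1)), 4), round(log_lik, 4))
```

Enumeration costs $|R|^{|S|}$ terms per entity pair, so it is only usable on toy inputs like this one.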

  15. MultiR [Hoffmann et al., 2011]
      • Parameter learning
        o Assumption of online training.

  16. MultiR [Hoffmann et al., 2011]
      • Parameter learning
        o Difficult to compute; compute the argmax instead.

  17. MultiR [Hoffmann et al., 2011]
      • Learning algorithm
        o Need to do two inferences.
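Slides 15 to 17 only keep the captions of the learning algorithm, so the following is a hedged sketch rather than the slides' own pseudocode. It assumes a perceptron-style online update: predict (y, z) with the current weights, and if the predicted relations disagree with the knowledge base, find the best z that agrees with it and move the weights toward that assignment. The feature representation (a bag of words per sentence), the brute-force second inference, and the toy data are all assumptions made for illustration.

```python
import itertools
from collections import defaultdict

NONE = "None"

def score(theta, feats, label):
    """Linear score of one sentence (a list of feature names) under one label."""
    return sum(theta[(f, label)] for f in feats)

def infer_all(theta, bag, labels):
    """Inference 1: best label per sentence; y is the set of non-None labels used."""
    z = [max(labels, key=lambda l: score(theta, feats, l)) for feats in bag]
    return {l for l in z if l != NONE}, z

def infer_given_y(theta, bag, y, labels):
    """Inference 2 by brute force: each relation in y must label at least one sentence."""
    allowed = [l for l in labels if l in y] + [NONE]
    feasible = [z for z in itertools.product(allowed, repeat=len(bag)) if y <= set(z)]
    if not feasible:
        return None  # fewer sentences than relations: skip this bag
    return list(max(feasible, key=lambda z: sum(score(theta, f, l) for f, l in zip(bag, z))))

def train(data, labels, epochs=5):
    theta = defaultdict(float)
    for _ in range(epochs):
        for bag, y_true in data:                      # online: one entity pair at a time
            y_pred, z_pred = infer_all(theta, bag, labels)
            if y_pred == y_true:
                continue
            z_star = infer_given_y(theta, bag, y_true, labels)
            if z_star is None:
                continue
            for feats, good, bad in zip(bag, z_star, z_pred):
                for f in feats:                       # theta += phi(x, z*) - phi(x, z')
                    theta[(f, good)] += 1.0
                    theta[(f, bad)] -= 1.0
    return theta

labels = ["Founded", "CEO-of", NONE]
data = [  # hypothetical bags: (sentences as feature lists, relations in the KB)
    ([["founded"]], {"Founded"}),
    ([["founded"], ["is", "CEO", "of"]], {"Founded", "CEO-of"}),
    ([["left"]], set()),
]
theta = train(data, labels)
print(infer_all(theta, [["is", "CEO", "of"]], labels))
```

Ties are broken by the order of `labels`, which is why the toy run settles on the intended sentence labels; with a single ambiguous bag the update could equally well settle on the swapped assignment.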

  18-19. MultiR Inference 1: $\arg\max_{y,z} P(y, z \mid x; \theta)$
      • Relation variables $Y^r \in \{0,1\}$ (capture aggregate-level prediction): Capital = ?, CEO-of = ?, Founder = ?
      • Relation mention variables $Z_i \in R$ (capture sentence-level prediction): Z1 = ?, Z2 = ?, Z3 = ?
      • Sentences:
        o X1: "Apple was founded by Steve Jobs"
        o X2: "Steve Jobs founded Apple"
        o X3: "Steve Jobs is the CEO of Apple"
      • Sentence-level scores:
                  | X1   | X2   | X3
          Founder | 10.5 | 12.5 | 4.5
          CEO-of  | 8.9  | 8.7  | 8.5
          Capital | 6.3  | 4.5  | 0.5

  20. MultiR Inference 1: $\arg\max_{y,z} P(y, z \mid x; \theta)$
      • Pick the highest-scoring relation for each sentence: Z1 = Founder, Z2 = Founder, Z3 = CEO-of
      • Relation variables still unknown: Capital = ?, CEO-of = ?, Founder = ?

  21. MultiR Inference 1: $\arg\max_{y,z} P(y, z \mid x; \theta)$
      • Z1 = Founder, Z2 = Founder, Z3 = CEO-of
      • Relation variables follow from the mentions: Capital = 0, CEO-of = 1, Founder = 1
      • Cost: $O(|R|\,|S|)$
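The first inference can be reproduced directly from the score table on the slides. In this sketch the table columns are taken to be X1, X2, X3 in that order, which is an assumption about the slide layout; the function name is illustrative.

```python
# Sentence-level scores from the slides; columns assumed to be X1, X2, X3.
SCORES = {
    "Founder": [10.5, 12.5, 4.5],
    "CEO-of":  [8.9, 8.7, 8.5],
    "Capital": [6.3, 4.5, 0.5],
}

def infer_y_and_z(scores):
    """Pick the best relation for each sentence, then set y^r = 1 iff some z_i = r."""
    n_sentences = len(next(iter(scores.values())))
    z = [max(scores, key=lambda r: scores[r][i]) for i in range(n_sentences)]
    y = {r: int(r in z) for r in scores}
    return y, z

y, z = infer_y_and_z(SCORES)
print(z)  # ['Founder', 'Founder', 'CEO-of']
print(y)  # {'Founder': 1, 'CEO-of': 1, 'Capital': 0}
```

Every score is read exactly once, which is where the $O(|R|\,|S|)$ cost on slide 21 comes from.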

  22. MultiR Inference 2: $\arg\max_{z} P(z \mid x, y; \theta)$
      • Relation variables $Y^r \in \{0,1\}$ are given: Capital = 0, CEO-of = 1, Founder = 1
      • Relation mention variables $Z_i \in R$ are unknown: Z1 = ?, Z2 = ?, Z3 = ?
      • Sentences:
        o X1: "Apple was founded by Steve Jobs"
        o X2: "Steve Jobs founded Apple"
        o X3: "Steve Jobs is the CEO of Apple"
      • Sentence-level scores:
                  | X1   | X2   | X3
          Founder | 10.5 | 12.5 | 4.5
          CEO-of  | 8.9  | 8.7  | 8.5
          Capital | 6.3  | 4.5  | 0.5

  23. MultiR Inference 2: $\arg\max_{z} P(z \mid x, y; \theta)$
      • Use the potentials as edge weights between relation variables and mention variables (ignore edges with y = 0):
        o Founder: 10.5 (Z1), 12.5 (Z2), 4.5 (Z3)
        o CEO-of: 8.9 (Z1), 8.7 (Z2), 8.5 (Z3)

  24-25. MultiR Inference 2: $\arg\max_{z} P(z \mid x, y; \theta)$
      • Variant of the weighted edge cover problem:
        o Each y has at least one edge.
        o Each z has exactly one edge.

  26. MultiR Inference 2: $\arg\max_{z} P(z \mid x, y; \theta)$
      • Solution: Z1 = Founder, Z2 = Founder, Z3 = CEO-of
      • Exact solution: $O(|V|\,(|E| + |V| \log |V|))$
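For a graph this small, the constrained assignment can be checked by brute force. The sketch below is not the exact edge-cover algorithm whose complexity the slide quotes; it simply enumerates every assignment in which each sentence takes exactly one active relation and every active relation is used at least once. The column order X1, X2, X3 and the function name are assumptions.

```python
import itertools

# Sentence-level scores from the slides; columns assumed to be X1, X2, X3.
SCORES = {
    "Founder": [10.5, 12.5, 4.5],
    "CEO-of":  [8.9, 8.7, 8.5],
    "Capital": [6.3, 4.5, 0.5],
}
Y = {"Founder": 1, "CEO-of": 1, "Capital": 0}

def infer_z_given_y(scores, y):
    """Best z such that each z_i is an active relation and each active relation is covered."""
    active = [r for r in scores if y[r] == 1]          # ignore edges with y = 0
    n_sentences = len(next(iter(scores.values())))
    best, best_total = None, float("-inf")
    for z in itertools.product(active, repeat=n_sentences):
        if set(z) != set(active):                      # some active relation has no edge
            continue
        total = sum(scores[r][i] for i, r in enumerate(z))
        if total > best_total:
            best, best_total = list(z), total
    return best, best_total

z, total = infer_z_given_y(SCORES, Y)
print(z, total)  # ['Founder', 'Founder', 'CEO-of'] 31.5
```

The enumeration is exponential in the number of sentences, so it serves only to confirm the slide's answer on this example.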
