
KNOWLEDGE GRAPH CONSTRUCTION - Jay Pujara, University of Maryland - PowerPoint PPT Presentation



  1. KNOWLEDGE GRAPH CONSTRUCTION. Jay Pujara, University of Maryland, College Park / Max Planck Institute. 7/9/2015

  2. Can Computers Create Knowledge? Internet → Knowledge. The Internet is a massive source of publicly available information.

  3. Computers + Knowledge =

  4. What does it mean to create knowledge? What do we mean by knowledge?

  5. Defining the Questions • Extraction • Representation • Reasoning and Inference

  6. Defining the Questions • Extraction • Representation • Reasoning and Inference

  7. A Revised Knowledge-Creation Diagram: Internet (massive source of publicly available information) → Extraction (cutting-edge IE methods) → Knowledge Graph (KG) (structured representation of entities, their labels, and the relationships between them)

  8. Knowledge Graphs in the wild

  9. Motivating Problem: Real Challenges. Internet → Extraction (difficult!) → Knowledge Graph (noisy! contains many errors and inconsistencies)

  10. NELL: The Never-Ending Language Learner • Large-scale IE project (Carlson et al., AAAI10) • Lifelong learning: aims to “read the web” • Ontology of known labels and relations • Knowledge base contains millions of facts

  11. Examples of NELL errors

  12. Entity co-reference errors Kyrgyzstan has many variants: • Kyrgystan • Kyrgistan • Kyrghyzstan • Kyrgzstan • Kyrgyz Republic
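Misspelled variants like these can often be flagged with simple string similarity. A minimal sketch using Python's difflib as a stand-in for the richer entity-resolution features real systems use:

```python
from difflib import SequenceMatcher

# Misspelled variants of "Kyrgyzstan" from the slide; scoring with
# difflib is an illustrative stand-in for real ER features.
canonical = "Kyrgyzstan"
variants = ["Kyrgystan", "Kyrgistan", "Kyrghyzstan", "Kyrgzstan"]

def similarity(a, b):
    """Ratio of matching characters, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

scores = {v: similarity(canonical, v) for v in variants}
```

Each misspelling scores high, but "Kyrgyz Republic" would not: catching that alias needs joint reasoning over labels and relations, not string matching alone, which is part of the motivation for the joint models discussed later.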

  13. Missing and spurious labels: Kyrgyzstan is labeled both a bird and a country

  14. Missing and spurious relations Kyrgyzstan’s location is ambiguous – Kazakhstan, Russia and US are included in possible locations

  15. Violations of ontological knowledge • Equivalence of co-referent entities (sameAs) • SameEntity(Kyrgyzstan, Kyrgyz Republic) • Mutual exclusion (disjointWith) of labels • MUT(bird, country) • Selectional preferences (domain/range) of relations • RNG(countryLocation, continent) Enforcing these constraints requires jointly considering multiple extractions across documents
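To make the constraints concrete, here is a toy post-hoc checker in Python; the dictionaries and the violations helper are illustrative assumptions, not the joint model the slides describe (which enforces constraints during inference rather than after the fact):

```python
# Toy checker for the slide's ontological constraints.
# The data structures and helper names are illustrative assumptions.
mut = {("bird", "country")}               # disjointWith label pairs
rng = {"countryLocation": "continent"}    # required range per relation

labels = {"Kyrgyzstan": {"bird", "country"},
          "Kazakhstan": {"country"}}
rels = [("Kyrgyzstan", "Kazakhstan", "countryLocation")]

def violations(labels, rels):
    found = []
    # MUT: no entity may carry two mutually exclusive labels.
    for ent, ls in labels.items():
        for a, b in mut:
            if a in ls and b in ls:
                found.append(("MUT", ent, a, b))
    # RNG: the object of a relation must carry the relation's range label.
    for subj, obj, rel in rels:
        need = rng.get(rel)
        if need and need not in labels.get(obj, set()):
            found.append(("RNG", rel, obj, need))
    return found
```

This flags both errors from the NELL examples: the bird/country conflict and a countryLocation whose object is not a continent.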

  16. Examples where joint models have succeeded • Information extraction • ER+Segmentation: Poon & Domingos, AAAI07 • SRL: Srikumar & Roth, EMNLP11 • Within-doc extraction: Singh et al., AKBC13 • Social and communication networks • Fusion: Eldardiry & Neville, MLG10 • EMailActs: Carvalho & Cohen, SIGIR05 • GraphID: Namata et al., KDD11

  17. GRAPH IDENTIFICATION

  18. Slides courtesy Getoor, Namata, Kok. Transformation: Input Graph (available but inappropriate for analysis) → Graph Identification → Output Graph (appropriate for further analysis)

  19. Slides courtesy Getoor, Namata, Kok. Motivation: Different Networks. Communication Network (nodes: email addresses, e.g., nsmith@msn.com, mjones@email.com, robert@email.com; edges: communication; node attributes: words) vs. Organizational Network (nodes: people, e.g., Neil Smith, Mary Jones, Robert Lee; edges: manages; node labels: title, one of CEO, Manager, Assistant, Programmer)

  20. Slides courtesy Getoor, Namata, Kok. Graph Identification: the input graph (an email communication network over addresses such as nsmith@msn.com) is transformed into the output graph (a social network over people such as Neil Smith, labeled CEO, Manager, Assistant, or Programmer)

  21. Slides courtesy Getoor, Namata, Kok. Graph Identification: Input Graph (email communication network) → Output Graph (social network). What's involved?

  22. Slides courtesy Getoor, Namata, Kok. Graph Identification. What's involved? • Entity Resolution (ER): map input graph nodes to output graph nodes

  23. Slides courtesy Getoor, Namata, Kok. Graph Identification. What's involved? • Entity Resolution (ER): map input graph nodes to output graph nodes • Link Prediction (LP): predict the existence of edges in the output graph

  24. Slides courtesy Getoor, Namata, Kok. Graph Identification. What's involved? • Entity Resolution (ER): map input graph nodes to output graph nodes • Link Prediction (LP): predict the existence of edges in the output graph • Node Labeling (NL): infer the labels of nodes in the output graph

  25. Slides courtesy Getoor, Namata, Kok. Problem Dependencies (Input Graph → ER, LP, NL) • Most work looks at these tasks in isolation • In graph identification they are: • Evidence-Dependent: inference depends on the observed input graph (e.g., ER depends on the input graph) • Intra-Dependent: inferences within a task are dependent (e.g., NL predictions depend on other NL predictions) • Inter-Dependent: inferences across tasks are dependent (e.g., LP depends on ER and NL predictions)

  26. KNOWLEDGE GRAPH IDENTIFICATION Pujara, Miao, Getoor, Cohen, ISWC 2013 (best student paper)

  27. (Pujara et al., ISWC13) Motivating Problem (revised): Internet → (large-scale IE) → Extraction Graph (noisy) → (joint reasoning) → Knowledge Graph

  28. (Pujara et al., ISWC13) Knowledge Graph Identification. Problem: the noisy extraction graph is not yet a knowledge graph. Solution: Knowledge Graph Identification (KGI): Extraction Graph → KGI → Knowledge Graph • Performs graph identification: entity resolution, node labeling, link prediction • Enforces ontological constraints • Incorporates multiple uncertain sources

  29. (Pujara et al., ISWC13) Illustration of KGI: Extractions Uncertain Extractions: .5: Lbl(Kyrgyzstan, bird) .7: Lbl(Kyrgyzstan, country) .9: Lbl(Kyrgyz Republic, country) .8: Rel(Kyrgyz Republic, Bishkek, hasCapital)
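These four weighted atoms are easy to carry around as plain records; a minimal sketch (the class and field names are my own, not from the authors' implementation):

```python
from dataclasses import dataclass

# Minimal containers for the slide's uncertain extractions.
# Names and fields are illustrative, not the authors' code.
@dataclass(frozen=True)
class Lbl:
    entity: str
    label: str
    conf: float

@dataclass(frozen=True)
class Rel:
    subject: str
    obj: str
    relation: str
    conf: float

extractions = [
    Lbl("Kyrgyzstan", "bird", 0.5),
    Lbl("Kyrgyzstan", "country", 0.7),
    Lbl("Kyrgyz Republic", "country", 0.9),
    Rel("Kyrgyz Republic", "Bishkek", "hasCapital", 0.8),
]
```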

  30. (Pujara et al., ISWC13) Illustration of KGI: Ontology + ER. Uncertain Extractions: .5: Lbl(Kyrgyzstan, bird); .7: Lbl(Kyrgyzstan, country); .9: Lbl(Kyrgyz Republic, country); .8: Rel(Kyrgyz Republic, Bishkek, hasCapital). Extraction Graph: Kyrgyzstan has Lbl edges to bird and country; Kyrgyz Republic has an Lbl edge to country and a Rel(hasCapital) edge to Bishkek.

  31. (Pujara et al., ISWC13) Illustration of KGI: Ontology + ER (Annotated). Uncertain Extractions: .5: Lbl(Kyrgyzstan, bird); .7: Lbl(Kyrgyzstan, country); .9: Lbl(Kyrgyz Republic, country); .8: Rel(Kyrgyz Republic, Bishkek, hasCapital). Ontology: Dom(hasCapital, country); Mut(country, bird). Entity Resolution: SameEnt(Kyrgyz Republic, Kyrgyzstan). The extraction graph is annotated with a SameEnt edge between Kyrgyzstan and Kyrgyz Republic and with the ontological constraints on the labels.

  32. (Pujara et al., ISWC13) Illustration of KGI (Annotated). Uncertain Extractions: .5: Lbl(Kyrgyzstan, bird); .7: Lbl(Kyrgyzstan, country); .9: Lbl(Kyrgyz Republic, country); .8: Rel(Kyrgyz Republic, Bishkek, hasCapital). Ontology: Dom(hasCapital, country); Mut(country, bird). Entity Resolution: SameEnt(Kyrgyz Republic, Kyrgyzstan). After Knowledge Graph Identification: Kyrgyzstan and Kyrgyz Republic are resolved to a single entity with Lbl country and Rel(hasCapital) to Bishkek; the spurious bird label is dropped.

  33. (Pujara et al., ISWC13) Modeling Knowledge Graph Identification

  34. (Pujara et al., ISWC13) Viewing KGI as a probabilistic graphical model over random variables: Lbl(Kyrgyzstan, bird), Lbl(Kyrgyzstan, country), Lbl(Kyrgyz Republic, bird), Lbl(Kyrgyz Republic, country), Rel(hasCapital, Kyrgyzstan, Bishkek), Rel(hasCapital, Kyrgyz Republic, Bishkek)

  35. (Pujara et al., ISWC13) Background: Probabilistic Soft Logic (PSL) (Broecheler et al., UAI10; Kimmig et al., NIPS-ProbProg12) • Templating language for hinge-loss MRFs, very scalable! • Model specified as a collection of logical formulas, e.g., SameEnt(E1, E2) ˜∧ Lbl(E1, L) ⇒ Lbl(E2, L) • Truth values of atoms are relaxed to the [0,1] interval • Truth values of formulas are derived from the Lukasiewicz t-norm: p ˜∧ q = max(0, p + q − 1); p ˜∨ q = min(1, p + q); ˜¬p = 1 − p; p ˜⇒ q = min(1, 1 − p + q)
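The Lukasiewicz operators are simple enough to sketch directly. Applying them to the slide's truth values (SameEnt at 1.0, the two country labels at 0.7 and 0.9) shows the SameEnt rule is fully satisfied on this evidence; the variable names here are my own:

```python
# Lukasiewicz logic operators used by PSL, over truth values in [0, 1].
def l_and(p, q):      # soft conjunction: max(0, p + q - 1)
    return max(0.0, p + q - 1.0)

def l_or(p, q):       # soft disjunction: min(1, p + q)
    return min(1.0, p + q)

def l_not(p):         # soft negation: 1 - p
    return 1.0 - p

def l_implies(p, q):  # soft implication: fully satisfied when q >= p
    return min(1.0, 1.0 - p + q)

# SameEnt(Kyrgyzstan, Kyrgyz Republic) & Lbl(Kyrgyzstan, country)
#   => Lbl(Kyrgyz Republic, country)
same_ent, lbl_e1, lbl_e2 = 1.0, 0.7, 0.9
body = l_and(same_ent, lbl_e1)            # 0.7
satisfaction = l_implies(body, lbl_e2)    # min(1, 1 - 0.7 + 0.9) = 1.0
```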

  36. Soft Logic Tutorial: Rules to Groundings • Given a database of evidence, we can convert rule templates to instances (grounding) • Rules are grounded by substituting literals into formulas: SameEnt(E1, E2) ˜∧ Lbl(E1, L) ⇒ Lbl(E2, L) grounds to SameEnt(Kyrgyzstan, Kyrgyz Republic) ˜∧ Lbl(Kyrgyzstan, country) ⇒ Lbl(Kyrgyz Republic, country) • The soft logic interpretation assigns a “satisfaction” value to each ground rule
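A grounding loop like the one described can be sketched as follows; the evidence dictionary and the helper are illustrative, not PSL's actual machinery:

```python
from itertools import product

# A tiny evidence database of soft atoms: (predicate, args) -> truth value.
# Constants follow the slide's example; the helper is a sketch, not PSL.
evidence = {
    ("SameEnt", ("Kyrgyzstan", "Kyrgyz Republic")): 1.0,
    ("Lbl", ("Kyrgyzstan", "country")): 0.7,
    ("Lbl", ("Kyrgyz Republic", "country")): 0.9,
}

entities = ["Kyrgyzstan", "Kyrgyz Republic"]
labels = ["country"]

def ground_rule():
    """Substitute constants for variables in
    SameEnt(E1, E2) & Lbl(E1, L) => Lbl(E2, L), yielding ground rules
    as (body atom, body atom, head atom) triples."""
    groundings = []
    for e1, e2, lbl in product(entities, entities, labels):
        if e1 == e2:
            continue  # SameEnt with itself is trivial
        groundings.append((("SameEnt", (e1, e2)),
                           ("Lbl", (e1, lbl)),
                           ("Lbl", (e2, lbl))))
    return groundings
```

Each ground rule's atoms can then be looked up in the evidence (with missing atoms treated as unknowns to infer) and scored with the Lukasiewicz operators from the previous slide.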
