KNOWLEDGE GRAPH CONSTRUCTION Jay Pujara University of Maryland, College Park Max Planck Institute 7/9/2015
Can Computers Create Knowledge? Internet (a massive source of publicly available information) → Knowledge
Computers + Knowledge =
What does it mean to create knowledge? What do we mean by knowledge?
Defining the Questions • Extraction • Representation • Reasoning and Inference
A Revised Knowledge-Creation Diagram: Internet (massive source of publicly available information) → Extraction (cutting-edge IE methods) → Knowledge Graph (KG): a structured representation of entities, their labels, and the relationships between them
Knowledge Graphs in the wild
Motivating Problem: Real Challenges. Internet → Extraction (difficult! noisy!) → Knowledge Graph: contains many errors and inconsistencies
NELL: The Never-Ending Language Learner • Large-scale IE project (Carlson et al., AAAI10) • Lifelong learning: aims to “read the web” • Ontology of known labels and relations • Knowledge base contains millions of facts
Examples of NELL errors
Entity co-reference errors Kyrgyzstan has many variants: • Kyrgystan • Kyrgistan • Kyrghyzstan • Kyrgzstan • Kyrgyz Republic
Missing and spurious labels Kyrgyzstan is labeled a bird and a country
Missing and spurious relations Kyrgyzstan’s location is ambiguous – Kazakhstan, Russia and US are included in possible locations
Violations of ontological knowledge • Equivalence of co-referent entities (sameAs) • SameEntity(Kyrgyzstan, Kyrgyz Republic) • Mutual exclusion (disjointWith) of labels • MUT(bird, country) • Selectional preferences (domain/range) of relations • RNG(countryLocation, continent) Enforcing these constraints requires jointly considering multiple extractions across documents
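These constraints translate directly into code. Below is a minimal sketch (Python, not NELL's actual machinery) that checks all three constraint types against the Kyrgyzstan extractions; the hasCapital → city range constraint is a hypothetical stand-in for the slide's RNG(countryLocation, continent), chosen to fit the running example.

```python
# Minimal constraint checker over candidate extractions -- an
# illustrative sketch, not NELL's or KGI's actual implementation.

labels = {("Kyrgyzstan", "bird"), ("Kyrgyzstan", "country"),
          ("Kyrgyz Republic", "country")}
relations = {("Kyrgyz Republic", "Bishkek", "hasCapital")}
same_entity = {("Kyrgyzstan", "Kyrgyz Republic")}   # sameAs
mutex = {("bird", "country")}                       # disjointWith (MUT)
range_of = {"hasCapital": "city"}                   # hypothetical RNG

# Mutual exclusion: no entity may carry two disjoint labels.
for e1, l1 in labels:
    for e2, l2 in labels:
        if e1 == e2 and (l1, l2) in mutex:
            print(f"MUT violation: {e1} is both {l1} and {l2}")

# sameAs: co-referent entities should agree on their labels.
for a, b in same_entity:
    for e, l in labels:
        partner = b if e == a else a if e == b else None
        if partner and (partner, l) not in labels:
            print(f"sameAs gap: {e} is {l} but {partner} is not")

# Selectional preference: relation arguments must carry required labels.
for subj, obj, rel in relations:
    required = range_of.get(rel)
    if required and (obj, required) not in labels:
        print(f"RNG violation: {obj} lacks label {required} ({rel})")
```

Note how every check spans multiple extractions: this is exactly why the constraints must be enforced jointly rather than one extraction at a time.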
Examples where joint models have succeeded • Information extraction • ER+Segmentation: Poon & Domingos, AAAI07 • SRL: Srikumar & Roth, EMNLP11 • Within-doc extraction: Singh et al., AKBC13 • Social and communication networks • Fusion: Eldardiry & Neville, MLG10 • EMailActs: Carvalho & Cohen, SIGIR05 • GraphID: Namata et al., KDD11
GRAPH IDENTIFICATION
Slides courtesy Getoor, Namata, Kok. Transformation: Graph Identification. Input Graph (available but inappropriate for analysis) → Output Graph (appropriate for further analysis)
Slides courtesy Getoor, Namata, Kok. Motivation: Different Networks. Communication Network: nodes are email addresses, edges are communications, node attributes are words. Organizational Network: nodes are people, edges are "manages" relationships, node labels are titles (CEO, Manager, Assistant, Programmer).
Slides courtesy Getoor, Namata, Kok. Graph Identification: Input Graph (email communication network) → Output Graph (social network, with node labels CEO, Manager, Assistant, Programmer). What's involved?
• Entity Resolution (ER): map input graph nodes to output graph nodes
• Link Prediction (LP): predict the existence of edges in the output graph
• Node Labeling (NL): infer the labels of nodes in the output graph
Slides courtesy Getoor, Namata, Kok. Problem Dependencies (ER, LP, NL over the input graph). Most work looks at these tasks in isolation. In graph identification they are:
• Evidence-dependent: inferences depend on the observed input graph (e.g., ER depends on the input graph)
• Intra-dependent: inferences within a task depend on each other (e.g., NL predictions depend on other NL predictions)
• Inter-dependent: inferences across tasks depend on each other (e.g., LP depends on ER and NL predictions)
KNOWLEDGE GRAPH IDENTIFICATION Pujara, Miao, Getoor, Cohen, ISWC 2013 (best student paper)
(Pujara et al., ISWC13) Motivating Problem (revised): Internet → large-scale IE → Extraction Graph (noisy) → joint reasoning → Knowledge Graph
(Pujara et al., ISWC13) Knowledge Graph Identification. Problem: turn a noisy extraction graph into a knowledge graph, i.e., Knowledge Graph = KGI(Extraction Graph). Solution: Knowledge Graph Identification (KGI):
• performs graph identification: entity resolution, node labeling, link prediction
• enforces ontological constraints
• incorporates multiple uncertain sources
(Pujara et al., ISWC13) Illustration of KGI: Extractions Uncertain Extractions: .5: Lbl(Kyrgyzstan, bird) .7: Lbl(Kyrgyzstan, country) .9: Lbl(Kyrgyz Republic, country) .8: Rel(Kyrgyz Republic, Bishkek, hasCapital)
(Pujara et al., ISWC13) Illustration of KGI: Ontology + ER. Uncertain extractions: .5: Lbl(Kyrgyzstan, bird); .7: Lbl(Kyrgyzstan, country); .9: Lbl(Kyrgyz Republic, country); .8: Rel(Kyrgyz Republic, Bishkek, hasCapital). The extraction graph links Kyrgyzstan to the labels bird and country, Kyrgyz Republic to the label country, and Kyrgyz Republic to Bishkek via Rel(hasCapital).
(Pujara et al., ISWC13) Illustration of KGI: Ontology + ER (annotated). The same extraction graph, now annotated with: Ontology: Dom(hasCapital, country), Mut(country, bird). Entity resolution: SameEnt(Kyrgyz Republic, Kyrgyzstan).
(Pujara et al., ISWC13) Illustration of KGI (annotated). Combining the uncertain extractions, the ontology (Dom(hasCapital, country), Mut(country, bird)), and entity resolution (SameEnt(Kyrgyz Republic, Kyrgyzstan)). After knowledge graph identification: Kyrgyzstan and Kyrgyz Republic resolve to a single entity labeled country, with Rel(hasCapital) to Bishkek.
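To make the joint interaction concrete, here is a toy, hand-rolled resolution of this illustration in Python. It is emphatically not the inference KGI performs (KGI solves a joint optimization over all facts at once, described in the following slides); it only shows the two forces at work: SameEnt pools evidence across co-referent entities, and Mut forces a choice between disjoint labels.

```python
# Toy resolution of the running example -- an illustrative sketch,
# not KGI's actual joint inference.

extractions = {("Kyrgyzstan", "bird"): 0.5,
               ("Kyrgyzstan", "country"): 0.7,
               ("Kyrgyz Republic", "country"): 0.9}
same_ent = [("Kyrgyzstan", "Kyrgyz Republic")]
mut = [("bird", "country")]

# SameEnt: propagate each label's best score to co-referent entities.
pooled = dict(extractions)
for a, b in same_ent:
    for (e, l), s in extractions.items():
        other = b if e == a else a if e == b else None
        if other is not None:
            pooled[(other, l)] = max(pooled.get((other, l), 0.0), s)

# Mut: where an entity carries two disjoint labels, keep the stronger.
for l1, l2 in mut:
    for e in {e for (e, _) in pooled}:
        s1, s2 = pooled.get((e, l1), 0.0), pooled.get((e, l2), 0.0)
        if s1 and s2:
            pooled.pop((e, l1) if s1 < s2 else (e, l2))

print(pooled)
# {('Kyrgyzstan', 'country'): 0.9, ('Kyrgyz Republic', 'country'): 0.9}
```

The bird label loses in both places because the pooled country evidence (0.9) dominates, matching the "after KGI" picture above.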
(Pujara et al., ISWC13) Modeling Knowledge Graph Identification
(Pujara et al., ISWC13) Viewing KGI as a probabilistic graphical model: one random variable per candidate fact, e.g. Lbl(Kyrgyzstan, bird), Lbl(Kyrgyzstan, country), Lbl(Kyrgyz Republic, bird), Lbl(Kyrgyz Republic, country), Rel(hasCapital, Kyrgyzstan, Bishkek), Rel(hasCapital, Kyrgyz Republic, Bishkek)
(Pujara et al., ISWC13) Background: Probabilistic Soft Logic (PSL) (Broecheler et al., UAI10; Kimmig et al., NIPS-ProbProg12)
• Templating language for hinge-loss MRFs; very scalable!
• Model specified as a collection of logical formulas, e.g. $\mathit{SameEnt}(E_1, E_2) \,\tilde{\wedge}\, \mathit{Lbl}(E_1, L) \Rightarrow \mathit{Lbl}(E_2, L)$
• Truth values of atoms are relaxed to the $[0,1]$ interval
• Truth values of formulas are derived from the Lukasiewicz t-norm:
  $p \,\tilde{\wedge}\, q = \max(0,\, p + q - 1)$
  $p \,\tilde{\vee}\, q = \min(1,\, p + q)$
  $\tilde{\neg}\, p = 1 - p$
  $p \,\tilde{\Rightarrow}\, q = \min(1,\, 1 - p + q)$
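The four operators above are one-liners; here is an executable Python rendering, applied to the SameEnt rule with confidence values from the running example (the 1.0 for SameEnt is an assumed value for illustration).

```python
# Lukasiewicz relaxations of the logical connectives.
# Truth values live in [0, 1]; at {0, 1} they reduce to Boolean logic.

def l_and(p, q):       # p ~AND q
    return max(0.0, p + q - 1.0)

def l_or(p, q):        # p ~OR q
    return min(1.0, p + q)

def l_not(p):          # ~NOT p
    return 1.0 - p

def l_implies(p, q):   # p ~=> q, identical to l_or(l_not(p), q)
    return min(1.0, 1.0 - p + q)

# SameEnt(E1,E2) ~AND Lbl(E1, country) => Lbl(E2, country)
# with SameEnt = 1.0, Lbl(E1, country) = 0.7, Lbl(E2, country) = 0.9:
body = l_and(1.0, 0.7)        # 0.7
print(l_implies(body, 0.9))   # 1.0 -- the rule is fully satisfied
```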
Soft Logic Tutorial: Rules to Groundings
• Given a database of evidence, we can convert rule templates to instances (grounding)
• Rules are grounded by substituting literals into formulas:
  $\mathit{SameEnt}(E_1, E_2) \,\tilde{\wedge}\, \mathit{Lbl}(E_1, L) \Rightarrow \mathit{Lbl}(E_2, L)$
  grounds to
  $\mathit{SameEnt}(\text{Kyrgyzstan}, \text{Kyrgyz Republic}) \,\tilde{\wedge}\, \mathit{Lbl}(\text{Kyrgyzstan}, \text{country}) \Rightarrow \mathit{Lbl}(\text{Kyrgyz Republic}, \text{country})$
• The soft logic interpretation assigns a "satisfaction" value to each ground rule
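A minimal grounding sketch under the simplifying assumption that the evidence database is a Python dictionary of soft truth values (PSL's real grounding happens inside its runtime): it substitutes constants into the SameEnt template and scores the resulting ground rule. The returned distance to satisfaction, max(0, body - head), is the hinge penalty that gives hinge-loss MRFs their name.

```python
# Grounding a PSL rule template against a toy evidence database --
# an illustrative sketch, not PSL's actual grounding machinery.

def l_and(p, q):
    return max(0.0, p + q - 1.0)

def l_implies(p, q):
    return min(1.0, 1.0 - p + q)

# Soft truth values of ground atoms.
atoms = {("SameEnt", "Kyrgyzstan", "Kyrgyz Republic"): 1.0,
         ("Lbl", "Kyrgyzstan", "country"): 0.7,
         ("Lbl", "Kyrgyz Republic", "country"): 0.9}

def ground_same_ent_rule(e1, e2, label):
    """Instantiate SameEnt(E1,E2) ~AND Lbl(E1,L) => Lbl(E2,L) with the
    given constants; return (satisfaction, distance to satisfaction)."""
    body = l_and(atoms.get(("SameEnt", e1, e2), 0.0),
                 atoms.get(("Lbl", e1, label), 0.0))
    head = atoms.get(("Lbl", e2, label), 0.0)
    return l_implies(body, head), max(0.0, body - head)

print(ground_same_ent_rule("Kyrgyzstan", "Kyrgyz Republic", "country"))
# (1.0, 0.0) -- fully satisfied, zero hinge penalty
```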