KNOWLEDGE GRAPH IDENTIFICATION
Jay Pujara¹, Hui Miao¹, Lise Getoor¹, William Cohen²
¹ University of Maryland, College Park, US  ² Carnegie Mellon University
International Semantic Web Conference, 10/25/2013
Overview
Problem: Build a knowledge graph from millions of noisy extractions.
Approach: Knowledge Graph Identification reasons jointly over all facts in the knowledge graph.
Method: Use probabilistic soft logic to easily specify models and efficiently optimize them.
Results: State-of-the-art performance on real-world datasets, producing knowledge graphs with millions of facts.
CHALLENGES IN KNOWLEDGE GRAPH CONSTRUCTION
Motivating Problem: New Opportunities
Internet → Extraction → Knowledge Graph (KG)
• Internet: massive source of publicly available information
• Extraction: cutting-edge IE methods
• Knowledge Graph: structured representation of entities, their labels, and the relationships between them
Motivating Problem: Real Challenges
Internet → Extraction (difficult!) → Knowledge Graph (noisy! contains many errors and inconsistencies)
NELL: The Never-Ending Language Learner • Large-scale IE project (Carlson et al., 2010) • Lifelong learning: aims to “read the web” • Ontology of known labels and relations • Knowledge base contains millions of facts
Examples of NELL errors
Entity co-reference errors Kyrgyzstan has many variants: • Kyrgystan • Kyrgistan • Kyrghyzstan • Kyrgzstan • Kyrgyz Republic
Missing and spurious labels Kyrgyzstan is labeled a bird and a country
Missing and spurious relations Kyrgyzstan’s location is ambiguous – Kazakhstan, Russia and US are included in possible locations
Violations of ontological knowledge
• Equivalence of co-referent entities (sameAs)
  • SameEnt(Kyrgyzstan, Kyrgyz Republic)
• Mutual exclusion (disjointWith) of labels
  • Mut(bird, country)
• Selectional preferences (domain/range) of relations
  • Rng(countryLocation, continent)
Enforcing these constraints requires jointly considering multiple extractions
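A minimal sketch (with illustrative data and our own function names, not part of the system described here) of what jointly checking such constraints over multiple extractions looks like:

```python
# Candidate facts with extractor confidences (illustrative values)
labels = {("Kyrgyzstan", "bird"): 0.5,
          ("Kyrgyzstan", "country"): 0.7,
          ("Kyrgyz Republic", "country"): 0.9}
relations = {("Kyrgyz Republic", "Bishkek", "hasCapital"): 0.8}

# Ontological knowledge and entity resolution
mutex = {("bird", "country")}                      # disjointWith
domain = {"hasCapital": "country"}                 # domain of relation
same_entity = {("Kyrgyzstan", "Kyrgyz Republic")}  # sameAs

def violations():
    found = []
    # Mutual exclusion: an entity cannot carry two disjoint labels
    for (l1, l2) in mutex:
        for (e, l) in labels:
            if l == l1 and (e, l2) in labels:
                found.append(("MUT", e, l1, l2))
    # Domain: the subject of a relation must carry the domain label.
    # SameEnt makes this joint: a label on either co-referent entity counts.
    for (e1, e2, r) in relations:
        dom = domain.get(r)
        if dom is None:
            continue
        aliases = {e1} | {a for pair in same_entity for a in pair
                          if e1 in pair}
        if not any((a, dom) in labels for a in aliases):
            found.append(("DOM", e1, r, dom))
    return found

print(violations())  # [('MUT', 'Kyrgyzstan', 'bird', 'country')]
```

Note that the domain check only passes because entity resolution lets Kyrgyz Republic inherit the country label through its co-referent alias; each constraint alone touches several extractions at once.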
KNOWLEDGE GRAPH IDENTIFICATION
Motivating Problem (revised)
Internet → Large-scale IE → (noisy) Extraction Graph → Joint Reasoning → Knowledge Graph
Knowledge Graph Identification
Problem: Knowledge Graph ≠ Extraction Graph
Solution: Knowledge Graph Identification (KGI)
• Performs graph identification:
  • entity resolution
  • collective classification
  • link prediction
• Enforces ontological constraints
• Incorporates multiple uncertain sources
Illustration of KGI: Extractions
Uncertain Extractions:
.5: Lbl(Kyrgyzstan, bird)
.7: Lbl(Kyrgyzstan, country)
.9: Lbl(Kyrgyz Republic, country)
.8: Rel(Kyrgyz Republic, Bishkek, hasCapital)

Illustration of KGI: Extraction Graph
[Figure: extraction graph over the uncertain extractions above, connecting Kyrgyzstan, Kyrgyz Republic, and Bishkek to the labels bird and country via Lbl and Rel(hasCapital) edges]

Illustration of KGI: Ontology + ER (Annotated)
Ontology:
Dom(hasCapital, country)
Mut(country, bird)
Entity Resolution:
SameEnt(Kyrgyz Republic, Kyrgyzstan)
[Figure: the same extraction graph annotated with the SameEnt, Dom, and Mut constraints]

Illustration of KGI (Annotated)
After Knowledge Graph Identification:
[Figure: resolved knowledge graph in which Kyrgyzstan and Kyrgyz Republic are merged, labeled country, with Rel(hasCapital) linking to Bishkek]
MODELING KNOWLEDGE GRAPH IDENTIFICATION
Viewing KGI as a probabilistic graphical model
[Figure: graphical model over the random variables Lbl(Kyrgyzstan, bird), Lbl(Kyrgyzstan, country), Lbl(Kyrgyz Republic, bird), Lbl(Kyrgyz Republic, country), Rel(hasCapital, Kyrgyzstan, Bishkek), and Rel(hasCapital, Kyrgyz Republic, Bishkek)]
Background: Probabilistic Soft Logic (PSL)
• Templating language for hinge-loss MRFs, very scalable!
• Model specified as a collection of logical formulas
  SameEnt(E1, E2) ∧̃ Lbl(E1, L) ⇒ Lbl(E2, L)
• Uses a soft-logic formulation
  • Truth values of atoms relaxed to the [0,1] interval
  • Truth values of formulas derived from the Łukasiewicz t-norm
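For concreteness, a small sketch of the Łukasiewicz relaxation of the Boolean connectives (function names are ours, not PSL's API):

```python
# Łukasiewicz soft-logic operators over [0, 1] truth values,
# as used by PSL to relax Boolean formulas.

def l_and(a, b):
    """Łukasiewicz t-norm: a ∧ b."""
    return max(0.0, a + b - 1.0)

def l_or(a, b):
    """Łukasiewicz t-conorm: a ∨ b."""
    return min(1.0, a + b)

def l_not(a):
    """Negation: ¬a."""
    return 1.0 - a

def l_implies(a, b):
    """Residual implication: a ⇒ b."""
    return min(1.0, 1.0 - a + b)

# Truth value of SameEnt(E1,E2) ∧ Lbl(E1,L) ⇒ Lbl(E2,L) for one grounding:
same_ent, lbl_e1, lbl_e2 = 1.0, 0.7, 0.9
print(l_implies(l_and(same_ent, lbl_e1), lbl_e2))  # 1.0 (rule satisfied)
```

With these operators a formula's truth value is a piecewise-linear function of its atoms, which is what makes the resulting inference problem convex.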
Background: PSL Rules to Distributions
• Rules are grounded by substituting constants into formulas:
  w_EL : SameEnt(Kyrgyzstan, Kyrgyz Republic) ∧̃ Lbl(Kyrgyzstan, country) ⇒ Lbl(Kyrgyz Republic, country)
• Each ground rule has a weighted distance to satisfaction derived from the formula's truth value:
  P(G | E) = (1/Z) exp( −Σ_{r∈R} w_r φ_r(G) )
• The PSL program can be interpreted as a joint probability distribution over all variables in the knowledge graph, conditioned on the extractions
Background: Finding the best knowledge graph
• MPE inference solves max_G P(G | E) to find the best KG
• In PSL, inference is solved by convex optimization
• Efficient: running time scales with O(|R|)
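To make the objective concrete, a toy sketch that finds the MPE assignment for two variables by minimizing the weighted sum of each ground rule's distance to satisfaction, d_r = max(0, body − head), over a coarse grid. The real PSL system solves this by convex optimization, not grid search, and the weights and rules below are invented for illustration:

```python
import itertools

# Two unknowns: truth values of Lbl(Kyrgyzstan, country) and
# Lbl(Kyrgyz Republic, country), with SameEnt(·,·) fixed at 1.0.
def objective(x1, x2):
    cost = 0.0
    # Candidate-extraction rules: extractor confidence ⇒ graph fact
    cost += 1.0 * max(0.0, 0.7 - x1)
    cost += 1.0 * max(0.0, 0.9 - x2)
    # SameEnt ∧ Lbl(E1, L) ⇒ Lbl(E2, L), both directions, weight 2;
    # body truth under Łukasiewicz logic is max(0, same_ent + x - 1)
    body12 = max(0.0, 1.0 + x1 - 1.0)
    body21 = max(0.0, 1.0 + x2 - 1.0)
    cost += 2.0 * max(0.0, body12 - x2)
    cost += 2.0 * max(0.0, body21 - x1)
    return cost

grid = [i / 20 for i in range(21)]
best = min(itertools.product(grid, grid), key=lambda p: objective(*p))
print(best)  # (0.9, 0.9): co-reference pulls both labels into agreement
```

Note how the entity-resolution rules lift Lbl(Kyrgyzstan, country) from its extraction score of 0.7 up to 0.9 to agree with its co-referent entity.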
PSL Rules for the KGI Model
PSL Rules: Uncertain Extractions
w_CR-T : CandRel_T(E1, E2, R) ⇒ Rel(E1, E2, R)
w_CL-T : CandLbl_T(E, L) ⇒ Lbl(E, L)
• CandRel_T, CandLbl_T: predicates representing uncertain relation/label extractions from extractor T
• Rel, Lbl: relations and labels in the knowledge graph
• w_CR-T, w_CL-T: weights for source T
PSL Rules: Entity Resolution
• The SameEnt predicate captures confidence that entities are co-referent
• Rules require co-referent entities to have the same labels and relations
• Creates an equivalence class of co-referent entities
PSL Rules: Ontology
Inverse:
w_O : Inv(R, S) ∧̃ Rel(E1, E2, R) ⇒ Rel(E2, E1, S)
Selectional Preference:
w_O : Dom(R, L) ∧̃ Rel(E1, E2, R) ⇒ Lbl(E1, L)
w_O : Rng(R, L) ∧̃ Rel(E1, E2, R) ⇒ Lbl(E2, L)
Subsumption:
w_O : Sub(L, P) ∧̃ Lbl(E, L) ⇒ Lbl(E, P)
w_O : RSub(R, S) ∧̃ Rel(E1, E2, R) ⇒ Rel(E1, E2, S)
Mutual Exclusion:
w_O : Mut(L1, L2) ∧̃ Lbl(E, L1) ⇒ ¬̃ Lbl(E, L2)
w_O : RMut(R, S) ∧̃ Rel(E1, E2, R) ⇒ ¬̃ Rel(E1, E2, S)
Adapted from Jiang et al., ICDM 2012
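As an illustration of how such a template turns into ground rules, a hypothetical sketch of grounding the mutual-exclusion rule (the data, weight, and function names are invented, not taken from the system above):

```python
# Grounding Mut(L1, L2) ∧ Lbl(E, L1) ⇒ ¬Lbl(E, L2): the template is
# instantiated once per matching substitution of constants.
mut = [("bird", "country")]
entities = ["Kyrgyzstan", "Kyrgyz Republic"]
w_O = 100.0  # illustrative weight for ontology rules

def mut_distance(lbl1, lbl2):
    """Distance to satisfaction of Lbl(e,L1) ⇒ ¬Lbl(e,L2):
    under Łukasiewicz logic this is max(0, lbl1 + lbl2 - 1)."""
    return max(0.0, lbl1 + lbl2 - 1.0)

ground_rules = [(w_O, e, l1, l2) for (l1, l2) in mut for e in entities]
print(len(ground_rules))  # 2 groundings: one per entity
```

Each grounding contributes its weighted distance to satisfaction to the objective, so disjoint labels on the same entity are penalized only when their truth values sum above 1.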
EVALUATION
Two Evaluation Datasets

                             LinkedBrainz                       NELL
Description                  Community-supplied data about      Real-world IE system extracting
                             musical artists, labels, and       general facts from the WWW
                             creative works
Noise                        Realistic synthetic noise          Imperfect extractors and
                                                                ambiguous web pages
Candidate Facts              810K                               1.3M
Unique Labels and Relations  27                                 456
Ontological Constraints      49                                 67.9K
LinkedBrainz dataset for KGI
Mapping to FRBR/FOAF ontology:
DOM   rdfs:domain
RNG   rdfs:range
INV   owl:inverseOf
SUB   rdfs:subClassOf
RSUB  rdfs:subPropertyOf
MUT   owl:disjointWith
[Figure: ontology fragment over mo:Release, mo:Label, mo:Record, mo:MusicalArtist, mo:Track, mo:Signal, mo:SoloMusicArtist, and mo:MusicGroup, with properties mo:label, mo:record, mo:track, mo:published_as, foaf:maker, and foaf:made]
Adding noise to LinkedBrainz
Add realistic noise to LinkedBrainz data:

Error Type    Erroneous Data
Co-reference  User misspells artist
Label         User swaps artist and album fields
Relation      User omits or adds spurious albums for artist
Reliability   Gaussian noise on truth value of information
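A sketch of such a noise process under our own assumptions about corruption rates and mechanics (the actual procedure and rates used for the dataset may differ):

```python
import random

def corrupt(fact, truth, albums, rate=0.1, sigma=0.1):
    """fact = (artist, album, relation); returns a possibly-corrupted copy.
    Rates and corruption mechanics are illustrative assumptions."""
    artist, album, rel = fact
    r = random.random()
    if r < rate:
        # Co-reference error: misspell the artist name
        artist = artist[:-1] if len(artist) > 1 else artist + "x"
    elif r < 2 * rate:
        # Label error: swap artist and album fields
        artist, album = album, artist
    elif r < 3 * rate:
        # Relation error: substitute a spurious album
        album = random.choice(albums)
    # Reliability: Gaussian noise on the truth value, clipped to [0, 1]
    truth = min(1.0, max(0.0, random.gauss(truth, sigma)))
    return (artist, album, rel), truth

noisy_fact, noisy_truth = corrupt(("Artist A", "Album B", "maker"),
                                  0.9, ["Album B", "Album C"])
print(noisy_fact, noisy_truth)
```

Clipping keeps the noisy confidences valid as soft truth values, matching the [0,1] range PSL expects.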
LinkedBrainz experiments
Comparisons:
Baseline     Use noisy truth values as fact scores
PSL-EROnly   Only apply rules for Entity Resolution
PSL-OntOnly  Only apply rules for Ontological reasoning
PSL-KGI      Apply the full Knowledge Graph Identification model

             AUC    Precision  Recall  F1 at .5  Max F1
Baseline     0.672  0.946      0.477   0.634     0.788
PSL-EROnly   0.797  0.953      0.558   0.703     0.831
PSL-OntOnly  0.753  0.964      0.605   0.743     0.832
PSL-KGI      0.901  0.970      0.714   0.823     0.919
NELL Evaluation: two settings
Target Set (Jiang, ICDM12): restrict to a subset of the KG
• Closed-world model
• Uses a target set: a subset of the KG derived from the 2-hop neighborhood
• Excludes trivially satisfied variables
Complete: infer the full knowledge graph
• Open-world model
• All possible entities, relations, and labels
• Inference assigns a truth value to each variable
NELL experiments: Target Set
Task: Compute truth values of a target set derived from the evaluation data
Comparisons:
Baseline  Average confidences of extractors for each fact in the NELL candidates
NELL      Evaluate NELL's promotions (on the full knowledge graph)
MLN       Method of (Jiang, ICDM12) – estimates marginal probabilities with MC-SAT
PSL-KGI   Apply the full Knowledge Graph Identification model
Running Time: Inference completes in 10 seconds, producing values for 25K facts

                 AUC   F1
Baseline         .873  .828
NELL             .765  .673
MLN (Jiang, 12)  .899  .836
PSL-KGI          .904  .853