Knowledge Graph Completion: Introduction and Motivation (PowerPoint presentation)


  1. Knowledge Graph Completion

  2. Introduction and motivation: We have our 'constructed' knowledge graph, now what? [Figure: Freebase entry for Seattle]

  3. Introduction and motivation. Problem 1: wrong or missing triples

  4. Introduction and motivation. Problem 2: many nodes refer to the same underlying entity

  5. For Web extractions, noise is inevitable:
     • Thousands of web domains
     • Many page formats
     • Distracting & irrelevant content
     • Purposeful obfuscation
     • Poor grammar & spelling
     • Tables
     To reach its potential, a constructed KG must be completed and its duplicate entities identified.

  6. Noise analysis
     • Extractors were found to offer a collective tradeoff between multiple dimensions
     • Noise is rarely 'random'!

     Dimension      | Glossary | Regex | Landmark   | CRF | NER
     Easy to define | 4        | 2     | 4          | 4   | 4
     Site coverage  | All      | All   | Short Tail | All | All
     Precision      | 2-3      | 3-4   | 4          | 2-3 | 3
     Recall         | 3-4      | 2     | 1          | 2   | 1

  7. ENTITY RESOLUTION

  8. Definitions and alternate names
     • Common sense: which entities refer to the same thing?
     • Slightly more formal: which mentions (aka records, instances, nodes, surface strings...) refer to the same underlying entity?
     • Rigorous mathematical/logical definition: doesn't exist, or is unknown! Just like other hard AI problems...
     • Why try to solve it, aka why is it a problem?

  9. Applications: A Web of Linked 'Data'

  10. Applications: Schema.org
     • Schema.org is an RDF ontology from which triples (with Web-dereferenceable URIs) can be embedded in HTML pages
     http://schema.org/

  11. Applications: Google Knowledge Graph
     https://developers.google.com/knowledge-graph/

  12. SUB-COMMUNITIES

  13. Entity Linking/Canonicalization
     • The name of an entity (such as a city or location) is not enough to resolve ambiguity
     • Use the Geonames knowledge base to canonicalize the entity using machine learning and text features

  14. Co-reference Resolution

  15. Entity Resolution (what we'll be covering)
     • Itself has many sub-communities and approaches
     • Because of flexible representations (compared to databases or strict models like OWL), KG-ER systems tend to be community-agnostic

  16. STANDARD ER ARCHITECTURE

  17. Entity Resolution is fundamentally non-linear
     • Theoretically quadratic in the number of nodes, even if the 'resolution rule' were known
     • In practice, the number of 'duplicates' tends to grow linearly, and duplicates overlap in non-trivial ways
     • How to devise efficient algorithms? 50 years of research have converged on a two-step solution:
     [Diagram: knowledge graph → execute blocking → candidate set → execute similarity → resolved entities]

  18. Blocking
     • Key idea: use a cheap heuristic that efficiently clusters approximately similar entities into (possibly overlapping) blocks
     [Diagram: generate blocks with a blocking key, e.g. Tokens(LastName); then apply the similarity function to each pair in the candidate set (12 pairs) rather than the 'exhaustive' set of C(10, 2) = 45 pairs]
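The token-blocking heuristic above can be sketched as follows; the record layout and the `last_name` field name are illustrative assumptions, not from the slides:

```python
from collections import defaultdict

def token_blocking(records, key_field="last_name"):
    """Group records into (possibly overlapping) blocks keyed by the
    tokens of a chosen field -- the Tokens(LastName) heuristic."""
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        for token in rec[key_field].lower().split():
            blocks[token].append(rec_id)
    return blocks

def candidate_pairs(blocks):
    """Emit each within-block pair once; typically far fewer
    comparisons than the exhaustive C(n, 2)."""
    pairs = set()
    for ids in blocks.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add(tuple(sorted((ids[i], ids[j]))))
    return pairs

records = {
    1: {"last_name": "de la Cruz"},
    2: {"last_name": "Cruz"},
    3: {"last_name": "Smith"},
}
pairs = candidate_pairs(token_blocking(records))
# records 1 and 2 share the token 'cruz', so only that pair survives
```

Only pairs sharing at least one key token reach the (expensive) similarity step, which is exactly the efficiency/recall tradeoff the slide describes.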

  19. Aside: some blocks have skewed sizes...
     • A property of real-world data (Zipf distributions, power laws...)
     • How to address data skew? Apply blocking methods with guarantees
     • May lose some recall in the process
     Example: Sorted Neighborhood, aka merge-purge:
     -- use the blocking key as a 'sorting' key
     -- slide a window of constant size (w) over the sorted nodes
     -- only pairs of nodes within the window are paired and added to the candidate set
     Other methods: block purging, canopies...
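A minimal sketch of the Sorted Neighborhood windowing, assuming records are plain strings that serve as their own sorting key:

```python
def sorted_neighborhood(records, sort_key, window=2):
    """Sorted Neighborhood (merge-purge) sketch: sort records by the
    blocking key, then pair only records inside a sliding window of
    constant size w -- roughly O(n * w) comparisons instead of the
    exhaustive O(n^2), regardless of how skewed the data is."""
    ordered = sorted(records, key=sort_key)
    pairs = set()
    for i, rec in enumerate(ordered):
        # pair the current record with the next (window - 1) records
        for other in ordered[i + 1 : i + window]:
            pairs.add((min(rec, other), max(rec, other)))
    return pairs

names = ["smith", "smyth", "jones", "johns", "brown"]
pairs = sorted_neighborhood(names, sort_key=lambda s: s, window=2)
# 'smith' and 'smyth' end up adjacent after sorting, so they are paired
```

The constant window size is what gives the guarantee mentioned on the slide, and also why some recall may be lost: true duplicates that sort far apart never enter the same window.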

  20. Similarity/link specification
     • Over 50 years of research on what makes for a good 'similarity' function
     • Current approach: apply a 'typical' machine learning workflow to the candidate set
     • Important to remember that features are extracted from 'mention pairs', which leads to non-trivial alignment issues
       – Some form of schema matching is almost always attempted in practical systems
       – Some (but not much) work on so-called schema-free similarity

  21. Aside: why schema matching?

  22. Feature engineering... Open question: how much can representation learning contribute to Entity Resolution?

  23. Similarity: putting it together
     • The ML model can be supervised, semi-supervised, or unsupervised
     [Diagram: candidate set → schema alignment / extraction of useful information → Machine Learning (ML) model → probability that the pair is a duplicate]
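The supervised variant of this pipeline can be sketched end-to-end as below. Everything here is an illustrative assumption: the pair features, the toy training pairs, and the tiny hand-rolled logistic-regression trainer standing in for a real ML toolkit.

```python
import math
from difflib import SequenceMatcher

def pair_features(a, b):
    """Features extracted from a mention *pair* (illustrative choices)."""
    return [
        SequenceMatcher(None, a, b).ratio(),               # string similarity
        1.0 - abs(len(a) - len(b)) / max(len(a), len(b)),  # length agreement
        1.0,                                               # bias term
    ]

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Tiny logistic-regression trainer (plain stochastic gradient descent)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            w = [wj + lr * (yi - p) * xj for wj, xj in zip(w, xi)]
    return w

def prob_duplicate(w, a, b):
    """Model output: probability that the mention pair is a duplicate."""
    z = sum(wj * xj for wj, xj in zip(w, pair_features(a, b)))
    return 1.0 / (1.0 + math.exp(-z))

train_pairs = [("Jon Smith", "John Smith"), ("Jon Smith", "Mary Jones"),
               ("ACME Inc", "ACME Incorporated"), ("ACME Inc", "Globex Corp")]
labels = [1, 0, 1, 0]  # 1 = duplicate
w = train_logistic([pair_features(a, b) for a, b in train_pairs], labels)
score = prob_duplicate(w, "Jon Smyth", "John Smith")
```

The schema-alignment step from the diagram is hidden inside `pair_features`: it only works because both mentions expose comparable strings, which is exactly what schema matching has to establish in practice.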

  24. OUTPUT REPRESENTATION AND HANDLING

  25. From links to clusters
     • For perfect links, transitive closure/connected components works
     • With imperfect links, the effect can be severe
       – One weak link is all it takes to form a giant component
       – Not uncommon in the real world
     • More robust clustering methods have to be applied
       – Community detection literature
       – Spectral clustering
       – Many more!
     • Some recent work has proposed to explore ER as a micro-clustering problem
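The transitive-closure step, and its fragility, can be illustrated with a small union-find sketch; the node names and links are invented for illustration:

```python
def connected_components(nodes, links):
    """Transitive closure over pairwise 'same-entity' links via
    union-find: with perfect links, each component is one real entity."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in links:
        parent[find(a)] = find(b)  # union the two components

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

nodes = ["seattle_wa", "seattle_city", "seattle_times", "times_paper"]
good_links = [("seattle_wa", "seattle_city"), ("seattle_times", "times_paper")]
weak_link = [("seattle_wa", "seattle_times")]  # one wrong/weak link

clean = connected_components(nodes, good_links)              # two entities
merged = connected_components(nodes, good_links + weak_link)  # one giant cluster
```

With only correct links the city and the newspaper stay separate; adding a single spurious link collapses everything into one giant component, which is the failure mode motivating the more robust clustering methods on the slide.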

  26. From (possibly noisy) clusters to... ???
     • A surprisingly under-studied problem!
     • Should the entities be fused into a single entity? How?
       – Entity linking has a conceptually elegant solution to this problem...
       – ...but how to deal with NIL clusters?
     • Semantic Web approach
       – Represent individual links as KG triples and leave it at that
       – Entity Name Systems for advanced search/reasoning

  27. BEYOND ENTITY RESOLUTION

  28. By itself, generic ER is unlikely to be enough to sufficiently boost KG quality. Other things explored in the literature:
     • Domain knowledge
       – Collective ER methods have tried to exploit it systematically
     • Multi-type Entity Resolution
       – Extremely useful for knowledge graphs; lots more work to be done
     • Entity Resolution + ontologies + IE confidences
       – Probabilistic graphical models like Probabilistic Soft Logic (PSL)
     • Knowledge graph embeddings
       – Useful for link prediction and triple classification
       – Recall the Microsoft-founded_in-Seattle example earlier

  29. Knowledge graph embeddings/representation learning
     • Useful for link prediction, missing relationships, and triple classification
     • Not clear if it is really better than PSL on noisy KGs
     • Not clear how to combine KGEs with domain engineering
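To make the triple-classification use concrete, here is a toy sketch of a translational (TransE-style) scoring function, one common KGE family. The 2-d vectors are hand-set assumptions, not learned embeddings; in the slides' example, the extracted triple Microsoft-founded_in-Seattle is wrong (Microsoft was founded in Albuquerque), and a well-trained embedding should score it below the correct one.

```python
import math

def transe_score(h, r, t):
    """TransE-style plausibility: a triple (h, r, t) is plausible when
    h + r is close to t; score is the negative L2 distance ||h + r - t||,
    so higher means more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

emb = {  # toy 2-d embeddings, hand-set for illustration
    "Microsoft":   [0.0, 0.0],
    "Albuquerque": [1.0, 1.0],
    "Seattle":     [1.0, 0.2],
    "founded_in":  [1.0, 1.0],
}

plausible = transe_score(emb["Microsoft"], emb["founded_in"], emb["Albuquerque"])
implausible = transe_score(emb["Microsoft"], emb["founded_in"], emb["Seattle"])
```

Ranking candidate tails by this score is exactly link prediction; thresholding it on an existing triple is triple classification.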

  30. Concluding notes
     • Entity Resolution (ER) is a hard problem for machines, and may be AI-complete
       – It's 'easy' for us because we're so good at it
       – Not clear what will achieve the next breakthrough in ER
     • Essential to attempt a solution if KGs are semi-automatically constructed from Web data
       – Quality doesn't have to be perfect, as we showed earlier with KG search
     • A wealth of solutions, but they can be broken down into standard components
       – Blocking, to make ER efficient
       – Similarity, to make ER automatic/adaptive
     • Many open questions, especially in relation to new ML models
     • More broadly, lots of opportunities for KG completion
