CSE 6240: Web Search and Text Mining. Spring 2020
Graph and Knowledge Graph Representation Learning
Prof. Srijan Kumar
http://cc.gatech.edu/~srijan
Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining
Today’s Lecture
• Embedding entire graphs
• Introduction to Knowledge Graphs
• Embeddings in Knowledge Graphs
  – TransE
  – TransR
Embedding Entire Graphs
• Goal: How to embed an entire graph $G$ into a vector $\mathbf{z}_G$?
• Tasks:
  – Classifying toxic vs. non-toxic molecules
  – Identifying anomalous graphs
Approach #1
Simple idea:
• Run a standard graph embedding technique on the (sub)graph $G$
• Then just sum (or average) the node embeddings in the (sub)graph $G$:
  $\mathbf{z}_G = \sum_{v \in G} \mathbf{z}_v$
• Used by Duvenaud et al., 2015 to classify molecules based on their graph structure
  – Convolutional Networks on Graphs for Learning Molecular Fingerprints. NeurIPS 2015
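The pooling step of Approach #1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the method of the cited paper: the function name `embed_graph` and the toy node embeddings are hypothetical, and the node embeddings are assumed to come from some node embedding method that has already been run.

```python
import numpy as np

def embed_graph(node_embeddings, reduce="sum"):
    """Pool per-node embeddings (one row per node) into one graph embedding."""
    z = np.asarray(node_embeddings, dtype=float)
    if reduce == "sum":
        return z.sum(axis=0)
    if reduce == "mean":
        return z.mean(axis=0)
    raise ValueError(f"unknown reduce mode: {reduce}")

# Hypothetical 2-dimensional embeddings for a 3-node graph.
node_z = np.array([[1.0, 0.0],
                   [0.0, 2.0],
                   [3.0, 1.0]])
z_sum = embed_graph(node_z, "sum")    # sum of the three rows
z_mean = embed_graph(node_z, "mean")  # average of the three rows
```

Summing keeps information about graph size, while averaging makes graphs of different sizes more directly comparable.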
Approach #2
• Idea: Introduce a “virtual node” to represent the (sub)graph and run a standard graph embedding technique
• Proposed by Li et al., 2016 as a general technique for subgraph embedding
  – Gated Graph Sequence Neural Networks. ICLR 2016
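A small sketch of the virtual-node construction, using a plain adjacency dictionary. The slide does not say how the virtual node is wired; connecting it to every node of the (sub)graph is an assumption made here, and the name `add_virtual_node` and the label `"__virtual__"` are hypothetical.

```python
def add_virtual_node(adj, subgraph_nodes, virtual="__virtual__"):
    """Return a copy of `adj` with one extra node linked to all `subgraph_nodes`."""
    new_adj = {u: set(nbrs) for u, nbrs in adj.items()}  # copy, leave input untouched
    new_adj[virtual] = set()
    for u in subgraph_nodes:
        new_adj[virtual].add(u)   # edge virtual -> u
        new_adj[u].add(virtual)   # edge u -> virtual (undirected graph)
    return new_adj

# Toy path graph a - b - c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
aug = add_virtual_node(adj, ["a", "b", "c"])
```

Any node embedding method can then be run on `aug`, and the embedding learned for the virtual node is read off as the embedding of the (sub)graph.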
Approach #3
• Represent a graph as a distribution/set of walks on that graph
• Anonymous Walk Embeddings:
  – Each state in an anonymous walk is the index of the first time that node was visited in the random walk, so node identities are dropped
  – Anonymous Walk Embeddings, ICML 2018
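To make the definition concrete, here is a minimal sketch that converts a random walk into its anonymous version. The function name `anonymize` and the example walks are hypothetical.

```python
def anonymize(walk):
    """Replace each node by the index (starting at 1) of its first visit in the walk."""
    first_seen = {}
    states = []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen) + 1
        states.append(first_seen[node])
    return tuple(states)

walk_1 = anonymize(["A", "B", "A"])            # revisits the start node
walk_2 = anonymize(["C", "D", "C"])            # same pattern on different nodes
walk_3 = anonymize(["A", "B", "C", "B", "C"])  # longer walk with repeats
```

Walks over different nodes that follow the same visiting pattern map to the same anonymous walk, which is what lets walk statistics be compared across graphs.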
Number of Walks Grows
The number of anonymous walks grows exponentially with walk length:
  – There are 5 anonymous walks $a_i$ of length 3: $a_1 = 111$, $a_2 = 112$, $a_3 = 121$, $a_4 = 122$, $a_5 = 123$
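The count can be checked by brute-force enumeration. The sketch below assumes, as the slide's list does, that "length" counts the states in the walk and that a state may repeat immediately (as in 111 and 122); the function name `anonymous_walks` is hypothetical.

```python
def anonymous_walks(length):
    """Enumerate all anonymous walks with `length` states (length >= 1)."""
    walks = [(1,)]  # every anonymous walk starts at state 1
    for _ in range(length - 1):
        # The next state is either an already seen index or the next unused index.
        walks = [w + (k,) for w in walks for k in range(1, max(w) + 2)]
    return walks

walks_3 = anonymous_walks(3)
counts = [len(anonymous_walks(n)) for n in range(1, 6)]
```

Under this convention `walks_3` reproduces the five walks listed on the slide, and `counts` shows how quickly the number of walks grows as the length increases.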
Idea #1: Anonymous Walks
• Enumerate all possible anonymous walks $a_i$ of $l$ steps and record their counts
• Represent the graph as a probability distribution over these walks
• For example:
  – Set $l = 3$
  – Then we can represent the graph as a 5-dim vector
    • Since there are 5 anonymous walks $a_i$ of length 3: 111, 112, 121, 122, 123
  – $\mathbf{z}_G[i]$ = probability of anonymous walk $a_i$ in $G$
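A rough sketch of how such a distribution could be estimated. Note the swap: instead of enumerating every walk as the slide describes, this version samples random walks and normalizes the counts, which is only an approximation. The function names, the uniform choice of start node, and the toy triangle graph are assumptions.

```python
import random
from collections import Counter

def anonymize(walk):
    """Replace each node by the index of its first visit in the walk."""
    first_seen = {}
    states = []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen) + 1
        states.append(first_seen[node])
    return tuple(states)

def anonymous_walk_distribution(adj, length, num_walks, seed=0):
    """Estimate the probability of each anonymous walk from sampled random walks.

    Assumes every node in `adj` has at least one neighbor.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    counts = Counter()
    for _ in range(num_walks):
        node = rng.choice(nodes)          # uniform random start node
        walk = [node]
        for _ in range(length - 1):
            node = rng.choice(sorted(adj[node]))  # uniform random neighbor
            walk.append(node)
        counts[anonymize(walk)] += 1
    return {a: c / num_walks for a, c in counts.items()}

# Toy triangle graph without self-loops.
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
dist = anonymous_walk_distribution(triangle, length=3, num_walks=1000)
```

On a graph without self-loops, walks such as 111 or 112 cannot occur, so their entries in the 5-dimensional vector would simply be zero.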
Idea #2: Learn Walk Embeddings
Learn an embedding $\mathbf{z}_i$ of every anonymous walk $a_i$
• The embedding of a graph $G$ is then the sum/avg/concatenation of the walk embeddings $\mathbf{z}_i$
Idea #2: Learn Walk Embeddings
How to embed walks?
• Idea: Embed walks such that the next walk starting from the same node can be predicted
  – Set walk embeddings $\mathbf{z}_i$ such that we maximize
    $P(w_t^u \mid w_{t-\Delta}^u, \ldots, w_{t-1}^u) = f(\mathbf{z})$
    • Where $w_t^u$ is the $t$-th random walk starting at node $u$
  – Similar to the word2vec idea
Idea #2: Learn Walk Embeddings
• Run $T$ different random walks from $u$, each of length $l$, and anonymize them:
  $N_R(u) = \{a_1^u, a_2^u, \ldots, a_T^u\}$
  – Here $a_i$ is the anonymous version of walk $w_i$
• Learn to predict walks that co-occur in a $\Delta$-size window
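The windowing step can be illustrated as follows. This is a sketch under the assumption that the anonymized walks from one start node are stored as a sequence of integer walk ids, and that the context of a walk is the window of walks immediately before it; the function name `context_target_pairs` and the example ids are hypothetical.

```python
def context_target_pairs(walk_ids, delta):
    """Pair each walk id with the `delta` walk ids sampled just before it."""
    pairs = []
    for t in range(delta, len(walk_ids)):
        context = tuple(walk_ids[t - delta:t])  # the delta preceding walks
        target = walk_ids[t]                    # the walk to be predicted
        pairs.append((context, target))
    return pairs

# Hypothetical ids of 5 anonymized walks sampled from the same start node.
pairs = context_target_pairs([3, 0, 2, 2, 1], delta=2)
```

Each pair is one training example: the model sees the context walks and is trained to predict the target walk.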
Idea #2: Learn Walk Embeddings
• Estimate the embedding $\mathbf{z}_i$ of anonymous walk $a_i$ of $w_i$:
  $\max \frac{1}{T} \sum_{t=\Delta}^{T} \log P(a_t \mid a_{t-\Delta}, \ldots, a_{t-1})$
  where:
  • $\Delta$ = context window size
  • $P(a_t \mid a_{t-\Delta}, \ldots, a_{t-1}) = \frac{\exp(f(a_t))}{\sum_{i} \exp(f(a_i))}$, i.e., softmax over all walks
  • $f(a_t) = b + U \cdot \left( \frac{1}{\Delta} \sum_{i=1}^{\Delta} \mathbf{z}_i \right)$
    – where $b \in \mathbb{R}$, $U \in \mathbb{R}^d$, $\mathbf{z}_i$ is the embedding of $a_i$ (anonymized version of walk $w_i$)
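A minimal NumPy sketch of this softmax, under one assumption that the slide leaves implicit: every candidate walk gets its own output vector (a row of `U`) and its own bias (an entry of `b`), so that different walks receive different scores. The function name, array sizes, and random values are illustrative only; no training loop is shown.

```python
import numpy as np

def next_walk_log_probs(Z, U, b, context_ids):
    """Log-probability of every candidate walk given the context walks.

    Z: (num_walks, d) walk embeddings, U: (num_walks, d) output vectors (assumed
    one per walk), b: (num_walks,) biases, context_ids: ids of the context walks.
    """
    h = Z[context_ids].mean(axis=0)   # average of the context walk embeddings
    scores = b + U @ h                # one score per candidate walk
    scores = scores - scores.max()    # shift for numerical stability
    return scores - np.log(np.exp(scores).sum())  # log-softmax over all walks

rng = np.random.default_rng(0)
num_walks, dim = 5, 4
Z = rng.normal(size=(num_walks, dim))
U = rng.normal(size=(num_walks, dim))
b = np.zeros(num_walks)
log_p = next_walk_log_probs(Z, U, b, context_ids=[0, 2])
```

Training would adjust `Z`, `U`, and `b` to increase the log-probability of the walk that actually follows each context window, averaged over all windows.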
Summary of Graph Embeddings
We discussed three approaches to embedding graphs:
• Approach 1: Embed nodes and sum/average them
• Approach 2: Create a virtual node (super-node) that spans the (sub)graph and then embed that node
• Approach 3: Anonymous Walk Embeddings
  – Idea 1: Represent the graph via the distribution over all the anonymous walks
  – Idea 2: Embed anonymous walks
Today’s Lecture
• Embedding entire graphs
• Introduction to Knowledge Graphs
• Embeddings in Knowledge Graphs
  – TransE
  – TransR
Knowledge Graphs
• Knowledge in graph form
  – Capture entities, types, and relationships
• Nodes are entities
• Nodes are labeled with their types
• Edges between two nodes capture relationships between entities
Example: Bibliographic networks
• Node types: paper, title, author, conference, year
• Relation types: pubWhere, pubYear, hasTitle, hasAuthor, cite
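One common way to hold such a graph in code is a list of (head, relation, tail) triples plus a node-type map. The sketch below uses the node and relation types from this slide, but every concrete node name (`paper1`, `author1`, and so on) is made up for illustration.

```python
from collections import defaultdict

# Hypothetical typed nodes.
node_type = {
    "paper1": "paper", "paper2": "paper", "title1": "title",
    "author1": "author", "author2": "author",
    "conf1": "conference", "2019": "year",
}

# Hypothetical facts as (head, relation, tail) triples.
triples = [
    ("paper1", "hasTitle", "title1"),
    ("paper1", "hasAuthor", "author1"),
    ("paper1", "hasAuthor", "author2"),
    ("paper1", "pubWhere", "conf1"),
    ("paper1", "pubYear", "2019"),
    ("paper2", "cite", "paper1"),
]

# Index the edges by relation type for simple lookups.
by_relation = defaultdict(list)
for head, relation, tail in triples:
    by_relation[relation].append((head, tail))

authors_of_paper1 = [t for h, t in by_relation["hasAuthor"] if h == "paper1"]
```

The triple list is also the input format that KG embedding models typically consume.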
Example: Social networks
• Node types: account, song, post, food, channel
• Relation types: friend, like, cook, watch, listen
Example: Google Knowledge Graph
[Figure: example from the Google Knowledge Graph; visible edge label: paintedBy]
Knowledge Graphs in Practice
• Google Knowledge Graph
• Amazon Product Graph
• Facebook Graph API
• IBM Watson
• Microsoft Satori
• Project Hanover/Literome
• LinkedIn Knowledge Graph
• Yandex Object Answer
Applications of Knowledge Graphs
• Serving information
Applications of Knowledge Graphs
• Question answering and conversation agents
Knowledge Graph Datasets
• Publicly available KGs:
  – Freebase, Wikidata, DBpedia, YAGO, NELL
• Common characteristics:
  – Massive: millions of nodes and edges
  – Incomplete: many true edges are missing
• Given a massive KG, enumerating all the possible facts is intractable!
• Can we predict plausible BUT missing links?
Example: Freebase
• Freebase
  – ~50 million entities
  – ~38K relation types
  – ~3 billion facts/triples
  – 93.8% of persons from Freebase have no place of birth and 78.5% have no nationality!
• FB15k/FB15k-237
  – A complete subset of Freebase, used by researchers to learn KG models
[1] Paulheim, Heiko. "Knowledge graph refinement: A survey of approaches and evaluation methods." Semantic Web 8.3 (2017): 489-508.
[2] Min, Bonan, et al. "Distant supervision for relation extraction with an incomplete knowledge base." Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013.
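A quick back-of-the-envelope calculation shows why enumerating all candidate facts is out of reach at this scale. It uses the approximate Freebase sizes from this slide and assumes, as a simplification, that any (head, relation, tail) combination counts as a candidate.

```python
# Approximate Freebase sizes quoted on the slide.
num_entities = 50_000_000
num_relations = 38_000
known_facts = 3_000_000_000

# Every (head, relation, tail) combination is a candidate triple (simplifying assumption).
candidate_triples = num_entities * num_relations * num_entities
known_fraction = known_facts / candidate_triples

print(f"candidate triples: {candidate_triples:.2e}")
print(f"fraction that are known facts: {known_fraction:.2e}")
```

Even with type constraints pruning most combinations, the candidate space dwarfs the set of known facts, which motivates scoring candidates with a learned model instead of listing them.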
Today’s Lecture
• Embedding entire graphs
• Introduction to Knowledge Graphs
• Embeddings in Knowledge Graphs
  – TransE
  – TransR
Key Task: KG Completion
• Knowledge Graph completion is a link prediction problem
• KG incompleteness can substantially affect the efficiency of systems relying on it
• Main paper: Translating Embeddings for Modeling Multi-relational Data. Bordes, Usunier, Garcia-Duran. NeurIPS 2013.
Key Task: KG Completion
• Intuition: a link prediction model that learns from local and global connectivity patterns in the KG, taking into account entities and relationships of different types at the same time
[Figure: example KG with a missing relation]
• Models: TransE and TransR
Translating Embeddings: TransE
• Relationships between entities = triplets
  – $h$ (head entity), $r$ (relation), $t$ (tail entity) => $(h, r, t)$
• Entities and relations are all embedded in an entity space $\mathbb{R}^d$
• Relations are represented as translations
  – $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ if the given fact is true; else, $\mathbf{h} + \mathbf{r} \neq \mathbf{t}$
TransE
• Translation intuition:
  – For a triple $(h, r, t)$ with $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d$: $\mathbf{h} + \mathbf{r} = \mathbf{t}$
  – Notation: embedding vectors appear in boldface
• Score function: $f_r(h, t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$
• Example: $\mathbf{h}$ = Obama, $\mathbf{r}$ = Nationality, $\mathbf{t}$ = American
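The score function, and its use for link prediction, can be sketched as follows. With the score defined as a distance, a lower value means a more plausible triple. The slide does not name the norm, so the L2 norm is an assumption here; the 2-dimensional embeddings are hand-picked toy values, not learned ones, and the entity "French" is added only as a distractor.

```python
import numpy as np

def transe_score(h, r, t):
    """Distance between the translated head (h + r) and the tail t (L2 norm assumed)."""
    return np.linalg.norm(h + r - t)

def rank_tails(h, r, entity_emb):
    """Return entity indices sorted from most to least plausible tail for (h, r, ?)."""
    dists = np.linalg.norm(h + r - entity_emb, axis=1)
    return np.argsort(dists)

# Hypothetical toy embeddings.
entities = ["Obama", "American", "French"]
E = np.array([[0.0, 0.0],
              [1.0, 1.0],
              [-1.0, 2.0]])
r_nationality = np.array([1.0, 1.0])

order = rank_tails(E[0], r_nationality, E)
best_tail = entities[order[0]]
true_score = transe_score(E[0], r_nationality, E[1])
```

Ranking all entities as candidate tails for a (head, relation) pair is how a trained model would be queried for missing links in KG completion.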