An Interpretable Knowledge Transfer Model for Knowledge Base Completion Qizhe Xie, Xuezhe Ma, Zihang Dai, Eduard Hovy Carnegie Mellon University Language Technologies Institute August 2, 2017 1 / 28
Outline Introduction Task Motivation Model Experiments Main Results Performance on Rare Relations Interpretability Analysis on Sparseness 2 / 28
Task: Knowledge base completion (KBC) ◮ Recover missing facts in knowledge bases ◮ Given lots of triples such as ( Leonardo DiCaprio , won award , Oscar ) ◮ Predict missing facts ( Leonardo DiCaprio , Profession , ? ) ◮ Embedding-based approaches 4 / 28
Data Sparsity Issue (a) WN18 (b) FB15k Figure 1: Frequencies of relations are subject to Zipf’s law. 5 / 28
Problems Our Model Tackles ◮ Data sparsity: Transfer learning ◮ On WN18, the rarer the relation is, the greater the improvement ◮ Interpretability: ℓ0-regularized representation ◮ Reverse relations, undirected relations and similar relations are identified by the sparse representation ◮ Model size: Compression ◮ On FB15k, the number of parameters can be reduced to 1/90 of the original model 6 / 28
Notation and Previous Models ◮ Data: Triples (h, r, t) ◮ Training data: (h = Leonardo DiCaprio, r = won award, t = Oscar) ◮ Test data: (h = Leonardo DiCaprio, r = Profession, t = ?) ◮ Energy function f_r(h, t) of triples (h, r, t) ◮ Minimize the energy of true triples and maximize the energy of false triples ◮ TransE [Bordes et al., 2013]: f_r(h, t) = ‖h + r − t‖_ℓ; parameters: entity embeddings h, t and relation embeddings r ◮ STransE [Nguyen et al., 2016]: f_r(h, t) = ‖W_{r,1} h + r − W_{r,2} t‖_ℓ; parameters: relation-specific projection matrices W_{r,1}, W_{r,2} and embeddings ◮ All parameters are trained by SGD 8 / 28
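The two energy functions above can be sketched directly in NumPy; the embedding dimension and the choice of norm order below are illustrative, not taken from the paper.

```python
import numpy as np

def transe_energy(h, r, t, ord=2):
    """TransE: f_r(h, t) = ||h + r - t||_l; low energy means a plausible triple."""
    return np.linalg.norm(h + r - t, ord=ord)

def stranse_energy(h, r, t, W_r1, W_r2, ord=2):
    """STransE: f_r(h, t) = ||W_{r,1} h + r - W_{r,2} t||_l, with
    relation-specific projection matrices for the head and the tail."""
    return np.linalg.norm(W_r1 @ h + r - W_r2 @ t, ord=ord)
```

With identity projection matrices, STransE collapses to TransE, which is why STransE is the more general of the two.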
STransE: Parametrizing Each Relation Separately ◮ Prone to the data sparsity problem 9 / 28
Sharing Parameters through Common Concepts ◮ Relation-concept mapping example with attention weights ◮ Parametrize concepts instead of relations ◮ Relation matrices are weighted averages of concept matrices with attention weights, e.g., W_{r1,1} = 0.2 D_1 + 0.8 D_2 10 / 28
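The composition above can be sketched as a single tensor contraction; the shapes are assumptions for illustration.

```python
import numpy as np

def relation_matrix(alpha, concepts):
    """Compose a relation's projection matrix as an attention-weighted
    average of shared concept matrices: W_r = sum_j alpha_j * D_j.
    alpha: (m,) attention weights; concepts: (m, d, d) concept matrices."""
    return np.tensordot(alpha, concepts, axes=1)
```

For example, with weights (0.2, 0.8) over two concepts this reproduces the slide's W_{r1,1} = 0.2 D_1 + 0.8 D_2; because the D_j are shared across relations, gradients from any relation update the common concepts.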
Sharing Parameters through Common Concepts ◮ Suppose a ground-truth mapping is given, then ◮ Transfer learning can be done effectively through parameter sharing ◮ We can interpret similar relations ◮ All parameters are trainable by SGD ◮ Concepts need to be learned end-to-end ◮ How do we obtain the mapping? 11 / 28
Dense Mapping ◮ Dense attention: Construct a dense bipartite graph and train attention weights ◮ Problems: ◮ Uninterpretable: In practice, even with ℓ1 regularization, we get distributed weights W_{r1,1} = 0.2 D_1 + 0.52 D_2 + 0.1 D_3 + 0.15 D_4 + 0.03 D_5 ◮ Inefficient: Computation involves all concept matrices ◮ Unnecessary: Intuitively, each relation can be composed of at most K concepts 12 / 28
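A minimal sketch of the dense variant, assuming an ℓ1 coefficient added to the training loss (the coefficient value and shapes are illustrative): every relation attends to all concept matrices, which is exactly the inefficiency noted above.

```python
import numpy as np

def dense_relation_matrix(alpha, concepts, l1_coef=0.01):
    """Dense mapping: the relation matrix touches EVERY concept matrix
    D_j, and an l1 penalty on the attention weights is meant to push
    them toward sparsity (in practice they stay distributed)."""
    W = np.tensordot(alpha, concepts, axes=1)   # sum over all m concepts
    penalty = l1_coef * np.abs(alpha).sum()     # added to the training loss
    return W, penalty
```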
Sparse Mapping ◮ Problem: Not differentiable ◮ An approximate approach: ◮ Given current embeddings, a correct mapping should minimize the loss function ◮ For each relation, assign a single concept to the relation and compute the loss ◮ Greedily choose the top K concepts that minimize the loss 13 / 28
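The greedy selection step can be sketched as follows; `single_concept_loss` is a hypothetical helper standing in for evaluating the training loss when one concept alone parametrizes the relation.

```python
import numpy as np

def greedy_topk_concepts(single_concept_loss, num_concepts, K):
    """Greedy sparse assignment: score each concept by the loss obtained
    when it alone is assigned to the relation, then keep the K concepts
    with the lowest loss. This sidesteps the non-differentiability of
    the discrete relation-to-concept mapping."""
    losses = np.array([single_concept_loss(j) for j in range(num_concepts)])
    return np.argsort(losses)[:K]
```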
Block Iterative Optimization ◮ Randomly initialize mappings and concepts. ◮ Repeat ◮ Optimize embeddings and attention weights with SGD ◮ Reassign mappings 14 / 28
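The alternating loop above can be sketched generically; all three callables are hypothetical stand-ins for the real training steps.

```python
def block_iterative_train(init_state, sgd_epoch, reassign, num_iters):
    """Block-iterative optimization: alternate SGD updates of the
    continuous parameters (embeddings, attention weights) with discrete
    reassignment of the relation-to-concept mapping."""
    state = init_state()                # random mappings and concepts
    for _ in range(num_iters):
        state = sgd_epoch(state)        # optimize embeddings + attention
        state = reassign(state)         # greedy top-K concept reassignment
    return state
```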
A Better Sampling Approach: Domain sampling ◮ Loss function involves negative sampling ◮ Sample from domain-specific entities with an adaptive probability ◮ E.g., negative sample of ( Steve Jobs , was born in , US ) : ◮ Uniform negative sample: ( Steve Jobs , was born in , CMU ) ◮ Domain negative sample: ( Steve Jobs , was born in , China ) 15 / 28
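A sketch of domain sampling under stated assumptions: the paper sets the domain probability adaptively per relation, while here it is just a parameter, and the data structures are illustrative.

```python
import random

def sample_negative_tail(relation, all_entities, domain_tails, p_domain, rng=random):
    """Domain sampling: with probability p_domain, corrupt the tail using
    an entity from the relation's own domain (entities observed as tails
    of that relation); otherwise sample uniformly over all entities."""
    candidates = domain_tails.get(relation, ())
    if candidates and rng.random() < p_domain:
        return rng.choice(sorted(candidates))
    return rng.choice(all_entities)
```

For (Steve Jobs, was born in, US), a domain sample prefers a hard negative like China over an easy one like CMU, matching the slide's example.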
Main Results

Model                                              Additional Info        WN18 MR    WN18 H@10    FB15k MR  FB15k H@10
SE [Bordes et al., 2011]                           No                     985        80.5         162       39.8
Unstructured [Bordes et al., 2014]                 No                     304        38.2         979       6.3
TransE [Bordes et al., 2013]                       No                     251        89.2         125       47.1
TransH [Wang et al., 2014]                         No                     303        86.7         87        64.4
TransR [Lin et al., 2015b]                         No                     225        92.0         77        68.7
CTransR [Lin et al., 2015b]                        No                     218        92.3         75        70.2
KG2E [He et al., 2015]                             No                     348        93.2         59        74.0
TransD [Ji et al., 2015]                           No                     212        92.2         91        77.3
TATEC [García-Durán et al., 2016]                  No                     -          -            58        76.7
NTN [Socher et al., 2013]                          No                     -          66.1         -         41.4
DISTMULT [Yang et al., 2015]                       No                     -          94.2         -         57.7
STransE [Nguyen et al., 2016]                      No                     206 (244)  93.4 (94.7)  69        79.7
ITransF                                            No                     205        94.2         65        81.0
ITransF (domain sampling)                          No                     223        95.2         77        81.4
RTransE [García-Durán et al., 2015]                Path                   -          -            50        76.2
PTransE [Lin et al., 2015a]                        Path                   -          -            58        84.6
NLFeat [Toutanova and Chen, 2015]                  Node + Link Features   -          94.3         -         87.0
Random Walk [Wei et al., 2016]                     Path                   -          94.8         -         74.7
IRN [Shen et al., 2016]                            External Memory        249        95.3         38        92.7

Table 1: Link prediction results on two datasets. Hits@10 (H@10) is the top-10 accuracy. Higher Hits@10 or lower Mean Rank (MR) indicates better performance. 17 / 28
Performance on Rare Relations Figure 2: Average Hits@10 on WN18 relations, ordered from frequent to rare (ITransF (ours) vs. STransE) 18 / 28
Performance on Rare Relations Figure 3: Average Hits@10 on relation bins (Frequent, Medium, Rare) for (a) WN18 and (b) FB15k (ITransF (ours) vs. STransE) 19 / 28
Interpretability: How Is Knowledge Shared? ◮ Each relation’s head and tail have their own concepts. (a) WN18 (b) FB15k Figure 4: Heatmap visualization of attention weights on WN18 and FB15k. 20 / 28
Interpretability: How Is Knowledge Shared? ◮ Each relation’s head and tail have their own concepts. ◮ Interpretation: ◮ Reverse relations: hyponym and hypernym; award winning work and won award for. (a) WN18 (b) FB15k Figure 5: Heatmap visualization of attention weights on WN18 and FB15k. 21 / 28
Interpretability: How Is Knowledge Shared? ◮ Each relation’s head and tail have their own concepts. ◮ Interpretation: ◮ Reverse relations: hyponym and hypernym; award winning work and won award for. ◮ Undirected relations: spouse; similar to. (a) WN18 (b) FB15k Figure 6: Heatmap visualization of attention weights on WN18 and FB15k. 22 / 28
Interpretability: How Is Knowledge Shared? ◮ Each relation’s head and tail have their own concepts. ◮ Interpretation: ◮ Reverse relations: hyponym and hypernym; award winning work and won award for. ◮ Undirected relations: spouse; similar to. ◮ Similar relations: was nominated for and won award for; instance hypernym and hypernym. (a) WN18 (b) FB15k 23 / 28
Interpretability of ℓ1-Regularized Dense Mapping (a) WN18 (b) FB15k Figure 8: Heatmap visualization of the ℓ1-regularized dense mapping ◮ The mapping cannot be made sparse without performance loss. 24 / 28
A Byproduct of Parameter Sharing: Model Compression Figure 9: Hits@10 with different numbers of concepts on (a) FB15k and (b) WN18 (ITransF vs. STransE and CTransR) ◮ On FB15k, the model can be compressed by nearly 90 times. 25 / 28
Analysis on Sparseness ◮ Does sparseness hurt performance?

Method       WN18 MR  WN18 H@10  Time    FB15k MR  FB15k H@10  Time
Dense        199      94.0       4m34s   69        79.4        4m30s
Dense + ℓ1   228      94.2       4m25s   131       78.9        5m47s
Sparse       207      94.1       2m32s   67        79.6        1m52s

Table 2: Performance of the model with a dense graph or a sparse graph with only 15 or 22 concepts. The time gap is more significant when we use more concepts.

◮ How does our approach compare to sparse encoding methods?

Method                                              WN18 MR  WN18 H@10  FB15k MR  FB15k H@10
Pretrain + Sparse Encoding [Faruqui et al., 2015]   211      86.6       66        79.1
Ours                                                205      94.2       65        81.0

Table 3: Different methods to obtain sparse representations 26 / 28
Conclusion ◮ Propose a knowledge embedding model which can discover shared hidden concepts ◮ Perform transfer learning through parameter sharing ◮ Design a learning algorithm to induce the interpretable sparse representation ◮ Outperform baselines on two benchmark datasets for the knowledge base completion task 27 / 28