Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs
Lingbing Guo, Zequn Sun, Wei Hu*
Nanjing University, China
* Corresponding author: whu@nju.edu.cn
ICML’19, June 9–15, Long Beach, CA, USA
Knowledge graphs

Knowledge graphs (KGs) store a wealth of structured facts about the real world
- A fact (s, r, o): subject entity, relation, object entity

KGs are far from complete, and two important tasks have been proposed
1. Entity alignment: find entities in different KGs denoting the same real-world object
2. KG completion: complete missing facts in a single KG
   - E.g., predict ? in (Tim Berners-Lee, employer, ?) or (?, employer, W3C)
Challenges

For KG embedding, existing methods largely focus on learning from relational triples of entities
Triple-level learning has two major limitations:
- Low expressiveness: learns entity embeddings from a fairly local view (i.e., 1-hop neighbors)
- Inefficient information propagation: only uses triples to deliver semantic information within/across KGs
Learning to exploit long-term relational dependencies

A relational path is an entity-relation chain, where entities and relations appear alternately
- E.g., United Kingdom → country⁻ → Tim Berners-Lee → employer → W3C
RNNs perform well on sequential data, but two limitations arise when using them to model relational paths:
1. A relational path has two different types of elements, "entity" and "relation", which always appear in an alternating order
2. A relational path is constituted by triples, but these basic structural units are overlooked by RNNs
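A minimal Python sketch of point 2 (illustrative only, not the authors' code): every entity-relation-entity window of a relational path is a triple, the basic structural unit that plain RNNs overlook.

```python
# A relational path as a plain token sequence (entities at even positions,
# relations at odd positions); "country^-1" stands for the inverse relation.
path = ["United Kingdom", "country^-1", "Tim Berners-Lee", "employer", "W3C"]

def triples_in_path(path):
    """Yield the (subject, relation, object) triples a relational path is built from."""
    for i in range(0, len(path) - 2, 2):
        yield path[i], path[i + 1], path[i + 2]

print(list(triples_in_path(path)))
# [('United Kingdom', 'country^-1', 'Tim Berners-Lee'),
#  ('Tim Berners-Lee', 'employer', 'W3C')]
```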
Recurrent skipping networks (RSNs)

A conditional skipping mechanism allows RSNs to shortcut the current input entity, letting it directly participate in predicting its object entity
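A minimal PyTorch sketch of this skipping mechanism, assuming a GRU cell as the underlying recurrent unit and two weight matrices for the skip combination; the class, parameter names, and dimensions are illustrative, not the released implementation.

```python
import torch
import torch.nn as nn

class RSNCellSketch(nn.Module):
    """One step of a recurrent skipping network (illustrative sketch).

    When the current input is a relation, the hidden state used to predict
    the object entity is mixed with the embedding of the subject entity
    (the previous input), so the subject directly participates in
    predicting its object.
    """
    def __init__(self, dim):
        super().__init__()
        self.rnn = nn.GRUCell(dim, dim)             # underlying recurrent unit
        self.w_h = nn.Linear(dim, dim, bias=False)  # weight on the hidden state
        self.w_x = nn.Linear(dim, dim, bias=False)  # weight on the skipped subject

    def forward(self, x_t, x_prev, h_prev, input_is_relation):
        h_t = self.rnn(x_t, h_prev)
        if input_is_relation:
            # Conditional skip: re-inject the subject entity embedding x_prev.
            h_t = self.w_h(h_t) + self.w_x(x_prev)
        return h_t

# Toy usage on a path of length 5; embeddings are random placeholders.
dim = 16
cell = RSNCellSketch(dim)
h = torch.zeros(1, dim)
prev = torch.zeros(1, dim)
for is_rel in [False, True, False, True, False]:  # entity/relation alternation
    x = torch.randn(1, dim)
    h = cell(x, prev, h, input_is_relation=is_rel)
    prev = x
```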
Tri-gram residual learning

Residual learning
- Let F(x) be an original mapping and H(x) be the expected mapping
- Compared to directly optimizing F(x) to fit H(x), it is easier to optimize F(x) to fit the residual part H(x) − x
- An extreme case: H(x) = x (the identity mapping)
Tri-gram residual learning

Example path: United Kingdom → country⁻ → Tim Berners-Lee → employer → W3C
- (·) denotes the context (United Kingdom, country⁻, Tim Berners-Lee)

What each model optimizes for F((·), employer):
- RNNs: F((·), employer) ≔ W3C
- RRNs: F((·), employer) ≔ W3C − (·)
- RSNs: F((·), employer) ≔ W3C − Tim Berners-Lee

Compared to directly learning to predict W3C from employer and its mixed context, it is easier to learn the residual part between W3C and Tim Berners-Lee, because the two form a triple with employer, and the triple structure in the paths should not be overlooked
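In equation form, the conditional skip realizing this tri-gram residual can be sketched as follows; the notation, including the weight matrices S₁ and S₂, is illustrative rather than copied from the paper.

```latex
% Sketch of the conditional skipping update:
% x_t is the current input, h_t the ordinary RNN hidden state,
% S_1 and S_2 are weight matrices.
\[
\mathbf{h}'_t =
\begin{cases}
  \mathbf{h}_t, & \text{if } x_t \text{ is an entity},\\
  \mathbf{S}_1 \mathbf{h}_t + \mathbf{S}_2 \mathbf{x}_{t-1}, & \text{if } x_t \text{ is a relation},
\end{cases}
\]
```

When the object is predicted at the next step, the subject embedding x_{t−1} is re-injected directly, so the recurrent part only needs to capture the residual between the object and its subject.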
Architecture

An end-to-end framework for embedding-based entity alignment:
1. Biased random walk sampling
   - Deep paths carry more relational dependencies than triples
   - Cross-KG paths deliver alignment information between KGs
2. Recurrent skipping network (RSN)
3. Type-based noise contrastive estimation (NCE)
   - Evaluates the loss in an optimized way, drawing negative entities and negative relations separately

[Figure: the pipeline samples paths such as United Kingdom → country⁻ → Tim Berners-Lee → employer → W3C from KG1 and KG2 connected by seed alignment, feeds them through the RSN, and trains with NCE losses; aligned entities (e.g., English in KG1 and KG2) are matched by the cosine similarity of their embeddings]
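A minimal sketch of biased random walk sampling under the two biases mentioned above (a depth bias towards unvisited neighbors and a cross-KG bias towards entities with a counterpart in the other KG); the graph format, parameter names, and exact weighting are assumptions for illustration, not the released sampler.

```python
import random

def biased_random_walk(graph, start, length, aligned,
                       depth_bias=0.9, cross_kg_bias=0.9):
    """Sample one relational path (illustrative sketch).

    graph:   dict mapping an entity to a list of (relation, neighbor) edges,
             with both KGs merged and seed-aligned entities connected.
    aligned: set of entities that have a counterpart in the other KG.
    depth_bias:    weight pushed towards unvisited neighbors, so walks go
                   deep instead of bouncing back (deeper paths carry more
                   relational dependencies).
    cross_kg_bias: weight pushed towards neighbors leading into the other
                   KG, so paths deliver alignment information.
    """
    path, visited, current = [start], {start}, start
    while len(path) < 2 * length + 1:
        edges = graph.get(current, [])
        if not edges:
            break
        weights = []
        for _, nxt in edges:
            w = depth_bias if nxt not in visited else 1.0 - depth_bias
            w *= cross_kg_bias if nxt in aligned else 1.0 - cross_kg_bias
            weights.append(w)
        rel, nxt = random.choices(edges, weights=weights, k=1)[0]
        path.extend([rel, nxt])
        visited.add(nxt)
        current = nxt
    return path
```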
Experiments and results

Entity alignment results
- Datasets: normal & dense
- RSNs performed best on all datasets, especially on the normal datasets

Hits@1               DBP-WD  DBP-YG  EN-FR  EN-DE
MTransE               22.3    24.6    25.1   31.2
IPTransE              23.1    22.7    25.5   31.3
JAPE                  21.9    23.3    25.6   32.0
BootEA                32.3    31.3    31.3   44.2
GCN-Align             17.7    19.3    15.5   25.3
TransR                 5.2     2.9     3.6    5.2
TransD                27.7    17.3    21.1   24.4
ConvE                  5.7    11.3     9.4    0.8
RotatE                17.2    15.9    14.5   31.9
RSNs (w/o biases)     37.2    36.5    32.4   45.7
RSNs                  38.8    40.0    34.7   48.7

KG completion results
- Datasets: FB15K, WN18
- RSNs obtained comparable performance, and outperformed all translational models

FB15K                      Hits@1  Hits@10  MRR
TransE                      30.5    73.7    0.46
TransR                      37.7    76.7    0.52
TransD                      31.5    69.1    0.44
ComplEx                     59.9    84.0    0.69
ConvE                       67.0    87.3    0.75
RotatE                      74.6    88.4    0.80
RSNs (w/o cross-KG bias)    72.2    87.3    0.78
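For reference, Hits@k is the fraction of test cases whose correct answer is ranked within the top k candidates, and MRR is the mean reciprocal rank. A small illustrative helper (not the paper's evaluation script):

```python
def hits_at_k_and_mrr(ranks, k=1):
    """Compute Hits@k and MRR from the 1-based ranks of the correct answers.

    ranks: for each test case, the rank of the true entity among all
           candidates, ordered by the model's similarity/plausibility score.
    """
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    return hits, mrr

# Example: three test queries ranked the true entity at positions 1, 4, 2.
print(hits_at_k_and_mrr([1, 4, 2], k=1))  # (0.333..., 0.583...)
```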
Further analysis

RSNs vs. RNNs and RRNs (recurrent residual networks, SC-LSTM)
- RSNs achieved better results with only 1/30 of the epochs
[Figure: Hits@1 vs. training epochs (0–95) for RSNs, RRNs (SC-LSTM), and RNNs on (a) DBP-WD (normal) and (b) DBP-WD (dense)]

Random walk length
- On all the datasets, Hits@1 increased steadily from length 5 to 15
[Figure: Hits@1 vs. random walk length (5–25) on DBP-WD, DBP-YG, EN-FR, and EN-DE, for both the normal and dense datasets]
Conclusion

We studied path-level KG embedding learning
1. RSNs: sequence models that learn from relational paths
2. End-to-end framework: biased random walk sampling + RSNs
3. Superior in entity alignment and competitive in KG completion

Future work
- A unified sequence model for relational paths and textual information
Poster: Tonight, Pacific Ballroom #42
Datasets & source code: https://github.com/nju-websoft/RSN

Acknowledgements:
- National Key R&D Program of China (No. 2018YFB1004300)
- National Natural Science Foundation of China (No. 61872172)
- Key R&D Program of Jiangsu Science and Technology Department (No. BE2018131)

ICML’19, June 9–15, Long Beach, CA, USA