  1. Knowledge Graph Embedding and Its Applications
     Xiaolong Jin
     CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS)
     2019-11-30 @ Fudan

     Agenda
     - Background
     - Knowledge Graph Embedding (KGE)
     - Applications of KGE
     - Conclusions

  2. Background
     - A Knowledge Graph (KG) is a system that understands facts about people, places, and things, and how these entities are all connected
     - Examples: DBpedia, YAGO, NELL, Freebase, Wolfram Alpha, Probase, Google KG, ...

     Background: typical applications of KGs
     - Vertical search
     - Intelligent QA
     - Disease diagnosis
     - Financial anti-fraud
     - Abnormal data analysis
     - Machine translation
     - ...

  3. Vertical Search

     Intelligent QA
     - IBM's Watson
     - Google's Google Now
     - Apple's Siri
     - Amazon's Alexa
     - Microsoft's Xiaobing & Cortana
     - Baidu's Dumi (度秘)
     - Sogou's Wangzai (旺仔)
     - ...

  4. Disease Diagnosis
     - Watson Care Manager
     - Knowledge service platform for Traditional Chinese Medicine (TCM)
     - ...
     [Figure captions: KG-based cancer research @ MD Anderson Cancer Center & IBM Watson; TCM knowledge service platform @ Institute of Information on TCM, China Academy of Chinese Medical Sciences]

     Typical Representation of KGs
     - Symbolic triples: (head entity, relation, tail entity)
       e.g., (Eiffel Tower, is_located_in, Paris)
             (Eiffel Tower, is_a, place)
             (Bob, is_a_friend_of, Alice)
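As a concrete, intentionally minimal illustration of this representation (not taken from the slides), a symbolic KG is just a set of (head, relation, tail) tuples; `facts_about` is a hypothetical helper:

```python
# A minimal sketch of the symbolic-triple representation of a KG.
# The entities and relations are the examples used in the slide above.
kg = {
    ("Eiffel Tower", "is_located_in", "Paris"),
    ("Eiffel Tower", "is_a", "place"),
    ("Bob", "is_a_friend_of", "Alice"),
}

# A simple lookup over the triple set, e.g., all facts about a head entity:
def facts_about(head):
    return [triple for triple in kg if triple[0] == head]

print(facts_about("Eiffel Tower"))
```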

  5. Inference over KGs
     - Logic-based models
       - Pros: easily interpretable
       - Cons: highly complex
     - Path ranking algorithms
       - Pros: easily interpretable
       - Cons: cannot handle rare relations; cannot handle KGs with low connectivity; extracting paths is time-consuming
     - Embedding-based methods
       - Pros: highly efficient; can capture semantic information
       - Cons: less interpretable

     Agenda
     - Background
     - Knowledge Graph Embedding (KGE)
     - Applications of KGE
     - Conclusions

  6. Knowledge Graph Embedding (KGE)
     - Map the entities, relations, and even paths of a KG into a low-dimensional vector space
     - Encode semantic information
     - Computationally efficient

     TransE (Translational Embeddings)
     - Basic idea: treat relations as translation operations between the vectors corresponding to entities, i.e., $h + r \approx t$
       e.g., China + Capital = Beijing; France + Capital = Paris
     - Score function: $f(h, r, t) = \|h + r - t\|$
     - Loss function (margin-based ranking over the positive triple set $S$ and the negative triple set $S'$, with margin $\gamma$):
       $L = \sum_{(h,r,t) \in S} \sum_{(h',r',t') \in S'} \left[ \gamma + f(h,r,t) - f(h',r',t') \right]_+$
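The score and loss above are simple enough to sketch directly. Below is a minimal NumPy illustration; the embedding dimension, margin value, and choice of L2 norm are assumptions (the original paper allows L1 or L2):

```python
import numpy as np

def score(h, r, t):
    # TransE score: distance between the translated head h + r and the tail t.
    return np.linalg.norm(h + r - t)

def margin_loss(pos, neg, gamma=1.0):
    # Margin-based ranking loss for one positive/negative triple pair:
    # [gamma + f(pos) - f(neg)]_+
    return max(0.0, gamma + score(*pos) - score(*neg))

dim = 50                                  # embedding dimension (assumed)
rng = np.random.default_rng(0)
china, beijing, paris = (rng.normal(size=dim) for _ in range(3))
capital = beijing - china                 # ideal translation: China + Capital ≈ Beijing

pos = (china, capital, beijing)           # valid triple
neg = (china, capital, paris)             # corrupted tail
# Zero loss here: the corrupted triple already scores worse by more than gamma.
print(margin_loss(pos, neg))
```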

  7. Trans Series of KGE
     - TransE cannot well handle 1-N, N-1, or N-M relations
     - TransH: translates between entity embeddings projected onto a relation-specific hyperplane
     - TransR: translates between entity embeddings mapped into a relation-specific space
     - ...
     [Figures: TransH and TransR illustrations]

     Agenda
     - Background
     - Knowledge Graph Embedding (KGE)
     - Applications of KGE
     - Conclusions
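To make the TransH fix concrete, here is a hedged sketch of its relation-specific hyperplane projection; the projection formula is the standard one from the TransH paper, while the variable names and demo values are mine:

```python
import numpy as np

def project_to_hyperplane(e, w):
    # TransH projection: drop the component of e along the (unit-normalized)
    # relation-specific normal vector w, i.e., e_perp = e - (w . e) w
    w = w / np.linalg.norm(w)
    return e - np.dot(w, e) * w

def transh_score(h, r, t, w_r):
    # Translation happens between the *projected* entities, so distinct
    # entities may share a projection for a relation, easing 1-N / N-1 cases.
    h_p = project_to_hyperplane(h, w_r)
    t_p = project_to_hyperplane(t, w_r)
    return np.linalg.norm(h_p + r - t_p)

rng = np.random.default_rng(1)
h, r, t, w_r = (rng.normal(size=20) for _ in range(4))
print(transh_score(h, r, t, w_r))
```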

  8. The Applications of KGE
     - Basic applications: link prediction, entity alignment, KG integration, ...
     - Advanced applications: vertical search, intelligent QA, disease diagnosis, ...
     [Figures: link prediction with a missing entity marked "?"; entity alignment between KG1 (group 1) and KG2 (group 2) via aligned entity pairs]

     Application 1: Link Prediction
     "Shared embedding based neural networks for knowledge graph completion"
     S. Guan, X. Jin, Y. Wang, et al. The 27th ACM International Conference on Information and Knowledge Management (CIKM'18)

  9. Motivation
     - Existing methods for link prediction handle three types of tasks (head entity, relation, and tail entity prediction) but do not distinguish them in training
     - These prediction tasks have quite different performance
     - Link prediction upon reasoning: it is a process that gradually approaches the target; an FCN with a decreasing number of hidden nodes per layer can imitate such a process

     The Proposed Method
     - Shared Embedding based Neural Network (SENN)
       - Explicitly distinguishes the three prediction tasks
       - Integrates them into an FCN-based framework
     - Extends SENN to SENN+
       - Uses relation prediction to improve head and tail entity prediction

  10. The SENN Method
      - The framework
        - 2 shared embedding matrices (one for entities, one for relations)
        - 3 substructures: head_pred, rel_pred, and tail_pred

      The Three Substructures
      - head_pred
        - The score function passes the shared embeddings of the given relation and tail entity through fully connected layers with the ReLU activation function, producing a score for every candidate head entity

  11. The Three Substructures
      - head_pred (cont.)
        - The prediction label vector applies the sigmoid or softmax function to the scores
        - Each element indicates the probability that the corresponding entity h forms a valid triple (h, r, t)
      - rel_pred and tail_pred are defined similarly, each with its own score function and prediction label vector
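Putting the last two slides together, here is a rough sketch of what a head_pred-style substructure could look like, using only the ingredients the slides name (shared embedding matrices, fully connected ReLU layers, a sigmoid over per-entity scores). The layer widths, the way r and t are combined, and all sizes are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class HeadPred(nn.Module):
    # Sketch of a head-prediction substructure: embed (r, t), pass the
    # concatenation through fully connected ReLU layers, then score against
    # every entity via the shared entity embedding matrix.
    def __init__(self, ent_emb: nn.Embedding, rel_emb: nn.Embedding, dim: int = 200):
        super().__init__()
        self.ent_emb = ent_emb            # shared entity embeddings
        self.rel_emb = rel_emb            # shared relation embeddings
        self.fcn = nn.Sequential(         # hidden widths are assumptions
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, r_idx: torch.Tensor, t_idx: torch.Tensor) -> torch.Tensor:
        x = torch.cat([self.rel_emb(r_idx), self.ent_emb(t_idx)], dim=-1)
        scores = self.fcn(x) @ self.ent_emb.weight.t()  # one score per entity
        return torch.sigmoid(scores)      # probability each entity is a valid head

# Hypothetical sizes, for illustration only:
ent, rel = nn.Embedding(1000, 200), nn.Embedding(50, 200)
probs = HeadPred(ent, rel)(torch.tensor([3]), torch.tensor([7]))
print(probs.shape)  # torch.Size([1, 1000])
```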

  12. Model Training
      - The general loss function
        - Idea: cross entropy between the prediction and target label vectors
        - Each prediction task has its own target label vector; for head prediction, given r and t, the target vector marks the set of valid head entities in the training set
        - Label smoothing regularizes the target label vectors: $y' = (1 - \epsilon)\, y + \epsilon / N$, where $\epsilon$ is the label smoothing parameter and $N$ the number of candidates
      - Binary cross-entropy losses are computed for the 3 prediction tasks
      - The general loss for the given triple combines the three task losses
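A minimal sketch of the target construction and one task's loss term: a multi-hot target over valid entities, standard label smoothing, and binary cross-entropy. Whether SENN uses exactly this smoothing form is an assumption:

```python
import torch
import torch.nn.functional as F

def smoothed_target(valid_idx, num_entities, eps=0.1):
    # Multi-hot target over all entities, then standard label smoothing:
    # y' = (1 - eps) * y + eps / N
    y = torch.zeros(num_entities)
    y[valid_idx] = 1.0
    return (1.0 - eps) * y + eps / num_entities

num_entities = 1000
pred = torch.sigmoid(torch.randn(num_entities))      # stand-in predictions
target = smoothed_target(torch.tensor([3, 42]), num_entities)
loss = F.binary_cross_entropy(pred, target)          # one task's BCE term
print(loss.item())
```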

  13. Model Training
      - The adaptively weighted loss mechanism (a sketch of the weighting idea follows below)
        - Prediction on the 1-side vs. the M-side of a relation: punish the model more severely if deterministic (1-side) predictions are wrong
        - Relation prediction vs. entity prediction: punish wrong predictions on head/tail entities more severely
        - The final loss function for the triple weights the three task losses accordingly

      The SENN+ Method
      - Employs relation prediction to improve head and tail entity prediction in the test phase
      - The relation-aided test mechanism
        - Given a head prediction task (?, r, t), assume that some entity h is a valid head; if we then do relation prediction on (h, ?, t), r will most probably have a prediction label higher than other relations and be ranked higher
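Returning to the adaptively weighted loss above, the two weighting rules can be sketched as follows; the concrete weight values are placeholders of my own, not the paper's weights:

```python
def weighted_loss(loss_head, loss_rel, loss_tail, predicting_one_side):
    # Illustrates the two rules from the slide with made-up weights:
    # (1) deterministic predictions (the 1-side of a relation) are punished
    #     more severely when wrong;
    # (2) entity predictions are punished more severely than relation ones.
    entity_w = 4.0 if predicting_one_side else 1.0   # hypothetical weight
    rel_w = 0.5                                      # hypothetical weight
    return entity_w * (loss_head + loss_tail) + rel_w * loss_rel

print(weighted_loss(0.2, 0.1, 0.3, predicting_one_side=True))
```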

  14. The SENN+ Method
      - The relation-aided test mechanism (cont.)
        - Two additional relation-aided vectors are constructed
        - The final prediction label vectors for entity prediction combine the original predictions with these relation-aided vectors

      Experiments
      - Entity prediction
      [Table: entity prediction results]

  15. Experiments
      - Entity prediction in detail
        - The adaptively weighted loss mechanism distinguishes and learns well the predictions of different mapping properties
      - Relation prediction
        - SENN and SENN+ capture the following information to obtain better performance:
          - Implicit information interaction among the different predictions
          - Prediction-specific information

  16. Application 2: Link Prediction on N-ary Facts
      "NaLP: Link Prediction on N-ary Relational Data"
      S. Guan, X. Jin, Y. Wang, X. Cheng. The 2019 International World Wide Web Conference (WWW 2019)

      Motivation
      - N-ary facts are pervasive in practice
      - Existing link prediction methods usually convert n-ary facts into a few triples (i.e., binary sub-facts), which has several drawbacks:
        - Many triples need to be considered, which is more complicated
        - Some conversions lose structural information, leading to inaccurate link prediction
        - The added virtual entities and triples bring in more parameters to be learned

  17. Related Works
      - A few link prediction methods focus on n-ary facts directly
      - m-TransH (IJCAI-2016)
        - A relation is defined by the mapping from a sequence of roles, corresponding to this type of relation, to their values. E.g.,
          Receive_Award: [person, award, point in time] -> [Marie Curie, Nobel Prize in Chemistry, 1911]
          "Marie Curie received the Nobel Prize in Chemistry in 1911."
        - Each specific mapping is an instance of the relation
        - Generalizes TransH to n-ary relational data
      - RAE (Relatedness Affiliated Embedding, WWW-2018)
        - Improves m-TransH by further considering the relatedness of values
        - However, it ignores the roles in this process, although the relatedness of two values differs greatly under different sequences of roles; e.g., Marie Curie and Henri Becquerel under (person, award, point in time, winner) vs. (person, spouse, start time, end time, place of marriage)
      - The proposed NaLP method explicitly models the relatedness of the role-value pairs.

  18. The NaLP Method
      - The representation of each n-ary fact: a set of role-value pairs
        - Formally, given an n-ary fact with its roles and the values filling them, the fact is represented as the set {role_1: value_1, role_2: value_2, ..., role_n: value_n}
        - For example, "Marie Curie received the Nobel Prize in Chemistry in 1911." is represented as:
          {person: Marie Curie, award: Nobel Prize in Chemistry, point in time: 1911}
      - The framework (a small code illustration follows)
        - A role and its value are tightly linked to each other, and thus should be bound together
        - Given a set of role-value pairs, NaLP decides whether they form a valid n-ary fact, i.e., whether they are closely related
        - Two components: role-value pair embedding and relatedness evaluation
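In code, this representation is naturally a set of (role, value) pairs (roles may repeat across pairs, which a plain Python dict would not allow). A tiny illustration using the slide's example; the query form is my own framing of the link-prediction task:

```python
# The slides' running example, written as a set of (role, value) pairs.
fact = {
    ("person", "Marie Curie"),
    ("award", "Nobel Prize in Chemistry"),
    ("point in time", "1911"),
}

# A link-prediction query on this fact replaces one value with a placeholder:
query = {
    ("person", "Marie Curie"),
    ("award", "?"),                # which award did she receive in 1911?
    ("point in time", "1911"),
}
```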

  19. Role-Value Pair Embedding
      - Capture the features of the role-value pairs
      - Form the embedding matrix

      Relatedness Evaluation
      - The principle:
        - A set of role-value pairs forms a valid fact
        - → every two role-value pairs are greatly related
        - → the values of their relatedness feature vectors are large
        - → the minimum over each feature dimension among all the pairs is not allowed to be too small
        - → apply element-wise minimization over the pairwise relatedness vectors to approximately evaluate the overall relatedness

  20. Relatedness Evaluation
      - Compute the relatedness between role-value pairs
      - Estimate the overall relatedness of all the role-value pairs
      - Obtain the evaluation score

      A Look into NaLP and the Loss Function
      - NaLP is permutation-invariant to the input order of role-value pairs
      - NaLP is able to cope with facts of different arities
      - The loss function: a binary classification loss over the given facts and sampled corrupted facts
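Here is a hedged sketch of the relatedness-evaluation steps listed above. `pairwise_relatedness` and the final scalar reduction are stand-ins for NaLP's learned networks; only the element-wise minimum is the mechanism the slides describe:

```python
import numpy as np

def pairwise_relatedness(a, b):
    # Stand-in for NaLP's learned relatedness network: maps two role-value
    # pair embeddings to a relatedness feature vector.
    return np.tanh(a * b)

def overall_relatedness(pair_embs):
    # Element-wise minimum over all pairwise relatedness vectors: a single
    # weakly related pair drags the whole fact's relatedness down.
    feats = [pairwise_relatedness(a, b)
             for i, a in enumerate(pair_embs)
             for b in pair_embs[i + 1:]]
    return np.min(np.stack(feats), axis=0)

rng = np.random.default_rng(0)
pair_embs = [rng.normal(size=8) for _ in range(3)]   # 3 role-value pairs
print(overall_relatedness(pair_embs).sum())          # stand-in scalar score
```

Note that both the pair enumeration and the element-wise minimum are order-independent and work for any number of pairs, which is exactly why the model is permutation-invariant to the input order of role-value pairs and copes with facts of different arities.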
