Knowledge Graph Embedding and Its Applications

Xiaolong Jin
CAS Key Lab of Network Data Science and Technology,
Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS)
2019-11-30 @ Fudan

Agenda
- Background
- Knowledge Graph Embedding (KGE)
- Applications of KGE
- Conclusions
Background
A Knowledge Graph (KG) is a system that understands facts about people, places, and things, and how these entities are all connected.
Examples: DBpedia, YAGO, NELL, Freebase, Wolfram Alpha, Probase, Google KG, ...

Background
Typical applications of KGs:
- Vertical search
- Intelligent QA
- Disease diagnosis
- Financial anti-fraud
- Abnormal data analysis
- Machine translation
- ...
Vertical Search

Intelligent QA
- IBM's Watson
- Google's Google Now
- Apple's Siri
- Amazon's Alexa
- Microsoft's Xiaobing & Cortana
- Baidu's Dumi (度秘)
- Sogou's Wangzai (旺仔)
- ...
Disease Diagnosis
- Watson Care Manager
- KG-based cancer research @ MD Anderson Cancer Center & IBM Watson
- Knowledge service platform for Traditional Chinese Medicine (TCM) @ Institute of Information on TCM, China Academy of Chinese Medical Sciences

Typical Representation of KGs
Symbolic triples: (head entity, relation, tail entity), e.g.,
- (Eiffel Tower, is_located_in, Paris)
- (Eiffel Tower, is_a, place)
- (Bob, is_a_friend_of, Alice)
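The symbolic-triple representation above can be sketched as a plain data structure. This is a minimal illustration, not any particular KG system's API; the `tails` helper is a name chosen here for clarity.

```python
# A KG as a set of symbolic (head, relation, tail) triples,
# using the examples from the slide.
triples = {
    ("Eiffel Tower", "is_located_in", "Paris"),
    ("Eiffel Tower", "is_a", "place"),
    ("Bob", "is_a_friend_of", "Alice"),
}

def tails(head, relation):
    """Return all tail entities connected to `head` via `relation`."""
    return {t for (h, r, t) in triples if h == head and r == relation}

print(tails("Eiffel Tower", "is_located_in"))  # {'Paris'}
```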
Inference over KGs
- Logic-based models
  Pros: easily interpretable
  Cons: highly complex
- Path ranking algorithms
  Pros: easily interpretable
  Cons: cannot handle rare relations; cannot handle KGs with low connectivity; extracting paths is time-consuming
- Embedding-based methods
  Pros: highly efficient; can capture semantic information
  Cons: less interpretable
Knowledge Graph Embedding (KGE)
Map the entities, relations, and even paths of a KG into a low-dimensional vector space
- Encodes semantic information
- Computationally efficient

TransE (Translational Embeddings)
Basic idea: treat relations as translation operations between the vectors corresponding to entities, i.e., h + r ≈ t
- China + Capital ≈ Beijing
- France + Capital ≈ Paris
The score function: f(h, r, t) = ||h + r - t||
The margin-based loss function:
  L = Σ_{(h,r,t)∈S} Σ_{(h',r',t')∈S'} [γ + f(h, r, t) - f(h', r', t')]_+
where γ is the margin, S is the positive triple set, and S' is the negative triple set.
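The TransE score and margin loss above can be sketched as follows. This is a toy illustration: the embeddings are random stand-ins for trained parameters, and the embedding dimension is a free choice, not a value from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50  # embedding dimension (illustrative choice)

# Random stand-in embeddings; a real model would learn these by SGD.
entities = {e: rng.normal(size=dim)
            for e in ["China", "Beijing", "France", "Paris"]}
relations = {"Capital": rng.normal(size=dim)}

def score(h, r, t):
    """TransE score f(h, r, t) = ||h + r - t|| (lower = more plausible)."""
    return np.linalg.norm(entities[h] + relations[r] - entities[t])

def margin_loss(pos, neg, margin=1.0):
    """Margin-based ranking loss [γ + f(pos) - f(neg)]_+ for one triple pair."""
    return max(0.0, margin + score(*pos) - score(*neg))

loss = margin_loss(("China", "Capital", "Beijing"),
                   ("China", "Capital", "Paris"))
```

Training would push the loss toward zero by moving `China + Capital` closer to `Beijing` than to any corrupted tail such as `Paris`.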
Trans Series of KGE
TransE cannot well handle 1-N, N-1, or N-M relations
- TransH: projects entities onto a relation-specific hyperplane before translating
- TransR: maps entities into a relation-specific space before translating
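The TransH idea can be sketched as below: each relation has a hyperplane (unit normal w_r) and a translation vector d_r on it, and entities are projected onto the hyperplane before the TransE-style translation is applied. The function names here are illustrative.

```python
import numpy as np

def project(v, w):
    """Project vector v onto the hyperplane with normal w (TransH projection)."""
    w = w / np.linalg.norm(w)        # ensure a unit normal
    return v - np.dot(w, v) * w      # remove the component along w

def transh_score(h, d_r, w_r, t):
    """TransH score: ||h_perp + d_r - t_perp|| on the relation's hyperplane."""
    return np.linalg.norm(project(h, w_r) + d_r - project(t, w_r))

# Projection removes exactly the component along the normal:
w = np.array([0.0, 0.0, 1.0])
v = np.array([1.0, 2.0, 3.0])
print(project(v, w))  # [1. 2. 0.]
```

Because the projection depends on the relation, two entities can coincide on one relation's hyperplane yet stay apart on another's, which is how TransH copes with 1-N and N-1 relations.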
The Applications of KGE
Basic applications:
- Link prediction
- Entity alignment (finding aligned entity pairs between KG1 and KG2 for KG integration)
- ...
Advanced applications:
- Vertical search
- Intelligent QA
- Disease diagnosis
- ...

Application 1: Link Prediction
Shared embedding based neural networks for knowledge graph completion
S. Guan, X. Jin, Y. Wang, et al. The 27th ACM International Conference on Information and Knowledge Management (CIKM'18)
Motivation
Existing methods for link prediction:
- Handle three types of tasks: head prediction (?, r, t), relation prediction (h, ?, t), and tail prediction (h, r, ?)
- Do not distinguish them in training
- These prediction tasks have quite different performance
Link prediction upon reasoning:
- It is a process that gradually approaches the target
- A fully-connected network (FCN) with decreasing numbers of hidden nodes can imitate such a process

The Proposed Method
Shared Embedding based Neural Network (SENN)
- Explicitly distinguishes the three prediction tasks
- Integrates them into an FCN-based framework
Extend SENN to SENN+
- Uses relation prediction to improve head and tail prediction
The SENN Method
The framework:
- 2 shared embedding matrices
- 3 substructures: head_pred, rel_pred, and tail_pred

The Three Substructures
head_pred:
- The score function is computed by a fully-connected substructure over the embeddings of the given relation and tail entity, with the ReLU activation function
The Three Substructures
head_pred:
- The prediction label vector is obtained by applying the sigmoid or softmax function to the scores
- Each element indicates the probability that the corresponding entity h forms a valid triple (h, r, t)
rel_pred and tail_pred work similarly, with their own score functions and prediction label vectors.
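A head_pred-style substructure can be sketched as below. This is a sketch under assumptions: the layer widths are illustrative, the weights are random stand-ins for trained parameters, and the exact SENN architecture may differ in details such as bias terms and layer counts.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def head_pred(r_emb, t_emb, weights):
    """Concatenate the relation and tail embeddings, pass them through
    fully-connected layers of decreasing width (ReLU in between), and
    apply a sigmoid to get one probability per candidate head entity."""
    x = np.concatenate([r_emb, t_emb])
    for W in weights[:-1]:
        x = relu(W @ x)
    return sigmoid(weights[-1] @ x)

dim, n_entities = 8, 5
# Decreasing hidden widths: 2*dim = 16 -> 12 -> n_entities = 5 (illustrative).
weights = [0.1 * rng.normal(size=(12, 2 * dim)),
           0.1 * rng.normal(size=(n_entities, 12))]
probs = head_pred(rng.normal(size=dim), rng.normal(size=dim), weights)
```

Tail and relation prediction would reuse the same shared embedding matrices but feed their own substructures.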
Model Training
The general loss function
Idea: cross entropy between the prediction and target label vectors
- Each prediction task has its own target label vector; e.g., for head prediction, the target marks the set of valid head entities in the training set, given r and t
- Use label smoothing to regularize the target label vectors:
  y' = (1 - ε) y + ε / N
where ε is the label smoothing parameter and N is the length of the label vector.

Model Training
The general loss function
- Binary cross-entropy losses for the 3 prediction tasks
- The general loss for the given triple combines the three losses
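The label-smoothing and binary cross-entropy pieces above can be sketched as follows. The smoothing formula is assumed to be the standard form y' = (1 - ε)y + ε/N, since the slide's own formula did not survive extraction.

```python
import numpy as np

def smooth(y, eps):
    """Label smoothing (assumed standard form): pull hard 0/1 targets
    toward the uniform distribution by a factor eps."""
    return (1.0 - eps) * y + eps / y.size

def bce(p, y):
    """Mean binary cross-entropy between predictions p and targets y."""
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Toy target marking the valid head entities for a given (r, t) pair.
y = np.array([1.0, 0.0, 0.0, 1.0])
y_s = smooth(y, eps=0.1)          # [0.925, 0.025, 0.025, 0.925]
p = np.array([0.9, 0.2, 0.1, 0.8])
loss = bce(p, y_s)
```

Smoothing keeps the targets away from exact 0 and 1, which regularizes the sigmoid outputs and avoids infinite log terms for confident mistakes.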
Model Training
The adaptively weighted loss mechanism
- Predictions on the 1-side vs. the M-side: punish the model more severely if deterministic predictions are wrong
- Relation prediction vs. entity prediction: punish wrong predictions on head/tail entities more severely
The final loss function for the triple is the weighted combination of these losses.

The SENN+ Method
Employ relation prediction to improve head and tail prediction at test time
The relation-aided test mechanism:
- Given (?, r, t), assume that a candidate entity h is a valid head entity
- If we then do relation prediction on (h, ?, t), r will most probably receive a higher prediction label than other relations and be ranked higher
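The relation-aided test idea can be sketched as below. The combination rule is an assumption made for illustration (here the entity probability is simply weighted by how strongly relation prediction supports the query relation for each candidate head); the actual SENN+ combination may differ.

```python
import numpy as np

def relation_aided(entity_probs, rel_probs_per_candidate, r_idx):
    """Rerank candidate heads for (?, r, t): for each candidate h, run
    relation prediction on (h, ?, t) and weight the entity probability by
    the predicted probability of the query relation r (illustrative rule)."""
    rel_support = rel_probs_per_candidate[:, r_idx]
    return entity_probs * rel_support

# Two candidate heads; candidate 0's relation prediction strongly supports
# the query relation (index 0), candidate 1's does not.
entity_probs = np.array([0.6, 0.7])
rel_probs = np.array([[0.9, 0.1],
                      [0.2, 0.8]])
scores = relation_aided(entity_probs, rel_probs, r_idx=0)  # [0.54, 0.14]
```

Even though candidate 1 had the higher raw entity probability, the relation-aided score ranks candidate 0 first, which is the intended effect of the mechanism.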
The SENN+ Method
The relation-aided test mechanism:
- Two additional relation-aided vectors
- The final prediction label vectors for entity prediction combine the original predictions with the relation-aided vectors

Experiments
Entity prediction
Experiments
Entity prediction in detail
- The adaptively weighted loss mechanism distinguishes and well learns the predictions of different mapping properties

Experiments
Relation prediction
SENN and SENN+ capture the following information to obtain better performance:
- Implicit information interaction among the different predictions
- Prediction-specific information
Application 2: Link Prediction on N-Ary Facts
NaLP: Link Prediction on N-ary Relational Data
S. Guan, X. Jin, Y. Wang, X. Cheng. The 2019 International World Wide Web Conference (WWW 2019)

Motivation
N-ary facts are pervasive in practice
Existing link prediction methods usually convert n-ary facts into a few triples (i.e., binary sub-facts), which has some drawbacks:
- Many triples need to be considered, which is more complicated
- Some conversions lose structural information, which leads to inaccurate link prediction
- The added virtual entities and triples bring in more parameters to be learned
Related Works
A few link prediction methods focus on n-ary facts directly
m-TransH (IJCAI-2016):
- A relation is defined by the mapping from a sequence of roles, corresponding to this type of relation, to their values
  E.g., Receive_Award: [person, award, point in time] -> [Marie Curie, Nobel Prize in Chemistry, 1911], i.e., "Marie Curie received the Nobel Prize in Chemistry in 1911."
- Each specific mapping is an instance of the relation
- Generalizes TransH to n-ary relational data

Related Works
RAE (Relatedness Affiliated Embedding, WWW-2018):
- Improves m-TransH by further considering the relatedness of values
- However, it ignores the roles in this process; under different sequences of roles, the relatedness of two values can be greatly different
  E.g., Marie Curie and Henri Becquerel under (person, award, point in time, winner) vs. (person, spouse, start time, end time, place of marriage)
The proposed NaLP method explicitly models the relatedness of the role-value pairs.
The NaLP Method
The representation of each n-ary fact: a set of role-value pairs
Formally, an n-ary fact with roles r_1, ..., r_n and corresponding values v_1, ..., v_n is represented as {r_1: v_1, ..., r_n: v_n}.
For example, "Marie Curie received the Nobel Prize in Chemistry in 1911." is represented as:
{person: Marie Curie, award: Nobel Prize in Chemistry, point in time: 1911}

The NaLP Method
The framework:
- A role and its value are tightly linked to each other and thus should be bound together
- For a set of role-value pairs, the model decides whether they form a valid n-ary fact, i.e., whether they are closely related
Two components: role-value pair embedding and relatedness evaluation
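The role-value-pair representation above can be sketched as a plain data structure; this is only an input-format illustration, not NaLP's model code.

```python
# An n-ary fact as a set of role-value pairs, using the slide's example
# (here stored as a dict mapping each role to its value).
fact = {
    "person": "Marie Curie",
    "award": "Nobel Prize in Chemistry",
    "point in time": "1911",
}

# Flatten into the (role, value) pairs the model consumes. NaLP is
# permutation-invariant to pair order, so any ordering is equivalent;
# sorting here just makes the output deterministic.
pairs = sorted(fact.items())
print(pairs)
```

An ordinary binary triple is simply the 2-ary special case, e.g. `{"head": "Eiffel Tower", "tail": "Paris"}` under an `is_located_in` relation, so the same input format covers facts of different arities.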
Role-Value Pair Embedding
- Capture the features of the role-value pairs
- Form the embedding matrix

Relatedness Evaluation
The principle:
- A set of role-value pairs forms a valid fact
- → Every two role-value pairs are greatly related
- → The values of their relatedness feature vector are large
- → The minimum over each feature dimension among all the pairs is not allowed to be too small
- → Apply element-wise minimization over the pair-wise relatedness to approximately evaluate the overall relatedness
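The element-wise minimization principle above can be sketched as follows. The pairwise relatedness function here is a toy stand-in (NaLP learns it with a network), so only the min-pooling step reflects the slide's principle.

```python
from itertools import combinations

import numpy as np

def overall_relatedness(pair_embs, relate):
    """Element-wise minimum, over all pairs of role-value pair embeddings,
    of their relatedness feature vectors: along each feature dimension the
    fact is only as related as its least-related pair."""
    vecs = [relate(a, b) for a, b in combinations(pair_embs, 2)]
    return np.min(np.stack(vecs), axis=0)

# Toy pairwise relatedness (an assumption; NaLP uses a learned network):
relate = lambda a, b: np.minimum(a, b)

embs = [np.array([0.9, 0.4]),
        np.array([0.8, 0.6]),
        np.array([0.7, 0.2])]
overall = overall_relatedness(embs, relate)  # [0.7, 0.2]
```

Because `min` is symmetric over the set of pairs, the result is unchanged under any permutation of the role-value pairs, which is the property the next slide highlights.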
Relatedness Evaluation
- Compute the relatedness between role-value pairs
- Estimate the overall relatedness of all the role-value pairs
- Obtain the evaluation score

A Look into NaLP and the Loss Function
A look into NaLP:
- Permutation-invariant to the input order of role-value pairs
- Able to cope with facts of different arities
The loss function trains NaLP to score valid facts higher than corrupted ones.