
Graph Representation Learning: Embedding, GNNs, and Pre-Training



  1. Graph Representation Learning: Embedding, GNNs, and Pre-Training. Yuxiao Dong, https://ericdongyx.github.io/, Microsoft Research, Redmond.

  2. Joint work with: Jiezhong Qiu (Tsinghua, advised by Jie Tang), Ziniu Hu (UCLA, advised by Yizhou Sun), Hongxia Yang (Alibaba), Jing Zhang (Renmin U. of China), Jie Tang (Tsinghua), Yizhou Sun (UCLA), Hao Ma (Facebook AI), and Kuansan Wang (Microsoft Research).

  3. Why Graphs?

  4. Graphs are everywhere: office/social graphs, biological neural networks, the academic graph, knowledge graphs, the Internet, transportation networks. (figure credit: Web)

  5. The Graph Mining Paradigm. $y_{ik}$: node $v_i$'s $k$-th feature, e.g., $v_i$'s PageRank value. Feature engineering produces a hand-crafted feature matrix $X$, which is fed to machine learning models for graph & network applications: node classification, link prediction, community detection, anomaly detection, social influence, graph evolution, ... (see the sketch below). Structural Diversity and Homophily: A Study Across More Than One Hundred Big Networks. KDD 2017.
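To make the paradigm concrete, here is a minimal sketch, assuming networkx and scikit-learn as stand-ins for the feature-engineering and modeling stages; the karate-club labels are toy placeholders, not from the slides:

```python
# Hand-crafted feature paradigm: engineer structural features per node,
# stack them into a feature matrix X, and feed X to an off-the-shelf model.
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression

G = nx.karate_club_graph()
pr = nx.pagerank(G)                          # e.g., each node's PageRank value
X = np.array([[G.degree(v), pr[v], nx.clustering(G, v)] for v in G])
y = np.array([G.nodes[v]["club"] == "Mr. Hi" for v in G])  # toy node labels

clf = LogisticRegression().fit(X, y)         # downstream task: node classification
print(clf.score(X, y))
```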

  6. Graph Representation Learning. Input: a network $G = (V, E)$. Output: $Z \in \mathbb{R}^{|V| \times d}$ with $d \ll |V|$, i.e., a $d$-dimensional vector $z_v$ for each node $v$. The hand-crafted feature matrix is replaced by a learned latent feature matrix $Z$, which feeds the same machine learning models for graph & network applications: node classification, link prediction, community detection, anomaly detection, social influence, graph evolution, ...

  7. Application: Embedding the Heterogeneous Academic Graph, i.e., graph representation learning applied to the Microsoft Academic Graph. 1. https://academic.microsoft.com/ 2. Kuansan Wang et al. Microsoft Academic Graph: When experts are not enough. Quantitative Science Studies 1 (1), 396-413, 2020. 3. Dong et al. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD 2017. 4. Code & data for metapath2vec: https://ericdongyx.github.io/metapath2vec/m2v.html

  8. Application: Similarity Search & Recommendation. Figure: universities retrieved as similar to one another in the embedding space (Johns Hopkins, Harvard, Stanford, UChicago, Yale, Columbia). 1. https://academic.microsoft.com/ 2. Kuansan Wang et al. Microsoft Academic Graph: When experts are not enough. Quantitative Science Studies 1 (1), 396-413, 2020. 3. Dong et al. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD 2017. 4. Code & data for metapath2vec: https://ericdongyx.github.io/metapath2vec/m2v.html

  9. Application: Reasoning about Diabetes from MAG. Figure: entities related to diabetes, linked by Symptom, Cause, and Treatment relations.

  10. Application: Reasoning about COVID-19 from MAG. Figure: entities around COVID-19/SARS-CoV-2 (Coronavirus, MERS, Zika Virus, Ebola Virus; Wasting, Asymptomatic, Diarrhea, Rash, Abdominal pain; Oseltamivir, Lamivudine, Azithromycin, Antiviral drug, Post-exposure prophylaxis), linked by Symptom, Cause, and Treatment relations.

  11. Graph Representation Learning roadmap: Network Embedding, Matrix Factorization, GNNs, Pre-Training.

  12. Network Embedding. Feature learning with skip-gram over sequences of objects: words in text, nodes in graphs. A context window $w_{t-2}, w_{t-1}, w_t, w_{t+1}, w_{t+2}$ slides over each sequence. 1. Mikolov et al. Efficient estimation of word representations in vector space. In ICLR 2013. 2. Perozzi et al. DeepWalk: Online learning of social representations. In KDD'14, pp. 701–710.

  13. Distributional Hypothesis of Harris. Word embedding: words in similar contexts have similar meanings (e.g., skip-gram in word embedding). Node embedding: nodes in similar structural contexts are similar. DeepWalk: structural contexts are defined by co-occurrence over random walk paths, as in the sketch below. Harris, Z. (1954). Distributional structure. Word, 10(23): 146-162.
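A minimal sketch of those structural contexts, assuming a toy adjacency-list graph; the graph and walk parameters are illustrative:

```python
# DeepWalk-style context generation: uniform random walks over the graph;
# sliding windows over each walk define a node's structural context.
import random

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # toy adjacency list

def random_walk(start, length):
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))   # uniform next hop
    return walk

walks = [random_walk(v, length=10) for _ in range(5) for v in graph]
print(walks[0])   # e.g., [0, 2, 3, 2, 1, 0, 1, 2, 0, 2]
```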

  14. The Objective. $\mathcal{L} = \sum_{w \in V} \sum_{c \in N(w)} -\log p(c \mid w)$, where $p(c \mid w) = \frac{\exp(z_c^\top z_w)}{\sum_{u \in V} \exp(z_u^\top z_w)}$. $\mathcal{L}$ maximizes the likelihood of node co-occurrence on a random walk path; $p(c \mid w)$ is the probability that node $w$ and context $c$ appear together on a random walk path.
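The same objective on toy embeddings, as a small numpy sketch; it uses separate input/context matrices as in word2vec, and all sizes are illustrative:

```python
# Skip-gram objective: L = sum over (w, c) co-occurrences of -log p(c|w),
# with p(c|w) a softmax over dot products of embeddings.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8
Z = rng.normal(size=(n, d))    # node ("input") embeddings
C = rng.normal(size=(n, d))    # context ("output") embeddings

def p_context_given_node(c, w):
    scores = C @ Z[w]          # dot product with every candidate context
    scores -= scores.max()     # numerical stability
    return np.exp(scores[c]) / np.exp(scores).sum()

walk, window = [0, 2, 1, 3], 1
loss = sum(-np.log(p_context_given_node(walk[j], w))
           for i, w in enumerate(walk)
           for j in range(max(0, i - window), min(len(walk), i + window + 1))
           if j != i)
print(loss)                    # the quantity training drives down
```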

  15. Network Embedding: Random Walk + Skip-Gram. Random walk strategies: DeepWalk (walk length > 1); LINE (walk length = 1); PTE (walk length = 1); node2vec (biased random walks); metapath2vec (heterogeneous random walks). The resulting walks are fed to skip-gram, as in the sketch below. 1. Perozzi et al. DeepWalk: Online learning of social representations. In KDD'14. Most cited paper in KDD'14. 2. Tang et al. LINE: Large-scale information network embedding. In WWW'15. Most cited paper in WWW'15. 3. Grover and Leskovec. node2vec: Scalable feature learning for networks. In KDD'16. 2nd most cited paper in KDD'16. 4. Dong et al. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD 2017. Most cited paper in KDD'17.
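Walks from a DeepWalk-style generator can be handed directly to word2vec's skip-gram implementation; a sketch assuming gensim, with illustrative hyperparameters:

```python
# DeepWalk = random walks + skip-gram with negative sampling; each walk is
# treated as a "sentence" of node tokens.
import random
from gensim.models import Word2Vec

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # toy adjacency list

def random_walk(start, length=10):
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

sentences = [[str(v) for v in random_walk(v)] for _ in range(20) for v in graph]
model = Word2Vec(sentences,
                 vector_size=64,  # embedding dimension d
                 window=5,        # context window size T
                 sg=1,            # use skip-gram
                 negative=5,      # b negative samples
                 min_count=0)
print(model.wv["0"])              # learned 64-dim vector for node 0
```

LINE, node2vec, and metapath2vec differ mainly in how the sequences are generated (walk length 1, biased second-order walks, meta-path-guided walks), not in the skip-gram step.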

  16. Graph Representation Learning roadmap: Network Embedding (DeepWalk, LINE, node2vec, PTE, metapath2vec, ...), Matrix Factorization, GNNs, Pre-Training.

  17. NetMF: Network Embedding as Matrix Factorization. Covers DeepWalk, LINE, PTE, and node2vec. Notation: $A$: adjacency matrix; $D$: degree matrix; $\mathrm{vol}(G) = \sum_i \sum_j A_{ij}$: volume of $G$; $b$: number of negative samples; $T$: context window size. 1. Qiu et al. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM'18.

  18. Understanding Random Walk + Skip-Gram. Levy and Goldberg showed that skip-gram with negative sampling implicitly factorizes the matrix with entries $\log\frac{\#(w,c)\,|\mathcal{D}|}{b\,\#(w)\,\#(c)}$. NLP language: $\#(w,c)$: co-occurrence count of $w$ and $c$; $\#(w)$: occurrence count of word $w$; $\#(c)$: occurrence count of context $c$; $\mathcal{D}$: the multiset of word-context pairs $(w,c)$; $|\mathcal{D}|$: the number of word-context pairs. Graph language: $G$: graph; $A$: adjacency matrix; $D$: degree matrix; $\mathrm{vol}(G)$: volume of $G$. Levy and Goldberg. Neural word embeddings as implicit matrix factorization. In NIPS 2014.
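A sketch of that matrix on a toy corpus: count word-context pairs within a window of size $T$ and fill in $\log\frac{\#(w,c)\,|\mathcal{D}|}{b\,\#(w)\,\#(c)}$; the corpus and constants are illustrative:

```python
# Build the (negative-sampling-shifted) PMI matrix that SGNS implicitly factorizes.
from collections import Counter
import numpy as np

corpus = [["a", "b", "c", "a", "b"], ["b", "c", "a"]]   # toy "walks"/sentences
T, b = 2, 5                                             # window size, #negative samples

pairs = [(s[i], s[j]) for s in corpus for i in range(len(s))
         for j in range(max(0, i - T), min(len(s), i + T + 1)) if j != i]
pair_cnt = Counter(pairs)                               # #(w, c)
w_cnt = Counter(w for w, _ in pairs)                    # #(w)
c_cnt = Counter(c for _, c in pairs)                    # #(c)
vocab, D = sorted(w_cnt), len(pairs)                    # |D| = number of pairs

M = np.array([[np.log(pair_cnt[w, c] * D / (b * w_cnt[w] * c_cnt[c]))
               if pair_cnt[w, c] > 0 else -np.inf      # unseen pairs: log 0
               for c in vocab] for w in vocab])
print(np.round(M, 2))
```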

  19. Understanding Random Walk + Skip-Gram. Distinguish direction and distance: partition the multiset $\mathcal{D}$ into several sub-multisets according to the way in which each node and its context appear in a random walk node sequence. More formally, for $r = 1, 2, \cdots, T$, define $\mathcal{D}_{\overrightarrow{r}}$ (resp. $\mathcal{D}_{\overleftarrow{r}}$) as the sub-multiset of pairs $(w, c)$ in which the context $c$ appears $r$ positions after (resp. before) $w$ in the walk.

  20. Understanding Random Walk + Skip-Gram. Let the length of the random walk $L \to \infty$; the co-occurrence statistics then converge to closed-form expectations, which enables the analysis below.

  21. Understanding Random Walk + Skip-Gram. Graph language: $A$: adjacency matrix; $D$: degree matrix; $\mathrm{vol}(G) = \sum_i \sum_j A_{ij}$: volume of $G$; $b$: number of negative samples; $T$: context window size.

  22. Understanding Random Walk + Skip-Gram. Putting the pieces together: DeepWalk is asymptotically and implicitly factorizing $\log\left(\frac{\mathrm{vol}(G)}{bT}\left(\sum_{r=1}^{T}(D^{-1}A)^r\right)D^{-1}\right)$, where $A$ is the adjacency matrix, $D$ the degree matrix, $\mathrm{vol}(G) = \sum_i \sum_j A_{ij}$, $b$ the number of negative samples, and $T$ the context window size. 1. Qiu et al. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM'18.

  23. Unifying DeepWalk, LINE, PTE, & node2vec as Matrix Factorization: each method admits a closed-form matrix that it implicitly factorizes. Qiu et al. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM'18. The most cited paper in WSDM'18 as of May 2019.

  24. NetMF: Explicitly Factorizing the DeepWalk Matrix. Rather than letting random walk + skip-gram factorize the matrix above implicitly, NetMF constructs it and factorizes it explicitly. 1. Qiu et al. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM'18. 2. Code & data for NetMF: https://github.com/xptree/NetMF

  25. NetMF. Step 1, construction: $M = \frac{\mathrm{vol}(G)}{bT}\left(\sum_{r=1}^{T}(D^{-1}A)^r\right)D^{-1}$, followed by the element-wise truncated logarithm $M' = \log(\max(M, 1))$. Step 2, factorization: a truncated SVD of $M'$ yields the node embeddings; see the sketch below. 1. Qiu et al. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM'18. 2. Code & data for NetMF: https://github.com/xptree/NetMF
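A minimal dense sketch of these two steps, following the formula above; the toy graph, $T$, $b$, and embedding dimension are illustrative:

```python
# NetMF, dense version: (1) construct the DeepWalk matrix and apply the
# element-wise log-max; (2) factorize it with truncated SVD.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # toy adjacency matrix
T, b, d = 3, 1, 2                           # window size, #negatives, embedding dim

vol = A.sum()                               # vol(G) = sum_ij A_ij
D_inv = np.diag(1.0 / A.sum(axis=1))
P = D_inv @ A                               # random-walk transition matrix D^-1 A

S = sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1))
M = vol / (b * T) * S @ D_inv               # step 1: construction
M_log = np.log(np.maximum(M, 1.0))          # truncated log avoids log(0)

U, s, _ = np.linalg.svd(M_log)              # step 2: factorization
Z = U[:, :d] * np.sqrt(s[:d])               # d-dimensional node embeddings
print(Z)
```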

  26. Results. Explicit matrix factorization (NetMF) offers performance gains over implicit matrix factorization (DeepWalk & LINE). 1. Qiu et al. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM'18. 2. Code & data for NetMF: https://github.com/xptree/NetMF

  27. Network Embedding summary. Random walk + skip-gram (DeepWalk, LINE, node2vec, metapath2vec) is equivalent to (dense) matrix factorization (NetMF): input the adjacency matrix $A$, factorize $M = f(A)$, output the embedding vectors $Z$. 1. Qiu et al. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM'18. 2. Code & data for NetMF: https://github.com/xptree/NetMF

  28. Challenge? The matrix $M = f(A)$ is dense, with $O(n^2)$ non-zeros, and factorizing it costs $O(n^3)$ time.

  29. NetMF: how can we solve this issue? Both the construction of the dense matrix $M$ and its factorization are bottlenecks. 1. Qiu et al. NetSMF: Large-scale network embedding as sparse matrix factorization. In WWW 2019. 2. Code & data for NetSMF: https://github.com/xptree/NetSMF

  30. NetSMF: NetMF made sparse. Step 1, sparse construction: approximate the DeepWalk matrix with a sparse matrix via random-walk path sampling. Step 2, sparse factorization: run truncated SVD on the sparse approximation. A simplified sketch follows. 1. Qiu et al. NetSMF: Large-scale network embedding as sparse matrix factorization. In WWW 2019. 2. Code & data for NetSMF: https://github.com/xptree/NetSMF
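A heavily simplified illustration of the sparse idea, assuming scipy; this Monte-Carlo estimator is a stand-in for the paper's PathSampling-based sparsifier, not the actual algorithm:

```python
# NetSMF idea in miniature: estimate the DeepWalk matrix sparsely by sampling
# short random-walk pairs instead of computing dense matrix powers, then run
# truncated SVD on the resulting sparse matrix.
import random
from collections import Counter
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}   # toy adjacency list
n, T, b, d, samples = 4, 3, 1, 2, 20000
deg = {v: len(nbrs) for v, nbrs in graph.items()}
vol = sum(deg.values())

counts = Counter()
for _ in range(samples):              # sample (u, v) pairs r <= T steps apart
    u = v = random.randrange(n)
    for _ in range(random.randint(1, T)):
        v = random.choice(graph[v])
    counts[u, v] += 1

rows, cols, vals = [], [], []
for (u, v), c in counts.items():      # unbiased estimate of each matrix entry
    rows.append(u)
    cols.append(v)
    vals.append(c * n * vol / (b * samples * deg[v]))
M = csr_matrix((vals, (rows, cols)), shape=(n, n))
M.data = np.log(np.maximum(M.data, 1.0))   # log-max applied to non-zeros only
M.eliminate_zeros()

U, s, _ = svds(M, k=d)                # sparse truncated SVD
Z = U * np.sqrt(s)                    # d-dimensional node embeddings
print(Z)
```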
