metapath2vec: Scalable Representation Learning for Heterogeneous Networks
Yuxiao Dong, Nitesh V. Chawla, Ananthram Swami
Microsoft Research, University of Notre Dame, Army Research Lab & Notre Dame
Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame
Conventional Network Mining and Learning
Pipeline: network → hand-crafted feature matrix (feature engineering) → machine learning models
Network Mining Tasks
♣ node attribute inference
♣ community detection
♣ similarity search
♣ link prediction
♣ social recommendation
♣ …
Network Embedding for Mining and Learning
Pipeline: network → latent representation matrix X (feature learning) → machine learning models
Network Mining Tasks
♣ node attribute inference
♣ community detection
♣ similarity search
♣ link prediction
♣ social recommendation
♣ …
Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE TPAMI, 35(8):1798-1828, 2013.
Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436-444, 2015.
Word Embedding in NLP
♣ Input: a text corpus $D = \{W\}$
♣ Output: $X \in \mathbb{R}^{|W| \times d}$, $d \ll |W|$, a d-dim vector $X_w$ for each word w.
[Figure: sentences from a text corpus are fed to word2vec (input, hidden, and output layers), which predicts the context words $w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2}$ of a word $w_i$ and yields the latent representation vector X.]
♣ Words that are close to each other in a sentence or document (a word and its context words) exhibit interrelations in human natural language.
1. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS ’13, pp. 3111-3119.
2. T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013.
Network Embedding
♣ Input: a network $G = (V, E)$
♣ Output: $X \in \mathbb{R}^{|V| \times d}$, $d \ll |V|$, a d-dim vector $X_v$ for each node v.
[Figure: random walk paths (sequences of nodes such as v1, v2, v3, v5) play the role of sentences for word2vec (input, hidden, and output layers), which predicts the context nodes $c_{i-2}, c_{i-1}, c_{i+1}, c_{i+2}$ of a node v and yields the latent representation vector X. This is the DeepWalk approach [Perozzi et al., KDD ’14].]
1. B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. In KDD ’14, pp. 701-710.
2. A. Grover and J. Leskovec. node2vec: Scalable Feature Learning for Networks. In KDD ’16, pp. 855-864.
3. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS ’13, pp. 3111-3119.
4. T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013.
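The "random walk paths as sentences" idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DeepWalk reference implementation: the function name `uniform_random_walks`, the toy adjacency list `toy_adj`, and the parameter values are all hypothetical.

```python
import random


def uniform_random_walks(adj, num_walks, walk_length, seed=0):
    """Generate truncated uniform random walks (DeepWalk-style sketch).

    adj maps each node to a list of its neighbors. Every node is used as a
    start node num_walks times; each walk has at most walk_length nodes.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in sorted(adj):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy network (undirected, stored as adjacency lists).
toy_adj = {
    "v1": ["v2", "v3", "v5"],
    "v2": ["v1", "v3"],
    "v3": ["v1", "v2", "v5"],
    "v5": ["v1", "v3"],
}

walks = uniform_random_walks(toy_adj, num_walks=2, walk_length=5)
for walk in walks[:3]:
    print(" ".join(walk))
```

Each walk is then treated like a sentence: a node and the nodes within a fixed window around it form the (node, context) pairs that skip-gram is trained on.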
Heterogeneous Network Embedding: Problem
♣ Input: a heterogeneous information network $G = (V, E, T)$
♣ Output: $X \in \mathbb{R}^{|V| \times d}$, $d \ll |V|$, a d-dim vector $X_v$ for each node v.
[Figure: heterogeneous network → ? → latent representation vector X]
Heterogeneous Network Embedding: Challenges
♣ How do we effectively preserve the concept of “node-context” among multiple types of nodes, e.g., authors, papers, and venues in academic heterogeneous networks?
♣ Can we directly apply homogeneous network embedding architectures to heterogeneous networks?
♣ It is also difficult for conventional meta-path-based methods to model similarities between nodes without connected meta-paths.
Heterogeneous Network Embedding: Solutions
♣ metapath2vec: meta-path-based random walks + skip-gram
♣ metapath2vec++: meta-path-based random walks + heterogeneous skip-gram
metapath2vec
meta-path-based random walks + skip-gram
[Figure: skip-gram architecture in metapath2vec: a |V|-dim one-hot input layer over all nodes (venues KDD, ACL; authors a1 to a5; organizations MIT, CMU; papers p1 to p3), a hidden layer, and a single |V| x k output layer giving the probability that each node, from KDD through p3, appears in the context.]
1. Y. Sun, J. Han. Mining heterogeneous information networks: Principles and Methodologies. Morgan & Claypool Publishers, 2012.
2. T. Mikolov, et al. Distributed representations of words and phrases and their compositionality. In NIPS ’13.
metapath2vec: Meta-Path-Based Random Walks
Goal: to generate paths that capture both the semantic and structural correlations between different types of nodes, facilitating the transformation of heterogeneous network structures into skip-gram.
metapath2vec: Meta-Path-Based Random Walks
♣ Given a meta-path scheme
♣ The transition probability at step i is defined as
♣ Recursive guidance for random walkers, i.e.,
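For reference, the three items above can be written out following the formulation in the metapath2vec paper (Dong, Chawla, and Swami, KDD ’17). The notation is taken from that paper and is an assumption relative to this slide: $E$ is the edge set of the network, $V_t$ is the set of nodes of the t-th type in the scheme, $v^i_t$ is the walker's node at step $i$ (of type $V_t$), $\phi(\cdot)$ maps a node to its type, and $N_{t+1}(v^i_t)$ is the set of neighbors of $v^i_t$ that have type $V_{t+1}$.

```latex
% Meta-path scheme of length l
\mathcal{P}:\; V_1 \xrightarrow{R_1} V_2 \xrightarrow{R_2} \cdots
  V_t \xrightarrow{R_t} V_{t+1} \cdots \xrightarrow{R_{l-1}} V_l

% Transition probability at step i
p(v^{i+1} \mid v^i_t, \mathcal{P}) =
\begin{cases}
  \dfrac{1}{|N_{t+1}(v^i_t)|} & (v^{i+1}, v^i_t) \in E,\ \phi(v^{i+1}) = t+1 \\[6pt]
  0 & (v^{i+1}, v^i_t) \in E,\ \phi(v^{i+1}) \neq t+1 \\[2pt]
  0 & (v^{i+1}, v^i_t) \notin E
\end{cases}

% Recursive guidance: a symmetric scheme (V_1 = V_l) wraps around
p(v^{i+1} \mid v^i_t) = p(v^{i+1} \mid v^i_1), \quad \text{if } t = l
```

In words: the walker moves uniformly at random among the neighbors whose type matches the next position in the scheme, and when it reaches the end of a symmetric scheme it continues from the beginning.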
metapath2vec: Meta-Path-Based Random Walks
♣ Given a meta-path scheme (example): OAPVPAO
♣ In a traditional random walk procedure, in the toy example, the next step of a walker on node a4, having transitioned from node CMU, can be any type of node surrounding it: a2, a3, a5, p2, p3, and CMU.
♣ Under the meta-path scheme ‘OAPVPAO’, for example, the walker is biased towards paper nodes (P) given its previous step on an organization node CMU (O), following the semantics of this meta-path.
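The a4 example can be reproduced with a short sketch of a meta-path-guided walker. Everything here is illustrative rather than the authors' released code: the function `metapath_walk`, the edge list, and the node-type table are hypothetical, and the toy network is only arranged so that a4 has the neighbors named on the slide.

```python
import random


def metapath_walk(adj, node_type, start, scheme, walk_length, rng):
    """Walk that only steps to neighbors of the type the scheme asks for next.

    Assumes a symmetric scheme (first type == last type), so the walker can
    wrap around and keep following the pattern after reaching the end.
    """
    if scheme[0] != scheme[-1]:
        raise ValueError("this sketch assumes a symmetric meta-path scheme")
    if node_type[start] != scheme[0]:
        raise ValueError("start node must have the first type of the scheme")
    pattern = scheme[:-1]  # "OAPVPAO" repeats as O A P V P A | O A P V P A | ...
    walk = [start]
    while len(walk) < walk_length:
        next_type = pattern[len(walk) % len(pattern)]
        candidates = [u for u in adj[walk[-1]] if node_type[u] == next_type]
        if not candidates:  # no neighbor of the required type: stop early
            break
        walk.append(rng.choice(candidates))  # uniform over matching neighbors
    return walk


# Hypothetical toy academic network: organizations, authors, papers, venues.
edges = [
    ("MIT", "a1"), ("MIT", "a2"), ("MIT", "a3"), ("CMU", "a4"), ("CMU", "a5"),
    ("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2"),
    ("a4", "p2"), ("a4", "p3"), ("a5", "p3"),
    ("p1", "ACL"), ("p2", "KDD"), ("p3", "KDD"),
    ("a4", "a2"), ("a4", "a3"), ("a4", "a5"),  # co-author links
]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

node_type = {"MIT": "O", "CMU": "O", "KDD": "V", "ACL": "V"}
node_type.update({f"a{i}": "A" for i in range(1, 6)})
node_type.update({f"p{i}": "P" for i in range(1, 4)})

# A plain random walker on a4 may step to any neighbor ...
all_neighbors_of_a4 = sorted(adj["a4"])
# ... while under OAPVPAO, coming from CMU (O), only paper neighbors qualify.
paper_neighbors_of_a4 = sorted(u for u in adj["a4"] if node_type[u] == "P")

walk = metapath_walk(adj, node_type, "CMU", "OAPVPAO", 13, random.Random(0))
walk_types = "".join(node_type[n] for n in walk)
print(walk, walk_types)
```

Choosing uniformly among the type-matching neighbors is one simple way to realize the biased step; the walk's type sequence then follows the scheme and wraps around at the final organization node.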
metapath2vec
meta-path-based random walks + skip-gram
[Figure: skip-gram architecture in metapath2vec, with a |V|-dim input layer, a hidden layer, and a single |V| x k output layer over all node types.]
♣ The potential issue of skip-gram for heterogeneous network embedding: to predict the context node $c_t$ (type t) given a node v, metapath2vec encourages all types of nodes to appear in this context position.
metapath2vec++
meta-path-based random walks + heterogeneous skip-gram
[Figure: heterogeneous skip-gram architecture in metapath2vec++: a |V|-dim one-hot input layer over all nodes, a hidden layer, and a type-specific output layer with one block per node type: venues (|V_V| x k_V, e.g., prob. that KDD or ACL appears), authors (|V_A| x k_A, e.g., prob. that a3 or a5 appears), organizations (|V_O| x k_O, e.g., prob. that CMU appears), and papers (|V_P| x k_P, e.g., prob. that p2 or p3 appears).]
metapath2vec++: Heterogeneous Skip-Gram
♣ softmax in metapath2vec
♣ softmax in metapath2vec++
♣ objective function (heterogeneous negative sampling)
♣ stochastic gradient descent
[Figure: heterogeneous skip-gram architecture in metapath2vec++, with type-specific output layers for venues, authors, organizations, and papers.]
1. T. Mikolov, et al. Distributed representations of words and phrases and their compositionality. In NIPS ’13.
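The difference between the two softmax functions can be illustrated numerically. The sketch below assumes, for simplicity, a single embedding table and a dot-product score: the metapath2vec softmax normalizes over every node in V, while the metapath2vec++ softmax normalizes only over nodes that share the context node's type. The embeddings, node names, and helper functions are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_prob(emb, v, c, candidates):
    """p(c | v), normalized over the given candidate context nodes."""
    weights = {u: math.exp(dot(emb[u], emb[v])) for u in candidates}
    return weights[c] / sum(weights.values())


# Hypothetical node types and 2-d embeddings (illustration only).
node_type = {"a4": "A", "a5": "A", "p2": "P", "p3": "P",
             "KDD": "V", "ACL": "V", "CMU": "O"}
emb = {
    "a4": [0.5, 0.1], "a5": [0.4, 0.3],
    "p2": [0.2, 0.6], "p3": [0.1, 0.7],
    "KDD": [0.6, 0.2], "ACL": [-0.3, 0.4],
    "CMU": [0.3, 0.3],
}

v, c = "a4", "KDD"  # center node and a context node of type V (venue)
all_nodes = list(emb)
same_type = [u for u in emb if node_type[u] == node_type[c]]

# metapath2vec: one multinomial distribution over all nodes.
p_plain = softmax_prob(emb, v, c, all_nodes)
# metapath2vec++: one multinomial distribution per node type.
p_typed = softmax_prob(emb, v, c, same_type)
print(p_plain, p_typed)
```

Because the type-specific normalizer sums over a subset of the nodes, nodes of other types no longer compete for probability mass in a venue context position. In practice both models are trained with negative sampling rather than the full softmax; in the heterogeneous variant the negative samples are drawn from the same node type as the context node.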
metapath2vec++
♣ every sub-procedure is easy to parallelize
♣ 24-32X speedup by using 40 cores
[Figure: speedup versus #threads (1 to 40) for metapath2vec and metapath2vec++.]
Network Mining and Learning Paradigm
Pipeline: network → metapath2vec / metapath2vec++ → latent representation vector X → network applications
Network Applications
♣ node attribute inference
♣ community detection
♣ similarity search
♣ link prediction
♣ social recommendation
♣ …
Experiments
Heterogeneous Data
♣ AMiner Academic Network
o 9 1.7 million authors
o 3 million papers
o 3800+ venues
o 8 research areas
Baselines
♣ DeepWalk [KDD ’14]
♣ node2vec [KDD ’16]
♣ LINE [WWW ’15]
♣ PTE [KDD ’15]
Mining Tasks
♣ node classification
o logistic regression
♣ node clustering
o k-means
♣ similarity search
o cosine similarity
Parameters
♣ #walks: 1000
♣ walk-length: 100
♣ #dimensions: 128
♣ neighborhood size: 7
J. Tang, et al. ArnetMiner: Extraction and Mining of Academic Social Networks. In KDD 2008.
https://aminer.org/aminernetwork
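The similarity-search task ranks nodes by the cosine similarity of their learned vectors. A minimal sketch, assuming embeddings are stored in a plain dictionary: the helper names, the 2-d vectors, and the placeholder venues `venue_x` and `venue_y` are hypothetical, and real embeddings would have 128 dimensions as listed under Parameters.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(emb, query, k):
    """Return the k nodes whose vectors are closest to the query node's."""
    scored = [(cosine(emb[query], vec), name)
              for name, vec in emb.items() if name != query]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical venue embeddings (illustration only).
venue_emb = {
    "KDD": [1.0, 0.1],
    "venue_x": [0.9, 0.2],
    "ACL": [0.1, 1.0],
    "venue_y": [0.2, 0.9],
}

top2_for_kdd = most_similar(venue_emb, "KDD", k=2)
print(top2_for_kdd)
```

For a real run, the same ranking would be restricted to nodes of the query's type (for example, venues only) so that the results are comparable.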
Application 1: Multi-Class Node Classification
Application 2: Node Clustering
http://projector.tensorflow.org/
Application 3: Similarity Search
Visualization
word2vec [Mikolov, 2013]
http://projector.tensorflow.org/
♣ Problem: Heterogeneous Network Embedding
♣ Models: metapath2vec & metapath2vec++
♧ The automatic discovery of internal semantic relationships between different types of nodes in heterogeneous networks
♣ Applications: classification, clustering, & similarity search
Thank you!
Data & Code: https://ericdongyx.github.io/metapath2vec/m2v.html