CS224W: Analysis of Networks
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Machine Learning task: node classification. [Figure: a network with several unlabeled nodes marked "?"]
• (Supervised) Machine Learning Lifecycle requires feature engineering every single time ("this feature, that feature!").
[Pipeline figure: Raw Data → (Feature Engineering) → Structured Data → Learning Algorithm → Model → Downstream prediction task; goal: automatically learn the features instead of engineering them by hand]
Goal: Efficient task-independent feature learning for machine learning in networks!

$$f: u \rightarrow \mathbb{R}^d$$

[Figure: node $u$ in the network is mapped to its feature representation (embedding) in $\mathbb{R}^d$]
• We map each node in a network into a low-dimensional space
§ Distributed representation for nodes
§ Similarity between nodes indicates link strength
§ Encode network information and generate node representation
• Example: Zachary's Karate Club network and its node embedding [figure]
Graph representation learning is hard:
• Images are fixed size
§ Convolutions (CNNs)
• Text is linear
§ Sliding window (word2vec)
• Graphs are neither of these!
§ Node numbering is arbitrary (node isomorphism problem)
§ Much more complicated structure
node2vec: Random Walk Based (Unsupervised) Feature Learning

node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. KDD 2016.
• Goal: Embed nodes with similar network neighborhoods close in the feature space.
• We frame this goal as a prediction-task-independent maximum likelihood optimization problem.
• Key observation: A flexible notion of the network neighborhood $N_S(u)$ of node $u$ leads to rich features.
• Develop a biased 2nd-order random walk procedure $S$ to generate the network neighborhood $N_S(u)$ of node $u$.
• Intuition: Find an embedding of nodes to $d$-dimensions that preserves similarity
• Idea: Learn node embedding such that nearby nodes are close together
• Given a node $u$, how do we define nearby nodes?
§ $N_S(u)$: neighbourhood of $u$ obtained by some strategy $S$
• Given $G = (V, E)$,
• Our goal is to learn a mapping $f: u \rightarrow \mathbb{R}^d$.
• Log-likelihood objective:
$$\max_f \sum_{u \in V} \log \Pr(N_S(u) \mid f(u))$$
§ where $N_S(u)$ is the neighborhood of node $u$.
• Given node $u$, we want to learn feature representations predictive of nodes in its neighborhood $N_S(u)$.
$$\max_f \sum_{u \in V} \log \Pr(N_S(u) \mid f(u))$$
• Assumption: Conditional likelihood factorizes over the set of neighbors:
$$\log \Pr(N_S(u) \mid f(u)) = \sum_{n_i \in N_S(u)} \log \Pr(f(n_i) \mid f(u))$$
• Softmax parametrization:
$$\Pr(f(n_i) \mid f(u)) = \frac{\exp(f(n_i) \cdot f(u))}{\sum_{v \in V} \exp(f(v) \cdot f(u))}$$
$$\max_f \sum_{u \in V} \sum_{n_i \in N_S(u)} \log \frac{\exp(f(n_i) \cdot f(u))}{\sum_{v \in V} \exp(f(v) \cdot f(u))}$$
• Maximize the objective using Stochastic Gradient Descent with negative sampling.
§ Computing the summation over all nodes is expensive
§ Idea: Just sample a couple of "negative" nodes (see the sketch below)
§ This means at each iteration only the embeddings of a few nodes are updated
§ Much faster training of embeddings
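Because the softmax denominator sums over all nodes, each update would otherwise touch every embedding. A minimal sketch of one negative-sampling SGD step, assuming embeddings are stored as rows of a NumPy matrix `f` (uniform negative sampling and the learning rate are illustrative simplifications; node2vec follows word2vec, which samples negatives from a smoothed unigram distribution):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(f, u, n, num_neg=5, lr=0.025):
    """One negative-sampling SGD step on the embedding matrix f
    (shape: num_nodes x d). Pulls f[n] toward f[u] for a node n
    observed in N_S(u), and pushes a few sampled negatives away."""
    # Positive pair: gradient of -log sigmoid(f(n) . f(u))
    score = sigmoid(f[n] @ f[u])
    grad_u = (score - 1.0) * f[n]
    f[n] -= lr * (score - 1.0) * f[u]

    # Negatives: uniform here for brevity (may occasionally hit u or n;
    # real implementations filter these and use a unigram distribution).
    for v in rng.integers(0, f.shape[0], size=num_neg):
        score = sigmoid(f[v] @ f[u])
        grad_u += score * f[v]
        f[v] -= lr * score * f[u]

    f[u] -= lr * grad_u
```

Only the rows for $u$, $n_i$, and the few sampled negatives are touched per step, which is what makes training fast.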
Two classic strategies to define a neighborhood $N_S(u)$ of a given node $u$:
[Figure: starting node $u$ with nearby nodes $s_1, \dots, s_9$; BFS explores locally, DFS moves outward]
$N_{BFS}(u) = \{s_1, s_2, s_3\}$: local, microscopic view
$N_{DFS}(u) = \{s_4, s_5, s_6\}$: global, macroscopic view
[Figure: BFS and DFS exploration starting from node $u$]
BFS: micro-view of neighbourhood
DFS: macro-view of neighbourhood
Biased random walk $S$ that, given a node $u$, generates neighborhood $N_S(u)$
• Two parameters:
§ Return parameter $p$: return back to the previous node
§ In-out parameter $q$: moving outwards (DFS) vs. inwards (BFS)
Biased 2nd-order random walks explore network neighborhoods:
[Figure: the walk just traversed $u \rightarrow s_4$ and must choose the next node; unnormalized transition weights from $s_4$: back to $u$ with weight $1/p$, to $s_1$ (a common neighbor of $u$ and $s_4$) with weight $1$, to $s_5$ (farther from $u$) with weight $1/q$]
§ BFS-like walk: low value of $p$
§ DFS-like walk: low value of $q$
$p, q$ can be learned in a semi-supervised way (see the sketch below)
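A minimal sketch of one step of this biased walk, assuming `G` is an adjacency dict mapping each node to a set of neighbors (unweighted edges):

```python
import random

def biased_step(G, prev, curr, p, q):
    """Choose the next node of a 2nd-order walk that just moved prev -> curr.
    Unnormalized weights: 1/p to return to prev, 1 to a common neighbor
    of prev and curr, 1/q to move farther away from prev."""
    neighbors = list(G[curr])
    weights = []
    for x in neighbors:
        if x == prev:            # distance 0 from prev: return
            weights.append(1.0 / p)
        elif x in G[prev]:       # distance 1 from prev: stay close
            weights.append(1.0)
        else:                    # distance 2 from prev: move outward
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]
```

Low `p` makes returning cheap, keeping the walk near the start (BFS-like); low `q` makes outward moves cheap (DFS-like). The paper's implementation precomputes these per-edge probabilities and draws from them in O(1) with alias sampling; this sketch computes them on the fly.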
• 1) Compute random walk probabilities
• 2) Simulate $r$ random walks of length $l$ starting from each node $u$
• 3) Optimize the node2vec objective using Stochastic Gradient Descent

Linear-time complexity. All 3 steps are individually parallelizable.
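A hedged end-to-end sketch of the three steps: it reuses the hypothetical `biased_step` from the previous sketch, folds step 1 (precomputing transition probabilities) into the walk itself, and hands the walks to gensim's word2vec implementation for step 3 (keyword names follow gensim 4.x):

```python
import random
from gensim.models import Word2Vec

# Toy adjacency dict (hypothetical); in practice build it from your network.
# Assumes every node has at least one neighbor.
G = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}

def simulate_walks(G, r, l, p, q):
    """Step 2: simulate r walks of length l from each node.
    Requires biased_step from the sketch above to be in scope."""
    walks = []
    for _ in range(r):
        for u in G:
            walk = [u, random.choice(sorted(G[u]))]  # bootstrap the 2nd-order walk
            while len(walk) < l:
                walk.append(biased_step(G, walk[-2], walk[-1], p, q))
            walks.append([str(v) for v in walk])
    return walks

# Step 3: skip-gram with negative sampling (sg=1, negative=5).
walks = simulate_walks(G, r=10, l=80, p=1.0, q=0.5)
model = Word2Vec(walks, vector_size=16, window=10, sg=1, negative=5, min_count=0)
print(model.wv["0"])  # learned embedding f(u) for node 0
```

Because each node's walks are independent, step 2 parallelizes trivially across nodes, and gensim parallelizes step 3 across worker threads.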
Case study: interactions of characters in a novel.
[Figure: two clusterings of the same network under different walk parameters]
$p=1, q=2$: microscopic view of the network neighbourhood
$p=1, q=0.5$: macroscopic view of the network neighbourhood
[Figure: Macro-F1 score (0.00 to 0.20) vs. fraction of missing edges (left panel) and fraction of additional edges (right panel), each from 0.0 to 0.6]
General-purpose feature learning in networks:
• An explicit locality-preserving objective for feature learning.
• Biased random walks capture the diversity of network patterns.
• Scalable and robust algorithm with excellent empirical performance.
• Future extensions: designing random walk strategies tailored to networks with specific structure, such as heterogeneous networks and signed networks.
OhmNet: Extension to Hierarchical Networks
Let's generalize node2vec to multilayer networks!
• Each network is a layer $G_i = (V_i, E_i)$
• Similarities between layers are given in a hierarchy $\mathcal{M}$; the map $\pi$ encodes its parent-child relationships (a toy encoding is sketched below)
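As a concrete (hypothetical) encoding of $\pi$: a dict mapping each hierarchy element to its parent, with the leaf layers numbered 1 to 4 as in the four-layer example on the following slides:

```python
# pi[i] = parent of hierarchy element i; elements 1-4 are the leaf layers,
# 5-6 are internal levels, and 7 is the root of the hierarchy M.
pi = {1: 5, 2: 5, 3: 6, 4: 6, 5: 7, 6: 7}
```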
• Computational framework that learns features of every node and at every scale based on:
§ Edges within each layer
§ Inter-layer relationships between nodes active on different layers
Input: layers $G_1, G_2, G_3, G_4$. Output: embeddings of nodes in the layers as well as in the internal levels of the hierarchy.
[Figure: four layers as leaves of a hierarchy with internal levels]
• OhmNet: Given layers $G_i$ and hierarchy $\mathcal{M}$, learn node features captured by functions $f_i$
• Functions $f_i$ embed every node in a $d$-dimensional feature space
[Figure: a multi-layer network with four layers and a two-level hierarchy $\mathcal{M}$]
• Given: layers $G_i$, hierarchy $\mathcal{M}$
§ Layers $G_i$, $i = 1, \dots, T$, are the leaves of $\mathcal{M}$
• Goal: Learn functions $f_i: V_i \rightarrow \mathbb{R}^d$
• Approach has two components:
§ Per-layer objectives: Nodes with similar network neighborhoods in each layer are embedded close together
§ Hierarchical dependency objectives: Nodes in nearby layers in the hierarchy are encouraged to share similar features (see the sketch below)
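A minimal sketch of the second component, assuming each hierarchy element $i$ keeps its own embedding table `f[i]`; the squared-distance penalty is one natural form consistent with the description above, not necessarily OhmNet's exact term:

```python
def hierarchy_penalty(f, pi, nodes_in):
    """Sum of squared distances between a node's embedding at hierarchy
    element i and at its parent pi[i]. f[i] maps node -> NumPy vector;
    nodes_in[i] lists the nodes modeled at element i (for internal
    elements, the nodes appearing in any layer below them)."""
    total = 0.0
    for i, parent in pi.items():
        for u in nodes_in[i]:
            diff = f[i][u] - f[parent][u]
            total += float(diff @ diff)
    return total
```

Minimizing this penalty alongside the per-layer node2vec objectives is what propagates information through the hierarchy: a node's embeddings in sibling layers are pulled toward their shared parent representation.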
• Intuition: For each layer, find a mapping of nodes to $d$-dimensions that preserves node similarity
• Approach: Similarity of nodes $u$ and $v$ is defined based on the similarity of their network neighborhoods
• Given node $u$ in layer $i$, we define nearby nodes $N_i(u)$ based on random walks starting at node $u$