

  1. CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu

  2. Machine Learning: node classification

  3. (Supervised) Machine Learning Lifecycle: this feature, that feature, every single time! Pipeline: Raw Data → Structured Data → Learning Algorithm → Model → Downstream prediction task. Feature engineering sits between raw and structured data; the goal is to automatically learn the features instead.

  4. Goal: Efficient task-independent feature learning for machine learning in networks! Map each node u to a d-dimensional feature representation (embedding): f: u → ℝ^d.

  5. We map each node in a network into a low-dimensional space:
     - Distributed representation for nodes
     - Similarity between nodes indicates link strength
     - Encode network information and generate node representation

  6. Example: Zachary's Karate Club network.

  7. Graph representation learning is hard:
     - Images are fixed size: convolutions (CNNs)
     - Text is linear: sliding window (word2vec)
     - Graphs are neither of these!
       - Node numbering is arbitrary (node isomorphism problem)
       - Much more complicated structure

  8. node2vec: Random Walk Based (Unsupervised) Feature Learning. Reference: node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. KDD 2016.

  9. Goal: Embed nodes with similar network neighborhoods close in the feature space.
     - We frame this goal as a prediction-task-independent maximum likelihood optimization problem.
     - Key observation: a flexible notion of the network neighborhood N_S(u) of node u leads to rich features.
     - We develop a biased 2nd-order random walk procedure S to generate the network neighborhood N_S(u) of node u.

  10. Intuition: Find an embedding of nodes into d dimensions that preserves similarity.
     - Idea: learn node embeddings such that nearby nodes are close together.
     - Given a node u, how do we define nearby nodes?
     - N_S(u): the neighbourhood of u obtained by some strategy S.

  11. Given G = (V, E), our goal is to learn a mapping f: V → ℝ^d. Log-likelihood objective:

      \max_f \sum_{u \in V} \log \Pr(N_S(u) \mid f(u))

      where N_S(u) is the neighborhood of node u. Given node u, we want to learn feature representations that are predictive of the nodes in its neighborhood N_S(u).

  12. \max_f \sum_{u \in V} \log \Pr(N_S(u) \mid f(u))

      Assumption: the conditional likelihood factorizes over the set of neighbors:

      \log \Pr(N_S(u) \mid f(u)) = \sum_{n_i \in N_S(u)} \log \Pr(f(n_i) \mid f(u))

      Softmax parametrization:

      \Pr(f(n_i) \mid f(u)) = \frac{\exp(f(n_i) \cdot f(u))}{\sum_{v \in V} \exp(f(v) \cdot f(u))}
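To make the parametrization concrete, here is a minimal NumPy sketch of the softmax log-probability of one neighbor. The names are mine, not from the slides: `emb` is a hypothetical (num_nodes x d) matrix whose rows are the f(v) vectors.

```python
import numpy as np

def neighbor_log_prob(emb, u, n_i):
    scores = emb @ emb[u]                  # f(v) . f(u) for every node v
    scores = scores - scores.max()         # shift for numerical stability
    log_Z = np.log(np.exp(scores).sum())   # log partition function
    return scores[n_i] - log_Z             # log Pr(f(n_i) | f(u))
```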

  13. \max_f \sum_{u \in V} \sum_{n_i \in N_S(u)} \log \frac{\exp(f(n_i) \cdot f(u))}{\sum_{v \in V} \exp(f(v) \cdot f(u))}

      Maximize the objective using stochastic gradient descent with negative sampling (see the sketch below):
      - Computing the summation over all nodes is expensive.
      - Idea: just sample a couple of "negative" nodes.
      - At each iteration only the embeddings of a few nodes are updated.
      - Much faster training of embeddings.
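A hedged sketch of the negative-sampling shortcut the slide describes; the sigmoid-based form follows word2vec's standard approximation, and all names are illustrative rather than from the paper.

```python
import numpy as np

def neg_sampling_loss(emb, u, n_i, num_neg=5, rng=None):
    rng = rng or np.random.default_rng()
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = np.log(sigmoid(emb[n_i] @ emb[u]))            # pull neighbor closer
    negs = rng.integers(0, emb.shape[0], size=num_neg)  # sample negatives
    neg = np.log(sigmoid(-(emb[negs] @ emb[u]))).sum()  # push negatives away
    return -(pos + neg)  # minimizing this approximates the softmax objective
```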

  14. Two classic strategies to define a neighborhood N_S(u) of a given node u (figure: walks from node u over nodes s1 ... s9, BFS vs. DFS):

      N_BFS(u) = {s1, s2, s3}: local, microscopic view
      N_DFS(u) = {s4, s5, s6}: global, macroscopic view
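For illustration, a sketch of the two strategies on a NetworkX graph, under the assumption that a size-k neighborhood is wanted; both function names are mine.

```python
from collections import deque
import networkx as nx

def bfs_neighborhood(G, u, k):
    # Breadth-first: the k nodes closest to u (local, microscopic view).
    seen, queue, out = {u}, deque(G.neighbors(u)), []
    while queue and len(out) < k:
        v = queue.popleft()
        if v not in seen:
            seen.add(v)
            out.append(v)
            queue.extend(G.neighbors(v))
    return out

def dfs_neighborhood(G, u, k):
    # Depth-first: the first k nodes along a deep path away from u
    # (global, macroscopic view).
    return list(nx.dfs_preorder_nodes(G, source=u))[1:k + 1]
```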

  15. BFS: micro-view of the neighbourhood. DFS: macro-view of the neighbourhood. (Figure: BFS and DFS exploration starting from node u.)

  16. Biased random walk S that, given a node u, generates its neighborhood N_S(u). Two parameters:
      - Return parameter p: controls returning back to the previous node.
      - In-out parameter q: controls moving outwards (DFS) vs. inwards (BFS).

  17. N_S(u): Biased 2nd-order random walks explore network neighborhoods. (Figure: the walk just moved u → s4 and chooses its next step; returning to u has unnormalized weight 1/p, staying at distance 1 from u has weight 1, and moving farther from u has weight 1/q.)
      - BFS-like behavior: low value of p
      - DFS-like behavior: low value of q
      p and q can be learned in a semi-supervised way.
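A small sketch of these transition weights (variable names are mine): the walk came from `prev` and is now at `cur`, and each candidate next node x is weighted by where it sits relative to `prev`.

```python
def transition_weights(G, prev, cur, p, q):
    weights = {}
    for x in G.neighbors(cur):
        if x == prev:               # distance 0 from prev: return step
            weights[x] = 1.0 / p
        elif G.has_edge(x, prev):   # distance 1 from prev: stay close (BFS-like)
            weights[x] = 1.0
        else:                       # distance 2 from prev: move outward (DFS-like)
            weights[x] = 1.0 / q
    return weights                  # normalize before sampling the next step
```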

  18. The node2vec algorithm:
      1) Compute random walk transition probabilities.
      2) Simulate r random walks of length l starting from each node u.
      3) Optimize the node2vec objective using stochastic gradient descent.
      Linear-time complexity; all 3 steps are individually parallelizable. (A sketch of the full pipeline follows below.)
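Putting the three steps together, a hedged end-to-end sketch: it reuses `transition_weights()` from the earlier sketch and leans on gensim's Word2Vec for the skip-gram objective with negative sampling. Parameter defaults are illustrative, not the paper's tuned values.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def node2vec_embeddings(G, p=1.0, q=1.0, num_walks=10, walk_len=80, d=128):
    walks = []
    for _ in range(num_walks):                  # step 2: simulate walks
        for start in G.nodes():
            walk = [start]
            while len(walk) < walk_len:
                cur = walk[-1]
                nbrs = list(G.neighbors(cur))
                if not nbrs:                    # dead end: stop this walk
                    break
                if len(walk) == 1:              # first step: uniform choice
                    walk.append(random.choice(nbrs))
                else:                           # later steps: biased (step 1)
                    w = transition_weights(G, walk[-2], cur, p, q)
                    walk.append(random.choices(list(w), weights=list(w.values()))[0])
            walks.append([str(n) for n in walk])
    # Step 3: optimize the objective with SGD and negative sampling.
    model = Word2Vec(walks, vector_size=d, window=10, sg=1, negative=5)
    return model.wv                             # maps str(node) -> vector
```

With p = q = 1 the biased step reduces to a uniform random walk, recovering DeepWalk-style behavior.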

  19. Interactions of characters in a novel:
      - p=1, q=2: microscopic view of the network neighbourhood
      - p=1, q=0.5: macroscopic view of the network neighbourhood

  20. (Figure-only slide.)

  21. (Figure: Macro-F1 score as a function of the fraction of missing edges, left, and the fraction of additional edges, right.)

  22. General-purpose feature learning in networks:
      - An explicit locality-preserving objective for feature learning.
      - Biased random walks capture the diversity of network patterns.
      - A scalable and robust algorithm with excellent empirical performance.
      - Future extensions: designing random walk strategies tailored to networks with specific structure, such as heterogeneous networks and signed networks.

  23. OhmNet: Extension to Hierarchical Networks

  24. Let's generalize node2vec to multilayer networks!

  25. Each network is a layer G_i = (V_i, E_i). Similarities between layers are given in a hierarchy M; the map π encodes parent-child relationships.
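One possible way to encode this input in code, a minimal sketch of my own rather than OhmNet's actual API: layers are NetworkX graphs indexed by id, and the hierarchy M is given by a parent map `pi` over layer ids and internal hierarchy nodes.

```python
import networkx as nx

layers = {i: nx.Graph() for i in (1, 2, 3, 4)}  # leaf layers G_1 ... G_4
pi = {1: "a", 2: "a", 3: "b", 4: "b",           # leaves -> internal nodes
      "a": "root", "b": "root"}                 # a two-level hierarchy M
```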

  26. A computational framework that learns features of every node, and at every scale, based on:
      - edges within each layer
      - inter-layer relationships between nodes active on different layers

  27. (Figure: input layers G_1 ... G_4 with a hierarchy above them; output: embeddings of nodes in the layers as well as at internal levels of the hierarchy.)

  28. OhmNet: Given layers G_i and hierarchy M, learn node features captured by functions f_i. The functions f_i embed every node in a d-dimensional feature space. (Figure: a multi-layer network with four layers and a two-level hierarchy M.)

  29. Given: layers G_i and hierarchy M; the layers G_i are the leaves of M.
      Goal: learn functions f_i : V_i → ℝ^d.

  30. The approach has two components (a sketch of the combined objective follows below):
      - Per-layer objectives: nodes with similar network neighborhoods in each layer are embedded close together.
      - Hierarchical dependency objectives: nodes in nearby layers in the hierarchy are encouraged to share similar features.
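A hedged reconstruction of the combined objective, assuming the hierarchical dependency term is an L2 penalty tying each node's features at hierarchy element j to its parent π(j) in M; λ and L_j (the set of nodes present at element j) are my notation, not the slides':

      \max_{f_1, \dots} \; \sum_i \sum_{u \in V_i} \log \Pr(N_i(u) \mid f_i(u)) \; - \; \lambda \sum_{j \in M} \sum_{u \in L_j} \lVert f_j(u) - f_{\pi(j)}(u) \rVert_2^2

The first term is a per-layer node2vec objective; the second encourages features to vary smoothly along the hierarchy.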

  31. Intuition: for each layer, find a mapping of nodes to d dimensions that preserves node similarity.
      Approach: the similarity of nodes u and v is defined based on the similarity of their network neighborhoods.
      Given a node u in layer i, we define its nearby nodes N_i(u) based on random walks starting at node u.
