CS224W: Analysis of Networks
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Machine Learning task: node classification. [Figure: a network with several unlabeled nodes marked "?"]
• (Supervised) Machine Learning Lifecycle requires feature engineering every single time ("this feature, that feature!").
[Pipeline figure: Raw Data → (Feature Engineering) → Structured Data → Learning Algorithm → Model → Downstream prediction task; goal: automatically learn the features instead of engineering them by hand]
Goal: Efficient task-independent feature learning for machine learning in networks!

$$f: u \rightarrow \mathbb{R}^d$$

[Figure: node $u$ in the network is mapped to its feature representation (embedding) in $\mathbb{R}^d$]
• We map each node in a network into a low-dimensional space
§ Distributed representation for nodes
§ Similarity between nodes indicates link strength
§ Encode network information and generate node representation
• Example: Zachary's Karate Club network and its node embedding [figure]
Graph representation learning is hard:
• Images are fixed size
§ Convolutions (CNNs)
• Text is linear
§ Sliding window (word2vec)
• Graphs are neither of these!
§ Node numbering is arbitrary (node isomorphism problem)
§ Much more complicated structure
node2vec: Random Walk Based (Unsupervised) Feature Learning

node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. KDD 2016.
• Goal: Embed nodes with similar network neighborhoods close in the feature space.
• We frame this goal as a prediction-task-independent maximum likelihood optimization problem.
• Key observation: A flexible notion of the network neighborhood $N_S(u)$ of node $u$ leads to rich features.
• Develop a biased 2nd-order random walk procedure $S$ to generate the network neighborhood $N_S(u)$ of node $u$.
• Intuition: Find an embedding of nodes to $d$-dimensions that preserves similarity
• Idea: Learn node embedding such that nearby nodes are close together
• Given a node $u$, how do we define nearby nodes?
§ $N_S(u)$: neighbourhood of $u$ obtained by some strategy $S$
• Given $G = (V, E)$,
• Our goal is to learn a mapping $f: u \rightarrow \mathbb{R}^d$.
• Log-likelihood objective:
$$\max_f \sum_{u \in V} \log \Pr(N_S(u) \mid f(u))$$
§ where $N_S(u)$ is the neighborhood of node $u$.
• Given node $u$, we want to learn feature representations predictive of nodes in its neighborhood $N_S(u)$.
$$\max_f \sum_{u \in V} \log \Pr(N_S(u) \mid f(u))$$
• Assumption: Conditional likelihood factorizes over the set of neighbors:
$$\log \Pr(N_S(u) \mid f(u)) = \sum_{n_i \in N_S(u)} \log \Pr(f(n_i) \mid f(u))$$
• Softmax parametrization:
$$\Pr(f(n_i) \mid f(u)) = \frac{\exp(f(n_i) \cdot f(u))}{\sum_{v \in V} \exp(f(v) \cdot f(u))}$$
$$\max_f \sum_{u \in V} \sum_{n_i \in N_S(u)} \log \frac{\exp(f(n_i) \cdot f(u))}{\sum_{v \in V} \exp(f(v) \cdot f(u))}$$
• Maximize the objective using Stochastic Gradient Descent with negative sampling.
§ Computing the summation over all nodes is expensive
§ Idea: Just sample a couple of "negative" nodes (see the sketch below)
§ This means at each iteration only the embeddings of a few nodes are updated
§ Much faster training of embeddings
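Because the softmax denominator sums over all nodes, each update would otherwise touch every embedding. A minimal sketch of one negative-sampling SGD step, assuming embeddings are stored as rows of a NumPy matrix `f` (uniform negative sampling and the learning rate are illustrative simplifications; node2vec follows word2vec, which samples negatives from a smoothed unigram distribution):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(f, u, n, num_neg=5, lr=0.025):
    """One negative-sampling SGD step on the embedding matrix f
    (shape: num_nodes x d). Pulls f[n] toward f[u] for a node n
    observed in N_S(u), and pushes a few sampled negatives away."""
    # Positive pair: gradient of -log sigmoid(f(n) . f(u))
    score = sigmoid(f[n] @ f[u])
    grad_u = (score - 1.0) * f[n]
    f[n] -= lr * (score - 1.0) * f[u]

    # Negatives: uniform here for brevity (may occasionally hit u or n;
    # real implementations filter these and use a unigram distribution).
    for v in rng.integers(0, f.shape[0], size=num_neg):
        score = sigmoid(f[v] @ f[u])
        grad_u += score * f[v]
        f[v] -= lr * score * f[u]

    f[u] -= lr * grad_u
```

Only the rows for $u$, $n_i$, and the few sampled negatives are touched per step, which is what makes training fast.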
Two classic strategies to define a neighborhood $N_S(u)$ of a given node $u$:
[Figure: starting node $u$ with nearby nodes $s_1, \dots, s_9$; BFS explores locally, DFS moves outward]
$N_{BFS}(u) = \{s_1, s_2, s_3\}$: local, microscopic view
$N_{DFS}(u) = \{s_4, s_5, s_6\}$: global, macroscopic view
[Figure: BFS and DFS exploration starting from node $u$]
BFS: micro-view of neighbourhood
DFS: macro-view of neighbourhood
Biased random walk $S$ that, given a node $u$, generates neighborhood $N_S(u)$
• Two parameters:
§ Return parameter $p$: return back to the previous node
§ In-out parameter $q$: moving outwards (DFS) vs. inwards (BFS)
Biased 2nd-order random walks explore network neighborhoods:
[Figure: the walk just traversed $u \rightarrow s_4$ and must choose the next node; unnormalized transition weights from $s_4$: back to $u$ with weight $1/p$, to $s_1$ (a common neighbor of $u$ and $s_4$) with weight $1$, to $s_5$ (farther from $u$) with weight $1/q$]
§ BFS-like walk: low value of $p$
§ DFS-like walk: low value of $q$
$p, q$ can be learned in a semi-supervised way (see the sketch below)
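A minimal sketch of one step of this biased walk, assuming `G` is an adjacency dict mapping each node to a set of neighbors (unweighted edges):

```python
import random

def biased_step(G, prev, curr, p, q):
    """Choose the next node of a 2nd-order walk that just moved prev -> curr.
    Unnormalized weights: 1/p to return to prev, 1 to a common neighbor
    of prev and curr, 1/q to move farther away from prev."""
    neighbors = list(G[curr])
    weights = []
    for x in neighbors:
        if x == prev:            # distance 0 from prev: return
            weights.append(1.0 / p)
        elif x in G[prev]:       # distance 1 from prev: stay close
            weights.append(1.0)
        else:                    # distance 2 from prev: move outward
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]
```

Low `p` makes returning cheap, keeping the walk near the start (BFS-like); low `q` makes outward moves cheap (DFS-like). The paper's implementation precomputes these per-edge probabilities and draws from them in O(1) with alias sampling; this sketch computes them on the fly.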
• 1) Compute random walk probabilities
• 2) Simulate $r$ random walks of length $l$ starting from each node $u$
• 3) Optimize the node2vec objective using Stochastic Gradient Descent

Linear-time complexity. All 3 steps are individually parallelizable.
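A hedged end-to-end sketch of the three steps: it reuses the hypothetical `biased_step` from the previous sketch, folds step 1 (precomputing transition probabilities) into the walk itself, and hands the walks to gensim's word2vec implementation for step 3 (keyword names follow gensim 4.x):

```python
import random
from gensim.models import Word2Vec

# Toy adjacency dict (hypothetical); in practice build it from your network.
# Assumes every node has at least one neighbor.
G = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}

def simulate_walks(G, r, l, p, q):
    """Step 2: simulate r walks of length l from each node.
    Requires biased_step from the sketch above to be in scope."""
    walks = []
    for _ in range(r):
        for u in G:
            walk = [u, random.choice(sorted(G[u]))]  # bootstrap the 2nd-order walk
            while len(walk) < l:
                walk.append(biased_step(G, walk[-2], walk[-1], p, q))
            walks.append([str(v) for v in walk])
    return walks

# Step 3: skip-gram with negative sampling (sg=1, negative=5).
walks = simulate_walks(G, r=10, l=80, p=1.0, q=0.5)
model = Word2Vec(walks, vector_size=16, window=10, sg=1, negative=5, min_count=0)
print(model.wv["0"])  # learned embedding f(u) for node 0
```

Because each node's walks are independent, step 2 parallelizes trivially across nodes, and gensim parallelizes step 3 across worker threads.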
Case study: interactions of characters in a novel.
[Figure: two clusterings of the same network under different walk parameters]
$p=1, q=2$: microscopic view of the network neighbourhood
$p=1, q=0.5$: macroscopic view of the network neighbourhood
[Figure: Macro-F1 score (0.00 to 0.20) vs. fraction of missing edges (left panel) and fraction of additional edges (right panel), each from 0.0 to 0.6]
General-purpose feature learning in networks:
• An explicit locality-preserving objective for feature learning.
• Biased random walks capture the diversity of network patterns.
• Scalable and robust algorithm with excellent empirical performance.
• Future extensions: designing random walk strategies tailored to networks with specific structure, such as heterogeneous networks and signed networks.
OhmNet: Extension to Hierarchical Networks
Let's generalize node2vec to multilayer networks!
• Each network is a layer $G_i = (V_i, E_i)$
• Similarities between layers are given in a hierarchy $\mathcal{M}$; the map $\pi$ encodes its parent-child relationships (a toy encoding is sketched below)
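As a concrete (hypothetical) encoding of $\pi$: a dict mapping each hierarchy element to its parent, with the leaf layers numbered 1 to 4 as in the four-layer example on the following slides:

```python
# pi[i] = parent of hierarchy element i; elements 1-4 are the leaf layers,
# 5-6 are internal levels, and 7 is the root of the hierarchy M.
pi = {1: 5, 2: 5, 3: 6, 4: 6, 5: 7, 6: 7}
```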
• Computational framework that learns features of every node and at every scale based on:
§ Edges within each layer
§ Inter-layer relationships between nodes active on different layers
Input: layers $G_1, G_2, G_3, G_4$. Output: embeddings of nodes in the layers as well as in the internal levels of the hierarchy.
[Figure: four layers as leaves of a hierarchy with internal levels]
• OhmNet: Given layers $G_i$ and hierarchy $\mathcal{M}$, learn node features captured by functions $f_i$
• Functions $f_i$ embed every node in a $d$-dimensional feature space
[Figure: a multi-layer network with four layers and a two-level hierarchy $\mathcal{M}$]
• Given: layers $G_i$, hierarchy $\mathcal{M}$
§ Layers $G_i$, $i = 1, \dots, T$, are the leaves of $\mathcal{M}$
• Goal: Learn functions $f_i: V_i \rightarrow \mathbb{R}^d$
• Approach has two components:
§ Per-layer objectives: Nodes with similar network neighborhoods in each layer are embedded close together
§ Hierarchical dependency objectives: Nodes in nearby layers in the hierarchy are encouraged to share similar features (see the sketch below)
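A minimal sketch of the second component, assuming each hierarchy element $i$ keeps its own embedding table `f[i]`; the squared-distance penalty is one natural form consistent with the description above, not necessarily OhmNet's exact term:

```python
def hierarchy_penalty(f, pi, nodes_in):
    """Sum of squared distances between a node's embedding at hierarchy
    element i and at its parent pi[i]. f[i] maps node -> NumPy vector;
    nodes_in[i] lists the nodes modeled at element i (for internal
    elements, the nodes appearing in any layer below them)."""
    total = 0.0
    for i, parent in pi.items():
        for u in nodes_in[i]:
            diff = f[i][u] - f[parent][u]
            total += float(diff @ diff)
    return total
```

Minimizing this penalty alongside the per-layer node2vec objectives is what propagates information through the hierarchy: a node's embeddings in sibling layers are pulled toward their shared parent representation.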
• Intuition: For each layer, find a mapping of nodes to $d$-dimensions that preserves node similarity
• Approach: Similarity of nodes $u$ and $v$ is defined based on the similarity of their network neighborhoods
• Given node $u$ in layer $i$, we define nearby nodes $N_i(u)$ based on random walks starting at node $u$