CS224W: Machine Learning with Graphs
Jure Leskovec with Srijan Kumar, Stanford University
http://cs224w.stanford.edu
¡ Main question today: Given a network with labels on some nodes, how do we assign labels to all other nodes in the network?
¡ Example: In a network, some nodes are fraudsters and some nodes are fully trusted. How do we find the other fraudsters and trustworthy nodes?
[Figure: a graph in which a few nodes carry labels and the remaining nodes are marked “?”]
¡ Given labels of some nodes
¡ Let’s predict the labels of the unlabeled nodes
¡ This is called semi-supervised node classification
¡ Main question today: Given a network with labels on some nodes, how do we assign labels to all other nodes in the network?
¡ Collective classification: the idea of assigning labels to all nodes in a network together
¡ Intuition: Correlations exist in networks. Leverage them!
¡ We will look at three techniques today:
§ Relational classification
§ Iterative classification
§ Belief propagation
¡ Individual behaviors are correlated in a network environment
¡ Three main types of dependencies that lead to correlation:
§ Homophily
§ Influence
§ Confounding
¡ Homophily: the tendency of individuals to associate and bond with similar others
§ “Birds of a feather flock together”
§ It has been observed in a vast array of network studies, based on a variety of attributes (e.g., age, gender, organizational role)
§ Example: people who like the same music genre are more likely to establish a social connection (meeting at concerts, interacting in music forums, etc.)
¡ Influence: social connections can influence the individual characteristics of a person
§ We will cover this in depth next month!
§ Example: I recommend my “peculiar” musical preferences to my friends, until one of them grows to like my favorite genres
Example:
¡ Real social network
§ Nodes = people
§ Edges = friendship
§ Node color = race
¡ People are segregated by race due to homophily (Easley and Kleinberg, 2010)
¡ How do we leverage this correlation observed in networks to help predict node labels?
¡ Example: How do we predict the labels for the nodes shown in beige?
¡ Similar nodes are typically close together or directly connected:
§ “Guilt-by-association”: If I am connected to a node with label 𝑌, then I am likely to have label 𝑌 as well
§ Example: malicious/benign web pages: malicious web pages link to one another to increase visibility, look credible, and rank higher in search engines
¡ The classification label of an object 𝑂 in the network may depend on:
§ Features of 𝑂
§ Labels of the objects in 𝑂’s neighborhood
§ Features of the objects in 𝑂’s neighborhood
Given:
§ A graph
§ A few labeled nodes
Find: the class (red/green) of the remaining nodes
Assuming: networks exhibit homophily
¡ Let $W$ be an $n \times n$ (weighted) adjacency matrix over $n$ nodes
¡ Let $Y \in \{-1, 0, 1\}^n$ be a vector of labels:
§ 1: positive node
§ −1: negative node
§ 0: unlabeled node
¡ Goal: Predict which unlabeled nodes are likely positive
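In code, this setup might look as follows. This is a minimal sketch with a made-up 5-node graph; the variable names mirror the $W$ and $Y$ notation above, and the edges are invented for illustration:

```python
import numpy as np

# Hypothetical toy instance: n = 5 nodes, symmetric unit-weight adjacency W.
n = 5
W = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]:
    W[i, j] = W[j, i] = 1.0

# Label vector Y in {-1, 0, +1}^n: +1 positive, -1 negative, 0 unlabeled.
Y = np.array([1, 0, 0, 0, -1])
unlabeled = np.where(Y == 0)[0]  # the nodes whose labels we want to predict
```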
¡ Intuition: Simultaneous classification of interlinked nodes using correlations
¡ Several applications:
§ Document classification
§ Part-of-speech tagging
§ Link prediction
§ Optical character recognition
§ Image/3D data segmentation
§ Entity resolution in sensor networks
§ Spam and fraud detection
¡ Markov assumption: the label $Y_i$ of one node $i$ depends on the labels of its neighbors $N_i$:

$$P(Y_i \mid i) = P(Y_i \mid N_i)$$

¡ Collective classification involves 3 steps:
§ Local classifier: assign initial labels
§ Relational classifier: capture correlations between nodes
§ Collective inference: propagate correlations through the network
Local Classifier: used for initial label assignment
§ Predicts a label based on node attributes/features
§ A standard classification task
§ Does not use network information

Relational Classifier: captures correlations based on the network
§ Learns a classifier to label one node based on the labels and/or attributes of its neighbors
§ This is where network information is used

Collective Inference: propagates the correlations
§ Apply the relational classifier to each node iteratively
§ Iterate until the inconsistency between neighboring labels is minimized
§ The network structure substantially affects the final prediction
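To make the three-step pipeline concrete, here is a minimal end-to-end sketch in Python. The stand-ins are assumptions for illustration, not the slides’ method: the “local classifier” is just a majority-class prior and the “relational classifier” is a majority vote over neighbor labels:

```python
from collections import Counter

def collective_classification(graph, labels, num_iters=10):
    """Illustrative sketch of local classifier -> relational classifier
    -> collective inference, with simple majority-vote stand-ins.

    graph: dict mapping node -> iterable of neighbor nodes
    labels: dict of known node labels; these stay fixed throughout
    """
    # Step 1 (local classifier): initial assignment for unlabeled nodes.
    majority = Counter(labels.values()).most_common(1)[0][0]
    pred = {v: labels.get(v, majority) for v in graph}

    # Steps 2-3 (relational classifier + collective inference): reapply
    # the neighbor vote until neighboring labels stop changing.
    for _ in range(num_iters):
        changed = False
        for v in graph:
            if v in labels:
                continue  # never overwrite ground-truth labels
            votes = Counter(pred[u] for u in graph[v])
            if not votes:
                continue  # isolated node: keep its current label
            new_label = votes.most_common(1)[0][0]
            changed = changed or (new_label != pred[v])
            pred[v] = new_label
        if not changed:  # fixed point: labels consistent with neighbors
            break
    return pred
```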
¡ Exact inference is practical only when the network satisfies certain conditions
§ Exact inference is NP-hard for arbitrary networks
¡ We will look at techniques for approximate inference:
§ Relational classifiers
§ Iterative classification
§ Belief propagation
¡ All are iterative algorithms

Intuition (exact vs. approximate): If we represent every node as a discrete random variable with a joint mass function $p$ over its class membership, the marginal distribution of a node is the summation of $p$ over all the other nodes. The exact solution takes time exponential in the number of nodes, so we use inference techniques that approximate the solution by narrowing the scope of the propagation (e.g., only neighbors) and the number of variables by means of aggregation.
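To see why exact inference is expensive, the marginalization above can be spelled out (a sketch; the symbol $k$ for the number of classes is introduced here, not in the slides):

$$P(Y_i = c) = \sum_{y_1} \cdots \sum_{y_{i-1}} \sum_{y_{i+1}} \cdots \sum_{y_n} P(Y_1 = y_1, \ldots, Y_i = c, \ldots, Y_n = y_n)$$

With $k$ classes and $n$ nodes, this sum ranges over $k^{n-1}$ joint configurations, which is why we fall back on approximate, neighborhood-local propagation.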
¡ How to predict the labels $Y_i$ for the nodes $i$ in beige?
¡ Each node $i$ has a feature vector $f_i$
¡ Labels for some nodes are given (+ for green, − for blue)
¡ Task: Find $P(Y_i)$ given all features and the network
¡ Basic idea: The class probability of $Y_i$ is a weighted average of the class probabilities of $i$’s neighbors
¡ For labeled nodes, initialize with the ground-truth label $Y_i$
¡ For unlabeled nodes, initialize $Y_i$ uniformly
¡ Update all nodes in a random order until convergence or until a maximum number of iterations is reached
¡ Repeat for each node $i$ and label $c$:

$$P(Y_i = c) = \frac{1}{\sum_{(i,j) \in E} W(i,j)} \sum_{(i,j) \in E} W(i,j)\, P(Y_j = c)$$

§ $W(i,j)$ is the edge strength from $i$ to $j$
¡ Challenges:
§ Convergence is not guaranteed
§ The model cannot use node feature information
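Here is a minimal sketch of this update in Python for the binary case, assuming an undirected weighted graph stored as a dict of dicts; the data layout and names are illustrative assumptions, not from the slides:

```python
import random

def relational_classification(adj, labels, num_iters=100, tol=1e-4):
    """Probabilistic relational classifier for a binary label (a sketch).

    adj: dict mapping node i -> dict {neighbor j: edge strength W(i, j)}
    labels: dict mapping labeled node -> 0 or 1; other nodes are unlabeled
    Returns a dict mapping every node to its estimated P(Y = 1).
    """
    # Initialization: ground truth for labeled nodes, 0.5 for the rest.
    p = {v: float(labels[v]) if v in labels else 0.5 for v in adj}

    unlabeled = [v for v in adj if v not in labels]
    for _ in range(num_iters):
        random.shuffle(unlabeled)  # update nodes in a random order
        max_change = 0.0
        for i in unlabeled:
            total_w = sum(adj[i].values())
            if total_w == 0:
                continue  # isolated node: nothing to average over
            # Weighted average of the neighbors' current P(Y = 1).
            new_p = sum(w * p[j] for j, w in adj[i].items()) / total_w
            max_change = max(max_change, abs(new_p - p[i]))
            p[i] = new_p
        if max_change < tol:  # treat sufficiently small changes as convergence
            break
    return p
```

Note that each node reads its neighbors’ most recent values, so updates within a sweep are sequential, matching the worked example that follows.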
Initialization: set all labeled nodes to their ground-truth labels and all unlabeled nodes to the uniform value $P(Y_i = 1) = 0.5$
[Figure: the example graph; labeled nodes carry $P(Y=1) \in \{0, 1\}$, unlabeled nodes start at $P(Y=1) = 0.5$]
¡ Update for the 1st iteration:
§ For node 3, $N_3 = \{1, 2, 4\}$: $P(Y_3 = 1 \mid N_3) = \frac{1}{3}(0 + 0 + 0.5) = 0.17$
§ For node 4, $N_4 = \{1, 3, 5, 6\}$: $P(Y_4 = 1 \mid N_4) = \frac{1}{4}(0 + 0.17 + 0.5 + 1) = 0.42$
§ For node 5, $N_5 = \{4, 6, 7, 8\}$: $P(Y_5 = 1 \mid N_5) = \frac{1}{4}(0.42 + 1 + 1 + 0.5) = 0.73$
After iteration 1:
[Figure: the example graph after the first full sweep; the unlabeled nodes now carry $P(Y=1)$ values of 0.17, 0.42, 0.73, 0.91, and 1.00, while the labeled nodes remain at 0 and 1]
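To make the iteration tangible, here is how the sketch above could be run on a small hypothetical graph (a 4-node path, not the 9-node example from the slides):

```python
# Hypothetical 4-node path graph: 1 - 2 - 3 - 4, with unit edge weights.
adj = {
    1: {2: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0},
    4: {3: 1.0},
}
labels = {1: 0, 4: 1}  # node 1 is negative, node 4 is positive

probs = relational_classification(adj, labels)
# Converges toward P(Y2=1) = 1/3 and P(Y3=1) = 2/3: the node closer to
# the positive seed ends up with the higher probability of being positive.
print(probs)
```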