1. CSE 6240: Web Search and Text Mining, Spring 2020
   Message Passing and Node Classification
   Prof. Srijan Kumar

2. Outline
   • Main question today: Given a network with labels on some nodes, how do we label all the other nodes?
   • Example: In a network, some nodes are fraudsters and some nodes are fully trusted. How do you find the other fraudsters and trustworthy nodes?

3. Intuition
   • Collective classification: the idea of assigning labels to all nodes in a network together
     – Leverage the correlations in the network!
   • We will look at three techniques today:
     – Relational classification
     – Iterative classification
     – Belief propagation

4. Today’s Lecture
   • Overview of collective classification
   • Relational classification
   • Iterative classification
   • Belief propagation
   The lecture slides are borrowed from Prof. Jure Leskovec’s slides from CS224W.

5. Correlations Exist in Networks
   Example: a real social network
   • Nodes = people
   • Edges = friendship
   • Node color = race
   • People are segregated by race due to homophily (Easley and Kleinberg, 2010)

6. Classification with Network Data
   • How can we leverage the correlation observed in networks to help predict user attributes or interests?
   • How do we predict the labels for the nodes in yellow?

7. Motivation
   • Similar entities are typically close together or directly connected:
     – “Guilt-by-association”: If I am connected to a node with label X, then I am likely to have label X as well.
     – Example: Malicious vs. benign web pages: malicious web pages link to one another to increase visibility, look credible, and rank higher in search engines.

8. Intuition
   • The classification label of a node O in the network may depend on:
     – Features of O
     – Labels of the objects in O’s neighborhood
     – Features of the objects in O’s neighborhood

9. Guilt-by-Association
   • Given: a graph and a few labeled nodes
   • Find: the class (red/green) of the remaining nodes
   • Assumption: the network has homophily

10. Guilt-by-Association
    • Let W be an n×n (weighted) adjacency matrix over n nodes
    • Let Y ∈ {−1, 0, 1}ⁿ be a vector of labels:
      – 1: positive node, known to be involved in a gene function/biological process
      – −1: negative node
      – 0: unlabeled node
    • Goal: predict which unlabeled nodes are likely positive
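A minimal sketch of this setup in Python/NumPy; the 5-node graph and its labels below are hypothetical, invented purely for illustration:

```python
import numpy as np

# Hypothetical 5-node graph: W is the (weighted) adjacency matrix.
W = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Label vector Y: +1 = positive, -1 = negative, 0 = unlabeled.
Y = np.array([1, 0, 0, 0, -1])

unlabeled = np.flatnonzero(Y == 0)   # the nodes we need to predict
print(unlabeled)                      # -> [1 2 3]
```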

11. Collective Classification
    • Intuition: simultaneous classification of interlinked objects using correlations
    • Several applications:
      – Document classification
      – Part-of-speech tagging
      – Link prediction
      – Optical character recognition
      – Image/3D data segmentation
      – Entity resolution in sensor networks
      – Spam and fraud detection

12. Collective Classification Overview
    • Markov assumption: the label Y_i of node i depends on the labels of its neighbors N_i:
      P(Y_i | i) = P(Y_i | N_i)
    • Collective classification involves 3 steps:
      – Local classifier: assigns initial labels
      – Relational classifier: captures correlations between nodes
      – Collective inference: propagates correlations through the network

13. Collective Classification Overview
    • Local classifier: assigns initial labels
      – Predicts the label based on node attributes/features
      – Classical classification; does not use network information
    • Relational classifier: captures correlations between nodes
      – Learns a classifier to label one node from the labels and/or attributes of its neighbors
      – Network information is used
    • Collective inference: propagates correlations through the network (see the skeleton below)
      – Applies the relational classifier to each node iteratively
      – Iterates until the inconsistency between neighboring labels is minimized
      – Network structure substantially affects the final prediction
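As a rough Python skeleton of the three steps; the trivial local model (a logistic squash of one feature) and relational model (a neighbor average) are stand-ins, not anything the slides prescribe:

```python
import numpy as np

def collective_classification(adj, feats, labeled, n_iters=10):
    """3-step pattern: local classifier -> relational classifier
    -> collective inference. adj: n x n adjacency matrix,
    feats: n x d feature matrix, labeled: {node: P(Y=1)}."""
    n = adj.shape[0]
    # Step 1 -- local classifier: initial beliefs from features only
    # (a placeholder model, using just the first feature).
    belief = 1.0 / (1.0 + np.exp(-feats[:, 0]))
    for i, y in labeled.items():
        belief[i] = y                       # ground truth stays fixed
    # Steps 2+3 -- relational classifier, applied iteratively.
    for _ in range(n_iters):
        for i in range(n):
            if i in labeled:
                continue
            nbrs = np.flatnonzero(adj[i])
            if nbrs.size:                   # belief = neighbor average
                belief[i] = belief[nbrs].mean()
    return belief
```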

14. Today’s Lecture
    • Overview of collective classification
    • Relational classification
    • Iterative classification
    • Belief propagation

15. Problem Setting
    • How do we predict the labels Y_i for the nodes i in yellow?
      – Each node i has a feature vector f_i
      – Labels for some nodes are given (+ for green, − for blue)
    • Task: find P(Y_i) for the unlabeled nodes, given the network and features

16. Probabilistic Relational Classifier
    • Basic idea: the class probability of Y_i is a weighted average of the class probabilities of its neighbors
    • For labeled nodes, initialize with the ground-truth Y labels
    • For unlabeled nodes, initialize Y uniformly
    • Update all nodes in a random order until convergence or until a maximum number of iterations is reached

17. Probabilistic Relational Classifier
    • Repeat for each node i and label c:
      P(Y_i = c) = (1 / Σ_{j ∈ N_i} W(i,j)) · Σ_{j ∈ N_i} W(i,j) · P(Y_j = c)
      – W(i,j) is the edge strength from i to j
      – |N_i| is the number of neighbors of i; for an unweighted graph the update is the plain average of the neighbors’ probabilities
    • Challenges:
      – Convergence is not guaranteed
      – The model cannot use node feature information
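A minimal sketch of this classifier in Python/NumPy; the 5-node path graph in the toy run is hypothetical, not the example graph from the next slides:

```python
import numpy as np

def relational_classify(W, labeled, n_iters=100, tol=1e-4):
    """Probabilistic relational classifier (binary case).
    W: n x n weighted adjacency matrix.
    labeled: {node: P(Y=1)}, each value 0.0 or 1.0 (ground truth).
    Returns P(Y=1) for every node."""
    n = W.shape[0]
    p = np.full(n, 0.5)                    # unlabeled nodes start uniform
    for i, y in labeled.items():
        p[i] = y                           # labeled nodes keep ground truth
    for _ in range(n_iters):
        delta = 0.0
        for i in range(n):                 # sweep the nodes one at a time
            if i in labeled:
                continue
            nbrs = np.flatnonzero(W[i])
            if nbrs.size == 0:
                continue
            new_p = W[i, nbrs] @ p[nbrs] / W[i, nbrs].sum()
            delta = max(delta, abs(new_p - p[i]))
            p[i] = new_p
        if delta < tol:                    # all scores have stabilized
            break
    return p

# Toy run on a hypothetical path graph 0 - 1 - 2 - 3 - 4,
# with node 0 labeled negative and node 4 labeled positive.
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
print(relational_classify(W, {0: 0.0, 4: 1.0}))
# -> approximately [0.0, 0.25, 0.5, 0.75, 1.0]
```

The slides update nodes in a random order; this sketch sweeps them in index order instead, which can change intermediate values but reaches the same fixed point on this toy graph.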

18. Example
    • Initialization: set all labeled nodes to their labels and all unlabeled nodes uniformly to 0.5
    [Figure: a 9-node graph; the two positive nodes start at P(Y=1) = 1, the two negative nodes at P(Y=1) = 0, and the five unlabeled nodes at P(Y=1) = 0.5]

19. Example
    • Update for the 1st iteration:
      – For node 3, N_3 = {1, 2, 4}:
        P(Y=1 | N_3) = 1/3 · (0 + 0 + 0.5) = 0.17

20. Example
    • Update for the 1st iteration:
      – For node 4, N_4 = {1, 3, 5, 6}:
        P(Y=1 | N_4) = 1/4 · (0 + 0.17 + 0.5 + 1) = 0.42

21. Example
    • Update for the 1st iteration:
      – For node 5, N_5 = {4, 6, 7, 8}:
        P(Y=1 | N_5) = 1/4 · (0.42 + 1 + 1 + 0.5) = 0.73

22. Example
    • After iteration 1
    [Figure: the unlabeled nodes now read P(Y=1) = 0.17 (node 3), 0.42 (node 4), 0.73 (node 5), 0.91, and 1.00; the labeled nodes stay at 0 and 1]

23. Example
    • After iteration 2
    [Figure: the unlabeled nodes now read P(Y=1) = 0.14 (node 3), 0.47 (node 4), 0.85 (node 5), 0.95, and 1.00; the node at 1.00 has all of its neighbors’ values fixed, so its value cannot change]

24. Example
    • After iteration 3
    [Figure: the unlabeled nodes now read P(Y=1) = 0.16 (node 3), 0.50 (node 4), 0.86 (node 5), 0.95, and 1.00]

25. Example
    • After iteration 4
    [Figure: the unlabeled nodes now read P(Y=1) = 0.16 (node 3), 0.51 (node 4), 0.86 (node 5), 0.95, and 1.00]

26. Example
    • All scores stabilize after 5 iterations
    • Final labeling:
      – Nodes 5, 8, 9 are + (P(Y_i = 1) > 0.5)
      – Node 3 is − (P(Y_i = 1) < 0.5)
      – Node 4 is in between (P(Y_i = 1) ≈ 0.5)

27. Today’s Lecture
    • Overview of collective classification
    • Relational classification
    • Iterative classification
    • Belief propagation

28. Iterative Classification
    • Relational classifiers do not use node attributes
      – How can one leverage them?
    • Main idea of iterative classification: classify node i based on its attributes as well as the labels of its neighbor set N_i

29. Iterative Classification: Process
    1. Create a feature vector a_i for each node i
    2. Train a classifier to classify using a_i
    3. Nodes may have varying numbers of neighbors, so aggregate the neighbor labels using count, mode, proportion, mean, exists, etc. (see the sketch below)
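A rough sketch of this process in Python; scikit-learn’s LogisticRegression and the “mean” aggregate are assumptions here (the slides leave both choices open), and retraining every iteration is a simplification of the usual train-once scheme:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iterative_classify(adj, feats, y_train, train_idx, n_iters=10):
    """Iterative classification sketch.
    adj: n x n adjacency matrix; feats: n x d node attributes;
    y_train: 0/1 labels for the nodes listed in train_idx."""
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)

    def build_features(y_hat):
        # a_i = node attributes plus an aggregate (here: mean)
        # of the current labels of i's neighbors.
        nbr_mean = (adj @ y_hat.reshape(-1, 1)) / deg
        return np.hstack([feats, nbr_mean])

    y_hat = np.zeros(n)
    y_hat[train_idx] = y_train
    clf = LogisticRegression()
    for _ in range(n_iters):                 # collective inference loop
        A = build_features(y_hat)            # 1. rebuild feature vectors
        clf.fit(A[train_idx], y_train)       # 2. train on labeled nodes
        y_hat = clf.predict(A)               # 3. relabel every node
        y_hat[train_idx] = y_train           # ground truth stays fixed
    return y_hat
```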
