1. CSE 6240: Web Search and Text Mining, Spring 2020
   Message Passing and Node Classification
   Prof. Srijan Kumar

2. Outline
   • Main question today: Given a network with labels on some nodes, how do we label all the other nodes?
   • Example: In a network, some nodes are fraudsters and some nodes are fully trusted. How do you find the other fraudsters and trustworthy nodes?

3. Intuition
   • Collective classification: the idea of assigning labels to all nodes in a network together
     – Leverage the correlations in the network!
   • We will look at three techniques today:
     – Relational classification
     – Iterative classification
     – Belief propagation

4. Today’s Lecture
   • Overview of collective classification
   • Relational classification
   • Iterative classification
   • Belief propagation
   The lecture slides are borrowed from Prof. Jure Leskovec’s slides from CS224W.

5. Correlations Exist in Networks
   Example: a real social network
   • Nodes = people
   • Edges = friendship
   • Node color = race
   • People are segregated by race due to homophily (Easley and Kleinberg, 2010)

6. Classification with Network Data
   • How can we leverage the correlation observed in networks to help predict user attributes or interests?
   • How do we predict the labels for the nodes in yellow?

7. Motivation
   • Similar entities are typically close together or directly connected:
     – “Guilt-by-association”: If I am connected to a node with label X, then I am likely to have label X as well.
     – Example: Malicious vs. benign web pages: malicious web pages link to one another to increase visibility, look credible, and rank higher in search engines.

8. Intuition
   • The classification label of a node O in the network may depend on:
     – Features of O
     – Labels of the objects in O’s neighborhood
     – Features of the objects in O’s neighborhood

9. Guilt-by-Association
   • Given: a graph and a few labeled nodes
   • Find: the class (red/green) of the remaining nodes
   • Assumption: the network has homophily

10. Guilt-by-Association
    • Let W be an n×n (weighted) adjacency matrix over n nodes
    • Let Y ∈ {−1, 0, 1}ⁿ be a vector of labels:
      – 1: positive node, known to be involved in a gene function/biological process
      – −1: negative node
      – 0: unlabeled node
    • Goal: predict which unlabeled nodes are likely positive
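A minimal sketch of this setup in Python/NumPy; the 5-node graph and its labels below are hypothetical, invented purely for illustration:

```python
import numpy as np

# Hypothetical 5-node graph: W is the (weighted) adjacency matrix.
W = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Label vector Y: +1 = positive, -1 = negative, 0 = unlabeled.
Y = np.array([1, 0, 0, 0, -1])

unlabeled = np.flatnonzero(Y == 0)   # the nodes we need to predict
print(unlabeled)                      # -> [1 2 3]
```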

11. Collective Classification
    • Intuition: simultaneous classification of interlinked objects using correlations
    • Several applications:
      – Document classification
      – Part-of-speech tagging
      – Link prediction
      – Optical character recognition
      – Image/3D data segmentation
      – Entity resolution in sensor networks
      – Spam and fraud detection

12. Collective Classification Overview
    • Markov assumption: the label Y_i of node i depends on the labels of its neighbors N_i:
      P(Y_i | i) = P(Y_i | N_i)
    • Collective classification involves 3 steps:
      – Local classifier: assigns initial labels
      – Relational classifier: captures correlations between nodes
      – Collective inference: propagates correlations through the network

13. Collective Classification Overview
    • Local classifier: assigns initial labels
      – Predicts the label based on node attributes/features
      – Classical classification; does not use network information
    • Relational classifier: captures correlations between nodes
      – Learns a classifier to label one node from the labels and/or attributes of its neighbors
      – Network information is used
    • Collective inference: propagates correlations through the network (see the skeleton below)
      – Applies the relational classifier to each node iteratively
      – Iterates until the inconsistency between neighboring labels is minimized
      – Network structure substantially affects the final prediction
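As a rough Python skeleton of the three steps; the trivial local model (a logistic squash of one feature) and relational model (a neighbor average) are stand-ins, not anything the slides prescribe:

```python
import numpy as np

def collective_classification(adj, feats, labeled, n_iters=10):
    """3-step pattern: local classifier -> relational classifier
    -> collective inference. adj: n x n adjacency matrix,
    feats: n x d feature matrix, labeled: {node: P(Y=1)}."""
    n = adj.shape[0]
    # Step 1 -- local classifier: initial beliefs from features only
    # (a placeholder model, using just the first feature).
    belief = 1.0 / (1.0 + np.exp(-feats[:, 0]))
    for i, y in labeled.items():
        belief[i] = y                       # ground truth stays fixed
    # Steps 2+3 -- relational classifier, applied iteratively.
    for _ in range(n_iters):
        for i in range(n):
            if i in labeled:
                continue
            nbrs = np.flatnonzero(adj[i])
            if nbrs.size:                   # belief = neighbor average
                belief[i] = belief[nbrs].mean()
    return belief
```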

14. Today’s Lecture
    • Overview of collective classification
    • Relational classification
    • Iterative classification
    • Belief propagation

15. Problem Setting
    • How do we predict the labels Y_i for the nodes i in yellow?
      – Each node i has a feature vector f_i
      – Labels for some nodes are given (+ for green, − for blue)
    • Task: find P(Y_i) for the unlabeled nodes, given the network and features

16. Probabilistic Relational Classifier
    • Basic idea: the class probability of Y_i is a weighted average of the class probabilities of its neighbors
    • For labeled nodes, initialize with the ground-truth Y labels
    • For unlabeled nodes, initialize Y uniformly
    • Update all nodes in a random order until convergence or until a maximum number of iterations is reached

17. Probabilistic Relational Classifier
    • Repeat for each node i and label c:
      P(Y_i = c) = (1 / Σ_{j ∈ N_i} W(i,j)) · Σ_{j ∈ N_i} W(i,j) · P(Y_j = c)
      – W(i,j) is the edge strength from i to j
      – |N_i| is the number of neighbors of i; for an unweighted graph the update is the plain average of the neighbors’ probabilities
    • Challenges:
      – Convergence is not guaranteed
      – The model cannot use node feature information
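A minimal sketch of this classifier in Python/NumPy; the 5-node path graph in the toy run is hypothetical, not the example graph from the next slides:

```python
import numpy as np

def relational_classify(W, labeled, n_iters=100, tol=1e-4):
    """Probabilistic relational classifier (binary case).
    W: n x n weighted adjacency matrix.
    labeled: {node: P(Y=1)}, each value 0.0 or 1.0 (ground truth).
    Returns P(Y=1) for every node."""
    n = W.shape[0]
    p = np.full(n, 0.5)                    # unlabeled nodes start uniform
    for i, y in labeled.items():
        p[i] = y                           # labeled nodes keep ground truth
    for _ in range(n_iters):
        delta = 0.0
        for i in range(n):                 # sweep the nodes one at a time
            if i in labeled:
                continue
            nbrs = np.flatnonzero(W[i])
            if nbrs.size == 0:
                continue
            new_p = W[i, nbrs] @ p[nbrs] / W[i, nbrs].sum()
            delta = max(delta, abs(new_p - p[i]))
            p[i] = new_p
        if delta < tol:                    # all scores have stabilized
            break
    return p

# Toy run on a hypothetical path graph 0 - 1 - 2 - 3 - 4,
# with node 0 labeled negative and node 4 labeled positive.
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
print(relational_classify(W, {0: 0.0, 4: 1.0}))
# -> approximately [0.0, 0.25, 0.5, 0.75, 1.0]
```

The slides update nodes in a random order; this sketch sweeps them in index order instead, which can change intermediate values but reaches the same fixed point on this toy graph.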

18. Example
    • Initialization: set all labeled nodes to their labels and all unlabeled nodes uniformly to 0.5
    [Figure: a 9-node graph; the two positive nodes start at P(Y=1) = 1, the two negative nodes at P(Y=1) = 0, and the five unlabeled nodes at P(Y=1) = 0.5]

19. Example
    • Update for the 1st iteration:
      – For node 3, N_3 = {1, 2, 4}:
        P(Y=1 | N_3) = 1/3 · (0 + 0 + 0.5) = 0.17

20. Example
    • Update for the 1st iteration:
      – For node 4, N_4 = {1, 3, 5, 6}:
        P(Y=1 | N_4) = 1/4 · (0 + 0.17 + 0.5 + 1) = 0.42

21. Example
    • Update for the 1st iteration:
      – For node 5, N_5 = {4, 6, 7, 8}:
        P(Y=1 | N_5) = 1/4 · (0.42 + 1 + 1 + 0.5) = 0.73

22. Example
    • After iteration 1
    [Figure: the unlabeled nodes now read P(Y=1) = 0.17 (node 3), 0.42 (node 4), 0.73 (node 5), 0.91, and 1.00; the labeled nodes stay at 0 and 1]

23. Example
    • After iteration 2
    [Figure: the unlabeled nodes now read P(Y=1) = 0.14 (node 3), 0.47 (node 4), 0.85 (node 5), 0.95, and 1.00; the node at 1.00 has all of its neighbors’ values fixed, so its value cannot change]

24. Example
    • After iteration 3
    [Figure: the unlabeled nodes now read P(Y=1) = 0.16 (node 3), 0.50 (node 4), 0.86 (node 5), 0.95, and 1.00]

25. Example
    • After iteration 4
    [Figure: the unlabeled nodes now read P(Y=1) = 0.16 (node 3), 0.51 (node 4), 0.86 (node 5), 0.95, and 1.00]

26. Example
    • All scores stabilize after 5 iterations
    • Final labeling:
      – Nodes 5, 8, 9 are + (P(Y_i = 1) > 0.5)
      – Node 3 is − (P(Y_i = 1) < 0.5)
      – Node 4 is in between (P(Y_i = 1) ≈ 0.5)

27. Today’s Lecture
    • Overview of collective classification
    • Relational classification
    • Iterative classification
    • Belief propagation

28. Iterative Classification
    • Relational classifiers do not use node attributes
      – How can one leverage them?
    • Main idea of iterative classification: classify node i based on its attributes as well as the labels of its neighbor set N_i

29. Iterative Classification: Process
    1. Create a feature vector a_i for each node i
    2. Train a classifier to classify using a_i
    3. Nodes may have varying numbers of neighbors, so aggregate the neighbor labels using count, mode, proportion, mean, exists, etc. (see the sketch below)
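A rough sketch of this process in Python; scikit-learn’s LogisticRegression and the “mean” aggregate are assumptions here (the slides leave both choices open), and retraining every iteration is a simplification of the usual train-once scheme:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iterative_classify(adj, feats, y_train, train_idx, n_iters=10):
    """Iterative classification sketch.
    adj: n x n adjacency matrix; feats: n x d node attributes;
    y_train: 0/1 labels for the nodes listed in train_idx."""
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)

    def build_features(y_hat):
        # a_i = node attributes plus an aggregate (here: mean)
        # of the current labels of i's neighbors.
        nbr_mean = (adj @ y_hat.reshape(-1, 1)) / deg
        return np.hstack([feats, nbr_mean])

    y_hat = np.zeros(n)
    y_hat[train_idx] = y_train
    clf = LogisticRegression()
    for _ in range(n_iters):                 # collective inference loop
        A = build_features(y_hat)            # 1. rebuild feature vectors
        clf.fit(A[train_idx], y_train)       # 2. train on labeled nodes
        y_hat = clf.predict(A)               # 3. relabel every node
        y_hat[train_idx] = y_train           # ground truth stays fixed
    return y_hat
```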
