Machine Learning
Tim Althoff, UW CS547: Machine Learning for Big Data, http://www.cs.washington.edu/cse547
Machine Learning on networks: node classification
Example: Classifying the function of proteins in the interactome. Image from: Ganapathiraju et al. 2016. Schizophrenia interactome with 504 novel protein–protein interactions. Nature.
• The (supervised) machine learning lifecycle requires feature engineering every single time!
  – Pipeline: Raw Data → Structured Data → Learning Algorithm → Model → Downstream task
  – Goal: automatically learn the features instead of hand-engineering them
Goal: Efficient task-independent feature learning for machine learning in networks!
  – Map each node u to a vector: f: u → ℝ^d (feature representation, embedding)
• Task: Map each node in a network to a point in a low-dimensional space
  – Distributed representation for nodes
  – Similarity of embeddings between nodes indicates their network similarity
  – Encodes network information and generates node representations
• Example: 2D embedding of the nodes of Zachary's Karate Club network. Image from: Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD.
• The modern deep learning toolbox is designed for simple sequences or grids
  – CNNs for fixed-size images/grids
  – RNNs or word2vec for text/sequences
But networks are far more complex!
• Complex topological structure (no spatial locality like grids or text)
• No fixed node ordering or reference point
• Often dynamic and with multimodal features
Setup: Assume we have a graph G:
• V is the vertex set
• A is the adjacency matrix (assume binary)
• No node features or extra information is used!
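For concreteness (a small toy example, not from the slides; the edge list and node count are arbitrary), a binary adjacency matrix A for an undirected graph can be built as:

    import numpy as np

    # Toy undirected graph on V = {0, 1, 2, 3} given by its edge list.
    edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
    num_nodes = 4

    A = np.zeros((num_nodes, num_nodes), dtype=int)
    for i, j in edges:
        A[i, j] = A[j, i] = 1   # binary and symmetric: an edge either exists or it doesn't

    print(A)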
• Goal: encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network
Goal: similarity(u, v) ≈ z_v^T z_u
  – Left side: similarity of u and v in the original network (need to define!)
  – Right side: similarity of the embeddings (dot product)
1. Define an encoder (i.e., a mapping from nodes to embeddings)
2. Define a node similarity function (i.e., a measure of similarity in the original network)
3. Optimize the parameters of the encoder so that: similarity(u, v) ≈ z_v^T z_u (similarity in the original network ≈ similarity of the embeddings)
• Encoder maps each node to a low-dimensional vector: ENC(v) = z_v, where v is a node in the input graph and z_v is its d-dimensional embedding
• Similarity function specifies how relationships in vector space map to relationships in the original network: similarity(u, v) ≈ z_v^T z_u, i.e., the similarity of u and v in the original network is approximated by the dot product between their node embeddings
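As an illustration of the encoder/similarity split (a minimal sketch, not from the slides; the node count, embedding dimension, and random initialization are assumptions standing in for learned parameters):

    import numpy as np

    num_nodes, d = 5, 2
    rng = np.random.default_rng(0)

    # Encoder parameters: one d-dimensional vector per node (what we learn).
    Z = rng.normal(size=(d, num_nodes))

    def encode(v):
        """ENC(v) = z_v: look up the embedding column for node v."""
        return Z[:, v]

    def embedding_similarity(u, v):
        """Similarity in embedding space: dot product z_v^T z_u."""
        return encode(v) @ encode(u)

    # After training, this value should approximate similarity(0, 3) in the network.
    print(embedding_similarity(0, 3))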
• Simplest encoding approach: the encoder is just an embedding lookup: ENC(v) = Z v
  – Z ∈ ℝ^{d×|V|}: matrix whose columns are the d-dimensional node embeddings (what we learn!)
  – v ∈ 𝕀^{|V|}: indicator vector, all zeroes except a one in the position indicating node v
• The embedding matrix Z has one column per node (the embedding vector for that specific node); the number of rows is the dimension/size of the embeddings
• Simplest encoding approach: the encoder is just an embedding lookup
  – Each node is assigned a unique embedding vector
  – Many methods take this approach: node2vec, DeepWalk, LINE
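To make the lookup concrete (a minimal sketch; the matrix sizes are arbitrary): ENC(v) = Z v with a one-hot indicator v is exactly selecting one column of Z.

    import numpy as np

    d, num_nodes = 4, 6
    Z = np.random.default_rng(1).normal(size=(d, num_nodes))  # what we learn

    def one_hot(v, n):
        """Indicator vector: all zeroes except a one in position v."""
        e = np.zeros(n)
        e[v] = 1.0
        return e

    v = 2
    z_v = Z @ one_hot(v, num_nodes)    # ENC(v) = Z v
    assert np.allclose(z_v, Z[:, v])   # ...which is the same as looking up column v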
The key design choice across methods is how they define node similarity. E.g., should two nodes have similar embeddings if they…
• are connected?
• share neighbors?
• have similar “structural roles”?
• …?
Material based on:
• Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD.
• Grover et al. 2016. node2vec: Scalable Feature Learning for Networks. KDD.
z_u^T z_v ≈ probability that u and v co-occur on a random walk over the network (z_u … embedding of node u)
1. Estimate the probability of visiting node v on a random walk starting from node u, using some random walk strategy R
2. Optimize embeddings to encode these random walk statistics: similarity of z_u and z_v (here: dot product ≈ cos(θ)) encodes random walk “similarity”
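One common random walk strategy R (assumed here for illustration: short, fixed-length, unbiased walks over an adjacency-list graph, as in DeepWalk; node2vec uses a biased variant) can be sketched as:

    import random

    def random_walk(adj, start, length, rng=random):
        """Fixed-length unbiased walk: at each step, move to a uniformly random neighbor."""
        walk = [start]
        for _ in range(length - 1):
            neighbors = adj[walk[-1]]
            if not neighbors:        # dead end: stop early
                break
            walk.append(rng.choice(neighbors))
        return walk

    # Toy undirected graph as an adjacency list (arbitrary example).
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    print(random_walk(adj, start=0, length=5))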
1. Expressivity: Flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information
2. Efficiency: No need to consider all node pairs when training; only pairs that co-occur on random walks
• Intuition: Find an embedding of nodes into d-dimensional space so that node similarity is preserved
• Idea: Learn node embeddings such that nodes that are nearby in the network are close together in the embedding space
• Given a node u, how do we define nearby nodes?
  – N_R(u) … neighborhood of u obtained by some strategy R
• Given G = (V, E)
• Our goal is to learn a mapping z: u → ℝ^d
• Maximize the log-likelihood objective:
    max_z Σ_{u ∈ V} log P(N_R(u) | z_u)
  – where N_R(u) is the neighborhood of node u
• Given node u, we want to learn feature representations that are predictive of the nodes in its neighborhood N_R(u)
1. Run short fixed-length random walks starting from each node of the graph using some strategy R
2. For each node u collect N_R(u), the multiset* of nodes visited on random walks starting from u
3. Optimize embeddings according to: given node u, predict its neighbors N_R(u)
    max_z Σ_{u ∈ V} log P(N_R(u) | z_u)
  * N_R(u) can have repeat elements since nodes can be visited multiple times on random walks
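A sketch of steps 1–2 (assuming the unbiased fixed-length walks from the earlier sketch; the walk length and number of walks per node are arbitrary choices):

    import random
    from collections import defaultdict

    def random_walk(adj, start, length, rng=random):
        walk = [start]
        for _ in range(length - 1):
            nbrs = adj[walk[-1]]
            if not nbrs:
                break
            walk.append(rng.choice(nbrs))
        return walk

    def collect_neighborhoods(adj, walks_per_node=10, walk_length=5):
        """For each node u, build the multiset N_R(u) of nodes visited on walks starting from u."""
        N = defaultdict(list)            # list, not set: repeated visits are kept on purpose
        for u in adj:
            for _ in range(walks_per_node):
                for v in random_walk(adj, u, walk_length)[1:]:
                    N[u].append(v)
        return N

    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    print(collect_neighborhoods(adj)[0][:10])   # a sample of N_R(0)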
max_z Σ_{u ∈ V} log P(N_R(u) | z_u)
• Assumption: the conditional likelihood factorizes over the set of neighbors:
    log P(N_R(u) | z_u) = Σ_{v ∈ N_R(u)} log P(z_v | z_u)
• Softmax parametrization:
    P(v | z_u) = exp(z_u · z_v) / Σ_{n ∈ V} exp(z_u · z_n)
  – Why softmax? We want node v to be most similar to node u (out of all nodes n). Intuition: Σ_i exp(x_i) ≈ max_i exp(x_i)
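Written out numerically (a minimal sketch; the embedding matrix here is random, standing in for learned parameters):

    import numpy as np

    rng = np.random.default_rng(0)
    num_nodes, d = 4, 3
    Z = rng.normal(size=(num_nodes, d))   # row i = z_i (would be learned in practice)

    def p_v_given_u(v, u, Z):
        """Softmax parametrization: P(v | z_u) = exp(z_u . z_v) / sum_n exp(z_u . z_n)."""
        scores = Z @ Z[u]                 # z_u . z_n for every node n
        scores -= scores.max()            # numerical stability; does not change the softmax
        probs = np.exp(scores) / np.exp(scores).sum()
        return probs[v]

    print(p_v_given_u(v=2, u=0, Z=Z))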
Putting it all together:
    L = Σ_{u ∈ V} Σ_{v ∈ N_R(u)} −log( exp(z_u^T z_v) / Σ_{n ∈ V} exp(z_u^T z_n) )
  – Outer sum: over all nodes u
  – Inner sum: over nodes v seen on random walks starting from u
  – Term inside the log: predicted probability of u and v co-occurring on a random walk
• Optimizing random walk embeddings = finding the node embeddings z_u that minimize L
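A direct (deliberately naive) evaluation of this loss, assuming neighborhoods collected as above and a random embedding matrix as a placeholder for learned parameters:

    import numpy as np

    def naive_loss(Z, neighborhoods):
        """L = sum_u sum_{v in N_R(u)} -log softmax(z_u . z_v); recomputes the full normalizer."""
        loss = 0.0
        for u, nbrs in neighborhoods.items():
            scores = Z @ Z[u]                       # z_u . z_n for all n (the expensive part)
            log_norm = np.log(np.exp(scores - scores.max()).sum()) + scores.max()
            for v in nbrs:
                loss += -(Z[u] @ Z[v] - log_norm)   # -log P(v | z_u)
        return loss

    rng = np.random.default_rng(0)
    Z = rng.normal(size=(4, 3))                     # 4 nodes, 3-dim embeddings (toy values)
    neighborhoods = {0: [1, 2, 1], 1: [0, 2], 2: [0, 3], 3: [2]}
    print(naive_loss(Z, neighborhoods))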
But doing this naively is too expensive!
    L = Σ_{u ∈ V} Σ_{v ∈ N_R(u)} −log( exp(z_u^T z_v) / Σ_{n ∈ V} exp(z_u^T z_n) )
• The nested sums over nodes give O(|V|²) complexity!
• The normalization term from the softmax is the culprit… can we approximate it?