

  1. CS224W: Machine Learning with Graphs. Jure Leskovec, Stanford University. http://cs224w.stanford.edu

  2. [Figure: an example network in which a machine learning model classifies the unlabeled ("?") nodes. Task shown: node classification.]

  3. [Figure: a machine learning setup on a graph, with unknown ("?") values to be predicted from a known input x.]

  4. The (supervised) machine learning lifecycle requires feature engineering every single time! Pipeline: Raw Data → Structured Data → Learning Algorithm → Model → Downstream task. Feature engineering is what turns raw data into structured data; the goal here is to automatically learn the features instead.

  5. Goal: efficient, task-independent feature learning for machine learning with graphs! Map each node u to a vector, f: u → ℝ^d, its feature representation (embedding).

  6. Task: map each node in a network into a low-dimensional space.
- Distributed representations for nodes.
- Similarity of embeddings between nodes indicates their network similarity.
- Encodes network information and generates node representations.

  7. Example: 2D embeddings of the nodes of Zachary's Karate Club network. Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.

  8. The modern deep learning toolbox is designed for simple sequences or grids:
- CNNs for fixed-size images/grids,
- RNNs or word2vec for text/sequences.

  9. But networks are far more complex!
- Complex topological structure (i.e., no spatial locality like grids).
- No fixed node ordering or reference point (i.e., the isomorphism problem).
- Often dynamic, with multimodal features.

  10. Assume we have a graph G:
- V is the vertex set.
- A is the adjacency matrix (assume binary).
- No node features or extra information is used!

  11. The goal is to encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.

  12. Goal: similarity(u, v) ≈ z_v^T z_u, where similarity(u, v) is measured in the original network and z_v^T z_u is the similarity of the embeddings. The similarity function still needs to be defined!

  13. 1. Define an encoder (i.e., a mapping from nodes to embeddings). 2. Define a node similarity function (i.e., a measure of similarity in the original network). 3. Optimize the parameters of the encoder so that similarity(u, v) in the original network ≈ z_v^T z_u, the similarity of the embeddings.

  14. Encoder: maps each node to a low-dimensional vector, ENC(v) = z_v, where v is a node in the input graph and z_v is its d-dimensional embedding. Similarity function: specifies how the relationships in vector space map to the relationships in the original network, similarity(u, v) ≈ z_v^T z_u, i.e., the similarity of u and v in the original network is approximated by the dot product between their node embeddings.

  15. Simplest encoding approach: the encoder is just an embedding lookup, ENC(v) = Z v, where Z ∈ ℝ^(d×|V|) is a matrix with one node embedding per column (this is what we learn!) and v ∈ 𝕀^|V| is an indicator vector, all zeroes except a one in the position indicating node v.

  16. Simplest encoding approach: the encoder is just an embedding lookup. Z is the embedding matrix: one column per node (the embedding vector for that node), with d rows (the dimension/size of the embeddings).

  17. Simplest encoding approach: the encoder is just an embedding lookup. Each node is assigned a unique embedding vector. Many methods use this approach: DeepWalk, node2vec, TransE.
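A minimal sketch of this lookup encoder, assuming NumPy; the names (`Z`, `encode`) and toy sizes are illustrative, not from the slides:

```python
import numpy as np

num_nodes = 10      # |V| (toy value)
embedding_dim = 4   # d (toy value)

# Z in R^(d x |V|): one node embedding per column. These are the learned parameters.
Z = np.random.randn(embedding_dim, num_nodes)

def encode(v: int) -> np.ndarray:
    """ENC(v) = Z @ one_hot(v), i.e., just column v of the embedding matrix."""
    one_hot = np.zeros(num_nodes)
    one_hot[v] = 1.0
    return Z @ one_hot  # equivalent to Z[:, v]

print(encode(3).shape)  # (4,)
```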

  18. The key choice that distinguishes these methods is how they define node similarity. E.g., should two nodes have similar embeddings if they:
- are connected?
- share neighbors?
- have similar "structural roles"?
- ...?

  19. Material based on:
- Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD.
- Grover et al. 2016. node2vec: Scalable Feature Learning for Networks. KDD.

  20. Given a graph and a starting point, we select a neighbor of it at random and move to this neighbor; then we select a neighbor of this point at random and move to it, etc. The (random) sequence of points selected this way is a random walk on the graph. [Figure: a random walk traced on a small 12-node graph.]
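A minimal sketch of such a uniform random walk, assuming the graph is stored as a plain adjacency dict; `random_walk` and the toy graph are illustrative:

```python
import random

# Toy undirected graph as an adjacency list.
graph = {
    1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
    4: [3, 5], 5: [4],
}

def random_walk(graph, start, length):
    """From the current node, repeatedly move to a uniformly random neighbor."""
    walk = [start]
    for _ in range(length):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

print(random_walk(graph, start=1, length=5))  # e.g. [1, 3, 4, 3, 2, 1]
```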

  21. z_u^T z_v ≈ the probability that u and v co-occur on a random walk over the network.

  22. 1. Estimate the probability of visiting node v on a random walk starting from node u under some random walk strategy R. 2. Optimize embeddings to encode these random walk statistics: similarity (here: the dot product, i.e., cos(θ)) encodes random walk "similarity".

  23. 1. Expressivity: a flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information. 2. Efficiency: we do not need to consider all node pairs when training; we only need to consider pairs that co-occur on random walks.

  24. Intuition: find an embedding of nodes into d dimensions that preserves similarity. Idea: learn node embeddings such that nodes that are nearby in the network end up close together. Given a node u, how do we define nearby nodes? N_R(u) is the neighbourhood of u obtained by some strategy R.

  25. Given G = (V, E), our goal is to learn a mapping z: u → ℝ^d. Log-likelihood objective: max_z Σ_(u ∈ V) log P(N_R(u) | z_u), where N_R(u) is the neighborhood of node u under strategy R. Given node u, we want to learn feature representations that are predictive of the nodes in its neighborhood N_R(u).

  26. 1. Run short fixed-length random walks starting from each node u in the graph using some strategy R. 2. For each node u, collect N_R(u), the multiset* of nodes visited on random walks starting from u. 3. Optimize embeddings according to: given node u, predict its neighbors N_R(u), i.e., max_z Σ_(u ∈ V) log P(N_R(u) | z_u). (*N_R(u) can have repeated elements since nodes can be visited multiple times on random walks.)
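A minimal sketch of steps 1 and 2, using short fixed-length uniform walks as the strategy R; `collect_neighborhoods` and the walk parameters are illustrative names and values:

```python
import random
from collections import Counter

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # toy undirected graph

def collect_neighborhoods(graph, walks_per_node=10, walk_length=5):
    """Return N_R(u) for every u as a multiset (Counter) of visited nodes."""
    neighborhoods = {u: Counter() for u in graph}
    for u in graph:
        for _ in range(walks_per_node):
            cur = u
            for _ in range(walk_length):
                cur = random.choice(graph[cur])
                neighborhoods[u][cur] += 1  # repeats kept: N_R(u) is a multiset
    return neighborhoods

print(collect_neighborhoods(graph)[0])  # e.g. Counter({2: 21, 1: 17, 0: 8, 3: 4})
```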

  27. L = − Σ_(u ∈ V) Σ_(v ∈ N_R(u)) log P(v | z_u). Intuition: optimize embeddings to maximize the likelihood of random walk co-occurrences. Parameterize P(v | z_u) using the softmax: P(v | z_u) = exp(z_u^T z_v) / Σ_(n ∈ V) exp(z_u^T z_n). Why the softmax? We want node v to be the most similar to node u (out of all nodes n). Intuition: Σ_i exp(x_i) ≈ max_i exp(x_i).
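A minimal sketch of this softmax parameterization in NumPy; `softmax_prob` is an illustrative name:

```python
import numpy as np

def softmax_prob(Z, u, v):
    """P(v | z_u) = exp(z_u . z_v) / sum over all nodes n of exp(z_u . z_n)."""
    scores = Z.T @ Z[:, u]   # z_u . z_n for every node n
    scores -= scores.max()   # shift for numerical stability (cancels in the ratio)
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[v]

Z = np.random.randn(4, 10)   # toy embedding matrix: d = 4, |V| = 10
print(softmax_prob(Z, u=0, v=3))
```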

  28. Putting it all together: L = − Σ_(u ∈ V) Σ_(v ∈ N_R(u)) log ( exp(z_u^T z_v) / Σ_(n ∈ V) exp(z_u^T z_n) ). The outer sum runs over all nodes u, the inner sum over the nodes v seen on random walks starting from u, and the term inside the log is the predicted probability of u and v co-occurring on a random walk. Optimizing random walk embeddings = finding the embeddings z_u that minimize L.

  29. But doing this naively is too expensive! In L = − Σ_(u ∈ V) Σ_(v ∈ N_R(u)) log ( exp(z_u^T z_v) / Σ_(n ∈ V) exp(z_u^T z_n) ), the nested sum over nodes gives O(|V|^2) complexity!
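A minimal sketch of the naive loss to make that cost concrete; the toy `neighborhoods` values are illustrative stand-ins for the N_R(u) multisets collected above:

```python
import numpy as np

Z = np.random.randn(4, 5)  # toy embeddings: d = 4, |V| = 5
# N_R(u) as {node: co-occurrence count}, e.g. collected from random walks.
neighborhoods = {0: {1: 3, 2: 1}, 1: {0: 2}, 2: {0: 1, 3: 2}, 3: {2: 2}, 4: {}}

def naive_loss(Z, neighborhoods):
    """L = -sum over u and v in N_R(u) of log softmax(z_u . z_v).

    Every softmax denominator sums over all |V| nodes, and there is one
    such sum per node u: this nesting is the O(|V|^2) cost on the slide.
    """
    loss = 0.0
    for u, multiset in neighborhoods.items():
        scores = Z.T @ Z[:, u]                            # z_u . z_n for all n
        m = scores.max()
        log_denom = m + np.log(np.exp(scores - m).sum())  # stable log-sum-exp
        for v, count in multiset.items():
            loss -= count * (scores[v] - log_denom)
    return loss

print(naive_loss(Z, neighborhoods))
```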
