Graph Embeddings
Alicia Frame, PhD
October 10, 2019
Overview
- What's an embedding? How do these work?
- Motivating example: Word2Vec
- Motivating example: DeepWalk
- Graph embeddings overview
- Graph embedding techniques
- Graph embeddings with Neo4j
TL;DR - what's an embedding?
What does the internet say?
- Google: "An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors."
- Wikipedia: "In mathematics, an embedding is one instance of some mathematical structure contained within another instance, such as a group that is a subgroup."
In short: a way of mapping something (a document, an image, a graph) into a fixed-length vector (or matrix) that captures key features while reducing dimensionality.
So what's a graph embedding?
Graph embeddings are a specific type of embedding that translate graphs, or parts of graphs, into fixed-length vectors (or tensors).
But why bother?
An embedding translates something complex into something a machine can work with.
- Represents the important features of the input object in a compact, low-dimensional format
- The embedded representation can be used as a feature for ML, for direct comparisons, or as an input representation for a DL model
- Embeddings typically learn what's important in an unsupervised, generalizable way
Motivating Examples
Motivating example: Word Embeddings
How can I represent words in a way that lets me use them mathematically?
- How similar are two words?
- Can I use the representation of a word in a model?
Naive approach - how similar are the strings? Hand-engineered rules?
- How many of each letter?
- CAT = [10100000000000000001000000]
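As a rough illustration, here is a minimal sketch of that naive letter-count representation (the 26-digit vector above is one count per letter a-z):

```python
from collections import Counter
import string

def letter_count_vector(word: str) -> list[int]:
    """Naive 26-dimensional representation: one count per letter a-z."""
    counts = Counter(word.lower())
    return [counts.get(letter, 0) for letter in string.ascii_lowercase]

print(letter_count_vector("CAT"))
# [1, 0, 1, 0, ..., 1, 0, ..., 0]  -> 1s at positions a, c, t
```

Note that this representation gives "CAT" and "ACT" identical vectors, which is part of why it's naive: it captures spelling, not meaning.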
Motivating example: Word Embeddings
Can we use documents to encode words?
- Frequency matrix: raw counts of each word in each document
- Weighted term frequency (TF-IDF)
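A sketch of what those matrices look like in practice, assuming scikit-learn (>= 1.0) and a toy corpus borrowed from the examples in this deck:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "tylenol is a pain reliever",
    "paracetamol is a pain reliever",
    "he is not lazy he is intelligent he is smart",
]

# Raw term-frequency matrix: one row per document, one column per word
# (scikit-learn's default tokenizer drops one-letter tokens like "a")
tf = CountVectorizer()
print(tf.fit_transform(docs).toarray())
print(tf.get_feature_names_out())

# TF-IDF down-weights words that appear in every document (e.g. "is")
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(docs).toarray().round(2))
```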
Motivating example: Word Embeddings
Word order probably matters too: words that occur together have similar contexts.
- "Tylenol is a pain reliever" / "Paracetamol is a pain reliever" - same context
- Co-occurrence: how often do two words appear in the same context window?
- Context window: a specific number of words and a direction
Example corpus: "He is not lazy. He is intelligent. He is smart."
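A minimal sketch of counting co-occurrences over that toy corpus; the symmetric window of size 2 is an arbitrary choice for illustration:

```python
from collections import defaultdict

corpus = ["he is not lazy", "he is intelligent", "he is smart"]
window = 2  # symmetric context window: 2 words left, 2 words right

cooc = defaultdict(int)
for sentence in corpus:
    words = sentence.split()
    for i, target in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                cooc[(target, words[j])] += 1

print(cooc[("he", "is")])    # 3 -- "he" and "is" share a window in every sentence
print(cooc[("is", "lazy")])  # 1
```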
Motivating example: Word Embeddings
Why not stop here?
- You need more documents to really understand context... but the more documents you have, the bigger your matrix gets
- Giant sparse matrices or vectors are cumbersome and uninformative
- We need to reduce the dimensionality of our matrix
Motivating Example: Word Embeddings
Count-based methods: linear algebra to the rescue?
- Pros: preserves semantic relationships, accurate, well-known methods
- Cons: huge memory requirements, not trained for a specific task
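For instance, a count-based embedding can be produced by factorizing the co-occurrence matrix; this sketch uses a plain truncated SVD on the counts from the toy corpus above, and keeping two dimensions is an arbitrary choice:

```python
import numpy as np

# Word-word co-occurrence matrix from the toy corpus (window = 2),
# rows and columns in the same vocabulary order
vocab = ["he", "is", "not", "lazy", "intelligent", "smart"]
X = np.array([
    [0, 3, 1, 0, 1, 1],
    [3, 0, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
], dtype=float)

# Truncated SVD: keep the top-k singular vectors as dense word vectors
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
embeddings = U[:, :k] * S[:k]  # one k-dimensional vector per word
print(dict(zip(vocab, embeddings.round(2))))
```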
Motivating Example: Word Embeddings
Predictive methods: learn an embedding for a specific task.
Motivating Example: Word Embeddings
The SkipGram model learns a vector representation for each word that maximizes the probability of the surrounding context words given that word.
- Input: the word, as a one-hot encoded vector
- Output: a predicted probability, for each word in the corpus, that it appears in the context of the input word
Motivating Example: Word Embeddings
The hidden layer is a weight matrix with one row per word and one column per neuron -- this is the embedding!
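A minimal sketch of training a skip-gram model and reading off those hidden-layer weights; it assumes gensim (>= 4), which the slides don't name, and reuses the toy corpus from earlier (far too small for meaningful vectors, but it shows the mechanics):

```python
from gensim.models import Word2Vec

sentences = [
    ["he", "is", "not", "lazy"],
    ["he", "is", "intelligent"],
    ["he", "is", "smart"],
    ["tylenol", "is", "a", "pain", "reliever"],
    ["paracetamol", "is", "a", "pain", "reliever"],
]

# sg=1 selects the skip-gram architecture; vector_size is the hidden-layer width
model = Word2Vec(sentences, vector_size=32, window=2, min_count=1, sg=1, epochs=50)

# model.wv holds the learned hidden-layer weights -- the word embeddings
print(model.wv["tylenol"].shape)                      # (32,)
print(model.wv.similarity("tylenol", "paracetamol"))  # shared context should pull these together on a real corpus
```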
(if we really want to get into the math)
- Maximize the probability of the next word w_t given the context h
- Train the model by maximizing the log-likelihood over the training set
- The skip-gram model flips the conditioning: it predicts the context words given the target word
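For reference, the standard word2vec formulation these bullets point at; the slide's own equation images are not reproduced here, so the exact notation below is a reconstruction:

```latex
% Probability of the target word w_t given context h, as a softmax over the vocabulary V:
P(w_t \mid h) = \mathrm{softmax}\bigl(\mathrm{score}(w_t, h)\bigr)
             = \frac{\exp\{\mathrm{score}(w_t, h)\}}{\sum_{w' \in V} \exp\{\mathrm{score}(w', h)\}}

% Training objective: maximize the log-likelihood over the training set
J = \sum_{t} \log P(w_t \mid h_t)

% Skip-gram flips the conditioning: predict each context word within a window of size c
J_{\text{skip-gram}} = \sum_{t} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log P(w_{t+j} \mid w_t)
```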
Motivating Example: Word Embeddings
Word embeddings condense the representation of each word while preserving its context.
Cool, but what's this got to do with graphs?
Motivating example: DeepWalk
How do we represent a node in a graph mathematically? Can we adapt word2vec?
- Each node is like a word
- The neighborhood around the node is the context window
Motivating example: DeepWalk
Extract the context for each node by sampling random walks from the graph: for every node in the graph, take n fixed-length random walks (the equivalent of sentences).
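A minimal sketch of that sampling step, assuming networkx and uniform (unbiased) random walks as in DeepWalk; the walk count and length here are arbitrary defaults:

```python
import random
import networkx as nx

def random_walks(G, walks_per_node=10, walk_length=8, seed=42):
    """For every node, take n fixed-length random walks (our 'sentences')."""
    rng = random.Random(seed)
    walks = []
    for node in G.nodes():
        for _ in range(walks_per_node):
            walk = [node]
            while len(walk) < walk_length:
                neighbors = list(G.neighbors(walk[-1]))
                if not neighbors:          # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append([str(n) for n in walk])  # stringify node IDs for word2vec
    return walks

G = nx.karate_club_graph()  # small example graph
walks = random_walks(G)
print(len(walks), walks[0])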
Motivating example: DeepWalk
Once we have our "sentences," we can extract the context windows and learn weights using the same skip-gram model (the objective is to predict neighboring nodes given the target node).
Motivating example: DeepWalk
The embeddings are the hidden-layer weights from the skip-gram model.
Note: there are also graph equivalents of the matrix factorization and hand-engineered approaches we discussed for words.
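Continuing the sketch above, here is how the sampled walks can be fed to the same skip-gram model (again assuming gensim and networkx; the walk sampling is repeated in compact form so the snippet runs on its own):

```python
import random
import networkx as nx
from gensim.models import Word2Vec

G = nx.karate_club_graph()
rng = random.Random(0)

# Sample 10 length-8 random walks per node (compact version of the previous sketch)
walks = []
for node in G.nodes():
    for _ in range(10):
        walk = [node]
        for _ in range(7):
            walk.append(rng.choice(list(G.neighbors(walk[-1]))))
        walks.append([str(n) for n in walk])

# The random walks play the role of sentences; sg=1 selects skip-gram
model = Word2Vec(walks, vector_size=64, window=5, min_count=0, sg=1, epochs=10)

# The hidden-layer weights are the node embeddings
print(model.wv["0"].shape)                 # (64,) vector for node 0
print(model.wv.most_similar("0", topn=3))  # nodes with similar neighborhoods
```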
Graph Embeddings Overview
There are lots of graph embeddings...
What type of graph are you trying to create an embedding for?
- Monopartite graphs (DeepWalk is designed for these)
- Multipartite graphs (e.g. knowledge graphs)
What aspect of the graph are you trying to represent?
- Vertex embeddings: describe the connectivity of each node
- Path embeddings: traversals across the graph
- Graph embeddings: encode an entire graph into a single vector
Node embedding overview
Most techniques consist of:
- A similarity function that measures the similarity between nodes
- An encoder function that generates the node embedding
- A decoder function to reconstruct pairwise similarity
- A loss function that measures how good your reconstruction is
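To make those four pieces concrete, here is a deliberately simplified sketch: a shallow lookup encoder, a dot-product decoder, the adjacency matrix as the similarity function, and a squared-error loss. All of these choices are illustrative assumptions, not any particular paper's method:

```python
import numpy as np

n_nodes, dim = 5, 3
rng = np.random.default_rng(0)

# Encoder (shallow): just a lookup table -- one row of trainable weights per node
Z = rng.normal(size=(n_nodes, dim))

def encode(node_id):
    return Z[node_id]

# Decoder: reconstruct pairwise similarity from the embeddings (here, a dot product)
def decode(u, v):
    return encode(u) @ encode(v)

# Similarity function on the graph itself (here, simply the adjacency matrix A)
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

# Loss: how well does the decoded similarity match the graph similarity?
loss = sum((decode(u, v) - A[u, v]) ** 2 for u in range(n_nodes) for v in range(n_nodes))
print(loss)  # training would adjust Z to minimize this
```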
Shallow Graph Embedding Techniques
Shallow: the encoder function is an embedding lookup.
Matrix factorization:
- These techniques all rely on an adjacency matrix as input
- Matrix factorization is applied either directly to the input or to some transformation of it
- Drawbacks: massive memory footprint, computationally intense
Random walk:
- Obtain node co-occurrence via random walks
- Learn weights to optimize the similarity measure
- Drawbacks: local-only perspective, assumes similar nodes are close together
Shallow Embeddings
Why not stick with these?
- Shallow embeddings are inefficient - no parameters are shared between nodes
- They can't leverage node attributes
- They only generate embeddings for nodes present when the embedding was trained - problematic for large, evolving graphs
Newer methodologies compress neighborhood information (see the sketch below):
- Neighborhood autoencoder methods
- Neighborhood aggregation
- Convolutional autoencoders
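As a taste of the neighborhood-aggregation idea, here is a minimal GraphSAGE-style mean-aggregation sketch (one layer, random weights standing in for learned ones); the toy graph and features are made up for illustration:

```python
import numpy as np

# Toy graph as adjacency lists, plus a feature vector per node
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0],
                     [0.5, 0.5]])

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))  # learned in practice; random here for illustration

def aggregate(node):
    """One round of mean aggregation: combine a node's features with its neighbors'."""
    neigh_mean = features[neighbors[node]].mean(axis=0)
    combined = np.concatenate([features[node], neigh_mean])  # self + neighborhood
    h = np.maximum(combined @ W, 0)                          # linear map + ReLU
    return h / (np.linalg.norm(h) + 1e-8)

print(aggregate(2))  # uses node attributes, and generalizes to nodes unseen at training time
```

Because the embedding is computed from features and neighbors rather than looked up, the same weights can embed new nodes as the graph evolves.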
Autoencoder methods
Using Graph Embeddings
Why are we going to all this trouble?
Visualization & pattern discovery: leverage lots of existing approaches
- t-SNE plots
- PCA
Clustering and community detection:
- Apply generic tabular-data approaches (e.g. k-means), while capturing both functional and structural roles
- KNN graphs based on embedding similarity
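A small sketch of both uses, assuming scikit-learn; random vectors stand in for learned node embeddings (e.g. model.wv.vectors from the DeepWalk sketch earlier):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Stand-in for learned node embeddings
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(34, 64))

# 2-D projection for visualization / pattern discovery
coords = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)

# Community detection with a generic tabular method (k-means) on the embeddings
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embeddings)
print(coords.shape, labels[:10])
```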
Why are we going to all this trouble?
Node classification / semi-supervised learning: predict missing node attributes.
Link prediction: predict edges not present in the graph
- Either using similarity measures/heuristics or ML pipelines
Embeddings can make the graph algorithm library even more powerful!
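One common ML-pipeline recipe for link prediction is to turn a pair of node embeddings into an edge feature (here via the Hadamard product) and train a classifier on known edges versus sampled non-edges. This is a sketch with made-up edges and random stand-in embeddings, not a tuned pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = rng.normal(size=(34, 64))  # stand-in for learned node embeddings

def edge_feature(u, v):
    """Element-wise (Hadamard) product turns two node vectors into one edge feature."""
    return emb[u] * emb[v]

# Known positive edges and sampled non-edges (negative examples)
pos = [(0, 1), (0, 2), (1, 2), (2, 3)]
neg = [(0, 33), (5, 20), (7, 30), (10, 25)]

X = np.array([edge_feature(u, v) for u, v in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba([edge_feature(3, 4)])[0, 1])  # predicted probability this edge exists
```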
Graph Embeddings in Neo4j