Kipf, T., Welling, M.: Semi-Supervised Classification with Graph Convolutional Networks
Radim Špetlík, Czech Technical University in Prague
2 Overview
- Kipf and Welling
  - use a first-order approximation in the Fourier domain to obtain efficient linear-time graph-CNNs
  - apply the approximation to the semi-supervised graph node classification problem
3 Graph Adjacency Matrix A
- symmetric, square matrix
- $A_{ij} = 1$ iff vertices $v_i$ and $v_j$ are adjacent
- $A_{ij} = 0$ otherwise (see the sketch below)
http://mathworld.wolfram.com/AdjacencyMatrix.html
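To make the definition concrete, a minimal numpy sketch (the 4-node graph below is a made-up example, not from the slides):

    import numpy as np

    # Undirected 4-node graph with edges (0,1), (0,2), (1,2), (2,3).
    edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
    N = 4
    A = np.zeros((N, N))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0  # symmetric, since the graph is undirected
    # A[i, j] == 1 iff v_i and v_j are adjacent, 0 otherwise.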
4 Graph Convolutional Network
- given a graph $G = (V, E)$, a graph-CNN is a function which:
- takes as input:
  - a feature description $x_i \in \mathbb{R}^C$ for every node $i$, summarized in a matrix $X \in \mathbb{R}^{N \times C}$, where $N$ is the number of nodes and $C$ is the number of input features
  - a description of the graph structure in matrix form, typically an adjacency matrix $A$
- produces:
  - a node-level output $Z \in \mathbb{R}^{N \times F}$, where $F$ is the number of output features per node
5 Graph Convolutional Network
- is composed of non-linear functions $H^{(l+1)} = f(H^{(l)}, A)$, where $H^{(0)} = X$, $H^{(L)} = Z$, and $L$ is the number of layers
6 Graph Convolutional Network
- graphically: https://tkipf.github.io/graph-convolutional-networks/
7 Graph Convolutional Network
Let's start with a simple layer-wise propagation rule (sketched in code below)
$f(H^{(l)}, A) = \sigma(A H^{(l)} W^{(l)})$,
where $W^{(l)} \in \mathbb{R}^{C_l \times C_{l+1}}$ is a weight matrix for the $l$-th neural network layer, $\sigma(\cdot)$ is a non-linear activation function, $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix, $N$ is the number of nodes, and $H^{(l)} \in \mathbb{R}^{N \times C_l}$
https://samidavies.wordpress.com/2016/09/20/whats-up-with-the-graph-laplacian/
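A minimal numpy sketch of this rule (the shapes, the choice of ReLU as $\sigma$, and the random data are illustrative assumptions, not from the slides):

    import numpy as np

    def propagate(H, A, W):
        # f(H, A) = sigma(A H W) with sigma = ReLU: each node sums its
        # neighbours' features, then applies a shared linear map and a
        # non-linearity.
        return np.maximum(A @ H @ W, 0.0)

    rng = np.random.default_rng(0)
    N, C0, C1 = 4, 3, 2                  # 4 nodes, 3 input -> 2 output channels
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    H0 = rng.normal(size=(N, C0))        # H^(0) = X
    W0 = rng.normal(size=(C0, C1))
    H1 = propagate(H0, A, W0)            # H^(1), shape (4, 2)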
8 Graph Convolutional Network
- multiplication with $A$ is not enough: $A$ has a zero diagonal, so in
$f(H^{(l)}, A) = \sigma(A H^{(l)} W^{(l)})$
we are missing the node itself
- we fix it with
$f(H^{(l)}, A) = \sigma(\hat{A} H^{(l)} W^{(l)})$,
where $\hat{A} = A + I$ and $I$ is the identity matrix
9 Graph Convolutional Network
- $\hat{A}$ is typically not normalized, so the multiplication in
$f(H^{(l)}, A) = \sigma(\hat{A} H^{(l)} W^{(l)})$
would change the scale of the features $H^{(l)}$
- we fix that by symmetric normalization, i.e. $\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}$, where $\hat{D}$ is the diagonal node degree matrix of $\hat{A}$, $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$, producing
$f(H^{(l)}, A) = \sigma(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)})$
(see the sketch below)
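A small numpy sketch of the symmetric normalization (same toy graph as in the earlier sketch):

    import numpy as np

    def normalize_adjacency(A):
        # A_hat = A + I, D_hat_ii = sum_j A_hat_ij,
        # returns D_hat^(-1/2) A_hat D_hat^(-1/2).
        A_hat = A + np.eye(A.shape[0])
        d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
        return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    A_norm = normalize_adjacency(A)
    # Each row and column is scaled by 1/sqrt(degree), so repeated
    # multiplication no longer changes the feature scale.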
10 Graph Convolutional Network
- examining a single layer: a single filter parameter $\theta \in \mathbb{R}$ and a single signal $x \in \mathbb{R}^N$ (one scalar feature per node)
11 Graph Convolutional Network
- $\hat{A} = A + I$, $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$ … renormalization trick
12 Graph Convolutional Network
- first-order approximation with two free parameters $\theta'_0, \theta'_1$:
$g_{\theta'} \star x \approx \theta'_0 x + \theta'_1 (L - I_N) x = \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x$
- with a single shared parameter $\theta = \theta'_0 = -\theta'_1$:
$g_\theta \star x \approx \theta (I_N + D^{-1/2} A D^{-1/2}) x$

13 Graph Convolutional Network
- this approximates the spectral graph convolution of a signal $x \in \mathbb{R}^N$ with a filter $g_\theta$:
$g_\theta \star x = U g_\theta U^\top x$
(inverse Fourier transform ← filtering ← Fourier transform),
where $U$ is the matrix of eigenvectors of the normalized graph Laplacian $L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^\top$
- the Chebyshev expansion $g_{\theta'} \star x \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{L})\, x$ uses the rescaled Laplacian $\tilde{L} = \frac{2}{\lambda_{max}} L - I_N$ (spectral view sketched below)
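A numpy sketch of the spectral view (eigendecomposition of the normalized Laplacian; the exponential filter is a made-up illustration, not from the paper):

    import numpy as np

    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(4) - A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # L = U Lambda U^T; the columns of U act as a graph Fourier basis.
    lam, U = np.linalg.eigh(L)

    x = np.array([1.0, -2.0, 0.5, 3.0])     # a scalar signal per node
    g = np.diag(np.exp(-lam))               # illustrative low-pass filter
    filtered = U @ g @ U.T @ x              # inverse FT <- filter <- FT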
14 Graph Convolutional Network
- an efficient graph convolution is obtained by interpreting the multiplication with the (sparse) normalized adjacency as a first-order Chebyshev approximation of convolution in the Fourier domain
- evaluating $\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)}$ then costs $O(|E|\, C_l\, C_{l+1})$, i.e. it is linear in the number of edges,
where $N$ is the number of nodes, $|E|$ is the number of edges, $C_l$ is the number of input channels, and $C_{l+1}$ is the number of output channels (see the sketch below)
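A sketch of why this is linear in the number of edges, using a scipy sparse matrix (sizes and density are made up for illustration):

    import numpy as np
    import scipy.sparse as sp

    N, C_in, C_out = 1000, 16, 8
    rng = np.random.default_rng(0)
    A = sp.random(N, N, density=0.005, format="csr", random_state=0)
    A = A + A.T                          # symmetric: undirected graph
    H = rng.normal(size=(N, C_in))
    W = rng.normal(size=(C_in, C_out))
    # H @ W is dense, O(N * C_in * C_out); A @ (.) touches each stored
    # edge once per output channel, O(|E| * C_out) instead of O(N^2 * C_out).
    out = A @ (H @ W)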
15 Overview
- Kipf and Welling
  - use a first-order approximation in the Fourier domain to obtain efficient linear-time graph-CNNs
  - apply the approximation to the semi-supervised graph node classification problem
16 Semi-supervised Classification Task
▪ given a point set $X = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\}$
▪ and a label set $L = \{1, \ldots, c\}$, where
  - the first $l$ points have labels $y_1, \ldots, y_l \in L$
  - the remaining points are unlabeled
  - $c$ is the number of classes
▪ the goal is to predict the labels of the unlabeled points (toy setup sketched below)
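A toy numpy setup of this task (all sizes are made up for illustration):

    import numpy as np

    n, l, c = 10, 4, 3                 # 10 points, first 4 labelled, 3 classes
    rng = np.random.default_rng(0)
    X = rng.normal(size=(n, 2))        # a 2-D feature per point
    y = rng.integers(0, c, size=l)     # labels y_1, ..., y_l
    labeled = np.zeros(n, dtype=bool)
    labeled[:l] = True                 # the remaining points are to be predicted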
17 Semi-supervised Classification Task
▪ graphically: https://papers.nips.cc/paper/2506-learning-with-local-and-global-consistency.pdf
18 graph-CNN EXAMPLE
▪ example: a two-layer graph-CNN
$Z = f(X, A) = \mathrm{softmax}\big(\hat{A}\, \mathrm{ReLU}(\hat{A} X W^{(0)})\, W^{(1)}\big)$,
where $\hat{A} = \hat{D}^{-1/2} (A + I) \hat{D}^{-1/2}$ is the pre-computed renormalized adjacency, $W^{(0)} \in \mathbb{R}^{C \times H}$ with $C$ input channels and $H$ feature maps, and $W^{(1)} \in \mathbb{R}^{H \times F}$ with $F$ output features per node (a forward-pass sketch follows below)
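A minimal end-to-end numpy sketch of this two-layer model (random weights and a toy graph; a forward pass only, not the authors' implementation):

    import numpy as np

    def normalize(A):
        # Renormalized adjacency: D_hat^(-1/2) (A + I) D_hat^(-1/2).
        A_hat = A + np.eye(A.shape[0])
        d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
        return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable
        return e / e.sum(axis=1, keepdims=True)

    def gcn_forward(X, A_norm, W0, W1):
        # Z = softmax(A_norm ReLU(A_norm X W0) W1)
        H = np.maximum(A_norm @ X @ W0, 0.0)   # hidden feature maps, (N, H)
        return softmax(A_norm @ H @ W1)        # class probabilities, (N, F)

    rng = np.random.default_rng(0)
    N, C, Hdim, F = 4, 3, 5, 2
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    X = rng.normal(size=(N, C))
    W0 = rng.normal(size=(C, Hdim))
    W1 = rng.normal(size=(Hdim, F))
    Z = gcn_forward(X, normalize(A), W0, W1)   # (4, 2), rows sum to 1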
19 Graph Convolutional Network
- graphically: https://arxiv.org/pdf/1609.02907.pdf
20 graph-CNN EXAMPLE
▪ objective function: cross-entropy over all labeled examples
$\mathcal{L} = -\sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf}$,
where $\mathcal{Y}_L$ is the set of node indices that have labels, $Z_{lf}$ is the element in the $l$-th row and $f$-th column of the matrix $Z$, and the ground truth $Y_{lf}$ is 1 iff instance $l$ comes from class $f$ (a sketch follows below)
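A numpy sketch of this masked cross-entropy (toy values; $\mathcal{Y}_L$ is represented as a boolean mask):

    import numpy as np

    def masked_cross_entropy(Z, Y, labeled_mask):
        # L = -sum_{l in Y_L} sum_f Y_lf * ln Z_lf
        # Z: (N, F) predicted class probabilities, Y: (N, F) one-hot labels,
        # labeled_mask: boolean (N,) marking the indices in Y_L.
        log_Z = np.log(Z + 1e-12)            # epsilon guards against log(0)
        per_node = -(Y * log_Z).sum(axis=1)
        return per_node[labeled_mask].sum()

    # Toy usage: 4 nodes, 2 classes, only nodes 0 and 2 are labelled.
    Z = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.5, 0.5]])
    Y = np.array([[1, 0], [0, 0], [0, 1], [0, 0]], dtype=float)
    mask = np.array([True, False, True, False])
    loss = masked_cross_entropy(Z, Y, mask)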
21 graph-CNN EXAMPLE - RESULTS
▪ weights trained with gradient descent
22 graph-CNN EXAMPLE - RESULTS
▪ different variants of propagation models
23 graph-CNN another EXAMPLE
▪ 3-layer GCN, “karate club” problem, one labeled example per class, 300 training iterations
24 Limitations
- memory requirement grows linearly with the size of the dataset
- only works with undirected graphs
- assumption of locality
- assumption of equal importance of self-connections vs. edges to neighboring nodes; this could be relaxed as $\hat{A} = A + \lambda I$, where $\lambda$ is a learnable parameter (see the sketch below)
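A tiny sketch of the relaxed variant ($\lambda$ as a hypothetical scalar that would be learned by gradient descent):

    import numpy as np

    def adjacency_with_tradeoff(A, lam):
        # A_hat = A + lambda * I: lam > 1 weights a node's own features
        # more heavily, lam < 1 favours the neighbourhood.
        return A + lam * np.eye(A.shape[0])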
25 Summary
- Kipf and Welling
  - use a first-order approximation in the Fourier domain to obtain efficient linear-time graph-CNNs
  - apply the approximation to the semi-supervised graph node classification problem
26 Thank you very much for your time…
27 Answers to Questions
$\hat{A} = A + \lambda I$
- The lambda parameter would control the influence of neighbouring edges vs. self-connections.
- How (or why) would the lambda parameter also trade off between supervised and unsupervised learning?