Kipf, T., Welling, M.: Semi-Supervised Classification with Graph Convolutional Networks


  1. Kipf, T., Welling, M.: Semi-Supervised Classification with Graph Convolutional Networks. Radim Špetlík, Czech Technical University in Prague

  2. 2 Overview - Kipf and Welling - use a first-order approximation in the Fourier domain to obtain an efficient, linear-time graph-CNN - apply the approximation to the semi-supervised graph node classification problem

  3. 3 Graph Adjacency Matrix $A$ - symmetric, square matrix - $A_{jk} = 1$ iff vertices $v_j$ and $v_k$ are adjacent (joined by an edge) - $A_{jk} = 0$ otherwise http://mathworld.wolfram.com/AdjacencyMatrix.html
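A toy illustration (mine, not from the slides): building such a matrix for a small made-up undirected graph in numpy.

    import numpy as np

    # Hypothetical 4-node undirected graph with edges (0,1), (0,2), (1,2), (2,3)
    edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
    N = 4

    A = np.zeros((N, N))
    for j, k in edges:
        A[j, k] = 1  # A_jk = 1 iff nodes j and k are adjacent
        A[k, j] = 1  # and symmetrically, since the graph is undirected

    assert (A == A.T).all()  # A is symmetric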

  4. 4 Graph Convolutional Network - given a graph $G = (V, E)$, a graph-CNN is a function which: - takes as input: - a feature description $x_j \in \mathbb{R}^C$ for every node $j$; summarized as $X \in \mathbb{R}^{N \times C}$, where $N$ is the number of nodes and $C$ is the number of input features - a description of the graph structure in matrix form, typically an adjacency matrix $A$ - produces: - a node-level output $Z \in \mathbb{R}^{N \times F}$, where $F$ is the number of output features per node

  5. 5 Graph Convolutional Network - is composed of non-linear functions $H^{(l+1)} = f(H^{(l)}, A)$, where $H^{(0)} = X$, $H^{(L)} = Z$, and $L$ is the number of layers.

  6. 6 Graph Convolutional Network - graphically: https://tkipf.github.io/graph-convolutional-networks/

  7. 7 Graph Convolutional Network Let's start with a simple layer-wise propagation rule $f(H^{(l)}, A) = \sigma(A H^{(l)} W^{(l)})$, where $W^{(l)} \in \mathbb{R}^{C_l \times C_{l+1}}$ is a weight matrix for the $l$-th neural network layer, $\sigma(\cdot)$ is a non-linear activation function, $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix, $N$ is the number of nodes, and $H^{(l)} \in \mathbb{R}^{N \times C_l}$ https://samidavies.wordpress.com/2016/09/20/whats-up-with-the-graph-laplacian/
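A minimal numpy sketch of this naive rule (my own illustration with random placeholder data; shapes follow the slide's definitions):

    import numpy as np

    def gcn_layer_naive(H, A, W, sigma=np.tanh):
        # f(H, A) = sigma(A H W); tanh stands in for any non-linearity
        return sigma(A @ H @ W)

    N, C0, C1 = 4, 3, 2               # nodes, input channels, output channels
    A = np.triu(np.random.randint(0, 2, (N, N)), 1)
    A = A + A.T                       # random symmetric adjacency, no self-loops
    H0 = np.random.randn(N, C0)       # H^(0) = X
    W0 = np.random.randn(C0, C1)
    H1 = gcn_layer_naive(H0, A, W0)   # H^(1) has shape (N, C1)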

  8. 8 Graph Convolutional Network multiplication with $A$ alone is not enough, we're missing the node itself in $f(H^{(l)}, A) = \sigma(A H^{(l)} W^{(l)})$; we fix it by $f(H^{(l)}, A) = \sigma(\hat{A} H^{(l)} W^{(l)})$, where $\hat{A} = A + I$ and $I$ is the identity matrix

  9. 9 Graph Convolutional Network $\hat{A}$ is typically not normalized; the multiplication $f(H^{(l)}, A) = \sigma(\hat{A} H^{(l)} W^{(l)})$ would change the scale of the features $H^{(l)}$; we fix that by symmetric normalization, i.e. $\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}}$, where $\hat{D}$ is the diagonal node degree matrix of $\hat{A}$, $\hat{D}_{jj} = \sum_k \hat{A}_{jk}$, producing $f(H^{(l)}, A) = \sigma(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} H^{(l)} W^{(l)})$
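Continuing the sketch, self-loops plus the symmetric normalization look like this in numpy (my own illustration, not the authors' reference code):

    import numpy as np

    def normalize_adjacency(A):
        # A_hat = A + I, then D_hat^{-1/2} A_hat D_hat^{-1/2}
        A_hat = A + np.eye(A.shape[0])
        d = A_hat.sum(axis=1)          # D_hat_jj = sum_k A_hat_jk (always >= 1)
        D_inv_sqrt = np.diag(d ** -0.5)
        return D_inv_sqrt @ A_hat @ D_inv_sqrt

    def gcn_layer(H, A_norm, W, sigma=np.tanh):
        # f(H, A) = sigma(D_hat^{-1/2} A_hat D_hat^{-1/2} H W)
        return sigma(A_norm @ H @ W)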

  10. 10 Graph Convolutional Network Examining a single layer with a single filter parameter $\theta \in \mathbb{R}$ and a graph signal $x \in \mathbb{R}^N$, i.e. one scalar feature per node

  11. 11 Graph Convolutional Network $\tilde{A} = A + I$, $\tilde{D}_{jj} = \sum_k \tilde{A}_{jk}$ … renormalization trick

  12. 12 Graph Convolutional Network $g_{\theta'} \star x \approx \theta'_0 x + \theta'_1 (L - I) x$, constrained to a single filter parameter $\theta = \theta'_0 = -\theta'_1$

  13. 13 Graph Convolutional Network $g_{\theta'} \star x \approx \theta'_0 x + \theta'_1 (L - I) x$, with the rescaled Laplacian $\tilde{L} = \frac{2}{\lambda_{\max}} L - I$, $\lambda_{\max} \in \mathbb{R}$, approximating the spectral convolution $g_\theta \star x = U g_\theta U^\top x$ (inverse Fourier transform - filtering - Fourier transform)
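For contrast, the exact spectral filtering that this approximates can be written in a few lines of numpy (my own sketch; it assumes every node has at least one edge, and g_theta is any function applied to the Laplacian eigenvalues):

    import numpy as np

    def spectral_filter(x, A, g_theta):
        # Normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}
        d = A.sum(axis=1)
        D_inv_sqrt = np.diag(d ** -0.5)
        L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt
        lam, U = np.linalg.eigh(L)   # eigendecomposition: O(N^3), the bottleneck
        # g_theta * x = U g_theta(Lambda) U^T x:
        # Fourier transform (U.T @ x), filtering, inverse transform (U @ ...)
        return U @ (g_theta(lam) * (U.T @ x))

The eigendecomposition is exactly what makes the exact version expensive; the first-order approximation on these slides avoids it.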

  14. 14 Graph Convolutional Network An efficient graph convolution approximation is obtained by interpreting this multiplication as a first-order Chebyshev-polynomial approximation of convolution in the Fourier domain; the filtering then costs $\mathcal{O}(|E|\, C_l C_{l+1})$, i.e. linear in the number of edges, where $N$ is the number of nodes, $|E|$ is the number of edges, $C_l$ is the number of input channels, and $C_{l+1}$ is the number of output channels.

  15. 15 Overview - Kipf and Welling - use a first-order approximation in the Fourier domain to obtain an efficient, linear-time graph-CNN - apply the approximation to the semi-supervised graph node classification problem

  16. 16 Semi-supervised Classification Task ▪ given a point set $X = \{x_1, \ldots, x_m, x_{m+1}, \ldots, x_n\}$ ▪ and a label set $L = \{1, \ldots, c\}$, where - the first $m$ points have labels $y_1, \ldots, y_m \in L$ - the remaining points are unlabeled - $c$ is the number of classes ▪ the goal is to - predict the labels of the unlabeled points

  17. 17 Semi-supervised Classification Task ▪ graphically: https://papers.nips.cc/paper/2506-learning-with-local-and-global-consistency.pdf

  18. 18 graph-CNN EXAMPLE ▪ example: - two-layer graph-CNN $Z = f(X, A) = \mathrm{softmax}(\hat{A}\, \mathrm{ReLU}(\hat{A} X W^{(0)})\, W^{(1)})$, where $W^{(0)} \in \mathbb{R}^{C \times H}$ with $C$ input channels and $H$ feature maps, and $W^{(1)} \in \mathbb{R}^{H \times F}$ with $F$ output features per node
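The whole forward pass fits in a few lines of numpy (a sketch continuing the earlier snippets; normalize_adjacency is the helper defined above, and the weights here are random placeholders, not trained values):

    import numpy as np

    def softmax(Z):
        e = np.exp(Z - Z.max(axis=1, keepdims=True))  # numerically stable
        return e / e.sum(axis=1, keepdims=True)

    def gcn_forward(X, A_norm, W0, W1):
        # Z = softmax(A_hat ReLU(A_hat X W0) W1), A_norm = D^{-1/2}(A+I)D^{-1/2}
        relu = lambda M: np.maximum(M, 0)
        return softmax(A_norm @ relu(A_norm @ X @ W0) @ W1)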

  19. 19 Graph Convolutional Network - graphically: https://arxiv.org/pdf/1609.02907.pdf

  20. 20 graph-CNN EXAMPLE ▪ objective function: - cross-entropy $\mathcal{L} = -\sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf}$, where $\mathcal{Y}_L$ is the set of node indices that have labels, $Z_{lf}$ is the element in the $l$-th row, $f$-th column of matrix $Z$, and the ground truth $Y_{lf}$ is 1 iff instance $l$ comes from class $f$.
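With a boolean mask over nodes, the loss is essentially one line of numpy (illustrative sketch; Z and Y are as defined on the slide):

    import numpy as np

    def masked_cross_entropy(Z, Y, labeled_mask):
        # L = -sum_{l in Y_L} sum_f Y_lf ln Z_lf
        # Z: (N, F) predicted probabilities, Y: (N, F) one-hot ground truth,
        # labeled_mask: (N,) boolean, True for nodes whose labels are known
        eps = 1e-12  # guard against log(0)
        return -(Y[labeled_mask] * np.log(Z[labeled_mask] + eps)).sum()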

  21. 21 graph-CNN EXAMPLE - RESULTS ▪ weights trained with gradient descent

  22. 22 graph-CNN EXAMPLE - RESULTS ▪ different variants of propagation models

  23. 23 graph-CNN another EXAMPLE ▪ 3-layer GCN, "karate club" problem, one labeled example per class: 300 training iterations

  24. 24 Limitations - memory grows linearly with the size of the data - only works with undirected graphs - assumption of locality - assumption of equal importance of self-connections vs. edges to neighboring nodes; a proposed remedy is $\tilde{A} = A + \lambda I$, where $\lambda$ is a learnable parameter.
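The remedy can be sketched as a λ-parameterized variant of the earlier normalization helper (hypothetical illustration; in practice λ would be learned jointly with the weights by gradient descent):

    import numpy as np

    def normalize_adjacency_lam(A, lam):
        # A_tilde = A + lambda * I; lambda > 0 trades off self-connections
        # against edges to neighboring nodes
        A_tilde = A + lam * np.eye(A.shape[0])
        d = A_tilde.sum(axis=1)                 # degree of A_tilde
        D_inv_sqrt = np.diag(d ** -0.5)
        return D_inv_sqrt @ A_tilde @ D_inv_sqrt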

  25. 25 Summary - Kipf and Welling - use a first-order approximation in the Fourier domain to obtain an efficient, linear-time graph-CNN - apply the approximation to the semi-supervised graph node classification problem

  26. 26 Thank you very much for your time…

  27. 27 Answers to Questions $\tilde{A} = A + \lambda I_N$ - The lambda parameter would control the influence of neighbouring edges vs. self-connections. - How (or why) would the lambda parameter also trade off between supervised and unsupervised learning?
