Graph Convolutional Network, Heting Gao, University of Illinois at Urbana-Champaign (PowerPoint PPT Presentation)



  1. Graph Convolutional Network. Heting Gao, University of Illinois at Urbana-Champaign, hgao17@illinois.edu. November 10, 2018.

  2. Overview. Graph Convolution: Preliminary, Graph Fourier Transform, Graph Spectral Filtering, Fast Localized Spectral Filtering, Convolutional Graph Network.

  3. Preliminary. A connected undirected graph is represented as $G = \{V, E, W\}$. $V$ is the set of vertices, with $|V| = N$. $E$ is the set of edges. $W$ is the weighted adjacency matrix: $W_{i,j}$ is the weight of the edge $e = (i, j)$ connecting vertices $i$ and $j$, and $W_{i,j} = 0$ if the edge does not exist. If the weights of the graph are not naturally defined, a common choice is the thresholded Gaussian kernel
$$W_{i,j} = \begin{cases} \exp\left(-\frac{[\mathrm{dist}(i,j)]^2}{2\theta^2}\right) & \text{if } \mathrm{dist}(i,j) \le \kappa \\ 0 & \text{otherwise} \end{cases}$$
for some parameters $\kappa$ and $\theta$. Here $\mathrm{dist}(i,j)$ can be the actual distance on the graph between vertices $i$ and $j$, or the distance between the features of vertices $i$ and $j$. [(IEEE 2012) David Shuman, The Emerging Field of Signal Processing on Graphs]
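
As a concrete illustration, here is a minimal NumPy sketch of this thresholded Gaussian kernel; the feature array `coords` and the values of `kappa` and `theta` are illustrative assumptions, not values from the slides.

```python
import numpy as np

def gaussian_kernel_weights(coords, kappa=1.0, theta=0.5):
    """Build a weighted adjacency matrix W from pairwise feature distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))       # dist(i, j)
    # exp(-dist^2 / (2 theta^2)) where dist <= kappa, 0 otherwise
    W = np.where(dist <= kappa, np.exp(-dist ** 2 / (2 * theta ** 2)), 0.0)
    np.fill_diagonal(W, 0.0)                       # no self-loops
    return W

coords = np.random.rand(6, 2)    # 6 vertices, each with a 2-D feature
W = gaussian_kernel_weights(coords)
assert np.allclose(W, W.T)       # undirected graph: W is symmetric
```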

  4. Preliminary. A signal, or function on the graph, $f : V \to \mathbb{R}$ can be represented as a vector $f \in \mathbb{R}^N$, where $f_i = f(v_i)$ is the function value at vertex $v_i \in V$. The non-normalized graph Laplacian is $L = D - W$, where $W$ is the weight matrix and $D$ is the degree matrix, a diagonal matrix whose diagonal entries are the sums of the incident edge weights: $D_{i,i} = \sum_{j=1}^{N} W_{i,j}$. The graph Laplacian $L$ is a difference operator: for all $f \in \mathbb{R}^N$,
$$(Lf)_i = \sum_{j \in \mathcal{N}_i} W_{i,j}(f_i - f_j)$$
where $\mathcal{N}_i$ denotes the set of neighbor nodes of vertex $i$.
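
A quick numerical sketch of this difference-operator identity, on a small example graph of my own choosing:

```python
import numpy as np

# An illustrative 4-node undirected graph (not from the slides).
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
D = np.diag(W.sum(axis=1))       # degree matrix: D_ii = sum_j W_ij
L = D - W                        # non-normalized graph Laplacian

f = np.random.rand(4)            # a graph signal f in R^N
Lf_matrix = L @ f
# (Lf)_i = sum_j W_ij (f_i - f_j), summing over neighbors of i
Lf_diff = np.array([(W[i] * (f[i] - f)).sum() for i in range(4)])
assert np.allclose(Lf_matrix, Lf_diff)
```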

  5. Laplacian. The 1-D Laplacian operator $\Delta$:
$$f'(t) = \lim_{h \to 0} \frac{f(t+h) - f(t)}{h}$$
$$\Delta f(t) = \frac{\partial^2}{\partial t^2} f(t) = \frac{\partial}{\partial t} f'(t) = \lim_{h \to 0} \frac{f'(t+h) - f'(t)}{h}$$

  6. Laplacian. The 1-D discrete Laplacian operator $\Delta$:
$$f'[n] = f[n+1] - f[n]$$
$$\Delta f[n] = f'[n] - f'[n-1] = (f[n+1] - f[n]) - (f[n] - f[n-1]) = f[n+1] + f[n-1] - 2f[n]$$
The 2-D discrete Laplacian operator $\Delta$:
$$\Delta f[n, m] = f[n+1, m] + f[n-1, m] + f[n, m+1] + f[n, m-1] - 4f[n, m]$$
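
The 1-D stencil can be sanity-checked on $f(t) = t^2$, whose second derivative is the constant 2; a short sketch:

```python
import numpy as np

# Apply f[n+1] + f[n-1] - 2 f[n] to samples of f(t) = t^2 on a grid of
# spacing h; dividing by h^2 recovers f''(t) = 2 at every interior point.
h = 0.1
f = (np.arange(10) * h) ** 2
lap = f[2:] + f[:-2] - 2 * f[1:-1]   # interior points only
assert np.allclose(lap / h ** 2, 2.0)
```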

  7. Laplacian. The graph Laplacian $L$ is a discrete Laplacian operator on graph signals:
$$(Lf)_i = \sum_{j \in \mathcal{N}_i} W_{i,j}(f_i - f_j), \qquad -\Delta f = Lf$$

  8. Fourier Transform. For a given function $f(t)$, its Fourier transform $F$ at a given frequency $\Omega$ is
$$F(\Omega) = \langle f(t), e^{j\Omega t} \rangle = \int_{\mathbb{R}} f(t)\, e^{-j\Omega t}\, dt$$
The Laplacian of the basis $e^{j\Omega t}$ is a multiple of the basis itself:
$$-\Delta e^{j\Omega t} = -\frac{\partial^2}{\partial t^2} e^{j\Omega t} = \Omega^2 e^{j\Omega t}$$
For the graph Fourier transform, we also want to find an analogous set of basis vectors. Let $u \in \mathbb{R}^N$ be a basis vector for the graph transform; we want
$$-\Delta u = Lu = \lambda u$$
This is an eigenvalue decomposition.

  9. Graph Fourier Transform. Let $U = [u_l]_{l=1,\dots,N}$ denote the matrix of eigenvectors of $L$, and let $\Lambda = \mathrm{diag}([\lambda_l]_{l=1,\dots,N})$ denote the diagonal matrix of eigenvalues of $L$. For a given signal $f$, its Fourier transform $F(\lambda_l)$ at the frequency $\lambda_l$ is
$$F(\lambda_l) = \langle f, u_l \rangle = \sum_{i=1}^{N} f_i u^*_{l,i}$$
The inverse Fourier transform is then
$$f_i = \sum_{l=1}^{N} F(\lambda_l)\, u_{l,i}$$
Let $F \in \mathbb{R}^N$ denote the Fourier transform vector of the given graph signal $f \in \mathbb{R}^N$; in matrix form,
$$F = U^T f, \qquad f = UF$$
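
In code, the graph Fourier transform is one eigendecomposition plus two matrix-vector products; a sketch on an illustrative 4-node graph:

```python
import numpy as np

# L is symmetric, so np.linalg.eigh returns real eigenvalues and
# orthonormal eigenvectors, which is exactly the basis U used above.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)       # Lambda = diag(lam), U = [u_1, ..., u_N]

f = np.random.rand(4)
F = U.T @ f                      # forward transform: F = U^T f
assert np.allclose(U @ F, f)     # inverse transform: f = U F
```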

  10. Graph Spectral Filtering. Let $\mathcal{F} : \mathbb{R}^N \to \mathbb{R}^N$ denote the graph Fourier transform and $\mathcal{F}^{-1} : \mathbb{R}^N \to \mathbb{R}^N$ the inverse graph Fourier transform. Let $h \in \mathbb{R}^N$ denote the filter function on the graph, $H \in \mathbb{R}^N$ the Fourier transform of the filter function, and $y \in \mathbb{R}^N$ the function after filtering on the graph. Then
$$y = h * f = \mathcal{F}^{-1}[\mathcal{F}(h) \odot \mathcal{F}(f)] = U[U^T h \odot U^T f] = U[H \odot U^T f] = U \begin{pmatrix} H(\lambda_1) & & \\ & \ddots & \\ & & H(\lambda_N) \end{pmatrix} U^T f$$

  11. Graph Spectral Filtering. Define the spectral filter $H(L)$ as
$$H(L) = U \begin{pmatrix} H(\lambda_1) & & \\ & \ddots & \\ & & H(\lambda_N) \end{pmatrix} U^T$$
The adjustable parameters are $[H(\lambda_l)]_{l=1,2,\dots,N}$. Let $\theta = [H(\lambda_l)]_{l=1,2,\dots,N}$ and $g_\theta(\Lambda) = \mathrm{diag}(\theta)$. We can then define the convolutional layer as
$$y = \sigma(U g_\theta(\Lambda)\, U^T f)$$
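
A sketch of this layer, with $\sigma = \mathrm{ReLU}$ as an assumed (not slide-specified) choice of nonlinearity:

```python
import numpy as np

# theta holds one free coefficient H(lambda_l) per graph frequency.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
_, U = np.linalg.eigh(np.diag(W.sum(axis=1)) - W)

def spectral_layer(U, theta, f):
    y = U @ (np.diag(theta) @ (U.T @ f))   # U g_theta(Lambda) U^T f
    return np.maximum(y, 0.0)              # sigma = ReLU

theta = np.random.randn(4)                 # N parameters, one per eigenvalue
y = spectral_layer(U, theta, np.random.rand(4))
```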

  12. Fast Localized Spectral Filtering. If we define the convolutional layer as $y = \sigma(U g_\theta(\Lambda)\, U^T f)$, there are however three limitations:
- The convolution is not localized: with arbitrary $\theta$, the signal $f$ can be propagated to any other node.
- $\theta \in \mathbb{R}^N$ means that we need $N$ parameters.
- The eigendecomposition has a computational complexity of $O(N^3)$, and every forward propagation has complexity $O(N^2)$.
[(NIPS 2016) Michaël Defferrard, Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering]

  13. Fast Localized Spectral Filtering. We can instead define
$$g_\theta(\Lambda) = \sum_{k=1}^{K} \theta_k \Lambda^k$$
We get the new convolutional layer
$$y = \sigma(U g_\theta(\Lambda)\, U^T f) = \sigma\Big(U \Big(\sum_{k=1}^{K} \theta_k \Lambda^k\Big) U^T f\Big) = \sigma\Big(\sum_{k=1}^{K} \theta_k U \Lambda^k U^T f\Big) = \sigma\Big(\sum_{k=1}^{K} \theta_k L^k f\Big)$$
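
The key identity in this derivation is $U \Lambda^k U^T = L^k$; a numerical sketch confirming that the spectral and direct computations agree:

```python
import numpy as np

# Evaluate the polynomial filter two ways on an illustrative 4-node graph:
# in the spectral domain via U Lambda^k U^T, and directly via powers of L
# (the latter needs no eigendecomposition). K and theta are arbitrary.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)

K = 3
theta = np.random.randn(K)        # theta_1, ..., theta_K
f = np.random.rand(4)
y_spectral = sum(theta[k - 1] * (U @ np.diag(lam ** k) @ U.T @ f)
                 for k in range(1, K + 1))
y_direct = sum(theta[k - 1] * (np.linalg.matrix_power(L, k) @ f)
               for k in range(1, K + 1))
assert np.allclose(y_spectral, y_direct)
```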

  14. Fast Localized Spectral Filtering. The new definition of a convolutional layer is
$$y = \sigma\Big(\sum_{k=1}^{K} \theta_k L^k f\Big)$$
It has three advantages:
- The convolution is localized, and exactly $K$-hop localized, since we use at most the $K$-th power of $L$.
- We need only $K$ parameters.
- We do not need to decompose $L$, and the forward propagation can be approximated using Chebyshev polynomials (I do not fully understand this part, but I will still try to describe the steps from the paper).

  15. Chebyshev Polynomial. Chebyshev polynomial expansion:
$$T_0(y) = 1, \quad T_1(y) = y, \quad T_k(y) = 2y\, T_{k-1}(y) - T_{k-2}(y)$$
These polynomials form an orthogonal basis for $L^2\big([-1,1], \frac{dy}{\sqrt{1-y^2}}\big)$, the Hilbert space of square-integrable functions with respect to the measure $\frac{dy}{\sqrt{1-y^2}}$:
$$\int_{-1}^{1} \frac{T_l(y)\, T_m(y)}{\sqrt{1-y^2}}\, dy = \begin{cases} \delta_{l,m}\, \pi/2 & m, l > 0 \\ \pi & m = l = 0 \end{cases}$$
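
The orthogonality relation can be checked numerically using the substitution $y = \cos\varphi$, under which $T_k(\cos\varphi) = \cos(k\varphi)$; a sketch:

```python
import numpy as np

# After substituting y = cos(phi), the weighted integral becomes
# int_0^pi cos(l phi) cos(m phi) d phi, evaluated here by trapezoid rule.
phi = np.linspace(0.0, np.pi, 100001)

def inner(l, m):
    vals = np.cos(l * phi) * np.cos(m * phi)
    return np.sum((vals[1:] + vals[:-1]) / 2) * (phi[1] - phi[0])

assert np.isclose(inner(0, 0), np.pi)          # m = l = 0 case
assert np.isclose(inner(2, 2), np.pi / 2)      # m = l > 0 case
assert np.isclose(inner(1, 2), 0.0, atol=1e-6) # orthogonality
```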

  16. Chebyshev Polynomial. In particular, every $h \in L^2\big([-1,1], \frac{dy}{\sqrt{1-y^2}}\big)$ has the following Chebyshev polynomial expansion:
$$h(y) = \frac{1}{2} c_0 + \sum_{k=1}^{\infty} c_k T_k(y)$$
Since $\lambda \in [0, \lambda_{\max}]$, plug in $y = U\big(\frac{2\Lambda}{\lambda_{\max}} - I\big)U^T = \frac{2L}{\lambda_{\max}} - I$:
$$g_\theta(L) = \sum_{k=1}^{K} \theta_k L^k = \frac{1}{2} c_0 + \sum_{k=1}^{\infty} c_k T_k(y)$$
where $T_k(y)$ can be computed recursively as $T_k(y) = 2y\, T_{k-1}(y) - T_{k-2}(y)$.

  17. Chebyshev Polynomial. Let $\bar{f}_k = T_k(y) f$; it can be computed recursively as
$$\bar{f}_k = T_k(y) f = 2y\, T_{k-1}(y) f - T_{k-2}(y) f = 2y \bar{f}_{k-1} - \bar{f}_{k-2} = 2\Big(\frac{2L}{\lambda_{\max}} - I\Big) \bar{f}_{k-1} - \bar{f}_{k-2}$$
The approximated convolutional layer is then
$$y = \sigma(g_\theta(L) f) = \sigma\Big(\sum_{k=0}^{K} \theta_k \bar{f}_k\Big)$$
with $\bar{f}_0 = f$ and $\bar{f}_1 = y f = \big(\frac{2L}{\lambda_{\max}} - I\big) f$.
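
A sketch of this recursion; note that it evaluates $g_\theta(L) f$ with one matrix-vector multiplication per order and no eigendecomposition ($\sigma$ would be applied on top):

```python
import numpy as np

# Each f_bar_k needs only one multiply by (2L/lambda_max - I), so the
# filter cost is K sparse multiplies instead of an O(N^3) decomposition.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W
lam_max = np.linalg.eigvalsh(L).max()   # in practice an upper bound suffices

def cheb_filter(L, f, theta, lam_max):
    L_hat = (2.0 / lam_max) * L - np.eye(L.shape[0])  # 2L/lambda_max - I
    f_bars = [f, L_hat @ f]                           # f_bar_0, f_bar_1
    for _ in range(2, len(theta)):
        f_bars.append(2 * (L_hat @ f_bars[-1]) - f_bars[-2])
    return sum(t * fb for t, fb in zip(theta, f_bars))

theta = np.random.randn(4)              # K = 3: theta_0, ..., theta_3
g_f = cheb_filter(L, np.random.rand(4), theta, lam_max)
```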

  18. Convolutional Graph Network. Instead of using a $K$-hop localized filter, set $K = 1$, but stack multiple layers. Use the symmetric normalized Laplacian $L^{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$. (Kipf's paper uses $A$ to represent the weight matrix; I will stick to $W$ to be consistent in this presentation.) The entries of $L^{sym}$ are
$$L^{sym}_{i,j} = \begin{cases} 1 & i = j \\ -\frac{1}{\sqrt{d_i d_j}} & i \ne j \text{ and vertices } i \text{ and } j \text{ are connected} \\ 0 & \text{otherwise} \end{cases}$$
This is equivalent to
$$(L^{sym} f)_i = \frac{1}{\sqrt{d_i}} \sum_{j \in \mathcal{N}_i} W_{i,j} \left(\frac{f_i}{\sqrt{d_i}} - \frac{f_j}{\sqrt{d_j}}\right)$$
The eigenvalues $[\lambda_l]_{l=1,2,\dots,N}$ of $L^{sym}$ lie in the range $[0, 2]$. [(ICLR 2017) Thomas N. Kipf, Semi-Supervised Classification with Graph Convolutional Networks]
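
A sketch that builds $L^{sym}$ for a small illustrative graph and confirms its eigenvalues fall in $[0, 2]$:

```python
import numpy as np

# L_sym = I - D^{-1/2} W D^{-1/2}. Assumes every vertex has at least one
# incident edge, so that D is invertible.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(4) - D_inv_sqrt @ W @ D_inv_sqrt

eigs = np.linalg.eigvalsh(L_sym)
assert eigs.min() >= -1e-9 and eigs.max() <= 2.0 + 1e-9
```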

  19. Convolutional Graph Network. Then the convolutional layer can be approximated as
$$y = \sigma(g_\theta(L) f) \approx \sigma(\theta_0 \bar{f}_0 + \theta_1 \bar{f}_1) = \sigma\Big(\theta_0 f + \theta_1 \Big(\frac{2 L^{sym}}{\lambda_{\max}} - I\Big) f\Big)$$
Assuming $\lambda_{\max} = 2$,
$$y = \sigma\big[(\theta_0 - \theta_1) f + \theta_1 L^{sym} f\big] = \sigma\big[(\theta_0 - \theta_1) f + \theta_1 (I - D^{-1/2} W D^{-1/2}) f\big] = \sigma\big(\theta_0 f - \theta_1 D^{-1/2} W D^{-1/2} f\big)$$
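
The last two steps are pure algebra; a quick numerical check with arbitrary $\theta_0$, $\theta_1$:

```python
import numpy as np

# Verify (theta0 - theta1) f + theta1 (I - S) f == theta0 f - theta1 S f,
# with S = D^{-1/2} W D^{-1/2}, on an illustrative 4-node graph.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
S = D_inv_sqrt @ W @ D_inv_sqrt

theta0, theta1 = 0.7, -0.3
f = np.random.rand(4)
lhs = (theta0 - theta1) * f + theta1 * ((np.eye(4) - S) @ f)
rhs = theta0 * f - theta1 * (S @ f)
assert np.allclose(lhs, rhs)
```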

  20. Convolutional Graph Network. The approximated output layer is
$$y = \sigma\big(\theta_0 f - \theta_1 D^{-1/2} W D^{-1/2} f\big)$$
The number of parameters is further reduced to 1 in the paper by assuming $\theta = \theta_0 = -\theta_1$:
$$y = \sigma\big(\theta (I + D^{-1/2} W D^{-1/2}) f\big)$$
The matrix $I + D^{-1/2} W D^{-1/2}$ has eigenvalues $\lambda \in [0, 2]$. Repeated application of this matrix can result in numerical instability. Renormalize $I + D^{-1/2} W D^{-1/2}$ to $\tilde{D}^{-1/2} \tilde{W} \tilde{D}^{-1/2}$, where
$$\tilde{W} = W + I, \qquad \tilde{D}_{i,i} = \sum_j \tilde{W}_{i,j}$$
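
A sketch of the resulting layer with the renormalization trick. The slides use a single scalar $\theta$ and a one-channel signal $f$; the multi-channel form below, with a weight matrix $\Theta$, follows Kipf's paper rather than the slide, and ReLU is an assumed choice for $\sigma$:

```python
import numpy as np

def gcn_layer(W, X, Theta):
    """X: (N, C_in) node features, Theta: (C_in, C_out) layer weights."""
    W_tilde = W + np.eye(W.shape[0])               # W~ = W + I (self-loops)
    d_tilde = W_tilde.sum(axis=1)                  # D~_ii = sum_j W~_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))
    W_hat = D_inv_sqrt @ W_tilde @ D_inv_sqrt      # D~^{-1/2} W~ D~^{-1/2}
    return np.maximum(W_hat @ X @ Theta, 0.0)      # sigma = ReLU

W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])                   # illustrative graph
X = np.random.rand(4, 8)                           # 8 input features per node
H = gcn_layer(W, X, np.random.randn(8, 4))         # 4 output features per node
```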
