

  1. CANDIDACY EXAM, Electrical Engineering Doctoral Program (EDEE)
  Deep Learning on Graphs for Advanced Big Data Analysis
  Student: Michaël Defferrard. Supervisor: Xavier Bresson. Advisor: Pierre Vandergheynst.
  EPFL LTS2 Laboratory, August 30, 2016

  2. Introduction
  ◮ Objective: analyze and extract information for decision-making from large-scale and high-dimensional datasets
  ◮ Method: Deep Learning (DL), especially Convolutional Neural Networks (CNNs), on graphs
  ◮ Fields: Deep Learning and Graph Signal Processing (GSP)

  3. Motivation
  ◮ Important and growing class of data lies on irregular domains
    ◮ Natural graphs / networks
    ◮ Constructed (feature / data) graphs
  ◮ Modeling versatility: graphs model heterogeneous pairwise relationships
  ◮ Important problem: recent works, high demand
  ◮ Reproduce the breakthrough of DL beyond Computer Vision!

  4. Problem
  Formulate DL components on graphs (and discover alternatives)
  Convolutional Neural Networks (CNNs):
  ◮ Localization: compact filters for low complexity
  ◮ Stationarity: translation invariance
  ◮ Compositionality: analysis with a filterbank
  Challenges:
  ◮ Generalize convolution, downsampling and pooling to graphs
  ◮ Evaluate the assumptions on graph signals

  5. Local Receptive Fields (Gregor and LeCun 2010; Coates and Ng 2011; Bruna et al. 2013)
  ◮ Group features based upon similarity
  ◮ Reduce the number of learned parameters
  ◮ Can use the graph adjacency matrix
  ◮ No weight-sharing / convolution / stationarity

  6. Spatial Approaches to Convolution on Graphs (Niepert, Ahmed, and Kutzkov 2016; Vialatte, Gripon, and Mercier 2016)
  1. Define receptive field / neighborhood
  2. Order nodes

  7. Geodesic CNNs on Riemannian Manifolds (Masci et al. 2015)
  ◮ Generalization of CNNs to non-Euclidean manifolds
  ◮ Local geodesic system of polar coordinates to extract patches
  ◮ Tailored for geometry analysis and processing

  8. Graph Neural Networks (GNNs) (Scarselli et al. 2009)
  ◮ Recurrent Neural Networks (RNNs) on graphs
  ◮ Propagate node representations until convergence
  ◮ Representations used as features

  9. Diffusion-Convolutional Neural Networks (DCNNs) (Atwood and Towsley 2015)
  ◮ Multiplication with powers (0 to H) of the transition matrix
  ◮ Diffused features multiplied by a weight vector of support H
  ◮ No pooling; followed by a fully connected layer
  ◮ Tasks: node classification, graph classification, edge classification
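As a rough, hedged illustration of the diffusion-convolution operation summarized above (the function name `diffusion_conv` and the toy shapes are mine, not from Atwood and Towsley's paper), here is a numpy sketch that diffuses node features with powers 0 to H of the row-normalized transition matrix and applies per-hop, per-feature weights:

```python
import numpy as np

def diffusion_conv(W, X, weights):
    """W: (n, n) weighted adjacency; X: (n, F) node features;
    weights: (H + 1, F) per-hop, per-feature weights shared across nodes.
    Returns Z: (n, H + 1, F) diffused activations (nonlinearity omitted)."""
    P = W / W.sum(axis=1, keepdims=True)   # row-normalized transition matrix
    H = weights.shape[0] - 1
    diffused, Pk = [], np.eye(W.shape[0])  # P^0 = I
    for _ in range(H + 1):
        diffused.append(Pk @ X)            # features diffused over k hops
        Pk = Pk @ P
    Z = np.stack(diffused, axis=1)         # (n, H + 1, F)
    return Z * weights[np.newaxis]         # elementwise weighting, no pooling
```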

  10. Spectral Networks on Graphs (Bruna et al. 2013; Henaff, Bruna, and LeCun 2015)
  ◮ First spectral definition
  ◮ Introduced a supervised graph estimation strategy
  ◮ Experiments on image recognition, text categorization and bioinformatics
  ◮ Spline filter parametrization
  ◮ Agglomerative method for coarsening

  11. Further Work
  Build on (Bruna et al. 2013) and (Henaff, Bruna, and LeCun 2015):
  ◮ Spectral formulation
  ◮ Computational complexity
  ◮ Localization
  ◮ Ad hoc coarsening & pooling

  12. Performed Research
  Proposed an efficient spectral generalization of CNNs to graphs.
  Main contributions:
  1. Spectral formulation
  2. Strictly localized filters
  3. Low computational complexity
  4. Efficient pooling
  5. Experimental results

  13. Paper
  “Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering” (Defferrard, Bresson, and Vandergheynst 2016)
  ◮ Accepted for publication at NIPS 2016
  ◮ Presented by Xavier at SUTD and University of Bergen
  Peer reviews:
  ◮ “extend ... data driven, end-to-end learning with excellent learning complexity”
  ◮ “very clean, efficient parametrization [for] efficient learning and evaluation”
  ◮ “highly promising paper ... shows how to efficiently generalize the [convolution]”
  ◮ “the potential for significant impact is high”
  ◮ “new and upcoming area with only a few recent works”

  14. Definitions (Chung 1997)
  ◮ G = (V, E, W): undirected and connected graph
  ◮ W ∈ R^{n×n}: weighted adjacency matrix
  ◮ D_{ii} = Σ_j W_{ij}: diagonal degree matrix
  ◮ x: V → R, x ∈ R^n: graph signal
  ◮ L = D − W ∈ R^{n×n}: combinatorial graph Laplacian
  ◮ L = I_n − D^{−1/2} W D^{−1/2}: normalized graph Laplacian
  ◮ L = U Λ U^T with U = [u_0, ..., u_{n−1}] ∈ R^{n×n}: graph Fourier basis
  ◮ x̂ = U^T x ∈ R^n: graph Fourier transform
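To make these definitions concrete, here is a minimal numpy sketch on a toy graph (the helper name `laplacians` and the 4-node adjacency matrix are illustrative only):

```python
import numpy as np

def laplacians(W):
    """Combinatorial and normalized Laplacians of a weighted adjacency matrix W."""
    d = W.sum(axis=1)                       # degrees D_ii = sum_j W_ij
    D = np.diag(d)
    L_comb = D - W                          # L = D - W
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # valid since the graph is connected (d > 0)
    L_norm = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt  # L = I_n - D^{-1/2} W D^{-1/2}
    return L_comb, L_norm

# Toy undirected, connected graph on n = 4 nodes.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

_, L = laplacians(W)
lam, U = np.linalg.eigh(L)   # L = U diag(lam) U^T: graph Fourier basis
x = np.random.randn(4)       # a graph signal x in R^n
x_hat = U.T @ x              # graph Fourier transform
x_back = U @ x_hat           # inverse transform recovers x
```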

  15. Spectral Filtering of Graph Signals
  y = g_θ(L) x = g_θ(U Λ U^T) x = U g_θ(Λ) U^T x
  Non-parametric filter: g_θ(Λ) = diag(θ)
  ◮ Non-localized in vertex domain
  ◮ Learning complexity in O(n)
  ◮ Computational complexity in O(n^2) (and memory)
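A minimal sketch of this non-parametric spectral filtering, reusing the toy Laplacian `L` and signal `x` from the sketch above; the coefficients `theta` are free parameters, here random purely for illustration:

```python
import numpy as np

lam, U = np.linalg.eigh(L)            # graph Fourier basis of the Laplacian above
theta = np.random.randn(L.shape[0])   # non-parametric filter: one coefficient per eigenvalue
y = U @ (np.diag(theta) @ (U.T @ x))  # y = U g_theta(Lambda) U^T x; O(n^2) since U is dense
```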

  16. Polynomial Parametrization for Localized Filters
  g_θ(Λ) = Σ_{k=0}^{K−1} θ_k Λ^k
  ◮ Value at j of g_θ centered at i: (g_θ(L) δ_i)_j = (g_θ(L))_{i,j} = Σ_k θ_k (L^k)_{i,j}
  ◮ d_G(i,j) > K implies (L^K)_{i,j} = 0 (Hammond, Vandergheynst, and Gribonval 2011, Lemma 5.2)
  ◮ K-localized
  ◮ Learning complexity in O(K)
  ◮ Computational complexity in O(n^2)
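The localization property can be checked numerically. A small sketch on the same toy graph as above (the helper name `poly_filter` is mine): with two coefficients, i.e. powers L^0 and L^1, the filtered delta centered at node 0 vanishes at node 3, which is two hops away.

```python
import numpy as np

def poly_filter(L, theta):
    """g_theta(L) = sum_k theta_k L^k; supported within len(theta) - 1 hops."""
    g, Lk = np.zeros_like(L), np.eye(L.shape[0])
    for t in theta:
        g += t * Lk
        Lk = Lk @ L        # next power of the Laplacian
    return g

g = poly_filter(L, np.random.randn(2))   # powers L^0 and L^1 only
delta = np.zeros(L.shape[0]); delta[0] = 1.0
print(g @ delta)                         # entry for node 3 (two hops from node 0) is 0
```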

  17. Recursive Formulation for Fast Filtering
  g_θ(Λ) = Σ_{k=0}^{K−1} θ_k T_k(Λ̃), with Λ̃ = 2Λ/λ_max − I_n
  ◮ Chebyshev polynomials: T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x) with T_0 = 1 and T_1 = x
  ◮ Filtering: y = g_θ(L) x = Σ_{k=0}^{K−1} θ_k T_k(L̃) x
  ◮ Recurrence: y = g_θ(L) x = [x̄_0, ..., x̄_{K−1}] θ, where x̄_k = T_k(L̃) x = 2 L̃ x̄_{k−1} − x̄_{k−2}, with x̄_0 = x and x̄_1 = L̃ x
  ◮ K-localized
  ◮ Learning complexity in O(K)
  ◮ Computational complexity in O(K |E|)
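A minimal sparse sketch of this recurrence (scipy), reusing the toy Laplacian `L` and signal `x` from the sketches above; `lmax` is the largest eigenvalue of `L`, and the function name `chebyshev_filter` is illustrative:

```python
import numpy as np
import scipy.sparse as sp

def chebyshev_filter(L, x, theta, lmax):
    """Filter x with g_theta(L) = sum_{k=0}^{K-1} theta_k T_k(L_tilde), K = len(theta).
    L is a scipy.sparse Laplacian; each step costs one sparse mat-vec, O(|E|)."""
    n = L.shape[0]
    L_tilde = sp.csr_matrix(2.0 / lmax * L - sp.identity(n))    # rescale spectrum to [-1, 1]
    x_bars = [x, L_tilde @ x]                                    # x_bar_0 = x, x_bar_1 = L_tilde x
    for _ in range(2, len(theta)):
        x_bars.append(2 * (L_tilde @ x_bars[-1]) - x_bars[-2])   # Chebyshev recurrence
    return np.stack(x_bars[:len(theta)], axis=1) @ theta         # y = [x_bar_0 ... x_bar_{K-1}] theta

# Example with K = 3 coefficients on the toy graph defined earlier.
L_sp = sp.csr_matrix(L)
lmax = float(np.linalg.eigvalsh(L).max())
y = chebyshev_filter(L_sp, x, np.array([0.5, -0.2, 0.1]), lmax)
```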

  18. Learning Filters
  y_{s,j} = Σ_{i=1}^{F_in} g_{θ_{i,j}}(L) x_{s,i} ∈ R^n
  ◮ x_{s,i}: feature map i of sample s
  ◮ θ_{i,j}: trainable parameters (F_in × F_out vectors of Chebyshev coefficients)
  Gradients for backpropagation:
  ◮ ∂E/∂θ_{i,j} = Σ_{s=1}^{S} [x̄_{s,i,0}, ..., x̄_{s,i,K−1}]^T ∂E/∂y_{s,j}
  ◮ ∂E/∂x_{s,i} = Σ_{j=1}^{F_out} g_{θ_{i,j}}(L) ∂E/∂y_{s,j}
  Overall cost of O(K |E| F_in F_out S) operations
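Combining the filter above into one layer, a hedged forward-pass sketch that maps F_in input feature maps of a single sample to F_out output feature maps (the gradients listed above would be handled by automatic differentiation in practice; bias and nonlinearity are omitted):

```python
import numpy as np

def graph_conv_layer(L, X, Theta, lmax):
    """X: (n, F_in) feature maps of one sample; Theta: (F_in, F_out, K) Chebyshev coefficients.
    Returns Y: (n, F_out) with y_j = sum_i g_{theta_{i,j}}(L) x_i.
    Cost per sample: O(K |E| F_in F_out), matching the slide."""
    n, F_in = X.shape
    _, F_out, K = Theta.shape
    Y = np.zeros((n, F_out))
    for i in range(F_in):
        for j in range(F_out):
            Y[:, j] += chebyshev_filter(L, X[:, i], Theta[i, j], lmax)
    return Y
```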

  19. Coarsening & Pooling
  [Figure: multilevel graph coarsening, with nodes reordered so that pooling a graph signal reduces to pooling a regular 1D signal]
  ◮ Coarsening: Graclus / Metis
    ◮ Normalized cut minimization
  ◮ Pooling: as regular 1D signals
  ◮ Suits parallel architectures like GPUs
  ◮ Activation: ReLU (or tanh, sigmoid)
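A minimal sketch of the pooling step, assuming (as the slide does) that coarsening has already reordered the nodes, with fake nodes added where needed, so that the two children of each coarse node sit next to each other; graph max-pooling then reduces to regular 1D max-pooling of size 2:

```python
import numpy as np

def graph_max_pool_1d(x):
    """x: (n,) signal on a coarsened-and-reordered graph, n even (padded with fake nodes).
    Pools consecutive pairs, exactly like regular 1D max-pooling of size 2."""
    return x.reshape(-1, 2).max(axis=1)

x = np.array([3.0, -np.inf, 4.0, 1.0, 5.0, 9.0])  # second entry is a fake (disconnected) node
print(graph_max_pool_1d(x))                        # [3. 4. 9.]
```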

  20. Training Time (20NEWS)
  [Plot: training time in ms vs. number of features (words), comparing the Non-Param / Spline parametrizations against Chebyshev]
  Make CNNs practical for graph signals!
  Spline parametrization: g_θ(Λ) = B θ (Bruna et al. 2013; Henaff, Bruna, and LeCun 2015)

  21. Convergence (MNIST)
  [Plots: validation accuracy and training loss vs. training step for the Chebyshev, Non-Param and Spline parametrizations]
  Faster convergence!
