Novel tensor framework for neural networks and model reduction
Shashanka Ubaru (1), Lior Horesh (1), Misha Kilmer (2), Elizabeth Newman (2), Haim Avron (3), Osman Malik (4)
(1) IBM T.J. Watson Research Center, (2) Tufts University, (3) Tel Aviv University, (4) University of Colorado, Boulder
ICERM Workshop on Algorithms for Dimension and Complexity Reduction
IBM Research, March 2020. © 2020 IBM Corporation
Outline
- Brief introduction to tensors
- Tensor-based graph neural networks
- Tensor neural networks
- Numerical results
- Model reduction for NNs?
Introduction
Much of real-world data is inherently multidimensional. Many operators and models are natively multi-way.
Tensor Applications
- Medical imaging
- Machine vision
- Latent semantic tensor indexing
- Video surveillance, streaming

Ivanov, Mathies, Vasilescu, Tensor subspace analysis for viewpoint recognition, ICCV, 2009.
Shi, Ling, Hu, Yuan, Xing, Multi-target tracking with motion context in tensor power iteration, CVPR, 2014.
Background and Notation
Notation: $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$ is a $d$-th order tensor.
◮ 0th order tensor: scalar
◮ 1st order tensor: vector
◮ 2nd order tensor: matrix
◮ 3rd order tensor: ...
Inside the Box
Fiber: a vector obtained by fixing all indices but one and varying the remaining one.
Slice: a matrix obtained by fixing all indices but two and varying the remaining two.
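A minimal NumPy sketch (not from the slides) of fibers and slices of a third-order tensor; the array name and sizes are arbitrary assumptions:

```python
import numpy as np

# A third-order tensor of size 4 x 5 x 3 (random entries, for illustration only)
A = np.random.rand(4, 5, 3)

fiber = A[:, 2, 1]          # mode-1 fiber: fix the 2nd and 3rd indices, shape (4,)
frontal_slice = A[:, :, 0]  # frontal slice: fix the 3rd index, shape (4, 5)
lateral_slice = A[:, 1, :]  # lateral slice: fix the 2nd index, shape (4, 3)
```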
Tensor Multiplication
Definition: The k-mode multiplication of a tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$ with a matrix $U \in \mathbb{R}^{j \times n_k}$ is denoted by $\mathcal{A} \times_k U$ and is of size $n_1 \times \cdots \times n_{k-1} \times j \times n_{k+1} \times \cdots \times n_d$.
Element-wise:
$(\mathcal{A} \times_k U)_{i_1 \cdots i_{k-1}\, j\, i_{k+1} \cdots i_d} = \sum_{i_k=1}^{n_k} a_{i_1 i_2 \cdots i_d}\, u_{j i_k}$
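A short NumPy sketch of the k-mode product (a generic implementation, not code from the slides); note the 0-based mode index in code:

```python
import numpy as np

def mode_k_product(A, U, k):
    """Compute A x_k U by contracting mode k of A with the columns of U."""
    # tensordot contracts U's 2nd axis with A's k-th axis and puts the new
    # axis first; moveaxis returns it to position k.
    return np.moveaxis(np.tensordot(U, A, axes=(1, k)), 0, k)

A = np.random.rand(4, 5, 3)   # n1 x n2 x n3
U = np.random.rand(2, 5)      # j x n2
print(mode_k_product(A, U, 1).shape)  # (4, 2, 3)
```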
The $\star_M$-Product
Given $\mathcal{A} \in \mathbb{R}^{\ell \times p \times n}$, $\mathcal{B} \in \mathbb{R}^{p \times m \times n}$, and an invertible $n \times n$ matrix $M$, then
$\mathcal{C} = \mathcal{A} \star_M \mathcal{B} = (\hat{\mathcal{A}} \,\triangle\, \hat{\mathcal{B}}) \times_3 M^{-1}$,
where $\mathcal{C} \in \mathbb{R}^{\ell \times m \times n}$, $\hat{\mathcal{A}} = \mathcal{A} \times_3 M$, and $\triangle$ multiplies the frontal slices in parallel.
Useful properties: tensor transpose, identity tensor, connection to the Fourier transform, invariance to circulant shifts, ...
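A NumPy sketch of the $\star_M$-product following the definition above (not an official implementation); the identity transform in the example is an arbitrary choice:

```python
import numpy as np

def star_m(A, B, M):
    """*_M product of A (l x p x n) and B (p x m x n) with invertible M (n x n)."""
    A_hat = np.einsum('nk,lpk->lpn', M, A)           # hat(A) = A x_3 M
    B_hat = np.einsum('nk,pmk->pmn', M, B)           # hat(B) = B x_3 M
    C_hat = np.einsum('lpn,pmn->lmn', A_hat, B_hat)  # facewise (frontal-slice) products
    return np.einsum('nk,lmk->lmn', np.linalg.inv(M), C_hat)  # x_3 M^{-1}

l, p, m, n = 3, 4, 2, 5
A, B = np.random.rand(l, p, n), np.random.rand(p, m, n)
M = np.eye(n)                 # identity transform: reduces to plain facewise multiplication
print(star_m(A, B, M).shape)  # (3, 2, 5)
```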
Tensor Graph Convolutional Networks
Dynamic Graphs
Graphs are ubiquitous data structures that represent interactions and structural relationships. In many real-world applications, the underlying graph changes over time. Learning representations of dynamic graphs is essential.
Dynamic Graphs - Applications
- Corporate/financial networks
- Natural Language Understanding (NLU)
- Social networks
- Neural activity networks
- Traffic prediction
Graph Convolutional Networks
Graph Neural Networks (GNNs) are popular tools for exploring graph-structured data. Graph Convolutional Networks (GCNs), based on graph convolution filters, extend convolutional neural networks (CNNs) to irregular graph domains. These GNN models operate on a given, static graph.
Courtesy: image by (Kipf & Welling, 2016).
Graph Convolutional Networks
Motivation: Convolution of two signals $x$ and $y$: $x \otimes y = F^{-1}(Fx \odot Fy)$, where $F$ is the Fourier transform (DFT matrix).
Convolution of two node signals $x$ and $y$ on a graph with Laplacian $L = U \Lambda U^T$: $x \otimes y = U(U^T x \odot U^T y)$.
Filtered convolution: $x \otimes_{\mathrm{filt}} y = h(L)x \odot h(L)y$, with matrix filter function $h(L) = U h(\Lambda) U^T$.
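A NumPy sketch mirroring these formulas on a small random graph (the graph size and the filter choice $h(t) = e^{-t}$ are illustrative assumptions):

```python
import numpy as np

N = 5
A = (np.random.rand(N, N) > 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T                   # random symmetric adjacency
L = np.diag(A.sum(1)) - A                        # graph Laplacian
lam, U = np.linalg.eigh(L)                       # L = U diag(lam) U^T

x, y = np.random.rand(N), np.random.rand(N)

conv = U @ ((U.T @ x) * (U.T @ y))               # graph convolution U(U^T x . U^T y)

h_L = U @ np.diag(np.exp(-lam)) @ U.T            # h(L) = U h(Lambda) U^T
filtered = (h_L @ x) * (h_L @ y)                 # filtered convolution
```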
Graph Convolutional Neural Networks
Layer of the initial convolution-based GNNs (Bruna et al., 2016): given graph Laplacian $L \in \mathbb{R}^{N \times N}$ and node features $X \in \mathbb{R}^{N \times F}$:
$H_{i+1} = \sigma(h_\theta(L) H_i W^{(i)})$,
where $h_\theta$ is a filter function parametrized by $\theta$, $\sigma$ is a nonlinear function (e.g., ReLU), and $W^{(i)}$ is a weight matrix, with $H_0 = X$.
Defferrard et al. (2016) used a Chebyshev approximation:
$h_\theta(L) = \sum_{k=0}^{K} \theta_k T_k(L)$.
GCN (Kipf & Welling, 2016): each layer takes the form $\sigma(LXW)$.
2-layer example: $Z = \mathrm{softmax}(L\, \sigma(LXW^{(0)})\, W^{(1)})$
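A minimal NumPy forward pass for the two-layer GCN above (a sketch, not the reference implementation); the sizes, random weights, and the renormalized adjacency used in place of $L$ are assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Two-layer GCN forward pass Z = softmax(L_hat relu(L_hat X W0) W1).
# Kipf & Welling use the renormalized adjacency D^{-1/2}(A + I)D^{-1/2} here.
N, F, H, C = 6, 4, 8, 3                    # nodes, features, hidden units, classes
A = np.random.rand(N, N); A = (A + A.T) / 2
A_tilde = A + np.eye(N)
d = A_tilde.sum(1)
L_hat = A_tilde / np.sqrt(np.outer(d, d))  # D^{-1/2} (A + I) D^{-1/2}

X = np.random.rand(N, F)
W0 = np.random.randn(F, H) * 0.1
W1 = np.random.randn(H, C) * 0.1

Z = softmax(L_hat @ relu(L_hat @ X @ W0) @ W1)
print(Z.shape)  # (6, 3): per-node class probabilities
```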
GCN for dynamic graphs
We consider time-varying, or dynamic, graphs.
Goal: extend the GCN framework to the dynamic setting for tasks such as node and edge classification and link prediction.
Our approach: use the tensor framework.
- $T$ adjacency matrices $A_{::t} \in \mathbb{R}^{N \times N}$ stacked into a tensor $\mathcal{A} \in \mathbb{R}^{N \times N \times T}$
- $T$ node feature matrices $X_{::t} \in \mathbb{R}^{N \times F}$ stacked into a tensor $\mathcal{X} \in \mathbb{R}^{N \times F \times T}$
TensorGCN
[Figure: a dynamic graph over $T$ time steps is represented by an adjacency tensor $\mathcal{A}$ and a feature tensor $\mathcal{X}$; TensorGCN produces embeddings used for graph tasks such as link prediction, edge classification, and node classification.]
TensorGCN
We use the $\star_M$-product to extend the standard GCN to dynamic graphs.
We propose the tensor GCN model $\sigma(\mathcal{A} \star_M \mathcal{X} \star_M \mathcal{W})$.
2-layer example:
$Z = \mathrm{softmax}(\mathcal{A} \star_M \sigma(\mathcal{A} \star_M \mathcal{X} \star_M \mathcal{W}^{(0)}) \star_M \mathcal{W}^{(1)}) \quad (1)$
We choose $M$ to be lower triangular and banded:
$M_{tk} = \begin{cases} \frac{1}{\min(b,t)} & \text{if } \max(1, t-b+1) \le k \le t, \\ 0 & \text{otherwise}, \end{cases}$
This can be shown to be consistent with a spatio-temporal message passing model.
O. Malik, S. Ubaru, L. Horesh, M. Kilmer, and H. Avron, Tensor graph convolutional networks for prediction on dynamic graphs, 2020.
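A NumPy sketch (not the authors' code) of the banded lower-triangular $M$ and one TensorGCN-style layer; the sizes, random initializations, plain ReLU, and unnormalized adjacency tensor are illustrative assumptions:

```python
import numpy as np

def banded_mean_matrix(T, b):
    """Lower-triangular, banded M: row t averages the (at most b) most recent time steps."""
    M = np.zeros((T, T))
    for t in range(1, T + 1):            # 1-based indexing as in the formula
        lo = max(1, t - b + 1)
        M[t - 1, lo - 1:t] = 1.0 / min(b, t)
    return M

def star_m(A, B, M):
    """*_M product of A (N x p x T) and B (p x q x T): transform along time,
    multiply frontal slices, transform back with M^{-1}."""
    A_hat = np.einsum('ts,nps->npt', M, A)
    B_hat = np.einsum('ts,pqs->pqt', M, B)
    C_hat = np.einsum('npt,pqt->nqt', A_hat, B_hat)
    return np.einsum('ts,nqs->nqt', np.linalg.inv(M), C_hat)

# One TensorGCN-style layer, sigma(A *_M X *_M W)
N, F, H, T, b = 6, 4, 8, 5, 3
A = np.random.rand(N, N, T)              # adjacency tensor (normalization omitted)
X = np.random.rand(N, F, T)              # node feature tensor
W = np.random.randn(F, H, T) * 0.1       # weight tensor
M = banded_mean_matrix(T, b)
H1 = np.maximum(star_m(star_m(A, X, M), W, M), 0)   # ReLU activation
print(H1.shape)  # (6, 8, 5)
```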
Tensor Neural Networks
Neural Networks
Let $a_0$ be a feature vector with an associated target vector $c$.
Let $f$ be a function which propagates $a_0$ through connected layers:
$a_{j+1} = \sigma(W_j \cdot a_j + b_j)$ for $j = 0, \ldots, N-1$,
where $\sigma$ is some nonlinear, monotonic activation function.
Goal: learn the function $f$ which optimizes
$\min_{f \in \mathcal{H}} E(f) \equiv \frac{1}{m} \sum_{i=1}^{m} \underbrace{V(c^{(i)}, f(a_0^{(i)}))}_{\text{loss function}} + \underbrace{R(f)}_{\text{regularizer}}$
$\mathcal{H}$: hypothesis space of functions (rich, restrictive, efficient).
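A small NumPy sketch of the forward propagation and a regularized empirical loss of this form (the squared loss and L2 penalty are illustrative choices, not the slides' specification):

```python
import numpy as np

sigma = np.tanh                                   # a nonlinear, monotonic activation

def forward(a0, weights, biases):
    """Propagate a0 through connected layers a_{j+1} = sigma(W_j a_j + b_j)."""
    a = a0
    for W, b in zip(weights, biases):
        a = sigma(W @ a + b)
    return a

def objective(A0, C, weights, biases, reg=1e-3):
    """(1/m) sum_i V(c_i, f(a0_i)) + R(f), with squared loss V and L2 penalty R."""
    m = A0.shape[1]
    loss = sum(0.5 * np.sum((forward(A0[:, i], weights, biases) - C[:, i]) ** 2)
               for i in range(m)) / m
    R = reg * sum(np.sum(W ** 2) for W in weights)
    return loss + R

# Tiny example: 4-dim inputs, one hidden layer of width 6, 3-dim targets
rng = np.random.default_rng(0)
weights = [rng.normal(0, 0.1, (6, 4)), rng.normal(0, 0.1, (3, 6))]
biases  = [np.zeros(6), np.zeros(3)]
A0, C = rng.random((4, 10)), rng.random((3, 10))
print(objective(A0, C, weights, biases))
```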
Reduced Parameterization
Given an $n \times n$ image $A_0$, stored as $a_0 \in \mathbb{R}^{n^2 \times 1}$ and as $\vec{\mathcal{A}}_0 \in \mathbb{R}^{n \times 1 \times n}$.
Matrix: $a_{j+1} = \sigma(W_j \cdot a_j + b_j)$, with $n^4 + n^2$ parameters per layer.
Tensor: $\vec{\mathcal{A}}_{j+1} = \sigma(\mathcal{W}_j \star_M \vec{\mathcal{A}}_j + \vec{\mathcal{B}}_j)$, with $n^3 + n^2$ parameters per layer.
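A quick Python illustration of the per-layer parameter counts above for a few image sizes (the sizes chosen are arbitrary):

```python
# Dense layer on the vectorized n x n image: (n^2 x n^2) weight + n^2 bias = n^4 + n^2.
# Tensor layer on the n x 1 x n representation: (n x n x n) weight + (n x 1 x n) bias = n^3 + n^2.
for n in (16, 28, 64):
    dense, tensor = n**4 + n**2, n**3 + n**2
    print(f"n={n:3d}  matrix: {dense:>12,}  tensor: {tensor:>9,}  reduction: {dense / tensor:5.1f}x")
```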
Improved Parametrization
Given an $n \times n$ image $A_0$, stored as $a_0 \in \mathbb{R}^{n^2 \times 1}$ and as $\vec{\mathcal{A}}_0 \in \mathbb{R}^{n \times 1 \times n}$.
Tensor Neural Networks (tNNs)
Forward propagation:
$\vec{\mathcal{A}}_{j+1} = \sigma(\mathcal{W}_j \star_M \vec{\mathcal{A}}_j + \vec{\mathcal{B}}_j)$
Objective function:
$E = \frac{1}{2} \| W_N \cdot \mathrm{unfold}(\vec{\mathcal{A}}_N) - c \|_F^2$
Backward propagation: with $\vec{\mathcal{Z}}_{j+1} = \mathcal{W}_j \star_M \vec{\mathcal{A}}_j + \vec{\mathcal{B}}_j$, $\odot$ the pointwise product, and $\delta \vec{\mathcal{A}}_j := \frac{\partial E}{\partial \vec{\mathcal{A}}_j}$:
$\delta \vec{\mathcal{A}}_j = \frac{\partial E}{\partial \vec{\mathcal{A}}_{j+1}} \frac{\partial \vec{\mathcal{A}}_{j+1}}{\partial \vec{\mathcal{Z}}_{j+1}} \frac{\partial \vec{\mathcal{Z}}_{j+1}}{\partial \vec{\mathcal{A}}_j} = \mathcal{W}_j^\top \star_M (\delta \vec{\mathcal{A}}_{j+1} \odot \sigma'(\vec{\mathcal{Z}}_{j+1}))$
Update parameters:
$\delta \mathcal{W}_j = (\delta \vec{\mathcal{A}}_{j+1} \odot \sigma'(\vec{\mathcal{Z}}_{j+1})) \star_M \vec{\mathcal{A}}_j^\top$
$\delta \vec{\mathcal{B}}_j = \delta \vec{\mathcal{A}}_{j+1} \odot \sigma'(\vec{\mathcal{Z}}_{j+1})$
M. Nielsen, Neural networks and deep learning, 2017.
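A self-contained NumPy sketch (not the authors' implementation) of one tNN layer's forward pass and the backward-propagation formulas above; the identity transform $M$, tanh activation, random sizes, and slice-wise tensor transpose (assumed valid for real $M$) are illustrative choices:

```python
import numpy as np

def star_m(A, B, M):
    # *_M product of A (l x p x n) and B (p x q x n): transform along mode 3,
    # multiply frontal slices, transform back with M^{-1}
    Ah = np.einsum('ts,lps->lpt', M, A)
    Bh = np.einsum('ts,pqs->pqt', M, B)
    Ch = np.einsum('lpt,pqt->lqt', Ah, Bh)
    return np.einsum('ts,lqs->lqt', np.linalg.inv(M), Ch)

def t_transpose(A):
    # tensor transpose: transpose each frontal slice (assumption: M is real)
    return np.transpose(A, (1, 0, 2))

sigma  = np.tanh                      # activation (illustrative choice)
dsigma = lambda z: 1 - np.tanh(z)**2  # its derivative

n = 8
M = np.eye(n)                         # transform matrix (identity for simplicity)
A0 = np.random.rand(n, 1, n)          # lateral-slice representation of an n x n image
W0 = np.random.randn(n, n, n) * 0.1
B0 = np.zeros((n, 1, n))

# Forward propagation: A1 = sigma(W0 *_M A0 + B0)
Z1 = star_m(W0, A0, M) + B0
A1 = sigma(Z1)

# Backward propagation, given dE/dA1 from the layers above (placeholder here)
dA1 = np.random.rand(*A1.shape)
dA0 = star_m(t_transpose(W0), dA1 * dsigma(Z1), M)
dW0 = star_m(dA1 * dsigma(Z1), t_transpose(A0), M)
dB0 = dA1 * dsigma(Z1)
print(dA0.shape, dW0.shape, dB0.shape)  # (8, 1, 8) (8, 8, 8) (8, 1, 8)
```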