GRAPH REPRESENTATIONS, BACKPROPAGATION AND BIOLOGICAL PLAUSIBILITY
Marco Gori, SAILAB, University of Siena
NeurIPS 2019
OUTLINE
• Learning in structured domains
• Diffusion machines and spatiotemporal locality
• Backpropagation diffusion and biological plausibility
LEARNING IN STRUCTURED DOMAINS
Graphs as Pattern Models
[Figures: image classification; molecules as graphs, with physicochemical behavior as the target]
What are the features?
Social Nets
Quasi-equilibrium dynamic models: here we need to make predictions at the node level!
• Social networks
• Citation networks
• Communication networks
• Multi-agent systems
GRAPH NEURAL NETS
Popular and successful, mostly thanks to graph convolutional networks.
[Pictures from Z. Wu et al., Non-Euclidean Deep Learning]
HISTORICALLY … ANOTHER PATH WAS FOLLOWED!
Extension of the idea of time unfolding …
Structure unfolding
The case of binary trees …
The Graph Neural Network Model: Graph Compiling
[Figure: a graph is unfolded ("compiled") into an encoding network, extending time unfolding to structures; a recurrent net arises from cyclic graphs]
Gori et al., IJCNN 2005; IEEE-TNN 2009
LEARNING AS A DIFFUSION PROCESS
THE FRAMEWORK OF CONSTRAINT-BASED LEARNING AND THE ROLE OF TIME COHERENCE
Natural Laws of Learning
Once we believe in ergodicity … there is no distinction between training and test sets!
The links with mechanics:

\mathcal{A} = \int_0^T e^{-t/\epsilon} \Big( \frac{\epsilon^2 \rho}{2}\,\ddot q^2 + \frac{\epsilon \nu}{2}\,\dot q^2 + V(q, t) \Big)\, dt

Laws of learning correspond to laws of mechanics: the regularization term plays the role of the kinetic energy, and the loss function of the neural net plays the role of the potential energy.
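To make the functional concrete, here is a minimal numerical sketch (not part of the talk): it discretizes the action above for a scalar trajectory, with a hypothetical quadratic potential standing in for the network loss; `eps`, `rho`, `nu`, and `target` are illustrative values.

```python
import numpy as np

# Minimal sketch: evaluate the discretized cognitive action on a candidate
# trajectory q(t). The quadratic V is an assumed stand-in for the net loss.

def cognitive_action(q, dt, eps=0.1, rho=1.0, nu=1.0, target=1.0):
    t = np.arange(len(q)) * dt
    qd = np.gradient(q, dt)                  # finite-difference q_dot
    qdd = np.gradient(qd, dt)                # finite-difference q_ddot
    V = 0.5 * (q - target) ** 2              # assumed potential (loss)
    integrand = np.exp(-t / eps) * (0.5 * eps**2 * rho * qdd**2
                                    + 0.5 * eps * nu * qd**2 + V)
    return integrand.sum() * dt              # rectangle-rule integral

q = np.linspace(0.0, 1.0, 200)               # a candidate trajectory
print(cognitive_action(q, dt=0.01))
```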
Natural Laws of Cognition: A Pre-Algorithmic Step

Natural Learning Theory ↔ Mechanics, with remarks:
• w_i ↔ q_i : weights are interpreted as generalized coordinates.
• ẇ_i ↔ q̇_i : weight variations are interpreted as generalized velocities.
• υ_i ↔ p_i : the conjugate momentum to the weights is defined by using the machinery of Legendre transforms.
• A(w) ↔ S(q) : the cognitive action is the dual of the action in mechanics.
• F(t, w, ẇ) ↔ L(t, q, q̇) : the Lagrangian F is associated with the classic Lagrangian L in mechanics.
• H(t, w, υ) ↔ H(t, q, p) : when using w and υ, we can define the Hamiltonian, just like in mechanics.
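For completeness, the Legendre machinery behind the υ_i ↔ p_i row is the standard one from mechanics (a textbook fact, not slide content):

```latex
\upsilon_i = \frac{\partial F}{\partial \dot w_i}\,, \qquad
H(t, w, \upsilon) = \upsilon_i\,\dot w_i - F(t, w, \dot w)\Big|_{\dot w = \dot w(t, w, \upsilon)}
```

The first relation is inverted to express ẇ as a function of (t, w, υ), exactly as p = ∂L/∂q̇ is inverted in mechanics.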
Constraint Reactions: architectural and environmental constraints

XOR training set: L = { ((0,0), 0), ((0,1), 1), ((1,0), 1), ((1,1), 0) }
Network: input units 1, 2; hidden units 3, 4; output unit 5.

"Hard" architectural constraints, for each example κ = 1, 2, 3, 4:
x_{\kappa 3} - \sigma(w_{31} x_{\kappa 1} + w_{32} x_{\kappa 2} + b_3) = 0
x_{\kappa 4} - \sigma(w_{41} x_{\kappa 1} + w_{42} x_{\kappa 2} + b_4) = 0
x_{\kappa 5} - \sigma(w_{53} x_{\kappa 3} + w_{54} x_{\kappa 4} + b_5) = 0

Training set constraints (the positive examples (0,1), (1,0) are indexed first):
x_{15} = 1, \quad x_{25} = 1, \quad x_{35} = 0, \quad x_{45} = 0
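A hedged sketch of how these constraints might be coded (the containers and 0-based index conventions are mine, not the talk's): each example κ contributes three architectural residuals and one training-set residual, and learning amounts to driving all residuals to zero.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def residuals(x, w, b, inputs, targets):
    """Stack all constraint residuals; zero residuals = constraints hold."""
    res = []
    for k, (u1, u2) in enumerate(inputs):
        # architectural constraints on the free state variables x[k, i]
        res.append(x[k, 3] - sigma(w[3, 1]*u1 + w[3, 2]*u2 + b[3]))
        res.append(x[k, 4] - sigma(w[4, 1]*u1 + w[4, 2]*u2 + b[4]))
        res.append(x[k, 5] - sigma(w[5, 3]*x[k, 3] + w[5, 4]*x[k, 4] + b[5]))
        # training-set constraint on the output unit
        res.append(x[k, 5] - targets[k])
    return np.array(res)

inputs  = [(0, 1), (1, 0), (0, 0), (1, 1)]   # positives first, as on the slide
targets = [1, 1, 0, 0]
x = np.zeros((4, 6)); w = np.zeros((6, 6)); b = np.zeros(6)
print(residuals(x, w, b, inputs, targets))
```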
Lagrangian Approach
Variational calculus under subsidiary conditions: Lagrange multipliers.

Functional optimization:
• Static models ↔ holonomic constraints
• Dynamic models ↔ non-holonomic constraints
Formulation of Learning: holonomic constraints (DAGs)

Regularization term plus risk function:

A(x, W) := \int \frac{1}{2}\big( m_x |\dot x(t)|^2 + m_W |\dot W(t)|^2 \big)\,\varpi(t)\,dt + F(x, W),

where F(x, W) := \int F(t, x, \dot x, \ddot x, W, \dot W, \ddot W)\,dt, subject to the constraints

G_j(t, x(t), W(t)) = 0, \quad 1 \le j \le \nu.

Neural constraints (Einstein's notation):

G_j(\tau, \xi, M) :=
\begin{cases}
\xi_j - e_j(\tau), & \text{if } 1 \le j \le \omega; \\
\xi_j - \sigma(m_{jk}\,\xi_k), & \text{if } \omega < j \le \nu.
\end{cases}

Proposition 1: the constraints are functionally independent for acyclic graphs (feedforward nets).
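A sketch of the piecewise constraint map G_j above: units j ≤ ω are pinned to the environmental signal e_j(τ), while the remaining units must match σ of their weighted input (the Einstein sum m_jk ξ_k). The squash, the `e()` placeholder, and the sizes below are illustrative assumptions.

```python
import numpy as np

def G(tau, xi, M, e, omega):
    sig = np.tanh                                  # any smooth squash
    g = np.empty_like(xi)
    g[:omega] = xi[:omega] - e(tau)                # environmental constraints
    g[omega:] = xi[omega:] - sig(M @ xi)[omega:]   # neural constraints
    return g

omega, nu = 2, 5
M = 0.1 * np.ones((nu, nu))                        # illustrative weight matrix
print(G(0.0, np.ones(nu), M, lambda t: np.zeros(omega), omega))
```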
Formulation of Learning (cont'd): holonomic constraints (any digraph)

Regularization term plus risk function, with slack variables s:

A(x, W, s) := \int \frac{1}{2}\big( m_x |\dot x(t)|^2 + m_W |\dot W(t)|^2 + m_s |\dot s(t)|^2 \big)\,\varpi(t)\,dt + F(x, W, s),

where F(x, W, s) := \int F(t, x, \dot x, \ddot x, W, \dot W, \ddot W, s)\,dt.

Neural constraints with slack variables:

G_j(\tau, \xi, M, \zeta) :=
\begin{cases}
\xi_j - e_j(\tau) + \zeta_j, & \text{if } 1 \le j \le \omega; \\
\xi_j - \sigma(m_{jk}\,\xi_k) + \zeta_j, & \text{if } \omega < j \le \nu.
\end{cases}

Proposition 2: the constraints are functionally independent for any graph.
Formulation of Learning (cont'd): non-holonomic constraints (any digraph)

Regularization term plus loss term:

A(x, W) = \int \Big( \frac{m_x}{2}|\dot x(t)|^2 + \frac{m_W}{2}|\dot W(t)|^2 + F(t, x, W) \Big)\,\varpi(t)\,dt

Neural constraints:

\dot x_i(t) + c\,x_i(t) - \sigma(w_{ik}(t)\,x_k(t)) = 0, \quad 0 < c < 1.

Proposition 3: the constraints are functionally independent for any graph.
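A minimal simulation (assumed weights and constants, not the talk's experiments) of the non-holonomic constraint: the state relaxes toward c·x = σ(Wx) instead of jumping to the instantaneous network map, and the dynamics are well defined for any digraph's weight matrix, cyclic or not.

```python
import numpy as np

def simulate(W, x0, c=0.5, dt=0.01, steps=2000):
    """Explicit Euler integration of x_dot = -c x + sigma(W x)."""
    x = x0.copy()
    for _ in range(steps):
        x += dt * (-c * x + np.tanh(W @ x))
    return x

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((5, 5))          # arbitrary (cyclic) digraph
x = simulate(W, rng.standard_normal(5))
print(np.abs(0.5 * x - np.tanh(W @ x)).max())  # residual of c*x = sigma(Wx)
```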
Feedforward Networks (DAGs) x ( t ) − � j ( t ) G j ξ ( x ( t ) , W ( t )) + L x − m x $ ( t )¨ x ( t ) − m x ˙ $ ( t ) ˙ F ( x ( t ) , W ( t )) = 0; − m W $ ( t ) ¨ $ ( t ) ˙ W ( t ) − � j ( t ) G j M ( x ( t ) , W ( t )) + L W W ( t ) − m W ˙ F ( x ( t ) , W ( t )) = 0 , instantaneous linear equation ξ a G j ⇣ G i + G i m ab G j x a + G i x a ˙ ⌘ ξ a m ab G i ττ + 2( G i w ab + G i � � j = $ τξ a ˙ τ m ab ˙ ξ a m bc ˙ w bc ) m x m W x a ˙ x b + G i + G i � ξ a ξ b ˙ m ab m cd ˙ w ab ˙ w cd L x a F G i + L w ab G i ξ a m ab F x a G i w ab G i − ˙ $ ( ˙ ξ a + ˙ m ab ) + , m x m W where L x x ) /dt + d 2 ( F ¨ x ) /dt 2 , L W W ) /dt + d 2 ( F ¨ W ) /dt 2 F = F x − d ( F ˙ F = F W − d ( F ˙ vatives of F with respect to x and W respectively (see ( 9 )). An expression for Lagrange supervised learning x, W, ˙ W, ¨ W ) = F ( t, x ) → L x L w F ( t, x, ˙ x, ¨ F = ∂ x F, F = 0
Reduction to Backpropagation

In the limit m_x \to 0, the chain rule arises:

\dot W_{ij} = -\frac{1}{m_W}\,\sigma'(w_{ik} x_k)\,\lambda_i\,x_j;

and the multiplier system reduces to G_i^{\xi_a} G_j^{\xi_a} \lambda_j = -V_{x_a} G_i^{\xi_a}, that is, T\lambda = -V_x in the augmented learning space. For the chain 1 \to 2 \to 3, T is triangular:

T = \begin{pmatrix}
1 & -\sigma'(w_{21} x_1)\,w_{21} & 0 \\
0 & 1 & -\sigma'(w_{32} x_2)\,w_{32} \\
0 & 0 & 1
\end{pmatrix}

so the Lagrange multipliers follow by back-substitution:

\lambda_3 = -V_{x_3}; \quad \lambda_2 = \sigma'(w_{32} x_2)\,w_{32}\,\lambda_3; \quad \lambda_1 = \sigma'(w_{21} x_1)\,w_{21}\,\lambda_2.

A somewhat surprising kinship with the BP delta error. Early discovery by Yann LeCun, 1989.
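A sketch checking this claim on a 3-unit chain x1 → x2 → x3: solving Tλ = -V_x by back-substitution reproduces the classic BP deltas. The loss V = (x3 - target)²/2 and the numeric values are assumed examples.

```python
import numpy as np

sig  = lambda z: np.tanh(z)
dsig = lambda z: 1.0 - np.tanh(z) ** 2

w21, w32, x1, target = 0.7, -1.3, 0.9, 0.5
x2 = sig(w21 * x1)                    # forward pass along the chain
x3 = sig(w32 * x2)

# triangular system from the slide
T = np.array([[1.0, -dsig(w21 * x1) * w21, 0.0],
              [0.0, 1.0, -dsig(w32 * x2) * w32],
              [0.0, 0.0, 1.0]])
Vx = np.array([0.0, 0.0, x3 - target])
lam = np.linalg.solve(T, -Vx)

# classic BP deltas for the same chain
d3 = -(x3 - target)                   # output delta
d2 = dsig(w32 * x2) * w32 * d3        # chain rule, one layer back
d1 = dsig(w21 * x1) * w21 * d2
print(lam, np.array([d1, d2, d3]))    # the two triplets coincide
```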
Augmented Learning Space

Euler-Lagrange equations with non-holonomic constraints. Intuition: we need to store the multipliers λ and provide their temporal updating.

\dot x_i(t) + c\,x_i(t) - \sigma(w_{ik}(t)\,x_k(t)) = 0;
\dot W(t) = -\frac{1}{m_W}\,\lambda_j(t)\,G_j^{M}(t, x(t), W(t), \dot x(t));
\dot\lambda(t) = \lambda_j(t)\,G_j^{\xi}(t, x(t), W(t), \dot x(t)) + V_{\xi}(t, x(t)).

BP-like factorization \delta_j x_i: this makes GNN efficient! Unlike BPTT and RTRL, the learning equations are local in space and time; connections with Equilibrium Propagation (Y. Bengio et al.).
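A hedged sketch of the augmented-space dynamics (shapes, the quadratic V, and the sign conventions are my assumptions, derived from the constraint G_j = ẋ_j + c x_j - σ(w_jk x_k)): state x, weights W, and multipliers λ evolve together by rules that are local in space and time, with no unfolding as in BPTT and no full sensitivity matrices as in RTRL.

```python
import numpy as np

sig  = np.tanh
dsig = lambda z: 1.0 - np.tanh(z) ** 2

def step(x, W, lam, target, c=0.5, dt=0.01, mW=10.0):
    z = W @ x
    # state diffusion: x_dot = -c x + sigma(W x)
    x_new = x + dt * (-c * x + sig(z))
    # W_dot = -(1/mW) lam_j dG_j/dW, with dG_j/dw_ik = -sigma'(z_j) d_ij x_k
    W_new = W + dt / mW * np.outer(lam * dsig(z), x)
    # lam_dot = lam_j dG_j/dx + V_x, assuming V = |x - target|^2 / 2
    lam_new = lam + dt * (c * lam - W.T @ (lam * dsig(z)) + (x - target))
    return x_new, W_new, lam_new

rng = np.random.default_rng(0)
x, W, lam = rng.standard_normal(4), 0.1 * rng.standard_normal((4, 4)), np.zeros(4)
for _ in range(100):
    x, W, lam = step(x, W, lam, target=np.zeros(4))
print(x.round(3))
```

Note that every update of unit i uses only quantities available at unit i and its neighbors at the current instant, which is exactly the locality claim of the slide.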
DIFFUSION LEARNING AND BIOLOGICAL PLAUSIBILITY
LOCALITY IN SPACE AND IN TIME
[Figure: a neuron with inputs x_j, constraint reactions given by the Lagrange multipliers δ_i, and environmental interaction]
Biological Plausibility of Backpropagation

The BP algorithm is NOT biologically plausible, but BP diffusion is. Biological concerns should not involve BP itself but rather the instantaneous map

x_i(t) = \sigma(w_{ik}\,x_k(t)),

which we replace either with the delayed map

x_i(t) = \sigma(w_{ik}(t-1)\,x_k(t-1))

or with the diffusion equation

\dot x_i(t) + c\,x_i(t) - \sigma(w_{ik}(t)\,x_k(t)) = 0.

… a clever related comment by Francis Crick, 1989.
Forward and Backward Waves
BP diffusion is biologically plausible; the BP algorithm is not.
[Figure: forward and backward waves propagating one layer per tick over t, t+1, …, t+8]
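A toy illustration of the wave picture (an assumed setup with identity weights on a 4-layer chain): activations advance one layer per tick and the error signal retreats one layer per tick, so every update uses only local, delayed information, exactly as in the delayed map x_i(t) = σ(w_ik(t-1) x_k(t-1)).

```python
import numpy as np

L = 4
x   = np.zeros(L)            # activations, one per layer
lam = np.zeros(L)            # delta / multiplier signals, one per layer
for t in range(2 * L):
    x[1:] = np.tanh(x[:-1])          # forward wave, one layer per tick
    x[0]  = 1.0                      # constant input clamped at layer 0
    lam[:-1] = lam[1:].copy()        # backward wave, one layer per tick
    lam[-1]  = x[-1] - 0.5           # injected output error (assumed target)
    print(t, x.round(3), lam.round(3))
```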
PRELIMINARY EXPERIMENTAL CHECK
[experimental results omitted]

Conclusions
• GNN: success largely due to graph convolutional networks, but the "diffusion path" is still worth exploring
• What happens with deep networks in graph compiling?
• Laws of learning, pre-algorithmic issues, and biological plausibility
• Dynamic models for the Lagrange multipliers (always the delta error): a new perspective whenever time coherence matters!
• Euler-Lagrange learning and SGD
Acknowledgments
Alessandro Betti, SAILAB

Publications
• F. Scarselli et al., "The Graph Neural Network Model," IEEE-TNN, 2009
• A. Betti, M. Gori, and S. Melacci, "Cognitive Action Laws: The Case of Visual Features," IEEE-TNNLS, 2019
• A. Betti, M. Gori, and S. Melacci, "Motion Invariance in Visual Environments," IJCAI, 2019
• A. Betti and M. Gori, "Backprop Diffusion is Biologically Plausible," arXiv:1912.04635
• A. Betti and M. Gori, "Spatiotemporal Local Propagation," arXiv:1907.05106

Software
• Preliminary version (see arXiv:1912.04635)
Machine Learning: A Constraint-Based Approach, Marco Gori