
Local Representation Alignment: A Biologically Motivated Algorithm - PowerPoint PPT Presentation



  1. Local Representation Alignment: A Biologically Motivated Algorithm for Training Neural Systems Alexander G. Ororbia II The Neural Adaptive Computing (NAC) Laboratory Rochester Institute of Technology 1

  2. Collaborators • The Pennsylvania State University • Dr. C. Lee Giles • Dr. Daniel Kifer • Rochester Institute of Technology (RIT) • Dr. Ifeoma Nwogu (Computer Vision) • Dr. Travis Desell (Neuro-evolution, distributed computing) • Students • Ankur Mali (PhD student, Penn State, co-advised w/ Dr. C. Lee Giles) • Timothy Zee (PhD student, RIT, co-advised w/ Dr. Ifeoma Nwogu) • Abdelrahman Elsiad (PhD student, RIT, co-advised w/ Dr. Travis Desell) 2

  3. Objectives • Context: Credit assignment & algorithmic alternatives • Backpropagation of errors (backprop) • Feedback alignment algorithms • Equilibrium propagation (EP) • Target propagation (TP) • Contrastive Hebbian learning (CHL) • Contrastive divergence (CD) • Discrepancy Reduction – a family of learning procedures • Error-Driven Local Representation Alignment (LRA/LRA-E) • Adaptive Noise Difference Target Propagation (DTP-σ) • Experimental Results & Variations • Conclusions 3

  4. 4

  5. (Diagram of the experimental design space: architectures: MLP = multilayer perceptron, AE = autoencoder, BM = Boltzmann machine, RNN; learning algorithms: Backprop, CHL, LRA; optimizers: SGD, Adam, RMSprop; losses: MSE, MAE, CNLL; dataset: MNIST) 5

  6. Problems with Backprop • The global feedback pathway • Vanishing/exploding gradients • In recurrent networks, this is even worse! • The weight transport problem • High sensitivity to initialization • Activation constraints/conditions • Requires the system to be fully differentiable → difficulty handling discrete-valued functions • Requires sufficient linearity → adversarial samples (Global optimization, back-prop through whole graph.) 6

  7. Feedforward Inference Illustration: forward propagation in a multilayer perceptron (MLP) to collect activities (shared across most algorithms, e.g., backprop, random feedback alignment, direct feedback alignment, local representation alignment) 7
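A minimal sketch of this shared inference pass (not the authors' code; biases omitted, tanh assumed as the activation, column-vector inputs assumed):

```python
import numpy as np

def forward(x, weights, phi=np.tanh):
    """Run an MLP forward and collect every layer's activities.

    x: input column vector; weights: list of weight matrices W^1..W^L.
    Returns the pre-activations h^l and post-activations z^l that the
    credit-assignment procedures on the following slides reuse.
    """
    pre_acts, post_acts = [], [x]
    z = x
    for W in weights:
        h = W @ z          # pre-activation of the next layer
        z = phi(h)         # post-activation (elementwise nonlinearity)
        pre_acts.append(h)
        post_acts.append(z)
    return pre_acts, post_acts
```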

  8. 8

  9. 9

  10. 10

  11. 11

  12. 12

  13. Backpropagation of Errors 13

  14. Conducting credit assignment using the activities produced by the inference pass 14

  15. Pass error signal back through post-activations (get derivatives w.r.t. pre-activations) 15

  16. Pass error signal back through (incoming) synaptic weights to get error signal transmitted to post-activations in layer below 16

  17. Repeat the previous steps, layer by layer (recursive treatment of backprop procedure) 17
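A rough sketch of the recursion just described, reusing the forward sketch above (the tanh derivative and the output-error argument are illustrative assumptions, not the deck's own code):

```python
import numpy as np

def dtanh(h):
    return 1.0 - np.tanh(h) ** 2

def backprop_grads(weights, pre_acts, post_acts, dL_dy):
    """dL_dy: loss derivative w.r.t. the network output (top post-activation)."""
    L = len(weights)
    deltas = [None] * L
    deltas[-1] = dL_dy * dtanh(pre_acts[-1])        # through the top post-activation
    for l in range(L - 2, -1, -1):
        err = weights[l + 1].T @ deltas[l + 1]      # back through synaptic weights
        deltas[l] = err * dtanh(pre_acts[l])        # back through post-activation
    # outer products give the weight gradients, layer by layer
    return [deltas[l] @ post_acts[l].T for l in range(L)]
```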

  18. 18

  19. 19

  20. Random Feedback Alignment 20

  21. 21

  22. Pass error signal back through post-activations (get derivatives w.r.t. pre-activations) 22

  23. Pass error signal back through fixed, random alignment weights (replaces backprop’s step of passing error through transpose of feedforward weights) 23

  24. Repeat previous steps (similar to backprop) 24
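A sketch of the one change relative to the backprop snippet above: the transpose of the forward weights is swapped for fixed random feedback matrices B (shapes assumed to match the corresponding W.T; never updated):

```python
import numpy as np

def rfa_grads(weights, B, pre_acts, post_acts, dL_dy, dphi):
    """B[l] is a fixed random matrix with the shape of weights[l].T."""
    L = len(weights)
    deltas = [None] * L
    deltas[-1] = dL_dy * dphi(pre_acts[-1])
    for l in range(L - 2, -1, -1):
        err = B[l + 1] @ deltas[l + 1]       # fixed random feedback weights, not W.T
        deltas[l] = err * dphi(pre_acts[l])  # back through the post-activation
    return [deltas[l] @ post_acts[l].T for l in range(L)]
```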

  25. 25

  26. 26

  27. Direct Feedback Alignment 27

  28. 28

  29. Pass error signal back through post-activations (get derivatives w.r.t. pre-activations) 29

  30. Pass error signal along first set of direct alignment weights to second layer 30

  31. Pass error signal along next set of direct alignment weights to first layer 31

  32. Treat the signals propagated along direct alignment connections as proxies for error derivatives and run them through post-activations in each layer, respectively 32
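A sketch of direct feedback alignment under the same conventions: each hidden layer receives the raw output error through its own fixed random matrix, with no layer-to-layer chain (the matrix shapes and names are assumptions):

```python
import numpy as np

def dfa_grads(weights, B, pre_acts, post_acts, e, dphi):
    """e: output error signal; B[l] projects it directly to hidden layer l+1."""
    L = len(weights)
    deltas = [None] * L
    deltas[-1] = e * dphi(pre_acts[-1])        # output layer handled as usual
    for l in range(L - 2, -1, -1):
        proxy = B[l] @ e                       # direct random projection of the error
        deltas[l] = proxy * dphi(pre_acts[l])  # proxy run through that layer's post-activation
    return [deltas[l] @ post_acts[l].T for l in range(L)]
```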

  33. 33

  34. Backpropagation of Errors: Direct Feedback Alignment: Random Feedback Alignment: 34
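The equations on this slide were rendered as images; below are the standard textbook forms of the three layer-wise error signals being contrasted (W: forward weights, B: fixed random feedback matrices, e: output error, φ': activation derivative, ⊙: elementwise product):

```latex
\begin{align*}
\text{Backprop:} \qquad
  \delta^{\ell} &= \big((\mathbf{W}^{\ell+1})^{\top}\,\delta^{\ell+1}\big) \odot \phi'(\mathbf{h}^{\ell}) \\
\text{Random feedback alignment:} \qquad
  \delta^{\ell} &= \big(\mathbf{B}^{\ell+1}\,\delta^{\ell+1}\big) \odot \phi'(\mathbf{h}^{\ell}) \\
\text{Direct feedback alignment:} \qquad
  \delta^{\ell} &= \big(\mathbf{B}^{\ell}\,\mathbf{e}\big) \odot \phi'(\mathbf{h}^{\ell})
\end{align*}
```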

  35. Global versus Local Signals Global optimization, back-prop through whole graph. Local optimization, back-prop through sub-graphs. 36

  36. Global versus Local Signals Will these yield coherent models? Global feedback pathway Global optimization, back-prop through whole graph. Local optimization, back-prop through sub-graphs. 37

  37. Equilibrium Propagation 38 Negative phase Positive phase
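For context, the standard two-phase contrastive update of equilibrium propagation (not taken from this deck; this is the form for the Hopfield-style energy of the original formulation): after the network settles to a free-phase fixed point s⁰ and a weakly clamped ("nudged") fixed point s^β, the weights move along the difference of the local correlations measured in the two phases,

```latex
\Delta W_{ij} \;\propto\; \frac{1}{\beta}\Big( \rho\!\big(s_i^{\beta}\big)\,\rho\!\big(s_j^{\beta}\big) \;-\; \rho\!\big(s_i^{0}\big)\,\rho\!\big(s_j^{0}\big) \Big)
```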

  38. The Discrepancy Reduction Family • General process (Ororbia et al., 2017 Adapt) • 1) Search for latent representations that better explain the input/output (targets) • 2) Reduce the mismatch between the currently “guessed” representations & the target representations • Sum of internal, local losses (in nats) → total discrepancy (akin to a “pseudo-energy”; see the expression below) • Coordinated local learning rules • Algorithms • Difference target propagation (DTP) (Lee et al., 2014) • DTP-σ (Ororbia et al., 2019) • LRA (Ororbia et al., 2018; Ororbia et al., 2019) • Others – targets could come from an external, interacting process • NPC (neural predictive coding; Ororbia et al., 2017/2018/2019) 39
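In symbols, the "total discrepancy" named above is the sum of the layer-wise local losses between each representation z^ℓ and its target t^ℓ (a paraphrase of the slide, not the papers' exact notation):

```latex
\mathcal{D}_{\text{total}} \;=\; \sum_{\ell} \mathcal{L}_{\ell}\!\big(\mathbf{z}^{\ell}, \mathbf{t}^{\ell}\big)
```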

  39. Adaptive Noise Difference Target Propagation (DTP-σ) (Diagram of layer activities z_L, z_{L-1} and their targets ẑ_L, ẑ_{L-1}, with the inverse mapping g applied to z_L and ẑ_L; image adapted from Lillicrap et al., 2018) 40
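The difference-correction step that the diagram illustrates, in the form given by Lee et al. (2014) for DTP; DTP-σ keeps this form but adapts the noise used when training the inverse mapping g (this is a paraphrase, not the slide's own equation):

```latex
\hat{\mathbf{z}}^{\,L-1} \;=\; \mathbf{z}^{L-1} \;+\; g\!\big(\hat{\mathbf{z}}^{\,L}\big) \;-\; g\!\big(\mathbf{z}^{L}\big)
```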

  40. Error-Driven Local Representation Alignment (LRA-E) 41

  41. 42

  42. Transmit the error along the error feedback weights, and error-correct the post-activations using the transmitted displacement/delta 43

  43. Calculate local error in layer below, measuring discrepancy between original post-activation and error-corrected post-activation 44

  44. Repeat the previous steps, error-correcting each successive layer further down in the network/system 45

  45. 46

  46. Optional…substitute & repeat! 47

  47. Aligning Local Representations • Credit assignment by optimizing subgraphs linked by error units The Cauchy local loss: 48
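The Cauchy loss on this slide is an image; a common form of the Cauchy (Lorentzian) local loss between a layer's post-activation z and its target t is shown below (the papers may use a scaled variant):

```latex
\mathcal{L}_{\text{Cauchy}}\big(\mathbf{t}, \mathbf{z}\big) \;=\; \sum_{j} \log\!\Big(1 + \big(t_{j} - z_{j}\big)^{2}\Big)
```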

  48. Aligning Local Representations • Credit assignment by optimizing subgraphs linked by error units, motivated/inspired by (Rao & Ballard, 1999) 49

  49. Aligning Local Representations • Credit assignment by optimizing subgraphs linked by error units, motivated/inspired by (Rao & Ballard, 1999) There is more than one way to compute these changes 50
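One simplified way to compute these changes, following the flow of the preceding slides: squared-error local losses, a single correction step of size beta, activation derivatives omitted, and error-feedback matrices E kept as separate parameters (updated by their own rule in the full algorithm). All of these are illustrative simplifications rather than the papers' exact update rules.

```python
import numpy as np

def lra_e_step(weights, E, post_acts, y_target, beta=0.1, lr=0.01):
    """One error-driven LRA pass (simplified).

    weights[k]: forward matrix from layer k to k+1.
    E[k]: error-feedback matrix carrying the error at layer k+1 down to layer k.
    post_acts: activities z^0..z^L from the forward pass (z^0 is the clamped input).
    """
    L = len(weights)
    targets = [None] * (L + 1)
    targets[L] = y_target                          # output target from the label
    for l in range(L, 1, -1):                      # error-correct each hidden layer
        e = post_acts[l] - targets[l]              # local error unit at layer l
        targets[l - 1] = post_acts[l - 1] - beta * (E[l - 1] @ e)
    # purely local weight updates: each layer only sees its own error
    for l in range(L):
        e = post_acts[l + 1] - targets[l + 1]
        weights[l] -= lr * (e @ post_acts[l].T)
    return weights, targets
```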

  50. Some Experimental Results 51

  51. Experimental Results: MNIST & Fashion MNIST (figure shows sample classes: digits 7 and 3; Trousers, Dress, Shirt) (Ororbia et al., 2018 Bio) 52

  52. Acquired Filters (LRA vs. Backprop): third-level filters acquired, after a single pass through the data, by a tanh network trained with (a) backprop and (b) LRA. 53

  53. Visualization of Topmost Post-Activities 54

  54. Measuring Total Discrepancy in LRA-E; Angle of LRA, DFA, & DTP-σ updates against Backprop 55

  55. Training Deep (& Thin) Networks (Ororbia et al., 2018 Credit) • Equilibrium Propagation (8 layers): MNIST: 59.03%, Fashion MNIST: 67.33% • Equilibrium Propagation (3 layers): MNIST: 6.00%, Fashion MNIST: 16.71% 56

  56. Training Networks from Null Initialization LWTA: SLWTA: (Ororbia et al., 2018 Credit) 57

  57. Training Stochastic Networks (Ororbia et al., 2018 Credit) 58

  58. If time permits…let’s talk about modeling time… 59

  59. Training Neural Temporal/Recurrent Models • Integrating LRA into recurrent networks, resulting in the Temporal Neural Coding Network • The Parallel Temporal Neural Coding Network (P-TNCN) (Ororbia et al., 2018 Continual) 60

  60. Removing Back-Propagation through Time! • Each step in time entails: 1) generate hypothesis, 2) error correction in light of evidence 61
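A highly simplified, hypothetical sketch of the two sub-steps named above for one time step of a recurrent learner trained without BPTT (the names W_rec, W_out, beta and the exact correction rule are illustrative, not the P-TNCN equations):

```python
import numpy as np

def temporal_step(state, x_t, W_rec, W_out, beta=0.1):
    """One time step: 1) generate a hypothesis, 2) error-correct the state."""
    # 1) hypothesis: predict the current observation from the recurrent state
    pred = np.tanh(W_out @ state)
    # 2) error correction: measure the mismatch with the observed frame x_t
    #    and nudge the state toward an activity that better explains it
    err = pred - x_t
    new_state = np.tanh(W_rec @ state) - beta * (W_out.T @ err)
    return new_state, err
```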

  61. 62

  62. 63

  63. Conclusions • Backprop has issues; feedback alignment algorithms fix only one of them • Other algorithms such as DTP or EP are slow… • Discrepancy reduction • Local representation alignment (LRA-E) • Adaptive noise difference target propagation (DTP-σ) • Showed promising results, stable and performant compared to alternatives such as equilibrium propagation & the alignment algorithms • Can work with non-differentiable operators (discrete/stochastic) • Can be used to train recurrent/temporal models too! 64

  64. Questions? 65

  65. References • (Ororbia et al., 2018 Credit) -- Alexander G. Ororbia II, Ankur Mali, Daniel Kifer, and C. Lee Giles. “Deep Credit Assignment by Aligning Local Distributed Representations”. arXiv:1803.01834 [cs.LG]. • (Ororbia et al., 2018 Continual) -- Alexander G. Ororbia II, Ankur Mali, C. Lee Giles, and Daniel Kifer. “Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations”. arXiv:1810.07411 [cs.LG]. • (Ororbia et al., 2017 Adapt) -- Alexander G. Ororbia II, Patrick Haffner, David Reitter, and C. Lee Giles. “Learning to Adapt by Minimizing Discrepancy”. arXiv:1711.11542 [cs.LG]. • (Ororbia et al., 2018 Lifelong) -- Alexander G. Ororbia II, Ankur Mali, Daniel Kifer, and C. Lee Giles. “Lifelong Neural Predictive Coding: Sparsity Yields Less Forgetting when Learning Cumulatively”. arXiv:1905.10696 [cs.LG]. • (Ororbia et al., 2018 Bio) -- Alexander G. Ororbia II and Ankur Mali. “Biologically Motivated Algorithms for Propagating Local Target Representations”. In: Thirty-Third AAAI Conference on Artificial Intelligence. 66
