


  1. Dense Associative Memories and Deep Learning. Dmitry Krotov, IBM Research, MIT-IBM Watson AI Lab; Institute for Advanced Study.

  2. Learning Mechanisms Architectures

  3. What is associative memory? [Figure: an energy landscape whose local minima are the memories $\xi^1, \xi^2, \xi^3, \xi^4$.]

  4. Standard Associative Memory vs. Dense Associative Memory.
     Standard (Hopfield) model: $E = -\sum_{i,j=1}^{N} \sigma_i T_{ij} \sigma_j$ with $T_{ij} = \sum_{\mu=1}^{K} \xi^\mu_i \xi^\mu_j$.
     Dense Associative Memory: $E = -\sum_{\mu=1}^{K} \Big( \sum_{i=1}^{N} \xi^\mu_i \sigma_i \Big)^n$, with $n \ge 2$.
     Here $\sigma_i$ are the dynamical variables (neurons), $\xi^\mu_i$ the memorized patterns, $N$ the number of neurons, $K$ the number of memories, and $n$ the power of the interaction vertex.
     For $n = 2$, $E = -\sum_{\mu=1}^{K} \big( \sum_{i=1}^{N} \xi^\mu_i \sigma_i \big)^2$ recovers the standard model with capacity $K_{\max} \approx 0.14\,N$; for general $n$, the capacity grows as $K_{\max} \approx \alpha_n N^{n-1}$.
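A minimal numpy sketch of this energy function (the function and variable names are mine, not from the slides):

```python
import numpy as np

def dam_energy(sigma, xi, n):
    """Dense Associative Memory energy  E = -sum_mu (xi^mu . sigma)^n.

    sigma : (N,) array of +/-1 neuron states
    xi    : (K, N) array of K memorized patterns
    n     : power of the interaction vertex (n = 2 recovers the standard Hopfield energy)
    """
    overlaps = xi @ sigma            # (K,) overlaps xi^mu . sigma
    return -np.sum(overlaps ** n)
```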

  5. Update rule:
     $\sigma_i^{(t+1)} = \mathrm{Sign}\Big[ \sum_{\mu=1}^{K} \Big( F\big( \xi^\mu_i + \sum_{j \ne i} \xi^\mu_j \sigma_j^{(t)} \big) - F\big( -\xi^\mu_i + \sum_{j \ne i} \xi^\mu_j \sigma_j^{(t)} \big) \Big) \Big]$
     with random patterns: $\langle \xi^\mu_i \rangle = 0$, $\langle \xi^\mu_i \xi^\nu_j \rangle = \delta^{\mu\nu} \delta_{ij}$.
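A sketch of one asynchronous step of this update in numpy, assuming the rectified-polynomial choice of $F$ that appears later in the talk (the slides also use the plain polynomial $F(x) = x^n$):

```python
import numpy as np

def update_neuron(sigma, xi, i, n=3):
    """One asynchronous update of neuron i under the DAM update rule above.

    sigma : (N,) current +/-1 states;  xi : (K, N) memories;  n : interaction power.
    """
    F = lambda x: np.maximum(x, 0.0) ** n
    field = xi @ sigma - xi[:, i] * sigma[i]       # sum_{j != i} xi^mu_j sigma_j, one value per memory
    total = np.sum(F(xi[:, i] + field) - F(-xi[:, i] + field))
    return 1 if total >= 0 else -1                 # Sign[...], ties broken toward +1
```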

  6. Pattern recognition with DAM. [Figure: the network has 28 x 28 = 784 visible neurons $v_i$ (pixels) and 10 classification neurons, denoted $x_\alpha$ on input and $c_\alpha$ on output.]

  7. Applying the same update rule to the classification neurons gives the output
     $c_\alpha = g\Big[ \beta \sum_{\mu=1}^{K} \Big( F\big( \xi^\mu_\alpha + \sum_{\gamma \ne \alpha} \xi^\mu_\gamma x_\gamma + \sum_{i=1}^{N} \xi^\mu_i v_i \big) - F\big( -\xi^\mu_\alpha + \sum_{\gamma \ne \alpha} \xi^\mu_\gamma x_\gamma + \sum_{i=1}^{N} \xi^\mu_i v_i \big) \Big) \Big]$
     with output nonlinearity $g(x) = \tanh(x)$.
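A hedged numpy sketch of this classification step. It assumes the classification neurons start at zero, so the $\sum_{\gamma \ne \alpha} \xi^\mu_\gamma x_\gamma$ term drops out, and it uses a rectified polynomial $F$; the names are illustrative, not from the slides:

```python
import numpy as np

def classify(v, xi_v, xi_c, n=3, beta=1.0):
    """One-step DAM classification: compute all c_alpha with g = tanh.

    v    : (N,) visible (pixel) neurons
    xi_v : (K, N) visible part of each memory
    xi_c : (K, C) label part of each memory, C classes
    """
    F = lambda x: np.maximum(x, 0.0) ** n
    base = xi_v @ v                                              # (K,) image contribution per memory
    diff = F(xi_c + base[:, None]) - F(-xi_c + base[:, None])    # (K, C) energy differences
    return np.tanh(beta * diff.sum(axis=0))                      # (C,) outputs c_alpha
```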

  8. Training on the MNIST dataset. [Figure: randomly initialized memories, $\xi^\mu_i \in \mathcal{N}(0,\ 0.1)$, versus the memory vectors constructed by training.]

  9. Main question: What kind of representation of the data has the neural network learned?

  10. Features vs. prototypes in psychology and neuroscience. Feature-matching theory: [figure: recording electrode in the visual area of the brain, electrical signal in response to a stimulus; Hubel, Wiesel, 1959]. Prototype theory: [figure: training set and prototype; Solso, McCarthy, 1981; Wallis et al., Journal of Vision, 2008].

  11. Feature to prototype transition. [Figure: learned memories for increasing power of the interaction vertex, n = 2, 3, 20, 30; small n yields feature detectors, large n yields prototype detectors.]

  12. Feature to prototype transition: classification performance. [Figure: learned memories for n = 2, 3, 20, 30.] MNIST test error rates of 1.44%, 1.51%, 1.61%, and 1.80% across the four values of n, compared with 1.6% for the benchmark of Simard, Steinkraus, Platt, 2003.

  13. Duality with feed-forward nets. The one-step DAM update is equivalent to a two-layer feed-forward network:
      $h_\mu = f\Big( \sum_{i=1}^{N} \xi^\mu_i v_i \Big), \qquad c_\alpha = g\Big( \sum_{\mu=1}^{K} \xi^\mu_\alpha h_\mu \Big)$,
      corresponding to the energy $E = -\sum_{\mu=1}^{K} F\Big( \sum_{i=1}^{N} \xi^\mu_i v_i + \sum_{\alpha=1}^{10} \xi^\mu_\alpha c_\alpha \Big)$.
      Duality rule: the activation function is the derivative of the energy function, $f(x) = F'(x)$.
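A sketch of the dual two-layer feed-forward pass under the same assumptions as above; the constant factor from differentiating $F(x) = \max(x,0)^n$ is absorbed into beta, and the names are mine:

```python
import numpy as np

def dual_feedforward(v, xi_v, xi_c, n=3, beta=1.0):
    """Feed-forward network dual to the one-step DAM update."""
    f = lambda x: np.maximum(x, 0.0) ** (n - 1)   # f = F' up to a constant, for F(x) = max(x, 0)^n
    h = f(xi_v @ v)                               # hidden unit per memory: h_mu = f(xi^mu . v)
    return np.tanh(beta * (xi_c.T @ h))           # c_alpha = g(sum_mu xi^mu_alpha h_mu), g = tanh
```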

  14. Commonly used activation functions. Under the duality, the standard Hopfield net ($n = 2$) corresponds to $f(x) = \mathrm{ReLU}(x)$, and the DAM with interaction power $n$ corresponds to the rectified polynomial $f(x) = \mathrm{ReP}_{n-1}(x)$ of degree $n - 1$.
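The same family of activations as a small sketch; ReLU is the degree-1 member, dual to the n = 2 model:

```python
import numpy as np

def rep(x, m):
    """Rectified polynomial ReP_m(x) = x**m for x > 0, and 0 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, 0.0) ** m

relu = lambda x: rep(x, 1)   # ReLU: the activation dual to the standard (n = 2) Hopfield net
```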

  15. Question: Are there any tasks for which models with higher order interactions perform better than models with quadratic interactions?

  16. Adversarial Inputs. Images are deformed by gradient steps on the pixels, $v_i \rightarrow v_i - \partial C / \partial v_i$. [Figure: adversarial example for the n = 2 model.]
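A framework-agnostic sketch of this deformation loop. The step size eps and the number of steps are assumptions (the slide only shows the update direction, and the plot on the next slide runs to about 80 image updates); grad_C stands for the gradient of the objective, obtained from whatever framework implements the classifier:

```python
import numpy as np

def adversarial_deformation(v, grad_C, eps=0.01, steps=80):
    """Repeatedly deform an image along -dC/dv, as in the slide's update."""
    v = np.array(v, dtype=float, copy=True)
    for _ in range(steps):
        v = v - eps * grad_C(v)      # v_i -> v_i - eps * dC/dv_i
    return v
```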

  17. Adversarial Deformations in DAM. [Figure: $\log(C_\alpha)$ for the first- and second-choice classes versus the number of image updates (up to 80); the curves cross at the decision boundary. Below: adversarially deformed digits produced with n = 2, 3, 20, 30.]

  18. Question: Can we use Dense Associative Memories for classification of high resolution images?

  19. VGG16 coupled to DAM. [Figure: architecture diagram.]

  20. Adversarial Inputs in the Image Domain. [Figure: example images.]

  21. Input transfer. [Figure: adversarial images made with n = 2 and with n = 8, each classified by both the n = 2 and the n = 8 model.]

  22. Error rate of misclassification (adversarial images generated with one model and classified by another):

                      classify n=2   classify n=8
      generate n=2        100%           32%
      generate n=8         57%          100%

  23. The same transfer experiment across n = 2, 3, 20, 30 (rows: model used to generate; columns: model used to test):

                      test n=2   test n=3   test n=20   test n=30
      generate n=2      98.9%      50.7%      9.07%       3.44%
      generate n=3      33.9%      99%        8.71%       3.32%
      generate n=20     45.3%      63.7%      98.9%       5.77%
      generate n=30     37.6%      48.3%      56.9%       98.8%

  24. Results on ImageNet Accuracy: 69%

  25. ImageNet errors. [Figure: misclassified examples, with predicted class labels such as "police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria" and "bell cote, bell cot".]

  26. Summary. Dense Associative Memories, $E = -\sum_{\mu=1}^{K} \big( \sum_{i=1}^{N} \xi^\mu_i \sigma_i \big)^n$, connect several fields: physics (large capacity), psychology and neuroscience (feature to prototype transition), and computer science (no adversarial problems).
