Dense Associative Memories and Deep Learning
Dmitry Krotov, IBM Research, MIT-IBM Watson AI Lab, Institute for Advanced Study
Learning · Mechanisms · Architectures
What is associative memory? [Figure: an energy landscape whose local minima are the memories $\xi^1, \xi^2, \xi^3, \xi^4$.]
Standard Associative Memory and Dense Associative Memory

Standard (Hopfield) associative memory:
$$E = -\sum_{i,j=1}^{N} \sigma_i T_{ij}\,\sigma_j, \qquad T_{ij} = \sum_{\mu=1}^{K} \xi^\mu_i \xi^\mu_j$$

Dense Associative Memory (DAM):
$$E = -\sum_{\mu=1}^{K} F\Big(\sum_{i=1}^{N} \xi^\mu_i \sigma_i\Big), \qquad F(x) = x^n, \quad n \ge 2$$

Here $\sigma_i$ are the dynamical variables, $\xi^\mu_i$ the memorized patterns, $N$ the number of neurons, $K$ the number of memories, and $n$ the power of the interaction vertex. For $n = 2$ the energy reduces to
$$E = -\sum_{\mu=1}^{K} \Big(\sum_{i=1}^{N} \xi^\mu_i \sigma_i\Big)^2$$
with capacity $K_{\max} \approx 0.14\,N$, while for general $n$ the capacity grows as $K_{\max} \approx \alpha_n N^{\,n-1}$.
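A minimal NumPy sketch of this energy (not from the talk; the function name and array layout are my choices):

```python
import numpy as np

def dam_energy(sigma, xi, n=2):
    """DAM energy E = -sum_mu (sum_i xi^mu_i * sigma_i)**n.

    sigma : (N,) vector of +/-1 spins (dynamical variables)
    xi    : (K, N) matrix of memorized patterns
    n     : power of the interaction vertex (n = 2 recovers the Hopfield energy)
    """
    overlaps = xi @ sigma          # (K,) overlaps sum_i xi^mu_i sigma_i
    return -np.sum(overlaps ** n)
```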
Update rule:
$$\sigma^{(t+1)}_i = \mathrm{Sign}\Bigg[\sum_{\mu=1}^{K}\bigg(F\Big(\xi^\mu_i + \sum_{j\neq i}\xi^\mu_j\,\sigma^{(t)}_j\Big) - F\Big(-\xi^\mu_i + \sum_{j\neq i}\xi^\mu_j\,\sigma^{(t)}_j\Big)\bigg)\Bigg]$$
for random patterns with statistics $\langle \xi^\mu_i \rangle = 0$ and $\langle \xi^\mu_i\,\xi^\nu_j \rangle = \delta^{\mu\nu}\,\delta_{ij}$.
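A matching sketch of the retrieval dynamics, assuming $\pm 1$ spins and the polynomial $F(x) = x^n$ from the previous slide; the sweep schedule is an illustrative choice:

```python
import numpy as np

def dam_retrieve(sigma, xi, n=2, sweeps=5):
    """Sequential application of the DAM update rule (minimal sketch).

    sigma_i <- Sign[ sum_mu ( F(+xi^mu_i + rest_mu) - F(-xi^mu_i + rest_mu) ) ]
    with rest_mu = sum_{j != i} xi^mu_j sigma_j and F(x) = x**n.
    """
    sigma = sigma.astype(float).copy()
    K, N = xi.shape
    F = lambda x: x ** n
    for _ in range(sweeps):
        for i in range(N):
            rest = xi @ sigma - xi[:, i] * sigma[i]   # (K,) sums over j != i
            drive = np.sum(F(xi[:, i] + rest) - F(-xi[:, i] + rest))
            sigma[i] = 1.0 if drive >= 0 else -1.0
    return sigma
```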
Pattern recognition with DAM: [Figure: the visible layer $v_i$ consists of 784 pixel neurons $x_\alpha$ (a 28×28 image) and 10 classification neurons $c_\alpha$.]
Clamping the pixel neurons and applying the update rule once to the classification neurons gives the output $c_\alpha$:
$$c_\alpha = g\Bigg[\beta\sum_{\mu=1}^{K}\bigg(F\Big(\xi^\mu_\alpha + \sum_{\gamma\neq\alpha}\xi^\mu_\gamma x_\gamma + \sum_{i=1}^{N}\xi^\mu_i v_i\Big) - F\Big(-\xi^\mu_\alpha + \sum_{\gamma\neq\alpha}\xi^\mu_\gamma x_\gamma + \sum_{i=1}^{N}\xi^\mu_i v_i\Big)\bigg)\Bigg], \qquad g(x) = \tanh(x)$$
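A hedged sketch of this one-step classification readout; the value of $\beta$ and the choice to initialize the non-updated class units in the off state (−1) are illustrative assumptions, not values taken from the slides:

```python
import numpy as np

def classify(v, xi_pix, xi_cls, n=2, beta=0.5):
    """One-step update of the classification neurons with pixels clamped.

    v      : (N,) clamped pixel values
    xi_pix : (K, N) pixel part of the memories
    xi_cls : (K, C) label part of the memories
    """
    K, C = xi_cls.shape
    x = -np.ones(C)                       # class units start "off" -- an assumption
    F = lambda s: s ** n
    base = xi_pix @ v + xi_cls @ x        # (K,) includes the alpha term as well
    c = np.empty(C)
    for a in range(C):
        rest = base - xi_cls[:, a] * x[a]  # pixel term + sum over gamma != alpha
        c[a] = np.tanh(beta * np.sum(F(xi_cls[:, a] + rest)
                                     - F(-xi_cls[:, a] + rest)))
    return c                               # (C,) outputs in (-1, 1); argmax = class
```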
Training on the MNIST dataset: [Figure: memory vectors before and after training; the memories are initialized as random vectors with $\xi^\mu_i \in \mathcal{N}(0, 0.1)$ and become structured after training.]
Main question: What kind of representation of the data has the neural network learned?
Features vs. prototypes in psychology and neuroscience: feature-matching theory (Hubel & Wiesel, 1959) vs. prototype theory (Solso & McCarthy, 1981; Wallis et al., Journal of Vision, 2008). [Figure: a recording electrode measuring the electrical signal from the visual area of the brain in response to a stimulus; a training set of face prototypes.]
Feature to prototype transition: [Figure: learned memories for n = 2, 3, 20, 30; at small n the memories look like feature detectors, at large n like prototype detectors.]

On MNIST, the test errors of the four models (n = 2, 3, 20, 30) lie between 1.44% and 1.80%, on par with the 1.6% reported for a comparable backpropagation network (Simard, Steinkraus, Platt, 2003).
Duality with feed-forward nets: one step of the DAM update is equivalent to a feed-forward network with one hidden layer,
$$h_\mu = f\Big(\sum_{i=1}^{N}\xi^\mu_i v_i\Big), \qquad c_\alpha = g\Big(\sum_{\mu=1}^{K}\xi^\mu_\alpha h_\mu\Big),$$
derived from the energy
$$E = -\sum_{\mu=1}^{K} F\Big(\sum_{i=1}^{N}\xi^\mu_i v_i + \sum_{\alpha=1}^{10}\xi^\mu_\alpha c_\alpha\Big).$$
Duality rule: the activation function of the feed-forward net is the derivative of the energy function of the DAM, $f(x) = F'(x)$.
Commonly used activation functions: the standard DAM with power $n$ is dual to a rectified polynomial activation $f(x) = \mathrm{ReP}^{\,n-1}(x) = \max(x, 0)^{\,n-1}$; for $n = 2$ (the standard Hopfield net) this is $f(x) = \mathrm{ReLU}(x)$.
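The duality can be made concrete in a few lines; `rep`, `dual_forward`, and the array layout are my names, and $g = \tanh$ follows the earlier slide:

```python
import numpy as np

def rep(x, power):
    """Rectified polynomial: max(x, 0)**power (ReLU when power == 1)."""
    return np.maximum(x, 0.0) ** power

def dual_forward(v, xi_pix, xi_cls, n=2):
    """Feed-forward view of the one-step DAM update (a sketch).

    h_mu    = f(sum_i xi^mu_i v_i), with f = F' a rectified polynomial
              of power n - 1
    c_alpha = g(sum_mu xi^mu_alpha h_mu), with g = tanh
    """
    h = rep(xi_pix @ v, n - 1)     # (K,) one hidden unit per memory
    return np.tanh(xi_cls.T @ h)   # (C,) class outputs
```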
Question: Are there any tasks for which models with higher order interactions perform better than models with quadratic interactions?
Adversarial Inputs: gradient descent on the pixels, $v_i \to v_i - \partial C / \partial v_i$, deforms an image until the model's prediction flips. [Figure: a "2" deformed until the n = 2 model reads it as a "3".]
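A minimal sketch of this deformation loop; the step size, iteration count, and pixel range are illustrative assumptions, and `grad_C` stands in for whatever gradient of the cost the model provides:

```python
import numpy as np

def adversarial_deform(v, grad_C, eps=0.01, steps=100):
    """Deform an image by gradient descent on the pixels: v <- v - eps * dC/dv.

    grad_C : callable returning dC/dv, the gradient of the model's cost C
             with respect to the pixels.
    """
    v = v.copy()
    for _ in range(steps):
        v = np.clip(v - eps * grad_C(v), -1.0, 1.0)  # pixel range is an assumption
    return v
```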
Adversarial Deformations in DAM: [Figure: $\log(C_\alpha)$ for the two leading classes ($C_{\mathrm{1st}}$, $C_{\mathrm{2nd}}$) vs. the number of image updates; the misclassification occurs where the trajectories cross the decision boundary.] [Figure: adversarial deformations of test digits for n = 2, 3, 20, 30; at small n the deformed images still look like the original digit, while at large n they visibly morph toward the target class.]
Question: Can we use Dense Associative Memories for classification of high resolution images?
VGG16 coupled to DAM: [Figure: VGG16 convolutional features feeding into the DAM classification layer.]
Adversarial Inputs in the Image Domain
Input transfer: [Figure: grids of adversarial images made with n = 2 and with n = 8, each classified by both the n = 2 and the n = 8 model.]
Error rate of misclassification:

Generate \ Classify   n=2     n=8
n=2                   100%    32%
n=8                   57%     100%
The same transfer experiment on MNIST (generate adversarial images with one model, test on another):

Generate \ Test   n=2      n=3      n=20     n=30
n=2               98.9%    50.7%    9.07%    3.44%
n=3               33.9%    99%      8.71%    3.32%
n=20              45.3%    63.7%    98.9%    5.77%
n=30              37.6%    48.3%    56.9%    98.8%
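A sketch of how such a transfer table could be assembled, reusing `adversarial_deform` from the sketch above; the `.grad_cost` and `.predict` methods are hypothetical stand-ins for whatever interface the two models expose:

```python
def transfer_error(images, labels, gen_model, test_model, eps=0.01, steps=100):
    """Fraction of adversarial images made against gen_model that also fool
    test_model (one cell of the transfer table)."""
    fooled = 0
    for v, y in zip(images, labels):
        v_adv = adversarial_deform(v, gen_model.grad_cost, eps, steps)
        if test_model.predict(v_adv) != y:
            fooled += 1
    return fooled / len(images)
```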
Results on ImageNet Accuracy: 69%
ImageNet errors: [Figure: misclassified examples, with ground-truth labels such as "police van, police wagon, paddy wagon, patrol wagon, black Maria" and "bell cote, bell cot".]
Summary: Dense Associative Memories
$$E = -\sum_{\mu=1}^{K}\Big(\sum_{i=1}^{N}\xi^\mu_i \sigma_i\Big)^n$$
Physics: large capacity. Psychology and neuroscience: feature-to-prototype transition. Computer science: no adversarial problems.