Memory Networks and Neural Turing Machines
Diego Marcheggiani
University of Amsterdam ILLC
Unsupervised Language Learning 2016

Outline
◮ Motivation
◮ Memory Networks
◮ End-to-end Memory Networks
◮ Neural Turing Machines
Motivation
◮ Neural networks have a hard time capturing long-range dependencies.
◮ Memory networks (MN) and Neural Turing machines (NTM) try to overcome this limitation by coupling a network with an external memory.
◮ MNs are mainly motivated by the fact that it is hard to capture the long-range dependencies needed for reasoning over text,
◮ while NTMs are devised to perform program induction.
Memory Networks
◮ We have a neural network; RNN, MLP, it does not matter.
◮ An external memory the neural network can write to and read from.
(Figure: the network writing to and reading from the external memory.)
A memory network has four components (a minimal sketch follows this list):
◮ Input feature map (I): transforms the input into a feature representation.
◮ Generalization (G): writes the input, or a function of it, to the memory.
◮ Output feature map (O): reads the most relevant memory slots given the input.
◮ Response (R): given the information read from the memory, returns the final answer.
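A minimal Python sketch of this four-component interface with the hard (argmax) selection used in the walkthrough below; the featurizer phi, the score matrices U_O and U_R, and the candidate vocabulary are hypothetical stand-ins, and training is omitted:

```python
import numpy as np

class MemoryNetwork:
    """Sketch of the I/G/O/R interface with hard (argmax) attention.

    `phi`, `U_O`, `U_R`, and `vocab` are hypothetical stand-ins for the
    learned feature map, the score matrices, and the answer candidates.
    """

    def __init__(self, phi, U_O, U_R, vocab):
        self.phi, self.U_O, self.U_R = phi, U_O, U_R
        self.vocab = vocab        # candidate answer words
        self.memory = []          # stored feature vectors

    def I(self, x):
        # Input feature map: raw text -> feature vector.
        return self.phi(x)

    def G(self, x):
        # Generalization: write the featurized input to the next free slot.
        self.memory.append(self.I(x))

    def _s(self, U, a, b):
        # Match score s(a, b) = a^T U^T U b.
        return (U @ a) @ (U @ b)

    def O(self, question):
        # Output: hard-select the most relevant memory slot.
        u = self.I(question)
        return max(self.memory, key=lambda m: self._s(self.U_O, u, m))

    def R(self, question, m_o):
        # Response: score every candidate word against question + fact.
        u = self.I(question) + m_o
        return max(self.vocab,
                   key=lambda w: self._s(self.U_R, u, self.phi(w)))
```

Calling G on each story sentence fills the memory; O and R then implement the two argmax selections shown in the example that follows.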
An example:
◮ Input text: Fred moved to the bedroom.
◮ Input question: Where is Dan now?
◮ Output answer: bedroom
Memory m:
Fred moved to the bedroom.
Joe went to the kitchen.
Joe took the milk.
Dan journeyed to the bedroom.

Question: Where is Dan now?
◮ O hard-selects the supporting memory: o = argmax_i s_O(u, m_i), with s_O(u, m_i) = u^T · U_O^T · U_O · m_i.
◮ R then hard-selects a single answer word: r = argmax_{w ∈ W} s_R([u, m_o], w), scored analogously through U_R.
Answer: bedroom

Question: Where is the milk now?
◮ O selects "Joe took the milk." as the supporting memory, and R scores the candidate words through U_R.
Answer: kitchen
Limitations:
◮ Single word as answer.
◮ Need to iterate over the entire memory.
◮ The write component is somewhat naive.
◮ Strongly (fully, extremely) supervised: the supporting facts must be annotated.
End-to-end Memory Networks
◮ The argmax is substituted by a soft attention mechanism.
◮ Less supervised: no need for annotated supporting facts.
Memory m: Fred moved to the bedroom. Joe went to the kitchen. Joe took the milk. Dan journeyed to the bedroom.
Question: Where is Dan now?
◮ The question is embedded with matrix B: u = B · q.
◮ Soft attention over the memory: p_i = softmax(u^T · A · m_i).
◮ Each memory item also gets an output embedding: c_i = C · m_i, combined as o = Σ_i p_i · c_i.
◮ Prediction: â = softmax(W · (o + u)).
Multiple hops. Memory m as before; question: Where is the milk now?
Hop 1:
◮ p1_i = softmax(u1^T · A1 · m_i)
◮ c1_i = C1 · m_i
◮ o1 = Σ_i p1_i · c1_i, and the query is updated: u2 = o1 + u1
Hop 2:
◮ p2_i = softmax(u2^T · A2 · m_i)
◮ c2_i = C2 · m_i
◮ o2 = Σ_i p2_i · c2_i
◮ Prediction: â = softmax(W · (o2 + u2)). Answer: kitchen
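A compact numpy sketch of this multi-hop forward pass; the matrix shapes, the bag-of-words input encoding, and the helper names are assumptions, not taken from the slides:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def memn2n_forward(q, mem, B, A_hops, C_hops, W):
    """End-to-end memory network, K hops (soft attention throughout).

    q      : (d_in,)   question features (e.g. bag-of-words)
    mem    : (N, d_in) memory items m_i as raw feature vectors
    B      : (d, d_in) question embedding matrix
    A_hops : list of K input embedding matrices A_k, each (d, d_in)
    C_hops : list of K output embedding matrices C_k, each (d, d_in)
    W      : (V, d)    final prediction matrix over the vocabulary
    """
    u = B @ q
    for A, C in zip(A_hops, C_hops):
        m = mem @ A.T              # embedded memories  A_k · m_i
        c = mem @ C.T              # output embeddings  C_k · m_i
        p = softmax(m @ u)         # p_i = softmax(u_k^T · A_k · m_i)
        o = p @ c                  # o_k = Σ_i p_i · c_i
        u = o + u                  # u_{k+1} = o_k + u_k
    return softmax(W @ u)          # distribution over answer words
```

With K = 2 hops and trained weights, this is exactly the two-hop computation shown above.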
Neural Turing Machines
◮ Like Turing machines, NTMs have a controller, a memory, a write head, and a read head.
◮ Differently from Memory Networks:
◮ the attention mechanism of NTMs is more sophisticated;
◮ NTMs are natively equipped to rewrite the memory.
Reading
◮ The memory can be updated during both training and testing, at each time step.
◮ w^r_t is the weighting vector over the memory locations at time t; it is emitted by the controller.
◮ The read vector is calculated as:
r_t = Σ_i w^r_t(i) · M_t(i)
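In numpy this read is a one-liner; a minimal sketch, assuming a memory matrix with N rows of width W:

```python
import numpy as np

def ntm_read(M, w_r):
    """r_t = Σ_i w_r(i) · M_t(i).

    M   : (N, W) memory matrix
    w_r : (N,)   read weighting (non-negative, sums to 1)
    """
    return w_r @ M   # convex combination of the memory rows
```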
Writing
◮ w^w_t is the write weighting vector (emitted by the controller).
◮ M_t(i) represents memory location i at time step t.
◮ The write operation is composed of two parts (see the sketch after this list):
◮ erase part: the controller emits an erase vector e_t with elements in (0, 1), which partially wipes each row:
M~_t(i) = M_{t−1}(i) · (1 − w^w_t(i) · e_t)
◮ add part: the controller emits an add vector a_t, which is accumulated into memory:
M_t(i) = M~_t(i) + w^w_t(i) · a_t
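A minimal sketch of the erase-then-add update, under the same shape assumptions as the read sketch (N rows of width W):

```python
import numpy as np

def ntm_write(M, w_w, e, a):
    """Erase-then-add write, applied to every memory row.

    M   : (N, W) memory matrix M_{t-1}
    w_w : (N,)   write weighting
    e   : (W,)   erase vector, elements in (0, 1)
    a   : (W,)   add vector
    """
    M_erased = M * (1.0 - np.outer(w_w, e))  # M~_t(i) = M_{t-1}(i)(1 - w_w(i) e_t)
    return M_erased + np.outer(w_w, a)       # M_t(i)  = M~_t(i) + w_w(i) a_t
```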
Addressing (both mechanisms are sketched in code below)
◮ content-based addressing:
◮ the controller emits a key vector k_t;
◮ the key vector is compared to each memory location via cosine similarity K[·, ·];
◮ β_t is a scalar emitted by the controller that attenuates or amplifies the focus:
w^c_t(i) = softmax_i(β_t · K[k_t, M_t(i)])
◮ location-based addressing:
◮ an interpolation gate g_t blends the content weighting with the previous weighting:
w^g_t = g_t · w^c_t + (1 − g_t) · w_{t−1}
◮ a shift distribution s_t rotates the weighting by circular convolution:
w~_t(i) = Σ_{j=0}^{N−1} w^g_t(j) · s_t(i − j)
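A sketch of both addressing steps; here s_t is assumed to be a full-length distribution over all N circular shifts, whereas implementations typically restrict it to a small window such as shifts of −1, 0, +1:

```python
import numpy as np

def content_weighting(M, k, beta, eps=1e-8):
    """w_c(i) = softmax_i(beta · cosine(k, M(i)))."""
    sim = (M @ k) / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + eps)
    z = np.exp(beta * sim)
    return z / z.sum()

def location_weighting(w_c, w_prev, g, s):
    """Interpolation gate followed by a circular convolutional shift."""
    w_g = g * w_c + (1.0 - g) * w_prev        # w_g = g w_c + (1 - g) w_{t-1}
    N = len(w_g)
    # w~(i) = Σ_j w_g(j) · s(i - j), indices taken modulo N
    return np.array([sum(w_g[j] * s[(i - j) % N] for j in range(N))
                     for i in range(N)])
```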
Controller
◮ It can be a recurrent or a feedforward neural network.
◮ It takes as input a vector x_t and the memory M_t.
◮ Its output consists of:
◮ the emissions for the write and read heads, and the erase and add vectors (a sketch follows).
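A sketch of how a feedforward controller could map its hidden state to one head's emissions; the readout split and the squashing choices (softplus for β_t, sigmoids for g_t and e_t, softmax for s_t) are common implementation conventions assumed here, not specified in the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def head_emissions(h, W_out, b_out, width, n_shifts=3):
    """Split one linear readout of the controller state h into the
    head parameters (k_t, beta_t, g_t, s_t, e_t, a_t)."""
    out = W_out @ h + b_out
    i = 0
    k = out[i:i + width]; i += width                  # key vector k_t
    beta = np.log1p(np.exp(out[i])); i += 1           # strength beta_t >= 0
    g = sigmoid(out[i]); i += 1                       # gate g_t in (0, 1)
    s = softmax(out[i:i + n_shifts]); i += n_shifts   # shift distribution s_t
    e = sigmoid(out[i:i + width]); i += width         # erase e_t in (0, 1)
    a = out[i:i + width]                              # add vector a_t
    return k, beta, g, s, e, a
```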
Tasks:
◮ copy
◮ repeat copy
◮ associative recall
◮ sorting
Related work:
◮ Neural Programmer: Inducing Latent Programs with Gradient Descent
◮ Neural Programmer-Interpreters
◮ Reinforcement Learning Neural Turing Machines - Revised
◮ Neural Random-Access Machines
◮ Neural GPUs Learn Algorithms
◮ Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
◮ The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations
Conclusions:
◮ Memory networks
◮ End-to-end memory networks
◮ Neural Turing machines