A Vector-Symbolic Architecture for Representing Scene Structure
Eric Weiss, Redwood Center for Theoretical Neuroscience, UC Berkeley
4/15/2016
Active Vision
Active Vision
Advantages:
- Selective processing saves computation
- Should scale well with image size
Recurrent Network Classification
Example Trial
[figure: glimpse windows at t = 0–3 and the corresponding glimpse content (input to neural net)]
Issues
- Network training objective: locate a specific digit (“find the 4”)
- Does not achieve good performance; difficult to debug
Goal: A memory architecture for recurrent neural networks that:
- can represent a set of objects and their positions in the image
- is differentiable
- is easy to analyze
[figure: sequence of glimpses at t = 0, 1, 2]
Vector-Symbolic Representations
Everything (object identities, positions, etc.) is represented by an n-dimensional vector. (Kanerva 2009; Plate 2003; Danihelka et al. 2016, Google DeepMind)
Encoding Locations (r)
- need a continuous function: r = L(x, y)
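The slides do not say how L is implemented; below is a minimal sketch of one common choice, assuming FHRR-style complex phasor vectors and fractional power binding. The names random_phasor, X_base, Y_base, and the specific form of L are illustrative assumptions, not taken from the talk.

```python
import numpy as np

n = 1024  # dimensionality of every vector in the architecture

def random_phasor(n, rng):
    # FHRR-style vector: each component is a unit-modulus complex number
    return np.exp(1j * rng.uniform(-np.pi, np.pi, n))

rng = np.random.default_rng(0)
X_base = random_phasor(n, rng)  # illustrative base vector for the x axis
Y_base = random_phasor(n, rng)  # illustrative base vector for the y axis

def L(x, y):
    # Fractional power binding: raising a phasor vector to a real power
    # rotates each component's phase, so L varies smoothly with (x, y).
    return (X_base ** x) * (Y_base ** y)

r = L(0.3, -0.7)  # an n-dimensional code for position (0.3, -0.7)
```

Because each phase rotates smoothly with x and y, nearby locations map to similar vectors while distant locations map to nearly orthogonal ones.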
Vector-Symbolic Representations
Association (Binding)
- can bind two vectors by multiplying element-wise, forming a new vector: m = v * r
- m will be nearly orthogonal to any other meaningful vector. So, binding ≅ scrambling
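A quick numerical check of the scrambling property, again assuming unit-modulus phasor vectors; the helper names (phasor, sim) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
phasor = lambda: np.exp(1j * rng.uniform(-np.pi, np.pi, n))

def sim(a, b):
    # Normalized similarity between two phasor vectors
    return np.real(np.vdot(a, b)) / len(a)

v = phasor()  # an object-identity vector
r = phasor()  # a location vector
m = v * r     # binding: element-wise product

print(sim(m, v), sim(m, r))  # both close to 0: the bound vector looks like noise
```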
Vector-Symbolic Representations
Storing a Set (“scene vector”)
- set of items is constructed by summing vectors: m = v_0 * r_0 + v_1 * r_1 + v_2 * r_2
Vector-Symbolic Representations
Retrieval (Un-binding)
- can recover the information associated with a vector r_0 by multiplying the memory vector m by its inverse, the complex conjugate r_0^{-1}:
  m * r_0^{-1} = (v_0 * r_0 + v_1 * r_1 + v_2 * r_2) * r_0^{-1}
              = v_0 * r_0 * r_0^{-1} + v_1 * r_1 * r_0^{-1} + v_2 * r_2 * r_0^{-1}
              = v_0 + “noise”
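Putting binding, superposition, and unbinding together, a minimal sketch assuming phasor vectors, for which the inverse r_0^{-1} is simply the complex conjugate:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1024
phasor = lambda: np.exp(1j * rng.uniform(-np.pi, np.pi, n))
sim = lambda a, b: np.real(np.vdot(a, b)) / n

# Three item vectors and three location vectors
v = [phasor() for _ in range(3)]
r = [phasor() for _ in range(3)]

# Scene vector: superposition of bound (item, location) pairs
m = v[0] * r[0] + v[1] * r[1] + v[2] * r[2]

# Unbind r_0: for unit-modulus phasors the inverse is the complex conjugate
retrieved = m * np.conj(r[0])

print([round(sim(retrieved, vi), 2) for vi in v])
# high similarity to v[0], near zero for v[1] and v[2]
```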
Encoding a Scene as a Vector
[figure: glimpses at t = 0, 1, 2]
m = v_1 * L(x_0, y_0) + v_2 * L(x_1, y_1) + v_3 * L(x_2, y_2)
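A hedged sketch of the scene encoding implied by this slide, under the same phasor assumptions: each identity vector is bound to its encoded location, the bound pairs are summed, and a location query is answered by unbinding and cleaning up against an identity codebook (the codebook and the specific digits below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2048
phasor = lambda: np.exp(1j * rng.uniform(-np.pi, np.pi, n))

X_base, Y_base = phasor(), phasor()
L = lambda x, y: (X_base ** x) * (Y_base ** y)  # location code, as sketched earlier

# Illustrative codebook of digit-identity vectors (three classes)
codebook = {d: phasor() for d in (4, 7, 9)}

# Scene vector built from (identity, x, y) triples
scene = [(4, 0.2, 0.8), (7, -0.5, 0.1), (9, 0.6, -0.4)]
m = sum(codebook[d] * L(x, y) for d, x, y in scene)

# Query: "what is at (0.2, 0.8)?" -> unbind the location, clean up against the codebook
query = m * np.conj(L(0.2, 0.8))
best = max(codebook, key=lambda d: np.real(np.vdot(codebook[d], query)))
print(best)  # 4
```

Querying at a stored location recovers the matching identity with high similarity; querying an empty location returns only noise.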
Dataset & Learning
- Trained on toy “multi-MNIST” dataset to report digit identity when queried with a location
Glimpse Selection Strategy
- Use reinforcement learning to obtain glimpse policy
Intelligent vs. Random Glimpses
[figure: classification accuracy vs. glimpse number]
Summary / Contributions
● Introduced a novel spatial memory framework for image processing and neural networks
● Demonstrated its effectiveness on a toy dataset
● Showed how it can be used to intelligently guide visual attention
Future Work
● Variable glimpse scaling
● Incorporation in an end-to-end trainable recurrent neural network
● Application to real-world datasets with much larger numbers of object categories
Thanks for listening!
Many thanks to Stephen Tyree, Shalini Gupta, Pavlo Molchanov, Brian Cheung, Bruno Olshausen, and Jan Kautz for support and valuable discussions.
Differentiable Glimpse Mechanism
I : Image
S(x, y) : sampling matrix
x, y : glimpse coordinates (in [-1, 1])
g : Glimpse output
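The slide does not give the form of S(x, y); below is a minimal sketch assuming separable Gaussian interpolation filters (a DRAW-style attention window), which makes the glimpse output g differentiable with respect to the glimpse coordinates. gaussian_filterbank, glimpse, sigma, and stride are illustrative names and defaults:

```python
import numpy as np

def gaussian_filterbank(center, glimpse_size, image_size, sigma=1.0, stride=2.0):
    # One row per glimpse pixel: Gaussian interpolation weights over image pixels.
    # `center` is a glimpse coordinate in [-1, 1], mapped to pixel coordinates.
    mu0 = (center + 1.0) * (image_size - 1) / 2.0
    offsets = (np.arange(glimpse_size) - (glimpse_size - 1) / 2.0) * stride
    mu = mu0 + offsets                                   # filter centers, in pixels
    pix = np.arange(image_size)
    F = np.exp(-((pix[None, :] - mu[:, None]) ** 2) / (2.0 * sigma ** 2))
    return F / (F.sum(axis=1, keepdims=True) + 1e-8)     # normalize each row

def glimpse(I, x, y, glimpse_size=8):
    # g = F_y @ I @ F_x.T varies smoothly with (x, y), so gradients can flow
    # back into the glimpse coordinates.
    Fy = gaussian_filterbank(y, glimpse_size, I.shape[0])
    Fx = gaussian_filterbank(x, glimpse_size, I.shape[1])
    return Fy @ I @ Fx.T

I = np.random.rand(28, 28)   # toy image
g = glimpse(I, x=0.3, y=-0.2)
print(g.shape)               # (8, 8)
```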