A Vector-Symbolic Architecture for Representing Scene Structure


SLIDE 1

A Vector-Symbolic Architecture for Representing Scene Structure

Eric Weiss

Redwood Center for Theoretical Neuroscience UC Berkeley 4/15/2016

SLIDE 2

Active Vision

SLIDE 3

Active Vision

Advantages:

  • Selective processing saves computation
  • Should scale well with image size
SLIDE 4

Recurrent Network

Classification

SLIDE 5

Example Trial

(figure: glimpse windows and glimpse content, the input to the neural net, at t = 0 through t = 3)

SLIDE 6

Issues

Network training objective: locate a specific digit (e.g., “find the 4”). The network does not achieve good performance and is difficult to debug.

SLIDE 7

Goal:

A memory architecture for recurrent neural networks that:

  • can represent a set of objects and their positions in the image
  • is differentiable
  • is easy to analyze
SLIDE 8

(figure: building up a scene representation over glimpses at t = 0, 1, 2)

SLIDE 9

Vector-Symbolic Representations

Everything (object identities, positions, etc.) is represented by an n-dimensional vector.

Kanerva 2009, Plate 2003, Danihelka et al. 2016 (Google DeepMind)

Encoding Locations (r)

  • need a continuous function: r = L(x, y)
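
A minimal numpy sketch of one standard choice for L (the slide does not specify it): fractional power encoding with unit-modulus complex phasors, which is consistent with the complex-conjugate unbinding on Slide 12. The dimensionality n and the random base phases are illustrative assumptions.

```python
import numpy as np

n = 1024                         # vector dimensionality (illustrative choice)
rng = np.random.default_rng(0)

# Fixed random base phases, one set per spatial axis.
phi_x = rng.uniform(-np.pi, np.pi, n)
phi_y = rng.uniform(-np.pi, np.pi, n)

def L(x, y):
    """Continuous location code r = L(x, y): a unit-modulus complex phasor
    vector whose phases vary smoothly with (x, y), so nearby locations map
    to similar vectors."""
    return np.exp(1j * (x * phi_x + y * phi_y))
```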
SLIDE 10

Vector-Symbolic Representations

Association (Binding)

  • can bind two vectors by multiplying element-wise, forming a new vector m = v*r
  • m will be nearly orthogonal to any other meaningful vector. So, binding ≅ scrambling
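
Continuing the sketch above (reusing n, rng, and L), binding is a single element-wise product, and the result is uncorrelated with either factor:

```python
# Reuses n, rng, and L(x, y) from the previous sketch.
v = np.exp(1j * rng.uniform(-np.pi, np.pi, n))   # random object-identity vector
r = L(0.3, -0.7)                                 # location code
m = v * r                                        # binding: element-wise product

# Normalized similarity: ~1 for identical vectors, ~0 for unrelated ones.
def sim(a, b):
    return np.real(np.vdot(a, b)) / n

print(sim(m, v), sim(m, r))   # both near 0: binding "scrambles"
```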

SLIDE 11

Vector-Symbolic Representations

Storing a Set (“scene vector”)

  • set of items is constructed by summing vectors:

m = v0*r0 + v1*r1 + v2*r2
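
In the same sketch, the scene vector is just a superposition (element-wise sum) of bound pairs; the identities and locations below are hypothetical:

```python
# Three hypothetical object-identity vectors and their location codes.
v0, v1, v2 = (np.exp(1j * rng.uniform(-np.pi, np.pi, n)) for _ in range(3))
r0, r1, r2 = L(0.1, 0.2), L(0.5, 0.9), L(0.8, 0.4)

m = v0 * r0 + v1 * r1 + v2 * r2   # scene vector: sum of bound pairs
```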

SLIDE 12

Vector-Symbolic Representations

Retrieval (Un-binding)

  • can recover the information associated with a vector r0 by multiplying the memory vector m element-wise by its inverse, the complex conjugate r0⁻¹:

m*r0⁻¹ = (v0*r0 + v1*r1 + v2*r2)*r0⁻¹
       = v0*r0*r0⁻¹ + v1*r1*r0⁻¹ + v2*r2*r0⁻¹
       = v0 + “noise”
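
Continuing the sketch: for unit-modulus phasors the inverse of r0 is its complex conjugate, so retrieval is one more element-wise product plus a similarity check:

```python
# Unbind location r0; conj(r0) = r0^-1 because every component has modulus 1.
est = m * np.conj(r0)              # = v0 + crosstalk "noise"

print(sim(est, v0))                # near 1: v0 is recovered
print(sim(est, v1), sim(est, v2))  # near 0: other items appear as noise
```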
SLIDE 13

(figure: the scene vector accumulated over glimpses at t = 0, 1, 2)

Encoding a Scene as a Vector

v0*L(x0, y0) + v1*L(x1, y1) + v2*L(x2, y2) = m
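
A small end-to-end sketch of this encode-and-query loop, reusing L and sim from the earlier sketches, with a hypothetical codebook of digit-identity vectors standing in for the network's learned identities:

```python
# Hypothetical codebook: one random identity phasor per digit class.
digits = {d: np.exp(1j * rng.uniform(-np.pi, np.pi, n)) for d in range(10)}

# Scene: a "3" at (0.2, 0.6) and a "7" at (0.7, 0.3).
m = digits[3] * L(0.2, 0.6) + digits[7] * L(0.7, 0.3)

# Query: what is at (0.7, 0.3)?  Unbind, then take the nearest codebook entry.
est = m * np.conj(L(0.7, 0.3))
print(max(digits, key=lambda d: sim(digits[d], est)))   # -> 7
```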

SLIDE 14

Dataset & Learning

  • Trained on toy “multi-MNIST” dataset to report digit identity when queried with a location

SLIDE 15

Glimpse Selection Strategy

  • Use reinforcement learning to obtain glimpse policy
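
The slides do not say which RL algorithm is used; below is a minimal score-function (REINFORCE) sketch for a Gaussian glimpse policy over image coordinates, with a toy reward standing in for the actual task reward:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, lr = np.zeros(2), 0.1, 0.01   # policy: glimpse (x, y) ~ N(mu, sigma^2)

for step in range(2000):
    g = rng.normal(mu, sigma)                        # sample a glimpse location
    reward = -np.sum((g - np.array([0.7, 0.3]))**2)  # toy stand-in reward
    # REINFORCE: grad of log N(g; mu, sigma) w.r.t. mu is (g - mu) / sigma^2.
    mu += lr * reward * (g - mu) / sigma**2

print(mu)   # drifts toward the high-reward region near (0.7, 0.3)
```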

SLIDES 16-25

(background/figure-only slides; no extractable text content)

SLIDE 26

Intelligent vs. Random Glimpses

(figure: classification accuracy vs. glimpse number, comparing intelligent and random glimpse policies)

SLIDE 27

Summary / Contributions

  • Introduced a novel spatial memory framework for image processing and neural networks
  • Demonstrated its effectiveness on a toy dataset
  • Showed how it can be used to intelligently guide visual attention

SLIDE 28

Future Work

  • Variable glimpse scaling
  • Incorporation in an end-to-end trainable recurrent neural network
  • Application to real-world datasets with much larger numbers of object categories

SLIDE 29

Thanks for listening!

Many thanks to Stephen Tyree, Shalini Gupta, Pavlo Molchanov, Brian Cheung, Bruno Olshausen, and Jan Kautz for support and valuable discussions.

SLIDE 30

Differentiable Glimpse Mechanism

I: image
S(x, y): sampling matrix
x, y: glimpse coordinates
g: glimpse output

(figure: example sampling-matrix weights)
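
A hedged numpy sketch of one such mechanism (DRAW-style Gaussian interpolation; the exact form on the slide is not recoverable). The sampling matrices are smooth functions of the glimpse coordinates, so the glimpse output is differentiable in x and y:

```python
import numpy as np

def glimpse(I, x, y, size=8, sigma=1.0):
    """Differentiable glimpse g from image I at coordinates (x, y).
    Gaussian interpolation weights make g smooth in x and y (a sketch,
    not necessarily the slide's exact mechanism)."""
    H, W = I.shape
    gy = y + np.arange(size) - size / 2    # row centers of the glimpse grid
    gx = x + np.arange(size) - size / 2    # column centers
    # Weight from each glimpse row/column to each image row/column.
    Sy = np.exp(-0.5 * ((gy[:, None] - np.arange(H)[None, :]) / sigma) ** 2)
    Sx = np.exp(-0.5 * ((gx[:, None] - np.arange(W)[None, :]) / sigma) ** 2)
    Sy /= Sy.sum(axis=1, keepdims=True)    # normalize interpolation weights
    Sx /= Sx.sum(axis=1, keepdims=True)
    return Sy @ I @ Sx.T                   # (size, size) glimpse output
```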