A Vector-Symbolic Architecture for Representing Scene Structure


  1. A Vector-Symbolic Architecture for Representing Scene Structure Eric Weiss Redwood Center for Theoretical Neuroscience UC Berkeley 4/15/2016

  2. Active Vision

  3. Active Vision Advantages: - Selective processing saves computation - Should scale well with image size

  4. Recurrent Network Classification

  5. Example Trial [Figure: glimpse windows and glimpse content (input to the neural net) at t = 0, 1, 2, 3]

  6. Issues Network training objective: locate a specific digit (“find the 4”). This approach does not achieve good performance and is difficult to debug.

  7. Goal: a memory architecture for recurrent neural networks that: - can represent a set of objects and their positions in the image - is differentiable - is easy to analyze

  8. [Figure: glimpse sequence at t = 0, 1, 2]

  9. Vector-Symbolic Representations Everything (object identities, positions, etc.) is represented by an n-dimensional vector (Kanerva 2009; Plate 2003; Danihelka et al. 2016, Google DeepMind). Encoding locations (r): we need a continuous function r = L(x, y)
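The slides require only that L be continuous and do not give its concrete form. A minimal sketch of one standard choice, fractional power encoding with random complex phasor base vectors (the dimensionality n and the uniform phase distribution are assumptions here, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024  # vector dimensionality (arbitrary choice)

# Random base phase vectors for the x and y axes.
theta_x = rng.uniform(-np.pi, np.pi, n)
theta_y = rng.uniform(-np.pi, np.pi, n)

def L(x, y):
    """Continuous location code: raise the base phasor vectors to
    fractional powers x and y (element-wise complex exponentials)."""
    return np.exp(1j * (x * theta_x + y * theta_y))

def sim(a, b):
    """Normalized similarity between two phasor vectors."""
    return np.real(np.vdot(a, b)) / n

# Nearby locations map to similar vectors; distant ones are ~orthogonal.
print(sim(L(0.5, 0.5), L(0.5, 0.5)))   # exactly 1
print(sim(L(0.5, 0.5), L(0.52, 0.5)))  # close to 1
print(sim(L(0.5, 0.5), L(5.0, 5.0)))   # near 0
```

The smooth fall-off of similarity with distance is what makes the code continuous in (x, y) rather than a discrete lookup.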

  10. Vector-Symbolic Representations Association (binding): two vectors can be bound by multiplying them element-wise, forming a new vector m = v * r. The result m will be nearly orthogonal to any other meaningful vector, so binding ≅ scrambling.
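A small numerical check of this claim, sketched with random unit-modulus phasor vectors (the complex-phasor representation is suggested by the complex-conjugate inverse on slide 12; the choice n = 1024 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1024

def rand_phasor():
    """Random unit-modulus complex vector, one phase per element."""
    return np.exp(1j * rng.uniform(-np.pi, np.pi, n))

def sim(a, b):
    """Normalized similarity between two phasor vectors."""
    return np.real(np.vdot(a, b)) / n

v = rand_phasor()  # e.g. an object-identity vector
r = rand_phasor()  # e.g. a location vector

m = v * r  # binding: element-wise multiplication

# The bound vector is nearly orthogonal to both of its factors
# ("binding scrambles"): both similarities are close to 0.
print(sim(m, v), sim(m, r))
```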

  11. Vector-Symbolic Representations Storing a set (“scene vector”): a set of items is constructed by summing the bound vectors: m = v0 * r0 + v1 * r1 + v2 * r2

  12. Vector-Symbolic Representations Retrieval (un-binding): the information associated with a vector r0 can be recovered by multiplying the memory vector m by its inverse r0^-1 (the complex conjugate): m * r0^-1 = (v0 * r0 + v1 * r1 + v2 * r2) * r0^-1 = v0 * r0 * r0^-1 + v1 * r1 * r0^-1 + v2 * r2 * r0^-1 = v0 + “noise”
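The algebra above can be verified numerically. A sketch with random phasor vectors, where the inverse of each unit-modulus vector is its element-wise complex conjugate (vector dimensionality is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2048

def rand_phasor():
    """Random unit-modulus complex vector."""
    return np.exp(1j * rng.uniform(-np.pi, np.pi, n))

def sim(a, b):
    """Normalized similarity between two phasor vectors."""
    return np.real(np.vdot(a, b)) / n

# Three object vectors bound to three location vectors, then summed.
v = [rand_phasor() for _ in range(3)]
r = [rand_phasor() for _ in range(3)]
m = v[0] * r[0] + v[1] * r[1] + v[2] * r[2]

# Un-binding: multiply by conj(r0) = r0^-1. The matching term collapses
# to v0; the cross-terms remain scrambled and act like noise.
est = m * np.conj(r[0])

print(sim(est, v[0]))  # close to 1: v0 is recovered
print(sim(est, v[1]))  # near 0: other items stay buried in noise
```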

  13. Encoding a Scene as a Vector (one glimpse per step, t = 0, 1, 2): v0 * L(x0, y0) + v1 * L(x1, y1) + v2 * L(x2, y2) = m
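Putting the pieces together, a hedged sketch of building such a scene vector and querying it by location (the digit labels, coordinates, and encoding details are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2048

# Location code L(x, y): fractional powers of random base phasors.
theta_x = rng.uniform(-np.pi, np.pi, n)
theta_y = rng.uniform(-np.pi, np.pi, n)

def L(x, y):
    return np.exp(1j * (x * theta_x + y * theta_y))

def rand_phasor():
    return np.exp(1j * rng.uniform(-np.pi, np.pi, n))

def sim(a, b):
    return np.real(np.vdot(a, b)) / n

# Hypothetical digit-identity vectors placed at three image locations.
digits = {"4": rand_phasor(), "7": rand_phasor(), "9": rand_phasor()}
scene = (digits["4"] * L(0.2, 0.3)
         + digits["7"] * L(0.8, 0.1)
         + digits["9"] * L(0.5, 0.9))

# Query "what is at (0.2, 0.3)?": unbind with that location's code,
# then pick the most similar known identity vector.
query = scene * np.conj(L(0.2, 0.3))
best = max(digits, key=lambda d: sim(query, digits[d]))
print(best)  # "4"
```

This is exactly the "report digit identity when queried with a location" task the next slide trains on, reduced to its vector algebra.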

  14. Dataset & Learning - Trained on a toy “multi-MNIST” dataset to report digit identity when queried with a location

  15. Glimpse Selection Strategy - Use reinforcement learning to obtain a glimpse policy

  16.–25. background [image-only slides]

  26. Intelligent vs. Random Glimpses [Plot: classification accuracy vs. glimpse number]

  27. Summary / Contributions ● Introduced a novel spatial memory framework for image processing and neural networks ● Demonstrated its effectiveness on a toy dataset ● Showed how it can be used to intelligently guide visual attention

  28. Future Work ● Variable glimpse scaling ● Incorporation in an end-to-end trainable recurrent neural network ● Application to real-world datasets with much larger numbers of object categories

  29. Thanks for listening! Many thanks to: Stephen Tyree Shalini Gupta Pavlo Molchanov Brian Cheung Bruno Olshausen Jan Kautz for support and valuable discussions.

  30. Differentiable Glimpse Mechanism [Figure] I: image; S(x, y): sampling matrix; x, y: glimpse coordinates; g: glimpse output
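The slide names the pieces (image I, sampling matrix S(x, y), glimpse output g) but not how S is built. One common differentiable construction, used for example in DRAW-style attention, forms Gaussian interpolation filters centered on the glimpse coordinates; the sketch below assumes that form and invents the sizes and sigma:

```python
import numpy as np

def sampling_filters(center, img_size, glimpse_size, sigma=1.0):
    """Gaussian interpolation filters of shape (glimpse_size, img_size).
    Row i picks out an evenly spaced sample point around `center`.
    Every entry is a smooth function of `center`, so the glimpse is
    differentiable with respect to the glimpse coordinates."""
    offsets = np.arange(glimpse_size) - (glimpse_size - 1) / 2.0
    mu = center + offsets                      # sample centers
    pix = np.arange(img_size)                  # pixel positions
    F = np.exp(-0.5 * ((pix[None, :] - mu[:, None]) / sigma) ** 2)
    return F / (F.sum(axis=1, keepdims=True) + 1e-8)  # normalize rows

def glimpse(I, x, y, size=8):
    """Extract a size x size glimpse g at continuous coordinates (x, y)
    by applying vertical and horizontal filter banks to the image I."""
    Fx = sampling_filters(x, I.shape[1], size)  # horizontal filters
    Fy = sampling_filters(y, I.shape[0], size)  # vertical filters
    return Fy @ I @ Fx.T                        # g: (size, size)

# Toy check: a single bright pixel lands inside a centered glimpse.
I = np.zeros((32, 32))
I[16, 16] = 1.0
g = glimpse(I, 16.0, 16.0, size=8)
print(g.shape)  # (8, 8)
```

Because g depends smoothly on (x, y), gradients can flow back through the glimpse coordinates, which is what makes end-to-end training of the attention mechanism possible.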
