

  1. Associative memories (9/25/2014)

  2. Memorized associations are ubiquitous. Stimulus → Response: “Bill”.

  3. Memorized associations are ubiquitous. Stimulus → Response: “Bill”. Key properties: • noise tolerance (generalization) • graceful saturation • high capacity.

  4. First attempts: Holography (van Heerden, 1963; Willshaw & Longuet-Higgins, 1960s). Storage is the convolution of the response with the stimulus, $\varphi_{r,s}(x) = (r \star s)(x) = \int r(\zeta)\, s(x-\zeta)\, d\zeta$, and retrieval correlates the stimulus with the stored trace, $\hat{r}(x) = \int s(\tau)\, \varphi_{r,s}(\tau + x)\, d\tau$. Mathematically, this is the convolution-correlation scheme from class 4.
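A minimal numerical sketch of this convolution-correlation idea (the signal length, Gaussian signals, and circular convolution are illustrative choices, not from the slides): store the pair as the convolution of stimulus and response, then correlate the stimulus with the trace to recover a noisy copy of the response.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
s = rng.standard_normal(n)          # stimulus signal
r = rng.standard_normal(n)          # response signal

# Storage: circular convolution of r and s (the trace phi_{r,s}).
phi = np.real(np.fft.ifft(np.fft.fft(r) * np.fft.fft(s)))

# Retrieval: circular correlation of the stimulus with the trace.
r_hat = np.real(np.fft.ifft(np.conj(np.fft.fft(s)) * np.fft.fft(phi)))

# Substantially positive: r_hat is a noisy copy of r.
print(np.corrcoef(r, r_hat)[0, 1])
```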

  5. Matrix memories (Steinbuch, 1962; Willshaw et al., 1969). Before long, it was realized that better results could be obtained with a simpler, more neurally plausible framework. Let’s explore a simple Hebbian scheme: we have input (stimulus) lines and output (response) lines, and we strengthen a synapse when its input and output lines are on together.

  6. Storage

  7. Storage

  8. Storage

  9. Storage

  10. Storage What happens here depends on the specific choice of learning rule.

  11. Storage. Additive Hebb rule: $M = \sum_{i=1}^{n} R_i S_i^T$. (Figure: stimulus patterns $S_1, S_2, S_3$ on the input lines, response patterns $R_1, R_2, R_3$ on the output lines.) What happens here depends on the specific choice of learning rule.

  12. Storage. Additive Hebb rule: $M = \sum_{i=1}^{n} R_i S_i^T = R S^T$. (Figure: the memory matrix $M$ built up from the stimulus patterns $S_1, S_2, S_3$ and response patterns $R_1, R_2, R_3$.) What happens here depends on the specific choice of learning rule.

  13. Storage. Additive Hebb rule: $M = \sum_{i=1}^{n} R_i S_i^T = R S^T$. (Figure: the stimulus $S_2$ applied to the memory matrix $M$.)

  14. Storage. Additive Hebb rule: $M = \sum_{i=1}^{n} R_i S_i^T = R S^T$. (Figure: applying $S_2$ to $M$ yields the estimate $\hat{R}_2$.)

  15. Retrieval. Additive Hebb rule: $M = \sum_{i=1}^{n} R_i S_i^T = R S^T$. (Figure: the stimulus $S_1$ applied to the memory matrix $M$.)

  16. Retrieval. With $M = \sum_{i=1}^{n} R_i S_i^T$, probing with $S_j$ gives $\hat{R}_j = M S_j = \sum_{i=1}^{n} R_i S_i^T S_j$. (Figure: applying $S_1$ to $M$ yields $\hat{R}_1$.)

  17. Retrieval. With $M = \sum_{i=1}^{n} R_i S_i^T$, probing with $S_j$ gives $\hat{R}_j = \sum_{i=1}^{n} R_i S_i^T S_j = \sum_{i \neq j} R_i S_i^T S_j + R_j \|S_j\|^2$. (Figure: applying $S_1$ to $M$ yields $\hat{R}_1$.)

  18. Retrieval. With $M = \sum_{i=1}^{n} R_i S_i^T$, probing with $S_j$ gives $\hat{R}_j = \sum_{i=1}^{n} R_i S_i^T S_j = \sum_{i \neq j} R_i S_i^T S_j + R_j \|S_j\|^2$. If the $S_k$ are orthonormal, then retrieval is exact.
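A small numpy sketch of the Hebbian storage and retrieval just described (the sizes and the random orthonormal construction are illustrative, not from the slides): store the pairs with $M = \sum_i R_i S_i^T$ and read out with $\hat{R}_j = M S_j$; with orthonormal stimuli the readout is exact.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 64, 10                       # pattern size and number of stored pairs

# Orthonormal stimulus patterns: P columns of a random orthogonal matrix.
S, _ = np.linalg.qr(rng.standard_normal((N, P)))
R = rng.standard_normal((N, P))     # arbitrary response patterns

# Additive Hebb rule: M = sum_i R_i S_i^T  (= R S^T in matrix form).
M = R @ S.T

# Retrieval: R_hat_j = M S_j = sum_{i != j} R_i (S_i . S_j) + R_j ||S_j||^2.
R_hat = M @ S
print(np.allclose(R_hat, R))        # True: orthonormal S_k => exact retrieval
```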

  19. Another perspective. Recall that the optimal memory matrix is $M^\star = R S^\dagger$. If the columns of $S$ are linearly independent, then $X^\dagger = (X^T X)^{-1} X^T$, giving $M^\star = R (S^T S)^{-1} S^T$. So if the columns of $S$ (the $S_i$) are orthonormal, $M^\star = R S^T$, which is exactly what we got for the simple Hebb rule.
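As a follow-up sketch of the pseudoinverse view (same illustrative setup as above): when the stimuli are merely linearly independent, the plain Hebb matrix $R S^T$ suffers cross-talk, while $M^\star = R S^\dagger$ still retrieves exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 64, 10
S = rng.standard_normal((N, P))     # linearly independent but not orthonormal
R = rng.standard_normal((N, P))

M_hebb = R @ S.T                    # simple additive Hebb rule
M_star = R @ np.linalg.pinv(S)      # optimal: R S^dagger = R (S^T S)^-1 S^T

print(np.allclose(M_hebb @ S, R))   # False: cross-talk between non-orthogonal S_i
print(np.allclose(M_star @ S, R))   # True: the pseudoinverse removes the cross-talk
```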

  20. Capacity. How much information can a matrix memory store? Model: $P$ patterns of size $N$. (1) $|S| = |R| = N \times P$. (2) Each input pattern (column of $S$) has $m_S$ nonzeros, and each output pattern (column of $R$) has $m_R$ nonzeros. (3) Binary Hebb rule: $M = \min(R S^T, 1)$, i.e. each entry is clipped at one. (4) Threshold recall: $\hat{R}_{jk} = 1$ if $[M S_j]_k \geq \tau$, and $0$ otherwise.

  21. Capacity. How much information can a matrix memory store? Model: $P$ patterns of size $N$. (1) $|S| = |R| = N \times P$. (2) Each input pattern (column of $S$) has $m_S$ nonzeros, and each output pattern (column of $R$) has $m_R$ nonzeros. (3) Binary Hebb rule: $M = \min(R S^T, 1)$, i.e. each entry is clipped at one. (4) Threshold recall: $\hat{R}_{jk} = 1$ if $[M S_j]_k \geq \tau$, and $0$ otherwise. The sparsities $m_S$, $m_R$ and the threshold $\tau$ are parameters we can pick.

  22. Capacity. To choose the threshold $\tau$, note that in the absence of noise, every output unit that should be on receives a dendritic sum of exactly $m_S$ (one from each of the $m_S$ active input lines). So in order to recover all the ones in $R_j$, we set $\tau = m_S$. (Figure: probing $M$ with $S_2$ yields $\hat{R}_2$.)
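A compact sketch of this binary model (the sizes and sparsities below are arbitrary illustrative choices): clipped Hebbian storage, then threshold recall with $\tau = m_S$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, P, m_S, m_R = 256, 40, 8, 8      # illustrative sizes and sparsities

def sparse_patterns(n, p, m):
    """p binary columns of length n, each with exactly m ones."""
    X = np.zeros((n, p), dtype=int)
    for j in range(p):
        X[rng.choice(n, size=m, replace=False), j] = 1
    return X

S = sparse_patterns(N, P, m_S)
R = sparse_patterns(N, P, m_R)

# Binary ("clipped") Hebb rule: a weight is 1 if any stored pair turned it on.
M = np.minimum(R @ S.T, 1)

# Threshold recall with tau = m_S: correct output units reach exactly m_S.
tau = m_S
R_hat = (M @ S >= tau).astype(int)

# All stored ones are recovered; any extra ones are the spurious errors.
print("missed ones:  ", int(((R == 1) & (R_hat == 0)).sum()))
print("spurious ones:", int(((R == 0) & (R_hat == 1)).sum()))
```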

  23. Capacity: sparsity parameters. The chance of a given weight in $M$ remaining zero throughout the learning process is $\left(1 - \frac{m_S m_R}{N^2}\right)^P \approx e^{-P m_S m_R / N^2} = 1 - q$. A spurious one in $\hat{R}$ appears when a wrong output unit exceeds $\tau$ purely by chance; the expected number of these is $N q^{m_S}$. So the smallest value of $m_S$ we can choose before making the first error is given by $N q^{m_S} = 1 \;\Rightarrow\; m_S = \frac{-\log N}{\log q}$.

  24. Capacity. From the previous slide, the sparsity parameters satisfy $1 - q = e^{-P m_S m_R / N^2}$ and $m_S = \frac{-\log N}{\log q}$.

  25. Capacity. Setting the load to one half, $1 - q = e^{-P m_S m_R / N^2} = \tfrac{1}{2}$, gives $\frac{P\, m_S m_R}{N^2} = \log(2)$, i.e. $P = \log(2)\, \frac{N^2}{m_S m_R}$ patterns. This is quite good!
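Plugging in concrete numbers as a sanity check (the choices $N = 1000$, $m_R = m_S$, and a half-full matrix are illustrative assumptions, not the slides'):

```python
import numpy as np

N = 1000                             # illustrative pattern size
q = 0.5                              # half the weights on (assumed load)
m_S = -np.log(N) / np.log(q)         # ~ log2(N) ~ 10 active input lines
P = np.log(2) * N**2 / (m_S * m_S)   # patterns storable if m_R = m_S

print(round(m_S, 1), int(P))         # ~10 active bits per pattern, ~7000 patterns
```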

  26. Memory space (Kanerva, 1988). The decision to represent memory items as sparse, high-dimensional vectors has some interesting consequences. High-dimensional spaces are counterintuitive: in 1000 dimensions, 0.001 of all patterns are within 451 bits of a given point, and all but 0.001 are within 549 bits. Points tend to be orthogonal: most point-pairs are “noise-like.”
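The 1000-dimensional figures above follow from the binomial distribution of Hamming distances; a quick scipy check (nothing here beyond the slide's own numbers):

```python
from scipy.stats import binom

# Hamming distance from a random 1000-bit point to a fixed point is Binomial(1000, 0.5).
n = 1000
print(binom.cdf(451, n, 0.5))        # ~0.001 of all points lie within 451 bits
print(1 - binom.cdf(549, n, 0.5))    # ~0.001 lie farther away than 549 bits
```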

  27. Memory space (Kanerva, 1988). The decision to represent memory items as sparse, high-dimensional vectors has some interesting consequences. High-dimensional spaces are counterintuitive. Linking concepts: almost all pairs of points are far apart, but there are multiple “linking” points that are close to both.

  28. Memory space (Kanerva, 1988). The decision to represent memory items as sparse, high-dimensional vectors has some interesting consequences. High-dimensional spaces are counterintuitive. Linking concepts: almost all pairs of points are far apart, but there are multiple “linking” points that are close to both.

  29. Matrix memories in the brain: Marr’s model of the cerebellum (Marr, 1969; Albus, 1971). Short, live experiment.

  30. Matrix memories in the brain: Marr’s model of the cerebellum. The cerebellum produces smooth, coordinated motor movements (and may be involved in cognition as well). (Figure: a 24-year-old Chinese woman without a cerebellum.)

  31. Learn associations straight from context to actions, so you don’t have to “think” before doing. (Figure: mossy fibers carry the contextual input; Purkinje axons carry the motor output.)

  32. Learn associations straight from context to actions, so you don’t have to “think” before doing. (Figure: mossy fibers carry the contextual input; Purkinje axons carry the motor output.)

  33. Learn associations straight from context to actions, so you don’t have to “think” before doing. (Figure: mossy fibers carry the contextual input; Purkinje axons carry the motor output.)

  34. But how does training work? How do the right patterns “appear” on the output (Purkinje) lines? (Figure: mossy fibers carry the contextual input; Purkinje axons carry the motor output; climbing fibers from the rest of the brain carry the motor teaching input.)

  35. There’s a remarkable one-to-one correspondence between climbing fibers and Purkinje axons. Moreover, each climbing fiber wraps around and around its Purkinje cell, making hundreds of synapses; a single action potential can make the Purkinje cell spike. (Figure: mossy-fiber contextual input; Purkinje-axon motor output; climbing-fiber motor teaching input from the rest of the brain.)

  36. We said that sparsity was a key property. How is that manifested here? (Figure: mossy-fiber contextual input; Purkinje-axon motor output; climbing-fiber motor teaching input from the rest of the brain.)

  37. (Figure: granule cells sparsify the mossy-fiber contextual input on its way to the Purkinje axons; climbing fibers carry the motor teaching input.)

  38. (Figure: granule cells sparsify the mossy-fiber contextual input on its way to the Purkinje axons; climbing fibers carry the motor teaching input.)

  39. (Figure: granule cells sparsify the mossy-fiber contextual input on its way to the Purkinje axons; climbing fibers carry the motor teaching input.)

  40. Granule cells: sparsification. There are 50 billion granule cells, about 3/4 of the brain’s neurons, and they’re tiny. The idea here is that they “blow up” the mossy-fiber input into a larger space in which the signal can be sparser. Granule cells code for sets of mossy fibers (codons), hypothesized to be primitive input features. (Figure: mossy-fiber contextual input → granule-cell sparsification → Purkinje-axon motor output; climbing-fiber motor teaching input.)
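A minimal sketch of this expand-and-sparsify idea (the random connectivity, layer sizes, and the k-winners-take-all step are illustrative assumptions, not Marr's specific codon construction):

```python
import numpy as np

rng = np.random.default_rng(4)
n_mossy, n_granule, k_inputs, k_active = 100, 5000, 4, 50

# Each "granule cell" samples a small random set of mossy fibers (its codon).
codons = np.array([rng.choice(n_mossy, size=k_inputs, replace=False)
                   for _ in range(n_granule)])

def expand(mossy):
    """Map a binary mossy-fiber pattern to a sparse granule-cell pattern."""
    drive = mossy[codons].sum(axis=1)          # how many of each cell's inputs are on
    out = np.zeros(n_granule, dtype=int)
    out[np.argsort(drive)[-k_active:]] = 1     # keep only the k most-driven cells
    return out

mossy = (rng.random(n_mossy) < 0.3).astype(int)   # a fairly dense input pattern
granule = expand(mossy)
print(mossy.mean(), granule.mean())               # e.g. ~0.3 vs 0.01: much sparser
```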

  41. Storing structured information. We’ve discussed how to store S-R pairs, but human cognition goes way beyond this. Relations: • The kettle is on the table. • The kettle is to the right of the mug.

  42. Storing structured information. As before, “concepts” are activation vectors. (Figure: Kettle, Jar.)

  43. Storing structured information. As before, “concepts” are activation vectors. How do we represent Green(Jar) & Gray(Kettle)? (Figure: Kettle, Jar.)

  44. Storing structured information. As before, “concepts” are activation vectors. How do we represent Green(Jar) & Gray(Kettle)? (Figure: Kettle, Jar, Green, Gray.)

  45. Storing structured information. As before, “concepts” are activation vectors. How do we represent Green(Jar) & Gray(Kettle)? Maybe we should just have all of these patterns (Kettle, Jar, Green, Gray) fire at once? But then how do we know we don’t have Gray(Jar) & Green(Kettle)? Or, worse, Jar(Kettle) & Green & Gray?

  46. Storing structured information. As before, “concepts” are activation vectors. How do we represent Green(Jar) & Gray(Kettle)? Maybe we should just have all of these patterns fire at once? But then how do we know we don’t have Gray(Jar) & Green(Kettle)? Or, worse, Jar(Kettle) & Green & Gray? We need a way to bind predicates to arguments.

  47. Storing structured information. As before, “concepts” are activation vectors. To represent Green(Jar) & Gray(Kettle), we need a way to bind predicates to arguments: a binding operator ⊗ and a conjunction operator ⊕, giving (Green ⊗ Jar) ⊕ (Gray ⊗ Kettle).
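One way to make ⊗ and ⊕ concrete (a sketch of tensor-product binding with vector addition as the conjunction; the slides' operators could also be instantiated differently, e.g. by circular convolution): bind each predicate to its argument with an outer product, sum the bound pairs, and unbind by probing with the argument vector. The concept vectors here are random, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 256
concepts = {name: rng.standard_normal(d) / np.sqrt(d)
            for name in ["Green", "Gray", "Jar", "Kettle"]}

# Binding: outer product. Conjunction: sum of the bound pairs.
trace = (np.outer(concepts["Green"], concepts["Jar"])
         + np.outer(concepts["Gray"], concepts["Kettle"]))

# Unbinding: probe the trace with an argument to recover its predicate.
probe = trace @ concepts["Jar"]
sims = {name: float(probe @ v) for name, v in concepts.items()}
print(max(sims, key=sims.get))   # "Green", not "Gray": the predicate stays bound to its argument
```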
