Neural Networks
Hopfield Nets and Boltzmann Machines Spring 2018
Recap: A Hopfield network is a symmetric, "loopy" network of threshold neurons with outputs in $\{-1, +1\}$. Each neuron responds to the field applied to it by the others:

$y_i = \Theta\Big(\sum_{j\neq i} w_{ji}\,y_j + b_i\Big), \qquad \Theta(z) = \begin{cases} +1 & \text{if } z > 0 \\ -1 & \text{otherwise} \end{cases}$
• Not assuming node bias:

$y_i = \Theta\Big(\sum_{j\neq i} w_{ji}\,y_j\Big)$
[Figure: "potential energy" (PE) as a function of network state; the network descends into a local minimum]
• A neuron whose sign opposes its local field flips to match the field
• In doing so it may change the fields at other neurons
• Which may flip in turn
The total energy of the network is, up to an additive constant $D$,

$E = D - \frac{1}{2}\sum_i y_i z_i = -\sum_i\sum_{j>i} w_{ij}\,y_i y_j - \sum_i b_i y_i$
• Dipoles stop flipping when any flip would increase the energy
$z_i = \sum_{j\neq i} w_{ji}\,y_j + b_i$

$y_i \leftarrow \begin{cases} y_i & \text{if } \mathrm{sign}(y_i z_i) = 1 \\ -y_i & \text{otherwise} \end{cases}$
• The network evolves until it arrives at a configuration where the energy is a local minimum
• I.e. the system remembers its stable state and returns to it
Without bias, the energy is

$E = -\sum_i\sum_{j>i} w_{ij}\,y_i y_j$

The network evolves from its initial state until the energy does not change significantly any more:

$y_i(0) = x_i, \quad 0 \le i \le N-1$

$y_i(t+1) = \Theta\Big(\sum_{j\neq i} w_{ji}\,y_j(t)\Big), \quad 0 \le i \le N-1$
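This evolution is easy to simulate. Below is a minimal sketch, assuming a symmetric numpy weight matrix W with zero diagonal and ±1 states; the names hopfield_energy and recall are illustrative, not from the slides:

    import numpy as np

    def hopfield_energy(W, y):
        # E = -(1/2) y^T W y: counts each pair w_ij y_i y_j once (no bias)
        return -0.5 * y @ W @ y

    def recall(W, x, max_sweeps=100, seed=0):
        rng = np.random.default_rng(seed)
        y = x.copy()
        for _ in range(max_sweeps):
            flipped = False
            for i in rng.permutation(len(y)):   # asynchronous updates
                z = W[i] @ y                    # local field z_i = sum_j w_ji y_j
                yi = 1 if z > 0 else -1         # y_i = Theta(z_i)
                if yi != y[i]:
                    y[i] = yi                   # each flip can only lower E
                    flipped = True
            if not flipped:                     # no neuron wants to flip:
                return y                        # a local minimum of the energy
        return y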
Hebbian storage of $N_p$ patterns, collected as the columns of $\mathbf{Y} = [\mathbf{y}_1\ \mathbf{y}_2\ \cdots\ \mathbf{y}_{N_p}]$:

$\mathbf{W} = \mathbf{Y}\mathbf{Y}^T - N_p\mathbf{I}$

where $N_p$ is the number of patterns. Subtracting $N_p\mathbf{I}$ only zeroes the self-connections, since for $\pm 1$ patterns every diagonal entry of $\mathbf{Y}\mathbf{Y}^T$ equals $N_p$.
Equivalently, written as a sum of outer products over the stored patterns:

$\mathbf{W} = \sum_{p\in\{p\}} \mathbf{y}_p\mathbf{y}_p^T$
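A minimal sketch of this rule, assuming patterns is an (N_p, N) numpy array of ±1 rows; hebbian_weights is an illustrative name:

    import numpy as np

    def hebbian_weights(patterns):
        # W = sum_p y_p y_p^T, accumulated one outer product per pattern
        W = sum(np.outer(y, y) for y in patterns).astype(float)
        np.fill_diagonal(W, 0.0)  # each diagonal entry was N_p, so this is
        return W                  # the same as subtracting N_p * I

Recall with the earlier recall() routine, starting from a corrupted copy of a stored pattern, should then converge back to the pattern (capacity permitting).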
Energy landscape:
• Adding or dropping the $N_p\mathbf{I}$ term changes the energy only by an additive constant; the gradients and the locations of the minima are unchanged
• Both $\mathbf{Y}\mathbf{Y}^T$ and $\mathbf{Y}\mathbf{Y}^T - N_p\mathbf{I}$ have the same eigenvectors
• NOTE: $\mathbf{Y}\mathbf{Y}^T$ is a positive semidefinite matrix
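A quick numerical check of this point (not from the slides): shifting a matrix by a multiple of the identity moves every eigenvalue by the same constant and leaves the eigenvectors unchanged.

    import numpy as np

    rng = np.random.default_rng(0)
    Y = np.where(rng.random((8, 3)) < 0.5, -1.0, 1.0)  # 3 random +/-1 patterns
    Np = Y.shape[1]
    A = Y @ Y.T                    # positive semidefinite Gram-like matrix
    B = A - Np * np.eye(8)
    wA, vA = np.linalg.eigh(A)
    wB, vB = np.linalg.eigh(B)
    assert np.allclose(wB, wA - Np)  # spectrum shifts; eigenvectors unchanged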
Stored patterns:
• If a pattern is stored, its "ghost" (negation) is stored as well
• Intuitively, stored patterns must ideally be maximally far apart
• The update projects $\mathbf{y}$ onto the nearest corner of the hypercube
• It "quantizes" the space into orthants
• Each step rotates the vector $\mathbf{y}_s$ and then projects it onto the nearest corner:

$\mathbf{y} \;\rightarrow\; \mathbf{W}\mathbf{y} \;\rightarrow\; \text{Projection: } \mathrm{sign}(\mathbf{W}\mathbf{y})$
[Figure: two-neuron example; the states are hypercube corners such as (1,1) and (1,-1)]
• Let $\mathbf{Y} = [\mathbf{y}_1\ \mathbf{y}_2\ \cdots\ \mathbf{y}_K\ \mathbf{r}_{K+1}\ \mathbf{r}_{K+2}\ \cdots\ \mathbf{r}_N]$ and $\mathbf{W} = \mathbf{Y}\Lambda\mathbf{Y}^T$
  – $\mathbf{r}_{K+1}\ \cdots\ \mathbf{r}_N$ are orthogonal to $\mathbf{y}_1\ \cdots\ \mathbf{y}_K$
  – $\lambda_1 = \lambda_2 = \cdots = \lambda_K = 1$
  – $\lambda_{K+1}, \ldots, \lambda_N = 0$
• The stored patterns $\mathbf{y}_1 \cdots \mathbf{y}_K$ are stable (same logic as earlier)
• The orthogonal directions $\mathbf{r}_{K+1} \cdots \mathbf{r}_N$ are unstable
• Components along them get projected onto the subspace spanned by $\mathbf{y}_1\ \mathbf{y}_2\ \cdots\ \mathbf{y}_K$
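A small sketch of this construction, assuming the K stored ±1 patterns (rows of a numpy array) are mutually orthogonal; projector_weights is an illustrative name:

    import numpy as np

    def projector_weights(patterns):
        # Each +/-1 pattern of length N has squared norm N, so dividing by N
        # gives eigenvalue 1 along each stored direction and 0 elsewhere:
        # W = Y Lambda Y^T with lambda_1..K = 1 and lambda_K+1..N = 0.
        N = patterns.shape[1]
        return sum(np.outer(y, y) for y in patterns) / N

    # For orthogonal stored patterns, W @ y_k == y_k: they are fixed points.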
$\mathbf{W} = \sum_{p\in\{p\}} \mathbf{y}_p\mathbf{y}_p^T$

• Different patterns may be presented different numbers of times
• Equivalent to having unequal eigenvalues (see the sketch below)
• Hint: Lanczos iterations
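A one-function sketch of that observation (names illustrative, not from the slides): repeating a pattern n_p times simply scales its outer product, i.e. gives it a larger eigenvalue along that direction.

    import numpy as np

    def weighted_hebbian(patterns, counts):
        # W = sum_p n_p * y_p y_p^T: presentation counts act as eigenvalues
        return sum(n * np.outer(y, y) for y, n in zip(patterns, counts))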
• McEliece and Posner, '84
  – E.g. when we had the Hebbian net with N orthogonal base patterns, all patterns are stable
• Abu-Mostafa and St. Jacques, '85
• McEliece et al., '87
  – But this may come with many "parasitic" memories
• How do we find this network?
• Can we do something about this?
Optimizing the weights: make the target patterns minima of the energy. The bias can be captured by another fixed-value component.

$\widehat{\mathbf{W}} = \underset{\mathbf{W}}{\operatorname{argmin}} \sum_{\mathbf{y}\in\mathbf{Y}_P} E(\mathbf{y})$

Better: minimize the energy of the target patterns while maximizing the energy of everything else:

$\widehat{\mathbf{W}} = \underset{\mathbf{W}}{\operatorname{argmin}} \sum_{\mathbf{y}\in\mathbf{Y}_P} E(\mathbf{y}) - \sum_{\mathbf{y}\notin\mathbf{Y}_P} E(\mathbf{y})$

[Figure: energy vs. state; the energy "bowls" will all actually be quadratic]

• Make the target patterns local minima
• Emphasize more "important" memories by repeating them more frequently
[Figure: energy vs. state, with the target patterns marked]

• Raising the energy of every non-target state is unnecessary
• If you raise every valley, eventually they'll all move up above the target patterns, and many will even vanish
Focus only on the valleys:

$\widehat{\mathbf{W}} = \underset{\mathbf{W}}{\operatorname{argmin}} \sum_{\mathbf{y}\in\mathbf{Y}_P} E(\mathbf{y}) - \sum_{\mathbf{y}\notin\mathbf{Y}_P \;\&\; \mathbf{y}=\text{valley}} E(\mathbf{y})$
Gradient descent on this objective gives the update

$\mathbf{W} \leftarrow \mathbf{W} + \eta\Big(\sum_{\mathbf{y}\in\mathbf{Y}_P}\mathbf{y}\mathbf{y}^T - \sum_{\mathbf{y}\notin\mathbf{Y}_P \;\&\; \mathbf{y}=\text{valley}}\mathbf{y}\mathbf{y}^T\Big)$

or, per target pattern $\mathbf{y}_p$ and valley $\mathbf{y}_v$: $\mathbf{W} \leftarrow \mathbf{W} + \eta\big(\mathbf{y}_p\mathbf{y}_p^T - \mathbf{y}_v\mathbf{y}_v^T\big)$
[Figure: energy vs. state]

To find the valleys, initialize the network and let it evolve:
• It will settle in a valley. If this is not a target pattern, raise it
[Figure: energy vs. state]

• Initialize the network at a target pattern and let it evolve only a few steps; the state $\mathbf{y}_d$ it drifts to marks the nearby valley to raise:

$\mathbf{W} \leftarrow \mathbf{W} + \eta\big(\mathbf{y}_p\mathbf{y}_p^T - \mathbf{y}_d\mathbf{y}_d^T\big)$
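A sketch of this training loop, reusing the recall() routine from the earlier sketch; eta, epochs, and valley_sweeps are illustrative choices, not values from the slides:

    import numpy as np

    def train_hopfield(patterns, eta=0.01, epochs=500, valley_sweeps=2, seed=0):
        # patterns: (N_p, N) array of +/-1 target memories
        N = patterns.shape[1]
        W = np.zeros((N, N))
        for _ in range(epochs):
            for y_p in patterns:
                # Evolve briefly from the target to find the nearby valley y_d
                y_d = recall(W, y_p, max_sweeps=valley_sweeps)
                # Deepen the target's energy bowl, raise the valley:
                # W <- W + eta * (y_p y_p^T - y_d y_d^T)
                W += eta * (np.outer(y_p, y_p) - np.outer(y_d, y_d))
                np.fill_diagonal(W, 0.0)
        return W

If y_d already equals y_p, the two outer products cancel and the update is zero: patterns that are already stable stop contributing.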
$\widehat{\mathbf{W}} = \underset{\mathbf{W}}{\operatorname{argmin}} \sum_{\mathbf{y}\in\mathbf{Y}_P} E(\mathbf{y}) - \sum_{\mathbf{y}\notin\mathbf{Y}_P \;\&\; \mathbf{y}=\text{valley}} E(\mathbf{y})$

• Minimizing energy maximizes log likelihood
• The derivation of this probability is in fact quite trivial
Viewed probabilistically, the gradient of the log likelihood has a data term and a model term:

$\nabla_{\mathbf{W}} \log P(\mathbf{Y}_P) = \frac{1}{N_p}\sum_{\mathbf{y}\in\mathbf{Y}_P}\mathbf{y}\mathbf{y}^T - \sum_{\mathbf{y}} P(\mathbf{y})\,\mathbf{y}\mathbf{y}^T$

• More importance to more frequently presented memories
• More importance to more attractive spurious memories
The natural distribution for such variables is the Boltzmann distribution:

$P(\mathbf{y}) = \frac{1}{Z}\exp\!\big(-E(\mathbf{y})\big)$
• The neurons that store the actual patterns of interest: visible neurons
• The neurons that only serve to increase the capacity, but whose actual values are not important: hidden neurons
• These can be set to anything in order to store a visible pattern
  – Ideally choose the values that result in the lowest energy
  – But that's an exponential search space!
  – Simulated annealing helps here (see the sketch below)
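A minimal simulated-annealing sketch for settling into a low-energy configuration. The geometric cooling schedule, the 0/1 units, and all parameter values are assumptions for illustration, not prescriptions from the slides:

    import numpy as np

    def anneal(W, b, s, T0=10.0, T_min=0.1, cool=0.9, seed=0):
        # s: 0/1 state vector (clamping visible bits would just mean
        # skipping them in the update loop below)
        rng = np.random.default_rng(seed)
        T = T0
        while T > T_min:
            for i in rng.permutation(len(s)):
                z = W[i] @ s + b[i]                 # local field
                p = 1.0 / (1.0 + np.exp(-z / T))    # P(s_i = 1) at temperature T
                s[i] = 1 if rng.random() < p else 0
            T *= cool                               # gradually cool toward greedy
        return s

At high T the updates are nearly random and can escape poor minima; as T falls, the update approaches the deterministic z > 0 test.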
[Figure: energy ("PE") vs. state, and the resulting distribution over states at T = 1]

• This is the probability of the different states that the network will wander over at equilibrium
The stochastic network models a probability distribution over states:

$E(S) = -\sum_{i<j} w_{ij}\,s_i s_j$

$P(S) = \frac{\exp\!\big(-E(S)\big)}{\sum_{S'}\exp\!\big(-E(S')\big)}$

• Where a state is a binary string
• Specifically, it models a Boltzmann distribution
• The parameters of the model are the weights of the network
• It is a generative model: generates states according to $P(S)$
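For a tiny network this distribution can be enumerated exactly, which makes the partition function concrete. A sketch assuming 0/1 units and no bias (boltzmann_distribution is an illustrative name); this is only feasible for small N, since there are 2^N states:

    import numpy as np
    from itertools import product

    def boltzmann_distribution(W):
        N = W.shape[0]
        states = np.array(list(product([0, 1], repeat=N)), dtype=float)
        # For symmetric W with zero diagonal:
        # -E(S) = sum_{i<j} w_ij s_i s_j = 0.5 * s^T W s
        neg_E = 0.5 * np.einsum('si,ij,sj->s', states, W, states)
        p = np.exp(neg_E)
        return states, p / p.sum()   # divide by the partition function Z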
The conditional probability of a single bit:

• Let $S$ have $i$-th bit $= 1$ and $S'$ have $i$-th bit $= 0$

$P(S) = P(s_i = 1 \mid s_{j\neq i})\,P(s_{j\neq i})$

$P(S') = P(s_i = 0 \mid s_{j\neq i})\,P(s_{j\neq i})$

$\log P(S) - \log P(S') = \log P(s_i = 1 \mid s_{j\neq i}) - \log P(s_i = 0 \mid s_{j\neq i}) = \log\frac{P(s_i = 1 \mid s_{j\neq i})}{1 - P(s_i = 1 \mid s_{j\neq i})}$

With the local field $z_i = \sum_{j\neq i} w_{ij}\,s_j + b_i$, the energy difference between the two states gives $\log P(S) - \log P(S') = E(S') - E(S) = z_i$, so

$P(s_i = 1 \mid s_{j\neq i}) = \frac{1}{1 + \exp(-z_i)}$
Each neuron can take value 0 or 1 with a probability that depends on the local field:

$P(s_i = 1 \mid s_{j\neq i}) = \frac{1}{1 + \exp\!\Big(-\big(\sum_{j\neq i} w_{ij}\,s_j + b_i\big)\Big)}$

• Note the slight change from Hopfield nets: values are 0/1 rather than $\pm 1$
• Not actually necessary; only a matter of convenience
• The network runs by repeatedly sampling each neuron from the probability given above
• Gibbs sampling: fix $N-1$ variables and sample the remaining variable from its conditional, given above
• As opposed to the energy-based update (mean-field approximation), which runs the deterministic test $z_i > 0$
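A minimal Gibbs-sampling sketch for this network, assuming 0/1 units, a symmetric zero-diagonal W, and a bias vector b; gibbs_sweep is an illustrative name:

    import numpy as np

    def gibbs_sweep(W, b, s, rng):
        # One full sweep: fix all but one unit, sample it, move on
        for i in rng.permutation(len(s)):
            z = W[i] @ s + b[i]                  # local field z_i
            p = 1.0 / (1.0 + np.exp(-z))         # P(s_i = 1 | s_{j != i})
            s[i] = 1 if rng.random() < p else 0  # sample rather than threshold
        return s

The mean-field alternative would instead deterministically set s[i] = 1 whenever z > 0.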
Training: maximize the (log) likelihood of the training states. For a single state,

$P(S) = \frac{\exp\!\big(\sum_{i<j} w_{ij}\,s_i s_j + \sum_i b_i s_i\big)}{\sum_{S'}\exp\!\big(\sum_{i<j} w_{ij}\,s'_i s'_j + \sum_i b_i s'_i\big)}$

$\log P(S) = \sum_{i<j} w_{ij}\,s_i s_j + \sum_i b_i s_i - \log \sum_{S'} \exp\!\Big(\sum_{i<j} w_{ij}\,s'_i s'_j + \sum_i b_i s'_i\Big)$

Averaged over a training set $\mathbf{S}$ of $N$ states:

$\langle \log P(S)\rangle = \frac{1}{N}\sum_{S\in\mathbf{S}} \log P(S) = \frac{1}{N}\sum_{S}\Big(\sum_{i<j} w_{ij}\,s_i(S)\,s_j(S) + \sum_i b_i s_i(S)\Big) - \log \sum_{S'} \exp\!\Big(\sum_{i<j} w_{ij}\,s'_i s'_j + \sum_i b_i s'_i\Big)$

• The normalizer sums over all states $S'$, of which there can be an exponential number!
Differentiating the average log likelihood with respect to $w_{ij}$:

$\frac{d\langle \log P(S)\rangle}{dw_{ij}} = \frac{1}{N}\sum_{S\in\mathbf{S}} s_i(S)\,s_j(S) - \frac{d}{dw_{ij}}\log\sum_{S'}\exp\!\Big(\sum_{i<j} w_{ij}\,s'_i s'_j + \sum_i b_i s'_i\Big)$

The derivative of the log-normalizer is an expectation under the model:

$\frac{d}{dw_{ij}}\log\sum_{S'}\exp(\cdot) = \sum_{S'}\frac{\exp\!\big(\sum_{i<j} w_{ij}\,s'_i s'_j + \sum_i b_i s'_i\big)}{\sum_{S''}\exp\!\big(\sum_{i<j} w_{ij}\,s''_i s''_j + \sum_i b_i s''_i\big)}\;s'_i s'_j = \sum_{S'} P(S')\,s'_i s'_j$
This expectation cannot be summed exactly, but it can be estimated by sampling, i.e. by probabilistically selecting state values according to our model:

$S_{\text{simul}} = \{S_{\text{simul},1},\ S_{\text{simul},2},\ \ldots,\ S_{\text{simul},M}\}$

$\sum_{S'} P(S')\,s'_i s'_j \approx \frac{1}{M}\sum_{S'\in S_{\text{simul}}} s'_i s'_j$
Putting the two terms together:

$\frac{d\langle \log P(S)\rangle}{dw_{ij}} = \frac{1}{N}\sum_{S\in\mathbf{S}} s_i(S)\,s_j(S) - \frac{1}{M}\sum_{S'\in S_{\text{simul}}} s'_i s'_j$
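A sketch of the resulting weight update, assuming data is an (N, d) array of 0/1 training states and samples is an (M, d) array drawn from the model (e.g. by running gibbs_sweep to equilibrium); eta is an illustrative learning rate:

    import numpy as np

    def boltzmann_update(W, data, samples, eta=0.01):
        data_corr = data.T @ data / len(data)            # (1/N) sum_S s_i(S) s_j(S)
        model_corr = samples.T @ samples / len(samples)  # (1/M) sum_S' s'_i s'_j
        W += eta * (data_corr - model_corr)              # ascend d<log P>/dw_ij
        np.fill_diagonal(W, 0.0)                         # no self-connections
        return W

Training stops changing W when the model's sampled pairwise statistics match those of the data, which is exactly the condition for the gradient above to vanish.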