Four non-orthogonal 6-bit patterns
• Patterns are perfectly stationary and stable for K > 0.14N
• Fewer spurious minima than for the orthogonal 2-pattern case
  – Most fake-looking memories are in fact ghosts

Six non-orthogonal 6-bit patterns
• Breakdown is largely due to interference from "ghosts"
• But the patterns are stationary, and often stable
  – For K ≫ 0.14N

More visualization
• Let's inspect a few 8-bit patterns
  – Keeping in mind that the Karnaugh map is now a 4-dimensional tesseract

One 8-bit pattern
• It's actually cleanly stored, but there are a few spurious minima

Two orthogonal 8-bit patterns
• Both have regions of attraction
• Some spurious minima

Two non-orthogonal 8-bit patterns
• Actually have fewer spurious minima
  – Not obvious from the visualization

Four orthogonal 8-bit patterns
• Successfully stored

Four non-orthogonal 8-bit patterns
• Stored, but with interference from ghosts

Eight orthogonal 8-bit patterns
• Wipeout

Eight non-orthogonal 8-bit patterns
• Nothing stored
  – Neither stationary nor stable

Making sense of the behavior
• It seems possible to store K > 0.14N patterns
  – i.e. obtain a weight matrix W such that K > 0.14N patterns are stationary
  – Possible to make more than 0.14N patterns at least 1-bit stable
• So what was Hopfield talking about?
• Patterns that are non-orthogonal are easier to remember
  – i.e. patterns that are closer together are easier to remember than patterns that are farther apart!
• Can we get greater control over the process than Hebbian learning gives us?

Bold claim
• I can always store (up to) N orthogonal patterns such that they are stationary!
  – Although not necessarily stable
• Why?

"Training" the network
• How do we make the network store a specific pattern or set of patterns?
  – Hebbian learning
  – Geometric approach
  – Optimization
• Secondary question
  – How many patterns can we store?

A minor adjustment
• The behavior of E(y) = yᵀWy with W = YYᵀ - N_p I
• ... is identical to the behavior with W = YYᵀ
  – Since yᵀ(YYᵀ - N_p I)y = yᵀYYᵀy - N·N_p
  – The energy landscape only differs by an additive constant; gradients and the locations of minima remain the same
  – Both choices have the same eigenvectors
  – NOTE: YYᵀ is a positive semidefinite matrix
• But W = YYᵀ is easier to analyze; hence in the following slides we will use W = YYᵀ

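This equivalence is easy to check numerically. Below is a minimal NumPy sketch (an illustration added for this writeup, not from the slides) that enumerates every ±1 state of a small network and confirms the two energies differ by the same additive constant everywhere, so gradients and minima coincide; the pattern count, size, and random seed are arbitrary choices.

```python
# Sketch: verify that W = Y Y^T and W = Y Y^T - N_p I give energies that differ
# only by the constant -N*N_p/2 over all +/-1 states.
import itertools
import numpy as np

N, N_p = 6, 2                                   # 6-bit patterns, 2 stored patterns
rng = np.random.default_rng(0)
Y = rng.choice([-1.0, 1.0], size=(N, N_p))      # columns are the stored patterns

W_plain = Y @ Y.T                               # W = Y Y^T
W_adj   = Y @ Y.T - N_p * np.eye(N)             # W = Y Y^T - N_p I (zero diagonal)

def energy(W, y):
    return -0.5 * y @ W @ y

diffs = []
for bits in itertools.product([-1.0, 1.0], repeat=N):
    y = np.array(bits)
    diffs.append(energy(W_plain, y) - energy(W_adj, y))

# The gap is the same constant for every state, so the minima coincide.
print(np.allclose(diffs, diffs[0]), diffs[0], -0.5 * N * N_p)
```
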
Consider the energy function
    E = -½ yᵀWy - bᵀy
• Reinstating the bias term for completeness' sake
  – Remember that we don't actually use it in a Hopfield net
• This is a quadratic!
  – For Hebbian learning, W = YYᵀ is positive semidefinite, so the quadratic term yᵀWy is convex

The energy function
    E = -½ yᵀWy - bᵀy
• E is a quadratic in y (for Hebbian W the term yᵀWy is convex; E is its negative)
  – [Figure: the quadratic energy surface shown from above (assuming zero bias)]
• But the components of y can only take values ±1
  – i.e. y lies on the corners of the unit hypercube

The energy function
    E = -½ yᵀWy - bᵀy
  [Figure: the energy surface over the hypercube, with the stored patterns marked]
• The stored values of y are the ones where all adjacent corners are higher on the quadratic
  – Hebbian learning attempts to make the quadratic steep in the vicinity of stored patterns

Patterns you can store
  [Figure: hypercube corners showing stored patterns and their ghosts (negations)]
• Ideally the stored patterns must be maximally separated on the hypercube
  – The number of patterns we can store depends on the actual distance between the patterns

Storing patterns
• A pattern y_p is stored if:
  – sign(W y_p) = y_p for all target patterns
• Note: for binary vectors, sign(·) is a projection
  – It projects its argument onto the nearest corner of the hypercube
  – It "quantizes" the space into orthants

Storing patterns
• A pattern y_p is stored if:
  – sign(W y_p) = y_p for all target patterns
• Training: design W such that this holds
• Simple solution: y_p is an eigenvector of W, and the corresponding eigenvalue is positive
      W y_p = λ y_p
  – More generally, orthant(W y_p) = orthant(y_p)
• How many such y_p can we have?

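As a concrete check of the two conditions used throughout these slides, here is a small NumPy sketch (my own illustration, assuming ±1 patterns and Hebbian weights W = YYᵀ) that tests stationarity, sign(W y_p) = y_p, and 1-bit stability, i.e. that no single-bit flip lowers the energy. Ties in the sign are broken toward +1 as a convention.

```python
# Sketch: stationarity = sign(W y) equals y; 1-bit stability = no neighbour is lower in energy.
import numpy as np

sgn = lambda x: np.where(x >= 0, 1.0, -1.0)       # ties broken toward +1 (a convention)

def is_stationary(W, y):
    return np.array_equal(sgn(W @ y), y)

def is_one_bit_stable(W, y):
    E = lambda v: -0.5 * v @ W @ v
    for i in range(len(y)):
        v = y.copy()
        v[i] = -v[i]                              # move to an adjacent hypercube corner
        if E(v) < E(y):                           # a neighbour is lower -> not a local minimum
            return False
    return True

rng = np.random.default_rng(1)
Y = rng.choice([-1.0, 1.0], size=(8, 2))          # two 8-bit target patterns (columns)
W = Y @ Y.T                                       # Hebbian weights
for p in range(Y.shape[1]):
    print(p, is_stationary(W, Y[:, p]), is_one_bit_stable(W, Y[:, p]))
```
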
Only N patterns?
  [Figure: the 2-bit example, with patterns (1, 1) and (1, -1)]
• Patterns that differ in N/2 bits are orthogonal
• You can have no more than N orthogonal vectors in an N-dimensional space

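A two-line check of the orthogonality claim (illustration only): for ±1 vectors, y₁·y₂ = N - 2d where d is the Hamming distance, which vanishes exactly when d = N/2.

```python
# Sketch: +/-1 patterns differing in exactly N/2 bits are orthogonal.
import numpy as np

N = 8
y1 = np.ones(N)
y2 = np.ones(N); y2[: N // 2] = -1             # differs from y1 in N/2 positions
print(int(np.sum(y1 != y2)), float(y1 @ y2))   # -> 4 0.0
# In general y1 . y2 = (N - d) - d = N - 2d, which is zero exactly when d = N/2.
```
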
Another random fact that should interest you
• The eigenvectors of any symmetric matrix W are orthogonal
• The eigenvalues may be positive or negative

Storing more than one pattern
• Requirement: given y_1, y_2, …, y_P
  – Design W such that
    • sign(W y_p) = y_p for all target patterns
    • There are no other binary vectors for which this holds
• What is the largest number of patterns that can be stored?

Storing K orthogonal patterns
• Simple solution: design W such that y_1, y_2, …, y_K are eigenvectors of W
  – Let Y = [y_1 y_2 … y_K], and set
        W = Y Λ Yᵀ
  – with λ_1, …, λ_K positive
  – For λ_1 = λ_2 = … = λ_K = 1 this is exactly the Hebbian rule
• The patterns are provably stationary

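A sketch of this construction (an added illustration, not the slides' code), using rows of a Hadamard matrix as mutually orthogonal ±1 patterns. The patterns are normalized so they are literal unit-eigenvalue eigenvectors of W; the positive scaling does not affect sign(Wy), so this coincides with the (scaled) Hebbian rule.

```python
# Sketch: store K mutually orthogonal +/-1 patterns by making them eigenvectors
# of W with positive eigenvalues, W = Y_n Lambda Y_n^T.
import numpy as np

N, K = 8, 4
H = np.array([[1.0]])
while H.shape[0] < N:                       # Sylvester construction: rows are orthogonal +/-1 vectors
    H = np.block([[H, H], [H, -H]])
Y = H[:K].T                                 # N x K: the stored patterns as columns

Y_n = Y / np.sqrt(N)                        # normalize so columns are unit eigenvectors
lam = np.ones(K)                            # any positive values work; all 1 = (scaled) Hebbian rule
W = Y_n @ np.diag(lam) @ Y_n.T

sgn = lambda x: np.where(x >= 0, 1.0, -1.0)
for p in range(K):
    print(p, np.array_equal(sgn(W @ Y[:, p]), Y[:, p]))   # each stored pattern is stationary
```
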
Hebbian rule
• In reality
  – Let Y = [y_1 y_2 … y_K r_{K+1} r_{K+2} … r_N], with
        W = Y Λ Yᵀ
  – r_{K+1}, …, r_N are orthogonal to y_1, y_2, …, y_K
  – λ_1 = λ_2 = … = λ_K = 1
  – λ_{K+1} = … = λ_N = 0
• All patterns orthogonal to y_1 … y_K are also stationary
  – Although not stable

Storing N orthogonal patterns
• When we have N orthogonal (or near-orthogonal) patterns y_1, y_2, …, y_N
  – Y = [y_1 y_2 … y_N],  W = Y Λ Yᵀ
  – λ_1 = λ_2 = … = λ_N = 1
• The eigenvectors of W span the space
• Also, for any y_k:  W y_k = y_k

Storing N orthogonal patterns
• The N orthogonal patterns y_1, y_2, …, y_N span the space
• Any pattern y can be written as
      y = a_1 y_1 + a_2 y_2 + ⋯ + a_N y_N
      W y = a_1 W y_1 + a_2 W y_2 + ⋯ + a_N W y_N = a_1 y_1 + a_2 y_2 + ⋯ + a_N y_N = y
• All patterns are stable
  – Remembers everything
  – Completely useless network

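The "remembers everything" outcome can be seen directly in a toy example (illustration only): with a full orthogonal ±1 basis, the Hebbian W = YYᵀ equals N·I, so every corner of the hypercube is a fixed point.

```python
# Sketch: with a FULL set of N orthogonal +/-1 patterns, W = Y Y^T = N * I,
# so sign(W y) = y for every state -- nothing is forgotten, and nothing is corrected either.
import itertools
import numpy as np

N = 4
H = np.array([[1.0]])
while H.shape[0] < N:
    H = np.block([[H, H], [H, -H]])
Y = H.T                                     # N orthogonal patterns as columns
W = Y @ Y.T
print(np.allclose(W, N * np.eye(N)))        # -> True

sgn = lambda x: np.where(x >= 0, 1.0, -1.0)
print(all(np.array_equal(sgn(W @ np.array(b)), np.array(b))
          for b in itertools.product([-1.0, 1.0], repeat=N)))   # -> True: every corner is a fixed point
```
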
Storing K orthogonal patterns
• Even if we store fewer than N patterns
  – Let Y = [y_1 y_2 … y_K r_{K+1} r_{K+2} … r_N],  W = Y Λ Yᵀ
  – r_{K+1}, …, r_N are orthogonal to y_1 … y_K
  – λ_1 = λ_2 = … = λ_K = 1,  λ_{K+1} = … = λ_N = 0
• All patterns orthogonal to y_1 … y_K are stationary
• Any pattern that lies entirely in the subspace spanned by y_1 … y_K is also stable (same logic as earlier)
• Only patterns that are partially in the subspace spanned by y_1 … y_K are unstable
  – They get projected onto the subspace spanned by y_1 … y_K

Problem with the Hebbian rule
• Even if we store fewer than N patterns
  – Let Y = [y_1 y_2 … y_K r_{K+1} r_{K+2} … r_N],  W = Y Λ Yᵀ
  – r_{K+1}, …, r_N are orthogonal to y_1 … y_K
  – λ_1 = λ_2 = … = λ_K = 1
• Problems arise because the eigenvalues are all 1.0
  – This ensures stationarity of vectors in the subspace
  – What if we get rid of this requirement?

Hebbian rule and general (non-orthogonal) vectors
      w_ji = Σ_{p ∈ {p}} y_j^p y_i^p
• What happens when the patterns are not orthogonal?
• What happens when the patterns are presented more than once?
  – Different patterns presented different numbers of times
  – Equivalent to having unequal eigenvalues
• Can we predict the evolution of any vector y?
  – Hint: Lanczos iterations
  – Can write Y_P = Y_ortho B, so W = Y_ortho B Λ Bᵀ Y_orthoᵀ

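A small sketch of the weighted Hebbian sum (added illustration; the presentation counts and patterns are arbitrary): presenting a pattern c_p times contributes c_p·y_p y_pᵀ to W, which plays the role of an unequal eigenvalue along that pattern's direction when the patterns are near-orthogonal.

```python
# Sketch: W = sum_p c_p * y_p y_p^T, where c_p counts how often pattern p is "presented".
import numpy as np

rng = np.random.default_rng(2)
N = 16
patterns = rng.choice([-1.0, 1.0], size=(3, N))     # generally non-orthogonal patterns (rows)
counts = np.array([1, 2, 5])                        # pattern 2 is "presented" most often

W = sum(c * np.outer(y, y) for c, y in zip(counts, patterns))

sgn = lambda x: np.where(x >= 0, 1.0, -1.0)
for p, y in enumerate(patterns):
    # heavily repeated patterns are more likely to remain stationary despite interference
    print(p, counts[p], np.array_equal(sgn(W @ y), y))
```
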
The bottom line
• With a network of N units (i.e. N-bit patterns)
• The maximum number of stable patterns is actually exponential in N
  – McEliece and Posner, 1984
  – E.g. when we had the Hebbian net with N orthogonal base patterns, all patterns are stable
• For a specific set of K patterns, we can always build a network for which all K patterns are stable provided K ≤ N
  – Abu-Mostafa and St. Jacques, 1985
  – How do we find this network?
• For large N, the upper bound on K is actually N / (4 log N)
  – McEliece et al., 1987
  – But this may come with many "parasitic" memories
  – Can we do something about this?

A different tack
• How do we make the network store a specific pattern or set of patterns?
  – Hebbian learning
  – Geometric approach
  – Optimization
• Secondary question
  – How many patterns can we store?

Consider the energy function
    E = -½ yᵀWy - bᵀy
• This must be maximally low for target patterns
• It must be maximally high for all other patterns
  – So that they are unstable and evolve into one of the target patterns

Alternate approach to estimating the network
    E(y) = -½ yᵀWy - bᵀy
• Estimate W (and b) such that
  – E is minimized for y_1, y_2, …, y_P
  – E is maximized for all other y
• Caveat: it is unrealistic to expect to store more than N patterns, but can we make those N patterns memorable?

Optimizing W (and b)
    E(y) = -½ yᵀWy
    Ŵ = argmin_W Σ_{y∈Y_P} E(y)
• The bias can be captured by another fixed-value component
• Minimize the total energy of the target patterns
  – Problem with this?

Optimizing W
    E(y) = -½ yᵀWy
    Ŵ = argmin_W Σ_{y∈Y_P} E(y) - Σ_{y∉Y_P} E(y)
• Minimize the total energy of the target patterns
• Maximize the total energy of all non-target patterns

Optimizing W
    E(y) = -½ yᵀWy
    Ŵ = argmin_W Σ_{y∈Y_P} E(y) - Σ_{y∉Y_P} E(y)
• Simple gradient descent:
      W = W + η ( Σ_{y∈Y_P} yyᵀ - Σ_{y∉Y_P} yyᵀ )

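A minimal sketch of this update (an illustration, with arbitrary choices for η, the number of steps, and a small random sample standing in for "all other patterns"): add outer products of targets, subtract outer products of non-targets.

```python
# Sketch: gradient step that lowers target energies and raises sampled non-target energies.
import numpy as np

rng = np.random.default_rng(3)
N = 16
targets = rng.choice([-1.0, 1.0], size=(2, N))
W = np.zeros((N, N))
eta = 0.01

for _ in range(200):
    non_targets = rng.choice([-1.0, 1.0], size=(4, N))       # crude stand-in for the second sum
    grad = sum(np.outer(y, y) for y in targets) - sum(np.outer(y, y) for y in non_targets)
    W += eta * grad

energy = lambda y: -0.5 * y @ W @ y
print([round(energy(y), 2) for y in targets])                # target energies end up comparatively low
```
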
Optimizing W
      W = W + η ( Σ_{y∈Y_P} yyᵀ - Σ_{y∉Y_P} yyᵀ )
• Can "emphasize" the importance of a pattern by repeating it
  – More repetitions ⇒ greater emphasis
• How many of these non-target patterns do we need?
  – Do we need to include all of them?
  – Are they all equally important?

The training again
      W = W + η ( Σ_{y∈Y_P} yyᵀ - Σ_{y∉Y_P} yyᵀ )
• Note the energy contour of a Hopfield network for any weight W
  [Figure: energy vs. state — the bowls will all actually be quadratic]

The training again
      W = W + η ( Σ_{y∈Y_P} yyᵀ - Σ_{y∉Y_P} yyᵀ )
• The first term tries to minimize the energy at target patterns
  – Make them local minima
  – Emphasize more "important" memories by repeating them more frequently
  [Figure: energy vs. state, with the target patterns marked at low points]

The negative class
      W = W + η ( Σ_{y∈Y_P} yyᵀ - Σ_{y∉Y_P} yyᵀ )
• The second term tries to "raise" all non-target patterns
  – Do we need to raise everything?

Option 1: Focus on the valleys
      W = W + η ( Σ_{y∈Y_P} yyᵀ - Σ_{y∉Y_P & y=valley} yyᵀ )
• Focus on raising the valleys
  – If you raise every valley, eventually they'll all move up above the target patterns, and many will even vanish

Identifying the valleys
      W = W + η ( Σ_{y∈Y_P} yyᵀ - Σ_{y∉Y_P & y=valley} yyᵀ )
• Problem: how do you identify the valleys for the current W?

Identifying the valleys
• Initialize the network randomly and let it evolve
  – It will settle in a valley

Training the Hopfield network
      W = W + η ( Σ_{y∈Y_P} yyᵀ - Σ_{y∉Y_P & y=valley} yyᵀ )
• Initialize W
• Compute the total outer product of all target patterns
  – More important patterns are presented more frequently
• Randomly initialize the network several times and let it evolve
  – And settle at a valley
• Compute the total outer product of the valley patterns
• Update the weights

Training the Hopfield network: SGD version
      W = W + η ( Σ_{y∈Y_P} yyᵀ - Σ_{y∉Y_P & y=valley} yyᵀ )
• Initialize W
• Do until convergence, satisfaction, or death from boredom:
  – Sample a target pattern y_p
    • Sampling frequency of a pattern must reflect its importance
  – Randomly initialize the network and let it evolve
    • And settle at a valley y_v
  – Update the weights
    • W = W + η ( y_p y_pᵀ - y_v y_vᵀ )

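A sketch of this SGD-style loop (an added illustration, not the slides' reference code): asynchronous single-bit updates serve as "evolution", negatives come from random initializations that settle into valleys, and η and the iteration counts are arbitrary.

```python
# Sketch: sample a target, find a valley from a random start, nudge W toward the
# target's outer product and away from the valley's.
import numpy as np

rng = np.random.default_rng(4)
N = 16
targets = rng.choice([-1.0, 1.0], size=(3, N))

def evolve(W, y, max_sweeps=50):
    """Asynchronously update bits until none changes: a 'valley' of the current W."""
    y = y.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(N):
            new_bit = 1.0 if W[i] @ y >= 0 else -1.0
            if new_bit != y[i]:
                y[i], changed = new_bit, True
        if not changed:
            break
    return y

W = np.zeros((N, N))
eta = 0.01
for _ in range(500):
    y_p = targets[rng.integers(len(targets))]          # sample a target (uniform here; weight by importance)
    y_v = evolve(W, rng.choice([-1.0, 1.0], size=N))   # random init settles into some valley
    W += eta * (np.outer(y_p, y_p) - np.outer(y_v, y_v))

sgn = lambda x: np.where(x >= 0, 1.0, -1.0)
print([bool(np.array_equal(sgn(W @ y), y)) for y in targets])   # are the targets now stationary?
```
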
Which valleys?
• Should we randomly sample valleys?
  – Are all valleys equally important?
• Major requirement: memories must be stable
  – They must be broad valleys
• Spurious valleys in the neighborhood of memories are more important to eliminate

Identifying the valleys
• Initialize the network at valid memories and let it evolve
  – It will settle in a valley. If this is not the target pattern, raise it

Training the Hopfield network
      W = W + η ( Σ_{y∈Y_P} yyᵀ - Σ_{y∉Y_P & y=valley} yyᵀ )
• Initialize W
• Compute the total outer product of all target patterns
  – More important patterns are presented more frequently
• Initialize the network with each target pattern and let it evolve
  – And settle at a valley
• Compute the total outer product of the valley patterns
• Update the weights

Training the Hopfield network: SGD version
      W = W + η ( Σ_{y∈Y_P} yyᵀ - Σ_{y∉Y_P & y=valley} yyᵀ )
• Initialize W
• Do until convergence, satisfaction, or death from boredom:
  – Sample a target pattern y_p
    • Sampling frequency of a pattern must reflect its importance
  – Initialize the network at y_p and let it evolve
    • And settle at a valley y_v
  – Update the weights
    • W = W + η ( y_p y_pᵀ - y_v y_vᵀ )

A possible problem
• What if there's another target pattern down-valley?
  – Raising it will destroy a better-represented or stored pattern!

A related issue
• There is really no need to raise the entire surface, or even every valley
• Raise the neighborhood of each target memory
  – Sufficient to make the memory a valley
  – The broader the neighborhood considered, the broader the valley

Raising the neighborhood
• Starting from a target pattern, let the network evolve only a few steps
  – Try to raise the resultant location
• This will raise the neighborhood of the targets
• And will avoid the problem of down-valley targets

Training the Hopfield network: SGD version
      W = W + η ( Σ_{y∈Y_P} yyᵀ - Σ_{y∉Y_P & y=valley} yyᵀ )
• Initialize W
• Do until convergence, satisfaction, or death from boredom:
  – Sample a target pattern y_p
    • Sampling frequency of a pattern must reflect its importance
  – Initialize the network at y_p and let it evolve a few steps (2–4)
    • And arrive at a down-valley position y_d
  – Update the weights
    • W = W + η ( y_p y_pᵀ - y_d y_dᵀ )

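Relative to the sketch after the earlier SGD slide, only the negative sample changes: start at the target and apply just a few bit-updates, so it is the target's own neighborhood that gets raised rather than some distant valley (running the same evolution to convergence recovers the "initialize at the target" variant). A short illustration, with k an arbitrary choice:

```python
# Sketch: produce the down-valley point by evolving only k bits starting AT the target.
import numpy as np

def evolve_few_steps(W, y, k=3, rng=np.random.default_rng()):
    y = y.copy()
    for i in rng.choice(len(y), size=k, replace=False):   # k single-bit updates
        y[i] = 1.0 if W[i] @ y >= 0 else -1.0
    return y

# Inside the training loop, replace the negative sample with:
#   y_d = evolve_few_steps(W, y_p, k=3)
#   W  += eta * (np.outer(y_p, y_p) - np.outer(y_d, y_d))
```
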
A probabilistic interpretation
    E(y) = -½ yᵀWy        P(y) = C exp(½ yᵀWy) = C exp(-E(y))
• For continuous y, the energy of a pattern is a perfect analog to the negative log likelihood of a Gaussian density
• For binary y, it is the analog of the negative log likelihood of a Boltzmann distribution
  – Minimizing energy maximizes log likelihood

The Boltzmann distribution
    E(y) = -½ yᵀWy - bᵀy        P(y) = C exp(-E(y) / kT),  with C chosen so that Σ_y P(y) = 1
• k is the Boltzmann constant, T is the temperature of the system
• The energy terms are like the log likelihood of a Boltzmann distribution at T = 1
  – The derivation of this probability is in fact quite trivial

Continuing the Boltzmann analogy
    E(y) = -½ yᵀWy - bᵀy        P(y) = C exp(-E(y) / kT)
• The system probabilistically selects states with lower energy
  – With infinitesimally slow cooling, at T = 0 it arrives at the global minimal state

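For a network small enough to enumerate, the Boltzmann distribution can be computed exactly. The sketch below (an added illustration, taking k = 1 and arbitrary stored patterns) shows the probability mass concentrating on the lowest-energy states as T decreases.

```python
# Sketch: enumerate all states of a small net and form P(y) = exp(-E(y)/T) / Z.
import itertools
import numpy as np

rng = np.random.default_rng(5)
N = 6
Y = rng.choice([-1.0, 1.0], size=(N, 2))
W = Y @ Y.T
E = lambda y: -0.5 * y @ W @ y

states = [np.array(b) for b in itertools.product([-1.0, 1.0], repeat=N)]
for T in (5.0, 1.0, 0.1):
    logits = np.array([-E(y) / T for y in states])
    P = np.exp(logits - logits.max())
    P /= P.sum()                               # the normalizer plays the role of C
    print(T, round(float(P.max()), 3))         # lower T -> mass piles onto the lowest-energy states
```
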