
Optimal Learning Rate - PowerPoint PPT Presentation



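  1. Optimal Learning Rate
     • What is the optimal value $\eta_{opt}$ of the learning rate?
     Consider the 1-dim. case. Use a first-order Taylor expansion around the current weight $w_c$:
     $$E(w) = E(w_c) + (w - w_c)\,\frac{\partial E(w_c)}{\partial w}.$$
     Differentiating with respect to $w$, and retaining the next-order term of the expansion so that the curvature appears, gives:
     $$\frac{\partial E(w)}{\partial w} = \frac{\partial E(w_c)}{\partial w} + (w - w_c)\,\frac{\partial^2 E(w_c)}{\partial w^2}.$$
     Setting $w = w_{min}$ and noting that $\frac{\partial E(w_{min})}{\partial w} = 0$, one obtains
     $$0 = \frac{\partial E(w_c)}{\partial w} + (w_{min} - w_c)\,\frac{\partial^2 E(w_c)}{\partial w^2}.$$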

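  2. Optimal Learning Rate (cont.)
     $$w_{min} = w_c - \underbrace{\left(\frac{\partial^2 E(w_c)}{\partial w^2}\right)^{-1}}_{\eta_{opt}} \frac{\partial E(w_c)}{\partial w}$$
     [Figure: two plots of $E(w)$ against $w$ with $w_{min}$ marked, one for $\eta < \eta_{opt}$ and one for $\eta = \eta_{opt}$.]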
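The formula for $\eta_{opt}$ can be tried out numerically. The following is a minimal sketch, assuming a hypothetical quadratic error $E(w) = 2(w - 3)^2 + 1$ (not from the slides); the names `newton_step`, `grad` and `hess` are illustrative only.

```python
def newton_step(grad, hess, w_c):
    """One gradient step from w_c using eta_opt = 1 / E''(w_c)."""
    eta_opt = 1.0 / hess(w_c)
    return w_c - eta_opt * grad(w_c), eta_opt

# Hypothetical quadratic error E(w) = 2 * (w - 3)^2 + 1 with minimum at w = 3.
grad = lambda w: 4.0 * (w - 3.0)   # first derivative dE/dw
hess = lambda w: 4.0               # second derivative d^2E/dw^2

w_min, eta_opt = newton_step(grad, hess, w_c=10.0)

# A smaller learning rate (eta < eta_opt) moves towards the minimum
# but does not reach it in a single step.
w_small = 10.0 - 0.1 * grad(10.0)

print(w_min, eta_opt, w_small)
```

For a quadratic error the step with $\eta_{opt}$ lands on the minimum at once; for a general error function the expression is only a local approximation.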

  3. Hopfield Network Introductory Example
     • Suppose we want to store N binary images in some memory.
     • The memory should be content-addressable and insensitive to small errors.
     • We present corrupted images to the memory (e.g. our brain) and recall the corresponding images.
     [Figure: presentation of corrupted images; images recalled by the memory.]

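  4. Hopfield Network
     • $w_{ij}$ denotes the weight of the connection from unit $j$ to unit $i$
     • no unit has a connection with itself: $w_{ii} = 0, \ \forall i$
     • connections are symmetric: $w_{ij} = w_{ji}, \ \forall i, j$
     [Figure: network of five units $S_1, \dots, S_5$; one connection is labeled $w_{51} = w_{15}$.]
     The state of unit $i$ can take the values $\pm 1$ and is denoted $S_i$. State dynamics are governed by the activity rule:
     $$S_i = \operatorname{sgn}\Big(\sum_j w_{ij} S_j\Big), \quad \text{where } \operatorname{sgn}(a) = \begin{cases} +1 & \text{if } a \ge 0, \\ -1 & \text{if } a < 0. \end{cases}$$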
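The activity rule can be sketched in a few lines of NumPy. This is an illustration under assumptions: the 3-unit weight matrix and the state vector below are hypothetical, and the helper names `sgn` and `update_unit` are not from the slides.

```python
import numpy as np

def sgn(a):
    """Sign function with sgn(0) = +1, as in the activity rule."""
    return np.where(np.asarray(a) >= 0, 1, -1)

def update_unit(W, S, i):
    """Return the new state of unit i: sgn(sum_j w_ij * S_j)."""
    return int(sgn(W[i] @ S))

# Hypothetical 3-unit network: symmetric weights, zero diagonal.
W = np.array([[0.0, 1.0, -1.0],
              [1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])
S = np.array([1, -1, 1])

new_S0 = update_unit(W, S, 0)   # activation of unit 0 is 0*1 + 1*(-1) + (-1)*1 = -2
print(new_S0)
```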

  5. Learning Rule in a Hopfield Network
     Learning in Hopfield networks:
     • Store a set of desired memories $\{x^{(n)}\}$ in the network, where each memory is a binary pattern with $x_i \in \{-1, +1\}$.
     • The weights are set using the sum of outer products
     $$w_{ij} = \frac{1}{N} \sum_n x_i^{(n)} x_j^{(n)},$$
     where $N$ denotes the number of units ($N$ can also be some positive constant, e.g. the number of patterns).
     Given an $m \times 1$ column vector $a$ and a $1 \times n$ row vector $b$, the outer product $a \otimes b$ (short: $a\,b$) is defined as the $m \times n$ matrix; for $m = n = 3$:
     $$\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} \otimes [\, b_1 \ b_2 \ b_3 \,] = \begin{bmatrix} a_1 b_1 & a_1 b_2 & a_1 b_3 \\ a_2 b_1 & a_2 b_2 & a_2 b_3 \\ a_3 b_1 & a_3 b_2 & a_3 b_3 \end{bmatrix}$$
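The learning rule translates directly into a sum of `np.outer` calls. The sketch below assumes two hypothetical 4-unit patterns chosen for illustration; the function name `hebbian_weights` and its `norm` parameter are assumptions, and the diagonal is zeroed because no unit is connected to itself.

```python
import numpy as np

def hebbian_weights(patterns, norm=None):
    """Sum of outer products of the patterns, divided by `norm`.

    `norm` defaults to the number of units N; any positive constant works.
    """
    X = np.asarray(patterns, dtype=float)   # shape: (num_patterns, num_units)
    if norm is None:
        norm = X.shape[1]
    W = sum(np.outer(x, x) for x in X) / norm
    np.fill_diagonal(W, 0.0)                # enforce w_ii = 0
    return W

# Hypothetical patterns (not from the slides).
patterns = [[+1, +1, -1, -1],
            [+1, -1, +1, -1]]
W = hebbian_weights(patterns)
print(W)
```

Because each outer product $x \otimes x$ is symmetric, the resulting weight matrix satisfies $w_{ij} = w_{ji}$ without any extra step.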

  6. Learning in Hopfield Network (Example)
     Suppose we want to store the patterns $x^{(1)} = [-1, +1, -1]$ and $x^{(2)} = [+1, -1, +1]$.
     $$\begin{bmatrix} -1 \\ +1 \\ -1 \end{bmatrix} \otimes [-1, +1, -1] = \begin{bmatrix} +1 & -1 & +1 \\ -1 & +1 & -1 \\ +1 & -1 & +1 \end{bmatrix}$$
     $$+$$
     $$\begin{bmatrix} +1 \\ -1 \\ +1 \end{bmatrix} \otimes [+1, -1, +1] = \begin{bmatrix} +1 & -1 & +1 \\ -1 & +1 & -1 \\ +1 & -1 & +1 \end{bmatrix}$$

  7. Learning in Hopfield Network (Example) (cont.)
     $$W = \frac{1}{3} \begin{bmatrix} 0 & -2 & +2 \\ -2 & 0 & -2 \\ +2 & -2 & 0 \end{bmatrix}$$
     Recall: no unit has a connection with itself.
     The storage of patterns in the network can also be interpreted as constructing stable states. The condition for a pattern to be stable is:
     $$\operatorname{sgn}\Big(\sum_j w_{ij} x_j\Big) = x_i, \quad \forall i.$$
     Suppose we present pattern $x^{(1)}$ to the network and want to restore the corresponding pattern.
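The stability condition can be checked numerically for the example. A minimal sketch, using the weight matrix $W$ and the two stored patterns of the example; the helper name `is_stable` and the extra test pattern `[+1, +1, +1]` are assumptions added for illustration.

```python
import numpy as np

# Weight matrix of the example (zero diagonal, factor 1/3).
W = np.array([[0, -2, 2],
              [-2, 0, -2],
              [2, -2, 0]]) / 3.0

def is_stable(W, x):
    """True if sgn(sum_j w_ij * x_j) == x_i for every unit i (sgn(0) = +1)."""
    x = np.asarray(x)
    return bool(np.array_equal(np.where(W @ x >= 0, 1, -1), x))

x1 = [-1, +1, -1]
x2 = [+1, -1, +1]
other = [+1, +1, +1]   # hypothetical pattern that was never stored

print(is_stable(W, x1), is_stable(W, x2), is_stable(W, other))
```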

  8. Learning in Hopfield Network (Example) (cont.)
     Let us assume that the network states are set as follows: $S_i = x_i, \ \forall i$. We can restore pattern $x^{(1)} = [-1, +1, -1]$ as follows:
     $$S_1 = \operatorname{sgn}\Big(\sum_{j=1}^{3} w_{1j} S_j\Big) = -1, \qquad S_2 = \operatorname{sgn}\Big(\sum_{j=1}^{3} w_{2j} S_j\Big) = +1, \qquad S_3 = \operatorname{sgn}\Big(\sum_{j=1}^{3} w_{3j} S_j\Big) = -1$$
     Can we also restore the original patterns by presenting “similar” patterns which are corrupted by noise?
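The question about noisy inputs can be explored on the same tiny example. The sketch below is an assumption-laden illustration: the function `recall`, the fixed update order and the choice of corrupted input (first component of $x^{(1)}$ flipped) are not from the slides. With only three units, such a toy network says little about recall quality in general.

```python
import numpy as np

# Weight matrix of the 3-unit example.
W = np.array([[0, -2, 2],
              [-2, 0, -2],
              [2, -2, 0]]) / 3.0

def recall(W, S, max_sweeps=10):
    """Update one unit at a time in a fixed order until no state changes."""
    S = np.array(S, dtype=int)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(S)):
            new = 1 if W[i] @ S >= 0 else -1
            if new != S[i]:
                S[i] = new
                changed = True
        if not changed:
            break
    return S

noisy = [+1, +1, -1]          # x(1) = [-1, +1, -1] with the first component flipped
restored = recall(W, noisy)
print(restored)
```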

  9. Updating States in a Hopfield Network
     Synchronous updates:
     • all units update their states $S_i = \operatorname{sgn}\big(\sum_j w_{ij} S_j\big)$ simultaneously.
     Asynchronous updates:
     • one unit at a time updates its state. The sequence of selected units may be a fixed sequence or a random sequence.
     Synchronously updating states can lead to oscillation (no convergence to a stable state).
     [Figure: two connected units with states $S_1 = +1$ and $S_2 = -1$; the connections are labeled 1.]
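The oscillation can be reproduced with two units. This sketch assumes that the two labels "1" in the figure are the weights $w_{12} = w_{21} = 1$; the helper names `sync_step` and `async_step` are illustrative.

```python
import numpy as np

# Assumed reading of the figure: two units joined by weight 1 in both directions.
W = np.array([[0, 1],
              [1, 0]])

def sgn(a):
    return np.where(a >= 0, 1, -1)

def sync_step(W, S):
    """All units update simultaneously from the same old state."""
    return sgn(W @ S)

def async_step(W, S, i):
    """Only unit i updates; the other states are kept."""
    S = S.copy()
    S[i] = 1 if W[i] @ S >= 0 else -1
    return S

S0 = np.array([+1, -1])
S1 = sync_step(W, S0)        # both units flip
S2 = sync_step(W, S1)        # and flip back: a cycle of length 2

A1 = async_step(W, S0, 0)    # unit 1 adopts the sign of unit 2
A2 = async_step(W, A1, 1)    # unit 2 keeps its state: stable

print(S1, S2, A1, A2)
```

Under synchronous updates the state alternates between $(+1, -1)$ and $(-1, +1)$, while the asynchronous sequence settles in $(-1, -1)$.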

  10. Aim of a Hopfield Network
      Our aim is that by presenting a corrupted pattern, and by iteratively applying the state update rule, the Hopfield network will settle down in a stable state which corresponds to the desired pattern.
      A Hopfield network is a method for
      • pattern completion
      • error correction.
      The state of a Hopfield network can be expressed in terms of the energy function
      $$E = -\frac{1}{2} \sum_{i,j} w_{ij} S_i S_j$$
      Hopfield observed that if a state is a local minimum in the energy function, it is also a stable state for the network.
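The energy function is a quadratic form and is easy to evaluate. The sketch below reuses the weight matrix of the 3-unit example and tracks the energy across one sweep of asynchronous updates; the function name `energy` and the starting state `[+1, +1, -1]` are assumptions made for illustration.

```python
import numpy as np

def energy(W, S):
    """E = -1/2 * sum_{i,j} w_ij * S_i * S_j."""
    S = np.asarray(S, dtype=float)
    return -0.5 * S @ W @ S

# Weight matrix of the 3-unit example (symmetric, zero diagonal).
W = np.array([[0, -2, 2],
              [-2, 0, -2],
              [2, -2, 0]]) / 3.0

S = np.array([+1, +1, -1])            # hypothetical corrupted start state
energies = [energy(W, S)]
for i in range(len(S)):               # one asynchronous sweep, fixed order
    S[i] = 1 if W[i] @ S >= 0 else -1
    energies.append(energy(W, S))

print(S, energies)
```

With symmetric weights and a zero diagonal, an asynchronous update never increases the energy, which is why the recorded values only go down or stay equal.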

  11. Basin of Attraction and Stable States
      [Figure: state space showing basins of attraction and the stable states.]
      Within the space the stored patterns $x^{(n)}$ are acting like attractors.

  12. Haykin’s Digit Example
      Suppose we stored the following digits in the Hopfield network:
      [Figure: the eight stored patterns; panel titles give each pattern and its energy.]
      Pattern 0: Energy = −67.73
      Pattern 1: Energy = −67.87
      Pattern 2: Energy = −82.33
      Pattern 3: Energy = −86.6
      Pattern 4: Energy = −77.73
      Pattern 6: Energy = −90.47
      Pattern 9: Energy = −83.13
      Pattern box: Energy = −66.93

  13. Updated States of Corrupted Digit 6
      [Figure: sequence of network states; panel titles give the updated unit and the resulting energy.]
      Start pattern: −10.27; updated unit 40: −12.2; updated unit 39: −13.6; updated unit 81: −14.87; updated unit 98: −15.87
      updated unit 80: −18.07; updated unit 12: −20.4; updated unit 114: −22.2; updated unit 115: −23.33; updated unit 49: −25.73
      updated unit 117: −26.8; updated unit 3: −29.67; updated unit 48: −30.13; updated unit 6: −31.47; updated unit 79: −34.4

  14. Updated States of Corrupted Digit 6 (cont.)
      [Figure: sequence of network states; panel titles give the updated unit and the resulting energy.]
      updated unit 113: −36.73; updated unit 57: −38.4; updated unit 103: −41.07; updated unit 18: −42.4; updated unit 109: −45.27
      updated unit 83: −47.6; updated unit 71: −50.4; updated unit 77: −52.67; updated unit 26: −56.47; updated unit 15: −58.4
      updated unit 31: −60.67; updated unit 58: −63.33; updated unit 16: −64.47; updated unit 29: −68; updated unit 88: −71.27

  15. Updated States of Corrupted Digit 6 (cont.)
      The resulting pattern (stable state with energy −90.47) matches the desired pattern.
      [Figure: sequence of network states; panel titles give the updated unit and the resulting energy.]
      updated unit 72: −73.73; updated unit 90: −77.27; updated unit 19: −81.47; updated unit 21: −84.27; updated unit 25: −87.33
      updated unit 73: −90.47; Original Pattern 6: −90.47

  16. Recall a Spurious Pattern
      [Figure: sequence of network states; panel titles give the updated unit and the resulting energy.]
      Start pattern: −28.27; updated unit 44: −28.27; updated unit 12: −30.27; updated unit 64: −31.93; updated unit 45: −32.8
      updated unit 98: −33.4; updated unit 111: −35.6; updated unit 50: −37.6; updated unit 81: −40; updated unit 95: −42.6
      updated unit 65: −44.53; updated unit 15: −44.8; updated unit 54: −48.13; updated unit 62: −50.53; updated unit 33: −51.87

  17. Recall a Spurious Pattern (cont.)
      [Figure: sequence of network states; panel titles give the updated unit and the resulting energy.]
      updated unit 37: −53.73; updated unit 91: −56.53; updated unit 58: −59.93; updated unit 84: −61.6; updated unit 43: −63.2
      updated unit 28: −63.73; updated unit 112: −66.8; updated unit 48: −67.6; updated unit 88: −69; updated unit 26: −70.4
      updated unit 73: −71.93; updated unit 70: −74.13; updated unit 40: −76.6; updated unit 117: −80.27; updated unit 106: −81.4

  18. Recall a Spurious Pattern (cont.)
      The Hopfield network settled down in a local minimum with energy −84.93. This pattern, however, is not the desired pattern. It is a pattern which was not stored in the network.
      [Figure: final network states; panel titles give the updated unit and the resulting energy.]
      updated unit 61: −84.8; updated unit 15: −84.93; Original Pattern 9: −83.13

  19. Incorrect Recall of Corrupted Pattern 2
      [Figure: sequence of network states; panel titles give the updated unit and the resulting energy.]
      Start pattern: −22.07; updated unit 97: −22.07; updated unit 17: −22.13; updated unit 58: −22.33; updated unit 45: −24.13
      updated unit 18: −24.53; updated unit 100: −27.6; updated unit 7: −28.33; updated unit 103: −29.87; updated unit 81: −31.47
      updated unit 68: −32.13; updated unit 86: −32.33; updated unit 119: −35.47; updated unit 33: −36.53; updated unit 87: −38.67

  20. Incorrect Recall of Corrupted Pattern 2 (cont.)
      [Figure: sequence of network states; panel titles give the updated unit and the resulting energy.]
      updated unit 57: −39.2; updated unit 73: −41.73; updated unit 120: −45.47; updated unit 104: −48; updated unit 43: −49.6
      updated unit 91: −51.6; updated unit 37: −51.67; updated unit 3: −55.6; updated unit 31: −56.4; updated unit 24: −58.27
      updated unit 101: −60.73; updated unit 41: −61.87; updated unit 117: −62.87; updated unit 65: −64.8; updated unit 10: −68.93
