  1. Lecture 40 – final exam review Mark Hasegawa-Johnson 5/6/2020

  2. Some sample problems
  • DNNs: Practice Final, question 23
  • Reinforcement learning: Practice Final, question 24
  • Games: Practice Final, question 25
  • Game theory: Practice Final, question 26

  3. Practice Exam, question 23
  You have a two-layer neural network trained as an animal classifier. The input feature vector is y = [y_1, y_2, y_3, 1], where y_1, y_2, and y_3 are some features, and the 1 is multiplied by the bias. There are two hidden nodes, h_1 and h_2, and three output nodes, z* = [z_1*, z_2*, z_3*], corresponding to the three output classes z_1* = Pr(dog|y), z_2* = Pr(cat|y), z_3* = Pr(skunk|y). Hidden node activations are sigmoid; output node activations are softmax.
  [Figure: network diagram showing inputs y_1, y_2, y_3, 1, weights x_ij, hidden nodes h_1, h_2, and outputs z_1*, z_2*, z_3*. Photo by http://www.birdphotos.com – Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=4409510]

  4. Practice Exam, question 23
  (a) A Maltese puppy has feature vector y = [2, 20, −1, 1]. All weights and biases are initialized to zero. What is z*?

  5. Practice Exam, question 23
  (a) A Maltese puppy has feature vector y = [2, 20, −1, 1]. All weights and biases are initialized to zero. What is z*?
  Hidden node excitations are both:
  0 · y = 0
  Therefore, hidden node activations are both:
  1 / (1 + e^−0) = 1 / (1 + 1) = 1/2

  6. Practice Exam, question 23
  (a) A Maltese puppy has feature vector y = [2, 20, −1, 1]. All weights and biases are initialized to zero. What is z*?
  Output node excitations are all:
  0 · h = 0
  Therefore, output node activations are all:
  e^0 / ∑_{k=1}^{3} e^0 = 1/3
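
A minimal sketch of this forward pass in Python (the language the deck itself mentions); the array shapes and the way the hidden-layer output is extended with a constant 1 for the output bias are illustrative assumptions, not the exam's notation:

```python
import numpy as np

def sigmoid(e):
    return 1.0 / (1.0 + np.exp(-e))

def softmax(g):
    e = np.exp(g - np.max(g))   # subtract max for numerical stability
    return e / e.sum()

y = np.array([2.0, 20.0, -1.0, 1.0])        # features plus the constant 1 for the bias
W_hidden = np.zeros((2, 4))                 # 2 hidden nodes; all weights/biases zero
W_output = np.zeros((3, 3))                 # 3 output nodes, fed by [h_1, h_2, 1]

h = sigmoid(W_hidden @ y)                   # excitations 0  ->  activations 1/2
z = softmax(W_output @ np.append(h, 1.0))   # excitations 0  ->  activations 1/3

print(h)   # [0.5 0.5]
print(z)   # [0.333... 0.333... 0.333...]
```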

  7. Practice Exam, question 23
  (b) Let x_ij be the weight connecting the ith output node to the jth hidden node. What is ∂z_i*/∂x_ij? Write your answer in terms of z_i*, x_ij, and/or h_j, for appropriate values of i and/or j.

  8. Practice Exam, question 23
  (b) What is ∂z_i*/∂x_ij? Answer: OK, first we need the definition of softmax. Let's write it in lots of parts, so it will be easier to differentiate.
  z_i* = num / den
  where "num" is the numerator of the softmax function:
  num = exp(g_i)
  "den" is the denominator of the softmax function:
  den = ∑_{k=1}^{3} exp(g_k)
  and both of those are written in terms of the softmax excitations, let's call them g_i:
  g_i = ∑_j x_ij · h_j

  9. Practice Exam, question 23
  (b) What is ∂z_i*/∂x_ij? Now we differentiate each part:
  ∂z_i*/∂x_ij = (1/den) · (∂num/∂x_ij) − (num/den²) · (∂den/∂x_ij)
  ∂num/∂x_ij = exp(g_i) · (∂g_i/∂x_ij)
  ∂den/∂x_ij = ∑_{k=1}^{3} exp(g_k) · (∂g_k/∂x_ij) = exp(g_i) · (∂g_i/∂x_ij), since ∂g_k/∂x_ij = 0 for k ≠ i
  ∂g_i/∂x_ij = h_j

  10. Practice Exam, question 23
  (b) What is ∂z_i*/∂x_ij? Putting it all back together again:
  ∂z_i*/∂x_ij = (exp(g_i) / ∑_{k=1}^{3} exp(g_k)) · h_j − (exp(g_i) / ∑_{k=1}^{3} exp(g_k))² · h_j
  ∂z_i*/∂x_ij = z_i* · h_j − (z_i*)² · h_j
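
A quick numerical check of this result, using small made-up values for the hidden activations and output weights (everything below is illustrative, and the output bias is ignored since it does not affect ∂z_i*/∂x_ij; the exam only asks for the symbolic answer):

```python
import numpy as np

def softmax(g):
    e = np.exp(g - np.max(g))
    return e / e.sum()

h = np.array([0.5, 0.5])                 # hidden activations
x = np.array([[0.2, -0.1],               # x[i, j]: output node i <- hidden node j
              [0.0,  0.3],
              [0.4,  0.1]])
i, j = 0, 1                              # which weight to perturb

z = softmax(x @ h)
analytic = z[i] * (1 - z[i]) * h[j]      # z_i*·h_j − (z_i*)²·h_j, factored

eps = 1e-6                               # finite-difference check
x2 = x.copy(); x2[i, j] += eps
numeric = (softmax(x2 @ h)[i] - z[i]) / eps

print(analytic, numeric)                 # should agree to ~6 decimal places
```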

  11. Some sample problems
  • DNNs: Practice Final, question 23
  • Reinforcement learning: Practice Final, question 24
  • Games: Practice Final, question 25
  • Game theory: Practice Final, question 26

  12. Practice Exam, question 24
  A cat lives in a two-room apartment. It has two possible actions: purr, or walk. It starts in room s_0 = 1, where it receives the reward r_0 = 2 (petting). It then implements the following sequence of actions: a_0 = walk, a_1 = purr. In response, it observes the following sequence of states and rewards: s_1 = 2, r_1 = 5 (food), s_2 = 2.

  13. Practice Exam, question 24
  (a) The cat starts out with a Q-table whose entries are all Q(s,a) = 0.
  • …then performs one iteration of TD-learning using each of the two SARS sequences described above.
  • …it uses a relatively high learning rate (alpha = 0.05) and a relatively low discount factor (gamma = 3/4).
  Which entries in the Q-table have changed, after this learning, and what are their new values?

  14. Practice Exam, question 24
  Time step 0: SARS = (1, walk, 2, 2)
  Q_local = R(1) + γ · max_a Q(2, a) = 2 + (3/4) · max(0, 0) = 2
  Q(1, walk) = Q(1, walk) + α · (Q_local − Q(1, walk)) = 0 + 0.05 · (2 − 0) = 0.1
  Time step 1: SARS = (2, purr, 5, 2)
  Q_local = R(2) + γ · max_a Q(2, a) = 5 + (3/4) · max(0, 0) = 5
  Q(2, purr) = Q(2, purr) + α · (Q_local − Q(2, purr)) = 0 + 0.05 · (5 − 0) = 0.25
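
The same two updates as a minimal Python sketch; the dictionary-based Q-table and the variable names are illustrative assumptions, not part of the exam:

```python
alpha, gamma = 0.05, 0.75
actions = ("purr", "walk")

Q = {(s, a): 0.0 for s in (1, 2) for a in actions}   # all entries start at 0

# Each experience tuple is (s, a, r, s'), as in the SARS sequences above.
episode = [(1, "walk", 2, 2), (2, "purr", 5, 2)]

for s, a, r, s_next in episode:
    q_local = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (q_local - Q[(s, a)])

print(Q[(1, "walk")], Q[(2, "purr")])    # 0.1, 0.25
```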

  15. Practice Exam, question 24
  (b) The cat decides, instead, to use model-based learning. Based on these two observations, it estimates P(s'|s,a) with Laplace smoothing, where the smoothing constant is k=1. Find P(s'|2,purr).
  Time step 0: SARS = (1, walk, 2, 2)
  Time step 1: SARS = (2, purr, 5, 2)

  16. Practice Exam, question 24
  (b) Find P(s'|2,purr).
  P(s'=1 | s=2, a=purr) = (1 + Count(s=2, a=purr, s'=1)) / (2 + ∑_{s'} Count(s=2, a=purr, s')) = (1 + 0) / (2 + 1) = 1/3
  P(s'=2 | s=2, a=purr) = (1 + Count(s=2, a=purr, s'=2)) / (2 + ∑_{s'} Count(s=2, a=purr, s')) = (1 + 1) / (2 + 1) = 2/3
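
The same computation as a short Python sketch; the count dictionary below simply encodes the single observed (s=2, a=purr, s'=2) transition, and the names are illustrative:

```python
k = 1                         # Laplace smoothing constant
states = (1, 2)

# counts[s_next]: how often (s=2, a=purr) was followed by s_next
counts = {1: 0, 2: 1}         # only time step 1 matches (s=2, a=purr)

total = sum(counts.values())
P = {s2: (k + counts[s2]) / (k * len(states) + total) for s2 in states}

print(P)                      # {1: 0.333..., 2: 0.666...}
```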

  17. Practice Exam, question 24
  (c) The cat estimates R(1)=2, R(2)=5, and the following P(s'|s,a) table. It chooses the policy π(1)=purr, π(2)=walk. What is the policy-dependent utility of each room? Write two equations in the two unknowns U(1) and U(2); don't solve.

                 a=purr          a=walk
                 s=1    s=2      s=1    s=2
        s'=1     2/3    1/3      1/3    2/3
        s'=2     1/3    2/3      2/3    1/3

  18. Practice Exam, question 24
  (c) Answer: policy-dependent utility is just like Bellman's equation, but without the max operation. The equations are
  U(1) = R(1) + γ · ∑_{s'} P(s' | s=1, π(1)) · U(s')
  U(2) = R(2) + γ · ∑_{s'} P(s' | s=2, π(2)) · U(s')
  (P(s'|s,a) table as on the previous slide.)

  19. Practice Exam, question 24
  (c) Answer: So to solve, we just plug in the values for all variables except U(1) and U(2):
  U(1) = 2 + (3/4) · ((2/3) · U(1) + (1/3) · U(2))
  U(2) = 5 + (3/4) · ((2/3) · U(1) + (1/3) · U(2))
  (P(s'|s,a) table as on the previous slide.)
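
The exam only asks you to write the two equations, but as a sanity check they can be solved with a couple of lines of numpy; the matrix form below is just one way to organize the same system, not part of the required answer:

```python
import numpy as np

gamma = 0.75
R = np.array([2.0, 5.0])

# P_pi[s-1, s2-1] = P(s2 | s, pi(s)) for the policy pi(1)=purr, pi(2)=walk
P_pi = np.array([[2/3, 1/3],
                 [2/3, 1/3]])

# U = R + gamma * P_pi @ U   =>   (I - gamma * P_pi) @ U = R
U = np.linalg.solve(np.eye(2) - gamma * P_pi, R)
print(U)                      # U(1) = 11, U(2) = 14
```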

  20. Practice Exam, question 24
  (d) Since it has some extra time, and excellent Python programming skills, the cat decides to implement deep reinforcement learning, using an actor-critic algorithm. Inputs are one-hot encodings of state and action. What are the input and output dimensions of the actor network, and of the critic network?

  21. Practice Exam, question 24
  (d) The actor network computes π_a(s) = the probability that action a is the best action, where a=1 or a=2. So the output has two dimensions. The input is the state, s. If there are two states, encoded using a one-hot vector, then state 1 is encoded as s = [1,0] and state 2 is encoded as s = [0,1]. So the input also has two dimensions.
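
A minimal sketch of an actor network with these dimensions, using a single linear layer and numpy only; the zero initialization and the absence of hidden layers are illustrative simplifications, not part of the question:

```python
import numpy as np

def softmax(g):
    e = np.exp(g - np.max(g))
    return e / e.sum()

W = np.zeros((2, 2))          # 2 action outputs  x  2-dimensional one-hot state input
s = np.array([1.0, 0.0])      # one-hot encoding of state 1

pi = softmax(W @ s)           # pi[a] = probability assigned to action a
print(pi)                     # [0.5 0.5] with zero weights
```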
