
Reinforcement Learning: a gentle introduction & industrial application



  1. Reinforcement Learning: a gentle introduction & industrial application. Christian Hidber

  2. Learning: learning from children. SLIDE 2, REINFORCEMENT LEARNING

  3. The game: demo

  4. The game: setup. Goal: maximize the sum of rewards. [Diagram labels: game engine, step reward, actions, game state, learner] "This photo" by unknown author is licensed under CC BY-NC

  5. The game: positive feedback. [Diagram labels: game engine, step reward, actions, game state, learner]

  6. The game: negative feedback. [Diagram labels: game engine, step reward, actions, game state, learner]

  7. The learned stuff => policy. Policy (rules learned, how to play the game). [Diagram labels: game engine, step reward, actions, game state, policy, learner]

  8. Policy improvement => learning. Policy (rules learned, how to play the game). [Diagram labels: game engine, step reward, actions, game state, policy, learner]

  9. Policy improvement => learning. Policy (rules learned, how to play the game). [Diagram labels: game engine, step reward, actions, game state, policy, learner]

  10. Reinforcement learning. Key idea: continuously improve the policy to increase the total reward. Policy (rules learned, how to play the game). [Diagram labels: game engine, step reward, actions, game state, policy, RL algorithm]
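
The loop on slides 4 to 10 (the game engine hands out a game state and a step reward, the learner answers with an action) can be sketched as below. This is only an illustration: `ToyGameEngine`, its corridor layout, and the `reset`/`step` method names are assumptions made for this sketch and do not come from the slides; only the reward values -1, +100 and -50 are borrowed from the later episode slides.

```python
import random


class ToyGameEngine:
    """Hypothetical stand-in for the game engine: a corridor of cells 0..4.

    The player starts in cell 2. Cell 4 is the goal (+100), cell 0 is a
    trap (-50); both end the episode. Every other move costs -1.
    """

    ACTIONS = ("left", "right")

    def reset(self):
        self.position = 2
        return self.position  # the game state shown to the learner

    def step(self, action):
        self.position += 1 if action == "right" else -1
        if self.position == 4:
            return self.position, 100, True
        if self.position == 0:
            return self.position, -50, True
        return self.position, -1, False


def play_episode(engine, choose_action, max_steps=50):
    """Run one episode and return the total reward the learner collected."""
    state = engine.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = choose_action(state)               # learner -> game engine
        state, reward, done = engine.step(action)   # game engine -> learner
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    """An untrained learner: ignores the state and picks any action."""
    return random.choice(ToyGameEngine.ACTIONS)


print(play_episode(ToyGameEngine(), random_policy))
```

The learner is passed in as a plain function from state to action, so a learned policy can later replace `random_policy` without touching the loop.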

  11. Episode 1: play with 1st policy (random). Step 1: reward -1. [Table columns: State, Action from Policy, Next State, Reward; step axis from 1 to 7]

  12. Episode 1: play with 1st policy (random). Step 2: reward 100.

  13. Episode 1: play with 1st policy (random). Step 3: reward -50, episode over.

  14. Episode 1: improve 1st policy for the state in step 3. Future reward: -50.

  15. Episode 1: improve 1st policy for the state in step 2. Future reward (sum of all rewards from the current state until 'game over'): 50 (= +100 - 50).

  16. Episode 1: improve 1st policy for the state in step 1. Future reward (sum of all rewards from the current state until 'game over'): 49 (= -1 + 100 - 50). M3, OCTOBER 2018
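
The future rewards for episode 1 can be reproduced with a few lines of code. The sketch below assumes the three step rewards of episode 1 are -1, +100 and -50, as the sums on the slides suggest; the function name `future_rewards` is made up for this example.

```python
def future_rewards(rewards):
    """Return, for each step, the sum of rewards from that step to the end."""
    result = []
    running_sum = 0
    # Walk backwards so each step can reuse the sum of the steps after it.
    for reward in reversed(rewards):
        running_sum += reward
        result.append(running_sum)
    result.reverse()
    return result


episode_1_rewards = [-1, 100, -50]
print(future_rewards(episode_1_rewards))  # [49, 50, -50]
```

Walking the episode backwards means each reward is added only once, instead of re-summing the tail of the episode for every step.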

  17. Episode 2: play with 2nd policy. Already learned: go left is ok. Step 1: reward -1.

  18. Episode 2: play with 2nd policy. Already learned: go left is ok. Step 2: reward 100.

  19. Episode 2: play with 2nd policy. Already learned: don't go up. Step 3: reward 100.

  20. Episode 2: play with 2nd policy. Step 4: reward -1.

  21. Episode 2: play with 2nd policy. Step 5: reward -50, episode over.

  22. Episode 2: improve 2nd policy for the state in step 5. Future reward (sum of all rewards from the current state until 'game over'): -50 (= -50).

  23. Episode 2: improve 2nd policy for the state in step 4. Future reward: -51 (= -1 - 50).

  24. Episode 2: improve 2nd policy for the state in step 3. Future reward: 49 (= +100 - 1 - 50).

  25. Episode 2: improve 2nd policy for the state in step 2. Future reward: 149 (= +100 + 100 - 1 - 50).

  26. Episode 2: improve 2nd policy for the state in step 2. Policy = some running average of old and new value. Future reward: 149 (= +100 + 100 - 1 - 50).

  27. Episode 2: improve 2nd policy for the state in step 1. Future reward: 148 (= -1 + 100 + 100 - 1 - 50).
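
Slide 26 only says "some running average of old and new value" and does not specify which one. One common choice is an exponential moving average, sketched below. The step size `alpha` and the function name are assumptions; the example also assumes that the state at step 2 is the same in both episodes, so that the old value is the 50 from episode 1 and the new value is the 149 from episode 2.

```python
def running_average(old_value, new_value, alpha=0.5):
    """Move the stored value a fraction alpha of the way toward the new one."""
    return old_value + alpha * (new_value - old_value)


old_estimate = 50   # future reward seen at step 2 in episode 1
new_sample = 149    # future reward seen at step 2 in episode 2
print(running_average(old_estimate, new_sample))  # 99.5
```

A small `alpha` trusts the accumulated history more, a large `alpha` trusts the latest episode more.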

  28. So far: a policy is a map from states to action probabilities.

  29. A policy is a map from states to action probabilities, updated by the reinforcement learning algorithm.

  30. A policy is a map from states to action probabilities, updated by the reinforcement learning algorithm.

  31. After many, many episodes, for each state…
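
A policy in the sense of slides 28 to 30 can be written down as a plain lookup table. The sketch below is illustrative only: the state names, the four actions, and the probability values are invented, since the slides do not list them.

```python
import random

# Hypothetical actions and states; the slides do not name them explicitly.
ACTIONS = ["left", "right", "up", "down"]

policy = {
    "start":      [0.25, 0.25, 0.25, 0.25],  # untrained: all actions equally likely
    "near_cliff": [0.60, 0.30, 0.02, 0.08],  # trained: "up" has become unlikely
}


def choose_action(policy, state, rng=random):
    """Sample one action according to the probabilities stored for `state`."""
    return rng.choices(ACTIONS, weights=policy[state], k=1)[0]


print(choose_action(policy, "near_cliff"))
```

Playing the game with this policy means calling `choose_action` once per step; learning means changing the numbers in the table.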

  32. Algorithm sketch. Policy (rules learned, how to play the game)

      Initialize table with random action probabilities for each state
      Repeat
          play episode with policy given by table
          record (state_1, action_1, reward_1), (state_2, action_2, reward_2), … for the episode
          For each step i
              compute FutureReward_i = reward_i + reward_{i+1} + …
              update table[state_i] such that
                  • action_i becomes more likely for state_i if FutureReward_i is "high"
                  • action_i becomes less likely for state_i if FutureReward_i is "low"

  33. Algorithm sketch (same text as slide 32)

  34. Algorithm sketch (same text as slide 32)
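
The algorithm sketch can be turned into a small runnable program. Everything specific below is an assumption made for illustration, not the speaker's implementation: the toy corridor game, the use of per-state preferences turned into probabilities by a softmax, the step sizes, and the choice to call a future reward "high" or "low" relative to a running average for that state. For simplicity the table starts with equal preferences, which gives a uniformly random first policy.

```python
import math
import random

ACTIONS = ("left", "right")


def probabilities(table, state):
    """Turn the stored preferences for `state` into action probabilities (softmax)."""
    top = max(table[state])
    exps = [math.exp(p - top) for p in table[state]]
    total = sum(exps)
    return [e / total for e in exps]


def play_episode(table, rng, max_steps=20):
    """Toy corridor (an assumption, not the game from the slides).

    Cells 0..4, start in cell 2. Reaching cell 4 pays +100, reaching cell 0
    pays -50, and both end the episode; every other move pays -1.
    Returns the recorded list of (state, action, reward) triples.
    """
    state, trajectory = 2, []
    for _ in range(max_steps):
        action = rng.choices(range(len(ACTIONS)), weights=probabilities(table, state))[0]
        next_state = state + (1 if ACTIONS[action] == "right" else -1)
        reward = 100 if next_state == 4 else -50 if next_state == 0 else -1
        trajectory.append((state, action, reward))
        state = next_state
        if state in (0, 4):
            break
    return trajectory


def train(episodes=2000, step_size=0.01, seed=0):
    rng = random.Random(seed)
    table = {s: [0.0, 0.0] for s in range(5)}  # equal preferences: random policy
    average = {s: 0.0 for s in range(5)}       # running average of future reward
    for _ in range(episodes):
        trajectory = play_episode(table, rng)
        future_reward = 0.0
        for state, action, reward in reversed(trajectory):
            future_reward += reward  # reward_i + reward_{i+1} + ...
            # "high" or "low" is judged against the running average for this state.
            table[state][action] += step_size * (future_reward - average[state])
            average[state] += 0.1 * (future_reward - average[state])
    return table


table = train()
for state in (1, 2, 3):
    print(state, [round(p, 3) for p in probabilities(table, state)])
```

Storing preferences instead of raw probabilities keeps every row a valid probability distribution after each update, without any renormalization step.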

  35. The game: demo

  36. The bad news: nice idea, but… "Image" licensed according to CC BY-SA

  37. The bad news: nice idea, but… too many states, too many actions:
      • Too much memory needed
      • Too much time

  38. The solution. Policy (rules learned, how to play the game). Idea: replace the lookup table with a neural network that approximates the action probabilities contained in the table.
      • Instead of Table[state] = action probabilities, do NeuralNet(state) ~ action probabilities
      • Change "play episode with policy given by table" to "play episode with policy given by NeuralNet"
      • Change "update table[state_i]" to "update weights of NeuralNet"
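
To make the table-to-network swap concrete, here is a minimal sketch in plain Python. It uses the smallest possible "network", a single linear layer followed by a softmax, and a REINFORCE-style weight update; both are assumptions chosen for brevity, as the slides name neither an architecture nor an update rule. The state encoding as a three-number feature vector and the count of four actions are likewise hypothetical.

```python
import math


def action_probabilities(weights, state):
    """NeuralNet(state): one linear layer followed by a softmax."""
    logits = [sum(w * x for w, x in zip(row, state)) for row in weights]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update_weights(weights, state, action, future_reward, learning_rate=0.01):
    """Nudge the weights so `action` becomes more likely in `state` when
    future_reward is positive, and less likely when it is negative."""
    probs = action_probabilities(weights, state)
    for a, row in enumerate(weights):
        # Gradient of log pi(action | state) with respect to the logit of a.
        grad_logit = (1.0 if a == action else 0.0) - probs[a]
        for j, x in enumerate(state):
            row[j] += learning_rate * future_reward * grad_logit * x


# Hypothetical encoding: the state is a feature vector, here (x, y, bias).
state = [0.5, -1.0, 1.0]
weights = [[0.0, 0.0, 0.0] for _ in range(4)]  # 4 actions, 3 features

before = action_probabilities(weights, state)
update_weights(weights, state, action=2, future_reward=50)
after = action_probabilities(weights, state)
print(before[2], after[2])
```

The memory cost is now the number of weights (here 12) rather than one table row per state, and states that look similar share what has been learned.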
