Monte Carlo Methods
CS60077: Reinforcement Learning



1. Monte Carlo Methods
   CS60077: Reinforcement Learning
   Abir Das, IIT Kharagpur
   Sep 06 and 12, 2019

2. Agenda
   § Understand how to evaluate policies in a model-free setting using Monte Carlo methods
   § Understand Monte Carlo methods in a model-free setting for control of reinforcement learning problems

3. Resources
   § Reinforcement Learning by David Silver [Link]
   § Reinforcement Learning by Balaraman Ravindran [Link]
   § Monte Carlo Simulation by Nando de Freitas [Link]
   § SB: Chapter 5

4. Model-Free Setting
   § Like the previous few lectures, here also we will deal with prediction and control problems, but this time in a model-free setting
   § In the model-free setting we do not have full knowledge of the MDP
   § Model-free prediction: estimate the value function of an unknown MDP
   § Model-free control: optimise the value function of an unknown MDP
   § Model-free methods require only experience: sample sequences of states, actions, and rewards $(S_1, A_1, R_2, \dots)$ from actual or simulated interaction with an environment
   § Actual experience requires no knowledge of the environment's dynamics
   § Simulated experience requires a model only to generate samples; no knowledge of the complete probability distributions of state transitions is required, and in many cases this is easy to do

5. Monte Carlo
   § What is the probability that a dart thrown uniformly at random in the unit square will hit the red area?
   [Figure: unit square with corners (0,0), (1,0), (0,1), (1,1) containing a shaded red region; P(area) = ?]

6. Monte Carlo
   § What is the probability that a dart thrown uniformly at random in the unit square will hit the red area?
   § Since the dart is uniform over the unit square (which has area 1), the probability equals the area of the red region
   [Figure: unit square with the shaded red region; the slide gives the answer as a closed-form fraction]

7. Monte Carlo
   § What is the probability that a dart thrown uniformly at random in the unit square will hit the red area?
   [Figure: unit square with a different, curved red region; P(area) = ?]

8. Monte Carlo
   § What is the probability that a dart thrown uniformly at random in the unit square will hit the red area?
   § Again the probability equals the area of the red region
   [Figure: unit square with the curved red region; the slide's closed-form answer is an expression involving π]

9. Monte Carlo
   § What is the probability that a dart thrown uniformly at random in the unit square will hit the red area, when the red region has no simple closed-form area?
   [Figure: unit square with an irregular red region; P(area) = ?]

10. Monte Carlo
   § What is the probability that a dart thrown uniformly at random in the unit square will hit the red area?
   [Figure: unit square overlaid with a grid of boxes]
   P(area) ≈ (# red boxes) / (# blue boxes), i.e. the fraction of grid boxes covered by the red region

11. Monte Carlo
   § What is the probability that a dart thrown uniformly at random in the unit square will hit the red area?
   [Figure: unit square with randomly thrown darts marked ×, some landing inside the red region]
   P(area) ≈ (# darts in red area) / (# darts)
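A minimal sketch of this dart-throwing estimator. The specific red region below (a disc of radius 1/2 centred at (1/2, 1/2)) is an assumption for illustration; the slides' exact shape is not recoverable from the transcript:

```python
import random

def estimate_area(inside_red, n_darts=100_000):
    """Estimate the area of a region inside the unit square by
    throwing darts uniformly at random and counting hits."""
    hits = 0
    for _ in range(n_darts):
        x, y = random.random(), random.random()  # uniform on [0,1)^2
        if inside_red(x, y):
            hits += 1
    return hits / n_darts  # (# darts in red area) / (# darts)

# Illustrative red region (an assumption): disc of radius 1/2 at the centre.
# Its true area is pi/4 = 0.7853..., so the estimate also recovers pi.
in_disc = lambda x, y: (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25
print(estimate_area(in_disc))  # ~0.785 for large n_darts
```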

12. History of Monte Carlo
   § The bomb and ENIAC
   [Images taken from: www.livescience.com and www.digitaltrends.com]

13. Monte Carlo for Expectation Calculation
   § Let's say we want to compute $\mathbb{E}[f(x)] = \int f(x)\,p(x)\,dx$
   § Draw $N$ i.i.d. samples $\{x^{(i)}\}_{i=1}^{N}$ from the probability density $p(x)$
   § Approximate $p(x) \approx \frac{1}{N}\sum_{i=1}^{N} \delta_{x^{(i)}}(x)$, where $\delta_{x^{(i)}}(x)$ is an impulse at $x^{(i)}$ on the $x$ axis
   § Then
   $\mathbb{E}[f(x)] = \int f(x)\,p(x)\,dx \approx \int f(x)\,\frac{1}{N}\sum_{i=1}^{N}\delta_{x^{(i)}}(x)\,dx = \frac{1}{N}\sum_{i=1}^{N}\int f(x)\,\delta_{x^{(i)}}(x)\,dx = \frac{1}{N}\sum_{i=1}^{N} f\big(x^{(i)}\big)$
   [Image taken from: Nando de Freitas, MLSS 08]
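A minimal sketch of this estimator. The choices p = N(0, 1) and f(x) = x² are assumptions purely for illustration, not from the slides:

```python
import random

def mc_expectation(f, sample_p, n=100_000):
    """Approximate E[f(x)] = integral of f(x) p(x) dx by the empirical
    mean (1/N) * sum_i f(x_i) over i.i.d. samples x_i ~ p."""
    return sum(f(sample_p()) for _ in range(n)) / n

# Illustrative choices (assumptions): p = standard normal, f(x) = x^2.
# E[x^2] under a standard normal is 1, so the estimate should be ~1.
est = mc_expectation(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
print(est)  # ~1.0, with O(1/sqrt(N)) Monte Carlo error
```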

14. Monte Carlo Policy Evaluation
   § Learn $v_\pi$ from episodes of experience under policy $\pi$: $S_1, A_1, R_2, S_2, A_2, R_3, \dots, S_k \sim \pi$
   § Recall that the return is the total discounted reward: $G_t = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{T-1} R_T$
   § Recall that the value function is the expected return: $v_\pi(s) = \mathbb{E}[G_t \mid S_t = s]$
   § Monte Carlo policy evaluation uses the empirical mean return instead of the expected return

15. First-Visit Monte Carlo Policy Evaluation
   § To evaluate state $s$, i.e. to learn $v_\pi(s)$:
   § The first time-step $t$ that state $s$ is visited in an episode,
   § increment the counter $N(s) \leftarrow N(s) + 1$
   § increment the total return $S(s) \leftarrow S(s) + G_t$
   § The value is estimated by the mean return $V(s) = S(s)/N(s)$
   § By the law of large numbers, $V(s) \to v_\pi(s)$ as $N(s) \to \infty$

16. Every-Visit Monte Carlo Policy Evaluation
   § To evaluate state $s$, i.e. to learn $v_\pi(s)$:
   § Every time-step $t$ that state $s$ is visited in an episode,
   § increment the counter $N(s) \leftarrow N(s) + 1$
   § increment the total return $S(s) \leftarrow S(s) + G_t$
   § The value is estimated by the mean return $V(s) = S(s)/N(s)$
   § By the law of large numbers, $V(s) \to v_\pi(s)$ as $N(s) \to \infty$
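A minimal sketch covering both variants, assuming episodes arrive as lists of (state, reward) pairs with a discount factor gamma (the data layout and names are assumptions):

```python
from collections import defaultdict

def mc_policy_evaluation(episodes, gamma=1.0, first_visit=True):
    """Tabular Monte Carlo prediction.

    episodes: iterable of episodes, each a list of (state, reward) pairs,
    where reward is the reward R_{t+1} received after leaving the state.
    Returns V(s) = S(s)/N(s), the mean observed return per state.
    """
    N = defaultdict(int)    # visit counts N(s)
    S = defaultdict(float)  # total returns S(s)
    for episode in episodes:
        # Compute the return for every time-step by scanning backwards:
        # G_t = R_{t+1} + gamma * G_{t+1}
        G = 0.0
        returns = []
        for state, reward in reversed(episode):
            G = reward + gamma * G
            returns.append((state, G))
        returns.reverse()  # back to chronological order
        seen = set()
        for state, G in returns:
            if first_visit and state in seen:
                continue  # first-visit: count only the first occurrence
            seen.add(state)
            N[state] += 1
            S[state] += G
    return {s: S[s] / N[s] for s in N}
```

Setting `first_visit=False` gives the every-visit estimator; both converge to $v_\pi(s)$ as the visit counts grow.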

17. Blackjack Example
   States (200 of them):
   - current sum (12-21)
   - dealer's showing card (ace-10)
   - do I have a "useable" ace? (yes-no)
   Action stick: stop receiving cards (and terminate)
   Action twist: take another card (no replacement)
   Reward for stick:
   - +1 if sum of cards > sum of dealer cards
   - 0 if sum of cards = sum of dealer cards
   - -1 if sum of cards < sum of dealer cards
   Reward for twist:
   - -1 if sum of cards > 21 (and terminate)
   - 0 otherwise
   Transitions: automatically twist if sum of cards < 12
   Slide courtesy: David Silver [Deepmind]
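A minimal, self-contained sketch of an episode generator for this MDP. It assumes an infinite deck and a dealer who sticks on 17 or more; these are the standard Sutton & Barto conventions but are not stated on the slide:

```python
import random

CARDS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]  # ace=1; J, Q, K count 10

def draw():
    return random.choice(CARDS)  # infinite deck (an assumption)

def hand_value(cards):
    """Return (sum, usable_ace): count one ace as 11 if it does not bust."""
    total, usable = sum(cards), False
    if 1 in cards and total + 10 <= 21:
        total, usable = total + 10, True
    return total, usable

def play_episode(policy):
    """Play one hand under policy(sum, dealer_card, usable) -> 'stick'|'twist'.
    Returns (list of visited states, final reward)."""
    player, dealer = [draw(), draw()], [draw()]
    states = []
    while True:
        total, usable = hand_value(player)
        if total < 12:                 # transitions: automatically twist below 12
            player.append(draw())
            continue
        state = (total, dealer[0], usable)
        states.append(state)
        if policy(*state) == 'stick':
            break
        player.append(draw())          # twist
        if hand_value(player)[0] > 21:
            return states, -1          # bust: reward -1 and terminate
    while hand_value(dealer)[0] < 17:  # dealer rule: stick on 17+ (assumption)
        dealer.append(draw())
    p, d = hand_value(player)[0], hand_value(dealer)[0]
    if d > 21 or p > d:
        return states, +1
    return states, 0 if p == d else -1
```

With the policy from the next slide, `lambda total, dealer, usable: 'stick' if total >= 20 else 'twist'`, episodes from this generator feed directly into the Monte Carlo estimator sketched above.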

18. Blackjack Example
   Policy: stick if sum of cards ≥ 20, otherwise twist
   Slide courtesy: David Silver [Deepmind]

19. Monte Carlo Control
   § We will now see how Monte Carlo estimation can be used in control
   § This is mostly like generalized policy iteration (GPI), where one maintains both an approximate policy and an approximate value function
   § Policy evaluation is done as Monte Carlo evaluation
   § Then, we can do greedy policy improvement
   § What is the problem?
   § $\pi'(s) \doteq \arg\max_{a \in \mathcal{A}} \Big[ r(s,a) + \gamma \sum_{s' \in \mathcal{S}} p(s' \mid s, a)\, v_\pi(s') \Big]$

20. Monte Carlo Control
   § Greedy policy improvement over $v(s)$ requires a model of the MDP:
   $\pi'(s) \doteq \arg\max_{a \in \mathcal{A}} \Big[ r(s,a) + \gamma \sum_{s' \in \mathcal{S}} p(s' \mid s, a)\, v_\pi(s') \Big]$
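A minimal sketch of this greedy improvement step, written to make the problem explicit: it consumes the reward function r(s, a) and the transition probabilities p(s'|s, a), exactly the model that is unavailable in the model-free setting (the tabular dict layout and names are assumptions):

```python
def greedy_policy_improvement(states, actions, r, p, v, gamma=1.0):
    """pi'(s) = argmax_a [ r(s,a) + gamma * sum_{s'} p(s'|s,a) v(s') ].

    r: dict mapping (s, a) -> expected immediate reward
    p: dict mapping (s, a) -> {s': transition probability}
    v: dict mapping s -> current value estimate v_pi(s)
    Requires the model (r and p), which model-free methods do not have.
    """
    pi = {}
    for s in states:
        def q(a):  # one-step lookahead value of action a in state s
            return r[(s, a)] + gamma * sum(
                prob * v[s2] for s2, prob in p[(s, a)].items())
        pi[s] = max(actions, key=q)
    return pi
```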
