Introduction to Reinforcement Learning
Bayesian Methods in Reinforcement Learning (ICML 2007)
Sequential Decision Making Under Uncertainty
How can I ...
• Move around in the physical world (e.g. driving, navigation)?
• Play and win a game?
• Retrieve information over the web?
• Do medical diagnosis and treatment?
• Maximize the throughput of a factory?
• Optimize the performance of a rescue team?
Reinforcement Learning
[Figure: agent-environment interaction loop; the agent sends actions, the environment returns states and rewards]
• RL: a class of learning problems in which an agent interacts with an unfamiliar, dynamic, and stochastic environment
• Goal: learn a policy to maximize some measure of long-term reward
• Interaction: modeled as an MDP or a POMDP
Markov Decision Processes
An MDP is defined as a 5-tuple $(\mathcal{X}, \mathcal{A}, p, q, p_0)$:
• $\mathcal{X}$: state space of the process
• $\mathcal{A}$: action space of the process
• $p(\cdot|x,a)$: probability distribution over the next state, $x_{t+1} \sim p(\cdot|x_t, a_t)$
• $q(\cdot|x,a)$: probability distribution over rewards, $R(x_t, a_t) \sim q(\cdot|x_t, a_t)$
• $p_0$: initial state distribution
• Policy: a mapping from states to actions or to distributions over actions, $\mu(x) \in \mathcal{A}$ or $\mu(\cdot|x) \in \Pr(\mathcal{A})$
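As a minimal sketch (not part of the tutorial), this 5-tuple can be represented for a small finite MDP as follows; the state/action counts, the random transition and reward arrays, and the `step` helper are illustrative assumptions:

```python
import numpy as np

# A tiny finite MDP (X, A, p, q, p0): states and actions are indices,
# p[x, a] is a distribution over next states, r[x, a] is the mean reward
# (standing in for the reward distribution q), p0 is the initial distribution.
n_states, n_actions = 3, 2
rng = np.random.default_rng(0)

p = rng.random((n_states, n_actions, n_states))
p /= p.sum(axis=-1, keepdims=True)       # normalize each p(.|x, a)
r = rng.random((n_states, n_actions))    # mean rewards R_bar(x, a)
p0 = np.ones(n_states) / n_states        # uniform initial-state distribution

def step(x, a):
    """Sample one transition: (x, a) -> (next state, reward)."""
    x_next = rng.choice(n_states, p=p[x, a])
    reward = r[x, a]                      # deterministic reward, for simplicity
    return x_next, reward
```

The later sketches in this section build on this toy MDP.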
Example: Backgammon
• States: board configurations (about $10^{20}$)
• Actions: permissible moves
• Rewards: win +1, lose -1, else 0
RL Applications
• Backgammon (Tesauro, 1994)
• Inventory Management (Van Roy, Bertsekas, Lee, & Tsitsiklis, 1996)
• Dynamic Channel Allocation (e.g. Singh & Bertsekas, 1997)
• Elevator Scheduling (Crites & Barto, 1998)
• Robocup Soccer (e.g. Stone & Veloso, 1999)
• Many robots (navigation, bipedal walking, grasping, switching between skills, ...)
• Helicopter Control (e.g. Ng, 2003; Abbeel & Ng, 2006)
More applications: http://neuromancer.eecs.umich.edu/cgi-bin/twiki/view/Main/SuccessesOfRL
Value Function
State value function:
$$V^\mu(x) = \mathbb{E}^\mu\!\left[ \sum_{t=0}^{\infty} \gamma^t \, \bar{R}(x_t, \mu(x_t)) \,\Big|\, x_0 = x \right]$$
State-action value function:
$$Q^\mu(x, a) = \mathbb{E}^\mu\!\left[ \sum_{t=0}^{\infty} \gamma^t \, \bar{R}(x_t, a_t) \,\Big|\, x_0 = x,\, a_0 = a \right]$$
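One way to make these definitions concrete is to estimate $V^\mu(x)$ by Monte Carlo rollouts on the toy MDP sketched earlier; the fixed policy, discount factor, and horizon truncation below are illustrative assumptions, not part of the tutorial:

```python
gamma = 0.95

def policy(x):
    """An arbitrary fixed deterministic policy mu(x), used only for illustration."""
    return x % n_actions

def mc_value_estimate(x0, n_rollouts=1000, horizon=200):
    """Estimate V^mu(x0) by averaging truncated discounted returns."""
    returns = []
    for _ in range(n_rollouts):
        x, g, discount = x0, 0.0, 1.0
        for _ in range(horizon):          # truncate the infinite sum
            a = policy(x)
            x, reward = step(x, a)
            g += discount * reward
            discount *= gamma
        returns.append(g)
    return np.mean(returns)
```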
Policy Evaluation
Finding the value function of a policy.
Bellman equations:
$$V^\mu(x) = \sum_{a \in \mathcal{A}} \mu(a|x) \left[ \bar{R}(x,a) + \gamma \sum_{x' \in \mathcal{X}} p(x'|x,a) \, V^\mu(x') \right]$$
$$Q^\mu(x,a) = \bar{R}(x,a) + \gamma \sum_{x' \in \mathcal{X}} p(x'|x,a) \sum_{a' \in \mathcal{A}} \mu(a'|x') \, Q^\mu(x',a')$$
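When the model is known, these fixed-point equations can be solved by repeatedly applying the Bellman operator. A minimal sketch on the toy MDP above, assuming a stochastic policy given as a table `mu_probs[x, a] = mu(a|x)`:

```python
def policy_evaluation(mu_probs, tol=1e-8):
    """Iterative policy evaluation: apply the Bellman operator until convergence."""
    V = np.zeros(n_states)
    while True:
        # Expected immediate reward and expected next-state distribution under mu
        r_mu = (mu_probs * r).sum(axis=1)
        p_mu = np.einsum("xa,xay->xy", mu_probs, p)
        V_new = r_mu + gamma * p_mu @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```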
Policy Optimization
Finding a policy $\mu^*$ maximizing $V^\mu(x)$ for all $x \in \mathcal{X}$.
Bellman optimality equations:
$$V^*(x) = \max_{a \in \mathcal{A}} \left[ \bar{R}(x,a) + \gamma \sum_{x' \in \mathcal{X}} p(x'|x,a) \, V^*(x') \right]$$
$$Q^*(x,a) = \bar{R}(x,a) + \gamma \sum_{x' \in \mathcal{X}} p(x'|x,a) \max_{a' \in \mathcal{A}} Q^*(x',a')$$
Note: if $Q^*(x,a) = Q^{\mu^*}(x,a)$ is available, then an optimal action for any state $x$ is given by $a^* \in \arg\max_a Q^*(x,a)$.
Policy Optimization: Value Iteration
$$V_0(x) = 0$$
$$V_{t+1}(x) = \max_{a \in \mathcal{A}} \left[ \bar{R}(x,a) + \gamma \sum_{x' \in \mathcal{X}} p(x'|x,a) \, V_t(x') \right]$$
Problem: the system dynamics $p(\cdot|x,a)$ are unknown.
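A minimal value-iteration sketch for the toy MDP defined earlier, assuming the model is known; the stopping tolerance and greedy-policy extraction are illustrative choices:

```python
def value_iteration(tol=1e-8):
    """Compute V* by repeated application of the Bellman optimality operator,
    then extract a greedy (optimal) deterministic policy."""
    V = np.zeros(n_states)
    while True:
        # Q[x, a] = R_bar(x, a) + gamma * sum_x' p(x'|x, a) V(x')
        Q = r + gamma * np.einsum("xay,y->xa", p, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # optimal values and greedy policy
        V = V_new
```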
Reinforcement Learning (RL)
[Figure: agent-environment interaction loop; actions out, states and rewards in]
• RL problem: solve the MDP when the transition and/or reward models are unknown
• Basic idea: use samples obtained from the agent's interaction with the environment to solve the MDP
Model-Based vs. Model-Free RL
• What is the model? The state-transition distribution and the reward distribution
• Model-based RL: the model is not available, but it is explicitly learned
• Model-free RL: the model is not available and is not explicitly learned
[Figure: experience feeds either model learning followed by planning (model-based) or direct RL on the value function / policy (model-free); the value function / policy drives acting]
Reinforcement Learning Solutions
[Figure: taxonomy of RL algorithms]
• Value function algorithms: Value Iteration, Q-learning, SARSA
• Policy search algorithms: Policy Gradient, PEGASUS, Genetic Algorithms
• Actor-critic algorithms (combining both): Sutton et al. 2000; Konda & Tsitsiklis 2000; Peters et al. 2005; Bhatnagar, Ghavamzadeh & Sutton 2007
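As one concrete instance of the value-function methods named above, here is a tabular Q-learning sketch on the toy MDP; the step size, exploration rate, and step budget are illustrative assumptions:

```python
def q_learning(n_steps=50_000, alpha=0.1, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration."""
    Q = np.zeros((n_states, n_actions))
    x = rng.choice(n_states, p=p0)
    for _ in range(n_steps):
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(Q[x].argmax())
        x_next, reward = step(x, a)
        # one-step TD update toward the sampled Bellman optimality target
        td_target = reward + gamma * Q[x_next].max()
        Q[x, a] += alpha * (td_target - Q[x, a])
        x = x_next
    return Q
```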
Learning Modes
• Offline learning: learning while interacting with a simulator
• Online learning: learning while interacting with the environment
Offline Learning
• Agent interacts with a simulator
• Rewards/costs do not matter: no exploration/exploitation tradeoff
• Computation time between actions is not critical
• The simulator can produce as much data as we wish
Main challenge: how to minimize the time to converge to the optimal policy
Online Learning
• No simulator: direct interaction with the environment
• Agent receives a reward/cost for each action
Main challenges:
• Exploration/exploitation tradeoff: should actions be picked to maximize immediate reward, or to maximize information gain so as to improve the policy?
• Real-time execution of actions
• Limited amount of data, since interaction with the environment is required
Bayesian Learning
The Bayesian Approach
• $Z$: hidden process, $Y$: observable
• Goal: infer $Z$ from measurements of $Y$
• Known: the statistical dependence between $Z$ and $Y$, i.e. $P(Y|Z)$
• Place a prior $P(Z)$ over $Z$, reflecting our uncertainty
• Observe $Y = y$
• Compute the posterior of $Z$:
$$P(Z \mid Y = y) = \frac{P(y|Z)\, P(Z)}{\int P(y|Z')\, P(Z')\, dZ'}$$
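A small illustrative sketch of this posterior computation for a conjugate case (a Beta prior over a Bernoulli success probability); the prior parameters and observed data are made up for illustration:

```python
from scipy import stats

# Hidden quantity Z: the success probability of a Bernoulli process.
# Observable Y: a sequence of 0/1 outcomes.
alpha0, beta0 = 1.0, 1.0        # Beta(1, 1) prior: uniform uncertainty over Z
y = [1, 0, 1, 1, 0, 1]          # observed data Y = y

# For the Beta-Bernoulli pair the posterior has a closed form, so the
# normalizing integral never needs to be computed explicitly.
alpha_post = alpha0 + sum(y)
beta_post = beta0 + len(y) - sum(y)
posterior = stats.beta(alpha_post, beta_post)

print("posterior mean of Z:", posterior.mean())
```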
Bayesian Learning
Pros:
• Principled treatment of uncertainty
• Conceptually simple
• Resistant to overfitting (the prior serves as a regularizer)
• Facilitates encoding of domain knowledge (via the prior)
Cons:
• Mathematically and computationally complex (e.g. the posterior may not have a closed form)
• How do we pick the prior?
Bayesian RL: Advantages
• Systematic method for the inclusion and update of prior knowledge and domain assumptions:
  • Encode uncertainty about the transition function, reward function, value function, policy, etc. with a probability distribution (belief)
  • Update the belief based on evidence (e.g. state, action, reward)
• Appropriately reconciles exploration with exploitation:
  • Select actions based on the belief
• Provides a full distribution, not just point estimates:
  • A measure of uncertainty for performance predictions (e.g. value function, policy gradient)
Bayesian RL
• Model-based Bayesian RL: distribution over the transition probabilities (see the sketch after this list)
• Model-free Bayesian RL: distribution over the value function, policy, or policy gradient
• Bayesian inverse RL: distribution over the reward
• Bayesian multi-agent RL: distribution over other agents' policies
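An illustrative sketch of the model-based case above: maintain an independent Dirichlet belief over each row $p(\cdot|x,a)$ of the transition model of the toy MDP and update it from observed transitions. The prior pseudo-counts and the Thompson-sampling-style model draw are assumptions for illustration:

```python
# Dirichlet belief over each transition distribution p(.|x, a):
# counts[x, a, x'] are the Dirichlet parameters (prior pseudo-counts + observations).
counts = np.ones((n_states, n_actions, n_states))   # Dirichlet(1, ..., 1) prior

def update_belief(x, a, x_next):
    """Conjugate posterior update after observing the transition (x, a) -> x_next."""
    counts[x, a, x_next] += 1

def posterior_mean_model():
    """Mean of the Dirichlet belief: a point estimate of the transition model."""
    return counts / counts.sum(axis=-1, keepdims=True)

def sampled_model():
    """Sample a plausible transition model from the belief (as in
    Thompson-sampling-style approaches to exploration)."""
    return np.array([[rng.dirichlet(counts[x, a]) for a in range(n_actions)]
                     for x in range(n_states)])
```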