reinforcement learning through the optimization lens Benjamin Recht

Aug 13, 2023


reinforcement learning through the optimization lens Benjamin Recht University of California, Berkeley
trustable, scalable, predictable
Control theory, or reinforcement learning, is the study of how to use past data to enhance the future manipulation of a dynamical system.
Disciplinary Biases
• Control Theory (AE/CE/EE/ME): continuous; model → action; IEEE Transactions
• Reinforcement Learning (CS): discrete; data → action; Science Magazine
Today’s talk will try to unify these camps and point out how to merge their perspectives.
Main research challenge: What are the fundamental limits of learning systems that interact with the physical environment? How well must we understand a system in order to control it?
Theoretical foundations:
• statistical learning theory
• robust control theory
• core optimization
Control theory is the study of dynamical systems with inputs. A system G maps the input u to the output y:
$x_{t+1} = A x_t + B u_t$
$y_t = C x_t + D u_t$
The simplest such systems are linear systems.
• $x_t$ is called the state, and the dimension of the state is called the degree, $d$.
• $u_t$ is called the input, and its dimension is $p$.
• $y_t$ is called the output, and its dimension is $q$.
For today, we will only consider $C = I$, $D = 0$ ($x_t$ observed).
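As a rough illustration of the linear system above, the following sketch rolls the state forward under a given input sequence. The matrices, the initial state, and the function names `step` and `simulate` are hypothetical choices made for this example and do not come from the talk; since C = I and D = 0, the output equals the state.

```python
def step(A, B, x, u):
    """One step of x_{t+1} = A x_t + B u_t using plain Python lists."""
    d = len(x)  # state dimension (the degree)
    p = len(u)  # input dimension
    return [
        sum(A[i][j] * x[j] for j in range(d))
        + sum(B[i][k] * u[k] for k in range(p))
        for i in range(d)
    ]


def simulate(A, B, x0, inputs):
    """Return [x_0, x_1, ..., x_T]; with C = I and D = 0, y_t = x_t."""
    trajectory = [list(x0)]
    for u in inputs:
        trajectory.append(step(A, B, trajectory[-1], u))
    return trajectory


# Hypothetical example (an assumption, not from the slides):
# a double integrator with d = 2 and p = 1.
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.0], [1.0]]
trajectory = simulate(A, B, [0.0, 0.0], [[1.0], [1.0], [0.0]])
```

Here `trajectory` holds the four states visited, starting from the initial state and applying the three inputs in order.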
Reinforcement learning is the study of discrete dynamical systems with inputs. A system G maps the input u to the output y:
$p(x_{t+1} \mid \text{past}) = p(x_{t+1} \mid x_t, u_t)$
$p(y_t \mid \text{past}) = p(y_t \mid x_t, u_t)$
Simplest example: Partially Observed Markov Decision Process (POMDP).
• $x_t$ is the state, and it takes values in $[d]$.
• $u_t$ is called the input, and it takes values in $[p]$.
• $y_t$ is called the output, and it takes values in $[q]$.
For today, we will only consider the case where $x_t$ is observed (MDP).
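The Markov property above says the next state depends only on the current state and input. A minimal sketch of sampling from such a fully observed MDP follows. The transition table `P`, the function names, and the 0-based indexing (states in 0..d-1 instead of [d]) are assumptions made for illustration and do not come from the talk.

```python
import random


def sample_next_state(P, x, u, rng):
    """Draw x_{t+1} from p(. | x_t = x, u_t = u).

    P[x][u] is a list of d probabilities over next states (0-based).
    """
    return rng.choices(range(len(P)), weights=P[x][u], k=1)[0]


def rollout(P, x0, inputs, rng):
    """Return the visited states [x_0, ..., x_T] for a given input sequence."""
    states = [x0]
    for u in inputs:
        states.append(sample_next_state(P, states[-1], u, rng))
    return states


# Hypothetical transition table (an assumption): d = 2 states, p = 2 inputs.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # from state 0 under input 0, input 1
    [[0.5, 0.5], [0.0, 1.0]],  # from state 1 under input 0, input 1
]
states = rollout(P, 0, [1, 0, 1, 1], random.Random(0))
```

Because the state is observed, `states` is also the sequence of outputs; no separate observation model is needed in the MDP case.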
