Class 4: On-Policy Prediction With Approximation (Sutton & Barto, Chapter 9; Silver slides)
Forms of approximation functions:
• a linear approximation,
• a neural network,
• a decision tree
The Prediction Objective
With approximation we can no longer hope to converge to the exact value for each state. We must specify a state weighting or distribution mu(s) representing how much we care about the error in each state s. The objective is to minimize the Mean Squared Value Error. mu(s) is the fraction of time spent in s, called the "on-policy distribution"; it is defined differently in the continuing case and the episodic case.
• It is not obvious that this is the right objective for RL (we want the value function only in order to generate a good policy), but it is the one we use.
• For a general function form there is no guarantee of converging to an optimal w*.
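In symbols (Eq. 9.1 in Sutton & Barto), with $\hat{v}(s,\mathbf{w})$ the approximate value under weight vector $\mathbf{w}$:

$$\overline{VE}(\mathbf{w}) \doteq \sum_{s \in \mathcal{S}} \mu(s)\,\big[v_\pi(s) - \hat{v}(s,\mathbf{w})\big]^2$$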
Stochastic-Gradient and Semi-Gradient Methods
General Stochastic Gradient Descent
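Written out, the general SGD update toward the true value $v_\pi(S_t)$ (Eq. 9.5 in Sutton & Barto) is:

$$\mathbf{w}_{t+1} \doteq \mathbf{w}_t + \alpha\,\big[v_\pi(S_t) - \hat{v}(S_t,\mathbf{w}_t)\big]\,\nabla\hat{v}(S_t,\mathbf{w}_t)$$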
Gradient Monte Carlo Algorithm for Estimating v
We cannot perform the exact update (9.5) because $v_\pi(S_t)$ is unknown, but we can approximate it by substituting a target $U_t$ in place of $v_\pi(S_t)$. This yields the following general SGD method for state-value prediction:
$$\mathbf{w}_{t+1} \doteq \mathbf{w}_t + \alpha\,\big[U_t - \hat{v}(S_t,\mathbf{w}_t)\big]\,\nabla\hat{v}(S_t,\mathbf{w}_t)$$
With the Monte Carlo target $U_t = G_t$, the general SGD method converges to a local optimum of the approximation.
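As a concrete illustration (not from the slides), here is a minimal Python sketch of gradient Monte Carlo prediction with a linear approximator. The env.reset()/env.step(action) interface, the policy(state) function, and the features(state) map are assumed names for illustration only:

```python
import numpy as np

# Minimal sketch of gradient Monte Carlo prediction with a linear approximator.
# Assumes a hypothetical environment where env.reset() returns the start state
# and env.step(action) returns (next_state, reward, done), a fixed policy(state),
# and features(state) returning a NumPy vector with the same dimensionality as w.

def gradient_mc_prediction(env, policy, features, num_features,
                           episodes=1000, alpha=2e-5, gamma=1.0):
    w = np.zeros(num_features)
    for _ in range(episodes):
        # Generate one episode following the policy.
        trajectory = []                      # list of (S_t, R_{t+1}) pairs
        state, done = env.reset(), False
        while not done:
            next_state, reward, done = env.step(policy(state))
            trajectory.append((state, reward))
            state = next_state
        # Work backwards, computing the return G_t for each visited state.
        G = 0.0
        for state, reward in reversed(trajectory):
            G = gamma * G + reward
            x = features(state)              # gradient of v_hat(s, w) = w @ x is x
            w += alpha * (G - w @ x) * x     # SGD step toward the MC target G_t
    return w
```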
Semi-Gradient Methods
Replacing G_t with a bootstrapping target such as the TD(0) target or the n-step return G_{t:t+n} does not guarantee convergence in general (it does for linear functions). Still, semi-gradient (bootstrapping) methods offer important advantages: they typically enable significantly faster learning and do not have to wait for the end of an episode, which lets them be used on continuing problems and provides computational advantages. A prototypical semi-gradient method is semi-gradient TD(0).
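A matching sketch of semi-gradient TD(0), under the same assumed env/policy/features interface as the gradient MC sketch above:

```python
import numpy as np

# Minimal sketch of semi-gradient TD(0) for estimating v_pi with a linear
# approximator. The env/policy/features interface is assumed, as above.

def semi_gradient_td0(env, policy, features, num_features,
                      episodes=1000, alpha=1e-4, gamma=1.0):
    w = np.zeros(num_features)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            next_state, reward, done = env.step(policy(state))
            x = features(state)
            # Bootstrapped target R + gamma * v_hat(S', w); the gradient is taken
            # only through v_hat(S, w), which is why the method is "semi"-gradient.
            target = reward + (0.0 if done else gamma * (w @ features(next_state)))
            w += alpha * (target - w @ x) * x
            state = next_state
    return w
```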
State Aggregation
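One way to realize state aggregation is as a one-hot feature map over groups of states. The sketch below assumes integer states 1..num_states, as in the 1000-state random walk referenced later in these slides; the function names are illustrative only:

```python
import numpy as np

# Minimal sketch of state aggregation as a feature map: states are grouped into
# bins, and each state's feature vector is a one-hot indicator of its group.

def make_aggregation_features(num_states=1000, num_groups=10):
    group_size = num_states // num_groups
    def features(state):
        x = np.zeros(num_groups)
        x[(state - 1) // group_size] = 1.0   # indicator of the state's group
        return x
    return features

# With these features, the linear weight w[g] is exactly the estimated value
# shared by all states in group g.
```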
Linear Methods
x(s) is a feature vector with the same dimensionality as w, and the approximate value is their inner product. In the linear case there is only one optimum, so any method guaranteed to converge to or near a local optimum is automatically guaranteed to converge to or near the global optimum. SGD converges to the global optimum if alpha satisfies the usual conditions of decreasing over time.
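In symbols (Eq. 9.8 in Sutton & Barto), the linear form and its gradient are:

$$\hat{v}(s,\mathbf{w}) \doteq \mathbf{w}^\top \mathbf{x}(s) = \sum_{i=1}^{d} w_i\, x_i(s), \qquad \nabla\hat{v}(s,\mathbf{w}) = \mathbf{x}(s)$$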
TD(0) Convergence
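For reference, the standard linear result this slide refers to (Sutton & Barto, Sec. 9.4): semi-gradient TD(0) converges to the TD fixed point

$$\mathbf{w}_{TD} = \mathbf{A}^{-1}\mathbf{b}, \qquad \mathbf{A} \doteq \mathbb{E}\big[\mathbf{x}_t(\mathbf{x}_t - \gamma\,\mathbf{x}_{t+1})^\top\big], \qquad \mathbf{b} \doteq \mathbb{E}\big[R_{t+1}\,\mathbf{x}_t\big],$$

whose error is bounded relative to the best linear fit:

$$\overline{VE}(\mathbf{w}_{TD}) \le \frac{1}{1-\gamma}\,\min_{\mathbf{w}} \overline{VE}(\mathbf{w})$$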
Bootstrapping on the 1000-State Random Walk
n-Step Semi-Gradient TD for v
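A minimal Python sketch of n-step semi-gradient TD for estimating v, following the structure of the algorithm box in Sutton & Barto Ch. 9 and using the same assumed env/policy/features interface as the earlier sketches:

```python
import numpy as np

# Minimal sketch of n-step semi-gradient TD for estimating v_pi with a linear
# approximator. env/policy/features are assumed names, as in the sketches above.

def n_step_semi_gradient_td(env, policy, features, num_features,
                            n=4, episodes=1000, alpha=1e-4, gamma=1.0):
    w = np.zeros(num_features)
    for _ in range(episodes):
        states, rewards = [env.reset()], [0.0]   # S_0; rewards[0] is a dummy entry
        T = float('inf')                          # episode length, unknown until termination
        t = 0
        while True:
            if t < T:
                next_state, reward, done = env.step(policy(states[t]))
                states.append(next_state)         # S_{t+1}
                rewards.append(reward)            # R_{t+1}
                if done:
                    T = t + 1
            tau = t - n + 1                       # time whose estimate is updated
            if tau >= 0:
                # n-step return G_{tau:tau+n}
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, min(tau + n, T) + 1))
                if tau + n < T:                   # bootstrap from v_hat(S_{tau+n}, w)
                    G += gamma ** n * (w @ features(states[tau + n]))
                x = features(states[tau])
                w += alpha * (G - w @ x) * x      # semi-gradient update
            if tau == T - 1:
                break
            t += 1
    return w
```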