Model-Free Methods
Model-based: use all branches
[Diagram: from state S1, actions A1, A2, A3 lead to successor states S2, S3, with rewards R = 2 and R = -1]
In the model-based setting we update Vπ(S) using all the possible successors S'.
In the model-free setting we take a step, and update based on this single sample:
<V> ← <V> + α (V - <V>)
V(S1) ← V(S1) + α [r + γ V(S3) - V(S1)]
On-line: take an action A, ending at S1
[Diagram: from state St, action A leads to S1 with reward r1; other possible successors S2, S3]
<V> ← <V> + α (V - <V>)
TD Prediction Algorithm
Terminology: 'Prediction' -- computing Vπ(S) for a given π
Prediction error: [r + γ V(S') - V(S)]
Expected: V(S); observed: r + γ V(S')
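To make the update concrete, here is a minimal sketch of TD(0) prediction. The environment interface (env.reset(), env.step(a) returning next state, reward, done) and the step-size values are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict

def td0_prediction(env, policy, num_episodes=500, alpha=0.1, gamma=0.9):
    """Estimate V_pi(S) from sampled transitions with the TD(0) update:
    V(S) <- V(S) + alpha * [r + gamma * V(S') - V(S)]."""
    V = defaultdict(float)                    # value estimates, initialized to 0
    for _ in range(num_episodes):
        s = env.reset()                       # start a new episode (assumed interface)
        done = False
        while not done:
            a = policy(s)                     # action given by the fixed policy pi
            s_next, r, done = env.step(a)     # one sampled step
            target = r + (0.0 if done else gamma * V[s_next])   # observed: r + gamma * V(S')
            V[s] += alpha * (target - V[s])   # move V(S) by alpha times the prediction error
            s = s_next
    return V
```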
Learning a Policy: the Exploration Problem
Take an action A, ending at S1.
[Diagram: from state St, action A leads to S1 with reward r1; alternative successors S2, S3]
Update St, then update S1.
We may never explore the alternative actions to A.
From Value to Action
• Based on V(S), an action can be selected
• 'Greedy' selection is not good enough (select the action A with the current maximum expected future reward)
• Need for 'exploration'
• For example: 'ε-greedy' -- the max-return action with probability 1 - ε, and with probability ε one of the other actions (see the sketch below)
• Can be a more complex decision
• Done here in episodes
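A small sketch of the ε-greedy rule just described (with probability ε, one of the non-greedy actions is chosen uniformly). Q is assumed to be a dictionary of state-action value estimates; the names are illustrative.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability 1 - epsilon select the action with the highest value;
    with probability epsilon select uniformly among the other actions."""
    greedy = max(actions, key=lambda a: Q.get((state, a), 0.0))
    if random.random() < epsilon and len(actions) > 1:
        others = [a for a in actions if a != greedy]
        return random.choice(others)
    return greedy
```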
TD Policy Learning: ε-greedy
ε-greedy performs exploration.
It can be more complex, e.g. changing ε with time or with conditions.
TD ‘Actor-Critic’
Terminology: prediction is the same as policy evaluation -- computing Vπ(S) is the ‘critic’; the policy component is the ‘actor’.
Motivated by brain modeling.
‘Actor-critic’ scheme -- standard drawing
Motivated by brain modeling (e.g. the ventral striatum is the critic, the dorsal striatum is the actor).
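A simplified tabular actor-critic sketch, using the same assumed toy environment interface as above: the critic learns V(S) by TD(0), and the same TD error nudges the actor's preference for the action it just took. This is a didactic simplification; practical actor-critic methods use more careful policy-gradient updates.

```python
import math
import random
from collections import defaultdict

def softmax_action(H, state, actions):
    """Sample an action from the actor's softmax policy over preferences H(state, a)."""
    prefs = [H[(state, a)] for a in actions]
    m = max(prefs)
    weights = [math.exp(p - m) for p in prefs]
    r, acc = random.random() * sum(weights), 0.0
    for a, w in zip(actions, weights):
        acc += w
        if r <= acc:
            return a
    return actions[-1]

def actor_critic(env, actions, num_episodes=500, alpha_v=0.1, alpha_h=0.1, gamma=0.9):
    """Critic: V(S) learned by TD(0). Actor: action preferences H(S, a),
    reinforced by the critic's TD error (simplified update)."""
    V = defaultdict(float)      # critic: state values
    H = defaultdict(float)      # actor: action preferences
    for _ in range(num_episodes):
        s = env.reset()
        done = False
        while not done:
            a = softmax_action(H, s, actions)
            s_next, r, done = env.step(a)
            delta = r + (0.0 if done else gamma * V[s_next]) - V[s]   # TD (prediction) error
            V[s] += alpha_v * delta            # critic update
            H[(s, a)] += alpha_h * delta       # actor update: reinforce the taken action
            s = s_next
    return V, H
```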
Q-learning
• The main algorithm used for model-free RL
Q-values (state-action)
[Diagram: from state S1, actions A1, A2, A3 lead to successors S2, S3 with rewards R = 2 and R = -1; branches labeled Q(S1, A1) and Q(S1, A3)]
Qπ(S, a) is the expected return starting from S, taking the action a, and thereafter following policy π.
Q-value (state-action)
• The same update is done on Q-values rather than on V (see the sketch below)
• Used in most practical algorithms and some brain models
• Qπ(S, a) is the expected return starting from S, taking the action a, and thereafter following policy π
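The Q-learning update itself is not written out on these slides; for reference, a standard tabular version looks like the sketch below, with the same assumed toy environment interface. The max over next actions makes the update independent of the action actually taken next.

```python
import random
from collections import defaultdict

def q_learning(env, actions, num_episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning:
    Q(S,A) <- Q(S,A) + alpha * [r + gamma * max_a' Q(S',a') - Q(S,A)]."""
    Q = defaultdict(float)
    for _ in range(num_episodes):
        s = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:                 # explore
                a = random.choice(actions)
            else:                                         # exploit: greedy on current Q
                a = max(actions, key=lambda x: Q[(s, x)])
            s_next, r, done = env.step(a)
            best_next = 0.0 if done else max(Q[(s_next, x)] for x in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```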
SARSA
It is called SARSA because it uses the tuple s(t), a(t), r(t+1), s(t+1), a(t+1).
A step like this uses the current π, so that each S has its action a = π(S).
SARSA RL Algorithm
Epsilon-greedy: with probability ε do not select the greedy action, but instead choose with equal probability among the other actions.
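A sketch of the full SARSA loop, combining the ε-greedy selection above with the (s, a, r, s', a') update; the environment interface and parameter values are again illustrative assumptions.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, s, actions, epsilon):
    """With probability epsilon pick a non-greedy action uniformly, else the greedy one."""
    greedy = max(actions, key=lambda a: Q[(s, a)])
    if random.random() < epsilon and len(actions) > 1:
        return random.choice([a for a in actions if a != greedy])
    return greedy

def sarsa(env, actions, num_episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """On-policy SARSA: the update uses s(t), a(t), r(t+1), s(t+1), a(t+1),
    where a(t+1) is chosen by the same epsilon-greedy policy being learned."""
    Q = defaultdict(float)
    for _ in range(num_episodes):
        s = env.reset()
        a = epsilon_greedy(Q, s, actions, epsilon)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            if done:
                Q[(s, a)] += alpha * (r - Q[(s, a)])          # terminal: no bootstrap term
                break
            a_next = epsilon_greedy(Q, s_next, actions, epsilon)
            Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
            s, a = s_next, a_next
    return Q
```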
On Convergence
• Using episodes:
  • Some of the states are 'terminals'
  • When the computation reaches a terminal s, it stops
  • It re-starts at a new state s according to some probability
• At the starting state, each action has a non-zero probability (exploration)
• As the number of episodes goes to infinity, Q(S,A) will converge to Q*(S,A)