Reinforcement Learning

[Read Chapter 13]
[Exercises 13.1, 13.2, 13.4]

- Control learning
- Control policies that choose optimal actions
- Q learning
- Convergence

Lecture slides for textbook Machine Learning, T. Mitchell, McGraw Hill, 1997

Control Learning

Consider learning to choose actions, e.g.,
- Robot learning to dock on battery charger
- Learning to choose actions to optimize factory output
- Learning to play Backgammon

Note several problem characteristics:
- Delayed reward
- Opportunity for active exploration
- Possibility that state only partially observable
- Possible need to learn multiple tasks with same sensors/effectors

One Example: TD-Gammon [Tesauro, 1995]

Learn to play Backgammon

Immediate reward
- +100 if win
- -100 if lose
- 0 for all other states

Trained by playing 1.5 million games against itself

Now approximately equal to best human player

Reinforcement Learning Problem

[Figure: agent-environment loop. The environment sends the agent a state and a reward; the agent sends back an action. The interaction produces the sequence $s_0, a_0, r_0, s_1, a_1, r_1, s_2, a_2, r_2, \dots$]

Goal: Learn to choose actions that maximize
$r_0 + \gamma r_1 + \gamma^2 r_2 + \dots$, where $0 \le \gamma < 1$
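
The discounted sum in the goal can be sketched in a few lines of Python. This is a minimal illustration for a finite reward sequence; the function name `discounted_return` and the sample rewards are assumptions made for the example, not part of the slides.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for a finite reward sequence."""
    if not 0 <= gamma < 1:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    total = 0.0
    # Work backwards using G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical episode: no reward for two steps, then a reward of 100.
example = discounted_return([0, 0, 100], gamma=0.9)
print(example)
```

The reward of 100 arrives two steps late, so it is weighted by $\gamma^2 = 0.81$.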

Markov Decision Processes

Assume
- finite set of states $S$
- set of actions $A$
- at each discrete time agent observes state $s_t \in S$ and chooses action $a_t \in A$
- then receives immediate reward $r_t$
- and state changes to $s_{t+1}$
- Markov assumption: $s_{t+1} = \delta(s_t, a_t)$ and $r_t = r(s_t, a_t)$
  - i.e., $r_t$ and $s_{t+1}$ depend only on current state and action
  - functions $\delta$ and $r$ may be nondeterministic
  - functions $\delta$ and $r$ not necessarily known to agent

Agent's Learning Task

Execute actions in environment, observe results, and
- learn action policy $\pi : S \to A$ that maximizes
  $E[r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots]$
  from any starting state in $S$
- here $0 \le \gamma < 1$ is the discount factor for future rewards

Note something new:
- Target function is $\pi : S \to A$
- but we have no training examples of form $\langle s, a \rangle$
- training examples are of form $\langle \langle s, a \rangle, r \rangle$

Value Function

To begin, consider deterministic worlds...

For each possible policy $\pi$ the agent might adopt, we can define an evaluation function over states

$V^{\pi}(s) \equiv r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots \equiv \sum_{i=0}^{\infty} \gamma^i r_{t+i}$

where $r_t, r_{t+1}, \dots$ are generated by following policy $\pi$ starting at state $s$

Restated, the task is to learn the optimal policy $\pi^*$

$\pi^* \equiv \operatorname{argmax}_{\pi} V^{\pi}(s), \ (\forall s)$
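
As a rough illustration of the definition of $V^{\pi}$, the sketch below follows a fixed policy through a deterministic world and sums the discounted rewards. The four-state corridor, its reward of 100 for entering the goal, the truncation at `horizon` steps, and all function names are assumptions made for the example.

```python
GOAL = 3  # absorbing goal at the right end of a hypothetical corridor 0-1-2-3


def delta(s, a):
    """Deterministic transition: a is -1 (left) or +1 (right); the goal is absorbing."""
    if s == GOAL:
        return s
    return min(max(s + a, 0), GOAL)


def reward(s, a):
    """Reward 100 for the move that enters the goal, 0 otherwise."""
    return 100 if s != GOAL and delta(s, a) == GOAL else 0


def value_of_policy(policy, s, gamma=0.9, horizon=200):
    """Approximate V^pi(s) by summing gamma^i * r_{t+i} along the trajectory."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):  # truncated stand-in for the infinite sum
        a = policy(s)
        total += discount * reward(s, a)
        discount *= gamma
        s = delta(s, a)
    return total


def always_right(s):
    return +1


def always_left(s):
    return -1


for start in range(GOAL):
    print(start, value_of_policy(always_right, start), value_of_policy(always_left, start))
```

In this toy world the always-right policy has a higher value than the always-left policy from every non-goal state, which is what makes it the optimal policy there.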

[Figure: a simple deterministic grid world with absorbing goal state G, shown in four panels: $r(s, a)$ (immediate reward) values, $Q(s, a)$ values, $V^*(s)$ values, and one optimal policy. Immediate rewards are 100 for actions entering G and 0 otherwise. The $Q(s, a)$ values shown are 72, 81, 90, and 100, and the $V^*(s)$ values include 81, 90, and 100, which is consistent with $\gamma = 0.9$.]

What to Learn

We might try to have agent learn the evaluation function $V^{\pi^*}$ (which we write as $V^*$)

It could then do a lookahead search to choose best action from any state $s$ because

$\pi^*(s) = \operatorname{argmax}_{a} [r(s, a) + \gamma V^*(\delta(s, a))]$

A problem:
- This works well if agent knows $\delta : S \times A \to S$, and $r : S \times A \to \Re$
- But when it doesn't, it can't choose actions this way
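
The one-step lookahead can be sketched as follows for the case where the agent does know $\delta$ and $r$. The corridor world and the hand-computed `V_STAR` table are assumptions made for the example; the point is that `lookahead_policy` must call both `delta` and `reward`.

```python
GAMMA = 0.9
GOAL = 3            # absorbing goal of a hypothetical corridor 0-1-2-3
ACTIONS = (-1, +1)  # move left or right

# V* for this toy world, worked out by hand (assumed, not learned here).
V_STAR = {0: 81.0, 1: 90.0, 2: 100.0, 3: 0.0}


def delta(s, a):
    """Known deterministic transition function."""
    if s == GOAL:
        return s
    return min(max(s + a, 0), GOAL)


def reward(s, a):
    """Known reward function: 100 for entering the goal, 0 otherwise."""
    return 100 if s != GOAL and delta(s, a) == GOAL else 0


def lookahead_policy(s):
    """pi*(s) = argmax_a [ r(s, a) + gamma * V*(delta(s, a)) ]."""
    return max(ACTIONS, key=lambda a: reward(s, a) + GAMMA * V_STAR[delta(s, a)])


print([lookahead_policy(s) for s in range(GOAL)])
```

Without access to `delta` and `reward`, the expression inside `max` cannot be evaluated, which is the problem the slide points out.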

Q Function

Define new function very similar to $V^*$

$Q(s, a) \equiv r(s, a) + \gamma V^*(\delta(s, a))$

If agent learns $Q$, it can choose optimal action even without knowing $\delta$!

$\pi^*(s) = \operatorname{argmax}_{a} [r(s, a) + \gamma V^*(\delta(s, a))]$

$\pi^*(s) = \operatorname{argmax}_{a} Q(s, a)$

$Q$ is the evaluation function the agent will learn
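
Choosing an action from a learned $Q$ table needs only a table lookup. A minimal sketch, in which the state name, action names, and table values are hypothetical:

```python
def greedy_action(q_table, state, actions):
    """Return argmax_a Q(state, a) using only the table, with no model of delta or r."""
    return max(actions, key=lambda a: q_table[(state, a)])


# Hypothetical Q values for a single state with three available actions.
q_table = {("s2", "left"): 63.0, ("s2", "up"): 81.0, ("s2", "right"): 100.0}
best = greedy_action(q_table, "s2", ["left", "up", "right"])
print(best)
```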

Training Rule to Learn Q

Note $Q$ and $V^*$ closely related:

$V^*(s) = \max_{a'} Q(s, a')$

Which allows us to write $Q$ recursively as

$Q(s_t, a_t) = r(s_t, a_t) + \gamma V^*(\delta(s_t, a_t))$
$\phantom{Q(s_t, a_t)} = r(s_t, a_t) + \gamma \max_{a'} Q(s_{t+1}, a')$

Nice! Let $\hat{Q}$ denote learner's current approximation to $Q$. Consider training rule

$\hat{Q}(s, a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')$

where $s'$ is the state resulting from applying action $a$ in state $s$

Q Learning for Deterministic Worlds

For each $s, a$ initialize table entry $\hat{Q}(s, a) \leftarrow 0$

Observe current state $s$

Do forever:
- Select an action $a$ and execute it
- Receive immediate reward $r$
- Observe the new state $s'$
- Update the table entry for $\hat{Q}(s, a)$ as follows:
  $\hat{Q}(s, a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')$
- $s \leftarrow s'$
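
The loop above can be sketched in Python on a toy problem. Everything about the environment here is an assumption made for illustration: a four-state corridor with an absorbing goal, uniformly random action selection, and a finite number of episodes that restart from a random non-goal state in place of "do forever".

```python
import random

GOAL = 3            # absorbing goal of a hypothetical corridor 0-1-2-3
ACTIONS = (-1, +1)  # move left or right


def delta(s, a):
    """Deterministic transition, unknown to the learner except through experience."""
    return min(max(s + a, 0), GOAL)


def reward(s, a):
    """Reward 100 for the move that enters the goal, 0 otherwise."""
    return 100 if s != GOAL and delta(s, a) == GOAL else 0


def q_learning(episodes=500, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # For each s, a initialize the table entry to 0.
    q_hat = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(GOAL)          # observe current (non-goal) state
        while s != GOAL:
            a = rng.choice(ACTIONS)      # select an action and execute it
            r = reward(s, a)             # receive immediate reward
            s_next = delta(s, a)         # observe the new state
            # Training rule: Q(s, a) <- r + gamma * max_a' Q(s', a')
            q_hat[(s, a)] = r + gamma * max(q_hat[(s_next, b)] for b in ACTIONS)
            s = s_next
    return q_hat


q_hat = q_learning()
for key in sorted(q_hat):
    print(key, round(q_hat[key], 2))
```

Random action selection is used only because it keeps visiting every state-action pair; the slides do not prescribe a particular selection strategy.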

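Updating $\hat{Q}$

[Figure: robot R in initial state $s_1$ takes action $a_{right}$ and arrives in next state $s_2$. Before the update, $\hat{Q}(s_1, a_{right}) = 72$; the actions available in $s_2$ have $\hat{Q}$ values 63, 81, and 100. After the update, $\hat{Q}(s_1, a_{right}) = 90$.]

$\hat{Q}(s_1, a_{right}) \leftarrow r + \gamma \max_{a'} \hat{Q}(s_2, a')$
$\phantom{\hat{Q}(s_1, a_{right})} \leftarrow 0 + 0.9 \max\{63, 81, 100\}$
$\phantom{\hat{Q}(s_1, a_{right})} \leftarrow 90$

notice if rewards non-negative, then

$(\forall s, a, n) \ \ \hat{Q}_{n+1}(s, a) \ge \hat{Q}_n(s, a)$

and

$(\forall s, a, n) \ \ 0 \le \hat{Q}_n(s, a) \le Q(s, a)$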
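
The single update in this example can be reproduced numerically. The helper name `q_update` is an assumption; the numbers are the ones from the slide.

```python
def q_update(r, gamma, next_q_values):
    """One application of the training rule: r + gamma * max_a' Q(s', a')."""
    return r + gamma * max(next_q_values)


old_estimate = 72.0  # Q(s1, a_right) before the update
new_estimate = q_update(r=0, gamma=0.9, next_q_values=[63.0, 81.0, 100.0])
print(old_estimate, "->", new_estimate)
```

The estimate rises from 72 to roughly 90, in line with the observation that estimates never decrease when rewards are non-negative.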

$\hat{Q}$ converges to $Q$

Consider case of deterministic world where see each $\langle s, a \rangle$ visited infinitely often.

Proof: Define a full interval to be an interval during which each $\langle s, a \rangle$ is visited. During each full interval the largest error in $\hat{Q}$ table is reduced by factor of $\gamma$

Let $\hat{Q}_n$ be table after $n$ updates, and $\Delta_n$ be the maximum error in $\hat{Q}_n$; that is

$\Delta_n = \max_{s, a} |\hat{Q}_n(s, a) - Q(s, a)|$

For any table entry $\hat{Q}_n(s, a)$ updated on iteration $n + 1$, the error in the revised estimate $\hat{Q}_{n+1}(s, a)$ is

$|\hat{Q}_{n+1}(s, a) - Q(s, a)| = |(r + \gamma \max_{a'} \hat{Q}_n(s', a')) - (r + \gamma \max_{a'} Q(s', a'))|$
$\phantom{|\hat{Q}_{n+1}(s, a) - Q(s, a)|} = \gamma \, |\max_{a'} \hat{Q}_n(s', a') - \max_{a'} Q(s', a')|$
$\phantom{|\hat{Q}_{n+1}(s, a) - Q(s, a)|} \le \gamma \max_{a'} |\hat{Q}_n(s', a') - Q(s', a')|$
$\phantom{|\hat{Q}_{n+1}(s, a) - Q(s, a)|} \le \gamma \max_{s'', a'} |\hat{Q}_n(s'', a') - Q(s'', a')|$

$|\hat{Q}_{n+1}(s, a) - Q(s, a)| \le \gamma \Delta_n$
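
The error-reduction claim can be checked numerically on a small example. The sketch below assumes a four-state corridor with an absorbing, zero-reward goal, treats one in-place sweep over every state-action pair as a full interval, and starts from an arbitrary random table; none of these choices come from the slides.

```python
import random

GOAL, ACTIONS, GAMMA = 3, (-1, +1), 0.9
STATES = range(GOAL + 1)


def delta(s, a):
    """Deterministic transition; the goal state is absorbing."""
    return s if s == GOAL else min(max(s + a, 0), GOAL)


def reward(s, a):
    """Reward 100 for the move that enters the goal, 0 otherwise."""
    return 100 if s != GOAL and delta(s, a) == GOAL else 0


def sweep(q):
    """One full interval: apply the training rule once to every (s, a), in place."""
    for s in STATES:
        for a in ACTIONS:
            s_next = delta(s, a)
            q[(s, a)] = reward(s, a) + GAMMA * max(q[(s_next, b)] for b in ACTIONS)


def max_error(q, q_true):
    """Delta_n: the largest absolute error over all table entries."""
    return max(abs(q[k] - q_true[k]) for k in q_true)


# Obtain the true Q for this toy world by sweeping from zeros until it stops changing.
q_true = {(s, a): 0.0 for s in STATES for a in ACTIONS}
for _ in range(10):
    sweep(q_true)

# Start from an arbitrary table and record the maximum error after each full interval.
rng = random.Random(0)
q_hat = {k: rng.uniform(0.0, 200.0) for k in q_true}
errors = [max_error(q_hat, q_true)]
for _ in range(5):
    sweep(q_hat)
    errors.append(max_error(q_hat, q_true))

shrinks = all(after <= GAMMA * before + 1e-9 for before, after in zip(errors, errors[1:]))
print(errors, shrinks)
```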

Note we used general fact that

$|\max_{a} f_1(a) - \max_{a} f_2(a)| \le \max_{a} |f_1(a) - f_2(a)|$
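
One way to see why this fact holds, sketched under the assumption that both maxima are attained (for example, when the set of actions is finite):

```latex
% Assume without loss of generality that \max_a f_1(a) \ge \max_a f_2(a),
% and let a_1 be an action that attains \max_a f_1(a).
\begin{align*}
\left| \max_a f_1(a) - \max_a f_2(a) \right|
  &= f_1(a_1) - \max_a f_2(a) \\
  &\le f_1(a_1) - f_2(a_1)
     && \text{since } \max_a f_2(a) \ge f_2(a_1) \\
  &\le \max_a \left| f_1(a) - f_2(a) \right|
\end{align*}
```

The case $\max_a f_2(a) > \max_a f_1(a)$ follows by swapping the roles of $f_1$ and $f_2$.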
