Journal of Artificial Intelligence Research 4 (1996) 237-285. Submitted 9/95; published 5/96.

Reinforcement Learning: A Survey

Leslie Pack Kaelbling  lpk@cs.brown.edu
Michael L. Littman  mlittman@cs.brown.edu
Computer Science Department, Box 1910, Brown University, Providence, RI 02912-1910 USA

Andrew W. Moore  awm@cs.cmu.edu
Smith Hall, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 USA

Abstract

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement". The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.

1. Introduction

Reinforcement learning dates back to the early days of cybernetics and work in statistics, psychology, neuroscience, and computer science. In the last five to ten years, it has attracted rapidly increasing interest in the machine learning and artificial intelligence communities. Its promise is beguiling: a way of programming agents by reward and punishment without needing to specify how the task is to be achieved. But there are formidable computational obstacles to fulfilling the promise.

This paper surveys the historical basis of reinforcement learning and some of the current work from a computer science perspective. We give a high-level overview of the field and a taste of some specific approaches. It is, of course, impossible to mention all of the important work in the field; this should not be taken to be an exhaustive account.

Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. The work described here has a strong family resemblance to eponymous work in psychology, but differs considerably in the details and in the use of the word "reinforcement". It is appropriately thought of as a class of problems, rather than as a set of techniques.

There are two main strategies for solving reinforcement-learning problems. The first is to search in the space of behaviors in order to find one that performs well in the environment. This approach has been taken by work in genetic algorithms and genetic programming,
as well as some more novel search techniques (Schmidhuber, 1996). The second is to use statistical techniques and dynamic programming methods to estimate the utility of taking actions in states of the world. This paper is devoted almost entirely to the second set of techniques, because they take advantage of the special structure of reinforcement-learning problems that is not available in optimization problems in general. It is not yet clear which set of approaches is best in which circumstances.

The rest of this section is devoted to establishing notation and describing the basic reinforcement-learning model. Section 2 explains the trade-off between exploration and exploitation and presents some solutions to the most basic case of reinforcement-learning problems, in which we want to maximize the immediate reward. Section 3 considers the more general problem in which rewards can be delayed in time from the actions that were crucial to gaining them. Section 4 considers some classic model-free algorithms for reinforcement learning from delayed reward: adaptive heuristic critic, TD(λ), and Q-learning. Section 5 demonstrates a continuum of algorithms that are sensitive to the amount of computation an agent can perform between actual steps of action in the environment. Generalization, the cornerstone of mainstream machine-learning research, has the potential of considerably aiding reinforcement learning, as described in Section 6. Section 7 considers the problems that arise when the agent does not have complete perceptual access to the state of the environment. Section 8 catalogs some of reinforcement learning's successful applications. Finally, Section 9 concludes with some speculations about important open problems and the future of reinforcement learning.

1.1 Reinforcement-Learning Model

Figure 1: The standard reinforcement-learning model.

In the standard reinforcement-learning model, an agent is connected to its environment via perception and action, as depicted in Figure 1. On each step of interaction the agent receives as input, i, some indication of the current state, s, of the environment; the agent then chooses an action, a, to generate as output. The action changes the state of the environment, and the value of this state transition is communicated to the agent through a scalar reinforcement signal, r. The agent's behavior, B, should choose actions that tend to increase the long-run sum of values of the reinforcement signal. It can learn to do this over time by systematic trial and error, guided by a wide variety of algorithms that are the subject of later sections of this paper.
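To make this perception-action loop concrete, the following is a minimal sketch in Python. The Environment and RandomAgent classes, the two-state dynamics, and the reward values are illustrative assumptions introduced only for this example and are not part of the survey; a learning agent would replace the random action choice and the empty update rule with one of the algorithms discussed in later sections.

import random

class Environment:
    """Toy environment with two states and two actions (illustrative only)."""

    def __init__(self):
        self.state = 0  # current state s of the environment

    def step(self, action):
        # The action changes the state of the environment ...
        self.state = (self.state + action) % 2
        # ... and the value of the transition is communicated back to the
        # agent as a scalar reinforcement signal r (placeholder values).
        reward = 1.0 if self.state == 1 else 0.0
        return self.state, reward

class RandomAgent:
    """An agent whose behavior B maps perceived states to actions.

    This one acts at random; a learning agent would adjust its behavior
    from the observed rewards over time.
    """

    def choose_action(self, state):
        return random.choice([0, 1])

    def update(self, state, action, reward, next_state):
        pass  # no learning in this sketch

env = Environment()
agent = RandomAgent()
state = env.state
total_reward = 0.0
for _ in range(100):
    action = agent.choose_action(state)      # agent receives input i (the state s) and outputs action a
    next_state, reward = env.step(action)    # environment returns the new state and reinforcement r
    agent.update(state, action, reward, next_state)
    total_reward += reward                   # goal: increase the long-run sum of reinforcement
    state = next_state

print("total reward over 100 steps:", total_reward)

The loop deliberately separates the agent's choice of action from the environment's state transition and reward, mirroring the division of the model into agent and environment described above.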