Journal of Artificial Intelligence Research 4 (1996) 237-285. Submitted 9/95; published 5/96.

Reinforcement Learning: A Survey

Leslie Pack Kaelbling  lpk@cs.brown.edu
Michael L. Littman  mlittman@cs.brown.edu
Computer Science Department, Box 1910, Brown University, Providence, RI 02912-1910 USA

Andrew W. Moore  awm@cs.cmu.edu
Smith Hall, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 USA

Abstract

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement". The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.

1. Introduction

Reinforcement learning dates back to the early days of cybernetics and work in statistics, psychology, neuroscience, and computer science. In the last five to ten years, it has attracted rapidly increasing interest in the machine learning and artificial intelligence communities. Its promise is beguiling: a way of programming agents by reward and punishment without needing to specify how the task is to be achieved. But there are formidable computational obstacles to fulfilling the promise.

This paper surveys the historical basis of reinforcement learning and some of the current work from a computer science perspective. We give a high-level overview of the field and a taste of some specific approaches. It is, of course, impossible to mention all of the important work in the field; this should not be taken to be an exhaustive account.

Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. The work described here has a strong family resemblance to eponymous work in psychology, but differs considerably in the details and in the use of the word "reinforcement". It is appropriately thought of as a class of problems, rather than as a set of techniques.

There are two main strategies for solving reinforcement-learning problems. The first is to search in the space of behaviors in order to find one that performs well in the environment. This approach has been taken by work in genetic algorithms and genetic programming,
as well as some more novel search techniques (Schmidhuber, 1996). The second is to use statistical techniques and dynamic programming methods to estimate the utility of taking actions in states of the world. This paper is devoted almost entirely to the second set of techniques, because they take advantage of the special structure of reinforcement-learning problems that is not available in optimization problems in general. It is not yet clear which set of approaches is best in which circumstances.

The rest of this section is devoted to establishing notation and describing the basic reinforcement-learning model. Section 2 explains the trade-off between exploration and exploitation and presents some solutions to the most basic case of reinforcement-learning problems, in which we want to maximize the immediate reward. Section 3 considers the more general problem in which rewards can be delayed in time from the actions that were crucial to gaining them. Section 4 considers some classic model-free algorithms for reinforcement learning from delayed reward: adaptive heuristic critic, TD(λ), and Q-learning. Section 5 demonstrates a continuum of algorithms that are sensitive to the amount of computation an agent can perform between actual steps of action in the environment. Generalization, the cornerstone of mainstream machine-learning research, has the potential of considerably aiding reinforcement learning, as described in Section 6. Section 7 considers the problems that arise when the agent does not have complete perceptual access to the state of the environment. Section 8 catalogs some of reinforcement learning's successful applications. Finally, Section 9 concludes with some speculations about important open problems and the future of reinforcement learning.

1.1 Reinforcement-Learning Model

Figure 1: The standard reinforcement-learning model.

In the standard reinforcement-learning model, an agent is connected to its environment via perception and action, as depicted in Figure 1. On each step of interaction the agent receives as input, i, some indication of the current state, s, of the environment; the agent then chooses an action, a, to generate as output. The action changes the state of the environment, and the value of this state transition is communicated to the agent through a scalar reinforcement signal, r. The agent's behavior, B, should choose actions that tend to increase the long-run sum of values of the reinforcement signal. It can learn to do this over time by systematic trial and error, guided by a wide variety of algorithms that are the subject of later sections of this paper.
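To make this perception-action loop concrete, the following is a minimal sketch in Python. The Environment and RandomAgent classes, the two-state dynamics, and the reward values are illustrative assumptions introduced only for this example and are not part of the survey; a learning agent would replace the random action choice and the empty update rule with one of the algorithms discussed in later sections.

import random

class Environment:
    """Toy environment with two states and two actions (illustrative only)."""

    def __init__(self):
        self.state = 0  # current state s of the environment

    def step(self, action):
        # The action changes the state of the environment ...
        self.state = (self.state + action) % 2
        # ... and the value of the transition is communicated back to the
        # agent as a scalar reinforcement signal r (placeholder values).
        reward = 1.0 if self.state == 1 else 0.0
        return self.state, reward

class RandomAgent:
    """An agent whose behavior B maps perceived states to actions.

    This one acts at random; a learning agent would adjust its behavior
    from the observed rewards over time.
    """

    def choose_action(self, state):
        return random.choice([0, 1])

    def update(self, state, action, reward, next_state):
        pass  # no learning in this sketch

env = Environment()
agent = RandomAgent()
state = env.state
total_reward = 0.0
for _ in range(100):
    action = agent.choose_action(state)      # agent receives input i (the state s) and outputs action a
    next_state, reward = env.step(action)    # environment returns the new state and reinforcement r
    agent.update(state, action, reward, next_state)
    total_reward += reward                   # goal: increase the long-run sum of reinforcement
    state = next_state

print("total reward over 100 steps:", total_reward)

The loop deliberately separates the agent's choice of action from the environment's state transition and reward, mirroring the division of the model into agent and environment described above.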