Autonomous Task Sequencing for Customized Curriculum Design in Reinforcement Learning
Sanmit Narvekar, Jivko Sinapov, and Peter Stone
Department of Computer Science
University of Texas at Austin
{sanmit, jsinapov, pstone}@cs.utexas.edu
Successes of Reinforcement Learning
• Approaching or passing human-level performance
• BUT: it can take millions of episodes! People learn these tasks MUCH faster
People Learn via Curricula
• People are able to learn many complex tasks very efficiently
Example: Quick Chess
• Quickly learn the fundamentals of chess
• 5 x 6 board
• Fewer pieces per type
• No castling
• No en passant
Example: Quick Chess
[Figure: sequence of Quick Chess boards]
Task Space
[Figure: task space graph from an empty task through intermediate tasks (pawns only; pawns + king; one piece per type) to the target task]
• Quick Chess is a curriculum designed for people
• We want to do something similar automatically for autonomous agents
Curriculum Learning
• Task = MDP
[Figure: agent-environment loop (state, action, reward) linking three components: task creation (presented at AAMAS '16), sequencing, and transfer learning (via value function transfer)]
• Curriculum learning is a complex problem that ties together task creation, sequencing, and transfer learning
Autonomous Task Sequencing
Sequencing as an MDP
[Figure: curriculum MDP graph — nodes are agent policies π_0 ... π_5 and π_f; edges are tasks M_1 ... M_4 with costs R_{i,j}]
• State space S_C: all policies π_i an agent can represent
• Action space A_C: the different tasks M_j an agent can train on
• Transition function p_C(s_C, a_C): learning task a_C transforms the agent's policy s_C
• Reward function r_C(s_C, a_C): the cost in time steps to learn task a_C given policy s_C
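To make the formulation concrete, here is a minimal Python sketch of the CMDP interface described above. All names and types are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of the curriculum MDP (CMDP). All names are illustrative
    # assumptions, not the authors' code.
    from dataclasses import dataclass
    from typing import Callable, Dict, List, Tuple

    Policy = Dict   # s_C: the learning agent's current policy (e.g., a Q-table)
    Task = str      # a_C: an identifier for a task MDP M_j

    @dataclass
    class CurriculumMDP:
        tasks: List[Task]                                    # action space A_C
        train: Callable[[Policy, Task], Tuple[Policy, int]]  # runs RL on a task;
                                                             # returns (new policy,
                                                             # time steps spent)

        def step(self, policy: Policy, task: Task) -> Tuple[Policy, float]:
            """Transition p_C and reward r_C for taking CMDP action `task`."""
            new_policy, steps_spent = self.train(policy, task)
            return new_policy, -steps_spent   # reward = negative time-step cost

Note that a single CMDP "step" is expensive: it runs an entire course of RL training on one task, which is exactly why the next slides approximate a solution with a single trace rather than learning the full CMDP policy.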
Sequencing as an MDP
• A policy π_C : S_C → A_C on this curriculum MDP (CMDP) specifies which task to train on given the learning agent's current policy π_i
• Learning the full policy π_C can be difficult!
  • Taking a single action requires solving a full task MDP
  • Transitions are not deterministic
Sequencing as an MDP
• Instead, find one trace/execution of π_C* in the CMDP
• Main Idea: leverage the fact that we know the target task, and therefore what is relevant for the final-state policy π_f, to guide the selection of tasks
Autonomous Sequencing
Target Task
• Grid world domain
• Objectives:
  • Navigate the world
  • Pick up keys
  • Unlock locks
  • Avoid pits
Autonomous Sequencing
[Figure: recursion tree of tasks, partitioned into solvable and unsolvable tasks]
• Recursive algorithm (6 steps); a skeleton of the control flow is sketched below
• Each iteration adds a source task to the curriculum
  • This in turn updates the policy
• Terminates when performance on the target task exceeds a desired performance threshold
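The following compact Python skeleton sketches the six-step loop. The helper functions are assumed stubs standing in for operations detailed on the next slides (try_solve is sketched under Step 1 below; the Step-4 and Step-5 criteria are sketched under those steps); this shows the control flow, not the authors' implementation.

    # Skeleton of the six-step recursive procedure (helpers are assumed stubs).
    def build_curriculum(target, agent, budget, threshold):
        curriculum = []
        while True:
            # Step 1: attempt the target directly within the budget, saving samples
            solved, samples = try_solve(agent, target, budget)
            if solved and performance(agent, target) >= threshold:
                return curriculum
            # Step 2: create candidate source tasks (AAMAS '16 methods)
            sources = create_source_tasks(target, samples)
            # Step 3: partition sources into solvable / unsolvable
            solvable = [t for t in sources
                        if try_solve(agent.copy(), t, budget)[0]]
            if solvable:
                # Step 4: train on the source whose policy update helps most
                # on the saved target-task samples, then loop back to Step 1
                best = max(solvable,
                           key=lambda t: update_on_samples(agent, t, samples))
                train(agent, best, budget)
                curriculum.append(best)
            elif sources:
                # Step 5: recurse, treating the most sample-relevant
                # (but unsolvable) source as the new target
                sub_target = max(sources, key=lambda t: relevance(t, samples))
                curriculum += build_curriculum(sub_target, agent, budget, threshold)
            else:
                # Step 6: nothing usable; increase the budget and retry
                budget *= 2   # doubling is an assumption; the slide only
                              # says "increase budget"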
Autonomous Sequencing
Step 1
• Assume a learning budget κ
• Attempt to solve the target task directly within κ steps; save the samples
• Solvable? Either the target task was easy to learn, or we started with a policy that made it easy to learn. Either way, done.
• Goal: incrementally learn subtasks to build a policy that can learn the target task
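A minimal sketch of the Step-1 attempt, assuming a standard episodic agent/task API (act, update, reset, step, solved — all hypothetical names): run for at most κ environment steps, saving every transition sample for the source-task selection in Step 4.

    # Sketch of Step 1: try to solve a task within a budget of `kappa` steps,
    # saving all transition samples. The agent/task API here is assumed.
    def try_solve(agent, task, kappa):
        samples, steps = [], 0
        while steps < kappa:
            state, done = task.reset(), False
            while not done and steps < kappa:
                action = agent.act(state)
                next_state, reward, done = task.step(action)
                agent.update(state, action, reward, next_state)
                samples.append((state, action, reward, next_state))
                state, steps = next_state, steps + 1
            if task.solved():             # e.g., reached goal performance
                return True, samples      # solvable within the budget
        return False, samples             # budget exhausted: unsolvable for now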
Autonomous Sequencing
Step 2
• Could not solve the target
• Create source tasks using the methods from AAMAS '16
Step 3
• Attempt to solve each source task within κ steps
• Partition the sources into solvable / unsolvable
Autonomous Sequencing
Step 4
[Figure: initial policy π_0 and candidate policies π_1, π_2 from solvable source tasks, compared on target-task samples [s_1, s_2, s_3, s_4, ..., s_κ]]
• If solvable tasks exist, select the one that updates the policy the most on samples [s_1, s_2, s_3, s_4, ..., s_κ] drawn from the target task (one possible criterion is sketched below)
• Assumption:
  • Source tasks that can be solved have policies that are relevant to the target task
  • They don't provide negative transfer
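One way "updates the policy the most" could be measured for a value-based agent. The specific metric below (total change in greedy state values over the saved target samples, for a tabular Q-learner) and the helper train_on_copy are assumptions for illustration, not the paper's definition.

    # Sketch of a possible Step-4 selection criterion for a tabular Q-learner:
    # how much did training on a candidate source change the greedy values of
    # the states the agent saw in the target task? (Metric is an assumption.)
    def policy_change_on_target(q_before, q_after, target_samples):
        change = 0.0
        for (state, _, _, _) in target_samples:
            if state in q_before and state in q_after:
                change += abs(max(q_after[state].values())
                              - max(q_before[state].values()))
        return change

    # Hypothetical usage: train a copy of the Q-table on each solvable source,
    # then keep the source that moved the target-state values the most.
    # best = max(solvable, key=lambda t: policy_change_on_target(
    #     q, train_on_copy(q, t), target_samples))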
Autonomous Sequencing
Step 4 (cont.)
• Add the selected source task to the curriculum
• Return to Step 1 (re-evaluate on the target task)
  • The policy has changed, so we will get a new set of samples
  • Samples are biased towards the agent's current set of experiences
  • This in turn guides the selection of source tasks
Autonomous Sequencing
Step 5
• No sources solvable
• Sort tasks by sample relevance
  • Compare the states experienced in the target task with those experienced in the sources (one plausible measure is sketched below)
• Recursively create sub-source tasks
• Return to Step 2 with the current source task as the target task
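The slide does not give the exact relevance measure; below is one plausible sketch that scores each source by the overlap between the states visited in it and the states visited in the target task.

    # Sketch of a possible "sample relevance" score for Step 5: the fraction of
    # a source's visited states that also appear in the target-task samples.
    # (The precise measure is not specified on the slide; this is an assumption.)
    def sample_relevance(source_samples, target_samples):
        source_states = {s for (s, _, _, _) in source_samples}
        target_states = {s for (s, _, _, _) in target_samples}
        if not source_states:
            return 0.0
        return len(source_states & target_states) / len(source_states)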
Autonomous Sequencing
Step 6
• No sources usable after exhausting the tree
• Increase the budget and return to Step 1
• Learning can be cached, so the agent can pick up where it left off (sketched below)
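A sketch of how the caching mentioned here could work, reusing the try_solve sketch from Step 1 and assuming hypothetical snapshot/restore methods on the agent: a retried task resumes from the learning state left by the previous attempt instead of starting over.

    # Sketch of Step 6's caching (snapshot/restore are assumed agent methods):
    # when the budget is increased and a task is retried, resume from the
    # learning state of the previous attempt rather than from scratch.
    cache = {}   # task id -> agent snapshot after the last attempt

    def try_solve_cached(agent, task, kappa):
        if task.name in cache:
            agent.restore(cache[task.name])   # pick up where we left off
        solved, samples = try_solve(agent, task, kappa)
        cache[task.name] = agent.snapshot()
        return solved, samples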
Connection to CMDPs
[Figure: the recursion tree of solvable/unsolvable tasks alongside the curriculum MDP graph]
• An optimal path in the CMDP is one that reaches π_f with least cost
• The selection in Step 4 picks tasks that update the policy most towards π_f
• The learning budget minimizes cost
• The algorithm behaves greedily to balance updates and cost
Experimental Setup
• Grid world domain presented previously
• Create multiple agents
  • Using multiple agents shows that the algorithm does not depend on the implementation of the RL agent
  • Evaluate whether different agents benefit from individualized curricula
Experimental Setup
Agent Types
• Basic Agent
  • State: sensors on 4 sides that measure distance to keys, locks, etc. (a feature-layout sketch follows below)
  • Actions: move in 4 directions, pick up key, unlock lock
• Action-dependent Agent
  • State difference: weights on features are shared over the 4 directions
• Rope Agent
  • Action difference: like the basic agent, but can use a rope action to negate a pit
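An illustrative sketch of the Basic Agent's state features as described above; the exact object set, ordering, and missing-value convention are assumptions.

    # Illustrative feature layout for the Basic Agent: one distance reading per
    # (direction, object type) pair. Object set and ordering are assumptions.
    OBJECT_TYPES = ["key", "lock", "pit"]            # etc., per the slide
    DIRECTIONS = ["north", "south", "east", "west"]

    def basic_agent_features(distances):
        """distances: dict mapping (direction, object_type) -> distance."""
        return [distances.get((d, o), float("inf"))
                for d in DIRECTIONS for o in OBJECT_TYPES]

    # The Action-dependent Agent would instead share one weight per object type
    # across all four directions; the Rope Agent adds a "rope" action instead.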
Basic Agent Results
[Figure: results for the Basic Agent]
Action-Dependent Agent Results
[Figure: results for the Action-dependent Agent]
Rope Agent Results
[Figure: results for the Rope Agent]
Summary
• Presented a novel formulation of curriculum generation as an MDP
• Proposed an algorithm to approximate a trace in this MDP
• Demonstrated that the proposed method can create curricula tailored to the sensing and action capabilities of agents