Where's The Reward? A Review of Reinforcement Learning for Instructional Sequencing

The Dark Ages (c. 1972 - 2000s)
- By the 1970s: Howard, Smallwood, Matheson et al. go back to operations research (sans education)
- 1975: Atkinson leaves research (for administrative positions)

Suppes (1974), The Place of Theory in Educational Research, AERA Presidential Address:
“The mathematical techniques of optimization used in theories of instruction draw upon a wealth of results from other areas of science, especially from tools developed in mathematical economics and operations research over the past two decades, and it would be my prediction that we will see increasingly sophisticated theories of instruction in the near future.”

Atkinson (2014):
“work [on MOOCs] is promising, but the key to success is individualizing instruction, and necessarily that requires a psychological theory of the learning process”

Second Wave: 2000s
Why 2000s?
- Intelligent Tutoring Systems
- Reinforcement Learning formed as a field
- AIED/EDM: studying statistical models of learning
Parallels 1960s
- Teaching machines and Computer-Assisted Instruction
- Dynamic Programming and Markov Decision Processes
- Mathematical Psych: studying mathematical models of learning

Reinforcement Learning and AI in Education / ITS
- Andrew Barto
- Beverly Woolf
- Balaraman Ravindran
- Joe Beck

Reinforcement Learning and AI in Education / ITS
- Emma Brunskill
- Vincent Aleven
- Shayan Doroudi

The Third Wave: What Lies on the Horizon
Why 2010s?
- Massive Open Online Courses (MOOCs)
- Deep Reinforcement Learning formed as a field
- Deep Learning: building deep models of learning
- 35% increase in papers/books mentioning “reinforcement learning” from 2016 to 2017 (Google Scholar)

Three Waves: Summary

                       First Wave               Second Wave                   Third Wave
                       (1960s-70s)              (2000s-2010s)                 (2010s)
Medium of Instruction  Teaching Machines / CAI  Intelligent Tutoring Systems  Massive Open Online Courses
Optimization Models    Decision Processes       Reinforcement Learning        Deep RL
Models of Learning     Mathematical Psychology  Machine Learning, AIED/EDM    Deep Learning

Trends across the waves:
- Medium of Instruction: more data-generating
- Optimization Models: more data-driven

Overview
- Reinforcement Learning: Towards a “Theory of Instruction”
- Part 1: Historical Perspective
- Part 2: Systematic Review
- Discussion: Where's the Reward?
- Part 3: Case Study
- Planning for the Future

Inclusion Criteria
We consider any papers where:
- There is (implicitly) a model of the learning process, where different instructional actions probabilistically change the state of a student.
- There is an instructional policy that maps past observations from a student (e.g., responses to questions) to instructional actions.
- Data collected from students are used to learn either the model or an adaptive policy.
- If the model is learned, the instructional policy is designed to (approximately) optimize that model according to some reward function.
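The inclusion criteria above can be pictured with a small sketch. Everything in it is a hypothetical illustration rather than a method taken from any reviewed paper: the two-state learner, the parameter values, the action names (`worked_example`, `practice`), the cost table, and the one-step (myopic) reward are all assumptions. In a study that meets the criteria, the model parameters would be fit from student data rather than set by hand.

```python
from dataclasses import dataclass

# Hypothetical illustration only: the two-state learner, parameter values,
# action names, and the myopic reward below are assumptions, not from the review.


@dataclass
class StudentModel:
    p_learn: dict           # action -> P(skill becomes learned after the action)
    p_guess: float = 0.2    # P(correct response | skill not learned)
    p_slip: float = 0.1     # P(incorrect response | skill learned)


def update_belief(model, belief, action, correct):
    """Update P(learned) after one (action, response) pair."""
    # The instructional action probabilistically moves the student
    # into the learned state.
    belief = belief + (1 - belief) * model.p_learn[action]
    # Bayes' rule on the observed response.
    if correct:
        learned = belief * (1 - model.p_slip)
        unlearned = (1 - belief) * model.p_guess
    else:
        learned = belief * model.p_slip
        unlearned = (1 - belief) * (1 - model.p_guess)
    return learned / (learned + unlearned)


def greedy_policy(model, history, cost, prior=0.1):
    """Map past observations to the action with the best one-step expected reward."""
    belief = prior
    for action, correct in history:
        belief = update_belief(model, belief, action, correct)

    def expected_reward(action):
        # Reward: probability the skill is learned after the action, minus its cost.
        return belief + (1 - belief) * model.p_learn[action] - cost[action]

    return max(model.p_learn, key=expected_reward)


model = StudentModel(p_learn={"worked_example": 0.4, "practice": 0.2})
cost = {"worked_example": 0.1, "practice": 0.0}
print(greedy_policy(model, [], cost))
print(greedy_policy(model, [("practice", True), ("practice", True)], cost))
```

With these assumed values the policy picks the costlier worked example for a new student and switches to practice after two correct responses, since the belief that the skill is already learned has risen above one half. A full RL treatment would plan over longer horizons instead of one step.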

What's Not Included?
- Adaptive policies that use hand-made or heuristic decision rules (rather than data-driven/optimized decision rules)
- Experiments that do not control for everything other than sequence of instruction
- Machine teaching experiments
- Experiments that use RL for other educational purposes, such as generating data-driven hints (Stamper et al., 2013) or giving feedback (Rafferty et al., 2015)

Review Overview
- 27 studies empirically compare an adaptive policy to a baseline
- ≥ 10 papers compare policies learned with student data in simulation
- ≥ 16 papers build policies only on simulated data
- ≥ 7 papers propose using RL for instructional sequencing
- ≥ 3 other papers with policies used on real students

Review Overview
Among papers with empirical comparisons:
- 14 found a significant difference between the adaptive policy and the baseline
- 2 found a significant aptitude-treatment interaction (the policy is significantly better for below-median learners)
- 2 found a significant difference between the adaptive policy and some but not all baselines
- 9 found no significant difference between policies
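As a quick arithmetic check, the four outcome counts above can be tallied against the 27 empirical studies. This is only a sketch: the dictionary keys are shorthand labels chosen here, not terms from the review.

```python
# Counts are taken from the slide; the key names are shorthand chosen for this sketch.
outcomes = {
    "significant difference": 14,
    "aptitude-treatment interaction": 2,
    "significant vs. some baselines": 2,
    "no significant difference": 9,
}

total = sum(outcomes.values())
# Share of each outcome among the empirical studies, in percent.
shares = {name: round(100 * count / total, 1) for name, count in outcomes.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The counts sum to 27, so the four categories partition the empirical studies, and roughly half of them report a significant difference over the baseline.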

Studies by Year [figure]

Review Summary


Where's the Reward? The Pessimistic Story
Studies with a significant difference were often constrained:
- 7 of them only compare to a random policy or another RL-induced policy
- 9 of them were on paired-association tasks or concept learning tasks
  - Decent psychological understanding of how humans learn
- 2 of the studies (+ 2 ATI studies) sequenced activity types rather than content
- 2 of the studies did not optimize for learning
- 1 study seems to have been “lucky”

Where's the Reward? The Pessimistic Story
Among papers without a significant difference:
- Only 3 of them only compare to a random policy or another RL-induced policy
- Only 3 of them were on paired-association or concept learning tasks
- Only 2 of them sequenced activity types rather than content
Papers that showed no significant difference were generally more complex and ambitious in a number of dimensions.

Where's The Reward? A Review of Reinforcement Learning for Instructional Sequencing
Shayan Doroudi
Research Question: Over the past 50 years, how successful has RL
