The Dark Ages (c. 1972 - 2000s)
- By the 1970s: Howard, Smallwood, Matheson et al. go back to operations research (sans education)
- 1975: Atkinson leaves research (for administrative positions)
Suppes (1974), The Place of Theory in Educational Research (AERA Presidential Address):
“The mathematical techniques of optimization used in theories of instruction draw upon a wealth of results from other areas of science, especially from tools developed in mathematical economics and operations research over the past two decades, and it would be my prediction that we will see increasingly sophisticated theories of instruction in the near future.”

Atkinson (2014):
“work [on MOOCs] is promising, but the key to success is individualizing instruction, and necessarily that requires a psychological theory of the learning process”
Second Wave: 2000s
Why 2000s?
- Intelligent Tutoring Systems
- Reinforcement Learning formed as a field
- AIED/EDM: studying statistical models of learning
Parallels 1960s
- Teaching machines and Computer-Assisted Instruction
- Dynamic Programming and Markov Decision Processes
- Mathematical Psych: studying mathematical models of learning
Reinforcement Learning: Andrew Barto, Balaraman Ravindran
AI in Education / ITS: Beverly Woolf, Joe Beck
Reinforcement Learning: Emma Brunskill
AI in Education / ITS: Vincent Aleven
Shayan Doroudi
The Third Wave: What Lies on the Horizon
Why 2010s?
- Massive Open Online Courses (MOOCs)
- Deep Reinforcement Learning formed as a field
- Deep Learning: building deep models of learning
35% increase in papers/books mentioning “reinforcement learning” from 2016 to 2017 (Google Scholar)
Three Waves: Summary

                        First Wave                Second Wave                    Third Wave
                        (1960s-70s)               (2000s-2010s)                  (2010s)
Medium of Instruction   Teaching Machines / CAI   Intelligent Tutoring Systems   Massive Open Online Courses
Optimization Models     Decision Processes        Reinforcement Learning         Deep RL
Models of Learning      Mathematical Psychology   Machine Learning, AIED/EDM     Deep Learning

Medium of Instruction: more data-generating
Optimization Models: more data-driven
Overview
Reinforcement Learning: Towards a “Theory of Instruction”
- Part 1: Historical Perspective
- Part 2: Systematic Review
- Discussion: Where's the Reward?
- Part 3: Case Study
- Planning for the Future
Inclusion Criteria
We consider any papers where:
- There is (implicitly) a model of the learning process, where different instructional actions probabilistically change the state of a student.
- There is an instructional policy that maps past observations from a student (e.g., responses to questions) to instructional actions.
- Data collected from students are used to learn either:
  - the model
  - an adaptive policy
- If the model is learned, the instructional policy is designed to (approximately) optimize that model according to some reward function.
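As a rough illustration of these criteria (not a method taken from any reviewed study), the sketch below learns a tiny student model from data and then optimizes a policy against it. Everything here is an assumption made for the example: the two-state learned/unlearned model, the action names, the log format, the costs, and the simplification that the student's state is directly observed rather than inferred from responses.

```python
# Hypothetical two-state student model: a skill is either unlearned or learned.
# All names and numbers are illustrative assumptions, not data from the review.
ACTIONS = ("worked_example", "practice_problem")

def estimate_learn_probs(logs):
    """Learn the model from data: estimate P(unlearned -> learned | action).

    Each record is (action, learned_after) for a student assumed to have been
    unlearned before the action was given.
    """
    counts = {a: [0, 0] for a in ACTIONS}  # [times learned, times given]
    for action, learned_after in logs:
        counts[action][0] += int(learned_after)
        counts[action][1] += 1
    return {a: (k / n if n else 0.0) for a, (k, n) in counts.items()}

def optimize_policy(learn_probs, costs, horizon):
    """Backward induction over a finite horizon.

    Reward function (assumed): +1 once the skill is learned, minus a cost for
    each action given. Instruction stops as soon as the skill is learned, so
    only the action to take while unlearned needs to be chosen at each step.
    """
    value_unlearned = 0.0  # still unlearned when time runs out
    policy = []
    for _ in range(horizon):  # iterate from the last step back to the first
        q = {
            a: -costs[a] + learn_probs[a] + (1 - learn_probs[a]) * value_unlearned
            for a in ACTIONS
        }
        best = max(ACTIONS, key=q.get)
        policy.append(best)
        value_unlearned = q[best]
    policy.reverse()  # policy[t] is the action at step t while unlearned
    return policy, value_unlearned

# Made-up student data: 3 of 10 learned after a worked example, 5 of 10 after practice.
logs = (
    [("worked_example", True)] * 3 + [("worked_example", False)] * 7
    + [("practice_problem", True)] * 5 + [("practice_problem", False)] * 5
)
learn_probs = estimate_learn_probs(logs)
costs = {"worked_example": 0.05, "practice_problem": 0.2}
policy, value = optimize_policy(learn_probs, costs, horizon=3)
print(learn_probs, policy, value)
```

With these made-up numbers the optimized policy depends on the time remaining: it prefers the cheaper action early and switches to the costlier but more effective one on the final step. A hand-written rule of that kind would fall outside the criteria; here the behavior comes from optimizing the learned model.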
What's Not Included?
- Adaptive policies that use hand-made or heuristic decision rules (rather than data-driven/optimized decision rules)
- Experiments that do not control for everything other than sequence of instruction
- Machine teaching experiments
- Experiments that use RL for other educational purposes, such as generating data-driven hints (Stamper et al., 2013) or giving feedback (Rafferty et al., 2015)
Review Overview
- 27 studies empirically compare adaptive policy to baseline
- ≥ 10 papers compare policies learned with student data in simulation
- ≥ 16 papers build policies only on simulated data
- ≥ 7 papers that propose using RL for instructional sequencing
- ≥ 3 other papers with policies used on real students
Review Overview
Among papers with empirical comparisons:
- 14 found a significant difference between adaptive policy and baseline
- 2 found a significant aptitude-treatment interaction
  - Policy is significantly better for below-median learners
- 2 found a significant difference between adaptive policy and some but not all baselines
- 9 found no significant difference between policies
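As a quick sanity check, the four outcome counts can be tallied to confirm they account for all 27 studies with empirical comparisons. The dictionary labels below are paraphrases chosen for this sketch, and the percentages are derived here rather than reported on the slide.

```python
# Outcome counts as listed above; labels are shorthand chosen for this sketch.
outcomes = {
    "significant difference vs. baseline": 14,
    "significant aptitude-treatment interaction": 2,
    "significant vs. some but not all baselines": 2,
    "no significant difference": 9,
}

total = sum(outcomes.values())

# Share of each outcome, rounded to the nearest whole percent.
shares = {label: round(100 * n / total) for label, n in outcomes.items()}

for label, n in outcomes.items():
    print(f"{label}: {n}/{total} ({shares[label]}%)")
```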
Studies by Year
[Figure: studies by year]
Review Summary
Where's the Reward? The Pessimistic Story
Studies with a significant difference were often constrained:
- 7 of them only compare to a random policy or another RL-induced policy
- 9 of them were on paired-association tasks or concept learning tasks
  - Decent psychological understanding of how humans learn
- 2 of the studies (+ 2 ATI studies) sequenced activity types rather than content
- 2 of the studies did not optimize for learning
- 1 study seems to have been “lucky”
Where's the Reward? The Pessimistic Story
Among papers without a significant difference:
- Only 3 of them only compare to a random policy or another RL-induced policy
- Only 3 of them were on paired-association or concept learning tasks
- Only 2 of them sequenced activity types rather than content
Papers that showed no significant difference were generally more complex and ambitious in a number of dimensions.