Teaching Multiple Concepts to Forgetful Learners - Yuxin Chen


  1. Teaching Multiple Concepts to Forgetful Learners. Yuxin Chen (chenyuxin@uchicago.edu). Oisin Mac Aodha, Manuel Gomez Rodriguez, Anette Hunziker, Yuxin Chen, Andreas Krause, Pietro Perona, Yisong Yue, Adish Singla

  2. Applications: Language learning • Over 300 million students • Based on spaced repetition of flashcards • Can we compute an optimal personalized repetition schedule?

  3. Teaching Interaction Using Flashcards • Interaction at time $t = 1, 2, \dots, T$: 1. Teacher displays a flashcard $x_t \in \{1, 2, \dots, n\}$ 2. Learner's recall is $y_t \in \{0, 1\}$ 3. Teacher provides the correct answer [Figure: flashcard interface across Learning Phases (1) to (6); the learner submits an answer (e.g., "jouet", "nachs") and is then shown the correct German word (e.g., "Spielzeug", "Buch", "Nachtisch").]

  4. Background on Teaching Policies • Example setup: $T = 20$ and $n = 5$ concepts given by a, b, c, d, e • Naïve teaching policies - Random: a → b → a → e → c → d → a → d → c → a → b → e → a → b → d → e → … - Round-robin: a → b → c → d → e → a → b → c → d → e → a → b → c → d → e → a → … • Key limitation: Schedule agnostic to learning process
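
The two naive policies on this slide can be sketched in a few lines. This is an illustrative sketch only: concepts are represented by integer indices `0..n-1` rather than letters, and the function names and the fixed `seed` are assumptions made for the example, not part of the talk.

```python
import random

def round_robin_schedule(n, T):
    """Cycle through concepts 0..n-1 in a fixed order for T steps."""
    return [t % n for t in range(T)]

def random_schedule(n, T, seed=0):
    """Pick one of the n concepts uniformly at random at each of the T steps."""
    rng = random.Random(seed)  # seed is a hypothetical choice, for reproducibility
    return [rng.randrange(n) for _ in range(T)]

rr = round_robin_schedule(n=5, T=20)
rd = random_schedule(n=5, T=20)
print(rr)
print(rd)
```

Neither function looks at the learner's responses, which is the limitation the slide points out.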

  5. Background: Pimsleur Method (1967) • Used in mainstream language learning platforms • Based on spaced repetition ideas • a → b → a → b → c → a → c → b → d → c → d → a → b → d → c → e → …

  6. Background: Leitner System (1972) • Adaptive spacing intervals • Student 1: a → b → a → b → c → a → c → b → d → c → d → a → b → d → c → e → … • Student 2: a → a → b → a → b → c → a → c → a → b → c → a → b → a → d → c → … • Key limitation: No guarantees on the optimality of the schedule

  7. Modeling Forgetfulness • Half-life Regression (HLR) model [Settles & Meeder, ACL 2016] • Recall probability of concept $i$ at time $t$: $p_i(t \mid \text{history}) = 2^{-\Delta_{t,i} / h_i}$ - $\Delta_{t,i}$: time since concept $i$ was last taught - $h_i$: half-life estimate (depends on feedback), updated by h_i += a_i or h_i += b_i
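
A minimal sketch of the recall formula on this slide, assuming the additive half-life update shown there. Which increment goes with which feedback is an assumption here (`a` after a successful recall, `b` after a failed one), and the function names and numeric values are illustrative.

```python
def recall_probability(delta, half_life):
    """Recall probability 2^(-delta / half_life), with delta the time since last teaching."""
    return 2.0 ** (-delta / half_life)

def update_half_life(half_life, recalled, a, b):
    """Additive update from the slide: h += a or h += b depending on feedback.
    ASSUMPTION: a is applied after a successful recall, b after a failure."""
    return half_life + (a if recalled else b)

h = 2.0
p_before = recall_probability(delta=2.0, half_life=h)  # exactly one half-life has elapsed
h = update_half_life(h, recalled=True, a=2.0, b=1.0)
p_after = recall_probability(delta=2.0, half_life=h)   # same delay, longer half-life
print(p_before, p_after)
```

After one half-life the recall probability is one half by construction; a longer half-life makes the same delay less costly.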

  8. Interactive Teaching Protocol • For $t = 1, \dots, T$ - Teacher chooses concept $i \in \{1, \dots, m\}$ (e.g., a flashcard) - Learner tries to recall concept (success or fail) - Teacher reveals answer (e.g., “Spielzeug”) • Goal: maximize $f(\text{history}) = \frac{1}{mT} \sum_{i=1}^{m} \sum_{t=1}^{T} p_i(t+1 \mid \text{history}_{1:t})$ (“Area Under Curve”)
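
The "area under curve" objective can be evaluated for a given teaching history as follows. This is a sketch under stated assumptions: recall follows the form 2^(-delta / h) from the forgetfulness slide, the half-life update is additive (`a` on success, `b` on failure), and a concept that has never been taught has recall probability 0. The function name and arguments are hypothetical.

```python
def auc_objective(schedule, outcomes, m, a, b, h0):
    """Average recall probability over m concepts and T time steps.

    schedule[t-1] is the concept (0..m-1) taught at step t;
    outcomes[t-1] is True if the learner recalled it at step t.
    """
    T = len(schedule)
    half_life = [h0] * m
    last_taught = [None] * m  # ASSUMPTION: None means recall probability 0
    total = 0.0
    for t in range(1, T + 1):
        i, y = schedule[t - 1], outcomes[t - 1]
        half_life[i] += a if y else b  # ASSUMPTION: additive update
        last_taught[i] = t
        # Add the recall probability of every concept at time t + 1.
        for j in range(m):
            if last_taught[j] is not None:
                delta = t + 1 - last_taught[j]
                total += 2.0 ** (-delta / half_life[j])
    return total / (m * T)

value = auc_objective([0, 1, 0], [False, False, True], m=2, a=2.0, b=1.0, h0=1.0)
print(value)
```

The value always lies between 0 and 1, since it is an average of probabilities.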

  9. Naive Approaches • Round Robin - Doesn’t adapt to new estimates of learner recall probabilities - Over-teaches easy concepts - Under-teaches hard concepts • Lowest Recall Probability - Generalization of Pimsleur method and Leitner system - Doesn’t consider change to recall probability
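
A minimal sketch of the lowest-recall-probability baseline, assuming recall of the form 2^(-delta / h) and treating never-taught concepts as having recall probability 0. The function and argument names are illustrative.

```python
def lowest_recall_choice(t, half_life, last_taught):
    """Pick the concept whose predicted recall probability at time t is lowest."""
    def predicted_recall(i):
        if last_taught[i] is None:  # ASSUMPTION: never-taught concepts have recall 0
            return 0.0
        return 2.0 ** (-(t - last_taught[i]) / half_life[i])
    return min(range(len(half_life)), key=predicted_recall)

# Both concepts were last taught at time 4; concept 0 has the shorter half-life.
choice = lowest_recall_choice(t=5, half_life=[1.0, 4.0], last_taught=[4, 4])
print(choice)
```

The rule looks only at the current recall probability, not at how much teaching would change it, which is the second limitation listed above.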

  10. Greedy Teaching Algorithm (interactive) • Choose concept $i$ to maximize $\Delta_i(\text{history}) = \mathbb{E}_{y_t}\left[ f(\text{history} \oplus (i, y_t)) - f(\text{history}) \right]$ - $y_t$: success or failure of recall at time $t$ (randomness over model estimate) - $p_i(t \mid \text{history}) = 2^{-\Delta_{t,i} / h_i}$ ($h_i$ updated after observing $y_t$)
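
The greedy rule can be sketched as below. This is not the authors' implementation. It assumes recall of the form 2^(-delta / h), an additive half-life update (`a` on success, `b` on failure), recall probability 0 for never-taught concepts, and that the objective of a partial history is evaluated by extrapolating recall over the remaining horizon with no further teaching. The constant normalization factor is dropped because it does not affect the argmax. All names are hypothetical.

```python
import random

def recall(delta, h):
    return 2.0 ** (-delta / h)

def future_recall(last, h, t, T):
    """Sum of recall probabilities at times t+1..T+1 if the concept is not taught again."""
    if last is None:  # ASSUMPTION: never-taught concepts are not recalled
        return 0.0
    return sum(recall(tau + 1 - last, h) for tau in range(t, T + 1))

def greedy_choice(t, T, half_life, last_taught, a, b):
    """Return the concept with the largest expected marginal gain at step t."""
    best, best_gain = None, float("-inf")
    for i, (h, last) in enumerate(zip(half_life, last_taught)):
        p = 0.0 if last is None else recall(t - last, h)  # predicted P(y_t = 1)
        base = future_recall(last, h, t, T)
        expected = (p * future_recall(t, h + a, t, T)
                    + (1 - p) * future_recall(t, h + b, t, T))
        if expected - base > best_gain:
            best, best_gain = i, expected - base
    return best

def simulate(m, T, a, b, h0, seed=0):
    """Run the greedy teacher against a simulated learner that follows the same model."""
    rng = random.Random(seed)
    half_life, last_taught, schedule = [h0] * m, [None] * m, []
    for t in range(1, T + 1):
        i = greedy_choice(t, T, half_life, last_taught, a, b)
        p = 0.0 if last_taught[i] is None else recall(t - last_taught[i], half_life[i])
        y = rng.random() < p  # simulated learner response
        half_life[i] += a if y else b
        last_taught[i] = t
        schedule.append(i)
    return schedule

schedule = simulate(m=3, T=10, a=2.0, b=1.0, h0=1.0)
print(schedule)
```

Unlike the lowest-recall rule, the choice here depends on how much the recall curve is expected to improve after teaching, averaged over the predicted success and failure outcomes.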

  11. Characteristics of the Optimization Problem • Non-submodular - Gain of a concept $x$ can increase given longer history - Captured by submodularity ratio $\gamma$ over sequences

  12. Characteristics of the Optimization Problem (cont.) • Post-fix non-monotone - $f(\text{orange} \oplus \text{blue}) < f(\text{blue})$ - Captured by curvature $\omega$

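  13. Theoretical Guarantees: General Case • Guarantees for the general case (any memory model) • Utility of $\pi^{\text{gr}}$ (greedy policy) compared to $\pi^{\text{opt}}$ is given by $F(\pi^{\text{gr}}) \ge F(\pi^{\text{opt}}) \sum_{t=1}^{T} \frac{\gamma_{T-t}}{T} \prod_{\tau=0}^{t-1} \left(1 - \frac{\omega_\tau \gamma_\tau}{T}\right)$ (Theorem 1) $\ge F(\pi^{\text{opt}}) \, \frac{1}{\omega_{\max}} \left(1 - e^{-\omega_{\max} \gamma_{\min}}\right)$ (Corollary 2)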
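
To get a feel for the Corollary 2 guarantee, the sketch below evaluates the approximation factor, assuming it has the form (1 / ω_max) · (1 − e^(−ω_max · γ_min)), where ω_max is the largest curvature and γ_min the smallest submodularity ratio. The function name and the sample values are illustrative assumptions.

```python
import math

def greedy_ratio_lower_bound(omega_max, gamma_min):
    """Assumed Corollary 2 factor: (1 / omega_max) * (1 - exp(-omega_max * gamma_min))."""
    return (1.0 - math.exp(-omega_max * gamma_min)) / omega_max

classical = greedy_ratio_lower_bound(omega_max=1.0, gamma_min=1.0)
weaker = greedy_ratio_lower_bound(omega_max=1.0, gamma_min=0.5)
print(classical, weaker)
```

With both parameters equal to 1 the factor is 1 − 1/e, the familiar constant for greedy maximization of monotone submodular functions. A smaller submodularity ratio gives a weaker guarantee.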

  14. Theoretical Guarantees: HLR Model • Consider the task of teaching $n$ concepts where each concept follows an independent HLR model with the same parameters $a_x = z$, $b_x = z$ $\forall x \in \{1, 2, \dots, n\}$. A sufficient condition for the algorithm to achieve $(1 - \epsilon)$ high utility is $z \ge \max\left\{\log T, \log 3n, \log \frac{2n^2}{\epsilon T}\right\}$

  15. Illustration: Simulation Results [Figure: objective value of the Round Robin, Greedy, and Optimal policies.]

  16. User Study • 150 participants from the Mechanical Turk platform • T = 40, m = 15; total study time is about 25 mins

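  17. Figure 6: Samples from the German dataset. Results on the German dataset (GR: greedy, LR: lowest recall probability, RR: round robin, RD: random):
        Policy      GR      LR      RR      RD
        Avg. gain   0.572   0.487   0.462   0.467
        p-value     -       0.0652  0.0197  0.0151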

  18. (a) Common: Owl, Cat, Horse, Elephant, Lion, Tiger, Bear (b) Rare: Angwantibo, Olinguito, Axolotl, Ptarmigan, Patrijshond, Coelacanth, Pyrrhuloxia
        Biodiversity (all species)
        Policy      GR      LR      RR      RD
        Avg. gain   0.475   0.411   0.390   0.251
        p-value     -       0.0017  0.0001  0.0001
        Biodiversity (rare species)
        Policy      GR      LR      RR      RD
        Avg. gain   0.766   0.668   0.601   0.396
        p-value     -       0.0001  0.0001  0.0001

  19. Summary: Teaching Concepts to People • Teaching forgetful learners - Limited memory (modeling forgetfulness) - Engagement (interface design) • Challenges not covered in this talk: - Limited inference power and noise - Mismatch in representation - Interpretability (e.g., teaching via labels vs. rich feedback) - Safety (e.g., when teaching physical tasks) - Fairness (e.g., when teaching a class) - … Questions?
