


  1. Writing Reusable Code Feedback at Scale with Mixed-Initiative Program Synthesis Andrew Head*, Elena Glassman*, Gustavo Soares*, Ryo Suzuki, Lucas Figueredo, Loris D’Antoni, Björn Hartmann * These three authors contributed equally to the work.

  2. When Writing Feedback on Student Code, Teachers Can Draw on Deep Domain Knowledge. (Figure: incorrect student code submissions, each marked X, paired with teacher comments such as “What happens when n is zero? Hint: look at lecture 5’s slides…”, “While this helper function is useful, it does not handle the ca…”, and “Have you considered what would happen if combiner was set…”.) …but it does not scale.

  3. In Lieu of Teacher-Written Feedback, the Autograder Shows Test Cases. (Figure: a student submission goes to the course autograder, which returns test case results.) …but there’s still a gulf of evaluation.

  4. Program Synthesis Techniques Can Shrink the Gulf by Automatically Finding and Suggesting Bug Fixes for Students. (Figure: a student submission and its test case results, with a synthesized hint: “In line 2, change total = 0 to total = 1.”) …but the automatically generated feedback is often mechanical and formulaic. Related systems: AutoGrader [PLDI13], AutomataTutor [TOCHI15], CodeAssist [FSE16]. Can we combine teachers’ deep domain knowledge with program synthesis to give students better feedback?
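To illustrate what a "mechanical, formulaic" hint looks like, here is a minimal sketch that phrases a line-level diff between a buggy program and its fixed version as "In line N, change A to B". It assumes the fix is already available as a corrected program; the function name `formulaic_hints` and the `product(n, term)` example program are hypothetical, and this is not how AutoGrader, AutomataTutor, or CodeAssist actually generate their feedback.

```python
import difflib
from typing import List


def formulaic_hints(buggy: str, fixed: str) -> List[str]:
    """Turn a line-level diff into mechanical hints like 'In line 2, change A to B'.

    Hypothetical helper: only one-for-one line replacements are reported;
    insertions and deletions are skipped in this sketch.
    """
    old, new = buggy.splitlines(), fixed.splitlines()
    hints = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=old, b=new).get_opcodes():
        if tag == "replace" and i2 - i1 == j2 - j1:
            for k in range(i2 - i1):
                hints.append(f"In line {i1 + k + 1}, change "
                             f"{old[i1 + k].strip()} to {new[j1 + k].strip()}")
    return hints


# Hypothetical student bug: a running product initialized to 0.
buggy = """def product(n, term):
    total = 0
    for k in range(1, n + 1):
        total *= term(k)
    return total"""
fixed = buggy.replace("total = 0", "total = 1")
print(formulaic_hints(buggy, fixed))  # ['In line 2, change total = 0 to total = 1']
```

The hint is accurate but says nothing about *why* the fix works, which is the gap the talk proposes to fill with teacher-written feedback.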

  5. Program Synthesis: Learning Code Transformations from Pairs of Incorrect and Correct Submissions. (Figure: Student 1 fixes an iterative solution and Student 2 fixes a recursive solution; both fixes are generalized into a single code transformation.)
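The idea of generalizing two students' concrete fixes into one transformation can be sketched with first-order anti-unification over a toy syntax tree: shared structure is kept and differing subterms become holes. This is a deliberately simpler stand-in, not Refazer's algorithm; the tuple-based tree, the `?x` hole convention, and all function names are assumptions made for illustration. The example also simplifies the slide's scenario to two fixes that differ only in a variable name.

```python
from typing import Dict, Tuple, Union

# Toy AST: a leaf is a string; a node is a tuple (label, child1, child2, ...).
# Assumption: leaves beginning with "?" are reserved for holes (pattern variables).
Term = Union[str, Tuple]
Rule = Tuple[Term, Term]  # (before pattern, after template)


def anti_unify(a: Term, b: Term, holes: Dict[Tuple[Term, Term], str]) -> Term:
    """Least general generalization: keep shared structure, turn differences into holes."""
    if a == b:
        return a
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        return (a[0],) + tuple(anti_unify(x, y, holes) for x, y in zip(a[1:], b[1:]))
    if (a, b) not in holes:  # the same pair of differing subterms reuses one hole
        holes[(a, b)] = f"?x{len(holes)}"
    return holes[(a, b)]


def generalize_fix(fix1: Rule, fix2: Rule) -> Rule:
    """Generalize two concrete (before, after) fixes into one rewrite rule."""
    holes: Dict[Tuple[Term, Term], str] = {}  # shared so holes line up across both sides
    return anti_unify(fix1[0], fix2[0], holes), anti_unify(fix1[1], fix2[1], holes)


def match(pattern: Term, term: Term, bindings: Dict[str, Term]) -> bool:
    """Match a pattern against a term, binding each hole consistently."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings[pattern] == term
        bindings[pattern] = term
        return True
    if isinstance(pattern, tuple) and isinstance(term, tuple) and len(pattern) == len(term):
        return all(match(p, t, bindings) for p, t in zip(pattern, term))
    return pattern == term


def substitute(template: Term, bindings: Dict[str, Term]) -> Term:
    """Fill the holes of a template with their bound subterms."""
    if isinstance(template, str):
        return bindings.get(template, template)
    return tuple(substitute(t, bindings) for t in template)


def rewrite(rule: Rule, term: Term) -> Term:
    """Apply the rule to every subterm matching its before pattern (top-down)."""
    before, after = rule
    bindings: Dict[str, Term] = {}
    if match(before, term, bindings):
        return substitute(after, bindings)
    if isinstance(term, tuple):
        return (term[0],) + tuple(rewrite(rule, child) for child in term[1:])
    return term


# Two hypothetical fixes of the same bug (product initialized to 0), different names.
fix1 = (("Assign", "total", "0"), ("Assign", "total", "1"))
fix2 = (("Assign", "result", "0"), ("Assign", "result", "1"))
rule = generalize_fix(fix1, fix2)
print(rule)  # (('Assign', '?x0', '0'), ('Assign', '?x0', '1'))

# The learned rule also fixes a third submission that uses yet another name.
third = ("Block", ("Assign", "acc", "0"), ("Return", "acc"))
print(rewrite(rule, third))  # ('Block', ('Assign', 'acc', '1'), ('Return', 'acc'))
```

A known limitation of this sketch: a hole that appears only on the "after" side has no binding and is left unfilled, which a real synthesizer would have to rule out.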

  6. Program Synthesis: Learning Bug-Fixing Code Transformations

  7. We Scale Up a Little Teacher-Written Feedback by Attaching It to Code Transformations. (Figure: incorrect student code submissions, each marked X, share one code transformation, “add base case”, and therefore all receive the same teacher comment: “What happens when n is zero? Hint: look at lecture 5’s slides on base cases.”)

  8. Two Interfaces for Attaching Feedback to Code Transformations. MistakeBrowser gives feedback on clusters: it learns transformations from the autograder's records of students' (S) incorrect submissions (x) and final correct submissions (o), and collects feedback from teachers (T) into a Feedback Bank. Related systems: Divide and Conquer [ITS14], AutoStyle [ITS16].

  9. Two Interfaces for Attaching Feedback to Code Transformations. FixPropagator attaches feedback to individual fixes: it learns transformations from, and collects feedback from, a teacher (T) who picks an incorrect submission, fixes it, and writes a hint, which goes into the Feedback Bank.

  10. Our Program Synthesis Backend. Refazer (/hɛ.fa.ˈze(h)/) means “to redo.” Using Refazer [ICSE17] as a backend, our systems learn bug-fixing code transformations.

  11. Contributions • An approach for combining human expertise with program synthesis to deliver reusable, scalable code feedback • Implementations of two different systems that use our approach: FixPropagator and MistakeBrowser • In-lab studies suggesting that the systems fulfill our goals and also inform teachers about common student bugs

  12. Outline • Related Work • Program Synthesis • Systems • Evaluation

  13. System Design. (Figure: mixed-initiative workflows between teachers and Refazer program synthesis [ICSE ’17] [L@S ’17]. Through the interfaces, teachers demonstrate fixes and write feedback; the synthesizer suggests fixes and feedback.)

  14. Systems: MistakeBrowser. (Figure: the MistakeBrowser workflow.) Students (S) submit code, producing incorrect submissions (x) and a final correct submission (o). The teacher (T) uploads test cases (Test 1 … Test N). The system learns transformations (Trans 1 … Trans N) and clusters submissions by transformation, and the teacher writes feedback for each cluster. Next semester, when a student submits incorrect code, the system finds the transformation that fixes the submission and returns the feedback written for it.
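The workflow above can be sketched as follows: each transformation is tried on a submission, the first one whose output passes the teacher's tests names the submission's cluster, and the feedback bank maps that cluster to one teacher comment. This is a minimal sketch under stated assumptions: the transformations are hand-written string replacements standing in for Refazer's learned transformations, "first passing transformation wins" is an assumed policy, and the names `TRANSFORMATIONS`, `FEEDBACK_BANK`, `find_fix`, and `cluster_by_fix` are hypothetical. The second comment is the generalizable comment quoted on slide 30; the first is invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

# Hypothetical bug-fixing transformations (string rewrites stand in for learned AST edits).
TRANSFORMATIONS: Dict[str, Callable[[str], str]] = {
    "init_total_to_one": lambda src: src.replace("total = 0", "total = 1"),
    "include_last_term": lambda src: src.replace("range(1, n)", "range(1, n + 1)"),
}

# Hypothetical teacher-uploaded tests for product(n, term) = term(1) * ... * term(n).
TESTS = [((3, lambda x: x), 6), ((3, lambda x: x * x), 36)]

# Feedback bank: one teacher comment per transformation, i.e. per cluster.
FEEDBACK_BANK: Dict[str, str] = {
    "init_total_to_one": "What happens to a product whose starting value is 0?",
    "include_last_term": "Check if you have the product of the correct number of terms.",
}


def passes_tests(src: str) -> bool:
    """Run a submission that defines product() against every test."""
    namespace: dict = {}
    try:
        exec(src, namespace)  # runs arbitrary code; a real grader would sandbox this
        return all(namespace["product"](*args) == want for args, want in TESTS)
    except Exception:
        return False


def find_fix(src: str) -> Optional[str]:
    """Name of the first transformation that changes src and makes it pass, if any."""
    for name, transform in TRANSFORMATIONS.items():
        fixed = transform(src)
        if fixed != src and passes_tests(fixed):
            return name
    return None


def cluster_by_fix(submissions: Dict[str, str]) -> Dict[Optional[str], List[str]]:
    """Group incorrect submissions by the transformation that fixes them."""
    clusters: Dict[Optional[str], List[str]] = defaultdict(list)
    for sid, src in submissions.items():
        clusters[find_fix(src)].append(sid)
    return dict(clusters)


def feedback_for(src: str) -> Optional[str]:
    """Return the teacher comment attached to the transformation that fixes src."""
    name = find_fix(src)
    return FEEDBACK_BANK.get(name) if name else None


# Hypothetical incorrect submissions.
BUGGY_INIT = """
def product(n, term):
    total = 0
    for k in range(1, n + 1):
        total = total * term(k)
    return total
"""
BUGGY_WHILE = """
def product(n, term):
    total = 0
    k = 1
    while k <= n:
        total, k = total * term(k), k + 1
    return total
"""
BUGGY_RANGE = """
def product(n, term):
    total = 1
    for k in range(1, n):
        total *= term(k)
    return total
"""

print(cluster_by_fix({"s1": BUGGY_INIT, "s2": BUGGY_RANGE, "s3": BUGGY_WHILE}))
# {'init_total_to_one': ['s1', 's3'], 'include_last_term': ['s2']}
```

Note that the iterative and `while`-loop submissions land in the same cluster because one transformation fixes both, so a single comment covers them.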

  15. Systems: MistakeBrowser

  16. Systems: MistakeBrowser

  17. Systems: MistakeBrowser. Example feedback: “Looks like you’re writing a recursive call. What might you be missing to enable recursion?”

  18. But Not All Classes Have Submission Histories for Hundreds of Students. (Figure: students (S) submit code, producing incorrect submissions (x).)

  19. Systems: FixPropagator. (Figure: the FixPropagator workflow.) Students (S) submit code, producing incorrect submissions (x). The teacher (T) uploads test cases, picks a submission, fixes it, and writes hint feedback. The system learns transformations, makes clusters, and suggests fixes and feedback for other submissions; the teacher accepts or modifies the suggested fixes and feedback. The system then attaches the feedback to the matching submissions and returns it to students.
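A minimal sketch of the propagation step, assuming the simplest possible "learned transformation": a single replaced line extracted from one teacher fix by comparing the before and after versions line by line. Every submission containing the same buggy line receives the teacher's fix and hint as a pending suggestion for the teacher to review. The names `learn_line_rule`, `propagate`, and `Suggestion`, and the `status` field (modeled loosely on the accepted/changed/rejected responses measured in the study), are hypothetical; Refazer learns far more general transformations than this, and this sketch does not re-run tests on the suggested fixes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

LineRule = Tuple[str, str]  # (buggy line, fixed line), whitespace-stripped


@dataclass
class Suggestion:
    submission_id: str
    fixed_source: str
    hint: str
    status: str = "pending"  # the teacher later marks it accepted, changed, or rejected


def learn_line_rule(before: str, after: str) -> Optional[LineRule]:
    """Learn a one-line rewrite from a single teacher fix (hypothetical simplification)."""
    old_lines, new_lines = before.splitlines(), after.splitlines()
    if len(old_lines) != len(new_lines):
        return None
    changed = [(o.strip(), n.strip()) for o, n in zip(old_lines, new_lines) if o != n]
    return changed[0] if len(changed) == 1 else None


def propagate(rule: LineRule, hint: str, submissions: Dict[str, str]) -> List[Suggestion]:
    """Suggest the teacher's fix and hint for every submission containing the buggy line."""
    old, new = rule
    suggestions = []
    for sid, src in submissions.items():
        lines = src.splitlines()
        if any(line.strip() == old for line in lines):
            # Keep each line's indentation; swap only the matching statement.
            fixed = "\n".join(line.replace(old, new) if line.strip() == old else line
                              for line in lines)
            suggestions.append(Suggestion(sid, fixed, hint))
    return suggestions


# Hypothetical teacher fix on one submission.
teacher_before = ("def product(n, term):\n    total = 0\n"
                  "    for k in range(1, n + 1):\n        total *= term(k)\n    return total")
teacher_after = teacher_before.replace("total = 0", "total = 1")
rule = learn_line_rule(teacher_before, teacher_after)  # ('total = 0', 'total = 1')

# Other students' submissions: s7 shares the bug, s8 is a correct recursive solution.
others = {
    "s7": ("def product(n, term):\n    total = 0\n    k = 1\n    while k <= n:\n"
           "        total, k = total * term(k), k + 1\n    return total"),
    "s8": ("def product(n, term):\n    if n == 0:\n        return 1\n"
           "    return term(n) * product(n - 1, term)"),
}
for s in propagate(rule, "What is the product of zero and anything?", others):
    print(s.submission_id, s.status)  # s7 pending
```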

  20. Systems: FixPropagator

  21. Systems: FixPropagator

  22. Systems: FixPropagator. A new student submission with the same bug receives a suggested fix.

  23. Systems: FixPropagator

  24. Both Fixes and Feedback Can Be Further Modified

  25. A Study of the Systems. Participants: current and former teaching staff from CS1; MistakeBrowser (N = 9), FixPropagator (N = 8). Interface walkthrough (5 min). Main task (30 min): giving feedback on student submissions. Measurements: feedback, manual corrections, responses to feedback recommendations (accepted, changed, rejected), between-task surveys… Qualitative feedback: survey and post-interview.



  28. Question 1: Can a few manual corrections fix many submissions? FixPropagator propagates fixes from dozens of corrections to hundreds of submissions. (Figure: bar chart of the median number of submissions given feedback by the teacher vs. by FixPropagator, on an axis from 0 to 250.) • Fixes were propagated within minutes (median = 2m20s, σ = 7m34s per correction).


  30. Question 2: How often is a teacher’s feedback relevant when it is matched to other students’ submissions? Feedback propagated with FixPropagator was correct a majority of the time, but not always. Teachers reused feedback a median of 20 times, modifying it a median of 6 times (30%). Generalizable comment: “Check if you have the product of the correct number of terms.” Non-generalizable comment: “Your starting value of z should be a function, not an int.”


  32. Question 2: How often is a teacher’s feedback relevant when it is matched to other students’ submissions? MistakeBrowser created conceptually consistent clusters of student bugs. (Figure: bar chart of the % of clusters, from 0% to 40%, for each response to “Do these submissions share the same misconception?”: No or “No idea”, 50%, 75%, Almost 100%, 100%. Responses for N = 11 clusters.)

  33. Evaluation Questions 1. Can a few manual corrections fix many submissions? With a median of 10 corrections, FixPropagator suggested fixes for a median of 201 submissions. 2. How often is a teacher’s feedback relevant when it is matched to another student’s submission? Matched feedback was relevant ~75% of the time.

  34. Limitations • The impact of teacher feedback on student learning outcomes has not been evaluated • Code transformations were created only for submissions one or two bugs away from correct


  36. Conclusion. We present an approach for combining human expertise with program synthesis to deliver reusable, scalable code feedback, and two systems implementing this approach: MistakeBrowser and FixPropagator. Questions?
