An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem
Thomas Price, Tiffany Barnes
North Carolina State University
Workshop on Graph-based EDM 2015
Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 1 / 19

Introduction
Data-driven hint generation:
- The process of extracting contextualized hints from previous students’ solutions to a problem
- Avoids the need for an expensive, hand-authored expert model
- Primary example: the Hint Factory, which has been applied in a variety of domains:
  - Logical proofs (Stamper et al. 2013)
  - Linked list problems (Fossati, Di Eugenio, and Ohlsson 2009)
  - A programming game (Hicks, Peddycord III, and Barnes 2014)
How can we generate hints in complex, open-ended programming problems?

Introduction: Hint Generation
Our input is log data from previous students solving a problem, which can be represented as an interaction network (Eagle and Barnes 2015).

Introduction: Hints for Open-Ended Programming
- Some approaches have successfully generated hints for programming problems (e.g. Jin, Barnes, and Stamper 2012; Rivers and Koedinger 2014)
- These are most successful on smaller, well-constrained programming problems with a clear solution
- We are interested in the opposite type of problem:
  - Large state space
  - Multiple, loosely ordered subgoals
  - Unstructured output
  - A creative, design task
- Many novice programming activities have these traits, such as making games and apps

Current Approaches: Challenges
How do we represent a student’s state in a programming problem, and when should two states be connected?
- Naive state representation: states correspond to snapshots of students’ code
  - Problem: It is unlikely that any two students will have exactly the same state
- Simple connection rule: connect states that past students have traversed
  - Problem: May lead to a very sparse network

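The naive representation and simple connection rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `build_interaction_network` and the input format (one list of hashable code snapshots per student) are assumptions made for the example.

```python
from collections import defaultdict


def build_interaction_network(traces):
    """Build a simple interaction network from student traces.

    traces: list of lists; traces[i] is the ordered sequence of
    (hashable) code snapshots recorded for student i. This input
    format is an assumption for the sketch.
    Returns (visits, edges):
      visits maps state -> set of student indices that reached it
      edges maps (src, dst) -> number of times that step was taken
    """
    visits = defaultdict(set)
    edges = defaultdict(int)
    for student, trace in enumerate(traces):
        for state in trace:
            visits[state].add(student)
        # Connect only consecutive states that a student actually traversed.
        for src, dst in zip(trace, trace[1:]):
            if src != dst:
                edges[(src, dst)] += 1
    return visits, edges


# Two hypothetical students who share only their first and last snapshots.
traces = [["a", "b", "c"], ["a", "x", "c"]]
visits, edges = build_interaction_network(traces)
```

With exact snapshots as states, most entries in `visits` contain a single student and the edge set stays sparse, which is the problem the slide describes.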
Current Approaches
This challenge has been addressed previously in three ways:
- Canonicalization: Remove semantically unimportant information from student code to increase state overlap (e.g. Lazar and Bratko 2014; Rivers and Koedinger 2012)
- Connecting States: Connect similar, existing states in the network with synthetic actions (Rivers and Koedinger 2013)
  - Or add whole paths between states, including synthetic states (Rivers and Koedinger 2014)
- Alternate State Definitions: Choose a non-code state representation, such as the output of the student’s code (Hicks, Peddycord III, and Barnes 2014)

Data Collection: An Open-Ended Problem
We wanted to investigate the applicability of current approaches to an open-ended programming problem.

Data Collection
- Collected data at a middle school STEM outreach program called SPARCS (Cateté, Wassell, and Barnes 2014)
- 17 sixth-grade students (12 male, 5 female)
- Students had 45 minutes to work on the activity
- Instructor help was provided only upon request

Analysis: Canonicalization
Students’ programs were represented as Abstract Syntax Trees (ASTs). We analyzed three levels of state canonicalization:
- Raw: No canonicalization; states represent exact code
- Basic: Removed variable names and literal values
  - Rivers and Koedinger suggest a number of other measures, but these were much less applicable to our data
- Ordered: Also recursively sorted all child nodes in the AST
  - Effectively serves as an upper bound for removing unimportant ordering information in code

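The three canonicalization levels can be illustrated with a small sketch. It uses Python's standard `ast` module purely as a stand-in, since the slides do not say how the students' ASTs were built; the helper names `to_tree`, `sort_children`, and `canonicalize` are hypothetical, and only `Name` identifiers and literal constants are stripped at the Basic level.

```python
import ast


def to_tree(node, strip=False):
    """Convert an ast node into a hashable (label, children) tuple.

    With strip=False, variable names and literal values are kept in the
    label (Raw). With strip=True they are dropped (Basic).
    """
    label = type(node).__name__
    if not strip:
        if isinstance(node, ast.Name):
            label += ":" + node.id
        elif isinstance(node, ast.Constant):
            label += ":" + repr(node.value)
    children = tuple(to_tree(child, strip) for child in ast.iter_child_nodes(node))
    return (label, children)


def sort_children(tree):
    """Recursively sort the children of every node (Ordered)."""
    label, children = tree
    return (label, tuple(sorted(sort_children(child) for child in children)))


def canonicalize(source, level="raw"):
    """Return a canonical state for source at level raw, basic, or ordered."""
    tree = to_tree(ast.parse(source), strip=(level != "raw"))
    return sort_children(tree) if level == "ordered" else tree


# Same structure, different names and literals: equal only after Basic.
same_after_basic = canonicalize("x = 1 + 2", "basic") == canonicalize("y = 3 + 4", "basic")
# Same statements in a different order: equal only after Ordered.
same_after_ordered = (
    canonicalize("a = 1\nprint(a)", "ordered") == canonicalize("print(a)\na = 1", "ordered")
)
```

Sorting every node's children also merges programs whose statement order genuinely matters, which is consistent with the slide treating the Ordered level as an upper bound rather than a safe transformation.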
Analysis: Canonicalization - Results
                           Raw      Basic    Ordered
Total States               2380     1781     1656
Percent Unique             97.5%    94.8%    92.8%
Mean Non-Unique Freq.      3.44     3.95     2.82
Median Non-Unique Freq.    2        2        2
Mean % Path Unique         89.9%    83.0%    78.9%
Standard Deviation         (6.67)   (10.5)   (13.3)
For comparison: In (Rivers and Koedinger 2012), 300 out of 500 final solutions to a basic problem were identical after canonicalization.

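Overlap statistics of this kind can be computed directly from per-student state sequences. The sketch below rests on assumptions the slide does not spell out: a state counts as "unique" when exactly one student reached it, a state's frequency is the number of distinct students who reached it, and a student's "% path unique" is the share of their distinct states that nobody else reached. The function name `overlap_stats` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def overlap_stats(traces):
    """Summarize state overlap across students.

    traces: non-empty list of non-empty lists of hashable (canonicalized)
    states, one list per student.
    """
    students_by_state = defaultdict(set)
    for student, trace in enumerate(traces):
        for state in trace:
            students_by_state[state].add(student)
    # Frequency of a state = number of distinct students who reached it.
    counts = {state: len(who) for state, who in students_by_state.items()}
    shared = [c for c in counts.values() if c > 1]
    # Per student: percentage of their distinct states reached by no one else.
    path_unique = [
        100 * sum(counts[s] == 1 for s in set(trace)) / len(set(trace))
        for trace in traces
    ]
    return {
        "total_states": len(counts),
        "percent_unique": 100 * (len(counts) - len(shared)) / len(counts),
        "mean_non_unique_freq": mean(shared) if shared else 0,
        "median_non_unique_freq": median(shared) if shared else 0,
        "mean_percent_path_unique": mean(path_unique),
    }


# Toy data for three hypothetical students.
stats = overlap_stats([["a", "b", "c"], ["a", "x", "c"], ["a", "y"]])
```

Running the same function on Raw, Basic, and Ordered states would give one column of the table per canonicalization level.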
Analysis: Visualizing Distances
- Constructed and plotted distance matrices
- Used Tree Edit Distance as a distance metric
- Lighter shades represent smaller distances
- Min-distance “path” through the matrix shown in green/yellow
- Red crosses indicate where subgoals were completed

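One way to read the distance matrix and its min-distance "path" is sketched below. The slide does not say how the path was computed, so the dynamic-programming alignment here (steps right, down, or diagonally, minimizing the summed cell values) is an assumption. The distance function is pluggable; the demo uses the size of a set symmetric difference as a simple stand-in and does not implement Tree Edit Distance.

```python
def distance_matrix(trace_a, trace_b, dist):
    """matrix[i][j] = distance between state i of A and state j of B."""
    return [[dist(a, b) for b in trace_b] for a in trace_a]


def min_distance_path(matrix):
    """Cheapest monotone path from the top-left to the bottom-right cell.

    Allowed steps: down, right, or diagonal. Returns (path, total),
    where path is a list of (i, j) cells and total is the summed distance.
    """
    n, m = len(matrix), len(matrix[0])
    cost = [[0] * m for _ in range(n)]
    back = {}
    for i in range(n):
        for j in range(m):
            preds = [(p, q) for p, q in ((i - 1, j), (i, j - 1), (i - 1, j - 1))
                     if p >= 0 and q >= 0]
            if preds:
                best = min(preds, key=lambda cell: cost[cell[0]][cell[1]])
                back[(i, j)] = best
                cost[i][j] = matrix[i][j] + cost[best[0]][best[1]]
            else:
                cost[i][j] = matrix[i][j]
    # Walk the back-pointers from the final cell to recover the path.
    path = [(n - 1, m - 1)]
    while path[-1] in back:
        path.append(back[path[-1]])
    return path[::-1], cost[n - 1][m - 1]


# Hypothetical traces: each state is the set of features present so far.
trace_a = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
trace_b = [{"a"}, {"a", "c"}, {"a", "b", "c"}]
matrix = distance_matrix(trace_a, trace_b, lambda x, y: len(x ^ y))
path, total = min_distance_path(matrix)
```

Swapping the lambda for a real tree edit distance over ASTs would yield the kind of matrix the slide plots.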
Analysis: Quantifying Distances
Objective    Mean           Max             Farthest
1            0.25 (0.27)    0.76 (0.56)     2.23 (0.75)
2            4.88 (3.93)    9.18 (5.74)     12.73 (6.10)
4            4.92 (2.77)    10.11 (3.69)    14.67 (4.77)
5            7.79 (1.32)    13.17 (1.72)    18.17 (1.72)
6            7.49 (1.11)    13.17 (0.98)    18.67 (1.75)
How close are the closest students?
- Defined the distance between two students as the mean or max distance they get from one another when solving an objective
- For each student, for each objective solved, find the paired student with minimum distance

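The pairing step can be sketched as follows for a single objective. The input format is an assumption: for each pair of students, a list of per-step distances recorded while both worked on the objective. The names `pair_distance` and `closest_peers` and the sample numbers are hypothetical.

```python
from statistics import mean


def pair_distance(per_step, s, t, aggregate):
    """Aggregate (mean or max) the per-step distances between two students."""
    steps = per_step.get((s, t)) or per_step[(t, s)]
    return aggregate(steps)


def closest_peers(students, per_step, aggregate):
    """For each student, find the other student at minimum aggregate distance."""
    result = {}
    for s in students:
        others = [t for t in students if t != s]
        nearest = min(others, key=lambda t: pair_distance(per_step, s, t, aggregate))
        result[s] = (nearest, pair_distance(per_step, s, nearest, aggregate))
    return result


# Hypothetical per-step distances for one objective.
students = ["A", "B", "C"]
per_step = {
    ("A", "B"): [1, 2, 3],
    ("A", "C"): [4, 4, 4],
    ("B", "C"): [0, 1, 8],
}
by_mean = closest_peers(students, per_step, mean)
by_max = closest_peers(students, per_step, max)
```

In this toy data, student C's closest peer is B under the mean but A under the max, which shows why the choice of aggregate matters.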
Discussion: Conclusions
Canonicalization
- The strongest canonicalization reduced the number of states by 30.4%
- Over 90% of states in the network were reached by only one student
- Important but insufficient by itself
Connecting States
- Students maintain proximity when pursuing the same objective, on average within 8 tree edits, but slowly diverge
- This may not be close enough to connect states
- How do we take advantage of similar but distinct states?

Discussion: Limitations
- This was intentionally exploratory, using only 17 students
  - Many data-driven techniques use hundreds, though the Hint Factory has historically been successful with much less data
- The open-ended programming assignment was very complex compared to those used in previous work
  - It is difficult to say where these results should be generalized

Discussion: Future Work
- Can we break problems down into sub-problems, where more overlap is likely?
- Are there more appropriate distance metrics we should be using?
- How can we apply output-based state representations to apps or games with non-deterministic results?

Thank You!
Questions? Comments?

References
Cateté, V, K Wassell, and T Barnes (2014). “Use and development of entertainment technologies in after school STEM program”. In: Proceedings of the 45th ACM technical symposium on Computer science education, pp. 163–168.
Eagle, Michael and Tiffany Barnes (2015). “Exploring Networks of Problem-Solving Interactions”. In: Proceedings of the Fifth International Conference on Learning Analytics And Knowledge (LAK), pp. 21–30.
Fossati, D, B Di Eugenio, and S Ohlsson (2009). “I learn from you, you learn from me: How to make iList learn from students.” In: Artificial Intelligence in Education (AIED).
Hicks, A, B Peddycord III, and T Barnes (2014). “Building Games to Learn from Their Players: Generating Hints in a Serious Game”. In: Intelligent Tutoring Systems (ITS), pp. 312–317.
Jin, W, T Barnes, and J Stamper (2012). “Program representation for automatic hint generation for a data-driven novice programming tutor”. In: Intelligent Tutoring Systems (ITS).
Lazar, T and I Bratko (2014). “Data-Driven Program Synthesis for Hint Generation in Programming Tutors”. In: Intelligent Tutoring Systems (ITS). Springer.
Rivers, K and KR Koedinger (2012). “A canonicalizing model for building programming tutors”. In: Intelligent Tutoring Systems (ITS).
Rivers, K and KR Koedinger (2013). “Automatic generation of programming feedback: A data-driven approach”. In: The First Workshop on AI-supported Education for Computer Science (AIEDCS 2013).
Rivers, K and KR Koedinger (2014). “Automating Hint Generation with Solution Space Path Construction”. In: Intelligent Tutoring Systems (ITS), pp. 329–339.
Stamper, JC et al. (2013). “Experimental evaluation of automatic hint generation for a logic tutor”. In: Artificial Intelligence in Education (AIED) 22.1, pp. 3–17.