
Computationally Modeling the Impact of Task-Appropriate Language Complexity and Accuracy on Human Grading of German Essays



  1. Computationally Modeling the Impact of Task-Appropriate Language Complexity and Accuracy on Human Grading of German Essays
     Zarah Weiss, Anja Riemenschneider, Pauline Schröter, and Detmar Meurers
     Department of Linguistics, University of Tübingen; IQB, Humboldt-Universität zu Berlin
     14th Workshop on Innovative Use of NLP for Building Educational Applications, Florence, Italy, August 2nd 2019
     Short title: The Impact of Complexity and Accuracy on Human Essay Grading
     Sections: Introduction; Outline; Background (The Abitur Data, Our Data, Task-Effects); Complexity Vectors (Building Complexity Vectors, Task-Wise Vector Differences, Similarity-Based Ranking); Experiment (Set-Up, Results); Discussion; Conclusion; References; Appendix

  2. Introduction
     ◮ Complexity and accuracy core components in national educational standards for language arts and literacy (CCSSO 2010; KMK 2012)
     ◮ Doubts about teachers’ ability to evaluate complexity and accuracy of texts (CCSSO 2010; Vögelin et al. 2019)
     ◮ Assessed manually in German Abitur
       → Official school-leaving state examination
       → Determines admission to university
     ◮ Study teachers’ grading behavior in authentic Abitur data

  3. Research Questions and Hypotheses
     How do complexity and accuracy influence teachers’
     ◮ language performance grades (partial score)?
     ◮ content grades (partial score)?
     ◮ overall grades (composite score)?
     It should be the case that complexity and accuracy
     ◮ strongly affect language performance grades
     ◮ do not affect content grades
     ◮ weakly affect overall grades

  4. Education System in the U.S. and Germany
                                  U.S. System    German System
     Education standard           CCSSO          KMK
     High-stakes testing          repeatedly     final examination
     Qualitative complexity       teachers       teachers
     Quantitative complexity      automatic      teachers
     Automatic testing industry   yes            no

  5. German Abitur, Federal States, and the IQB
     ◮ Abitur = official state examination required for university
     ◮ Education is a matter of the German federal states
     ◮ The Institute for Educational Quality Improvement (IQB)
       → monitors schools’ adherence to educational standards
       → provides an official pool of tasks for the Abitur
       → includes templates for performance requirements
     ◮ States may choose and partially alter tasks from the pool

  6. The Data
     ◮ Graded essays from German Abitur in 2017 (N = 344)
     ◮ Subject: German literature and language examination
     ◮ Collected across German states and digitized by the IQB
     ◮ Texts respond to one of four different task prompts
       → 2 × interpretation of literature (IL-1, IL-2)
       → 2 × material-based argumentation (MA-1, MA-2)

  7. Task-Effects
     [Figure: Original Overall Grade (y-axis) by Text Length in Words (x-axis), one panel per task: IL-1 (N=116), IL-2 (N=31), MA-1 (N=83), MA-2 (N=110)]
     ◮ Task prompts request and elicit texts of different length
     ◮ Influences correlation of text length and overall grade
     ◮ Task-effects are known to influence linguistic complexity (Alexopoulou et al. 2017; Yoon & Polio 2016)
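
To make the second point concrete, the relation between text length and overall grade can be computed separately for each task prompt rather than over the pooled data. The following is a minimal sketch under stated assumptions: the record layout `(task, n_words, grade)`, the function names, and the toy values are hypothetical and are not taken from the Abitur data or from the authors' analysis.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def length_grade_correlation_by_task(essays):
    """Group (task, n_words, grade) records by task and correlate within each task."""
    by_task = defaultdict(lambda: ([], []))
    for task, n_words, grade in essays:
        lengths, grades = by_task[task]
        lengths.append(n_words)
        grades.append(grade)
    return {task: pearson(ls, gs) for task, (ls, gs) in by_task.items()}


# Invented toy records for illustration only, NOT the Abitur data.
toy_essays = [
    ("IL-1", 1000, 5), ("IL-1", 2000, 10), ("IL-1", 3000, 15),
    ("MA-1", 1000, 12), ("MA-1", 2000, 8), ("MA-1", 3000, 4),
]
correlations = length_grade_correlation_by_task(toy_essays)
```

In the toy data the two tasks show opposite relations between length and grade, which a pooled correlation would hide; this is why a per-task view is sketched here.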

  8. Selecting and Representing Writing Complexity
     ◮ Select authentic texts of more and less task-appropriate overall linguistic complexity for the experiment (±ALC)
     ◮ Two-fold strategy:
       1. Build document vector representations capturing relevant dimensions of complexity
       2. Create a ranking of these vector representations to identify more and less complex documents
     ◮ Separately for each task to account for task-differences
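
The two-fold strategy can be illustrated with a small sketch. It is not the authors' procedure: the slide does not spell out the similarity measure, so this example swaps in a simple stand-in that standardizes every feature within a task and ranks documents by their mean z-score, assuming that higher feature values mean higher complexity. All names (`zscore_columns`, `rank_by_complexity`) and the toy vectors are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(vectors):
    """Standardize each feature (column) across the documents of one task."""
    columns = list(zip(*vectors))
    means = [mean(col) for col in columns]
    # A constant feature has zero spread; divide by 1.0 so its z-scores are 0.
    stdevs = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(vec, means, stdevs)]
        for vec in vectors
    ]


def rank_by_complexity(docs_by_task):
    """Return, per task, document ids ordered from least to most complex.

    ASSUMPTION: the score is the mean z-score over all features, a stand-in
    for the similarity-based ranking named in the outline.
    """
    rankings = {}
    for task, docs in docs_by_task.items():
        ids = list(docs)
        z_vectors = zscore_columns([docs[doc_id] for doc_id in ids])
        scores = {doc_id: mean(vec) for doc_id, vec in zip(ids, z_vectors)}
        rankings[task] = sorted(ids, key=scores.get)
    return rankings


# Invented feature vectors for illustration only.
toy_docs = {
    "IL-1": {"a": [0.2, 10.0], "b": [0.5, 20.0], "c": [0.8, 30.0]},
    "MA-1": {"x": [0.9, 5.0], "y": [0.1, 1.0]},
}
rankings = rank_by_complexity(toy_docs)
```

Standardizing inside each task, rather than over all essays, mirrors the last point of the slide: a document is compared only with documents written for the same prompt.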

  9. Step 1: Creating Complexity Vectors
     [Diagram: Student Essays → Automatic Language Complexity Assessment → Theory-driven Feature Selection → Data-driven Feature Selection → Building Complexity Vectors → Complexity Vectors; the intermediate stages are shown as feature-by-document matrices (Doc 1 ... Doc N), the result as one vector per document within a task (e.g. IL-1, Doc 1)]

  10. Automatic Complexity Assessment
     ◮ Automatically extract 320 complexity features (Weiss 2017)
     ◮ Successfully used to assess German readability and L1/L2 development (Weiss & Meurers 2018, 2019, in press)
     ◮ Measures of human processing, language use, and lexical, morphological, syntactic, and discourse complexity
     ◮ Based on SLA research where Complexity, Accuracy, and Fluency are dimensions of language performance (Bulté & Housen 2012; Wolfe-Quintero et al. 1998)
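
To give a sense of what a surface-level complexity feature looks like, here is a minimal sketch that computes three common toy measures from raw text. These are illustrative assumptions only: they are not claimed to be among the 320 features of the system, and the regex-based tokenizer and sentence splitter are crude stand-ins for real linguistic preprocessing.

```python
import re


def tokenize(text):
    """Lowercased word tokens; a crude regex stand-in for a real tokenizer."""
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    """Split on sentence-final punctuation; ignores abbreviations and the like."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def toy_complexity_features(text):
    """Compute a few illustrative surface measures for one document."""
    tokens = tokenize(text)
    sentences = split_sentences(text)
    if not tokens or not sentences:
        raise ValueError("text must contain at least one word")
    return {
        # Lexical variation: distinct word forms per token.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Rough syntactic proxy: tokens per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
        # Rough lexical proxy: characters per token.
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
    }


features = toy_complexity_features("Der Hund schläft. Der Hund bellt laut.")
```

A document's complexity vector would then simply be the list of such feature values in a fixed order.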

  11. Theoretically-Motivated Complexity Features
     ◮ Education standards name examples of welcome writing strategies to make language more complex (KMK 2012)
     ◮ Includes argumentation structure, lexical complexity, and syntactic complexity (as well as accuracy)
     ◮ Register and norm-appropriateness → academic language (Hennig & Niemann 2013; Snow & Uccelli 2009)
     ◮ We identify 75 theoretically-motivated complexity features that are extracted by the system
