Broad Linguistic Modeling is Beneficial for German L2 Proficiency Assessment
Zarah Weiß and Detmar Meurers
Eberhard Karls University, Tübingen
Learner Corpus Research Conference, Bolzano/Bozen, 5-7 October 2017
October 6th, 2017
Outline
1 Introduction
2 Merlin Data
3 Study 1
4 Study 2
5 Conclusion
Introduction: L2 Complexity
• Dimensions of L2 performance: Complexity, Accuracy, Fluency (Housen, Vedder, and Kuiken 2012)
• Complexity: elaborateness, variedness, and inter-relatedness of a system (Ellis and Barkhuizen 2005; Rescher 1998)
• Research on operationalization of complexity into implementable features (Crossley, Kyle, and McNamara 2016; Hancke, Vajjala, and Meurers 2012; Kyle 2016; Lu and Ai 2015)
• Assessment of e.g. proficiency, readability, essay scoring
Introduction: L2 Complexity
• Increasing number of diverse complexity measures available due to advances in NLP technology
• Made available through text analysis systems, e.g. CohMetrix (Crossley and McNamara 2012), Linguistic Analysis tool (Kyle 2016), CTAP (Chen and Meurers 2016)
• Most studies use only a few established measures of linguistic elaborateness
→ Include measures from various theoretical backgrounds
→ Include measures of variedness as well as elaborateness
Introduction: Complexity Analysis System
• 398 measures of language elaborateness and variedness extracted by elaborate NLP tool chain
• To be integrated in CTAP (Chen and Meurers 2016) by end of 2017
• Cover domains of
  1 Theoretical linguistics (syntax, lexico-semantics, morphology)
  2 Discourse & encoding of meaning
  3 Language use
  4 Human language processing
Introduction: Measures of the Linguistic System
Theoretical Linguistics
• Lexico-Semantic: lexical diversity, variation, and density; semantic relations (Lu 2011; McCarthy and Jarvis 2007)
• Syntactic: dependent clause ratios, modifier ratios, complex NP, periphrastic constructions, etc. (Kyle 2016; Wolfe-Quintero, Inagaki, and Kim 1998)
• Morphological: inflection, derivation, and composition (François and Fairon 2012; Hancke, Vajjala, and Meurers 2012)
Discourse & Encoding of Meaning
• Connectives
• Local and global overlap of linguistic material and co-referential expressions (pronouns, articles, etc.)
• Local transitions of grammatical roles (Barzilay and Lapata 2008; Todirascu et al. 2013)
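As one illustration of a lexical diversity measure, the corrected type-token ratio divides the number of distinct word types by the square root of twice the number of tokens. The sketch below is a minimal version under simplifying assumptions: the regex tokenizer, the lowercasing, and the sample sentence are hypothetical stand-ins, not the tool chain used for the slides.

```python
import math
import re


def corrected_ttr(text: str) -> float:
    """Corrected type-token ratio: types / sqrt(2 * tokens)."""
    # Assumption: tokens are lowercased runs of letters. A real system
    # would use a proper tokenizer and possibly lemmatization.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / math.sqrt(2 * len(tokens))


# Hypothetical learner sentence: 11 tokens, 8 distinct types.
sample = "Ich gehe heute ins Kino und ich gehe morgen ins Theater"
print(round(corrected_ttr(sample), 3))  # 1.706
```

Unlike the plain type-token ratio, this variant dampens the effect of text length, which matters when essays differ widely in size.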
Introduction: Psycho-Linguistic Measures
Language Use
• Word frequencies (dlexDB, SUBTLEX-DE, Google Books 2000)
• Approximate age of active use based on Karlsruhe Children's Texts corpus (Lavalley, Berkling, and Stüker 2015)
Human Language Processing
• Cognitive processing cost based on Dependency Locality Theory (DLT; Gibson 2000; Shain et al. 2016)
• Argument-verb distances, dependency lengths
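Dependency length can be sketched as the linear distance in tokens between a word and its syntactic head. The example below assumes a parse is already available as a list of 1-based head indices with 0 for the root; the function names and the hand-annotated sentence are hypothetical. It measures plain linear distance only and is not an implementation of DLT integration cost.

```python
def dependency_lengths(heads):
    """Linear distance between each token and its head.

    heads[i] is the 1-based position of the head of token i + 1;
    0 marks the root, which has no incoming dependency.
    """
    return [abs(position - head)
            for position, head in enumerate(heads, start=1)
            if head != 0]


def longest_dependency(heads):
    """Longest dependency in the sentence, or 0 if there is none."""
    return max(dependency_lengths(heads), default=0)


# Hand-annotated toy parse of "Der Hund jagt die Katze":
# Der -> Hund, Hund -> jagt, jagt = root, die -> Katze, Katze -> jagt
heads = [2, 3, 0, 5, 3]
print(dependency_lengths(heads))  # [1, 1, 1, 2]
print(longest_dependency(heads))  # 2
```

Summing or averaging such values per sentence gives text-level measures comparable across essays of different length.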
Merlin Data: Overview
• 1,033 German L2 texts in German section of Merlin corpus (Abel et al. 2013)
• Elicited in official standardized language certification tests for 5 CEFR test levels (A1-C1)
• Rated by human experts trained on CEFR-based Merlin rating grid (Wisniewski et al. 2013)
• Holistic proficiency scores range from A1 to C2
Merlin Data: Distribution across Proficiency Scores
Figure: Number of essays per holistic proficiency score (overall CEFR score, A1-C2), grouped by test level (A1-C1).
Study 1: Methods
• Analysis of overall proficiency scores (A1 to B2, plus C1 and C2 merged into C) on non-normalized data
• SVM classification with SMO with linear kernel (K=1)
• Feature ranking with information gain
→ Implementations from WEKA machine learning toolkit (Frank, Hall, and Witten 2016; Hall et al. 2009)
• Training and testing with 10 repetitions of 10-fold cross-validation
• Correlation analysis with Pearson rank correlation for −0.7 ≤ r ≤ 0.7
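The idea behind ranking features by information gain can be sketched in a few lines: a measure is informative if splitting the texts on its value reduces the entropy of the proficiency labels. The sketch below is not WEKA's implementation. It assumes a single binary split at the median, whereas WEKA's own evaluator may discretize numeric attributes differently, and the feature names and toy values are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from splitting a numeric feature at its median."""
    # Assumption: one binary split at the median, chosen for simplicity.
    cut = median(values)
    low = [y for x, y in zip(values, labels) if x <= cut]
    high = [y for x, y in zip(values, labels) if x > cut]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (low, high) if part)
    return entropy(labels) - remainder


def rank_features(features, labels):
    """Feature names sorted from most to least informative."""
    return sorted(features,
                  key=lambda name: information_gain(features[name], labels),
                  reverse=True)


# Hypothetical toy data: four essays, two measures.
labels = ["A2", "A2", "B1", "B1"]
features = {"noise": [3, 9, 4, 8], "tokens": [40, 55, 120, 150]}
print(rank_features(features, labels))  # ['tokens', 'noise']
```

In the toy data, text length separates the two levels perfectly (gain of 1 bit) while the noise measure carries no information (gain of 0), so length is ranked first.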
Study 1: Classification Results
Set            n     Pre.   Rec.   F1
Majority       1     10.3   32.0   15.6
All            366   67.6   68.2   67.4
IG 100         100   67.9   71.0   68.5
Language Use   54    54.6   59.9   56.6
HLP            32    49.4   54.7   51.7
Discourse      84    58.0   63.4   59.8
Clausal        147   60.3   64.3   61.3
Phrasal        28    57.0   62.1   58.8
Lex/Sem        66    66.0   69.1   66.3
Morph          43    56.2   61.5   58.8
Table: Precision, recall, and F1 scores across feature sets for proficiency classification with 10 × 10-CV SMO (K=1).
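The majority baseline row is consistent with precision, recall, and F1 being averaged over classes weighted by class size. The sketch below computes such weighted scores from a confusion matrix and applies them to a classifier that always predicts the largest class. The class sizes are an assumption: they are taken as the row sums of the averaged confusion matrix reported for the IG 100 model, and the function name is hypothetical.

```python
def weighted_scores(matrix):
    """Support-weighted precision, recall, and F1.

    matrix[i][j] is the number of texts with observed class i
    that were predicted as class j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    precision = recall = f1 = 0.0
    for k in range(n_classes):
        support = sum(matrix[k])                          # observed as k
        predicted = sum(matrix[i][k] for i in range(n_classes))
        hits = matrix[k][k]
        p = hits / predicted if predicted else 0.0        # no predictions: 0
        r = hits / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


# Assumed class sizes for A1, A2, B1, B2, C (1,033 texts in total).
sizes = [57, 306, 331, 293, 46]
majority = sizes.index(max(sizes))
# Baseline confusion matrix: every text is predicted as the majority class.
baseline = [[n if j == majority else 0 for j in range(len(sizes))]
            for n in sizes]
print([round(100 * s, 1) for s in weighted_scores(baseline)])
# [10.3, 32.0, 15.6]
```

Under these assumptions the output reproduces the Majority row, which suggests the remaining rows can be read the same way.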
Study 1: Classification Results
Obs. \ Pred.   A1    A2    B1    B2    C
A1             32    24     1     0     0
A2             20   225    61     0     0
B1              1    56   219    54     1
B2              0     4    44   231    14
C               0     0     1     9    36
Table: Averaged confusion matrix for IG 100 model.
Most Informative Measures: Information Gain Ranking
#    Measure
1    Number of tokens
2    Corrected type token ratio
7    Sum of longest dependencies per sentence
14   Dependent clauses with conjunction per dep. clause
15   Cvg. of NP modifier types
16   Dep. clauses per sentence
25   P(not → not) per transition
26   Verbs per sentence
30   VP modifiers per VP
34   Σ NT per word
Table: Top 10 uncorrelated measures.
#    Measure
36   SD of verb cluster size
38   VZ per sentence
39   P(not → object) per transition
40   Total integration cost at finite verb per finite verb
42   Syllables per token
43   HDD
49   Cvg. of verb cluster size
52   Cvg. of verb cluster types
56   P(object → not) per transition
57   Words in VPs per VP
Table: Top 20 uncorrelated measures.
Most Informative Measures: Information Gain Ranking by Domain
The ranked measures span clausal complexity, phrasal complexity, lexical complexity, cohesion, and human language processing.
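One way to arrive at a list of top uncorrelated measures is to walk down the information gain ranking and keep a measure only if its correlation with every measure already kept stays within −0.7 ≤ r ≤ 0.7. The slides do not spell out the exact procedure, so the greedy strategy below is an assumption, and the measure names and values are hypothetical toy data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def uncorrelated_top(ranked, columns, limit, threshold=0.7):
    """Greedily keep ranked measures that do not correlate with kept ones.

    Assumption: a measure is dropped if |r| > threshold with any
    higher-ranked measure that was already kept.
    """
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[other])) <= threshold
               for other in kept):
            kept.append(name)
        if len(kept) == limit:
            break
    return kept


# Hypothetical values for four essays, listed in ranking order.
columns = {
    "tokens": [40, 55, 120, 150],
    "types": [30, 40, 80, 95],
    "syllables_per_token": [1.9, 1.4, 1.8, 1.5],
}
ranked = ["tokens", "types", "syllables_per_token"]
print(uncorrelated_top(ranked, columns, limit=2))
# ['tokens', 'syllables_per_token']
```

A filter of this kind would explain the gaps in the rank column: skipped ranks would correspond to measures that correlate strongly with a higher-ranked one.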