Presenting the new General Service List: Rationale, method, implications Vaclav Brezina, Dana Gablasova
Overview 1. Vocabulary lists: So what? 2. New GSL: compilation procedure 3. Basic English vocabulary: stable and new words 4. Looking ahead
West’s GSL: Why does it matter today? 1) Pedagogical purpose 2) Research purpose
1) Pedagogical purpose Which of these words are useful for learners of English?
1) Pedagogical purpose (cont.) Nation (1990, 2001), Nation & Newton (1997: 239) 98%
2) Research purpose Ijima & Horie (2010); Beglar (2010); Matsuoka & Hirsh (2010); He & Sirinthon (2010); Coxhead, Stevens & Tinkle (2010); Matsumo, Tsutsumi, Matsuo & Gilbert (2010); Chon (2011); Lubliner & Hiebert (2011); Wang, Shen & Masataki (2011); Anderson & Platten (2011); Smith (2011); Millar, Budgell & Kwong (2011); Kokkinakis, Skoldberg & Henriksen (2012); Parent (2012); Fukushima, Watanabe, Kinjo, Yoshihara & Suzuki (2012); Webb & Nation (2012); Cuningham (2012); Coxhead (2012); Budgell (2013)
So why do we need a NEW GSL?
West’s (1953 [1936]) GSL: milkmaid, timely, footman, shilling, stoppage, telegraph, invaluable
So why do we need a NEW GSL? West’s selection based on: 1) Frequency 2) Subjective criteria: ease of learning, necessity, neutrality 3) One corpus (1936) 4) Organising principle: word families
Task 1: types, lemmas, word families dog, dogs, develop, develops, developed, developing, undeveloped, underdeveloped, development, developments, value, values, valuable, invaluable, train, trains, trained, training, trainer, trainers
Words: types, lemmas, word families
Types (20): 1. dog 2. dogs 3. develop 4. develops 5. developed 6. developing 7. undeveloped 8. underdeveloped 9. development 10. developments 11. value 12. values 13. valuable 14. invaluable 15. train 16. trains 17. trained 18. training 19. trainer 20. trainers
Lemmas (13): 1. dog, dogs 2. develop, develops, developed, developing 3. developing (ADJ) 4. undeveloped 5. underdeveloped 6. development, developments 7. value, values 8. valuable 9. invaluable 10. train, trains 11. train, trains, trained, training 12. training (NOUN) 13. trainer, trainers
Word families (4): 1. dog, dogs 2. develop, develops, developed, developing, undeveloped, underdeveloped, development, developments 3. value, values, valuable, invaluable 4. train, trains, trained, training, trainer, trainers
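The three ways of counting the Task 1 words can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the dictionary names are hypothetical, the groupings are copied from the slide, and the noun/verb labels on the two "train" lemmas are an assumption added to tell them apart.

```python
# Lemma -> word forms, following the grouping shown on the slide.
# The "(noun, assumed)" / "(verb, assumed)" labels are NOT in the source.
LEMMAS = {
    "dog": ["dog", "dogs"],
    "develop": ["develop", "develops", "developed", "developing"],
    "developing (ADJ)": ["developing"],
    "undeveloped": ["undeveloped"],
    "underdeveloped": ["underdeveloped"],
    "development": ["development", "developments"],
    "value": ["value", "values"],
    "valuable": ["valuable"],
    "invaluable": ["invaluable"],
    "train (noun, assumed)": ["train", "trains"],
    "train (verb, assumed)": ["train", "trains", "trained", "training"],
    "training (NOUN)": ["training"],
    "trainer": ["trainer", "trainers"],
}

# Word-family headword -> member lemmas.
FAMILIES = {
    "dog": ["dog"],
    "develop": ["develop", "developing (ADJ)", "undeveloped",
                "underdeveloped", "development"],
    "value": ["value", "valuable", "invaluable"],
    "train": ["train (noun, assumed)", "train (verb, assumed)",
              "training (NOUN)", "trainer"],
}

# Types are the distinct word forms, regardless of lemma or family.
types = sorted({form for forms in LEMMAS.values() for form in forms})

# Expand each family into the distinct forms it covers.
family_types = {
    head: sorted({form for lemma in members for form in LEMMAS[lemma]})
    for head, members in FAMILIES.items()
}

print(len(types), len(LEMMAS), len(FAMILIES))  # 20 13 4
```

The same 20 forms shrink to 13 entries when counted as lemmas and to 4 when counted as word families, which is why the organising principle matters when list sizes are compared.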
Our wordlist Quantitative paradigm: 1) Frequency 2) Dispersion (ARF, average reduced frequency, reflects both) 3) Stability across corpora Organising principle: Lemma
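ARF is read here as average reduced frequency, a measure that discounts occurrences clustered in one part of a corpus. The slides give no formula, so the sketch below assumes the commonly used definition: with corpus size N, frequency f and v = N/f, ARF = (1/v) · Σ min(d_i, v), where d_i is the distance from the previous occurrence and the first distance wraps around from the last occurrence. The function name and the toy positions are illustrative only.

```python
def average_reduced_frequency(positions, corpus_size):
    """Return ARF for a word, given the token positions where it occurs.

    Assumed definition: ARF = (1/v) * sum(min(d_i, v)), with v = N / f and
    cyclic distances d_i between consecutive occurrences.
    """
    f = len(positions)
    if f == 0:
        return 0.0
    pos = sorted(positions)
    v = corpus_size / f
    # First distance wraps around the end of the corpus to the last hit.
    distances = [pos[0] + corpus_size - pos[-1]]
    distances += [pos[i] - pos[i - 1] for i in range(1, f)]
    return sum(min(d, v) for d in distances) / v


# Toy corpus of 100 tokens; both words occur 4 times.
evenly_spread = average_reduced_frequency([0, 25, 50, 75], 100)
clustered = average_reduced_frequency([0, 1, 2, 3], 100)
print(evenly_spread, clustered)
```

With this definition, an evenly spread word keeps its full frequency (4.0 here), while a word whose four occurrences sit next to each other is reduced to just over 1.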
Method: RQs • RQ1: Is there a substantial overlap between frequent lexical items in different general language corpora? • RQ2: What is the lexical core common to different language corpora?
Method: Data
Corpora | LOB | BNC | BE06 | EnTenTen12
Tokens | 1.14 million | 112 million | 1.15 million | 12.97 billion
Period | 1961 | 1990s | 2005-7 | 2012
Variety of English | British | British | British | International
Spoken component | NO | YES (10%) | NO | NO
Sample size | 2k words of each text | 40-50k words of each text | 2k words of each text | whole documents included
No. of texts | 500 | 4,049 | 500 | 21.55 million
Sampled text-types | 15 genres of writing | Imaginative (20%) and informative (70%) writing + speech (10%) | 15 genres of writing | www (a wide range of documents)
Method: Procedure 1. Creation of wordlists based on the four corpora (LOB, BNC, BE06, EnTenTen12). 2. Comparison of wordlists pairwise (RQ1). 3. Identification of a common lexical core among the four wordlists and extraction of the shared items (RQ2). 4. Identification of lexical items reflecting recent vocabulary changes in the English language based on BE06 and EnTenTen12.
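Steps 2 to 4 of the procedure amount to set operations on the wordlists. The sketch below shows one way to express them; the five-item lists are invented toy data rather than output from the real corpora, the function name is hypothetical, and treating step 4 as "shared by BE06 and EnTenTen12 but outside the core" is an assumption about how the recent items were picked out.

```python
from itertools import combinations

# Toy wordlists: invented for illustration, NOT taken from the corpora.
wordlists = {
    "LOB": ["time", "people", "man", "letter", "telegraph"],
    "BNC": ["time", "people", "man", "letter", "computer"],
    "BE06": ["time", "people", "man", "computer", "website"],
    "EnTenTen12": ["time", "people", "computer", "website", "online"],
}


def pairwise_overlap(lists):
    """Step 2: shared items and percentage overlap for every pair of lists."""
    result = {}
    for a, b in combinations(lists, 2):
        shared = set(lists[a]) & set(lists[b])
        result[(a, b)] = (len(shared), 100 * len(shared) / len(lists[a]))
    return result


overlaps = pairwise_overlap(wordlists)

# Step 3: the lexical core is whatever all four lists have in common.
core = set.intersection(*(set(words) for words in wordlists.values()))

# Step 4 (assumed reading): items shared by the two most recent corpora
# that did not make it into the core.
recent = (set(wordlists["BE06"]) & set(wordlists["EnTenTen12"])) - core

print(overlaps[("LOB", "BNC")], sorted(core), sorted(recent))
```

The percentage uses the length of the first list as the denominator, which is only safe when all lists have the same length, as the equally sized wordlists in the study do.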
Results
Corpora | LOB-3000 | BNC-3000 | BE06-3000 | EnTenTen12-3000
LOB-3000 | x | 2,497 (83.2%); r_s = .870, p<.001 | 2,458 (81.9%); r_s = .832, p<.001 | 2,352 (78.4%); r_s = .762, p<.001
BNC-3000 | x | x | 2,514 (83.8%); r_s = .870, p<.001 | 2,428 (80.9%); r_s = .819, p<.001
BE06-3000 | x | x | x | 2,518 (83.9%); r_s = .826, p<.001
EnTenTen12-3000 | x | x | x | x
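The percentages in the results matrix follow directly from the shared-item counts if each wordlist holds 3,000 items, as the "-3000" labels suggest. The sketch below recomputes them and adds a small Spearman rank correlation for lists without ties. The slides do not say which ranks r_s was computed over, so the function and its toy input are illustrative assumptions, not the authors' calculation.

```python
LIST_SIZE = 3000  # assumed from the "-3000" wordlist labels

# Shared-item counts as reported in the results matrix.
SHARED = {
    ("LOB", "BNC"): 2497,
    ("LOB", "BE06"): 2458,
    ("LOB", "EnTenTen12"): 2352,
    ("BNC", "BE06"): 2514,
    ("BNC", "EnTenTen12"): 2428,
    ("BE06", "EnTenTen12"): 2518,
}

percentages = {pair: round(100 * n / LIST_SIZE, 1) for pair, n in SHARED.items()}


def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_no_ties(x, y):
    """Spearman's r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical frequency ranks of five shared items in two wordlists.
toy_rs = spearman_no_ties([10, 20, 30, 40, 50], [12, 25, 22, 48, 60])
print(percentages[("LOB", "BNC")], toy_rs)
```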
Results (cont.)
Word class | Overlap
nouns | 1009
verbs (+ modals) | 488+10
adjectives | 317
adverbs | 166
conjunction & prepositions | 63
pronouns | 22
other (gram. words) | 47
TOTAL | 2,122 (71%)
Lexical innovations | Examples
New words (forms) | Internet, website, online, email
New meanings/functions of old words | user, via, network, client, mobile, file, web
Old words with recent prominence | medium, phone, key, technology, guy, kid, environment, computer, movie, definitely
new-GSL: 2,122 lexical core items + 374 lexical innovations items = 2,496 lemmas
Number of items
Wordlist | Types | Lemmas | Word families
new-GSL | 5,115 | 2,496 | -
West’s GSL | 7,826 | 4,114 | 2,000
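The figures on the last two slides can be checked with simple arithmetic. The sketch below sums the word-class counts, relates the total to a 3,000-item wordlist (an assumption about what the 71% refers to), and adds the lexical innovations to reach the size of the new-GSL. Variable names are illustrative.

```python
# Overlap counts per word class, as reported in the results.
WORD_CLASS_OVERLAP = {
    "nouns": 1009,
    "verbs": 488,
    "modals": 10,
    "adjectives": 317,
    "adverbs": 166,
    "conjunction & prepositions": 63,
    "pronouns": 22,
    "other (gram. words)": 47,
}
LEXICAL_INNOVATIONS = 374
LIST_SIZE = 3000  # assumed base for the 71% figure

core_items = sum(WORD_CLASS_OVERLAP.values())
core_share = round(100 * core_items / LIST_SIZE)
new_gsl_lemmas = core_items + LEXICAL_INNOVATIONS

print(core_items, core_share, new_gsl_lemmas)  # 2122 71 2496
```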
Task 2
Word rank: A = first 500; B = 501-1000; C = 1001-2500
NOUNS: cake, cloud, colour, computer, Internet, letter, lover, man, people, prayer, sex, society, time, TV, woman
ADJECTIVES: afraid, bad, blue, good, key, new, old, organic, red, stupid
NOUNS (word, new-GSL rank): 1. TIME 45 2. PEOPLE 79 3. MAN 105 4. WOMAN 198 5. society 543 6. letter 546 7. Internet 701 8. colour 724 9. computer 861 10. sex 1422 11. TV 1484 12. cloud 2385 13. lover 2429 14. prayer 2454 15. cake 2457
ADJECTIVES (word, new-GSL rank): 1. GOOD 73 2. NEW 87 3. OLD 160 4. BAD 304 5. key 660 6. red 712 7. blue 1018 8. afraid 1878 9. organic 2367 10. stupid 2438
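The Task 2 answers can be derived mechanically from the ranks: each new-GSL rank falls into band A, B or C. The sketch below is a minimal illustration; the function name `band` is hypothetical and only a handful of the listed words are included.

```python
def band(rank):
    """Map a new-GSL rank to the Task 2 bands: A 1-500, B 501-1000, C 1001-2500."""
    if 1 <= rank <= 500:
        return "A"
    if 501 <= rank <= 1000:
        return "B"
    if 1001 <= rank <= 2500:
        return "C"
    raise ValueError(f"rank {rank} is outside the 1-2500 range")


# A sample of the ranks listed above.
RANKS = {
    "time": 45, "woman": 198, "society": 543, "computer": 861,
    "sex": 1422, "cake": 2457, "good": 73, "key": 660,
    "blue": 1018, "stupid": 2438,
}

bands = {word: band(rank) for word, rank in RANKS.items()}
print(bands)
```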
Summary • LOB, BNC, BE06, EnTenTen12 • 71% overlap: lexical core + new items • new GSL: 1) frequency 2) dispersion 3) stability across corpora • new GSL same coverage with half of the lemmas
Looking ahead Going back to the two target groups of users: a) Practitioners: creating usable interface b) Researchers: American supplement (methodological questions/decisions to make)
[Diagram. Labels: Text box (user input), Tree tagger, new-GSL, OUTPUT (lexical complexity)]
Thank (1081) you (25)!