


E-Learning Materials Development Based on Abstract Analysis Using Web Tools

Tomofumi NAKANO and Yukie KOYAMA
Nagoya Institute of Technology
Gokiso-cho, Showa-ku, Nagoya 466-8555 Japan
{nakano, koyama}@center.nitech.ac.jp

Abstract. This study is part of a series of E-Learning and English for Specific Purposes (ESP) research that includes an original corpus of engineering journals. In this paper the results of a corpus study are presented, and a sample of the ESP e-learning materials being developed for graduate students in engineering is shown. Abstracts were chosen for the corpus this time because students are likely to read many for their research, and eventually to have to produce their own. We prepared a 40,000-word corpus consisting of 263 abstracts from mechanical and electrical engineering journals. The corpus is analyzed using Wmatrix, which assigns part-of-speech tags and semantic tags and compares the results with those of the BNC written corpus sampler. The distinctive features found in the analysis concern the frequencies of semantic tags and part-of-speech tags, differences in the use of verbal forms, and multi-word expressions. As an application of these features, we are developing web-based materials which include the original abstracts with target items hyperlinked to various pages containing exercises, concordances, grammar explanations, a bilingual dictionary, etc.

1 Introduction

In the field of English teaching, since Swales's epoch-making book Genre Analysis [11], it has been widely accepted that ESP is one of the most efficient approaches in terms of content appropriateness and student motivation. Since we teach at a university of technology, we began with a needs analysis and found that reading, especially reading academic papers, is the most important skill for engineering students. After that, we started to compile an original corpus of engineering journal papers. This corpus is still growing both in its discipline coverage and in its size.

In this study the results of a corpus study and a sample of the ESP e-learning material are shown. This material is being developed for graduate students in engineering, because they are likely to read many articles for their research and will eventually have to produce articles of their own. Needless to say, abstracts play an enormously important role in the academic world, because in many cases readers decide from the abstract whether or not to continue reading the full paper [4]. Another reason is that, in English as a Foreign Language (EFL) situations, researchers often write their abstracts in English while the rest of the paper is written in their first language. This also raises the need for abstract analysis in EFL situations such as in Japan.

While reading is the most important language skill for engineering students in Japan [6], they are hindered in reading academic papers by a lack of vocabulary (usually sub-technical or academic) [1] and by difficulty with the grammar of long, often complex sentences [5]. Therefore, this study focuses not only on word lists but also on parts of speech and semantic areas. The application introduced in this study also makes it possible to adjust the frequency level and the degree of specificity of target words relative to a general corpus. As Morton points out, the problem for a student is not technical vocabulary but the difficult words of more general English [7].

Thanks to developments in ICT, e-learning has become an ideal medium for language learning because of its flexibility and the autonomous learning opportunities it provides inside and outside the classroom. As a new application of the results of the relevant analysis, an e-learning material for engineering graduate students is introduced in the rest of this paper.

2 Corpus Analysis

2.1 The method of analysis

The 40,000-word corpus used in this study consists of 263 abstracts from mechanical and electrical engineering journals. It is drawn from our own 1,120,000-word corpus of full papers from these journals. We use Wmatrix [9] for the abstract analysis, not only because this software is easy to use but also because it can identify the characteristic features of a corpus. Using Wmatrix, the corpus was automatically tagged both with part-of-speech tags by CLAWS7 [3] and with semantic tags by USAS (UCREL Semantic Analysis System [8]).
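To illustrate what a tagged corpus makes possible, the following is a minimal sketch of building a tag frequency table with frequency rates. It is not Wmatrix code: it assumes the tagged text is available as plain `word_TAG` tokens, the function names are hypothetical, and the sample sentence was tagged by hand with CLAWS7-style labels for illustration only.

```python
from collections import Counter


def tag_frequencies(tagged_text):
    """Count the tags in text whose tokens are written as word_TAG."""
    tags = []
    for token in tagged_text.split():
        # rpartition splits on the LAST underscore, so a word that itself
        # contains an underscore still yields the correct tag.
        word, sep, tag = token.rpartition("_")
        if sep and word and tag:
            tags.append(tag)
    return Counter(tags)


def frequency_table(counts):
    """Return (tag, frequency, rate in percent) rows, most frequent first."""
    total = sum(counts.values())
    return [(tag, n, round(100.0 * n / total, 2))
            for tag, n in counts.most_common()]


# ASSUMPTION: hand-tagged sample sentence, not actual Wmatrix output.
sample = ("The_AT flow_NN1 is_VBZ measured_VVN in_II "
          "the_AT combustion_NN1 chamber_NN1")

table = frequency_table(tag_frequencies(sample))
for tag, freq, rate in table:
    print(f"{tag:<5}{freq:>4}{rate:>8.2f}")
```

The rate column here is simply the tag frequency divided by the total number of tagged tokens, expressed as a percentage.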
Moreover, Wmatrix provides frequency tables and log-likelihood tables of words and of these two kinds of tags. Log-likelihood is a measure of how strongly the frequency of an item differs between two corpora [2]. The information given by log-likelihood is therefore very important for ESP material development, since it captures the characteristics of the ESP corpus. The two corpora used in this study are the abstract corpus and the BNC written corpus sampler, which is built into Wmatrix.

In Table 1, the left word list is ordered by frequency and the right word list by log-likelihood. While almost all words in the left list are general, the words in the right list are specific to abstracts or engineering papers.

Table 1. Left: a word list ordered by frequency. Right: a word list ordered by log-likelihood.

rank  word   freq.  |  rank  word          Abst.     Abst.     BNC       BNC        log-
                    |                      freq.  rate (%)   freq.  rate (%)  likelihood
   1  the     2459  |     1  the            2459      8.38   37283      3.79     1158.11
   2  of      1205  |     2  of             1205      4.10   12817      1.30     1068.71
   3  and      945  |     3  flow            119      0.41      10      0.00      772.82
   4  a        683  |     4  model           103      0.35      20      0.00      621.26
   5  in       525  |     5  results          88      0.30      31      0.00      488.40
   6  is       522  |     6  energy           84      0.29      33      0.00      457.50
   7  to       521  |     7  presented        63      0.21       9      0.00      392.35
   8  for      396  |     8  method           59      0.20      16      0.00      340.94
   9  are      262  |     9  fuel             59      0.20      22      0.00      324.30
  10  with     260  |    10  paper            93      0.32     174      0.02      323.55
  11  this     213  |    11  power            71      0.24      67      0.01      315.47
  12  by       190  |    12  using            91      0.31     189      0.02      302.33
  13  that     183  |    13  analysis         50      0.17      10      0.00      300.55
  14  an       172  |    14  combustion       42      0.14       1      0.00      287.94
  15  on       156  |    15  by              190      0.65    1293      0.13      286.05
  16  be       149  |    16  performance      58      0.20      39      0.00      282.24
  17  at       128  |    17  based on         55      0.19      31      0.00      278.82
  18  from     127  |    18  experimental     43      0.15       4      0.00      277.34
  19  as       120  |    19  conditions       61      0.21      61      0.01      266.37
  20  flow     119  |    20  gas              70      0.24     106      0.01      265.30

2.2 Results of the analysis

The lists of part-of-speech tags and semantic tags are shown in Tables 2 and 3 respectively. Both lists show the tag name, its frequency and frequency rate in the abstract corpus, its frequency and frequency rate in the BNC written corpus sampler, and the log-likelihood. Both are sorted by log-likelihood.

Table 2. A part-of-speech tag list

POS    Abst.     Abst.     BNC       BNC        log-
tag    freq.  rate (%)   freq.  rate (%)  likelihood
NN1     7297     24.86  147395     15.22     1447.40
JJ      3481     11.86   74927      7.74      533.93
FO       222      0.76    2050      0.21      233.75
VVN     1205      4.10   24675      2.55      226.88
AT      2483      8.46   67521      6.97       84.13
VBZ      522      1.78   11171      1.15       82.10
IO      1204      4.10   30286      3.13       78.32
NN2     2064      7.03   55665      5.75       75.84
IF       398      1.36    8765      0.91       55.09
VVZ      350      1.19    7602      0.79       51.59
VBR      262      0.89    5435      0.56       46.88
...

Table 3. A semantic tag list

semantic  Abst.     Abst.     BNC       BNC        log-
tag       freq.  rate (%)   freq.  rate (%)  likelihood  meaning
X4.2        444      1.51    3108      0.32      640.06  Mental object: Means, method
O1.3        167      0.57     300      0.03      586.57  Substances and materials generally: Gas
O2          610      2.08    6100      0.63      577.74  Objects generally
O3          204      0.69     651      0.07      537.88  Electricity and electrical equipment
A1.5.1      308      1.05    1965      0.20      485.85  Using
N3.1        130      0.44     413      0.04      343.66  Measurement: General
O1          151      0.51     689      0.07      314.64  Substances and materials generally
M4          161      0.55     843      0.09      301.64  Shipping, swimming etc.
O4.6         78      0.27     110      0.01      301.46  Temperature
X2.4        252      0.86    2176      0.22      288.38  Investigate, examine, test, search
N2          143      0.49     760      0.08      264.68  Mathematics
...

Examining the results shown in the tables, the findings are as follows:

1. Semantic areas such as objects, mental objects (method and means), substances and materials (gas, solid and general), measurement (length and height, distance, size and volume), comparison, and evaluation occur much more frequently than in the BNC written sampler.
2. The parts of speech that appear more often are common nouns, past participles, general adjectives, the definite article, 'of', 'for', 'is', and 'are'.
3. In the use of verbal forms, past participles are significantly frequent, as was also found in the journal corpus [5], while past-tense lexical verbs and infinitive forms occur much less often than in the BNC written sampler.
4. The multi-word expressions that appear more frequently are 'based on', 'due to', 'used to', 'such as', 'carried out', 'as well', 'in order to', 'in terms of', 'in addition' and 'according to'.

3 Material development

Through such analysis of corpus data, features of special importance to our students can be selected. Automatic item generation allows learners to work with different authentic texts each time.
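The selection of important features rests on the log-likelihood comparison described in Section 2. As a minimal sketch, assuming the standard two-corpus log-likelihood statistic (observed frequencies compared with the frequencies expected if the item were spread evenly over both corpora), candidate words can be scored and ranked as below. The function name is hypothetical, and the two corpus sizes are assumptions back-calculated from the frequency and rate of 'the' in Table 1, so the scores should land near the published values without matching them exactly.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood score for a single item.

    freq_a, freq_b: occurrences of the item in corpus A and corpus B.
    size_a, size_b: total token counts of corpus A and corpus B.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the item were equally likely in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2.0 * score


# ASSUMPTION: corpus sizes estimated from the 'the' row of Table 1
# (frequency divided by rate); they are not stated in the text.
ABSTRACT_SIZE = round(2459 / 0.0838)
BNC_SIZE = round(37283 / 0.0379)

# (word, abstract frequency, BNC sampler frequency), taken from Table 1.
candidates = [
    ("the", 2459, 37283),
    ("flow", 119, 10),
    ("combustion", 42, 1),
    ("by", 190, 1293),
]

ranked = sorted(
    ((word, log_likelihood(a, ABSTRACT_SIZE, b, BNC_SIZE))
     for word, a, b in candidates),
    key=lambda pair: pair[1],
    reverse=True,
)
scores = dict(ranked)

for word, score in ranked:
    print(f"{word:<12}{score:10.2f}")
```

A high score alone does not say in which corpus the item is overrepresented, so a material developer would also compare the two frequency rates before selecting an item as a target.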
Materials under development include the original abstracts with target items hyperlinked to various pages containing concordances, grammar explanations, a bilingual dictionary, etc. The outline of the material is as follows:

– An abstract is used as the base of this material, whose objective is to enhance abstract reading comprehension.
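The hyperlinking of target items in an abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the URL pattern, the function name and the sample sentence are hypothetical, and the target list reuses some of the multi-word expressions reported in Section 2.2.

```python
import html
import re

# ASSUMPTION: hypothetical URL pattern for the linked pages.
LINK_TEMPLATE = "concordance.html?item={slug}"

# Some of the multi-word expressions reported in Section 2.2.
TARGET_ITEMS = [
    "based on", "due to", "such as", "carried out",
    "in order to", "in terms of", "according to",
]


def link_target_items(abstract, items=TARGET_ITEMS):
    """Return HTML in which every target item is wrapped in a hyperlink."""
    # Longest items first, so a longer phrase wins over a shorter one
    # that it happens to contain.
    ordered = sorted(items, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(item) for item in ordered) + r")\b",
        re.IGNORECASE,
    )

    def make_link(match):
        text = match.group(0)
        slug = text.lower().replace(" ", "-")
        return f'<a href="{LINK_TEMPLATE.format(slug=slug)}">{text}</a>'

    # Escape the abstract first so that its own characters cannot be
    # mistaken for markup; the target items contain no special characters.
    return pattern.sub(make_link, html.escape(abstract))


# ASSUMPTION: made-up sentence, not taken from the corpus.
sentence = ("A model based on experimental data is presented "
            "in order to predict gas flow.")
linked = link_target_items(sentence)
print(linked)
```

Because the target list is a parameter, the same function could be driven by a list selected at a different frequency or specificity level, so that each learner sees a different set of linked items in the same abstract.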
