
Development of an ESP E-Learning Tool Using In-House Corpora

Yukie KOYAMA, Tomofumi NAKANO and Chikako MATSUURA
Nagoya Institute of Technology
Gokiso-cho, Showa-ku, Nagoya 466-8555 Japan
{koyama, nakano and chikako}@center.nitech.ac.jp


Abstract. This study introduces a methodology for developing an e-learning tool by using corpora and computer software for linguistic analysis. The corpora compiled for this study consist of journal articles from two engineering fields and of articles from general science magazines. Each corpus, consisting of approximately 500,000 words, is tagged and parsed. Analysis of these corpora reveals that the past participle occurs with high frequency among verb forms. Therefore, sentences including this form were extracted, and items asking for the main verb were made from them. Students answered a cumulative total of 37,333 items. The analysis of the answers shows that students tend to answer incorrectly on items with longer sentences and on items whose main verb is located toward the end of the sentence. An individual analysis of each student was also conducted. This kind of analysis can help language teachers to provide targeted practice for students, using authentic, discipline-specific textual data.

1 Introduction

In the field of English education, English for Specific Purposes (ESP) has been recognized as important for effective learning among learners whose language needs relate to a particular discipline [1][2], and a needs analysis is crucial for deciding on ESP teaching content and methodology [3]. In a previous study, various kinds of needs analyses were conducted to confirm that field-specific material is necessary for English for Science and Technology (EST) learners [4]. As the basis of genre analysis, text analysis of genre-specific texts is required. Fortunately, with the development of information technology and high-spec PCs, compiling and analyzing a good-sized corpus has become possible for the individual researcher.
In this context, a corpus-based approach to ESP is highly recommended by researchers in English education [5][6][7]. Moreover, since all the text data in a corpus is stored in digital form, it is easy to use for e-learning. Developing an e-learning tool with items extracted from a corpus has many advantages, including the use of authentic (if somewhat decontextualized) materials that can be chosen from specific disciplines and genres. In addition, it is much easier to get sufficient feedback from learners if tasks are conducted on a website. This study therefore describes the methodology for developing a corpus-based e-learning tool for EST learners, together with its rationale. The targeted learners are undergraduate students of engineering in Japan.

2 Method

2.1 Compiling corpora

The first digital corpus was the Brown Corpus of American English, whose size is approximately one million words [8]. Today's two representative corpora are far larger than those of the early era: the Bank of English consisted of 450 million words as of January 2002, and the British National Corpus contains approximately 100 million words. Compared to these general corpora, the corpora compiled for this study are much smaller. However, for a specific purpose, a small corpus can also give us sufficient linguistic data [6].

The data was taken from CD-ROMs, online journals and internet homepages. As the first stage of this study, two corpora were compiled. The first is a corpus of research articles in academic journals (J-corpus), separated into two subcorpora for the mechanical and electrical engineering fields. Articles were selected from 11 journals in these fields. The second corpus is made up of articles in a general science and technology magazine (M-corpus). The source for this was articles from Scientific American (1997 to 2001). Information on the corpora is given in Table 1.

Table 1. Size and Kinds of Corpora

Source of corpus       Engineering Journals (J-corpus)                   General Scientific
                       Electrical Engineering   Mechanical Engineering   (M-corpus)
number of sentences    25,295                   23,798                   25,735
number of words        560,014                  567,206                  597,208

2.2 Linguistic analysis of the corpora by using software

In order to analyze the corpora and find their linguistic characteristics, a tagger and a parser were used. A tagger marks each word with its part of speech, such as adjective, verb or noun. Brill’s tagging software was used in this study [9].
A parser conducts syntactic analysis of a sentence, which makes it possible to identify a phrase in a sentence as the subject, the object or the main verb. The parser used here is Apple Pie Parser, which is based on the Penn Treebank [10].
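To illustrate how part-of-speech tags can be turned into verb-form frequencies expressed as a share of all tokens, the following minimal Python sketch counts Penn Treebank verb tags. It assumes the tagger output is available as whitespace-separated word/TAG tokens; the function name verb_form_percentages, the tag glosses and the sample sentence are hypothetical illustrations, not the scripts or data used in this study.

```python
from collections import Counter

# Penn Treebank verb tags with informal glosses (the glosses are this sketch's
# own wording, not the column labels used by the authors).
VERB_TAGS = {
    "VB": "base form (infinitive)",
    "VBD": "simple past",
    "VBG": "gerund / present participle",
    "VBN": "past participle",
    "VBP": "present, non-3rd person singular",
    "VBZ": "present, 3rd person singular",
}


def verb_form_percentages(tagged_text):
    """Return each verb tag's share of all tagged tokens, in percent.

    Assumes input such as "The/DT model/NN agreed/VBD ./." where every
    token is written as word/TAG.
    """
    counts = Counter()
    total = 0
    for token in tagged_text.split():
        _word, sep, tag = token.rpartition("/")
        if not sep:
            continue  # skip tokens that carry no tag
        total += 1
        if tag in VERB_TAGS:
            counts[tag] += 1
    if total == 0:
        return {}
    return {tag: 100.0 * counts[tag] / total for tag in VERB_TAGS}


if __name__ == "__main__":
    # Hypothetical tagged sentence, for illustration only.
    sample = ("The/DT results/NNS obtained/VBN by/IN the/DT model/NN "
              "agreed/VBD with/IN measurements/NNS ./.")
    for tag, pct in verb_form_percentages(sample).items():
        print(f"{tag:4s} {VERB_TAGS[tag]:35s} {pct:5.2f}%")
```

Punctuation is counted as a token here; whether the original percentages include punctuation is not stated, so this is one possible choice rather than the authors' definition.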

The following grammatical characteristics, in terms of part of speech, were revealed by tagging (Table 2): past participle forms appear most frequently in the J-corpus, in both Electrical and Mechanical Engineering articles. In the M-corpus, however, the frequency of the infinitive form is the highest of all. This finding supports the empirical judgment of both engineering and English teachers that students often have difficulty distinguishing between the past participle and the simple past when they read journal articles, especially when the past participle is used as the postmodifier of a noun.

Table 2. Frequency List of Part-of-Speech (percentage of the whole corpus)

Corpus                   Infinitive   Simple   Present       Past         Present (1st/2nd    Present (3rd
                                      past     progressive   participle   person singular)    person singular)
Electrical Engineering   2.39         1.27     2.66          4.37         1.63                3.11
Mechanical Engineering   2.15         0.99     2.69          4.25         1.58                3.23
General Science          3.79         1.85     2.82          2.64         2.26                2.50

2.3 Designing an appropriate e-learning tool

Based on the linguistic characteristics found in the above section, it was decided that the web-learning tool introduced in this study should be made with sentences that include a past participle form as a postmodifier and whose main verb is in the simple past. The task is to choose the main verb of the sentence. The method of item making is as follows.

1. Digitally separating one sentence from another in order to make the next process possible.
2. Analyzing the J-corpus and M-corpus with two kinds of software: syntactic analysis with Apple Pie Parser and part-of-speech tagging with Brill’s Tagger.
3. Based on the above analyses, extracting sentences with past participle forms.
4. In order to make items comparatively difficult, extracting sentences in the past tense from those identified in step 3, and excluding sentences with passive forms and past perfect forms so that items are of the appropriate difficulty level for students.
5. Under the above conditions, extracting sentences for item making and choosing among them randomly by means of a program.

Each student was asked to give answers to at least one hundred items in total. Participants were first- and second-year undergraduate students from several engineering departments, and they were given extra points in an English course if they completed the task.
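Steps 3 to 5 of the item-making method can be sketched roughly as follows. This is a minimal illustration that works on part-of-speech tags alone, assuming word/TAG input with Penn Treebank tags; it does not use Apple Pie Parser output, and the passive and perfect test (a past participle directly governed by a form of "be" or "have", allowing intervening adverbs) is a simplifying heuristic of this sketch, not the authors' actual rule. All function names and sample sentences are hypothetical.

```python
import random

BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}
HAVE_FORMS = {"have", "has", "had", "having"}


def parse_tagged(sentence):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        word, sep, tag = token.rpartition("/")
        if sep:
            pairs.append((word, tag))
    return pairs


def is_candidate(pairs):
    """Heuristic filter for steps 3 and 4.

    Keeps sentences that contain a past participle (VBN) and a simple past
    verb (VBD), and rejects those where a VBN follows a form of be/have,
    which approximates passive and perfect constructions. This is stricter
    than excluding the past perfect only.
    """
    tags = [tag for _word, tag in pairs]
    if "VBN" not in tags or "VBD" not in tags:
        return False
    for i, (_word, tag) in enumerate(pairs):
        if tag != "VBN":
            continue
        j = i - 1
        while j >= 0 and pairs[j][1].startswith("RB"):
            j -= 1  # skip adverbs such as "already"
        if j >= 0 and pairs[j][0].lower() in BE_FORMS | HAVE_FORMS:
            return False
    return True


def sample_items(tagged_sentences, k, seed=None):
    """Step 5: choose up to k candidate sentences at random."""
    candidates = [s for s in tagged_sentences if is_candidate(parse_tagged(s))]
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))


if __name__ == "__main__":
    # Hypothetical tagged sentences, for illustration only.
    corpus = [
        "The/DT values/NNS obtained/VBN from/IN the/DT test/NN agreed/VBD with/IN theory/NN ./.",
        "The/DT sample/NN was/VBD heated/VBN slowly/RB ./.",
        "The/DT authors/NNS had/VBD already/RB reported/VBN this/DT ./.",
    ]
    for item in sample_items(corpus, k=2, seed=0):
        print(item)
```

A tag-only filter of this kind cannot tell which verb is the main verb; in the study that information comes from the parser.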

The following are some examples of the items, with answers in << >>, while the actual CGI page is shown in Figure 1.

Choose the main verb of the sentence. If there are two of them, choose the first one.

1. A team of astronomers led by John K. Webb of the University of New South Wales has found the first hint that the laws of physics were slightly different billions of years ago. << has >>
2. Both the introduced analytical and numerical approaches give program users information about the approximation involved in the integration method. << give >>
3. The gas density is negligible compared to liquid density. << is >>
4. Researchers have exploited this equipment in large-scale field studies, designed to gauge just where and how people are exposed to potentially dangerous chemicals. << have >>

Fig. 1. Task Page
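The answer rule in the task instruction ("If there are two of them, choose the first one") can be made concrete with a small grading sketch. It assumes each item stores the sentence as a list of words together with the positions of its main verb or verbs; the function grade and this item layout are hypothetical and do not describe the actual CGI implementation.

```python
def grade(words, main_verb_indices, chosen_index):
    """Grade one response to a 'choose the main verb' item.

    words: the sentence as a list of tokens shown to the student.
    main_verb_indices: positions of the main verb(s) in the sentence.
    chosen_index: position of the word the student selected.
    When there are several main verbs, only the first one counts as correct.
    """
    target = min(main_verb_indices)
    return {
        "chosen": words[chosen_index],
        "expected": words[target],
        "correct": chosen_index == target,
    }


if __name__ == "__main__":
    # Example item 3 from the text; the word positions are counted from 0.
    words = "The gas density is negligible compared to liquid density .".split()
    print(grade(words, [3], 3))  # student picks "is"
    print(grade(words, [3], 5))  # student picks the past participle "compared"
```

Storing the chosen word alongside the result makes it possible to see later whether wrong answers cluster on past participles.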

3 Results

3.1 General tendencies in item difficulty

The cumulative number of items solved by students was 37,333, the number of students was 218, and the number of different items was 5,868. Figure 2 gives a graphic representation of the analyses in terms of error rate (shown on the vertical axis), length of the sentence (in words, shown on the foreground horizontal axis), and location of the main verb in the sentence (0: beginning, 1: end). The figure shows a peak around a sentence length of 25 words. The graph also shows that the error rate is lower when the main verb is located near the beginning and gradually rises as it moves toward the end of the sentence. This result indicates that the longer the item sentence is and the nearer to the end the main verb is located, the more difficult an item becomes for the students who participated in the study.

Fig. 2. Error Rate, Sentence Length in Words, and Location of Main Verb in Sentence.

3.2 Analysis of answers by an individual student using C4.5

In the previous section the general tendencies of the entire data were analyzed. In this section an individual student is randomly taken as an example to be analyzed by a machine learning tool called C4.5 [11]. Sample results are shown:
