Corpora Processing and Computational Scaffolding for a Web-based ...

Liou, Chang, Chen, Lin, Liaw, Gao, Jang, Yeh, Chuang, and You


Corpora Processing and Computational Scaffolding for a Web-based English Learning Environment: The CANDLE Project

HSIEN-CHIN LIOU JASON S. CHANG

National Tsing Hua University, Hsinchu, Taiwan

HAO-JAN CHEN CHIH-CHENG LIN

National Taiwan Normal University, Taipei, Taiwan

MEEI-LING LIAW

National Taichung University, Taichung, Taiwan

ZHAO-MING GAO

National Taiwan University, Taipei, Taiwan

JYH-SHING ROGER JANG YULI YEH

National Tsing Hua University, Hsinchu, Taiwan

THOMAS C. CHUANG

Vanung University, Taoyuan, Taiwan

GEENG-NENG YOU

National Taichung Institute of Technology, Taichung, Taiwan

ABSTRACT

This paper describes the development of an innovative web-based environment for English language learning with advanced data-driven and statistical approaches. The project uses various corpora, including a Chinese-English parallel corpus (Sinorama) and various natural language processing (NLP) tools to construct effective English learning tasks for college learners with adaptive computational scaffolding. It integrates the expertise of a group of researchers in four areas: (a) advances in NLP technologies and applications, (b) construction of a self-access reading environment, (c) exploration of English language learning through written exercises and translations, and (d) use of bilingual corpora for culture-based English learning. In this paper, the conceptualization of the system and its various reference tools (e.g., a bilingual concordancer) for English learning and pilot testing on various modules (e.g., a reading module, Text Grader, and Collocation Practice) are reported.

KEYWORDS

Bilingual Concordancer, Chinese-English Parallel Corpora, Computational Scaffolding, Natural Language Processing

CALICO Journal, 24 (1), pp. 77-95.

© 2006 CALICO Journal


CALICO Journal, Vol. 24, No. 1

INTRODUCTION

Professionals in the field of computer-assisted language learning (CALL) have been working on harnessing speech and natural language processing technologies and Internet resources to enhance traditional language learning. They have also explored new pedagogies made possible by computers and the Internet. Among various online resources, computerized corpora are widely viewed as able to facilitate inductive, data-driven language learning in ways that have been difficult or impossible in the past (Kennedy & Miceli, 2001; Krishnamurthy, 2001; Leech, 1997; Stevens, 1991).

In a 3-year project entitled Corpus and NLP for Digital Learning of English (CANDLE), several researchers from computer science, specifically in natural language processing (NLP), and from CALL have been working together with the aim of using cutting-edge corpora processing and NLP tools to advance English learning for students in Taiwan, ROC. The project is unique domestically and internationally because it uses a Chinese-English parallel corpus and builds on learners’ first language background knowledge to empower learners with culture-based materials. Both the first language and its culture provide a scaffold for learners to use while learning the new language. This paper discusses the development of the CANDLE project and four subprojects. Although most of CANDLE’s components have been used with learners in English classrooms, space limitations permit the description of only two pilot studies undertaken as formative evaluation of its modules.
DESCRIPTION OF THE CANDLE PROJECT

Theorists and advocates of communicative language teaching, which is very prevalent in North America and gaining popularity in Asian countries, suggest that learners should be guided first toward understanding and responding meaningfully to language and subsequently toward noticing and describing the grammatical structures whose meaning they have understood without necessarily ‘recognizing’ them as structures (e.g., Ellis, 1998). Learners are not supposed to immediately learn grammar rules by merely focusing their attention on them; instead, they gradually assimilate grammar rules over time by continuing to notice them in the language that they come across. Recent foreign language teaching methodology maintains that it is the learners, not the teacher, who formulate grammatical rules, using the evidence of the examples in the language (Johns, 1988). The main idea of the approach is that the process of working out the rules helps the learners process the rules more deeply and therefore encourages effective learning.

Along these lines, and with the advancement of Internet technologies, text corpora have come to play a much more important role than before. The use of corpora and their tools helps to maintain a balance between communicative language use and awareness of grammar structures. Corpus linguists study real texts, using explicit algorithms to extract linguistic knowledge from corpora (e.g., Hunston, 2002; Wichmann, Fligelstone, McEnery, & Knowles, 1997). An important function of corpora in the language classroom is to provide learners with concentrated exposure to specific, repeated patterns of language structures (e.g., Braun, 2005). With the use of corpus tools, language learners can avoid unhelpful reliance on oversimplified ‘rules’ prepackaged by the teacher and, instead, develop proficiency through focused, purposeful exposure to, and use of, language in specific contexts (Teubert, 1996).

First used by Wood, Bruner, and Ross (1976), the term ‘scaffolding’ was applied to parent-child talk in which the more capable adult supports the less capable child’s learning efforts. Scaffolding provides the kind of support “that is responsive to the particular demands made on children learning through the medium of a second language—that is critical for success” (Gibbons, 2002, p. 11). Likewise, an adaptive CALL system needs to match the target language learning units to the learner’s current level and make adjustments as the learner moves forward in the system. During this process, an intelligent tutoring system can provide computational scaffolding (Chan, Hue, Chou, & Tzeng, 2001) in many forms, such as different levels of explanation, different kinds of feedback for various error types, or different levels of exercise sets designed for learners based on their ongoing performance as recorded in the software tracker’s learning history (e.g., Hwu, 2003). There may be other innovative designs, specifically for language learning, that make full use of NLP and that human teachers cannot easily provide; these designs warrant further exploration.

CANDLE makes use of both corpora and computational scaffolding for English learning. Computational scaffolding in an adaptive system can be systematic and precise in assisting English language learners to achieve higher levels of proficiency when they engage in CALL tasks. The goals of the CANDLE project are twofold: (a) construction of a web-based English learning environment that is practical for intermediate learners (e.g., college freshman students and senior high students) in Taiwan and (b) evaluation of the use of the CANDLE system in classrooms by investigating learners’ use of strategies.
Both goals are achieved via four subprojects: the first subproject consists of developing advanced NLP tools and is followed by three subprojects on online English language-learning activities which focus on aspects of reading, writing, and culture. The first subproject provides the infrastructure of the web-based system (see http://candle.cs.nthu.edu.tw, http://candle.fl.nthu.edu.tw), that is, basic and advanced NLP tools necessary for the other three subprojects, and tracking mechanisms to monitor online learner progress. Subproject two focuses on the construction and assessment of a self-access reading environment that is adaptive to learners’ levels of English proficiency. Subproject three develops the potential of using writing or translation activities to help students learn English. Subproject four uses a bilingual corpus to enhance English culture learning, an area that has not yet been fully explored. All of these subprojects have implications for innovative digital learning, NLP (computer engineering), and English teaching and learning. The first subproject also produces required digital and content-related advanced technologies for the other three subprojects in order to conduct research on e-learning strategies and behaviors essential to prove the usefulness of such advanced English e-learning. In the first two years, we have developed the reference tools described below, along with a computer-assisted management system, and completed several pilot tests of various modules.



NLP and Assessment Tools

Although quite a few monolingual English concordancers have been developed and tested for pedagogical use, fewer bilingual concordancers are available (Wang, 2001, is a notable exception for Chinese and English). In the CANDLE project, three electronic reference tools for instruction were developed in the first year: “TOTALrecall,” “Tango,” and “Collocation Checker.” TOTALrecall (Wu, Yeh, Chuang, Shei, & Chang, 2003), a Chinese-English bilingual concordancer, makes use of a digitized Chinese-English parallel corpus taken from Sinorama magazine. Sinorama is an official monthly publication that has been issued by the Government Information Office of the Republic of China for over three decades. Among the several official magazines, Sinorama remains the most popular because it includes insightful reports on the lifestyles, society, economy, and cultures of the people in Taiwan. The topics of the articles in Sinorama include the following: art achievement, literature, painting and calligraphy, film, dance, music, architecture, museums, drama, traditional opera, handicrafts, clothing and accessories, and stories told in Chinese paper cuts. The digital form of Sinorama represents a 40-million-word encyclopedic and bilingual database containing facts about Taiwan from 1975 to 2002. Other types of corpora are also being processed or acquired, including Studio Classroom (a local English magazine with a lower level of difficulty than that of Sinorama), Official Records of Hong Kong Legislative Council, and daily news articles from Voice of America.

The programming of the TOTALrecall tool involves bilingual sentence, word, and phrase alignment and meets learners’ needs by supporting sophisticated queries and displaying ranked outputs. The tool can display

1. collocation information in a concordance;
2. sorting of citations according to the “relevance” of a query based on within-document frequency of terms;
3. four levels of context for citation: subsentential, sentential, paragraph, and text; and
4. accurate word and phrase level alignment of highlighted expressions.

A bilingual concordancer allows searches of various kinds for a key word or phrase and displays a large number of different contexts in two languages. Compared with a dictionary entry, a concordancer can show many more examples to help learners acquire the usage of the word or phrase. The output of the concordancer permits careful comparison and contrast between the two languages and facilitates thorough understanding and further learning of the word or phrase. The TOTALrecall concordancer supports single-word and multiple-word queries, exact string queries, queries in English and/or Chinese, and conjunctive and disjunctive queries. The system ranks the results of queries and puts the most useful information in the first one or two displays. It also has two modes for displaying citations. In the monolingual mode, citations are listed according to sentence length. In the bilingual mode, the translation counterpart of the query is highlighted, and citations with the same translation counterpart are shown in clusters (see Figure 1). Citations with frequent translation counterparts appear before ones with less frequent translation counterparts. All the citations in a cluster can be called out on demand.

Figure 1 Bilingual Citations for the English Word “hard” Clustered by Translation Equivalents
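The bilingual display mode described above, in which citations are clustered by translation counterpart and frequent counterparts come first, can be illustrated with a minimal Python sketch. It is not the TOTALrecall implementation: the function name `cluster_citations`, the triple format, and the sample sentences are hypothetical, and the word alignment that identifies each counterpart is assumed to have been done already.

```python
from collections import defaultdict

def cluster_citations(citations):
    """Group (english, chinese, counterpart) triples by translation counterpart.

    Returns (counterpart, members) pairs with the largest cluster first.
    """
    clusters = defaultdict(list)
    for english, chinese, counterpart in citations:
        clusters[counterpart].append((english, chinese))
    # Frequent counterparts first; ties are broken by the counterpart string
    # so that the order is stable across runs.
    return sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))

# Hypothetical aligned citations for the query "hard" (illustration only).
sample = [
    ("He works hard.", "他努力工作。", "努力"),
    ("It is a hard problem.", "這是一個困難的問題。", "困難"),
    ("She studies hard.", "她努力讀書。", "努力"),
]
clustered = cluster_citations(sample)
for counterpart, members in clustered:
    print(counterpart, len(members))
```

Showing only the first member of each cluster and expanding the rest on request would correspond to calling out the citations in a cluster on demand.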

We have also developed a collocation aid, Tango, which allows learners to access English collocations (of certain types such as verb-noun or adjective-noun) and bilingual examples of these collocations from Sinorama. We are in the process of expanding the types of collocations and adding other text corpora. Using Tango, learners can discover idioms, phrasal verbs, compounds, fixed phrases, and grammatical patterns fully supported with evidence from authentic texts. Tango has an easy-to-use interface allowing learners to gain access to real data quickly. Learners simply type in a word and choose one of the several collocation types to display a list of relevant collocations and bilingual examples. Figure 2 shows representative examples for a list of verb collocates of the noun “influence” along with the frequency of each collocation in the corpus. The log-likelihood ratio statistic is used to exclude free combinations that are not appropriate collocations and to pinpoint rare and highly specialized collocations.

Attempting to catch all of the errors learners make in an essay, as the grammar checking tool in Word does, leads to either false alarms or misses of important errors for a specific language group (e.g., Chinese learners of English). In the CANDLE project, we focus on common and dominant lexical error types in verb-noun collocations (e.g., ‘eat medicine’ where ‘take medicine’ is intended, a frequent type of mistake made by Chinese learners; Wible, Kuo, Chien, Liu, & Tsao, 2001). For this purpose, we developed an automatic Collocation Checker. The Collocation Checker utilizes NLP techniques to chunk sentences in order to extract V-N collocations in input texts and to derive a list of candidate English verbs that share the same Chinese translations via the processing of bilingual corpora. After combining nouns with those candidate verbs as V-N pairs, the system makes use of an English corpus to exclude the inappropriate V-N pairs in order to identify the proper collocations. The Collocation Checker can pinpoint the miscollocation and prompt learners with suggestions of correct collocations that they should use based on mutual translations between Chinese and English (see Figure 3). It is hoped that the Collocation Checker will facilitate students’ acquisition of appropriate collocations and help them to transfer that knowledge to their writing in the future.

Figure 2 Collocations for the Noun “influence”
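The paper does not give the formula Tango uses for the log-likelihood ratio. As an illustration only, the sketch below assumes the widely used G-squared statistic computed over a 2x2 contingency table of co-occurrence counts; the function name, its parameters, and the counts are hypothetical.

```python
import math

def log_likelihood_ratio(pair_count, word1_count, word2_count, total):
    """G-squared association score for a word pair from a 2x2 contingency table.

    pair_count: times word1 and word2 co-occur in the chosen relation
    word1_count, word2_count: total occurrences of each word in that relation
    total: total number of pairs observed in the corpus
    """
    k11 = pair_count                                       # word1 with word2
    k12 = word1_count - pair_count                         # word1 without word2
    k21 = word2_count - pair_count                         # word2 without word1
    k22 = total - word1_count - word2_count + pair_count   # neither word
    observed = [k11, k12, k21, k22]
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    expected = [rows[i] * cols[j] / total for i in (0, 1) for j in (0, 1)]
    # Cells with a zero observed count contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# Hypothetical counts: a pair that co-occurs exactly as often as chance predicts
# scores 0, while a strongly associated pair scores high.
independent = log_likelihood_ratio(10, 100, 100, 1000)
associated = log_likelihood_ratio(50, 60, 70, 10000)
print(round(independent, 3), round(associated, 1))
```

A threshold on this score would then separate free combinations from candidate collocations; the threshold value is a design choice that the paper does not report.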

Figure 3 Sample Results of the Collocation Checker
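The Collocation Checker steps described above (collect candidate verbs that share a Chinese translation with the learner's verb, pair them with the noun, and keep only pairs attested in an English corpus) can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's code: the dictionaries `en_to_zh` and `zh_to_en` stand in for translation links derived from a bilingual corpus, `attested_pairs` stands in for V-N pairs extracted from an English corpus, and the chunking step that extracts the V-N pair from the learner's sentence is omitted.

```python
def suggest_collocations(verb, noun, en_to_zh, zh_to_en, attested_pairs):
    """Suggest verbs for `noun` that share a Chinese translation with `verb`.

    Returns an empty list when the learner's own pair is already attested.
    """
    if (verb, noun) in attested_pairs:
        return []
    candidates = set()
    for zh_word in en_to_zh.get(verb, ()):
        # Every English verb linked to the same Chinese word is a candidate.
        candidates.update(zh_to_en.get(zh_word, ()))
    candidates.discard(verb)
    # Keep only candidate pairs that the English corpus actually contains.
    return sorted(v for v in candidates if (v, noun) in attested_pairs)

# Hypothetical translation links and corpus evidence (illustration only).
en_to_zh = {"eat": {"吃"}, "take": {"吃", "拿"}}
zh_to_en = {"吃": {"eat", "take", "have"}, "拿": {"take", "hold"}}
attested_pairs = {("take", "medicine"), ("eat", "food"), ("have", "lunch")}

suggestions = suggest_collocations("eat", "medicine", en_to_zh, zh_to_en, attested_pairs)
print(suggestions)
```

In this toy data the miscollocation ‘eat medicine’ is flagged because it is unattested, and ‘take’ is suggested because it shares a Chinese translation with ‘eat’ and forms an attested pair with ‘medicine.’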

The Reading Module

Learning words while reading is common for first language learners and is a means for life-long learning of foreign languages. For this reason, intermediate foreign-language learners are often encouraged to acquire vocabulary while reading as a way to promote their language proficiency. In the CANDLE project, the reading component is designed as a partially self-access module with various types of computational scaffolding tools, such as dictionaries, MP3 files, and word highlighting, to help learners. The module can be used by instructors in traditional curricula to complement other instructional reading activities.

After learners log into the reading module, they can take a vocabulary-level test in order to ascertain their level of vocabulary knowledge and select the suitable level of texts they should read. If they decide to progress further, the system (or the class instructor) can provide additional reading texts for them to practice. The difficulty level of the reading texts is checked with a computer vocabulary analyzer that tells learners what portions of the text fall into each of the five vocabulary bands of the Collins COBUILD dictionary. After comparing learners’ level of vocabulary knowledge to that of the reading text, the system highlights the words in the text that are beyond their level of vocabulary knowledge. During the reading process, learners can request help from various online dictionaries and an audio version of the reading (for texts from the Voice of America) in MP3 format at any time. If learners want to learn more about word usage, they have easy access to various online collocation tools such as Tango for help. Their reading comprehension is further checked with multiple-choice comprehension questions developed by the CANDLE team. In addition, online exercises for important vocabulary items are automatically generated from the corpus data.

Feedback on the instructional effectiveness of the reading module was obtained through a questionnaire from 150 students plus some instructors after they used the module. Their responses are summarized in Table 1.

Table 1 Results of Questionnaire Responses to the Reading Module

No.   Responses   Statement
1     79%         The web site is easy to use and browse.
2     51%         The number of reading assignments is proper and the vocabulary items are not too difficult.
3     50%         The topics and content of the articles are interesting.
4     83%         The various tools provided by this web site are very useful.
5     84%         The various online dictionaries provided by the web site are very useful.
6     72%         The vocabulary analyzer which can highlight difficult words is very useful.
7     65%         The Voice of America MP3 audio files are very useful.
8     65%         The postreading comprehension questions are very useful.
9     78%         The postreading vocabulary exercises are very useful.
10    81%         The system can keep track of each word you consulted in the reading process, which is very useful.
11    73%         This site can keep track of your reading processes and record your test scores, which is very useful.
12    66%         This web site can help one improve English reading abilities.
13    65%         This site can help one improve one’s command of English vocabulary.
14    69%         If one continues to use this web site for a longer period of time, this site will help one improve English abilities.
15    77%         In the future, if possible, I will continue to use this web site.

Note: The responses are based on an original 5-point Likert scale.
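The vocabulary analyzer described in this section reports what share of a text falls into each vocabulary band and highlights words beyond the learner's level. A minimal sketch of both steps is given below. The band lookup `word_band`, the convention that band 1 is the most frequent band, the treatment of unlisted words as off-list, and the asterisk marking are all assumptions made for illustration; the real tool relies on the Collins COBUILD bands and on its own display.

```python
import re

WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def band_profile(text, word_band, bands=5):
    """Return the share of running words in each band, plus an off-list share.

    Band values in `word_band` are expected to lie between 1 and `bands`.
    """
    words = [w.lower() for w in WORD.findall(text)]
    counts = {band: 0 for band in range(1, bands + 1)}
    counts["off-list"] = 0
    for word in words:
        counts[word_band.get(word, "off-list")] += 1
    total = len(words) or 1  # avoid division by zero for an empty text
    return {band: count / total for band, count in counts.items()}

def highlight_beyond_level(text, word_band, learner_band, mark="*"):
    """Wrap words above the learner's band (or off-list) in `mark`."""
    def wrap(match):
        word = match.group(0)
        band = word_band.get(word.lower())
        if band is None or band > learner_band:
            return f"{mark}{word}{mark}"
        return word
    return WORD.sub(wrap, text)

# Hypothetical band assignments (illustration only).
word_band = {"the": 1, "cat": 1, "sat": 2, "pensively": 5}
text = "The cat sat pensively."
profile = band_profile(text, word_band)
marked = highlight_beyond_level(text, word_band, learner_band=2)
print(profile)
print(marked)
```

The learner's band would come from the vocabulary-level test taken at login, so the same text can be highlighted differently for different learners.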



Students’ responses form the basis for making adjustments in the module. Students found the module very useful for improving their reading abilities. In particular, they liked the click-and-show dictionaries because these helped them read without having to spend extra time consulting word meanings in a paper dictionary. Some also found the comprehension questions useful since they could check whether they understood the texts. In addition, the pronunciation in the audio file of the text was extremely clear. However, students also indicated that improvements could be made in text selection, difficulty level of the vocabulary quizzes, help with sentence structures, and varieties of postreading exercises. We will try to revise these features in the near future. The design of this reading environment illustrates the value of computational scaffolding within the spirit of learner control. A separate module, described below, shows a case of program control based on pedagogical principles.

To more rigorously control text difficulty level for a particular group of learners and to conduct an experimental comparison on word exposure effects, we have designed an extensive self-access reading module, named Text Grader, for students’ winter-break homework. The texts were prepared beforehand using corpora processing tools based on foreign language vocabulary and reading research findings. Scholars claim that texts reaching 95% familiar word coverage for specific groups of English learners may be a requirement for incidental learning (e.g., Laufer, 1989); too many unfamiliar words would impede reading comprehension, let alone vocabulary acquisition. Yet, without the use of computer tools, preparing appropriate texts is a challenge for both researchers and classroom teachers. Word-list research and quantitative corpus analyses using word frequency computer programs enabled us to choose materials appropriate to learners’ levels of vocabulary knowledge in Text Grader (Ghadirian, 2003). We filtered texts with four word lists (the General Service Word List, a local Senior High Students’ Word List, the University Word List, and an Exposed Word List) and selected 16 articles out of the original 5,008 articles in the Sinorama corpus (Liou & Huang, 2004). Easier articles were sequenced to be read first, with an additional control on the number of exposures to each target word. Familiar words and unfamiliar target words (words for the participants to learn while reading) were identified. Pretest-posttest measures were used to investigate the amount of exposure adequate for words to be acquired incidentally for receptive or productive use (e.g., understanding a word or using it in a sentence).

With the carefully selected reading materials, we then designed an online reading curriculum for 38 first-year college students to read at home for a period of 12 weeks. A pretest (see Table 2), a posttest, a background questionnaire, and an evaluation questionnaire were used as research instruments. Results showed that the participants made significant vocabulary gains (t = 8.849, p < .01) (see Table 3) with considerable learner satisfaction (see Table 4). However, the precise number of exposures needed in such a reading context for receptive or productive vocabulary acquisition remains open to future research.
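The 95% familiar-word coverage criterion and the easy-to-difficult sequencing used in Text Grader can be illustrated with a short sketch. It makes simplifying assumptions that are not in the paper: tokens are plain lowercase letter strings with no lemmatization, the four word lists are merged into a single set `familiar_words`, and texts are ordered by coverage alone. The function names and sample texts are hypothetical.

```python
import re

def familiar_coverage(text, familiar_words):
    """Share of running words in `text` that appear in `familiar_words`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in familiar_words)
    return known / len(tokens)

def select_texts(texts, familiar_words, threshold=0.95):
    """Return titles meeting the coverage threshold, easiest text first."""
    scored = [(familiar_coverage(body, familiar_words), title)
              for title, body in texts.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [title for coverage, title in ranked if coverage >= threshold]

# Hypothetical word list and candidate texts (illustration only).
familiar_words = {"the", "cat", "sat", "on", "mat", "a", "dog"}
texts = {
    "easy": "The cat sat on the mat",
    "hard": "The ocelot sat on a mat",
}
selected = select_texts(texts, familiar_words)
print(selected)
```

Here the second text falls below the threshold because one of its six running words is unfamiliar (coverage of about 83%), so only the first text is selected.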

Table 2 Self-report Categories in a Vocabulary Knowledge Elicitation Scale

Section   Statement
I.        I don’t remember having seen this word before.
II.       I have seen this word before, but I don’t know what it means.
III.      I have seen this word before, and I think it means ______. (synonym or translation)
IV.       I know this word. It means ______. (synonym or translation)
V.        I can use this word in a sentence: ______. (If you do this section, please also do Section IV.)

Table 3 Learners’ Overall Vocabulary Gains by Paired t Test (N = 38)

           Mean %   SD      DF   t       p
Pretest    39.00    17.13   37   8.849   .000
Posttest   49.50    15.41

Table 4 Sample Responses from the Evaluation Questionnaire

Will I use a similar online extensive reading syllabus in the future?
   Yes, I will (82%)
   No, I will not (18%)

The reasons why I will not:
   Not used to reading long articles (8%)
   Not used to online reading (51%)
   Dislike the fixed set schedules organized by the online extensive reading syllabus (8%)
   Others (33%) (Students already have satisfying reading materials)

The reasons why I will:
   Able to improve reading skills (41%)
   Able to enhance vocabulary (32%)
   The articles are interesting (10%)
   There are various tools fostering vocabulary learning (15%)
   Others (2%)

The innovation in Text Grader lies in its text selection and sequencing to ensure learners can progress from easy to difficult texts and acquire new words efficiently and effectively. Again, such systematic and precise help from corpus tools and processing illustrates the instructional value of computational scaffolding.



The Writing Component

The major building block in the writing component is the TOTALrecall online Chinese-English bilingual concordancer developed by the NLP team. To apply our concordancer to language-learning tasks, a writing course and a supplementary Collocation Practice module with verb-noun collocations (collocations here meaning ‘appropriate phrase combinations’ in English; e.g., Nattinger & DeCarrico, 1992) were designed and tested with first-year college students. The writing-with-TOTALrecall course was designed for students majoring in English to help them write and revise four essays with the help of TOTALrecall throughout the semester. Students were asked to refer to TOTALrecall for word usage while writing their first drafts and later for error correction as they revised their drafts. On average, researchers marked two to three errors involving misused verbs, nouns, or adjectives on each draft. Over 50% of the errors in each draft were corrected with the help of TOTALrecall. Furthermore, students indicated that TOTALrecall helped them choose appropriate words for writing and provided opportunities for them to understand more about the meaning and usage of target words through the bilingual corpus. Yet, occasional failure to provide exact words or expressions was a notable disadvantage.

Collocation, a hallmark of near-native fluency in learners’ writing, has been acknowledged as a crucial aspect of vocabulary learning, but the area has long been neglected in foreign language teaching (Nation, 2001). A review of previous work reported in the literature reveals that English learners are seriously deficient in collocations, that good learner-writers use collocations more appropriately and more frequently than poor learner-writers, and that learners’ first language heavily influences their production of correct collocations (e.g., Nesselhauf, 2003). Among different types of collocation, the verb-noun (V-N) type is particularly difficult for Taiwanese learners to master (Liu, 2002).
The writing component described here incorporates the Collocation Checker and an online “Collocation Practice” module. The Collocation Practice module has six units based on the analyses of common miscollocations by Taiwanese learners (Chan & Liou, 2005). TOTALrecall was included in the module to encourage inductive learning. Practice item types include multiple-choice, fill-in-the-blank, and translation questions. Thirty-two first-year college students were recruited to participate in the empirical evaluation of the writing component with pretest, posttest, and delayed posttest measures of 40 purposefully sampled blank-filling items in a sentence context such as the following:

He ______ a great success and became the leading landscape architect of the day. [answer: achieved]

Additionally, a background questionnaire and an evaluation questionnaire were used to elicit participants’ data and perceptions about the effectiveness of the practice module. The results showed the effectiveness of the practice module with significant learner improvement from the pretest to the posttest (t = 14.880, p < .001) (see Table 5).



Table 5 Comparison of Pretest and Posttest Scores for Collocation Practice (N = 32)

           Mean %   SD     DF   t         p
Pretest    10.59    3.26   31   -14.880   .000
Posttest   19.53    3.95

The Culture Component

Culture is an essential part of foreign language learning. It is also believed that the learners’ first language culture can serve as their cognitive scaffold when they learn a foreign language. The cultural component in the CANDLE project uses Sinorama as the basis to sensitize university students to the ways in which language choices are made in culture-related text. Articles from Sinorama are used as source texts to engage learners in various pedagogical activities that aim to raise their intercultural awareness for acquiring the knowledge of English language in general and of both the local and the English cultures in particular. To achieve these objectives, CANDLE’s culture component incorporates the TOTALrecall bilingual concordancer and online dictionaries as basic elements of some of the college English courses focusing on reading, writing, and vocabulary. The courses offer students opportunities for learning how to express familiar cultural concepts in English and for comparing and contrasting local cultures with those of English-speaking countries. First, students read articles from Sinorama and then complete writing assignments. Unknown words chosen by learners while reading and writing are kept in the online tracker after learners send the words to a notebook. Some target cultural words, such as ‘jade bangles’ (often worn by Chinese women), were chosen by the developers to test learners’ vocabulary level and cultural awareness. Formative assessment of the culture component is being undertaken by collecting the students’ reading comprehension test scores, vocabulary quiz scores, and the quality of their writing. In addition, the students’ vocabulary notebooks will be analyzed and put into categories of (a) dictionary query for reading comprehension, (b) dictionary query for writing assistance, (c) concordance query for reading comprehension, or (d) concordance query for writing assistance.
This classification will help us to understand students’ preference in the use of different e-reference tools for learning different types of language skills. The vocabulary words will also be examined for the reasons students made the queries by categorizing them into cultural and noncultural reasons. The students’ viewpoints expressed in the written responses to Sinorama readings will be analyzed for cultural awareness. The focus of the analysis is to compare the social customs in the Chinese and Western cultures.

Computer-assisted Management System

Within the CANDLE environment, a web-based tracker has been designed to record students’ online learning history. A web-based instruction management system is considered crucial for a successful online language-learning system since the large amount of tracker data can be used for various purposes (e.g., Hwu, 2003). To motivate learners’ participation and enhance self-regulation, learner performance is tracked and kept in profiles for subsequent use by the system, learners, or classroom teachers. Instructors can determine who has done what and with what degree of success, and students can also keep track of their own progress. The online records help teachers to monitor students closely and help learners to self-regulate their own progress. If learners can examine their own learning profiles, they will have some sense of achievement and self-understanding of their own learning progress. Reluctant readers in a class might be pushed to read more if they can view other, more diligent students’ participation records.

The tracking system records the title of the article students have read, the source of the article, its length, its difficulty level, the vocabulary items learned, the reading tools the students used, the Chinese and English words they consulted using TOTALrecall, the messages they posted on the discussion board, the chat logs, and the vocabulary test and the collocation exercise scores (see Tables 6 and 7 below). In the Text Grader module, the management system also sets dates to regulate students’ homework progress by reminding them how many articles they should have completed by a specific date.

Because CANDLE is a government-supported project, the public has free access to it. Individuals or groups of learners led by course instructors are allowed to access its online materials. From October 2003 to January 2004, TOTALrecall logged 14,798 transactions with information about English queries, Chinese queries, mono/bilingual mode, IP address, date and time, user ID, page number viewed, and so forth. Log data were arranged for groups of learners or individuals who signed in to the web site.
The query log is potentially very useful in revealing the areas of difficulty and the learning strategies of a particular group of learners. Preliminary analysis of the data showed that

1. there were slightly more English queries (7,678) than Chinese queries (5,715) and 417 transactions with both English and Mandarin queries,
2. there were 4,945 transactions in which users requested to operate in the bilingual mode with translation equivalents clustered and highlighted and 8,659 transactions in the monolingual mode, and
3. the queries tend to be very short, like typical queries for Internet search engines, on average 1.11 words and 7.12 characters in English queries and 2.75 characters in Chinese queries.
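The kind of summary listed above (query language, display mode, and average query length) could be computed from a transaction log along the following lines. The record layout is an assumption made for illustration: the `Transaction` fields are hypothetical names loosely based on the logged information the paper mentions, and the sample log is invented.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Transaction:
    english_query: str    # empty string when no English query was entered
    chinese_query: str    # empty string when no Chinese query was entered
    bilingual_mode: bool  # True for bilingual display, False for monolingual
    user_id: str

def summarize(log):
    """Count query languages and display modes and average the query lengths."""
    english = [t.english_query for t in log if t.english_query]
    chinese = [t.chinese_query for t in log if t.chinese_query]
    return {
        "english_only": sum(1 for t in log if t.english_query and not t.chinese_query),
        "chinese_only": sum(1 for t in log if t.chinese_query and not t.english_query),
        "both": sum(1 for t in log if t.english_query and t.chinese_query),
        "bilingual_mode": sum(1 for t in log if t.bilingual_mode),
        "monolingual_mode": sum(1 for t in log if not t.bilingual_mode),
        "avg_english_words": mean(len(q.split()) for q in english) if english else 0.0,
        "avg_english_chars": mean(len(q) for q in english) if english else 0.0,
        "avg_chinese_chars": mean(len(q) for q in chinese) if chinese else 0.0,
    }

# Hypothetical log entries (illustration only).
log = [
    Transaction("hard", "", False, "u1"),
    Transaction("take medicine", "", True, "u1"),
    Transaction("", "影響", True, "u2"),
    Transaction("influence", "影響", True, "u2"),
]
summary = summarize(log)
print(summary)
```

Grouping the same records by `user_id` before summarizing would give the per-student view that instructors see in the management system.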

The management system also includes a facility for the instructor to examine the query log of a group of students and detailed records of each student. Instructors are given access to a list of the names of all the students, and, by clicking on the hyperlink associated with a name, they can see the queries made by the student (see Tables 6 and 7).



Table 6 Sample Records of Students’ Use of TOTALrecall

Student name   Total number of queries   Total number of words queried   Monolingual mode   Bilingual mode
1              57                        39                              11                 37
2              228                       190                             118                46
3              90                        75                              26                 67
4              80                        48                              25                 55
5              23                        20                              8                  18

Table 7 Sample Words Searched and Search Mode

Word searched   Searching mode   Query time(s)   Pages explored
[...]           Mono             1               1
[...]           Mono             1               1
[...]           Mono             2               1
[...]           Mono             1               1

In the Collocation Practice experiment, which taught non-English majors the appropriate use of collocations through consultation of TOTALrecall, we investigated the extent to which the monolingual English mode or the bilingual Chinese-English mode could help learners locate what they need from a concordancer. The records indicated that 32 students consulted the four collocation instruction units incorporating TOTALrecall for inductive collocation learning a total of 2,804 times and that each student engaged in 85 transactions on average, or 21 transactions per instructional unit, to answer about 20 drill-type exercises. In addition, about 80% of the total searches were English keyword searches. Students employed the bilingual mode more often since the bilingual mode’s highlighting of keywords in both Chinese and English facilitates collocation spotting. In doing online practice, students preferred to search in English in both searching modes, which is typical for an online bilingual reference tool. When students wrote in English or translated into English, they tended to make Chinese queries. In addition, the records showed that some of the 32 students used TOTALrecall as a bilingual dictionary for other aspects of English language learning. They mostly consulted TOTALrecall to find translation equivalents of nouns in both languages, including proper nouns or compounds. One interesting finding was that the students often keyed in Chinese verb-noun collocations (such as the name of a Chinese game in which two people play and drink) in order to find English counterparts, whereas they often merely typed a verb or noun in English to clarify their understanding of word meaning and collocation behavior. Further, students sought the help of TOTALrecall to find some spoken forms, such as those glossed as ‘worse, awkward’ or ‘bad’, since these expressions are culture specific and easier to find than their English translation equivalents.

In the Text Grader module, the participants’ performance on familiar words and unfamiliar targeted words included in the pretest items was shown to be significantly different (see Table 8), which supports our selection of the texts to be read.

Table 8 Independent t Test for Familiar and Unfamiliar Targeted Words in Pretest (N = 38)

                   Mean %   SD      DF   t        p
Familiar words     84.74    15.89   74   12.737   .000
Unfamiliar words   39.00    15.41
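The t value in Table 8 can be checked from the reported summary statistics alone. The sketch below computes a pooled-variance independent-samples t statistic in Python; it assumes 38 scores in each condition, which is consistent with the reported DF of 74 but is not stated explicitly, and the function name `pooled_t` is ours.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance independent-samples t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Means and SDs from Table 8; n = 38 per condition is an assumption.
t, df = pooled_t(84.74, 15.89, 38, 39.00, 15.41, 38)
print(f"t = {t:.3f}, df = {df}")
```

Because the inputs are rounded to two decimals, the result agrees with the reported 12.737 only to about two decimal places. The reported p of .000 is presumably a rounded value (i.e., p < .001).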

However, when we checked the words these learners looked up using TOTALrecall while reading the 16 articles, we discovered that they looked up more familiar words than unfamiliar words. This may indicate that the students still relied heavily on intensive reading strategies in the face of an extensive reading task, a habit perhaps from their senior high English learning experiences. This may provide instructors with an opportunity to understand in more detail what learners actually do as they read. Depending on the instructor’s or researcher’s goal, tracking data can provide useful information about the participants’ online learning behavior. In the future, it will be possible to enhance the management system with database and data mining technologies to help with computational scaffolding and individualized adaptive instruction for learners in order to understand learning processes.

Integration of the Four Subprojects

The first subproject provides necessary NLP tools with good precision that stimulate new English pedagogies that would be impossible without the digital resources or without previous innovations in the CANDLE project. The other three subprojects make full use of these NLP tools and explore pedagogical possibilities via corpora processing and computational scaffolding. For instance, as an innovative e-reference tool, TOTALrecall supports all the English learning tasks in the reading, writing, and culture components of the project. The system can serve as an anytime English tutor that ‘understands’ what learners say to it. The integration of the four subprojects is illustrated in Figure 4.



Figure 4 Integration of the Four Subprojects with Respective Modules

CANDLE Interface
Computer-assisted management system
NLP: TOTALrecall, Tango Collocation Checker
Reading: Self-access module, Text Grader
Writing: Writing with TOTALrecall, Collocation Practice
Culture: Culture courses
[2nd year]

CONCLUSION

In this paper, we have described an innovative web-based English learning project and some of its initial achievements. The major features of CANDLE are its extensive use of various corpora and NLP tools such as TOTALrecall, Tango, and Text Grader in order to build computational scaffolding for intermediate learners. Various levels of computational scaffolding are provided for learners as they engage in reading, writing, or cultural learning activities that make full use of corpus processing. The bilingual corpora in the project use learners’ knowledge of their native culture to help advance their English learning. Besides the team on NLP tool development, the other members of the project work on the self-access reading component, the writing component, and the culture component. The effectiveness of each of the components has already been or will be verified through real classroom use with empirical methods and curriculum infusion modules for EFL learners. The empirical evidence for the effectiveness of the various modules is quite promising. In the 3-year project, we will achieve the following goals via the CANDLE web site:

1. widely circulate resources of the CANDLE learning center for as many students to use as we can reach in three years,
2. provide empirical evidence or usability testing data to prove CANDLE’s usefulness or effectiveness, and
3. explore the possibilities of curriculum infusion in different universities or colleges for various kinds of learners.

By the end of the third year, we will have developed more useful tools to enhance English learning. Pedagogically, we will explore how to achieve the goal of learner autonomy: learners moving from computational scaffolding to full participation in the English-speaking discourse community.



REFERENCES

Braun, S. (2005). From pedagogically relevant corpora to authentic language learning contents. ReCALL, 17 (1), 47-64.

Chan, T. W., Hue, C. W., Chou, C. Y., & Tzeng, O. J. L. (2001). Four spaces of network learning models. Computers and Education, 37 (2), 141-161.

Chan, T. P., & Liou, H. C. (2005). Effects of online concordancing instruction on EFL students’ learning of verb-noun collocations. Computer Assisted Language Learning, 18 (3), 231-251.

Ellis, R. (1998). SLA research and language teaching. Oxford: Oxford University Press.

Gibbons, P. (2002). Scaffolding language, scaffolding learning: Teaching second language learners in the mainstream classroom. Portsmouth, NH: Heinemann.

Ghadirian, S. (2003). Providing controlled exposure to target vocabulary through the screening and arranging of texts. Language Learning & Technology, 6 (1), 147-164. Retrieved April 24, 2006, from http://llt.msu.edu/vol6num1/GHADIRIAN/default.html

Hunston, S. (2002). Corpora in applied linguistics. Cambridge: Cambridge University Press.

Hwu, F. (2003). Learners’ behaviors in computer-based input activities elicited through tracking technologies. Computer Assisted Language Learning, 16 (1), 5-29.

Johns, T. (1988). Whence and whither classroom concordancing. In T. Bongaerts, P. De Haan, S. Lobbe, & H. Wekker (Eds.), Computer applications in language learning (pp. 9-27). Dordrecht, The Netherlands: Foris.

Kennedy, C., & Miceli, T. (2001). An evaluation of intermediate students’ approaches to corpus investigation. Language Learning & Technology, 5 (3), 77-90. Retrieved April 24, 2006, from http://llt.msu.edu/vol5num3/kennedy

Krishnamurthy, R. (2001). Language corpora: How can teachers and students use these valuable new resources? In Selected Papers from the 10th International Symposium on English Teaching (pp. 59-65). Taipei: Crane.

Laufer, B. (1989). What percentage of lexis is essential for comprehension? In C. Lauren & M. Nordman (Eds.), Special language: From humans thinking to thinking machine (pp. 69-75). Clevedon, England: Multilingual Matters.

Leech, G. (1997). Teaching and language corpora: A convergence. In A. Wichmann, S. Fligelstone, T. McEnery, & G. Knowles (Eds.), Teaching and language corpora (pp. 1-23). London: Longman.

Liou, H. C., & Huang, H. T. (2004, June). Effects of graded texts on EFL college students’ incidental vocabulary learning: Issues of exposure amount and acquisition of productive and receptive vocabulary. Paper presented at CALICO 2004 Symposium, Pittsburgh, PA.

Liu, L. E. (2002). A corpus-based lexical semantic investigation of verb-noun miscollocations in Taiwan learners’ English. Unpublished master’s thesis, Tamkang University, Taipei.

Nation, P. (2001). Learning vocabulary in another language. New York: Cambridge University Press.



Nattinger, J. R., & DeCarrico, J. D. (1992). Lexical phrases and language teaching. Oxford: Oxford University Press.

Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for teaching. Applied Linguistics, 24 (2), 223-242.

Savignon, S., & Sysoyev, P. (2002). Sociocultural strategies for a dialog of culture. The Modern Language Journal, 86 (4), 508-524.

Stevens, V. (1991). Classroom concordancing: Vocabulary materials derived from relevant, authentic text. English for Specific Purposes, 10 (1), 10-15.

Teubert, W. (1996). Why corpus linguistics? International Journal of Corpus Linguistics, 1 (1), iii-x.

Wang, L. (2001). Exploring parallel concordancing in English and Chinese. Language Learning & Technology, 5 (3), 174-184. Retrieved April 24, 2006, from http://llt.msu.edu/vol5num3/wang/default.html

Wible, D., Kuo, C.-H., Chien, F.-Y., Liu, A., & Tsao, N.-L. (2001). A web-based EFL writing environment: Intelligent information for learners, teachers, and researchers. Computers and Education, 37 (3-4), 297-315.

Wichmann, A., Fligelstone, S., McEnery, T., & Knowles, G. (Eds.). (1997). Teaching and language corpora. London: Longman.

Wood, D., Bruner, J., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, 17 (2), 89-100.

Wu, J. C., Yeh, K. C., Chuang, T. C., Shei, W. C., & Chang, J. S. (2003, July). TOTALrecall: A bilingual concordance for computer assisted translation and language learning. In Proceedings of the 41st Association of Computational Linguistics Conference, Sapporo, Japan.

ACKNOWLEDGMENTS

The authors would like to thank all the members involved in the CANDLE project. This project is supported by grants from the National Science Council under the numbers NSC92-2524-S007-002, NSC93-2524-S007-001, and NSC94-2524-S007-001. Thanks are also extended to two anonymous reviewers for their valuable comments on an earlier draft of this paper.

AUTHORS’ BIODATA

Hsien-Chin Liou, Professor at the National Tsing Hua University, has conducted numerous CALL projects. Her current interest is web-based materials and corpus processing for academic writing.

Jason S. Chang, professor of computer science at the National Tsing Hua University, has taken a long-term interest in natural language processing and once served as Director of the Republic of China Computational Linguistics Association.

Hao-Jan Chen, associate professor at the National Taiwan Normal University, has developed several ESL learning web sites in Taiwan.



Chih-Cheng Lin is an associate professor in the English Department at the National Taiwan Normal University, where he has been teaching English language courses, multimedia and language teaching, and academic writing. His current research interests include CALL and EFL reading and writing.

Meei-Ling Liaw, Professor and Chair of the Department of English at the National Taichung University, has done research on using computer technology to facilitate EFL teaching and learning, intercultural competence, and teacher education.

Zhao-Ming Gao, assistant professor at the National Taiwan University, has done research on language technologies, including intelligent computer-assisted language learning and machine translation.

Jyh-Shing Roger Jang, associate professor in the Computer Science Department at the National Tsing Hua University, Taiwan, has done research on speech recognition, speech synthesis, melody recognition, and computer-assisted pronunciation training.

Yuli Yeh is a retired associate professor at the National Tsing Hua University. Her research interest is development and evaluation of materials for computer-assisted language learning.

Thomas C. Chuang, President at Vanung University and Professor in the Department of Computer Science and Information Engineering, has research interests in natural language processing and artificial intelligence.

Geeng-Neng You, Associate Professor in the Department of Multimedia Design at the National Taichung Institute of Technology, Taichung, Taiwan, has done research on multimedia design and computer-assisted translation evaluation.

AUTHORS’ ADDRESSES

Hsien-Chin Liou
Department of Foreign Languages and Literature
101 Sec 2 Kuang Fu Road
National Tsing Hua University
Hsinchu, Taiwan
Republic of China 30043
Phone: 886 3 5742709
Fax: 886 3 5718977

Jason S. Chang
Department of Computer Science
National Tsing Hua University
Hsinchu, Taiwan

Hao-Jan Chen
Department of Foreign Languages and Literatures
National Taiwan Normal University
Taipei, Taiwan

Chih-Cheng Lin
Department of Foreign Languages and Literatures
National Taiwan Normal University
Taipei, Taiwan

Meei-Ling Liaw
Department of English
National Taichung University
Taichung, Taiwan

Zhao-Ming Gao
Department of Foreign Languages and Literatures
National Taiwan University
Taipei, Taiwan

Jyh-Shing Roger Jang
Department of Computer Science
National Tsing Hua University
Hsinchu, Taiwan

Yuli Yeh
Department of Foreign Languages and Literature
National Tsing Hua University
Hsinchu, Taiwan

Thomas C. Chuang
Department of Computer Science
Vanung University
Jhongli, Taoyuan, Taiwan

Geeng-Neng You
Graduate Institute of Multimedia
National Taichung Institute of Technology
Taichung, Taiwan
