
English Language and Linguistics 20(2) 43

Learner Corpus Research in EAP: Some key issues and future pathways

Lynne Flowerdew (Hong Kong University of Science and Technology)

Flowerdew, Lynne. 2014. Learner Corpus Research in EAP: Some key issues and future pathways. English Language and Linguistics 20.2, 43-60.

Learner corpus research is now almost twenty-five years old. While in the 1990s learner corpus research was mostly directed towards English for general academic purposes (EGAP), i.e. argumentative essay writing, more emphasis is now being given to English for specific academic purposes (ESAP) writing. This article discusses three core issues, namely the optimum size of the corpus, whether the study is corpus-based or corpus-driven and the interpretation of the corpus data, with respect to key learner corpus studies of EGAP and ESAP writing. These three issues, which have been much discussed in the literature on corpus linguistics in general, have been accorded less attention in learner corpus research and therefore merit further reflection and discussion. The final section of the paper maps out some ongoing and future avenues for learner corpus research. These include the compilation and analysis of more longitudinal data, the use of more sophisticated statistical procedures, and the collection of more individual metadata.

Key words: learner corpus, academic writing, corpus size, corpus-driven, corpus-based, interlanguage

1. Introduction

Learner corpus research in EAP was first launched in the 1990s. An extensive learner corpus project, the International Corpus of Learner English (ICLE), was initiated by Sylviane Granger at the University of Louvain, Belgium, in 1990. ICLE consists of sub-corpora of academic argumentative essays written in English by French, German, Polish, Greek etc. advanced learners. In the early 1990s, a learner corpus comprising academic writing was established by John Milton at HKUST (Hong Kong University of Science and Technology).

A defining feature of most L2 learner corpus research is that it is of a comparative nature and benchmarked against native-speaker corpora. This is termed the contrastive interlanguage analysis (CIA) approach. One aspect of CIA compares L1 and L2; another branch of CIA compares learner language produced by learners with different L1s (Granger 1996). To facilitate this type of interlanguage comparison, Granger and her colleagues compiled the Louvain Corpus of Native English Essays (LOCNESS) according to very specific design criteria so that the L1 and L2 learner corpora were as closely matched as possible in terms of task-type, task length etc.

While in the early days of learner corpus research, attention was mostly directed towards English for general academic purposes (EGAP), i.e. argumentative essay writing, more emphasis is now being given to English for specific academic purposes (ESAP) writing with two large-scale projects in progress: the compilation of the Varieties of English for Specific Purposes dAtabase (VESPA) by Magali Paquot at the University of Louvain (see Granger and Paquot 2013) and the Corpus of Academic Learner English (CALE) by Callies and Zaytseva (2013) at the University of Bremen (see Flowerdew, forthcoming 2014).

Learner corpus research is gaining increasing recognition, as witnessed by the establishment of a bi-annual learner corpus research conference in 2011, a forthcoming handbook of learner corpus research (Granger et al. 2014) and the very recent establishment of a learner corpus association in September 2013 (www.learnercorpusassociation.org) with the facility of a searchable bibliography. Learner corpus research is thus a thriving area. However, some core issues, namely the optimum size of the corpus, whether the study is corpus-based or corpus-driven and the interpretation of the corpus data, points which have been much discussed in the literature on corpus linguistics in general, seem to have been accorded less attention in learner corpus research. My aim in this paper is thus to discuss these three aforementioned issues with specific reference to the theoretical and methodological underpinnings of key studies in the literature on EAP writing. In the final section of the paper I will briefly map out some ongoing and future avenues for learner corpus research.


2. Core issues in learner corpus research

2.1. Size of learner corpus

One key issue concerns the size of the corpus, a question that is also of paramount importance in corpus analysis in general. As a general rule, more broad-based, quantitative studies make use of corpora of around 500,000 to one-million words. However, when qualitative methods are employed, smaller corpora, ranging from 50,000 to 150,000 words, tend to be used with fewer items examined. Moreover, the size of the corpus can also be a reflection of a particular theoretical stance, with collocational analysis being a case in point as this can be examined from either a textual or statistical perspective (see Barnbrook et al. 2013 for an extensive overview of collocation), as illustrated in the following two studies by Nesselhauf (2005) and Groom (2009).

Nesselhauf’s (2005) approach to collocation analysis is underpinned by a theoretical model of phraseology which views collocation as a textual phenomenon on the grounds that certain collocations may only show up infrequently in a corpus and may be subject to arbitrary variation. Her research involved a qualitative study of verb-noun collocations, specifically support verb constructions, often referred to as ‘delexical’, i.e. those collocations in which the bulk of the meaning is carried by the noun, e.g. ‘make an arrangement’. For her study she used the German sub-component of ICLE, comprising 150,000 words of 300 argumentative essays, to examine four delexical verbs (make, have, take, give). All the verb forms of these four verbs were first extracted automatically. Then, those which occurred in a verb-noun combination were selected on the basis of a manual analysis. As Nesselhauf (2005: 113) points out, though, this manual approach cannot capture instances in which another verb is used instead of a support verb construction (‘notice’ instead of ‘take notice’). Nor can this approach detect a verb erroneously used (e.g. *do* a construction) other than the four investigated (see Mattheoudakis and Hatzitheodorou (2012) for a small-scale study on the patterning of this type of verb in the Greek component of ICLE).

In contrast, Groom (2009) argues for the identification of collocation through purely statistical means, for which a larger corpus than that used by Nesselhauf would be necessary. Groom (2009) made use of the considerably larger 1,221,265-word Uppsala Student English Corpus consisting of undergraduate student essays for his study on collocate analysis and used two association measures, the t-score and MI score, for calculating statistically significant co-occurrences.

It goes without saying that the size of the corpus is also very much related to the phenomenon under investigation. McEnery and Wilson (2001) have pointed out that the lower the frequency of the feature one wishes to investigate, the larger the corpus should be. This would apply to nouns, adjectives, adverbs, etc. (i.e. content words) which tend to have a much lower frequency than grammatical words in any given corpus. Conversely, it can be argued that smaller corpora can be used for investigating the more common features of language, such as grammatical items, as, indeed, Biber (1990) has argued. Smaller corpora would thus seem to be well suited to the study of article usage, a high frequency phenomenon and of great significance as it is at the interface of syntax, semantics and pragmatics and is an area which poses enormous problems for learners, even very advanced ones. However, somewhat surprisingly, there are very few studies of learner corpora investigating article usage. Why should this be so?

While grammatical items can easily be retrieved via part-of-speech (POS) taggers which automatically assign grammatical tags to every word in a corpus, grammatical errors in learner corpora are more difficult to retrieve. One common error tagging system is that produced to accompany the suite of ICLE corpora (see Dagneaux et al. 1996, 1998); however, manual annotation is required, which would be demanding on the analyst. It is therefore not surprising to find that learner corpora used for investigation of article use are quite small, in part due to the demands of hand tagging errors. One such study is that by Narita (2013), who examined article use in a corpus of 10,721 words comprising 61 argumentative essays written by Japanese students in their first year at university. One of the main aims of her study was to examine three types of errors, omission, redundancy and substitution, which necessitated manual annotation of all the essays with descriptors of article errors. Another study on the use of the article system is that by Díez-Bedmar and Papp (2008). These researchers carried out a contrastive interlanguage analysis of the English article system in two comparable learner corpora, Chinese-English and Spanish-English, consisting of about 40,000 words each. Of interest is that these researchers devised their own tagging system based on a synthesis of previous taxonomies of the English article system accounting for syntactic and functional uses centred on definiteness and specificity. The corpora used in these two studies are small as the analyses on article usage are fine-grained and size has to be balanced against the level of delicacy of the investigation.

In contrast, the research by Leńko-Szymańska (2012) on the use of articles in the argumentative essays of Polish learners was based on a much larger corpus of 230,065 words, comprising 363 essays from the Polish component of ICLE and also the 56,821-word Polish component of the International Corpus of Crosslinguistic Interlanguage (ICCI) of secondary school writing (see Tono et al. 2012 for an overview of this project). In this case, the two learner corpora were not tagged for article errors as the main aim of the study was to examine articles in conventionalized language in the form of three-word lexical bundles and compare these with the bundles found in native writing. However, at the initial stage of the analysis the frequencies of the definite and indefinite articles were first computed to compare frequencies across different levels of students. While the use of the zero article was not addressed in this study (no tagging was undertaken), Leńko-Szymańska extrapolates from her results that the fact that lower proficiency students underused ‘the’ and ‘a/an’ implies that they overused the zero article, which has been found in previous research to dominate the language production at lower levels of proficiency (Master 1997).

The study by Lee and Chen (2009) specifically looks at the phraseology of ‘the’ in the Chinese Academic Written English (CAWE) corpus, comprising 407,960 words of 78 undergraduate linguistics/applied linguistics dissertations. ‘The’ was singled out for analysis as the initial keyword analysis showed it to be significantly overused with students appearing not to have mastered the use of plural nouns for making general statements (e.g., ‘Teachers should’ instead of ‘The teachers should’).
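
The kind of keyword comparison reported by Lee and Chen (2009) can be illustrated with a short sketch. The text does not state which keyness statistic was used, so the log-likelihood measure below is an assumption, chosen because it is widely used for comparing word frequencies across corpora. The function name, the reference corpus size and all frequency counts are hypothetical; only the CAWE corpus size (407,960 words) is taken from the text.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word across two corpora (assumed measure)."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    # A zero observed frequency contributes nothing (limit of x * ln x is 0).
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


# Hypothetical counts of 'the'; only the learner corpus size comes from the text.
CAWE_SIZE = 407_960
REFERENCE_SIZE = 500_000      # hypothetical reference corpus size
the_in_learner = 32_000       # hypothetical raw frequency
the_in_reference = 30_000     # hypothetical raw frequency

score = log_likelihood(the_in_learner, CAWE_SIZE, the_in_reference, REFERENCE_SIZE)
overused = the_in_learner / CAWE_SIZE > the_in_reference / REFERENCE_SIZE
print(f"log-likelihood = {score:.2f}, overused in learner corpus: {overused}")
```

A score above 3.84 is conventionally taken as significant at p < 0.05 (one degree of freedom). The statistic itself is unsigned, so the direction of the difference (overuse or underuse) has to be read off the relative frequencies.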

The paucity of studies on article use in learner corpora could well be explained by the challenges of manually tagging a learner corpus, which, by necessity, poses limits on the corpus size, for what is a very frequent and complex item. Also, it should not be forgotten that corpus tools are but one methodology for investigation, as noted by Granger (2002), reiterating Johansson’s (1991: 313) dictum that ‘The corpus remains one of the linguist’s tools, to be used together with introspection and elicitation techniques’. A case in point here is the enlightening research by Amuzie and Spinner (2013) on Korean EFL learners’ indefinite article use with four types of abstract nouns. They administered a forced-choice elicitation test of 48 target items to fifty Korean intermediate-level students studying at a university in Korea. Of note is that the researchers also conducted a corpus analysis using the 100-million word British National Corpus (BNC) for each of the four target nouns under investigation in order to shed light on their analysis of learner data and to gain a better understanding of the input learners are exposed to.

In sum, the size of a learner corpus is highly dependent on a number of factors, such as the theoretical perspective of the analyst, the type of item under investigation, the level of delicacy of the investigation and whether the corpus is manually tagged or items are retrieved automatically. All these factors are of relevance for specialized corpora (see Flowerdew 2004), a classification usually assigned to learner corpora.
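
The two association measures mentioned above in connection with Groom (2009), the MI score and the t-score, can be sketched as follows. This is a minimal sketch using the formulas commonly given in corpus linguistics textbooks, not necessarily the exact procedure Groom followed. The function names and all word and co-occurrence frequencies are hypothetical; only the corpus size (1,221,265 words) comes from the text. The expected frequency here ignores the size of the collocation window, which is a simplifying assumption.

```python
import math


def expected_frequency(freq_node, freq_collocate, corpus_size):
    """Co-occurrences expected by chance if the two words were independent."""
    return freq_node * freq_collocate / corpus_size


def mi_score(observed, freq_node, freq_collocate, corpus_size):
    """Mutual information: log2 of observed over expected co-occurrences."""
    expected = expected_frequency(freq_node, freq_collocate, corpus_size)
    return math.log2(observed / expected)


def t_score(observed, freq_node, freq_collocate, corpus_size):
    """t-score: (observed - expected) divided by the square root of observed."""
    expected = expected_frequency(freq_node, freq_collocate, corpus_size)
    return (observed - expected) / math.sqrt(observed)


CORPUS_SIZE = 1_221_265   # corpus size reported for Groom (2009), from the text
freq_make = 2_400         # hypothetical frequency of the node 'make'
freq_decision = 310       # hypothetical frequency of the collocate 'decision'
co_occurrences = 45       # hypothetical observed co-occurrence frequency

print(f"MI = {mi_score(co_occurrences, freq_make, freq_decision, CORPUS_SIZE):.2f}")
print(f"t  = {t_score(co_occurrences, freq_make, freq_decision, CORPUS_SIZE):.2f}")
```

The MI score tends to favour rare but exclusive pairings, whereas the t-score favours frequent ones, which is one reason the two are often reported together.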

2.2. Corpus-based vs. corpus-driven enquiries

Another consideration is whether the investigation is corpus-based or corpus-driven, an issue much discussed in the corpus linguistics literature but only touched on in learner corpus studies. In the corpus-based approach the items for investigation are derived from a pre-defined list, sometimes drawn from a reference grammar underpinned by a particular linguistic theory but, more often than not, from previous corpus-based results. In the corpus-driven approach, on the other hand, the linguistic categories arise from the corpus searches, in which case the corpus would not normally be annotated for part-of-speech, in order to allow for new insights on linguistic patterning, and new theories of language, unobscured by pre-conceived categories (see Tognini Bonelli 2001).

A related issue here is that if, as in the corpus-based approach, the items for investigation are drawn from exponents listed in a reference grammar, it is important to know the kind of grammar. Why this is important relates to Chomsky’s (1986, 1988) concept of language as ‘E-language’, ‘Externalised language’ (i.e. language description of performance data, which are instances of attestations found in corpora) or ‘I-language’, ‘Internalised language’ (i.e. language which is competence-based and not necessarily attested but possible). As Widdowson (2003: 79) has pointed out, this difference can be exemplified in two standard reference grammars: A Comprehensive Grammar of the English Language (Quirk et al. 1985) and Longman Grammar of Spoken and Written English (LGSWE) (Biber et al. 1999). The former reference grammar is ‘essentially descriptive of what the grammarian authors know of English as representative users, and what they know, as grammarians’, with some of the information checked out in the Survey of English Usage (SEU) Corpus, but not determined by corpus data. By way of contrast, the grammatical descriptions in the LGSWE are based on patterns of structure and use from a 40-million-word corpus spanning four key text-types (conversation, fiction, news reportage and academic prose). It would thus seem advisable to consult both types of grammars, and indeed other sources, for compiling pre-defined lists in corpus-based investigations to achieve maximum comprehensiveness (see Flowerdew 2012 for more discussion on this point).

To illustrate the corpus-based and corpus-driven approaches, I discuss several studies covering the area of epistemic modality and stance. In the main, these studies focus on argumentative essay writing. Given that the formation of argumentation is governed by epistemic modality, i.e. the level of commitment of writers to the content of their text, it is not surprising that this area has attracted a lot of attention in learner academic writing. Epistemic modality can be realized by verbs, adverbs, lexical verbs, adjectives and nouns and overlaps with Hyland’s (1999) stance markers of hedges, boosters, self-mentions and attitude markers. In brief, hedges are devices such as ‘possible’, ‘might’ and ‘perhaps’, which allow information to be presented as opinion rather than fact, thus implying that a statement is based on the writer’s plausible reasoning. On the other hand, boosters such as ‘clearly’ and ‘demonstrate’ emphasize certainty and convey an assured tone. Likewise, self-mentions (‘I’, ‘we’) can also lend weight to a writer’s particular stance and give them an authorial voice. Attitude markers, which indicate the writer’s affective rather than epistemic attitude, convey surprise, agreement, importance etc. and are typically realized by attitude verbs (e.g. ‘agree’), sentence adverbs (e.g. ‘unfortunately’) and adjectives (e.g. ‘remarkable’).

Research on learner corpora has uncovered just how difficult it is for students to master the complex interplay between hedges and boosters in argumentative essay writing. One of the first studies to examine epistemic modality, specifically hedges and boosters for expressing doubt and certainty, was that by Hyland and Milton (1997). The two corpora used in the study consisted of about 500,000 words each. The learner corpus consisted of 150 essays written by Hong Kong students for their A level “Use of English” exam across all ability bands, while the native-speaker corpus comprised A Level scripts written by British school leavers of similar age and education level to the Hong Kong learners. Of note is that the selection of expressions for investigation was based on a variety of sources. The main source was Holmes’ (1983) analysis of the ‘learned’ sections of the Brown and LOB (Lancaster/Oslo-Bergen) corpora of written English, supplemented by the research literature on modality (Coates 1983) and reference grammars (Quirk et al. 1985). From these sources an inventory of 75 of the most frequently occurring epistemic lexical items in native-speaker academic writing was produced. Fifty sentences containing each of these items (if there were 50 occurrences) were extracted from each grade in both the learner and NS corpus for detailed investigation at the sentence level. While the study itself is essentially corpus-based, the inventory of 75 items for investigation is based on sources of language constituting both ‘I-language’ and ‘E-language’.

Around 15 years after the Hyland and Milton (1997) study, Hatzitheodorou and Mattheoudakis (2011) conducted a study on the use of stance markers in GRICLE, the Greek component of ICLE, using LOCNESS as the native-speaker control corpus. Their main source for adverbials realizing boosters and the other stance markers they investigated, namely hedges and attitude markers, was the list drawn up by Hyland (2005). However, as the focus of Hyland’s list is on academic discourse, they also enriched this with elements taken from Biber and Finegan’s (1989) model to account for attitudinal stance, i.e. markers expressing the author’s personal feeling and attitude such as ‘happily’ and ‘sadly’, which are not normally found in academic discourse. As in the Hyland and Milton study, this is a corpus-based study with items sourced from two previous corpus research studies.

Another study on stance in argumentative essay writing is that by Neff et al. (2003) examining stance across five EFL groups from the ICLE suite of corpora: French, Italian and Spanish writers representing Romance languages and Dutch and German writers, with LOCNESS as the native reference corpus. Specifically, their research focuses on modal verbs (can, could, may, might and must) and nine reporting verbs, e.g. suggest, argue. This research can be viewed as both corpus-based in that the modal verbs are derived from the literature on modal verbs, and also corpus-driven as the nine reporting verbs selected for investigation are derived from the learner corpus.

Two studies are reported which are purely corpus-driven, but unlike the three previous ones their focus is not on argumentative essay writing, but rather subject courses involving the use of stance. Wharton (2012) reports on stance options in a rhetorically-based task, that of data description in NNS undergraduate writing in the discipline of Statistics. What is of interest is that her learner corpus consists of 40 student texts, just 4,705 words in total, making for a very fine-grained analysis (see Section 2.1). Such a small, very specialized corpus would thus be amenable to a corpus-driven approach, which was conducted inductively via examination and re-examination of texts, with the NVivo software used to aid organization and cross-classification of features. Through this detailed analysis Wharton drew up the following five categories of common stances in assertions: bare, hedged, vague, boosted and reader-inclusive.

Another corpus-driven study is that by Hewings and Hewings (2002) on anticipatory ‘it’ and extraposed subject (e.g. ‘It is interesting to note that no solution is offered’) in business writing. Their student corpus consisted of 15 MBA student dissertations (123,633 words) written by non-native speakers at the University of Birmingham. The comparable corpus consisted of 28 papers (203,389 words) from three different journals in the field of Business Studies. From an initial frequency list the researchers first excluded it-clauses that presented propositional content and those with a text-organising role. They subsequently derived the following categories involving anticipatory ‘it’ from their data-driven analysis: hedges, attitude markers, emphatics, and attribution.

It is interesting to note that while these two aforementioned studies are corpus-driven the categories thus derived are, in fact, quite similar to those from more corpus-based studies. And indeed as McEnery et al. (2006) have pointed out, traditional categories such as nouns, hedges etc. have not been completely abandoned, indicating that in the corpus-driven approach linguists are still applying intuitions based on traditional grammatical terms, in line with Popper’s (1963) view that there exists no completely theory-free observation.


2.3. Explanations for interlanguage features

Barlow (2005: 343) usefully presents a list of factors accounting for interlanguage features, as outlined below.

L1 transfer
Some forms of grammatical patterns found in the learner’s language production may result from the intrusion of L1.

General learner strategies
To help deal with the complex task of speaking or writing in a second language, the learner may adopt some coping strategies such as the use of L1 forms, circumlocution, avoidance strategies etc.

Paths of interlanguage development
Some paths of interlanguage, such as the development of negation or the development of tense/aspect marking, proceed in a series of stages which may be tracked using longitudinal studies of learner output.

Intralingual overgeneralization
Some features of the learner’s language may be due to overgeneralization of an aspect of L2 grammar such as the use of –ed to mark past tense.

Input bias
The form of the learner’s production may reflect the particular input received, such as the language used in coursebooks (see Römer 2004).

Genre/register influences
Researchers working with learner corpora have suggested that the writing of L2 learners contains a variety of informal patterns.

I now take some of the studies discussed in the previous sections on size of the learner corpus and corpus-based vs. corpus-driven enquiries together with a few other studies to illustrate some of the factors listed above. Of note is that one generally finds that analysts are quite cautious in putting forward explanations for interlanguage features.

The studies on stance in learner corpora are of particular interest as different reasons have been put forward for learners’ infelicities. In the study on anticipatory ‘it’ by Hewings and Hewings (2002) learners were found to make less use of it-clauses in hedging, but greater use of it-clauses in the other three categories, namely attitude markers, emphatics, and attribution, when results were normalized per 1,000 words as the corpora were of unequal size (123,633 words of non-native MBA dissertation writing and 203,389 words of expert journal writing). Hewings and Hewings speculate that this tendency for MBA students to make a more overt effort at persuasion could be accounted for by the readership. They surmise that as MBA students are generally required to analyse some business context to make policy recommendations to a hypothetical professional, they may mistakenly believe that they have to present their recommendations in a forceful manner. Hyland and Milton’s (1997) study revealed that Chinese learners also offered inappropriately strong convictions, e.g. ‘As I know, I am quite sure some parents are willing to pay whenever their children ask for’ whereas the NS students used epistemic clusters so as not to overstate their degree of commitment, e.g. ‘On balance, it would seem that the only real solution to the problem would be to allow the papers to introduce an efficient self-governing body’ (p. 199). Hyland and Milton offer several tentative explanations for the observation that Chinese students do not moderate their claims sufficiently, suggesting that this could be due to a lack of awareness of socio-pragmatic language norms or a teaching-induced effect as it is common practice for Hong Kong students to attend tutorial schools which emphasise boosting expressions in their materials. In contrast, research by McEnery and Kifle (2002) on the argumentative writing of Eritrean students reveals that the learners overused hedging expressions, which the authors ascribe to the overemphasis on tentative and uncertain language, i.e. hedging devices, in the school textbooks.

It goes without saying that errors in modality may well be due to interference from spoken language. However, Aijmer (2002) and Gilquin and Paquot (2008) caution against overstating the carry-over to writing of speech-like features such as those with high writer visibility, e.g. ‘It seems to me’, ‘I think’. It may well be that such infelicities are a reflection of part of the process of becoming an expert writer, which also applies to native-speaker students.

Neither is the question of L1 transfer as straightforward as it might at first appear, especially when one sub-component of ICLE is compared with the native-speaker LOCNESS control corpus, as in the study mentioned earlier by Hatzitheodorou and Mattheoudakis (2011) on adverbials for expressing stance. While their results revealed the tendency of Greek learners to be more emphatic and make extensive use of boosters, more evidence is needed to trace this back to the impact of culture on writing style. To this end, they consulted the 1.7 million-word HNC (Hellenic National Corpus), to better substantiate their claim that this style is most likely culturally induced from the Greek, authoritative intellectual style of writing. Another study moving beyond an L1-L2 comparison is that by Gilquin and Paquot (2008) on the overuse of ‘let’s’/‘let’ by French learners. While learners’ errors can be traced back to the L1 as imperatives occur more frequently in written French, it is possible that L1 transfer may not be the sole explanation for the overuse of this feature. Gilquin and Paquot note that other factors must come into play as this imperative form is also used by learners from other mother-tongue backgrounds (e.g. Dutch) which do not make use of first person plural imperative verbs. They thus suggest other reasons for this phenomenon such as teaching-induced and developmental factors. In fact, Paquot advocates an SLA approach to the question of L1 transfer in learner corpus studies, specifically through application of Jarvis’ (2000) unified framework of potential L1 influence to achieve more methodological rigor (see Paquot 2010, 2013).
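
The normalization per 1,000 words mentioned earlier in this section for the Hewings and Hewings (2002) comparison can be illustrated briefly. The sketch below assumes hypothetical raw counts of hedging it-clauses; only the two corpus sizes are taken from the text, and the function name is illustrative.

```python
def per_thousand_words(raw_count, corpus_size):
    """Convert a raw frequency into a rate per 1,000 words."""
    return raw_count / corpus_size * 1000


STUDENT_WORDS = 123_633   # MBA dissertations, from the text
JOURNAL_WORDS = 203_389   # journal articles, from the text

# Hypothetical raw counts of hedging it-clauses in each corpus.
student_hedges = 150
journal_hedges = 320

student_rate = per_thousand_words(student_hedges, STUDENT_WORDS)
journal_rate = per_thousand_words(journal_hedges, JOURNAL_WORDS)
print(f"students: {student_rate:.2f} per 1,000 words")
print(f"journals: {journal_rate:.2f} per 1,000 words")
```

Without this step the raw counts would not be comparable, since the journal corpus is roughly 1.6 times the size of the student corpus.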

3. Concluding remarks and future pathways

The above discussion has outlined three core issues in learner corpus research of EAP writing, which will no doubt receive more attention in future. What does the future promise for this field? In her plenary speech at the Second Learner Corpus Research Conference, Sylviane Granger (2013) outlined several areas for further consideration, including the compilation and analysis of more longitudinal data, the use of more sophisticated statistical procedures, and the collection of more individual metadata. Let us consider how these three aspects of learner corpus research are in train by way of comparison with past practices.

Whereas in the past cross-sectional learner corpus studies dominated the research, there is now more alignment between learner corpus research and SLA with more longitudinal studies being conducted (Granger 2013), as called for earlier (Granger 2009). One example is the study on tense and aspect by Meunier and Littré (2013). This study charts French language learners’ development over a three-year period using a dataset of argumentative essays, similar to those in ICLE. The main aim of the study was to determine if errors in the use of tense and aspect decrease over time and how strong the effect of time is.

Granger (2013) has also noted how in the past learner corpus data tended to be aggregated, i.e. only the total number of tokens was available to researchers, stating that it would be better practice to consider individual data so that the number of students contributing to the tokens is known. In this respect software devised for the ICCI project is noteworthy as not only are the statistical counts displayed but also the number of student files in which the tokens occurred (see Hong 2012).

Let us now consider Granger’s third point. Both Aston (1995) and Widdowson (2003) have commented that corpus research focuses on writing as product. Learner corpus data is performance-based data which does not account for writing as process. However, an ongoing initiative is that by Eriksson et al. (2012) to collect drafts of postgraduate thesis writing together with feedback comments from the thesis supervisor and English tutor. This could be seen as a contribution towards collection of more individual metadata, thereby also incorporating a more ethnographic perspective.

Through an examination of some core issues and future directions in learner corpus research, it is hoped that this paper has also showcased the considerable amount of learner corpus research around the world. No doubt the field will continue to flourish and forge new pathways making use of new learner corpora, new software and new paradigms for research.

References

Aijmer, Karin. 2002. Modality in advanced Swedish learners’ written interlanguage. In S. Granger, J. Hung and S. Petch-Tyson (eds.), Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. Amsterdam: John Benjamins, 55-76.

Aijmer, Karin (ed.). 2009. Corpora and Language Teaching. Amsterdam: John Benjamins.

Amuzie, Grace Lee and Patti Spinner. 2013. Korean EFL learners’ indefinite article use with four types of abstract nouns. Applied Linguistics 34.4, 415-434.

Aston, Guy. 1995. Corpora in language pedagogy: matching theory and practice. In B. Seidlhofer and G. Cook (eds.), Principle and Practice in Applied Linguistics. Oxford: Oxford University Press, 257-270.

Barlow, Michael. 2005. Computer based analyses of learner language. In R. Ellis and G. Barkhuizen (eds.), Analysing Learner Language. Oxford: Oxford University Press, 335-358.

Barnbrook, Geoff, Oliver Mason and Ramesh Krishnamurthy. 2013. Collocation: Applications and implications. Basingstoke, UK: Palgrave Macmillan.

Biber, Doug. 1990. Methodological issues regarding corpus-based analyses of linguistic variation. Literary and Linguistic Computing 5.4, 257-269.

Biber, Doug and Edward Finegan. 1989. Styles of stance in English: lexical and grammatical marking of evidentiality and affect. Text 9.1, 93-124.

Biber, Doug, Stig Johansson, Geoffrey Leech, Susan Conrad and Edward Finegan (eds.). 1999. Longman Grammar of Spoken and Written English. London: Longman.

Callies, Marcus and Ekaterina Zaytseva. 2013. The Corpus of Academic Learner English (CALE): A new resource for the study and assessment of advanced language proficiency. In Granger et al. (eds.), 49-60.

Chomsky, Noam. 1986. Knowledge of Language: Its Nature, Origin and Use. New York: Praeger.

Chomsky, Noam. 1988. Language and Problems of Knowledge. Cambridge, Mass.: MIT Press.

Coates, Jennifer. 1983. The Semantics of Modal Auxiliaries. London: Routledge and Kegan Paul.

Dagneaux, Estelle, Sharon Denness, Sylviane Granger and Fanny Meunier. 1996. Error Tagging Manual Version 1.1. Centre for English Corpus Linguistics. Louvain-la-Neuve: Université catholique de Louvain.

Dagneaux, Estelle, Sharon Denness and Sylviane Granger. 1998. Computer-aided error analysis. System 26, 163-174.

Díez-Bedmar, Maria Belén and Szilvia Papp. 2008. The use of the English article system by Chinese and Spanish learners. In G. Gilquin, S. Papp and M. Díez-Bedmar (eds.), Linking up Contrastive and Learner Corpus Research. Amsterdam: Rodopi, 147-173.

Eriksson, Andreas. 2012. MUCH: the Malmö University-Chalmers Corpus of academic writing. Poster presented at the 10th International Teaching and Language Corpus Conference. Warsaw, Poland, 14 July 2012.

Flowerdew, Lynne. 2004. The argument for using English specialized corpora to understand academic and professional language. In U. Connor and T. Upton (eds.), Discourse in the Professions: Perspectives from Corpus Linguistics. Amsterdam: John Benjamins, 11-33.

Flowerdew, Lynne. 2012. Corpora and Language Education. Basingstoke, UK: Palgrave Macmillan.

Flowerdew, Lynne. forthcoming, 2014. Learner corpora and language for academic and specific purposes. In S. Granger et al. (eds.).

Frankenberg-Garcia, Ana, Lynne Flowerdew and Guy Aston (eds.). 2011. New Trends in Corpora and Language Learning. London: Continuum.

Gilquin, Gaëtanelle and Magali Paquot. 2008. Too chatty: learner academic writing and register variation. English Text Construction 1.1, 41-61.

Granger, Sylviane. 1996. From CA to CIA and back: an integrated approach to computerized and bilingual corpora. In K. Aijmer, B. Altenberg and S. Johansson (eds.), Languages in Contrast. Lund: Lund University Press, 37-51.

Granger, Sylviane. 2002. A bird’s eye view of learner corpus research. In S. Granger et al. (eds.), 3-33.

Granger, Sylviane. 2009. The contribution of learner corpora to second language acquisition and foreign language teaching: a critical evaluation. In K. Aijmer (ed.), 13-32.

Granger, Sylviane. 2013. Contrastive interlanguage analysis: A reappraisal. Plenary speech delivered at the Second Learner Corpus Research Conference. Bergen, Norway, 27 September 2013.

Granger, Sylviane, Gaëtanelle Gilquin and Fanny Meunier (eds.). forthcoming, 2014. Cambridge Handbook of Learner Corpus Research. Cambridge: Cambridge University Press.

Granger, Sylviane, Joseph Hung and Sylvie Petch-Tyson (eds.). 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. Amsterdam: John Benjamins.

Granger, Sylviane and Magali Paquot. 2013. Language for specific purposes learner corpora. In C. Chapelle (ed.), The Encyclopedia of Applied Linguistics. Oxford, UK: Wiley-Blackwell, 3142-3146.

Groom, Nicholas. 2009. Effects of second language immersion on second language collocational development. In A. Barfield and H. Gyllstad (eds.), Researching Collocations in Another Language. Basingstoke, UK: Palgrave Macmillan, 21-33.

Hatzitheodorou, Anna-Maria and Marina Mattheoudakis. 2011. The impact of culture on the use of stance exponents as persuasive devices: The case of GRICLE and English native-speaker corpora. In A. Frankenberg-Garcia et al. (eds.), 229-246.

Hewings, Martin and Ann Hewings. 2002. “It is interesting to note that …”: a comparative study of anticipatory ‘it’ in student and published writing. English for Specific Purposes 21.4, 367-383.

Holmes, Janet. 1983. Speaking English with the appropriate degree of conviction. In C. Brumfit (ed.), Learning and Teaching Languages for Communication. Applied Linguistic Perspective. London: Center for Language Teaching and Research, 100-113.

Hong, Huaqing. 2012. Compilation and exploration of ICCI corpus for learner language research. In Tono et al. (eds.), 47-61.

Hyland, Ken. 1999. Disciplinary discourses: writer stance in research articles. In C. Candlin and K. Hyland (eds.), Writing: Texts, Processes and Practices. London: Longman, 99-121.

Hyland, Ken. 2005. Metadiscourse. London: Continuum.

Hyland, Ken and John Milton. 1997. Qualification and certainty in L1 and L2 students’ writing. Journal of Second Language Writing 6.2, 183-205.

Jarvis, Scott. 2000. Methodological rigor in the study of transfer: identifying the L1 influence in the interlanguage lexicon. Language Learning 50.2, 245-309.

Johansson, Stig. 1991. Times change, and so do corpora. In K. Aijmer and B. Altenberg (eds.), English Corpus Linguistics. London: Longman, 305-314.

Lee, David and Sylvia Chen. 2009. Making a bigger deal of the smaller words: function words and other key items in research writing by Chinese learners. Journal of Second Language Writing 18.4, 281-96.

Leńko-Szymańska, Agnieszka. 2012. The role of conventionalized language in the acquisition and use of articles by Polish EFL learners. In Y. Tono et al. (eds.), 83-103.

Master, Peter. 1997. The English article system: acquisition, function and pedagogy. System 25, 215-232.

Mattheoudakis, Marina and Anna-Maria Hatzitheodorou. 2012. The lexical patterning of light verbs in GRICLE and native corpora: a comparative corpus-based study. In J. Thomas and A. Boulton (eds.), Input, Process and Product. Developments in Teaching and Language Corpora. Brno, Czech Republic: Masaryk University Press, 213-228.

McEnery, Tony and Nazareth Kifle. 2002. Epistemic modality in argumentative essays of second-language writers. In J. Flowerdew (ed.), Academic Discourse. London: Longman, 182-95.

McEnery, Tony and Andrew Wilson. 2001. Corpus Linguistics, 2nd edn. Edinburgh: Edinburgh University Press.

McEnery, Tony, Richard Xiao and Yukio Tono. 2006. Corpus-Based Language Studies. London: Routledge.

Meunier, Fanny, Sylvie De Cock, Gaëtanelle Gilquin and Magali Paquot (eds.). 2011. A Taste for Corpora. In Honour of Sylviane Granger. Amsterdam: John Benjamins.

Meunier, Fanny and Damien Littré. 2013. Tracking learners’ progress: adopting a dual ‘corpus cum experimental data’ approach. The Modern Language Journal 97, 61-76.

Narita, Masumi. 2013. The use of articles in Japanese EFL learners’ essays. In S. Granger, G. Gilquin and F. Meunier (eds.), Twenty Years of Learner Corpus Research: Looking back, moving ahead. Louvain-la-Neuve: Presses universitaires de Louvain, 357-366.

Neff, JoAnn, Emma Dafouz, Honesto Herrera, Francisco Martinez and Juan Rica. 2003. Contrasting learner corpora: the use of modal and reporting verbs in the expression of writer stance. In S. Granger and S. Petch-Tyson (eds.), Extending the Scope of Corpus-based Research. Amsterdam: Rodopi, 211-230.

Nesselhauf, Nadia. 2005. Collocations in a Learner Corpus. Amsterdam: John Benjamins.

Paquot, Magali. 2010. Academic Vocabulary in Learner Writing. London: Continuum.

Paquot, Magali. 2013. Lexical bundles and L1 transfer effects. International Journal of Corpus Linguistics 18.3, 391-417.

Popper, Karl. 1963. Conjectures and Refutations: the Growth of Scientific Knowledge. London: Routledge and Kegan Paul.

Quirk, Randolph, Stanley Greenbaum, Geoffrey Leech and Jan Svartvik. 1985. A Comprehensive Grammar of the English Language. London: Longman.

Römer, Ute. 2004. A corpus-driven approach to modal auxiliaries and their didactics. In J. McH. Sinclair (ed.), How to use Corpora in Language Teaching. Amsterdam: John Benjamins.

Tognini Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.

Tono, Yukio, Yuji Kawaguchi and Makoto Minegishi (eds.). 2012. Developmental and Crosslinguistic Perspectives in Learner Corpus Research. Amsterdam: John Benjamins.

Wharton, Sue. 2012. Epistemological and interpersonal stance in a data description task: Findings from a discipline-specific learner corpus. English for Specific Purposes 31, 261-270.

Widdowson, Henry. 2003. Defining Issues in English Language Teaching. Oxford: Oxford University Press.

Center for Language Education (formerly)
Hong Kong University of Science and Technology
Clear Water Bay Road, Hong Kong

Received: November 15, 2013 Reviewed: December 10, 2013 Accepted: July 11, 2014