English Language and Linguistics 20(2) 43
Learner Corpus Research in EAP: Some key issues and future pathways
Lynne Flowerdew (Hong Kong University of Science and Technology)

Flowerdew, Lynne. 2014. Learner Corpus Research in EAP: Some key issues and future pathways. English Language and Linguistics 20.2, 43-60.

Learner corpus research is now almost twenty-five years old. While in the 1990s learner corpus research was mostly directed towards English for general academic purposes
(EGAP), i.e. argumentative essay writing, more emphasis is now being given to English for specific academic purposes (ESAP) writing. This article discusses three core
issues, namely the optimum size of the corpus, whether the study is
corpus-based or corpus-driven and the interpretation of the corpus data, with respect to key learner corpus studies of EGAP and ESAP writing. These three issues, which have been much discussed in the literature on corpus linguistics in general, have been accorded less attention in learner corpus research and therefore merit further reflection and discussion. The final section of the paper maps out some ongoing
and future avenues for learner corpus research. These include the
compilation and analysis of more longitudinal data, the use of more sophisticated statistical procedures, and the collection of more individual metadata.
Key words: learner corpus, academic writing, corpus size, corpus-driven, corpus-based, interlanguage
1. Introduction

Learner corpus research in EAP was first launched in the 1990s. An extensive learner corpus project, the International Corpus of Learner English (ICLE), was initiated by Sylviane Granger at the University of Louvain, Belgium, in 1990. ICLE consists of sub-corpora of academic argumentative essays written in English by advanced learners from French, German, Polish, Greek and other L1 backgrounds. In the early 1990s, a learner corpus comprising academic writing was established by John Milton at HKUST (Hong Kong University of Science and Technology). A defining feature of most L2 learner corpus research is that it is of a comparative nature and benchmarked against native-speaker corpora. This is termed the contrastive
interlanguage analysis (CIA) approach. One aspect of CIA compares L1 and L2; another branch of CIA compares learner language produced by learners with different L1s (Granger 1996). To facilitate this type of interlanguage comparison, Granger and her colleagues compiled the Louvain Corpus of
Native English Essays (LOCNESS) according to very specific design criteria so that the L1 and L2 learner corpora were as closely matched as possible in terms of task-type, task length etc. While in the early days of learner corpus research, attention was mostly directed towards English for general academic purposes (EGAP), i.e. argumentative essay writing, more emphasis is now being given to English for specific academic purposes (ESAP) writing with two large-scale projects in progress: the compilation of the Varieties of English for Specific Purposes
dAtabase (VESPA) by Magali Paquot at the University of Louvain (see Granger and Paquot 2013) and the Corpus of Academic Learner English (CALE) by Callies and Zaytseva (2013) at the University of Bremen (see Flowerdew, forthcoming 2014). Learner corpus research is gaining increasing recognition, as witnessed by the establishment of a bi-annual learner corpus research conference in 2011, a forthcoming handbook of learner corpus research (Granger et al. 2014) and the very
recent establishment of a learner corpus association in September 2013
(www.learnercorpusassociation.org) with the facility of a searchable bibliography. Learner corpus research is thus a thriving area. However, some core issues, namely the optimum size of the corpus, whether the study is corpus-based or corpus-driven and the interpretation of the corpus data, points which have been much discussed in the literature on corpus linguistics in general, seem to have been accorded less attention in learner corpus research. My aim in this paper is thus to discuss these three aforementioned issues with specific reference to the theoretical and methodological underpinnings of key studies in the literature on EAP writing. In the final section of the paper I will briefly map out some ongoing and future avenues for learner corpus research.
2. Core issues in learner corpus research

2.1. Size of learner corpus

One key issue concerns the size of the corpus, a question that is also of paramount importance in corpus analysis in general. As a general rule, more broad-based, quantitative studies make use of corpora of around 500,000 to
one-million words. However, when qualitative methods are employed, smaller corpora, ranging from 50,000 to 150,000 words, tend to be used with fewer items examined. Moreover, the size of the corpus can also be a reflection of a particular theoretical stance, with collocational analysis being a case in point as this can be examined from either a textual or statistical perspective (see Barnbrook et al. 2013 for an extensive overview of collocation), as illustrated in the following two studies by Nesselhauf (2005) and Groom (2009). Nesselhauf’s (2005) approach to collocation analysis is underpinned by a theoretical model of phraseology which views collocation as a textual phenomenon on the grounds that certain collocations may only show up infrequently in a corpus and may be subject to arbitrary variation. Her research involved a qualitative study of verb-noun collocations, specifically support verb constructions, often referred to as ‘delexical’, i.e. those collocations in which the bulk of the meaning is carried by the noun, e.g. make an arrangement. For her study she used the German sub-component of ICLE, comprising 300 argumentative essays (150,000 words), to examine four delexical verbs (make, have,
take, give). All the verb forms of these four verbs were first extracted automatically. Then, those which occurred in a verb-noun combination were
selected on the basis of a manual analysis. As Nesselhauf (2005: 113) points out, though, this manual approach cannot capture instances in which another verb is used instead of a support verb construction (notice instead of take notice). Nor can this approach detect a verb erroneously used (e.g. *do* a construction) other than the four investigated (see Mattheoudakis and Hatzitheodorou (2012) for a small-scale study on the patterning of this type of verb in the Greek component of ICLE). In contrast, Groom (2009) argues for the identification of collocation through purely statistical means, for which a larger corpus than that used by Nesselhauf
would be necessary. Groom (2009) made use of the considerably larger 1,221,265-word Uppsala Student English Corpus consisting of undergraduate student essays for his study on collocate analysis and used two association measures, the t-score and MI score, for calculating statistically
significant co-occurrences. It goes without saying that the size of the corpus is also very much related to
the phenomenon under investigation. McEnery and Wilson (2001) have
pointed out that the lower the frequency of the feature one wishes to investigate, the larger the corpus should be. This would apply to nouns, adjectives, adverbs, etc.
(i.e. content words) which tend to have a much lower frequency than grammatical words in any given corpus. Conversely, it can be argued that smaller corpora can be used for investigating the more common features of
language, such as grammatical items, as, indeed, Biber (1990) has argued. Smaller corpora would thus seem to be well suited to the study of article usage, a high frequency phenomenon and of great significance as it is at the interface of syntax, semantics and pragmatics and is an area which poses enormous problems for learners, even very advanced ones. However, somewhat surprisingly, there are very few studies of learner corpora investigating article usage. Why should this be so? While
grammatical items can easily be retrieved via part-of-speech (POS)
taggers which automatically assign grammatical tags to every word in a corpus, grammatical errors in learner corpora are more difficult to retrieve. One common error tagging system is that produced to accompany the suite of ICLE corpora (see Dagneaux et al. 1996, 1998); however, manual annotation is required, which would be demanding on the analyst. It is therefore not surprising to find that learner corpora used for investigation of article use are quite small, in part due to the demands of hand tagging errors. One such study is that by Narita (2013), who examined article use in a corpus of
10,721 words comprising 61 argumentative essays written by Japanese
students in their first-year at university. One of the main aims of her study was to examine three types of errors, omission, redundancy and substitution, which necessitated manual annotation of all the essays with descriptors of article errors. Another study on the use of the article system is that by Díez-Bedmar and Papp (2008). These researchers carried out a contrastive interlanguage analysis of the English article system in two comparable learner corpora, Chinese-English and
Spanish-English, consisting of about 40,000 words each. Of interest is that these researchers devised their own tagging system based on a synthesis of previous taxonomies of the English article system accounting for syntactic and functional uses
centred on definiteness and specificity. The corpora used in these two
studies are small as the analyses on article usage are fine-grained and size has to be balanced against the level of delicacy of the investigation. In contrast, the research by Leńko-Szymańska (2012) on the use of articles in the argumentative essays of Polish learners was based on a much larger corpus of 230,065 words, comprising 363 essays from the Polish component of ICLE and also
the 56,821-word Polish component of the International Corpus of
Crosslinguistic Interlanguage (ICCI) of secondary school writing (see Tono et al. 2012 for an overview of this project). In this case, the two learner corpora were not tagged for article errors as the main aim of the study was to examine articles in
conventionalized language in the form of three-word lexical bundles and
compare these with the bundles found in native writing. However, at the initial stage of the analysis the frequencies of the definite and indefinite articles were first computed to compare frequencies across different levels of students. While the use of the zero article was not addressed in this study (no tagging was undertaken) Leńko-Szymańska extrapolates from her results that the fact that lower proficiency students underused the and a/ an implies that they overused the zero article, which has been found in previous research to dominate the language production at lower levels of proficiency (Master 1997). The study by Lee and Chen (2009) specifically looks at the phraseology of the in the Chinese Academic
Written English (CAWE) corpus, comprising 407,960 words of 78
undergraduate linguistics/applied linguistics dissertations. The was singled out for analysis as the initial keyword analysis showed it to be significantly overused with students appearing not to have mastered the use of plural nouns for making
general statements (e.g., Teachers should … instead of The teachers should).
The paucity of studies on article use in learner corpora could well be explained by the challenges of manually tagging a learner corpus, which, by necessity, poses limits on the corpus size, for what is a very frequent and complex item. Also, it should not be forgotten that corpus tools are but one methodology for investigation, as noted by Granger (2002), reiterating Johansson’s (1991: 313) dictum that ‘The corpus remains one of the linguist’s tools, to be used together with introspection and elicitation techniques’. A case in point here is the
enlightening research by Amuzie and Spinner (2013) on Korean EFL learners’ indefinite article use with four types of abstract nouns. They administered a forced-choice elicitation test of 48 target items to fifty Korean intermediate-level students studying at a university in Korea. Of note is that the researchers also conducted a corpus analysis using the 100-million word British National Corpus (BNC) for each of the four target nouns under investigation in order to shed light on their analysis of learner data and to gain a better understanding of the input learners are exposed to. In sum, the size of a learner corpus is highly dependent on a number of factors, such as the theoretical perspective of the analyst, the type of item under investigation, the level of delicacy of the investigation and whether the corpus is manually tagged or items are retrieved automatically. All these factors are of relevance for specialized corpora (see Flowerdew 2004), a classification usually assigned to learner corpora.
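The two association measures mentioned above in connection with Groom (2009), the t-score and the MI score, can be illustrated with a minimal Python sketch. It assumes the common textbook formulations, with expected frequency E = f(node) × f(collocate) / N, MI = log2(O / E) and t = (O − E) / √O, and makes no adjustment for span size. The function names and all counts are hypothetical and are not taken from Groom's study.

```python
import math


def expected_frequency(node_freq: int, collocate_freq: int, corpus_size: int) -> float:
    """Expected co-occurrence count if node and collocate were independent."""
    return node_freq * collocate_freq / corpus_size


def mi_score(observed: int, node_freq: int, collocate_freq: int, corpus_size: int) -> float:
    """Mutual information: log2 of observed over expected co-occurrences."""
    expected = expected_frequency(node_freq, collocate_freq, corpus_size)
    return math.log2(observed / expected)


def t_score(observed: int, node_freq: int, collocate_freq: int, corpus_size: int) -> float:
    """t-score: (observed - expected) divided by the square root of observed."""
    expected = expected_frequency(node_freq, collocate_freq, corpus_size)
    return (observed - expected) / math.sqrt(observed)


# Hypothetical counts for illustration only (not from any study cited here).
CORPUS_SIZE = 1_000_000   # total tokens in the corpus
NODE_FREQ = 1_000         # frequency of the node word
COLLOCATE_FREQ = 500      # frequency of the candidate collocate
OBSERVED = 50             # times the two co-occur

print(f"MI score: {mi_score(OBSERVED, NODE_FREQ, COLLOCATE_FREQ, CORPUS_SIZE):.2f}")
print(f"t-score:  {t_score(OBSERVED, NODE_FREQ, COLLOCATE_FREQ, CORPUS_SIZE):.2f}")
```

Because the t-score depends on the raw number of observed co-occurrences, it needs a reasonable amount of data to rise above conventional thresholds, which is one way of seeing why a purely statistical approach to collocation calls for a larger corpus.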
2.2. Corpus-based vs. corpus-driven enquiries

Another consideration is whether the investigation is corpus-based or corpus-driven, an issue much discussed in the corpus linguistics literature but only touched on in learner corpus studies. In the corpus-based approach the items for investigation are derived from a pre-defined list, sometimes drawn from a reference grammar underpinned by a particular linguistic theory but, more often than not, from previous corpus-based results. In the corpus-driven approach, on the other hand, the linguistic categories arise from the corpus searches, in which case the corpus would not normally be annotated for
part-of-speech, in order to allow for new insights on linguistic patterning, and new theories of language, unobscured by pre-conceived categories (see Tognini Bonelli 2001). A related issue here is that if, as in the corpus-based approach, the items for investigation are drawn from exponents listed in a reference grammar, it is important to know the kind of grammar. Why this is important relates to Chomsky’s (1986, 1988) concept of language as ‘E-language’, ‘Externalised language’ (i.e. language description of performance data, which are instances of attestations found in corpora) or ‘I-language’, ‘Internalised language’ (i.e.
language which is competence-based and not necessarily attested but possible). As Widdowson (2003: 79) has pointed out, this difference can be exemplified in two standard reference grammars: A Comprehensive Grammar of the English
Language (Quirk et al. 1985) and Longman Grammar of Spoken and Written English (LGSWE) (Biber et al. 1999). The former reference grammar is ‘essentially
descriptive of what the grammarian authors know of English as representative users,
and what they know, as grammarians’, with some of the information
checked out in the Survey of English Usage (SEU) Corpus, but not determined by corpus data. By way of contrast, the grammatical descriptions in the LGSWE are
based on patterns of structure and use from a 40-million-word corpus
spanning four key text-types (conversation, fiction, news reportage and academic prose). It would thus seem advisable to consult both types of grammars, and indeed
other sources, for compiling pre-defined lists in corpus-based
investigations to achieve maximum comprehensiveness (see Flowerdew 2012 for more discussion on this point). To illustrate the corpus-based and corpus-driven approaches, I discuss several studies covering the area of epistemic modality and stance. In the main, these studies
focus on argumentative essay writing. Given that the formation of
argumentation is governed by epistemic modality, i.e. the level of commitment of writers
to the content of their text, it is not surprising that this area has
attracted a lot of attention in learner academic writing. Epistemic modality can be realized by verbs, adverbs, lexical verbs, adjectives and nouns and overlaps with
Hyland’s (1999) stance markers of hedges, boosters, self-mentions and
attitude markers. In brief, hedges are devices such as possible, might and perhaps, which
allow information to be presented as opinion rather than fact, thus
implying that a statement is based on the writer’s plausible reasoning. On the other hand,
boosters such as clearly and demonstrate emphasize certainty and
convey an assured tone. Likewise, self-mentions (I, we) can also lend weight to a writer’s particular stance and give them an authorial voice. Attitude markers, which
indicate the writer’s affective rather than epistemic attitude, convey
surprise, agreement, importance etc. and are typically realized by attitude verbs (e.g.
agree), sentence adverbs (e.g. unfortunately) and adjectives (remarkable).
Research on learner corpora has uncovered just how difficult it is for students to master the complex interplay between hedges and boosters in argumentative
essay writing. One of the first studies to examine epistemic modality, specifically hedges and boosters for expressing doubt and certainty, was that by Hyland and Milton (1997). The two corpora used in the study consisted of about 500,000 words each. The learner corpus consisted of 150 essays written by Hong Kong students for their
A level “Use of English” exam across all ability bands, while the
native-speaker corpus comprised A Level scripts written by British school leavers of similar age and education level as the Hong Kong learners. Of note is that the selection of expressions for investigation was based on a variety of sources. The main source was Holmes’ (1983) analysis of the ‘learned’ sections of the Brown
and LOB (Lancaster/Oslo-Bergen) corpora of written English,
supplemented by the research literature on modality (Coates 1983) and reference grammars (Quirk et al. 1985). From these sources an inventory of 75 of the most frequently occurring epistemic lexical items in native-speaker academic writing was produced. Fifty sentences containing each of these items (if there were 50 occurrences) were extracted from each grade in both the learner and NS corpus for detailed investigation at the sentence level. While the study itself is essentially corpus-based, the inventory of 75 items for investigation is based on sources of language constituting both ‘I-language’ and ‘E-language’. Around 15 years after the Hyland and Milton 1997 study, Hatzitheodorou and Mattheoudakis (2011) conducted a study on the use of stance markers in GRICLE, the Greek component of ICLE using LOCNESS as the native-speaker control corpus. Their main source for adverbials realizing boosters and the other stance markers they investigated, namely hedges and attitude markers, was the list drawn up by Hyland (2005). However,
as the focus of Hyland’s list is on academic discourse, they also
enriched this with elements taken from Biber and Finegan’s (1989) model to account
for attitudinal stance, i.e. markers expressing the author’s personal
feeling and attitude such as happily and sadly, which are not normally found in academic discourse. As in the Hyland and Milton study, this is a corpus-based study with items sourced from two previous corpus research studies. Another study on stance
in argumentative essay writing is that by Neff et al. (2003)
examining stance across five EFL groups from the ICLE suite of corpora: French, Italian
and Spanish writers representing Romance languages and Dutch and
German writers, with LOCNESS as the native reference corpus. Specifically, their
research focuses on modal verbs (can, could, may, might and must) and nine reporting verbs, e.g. suggest, argue. This research can be viewed as both
corpus-based in that the modal verbs are derived from the literature on modal verbs,
and also corpus-driven as the nine reporting verbs selected for
investigation are derived from the learner corpus. Two studies are reported which are purely corpus-driven, but unlike the three previous
ones their focus is not on argumentative essay writing, but rather
subject courses involving the use of stance. Wharton (2012) reports on stance options
in a rhetorically-based task, that of data description in NNS
undergraduate writing in the discipline of Statistics. What is of interest is that her learner corpus consists of 40 student texts, just 4,705 words in total, making for a very fine-grained analysis (see Section 2.1). Such a small, very specialized corpus
would thus be amenable to a corpus-driven approach, which was
conducted inductively via examination and re-examination of texts, with the NVivo software used to aid organization and cross-classification of features. Through this detailed analysis Wharton drew up the following five categories of common stances in assertions: bare, hedged, vague, boosted and reader-inclusive. Another corpus-driven study is that by Hewings and Hewings (2002) on anticipatory it and extraposed subject (e.g. ‘It is interesting to note that no solution is offered’) in
business writing. Their student corpus consisted of 15 MBA student
dissertations (123,633 words) written by non-native speakers at the University of Birmingham. The comparable corpus consisted of 28 papers (203,389 words) from three different journals in the field of Business Studies. From an initial frequency
list the researchers first excluded it-clauses that presented propositional
propositional content and those with a text-organising role. They subsequently derived the following categories involving anticipatory ‘it’ from their data-driven analysis: hedges, attitude markers, emphatics, and attribution. It is interesting to note
that while these two aforementioned studies are corpus-driven the
categories thus derived are, in fact, quite similar to those from more corpus-based studies.
And indeed as McEnery et al. (2006) have pointed out, traditional categories such as nouns, hedges etc. have not been completely abandoned, indicating that in the corpus-driven approach linguists are still applying
intuitions based on traditional grammatical terms, in line with Popper’s (1963) view that there exists no completely theory-free observation.
2.3. Explanations for interlanguage features

Barlow (2005: 343) usefully presents a list of factors accounting for interlanguage features, as outlined below.

L1 transfer
Some forms of grammatical patterns found in the learner’s language production may result from the intrusion of L1.

General learner strategies
To help deal with the complex task of speaking or writing in a second language, the learner may adopt some coping strategies such as the use of L1 forms, circumlocution, avoidance strategies etc.

Paths of interlanguage development
Some paths of interlanguage, such as the development of negation or the development of tense/aspect marking, proceed in a series of stages which may be tracked using longitudinal studies of learner output.

Intralingual overgeneralization
Some features of the learner’s language may be due to overgeneralization of an aspect of L2 grammar such as the use of –ed to mark past tense.

Input bias
The form of the learner’s production may reflect the particular input received, such as the language used in coursebooks (see Römer 2004).

Genre/register influences
Researchers working with learner corpora have suggested that the writing of L2 learners contains a variety of informal patterns.

I now take some of the studies discussed in the previous sections on size of the learner corpus and corpus-based vs. corpus-driven enquiries together with a few other studies to illustrate some of the factors listed above. Of note is that one
generally finds that analysts are quite cautious in putting forward explanations for interlanguage features.
The studies on stance in learner corpora are of particular interest as different reasons
have been put forward for learners’ infelicities. In the study on
anticipatory ‘it’ by Hewings and Hewings (2002) learners were found to make less use of it-clauses in hedging, but greater use of it-clauses in the other three categories,
namely attitude markers, emphatics, and attribution, when results
were normalized per 1,000 words as the corpora were of unequal size (123,633 words
of non-native MBA dissertation writing and 203,389 words of expert
journal writing). Hewings and Hewings speculate that this tendency for MBA students to make a more overt effort at persuasion could be accounted for by the readership. They surmise that as MBA students are generally required to analyse some business context to make policy recommendations to a hypothetical professional,
they may mistakenly believe that they have to present their recommendations in a forceful manner. Hyland and Milton’s (1997) study
revealed that Chinese learners also offered inappropriate strong convictions, e.g. ‘As I
know, I am quite sure some parents are willing to pay whenever their
children ask for’ whereas the NS students used epistemic clusters so as not to overstate their degree of commitment, e.g. ‘On balance, it would seem that the only real solution to the problem would be to allow the papers to introduce an efficient self-governing body’ (p. 199). Hyland and Milton offer several tentative explanations for the observation that Chinese students do not moderate their claims
sufficiently, suggesting that this could be due to a lack of awareness of socio-pragmatic language norms or a teaching-induced effect as it is common practice for Hong Kong students to attend tutorial schools which emphasise
boosting expressions in their materials. In contrast, research by McEnery and Kifle
(2002) on the argumentative writing of Eritrean students reveals that learners overused hedging expressions, which the authors ascribe to the overemphasis on tentative and uncertain language, i.e. hedging devices, in the school textbooks. It goes without saying that errors in modality may well be due to interference from spoken language. However, Aijmer (2002) and Gilquin and Paquot (2008) caution against oversubscribing to the carry-over into writing of speech-like features such as those with high writer visibility, e.g. It seems to me, I think. It may well be that such infelicities are a reflection of part of the process of
becoming an expert writer, which also applies to native-speaker students. Neither is the question of L1 transfer as straightforward as it might at first
appear, especially when one sub-component of ICLE is compared with the
native-speaker LOCNESS control corpus, as in the study mentioned earlier by Hatzitheodorou and Mattheoudakis (2011) on adverbials for expressing stance. While their results revealed the tendency of Greek learners to be more emphatic and make extensive use of boosters, more evidence is needed to trace this back to the impact of culture on writing style. To this end, they consulted the 1.7 million-word HNC (Hellenic National Corpus), to better substantiate their claim that this style is most likely culturally induced from the Greek, authoritative intellectual
style of writing. Another study moving beyond an L1 – L2
comparison is that by Gilquin and Paquot (2008) on the overuse of let’s/ let in French. While learners’ errors can be traced back to the L1 as imperatives occur more frequently in written French, it is possible that L1 transfer may not be the sole explanation for the overuse of this feature. Gilquin and Paquot note that other factors must come into play as this imperative form is also used by learners from other mother-tongue backgrounds (e.g. Dutch) which do not make use of first person plural imperative verbs. They thus suggest other reasons for this phenomenon such as teaching-induced and developmental factors. In fact, Paquot advocates an SLA approach to the question of L1 transfer
in learner corpus
studies, specifically through application of Jarvis’ (2000) unified framework of potential L1 influence to achieve more methodological rigor (see Paquot 2010, 2013).
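The normalisation per 1,000 words reported for the Hewings and Hewings (2002) comparison earlier in this section can be sketched in a few lines of Python. The two corpus sizes are those given in the text; the raw counts and the function name are hypothetical placeholders, since the original frequencies are not reproduced here.

```python
def per_thousand_words(raw_count: int, corpus_size: int) -> float:
    """Normalise a raw frequency to a rate per 1,000 words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1000 / corpus_size


# Corpus sizes as reported in the text.
STUDENT_CORPUS_WORDS = 123_633   # non-native MBA dissertations
JOURNAL_CORPUS_WORDS = 203_389   # expert journal articles

# Hypothetical raw counts of a feature, for illustration only.
student_raw = 120
journal_raw = 120

student_rate = per_thousand_words(student_raw, STUDENT_CORPUS_WORDS)
journal_rate = per_thousand_words(journal_raw, JOURNAL_CORPUS_WORDS)

# Identical raw counts yield a higher rate in the smaller corpus,
# which is why raw totals from unequal corpora cannot be compared directly.
print(f"student corpus: {student_rate:.2f} per 1,000 words")
print(f"journal corpus: {journal_rate:.2f} per 1,000 words")
```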
3. Concluding remarks and future pathways

The above discussion has outlined three core issues in learner corpus research of EAP writing, which will no doubt receive more attention in future. What does the future promise for this field? In her plenary speech at the Second Learner Corpus Research Conference, Sylviane Granger (2013) outlined several areas for further consideration, including the compilation and analysis of more longitudinal data, the use of more sophisticated statistical procedures, and the collection of more individual metadata. Let us consider how these three aspects of learner corpus research are in train by way of comparison with past practices. Whereas in the past cross-sectional learner corpus studies dominated the
research, there is now more alignment between learner corpus research and SLA with more longitudinal studies being conducted (Granger 2013), as called for earlier (Granger 2009). By way of example is the study on tense and aspect by Meunier
and Littré (2013). This study charts French language learners’
development over a three-year period using a dataset of argumentative essays, similar to those in ICLE. The main aim of the study was to determine if errors in the use of tense and aspect decrease over time and how strong the effect of time is. Granger (2013) has also noted how in the past learner corpus data tended to be aggregated, i.e. only the total number of tokens was available to researchers, stating that it would be better practice to consider individual data so that the number of students contributing to the tokens is known. In this respect software devised for the ICCI project is noteworthy as not only are the statistical counts displayed but also the number of student files in which the tokens occurred (see Hong 2012). Let us now consider Granger’s third point. Both Aston (1995) and Widdowson (2003) have commented that corpus research focuses on writing as
product. Learner corpus data is performance-based data which does not account for writing as process. However, an ongoing initiative is that by Eriksson et al. (2012)
to collect drafts of postgraduate thesis writing together with feedback
comments from the thesis supervisor and English tutor. This could be seen as a
contribution towards collection of more individual metadata, thereby also
incorporating a more ethnographic perspective. Through an examination of some core issues and future directions in learner corpus research, it is hoped that this paper has also showcased the considerable amount of learner corpus research around the world. No doubt the field will continue to flourish and forge new pathways making use of new learner corpora, new software and new paradigms for research.
References

Aijmer, Karin. 2002. Modality in advanced Swedish learners’ written interlanguage. In S. Granger, J. Hung and S. Petch-Tyson, eds., Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching, 55-76. Amsterdam: John Benjamins.
Aijmer, Karin (ed.). 2009. Corpora and Language Teaching. Amsterdam: John Benjamins.
Amuzie, Grace Lee and Patti Spinner. 2013. Korean EFL learners’ indefinite article use with four types of abstract nouns. Applied Linguistics 34.4, 415-434.
Aston, Guy. 1995. Corpora in language pedagogy: matching theory and practice. In B. Seidlhofer and G. Cook, eds., Principle and Practice in Applied Linguistics. Oxford: Oxford University Press, 257-270.
Barlow, Michael. 2005. Computer based analyses of learner language. In R. Ellis and G. Barkhuizen (eds.), Analysing Learner Language. Oxford: Oxford University Press, 335-358.
Barnbrook, Geoff, Oliver Mason and Ramesh Krishnamurthy. 2013. Collocation: Applications and implications. Basingstoke, UK: Palgrave Macmillan.
Biber, Doug. 1990. Methodological issues regarding corpus-based analyses of linguistic variation. Literary and Linguistic Computing 5.4, 257-269.
Biber, Doug and Edward Finegan. 1989. Styles of stance in English: lexical and grammatical marking of evidentiality and affect. Text 9.1, 93-124.
Biber, Doug, Stig Johansson, Geoffrey Leech, Susan Conrad and Edward Finegan (eds.). 1999. Longman Grammar of Spoken and Written English. London: Longman.
Callies, Marcus and Ekaterina Zaytseva. 2013. The Corpus of Academic Learner English (CALE) – A new resource for the study and assessment of advanced language proficiency. In Granger et al. (eds.), 49-60.
Chomsky, Noam. 1986. Knowledge of Language: Its Nature, Origin and Use. New York: Praeger.
Chomsky, Noam. 1988. Language and Problems of Knowledge. Cambridge, Mass.: MIT Press.
Coates, Jennifer. 1983. The Semantics of Modal Auxiliaries. London: Routledge and Kegan Paul.
Dagneaux, Estelle, Sharon Denness, Sylviane Granger and Fanny Meunier. 1996. Error Tagging Manual Version 1.1. Centre for English Corpus Linguistics. Louvain-la-Neuve: Université catholique de Louvain.
Dagneaux, Estelle, Sharon Denness and Sylviane Granger. 1998. Computer-aided error analysis. System 26, 163-174.
Díez-Bedmar, Maria Belén and Szilvia Papp. 2008. The use of the English article system by Chinese and Spanish learners. In G. Gilquin, S. Papp and M. Díez-Bedmar, eds., Linking up contrastive and learner corpus research. Amsterdam: Rodopi, 147-173.
Eriksson, Andreas. 2012. MUCH: the Malmö University-Chalmers Corpus of academic writing. Poster presented at 10th International Teaching and Language Corpus Conference. Warsaw, Poland, 14 July 2012.
Flowerdew, Lynne. 2004. The argument for using English specialized corpora to understand academic and professional language. In U. Connor & T. Upton, eds., Discourse in the Professions: Perspectives from Corpus Linguistics. Amsterdam: John Benjamins, 11-33.
Flowerdew, Lynne. 2012. Corpora and Language Education. Basingstoke, UK: Palgrave Macmillan.
Flowerdew, Lynne. forthcoming, 2014. Learner corpora and language for academic and specific purposes. In S. Granger et al. (eds.).
Frankenberg-Garcia, Ana, Lynne Flowerdew and Guy Aston (eds.). 2011. New Trends in Corpora and Language Learning. London: Continuum.
Gilquin, Gaëtanelle and Magali Paquot. 2008. Too chatty: learner academic writing and register variation. English Text Construction 1.1, 41-61.
Granger, Sylviane. 1996. From CA to CIA and back: an integrated approach to computerized and bilingual corpora. In K. Aijmer, B. Altenberg and S. Johansson (eds.), Languages in Contrast. Lund: Lund University Press, 37-51.
Granger, Sylviane. 2002. A bird’s eye view of learner corpus research. In S. Granger et al. (eds.), 3-33.
Granger, Sylviane. 2009. The contribution of learner corpora to second language acquisition and foreign language teaching: a critical evaluation. In K. Aijmer (ed.), 13-32.
Granger, Sylviane. 2013. Contrastive interlanguage analysis: A reappraisal. Plenary speech delivered at the Second Learner Corpus Research Conference. Bergen, Norway, 27 September 2013.
Granger, Sylviane, Gaëtanelle Gilquin and Fanny Meunier (eds.). forthcoming, 2014. Cambridge Handbook of Learner Corpus Research. Cambridge: Cambridge University Press.
Granger, Sylviane, Joseph Hung and Sylvie Petch-Tyson (eds.). 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. Amsterdam: John Benjamins.
Granger, Sylviane and Magali Paquot. 2013. Language for specific purposes learner corpora. In C. Chapelle, ed., The Encyclopedia of Applied Linguistics. Oxford, UK: Wiley-Blackwell, 3142-3146.
Groom, Nicholas. 2009. Effects of second language immersion on second language collocational development. In A. Barfield and H. Gyllstad (eds.), Researching Collocations in another Language. Basingstoke, UK: Palgrave Macmillan, 21-33.
Hatzitheodorou, Anna-Maria and Marina Mattheoudakis. 2011. The impact of culture on the use of stance exponents as persuasive devices: The case of GRICLE and English native-speaker corpora. In A. Frankenberg-Garcia et al. (eds.), 229-246.
Hewings, Martin and Ann Hewings. 2002. “It is interesting to note that …”: a comparative study of anticipatory ‘it’ in student and published writing. English for Specific Purposes 21.4, 367-383.
Holmes, Janet. 1983. Speaking English with the appropriate degree of conviction. In C. Brumfit, ed., Learning and Teaching Languages for Communication. Applied Linguistic Perspective. London: Center for Language Teaching and Research, 100-113.
Hong, Huaqing. 2012. Compilation and exploration of ICCI corpus for learner language research. In Tono et al. (eds.), 47-61.
Hyland, Ken. 1999. Disciplinary discourses: writer stance in research articles. In C. Candlin and K. Hyland (eds.), Writing: Texts, Processes and Practices. London: Longman, 99-121.
Hyland, Ken. 2005. Metadiscourse. London: Continuum.
Hyland, Ken and John Milton. 1997. Qualification and certainty in L1 and L2 students’ writing. Journal of Second Language Writing 6.2, 183-205.
Jarvis, Scott. 2000. Methodological rigor in the study of transfer: identifying the L1 influence in the interlanguage lexicon. Language Learning 50.2, 245-309.
Johansson, Stig. 1991. Times change, and so do corpora. In K. Aijmer and B. Altenberg (eds.), English Corpus Linguistics. London: Longman, 305-314.
Lee, David and Sylvia Chen. 2009. Making a bigger deal of the smaller words: function words and other key items in research writing by Chinese learners. Journal of Second Language Writing 18.4, 281-96.
Leńko-Szymańska, Agnieszka. 2012. The role of conventionalized language in the acquisition and use of articles by Polish EFL learners. In Y. Tono et al. (eds.), 83-103.
Master, Peter. 1997. The English article system: acquisition, function and pedagogy. System 25, 215-232.
Mattheoudakis, Marina and Anna-Maria Hatzitheodorou. 2012. The lexical patterning of light verbs in GRICLE and native corpora: a comparative corpus-based study. In J. Thomas and A. Boulton, eds., Input, Process and Product. Developments in Teaching and Language Corpora. Masaryk University Press: Brno, Czech Republic, 213-228.
McEnery, Tony and Nazareth Kifle. 2002. Epistemic modality in argumentative essays of second-language writers. In J. Flowerdew (ed.), Academic Discourse. London: Longman, 182-95.
McEnery, Tony and Andrew Wilson. 2001. Corpus Linguistics. 2nd edn. Edinburgh: Edinburgh University Press.
McEnery, Tony, Richard Xiao and Yukio Tono. 2006. Corpus-Based Language Studies. London: Routledge.
Meunier, Fanny, Sylvie De Cock, Gaëtanelle Gilquin and Magali Paquot (eds.). 2011. A Taste for Corpora. In Honour of Sylviane Granger. Amsterdam: John Benjamins.
Meunier, Fanny and Damien Littré. 2013. Tracking learners’ progress: adopting a dual ‘corpus cum experimental data’ approach. The Modern Language Journal 97, 61-76.
Narita, Masumi. 2013. The use of articles in Japanese EFL learners’ essays. In S. Granger, G. Gilquin and F. Meunier, eds., Twenty Years of Learner Corpus Research: Looking back, moving ahead. Louvain-la-Neuve: Presses universitaires de Louvain, 357-366.
Neff, JoAnn, Emma Dafouz, Honesto Herrera, Francisco Martinez and Juan Rica. 2003. Contrasting learner corpora: the use of modal and reporting verbs in the expression of writer stance. In S. Granger and S. Petch-Tyson, eds., Extending the Scope of Corpus-based Research. Amsterdam: Rodopi, 211-230.
Nesselhauf, Nadia. 2005. Collocations in a Learner Corpus. Amsterdam: John Benjamins.
Paquot, Magali. 2010. Academic Vocabulary in Learner Writing. London: Continuum.
Paquot, Magali. 2013. Lexical bundles and L1 transfer effects. International Journal of Corpus Linguistics 18.3, 391-417.
Popper, Karl. 1963. Conjectures and Refutations: the Growth of Scientific Knowledge. London: Routledge and Kegan Paul.
Quirk, Randolph, Stanley Greenbaum, Geoffrey Leech and Jan Svartvik. 1985. A Comprehensive Grammar of the English Language. London: Longman.
Römer, Ute. 2004. A corpus-driven approach to modal auxiliaries and their didactics. In J. McH. Sinclair, ed., How to use Corpora in Language Teaching. Amsterdam: John Benjamins.
Tognini Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.
Tono, Yukio, Yuji Kawaguchi and Makoto Minegishi (eds.). 2012. Developmental and Crosslinguistic Perspectives in Learner Corpus Research. Amsterdam: John Benjamins.
Wharton, Sue. 2012. Epistemological and interpersonal stance in a data description task: Findings from a discipline-specific learner corpus. English for Specific Purposes 31, 261-270.
Widdowson, Henry. 2003. Defining Issues in English Language Teaching. Oxford: Oxford University Press.

Center for Language Education (formerly)
Hong Kong University of Science and Technology
Clear Water Bay Road, Hong Kong
[email protected]
Received: November 15, 2013 Reviewed: December 10, 2013 Accepted: July 11, 2014