CORPUS APPROACHES TO WE

Corpus Approaches to World Englishes: A Bird’s Eye View

Sandra C. Deshors
Michigan State University

Tobias Bernaisch
Justus Liebig University Giessen

General Introduction to Corpus Approaches to WE Research

Over the past fifty years or so, corpora (singular: corpus) have gradually been integrated into the field of World Englishes (WE) research and have increasingly contributed to its on-going development. As collections of naturally occurring spoken or written texts in electronic format, corpora carry much valuable information on how language (in our case, English) is being used across its varieties worldwide. Because they are compiled in a principled manner, corpora provide large, carefully structured datasets. More specifically, corpora are designed according to specific sets of criteria determined by their purpose and scope. Such criteria can include, for instance, different types of texts, different language varieties, or specific domains of language use such as conversation or journalistic writing. Further, corpora are assembled systematically to investigate one or more (linguistic) phenomena. As will become clear throughout the present chapter, corpus-based research in WE has gradually reached beyond simply exploring world Englishes using corpus data. Instead, it has undergone a significant methodological and statistical turn in the way corpora are utilized, and the research community has witnessed the development of state-of-the-art methodological tools that have allowed for optimal exploration of corpus data. As a result of this methodological revolution, not only have WE researchers managed to gain a better understanding of the linguistic forces that drive the development of Englishes worldwide, but corpus-based WE research has also become a sub-field in its own right within WE research.

As Mair (2017) recently noted, “there is a specific corpus-linguistic ‘take’ on World Englishes, and this is why corpora are also important for advancing the theoretical debate in the field” (p. 104). In what follows, we trace the development of corpus-based approaches to world Englishes from their beginnings to their emergence as a sub-field in its own right within WE research. We begin by briefly describing the general stepping stones in the development of corpora of world Englishes as well as key methodological tools and analytical approaches. We then move on to provide an assessment of the strengths and limitations of those approaches, and we offer pointers for the selection of appropriate corpora, corpus tools, and methodological design. Finally, we discuss future directions for corpus-based approaches to WE and offer suggestions to ensure that corpus-based approaches continue to play an active role in WE research.

Background and Methodological Principles

While corpus linguistics has today become a vibrant part of WE research, the activity currently generated by scholars in this area is the result of approximately half a century of corpus compilation and of the development of appropriate tools that can handle corpus data effectively and get the best out of them. Below, we present a bird’s eye view of the main corpora of world Englishes that are available today, explain how those corpora are structured as datasets, and discuss the type of data they represent and how they can help researchers address questions of a qualitative as well as a quantitative nature.

History of corpora as a source of data. The compilation of corpus data to investigate English varieties goes back as far as 1964 with Francis and Kučera’s Standard Corpus of Present-Day Edited American English, also more commonly known as the BROWN Corpus. Back then, the BROWN corpus laid the foundation stones for a large number of corpora in WE for decades to come. The first of its kind, the BROWN corpus is characterized by its size of 1 million words, its exclusive focus on written English, the inclusion of sample texts of approximately 2,000 words each, and a wide range of represented genres (press, academic writing, fiction, etc.). At the time, corpus compilers readily adopted these attributes to develop other datasets of native English such as the Lancaster-Oslo/Bergen (LOB) Corpus for British English, the Australian Corpus of English, and the Wellington Corpus of Written New Zealand English. In addition, the BROWN design was also used to compile the first corpora of English-as-a-second language (ESL; aka outer-circle) varieties such as the Kolhapur Corpus of Indian English. The data available in the extended BROWN family1 of corpora provided unparalleled empirical resources that have allowed researchers to address up to then unexplored questions relating to the regional diversity of world Englishes.2

A second landmark in corpus development for WE research is the launch, in the early 1990s, of Greenbaum’s (1988) international computerized corpus of English, known as the International Corpus of English (ICE). To date, ICE includes 27 national components (most of them complete, but some still in progress) across native and ESL varieties, spanning Africa (e.g. Ghana, Nigeria, South Africa), the Americas (e.g. Bahamas, Canada, USA), Asia (e.g. India, Singapore, Sri Lanka) as well as Australia and the
South Pacific (e.g. Fiji, New Zealand). Mirroring the BROWN design, individual ICE components include 1 million words from various genres in the form of 500 samples of 2,000 words each. Unlike BROWN, though, ICE includes spoken language (60% of its overall content) in the form of monologic (e.g. legal presentations, broadcast news) and dialogic data (e.g. face-to-face conversations, parliamentary debates). Typically, speakers and writers sampled in ICE are at least 18 years of age and educated through English up to a minimum of secondary schooling. Over the past twenty years, ICE has become one of the go-to corpora, if not the go-to corpus, in corpus-based WE research (Nelson & Ozón, 2018).3

A third landmark in the establishment of corpus linguistics as an integral part of WE research is the compilation and release of the Global Web-based English Corpus (GloWbE; Davies & Fuchs, 2015), which marks an important step towards developing mega-corpora in WE (see also the continuously updated 5-billion-word News on the Web (NOW) corpus; https://corpus.byu.edu/now/). With its 1.9 billion words extracted exclusively from online sources (such as informal blogs, newspapers, and company websites), GloWbE is organized around six native English varieties and 14 ESL varieties. While the sheer size of the corpus was of course welcomed by many scholars (because it makes investigating low-frequency linguistic items possible), such a large corpus of online English comes at the cost of reliable and extensive sociobiographic speaker information, which is nearly impossible to obtain from online sources. Further, in this context, and in order to ensure that the field develops in the most rigorous manner, corpus analysts in the 21st century now face emerging issues related to the reliability of corpus findings and how certain corpus linguists can be
that variety-specific GloWbE findings are trustworthy when the location of online texts fed into GloWbE is estimated by Google.4 In other words, the inability to be specific with the location of online texts makes it difficult for corpus linguists to isolate individual varieties.

Digging into corpus data: What tools are available? As corpora became available and corpus-based WE research started to bloom, software was developed to facilitate the extraction and automatic handling of corpus data. While a large number of software packages have been developed since the 1960s (see McEnery & Hardie, 2012, for a historical overview of different generations of corpus-linguistic software), here we focus on two of the most popular packages amongst corpus linguists, the freeware AntConc (Anthony, 2017; http://www.laurenceanthony.net/software/antconc/) and the commercially licensed WordSmith Tools (Scott, 2017; http://www.lexically.net/wordsmith/). Despite the usefulness of ready-made corpus-linguistic software of this type, corpus linguists are increasingly asking research questions that are beyond the scope of such software (e.g. questions that require analysts to assess the simultaneous influence of several variables on English-as-a-foreign-language (EFL) and ESL speakers’ linguistic choices; see the fourth sample study in Bernaisch and Deshors (this volume) for an example of a study that includes several variables). For this reason, more and more corpus users turn to programming languages such as Perl, Python, or R for more flexible or complex data extraction than ready-made corpus software provides. Still, AntConc and WordSmith Tools remain the most widely used and user-friendly packages, with easy-to-navigate interfaces that enable linguists to dig into
corpora and extract information related to the frequency of use or the linguistic context of individual lexical items, word classes via part-of-speech tags, and syntactic patterns. Slight differences between the two software packages exist. For instance, AntConc allows analysts to search corpora using regular expressions, and WordSmith facilitates data annotation in the software package itself. However, AntConc and WordSmith share a number of key functions corpus linguists often resort to, such as word lists, keyword lists, and concordances. Zooming in on those functions individually, word lists (or frequency lists) show, for every word attested in the corpus, the frequency with which that word occurs. In word lists sorted by frequency of occurrence, a relatively stable set of function words (articles, prepositions, etc.) usually tops the list, masking the semantic value of a given corpus, since lower-frequency content words tend to occur at lower positions on a word list. So even though ready-made software comes with easy-to-use functions, analysts should use those functions carefully.5 With keyword lists, though, one can avoid this problem by comparing the frequency of a given word in a particular (e.g. EFL/ESL) corpus to the frequency of the same word in a reference corpus. As Mukherjee and Bernaisch (2015) demonstrate, this type of function is useful for capturing cultural keywords in world Englishes, in their case in South Asia. The most widely used function in AntConc and WordSmith is probably concordancing. A concordance as in Figure 1 shows a keyword in context (KWIC), in this case the word itself, together with its immediate co-text to the left and to the right in ICE-India.6 In the column ‘Hit’, each match is numbered consecutively, and the individual text file in which a match was found is shown in the column ‘File’.7 Regardless of whether you look for a single word as in Figure 1 or a more complex string of words,
concordancers merely match the search term you specify with words in the corpus texts you loaded into the software and then show all matches, and their total number, in a tabulated format. Data extracted in this fashion are typically then imported into spreadsheet software so that each match can be annotated for a number of relevant variables, which is why concordances are central prerequisites for in-depth qualitative and quantitative corpus-based analyses but usually not reportable results in their own right.
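To make the three functions described above concrete, the following Python sketch builds a frequency-sorted word list, computes a keyness score, and prints KWIC lines. It is a minimal illustration and not how AntConc or WordSmith Tools are implemented: the crude regular-expression tokenizer, the function names, the two-cell log-likelihood statistic chosen as the keyness measure, and the sample sentence are all assumptions introduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_list(tokens):
    """Return (word, frequency) pairs sorted from most to least frequent."""
    return Counter(tokens).most_common()


def keyness(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of one word: target corpus vs. reference corpus.

    Assumption: the two-cell log-likelihood statistic stands in here as one
    common keyness measure; the chapter does not prescribe a statistic.
    """
    total = freq_target + freq_ref
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    score = 0.0
    for observed, expected in ((freq_target, expected_target), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def kwic(tokens, node, window=3):
    """Return (left co-text, node, right co-text) for every match of `node`."""
    hits = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Hypothetical sample text, invented for illustration only.
sample = "The committee itself decided that the report itself was final"
tokens = tokenize(sample)
for hit_number, (left, node, right) in enumerate(kwic(tokens, "itself", window=2), start=1):
    print(f"{hit_number}\t{left:>15} [{node}] {right}")
```

Even in this one-sentence sample, the function word "the" heads the word list, which illustrates why raw frequency lists say little about content and why a comparison against a reference corpus is needed. As in the workflow described above, the tuples returned by `kwic` would still have to be exported and annotated before any analysis proper.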

Figure 1: A concordance of itself in ICE-India in AntConc 3.4.4.

Analytical approaches to corpus data: qualitative vs./and quantitative accounts. Despite the fact that quantitative corpus-linguistic research in WE has grown significantly over the past few years and that quantitative approaches have started to dominate WE research, corpus-based qualitative approaches should not be underestimated. For instance, work by Rosen (2016) and Mair (2018) shows the usefulness of qualitative analyses to better understand the sociolinguistic forces at play
behind the development of individual varieties of world Englishes. In the case of Rosen (2016), qualitative analysis allows her to capture the emergence and development of Norman-French-influenced linguistic innovation in spoken Jersey English, and she shows how this approach helps to (i) distinguish between innovations emerging in a new variety and so-called errors in EFL on a purely linguistic level, (ii) pinpoint social factors that influence the use and probable fates of innovations in Jersey English, and (iii) develop a typology of innovations according to their developments. In the case of Mair (2018), the study focuses on globalization and the changing role of English in Germany and examines the diverse types of English-German language mixing which can be observed among two less prestigious and often marginalised groups (i.e. followers of urban youth cultural movements and the socially disparate group of recent arrivals in Germany from “Anglophone” West Africa). Mair’s (2018) qualitative approach allows him to show how “English has been firmly integrated into the increasingly multilingual languagescape of contemporary Germany [which, in turn,] has integrated Germany more closely into several fast-evolving transnational languagescapes shaping our globalising world” (p. 72).

Overall, however, it is fair to say that the two methodological approaches, quantitative and qualitative, should be seen as complementing each other, since qualitative approaches that scrutinize the linguistic context in which features are used can play a crucial role in developing fine-grained annotation schemes for the subsequent quantitative treatment of those features. Put differently, qualitative studies are cornerstones for the sophisticated quantitative techniques that are currently being developed in the field, and state-of-the-art
multifactorial statistical methods increasingly investigate linguistic phenomena that are annotated using taxonomies developed as a result of fine-grained qualitative research. However, an attractive aspect of quantitative corpus approaches is that they allow researchers to assess how systematic observed linguistic patterns are in the data. As Gries (2009) rightly pointed out, “corpora are (usually text) files and all you can get out of such files is distributional (or quantitative/statistical) information” (p. 3). Once a linguistic pattern has been observed qualitatively, a quantitative approach can therefore help the researcher assess whether the use of that pattern is attested within a given English variety, and if so, in what proportion. As in other corpus-informed fields in linguistics, what is striking about quantitative work in world Englishes is the maturation of the various techniques that have been developed in the last decade, and particularly within the past five years. Although corpora only provide distributional information, scholars have recently invested much creativity and research effort in developing quantitative methods that bring the best out of those distributions to better understand how the linguistic structure of world Englishes varies systematically across individual Englishes.

With regard to the development of quantitative methods, one can identify different types of approaches over the last 25 years: (i) simple frequency comparisons, without complex statistical techniques, of a comparatively small number of features in a small number of varieties, (ii) Multidimensional Analysis (MDA), an exploratory (i.e. hypothesis-generating) approach involving many linguistic features condensed into a smaller number of dimensions, and statistically sophisticated approaches such as (iii) classification/grouping/cluster analysis, also exploratory in nature and involving large
numbers of features in large numbers of English varieties to find relatively homogeneous groups of varieties distinct from other groups of varieties, and (iv) multifactorial logistic regression modeling, a confirmatory (i.e. hypothesis-validating) approach to predict the linguistic choices of speakers of different world Englishes. While space does not allow us to describe all four types of approaches, we refer the reader to study boxes 5.2 and 5.3 in Bernaisch and Deshors (this volume) for an illustration of MDA and classification analysis. In what follows, we focus on contrasting the simple frequency approaches with the logistic regression approach as a way of demonstrating how the methodological shift from monofactorial analysis (i.e. analysis based on simple frequency counts of isolated linguistic features) to the now commonly adopted multifactorial analysis (i.e. analysis based on a simultaneous treatment of several linguistic features) has helped unveil, more than ever before, the complexity of world Englishes as linguistic systems in their own right.8

Traditionally, frequency-based approaches focus exclusively on exploring isolated linguistic features on the basis of their normalized frequency of occurrence across corpora. Researchers thus assess whether features such as modal verbs or progressive marking are used more or less frequently by EFL/ESL speakers than by native speakers. This approach comes with the computation of basic descriptive statistics to ensure that observed differences across world Englishes are linguistically meaningful. Overall, for the past 25 years, this approach has generated a multitude of studies, and as Deshors (2016) points out, “[t]he literature shows that this methodological approach has led to insightful descriptive accounts of distributional differences of linguistic items in native
and learner language” (p. 64). However, a main limitation is that it does not account for the linguistic context in which the investigated items are used (at least quantitatively speaking). As demonstrated in Gries and Deshors (2014), analyses that do control for the linguistic contexts of use and that account for what a native speaker would say in the same linguistic context remain extremely rare. Importantly, in the specific case of may vs. can in interlanguage, which Gries and Deshors (2014) analyze based on 12 semantic and morphosyntactic predictors from the linguistic context of use of the two modals, it emerges that with may, negation is inversely correlated with the use of the modal. So the fact that learners use may 10% less often than native speakers says nothing about learners’ ability to use the modals in a native-like way; rather, the 10% difference is due to learners’ use of negations. Methodologically, the recognition that linguistic contexts should be an integral part of a quantitative corpus analysis recently led to the development of a state-of-the-art statistical approach called MuPDAR (Gries & Deshors, 2014), which quickly established itself as a methodological asset of contemporary quantitative corpus-based WE research. Briefly, MuPDAR is a two-step logistic regression approach (referred to as R1 and R2 in Figure 2) that allows researchers to explore WE data by asking: What would a native speaker do in the exact situation the learner/second-language user is in? and How do native speaker choices differ from what the EFL/ESL user did? Crucially, MuPDAR allows researchers to address those questions based on a large number of contextual linguistic factors and their possible mutual influence on the use of a linguistic item.9 Figure 2 summarizes the approach step by step in technical terms. While space restrictions do not allow us to offer a detailed commentary on the figure, it is clear that with MuPDAR the research community has
reached an unprecedented level of sophistication in the treatment of corpus data in WE. Illustrations of the approach can be found in Gries and Bernaisch (2016) where MuPDAR is used to capture the epicenter of English in South Asia and in Gries, Bernaisch, and Heller (2018) for an application to diachronic data of Singapore English. Overall, the shift towards multifactorial analyses is a very exciting development for corpus linguists in WE who are now able to predict—rather than describe—the language of world Englishes speakers and who are able to do so with an unprecedented degree of granularity and precision. Finally, looking at Figure 2, one can only be amazed at how fast quantitative methods in corpus-based WE have matured since the launch of the ICE corpus in the late 1980s.
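The two-step logic of MuPDAR described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the procedure as published: it uses a single invented predictor (negation) instead of a dozen annotated ones, a hand-rolled logistic regression fitted by gradient descent, fabricated toy counts, and a simple binary coding of the deviation (did the learner make the choice that R1 predicts for a native speaker?). The steps of checking how well R1 fits before applying it, and of modeling interactions, are omitted.

```python
import math


def predict_proba(weights, features):
    """Probability of outcome 1 under a logistic model (weights[0] is the intercept)."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, learning_rate=0.5, epochs=5000):
    """Fit a logistic regression by plain batch gradient descent."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, outcome in zip(X, y):
            error = predict_proba(weights, features) - outcome
            gradient[0] += error
            for j, x in enumerate(features):
                gradient[j + 1] += error * x
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, gradient)]
    return weights


# Toy data, invented for illustration: one predictor (1 = negated clause),
# outcome 1 = "may", 0 = "can". Native speakers (ns) and learners (nns).
ns_X = [[0]] * 10 + [[1]] * 10
ns_y = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
nns_X = [[0]] * 10 + [[1]] * 10
nns_y = [1] * 9 + [0] * 1 + [1] * 6 + [0] * 4

# R1: model the native-speaker choices.
r1 = fit_logistic(ns_X, ns_y)

# Apply R1 to the learner contexts: what would a native speaker have chosen here?
ns_like_choice = [1 if predict_proba(r1, x) >= 0.5 else 0 for x in nns_X]

# Deviation: 1 if the learner made the choice R1 predicts, 0 otherwise.
nativelike = [int(actual == predicted) for actual, predicted in zip(nns_y, ns_like_choice)]

# R2: model where learners deviate, using the same contextual predictor.
r2 = fit_logistic(nns_X, nativelike)

print("R1 weights (intercept, negation):", r1)
print("share of nativelike learner choices:", sum(nativelike) / len(nativelike))
print("R2 weights (intercept, negation):", r2)
```

In this invented dataset the learners overuse may in negated clauses, so the negation coefficient in R2 comes out negative: negated contexts are where learner choices depart from what R1 predicts. That is the kind of context-sensitive finding a raw comparison of overall frequencies cannot deliver.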

Figure 2. Flowchart of the MuPDAR approach (borrowed from Gries & Deshors, 2014)

Strengths and Weaknesses of Corpus-based Approaches to WE

Strengths. The benefits of adopting corpus-based approaches to WE are numerous. First, at the general level of project management and data collection/availability, those approaches allow researchers to explore a large amount of data very quickly and thereby retrieve much linguistic information
about particular linguistic items in a matter of minutes. Furthermore, given the availability of a large number of corpora to the public, researchers do not necessarily need to dedicate considerable time and effort to collecting data, and provided that their research questions are compatible with the use of already compiled corpus data, they are able to immediately focus on extracting and analyzing data from their chosen corpora. A second advantage of corpus-based approaches is, to use Fillmore’s (1992) words, that they “[make] it possible for linguists to get the facts right” (p. 3) and thus draw a realistic (in the sense of authentic) picture of what it means for speakers of WE to use particular linguistic items, which, in order to be captured, require researchers to scan through large amounts of data. Ultimately, while the availability of corpus data encourages the study of aspects of world Englishes that would otherwise have received less attention in other frameworks (Mair, 2013), corpus approaches to WE allow for research foci centered on capturing emerging linguistic patterns likely to occur in very small numbers. In addition, in the specific case of quantitative corpus-based approaches to WE, this type of approach can play an important role in theorizing the development of Englishes world-wide (Mair, 2013; McEnery & Hardie, 2012; see also Deshors, 2018, for a collection of corpus-based studies assessing the validity of existing theoretical models of WE for 21st century Englishes). Another strength of corpus approaches in WE research is that their usefulness is not restricted to a single linguistic level. Indeed, these approaches have been demonstrated to be equal assets to the WE research community in studies with a phonological (e.g. Gut & Fuchs, 2017), pragmatic (e.g. Barron, 2017) or semantic focus (e.g. Werner & Mukherjee, 2012 or Deshors, 2017). Morpho-syntax is the structural level that has benefitted the most from
the development and application of increasingly sophisticated methods to explore, amongst other linguistic phenomena, alternating syntactic constructions such as the dative and genitive alternations. Another strength of corpus-based approaches to WE relates to the type of research questions that the recent development of sophisticated corpus methods and statistical techniques has enabled researchers to explore. A good example is Szmrecsanyi and Kortmann (2011), who, with a clustering technique, are able to ask whether varieties of English exhibit different complexity profiles, such that different sets of varieties tend to exhibit higher or lower degrees of morpho-syntactic complexity, and how such structural complexity can be measured in the first place; and also, as raised by Kortmann (2010, p. 400), whether the observed similarities and differences across Englishes around the world are best accounted for in terms of geography or in terms of the type of variety they contribute to. In the same spirit, adopting regression-based approaches has allowed scholars to shift their analytical focus from describing world Englishes (i.e. types of nation-bound English varieties) to predicting the linguistic choices of individual English speakers worldwide. In that respect, corpus approaches, as versatile approaches with an array of state-of-the-art methodological tools, are beginning to allow scholars the freedom to explore WE data using various methodologies compatible with theoretical frameworks in sociolinguistics, typology, and cognitive linguistics. As a result, corpus linguists in WE are able to play a crucial part in the on-going development of WE as an academic discipline, as empirical findings can contribute to WE theory.

Weaknesses. In spite of the above-described advantages, corpus approaches can present a number of disadvantages when applied to WE data. The first disadvantage relates to the exploration of nativized structures, i.e. markedly regional linguistic structures that have evolved in a complex scenario of language contact. By nature, these structures are bound to be used relatively infrequently when they first emerge, which makes their identification difficult in both qualitative and quantitative analyses and, in the latter case, makes their status as nativized structures hard to confirm statistically. Therefore, generalization with regard to this type of structure is hard, even though studies may be based on large corpora. A second disadvantage concerns the availability of detailed metadata. In contrast to ethnographic data, for example, recently compiled corpora such as GloWbE fail to include a comparable amount of sociolinguistic information about speakers and/or writers. This situation can be quite problematic, as it potentially restricts researchers in the type of study they can conduct using a corpus such as GloWbE and also limits their studies to a linguistic focus rather than a sociolinguistic one. A third disadvantage is the amount of manual annotation that corpus studies require, which is a time-consuming process. This is an important factor to keep in mind, as corpus studies are increasingly becoming multifactorial in nature, which means that a single linguistic phenomenon can easily be annotated for a dozen variables or so, leading to several thousands of data points to encode. In this context, one can understand why quantitative corpus studies of world Englishes have primarily favored linguistic foci on morphosyntax rather than semantic foci, which would incur more complex and therefore even more time-consuming annotations as well as annotations of a possibly more subjective nature. At this point, extensive analyses of spoken world Englishes remain relatively rare
given the scarcity of large-scale spoken corpora due to the intricacies of phonological transcription. In the same spirit, the current unavailability of corpus resources for many ESLs at this point in time prevents all varieties of world Englishes from being equally compared against one another. This view applies specifically to the case of EFL and ESL varieties. As different types of English varieties, EFL and ESL so far have generally been explored independently of one another due to the lack of comparable large-scale corpora (generally, EFLs have been explored based on the International Corpus of Learner English, ICLE, and ESL varieties have mainly been explored based on ICE). This has made the comparisons of research across EFL and ESL very difficult given that ICLE and ICE include different types of data (ICLE is entirely based on argumentative essays, and ICE includes a large variety of genres). Although a number of studies have contrasted EFL and ESL using the above two corpora, such studies have had to be limited in scope by restricting their analysis to student writing. Despite Edwards’ (2017) recent efforts to address this issue by designing and compiling the first corpus of EFL, the Corpus of Dutch English, that follows the ICE architecture, we still have a long way to go before we are able to conduct large-scale fully comparable corpus studies that cover the entire ENL, ESL, and EFL spectrum.

Pointers for Practical Applications

Corpora as a source of data: How do you select an appropriate WE corpus? What criteria would you use? Not all corpora are suitable for all studies. The choice of corpus for a given study depends very much on the nature of the study and the specific
research questions that the researchers are asking. That is mainly because, as explained at the start of the present chapter, corpora differ in scope, architecture, size, and annotation (i.e. whether they are annotated and, if so, what kind of annotation they include: phonetic, part-of-speech tagging, syntactic parsing, …). So, for example, a smaller specialized corpus may be more suitable for close readings and qualitative analyses, whereas a larger corpus will be more useful for quantitative studies that require concordancing. Also, when choosing a corpus, paying attention to its degree of internal homogeneity (i.e. to what extent the corpus exhibits internal variation), which can be assessed through statistical testing, will help decide the suitability of that corpus for a particular study (McEnery & Hardie, 2012, p. 3).
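One way to read the advice on internal homogeneity is to compare how often a feature occurs in the different parts of a corpus. The sketch below normalizes raw counts to occurrences per million words and computes a Pearson chi-square statistic across corpus parts. The counts, part sizes, and function names are invented for illustration, and the chi-square test is only one possible choice (an assumption here, since no particular test is named above). It also treats tokens as independent observations, which corpus data rarely are, so the result is best taken as a rough indicator.

```python
def per_million(hits, corpus_size):
    """Normalize a raw frequency to occurrences per million words."""
    return hits * 1_000_000 / corpus_size


def chi_square_across_parts(hits, sizes):
    """Pearson chi-square for a (feature vs. all other tokens) x (corpus part) table.

    `hits[i]` is the raw count of the feature in part i, `sizes[i]` the number
    of tokens in part i. Degrees of freedom: number of parts minus one.
    """
    total_hits = sum(hits)
    total_size = sum(sizes)
    chi2 = 0.0
    for part_hits, part_size in zip(hits, sizes):
        expected_hits = part_size * total_hits / total_size
        expected_rest = part_size - expected_hits
        chi2 += (part_hits - expected_hits) ** 2 / expected_hits
        chi2 += ((part_size - part_hits) - expected_rest) ** 2 / expected_rest
    return chi2


# Hypothetical counts of one feature in two equally sized corpus parts.
hits = [80, 20]
sizes = [100_000, 100_000]

for part, (h, n) in enumerate(zip(hits, sizes), start=1):
    print(f"part {part}: {per_million(h, n):.1f} per million words")

chi2 = chi_square_across_parts(hits, sizes)
# 3.841 is the critical value for p = .05 at one degree of freedom.
print(f"chi-square = {chi2:.2f}; heterogeneous at p < .05: {chi2 > 3.841}")
```

With more than two parts, the statistic has to be compared against the critical value for the corresponding degrees of freedom rather than 3.841.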

Digging into corpus data: How do you decide what tool to use? To some extent, the choice of tool depends on the corpus a researcher decides to use and on the research questions. For instance, whether or not the corpus is parsed for clause elements will influence the researcher’s choice of extraction and concordance tool. That is, a researcher exploring a syntactic pattern in ICE-GB would need to use ICECUP to facilitate his or her data extraction from such a structure. Despite the availability of user-friendly corpus tools, in recent years the field has witnessed an increase in the development and use of open-source software based on programming languages such as R to extract, process, and visualize corpus data. Although writing one’s own script to retrieve and evaluate data requires substantial learning on the part of the researcher, scholars increasingly choose this option over the use of more user-friendly tools in order to have more control over their research project as a whole, not least by using one single tool for every stage of their project. In addition, researchers are then in a position to make their
script available to their research community, thereby ensuring that their work is transparent, accessible, and replicable.

Analytical approaches to corpus data: Why would you choose to take a qualitative or a quantitative route? The choice between a qualitative and a quantitative approach is one that, again, very much depends on the analyst’s research question and the type of information that the study is designed to capture, and it often results in trade-off situations. For instance, although one can investigate a linguistic phenomenon both qualitatively and quantitatively, the former will most likely lead to more fine-grained analyses whose results, however, may not be easily generalizable (mainly because time constraints tend to limit analysts to the exploration of very small data samples). Conversely, quantitative approaches will lead to analyses that check for the systematicity of linguistic patterns across corpora, but possibly at the expense of granularity. Existing literature shows that researchers tend to find mixed approaches (i.e. the combination of a qualitative and a quantitative approach) to be an insightful option that allows for, say, a first identification of general tendencies in the data (quantitative analysis) and then, based on those first results, a more fine-grained examination of the tendencies in question (qualitative analysis).

Future Directions

Given the shape of corpus-based research in WE today, there are various ways in which the field can continue to grow. These generally involve (i) a more generalized and widespread use of sophisticated quantitative methods within the research community, (ii) a commitment from the research community to actively compile more corpora featuring diachronic world Englishes data, and (iii) a widening of the general research focus to address not only questions of a morpho-syntactic nature (as has primarily been the case so far), but also questions of a semantic and pragmatic nature.

Regarding (i), while we have shown in the current chapter how corpus linguistics has become a dynamic strand within WE research by systematically enhancing its methodological toolkit, one must admit that the latest methodological and statistical developments described in Section 2 tend to be adopted only by a small group of scholars. Therefore, in order for the field to benefit from methodological developments, it is crucial that recently developed state-of-the-art methodologies be more widely embraced by the majority of quantitative WE scholars and be tested on a much wider range of linguistic phenomena.

Regarding (ii), to date, the great majority of world Englishes corpora feature synchronic data, and there is only a handful of large-scale corpus projects on diachronic data. Although many studies of world Englishes have inferred diachronic developments from synchronic corpus analyses, Gries et al. (2018) show that this is imprudent: synchronic analyses in WE, which often compare a postcolonial English to its historical input variety, British English, tend to assign any variation in the data to developments in that postcolonial English, (sometimes falsely) assuming a decade-long structural stability of British English. Further, scholars such as Evans (2014) have started to question the validity of existing theoretical models of WE based on the fact that those models were not developed on a broad empirical diachronic basis. In this light, diachronic corpus projects on Hong Kong English (Biewer, Bernaisch, Heller, & Berger, 2014) and Singapore English (Hoffmann, Sand, & Tan, 2012) are most welcome and much needed additions to corpora in WE.

Finally, regarding (iii), considering the body of corpus-based research in world Englishes since the latest methodological developments, research questions tend to focus on morpho-syntactic and lexico-grammatical features, ideally with only two structural variants so that they can be modelled using a binary logistic regression approach. To some degree, this can be attributed to the relative ease of retrieving linguistic forms (i.e. lexical items and, if corpora are tagged for parts of speech, syntactic patterns) compared to retrieving semantic information from corpora that are not semantically annotated. This can explain why (multifactorial) studies aiming to address semantic and/or pragmatic questions remain rare today.¹⁰

In light of the various desiderata outlined, we encourage future research in WE that (i) adopts the state-of-the-art empirical methods outlined, (ii) explores the development of world Englishes and their linguistic structures with real-time diachronic data, and (iii) embraces semantic and pragmatic objects of investigation as a complement to the more traditionally researched phonetic, lexical, and syntactic variables. In this context and moving forward, it is important that corpus-based world Englishes studies are representative of all linguistic levels so as to draw a picture of linguistic structure that is as realistic and comprehensive as possible.
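To make concrete why features with exactly two structural variants lend themselves to binary logistic regression, the sketch below fits the simplest possible model: one binary outcome (for example, s-genitive vs. of-genitive) and one binary predictor (for example, variety A vs. variety B). This is an illustrative sketch in Python only. The counts are invented, the function name `fit_logit_one_binary_predictor` is hypothetical, and the multifactorial studies discussed in this chapter use many predictors and dedicated statistical software rather than hand-written code. The fit uses Newton-Raphson updates on grouped counts and is then compared with the closed-form solution (the log-odds in the reference group and the log odds ratio).

```python
import math


def fit_logit_one_binary_predictor(k0, n0, k1, n1, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson on grouped counts.

    k0 of n0 observations in group x=0 show the variant of interest;
    k1 of n1 observations in group x=1 do. Assumes 0 < k < n in both groups.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p0 = 1.0 / (1.0 + math.exp(-b0))
        p1 = 1.0 / (1.0 + math.exp(-(b0 + b1)))
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = (k0 - n0 * p0) + (k1 - n1 * p1)
        g1 = k1 - n1 * p1
        # Information matrix [[w0 + w1, w1], [w1, w1]] and its determinant.
        w0 = n0 * p0 * (1.0 - p0)
        w1 = n1 * p1 * (1.0 - p1)
        det = (w0 + w1) * w1 - w1 * w1
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (w1 * g0 - w1 * g1) / det
        b1 += (-w1 * g0 + (w0 + w1) * g1) / det
    return b0, b1


if __name__ == "__main__":
    # Invented counts: the variant occurs in 30 of 100 choice contexts in
    # variety A (x=0) and in 60 of 100 choice contexts in variety B (x=1).
    b0, b1 = fit_logit_one_binary_predictor(30, 100, 60, 100)
    log_odds_a = math.log(30 / 70)
    log_odds_ratio = math.log((60 / 40) / (30 / 70))
    print(f"intercept: fitted {b0:.4f}, closed form {log_odds_a:.4f}")
    print(f"slope:     fitted {b1:.4f}, closed form {log_odds_ratio:.4f}")
```

With more than two variants, or with semantic categories that first have to be annotated by hand, neither the outcome coding nor the data retrieval is this straightforward, which is consistent with the bias towards binary morpho-syntactic alternations noted above.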

End Notes

1. Although it may seem that BROWN family refers to all the datasets following the BROWN design, the term BROWN family has been reserved for BROWN and LOB as well as their diachronic complements.

2. See Sand (2013) for a (more) detailed list of WE corpora.

3. This section focuses on first- and second-language corpora for the study of world Englishes, but datasets such as ELFA (2008) or ICNALE (Ishikawa, 2014), the latter of which also makes the original sound files used for the transcripts of the spoken data available for phonetic analyses, provide empirical bases for the description of foreign-language varieties world-wide.

4. See English World-Wide 36(1) for a valuable assessment of GloWbE as a corpus-linguistic resource.

5. Still, word lists have the potential to unveil (possibly unanticipated) spelling variation in the dataset when their entries are sorted alphabetically, which may, however, be more relevant to historical studies of the English language.

6. Please note here that ‘keyword’ in ‘keyword in context (KWIC)’ is not to be understood as a word that occurs more frequently in one corpus than in another. When concordancing, ‘keyword’ is synonymous with ‘search term’.

7. For a detailed introduction to (concordancing in) AntConc, please visit http://www.laurenceanthony.net/software/antconc/ (accessed 21 September 2017).

8. See study boxes 5.1 and 5.4 in Bernaisch and Deshors (this volume) for an illustration of a study based on simple frequency counts and a study based on MuPDAR.

9. Note that sociolinguistic factors (such as sex, native language, etc.) can be integrated into multifactorial analyses, and their influence on the use of linguistic items can be explored alongside that of linguistic factors.

10. However, see Deshors (2017) for a recent exception that explores multifactorially the structure of subjectivity in Asian Englishes.

Timeline

Year: 1979
Reference: Nihalani, P., Tongue, R. K., Hosali, P., & Crowther, J. (1979). Indian and British English: A handbook of usage and pronunciation. Delhi: Oxford University Press.
Annotation: Nihalani et al.’s (1979) usage guide for Indian English is an example of variety descriptions pre-dating corpus studies in the narrow linguistic sense. Based on anecdotal rather than statistically verified accounts, phonetic, lexical, syntactic, and semantic particularities are anchored in a keyword and listed in alphabetical order. Usage guides represent invaluable starting points for detailed corpus analyses of the features they describe.
Topic: Pre-electronic corpus description

Year: 1990
Reference: Schmied, J. (1990). Corpus linguistics and non-native varieties of English. World Englishes, 9(3), 255–268.
Annotation: This article problematizes the adaptation of corpus compilation techniques developed for native Englishes to ESL contexts and highlights how ESL corpora can inform various applied and theoretical sub-branches in the WE paradigm including dictionary making. It suggests ways to compile small corpora in contexts where English texts are not sampled systematically.
Topic: Corpus compilation

Year: 1996
Reference: Greenbaum, S. (1996). Introducing ICE. In S. Greenbaum (Ed.), Comparing English worldwide: The International Corpus of English (pp. 3–12). Oxford, England: Clarendon.
Annotation: Greenbaum (1996) presents details about the International Corpus of English (ICE). The countries from which data is sampled are listed and relevant regional challenges discussed. Greenbaum (1996) also presents the core design of the corpus and outlines central speaker selection criteria. Finally, more technical principles of uniform text annotation and relevant software are also discussed.
Topic: Corpus compilation

Year: 1996
Reference: Bauer, L., & Holmes, J. (1996). Getting into a flap!: /t/ in New Zealand English. World Englishes, 15(1), 115–124.
Annotation: This paper uses ICE-New Zealand to analyze realizations of /t/ and their constraints in New Zealand English. /t/ variants are shown to be sensitive to speaker age; older New Zealand English users use intervocalic /t/ syllable-initially while younger speakers also use it at the end of a syllable.
Topic: Phonetic analysis

Year: 1996
Reference: Meyer, C. F. (1996). Coordinate structures in English. World Englishes, 15(1), 29–41.
Annotation: Meyer (1996) investigates formal and functional characteristics of coordination in the American and British ICE components. And is the most dominant coordinator followed by but and or. Implicit coordination is also examined and its peripheral role with regard to overall frequencies profiled. Meyer also empirically explores pragmatic coordination, where a construction or coordinator does not follow grammatical rules of coordination, but promotes cohesion in a text.
Topic: Syntactic analysis

Year: 1996
Reference: Schmied, J., & Hudson-Ettle, D. (1996). Analyzing the style of East African newspapers in English. World Englishes, 15(1), 103–113.
Annotation: Using the newspaper section of the East African component of ICE, the authors analyze the frequencies of obligatory, optional, and continuous ing-forms across different journalistic subgenres to capture stylistic differences. Attention is paid to caveats in text compilation for regional corpora since newspaper house styles may dictate structural choices and news agency reports produced by non-local writers could potentially threaten regional representativeness.
Topic: Stylistic analysis

Year: 2004
Reference: Haase, C. (2004). Conceptualization specifics in East African English: Quantitative arguments from the ICE-East Africa corpus. World Englishes, 23(2), 261–268.
Annotation: Haase (2004) uses ICE-East Africa to quantitatively and qualitatively study componential profiles of verbal constructions. The author finds a trend towards explicit structural reflections of cognitive components in informal styles, but cognitively explicit double conflations as in enter into in their metaphorical use (e.g. enter into an argument) also occur in formal settings. Consequently, cognitive analyses may need to take contextual settings into account.
Topic: Cognitive analysis

Year: 2004
Reference: Sand, A. (2004). Shared morpho-syntactic features in contact varieties of English: Article use. World Englishes, 23(2), 281–298.
Annotation: Frequencies of definite and indefinite articles are studied in three native Englishes, four ESLs and one EFL. Although article use is strongly influenced by the genre in which it occurs, substrate influence does not determine article use and ESL varieties share usage patterns distinct from the inner circle.
Topic: Morpho-syntactic analysis

Year: 2004
Reference: Schneider, E. W. (2004). How to trace structural nativization: Particle verbs in world Englishes. World Englishes, 23(2), 227–249.
Annotation: Using the East African, British, Indian, Philippine, and Singaporean ICE components, Schneider (2004) investigates particle verbs with regard to frequency, distribution across language modes, and productivity. With types and tokens, Singapore English displays the largest number of particle verbs while the other varieties show lower frequencies. Implicitly, this article shows how to address nativization using corpus-linguistic tools.
Topic: Dynamic model

Year: 2006
Reference: Mukherjee, J., & Hoffmann, S. (2006). Describing verb-complementational profiles of new Englishes: A pilot study of Indian English. English World-Wide, 27(2), 147–173.
Annotation: Mukherjee and Hoffmann (2006) use ICE-Great Britain, ICE-India, and a web-derived newspaper corpus of Indian English to study the verb-complementational patterns of the ditransitive verb GIVE (and SEND). Monotransitive GIVE is found to characterize Indian English more strongly than British English, and a number of verbs are used ditransitively in Indian English but not in its historical input variety.
Topic: Lexico-grammatical analysis

Year: 2009
Reference: Imm, T. S. (2009). Lexical borrowing from Chinese languages in Malaysian English. World Englishes, 28(4), 451–484.
Annotation: Imm (2009) identifies different types of contact-induced Chinese-based borrowings in a Malaysian English newspaper corpus including loanwords, compound blends, and loan translations that express certain Chinese concepts. Those concepts are grouped into semantic fields that reflect Malaysian English speakers’ motivation to integrate Chinese-based vocabulary into Malaysian English.
Topic: Lexical analysis

Year: 2009
Reference: Mukherjee, J., & Gries, S. Th. (2009). Collostructional nativisation in New Englishes: Verb-construction associations in the International Corpus of English. English World-Wide, 30(1), 27–51.
Annotation: Mukherjee & Gries (2009) explore intransitive, monotransitive, and ditransitive complementation patterns in Asian Englishes and how strongly individual verbs are attracted by one of the syntactic patterns. Their results reveal collostructional nativization, i.e. the diachronic change in preferences of individual verbs to occur with a specific syntactic pattern. Collostructional nativization is most prominent in Singapore English, the variety argued to be furthest structurally emancipated from British English.
Topic: Collostructional analysis

Year: 2009
Reference: Peters, P. (2009). Australian English as a regional epicentre. In T. Hoffmann & L. Siebers (Eds.), World Englishes – Problems, properties and prospects (pp. 107–124). Amsterdam, The Netherlands: John Benjamins.
Annotation: Peters (2009) investigates linguistic epicenters (i.e. norm-providing standards for a given region) and whether Australian English can be regarded as a linguistic epicentre for New Zealand English. Lexical analyses of regional, rural, and aboriginal vocabulary, morphological investigations of specific suffixes (e.g. -ie/-(e)y, -o, -aroo/-eroo) and syntactic studies (e.g. of negation) are compatible with an Australian English influence on New Zealand English.
Topic: Linguistic epicenters

Year: 2011
Reference: van Rooy, B. (2011). A principled distinction between error and conventionalized innovation in African Englishes. In J. Mukherjee & M. Hundt (Eds.), Exploring second-language varieties of English and learner Englishes: Bridging a paradigm gap (pp. 189–207). Amsterdam, The Netherlands: John Benjamins.
Annotation: This quantitative study traces the emergence of two linguistic innovations in Black South African English (i.e. the extension of the progressive aspect to stative verbs and the use of can be able to) and one innovation in East African English (enable + bare infinitive). Based on grammatical systematicity and acceptability, the three grammatical patterns are shown to have undergone a process of overextension from error to conventionalized innovation.
Topic: Errors vs. innovations

Year: 2012
Reference: Bolton, K. (2012). World Englishes and linguistic landscapes. World Englishes, 31(1), 30–33.
Annotation: This is the first study that introduces the use of English in urban public signage to the research agenda of WE. Bolton (2012) shows the relevance of exploring linguistic landscapes for WE research and how they complement coarse-grained studies of regional varieties of English with more detailed accounts representing stronger speaker/experiencer orientation.
Topic: Linguistic landscapes

Year: 2012
Reference: Werner, J., & Mukherjee, J. (2012). Highly polysemous verbs in New Englishes: A corpus-based study of Sri Lankan and Indian English. In S. Hoffmann, P. Rayson & G. Leech (Eds.), Corpus Linguistics: Looking back – moving forward (pp. 249–266). Amsterdam, The Netherlands: Rodopi.
Annotation: This is a quantitative corpus study that contrasts the frequencies of occurrence of the senses of give and take in written Indian and Sri Lankan English. Results show that polysemous verbs are used systematically differently across the two varieties and stress the importance of semantics as an important aspect of structural nativization.
Topic: Semantics

Year: 2013
Reference: De Cuypere, L. & Verbeke, S. (2013). Dative alternation in Indian English: A corpus-based analysis. World Englishes, 32(2), 169–184.
Annotation: This quantitative corpus study of the dative alternation with GIVE uses a mixed-effects logistic regression approach to explain the constructional choices of Indian English speakers based on 14 linguistic predictors. The results indicate that three of those predictors explain the preference of ESL speakers for to-dative constructions as a possible transfer effect from Hindi.
Topic: Speakers’ linguistic choices

Year: 2015
Reference: Gries, S. Th., & Deshors, S. C. (2015). EFL and/vs. ESL? A multi-level regression modeling perspective on bridging the paradigm gap. International Journal of Learner Corpus Research, 1(1), 130–159.
Annotation: This contrastive analysis of the dative alternation in spoken and written EFL and ESL is the first study of its kind to involve hierarchical mixed-effects modeling to control for speaker- and verb-specific effects as well as the hierarchical structure of its chosen L2 corpus (thereby accounting for patterns and noise in the data that tend to be ignored). ESL and EFL emerge as discrete types of varieties, a finding that contributes to the wider discussion on the (dis)similarities of EFL and ESL varieties.
Topic: MuPDAR, EFL vs. ESL

Year: 2017
Reference: Edwards, A. (2017). ICE Age 3: The expanding circle. World Englishes, 36(3), 404–426.
Annotation: At a time when ICE primarily included native- and second-language English varieties (i.e. varieties pertaining to the Inner and Outer Circles in Kachru’s Three Circles model), Edwards provides a template for researchers to expand the family of ICE by integrating varieties of EFL. Edwards explains how she compiled the Corpus of Dutch English, the first EFL corpus that mirrors the ICE architecture.
Topic: Corpus compilation

Year: 2017
Reference: Szmrecsanyi, B., Grafmiller, J., Heller, B., & Röthlisberger, M. (2017). Around the world in three alternations. Modeling syntactic variation in varieties of English. English World-Wide, 37(2), 109–137.
Annotation: Using a conditional inference tree statistical approach, this multivariate analysis of particle placement, genitive, and dative alternations in British, Canadian, Indian, and Singapore English explores the sociolinguistics of postcolonial English communities through the theoretical lens of probabilistic grammar. Although the varieties investigated share a core probabilistic grammar, a process of “probabilistic indigenization” is observed, whereby stochastic patterns of internal linguistic variation are reshaped by shifting usage frequencies in speakers of postcolonial varieties. This study calls for triangulation of corpus findings and cognitive experiments.
Topic: Probabilistic grammar

Year: 2017
Reference: Barron, A. (2017). The speech act of ‘offers’ in Irish English. World Englishes, 36(2), 224–238.
Annotation: This contrastive pragmatic analysis of the speech act ‘offers’ in Irish and British Englishes explores (i) offer strategies and offer strategy realizations used to perform offers in both English varieties, (ii) sociopragmatic constraints relating to offer topic which influence the realization of offers in both Englishes, and (iii) whether offers in Irish English differ from those in British English on a pragmalinguistic and/or a sociopragmatic level. Methodologically, this study stresses the importance of combining quantitative and qualitative corpus approaches to speech acts across world Englishes.
Topic: Pragmatics, quantitative and qualitative corpus methodologies

Year: 2017
Reference: Heller, B., Bernaisch, T., & Gries, S. Th. (2017). Empirical perspectives on two potential epicenters: The genitive alternation in Asian Englishes. ICAME Journal, 41, 111–144.
Annotation: In this study, the state-of-the-art MuPDAR(F) statistical method (a multifactorial deviation analysis based on random forest classifications) is used to explore the identification of linguistic epicenters in South and South-East Asia. Focusing on the genitive alternation, the study shows that four factors of structural nativization (possessor animacy, head frequency differences, length differences between possessor and possessum, and possessor thematicity) explain the tendency of Asian English speakers to use s-genitives where British speakers would use of-genitives. Further, structural nativization occurs at a linguistic level deeper than the surface structure, thereby altering the structural profile of English varieties.
Topic: Large-scale fine-grained approach

Year: 2017
Reference: Kruger, H., & van Rooy, B. (2017). Editorial practice and the progressive in Black South African English. World Englishes, 36(1), 20–41.
Annotation: This study investigates linguistic innovations in EFL and ESL by exploring endonormativity in the context of editorial practice. Focusing on the processes of conventionalization and legitimization, the paper examines how an innovation transforms into an accepted linguistic feature and how learner English becomes a New English variety. Linguistically, the study focuses on progressive marking in Black South African English and shows that exposure to linguistic innovations drives endonormative stabilization in a relatively automatic fashion. The results also suggest that this stabilization may result from psycholinguistic processes associated with the production and reception of written texts.
Topic: Linguistic innovations

Year: 2017
Reference: Percillier, M., & Paulina, C. (2017). Corpus-based investigation of world Englishes in literature. World Englishes, 36(1), 127–147.
Annotation: This study stands out in that it utilizes a corpus of literary texts of Scottish, West African, and South-East Asian Englishes to connect nativized features and local literatures. Adopting a mixed quantitative-qualitative approach, the study reveals that although texts from the investigated world Englishes share underlying linguistic patterns, as localized varieties, their profiles are very distinct. Further, localized English forms in literary texts serve specific functions allowing for the distinction of characters in terms of importance, ethnicity, and age.
Topic: Nativized features vs. localized literature

Year: To appear
Reference: Gries, S. Th., Bernaisch, T., & Heller, B. (2018). In S. C. Deshors (Ed.), Modeling World Englishes: Assessing the interplay of emancipation and globalization of ESL varieties (pp. 245–279). Amsterdam, The Netherlands: John Benjamins.
Annotation: This empirical study challenges the validity of the two main assumptions that have so far driven corpus-based studies of world Englishes: (i) synchronic data can approximate diachronic processes, and (ii) historical source varieties change so little in the time period under consideration that their changes relative to the historical variety can be dismissed from consideration. Focusing on the genitive alternation (of vs. ’s), the study is the first one to apply the state-of-the-art multifactorial statistical approach MuPDAR to diachronic corpus data of WE to predict to what extent the linguistic choices of Singapore English speakers resembled those of British English speakers across the 1950s, 1960s, and 1990s. The study bears important theoretical implications for the theorizing of WE and their development through time.
Topic: Diachronic MuPDAR

References

Anthony, L. (2017). AntConc (Version 3.5.0) [computer software]. Tokyo, Japan: Waseda University. Available from http://www.laurenceanthony.net/software/antconc/.

Biewer, C., Bernaisch, T., Heller, B., & Berger, M. (2014). Compiling The Diachronic Corpus of Hong Kong English (DC-HKE): motivation, progress and challenges. Poster presented at 35th Annual Conference of the International Computer Archive for Modern and Medieval English (ICAME 35), Corpus Linguistics, context and culture, 30 Apr–4 May 2014, University of Nottingham.

Davies, M., & Fuchs, R. (2015). Expanding horizons in the study of World Englishes with the 1.9 billion word Global Web-based English Corpus (GloWbE). English World-Wide, 36(1), 1–28.

Deshors, S. C. (2017). Structuring subjectivity in Asian Englishes: Multivariate approaches to mental predicates across genres and functional uses. English Text Construction, 10(1), 132–163.

Deshors, S. C. (Ed.). (2018). Modeling World Englishes: Assessing the interplay of emancipation and globalization of ESL varieties. Amsterdam, The Netherlands: John Benjamins.

ELFA. (2008). The Corpus of English as a Lingua Franca in Academic Settings [Corpus]. Retrieved from http://www.helsinki.fi/elfa/elfacorpus.

Evans, S. (2014). The evolutionary dynamics of postcolonial Englishes: A Hong Kong case study. Journal of Sociolinguistics, 18(5), 571–603.

Fillmore, C. J. (1992). “Corpus linguistics” or “Computer-aided armchair linguistics”. In J. Svartvik (Ed.), Directions in Corpus Linguistics (pp. 35–66). New York, NY: Mouton de Gruyter.

Greenbaum, S. (1988). A proposal for an international computerized corpus of English. World Englishes, 7(3), 315.

Gries, S. Th. (2009). What is Corpus Linguistics. Language and Linguistics Compass, 3, 1–17.

Gries, S. Th., & Deshors, S. C. (2014). Using regressions to explore deviations between corpus data and a standard/target: Two suggestions. Corpora, 9(1), 109–136.

Gries, S. Th., & Bernaisch, T. (2016). Exploring epicentres empirically: Focus on South Asian Englishes. English World-Wide, 37(1), 1–25.

Gut, U., & Fuchs, R. (2017). Exploring speaker fluency with phonologically annotated ICE corpora. World Englishes, 36(3), 387–403.

Hoffmann, S., Sand, A., & Tan, P. (2012). The Corpus of Historical Singapore English – A first pilot study on data from the 1950s and 1960s. Paper presented at 33rd Annual Conference of the International Computer Archive for Modern and Medieval English (ICAME 33), Corpora at the centre and crossroads of English linguistics, 30 May–3 June 2012, KU Leuven.

Ishikawa, S. (2014). Design of the ICNALE-Spoken: a new database for multi-modal contrastive interlanguage analysis. In S. Ishikawa (Ed.), Learner Corpus Studies in Asia and the World, Vol 2 (pp. 63–76). Kobe, Japan: Kobe University.

Kortmann, B. (2010). Variation across Englishes. In A. Kirkpatrick (Ed.), Routledge Handbook of World Englishes (pp. 400–424). London, England: Routledge.

Mair, C. (2013). The World System of Englishes: Accounting for the transnational importance of mobile and mediated vernaculars. English World-Wide, 34(3), 253–278.

Mair, C. (2017). World Englishes and corpora. In M. Filppula, J. Klemola, and D. Sharma (Eds.), The Oxford Handbook of World Englishes (pp. 103–122). Oxford, England: Oxford University Press.

Mair, C. (2018). Stabilising domains of English-language use in Germany: Global English in a non-colonial languagescape. In S. C. Deshors (Ed.), Modeling World Englishes: Assessing the interplay of emancipation and globalization of ESL varieties (pp. 45–77). Amsterdam, The Netherlands: John Benjamins.

McEnery, T., & Hardie, A. (2012). Corpus Linguistics: Method, theory and practice. Cambridge, England: Cambridge University Press.

Mukherjee, J., & Bernaisch, T. (2015). Cultural keywords in context: A pilot study of linguistic acculturation in South Asian Englishes. In P. Collins (Ed.), Grammatical Change in English World-Wide (pp. 411–435). Amsterdam, The Netherlands: John Benjamins.

Nelson, G., & Ozón, G. (2018). World Englishes and Corpus Linguistics. In E. L. Low and A. Pakir (Eds.), World Englishes Rethinking Paradigms (pp. 149–164). New York, NY: Routledge.

Rosen, A. (2016). The fate of linguistic innovations: Jersey English and French learner English compared. International Journal of Learner Corpus Research, 2(2), 302–322.

Sand, A. (2013). Corpus Analysis of English as a world language. In C. Chapelle (Ed.), The Encyclopedia of Applied Linguistics. London, England: Wiley-Blackwell.

Scott, M. (2017). WordSmith Tools (Version 7.0) [computer software]. Available from http://lexically.net/LexicalAnalysisSoftware/.

Szmrecsanyi, B., & Kortmann, B. (2011). Typological profiling: Learner Englishes versus indigenized L2 varieties of English. In J. Mukherjee & M. Hundt (Eds.), Exploring second-language varieties of English and learner Englishes: Bridging a paradigm gap (pp. 167–188). Amsterdam, The Netherlands: John Benjamins.

Werner, J., & Mukherjee, J. (2012). Highly polysemous verbs in New Englishes: A corpus-based study of Sri Lankan and Indian English. In S. Hoffmann, P. Rayson & G. Leech (Eds.), Corpus Linguistics: Looking back – moving forward (pp. 249–266). Amsterdam, The Netherlands: Rodopi.