Louvain International Database of Spoken English Interlanguage (LINDSEI)

Gaëtanelle Gilquin, Sylvie De Cock, Sylviane Granger (eds)

© Presses universitaires de Louvain, 2010
Registration of copyright: D/2010/9964/44
ISBN: 978-2-87463-245-7
Cover: mikedsign
Printed in Belgium

All rights reserved. No part of this publication may be reproduced, adapted or translated, in any form or by any means, in any country, without the prior permission of Presses universitaires de Louvain.

Distribution: www.i6doc.com, online university publishers
Available at CIACO University Distributors
Grand-Place, 7
1348 Louvain-la-Neuve, Belgium
Tel. 32 10 47 33 78
Fax 32 10 45 73 50
[email protected]

LINDSEI

TABLE OF CONTENTS

List of tables and figures
Preface
Acknowledgements
List of abbreviations

0. INTRODUCTION

PART 1. DESCRIPTION OF THE CORPUS
1. General introduction
  1.1. Historical background
  1.2. Corpus or database?
  1.3. LINDSEI design criteria
    1.3.1. Interview variables
      1.3.1.1. Genre
      1.3.1.2. Duration
      1.3.1.3. Three tasks
      1.3.1.4. Institution
    1.3.2. Learner variables
      1.3.2.1. Learning context
      1.3.2.2. Proficiency
      1.3.2.3. Age
      1.3.2.4. Gender
      1.3.2.5. Mother tongue
      1.3.2.6. Country
      1.3.2.7. Knowledge of other foreign languages
      1.3.2.8. Stay in an English-speaking country
    1.3.3. Interviewer variables
      1.3.3.1. Gender
      1.3.3.2. Mother tongue
      1.3.3.3. Knowledge of foreign languages
      1.3.3.4. Status
  1.4. Transcription and markup
2. Project teams
  2.1. Coordinating team
  2.2. IT team
  2.3. National teams
3. Structure of the corpus
  3.1. General breakdown
    3.1.1. Length of the interviews
    3.1.2. Three tasks
    3.1.3. Duration of the interviews
    3.1.4. Set topic
    3.1.5. Learners’ age
    3.1.6. Learners’ gender
    3.1.7. Learners’ education
    3.1.8. Stay in an English-speaking country
    3.1.9. Interviewers’ gender
    3.1.10. Interviewers’ mother tongue
  3.2. Breakdown per national subcorpus
    3.2.1. Subcorpus information
      3.2.1.1. Bulgarian subcorpus
      3.2.1.2. Chinese subcorpus
      3.2.1.3. Dutch subcorpus
      3.2.1.4. French subcorpus
      3.2.1.5. German subcorpus
      3.2.1.6. Greek subcorpus
      3.2.1.7. Italian subcorpus
      3.2.1.8. Japanese subcorpus
      3.2.1.9. Polish subcorpus
      3.2.1.10. Spanish subcorpus
      3.2.1.11. Swedish subcorpus
    3.2.2. Learning context
      3.2.2.1. Bulgaria (LINDSEI-BG)
      3.2.2.2. China (LINDSEI-CH)
      3.2.2.3. Dutch-speaking Belgium (LINDSEI-DU)
      3.2.2.4. French-speaking Belgium (LINDSEI-FR)
      3.2.2.5. Germany (LINDSEI-GE)
      3.2.2.6. Greece (LINDSEI-GR)
      3.2.2.7. Italy (LINDSEI-IT)
      3.2.2.8. Japan (LINDSEI-JP)
      3.2.2.9. Poland (LINDSEI-PL)
      3.2.2.10. Spain (LINDSEI-SP)
      3.2.2.11. Sweden (LINDSEI-SW)
4. Methodology
  4.1. Learner corpora and SLA
  4.2. Intra-corpus comparisons
  4.3. LINDSEI vs ICLE
  4.4. LINDSEI vs LOCNEC
  4.5. Four-way comparisons

PART 2. LINDSEI USER MANUAL
1. Introduction
2. Licence Agreement
3. Software installation
  3.1. Individual licence
    3.1.1. Installation procedure
    3.1.2. Launching the software
    3.1.3. Uninstalling the software
    3.1.4. Location of the data
  3.2. Multiple-user licence
    3.2.1. Installation procedure
    3.2.2. Launching the software
    3.2.3. Uninstalling the software
    3.2.4. Location of the data
4. The REQUEST window
  4.1. Composition of the Request window
    4.1.1. Alphanumerical fields
    4.1.2. Numerical fields
    4.1.3. Variables with multiple options
  4.2. Selecting the corpus
  4.3. The Zoom function
  4.4. Resetting the selection
  4.5. Submitting the query
  4.6. Functions available on the main command bar of the Request window
5. The RESULT window
  5.1. Deselecting profiles
  5.2. Key terms: selected corpus and sub-corpus
  5.3. Viewing the profiles
  5.4. Generating a report
  5.5. Computing statistics
  5.6. Viewing a text and merging texts into a corpus
  5.7. Functions available on the main command bar of the Result window
6. Report problems

LINDSEI bibliography
Appendix 1: Cartoon used in the picture description task
Appendix 2: Learner profile questionnaire

LIST OF TABLES AND FIGURES

Table 1: List of institutions per national subcorpus
Table 2: CEF rating of LINDSEI sample
Table 3: General transcription conventions
Table 4: Markup for section delimitation
Table 5: Markup for contextual information
Table 6: Distribution of interviews/words per subcorpus (A & B turns)
Table 7: Average length of interviews (A & B turns)
Table 8: Distribution of interviews/words per subcorpus (B turns only)
Table 9: Average length of interviews (B turns only)
Table 10: Distribution of interviews/words per subcorpus (A & B turns, without filled pauses and backchannels)
Table 11: Average length of interviews (A & B turns, without filled pauses and backchannels)
Table 12: Distribution of interviews/words per subcorpus (B turns only, without filled pauses and backchannels)
Table 13: Average length of interviews (B turns only, without filled pauses and backchannels)
Table 14: Distribution of words per task (A & B turns)
Table 15: Distribution of words per task (B turns only)
Table 16: Total duration of interviews (in hours:minutes:seconds)
Table 17: Average duration of interviews (in minutes:seconds)
Table 18: Distribution of set topics (C = country, E = experience, F = film/play)
Table 19: Learners’ age
Table 20: Learners’ gender
Table 21: Number of years of English at school
Table 22: Number of years of English at university
Table 23: Number of months in an English-speaking country
Table 24: Interviewers’ gender
Table 25: Interviewers’ mother tongue
Table 26: Average number of words per interview according to gender
Table 27: Most frequent words in the picture description part of LINDSEI
Table 28: Shared learner variables in LINDSEI and ICLEv2
Table 29: Proportion (%) of the reduced form (cos) to the total number of reduced and full forms (cos + because) in LOCNEC (NS) and the LINDSEI subcorpora
Table 30: Relative frequency (per 100,000 words) of I mean in LINDSEI-FR and LOCNEC

Figure 1: LINDSEI variables
Figure 2: Relative frequency (per 100,000 words) of in fact in several LINDSEI subcorpora
Figure 3: Relative frequency (per 100,000 words) of in fact in several LINDSEI and ICLEv2 subcorpora
Figure 4: Relative frequency (per 100,000 words) of cos in LOCNEC (NS) and the LINDSEI subcorpora
Figure 5: Relative frequency (per 100,000 words) of delexical make in native and French learner speech and writing
Figure 6: The Request window (‘Interview’ screen)
Figure 7: The Request window (‘Learner’ screen)
Figure 8: The Request window (‘Interviewer’ screen)
Figure 9: The Zoom function
Figure 10: The Result window (grid view)
Figure 11: The Result window (form view)
Figure 12: The Request Info window
Figure 13: Sorting the columns in the grid view
Figure 14: The Report Viewer window (profiles)
Figure 15: The Statistics window
Figure 16: Pie chart representation of the statistics
Figure 17: The Report Viewer window (statistics)
Figure 18: Selection of interview sections
Figure 19: Saving the corpus

PREFACE

Some twenty years after their emergence, learner corpora appear to have earned their place in the study of interlanguage. Partly thanks to the success of the International Corpus of Learner English, more and more English as a Foreign Language specialists are relying on corpus data to investigate various aspects of the acquisition of a foreign language. Learner corpora representing languages other than English have become more widespread, and learner corpus research has started to make an impact on areas like Second Language Acquisition and the production of pedagogical materials.

As was the case with native language, however, spoken corpora tend to lag behind, mainly because of the difficulties involved in recording the data and transcribing them. The Louvain International Database of Spoken English Interlanguage (LINDSEI) seeks to reduce this imbalance by providing a large collection of spoken data produced by foreign learners of English with a wide range of mother tongues. All in all, it contains over one million words, resulting from informal interviews with learners from eleven different mother tongue backgrounds.

The corpus, which represents the fruits of 15 years of international collaboration between a large number of universities, is now being released in CD-ROM format with a user-friendly interface which allows researchers to compile their own tailor-made corpora on the basis of a set of predefined criteria. It is accompanied by a handbook which includes a comprehensive description of the corpus and a detailed user manual.

LINDSEI is a dynamic project. In fact, as this handbook goes to press, three teams have already begun to collect data for new components of the corpus, so there is little doubt that there will be a second release. We therefore encourage users to contact us with their feedback and suggest additional features for future editions.

Sylvie De Cock
Gaëtanelle Gilquin
Sylviane Granger
August 2010

ACKNOWLEDGEMENTS

Our deepest gratitude goes to the LINDSEI national teams, without whom this adventure would never have been possible. We thank them for their trust in this project, their enthusiasm, and their patience. We consider ourselves extremely fortunate to have been able to collaborate with such wonderful people!

Over the years, we have been lucky enough to count on the help of several other colleagues and students, among them Stephanie Petch-Tyson, Claire Hugon and Caroline Gerckens. We are also greatly indebted to our beta-testers, Sandra Götz, Susanne Kämmerer and Pascual Pérez-Paredes, as well as to Patrick Watrin and Miguël Gilquin, who designed the LINDSEI logo.

Completing a project like LINDSEI also requires a great deal of technical skill. We found this at the Centre de Traitement Automatique du Langage (CENTAL) of the Université catholique de Louvain, directed by Cédrick Fairon, whom we thank for having welcomed the idea of joining forces on the production of the CD-ROM. We are particularly grateful to Claude Devis from the CENTAL for kindly lending us his considerable expertise and for doing such a fantastic job on the LINDSEI interface.

Last but not least, special thanks are due to the many learners, all over the world, who agreed to be interviewed and to contribute their data to LINDSEI. There is no doubt that, ultimately, their precious cooperation will help future generations of learners like them to improve their oral proficiency in English.

LIST OF ABBREVIATIONS

BNC       British National Corpus
CEF       Common European Framework of Reference for Languages
CIA       Contrastive Interlanguage Analysis
EFL       English as a Foreign Language
ESL       English as a Second Language
FL        Foreign Language
ICLE      International Corpus of Learner English
ICLEv2    Second version of the ICLE CD-ROM and handbook (Granger et al. eds, 2009)
L1        Native language
L2        Foreign/Second language
LC        Learner Corpus
LCR       Learner Corpus Research
LLC       London-Lund Corpus
LOCNEC    Louvain Corpus of Native English Conversation
LOCNESS   Louvain Corpus of Native English Essays
NNS       Non-Native Speaker
NS        Native Speaker
SLA       Second Language Acquisition

List of the main markers used in the LINDSEI transcripts

Beginning of the interviewer’s turn
End of the interviewer’s turn
Beginning of the learner’s turn
End of the learner’s turn
Beginning of the set topic task
End of the set topic task
Beginning of the free discussion task
End of the free discussion task
Beginning of the picture description task
End of the picture description task
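Paired begin/end turn markers of this kind lend themselves to simple automatic processing, for instance to count learner words separately from interviewer words. The Python sketch below is only an illustration under stated assumptions: it supposes that interviewer and learner turns are wrapped in tags of the form <A>...</A> and <B>...</B>, a shape chosen here because the handbook's tables refer to "A & B turns" and "B turns only". The exact marker symbols, the function names and the sample transcript are all hypothetical and are not taken from the LINDSEI data.

```python
import re

# ASSUMPTION: turns are delimited as <A>...</A> (interviewer) and
# <B>...</B> (learner). The real LINDSEI marker symbols may differ.
TURN_PATTERN = re.compile(r"<(A|B)>(.*?)</\1>", re.DOTALL)


def split_turns(transcript):
    """Return (speaker, text) pairs in order of appearance."""
    return [
        (match.group(1), match.group(2).strip())
        for match in TURN_PATTERN.finditer(transcript)
    ]


def count_words(turns, speaker=None):
    """Count whitespace-separated tokens, optionally for one speaker only."""
    return sum(
        len(text.split())
        for who, text in turns
        if speaker is None or who == speaker
    )


# Invented sample, not an excerpt from the corpus.
sample = (
    "<A> so what did you choose </A> "
    "<B> I would like to talk about a film </B>"
)
turns = split_turns(sample)
learner_words = count_words(turns, speaker="B")
all_words = count_words(turns)
```

Counting whitespace-separated tokens is a deliberate simplification; counts that exclude filled pauses and backchannels would need an additional filtering step.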

0. INTRODUCTION

This handbook is a companion to the LINDSEI CD-ROM, which contains the actual database, integrated into a search interface that allows users to select (parts of) interviews matching a set of predefined variables. The handbook is divided into two main parts, one describing the corpus (Part I) and the other consisting of a user manual (Part II).

Part I starts with an overview of the historical background of the LINDSEI project (Section 1.1). It then discusses whether LINDSEI is best described as a corpus or as a database (Section 1.2). Section 1.3 expands on the criteria that were adopted to design LINDSEI. These criteria relate to the interview itself (e.g. duration of the interview), the learner (e.g. age, stay in an English-speaking country) and the interviewer (e.g. gender, mother tongue). The next section, Section 1.4, outlines the conventions that were used to transcribe the interviews, as well as the markup that was introduced into the transcripts. Section 2 lists the collaborators involved in the project, while Section 3 describes the structure of the corpus, starting with a general breakdown according to the main variables encoded (length of the interviews, learners’ education, etc.), and then moving on to a breakdown per national subcorpus. The latter also includes a brief summary of the context in which English is learned in each of the countries where the LINDSEI data were collected. The last section of Part I, Section 4, deals with methodology and gives some concrete examples of how the database can be exploited to carry out research into learner speech.

Part II of the handbook provides a detailed user manual. It is followed by a select list of references based on LINDSEI, as well as two appendices.
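The idea of selecting interviews that match a set of predefined variables, mentioned at the start of this introduction, can be pictured with a short Python sketch. It is a conceptual illustration only and says nothing about how the CD-ROM interface is implemented: the `Profile` class, its field names, the `select` function and the three sample records are hypothetical and invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical interview profile; the field names are assumptions."""
    interview_id: str
    mother_tongue: str
    learner_age: int
    months_abroad: int
    interviewer_gender: str


def select(profiles, **criteria):
    """Keep the profiles that satisfy every criterion.

    A criterion is either a plain value (tested for equality) or a
    callable predicate applied to the field's value.
    """
    def matches(profile):
        for field, wanted in criteria.items():
            actual = getattr(profile, field)
            ok = wanted(actual) if callable(wanted) else actual == wanted
            if not ok:
                return False
        return True

    return [p for p in profiles if matches(p)]


# Invented records, not taken from the database.
profiles = [
    Profile("X001", "French", 21, 0, "female"),
    Profile("X002", "French", 23, 6, "male"),
    Profile("X003", "German", 22, 12, "female"),
]

# French-speaking learners who have spent time in an English-speaking country.
chosen = select(
    profiles,
    mother_tongue="French",
    months_abroad=lambda months: months > 0,
)
chosen_ids = [p.interview_id for p in chosen]
```

Allowing either an exact value or a predicate for each criterion loosely mirrors the distinction between categorical variables (such as mother tongue) and numerical ones (such as age or months abroad).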