BOSTON UNIVERSITY SCHOOL OF EDUCATION
Dissertation
THE LIMITS OF EVENTUAL LEXICAL ATTAINMENT IN ADULT-ONSET SECOND LANGUAGE ACQUISITION
by
ANDREA BORBÉLY HELLMAN Egyetemi oklevél (M.A.), Attila József University (University of Szeged), 1990
Submitted in partial fulfillment of the requirements for the degree of Doctor of Education 2008
© Copyright by ANDREA BORBÉLY HELLMAN 2008
Approved by
First Reader
___________________________________________________________ Shanley Allen, Ph.D. Associate Professor of Education
Second Reader ___________________________________________________________ Mary Catherine O’Connor, Ph.D. Associate Professor of Education
Third Reader
___________________________________________________________ Marnie Reed, Ed.D. Associate Professor of Education
Fourth Reader
___________________________________________________________ John Read, Ph.D. Associate Professor in Applied Language Studies and Linguistics The University of Auckland, Faculty of Arts
THE LIMITS OF EVENTUAL LEXICAL ATTAINMENT IN ADULT-ONSET SECOND LANGUAGE ACQUISITION (Order No.
)
ANDREA BORBÉLY HELLMAN Boston University School of Education, 2008 Major Professor: Shanley Allen, Ph.D., Associate Professor of Education
ABSTRACT
This study examined the ultimate attainment of adult-onset second language (L2) learners in the lexical domain. A substantial body of research has documented age of acquisition effects on the ultimate attainment of L2 learners in the domains of phonology and morphosyntax; however, only limited data exist regarding the ultimate achievement of adult-onset L2 learners in the area of the lexicon, particularly vocabulary size and depth of word knowledge. This study probed the upper limit of eventual L2 lexical achievement by comparing a group of highly proficient adult-onset L2 learners with 10 to 52 years of significant exposure to English (N = 33) to two groups of comparably educated native speakers of English, a monolingual group (N = 30) and a bilingual group (N = 30). Measures included two vocabulary size tests (aural and written) and a depth of word knowledge test. The results indicated that the L2 learner group was significantly different from both native speaker groups due to lower mean achievement on the aural vocabulary size measure. However, the rate of native level achievement among the adult-onset L2 learners was 76%. Five (15%) obtained scores above the native speaker mean on all three lexical tasks; their characteristics were reported in case studies. Follow-up exploratory analyses suggested that for the adult-onset L2 learners, 46% of the variance on test scores was related to the linear combination of three
predictor variables: caregivers’ education, verbal ability and literacy in the native language, interest in new words and daily reading. For L2 vocabulary size and depth of word knowledge, the data in this study did not signal the existence of a critical period for acquisition. The data showed that the upper limit of L2 lexical achievement was native level vocabulary size and depth of word knowledge even for those individuals who did not start acquiring their second language until the third or fourth decade of life. In addition, the study detected no effect for bilingual status on the lexical measures among the native speakers. The findings constitute evidence that the lexical domain may be the most successful area of adult-onset L2 acquisition.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS  v
ABSTRACT  viii
TABLE OF CONTENTS  x
LIST OF TABLES  xiv
LIST OF FIGURES  xvi
1. CHAPTER ONE: INTRODUCTION  1
2. CHAPTER TWO: REVIEW OF THE LITERATURE  8
2.1. Age Effects on L2 Acquisition and the Possibility of Nativelike Ultimate Attainment  8
2.2. Little Evidence of Nativelike Attainment in Phonology  12
2.3. Nonnativelike Attainment the Norm in Morphosyntax  13
2.4. Promising Results for Adult-onset L2 Learners in the Area of Phrasal Semantics  14
2.5. Neuroimaging Evidence Against Age of Onset Effects in the Lexicosemantic Domain  16
2.6. The Lexicosemantic Domain and the Lexicon  22
2.7. Models of the Lexicon  23
2.8. The Lexical Attainment: The Dimensions of Lexical Knowledge  30
2.9. Inconclusive Findings on Age Effects and Ultimate Lexical Attainment  32
2.10. The Lexical Attainment of Adult L1 English Speakers: Vocabulary Size  46
2.11. Testing Depth of Word Knowledge  52
2.12. Motivations for the Study  55
3. CHAPTER THREE: METHODOLOGY  58
3.1. Overview of the Research Design  58
3.2. The Research Questions  60
3.3. Participants  60
3.3.1. Controlled Variables  61
3.3.2.1. Adult-onset L2 Learners: Participation Criteria  65
3.3.2.2. Adult-onset L2 Learners: Measurements of Variables  68
3.4. Recruitment  74
3.5. Data Collection  74
3.5.1. Informed Consent  75
3.5.2. Research Instruments  76
3.5.2.1. The Peabody Picture Vocabulary Test, Fourth Edition  76
3.5.2.2. Self-rated Vocabulary Test  77
3.5.2.3. Word Associates Test  82
3.5.2.4. Participant Questionnaire  85
3.6. Scoring  86
3.6.1. Scoring the PPVT-4  86
3.6.2. Scoring the SRVT  86
3.6.3. Scoring the WAT  86
3.7. Data Analysis  86
3.7.1. Native Level Attainment  88
3.7.2. Overall Measure of Lexical Attainment  89
3.7.3. Composite Variables in the Follow-up Analyses  89
3.8. Summary of Methodology  90
4. CHAPTER FOUR: RESULTS  92
4.1. Introduction  92
4.2. Question 1: Is native level L2 lexical attainment possible for adult-onset L2 learners?  92
4.2.1. Question 1a: Do high proficient adult-onset L2 learners achieve native level lexical attainment as a group?  93
4.2.2. What is native level achievement on the three lexical tasks?  97
4.2.3. Question 1b: Do any high proficient adult-onset L2 learners reach native level lexical attainment?  98
4.3. Question 2: How large can the receptive vocabulary of adult-onset L2 learners be?  99
4.4. Question 3: What are the characteristics of exceptionally performing adult-onset L2 learners?  102
4.4.1. Laura  102
4.4.2. Antal  104
4.4.3. David  106
4.4.4. Anita  107
4.4.5. Robert  109
4.4.6. Summary of Question 3  110
4.5. Follow-up Analyses: What affects L2 lexical attainment?  111
4.5.1. The Correlations of Relevant Variables and L2 Lexical Attainment  111
4.5.2. What are the functions of the AO, EL, AT, and LL variables and the lexical scores?  114
4.5.2.1. Age of Onset  114
4.5.2.2. Education Level  117
4.5.2.3. Age at Testing  118
4.5.2.4. Length of Immersion  120
4.5.3. Caregivers’ Education, L1 Ability, and Reading-Vocabulary Interest  121
4.6. Summary of the Findings  123
5. CHAPTER FIVE: DISCUSSION  125
5.1. Overview of Purpose and Method  125
5.2. Summary of Findings  127
5.3. The Findings of This Study In Relation To Previous Research  130
5.3.1. Marinova-Todd, 2003  130
5.3.2. Bahrick et al., 1994  131
5.3.3. Kim, 1997  132
5.3.4. Spadaro, 1998  135
5.4. The Relevance and Implications of Findings  138
5.5. Limitations of the Study  140
5.6. Suggestions for Future Research  143
Appendices  144
References  181
Curriculum Vitae  190
LIST OF TABLES
Table 1. Summary of Better-known Models of the Lexicon  24
Table 2. Participants by Age at Immigration to the United States and Length of Residence in the United States (Bahrick et al., 1994, p. 268)  41
Table 3. Descriptive Statistics for Controlled Variables in the Three Groups of Participants  62
Table 4. Distribution of the Age at Testing Variable in the Three Groups of Participants  63
Table 5. Distribution of Education Level in the Three Groups of Participants  63
Table 6. Descriptive Statistics of Current Second Language Use in the Three Groups of Participants  64
Table 7. Second/Foreign Languages of the Participants in the Bilingual NS Group  64
Table 8. Distribution of Training/Occupational Field in the Three Groups of Participants  65
Table 9. Summary of Participation Criteria for the Adult-onset NNS Group  68
Table 10. Descriptive Statistics of AO, LL, and AT in the Adult-onset NNS Group  71
Table 11. Age of Onset of Significant Exposure to English (AO) in the Adult-onset NNS Group  72
Table 12. Length of Significant Exposure to English (LL) in the Adult-onset NNS Group  72
Table 13. Between Subjects Factors in the Two-Way MANOVA  94
Table 14. Means and Standard Deviations on the Dependent Variables for the Three Groups  95
Table 15. The Combined NS Means and Standard Deviations  98
Table 16. Z Scores of the Adult-onset NNS Group  145
Table 17. Z Scores of the Bilingual Native Speaker of English Control Group  146
Table 18. Z Scores of the Monolingual Native Speaker of English Control Group  147
Table 19. Correlations Between Variables and Lexical Scores in the Adult-onset NNS Group  113
Table 20. Descriptive Statistics of the Regression Analysis Variables in the Adult-onset NNS Group  123
Table 21. Descriptive Statistics of Variables in Spadaro (1998)  148
Table 22. Z Scores of the Adult-Onset NNS Group in Spadaro (1998)  149
Table 23. Z Scores of the Child-Onset NNS Group in Spadaro (1998)  150
Table 24. Z Scores of the Native Speaker of English Control Group in Spadaro (1998)  151
LIST OF FIGURES
Figure 1. The stretched Z shape decline, which is expected to represent the onset and conclusion of a hypothesized critical period.  9
Figure 2. The mean scores on the two English lexical measures as a function of age at testing.  43
Figure 3. The mean scores on the English and Spanish version of the lexical test as a function of length of residence in the United States.  44
Figure 4. PPVT-4 mean growth scale values based on the standardization sample.  51
Figure 5. Relationship of age of onset (AO) and length of significant exposure to English (LL) in the adult-onset L2 learner group.  73
Figure 6. Ninety-five percent confidence interval of the mean for the three dependent variables in the three groups of participants.  95
Figure 7. Box plot of estimated functionally useful vocabulary size of the adult-onset L2 learner group and the two native speaker control groups pooled.  100
Figure 8. Mean estimated functional vocabulary size by age group.  101
Figure 9. Scatter plot of the estimated functionally useful receptive vocabulary size calculated in relation to the age of onset of significant exposure to English among the adult-onset NNSs. (The estimates were calculated based on the Self-Rated Vocabulary Test scores. The maximum was 26,901 words. The mean for the group (N = 33) was 19,633 (SD = 2,394) and the range 14,225-23,662. The solid black line is the trendline for the adult-onset NNS scores. The solid gray line marks the NS control mean (20,251), the dashed gray line one standard deviation (2,538) below the mean (17,713), and the dotted gray line two standard deviations below the mean (15,175).)  102
Figure 10. Scatter plot of the PPVT-4 raw scores in relation to the age of onset of significant exposure to English among the adult-onset NNSs.  115
Figure 11. Scatter plot of the PPVT-4 standardized scores in relation to the age of onset of significant exposure to English among the adult-onset NNSs. (Note that the US mean score for adults was 100.1.)  115
Figure 12. Scatter plot of the Word Associates Test scores in relation to the age of onset of significant exposure to English among the adult-onset NNSs. (The at-chance score was 80 and the maximum 160.)  116
Figure 13. Scatter plot of the Self-Rated Vocabulary Test scores in relation to the age of onset of significant exposure to English among the adult-onset NNSs. (The maximum score was 191.)  116
Figure 14. Mean raw scores on the PPVT-4 Form A by age group.  119
Figure 15. Standardized scores on the PPVT-4 by age group.  119
Figure 16. Mean scores on the Word Associates Test by age group.  120
Figure 17. Mean scores on the Self-Rated Vocabulary Test by age group.  120
CHAPTER TWO
REVIEW OF THE LITERATURE
2.1. Age Effects on L2 Acquisition and the Possibility of Nativelike Ultimate Attainment
The past four decades have produced robust evidence of age effects in second language acquisition. Evidence strongly suggests that, on the whole, the level of ultimate attainment that can be reached in a second language declines as the age of onset of L2 learning increases (Hyltenstam and Abrahamsson, 2003; Singleton, 2001; Scovel, 2000; DeKeyser, 2000; Pinker, 1994; Long, 1990; Newport, 1990). Few second language acquirers whose first significant exposure to the language occurred in adulthood are able to reach nativelike proficiency in specific skill areas that are measured in research studies, and we do not know for sure whether even one individual late starter exists who is indistinguishable from native speakers in all skill areas of the target language (Hyltenstam and Abrahamsson, 2000, 2003; Long, 2007; Lardiere, 2007). What we have learned is that age of acquisition as operationalized by age of arrival (AOA) in the target language country is the best-known predictor of how second language learners perform on various tasks that represent some aspect of their ultimate attainment in the target language. AOA can account for as much as 70% of the variance in the scores; it is a better predictor than age at which target language instruction began, length of residence in the target country (LOR), the native language of the learner, or any of the tested demographic, attitudinal, and training variables (Mechelli et al., 2004; Flege, 1999; Johnson and Newport, 1989). Based on the current state of research, the age of acquisition effect is real and significant in second language ultimate attainment. However, the claim that the observed age effects represent maturational constraints or a biological critical period on
second language acquisition is controversial. Proponents of the critical period hypothesis (CPH) suggested that the capacity for language acquisition in humans is constrained by age (Long, 2007; Hyltenstam and Abrahamsson, 2003; DeKeyser, 2000; Newport, 1990; Lenneberg, 1967), possibly to the extent that no late starter could achieve native level proficiency in a language. Other researchers disagree with the interpretation of data on age effects. Birdsong (2005, 2006) examined the empirical findings on age effects and concluded that the data did not support the existence of a critical period for second language acquisition, but rather a continuous linear decline that is the result of cognitive aging. Birdsong’s argument is based on the observed function of the relationship between age of onset of learning (age of arrival, age of acquisition) and L2 proficiency scores. The existence of a critical period for L2 acquisition would mean a noncontinuous function, that is, proficiency scores would start declining at the beginning of the hypothesized critical period, but the decline would taper off upon the conclusion of the critical period, after which proficiency scores would be flat with a mean much below the native speaker mean. Birdsong described this age function as having a stretched Z shape (Figure 1). (For a discussion of other possibilities, see Birdsong, 2006.)
Figure 1. The stretched Z shape decline, which is expected to represent the onset and conclusion of a hypothesized critical period.
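The contrast between the two hypothesized age functions can be made concrete with a small numerical sketch. The Python snippet below is an illustration only: the function names, the breakpoints (onset at age 6, conclusion at age 16), the score ceiling and floor, and the slope are hypothetical values chosen for display, not estimates taken from Birdsong or from any study cited here. The flat segment before the onset of the critical period is likewise an assumption implied by the Z shape.

```python
def stretched_z(age_of_onset, cp_start=6.0, cp_end=16.0, ceiling=100.0, floor=60.0):
    """Critical-period prediction: flat, then declining, then flat again.

    All default values are hypothetical and serve only to draw the shape.
    """
    if age_of_onset <= cp_start:
        return ceiling  # assumed plateau before the critical period begins
    if age_of_onset >= cp_end:
        return floor  # flat segment well below the native speaker mean
    # Linear decline between the onset and the conclusion of the period.
    fraction = (age_of_onset - cp_start) / (cp_end - cp_start)
    return ceiling - fraction * (ceiling - floor)


def linear_decline(age_of_onset, ceiling=100.0, slope=-0.8):
    """Cognitive-aging prediction: the decline never flattens out."""
    return ceiling + slope * age_of_onset


if __name__ == "__main__":
    for age in (3, 6, 11, 16, 30, 45):
        print(age, stretched_z(age), round(linear_decline(age), 1))
```

Under these assumed values, the two functions differ most clearly for late starters: the stretched Z function predicts the same score for an onset at 30 and at 45, whereas the linear function keeps falling.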
However, Birdsong reported that the decline of proficiency scores does not flatten out at any point, but rather continues uninterrupted throughout the lifespan. This interpretation was based on reanalysis of earlier data, including Johnson and Newport (1989), using disaggregated cases rather than pooled samples. The function of the observed age effects is far from trivial; in fact, it is rather critical both for inferring the causes of age-related L2 proficiency decline and for drawing conclusions about the prospects of adult-onset second language learning. As Birdsong pointed out, the flat function of L2 attainment at the conclusion of the critical period offers little hope for late second language learners; it suggests that late second language learners on the whole are determined to exhibit deficiencies with only a few exceptional individuals being able to acquire target language skills that approximate those of native speakers. On the other hand, a linear decline that reflects general age effects would suggest less deterministic outcomes because the slope of the line may be moderated by factors that are not pre-determined for language learners. Studies show that the slope of the line that is associated with age effects may be influenced by three types of factors: (1) variables that cannot be changed, such as the learner’s first language (Birdsong and Molis, 2001; Bongaerts et al., 1997; Birdsong, 1992) and language aptitude (DeKeyser, 2000), (2) variables that can be changed, such as L2 use (Piske et al., 2001; Flege et al., 1999), years of education (Urponen, 2004; Flege and Liu, 2001; Flege et al., 1999), possibly some motivational and training variables (Moyer, 1999; Flege et al., 1999; Bongaerts et al., 1997), and (3) task-related variables, such as the language domain and sub-domain being tested (Flege et al., 1999), and mode of task delivery (Birdsong, 1992; Johnson, 1992). 
Variables that can be changed offer hope for the prospect of adult-onset second language learning.
Proponents of the CPH suggested that lexical acquisition falls under the same maturational decline as other language areas (Hyltenstam and Abrahamsson, 2003). Recently, Long (2007) restated this position and indicated that age six may represent the onset of a critical period in L2 lexical acquisition. This position is particularly problematic in that we have very little data to evaluate Long’s proposal. Birdsong’s (2005, 2006) analysis does not include lexical data. As I will discuss later, as opposed to most cognitive functions, the size of the lexis does not appear to decline with age. Studies show that vocabulary continues to grow until the early 70s (Williams and Wang, 1997; Park et al., 2002). Generally, the bulk of vocabulary growth in the L1 occurs before age 20, but the ability to learn new words does not end at any definable point in life. It follows that adult-onset second language learners face the challenge of being measured against native speaker adults who have already acquired the bulk of their vocabulary and whose vocabulary continues to expand. At the same time, adult-onset L2 learners also have to contend with the ongoing decline of their cognitive capacities, including the steady loss of verbal and working memory (Hedden, Lautenschlager, and Park, 2005; Park et al., 2002). At present, we lack data to be able to predict how adult-onset L2 learners manage the monumental challenge of L2 lexical acquisition. We do not know how common or rare nativelike ultimate L2 lexical attainment is or whether it is even possible. We do not know the function of the relationship between age of onset of learning and eventual L2 vocabulary size or depth of L2 word knowledge. Filling this critical gap in the literature is very important for both second language acquisition theory and educational practice.
2.2. Little Evidence of Nativelike Attainment in Phonology
According to the available behavioral data, age effects in L2 acquisition are the most evident in the domain of phonology. Published studies reported a significant difference between the group means of late-onset second language learners and native target language speakers (Oyama, 1978; Bongaerts et al., 1997; Moyer, 1999; Flege et al., 1999a; Flege et al., 1999b; Piske et al., 2001; Flege and Liu, 2001). Of the 283 late-onset second language learners who participated in these seven studies, only four individuals achieved a score within the native speaker range on all subtasks: three native speakers of Dutch (Bongaerts et al., 1997) and one native German speaker (Moyer, 1999). No native speakers of Italian, Korean or Chinese could pass as native English speakers. The typical assessment task used to investigate L2 phonological attainment is speech sampling. Participants are either asked to read 3-6 short sentences that are loaded with sounds and sound combinations characteristic of English or have to repeat what they hear after a long pause, the purpose of which is to prompt them to reconstruct the sentences rather than mimic them. Sometimes the sentences are elicited rather than repeated. Flege pioneered and refined the technique, which has been surprisingly effective at distinguishing native and nonnative speakers of English. Examples of task sentences are: Ron set a thick rug in the sun. Joe will feed the pup who sat by you. You should thank Sam for the food. (Flege et al., 1999b, p. 83). A panel of judges who are selected to match the regional dialect of adult bilinguals make scalar judgments about the randomly recorded sentences (for example, 1 = definitely nonnative, 9 = definitely native). With the exception of Flege and Liu (2001), phonological studies investigated performance, rather than competence; it is possible that adult-onset second language learners would have performed better had their phonological competence rather than performance been tested. In addition, none of the phonological studies used formal prescreening to identify highly successful late-onset second language learners. These threats prevent us from concluding with confidence what appears to be the case, which is that nativelike ultimate attainment in the area of phonology seems to be very rare or nearly nonexistent among late-onset language learners, particularly among those whose native language is distant from the target language. However, the extent and seriousness of L2 phonological deficiencies and the pattern of phonological decline need further study.
2.3. Nonnativelike Attainment the Norm in Morphosyntax
The most extensively investigated area in second language ultimate attainment is the morphosyntactic domain, or more specifically, grammatical competence. Coppieters conducted the first of such studies in 1987 in French, but the seminal research paradigm in English was the work of Johnson and Newport (1989). Johnson and Newport designed a grammaticality judgment task with exemplars of basic English grammar rule types such as the plural of nouns, past tense, third person singular, wh- questions, and determiners. After listening to recorded sentences, nonnative speakers of English were asked to accept or reject sentences based on grammatical correctness. Johnson and Newport’s 1989 study was replicated with modifications by Johnson (1992), Flege et al. (1999b), DeKeyser (2000), McDonald (2000), Flege and Liu (2001), and Birdsong and Molis (2001). In short, all published behavioral studies on grammatical attainment that did not formally screen their nonnative speaker participants for proficiency reported that the mean score of the adult second language acquirer group was significantly below the mean score of the native speaker group and/or early second language acquirer group (Coppieters, 1987; Johnson and Newport, 1989; Johnson, 1992; Birdsong, 1992; Flege et al., 1999b; DeKeyser, 2000; McDonald, 2000; Birdsong and Molis, 2001; Flege and Liu, 2001).
There are definite indications, however, that eventual L2 attainment in the morphosyntactic domain is superior to L2 phonological attainment. First, one study that formally prescreened nonnative participants for near-native level of proficiency reported no significant difference between the means of the near-native adult-onset second language acquirer group and the native control group (White and Genesee, 1996). Another study, Urponen (2004), which was modeled in part after White and Genesee (1996), found that 36% of the late-onset second language learners were nativelike. In addition to the White and Genesee (1996) and Urponen (2004) findings, six of the nine above cited studies (Johnson, 1992; Birdsong, 1992; Flege et al., 1999b; DeKeyser, 2000; McDonald, 2000; Birdsong and Molis, 2001) reported that at least some late-onset second language learners were able to achieve a score within the range of native speakers and/or early-onset second language acquirers. Given the small to moderate sample sizes of the adult second language acquirer groups in these nine studies, which ranged from 9 to 96 with a mean of 34.22, we could hypothesize that perhaps more than just a few outliers were able to achieve nativelike proficiency among the adult second language learners on the grammaticality competence measures. We need to note that the studies which reported a particularly high level of nativelike L2 grammatical attainment for adult-onset L2 learners administered the grammaticality competence task in the written mode alone (Birdsong, 1992; Johnson, 1992; White and Genesee, 1996; Urponen, 2004), which suggests that adult learners benefit from the additional processing time the written mode of task delivery affords them.
2.4. Promising Results for Adult-onset L2 Learners in the Area of Phrasal Semantics
Among studies that have tested age effects on ultimate L2 attainment, the most positive results for late-onset L2 learners have been reported in the domain of phrasal semantics. Studies in phrasal semantics involve interpreting the meaning of phrases that represent specific grammatical structures which need to be acquired in the target language. For example, an English speaker who is learning Spanish would have to acquire two different past tense forms, the imperfect and the preterite. Acquiring the forms themselves falls under morphosyntactic acquisition; however, the learner also has to acquire the semantic differences between the two different past tense forms. Only the learner who has successfully acquired the semantic features encoded in the imperfect and preterite forms can judge the felicitousness of the following two Spanish sentences:
(1) La clase era a las 10 pero empezó a las 10:30. “The class was [imperfect] at 10 but started at 10:30.” (logical)
(2) La clase fue a las 10 pero empezó a las 10:30. “The class was [preterite] at 10 but started at 10:30.” (contradictory)
(Montrul and Slabakova, 2003, p. 369.) Both fue and era correspond to the same English word, was, but in Spanish only era makes logical sense in the given context. This interpretation constitutes phrasal semantic knowledge. Slabakova (2006) reviewed 15 studies and argued that each study supported the claim that adult second language learners could acquire phrasal semantic properties if they managed the challenge that the underlying morphosyntax posed. In other words, semantic acquisition did not appear to be constrained by age of onset of learning, but it did depend on prior morphosyntactic acquisition, which constitutes a problem for adult-onset learners. For learners who were able to reach high morphosyntactic proficiency, the acquisition of nativelike phrasal semantics was attainable. For example, in Montrul and Slabakova (2003), 12 of the 17 subjects in the near-native speaker group achieved a score within two standard deviations of the native speaker controls on all phrasal
semantic measures, and the tasks employed were appropriately challenging, probing native level semantic competence. This is a striking result compared to adult learners’ morphosyntactic and phonological scores. A weakness of Slabakova’s argument is that phrasal semantic tasks are presented in the written mode; therefore, results may not reflect online semantic processing. Nevertheless, adult L2 learners’ rate of success on phrasal semantic acquisition tasks presents an anomaly to the critical period hypothesis.
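The criterion mentioned above for Montrul and Slabakova (2003), a score within two standard deviations of the native speaker controls on every measure, can be expressed as a simple check. The sketch below is illustrative only: the function name, the measure names, and all scores are hypothetical, and both the two-sided reading of "within" and the use of the sample standard deviation are assumptions rather than details taken from that study.

```python
from statistics import mean, stdev


def within_native_range(learner_scores, native_scores, n_sd=2.0):
    """Return True if the learner is within n_sd SDs of the native mean on every measure."""
    for measure, score in learner_scores.items():
        controls = native_scores[measure]
        # Sample mean and sample standard deviation of the control group (assumed).
        control_mean, control_sd = mean(controls), stdev(controls)
        if abs(score - control_mean) > n_sd * control_sd:
            return False
    return True


# Hypothetical native speaker control scores on two made-up measures.
native = {
    "task_a": [18, 20, 19, 17, 20, 18],
    "task_b": [27, 30, 29, 28, 30, 26],
}

print(within_native_range({"task_a": 17, "task_b": 27}, native))  # True
print(within_native_range({"task_a": 12, "task_b": 29}, native))  # False
```

Requiring the criterion to hold on all measures at once is stricter than applying it measure by measure, which is worth keeping in mind when rates of nativelike attainment are compared across studies.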
2.5. Neuroimaging Evidence Against Age of Onset Effects in the Lexicosemantic Domain
Support for the hypothesis that the lexicosemantic domain may not be subject to age of onset effects, or at least not in the same way as morphosyntax and phonology are, comes from neuroimaging research. I summarize briefly the findings of two lines of neuroimaging research that bear relevance to ultimate lexicosemantic attainment, one of which employed functional magnetic resonance imaging (fMRI) and the other event-related potentials (ERPs). Functional magnetic resonance imaging is an exciting technology that can provide 3-D mapping of the brain with detailed voxel counts of activated areas. Tasks that can be mapped accurately are silent tasks that require only a bare minimum of physical movement. Measurements involve taking images during an experimental task, mapping the location of brain activity, measuring the distance between activation centers, and/or counting the number of pixels in the digital image that appear to be activated. Typically, the activation during the experimental task is subtracted from an image taken in the same subject in a baseline condition. Despite the fascinating images fMRI studies produce, there are at least two main problems with the validity of their findings. First, experimental tasks have been too complex to be able to tell what specific
mental sub-processes are required and are being captured. Second, there is no way to differentiate between activation and inhibition in the images; in other words, we do not know whether a brain area lights up in activity because it is recruited for the task or because it is being inhibited to be able to perform the task. In some cases inhibition can cause a broader activation than the normal processing of a mental task. In addition, a major threat to the reliability of fMRI findings is the individual variation that is observed during task performance. (For a review of the methodological problems of fMRI studies see Chapter 6 of Paradis, 2004.) Kim, Relkin, Lee, and Hirsch (1997) examined 12 subjects, six early bilinguals and six late-onset bilinguals with a variety of first and second languages (L1 and L2) during a silent self-talk task. They measured the area of activation in Broca’s and Wernicke’s areas during L1 and L2 use and measured the distance between the centers of activation during L1 and L2 use. The mean AOA for the late-onset group was 11.2 years. The most important finding was that they observed an anatomical separation of early and late-learned languages in Broca’s area. In the late-onset bilingual group, there were no common voxels used during L1 and L2 processing in Broca’s area (Brodmann’s area 44), the area presumed to be responsible for morphosyntactic function. In contrast, in Wernicke’s area (Brodmann’s area 22), there was no anatomical separation between the L1 and L2 in either the early or the late-onset bilingual group. The conclusion was that the morphosyntactic processing of late-onset bilinguals differs anatomically from that of early bilinguals and that the same significant difference is not apparent in the anatomical area of lexicosemantic functioning. The fMRI images did not show a difference in the lexicosemantic processing of L1 and L2 of the early and late bilinguals. Wartenburger et al.
(2003) administered a grammaticality judgment task and a semantic judgment task to Italian-German bilinguals, who were divided into three
groups: early high proficient, late-onset high proficient, and late-onset low proficient. The 32 participants completed the tasks in both their L1 and L2 while inside an fMRI scanner. The early high proficient and late-onset high proficient groups did not differ significantly on any behavioral measures; the late-onset low proficient group performed significantly lower on all grammatical and semantic L2 measures. The fMRI images revealed differences between the groups that were not apparent from the behavioral data. During the grammatical judgment task, early high proficient and late-onset high proficient subjects differed in their activation patterns; the late-onset group produced significantly greater activation in their later-learned language. Their brain activation pattern was more similar to the late-onset low proficient group than to the early bilingual group. During the semantic judgment task, the two high proficient groups did not differ significantly in their activation patterns. The study suggests differential AOA effects in the domains of morphosyntax and semantics, with proficiency having a greater impact on the neuroanatomy of semantic processing than AOA. Brain activation can be indexed by electrical activity measured on the surface of the scalp. In the case of event-related potentials (ERPs), the electrical signal recorded on the scalp is first amplified, then averaged. The result is a continuous record of positive and negative changes that happen in different brain regions millisecond-by-millisecond during experimental tasks. Researchers have found that particular mental activities elicit characteristic electrical signals in the brain. For example, when a subject listens to a semantically inappropriate word in a sentence, the noticing of the semantic violation results in a negative wave that peaks 400 milliseconds later in the centroparietal region. This response to a semantic violation is called the negative 400 effect (N400). 
Grammatical violations also result in a typical electrical response, a positive wave that peaks 600 milliseconds after hearing the error; this response is known as the positive 600
effect (P600). Another characteristic wave is a negative wave that peaks 280 milliseconds after hearing a content word and is registered over the left anterior region, the hypothesized location of the mental lexicon (N280). Not all subjects are equally sensitive to the linguistic anomalies in the input; individual variation is quite common, particularly with tasks that make a demand on working memory. The occurrence, timing, and amplitude of a wave elicited by a stimulus are assumed to index differences in mental processing. (Osterhout, McLaughlin, and Bersick, 1997) ERP studies that compared the language processing of monolingual native speakers, early and late-onset bilinguals have found differential results for syntactic and semantic processing. Weber-Fox and Neville (1996) examined 61 Chinese-English bilinguals whose exposure to English started at different ages: 1-3 years (n = 15), 4-6 years (n = 13), 7-10 years (n = 10), 11-13 years (n = 13), and after age 16 (n = 10). Data included proficiency and language use self-report, the results of four standardized language tests, accuracy scores of a syntactic and semantic judgment (which subjects read off a monitor), and ERPs recorded during the syntactic and semantic judgment task. The late-onset bilinguals as a group (age at which English learning began >11years) scored below the native speaker level on every behavioral measure. Although the behavioral measures indicated that early and late bilinguals were different, the ERP recordings exhibited an important similarity between the semantic processing of early and late-onset bilinguals. Both early and late-onset bilinguals’ ERP data showed the occurrence of the same N400 effect in response to semantic violations that is observed in monolingual native speakers; in other words, an important index of semantic processing was nativelike in all bilinguals, regardless of age at which language learning began. 
There was a small difference, however; the N400 effect peaked slightly later in late-onset bilinguals (>11 years), which indicated somewhat slower processing. In contrast, the
ERPs elicited by syntactic violations were markedly dissimilar from monolingual native speakers starting at a much earlier age of learning. The amplitude and distribution of the P600 effect was non-nativelike beginning with 4-6 years of delay in exposure. Weber-Fox and Neville concluded that their findings support a differential effect of delay of exposure in the semantic and syntactic language subsystems, with the semantic domain promising qualitatively more nativelike processing even for late-onset second language learners. Hahne and Friederici (2001) conducted a similar study with 12 adult-onset Japanese-German bilinguals. Subjects listened to four types of German sentences: correct, semantically incorrect, syntactically incorrect, both semantically and syntactically incorrect. In response to semantic violations, the late-onset bilingual subjects exhibited a similar N400 effect to native speakers, although correct sentences triggered a response with higher amplitude than that of native subjects, which indicated that late bilinguals were experiencing greater difficulty during the task than native speakers. The bilinguals in this study did not exhibit the typical native speaker response to syntactic violations, the early anterior negativity and the P600; in addition, their response to sentences with both syntactic and semantic violations were altogether nonnativelike. The non-nativelike response to syntactic violations was partly due to the aural mode of the task, partly to the relatively low grammatical proficiency of the subjects, and partly to the late age of onset of L2 learning. Hahne (2001) repeated the above experiment with a group of 16 German native speakers and 16 Russian-German bilinguals. 
The Russian-German bilinguals had resided in Germany on average three times as long (12-204 months, mean 66 months) as the Japanese-German bilinguals (Hahne and Friederici, 2001) and had studied German in a formal setting for almost three times as long on average (2-168 months, mean 84
months). They all started learning German after age ten, but on average earlier than the Japanese-German bilinguals. As with Weber-Fox and Neville (1996) and Hahne and Friederici (2001), the subjects in this study also showed the nativelike centroparietal N400 effect in response to semantic violations, although with a slightly later peak than the native speakers, with higher amplitude in the semantically correct condition, and lesser difference between the correct and incorrect conditions than the native speakers. These highly proficient subjects, however, also showed the P600 effect in response to syntactic violations, albeit with a later peak, less amplitude, and lesser difference between the correct and incorrect condition than native speakers. Hahne concluded that although nativelike semantic processing may be more accessible for late-onset bilinguals, nevertheless, nativelike syntactic processing may also begin to develop as higher grammatical proficiency is achieved, primarily through formal language study. Sanders and Neville (2003a) introduced a new aural experimental task. They recorded sentences that were identical in contour but represented different conditions: semantically and syntactically nonsensical, syntactically meaningful but semantically nonsensical, both semantically and syntactically meaningful. They recorded ERPs that were elicited by specified words/nonwords in the experimental sentences. Subjects were 18 Japanese-English bilinguals, who began learning English after age 12 in a classroom setting, arrived in the United States after age 16, and had a minimum of 3 years LOR at the time of testing. The control subjects were 18 monolingual English speakers (Sanders and Neville, 2003b).
The late-onset bilingual subjects, once again, showed the nativelike N400 effect in response to target words which were actual words; they differed from native speakers in that they showed greater negativity for nonwords, and consequently a lesser amplitude difference between actual words and nonwords (as with Hahne, 2001, or Hahne and Friederici, 2001). This difference signaled that bilinguals attempted a
lexical search for nonsense words. However, the most important difference between native English speakers and late-onset bilinguals was that the latter group did not modulate their response in the syntactically correct and syntactically nonsensical conditions. They showed similar processing effects for nonwords whether or not those were presented in a syntactically possible context, which was a significantly nonnativelike response. In brief, ERP studies have indicated that syntactic processing is non-nativelike in late-onset bilinguals (with perhaps an effect for formally acquired grammatical proficiency and L1 (Hahne, 2001)), while semantic processing may be nativelike at least in those aspects that depend less on syntactic processing. Late-onset second language learners are showing a slower and more effortful response to open-class words and semantic violations than native speakers do; nevertheless, their language processing is much more nativelike in the lexicosemantic area than in the morphosyntactic domain. These studies suggest that the lexicosemantic domain is less vulnerable to delays of exposure to the target language than the morphosyntactic domain.
2.6. The Lexicosemantic Domain and the Lexicon
The lexicosemantic domain the previously cited neurolinguistic studies referred to is a collective term used in contrast to morphosyntactic processing. The term refers to a rather broad domain of language that is assumed to encompass everything but phonology, grammar, and pragmatics, in other words every aspect of language that is associated with encoding and interpreting meaning. It is not difficult to argue that such a broad term is necessarily too vague to be meaningful. The discipline of linguistics generally separates the lexicon from semantics, with lexicon referring to the words that exist in a given language or the words that a speaker knows and semantics indicating
the process by which meaning is encoded and decoded in language. To be clear, the finding of the previously cited neurolinguistic studies is that adult-onset second language learners who are proficient users of their L2 encode and decode the meaning of messages very similarly to native speakers as far as the neurological process can be captured by existing technology and techniques. This finding, of course, does not make any predictions for the ultimate semantic attainment or the eventual size of the target language lexicon. It only suggests that those areas need to be carefully looked at in order to assess the full potential of adult-onset second language acquisition because those areas may allow nativelike results. I will not attempt a discussion of all that may be entailed in the lexicosemantic domain, but will limit this study to the traditional demarcations within the fields of applied linguistics and second language acquisition, where typically the lexicon is studied independently from semantics. By lexicon, I will refer to the words that an individual knows on any level or with any degree of familiarity.
2.7. Models of the Lexicon
A large number of models exist regarding what the lexicon is and how it is
connected to other linguistic systems within an individual. Singleton (1999) offered a valuable critical review of some of the influential models, such as Morton’s logogen model (Morton and Patterson, 1980), Marslen-Wilson’s cohort model (Marslen-Wilson, 1990), Forster’s search model (Forster, 1989), Levelt’s blueprint (Levelt, 1989), Fodor’s modularity model (Fodor, 1983, 2000), and the brain metaphor model offered by connectionist researchers (Rumelhart and McClelland, 1986). For a discussion of bilingual models of the lexicon, see Murre (2005), Kroll and Sunderman (2003), and Kroll and Tokowicz (2005). I provide an overview of both monolingual and bilingual models in Table 1.
Table 1
A Summary of Better-known Models of the Lexicon

Model of the lexicon: Logogen model
Theoretical framework: Information processing
Synopsis: The mental lexicon is composed of words that are encapsulated in information units (logogens) and processed as a unit within the various interface systems (phonological input/output, visual input/output).
Source: Morton and Patterson, 1980

Model of the lexicon: Cohort model
Theoretical framework: Information processing
Synopsis: Words are connected into larger cohorts based on acoustic relatedness. Entire cohorts are activated at once; word recognition occurs at the point when all non-matches are turned off and only one activated match remains.
Source: Marslen-Wilson, 1990

Model of the lexicon: Search model
Theoretical framework: Information processing
Synopsis: Words in the mental lexicon have various peripheral access files that allow them to be searched by their phonology, orthography, syntactic and semantic relationships. These access files contain pointers to a master entry of the word. Once the master file is accessed, various operations with the word become possible.
Source: Forster, 1989

Model of the lexicon: Blueprint model
Theoretical framework: Adaptive control of thought (Anderson, 1983)
Synopsis: The language processing system includes declarative (‘knowledge that’) and procedural (‘knowledge how’) components. One of the declarative components of language is the lexicon, which has a central role as a mediator between the speech comprehension and message encoding systems.
Source: Levelt, 1989

Model of the lexicon: Modularity model
Theoretical framework: Modularity theory
Synopsis: Mental functions in the mind are highly specialized; the architecture of these specialized units (modules) is innate. The lexical network, as other modules, is informationally encapsulated. The connection of lexical items is non-semantic, based on general contextual effects.
Source: Fodor, 1983, 2000

Model of the lexicon: Brain metaphor
Theoretical framework: Connectionism
Synopsis: Neural patterns are created based on statistical learning, which is the forming of patterns that are the result of constant computation of the input. The lexicon is a neural net with massive connections that have been generated by statistical learning.
Source: Rumelhart and McClelland, 1986

Model of the lexicon: Town map analogy
Theoretical framework: Connectionism
Synopsis: The brain is an organically evolved multidimensional organizational system (town map) in which lexical items are coded and arranged differently within the various linguistic and cognitive subsystems (towns). The primary relationship between lexical items is semantic.
Source: Aitchison, 2003

Model of the lexicon: Hard drive metaphor
Theoretical framework: Minimalist program (Chomsky, 1995)
Synopsis: The mental lexicon is lexical items put in long-term storage, as if data stored on a hard drive; it is not an elegantly organized system; some items are centrally organized, others are scattered on the peripheries.
Source: Jackendoff, 2002

Model of the lexicon: Boolean network model
Theoretical framework: Random autonomous Boolean networks (Kauffman, 1993)
Synopsis: The lexical network operates autonomously based on a very simple predetermined pattern of response to input. The network quickly stabilizes into a state (attractor state) in which a small number of words are permanently activated.
Source: Meara, 1999

Model of the lexicon: The bilingual adaptation of Levelt’s blueprint
Theoretical framework: Adaptive control of thought
Synopsis: In bilinguals both languages may be simultaneously activated in the mental lexicon on the lemma level. A lexical checking device controls whether the L1 or L2 encoding gets further processing and is eventually articulated.
Source: Poulisse and Bongaerts, 1994

Model of the lexicon: Revised hierarchical model
Theoretical framework: Competition model (Bates and MacWhinney, 1989)
Synopsis: The L1 lexicon, L2 lexicon, and conceptual representations form an asymmetrical triangular relationship, where L1 is strongly tied to concepts and L2 is strongly tied to L1. With developing proficiency the L2 lexicon’s word-to-word relationships weaken as new word-to-concept relationships develop.
Source: Kroll and Stewart, 1994

Model of the lexicon: Declarative/procedural model
Theoretical framework: Hybrid of connectionist and nativist/modular linguistic theory
Synopsis: The lexicon and grammar are subserved by distinct neural subsystems. Computational processes are characteristic of specific neural subsystems. The lexicon is part of the associative memory system (declarative memory), which characteristically computes statistical patterns; grammar is embedded in the procedural memory system, which performs symbolic, rule-based manipulations.
Source: Ullman, 2001

A very promising emerging theoretical framework for the mental lexicon (and language in general) based on neurolinguistic evidence is Ullman’s (2001) declarative/procedural model. Ullman’s model is a hybrid modular-connectionist model. Ullman posits that language is mainly accommodated by two distinct
subsystems of the brain, the associative memory system of the temporal lobe and the rule-computing system of the left frontal/basal-ganglia. The two systems are modular, distinct in their neuro-anatomy, and perform domain-specific computational processes. However, neither system is specific to language; both the associative memory system and the rule-computing system serve other domains. The associative memory system uses a probabilistic computation to learn, map, and store information such as facts, events, faces, and words. This system specializes in arbitrarily related information and is sensitive to frequency of input as well as co-occurrence (as for example in phonological neighborhood effects). Declarative memory is another term used for this system because the content of this memory can be subject to conscious recollection. The rule-computing system is also called the procedural system; its neural operations are similar to each other, but different from those of the associative memory system. The procedural system includes the learning and performing of habits and skills, simple to complex motor acts, such as moving, walking, driving, speaking, processing grammar, and using cognitive skills. The kind of learning that takes place in the procedural system is implicit (nonconscious) and involves symbol manipulations via rules and constraints. In Ullman’s hybrid model the mental lexicon is one of the functions of the associative memory system in the left temporal lobe, and morphosyntactic tasking is a function of the procedural system of the left frontal/basal-ganglia. Consequently, the mental lexicon is built by “an associative memory of distributed (but structured) representations” (Ullman, 2001, p. 38) that is not unique to language, rather it is the process used for organizing and processing events and facts that humans come across. 
Operations that involve symbolic manipulation of words (or events, facts), such as sequencing and structuring, take place in a different neural substrate, in the procedural system of the frontal lobe.
In the above sense, Singleton’s prediction (1999) that the connectionist model would be productive in creating a model of the mental lexicon is supported by Ullman, while the Fodor/Jackendoff modularity hypothesis is upheld to an extent. However, the mental lexicon is not just a “hard drive” for storing facts learned by rote memorization; by Ullman’s model, it is an associative system weighted by the frequencies and probabilities in the input. The mental lexicon computes the distribution of lexical items, creates and stores mappings, and generalizes mappings to new similar contexts without generating rules. What may look like rules here are nothing more than recognized and stored patterns. However, real rules (and constraints) are computed elsewhere; symbol manipulation and transformations do actually take place in the mental grammar in the frontal lobe; the language system is more than what can be computed by the associative memory system. In Ullman’s model, the connectionist hypothesis does not satisfy all of language; the connectionist model of functioning is limited to specific brain functions. Humans also have a procedural system, a rule-based brain system to rely on, which is a resource to language as well. This function however resides outside the mental lexicon. Ullman’s (2001) declarative/procedural model proposal emerged primarily from his synthesis of psycholinguistic and neurolinguistic evidence, which includes lesion studies of patients with aphasia, Alzheimer’s disease, Parkinson’s disease, Huntington’s disease, studies on Specific Language Impairment and Williams syndrome, as well as laboratory studies using electroencephalography (ERPs), magnetoencephalography (MEG), and neuroimaging with PET and fMRI. Studies published since Ullman’s (2001) proposal have added to his hypothesis. In an event-related fMRI (ER-fMRI) study Beretta et al. 
(2003) mapped the lexical activation of regular and irregular word forms (controlling for frequency) and found a broad distinction between the regular and irregular forms, with irregulars causing
greater brain activity in both hemispheres. Brain images they published show that both regular and irregular forms caused activation in both the left temporal lobe (an associative memory area, location of the mental lexicon) and the frontal lobe (procedural system, location of the mental grammar). The activation for irregular forms was greater in both language areas. One possible explanation is that when the brain proceeds to produce a past tense form two parallel processes happen: (1) the mental lexicon searches for a stored form, (2) the mental grammar performs an operation to produce the form with a rule. If the mental lexicon can produce a form, the form produced by the rule-based operation is inhibited. Inhibiting the rule-based production requires greater mental effort than going with the automatized rule-based flow. Connectionism cannot account for this phenomenon, but both Ullman’s model and nativist/modularity theory can. In summary, the most elegant model of the mental lexicon, and the one with the best explanatory value, appears to be Ullman’s declarative/procedural model, which claims that the mental lexicon is located in the temporal lobe embedded in the neural substrate of the associative memory system. The associative memory system underlies not just the mental lexicon but also the probabilistically weighted mapping, learning, and storing of arbitrarily related information, such as facts and events. Therefore, the neurological processes that underlie the mental lexicon are not unique to it. The mental lexicon, like the associative/declarative memory system, is subject to consciousness; its content can be discussed. The mental lexicon processes lexical information in parallel with the mental grammar, which performs automatized, rule-based operations and is located in the left frontal/basal-ganglia.
The mental lexicon and grammar are in communication; as a result rule-based automatic operations are inhibited when searches are successfully satisfied by the mental lexicon.
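The parallel-route account summarized above can be illustrated with a minimal sketch. The Python fragment below only illustrates the logic of the explanation; it is not an implementation drawn from Ullman (2001) or Beretta et al. (2003). The three-verb word list, the function names, and the simplified spelling rule are all hypothetical assumptions introduced here for illustration. Both routes are computed for every verb, and the rule-generated form is discarded (inhibited) whenever the lexical search returns a stored form.

```python
from typing import Optional

# Hypothetical stored forms standing in for the associative (declarative) lexicon.
IRREGULAR_PAST = {"go": "went", "sing": "sang", "bring": "brought"}


def lexical_lookup(stem: str) -> Optional[str]:
    """Declarative route: search memory for a stored past tense form."""
    return IRREGULAR_PAST.get(stem)


def rule_based_past(stem: str) -> str:
    """Procedural route: apply the regular -ed rule (spelling deliberately simplified)."""
    if stem.endswith("e"):
        return stem + "d"
    return stem + "ed"


def past_tense(stem: str) -> str:
    """Run both routes; a successful lexical search inhibits the rule output."""
    stored = lexical_lookup(stem)      # route 1: memory search
    computed = rule_based_past(stem)   # route 2: rule application, always computed
    if stored is not None:
        return stored                  # the rule-generated form is discarded
    return computed
```

Under these assumptions, past_tense("go") returns the stored form "went", whereas past_tense("walk") finds no stored form and returns the rule-generated "walked".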
2.8. Lexical Attainment: The Dimensions of Lexical Knowledge
Lexical attainment refers to the word knowledge that has been acquired by an
individual. The two primary dimensions of lexical knowledge are generally considered to be the breadth and depth of vocabulary (Nassaji, 2004; Read, 2004, 2000; Vermeer, 2001; Qian, 1999, Nation, 1990), although breadth has received the most attention by far. Breadth refers to the quantity of accumulated vocabulary and depth to the quality of word knowledge. To be clear, breadth and depth of word knowledge are not in opposition, but are two dimensions of the same construct (Vermeer, 2001). For a third dimension of lexical knowledge, several constructs have been suggested, such as fluency or automaticity of access (Meara, 1996; Laufer and Nation, 2001), mastery (Nation, 2001; Henriksen, 1999), or strength (Laufer and Goldstein, 2004). Each of these constructs is related to the level of access an individual has to the acquired word knowledge. The reason why dimensions of lexical knowledge have been suggested is that there is so much to know about a word. For example, Nation (2001) compiled a handy inventory about what can be known about a word; he grouped aspects of word knowledge into three categories: form, meaning, and use. By knowing the form, Nation meant familiarity with the spoken, written form of a word, as well as knowledge of word parts. Knowing the meaning of the word entails understanding the referent the word is associated with, being able to place the word in a concept and having additional associations to go with it. By knowing the use of the word, Nation meant familiarity with grammatical patterns that match the word, remembering its collocations (other words that typically co-occur with the word), as well as socio-linguistic and pragmatic constraints that govern the use of the word (such as frequency, dialect, or register). In his word knowledge framework, Qian (1999) included pronunciation and spelling,
morphological properties, syntactic properties, meaning, discourse features, and frequency of use. The important issue here is that “knowing a word” is a complex matter because there is a great deal to know about a word and there are many different avenues to gaining that knowledge (visual, auditory, cognitive, semantic, pragmatic and so on). Word knowledge may be similar to knowing a person, which can be as basic as recognizing a face, a voice, or a figure at a distance and vaguely recollecting that it is familiar from somewhere, or it can be as complex as knowing the name, life history, family relationships, likes and wants of a person, knowing how to read his moods or bring the best out in him. In the brain we can map the location where face recognition occurs; it happens where the temporal lobe underlies the occipital lobe, that is, it happens on the border of the associative memory system and the visual cortex. Other knowledge about the persons we recognize is likely networked here but stored elsewhere. By analogy, we can imagine that the kind of knowledge actually contained in the mental lexicon is limited. Not all that we can know about a word is strictly lexical knowledge. Incidentally, the mental lexicon has developed in close proximity to the auditory pathways, which should suggest to us that word knowledge in the mental lexicon contains primarily auditory information. The most basic form of lexical knowledge then should be recognizing a sound string as having familiar meaning. Consequently, the most central aspect of lexical attainment is receptive vocabulary size in the aural assessment mode. In addition to the number of words known, we may also want to focus on depth of word knowledge by assessing those aspects of qualitative word knowledge that may indicate the degree to which words have been integrated into a lexical network. Although depth of word knowledge could mean familiarity with pronunciation,
morphological and syntactic properties, and discourse features as Qian (1999) and Nation (2001) pointed out, these aspects of word knowledge must necessarily overlap with other language domains (phonology, morphosyntax, pragmatics). For assessing ultimate lexical attainment we should limit the depth of word knowledge concept to features that are not confounded with phonological, morphosyntactic, and pragmatic attainment, as far as possible. A good and tried possibility for this is using word associations as a measure of depth of word knowledge.
2.9. Inconclusive Findings on Age Effects and Ultimate Lexical Attainment
Compared to the large body of research devoted to investigating age effects in
the domains of phonology and morphosyntax, far less behavioral evidence exists on the relationship between age of onset and eventual L2 lexical attainment. I am aware of only six studies that investigated the topic (Marinova-Todd, 2003; Spadaro, 1998; Lee, 1998 qtd. in Long, 2007; Kim, 1997; Bahrick et al., 1994; Hyltenstam, 1992/1988), and taken together, their findings are inconclusive. The study most frequently cited in support of maturational constraints on L2 lexical acquisition is Hyltenstam (1992/1988); unfortunately, the study had many methodological weaknesses. Hyltenstam (1988) compared the lexical proficiency of monolingual and near-native bilingual 17-18-year-olds at a Swedish high school, and found some significant between-group differences on a few measures, but no consistent differences in overall lexical proficiency. AOA data were not presented in this version of the study. A major methodological weakness was that the study attempted to compare ten variables on three groups with a total of 36 subjects. Hyltenstam (1992) presented the same study, but reanalyzed the data. To the lexical measures he added grammatical measures as well; however, he reduced the number of variables to four (written lexical
errors, oral lexical errors, written grammatical errors, and oral grammatical errors). To demonstrate age effects, he lumped all four variables into just one composite variable, errors. By this method he could show that the error scores of bilinguals with an AOA of greater than 7 years did not overlap with the scores of native speakers. Hyltenstam’s method of data presentation and analysis does not allow the reader to make inferences about the role of lexical errors in nonnativelike late-onset attainment. The late-arrival groups’ errors were not reported separately. In addition, because length of residence (LOR) data were missing, we cannot determine whether the late-arrival group had sufficient time for L2 acquisition. At least one of the subjects was reported to have arrived after puberty (Hyltenstam, 1992, p. 355) and therefore could not have resided in the target country longer than a few years. To sum up, this study did not actually examine the effect of age of onset of L2 acquisition on ultimate attainment because the participants had not reached their eventual attainment in the area of L2 lexicon. Moreover, the variable examined, lexical errors, is not necessarily a valid measure of nonnativelike lexical achievement. In fact, some of the errors classified by Hyltenstam as lexical errors were spelling and phonological errors. In addition, as Birdsong (2006) pointed out, minor errors in the second language are not necessarily deficiencies; they could be simply an artifact of having learned a related fact in another language already. Kim (1997) set up an experiment similar to Johnson and Newport (1989) to study the relationship of age of arrival to the target country (AOA) and ultimate L2 lexical attainment. Kim tested 70 Korean-English bilinguals with varying AOAs: 10 participants began acquiring English at age 0-2, 20 participants at age 3-5, 10 at age 6-8, 10 at age 12-14, and 10 after age 15.
All participants had resided in the United States for a minimum of five years and were 18-26 years old at the time of testing. The native English speaker control group consisted of 10 bilinguals (English-French, English-German, English-Korean, English-Polish). In the experimental task, Kim used a prime-target paradigm. Participants read word pairs (prime and target) off a computer screen and had to decide whether the target was an actual English word or a non-word. Non-words were designed to differ in only two letters from actual words. Two variables were measured: reaction time and accuracy. The mean reaction time for lexical decision exhibited a stretched Z-shaped trend with a significant decline in reaction time beginning at an AOA of age 5 and the curve of decline flattening after an AOA of age 12. There was a significant interaction for prime-target type, which indicated that age-related decline in lexical decision time was greatest for non-word targets. For the accuracy of lexical decision variable, the only significant between-group difference observed was for the late-arrival groups (12-14, 15+) and only in the non-word condition. While Kim’s study (1997) appears to support the existence of maturational constraints in the lexical domain, many questions remain. The most important question is what Kim was measuring with the reaction time and accuracy of lexical decisions. The prime-target task Kim used is more typically used to make inferences about the organization of the mental lexicon in bilinguals. It is unclear what inferences we can make about ultimate L2 lexical attainment based on this particular task. The real words in the experiment were low frequency words likely to be known to a moderately proficient English as a second language learner (for example, moon-stars, bread-butter, dark-light). Therefore, little can be inferred about L2 lexical size from this study. That late-onset bilinguals took longer to perform a lexical search to be able to decide whether a word exists in the target language may or may not have relevance for ultimate lexical attainment.
If anything, it is encouraging to see that late-onset second language learners were more likely to be nativelike in the semantically meaningful task condition, that is with the semantically primed targets. The relevance of reaction times in non-natural
lexical decision tasks (reading nonsense words off a screen) for eventual L2 lexical attainment is unclear and has little support in the literature. In a second experiment, Kim (1997) administered a grammaticality judgment task with measured reaction times, the content of which he modeled after Johnson and Newport (1989). Kim computed correlations between subtask scores and AOA. The correlations between the grammar scores and AOA were much higher than they were between the lexical measures and AOA. When Kim partialed out LOR, significant partial correlations between grammar subtask reaction times and AOA remained (with one exception, prepositions); however, this was not the case for lexical measures. After partialing out LOR, lexical measures and AOA no longer correlated significantly, except in the reaction time scores in the non-word condition, where AOA was responsible for 15% of the variance. In contrast, AOA was responsible for 21% of the variance in the mean grammaticality judgment reaction time scores. These results could perhaps indicate that AOA may have a lesser influence in the lexical area than on the grammatical domain. Spadaro (1998) investigated the relationship between age of onset of learning (AOL) and ultimate attainment of lexical knowledge on 38 highly-proficient bilinguals from a variety of L1 backgrounds, who were divided into three groups based on their AOL (0-6, 7-12, 13+). She operationalized ultimate attainment of lexical knowledge as total score on a battery of lexical tests. She found no significant between group difference on a word association test she administered (Kent-Rosanoff). However, she found significant between group differences on her self-designed tasks, which she interpreted as support for the existence of maturational constraints on L2 lexical acquisition starting at the age of 6.
A key problem with the Spadaro (1998) study was with the participant selection and grouping. First, the study did not have a clear LOR requirement, and some individuals who participated had resided in Australia for only a short time, as little as 2-3 years. Eight of the subjects had an LOR of less than 10 years. Second, no formal measure was employed to support the claim that the participants were in fact near-native in their L2. Two of the participants had no score within the NS range on any of the subtests; therefore, we need to question whether they should have been included in the study. Third, the measurement of the independent variable, age of onset of learning, was clearly problematic. Spadaro wrote, “‘Age of Onset of Learning’ represented the beginning of a serious and sustained process of language acquisition, usually as the result of either migration to an English-speaking community or the commencement of a formal English language programme in primary or high school” (Spadaro, 1998, p. 88). In the case of several individuals, questions arise. How could individuals have started formal English training at age 2 or 4 if they did not migrate to an English-speaking community until 14 and 28, respectively? Unfortunately, measurements that appeared problematic were left unexplained. Age at which immersion in the L2 began would have been a more appropriate variable. Fourth, there were inconsistencies in the group memberships; Spadaro noted 13 members for the 0-6 group, 15 for the 7-12 group, and 10 for the 13+ group. According to her data set, she had 14 members with an AOL of 0-6, 15 with an AOL of 7-12, and 9 with an AOL of 13+. This discrepancy is also unexplained. A closer look at the data set (pp. 85-87) suggests that if participants were grouped by age at which acquisition began in an immersion environment (that is, age on arrival corrected for when substantial interaction in an English-speaking environment actually began), the appropriate grouping likely should have been 10 for the 0-6 group, 7 for the 7-12 group, and 21 for the 13+ group. Overall, unfortunately, the measurement problems in Spadaro (1998) make the between-group findings unreliable.
On a case-by-case basis, we might note, however, that there is at least one late-onset bilingual participant in the Spadaro (1998) data set who achieved within the native speaker range on every single subtest. This person was a 71-year-old female, a native speaker of Hungarian, who began to study English at age 16 and migrated to an English-speaking community at age 25, where she resided continuously for 46 years. A special mention and analysis of this case would have been helpful to the discussion. As Hyltenstam and Abrahamsson (2003) pointed out, the existence of at least one individual who is indistinguishable from a native speaker and whose L2 acquisition began after the hypothesized critical period is counter-evidence for the critical period claim. Spadaro’s hypothesized critical period for lexical acquisition is age 6; nevertheless, four of her subjects whose acquisition began after age 6 scored within the native speaker range on every single subtest, and a fifth subject was just one point short of that native level achievement. Therefore, Spadaro’s data do not constitute strong evidence for her claim that age 6 is a biological limit for native level ultimate attainment in the lexical domain.
Regarding Spadaro’s study, it is worth examining the question “What is it that native speakers know about the lexicon, which eludes nonnative speakers?” Tasks five and six were particularly challenging to nonnative speakers; in particular, nonnative speakers were distinguished from native speakers by not being able to supply a phrase in which the following words occur: beck (beck and call), gab (gift of the gab), muchness (much of a muchness), jinks (high jinks), kilter (out of kilter), askance (regard him askance) (p. 200, p. 177).
In addition, the majority of NNSs could not identify the unusual word in the following phrases: get it off her heart (chest), keep it under your jumper (hat), likes to throw his size (weight) around (p. 201, p. 178). What appears to challenge nonnative speakers is multi-word units or idioms that are stored as chunks. Regarding the validity of this measure, we would need to ask whether the acquisition of these idioms depends on frequency and whether these idioms appear in the output to nonnative speakers as frequently as they occur in the interaction of native speakers. The suggestion that the effect of a biological constraint would be an inability to store multiword chunks is very peculiar. A better explanation for the phenomenon would be an age-related decline in working and verbal memory.
Long (2007) reported on Lee (1998), a study that replicated Spadaro’s (1998) research with 45 Korean-English bilinguals with varying age of onset (AO) and 15 English monolinguals. All participants were university students between the ages of 20 and 25. Both LOR and L1 use differed significantly across groups. Although Long’s presentation of the study does not allow for a detailed analysis, it appears that after LOR was partialed out, the only significant correlation between AO and lexical test score was on the third task, which Long described as a collocation task. (Participants needed to choose one of four words to complete a sentence, such as “If you really trust him, you should give him the _________ of the doubt.”) Long did not report on the items that actually differentiated native English speakers and child bilinguals from late-onset second language learners. Also, the minimum LOR requirement was only 5 years and the proficiency requirement was set at 550 on the paper and pencil version of the Test of English as a Foreign Language (TOEFL). This score represents the minimum acceptable proficiency for beginning university level studies at a moderately selective educational institution in the United States. Consequently, it is unlikely that this study would have been able to measure the maximum possible level of lexical attainment in the late-onset second language learner group.
Marinova-Todd (2003) compared 30 highly proficient late-onset second language learners of English and 30 native speakers of English on a broader range of measures that included three lexical measures as well. For the L2 learner group, the age at testing was 24-53 (mean = 34) years, the LOR 5-20 (mean = 11) years, and the age at first significant exposure to the L2 16-31 (mean = 22) years. The study was exploratory in nature and did not control for L1. Participants came from 19 different native language backgrounds; the majority came from an Indo-European language background and several of them had a Germanic language background (German, Icelandic, Dutch). To assess receptive vocabulary size in English, Marinova-Todd administered the Peabody Picture Vocabulary Test – Revised (PPVT-R). To compare productive vocabulary, she took two measures on transcripts that resulted from an elicited speech task. These two measures were type-token ratio (TTR) and rate of low-frequency words. Although there was a highly significant between-group difference on the PPVT-R standard scores (taking into account the number of statistical comparisons performed), 57% of the highly proficient late-onset second language learners scored within the native speaker range on the PPVT-R. Eighty-seven percent scored within the native speaker range on the productive vocabulary measures, the ratio of low-frequency words measure and TTR. These results seem promising for adult-onset L2 lexical acquisition. Because of the large number of measures, the analysis of the productive vocabulary measures was not reported in detail. The TTR measure appeared somewhat problematic in two important ways: validity and reliability. As for validity, as Vermeer (2004) pointed out, the TTR measure is not a predictor of vocabulary size or even lexical richness, for which it is commonly used. It is simply the ratio of different words (types) to total words (tokens) in a given text.
For example, there are many factors that drive up TTR that do not have any implications for productive vocabulary size, such as omitting closed-class words (articles, pronouns, prepositions), using simple verb tenses rather than progressive and perfect tenses, discussing more topics, choosing a multi-word expression over a precise word (an animal that can live either on land or in water vs. amphibian), or making rhetorical choices (as in relying heavily on listings or trying to avoid repeating the same word). Secondly, as for reliability, the TTR measure has a known mathematical relationship to text length. This reliability problem has been solved by the development of a new lexical diversity measure, D, which can be calculated using a random sampling software program called VOCD. D is a more reliable TTR measure and it indexes how a person employs lexical resources, but it is not a direct measure of productive vocabulary size. On the other hand, the rate of low-frequency vocabulary is considered to be a better index of productive vocabulary size, but its calculation is still highly problematic. We do not have reliable frequency data against which to compare spoken American English. Various language corpora exist from which frequency data have been drawn, but all have sampling, dispersion, and noise issues that need to be taken into consideration when trying to assemble frequency wordlists. Although a detailed analysis may be forthcoming, Marinova-Todd (2003) did not discuss in any detail how she arrived at her rate of low-frequency word measure and therefore it is difficult to infer a decisive conclusion from her study regarding receptive vocabulary size, even though her figure of 87% nativelike L2 lexical production sounds outstanding for adult-onset language learners.
Bahrick et al. (1994), an experimental psychology study of 801 Cuban and Mexican immigrants, presented evidence which to date constitutes the strongest argument against critical period effects in the lexical domain.
This large-scale, carefully designed and meticulously piloted cross-sectional study looked at many different measures of both L2 achievement and L1 retention across the lifespan among late-arrival immigrants in the United States. Table 2 shows the distribution of participants according to age at immigration (AOA) and length of residence in the target country. Control subjects were monolingual Spanish speakers in Mexico and monolingual English speakers in the United States.
Table 2. Participants by Age at Immigration to the United States and Length of Residence in the United States (Bahrick et al. 1994, p. 268)
                                   Age at immigration
Years in the United States     10-13     14-17     18+     Total
0-2                               12        32      72       116
3-6                               28        44      31       103
7-15                              92        48      33       173
16-25                             86        64      50       200
26-37                             58        57      52       167
38+                               14        20       8        42
Total                            290       265     246       801
The experimental task included a battery of tests that were administered in both their Spanish (L1) and English (L2) versions. Great care was taken to construct the two versions of each corresponding test to be of equal level of difficulty. One lexical measure was a lexical decision task, in which participants had to differentiate between 18 real words and 18 nonwords (36 points maximum). The real words for the English version were drawn from a wide range of frequency levels from Thorndike and Lorge (1952). The second lexical measure was a 20-item multiple choice vocabulary test (the item selection for which was unfortunately not detailed). Figure 3 shows the mean test scores of English-Spanish bilinguals and English monolinguals. The particular English monolingual group was outperformed slightly on the lexical decision task after age 30 and caught up with on the vocabulary recognition task at age 40.
Figure 2. The mean scores on the two English lexical measures as a function of age at testing (Bahrick et al. 1994, p. 274).
The within-subject results indicated that the bilingual immigrants’ performance on the L2 lexical tasks continued to improve for 30 years after immigrating while their L1 lexical performance remained relatively stable during this time and only declined very late in their lifespan. After 30 years of residence in the United States, the bilingual subjects’ lexical performance in their two languages was near equal. Figure 4 shows the within-subject results on the Spanish and English lexical measures as a function of length of residence in the United States.
Figure 3. The mean scores on the English and Spanish versions of the lexical test as a function of length of residence in the United States (Bahrick et al. 1994, p. 271).
The study reported highly significant benefits for years of formal English education and percentage of L2 use on the lexical measures. The results of Bahrick et al. (1994) challenge the claim for the existence of maturational constraints on L2 lexical acquisition. The findings also suggest that for obtaining measures on ultimate L2 lexical attainment, a 30-year length of significant exposure to the target language may be necessary for late-arrival immigrant participants. In addition, the findings indicate that in lexical acquisition studies, controlling for education level is necessary in order to obtain meaningful results.
I have argued that the existing literature on eventual L2 lexical attainment is inconclusive. Hyltenstam (1988/1992), Kim (1997), Spadaro (1998), and Lee (1998, qtd. in Long, 2007) claimed that they found support for the existence of maturational constraints in L2 lexical acquisition, whereas Marinova-Todd (2003) and Bahrick et al. (1994) claimed that their findings contradicted the notion of a critical or sensitive period for L2 lexical acquisition and upheld the idea that native level L2 lexical attainment is achievable for adult-onset second language learners. From the existing behavioral studies, my conclusion is that crucial aspects of L2 lexical acquisition have not yet been studied in sufficient depth. The variables that attempted to operationalize eventual L2 lexical attainment have not adequately indexed ultimate L2 lexical size and depth of lexical knowledge. One key term here is the word ultimate. Achieving ultimate lexical attainment takes significant time, not a few years, but as Bahrick et al. suggest, perhaps 30 years. It may be that an adequate amount of functionally useful L2 vocabulary can be acquired within a few years, perhaps enough to be able to approximate native speakers’ everyday conversational output (Cummins, 1981), but not enough to approximate native speakers’ actual vocabulary size. Consequently, in order to determine ultimate L2 lexical achievement, we should be studying mature adults who have had decades of significant exposure to the target language. Moreover, we need to focus on central aspects of lexical attainment, such as size and depth, not occasional lexical errors, reaction time to differentiate words from nonwords, type-token ratio in the output, or somewhat incidentally chosen idioms and other multiword units.
As I will demonstrate in the following section, good methods exist for measuring functionally useful vocabulary size, which are also suitable for adult-onset second language learners. I will also present two instruments that have been validated to measure depth of word knowledge, Wesche and Paribakht’s (1996) Vocabulary Knowledge Scale and Read’s Word Associates Test (Read, 1993, 1998). We need to employ these existing tools to find more conclusive answers as to the possibility of nativelike ultimate attainment in the L2 for adult-onset learners.
2.10. The Lexical Attainment of Adult L1 English Speakers: Vocabulary Size
In defining vocabulary size as the number of words known to an individual, we
have at least two critical underlying issues to address, which are (1) what should count as a word and (2) what should count as evidence of knowing a word. Most vocabulary size studies settle the first issue by drawing on a predetermined list of words. For example, the best known vocabulary size test for children 8-30 months old is the MacArthur Communicative Development Inventories (CDIs), which includes a list of 700 prototypical lexical items, that is, words and short phrases, that very young children have been observed to typically know (Fenson et al., 1993). As vocabulary size grows, however, compiling similar “typically known” word lists becomes complicated. Such lists are generally collected from corpora that are supposed to sample language which the majority of people are assumed to encounter either in their everyday lives or during the typical course of study. (Some of the most scholarly examples are the College Board Vocabulary Study by Breland, Jones, and Jenkins (1994); the Brown corpus by Kucera and Francis (1967; Francis and Kucera, 1982); the General Service List by West (1953); and the Teacher’s Word Book by Thorndike and Lorge (1944).) Of course, what we consider typical everyday life or a typical course of study becomes highly problematic. Words from printed works that U.S.-educated people are supposed to have read are overrepresented in these corpora. Most lexical size tests for school-age children and youth are drawn from these types of vocabulary lists. For adult vocabulary size tests, the word lists to be sampled may come from an entire dictionary. In this way, vocabulary size is determined in proportion to all the words that are listed in the dictionary from which the sample was drawn. This method has its own complications because not all dictionary entries are equal; there are headings and sub-headings, base words and derived words, proper names, transparent compounds, homographs and homonyms, word parts, abbreviations, jargon, and esoteric lexical items that are highly specific to contexts which most people are never exposed to. For this reason, using a dictionary still requires an item-by-item evaluation of what should be included on the word list that will be sampled by the lexical size test. This evaluation is also necessary in order to be able to take a truly random sample of the wordlist.
The second underlying issue in determining vocabulary size is what should constitute evidence of word knowledge. On the CDIs, it is parental report of comprehension or active production. With another widely used test, the Peabody Picture Vocabulary Test (PPVT), evidence of word knowledge is being able to point correctly to one of four pictures, the one that corresponds to the word that was heard. Other common sources of word knowledge evidence are self-report, multiple-choice tests of synonyms, matching words to meanings on lists, supplying a definition either orally or in writing, and selecting an appropriate word to complete an utterance. Each option has its own validity and reliability issues. The type of evidence sought to evaluate word knowledge impacts the estimate of vocabulary size; it is challenging to compare studies that arrived at vocabulary size measures using different types of evidence for word knowledge. (For more details on the methodological challenges involved in measuring vocabulary size, see Lorge and Chall (1963), Nation (1993), and Wesche and Paribakht (1996).)
Based on a sampling of Webster’s Third New International Dictionary (1961), Goulden, Nation, and Read (1990) found that the mean receptive vocabulary size of college-educated adults in their sample was 17,200 base words (range = 13,200-20,700). Zechmeister and his associates conducted a series of studies based on stratified random sampling of the Oxford American Dictionary (Ehrlich, Flexner, Carruth, and Hawkins, 1980) (D’Anna, Zechmeister, and Hall, 1991; Zechmeister, D’Anna, Hall, Paus, and Smith, 1993; Zechmeister, Chronis, Cull, D’Anna, and Healy, 1995; Zechmeister, Morgan, Kruger, and Fash, 1998). Based on evidence from self-report on word knowledge, they estimated that junior high school students had a functionally useful receptive vocabulary of around 12,000 words, college students around 16,000 words, and educated older adults around 21,000 words (Zechmeister et al., 1995); however, when for evidence of word knowledge a multiple choice test was used with difficult distractors, the results were significantly lower: around 9,700 for junior high school students, 12,400 for college students, and 17,200 for older adults. The above vocabulary size figures come from monolingual American English speakers who have been educated in the US school system; therefore, we are looking at results with limited generalizability. Participants were tested on base words, that is, not derived words or transparent compounds; the wordlist contained functionally useful vocabulary as opposed to technical jargon, archaic and esoteric words. The higher estimates were the result of self-rating and they reflect that participants recognized the words as familiar, in any meaning, whether their own understanding of the words concurred with the dictionary or not. The results of the multiple-choice versions evidenced familiarity with one particular meaning of the word, give or take testing effects. 
We can assume that the actual size of the mental lexicon is larger than the measured functional vocabulary size because educated adults also have some specialty or technical vocabulary, words that are unique to their culture and profession; in addition, many transparent compounds, derived words, irregular forms, phrasal verbs, common phrases, and proper names are likely stored as chunks in the associative memory system.
What we can observe from the vocabulary size studies is that the mental lexicon continues to expand throughout the lifespan. Its growth starts with a few hundred items in the second year, perhaps reaching 1,000 words by the end of the third year. From that point on we could reasonably expect to add around 1,000 new words a year on average, with the growth tapering off as we reach adulthood or the end of our formal education. Nevertheless, the capacity to expand the mental lexicon appears to remain open to experience into the mature adult years. The size of the mental lexicon is in the low tens of thousands; its capacity is not endless. Words that are richly networked and intimately known are probably much fewer; their order is in the low thousands.
Although we have norm-referenced standardized vocabulary tests, we lack data about the normal distribution of actual or observed vocabulary size in the native language beyond 30 months of age. Standardized vocabulary tests do not allow us to infer actual vocabulary size because item selection on these tests was not based on a random sample of predefined word lists. However, we are able to observe some trends in vocabulary change by looking at raw scores on these standardized vocabulary tests, such as the Peabody Picture Vocabulary Test (PPVT). The PPVT was originally developed by Lloyd Dunn in 1959; in 1976 the test was revised (PPVT-R), then in 1997 a new edition followed (PPVT-III). Just recently a fourth edition was published by Pearson (PPVT-4). The test is a useful tool for comparing the receptive vocabulary of individuals from age 30 months to adult. In this test, test takers point to one of four pictures to identify the meaning of a word they hear.
The PPVT-4 had two national tryouts with 2,303 participants and two norming tryouts with 5,543 participants, who were chosen to match the 2004 U.S. Census on a number of variables (age, gender, education level/parents’ education level, proportion of special education designations, race, and geographic regional distribution). The examinees ranged in age from 30 months to 90 years. Details on the standardization, validity, and reliability measures for the PPVT-4 are available in Dunn and Dunn (2007). The test mean score is 100.1 and the standard deviation is 15.0. The standardization process did not control for country of birth, native language, or bilingual status; therefore, we cannot state with confidence that the mean score is the native U.S. English speaker mean, or more specifically the monolingual native English speaker mean. It is appropriate to interpret the mean score as the U.S. population mean. Norm tables are available for every age group. The test items on the PPVT-4 do not represent a random sample of a frequency wordlist or dictionary. The items were based on a selection from Webster’s New Collegiate Dictionary and several published word lists, but the actual selection was biased. Part of the bias had to have resulted from the fact that the test is a visual multiple-choice test in which items had to be illustrated in a reliable way. The illustrations had to be appropriate for different age groups, developmental levels, and cultural backgrounds. Because of the nature of the item selection, it would be inappropriate to try to infer actual receptive vocabulary size from the test results; however, we can note trends over the cross-sample of ages. We can observe change in vocabulary size over the lifespan vis-à-vis the test items. The rate of change in overall vocabulary size, however, cannot be reliably determined from PPVT score data. Figure 5 provides the smoothed curve of mean growth scale value scores (GSV) from age 18 months to 90+ years. Growth scale value scores were designed to measure change over time independent of which test form was administered.
From the PPVT-4 GSV scores we can see a rapid growth in receptive vocabulary test scores at a rate of 1 GSV point per month between the ages of 30 months (92) and 10 years (181). Between ages 10 and 14, the growth rate is 5 GSV points per year (181-199), which slows to 2.5 points per year from age 14 to 19 (199-211). Between ages 19 and 25 the average growth rate is 1 GSV point per year. The earlier version of the PPVT, the PPVT-III (Dunn and Dunn, 1997; Williams and Wang, 1997), as well as the Form B standardization sample of the PPVT-4 showed that growth in receptive vocabulary size continued until age 60. The same was not evident in the Form A standardization sample; however, the adult samples that took Form A and B in the PPVT-4 norming study may have differed on education level, verbal/cognitive abilities, or hearing ability. Overall, the PPVT-4 raw scores suggest (1) that vocabulary growth is a lifelong process and (2) that growth either tapers off at the end of the typical formal education years or that the PPVT is not especially sensitive to measuring the type of vocabulary growth that occurs later in adulthood. By and large, the PPVT-4 raw scores appear to support the vocabulary size trends observed by Goulden et al. (1990) and Zechmeister et al. (1995).
Figure 4. PPVT-4 mean growth scale values based on the standardization sample (data from Dunn and Dunn, 2007, pp. 50-51, 182-183).
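As a rough illustration of the arithmetic behind the growth rates discussed above, the following Python sketch recomputes them from the four mean GSV values quoted in the text (92 at 30 months, 181 at age 10, 199 at age 14, 211 at age 19). The names ANCHORS and growth_rates are hypothetical and are not part of the PPVT-4 materials; the calculation simply assumes linear growth between adjacent anchor points.

```python
# Mean GSV anchor points quoted in the text: (age in years, mean GSV score).
# 30 months is expressed as 2.5 years.
ANCHORS = [(2.5, 92), (10, 181), (14, 199), (19, 211)]


def growth_rates(anchors):
    """Return (start_age, end_age, GSV points per year) for each adjacent pair,
    assuming linear growth between the two anchor points."""
    rates = []
    for (age0, gsv0), (age1, gsv1) in zip(anchors, anchors[1:]):
        rates.append((age0, age1, (gsv1 - gsv0) / (age1 - age0)))
    return rates


if __name__ == "__main__":
    for start, end, per_year in growth_rates(ANCHORS):
        print(f"ages {start}-{end}: {per_year:.2f} GSV points/year "
              f"({per_year / 12:.2f} per month)")
```

Computed from these endpoints alone, the rates come out at just under 1 point per month up to age 10, 4.5 points per year for ages 10-14, and 2.4 points per year for ages 14-19, which is in line with the rounded figures given in the text.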
In summary, the best currently available instrument for the assessment of functionally useful receptive vocabulary size in English is the word list Zechmeister and his associates assembled based on random sampling of a word list that resulted from the item-by-item evaluation of entries in the Oxford American Dictionary (D’Anna et al., 1991; Zechmeister et al., 1993, 1995, 1998). This sampled list allows us to infer vocabulary size vis-à-vis the 26,901 items on the complete list. Another English vocabulary test which has been carefully developed and extensively tried is the PPVT. Although this test does not allow for estimating vocabulary size, it is administered in the aural mode and has available norm tables for the U.S. population up to age 90.
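The inference from a sampled test to the full list can be sketched as a simple proportion. The Python example below is only an illustration under stated assumptions: it treats the test items as a simple random sample of the 26,901-item list (the cited studies describe stratified sampling), and the function name, the normal-approximation confidence interval, and the example figures of 92 known words out of 200 sampled are hypothetical rather than taken from Zechmeister and his associates.

```python
import math


def estimate_vocabulary_size(known, sampled, list_size=26901, z=1.96):
    """Scale the proportion of sampled words known up to the full word list.

    Assumes a simple random sample of the list (an assumption made here for
    illustration). Returns (point_estimate, lower, upper), where the bounds
    come from a normal-approximation interval with a finite population
    correction.
    """
    if list_size < 2 or sampled < 1 or not 0 <= known <= sampled <= list_size:
        raise ValueError("need 0 <= known <= sampled <= list_size, list_size >= 2")
    p = known / sampled
    fpc = math.sqrt((list_size - sampled) / (list_size - 1))
    se = math.sqrt(p * (1 - p) / sampled) * fpc
    lower = max(0.0, p - z * se)
    upper = min(1.0, p + z * se)
    return round(p * list_size), round(lower * list_size), round(upper * list_size)


if __name__ == "__main__":
    # Hypothetical test taker: 92 of 200 sampled words known.
    print(estimate_vocabulary_size(92, 200))
```

The point estimate is the proportion known multiplied by the list size; the interval is a reminder that a short sampled test yields an estimate with a sizable margin of error.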
2.11. Testing Depth of Word Knowledge
Depth of word knowledge refers to the quality of word knowledge, how well
words are known. As Meara (1990, quoted in Wesche and Paribakht, 1996) pointed out, recognizing a word is the minimum we can refer to as word knowledge; therefore, recognition is the baseline indicator of word knowledge, which may be acceptable evidence for measuring the breadth of vocabulary. Consequently, any behavioral evidence beyond word recognition constitutes some depth of word knowledge, such as being able to supply a synonym, a translation, an example of usage, a collocation, a definition, or definitions of different meanings. It can mean the ability to identify erroneous usage or to supply the correct word in a specific context.
A common task for assessing depth of word knowledge is the word association task. The advantage of the task is that it is primarily a lexical task, in that it requires little in terms of phonological, morphosyntactic, or pragmatic knowledge of the language. There are several existing formats for this task. In one format the prompt word is supplied and associations are elicited by asking test takers to answer a series of questions about the prompt word (What is X? What can you do with X? What does X look like? Can you tell me more about X?) (Vermeer, 2001, modeled after Anglin, 1985). In another format, the prompt word is presented for an open response and any number of associations are allowed (Wolter, 2001; Schmitt and Meara, 1997). In the forced choice format, the prompt word is followed by a list of possible associations and test takers have to decide which words are related to the prompt word (Greidanus and Nienhuis, 2001; Read, 1993, 1998). Here is an example of the forced choice format from Read (1998):
critical
clear     dangerous     important     rough          festival     illness     time     water
The forced choice format is significantly more challenging than the open response version. The difficulty depends on the frequency level of the prompt words, the type of relationship between the prompt word and the associate, and the non-related distractors (Greidanus and Nienhuis, 2001). Still another variation was devised by Stallman et al. (1995, quoted in Pearson, Hiebert, and Kamil, 2007, p. 290); they used the same prompt word repeatedly with different sets of possible associations.
1. A gendarme is a kind of
   a. toy
   b. person
   c. potato
   d. recipe
2. A gendarme is a kind of
   a. public official
   b. farmer
   c. accountant
   d. lawyer
3. A gendarme is a kind of
   a. soldier
   b. sentry
   c. law enforcement officer
   d. fire prevention officer
The most established word association test for the assessment of depth of word knowledge is Read’s Word Associates Test (Read, 2004, 2000, 1998, 1993; used in Qian, 2002, 1999; adapted in Greidanus, Beks, and Wakely, 2005; Greidanus et al., 2004; Greidanus and Nienhuis, 2001). In this test, the prompt words were selected from among adjectives on 2000- and 3000-level word frequency lists; the assumption behind the selection was that these relatively high-frequency adjectives are likely to have a rich association network in the native English speaker mental lexicon. The test taker’s task is to select words from a list that have a lexical relationship to the target word, and thereby demonstrate in-depth knowledge of the target words. Three types of target word – associate word relationships are tested: paradigmatic (being a synonym or having a close meaning), syntagmatic (being a collocate or frequently co-occurring with the word), and analytic (being a part of the word’s definition). Words that are well known, that is, well integrated into the mental lexicon, are assumed to have stable associations of all three kinds. Validity and reliability data for the new version of the test are available in Read (1998).
Another commonly used instrument to measure depth of word knowledge is Wesche and Paribakht’s (1996) Vocabulary Knowledge Scale (VKS). The VKS is not a stand-alone test, but a generic instrument appropriate for testing any set of target lexical units. The VKS is much like a Likert scale; it allows test takers to respond to a target word in five different ways (indicating no knowledge, indicating familiarity, providing a synonym or translation while indicating the degree of certainty with the answer, and forming a complete sentence). Five possible scores may be assigned from the word knowledge evidence provided.
A score of 1 represents “no knowledge at all”; 2 “the word is familiar but the meaning is not known”; 3 “a correct synonym or translation is given”; 4 “the word is used with semantic appropriateness in a sentence”; 5 “the word is used with semantic appropriateness and grammatical accuracy in a sentence” (Wesche and Paribakht, 1996, p. 30). The purpose of the study should determine what test score
constitutes appropriate evidence of word knowledge, that is, what depth of word knowledge is sought by the test administrator. Even though the assessment of depth of word knowledge has recently been forefronted in native language research (see Pearson, Hiebert, and Kamil, 2007) and there are good existing methods to test it, I am not aware of any normative data for depth of word knowledge in adult native speakers of English. Nevertheless, this dimension of lexical acquisition is important to consider when comparing the ultimate lexical attainment of adult-onset second language learners to the lexical knowledge of native speakers of the target language.
2.12. Motivations for the Study
Although there is considerable evidence pointing to the existence of age effects on the
ultimate attainment of adult-onset second language learners in the areas of phonology and morphosyntax, much less is known about the limits of attainment in the lexicosemantic domain in general and the L2 lexicon in particular. Recent neuroimaging studies suggest that the limits of L2 attainment for adult-onset second language learners need a close examination because the lexicosemantic domain may be the most promising area of adult-onset second language acquisition with perhaps the potential for nativelike end results. The existing behavioral data on the ultimate L2 lexical attainment of adult-onset second language learners are conflicting; furthermore, better-known behavioral studies do not support the claim of neuroimaging studies at all, but rather emphasize the possibility of maturational constraints on L2 lexical attainment. I suggest that the limits of L2 lexical attainment among adult-onset second language learners are far from obvious from the existing data and that the topic warrants further investigation.
What we can see from the literature is that studies that attempted to investigate the limits of ultimate L2 attainment either used participants who had not had the opportunity to acquire their eventual L2 lexicon or employed dependent variables that are not necessarily central to L2 lexical attainment. As I have discussed, the two most central aspects of lexical attainment are breadth (quantity) and depth (quality), that is, vocabulary size and depth of word knowledge. Reliable assessment methods exist for the evaluation of these two dimensions of lexical attainment, but they have not been employed in the investigation of the ultimate L2 attainment of adult-onset second language learners. I suggest that by bringing together current vocabulary assessment instruments and the study of L2 lexical attainment as a function of age of acquisition, we can produce dependable data on the limits of L2 attainment. Armed with valid and reliable data, we can evaluate the prospect of nativelike L2 attainment for adult-onset second language learners. The findings have theoretical implications for second language acquisition.

The purpose of this study was to examine the limits of eventual L2 lexical attainment of second language learners who had extensive exposure to the target language by assessing the two most central aspects of lexical acquisition, vocabulary size and depth of word knowledge. In addition, I explored attributes that appear to have a relationship to nativelike L2 lexical attainment. This study attempted to correct problems with the methodologies of previous studies. In particular, I sought to establish reliable and defensible measures for such biographical variables as age at first significant exposure to the target language and length of significant exposure to the target language. I assessed the most central aspects of lexical knowledge, vocabulary size and depth of word knowledge.
I assumed that achieving ultimate L2 lexical knowledge may require as many as 30 years. I controlled for native language, education level, and age at testing. I employed two control groups to evaluate the effect of bilingual status on vocabulary size and depth of word knowledge. The participant pool was large enough to allow for valid statistical comparisons between the groups. I believe this study has yielded reliable data to evaluate whether nativelike L2 lexical achievement is possible for adult-onset second language learners and whether age effects constrain or severely limit nativelike L2 lexical attainment.
CHAPTER FIVE
DISCUSSION
5.1. Overview of Purpose and Method
The purpose of the study was to evaluate the upper limit of eventual lexical
attainment in adult-onset second language acquisition. Specifically, the study investigated whether nativelike lexical knowledge is achievable for second language learners who began acquiring the target language in an immersion setting after age 16. A large body of literature has indicated that nativelike attainment in a second language is exceptionally rare, if not impossible, for adult-onset second language learners. The two leading hypotheses that try to account for this phenomenon are the age effects hypothesis and the critical period hypothesis. The age effects hypothesis (Birdsong, 2005; Flege, 1999) states that L2 attainment declines linearly with age of onset of learning, which is due to a steady age-related decline in verbal and working memory. The critical period hypothesis (CPH) states that ultimately successful language acquisition is limited to the early years of development. Capacity for language acquisition atrophies early in life, perhaps as soon as the first language has stabilized around age 6, but certainly by the onset of puberty. Recent accounts of the CPH stated that the critical period for language acquisition affects the various domains of language, such as phonology, grammar, and the lexicon (Hyltenstam and Abrahamsson, 2003; Long, 2007). However, recent neurolinguistic findings suggested that age has a differential effect on the different language domains and nativelike processing may be a very real possibility even for adult-onset L2 learners in the lexico-semantic domain (Kim et al., 1997; Wartenburger et al., 2003; Weber-Fox and Neville, 1996; Hahne and Friederici, 2001; Hahne, 2001; Sanders and Neville, 2003a). A series of studies in the area of phrasal semantics appears to have supported that claim as well (Slabakova, 2006). This
study attempted to investigate whether the lexical domain is subject to a critical period and whether age effects are in fact detrimental to success in L2 lexical acquisition in adulthood. To evaluate the limits of ultimate L2 attainment, I measured arguably the most important aspects of lexical knowledge, vocabulary size and depth of word knowledge, in 33 Hungarian residents in the U.S. who began to acquire English as a second language in adulthood. Participants were selected based on specific criteria that ensured that they had had an adequate length of time to acquire English under relatively favorable circumstances. Only participants with advanced to near-native English proficiency and at least several years of college education were included. Although these individuals were not common in the population, neither were they exceptionally rare. Each of them was given a battery of tasks, including three previously tested and published instruments with demonstrated validity to measure vocabulary size and depth of word knowledge in English. In addition to the lexical tests, they were interviewed and completed a questionnaire about their L2 learning and personal characteristics. In order to determine what may be considered native level lexical knowledge in English, 60 native speakers of English, who matched the adult-onset L2 learner group on age and education level, were recruited to serve as controls. Because the literature suggests that being bilingual may have an effect on ultimate attainment in the target language (Kim, 1997), or that non-nativelikeness may be an epiphenomenon of bilingualism (Birdsong, 2006), this study controlled for bilingual status. Half of the native speaker participants were bilinguals and half of them were monolinguals. All the bilinguals had advanced to native level proficiency in another language and used this language regularly for at least a small percentage of their communication.
5.2. Summary of Findings
This study found no between-group differences on written measures of
vocabulary size and depth of word knowledge in the three groups of participants. However, there was a statistically significant difference on the aural vocabulary size measure between the highly successful adult-onset L2 learner group and the two native speaker groups. Although the adult-onset L2 learners’ mean (105.42) on the aural vocabulary size measure was significantly above the U.S. mean as indicated in published materials (100.1), it was nearly a standard deviation below the comparably educated native speaker mean as tested in this study (115.90 for the bilingual group and 115.80 for the monolingual group). Another important finding was the relatively high rate of native level achievement on the three vocabulary measures among the successful adult-onset L2 learners. Because the scores of the native speaker controls varied considerably on the three vocabulary tests employed in this study and because 11.67% of the control participants had at least one score below -2.00 standard deviations of the combined native speaker mean (N = 60), native speaker level was defined as scoring above -2.00 standard deviations of the native speaker mean on all three tasks. Twenty-five of the 33 adult-onset L2 learners, or 75.76%, scored above this threshold on all three tasks, while everyone had at least two scores better than -2.00. Remarkably, 5 (15.15%) adult-onset NNS group participants achieved a score above the combined native control group mean on all three tasks. A receptive vocabulary size estimate was calculated for all participants based on the Self-Rated Vocabulary Test. Because the wordlist contained in this test was the product of sampling 26,901 entry words in the Oxford American Dictionary, this test allowed for estimating vocabulary size in relation to this population of words that had been
randomly sampled. The mean receptive vocabulary size of the adult-onset L2 learner group was 19,633 dictionary entry words (14,225-23,662, SD = 2,394) and for the native speaker groups combined it was 20,251 (13,380-24,366, SD = 2,538) words. The difference of 618 words between the means of the adult-onset L2 learners and the native speakers was not statistically significant. Five individuals were identified as exceptional adult-onset L2 learners on the basis that all three of their test scores were above the native speaker mean. Questionnaire and interview data were used to examine their characteristics. One commonality they shared was their family background; they came from educated families where the male caregiver had a college education and the female caregiver held at least a high school degree. All five of these exceptionally successful English learners mentioned childhood foreign language learning experiences, which included both foreign language classes taken in school and private foreign language lessons. For three of them, the outcome of childhood language learning was conversational fluency. Two of them had an extensive background in Latin. They had high self-reported scores on general learning ability and logical problem solving ability. The three participants who were college-educated at the time of arrival in the U.S. reported top scores for their verbal ability in the native language and literacy in the native language. There was very high self-reported interest in reading daily and learning new words. Based on the findings of the five case studies, four composite variables were computed from the questionnaire data for each target participant: (1) caregivers’ education, (2) L1 abilities, (3) reading-vocabulary interest, and (4) memory. Regression analysis indicated that the linear combination of variables 1, 2, and 3 significantly related to overall L2 lexical knowledge as expressed by the mean achievement on the
three lexical tasks. The linear combination of these three predictor variables accounted for 46% of the variance in the overall L2 vocabulary test scores. For the participant sample in this study, the data did not indicate that length of immersion in the target language, age at testing, or education level had a statistically significant relationship to the vocabulary test scores. This finding was an effect of the participation criteria. First, all of the adult-onset NNS participants had a considerable amount of time to acquire their L2 vocabulary in an immersion setting; the mean length of immersion in the target language was 31.15 years (median 24.00, range 10-52 years). Second, education did not show a significant relationship to test scores because every participant in this sample was highly educated. The mean length of education was 18.70 years (median 19.00, range 14-25). Further, education level and age at testing were negatively related. The younger participants in the adult-onset NNS group were more educated than the older participants; therefore, the effects of education and age at testing likely counteracted each other in this particular sample. A statistically significant relationship between L2 vocabulary scores and age of onset of L2 learning was not observed in this sample. Native level L2 vocabulary attainment was possible even for participants whose immersion in the target language did not occur until relatively late in life. One participant who had no experience with the target language at all until age 36 achieved above the control mean vocabulary size and depth of word knowledge. The fact that native level L2 lexical attainment is possible at any age should indicate that lexical acquisition is not constrained by a critical period. On the other hand, it is likely that the effect of age of onset would reach significance in a group that represented typical adult-onset L2 learners, not just highly successful individuals.
The purpose of this study was to investigate the limits of eventual L2 lexical attainment; therefore, only those adult-onset L2 learners were recruited who could
reasonably be expected to define the upper limit of L2 vocabulary achievement, but not its norm. The data in this study indicated that the upper limit of adult-onset L2 learning is native level vocabulary size and depth of word knowledge in the target language even with age of onset of L2 learning in the third and fourth decade of life. In a population with greater variability in ultimate L2 lexical attainment, it is likely that age of onset of L2 learning would have a statistically significant relationship to L2 vocabulary achievement. Finally, this study found no evidence whatsoever that bilingual status has any role in vocabulary size or depth of word knowledge in the dominant language. The vocabulary scores of the two groups of native speakers of English, bilinguals and monolinguals, were virtually identical. I found no support for the suggestion that bilinguals’ distributing mental lexical resources between their languages has any contraindication for the dominant language.
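The native level criterion summarized in this section (scoring above -2.00 standard deviations of the combined native speaker mean on all three tasks) can be illustrated with a minimal Python sketch. The task names, the native speaker means and standard deviations, and the learner scores below are hypothetical values chosen for illustration; they are not the data of this study.

```python
# Hypothetical sketch: task names and all numbers below are invented
# for illustration and are not the study's data.

def z_score(raw, ns_mean, ns_sd):
    """Express a raw score in native speaker (NS) standard deviation units."""
    return (raw - ns_mean) / ns_sd

def is_native_level(scores, ns_stats, cutoff=-2.0):
    """True only if the participant is above the cutoff on every task."""
    return all(
        z_score(scores[task], ns_mean, ns_sd) > cutoff
        for task, (ns_mean, ns_sd) in ns_stats.items()
    )

def native_level_rate(participants, ns_stats, cutoff=-2.0):
    """Return the count and percentage of participants meeting the criterion."""
    count = sum(is_native_level(p, ns_stats, cutoff) for p in participants)
    return count, round(100.0 * count / len(participants), 2)

# Assumed (mean, SD) of the NS controls on three hypothetical tasks.
ns_stats = {
    "aural_size": (116.0, 12.0),
    "written_size": (50.0, 5.0),
    "depth": (80.0, 8.0),
}

# Two invented learners: the second falls below -2.00 on the aural task.
learners = [
    {"aural_size": 105.0, "written_size": 48.0, "depth": 78.0},
    {"aural_size": 90.0, "written_size": 52.0, "depth": 85.0},
]

count, rate = native_level_rate(learners, ns_stats)
```

Applied to the counts reported above, the same percentage arithmetic gives 25 of 33 learners, or 75.76%, and 5 of 33, or 15.15%.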
5.3. The Findings of This Study in Relation to Previous Research
5.3.1. Marinova-Todd, 2003
The findings of this study are comparable to the results Marinova-Todd (2003) reported. Marinova-Todd (2003) administered the PPVT-R (aural task) to 30 adult-onset L2 learners from 19 different L1 backgrounds who had been immersed in English for 5+ years (mean 11 years, range 5-20 years). The mean standard PPVT-R score of Marinova-Todd’s adult-onset NNS group was 97.77 (SD = 15.08, range 71-133), significantly below the native speaker controls (mean 121.53, SD = 11.38). Nevertheless, 15 (50%) of Marinova-Todd’s target participants scored above -2.00 standard deviations of the native control mean. Of these 15 native level learners, 6 (20%) scored above -1.00 standard deviation of the native control mean, and one (3.33%) was above the native
control mean. The pattern of these results is similar to what I found. There are a few important differences, however. One, the adult-onset NNS participants in the present study had longer immersion in the target language (mean 31 years, range 10-52) than did those in the Marinova-Todd study. In Marinova-Todd’s study, there was a significant correlation between vocabulary scores and length of residence (r = .38, p < .05), which probably indicates that at least some of the adult-onset NNS participants she included may not have been immersed in the target language long enough to have reached ultimate attainment in L2 vocabulary. Second, the native control mean was lower in the present study (bilingual group 115.90, SD = 11.92, monolingual group 115.80, SD = 12.70) than in Marinova-Todd’s study. This could have been the outcome of two factors. First, I used a newly revised version of the PPVT, which had more items that discriminated among high scoring individuals and which was normed on a much larger sample than the PPVT-R. Second, Marinova-Todd’s controls were graduate students at Boston area colleges, mainly Harvard University; the control participants in this study, although college educated as well, came from a more diverse background. Overall, the results of this study support Marinova-Todd’s conclusion in that native level lexical attainment is achievable for adult-onset L2 learners.
5.3.2. Bahrick et al., 1994
Bahrick et al. (1994) tested 801 Cuban and Mexican immigrants who arrived in the United States between the ages of 10 and 26. They administered two written lexical tasks in both Spanish and English; one of the written tasks was a lexical decision task in which test takers decided whether a probable string was an actual word and the other task was a multiple choice vocabulary size test. The Spanish and English versions were constructed to be equal in difficulty. The results were compared against the results of
monolingual Spanish speakers and monolingual English speakers. English vocabulary size showed a steady growth for 30 years after immigration. For the cohort of immigrant participants who had been in the United States for 30 years at the time of testing, the lexical decision scores were equal on the Spanish and English versions of the test and the vocabulary size scores were near equal as well. (Unfortunately, exact figures were not reported.) The cohorts varied in how they compared to monolingual native speakers of English. Several target participant cohorts scored equal to native speakers of English on the lexical decision task, and one target group cohort was equal to native speakers of English on the vocabulary size test. However, it is difficult to reach conclusions from the Bahrick et al. data because the monolingual English speaking controls may or may not have been equal to the target participants on education level. It appears from the figures that at least one target group cohort was nativelike on the English vocabulary size measure, and two target group cohorts were nativelike on the English lexical decision task. The suggestion of Bahrick et al. that after an adequately long residence in the target country (perhaps as long as 30 years) native level lexical achievement in the target language may be achieved by late-onset immigrant language learners as a group may be warranted by this study also, but only with two important qualifications. First, my study supports the Bahrick findings only for certain types of L2 learners, those who are highly educated, highly literate, and have had 20/30+ years of immersion in the target language. Second, these findings most likely only apply to written lexical tasks.
5.3.3. Kim, 1997
Kim (1997) tested 70 Korean-English bilinguals with varying age of onset of L2
acquisition (age of arrival in the U.S.) using a lexical decision task. The task was to read word pairs off a computer screen and decide whether the second letter string presented
was an actual word or a nonword in English. The target letter strings were presented in three different conditions: semantically related prime (REL), semantically unrelated prime (UN), and nonword (NON). Ten native speakers of English, who were proficient in another language, served as controls. The independent variable was age of arrival in the United States and the dependent variables were reaction time (RT) and accuracy of response (AR). Kim reported a significant between group difference; the follow-up tests showed that the difference was due to difference in one condition only, the nonword condition. Older arrivals, that is participants with later age of onset of L2 immersion (12+ years) were significantly slower to make a decision about letter strings that were not actual English words and their decision was less likely to be correct than those with earlier age of onset of L2 immersion. Kim suggested that the findings signaled the existence of a sensitive period for L2 lexical acquisition. I believe the data of Kim’s research fit with the findings of this present study, although I disagree with Kim that the data indicate evidence for a sensitive period for L2 acquisition. Here is the reason why. The words that were presented in the REL and UN conditions were relatively common; they were words that L2 learners with intermediate level vocabulary knowledge would be expected to know (for example, flowers, white, apple, bird, sleep, funny). The most difficult words presented in the REL and UN condition were shallow, tin, grief, annoy, and dean. By contrast, nonwords were very difficult items because they were plausible (for example, fush, enidal, cecide, yow, cresp). 
If we were to interpret the REL and UN conditions as two fairly easy vocabulary size measures and the NON condition as a very difficult vocabulary size measure, then it is not surprising that late-onset learners with significantly less time to acquire their L2 vocabulary would not do as well on the last measure as earlier arrivals with many more years of immersion in the target language. In Kim’s study, the NON measure was
really the only lexical measure that probed native level lexical achievement. The target participants in the older arrival groups had as few as 5 years of L2 immersion; the mean length of stay in the U.S. was 8 years 2 months for the group which arrived between ages 12 and 14 (range 5-11 years) and 6 years 7 months for the group which arrived after age 15 (range 5-9 years). By contrast, the younger arrivals (i.e., those who arrived prior to age 12) had 10-21 years of residence in the U.S. Kim assumed that these findings occurred because the older arrivals missed the critical period for L2 acquisition. However, the difference on the length of residence variable across the groups would indicate a very plausible alternative: the late-onset participants had not yet had sufficient time in the U.S. to reach native level L2 vocabulary as a group. The correlation data provide support for this explanation. Kim reported correlation coefficients of age of onset and length of stay in the U.S. for both dependent variables (reaction time, accuracy of response) in the three conditions (REL, UN, NON). Both age of onset and length of stay showed significant correlation with the accuracy of response scores in the NON condition. However, the correlation of age of onset and the accuracy of response in the nonword condition was no longer significant after length of stay was partialed out. In other words, once length of stay was controlled for statistically, age of onset had no significant correlation with accuracy of response on any of the subscores. To be exact, one single partial correlation was significant; it was for age of onset and reaction time in the NON condition (r = .22, p < .05). This low level relationship for age of onset was also noted in the present study (Table 19, Figures 11–14), although it did not reach significance. It does not appear to suggest the existence of a critical or sensitive period for L2 lexical acquisition.
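The logic of partialing out length of stay, discussed above in relation to Kim's correlation data, can be sketched with the standard first-order partial correlation formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The Python sketch below is an illustration only; the variable names and the four-participant data set are hypothetical deviation scores constructed so that the zero-order correlation between age of onset and accuracy disappears once length of stay is controlled. They are not Kim's data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation of x and y with the linear effect of z removed from both."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical deviation scores (invented for illustration).
length_of_stay = [1.0, 1.0, -1.0, -1.0]
age_of_onset = [0.0, -2.0, 2.0, 0.0]   # later onset goes with shorter stay
accuracy = [2.0, 0.0, -2.0, 0.0]       # accuracy tracks length of stay

zero_order = pearson_r(age_of_onset, accuracy)
controlled = partial_r(age_of_onset, accuracy, length_of_stay)
```

In this constructed example, the zero-order correlation between age of onset and accuracy is -.50, while the partial correlation controlling for length of stay is zero, which mirrors the pattern described for the accuracy scores in the nonword condition.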
5.3.4. Spadaro, 1998
Spadaro tested 38 immigrants to Australia on their English lexical skills. These
participants spoke 14 different native languages and at the time of their arrival in Australia they were 3 to 43 years old. The seven tasks that Spadaro constructed were very different from those in the present study; they mainly probed knowledge of collocations and idioms. Task 1 required sentence completion, as in “The drunk _____ over to the bus stop”. In Task 2, participants discriminated real English words from possible but not real English words, as with “starshine” or “wholeheart”. In Task 3, test takers needed to identify sentences with unusual collocations, like “You’ve got the money you demanded, now pass the hostage”. In Task 4, participants supplied the second half of short phrases (“below the _____”, “stub a _____”). In Task 5, they wrote a phrase with a rare key word, like beck or kilter. In Task 6, they corrected idiomatic expressions (for example, “Be careful what you say or you’ll set the cat among the chickens”). Finally, in Task 7 they identified sentences with usual uses of idioms (“Dolores is the party’s life and soul”). It was difficult to compare Spadaro’s data against this study because of the grouping and the presentation of results. Fortunately, Spadaro included her target participant data files and the scan sheets with every test item for each participant. This allowed me to reanalyze her data in a way that they can be directly compared against this study. Nineteen of Spadaro’s participants qualified as adult-onset NNSs by the definition of this study; they arrived in Australia after age 16. Nineteen NNSs arrived in Australia during childhood; I included these participants in the child-onset NNS group. For those participants who arrived in childhood, but reported that they did not start learning the target language until later, I took the age of onset of L2 learning variable to
mean the beginning of L2 immersion. For the adult arrivals, I considered age of arrival to mean the beginning of L2 immersion (although I am fully aware that this is an assumption). I provided the descriptive statistics for age at testing, age of L2 immersion, and length of residence in Australia in Table 21 in the Appendices. Spadaro included 10 native speaker controls; the characteristics of these participants were not reported. Years of education were not reported for any of the participants, although it appears that most of the participants had some connection to a college community. In order to compare the lexical scores of the three groups (adult-onset NNS, child-onset NNS, and NS) and determine the rate of native level lexical attainment, I followed the same procedures that I employed in this study and reported on in Section 4.5. I calculated the native speaker group’s mean and standard deviation on each task. I converted each set of scores into z-scores based on the NS group mean and standard deviation (N = 10). I provide the z-scores for all three groups in Tables 22, 23, and 24 in the Appendices. By the standards of this study, participants were considered nativelike if all of their lexical subscores were greater than -2.00. In Spadaro’s study, only 3 of the 38 participants met this criterion: a 47-year-old Dutch speaking female (AO = 4), a 22-year-old German male (AO = 6), and a 46-year-old Italian speaking female (AO = 6). In fact, both the child-onset NNS and the adult-onset NNS groups were significantly below the native speaker group on every single subscore, as well as overall. There was little variability in the scores of the NS group; consequently, even small differences, such as missing a single test item, had a major effect on the subscores. In order to correct for this, I considered the overall score, the average of z-scores, as a measure of nativelikeness.
This figure expressed how a participant scored on average in relation to the native speaker mean and standard deviation. In the NS group, the lowest overall score was z =
-.94; therefore, I took this figure to be the threshold of overall nativelikeness on the lexical tasks. By this definition, 10 of the 38 NNS participants qualified as native level. Six of the participants who averaged at least -.94 were child-onset L2 learners and 4 were adult-onset L2 learners. The characteristics of the four participants whose immersion in English began after age 16 and who were near-native by the overall lexical measure follow (z ≥ -.94): (1) a 71-year-old Hungarian female (AO = 25, LOR = 46, z = -.79), (2) a 68-year-old Hungarian male (AO = 23, LOR = 45, z = -.94), (3) a 30-year-old Filipino female (AO = 28, LOR = 8, z = -.64), and (4) a 56-year-old French female (AO = 18, LOR = 34, z = -.92). Of these four participants, number 3 reported that she was a childhood bilingual of Tagalog and Hokkien, and studied English from age 6. Numbers 1 and 4 studied English in high school in their native countries; number 2 had no English learning experience prior to arriving in Australia at age 23. We can conclude, on the one hand, that at least some of the adult-onset NNS participants were near-native, and on the other hand, that child-onset NNSs were not guaranteed to be nativelike on this set of lexical tasks. Five of the very young arrivals in the child-onset group missed the near-native cut-off (z < -.94): (1) a 25-year-old German male (AO = 3, LOR = 18, z = -2.76), (2) a 31-year-old Italian female, who was born in Australia (AO = 6, LOR = 31, z = -2.62), (3) a 22-year-old French speaking female, who was born in England (AO = 6, LOR = 13, z = -1.99), (4) a 34-year-old French speaking female (AO = 6, LOR = 29, z = -1.99), and (5) a 41-year-old Gaelic speaking male (AO = 4, LOR = 37, z = -1.89). I believe the evidence from Spadaro suggests that there are in fact aspects of lexical knowledge that are extraordinarily difficult for NNSs to acquire.
Multiword units, such as collocations and idioms, appear to be especially problematic although it is not immediately clear why. That intuition which makes a native speaker cringe in response to an awkward expression and conclude that “this just doesn’t sound right”
does not seem to be guaranteed to non-native speakers. The reflexive response or mental motor skill which allows a native speaker to complete a common phrase, idiom, or proverb almost automatically remains a challenge to acquire. These findings, however, do not appear to contradict the claim of this study, which is that the upper limit of adult-onset L2 lexical attainment is native level knowledge in the area of vocabulary size and depth of word knowledge, two key dimensions of lexical knowledge. Native level L2 lexical attainment, although not common, is achievable in adulthood. According to Spadaro’s data, adult immigrants can reach near-native level even in the area of collocations and idioms, which constitute a formidable challenge to any non-native speaker, not just adult-onset L2 learners.
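The overall nativelikeness measure used in the reanalysis of Spadaro's data in this section (the average of a participant's z-scores, with the lowest overall score in the NS group taken as the threshold) can be sketched in Python as follows. This is an illustration under stated assumptions: the two task names and all scores are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not specify which formula was applied.

```python
from statistics import mean, stdev

def overall_z(participant, ns_scores_by_task):
    """Average of a participant's z-scores across tasks, relative to the NS group.

    Assumes the sample standard deviation (n - 1 denominator).
    """
    z_values = [
        (participant[task] - mean(ns_scores)) / stdev(ns_scores)
        for task, ns_scores in ns_scores_by_task.items()
    ]
    return mean(z_values)

def near_native_threshold(ns_participants, ns_scores_by_task):
    """Lowest overall z-score observed among the native speaker controls."""
    return min(overall_z(p, ns_scores_by_task) for p in ns_participants)

# Hypothetical NS control group of three on two invented tasks.
ns_participants = [
    {"task_a": 10.0, "task_b": 18.0},
    {"task_a": 12.0, "task_b": 20.0},
    {"task_a": 14.0, "task_b": 22.0},
]
ns_scores_by_task = {
    task: [p[task] for p in ns_participants] for task in ("task_a", "task_b")
}

threshold = near_native_threshold(ns_participants, ns_scores_by_task)

# Invented learners: the first meets the threshold, the second does not.
learner_1 = {"task_a": 11.0, "task_b": 19.0}
learner_2 = {"task_a": 8.0, "task_b": 18.0}
learner_1_near_native = overall_z(learner_1, ns_scores_by_task) >= threshold
learner_2_near_native = overall_z(learner_2, ns_scores_by_task) >= threshold
```

Averaging across tasks reduces the influence of any single subscore, which is the motivation given above for using the overall score when the NS group shows little variability.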
5.4. The Relevance and Implications of Findings
The present study constitutes counterevidence for the critical period hypothesis
(CPH) in the L2 lexical domain. In fact, very little evidence exists that supports the CPH in L2 lexis. In his 2007 synthesis of the critical period hypothesis, Long, a leading SLA theorist, stated: “Age 6 has also been implicated for native-like attainment of L2 lexis and collocation (Hyltenstam, 1992; J. Lee, 1998; Spadaro, 1996)” (Long, 2007, p. 50). I have discussed these studies in Sections 2.9. and 5.3. and I strongly disagree that they constitute conclusive evidence for the CPH in L2 lexis. The findings of this study suggest that Long’s conclusion is not warranted for the L2 lexical domain. He wrote, “The widely documented failure of late starters to achieve native-like proficiency, even when motivation, cognitive abilities, and opportunity are optimal and plentiful, all agree, is one of the most salient facts about SLA” (p. 71). On the contrary, this study found that native-like attainment in L2 lexical size and depth of word knowledge was possible for adult-onset learners who had relatively ideal opportunities as well as high cognitive
abilities, which included several decades of L2 immersion along with a college education. Long also wrote,
“The easiest way of refuting all variants of a CPH is to show native-like attainment by one or more learners first exposed to the L2 after the closure of the alleged period in question in the linguistic domain(s) in question. […] If maturational constraints are nonexistent, then it should be easy for opponents of the CPH to produce numerous learners who, despite the late start, have reached native-like levels after living in the target language environment for several decades. In my view, no one has managed to come up with such a case yet, although many have tried” (p. 71).
In the present study, I reported on five learners who were not just nativelike, but whose L2 accomplishments on measures of vocabulary size and depth of word knowledge were above the comparably educated native speaker mean. I found that 76% of the adult-onset L2 learner participants were nativelike on all three L2 vocabulary measures. Without a doubt, the findings of this study stand in stark contrast to a large body of evidence that has documented the apparent failure of adult-onset L2 learners to become nativelike in the domains of phonology and morphosyntax. In addition, the results of the present study indicate that the rate of ultimate success in the L2 lexical domain may be much higher than previously anticipated. The results suggest that there is something special about the lexicon which makes it less vulnerable to the effects of aging than phonology and morphosyntax are. It is unlikely that the L2 lexis is subject to a critical period beyond which native level vocabulary knowledge is unattainable. The data here did not indicate that age effects prevent adult-onset learners from reaching native level L2 vocabulary size and depth of word knowledge.
Trendlines suggested that the mean ultimate attainment in the L2 lexicon may decline with age of onset; however, in this group of highly proficient L2 learners, this decline did not reach statistical significance. Results from a random sample of the population would most likely show a significant linear decline of accumulated L2 lexis as a function of age of onset of L2 learning.
We know that the accumulation of lexis is lifelong and that vocabulary size does not decline in the average person until the 70s. Vocabulary size is affected by education level and by experience with the variety of vocabulary that one’s activities entail. The size of the functionally important vocabulary that the average native speaker of English acquires is between 10,000 and 20,000 words, which is not large enough to be beyond the reach of late-onset second language learners who deliberately engage in experiences that contribute to L2 vocabulary growth. We must note that native level L2 vocabulary is not the norm among adult-onset L2 learners. In addition, it is unrealistic to expect native level L2 vocabulary achievement in fewer than 20 years. However, after 20-30 years of significant daily interaction in the target language, adult-onset learners who are actively engaged in L2 improvement may achieve native level receptive vocabulary even when they do not appear to be nativelike in other aspects of their L2 proficiency. The prognosis for achieving native level receptive vocabulary skills in the target language is reasonably good for adult-onset L2 learners, quite possibly far better than the prospect of reaching the native level in any other L2 skill.
5.5. Limitations of the Study
We must be careful to interpret the findings of this study appropriately. First, the findings do not suggest that over 75% of adult-onset immigrant L2 learners reach native level receptive vocabulary size in their L2. Participants in this study were selected for their high proficiency in English after 20 years of significant daily interaction with native speakers of English. The participants who had fewer than 20 years of exposure to English were only included because they identified themselves as near-native in their L2. The participants had optimal conditions under which to acquire English: they were
highly educated and highly literate in their native language. They were employed in the United States in positions appropriate for their field of training and education level for a considerable length of time. Many completed graduate level studies in the United States. The participants in this study did not represent the average adult U.S. immigrant L2 learner.
Second, the study by no means implies that the participants were native level in any aspect of their L2 other than receptive vocabulary size (as measured by the PPVT-4 and SRVT) and depth of word knowledge (as measured by the WAT). Productive L2 vocabulary and knowledge of multiword units, such as idioms and collocations, were not tested. Nor was there any attempt to assess the participants’ phonological or morphosyntactic attainment.
Third, the receptive vocabulary size estimate was produced with a self-report measure (SRVT). Self-report measures of vocabulary size have been criticized for being unreliable (Daller, Milton, & Treffers-Daller, 2007; Nation, 2007; Eyckmans et al., 2007). I implemented several control features to improve both the validity and reliability of this measure; nevertheless, I must acknowledge that there is some controversy regarding self-report vocabulary measures. Read suggests, “There is counter-evidence that a format like Yes/No can work effectively as a measure of vocabulary size for research purposes. Given the number of words required to constitute an adequate sample for estimating vocabulary size, self-report can be seen as an indispensable tool, provided that a validity check is incorporated into the measurement procedure” (J. Read, personal communication, February 23, 2008). In addition, the unit of measurement must be noted: the SRVT sampled main dictionary entries in the OAD, which represent neither lemmas nor word families; therefore, we must be cautious when we compare the SRVT measures with other measures of receptive vocabulary size.
Fourth, the analyses I conducted as a follow-up to the case studies of highly successful language learners (Section 4.9.) were subject to measurement error. Although I produced the best measurements I could based on the interview data, the variables were ad hoc and there were no predetermined procedures to establish that they were accurate and consistent measures of the concepts I identified by them. I calculated composite variables from interview data based on what I observed and reported on in the case studies. These composite variables were not firmly grounded constructs, nor were they measured with acceptable reliability. The analyses conducted with these variables and measurements were meant to be exploratory.
Finally, the fact that all adult-onset L2 learners were native speakers of Hungarian resulted in both strengths and weaknesses. Because Hungarian and English are not cognate languages, the adult-onset L2 learners did not have the advantage of lexical transfer from their L1. They had to acquire more root words, prefixes, and suffixes than native speakers of languages that are related to English; consequently, their accomplishments are greater than similar accomplishments by native speakers of Dutch or Swedish, for example, would have been. However, some might argue that mandatory Russian instruction in Hungarian schools, which almost all the Hungarian native speakers took part in, could be a confounding factor. Because Russian is an Indo-European language, the Hungarian speakers could have benefited from some language transfer there. I concede that this is a minor limitation. Two of the adult-onset NNSs reported excellent Russian proficiency before age of onset of immersion in English; none of the others indicated that they had ever had more than basic proficiency in Russian.
Nevertheless, it is possible that Hungarians in general are more likely to have studied foreign languages in childhood because it is all too evident that Hungarian is a small minority language in Europe and not well suited for communication outside the borders of Hungary. If childhood foreign language learning positively affects ultimate lexical attainment in a later learned language, then this could be a confounding factor, which could limit the generalizability of the findings.
5.6. Suggestions for Future Research
The present study has uncovered counterevidence to previous claims regarding the existence of a critical or sensitive period for the acquisition of L2 lexis. The findings indicate that the upper limit of adult-onset L2 learning is native level receptive vocabulary size and depth of word knowledge. Additional studies will be needed to replicate the findings of this study with adult-onset L2 learners with different L1-L2 pairings. The findings of this study suggest that native level lexical attainment among highly educated adult immigrants with 20+ years of immersion in the target language is not rare. It would be important to conduct research to examine the distribution of eventual lexical attainment in samples that were not recruited for their near-native L2 proficiency but represent a broader spectrum of late-arrival immigrants. A study of this type would not only be able to establish the normal distribution of eventual lexical attainment in late-arrival immigrant language acquirers, but would also be able to describe how eventual attainment varies as a function of age of onset of L2 acquisition. Further, future studies on ultimate L2 lexical attainment will need to examine dependent measures that suitably probe other key aspects of L2 lexical knowledge, such as productive vocabulary size or the ability to supply semantically appropriate words in context. When selecting these dependent measures, it will be particularly important to consider the extent to which the measures are independent from other language domains such as phonology and morphosyntax, the extent to which the measures are
dependent on knowledge and experience that are not strictly linguistic, as well as the range and variability which native speakers show on the tasks. Selecting an appropriately large and diverse native speaker control group will be crucial to ensuring the validity of future studies on ultimate attainment in L2 lexical acquisition. Finally, I anticipate that adult-onset L2 learners’ difficulty in handling multiword units will be central to future discussions on the critical period in lexical acquisition. We currently lack a published instrument that can provide a valid and reliable assessment of multiword units for research purposes. Producing such an instrument would make a valuable contribution to future research.
APPENDICES
Table 16
Z Scores of the Adult-onset NNS Group

Z scores on lexical tasks*
Adult-onset   PPVT-4        Word Associates  Self-Rated       Average of
NNS group     standardized  Test             Vocabulary Test  z scores
1             -2.12         -0.65            -0.82            -1.20
2a            -1.46         -0.82            -0.93            -1.07
3aaa           1.00          1.33             0.73             1.02
4aa            0.18          0.34            -0.10             0.14
5aa           -0.56          0.17             1.29             0.30
6aa           -0.64          0.17            -0.32            -0.26
7a            -1.46         -0.16            -1.60            -1.07
8aa           -0.97          0.67            -0.32            -0.21
9aa           -0.97         -0.32            -0.27            -0.52
10aaa          0.09          0.01             1.18             0.43
11aaa          0.34          0.17             0.07             0.19
12            -0.97         -4.11x           -0.10            -1.73
13            -2.04          0.17            -1.99            -1.28
14aa          -0.15          1.16             1.34             0.78
15aaa          0.34          1.16             0.35             0.62
16a           -1.30         -0.49             0.57            -0.41
17a           -0.89          0.34            -1.54            -0.70
18            -1.54         -1.47            -2.37            -1.80
19a           -1.46         -0.82            -1.54            -1.27
20aaa          2.22          1.33             0.23             1.26
21a           -0.97         -0.16            -1.26            -0.80
22a           -1.95          1.16            -0.27            -0.35
23a           -0.56          0.34            -1.49            -0.57
24            -1.79         -3.12x           -0.15            -1.69
25a           -1.54         -1.97             0.73            -0.93
26aa          -0.72          0.34             0.29            -0.03
27a           -1.38          0.01            -0.27            -0.55
28            -0.81         -2.13             0.29            -0.88
29a           -1.22         -1.31            -0.38            -0.97
30            -1.46         -2.96x           -0.04            -1.49
31aa          -0.48         -0.82             0.96            -0.11
32            -2.28         -1.31            -0.04            -1.21
33aa          -0.64         -0.65            -0.27            -0.52
Mean          -0.86         -0.44            -0.24            -0.51

* Z scores were calculated on the basis of the native speaker mean and standard deviation (with both control groups combined, N = 60).
aaa Participant scored above the native speaker mean on all three lexical measures (z > 0).
aa Participant is nativelike on all three lexical measures (z > -1).
a Participant is nativelike on all three lexical measures (z > -2).
x Score is outside the native speaker range.
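The marker scheme in Table 16 can be checked mechanically. The following Python sketch is an illustration added for the reader and is not part of the original analysis: it copies the three task columns from Table 16, assigns each participant the marker implied by his or her lowest z score, and counts how many participants clear each threshold. The names ppvt, wat, srvt, and classify are hypothetical labels chosen for this example.

```python
# z scores transcribed from Table 16 (adult-onset NNS group, participants 1-33).
ppvt = [-2.12, -1.46, 1.00, 0.18, -0.56, -0.64, -1.46, -0.97, -0.97, 0.09,
        0.34, -0.97, -2.04, -0.15, 0.34, -1.30, -0.89, -1.54, -1.46, 2.22,
        -0.97, -1.95, -0.56, -1.79, -1.54, -0.72, -1.38, -0.81, -1.22, -1.46,
        -0.48, -2.28, -0.64]
wat = [-0.65, -0.82, 1.33, 0.34, 0.17, 0.17, -0.16, 0.67, -0.32, 0.01,
       0.17, -4.11, 0.17, 1.16, 1.16, -0.49, 0.34, -1.47, -0.82, 1.33,
       -0.16, 1.16, 0.34, -3.12, -1.97, 0.34, 0.01, -2.13, -1.31, -2.96,
       -0.82, -1.31, -0.65]
srvt = [-0.82, -0.93, 0.73, -0.10, 1.29, -0.32, -1.60, -0.32, -0.27, 1.18,
        0.07, -0.10, -1.99, 1.34, 0.35, 0.57, -1.54, -2.37, -1.54, 0.23,
        -1.26, -0.27, -1.49, -0.15, 0.73, 0.29, -0.27, 0.29, -0.38, -0.04,
        0.96, -0.04, -0.27]


def classify(z_scores):
    """Return the Table 16 marker implied by one participant's z scores."""
    lowest = min(z_scores)
    if lowest > 0:
        return "aaa"  # above the native speaker mean on every measure
    if lowest > -1:
        return "aa"   # nativelike on every measure at z > -1
    if lowest > -2:
        return "a"    # nativelike on every measure at z > -2
    return ""         # at or below -2 on at least one measure


markers = [classify(row) for row in zip(ppvt, wat, srvt)]
n = len(markers)
nativelike = sum(1 for m in markers if m)  # any marker means all z > -2
above_mean = markers.count("aaa")

print(f"{nativelike}/{n} nativelike = {100 * nativelike / n:.0f}%")
print(f"{above_mean}/{n} above NS mean = {100 * above_mean / n:.0f}%")
```

Run as written, the sketch counts 25 of 33 participants (76%) with every z score above -2 and 5 of 33 (15%) with every z score above 0, which agrees with the rates reported in the text.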
Table 17
Z Scores of the Bilingual Native Speaker of English Control Group

Z scores on lexical tasks*
Bilingual     PPVT-4        Word Associates  Self-Rated       Average of
NS group      standardized  Test             Vocabulary Test  z scores
1              1.08          0.83            -0.21             0.57
2              0.50          1.00             0.07             0.52
3              0.18          0.50            -0.88            -0.07
4             -0.15          1.00            -1.21            -0.12
5              1.08          0.83            -0.71             0.40
6             -1.46         -0.16             0.01            -0.54
7             -0.72         -0.32             0.01            -0.34
8             -1.54         -0.65            -2.71            -1.63
9              0.18         -1.14             1.62             0.22
10             0.67          0.50             1.12             0.76
11            -0.72         -0.65            -0.38            -0.58
12             0.01          0.34            -0.65            -0.10
13             1.00          1.49             0.07             0.85
14             0.34          0.67             0.73             0.58
15            -0.81         -1.64             0.01            -0.81
16            -0.72          0.50            -0.10            -0.11
17            -0.97         -2.30            -0.04            -1.10
18             2.14          1.16             1.07             1.46
19             0.67          1.16             1.62             0.82
20            -0.56         -1.80             0.73            -0.54
21            -0.15         -0.49             0.79             0.05
22            -1.79         -1.64            -0.04            -1.16
23            -0.07         -0.49             0.18            -0.13
24             0.18         -0.16             1.07             0.36
25             0.09         -0.49             1.12             0.24
26             1.00          0.50             0.96             0.82
27            -0.23         -1.14            -0.65            -0.68
28             0.01         -0.49             0.84             0.12
29             2.14          1.00             0.62             1.25
30            -1.22         -2.13             0.73            -0.87
Mean           0.00         -0.14             0.16             0.01

* Z scores were calculated on the basis of the native speaker mean and standard deviation (with both control groups combined, N = 60).
Table 18
Z Scores of the Monolingual Native Speaker of English Control Group

Z scores on lexical tasks*
Monolingual   PPVT-4        Word Associates  Self-Rated       Average of
NS group      standardized  Test             Vocabulary Test  z scores
1              0.50          1.16             0.73             0.80
2             -0.48         -0.49             0.12            -0.28
3              2.14          1.16             0.35             1.22
4              0.67          1.16             1.34             1.06
5             -0.07          0.17             0.18             0.09
6             -0.81          0.50            -1.26            -0.52
7             -0.56          0.34            -1.76            -0.66
8             -0.81         -0.82            -2.04            -1.22
9             -0.40          0.17            -0.32            -0.18
10            -0.23         -0.16             0.73             0.11
11             0.59         -0.65             0.62             0.19
12            -1.38         -0.82             0.07            -0.71
13            -1.63         -2.79            -1.87            -2.10
14             0.42         -0.32             0.18             0.09
15            -1.05         -0.82            -2.43            -1.43
16            -0.72          0.17            -0.15            -0.24
17             0.34          1.16            -0.21             0.43
18            -1.46          0.50            -0.21            -0.39
19             0.75          1.16             1.29             1.07
20            -0.23         -0.32            -0.21            -0.25
21             0.75         -0.16             0.51             0.37
22             0.75          1.00            -0.49             0.42
23             1.40          0.50            -0.32             0.53
24             1.00          0.01             0.79             0.60
25             2.47          0.83             1.34             1.55
26             1.00          1.16             1.51             1.22
27            -0.07          1.00            -0.10             0.28
28            -0.64          0.17            -0.76            -0.41
29            -0.97         -1.47            -2.49            -1.64
30            -1.38          0.67             0.07            -0.22
Mean           0.00          0.14            -0.16            -0.01

* Z scores were calculated on the basis of the native speaker mean and standard deviation (with both control groups combined, N = 60).
Table 21
Descriptive Statistics of Variables in Spadaro (1998)

Group              N    Variable              Mean    Median   SD      Range
Adult-onset NNSs   19   Age at testing        48.37   45.00    14.38   29-73
                        Age of L2 immersion   26.16   24.00     6.78   16-43
                        Length of residence   19.68   11.00    16.54   2-46
Child-onset NNSs   19   Age at testing        35.21   31.00    13.11   18-70
                        Age of L2 immersion    7.79    6.00     3.47   3-14
                        Length of residence   27.32   26.00    14.15   8-62
Table 22
Z Scores of the Adult-Onset NNS Group in Spadaro (1998)

Z scores on lexical tasks*
Adult-onset                                                             Average of
NNS group            1       2       3       4       5       6       7  z scores
1                -5.71   -1.04  -18.44   -6.63   -5.80  -16.04   -2.47     -8.02
2                -1.90   -1.78    0.31   -2.42   -2.39   -8.49   -1.27     -2.56
3                -3.81   -1.04  -18.44   -6.63   -6.93  -12.26   -3.07     -7.45
4b,c              0.95   -1.04   -2.81   -0.32   -0.11   -0.94   -1.27     -0.79
5c                0.00   -0.30    0.31   -2.42   -0.11   -2.83   -1.27     -0.94
6                -0.95    0.44   -5.94   -7.68   -4.66  -10.38   -0.06     -4.18
7                -0.95   -4.00   -5.94   -7.68   -6.93  -23.58   -3.07     -7.45
8c                0.95   -2.52    0.31   -0.32   -0.11   -4.72   -0.06     -0.92
9                -4.76   -1.04   -2.81   -1.37   -5.80  -25.47   -2.47     -6.25
10                0.95   -6.22  -24.69   -9.79  -10.34  -21.70   -4.28    -10.87
11               -2.86   -2.52   -2.81   -8.74   -6.93  -17.92   -3.07     -6.41
12               -3.81    0.44   -2.81   -0.32   -1.25   -6.60   -1.27     -2.23
13c               0.00    0.44   -2.81    0.74   -0.11    0.94   -3.67     -0.64
14               -0.95   -4.74  -12.19   -6.63   -8.07  -33.02   -3.67     -9.90
15               -1.90   -0.30    0.31   -8.74   -6.93   -6.60   -3.67     -3.98
16               -6.67   -4.74  -12.19  -15.05   -9.20  -40.57   -3.67    -13.16
17               -0.95   -4.00  -15.31   -9.79   -9.20  -29.25   -5.48    -10.57
18               -5.71   -1.78   -9.06  -10.84   -8.07    0.94   -2.47     -5.28
19               -2.86   -5.48   -9.06  -11.89   -9.20  -33.02   -5.48    -11.00
Mean             -2.16   -2.17   -7.58   -6.13   -5.38  -15.34   -2.72     -5.93

* Z scores were calculated on the basis of the native speaker mean and standard deviation.
b Participant is within the NS range on all seven lexical measures.
c Participant is within the NS range on the overall measure.
Table 23
Z Scores of the Child-Onset NNS Group in Spadaro (1998)

Z scores on lexical tasks*
Child-onset                                                             Average of
NNS group            1       2       3       4       5       6       7  z scores
1                -1.90    0.44   -2.81   -4.53   -3.52  -10.38   -0.66     -3.34
2aa,c             0.95    0.44    0.31    0.74   -0.11    0.94    0.54      0.55
3                -0.95   -2.52    0.31   -3.47   -4.66  -10.38   -1.27     -3.28
4a,c             -0.95   -1.04    0.31   -0.32   -0.11   -0.94   -0.06     -0.44
5a,c              0.95    0.44    0.31    0.74   -1.25   -0.94   -0.06      0.03
6b,c              0.00   -0.30   -2.81   -0.32    1.02    0.94    0.54     -0.13
7b               -1.90   -1.04   -2.81   -1.37   -1.25   -0.94    0.54     -1.25
8                -0.95    0.44   -2.81   -5.58   -6.93   -2.83   -0.66     -2.76
9                 0.00   -1.04   -2.81   -1.37   -4.66   -6.60   -1.87     -2.62
10                0.00    0.44   -2.81   -1.37   -3.52   -6.60   -0.06     -1.99
11                0.00   -4.74   -9.06   -2.42   -3.52   -6.60   -3.67     -4.29
12                0.95   -1.04   -5.94   -1.37   -2.39   -4.72    0.54     -1.99
13               -5.71   -3.26   -5.94   -3.47   -4.66   -6.60   -0.06     -4.24
14c               0.95    0.44   -5.94    0.74   -0.11    0.94    0.54     -0.35
15               -2.86   -0.30   -9.06   -1.37   -5.80   -4.72   -0.06     -3.45
16                0.95   -2.52   -5.94   -2.42   -3.52  -14.15   -5.48     -4.73
17               -2.86   -0.30   -2.81   -2.42    1.02   -2.83   -0.66     -1.55
18               -6.67   -2.52   -2.81   -1.37   -0.11    0.94   -0.66     -1.89
19                0.00   -0.30   -2.81   -4.53   -2.39   -2.83   -0.06     -1.84
Mean             -1.05   -0.96   -3.47   -1.87   -2.45   -4.12   -0.66     -2.08

* Z scores were calculated on the basis of the native speaker mean and standard deviation.
aa Participant is nativelike on all seven lexical measures (z > -1).
a Participant is nativelike on all seven lexical measures (z > -2).
b Participant is within the NS range on all seven lexical measures.
c Participant is within the NS range on the overall measure.
Table 24
Z Scores of the Native Speaker of English Control Group in Spadaro (1998)

Z scores on lexical tasks*
                                                                        Average of
NS group             1       2       3       4       5       6       7  z scores
1                 0.95    0.44    0.31    0.74    1.02   -0.94   -0.06      0.35
2                -1.90   -1.04    0.31   -0.32   -0.11    0.94    0.54     -0.22
3                -0.95    0.44    0.31    0.74   -0.11   -0.94   -1.87     -0.34
4                 0.95   -1.04    0.31    0.74    1.02    0.94    0.54      0.50
5                 0.95    0.44    0.31    0.74   -1.25   -0.94    0.54      0.11
6                 0.00    1.19    0.31   -0.32    1.02    0.94    0.54      0.53
7                 0.00   -0.30    0.31   -2.42   -1.25   -0.94    0.54     -0.58
8                -0.95    1.19    0.31   -0.32    1.02    0.94   -1.87      0.05
9                 0.00   -1.78   -2.81   -0.32   -1.25   -0.94    0.54     -0.94
10                0.95    0.44    0.31    0.74   -0.11    0.94    0.54      0.55
Mean              0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00

* Z scores were calculated on the basis of the native speaker mean and standard deviation.
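The b marker in Tables 22 and 23 depends on the native speaker range that can be read from Table 24. The sketch below is an added illustration rather than Spadaro’s or the author’s own procedure. It assumes that "within the NS range" means lying between the lowest and the highest native speaker z score on a task, with both endpoints included; that reading is an assumption, although it is consistent with the markers shown in the tables. The names ns_by_task, within_ns_range, and average_z are hypothetical.

```python
# Native speaker z scores from Table 24: one list per lexical task (tasks 1-7),
# each holding the scores of NS participants 1-10.
ns_by_task = [
    [0.95, -1.90, -0.95, 0.95, 0.95, 0.00, 0.00, -0.95, 0.00, 0.95],
    [0.44, -1.04, 0.44, -1.04, 0.44, 1.19, -0.30, 1.19, -1.78, 0.44],
    [0.31, 0.31, 0.31, 0.31, 0.31, 0.31, 0.31, 0.31, -2.81, 0.31],
    [0.74, -0.32, 0.74, 0.74, 0.74, -0.32, -2.42, -0.32, -0.32, 0.74],
    [1.02, -0.11, -0.11, 1.02, -1.25, 1.02, -1.25, 1.02, -1.25, -0.11],
    [-0.94, 0.94, -0.94, 0.94, -0.94, 0.94, -0.94, 0.94, -0.94, 0.94],
    [-0.06, 0.54, -1.87, 0.54, 0.54, 0.54, 0.54, -1.87, 0.54, 0.54],
]

# Two adult-onset participants from Table 22 (tasks 1-7).
adult_1 = [-5.71, -1.04, -18.44, -6.63, -5.80, -16.04, -2.47]
adult_4 = [0.95, -1.04, -2.81, -0.32, -0.11, -0.94, -1.27]


def within_ns_range(z_scores):
    """True if every task score lies between the NS minimum and maximum.

    Assumption: the range includes the lowest and highest NS score.
    """
    return all(min(ns) <= z <= max(ns) for z, ns in zip(z_scores, ns_by_task))


def average_z(z_scores):
    """Mean of the seven task z scores, rounded to two decimals as in the tables."""
    return round(sum(z_scores) / len(z_scores), 2)


for label, scores in (("1", adult_1), ("4", adult_4)):
    print(label, average_z(scores), within_ns_range(scores))
```

Under this assumption, participant 4 falls inside the native speaker range on all seven tasks, in line with the b marker in Table 22, whereas participant 1 does not. The computed averages (-8.02 and -0.79) match the values listed in the last column of Table 22.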
References
Aitchison, J. (2003). Words in the mind: An introduction to the mental lexicon (3rd ed.). Malden, MA: Blackwell Publishing.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Anglin, J. M. (1985). The child’s expressible knowledge of word concepts: What preschoolers can say about the meanings of some nouns and verbs. In K. E. Nelson (Ed.), Children’s Language. Hillsdale, NJ: Lawrence Erlbaum.
Bahrick, H. P., Hall, L. K., Goggin, J. P., Bahrick, L. E., & Berger, S. A. (1994). Fifty years of language maintenance and language dominance in bilingual Hispanic immigrants. Journal of Experimental Psychology: General, 123 (3), 264-283.
Bates, E., & MacWhinney, B. (1989). Functionalism and the competition model. In B. MacWhinney & E. Bates (Eds.), The crosslinguistic study of sentence processing (pp. 3-76). Cambridge, UK: Cambridge University Press.
Beretta, A., Campbell, C., Carr, T. H., Huang, J., Schmitt, L. M., Christianson, K., & Cao, Y. (2003). An ER-fMRI investigation of morphological inflection in German reveals that the brain makes a distinction between regular and irregular forms. Brain and Language, 85, 67-92.
Birdsong, D. (1992). Ultimate attainment in second language acquisition. Language, 68, 706-755.
Birdsong, D. (2005). Interpreting age effects in second language acquisition. In J. F. Kroll & A. M. B. De Groot (Eds.), Handbook of bilingualism: Psycholinguistic approaches (pp. 109-127). New York: Oxford University Press.
Birdsong, D. (2006). Age and second language acquisition and processing: A selective overview. Language Learning, 56, 9-49.
Birdsong, D., & Molis, M. (2001). On the evidence for maturational constraints in second-language acquisition. Journal of Memory and Language, 44, 235-249.
Bongaerts, T. (1999). Ultimate attainment in L2 pronunciation: The case of very advanced late L2 learners. In D. Birdsong (Ed.), Second language acquisition and the critical period hypothesis (pp. 133-159). Mahwah, NJ: Lawrence Erlbaum.
Bongaerts, T., Van Summeren, C., Planken, B., & Schils, E. (1997). Age and ultimate attainment in the pronunciation of a foreign language. Studies in Second Language Acquisition, 19, 447-465.
Breland, H. M., Jones, R. J., & Jenkins, L. (1994). The College Board vocabulary study. College Board Report No. 94-4. New York: College Board Publications.
Chomsky, N. (1995). The Minimalist Program. Cambridge, MA: MIT Press.
Coppieters, R. (1987). Competence differences between natives and near-native speakers. Language, 63, 544-573.
Cummins, J. (1981). Age on arrival and immigrant second language learning in Canada: A reassessment. Applied Linguistics, 11 (2), 132-149.
D’Anna, C. A., Zechmeister, E. B., & Hall, J. W. (1991). Toward a meaningful definition of vocabulary size. Journal of Reading Behavior, 23 (1), 109-122.
Daller, H., Milton, J., & Treffers-Daller, J. (Eds.). (2007). Modelling and assessing vocabulary knowledge. Cambridge, UK: Cambridge University Press.
Davies, A. (2003). The native speaker: Myth and reality. Clevedon, UK: Multilingual Matters.
Dunn, L. M., & Dunn, D. M. (1997). Peabody Picture Vocabulary Test: Examiner’s manual. Circle Pines, MN: American Guidance Service.
Dunn, L. M., & Dunn, D. M. (2007). PPVT 4. Peabody Picture Vocabulary Test, fourth edition. Manual. Minneapolis, MN: Pearson.
DeKeyser, R. M. (2000). The robustness of critical period effects in second language acquisition. Studies in Second Language Acquisition, 22, 499-533.
Ehrlich, E., Flexner, S. B., Carruth, G., & Hawkins, J. M. (Eds.). (1980). Oxford American dictionary. New York: Oxford University Press.
Eyckmans, J., Van de Velde, H., Van Hout, R., & Boers, F. (2007). Learners’ response behaviour in yes/no vocabulary tests. In H. Daller, J. Milton, & J. Treffers-Daller (Eds.), Modelling and assessing vocabulary knowledge (pp. 59-76). Cambridge, UK: Cambridge University Press.
Fenson, L., Dale, P., Reznick, S., Thal, D., Bates, E., Hartung, J., Pethick, S., & Reilly, J. (1993). MacArthur Communicative Development Inventories: user’s guide and technical manual. San Diego, CA: Singular Publishing Group.
Flege, J. E. (1999). Age of learning and second language speech. In D. Birdsong (Ed.), Second language acquisition and the critical period hypothesis (pp. 101-131). Mahwah, NJ: Lawrence Erlbaum.
Flege, J. E., & Liu, S. (2001). The effect of experience on adults’ acquisition of a second language. Studies in Second Language Acquisition, 23, 527-552.
Flege, J. E., MacKay, I., & Meador, D. (1999a). Native Italian speakers’ production and perception of English vowels. Journal of the Acoustical Society of America, 106 (5), 2973-2987.
Flege, J. E., Munro, M. J., & MacKay, I. R. (1995). Factors affecting strength of perceived foreign accent in a second language. Journal of the Acoustical Society of America, 97 (5), 3125-3134.
Flege, J. E., Yeni-Komshian, G. H., & Liu, S. (1999b). Age constraints on second-language acquisition. Journal of Memory and Language, 41, 78-104.
Fodor, J. (1983). The modularity of mind. Cambridge, MA: MIT Press.
Fodor, J. (2000). The mind doesn’t work that way. Cambridge, MA: MIT Press.
Forster, K. I. (1989). Basic issues in lexical processing. In W. Marslen-Wilson (Ed.), Lexical representations and process (pp. 75-107). Cambridge, MA: MIT Press.
Francis, W. N., & Kucera, H. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston, MA: Houghton Mifflin.
Gay, L. R., & Airasian, P. (2003). Educational research: Competencies for analysis and application (7th ed.). Upper Saddle River, NJ: Merrill Prentice Hall.
Goulden, R., Nation, P., & Read, J. (1990). How large can receptive vocabulary be? Applied Linguistics, 11 (4), 341-363.
Green, S. B., & Salkind, N. J. (2003). Using SPSS for Windows and Macintosh: Analyzing and understanding data (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
Greidanus, T., Beks, B., & Wakely, R. (2005). Testing the development of French word knowledge by advanced Dutch- and English-Speaking Learners and Native Speakers. The Modern Language Journal, 89 (2), 221-233.
Greidanus, T., Bogaards, P., van der Linden, E., Nienhuis, L., & de Wolf, T. (2004). The construction and validation of a deep word knowledge test for advanced learners of French. In P. Bogaards & B. Laufer (Eds.), Vocabulary in a second language: Selection, acquisition, and testing (pp. 209-227). Philadelphia, PA: John Benjamins.
Greidanus, T., & Nienhuis, L. (2001). Testing the quality of word knowledge in a second language by means of word associations: Types of distractors and types of associations. The Modern Language Journal, 85 (4), 567-577.
Hahne, A. (2001). What’s different in second-language processing? Evidence from event-related brain potentials. Journal of Psycholinguistic Research, 30 (3), 251-266.
Hahne, A., & Friederici, A. D. (2001). Processing a second language: Late learners’ comprehension mechanisms as revealed by event-related potentials. Bilingualism: Language and Cognition, 4 (2), 123-141.
Hedden, T., Lautenschlager, G., & Park, D. C. (2005). Contributions of processing ability and knowledge to verbal memory tasks across the adult life-span. The Quarterly Journal of Experimental Psychology, 58A (1), 169-190.
Henriksen, B. (1999). Three dimensions of vocabulary development. Studies in Second Language Acquisition, 21, 303-317.
Hyltenstam, K. (1988). Lexical characteristics of near-native second-language learners of Swedish. Journal of Multilingual and Multicultural Development, 9 (1 & 2), 67-84.
Hyltenstam, K. (1992). Nonnative features of near-native speakers: On ultimate attainment of childhood L2 learners. In R. J. Harris (Ed.), Cognitive processing in bilinguals (pp. 351-368). Amsterdam: Elsevier Science Publishing.
Hyltenstam, K., & Abrahamsson, N. (2000). Who can become native-like in a second language? All, some, or none? On the maturational constraints controversy in second language acquisition. Studia Linguistica, 54 (2), 150-166.
Hyltenstam, K., & Abrahamsson, N. (2003). Maturational constraints in SLA. In C. J. Doughty & M. H. Long (Eds.), The handbook of second language acquisition (pp. 539-588). Malden, MA: Blackwell Publishing.
Jackendoff, R. (2002). Foundations of language. New York: Oxford University Press.
Johnson, J. S. (1992). Critical period effects in second language acquisition: The effects of written versus auditory materials on the assessment of grammatical competence. Language Learning, 42, 217-248.
Johnson, J. S., & Newport, E. L. (1989). Critical period effects in second language learning: The influence of maturational state on the acquisition of English as a second language. Cognitive Psychology, 21, 60-99.
Kauffman, S. A. (1993). The origins of order: Self organization in selection and evolution. Oxford, UK: Oxford University Press.
Kim, E. J. (1997). The sensitive period for second-language acquisition: A reaction-time study of maturational effects on the acquisition of L2 lexico-semantic and syntactic systems. Dissertation Abstracts International, 57 (12), 5133A. (UMI No. 9717292)
Kim, K. H. S., Relkin, N. R., Lee, K-M., & Hirsch, J. (1997). Distinct cortical areas associated with native and second languages. Nature, 388 (10), 171-174.
Kroll, J. F., & Stewart, E. (1994). Category interference in translation and picture naming: Evidence for asymmetric connections between bilingual memory representations. Journal of Memory and Language, 33, 149-174.
Kroll, J. F., & Sunderman, G. (2003). Cognitive processes in second language learners and bilinguals: The development of lexical and conceptual representations. In C. J. Doughty & M. H. Long (Eds.), The handbook of second language acquisition (pp. 104-129). Malden, MA: Blackwell Publishing.
Kroll, J. F., & Tokowicz, N. (2005). Models of bilingual representation and processing: Looking back and to the future. In J. F. Kroll & A. M. B. De Groot (Eds.), Handbook of bilingualism: Psycholinguistic approaches (pp. 531-553). New York: Oxford University Press.
Kucera, H., & Francis, W. N. (1967). A computational analysis of present-day American English. Providence, RI: Brown University Press.
Lardiere, D. (2007). Ultimate attainment in second language acquisition: A case study. Mahwah, NJ: Lawrence Erlbaum.
Lenneberg, E. (1967). Biological Foundations of Language. New York: Wiley.
Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Long, M. H. (2007). Problems in SLA. Mahwah, NJ: Lawrence Erlbaum.
Long, M. H. (1990). Maturational constraints on language development. Studies in Second Language Acquisition, 12, 251-285.
Lorge, I., & Chall, J. (1963). Estimating the size of vocabularies of children and adults: an analysis of methodological issues. Journal of Experimental Education, 32, 147-157.
Marinova-Todd, S. H. (2003). Comprehensive analysis of ultimate attainment in adult second language acquisition. Dissertation Abstracts International, 64 (08), 2814A. (UMI No. 3100151)
Marslen-Wilson, W. (1990). Activation, competition and frequency in lexical access. In G. Altmann (Ed.), Cognitive models of speech processing (pp. 148-172). Cambridge, MA: MIT Press.
McDonald, J. L. (2000). Grammaticality judgments in a second language: Influences of age of acquisition and native language. Applied Psycholinguistics, 21, 395-423.
Meara, P. (1996). The dimensions of lexical competence. In G. Brown, K. Malmkjaer, & J. Williams (Eds.), Performance and competence in second language acquisition (pp. 35-53). Cambridge, UK: Cambridge University Press.
Meara, P. M. (1999). Self-organisation in bilingual lexicons. In P. Broeder & J. Murre (Eds.), Language and thought in development: Crosslinguistic studies (pp. 127-144). Tübingen, Germany: Gunter Narr.
Mechelli, A., Crinion, J. T., Noppeney, U., O’Doherty, J., Ashburner, J., Frackowiak, R. S., & Price, C. J. (2004). Structural plasticity in the bilingual brain: Proficiency in a second language and age at acquisition affect grey-matter density. Nature, 431, 757.
Montrul, S., & Slabakova, R. (2003). Competence similarities between native and near-native speakers: An investigation of the preterite-imperfect contrast in Spanish. Studies in Second Language Acquisition, 25, 351-398.
Morton, J., & Patterson, K. (1980). A new attempt at an interpretation, or, an attempt at a new interpretation. In M. Coltheart, K. Patterson, & J. Marshall (Eds.), Deep dyslexia (pp. 91-118). London: Routledge and Kegan Paul.
Moyer, A. (1999). Ultimate attainment in L2 phonology. Studies in Second Language Acquisition, 21, 81-108.
Murre, J. M. J. (2005). Models of monolingual and bilingual language acquisition. In J. F. Kroll & A. M. B. De Groot (Eds.), Handbook of bilingualism: Psycholinguistic approaches (pp. 154-169). New York: Oxford University Press.
Nassaji, H. (2004). The relationship between depth of vocabulary knowledge and L2 learners’ lexical inferencing strategy use and success. The Canadian Modern Language Review, 61 (1), 107-134.
Nation, I. S. P. (1990). Teaching and learning vocabulary. New York: Newbury House.
Nation, I. S. P. (1993). Using dictionaries to estimate vocabulary size: essential, but rarely followed procedures. Language Testing, 20, 27-40.
Nation, P. (1983). Testing and teaching vocabulary. Guidelines, 5, 12-25.
Nation, P. (2007). Fundamental issues in modeling and assessing vocabulary knowledge. In H. Daller, J. Milton, & J. Treffers-Daller (Eds.), Modelling and assessing vocabulary knowledge (pp. 35-43). Cambridge, UK: Cambridge University Press.
Newport, E. L. (1990). Maturational constraints on language learning. Cognitive Science, 14, 11-28.
Osterhout, L., McLaughlin, J., & Bersick, M. (1997). Event-related brain potentials and human language. Trends in Cognitive Science, 1 (6), 203-209.
Oyama, S. (1978). The sensitive period and comprehension of speech. Working Papers on Bilingualism, 16, 1-17.
Paradis, M. (2004). A neurolinguistic theory of bilingualism. Amsterdam: John Benjamins.
Paribakht, T. S., & Wesche, M. (1997). Vocabulary enhancement activities and reading for meaning in second language vocabulary development. In J. Coady & T. Huckin (Eds.), Second language vocabulary acquisition: A rationale for pedagogy (pp. 174-200). New York: Cambridge University Press.
Park, D. C., Lautenschlager, G., Hedden, T., Davidson, N. S., Smith, A. D., & Smith, P. K. (2002). Models of visuospatial and verbal memory across the adult life span. Psychology and Aging, 17 (2), 299-320.
Pearson, P. D., Hiebert, E. H., & Kamil, M. L. (2007). Vocabulary assessment: What we know and what we need to learn. Reading Research Quarterly, 42 (2), 282-296.
Pinker, S. (1994). The language instinct: How the mind creates language. New York: Morrow.
Piske, T., MacKay, I. R. A., & Flege, J. E. (2001). Factors affecting degree of foreign accent in an L2: A review. Journal of Phonetics, 29, 191-215.
Poulisse, N., & Bongaerts, T. (1994). First language use in second language production. Applied Linguistics, 15 (1), 36-57.
Qian, D. D. (1999). Assessing the roles of depth and breadth of vocabulary knowledge in reading comprehension. The Canadian Modern Language Review, 56 (2), 282-307.
Qian, D. D. (2002). Investigating the relationship between vocabulary knowledge and academic reading performance: An assessment perspective. Language Learning, 52 (3), 513-536.
Qian, D. D., & Schedl, M. (2004). Evaluation of an in-depth vocabulary knowledge measure for assessing reading performance. Language Testing, 21 (1), 28-52.
Read, J. (1993). The development of a new measure of L2 vocabulary knowledge. Language Testing, 10, 355-371.
Read, J. (1998). Validating a test to measure depth of vocabulary knowledge. In A. J. Kunnan (Ed.), Validation in language assessment: Selected papers from the 17th Language Testing Research Colloquium, Long Beach (pp. 41-60). Mahwah, NJ: Lawrence Erlbaum.
Read, J. (2000). Assessing vocabulary. New York: Cambridge University Press.
Read, J. (2004). Plumbing the depths: How should the construct of vocabulary knowledge be defined? In P. Bogaards & B. Laufer (Eds.), Vocabulary in a second language: Selection, acquisition, and testing (pp. 209-227). Philadelphia, PA: John Benjamins.
Read, J., & Chapelle, C. A. (2001). A framework for second language vocabulary assessment. Language Testing, 18 (1), 1-32.
Rumelhart, D. E., & McClelland, J. L. (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations. Cambridge, MA: MIT Press.
Sanders, L. D., & Neville, H. J. (2003a). An ERP study of continuous speech processing. II. Segmentation, semantics, and syntax in nonnative speakers. Cognitive Brain Research, 15, 214-227.
Sanders, L. D., & Neville, H. J. (2003b). An ERP study of continuous speech processing. I. Segmentation, semantics, and syntax in native speakers. Cognitive Brain Research, 15, 228-240.
Schmitt, N. (2000). Vocabulary in language teaching. New York: Cambridge University Press.
Schmitt, N., & Meara, P. (1997). Researching vocabulary through a word knowledge framework. Studies in Second Language Acquisition, 19, 17-36.
Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and exploring the behaviour of two new versions of the Vocabulary Levels Test. Language Testing, 18 (1), 55-88.
Scovel, T. (2000). A critical review of the critical period research. Annual Review of Applied Linguistics, 20, 213-223.
Silverberg, S., & Samuel, A. G. (2004). The effect of age of second language acquisition on the representation and processing of second language words. Journal of Memory and Language, 51, 381-398.
Singleton, D. (1999). Exploring the second language mental lexicon. New York: Cambridge University Press.
Singleton, D. (2001). Age and second language acquisition. Annual Review of Applied Linguistics, 21, 77-89.
Slabakova, R. (2006). Is there a critical period for semantics? Second Language Research, 22 (3), 302-338.
Spadaro, K. M. (1998). Maturational constraints on lexical acquisition in a second language. Unpublished doctoral dissertation, University of Western Australia.
Thorndike, E. L., & Lorge, I. (1944/1952). The teacher’s word book of 30,000 words. New York: Columbia University, Teachers College.
Ullman, M. T. (2001). The declarative/procedural model of lexicon and grammar. Journal of Psycholinguistic Research, 30 (1), 37-69.
Urponen, M. I. (2004). Ultimate attainment in postpuberty second language acquisition. Dissertation Abstracts International, 64 (11), 4033A. (UMI No. 3113378)
Vermeer, A. (2001). Breadth and depth of vocabulary in relationship to L1/L2 acquisition and frequency of input. Applied Psycholinguistics, 22, 217-234.
Vermeer, A. (2004). The relationship between lexical richness and vocabulary size in Dutch L1 and L2 children. In P. Bogaards & B. Laufer (Eds.), Vocabulary in a second language: Selection, acquisition, and testing (pp. 209-227). Philadelphia, PA: John Benjamins.
Wartenburger, I., Heekeren, H. R., Abutalebi, J., Cappa, S. F., Villringer, A., & Perani, D. (2003). Early setting of grammatical processing in the bilingual brain. Neuron, 37, 159-170.
Weber-Fox, C. M., & Neville, H. J. (1996). Maturational constraints on functional specializations for language processing: ERP and behavioral evidence in bilingual speakers. Journal of Cognitive Neuroscience, 8 (3), 231-256.
Webster’s Third New International Dictionary. (1961). Springfield, MA: Merriam-Webster.
Wesche, M., & Paribakht, T. M. (1996). Assessing vocabulary knowledge: Depth vs. breadth. The Canadian Modern Language Review, 53, 13-40.
West, M. (1953). The general service list of English words. London: Longmans, Green, & Co.
White, L., & Genesee, F. (1996). How native is near-native? The issue of ultimate attainment in adult second language acquisition. Second Language Research, 12 (3), 233-265.
Williams, K. T., & Wang, J. (1997). Technical references to the Peabody Picture Vocabulary Test Third Edition (PPVT-III). Circle Pines, MN: American Guidance Service.
Wolter, B. (2001). Comparing the L1 and L2 mental lexicon: A depth of individual word knowledge model. Studies in Second Language Acquisition, 23, 41-69.
Zechmeister, E. B., Chronis, A. M., Cull, W. L., D’Anna, C. A., & Healy, N. A. (1995). Growth of a functionally important lexicon. Journal of Reading Behavior, 27 (2), 201-212.
Zechmeister, E. B., D’Anna, C. A., Hall, J. W., Paus, C. H., & Smith, J. A. (1993). Metacognitive and other knowledge about the mental lexicon: Do we know how many words we know? Applied Linguistics, 14 (2), 188-206.
Zechmeister, E. B., Morgan, P. M., Kruger, D. J., & Fash, H. K. (1998, August). Carving up the young adult lexicon. Poster session presented at the 106th Annual Meeting of the American Psychological Association, San Francisco, CA.
Zimmerman, K. J. (2004). The role of vocabulary size in assessing second language proficiency. Unpublished master’s thesis, Brigham Young University, Provo, UT.