Behav Res DOI 10.3758/s13428-013-0381-7
Research applications for An Object and Action Naming Battery to assess naming skills in adult Spanish–English bilingual speakers Lisa A. Edmonds & Neila J. Donovan
© Psychonomic Society, Inc. 2013
Electronic supplementary material The online version of this article (doi:10.3758/s13428-013-0381-7) contains supplementary material, which is available to authorized users.
L. A. Edmonds (*) Department of Speech Language and Hearing Sciences, University of Florida, PO Box 117420, 351 Dauer, Gainesville, FL 32611, USA
L. A. Edmonds Brain Rehabilitation Research Center, Malcolm Randall Veterans Affairs Hospital, Gainesville, FL, USA
N. J. Donovan Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, LA, USA
Abstract Virtually no valid materials are available to evaluate confrontation naming in Spanish–English bilingual adults in the U.S. In a recent study, a large group of young Spanish–English bilingual adults were evaluated on An Object and Action Naming Battery (Edmonds & Donovan in Journal of Speech, Language, and Hearing Research 55:359–381, 2012). Rasch analyses of the responses resulted in evidence for the content and construct validity of the retained items. However, the scope of that study did not allow for extensive examination of individual item characteristics, group analyses of participants, or the provision of testing and scoring materials or raw data, thereby limiting the ability of researchers to administer the test to Spanish–English bilinguals and to score the items with confidence. In this study, we present the in-depth information described above on the basis of further analyses, including (1) online searchable spreadsheets with extensive empirical (e.g., accuracy and name agreeability) and psycholinguistic item statistics; (2) answer sheets and instructions for scoring and interpreting the responses to the Rasch items; (3) tables of alternative correct responses for English and Spanish; (4) ability strata determined for all naming conditions (English and
Spanish nouns and verbs); and (5) comparisons of accuracy across proficiency groups (i.e., Spanish dominant, English dominant, and balanced). These data indicate that the Rasch items from An Object and Action Naming Battery are valid and sensitive for the evaluation of naming in young Spanish–English bilingual adults. Additional information based on participant responses for all of the items on the battery can provide researchers with valuable information to aid in stimulus development and response interpretation for experimental studies in this population. Keywords Rasch . Naming . Bilingual . Spanish . Actions . Objects Confrontation naming is a common experimental task used across many disciplines (e.g., psychology, cognitive neuropsychology, and communication sciences) to evaluate lexical retrieval in bilingual populations. Naming data are used both to understand variables related to lexical retrieval in bilingual adults and/or to serve as a normal reference for participants with communication disorders such as aphasia and dementia. However, psychometrically sound materials are lacking for researchers to validly assess naming across languages and grammatical classes (nouns and verbs) within and across bilingual participants. To address this issue, Edmonds and Donovan (2012) evaluated object and action naming with An Object and Action Naming Battery (An O&A Battery; Druks & Masterson, 2000) in 91 Spanish–English bilingual adults in the United States. The responses from this young bilingual group were subjected to item response theory (IRT) modeling—specifically, to Rasch analysis. Rasch analysis has a number of measurement attributes that make it ideal for evaluating responses from a bilingual population, given the potential variability of language acquisition, use, and proficiency within and across bilinguals (Grosjean, 1998). First, IRT modeling provides
information about the item-level properties of an instrument, beyond how the test functions as a whole (as is done in classical measures of validity and reliability; Fendrich, Smith, Pollack, & Mackesy-Amiti, 2009; Messick, 1995). Second, IRT uses probability modeling, which converts ordinal data into interval data for both person ability and item difficulty, making it possible to directly and validly compare the performance of one individual across languages (e.g., English nouns to Spanish nouns), grammatical classes (English nouns to English verbs), or both (English nouns to Spanish verbs). Additionally, two individuals who obtain a certain score can be compared to each other across languages and conditions. This cannot be done, for example, using a value such as the percent correct on a naming test (Bond & Fox, 2007; Wright, 1999). Furthermore, interval scores allow researchers to report interval-level changes with confidence, since an increase in interval score represents significant improvement, and a decrease signifies regression (Bond & Fox, 2007; Wright, 1999). The results from the Rasch analysis showed unidimensionality across noun and verb responses in English and Spanish, as well as robust item-level psychometric properties, providing evidence for content validity. Additionally, few items misfit the model; no ceiling or floor effects emerged after the uninformative and misfitting items were removed; and the items reflected a wide range of difficulties. Furthermore, reliability coefficients were high, and the numbers of statistically different ability levels provided indices of sensitivity (Edmonds & Donovan 2012). Evidence for construct validity for the items across languages was obtained by revealing a theoretical relationship between the item hierarchy (on the basis of accuracy) and psycholinguistic variables known to affect naming performance, such as imageability, age of acquisition (of the item), and frequency. 
Participant variables such as self-ratings of proficiency and language use also contributed to regression models of item accuracy across the four naming conditions (English and Spanish nouns and verbs). Thus, the findings from the Rasch and regression analyses suggested adequate content and construct validity of the items from An O&A Battery retained in the analysis for young Spanish–English bilingual adults.
The present study The primary purpose of this study was to provide analyses, raw data, and testing materials to researchers in order to facilitate functional, interpretable, and flexible use of the Rasch items retained from An O&A Battery for young bilingual adults. For the Rasch items, we set out to determine the ability strata for all four naming conditions (English and Spanish nouns and verbs) and to provide instructions and score sheets that would convert the raw scores into a more interpretable format. Additionally, we would provide item statistics resulting from participant responses (e.g., accuracy and H statistics [a metric of name agreeability]) for the Rasch items (and remaining O&A Battery items) in a searchable Excel sheet that could be used for stimulus development and response interpretation.
In addition to the item analyses and response sheets, we compared naming accuracy across and within three proficiency groups (English dominant [ED], Spanish dominant [SD], and balanced bilingual [BB]). We know that the self-ratings of overall proficiency from these groups significantly correlated with naming accuracy in all conditions (Pearson rs = .377, .779, .695, and .762, ps < .01, for English and Spanish nouns and English and Spanish verbs, respectively; Edmonds & Donovan, 2012); however, we did not know to what extent the subgroups of participants formed by their relative proficiencies across languages would exhibit different naming patterns. Such information could reveal whether groups determined by self-rating measures are differentiated by naming abilities on the Rasch items, which could aid researchers in conceptualizing the experimental groups for which relative proficiency is a consideration. Furthermore, whereas a number of reports have addressed naming abilities in English-dominant or balanced-bilingual groups in the U.S. (e.g., Kohnert, Hernandez, & Bates, 1998; Muñoz & Marquardt, 2003), there have been few reports of naming in healthy young SD participants, who are included here.
Previous research with bilinguals has shown that self-reports of proficiency are generally predictive of language performance on constrained language tasks (e.g., Kohnert et al., 1998; Marian, Blumenfeld, & Kaushanskaya, 2007; Muñoz & Marquardt, 2003, 2008). The participants from Edmonds and Donovan (2012) as a whole did not rate their proficiency across languages differently, though the range of participant ratings indicated subgroups of participants with higher ratings in one language. In the present study, we divided the participants into proficiency groups based on their self-ratings of proficiency (see the Method section) to determine whether naming accuracies on the Rasch items would discriminate the groups (e.g., whether Spanish naming would be better in the Spanish-dominant group than in the English-dominant group, etc.).
Method Participants A group of 91 Spanish–English young bilingual adults (M = 22.54 years, SD = 5.09) who reported no history of language/neurological disorders, drug/alcohol addiction, or
proficiency in a third language were recruited from central and south Florida. The participants reported being born in 15 different countries, with most being from the United States (38.67 %), followed by Colombia (23 %) and Venezuela (14 %). We asked participants what country or countries influenced their Spanish, and 18 countries were represented, with Colombia representing the highest percentage (26 %), followed by Venezuela (14 %) and Cuba (13 %). Participants self-reported the ability to communicate functionally in English and Spanish in most situations. Participants’ responses from a language-use questionnaire revealed significant differences (p < .001) across the participants related to (1) age of exposure to each language (English M = 6.99 [SD = 7.37] years; Spanish M = 0.13 [3.07] years) and (2) percentage of time using each language, with more English (66.59 %) than Spanish (19.20 %) use. Bilingual contexts made up the difference (14.21 %). Despite the difference in use, participants’ self-reports of overall proficiency (1–7 scale) did not result in a significant difference of proficiencies across languages (English rating M = 6.53 [0.66]; Spanish rating M = 6.07 [0.97]) (Edmonds & Donovan, 2012). To report their overall proficiency in each language, participants rated their proficiency in each language on a bilingual questionnaire using a 1–7 scale (1 = Not fluent/No fluido, 7 = Native fluency/Bien fluido). To determine the proficiency groups in the present study, a difference score (English–Spanish) was calculated between the overall self-ratings of proficiency across languages. The difference scores were then converted into z scores. From those, participants with a score of < −1.0 were deemed Spanish dominant (SD: N = 16), and those with a score >1.0 were deemed English dominant (ED: N = 19). Participants with scores in between were categorized as balanced bilinguals (BB: N = 56). See Table 1 for demographic information for each group. 
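The grouping rule described above (difference score, z score, cutoffs at −1.0 and 1.0) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' analysis script: the function name, the example ratings, and the use of the sample standard deviation for the z scores are all assumptions.

```python
from statistics import mean, stdev


def classify_dominance(english_ratings, spanish_ratings, cutoff=1.0):
    """Label each participant ED, SD, or BB from 1-7 proficiency self-ratings.

    Assumption: z scores use the sample standard deviation of the
    English-minus-Spanish difference scores.
    """
    diffs = [e - s for e, s in zip(english_ratings, spanish_ratings)]
    center, spread = mean(diffs), stdev(diffs)
    labels = []
    for diff in diffs:
        z = (diff - center) / spread
        if z < -cutoff:
            labels.append("SD")  # Spanish dominant
        elif z > cutoff:
            labels.append("ED")  # English dominant
        else:
            labels.append("BB")  # balanced bilingual
    return labels


# Hypothetical ratings for six participants (not data from the study).
example_english = [7, 7, 7, 7, 4, 7]
example_spanish = [7, 7, 7, 3, 7, 7]
example_labels = classify_dominance(example_english, example_spanish)
print(example_labels)
```

Because the cutoffs are defined on z scores, group membership depends on the distribution of difference scores in the sample at hand, so the same raw ratings could be classified differently in another sample.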
Materials An O&A Battery was developed to assess confrontation naming of objects (N = 162) and actions (N = 100) in research and clinical populations. The objects and actions were matched on the psycholinguistic variables (in English) of printed word frequency and rated age of acquisition and familiarity (Druks & Masterson, 2000). The authors suggested that the stimuli could be used in psycholinguistic research, with paradigms including timed naming, lexical-decision tasks, priming tasks, and imaging studies. Furthermore, the object stimuli can be grouped into different semantic categories (e.g., animals, food items) for testing or treatment. The materials can also be used to investigate potential grammatical impairments in clinical populations (i.e., greater impairment in verb than in noun naming, and vice versa; see Druks & Masterson, 2000, and Masterson & Druks,
1998). Thus, the stimuli have great potential use, not only for monolingual but also for bilingual populations.
Procedure First, participants completed a comprehensive language use questionnaire (largely adapted from Muñoz, Marquardt, & Copeland, 2009). Then, all items from An O&A Battery were administered to the participants by a bilingual research assistant. The participants were tested over two sessions, generally 5–7 days apart, with one session in English and one in Spanish. The order of language testing was counterbalanced across participants. During each session, the examiner established the language of testing and stayed in that language throughout the session. The 262 black-and-white pictures from An O&A Battery were shown on a 17-in. computer monitor, and participants were instructed to name the picture as quickly as possible. No feedback or cues were given to the participants during naming. For the Spanish nouns, participants rarely used the article (e.g., la mano, “hand”), and accuracy determination did not depend on the inclusion or accuracy (e.g., el mano) of the article. For the verbs, participants were instructed to respond in the present progressive form (e.g., “eating”/comiendo), in order to elicit the same form across languages. Participants rarely used a different verb form (e.g., the infinitive) in either language, but if they did, it was still considered correct, assuming that the root of the word was accurate. Within each session, half of the participants named verbs first, and half named nouns first.
Alternative correct responses In order to score the responses most accurately, the variability of correct responses must be considered. U.S. bilingual groups are diverse and can represent different English and Spanish dialects, which increases lexical variability. 
(We did not ask participants about potential regional influences of their English, though the participants were all recruited from Florida.) Furthermore, contact between Spanish and English in dense areas of bilingual speakers can result in the use of calques, which are words or phrases constructed by using a word or phrase from one language and translating it into the other language. Some examples of word calques in Spanish that borrow from English include aplicación for “application” (solicitud in standard Spanish) or grado for “grade” (nota in standard Spanish) (Silva-Corvalán, 1994). Some examples from our data include librería (standard is biblioteca) for “library” and collar (cuello in standard Spanish) for “collar.” Thus, the alternative correct responses provided will assist researchers when evaluating unfamiliar or seemingly incorrect responses, especially when they
reflect nonstandard Spanish responses that may result from dense contact between languages. To determine correct responses, bilingual research assistants used monolingual and bilingual dictionaries as resources, including dictionaries of Spanish usage in the U.S. (e.g., Cobos, 2003; Galván, 1995). With respect to pronunciation, participants were not penalized for dialectal variations (e.g., colbata for corbata). However, pronunciation was evaluated for words that shared many or all phonemes across languages (e.g., “banana,” “bus”) to ensure that the word was pronounced using the appropriate phonemes for the target language. The
intent of the scoring was to capture use rather than to impose “correctness” on the basis of predetermined translations. The alternative correct responses also reflect multiple acceptable terms for an item in English (e.g., “curtains/drapes”) or Spanish (e.g., tomando/bebiendo, “drinking”) and items for which picture ambiguity resulted in responses with different meanings (e.g., the picture of an entry ticket also looks like a receipt, so both terms were accepted). Code-switching responses (e.g., hueso for “bone”) were extremely rare ( 2) in all conditions. These results indicate that English nouns (2.43) and English verbs (2.47) had reliability coefficients of approximately 80 %, and Spanish nouns (3.89) and Spanish verbs (3.53) had reliability
coefficients of >90 %. In the present study, we used these results to calculate ability strata. Using the appropriate computation and data derived from the person separation index for each condition, the statistically distinct numbers of ability levels (strata) were computed. The strata were 4 for English nouns and English verbs, 5 for Spanish verbs, and 6 for Spanish nouns, indicating that the Rasch items from An O&A Battery separated the sample into four to six statistically separate levels of naming ability. See Table 4 for details.
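As a worked check of the strata computation, the sketch below applies the formula given in the note to Table 4, strata = [(person separation index × 4) + 1]/3, to the person separation indices listed in that table. Rounding to the nearest whole number is an assumption made here to match the reported integer strata; the function and variable names are illustrative.

```python
def ability_strata(person_separation_index):
    """Unrounded number of statistically distinct ability levels."""
    return (4 * person_separation_index + 1) / 3


# Person separation indices as listed in Table 4.
separation_indices = {
    "English nouns": 2.43,
    "English verbs": 2.47,
    "Spanish nouns": 3.89,
    "Spanish verbs": 3.51,  # the running text gives 3.53, which also rounds to 5
}

# Assumption: strata are reported after rounding to the nearest integer.
strata = {name: round(ability_strata(g)) for name, g in separation_indices.items()}

for name, count in strata.items():
    print(f"{name}: {ability_strata(separation_indices[name]):.2f} -> {count} strata")
```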
Interval scale score sheets Background To develop the interval scale score sheets, Rasch analysis (similar to the IRT one-parameter [PL] model) was used. Various IRT models exist (i.e., 1-, 2-, and 3-PL models) (Cook, O’Malley, & Roddey, 2005; Fries, Bruce, & Cella, 2005). The Rasch model has the same mathematical computation as the 1-PL IRT model, though it has different theoretical underpinnings (Rasch, 1960/1980). Rasch modeling assumes uniform discrimination and guessing across all items. We chose to use the Rasch model to determine the item-level psychometrics of An O&A Battery in this first study because the central concern was that the items assess the full range of person ability of a latent trait (Messick, 1995; Wright, 1997; Wright & Stone, 1979) and, in part, because validity depends on whether the items in a test capture the construct to be measured. An O&A Battery met the requisite assumptions for Rasch modeling: (1) The latent trait that was modeled was unidimensional (i.e., measured a single trait—in this case, noun and verb naming); (2) the items were equally discriminating (i.e., ideally an item will differentiate among people with different ability levels); and (3) items were locally independent (i.e., a correct response on one item could not be due to a correct response on another item). Therefore, we could maximize the mathematical benefits of intervality to compute ability strata and develop ability maps and accompanying materials that might be beneficial to researchers who currently lack valid or reliable assessments for Spanish–English bilingual speakers. Table 4 Strata for stimuli from An Object and Action Naming Battery by response condition, with person separation indices
Condition        Person separation index   Strata*
English nouns    2.43                      4
English verbs    2.47                      4
Spanish nouns    3.89                      6
Spanish verbs    3.51                      5
* Formula for strata = [(person separation index × 4) + 1]/3 (Wright & Masters, 2002)
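For readers less familiar with the model, the Rasch (one-parameter logistic) model discussed in the Background expresses the probability of a correct response as a logistic function of the difference between person ability and item difficulty, both in logits. The sketch below is illustrative only; the ability and difficulty values are hypothetical and are not estimates from An O&A Battery.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model (logit units)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical person ability and item difficulties, in logits.
person_ability = 1.0
item_difficulties = {"easy item": -1.0, "matched item": 1.0, "hard item": 3.0}

for label, difficulty in item_difficulties.items():
    p = rasch_probability(person_ability, difficulty)
    print(f"{label}: P(correct) = {p:.2f}")
```

When ability equals difficulty the probability is exactly .5, and every item shares the same slope, which is the sense in which the model treats items as equally discriminating.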
Please be aware that the Rasch materials do not include all of the items on An O&A Battery, since Rasch analysis allows misfitting and irrelevant items (e.g., those named with 100 % accuracy) to be deleted (Edmonds & Donovan, 2012).
Using the score sheets The Appendix provides instructions for how to evaluate and interpret participant scores in each naming condition (English and Spanish nouns and verbs) and includes the scoring forms (including a list of the items to administer) and logit-to-0–100 standard scale conversion forms for each naming condition. As we discussed earlier, the interval scores that result from the Rasch analysis of An O&A Battery allow for direct comparisons of the results. For instance, individual participants can be compared across language and grammatical classes in order to evaluate relative performance across and within languages. Comparisons can also be made at different time points for researchers interested in investigating whether naming has changed (e.g., after teaching certain items or to measure potential attrition) or remained static. Furthermore, interval scoring allows one to report the degree of improvement instead of just reporting a numeric change in raw score or percent correct. To illustrate, An O&A Battery is normally scored on the basis of counting the correct responses and computing a percent correct. Percent correct is not an interval measure, so we cannot say that a change from 10 % to 20 % correct indicates that the person is twice as good at naming the second time. However, if a person increases from a scale score of 10 to 20, we can say that the person is two times better at naming the second time. These scores are currently only valid for young Spanish–English bilingual adults, so results obtained with clinical populations (e.g., for pre- to posttreatment evaluation in persons with bilingual aphasia) should be interpreted cautiously. See Appendices A and B for explanations of the raw-score-to-scale-score conversion. 
Finally, participants can be compared to each other. For example, knowing that two people received scores of 30 % correct on An O&A Battery does not provide information about the words named correctly and incorrectly. However, because item difficulty and person ability are calibrated on the same interval scale, if two people receive standard scores of 30, the likelihood is high that they have produced the easier items below 30 correctly, but the more difficult items above 30 incorrectly. If participants score differently, the participant with a higher score presumably has higher naming abilities, and it is easy to determine which items were named accurately and inaccurately.
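The idea of placing persons and items on one 0–100 scale can be illustrated with a short sketch. The actual conversion forms are provided in the Appendix and are not reproduced here; the linear rescaling below is only one common way to map logits onto a 0–100 range, and the logit bounds, item labels, and item locations are hypothetical.

```python
def logit_to_scale(logit, min_logit, max_logit):
    """Linearly rescale a logit measure to 0-100 (assumed conversion)."""
    if max_logit <= min_logit:
        raise ValueError("max_logit must be greater than min_logit")
    return 100.0 * (logit - min_logit) / (max_logit - min_logit)


def items_likely_correct(person_score, item_scores):
    """Items located below the person's score on the shared scale."""
    return [name for name, score in item_scores.items() if score < person_score]


# Hypothetical logit bounds and item locations (not values from the Appendix).
person_score = logit_to_scale(-2.0, min_logit=-5.0, max_logit=5.0)
item_scores = {"easy item": 12.0, "mid item": 30.5, "hard item": 71.0}

print(person_score)
print(items_likely_correct(person_score, item_scores))
```

Under these assumptions a person with a score of 30 would be expected to name the items located below 30 and to miss most of those above it, which is what makes two equal scores comparable.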
Proficiency group comparisons A mixed-model 3 × 4 repeated measures analysis of variance was conducted comparing between- and within-group naming accuracies (as percents correct). Between groups, we found a significant effect of proficiency [F(2) = 4,455.03, p = .000]. Within groups, we also found a significant effect of naming condition (e.g., English verbs) [F(3) = 8,876.74, p = .000], with a significant interaction between naming condition and proficiency group [F(6) = 19,434.74, p = .000]. For the post hoc tests and paired t tests below, Cohen’s effect sizes (d) were calculated by dividing the difference of the two means being compared by the pooled standard deviations of both means (Cohen, 1988). To our knowledge, no benchmarks for bilingual naming exist, so the effect sizes are offered in order to evaluate relative magnitudes across comparisons.
Bonferroni post hoc comparisons revealed significant differences across the English- and Spanish-dominant groups for all naming conditions (English nouns, p = .002, d = 1.76; Spanish nouns, p = .000, d = 2.94; English verbs, p = .000, d = 2.04; Spanish verbs, p = .000, d = 2.65). Comparisons between the SD and the BB groups revealed significantly more accurate naming in the BB group for English verbs only (p = .000, d = 1.59), with no differences in naming between groups for English nouns (p = .121, d = 0.50), Spanish nouns (p = .171, d = 0.78), and Spanish verbs (p = .513, d = 0.46). Comparisons between the ED and BB groups revealed significantly more accurate naming in the BB group for naming Spanish nouns (p = .000, d = 1.75) and Spanish verbs (p = .000, d = 1.77), with no differences between groups for naming English nouns (p = .083, d = 1.31) or English verbs (p = .730, d = 0.39).
To determine in which naming conditions language-naming differences occurred, paired t tests were conducted. Each naming condition was subjected to a t test three separate times (i.e., noun naming and verb naming were compared for the SD, BB, and ED groups), so the p value for each comparison was set at .0125 (.05/3) in order to adjust for multiple comparisons. Within the ED group, English naming was more accurate than Spanish naming for nouns [t(18) = 8.805, p = .000] and verbs [t(18) = 8.920, p = .000], with d values of 8.88 and 5.03, respectively. A similar pattern was observed for the SD group, in which the dominant language (Spanish) was named more accurately for both nouns [t(15) = −2.910, p = .011] and verbs [t(15) = −3.749, p = .002], with d values of 1.0 and 12.10, respectively. For the BB group, we found no difference in naming nouns across languages [t(55) = 0.842, p = .403], with a d of 0.18; however, verb accuracy was higher in English than in Spanish [t(55) = 4.636, p = .001], with a d of 7.0. See Table 5 for averages and the statistical details.

Table 5 Within-language proficiency group comparisons for percent correct accuracy across languages for nouns and verbs

Nouns
Group         English Avg (StDev)   English Range   Spanish Avg (StDev)   Spanish Range   p Value*   Sig*
ED (N = 19)   94.87 (3.38)          85.60–99.10     64.43                 37.01–90.55     .000       sig
BB (N = 56)   89.41 (9.48)          51.20–99.20     87.51 (11.24)         46.46–99.21     .403       n.s.
SD (N = 16)   83.86 (12.51)         55.20–99.20     93.74 (4.76)          81.89–100       .011       sig

Verbs
Group         English Avg (StDev)   English Range   Spanish Avg (StDev)   Spanish Range   p Value*   Sig*
ED (N = 19)   93.30 (7.50)          64.63–98.78     58.48 (11.62)         38.95–83.60     .000       sig
BB (N = 56)   89.85 (10.03)         45.12–100.00    78.79 (11.31)         38.00–94.05     .000       sig
SD (N = 16)   68.44 (16.82)         30.49–90.24     83.00 (6.87)          70.30–92.15     .002       sig

* Significant value (p) was set at .0125, due to multiple (i.e., three) comparisons (.05/3). ED, English dominant; BB, balanced bilingual; SD, Spanish dominant

Discussion The purpose of the present study was to provide item and participant analyses, testing materials, and raw data to aid researchers in the evaluation of confrontation naming in
young Spanish–English bilinguals, for whom few valid measurement tools have been published. The person separation indices, which ranged from 2.43 to 3.89, corresponded to person ability strata of 4 (for both English conditions), 5 (Spanish verbs), and 6 (Spanish nouns), providing a preliminary indication of the retained items’ sensitivity. Considering that this was a group of young, normal Spanish–English bilingual speakers, it is notable that four to six strata of proficiency levels emerged. As we stated earlier, once testing is complete, the results can be used in a number of ways, since interval measures allow for direct comparisons of results. For example, the person–item maps, which are graphic representations of naming ability for each condition, might provide researchers with a way to visualize and compare ability levels within and across participants, languages, grammatical classes, and time points. Although the strata describe the sensitivity of items in order to determine ability levels, we also wanted to examine whether naming accuracies on the Rasch items would discriminate groups formed from self-ratings of proficiency. Such grouping also provides insight into the demographic and language variability of the group. The ED and BB groups were both exposed to Spanish earlier than English, which is typical of sequential learners in the U.S., whereas the SD group’s ages of exposure were more characteristic of learning English as a second language relatively later in life. Regarding the ED and BB groups, both used English more than Spanish, but the SD group used similar amounts of English and Spanish. The ED and SD groups both reported more education in their dominant languages, whereas the BB group did not show a difference in years of education across languages. However, it should be noted that it is difficult to interpret these data, since it is not clear at what level these years of education were achieved (e.g., primary school vs. university). 
More precise questioning of education histories will be needed in future studies. The finding of overall differences in naming abilities across the proficiency groups adds validity to the proficiency ratings provided by the participants and is consistent with previously reported correlation findings (Edmonds & Donovan, 2012). We observed clear differences in naming performance across the ED and SD groups, with the dominant groups exhibiting higher naming accuracies in the more-dominant language for each group. Also, the BB and ED groups named similarly to each other for English nouns and verbs, but they differed on Spanish naming. Furthermore, the SD and BB groups named similarly to each other for both Spanish naming conditions, but in English they differed only on verb naming. It is not evident why the SD group scored the same as the BB group on English nouns, whereas the ED group exhibited differences in Spanish noun naming when compared to the BB group. Inspection of the proficiency ratings for both dominant groups revealed similar averages and standard
deviations for the self-ratings of overall proficiency in both languages, as well as in reading (related to vocabulary knowledge). One possible reason is differences in language use, with the ED group reporting Spanish use only 16 % of the time, as compared to the SD group, which reported use of English 43 % of the time. Amounts of use have been associated with proficiency measures in children exposed to English and Spanish at home and/or school (e.g., Chesterfield, Chesterfield & Chávezy, 1982) and with adult heritage speakers (e.g., Muñoz & Marquardt, 2008), as well as in second language acquisition in more formal settings (e.g., Hulstijn & Laufer, 2001; Payne & Whitney, 2002). Additionally, differences in manners of acquisition might have influenced noun knowledge if some of the participants from the SD group formally took English lessons, in which explicit vocabulary instruction may have occurred. The finding that verb naming was more sensitive to differences across groups highlights the importance of verb/action testing in bilingual adults, not only for a comprehensive picture of lexical retrieval abilities, but also to increase the potential of discriminating differences across proficiency groups. It has also been shown that verb naming may be more predictive of efficiency in discourse than is noun naming (Edmonds, 2013). However, crosslinguistic differences in difficulty for the verbs in this word set might have been present, including for verbs that are reflexive in Spanish (but not in English), which have optional bound clitic pronouns [e.g., agachando(se); literally, “bending (himself)”], and a few instances of verbs with multiple words (e.g., haciendo malabares, “juggling”). 
One fundamental difference between English and Spanish verbs involves path verbs (Aske, 1989): Spanish verbs tend to combine path and motion (e.g., “enter”), and English verbs combine manner or cause with motion (e.g., “squeezed through” and “kicked,” respectively) (these examples are from Aske, 1989). However, the verb list from An O&A Battery contains very few path verbs, and of those, some have adopted the integration of path and motion in the English form (e.g., “climbing”/subiendo, “swinging”/meciendo) (Aske, 1989). The within-proficiency-group findings verified that the ED and SD groups performed with higher accuracy in their respective dominant languages. However, even though the BB group rated their proficiency similarly across languages, and their noun performance across languages was similar, verb-naming accuracy was higher in English than in Spanish, again highlighting the relative difficulty and sensitivity of verb naming. Thus, the proficiency self-ratings for the BB group indicated equal proficiency across languages, but Spanish verbs were relatively difficult for this group, yielding only about 77 % accuracy. These findings make the Rasch person–item map extremely valuable, because a person’s score reveals where his or her abilities lie for Spanish verbs, relative both to the other naming conditions and to other bilinguals. As for
understanding why Spanish verbs were less accurately named than English verbs, the explanation may again relate to the percentage of time using each language, with significantly less Spanish use than English use being reported for the BB group. Finally, it is important to keep in mind that the participants self-reported as being “functional in both languages in most situations,” and confrontation naming is a constrained task that does not allow for circumlocution, code-switching (in our scoring system), or some other manner of conveying an idea that could be used in natural conversation. In other words, the BB group might perceive their abilities to be similar across languages in daily conversation, and as such, their proficiency ratings did not reflect lower ratings in Spanish. However, the verb accuracy scores were sensitive to relative differences in verb knowledge. The Rasch items provided here are composed of valid difficulty hierarchies. Additionally, the alternative correct responses, H statistics, and percent accuracy information from the same population from which the Rasch items were determined provide vital information to consider when designing studies or interpreting findings. Typically, researchers attempt to match naming stimuli on psycholinguistic variables known to affect naming (e.g., frequency, age of acquisition, or length), which can be difficult at best in one language, especially with nouns and verbs. In fact, some researchers have said that it “may be impossible . . . to match action- and object-naming stimuli in fully orthogonalized experimental designs” (Szekely et al., 2005, p. 20). 
With two languages, these tasks are decidedly more complex, because variables such as age of acquisition and frequency ratings are not available for bilinguals, and monolingual ratings are incomplete and not valid (which is why they are not provided in the spreadsheet; the interested reader is referred to Edmonds & Donovan, 2012, for references to monolingual sources). Furthermore, trying to match on variables with inherent language differences (e.g., length) makes the process prohibitive. Thus, the valid item hierarchies resulting from the Rasch analyses and the corresponding item statistics, which come directly from a bilingual population, provide a unique set of data and testing materials for researchers.

In summary, the purpose of this article was to provide psychometrically sound and user-friendly data and analyses for researchers interested in evaluating object and action naming in young bilingual adults. The information provided may also assist researchers with stimulus development for other experimental paradigms, including priming, timed naming, imaging, and more. The materials provided make it easy to evaluate naming of the Rasch items for English and Spanish nouns and verbs. The results can easily be converted to interval scores calibrated on a 0–100 scale for interpretation with respect to ability levels, the number of which ranges from 4 to 6 across naming conditions. Furthermore, the scoring and interpretation of responses are made easier here, due to the provision of
alternative correct responses, H statistics, and item accuracy. Once scored, responses can be validly compared within and across languages and grammatical classes for individuals or groups. Additionally, researchers can search the online Excel worksheets that contain all of the 262 items from An O&A Battery in order to develop or interpret stimuli for specific study questions. The proficiency group results will also aid researchers in approximating or comparing their participants to those provided in the present study.

Finally, we recognize that a lack of testing materials remains for bilingual adults with acquired neurogenic communication disorders, such as aphasia and dementia. Nonetheless, we remind researchers and clinicians that the information provided here is relevant only for young, healthy Spanish–English bilingual adults. Thus, we caution against using these materials for patient populations. More research will be needed in order to understand naming in populations of Spanish–English bilingual older adults and bilinguals with communication disorders.

Author note We extend our gratitude to the following students in the Aphasia and Bilingualism Lab at the University of Florida, who assisted with data collection, management, and analysis: Vanessa Maltby, Maria Andreína Nieto-Quintero, Stacey Herlofsky, Gabriel Mayora, Bianka Vasquez, Melissa Iandoli, and Kristin Goldman.
References

Aske, J. (1989). Path predicates in English and Spanish: A closer look. In Proceedings of the Fifteenth Annual Meeting of the Berkeley Linguistics Society (pp. 1–14). Berkeley: Berkeley Linguistics Society.
Barry, C., Morrison, C. M., & Ellis, A. W. (1997). Naming the Snodgrass and Vanderwart pictures: Effects of age of acquisition, frequency, and name agreement. Quarterly Journal of Experimental Psychology, 50A, 560–585. doi:10.1080/783663595
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). Mahwah: Erlbaum.
Chesterfield, K. B., Chesterfield, F. A., & Chávez, R. (1982). Peer interaction, language proficiency, and language preference in bilingual preschool classrooms. Hispanic Journal of Behavioral Sciences, 4, 467–486.
Cobos, R. (2003). A dictionary of New Mexico and Southern Colorado Spanish. Santa Fe: Museum of New Mexico Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale: Erlbaum.
Cook, K. F., O’Malley, K. J., & Roddey, T. S. (2005). Dynamic assessment of health outcomes: Time to let the CAT out of the bag? Health Services Research, 40, 1694–1711.
Costa, A., Santesteban, M., & Caño, A. (2005). On the facilitatory effects of cognate words in bilingual speech production. Brain and Language, 94, 94–103.
Druks, J., & Masterson, J. (2000). An Object and Action Naming Battery. London: Psychology Press.
Edmonds, L. A. (2013). Correlates and crosslinguistic comparisons of informativeness and efficiency on Nicholas and Brookshire discourse stimuli in Spanish/English bilingual adults. Journal of Speech, Language, and Hearing Research. doi:10.1044/1092-4388(2012/12-0065)
Edmonds, L. A., & Donovan, N. J. (2012). Item level psychometrics and predictors of performance for Spanish/English bilingual speakers on An Object and Action Naming Battery. Journal of Speech, Language, and Hearing Research, 55, 359–381.
Fendrich, M., Smith, E. V. J., Pollack, L. M., & Mackesy-Amiti, M. E. (2009). Measuring sexual risk for HIV: A Rasch scaling approach. Archives of Sexual Behavior, 38, 922–935.
Fries, J. F., Bruce, B., & Cella, D. (2005). The promise of PROMIS: Using item response theory to improve assessment of patient-reported outcomes. Clinical and Experimental Rheumatology, 23, S53–S57.
Galván, R. A. (1995). The dictionary of Chicano Spanish (El diccionario del español chicano). Chicago: NTC Publishing Group.
Grosjean, F. (1998). Studying bilinguals: Methodological and conceptual issues. Bilingualism: Language and Cognition, 1, 131–149.
Hulstijn, J. H., & Laufer, B. (2001). Some empirical evidence for the involvement load hypothesis in vocabulary acquisition. Language Learning, 51, 539–558.
Kan, I. P., & Thompson-Schill, S. L. (2004). Effect of name agreement on prefrontal activity during overt and covert picture naming. Cognitive, Affective, & Behavioral Neuroscience, 4, 43–57. doi:10.3758/CABN.4.1.43
Kohnert, K. J., Hernandez, A. E., & Bates, E. (1998). Bilingual performance on the Boston Naming Test: Preliminary norms in Spanish and English. Brain and Language, 65, 422–440.
Kohnert, K., Windsor, J., & Miller, R. (2004). Crossing borders: Recognition of Spanish words by English-speaking children with and without language impairment. Applied Psycholinguistics, 25, 543–564.
Levelt, W. J. M., Schriefers, H., Vorberg, D., Meyer, A. S., Pechmann, T., & Havinga, J. (1991). The time course of lexical access in speech production: A study of picture naming. Psychological Review, 98, 122–142.
Marian, V., Blumenfeld, H. K., & Kaushanskaya, M. (2007). The Language Experience and Proficiency Questionnaire (LEAP-Q): Assessing language profiles in bilinguals and monolinguals. Journal of Speech, Language, and Hearing Research, 50, 940–967.
Masterson, J., & Druks, J. (1998). Description of a set of 164 nouns and 102 verbs matched for printed word frequency, familiarity and age-of-acquisition. Journal of Neurolinguistics, 11, 331–354.
Mätzig, S., Druks, J., Masterson, J., & Vigliocco, G. (2009). Noun and verb differences in picture naming: Past studies and new evidence. Cortex, 45, 738–758.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Muñoz, M. L., & Marquardt, T. P. (2003). Picture naming and identification in bilingual speakers of Spanish and English with and without aphasia. Aphasiology, 17, 1115–1132.
Muñoz, M. L., & Marquardt, T. P. (2008). The performance of neurologically normal bilingual speakers of Spanish and English on the short version of the Bilingual Aphasia Test. Aphasiology, 22, 3–19.
Muñoz, M. L., Marquardt, T. P., & Copeland, G. (2009). A comparison of the codeswitching patterns of aphasic and neurologically normal bilingual speakers of English and Spanish. Brain and Language, 66, 249–274.
Payne, J. S., & Whitney, P. J. (2002). Developing L2 oral proficiency through synchronous CMC: Output, working memory, and interlanguage development. CALICO Journal, 20, 7–32.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press. Original work published 1960.
Silva-Corvalán, C. (1994). Language contact and change: Spanish in Los Angeles. New York: Clarendon Press.
Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215. doi:10.1037/0278-7393.6.2.174
Szekely, A., D’Amico, S., Devescovi, A., Federmeier, K., Herron, D., Iyer, G., . . . Bates, E. (2005). Timed action and object naming. Cortex, 41, 7–25. doi:10.1016/S0010-9452(08)70174-6
Wright, B. D. (1997). A history of social science measurement. Educational Measurement: Issues and Practice, 16, 33–45.
Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every psychologist and educator should know (pp. 65–104). Mahwah: Erlbaum.
Wright, B. D., & Masters, G. N. (2002). Number of person or item strata. Rasch Measurement Transactions, 16, 888.
Wright, B. D., & Stone, M. H. (1979). Best test design: Rasch measurement. Chicago: MESA Press.