Studies in Second Language Acquisition, 2015, 37, 269–297. doi:10.1017/S0272263114000850
TIMED AND UNTIMED GRAMMATICALITY JUDGMENTS MEASURE DISTINCT TYPES OF KNOWLEDGE: Evidence from Eye-Movement Patterns
Aline Godfroid, Shawn Loewen, Sehoon Jung, Ji-Hyun Park, and Susan Gass
Michigan State University
Rod Ellis
University of Auckland
Grammaticality judgment tests (GJTs) have been used to elicit data reflecting second language (L2) speakers’ knowledge of L2 grammar. However, the exact constructs measured by GJTs, whether primarily implicit or explicit knowledge, are disputed and have been argued to differ depending on test-related variables (i.e., time pressure and item grammaticality). Using eye-tracking, this study replicates the GJT results in R. Ellis (2005). Twenty native and 40 nonnative English speakers judged
We are grateful to Tzufen Chang and David Reyes-Gastelum, the statistical consultants at the Center for Statistical Training and Consulting at Michigan State University, for helping us run the analyses in SAS. We are also appreciative of the many helpful comments from the audiences at the SLRF 2013 symposium on methodological innovations, the AAAL 2014 symposium on implicit and explicit learning, and EuroSLA 2014. Thanks are also due to the students and faculty at Michigan State University to whom we presented our results. Their perceptive comments helped make this article stronger. Despite these many comments, all errors that remain are our own. Correspondence concerning this article should be addressed to Dr. Aline Godfroid, Second Language Studies Program, Michigan State University, B253 Wells Hall, 619 Red Cedar Road, East Lansing, MI, 48824. E-mail:
[email protected] © Cambridge University Press 2015
sentences with and without time pressure. Analyses revealed that time pressure suppressed regressions (right-to-left eye movements) in nonnative speakers only. Conversely, both groups regressed more on untimed, grammatical items. These findings suggest that timed and untimed GJTs measure different constructs, which could correspond to implicit and explicit knowledge, respectively. In particular, they point to a difference in the levels of automatic and controlled processing involved in responding to the timed and untimed tests. Furthermore, untimed grammatical items may induce GJT-specific task effects.
In the present study, we investigate what knowledge types native speakers (NSs) and nonnative speakers (NNSs) of English draw on to make judgments regarding the acceptability of English sentences. Unlike previous studies, we employ a methodology new to this line of research, namely, eye-tracking. Specifically, we compare participants’ eye-movement patterns as they perform judgment tasks with different design features (i.e., time pressure and item grammaticality) hypothesized to elicit the use of different types of linguistic knowledge, in this case, explicit and implicit knowledge.

BACKGROUND

Implicit and Explicit Knowledge

The distinction between implicit and explicit knowledge is now widely accepted in both cognitive psychology (Cleeremans & Jiménez, 2002; Dienes & Perner, 1999; Reber, 1976) and SLA (N. C. Ellis, 1994; R. Ellis, 2005; Rebuschat, 2013; Williams, 2009); it is also supported by neurobiological evidence (Paradis, 2004). The fundamental difference between the two types of knowledge lies in whether individuals are aware of what they know. In the case of explicit knowledge, individuals are aware of and thus can consciously apply what they know; in the case of implicit knowledge, individuals are not aware of what they know and thus use their knowledge without being conscious of it. Linked to this fundamental difference is the issue of the accessibility of the two types of knowledge. Explicit knowledge consists of declarative representations that, in general, can be accessed only through controlled processing. Implicit knowledge involves procedural representations and thus can be accessed through automatic processing. The distinction between controlled and automatic processing is attentional in nature (Bialystok, 1994; Shiffrin & Schneider,
1977). We seek to exploit this distinction in the current study by looking for evidence of controlled and automatic processing in participants’ eye-movement records, the assumption being that automatic processing reflects access to implicit knowledge (or automatized explicit knowledge; DeKeyser, 2003)1 and controlled processing draws on analyzed explicit knowledge (see R. Ellis, 2005). With regard to SLA, second language (L2) learners’ deployment of their linguistic knowledge depends in part on the extent of their implicit and explicit representations of the L2 and in part on the type of language use involved. Second language learners who have experienced only traditional form-focused instruction may have to rely mainly on explicit knowledge because that may be the main type of knowledge they possess. In contrast, L2 learners who have learned a language naturalistically may draw primarily on implicit knowledge. Furthermore, tasks that require a focus on meaning and fluency (with the time pressure that comes from participating in a communicative event) favor the use of implicit knowledge, whereas tasks that entail a focus on linguistic forms and exert no time pressure allow for the use of explicit knowledge. However, tasks only predispose learners toward the use of one type of knowledge or the other; they cannot determine which type of knowledge is used, as, in part, what knowledge is used depends on the knowledge sources available to the individual learner.
Grammaticality Judgments as Measures of Implicit and Explicit Knowledge

Grammaticality judgment test (GJT) is the common term used for the elicitation methodology adopted in this study.2 In the current study, GJTs are used with reference to target language norms3 with an eye toward understanding processing differences associated with different GJT items (grammatical, ungrammatical, timed, and untimed) and how these differences relate to the linguistic knowledge measured (implicit or explicit). Time pressure may constrain what knowledge representations are available to the test taker because it suppresses reflection and makes it more difficult to access explicit knowledge (R. Ellis, 2005; Loewen, 2009). Grammatical and ungrammatical sentences differ in that only ungrammatical sentences have a clear critical area (i.e., the error) that may invite the use of explicit, declarative knowledge (see Bialystok, 1979; R. Ellis, 1991; Hedgcock, 1993). Because L2 learners and educated NSs are widely assumed to possess both implicit and explicit knowledge of the target language (see, e.g., the contributions in N. C. Ellis, 1994; R. Ellis et al., 2009; Hulstijn & Ellis, 2005;
Rebuschat & Williams, 2012; Sanz & Leow, 2011), it is unclear as to which knowledge sources—implicit, explicit, or both—individuals draw on when making judgments. The types of knowledge that participants use could also be influenced by certain task design features, including item grammaticality and the presence or absence of time pressure. Hence, the aim of the present article is to employ eye-tracking methodology as we seek to refine our understanding of the construct validity of GJTs. In two pioneering studies conducted to explore learners’ use of explicit (analyzed) versus implicit (unanalyzed) knowledge,4 Bialystok (1979, 1982) investigated participants’ performance on GJTs with different task characteristics and instructions. The 1979 study looked at the effects of time pressure: L2 French learners performed auditory GJTs under 3-s and 15-s time limits. Because having more time did not improve participants’ performance on the ungrammatical items (for which they could presumably do a mental search for a rule), Bialystok concluded that, overall, auditory GJTs encourage the use of implicit knowledge. However, manipulating task instructions, such as asking participants to identify, correct, or describe the error, could push test takers to rely on explicit or analyzed knowledge, as shown by strong correlations between scores on GJTs that included an additional error identification, error correction, or rule description component (Bialystok, 1979, 1982). Task modality also mattered, with written GJTs arguably allowing for controlled, nonautomatic knowledge retrieval, unlike their auditory counterparts (Bialystok, 1982). More recently, there have been multiple factor-analytic studies investigating the construct validity of GJTs and the roles of time pressure and item grammaticality in influencing the types of knowledge used by learners to make their judgments. 
In these studies, participants were given a battery of three to five tests, and the accuracy scores were submitted to factor analyses to determine the underlying relationships among the tests. Most studies have found that time pressure, rather than item grammaticality, was the influencing factor in L2 learners’ test performance (Bowles, 2011; R. Ellis, 2005; R. Ellis & Loewen, 2007; Han & Ellis, 1998; Zhang, 2014). This finding has been interpreted as showing that timed and untimed written GJTs provide relatively separate measures of implicit and explicit knowledge, respectively. In contrast, Gutiérrez (2013) identified grammaticality, rather than time pressure, as the crucial variable in his factor analyses, leading him to propose that grammatical items better measured implicit knowledge whereas ungrammatical items better measured explicit knowledge. Partial support for the influential role of grammaticality comes from R. Ellis (2005), who found that ungrammatical items on the untimed GJT correlated more strongly with measures of explicit knowledge than grammatical items did.
Evidence for the validity of GJTs and other tests in the battery as measures of implicit and explicit knowledge has consisted primarily of participants’ accuracy scores. Factor analyses can reveal which tests pattern together (exploratory factor analysis) or whether the proposed patterning can explain the data better than a competing model (confirmatory factor analysis). However, what the patterning means—that is, how to interpret the latent variables—falls to the researcher, and, at this time, alternative interpretations (e.g., test difficulty) cannot be ruled out. In an attempt to respond to this criticism and to illuminate the processes that take place during GJT performance, the current study utilizes eye-tracking methodology to test whether previous accuracy-based claims, which rest on a product measure, can be supported by processing data. To this end, we recorded participants’ eye movements and determined whether GJT items hypothesized to measure different types of knowledge are also processed differently.
Eye Movements and Grammaticality Judgments

Eye-movement registration, or eye-tracking, refers to the recording of an individual’s eye movements as he or she performs a particular task. Eye movements are a useful data source in cognitive research because the point of gaze serves as an index of overt attention (Wright & Ward, 2008) that can be used to make inferences about participants’ corresponding covert attentional processing, or mental focus. This assumption of an eye-mind link (Reichle, Pollatsek, & Rayner, 2012) underlies most psycholinguistic research using eye-tracking. Eye-movement data have been analyzed extensively in first-language (L1) and L2 sentence processing research (see Clifton, Staub, & Rayner, 2007, for a review of the L1 literature, and Dussias, 2010; Frenck-Mestre, 2005; Roberts & Siyanova-Chanturia, 2013, for L2-focused reviews). The bulk of this research has investigated issues in parsing, which is the online computation of syntactic structure, by presenting participants with temporarily or globally ambiguous sentences. Comparatively less eye-movement research has dealt with semantically or syntactically anomalous sentences (see Warren, 2011), and, to our knowledge, only a few eye-tracking studies have employed grammatical and ungrammatical sentences to investigate readers’ knowledge of the violated structures (e.g., N. C. Ellis, Hafeez, Martin, Chen, & Boland, 2014; Keating, 2009; Lim & Christianson, 2014; Sagarra & Ellis, 2013). Evidence for the availability of targetlike knowledge representations during reading comes from so-called sensitivity effects. Readers are said to show sensitivity when they fixate
significantly longer on or regress more frequently from the error than from its grammatical counterpart in a sentence.5 In a study on gender agreement, Keating (2009) found that native Spanish speakers and advanced L2 learners looked longer at (in total) and regressed more from incongruous adjectives than congruous ones. Keating interpreted these effects as showing that the advanced L2 speakers had acquired grammatical gender, even though they were not able to process gender as efficiently as NSs. Unlike natives, the advanced L2 speakers detected mismatches in agreement only in adjacent noun-adjective pairs, whereas NSs could also compute agreement across syntactic boundaries. For our purposes, it is important to note that Keating used processing data (eye-movement measures) to make inferences about participants’ underlying grammatical representations of Spanish gender agreement. Because Keating conceived of gender as an abstract feature belonging to Universal Grammar (p. 504), he may have assumed (in line with that theoretical framework) that knowledge of grammatical gender, as reflected in a real-time, meaning-oriented processing task, is by definition implicit. This assumption may also hold true for other researchers adopting grammatical sensitivity as a measure of grammar acquisition (for a review, see Lim & Christianson, 2014). Therefore, if the goal is to differentiate between implicit and explicit knowledge using eye-movement data, analyses other than word-based comparisons of fixation times may be needed (see Godfroid & Winke, in press). Consequently, in this study we examine reading patterns across an entire sentence, rather than sensitivity data for individual words, using what is known as a scanpath analysis (see the Method section for a description). A scanpath or eye-movement pattern is “a trace of a participant’s eye-movements in space and time” (Holmqvist et al., 2011, p. 253). 
Scanpaths are useful tools in reading research because they consider the entire sequence of eye movements in a trial rather than focusing on a localized subset of events. We used scanpath analysis (with appropriate modifications to fit our data) to address the following research questions:

1. Do written, timed and untimed GJT items elicit different scanpaths among NSs and NNSs?
2. Do written, grammatical and ungrammatical GJT items elicit different scanpaths among NSs and NNSs?
As previous studies have found effects of both time pressure (Bowles, 2011; R. Ellis, 2005; R. Ellis & Loewen, 2007; Han & Ellis, 1998; Zhang, 2014) and grammaticality (Gutiérrez, 2013) on GJT accuracy scores, we expect the answer to both questions to be affirmative for NNSs.
Because NS data have most commonly been excluded from previous factor analyses, we do not make specific predictions about the effects of time and grammaticality on NS processing. We explore whether scanpaths differ depending on GJT properties. In general, the presence or absence of regressions (i.e., right-to-left eye movements) is hypothesized as a key distinctive feature of participants’ reading behavior under different task conditions. Regressions likely indicate controlled processing (e.g., Clifton & Staub, 2011; Clifton et al., 2007; Reichle, Warren, & McConnell, 2009) and may therefore reflect attempts to retrieve explicit knowledge. Conversely, fluent, left-to-right reading is “the ‘default’ reading process” (Reichle et al., 2009, p. 7) for NSs showing good text comprehension and is, thus, suggestive of automatic processing (see the Analysis and Statistical Analysis sections for further details).

METHOD

Participants

The study took place at a large midwestern university in the United States. Forty NNSs and 20 NSs participated in the study. The NSs were undergraduate students enrolled in language teaching courses. Eighteen were female and two male. Nineteen of them had studied a foreign language before. The NNSs had various L1 backgrounds, including Chinese (n = 21), Arabic (n = 12), Thai (n = 4), Korean (n = 2), and Japanese (n = 1). Their average age was 22 (range: 18–47), and all of them had moved to the United States after puberty (range: 17–46). No participants had studied at a university for more than 2 years at the time of data collection. The NNSs were enrolled in two different programs at an English language center. Twenty students were enrolled in an intensive English program, consisting of four levels of fulltime, preuniversity English as a second language (ESL) courses. Eleven students were at Level 3, and nine were at Level 4. The other 20 students were from Level 5 English for academic purposes (EAP) classes.
These students were taking part-time ESL courses as well as regular undergraduate courses at the university. Preliminary analyses did not find significant or meaningful differences in GJT scores based on institutional proficiency levels; consequently, all the NNSs were treated as one group for the analysis.
Tasks

Two GJTs were administered: one GJT under time pressure (timed GJT) and then an identical GJT without time pressure (untimed GJT). Both tasks were integrated into the EyeLink 1000 eye-tracking system to collect
participants’ eye movements and judgment responses during reading. The GJTs consisted of 68 English sentences adapted from Loewen (2009) and R. Ellis (2005), which involved 17 different linguistic structures with two grammatical and two ungrammatical sentences per structure (see the Appendix). The structures were intended to cover a variety of grammatical structures at various levels of L2 proficiency, and the GJTs were administered to learners at similar levels of proficiency to those in R. Ellis (2005) and Loewen (2009). In addition to using the original GJT items, for purposes of counterbalancing, we created another version containing the respective grammatical and ungrammatical counterparts of the original 68 sentences. For example, the counterpart of the original item He plays soccer very well became *He plays very well soccer. Only the targeted linguistic structure was manipulated, and the rest of the sentence was unchanged. Thus, the overall length of the sentence as well as the size of the interest areas (see the Interest Areas section) was approximately identical for each pair of sentences. Participants read 34 grammatical and 34 ungrammatical sentences, with grammaticality counterbalanced among participants. However, each individual participant received the same version of the test in both the timed and untimed conditions. The amount of time allocated for each item in the timed GJT was adapted from R. Ellis’s (2005) and Loewen’s (2009) studies, in which 20% was added to the median response times provided by NS pilot data to account for L2 learners’ slower processing speed. The time limits calculated in this manner ranged from 1,800 ms to 6,240 ms across the test items, whereas the length of the sentences ranged from 5 to 12 words (M = 7.9 words).
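The time-limit rule described above (median NS pilot response time plus 20%) can be sketched in a few lines. This is only an illustration: the function name `timed_limit_ms` and the pilot values are hypothetical, and the actual pilot medians are not reported in the text. The invented medians were chosen so that the resulting limits coincide with the reported range endpoints.

```python
from statistics import median


def timed_limit_ms(ns_response_times_ms, allowance=0.20):
    """Return a per-item time limit in ms: median pilot response time plus an allowance."""
    return round(median(ns_response_times_ms) * (1 + allowance))


# Hypothetical NS pilot response times (ms) for two items; not data from the study.
pilot = {
    "item_short": [1400, 1500, 1650],
    "item_long": [5000, 5200, 5900],
}

limits = {item: timed_limit_ms(times) for item, times in pilot.items()}
print(limits)
```

Using the median rather than the mean keeps a single unusually slow pilot response from inflating the limit for an item.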
Procedures

The participants completed the timed GJT first, followed by the untimed GJT. At the end of the experiment, participants filled out a language learning background questionnaire.

Timed GJT. Participants read one test sentence at a time on the computer while their eye movements were recorded by a desk-mounted eye-tracker with head support. Each sentence was presented for the predetermined fixed time limit and then disappeared automatically. Participants were instructed to read and make a grammaticality judgment on the sentence by pressing the designated button (either grammatical or ungrammatical) on a controller within the given time limit. For the last 34 of the 68 items, participants were asked to provide source attributions and certainty ratings for their judgments (see Rebuschat, 2013); however, these data are not included in the current analysis.
Untimed GJT. After a 15-min break, participants completed the untimed GJT, which was exactly the same as the timed GJT, except that participants were allowed unlimited time to make their judgments. For the second half of the test items, participants once again provided source attributions and confidence ratings. The NSs spent an average total time of 2 min 24 s and 3 min 5 s reading the sentences in the timed test and untimed test, respectively. The average reading times for the NNSs showed greater differences between the two tests: 2 min 46 s in the timed test and 6 min 16 s in the untimed test. Note that the average reading times reported here reflect only the reading times on the test items. The total duration of one test (i.e., timed or untimed GJT) ranged between 17 and 25 min due to the additional components in the experiment procedure (i.e., source attributions and certainty ratings, eye-tracker setup, and practice session).
Data Analysis

Inspired by previous scanpath analyses in reading research (e.g., Holmqvist, Holsanova, Barthelson, & Lundqvist, 2003; Hyönä, Lorch, & Kaakinen, 2002; Von der Malsburg, Kliegl, & Vasishth, 2014; Von der Malsburg & Vasishth, 2011) and in reading and writing research (e.g., Johansson, Wengelin, Johansson, & Holmqvist, 2010; Wengelin et al., 2009), we investigated participants’ sequential eye-movement patterns during GJT performance. An advantage of scanpath analysis is that “long effects that are spread over several words can be captured in whole…and do not have to be chopped up into pieces” (Von der Malsburg et al., 2014, p. 3). Furthermore, because every trial is represented by its own scanpath, there is no risk of blending several distinct and sometimes competing fixation patterns into a single analysis (Von der Malsburg & Vasishth, 2011).

Von der Malsburg and Vasishth (2011) performed a scanpath analysis of existing data (regression sequences) from Meseguer, Carreiras, and Clifton’s (2002) study on the processing of temporarily ambiguous sentences. Von der Malsburg and Vasishth (2011) quantified the similarity among scanpaths of individual trials and identified three representative “scanpath signatures” (p. 109): (a) a regression to the beginning of the sentence followed by rereading, (b) a regression to the beginning of the sentence without rereading, and (c) a regression from the disambiguating region (the verb) to the ambiguous region (the adverbial clause). Of these three patterns, only a regression followed by rereading occurred significantly more often when the sentence was disambiguated in the nonpreferred manner (i.e., a garden-path sentence that required reanalysis). This led Von der Malsburg and Vasishth to conclude that regression plus rereading is most closely associated with the process of syntactic reanalysis.
In the present study, we computed scanpaths only for items that were correctly judged (i.e., grammatical sentences judged grammatical and ungrammatical sentences judged ungrammatical). This resulted in the elimination of about 32% of our dataset. In analyzing correct responses only, we followed the convention of reading time analyses in previous processing studies (e.g., L1: Garnsey, Pearlmutter, Myers, & Lotocky, 1997; Phillips, 2006; L2: Dussias & Piñar, 2010; Juffs, 2005; Omaki & Schulz, 2011; White & Juffs, 1998). Furthermore, reading patterns associated with correct responses are more likely to reflect targetlike knowledge retrieval, whether explicit or implicit, and are therefore directly relevant for our purposes. In contrast, for incorrect responses it is difficult to differentiate nontargetlike knowledge, which could be implicit or explicit, from indeterminate knowledge or no knowledge. In the case of indeterminate or no knowledge, the question of implicit or explicit knowledge is moot. Scanpaths can be represented in different ways, including as symbol strings, vectors, and attention maps (Holmqvist et al., 2011). In this study, we opted for a string-based representation, in which string symbols (e.g., A, B, C, D) represent visits by the eyes to functional interest areas, also termed semantic interest areas by Holmqvist and colleagues (see Figures 1, 2, and 3, for examples). We defined four functional interest areas based on the relative position of the error in the ungrammatical version of a sentence (see the examples in [1] and [2] and the Appendix). Using functional interest areas allowed us to equalize the differences among our target structures. Thus, although the sentences for different structures differed in length, all of them had a sentence-initial region, a primary interest area, and, when the structure allowed it, a spillover region. 
(Tag questions and sentences with dative alternation do not allow for spillover regions due to the sentence-final location of the structures; see the Appendix.) The number of words within each interest area varied among the structures; however, this was inconsequential for our scanpath analysis because we defined scanpaths on the basis of readers’ eye movements between, rather than within, different interest areas.

Interest Areas. All sentences began with a sentence-initial region, which preceded the primary interest area (PIA). The PIA was defined as the part of the sentence where readers could notice the ungrammaticality for the first time during forward reading, if they were sensitive to it. Grammatical sentences were assigned the same PIA as their ungrammatical counterparts, except that the word(s) now appeared in grammatical form (see the examples in [1] and [2] and the Appendix). A spillover region immediately following the PIA was designated to capture spatially delayed effects in participants’ registration of the grammatical error (see Reichle, 2011). We extended the PIA or the spillover region by one or
two words when the original region consisted of fewer than five letters, to minimize the amount of skipping and ensuing data loss. Any words following the spillover region were designated as the sentence-final region. Interest areas are indicated in the examples in (1) and (2), which represent a pair of test sentences for indirect questions; A is the sentence-initial region, B is the PIA, C is the spillover, and D indicates the sentence-final region.

(1) She wanted to know why he had studied German. (grammatical)
A B C D
(2) She wanted to know why had he studied German. (ungrammatical)
A B C D

Analysis. Fixation durations as well as the number and direction of eye movements were extracted for each interest area in each GJT item. We then conducted an inductive qualitative analysis of a subset of these data to identify relevant scanpath categories. As a first step, we looked at 20% of the participants’ trials for a lexical, a morphosyntactic, and a syntactic structure, namely, since/for, the indefinite article, and the unreal conditional, respectively. We identified recurring patterns in participants’ reading paths for the timed and untimed and grammatical and ungrammatical test items for these structures. This process resulted in an initial classification system with nine different scanpath types. Through an iterative process, involving examination of trials and discussion of their similarities and differences, the original classification system was reduced to four categories, one of which (skipping the PIA and spillover) was eventually discarded (see the final paragraph of the Finished Reading with Regression section for details).
In applying the final three categories to NSs’ and NNSs’ reading of all 17 target structures, we found that the scanpath categories captured the data well in that the different scanpaths were represented in all the applicable target structures.6 Each scanpath category was composed of different scanpath patterns with one or more common features. The defining features of each scanpath category are described in the following three sections.
No regression. This category consisted of sentences containing no regressions out of the PIA or subsequent parts of the sentence, meaning that the lettering in the symbol strings must occur in alphabetical order. Sentences could be read in a single pass, without any regressions, or sentences could contain regressions within the sentence-initial region. Most importantly, however, there was no regression out of the PIA or a region after that, indicating that participants did not go back to reanalyze previous parts of the sentence once they reached the critical area. Figure 1 shows a participant’s eye movements while reading a sentence with no regressions. Circles represent eye fixations, and arrows
represent eye movements between fixations, known as saccades. In Figure 1, all arrows point right, with the exception of a refixation within the PIA on nicer, and they represent forward, as opposed to regressive, saccades. The pattern can be represented symbolically as ABCD.
Figure 1. No regression. The sentence is *I think that he is more nicer and more intelligent than James.

Unfinished reading with regression. Sometimes participants read up until the PIA or spillover area and then regressed to an earlier part of the sentence. They then made their judgments without ever reading the end of the sentence, as shown in the absence of fixations on the sentence-final region (D) in Figure 2. This scanpath is represented as ABCBC. Trials were classified as unfinished reading with regression if (a) the sentence had a sentence-final region but the corresponding symbol string did not have a D and (b) at least one earlier letter in the alphabet followed a later letter in the alphabet (i.e., revisiting A after B or B after C).
Figure 2. Unfinished reading with regression. The sentence is *I think that he is more nicer and more intelligent than James.

Finished reading with regression. The third scanpath category is similar to the second in that participants regressed at least one time while reading. In contrast to the previous category, however, participants also read the sentence-final region.7 Any symbol strings containing D (or, more generally, the letter that denoted the last region in that sentence) as well as at least one nonalphabetically ordered sequence (e.g., C before B) were included in this category. For instance, the symbolic representation of the scanpath in Figure 3 is ABCBCDA.
Figure 3. Finished reading with regression. The sentence is *I think that he is more nicer and more intelligent than James.
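The three category definitions can be expressed as simple rules over the symbol strings. The following is a minimal sketch, not the authors' procedure: the function names and the `last_region` parameter are hypothetical, consecutive fixations within one interest area are assumed to count as a single visit, and the discarded skipping category and other excluded trials are not handled.

```python
def to_symbol_string(visited_areas):
    """Collapse consecutive fixations within one interest area into a single visit."""
    symbols = []
    for area in visited_areas:
        if not symbols or symbols[-1] != area:
            symbols.append(area)
    return "".join(symbols)


def has_regression(symbols):
    """True if a later-lettered area is ever directly followed by an earlier-lettered one."""
    return any(nxt < prev for prev, nxt in zip(symbols, symbols[1:]))


def classify(symbols, last_region="D"):
    """Assign a symbol string to one of the three scanpath categories.

    last_region is the letter of the final interest area in the sentence
    (hypothetical parameter; "D" when a sentence-final region exists).
    Returns None for strings that fit none of the three categories.
    """
    if not has_regression(symbols):
        return "no regression"
    if last_region in symbols:
        return "finished reading with regression"
    if last_region == "D":
        return "unfinished reading with regression"
    return None


# The symbol strings given for Figures 1, 2, and 3.
for s in ("ABCD", "ABCBC", "ABCBCDA"):
    print(s, "->", classify(s))
```

Because regressions within the sentence-initial region collapse into a single A, checking for alphabetical order is enough to detect a regression out of the PIA or a later region.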
Our original statistical analysis also included a fourth category of scanpaths in which participants skipped the PIA and spillover region entirely. However, because such trials were rare (3% of the data for both NSs and NNSs) and arguably anomalous, we removed this category from the analysis. Doing so did not influence our results. Finally, 43 trials (0.8%) did not fall into any of the preceding categories, mainly due to participants’ performance error during the tasks (e.g., a judgment was made before any reading occurred); these data were also excluded from the analysis.

Statistical Analysis. To answer the research questions, we tallied the number of items for each scanpath category and examined the frequencies according to the independent variables of L1-L2 status, time pressure, and grammaticality. We then analyzed the effects of these predictors statistically using a mixed-effects multinomial logistic regression. We entered two length measures as control variables into the regression model. To account for the fact that unfinished reading with regression could only occur if there was a sentence-final region, the length of the sentence-final region (range: 0–4 words) was entered as a covariate. We also added total-sentence length (range: 5–12 words) because this could influence the likelihood of a reader regressing (Kliegl, Grabner, Rolfs, & Engbert, 2004; Rayner, 1998, 2009; Reichle et al., 2009; Vitu & McConkie, 2000) and, therefore, could potentially affect participants’ reading patterns. Multinomial logistic regression is an extension of binary logistic regression to categorical dependent variables with more than two possible outcomes (e.g., Agresti, 2013; Hosmer & Lemeshow, 2013; Skrondal & Rabe-Hesketh, 2003). It produces a set of statistics for each possible outcome separately (see the scanpath categories in Tables 3–4), except for one outcome category, which is the baseline.
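To make the baseline-category idea concrete, the sketch below computes, from raw tallies, the odds of each scanpath category relative to a baseline category and the ratio of those odds between two conditions. All counts are invented for illustration, and the function names are hypothetical. Ratios computed from raw counts ignore the random intercepts and length covariates of the mixed-effects model, so this is not a reproduction of the reported analysis.

```python
def odds_vs_baseline(counts, baseline="no regression"):
    """Odds of each non-baseline category relative to the baseline category."""
    base = counts[baseline]
    return {cat: n / base for cat, n in counts.items() if cat != baseline}


def odds_ratios(counts_a, counts_b, baseline="no regression"):
    """Ratio of baseline-relative odds in condition A to those in condition B."""
    odds_a = odds_vs_baseline(counts_a, baseline)
    odds_b = odds_vs_baseline(counts_b, baseline)
    return {cat: odds_a[cat] / odds_b[cat] for cat in odds_a}


# Invented tallies of correctly judged trials per scanpath category.
timed = {
    "no regression": 60,
    "finished reading with regression": 30,
    "unfinished reading with regression": 10,
}
untimed = {
    "no regression": 40,
    "finished reading with regression": 80,
    "unfinished reading with regression": 20,
}

ratios = odds_ratios(timed, untimed)
for category, ratio in ratios.items():
    print(f"{category}: OR = {ratio:.2f}")
```

With these invented counts both ratios fall below 1, meaning that regression scanpaths are less common relative to the baseline in the first condition than in the second.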
In this study, we chose no regression as the baseline because no regression, or straight-pass reading, arguably represents the clearest evidence for fluent, automatic processing of the critical part of the sentence (see also Reichle et al., 2009). In comparison, regressions out of the critical part of the sentence indicate reanalysis and controlled processing (compare Clifton & Staub, 2011; Clifton et al., 2007; Frazier & Rayner, 1982; Meseguer et al., 2002; Mitchell, Shen, Green, & Hodgson, 2008; Reichle et al., 2009; Von der Malsburg & Vasishth, 2011). Comparing the amount of finished and unfinished reading with regression, on the one hand, to reading with no regression, on the other, was informative because it showed the relative amount of controlled and automatic processing in a given condition. Because logistic regression is performed on a mathematic transformation of the data, results are interpreted in terms of odds and odds ratios (ORs; see Liberman, 2005, for further details). Important for our purposes is the observation that both odds and ORs are ratios. Therefore, their
quotient, whether larger or smaller than 1, indicates whether the tested scanpath (in the numerator) occurred more often or whether it was no regression (in the denominator) that occurred more often. Thus, if we interpret finished and unfinished reading with regression as controlled processing and no regression as automatic processing, odds and ORs that are greater than 1 indicate more controlled processing, and odds and ORs that are less than 1 signal more automatic processing.

Odds ratios are effect size measures (Ferguson, 2009). They measure the effect of a particular independent variable on the baseline odds. For instance, for finished reading with regression, OR(timed vs. untimed) indicates how the relative amount of finished reading with regression versus no regression changes with time pressure. Odds ratios that are greater than 1 suggest that adding time pressure promotes controlled processing, whereas ORs that are less than 1 indicate that the timed GJT results in more automatic processing than the untimed test. For L1-L2 status, we examined how being a NNS influenced reading behavior compared to NS processing: OR(NNS vs. NS). For grammaticality, we tested how reading patterns changed for ungrammatical compared to grammatical items: OR(ungrammatical vs. grammatical). Fixed effects are significant at α = .05 if the 95% confidence interval for the odds ratio does not span 1. Ferguson (2009) suggested that an OR of 3.0 (or 0.33 when the baseline is the more frequent pattern) is a moderate effect, whereas an OR of 4.0 or 0.25 represents a strong effect. We also included random intercepts for subjects in the model to account for individual differences in participants’ reading behavior.8 All analyses were run in the Statistical Analysis Software (SAS; Version 9.4) using the glimmix procedure.

RESULTS

Accuracy Scores

The percentages of correctly judged items and corresponding raw frequencies are shown in Table 1.
Native speakers scored 87% or above for all test sections except for the timed ungrammatical items. The scores for the NNSs varied much more and ranged from a low of 26% on the timed ungrammatical items to a high of 82% for the untimed grammatical ones. Because we analyzed only correctly judged items, NSs had a somewhat similar number of items for each category, ranging from 518 for the timed ungrammatical items to 662 for the untimed grammatical items. For the NNSs, there was a considerably greater range; however, the proportion of accurate judgments roughly mirrored that of NSs. The timed ungrammatical section had the fewest items, at 355, whereas the untimed grammatical had the most, with 1,123.
Table 1. Raw frequencies and percentages of correctly judged items

                                     NSs (n = 20)               NNSs (n = 40)
Time condition   Grammaticality      N      M (%)   SD (%)      N       M (%)   SD (%)
Timed            Grammatical         615    90.3     9.1        899     65.9    15.3
Timed            Ungrammatical       518    76.1    12.0        355     26.0    11.6
Untimed          Grammatical         662    97.3     3.1        1,123   82.4    10.8
Untimed          Ungrammatical       598    87.8     7.0        753     55.2    16.6
Reading Patterns: Descriptives

Table 2 displays the raw frequencies for the reading patterns found in the data, whereas the pie charts in Figure 4 display percentages, which allow for comparison across L1-L2 status, grammaticality, and time pressure. The pie charts show that for all conditions (i.e., regardless of L1-L2 status, grammaticality, or time pressure), the most frequent reading pattern was finished reading with regression. Nevertheless, there are some differences in the percentages of items read with regressions compared to the other patterns across the data. For the NSs, the percentage of finished reading with regression is roughly similar across all four categories. The lowest percentage is 61% for untimed ungrammatical, whereas the highest is 70% for the untimed grammatical items. In addition, NSs read between 21% and 28% of the items with no regression, whereas unfinished reading with regression occurred in 8% to 13% of the items. For the NNSs, the distribution of patterns is more varied. Although finished reading with regression remains the predominant pattern in all contexts, there are considerable differences for this category between the timed and untimed conditions, with NNSs regressing on roughly half of the items in the timed condition, whereas in the untimed condition, the proportion rises to roughly three quarters of items. Furthermore, there is a difference in the number of finished reading with regression patterns between the ungrammatical and grammatical items, with the grammatical items having the highest percentage of regressed items. In the untimed condition, the finished reading with regression pattern for the grammatical items is 9% greater than for the ungrammatical ones, a difference that is the same for the NSs. 
As for the other patterns for the NNSs, there were more than twice as many straight-pass items (i.e., no regression) on the timed test as on the untimed test, whereas unfinished reading with regression occurred most frequently in the ungrammatical items yet was virtually absent from the grammatical conditions.

Table 2. Raw frequencies of items according to reading patterns

                                        NSs                            NNSs
                                Timed         Untimed          Timed         Untimed
Reading patterns              UnG   Gram    UnG   Gram       UnG   Gram    UnG   Gram
No regression                 109   165     153   133        122   342     123   180
Unfinished with regression     65    48      68    54         34    26      77    23
Finished with regression      329   379     353   443        174   483     537   904

Note. Gram = grammatical; UnG = ungrammatical.

Figure 4. Percentages of reading patterns for timed versus untimed and grammatical versus ungrammatical GJT items.
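The percentages discussed in this section can be recomputed from the raw counts in Table 2. The short Python sketch below does this per condition; the dictionary layout and the function name `pattern_percentages` are illustrative choices, and it is an assumption that the percentages in Figure 4 were derived in exactly this way.

```python
# Raw frequencies transcribed from Table 2.
# Keys: (group, time condition, grammaticality); values: counts in PATTERNS order.
PATTERNS = (
    "no regression",
    "unfinished with regression",
    "finished with regression",
)

table2 = {
    ("NS", "timed", "UnG"): (109, 65, 329),
    ("NS", "timed", "Gram"): (165, 48, 379),
    ("NS", "untimed", "UnG"): (153, 68, 353),
    ("NS", "untimed", "Gram"): (133, 54, 443),
    ("NNS", "timed", "UnG"): (122, 34, 174),
    ("NNS", "timed", "Gram"): (342, 26, 483),
    ("NNS", "untimed", "UnG"): (123, 77, 537),
    ("NNS", "untimed", "Gram"): (180, 23, 904),
}


def pattern_percentages(counts):
    """Convert one condition's raw counts into percentages of that condition's trials."""
    total = sum(counts)
    return {name: 100.0 * n / total for name, n in zip(PATTERNS, counts)}


percentages = {cond: pattern_percentages(counts) for cond, counts in table2.items()}

for cond, shares in percentages.items():
    print(cond, {name: round(value) for name, value in shares.items()})
```

Rounded to whole numbers, the finished-reading shares for the untimed items come out at 61% (ungrammatical) and 70% (grammatical) for NSs, which agrees with the figures cited in the text.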
Reading Patterns: Inferentials

Finished Reading with Regression. Table 3 presents the odds and ORs for the largest category, finished reading with regression. The reference category for the odds in this table and Table 4 is no regression. Although there are two statistically significant main effects, the more interesting and important effects are the two significant two-way interaction effects: Time × L1-L2 status and Time × Grammaticality. To understand the interplay of these variables, we broke the comparisons down into four post hoc tests and adjusted the significance threshold for each post hoc test to control for the overall type I error rate.
Table 3. Regression output (fixed effects) for finished reading with regression

Predictors                               Odds    ORs     95% confidence interval    p
Intercept                                3.70            [2.53, 5.41]               < .001
L1-L2 status                                     0.95    [0.62, 1.44]               .801
Time                                             0.51    [0.44, 0.59]               < .001
Grammaticality                                   0.95    [0.83, 1.10]               .476
Sentence length (centered)                       0.95    [0.92, 0.99]               .013
Sentence-final length (centered)                 1.00    [0.93, 1.07]               .951
L1-L2 status × Time                              0.35    [0.25, 0.51]               < .001
L1-L2 status × Grammaticality                    1.20    [0.82, 1.77]               .351
Time × Grammaticality                            1.98    [1.32, 2.98]               .001
L1-L2 status × Time × Grammaticality             0.68    [0.39, 1.20]               .187
Note. The intercept represents the odds of regression over no regression for a NS reading a grammatical, untimed sentence with an average total length (7.81 words) and with a sentence-final region of average length (0.61 words).
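The decision rules stated in the Statistical Analysis section (a fixed effect is significant at α = .05 when its 95% confidence interval does not span 1; Ferguson's benchmarks of 3.0 and 4.0, or their reciprocals, for moderate and strong effects) can be applied mechanically to the ORs in Table 3. The following Python sketch is one way to do so; the function names and the label "below moderate" are illustrative assumptions.

```python
# Fixed-effect odds ratios and 95% CIs transcribed from Table 3: (OR, lower, upper).
table3 = {
    "L1-L2 status": (0.95, 0.62, 1.44),
    "Time": (0.51, 0.44, 0.59),
    "Grammaticality": (0.95, 0.83, 1.10),
    "Sentence length (centered)": (0.95, 0.92, 0.99),
    "Sentence-final length (centered)": (1.00, 0.93, 1.07),
    "L1-L2 status x Time": (0.35, 0.25, 0.51),
    "L1-L2 status x Grammaticality": (1.20, 0.82, 1.77),
    "Time x Grammaticality": (1.98, 1.32, 2.98),
    "L1-L2 status x Time x Grammaticality": (0.68, 0.39, 1.20),
}


def ci_excludes_one(lower, upper):
    """True when the whole confidence interval lies on one side of 1."""
    return lower > 1.0 or upper < 1.0


def effect_label(odds_ratio):
    """Apply the cited benchmarks; ORs below 1 are compared via their reciprocal."""
    magnitude = odds_ratio if odds_ratio >= 1.0 else 1.0 / odds_ratio
    if magnitude >= 4.0:
        return "strong"
    if magnitude >= 3.0:
        return "moderate"
    return "below moderate"


significant = [name for name, (_, lo, hi) in table3.items() if ci_excludes_one(lo, hi)]

for name, (odds_ratio, lo, hi) in table3.items():
    flag = "significant" if ci_excludes_one(lo, hi) else "n.s."
    print(f"{name}: OR = {odds_ratio:.2f} [{lo:.2f}, {hi:.2f}] {flag}, {effect_label(odds_ratio)}")
```

The effects flagged by the interval rule are the same ones with p < .05 in the table, as expected when the interval and the test are derived from the same estimate.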
Because eight post hoc tests were performed for this scanpath category and the next, we set α at .01 for each post hoc test, so the familywise α was .08. First, we compared NS and NNS processing for the timed GJT and the untimed GJT separately. The post hoc analyses revealed that, on the untimed test, NNSs tended to read more items to the end and regress than did NSs (OR = 1.75, 95% CI [1.13, 2.72], p = .013), whereas, under time pressure, NNSs did less regressing than NSs (OR = 0.51, 95% CI [0.33, 0.80], p = .003). Native speakers read and regressed on about the same percentage of sentences in the timed GJT (68%) as the untimed GJT (69%), but NNSs’ finished reading with regression rose from 57% on the timed test to 81% on the untimed test.9 Conversely, NNSs’ straight-pass reading fell from 39% on the timed GJT to 15% on the untimed GJT, whereas NSs read in largely the same manner on the two GJTs (timed: 24%; untimed: 23%). For the second interaction, we compared timed and untimed reading of grammatical items and ungrammatical items separately. We found that item grammaticality influenced reading patterns in the untimed GJT (OR = 0.74, 95% CI [0.61, 0.90], p = .002) but not the timed GJT (OR = 1.22, 95% CI [0.99, 1.50], p = .063). Whereas the amount of finished reading with regression was essentially the same for grammatical and ungrammatical items in the timed GJT (i.e., 62% of trials), it occurred on 70% of ungrammatical items and 80% of grammatical items in the untimed GJT. Thus, more finished reading with regression occurred in the untimed than in the timed test, and the increase was disproportionately strong for the grammatical items.
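The arithmetic behind the adjusted threshold can be made explicit. The Python sketch below computes the familywise bound implied by eight tests at α = .01 and compares the four post hoc p values reported in this paragraph against the per-test threshold. The contrast labels are shorthand introduced here, and the second familywise figure assumes independent tests, which the text does not claim.

```python
n_tests = 8            # four post hoc tests for each of the two scanpath categories
alpha_per_test = 0.01  # per-test threshold stated in the text

# Upper bound on the familywise error rate (sum of the per-test alphas).
familywise_bound = n_tests * alpha_per_test

# Familywise error rate if the eight tests were independent (an assumption).
familywise_if_independent = 1 - (1 - alpha_per_test) ** n_tests

# p values reported for finished reading with regression.
post_hoc_p = {
    "NNS vs. NS, untimed GJT": 0.013,
    "NNS vs. NS, timed GJT": 0.003,
    "grammaticality, untimed GJT": 0.002,
    "grammaticality, timed GJT": 0.063,
}

meets_threshold = {name: p < alpha_per_test for name, p in post_hoc_p.items()}

print(f"familywise bound: {familywise_bound:.2f}")
print(f"familywise rate under independence: {familywise_if_independent:.4f}")
for name, passed in meets_threshold.items():
    print(f"{name}: p = {post_hoc_p[name]:.3f} -> {'significant' if passed else 'n.s.'} at .01")
```

On this rule the NNS versus NS contrast on the untimed test (p = .013) falls just short of the .01 threshold, which is consistent with the text describing it as a tendency.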
Table 4. Regression output (fixed effects) for unfinished reading with regression

Predictors                               Odds    ORs     95% confidence interval    p
Intercept                                0.36            [0.24, 0.55]               < .001
L1-L2 status                                     0.43    [0.29, 0.65]               < .001
Time                                             0.68    [0.53, 0.88]               .003
Grammaticality                                   2.79    [2.16, 3.59]               < .001
Sentence length (centered)                       0.93    [0.87, 1.00]               .041
Sentence-final length (centered)                 1.74    [1.57, 1.93]               < .001
L1-L2 status × Time                              0.79    [0.37, 1.70]               .552
L1-L2 status × Grammaticality                    4.61    [2.31, 9.18]               < .001
Time × Grammaticality                            1.82    [0.96, 3.45]               .065
L1-L2 status × Time × Grammaticality             0.42    [0.15, 1.14]               .089
Note. The intercept represents the odds of regression over no regression for a NS reading a grammatical, untimed sentence with an average total length (7.81 words) and with a sentence-final region of average length (0.61 words).
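Because both intercepts are odds against the same baseline (no regression), they can be converted into approximate probabilities for the reference condition, a NS reading a grammatical, untimed sentence of average length. The Python sketch below shows the conversion. It is a simplification that ignores the random intercepts, and the function name is an illustrative assumption.

```python
def probabilities_from_baseline_odds(odds, baseline="no regression"):
    """Turn odds of each category against a shared baseline into category probabilities."""
    denominator = 1.0 + sum(odds.values())
    probabilities = {name: value / denominator for name, value in odds.items()}
    probabilities[baseline] = 1.0 / denominator
    return probabilities


# Intercept odds transcribed from Table 3 (finished) and Table 4 (unfinished).
intercept_odds = {
    "finished reading with regression": 3.70,
    "unfinished reading with regression": 0.36,
}

ns_untimed_grammatical = probabilities_from_baseline_odds(intercept_odds)

for name, p in ns_untimed_grammatical.items():
    print(f"{name}: {p:.3f}")
```

With these intercepts the sketch yields roughly .73 for finished reading with regression, .07 for unfinished reading with regression, and .20 for no regression.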
Unfinished Reading with Regression. For unfinished reading with regression, the three-way interaction L1-L2 status × Time × Grammaticality was borderline significant (see Table 4). Therefore, we contrasted NNS with NS processing for each of the four time-by-grammaticality combinations separately. The only condition in which NNSs and NSs engaged in statistically comparable amounts of unfinished reading with regression is the untimed, ungrammatical items (OR = 1.28, 95% CI [0.76, 2.18], p = .354). Nonnative speakers regressed and did not finish reading 9% of these sentences, and NSs performed similarly, with an unfinished-reading rate of 11%. The estimates for timed, ungrammatical sentences are the same, with an unfinished-reading rate of 9% by NNSs and 11% by NSs. However, because of concurrent shifts in the baseline category (no regression), the resulting OR for unfinished reading over no regression for NNSs compared to NSs was significant (OR = 0.43, 95% CI [0.23, 0.78], p = .006). For grammatical items, NNSs engaged in comparatively less unfinished reading than NSs in both the untimed (OR = 0.28, 95% CI [0.15, 0.53], p < .001) and the timed (OR = 0.22, 95% CI [0.12, 0.41], p < .001) conditions. In these conditions, NNSs rarely did not finish reading a sentence after making a regression (2% of trials), whereas NSs sometimes did (7% of trials).
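The point about concurrent shifts in the baseline category can be illustrated numerically: identical unfinished-reading rates produce different ORs once the no-regression rates differ between groups. In the Python sketch below, the unfinished-reading rates (9% for NNSs, 11% for NSs) are those reported in this paragraph. The no-regression rates are an assumption for illustration: they are the timed and untimed straight-pass percentages reported for finished reading with regression, which are collapsed over grammaticality, so the resulting ORs only approximate the model estimates.

```python
def odds_ratio(p_target_a, p_base_a, p_target_b, p_base_b):
    """Odds ratio of group A relative to group B, with odds taken against the baseline category."""
    return (p_target_a / p_base_a) / (p_target_b / p_base_b)


# Unfinished-reading rates on ungrammatical items (the same on both tests).
unfinished_nns, unfinished_ns = 0.09, 0.11

# ASSUMPTION: no-regression rates borrowed from the timed/untimed contrasts,
# which are not specific to ungrammatical items.
or_untimed = odds_ratio(unfinished_nns, 0.15, unfinished_ns, 0.23)
or_timed = odds_ratio(unfinished_nns, 0.39, unfinished_ns, 0.24)

print(f"untimed: OR = {or_untimed:.2f}")
print(f"timed:   OR = {or_timed:.2f}")
```

The approximate values (about 1.25 untimed and 0.50 timed) point in the same direction as the reported ORs of 1.28 and 0.43: the larger share of straight-pass reading by NNSs under time pressure lowers their odds of unfinished reading even though the raw rate is unchanged.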
DISCUSSION

The aim of this study was to use processing data from eye-tracking to advance our understanding of the types of linguistic knowledge measured by written GJTs. We compared findings from previous factor-analytic research, which relied primarily on accuracy scores, with eye-movement data. Considering how different both measures are, the convergence in findings is rather remarkable. Similar to Bowles (2011), we found that NSs were essentially unaffected by the experimental manipulations (i.e., time pressure and grammaticality). In contrast, NNS processing changed rather drastically under time pressure (see Bowles, 2011; R. Ellis, 2005; R. Ellis & Loewen, 2007; Han & Ellis, 1998; Zhang, 2014), whereas the untimed, grammatical condition stood out as a locus of potential task effects (see R. Ellis, 2005).
Time Pressure Affected NNS Processing

Adding time pressure reduced the amount of NNSs’ reading with regression from heavy to moderate (see Figure 4). Post hoc analyses indicated that no regression was the main scanpath category to benefit from this shift: The percentage of trials with no regression increased from 15% in the untimed GJT to 39% in the timed GJT. Thus, when under time pressure, NNSs regressed less and read more sentences in one straight pass. An important question is what, if anything, we can infer from this shift in processing strategy with regard to participants’ underlying knowledge representations. Surely, it is to be expected that when people are given less time to read, they will indeed read less, and, consequently, eye-movement records will reflect this. However, what form such a reduction in reading will take is not determined by the task conditions. In other words, participants might have coped with the time pressure by skipping more words or giving up trying to read the entire GJT item; yet this is not what we found. In what follows, we present two arguments to explain why the observed shift towards straight-pass reading could reflect a qualitative change in the knowledge representations participants draw on.

The first argument is that independent factor-analytic research has found that timed and untimed GJTs measure different things, using different data and a different approach than the present study (Bowles, 2011; R. Ellis, 2005; R. Ellis & Loewen, 2007; Han & Ellis, 1998; Zhang, 2014; but see Gutiérrez, 2013). Researchers in these studies factor-analyzed accuracy scores for a number of tests in addition to GJTs. In the present study, we analyzed scanpaths, a processing measure, of correctly judged GJT items only. Despite these different approaches, the same general pattern of findings emerged. The second point in favor of a processing-knowledge connection is more theoretical and argumentative.
Forward reading is the default mode of fluent, skilled reading. It “occurs when the reader is proceeding through the text without any problems”—unlike regressions, which are
assumed to “reflect difficulty with higher level language processing” (Reichle et al., 2009, p. 4). For knowledge to be retrieved quickly and efficiently during fluent reading, it needs to be stored in a highly accessible form. Such knowledge representations could consist of production rules (e.g., DeKeyser, 2007); subsymbolic, connectionist networks (e.g., Rebuschat & Williams, 2012); or yet other types of representation. Although competing accounts exist (e.g., Bialystok, 1994; McLaughlin, Rossman, & McLeod, 1983), skill acquisition theory and contemporary psychological models posit that automatic, fluent processing operates on automatized explicit knowledge (DeKeyser, 2007) or implicit knowledge (R. Ellis, 2005). Consequently, the processing differences found in the current study could be accounted for by the timed GJT not allowing NNSs to engage in more controlled processing or to access explicit knowledge.
Uniform Processing by NSs across Conditions

In contrast to the moderately strong effects of time pressure in the NNS group, the distribution of NSs’ processing patterns remained stable across conditions. Native speakers made at least one regression in the majority of sentences (69% finished reading), with no regression being a clear second (24%). Native speakers also made some fast decisions, including on grammatical sentences (7% unfinished reading). The unfinished reading of grammatical sentences is problematic and could reflect some degree of probabilistic judgment, because strictly speaking one needs to read a whole sentence to be able to tell if it is grammatical. However, the number of such instances is relatively small. In contrast, NNSs engaged in unfinished reading only on ungrammatical sentences, in which the error does indeed license the possibility of an early decision.

The lack of time effects on NS processing likely reflects NS-NNS proficiency differences. Following R. Ellis (2005) and Loewen (2009), participants received 20% extra response time, based on NS pilot data. Whereas this approach worked in pressuring NNSs, our data suggest that NSs require shorter response times to pressure their reading. Future researchers should reduce the allotted time to explore how processing changes when NSs experience actual time pressure. As a case in point, in his fourth experiment, Hopp (2010) strongly increased the time pressure in a written GJT administered to native German speakers. The NSs’ performance broke down in a manner similar to near-native speakers, who had taken the same test under less demanding, yet also speeded, conditions. Hopp concluded that “non-native and native grammars and processing systems are fundamentally identical” (p. 901), but their processing efficiency differs.
Grammaticality and Task Effects

Consistent with Gutiérrez (2013), our analyses also suggested a role for item grammaticality. However, unlike Gutiérrez’s study, we observed an effect of grammaticality only in the untimed condition. Compared to untimed, ungrammatical items, both NSs and NNSs showed a further increase in finished reading with regression on the untimed, grammatical items (NSs: from 63% to 71%; NNSs: from 75% to 84%). An inspection of the other scanpath probabilities showed that this increase was at the expense of NNSs’ unfinished reading with regression, whereas in the NS group the losses were divided evenly between unfinished reading with regression and no regression.

One possible explanation for this phenomenon is that grammatical and ungrammatical sentences provide different types of evidence as to the grammaticality of a sentence. Whereas the presence of an ungrammatical element is sufficient evidence that a sentence is ungrammatical, the absence of an ungrammatical element is essentially a lack of evidence. The sentence could be grammatical, or a closer reading might reveal that it is ungrammatical after all. Therefore, learners may wish to reconfirm their initial impression of a sentence by rereading it. Importantly, the rereading of grammatical sentences when time is available is likely an artifact of the task, which is to judge the grammaticality of the sentence. Therefore, we believe that the reading patterns for untimed, grammatical sentences in a written GJT may show the clearest departures from natural reading.

The special status of untimed, grammatical items is consistent with Loewen’s (2009) finding that learners took an average of 1 s longer to judge grammatical sentences compared to ungrammatical ones. However, it could be at odds with R. Ellis’s (2005) finding that untimed, ungrammatical items provided a clearer measure of explicit knowledge than untimed, grammatical items.
If finished reading with regression is indicative of controlled processing, we would expect the untimed, grammatical items to be the more valid measure of explicit knowledge because that is where this reading pattern occurred most often. The two positions can be reconciled if we take the additional amount of finished reading with regression of untimed, grammatical sentences to reflect task effects and—consequently—untimed, grammatical sentences to be less valid as measures of linguistic knowledge of any type. More generally, the results of this study highlight the need for more comparative SLA research on the effects of reading goals or task instructions on reading (see, e.g., Lim & Christianson, 2014; Holmqvist et al., 2003; Kaakinen & Hyönä, 2005, 2008). Across all conditions, the amount of reading with regression was high or very high, begging the question of what the distribution of reading patterns would look like when participants read for comprehension. Leeser, Brandl, and Weissglass (2011)
investigated the effects of task instructions on reading times using self-paced reading.10 They compared intermediate L2 learners’ reading times for words (or phrases) in grammatical and ungrammatical Spanish sentences. In one task, the participants answered a comprehension question after each sentence, whereas in the other task they made untimed grammaticality judgments. Leeser and colleagues found that participants read more slowly in the GJT task (at least up to the point of the ungrammaticality) than in the comprehension task. Furthermore, readers were sensitive to violations in one of two structures in the GJT task but not in the comprehension task. Leeser and colleagues argued that the sensitivity effect in the GJT showed that “participants were activating a different kind of knowledge (i.e., conscious, metalinguistic knowledge) in addition to the implicit linguistic knowledge that is assumed to guide sentence comprehension” (p. 194; see also Lim & Christianson, 2014, for a similar claim regarding the effects of translation on reading). Because learners could not regress in Leeser et al.’s study, the comparison with the current study is not straightforward; however, both studies suggest that the question of task effects certainly warrants further research.
Limitations and Future Research

In this study, we performed a scanpath analysis to address a SLA-related question. We analyzed sentence reading patterns rather than eye-movement data for individual interest areas because we felt this analysis was more revealing about the nature of GJTs (see Von der Malsburg & Vasishth, 2011). Scanpaths have many other potential applications in SLA, including in research on computer-based assessment, computer-assisted language learning, task-based language teaching, dictionary use, subtitle processing, and reading. Indeed, the concept of a functional interest area, that is, a region on the screen with a particular function (Holmqvist et al., 2011), is flexible and allows tailoring to different research areas.

In addition, future researchers could refine the scanpath measure and analysis so as to optimize the approach. One limitation of the present study is that we defined scanpaths in spatial terms only and essentially disregarded the temporal information contained in the eye-movement data. Scanpath algorithms now exist that consider both location and duration of eye movements (Von der Malsburg & Vasishth, 2011), yet they are computationally more intensive. A second potential limitation is that we identified scanpath categories inductively, by visually inspecting trials and referring to previous literature on eye-movement behavior. Cluster analysis could provide a
statistical means to identify scanpath categories. Third, we made the decision to segment our sentences into interest areas (i.e., the sentence-initial region, primary interest area, spillover region, and sentence-final region). Although this approach took care of length differences among sentences, it would also have been possible to treat space continuously and to work with spatial coordinates instead of symbol strings.

Finally, there are several limitations due to the fact that this study replicated GJT items and procedures from previous studies that were not designed specifically for eye-tracking technology. For example, the inclusion of sentences with errors at the end of the sentence meant that there was no sentence-final region that readers could skip in these cases, ruling out the possibility of unfinished reading with regression. Additionally, as one reviewer correctly pointed out, the use of the same items in both the timed and untimed GJTs resulted in the former test containing new items, whereas the latter contained old items that had already been read, a fact that could have affected reading patterns. However, it might be expected that item familiarity on the untimed test would reduce the occurrence of regressions during reading because the learners already knew the answers, in this case from the timed version of the GJT. This is not what we found; in contrast, reading with regression stayed consistent or increased from the timed to untimed tests. Nevertheless, future research could benefit from including different sentences for the timed and untimed GJTs, as Bowles (2011) did.

CONCLUSION

This eye-tracking study reconciled previous claims about the psychometric properties of written GJTs by showing that both time pressure and grammaticality play a role in the construct validity of GJTs.
However, time pressure is the variable that appears to limit NNSs’ ability to engage in more controlled processing, and thereby to access declarative, explicit L2 knowledge. Grammaticality, on the other hand, appears to exhibit the influences of task effects as NSs and NNSs engage in additional processing, time permitting, in an attempt to identify ungrammatical elements in grammatical sentences. The eye-movement data we presented were methodologically innovative for SLA in that they consisted of scanpaths (sentence reading patterns) rather than eye fixations or regressions for individual regions in the sentence. As such, our data suggested that timed and untimed GJTs (particularly ungrammatical, untimed GJTs) measure different things. How this finding is interpreted with reference to the underlying test constructs will depend on one’s theoretical orientation. Specifically, timed GJTs, on the one hand, and ungrammatical items on untimed
GJTs, on the other, could be argued to measure implicit and explicit knowledge, respectively (R. Ellis, 2005); automatized explicit and analyzed explicit knowledge (DeKeyser, 2003); or automatic versus controlled access of either knowledge type (Bialystok, 1994). The need to distinguish among these competing accounts will likely push methodological innovation in SLA forward and give rise to more data triangulation and additional techniques such as brain imaging studies. Understanding the interface between processing and knowledge remains an important, and challenging, task.

Received 16 November 2014
NOTES

1. DeKeyser (2003) has suggested that automatized explicit knowledge, although not identical to implicit knowledge, is functionally equivalent. Though we accept a possible role of automatized explicit knowledge in L2 behavior, including in L2 GJT behavior, we know of no published study that has attempted to distinguish implicit knowledge and automatized explicit knowledge. Thus, in the study reported in this article, we have elected to refer only to the implicit-explicit distinction.
2. We recognize that elicited judgments are in actuality judgments of acceptability, from which we make indirect inferences about the grammar or interlanguage of an individual (cf. Schütze & Sprouse, 2014).
3. We do not wish to ignore issues relating to the comparative fallacy, namely, “studying the systematic character of one language by comparing it to another” (Bley-Vroman, 1983, p. 6). In orienting toward target language norms, we are making an assumption about what students are trying to learn.
4. Analysis is a dimension of language proficiency that refers to the structure of mental representations. Analysis corresponds to the phenomenological experience of implicit knowledge becoming more explicit (Bialystok, 1994). Bialystok (1982) introduced the terms analysis and control to make a clearer distinction between the type of knowledge (unanalyzed or analyzed) and the ability to use it (controlled or automatically) than is typically found in the implicit-versus-explicit paradigm.
5. Sensitivity to grammatical violations has been measured more commonly using self-paced reading (SPR; see Coughlin & Tremblay, 2013; Jiang, Novokshanova, Masuda, & Wang, 2011; Roberts & Liszka, 2013; Sagarra & Herschensohn, 2010; and VanPatten, Keating, & Leeser, 2012, for recent examples). Although SPR reaction times can reveal sensitivity effects, the segment-by-segment presentation in noncumulative SPR prevents readers from regressing to earlier parts of the sentence or skipping a segment.
6. One category (unfinished reading with regression) could only occur if the sentence contained a sentence-final region. As we explain in the Statistical Analysis section, this restriction was accounted for statistically. All other scanpath categories were attested for all the structures, and unfinished reading with regression occurred for all the structures that had a sentence-final region.
7. When a sentence did not have a sentence-final region but the reader regressed from the PIA or spillover, this was also considered finished reading with regression. The reason for this categorization is that the reader, in effect, read until the end of that sentence and then regressed.
8. To reduce the complexity of the model, and hence the probability that it would not converge, we opted not to include random slopes for subjects or random item effects.
9. The percentages we report henceforth have been derived from the multinomial regression output. This may explain small departures from the descriptive statistics.
10. Note that because of the noncumulative moving window procedure in this study (i.e., a word or phrase disappeared each time the next word or phrase appeared), participants could not regress during reading. Even so, reading times for particular interest areas revealed task effects.

REFERENCES

Agresti, A. (2013). Categorical data analysis (3rd ed.). New York, NY: Wiley.
Bialystok, E. (1979). Explicit and implicit judgments of L2 grammaticality. Language Learning, 29, 81–103.
Bialystok, E. (1982). On the relationship between knowing and using linguistic forms. Applied Linguistics, 3, 181–206.
Bialystok, E. (1994). Analysis and control in the development of second language proficiency. Studies in Second Language Acquisition, 16, 157–168.
Bley-Vroman, R. (1983). The comparative fallacy in interlanguage studies: The case of systematicity. Language Learning, 33, 1–17.
Bowles, M. A. (2011). Measuring implicit and explicit linguistic knowledge. Studies in Second Language Acquisition, 33, 247–271.
Cleeremans, A., & Jiménez, L. (2002). Implicit learning and consciousness: A graded, dynamic perspective. In R. M. French & A. Cleeremans (Eds.), Implicit learning and consciousness (pp. 1–40). Hove, UK: Psychology Press.
Clifton, C. Jr., & Staub, A. (2011). Syntactic influences on eye movements in reading. In S. P. Liversedge, I. D. Gilchrist, & S. Everling (Eds.), The Oxford handbook of eye movements (pp. 895–909). Oxford, UK: Oxford University Press.
Clifton, C. Jr., Staub, A., & Rayner, K. (2007). Eye movements in reading words and sentences. In R. P. G. van Gompel, M. H. Fischer, W. S. Murray, & R. L. Hill (Eds.), Eye movements: A window on mind and brain (pp. 341–372). Oxford, UK: Elsevier.
Coughlin, C. E., & Tremblay, A. T. (2013). Proficiency and working memory based explanations for nonnative speakers’ sensitivity to agreement in sentence processing. Applied Psycholinguistics, 34, 615–646.
DeKeyser, R. M. (2003). Implicit and explicit learning. In C. J. Doughty & M. H. Long (Eds.), The handbook of second language acquisition (pp. 313–348). Oxford, UK: Blackwell.
DeKeyser, R. M. (2007). Situating the concept of practice. In R. M. DeKeyser (Ed.), Practicing in a second language: Perspectives from applied linguistics and cognitive psychology (pp. 1–18). New York, NY: Cambridge University Press.
Dienes, Z., & Perner, J. (1999). A theory of implicit and explicit knowledge. Behavioral and Brain Sciences, 22, 735–808.
Dussias, P. E. (2010). Uses of eye-tracking data in second language sentence processing research. Annual Review of Applied Linguistics, 30, 149–166.
Dussias, P. E., & Piñar, P. (2010). Effects of reading span and plausibility in the reanalysis of wh-gaps by Chinese-English second language speakers. Second Language Research, 26, 443–472.
Ellis, N. C. (Ed.). (1994). Implicit and explicit learning of languages. San Diego, CA: Academic Press.
Ellis, N. C., Hafeez, K., Martin, K., Chen, L., & Boland, J. (2014). An eye-tracking study of learned attention in second language acquisition. Applied Psycholinguistics, 35, 547–579.
Ellis, R. (1991). Grammaticality judgments and second language acquisition. Studies in Second Language Acquisition, 13, 161–186.
Ellis, R. (2005). Measuring implicit and explicit knowledge of a second language: A psychometric study. Studies in Second Language Acquisition, 27, 141–172.
Ellis, R., & Loewen, S. (2007). Confirming the operational definitions of explicit and implicit knowledge in Ellis (2005): Responding to Isemonger. Studies in Second Language Acquisition, 29, 119–126.
Ellis, R., Loewen, S., Elder, C., Erlam, R., Philp, J., & Reinders, H. (Eds.). (2009). Implicit and explicit knowledge in second language learning, testing, and teaching. Bristol, UK: Multilingual Matters.
Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40, 532–538.
294
Aline Godfroid et al.
Frazier, L., & Rayner, K. (1982). Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14, 178–210. Frenck-Mestre, C. (2005). Eye-movement recording as a tool for studying syntactic processing in a second language: A review of methodologies and experimental findings. Second Language Research, 21, 175–198. Garnsey, S. M., Pearlmutter, N. J., Myers, E., & Lotocky, M. A. (1997). The contributions of verb bias and plausibility to the comprehension of temporarily ambiguous sentences. Journal of Memory and Language, 37, 58–93. Godfroid, A., & Winke, P. (in press). Investigating implicit and explicit processing using L2 learners’ eye-movement data. In P. Rebuschat (Ed.), Implicit and explicit learning of languages. Amsterdam, the Netherlands: Benjamins. Gutiérrez, X. (2013). The construct validity of grammaticality judgment tests as measures of implicit and explicit knowledge. Studies in Second Language Acquisition, 35, 423–449. Han, Y., & Ellis, R. (1998). Implicit knowledge, explicit knowledge and general language proficiency. Language Teaching Research, 2, 1–23. Hedgcock, J. (1993). Well-formed versus ill-formed strings in L2 metalingual tasks: Specifying features of grammaticality judgments. Second Language Research, 9, 1–21. Holmqvist, K., Holsanova, J., Barthelson, M., & Lundqvist, D. (2003). Reading or scanning? A study of newspaper and net paper reading. In J. Hyönä, R. Radach, & H. Deubel (Eds.), The mind’s eye: Cognitive and applied aspects of eye movement research (pp. 657–670). Amsterdam, the Netherlands: Elsevier Science. Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H., & Van de Weijer, J. (2011). Eye tracking: A comprehensive guide to methods and measures. Oxford, UK: Oxford University Press. Hopp, H. (2010). Ultimate attainment in L2 inflection: Performance similarities between non-native and native speakers. Lingua, 120, 901–931. Hosmer, D. 
W., & Lemeshow, S. (2013). Applied logistic regression (3rd ed.). Hoboken, NJ: Wiley. Hulstijn, J. H., & Ellis, R. (Eds.). (2005). Implicit and explicit second-language learning [Special issue]. Studies in Second Language Acquisition, 27(2). Hyönä, J., Lorch, R. F. Jr., & Kaakinen, J. K. (2002). Individual differences in reading to summarize expository text: Evidence from eye fixation patterns. Journal of Educational Psychology, 94, 44–55. Jiang, N., Novokshanova, E., Masuda, K., & Wang, X. (2011). Morphological congruency and the acquisition of L2 morphemes. Language Learning, 61, 940–967. Johansson, R., Wengelin, Å., Johansson, V., & Holmqvist, K. (2010). Looking at the keyboard or the monitor: Relationship with text production processes. Reading and Writing, 23, 835–851. Juffs, A. (2005). The influence of first language on the processing of wh-movement in English as a second language. Second Language Research, 21, 121–151. Kaakinen, J., & Hyönä, J. (2005). Perspective effects on expository text comprehension: Evidence from think-aloud protocols, eye-tracking, and recall. Discourse Processes, 40, 239–257. Kaakinen, J., & Hyönä, J. (2008). Perspective-driven text comprehension. Applied Cognitive Psychology, 22, 319–334. Keating, G. D. (2009). Sensitivity to violations of gender agreement in native and nonnative Spanish: An eye-movement investigation. Language Learning, 59, 503–535. Kliegl, R., Grabner, E., Rolfs, M., & Engbert, R. (2004). Length, frequency, and predictability effects of words on eye movements in reading. European Journal of Cognitive Psychology, 16(1/2), 262–284. Leeser, M. J., Brandl, A., & Weissglass, C. (2011). Task effects in second language sentence processing research. In P. Trofimovich & K. McDonough (Eds.), Applying priming methods to L2 learning, teaching and research: Insights from psycholinguistics (pp. 179–198). Amsterdam, the Netherlands: Benjamins. Liberman, A. M. (2005). How much more likely? 
The implications of odds ratios for probabilities. American Journal of Evaluation, 26, 253–266. Lim, J., & Christianson, K. (2014). Second language sensitivity to agreement errors: Evidence from eye movements during comprehension and translation. Applied Psycholinguistics. Advance online publication. doi:10.1017/S0142716414000290
Eye Movements and Grammaticality Judgments
295
Loewen, S. (2009). Grammaticality judgment tests and the measurement of implicit and explicit L2 knowledge. In R. Ellis, S. Loewen, C. Elder, R. Erlam, J. Philp, & H. Reinders (Eds.), Implicit and explicit knowledge in second language learning, testing and teaching (pp. 94–112). Bristol, UK: Multilingual Matters. McLaughlin, B., Rossman, T., & McLeod, B. (1983). Second language learning: An informationprocessing perspective. Language Learning, 33, 135–158. Meseguer, E., Carreiras, M., & Clifton, C. (2002). Overt reanalysis strategies and eye movements during the reading of mild garden path sentences. Memory & Cognition, 30, 551–561. Mitchell, D. C., Shen, X., Green, M. J., & Hodgson, T. L. (2008). Accounting for regressive eye-movements in models of sentence processing: A reappraisal of the Selective Reanalysis hypothesis. Journal of Memory and Language, 59, 266–293. Omaki, A., & Schulz, B. (2011). Filler-gap dependencies and island constraints in second-language sentence processing. Studies in Second Language Acquisition, 33, 563–588. Paradis, M. (2004). A neurolinguistic theory of bilingualism. Amsterdam, the Netherlands: Benjamins. Phillips, C. (2006). The real-time status of island phenomena. Language, 82, 795–823. Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372–422. Rayner, K. (2009). Eye-movements and attention in reading, scene perception, and visual search. The Quarterly Journal of Experimental Psychology, 62, 1457–1506. Reber, A. (1976). Implicit learning of synthetic languages: The role of instructional set. Journal of Experimental Psychology, Human Learning and Memory, 2, 88–94. Rebuschat, P. (2013). Measuring implicit and explicit knowledge in second language research. Language Learning, 63, 595–626. Rebuschat, P., & Williams, J. (Eds.). (2012). Statistical learning and language acquisition. Berlin, Germany: Walter de Gruyter. Reichle, E. D. (2011). 
Serial-attention models of reading. In S. P. Liversedge, I. D. Gilchrist, & S. Everling (Eds.), The Oxford handbook of eye movements (pp. 767–786). Oxford, UK: Oxford University Press. Reichle, E. D., Pollatsek, A., & Rayner, K. (2012). Using E-Z Reader to simulate eye movements in nonreading tasks: A unified framework for understanding the eye-mind link. Psychological Review, 119, 155–185. Reichle, E. D., Warren, T., & McConnell, K. (2009). Using E-Z Reader to model the effects of higher level language processing on eye movements during reading. Psychonomic Bulletin & Review, 16, 1–21. Roberts, L., & Liszka, S. A. (2013). Processing tense/aspect-agreement violations on-line in the second language: A self-paced reading study with French and German L2 learners of English. Second Language Research, 29, 413–439. Roberts, L., & Siyanova-Chanturia, A. (2013). Using eye-tracking to investigate topics in L2 acquisition and L2 processing. Studies in Second Language Acquisition, 35, 213–235. Sagarra, N., & Ellis, N. C. (2013). From seeing adverbs to seeing verbal morphology. Studies in Second Language Acquisition, 35, 261–290. Sagarra, N., & Herschensohn, J. (2010). The role of proficiency and working memory in gender and number agreement processing in L1 and L2 Spanish. Lingua, 120, 2022–2039. Sanz, C., & Leow, R. P. (Eds.). (2011). Implicit and explicit language learning: Conditions, processes, and knowledge in SLA and bilingualism. Washington, DC: Georgetown University Press. Schütze, C. T., & Sprouse, J. (2014). Judgment data. In R. J. Podesva & D. Sharma (Eds.), Research methods in linguistics (pp. 27–50). Cambridge, UK: Cambridge University Press. Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing. II. Perceptual learning, automatic attending and a general theory. Psychological Review, 84, 127–190. Skrondal, A., & Rabe-Hesketh, S. (2003). Multilevel logistic regression for polytomous data and rankings. 
Psychometrika, 68, 267–287. VanPatten, B., Keating, G. D., & Leeser, M. J. (2012). Missing verbal inflections as a representational problem: Evidence from self-paced reading. Linguistic Approaches to Bilingualism, 2, 109–140.
296
Aline Godfroid et al.
Vitu, F., & McConkie, G. W. (2000). Regressive saccades and word perception in adult reading. In A. Kennedy, R. Radach, D. Heller, & J. Pynte (Eds.), Reading as a perceptual process (pp. 301–326). Amsterdam, the Netherlands: Elsevier. Von der Malsburg, T., Kliegl, R., & Vasishth, S. (2014). Determinants of scanpath regularity in reading. Cognitive Science. Advance online publication. doi:10.1111/cogs.12208 Von der Malsburg, T., & Vasishth, S. (2011). What is the scanpath signature of syntactic reanalysis? Journal of Memory and Language, 65, 109–127. Warren, T. (2011). The influence of implausibility and anomaly on eye movements during reading. In S. P. Liversedge, I. D. Gilchrist, & S. Everling (Eds.), The Oxford handbook of eye movements (pp. 911–923). Oxford, UK: Oxford University Press. Wengelin, Å., Torrance, M., Holmqvist, K., Simpson, S., Galbraith, D., Johansson, V., & Johansson, R. (2009). Combined eyetracking and keystroke-logging methods for studying cognitive processes in text production. Behavior Research Methods, 41, 337–351. White, L., & Juffs, A. (1998). Constraints on wh-movement in two different contexts of nonnative language acquisition: Competence and processing. In S. Flynn, G. Martohardjono, & W. O’Neil (Eds.), The generative study of second language acquisition (pp. 111–129). Hillsdale, NJ: Erlbaum. Williams, J. N. (2009). Implicit learning in second language acquisition. In W. C. Ritchie & T. K. Bhatia (Eds.), The new handbook of second language acquisition (pp. 319–353). Bingley, UK: Emerald. Wright, R. D., & Ward, L. M. (2008). Orienting of attention. New York, NY: Oxford University Press. Zhang, R. (2014). Measuring university-level L2 learners’ implicit and explicit linguistic knowledge. Studies in Second Language Acquisition. Advance online publication. doi:10.1017/S0272263114000370
Eye Movements and Grammaticality Judgments
297
APPENDIX

EXAMPLE SENTENCES AND INTEREST AREAS

Structure             Example sentence

Modal verbs           I can / speak / French / very well.
                      * I can / to speak / French / very well.
Plural -s             Anne bought two / presents / for her / children.
                      * Anne bought two / present / for her / children.
Verb complements      Nate says he wants / to buy / a car next / week.
                      * Nate says he wants / buying / a car next / week.
Regular past tense    Joseph / missed an / interesting / party last weekend.
                      * Joseph / miss an / interesting / party last weekend
Yes/no question       Did Martin / visit / his father / yesterday?
                      * Did Martin / visited / his father / yesterday?
Since and for         He has been living in New Orleans for / three / years.
                      * He has been living in New Orleans since / three / years.
Indefinite article    I saw a very funny / movie / last night.
                      * I saw very funny / movie / last night.
Possessive ’s         Joseph flew to meet the President’s / advisor / in Washington.
                      * Joseph flew to meet the President / advisor / in Washington.
Third person -s       Anthony / lives with / his friend / Kevin.
                      * Anthony / live with / his friend / Kevin.
Adverb placement      She / always likes / watching / television.
                      * She / likes always / watching / television.
Comparatives          This building is / bigger / than your / house.
                      * This building is more / bigger / than your / house.
Dative alternation    The teacher explained / the problem to the students.
                      * The teacher explained / the students the problem.
Embedded questions    She wanted to know why / he had / studied / German.
                      * She wanted to know why / had he / studied / German.
Ergative verbs        Her English vocabulary / increased / a lot last / year.
                      * Her English vocabulary was / increased / a lot last / year.
Question tags         She is working very hard, / isn’t she?
                      * She is working very hard, / is it?
Relative clauses      The book that Mary wrote / won the / prize.
                      * The book that Mary wrote / it won the / prize.
Unreal conditions     If she had worked hard, she / might have / passed / the exam.
                      * If she had worked hard, she / may pass / the exam.
Note. The PIA is bolded and the spillover region is underlined. The sentence-initial region is the sentence segment that precedes the PIA. The sentence-final region is the segment, if any, that follows the spillover region. An asterisk (*) indicates an ungrammatical sentence.