John Benjamins Publishing Company
This is a contribution from The Mental Lexicon 5:3 © 2010. John Benjamins Publishing Company This electronic file may not be altered in any way. The author(s) of this article is/are permitted to use this PDF file to generate printed copies to be used by way of offprints, for their personal use only. Permission is granted by the publishers to post this file on a closed server which is accessible to members (students and staff) only of the author’s/s’ institute, it is not permitted to post this PDF on the open internet. For any other use of this material prior written permission should be obtained from the publishers or through the Copyright Clearance Center (for USA: www.copyright.com). Please contact
[email protected] or consult our website: www.benjamins.com Tables of Contents, abstracts and guidelines are available at www.benjamins.com
Using a maze task to track lexical and sentence processing
Kenneth Forster
University of Arizona
A word maze consists of a sequence of frames, each containing two alternatives. Subjects are required to select one of those alternatives according to some criterion defined by the experimenter. This simple technique can be used to investigate a wide range of issues. For example, if one alternative is a word and the other is a nonword, the subject may be required to press a key to indicate where the word is. This provides an interesting variant of the lexical decision task, since the difficulty of the lexical discrimination can be manipulated on a trial-by-trial basis by varying the properties of the nonword alternative. On the other hand, a version of a self-paced reading task is created if each successive frame contains a word that can continue a sentence, and the subject is required to identify which word that is. Once again, by manipulating the properties of the incorrect alternative, one may be able to control the mode of processing adopted by the subject. Although this is a highly artificial form of reading, it does allow one to study sentence processing under more tightly controlled conditions.
Keywords: lexical access, sentence processing, maze task
The Mental Lexicon 5:3 (2010), 347–357. doi 10.1075/ml.5.3.05for issn 1871–1340 / e-issn 1871–1375 © John Benjamins Publishing Company
Investigating the mechanisms involved in lexical access typically involves measuring the time required for various properties of words to reach consciousness. These words are usually presented in isolation, without any surrounding context, and it is the artificiality of this situation that often attracts criticism. The usual defense of this approach is that the only way to assess the effect of context is to compare measures of access time when words are presented in context with measures obtained in the absence of context. However, it is not so easy to measure lexical access time during sentence processing. The most obvious method is to present an incomplete sentence for the subject to read, and after a suitable pause, to present a potential completion of the sentence. The subject’s task is to decide whether the final item was a word or not (e.g., Fischler & Bloom, 1979; Forster, 1981). The problem is that two different tasks are involved. One task is sentence
comprehension, and the other is lexical decision. When the final item is presented, it may be that it is initially integrated into the prior context (a purely reflex action), whether this is relevant to the task or not. This process may have a higher priority than deciding whether the input is a word or not, and hence delay the lexical decision response, especially if the final item is a word that is semantically implausible given the prior context. As reported by some subjects, there is a temptation to respond “No” to such words because they do not “make sense”, or they are not what was expected. When measured relative to a neutral baseline, these words take longer to respond to. This conflict at the decision level can be avoided if we just measure how long the reader takes to read each word in the sentence, as in an eye-tracking experiment. Presumably, lexical access must be completed before the eyes move onto the next word, and therefore words that take longer to access should require longer fixations. However, things are not so straightforward. Some words (short words) may not be fixated at all. Another problem is that the time taken to process a word can be longer if the frequency of the preceding word is low, as if the gaze was shifted onto the next word before the current word had been completely accessed. This is referred to as a “spill-over” effect, and makes perfect sense since access could be completed during the time spent moving the eyes to the next fixation point. This means that the gaze duration is likely to underestimate the time required for access. In this type of experiment, the eyes are free to roam across the sentence in a very complex manner. We do not gaze at the first word, then the next, and continue in an orderly fashion until we get to the last word. 
Instead, the eyes move in a series of jumps (more technically, saccades) from one location to the next, sometimes skipping short words altogether, sometimes tracking back to earlier words in the sentence. The problem with this relatively unconstrained interaction with the printed words is that there are multiple indices of how long it took to retrieve and assimilate the information from the printed signal. One measure is the length of the initial fixation on a particular word, but the reader may well return to this word later, either directly, or by returning to a still earlier word. And very often there is more than one fixation on a given word, sometimes separated by fixations on other words. The question then is whether each individual fixation should be given the same weight. For researchers interested in how people inspect the text when reading normally, this complexity is the very stuff that they want to observe and understand. But for researchers who are more interested in the uptake of information from the printed words, it would be more convenient if the sentence was viewed in a more orderly manner. One way to achieve this is to present the sentence one word at a time, with no opportunity to look ahead, or to go back. This is the procedure used with event-related brain potentials (ERPs). In this case, we find that some
properties of the sentence are associated with different wave forms. For example, semantic violations affect the N400 component, but grammatical violations have little or no effect on this measure, but instead affect a quite different component of the wave form, the P600. While these correlations between sentence properties and brain activation are of great interest, they do not give us a direct indication of the time required to retrieve and assimilate the information contained within the words. For researchers who believe that variations in reaction times are the most revealing indicators of cognitive processes, the moving window self-paced reading technique is a more satisfactory technique (Just, Carpenter, & Woolley, 1982). In this case, the individual words of the sentence are exposed left to right, one at a time, but only as the reader presses a key to indicate that they have processed that word. Since only one word is visible at any time, the reader is prevented from returning to earlier words, and hence must integrate each word into the developing structure. Under these conditions, the time spent reading each word can be measured reliably. This technique has also been used widely, and to good effect. But for some researchers, there are unsettling aspects to the task. One problem with self-paced reading is that we need to assume that readers can synchronize their key presses with the integration process. This assumes that readers are aware of the precise time at which each word was integrated, and that the motor response can actually keep up with the mental events. What some subjects report is that they press the keys at a more or less constant rate, and that if they encounter difficulties, they slow down. This means that if a particular word causes processing difficulties, it may not be until one or two words downstream that the rate of key pressing slows down. 
A more extreme approach is that of so-called “tappers”, who proceed through the sentence at a constant rate, with very low variance in RTs. The problem here is that the mapping from mental events to key presses has a high error component.
Another issue that arises is how we can tell whether the reader is actually processing the sentence accurately. This problem is addressed in most cases by including comprehension questions after some of the sentences. If participants get most of the questions right, we assume that they were reading the sentences carefully enough. But how much is “enough”? This obviously depends on the difficulty of the comprehension questions. It is therefore possible that the difficulty of these questions is an uncontrolled variable across experiments. Some investigators may use difficult questions that require readers to make inferences, while others may use very simple questions. This variation in difficulty may affect how carefully the subject reads the sentence.
This notion of how carefully the sentence is read needs to be fleshed out. At one extreme, we have a style of reading best described as “skimming”. This style of
reading is very fast, and may be good enough to detect whether the input sequence is well-formed or not, but may not be good enough to answer questions about the content. At the other extreme, both syntactic and semantic processing is very precise, the presuppositions and implications of the sentence are considered, and the fully interpreted surface structure is entered into working memory, ready to answer any comprehension question. Presumably, most researchers would want to avoid both of these extremes. For example, the typical instructions to participants in eye-tracking experiments are to read the sentences “naturally”, but this is not defined. Obviously, without any clear indication, the comprehension questions will then dictate how carefully the sentences are read. In what follows, we discuss a different task that eliminates many of these problems.
The Maze task
The Maze task is an adaptation of the self-paced reading paradigm in which two items are presented on each frame, and the subject’s task is to rapidly choose one of those items according to some criterion defined by the experimenter. For sentence processing experiments, this criterion would be which word would be a grammatical continuation of the sentence. For example, if the subject is given the first word of the sentence (“The”), and then is presented with the pair “weather on”, then the subject would choose “weather” by pressing a key indicating that the grammatical continuation is the left-hand member of the pair. If the next pair was “to in”, then the subject would choose “in”, etc. This procedure is repeated until the end of the sentence is reached. If the subject makes an error, the trial is aborted, and the program switches to the next sentence. A demonstration of the technique is available at the following website: http://www.u.arizona.edu/~kforster/MAZE.
Obviously, this is a highly unnatural task, and if our aim is to understand how people “normally” read, then this task is not very useful. But if our aim is to understand how sentences are processed (whether printed or spoken), then the task will be useful to the extent that it taps into the skills that are normally used in sentence comprehension and production. The degree to which it does so is an empirical matter.
What are the critical features of this task? First, the subject is exposed to the sentence in a highly constrained manner, in which no look-ahead is possible, nor is it possible to look back to earlier sections of the sentence. So each word must be fully integrated into the prior context before moving onto the next word. One could imagine that the time taken to carry out this integration might be equivalent to the total time spent on a word in an eye-tracking experiment, including all
fixations. The fact that only one word is visible at any one time might be seen as a disadvantage, but all it means is that the estimates of processing time obtained in this task do not allow for look-ahead, and that the subject must remember the prior context. Any results obtained with this task must be interpreted in light of these limitations.
The second feature of the task is that post-sentence comprehension questions are apparently unnecessary, since the task appears to be quite sensitive to structural variables, such as the difference between subject-extracted relative clauses and object-extracted relative clauses in English (Forster, Guerrera, & Elliot, 2009), and the processing difficulty created by number agreement (Nicol, Forster, & Veres, 1997). This implies that subjects are processing the sentences accurately, despite the absence of comprehension questions. The reason for this is simply that it would be impossible to make the right sequence of choices without accurate and careful processing of the sentence.
A third feature is that the task provides meaningful information about the time taken to assimilate each individual word, something that is not readily available with other techniques. For example, in the study comparing subject and object relatives (Forster et al., 2009), a surprising result was that all of the difference between conditions was located on the definite article, not on the verb, as predicted by various theories. Thus, if we compare “The tiger that the mouse frightened…” with “The tiger that frightened the mouse…”, and we compare RTs to “the”, “mouse”, and “frightened”, then there is no difference on “mouse” or “frightened”, but there is a substantial difference on “the”. In an eye-tracking experiment, this effect would be difficult to detect, since often short words such as “the” are not fixated at all. This is exactly what was found by Staub (2010) in a similar experiment using eye tracking. 
However, although there was no effect of clause type on gaze duration for the determiner, a regressive saccade at the determiner was much more likely in object relatives, indicating processing difficulty at this point. This correspondence between tasks is encouraging, and the result makes sense, since the determiner is the first place where it becomes clear that a subject relative is not upcoming, suggesting that much of the advantage of a subject relative is due to its greater likelihood, not its structure. Finally, another advantage is that the response criterion is much more clearly defined. Instead of being asked to press the key when each word has been “understood”, the subject is asked which word is a better continuation of the sentence. This is something that is easily determined, especially if care is taken to make sure that the incorrect alternative is completely ungrammatical. Further, the clarity of the task requirements solves the problem of specifying how carefully the sentence should be processed. It must be processed carefully enough to decide which word continues the sentence. Thus we could have more confidence when comparing results across different laboratories. All that would be necessary would be to examine the nature of the incorrect alternatives used in each experiment.
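The procedure described in this section can be summarized as a loop over frames. The following Python sketch is an illustration only and is not the software used in the experiments cited here. The names `Frame`, `run_trial`, and the `choose` callback are hypothetical, and the key press is replaced by a function call so that the logic can be run without a display.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Frame:
    left: str
    right: str
    correct: str  # "left" or "right"


def run_trial(first_word: str,
              frames: List[Frame],
              choose: Callable[[str, Frame], str]
              ) -> Tuple[List[float], Optional[int]]:
    """Run one maze sentence.

    Returns the RT (ms) for each correctly answered frame, plus the index
    of the frame on which an error occurred (None if the sentence was
    completed without error).
    """
    context = first_word
    rts: List[float] = []
    for i, frame in enumerate(frames):
        start = time.perf_counter()
        response = choose(context, frame)  # stands in for a key press
        rt_ms = (time.perf_counter() - start) * 1000.0
        if response != frame.correct:
            return rts, i                  # error: the trial is aborted
        rts.append(rt_ms)
        chosen = frame.left if frame.correct == "left" else frame.right
        context += " " + chosen            # sentence read so far
    return rts, None


# The example from the text: "The", then "weather on", then "to in".
frames = [Frame("weather", "on", "left"), Frame("to", "in", "right")]


def perfect_subject(context: str, frame: Frame) -> str:
    """A simulated subject who always chooses the correct side."""
    return frame.correct


rts, error_at = run_trial("The", frames, perfect_subject)
```

Because an error ends the trial, a word-by-word RT profile is available only for the frames that precede the first error.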
The G-maze and the L-maze
Actually, it is possible to systematically vary how carefully the sentence must be processed. The type of maze we have considered so far is called a G-maze (G standing for grammaticality). In another version of the task, the incorrect alternative is not a word, but is a legal nonword (e.g., thamon). This is referred to as an L-maze (L standing for lexicality). In this case, there is no requirement at all to consider syntax: the correct choice can be made on lexical grounds. One might think that this would completely eliminate any syntactic effects, but this is not the case. In an L-maze, a sequence of words that makes a sentence is processed faster than a scrambled sequence, and the expected difference between subject relatives and object relatives is still present (Forster et al., 2009). However, the effects are smaller and less reliable, which is perhaps not surprising.
The L-maze is particularly useful in two ways. First, it allows us to measure performance on semi-sentences of doubtful acceptability. For example, suppose one wanted to know which causes more problems for the parser: failure of number agreement or failure of gender agreement. Using a G-maze would be inappropriate, since we would have one frame on which both alternatives were ungrammatical. For example, if the target sentences were “All of the men in the room tried to hide himself” and “The only woman in the room tried to hide himself”, then we would want to compare responses to “himself” in each of these ungrammatical contexts with performance on the same words in a grammatical context. However, the problem is how to specify the incorrect alternative on the final frame, since both alternatives would be ungrammatical. The response time would be heavily influenced by decision uncertainty, and the reader would have to guess which alternative was supposed to be correct. However, if an L-maze is used, this problem disappears. 
The incorrect alternative would be a legal nonword, so that the final frame could be something like “himself thamon”. An L-maze is also useful when working with a language in which word order is much more flexible than in English. In some languages, it can occasionally be very difficult to find an alternative that is completely impossible given the prior context. This is especially true in Japanese, where scrambling makes it virtually impossible to create a suitable set of alternatives. Recently, Jeff and Naoko Witzel reported the results of a Japanese sentence processing experiment using an L-maze, with robust effects (J. Witzel & N. Witzel, 2009). Xiaomei Qiao has also had similar success in Mandarin (Qiao, Shen, & Forster, submitted). This latter example raised a very interesting issue. Qiao used a G-maze to examine Chinese subject relatives and object relatives, and found that the relative clause region was processed faster in an object relative, but once the relativizer was reached (de, which occurs after the clause), there was a substantial reversal. This was interpreted as indicating
that subjects were not expecting to encounter a relativizer in an object relative, but were expecting one in a subject relative. However, when the same sentences were tested in an L-maze (the nonwords were novel combinations of characters), any difference on the relativizer disappeared altogether. If the interpretation of the G-maze data is correct, then this means that in the L-maze experiment, subjects were not trying to anticipate how the structure would unfold, whereas they were in the G-maze experiment. Thus, one might infer that a “wait-and-see” strategy is more likely to be used in an L-maze. What is attractive about this result is that we can apparently modify the way in which a sentence is processed by manipulating the properties of the incorrect alternative. This reinforces what was said earlier, namely that the conditions of the maze task offer a greater degree of experimental control over the strategies adopted by readers.
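One way to picture the L-maze construction is as a pairing of each word of the target sentence with a nonword foil. The sketch below is a hypothetical illustration, not the procedure used in the studies cited. The function name `build_frames` is invented, the random assignment of the correct word to the left or right side is an assumption, and all foils other than “thamon” (taken from the text) are made up for the example.

```python
import random
from typing import Dict, List


def build_frames(words: List[str],
                 foils: List[str],
                 rng: random.Random) -> List[Dict[str, str]]:
    """Pair each word after the first with a foil and place the
    correct word on a randomly chosen side of the frame."""
    if len(foils) != len(words) - 1:
        raise ValueError("need one foil for every word after the first")
    frames = []
    for word, foil in zip(words[1:], foils):
        if rng.random() < 0.5:
            frames.append({"left": word, "right": foil, "correct": "left"})
        else:
            frames.append({"left": foil, "right": word, "correct": "right"})
    return frames


# Ungrammatical target sentence from the text (gender agreement failure).
sentence = "The only woman in the room tried to hide himself".split()

# Nonword foils: only "thamon" comes from the text, the rest are invented.
foils = ["arly", "woplen", "ib", "thu", "roog", "trief", "te", "hibe", "thamon"]

frames = build_frames(sentence, foils, random.Random(0))
```

Since every foil is a nonword, the final frame has a well-defined correct answer even though “himself” is ungrammatical in this context, which is the property the text relies on.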
Disadvantages of the maze task
The most obvious disadvantage of a maze task is that on each frame, two items have to be read instead of one. This may introduce so much noise that subtle effects might go unnoticed. But actually, it appears that exactly the opposite is true. In a series of experiments that compared the two versions of the maze task with eye tracking and moving-window self-paced reading, it was found that both the G-maze and the L-maze tasks detected the expected syntactic effects with great reliability, and the amount of variance accounted for compared very favorably with both eye tracking and self-paced reading techniques (N. Witzel, J. Witzel, & Forster, submitted). The main difference between the tasks appeared to be that in the maze tasks, the effect was always localized on the word that caused processing difficulty, whereas with the other tasks, this effect was sometimes distributed across the subsequent words. This result draws attention to the incremental nature of processing that is characteristic of maze tasks. Before moving from one frame to the next, the reader needs to have fully integrated the correct word into the context, and once this has been accomplished, there is no reason why subsequent words should be affected adversely. However, in an eye-tracking experiment it is obvious that the eyes could move from one word to the next without full integration, and this could lead to difficulty on subsequent words.
In this study, there was one sentence type that caused problems for the maze tasks. This involved a structural ambiguity evident in sentences like “The robber shot the jeweler and the salesman called the police”. The expected result is that “the jeweler and the salesman” will be taken as the object of “shot”, which produces a garden-path effect at “called”. However, if a comma is inserted after “jeweler”, this garden-path effect is avoided. Thus sentences with a comma are easier to process than sentences
without the comma. However, neither maze task showed any trace of this effect. This result probably reflects the very incremental nature of parsing that maze tasks encourage. What appears to be happening is that the reader closes each constituent as soon as possible, and hence the comma after “jeweler” is redundant. If this is right, then in the sentence “The robber shot the jeweler and the salesman before the police arrived”, problems should be encountered at “before” in a maze task, but in eye tracking this would not be the case. The important point to note, therefore, is that the maze task does influence the processing style adopted by readers.
The maze task and lexical access
Another issue where a maze task has provided useful information concerns the size of the frequency effect during sentence processing. Schilling, Rayner, and Chumbley (1998) compared the size of the frequency effect across different tasks, and found that the frequency effect for isolated words in a lexical decision task was much larger than the effect on gaze duration when the same words were embedded in a sentence. This has been interpreted as indicating that context reduces the size of the frequency effect (e.g., see Van Petten & Kutas, 1990). According to this view, the context provided by the sentence narrows down the range of possible words that could continue the sentence, with the result that the frequency effect gets smaller as one moves further into the sentence. However, when measured in a maze task, the size of the frequency effect obtained by Forster et al. (2009, Exp. 3) was 91 ms (the difference in selection time when the correct alternative was a high- or a low-frequency word), which is much greater than the effect on gaze duration observed by Rayner, Liversedge, and White (2006) using the same sentences, which ranged from 46 to 50 ms over three experiments.
One could explain this discrepancy by pointing out that in an eye-tracking experiment, a saccade might be initiated before lexical access of the currently fixated word had been completed (Reichle, Pollatsek, Fisher, & Rayner, 1998). This would be more likely to occur for low-frequency words than high-frequency words, which would reduce the estimated size of the frequency effect on gaze duration, but would not adversely affect processing, since access could be completed during the saccade. However, this is not possible in a G-maze task, since the syntactic properties of each alternative must be evaluated, and this requires access to be completed before advancing to the next frame. 
One might then ask what is the size of the frequency effect when the sentence context is eliminated, and the task is lexical decision. To make the tasks more comparable, an L-maze can be used, in which each frame contains either a high- or a low-frequency word and a legal nonword, and the subject’s task is merely to indicate whether the word is on the left or right. There is no context, since the words in
each frame have no relation to the words on previous frames. Under these conditions, the frequency effect for the same set of words was 67 ms (Forster et al., 2009, Exp. 3). So it appears that a sentence context increases rather than decreases the frequency effect, suggesting that frequency might also affect how rapidly a word can be integrated into the context.
This procedure of using an L-maze suggests an interesting variant of the normal lexical decision task. Essentially it alters the task from a Yes/No task to a two-alternative forced-choice task. One advantage is that this may increase sensitivity, since even if the subject is uncertain about the spelling of the word, they can still be confident in their decision, since the alternative is clearly less like a word. Another advantage is that it is possible to vary the difficulty of the task by manipulating the properties of the incorrect alternative. For example, Stone and Gorraiz (2008) compared frames such as nurse-grean with frames such as nurse-glump, and found slower responses when the incorrect alternative was a pseudohomophone. This procedure makes it possible to examine the effect of different types of foils in a within-subjects design, instead of the between-subjects design that is normally used.
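The frequency effect discussed in this section is the difference in mean selection time between low- and high-frequency targets. The following Python sketch shows that calculation under stated assumptions: the function name `frequency_effect` is hypothetical, and the RT values used to exercise it are invented and are not data from any experiment. The two effect sizes in the `reported` dictionary are the values quoted in the text.

```python
from statistics import mean
from typing import Sequence


def frequency_effect(low_freq_rts: Sequence[float],
                     high_freq_rts: Sequence[float]) -> float:
    """Mean RT for low-frequency targets minus mean RT for
    high-frequency targets, in the units of the input (ms)."""
    return mean(low_freq_rts) - mean(high_freq_rts)


# Invented RTs (ms), for illustration only; not data from any study.
low = [700.0, 720.0, 740.0]
high = [640.0, 650.0, 660.0]
effect = frequency_effect(low, high)  # 720 - 650 = 70 ms

# Effect sizes quoted in the text (ms), Forster et al. (2009, Exp. 3).
reported = {
    "G-maze, sentence context": 91,
    "L-maze, no context": 67,
}
context_change = (reported["G-maze, sentence context"]
                  - reported["L-maze, no context"])  # 24 ms larger in context
```

A positive `context_change` corresponds to the observation that the sentence context increased rather than decreased the frequency effect for this set of words.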
Using a maze task for language learning
Practice makes perfect, and language skills are no exception. However, creating opportunities for student-initiated practice is a challenge for foreign language teachers, especially in the case of on-line language production skills. For example, although students can be given practice in comprehending a particular construction, it is another matter altogether to arrange practice sessions in producing a particular sentence structure. Recently, we have considered the possibility that a maze task might be one way of providing such practice. The ideal situation would be one where the task materials were available to the student over the internet.
Preliminary work using such an arrangement has provided encouraging results (Forster & Enkin, 2010). Students in an introductory Spanish class were given a homework assignment that required them to complete a total of 20 Spanish maze sentences involving a particular relative clause construction. This had to be completed twice each week, for three weeks. In the final week, performance on a new set of sentences was compared for practiced and non-practiced constructions. The instructor was able to monitor progress since the results of each session were automatically emailed back to the instructor. The results indicated substantial improvement on the practiced construction relative to the non-practiced construction, although both constructions were emphasized in formal classwork.
How do we think that learning could take place via the maze procedure? Think of the parser as a push-down store automaton. The instructions (or rules) for such
an automaton specify that when it is in state Si, reading input symbol a, and with symbol b as the uppermost symbol in the push-down store, then it can switch to state Sj, and output x. Learning how to produce sentences in a language involves not only building such a device, but also strengthening the connections between states so that the device transitions from one state to another more rapidly and reliably. Formal instruction may be the best way to help the student build such a device, but extensive practice in using the device is necessary to reach a high level of competence, and the maze task may have a role to play here. One might say that formal instruction creates representations in explicit (declarative) memory, but practice converts these into implicit (procedural) representations.
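The rule format described above (current state, input symbol, and uppermost store symbol determine the next state and the output) can be written as a lookup table. The sketch below is a toy illustration under stated assumptions, not a model of any natural-language parser. The text does not say what happens to the store on each transition, so the push and pop actions are an assumption, and the state names, symbols, and the function `run` are hypothetical. The toy device accepts one or more a's followed by the same number of b's.

```python
from typing import Dict, List, Optional, Tuple

# (state, input symbol, top of store) -> (next state, output, store action)
# The store actions ("push:X", "pop") are an assumption added for the sketch.
RULES: Dict[Tuple[str, str, str], Tuple[str, str, str]] = {
    ("S0", "a", "$"): ("S0", "x", "push:A"),
    ("S0", "a", "A"): ("S0", "x", "push:A"),
    ("S0", "b", "A"): ("S1", "y", "pop"),
    ("S1", "b", "A"): ("S1", "y", "pop"),
}


def run(symbols: List[str]) -> Optional[List[str]]:
    """Return the output sequence if the input is accepted, else None."""
    state = "S0"
    store = ["$"]              # "$" marks the bottom of the store
    output: List[str] = []
    for sym in symbols:
        rule = RULES.get((state, sym, store[-1]))
        if rule is None:
            return None        # no applicable rule: reject
        state, out, action = rule
        output.append(out)
        if action.startswith("push:"):
            store.append(action[len("push:"):])
        elif action == "pop":
            store.pop()
    # Accept only in the final state with every pushed symbol popped.
    return output if state == "S1" and store == ["$"] else None
```

In these terms, the "strengthening" described in the text would correspond to making each table lookup faster and more reliable with practice, which this sketch does not attempt to model.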
Conclusion
The maze task clearly has a number of applications, and there may be many more to come. Despite the long RTs that are necessarily involved in discriminating between two alternatives, the task appears to be remarkably sensitive, and may even out-perform other tasks in some situations. However, it should be seen as just one alternative among a battery of tasks that can be applied to a given problem.
References
Fischler, I., & Bloom, P. A. (1979). Automatic and attentional processes in the effects of sentence contexts on word recognition. Journal of Verbal Learning and Verbal Behavior, 18, 1–20.
Forster, K. I. (1981). Priming and the effects of sentence and lexical contexts on naming time: Evidence for autonomous lexical processing. Quarterly Journal of Experimental Psychology, 33A, 465–495.
Forster, K. I., & Enkin, E. (2009). The Spanish Maze task: An investigation of a L2 practice effect. Unpublished report, University of Arizona.
Forster, K. I., Guerrera, C., & Elliot, L. (2009). The Maze Task: Measuring forced incremental sentence processing time. Behavior Research Methods, 41, 163–171.
Just, M. A., Carpenter, P. A., & Woolley, J. D. (1982). Paradigms and processes in reading comprehension. Journal of Experimental Psychology: General, 111, 228–238.
Nicol, J. L., Forster, K. I., & Vereš, C. (1997). Subject–verb agreement processes in comprehension. Journal of Memory & Language, 36, 569–587.
Qiao, X., Shen, L., & Forster, K. I. (2009). Relative clause processing in Mandarin: Evidence from the Maze Task. Manuscript submitted for publication, University of Arizona.
Reichle, E. D., Pollatsek, A., Fisher, D. L., & Rayner, K. (1998). Toward a model of eye movement control in reading. Psychological Review, 105, 125–157.
Rayner, K., Liversedge, S. P., & White, S. J. (2006). Reading and disappearing text: Cognitive control of eye movements. Vision Research, 46, 310–323.
Schilling, H. E. H., Rayner, K., & Chumbley, J. I. (1998). Comparing naming, lexical decision, and eye fixation times: Word frequency effects and individual differences. Memory & Cognition, 26, 1270–1281.
Staub, A. (2010). Eye movements and processing difficulty in object relative clauses. Cognition, 116, 71–86.
Stone, G., & Gorraiz, M. (2008, November). Are Two Stimuli Better than One? Forced Choice Lexical Decisions. Poster presented at the 49th Meeting of the Psychonomic Society, Chicago.
Van Petten, C., & Kutas, M. (1990). Interactions between sentence context and word frequency in event-related brain potentials. Memory & Cognition, 18, 380–393.
Witzel, N., Witzel, J., & Forster, K. I. (2009). Comparisons of online reading paradigms: Eye tracking, moving-window, and maze. Manuscript submitted for publication, University of Arizona.
Witzel, J., & Witzel, N. (2009, March). Pre-Head Gap Filling in Japanese Sentence Processing. Poster presented at the CUNY Sentence Processing Conference.
Author’s address
Kenneth Forster
Department of Psychology
University of Arizona
Tucson, Arizona
USA 85721
[email protected]
© 2010. John Benjamins Publishing Company All rights reserved