Strategic Control of Processing in Word Recognition

Journal of Experimental Psychology: Human Perception and Performance 1993, Vol. 19, No. 4, 744-774

Copyright 1993 by the American Psychological Association, Inc. 0096-1523/93/$3.00

Gregory O. Stone and Guy C. Van Orden

Strategic control of word recognition in a lexical decision task was examined by manipulating the similarity of nonword foils to real words (nonword lexicality). Overall correct reaction times to words and the advantage of high- over low-frequency words were greater when nonword foils were more wordlike. This was true for both illegal (BTESE) versus legal (DEEST) nonword foils and legal nonword versus pseudohomophone (BEEST) foils. The same pattern of results was replicated in a 2nd experiment in which the word targets were always irregular (e.g., HAVE). A 3rd experiment demonstrated a large frequency blocking effect for low-frequency words, given pseudohomophone foils. The results are applied to pathway selection and random-walk frameworks. For both frameworks, canonical models are developed, which characterize qualitative predictions of broad classes of models within that framework. We argue for a pluralistic approach to theory development that moves from lower to higher order isomorphisms between data and theory.

Strategic control of processing, in response to task demands or variable goals, increases the flexibility and power of an information-processing system. However, this power poses problems for the scientific study of human information processing. A theory that too freely invokes strategies can account for any possible empirical pattern (Besner, 1984): difficult results can always be attributed post hoc to some strategy. Yet the possibility of obtaining empirical results inconsistent with a theory is generally considered an essential characteristic of scientific theories (Popper, 1959; Suppe, 1977; cf. Lakatos, 1970). The most common restriction on the freedom to invoke strategic control is the assumption that processing is divided among specialized modules that operate independently of each other and of top-down constraints like goals or task demands (Fodor, 1983). Strategic control then determines which modules (or pathways through modules) participate in a given situation. This constrains the treatment of strategic control in several ways. First, it limits the range of mechanisms invoked. Second, the implications of empirical results for processing models are often straightforward. Third, identifying independent modules is generally easier than accounting for the mechanisms within modules. For example, consensus is greater on the modules in visual word recognition than on the internal operation of those modules (cf. Carr & Pollatsek, 1985). Fourth, modules accounting only for performance in a single, artificial condition have a blatantly ad hoc quality.

In the present experiments, we investigated strategic control in a lexical decision task (i.e., subjects discriminated words [e.g., BEAST] from nonwords [e.g., DEEST]). To ensure that observed results were caused by differences in strategy, we chose only ideal strategy manipulations. Across all levels of an ideal strategy manipulation, the trials analyzed are physically identical (e.g., same stimuli, durations, and visual quality) and always require the same response (e.g., a lexical decision keypress). Because the stimulus-driven information and the response are identical across the manipulation, some top-down change must be responsible for any observed difference in performance. For example, when performance on word trials is measured, nonword foil lexicality (i.e., the similarity of the nonword foils to true words) is an ideal strategy manipulation. The word trials are identical across levels of the nonword foil lexicality manipulation; the only difference is in the nonword foil stimuli, which complete the overall stimulus set.1 It is important to emphasize that such strategic differences need not be conscious or deliberate on subjects' part. Subjects may settle on what works best without awareness of what, precisely, this entails. Subjects' deliberate strategies form a subset of a broader class of processing differences that cannot be attributed to direct effects of the stimulus being processed.

Nonword foil lexicality affects the recognition of words and interacts with other factors that influence recognition of words (Andrews, 1982; Davelaar, Coltheart, Besner, &

Gregory O. Stone and Guy C. Van Orden, Department of Psychology, Arizona State University. This research was supported in part by Arizona State University Summer Research Grant SF 89-30 and National Institute of Health FIRST Award CMS 5 R29 NS26247 to Guy C. Van Orden. Special thanks to Ken Forster. Conversations with him helped to bring the issue of strategy to the forefront of our thinking. Thanks also to Ken Paap, Sally Andrews, Steve Goldinger, and Roger Schvaneveldt for valuable discussions of the present results and ideas. Derek Besner, Ken Paap, and anonymous reviewers also deserve thanks for a particularly helpful review process. Thanks are also in order to Patrice Gibbs for help in the construction of the stimulus set for Experiment 2 and to T. M. G., Greg Matthews, and Jim Lesser for testing subjects. Correspondence concerning this article should be addressed to Gregory O. Stone, Department of Psychology, Arizona State University, Tempe, Arizona 85287.

1 The argument is not that strategic control cannot be induced by stimulus-based factors. Indeed, it seems unlikely that subjects would use the same strategy for stimuli close to visual threshold and for clearly perceptible stimuli.


STRATEGIC CONTROL IN WORD RECOGNITION

Jonasson, 1978; C. T. James, 1975; McQuade, 1981; Shulman & Davison, 1977; Shulman, Hornak, & Sanders, 1978; Stone & Van Orden, 1989, 1992). Two contrasts have been used: legal nonword foils (orthographically regular and pronounceable, such as DEEST) versus illegal nonword foils (orthographically irregular or unpronounceable, such as BTESE), and legal nonword foils versus pseudohomophone foils (e.g., BEEST). The former comparison has generally been used to investigate lexical versus nonlexical processing of words; the latter, to investigate the role of phonology in word recognition. In the present experiments, we used all three types of nonword foil (i.e., illegal, legal, and pseudohomophone) under otherwise identical conditions.2 Our focus was the effect of nonword foil lexicality on word trials. In addition to considering the main effect of nonword foil lexicality on correct reaction times (RTs) to words, we used word frequency as a measure of lexical processing.

Shulman and Davison (1977) performed one of the first tests for a nonword foil lexicality effect. They noted that processing in a phonologic module would distinguish words from nonwords when nonwords were unpronounceable (BTESE) but not when nonwords were pronounceable (DEEST). As a result, the output of the phonologic module could be used by the decision process (Pathway N in Figure 1) when nonword foils were unpronounceable but not when they were pronounceable. Given that phonologic processing is completed faster than lexical processing, correct RTs to words should be faster given unpronounceable than given pronounceable nonword foils. Furthermore, because semantic priming is presumably localized in lexical processing, the priming effect on correct word RTs should be smaller (or absent) given unpronounceable than given pronounceable nonword foils.
Shulman and Davison tested these predictions of faster correct RTs to words and reduced priming given unpronounceable nonword foils. The data fit the predictions.3

A similar argument applies to the comparison of word trials given legal nonword (DEEST) versus pseudohomophone (BEEST) foils. Many theories of word recognition assume that a semantic lexicon may be accessed in at least two ways. First, the lexicon may be accessed directly using a visually based representation of the stimulus (Pathway D in Figure 1). Second, access may be phonologically mediated (Pathway P): a visually based representation of the stimulus is used to generate a phonologic representation, which is in turn used to access the semantic lexicon. When the phonologic representation is assembled to reflect the rulelike structure of orthographic-phonologic correspondences, and the phonologic representations of pseudohomophones are identical to those of their corresponding words, phonologically mediated access to the lexicon (Pathway P) is counterproductive on pseudohomophone trials. McQuade (1981; see also Carr, Davidson, & Hawkins, 1978) argued that phonologic access to the lexicon (Pathway P in Figure 1) should not occur given pseudohomophone foils. However, the reasoning in this case is more involved than in the case considered by Shulman and Davison (1977). Shulman and Davison could assume that the nonlexical pathway through the phonologic module (Pathway N) was faster than the pathways based on lexical processing (Pathways D and P). In the present case, however, the pathway based on phonologically mediated lexical access (Pathway P) is generally considered to be no faster than the pathway based on direct lexical access (Pathway D; Allport, 1977; M. Coltheart, Davelaar, Jonasson, & Besner, 1977; McCusker, Hillinger, & Bias, 1981; Seidenberg, 1985; Seidenberg, Waters, Barnes, & Tanenhaus, 1984), for both theoretical reasons (i.e., it requires an extra stage of processing) and empirical reasons (e.g., little effect of variables associated with phonology on lexical decision times to words; Seidenberg et al., 1984).

[Figure 1 diagram: VISUAL processes, ASSEMBLED PHONOLOGY, LEXICON, and DECISION process, connected by routes labeled D, P, and N.]
Figure 1. A canonical strong pathway selection processing model for lexical decisions with two modules (whose internal operation is strategy invariant) and three pathways to a response through those modules. (The labels assigned to the routes between modules indicate participation in a given pathway. Pathway D is the direct lexical pathway. Pathway N is the nonlexical pathway.)

2 Treating nonword lexicality as a single dimension does not imply that nonwords differ from words along a single dimension. The similarity of nonwords to words is almost certainly multidimensional. For example, one can distinguish between similarity to words in general and similarity to specific words, and between visual and phonologic similarity. However, our reasoning requires only that nonword lexicality qualifies as an ordinal scale. Thus, for present purposes, nonword lexicality is treated as a single dimension.

3 Shulman and Davison (1977) are sometimes cited as having demonstrated a reduced frequency effect with illegal nonword foils. This manipulation was not, in fact, performed.


GREGORY O. STONE AND GUY C. VAN ORDEN

If phonologically mediated access to the lexicon is no faster (or is slower) than direct access, would elimination of phonologically mediated access have no effect on (or even improve) performance on word trials? As we shall show, the answer to this question depends on how one conceptualizes modules and the pathways through them. Previous empirical studies that have addressed this question have produced mixed results. C. T. James (1975) reported slower correct RTs to words given pseudohomophone versus legal nonword foils. On the other hand, Andrews (1982) reported faster correct RTs to words when the nonword foils included pseudohomophones. However, Andrews also obtained a speed-accuracy trade-off that might have been responsible for the significant RT effect.4 Consistent with this possibility, Davelaar et al. (1978) and McQuade (1981) both reported no significant difference in correct RTs to words given pseudohomophone versus legal nonword foils. Unfortunately, each of these results can only be taken as suggestive. The comparisons made by C. T. James and Andrews were across experiments; McQuade's analysis was an auxiliary analysis of an experiment designed to test performance on nonword trials; and Davelaar et al. always presented legal nonword foils at the beginning of the experiment and pseudohomophone foils at the end. We avoided each of these limitations in the present experiments.

In the following sections, we consider the theoretical issues relevant to the question of how differential use of phonologically mediated lexical access influences performance on word trials. In the next section, we distinguish three types of modules based on the time course of input to the module. We then consider ways multiple pathways through these modules can be used in making a decision. We develop a simple model that is canonical with respect to nonword foil lexicality.
This canonical model gives the same qualitative predictions, with respect to nonword foil lexicality, as many comprehensive models of word recognition. Furthermore, comprehensive models differing from the canonical analysis can be evaluated in terms of modifications to the canonical model. Thus, many relevant models of word recognition can be fairly evaluated with respect to predictions about nonword foil lexicality without considering each model individually. We used this framework to generate predictions for pathway selection models of word recognition in Experiment 1.

Modules

A module can be defined as a special-purpose process operating independently of other such special-purpose processes. Obviously, processing within a module depends on the input it receives, which is generally the output of other modules. Input to a module has two aspects: the information concerning stimulus identity and the time course of this information. This distinction suggests three types of module, based on assumptions about the time course of input: (a) strong-sense modules, (b) moderate-sense modules, and (c) weak-sense modules.

In strong-sense modules, processing depends only on the identity information. For this to be true, processing cannot begin until all input is available. The time course of input can affect only the time at which processing begins. Strong-sense modules were central to Sternberg's (1969) additive-factors logic.

In moderate-sense modules, output from each source module becomes available at a discrete point in time. Unlike strong-sense modules, moderate-sense modules begin processing when the first pulse of input arrives. As a result, processing within a moderate-sense module is sensitive to the relative finishing times of the modules providing it with input. For example, a lexicon that receives a completed visual representation at time t1 and a completed phonologic representation at time t2 is a moderate-sense module. In dual-access models of word recognition, predictions about the role of phonology depend on the relative times at which phonologic and visual information arrive at the lexicon (Allport, 1977; M. Coltheart et al., 1977; McCusker et al., 1981).

In weak-sense modules, identity information from a single source can accumulate over time (McClelland, 1979). Thus, processing is sensitive to the time course of processing within other modules. For example, if the input that a visual feature module provides to a lexical module includes both local feature information and global envelope information (Healy, 1981), processing in a weak-sense lexical module will be sensitive to whether local or global features are extracted first by the visual feature module (cf. Miller, 1981; Navon, 1977). In contrast, processing in a moderate-sense lexical module will be sensitive to the identity information concerning local versus global features but not to their relative time course of extraction, because they are part of the same input pulse from the visual feature module.
The concept of module ceases to apply in any sense when a process can feed information back to processes that provide it with input (as does the word level in the interactive activation model: McClelland & Rumelhart, 1981; Rumelhart & McClelland, 1982). Such a process has considerable influence over the identity and time course of its own input. Indeed, two processes sharing information in a feedback loop are interdependent.
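A minimal sketch of the strong- versus moderate-sense distinction, assuming each source module delivers its completed output as a single pulse at a known time (the situation of a lexicon receiving visual input at t1 and phonologic input at t2). The function names and millisecond values are hypothetical. Weak-sense modules, whose input accumulates continuously, and the feedback case, in which a process shapes its own input, are deliberately not modeled.

```python
"""When can a module start processing, given when its inputs arrive?

Hypothetical sketch of the strong- vs moderate-sense distinction; the
arrival times below are illustrative, not data.
"""


def strong_sense_start(arrival_times):
    # Strong-sense: nothing happens until ALL input is available, so the
    # last arrival fixes the start time; the time course of input affects
    # only when processing begins.
    return max(arrival_times)


def moderate_sense_start(arrival_times):
    # Moderate-sense: processing begins with the first discrete pulse.
    return min(arrival_times)


def pulses_available(arrivals, t):
    # The discrete pulses a moderate-sense module has received by time t;
    # each pulse is one source module's completed output.
    return sorted(name for name, at in arrivals.items() if at <= t)


# A lexicon receiving a completed visual representation at t1 and a completed
# phonologic representation at t2 (hypothetical values, in ms).
arrivals = {"visual": 150.0, "phonologic": 220.0}
print(strong_sense_start(arrivals.values()))    # 220.0
print(moderate_sense_start(arrivals.values()))  # 150.0
print(pulses_available(arrivals, 180.0))        # ['visual']
print(pulses_available(arrivals, 230.0))        # ['phonologic', 'visual']
```

Under these assumptions, the relative arrival times matter to a moderate-sense module (it works with the visual pulse alone between 150 and 220 ms), whereas a strong-sense module simply waits for the later pulse.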

Pathway Selection

If processing within modules depends only on stimulus-driven input, how can strategy affect processing? In principle, an executive process could control a parameter of processing within a module. For example, an executive process could regulate the match criterion in a matching (verification) process (cf. Ratcliff, 1978). In practice, however, modules are usually assumed to be strategy invariant. In other words, if a module receives the same input (identity and time course) under two different strategies, it must operate identically in both cases. However, if a module receives input from several sources, eliminating one source through strategic control changes the overall input to the module. Thus, processing can be controlled without violating the assumption of strategy-invariant modules.

4 The error scores for Andrews's (1982) Experiments 1A and 1B were mistakenly switched. Correct values can be found in her Appendix (S. Andrews, personal communication, June 23, 1991).

In a network of modules, a number of pathways through modules can be identified. A pathway is a sequence of modules that leads to a response. If a link between two modules is effectively eliminated by strategic control, any pathway that uses that link is eliminated. For example, if phonologic input is not used by a lexical module, the phonologically mediated lexical pathway (Pathway P in Figure 1) is eliminated. This change in strategy can alter both the results and the time course of overall processing, despite the strategy invariance of all the modules.5

The pathway selection approach to strategic control can be motivated by a simple observation. In a given experimental situation, a pathway may provide information that is counterproductive, or irrelevant, to the decision being made. For example, if the nonwords in a lexical decision task are always unpronounceable, Pathway N in Figure 1 would reliably indicate the correct lexical decision. On the other hand, if nonwords are always pronounceable, Pathway N would indicate pronounceability for all stimuli and could not be productive in a lexical decision (Shulman & Davison, 1977). The optimal strategy for a given experimental situation is easily identified. It includes only those pathways in a model that confer an overall benefit. Counterproductive pathways are excluded.6

Our purpose in the present experiments was to investigate strategic control using pathway selection by holding to the strong assumption that strategic control in word recognition is always accomplished through pathway selection (this assumption is hereafter referred to as the strong pathway selection, or SPS, hypothesis). The SPS hypothesis cannot be formally falsified: at worst, one can always posit as many pathways as there are strategy conditions.
However, the appeal of pathway selection is the prospect that a reasonable number of modules might provide a principled account of the empirical literature on word recognition. The most comprehensive models of word recognition include a fairly large number of modules (e.g., Besner & Johnston, 1989, with 10; Carr & Pollatsek, 1985, with 8; and Paap, McDonald, Schvaneveldt, & Noel, 1987, with 9). Although these numbers are not unacceptably large considering the models' large empirical domains, the authors have frankly acknowledged the growing complexity of these comprehensive models. If the results of the present study can be accommodated by pathways through previously proposed modules, one may hope that a reasonable number of modules will ultimately account for the full spectrum of results in word recognition. On the other hand, if the results required the addition of too many new modules, we would be motivated at least to consider alternatives to pathway selection (Besner & Johnston, 1989). This would not constitute rejection of pathway selection as a mechanism for strategic control. Rather, it would suggest that equal consideration of alternative approaches to strategic control might be necessary to maintain parsimony and power in models of human information processing.

Initially, theoretical analysis is solely in terms of the SPS hypothesis (an alternative is considered in the General Discussion section). In principle, most of our colleagues are probably unwilling to commit to so strong a position. In practice, however, pathway selection is almost always the approach of choice when dealing with strategy. Other alternatives are generally considered only when it is conceptually obvious that effects are localized in the decision process (e.g., stimulus probability effects) or when the pathway selection account is conceptually unappealing to the theorist (e.g., proliferation of special-purpose lexicons). Our purpose in initially working exclusively in terms of the SPS hypothesis was not to investigate a "straw man" position. Rather, we hope that restricting ourselves to this strong position will motivate approaches to strategic control that put a number of possible mechanisms for strategic control (including pathway selection) on a more equal footing.
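The pathway bookkeeping of this section can be sketched by treating Figure 1 as a directed graph. This is a hypothetical encoding, not the authors' model: module abbreviations follow Table 1, pathways are enumerated as module sequences, eliminating a link removes every pathway that uses it, and the viability sets copy the Yes/No entries of Table 1. Treating "viable" as "confers an overall benefit" is a simplifying assumption that ignores the effort and cost considerations raised in footnote 6.

```python
# Hypothetical encoding of the Figure 1 network. Abbreviations follow Table 1:
# V = visual processes, Ph = phonologic module, L = lexical module,
# De = decision process.
LINKS = frozenset({("V", "L"), ("V", "Ph"), ("Ph", "L"), ("Ph", "De"), ("L", "De")})
NAMES = {("V", "L", "De"): "D", ("V", "Ph", "L", "De"): "P", ("V", "Ph", "De"): "N"}

# Viability of each canonical pathway for each foil type, copied from Table 1.
VIABLE = {
    "D": {"illegal", "legal", "pseudohomophone"},
    "P": {"illegal", "legal"},
    "N": {"illegal"},
}


def pathways(links, start="V", end="De"):
    """Enumerate every module sequence from start to end (no revisits)."""
    found = []

    def walk(path):
        if path[-1] == end:
            found.append(tuple(path))
            return
        for src, dst in sorted(links):
            if src == path[-1] and dst not in path:
                walk(path + [dst])

    walk([start])
    # Fall back to the raw module sequence for paths not named in Figure 1.
    return [NAMES.get(p, "-".join(p)) for p in found]


def eliminate_link(links, src, dst):
    """Strategic control removes one link; pathways that use it disappear."""
    return links - {(src, dst)}


def optimal_pathways(foil):
    """Assumed optimal strategy: keep exactly the pathways viable for this foil."""
    return sorted(name for name, foils in VIABLE.items() if foil in foils)


print(pathways(LINKS))                              # ['D', 'N', 'P']
print(pathways(eliminate_link(LINKS, "Ph", "L")))  # ['D', 'N']
for foil in ("illegal", "legal", "pseudohomophone"):
    print(foil, optimal_pathways(foil))
```

Removing the single Ph to L link drops only Pathway P while every module stays strategy invariant, which is the sense in which pathway selection controls processing without touching the modules themselves.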

Overall Finishing Times With Pathway Selection

Predictions about how pathway selection affects RTs depend on how multiple pathways are used in making a decision. There are three qualitatively distinct cases. First, if the modules are strong-sense modules, addition of a pathway can only slow RTs. For example, with input via direct visual and phonologically mediated pathways (D and P in Figure 1), the lexical module would not begin processing until the input from both pathways was available. Adding a nonlexical pathway (N) could only slow RTs, even if it is usually faster than the lexical pathways (D and P). In formal terms, on every trial the maximum of two random variables is greater than or equal to either variable alone. In this case, the optimal strategy is to use only the fastest reliable pathway. Strong-sense modules are virtually never posited in current theories of word recognition, because a number of key findings are generally understood in terms of early versus late contributions to lexical processing. Therefore, we do not consider the strong-sense module assumption further. Second, a single pathway may be selected for use on each trial (Glanzer & Ehrenreich, 1979), with the probability of selecting a given pathway determined by strategic control. The finishing time distribution for trials involving a given pathway is the finishing time distribution for that pathway.

5 Of course, the process responsible for selecting active pathways (e.g., a limited-capacity central processor; cf. Kahneman & Treisman, 1983; Norman & Bobrow, 1975) must be strategy sensitive. This does not diminish the value of assuming that all other aspects of processing are strategy invariant.

6 Stroop effects suggest that an optimal strategy may not always be possible. This conclusion assumes that optimality is defined solely in terms of performance costs (minimizing errors and reaction time). Factors like the amount of effort required (e.g., in suppressing lexical processing) could also affect optimality. Consistent with this, Stroop effects are sensitive to stimulus probability, with larger Stroop effects found when the inconsistent condition occurs less frequently (Cheesman & Merikle, 1986), so that the overall performance cost of not suppressing lexical processing is smaller. See also Neisser's (1976) account of Stroop effects.



Because the pathways are independent, the overall distribution of finishing times is the sum of the distributions for each pathway weighted by the probability of using that pathway. Such a system is easy to analyze because the mean overall finishing time will be the weighted sum of the mean finishing times for the pathways used. An unused pathway simply has a zero probability of use. If an added pathway is faster than the previous pathway(s), overall finishing time can only decrease. If an added pathway is slower than the previous pathway(s), overall finishing time can only increase. Shulman and Davison (1977) used this logic to argue that the nonlexical pathway (N in Figure 1) must be faster than the lexical pathways (D and P). Third, multiple pathways may be used on each trial. The two most common approaches of this type assume race-horse processes and integrative processes. In a race-horse process, the first pathway to finish is used. In an integrative process, information from the pathways is combined as it arrives. The race-horse approach is most natural given moderate-sense modules, because information arrives in discrete pulses that represent the completed results of processing in the source module. Selection of a single pulse can, in principle, determine the response. The integrative approach is most natural given weak-sense modules. Incomplete information is arriving over time and must be accumulated to make a viable response. Because these two approaches make the same qualitative prediction—that the addition of a pathway can only benefit decision times—we consider them jointly as the multipathway selection approach. The qualitative predictions of the multipathway approach depend on two points. First, adding a sufficiently slow pathway has no effect on overall finishing times. Second, the benefit from adding a pathway is a monotonically nondecreasing function of the speed of the added pathway. 
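The qualitative contrasts among these decision rules can be checked with a small numerical sketch. Everything in it is an illustrative assumption: the per-trial finishing times are made up, the added pathway is labeled X hypothetically, and the integrative rule assumes each pathway adds evidence at a constant rate from an onset time onward, which is one simple linear-integration model among many rather than the authors' formulation.

```python
from statistics import mean

# Hypothetical per-trial finishing times (ms) for an existing pathway (D)
# and an added pathway (X) on the same five trials; illustrative only.
F_D = [520.0, 560.0, 600.0, 540.0, 580.0]
F_X = [480.0, 610.0, 500.0, 590.0, 450.0]


def strong_sense(*paths):
    """Decision waits for every pathway: per-trial maximum."""
    return [max(trial) for trial in zip(*paths)]


def race_horse(*paths):
    """First pathway to finish decides: per-trial minimum."""
    return [min(trial) for trial in zip(*paths)]


def probabilistic_mean(paths, probs):
    """One pathway per trial; mean is the probability-weighted sum of means."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("selection probabilities must sum to 1")
    return sum(p * mean(path) for p, path in zip(probs, paths))


def integrative_finish(onsets, rates, criterion):
    """Time at which summed evidence from all pathways reaches criterion.

    Each pathway is assumed (hypothetically) to add evidence at a constant
    rate from its onset time onward.
    """
    events = sorted(zip(onsets, rates))
    t, evidence, total_rate = events[0][0], 0.0, 0.0
    for onset, rate in events:
        if total_rate > 0:
            t_reach = t + (criterion - evidence) / total_rate
            if t_reach <= onset:
                return t_reach  # criterion reached before this pathway starts
            evidence += total_rate * (onset - t)
        t, total_rate = onset, total_rate + rate
    return t + (criterion - evidence) / total_rate


print(mean(F_D), mean(strong_sense(F_D, F_X)), mean(race_horse(F_D, F_X)))
# 560.0 580.0 506.0
print(probabilistic_mean([F_D, F_X], [0.5, 0.5]))  # 543.0

# Integrative: D starts at 200 ms with rate 1 per ms and criterion 100, so D
# alone finishes at 300 ms; the added pathway starts at p_onset, same rate.
for p_onset in (400.0, 250.0, 220.0):
    print(p_onset, integrative_finish([200.0, p_onset], [1.0, 1.0], 100.0))
# 400.0 300.0
# 250.0 275.0
# 220.0 260.0
```

With these made-up numbers, adding the second pathway raises the mean from 560 to 580 ms under the strong-sense rule, yields the weighted mean of 543 ms under probabilistic selection with p = .5, and lowers it to 506 ms under the race-horse rule. In the integrative runs, an added pathway starting at 400 ms changes nothing, while earlier onsets give 275 and 260 ms, so the benefit grows as the added pathway speeds up. For the 250-ms onset, a race-horse rule on the same two pathways would give min(300, 350) = 300 ms, a smaller benefit than integration.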
We illustrate these points using addition of the phonologically mediated lexical pathway (P in Figure 1) to previous use of only a direct visual lexical pathway (D). This corresponds to the legal nonword (both P and D) versus pseudohomophone (D only) foil contrast. Consider the hypothetical case in which Pathway P is much slower than Pathway D. The analyses for race-horse and integrative processes differ but lead to the same basic conclusion. With a race-horse process, Pathway P never finishes before Pathway D, so Pathway D always wins the race. The overall finishing time distribution is simply the finishing time distribution for Pathway D, which was the finishing time distribution prior to adding Pathway P. There is no cost (or benefit) to overall finishing time from adding Pathway P. With an integrative process, finishing time is not an intrinsic property: information simply accumulates over time. However, finishing time can be defined as the time required to accumulate some criterion amount of information (or evidence). The faster evidence accumulates, the faster the finishing time. If Pathway P is so slow that it never contributes evidence before Pathway D has contributed enough to reach criterion, finishing times will be the same as those for Pathway D alone. Again, there is no cost (or benefit) to overall finishing time from adding Pathway P. Thus, unlike probabilistic pathway selection, addition of a slow pathway in a multipathway process cannot slow overall finishing time as long as both pathways contribute to correct responses.

Consider now adding a Pathway P that is fast enough to influence overall finishing time. If the added pathway is made to finish more quickly, overall finishing time must be speeded as well. With a race-horse process, decreasing the finishing time of Pathway P increases the probability that it will finish before Pathway D. This increases the number of trials for which overall finishing time is faster than the finishing time for Pathway D alone. As a result, the benefit from adding Pathway P must increase when the finishing time of Pathway P is decreased. With an integrative process, the benefit from adding Pathway P can be thought of in terms of a decrease in the evidence required from Pathway D to reach criterion. Decreasing the finishing time of Pathway P increases the evidence from Pathway P at any given time and so decreases the amount of evidence required from Pathway D. This increases the gain in overall finishing time relative to the finishing time given only Pathway D. Again, the benefit from adding Pathway P must increase when the finishing time of Pathway P is decreased.

Our objective in the preceding discussion was to demonstrate that race-horse and integrative processes make the same qualitative predictions with respect to pathway selection and can thus be considered together as the multipathway approach. However, it is worth noting a quantitative difference between the two approaches. Compare addition of Pathway P to Pathway D given a race-horse versus an integrative decision process. In the race-horse system, Pathway P will affect overall finishing times only when it finishes before Pathway D. In the integrative system, Pathway P can affect overall finishing times if it begins to provide evidence before Pathway D finishes.
Thus, if all else is equal, an integrative process generates larger benefits when a pathway is added than does a race-horse process.

Three final clarifications are needed to demonstrate the generality of the preceding analysis. First, the preceding analysis was predicated on the assumption that pathways are selected to optimize overall performance. Given this, it was shown that pathway selection in a multipathway approach can only benefit performance. However, a pathway can be counterproductive for some stimuli but confer a benefit for most of the stimuli. In this case, the globally optimal strategy might be to use that pathway, despite the cost in some conditions. Paap and Noel (1991) illustrated how this might occur. Assume that Pathway D is relatively automatic whereas Pathway P requires limited-capacity attention. Imposing a memory load during a naming task should disable Pathway P much more than Pathway D. Most stimuli should suffer from this inability to use the optimal strategy. However, use of Pathway P hurts pronunciation of inconsistent words. Thus, stimuli whose pronunciation is normally slowed by the generally beneficial use of Pathway P (low-frequency inconsistent words) should actually benefit from the inability to use Pathway P. Paap and Noel's data were consistent with this analysis (cf. Lukatela & Turvey, 1993). The key point is that the benefit-only argument for multipathway approaches applies only when the added pathway provides evidence consistent with a correct response in a given condition. Thus, adding a pathway can produce a cost in some conditions as long as it produces an overall benefit across all conditions.7

Table 1
Pathways to a Response in Lexical Decisions

                                                      Viability for nonword foil
Pathway                             Configuration     Illegal   Legal   Pseudohomophone
D: Direct lexical                   V → L → De        Yes       Yes     Yes
P: Phonologically mediated lexical  V → Ph → L → De   Yes       Yes     No
N: Nonlexical                       V → Ph → De       Yes       No      No

Note. V = visual processes, L = lexical module, De = decision process, Ph = phonologic module.

Second, the preceding qualitative analysis holds when pathways merge at different points in processing. For example, assume there is a single output from the lexical module, which reflects processing via both Pathways D and P. In this case, the decision process can combine a merger of Pathways D and P with Pathway N. The qualitative predictions in this case are the same as those for a decision process receiving separate input from each lexical pathway. If Pathway P has no impact on lexical finishing times, it clearly cannot affect overall finishing times. Likewise, if the finishing time of Pathway P is decreased, the combined lexical finishing time will decrease. Decreasing the finishing time of this merged pathway will still decrease overall finishing time. In fact, because the qualitative analysis does not depend on the distinction between race-horse and integrative processes, the preceding analysis holds even if, for example, the lexical module integrates input from Pathways D and P but the decision process is a race-horse selection of a nonlexical pathway or the merged lexical pathways.

Third, qualitative analysis of pathway selection is not affected by the combination of several pathways in a comprehensive model into a single pathway in a canonical model, as long as those pathways share the same pattern of use across conditions. For the strong-sense-module, probabilistic selection, race-horse, and linear integrative approaches, this property is a straightforward result of associativity. For example, let $F_{D1}$, $F_{D2}$, and $F_P$ be the finishing-time random variables for one comprehensive direct lexical pathway (D1), another comprehensive direct lexical pathway (D2), and Pathway P, respectively.
For a race-horse approach, overall finishing times are given by F = min (FD1, FD2, FP) = min (min (FD1, ^02). FP) = min (FD, FP), where FD = min (FD1, FD2). In other words, the fastest horse in a race still wins if we assign the horses to teams. The only difference is that we identify the winning team rather than the winning horse. The same associative property also holds for the maximum (strongsense modules) and for addition (probabilistic and integrative approaches). In these cases, the quantitative analysis of finishing times is unaffected by grouping comprehensive model pathways into canonical pathways. In the more general case involving nonlinear integrative processes or different selection approaches at different merger points, the qualitative analysis still holds by induction. Adding one component pathway to any existing set of

pathways can only benefit performance. Adding another component pathway can only further benefit performance. Thus, adding two comprehensive pathways is qualitatively equivalent to adding a single canonical pathway.
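The associativity argument can be checked numerically. The sketch below is our own illustration and is not part of the original analysis. It draws hypothetical finishing times (uniform between 400 and 900 ms, an arbitrary assumption) for two comprehensive direct pathways, D1 and D2, and for Pathway P. It then confirms that combining all three at once gives the same result as first grouping D1 and D2 into a canonical Pathway D. Here `min` stands for the race-horse approach, `max` for strong-sense modules, and summation for the additive (probabilistic and integrative) cases named in the text.

```python
import random


def race_horse(times):
    """Race-horse selection: the fastest pathway determines the finishing time."""
    return min(times)


def strong_sense(times):
    """Strong-sense modules: the slowest input determines the finishing time."""
    return max(times)


def additive(times):
    """Additive combination, standing in for the probabilistic and integrative cases."""
    return sum(times)


def grouping_is_invisible(combine, f_d1, f_d2, f_p, tol=1e-9):
    """True if grouping D1 and D2 into one canonical Pathway D changes nothing."""
    ungrouped = combine([f_d1, f_d2, f_p])
    f_d = combine([f_d1, f_d2])             # canonical Pathway D
    grouped = combine([f_d, f_p])
    return abs(ungrouped - grouped) <= tol  # tolerance covers float rounding in sums


rng = random.Random(1993)
for trial in range(1000):
    # Hypothetical finishing times (ms) for D1, D2, and P on one simulated trial.
    f_d1, f_d2, f_p = (rng.uniform(400, 900) for _ in range(3))
    for combine in (race_horse, strong_sense, additive):
        assert grouping_is_invisible(combine, f_d1, f_d2, f_p)
print("Grouping D1 and D2 into Pathway D never changed the overall finishing time.")
```

In team terms, `race_horse([f_d1, f_d2, f_p])` names the winning horse's time and `race_horse([race_horse([f_d1, f_d2]), f_p])` names the winning team's time. The two are always equal.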

7 Another difference between race-horse and integrative approaches arises. Use of counterproductive pathways on a given trial due to overall optimization is more likely to produce errors in a race-horse process and more likely to slow reaction time in an integrative approach. In a race-horse process, if a counterproductive pathway finishes first, the wrong response is made. In an integrative process, the counterproductive pathway adds conflicting evidence and so effectively increases the supporting evidence required from productive pathways. If sufficient counterproductive evidence accrues, this can, of course, also lead to increased error.

Canonical Models

In mathematical usage, canonical refers to the simplest form of a system (or expression, etc.) that maintains some critical property. The canonical form of a system can be used, with no loss of generality, in place of any system that is equivalent with respect to that critical property. Simplicity generally refers to conceptual tractability. The critical property is normally determined by one's objectives. For example, when one is solving a set of equations, the critical property is the solution set of the equations. In the case of SPS models, the critical property is the qualitative pattern of predicted RT results. An important advantage of the pathway selection approach is that a canonical model with respect to a set of strategy manipulations is relatively easy to identify. Such canonical models can be used to analyze the predictions of any pathway selection model with respect to those manipulations.

Canonical pathways are operationally defined. Each canonical pathway has a distinct pattern of use across the strategy manipulation(s) of interest. Pathways in a comprehensive model that share the same pattern of use across the manipulation(s) are treated as a single canonical pathway. Although the operational definitions of the canonical pathways are the basis for the qualitative predictions, a conceptual rationale is important in deriving canonical pathways from pathways in more comprehensive models. Common assumptions about the recognition of words in lexical decision tasks provide the rationale for an initial canonical pathway selection model for the nonword foil lexicality manipulation. This canonical model has three pathways, which are operationally defined (see Figure 1). Table 1 summarizes those operational definitions.

Pathway D is operationally defined as a pathway that can be used in all three nonword foil lexicality conditions. The conceptual rationale for this pathway comes from the common assumption of modules that include stimulus-specific representations for words (i.e., lexicons) but no representations for nonwords. Because nonwords lack lexical representations, regardless of their similarity to words, the result of lexical processing can always be used to make a lexical decision. Pathway D can be thought of as direct, visually based access to a lexicon.

Pathway P is operationally defined as a pathway that can be used with legal and illegal nonword foils but not with pseudohomophone foils. The conceptual rationale begins with the common assumption of modules that construct representations of stimuli based on properties derived from general experience with words. In particular, a phonologic representation might be assembled using rules (or a rulelike process) relating orthography to phonology. Phonology can be assembled for nonwords as well. If this phonologic representation is used by a lexicon, legal and illegal nonwords will not contribute false-positive evidence to lexical processing, because the nonwords are not pronounced like any word. However, the phonologic representation of pseudohomophones will contribute to false recognition of the word the pseudohomophone sounds like. In this case, use of Pathway P may be counterproductive. Pathway P can be thought of as assembled, phonologically mediated access to a lexicon. If the cost of using Pathway P on pseudohomophone trials outweighs the benefit on word trials, Pathway P will not be used when the foils are pseudohomophones. Note, however, that it is possible for the benefit on word trials to outweigh the cost on pseudohomophone trials.
In this case, the optimal strategy would be to use phonologically mediated lexical access in all three nonword foil conditions. Factors like the proportion of stimuli with inconsistent orthographic-phonologic correspondences (e.g., HAVE and CAVE) would influence this balance. In this latter case, Pathway P does not exist as a canonical pathway (with respect to word performance), because it would have the same operational definition as Pathway D. The canonical model still facilitates analysis of this possibility. One need only consider operation of the canonical model without Pathway P.

Pathway N is operationally defined as a pathway that is used only when nonword foils are illegal. The conceptual rationale for such a pathway returns to the concept of a rule-governed process. When a stimulus violates the rules seriously enough that no rule-based representation can be generated (e.g., unpronounceable nonwords), the success or failure of the rule-governed process can be used directly to make a lexical decision. Stimulus-specific look-up is unnecessary. This occurs only when the nonword foils are illegal, so Pathway N is used only in that condition. Pathway N is nonlexical because it does not pass through a lexicon.

Figure 1 also includes two canonical "modules": a lexical module and an assembled phonologic module. They represent the simplest configuration of modules consistent with the conceptual analysis that guided identification of the canonical pathways. However, whereas pathways in a comprehensive model must map into the canonical pathways, the same is not true of the canonical modules. They are purely an expository convenience.

Comprehensive pathway selection models can be mapped into this canonical model by determining how each of the comprehensive pathways will be used across the nonword foil lexicality manipulation. If each comprehensive pathway has a pattern of use that matches one of the operationally defined canonical pathways, then the canonical model provides a fair characterization of the comprehensive models' predictions for manipulation of nonword foil lexicality. Figure 2 illustrates a mapping of a comprehensive model, adapted from the Carr and Pollatsek (1985) parallel coding system (PCS) model, into the canonical model of Figure 1. This adaptation is constructed for illustrative value and includes hypothetical cases that do not occur in the PCS model. However, the mapping techniques can be applied to the actual PCS model, as well as other comprehensive models. The pathway through the morphemic decomposition module and the semantic lexicon (Pathway D2) is not direct visual access to a lexicon, yet it satisfies the operational definition for Pathway D with respect to the nonword foil lexicality manipulation. It is essential to analyze comprehensive pathways in terms of differential use across conditions, rather than working purely in terms of the convenient conceptual characterization that served to introduce the canonical model. The direct pathway from visual processes to the decision process (Pathway N2) is mapped into Canonical Pathway N if it can be used to discriminate illegal nonwords from words.
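The mapping procedure amounts to comparing each comprehensive pathway's pattern of use with the operational definitions in Table 1. The sketch below rests on two assumptions of ours. First, usage is coded as True/False in the order illegal, legal, pseudohomophone, which is the column order implied by the text's definitions. Second, the usage patterns for the Figure 2 pathways are read off the discussion of that figure, with Pathway X standing for the strictly hypothetical morphemic-stem case described in the text. A pathway that matches no canonical pattern maps to `None`. That result signals that the canonical model would not fairly characterize the comprehensive model.

```python
# Operational definitions of the canonical pathways (Table 1). Each tuple gives
# use (True) or non-use (False) in the order: illegal, legal, pseudohomophone foils.
CANONICAL = {
    "D": (True, True, True),    # direct lexical pathway
    "P": (True, True, False),   # phonologically mediated lexical pathway
    "N": (True, False, False),  # nonlexical pathway
}


def map_pathway(usage):
    """Return the canonical pathway with the same pattern of use, or None."""
    for name, pattern in CANONICAL.items():
        if pattern == tuple(usage):
            return name
    return None


def map_model(comprehensive):
    """Map each comprehensive pathway onto a canonical pathway (None if unmatched)."""
    return {path: map_pathway(usage) for path, usage in comprehensive.items()}


def fairly_characterized(comprehensive):
    """The canonical model stands in for a comprehensive model only if every pathway maps."""
    return all(name is not None for name in map_model(comprehensive).values())


# Usage patterns for the Figure 2 pathways, as described in the text.
figure2 = {
    "D2": (True, True, True),    # morphemic decomposition -> semantic lexicon
    "N1": (True, False, False),  # phonologic decision process -> decision process
    "N2": (True, False, False),  # visual processes -> decision process
    "P":  (True, True, False),   # phonologic decision process -> semantic lexicon
    "X":  (True, False, True),   # hypothetical: morphemic decomposition -> decision
}

print(map_model(figure2))
# {'D2': 'D', 'N1': 'N', 'N2': 'N', 'P': 'P', 'X': None}
print(fairly_characterized(figure2))
# False
```

Removing "X" from the dictionary makes `fairly_characterized` return True. This mirrors the point that only the hypothetical morphemic pathway prevents the canonical model from standing in for this comprehensive model.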

[Figure 2 diagram not reproduced. Surviving labels: decision process, phonologic lexicon, assembled phonology, Pathway N2.]

Figure 2. A comprehensive pathway selection model adapted from Carr and Pollatsek (1985). (Except for X, which indicates a pathway that does not map into a canonical pathway, the letter labeling each pathway indicates the pathway's mapping into the canonical pathways of Figure 1. Pathway D is the direct lexical pathway. Pathway P is the phonologically mediated lexical pathway. Pathway N is the nonlexical pathway.)

It is not important that Canonical Pathway N passes through a phonology module but Pathway N2 does not. What matters in canonical modeling is that Pathway N2 satisfies the operational definition for Canonical Pathway N. Characteristics of the experimental stimuli are important in specifying a canonical model. For example, if the orthographic and phonologic legality of nonword foils were manipulated independently, the visually based nonlexical pathway (N2) and the phonologically based nonlexical pathway (N1) would map into different canonical pathways.

The next example concerns the phonologic decision mechanism of Carr and Pollatsek (1985). The phonologic decision mechanism is intended primarily to account for naming performance. Given a nonword stimulus, the phonologic decision process receives a phonologic representation from assembled phonologic processing but not from the phonologic lexicon. To allow naming of nonwords, the phonologic decision process should pass this phonologic representation on as its output. Given a word stimulus, the phonologic decision process receives input from both the phonologic lexicon and assembled phonologic processing. Its purpose is to resolve any conflicts between these sources of information. Given that the phonologic decision process generates a pronunciation for any pronounceable letter string, it discriminates words from nonwords only when nonwords are illegal, and thus the pathway from it to the decision process (Pathway N1) maps into Canonical Pathway N. The pathway that passes through the phonologic decision process and the semantic lexicon (Pathway P) is counterproductive only when nonwords are pseudohomophones, and so this pathway maps into Canonical Pathway P. Note that merger of assembled phonology with the results of phonologic lexical processing makes the phonologic lexicon irrelevant for qualitative predictions concerning nonword foil lexicality in lexical decisions.
Its inclusion does speed Pathway P on word trials, but this does not change qualitative predictions. Again, canonical models are always defined with respect to the particular task(s) and manipulation(s). The value of the phonologic lexicon and the phonologic decision process lies in their ability to account for naming data and differences between naming and lexical decision. The canonical model does not deny the importance of this role; it simply clarifies why this role is not relevant to the present studies.

Assumptions concerning the morphemic decomposition module illustrate how a comprehensive model might fail to map into the canonical model. In this strictly hypothetical case, a comprehensive pathway exists that does not map into any of the canonical pathways. Assume the morphemic decomposition module can distinguish known stems (e.g., BEAST) from unknown stems (e.g., BEEST). Pseudohomophones and illegal nonwords can, in principle, be correctly rejected using the pathway directly connecting the morphemic decomposition module to the decision process (Pathway X). If the legal nonwords are all unknown combinations of morphemes (e.g., UNBEAST), Pathway X will not distinguish words from nonwords and should not be used. Thus, it can be used given pseudohomophone and illegal nonword foils but not given legal nonword foils. This does not map into any of the canonical pathways in Figure 1, so the canonical model would not fairly characterize the qualitative predictions of this comprehensive model under these conditions. Although the foregoing scenario is somewhat unusual, it shows that unrecognized possibilities for the operation of a module and unrecognized properties of stimuli can affect whether a canonical model truly characterizes a more comprehensive model.

Despite the difference in complexity between comprehensive and canonical models, a canonical model approach can simplify the evaluation of many comprehensive models. When a canonical model is supported by data, it is easier to see that a variety of models are supported, rather than only the model proposed by the investigator. On the other hand, when a canonical model fails because it has too few pathways, this does not automatically constitute evidence against all comprehensive models equivalent to it. Using a canonical model in this way would make it a "straw man." In discussing the results of the present experiments, we illustrate a method of extending a canonical model that is fair to the comprehensive models for which it stands.

Experiment 1

A number of possible predictions for Experiment 1, which crossed nonword foil lexicality (illegal nonword vs. legal nonword vs. pseudohomophone) with word frequency (high vs. low), were derived on the basis of considerations like the inclusion of Canonical Pathway P, the nature of pathway selection, and the relative finishing times of the pathways. Given the preceding analysis, it was not difficult to generate predictions for the main effect of nonword foil lexicality, given any combination of these considerations. We analyzed only the most natural predictions.

Canonical modeling also facilitates analysis of performance for high- versus low-frequency words. The key observation is that the former are recognized more quickly along Pathway D than are the latter. On trials with high-frequency words, Pathway D is effectively a faster pathway than it is on trials with low-frequency words. Thus, predictions concerning the crossing of frequency and nonword foil lexicality can be derived using the canonical model by considering the effects of speeding or slowing Pathway D within a given configuration of assumptions.

Consider first the legal versus illegal nonword foil contrast. Given that frequency affects the lexical pathways but not Pathway N, several configurations of assumptions predict results like those of Shulman and Davison (1977): longer RTs to words and a larger frequency effect given legal versus illegal nonword foils. Probabilistic pathway selection predicts this pattern as long as Pathway N is faster than the lexical pathways and the selection strategy is nonoptimal (i.e., the slower lexical pathways are sometimes selected). Multipathway selection predicts this pattern, regardless of whether the lexical or nonlexical pathway is faster (as long as the difference is not too extreme).
The predicted main effect of faster RTs to words given illegal versus legal nonword foils is a consequence of using Pathway N in the illegal nonword foil condition but not in the legal nonword foil condition. Recall that adding a pathway can only benefit performance. The predicted interaction with frequency results from the observation that slower trials along the lexical pathways (low-frequency words) will benefit more from addition of Pathway N than will faster trials (high-frequency words). This is a straightforward application of the monotonically nonincreasing magnitude of benefit as a function of the difference between the finishing times of the added pathway and the previous pathway(s).

Now consider the legal nonword versus pseudohomophone foil contrast. If, contrary to Figure 1, there is no Canonical Pathway P, there should be no differences in word trial performance given pseudohomophone versus legal nonword foils. This would be consistent with the word trial results of Davelaar et al. (1978) and McQuade (1981). It would also be consistent with Andrews (1982) if a speed-accuracy trade-off produced her finding of faster word responses with pseudohomophone foils present. If there is a Canonical Pathway P, predictions for the main effect of nonword foil lexicality depend on the nature of pathway selection and the relative finishing times of the pathways. Given probabilistic pathway selection and the standard assumption that Pathway P is no faster than Pathway D, lexical decisions to words should be as fast (or faster) in the pseudohomophone foil condition as in the legal nonword foil condition (because the latter condition adds a pathway that is no faster than that used in the former condition). This would be consistent with Andrews's (1982) results if her observed speed-accuracy trade-off was not large enough to fully counteract the faster RTs to words with pseudohomophone foils present. Analysis of the main effect of nonword foil lexicality for multipathway selection is identical to that for the illegal versus legal nonword foil contrast.
Correct RTs to words should be faster in the legal nonword foil condition than in the pseudohomophone foil condition, because of differential use of Pathway P. This would be consistent with the results of C. T. James (1975). Analysis of crossing nonword foil lexicality and frequency for multipathway selection depends on a distinction that can be seen in the canonical model. Pathways D and P may merge in lexical processing. If Pathways D and P merge in the lexicon and frequency affects a postlexical process (Balota & Chumbley, 1984), the frequency effect should be equivalent given legal nonword versus pseudohomophone foils if the postlexical process is a moderate-sense (or strong-sense) module. This follows from the observation that the process responsible for frequency effects receives a single pulse of input from the lexicon that combines Pathways D and P. In this case, nonword foil lexicality can only influence the time at which the frequency-sensitive postlexical process begins. Alternatively, the postlexical process could be a weak-sense module and evidence might accumulate more slowly for low- versus high-frequency words (Gordon, 1983). In this case, the most natural prediction is that elimination of Pathway P will hurt low-frequency words more than high-frequency words. There are two cases. First, if evidence from Pathway P generally arrives later than evidence from Pathway D, weak-sense modularity preserves this temporal relationship and more rapid decisions on high-frequency words will be less affected by loss of Pathway P than will generally slower decisions on low-frequency words. Second, if the two pathways have equivalent time courses, the rate of evidence accumulation at the postlexical process is reduced by elimination of Pathway P. The simplest case clarifies the predicted interaction. Halving a larger number (the slower rate of accumulation for low-frequency words) produces a larger absolute decrease than does halving a smaller number (the faster rate for high-frequency words). The case in which a frequency effect occurs along Pathway D before merger with Pathway P is homologous to that for the legal versus illegal contrast. The case in which a frequency effect occurs at the point of merger for the two pathways is homologous to the case given for a weak-sense postlexical module, because weak-sense modularity carries the temporal configuration of the merger process over into the postlexical module.

In sum, only one suite of assumptions about frequency effects does not most naturally predict a larger pseudohomophone versus legal nonword foil effect for low- versus high-frequency words. If frequency effects are restricted to a moderate-sense (or strong-sense) postlexical module after merger of Pathways D and P, frequency and the pseudohomophone versus legal nonword foil contrast should be additive.
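Both routes to the predicted frequency interaction can be illustrated numerically. The sketch is ours, and every number in it is a hypothetical choice. Finishing times are assumed to be normally distributed (SD 60 ms). The lexical route averages 550 ms for high-frequency words and 650 ms for low-frequency words, and the added pathway (Pathway N in the legal versus illegal contrast, or a separately input Pathway P) averages 600 ms. The first part races the added pathway against the lexical route and reports the mean time saved. The second part treats a weak-sense postlexical module as accumulating evidence to a fixed criterion, so that decision time is criterion divided by rate (our simplification). In that part, removing Pathway P halves the rate.

```python
import random


def mean(values):
    return sum(values) / len(values)


def race_benefit(lexical_mu, added_mu, sd=60.0, n=50_000, seed=1):
    """Mean time (ms) saved when an added pathway races the lexical route.

    Normal finishing times are an illustrative assumption; on each simulated
    trial the response is triggered by whichever pathway finishes first.
    """
    rng = random.Random(seed)
    lexical = [rng.gauss(lexical_mu, sd) for _ in range(n)]
    added = [rng.gauss(added_mu, sd) for _ in range(n)]
    raced = [min(lex, add) for lex, add in zip(lexical, added)]
    return mean(lexical) - mean(raced)


# Race-horse route: hypothetical lexical-route means (ms) by word frequency.
benefit_high = race_benefit(lexical_mu=550, added_mu=600)
benefit_low = race_benefit(lexical_mu=650, added_mu=600)
print(f"Race benefit: high-frequency {benefit_high:.1f} ms, "
      f"low-frequency {benefit_low:.1f} ms")


def decision_time(criterion, rate):
    """Weak-sense accumulation: time (ms) to reach a fixed evidence criterion."""
    return criterion / rate


# Weak-sense route: hypothetical accumulation rates (evidence units per ms)
# while Pathways D and P both feed the postlexical module.
slowdown = {}
for label, rate in (("high-frequency", 0.5), ("low-frequency", 0.25)):
    with_p = decision_time(100, rate)
    without_p = decision_time(100, rate / 2)  # losing Pathway P halves the rate
    slowdown[label] = without_p - with_p
    print(f"{label}: {with_p:.0f} ms -> {without_p:.0f} ms (+{slowdown[label]:.0f} ms)")
# high-frequency: 200 ms -> 400 ms (+200 ms)
# low-frequency: 400 ms -> 800 ms (+400 ms)
```

The race benefit can never be negative, because the raced time is never later than the lexical time. With these settings, it comes out noticeably larger for the slower, low-frequency route. The exact figures depend on the assumed distributions.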

Method

Subjects. Sixty undergraduates at Arizona State University served as subjects in partial fulfillment of a course requirement. All were native speakers of English. The number of left-handed subjects was equated across the between-subjects conditions.

Apparatus. The experiment was run on IBM XT clones in sound-attenuated booths. All stimuli were presented in the center of a standard monochrome monitor using the 80 × 25 text mode character set. Subjects were seated roughly 50 cm from the display, so that a letter subtended a visual angle roughly 22' horizontally and 50' vertically. Responses were collected from a standard IBM keyboard. Correct responses to words were always made on the lower rightmost key on the keyboard (key [+]). Correct responses to nonwords were always made on the lower leftmost key on the keyboard (key [F9]).

Stimuli. The stimulus set consisted of 100 experimental words, half of which (the high-frequency words) had a frequency count of 90 or greater (M = 298; Kucera & Francis, 1967) and half of which (the low-frequency words) had a frequency count of 9 or less (M = 4.6). All subjects were presented with these experimental word stimuli in all conditions. Three sets of 100 nonword foils were generated, one for each nonword lexicality condition. The following procedure was used to construct the nonword foil sets. First, pseudohomophones of words not used in the experimental set were generated. For the vast majority of these pseudohomophones, we verified their pseudohomophony using the procedure of Van Orden, Johnston, and Hale (1988). Ten judges read aloud potential pseudohomophones embedded in lists of pronounceable nonwords that were not pseudohomophones. For an item to be deemed a pseudohomophone, nine of the judges must have produced its homophonic pronunciation, and any mispronunciation must have resulted from misperception of visual characteristics (e.g., BEEST pronounced as BREAST).
Next, legal nonwords were generated from these pseudohomophones by changing the first or last letter so as to maintain roughly equivalent bigram frequencies as well as pronounceability. Finally, illegal nonwords were generated by taking anagrams of the pseudohomophones so that the resulting nonword violated English orthography and was effectively unpronounceable. The experimental word stimuli and the nonword foils are presented in the Appendix. Forty-four practice stimuli were also generated (half words and half nonwords). The practice nonword foils were constructed as just described for the three nonword lexicality conditions. Neither the practice words nor the words from which the practice nonwords were generated appeared among the experimental stimuli. All stimuli were five letters long. The sets of high- and low-frequency words were also roughly equated on number of syllables (1.46 for high- vs. 1.30 for low-frequency words) and part of speech (numbers of nouns, verbs, adjectives, and adverbs) as determined by first entry in the American Heritage Dictionary (Morris, 1976). Prefixed and suffixed words were avoided.

Procedure. Subjects were instructed in the lexical decision task and shown 4 sample trials (two words and two nonwords from the practice list). The nature of the nonwords was explained to subjects in each nonword lexicality condition. Subjects were asked to respond as quickly and as accurately as possible, with the emphasis on speed. Each subject then received a practice block of 30 trials (half words and half nonwords) from the practice list. Any final questions by the subject were then answered before beginning the experimental trials. The 200 experimental trials were divided into five blocks with a break between blocks that was terminated by the subject. The first 2 trials in each block were taken from the practice list. Half the trials in each block were words, and half were nonwords. Each trial began with a 500-ms warning signal (a +). The lexical decision target was presented immediately following the warning signal for 2,500 ms or until a response was made.
Accuracy feedback (a "correct" or "error" message) was then presented for 500 ms. Trials with RTs of less than 150 ms or greater than 2,500 ms were discarded, and the appropriate error message ("early" or "late") was displayed. The next trial began automatically after 1,500 ms.

Design. There were two crossed experimental factors. Word frequency was a within-subject, between-items manipulation with two levels: high frequency and low frequency. Nonword foil lexicality was a between-subjects, within-item manipulation with three levels: illegal nonword (BTESE), legal nonword (DEEST), and pseudohomophone (BEEST). The word trials were identical across the nonword lexicality manipulation.
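The trimming rule above and the frequency-advantage measure used in the Results can be expressed compactly. This is a minimal sketch that assumes a hypothetical `Trial` record holding word frequency, RT in ms, and accuracy. The toy trials in the usage example are invented solely to exercise the rule and are not data from the experiment.

```python
from dataclasses import dataclass
from statistics import mean

MIN_RT_MS, MAX_RT_MS = 150, 2500  # trials outside this window were discarded


@dataclass
class Trial:
    """Hypothetical record for one word trial."""
    frequency: str  # "high" or "low"
    rt_ms: float
    correct: bool


def mean_correct_rt(trials, frequency):
    """Mean RT of correct, in-window trials for one frequency level."""
    kept = [t.rt_ms for t in trials
            if t.frequency == frequency and t.correct
            and MIN_RT_MS <= t.rt_ms <= MAX_RT_MS]
    return mean(kept)


def frequency_advantage(trials):
    """Correct RT to low-frequency words minus correct RT to high-frequency words."""
    return mean_correct_rt(trials, "low") - mean_correct_rt(trials, "high")


# Invented toy trials: one "early" trial and one error are excluded.
toy = [
    Trial("high", 540, True),
    Trial("high", 120, True),   # early (< 150 ms): discarded
    Trial("low", 600, True),
    Trial("low", 700, False),   # error: not a correct RT
    Trial("low", 620, True),
]
print(frequency_advantage(toy))
```

Here the kept low-frequency RTs average 610 ms and the single kept high-frequency RT is 540 ms, giving a 70-ms advantage.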

Results

Mean correct RTs and errors for all conditions (including nonwords) are given in Table 2. Unless otherwise noted, all tests were planned on the basis of a priori theoretical issues and the results of pilot studies. All tests were conducted with both subjects and items as random factors, and the alpha level was always set at .05. The pairwise tests for an overall nonword lexicality effect demonstrated slower mean correct RTs to word targets when nonword foils were more lexical than when they were less lexical, for both contrasts: legal versus illegal nonword foils (659 ms - 560 ms = 99 ms), $t_s(38) = 3.22$ and $t_i(98) = 6.93$, and pseudohomophone versus legal nonword foils (787 ms - 659 ms = 128 ms), $t_s(38) = 3.76$ and $t_i(98) = 8.09$. This nonword foil lexicality effect was also found using only high-frequency words, for both contrasts: legal versus illegal nonword foils (621 ms - 542 ms = 79 ms), $t_s(38) = 2.70$ and $t_i(49) = 18.24$, and pseudohomophone versus legal nonword foils (707 ms - 621 ms = 86 ms), $t_s(38) = 2.79$ and $t_i(49) = 11.95$. The frequency advantage (i.e., correct RTs to low-frequency words minus correct RTs to high-frequency words) was significantly larger when nonword foils were more lexical than when they were less lexical, for both contrasts: legal versus illegal nonword foils (76 ms - 36 ms = 40 ms), $F_s(1, 38) = 17.24$ and $F_i(1, 98) = 19.43$, and pseudohomophone versus legal nonword foils (159 ms - 76 ms = 83 ms), $F_s(1, 38) = 20.93$ and $F_i(1, 98) = 48.84$. Finally, there was a significant frequency effect in the illegal nonword foil condition (578 ms - 542 ms = 36 ms), $t_s(19) = 7.44$ and $t_i(98) = 5.45$.

The error scores followed a compatible pattern. There were very few errors (0.6% overall) to high-frequency words, and their rate was not affected by nonword lexicality. There were, however, significantly more errors to low-frequency words than to high-frequency words (by the conservative standard of confidence intervals) in both the pseudohomophone (8.8% vs. 0.7%) and legal nonword (3.5% vs. 0.3%) conditions.
By the liberal criterion of Fisher's least significant difference (LSD) test, there was no significant difference in errors to low- and high-frequency words in the illegal nonword condition (1.9% vs. 0.7%). Furthermore, there were significantly more errors (based on confidence intervals) to low-frequency words in the pseudohomophone condition (8.8%) than in the legal nonword condition (3.5%) but only a marginally significant difference (p < .10 by LSD analysis) between errors to low-frequency words in the legal (3.5%) and illegal (1.9%) nonword conditions.

Table 2
Mean Reaction Times and Error Rates in Experiment 1

                                Nonword foil lexicality
                        Illegal          Legal            Pseudohomophone
Frequency               M     % Error    M     % Error    M     % Error
Word trials
  Low                   578   1.9        697   3.5        867   8.8
  High                  542   0.7        621   0.3        707   0.7
  Frequency advantage    36   1.2         76   3.2        159   8.1
Nonword trials
  Nonword lexicality    595   1.3        728   2.3        959   8.2

Note. Means are for correct responses. All responses between 150 ms and 2,500 ms were used in computing the means.
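The word-trial contrasts reported in the Results can be recomputed from the rounded cell means in Table 2. This sketch is a convenience check of ours. It takes each condition's overall word-trial mean as the average of its two frequency cells, which reproduces the reported 560, 659, and 787 ms. From those cells it derives the overall, high-frequency, and frequency-advantage contrasts.

```python
# Mean correct RTs (ms) to words, from Table 2.
RT = {
    "illegal":         {"low": 578, "high": 542},
    "legal":           {"low": 697, "high": 621},
    "pseudohomophone": {"low": 867, "high": 707},
}


def overall(cond):
    """Overall word-trial mean, taken as the average of the two frequency cells."""
    return (RT[cond]["low"] + RT[cond]["high"]) / 2


def frequency_advantage(cond):
    """Low-frequency RT minus high-frequency RT."""
    return RT[cond]["low"] - RT[cond]["high"]


def contrast(more, less):
    """Word-trial differences between a more and a less lexical foil condition."""
    return {
        "overall": overall(more) - overall(less),
        "high_frequency": RT[more]["high"] - RT[less]["high"],
        "frequency_advantage": frequency_advantage(more) - frequency_advantage(less),
    }


for more, less in (("legal", "illegal"), ("pseudohomophone", "legal")):
    print(more, "vs.", less, contrast(more, less))
```

The rounded cells give a pseudohomophone frequency advantage of 160 ms, and hence an 84-ms increase over the legal condition. The text and table report 159 ms and 83 ms. The 1-ms gap presumably reflects rounding of the cell means.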

Discussion

These results demonstrate that the lexicality of nonword foils had a major impact on correct RTs to word stimuli and on the advantage of high-frequency words over low-frequency words. This was true both when the contrast was between illegal and legal nonword foils and when the contrast was between pseudohomophone and legal nonword foils. Furthermore, nonword lexicality had a significant impact on correct RTs to high-frequency words.

This pattern of results is consistent with C. T. James's (1975) finding of longer correct RTs to words given pseudohomophone foils than given legal nonword foils. However, this is not necessarily converging evidence against the studies in which no effect was found (Davelaar et al., 1978; McQuade, 1981). Recall that differential use of Pathway P depends on the balance between the benefit of using Pathway P on word trials and its cost on pseudohomophone trials. Thus, if the stimuli of Davelaar et al. (1978), McQuade (1981), and Andrews (1982, given that her results were due to a speed-accuracy trade-off) produced a greater benefit from using Pathway P on nonpseudohomophone trials than the cost on pseudohomophone trials, then their results are consistent with the canonical model. This is particularly likely for Andrews and McQuade, because only one-fourth of their stimuli were pseudohomophones. Davelaar et al.'s subjects always received legal nonword foils in the first half of the experiment and pseudohomophones in the second half. This may have biased them toward continued use of Pathway P (cf. Kroll & Merves, 1986).

Discussion thus far has assumed that Andrews's (1982) finding of faster RTs to words when pseudohomophones were included than when they were excluded can be attributed to the observed speed-accuracy trade-off. What if this result cannot be fully attributed to speed-accuracy trade-off? In this case, an SPS model has two choices.
First, it could add a pathway that was used in Andrews's pseudohomophone-present condition but not in her pseudohomophone-absent condition; second, it could take the awkward position that selection of Pathway P is sometimes probabilistic and sometimes is not. We do not consider further the apparently conflicting results for several reasons. First, we have observed the present results using five stimulus sets. Thus, there exists a strategy for which the present results obtain. Second, as noted earlier, reasonable analyses of the previous studies are possible that are consistent with the canonical SPS model. Third, the previous studies are all suggestive. Further work is warranted before the canonical SPS model is complicated on such grounds. Our position is that strategic control plays a ubiquitous role in word recognition. Given this, the conservative position for us is to appeal to a strategy difference only when it is unquestionably required. If the Andrews (1982) findings reliably resulted from another processing strategy, our basic position that strategies play a critical part in word recognition is only strengthened.

The results of Experiment 1 suggest that use of Pathway P can be controlled strategically. This finding raises serious problems for probabilistic pathway selection. In this approach, use of Pathway P in the legal nonword foil condition but not the pseudohomophone foil condition can benefit the legal nonword foil condition only if Pathway P is faster than Pathway D. Because mean finishing times given probabilistic pathway selection are the weighted sum of finishing times for the separate pathways, the results of Experiment 1 suggest that the mean finishing time for Pathway P is at least 86 ms shorter than that for Pathway D for high-frequency words.⁸ To our knowledge, no one has proposed that phonologically mediated lexical access is significantly faster than direct lexical access for high-frequency words within a pathway selection framework.

On the other hand, the multipathway approach correctly predicted the observed pattern of results, given the assumptions that Pathway P is about as fast as Pathway D and that frequency effects occur in Pathway D before its merger with Pathway P. The only possible point of concern is that effects were larger in the pseudohomophone versus legal contrast than in the legal versus illegal contrast. All else equal, this would suggest Pathway P is faster relative to Pathway D than Pathway N is relative to the lexical pathways. If the same output from assembled phonologic processing is used for lexical access along Pathway P and directly for a decision along Pathway N, this appears paradoxical. There are three possible resolutions of this paradox. First, recall that an integrative process will produce larger strategy effects than a race-horse process, all else equal. The larger effects obtained from adding Pathway P than from adding Pathway N could reflect merger of Pathways P and D in lexical processing through an integrative process but a race-horse selection between Pathway N and the merged lexical pathways.
This solution is principled and has considerable conceptual appeal. Second, use of Pathway N by the decision mechanism may be slowed by the novelty of using such a strategy (it is difficult to think of an equivalent, ecologically valid task). In this case, the output of assembled phonologic processing still occurs soon enough to produce the large benefit obtained in the pseudohomophone versus legal nonword foil contrast. However, this account is somewhat ad hoc. Third, Pathway N may reflect the use of a nonlexical module other than assembled phonologic processing. In this case, the finishing times of Pathways P and N are conceptually uncoupled. This is the least desirable of the three possible solutions, because it requires addition of a third module to the canonical model. On the other hand, modules that could serve in this role are found in many comprehensive models, so this account is possible without complicating those comprehensive models.

However, not all configurations of assumptions within the multipathway selection approach are viable. The results of Experiment 1 are inconsistent with the assumption that frequency affects a moderate-sense (or strong-sense) postlexical module, with Pathways P and D merging in the preceding lexical module. This assumption led us to predict that the frequency effect would be equivalent in the legal nonword and pseudohomophone foil conditions. In fact, the frequency effect in the pseudohomophone foil condition was more than double that in the legal nonword foil condition.

⁸ Semiformal proof: Let $T_D$ be the mean finishing time for high-frequency words along Pathway D and $T_P$ be that for Pathway P. Also let $p_{ph}$ be the probability of using Pathway P in the pseudohomophone foil condition and $p_l$ be the probability in the legal nonword foil condition. As noted earlier, in probabilistic selection, mean total finishing time is the probability-weighted sum of the mean finishing times for the pathways used. Assuming that nonword foil lexicality has no other effect, the 86-ms slowing of reaction time given pseudohomophone versus legal nonword foils implies

$$[p_{ph} T_P + (1 - p_{ph}) T_D] - [p_l T_P + (1 - p_l) T_D] = 86,$$

where the left bracket is the finishing time given pseudohomophone foils and the right bracket is the finishing time given legal nonword foils. Rearranging terms gives $(p_{ph} - p_l)(T_P - T_D) = 86$. Given that $p_{ph} < p_l$, $T_P$ must be less than $T_D$ (i.e., Pathway P must be faster than Pathway D). Rearranging further, $T_D - T_P = 86/(p_l - p_{ph})$. The minimum value for the difference $T_D - T_P$ occurs when the denominator on the right is 1 (i.e., $p_l = 1$ and $p_{ph} = 0$), giving a difference of 86 ms. Any other choice of probabilities gives a denominator less than 1 and thus a difference greater than 86 ms.
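The arithmetic of Footnote 8 can be checked numerically. The sketch below is ours, not the authors': like the footnote, it assumes that mean finishing time under probabilistic selection is the probability-weighted sum of the pathway means and that foil lexicality has no other effect. The function names and the example probabilities and finishing times are hypothetical.

```python
def mean_finishing_time(p_phon: float, t_p: float, t_d: float) -> float:
    """Mean finishing time when Pathway P is selected with probability p_phon
    and Pathway D otherwise (probability-weighted sum of pathway means)."""
    return p_phon * t_p + (1.0 - p_phon) * t_d


def implied_speed_advantage(cost_ms: float, p_ph: float, p_l: float) -> float:
    """T_D - T_P implied by (p_l - p_ph)(T_D - T_P) = cost_ms.

    p_ph and p_l are the probabilities of using Pathway P given
    pseudohomophone and legal nonword foils, respectively.
    """
    if not 0.0 <= p_ph < p_l <= 1.0:
        raise ValueError("need 0 <= p_ph < p_l <= 1")
    return cost_ms / (p_l - p_ph)


if __name__ == "__main__":
    # Extreme case: P always used with legal foils, never with pseudohomophones.
    print(implied_speed_advantage(86.0, 0.0, 1.0))    # 86.0
    # Less extreme (hypothetical) probabilities imply a larger advantage.
    print(implied_speed_advantage(86.0, 0.25, 0.75))  # 172.0
    # Consistency check with hypothetical T_D = 600 ms and T_P = 600 - 172 = 428 ms.
    print(mean_finishing_time(0.25, 428.0, 600.0)
          - mean_finishing_time(0.75, 428.0, 600.0))  # 86.0
```

Only the extreme selection probabilities yield the 86-ms minimum; any pair that is less than all-or-none forces Pathway P to be even faster relative to Pathway D.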

Experiment 2

In Experiment 2, we further explored the implications of the conceptual basis of the canonical pathway selection model, which led to the successful account of nonword foil lexicality and frequency effects in Experiment 1. Recall that differential use of Pathway P in Experiment 1 depended on the balance between the cost of not using Pathway P for word trials and the benefit of not using it for pseudohomophone trials. In Experiment 2, word stimuli were irregular and inconsistent. In this case, Pathway P would be counterproductive for word trials. This suggests two possible scenarios for Experiment 2, depending again on relative costs and benefits for word and nonword trials. Given irregular, inconsistent words, we predicted that Pathway P would be counterproductive on word trials, regardless of the nonword foils: assembled phonologic processing of irregular, inconsistent words would provide evidence against a word response. In the pseudohomophone foil condition, processing of all stimuli (word and nonword) would be hurt by use of Pathway P. Thus, Pathway P was not expected to be used in the pseudohomophone foil condition. In the legal nonword foil condition, use of Pathway P would still benefit performance on nonword trials, because it would provide evidence for a nonword response. Unlike in Experiment 1, differential use of Pathway P in Experiment 2 would depend on the balance between the benefit of not using Pathway P for word trials and the cost of not using it for legal nonword trials. If Pathway P is not used in either nonword foil condition, the canonical model would predict no difference on word trials between the pseudohomophone and legal nonword foil conditions. If Pathway P is used in the legal nonword foil condition but not in the pseudohomophone foil condition, word trial performance would be better in the pseudohomophone foil condition than in the legal nonword foil condition.


In other words, the canonical model predicted for Experiment 2 that given irregular, inconsistent word stimuli, the effect of nonword foil lexicality found in Experiment 1 would either disappear or reverse. Experiment 2 also differed from Experiment 1 in that the legal nonwords in Experiment 2 were spelling controls for the pseudohomophones. This control was not included in Experiment 1 because an account of Experiment 1 in terms of pathway selection based purely on orthographic differences between the pseudohomophones and the legal nonwords would have been no more parsimonious than that of the canonical model. It would have required at least three canonical pathways. The use of phonology per se was not a critical issue in Experiment 1. However, the logic of Experiment 2 relied critically on differential use of a phonologic pathway in the pseudohomophone versus legal nonword foil contrast. Assume that the effect of pseudohomophone versus legal nonword foils in Experiment 1 was due to greater orthographic familiarity for pseudohomophones than for legal nonwords, so that an orthographic familiarity assessment pathway (cf. Besner & Johnston, 1989) was used given legal nonword foils but not pseudohomophone foils. In this case, eliminating an assembled, phonologically mediated lexical pathway by using irregular, inconsistent words would not change the logic of prediction from Experiment 1, and any pattern of nonword foil lexicality effects could be accounted for using three canonical pathways. With the orthographic properties of the pseudohomophones and legal nonwords equated, replication of Experiment 1 in Experiment 2 would suggest the addition of a new pathway to the canonical model. 
The pseudohomophones and legal nonwords were equated on two orthographic measures: orthographic similarity to a particular word (using the orthographic similarity measure proposed by Van Orden, 1987, which was derived from the graphemic similarity measure of Weber, 1970) and mean bigram frequency as an indicator of similarity to words generally (Massaro, Taylor, Venezky, Jastrzembski, & Lucas, 1980).
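Mean bigram frequency, the second matching measure, can be illustrated with a simplified sketch. It is an assumption-laden simplification rather than the published norms cited above, which may be tabulated by letter position and word length: this version pools adjacent letter pairs across positions, weights them by word frequency, log10-transforms the counts, and uses a hypothetical three-word corpus. Van Orden's (1987) orthographic similarity measure is not reproduced here.

```python
import math
from collections import Counter


def bigram_counts(corpus: dict[str, int]) -> Counter:
    """Token-weighted counts of adjacent letter pairs, ignoring position in the word."""
    counts: Counter = Counter()
    for word, freq in corpus.items():
        w = word.upper()
        for i in range(len(w) - 1):
            counts[w[i:i + 2]] += freq
    return counts


def mean_log_bigram_frequency(item: str, counts: Counter) -> float:
    """Mean log10 bigram count over an item's adjacent letter pairs.

    Unseen bigrams are floored at a count of 1 (log10 = 0); this floor is an
    assumption made here only to avoid log(0).
    """
    s = item.upper()
    pairs = [s[i:i + 2] for i in range(len(s) - 1)]
    if not pairs:
        raise ValueError("item must have at least two letters")
    return sum(math.log10(max(counts[p], 1)) for p in pairs) / len(pairs)


# Hypothetical toy corpus (word -> frequency count); not the norms used in the study.
TOY_CORPUS = {"BEST": 100, "BEEF": 10, "DEER": 10}

if __name__ == "__main__":
    counts = bigram_counts(TOY_CORPUS)
    for item in ("BEEST", "DEEST", "BTESE"):
        print(item, round(mean_log_bigram_frequency(item, counts), 3))
```

On this toy corpus the legal strings BEEST and DEEST score well above the illegal BTESE, which is the sense in which the measure indexes similarity to words in general.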

Method

Subjects. Sixty new subjects were selected by the same means as used in Experiment 1.

Apparatus. There was no change from Experiment 1.

Stimuli. A pool of 166 candidate irregular words was taken from three sources (Bub, Cancelliere, & Kertesz, 1985; V. Coltheart, Laxon, Rickard, & Elton, 1988; Saffran, 1985). A set of 60 experimental words was selected from this pool, subject to the following constraints. First, on the basis of the frequency norms of Kucera and Francis (1967), half the words (high frequency) had a count of 70 or greater (M = 469) and half (low frequency) had a count of 14 or less (M = 3.5). Second, the two frequency sets were equated for length in letters (4.77 for high-frequency words and 4.80 for low-frequency words), length in syllables (1.2 for both sets), and part of speech, as determined by the first entry in the American Heritage Dictionary (Morris, 1976). These stimuli were also predominantly inconsistent: 73% inconsistent, with 17% hermits (i.e., words with no phonologic neighbors), for the high-frequency words and 70% inconsistent, with 17% hermits, for the low-frequency words. These word stimuli were used for all subjects in all conditions.


We generated a set of 60 nonword foils for each nonword foil lexicality condition using the following procedure. First, pseudohomophones of words not used in the experimental set were generated using the same procedure as in Experiment 1. Next, legal nonwords were generated from these pseudohomophones by changing a single letter of the source word, so that the mean bigram frequency and orthographic similarity to the source word were equated to those for the pseudohomophones. As a result, mean bigram frequencies were slightly larger for legal nonwords (2.699, SD = 0.476) than for pseudohomophones (2.633, SD = 0.507). Orthographic similarity to the source word was also slightly greater for legal nonwords (0.7692, SD = 0.0526) than for pseudohomophones (0.7685, SD = 0.0573). The word stimuli and the nonword foils are presented in the Appendix. Forty-four practice stimuli were also generated (half words and half nonwords). The practice words were taken from the remaining words in the initial candidate set of irregular words. The nonword foils were constructed as just described for the three nonword foil lexicality conditions. Neither the practice words nor the words from which the practice nonword foils were generated appeared among the experimental word stimuli or among the source words for the experimental nonwords.

Procedure. The procedure was identical to that in Experiment 1, except that the number of trials was smaller.

Design. The design was identical to that of Experiment 1.

Results

Mean correct RTs and errors for all conditions (including nonword foils) are given in Table 3. The pairwise test for an overall nonword foil lexicality effect demonstrated slower correct RTs to irregular word targets given pseudohomophone versus legal nonword foils (853 ms − 719 ms = 134 ms), ts(38) = 3.18 and ti(58) = 11.80. This nonword foil lexicality effect was also found for high-frequency words alone (771 ms − 666 ms = 105 ms), ts(38) = 2.95 and ti(29) = 7.61. The frequency advantage was significantly larger given pseudohomophone versus legal nonword foils (163 ms − 106 ms = 57 ms), Fs(1, 38) = 5.91 and Fi(1, 58) = 9.29. The error scores followed a compatible pattern. There were very few errors (1.8% overall) to high-frequency words, and their rate was not affected by nonword foil lexicality. There were, however, significantly more errors to low-frequency words than to high-frequency words (by the conservative standard of confidence intervals) in both the pseudohomophone foil (16.3% vs. 2.2%) and the legal nonword foil (11.8% vs. 1.3%) conditions. Furthermore, there were significantly more errors (based on confidence intervals) to low-frequency words in the pseudohomophone foil condition (16.3%) than in the legal nonword foil condition (11.8%).

Table 3
Mean Reaction Times and Error Rates in Experiment 2

| Trials | Legal nonword foils: M (ms) | Legal nonword foils: % error | Pseudohomophone foils: M (ms) | Pseudohomophone foils: % error |
|---|---|---|---|---|
| Words, low frequency | 772 | 11.8 | 935 | 16.3 |
| Words, high frequency | 666 | 1.3 | 771 | 2.2 |
| Words, frequency advantage | 106 | 10.5 | 163 | 14.1 |
| Nonwords | 867 | 8.7 | 1,023 | 12.5 |

Note. Means are for correct responses. All responses between 150 ms and 2,500 ms were used in computing the means.

Discussion

Experiment 2 demonstrated a difference in strategy given legal, spelling-control nonword versus pseudohomophone foils, even when the word stimuli were irregular and inconsistent. The use of spelling controls as legal nonwords bolsters the argument that the pseudohomophones' phonologic equivalence to a true word is responsible for this shift in strategy. These results are extremely difficult to reconcile with the conceptual basis of the canonical model. If Pathway P was not used in any nonword foil condition, there should have been no difference in performance for word trials between the pseudohomophone and legal nonword foil conditions. If the benefit of using Pathway P for legal nonwords outweighed the cost of using it on irregular word trials, either mean RTs or errors for word trials should have been greater in the legal nonword foil condition than in the pseudohomophone foil condition. This was not the case. Because the pseudohomophones and legal nonwords were equated for orthographic factors, it is difficult to argue that Pathway P, as operationally defined, did not depend on phonology and so was not influenced by the use of irregular, inconsistent word stimuli. Still, it is possible to avoid addition of a new pathway to the canonical model by reconceptualizing the operation of assembled phonologic processing in a way that makes Pathway P viable for irregular words but not for pseudohomophones. Two new assumptions make this possible. First, multiple, alternative phonologic representations might be generated by assembled phonologic processing.
If rules of varying strengths (Rosson, 1985) reflect not only the more common spelling-sound correspondences but also the less common (i.e., irregular) correspondences for the same orthographic combinations, assembled phonologic processing might not choose between these alternatives but rather pass them on to lexical processing for resolution. Second, assume that input to a semantic lexicon from assembled phonologic processing can only facilitate semantic lexical access: if any alternative phonologic representation supports a word interpretation, that supporting representation facilitates a "word" response. In this case, use of Pathway P given a multipathway selection approach would facilitate recognition of irregular words. However, Pathway P would still be counterproductive given pseudohomophones, which generate a phonologic representation consistent with a word interpretation. Thus, Pathway P could still be used differentially with legal nonword versus pseudohomophone foils when word stimuli were irregular and inconsistent. In sum, if the operation of assembled phonologic processing is reconceptualized, the results of Experiment 2 can be accommodated by the canonical model. Although a similar process has been proposed by Brown (1987), this approach differs significantly from standard assumptions about phonologic processing. It requires more than an admission that rules may have varying strengths (Rosson, 1985). One appeal of rule-based processing was that the limitations of stimulus-specific association learning could be avoided (Chomsky, 1959). The existence of exception words (i.e., words whose pronunciation does not follow rules) required at least some stimulus-specific processing. If rule-governed phonologic processing serves only to generate multiple possibilities, which are resolved using stimulus-specific processes, the importance of rule-based processing in word recognition is greatly diminished (see also Van Orden, Pennington, & Stone, 1990). This interpretation is not forced by Experiment 2. Rather, it is the most parsimonious account in terms of pathway selection, in that it requires no addition of pathways to the canonical SPS model of Figure 1. An alternative interpretation may be possible. However, it is difficult for us to imagine an alternative conceptual rationale for a phonologically mediated pathway that is viable for irregular, inconsistent words but not for pseudohomophones.
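The two new assumptions (multiple phonologic alternatives, positive-only lexical evidence) can be made concrete in a toy sketch. Everything in it is hypothetical and ours: the onset and body rules, their strengths, the ad hoc ASCII phonetic code, and the three-entry phonologic lexicon were chosen only to show the logic, not to model the authors' stimuli or any published rule set.

```python
VOWELS = set("AEIOU")

# Hypothetical letter-sound correspondences in an ad hoc ASCII code
# ("eI" = vowel of GAVE, "&" = vowel of HAVE, "i" = vowel of BEAST).
ONSETS = {"B": "b", "D": "d", "G": "g", "H": "h"}

# Hypothetical body rules of varying strength: the regular reading is strongest,
# but the irregular reading is also generated and passed on.
BODIES = {
    "AVE": [("eIv", 0.8), ("&v", 0.2)],
    "EEST": [("ist", 1.0)],
}

# Toy phonologic lexicon: pronunciation -> word.
PHON_LEXICON = {"h&v": "HAVE", "geIv": "GAVE", "bist": "BEAST"}


def assemble(letters: str) -> list[tuple[str, float]]:
    """Return every candidate pronunciation with its rule strength."""
    i = 0
    while i < len(letters) and letters[i] not in VOWELS:
        i += 1
    onset, body = letters[:i], letters[i:]
    onset_phon = "".join(ONSETS[c] for c in onset)
    return [(onset_phon + b, s) for b, s in BODIES[body]]


def pathway_p_support(letters: str, open_form: bool = True) -> float:
    """Positive-only lexical evidence from assembled phonology.

    Returns the strength of the best candidate that matches a lexical entry,
    or 0.0 if none does; a mismatch never counts as evidence against "word".
    With open_form=False only the single strongest candidate is passed on.
    """
    candidates = assemble(letters)
    if not open_form:
        candidates = [max(candidates, key=lambda c: c[1])]
    return max((s for phon, s in candidates if phon in PHON_LEXICON), default=0.0)


if __name__ == "__main__":
    for item in ("GAVE", "HAVE", "BEEST", "DEEST"):
        print(item, pathway_p_support(item), pathway_p_support(item, open_form=False))
```

Under these assumptions the irregular word HAVE still receives some support (0.2) through Pathway P, and its mismatching regular reading never counts against it; the pseudohomophone BEEST receives full word-like support, which is what makes Pathway P costly given pseudohomophone foils; and the legal nonword DEEST receives none. With `open_form=False`, a single committed output, HAVE receives no support at all. The sketch raises `KeyError` for spellings outside its two toy bodies.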

Experiment 3

Thus far, the canonical pathway selection model was tested with a single ideal strategy manipulation: nonword foil lexicality. The canonical model accounted for the results of the first two experiments, given that assembled phonologic processing generated multiple phonologic representations and lexical processing (in the lexical decision task) was affected only by positive evidence. In Experiment 3, we manipulated nonword foil lexicality with blocked frequency (i.e., subjects saw only high-frequency words or only low-frequency words). Blocked versus mixed word frequency is an ideal strategy manipulation, as defined earlier, because a given word trial is identical in the mixed versus blocked frequency conditions; only the composition of the overall stimulus set differs. Frequency blocking has several advantages as a second factor from our perspective. First, previous results with this manipulation have a clear pathway selection interpretation. Glanzer and Ehrenreich (1979) found that correct RTs to high-frequency words were significantly faster when they appeared alone (blocked frequency) than when they were mixed with low-frequency words (mixed frequency). However, correct RTs to low-frequency words did not differ significantly between presentation alone (blocked frequency) and presentation mixed with high-frequency words (mixed frequency). The SPS hypothesis requires a clear-cut, counterintuitive interpretation of these results: that there are two lexicons, one containing all words and one containing only high-frequency words. This is, in fact, the interpretation advocated by Glanzer and Ehrenreich (1979). Although Glanzer and Ehrenreich (1979) proposed a detailed model of their results in terms of probabilistic pathway selection, our discussion is in terms of a broader class of viable "two-dictionary" models. On the one hand, their results are compatible with a multipathway approach that includes a high-frequency lexicon. On the other hand, as we have seen, probabilistic pathway selection is hard-pressed to account for the results of the nonword foil lexicality manipulation. Of course, selection of the high-frequency versus the full lexicon could be probabilistic, whereas other pathways are selected according to a multipathway selection process. But this unparsimonious solution is not necessary. The second advantage of the frequency blocking manipulation, from our perspective, is that the high-frequency lexicon is not generally included in comprehensive models of word identification, possibly because it does not seem to provide explanatory power for any empirical findings besides those from the frequency blocking manipulation. In Experiment 3, we used the same word stimuli as in Experiment 1, except that some subjects received only the high-frequency words and others received only the low-frequency words. Half the nonword foils from Experiment 1 were selected at random for use in Experiment 3. Two fundamental predictions for the crossing of frequency blocking and nonword foil lexicality were derived from the modified canonical model (Figure 3), which adds a high-frequency lexicon (Glanzer & Ehrenreich, 1979) to the canonical model (Figure 1). First, correct RTs to low-frequency words were not expected to differ in the blocked- and mixed-frequency conditions, because the same pathways (D, P, or N, depending on nonword foil lexicality) would be used in both conditions. Second, we predicted there would be a significant overadditive interaction of frequency and nonword foil lexicality in the blocked-frequency condition. The first observation that led to this prediction followed from the preceding point.
Any change in the Frequency × Nonword Foil Lexicality interaction as a result of frequency blocking must result purely from a change for high-frequency words. The only strategy available given blocking would be differential use of Pathway DH. Pathway DH can speed correct RTs to high-frequency words in the blocked-frequency condition only. Furthermore, the high-frequency lexicon would not be affected by use of assembled phonology (consistent both with the general lack of phonology effects for high-frequency words and with parsimony). Failure to use assembled phonology in the pseudohomophone foil condition would therefore not alter finishing times for the high-frequency lexicon. As a result, two of the three pathways contributing to correct RTs for high-frequency words (Pathways D and DH) would be unaffected by the manipulation, as opposed to one of the two pathways contributing to correct RTs for low-frequency words (Pathway D). Thus, high-frequency words would also be hurt less than low-frequency words by the use of pseudohomophone foils. On the basis of this analysis, we predicted that the overadditive interaction of frequency and nonword foil lexicality would be larger in Experiment 3 (blocked frequency) than it was in Experiment 1 (mixed frequency).

Figure 3. The canonical strong pathway selection model from Figure 1 with the high-frequency lexicon of Glanzer and Ehrenreich (1979) added. (Pathway D is the direct lexical pathway. Pathway P is the phonologically mediated lexical pathway. Pathway N is the nonlexical pathway. H indicates high frequency.)

Table 4
Reaction Times and Error Rates in Experiment 3

| Trials | Legal: M (ms) | Legal: % error | Pseudohomophone: M (ms) | Pseudohomophone: % error | Pseudohomophone cost: M (ms) | Pseudohomophone cost: % error |
|---|---|---|---|---|---|---|
| Words, low frequency | 685 | 3.5 | 746 | 5.8 | | |
| Words, high frequency | 560 | 1.2 | 669 | 2.3 | | |
| Words, frequency advantage | 125 | 2.4 | 77 | 3.5 | | |
| Nonwords, low-frequency condition | 796 | 8.4 | 871 | 12.9 | 73 | 4.5 |
| Nonwords, high-frequency condition | 607 | 2.8 | 812 | 8.4 | 206 | 5.5 |

Note. Mean reaction times are for correct responses. All responses between 150 ms and 2,500 ms were used in computing the means.
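The pathway-counting argument behind the Experiment 3 prediction can be illustrated with a race-horse sketch. This is our illustration, not Glanzer and Ehrenreich's (1979) model or the authors' canonical model: we assume independent pathways whose finishing times are a shared base time plus an exponential component, so that the winner's mean has a closed form (the minimum of independent exponentials is exponential with the summed rate). All parameter values and names are hypothetical.

```python
from collections.abc import Iterable

BASE_MS = 400.0  # hypothetical shared perceptual/motor time

# Hypothetical mean exponential finishing-time components per pathway (ms).
MEANS = {
    "low": {"D": 300.0, "P": 300.0},
    "high": {"D": 200.0, "DH": 200.0, "P": 300.0},
}


def race_mean_rt(means: Iterable[float]) -> float:
    """Mean RT of a race among independent exponential pathways."""
    return BASE_MS + 1.0 / sum(1.0 / m for m in means)


def predicted_rt(freq: str, foil: str, blocked: bool) -> float:
    if foil not in ("legal", "pseudohomophone"):
        raise ValueError("foil must be 'legal' or 'pseudohomophone'")
    pathways = dict(MEANS[freq])
    if not blocked:
        pathways.pop("DH", None)  # high-frequency lexicon used only in blocked lists
    if foil == "pseudohomophone":
        pathways.pop("P")  # assembled phonology abandoned given pseudohomophone foils
    return race_mean_rt(pathways.values())


def interaction(blocked: bool) -> float:
    """Frequency effect with pseudohomophone foils minus that with legal foils."""
    effect = {f: predicted_rt("low", f, blocked) - predicted_rt("high", f, blocked)
              for f in ("legal", "pseudohomophone")}
    return effect["pseudohomophone"] - effect["legal"]


if __name__ == "__main__":
    print("mixed:", round(interaction(blocked=False), 1))
    print("blocked:", round(interaction(blocked=True), 1))
```

With these values the overadditive interaction is about 70 ms under mixed presentation and about 125 ms under blocked presentation, because dropping Pathway P costs little when Pathways D and DH both remain in the race. That is the direction the modified canonical model predicted; Experiment 3 did not bear it out.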

Method

Subjects. Sixty-four new subjects were selected by the same method used in Experiments 1 and 2.

Apparatus. The apparatus was the same as that used in Experiments 1 and 2.

Stimuli. The word stimuli were those used in Experiment 1. Because of the frequency blocking manipulation, half the nonwords from Experiment 1 were selected at random. Finally, a new set of practice words was generated for each frequency condition, such that only high-frequency words were used for practice in the high-frequency condition and only low-frequency words were used for practice in the low-frequency condition. The practice nonwords were unchanged from Experiment 1.

Procedure. The procedure was unchanged from Experiment 1, except that the number of trials was smaller.

Design. There were two factors: word target frequency (high-frequency words only vs. low-frequency words only) and nonword foil lexicality (pseudohomophones only vs. legal nonwords only). Both were between-subjects factors (e.g., a subject might receive only high-frequency words and only pseudohomophone foils). Word frequency was a between-items manipulation; nonword foil lexicality was a within-item manipulation.

Results

Mean correct RTs and errors to words and nonwords are presented in Table 4. Comparisons with Experiment 1 are summarized in Table 5. Unless otherwise noted, all analyses were planned. The alpha level for all tests was set at .05. All tests were performed with both subjects and items as random factors. For the comparisons with Experiment 1, frequency in Experiment 1 was treated as a between-subjects variable. Not surprisingly, correct RTs were significantly faster for high-frequency than for low-frequency words (614 ms vs. 716 ms), Fs(1, 60) = 19.46 and Fi(1, 98) = 111.78. Furthermore, correct RTs to target words were significantly faster with legal nonword foils than with pseudohomophone foils (622 ms vs. 708 ms), Fs(1, 60) = 13.84 and Fi(1, 98) = 273.37. The test for an interaction between word frequency and nonword foil lexicality produced mixed results. The frequency advantage was greater in the legal nonword foil condition (125 ms) than in the pseudohomophone foil condition (77 ms). This trend was not significant in the subjects analysis, Fs(1, 60) = 1.10, p > .10, but it was significant in the items analysis, Fi(1, 98) = 38.63. The test for an interaction between nonword foil lexicality and frequency blocking for correct RTs to low-frequency words indicated that the 121-ms blocking advantage with pseudohomophone foils was significantly larger than the 12-ms blocking "advantage" with legal nonword foils, Fs(1, 68) = 4.24 and Fi(1, 49) = 67.78. This result violates the prediction of any theory that argues that low-frequency words are always processed the same way whether blocked or mixed with high-frequency words. A post hoc analysis of the 121-ms blocking advantage for low-frequency words with pseudohomophone foils was significant by the conservative standard of confidence intervals, both for subjects and for items. Finally, the overadditive interaction between frequency and nonword foil lexicality in Experiment 1 (mixed frequency) and the apparently underadditive interaction between the two in Experiment 3 (blocked frequency) were marginally different for subjects, Fs(1, 136) = 3.60, p < .10, but significantly different for items, Fi(1, 98) = 81.41.

Table 5
Comparison of Blocking Benefit in Experiments 1 and 3

| Word frequency | Legal: mean RT (ms) | Legal: % error | Pseudohomophone: mean RT (ms) | Pseudohomophone: % error |
|---|---|---|---|---|
| Low | 12 | 0.0 | 121 | 3.0 |
| High | 61 | −0.9 | 38 | −1.6 |

Note. The blocking benefit is the performance measure (reaction time or error) for mixed-frequency presentation minus the performance measure for blocked-frequency presentation.

Analysis of the error data was consistent with this pattern, with greater error in conditions with longer correct RTs. In the cross-experiment comparison, however, there was a potentially interesting inconsistency between error and RT. The error rate to high-frequency words appeared larger in the blocked-frequency than in the mixed-frequency experiment, whereas the error rate to low-frequency words appeared smaller in the blocked-frequency than in the mixed-frequency experiment. The test of the three-way interaction among frequency, nonword foil lexicality, and list type (blocked vs. mixed frequency) was significant, Fs(1, 136) = 4.47 and Fi(1, 98) = 5.79. This interaction was explored using post hoc LSD tests of the blocking effect for each Frequency × Nonword Foil Lexicality condition. For low-frequency words, error rates with legal nonword foils were equal in the mixed- and blocked-frequency conditions (3.5% in each). With pseudohomophone foils, the 3.0% error benefit from blocking was significant for subjects but not for items (p > .10). For high-frequency words with legal nonword foils, the 0.9% error cost from blocking was significant for subjects but only marginally significant (p < .10) for items.
Finally, for high-frequency words with pseudohomophone foils, the 1.6% error cost from blocking was significant for subjects and for items. Because these effects were essentially main effects across experiments, they should be taken not as definitive but as a suggestive pattern that warrants further investigation.
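Because the Frequency × Nonword Foil Lexicality interaction is a difference of differences, its sign can be read directly from the word-trial cell means in Table 4. The short sketch below does only that arithmetic; it does not reproduce the by-subjects or by-items F tests, and the function names are ours.

```python
# Mean correct RTs (ms) to word targets in Experiment 3, copied from Table 4.
TABLE_4_WORD_RT = {
    ("legal", "low"): 685,
    ("legal", "high"): 560,
    ("pseudohomophone", "low"): 746,
    ("pseudohomophone", "high"): 669,
}


def frequency_advantage(foil: str) -> int:
    """Low-frequency minus high-frequency mean RT within one foil condition."""
    return TABLE_4_WORD_RT[(foil, "low")] - TABLE_4_WORD_RT[(foil, "high")]


def pseudohomophone_cost(freq: str) -> int:
    """Pseudohomophone-foil minus legal-foil mean RT within one frequency condition."""
    return TABLE_4_WORD_RT[("pseudohomophone", freq)] - TABLE_4_WORD_RT[("legal", freq)]


def interaction_contrast() -> int:
    """Difference of differences: > 0 is overadditive, < 0 is underadditive."""
    return frequency_advantage("pseudohomophone") - frequency_advantage("legal")


if __name__ == "__main__":
    print(frequency_advantage("legal"), frequency_advantage("pseudohomophone"))  # 125 77
    print(pseudohomophone_cost("low"), pseudohomophone_cost("high"))            # 61 109
    print(interaction_contrast())                                                # -48
```

The same −48 ms falls out either way the contrast is taken: on these cell means, high-frequency words paid a larger pseudohomophone cost (109 ms) than low-frequency words (61 ms), which is the underadditive direction.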


Discussion

The two strong predictions of the two-dictionary framework were not supported. There was no overadditive interaction between frequency and nonword foil lexicality. Indeed, by liberal criteria, there was an underadditive interaction (the frequency effect was smaller in the pseudohomophone than in the legal nonword foil condition). The two-dictionary model made the strong prediction of an overadditive interaction (a larger frequency effect in the pseudohomophone than in the legal nonword foil condition). Although a statistical close call in the correct direction could be reconciled with a two-lexicon approach, a close call in the wrong direction must be seen as a clear falsification of the two-lexicon model's predictions.

The analyses comparing Experiment 1 (mixed frequency) and Experiment 3 (blocked frequency) shed light on this failure. A key to the overadditive prediction was that low-frequency words should always be processed the same way in blocked- and mixed-frequency conditions. But this assumption was contradicted. The blocking benefit for high-frequency words was not significantly different given pseudohomophone versus legal nonword foils (38 vs. 61 ms, p > .10 by t test). However, there was a sizable blocking advantage (121 ms) for low-frequency words given pseudohomophone foils but no such advantage given legal nonword foils (12 ms). This blocking advantage for low-frequency words contradicts a fundamental prediction of the two-dictionary model (and of all other existing accounts of frequency blocking, e.g., Gordon, 1983): that low-frequency words are always processed the same way across blocking conditions. The most straightforward way to approach these results within the framework of the canonical model is to add a lexical module containing only low-frequency words to the canonical model of Figure 3 (see Figure 4). However, the results of Experiment 3 place several strong constraints on such a modification. Contrary to Figure 3, there must be phonologically mediated access to the high-frequency lexicon (Pathway PH). This assumption is important for several reasons. First, to maintain an equivalent (or possibly reduced) high-frequency blocking advantage given pseudohomophone versus legal nonword foils, both lexicons must be slowed by loss of phonologic input. Second, the apparent underadditive interaction between frequency and nonword foil lexicality in Experiment 3 can be explained by the loss of two pathways on high-frequency trials given pseudohomophone foils versus the loss of only one pathway on low-frequency trials. This could reduce the frequency effect in the pseudohomophone foil condition relative to the legal nonword foil condition if the loss of two pathways more than makes up for the larger impact of Pathway P on low-frequency trials.

Figure 4. A further modified canonical strong pathway selection model, which accounts for the results of all three experiments. Low-frequency lexicons have been added to the model in Figure 3. (Pathway D is the direct lexical pathway. Pathway P is the phonologically mediated lexical pathway. Pathway N is the nonlexical pathway. H indicates high frequency. L indicates low frequency.)


In addition, the pathway through the low-frequency lexicon must be extremely slow relative to the combination of Pathways D and P. This assumption is necessary to explain the lack of a blocking advantage for low-frequency words in the legal nonword foil condition. Furthermore, the low-frequency lexicon must not take input from phonologic modules responsible for the pseudohomophone foil effects. In this way, exclusion of Pathway P in the pseudohomophone foil condition brings finishing times for the full lexicon and the low-frequency lexicon close enough together for there to be a blocking advantage for low-frequency words in the pseudohomophone foil condition. There are several troubling aspects to these constraints. First, low-frequency words are generally thought to be more sensitive than high-frequency words to phonology (Doctor & Coltheart, 1980; Jared & Seidenberg, 1991). The preceding account requires that blocked low-frequency words be insensitive to phonology and blocked high-frequency words be sensitive to phonology. Of course, it is possible that sensitivity to phonology is reversed under blocked-frequency conditions, but it is not clear why this should be the case, except to account for the present results. Second, the preceding account can explain a frequency blocking effect for low-frequency words in the pseudohomophone foil condition. However, given that the low-frequency lexicon must be quite slow (to avoid a low-frequency blocking effect in the legal nonword foil condition), it is rather awkward that the addition of such a slow pathway generates so large an effect (121 ms) compared with the blocking effect for high-frequency words (61 ms) given legal nonword foils. Third, introduction of the frequency blocking manipulation doubles the number of required pathways (from three to six). An alternative SPS model avoids the preceding constraints and includes only five pathways.
However, this new model requires an alternative conceptual rationale, less obviously tied to the present manipulations. This rationale draws from differences between normal reading and proofreading (see Figure 5). The standard lexicon is used in normal reading and has a tolerance for nonwords that are perceptually similar to true words. In normal reading, such typographic errors are optimally read as the true word (cf. Forster, 1987). On the other hand, proofreading requires detection of nonwords that are perceptually similar to true words. Thus, there is a proofreading lexicon, slower than the standard lexicon but more sensitive in detecting nonwords that are similar to words. In the pseudohomophone foil condition, the proofreading lexicon is used, because it is adept at discriminating pseudohomophones from words. In the legal nonword foil condition, the standard lexicon can also be used to speed recognition. This eliminates the need to assume that assembled phonology influences lexical decisions, as was assumed by the initial canonical model. Frequency blocking effects are accommodated by low- and high-frequency lexicons, associated with proofreading and normal reading, respectively. In normal reading, failure to find a lexical entry is often compensated for by context. For example, in the sentence "The blebs rose rapidly in the champagne," interpretation of "blebs" is on hold until the subsequent context makes a good guess possible. Slower identification of a low-frequency word by the full lexicon could exploit this on-hold strategy; the dominance of high-frequency words' token counts (Kucera & Francis, 1967) makes this strategy adaptive. On the other hand, in a lexical decision task, early evidence for a "no" response to low-frequency words from a high-frequency lexicon would induce an overall cost (especially because low- and high-frequency words generally occur equally often in lexical decision tasks). Thus, the high-frequency lexicon only shows itself in the blocked high-frequency condition. In proofreading, the relative importance of high- and low-frequency words reverses. Low-frequency words are more likely to be misspelled than high-frequency words. Knowledge of low-frequency words' spellings may be less developed or less available (Van Orden, 1987).
The argument for a slower, but more conservative, low-frequency lexicon in proofreading is similar to the argument for the high-frequency lexicon in normal reading. The disadvantage of using it in mixed-frequency lexical decisions is also similar. Consequently, there would be a frequency blocking advantage for low-frequency words. This advantage occurs only given pseudohomophone foils, because both proofreading modules are slower than the other modules and so have little impact given legal nonword foils. If performance in Healy's (1976) letter detection task uses the proofreading module, this suite of assumptions would also explain why detection of letters is poorer in familiar words (e.g., THE) than in less familiar words.

Figure 5. An alternative canonical strong pathway selection model, which avoids the finishing time constraints of the model in Figure 4.

General Discussion

A canonical pathway selection model accounted for the results of the present experiments. However, changes in the initial canonical model were required. The initial canonical model accounts for the first two experiments if assembled phonologic processing outputs multiple alternatives and lexical processing is affected only by alternatives that benefit processing. This assumption differs from the standard assumption that a module must settle on a single interpretation of a stimulus.


Open-Form Input

The assumption of multiple phonologic processing outputs is a generalization of the weak-sense module. A weak-sense module receives information about possible interpretations of the stimulus as processing proceeds in the source module. Given that weak-sense modules can operate effectively with multiple interpretations early in processing, an input that never settles on a single interpretation is not a large conceptual leap. Closed-form input, which settles on a single interpretation, can be distinguished from open-form input, which settles on a combination of multiple possibilities. Although open-form input weakens the independence of modules, the fundamental qualities of independence (as discussed earlier) are maintained. In fact, open-form input has been used to maintain independent modules when stimuli were, by nature, ambiguous with respect to processing in some posited module (see Small, Cottrell, & Tanenhaus, 1988). Furthermore, the distinction between open- and closed-form input does not depend on the time course of input. As a result, each type of module can take open- or closed-form inputs. For example, a strong-sense module with open-form inputs does not begin processing until all input is available. However, that input includes multiple interpretations from each input source. Commitment to the SPS hypothesis motivated our use of open-form input when open-form input was not required by stimulus ambiguity. Abandoning the strong form of pathway selection might have been easier. However, if use of open-form input given nonambiguous stimuli proves useful elsewhere, restriction of discussion to the SPS hypothesis was worthwhile.

Frequency Blocking

The frequency blocking manipulation required greater change in the canonical model. The high-frequency blocking benefit and the low-frequency blocking benefit (given pseudohomophone foils) added at least two pathways. In the most conceptually straightforward configuration (Figure 4), high- and low-frequency lexicons were added to the original canonical model (Figure 1). However, elaborate assumptions about the time course of processing are required, some of which violate common assumptions about word recognition. These conflicts motivated an alternative canonical model (Figure 5) that abandons the conceptual rationale for the modules of the initial canonical model. In particular, use of pseudohomophone foils is assumed to engage a proofreading module. In contrast, the initial canonical model made the more obvious assumption that use of pseudohomophone foils prevents use of assembled phonological processes. We emphasize that the alternative canonical model does not deny the existence of an assembled phonology module in more detailed models. Rather, it assumes that such a module is not critical for present purposes. The change in assumptions about the effect of pseudohomophone foils raises an important issue. Should theoretical mechanisms directly express the most obvious ways of thinking about phenomena?

Conceptual and Empirical Isomorphism

Psychological theories commonly assume a one-to-one relationship between the most intuitive description of a phenomenon and the explanatory concepts that account for the phenomenon. For example, one can generate rules that describe the relationship between the spelling and the pronunciation of a word (Venezky, 1970; Wijk, 1966). However, many words violate these rules. These irregular words are treated as word-specific special cases. Conceptually, these are two distinct ways of relating spelling to pronunciation. Many theories of phonology in reading reify this conceptual distinction as two independent processing modules: a rule-governed, assembled phonology module and a word-specific, addressed phonology module (cf. Patterson & Coltheart, 1987). William James (1890) recognized this tendency and warned against what he called "the psychologist's fallacy," "the confusion of [the psychologist's] standpoint with that of the mental fact about which he is making his report" (1890, p. 196). We prefer the less judgmental term "first-order conceptual isomorphism"⁹ for the assumption that the most obvious ways of thinking map directly into processing mechanisms. Another common assumption is that different patterns of empirical results imply different processing mechanisms. Lakoff (1987) called this assumption the "effects equal structure fallacy." Again, we prefer a less judgmental term: "first-order empirical isomorphism." Conceptual and empirical isomorphisms are often intimately related. For example, the conceptual isomorphism of rule-governed versus stimulus-specific processing leads to the empirical isomorphism of accounting for the interaction of word frequency and phonologic factors such as consistency (Andrews, 1982) in terms of independent modules. For this reason, empirical and conceptual isomorphism can be thought of as aspects of theoretical isomorphism, rather than as independent ideas.
Our argument is that processing mechanisms based on first-order isomorphisms tend to preserve the complexity of conceptual distinctions and empirical patterns. Detailed pathway selection models are not becoming complex because they use pathway selection per se, but rather because extant pathway selection models use first-order isomorphisms exclusively. The alternative is less direct relationships between theoretical mechanisms and a priori concepts or empirical patterns. Such relationships are higher order isomorphisms. The alternative canonical model of Figure 5, which was based on the distinction between normal reading and proofreading, is a step in this direction.

⁹ The concept of isomorphism, as used here, was inspired by Shephard (1984). However, he discussed a different form of isomorphism: that between representations and the aspects of the world they represent.

Higher order isomorphisms should increase the potential for parsimony. This can be illustrated by analyzing pathway selection without any consideration of the conceptual rationale for the pathways. With P pathways, the maximum number of conditions that can be accounted for (C) is equal to the number of nonempty unordered subsets of the set of pathways (so $C = 2^{P} - 1$). The number of pathways (P) through M modules is equal to the number of nonempty unordered subsets of the set of modules (so $P = 2^{M} - 1$).¹⁰ These calculations lead to the startling conclusion that pathways through four modules could account for 32,767 different experimental conditions: four modules yield $2^{4} - 1 = 15$ pathways, and 15 pathways yield $2^{15} - 1 = 32{,}767$ conditions. The catch is that the job done by each module could be totally opaque conceptually. This is an extreme case of trading conceptual tractability for parsimony. However, it does raise the issue of whether a trade-off between conceptual tractability and parsimony is inescapable. Some constructs require less trade-off than others. The trade-off is not zero sum. The critical question is how we find such constructs. One approach is strong inference: Begin with a conceptually tractable construct and hope it proves parsimonious when tested. We suggest a complementary alternative: pluralistic levels of analysis (Stone, 1988). We begin by arguing that higher order isomorphisms are best identified using lower order isomorphisms as a base. First-order isomorphisms, when done well, provide an excellent description of empirical patterns and the way we think about them. However, they proliferate as the empirical base increases. This tendency is undesirable in explanatory constructs but is natural for descriptive constructs. Higher order isomorphisms bring apparently different concepts or empirical findings together in a single construct. In principle, increasing the empirical base provides important clues for developing higher order isomorphisms, which reduce the number of explanatory constructs.
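The counting argument can be checked by brute force. The sketch below is illustrative, not part of the authors' analysis. It treats a pathway as a nonempty subset of modules run in its single permissible order, as Footnote 10 requires. It treats a condition as a nonempty subset of active pathways. It also counts ordered subsets, for the relaxed case that Footnote 10 mentions. The function names are hypothetical.

```python
from itertools import combinations, permutations


def nonempty_subsets(items):
    """Every nonempty unordered subset of `items`, as tuples."""
    items = list(items)
    return [c for k in range(1, len(items) + 1) for c in combinations(items, k)]


def count_pathways(num_modules, ordered=False):
    """Number of pathways through `num_modules` modules.

    Unordered (the text's assumption): each nonempty subset of modules, run in
    its one permissible order, is one pathway, so P = 2**M - 1.
    Ordered (constraint relaxed): every nonempty ordered subset is a pathway.
    """
    modules = range(num_modules)
    if ordered:
        return sum(1 for k in range(1, num_modules + 1) for _ in permutations(modules, k))
    return len(nonempty_subsets(modules))


def count_conditions(num_pathways):
    """Maximum distinguishable conditions: nonempty subsets of pathways, C = 2**P - 1."""
    return 2 ** num_pathways - 1


M = 4
P = count_pathways(M)                       # 2**4 - 1 pathways
C = count_conditions(P)                     # 2**15 - 1 conditions
C_brute = len(nonempty_subsets(range(P)))   # enumerate the subsets to confirm C
print(M, P, C, C_brute, count_pathways(M, ordered=True))  # prints: 4 15 32767 32767 64
```

Relaxing the ordering constraint raises four modules from 15 to 64 pathways. This illustrates Footnote 10's remark that the ordered set is "much larger."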
However, higher order isomorphisms are difficult to understand and develop without a foundation in lower order isomorphisms. For example, we believe one reason connectionist models are so difficult to understand is that they tend to use higher order isomorphisms directly without a foundation in lower order isomorphisms. A pluralistic, levels-of-analysis approach coordinates use of lower and higher order isomorphisms. Each type of isomorphism plays an important part in theory development. The first-order isomorphism is not a "fallacy" when it is a means to an end. Ideally, lower order isomorphisms (and their relation to higher order isomorphisms) provide conceptual tractability, and the higher order isomorphisms move toward greater parsimony. Another property of conceptual isomorphism reduces the trade-off between conceptual tractability and parsimony: Conceptual isomorphism is largely relative. For example, many fundamental constructs in cognitive science, like the algorithm, were difficult to grasp before computers became commonplace. Today, such constructs seem intuitive. As a higher order conceptual isomorphism becomes familiar, it moves toward becoming a lower order conceptual isomorphism. The relating of higher order isomorphisms to lower order isomorphisms in a pluralistic approach facilitates this process. In effect, the balance between lower and higher order conceptual isomorphisms is a balance between creative new possibilities and useful, established constructs.

Strategic Control as Parametric Control

Thus far, our discussion of strategic control has been solely in terms of pathway selection. We now consider an alternative approach to strategic control: system control of processing parameters (parametric control). We consider this alternative for several reasons. First, a principled account of word recognition strives for maximum explanatory value using a minimum of conceptual constructs. This does not require that all constructs be of the same type. A parametric control analysis of the present study will add to the menu of constructs available to detailed models. Second, canonical modeling and higher order isomorphism are not restricted to pathway selection. The elegant simplicity of pathway selection made it an ideal medium for introducing those concepts. However, some aspects of canonical modeling generalize across conceptual frameworks, whereas other aspects depend on the particular conceptual framework. Developing a canonical parametric control model should clarify the nature of canonical modeling and higher order isomorphism. Third, our personal bias is toward parametric control. Parametric control is less common than pathway selection in theories of word recognition. Furthermore, when parametric control has been proposed, it has generally appealed to the same conceptual construct: the decision criterion of signal detection theory (Green & Swets, 1966) or an analogous construct, like threshold (Morton, 1969). One notable exception is the change in inhibitory-excitatory balance across conditions in the interactive activation model of Rumelhart and McClelland (1982). Although Rumelhart and McClelland did not explicitly attribute the change to strategic control, this was their intention (J. L. McClelland, personal communication, February 14, 1991).
It is important to emphasize that changing a model's parameters is parametric control only if the processing system controls the parameter value (just as the processing system determines the active pathways in pathway selection). For example, Patterson, Seidenberg, and McClelland (1989) accounted for acquired surface dyslexia in terms of a reduction in the number of hidden units after the insult. This change in the number-of-hidden-units parameter was not strategic control because it resulted from an insult to the system, rather than being under system control.

¹⁰ The number of pathways is equal to the number of nonempty unordered subsets of modules because, as noted in the Introduction, if two modules contribute input to each other, the concept of module ceases to apply in any worthwhile sense. Thus, if Module A can be followed by Module B, then Module B cannot be followed by Module A. If this constraint is relaxed, the set of pathways is equal to the set of ordered subsets, which is much larger.


Canonical Modeling and Parametric Control

An appeal of canonical pathway selection models is their clear operational definition of pathways. The high-frequency blocking benefit requires that a pathway used in the blocked high-frequency condition not be used in the mixed-frequency condition. Whereas there is freedom in the conceptual rationale for modules, the configuration of canonical pathways is constrained. Furthermore, the pathways are fairly independent. Thus pathways can often be added without altering previous explanatory accounts. Neither of these properties can be assumed for parametric control. First, any one of several control parameters may account for the same empirical result. Thus empirical evaluation in a parametric control approach requires a broader empirical base. Second, changing one parameter can alter the effects of other parameter changes. In other words, the parameters of most mechanisms are interdependent constructs. Furthermore, the parameters available for control depend on assumed mechanisms. Independent constructs allow proliferation of viable, detailed models; with such constructs, canonical modeling is particularly useful for characterizing a large number of existing models. Interdependent constructs make the proliferation of viable detailed models more difficult; with such constructs, another aspect of canonical modeling is particularly useful. Canonical models place constraints on potential detailed models. Such constraints are the basis for identifying higher order isomorphisms. In both roles, canonical modeling facilitates assignment of explanatory credit and blame. We illustrate the preceding points using a canonical model in the random-walk (or diffusion) framework (Krueger, 1978; Ratcliff, 1978).¹¹ This canonical model derives from Gordon's (1983) model of frequency blocking.
We then modify the canonical model using a higher order isomorphic parameter (match emphasis) that provides a conceptual rationale for coordinated changes in the basic parameters of the random-walk framework. This match emphasis parameter derives from a higher order parameter in resonant verification (Stone & Van Orden, 1989), that is, the weight given to top-down versus bottom-up information. We have previously proposed that this parameter accounts for nonword foil lexicality effects.

The Random-Walk Framework

In the random-walk framework, evidence concerning a hypothesis (e.g., the stimulus is a word) accumulates over time. When that evidence reaches a criterion level, the hypothesis is accepted (e.g., a "word" response is made). In addition, there is variability in the rate of evidence accumulation. At each step, there is a probability distribution for the change in accumulated evidence. Thus the basic parameters of a single random-walk process are (a) the criterion level of evidence for a correct response, (b) the criterion level of evidence for an incorrect response, and (c) the parameters of the step process distribution function (e.g., mean and variance). This framework is not uncommon in theories of word recognition (e.g., Balota & Chumbley, 1984; Besner & Johnston, 1989; Gordon, 1983; Morton, 1969). It is also used in other fields of cognitive research (e.g., Krueger, 1978; Ratcliff, 1978).

In discussing this framework, we adopt several conventions for conceptual clarity. First, anyone who has studied physics is familiar with the confusion that can arise with positive and negative directions (e.g., increasing a negative value decreases its absolute value). Our terminology follows the more familiar convention of using positive values in a given direction (toward a word or a nonword decision). Thus, we refer to shifting the word criterion in the positive (word) direction as "shifting the criterion away from the origin." Shifting the nonword criterion in the positive (word) direction is referred to as "shifting the criterion toward the origin." Second, we assume that random-walk finishing times (FTs) correspond qualitatively to observed RTs. We use FT rather than RT to maintain cognizance of this assumption. Given this, one can read RT for FT. Third, we use the modifier absolute to indicate a parameter measured in terms of the unknowable units of evidence. Strategic control adjusts the absolute value of parameters. We use the modifier effective to indicate a parameter measured relative to performance measures (e.g., FTs). This distinction merits clarification. Shifting the absolute word criterion toward the origin reduces the number of steps needed to make a hit response. This speeds hit FTs. The same speeding of hit FTs can be achieved by speeding the absolute word accumulation rate. This also reduces the number of steps needed to make a hit response and so speeds hit FTs. For the observer, activity in the random walk is a "black box." Only the response (word or nonword) and the FT can be observed. Thus, shifting the absolute criterion toward the origin and speeding the absolute accumulation rate cannot be distinguished (given only hit trials).
Both can be characterized as a shift of the effective criterion toward the origin, or both can be characterized as a speeding of the effective accumulation rate. The basic random-walk framework was described in terms of a single accumulation process. However, in a lexical decision there are two types of accumulation process, one for word trials and one for nonword trials. Understanding how parametric control (of absolute parameters) affects performance (via effective parameters) requires separate consideration of correct and error trials for words and nonwords. A hit response (responding "word" on word trials) occurs when the accumulation process for a word trial reaches the word criterion. Thus, the effective hit criterion depends on the absolute word criterion and the statistics of the absolute word accumulation process.¹²

¹¹ For some purposes, the distinction between continuous time (diffusion) and discrete time (random-walk) accumulation is important (Ratcliff, 1978). For the present purposes, either approach is acceptable. Discussion is framed in terms of a random walk to facilitate conceptual clarity.

¹² Simplest formulation of the canonical model: Approximate relationships between parameter settings and performance measures can be characterized algebraically using the simplest version of the random walk. Assume that the accumulation processes are weak-sense stationary (i.e., the step-size distributions do not change over time). Also assume that step-size distributions are normal and independent. Let R be the accumulation rate, T the mean finishing time for correct responses, and C the decision criterion for correct responses. Then $T = C/R$. This relationship is approximate for two reasons. First, a given deviation from the mean accumulation rate that slows accumulation will have a larger impact on finishing time than will the same deviation in the direction of speeding accumulation (see Figure 6). Thus finishing time distributions are positively skewed. This factor contributes to underestimation of the mean finishing time. Second, the effect of errors is ignored. Because error trials would have to return from beyond the incorrect decision criterion to count in correct finishing times, they will generally be very slow. Their inclusion thus contributes to overestimation of the mean finishing time. Note that these two approximations act in opposite directions. Exact expressions for the finishing time distributions and response probabilities for the diffusion form of the model have been provided by Ratcliff (1978).

A miss response (responding "nonword" on word trials) occurs when the accumulation process for a word trial reaches the nonword criterion. Thus, the effective miss criterion depends on the absolute nonword criterion and the statistics of the absolute word accumulation process. The same reasoning applies to nonword trials. The effective correct rejection criterion (for responding "nonword" on nonword trials) depends on the absolute nonword criterion and the statistics of the absolute nonword accumulation process. The effective false-alarm criterion (for responding "word" on nonword trials) depends on the absolute word criterion and the statistics of the absolute nonword accumulation process. Note that the effective criterion for each type of trial depends on one absolute criterion and one absolute accumulation rate. Conversely, each absolute parameter affects two effective criteria. Thus, strategic control of a single absolute parameter predicts a specific interdependence of two effective parameters. Predictions concerning one condition (e.g., hit probability and FTs) constrain predictions for another condition (e.g., false-alarm probability and FTs). Effective criteria are first-order empirical isomorphic constructs. They relate data directly to a theoretical construct. However, parametric control determines the values of absolute criteria and accumulation rates, which are second-order empirical isomorphic constructs (i.e., they are one step removed from the data). Independent control of absolute (second-order) parameters produces interdependent changes in effective (first-order) parameters. This concept is developed in the remainder of this discussion. An analysis equivalent to (but redundant with) mapping two absolute criteria and two absolute accumulation rates into four effective criteria entails measuring the statistics of accumulation in terms of the proportion of evidence needed to reach a given absolute criterion. In this case, there are four effective accumulation processes based on the same set of absolute parameters. This analysis is interchangeable with that for effective criteria.

Table 6 illustrates how a change in absolute word criterion, versus a change in absolute word accumulation rate, affects the full pattern of results. Shifting the absolute word criterion toward the origin shifts the effective hit criterion toward the origin. This speeds hit FTs. Shifting the absolute word criterion toward the origin also shifts the effective false-alarm criterion toward the origin. This increases the probability of a false alarm. In the direct analysis, misses are unaffected because the effective miss criterion does not depend on the absolute word criterion. Likewise, correct rejection FTs are unaffected in the direct analysis because the effective correct rejection criterion does not depend on the absolute word criterion. These direct effects are given in the first line of each entry in Table 6. This pattern of predicted effects is an operational definition of the canonical, absolute word criterion parameter. If a parameter besides absolute word criterion in a detailed accumulation model predicts the same qualitative pattern, it can be fairly characterized by the canonical parameter. This parallels the operational definition of pathways in canonical pathway selection models. Speeding absolute word accumulation rate has a different overall effect. Once again, hit FTs are speeded because the effective hit criterion shifts toward the origin. However, the effective false-alarm criterion does not depend on the absolute word accumulation rate, and the false-alarm rate is unchanged. Correct rejection FTs are also unaffected because the effective correct rejection criterion does not depend on absolute word accumulation rate. Finally, speeding the absolute word accumulation rate shifts the effective miss criterion away from the origin. This reduces the probability of a miss.

Table 6
Effects of Changing Criteria and Accumulation Rates

                                     Change in absolute parameter
Performance measure                  Moving word(a) criterion     Speeding word(a)
                                     toward origin(b)             accumulation(c)
Hit finishing times
    Direct effect(d)                 Speeded                      Speeded
    Indirect effect(e)               No effect                    No effect
Miss probability
    Direct effect                    No effect                    Decreased
    Indirect effect                  Decreased                    No effect
Correct rejection finishing times
    Direct effect                    No effect                    No effect
    Indirect effect                  Speeded                      No effect
False-alarm probability
    Direct effect                    Increased                    No effect
    Indirect effect                  No effect                    No effect

(a) The effect of changing nonword parameters can be determined by exchanging word (hit and miss) and nonword (correct rejection and false alarm) measures. (b) To determine the effect of moving a criterion away from the origin, substitute "slowed" for "speeded." (c) To determine the effect of slowing word accumulation, exchange "increased" and "decreased." (d) The direct effect of changing the absolute parameter. (e) The small indirect effect that results from changes in the probability of the response type.
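A small simulation can check the direct effects in Table 6. The sketch below is ours, not the authors'. It assumes normally distributed, stationary steps, as in Footnote 12. The drift and criterion values are arbitrary illustrative choices, and the names `run_trial` and `simulate` are hypothetical. Each trial is seeded separately, so every condition reuses the same noise. Differences between conditions therefore reflect only the parameter being changed.

```python
import random
from statistics import mean


def run_trial(drift, word_crit, nonword_crit, seed, sd=1.0, max_steps=100_000):
    """One random-walk trial; returns (response, finishing time in steps).

    Evidence starts at 0 and changes by a normal step (mean `drift`, SD `sd`).
    Reaching +word_crit yields "word"; reaching -nonword_crit yields "nonword".
    Both criteria are given as distances from the origin.
    """
    rng = random.Random(seed)  # per-trial seed: identical noise in every condition
    x = 0.0
    for step in range(1, max_steps + 1):
        x += rng.gauss(drift, sd)
        if x >= word_crit:
            return "word", step
        if x <= -nonword_crit:
            return "nonword", step
    raise RuntimeError("no decision within max_steps")


def simulate(word_drift=0.15, nonword_drift=-0.15,
             word_crit=10.0, nonword_crit=10.0, n=2000):
    """Hit FT, miss probability, correct-rejection FT, false-alarm probability."""
    words = [run_trial(word_drift, word_crit, nonword_crit, seed=i) for i in range(n)]
    foils = [run_trial(nonword_drift, word_crit, nonword_crit, seed=10**6 + i)
             for i in range(n)]
    return {
        "hit_ft": mean(t for r, t in words if r == "word"),
        "miss_p": sum(r == "nonword" for r, _ in words) / n,
        "cr_ft": mean(t for r, t in foils if r == "nonword"),
        "fa_p": sum(r == "word" for r, _ in foils) / n,
    }


base = simulate()
crit = simulate(word_crit=8.0)     # absolute word criterion moved toward the origin
rate = simulate(word_drift=0.20)   # absolute word accumulation rate speeded
for label, res in (("baseline", base), ("criterion", crit), ("rate", rate)):
    print(label, {k: round(v, 3) for k, v in res.items()})
```

Under this coupling, moving the word criterion toward the origin speeds hits and raises false alarms. Speeding word accumulation instead speeds hits and lowers misses. It leaves nonword-trial performance exactly unchanged. Both patterns match the direct-effect rows of Table 6. Under the criterion shift, misses can only fall and correct rejections may shift slightly. These changes correspond to the small indirect effects listed in the table.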


These direct effects do not reflect a more subtle interdependence of the effective parameters. For example, an increase in hits reduces the number of misses. FTs for the eliminated misses are not symmetrically distributed around the mean FT. This introduces a small indirect effect. On a few word trials, the accumulation process almost reaches the absolute word criterion and then reverses and crosses the absolute nonword criterion to produce a miss. Shifting the absolute word criterion toward the origin converts some of these trials to hits and so reduces the probability of a miss. Thus there is an indirect reduction in misses. Likewise, on a few nonword trials, the accumulation process almost reaches the absolute word criterion and then reverses and crosses the absolute nonword criterion to produce a very slow correct rejection. Shifting the absolute word criterion toward the origin converts some slow correct rejection trials into false-alarm trials. Thus there is an indirect speeding of correct rejections. There are no indirect effects of changing absolute accumulation rate. The effective parameters that determine nonword trial performance are completely unaffected by changes in the absolute word accumulation process, and vice versa. A final, important characteristic of the random-walk framework is the relationship between mean FT and mean accumulation rate. In the simplest case, FT is inversely proportional to mean accumulation rate (see Footnote 12). This relationship has a qualitative property that would be difficult to avoid in random-walk models: FT is a decreasing, convex (concave upward) function of effective accumulation rate, falling steeply at slow rates and flattening at fast rates. Figure 6 illustrates that the same change in effective accumulation rate will have a larger impact on FTs at slower rates of accumulation (Condition A) than at faster rates of accumulation (Condition B).
The function relating FT and effective accumulation rate (henceforth referred to as the finishing time function) need not be a strictly inverse relationship. Any function with a uniformly negative first derivative and a uniformly positive second derivative will yield the same qualitative property.
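The simplest formulation in Footnote 12, T = C/R, shows this property directly through differentiation. The numbers in the second part are illustrative values of our own choosing: C = 10 and a rate change of 0.05. They are not estimates from the experiments.

```latex
% Finishing time function under the Footnote 12 approximation (C > 0, R > 0)
T(R) = \frac{C}{R}, \qquad
\frac{dT}{dR} = -\frac{C}{R^{2}} < 0, \qquad
\frac{d^{2}T}{dR^{2}} = \frac{2C}{R^{3}} > 0

% Condition A (slow): R falls from 0.10 to 0.05
\Delta T_A = \frac{10}{0.05} - \frac{10}{0.10} = 200 - 100 = 100

% Condition B (fast): R falls from 0.30 to 0.25
\Delta T_B = \frac{10}{0.25} - \frac{10}{0.30} = 40 - 33.\overline{3} \approx 6.7
```

The same 0.05 change in rate costs about 15 times as much finishing time in Condition A as in Condition B.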

A Canonical Random-Walk Model

Our initial random-walk canonical model is almost identical to Gordon's (1983) random-walk model, except that Gordon's model included no mechanism for nonword responses. It contains decision criteria for word and for nonword responses and accumulation processes for each stimulus type. The effective accumulation rate for high-frequency words is more rapid than that for low-frequency words. The absolute decision criteria are adjusted on the basis of overall sensitivity to the word/nonword distinction. A more rapid growth in overall sensitivity shifts the absolute decision criteria toward the origin. However, the canonical model differs from Gordon's (1983) model in the way it is used. A detailed model commits to many specific assumptions (e.g., global vs. nodal accumulation of evidence). In principle, when a detailed model's prediction fails, the model has been falsified. In practice, an effort is made to modify some assumption to fix the model's account. In canonical modeling, the effort is made a priori to identify whether or how variations in assumptions would affect qualitative predictions. Canonical modeling strives to assign explanatory credit and blame to particular assumptions, rather than to detailed conjunctions of assumptions (see Lakatos, 1970). Next, we consider one assumption that does not alter qualitative predictions of the canonical model and two assumptions that do.

Figure 6. The finishing time function for random-walk processes (finishing time plotted against accumulation rate). (A given change in a slow accumulation rate [Condition A] has more impact on finishing time than does the same change in a rapid accumulation rate [Condition B]. Concrete examples of the two conditions are given in the text.)

Evidence can accumulate separately for each lexical entry (Morton, 1969) or globally in a frequency/meaningfulness accumulation process (Balota & Chumbley, 1984; Besner & Johnston, 1989; Gordon, 1983). This distinction matters when relationships between words are important (e.g., neighborhood effects). However, it is not relevant to a qualitative account of the present results. Separate accumulators for each lexical entry are random-walk processes running in parallel. In such a logogen-style process, the first process to finish with a word decision determines the FT for word responses. Accumulated evidence in a global process can be thought of as an integration of the evidence for each lexical entry. As discussed in the Introduction, the minimum and sum of random variables yield the same qualitative predictions.

An alternative assumption does alter qualitative predictions. M. Coltheart et al. (1977) proposed that nonword responses are made using a variable deadline. The nonword response deadline begins at a brief time value. The deadline is extended on the basis of the rate of "word" evidence accumulation: the faster the accumulation of "word" evidence, the greater the extension of the deadline. The absolute parameters for this mechanism are the initial deadline value and the magnitude of deadline extension for a given rate of evidence accumulation. Analysis of how changes in these absolute parameters affect performance measures, analogous to the analysis summarized in Table 6, is not difficult. We give only the basic prediction and encourage the reader to work through the logic. Setting the initial deadline earlier or slowing extension of the deadline is not equivalent to any of the changes in the effective criterion analysis: hit FTs are unaffected, miss errors increase, correct rejection FTs are speeded, and false-alarm errors decrease. This pattern does not match the operational definition of any canonical parameter. However, it is equivalent to a coordinated shifting of the absolute nonword criterion toward the origin and speeding of absolute nonword accumulation (this can be verified using Table 6). Once again, canonical modeling facilitates analysis of models that are not strictly equivalent. If results suggest such a coordinated change in nonword criterion and nonword accumulation, strategic control of a variable deadline provides a higher order isomorphism with respect to the absolute parameters of the random walk.

Another alternative assumption merits consideration. In principle, an accumulation process can be very complex. Canonical modeling begins with simplifying assumptions about the process. Simplifying assumptions are common in the modeling of mechanisms. In canonical modeling, one considers how changing such assumptions might impact qualitative predictions. We present one case in point.
The canonical model assumes the accumulation process is weak-sense stationary (i.e., the statistics of the step-size distribution do not change over time). To our knowledge, no theory of word recognition has explicitly modeled a nonstationary accumulation process. However, this possibility is implicit in many theories. For example, the integrative decision process discussed in the Introduction can be modeled as a random-walk process. If the statistics of accumulation for fast- and slow-converging pathways differ, the overall accumulation process will vary over time. For instance, if assembled phonologic processing is slower than direct visual processing, early accumulation will not reflect phonologic factors, whereas late accumulation will. This alternative assumption can potentially change the relationship between correct FTs and errors for a given stimulus type. Early accumulation affects the number of errors more than late accumulation does. A rapid early move in the correct direction, followed by a slower final approach, will produce fewer errors for the same mean FT than will a weak-sense stationary process with the same mean accumulation rate. If this difference is very large, it will alter qualitative predictions.

This analysis of alternative assumptions suggests that the qualitative predictions of the canonical random-walk model are fairly robust across the random-walk framework. However, one cannot assume that more detailed models automatically map into the canonical model in the random-walk framework. This analysis is meant to highlight potentially important differences. Analysis of the canonical model should specify the qualitative properties that are critical for a given prediction. Any detailed model of interest must be checked for qualitative equivalence to the canonical model before success of the canonical analysis can be accepted as support, and especially before failure of the canonical analysis can be accepted as falsification.
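The variable-deadline alternative described above can be made concrete with a deliberately simplified, noiseless sketch. It assumes that "word" evidence on each trial accumulates at a fixed rate toward a word criterion, and that the nonword deadline starts at an initial value `d0` and is extended by `k` times that rate. The function names, the criterion of 10, and all item rates are hypothetical illustrations, not values from the experiments. Moving the initial deadline earlier reproduces the qualitative pattern stated in the text: misses increase, correct rejections are faster, false alarms decrease, and every trial that is still a hit keeps its finishing time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    response: str  # "word" or "nonword"
    time: float    # finishing time, arbitrary units


def respond(rate, word_criterion, d0, k):
    """Race a noiseless word accumulator against an extendable nonword deadline.

    rate: rate of "word" evidence accumulation on this trial (assumed > 0).
    d0: initial deadline; k: deadline extension per unit of rate (hypothetical).
    """
    t_word = word_criterion / rate
    deadline = d0 + k * rate  # faster word evidence buys a later deadline
    if t_word <= deadline:
        return Outcome("word", t_word)
    return Outcome("nonword", deadline)


def summarize(word_rates, nonword_rates, d0, k, criterion=10.0):
    """Hit times (keyed by item rate), misses, false alarms, mean correct-rejection FT."""
    word_out = [respond(r, criterion, d0, k) for r in word_rates]
    nonword_out = [respond(r, criterion, d0, k) for r in nonword_rates]
    hits = {r: o.time for r, o in zip(word_rates, word_out) if o.response == "word"}
    rejections = [o.time for o in nonword_out if o.response == "nonword"]
    return {
        "hit_times": hits,
        "misses": len(word_rates) - len(hits),
        "false_alarms": len(nonword_rates) - len(rejections),
        "mean_cr_time": sum(rejections) / len(rejections),
    }


WORD_RATES = [1.0, 1.5, 2.0, 3.0, 4.0]     # hypothetical word items
NONWORD_RATES = [0.2, 0.4, 0.6, 0.9, 1.2]  # hypothetical nonword items

late = summarize(WORD_RATES, NONWORD_RATES, d0=6.0, k=2.0)
early = summarize(WORD_RATES, NONWORD_RATES, d0=3.0, k=2.0)
for name, s in (("late initial deadline", late), ("early initial deadline", early)):
    print(f"{name}: misses={s['misses']}, false alarms={s['false_alarms']}, "
          f"mean correct rejection FT={s['mean_cr_time']:.2f}")
```

Because the sketch is noiseless, the claim that hit FTs are unaffected holds trial by trial; the mean over hits can still drop a little here, because the slowest hits are the ones converted into misses.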

Canonical Account of the Present Results

How do nonword foils affect the setting of the absolute decision criteria? More lexical nonwords should produce more "word" evidence (i.e., less "nonword" evidence) than less lexical nonwords. Accumulation rates for nonwords slow, whereas those for words are unaffected. Thus, there is a less rapid stimulus-driven separation of the word and nonword accumulation processes with more lexical nonword foils. Less rapid increase in sensitivity shifts both absolute decision criteria away from the origin, and both hit and correct rejection FTs are slowed. The interaction of frequency and nonword foil lexicality derives from the relationship between effective criterion and effective accumulation rate, and from the concave FT function. Raising the absolute word criterion slows the effective hit accumulation rate for words. Because low-frequency words have slower base accumulation rates (Condition A in Figure 6), FTs to low-frequency words are slowed more by an increase in nonword lexicality (i.e., a slowing of effective accumulation rate) than are FTs to high-frequency words (Condition B in Figure 6).

The error data for the nonword foil lexicality manipulation raise problems. The miss criterion depends on the absolute word accumulation process and the absolute nonword criterion. When nonwords are more lexical, the word accumulation process is unchanged, but the absolute nonword criterion is shifted away from the origin. This predicts that miss errors should decrease as nonword foil lexicality increases. In fact, a significant effect in the opposite direction was observed.

The account of the frequency blocking benefit for high-frequency words is similar to that of Gordon (1983). Eliminating low-frequency words increases the growth of sensitivity to the word/nonword distinction and allows a shift of the absolute word criterion toward the origin.
As a result, the same high-frequency words are recognized faster when they are blocked than when they are mixed with low-frequency words. Two factors contribute to this analysis. First, reducing the variance of the overall word accumulation process reduces the unit of measurement for sensitivity: the same absolute difference between distributions becomes larger in standard deviation units. Second, the overall separation between word and nonword distributions increases in absolute units. These effects cooperate to produce a blocking benefit.

Consider now correct rejection FTs for the blocked high-frequency and mixed-frequency conditions. Like Glanzer and Ehrenreich (1979), we found faster correct RTs to nonwords in the blocked high-frequency condition than in the mixed-frequency condition. (Gordon, 1983, did not report nonword data.) In the canonical model, correct rejection FTs are speeded because the effective correct rejection criterion is shifted toward the origin by shifting the absolute nonword criterion toward the origin.

Why is there no low-frequency blocking effect (given legal nonword foils)? Gordon (1983) gave the impression that the lack of a low-frequency blocking effect follows naturally from the random-walk analysis. In fact, it requires counteraction of the two factors that cooperate to give the high-frequency blocking benefit. Eliminating high-frequency words also reduces the variance of the overall word accumulation process. Again, the same absolute separation between word and nonword distributions is larger in standard deviation units. On the other hand, the absolute separation between the word and nonword distributions is smaller. Thus, the lack of a low-frequency blocking effect requires that the absolute decrease in the separation of word and nonword distributions, measured in smaller standard deviation units, produce no net change in sensitivity. The lack of a low-frequency blocking effect is not a natural consequence of the framework but rather a constraint on the canonical model. Given that the lack of a low-frequency blocking effect given legal nonword foils results from a trade-off between two factors, a change in nonword foil lexicality should shift that balance. The absolute separation between word and nonword accumulation processes is smaller in the pseudohomophone than in the legal nonword foil condition. Thus, eliminating high-frequency words produces a proportionately larger reduction in the absolute separation between word and nonword accumulation processes for pseudohomophone foils than for legal nonword foils.
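The two factors in this blocking analysis can be put in numbers with a small sketch. It assumes, for illustration only, that item accumulation rates are Gaussian with a common within-class standard deviation, that mixed lists contain equal numbers of high- and low-frequency words, and that sensitivity is the word/nonword separation expressed in units of the overall word-rate standard deviation (nonword variability is ignored). The rate values are hypothetical; the legal-foil mean is chosen so that low-frequency blocking is exactly neutral, which plays the role of the constraint on the canonical model. Moving the foil mean closer to the words, as for pseudohomophones, then turns the neutral low-frequency case into a loss of sensitivity, that is, a tendency toward a blocking cost.

```python
import math


def word_rate_stats(means, within_sd):
    """Mean and SD of an equal-proportion mixture of word classes."""
    mu = sum(means) / len(means)
    between_var = sum((m - mu) ** 2 for m in means) / len(means)
    return mu, math.sqrt(within_sd ** 2 + between_var)


def sensitivity(word_means, nonword_mean, within_sd=1.0):
    """Word/nonword separation in units of the overall word-rate SD."""
    mu, sd = word_rate_stats(word_means, within_sd)
    return (mu - nonword_mean) / sd


HIGH, LOW = 3.0, 1.0       # hypothetical mean rates for high- and low-frequency words
LEGAL = -math.sqrt(2.0)    # hypothetical; chosen so low-frequency blocking is neutral
PSEUDO = -1.0              # hypothetical; pseudohomophones lie closer to the words

for label, foil in (("legal nonwords", LEGAL), ("pseudohomophones", PSEUDO)):
    print(f"{label}: mixed {sensitivity([HIGH, LOW], foil):.3f}, "
          f"blocked high {sensitivity([HIGH], foil):.3f}, "
          f"blocked low {sensitivity([LOW], foil):.3f}")
```

With these values the sketch prints sensitivities of about 2.414 (mixed), 4.414 (blocked high) and 2.414 (blocked low) for legal foils, and 2.121, 4.000 and 2.000 for pseudohomophones: shrinking the standard deviation unit helps every blocked list, but only the reduced absolute separation depends on the foils.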
On the other hand, the reduction of the standard deviation unit is unaffected by nonword foils, because it depends only on the change, due to blocking, in the overall word accumulation process. Thus, the tendency toward a blocking cost increases, whereas the tendency toward a blocking benefit is unchanged. This predicts a low-frequency blocking cost in the pseudohomophone foil condition.

In sum, the initial canonical model suffers from two shortcomings. First, it predicts that miss errors should decrease with increasing nonword foil lexicality. Second, given the lack of a low-frequency blocking effect for legal nonword foils, it predicts that RTs to blocked low-frequency words should be slower than those to low-frequency words in the mixed-frequency condition given pseudohomophone foils. Is there a coordinated set of changes in the absolute first-order isomorphic parameters of random walk that solves these problems? If so, can one devise a principled rationale for it? The method of relating changes in two variables by making each a function of some third variable is common in mathematics and physics. For example, the height and horizontal distance traveled by a cannonball are related by a parabolic function. If each of these variables is expressed as a function of time, physical laws can be used to explain the observed parabolic trajectory.
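The cannonball example can be checked directly. Assuming no air resistance and arbitrary, illustrative launch velocities, the sketch computes horizontal distance and height as separate functions of time and confirms that the resulting points lie on the parabola obtained by eliminating time. The parabola is never imposed; it follows from the two time-based laws, which is the sense in which time acts as the coordinating third variable.

```python
G = 9.81  # gravitational acceleration in m/s^2; air resistance ignored


def position(t, vx, vy):
    """Horizontal distance and height at time t, each a separate law in t."""
    return vx * t, vy * t - 0.5 * G * t * t


def height_from_distance(x, vx, vy):
    """Height as a direct function of horizontal distance (time eliminated)."""
    return (vy / vx) * x - G * x * x / (2.0 * vx * vx)


vx, vy = 20.0, 15.0            # illustrative launch velocities in m/s
flight_time = 2.0 * vy / G     # time until the ball returns to launch height
max_error = 0.0
for i in range(101):
    x, y = position(flight_time * i / 100, vx, vy)
    max_error = max(max_error, abs(y - height_from_distance(x, vx, vy)))
print(f"largest discrepancy from the parabola: {max_error:.1e} m")
```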

In the pathway selection framework, canonical models are modified by the addition of pathways to an existing set of assumed pathways. In parametric control frameworks (like random walk), canonical models are modified by finding a higher order parameter that generates the necessary coordinated changes in several lower order parameters. New patterns of coordinated changes in the lower order parameters, required by new empirical constraints, can, of course, lead to several higher order parameters. However, the same process of identifying a higher order parameter that coordinates lower order parameters can, in principle, be applied again. In other words, initially posited higher order parameters may prove to be functions of an even higher order parameter. The basic method of modifying canonical models in parametric control frameworks is reduction of explanatory constructs through development of higher order isomorphisms. One approach to identifying such higher order isomorphisms is to identify the absolute parameter settings for each condition that provide a full account of results. One then looks for a conceptually tractable pattern in these settings. Unfortunately, in the present case, the pattern is quite complex. One option is to define a higher order isomorphic parameter operationally as one that gives the observed pattern. This is unacceptable. There is no conceptual rationale for such a parameter. Either one cannot predict results for new manipulations, or one allows the operational definition of the parameter to grow more complex as the empirical base is expanded. A second option is to search for several higher order parameters that together predict the observed pattern. This is a better option but pays a relatively high price in parsimony to account for a limited data set. The problem is compounded by the momentum of posited mechanisms—clues provided by new data may be missed in the effort to fit those data to previously posited mechanisms. 
We proceed with a third option: developing a single higher order isomorphic parameter that is conceptually tractable but does not account for the full pattern of results. The least explored results are left on hold until further empirical work provides a clearer sense of the phenomenon. This is not a long-term cover-up. Rather, it motivates a more detailed investigation of the orphan result. In the present case, the low-frequency blocking benefit given pseudohomophone foils is the orphan. Note that this is also the result that is most difficult for pathway selection models. We use a match emphasis parameter as the higher order isomorphic parameter in the modified canonical model for several reasons. First, it corresponds to the parameter in resonant verification that we previously proposed for nonword foil lexicality (Stone & Van Orden, 1989). Second, this correspondence helps illustrate the levels-of-analysis approach. Third, the optimal value of the parameter for a given condition can be derived for the simplest implementation of the canonical model. This illustrates that a control parameter need not be an arbitrary, free parameter in the theoretical account. The match emphasis parameter corresponds to a coordinated change in the basic random-walk parameters. An advantage of the match emphasis parameter is that it can be generalized to models other than resonant verification, whereas the top-down/bottom-up balance parameter is specific to resonant verification.

Mapping Resonant Verification Into a Random-Walk Framework

Use of a random-walk (or diffusion) process to characterize resonance has been noted previously (Gordon, 1983; Ratcliff, 1978). In resonance, evidence for a match (or mismatch) between top-down expectations and stimulus-driven (bottom-up) information builds over multiple matching cycles. After each matching cycle, evidence of a match increases (or decreases) as a result of that matching cycle. Each matching cycle corresponds to a step in the random walk and so determines the step distribution function for that cycle. Resonant verification is higher order isomorphic with respect to random walk for four reasons. First, any resonance theory can be described in terms of a random walk (although the description may be complex). However, most theories that can be described in terms of random walk are not resonance theories. Second, resonance theories are more constrained than random walk in the implementation of conceptual constructs. For example, in resonant verification, representations are learned through an unsupervised connectionist learning rule. As a result, more familiar stimuli generally have stronger connection weights than do less familiar stimuli and thus must accumulate stimulus-driven match evidence more efficiently.13 In random walk, a more rapid rate of accumulation for more familiar stimuli is an ad hoc (albeit intuitively compelling) assumption. Third, properties (like the accumulation of evidence) are achieved in resonant verification through mechanisms less obvious than the basic concepts expressed directly in the random-walk framework. In adaptive resonance, top-down and bottom-up information is combined. A nonlinear dynamic then strengthens "features" that receive strong combined support and suppresses "features" that receive weak combined support. 
The strength (or coherence) of the resulting resonance is effectively equivalent to the level of match evidence (see Grossberg & Stone, 1986; Stone & Van Orden, 1989, for in-depth discussions of the basic process, and Carpenter & Grossberg, 1986, for a detailed implementation of adaptive resonance with these properties). Fourth, resonance serves more than one conceptual role. For example, in addition to implementing verification, resonance suppresses noise and unitizes patterns of activity into "representations" (Stone & Van Orden, 1989).

We have proposed that in resonant verification, nonword foil lexicality determines the relative weight given to top-down versus bottom-up sources of information in matching (Stone & Van Orden, 1989). How might this parameter be mapped into a canonical parameter in the random-walk framework? When top-down emphasis is increased, features in the stimulus-driven information that match expectations make up a larger proportion of the total activation and so get a larger relative boost from the nonlinear dynamic. Features in the stimulus-driven information that do not match expectations make up a smaller proportion of the total activation and so suffer greater relative suppression. In other words, match evidence is effectively given greater weight (and mismatch evidence less weight) when relative top-down emphasis is increased. Conversely, match evidence is effectively given less weight (and mismatch evidence greater weight) when relative top-down emphasis is decreased. The top-down/bottom-up balance parameter has several effects on processing. For present purposes, its role in resonant verification as a match emphasis parameter is most interesting. This parameter coordinates the absolute word and nonword accumulation rate parameters in the random-walk framework. Evidence for match and for mismatch is accumulated on each matching cycle. Thus, the total change in accumulated evidence on each step is a weighted combination of the match and mismatch evidence on that matching cycle. Match evidence moves accumulation toward a word response. Mismatch evidence moves accumulation toward a nonword response. Consequently, increasing the weight given to match evidence shifts the accumulation process toward the word decision criterion. This speeds the absolute rate of accumulation for words but slows the absolute rate of accumulation for nonwords.
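The weighting of match and mismatch evidence described here can be sketched with the accumulation-rate expressions given in footnote 14, $R_W = mE_W - R_{W-}$ and $R_N = R_{N-} - mE_N$. The sketch additionally assumes that FT equals criterion divided by net rate (in keeping with the later statement that FT is directly proportional to the effective criterion); the function names and mismatch values are hypothetical. Raising the match emphasis `m` speeds both word types and slows nonwords, and because FT falls more and more slowly as rate grows, the low-frequency words gain more than the high-frequency words.

```python
def accumulation_rates(m, word_mismatch, nonword_mismatch, e_word=1.0, e_nonword=1.0):
    """Net rates toward the word and nonword criteria as functions of match emphasis m.

    word_mismatch    -- mismatch evidence per step on word trials (R_W-)
    nonword_mismatch -- mismatch evidence per step on nonword trials (R_N-)
    e_word, e_nonword -- total evidence per step (E_W, E_N)
    """
    if not 0.0 < m < 1.0:
        raise ValueError("match emphasis must lie strictly between 0 and 1")
    r_word = m * e_word - word_mismatch           # R_W = m E_W - R_W-
    r_nonword = nonword_mismatch - m * e_nonword  # R_N = R_N- - m E_N
    return r_word, r_nonword


def finishing_time(criterion, rate):
    """Assumed simple form: FT = criterion / net rate (rate must be positive)."""
    if rate <= 0.0:
        raise ValueError("the process never reaches this criterion")
    return criterion / rate


WORD_MISMATCH = {"high-frequency": 0.1, "low-frequency": 0.3}  # hypothetical
NONWORD_MISMATCH = 0.8  # hypothetical
CRITERION = 1.0         # hypothetical effective criterion

for m in (0.5, 0.6):
    for label, mismatch in WORD_MISMATCH.items():
        r_word, _ = accumulation_rates(m, mismatch, NONWORD_MISMATCH)
        print(f"m={m}: {label} hit FT = {finishing_time(CRITERION, r_word):.2f}")
    _, r_nonword = accumulation_rates(m, 0.0, NONWORD_MISMATCH)
    print(f"m={m}: nonword correct rejection FT = {finishing_time(CRITERION, r_nonword):.2f}")
```

A single parameter `m` thus moves both absolute accumulation rates in opposite directions, which is what makes it a candidate higher order parameter rather than two independent free parameters.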

Optimization Determines Match Emphasis

The match emphasis is not set arbitrarily. Match emphasis is set to optimize overall performance. When error rates are low, optimization can be approximated by assuming that mean overall FT is minimized. This optimization has an important qualitative property. Increasing the relative proportion of stimulus-driven evidence contributing to mismatch increases the strategically driven emphasis on match evidence. In other words, a stimulus-driven change in match evidence (for word or nonword trials) leads to a compensatory shift in strategically driven match evidence. This property is clearly evident in the formula for the match emphasis parameter in the simplest case model.¹⁴ However, as a qualitative property, it applies to more general cases. In effect, strategic control of match emphasis acts as a partial homeostatic force. Stimulus-driven benefits to one type of stimulus (e.g., nonwords) are partially transferred to the other type of stimulus (e.g., words). Conceptually, a stimulus-driven gain in one condition produces decreasing marginal gains as it is increased. This is a consequence of the concave FT function (see Figure 6). The added change, if shifted to the other condition, will produce a larger marginal gain and so increase the overall benefit to FTs.

Consider the implications for the nonword foil lexicality manipulation. In going from legal to illegal nonwords, stimulus-driven mismatch evidence increases for the nonword trials. This in turn shifts strategic control toward greater match emphasis. The increase in strategically driven match emphasis speeds accumulation rates for word trials (which benefit from increased emphasis on match evidence). Hit FTs are speeded, and miss probabilities are reduced (see Table 6). The increase in strategic match emphasis slows nonword accumulation rates relative to what they would have been without strategic control. However, the stimulus-driven speeding of nonword accumulation rate is greater than the strategically driven slowing. Thus, less lexical (illegal) nonwords still generate faster correct rejection FTs and lower false-alarm probabilities than more lexical (legal) nonwords. The interaction of nonword foil lexicality and frequency again derives from the FT function: A given change in absolute accumulation rate has more impact on FT when the accumulation rate is less rapid (low frequency and Condition A in Figure 6) than when it is more rapid (high frequency and Condition B in Figure 6). Thus, a shift toward greater match emphasis with decreasing nonword foil lexicality will speed hit FTs for low-frequency words more than those for high-frequency words.

¹³ Familiarity's effect on connection weights is more complex than simple strengthening. Familiarity also reduces the relative cross-talk and so provides a cleaner pattern of activity. This effect also speeds processing.

¹⁴ Start with the simple model presented in footnote 12 and the nomenclature introduced there. Let the subscripts $W$ and $N$ indicate word and nonword trials, respectively. Let the subscripts $+$ and $-$ indicate match and mismatch, respectively. Let $m$ indicate the match emphasis parameter (where $0 < m < 1$), and let $E$ indicate the total evidence available per step for the given condition (so that $E_W = R_{W+} + R_{W-}$ and $E_N = R_{N+} + R_{N-}$). The total accumulation rates for words and nonwords, respectively, as a function of the match emphasis parameter are given by

$$R_W = mR_{W+} - (1 - m)R_{W-} = mE_W - R_{W-}$$

and

$$R_N = (1 - m)R_{N-} - mR_{N+} = R_{N-} - mE_N.$$

Keep in mind that all variables are expressed as positive values in the appropriate direction. In finding the value of $m$ that minimizes overall finishing time, it is useful to measure all variables relative to their corresponding $E$. Such variables are indicated by an asterisk. Calculation of optimal $m$ is straightforward, but involved. We thus present only the solution. Detailed derivation is available on request. There are three cases:

$$m = \begin{cases} \dfrac{R^*_{W-} + R^*_{N-}}{2}, & C^*_W = C^*_N,\\[2ex] \dfrac{(C^*_W - G)\,R^*_{N-} + (G - C^*_N)\,R^*_{W-}}{C^*_W - C^*_N}, & C^*_W > C^*_N,\\[2ex] \dfrac{(G - C^*_W)\,R^*_{N-} + (C^*_N - G)\,R^*_{W-}}{C^*_N - C^*_W}, & C^*_W < C^*_N, \end{cases}$$

where $G = \sqrt{C^*_W C^*_N}$ is the geometric mean of $C^*_W$ and $C^*_N$.

An account of frequency blocking begins by considering how blocking affects overall word accumulation. When frequency is mixed, overall word accumulation is the average for high- and low-frequency words. When frequency is blocked, overall word accumulation is determined by the frequency used. Thus, blocked high frequency speeds the stimulus-driven overall word accumulation rate, whereas blocked low frequency slows it. Analysis of blocking effects in the modified canonical model depends on two consequences of this change in overall accumulation rate. The first is a direct consequence of match emphasis setting. The second requires an added assumption about accumulation rates for high- and low-frequency words. Recall that the match emphasis parameter partially shifts stimulus-driven gains in one condition (e.g., words) to gains in the other condition (e.g., nonwords). As a result, there is less strategically driven match emphasis in the blocked high-frequency condition than in the mixed-frequency condition. Likewise, there is more strategically driven match emphasis in the blocked low-frequency condition than in the mixed-frequency condition.

At this point, the distinction between overall stimulus-driven evidence and the stimulus-driven evidence on a given trial is important. Nonword foil lexicality changes overall stimulus-driven evidence for nonword trials by changing the stimulus-driven evidence on each nonword trial. Less lexical nonwords have faster correct rejection FTs because a stimulus-driven gain on each trial is only partially transferred to a strategically driven gain for hit FTs. In contrast, frequency blocking changes overall stimulus-driven evidence by changing the composition of the word stimulus set. The stimulus-driven evidence for a given word trial is the same in the mixed and blocked conditions: a given high-frequency word generates the same stimulus-driven match evidence in the mixed as in the blocked condition. However, speeding overall word accumulation rates shifts strategic control toward less match emphasis, which makes hit FTs to high-frequency words slower in the blocked condition than in the mixed-frequency condition. By the same logic, shifting strategic control toward more match emphasis should make hit FTs to low-frequency words faster in the blocked condition than in the mixed-frequency condition. Because of the inverse relationship between effective accumulation rate and total evidence, the benefit for blocked low-frequency words (analogous to Condition A in Figure 6) will be larger than the cost for blocked high-frequency words (analogous to Condition B in Figure 6, in the reverse direction). In the preceding analysis of frequency blocking, we used the simplest possible assumptions about the effect of blocked frequency on word accumulation rates.
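The optimization behind these claims can be sketched numerically. The sketch assumes that each FT equals the starred (relative-to-$E$) criterion divided by the net accumulation rate and that word and nonword trials are equally frequent; under those assumptions the three-case solution in footnote 14 follows, and the code checks it against a direct numerical minimization. The same objective, with word trials split evenly between two word types, gives the match emphasis for a mixed list. All parameter values and function names are hypothetical, and total evidence is held equal across word types, which corresponds to the simplest assumptions referred to in the text.

```python
import math


def optimal_m_closed_form(c_w, c_n, a, b):
    """Three-case solution (starred variables): C*_W, C*_N, R*_W- = a, R*_N- = b."""
    if math.isclose(c_w, c_n):
        return (a + b) / 2
    g = math.sqrt(c_w * c_n)  # geometric mean of the two effective criteria
    if c_w > c_n:
        return ((c_w - g) * b + (g - c_n) * a) / (c_w - c_n)
    return ((g - c_w) * b + (c_n - g) * a) / (c_n - c_w)


def mean_ft(m, word_mismatches, b, c_w=1.0, c_n=1.0):
    """Mean FT, half word and half nonword trials, assuming FT = C* / net rate.

    Word trials are split evenly across the listed word types.
    """
    word = sum(c_w / (m - a) for a in word_mismatches) / len(word_mismatches)
    return 0.5 * word + 0.5 * c_n / (b - m)


def optimal_m_numeric(word_mismatches, b, c_w=1.0, c_n=1.0, iters=100):
    """Ternary search; mean_ft is convex on the admissible interval (max a, b)."""
    lo, hi = max(word_mismatches), b
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if mean_ft(m1, word_mismatches, b, c_w, c_n) < mean_ft(m2, word_mismatches, b, c_w, c_n):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


B = 0.8             # hypothetical relative mismatch on nonword trials
HF, LF = 0.1, 0.3   # hypothetical relative mismatch on high- and low-frequency words

m_mixed = optimal_m_numeric([HF, LF], B)
m_hf = optimal_m_closed_form(1.0, 1.0, HF, B)
m_lf = optimal_m_closed_form(1.0, 1.0, LF, B)
print(f"match emphasis: blocked HF {m_hf:.3f}, mixed {m_mixed:.3f}, blocked LF {m_lf:.3f}")
print(f"HF hit FT: blocked {1 / (m_hf - HF):.2f} vs mixed {1 / (m_mixed - HF):.2f}")
print(f"LF hit FT: blocked {1 / (m_lf - LF):.2f} vs mixed {1 / (m_mixed - LF):.2f}")
```

With these values, match emphasis falls when high-frequency words are blocked and rises when low-frequency words are blocked, so blocking slows high-frequency hits and speeds low-frequency hits.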
Unfortunately, the prediction is a high-frequency blocking cost and a low-frequency blocking benefit. Canonical modeling begins with the simplest, broadest characterization of admissible models. Additional constraints are added when necessary and limit the set of admissible models. The blocking effects require an additional constraint on the modified canonical random-walk model. In the present case, the added constraint follows from a property of connectionist learning rules. As noted earlier, a natural property of connectionist learning rules is that raw, physical match evidence is converted more efficiently into match evidence for high- than for low-frequency words. If the balance between match and mismatch evidence is increased by increasing the effectiveness of raw match evidence, then the total evidence for high-frequency word trials is greater than the total evidence for low-frequency word trials. To us, this assumption seems more plausible than the alternative explanation that more rapid accumulation occurs for more frequent words because they have a higher proportion of raw, physical match evidence. Total evidence has an important qualitative effect on performance in the random-walk framework. The effective decision criterion for a given condition (e.g., hits) depends on the total evidence contributing to the accumulation process (see footnote 14). Increasing total evidence shifts the effective criterion toward the origin. Conceptually, increasing total evidence increases resolution of the match-mismatch balance: the law of large numbers comes into play more rapidly. This increased resolution effectively shifts the FT criterion toward the origin. In the blocked high-frequency versus mixed-frequency condition, there is an overall increase in total evidence on word trials. This shifts the effective hit FT criterion toward the origin, which makes hit FTs to high-frequency words faster in the blocked than in the mixed-frequency condition. In the blocked low-frequency versus mixed-frequency condition, there is an overall decrease in total evidence on word trials. This shifts the effective hit FT criterion away from the origin, which makes hit FTs to low-frequency words slower in the blocked than in the mixed condition. The change in effective criteria due to frequency blocking operates in the opposite direction from the change in effective accumulation rates.

What is the net effect? Begin by noting that FT is directly proportional to effective criterion (see footnote 12). Thus, the decrease in hit FTs for high-frequency words due to blocking's effect on effective criterion is roughly equal to the increase in hit FTs for low-frequency words. Recall that the increase in hit FTs for high-frequency words due to blocking's effect on effective accumulation rate is smaller than the decrease in hit FTs for low-frequency words. As a result, if the combined effects for blocked low-frequency words cancel, there should be a net blocking benefit for high-frequency words.

An interpretation of the low-frequency blocking benefit given pseudohomophone foils is feasible. The primary assumption is that pseudohomophones produce more total evidence than do legal nonwords because of an increase in effective mismatch evidence. We do not present this interpretation for several reasons.
First, it is fairly ad hoc and places very strong constraints on admissible models. Second, this interpretation is hard-pressed to account for the magnitude of the low-frequency blocking benefit, particularly given that pseudohomophone foils did not produce a large increase in the high-frequency blocking benefit. The use of qualitative predictions is not a license to ignore the magnitude of an effect. Differences in magnitude can be described qualitatively as a significant interaction. In the present study, the appropriate three-way interaction was only marginally significant. Given that this is a problem result, the conservative position is to take it seriously. Third, a single parameter is unlikely to account for all strategic control in word recognition. For reasons discussed earlier, the present analysis was held to the strong constraint of strategically adjusting a single parameter. Further empirical work is necessary before a second parameter can be placed under strategic control.

Directions for further empirical work are suggested by a generic prediction of the canonical random-walk models. In both the initial and modified canonical models, blocking effects depend on counteracting system properties. Factors that alter the balance between these properties should alter the balance of blocking effects (cf. Besner & Johnston, 1989). Indeed, it should be possible to obtain a blocking cost under the right conditions. Such a result would be awkward for a pathway selection approach. Although the low-frequency blocking benefit given pseudohomophone foils is problematic for random-walk models with a single parameter under strategic control, it does provide initial support for the generic prediction.

Canonical Modeling and Design Principles

Canonical modeling is one component of a pluralistic, levels-of-analysis approach. It is emphatically not meant to replace detailed modeling. Each type of modeling can benefit from the other. Canonical modeling is also symbiotic with another approach we have previously proposed: identification of design principles (Van Orden et al., 1990). The assumption of independent pathways is a design principle that holds across the pathway selection framework, regardless of the data to be accounted for. In parametric control frameworks, design principles must be tied more closely to empirical constraints. A design principle is a property of system performance that is abstracted away from details of implementation. Any system with that property satisfies that design principle. Thus, a design principle can be thought of as the defining property for a class of models. However, the class of models defined by a design principle differs in important ways from that defined by a canonical model. Consider the design principle of strategically controlled match emphasis. This principle can also be implemented in serial search verification (e.g., Becker, 1976, 1980; Paap et al., 1987). However, such models would not belong to the class of models defined by the modified canonical random-walk model. Design principles are an attempt to move away from the details of implementation toward higher order isomorphisms. Canonical models explore the implications of implementation constraints at levels of lower order isomorphism. Doing so suggests new design principles. For example, the top-down/bottom-up balance parameter of resonant verification (which is difficult to imagine in a traditional matching process) was mapped into a match emphasis parameter in the canonical random-walk model (which can easily be implemented in a traditional matching process).
The mapping of resonant verification into a lower order isomorphic framework (random walk) helped to identify a principle that could be implemented more broadly across models that assume a matching process. On the other hand, canonical modeling provides a means of discovering the implications of design principles when they are embodied in a given framework.

Which Approach Is Best?

Are canonical, parametric control models with interdependent constructs coordinated through higher order isomorphisms better than detailed, pathway selection models with independent, lower order isomorphic constructs? On the surface, pathway selection fared better than parametric control, in that the former accounted for the low-frequency blocking advantage whereas the latter did not. However, the pathway selection analysis required the addition of new constraints or new modules to detailed pathway selection models, whereas the parametric control model was held to a single (albeit higher order) parameter under strategic control and was constrained by a fuller pattern of data (e.g., errors and RTs). We believe that designating either approach as better is the wrong answer. The best approach is a pluralistic, levels-of-analysis approach that combines the strengths of each option (Stone, 1988). The choices just described involve several contrasts, and each choice on each contrast has advantages and disadvantages. We emphasize two related points. First, the analysis of random walk suggested that higher order isomorphisms allow treatment of interdependent constructs in terms of deeper, independent constructs. One is not forced to choose between strict independence of first-order isomorphic constructs and vague gestaltism. Independence remains an essential tool, maintained at deeper levels of analysis, while the benefits of interdependence are obtained at lower levels of analysis. Second, canonical modeling allows rigorous treatment of qualitative effects. Unfortunately, qualitative analysis is often viewed as a fuzzy retreat from mathematical rigor. In practice, this is often true. In principle, it need not be. Topology is a branch of mathematics concerned with qualitative properties and is as mathematically rigorous as analysis. Canonical modeling is an initial effort to develop a topologic approach to theory development. It seems to us there is great value in recognizing that a prediction derives from the signs of a function's derivatives, for example, rather than from the explicit formula for the function. The relationship between canonical and detailed modeling parallels, in many ways, the relationship between topology and analysis in mathematics.
For example, canonical modeling provides important constraints on admissible detailed models; conversely, analysis of more detailed models suggests qualitative properties that may prove important in canonical modeling.
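To make the point about derivative signs concrete, the following Python sketch compares two hypothetical decision-time functions T(a, v), where a stands for a response criterion (assumed to rise when nonword foils are more wordlike) and v for an evidence rate (assumed to be higher for high-frequency words). Neither formula is taken from this article, and the function names and constants are illustrative assumptions. The two formulas differ, but they share the same derivative signs (∂T/∂a > 0, ∂T/∂v < 0, ∂²T/∂a∂v < 0), and that shared sign pattern alone fixes the qualitative prediction: slower word responses and a larger frequency effect under the higher criterion.

```python
import math
from typing import Callable, Dict

# Hypothetical decision-time function T(a, v): criterion a, evidence rate v.
TimeFn = Callable[[float, float], float]


def t_ratio(a: float, v: float) -> float:
    """Illustrative formula 1 (assumed): T = a / v."""
    # dT/da = 1/v > 0; dT/dv = -a/v**2 < 0; d2T/(da dv) = -1/v**2 < 0
    return a / v


def t_log(a: float, v: float) -> float:
    """Illustrative formula 2 (assumed): T = 300 + 100 * ln(1 + a) / v."""
    # dT/da = 100/((1 + a) v) > 0; dT/dv = -100 ln(1 + a)/v**2 < 0 for a > 0;
    # d2T/(da dv) = -100/((1 + a) v**2) < 0
    return 300.0 + 100.0 * math.log1p(a) / v


def qualitative_pattern(t: TimeFn, a_low: float, a_high: float,
                        v_low: float, v_high: float) -> Dict[str, bool]:
    """Report which qualitative effects a decision-time function predicts."""

    def freq_effect(a: float) -> float:
        # Low-frequency minus high-frequency decision time at criterion a.
        return t(a, v_low) - t(a, v_high)

    return {
        "slower_under_higher_criterion": (
            t(a_high, v_low) > t(a_low, v_low)
            and t(a_high, v_high) > t(a_low, v_high)
        ),
        "high_frequency_advantage": freq_effect(a_low) > 0 and freq_effect(a_high) > 0,
        "larger_frequency_effect_under_higher_criterion": (
            freq_effect(a_high) > freq_effect(a_low)
        ),
    }


if __name__ == "__main__":
    # Assumed parameter values, chosen only for illustration.
    for fn in (t_ratio, t_log):
        print(fn.__name__, qualitative_pattern(fn, a_low=1.0, a_high=2.0,
                                               v_low=0.5, v_high=1.0))
```

The design choice is deliberate: because the frequency effect T(a, v_low) - T(a, v_high) has derivative T_a(a, v_low) - T_a(a, v_high) with respect to a, a negative mixed derivative is sufficient for the effect to grow with the criterion, whichever explicit formula is chosen.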

Strategy in the Future

We began this article by noting the dilemma that strategic control poses for the scientific study of cognition. Extensive use of sophisticated strategies allows for a system as complex as human cognition seems to be. However, ad hoc appeal to strategy hypotheses undermines the scientific enterprise. Strategic control is not alone in this dilemma. The stimuli confronting a cognitive system are also incredibly complex. There is room here as well for ad hoc appeals and proliferation of explanatory constructs. Knowing the distal stimulus does not necessarily imply we know the proximal stimulus. Perhaps this problem is more difficult than that of strategy for the scientific enterprise.

Pursuing strategy as a phenomenon to be understood, rather than avoided, has several advantages. First, when stimuli differ, we cannot be sure of the extent to which observed differences in performance are due to stimulus-driven effects as opposed to strategically driven effects in response to those stimulus-driven differences. With ideal strategy manipulations, differences in performance must be attributed to differences in the processing system. Thus the principled investigation of ideal strategy manipulations is a conservative enterprise. Ironically, it is also a move away from the passive, stimulus-driven reductionism of classical behaviorism (which was also conservative in intention). Second, many of the complexities and apparent conflicts in the empirical literature may result from strategy differences that have received little or no theoretical consideration. Rigorous treatments of strategy may, in the long run, simplify rather than complicate the big picture (Stone & Van Orden, 1992). Consider how little information about the nonword foils is given in many lexical decision studies. Yet, the present studies suggest that choice of nonword foil can interact with other factors that influence performance on word trials. Finally, strategic control may be the best vehicle for discovering higher order isomorphisms. Empirical complexities attributed to the complexity of the stimulus world promote lower order isomorphisms between data and theory. Empirical complexities that result from ideal strategy manipulations force an unparsimonious proliferation of lower order isomorphisms or a serious effort to develop powerful, parsimonious higher order isomorphisms.
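A minimal random-walk sketch, in the spirit of (but not identical to) the random-walk framework applied in this article, shows how an ideal strategy manipulation can change performance on word trials that are themselves unchanged. It assumes, purely for illustration, a simple plus-or-minus-one random walk that starts at 0 and stops at symmetric boundaries; the step probabilities (0.60 for high-frequency words, 0.55 for low-frequency words), the boundary values, and all function names are hypothetical. Only the boundary differs between conditions, standing in for a criterion assumed to be set higher when nonword foils are more wordlike.

```python
import random
from statistics import mean


def walk_length(p_up: float, boundary: int, rng: random.Random) -> int:
    """Steps until a walk starting at 0 reaches +boundary or -boundary."""
    position, steps = 0, 0
    while abs(position) < boundary:
        # Step toward the "word" boundary with probability p_up (assumed).
        position += 1 if rng.random() < p_up else -1
        steps += 1
    return steps


def mean_steps(p_up: float, boundary: int, n_trials: int, seed: int) -> float:
    """Mean decision time (in steps) over all trials, whichever boundary is hit."""
    rng = random.Random(seed)
    return mean(walk_length(p_up, boundary, rng) for _ in range(n_trials))


def frequency_effect(boundary: int, p_high: float = 0.60, p_low: float = 0.55,
                     n_trials: int = 5000, seed: int = 1) -> float:
    """Low- minus high-frequency mean decision time for one boundary setting."""
    return (mean_steps(p_low, boundary, n_trials, seed)
            - mean_steps(p_high, boundary, n_trials, seed + 1))


if __name__ == "__main__":
    # Hypothetical boundaries: 4 = less wordlike foils, 8 = more wordlike foils.
    for boundary in (4, 8):
        print(boundary, round(frequency_effect(boundary), 2))
```

Running the script prints the simulated frequency effect, in steps, for each boundary. With these assumed parameters the effect should be markedly larger for the wider boundary, mirroring the qualitative pattern (not the magnitudes) reported for word trials, even though the word "stimuli" (step probabilities) never change.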

References

Allport, D. A. (1977). On knowing the meaning of words we are unable to report: The effects of visual masking. In S. Dornic (Ed.), Attention and performance VI (pp. 505-533). San Diego, CA: Academic Press.
Andrews, S. (1982). Phonological recoding: Is the regularity effect consistent? Memory & Cognition, 10, 565-575.
Balota, D. A., & Chumbley, J. (1984). Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage. Journal of Experimental Psychology: Human Perception and Performance, 10, 340-357.
Becker, C. A. (1976). Allocation of attention during visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 2, 556-566.
Becker, C. A. (1980). Semantic context effects in visual word recognition: An analysis of semantic strategies. Memory & Cognition, 8, 493-512.
Besner, D. (1984). Specialized processors subserving visual word recognition: Evidence for local control. Canadian Journal of Psychology, 38, 94-101.
Besner, D., & Johnston, J. C. (1989). Reading and the mental lexicon: On the uptake of visual information. In W. Marslen-Wilson (Ed.), Lexical representation and process (pp. 291-316). Cambridge, MA: MIT Press.
Brown, G. D. A. (1987). Resolving inconsistency: A computational model of word naming. Journal of Memory and Language, 26, 1-23.
Bub, D., Cancelliere, A., & Kertesz, A. (1985). Whole-word and analytic translation of spelling to sound in a non-semantic reader. In K. E. Patterson, J. C. Marshall, & M. Coltheart (Eds.), Surface dyslexia: Neuropsychological and cognitive studies of phonological reading (pp. 15-34). Hillsdale, NJ: Erlbaum.
Carpenter, G., & Grossberg, S. (1986). Neural dynamics of category learning and recognition: Attention, memory consolidation, and amnesia. In S. Grossberg (Ed.), The adaptive brain I (pp. 238-286). Amsterdam: North-Holland.
Carr, T. H., Davidson, B. J., & Hawkins, H. L. (1978). Perceptual flexibility in word recognition: Strategies affect orthographic computation but not lexical access. Journal of Experimental Psychology: Human Perception and Performance, 4, 674-690.
Carr, T. H., & Pollatsek, A. (1985). Recognizing printed words: A look at current models. In D. Besner, T. G. Waller, & G. E. MacKinnon (Eds.), Reading research: Advances in theory and practice (Vol. 5, pp. 1-82). San Diego, CA: Academic Press.
Cheeseman, J., & Merikle, P. M. (1986). Distinguishing conscious from unconscious perceptual processes. Canadian Journal of Psychology, 40, 343-367.
Chomsky, N. (1959). A review of B. F. Skinner's Verbal Behavior. Language, 35, 26-58.
Coltheart, M., Davelaar, E., Jonasson, J., & Besner, D. (1977). Access to the internal lexicon. In S. Dornic (Ed.), Attention and performance VI (pp. 535-555). Hillsdale, NJ: Erlbaum.
Coltheart, V., Laxon, V., Rickard, M., & Elton, C. (1988). Phonological recoding in reading for meaning by adults and children. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 387-397.
Davelaar, E., Coltheart, M., Besner, D., & Jonasson, J. T. (1978). Phonological recoding and lexical access. Memory & Cognition, 6, 391-402.
Doctor, E. A., & Coltheart, M. (1980). Children's use of phonologic encoding when reading for meaning. Memory & Cognition, 8, 195-209.
Fodor, J. A. (1983). Modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.
Forster, K. I. (1987). Form-priming with masked primes: The best match hypothesis. In M. Coltheart (Ed.), Attention and performance XII (pp. 127-146). Hillsdale, NJ: Erlbaum.
Glanzer, M., & Ehrenreich, S. L. (1979). Structure and search of the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 18, 381-398.
Gordon, B. (1983). Lexical access and lexical decision: Mechanisms of frequency sensitivity. Journal of Verbal Learning and Verbal Behavior, 22, 24-44.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.
Grossberg, S., & Stone, G. O. (1986). Neural dynamics of word recognition and recall: Attentional priming, learning, and resonance. Psychological Review, 93, 46-74.
Healy, A. F. (1976). Detection errors on the word the: Evidence for reading units larger than letters. Journal of Experimental Psychology: Human Perception and Performance, 2, 235-242.
Healy, A. F. (1981). The effects of visual similarity on proofreading for misspellings. Memory & Cognition, 9, 453-460.
James, C. T. (1975). The role of semantic information in lexical decisions. Journal of Experimental Psychology: Human Perception and Performance, 1, 130-136.
James, W. (1890). The principles of psychology. New York: Henry Holt.
Jared, D., & Seidenberg, M. S. (1991). Does word identification in reading proceed from spelling to sound to meaning? Journal of Experimental Psychology: General, 120, 358-394.
Kahneman, D., & Treisman, A. M. (1983). Changing views of attention and automaticity. In R. Parasuraman, R. Davies, & J. Beatty (Eds.), Varieties of attention (pp. 29-61). San Diego, CA: Academic Press.
Kroll, J. F., & Merves, J. S. (1986). Lexical access for concrete and abstract words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 92-107.
Krueger, L. E. (1978). A theory of perceptual matching. Psychological Review, 85, 278-304.
Kucera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Lakatos, I. (1970). Falsification and the methodology of scientific research programmes. In I. Lakatos & A. Musgrave (Eds.), Criticism and the growth of knowledge (pp. 91-195). Cambridge, England: Cambridge University Press.
Lakoff, G. (1987). Women, fire, and dangerous things: What categories reveal about the mind. Chicago: University of Chicago Press.
Lukatela, G., & Turvey, M. T. (1993). Similar attentional, frequency, and associative effects for pseudohomophones and words. Journal of Experimental Psychology: Human Perception and Performance, 19, 166-178.
Massaro, D., Taylor, G. A., Venezky, R. L., Jastrembski, J. E., & Lucas, P. A. (1980). Letter and word perception: Orthographic and visual processing in reading. Amsterdam: North-Holland.
McClelland, J. L. (1979). On the time relations of mental processes: An examination of systems of processes in cascade. Psychological Review, 86, 287-330.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.
McCusker, L. X., Hillinger, M. L., & Bias, R. G. (1981). Phonological recoding and reading. Psychological Bulletin, 89, 217-245.
McQuade, D. V. (1981). Variable reliance on phonologic information in visual word recognition. Language and Speech, 24, 99-109.
Miller, J. O. (1981). Global precedence in attention and decision. Journal of Experimental Psychology: Human Perception and Performance, 7, 1161-1174.
Morris, W. (Ed.). (1976). The American Heritage dictionary of the English language. Boston: Houghton Mifflin.
Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.
Navon, D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive Psychology, 9, 353-383.
Neisser, U. (1976). Cognition and reality: Principles and implications of cognitive psychology. San Francisco: Freeman.
Norman, D. A., & Bobrow, D. G. (1975). On data-limited and resource-limited processes. Cognitive Psychology, 7, 44-64.
Paap, K. R., McDonald, J. E., Schvaneveldt, R. W., & Noel, R. W. (1987). Frequency and pronounceability in visually presented naming and lexical decision tasks. In M. Coltheart (Ed.), Attention and performance XII (pp. 221-243). Hillsdale, NJ: Erlbaum.
Paap, K. R., & Noel, R. W. (1991). Dual-route models of print to sound: Still a good horse race. Psychological Research, 55, 13-24.
Patterson, K., & Coltheart, V. (1987). Phonological processes in reading: A tutorial review. In M. Coltheart (Ed.), Attention and performance XII (pp. 421-447). Hillsdale, NJ: Erlbaum.
Patterson, K., Seidenberg, M. S., & McClelland, J. L. (1989). Connections and disconnections: Acquired dyslexia in a computational model of reading processes. In R. G. M. Morris (Ed.), Parallel distributed processing: Implications for psychology and neurobiology (pp. 131-181). Oxford, England: Oxford University Press.
Popper, K. R. (1959). The logic of scientific discovery. New York: Basic Books.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59-108.
Rosson, M. B. (1985). The interaction of pronunciation rules and lexical representations in reading aloud. Memory & Cognition, 13, 90-99.
Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60-94.
Saffran, E. M. (1985). Lexicalization and reading performance in surface dyslexia. In K. E. Patterson, J. C. Marshall, & M. Coltheart (Eds.), Surface dyslexia: Neuropsychological and cognitive studies of phonological reading (pp. 53-71). Hillsdale, NJ: Erlbaum.
Seidenberg, M. S. (1985). The time course of phonological code activation in two writing systems. Cognition, 19, 1-30.
Seidenberg, M. S., Waters, G. S., Barnes, M. A., & Tanenhaus, M. K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383-404.
Shepard, R. N. (1984). Ecological constraints on internal representation: Resonant kinematics of perceiving, imagining, thinking, and dreaming. Psychological Review, 91, 417-447.
Shulman, H. G., & Davison, T. C. B. (1977). Control properties of semantic coding in a lexical decision task. Journal of Verbal Learning and Verbal Behavior, 16, 91-98.
Shulman, H. G., Hornak, R., & Sanders, E. (1978). The effect of graphemic, phonetic, and semantic relationships on access to lexical structures. Memory & Cognition, 6, 115-123.
Small, S. L., Cottrell, G. W., & Tanenhaus, M. K. (Eds.). (1988). Lexical ambiguity resolution. San Mateo, CA: Morgan Kaufmann.
Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. Acta Psychologica, 30, 276-315.
Stone, G. O. (1988). From data to dynamics: The use of multiple levels of analysis. Behavioral and Brain Sciences, 11, 54-55.
Stone, G. O., & Van Orden, G. C. (1989). Are words represented by nodes? Memory & Cognition, 17, 511-524.
Stone, G. O., & Van Orden, G. C. (1992). Resolving empirical inconsistencies concerning priming, frequency, and nonword foils in lexical decision. Language and Speech, 35, 295-324.
Suppe, F. (Ed.). (1977). The structure of scientific theories. Urbana: University of Illinois Press.
Van Orden, G. C. (1987). A ROWS is a ROSE: Spelling, sound, and reading. Memory & Cognition, 15, 181-198.
Van Orden, G. C., Johnston, J. C., & Hale, B. L. (1988). Word identification in reading proceeds from spelling to sound to meaning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 371-385.
Van Orden, G. C., Pennington, B. F., & Stone, G. O. (1990). Word identification in reading and the promise of subsymbolic psycholinguistics. Psychological Review, 97, 488-522.
Venezky, R. L. (1970). The structure of English orthography. The Hague, The Netherlands: Mouton.
Weber, R. M. (1970). A linguistic analysis of first-grade reading errors. Reading Research Quarterly, 5, 427-451.
Wijk, A. (1966). Rules of pronunciation for the English language. Oxford, England: Oxford University Press.

Appendix

Table A1
Stimuli for Experiments 1 and 3

High-frequency words
ABOVE CAUSE CHILD CLASS CLOSE COLOR DAILY DRIVE EARLY FINAL GREAT HOTEL HOURS HOUSE HUMAN LABOR LARGE LATER LIGHT MAJOR MONEY MONTH MUSIC NIGHT OFTEN PAPER PARTY PLACE PLANE PRICE RADIO RIGHT SHORT SMALL SOUND SOUTH START STATE STUDY TABLE THING THINK TODAY TOTAL VOICE WATER WHITE WOMAN WORLD YOUNG

Low-frequency words
APRON BEANS BITCH BLINK CANON CARVE CHEER CHESS CLOWN COBRA COMIC CROOK CRUMB CRUSH DENSE DIARY DUMMY FLOAT FREAK GRAPE HOBBY MARSH MUNCH OLIVE OUNCE PEACH PEARL PLUMP POLAR PUNCH QUART ROAST RULER RUMOR SALTY SCARE SHOUT SKULL SLASH SNORT SPICE SPOIL SPOON STINK THORN TORCH TROUT VALVE WAVER WRECK

Pseudohomophone foils
AMBUR AMUNG ARROE ATTIK BEEST BLAIM BLEEK BRANE BURTH BYCEP CHEEF CLEEN CRAIN CRUES DAIZE DANSE DRANE DREEM ELBOE FAWLT FEEST FRALE GAYNE GLEWS GRANE GREAK GREAN HAYLE HERTS IDEEL JAIDE JOOCE JOYNT KNEAL KNYFE KURSE KWICK LEESH LEMUN LERCH LOGIK LOWDE MAGIK MELUN METUL NEEZE NERSE NOIZE NURVE OXYDE PAGUN PANIK PANZY PEECE PERSE PLEEZ POIZE PROZE PRUFE PURCH RELIK ROOTE RUBER RUFEL SAYLE SCAIL SHAIR SHEAP SHURT SKALP SKARF SLEAP SLEAT SLITE SNEEK SNEEZ SNEWS SPANE SPEEK STAIL STOAR SURVE TAYLE TEATH TEAZE TEECH THURD TIKEL TONIK TOPIK TOWLE TRALE TRATE TREET TROAL UNDEW VAYNE VURSE WHEET WIRTH

Legal nonword foils
AMBUN OMUNG ARROT ATTIF DEEST BLAIT BLEES BRANK SURTH BYCEM CHEEN CLEEB CRAIL CRUEB TAIZE SANSE DRANZ DREEB ELBOT DAWLT VEEST FRALP HAYNE PLEWS FRANE GREAL GREAB LAYLE VERTS IDEEK BAIDE KOOCE ROYNT KNEAM KNAFE GURSE AWICK LEESK LEMUB MERCH BOGIK GOWDE RAGIK MELUF METUM REEZE DERSE SOIZE TURVE EXYDE PAGUL PANIF MANZY HEECE BERSE BLEEZ WOIZE DROZE PRUFT GURCH RELIN POOTE CUBER RUFEN ZAYLE SCAID SHAIG SHEAM SHURD SKALN SKART PLEAP SLEAM CLITE SNEEB SNEEF SNEWE SPANT SPEEN STAIM STAOB BURVE TAYLT FEATH DEAZE GEECH THURN TIKEM TONIB TOPIT MOWLE PRALE BRATE TREEN TROAB UNDEK NAYNE MURSE WHEEB KIRTH

Illegal nonword foils
MRUAB NMUGA AEORR KITTIA BTESE IBMLA EEKBL ENRBA RBUHT YBPCE FCEHE NLEEC NCIRA UCSRE EAIDZ DSNAE NDREA EMRDE OEELB WFATL SFETE AFLRE YGNAE WESEGL RNEAG RKEGA NGARE AEYLH TSERH IELDE DEAD OOEJC NTYOJ NLKEA KFYNE SREKU KIKCW EHLES LMNUE CHULR LKOIG DLOEW AIMGK UMELN MTULE NZEEE ENRSE NEOIZ NVRUE XODYE PNAUG NKAPI NZYPA EEECP PSREE ZLEPE OPEZI PEOZR RFPUE PRCHU ERLKI ROEOT RBRUE UERFL AEYSL SLCAI HRISA APSEH HSUTR PKASL RFASK AESLP LSETA TLSEI ESNKE SNZEE SNSEW SPNAE EPSKE LSATI OASRT VUSRE LYAET THTAE EETZA CTEHE RHUTD ITLKE TNKIO PKIOT WLTEO LREAT AERTT RTEET TRLOA UWEDN EYAVN RSEUV HWTEE IWRTH

Table A2
Stimuli for Experiment 2

High-frequency irregular words
ANSWER BLOOD BREAK BROAD BUILD CHARACTER COME DESIGN EIGHT FIND FOOT GONE HAVE ISLAND KEY KIND LOVE MIND MOVE NONE OWN POLICE SIGN SON SURE THOUGHT THROUGH TOUCH WERE WHOSE

Low-frequency irregular words
ACHE AISLE ALIGN BEAD BENIGN BOUGH BRUISE BURY COUGH COUP FOWL GHOUL GNAW GRIND HEIR ISLE MILD MOWN PEAR RESIGN SEIZE SEW SLEIGHT SOOT SWORD TOW TREACHERY WOMB WROUGHT YACHT

Spelling-controlled pseudohomophone foils
ALMUND KORN ELBOE HEET LOTTARY PROZE SLEAT TYME BEAF COTTUN ERRUR JARGEN MAYER RANDUM SNEEK TREET BURTH KASHEW CLEEN BERCH DEFEET DURBY DURT CULPRET FRUM FALCE PEER FURST LEESH LEEST JEAP JIRK NURVE PANZY PARRIT NEET SALERY SKORE SURVE ROZE SWERL SPEEK SPEER SOOP WHEET WHEAL COMIDY DOLLER GALLEN LYNE PENSIL SHEAT TALANT CONSERN DOAR GREAN LYKE PHAZE SHURT THEEF

Spelling-controlled legal nonword foils
ARMOND CIRN ELSOW HEST LOSTERY PROGE SPEET TICE BELF COPTON ERLOR JARSON MABOR RASDOM SNERK TREST CASPEW CREAN BIRSH BORCH DELBY DILT CALPRIT DEFEST FIRAT FRIM FLAR FALPE LEATH LEALT JECK JELP PARMOT PUNSY NERPE NENT SORE SERKE SALALY ROIE SWARL SPERK SLEAR SOLP WHEES WHERT COMERY DELLAR GELLON LIBE PENCIL SHEST TAMENT CONTERN JOOR GHEEN LIWE PHIZE SLIRT THAIF

Received October 30, 1989 Revision received September 29, 1992 Accepted September 22, 1992