Perception & Psychophysics 1993, 53 (6), 601-616
Testing the speech unit hypothesis with the primed matching task: Phoneme categories are perceptually basic

STEF DECOENE
Katholieke Universiteit Leuven, Leuven, Belgium

The basic speech unit (phoneme or syllable) problem was investigated with the primed matching task. In primed matching, subjects have to decide whether the elements of stimulus pairs are the same or different. The prime should facilitate matching insofar as its representation is similar to the stimuli to be matched. If stimulus representations generate graded structure, with stimulus instances being more or less prototypical for the category, priming should interact with prototypicality, because prototypical instances are more similar to the activated category than are low-prototypical instances. Rosch (1975a, 1975b) showed that, by varying the matching criterion (matching for physical identity or for membership in the same category), the specific patterns of the priming x prototypicality interaction could differentiate perceptually based from abstract categories. By testing this pattern for phoneme and syllable categories, the abstraction level of these categories can be studied. After reliable prototypicality effects were found for both phoneme and syllable categories (Experiments 1 and 2), primed phoneme matching (Experiments 3 and 4) and primed syllable matching (Experiments 5 and 6) were used under both physical identity instructions and same-category instructions. The results make clear that phoneme categories are represented on the basis of perceptual information, whereas syllable representations are more abstract. The phoneme category can thus be identified as the basic speech unit. Implications for phoneme and syllable representation are discussed.

What is the object to which speech perception processes respond? What are the fundamental units on which these processes operate? These questions are of central importance in speech science.
Knowing what the basic processing unit in speech perception is has important implications for the study of speech segmentation and invariance. Moreover, the development of a coherent, computational theory of speech perception presupposes an empirically supported decision about the order of the representations that result from speech perception processes. Several such units have been proposed, but most research has focused on phonemes and syllables. Until now, however, no clear consensus on the speech unit hypothesis has been reached. I will argue that this is due largely to the use of an unsuitable experimental procedure: the monitoring task. In the research reported here, the primed matching task is proposed and used to investigate the speech unit hypothesis, because it allows a direct test of the status of the phoneme and the syllable as candidate speech units.

The research reported here is part of a doctoral dissertation project (supervisor: G. Govaerts). Thanks go to G. van der Steene for making available PCIATs; N. Bovens, R. Delabastida, and M. Lenaerts for technical assistance; and J. Kingston, J. Baele, and an anonymous reviewer for useful remarks. I am grateful to Tine Daeseleire for lending me her voice and to Bart Defoor and Jo Baele for help in running Experiment 2. Correspondence should be addressed to S. Decoene, Centrum voor Matematische Psychologie en Psychologische Metodologie, Katholieke Universiteit, Tiensestraat 102, 3-000 Leuven, Belgium (e-mail: [email protected]).
Copyright 1993 Psychonomic Society, Inc.

THE SPEECH UNIT HYPOTHESIS

The impetus for the investigation of the speech unit hypothesis came from a simple experiment. Savin and Bever (1970) had subjects monitor a sequence of meaningless syllables for either a complete syllable or a phoneme (the syllable-initial consonant or the medial vowel). Phoneme monitoring was significantly slower than syllable monitoring. Savin and Bever concluded that phonemes are identified after the syllables of which they are parts, because phonemes are not perceptual but abstract, nonsensory entities. However, Foss and Swinney (1973) showed that response times for monitoring two-syllable words were faster than those for word-initial syllables, and that syllables were responded to faster than were word-initial phonemes. Instead of arguing that syllables are more abstract than words, they denied that the monitoring task could be used to divide perceptual units into "abstract" and "real" ones. Distinguishing between perception and identification, Foss and Swinney argued that the monitoring task does not tap a processing level corresponding to immediate perception: The observed ordering of response times reflects the order in which linguistic units are (consciously) identified rather than the order in which they are perceived. Although McNeill and Lindig (1973) subsequently showed that monitoring differences between linguistic segments depended on the match or mismatch between target and search list, the monitoring task was not abandoned as an inappropriate method for investigating the speech unit hypothesis. On the contrary, it was used again and again, and various variables or artifacts were found to influence response times. These included the identifiability of phonemes (Healy & Cutting, 1976), listeners' expectancy of target vowel context as determined by the match or mismatch between target and stimulus context (Mills, 1980a, 1980b), uncertainty of vowel contexts (Swinney & Prather, 1980), the interaction between segmental and suprasegmental cues during ongoing perception (Meltzer, Martin, Mills, Imhoff, & Zohar, 1976), meaningfulness of stimuli (Rubin, Turvey, & van Gelder, 1976), the syllabic relationship between target and stimulus word (Mehler, Dommergues, Frauenfelder, & Segui, 1981; Segui, 1983), methodological confusion between syllable and phoneme monitoring (Mehler, Segui, & Frauenfelder, 1981), and the presence of foils (Norris & Cutler, 1988). Every possible conclusion has been drawn from these experiments: The phoneme is the speech unit (Norris & Cutler, 1988), the syllable is the speech unit (Mehler, Dommergues, et al., 1981; Mehler, Segui, & Frauenfelder, 1981), the syllable and the phoneme are equally basic in speech perception (Healy & Cutting, 1976), and the monitoring task is inappropriate for studying the speech unit hypothesis (Mills, 1980a, 1980b; Rubin et al., 1976; Swinney & Prather, 1980).

Wood and Day (1975) used a two-choice speeded-classification task in which subjects had to identify either the consonant or the vowel of a monosyllable in which the irrelevant segment was either varied or held constant. They assumed that variation in the irrelevant segment would not affect response times unless it occurred within the same minimal perceptual unit as the attended segment.
They found that response times were faster when the unattended segment was held constant, and so concluded that the syllable was the minimal perceptual speech unit. However, Shand (1976) subsequently showed that variation of the unattended first syllable of a bisyllabic stimulus also affected response times to the consonant or vowel of the second syllable. Consequently, he concluded that Wood and Day's (1975) assumption was wrong and that monitoring in a selective attention task is not suited to investigating the speech unit hypothesis.

It should be stressed that the preceding discussion does not imply that the monitoring task is useless. Recent research has made it clear that the monitoring task is a powerful method for studying, for example, lexical processes (e.g., Cutler, Mehler, Norris, & Segui, 1987; Foss & Blank, 1980; Frauenfelder & Segui, 1989; Frauenfelder, Segui, & Dijkstra, 1990). My point is more modest: The monitoring task is not sensitive to the potential ordering (from concrete to abstract) of the representations available during speech perception and is thus inappropriate for investigating the speech unit hypothesis.

Recently, the interpretation of this body of data has become even more complicated. In an interesting series of
monitoring studies, Cutler and colleagues (Cutler, Mehler, Norris, & Segui, 1983, 1986, 1989) obtained language-specific syllable monitoring effects. French, which has a simple syllable structure, yielded clear syllabic segmentation, whereas English, which has unclear syllabic boundaries due to ambisyllabic segments, showed no syllabic segmentation.¹ But how can language-specific segmentation strategies be related to any systematicity in speech production or to any uniform representational format in speech perception? Cutler and Norris (1985) argued that it is not the classification of speech objects but the segmentation process itself that should be made the central topic: Speech perception processes are designed to "utilize any information which can be extracted (either directly or indirectly) from the signal to determine where lexical boundaries lie" (p. 698). That is, the results of Cutler et al. (1983, 1986, 1989) make sense only if attention is shifted from speech perception to lexical representation and, more specifically, to the unit of lexical access. One can accept this solution on practical grounds, but two points should be made explicit. First, the core of their solution is the definition of segmentation as lexical boundary detection, which extends the term's original meaning in speech perception research: the detection of segmental boundaries in the acoustic signal, that is, the detection of speech objects. Second, it supports the methodological moral already drawn by Foss and Swinney (1973): The monitoring task is not designed to investigate the nature of speech object representation, and consequently it cannot be used to test the abstraction level of a possible speech perception object representation.

Although the monitoring task has been the predominant method for studying the speech unit hypothesis, some data do exist that are not based on it.
Starting from the concept of a preperceptual auditory image and the time it is available for processing, Massaro (1972, 1974) used a recognition-masking task and found that the processing of speech stimuli occurs within 200-250 msec after stimulus presentation, a duration of roughly syllable length. However, because this represents an upper bound on speech unit length, smaller units, such as demisyllables (proposed by Samuel, 1989, on the basis of a selective adaptation study) or phonemes, would also fit. As a result of these diverging data, a certain relativism concerning the speech unit hypothesis has been growing (e.g., Jusczyk, 1986; Pisoni & Luce, 1987). According to this view, there is not one unit but many, depending on the level of processing during speech perception. Such relativism assumes that, since there are many levels, the search for the basic speech unit is misguided. This assumption is wrong, because the speech unit hypothesis does not ask whether, for example, phoneme categories are psychologically real, that is, present at some level of processing. There is a large amount of evidence in favor of segmental representations in language processing (see, e.g., Pisoni & Luce, 1987, for a review). Nor is the speech unit hypothesis concerned with the question of how the lexicon is accessed.
It is not necessarily the case that, for example, because the syllable is the basic speech perception unit, lexical contact representations must also be syllables. In my view, the assumption that the speech unit and the lexical contact representation are the same is an unfortunate one. The speech unit problem is a topic for speech perception research, not for the study of word recognition and on-line language comprehension. Essentially, the speech unit hypothesis asks which representation, among all the potential representations that can be (and effectively are) used during speech perception (phoneme category, syllable category, word, etc.), best describes the basic "building blocks" of speech. The category representation of these basic building blocks will be directly perceptually based, because it is directly determined by the acoustic information in the signal. And since the abstraction level of mental categories is assumed to be closely related to the degree to which their representational content is directly perceptually determined, the mental representation corresponding to the basic speech segment should be the most concrete mental category. Given the fundamental importance of the speech unit hypothesis for speech perception theories, one should search for an experimental procedure that allows a more direct investigation of this problem, instead of shifting attention to lexical processing. Such a procedure should fit the logic of the speech unit hypothesis, which centers on the abstraction level of representations. In the following section, I will discuss the primed matching task as a procedure that fits this logic.

Before I discuss the primed matching task, a brief note on terminology is in order. In the subsequent discussion, I will use the terms phoneme category and syllable category, instead of phonetic segment and unit of roughly syllabic length.
In this way, I want to indicate that the focus of this research is on how these two entities are mentally represented, and not on whether, for example, speech perception entails phonetic processing, central auditory processing, or both. That is, the topic of the research reported here is speech representation, not speech processing. Processing issues are important, but they are not the focus here.
THE PRIMED MATCHING TASK: A METHOD FOR DIFFERENTIATING CATEGORIES AT VARIOUS LEVELS OF ABSTRACTION

Mental categories can be placed on a continuum from concrete to abstract, from entirely perceptually defined to entirely conceptual (Medin & Barsalou, 1987; Rosch, 1975a, 1975b): The color blue is a perceptually based category, whereas the concept "freedom" has a purely abstract characterization. Although perceptual and conceptual categories are highly similar (see Medin & Barsalou, 1987, for an excellent discussion), an interesting finding obtained with the primed matching task, discussed in Rosch (1975a, 1975b), suggests that perceptual and more abstract categories can be differentiated.
The Primed Matching Task

In the primed matching task (Beller, 1971), subjects are asked to decide as rapidly as possible whether or not the elements of a stimulus pair are the same. The priming stimulus preceding the stimulus pair should facilitate matching insofar as the category representation it activates is similar to the stimuli to be matched. If the stimuli are instances of a category that generates a prototypicality effect, the prime will activate a category representation that is more similar to a good example than to a poor example of the category. The prime will therefore facilitate the matching of good-example pairs more than the matching of poor-example pairs, and primed matching will result in a priming x prototypicality level interaction (henceforth, a P x P interaction). Moreover, subjects can be induced to match stimuli by different criteria: they can be instructed to judge the stimuli of a pair as the same if they are physically identical, or as the same if they belong to the same category. The latter criterion allows subjects to use any available information; the former restricts them to direct, low-level perceptual information. Rosch (1975a, 1975b) showed that applying the priming logic to this variation in matching instructions makes it possible to differentiate categories based on perceptual information from categories described by abstract, nonperceptual information, provided that the categories have graded structure. If subjects have to match stimuli under physical identity instructions, the prime will be effective only if the activated category representation contains low-level perceptual information that enables faster matching (resulting in a P x P interaction).
If matching has to be done under same-category instructions, different predictions concerning a P x P interaction are possible, depending on the content of the category representation (e.g., entirely perceptually determined, entirely based on abstract information, or based on a mixture of perceptual and more abstract information). For example, if the prime activates only perceptual information, then same-category pairs will not show a P x P interaction under same-category instructions, because the members of such pairs do not match with respect to the activated low-level perceptual information.

The research of Rosch (1975a, 1975b) showed the strength of the primed matching task. Rosch (1975b, Experiment 1) found a P x P interaction for matching physically identical instances of color categories under physical identity instructions. In a complementary experiment (Rosch, 1975b, Experiment 3), matching under same-category instructions yielded a significant P x P interaction for physically identical color pairs but no P x P interaction for same-category pairs. Within the primed matching logic, these results strongly supported a graded structure representation for color categories, with the representation based primarily on perceptual information. An entirely different pattern was found when the representations generated by superordinate semantic category names were studied (e.g., "chair"). Under physical identity instructions (Rosch, 1975a, Experiment 5), no P x P interaction was found for physically identical pairs. Under same-category instructions (Rosch, 1975a, Experiment 2), a significant P x P interaction was observed for physically identical pairs, but not for same-category pairs. From this pattern of results, Rosch (1975a) inferred that the mental representation of semantic categories is based on a more abstract (semantic) code than is, for example, that of color categories.

The primed matching task, then, promises to be better suited than the phoneme/syllable monitoring task for investigating the abstraction level of representations in speech perception. Rosch (1975a, pp. 226-227) explicitly pointed out that the primed matching task made possible "a detailed investigation of the nature of the mental representations generated by category terms at any level of abstraction." The task does not depend on comparing average response times to phoneme or syllable categories, as phoneme/syllable monitoring does. Instead, it compares a specific pattern of effects across experimental manipulations. Moreover, the task focuses directly on the representation of phoneme and syllable categories and thus avoids possibly confounding the representation of speech with its on-line perceptual processing. The primed matching task does not test which unit is made (consciously) available first, but what the content of a category representation is (perceptually determined or not).
No confusion between perceptual processes (which are consciously inaccessible) and conscious identification (Foss & Swinney, 1973) is possible.
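The priming logic can be made concrete with a 2 x 2 table of mean response times for one combination of instructions and pair type. The Python sketch below only illustrates that logic. It is not the analysis used in these experiments, and it rests on several assumptions. Cell means are taken as given. The priming effect is defined as unprimed minus primed response time, so positive values indicate facilitation. The P x P interaction contrast is the priming effect for good-example pairs minus the priming effect for poor-example pairs. The class and function names and all response times are hypothetical, and whether a given contrast is reliable would still have to be tested, for example by an analysis of variance over subjects.

```python
from dataclasses import dataclass


@dataclass
class CellMeans:
    """Hypothetical mean RTs (msec) for one instruction x pair-type condition."""
    primed_good: float
    unprimed_good: float
    primed_poor: float
    unprimed_poor: float


def priming_effect(primed: float, unprimed: float) -> float:
    """Unprimed minus primed RT: positive = facilitation, negative = inhibition."""
    return unprimed - primed


def pxp_contrast(c: CellMeans) -> float:
    """Priming effect on good-example pairs minus priming effect on poor-example pairs.

    A value near zero means that priming did not interact with prototypicality.
    """
    good = priming_effect(c.primed_good, c.unprimed_good)
    poor = priming_effect(c.primed_poor, c.unprimed_poor)
    return good - poor


def goodness_main_effect(c: CellMeans) -> float:
    """Mean RT for poor-example pairs minus mean RT for good-example pairs, ignoring priming."""
    poor = (c.primed_poor + c.unprimed_poor) / 2
    good = (c.primed_good + c.unprimed_good) / 2
    return poor - good


if __name__ == "__main__":
    # Made-up cell means: the prime speeds good-example pairs and slows poor-example pairs.
    cells = CellMeans(primed_good=640.0, unprimed_good=700.0,
                      primed_poor=730.0, unprimed_poor=720.0)
    print(pxp_contrast(cells))
    print(goodness_main_effect(cells))
```

With these invented values the contrast is 70 msec: 60 msec of facilitation for good-example pairs against 10 msec of inhibition for poor-example pairs. Keeping the interaction contrast separate from the goodness main effect matters because the two can dissociate. A slower overall response to poor examples does not by itself show that the prime acts differently on good and poor examples.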
Phoneme and Syllable Prototypicality Effects

As described above, the crucial prerequisite for using the primed matching task is that phonemes and syllables be categories that generate graded structure. As far as phoneme categories are concerned, empirical evidence supports the idea that their representational structure generates prototypicality effects. Miller, Connine, Schermer, and Kluender (1983) used a selective adaptation procedure to study the possible auditory basis of internal phonetic category structure. They synthesized syllable series varying in three phonetic features: voicing, place of articulation, and manner of articulation. They found adaptation functions approximating an inverted U. Miller et al. hypothesized that the adaptation peaks corresponded to the location of the phonetic prototype on the stimulus continuum. However, they had no measure of example goodness for the category instances that would have allowed them to test this hypothesis directly. Samuel (1982) obtained such an external measure by having subjects perform both a response-timed
identification task and an overt prototype task on a synthesized subset of the /ga/-/ka/ continuum. He found that prototype adaptors produced more adaptation than did other adaptors. Miller and Volaitis (1989) observed a clear relation between two measures. The first was ratings of how good a /p/ the consonants were in two synthesized /bi/-/pi/ series with progressively larger VOT values. The second was the effect on an identification task of the difference in syllable duration between the two series. As expected, an inverted U-shaped function was found for each series, and the longer syllable duration both shifted the location of the good exemplars and widened their range. The best evidence to date in favor of prototypicality effects for phoneme categories comes from research by Kuhl and colleagues. Using a head-turn procedure, Grieser and Kuhl (1989) trained infants about 6 months of age with either a synthesized good or a synthesized poor example of the vowel /i/ (as defined by adults). They subsequently obtained better generalization to novel instances of the vowel for infants trained with the good example (see also Kuhl, Williams, Lacerda, Stevens, & Lindblom, 1992). Kuhl (1991) presented subjects (both adults and infants) with a referent speech sound, either a synthesized good or a synthesized poor example of the vowel /i/, repeated once per second, and had them indicate when it changed to a comparison speech sound. A perceptual magnet effect of phoneme category prototypes on nearby stimuli in perceptual space was demonstrated: Prototypical speech sounds assimilated other instances, making the latter perceptually more similar to the former.

In the present research, natural phoneme categories occurring word-initially were used to test for the presence of a prototypicality effect. Since speech perception is cued by a multitude of information sources, it would be more interesting to find graded structure for natural segments, in which natural variation is the dominant factor.
Indeed, it seems likely that an ecologically valid speech perception phenomenon, such as a prototypicality effect, will be based on exactly such variation (with prototypicality determined by, for example, phonetic context). Moreover, it has been shown that well-established effects in speech perception can disappear when more natural stimuli are used (see, e.g., Burton, Baum, & Blumstein, 1989, for the influence of a more natural stimulus structure on lexical effects in phonetic categorization). As will be discussed later, the primed matching task can deal with the problems that using natural stimuli might in turn create. As far as syllables are concerned, evidence for graded category structure is sparse. Some models incorporate syllable prototypes by assumption, notably the fuzzy logical model of speech perception (Massaro, 1987, 1989; Oden & Massaro, 1978). However, as far as I know, there is no direct empirical support for the existence of syllable prototypes. In summary, there is good evidence that phonemes are categories whose representations generate graded structure. However, the prototype view would be supported more strongly, and the speech unit problem could be investigated more decisively within a categorization perspective, if reliable prototypicality judgments could be obtained with natural phoneme categories. This possibility was evaluated in Experiment 1. In Experiment 2, possible prototypicality effects for syllable categories were studied.
A DIRECT TEST OF THE SPEECH UNIT HYPOTHESIS WITH THE PRIMED MATCHING TASK

The following experiments use the primed matching task under both physical identity and same-category instructions in order to investigate the speech unit hypothesis directly. First, the reliability of prototypicality ratings for natural phoneme and syllable categories was assessed (Experiments 1 and 2). These prototypicality judgments were the basis for four experiments using the primed matching task. In Experiments 3A and 3B, subjects performed a primed phoneme matching task under physical identity instructions and under same-category instructions. In Experiments 4A and 4B, a primed syllable matching task was performed under physical identity and same-category instructions.

There are two crucial questions. First, can a significant P x P interaction be obtained? Such an interaction would show that good-example and poor-example pairs are affected differently by priming: primed good-example pairs would be responded to faster than unprimed ones, primed poor-example pairs would be responded to slower than unprimed ones, or both. If the answer to this question is negative for phoneme and/or syllable categories, it should be concluded that the goodness-of-example ratings obtained in Experiment 1 and/or Experiment 2 are merely artifacts that do not correspond to any "real" differences between good and poor examples of a category. Consequently, it would be impossible to conclude anything at all about the speech unit hypothesis. If priming does have a different effect on good-example and poor-example pairs, the second question is: Under which instructions (physical identity and/or same-category), with which stimulus pairs (physically identical pairs and/or same-category pairs), and with which categories (phonemes and/or syllables) can a P x P interaction be observed? An analysis of the specific distribution of these P x P interactions should make clear whether one, both, or neither of the category types is perceptually based.
For example, if syllable categories are perceptually based, one should expect a P x P interaction for physically identical pairs under both physical identity instructions and same-category instructions, but no such interaction for same-category pairs under same-category instructions.

Two additional, important points should be made. First, the experiments of Rosch described above point to a way of differentiating mental representations by their level of abstraction, but this should not blur the differences between those experiments and the present study. The stimuli used by Rosch (1975a, 1975b) were perceptual categories, on the one hand, and semantic categories, on the other. In the present study, the stimuli of interest, phoneme versus syllable categories, are both presented as spoken speech stimuli. As such, they can both be described as perceptual. Thus, the investigation here does not ask whether phoneme or syllable categories are perceptual categories (that would be a truism), but whether, and how directly, their mental representation is based on purely perceptual information. Second, searching for graded structure has no processing implications per se (see Lakoff, 1986, and Rosch, 1978, for discussion of this point). Finding graded category structure is compatible with various theories of categorization, such as representation by prototypes or by exemplars. Reliable and valid graded structure (e.g., for speech categories) does not uniquely constrain the kind of categorization model that should be defended. The research reported here has nothing to say about these issues. The crucial question is whether or not reliable and valid graded structure can be found.
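The predicted outcomes can be summarized as a lookup over three diagnostic cells. These are identical pairs under physical identity instructions, identical pairs under same-category instructions, and same-category pairs under same-category instructions. The Python sketch below simplifies under stated assumptions. It encodes only the two reference patterns reported by Rosch (1975a, 1975b) for color and semantic categories, plus the case of no P x P interaction at all. It treats each cell as the yes/no outcome of a significance test, and it labels every other combination (for example, one produced by a representation mixing perceptual and abstract information) as unclassified. The names `PxPPattern` and `classify` are hypothetical.

```python
from typing import NamedTuple


class PxPPattern(NamedTuple):
    """Whether a significant P x P interaction was found in each diagnostic cell."""
    identity_instr_identical_pairs: bool
    category_instr_identical_pairs: bool
    category_instr_category_pairs: bool


# Reference patterns as summarized from Rosch (1975a, 1975b).
PERCEPTUAL = PxPPattern(True, True, False)   # color categories
ABSTRACT = PxPPattern(False, True, False)    # semantic categories
NO_INTERACTION = PxPPattern(False, False, False)


def classify(observed: PxPPattern) -> str:
    """Map an observed pattern of P x P interactions onto an interpretation."""
    if observed == NO_INTERACTION:
        # Goodness ratings are not validated, so nothing follows about abstraction level.
        return "no valid graded structure"
    if observed == PERCEPTUAL:
        return "perceptually based"
    if observed == ABSTRACT:
        return "abstract"
    return "unclassified"


if __name__ == "__main__":
    # A perceptually based category should show interactions for identical pairs
    # under both instructions, but not for same-category pairs.
    print(classify(PxPPattern(True, True, False)))
```

Encoding the cells as a named tuple keeps the three-way comparison explicit. A pattern that matches neither reference case is left unclassified rather than forced into one of the two categories.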
EXPERIMENT 1

Prototypicality Ratings of Phoneme Categories

Method
Stimuli. Seven Dutch phoneme categories were chosen: three consonants (/k/, /r/, and /s/) and four vowels (/ɛ/, /a:/, /a/, and /e:/). The categories and the words are part of the stimulus set currently in use at the Katholieke Universiteit Leuven laboratory of auditory perception and speech communication. This stimulus set was originally constructed to study the perceptual effects of allophonic distribution; consequently, each set incorporated all possible allophones of a given phoneme. All stimulus words were low in frequency (Uit den Boogaert, 1975), with a median frequency of 0; the critical phoneme categories occurred in word-initial position. There were 240 words in the /k/ set, 215 in the /r/ set, 98 in the /s/ set, 67 in the /e:/ set, 40 in the /a:/ set, 124 in the /a/ set, and 90 in the /ɛ/ set. The stimulus words were recorded twice during 10 sessions over a 1-week period in a sound-proof room, with a Stellavox SP7 recorder and a Sennheiser MD441N microphone. A female speaker pronounced the words with flat intonation. When the entire set had been recorded, the speaker listened to the two versions and selected the better one. Whenever the speaker felt unsure about the quality of the recorded material, the words were recorded again. Stimulus tapes were then constructed by randomizing the words belonging to each category. An interval of about 5 sec separated successive stimulus words, to give the subjects enough time to make their goodness-of-example ratings.

Subjects. The subjects were first-year psychology and educational sciences students who participated for partial course credit. None reported hearing deficiencies. A group of 34 subjects rated the /s/, /e:/, /k/, and /a:/ sets. Another group of 43 subjects rated the /a/, /ɛ/, and /r/ sets. One subject did not complete his booklet of ratings for the /a/ set.

Procedure. Each group of subjects listened to the stimulus tapes in a room with excellent acoustics. They received a booklet containing 10-interval rating scales in graphical form, combined with numeric symbols and anchored at the endpoints (Nunnally, 1970). The subjects first received elaborate written instructions. They were told that prototypicality could be used to judge speech segments such as phoneme categories. They were instructed to listen carefully to all stimulus words and to indicate how good a member of the phoneme category under consideration the relevant segment of each word was. When subjects still felt unsure after reading the instructions, the experimenter gave an example using a segment not included in the experimental set (/l/). Pauses of about 5 min were given between category sets. For each group, the experiment lasted about 1 h.
Results and Discussion

The subjects' ratings for the individual phoneme categories used all steps of the scale, ranging from 1 to 10. Overall median ratings were 7 for /k/, /e:/, /a:/, and /s/; 6 for /ɛ/ and /a/; and 5 for /r/. Two indices of interobserver agreement were computed. First, Stine's (1989) omicron coefficient (corrected for chance) was calculated between the median ratings of two random split halves of the sample of subjects. Stine's omicron is an index of relational agreement between observers: it measures the degree to which judgments represent admissible transformations of the ordinal relations between the stimuli. High ordinal relational agreement means that the ordinal relations between stimuli are preserved across observers. Second, the intraclass correlation ICC(2,k) (Shrout & Fleiss, 1979) was calculated. This intraclass correlation measures agreement among the ratings of k observers, treating observers as a random effect. Both the intraclass correlation and omicron were significant for all categories (see Table 1). The intraclass correlations varied from .663 for /s/ to .862 for /k/. The values of omicron ranged from .460 for /s/ to .837 for /a:/.

It could be objected that, although the ratings found in Experiment 1 were reliable, nothing is known about their actual basis. Are they real, or just artificial or speech irrelevant? And even if they are real, how exactly were they generated? An investigation of the basis of prototypicality effects for word-initial natural phoneme categories is, of course, interesting in its own right. It should be stressed, however, that the answer to this question does not affect the interpretation of potentially significant P x P interaction effects in the primed matching task, because the design of the primed matching task allows one to detect any spurious effects of goodness level.
(1) If the reliable prototypicality effects for phoneme categories are, for example, mere response strategies of the subjects, neither a significant P x P interaction nor a significant main effect of goodness level can be expected. (2) If the reliable prototypicality ratings are determined by systematic but speech-irrelevant information (such as loudness differences between phoneme category instances, or a confounding of the phoneme category ratings with, for example, lexical factors), the most one can expect to find in the primed matching task is a significant main effect of goodness level, but no significant P x P interaction. Cross-modal priming with a category name activates information related to the category representation. So, if the prototypicality effects found for phoneme categories are not based on information used in the description of the category representation, priming will not differentially affect good and poor examples of a category. A significant P x P interaction can be found if and only if there is such a differential match between the primed category representation and good-example versus poor-example category instances. This implies that the empirical distinction between good-example and poor-example category instances is not only reliable but also valid. In strictly operational terms, by looking for an interaction between priming and prototypicality that is specified a priori, the interpretation of the prototypicality effect is much more constrained (via the priming logic) than when only a main effect of prototypicality is predicted. In summary, since it can safely be concluded from Experiment 1 that reliable goodness-of-example judgments can be obtained for natural phonemic categories with a direct rating procedure, Experiments 3A and 3B were run to assess the processing implications of these goodness-of-example ratings and to test the speech unit hypothesis (provided, at least, that these goodness-of-example judgments are valid).

Table 1
Interobserver Agreement for Goodness-of-Example Ratings of Phoneme Categories

Phoneme     S     n    ICC(2,k)     F     Omicron      t
/a:/       40    34      .771      5.5      .837     11.7
/a/       124    42      .756      4.6      .585      5.7
/e:/       67    34      .803      6.6      .741      8.2
/ɛ/        90    43      .822      6.4      .730      9
/s/        98    34      .663      3.7      .460      3.6*
/k/       240    34      .862      8.4      .779      9.4
/r/       215    43      .788      5.3      .696      8.1

Note. S = number of stimuli; n = number of subjects. All values of ICC(2,k) and omicron are significant, p < .0005, unless indicated otherwise. *p < .005.
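The ICC(2,k) index used for Tables 1 and 2 can be computed from a stimuli x raters matrix of ratings. The sketch below is a minimal illustration, assuming the two-way random-effects formulas of Shrout and Fleiss (1979), ICC(2,k) = (BMS - EMS) / (BMS + (JMS - EMS)/n), with n stimuli and k raters; the paper does not describe its own computation. The function names `icc_2k` and `f_test_df` are hypothetical, the F statistic is presumably the one reported in the F column of the tables, and the ratings matrix is illustrative only, not data from this study.

```python
from typing import List, Tuple


def f_test_df(n_stimuli: int, k_raters: int) -> Tuple[int, int]:
    """Degrees of freedom of the F test BMS / EMS in the two-way layout."""
    return n_stimuli - 1, (n_stimuli - 1) * (k_raters - 1)


def icc_2k(ratings: List[List[float]]) -> Tuple[float, float, Tuple[int, int]]:
    """ICC(2,k) (Shrout & Fleiss, 1979) for a stimuli x raters matrix.

    Rows are stimuli (targets), columns are raters (subjects); raters are
    treated as a random effect. Returns (icc, F, df), where F = BMS / EMS.
    """
    n = len(ratings)      # number of stimuli
    k = len(ratings[0])   # number of raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into stimuli, raters, and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_stimuli = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_stimuli - ss_raters

    df_stim, df_err = f_test_df(n, k)
    bms = ss_stimuli / df_stim   # between-stimuli mean square
    jms = ss_raters / (k - 1)    # between-raters (judges) mean square
    ems = ss_error / df_err      # residual mean square

    icc = (bms - ems) / (bms + (jms - ems) / n)
    return icc, bms / ems, (df_stim, df_err)


# Illustrative matrix: 6 stimuli rated by 4 raters (not data from this study).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc, f, df = icc_2k(ratings)
print(f"ICC(2,4) = {icc:.2f}, F{df} = {f:.2f}")  # ICC(2,4) = 0.62, F(5, 15) = 11.03
```

The degrees of freedom depend only on the numbers of stimuli and raters: with 161 stimuli and 32 raters, `f_test_df(161, 32)` returns (160, 4960), which matches the F(160, 4960) reported for the /a/ replication in Experiment 2.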
EXPERIMENT 2 Prototypicality Ratings of Syllable Categories
Method
Stimuli. Eleven syllable categories, taken from the stimulus set used in the laboratory, were used. These syllables were chosen so as to include members of each of the four simple syllable types acceptable in Dutch (Clements & Keyser, 1983; Collins & Mees, 1981; van der Hulst, 1984): a vowel (V) syllable, a consonant-vowel (CV) syllable, a vowel-consonant (VC) syllable, and a consonant-vowel-consonant (CVC) syllable. The stimulus set included /o:/ (n = 53) and /a/ (n = 161) as V syllables; /be/ (n = 131), /na:/ (n = 43), and /pa/ (n = 89) as CV syllables; /op/ (n = 90), /as/ (n = 67), and /uit/ (n = 105) as VC syllables; and /kop/ (n = 77), /vɛt/ (n = 58), and /vak/ (n = 83) as CVC syllables. Only two V syllables were included, because it was difficult to find categories of this syllable type with enough instances. All syllable category instances occurred in word-initial position of low-frequency words of two to four syllables (median frequency = 0; Uit den Boogaart, 1975). The recording procedure was identical to that of Experiment 1, and the same female speaker pronounced the syllables.
Subjects. The subjects were first-year psychology and educational sciences students, who participated for partial course credit. None reported hearing deficiencies. One group of 45 subjects rated the /na:/, /as/, /kop/, /vɛt/, /o:/, /uit/, and /op/ sets; another group of 25 subjects rated the /be/, /pa/, /vak/, and /a/ sets. Each session lasted about 1 h.
Procedure. The procedure was the same as in Experiment 1. If necessary, the experimenter provided an additional example (/bal/).
Results and Discussion
The subjects' ratings for the individual syllable categories used all steps of the scale, with ratings varying from 1 to 10, and overall median ratings of 7 for /o:/, /be/, /na:/, /uit/, /op/, /as/, and /vɛt/, 6.5 for /pa/, /vak/, and /a/, and 6 for /kop/. Both the intraclass correlation ICC(2,k) and omicron were computed, and both were significant for every syllable category (see Table 2). The intraclass correlations varied from .395 for /a/ to .893 for /o:/; omicron varied from .143 for /a/ to .868 for /op/. Tables 1 and 2 show that the interobserver indices for syllable categories were comparable to those for phoneme categories. These results indicate that the subjects were able to judge goodness of example for syllable categories reliably with a direct prototypicality rating task. There was one deviation from the overall pattern: the omicron value for the /a/ set, though significant, was low. It is not clear why this syllable category behaved differently from the others. Possibly, the group of subjects who rated the /a/ set had difficulty perceiving the stimuli as syllables. However, the good results for the /o:/ set, the other V syllable, which was rated by a different group, indicate that such a difficulty would have to be attributed to this particular group of subjects rather than to V syllables in general. A small replication was therefore run, in which 32 subjects rated the stimuli of the /a/ set under identical instructions and with the same procedure. Interobserver agreement was clearly better [ICC(2,32) = .480, F(160,4960) = 2.24, p < .0005; omicron = .320, t(159) = 4.26, p < .0005]. Hence, the /a/ set was not dropped. In sum, because these prototypicality ratings were reliable, they could be used to investigate the processing implications of the goodness-of-example ratings and to test the speech unit hypothesis (provided, at least, that these goodness-of-example judgments are valid) (Experiments 4A and 4B).

Table 2
Interobserver Agreement for Goodness-of-Example Ratings of Syllable Categories

Syllable    S     n    ICC(2,k)     F     Omicron      t
/o:/       53    45      .893     12.7      .798      9.5
/a/       161    25      .395      1.9      .143      1.8*
/a/+      161    32      .480      2.2      .320      4.3
/na:/      43    45      .773      6.5      .414      8.2†
/pa/       89    25      .548      2.7      .507      5.6
/be/      131    25      .720      4.3      .481      6.2
/as/       67    45      .730      4.9      .629      6.5
/uit/     105    45      .705      4.3      .581      7.2
/op/       90    45      .798      7.4      .868     16.5
/kop/      77    45      .869      9.8      .838     13.3
/vak/      83    25      .763      5.0      .447      4.5
/vɛt/      58    45      .828      7.5      .674      6.8

Note. S = number of stimuli; n = number of subjects. All values of ICC(2,k) and omicron are significant, p < .0005, unless indicated otherwise. *p < .05. †p < .005. +Results of the replication with the /a/ set.

EXPERIMENT 3A Primed Phoneme Matching Under Physical Identity Instructions
The results of Experiment 1 form a reliable basis for using the primed matching task with phoneme categories. Experiment 3A used the primed matching task under physical identity instructions; Experiment 3B used it under same-category instructions. The results of these two experiments will be discussed together.
Method
Subjects. Twelve second-year psychology students participated in the experiment for partial course credit. None reported hearing deficiencies.
Stimuli. The stimulus words were chosen from those used in Experiment 1. For each phoneme category, the stimulus words were rank ordered from most to least prototypical, and the most and the least prototypical stimulus words were selected (resulting in 14 stimulus words). Pairs of stimulus words were then constructed according to the procedure described in Rosch (1975a, 1975b). For each of the seven phoneme categories, a pair of physically identical prototypical instances (e.g., [kaars-kaars]) and a pair of physically identical poor-example instances (e.g., [kwezel-kwezel]) were constructed. Additionally, 14 different stimulus pairs were made: 7 pairs in which the prototypical member of one category was combined with the prototypical member of another category (e.g., [kaars-siroop]), and 7 pairs in which the nonprototypical member of one category was combined with the nonprototypical member of another category (e.g., [kwezel-elatiet]; see Appendix A for the complete stimulus set). In all, there were 28 stimulus pairs.
Experimental setup. The stimulus pairs were randomized and recorded on one track of a two-track recorder (Philips AAC 4000 language trainer). Each stimulus pair occurred twice: once preceded by the orthographic presentation of the name of the first word's phoneme category (the priming stimulus), and once preceded by an exclamation mark, giving a total of 56 trials. Primed and unprimed trials alternated (Beller, 1971; Rosch, 1975a, 1975b). Within each pair, the two stimulus words were separated by an interval of about 250 msec to avoid forward masking (Scharf & Buus, 1986). Successive stimulus pairs were separated by an interval of 5 sec. A short tone pulse was recorded on the other track of the tape about 2 sec before the second word of each pair.
The exact time interval between the pulse and the onset of the second stimulus word was controlled using the Philips speech adapter box and OM8210 speech-processing software, so that the pulse location could be corrected if necessary. The interval between the prime and the first stimulus word was 750-1,000 msec. This falls within the range of intervals over which the benefit or cost (in milliseconds) of priming remains the same as at 2 sec (Rosch, 1975b), and thus ensured that encoding of the first phoneme category in each pair would also be sufficiently affected by the prime. The tone pulse was input to a computer (PC/AT) and triggered both the presentation of the prime on a monochrome computer display and the response-time measurement. The prime was presented for 500 msec. Response times were measured with a Turbo Pascal program (Bovens & Brysbaert, 1990).
Procedure. In this and the following experiments, the subjects participated in groups of 2 to 5. They were seated in a sound-attenuated room before a computer to which two response keys were connected. They were instructed to press the same key
if the words in the stimulus pair began with physically identical segments and the different key for different word-initial segments; they were asked to do this as rapidly as possible without making errors. Half of the subjects used the dominant hand for the same key, and half used the dominant hand for the different key. The subjects listened to the stimulus tape over headphones (Bayer H 805). Four practice pairs (two identical and two different) were given in order to familiarize the subjects with the procedure and the judgment to be made. The experiment lasted about 20 min.
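The pair construction and trial ordering described in the Method for Experiment 3A can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's procedure: apart from kaars, kwezel, and siroop, the words are hypothetical placeholders, the cyclic combination of categories in the different pairs is an assumption (the real combinations are listed in Appendix A), the prime string format is invented, and the functions `build_pairs` and `build_trials` are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical word list: category -> (most prototypical word, least prototypical word).
# Only "kaars"/"kwezel" (/k/) and "siroop" come from the text; the other
# entries are placeholders for the words listed in Appendix A.
WORDS: Dict[str, Tuple[str, str]] = {
    "a:": ("a:-good", "a:-poor"),
    "a": ("a-good", "a-poor"),
    "e:": ("e:-good", "e:-poor"),
    "ɛ": ("ɛ-good", "ɛ-poor"),
    "s": ("siroop", "s-poor"),
    "k": ("kaars", "kwezel"),
    "r": ("r-good", "r-poor"),
}

# (category of the first word, first word, second word, pair type)
Pair = Tuple[str, str, str, str]


def build_pairs(words: Dict[str, Tuple[str, str]]) -> List[Pair]:
    """Two physically identical pairs per category plus 14 different pairs."""
    cats = list(words)
    pairs: List[Pair] = []
    for cat in cats:
        good, poor = words[cat]
        pairs.append((cat, good, good, "same-good"))
        pairs.append((cat, poor, poor, "same-poor"))
    # ASSUMPTION: category i is combined with category i + 1 (cyclically);
    # the actual combinations are given in Appendix A of the paper.
    for i, cat in enumerate(cats):
        other = cats[(i + 1) % len(cats)]
        pairs.append((cat, words[cat][0], words[other][0], "different-good"))
        pairs.append((cat, words[cat][1], words[other][1], "different-poor"))
    return pairs


def build_trials(pairs: List[Pair], seed: int = 0) -> List[Tuple[str, Pair]]:
    """Each pair occurs once primed and once unprimed, and the two trial
    types strictly alternate (starting with a primed trial is arbitrary)."""
    rng = random.Random(seed)
    primed, unprimed = pairs[:], pairs[:]
    rng.shuffle(primed)
    rng.shuffle(unprimed)
    trials: List[Tuple[str, Pair]] = []
    for p, u in zip(primed, unprimed):
        trials.append((f"/{p[0]}/", p))  # hypothetical rendering of the category-name prime
        trials.append(("!", u))          # unprimed trial: exclamation mark
    return trials


pairs = build_pairs(WORDS)
trials = build_trials(pairs)
print(len(pairs), len(trials))  # 28 56
```

Shuffling the primed and unprimed lists separately and then interleaving them is one simple way to satisfy both constraints stated in the text: every pair appears exactly once in each priming condition, and primed and unprimed trials alternate.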
Results
Overall error rate was 3%. Within-subject three-way analyses of variance (ANOVAs; randomized block factorial design; see Kirk, 1982; Winer, 1971), with level of goodness of example, priming, and phoneme type (vowel/consonant) as fixed effects and subjects as a random effect, were performed separately on the response times for same and different pairs. The critical priming x goodness level interactions for physically identical pairs and for different pairs are depicted in Figure 1. For pairs of physically identical stimuli, the ANOVA revealed a significant P x P interaction effect [F(1, 77) =
EXPERIMENT 3B Primed Phoneme Matching Under Same-Name Instructions
Method