William D. MARSLEN-WILSON, Max-Planck-Institute for Psycholinguistics, Nijmegen, The Netherlands and Department of Experimental Psychology, University ...
Speech Communication 4 (1985) 55-73 North-Holland
SPEECH SHADOWING AND SPEECH COMPREHENSION

William D. MARSLEN-WILSON
Max-Planck-Institute for Psycholinguistics, Nijmegen, The Netherlands and Department of Experimental Psychology, University of Cambridge, UK

Abstract. Pioneering research by Chistovich and her colleagues used speech shadowing to study the mechanisms of immediate speech processing, and in doing so exploited the phenomenon of close shadowing, where the delay between hearing a speech stimulus and repeating it is reduced to 250 msec or less. The research summarised here began with an extension of Chistovich's findings to the close shadowing of connected prose. Twenty-five percent of the women tested were able to accurately shadow connected prose at mean delays ranging from 250 to 300 msec. The other women, and all the men tested, were only able to do so at longer latencies, averaging over 500 msec. These are called distant shadowers. A second series of experiments established that close shadowers, just as much as distant shadowers, were syntactically and semantically analysing the material as they repeated it. This was reflected in the ways their spontaneous errors were constrained, and in their sensitivity to disruptions of the syntactic and semantic structure of the materials they were shadowing. A third series of experiments showed that the difference between close and distant shadowers was in their output strategy. Close shadowers are able to use the products of on-line speech analysis to drive their articulatory apparatus before they are fully aware of what these products are. This means that close shadowing not only provides a continuous reflection of the outcome of the process of language comprehension, but also does so relatively unaffected by post-perceptual processes. In this sense, therefore, close shadowing provides us with uniquely privileged access to the properties of the system.

Résumé. The pioneering work of Chistovich and her colleagues used the repetition (shadowing) task to study the immediate processing of speech. In doing so they exploited the phenomenon of close shadowing, in which the delay between hearing the stimulus and repeating it is reduced to 250 ms or less. The research described here began by extending Chistovich's findings to the close shadowing of continuous prose. 25% of our female subjects were able to repeat continuous prose accurately at mean delays of 250 to 300 ms. The other female subjects and all the men tested showed longer latencies, averaging more than 500 ms. These subjects are termed distant shadowers. The second series of experiments established that both categories of subjects carried out a syntactic and semantic analysis of the verbal material while they repeated it. This is shown by the constraints on their spontaneous errors and by their sensitivity to breaks in the syntactic and semantic structure of the verbal material being repeated. A third series of experiments showed that the difference between close and distant shadowers lay in their output strategy. Close shadowers are able to use the results of their real-time analysis to drive their articulatory apparatus even before they are fully aware of what these results are. This means that close shadowing not only continuously reflects the outcome of the language comprehension process but also does so while being relatively little affected by post-perceptual processes. In this sense, therefore, close shadowing gives us singularly privileged access to the properties of the system.
Zusammenfassung. Through their use of the method of concurrent repetition (shadowing), Chistovich and her group earned a reputation as pioneers of a new development in research on the processes of speech processing. Within their approach they paid particular attention to the phenomenon of close shadowing, in which the delay between the reception of the speech stimulus by the ear and its repetition is reduced to 250 ms or less. Our investigations, presented here in overview, began by extending Chistovich's research findings to the close shadowing of connected prose. Twenty-five percent of our female test subjects were able to repeat connected prose correctly at a mean delay of 250 to 300 ms. The remaining female subjects and all the male subjects needed latencies averaging more than 500 ms for the same task (distant shadowing). A second series of experiments demonstrated that subjects, in close as well as in distant shadowing, subject the material to a syntactic and semantic analysis while repeating it. This showed itself both in the type of their spontaneous errors and in their susceptibility to breaks in the syntactic and semantic structure of the shadowed material. Finally, a third series of investigations made clear that the difference between the two groups of subjects lay in the output strategy. In close shadowing, the results of speech analysis are used for quasi-simultaneous control of the articulatory apparatus before the speaker is entirely clear about what the product actually represents. Close shadowing thus not only continuously mirrors the outcome of the speech comprehension process, but can also do so relatively undisturbed by post-perceptual processes. In this sense, close shadowing therefore permits unique access to the properties of the system.

Keywords. Shadowing, immediate processing, connected prose, close and distant shadowers, language comprehension process.

0167-6393/85/$3.30 © 1985, Elsevier Science Publishers B.V. (North-Holland)
Introduction
Speech shadowing is perhaps the purest example of the classical "black-box" paradigm that it is possible to achieve in the study of human higher mental processes. The shadowing subject listens to a spoken message, and his task is to immediately repeat it back, word for word as he hears it. This means not only that the output of the system is more or less identical to its input, but also that it gives an effectively continuous reading of the transfer-function of the system, as it transduces speech input into speech output. The qualitative and quantitative properties of this shadowing response, and the way it varies when perturbations are fed into the system, can be used to make direct inferences about the properties of human speech processing.

In research carried out in the late 1950's and the early 1960's, Chistovich and her co-workers were the first to exploit these properties of speech shadowing, using the task to investigate some fundamental properties of the speech perception process (Chistovich, 1960; Chistovich, Aliakrinskii, & Abul'ian, 1960; Chistovich, Klaas, & Kuzmin, 1962). This research was important on three major counts. First, it focussed on the immediate performance of the speech processing system, looking at the way the system analysed the incoming speech signal in real time. Secondly, it introduced the use of speech shadowing as a way of monitoring the properties of this on-line analysis process. Thirdly, it discovered, and made systematic use of, the phenomenon of speech shadowing at very short response delays--in some studies as short as 150 msec.

Short latency--or "close"--shadowing is significant on both methodological and theoretical grounds. Methodologically, it is significant because the shorter the delay between hearing something and repeating it, the more directly this response will reflect the immediate properties of the analysis processes that operate on the speech input as it is heard. This is in contrast to so-called "successive" (Levelt, 1978) or "off-line" tasks (Marslen-Wilson, 1976; Marslen-Wilson & Tyler, 1980), where the subject responds only after all of the relevant input has been heard, and where post-perceptual processes play a much more important role in determining the character of the response.

The theoretical significance of close shadowing is that it tells us something about the minimum time-window over which the process of speech analysis can operate. If an output can be produced within as little as 150 msec, then this places strong constraints on the possible characteristics of the processes involved. But to discover what these constraints are, it is necessary to find out just what types of analysis process are implicated in close shadowing performance.

This was not a question that Chistovich herself investigated. In her speech shadowing research, she exploited the technique to study the primary transduction of the speech signal into some form of acoustic-phonetic representation, and used the shadowing of isolated syllables to do so. The research that I am going to review here used the shadowing technique to investigate the more general process of mapping from sound onto meaning, and focussed, therefore, on the shadowing of connected, meaningful prose. Although some of this work has been published in a summary form (Marslen-Wilson, 1973a; 1975), most of it (Marslen-Wilson, 1973b) is not widely available. I take the opportunity to review it in more detail here, both because of its relevance to research into speech comprehension, and because it stems directly from the pioneering work of Chistovich and her colleagues.
1. Close shadowing of connected prose

In their earliest shadowing studies, Chistovich and her colleagues tested eight subjects, asking them to repeat back passages of normal continuous prose (Chistovich et al., 1960). Five of these eight subjects performed as distant shadowers. They repeated back the speech at average delays of between 500 and 1500 msec. Their repeated speech was completely clear, but with some omission of whole words and phrases. The other three subjects performed in a way that was unique in the published literature. These were close shadowers, repeating the speech back at mean latencies of 150-200 msec. Although no words appeared to be omitted, their speech was extremely slurred. It was so slurred, in fact, that it was impossible to determine how accurately they were repeating the material. In all subsequent research involving close shadowers, Chistovich used only isolated syllables, where it was possible to evaluate more successfully whether or not a particular speech sound was being correctly repeated.

This difference in intelligibility between close and distant shadowers was consistent with other evidence suggesting that the accurate repetition of connected prose required long response delays. In other research on the shadowing of connected prose, the latencies ranged from 750 msec to over two seconds (Carey, 1971; Treisman, 1965). Conversely, much shorter latencies were found for the repetition of isolated words--Saslow (1958) and Davis, Moray, and Treisman (1961), for example, both report latencies of 200 msec or less, with even shorter latencies reported by Chistovich's group for the shadowing of single syllables.

Normal connected prose differs from lists of isolated words in possessing a higher-order syntactic and semantic organisation. If, as seemed likely, this had consequences for the size of the psychological processing units in the two types of material, then this might explain the differences in shadowing latency, and the deterioration in performance when subjects attempted to shadow connected prose at very short latencies. This in turn was consistent with a view of language comprehension which held that the processes of syntactic and semantic interpretation did indeed operate over relatively large processing units (e.g., Fodor, Bever, & Garrett, 1974).

The starting-point for my own research in this area was the attempt to evaluate Chistovich et al.'s (1960) findings for a larger sample. Did their results truly reflect the temporal limits on the accurate shadowing of connected prose? In this first section of the review I will describe the search for close shadowers, the kind of training they required, and the basic properties of their performance in shadowing connected prose.
1.1. Training and selection

In the search for accurate close shadowers, I tested 65 students from MIT and Wellesley College (40 men and 25 women). All of them went through an initial screening session where they shadowed five 1000-word passages of descriptive prose, taken from Hemingway novels. These passages had been recorded by an experienced male reader at a speaking rate of 160 words per minute. I chose this rate because it was a speed at which shadowers felt comfortable, and because it approximates a normal conversational speaking rate.

Each subject was first familiarised with the task and allowed to shadow a short passage in whatever way came naturally to them. The subject was then asked to shadow as closely as possible; to say each word immediately as they heard it. Most subjects could not do this. I found that if they did not show signs of being able to shadow closely and clearly as soon as they started trying to do so, then they did not improve with practice.

Of the 65 men and women I evaluated, eight proved capable of shadowing closely while still maintaining the intelligibility of their speech. All of these were women. It is not clear whether this reflects a true sex difference, or just the voice in which the stimulus materials were read (Underwood & Moray, 1971). If the stimulus materials had been read by a woman, then some of the men might also have been able to shadow clearly at short latencies.

All the other 57 subjects tended naturally to shadow at rather longer latencies. When asked to shadow closely, they either did not, in fact, stop shadowing at their preferred distance, or, if they did shadow more closely, it was at the expense of
Vol. 4, Nos. 1-3, August 1985
intelligibility. They produced a travesty of normal speech, an incomprehensible muttering that roughly preserved the prosodic structure of the original. Listening to their output in isolation, it was impossible to understand anything save an occasional word at the end of a sentence. Listened to simultaneously with the original, their reproduction became a recognisable but highly degraded representation of the material. It is likely that this was the level of performance reported by Chistovich for connected prose.

In subsequent research, I compared two selected groups of shadowers--the seven best close shadowers, and seven distant shadowers (also all women), chosen for their ability to shadow fluently and accurately. Both groups of shadowers were given an additional 5000 words of practice.
1.2. Basic parameters of close and distant shadowing performance

The first study looked at the exact response delays and associated error-rate for close and distant shadowers. To do this, I selected two new passages of descriptive prose, each 300 words in length, and recorded in the same manner as the training passages, at a rate of 160 wpm. In the test session the subjects heard these preceded by two new practice passages, which they shadowed until they were adequately re-accustomed to the task. They were then asked to shadow the test passages consistently and naturally at the shadowing distances they had established in the training sessions.

The subsequent analyses were based on simultaneous two-track recordings of the target material and of the subjects' responses to it. Seventy-five latency measurements were made in each passage, at the same points for each subject. These measurement points were distributed equally across the passage, and at different serial positions within sentences. All measurements were made at word-onset, so that what I was measuring was always the delay between the onset of a word in what the subject was hearing and the onset of the same word in the subject's repetition of this material. The measurement technique combined oscillographic and auditory methods for locating the measurement points in each subject's actual output.

In the error analysis, every deviation from the original in the subjects' performance was treated as an error. I distinguished three categories of error. Constructive errors were errors in which the shadowers added or substituted new words for what they actually heard, or where they changed part of a word to produce a new word (sometimes a nonsense word). Delivery errors, which reflected articulatory clarity and fluency, included slurrings, hesitations, stutterings, and unintelligible responses. Omission errors were simply omissions of complete words.

The mean shadowing latencies and the error scores for the 14 shadowers are given in Table 1. The subjects are ranked according to their mean latencies, with S1 having the shortest latency. It is clear that connected normal prose can be shadowed at very short latencies. Four of the subjects have overall mean latencies for the two passages of less than 275 msec. The shortest latency obtained for a single passage was S2's mean of 230 msec for the first passage. This subject shadowed the first 100 words of the passage at a mean latency of 210 msec (with an error-rate of 12%). S1 shadowed the same section at a latency of 223 msec (error-rate of 1%), and the entire passage at a mean latency of 242 msec (error-rate of 0.3%).

The latencies of the closest shadowers in this sample are within the range reported by Chistovich et al. (1960). Unlike Chistovich's subjects, however, there were no "negative" reaction times. The shortest latency recorded was 80 msec. Chistovich also reports that the speech of her close shadowers was very slurred and indistinct. All the close shadowers in this study were easily understandable. The delivery errors in Table 1 measure the general intelligibility of the subjects' responses. The close shadowers' total of 178 such errors (of which a third came from one subject) represent an error-rate of less than 4%.
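The percent-error figures in Table 1 appear to be total errors divided by the number of words shadowed (two 300-word passages per subject). The following is a minimal sketch under that assumption; the function name `percent_error` and the constant `WORDS_PER_SUBJECT` are illustrative and not taken from the original study.

```python
# Assumption: percent error = (all errors) / (words shadowed) * 100.
# Each subject shadowed two 300-word test passages (Section 1.2).
WORDS_PER_SUBJECT = 2 * 300


def percent_error(omissions, delivery, constructive, words):
    """Return total errors as a percentage of the words shadowed."""
    return 100.0 * (omissions + delivery + constructive) / words


# Error counts for S1 and for the close-shadower group as given in Table 1.
s1 = percent_error(1, 6, 3, WORDS_PER_SUBJECT)
close_group = percent_error(42, 178, 91, 7 * WORDS_PER_SUBJECT)

print(round(s1, 1), round(close_group, 1))  # prints: 1.7 7.4
```

Under this assumption the sketch agrees with the tabled values for S1 and for the close-shadower group as a whole.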
It was also noteworthy how closely these subjects tracked the prosodic contour of the messages they were repeating. It is clearly possible to generate a global suprasegmental structure under conditions where only local information is available.

Table 1
Shadowing latencies and error scores

Close shadowers
                        S1     S2     S3     S4     S5     S6     S7     All
Latency (msec)          254    264    268    273    283    302    361    286
Errors: Omissions       1      4      6      24     2      2      3      42
        Delivery        6      52     34     28     25     11     22     178
        Constructive    3      33     4      9      14     8      20     91
Total Percent Error     1.7    14.8   7.3    10.2   6.8    3.5    7.5    7.4

Distant shadowers
                        S8     S9     S10    S11    S12    S13    S14    All
Latency (msec)          401    444    526    553    559    600    749    547
Errors: Omissions       3      0      0      0      3      0      1      7
        Delivery        0      7      7      8      8      0      12     42
        Constructive    7      6      3      14     3      4      4      41
Total Percent Error     1.7    2.2    1.7    3.7    2.3    0.6    2.8    2.1

The distant shadowers were, as a group, more accurate in their shadowing performance. A regression of error rate on mean shadowing latency shows that more errors are made at shorter latencies (r = -.57, p < .025). This overall effect is primarily due to the Delivery errors, and at least partly reflects the different criteria for selecting the two groups. The distant shadowers were chosen on the basis of the fluency and clarity of their performance, whereas the close shadowers were selected primarily on the basis of their latency, given a minimum level of intelligibility. More generally, the error analyses suggest that the difference in accuracy between distant shadowers and very close shadowers is one of degree only, and not the qualitative difference that Chistovich noted for her subjects.

In summary, these results demonstrate that 25% of the women screened were able to shadow connected prose accurately and consistently at latencies of 300 msec or less. Five of these subjects shadowed at least one 100-word subsection of the materials at latencies of less than 250 msec. To shadow at these latencies is to repeat the material at a delay of little more than a syllable; estimates of the mean duration of CV syllables in connected speech converge on a value of 200 msec (e.g., Cooper, Sorenson, & Paccia, 1977; Huggins, 1970). If we assume that some component of the shadowing delay is taken up with the process of response integration and execution, then these subjects are able to initiate their output when they have heard no more than 150 to 200 msec of the input for each word they are repeating. It is likely, in fact, that they are operating at or very near the limits for accurate transduction of spoken materials, given the manner in which the acoustic basis for phonetic decisions is distributed over time. Figure 1 plots the latency distributions for the four closest shadowers. The striking degree of overlap in these distributions suggests that all of these subjects are indeed up against some common temporal limit on acceptably accurate repetition.

This demonstration of accurate close shadowing of connected prose raises two major sets of questions, to which the research summarised in the remainder of this paper is addressed. The first set of questions concerns the type of analysis that the close shadowers are performing on the speech
Fig. 1. Distribution of shadowing latencies for four fast shadowers. The distributions are based on 150 measurements for each subject, taken from two 300-word passages of narrative prose. (Horizontal axis: Response Latency (ms), 0 to 700.)
input. Under what description are they repeating back the materials? Do they process them simply as a sequence of speech sounds, or as a meaningful, structured string? To evaluate the implications of close shadowing for the transfer function of the speech processing system, we need to know just which aspects of the system's capacities are implicated in short latency shadowing.

The second, and related, set of questions concerns the differences between close and distant shadowers. The relative rarity of accurate close shadowers suggests that they are able to perform in some way that differs from the norm--where the norm can be assumed to be represented by the distant shadowers. But what does this difference consist of, and does it, in particular, reflect some difference in the way the close shadowers perceptually process the material before repeating it? If, unlike distant shadowers, they are able to shadow connected prose just on the basis of some lower-level analysis, then this might explain why they are able to do so at shadowing latencies previously only recorded for the shadowing of single words. If, on the other hand, the close shadowers' perceptual processing of the input is perfectly normal, then performance in the task would be directly informative about the processes underlying normal listening.

The research described in the next section was intended to find out how far the close shadower does repeat back the input on the basis of a purely acoustic-phonetic analysis, and how far higher levels of analysis (lexical, syntactic, and semantic) are also involved. The strategy I followed was always to test both close shadowers and a comparison group of distant shadowers. The extent to which close shadowers were sensitive to the same psycholinguistic variables as the distant shadowers would throw light not only on the similarities and differences between them, but also on the types of analysis being carried out at short shadowing delays.
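The relation between shadowing latency and error rate reported in Section 1.2 can be checked against the per-subject values in Table 1. The sketch below assumes that the reported statistic is an ordinary Pearson correlation over the 14 subjects' mean latencies and total percent-error scores; the variable names and the helper `pearson_r` are illustrative, and the values are transcribed from Table 1 as read here.

```python
import math

# Mean latency (msec) and total percent error for S1-S14, transcribed from Table 1.
latency = [254, 264, 268, 273, 283, 302, 361, 401, 444, 526, 553, 559, 600, 749]
pct_error = [1.7, 14.8, 7.3, 10.2, 6.8, 3.5, 7.5, 1.7, 2.2, 1.7, 3.7, 2.3, 0.6, 2.8]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(latency, pct_error)
print(f"r = {r:.2f}")
```

With these inputs the coefficient should come out negative and close to the value reported in the text, which is consistent with errors being more frequent at shorter latencies.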
2. The processing basis for shadowing performance

We know from tests of memory for shadowed passages that both close and distant shadowers do at some point construct a normal memory representation of the material they are shadowing. In one experiment (Marslen-Wilson, 1973b), shadowers were given an unexpected memory test after shadowing a 600-word passage of descriptive prose. The mean recall score of the seven close shadowers (22.7) did not differ from the recall score of a distant shadowing group (22.8), or from the score of a control group of normal listeners (23.2). This means that close and distant shadowers can recall information which could only have come from full comprehension of the shadowed material. But this does not necessarily mean that they fully comprehend the material as they repeat it. The close shadowers, in particular, might initially repeat the material on the basis of whatever lower-level analysis they could achieve within 250 msec, and only later complete its interpretation. To evaluate this possibility, and to determine just what kind of analysis does underpin shadowing performance, I conducted three kinds of test. These are described in the three following sections.

2.1. Constraints on spontaneous errors
The errors a subject makes in performing a task reflect the forms of analysis in terms of which the task is being performed. If the shadower's immediate repetition is, for example, at a syllabic level of analysis, then her errors should be constrained only by the syllabic character of the material, and not by its higher-level properties. There should be nothing to prevent the shadower, when she made an error, from producing a word that was semantically or syntactically anomalous with respect to its immediately preceding context. And if shadowing latency is related to level of immediate processing, then there should be latency-dependent differences in the kinds of errors made. I used for the error analysis the Constructive errors made on the two passages which I had analysed earlier (see Table 1). These errors are
changes in the lexical content of the original input, and (with the exception of 21 nonsense words) can be analysed for their compatibility with the context. The remaining 111 errors were scored according to their semantic and syntactic congruence with their entire preceding context, up to and including the word immediately preceding each error. Only two of these errors were incompatible with their prior context--S7, for example, repeated "They were wet to the skin ...", as "They were went to the skin ...". Otherwise, all the Constructive errors were both syntactically and semantically congruent with their preceding context. This held true for subjects at all shadowing latencies.

The implication of this, that both close and distant shadowers have the same kinds of information available to them to guide their word-by-word repetition, is supported by the qualitative similarities between the errors made at different latencies. In one case, for example, five subjects made the same error at the same point. In the context "He had heard at the Brigade ...", five subjects (with mean shadowing latencies ranging from 264 to 749 msec) repeated the sentence as "He had heard that the Brigade ...". This example, and the errors in general, show that the subjects' output is constrained by even the last word before the error. They would need, for example, to have extracted the structural implications of "heard" to make the syntactically appropriate error of saying "that" instead of "at". As this example also makes clear, the grammaticality of the shadowers' errors cannot be accounted for simply in terms of a general constraint on the shadower to produce a well-formed output. The appropriateness of any given error is contingent upon an awareness of the specific semantic and syntactic properties of the items preceding the error.

I also asked whether the distribution of these errors varied as a function of position in a sentence or clause.
It was possible that close shadowers nonetheless exhibited some lag, relative to distant shadowers, in their higher-level analysis of the input, and that the reason incongruent errors were not found at shorter shadowing latencies was that few errors of any kind were made at the
beginnings of clauses. For both close and distant shadowers, in fact, the errors were equally distributed over the beginning, middle, and end of clausal units. A remaining possibility was that the close shadowers' constructive errors occurred only when they were shadowing at longer latencies than average. To assess this, I measured the close shadowers' latencies for the word immediately preceding each constructive error. In fact, for each close shadower, her mean latency immediately preceding these errors was shorter than her shadowing latency for the rest of the passage (for the group as a whole, a mean of 264 as opposed to 286 msec).
2.2. Global disruption of linguistic structure

The analysis of the close shadowers' errors suggests that their output takes into account not only the acoustic-phonetic properties of the material, but also its higher-level organisation, including its syntactic and semantic structure. We can test this claim directly by giving the subjects prose materials whose higher-level structure is systematically disrupted. If shadowers do use lexical, syntactic, and semantic information to guide their on-line performance, then the absence of any of these should affect their performance. And if there is any difference, across shadowing latencies, in the types of information that shadowers use to guide their performance, then close and distant shadowers should be affected differently by these manipulations.
2.2.1. Experimental questions and materials

The experiment used three kinds of anomalous prose, which varied in the extent to which they departed from normal spoken language. The least divergent type of material was called Syntactic Prose, which is semantically uninterpretable but has relatively normal syntactic organisation (cf. Cowart, 1982; Marslen-Wilson & Tyler, 1983). This material was constructed following the procedure suggested by Miller and Isard (1963) and Treisman (1965). Taking as the starting-point a normal prose passage, all the content words in the passage are pseudo-randomly replaced by words of the same form-class and similar frequency. The resulting material reads like this:
That moment a blanket sat into the station that led over the ambulance and onto which they took off the drink around the name with the ward. But hotel to the driver you felt in the cheese. The brush to talk was to click empty in another lodge, pick up brigade so that the flood could bomb, and then see the columns on the stable.

In Syntactic Prose, the listener cannot construct a coherent meaning representation of the material she is hearing. This means that there can be no semantic (or pragmatic) constraints available to guide the on-line interpretation of the speech input. If these constraints are important in close or distant shadowing, then their absence will cause a deterioration in performance, relative to normal prose. Two earlier experiments (Miller & Isard, 1963; Treisman, 1965) have shown that semantic anomaly increases error rate in distant shadowers. Miller and Isard (1963), for example, found that the percentage of short sentences repeated correctly diminished from 89% for Normal sentences to 79% for Syntactic Prose sentences. Neither of these experiments, however, investigated the effects on latency, nor did they include the critical comparison between close and distant shadowers.

Prose structure can be more seriously disrupted by randomising the word-order of a normal prose passage. This material, called Random Word-Order Prose, has neither semantic nor syntactic structure, and forms a first-order approximation to English, since it is constrained only by the distributional properties of English prose. The scrambled prose was segmented into sentence-sized chunks, each read with a semblance of a prosodic contour, so that the shadowers knew when they could pause to breathe. A sample of the material follows:

Everything the new if takes wires trucks it loaded hot I the. Mess would the or the where to offensive same would, for could went narrow toward. Slope all in up there, narrow of dressing to on off side for river by edge of and, hill bearers the the across and edge the.

Both Miller and Isard (1963) and Treisman (1965) showed that destroying the syntactic struc-
wocky, r = +.62 (p < .01). The regression lines derived from these analyses are plotted in Fig. 2. In summary, the latency analyses show that the removal of semantic and pragmatic constraints in Syntactic Prose leads to an increase in latency which is constant across shadowing distances. But the removal of syntactic and lexical constraints as well, in Random Word-Order and Jabberwocky, produces a larger increment in the latencies of the distant shadowers than in those of the closer shadowers. This implies that, while all shadowers make use of the higher-level structure of the material in generating their output, the more distant shadowers are more dependent on this type of information than the closer shadowers.
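The regression of difference scores on Normal Prose latency, as summarised above, can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the per-subject latencies are invented placeholder values (the text reports only the resulting correlations), and the names `pearson_r` and `regression_line` are hypothetical helpers.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def regression_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical per-subject mean latencies in msec (NOT data from the paper).
normal_prose = [250, 300, 450, 600, 750]
jabberwocky = [285, 330, 530, 695, 885]

# Difference score: how much slower each subject is on the disrupted material.
difference = [j - n for j, n in zip(jabberwocky, normal_prose)]

r = pearson_r(normal_prose, difference)
slope, intercept = regression_line(normal_prose, difference)
print(f"r = {r:+.2f}, slope = {slope:.3f}, intercept = {intercept:.1f} msec")
```

On this reading, a clearly positive slope corresponds to a latency cost that grows with shadowing distance, whereas a slope near zero corresponds to an increase that is constant across shadowing distances.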
Fig. 2. The differences between each subject's mean shadowing latency for Normal Prose (NP), and her mean latencies for Syntactic Prose (SP), Random Word-Order (RWO), and Jabberwocky (JCKY), plotted as a function of her mean Normal Prose latency. Each point is based on 150 latency measurements, taken from two 300-word passages of each type of prose. [Plot not reproduced; horizontal axis: mean NP shadowing latency (msec).]

2.2.3. Effects on error rate
The shadowers' output for the three prose materials was initially scored in the same way as their Normal Prose performance (see Section 1.2). The overall error totals are given in Table Three. Errors were equally frequent in Normal Prose and in Syntactic Prose, with in both cases the distant shadowers making significantly fewer errors. The error-rate increases drastically for Random Word-Order and for Jabberwocky, with more errors being made overall in the Jabberwocky passages.

Table 3
Total percent error for prose passages

                       Shadowing groups
Prose Types:           Close   Distant
Normal Prose             7.4      2.1
Syntactic Prose          6.8      3.0
Random Word-Order       13.6     18.7
Jabberwocky             17.9     23.3

Paralleling the latency results, the effects of global disruption increase error rate for both close and distant shadowers, but have stronger effects at longer shadowing delays. The distant shadowers in fact show a larger increment in errors over their Normal Prose totals for all three types of disrupted materials (see Figure Three). Linear regression of these difference scores on Normal Prose latencies shows a relatively weak correlation for Syntactic Prose (r = +.50, p < .10), but strong effects for Random Word-Order (r = +.83, p < .001) and for Jabberwocky (r = +.84, p < .001). A breakdown of these totals by error category shows that the major differential increase in error rate is in Omission errors. The distant shadowers omit practically no words at all in Normal and Syntactic Prose (mean omission rate of 0.2 percent). But in Random Word-Order and Jabberwocky their omission rate climbs to 6.6 percent, compared to 1.7 percent for the close shadowers. The close shadowers appear to be able to keep talking no matter what, while the more distant shadowers simply leave out chunks when they get into problems. Both groups of shadowers show large and roughly equivalent increases in Delivery and Constructive errors.

Fig. 3. The differences between each subject's mean error rate for Normal Prose (NP) and her error rate for Syntactic Prose (SP), Random Word-Order (RWO), and Jabberwocky (JCKY), plotted as a function of her Normal Prose latency. [Plot not reproduced; horizontal axis: mean NP shadowing latency (msec).]

2.2.4. Constraints on spontaneous errors
The overall error rates reflect only indirectly the shadowers' use of semantic and syntactic constraints in their on-line performance. By examining the subjects' spontaneous errors we can assess more directly the role of these constraints. I showed in Section 2.1 that almost all of the Constructive errors in Normal Prose were congruent with their preceding syntactic and semantic context, irrespective of shadowing latency. A parallel analysis can be carried out for Syntactic Prose, taking into account only syntactic congruence, and with the same result. Less than five percent of the Constructive errors could be interpreted as ungrammatical, and this held independent of latency.

A different type of analysis is possible in Jabberwocky, where the subjects changed many of the nonsense words they heard into real words. This not only reflects the role of lexical knowledge in on-line performance, but also provides a pool of words that can be assessed for their congruence with their syntactic context. In fact, 67 percent of these new words were syntactically appropriate, and this proportion was exactly the same for the two groups of shadowers.

Finally, in Random Word-Order, over 40 percent of the errors involved an apparent reconstruction of the scrambled strings into a more standard form--one subject, for example, repeated the string "To daylight the the attack was of in" as "To daylight that the attack was to begin". Such errors are evidence for an interaction in the shadowers' performance between the lexical and acoustic-phonetic properties of the material and general syntactic and semantic constraints. The distribution of these reconstructive errors does not differ as a function of shadowing delay, suggesting that even the closest shadowing involves this active striving after meaning and structural order.
2.3. Selective disruption of prose materials
A final series of experiments used selective, rather than global, disruptions of prose materials to probe the shadowers' immediate processing of the speech they were repeating. These were experiments in which the subjects heard normal sentences in which single words were made anomalous in various ways, ranging from violations of local syntactic constraints, such as subject-verb number agreement, to violations of pragmatic expectations. I will focus here on an experiment that combined two types of anomaly to determine directly the types of analysis available to the shadower as she repeats back what she hears (Marslen-Wilson, 1975).

The stimulus materials for this experiment were constructed from a pool of 120 pairs of normal sentences. The second sentence in each pair contained a three-syllable target word. In 40 of the sentences, these target words were normal with respect to their context, in a further 40 they were chosen to be semantically anomalous, and in another 40 they were syntactically anomalous as well. This constituted the first level of anomaly. Nested within this was the second level of anomaly, which disrupted the lexical status of the words themselves. Within each group of 40, 10 words were left unchanged, and the other 30 had either their first, second, or third syllables changed, making them into nonsense words. For example, "president" might be changed to "howident", "company" to "comsiny", and "tomorrow" to "tomorrane".

The purpose of interweaving these two types of anomaly was to examine the effects of context on "word restoration". I had noticed in earlier studies that shadowers tended to restore mispronounced words--that is, to repeat them back in their original form, rather than in the disrupted form in which they heard them. The question here was whether the likelihood of word restoration would be affected by the congruence of the disrupted words with their sentential context.
If the repetition of individual words does involve the relationship of these words to their syntactic and semantic context, then the shadowers should be more likely to restore words that were congruent with their context. This question is most critical for the close shadowers, since they will be starting to repeat the disrupted target-words when they can only have heard the first syllable.

The results, summarised in Table Four, are clearcut. Restorations are very infrequent in all conditions except two. These were the conditions where the disrupted word was both syntactically and semantically congruent with its context, and where its first syllable was not disrupted. This pattern held equally strongly for close and for distant shadowers (for details, see Marslen-Wilson, 1975). I also ran a control experiment where the same mispronounced words were given to the subjects in isolation, without a sentential context. Here there were very few restorations, and their distribution did not correlate with the pattern in Table Four.

These results are direct evidence that the close shadower repeats back connected prose as a sequence of lexical items embedded in a syntactic and semantic interpretative matrix, and not as a stream of uninterpreted speech sounds. Sensory information plays a determining role in shadowing performance, since words are very rarely restored when their first syllable is disrupted. But contextual constraints are also critical, since words are also rarely restored when the word indicated by an undisrupted first one or two syllables is inconsistent with the syntactic and semantic context in which it is occurring. The shadower's lexical interpretation of the acoustic-phonetic input appears to be directly controlled by the contextual environment in which this input is heard, and for this to be possible the shadower must be constructing, on-line, not only a lower-level analysis of the input, but also a syntactic and semantic analysis.

Table 4
Word restoration totals

                          Contextual appropriateness
Mispronounced    Normal    Semantically    Semantically and
syllable                   anomalous       syntactically anomalous
First               2           5                  1
Second             21           7                  0
Third              32           4                  3
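The nested design behind Table 4 can be laid out explicitly. The sketch below is illustrative only: the even split of the 30 disrupted words into 10 per syllable position is an assumption (the text gives only the total of 30), the assignment of totals to cells follows one reading of the flattened table, and all identifiers are hypothetical.

```python
from itertools import product

CONTEXTS = (
    "normal",
    "semantically anomalous",
    "semantically and syntactically anomalous",
)
DISRUPTIONS = ("none", "first", "second", "third")

# Assumption: 10 items per cell (10 unchanged + 30 disrupted, split evenly).
ITEMS_PER_CELL = 10

# First level of anomaly (context) crossed with the second (disrupted syllable).
design = {cell: ITEMS_PER_CELL for cell in product(CONTEXTS, DISRUPTIONS)}
total_items = sum(design.values())

# Restoration totals as read from Table 4, keyed by (context, disrupted syllable).
restorations = {
    ("normal", "first"): 2,
    ("normal", "second"): 21,
    ("normal", "third"): 32,
    ("semantically anomalous", "first"): 5,
    ("semantically anomalous", "second"): 7,
    ("semantically anomalous", "third"): 4,
    ("semantically and syntactically anomalous", "first"): 1,
    ("semantically and syntactically anomalous", "second"): 0,
    ("semantically and syntactically anomalous", "third"): 3,
}

# Rank cells by restoration total to see which conditions stand out.
ranked = sorted(restorations, key=restorations.get, reverse=True)
top_two = ranked[:2]
print(total_items, top_two)
```

Ranking the cells this way picks out the two conditions described in the text: a contextually normal word whose first syllable is intact.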
The fact that distant shadowers make these restorations is itself also significant. As I showed in a subsequent study, using a new group of distant shadowers (Marslen-Wilson & Welsh, 1978), these subjects usually initiate their repetition of the restored word after the end of the word, and when they have already heard the disrupted syllable. Not only do they nonetheless go ahead and restore the disruption, but also they behave as if they had never been aware of the disruption in the first place. This suggests that the restoration effects we see in the close shadowers are not simply the guesses of subjects trying to generate a response when they have only heard the first syllable of a word. Rather, what we see, in close and distant shadowers alike, is the on-line integration of acoustic-phonetic, lexical, and contextual constraints to generate the listener's percept of the incoming speech stream. And under certain conditions, analysed in more detail in Marslen-Wilson and Welsh (1978), these different sources of constraint can evidently conspire together to override acoustic-phonetic deviations, so that a word that is mispronounced can be heard as if it were intact.
2.4. Overview and implications
The three sets of experiments reviewed in this section unequivocally demonstrate that there is no difference between close and distant shadowers in the depth or extent of their perceptual processing of the material they are repeating. Their spontaneous errors were sensitive to the same kinds of structural and semantic constraints, and they were affected in the same ways by syntactic and semantic anomalies. In the globally disrupted prose experiment, disruptions at the lexical, syntactic, and semantic levels had qualitatively similar effects at all shadowing latencies.

I conclude from this that not only do close and distant shadowers not differ in their perceptual processing of the material, but also that these procedures do not differ significantly from those employed in normal listening. This means that close shadowing gives us a continuous reading of the transfer function of the speech comprehension system, as it maps from sound onto meaning. This conclusion is amply confirmed in subsequent research, using a variety of experimental techniques, which shows that normal listening does involve the very rapid projection from sound onto meaning that close shadowing also demonstrates (cf. Marslen-Wilson, 1984; Marslen-Wilson & Tyler, 1981; Tyler, 1981).

What remains unanswered, however, is the question of what does determine latency differences between individuals. To understand fully what close shadowing is telling us about the transfer function of the speech comprehension system, we need to know what close shadowers are doing that distinguishes them from distant shadowers. But if they do not differ in their perceptual processing of the input, then where do they differ? In the next section I review the evidence that shadowing distance reflects individual differences in output strategy.
3. Output strategies in close and distant shadowing
An important clue to the nature of the difference between close and distant shadowing is the phenomenological difference between them. Chistovich describes her close shadowers as repeating the material "before they understood" (Chistovich et al., 1960). Many of the close shadowers I tested had a similar experience, feeling that they were repeating the material before they "knew what it was". This is not the experience of more distant shadowers, who report that they listen to one or two words, and then, after they know what the words are, repeat them.

How are we to interpret these phenomenological differences? We cannot interpret them as statements about the levels of perceptual process involved in generating the on-line shadowing response. Although a close shadower may feel that she is saying something before she really knows what it is, it is clear that what she is repeating is just as constrained by its higher-level context, and just as integrated into the syntactic and semantic matrix of the current utterance, as anything that the more distant shadowers are repeating.

Instead, we should interpret these reports as reflecting the shadower's output strategies. I suggested earlier (Section 1.2) that close shadowers must be operating close to the temporal limits for a phonologically determinate analysis of the speech input. As soon as the input becomes acoustically-phonetically intelligible, the close shadowers appear to be able to start repeating it. The experiential consequence of this--if we take the close shadowers' reports at face value--is that they have access to the products of the speech comprehension process, for the purpose of generating an output, before these products have fully emerged into the light of conscious awareness (Marcel, 1983; Marslen-Wilson & Welsh, 1978). Practically speaking, this means that the close shadowers are repeating words before they are fully aware of what these words are.

The more distant shadowers, in contrast, are repeating back the input on the basis of a perceptually complete analysis of what they are hearing. They feel that they do know what words they are repeating--just as, in normal listening, we feel that we know what the words are that we hear. We can then, at a first approximation, characterise the difference between close and distant shadowers in terms of their dependence on an explicit, conscious knowledge of the lexical identity of what they are repeating. Taking this hypothesis as a starting point, I explored the differences between close and distant shadowers in two further sets of experiments. I looked first at the implication that word-level variables should have stronger effects at longer shadowing latencies.
3.1. Lexical variables and shadowing distance
The three variables I investigated were word-length, word-frequency, and lexical status (whether a sound-sequence was a real word or not). If close shadowers start to repeat a word as soon as the input becomes sufficiently phonologically determinate, then the total length of a word should be irrelevant to their repetition latency. Similarly, their latencies should not be affected by differences in word-frequency, since this is also a property of entire words. If distant shadowers do need to know, explicitly, what a word is before repeating it, then both word-length and word-frequency should affect their performance. They should, furthermore, be more affected than close shadowers by the lexical status of a sound sequence, since nonsense words do not have a lexical identity in the way that real words do.

These predictions could not be tested for words in context, since, as we have already seen, lexical interpretation is strongly affected by contextual constraints. Instead, I used lists of isolated words. Two different word-lists were constructed, each read at a rate of one word per second in groups of five words, with a two-second break between groups, and preceded by a short practice list. The Normal Word list co-varied word-length and word-frequency. It consisted of 60 words, falling into six categories: High and Low frequency varied across one-, two-, and three-syllable words. The High frequency words had a mean Kucera and Francis (1967) frequency of 225, and the Low frequency words of 2.7. The initial phonemes of the words were matched across the six categories. The accompanying Nonsense Word list consisted of 75 nonsense words, divided equally into one-, two-, and three-syllable sequences.

These lists were given, in two separate sessions, to the same close and distant shadowers I had tested earlier, except that S1 was no longer available, and was replaced by a new close shadower (mean Normal Prose latency of 270 msec). In addition, S4 did not participate in the nonsense word session.
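The composition and timing of the two word-lists can be sketched as follows. This is an illustration under stated assumptions rather than the original stimulus-preparation procedure: it assumes that the six Normal Word categories are equal in size, and that the two-second break is added after the one-second slot of the last word in each group. The function `onset_times` and its parameters are hypothetical.

```python
def onset_times(n_words, group_size=5, word_interval=1.0, group_break=2.0):
    """Onset time (sec) of each word, assuming fixed slots plus a pause per group."""
    # Assumption: a group occupies group_size slots, then the break follows.
    group_period = group_size * word_interval + group_break
    onsets = []
    for i in range(n_words):
        group, position = divmod(i, group_size)
        onsets.append(group * group_period + position * word_interval)
    return onsets


# List composition implied by the description of the materials.
normal_categories = 2 * 3                      # frequency (High/Low) x length (1-3 syllables)
normal_per_category = 60 // normal_categories  # assumes equal-sized categories
nonsense_per_length = 75 // 3                  # "divided equally" over three lengths

normal_onsets = onset_times(60)
nonsense_onsets = onset_times(75)
print(normal_per_category, nonsense_per_length, normal_onsets[:6], normal_onsets[-1])
```

Under these assumptions each group of five words is followed by a pause, so the sixth word begins seven seconds after the first.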
Table 5
Mean latencies for normal and nonsense words (msec)

                       Shadowing groups
                       Close   Distant
Normal Words:
  One-Syllable          256      484
  Two-Syllable          254      499
  Three-Syllable        272      518
  Total:                261      500
Nonsense Words:
  One-Syllable          276      575
  Two-Syllable          304      604
  Three-Syllable        291      647
  Total:                290      609
The overall mean latencies for the two groups of shadowers are given in Table Five, broken down by word-length and type of word-list. I looked first at the effects of word-length. For both Normal and Nonsense Words, there was a significant interaction between subjects and word-length, reflecting a tendency for the effects of word-length on shadowing latency to increase for the more distant shadowers. There is a positive correlation between Normal Prose shadowing distance and the size of word-length effects for Normal Words (r = +.55, p < .05) and for Nonsense Words (r = +.77, p < .01). In both lists the close shadowers show no systematic effect of word-length; in fact, it is only the subjects repeating words at mean latencies of 500 msec or more who show consistent increases in shadowing latency across all three word-lengths.

The effects of word-frequency and of lexical status follow similar patterns. Mean latency to low frequency words tends to increase with shadowing distance (r = +.61, p < .025), with the close shadowers showing no effect at all of word-frequency. Frequency did, however, affect error rate. The close shadowers' percent error increased from 7.1% for high frequency words to 16.3% for low frequency words. The distant shadowers made very few errors throughout (overall percent error of 1.2%). Nonsense Words are shadowed significantly more slowly than Normal Words, but, again, the size of the difference correlates positively with Normal Prose shadowing latency (r = +.55, p < .05). Apart from one close shadower who appears to shift output strategy--her mean latency for Nonsense Words increases to 433 msec, compared to her Normal Prose mean of 287 msec--the mean increase for the close shadowing group is 12 msec. Lexical status also had little effect on error rate. The close shadowers' overall percent error (13.3%) was comparable to their error rate for Normal Words (11.4%). The distant shadowers' error rate remained very low at 2.9%.
This means that all three lexical variables had either weak or nonexistent effects on the response latencies of the close shadowers. These subjects start to repeat what they hear as soon as it becomes phonologically determinate, and before they are fully aware of what words they are uttering. The effect of frequency on the close shadowers' error rate is a reminder, however, that although lexical variables may not control the timing of close shadowing performance, they can nonetheless help to determine its quality and accuracy.

Finally, for a different perspective on the output strategy hypothesis, I looked at the relationship between the shadowers' mean latencies on Normal and Nonsense Words and their earlier performance on Normal Prose (see Table One). If the differences between individual shadowers reflect output strategies, and are unrelated to input variables, then variations in the nature of the input should not affect these differences. I had already observed that there was a strong correlation between mean Normal Prose shadowing latency and mean latency on the disrupted prose passages (for Syntactic Prose, Random Word-Order, and Jabberwocky, the correlation coefficients were, respectively, +.91, +.95, and +.98). But this did not mean that the correlation would continue to hold for lists of isolated words, heard at a rate of one per second.

In fact, there was a very close relationship between performance on Normal Prose and performance on the two word lists. The regression of mean Normal Word latencies on mean Normal Prose latencies yielded a correlation coefficient of +.94 (p < .001), with the corresponding value for the Nonsense Word lists being +.93 (p < .001). This means that, whatever determines relative differences between individuals' shadowing latencies, it is something that holds constant across a range of materials that vary quite drastically in their perceptual properties.

This leads to the second test of the output strategy hypothesis. If such a hypothesis is correct, then the only way to make close shadowers behave like distant shadowers is by making them change their output strategy, so that they too are fully aware of what words they are hearing before they start to repeat them.
The attempt to do this is described in the next section.
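The group-level pattern in Table 5 can be rechecked with a few lines of arithmetic. The sketch below uses only the group means printed in the table; the helper names are hypothetical, and the per-subject correlations reported in the text cannot be reproduced from group means alone.

```python
# Group mean latencies (msec) from Table 5: one-, two-, three-syllable items.
TABLE_5 = {
    "close": {"normal": (256, 254, 272), "nonsense": (276, 304, 291)},
    "distant": {"normal": (484, 499, 518), "nonsense": (575, 604, 647)},
}


def length_effect(latencies):
    """Latency difference between the longest and the shortest items."""
    return latencies[-1] - latencies[0]


def increases_consistently(latencies):
    """True if latency rises at every step from one to three syllables."""
    return all(a < b for a, b in zip(latencies, latencies[1:]))


for group, lists in TABLE_5.items():
    for list_name, latencies in lists.items():
        print(group, list_name, length_effect(latencies),
              increases_consistently(latencies))

# Lexical-status effect on the group totals (all subjects included).
mean = lambda values: sum(values) / len(values)
lexical_effect = {
    group: round(mean(lists["nonsense"])) - round(mean(lists["normal"]))
    for group, lists in TABLE_5.items()
}
print(lexical_effect)
```

Only the distant group shows a latency increase at every step in word-length. The close group's lexical-status difference computed here is taken from the group totals, so it includes the one close shadower who appears to have shifted strategy and is larger than the 12 msec reported with her excluded.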
3.2. Output criterion and shadowing latency
Instead of asking shadowers simply to repeat everything they hear, one can ask them to repeat only those items that meet certain pre-specified criteria. This will cause the close shadowers to behave more like the distant shadowers if the kind of information necessary to do this is normally not available to them at shadowing onset, but is available to the distant shadowers.

To explore this, I used two types of monitoring task. In Semantic Monitoring, the shadowers are instructed to repeat back only those items in a list that fall into a given semantic class--for example, the taxonomic category "animal". In Phoneme Monitoring, in contrast, the shadowers only repeat back those items that contain a given speech sound--for example, the phoneme /b/. The critical difference between the two tasks is that correct performance in Semantic Monitoring requires knowledge of the words being heard, whereas correct performance in Phoneme Monitoring does not. A decision about membership of a semantic category is based on the semantic properties of words, and not of phonemes or syllables. Decisions about the phonological properties of sound sequences are not obligatorily tied to knowledge of words in the same way.

The appropriate baseline for performance on these two tasks is the same subjects' performance on Normal Words (Section 3.1). In normal shadowing, the subject listens to each word for long enough to accumulate whatever information she needs to be able to start repeating it. If either Semantic or Phoneme Monitoring requires information that is not available to her at her usual shadowing delay, then her response-time will increase, relative to Normal Words. And if shadowers at different distances vary in the availability, at shadowing onset, of the information required for correct response in these tasks, then the size of this increase should correlate with shadowing distance.
Since Semantic Monitoring, more than Phoneme Monitoring, requires the kind of explicit knowledge about a word's identity that I have argued the close shadowers do not have at shadowing onset, it is here that the output strategy hypothesis predicts the strongest effects, making the close shadowers perform most like the distant shadowers.
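The contrast between the two monitoring criteria can be made concrete with a toy example. The sketch below is a loose illustration, not a model proposed in the paper: spelling stands in for a phonemic transcription, the six-word lexicon is invented, and the prefix-matching rule used for the semantic decision is an assumption introduced only for this example.

```python
# Toy lexicon; letters stand in for phonemes (an assumption for illustration).
ANIMALS = {"rat", "cat", "parrot"}
LEXICON = ANIMALS | {"rash", "cab", "paradigm"}


def phoneme_decision_point(word, target="b"):
    """Segments that must be heard before the target sound has occurred.

    Returns None when the word does not contain the target at all.
    """
    index = word.find(target)
    return None if index < 0 else index + 1


def semantic_decision_point(word, lexicon=LEXICON, animals=ANIMALS):
    """Shortest prefix after which every matching lexicon word is an animal.

    Returns None when no prefix of the word licenses an "animal" response.
    """
    for n in range(1, len(word) + 1):
        candidates = {w for w in lexicon if w.startswith(word[:n])}
        if candidates and candidates <= animals:
            return n
    return None


for word in ("bat", "cab", "rat", "parrot", "rash"):
    print(word, phoneme_decision_point(word), semantic_decision_point(word))
```

In this toy setting a word-initial target sound can be detected after a single segment, while the category decision has to wait until the input has ruled out non-animal words that begin the same way.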
3.2.1. Experimental procedure and results
The test list for Semantic Monitoring consisted of 100 words, read at a rate of one word a second, in groups of five. Forty-five of the words were animal names, divided into three word-length classes (one, two, and three to five syllables) matched for word-frequency.

For Phoneme Monitoring I constructed two different lists. In one of these, the phoneme target always occurred word-initially. Here I expected the strongest differentiation from Semantic Monitoring, since the shadowers could--at least in principle--respond as soon after word-onset as they could manage to identify the initial phoneme. The list consisted of 100 words, of which 45 were target words, all beginning with a /b/. None of the other words contained a /b/. The word-lengths, the word-frequencies, and the order of the target words were copied exactly from the Semantic Monitoring list.

As an additional control I included a second list in which the position of the target phoneme was not restricted to initial position. If a target phoneme occurs late in a word, the shadower will have to delay responding until she hears this phoneme. If Semantic Monitoring also requires the shadower to wait until she has heard more of the word, then performance in the two tasks should be more similar than for the word-initial targets. The Unrestricted list consisted of 90 words, of which 40 contained /b/'s. These 40 words were subdivided into eight groups of five: three groups with word-initial targets (one, two, and three syllables in length), three groups where the target came at the end of the first syllable (also for one-, two-, and three-syllable words), and finally two groups where the target came at the end of the second syllable (two- and three-syllable words only).

The same close and distant shadowers as before (Section 3.1) heard the Semantic Monitoring and the Phoneme Monitoring/Initial (PM/Initial) lists in one test session, and the Phoneme Monitoring/Unrestricted (PM/Unrestricted) list in a second session. One subject (S11) performed so poorly on Semantic Monitoring that her results could not be used. She is not included in any of the analyses reported below.

To assess the relative differences between shadowing groups over the three tasks, I looked first at the overall mean latencies for the two groups. These are given in Table Six, with the Normal Words results included for comparison.

Table 6
Mean latencies for monitoring tasks (msec)

                                     Shadowing groups
                                     Close   Distant
Semantic Monitoring                   611      733
Phoneme Monitoring/Unrestricted       688      776
Phoneme Monitoring/Initial            373      582
(Normal Words                         261      495)

It is immediately clear that PM/Initial is quite similar to Normal Words, while both PM/Unrestricted and Semantic Monitoring bring the two groups of shadowers much closer together. The mean difference in latency between the two groups is 234 msec for Normal Words, and this falls only slightly (to 209 msec) for PM/Initial. In contrast, the latency difference drops to 122 msec for Semantic Monitoring and to 88 msec for PM/Unrestricted.

This pattern is reflected in the Normal Words difference scores for the three tasks. For PM/Initial, the increase in shadowing latency over Normal Words latency does not vary significantly as a function of individual shadowing distance (r = -.50, p > .10). But for PM/Unrestricted and for Semantic Monitoring the two are strongly correlated. The regression of the PM/Unrestricted difference scores on shadowing distance gives a coefficient of -.72 (p < .01), with the corresponding value for Semantic Monitoring of -.80 (p < .01). In these two tasks, the most distant shadowers show the smallest increases in shadowing latency.

The differential effects of the tasks can also be seen in correlations between overall monitoring latencies and Normal Prose latencies. The strong correlation between PM/Initial scores and Normal Prose (r = +.78) confirms that this version of the monitoring task had only minor effects on individual shadowers' relative shadowing distances. In contrast, the correlation with Normal Prose drops to +.59 (p < .05) for PM/Unrestricted and to +.54 (p < .10) for Semantic Monitoring. This means that the proportion of shared variance has fallen from 88% for Normal Words to 35% for PM/Unrestricted and to only 29% for Semantic
Monitoring. The way that the close shadowers are performing in these two tasks has very little in common with their performance in any of the previous experiments.

This result not only confirms the claim that shadowing distance reflects output strategy, but also that the crucial variable is explicit knowledge of the words being uttered. In PM/Initial, where the shadower can initiate her response without having to know anything about the word as a whole, there is a general increase in shadowing latency which does not significantly differ for the two groups. PM/Initial requires an extra decision process before shadowing can begin, but this decision is based on information about the input that is apparently equally available at all shadowing distances. In Semantic Monitoring a correct response does require explicit awareness of the word being heard, and here we find a strong differential between close and distant shadowers. Over and above the general increase in response time attributable to the extra decision process involved, the close shadowers must wait longer than the distant shadowers, relative to normal shadowing onset, before they get the information they need about what they are hearing. And because Semantic Monitoring does require a longer delay from word onset, performance on this task is very similar to performance on PM/Unrestricted, where the position of the target late in the word also brings shadowers of all distances into closer alignment with each other.

To complete the analysis I looked at the shadowers' errors, distinguishing Repetition Errors from False Negatives and False Positives (see Table Seven). The number of Repetition Errors--mistakes made while repeating one of the targets--was relatively low, and there were also very few False Negatives. The most common errors were False Positives, where the subject repeats, or starts to repeat, a word that is not a target. The percentage is highest in Semantic Monitoring, where the close shadowers made a large number of errors on words that shared their initial syllables with animal names--for example, words like rash, cab, and paradigm, which have the same initial syllable as the animal names rat, cat, and parrot, respectively. Even in a task where the close shadowers are forced to wait until they have heard more of a word than normal, they will still sometimes initiate their response before they could, in fact, have heard enough of the word in question.

Table 7
Overall error percentages for monitoring tasks

                      Shadowing groups
                      Close   Distant
Repetition Errors      4.4      1.2
False Positives        8.7      1.5
False Negatives        1.9      1.2
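The group differences and shared-variance figures quoted in this section follow directly from Table 6 and from the reported correlations, and can be rechecked as below. The sketch uses only values given in the text; the dictionary and variable names are hypothetical.

```python
# Group mean latencies (msec) from Table 6, as (close, distant) pairs.
TABLE_6 = {
    "Semantic Monitoring": (611, 733),
    "Phoneme Monitoring/Unrestricted": (688, 776),
    "Phoneme Monitoring/Initial": (373, 582),
    "Normal Words": (261, 495),
}

# Latency gap between the groups for each task.
group_gap = {task: distant - close for task, (close, distant) in TABLE_6.items()}

# Correlations with Normal Prose latency reported in the text.
CORRELATION_WITH_NORMAL_PROSE = {
    "Normal Words": 0.94,
    "Phoneme Monitoring/Unrestricted": 0.59,
    "Semantic Monitoring": 0.54,
}

# Shared variance is the squared correlation, expressed here as a percentage.
shared_variance = {
    task: round(100 * r ** 2) for task, r in CORRELATION_WITH_NORMAL_PROSE.items()
}

print(group_gap)
print(shared_variance)
```

The squared correlations reproduce the drop in shared variance described in the text, and the gaps reproduce the narrowing of the difference between the two groups in Semantic Monitoring and PM/Unrestricted.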
3.3. Concluding remarks
In the Introduction to this paper, I argued that close shadowing was methodologically significant because the shorter the delay between hearing something and repeating it, the more directly this response will reflect the immediate properties of the speech perceptual process. The research reported in this last section of the paper suggests that close shadowing provides not only direct, but privileged access to the dynamic properties of the speech comprehension process. Close shadowers are individuals who are able, for whatever reason, to use the products of on-line speech analysis to drive their articulatory apparatus before they are fully aware of what these products are. This means that what they produce not only provides a continuous reflection of the outcome of the process of language comprehension, but also does so relatively unaffected by post-perceptual processes. It is in this sense that close shadowing provides us with uniquely privileged access to the properties of the system. It is the "on-line" task that is the least contaminated by the "off-line" strategies that often obscure the interpretation of other experimental tasks. What it tells us about the process of language comprehension is that the mapping from sound onto meaning is rapid, efficient, and apparently continuous; that, in effect, we can understand speech word by word as we hear it. And by making this wholly explicit, the phenomenon of close shadowing constrains not only the kinds of questions we need to ask about speech comprehension, but also the kinds of answers that will be acceptable.