Psychological Research
Psychol Res (1985) 47:59-67
© Springer-Verlag 1985

Recognition of script-based inferences

Rüdiger Pohl 1, Hans Colonius 2, and Manfred Thüring 3

1 Technische Universität Braunschweig, Institut für Psychologie, Spielmannstr. 19, D-3300 Braunschweig, FRG
2 Universität Oldenburg, D-2900 Oldenburg, FRG
3 Technische Universität Berlin, D-1000 Berlin, FRG

Summary. In experiments on the recognition of text information, subjects frequently "recognize" text information that had not explicitly been stated in the text but rather belonged to a script activated by the text. In this paper, we delineate a model attributing these false alarms to an increased activation level of the information in an abstract script memory store which is more or less separate from episodic memory. Observed RTs in a recognition experiment were consistent with a four-stage model, extending the approach of Bower, Black, & Turner (1979).

In recent years the concept of a script has been most influential for studies of text comprehension and text processing. Among the topics investigated are, for example, the serial order of script events (Bower, Black, & Turner, 1979; Galambos & Rips, 1982; Haberlandt & Bingham, 1984; Notenberg & Shoben, 1980), the typicality of script events (Graesser, 1978; Bellezza & Bower, 1982), the script structure (Mandler & Murphy, 1983), the discrimination of script concepts (Walker & Yekovich, 1984), and the activation of script knowledge (Bellezza & Bower, 1981; Den Uyl & Van Oostendorp, 1980). Schank & Abelson (1977) introduced the script as an organizational unit of text understanding. In the meantime, they have extended their view to make the use of script information more dynamic (Abelson, 1981; Schank, 1982).

Offprint requests to: R. Pohl

In this paper, a model for the recognition of script-based inferences is proposed which extends existing models in a particular way. Specifically, we introduce assumptions regarding a search and decision mechanism that allow an empirical test of the model by a reaction time study (reported below). Let us first review the main experimental findings relevant to our study and two models that have been advanced.

In a sequence of experiments on the recognition of script information, Bower et al. (1979) observed that subjects frequently recognized text information that had not explicitly been stated in the text but that belonged to the script activated by the text. For example, after having read a text about a visit to the dentist, the subject might have said that the sentence "He opened the door to the waiting room" was in the text (by responding with "old") although this sentence had not actually been stated in the text. Moreover, the likelihood of these false alarms increased as a function of the number of different script versions (visit to the dentist, visit to the internist, etc.) read by the subject. Especially intriguing was the fact that a post-hoc analysis revealed that this effect also showed up with script information possessing no superficial similarity across the different script versions, but rather a functional identity. For example, having read "The nurse checked John's blood pressure and weight" in the internist version increases the likelihood of saying "old" to "The dentist x-rayed Bill's teeth" in the dentist version although the latter sentence did not appear at all. Bower et al. called this kind of error abstract inferences.

According to the partial-copy model of Bower et al., reading a text generates two different memory traces. On the one hand, an unelaborated version of the text is read into episodic memory. On the other, a script corresponding to the text is activated in a so-called knowledge store. Here, script actions mentioned in the text receive a higher activation level than script actions not stated explicitly. According to this model, the subject has two ways to differentiate a stated text fragment from an unstated one: either by finding it in episodic memory or by reading it out from the knowledge store, given that its activation level there is higher than for the unmentioned items. Each script version read by the subject gives rise to a separate episodic memory block. At the same time, every version activates the same script in the knowledge store. The crucial assumption here is that activation accumulates across the different versions read, losing information about which script actions have been activated by which text fragments. After reading several script versions, some unmentioned script actions may have a relatively high activation level, giving rise to the observed false alarms.

In a related approach, Graesser (1978, 1981) holds that reading a text activates a script in an all-or-none fashion. The entire script (or schema) is copied into the specific memory trace. Recognition performance depends on the typicality of the script actions: Recognition should be better for actions that are unrelated to (or even inconsistent with) the underlying script than for actions that are typical. In the extreme, recognition probability should be zero for actions that are very typical for the script. Smith (1981) proposes that in addition to the actions mentioned in the text, only those inferences are stored in episodic memory that are necessary preconditions or consequences of mentioned actions. Both authors advance observed false-alarm rates to support their assumptions.
One possible approach to get more information about the underlying processes is to make some explicit assumptions on the search and decision mechanism that imply prediction of reaction times. In the following, we delineate a model which is along the lines of the above authors. In the acquisition phase, the script versions of the input text are processed by the underlying abstract script and stored in independent episodic memory blocks. We leave it open for the moment whether inferences needed for processing are being kept activated only in script memory (like in Bower et al.'s knowledge store) or are also copied into episodic memory (like in Graesser's model).

R. Pohl et al.: Script recognition

[Figure 1: flowchart. Stimulus: cue (title) and test sentence. Decision boxes: "Do the concepts of the test sentence belong to the theme?", "Is the test sentence in episodic memory?", "Is the test sentence in script memory?", "Is the activation of the test sentence high enough?". Responses: "old" or "new".]

Fig. 1. The four decisions postulated in the recognition model

In the recognition phase, a maximum of four decisions have to be made in processing (cf. Figure 1): 1) Does the sentence presented belong to the theme? 2) Is the sentence stored in episodic memory? 3) Is the sentence stored in script memory? 4) Is the activation of the sentence in script memory high enough for the sentence to have been presented before? As a first approximation, we simply assume that all four decisions take the same average amount of time, no matter whether the answer is positive or negative. Thus, according to this model, the time to decide if a presented sentence is "old" or "new" is simply a linear function of the number of decisions needed.
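The decision sequence just described can be illustrated with a short sketch. It is not the authors' implementation: the class `Probe`, the function `recognize`, and the parameters `base_ms` and `ms_per_decision` are hypothetical names introduced here, and no parameter values are estimated in this paper. The sketch only assumes what the text states, namely that up to four decisions are made in sequence and that predicted RT is linear in the number of decisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical description of a test sentence's memory status."""
    belongs_to_theme: bool
    in_episodic_memory: bool
    in_script_memory: bool
    activation_high: bool


def recognize(probe, base_ms=0.0, ms_per_decision=1.0):
    """Return (response, number of decisions, predicted RT).

    base_ms and ms_per_decision are illustrative placeholders,
    not values reported in the study.
    """
    decisions = 1  # 1) Does the sentence belong to the theme?
    if not probe.belongs_to_theme:
        response = "new"
    else:
        decisions = 2  # 2) Is it stored in episodic memory?
        if probe.in_episodic_memory:
            response = "old"
        else:
            decisions = 3  # 3) Is it stored in script memory?
            if not probe.in_script_memory:
                response = "new"
            else:
                decisions = 4  # 4) Is its activation high enough?
                response = "old" if probe.activation_high else "new"
    return response, decisions, base_ms + ms_per_decision * decisions


# An atypical distractor is rejected after one decision,
# a stated sentence is accepted after two.
atypical = Probe(False, False, False, False)
stated = Probe(True, True, True, True)
print(recognize(atypical)[:2], recognize(stated)[:2])
```

Under these assumptions, an unstated but typical sentence always requires all four decisions, whether it ends in a correct rejection or in a false alarm.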

Method

Material

In order to take up the approach of Bower, Black, & Turner (1979), we first had to develop appropriate script material in German. In a number of pilot studies, we created four different scripts with four versions each, thereby controlling the typicality of script actions across the various versions as well as the similarity among the script actions of a common script.

Table 1. Themes of scripts and versions used in the experiment

Script                                  Version 1              Version 2             Version 3                   Version 4
Veranstaltung (cultural event)          Fußball (soccer game)  Kino (movies)         Konzert (concert)           Open-Air-Festival (open-air festival)
Essen gehen (dine out)                  Imbiß (snack bar)      Mensa (cafeteria)     Raststätte (highway motel)  Restaurant (restaurant)
Verkehrsmittel (public transportation)  Bus (bus)              Eisenbahn (railroad)  Flugzeug (airplane)         Taxi (cab)
Waschen (washing)                       Baby (baby)            Geschirr (dishes)     Haare (hair)                Wäsche (clothing)

Note. English translations are given in parentheses. One of the texts used in the experiment is shown in the appendix. The complete material is available from the first author.

First, twenty subjects were asked to write down typical actions and events for four versions of the following script situations: Veranstaltung besuchen (attend a cultural event), Essen gehen (dine out), öffentliches Verkehrsmittel benutzen (use public transportation), and etwas waschen (washing).

Second, we determined six abstract inferences for each of the four scripts along the following lines: In each of the four versions of a common script, one sentence was to describe an action with the same role in the underlying script. For example, in the public transportation script a sentence like "He watched for the bus" in the bus version has the same role as the sentence "He was waiting for a taxi" in the taxi version. In what follows, these propositions are called same-function (SF) propositions. In order to avoid recognition errors based solely on semantic similarity among SF propositions, care was taken to find sentences having different contents but an analogous role in the script (an example set of sentences together with the generating rule is in the appendix). Thus, all stories (script versions) were rewritten with six SF propositions and eight other propositions each, the latter being called different-function (DF) sentences because none of them share the same underlying script function.

Third, ten subjects had to judge the typicality of all sentences for the situation referred to in the headlines of the stories. On a six-point scale (1 = "very typical" to 6 = "very untypical"), mean typicality for SF propositions was not different from mean typicality for DF propositions. We added 'new' DF propositions as distractors containing possible but atypical actions that were subsequently rated as 'not typical' (with a mean of 4.5 on the scale). This result was consistent for all 16 versions.

Fourth, we tested whether subjects were able to recognize the identical role of propositions with different semantic contents. Fifteen subjects were asked to cluster similar sentences across the four different versions of a script. Eighty-eight percent of the clusterings corresponded to the previously constructed abstract inferences.

For the main experiment, we constructed three different forms of each of the 16 versions. These forms differed with respect to the SF propositions: Form A contained all six SF propositions, Form B only three (randomly selected), and Form C contained the remaining three not in Form B. Only versions of Form B and C were tested afterwards. Thus, the test material consisted of six SF propositions of a version (three of them 'old' and three of them 'new'), three randomly selected old DF propositions, and three new DF propositions that were atypical for the script.

In order to ensure that each subject read a different number of versions of each script, we defined four subject subgroups and assigned them to the factors 'number of versions read' and 'script' according to a sequentially balanced Latin square. Each subject subgroup consisted of eight randomly assigned subjects. Apart from these constraints, all stories were presented and tested in random order. The Latin square was repeated identically for the four factor level combinations of the factors 'test probe' and 'proposition type', thus yielding a repeated measurement design for these factors.
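One way such an assignment could be constructed is sketched below. The particular square used in the experiment is not reported, so the Williams-type construction, the function names `williams_square` and `design`, and the mapping of rows to subject subgroups and columns to scripts are all assumptions made for illustration.

```python
SCRIPTS = ["Veranstaltung", "Essen gehen", "Verkehrsmittel", "Waschen"]


def williams_square(n):
    """Sequentially balanced Latin square for even n (Williams-type design).

    The first row alternates low and high values (0, 1, n-1, 2, ...);
    every further row adds 1 modulo n to the previous one.
    """
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    return [[(value + shift) % n for value in first] for shift in range(n)]


def design():
    """Map each subgroup (row) and script (column) to a number of versions read.

    Which factor corresponds to rows, columns, and cells is an assumption.
    """
    square = williams_square(len(SCRIPTS))
    return {
        group: {script: cell + 1 for script, cell in zip(SCRIPTS, row)}
        for group, row in enumerate(square, start=1)
    }


for group, plan in design().items():
    print(group, plan, "stories read:", sum(plan.values()))
```

In any 4 x 4 Latin square of this kind, every subgroup reads 1 + 2 + 3 + 4 = 10 stories, and every script is read in one, two, three, and four versions by exactly one subgroup each.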

Subjects

The 32 subjects (22 female, 10 male) were students of Braunschweig University and were paid for their participation in the experiment, which lasted about 1 hour.

Procedure

Subjects sat in front of a VT-100 terminal. In the initial phase of the experiment, ten stories were presented, each short enough to fit on a single screen. The subjects were instructed to read the text in order to be able to answer questions concerning the content of the stories. After an intervening task of about 15 minutes, a recognition test was performed. For each of the stories that were presented in Form B or C, 12 sentences were presented to the subjects in random order: 6 'old' and 6 'new' sentences. The headline of the respective story appeared at the top of the screen five seconds before the first test sentence and stayed until the last sentence. Subjects had to decide whether or not the sentence had already appeared in the first part of the experiment. They had two buttons, and reaction time was measured from the appearance of the sentence on the screen until one of the buttons was pressed. After this, the subjects had to indicate on a four-point rating scale how sure they were of their decision. Thus, the recognition test yielded three dependent variables: 1) the percentage of correct answers, 2) their reaction times, and 3) confidence ratings.

Specific Hypotheses

a) Number of errors

The new different-function (DF) sentences that we constructed do not belong to any script version, while all other test sentences contain typical script information. Thus, a typicality effect is to be expected: The recognition error rate for typical script information should be higher than for atypical information. Moreover, as the number of versions read by the subject increases, the false-alarm rate for new same-function (SF) sentences should increase as well because activation of the corresponding abstract inferences in the knowledge store rises. Old SF sentences, however, should not show this activation effect because all versions are stored separately in independent episodic memory stores. The same holds true for old DF sentences.

b) Confidence ratings

Confidence in false alarms with new same-function sentences should be an increasing function of the number of script versions read since, according to the presented model, activation of abstract script memory increases, too. Averaging confidence ratings over false alarms and correct rejections (with new SF sentences) should shift the mean of the ratings in the direction of "sure old" as the number of versions read increases. In all other cases, the ratings should not depend on the number of versions read.

c) Reaction times

New DF sentences are not part of the activated script. Thus, they can, in principle, be rejected without detailed memory search in the first part of the recognition phase (cf. Figure 1). They contain concepts that do not belong to the script version activated by the headline. Old sentences will, in general, be found in episodic memory. Thus, mean RT should be about the same for old SF and old DF sentences. Finally, new SF sentences can only be rejected after unsuccessful search of episodic and script memory. This implies longest RTs for new SF sentences. To summarize, for mean RTs of correct answers we postulate:

$$\mathrm{RT}(\text{"new"} \mid \text{new DF}) < \mathrm{RT}(\text{"old"} \mid \text{old DF}) \approx \mathrm{RT}(\text{"old"} \mid \text{old SF}) < \mathrm{RT}(\text{"new"} \mid \text{new SF}).$$


We have no hypotheses for RTs of false alarms with new DF sentences not belonging to the script. There is another divergence in the predictions of the two different encoding theories as concerns RTs for correct and incorrect responses with new SF sentences: 1) If there are no inferences in episodic memory, RTs are expected to be equal for correct and incorrect responses because both involve assessing the activation level in script memory. 2) In the other case, correct responses should take longer than incorrect responses because false alarms can already occur during episodic memory search while correct rejections only follow from assessing script memory activation. Since it is assumed that all script versions are stored in independent episodic memory blocks with direct access from corresponding cues (headlines), there should be no RT differences as a function of the number of versions read. The amount of information in script memory is also constant.
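The contrast between the two encoding alternatives can be made concrete with a small sketch. It is an illustration under stated assumptions, not part of the original analysis: the function name and the parameter `share_episodic` (the proportion of false alarms that already arise during the episodic memory check) are hypothetical, and decisions are counted as in Figure 1.

```python
def expected_decisions_new_sf(share_episodic):
    """Expected number of decisions for responses to new SF sentences.

    share_episodic is a hypothetical parameter: the proportion of false
    alarms produced at decision 2 because an inference was copied into
    episodic memory. Alternative 1 (no inferences in episodic memory)
    corresponds to share_episodic = 0.
    """
    if not 0.0 <= share_episodic <= 1.0:
        raise ValueError("share_episodic must lie between 0 and 1")
    correct_rejection = 4.0  # always requires all four decisions
    false_alarm = 2.0 * share_episodic + 4.0 * (1.0 - share_episodic)
    return {"correct_rejection": correct_rejection, "false_alarm": false_alarm}


# Alternative 1: equal decision counts for both response types.
print(expected_decisions_new_sf(0.0))
# Alternative 2: false alarms need fewer decisions on average.
print(expected_decisions_new_sf(0.5))
```

With equal time per decision, any positive value of `share_episodic` predicts faster false alarms than correct rejections, which is the pattern the second alternative requires.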

[Figure 2: line plot. x-axis: number of versions read (1 to 4); y-axis: mean false alarm rate.]

Fig. 2. Mean false alarm rates for the new same-function sentences of the four scripts as a function of the number of versions read

Results

Error rates

The overall error rate was 15%. Error rates for old and new SF sentences were 20% and 19%, and error rates for old and new DF sentences were 19% and 2%, respectively. These differences were tested by chi-square with adjusted error probability p' = 14p. Significant differences exist between sentence and probe types (χ²(3) = 108.85, p' < 0.014) and among the four scripts (χ²(3) = 17.36, p' < 0.014). The last difference is based solely on differences among new SF sentences (χ²(3) = 42.12, p' < 0.014, see Figure 2). Old SF and old DF sentences do not differ over scripts (χ²(3) = 2.94 and 2.82, respectively, p' > 0.10). New DF sentences had error rates too small for testing. The factors 'subject group' and 'number of versions' had no effect on error rates (χ²(3) = 8.58 and 1.01, respectively, p' > 0.10).

Nonetheless, false alarm rates for new SF sentences slightly increase with the number of versions read: 17% with one or two versions, 20% with three, and 22% with four versions (χ²(3) = 2.08, p' > 0.10). In accordance with our hypothesis, there is no effect of the number of versions on the error rates for old SF and old DF sentences (χ²(3) = 5.05 and 1.17, respectively, p' > 0.10). For new SF sentences, the error rates for the individual scripts show various trends (cf. Figure 2), but there are no statistically significant effects of the number of versions factor (χ²(3) = 4.27, 0.69, 1.01, and 1.17, respectively, for the four scripts, p' > 0.10).

Confidence ratings

The subjects made rather different use of the four rating levels. This forced us to leave out an analysis of [...] ratings for new SF sentences do increase as a function of the number of versions read in accordance with the observation of Bower et al. (1979). To this end, we adjoined the two four-level rating scales for 'yes' and 'no' responses to construct an eight-level scale ranging from 1 = "very sure new" to 8 = "very sure old". The mean ratings for new SF sentences then are 2.48, 2.48, 2.74, and 2.75 for 1, 2, 3, and 4 versions, respectively. Compared to Bower et al.'s result, this reduction in confidence (that it is a new sentence) as a function of the number of versions read is much smaller. Things change slightly if the 'attend a cultural event' script is excluded from the analysis: The mean ratings then are 2.60, 2.73, 2.90, and 3.11. This suggests that the stories that made up the four versions of the 'cultural event' script were too different to be considered as belonging to the same underlying script (see Table 1).

The eight-level ratings were used to determine nonparametric ROC curves for each subject under each condition. The area under the ROC curve, P(A), is a measure of discriminability (old vs. new). In order to prepare the data for an analysis of variance, we used an arcsine transformation (cf. McNicol, 1972): $R' = 2 \arcsin\sqrt{P(A)}$. The transformed data were subjected to a five-way analysis of variance (an extended version of Plan 12, Winer, 1971). All factors were assumed to be fixed except the subject factor (see the discussion in Wike and Church, 1976, and Kieras, 1981).

Discriminability for SF sentences was worse than for DF sentences, F(1,28) = 27.80, p < 0.001. There was a significant 'sentence type' by 'script' interaction, F(3,84) = 8.89, p < 0.001, the script factor alone not being significant, F(3,84) = 2.13, p > 0.10. Next, we split the data into SF and DF sentences and performed a separate four-way analysis of variance for each subset. For SF sentences, the factor 'scripts' and its interaction with 'number of versions read' were significant, F(3,84) = 18.80, p < 0.001, and F(6,84) = 12.90, p < 0.001, respectively. The factor 'number of versions read', however, was not significant, F(3,84) = 2.60, p < 0.08. For DF sentences, there were no significant effects.
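The discriminability measure described above can be sketched as follows. This is a minimal illustration that assumes the area P(A) is obtained by the trapezoidal rule over the ROC points defined by the eight rating levels; the function names and the example ratings are hypothetical and are not data from the experiment.

```python
import math


def roc_area(old_ratings, new_ratings, levels=8):
    """Nonparametric area under the ROC curve, P(A), by the trapezoidal rule.

    Ratings run from 1 ("very sure new") to `levels` ("very sure old").
    Each rating level serves as a response criterion.
    """
    points = [(0.0, 0.0)]
    for criterion in range(levels, 0, -1):
        hit_rate = sum(r >= criterion for r in old_ratings) / len(old_ratings)
        fa_rate = sum(r >= criterion for r in new_ratings) / len(new_ratings)
        points.append((fa_rate, hit_rate))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def arcsine_transform(p_a):
    """R' = 2 * arcsin(sqrt(P(A))), with P(A) clamped to [0, 1]."""
    p_a = min(1.0, max(0.0, p_a))
    return 2.0 * math.asin(math.sqrt(p_a))


# Made-up ratings for one subject in one condition.
old = [8, 7, 7, 6, 3, 8]
new = [1, 2, 6, 1, 3, 2]
p_a = roc_area(old, new)
print(round(p_a, 3), round(arcsine_transform(p_a), 3))
```

Under this transformation, chance performance (P(A) = 0.5) maps to R' = π/2 and perfect discrimination (P(A) = 1) maps to R' = π.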

Reaction times

With the error rate being relatively high (15% on average) and significantly different across conditions, it did not seem feasible to treat the incorrect responses as missing data and replace them by estimates. Instead, we averaged RTs for correct responses per subject and condition over all presented sentences. Since we had not advanced any hypothesis for the 'script' factor, this factor was not taken into consideration. Moreover, we cut off 0.9% of the data lying outside the range of four standard deviations per cell. Then, a three-way analysis of variance with repeated measurements over all factors (old/new, SF/DF, number of versions) was performed with the remaining data.

a) For correct RTs we found a significant SF/DF factor, F(1,31) = 28.81, p < 0.001, and a significant interaction of this factor with the old/new factor, F(1,31) = 31.26, p' < 0.001. Responses to DF sentences were faster than to SF sentences (2135 vs. 2396 ms), responses to new DF sentences were faster than to old ones (1990 vs. 2279 ms), while responses to old SF sentences were faster than to new ones (2304 vs. 2488 ms). All these differences are in the expected direction; however, except for the maximum difference (1990 vs. 2488 ms), they do not reach significance. To summarize, there are no important deviations from the predictions.

b) Since RTs for incorrect responses were too scarce to be included in the analysis of variance, we decided to test our predictions by t-tests with adjusted alpha-error (p' = 2p). However, observed mean RTs of incorrect answers with old and new SF sentences (3143 and 3082 ms, respectively) were not significantly different (t(28) = 0.80, p' > 0.10). Consequently, we could not decide between the above two encoding alternatives for episodic memory on the basis of incorrect responses.

c) Comparing mean RTs for correct and incorrect responses with new SF sentences, however, allows for excluding the second encoding alternative: Mean RTs for correct responses are shorter than for incorrect responses (2467 and 3082 ms, respectively; t(30) = 4.08, p' < 0.002).

d) Mean RTs for correct responses should be the same for all numbers of versions read. This is true for all sentence and probe types (see Table 2). The factor 'number of versions' was not significant in the analysis of variance, nor was any of the interactions of this factor with the other factors, F < 1 in all cases. RTs for the four different scripts did not differ essentially, either.
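The adjusted error probabilities p' used throughout the Results (p' = 14p for the chi-square tests, p' = 2p for the t-tests) multiply each nominal p value by the number of tests in the family. A minimal sketch, assuming this simple multiplicative adjustment capped at 1; the function name `adjusted_p` is hypothetical:

```python
def adjusted_p(p, n_tests):
    """Adjusted error probability p' = n_tests * p, capped at 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return min(1.0, n_tests * p)


# A nominal p of 0.001 corresponds to p' = 0.014 with 14 tests
# and to p' = 0.002 with 2 tests.
print(adjusted_p(0.001, 14), adjusted_p(0.001, 2))
```

This makes the reported thresholds easy to read: p' < 0.014 for the chi-square tests and p' < 0.002 for the t-test both correspond to a nominal p < 0.001.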

Table 2. Mean reaction times (in ms) of correct responses as a function of the number of versions read

                     Number of versions read
Sentence             1       2       3       4
old    SF         2315    2245    2338    2265
old    DF         2340    2237    2337    2324
new    SF         2540    2512    2432    2414
new    DF         1988    2022    2030    2006
Total mean        2282    2245    2268    2238

Note. SF = same-function sentences; DF = different-function sentences.


Discussion

Typicality

The error rate for new DF sentences, that is, sentences that were not typical for the script, was near zero, while it reached about 20% for the other sentence types. Thus, it seems that subjects have difficulty telling old typical sentences from new ones. Atypical sentences can be recognized much more easily. This is in accordance with the observations of Graesser (1978, 1981), Graesser, Gordon, & Sawyer (1979), and Graesser, Woll, Kowalski, & Smith (1980). We suspect this result to be in part due to the fact that the stories read by the subjects did not contain any atypical sentences. Thus, in the recognition phase every atypical sentence could be judged as being 'new' without searching through the memory structure of the corresponding story. This was not possible for the typical sentences. The observed RTs support this interpretation: Atypical sentences were recognized faster than typical ones. This is consistent with assuming a prior thematic decision process as proposed in our recognition model. This process assesses the likelihood of the sentence probe belonging to the script version that has just been activated by the headline.

Script organization

The script Veranstaltung (attend a cultural event) turned out to be the easiest (12% errors), and the script Waschen (washing) was the most difficult one (20% errors). This difference is most obvious with new SF sentences (see Figure 2). This suggests a difference in the way the stories are organized in memory: The different versions of the Veranstaltung script (see Table 1) may well have been stored in independent episodic memory blocks without a common 'meta' script, while the versions of the other scripts, being more homogeneous, are interrelated by a common script memory representation as proposed in the model. The Waschen script is different from the others in that it does not describe social interactions but simple, instrumental, well-learned manipulations of a rather boring nature. The higher error rates for these stories may simply be an attention effect. Schank and Abelson (1977) assume that in the case of these instrumental scripts only the headline is stored, forcing the subject to reconstruct the whole script afterwards. This, too, would cause the error rate to increase.

The analyses of variance of the discrimination data resulted in significant differences in the script factor only with SF sentences. This, too, supports the assumption that the degree of similarity among the four versions differed across the four scripts. Mean discrimination rates indicate most dissimilar versions for the Veranstaltung script (best discrimination: R' = 2.55) and most similar versions for the Waschen script (worst discrimination: R' = 1.98). Similarity here corresponds to the probability that a common script underlies processing of the different versions. Observed RTs are in the same direction; for new SF sentences, mean RTs for correct responses were 2280, 2423, 2554, and 2683 ms for the scripts Veranstaltung, Essen gehen, Verkehrsmittel, and Waschen, respectively.

These results demonstrate that the range of valid prediction by the above recognition model is rather limited. There is an upper limit of abstraction beyond which the existence of hierarchically structured script processing is doubtful, as is the case for the Veranstaltung script. This corresponds to Schank's scepticism concerning the existence of an abstract 'Visit-Health-Professional' script. To be more precise, the human information processor may well be able to construct these abstract structures. However, they do not seem to be essential in the process of comprehending stories about well-known everyday activities.

Number of versions read

Discrimination performance with SF sentences did not vary significantly with the 'number of versions' factor (though p < 0.08). Moreover, performance did not decrease as a function of the number of versions as observed by Bower, Black, & Turner (1979). On the other hand, inspecting the false alarm rates with new SF sentences shows an increase with three of the four scripts used (see Figure 2). Unfortunately, these differences do not reach significance. One explanation may be that our material was more abstract and contained more dissimilar versions than that used by Bower, Black, & Turner. Another could be our way of gathering the data. Before the rating task, our subjects had to make a yes/no decision. It turned out that the rating was not very sensitive, that is, most decisions were judged 'very sure'. This may be the reason why there is no clear trend of an effect of the number of versions in the rating with new SF sentences. Mean RTs for correct answers do not depend on the number of versions, either (Table 2). This, at least, is consistent with the view that the script versions were stored in independent episodic memory blocks as suggested by the above recognition model.

Inferences

Old SF sentences were judged to be 'new' in 20% of the cases, while 19% of the new SF sentences were judged to be 'old'; that is, the number of SF sentences that were forgotten is about the same as the number of SF sentences that were erroneously 'recognized'. The predictions of the recognition model for mean RTs of these incorrect responses depend on the processing assumptions: If it is assumed that in the initial encoding phase inferences are directly (and erroneously) copied into episodic memory, then false alarms with new SF sentences that relate to these inferences should be faster than correct rejections. The point here is that there is a reasonable chance that false alarms are made already in the second phase of the recognition process, while correct rejection may have to wait until the fourth phase is reached (see Figure 1). Observed RTs do not support this assumption, however. False alarms took much longer than correct rejections, t(30) = 4.08, p' < 0.002. This result is contrary to Graesser's version of the model. In a later version of this model, Graesser and Nakamura (1982) specify that only the part of the script which is especially important for comprehension is copied into episodic memory. However, this does not modify the predicted relationship between the RTs in our study since all SF sentences were very typical for the respective version.

To summarize, there are two main conclusions to be drawn from this study.

a) Observed RTs were consistent with the recognition model outlined above; it should thus be kept as a working hypothesis. Moreover, the data supported the assumption of Bower, Black, & Turner (1979) that inferences needed for processing are only activated in the abstract script memory and not also copied into episodic memory, as Graesser claimed.

b) Contrary to Bower et al.'s observation, we found no reliable influence of the number of versions read on recognition performance. Aside from methodological differences, this seems to be essentially due to the text material used in our study. Our script versions, which were more heterogeneous, rather correspond to Bower et al.'s meta-scripts. Moreover, there were considerable differences among our four scripts.

In sum, our results should not be seen as definitive. Rather, this is a preliminary study that suggests the need for further investigation. Specifically, heterogeneity of script versions could be introduced as an independent variable. Our prediction, then, would be that the effect of the number of versions read is proportional to the homogeneity of the script versions.

Acknowledgement. This research was supported in part by a grant from the Deutsche Forschungsgemeinschaft to Karl F. Wender. We gratefully acknowledge the assistance of Ilka Großmann and Edmund Eberleh in the data collection and analysis. We also thank the members of the Braunschweig Cognitive Research Group for their helpful suggestions and criticism.

Appendix Example of a story used in the experiment

abe (DF) a-c (SF)

Frank fliegt mit dem Flugzeug (Frank takes an airplane) Als Frank am Flughafen ankara, (When Frank arrived at the airport,) sah er sich zunfichst nach dem richtigen Schalter urn,

abe (DF) abe (DF)

-be (SF) abe (DF) a-c (SF) abc (DF) a-c (SF) -be (SF) abe (DF) abe (DF)

-bc (SF) abe (DF)

(he first looked for the correct counter) wo er dann sein gebuchtes Ticket abholte. (to get the ticket that he had booked.) Frank iiberpriifte die Abflugzeit auf der Anzeigentafel. (Frank checked the time of departure on the time schedule. Sein Gep~ick wurde aufs FlieBband gestellt. (His luggage was put on an conveyer belt.) Dann passierte Frank die Zollkontrolle, (Then Frank passed the security check,) beobachtete startende und landende Flugzeuge (watched some planes starting and landing,) und begab sich schlieBlich zu seiner Maschine. (and finally boarded his jet.) Er ging durch die engen Sitzreihen zu seinem Platz. (He walked along the narrow aisles to his seat.) Dann leuchtete das Zeichen zum Anschnallen auf. (Then he was told to fasten his seat belt.) Das Flugzeug gewann schnell an H6he, (The plane quickly gained height) und Frank ffihlte sich zunfichst ein wenig schwindlig. (and Frank felt a bit dizzy at first.) SchlieBlich machte er sich zur Landung fertig. (Finally he got ready for the landing.) Nach der Landung ffihlte er sich wieder besser. (Back on earth he felt better again.)


Note. English translations are given in parentheses. The letters at the beginning of each line denote in which of the three forms the specific sentence was stated. DF = different-function sentence; SF = same-function sentence.

Generating rule for same-function sentences

In order to exclude false alarms to new same-function sentences caused by similar wording, SF sentences of each script differed in the main relation (verb) and at least two arguments (subject and object). Synonyms and paraphrases were not counted as different wording.
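The generating rule can be illustrated with a minimal Python sketch. It is not the procedure actually used to construct the materials. The `Proposition` representation, the `synonyms` table, the positional comparison of arguments, and the pairwise check across all SF sentences of a script are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Proposition:
    """Hypothetical sentence representation: a main relation plus its arguments."""
    verb: str
    arguments: tuple  # e.g. (subject, object)


def canonical(word, synonyms):
    """Map a word to its canonical form so that synonyms count as the same wording."""
    word = word.lower()
    return synonyms.get(word, word)


def sufficiently_different(p, q, synonyms=None, min_args=2):
    """True if p and q differ in the verb and in at least `min_args` arguments."""
    synonyms = synonyms or {}
    if canonical(p.verb, synonyms) == canonical(q.verb, synonyms):
        return False
    # Compare arguments slot by slot; a missing slot counts as a difference.
    n = max(len(p.arguments), len(q.arguments))
    pa = [canonical(a, synonyms) for a in p.arguments] + [None] * (n - len(p.arguments))
    qa = [canonical(a, synonyms) for a in q.arguments] + [None] * (n - len(q.arguments))
    differing = sum(a != b for a, b in zip(pa, qa))
    return differing >= min_args


def violations(sf_sentences, synonyms=None):
    """Return all pairs of SF sentences that fail the rule."""
    return [
        (p, q)
        for p, q in combinations(sf_sentences, 2)
        if not sufficiently_different(p, q, synonyms)
    ]
```

Under these assumptions, two sentences whose verbs are listed as synonyms are flagged as too similar even when their surface wording differs, which mirrors the statement that synonyms and paraphrases were not counted as different wording.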

References

Abelson, R.P. (1981). The psychological status of the script concept. American Psychologist, 36, 715-729.
Bellezza, F.S., & Bower, G.H. (1981). The representation and processing characteristics of scripts. Bulletin of the Psychonomic Society, 18, 1-4.
Bellezza, F.S., & Bower, G.H. (1982). Remembering script-based text. Poetics, 11, 1-23.
Bower, G.H., Black, J.B., & Turner, T.J. (1979). Scripts in memory for text. Cognitive Psychology, 11, 177-220.
Den Uyl, M., & Van Oostendorp, H. (1980). The use of scripts in text comprehension. Poetics, 9, 275-294.
Galambos, J.A., & Rips, L.J. (1982). Memory for routines. Journal of Verbal Learning and Verbal Behavior, 21, 260-281.
Graesser, A.C. (1978). How to catch a fish: The memory and representation of common procedures. Discourse Processes, 1, 72-89.
Graesser, A.C. (1981). Prose comprehension beyond the word. New York: Springer.
Graesser, A.C., Gordon, S.E., & Sawyer, J.D. (1979). Recognition memory for typical and atypical actions in scripted activities: Tests of a script pointer plus tag hypothesis. Journal of Verbal Learning and Verbal Behavior, 18, 319-332.
Graesser, A.C., & Nakamura, G.V. (1982). The impact of a schema on comprehension and memory. In G.H. Bower (Ed.), The psychology of learning and motivation (Vol. 16, pp. 59-109). New York: Academic Press.
Graesser, A.C., Woll, S.B., Kowalski, D.J., & Smith, D.A. (1980). Memory for typical and atypical actions in scripted activities. Journal of Experimental Psychology, 6, 503-515.
Haberlandt, K., & Bingham, G. (1984). The effect of input direction on the processing of script statements. Journal of Verbal Learning and Verbal Behavior, 23, 162-177.
Kieras, D.E. (1981). Component processes in the comprehension of simple prose. Journal of Verbal Learning and Verbal Behavior, 20, 1-23.
Mandler, J.M., & Murphy, C.M. (1983). Subjective judgements on script structure. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 534-543.
McNicol, D. (1972). A primer of signal detection theory. London: Allen & Unwin, Ltd.
Notenberg, G., & Shoben, E.J. (1980). Scripts as linear orders. Journal of Experimental Social Psychology, 16, 329-347.
Schank, R.C. (1981). Language and memory. In D.A. Norman (Ed.), Perspectives on cognitive science (pp. 105-146). Norwood, NJ: Ablex.
Schank, R.C. (1982). Dynamic memory. Cambridge University Press.
Schank, R.C., & Abelson, R.P. (1977). Scripts, plans, goals, and understanding. Hillsdale, NJ: Erlbaum.
Smith, D.A. (1981). What schema-relevant inferences are passed to the memory representation of text? Unpublished master's thesis, California State University, Fullerton, CA.
Walker, C.H., & Yekovich, F.R. (1984). Script based inferences: Effects of text and knowledge variables on recognition memory. Journal of Verbal Learning and Verbal Behavior, 23, 357-370.
Wike, E.L., & Church, J.D. (1976). Comments on Clark's "The language-as-fixed-effect fallacy". Journal of Verbal Learning and Verbal Behavior, 15, 249-255.
Winer, B.J. (1971). Statistical principles in experimental design (2nd ed.). Tokyo: McGraw-Hill.

Received August 30, 1984/October 3, 1984
