Running head: SOURCES OF RELATIVE CLAUSE PROCESSING DIFFICULTY 1
Sources of relative clause processing difficulty: Evidence from Russian
Iya K. Price and Jeffrey Witzel
Department of Linguistics and TESOL, University of Texas at Arlington, 701 Planetarium Place, (Mailbox 19559), Arlington, TX, 76019, USA
Corresponding Author: Iya K. Price
E-mail: [email protected]
Telephone number: (817) 272-3133
[to appear in the Journal of Memory and Language]
Abstract
This study investigates the sources of processing difficulty in complex sentences involving relative clauses (RCs). Self-paced reading and eye tracking were used to test the comprehension of Russian subject- and object-extracted RCs (SRCs and ORCs) that had the same word-order configuration, but different noun phrase (NP) types (full NPs vs. pronouns) in the embedded clause. In both SRCs and ORCs, this NP intervened between the modified noun and the RC verb. A corpus analysis and acceptability rating experiment indicated different frequency/preference profiles for this word order depending on RC type and embedded NP type. In line with these profiles, processing difficulty was revealed early in the embedded clause for less frequent/dispreferred constructions. Later in the embedded clause, the processing of the RC verb was comparable for both SRCs and ORCs when the same number of NP arguments was available for integration. While there were no indications of an ORC penalty at or after this verb, late-stage comprehension difficulty was found for full-NP ORCs, but not for their pronominal counterparts, suggesting that similarity-based interference in combination with ORC structure influences the overall comprehension of these sentences. Taken together, these findings support a hybrid model under which independent sources of processing difficulty affect different stages of RC comprehension.
Keywords: sentence processing, relative clauses, Russian, self-paced reading, eye tracking, corpus
Sources of relative clause processing difficulty: Evidence from Russian
During language comprehension, different types of information contribute to the interpretation of sentences. These include syntactic and semantic information as well as information about the frequency of occurrence of certain structures. One way to determine how these information types are used during real-time sentence comprehension is to investigate sentences that cause processing disruptions. Examples of such sentences are those that involve relative clauses (RCs) (1a-b). An RC is a subordinate clause that typically modifies a noun phrase (NP). Under standard syntactic analyses, RCs contain an extracted constituent that is linked with the modified NP (Heim & Kratzer, 1998). Example 1a shows a case in which the extracted constituent (__) is the subject, or a subject-extracted RC (SRC), while 1b provides an example of an object-extracted RC (ORC).
1.
a. The reporter [that __ attacked the senator] admitted the error.
b. The reporter [that the senator attacked __ ] admitted the error.
Generally, research has shown that ORCs are more difficult to comprehend than SRCs (e.g., Gibson, 1998, 2000; King & Just, 1991; Staub, 2010; Staub, Dillon, & Clifton, 2017; Traxler, Morris, & Seely, 2002). This is the case not only in English, but also in other languages, including Chinese (Lin & Bever, 2006; Vasishth, Chen, Li, & Guo, 2013, but see Hsiao & Gibson, 2003), Dutch (Mak, Vonk, & Schriefers, 2002), Hungarian (MacWhinney & Pleh, 1998), and Japanese (Miyamoto & Nakamura, 2003; Nakamura & Miyamoto, 2013). Understanding the nature of this disparity -- and more generally, what makes some RCs more difficult to process than others -- has the potential to shed light on fundamental properties of the language processing system, including how different sources of information contribute to online sentence comprehension.
A number of models of RC processing have been proposed. These accounts attribute processing costs in these sentences to different sources, including subject-object structural asymmetries (Clifton & Frazier, 1989; Hawkins, 1999; Lin & Bever, 2006; MacWhinney & Pleh, 1998; O’Grady, 1997; Townsend & Bever, 2001; Traxler et al., 2002; Traxler, Williams, Blozis, & Morris, 2005), syntactic expectations (Hale, 2001; Levy, 2008; MacDonald & Christiansen, 2002; Reali & Christiansen, 2007), memory costs (Gibson, 1998, 2000; Gordon, Hendrick, & Johnson, 2001; Gordon, Hendrick, & Johnson, 2004; Gordon, Hendrick, Johnson, & Lee, 2006; Gordon, Hendrick, & Levine, 2002; Johnson, Lowder, & Gordon, 2011; King & Just, 1991; Lewis & Vasishth, 2005; Van Dyke & McElree, 2006), or a combination of these factors (Levy, Fedorenko, & Gibson, 2013; Staub, 2010; Staub et al., 2017). One way to test among these models is to examine where difficulty occurs during the incremental processing of RC sentences (Gibson, 1998, 2000; Gordon et al., 2001; Grodner & Gibson, 2005; Levy et al., 2013; Staub, 2010; Staub et al., 2017). Indeed, these models often predict processing costs at different points in the clause. For instance, expectation-based models predict processing difficulty at the first indication that the RC involves a less frequent construction, with ORCs generally occurring less frequently than SRCs. Memory-based accounts, on the other hand, predict additional processing time when arguments are integrated at the RC verb, which often occurs across greater distance and over more potentially interfering material in ORCs compared to SRCs. However, in many languages, such as English, it is difficult to test among these competing accounts because of word-order differences between SRCs and ORCs (as in the examples above). These disparities make it difficult to compare the relevant regions of these clauses and to tease out the nature of observed processing differences. 
Russian offers a potential solution to this problem because it has a relatively flexible word order. This makes it possible for Russian SRCs and ORCs to have
the same lexical material in the RC in the same linear order (with case-marking distinguishing between the RC types), allowing for clearer comparisons of the processing of these clauses. The present study took advantage of this word-order flexibility to examine potential sources of processing difficulty in RC sentences. Specifically, using self-paced reading (SPR) and eye tracking, this study investigated the online processing of Russian SRC and ORC sentences in which an NP argument intervened between the modified noun and the RC verb in both sentence types. This created a configuration in which the same number of NP arguments was available for integration at the RC verb in both SRCs and ORCs. This design thus allowed for an examination into whether RC processing difficulty relates to integration -- in which case, there should be comparable memory costs for SRCs and ORCs when the number of integrated NPs is held constant -- or to structural asymmetries -- in which case, there should be particular processing difficulty for ORCs even under these conditions. Furthermore, the influence of syntactic expectations was investigated by using full NPs and pronouns in the embedded clause. As indicated by a corpus analysis and an offline acceptability rating experiment, these NP types are associated with very different word-order frequencies/preferences. This made it possible to investigate the role of expectation-based processing in these sentence types while again holding word order constant. In these ways, the present study attempted to assess different potential sources of processing difficulty in RC sentences and thus to test among competing models of the processing of these sentence types. The first models of interest are those that attribute asymmetries in RC processing to structural differences. 
For example, the incremental minimalist parser theory (Lin & Bever, 2006) posits that processing difficulty for ORCs is due to differences in extracting from subject and object positions. This account holds that SRCs are easier to comprehend due to the shorter
structural distance between the extracted constituent and its extraction site (see also Hawkins, 1999; O’Grady, 1997). Other accounts attribute this ORC penalty to a preference for analyzing the modified noun as the subject of the RC (Clifton & Frazier, 1989; Traxler et al., 2002, 2005), which results in the correct analysis for SRCs, but not for ORCs. Under such models, processing difficulty for ORCs relates to structural reanalysis. Another structure-based model is the perspective maintenance account (MacWhinney & Pleh, 1998), which explains processing difficulty for RCs in terms of shifts in the perspective of the subject. For example, in sentences with subject-modifying SRCs as in 1a, the main-clause subject (the reporter) is also the subject of the RC. In sentences with subject-modifying ORCs as in 1b, however, the perspective of the subject shifts because the main-clause subject is the object of the RC. Other models explain RC processing asymmetries in terms of word-order heuristics (Holmes & O’Regan, 1981; Townsend & Bever, 2001). Under these accounts, an initial parse provisionally assigns thematic roles by mapping the input onto canonical word-order templates. In the case of English, the ease of processing SRCs is attributed to the fact that they conform to the canonical SVO/agent-action-patient word order for this language. Similar to models that suggest an important role for word-order-based heuristics are those that attribute processing costs in RCs to the frequencies of constructions in the language and the experience of language users. For instance, expectation-based theories (Hale, 2001; Levy, 2008) predict processing difficulty when unexpected constructions are encountered. Generally, in full-NP RCs -- i.e., RCs in which the embedded NP is a full descriptive noun (e.g., the senator in 1a-b) -- ORCs are less frequent than SRCs (Gordon & Hendrick, 2005; Reali & Christiansen, 2007).
Under expectation-based models, processing difficulty for these ORCs is attributed to this frequency disparity. Support for such models comes from studies showing a switch in the SRC-
ORC processing asymmetry when ORCs occur more frequently than SRCs. In English, this is the case with pronominal RCs -- i.e., when the NP in the RC is a pronoun as in 2a-b (Reali & Christiansen, 2007; Roland, Dick, & Elman, 2007). 2.
a. The reporter [that __ attacked you] admitted the error.
b. The reporter [that you attacked __ ] admitted the error.
In line with this frequency difference, Reali and Christiansen (2007) showed that pronominal ORCs (2b) were easier to process than pronominal SRCs (2a). This was taken to indicate that expectation-based accounts make more accurate predictions for RC processing difficulty than structural asymmetry models, which cannot easily explain these findings. Another class of accounts attributes RC processing differences to working memory effects. Specifically, these accounts explain RC processing costs in terms of the encoding, storage, and structural integration of the NPs involved in the clause. For example, according to the dependency locality theory (DLT) (Gibson, 1998, 2000; Warren & Gibson, 2002), integration costs are generally higher in ORCs (1b) because an additional discourse referent (the senator) is introduced before the dependency between the modified noun (the reporter) and RC verb (attacked) can be resolved. Similarity-based interference accounts (Gordon et al., 2001, 2002, 2004, 2006), on the other hand, posit that RC comprehension is hindered when the sentence requires similar NPs to be encoded and held in working memory before they can be integrated with the verb, as is often the case in ORCs (Gordon et al., 2006). Comparably, cue-based retrieval accounts explain this interference in terms of the argument requirements of the RC verb (Lewis & Vasishth, 2005; Van Dyke & Lewis, 2003; Van Dyke & McElree, 2006). Under these accounts, the integration of an RC verb like attacked with its arguments involves setting retrieval
cues for NPs that can act as its agent and patient. RC processing difficulty arises when there is more than one candidate NP for these roles. It is important to note that the absence of processing disruptions for pronominal ORCs, which expectation-based models attribute to their relatively high frequency, receives different explanations under these memory-based accounts. Under the DLT, processing difficulty for pronominal ORCs like (2b) is reduced because the referent of the RC pronoun is easily accessible, making it easier to integrate over (Gibson, 2000; Warren & Gibson, 2002). Similarity-based interference accounts, however, posit that these pronominal ORCs are relatively easy to process due to the dissimilarity between the modified descriptive noun and the RC pronoun (Gordon et al., 2001). Along the same lines, cue-based retrieval accounts explain the reduced difficulty for pronominal ORCs in terms of the ease of distinguishing candidate NPs during retrieval (Van Dyke & McElree, 2006). While all of these models predict processing difficulty for similar RC types, it is important to note that they often make different predictions about the locus of these costs during incremental processing. Take for example the RC type that is predicted to cause processing disruptions under all of these models -- full-NP ORCs in English, as in 1b. Some structure-based accounts predict difficulty at the point of establishing the connection between the extracted constituent and its extraction site (e.g., Lin & Bever, 2006), which in these sentences is the RC verb (attacked); others predict difficulty at words that can trigger structural reanalysis, which in this case is the RC NP (the senator) (e.g., Clifton & Frazier, 1989; Traxler et al., 2002, 2005). Expectation-based theories are more uniform regarding the predicted locus of processing disruptions. 
Under these models, processing difficulty should occur when the unexpected RC construction is first encountered, which for full-NP ORCs in English is the RC NP (the senator).
These accounts also predict that the verb (attacked) in these ORCs should be processed faster than in comparable SRCs. This is because after seeing the RC subject (the senator), the comprehender should have a strong expectation for an RC verb (Hale, 2001; Levy, 2008). Most memory-based accounts, however, predict processing difficulty at this verb for the ORCs in question (e.g., Gibson, 1998, 2000; Gordon et al., 2001, 2002, 2004, 2006; Lewis & Vasishth, 2005; Van Dyke & McElree, 2006). Under these models, this RC verb is the point of retrieval and integration for the NPs involved in the clause. In addition, several of these models predict that if RC processing costs are partially encoding-based, difficulty should also be observed at the RC NP, i.e., at the second similar NP before the verb (Gordon & Lowder, 2012; see also Johnson et al., 2011; Lewis & Vasishth, 2005; Vasishth, 2011). Studies that have investigated the loci of these effects, however, have revealed processing difficulty at different points in these ORC sentences. Some have observed processing costs early in the embedded clause, at the RC NP (Forster, Guerrera, & Elliot, 2009; Gennari & MacDonald, 2008), while others have found that these costs are concentrated at and after the RC verb (Gordon et al., 2001; Grodner & Gibson, 2005; Johnson et al., 2011). Still other studies have found processing difficulty at both of these points. For example, Staub (2010) tested SRC and ORC sentences as in 1a and 1b in a set of eye-tracking experiments. The results revealed processing difficulty both at the NP beginning the ORC and at the integrating ORC verb. Moreover, these difficulties were characterized by qualitatively different reading patterns (for comparable results, see also Staub et al., 2017). These findings were taken to indicate that both expectation- and memory-based processes play key roles during the incremental processing of RC sentences.
Based in part on an SPR experiment examining Russian SRC and ORC sentences as in 3a-d, Levy et al. (2013) also argued that a complete theory of RC processing must integrate both expectation- and memory-based accounts. 3.
a. [SRC, canonical]
Slesar’, kotoryj udaril elektrika so vsego razmaha, ušel domoj s sinjakom pod glazom.
Repairman, who.NOM hit electrician.ACC with all strength, went home with bruise under eye.
‘The repairman, who hit the electrician with all his strength, went home with a bruise under his eye.’
b. [SRC, non-canonical]
Slesar’, kotoryj elektrika udaril so vsego razmaha, ušel domoj s sinjakom pod glazom.
Repairman, who.NOM electrician.ACC hit with all strength, went home with bruise under eye.
c. [ORC, canonical]
Slesar’, kotorogo elektrik udaril so vsego razmaha, ušel domoj s sinjakom pod glazom.
Repairman, whom.ACC electrician.NOM hit with all strength, went home with bruise under eye.
‘The repairman, whom the electrician hit with all his strength, went home with a bruise under his eye.’
d. [ORC, non-canonical]
Slesar’, kotorogo udaril elektrik so vsego razmaha, ušel domoj s sinjakom pod glazom.
Repairman, whom.ACC hit electrician.NOM with all strength, went home with bruise under eye.
As illustrated in these examples, Russian permits different word orders in the RC -- the canonical order in 3a and 3c as well as the non-canonical word order in 3b and 3d. This allowed for SRCs and ORCs with local and non-local integration of NPs with the RC verb. In sentences with local integration (3a and 3d), the modified noun (slesar’ 'repairman') was followed by the RC verb (udaril 'hit') and then by the second NP argument in the clause (elektrik(a) ‘electrician’). In sentences with non-local integration (3b and 3c), however, the second NP intervened between the modified noun and the RC verb. Interestingly, a corpus analysis indicated that these non-local word-order configurations occur less frequently than their local counterparts for both SRCs and ORCs. In the SPR experiment, inflated reading times (RTs) were revealed at both the intervening NP and the immediately following RC verb in non-local SRC and ORC sentences, with no difference between these RC types. This pattern of results was interpreted as inconsistent with structural asymmetry models of RC processing difficulty. Rather, the results at the intervening NP were taken to indicate expectation-based processing difficulty for RCs with dispreferred word orders, while the findings at the verb were taken to index memory-based integration that was comparable for SRCs and ORCs when integration distance was held constant. Although these results seem to offer intriguing support for the combined influence of expectation- and memory-based effects in RC processing, questions remain about the nature of the processing costs at these regions and whether they can be attributed to independent sources.
Specifically, because the non-local SRCs and ORCs also involved dispreferred word orders, it is unclear whether the effect at the RC verb reflects memory-based integration processes or the spillover of expectation-based effects from the immediately preceding NP region. Alternatively, the effects at both of these regions might be explained with reference to working memory processes. That is, the effect at the intervening NP might be attributed to encoding effort when the clause requires two NPs to be held in memory before integration at the RC verb, while the effect at this verb might be due to retrieval and integration of these NPs. In light of the conflicting accounts of RC processing difficulty discussed above, the current study builds on and extends research into Russian RCs to determine the loci and sources of comprehension difficulty in these complex sentences. Of particular interest were the reading patterns on Russian SRC and ORC sentences in which an NP argument intervened between the modified noun and the RC verb. In Experiments 1 and 3, this intervening argument was a full NP, while in Experiment 2, it was a pronoun (see Table 1). These manipulations had several important implications. First, by holding the word-order configuration constant, it was possible to decouple RC type from the number of NP arguments available for integration at the RC verb. This allowed for a clear examination into whether RC processing difficulty relates to SRC/ORC structural asymmetries or integration costs. Furthermore, using different NP types in the embedded clause permitted an investigation into syntactic expectations without comparing across sentences with different word-order configurations. Indeed, a corpus analysis and acceptability rating experiment (see below) indicated that these NP types are associated with very different word-order frequencies/preferences in Russian SRCs and ORCs.
It is also important to emphasize several methodological features that were included in order to clarify the nature of processing costs in Russian RCs. First, processing difficulty in SRCs and ORCs was assessed by comparing these sentences with each other, as well as against matched complement clause (CC) sentences. These CC sentences provide a crucial baseline because they are associated with different patterns of word-order preferences (again, see below) and, perhaps more importantly, because they do not involve extraction out of the embedded clause. Moreover, in order to minimize the potential influence of spillover effects on adjacent regions of interest -- and thus to provide clearer indications of incremental processing differences -- the items included buffer material between critical regions. With respect to online processing measures, Experiments 1 and 2 used SPR, while Experiment 3 used eye tracking. This is important (a) because eye tracking is arguably more sensitive to incremental processing difficulty than SPR (Rayner, 1998; Rayner, Sereno, Morris, Schmauder, & Clifton, 1989; Witzel, Witzel, & Forster, 2012), (b) because it has the potential to reveal qualitative differences in the reading patterns for the sentences of interest (Staub, 2010; Staub et al., 2017), and (c) because unlike SPR, it allows for sentences to be read in their entirety and thus does not introduce extraneous working memory demands. Finally, in addition to these online measures, questions related to the embedded-clause material were included in order to evaluate the influence of the experimental manipulations on overall comprehension.
Corpus Analysis
In order to make predictions for the reading experiments in terms of expectations, the frequencies of canonical and non-canonical word orders in Russian RCs and CCs were analyzed.
A previous corpus analysis by Levy and colleagues (2013) found that in Russian SRCs with full NPs, the canonical VO word order was more frequent than the non-canonical OV order (VO: 147
vs. OV: 4). However, for comparable ORCs, the non-canonical VS order was more frequent than the canonical SV order (SV: 29 vs. VS: 41). Levy et al.’s findings also suggested that word-order preferences in these RCs, and particularly in ORCs, depended on the type of embedded-clause NP. Specifically, when all ORCs were considered, including those with full NPs and pronouns, the canonical SV word order was more frequent. That is, the higher frequency of occurrence for the non-canonical VS order emerged only when ORCs with full NPs were analyzed separately. The present study extends Levy et al.’s (2013) analysis by examining word orders in Russian SRCs/ORCs with full NPs and pronouns using a larger sample of sentences (928 compared to their 279 sentences with case-marked relativizers). Furthermore, because previous studies have indicated that animacy influences RC processing (Mak et al., 2002; Traxler et al., 2005), frequencies related to the animacy of the RC head and RC NP were also examined. In addition, the pronoun types in pronominal RCs were analyzed to assess the frequency of the first-person pronouns used in Experiment 2. Finally, Russian CCs were analyzed to examine whether word-order preferences in RCs apply to other embedded clause types.
Method
The data for these analyses were accessed from the Russian National Corpus (http://ruscorpora.ru/en/). For RCs, a Key Word in Context search with the relativizer kotor* (‘which’) identified 4,340 sentences in the top results from the main corpus. Sentences were excluded (a) if they did not include an SRC or ORC, (b) if the relativizer did not appear clause-initially, or (c) if there was no overt subject or direct object in the RC. Sentences with a pronoun or a proper name as the RC head or with a proper name in the RC were also excluded so that the sentences under investigation would resemble the items in the reading experiments.
This resulted in a sample of 928 SRCs/ORCs with full NP heads and complete structures in the RC -- that is, a
relativizer followed by a verb and direct object in SRCs, and by a subject and verb in ORCs. These RCs were then classified according to RC type (SRC, ORC), RC NP type (full NP, pronoun), RC word order (canonical: VO/SV, non-canonical: OV/VS), and RC head/RC NP animacy (animate, inanimate). For CCs, a lexico-grammatical search for animate nouns followed by any of the main-clause verbs used in the reading experiments (e.g., skazal* ‘said’), a comma, and čto ‘that’ identified 2,079 sentences in the top results from the main corpus. Sentences were excluded (a) if čto was used as the wh-word ‘what’ and not as a complementizer, or (b) if there was no overt subject or direct object in the CC. This resulted in a sample of 354 CCs with the complementizer čto followed by a subject, verb, and direct object. These CCs were then classified according to the NP type in subject and object positions (full NP, pronoun) and CC word order (canonical: SVO, non-canonical: OVS, OSV, SOV, VOS).
Results and Discussion
RC frequencies. In the RC sentences, SRCs were more frequent than ORCs (657 vs. 271; χ² = 160.56, df = 1, p < .001). It is important to note that all RC verbs in the sample were transitive, so this result was not skewed by the fact that SRCs can be formed with both transitive and intransitive verbs, while ORCs can only be formed with transitive verbs (for more on this criticism of previous corpus analyses, see Gordon & Lowder, 2012). The pattern for sentences with full NPs in the RC was the same as for the complete set of RC sentences. That is, full-NP SRCs were more frequent than full-NP ORCs (589 vs. 149; χ² = 262.33, df = 1, p < .001). This pattern was reversed, however, when there was a pronoun in the RC -- pronominal SRCs were less frequent than pronominal ORCs (68 vs. 122; χ² = 15.35, df = 1, p < .001). This result corresponds to findings from corpus analyses in English, and might be attributed to the fact that
NPs in ORCs often refer to information that is given in context (Fox & Thompson, 1990; Gordon & Hendrick, 2005; Gordon & Lowder, 2012; Reali & Christiansen, 2007).
Frequencies of RC word orders. Table 2 shows the frequencies for SRCs and ORCs with different NP types and word orders. Out of the 738 full-NP RCs, SRCs with the canonical VO word order occurred most frequently (581), while SRCs with the non-canonical OV order were the least frequent, occurring only 8 times (χ² = 557.43, df = 1, p < .001). Full-NP ORCs appeared relatively frequently with both the canonical SV (58) and non-canonical VS (91) word orders, but were more frequent with the non-canonical order (χ² = 7.31, df = 1, p < .01). Thus, there was a strong preference for the canonical word order in full-NP SRCs, but a preference in the opposite direction -- for the non-canonical order -- in full-NP ORCs. According to a chi-square test of independence, this difference in the word-order frequencies for full-NP SRCs and ORCs was highly significant (χ² = 359.96, df = 1, p < .001). Although a complete account for this pattern is beyond the scope of this paper, one possible explanation relates to information structure (Bailyn, 2012; Kovtunova, 1976; Krylova & Khavronina, 1988; Svedova, 1980). Under such an account, since the embedded full NP usually expresses new information, it is natural for it to occur toward the right edge of the clause, even if this results in a non-canonical word order in ORCs (see Levy et al., 2013, for a comparable account). Pronominal RCs showed essentially the opposite pattern of word-order frequencies. Out of the 190 pronominal RC sentences, ORCs with the canonical SV word order occurred most frequently (116), while ORCs with the non-canonical VS order were the least frequent, appearing in only 6 instances (χ² = 99.18, df = 1, p < .001). SRCs with the canonical VO and non-canonical OV word orders had very similar frequencies (35 and 33, respectively; χ² = 0.06, df = 1, p = .81).
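The pairwise frequency comparisons reported up to this point can be checked with a short script. The sketch below assumes that each comparison is a chi-square goodness-of-fit test of two counts against an equal split, which the text does not state explicitly but which matches the reported values. The function name and the labels are illustrative, not taken from the original analysis.

```python
import math
from typing import Tuple


def gof_two_counts(count_a: int, count_b: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test of two counts against a 50/50 split.

    Assumption: the null hypothesis is that both categories are equally
    frequent. Returns (chi2, p) with df = 1.
    """
    expected = (count_a + count_b) / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For df = 1, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts as reported in the corpus results; labels are illustrative.
comparisons = {
    "all RCs, SRC vs. ORC": (657, 271),
    "full-NP RCs, SRC vs. ORC": (589, 149),
    "pronominal RCs, SRC vs. ORC": (68, 122),
    "full-NP SRCs, VO vs. OV": (581, 8),
    "full-NP ORCs, SV vs. VS": (58, 91),
    "pronominal ORCs, SV vs. VS": (116, 6),
    "pronominal SRCs, VO vs. OV": (35, 33),
}

if __name__ == "__main__":
    for label, (a, b) in comparisons.items():
        chi2, p = gof_two_counts(a, b)
        print(f"{label}: chi2 = {chi2:.2f}, df = 1, p = {p:.3g}")
```

With two categories and an equal-split null, the statistic reduces to (a - b)² / (a + b), so each reported value can also be verified by hand.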
Thus, there was a strong preference for the canonical word order in pronominal
ORCs, but no preference for either order in pronominal SRCs. According to a chi-square test of independence, this difference in word-order frequencies for pronominal SRCs and ORCs was highly significant (χ² = 48.27, df = 1, p < .001). The tendency for pronominal RC subjects and objects to appear before the verb could be explained in several ways. For instance, with reference again to information structure, since pronouns do not convey new information, they are less likely to appear toward the right edge of the clause. Moreover, stress patterns in Russian prevent personal pronouns from appearing clause-finally, as they cannot bear primary word stress unless focused (Rappaport, 1988).
<Insert Table 2 around here>
Animacy of the RC head and RC NP. The sentences of interest in the reading experiments reported below had animate RC heads and RC NPs. Among the full-NP RCs in the corpus, there were 17 SRCs and 12 ORCs with both animate RC heads and animate RC NPs; for pronominal RCs, the counts were 16 for SRCs and 12 for ORCs. Although this animacy combination does not appear frequently, these numbers indicate that it is not especially marked for any of the RC types under investigation (see Mak et al., 2002, for discussion of this issue).
Pronoun types in the pronominal RCs. Out of the 190 sentences with pronominal RCs, first- and third-person pronouns occurred most frequently in both ORCs and SRCs (ORCs: 122 total, 52 first-person, 51 third-person; SRCs: 68 total, 16 first-person, 27 third-person). First-person pronouns were used in the reading experiments reported below.
Frequencies of CC word orders. Table 3 presents the frequencies for CCs with different NP types and word orders. CCs with the canonical SVO word order were generally more frequent than those with non-canonical word orders (280 vs. 74; χ² = 119.88, df = 1, p < .001).
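The two chi-square tests of independence reported above for RC word order (full-NP RCs and pronominal RCs) can be checked in the same spirit. The sketch below treats each as a 2x2 table of RC type by word order and applies Yates' continuity correction. The correction is an assumption: the text does not mention it, but the corrected statistic matches the reported values. The function name is illustrative.

```python
import math
from typing import Tuple


def yates_chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Assumption: Yates' continuity correction is applied. All row and
    column totals must be non-zero. Returns (chi2, p) with df = 1.
    """
    n = a + b + c + d
    # Continuity-corrected cross-product difference, floored at zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: SRC, ORC. Columns: canonical, non-canonical word order.
# Counts as reported in the corpus results.
full_np = (581, 8, 58, 91)
pronominal = (35, 33, 116, 6)

if __name__ == "__main__":
    for label, table in (("full-NP RCs", full_np), ("pronominal RCs", pronominal)):
        chi2, p = yates_chi2_2x2(*table)
        print(f"{label}: chi2 = {chi2:.2f}, df = 1, p = {p:.3g}")
```

Without the continuity correction, the full-NP statistic comes out somewhat higher than the reported 359.96, which is why the corrected form is used here.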
This was the case regardless of the NP types that appeared in this clause. Specifically, with respect to the word orders of particular interest in the experiments below, the canonical SVO order occurred more frequently than the non-canonical OVS order in CCs with full-NP subjects and objects (161 vs. 15; χ² = 121.11, df = 1, p < .001), in CCs with full-NP subjects and pronominal objects (26 vs. 10; χ² = 7.11, df = 1, p < .01), in CCs with pronominal subjects and full-NP objects (78 vs. 0; χ² = 78.00, df = 1, p < .001), as well as in CCs with pronominal subjects and objects (15 vs. 0; χ² = 15.00, df = 1, p < .001). In sum, the complete set of corpus analyses indicates different word-order preferences in Russian SRCs and ORCs depending on the type of NP in the embedded clause. For full-NP SRCs, the canonical VO order is preferred over the non-canonical OV order; however, for full-NP ORCs, the non-canonical VS order is preferred over the canonical SV order. Pronominal RCs reveal a very different pattern. In contrast to the strong preference for the canonical order in full-NP SRCs, there is no clear word-order preference in pronominal SRCs. This suggests that the non-canonical OV word order in SRCs is most likely to occur in pronominal versions of these clauses. For pronominal ORCs, the canonical SV order is preferred over the non-canonical VS order, which is the opposite of the pattern for their full-NP counterparts. These analyses also indicate that word-order preferences in Russian CCs are not similarly influenced by embedded-clause NP types. Regardless of whether these clauses have full-NP or pronominal arguments, the canonical SVO order is preferred over non-canonical orders.
Acceptability Rating Experiment
An acceptability rating experiment was also conducted to examine Russian native speakers’ word-order preferences in RCs and CCs.
The items included sentences with embedded full NPs, which were the focus of Experiments 1 and 3, as well as sentences with embedded first-person pronouns, which were examined in Experiment 2. Items with embedded third-person pronouns were also tested to determine whether similar word-order preferences apply across embedded clauses with different pronoun types.
Method
Participants. Fifty-four adult native speakers of Russian participated in the experiment online for monetary compensation. On language background screening questions, all participants indicated that Russian was their native and dominant language.
Materials and design. The item sets from the reading experiments provided the basis for the test sentences (see Table 1). Specifically, there were 48 items that appeared in conditions defined by (a) embedded clause type (RC, CC), (b) embedded NP type (full NP, first-person pronoun, third-person pronoun), (c) sentence type (SRC, ORC, which for CCs meant that the embedded-clause word order up to the verb was the same as in the corresponding RC), and (d) word order (canonical: VO/SV for RCs, SVO for CCs; non-canonical: OV/VS for RCs, OVS for CCs). These items were based on simplified versions of the sentences in the reading experiments, in which adverbials, prepositional phrases (PPs), and instrumental-case NPs were removed wherever possible. Sentences with third-person pronouns were presented with a context sentence that provided the pronoun’s antecedent. Eight counterbalanced lists were created, in which each item appeared three times, but each time with a different embedded NP type and a different combination of the other three factors listed above. Thirty-two filler sentences were also created -- 16 ungrammatical sentences (e.g., Prepodavatel’ proverjat’ ekzamenacionnyje raboty. ‘*The instructor to grade exams.’) and 16 grammatical sentences with canonical word order (e.g., Frontoviki polučili materialnuju pomošč. ‘The veterans received welfare.’).
Procedure. The experiment was run using the web-based implementation of the DMDX software package (Forster & Forster, 2003; Witzel, Cornelius, Witzel, Forster, & Forster, 2013), making it possible to participate in the experiment over the Internet. Language background screening questions and written instructions were provided in Russian at the beginning of the experiment. In the rating task, each target sentence was presented on a single line, in the center of the computer screen. Context sentences for items with third-person pronouns were presented above the target sentence and were marked as контекст ‘context’. For these items, participants were instructed to read both the context and target sentences, but to rate only the target. The rating scale -- 1 (completely unacceptable) - 2 (not fully acceptable) - 3 (somewhat acceptable) - 4 (acceptable) - 5 (completely acceptable) -- was presented in Russian just below the target sentence. Participants rated the sentences using the 1-5 keys on the keyboard, and there was a 30-second timeout for each trial. The task began with four practice items. Items were then presented in a different random order for each participant in sets of 22, with a short break after each set.
Data analysis. Each dataset in the analysis met three inclusion criteria. First, as an indication of reliability in the ratings, the participant’s mean scores for duplicate conditions could differ by no more than 1 point. These duplicate conditions were two sets of structurally identical CCs with full NPs that differed only in terms of the lexical material in the subject and object positions. The other two criteria related to ratings on the filler items. If the participant’s mean rating was less than 4 for the grammatical fillers or greater than 2 for the ungrammatical fillers, the dataset was excluded. The datasets from six participants did not meet one or more of these inclusion criteria.
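The three inclusion criteria described above can be expressed as a simple check on each participant's ratings. The following is a minimal sketch rather than the analysis script used in the study. The function name and argument layout are hypothetical, and the sketch assumes that each participant's 1-5 ratings are available as separate lists for the two duplicate CC conditions and for the two filler types.

```python
from statistics import mean


def meets_inclusion_criteria(duplicate_a, duplicate_b,
                             grammatical_fillers, ungrammatical_fillers):
    """Return True if one participant's dataset passes all three criteria.

    Each argument is a list of that participant's 1-5 ratings
    (hypothetical data layout, assumed for illustration).
    """
    # Criterion 1: mean ratings for the duplicate conditions
    # differ by no more than 1 point.
    reliable = abs(mean(duplicate_a) - mean(duplicate_b)) <= 1
    # Criterion 2: grammatical fillers have a mean rating of at least 4.
    accepts_grammatical = mean(grammatical_fillers) >= 4
    # Criterion 3: ungrammatical fillers have a mean rating of at most 2.
    rejects_ungrammatical = mean(ungrammatical_fillers) <= 2
    return reliable and accepts_grammatical and rejects_ungrammatical


# Made-up ratings for one participant, for illustration only.
print(meets_inclusion_criteria([5, 4, 5], [4, 4, 5], [5, 5, 4], [1, 2, 1]))
```

A dataset failing any one of the three checks would be excluded, in line with the description above.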
Trials on which the participant reached the 30-second timeout were also discarded (0.91% of the trials). The data for RCs and CCs were analyzed separately. The main analyses consisted of ANOVAs over mean ratings by subjects (F1) and items (F2) with
embedded NP type (full NP, first-person pronoun, third-person pronoun), sentence type (SRC, ORC), and word order (canonical, non-canonical) as repeated measures and list/item group as a grouping factor.
Results and Discussion
RC results. The mean ratings for the RC sentences are presented in Figure 1. The main analysis revealed a significant three-way interaction of embedded NP type, sentence type, and word order (F1 (2, 80) = 123.92, p < .001; F2 (2, 80) = 188.89, p < .001), indicating differences in the word-order preferences for SRCs and ORCs depending on the type of NP in the RC. In order to shed light on the nature of this interaction, separate analyses were conducted for RC sentences with each embedded NP type.
< Insert Figure 1 around here >
For full-NP RCs, there were significant effects of sentence type (F1 (1, 40) = 26.66, p < .001; F2 (1, 40) = 51.71, p < .001) and word order (F1 (1, 40) = 17.23, p < .001; F2 (1, 40) = 27.26, p < .001). More importantly, the interaction of these factors was also significant (F1 (1, 40) = 123.51, p < .001; F2 (1, 40) = 285.23, p < .001), indicating different word-order preferences in SRCs and ORCs. Pairwise comparisons showed that for SRCs, the canonical VO order was preferred (F1 (1, 40) = 96.55, p < .001; F2 (1, 40) = 290.94, p < .001). However, the opposite preference was shown for ORCs, with the non-canonical VS order rated higher than the canonical SV order (F1 (1, 40) = 59.18, p < .001; F2 (1, 40) = 65.41, p < .001).
Pronominal RCs revealed a different pattern. For RCs with first-person pronouns, there were significant effects of sentence type (F1 (1, 40) = 32.52, p < .001; F2 (1, 40) = 29.81, p < .001) and word order (F1 (1, 40) = 83.60, p < .001; F2 (1, 40) = 172.84, p < .001) as well as a significant interaction of these factors (F1 (1, 40) = 46.89, p < .001; F2 (1, 40) = 67.78, p
22, p's < .001). The complex interaction reported above appears to be due to a particular dispreference for the non-canonical order in CC sentences involving embedded first- and third-person pronouns with nominative case. Thus, consistent with the corpus analysis, the results indicated that the non-canonical OVS word order is dispreferred in embedded CCs, regardless of the embedded NP type. < Insert Figure 2 around here> Experiment 1 Experiment 1 tested Russian SRC and ORC sentences, along with corresponding CC sentences, in an SPR task. As shown in the examples in Table 4, the linear word-order configuration in the embedded clause (RC or CC) was held constant across conditions. According to the corpus analysis and acceptability rating experiment reported above, this was the dispreferred word order in all but the ORC-CC control condition. The presentation regions for these sentences are also indicated in these examples. < Insert Table 4 around here> The predictions were as follows: First, expectation-based theories predict processing costs at the relativizer in ORC sentences (e.g., kotoruju.ACC ‘whom.ACC’). This is because the case-marking on this word indicates that the embedded clause is a relatively less frequent ORC. These theories also predict processing difficulty when unexpected word orders are encountered. Under these models, therefore, inflated RTs should be observed at the embedded NP (staruška(u) ‘old_lady’) in sentences with less frequent/dispreferred embedded-clause word orders -- that is, in SRCs, SRC-CC controls, and ORCs -- relative to ORC-CC controls, which had the
preferred word order. If RC processing difficulty is attributable solely to expectation effects, it should be largely confined to these early regions of the embedded clause. Memory-based accounts of RC comprehension difficulty predict a different locus of processing costs -- when NP arguments are integrated with the embedded verb. These costs should be revealed at SRC and ORC verbs (e.g., rasstroila ‘upset’) relative to their respective CC controls, which do not involve extraction out of the embedded clause. Crucially, in both SRCs and ORCs, two full-NP arguments had to be held in the working memory before integration with the RC verb. This allows effects related to memory-based integration to be teased apart from those related to RC structure. If the processing costs at this verb relate only to the number of NP arguments available for integration, these costs should be comparable in both SRCs and ORCs. Under structure-based theories of RC processing difficulty, on the other hand, there should be larger processing costs for ORCs even under these conditions. Finally, hybrid accounts -- and in particular those put forth by Staub (2010) and Levy and colleagues (2013) -- predict independent processing costs related to expectation-based and memory-based sources in these sentences. That is, effects related to frequency/experience-based expectations should be obtained early in the embedded clause, while integration costs should be observed later in the clause, at the RC verb. In light of these predictions, the regions of particular interest were the relativizer/complementizer, the embedded NP, and the embedded verb. As illustrated in the example sentences, buffer regions were added after each of these points to attenuate the influence of possible spillover effects. In addition, as a measure of overall comprehension, each experimental item was followed by a question related to the embedded clause.
Method Participants. Forty-three adult native speakers of Russian participated in the experiment online. On language background screening questions, all participants indicated that Russian was their native and dominant language. Materials and design. The experimental items consisted of 48 sets of sentences as in Table 4 (see Appendix A). Each RC sentence involved a pair of interchangeable animate NPs (e.g., hozjajka ‘housewife’, staruška ‘old_lady’) so that both the SRC and ORC could be constructed with the same lexical items, in the same order, by changing only case-marking (from nominative to accusative, and vice versa). Half of the sentences had feminine NPs and half masculine NPs, with the relativizer inflected accordingly. These NPs were both plausible agents and patients of the embedded verb. In CCs, a third NP (e.g., tetuška ‘aunty’) was used as the subject or object of the embedded clause and appeared at the end of this clause. The buffer regions between the critical regions were adverbials, PPs, or instrumental-case NPs. Four counterbalanced lists were constructed such that each item appeared under each combination of the sentence type (SRC, ORC) and embedded clause type (RC, CC control) factors across lists. Twelve practice items and 48 fillers were also included. These items consisted of a variety of sentence types and were comparable in length with the experimental items (filler: M = 96.42 characters; experimental: M = 96.15 characters). As in the experimental items, half of the practice/filler items started with masculine animate NPs, and half with feminine animate NPs. YES/NO comprehension questions were created for each item, with equal numbers of YES and NO responses. The questions for experimental items were based on information in the embedded clause. The same question was used for both RC conditions (e.g. Hozjajka rasstroila starušku
novostjami? ‘Did the housewife upset the old lady with the news?’) and both CC conditions (e.g. Tetuška rasstroila starušku novostjami? ‘Did the aunty upset the old lady with the news?’).
Procedure. The experiment used a self-paced, moving-window reading task (Just, Carpenter, & Woolley, 1982) and was again run with the web-based implementation of DMDX. Language background screening questions and written instructions were provided in Russian at the beginning of the experiment. In the reading task, each trial began with a line of dashes on the computer screen in place of the words in the sentence. The first word was displayed when the participant pressed the right CTRL key, and each subsequent key press revealed the next word/phrase in the sentence and masked the previous word/phrase. The time between the presentation of each word/phrase and the subsequent key press was recorded. Due to the length of the sentences, each item was presented on two lines. The line split was at the same word in experimental items -- after spillover region 3. Each sentence was followed by a YES/NO comprehension question. The right CTRL key was used for YES responses, and the left CTRL key for NO responses. Feedback was provided in Russian after each response. Participants were instructed to read at their natural pace and answer comprehension questions as accurately as possible. The task began with 12 practice items. Experimental and filler items were then presented in a different random order for each participant in sets of 16, with a short break after each set.
Data analysis. The data from 11 participants with overall error rates (ERs) on comprehension questions of 20% or higher were eliminated from the analysis. The data for the remaining 32 participants (overall ER: M = 11.28%, SD = 4.35) were analyzed as follows: The RT analyses were conducted over the six regions of the embedded clause and included data from both correctly- and incorrectly-answered trials.
RTs below 100 ms or above 4000 ms were
discarded (0.24% of the data), and outlier data points (4.86% of the data) were adjusted to two standard deviations (SDs) above and below the participant’s mean for each region. The mean RTs for each region of interest are shown in Table 5 and Figure 3. The statistical analyses for these RTs as well as for the comprehension question ERs consisted of 2x2x4 ANOVAs for both subjects (F1) and items (F2), with sentence type (SRC, ORC) and clause type (RC, CC control) as repeated measures and list/item group as a grouping factor. < Insert Table 5 around here> < Insert Figure 3 around here> Results Comprehension accuracy. The mean ERs on the comprehension questions were as follows: SRC: 15.63% (SEM: 1.61), SRC-CC control: 19.01% (SEM: 1.88), ORC: 28.13% (SEM: 2.14), ORC-CC control: 13.54% (SEM: 1.72). The analysis revealed a reliable main effect of clause type (F1 (1, 28) = 7.41, p < .05, F2 (1, 44) = 8.19, p < .01), but not of sentence type (F1 (1, 28) = 2.41, p = .13, F2 (1, 44) = 1.54, p = .22), as well as a significant interaction (F1 (1, 28) = 27.16, p < .001, F2 (1, 44) = 14.05, p < .001). This interaction was driven by a higher ER for ORCs than for any other condition (ORC vs. SRC: F1 (1, 28) = 16.03 , p < .001, F2 (1, 44) = 12.41 , p < .01; ORC vs. SRC-CC control: F1 (1, 28) = 6.59, p < .05, F2 (1, 44) = 5.90, p < .05; ORC vs. ORC-CC control: F1 (1, 28) = 23.21, p < .001, F2 (1, 44) = 17.34 , p < .001). Reading times. At the beginning of the embedded clause (Rel/Comp), although RC relativizers were read more slowly than CC complementizers (F1 (1, 28) = 15.05, p < .001, F2 (1, 44) = 29.50, p < .001), there was no reliable difference between the nominative relativizers in SRCs and the accusative relativizers in ORCs (both F’s < 1). In the immediately following region (Spill R1), there was again no difference between SRCs and ORCs (both F’s < 1). Rather,
there was only a marginally significant effect suggesting that CCs were read slower than RCs (F1 (1, 28) = 4.71, p < .05, F2 (1, 44) = 3.62, p = .06). At the embedded NP (RC NP/CC NP1), the RTs corresponded to the frequencies/preferences associated with the sentence conditions. Specifically, SRCs and their CC controls -- both of which had infrequent/dispreferred embedded-clause word orders -- were read more slowly than their ORC counterparts (F1 (1, 28) = 7.32, p < .05, F2 (1, 44) = 10.21, p < .01). There was also a marginally significant clause type effect suggesting that RCs were read more slowly than CCs (F1 (1, 28) = 2.98, p = .10, F2 (1, 44) = 4.34, p < .05), but no interaction (both F’s < 1.8). Crucially, the fastest RTs were found in ORC-CC control sentences -- the only sentence type that had the preferred embedded-clause word order. Under pairwise comparisons, RTs at the embedded NP in these sentences were faster than in any other condition (ORC-CC control vs. SRC: F1 (1, 28) = 7.64, p < .05, F2 (1, 44) = 13.72, p < .001; ORC-CC control vs. SRC-CC control: F1 (1, 28) = 10.83, p < .01, F2 (1, 44) = 16.65, p < .001; ORC-CC control vs. ORC: F1 (1, 28) = 5.91, p < .05, F2 (1, 44) = 5.17, p < .05). There were no significant effects in the immediately following region (Spill R2) (all F’s < 2.0). At the embedded verb (RC/CC Verb), there was a significant main effect of clause type, indicating that RC verbs took longer to read than CC verbs (F1 (1, 28) = 18.74, p < .001, F2 (1, 44) = 23.22, p < .001). There was however no sentence type effect and no interaction of sentence type and clause type (all F’s < 1.3). There was also no reliable difference between SRCs and ORCs at this verb (both F’s < 1). In the immediately following region (Spill R3), there was again no difference between these RC types (both F’s < 1). 
Rather, similar to the pattern of results at the embedded verb, this region revealed only a significant main effect of clause type, with longer RTs in the RC conditions (F1 (1, 28) = 28.91, p < .001, F2 (1, 44) = 63.94, p < .001). Although
this result might seem to reflect the spillover of processing difficulty from the immediately preceding RC verb, it is important to note that this region ended with a comma in the RCs, but not in CCs (see the example sentences). This punctuation difference could account for some, if not all, of the RT differences between RCs and CCs in this final region of interest (see below). Discussion The pattern of RT results revealed RC processing difficulty at different points in the embedded clause -- early in the clause, at the NP that indicated the dispreferred word order, as well as late in the clause, at the verb that allowed for integration of its NP arguments. Furthermore, the processing costs at the verb were comparable for SRCs and ORCs when the same number of full-NP arguments was available for integration. This absence of particular online comprehension difficulty for ORC sentences runs contrary to structure-based accounts of RC processing. Rather, the effects early in the clause are consistent with expectation-based models, while those later in the clause are consistent with memory-based accounts. These findings thus support a hybrid model under which both expectations and memory processes play core roles in the online comprehension of RC sentences. The results early in the embedded clause were as follows: First, there was no processing difficulty for ORC accusative-case relativizers compared to SRC nominative-case relativizers. As discussed above, expectation-based accounts predict this difficulty due to the lower overall frequency of ORCs in Russian. One possibility is that the SPR methodology is not sensitive enough to detect these effects. Indeed, Levy et al.’s (2013) SPR experiments on comparable Russian RC sentences also showed no reliable difference between SRCs and ORCs at the relativizer. Clear indications of expectation-based effects were however found at the first embedded NP -- the region where the unexpected word order was first encountered. At this
point, there were inflated RTs for sentences with dispreferred embedded-clause word orders -- that is, for SRCs, SRC-CC controls, and ORCs.
A different pattern of results was obtained at the embedded verb. In line with memory-based accounts, there were comparable processing costs for SRCs and ORCs when the same number of full-NP arguments was available for integration. Specifically, RTs at this verb were longer in both SRCs and ORCs than in their respective CC controls, and there was no reliable difference between these RC types. While these findings do not adjudicate among different memory-based models of RC processing difficulty -- e.g., the DLT, similarity-based interference, or cue-based retrieval models -- they clearly show that memory-based integration processes contribute to this difficulty. A comparable pattern of results was also revealed in the region immediately following the embedded verb. As noted above, however, this effect should be interpreted with caution in light of the fact that this region ended in a comma for RCs, but not for CCs. The processing difficulty for RCs in this region is likely due in large part to the end-of-the-clause wrap-up effects that are often triggered by this punctuation (Just & Carpenter, 1980; Kennison, Sieck, & Briesch, 2003; Mitchell & Green, 1978; Rayner, Kambe, & Duffy, 2000; Witzel, Witzel, & Nicol, 2012). This interpretive difficulty notwithstanding, it is important to note that there was again no difference between SRCs and ORCs in this spillover region. A post-hoc analysis of RTs at the main-clause verb (legla ‘lay’) also revealed no reliable difference between these RC types (SRC: 718 ms vs. ORC: 697 ms; both F’s < 1.3).
It is important to emphasize that although this experiment revealed RT results comparable to those of Levy et al. (2013), its design made it possible to clarify the sources of the effects at both the embedded NP and the embedded verb. First, as in Levy et al.
(2013), longer processing times were observed at the embedded NP in both SRCs and ORCs. As previously
mentioned, when considered in isolation, these effects might be interpreted to reflect encoding difficulties at a second, similar NP that intervenes between the modified noun and the RC verb. However, in the present study, inflated RTs were also found at this NP in SRC-CC control sentences -- a sentence type in which this NP would not trigger comparable encoding difficulties. The complete pattern of results at this NP therefore resists an encoding-based explanation. Rather, because the processing times for both RCs and CCs in this region correspond straightforwardly to their word-order frequency/preference profiles, these effects can be taken to index expectation-based processes. The comparison of RCs with CC control sentences and the inclusion of intervening material between the regions of interest also allowed for a clearer picture of processing difficulty at the RC verb. Specifically, as discussed above, the CC controls provided a baseline for memory-based integration costs at this verb, while the intervening regions allowed these costs to be distinguished from the possible spillover of processing difficulty at the RC NP. The present study also used comprehension questions targeting the interpretation of the embedded clause, which provided additional insight into the processing of these sentences. Interestingly, while there were no clear RT differences between SRCs and ORCs, there were higher ERs on the comprehension questions for ORCs compared to the other sentence conditions. Although these comprehension problems might be due to incremental processing difficulties that are not readily captured under SPR (a possibility that is explored in Experiment 3), the absence of particular online processing costs for ORCs suggests that these problems relate to relatively late stages of sentence interpretation. 
That is, this pattern of results suggests that after the sentence is read, it might be more difficult to remember, distinguish, and/or organize the roles of NP arguments (i.e., the agent and patient) in ORCs than in SRCs. The fact that both
SRCs and ORCs had comparable word order configurations, with the same number of NPs available for integration at the RC verb, suggests that this late effect relates to more abstract structural differences between these RCs -- and possibly to the differences in the extraction site associated with the modified NP. It is important to note however that this apparent effect of RC structure on overall comprehension might interact with similarity-based interference. Recall that the SRCs and ORCs in this experiment required two similar full NPs to be indexed in memory before their integration at the RC verb. It is possible that this similarity is particularly disruptive to overall comprehension when assigning roles to arguments and/or slotting them into their appropriate syntactic positions in ORCs. Experiment 2 Consistent with hybrid accounts of RC processing difficulty, Experiment 1 indicated that both syntactic expectations and memory-based integration play key roles in the incremental processing of Russian sentences with full-NP RCs. These sentences were also characterized by late-stage comprehension difficulty for ORCs, suggesting the influence of RC structure on their overall interpretation. Experiment 2 investigated these effects further by examining Russian pronominal SRCs and ORCs, along with corresponding CC sentences, as illustrated in Table 6. These sentences were the same as in Experiment 1, but first-person pronouns were used instead of full NPs in the embedded clause. The linear word-order configuration in the embedded clause was again held constant across conditions. For SRCs and ORCs, this meant that the embedded pronominal NP intervened between the modified head and the embedded verb. As indicated in the corpus analysis and acceptability ratings, while this is the dispreferred word order in full-NP RCs, it is preferred in pronominal RCs. 
Indeed, these analyses indicated that this is the preferred embedded-clause word order in all but the SRC-CC control condition.
< Insert Table 6 around here > As in Experiment 1, processing difficulty was predicted for ORC accusative-case relativizers. Although this effect was not observed in the previous experiment, expectation-based theories predict this difficulty due to the overall lower frequency of ORCs in Russian. These theories also predict that processing times at the first embedded NP should correspond to the frequency/preference profiles for the sentences in this experiment. Specifically, inflated RTs should be observed at this NP only for sentences in which the embedded clause had the dispreferred word order -- SRC-CC control sentences. A different pattern of results was predicted at the RC verb. If processing costs at this point relate only to the number of NPs available for integration, these costs should again be comparable for both SRCs and ORCs. Note however that this processing difficulty might be attenuated in this case because the integration of the modified full NP takes place over a pronoun/dissimilar word. Finally, in order to examine late-stage comprehension, each item was again followed by a question targeting information in the embedded clause. It was predicted that if the comprehension difficulty observed for ORCs in Experiment 1 relates only to RC structure, this difficulty should be observed in the present experiment as well. However, if this difficulty relates to both the structure of ORCs and similarity-based interference, it should be attenuated (or eliminated) due to the dissimilarity of the modified full NP and the RC pronoun. Method Participants. Forty-four adult native speakers of Russian participated in the experiment online. On language background screening questions, all participants indicated that Russian was their native and dominant language.
Materials and design. The 48 sets of experimental items were created from those in Experiment 1 by replacing the embedded full NP with a first-person pronoun (half singular, half plural) (see Appendix B). These pronouns were considered implicitly present in the discourse, so they did not require additional context. The counterbalancing procedures followed those of Experiment 1. As in this previous experiment, there were 12 practice items and 48 fillers that consisted of a variety of sentence types and that were comparable in length with the experimental items (filler: M = 90.62 characters; experimental: M = 90.12 characters). As in the experimental items, half of the practice/filler items began with masculine animate NPs, and the other half with feminine animate NPs, and each included a singular or plural first-person pronoun. YES/NO comprehension questions again targeted information in the embedded clause, and there were equal numbers of YES and NO responses. The same question was used for both RC conditions (e.g. Byli li my rasstroeny hozjajkoj? ‘Were we upset by the housewife?’) and both CC conditions (e.g. Byli li my rasstroeny tetuškoj? ‘Were we upset by the aunty?’). Half of the questions used passive voice so that the correct response was unrelated to whether the pronouns in the sentence and question had the same case marking. Procedure. The procedure was the same as in Experiment 1. Data analysis. The data from four participants with overall ERs on comprehension questions of 20% or higher were eliminated from the analysis. The data for the remaining 40 participants (overall ERs: M = 5.63%, SD = 4.95) were trimmed and analyzed using the same procedures as in Experiment 1. As in the previous experiment, RTs below 100 ms or above 4000 ms were discarded (0.34% of the data), and outlier data points (5.06% of the data) were adjusted to two SDs above and below the participant’s mean for each region. 
The mean RTs for each region of interest are shown in Table 7 and Figure 4.
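The RT trimming procedure used in Experiments 1 and 2 can be illustrated as follows. This is a minimal sketch under stated assumptions, not the analysis code used in the study: it assumes that the absolute cutoffs (100 ms and 4000 ms) are applied first, that the two-SD bounds are then computed from each participant's remaining RTs in a given region using the sample standard deviation, and that values beyond those bounds are replaced with the bound itself. The function name and the trial layout (`participant`, `region`, `rt`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_reading_times(trials, low=100, high=4000, n_sd=2.0):
    """Discard extreme RTs, then pull outliers in to +/- n_sd SDs.

    trials: list of dicts with keys 'participant', 'region', 'rt' (ms).
    Returns a new list; the input trials are not modified.
    """
    # Step 1: discard RTs below `low` or above `high`.
    kept = [dict(t) for t in trials if low <= t["rt"] <= high]

    # Step 2: collect the remaining RTs for each participant-by-region cell.
    cells = defaultdict(list)
    for t in kept:
        cells[(t["participant"], t["region"])].append(t["rt"])

    # Step 3: adjust outliers to the cell mean +/- n_sd sample SDs.
    for t in kept:
        rts = cells[(t["participant"], t["region"])]
        if len(rts) < 2:
            continue  # an SD cannot be computed from a single value
        center, spread = mean(rts), stdev(rts)
        lower, upper = center - n_sd * spread, center + n_sd * spread
        t["rt"] = min(max(t["rt"], lower), upper)
    return kept


# Made-up RTs for one participant and region, for illustration only.
example = [{"participant": "p1", "region": "verb", "rt": rt}
           for rt in [400] * 9 + [2000, 50, 5000]]
trimmed = trim_reading_times(example)
print(len(trimmed), max(t["rt"] for t in trimmed))
```

In this made-up example, the 50 ms and 5000 ms values are discarded, and the 2000 ms value is pulled in to the upper two-SD bound for that participant and region.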
< Insert Table 7 around here> < Insert Figure 4 around here> Results Comprehension accuracy. The mean ERs on the comprehension questions were as follows: SRC: 11.88% (SEM: 1.15), SRC-CC control: 5.83% (SEM: 0.96), ORC: 8.96% (SEM: 0.85), ORC-CC control: 6.25% (SEM: 0.86). The analysis revealed a significant effect of clause type, with RCs more difficult to comprehend than CCs (F1 (1, 36) = 17.11, p < .001, F2 (1, 44) = 9.04, p < .01; all other F’s < 1.8). Reading times. At the beginning of the embedded clause (Rel/Comp), there was a significant main effect of clause type, with relativizers taking longer than complementizers (F1 (1, 36) = 19.52, p < .001, F2 (1, 44) = 42.35, p < .001). More importantly, there was a reliable effect of sentence type (F1 (1, 36) = 4.39, p < .05, F2 (1, 44) = 5.95, p < .05), indicating that ORCs and their controls were read slower than their SRC counterparts. While the interaction was not significant (both F’s < 1.1), this sentence type effect was driven by particularly long RTs for the accusative-case relativizer in ORCs. Indeed, under pairwise comparisons, ORCs were read marginally slower than SRCs (F1 (1, 36) = 3.62, p = .07, F2 (1, 44) = 6.68, p < .05). In the immediately following region (Spill R1), there were no significant effects (sentence type: F1 = 3.03, p = .09, F2 = 2.04, p = .16; clause type: F1 = 1.91, p = .18, F2 = 2.98, p = .09; interaction: both F’s < 1). At the embedded pronominal NP (RC NP/CC NP1), there was particular processing difficulty for SRC-CC control sentences, the only sentence condition in which this region indicated a dispreferred embedded-clause word order. In this region, there was a significant main effect of sentence type (F1 (1, 36) = 6.82, p < .05, F2 (1, 44) = 15.43, p < .001), with longer RTs
for SRCs and their controls. There was also a significant interaction of sentence type and clause type (F1 (1, 36) = 6.80, p < .05, F2 (1, 44) = 10.28, p < .01). This interaction indicated that while SRC-CC control sentences had longer RTs than SRCs, there was a difference in the opposite direction for ORCs and their controls. Under pairwise comparisons, SRC-CC controls had significantly longer RTs than ORC-CC controls (F1 (1, 36) = 11.52, p < .01, F2 (1, 44) = 21.67, p < .001) and marginally longer RTs compared to SRCs (F1 (1, 36) = 3.22, p = .08, F2 (1, 44) = 3.18, p = .08) and ORCs (F1 (1, 36) = 2.81, p = .10, F2 (1, 44) = 4.04, p = .05). In the immediately following region (Spill R2), there were no significant effects (all F’s < 1.8). At the embedded verb (RC/CC Verb), there were no indications of integration costs for SRC/ORC sentences (clause type, interaction: all F’s < 1.9). There was only a main effect of sentence type that was significant by items but not by subjects (F1 (1, 36) = 2.73, p = .11, F2 (1, 44) = 5.33, p < .05), suggesting generally longer processing times for ORCs and their controls. There was also no reliable difference between SRCs and ORCs at this verb (both F’s < 1.2). In the immediately following region (Spill R3), there was again no difference between these RC types (both F’s < 1). However, this final region of interest revealed a main effect of clause type, with RCs taking longer than CCs (F1 (1, 36) = 8.95, p < .01, F2 (1, 44) = 26.57, p < .001), as well as a significant interaction (F1 (1, 36) = 14.57, p < .001, F2 (1, 44) = 6.48, p < .05), indicating that this clause type effect was particularly strong for SRCs (SRC vs. SRC-CC control: F1 (1, 36) = 19.42, p < .001, F2 (1, 44) = 42.73, p < .001; ORC vs. ORC-CC control: F1 (1, 36) = 1.27, p = .27, F2 (1, 44) = 1.80, p = .19). 
As noted above, however, clause type differences in this region should be interpreted with caution in light of the fact that only RCs included a comma at this point in the sentence.
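For readers less familiar with the F1/F2 convention used in these results, the following is a minimal illustrative sketch of how a by-subjects (F1) or by-items (F2) test of a single effect in a 2 x 2 design can be obtained from condition means. It is not the analysis script used in this study: the trial format, the function names (`cell_means`, `contrast_f`), and the condition labels are hypothetical, and the sketch assumes a fully within-unit design with no list or item-group factor, so its degrees of freedom (1, n - 1) will not match those reported here.

```python
from collections import defaultdict
from statistics import mean, variance

# Contrast weights over the four cells (sentence type x clause type).
# Labels are hypothetical stand-ins for SRC, SRC-CC control, ORC, ORC-CC control.
CONTRASTS = {
    "sentence_type": {("SRC", "RC"): 1, ("SRC", "CC"): 1,
                      ("ORC", "RC"): -1, ("ORC", "CC"): -1},
    "clause_type":   {("SRC", "RC"): 1, ("SRC", "CC"): -1,
                      ("ORC", "RC"): 1, ("ORC", "CC"): -1},
    "interaction":   {("SRC", "RC"): 1, ("SRC", "CC"): -1,
                      ("ORC", "RC"): -1, ("ORC", "CC"): 1},
}


def cell_means(trials, unit):
    """Average RTs within each (unit, condition) cell.

    trials: list of dicts with keys 'subject', 'item', 'sentence', 'clause', 'rt'.
    unit:   'subject' for an F1-style analysis, 'item' for an F2-style analysis.
    """
    cells = defaultdict(list)
    for t in trials:
        cells[(t[unit], t["sentence"], t["clause"])].append(t["rt"])
    by_unit = defaultdict(dict)
    for (u, sentence, clause), rts in cells.items():
        by_unit[u][(sentence, clause)] = mean(rts)
    return by_unit


def contrast_f(trials, unit, effect):
    """F(1, n - 1) for one effect, computed as the squared one-sample t
    of each unit's contrast score (valid for single-df within-unit effects)."""
    weights = CONTRASTS[effect]
    scores = [
        sum(w * cells[cond] for cond, w in weights.items())
        for cells in cell_means(trials, unit).values()
    ]
    n = len(scores)
    f_value = n * mean(scores) ** 2 / variance(scores)
    return f_value, (1, n - 1)
```

Calling `contrast_f(trials, "subject", "interaction")` and `contrast_f(trials, "item", "interaction")` on the same trial list would give the two statistics that are reported side by side as F1 and F2. An effect that is reliable in only one of the two analyses, as with the sentence type effect at the embedded verb, is one that does not generalize equally well across participants and across items.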
Discussion

This experiment investigated Russian pronominal RCs to further examine the effects of expectations and memory on the incremental processing of RC sentences as well as the possible influence of similarity-based interference on the late-stage comprehension of these sentences. The pattern of results indicated clear expectation-based effects early in the embedded clause. Specifically, at the beginning of this clause, the accusative-case relativizer in ORCs yielded especially long RTs. This effect is predicted under expectation-based models due to the general frequency disparity between ORCs and SRCs. Expectation-based effects were also found at the embedded pronominal NP. As in Experiment 1, the RTs in this region patterned with the word-order frequency/preference profiles for the embedded clauses in the test sentences. In particular, especially long processing times were revealed only for SRC-CC controls -- the only sentence condition with the dispreferred word order. Also as in the previous experiment, there was no processing time difference between SRC and ORC sentences at or immediately after the embedded verb. This was the case at the main-clause verb (legla ‘lay’) as well: A post-hoc analysis of RTs in this region revealed no reliable difference between the RC types (SRC: 722 ms vs. ORC: 718 ms; both F’s < 1). This experiment thus again revealed comparable processing times for SRCs and ORCs when the number of NPs available for integration at the RC verb was held constant. This pattern of results is of course inconsistent with structure-based models of incremental RC processing costs, which would predict particular difficulty for ORCs even under these conditions. It is important to note, however, that in contrast to Experiment 1, there were no clear indications of integration costs at the RC verb. That is, SRC and ORC sentences did not differ reliably from their CC controls in this region. 
The only suggestion of such costs came in the form of longer RTs for RC sentences
in the region immediately following the embedded-clause verb. As noted above, however, this finding cannot be unambiguously interpreted as a delayed integration effect in light of the fact that this region ended with a comma in RCs but not in CCs. That integration costs at the RC verb were eliminated -- or at the very least, attenuated -- is in line with the idea that it is easier to retrieve/integrate a modified full NP over a pronoun/dissimilar NP, as in the present experiment, than over another full NP, as in Experiment 1. Under the DLT, integration over a pronoun is easier because its referent is present in the discourse and is thus more accessible than the referent of a full NP. According to the similarity-based interference and cue-based retrieval accounts, the reduced processing times at these verbs are due to the dissimilarity of the two integrated NPs, where one is a full NP, while the other is a pronoun. The comprehension questions also revealed symmetrical processing costs for SRCs and ORCs. That is, unlike in Experiment 1, ORCs did not yield especially high ERs on these questions. This indicates that the late-stage comprehension difficulty for full-NP ORCs observed in the previous experiment cannot be attributed to their structural properties alone, but rather that it depends on the interaction of similarity-based interference with these properties.

Experiment 3

Contrary to structure-based models of incremental RC processing, neither of the SPR experiments reported above revealed an ORC penalty -- that is, particularly large processing costs for ORCs -- under online comprehension measures. Rather, both experiments showed processing difficulty related to structural frequency early in the embedded clause. Experiment 1, which examined full-NP RCs, also revealed clear processing costs at the RC verb. Crucially, this latter effect was comparable for SRCs and ORCs when the same number of arguments was available for integration at this verb. 
Taken together, these findings support a model under which
both syntactic expectations and memory-based integration contribute to incremental RC processing difficulty. Experiment 3 used eye tracking to investigate the online comprehension of the sentence type that provided the clearest indications of these separable processing costs -- Russian full-NP RCs. The use of eye tracking is important for several reasons. First, this method allows for a number of fine-grained measures of online sentence processing related to eye movements during normal reading. This contrasts with SPR, which generates a single RT measure as participants push a button to advance through the text one word/region at a time. This is obviously a newly learned skill that allows for participant-specific button-pushing criteria and strategies. It has been argued that these task differences (among others) make eye tracking potentially more sensitive to incremental sentence processing difficulty than SPR (Rayner, 1998; Rayner et al., 1989; Witzel et al., 2012). This is especially important in light of the results of Experiment 1, which revealed particular comprehension difficulty for full-NP ORCs on the end-of-the-sentence questions, but not under the online reading measure. One of the goals of the present experiment was to determine whether this comprehension difficulty reflects incremental processing costs that are not readily captured under SPR. Eye-tracking measures have also been associated with different stages and types of processing. This is important for the present study because an eye-tracking investigation into English RCs by Staub (2010; see also, Staub et al., 2017) revealed not just quantitative differences in the processing costs for SRCs and ORCs, but also qualitatively different indications of processing difficulty for ORCs early and late in the clause. 
The effects early in the clause were characterized by a high rate of first-pass regressive eye movements, while those late in the clause, at the RC verb, were revealed as longer times on initial-pass reading measures that
were not associated with regressions. The pattern of eye movements at the beginning of the clause was taken to reflect expectation-based processing difficulty, whereas the longer first-pass times at the RC verb were interpreted to reflect inflated processing times related to memory-based integration. These different patterns were thus taken to indicate independent sources of processing difficulty for ORCs at these points. Another aim of the present experiment was to explore whether effects early and late in Russian RCs are also manifested in these qualitatively different reading patterns. In order to investigate these questions, the experiment tested the same items as in Experiment 1 -- full-NP RCs and their corresponding CC controls (Table 4). Although the predictions are comparable to those of Experiment 1, several relate specifically to eye-tracking measures. First, processing difficulty was predicted for ORC accusative-case relativizers due to the low frequency of Russian ORCs. It is important to note that there were suggestions of this expectation-based effect in Experiment 2, but not in Experiment 1. One reason for this difference might be that SPR does not allow for regressive eye movements and thus is not sensitive enough to consistently index expectation costs triggered at the ORC relativizer. Expectation-based effects were also predicted at the first embedded NP in sentences with dispreferred embedded-clause word orders -- i.e., in SRCs, SRC-CC controls, and ORCs. These effects were observed as longer RTs in Experiment 1. In this eye-tracking experiment, however, if these effects reflect expectation-based processing difficulty, they should also be revealed in the form of high proportions of first-pass regressive eye movements. A different pattern of results was predicted at/after the embedded verb. 
If processing costs in these regions relate only to the number of NPs that need to be held in memory before integration at the verb, they should be comparable for both SRCs and ORCs, as in Experiment 1.
Consistent with Staub (2010), these memory-based integration costs should be revealed primarily in the form of longer first-pass times. On the other hand, if there is an ORC penalty during the incremental processing of these regions that was not revealed in SPR, these sentences should show processing difficulty compared to SRCs. Finally, the present experiment used the same comprehension questions as in Experiment 1. As noted above, in Experiment 1, ERs on these questions were especially high for ORCs. The present experiment investigated whether this effect relates specifically to the task demands of SPR -- an online reading method that might be particularly taxing on memory because it does not allow the reader to return to previous parts of the sentence. If this effect is not task-specific, but rather relates to more general comprehension difficulties, there should be especially high ERs for ORCs in this eye-tracking experiment as well.

Method

Participants. Forty adult native speakers of Russian, who were visiting or living in the United States, participated in the experiment at the University of Texas at Arlington for monetary compensation. On language background screening questions, all participants indicated that Russian was their native and dominant language.

Materials and design. The materials and design were the same as in Experiment 1.

Procedure. At the beginning of the session, participants completed a language background questionnaire. They were then provided with task instructions in Russian. During the reading task, sentences were presented on two lines of text (with standard punctuation and capitalization) in 13-point Courier font on a 19-inch CRT monitor. As in the SPR experiments, the line split was at the same word in the experimental conditions -- after spillover region 3. The screen was located approximately 60 cm from subjects’ eyes, and a chin rest was used to
minimize head movements. Participants were asked to read at their natural pace and answer comprehension questions as accurately as possible. Eye movements were recorded with an EyeLink 1000 (SR Research) eye tracker, which monitored the movement of the right eye (though viewing was binocular) at a sampling rate of 1000 Hz. At the beginning of each trial, a calibration dot appeared on the left side of the screen. The participants were instructed to look at this dot, which allowed the experimenter to assess whether the eye tracker was correctly calibrated. The experimenter then displayed the sentence. Participants read the sentence silently and pressed a button on a gamepad when finished. There was a 15-second timeout for each sentence. After the participant finished reading the sentence, it disappeared from the screen, and a YES/NO comprehension question was displayed. The right button on the gamepad was used for YES responses, and the left button for NO responses. Feedback in Russian was provided after each response. The task began with 12 practice items. Experimental and filler items were then presented in a different random order for each participant in sets of 12, with a short break after each set. The eye tracker was calibrated before each set and then recalibrated as necessary.

Data analysis. Four eye-tracking measures were analyzed: first fixation duration, first-pass time, regression-path duration, and first-pass regression proportion. First fixation duration refers to the duration of the initial fixation in a region, provided that the region was not skipped on the first pass through the sentence. First-pass time is the sum of the fixation durations in a region after entering that region until leaving it in any direction (again, not counting cases in which the region was skipped). Regression-path duration is the sum of all fixation durations after entering a region until leaving it to the right. This measure includes regressive fixations to previous regions. 
First-pass regression proportion refers to the proportion of trials on which the
reader made a regressive eye movement from a given region to a previous region during the initial pass through the sentence. The datasets from three participants who had 10 or more timeouts on the experimental items were excluded from the analysis. The data from five participants with overall ERs on comprehension questions of 20% or higher were also removed. The data for the remaining 32 participants (overall ER: M = 12.78%, SD = 3.55) were analyzed as follows: Trials on which participants reached the timeout or on which there was major tracker loss were excluded from the analysis (3.58% of the trials). Before analyzing the eye-movement data, fixations that were less than 80 ms in duration and within one character of the previous or subsequent fixation were combined with this neighboring fixation. Any remaining fixations that were shorter than 80 ms or longer than 1000 ms were deleted (2.67% of the data). The analyses of the reading measures were conducted over the same regions as in Experiments 1 and 2, and again included data from both correctly-answered and incorrectly-answered trials. The means for these measures are presented by condition and region in Table 8. The statistical analysis procedures followed those of Experiments 1 and 2. The results of these analyses are presented in Table 9. The report below highlights the statistically reliable results of these analyses as well as informative marginally significant trends. The results of relevant tests of simple effects are also reported. Figure 5 presents mean first-pass times, while Figure 6 shows mean first-pass regression proportions.

<Insert Tables 8 and 9 around here>
<Insert Figures 5 and 6 around here>

Results

Comprehension accuracy. The mean ERs on the comprehension questions were as follows: SRC: 18.64% (SEM: 1.77), SRC-CC control: 22.34% (SEM: 1.54), ORC: 24.32%
(SEM: 2.22), ORC-CC control: 15.60% (SEM: 1.99). Although neither of the main effects was significant (all F’s < 1.4), there was a significant interaction of sentence type and clause type (F1 (1, 28) = 8.35, p < .01, F2 (1, 44) = 8.06, p < .01), indicating a particularly high ER for ORCs. Indeed, while ORCs had a significantly higher ER than their CC controls (F1 (1, 28) = 5.63, p < .05, F2 (1, 44) = 6.70, p < .05), there was a non-statistically reliable trend in the opposite direction for SRCs and their controls (F1 (1, 28) = 2.53, p = .12, F2 (1, 44) = 1.75, p = .19).

Reading measures. Consistent with the results of the previous experiments, RCs were more difficult to process than CCs at the relativizer/complementizer (Rel/Comp) and in the immediately following region (Spill R1) across reading measures. In this immediately following region, there was also a trend suggesting that ORCs and their CC controls had more first-pass regressions than SRCs and their controls. This effect was driven mainly by ORCs, which induced more first-pass regressions than any other condition (vs. SRCs: F1 (1, 28) = 4.03, p = .05, F2 (1, 44) = 5.31, p < .05; vs. SRC-CC controls: F1 (1, 28) = 9.19, p < .01, F2 (1, 44) = 13.87, p < .001; vs. ORC-CC controls: F1 (1, 28) = 7.50, p < .05, F2 (1, 44) = 11.25, p < .01). At the embedded NP (RC NP/CC NP1), the pattern of results largely corresponded to the word-order frequencies/preferences for the test sentences, with particularly robust indications of processing difficulty for SRCs. Specifically, SRCs and their CC controls -- both of which had dispreferred, non-canonical embedded-clause word orders -- were read slower than their ORC counterparts under first-pass time and regression-path duration. Under regression-path duration, there was also a marginally significant interaction suggesting particularly long reading times for SRC sentences. 
Indeed, pairwise comparisons revealed longer regression-path durations for SRCs than for any other condition (vs. SRC-CC controls: F1 (1, 28) = 3.46, p = .07, F2 (1, 44) =
7.35, p < .01; vs. ORCs: F1 (1, 28) = 11.33, p < .01, F2 (1, 44) = 28.14, p < .001; vs. ORC-CC controls: F1 (1, 28) = 11.46, p < .01, F2 (1, 44) = 10.44, p < .01). In this region, there was also a significant interaction indicating an especially high incidence of first-pass regressions for SRCs. Pairwise comparisons showed that these SRCs induced more first-pass regressions than any other condition (vs. SRC-CC controls: F1 (1, 28) = 7.85, p < .01, F2 (1, 44) = 15.16, p < .001; vs. ORCs: F1 (1, 28) = 11.98, p < .01, F2 (1, 44) = 13.12, p < .001; vs. ORC-CC controls: F1 (1, 28) = 6.25, p < .05, F2 (1, 44) = 5.64, p < .01). The region immediately following the embedded NP (Spill R2) revealed a similar pattern of results. In this region, there was a significant effect of sentence type under regression-path duration and first-pass regression proportion, indicating processing difficulty for SRCs and their controls compared to their ORC counterparts. These measures also showed a reliable clause type effect, with processing difficulty for RCs relative to their CC controls. There was also a marginally significant interaction under regression-path duration and a significant interaction under first-pass regression proportion. This pattern of results is indicative of processing costs for the sentences with dispreferred embedded-clause word orders -- SRCs, SRC-CC controls, and ORCs. Indeed, compared with ORC-CC controls, which were the only sentences with the preferred word order, these sentence conditions had longer regression-path durations (SRCs vs. ORC-CC controls: F1 (1, 28) = 45.31, p < .001, F2 (1, 44) = 40.15, p < .001; SRC-CC controls vs. ORC-CC controls: F1 (1, 28) = 15.85, p < .001, F2 (1, 44) = 9.01, p < .01) and higher first-pass regression rates (SRCs vs. ORC-CC controls: F1 (1, 28) = 36.43, p < .001, F2 (1, 44) = 25.73, p < .001; SRC-CC controls vs. ORC-CC controls: F1 (1, 28) = 7.03, p < .05, F2 (1, 44) = 4.99, p < .05). 
In the case of the pairwise comparisons between ORCs and ORC-CC controls, however, these differences were only non-statistically reliable trends (regression-path duration:
F1 (1, 28) = 2.17, p = .15, F2 (1, 44) = 4.11, p < .05; first-pass regression proportion: both F’s < 1). At the embedded verb (RC/CC Verb), processing difficulty for SRCs and their CC controls continued under regression-path duration and first-pass regression proportion. There were however no indications of integration costs for SRC/ORC sentences in this region. Rather, compared to their RC counterparts, CC sentences had marginally longer first fixation durations as well as significantly longer first-pass times and regression-path durations. These CC sentences also induced more first-pass regressive fixations. Processing difficulty for RC sentences was however observed in the region immediately after the embedded verb (Spill R3). In this region, RCs had significantly longer first-pass times and regression-path durations than their CC counterparts. Crucially, however, there were no differences between SRCs and ORCs under these measures (first-pass time: both F’s < 1.7; regression-path duration: both F’s < 1.5). In fact, if anything, this processing difficulty appeared to be more robust for SRC sentences. Indeed, for first-pass times, the difference between SRCs and their controls (F1 (1, 28) = 14.29, p < .001, F2 (1, 44) = 26.95, p < .001) was larger than between ORCs and their controls (F1 (1, 28) = 4.00, p = .06, F2 (1, 44) = 3.79, p = .06), which was also indicated by a marginally significant interaction. Similarly, for regression-path durations, the difference between SRCs and their CC controls (F1 (1, 28) = 14.23, p < .001, F2 (1, 44) = 13.35, p < .001) was again larger than between ORCs and their controls (F1 (1, 28) = 1.10, p = .30, F2 (1, 44) = 2.78, p = .10). As mentioned in the discussion of the previous experiments, the RC processing difficulty in this region should be interpreted with caution in light of the fact that only RCs included a comma at this point in the sentence.
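Because the results above depend on how the reading measures were derived from raw fixations, the following minimal sketch illustrates the fixation-cleaning step and the four measures defined under Data analysis. It is an illustration only, not the software used in this study. The data layout (a list of `(x_in_characters, duration_ms)` pairs per trial, and a list of region onsets in characters), the function names, and the rule that a short fixation is merged with the previous neighbor before the following one are all assumptions made for the example.

```python
from bisect import bisect_right


def clean_fixations(fixations, short=80, long=1000, max_dist=1.0):
    """Merge short fixations into a neighbor within one character,
    then delete any remaining fixations outside the 80-1000 ms window.

    Assumption: when both neighbors qualify, the previous one is used.
    """
    fixes = [list(f) for f in fixations]
    i = 0
    while i < len(fixes):
        x, dur = fixes[i]
        merged = False
        if dur < short:
            for j in (i - 1, i + 1):
                if 0 <= j < len(fixes) and abs(fixes[j][0] - x) <= max_dist:
                    fixes[j][1] += dur  # neighbor absorbs the duration
                    del fixes[i]
                    merged = True
                    break
        if not merged:
            i += 1
    return [(x, d) for x, d in fixes if short <= d <= long]


def region_measures(fixations, region_starts, region):
    """First-pass measures for one region on one trial.

    region_starts: sorted character offsets at which each region begins.
    Returns None when the region was skipped on the first pass.
    """
    # Map each fixation to the index of the region it falls in.
    seq = [(bisect_right(region_starts, x) - 1, d) for x, d in fixations]
    start = None
    for i, (r, _) in enumerate(seq):
        if r > region:  # a later region was fixated first: region skipped
            return None
        if r == region:
            start = i
            break
    if start is None:
        return None
    # First pass: consecutive fixations until the region is left either way.
    end = start
    while end < len(seq) and seq[end][0] == region:
        end += 1
    # Regression path: everything until the region is left to the right.
    stop = start
    while stop < len(seq) and seq[stop][0] <= region:
        stop += 1
    return {
        "first_fixation": seq[start][1],
        "first_pass": sum(d for _, d in seq[start:end]),
        "regression_path": sum(d for _, d in seq[start:stop]),
        "first_pass_regression": end < len(seq) and seq[end][0] < region,
    }
```

Under this sketch, first-pass regression proportion for a condition would be the mean of `first_pass_regression` over the trials in which the region was not skipped. A region with long regression-path durations but ordinary first-pass times, as at the embedded NP in SRCs, is one where readers tended to go back into earlier text before moving on.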
Discussion

This experiment used eye tracking to shed light on the online comprehension of Russian RC sentences. As in the previous SPR experiments, there were clear processing costs related to structural frequency early in the embedded clause. The first indication of these costs was found immediately after the relativizer in ORC sentences. In this region, ORCs yielded a higher incidence of first-pass regressions than any other sentence type. Given the low frequency of Russian ORCs, this effect is consistent with expectation-based processing costs at this point. It is important to recall that there were only suggestions of this processing difficulty for ORCs in the SPR experiments. As discussed above, this disparity is likely due to the reading measures that are available in eye tracking, but not in SPR. That is, insofar as measures referencing regressive eye movements are particularly sensitive to expectation-based processing costs (Staub, 2010), it is possible that eye tracking is especially well-suited to indexing online comprehension difficulty at and after the ORC relativizer. As in the previous SPR experiments, processing difficulty was also revealed at and after the embedded NP in sentences with dispreferred embedded-clause word orders. Specifically, in these regions, there were clear processing costs for SRCs and their CC controls. Similar to the effects after the relativizer, these costs were shown most clearly under first-pass reading measures that take regressive fixations into consideration -- namely, first-pass regression proportion and regression-path duration. In fact, under these measures, SRC sentences revealed the clearest indications of processing difficulty. It is important to note, however, that unlike in Experiment 1, processing costs were not shown at this NP in ORCs. 
This disparity suggests that eye-tracking measures that are especially sensitive to expectation-based processing difficulty might be able to tease out degrees of dispreference for certain structures. Indeed, although both
the full-NP SRCs and ORCs in this experiment had dispreferred embedded-clause word orders, SRCs of this type were especially infrequent in the corpus (1.2% of all SRCs) and were rated lower than corresponding ORCs in the acceptability rating experiment (non-canonical SRCs: 3.14 vs. canonical ORCs: 3.73; F1 = 38.74, p < .001, F2 = 78.92, p < .001). In contrast to the SPR version of this experiment (Experiment 1) as well as to the findings from eye-tracking investigations into English RCs (Staub, 2010; Staub et al., 2017), there were no indications of integration costs at the RC verb under any measure. In fact, the findings indicated exactly the opposite -- under all measures except first fixation duration, SRC and ORC sentences were easier to read than their CC counterparts at this verb. Although this was not anticipated in light of the clear processing costs at the RC verb in Experiment 1, it is important to note that this pattern of results is predicted under some expectation-based models of RC processing (Hale, 2001; Levy, 2008). As mentioned in the overview of RC processing models, these accounts predict that after receiving both arguments of an RC, there should be a strong expectation for the verb, and thus facilitated processing at this word. This expectation would appear to apply to English ORCs (The reporter(arg1) that the senator(arg2)...) as well as to the Russian SRCs and ORCs in the present experiment (Hozjajka(arg1), kotoraja/kotoruju starušku/staruška(arg2)...). Although the reason for this disparity between the SPR and eye-tracking experiments in the present study is not entirely clear, one possibility is that it relates to task-specific processing demands and reading strategies (for more on this issue, see the General Discussion). Despite this difference, it is important to emphasize that as in Experiments 1 and 2, there were no processing differences between SRCs and ORCs at the embedded verb. 
That is, this experiment again revealed that SRC and ORC verbs were processed comparably when the same
number of NPs was available for integration. This was also the case in the region immediately after the verb. As in the SPR experiments, the reading patterns in this region revealed processing difficulty for RC sentences relative to their CC controls. Although this finding should be interpreted with caution in light of the punctuation differences between the RC and CC sentences, it is nevertheless important to note that this region did not suggest a delayed processing cost for ORCs in particular. In fact, if anything, as in Experiment 2, the processing difficulty for RC sentences in this region appeared to be larger for SRCs. A post-hoc analysis of the reading measures at the main-clause verb (legla ‘lay’) also revealed no reliable differences between the RC types (first fixation duration -- SRC: 205 ms vs. ORC: 202 ms; first-pass time -- SRC: 342 ms vs. ORC: 339 ms; regression-path duration -- SRC: 456 ms vs. ORC: 491 ms; all F's < 1.1). There was only a trend suggesting a higher proportion of first-pass regressions for ORCs in this region (SRC: .04 vs. ORC: .07; F1 (1, 28) = 4.38, p < .05, F2 (1, 44) = 3.63, p = .06). In light of this trend, however, one might suggest that eye-tracking measures that tap into later reading processes would reveal clearer indications of an ORC penalty. In order to examine this possibility, a post-hoc analysis of second-pass times was conducted. This measure was calculated as the sum of all regressive fixation durations in each region (see e.g., Clifton, Traxler, Mohamed, Williams, Morris, & Rayner, 2003; Price, Witzel, & Witzel, 2015). The results of this analysis largely mirrored those reported above for first-pass reading measures: Second-pass times were particularly long for SRCs in the first four regions of the embedded clause (Rel/Comp, Spill R1, RC NP/CC NP1, Spill R2), whereas at and immediately after the embedded verb (RC/CC Verb, Spill R3), CCs had longer second-pass times than RCs. 
Crucially, there were no indications of inflated second-pass times for ORCs at or immediately after the
embedded verb (SRCs vs. ORCs: all F's < 1) or at the main-clause verb (SRCs vs. ORCs: F1 = 1.33; F2 = 2.39). Finally, the findings for the end-of-the-sentence comprehension questions also matched well with those of Experiment 1. As in this SPR experiment, the ERs on these questions were particularly high for ORCs. This result is interesting in light of the fact that this comprehension difficulty occurred even though ORC sentences were presented in their entirety. This indicates that this effect cannot be attributed to memory demands imposed by the region-by-region presentation of sentences in SPR. Rather, it appears to reflect more general comprehension problems for Russian ORCs. It is also important to emphasize that as in Experiment 1, these comprehension problems occurred in the absence of marked incremental processing difficulty for ORCs. As detailed above, ORCs were more difficult to process than SRCs only in the region immediately following the relativizer, after which point there were larger processing costs for SRC sentences. This suggests that the comprehension difficulty for full-NP ORCs that is reflected in these higher ERs relates to late stages of processing.

General Discussion

This study used SPR (Experiments 1 and 2) and eye tracking (Experiment 3) to investigate potential sources of processing difficulty in Russian RC sentences and to test among competing models of online processing costs for these sentence types. This was done by comparing reading patterns on SRCs and ORCs in which an NP argument intervened between the modified noun and the RC verb. In Experiments 1 and 3, this intervening argument was a full NP, while in Experiment 2, it was a pronominal NP. A corpus analysis and an acceptability rating experiment established that these full-NP and pronominal RCs are associated with different word-order frequency/preference profiles. Using these different NP types, while
holding lexical material in the same linear order across conditions, made it possible to tease apart the potential influences of structural asymmetries, syntactic expectations, memory-based integration, and similarity-based interference on incremental RC processing. In order to further clarify the nature of RC processing difficulty, SRCs and ORCs were compared with each other as well as with corresponding CC controls, and spillover regions were included between critical regions. Comprehension questions that targeted the embedded clause were also included to examine the overall comprehension of these sentence types. As in previous studies of English (Gordon & Hendrick, 2005; Reali & Christiansen, 2007) and Russian (Levy et al., 2013), the corpus analysis showed that ORCs occur less frequently overall than SRCs. Furthermore, in both the corpus analysis and acceptability ratings, the word-order frequency/preference profiles for SRCs and ORCs were found to depend on the embedded NP type (full NP vs. pronoun). For full-NP RCs, the more frequent and preferred word orders were the canonical VO order for SRCs and the non-canonical VS order for ORCs. However, this pattern was essentially reversed for pronominal RCs. For pronominal ORCs, the canonical SV order was more frequent/preferred, while for pronominal SRCs, there was no clear word-order preference. Interestingly, these preferences do not apply to Russian embedded clauses generally. Rather, in embedded CCs, the canonical SVO word order was more frequent and was rated higher than the non-canonical OVS order, regardless of the embedded NP type. These frequency/preference profiles are of course interesting in and of themselves, especially with respect to the preference for the non-canonical word order in full-NP ORCs. However, more importantly for the hypotheses under investigation, they allowed for clear predictions related to expectation-based effects during incremental sentence processing. 
This would appear to be the case even though the test sentences in the reading experiments were not taken directly from the
corpus and included more modifying information than the items in the acceptability rating experiment. With respect to the latter disparity, this additional modifying information was considered necessary in order to create buffer regions between the regions of interest in the reading experiments. This information thus served an important purpose in terms of helping to determine the loci of processing difficulty in reading tasks that are notorious for spillover effects and delayed processing costs (i.e., processing costs "downstream" of their predicted trigger point). It is also important to emphasize that this modifying information was included across the experimental conditions for each item. Therefore, this information -- and presumably any influence it had on comprehension -- was constant across conditions. In the online reading experiments, the first indications of expectation-based effects were found at the relativizer. Specifically, ORC accusative-case relativizers were more difficult to process than SRC nominative-case relativizers. In light of the frequency disparity between Russian SRCs and ORCs, this effect is consistent with expectation-based processing difficulty. However, it is important to note several caveats to this interpretation. First, especially large processing costs for ORC relativizers were not obtained in all three experiments. Although the eye-tracking experiment revealed clear processing difficulty for ORC relativizers, there were only suggestions of this effect in SPR -- and in Experiment 2 in particular. As discussed above, this difference might be connected to the reading measures in these tasks. In the eye-tracking experiment, processing difficulty immediately after the ORC relativizer was revealed primarily in the form of a greater incidence of first-pass regressions. It is possible that SPR did not consistently index this processing cost because regressions are not permitted in this task. 
It is also important to point out that this effect at the ORC relativizer could have alternative
explanations. For instance, it could be attributed to structural reanalysis if there were a particular preference for analyzing the modified NP as the RC subject (see, e.g., Traxler et al., 2002, 2005).

Clearer expectation-based effects were observed at the embedded NP, which indicated the word order in the embedded clause. In this region, the SPR experiments revealed longer RTs for sentences with embedded-clause word orders that were less frequent in the corpus and dispreferred in the acceptability ratings. For the embedded clauses with full NPs in Experiment 1, there were inflated RTs for both SRCs and ORCs, as well as for the non-canonically ordered SRC-CC controls. For the embedded clauses with pronominal NPs in Experiment 2, the RT patterns changed in accordance with the different word-order frequency/preference profiles for these clauses. In this experiment, there were processing costs at the embedded NP only for SRC-CC controls -- the only sentence type with a dispreferred embedded-clause word order. The eye-tracking experiment revealed a pattern of results that was slightly different from that of its SPR counterpart, Experiment 1. While the pattern of processing difficulty at and after the embedded NP again largely corresponded with the frequency/preference profiles for the test sentences, these processing costs appeared to be larger for SRCs and their controls than for ORCs. As discussed above, this disparity might be attributed to the sensitivity of eye tracking to different levels of dispreference.

It is important to emphasize that these expectation-based effects at the embedded NP resist many of the alternative explanations that have been put forward for comparable effects in the processing of English RC sentences. First, since this study tested Russian RCs with case-marked relativizers, there was no extraction-type ambiguity at the onset of the RC.
This ambiguity is relevant to the processing of many English RCs, in which the relativizer "that" can initiate either an SRC or an ORC. In such sentences, the presence of an NP before the RC verb
provides disambiguating information indicating that the clause is an ORC, which can trigger structural reanalysis under some models (e.g., Clifton & Frazier, 1989; Traxler et al., 2002, 2005). Because no such ambiguity was involved in the Russian RC sentences in the present study, the processing difficulty observed at the embedded NP cannot be attributed to structural reanalysis of this type.

Processing difficulty at the embedded NP in English ORCs has also been explained in terms of encoding costs (Gordon & Lowder, 2012). Under this account, these costs are incurred because English ORCs require two NP arguments to be encoded before the integrating RC verb. In the present study, two NP arguments appeared before the RC verb in both SRCs and ORCs. If processing difficulty at the embedded NP were due to encoding, this difficulty should have been observed only for RC sentences, and it should have been comparable for SRCs and ORCs. However, processing difficulty at this NP was not unique to RC sentences. Rather, across all three experiments, there were also processing disruptions at this point for SRC-CC controls, which had the dispreferred non-canonical OVS word order in the embedded clause. Recall also that in the eye-tracking experiment, full-NP SRC sentences, which had the particularly dispreferred non-canonical OV word order in the RC, showed the strongest indications of processing difficulty at and immediately after the embedded NP. Although neither of these findings can be explained under an encoding-based account, they are handled straightforwardly under expectation-based processing models.

Taken together, the effects related to structural frequency at the beginning of the embedded clause in the reading experiments reported above indicate a crucial role for syntactic expectations in the incremental processing of RC sentences.

A more complex set of findings was obtained later in the clause, at the RC verb.
One constant among these findings was that none of the experiments indicated that ORCs were more
difficult to process than SRCs. This pattern of results poses a challenge to models that attribute online RC processing difficulty to SRC/ORC structural asymmetries. However, it is important to note that the processing costs for SRC/ORC verbs were not the same across experiments. In Experiment 1, which used SPR to test full-NP RCs, there was clear processing difficulty at the RC verb that was comparable for both SRCs and ORCs. That is, there were similar processing costs for SRC and ORC verbs when the same number of NP arguments was available for integration at this point. These findings thus correspond to the predictions of memory-based models of RC processing difficulty, such as the DLT (Gibson, 1998, 2000), similarity-based interference accounts (Gordon et al., 2001, 2002, 2004, 2006), and cue-based retrieval accounts (Lewis & Vasishth, 2005; Van Dyke & Lewis, 2003; Van Dyke & McElree, 2006).

Experiments 2 and 3, however, did not reveal clear processing costs at the RC verb. This pattern of results was expected in Experiment 2 in light of the fact that integration of the modified full-NP occurred over a dissimilar, and potentially more accessible, embedded pronominal NP. However, this pattern was not anticipated for Experiment 3 -- an eye-tracking replication of Experiment 1. What accounts for this difference? First, it is important to note that eye-tracking investigations into English RC processing (Staub, 2010; Staub et al., 2017) have yielded more robust indications of processing difficulty early in the RC than at the RC verb. The present study is in line with these findings in that it also revealed processing difficulty early in the embedded clause across experiments, but less consistent effects at the RC verb. But this observation alone cannot account for the difference between Experiments 1 and 3. After all, Experiment 3 showed a difference between RCs and CCs, but in the opposite direction of Experiment 1.
That is, the processing of RC sentences appeared to be facilitated compared to their CC counterparts. As discussed above, this pattern of results is consistent with expectation-based models that posit that
the presence of both arguments early in the RC creates a strong expectation for the RC verb. This then raises the question as to why Experiment 1 yielded effects that are consistent with memory-based accounts of RC processing difficulty, while its eye-tracking replication revealed results that support expectation-based models. One possibility relates to task-specific processing demands and reading strategies. The salient difference between these experiments was that Experiment 1 used SPR, while Experiment 3 used eye tracking. Although these tasks differ in a number of important ways (for a detailed comparison, see, e.g., Witzel et al., 2012), one crucial difference relates to the way in which items are presented. In eye tracking, sentences are displayed in their entirety. This allows readers to return to previous regions of the text if they experience processing difficulty or if they would like to re-examine any of its content. In SPR, on the other hand, sentences are presented one word/phrase at a time, in a strictly serial manner. In light of these differences, it is likely the case that SPR places greater demands on working memory resources. After all, if the reader experiences processing difficulty during this task, there is no recourse other than to consult the memory representation developed up to that point. It is perhaps not surprising then that effects related to memory processes during incremental sentence processing would be observed most readily in a task that requires greater working memory resources.

The end-of-the-sentence comprehension questions yielded yet a different pattern of results. Specifically, the ERs on these questions -- which targeted the content of the embedded clause -- revealed comprehension difficulty for ORCs compared to SRCs in full-NP RC sentences (Experiments 1, 3), but not in pronominal RC sentences (Experiment 2).
That is, although SRC/ORC structural asymmetries did not appear to affect the incremental processing of Russian RC sentences, the influence of these structural differences was observed in their overall
comprehension, presumably due to late-stage comprehension difficulty for ORCs. Although the present study did not test among different explanations for these effects, it is possible that in line with structure-based accounts, it is more difficult either to organize arguments into their correct positions in ORCs, or to retrieve information related to these positions when responding to comprehension questions. Importantly, this appears to be the case only in full-NP ORCs, in which the modified NP and RC NP are similar and therefore arguably more difficult to distinguish in memory. The structural properties of ORCs that gave rise to these effects are also not entirely clear. For instance, they could be due to (a) the more deeply embedded position of the extracted object (Lin & Bever, 2006), (b) the change in the perspective of the subject in each clause (MacWhinney & Pleh, 1998), or (c) the non-canonical appearance of the object before the subject (Holmes & O’Regan, 1981; Townsend & Bever, 2001). Further research is necessary in order to understand the nature of this late-stage comprehension difficulty for ORCs -- and in particular, its interaction with similarity-based interference.

The findings for the comprehension questions also point to a potential criticism of this study. Specifically, it could be argued that it is difficult to draw firm conclusions from response/reading times when the relevant experimental conditions -- in this case, SRCs and ORCs -- differ in terms of accuracy. Unfortunately, this is a problem that often occurs in sentence comprehension research and in language processing research more generally. And it would appear to be an issue that is especially difficult to avoid in studies comparing the online comprehension of SRCs and ORCs. Indeed, such studies regularly find higher ERs on end-of-the-sentence comprehension questions for ORCs than for SRCs (see, e.g., Gennari & MacDonald, 2008; Gordon et al., 2001, 2004; King & Just, 1991, among others).
In order to investigate this issue, we conducted follow-up analyses for each of the reading experiments in which we
included the data only from trials on which the comprehension question was answered correctly. The response/reading time patterns for this subset of the trials were virtually identical to the patterns for the complete set.

In sum, the present study indicates that processing difficulty for Russian RCs, and for RCs in general, cannot be adequately explained under any one structure-based, expectation-based, or memory-based model. Rather, consistent with hybrid models, this processing difficulty was shown to relate to several independent sources. This study also showed that these sources affect different stages of processing and are at least partially task-dependent. First, processing difficulty related to structural frequency was observed early in the RC under both SPR and eye tracking, indicating a crucial role for syntactic expectations in incremental RC processing. These expectations, however, cannot account for all of the observed processing difficulty in RC sentences. In the SPR task, there were also clear processing costs at the RC verb, which were comparable for SRCs and ORCs when the number of arguments available for integration was held constant. This suggests that memory-based integration processes also influence online RC processing, but that these effects might be most apparent in reading tasks that involve additional working memory demands. Crucially, there were no indications -- under SPR or eye tracking -- that the incremental processing of RCs was affected by SRC/ORC structural asymmetries. Thus, online processing difficulty in these sentences was not influenced by ORC structure per se, but rather only by factors that are often associated with this structure -- namely, relatively low structural frequency and relatively long integration distance. There was, however, an indication of an ORC penalty on the end-of-the-sentence comprehension questions, but only for full-NP RCs.
As argued above, this effect appears to reflect the combined influence of RC structural asymmetries and similarity-based interference on late-stage comprehension processes. In light of
these results, it is perhaps not surprising that the literature to date has provided empirical support for very different models of RC processing difficulty. The present study suggests that one of the reasons for these conflicting findings is that RC processing studies have tapped into different stages of the comprehension process, with tasks that have varied in their demands on working memory. More research is necessary in order to examine the influence of experimental tasks on the processing of RC sentences and of complex sentences generally.
Acknowledgements

We are grateful to Naoko Witzel and Joey Sabbagh for their valuable input on this project. We would also like to thank Inga Khelm for her assistance in developing the experimental items, Sergey Iakovlev for help with the corpus analysis, and all the Russian native speakers who participated in the experiments. This work was supported by the Research Funding program in the UT Arlington College of Liberal Arts and the Jerold A. Edmondson Research Endowment in Linguistics Grant. Earlier versions of this research were presented at the CUNY Conference on Human Sentence Processing, the St. Petersburg Winter Symposium on Experimental Studies of Speech & Language, and the Annual Conference of the German Linguistic Society. Finally, we are thankful for the thorough and helpful comments from Adrian Staub, Roger Levy, and an anonymous reviewer.
References

Bailyn, J. F. (2012). The syntax of Russian. Cambridge: Cambridge University Press.
Clifton, C., & Frazier, L. (1989). Comprehending sentences with long distance dependencies. In G. N. Carlson & M. K. Tanenhaus (Eds.), Linguistic structure in language processing (pp. 273-317). Dordrecht: Kluwer.
Clifton, C., Traxler, M. J., Mohamed, M. T., Williams, R. S., Morris, R. K., & Rayner, K. (2003). The use of thematic role information in parsing: Syntactic processing autonomy revisited. Journal of Memory and Language, 49, 317-334.
Cousineau, D. (2005). Confidence intervals in within-subject designs: A simpler solution to Loftus and Masson’s method. Tutorials in Quantitative Methods for Psychology, 1, 42-45.
Forster, K. I., & Forster, J. C. (2003). DMDX: A Windows display program with millisecond accuracy. Behavior Research Methods, Instruments, & Computers, 35, 116-124.
Forster, K. I., Guerrera, C., & Elliot, L. (2009). The maze task: Measuring forced incremental sentence processing time. Behavior Research Methods, 41, 163-171.
Fox, B. A., & Thompson, S. A. (1990). A discourse explanation of the grammar of relative clauses in English conversation. Language, 66, 297-316.
Gennari, S. P., & MacDonald, M. C. (2008). Semantic indeterminacy in object relative clauses. Journal of Memory and Language, 58, 161-187.
Gibson, E. (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition, 68, 1-76.
Gibson, E. (2000). The dependency locality theory: A distance-based theory of linguistic complexity. In A. Marantz, Y. Miyashita, & W. O’Neil (Eds.), Image, language, brain (pp. 95-126). Cambridge, MA: MIT Press.
Gordon, P. C., & Hendrick, R. (2005). Relativization, ergativity, and corpus frequency. Linguistic Inquiry, 36, 456-463.
Gordon, P. C., Hendrick, R., & Johnson, M. (2001). Memory interference during language processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1411-1423.
Gordon, P. C., Hendrick, R., & Johnson, M. (2004). Effects of noun phrase type on sentence complexity. Journal of Memory and Language, 51, 97-114.
Gordon, P. C., Hendrick, R., Johnson, M., & Lee, Y. (2006). Similarity-based interference during language comprehension: Evidence from eye tracking during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 1304-1321.
Gordon, P. C., Hendrick, R., & Levine, W. H. (2002). Memory-load interference in syntactic processing. Psychological Science, 13, 425-430.
Gordon, P. C., & Lowder, M. W. (2012). Complex sentence processing: A review of theoretical perspectives on the comprehension of relative clauses. Language and Linguistics Compass, 6, 403-415.
Grodner, D., & Gibson, E. (2005). Consequences of the serial nature of linguistic input for sentential complexity. Cognitive Science, 29, 261-290.
Hale, J. (2001). A probabilistic early parser as a psycholinguistic model. In Proceedings of the North American Chapter of the Association for Computational Linguistics (pp. 159-166). Pittsburgh, PA: Association for Computational Linguistics.
Hawkins, J. A. (1999). Processing complexity and filler-gap dependencies across grammars. Language, 75, 244-285.
Heim, I., & Kratzer, A. (1998). Semantics in generative grammar. Malden, MA: Blackwell.
Holmes, V. M., & O’Regan, J. K. (1981). Eye fixation patterns during the reading of relative-clause sentences. Journal of Verbal Learning and Verbal Behavior, 20, 417-430.
Hsiao, F., & Gibson, E. (2003). Processing relative clauses in Chinese. Cognition, 90, 3-27.
Johnson, M. L., Lowder, M. W., & Gordon, P. C. (2011). The sentence-composition effect: Processing of complex sentences depends on the configuration of common noun phrases versus unusual noun phrases. Journal of Experimental Psychology: General, 140, 707-724.
Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87, 329-354.
Just, M. A., Carpenter, P. A., & Woolley, J. D. (1982). Paradigms and processes in reading comprehension. Journal of Experimental Psychology: General, 111, 228-238.
Kennison, S. M., Sieck, J. P., & Briesch, K. A. (2003). Evidence for a late-occurring effect of phoneme repetition during silent reading. Journal of Psycholinguistic Research, 32, 297-312.
King, J., & Just, M. A. (1991). Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language, 30, 580-602.
Kovtunova, I. I. (1976). Sovremennyj russkij jazyk. Poryadok slov i aktual’noe členenie predloženija: Učeb. posobie dlja studentov ped. in-tov po special'nosti "Russki jaz. i lit." Moscow: Prosveščenie.
Krylova, O. A., & Khavronina, S. A. (1988). Word order in Russian. Moscow: Russky Yazyk.
Levy, R. (2008). Expectation-based syntactic comprehension. Cognition, 106, 1126-1177.
Levy, R., Fedorenko, E., & Gibson, E. (2013). The syntactic complexity of Russian relative clauses. Journal of Memory and Language, 69, 461-495.
Lewis, R. L., & Vasishth, S. (2005). An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29, 375-419.
Lin, C. C., & Bever, T. G. (2006). Subject preference in the processing of relative clauses in Chinese. In D. Baumer, D. Montero, & M. Scanlon (Eds.), Proceedings of the 25th West Coast Conference on Formal Linguistics (pp. 254-260). Somerville, MA: Cascadilla Proceedings Project.
MacDonald, M. C., & Christiansen, M. H. (2002). Reassessing working memory: Comment on Just and Carpenter (1992) and Waters and Caplan (1996). Psychological Review, 109, 35-54.
MacWhinney, B., & Pleh, C. (1998). The processing of restrictive relative clauses in Hungarian. Cognition, 29, 95-141.
Mak, W. M., Vonk, W., & Schriefers, H. (2002). The influence of animacy on relative clause processing. Journal of Memory and Language, 47, 50-68.
Mitchell, D. C., & Green, D. W. (1978). The effects of context and content on immediate processing in reading. The Quarterly Journal of Experimental Psychology, 30, 609-636.
Miyamoto, E. T., & Nakamura, M. (2003). Subject/object asymmetries in the processing of relative clauses in Japanese. In G. Garding & M. Tsujimura (Eds.), Proceedings of the 22nd West Coast Conference on Formal Linguistics (pp. 342-355). Somerville, MA: Cascadilla Press.
Nakamura, M., & Miyamoto, E. T. (2013). The object before subject bias and the processing of double-gap relative clauses in Japanese. Language and Cognitive Processes, 28, 303-334.
O’Grady, W. D. (1997). Syntactic development. Chicago, IL: University of Chicago Press.
Price, I. K., Witzel, N., & Witzel, J. (2015). Orthographic and phonological form interference during silent reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41, 1628-1647.
Rappaport, G. C. (1988). On the relationship between prosodic and syntactic properties of pronouns in the Slavic languages. In A. M. Schenker (Ed.), Linguistics: American Contributions to the Tenth International Congress of Slavists (pp. 301-327). Columbus, OH: Slavica.
Reali, F., & Christiansen, M. H. (2007). Processing of relative clauses is made easier by frequency of occurrence. Journal of Memory and Language, 57, 1-23.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372-422.
Rayner, K., Kambe, G., & Duffy, S. A. (2000). The effect of clause wrap-up on eye movements during reading. The Quarterly Journal of Experimental Psychology, 53, 1061-1080.
Rayner, K., Sereno, S. C., Morris, R. K., Schmauder, A. R., & Clifton, C. (1989). Eye movements and on-line language comprehension processes. Language and Cognitive Processes, 4, SI21-SI49.
Roland, D., Dick, F., & Elman, J. L. (2007). Frequency of basic English grammatical structures: A corpus analysis. Journal of Memory and Language, 57, 348-379.
Russian National Corpus. (2003-2012). Retrieved from http://www.ruscorpora.ru.
Svedova, N. (Ed.). (1980). Russkaja grammatika (Vol. 2). Moscow: Academy of Sciences, Nauka.
Staub, A. (2010). Eye movements and processing difficulty in object relative clauses. Cognition, 116, 71-86.
Staub, A., Dillon, B., & Clifton, C. (2017). The matrix verb as a source of comprehension difficulty in object relative sentences. Cognitive Science, 41, 1353-1376.
Townsend, D. J., & Bever, T. G. (2001). Sentence comprehension: The integration of habits and rules. Cambridge, MA: MIT Press.
Traxler, M. J., Morris, R. K., & Seely, R. E. (2002). Processing subject and object relative clauses: Evidence from eye movements. Journal of Memory and Language, 47, 69-90.
Traxler, M. J., Williams, R. S., Blozis, S. A., & Morris, R. K. (2005). Working memory, animacy, and verb class in the processing of relative clauses. Journal of Memory and Language, 53, 204-224.
Van Dyke, J. A., & Lewis, R. L. (2003). Distinguishing effects of structure and decay on attachment and repair: A cue-based parsing account of recovery from misanalyzed ambiguities. Journal of Memory and Language, 49, 285-316.
Van Dyke, J. A., & McElree, B. (2006). Retrieval interference in sentence comprehension. Journal of Memory and Language, 55, 157-166.
Vasishth, S. (2011). Integration and prediction in head-final structures. In H. Yamashita, Y. Hirose, & J. Packard (Eds.), Processing and producing head-final structures (pp. 349-367). Dordrecht: Springer.
Vasishth, S., Chen, Z., Li, Q., & Guo, G. (2013). Processing Chinese relative clauses: Evidence for the subject-relative advantage. PLoS ONE, 8, e77006.
Warren, T., & Gibson, E. (2002). The influence of referential processing on sentence complexity. Cognition, 85, 79-112.
Witzel, J., Cornelius, S., Witzel, N., Forster, K. I., & Forster, J. C. (2013). Testing the viability of webDMDX for masked priming experiments. The Mental Lexicon, 8, 421-449.
Witzel, N., Witzel, J., & Forster, K. (2012). Comparisons of online reading paradigms: Eye tracking, moving-window, and maze. Journal of Psycholinguistic Research, 41, 105-128.
Witzel, J., Witzel, N., & Nicol, J. (2012). Deeper than shallow: Evidence for structure-based parsing biases in second-language sentence processing. Applied Psycholinguistics, 33, 419-456.
Table 1. Sample Experimental Items

Word-order preference is given as Experiments 1, 3 // Experiment 2.

Condition: SRC
Embedded-clause word order: OV (non-canonical); dispreferred // preferred word order
Hozjajka, | kotoraja.NOM | posle progulki | starušku // nas.ACC | sil'no | rasstroila | novostjami, … |
Housewife, | who.NOM | after walk | old_lady // us.ACC | really | upset | with_news, … |
‘The housewife who after the walk really upset the old lady // us with the news (lay on the couch in the living room.)’

Condition: SRC-CC control
Embedded-clause word order: OVS (non-canonical); dispreferred // dispreferred word order
Hozjajka skazala, | čto | posle progulki | starušku // nas.ACC | sil'no | rasstroila | novostjami … |
Housewife said, | that | after walk | old_lady // us.ACC | really | upset | with_news … |
‘The housewife said that after the walk (the aunty) really upset the old lady // us with the news.’

Condition: ORC
Embedded-clause word order: SV (canonical); dispreferred // preferred word order
Hozjajka, | kotoruju.ACC | posle progulki | staruška // my.NOM | sil'no | rasstroila(-i) | novostjami, … |
Housewife, | whom.ACC | after walk | old_lady // we.NOM | really | upset | with_news, … |
‘The housewife whom after the walk the old lady // we really upset with the news (lay on the couch in the living room.)’

Condition: ORC-CC control
Embedded-clause word order: SVO (canonical); preferred // preferred word order
Hozjajka skazala, | čto | posle progulki | staruška // my.NOM | sil'no | rasstroila(-i) | novostjami … |
Housewife said, | that | after walk | old_lady // we.NOM | really | upset | with_news … |
‘The housewife said that after the walk the old lady // we really upset (the aunty) with the news.’
Table 2. Frequencies of Russian Full-NP and Pronominal SRCs and ORCs with Canonical and Non-Canonical Word Orders

RC word order              Full-NP RC   Pronominal RC
SRC canonical (VO)         581          35
SRC non-canonical (OV)     8            33
ORC canonical (SV)         58           116
ORC non-canonical (VS)     91           6
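
As a rough illustration of how the counts in Table 2 translate into the frequency profiles discussed in the text, the sketch below converts each count into a conditional proportion of word order given RC type and embedded NP type, and then into surprisal (negative log probability, in bits), the quantity that expectation-based accounts such as Hale (2001) and Levy (2008) link to processing difficulty. The counts are taken from Table 2; the names `TABLE2_COUNTS` and `word_order_profile` are illustrative only, and treating raw corpus proportions as estimates of comprehenders' expectations is a simplifying assumption rather than an analysis reported in this study.

```python
import math

# Corpus counts from Table 2, keyed by (RC type, embedded NP type).
TABLE2_COUNTS = {
    ("SRC", "full NP"): {"canonical (VO)": 581, "non-canonical (OV)": 8},
    ("ORC", "full NP"): {"canonical (SV)": 58, "non-canonical (VS)": 91},
    ("SRC", "pronoun"): {"canonical (VO)": 35, "non-canonical (OV)": 33},
    ("ORC", "pronoun"): {"canonical (SV)": 116, "non-canonical (VS)": 6},
}


def word_order_profile(counts):
    """Map each word order to (proportion, surprisal in bits) within one RC/NP type."""
    total = sum(counts.values())
    return {
        order: (n / total, -math.log2(n / total))
        for order, n in counts.items()
    }


if __name__ == "__main__":
    for (rc_type, np_type), counts in TABLE2_COUNTS.items():
        for order, (p, bits) in word_order_profile(counts).items():
            print(f"{rc_type} {np_type:8} {order:20} "
                  f"p = {p:.3f}  surprisal = {bits:.2f} bits")
```

Under these assumptions, the word orders tested in the full-NP experiments account for 8 of 589 SRCs (OV) and 58 of 149 ORCs (SV), whereas their pronominal counterparts account for 33 of 68 SRCs and 116 of 122 ORCs.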
Table 3. Frequencies of Russian CCs with Full NPs and Pronouns and Canonical and Non-Canonical Word Orders

CC subject                 Full NP   Full NP   Pronoun   Pronoun
CC object                  Full NP   Pronoun   Full NP   Pronoun
CC word order
CC canonical (SVO)         161†      26        78*       15
CC non-canonical (OVS)     15††      10**      0         0
CC non-canonical (OSV)     3         2         5         0
CC non-canonical (SOV)     5         12        6         10
CC non-canonical (VOS)     4         1         0         1

† corresponds to the word order and NP types in the ORC-CC controls in Experiments 1 and 3
†† corresponds to the word order and NP types in the SRC-CC controls in Experiments 1 and 3
* corresponds to the word order and NP types in the ORC-CC controls in Experiment 2
** corresponds to the word order and NP types in the SRC-CC controls in Experiment 2
Table 4. Sample Experimental Items, Experiment 1

Regions (from the second segment onward): Rel/Comp | Spill R1 | RC NP/CC NP1 | Spill R2 | RC/CC Verb | Spill R3 | MC Verb/CC NP2

(a) SRC. Embedded-clause word order: OV (non-canonical, dispreferred)
Hozjajka, | kotoraja.NOM | posle progulki | starušku.ACC | sil'no | rasstroila | novostjami, | legla…
Housewife, | who.NOM | after walk | old_lady.ACC | really | upset | with_news, | lay…
‘The housewife who after the walk really upset the old lady with the news lay (on the couch in the living room.)’

(b) SRC-CC control. Embedded-clause word order: OVS (non-canonical, dispreferred)
Hozjajka skazala, | čto | posle progulki | starušku.ACC | sil'no | rasstroila | novostjami | tetuška.NOM.
Housewife said, | that | after walk | old_lady.ACC | really | upset | with_news | aunty.NOM.
‘The housewife said that after the walk the aunty really upset the old lady with the news.’

(c) ORC. Embedded-clause word order: SV (canonical, dispreferred)
Hozjajka, | kotoruju.ACC | posle progulki | staruška.NOM | sil'no | rasstroila | novostjami, | legla…
Housewife, | whom.ACC | after walk | old_lady.NOM | really | upset | with_news, | lay…
‘The housewife whom after the walk the old lady really upset with the news lay (on the couch in the living room.)’

(d) ORC-CC control. Embedded-clause word order: SVO (canonical, preferred)
Hozjajka skazala, | čto | posle progulki | staruška.NOM | sil'no | rasstroila | novostjami | tetušku.ACC.
Housewife said, | that | after walk | old_lady.NOM | really | upset | with_news | aunty.ACC.
‘The housewife said that after the walk the old lady really upset the aunty with the news.’

Note. Rel -- Relativizer; Comp -- Complementizer; Spill R 1/ 2/ 3 -- Spillover Region 1/ 2/ 3; RC NP -- Noun Phrase in the RC; CC NP 1/ 2 -- Noun Phrase 1/ 2 in the CC; RC Verb -- Verb in the RC; CC Verb -- Verb in the CC; MC Verb -- Main-Clause Verb
Table 5. Mean Reading Times (by Subjects) for SRCs, ORCs, and their CC Controls by Condition and Region (with Standard Errors of the Mean for Repeated Measures in Parentheses), Experiment 1

Sample segments by region (Rel/Comp | Spill R1 | RC NP/CC NP1 | Spill R2 | RC/CC Verb | Spill R3):
SRC: … kotoraja.NOM | posle progulki | starušku.ACC | sil'no | rasstroila | novostjami, …
     … who.NOM | after walk | old_lady.ACC | really | upset | with_news, …
SRC-CC control: … čto | posle progulki | starušku.ACC | sil'no | rasstroila | novostjami …
     … that | after walk | old_lady.ACC | really | upset | with_news …
ORC: … kotoruju.ACC | posle progulki | staruška.NOM | sil'no | rasstroila | novostjami, …
     … whom.ACC | after walk | old_lady.NOM | really | upset | with_news, …
ORC-CC control: … čto | posle progulki | staruška.NOM | sil'no | rasstroila | novostjami …
     … that | after walk | old_lady.NOM | really | upset | with_news …

Condition         Rel/Comp   Spill R1   RC NP/CC NP1   Spill R2   RC/CC Verb   Spill R3
SRC               536 (11)   706 (12)   846 (27)       689 (14)   807 (19)     1044 (26)
SRC-CC control    486 (10)   725 (13)   839 (18)       685 (12)   699 (13)     840 (21)
ORC               531 (10)   705 (11)   804 (23)       657 (15)   783 (21)     1027 (22)
ORC-CC control    481 (7)    750 (12)   742 (18)       676 (10)   682 (13)     839 (23)
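
To make the comparison between each RC type and its CC control easier to read off Table 5, the following sketch subtracts the control mean from the RC mean in every region. The means are copied from Table 5; the names `TABLE5_MEANS` and `rc_minus_cc` are illustrative only. This is descriptive arithmetic on the condition means, offered under the assumption that a simple difference score is a useful summary, and it is not the inferential analysis reported for Experiment 1.

```python
# Mean reading times (ms) from Table 5, Experiment 1.
# Tuple order: SRC, SRC-CC control, ORC, ORC-CC control.
TABLE5_MEANS = {
    "Rel/Comp": (536, 486, 531, 481),
    "Spill R1": (706, 725, 705, 750),
    "RC NP/CC NP1": (846, 839, 804, 742),
    "Spill R2": (689, 685, 657, 676),
    "RC/CC Verb": (807, 699, 783, 682),
    "Spill R3": (1044, 840, 1027, 839),
}


def rc_minus_cc(means):
    """Return, per region, the RC mean minus its CC control mean (ms)."""
    differences = {}
    for region, (src, src_cc, orc, orc_cc) in means.items():
        differences[region] = {"SRC": src - src_cc, "ORC": orc - orc_cc}
    return differences


if __name__ == "__main__":
    for region, diff in rc_minus_cc(TABLE5_MEANS).items():
        print(f"{region:14} SRC - control = {diff['SRC']:+4d} ms   "
              f"ORC - control = {diff['ORC']:+4d} ms")
```

At the RC/CC verb, these differences are 108 ms for SRCs and 101 ms for ORCs, which is the numerical pattern behind the description of the verb-region cost in Experiment 1 as comparable for the two RC types.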
Table 6. Sample Experimental Items, Experiment 2

Regions (from the second segment onward): Rel/Comp | Spill R1 | RC NP/CC NP1 | Spill R2 | RC/CC Verb | Spill R3 | MC Verb/CC NP2

(a) SRC. Embedded-clause word order: OV (non-canonical, preferred)
Hozjajka, | kotoraja.NOM | posle progulki | nas.ACC | sil'no | rasstroila | novostjami, | legla…
Housewife, | who.NOM | after walk | us.ACC | really | upset | with_news, | lay…
‘The housewife who after the walk really upset us with the news lay (on the couch in the living room.)’

(b) SRC-CC control. Embedded-clause word order: OVS (non-canonical, dispreferred)
Hozjajka skazala, | čto | posle progulki | nas.ACC | sil'no | rasstroila | novostjami | tetuška.NOM.
Housewife said, | that | after walk | us.ACC | really | upset | with_news | aunty.NOM.
‘The housewife said that after the walk the aunty really upset us with the news.’

(c) ORC. Embedded-clause word order: SV (canonical, preferred)
Hozjajka, | kotoruju.ACC | posle progulki | my.NOM | sil'no | rasstroili | novostjami, | legla…
Housewife, | whom.ACC | after walk | we.NOM | really | upset | with_news, | lay…
‘The housewife whom after the walk we really upset with the news lay (on the couch in the living room.)’

(d) ORC-CC control. Embedded-clause word order: SVO (canonical, preferred)
Hozjajka skazala, | čto | posle progulki | my.NOM | sil'no | rasstroili | novostjami | tetušku.ACC.
Housewife said, | that | after walk | we.NOM | really | upset | with_news | aunty.ACC.
‘The housewife said that after the walk we really upset the aunty with the news.’

Note. Rel -- Relativizer; Comp -- Complementizer; Spill R 1/ 2/ 3 -- Spillover Region 1/ 2/ 3; RC NP -- Noun Phrase in the RC; CC NP 1/ 2 -- Noun Phrase 1/ 2 in the CC; RC Verb -- Verb in the RC; CC Verb -- Verb in the CC; MC Verb -- Main-Clause Verb
Table 7. Mean Reading Times (by Subjects) for SRCs, ORCs, and their CC Controls by Condition and Region (with Standard Errors of the Mean for Repeated Measures in Parentheses), Experiment 2

Sample wording by region
Condition | Rel/Comp | Spill R1 | RC NP/CC NP1 | Spill R2 | RC/CC Verb | Spill R3
SRC | … kotoraja.NOM (… who.NOM) | posle progulki (after walk) | nas.ACC (us.ACC) | sil'no (really) | rasstroila (upset) | novostjami, … (with_news, …)
SRC-CC control | … čto (… that) | posle progulki (after walk) | nas.ACC (us.ACC) | sil'no (really) | rasstroila (upset) | novostjami … (with_news …)
ORC | … kotoruju.ACC (… whom.ACC) | posle progulki (after walk) | my.NOM (we.NOM) | sil'no (really) | rasstroili (upset) | novostjami, … (with_news, …)
ORC-CC control | … čto (… that) | posle progulki (after walk) | my.NOM (we.NOM) | sil'no (really) | rasstroili (upset) | novostjami … (with_news …)

Mean reading times
Condition | Rel/Comp | Spill R1 | RC NP/CC NP1 | Spill R2 | RC/CC Verb | Spill R3
SRC | 562 (7) | 801 (12) | 555 (7) | 588 (7) | 725 (14) | 942 (18)
SRC-CC control | 507 (8) | 770 (14) | 577 (8) | 593 (5) | 699 (11) | 818 (15)
ORC | 585 (11) | 816 (14) | 557 (7) | 576 (6) | 732 (12) | 919 (15)
ORC-CC control | 519 (9) | 799 (11) | 528 (8) | 590 (6) | 739 (12) | 885 (20)
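The comparison that Table 7 invites, each RC condition against its CC control in the same region, can be sketched as a simple difference of condition means. The Python below is only a descriptive illustration, not the authors' analysis. It assumes that the four values reported for each region are ordered SRC, SRC-CC control, ORC, ORC-CC control; the names `TABLE7_MEANS` and `rc_minus_control` are hypothetical and introduced here for illustration.

```python
# Condition means (by subjects) per region, copied from Table 7.
# Assumption: each tuple is ordered SRC, SRC-CC control, ORC, ORC-CC control.
TABLE7_MEANS = {
    "Rel/Comp": (562, 507, 585, 519),
    "Spill R1": (801, 770, 816, 799),
    "RC NP/CC NP1": (555, 577, 557, 528),
    "Spill R2": (588, 593, 576, 590),
    "RC/CC Verb": (725, 699, 732, 739),
    "Spill R3": (942, 818, 919, 885),
}


def rc_minus_control(means):
    """Return {region: (SRC - SRC-CC control, ORC - ORC-CC control)}."""
    return {
        region: (src - src_cc, orc - orc_cc)
        for region, (src, src_cc, orc, orc_cc) in means.items()
    }


if __name__ == "__main__":
    # Positive values mean the RC condition was read more slowly than its control.
    for region, (d_src, d_orc) in rc_minus_control(TABLE7_MEANS).items():
        print(f"{region:14s} SRC - control: {d_src:+4d}   ORC - control: {d_orc:+4d}")
```

These differences describe the condition means only and are not a substitute for inferential statistics.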
Table 8. Means for the Eye-Tracking Measures (by Subjects) in SRCs, ORCs, and their CC Controls by Condition and Region (with Standard Errors of the Mean for Repeated Measures in Parentheses), Experiment 3

Sample wording by region
Condition | Rel/Comp | Spill R1 | RC NP/CC NP1 | Spill R2 | RC/CC Verb | Spill R3
SRC | … kotoraja.NOM (… who.NOM) | posle progulki (after walk) | starušku.ACC (old_lady.ACC) | sil'no (really) | rasstroila (upset) | novostjami, … (with_news, …)
SRC-CC control | … čto (… that) | posle progulki (after walk) | starušku.ACC (old_lady.ACC) | sil'no (really) | rasstroila (upset) | novostjami … (with_news …)
ORC | … kotoruju.ACC (… whom.ACC) | posle progulki (after walk) | staruška.NOM (old_lady.NOM) | sil'no (really) | rasstroila (upset) | novostjami, … (with_news, …)
ORC-CC control | … čto (… that) | posle progulki (after walk) | staruška.NOM (old_lady.NOM) | sil'no (really) | rasstroila (upset) | novostjami … (with_news …)

First fixation duration
Condition | Rel/Comp | Spill R1 | RC NP/CC NP1 | Spill R2 | RC/CC Verb | Spill R3
SRC | 204 (4) | 221 (3) | 254 (4) | 237 (4) | 247 (4) | 235 (4)
SRC-CC control | 189 (4) | 209 (3) | 242 (4) | 246 (4) | 253 (3) | 231 (3)
ORC | 206 (4) | 225 (3) | 246 (4) | 241 (4) | 247 (3) | 231 (4)
ORC-CC control | 187 (4) | 215 (3) | 251 (4) | 246 (4) | 259 (4) | 239 (5)

First-pass time
Condition | Rel/Comp | Spill R1 | RC NP/CC NP1 | Spill R2 | RC/CC Verb | Spill R3
SRC | 248 (7) | 524 (12) | 423 (10) | 310 (8) | 372 (9) | 536 (15)
SRC-CC control | 192 (5) | 539 (14) | 432 (12) | 341 (8) | 393 (10) | 448 (12)
ORC | 255 (5) | 505 (10) | 379 (9) | 330 (7) | 371 (7) | 516 (13)
ORC-CC control | 191 (6) | 533 (9) | 399 (9) | 327 (8) | 414 (14) | 476 (10)

Regression-path duration
Condition | Rel/Comp | Spill R1 | RC NP/CC NP1 | Spill R2 | RC/CC Verb | Spill R3
SRC | 303 (9) | 634 (21) | 598 (22) | 644 (27) | 498 (16) | 820 (29)
SRC-CC control | 234 (12) | 607 (16) | 533 (17) | 513 (20) | 559 (19) | 673 (26)
ORC | 338 (14) | 648 (12) | 474 (16) | 465 (22) | 430 (18) | 763 (40)
ORC-CC control | 231 (13) | 612 (13) | 505 (14) | 421 (15) | 490 (15) | 702 (27)

First-pass regression proportion
Condition | Rel/Comp | Spill R1 | RC NP/CC NP1 | Spill R2 | RC/CC Verb | Spill R3
SRC | .10 (.02) | .12 (.02) | .23 (.02) | .33 (.02) | .12 (.01) | .29 (.02)
SRC-CC control | .05 (.02) | .08 (.02) | .14 (.02) | .21 (.02) | .19 (.01) | .24 (.02)
ORC | .13 (.02) | .18 (.02) | .13 (.01) | .17 (.02) | .08 (.01) | .25 (.02)
ORC-CC control | .07 (.02) | .10 (.02) | .16 (.01) | .14 (.02) | .11 (.01) | .26 (.03)
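For readers less familiar with the four measures reported in Table 8, the following Python sketch shows how they are commonly computed for one region on one trial. It follows the conventional definitions from the eye-tracking literature, which may differ in detail from the exact criteria used in Experiment 3. The function name `region_measures`, the input format (a list of `(region_index, duration)` pairs in temporal order), and the example trial are hypothetical.

```python
def region_measures(fixations, target):
    """Compute first-pass measures for one region on one trial.

    fixations: list of (region_index, duration) pairs in temporal order,
               with regions numbered left to right.
    target:    index of the region of interest.
    Returns None if the region was skipped during first pass.
    """
    start = None
    for i, (region, _) in enumerate(fixations):
        if region > target:
            return None  # a later region was fixated first: no first pass
        if region == target:
            start = i
            break
    if start is None:
        return None  # the region was never fixated

    first_fixation = fixations[start][1]

    # First-pass time: sum fixations until the eyes leave the region
    # in either direction.
    j = start
    first_pass = 0
    while j < len(fixations) and fixations[j][0] == target:
        first_pass += fixations[j][1]
        j += 1

    # First-pass regression: first pass ended with a leftward exit.
    regressed = j < len(fixations) and fixations[j][0] < target

    # Regression-path duration: sum all fixations, including those on
    # earlier regions, until the eyes move past the region to the right.
    k = start
    regression_path = 0
    while k < len(fixations) and fixations[k][0] <= target:
        regression_path += fixations[k][1]
        k += 1

    return {
        "first_fixation": first_fixation,
        "first_pass": first_pass,
        "regression_path": regression_path,
        "first_pass_regression": regressed,
    }


# Hypothetical trial: two fixations on region 2, a regression to
# region 1, a return to region 2, then a move on to region 3.
example_trial = [(0, 200), (1, 220), (2, 250), (2, 180), (1, 210), (2, 190), (3, 230)]
example_result = region_measures(example_trial, 2)
```

Averaging `first_pass_regression` over trials gives a regression proportion of the kind reported in the table; the duration measures are averaged over trials in which the region was fixated during first pass.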
Table 9. Results of the Statistical Analyses for the Eye-Tracking Measures by Region, Experiment 3
Sentence Type (SRC + CC control vs. ORC + CC control): F1 (1,28) | F2 (1,44)
First fixation duration