A Cognitive Sociolinguistic model of morphosyntactic alternations: A case study of subject pronoun expression in Cuban Spanish

ABSTRACT. Earlier work on existential agreement variation in British English and Caribbean Spanish has made a convincing case for the hypothesis that existential agreement variation is constrained by three domain-general cognitive constraints on language (production) that are assumed in Cognitive Linguistics: markedness of coding, statistical preemption, and structural priming. A corollary of this analysis is that the same constraints should also be able to account for the behavior of other morphosyntactic alternations. To explore this hypothesis, I perform a case study of a well-known but notoriously hard-to-model alternation: subject pronoun expression in Cuban Spanish. I propose that the variation between overt and omitted (tacit) subject pronouns amounts to a competition between two abstract constructions: and , which is constrained by the three domain-general cognitive constraints. The database consists of 7,849 conjugated verbs with human-reference subjects that were drawn from 24 sociolinguistic interviews with native speakers of Cuban Spanish residing in Havana, Cuba. The results of a mixed-effects logistic regression suggest that speakers prefer for conceptually more prominent subjects, for verb forms that are entrenched in this construction, and when they have just used or processed this construction. As this pattern coincides completely with the predictions that follow from markedness of coding, statistical preemption, and structural priming, the paper concludes that morphosyntactic variation is constrained by these domain-general cognitive constraints.

KEYWORDS: Cognitive Sociolinguistics, cognitive constraints, variationist linguistics, variation and linguistic theory, Spanish subject pronoun expression

1. Introduction

In recent years, the usage-based approach to language (e.g., Langacker, 1990: Chap.
10) has meant that cognitive linguists have increasingly moved away from introspective methods in favor of corpus investigation and experimentation. Because these data sources inevitably confront the researcher with socially, regionally, and stylistically conditioned linguistic variation (e.g., Geeraerts, 2005; Geeraerts & Kristiansen, 2015), this shift of focus has sparked interest in language-internal variation and the sociocultural aspects of that variation. In this context, Cognitive Sociolinguistics emerged, which proposes that a complete understanding of language can only be obtained when the social and cultural factors shaping usage events are considered together with the cognitive ones (Geeraerts & Kristiansen, 2015:366-371; Pütz, Robinson, & Reif, 2012).
Because of its general orientation, one would expect Cognitive Sociolinguistics to inspire the enthusiasm of variationist linguistics, including therein both variationist sociolinguistics in the tradition of Weinreich et al. (1968) and Probabilistic Grammar, the fairly recent school of thought initiated by Bresnan et al. (2007). Yet, so far, in spite of constituting a booming research endeavor (as is shown by the multiple edited volumes and special issues that have been devoted to the topic over the past few years; e.g., Geeraerts, Kristiansen, & Peirsman, 2010; Kristiansen & Dirven, 2008; Pütz, Robinson, & Reif, 2012,2014), Cognitive Sociolinguistics remains mostly an enterprise that is practiced by cognitive linguists in the interest of developing Cognitive Linguistics, with results and methods that – innovative as they may be in this context – have little to no impact on the broader community of variationists (Geeraerts & Kristiansen, 2015:366-371; Pütz, Robinson, & Reif, 2012). While some may consider this situation a natural and unproblematic consequence of the different orientations and sensitivities of cognitive sociolinguists and the larger community of variationists, I follow Geeraerts & Kristiansen (2015:378) in considering that, to be successful, Cognitive Sociolinguistics “will have to interact intensively with existing variationist linguistics, and defend the specific contribution of Cognitive (Socio)linguistics in that context.” Therefore, in this paper, I use a case study of subject pronoun expression in Cuban Spanish (see example 1) to provide additional empirical support for one such contribution, namely, the hypothesis that probabilistic patterns in morphosyntactic variation reflect the joint action of three domain-general cognitive constraints on language (production) that are assumed in Cognitive Linguistics: markedness of coding, statistical preemption, and structural priming.
(1) Entonces ella me malcriaba mucho y Ø me hacía cosas así (LH07M11/284-285).1
‘So she spoiled me a lot and [she] made me stuff like that.’

The remainder of the paper is structured as follows. In section 2, I present the theoretical model of language variation that will be the main focus of this article. In section 3, I review a selection of the rich literature on subject pronoun expression in Spanish. Section 4 applies the theoretical model to subject pronoun expression and presents the predictors that will be examined in section 6. Section 5 is concerned with the methods of the case study. Section 7 offers some concluding discussion.

2. A Cognitive Sociolinguistic model of morphosyntactic variation

In earlier work on existential agreement variation in Caribbean Spanish and British English (AUTHOR, 2014, 2015, 2017; AUTHOR & CO-AUTHOR, under review), I proposed that the spreading activation language production model that is assumed in Cognitive Linguistics (e.g., Langacker, 2007, 2008; Hudson, 2010) allows for a psychologically plausible model of the constraints that govern morphosyntactic variation. This model proposes that language production begins with speakers forming a very rich conceptualization (Langacker, 2008: 31-34). As the conceptualization takes form, domain-general categorization processes compare it to the conceptual import of constructions. In most cases, this rough first pass activates multiple constructions to the degree that they match the conceptualization. These start competing for further activation, while also feeding back into the way the conceptualization is structured (e.g., Dell, 1986; Langacker, 2007:421, 2008: 228-229). Eventually, one
1
The codes at the end of the examples identify the source of each example as follows: LH: Havana; 07: interview 7; M: female speaker; 1: no university education; 1: 21-35 years old. The numbers after the slash indicate the position of the conjugated verbs in the interview transcript, in this case the 284th and the 285th conjugated verbs.
construction reaches the highest level of activation and becomes selected to categorize the conceptualization (Langacker, 2007:421, 2008: 228-229). Of course, given a particular conception, not all constructions will have equal probability of serving as a target for categorization. Since Cognitive Linguistics claims that speakers use domain-general cognitive abilities to retrieve constructions from the network, it seems reasonable to assume that domain-general cognitive constraints will also condition the probability of activation of constructions. In this regard, three such factors have been mentioned in the Cognitive Linguistics literature (Langacker, 2010: 93): markedness of coding (Langacker, 1991: 298), statistical preemption (Goldberg, 2006:94, 2011), and structural priming (Goldberg, 2006:120-125). Regarding the first of these constraints, the notion of spreading activation entails that the better the conceptualization matches the conceptual import associated with the construction, the more the representation of the construction will become activated. In the domain of lexical choice, Geeraerts (in press:16) argues that “an expression will be used more often for naming a particular referent when that referent is a member of the prototypical core of that expression’s range of application”. In syntax, similar effects have been observed. One aspect of this is that speakers, when confronted with multiple alternatives to encode a particular conceptualization of an event (e.g., active vs. passive), tend to use the variant that attributes the right amount of formal prominence to conceptually prominent participants (e.g., Myachykov & Tomlin, 2015). For instance, the vast majority of the speakers of English will prefer example (2a) (19,900 hits on Google) over example (2b) (10 hits on Google).
This is a natural corollary of Langacker’s (1991:312) schematic definition of subject as the ‘primary figure’ of the clause, as example (2a) encodes the entity that draws most attention (the pedestrian, which is both definite and human) with the grammatical function that signals it as such, leading to an optimal correspondence between conceptualization and form. In Cognitive
Linguistics, this prototype effect is called ‘markedness of coding’; ‘unmarked coding’, referring to a close correspondence between construction and conceptualization, is preferred (Langacker, 1991:298).

(2) a. The pedestrian was hit by a car.
b. A car hit the pedestrian.

A second constraint that influences a representation’s level of activation is statistical preemption. This notion refers to the fact that when the representations of lexical items and abstract constructions are frequently activated together, the compositional expression becomes stored as a single node in the network; this is called ‘entrenchment’ (Bybee, 2001: Chap. 5). In turn, because this entrenched expression is more detailed and can be activated faster, it is “preferentially produced over items that are licensed but are represented more abstractly, as long as the items share the same semantic and pragmatic constraints” (Goldberg, 2006: 94). This general cognitive constraint has been proposed as a way to explain why speakers do not overgeneralize from input by producing, for example, *stealer instead of thief or *goed instead of went (Goldberg, 2006: Chap. 5) and, more generally, why speakers prefer to use grammatical constructions in the ways in which they have predominantly observed them (as is shown by the contrast between examples 3a-b), whereas, in the absence of such experiences, they are able to accept and produce novel uses of verbs (as in example 4; e.g., Goldberg, 2011; Robenalt & Goldberg, 2015).

(3) a. ?She explained me the story (taken from Goldberg, 2006: 96).2
b. She explained the story to me.
(4) They coughed the breadcrumbs off the table.

2 Using introspection, Goldberg (2006:96) argues that the ditransitive variant in (3a) is less natural and less acceptable than its prepositional alternative (3b). This is supported by usage data: ditransitive uses of to explain produce no results in the Corpus of Contemporary American English (Davies, 2008-), whereas the corpus provides 693 prepositional dative uses of this verb.
Thirdly, language users tend to pick up and recycle (unintentionally and unconsciously) construction patterns that they have used or heard before, without necessarily repeating the specific words that appear in these structures (e.g., Pickering & Ferreira, 2008; Szmrecsanyi, 2006). Sometimes referred to as ‘persistence’ or ‘perseveration’ in the variationist literature (e.g., Cameron & Flores-Ferrán, 2004; Szmrecsanyi, 2006), this tendency is called ‘structural priming’ in the Cognitive Linguistic and psycholinguistic literature. Psycholinguistic research into structural priming has revealed that the phenomenon can be accounted for as a residual activation effect: once a particular representation has been visited, it remains more activated than others for a period of time, giving it a head start over its competitors. At the same time, structural priming also appears to be a mechanism of implicit learning, which permanently adapts the ease of activation of constructions to observed usage patterns (e.g., Goldberg, 2006: 120-125; Pickering & Ferreira, 2008: 447).

In earlier work, I have shown that these constraints offer a principled way of predicting which contextual features will constrain existential agreement variation in British English (e.g., there was/were problems) and Caribbean Spanish (e.g., había/habían problemas), and in which direction. However, since Cognitive Linguistics includes a ‘generalization commitment’ (Lakoff, 1990), this analysis can only be accepted provided it applies to other phenomena as well. Therefore, the purpose of this article is to examine whether the cognitive constraints outlined above also generate correct predictions for a different type of alternation, namely subject pronoun expression in Spanish. Let us now review the literature on this phenomenon.
3. Subject pronoun expression in Spanish

It is well known that Spanish has non-obligatory subject personal pronouns (SPPs), so that the attested example (5a) and its adapted version (5b) are both grammatical and have the same referential value.

(5) a. Ø no recuerdo los nombres (LH01H22/158)
b. Yo no recuerdo los nombres.
‘Ø/I don’t recall their names.’

As a matter of fact, when compared to languages such as English, subject personal pronouns have a very limited range of application in Spanish, as they generally do not occur for nonhuman or inanimate referents (an exception to this is the neuter pronoun ello ‘it’, which behaves as a demonstrative) and only rarely refer to third-person plural generic referents (Shin & Otheguy, 2005). As a result of this limited range of application, in all dialects of Spanish, subject personal pronouns remain implicit for the vast majority of subjects (e.g., Carvalho, Orozco, & Shin, 2015; Flores-Ferrán, 2007). Since the default situation thus consists in the absence of subject pronouns, earlier variationist research has mainly tried to establish which discourse-pragmatic and formal factors contribute to the expression of SPPs. After an initial exploration of multiple conditioning factors (e.g., Bayley & Pease-Álvarez, 1997; Cameron, 1993; Cameron & Flores-Ferrán, 2004; Enríquez, 1984) and a productive debate, inspired by Kiparsky (1982), about the role of SPPs in compensating for the partial loss of verbal inflections in Caribbean Spanish (Cameron, 1993; Hochberg, 1986), current research has settled on a ‘gold standard’ model, following a large-scale study performed in New York City by Ricardo Otheguy, Ana Celia Zentella, and their associates (e.g., Shin & Otheguy, 2005; Otheguy & Zentella, 2012; Otheguy et al., 2007; Erker & Guy, 2012). This model includes
the following predictors: subject reference, grammatical person and number, absence/presence of reflexive pronouns, semantic class of verb, structural priming, tense-aspect-mood, and type of clause. Let us now review the results that have been obtained for each of these.

3.1 Subject reference

Regarding the first of these contextual features, subject reference, it has been found that speakers use SPPs more often when the reference of the subject changes with respect to the subject of the previous finite verb (e.g., Cameron, 1993: 315; Erker & Guy, 2012: 541; Flores-Ferrán, 2007: 632; Lastra & Martín-Butragueño, 2015:43; Ortiz-López, 2008, 2011; Otheguy & Zentella, 2012: Chap. 8), especially when the subject of that previous clause has human reference (Travis & Torres-Cacoullos, 2012). In turn, speakers are less likely to use SPPs when the subject is coreferential with the subject of the foregoing clause (Cameron, 1993: 315) or with an object or the object of a preposition in the foregoing clause (AUTHOR, 2011: 204) and, more generally, when the distance to its last mention is smaller (Travis & Torres-Cacoullos, 2012: 728, note 12) or its set members have been mentioned before (e.g., me… you… > we; Cameron, 1997: 37-39). These patterns conform to the expectations set forward by cognitive-functional accounts of reference (e.g., Ariel, 2001; Givón, 1983; Gundel, Hedberg, & Zacharski, 1993), as they show that less predictable, harder-to-process referents are coded more elaborately.

3.2 Grammatical person and number

As for grammatical person and number, earlier research has found that first- and second-person singular pronouns are more likely to be expressed than third-person singular pronouns (e.g., Alfaraz, 2015:8-10; Otheguy & Zentella, 2012: Chap. 8). This could be due to the fact that yo ‘I’ and tú ‘you-2 singular informal’ refer to the discourse participants, for which they may fulfill functions unavailable to third-person singular SPPs, such as the disambiguation of
epistemic parentheticals (Aijón Oliva & Serrano, 2010; Davidson, 1996), turn-taking (Davidson, 1996; Travis & Torres-Cacoullos, 2012), the hedging of opinions to save the face of the hearer (Stewart, 2003), the signaling of a subjective stance and personal involvement (Davidson, 1996; Serrano, 2014; Stewart, 2003), or the triggering of speech-act readings of certain verbs (Davidson, 1996). However, many of these functions only seem to apply to verbs of speaking and thinking (e.g., Posio, 2011:782; Serrano, 2014:337), so it remains unclear why first- and second-person singular SPPs are more frequent across the board. Additionally, when singular and plural pronouns are contrasted, earlier research reveals that the former are more likely to be expressed than the latter (e.g., AUTHOR, 2011:199-200; Lastra & Martín-Butragueño, 2015:43; Orozco, 2015:24-25). According to Cameron (1997), this pattern may be attributed to the effects of referential continuity, as his data indicate that the majority of plural subjects refer to sets of human referents whose members have recently been introduced. As a result, in the aggregate, plural referents are more predictable and accessible than singular referents, so that they are less likely to be expressed as pronominal subjects.

3.3 Absence/presence of reflexive pronouns

With regard to absence/presence of reflexive pronouns, past investigations have revealed that speakers are less likely to insert SPPs when a reflexive pronoun is present (e.g., Bayley & Pease-Álvarez, 1997; Otheguy & Zentella, 2012: Chap. 8; Shin, 2014:311). According to Bayley & Pease-Álvarez (1997), the effects of this constraint can be explained by the fact that there is less potential for ambiguity when a reflexive pronoun is present, as it already signals the grammatical person of the verb.
3.4 Semantic type of verb

Concerning the semantic type of verb, it has been found that verbs that refer to external actions (e.g., hacer ‘to do’) disfavor SPP presence, whereas verbs denoting speech acts (e.g., decir ‘to say’), states (e.g., estar ‘to-be-in-a-place’), or psychological activities (e.g., creer ‘to think’) typically favor their presence (e.g., AUTHOR, 2011:205; Lastra & Martín-Butragueño, 2015:43; Orozco, 2015:24-25; Otheguy & Zentella, 2012: Chap. 8). According to Posio (2011:798-797), this pattern suggests that SPPs co-occur more often with verb types that denote events of low transitivity, which favor the focusing of attention on the subject, as opposed to verbs of high transitivity, which foreground the event and its effects on the patient argument (e.g., Hopper & Thompson, 1980; Thompson & Hopper, 2001). However, recent work shows that the greater use of SPPs with these verb types could also be due to a few frequent verb forms that co-occur much more often with an SPP than other verbs of the same class (e.g., Erker & Guy, 2012; Orozco, 2015:24-25; Posio, 2015:59).

3.5 Structural priming

As for structural priming, SPPs also occur more often when the previous human-reference subject is realized as a subject pronoun (e.g., Cameron & Flores-Ferrán, 2004; Shin, 2014:306; Travis & Torres-Cacoullos, 2012:726). In the psycholinguistic and Cognitive Linguistic literature, priming has often been used to probe mental representations (e.g., Goldberg, 2006: 120-125; Pickering & Ferreira, 2008). The argument is that, if speakers can be primed into using a particular construction (e.g., Pat was hit by a car) by exposing them to a construction of the same type that involves different lexical items (e.g., The cat was struck by lightning), the two expressions involve the same abstract mental representation.
Therefore, the priming results that emerge from the literature support the view that the presence/absence of SPPs involves a competition between two distinct cognitive representations (Cameron & Flores-Ferrán, 2004). However, as will become evident below, there is no consensus in the literature as to the nature of these representations.

3.6 Tense, aspect, and mood

Regarding tense, aspect, and mood, Cameron (1993: 316-319) shows that the indicative imperfect, the conditional, and the subjunctive tenses co-occur more often with SPPs than compound tenses, the indicative present, and the indicative future, which, in turn, occur more frequently with subject pronouns than the preterit. Because the first group of tenses has homonymous first- and third-person singular forms (e.g., cantaba can mean both ‘I sang’ and ‘s/he sang’), Cameron (1993) interprets this pattern as indicating that speakers use SPPs to disambiguate the reference of the subjects of potentially ambiguous tense forms. However, while these results have been replicated in many studies (e.g., Lastra & Martín-Butragueño, 2015:43; Shin, 2014:319-322), other studies do not consistently find that speakers use SPPs more often with these forms (e.g., Bayley et al., 2013: 27-28; Erker & Guy, 2012: 540; Silva-Corvalán, 2001: 159). In addition, a different analysis has been proposed by Silva-Corvalán (2001: 161-162), who argues that the tense pattern uncovered by Cameron (1993) is due to the interaction between the semantics of the verb tense and the attention-focusing function of SPPs in discourse (see also Posio, 2011). That is, the imperfect, the conditional, and the subjunctive are mainly used to describe the settings of events, which favors the focusing of attention on the subject with SPPs. In contrast, the preterit is used to retell past punctual actions, which favors the focusing of attention on the predicate by omitting pronouns. In turn, the simple present can be used for both actions and settings, which explains why it co-occurs more often with pronominal subjects than the preterit, but less frequently than the other tenses.
To test this analysis empirically, Shin (2014) selects imperfect- and preterit-tense third-person singular
verbs from the narrative sections of her sample and codes these for their temporal sequentiality. In narrative analysis (e.g., Hopper & Thompson, 1980), it is generally considered that temporally sequential events form the foreground of the narrative (e.g., I walked in, grabbed myself a coke, when I saw…), whereas verbs that are not temporally sequential constitute the narrative’s background (e.g., …that my wife was already home). Contrary to what is predicted by Silva-Corvalán’s (2001) analysis, the results suggest that speakers use SPPs more often with foregrounded events in either of these two tenses. Other investigations have equally reported tense patterns that do not confirm this analysis (e.g., AUTHOR, 2011: 202; Erker & Guy, 2012: 540; Travis & Torres-Cacoullos, 2012: 734).

3.7 Type of clause

Finally, earlier work (e.g., Orozco, 2015:22; Shin, 2014:311) has established that SPPs are more frequent in independent main clauses than in subordinate and coordinate clauses. These results could reflect an effect of referential distance, as the subjects of coordinate and subordinate clauses tend to refer to an entity that has been mentioned in the main clause. However, various other patterns have also been observed (e.g., Otheguy & Zentella, 2012:190).

In sum, this section has shown that earlier work on subject pronoun expression in Spanish has found recurring patterns of covariation with contextual features. However, after more than thirty years of intensive inquiry, only limited success has been achieved in accounting for those patterns. This is because earlier work has attempted to interpret correlation patterns in the light of highly specific discourse functions of SPPs, which only apply to a subset of pronouns or verb types and are not sufficiently abstract to generalize to the full range of pronoun data or, even less so, beyond subject pronoun expression.
Two exceptions to this pattern emerge: the hypothesis that the use of SPPs corresponds to either a
disambiguating strategy (see the discussion on tense, aspect, and mood) or the distribution of attentional resources in discourse (see the discussion on subject reference, semantic type of verb, and tense, aspect, and mood). As for the former, research in sociolinguistics (e.g., Poplack, 1984; Labov, 1994: Chap. 19-20) and usage-based linguistics (e.g., Bybee, 2008) has shown that referential ambiguity hardly ever arises in discourse, even when sound changes wipe out an entire inflectional paradigm, because speakers draw on multiple contextual and situational clues, as well as their world knowledge, to interpret utterances. Therefore, this hypothesis will not be pursued any further here. In contrast, the latter will constitute one of the key elements of the model that will be presented in the next section.

4. Cognitive constraints at work: Subject pronoun expression in Cuban Spanish

4.1 The nature of subject pronoun expression

Before we can begin to explore the predictions that follow from the cognitive constraints that were introduced in section 2, we should first address how the variable expression of subject pronouns should be conceptualized in Cognitive Linguistics, in particular, Cognitive Construction Grammar (Goldberg, 1995, 2006). In this regard, many scholars working on pronominal variation in Spanish (implicitly) assume some version of the mainstream generative syntax position that clauses both with and without SPPs have a pronominal subject (Cameron, 1993; Ortiz-López, 2008; Toribio, 2000; Otheguy & Zentella, 2012). From this perspective, the variation amounts to a competition between overt pronouns and a phonologically null pronoun, which satisfies the requirement that each clause has a filled subject slot (e.g., Chomsky, 1995, 2001). Travis & Torres-Cacoullos (2012) translate this characterization of the variation into an informal construction grammar framework.
Particularly, these authors argue that subject pronoun expression consists in the variable realization of the subject pronoun of an invariant
construction, in which the parentheses indicate variable realization. However, this approach has some untenable implications. Specifically, if the variation indeed boiled down to the variable realization of the SPP of such an invariant construction, we would expect the expression of the pronominal subject to be the default situation, from which speakers deviate only under certain discourse conditions (e.g., Goldberg, 2005). Since Spanish displays exactly the opposite pattern (i.e., the default situation is the absence of a pronoun and SPPs only occur restrictedly), this hypothesis cannot be entertained any further. Therefore, I propose the working hypothesis that subject pronoun expression amounts to a competition between two distinct, abstract, high-ranking clause-structure constructions: and . These constructions can be considered to be near-synonyms, which are only minimally distinct from one another. Building on earlier work, I will assume that the variants have their own social and stylistic values (cf. Orozco, 2015; Serrano, 2014) and that the construction attributes more cognitive prominence to the subject argument, as this construction encodes this participant explicitly (cf. Posio, 2011; Silva-Corvalán, 2001).

4.2 Hypotheses

Assuming this working hypothesis, the cognitive constraints presented in section 2 allow for a series of testable predictions about the behavior of subject pronoun expression in naturalistic discourse. Since the working hypothesis proposes a slight semantic difference between the two variants, markedness of coding leads to hypothesis 1:

Hypothesis 1, Markedness of coding: Speakers will use more often with conceptually more prominent subjects (cf. Posio, 2011).

While reviewing the literature, I have pointed out that the results for lexical type of verb reveal evidence that some verb forms appear much more frequently than others with an SPP.
Indeed, assuming the competition between and , statistical preemption predicts that certain verb forms will be entrenched in one variant or the other, disfavoring the use of the opposite variant for conceptualizations that can be expressed with the entrenched form (e.g., Goldberg, 2006:94, 2011; Robenalt & Goldberg, 2015). This is captured by the second hypothesis.

Hypothesis 2, Statistical preemption: The strongest mental representations of verb forms can be ranked on a continuum ranging from prefab to prefab. On the prefab ends, markedness of coding and structural priming will only make a minor contribution to explaining the variation, as speakers will generally favor either or .

The third constraint, structural priming, leads to the following hypothesis:

Hypothesis 3, Structural priming: Using/processing will incite speakers to use in the following variable context, regardless of variations in grammatical person and number, tense-aspect-mood, and verb.

Of course, these hypotheses remain rather abstract. To be able to test them empirically, they need to be made more concrete and specific. These operationalizations will be the topic of the next section.

4.3 Operationalization

4.3.1 Markedness of coding. Hypothesis 1 claims that more prominent subjects will preferentially be encoded with . Following a long tradition in Cognitive Linguistics (e.g., Langacker, 1991; Myachykov & Tomlin, 2015; Talmy, 2007), I will assume that participants are prominent when they attract the speaker’s selective attention. According to Langacker (1991:306-308), this correlates with a number of factors pertaining to the
referent itself, the first of which is empathy: the more likely an entity is to attract the speaker’s empathy, the more likely it will be conceptually prominent. Therefore, the data were coded for the hierarchy provided in (3). For this predictor, hypothesis 1 suggests that subjects that refer to the speaker and the hearer will favor more than those that refer to others.

(3) Speaker > hearer > other (adapted from Langacker, 1991: 306).
A second factor is semantic role: the more agentive an entity is perceived to be, the greater the odds that it will attract the speaker’s attention (Myachykov & Tomlin, 2015).3 To test this prediction, the data were coded for Lakoff’s (1977) agentivity features volitionality and referentiality.4 For these two features, the specific prediction that follows from markedness of coding is that non-volitional, non-referential subjects will be less likely to be encoded with . Additionally, the literature suggests that highly transitive clauses focus attention on the event and its effects on the patient (e.g., Hopper & Thompson, 1980). Therefore, markedness of coding leads to the prediction that clauses that encode highly transitive events will disfavor subject pronoun expression, as was already argued by Posio (2011) for first- and second-person singular verbs. To test this prediction, I coded the data for Hopper & Thompson’s (1980) features individuation of the object (coded here as animacy), aspect, and kinesis, which model characteristics of the event.5 For aspect, Hopper & Thompson’s (1980) telic/atelic binary classification was

3 While agentive subjects are characteristic of highly transitive clauses (Hopper & Thompson, 1980; Thompson & Hopper, 2001; Lakoff, 1977), I believe that within a prototype approach to transitivity it is reasonable to distinguish between three types of transitivity features: features that model the agenthood of the subject argument (volitionality, determination, control, referentiality, responsibility, etc.), which favor the focusing of attention on the agent (Langacker, 1991: 294; Myachykov & Tomlin, 2015); features that model aspects of the event (punctuality, kinesis, aspect, absence/presence of an object), which favor the focusing of attention on the event and its effects on the patient (Hopper & Thompson, 1980); and features that model characteristics of the proposition (mode, affirmation), which correlate only weakly with transitivity (Thompson & Hopper, 2001:36).

4 Other agentivity features such as control, responsibility, and perceptibility were not retained because they were collinear with volitionality.

5 Punctuality was not retained because it was collinear with kinesis.
16
expanded with a category for continuous aspect. This was motivated by the fact that this type of aspect presents the event in its course, focusing even more attention on it. Finally, a subject can be conceptually more prominent than others because of its discourse status. Particularly, subjects that refer to well-established, continuous topics may be hypothesized to draw relatively less attention than those that refer to newly (re)introduced, disruptive topics (cf. e.g., Givón, 1983: 18; Langacker, 1991: 314). Therefore, hypothesis 1 predicts that the latter will more likely be encoded with . To test this prediction, I coded the data for Givon’s (1983) measure of referential distance (the distance in clauses to the last mention of the subject) and the referential continuity of the subject. For these predictors, markedness of coding predicts that will occur more often with subjects that refer to entities that have been mentioned further away and subjects that refer to a referent different from that of the previous subject. 4.3.2 Statistical preemption. Hypothesis 2 claims that with certain verb forms speakers will preferentially use either a -based or a -based prefab. To test this hypothesis without incurring in circularity I performed ‘distinctive collexeme analyses’ (e.g., Stefanowitsch & Gries, 2005) on frequency data culled from the twenty-million-words twentieth-century section of Corpus del español (Davies, 2002-). Applied to pronominal variation, this type of analysis consists in calculating the positive/negative base-ten logarithm of a p-value obtained with a Fisher-Yates Exact test for Table 1, depending on whether or not the observed frequency of Cell A exceeds its expected frequency. The further the resulting ‘collostruction strength’ deviates from zero, the stronger the association between the verb form and either (positive strengths) or (negative strengths; Levshina, 2015:232,242-243).
Table 1: Collocations table
Cell A: Corpus del español frequency of the verb form with its corresponding SPP (e.g., yo creo)
Cell B: Corpus del español frequency of all other tokens of (e.g., él cree, yo bailo, ella duerme)
Cell C: Corpus del español frequency of the verb form outside of (e.g., creo)
Cell D: Corpus del español frequency of all other verb forms outside of (e.g., trabajo, dice, corre)
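The signed collostruction strength described for Table 1 can be illustrated with a short Python sketch. It is not the script used for this paper, whose analyses were run in R. The function names and the toy frequencies are assumptions made purely for illustration, and the two-sided p-value follows the common convention of summing the probabilities of all tables that are no more probable than the observed one.

```python
import math


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher-Yates exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def log_prob(k):
        # Hypergeometric log-probability of k in Cell A, margins held fixed.
        return _log_comb(col1, k) + _log_comb(n - col1, row1 - k) - _log_comb(n, row1)

    log_observed = log_prob(a)
    p = 0.0
    for k in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        lp = log_prob(k)
        # Small tolerance so that ties with the observed table are included.
        if lp <= log_observed + 1e-7:
            p += math.exp(lp)
    return min(p, 1.0)


def collostruction_strength(a, b, c, d):
    """Signed |log10(p)|: positive if Cell A exceeds its expected frequency."""
    n = a + b + c + d
    expected_a = (a + b) * (a + c) / n
    p = max(fisher_exact_p(a, b, c, d), 1e-300)  # guard against log10(0)
    magnitude = -math.log10(p)
    return magnitude if a > expected_a else -magnitude


# Hypothetical toy counts for Cells A-D (not taken from Corpus del español).
attracted = collostruction_strength(30, 10, 10, 30)
repelled = collostruction_strength(10, 30, 30, 10)
print(attracted > 0, repelled < 0)
```

With real corpus counts the cell frequencies are far larger; the log-gamma formulation keeps the individual probabilities numerically stable, although looping over every possible table becomes slow for very large margins.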
4.3.3 Structural priming. Finally, to examine priming effects, all the tokens were coded for the last variant that was used by either the speaker (production-to-production priming) or the interviewer (comprehension-to-production priming), the constructional variant that appears in this earlier mention, the lexical overlap with the case at hand (i.e., whether or not the speaker repeated the same verb form), and the distance (in conjugated verbs) between the prime and this target clause. In this regard, earlier research has shown that priming effects are long-lived, persisting typically for ten or more clauses (e.g., Pickering & Ferreira, 2008). However, in natural discourse, multiple conflicting primes are likely to occur in a ten-clause stretch and the greatest density of primes will occur in adjacent clauses. Therefore, for practical, rather than theoretical, reasons, the maximum distance for priming effects was set to five conjugated verbs.

5. Methods
To test these hypotheses, I perform a corpus analysis of subject pronoun expression in Cuban Spanish. The data were collected in June 2011 in Havana, Cuba to analyze the variable agreement of presentational haber ‘there is/are’ in this variety. For that study, 24 recording
sessions were performed with native-speaker residents of two age groups (21-35 years; 55+ years), two education groups (university vs. less), and the two genders. The recording sessions consisted of a thirty-minute interview, followed by a reading task and a sentence completion task. The total amount of speech that was gathered amounts to 25 hours, representing some 200,000 words, but for this study, only the interview sections of the corpus were used, which represent some 12 hours of speech (roughly 100,000 words). The 24 interviews were orthographically transcribed in full and part-of-speech tagged with the Stanford POS Tagger (Toutanova et al., 2003). Subsequently, I used the R package XML (Temple-Lang & CRAN, 2016) to loop through the XML files generated by the Stanford POS Tagger in order to extract and number all conjugated verbs and to annotate them for formal variables. Afterwards, all eligible verbs with human-reference subjects were selected manually and coded semi-automatically for the semantic parameters that were described in the previous section. The data and R scripts used for this paper are available from the author’s website (http://www.author.domain).
To be eligible, verbs had to occur in contexts where both subject pronouns and verbal markings alone could occur. Following Otheguy & Zentella (2012) and Otheguy et al. (2007), this implied that verbs with impersonal and inanimate subjects were not considered as instances of the variable (e.g., meteorological verbs such as llueve ‘it rains’; existentials with haber, e.g., hay cosas ‘there are things’, and hacer, e.g., hace años ‘it’s been years’, as well as se-passives, e.g., se pide ayuda ‘help is requested’). Verbs that occurred with a lexical subject (e.g., Marta pide ayuda ‘Martha requests help’) or in a subject-headed relative (El que con cojos anda ‘He who walks with cripples’) were not included as instances of the variable either. 
In turn, contrastive contexts, which have been excluded from some earlier studies of subject pronoun expression, were included in the corpus provided the pronoun was not the focus of contrast, as Matos-Amaral & Schwenter (2005) have shown that bare verbs do occur
in such contexts. Also, following Otheguy & Zentella (2012), verb forms accompanied by a topicalized SPP (e.g., Yo lo que Ø quiero es ‘I what [I] want is’) were coded as bare verbs (i.e., absence of a pronoun), as it would have been possible to insert a pronoun directly before the verb (e.g., Yo lo que yo quiero es ‘I what I want is’). Crucially, the approach of this paper diverges from the selection criteria set forth by Otheguy & Zentella (2012: 248) in not excluding verb forms that occur in set phrases (e.g., no sé ‘dunno’, qué sé yo ‘what do I know’, tú sabes ‘you know’, etc.), because the existence of such formulaic sequences and their potentially idiosyncratic behavior is of interest to hypothesis 2.
To examine the effects of the cognitive constraints, a mixed-effects logistic regression analysis was performed with the lme4 package (Bates et al., 2016) in R. Since hypothesis 2 claims that the overall entrenchment of tokens in or will modulate the effects of all other predictors, tokens were grouped together by the conjugated verb forms they instantiate. As an added benefit, this also controls for their Zipfian distribution. The tokens were also grouped together by speaker. To select a parsimonious model, I started out with full models including the random intercepts, the demographic information recorded by the corpus, the predictors described in the previous section, as well as the distance in clauses to the last intervention of the interviewer, which measures the relative degree to which speakers elaborate their answers. Then, I ran and evaluated candidate models for all possible subsets of these fixed effects, using the pdredge() function of the MuMIn package (Bartón, 2016). The output of this model selection procedure is a list of candidate models ordered by their
AICc score. The model with the lowest AICc value was selected as the starting point in the subsequent model-fitting process. To evaluate whether interactions and random slopes improved the model fit, those were added one at a time. If the addition of an interaction or slope lowered the AICc value of the model
by two units or more (Burnham & Anderson, 2002: 70), it was included in the final model, provided the candidate model converged and did not overfit the data. To guard against overfitting, bootstrap confidence intervals were computed for the final model (using the confint() function of the lme4 package). I also checked for overdispersion and multicollinearity, neither of which was an issue (sum of squared Pearson residuals < residual degrees of freedom and Variance Inflation Factors < 5; Speelman, 2014).

6. Results
As is shown in the top row of Table 2, applying the filters described in the previous section resulted in a database of 7,849 human-reference subjects, of which some 28% correspond to . In general, the predictors that were described in section 4.3 are highly successful at modeling the variation. As is shown at the bottom of the table, the model has ‘excellent’ discriminative ability (C-index = 0.85; Hosmer & Lemeshow, 2000: 162) and accounts for some 40% of the variance (cf. the Nakagawa & Schielzeth, 2013 pseudo-R2). However, much of this variance is absorbed by the random effects structure, not least by the random effect for verb token. Without taking into account the random effects, the model has ‘acceptable’ discriminative ability (C-index > 0.70), but it accounts for less than half of the amount of variance that is explained by the full model. As the model becomes much less adequate when the information about the verb forms (and the speakers) is omitted, these data offer a first indication that hypothesis 2 may be borne out. Let us turn now to the review of the results that were obtained for each of the individual predictors.
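The inclusion criterion for interactions and random slopes described in the Methods section can be made concrete with a small Python sketch. It is an illustration only, not the R code used for this paper: the helper names are hypothetical, the choice of sample size and parameter count for a mixed model is left to the caller as an assumption, and the value 7841.0 is invented. Only 8295.4 and 7840.2 are AICc values reported in Table 2.

```python
def aicc(log_likelihood, n_params, n_obs):
    """AIC with the small-sample correction (Burnham & Anderson, 2002)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("n_obs must exceed n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def improves_fit(aicc_current, aicc_candidate, min_drop=2.0):
    """True if the candidate lowers AICc by at least `min_drop` units."""
    return (aicc_current - aicc_candidate) >= min_drop


# Fixed-only versus mixed model, using the two AICc values in Table 2.
print(improves_fit(8295.4, 7840.2))
# An invented candidate that lowers AICc by less than two units is rejected.
print(improves_fit(7841.0, 7840.2))
```

In the procedure described above, a term that passes this threshold is still retained only if the candidate model converges and does not overfit the data.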
Table 2: Logistic generalized linear mixed-effects model of subject pronoun expression in Cuban Spanish: Numbers, percentages, and coefficients for pronominal subjects (sum contrasts).

Predictor / level                               N            %       Coefficient
(Intercept)                                     2194/7849    27.95   -1.68
Collostruction strength
  Numeric predictor                                                  0.662
Empathy
  Speaker - Hearer                              1618/4542    35.62   0.412
  Other                                         576/3307     17.42   -0.412
Production-to-production priming [levels: , , First/5+ clauses]
                                                832/2051     40.57   0.339
                                                181/566      31.98   -0.037
                                                1181/5232    22.57   -0.303
Subject reference and referential distance
  Switch:non-adjacent                           1133/3169    35.75   0.333
  Switch:new                                    169/796      21.23   0.266
  Switch:adjacent                               99/387       25.58   -0.082
  Continuity:adjacent                           793/3497     22.68   -0.516
Aspect
  Imperfective                                  1719/5928    29.00   0.314
  Perfective                                    445/1756     25.34   -0.106
  Continuous                                    30/165       18.18   -0.208
Kinesis
  Non-action                                    1627/5151    31.59   0.281
  Action                                        555/2565     21.64   -0.281
Volition
  Volitional                                    1300/4929    26.37   0.166
  Non-volitional                                894/2920     30.62   -0.166
Animacy of the object
  Inanimate                                     681/2571     26.49   0.131
  Absent                                        1391/4643    29.96   0.09
  Animate                                       122/635      19.21   -0.221
Referentiality
  Referential                                   1749/5504    31.78   0.061
  Non-referential                               445/2345     18.98   -0.061
Comprehension-to-production priming [levels: , , First/5+ clauses]
                                                375/1136     33.01   0.095
                                                400/1562     25.61   -0.034
                                                1419/5151    27.55   -0.061
Gender
  Males                                         972/4020     24.18   0.139
  Females                                       1222/3829    31.91   -0.139
Number of words between verb and SPP site
  Numeric predictor                                                  -0.098
Age
  55+ years                                     1377/4141    33.25   0.045
  21-35 years                                   817/3708     22.03   -0.045

Random effects                                  Variance     Std. Deviation
  Verb form                                     0.844        0.919
  Speaker                                       0.261        0.511

Model summary                                   Fixed only   Mixed
  Pseudo-R2                                     0.19         0.41
  AICc                                          8295.4       7840.2
  C index                                       0.73         0.85

Note: The bobyqa optimizer for glmer was used in estimating the models.
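To read the sum-contrast coefficients of Table 2 as probabilities, one can add a level's coefficient to the intercept and apply the inverse logit. The Python sketch below does this for the empathy predictor. It assumes that all other categorical predictors are held at their sum-coded average, that the numeric predictors are at zero, and that the random effects are zero, so the resulting figures are illustrative and are not predictions reported in this paper. The function and variable names are hypothetical.

```python
import math


def inv_logit(x):
    """Map a log-odds value onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_probability(intercept, *coefficients):
    """Probability implied by the intercept plus the selected coefficients."""
    return inv_logit(intercept + sum(coefficients))


# Values copied from Table 2 (intercept and the empathy predictor).
INTERCEPT = -1.68
EMPATHY = {"Speaker - Hearer": 0.412, "Other": -0.412}

for level, coefficient in EMPATHY.items():
    probability = predicted_probability(INTERCEPT, coefficient)
    print(f"{level}: {probability:.3f}")
```

Under these assumptions the sketch gives a probability of roughly 0.22 for the pronominal variant with speaker or hearer subjects against roughly 0.11 with other subjects, which mirrors the direction of the raw percentages in the table.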
As for the predictors that were designed to measure the effects of markedness of coding, Table 2 shows that speakers are more inclined to use when referring to themselves or the hearer (see example 6).
(6) Yo he visto demasiadas cosas (LH13H21/838).
‘I have seen too many things.’
In turn, speakers are much less inclined to use for subjects that refer to nonparticipants of the usage event, such as the speaker’s father in example (7). These results reflect the empathy hierarchy: Speaker > Hearer > Other.
(7) Desde muy pequeña, Ø nos llevó a mi hermana y a mí a la Biblioteca Nacional.
‘Ever since very little, [he] took us, my sister and me, to the National Library.’
In addition, the results for volitionality suggest that verbs that imply volitional involvement of the subject favor (e.g., venir ‘to come back’ and hacer ‘to do’ in example 8). In contrast, subjects that are not presented as volitional in the context of the event encoded by the clause are more likely to be expressed with < Verb>, as is illustrated in example (9).
(8) Desde que yo vine para acá para La Habana, nosotros hemos hecho varias fiestecitas aquí (LH05M21/545-546).
‘Since I’ve come back here, to Havana, we’ve done various little parties here.’
(9) Eso todavía Ø lo debo a mi niñez (LH09M12/340).
‘This [I] still owe it to my childhood.’
Turning now to the referentiality of the subject, the regression estimates in Table 2 show that subjects that refer to specific referents that have been introduced in earlier discourse as a full nominal are more likely to be expressed with . In contrast, subjects that refer to entities that are contextually evoked (such as ‘we, the inhabitants of this borough’ in example 10) or subjects that have generic first-person (see example 11), second-person (see example 12b), and third-person plural (see example 12a) reference are preferentially encoded with .
(10) Y, entonces, esta zona, este barrio es muy propenso a las inundaciones, porque es una zona muy baja. Y a veces Ø pasamos muchos sustos con los, t, con los ciclones. (LH03M12/71)
‘And, so, this area, this borough is very prone to flooding, because it is a low area. And [we] sometimes pass fears with, with hurricanes.’
(11) Y, entonces, pude con ese mucho de teoría pero cuando va a chocar con la práctica, puede ser un desastre (LH20H12/342-343).
‘And, well, I could handle this large amount of theory, but when [one] is going to confront practice, [one] may be a disaster.’
(12) a. Ahora te pasan,
‘Nowadays [they] let you pass,
b. sepas o no sepas (LH05M21/281-283).
if [you] know or [you] don’t know.’
These results support the claim that semantic traits that favor the focusing of attention on the referent of the subject favor , as hypothesis 1 proposes. Further support for this hypothesis is provided by the predictors that model the amount of attention that is attracted by the event (i.e., aspect, kinesis, and the individuation/animacy of the object). Specifically, for the animacy of the object, Table 2 reveals that speakers are less likely to use when an animate object is present in the clause (see example 13). In turn, when no object is present in the clause or when the object has inanimate reference, speakers are more inclined to use (see example 14).
(13) En ese caso que nos metimos en el patio, que Ø cogieron a mi amigo (LH23H21/637).
‘On that occasion, when we went into a patio, when [they] caught my friend.’
(14) Y, bueno, Ø perdías tres años (LH01H22/549).
‘And, well, [you] lost three years.’
A larger effect size is obtained for the aspect variable. For this predictor, the regression analysis reveals that expressed pronouns are preferred for verbs with imperfective aspect (see example 15). For verbs with continuous aspect and perfective aspect, is preferred, as is shown in examples (16) and (17). These data support the claim that is preferred for aspect types that focus attention on the event.
(15) Yo sí, yo posiblemente vuelva allá porque mis hijos me invitan.
‘Me I will, I possibly may go back there because my children invite me.’
(16) Y ahora, Ø estoy trabajando en el Atlas Lingüístico de Cuba (LH03M12/224).
‘And now, [I]’m working on the Linguistic Atlas of Cuba.’
(17) Por lo menos en nuestro departamento, Ø hemos logrado crear un ambiente muy bueno (LH09M12/287).
‘At least in our department [we] have succeeded in creating a very good atmosphere.’
Concerning kinesis, Table 2 indicates that verbs that refer to actions, that is, verbs that imply movement on the part of the subject (e.g., venir ‘to come’ in example 18) or direct, physical contact between the subject argument and a second entity (e.g., freír ‘to fry’ in example 19) disfavor . Non-action verbs, in turn, favor expressed subjects.
(18) Cuando Ø vine de mi campo, que Ø vine con cinco años (LH06H21/163-164).
‘When [I] came from my countryside, [I] came with five years of age.’
(19) Ø freía pescado y boniato (LH06H21/297).
‘[She] fried fish and sweet potato.’
These data support the claim that features of lower transitivity (the absence of an object or non-individuated/inanimate objects, imperfective aspect, verbs that refer to non-energetic events; Hopper & Thompson, 1980) favor , whereas features typical of higher transitivity favor (Posio, 2011). This pattern is highly favorable to the view that speakers prefer when relatively more attention is focused on the event, whereas they prefer when relatively more attention is focused on the subject, as hypothesis 1 predicts.
Regarding the discourse-oriented prominence of the subject, preliminary analyses revealed that a three-way factorized version of referential distance (referent occurs in an adjacent clause, referent occurs in a non-adjacent clause, new) provided a much better fit than the continuous variable, which has a highly skewed distribution. However, this factorized version is collinear with subject reference, as subjects that form a ‘reference chain’ with the subject of the previous clause also appear in that clause by definition. Therefore, these two predictors were collapsed into one regressor. As in earlier work on subject pronoun expression, the results reveal that subjects that imply a switch in reference with regard to the subject of the previous clause are more likely to be expressed with than subjects that continue to refer to the same entity. However, this tendency is severely mitigated by referential distance. That is, when the referent that is switched to is present in an adjacent clause, speakers do not display any marked preference for either variant. In contrast, when they switch to a referent that has not appeared in prior discourse or to a referent that was
mentioned further away in discourse, they favor . This supports the claim that discursively prominent referents are more likely to be encoded with , as hypothesis 1 predicts.
With regard to hypothesis 2, the results for collostruction strength reveal that speakers become gradually more likely to use as the collostruction strength of the verb token rises. In other words, the data support the view that, when confronted with the choice between and , speakers draw on their past experience with language and use the construction alternative they have witnessed most consistently with a particular verb form. This is exactly the pattern one would expect in the light of statistical preemption. In addition, Table 2 also shows that when one or more words are inserted between the verb and the SPP site, the use of becomes less likely with each element that is inserted. Since the presence of such elements inhibits the use of a prefabricated expression, these results add further support to hypothesis 2 and the more general claim that statistical preemption constrains SPP variation.
Concerning structural priming, in preliminary analyses, the familiar ‘lexical boost’ (Pickering & Ferreira, 2008) was observed for both priming variables. However, since priming also occurred in the absence of lexical repetition and because a simpler model that did not distinguish between priming with and without lexical repetition provided a better fit, the priming variables were collapsed into the following broader categories: , , First/5+ clauses. For these predictors, Table 2 shows that when speakers have just used or processed , they are more likely to use this variant and vice versa. Whenever they have not been exposed to either of the two variants, they are less likely to use this variant, in line with the limited overall use of SPPs in Spanish. However, it should be observed that, even though the directionality of the two priming variables is identical, their sizes are markedly different. 
This goes counter to laboratory work on structural priming, which typically finds the two priming modalities to have the same effect sizes (e.g., Pickering &
Ferreira, 2008), while confirming the results of earlier corpus-based studies that have attended to comprehension-to-production priming (AUTHOR, under revision).
Finally, the regression model of Table 2 indicates that male speakers and older participants are more likely to use than female speakers and youngsters. In this regard, referring to first-person singular SPP use, Aijón-Oliva & Serrano (2010) and Serrano (2014) argue that certain groups of speakers use more often because these speakers are more likely to adopt a subjective style, which attributes much prominence to the speaker. Similarly, our results could suggest that male and older speakers are more likely to adopt such styles, but further research into the interaction of age, gender, and these communicative preferences remains necessary. Let us now turn to the conclusions.

7. Discussion
In this paper I have investigated whether the model of morphosyntactic variation that was presented in earlier work on existential agreement variation in Caribbean Spanish and British English generalizes from this type of alternation to other types of morphosyntactic variation. Overall, the results that were presented in this paper are highly favorable to this hypothesis. In particular, it was shown that features which increase the likelihood that a speaker focuses his/her attention on the subject argument increase the probability that s/he will use . This is exactly what one would expect in the light of markedness of coding and the hypothesis that the meaning of consists in encoding the subject argument more prominently. For the second constraint, statistical preemption, hypothesis 2 predicts that the overall degree of entrenchment of the verb forms in either or will be a key determinant influencing the use of these two constructions. This is shown by the regression analysis in two distinct ways. Firstly, adding a random intercept for the specific verb form that
appears in the clause greatly improves the precision of the model. Secondly, the results for collostruction strength showed that the overall likelihood that speakers will use increases gradually when the collostruction strength values increase. However, when words appear between the verb and the SPP site, these appear to inhibit the use of , as statistical preemption predicts. Finally, as for structural priming, the data show that speakers are less likely to use when they have just processed or used an instance of and vice versa. This supports that the variation between the absence/presence of a pronoun amounts to a competition between two abstract constructions and that this competition is constrained by structural priming, as is claimed by hypothesis 3. In sum, the data presented in this paper support that subject pronoun expression constitutes a competition between and , which is conditioned by three domain-general constraints on the spreading activation of constructions (markedness of coding, statistical preemption, and structural priming). Of those three, only structural priming has been given its due attention in variationist linguistics (e.g., Travis & Torres-Cacoullos, 2012; Szmrecsanyi, 2006). Nevertheless, as I have shown here, markedness of coding and statistical preemption offer a principled way to predict which predictors will condition variation and with what directionality. Since these quantitative tendencies were shown to reflect domain-general cognitive constraints, the model of morphosyntactic alternations that was argued for in this paper offers a psychologically plausible way of characterizing the distribution patterns in morphosyntactic variation. In this sense, this paper has illustrated a particular way in which Cognitive (Socio)Linguistic theory may contribute significantly to variationist linguistics in the traditions of Weinreich et al. (1968) and Bresnan et al. (2007). 
However, as I have shown, the fruitfulness of this cross-fertilization will depend on a deductive research design, which
draws on theoretical insights every step of the way, from hypothesis building to coding, regression analysis, and interpretation.

References
Aijón-Oliva, M. Á., & Serrano, M. J. (2010). El hablante en su discurso: Expresión y omisión del sujeto de 'creo'. Oralia: Análisis del Discurso Oral, 13, 7-38.
Alfaraz, G. G. (2015). Variation of overt and null subject pronouns in the Spanish of Santo Domingo. In A. M. Carvalho, R. Orozco, & N. Shin, Subject pronoun expression in Spanish: A cross-dialectal perspective (pp. 3-16). Georgetown, DC: Georgetown University Press.
Ariel, M. (2001). Accessibility theory: An overview. In T. Sanders, J. Schilperoord, & W. Spooren, Text representation (pp. 29-87). Amsterdam/Philadelphia, PA: John Benjamins.
Bartón, K. (2015). MuMIn: Model selection and model averaging based on information criteria (AICc and alike). Retrieved May 2015, from https://cran.r-project.org/web/packages/MuMIn/index.html
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2016). lme4: Linear Mixed-Effects Models using 'Eigen' and S4. Retrieved May 2016, from https://cran.r-project.org/web/packages/lme4/index.html
Bayley, R., & Pease-Álvarez, L. (1997). Null pronoun variation in Mexican-descent children's narrative discourse. Language Variation and Change, 9 (2), 349-371.
Bayley, R., Greer, K., & Holland, C. (2013). Lexical frequency and syntactic variation: A test of a linguistic hypothesis. University of Pennsylvania Working Papers in Linguistics, 19 (2), Art. 4.
Bresnan, J., Cueni, A., Nikitina, T., & Baayen, R. H. (2007). Predicting the dative alternation. In G. Bouma, I. Krämer, & J. Zwarts, Cognitive foundations of interpretation (pp. 69-94). Amsterdam: Royal Netherlands Academy of Science.
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference. New York, NY: Springer.
Bybee, J. (2008). Grammaticization: Implications for a theory of language. In J. Guo, E. Lieven, N. Budwig, S. Ervin-Tripp, S. Ozcaliskan, & K. Nakamura, Crosslinguistic approaches to the psychology of language: Research in the tradition of Dan Isaac Slobin (pp. 345-355). Mahwah, NJ: Taylor and Francis.
Bybee, J. (2001). Phonology and language use. Cambridge, MA: Cambridge University Press.
Cameron, R. (1997). Accessibility theory in a variable syntax of Spanish. Journal of Pragmatics, 28, 29-67.
Cameron, R. (1993). Ambiguous agreement, functional compensation and nonspecific 'tú' in the Spanish of San Juan, Puerto Rico, and Madrid, Spain. Language Variation and Change, 5 (2), 305-334.
Cameron, R., & Flores-Ferrán, N. (2004). Perseveration of subject expression across regional dialects of Spanish. Spanish in Context, 1 (1), 41-65.
Carvalho, A. M., Shin, N., & Orozco, R. (2015). Introduction. In A. M. Carvalho, N. Shin, & R. Orozco, Subject pronoun expression in Spanish: A cross-dialectal perspective (pp. xiii-xxiii). Georgetown, DC: Georgetown University Press.
Chomsky, N. (2001). Derivation by phase. In M. Kenstowicz, Ken Hale: A Life in Language (pp. 1-52). Cambridge, MA: The MIT Press.
Chomsky, N. (1995). The Minimalist Program. Cambridge, MA: The MIT Press.
Davidson, B. (1996). 'Pragmatic weight' and Spanish subject pronouns: The pragmatic and discourse uses of 'tú' and 'yo' in spoken Madrid Spanish. Journal of Pragmatics, 36, 543-565.
Davies, M. (2016-). Corpus del español. 2 billion words (1200s-1900s). Retrieved January 2016, from http://www.corpusdelespanol.org/
Davies, M. (2008-). Corpus of contemporary American English. 450 million words, 1990-2012. Retrieved October 2016, from http://corpus.byu.edu/coca/
Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 92 (3), 283-321.
Enríquez, E. (1984). El pronombre personal sujeto en la lengua española hablada en Madrid. Madrid: Consejo Superior de Investigaciones Científicas.
Erker, D., & Guy, G. R. (2012). The role of lexical frequency in syntactic variability: Variable subject pronoun expression in Spanish. Language, 88 (3), 526-557.
Flores-Ferrán, N. (2007). A bend in the road: Subject personal pronoun expression after 30 years of sociolinguistic research. Language and Linguistics Compass, 1 (6), 624-652.
Geeraerts, D. (2005). Lectal variation and empirical data in Cognitive Linguistics. In F. Ruiz-Mendoza de Ibañez, & S. Peña-Cervel, Cognitive linguistics: Internal dynamics and interdisciplinary interactions (pp. 163-189). Berlin/Boston, MA: De Gruyter Mouton.
Geeraerts, D., & Kristiansen, G. (2015). Variationist linguistics. In D. Divjak, & E. Dabrowska, Handbook of Cognitive Linguistics (pp. 366-389). Berlin/Boston, MA: De Gruyter.
Geeraerts, D., Kristiansen, G., & Peirsman, Y. (2010). Introduction: Advances in Cognitive Sociolinguistics. In D. Geeraerts, G. Kristiansen, & Y. Peirsman, Advances in Cognitive Sociolinguistics (pp. 1-19). Berlin/Boston, MA: De Gruyter Mouton.
Givón, T. (1983). Topic continuity in discourse: A quantitative cross language study. Amsterdam/Philadelphia, PA: John Benjamins.
Goldberg, A. (2005b). Constructions, lexical semantics, and the correspondence principle: Accounting for generalizations and subregularities in the realization of arguments. In N. Erteschik-Shir, & T. Rapoport, The syntax of aspect. Deriving thematic and aspectual interpretation (pp. 215-236). Oxford: Oxford University Press.
Goldberg, A. E. (2006a). Constructions at work: The nature of generalization in language. Oxford: Oxford University Press.
Goldberg, A. E. (1995). Constructions: A Construction Grammar approach to argument structure. Chicago, IL: Chicago University Press.
Goldberg, A. E. (2011). Corpus evidence of the viability of statistical preemption. Cognitive Linguistics, 22 (1), 131-153.
Gundel, J. K., Hedberg, N., & Zacharski, R. (1993). Cognitive status and the form of referring expressions in discourse. Language, 69 (2), 274-307.
Hochberg, J. (1986). Functional compensation for /s/ deletion in Puerto Rican Spanish. Language, 63 (3), 609-621.
Hopper, P. J., & Thompson, S. A. (1980). Transitivity in grammar and discourse. Language, 56 (2), 251-299.
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression. Oxford: Wiley.
Hudson, R. (2010). An introduction to Word Grammar. Cambridge, MA: Cambridge University Press.
Kiparsky, P. (1982). Explanation in phonology. Dordrecht: Foris.
Kristiansen, G., & Dirven, R. (2008). Cognitive Sociolinguistics: Language variation, cultural models, social systems. Berlin/Boston, MA: De Gruyter.
Labov, W. (1994). Principles of linguistic change. Volume 1: Internal factors. Oxford: Blackwell.
Lakoff, G. (1977). Linguistic gestalts. In W. Beach, S. Fox, & S. Philosoph, Papers from the Thirteenth Regional Meeting of the Chicago Linguistics Society, April 14-16, 1977 (pp. 236-287). Chicago, IL: Chicago Linguistics Society.
Lakoff, G. (1990). The invariance hypothesis: Is abstract reason based on image schemas? Cognitive Linguistics, 1 (1), 39-74.
Langacker, R. W. (2007). Cognitive Grammar. In D. Geeraerts, & H. Cuyckens, The Oxford handbook of Cognitive Linguistics (pp. 421-462). Oxford: Oxford University Press.
Langacker, R. W. (2010). Cognitive Grammar. In B. Heine, & H. Narrog, The Oxford handbook of linguistic analysis (pp. 87-110). Oxford: Oxford University Press.
Langacker, R. W. (2008). Cognitive Grammar: A basic introduction. Oxford: Oxford University Press.
Langacker, R. W. (1990). Concept, image, symbol: The cognitive basis of grammar. Berlin/Boston, MA: Mouton De Gruyter.
Langacker, R. W. (1991). Foundations of Cognitive Grammar. Volume 2: Descriptive application. Stanford, CA: Stanford University Press.
Lastra, Y., & Martín-Butragueño, P. (2015). Subject pronoun expression in oral Mexican Spanish. In A.-M. Carvalho, R. Orozco, & N. Shin, Subject pronoun expression in Spanish: A cross-dialectal perspective (pp. 39-58). Georgetown, DC: Georgetown University Press.
Levshina, N. (2015). How to do linguistics with R: Data exploration and statistical analysis. Amsterdam/Philadelphia, PA: John Benjamins.
Matos-Amaral, P., & Schwenter, S. (2005). Contrast and the (non-) occurrence of subject pronouns. In D. Eddington, Selected proceedings of the 7th Hispanic Linguistics Symposium (pp. 116-127). Somerville, MA: Cascadilla.
Myachykov, A., & Tomlin, R. S. (2015). Attention and salience. In E. Dabrowska, & D. Divjak, Handbook of Cognitive Linguistics (pp. 31-52). Berlin/New York, NY: De Gruyter Mouton.
Nakagawa, S., & Schielzeth, H. (2013). A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4, 133-142.
Orozco, R. (2015). Pronominal variation in Colombian Costeño Spanish. In A. M. Carvalho, N. Shin, & R. Orozco, Subject pronoun expression in Spanish: A cross-dialectal perspective (pp. 17-38). Georgetown, DC: Georgetown University Press.
Ortiz-López, L. A. (2011). Spanish in contact with Haitian Creole. In M. Díaz-Campos, The handbook of Hispanic sociolinguistics (pp. 418-445). Oxford: Wiley.
Ortiz-López, L. (2009). Pronombres del sujeto en el español del Caribe: L2 vs. L1. In M. Lacorte, & J. Leeman, Español en Estados Unidos y otros contextos de contacto: sociolingüística, ideología y pedagogía (pp. 85-110). Frankfurt am Main/Madrid: Vervuert/Iberoamericana.
Otheguy, R., & Zentella, A. C. (2012). Spanish in New York: Language contact, dialectal leveling, and structural continuity. Oxford: Oxford University Press.
Otheguy, R., Zentella, A. C., & Livert, D. (2007). Language and dialect contact in Spanish in New York: Toward the formation of a speech community. Language, 83 (4), 771-802.
Pütz, M., Robinson, J. A., & Reif, M. (2014). Cognitive Sociolinguistics: Social and cultural variation in cognition and language use. Amsterdam/Philadelphia, PA: John Benjamins.
Pütz, M., Robinson, J. A., & Reif, M. (2012). The emergence of cognitive sociolinguistics. Review of Cognitive Linguistics, 10 (2), 241-26.
Pickering, M. J., & Ferreira, V. S. (2008). Structural priming: A critical review. Psychological Bulletin, 134 (3), 427-459.
Poplack, S. (1984). Variable concord and sentential plural marking in Portorican Spanish. Hispanic Review, LII (2), 205-222.
Posio, P. (2011). Spanish subject pronoun usage and verb semantics revisited: First and second person singular subject pronouns and focusing of attention in spoken Peninsular Spanish. Journal of Pragmatics, 43, 777-798.
Posio, P. (2015). Subject pronoun expression in formulaic sequences: Evidence from Peninsular Spanish. In A.-M. Carvalho, R. Orozco, & N. Shin, Subject pronoun expression in Spanish: A cross-dialectal perspective (pp. 59-78). Georgetown, DC: Georgetown University Press.
Robenalt, C., & Goldberg, A. E. (2015). Judgment evidence for statistical preemption: It is relatively better to vanish than to disappear a rabbit, but a lifeguard can equally well backstroke or swim children to shore. Cognitive Linguistics, 26 (3), 467-503.
Serrano, M. J. (2014). El sujeto y la subjetividad: variación del pronombre yo en géneros textuales del Español de Canarias. Revista Signos: Estudios de Lengua y Literatura, 47 (85), 321-343.
Shin, N. (2014). Grammatical complexification in Spanish in New York: 3sg pronoun expression and verbal ambiguity. Language Variation and Change, 26 (2), 303-330.
Shin, N., & Otheguy, R. (2005). Contact-induced change? Overt nonspecific ellos in Spanish in New York. In L. Sayahi, & M. Westmoreland, Selected Proceedings of the Second Workshop on Spanish Sociolinguistics (pp. 67-75). Somerville, MA: Cascadilla.
Silva-Corvalán, C. (2001). Sociolingüística y pragmática del español. Georgetown, DC: Georgetown University Press.
Speelman, D. (2014). Logistic regression: A confirmatory technique for comparisons in corpus linguistics. In D. Glynn, & J. A. Robinson, Corpus Methods for Semantics: Quantitative studies in polysemy and synonymy (pp. 487-533). Amsterdam/Philadelphia, PA: John Benjamins.
Stefanowitsch, A., & Gries, S. T. (2005). Covarying collexemes. Corpus Linguistics and Linguistic Theory, 1 (1), 1-43.
Stewart, M. (2003). ‘Pragmatic weight’ and face: pronominal presence and the case of the Spanish second person singular subject pronoun tú. Journal of Pragmatics, 35, 191-206.
Szmrecsanyi, B. (2006). Morphosyntactic persistence in spoken English: A corpus study at the intersection of variationist sociolinguistics, psycholinguistics, and discourse analysis. Berlin/Boston, MA: De Gruyter.
Talmy, L. (2007). Attention phenomena. In D. Geeraerts, & H. Cuyckens, The Oxford handbook of Cognitive Linguistics (pp. 264-292). Oxford: Oxford University Press.
Temple-Lang, D., & CRAN. (2016b). XML: Tools for Parsing and Generating XML Within R and S-Plus. Retrieved May 2016, from https://cran.r-project.org/web/packages/XML/index.html
Thompson, S. A., & Hopper, P. J. (2001). Transitivity, clause structure, and argument structure: Evidence from conversation. In J. Bybee, & P. Hopper, Frequency and the emergence of linguistic structure (pp. 27-60). Amsterdam/Philadelphia, PA: John Benjamins.
Toribio, A. J. (2000). Setting parametric limits on dialectal variation in Spanish. Lingua, 110 (5), 315-341.
Toutanova, K., Manning, C. D., & Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (pp. 252-259). North American Chapter of the Association for Computational Linguistics.
Travis, C. E., & Torres-Cacoullos, R. (2012). What do subject pronouns do in discourse? Cognitive, mechanical and constructional factors in variation. Cognitive Linguistics, 23 (4), 711-748.
Weinreich, U., Labov, W., & Herzog, M. (1968). Empirical foundations for a theory of language change. In W. Lehman, & Y. Malkiel, Directions for historical linguistics. A symposium (pp. 97-195). Austin, TX: University of Texas Press.