Variationist Approaches to Spanish Morphosyntax

Draft of chapter to appear in Handbook of Hispanic Sociolinguistics, ed. by Manuel Díaz‐Campos. Oxford: Blackwell.

Variationist Approaches to Spanish Morphosyntax: Internal and External Factors

Scott A. Schwenter, The Ohio State University

1. Introduction

From the advent of studies of linguistic variation in the early‐mid 1960s, beginning of course with William Labov’s work on the island of Martha’s Vineyard (1963) and most significantly with his monumental research in New York City (1966), there has been a decided tendency among researchers of variation in English to concentrate their attention on variation at the phonic level.1 The reasons behind this tendency are well‐known: in the competition between “different ways of saying the same thing” at the level of the sound system, there is little possibility of meaning change between alternate pronunciations of a given word. Thus, for instance, whether English speakers pronounce the word best with or without the word‐final /t/ is irrelevant to its meaning and, in context, will be readily understood by their interlocutors as the same word. Likewise, to take a Spanish example, whether speakers realize a word‐final /s/ on a pluralized noun (like coches ‘cars’) as [s], as aspiration [h], or delete it altogether (Ø) is also largely irrelevant in context, since ample clues to the word’s plurality will be available from other sources. As researchers have discovered, however, this variation among phonic variants is not free; rather, there is a confluence of external (social) and internal (linguistic contextual) factors that constrain the variation in such cases, and due attention has been paid to both kinds of factors. The near‐exclusive interest in phonic variation lasted for over a decade, and many significant results and advances in variationist research were produced during this period.
The course of variation studies was to change forever, however, when Beatriz Lavandera, who had been a student of Labov’s at the University of Pennsylvania, published her now‐famous (and to some extent infamous) article entitled “Where does the Sociolinguistic Variable Stop?” (Lavandera 1978). In this article, which has been required reading for nearly all variationists‐in‐training ever since its appearance, Lavandera questioned the wisdom of extending the variationist study of phonological phenomena wholesale to morphosyntactic phenomena, due to the very different nature of the two beasts. She pointed out that phonological variables do not carry referential meaning: whether a speaker says best or bes’ does not change the meaning of this word, or of any other word where such variability is possible. At the level of morphosyntax (or, more obviously, lexicon), however, referential meaning is being expressed: the choice between a Preterit and a Present Perfect, at least outside of context, will have referential consequences. Lavandera therefore questioned how and to what extent referential meaning could be affected by variant choice. But beyond that, she also pointed out that morphosyntactic options can affect non‐referential discourse meaning. In fact, one of the main targets of her critique was the work of Weiner and Labov (later published in 1983), who analyzed English active and passive sentences as a variable. As Lavandera pointed out, it may be reasonable to consider active and passive versions of the same propositional content as expressing the same or similar referential meanings, but at the level of discourse meaning, actives and passives are very different.

1 One clear exception to this rule is to be found in what could be called the “Poplackian School” of variation. The work of Shana Poplack and her collaborators and students at the University of Ottawa (e.g. Poplack and Tagliamonte 2001) has focused much of its attention (though not exclusive attention) on morphosyntactic variation.

In the years following the publication of Lavandera’s seminal article, much debate has ensued about exactly what kind of “meaning” is at issue in the study of non‐phonological (i.e. mainly morphosyntactic) variation. Is it referential meaning, discourse meaning, or what? The position taken here, and the one that seems to have been adopted by most researchers, is that the meaning involved is typically discourse‐pragmatic in nature: the choice of one variant over another interrelates that variant and the ongoing discourse context in particular ways. This is not to say, however, that all such choices are made at an intentional, or even conscious, level. There will always be contexts where the interpretive difference between two (or more) variants has been neutralized (Sankoff 1988), and the choice between one or the other(s) will not affect either the message conveyed or the flow of the discourse. While these variants may be interpreted differently outside of a discourse context, e.g. in isolation as in introspective methods like grammaticality or felicity judgments, when embedded into an appropriate context these differences are reduced to the point of irrelevance. Most work done on morphosyntactic variation nowadays adopts some version of this neutralization hypothesis and does not assume a priori that the variants in question “say the same thing”; or at least if it does so it is primarily to create a strawman position to be refuted later on in the analytical process.
In fact, nearly always the purpose of the research is to use variationist methods to uncover exactly what the DIFFERENCES are between the morphosyntactic variants/constructions, from the perspective of both internal and external factors. In other words, variationist techniques in the study of morphosyntactic phenomena are as much a discovery procedure as they are a concluding analysis.

In her seminal overview of the issues in and main findings of studies on morphosyntactic variation in Spanish, Silva‐Corvalán (2001:130‐1) identifies four main factors distinguishing syntactic from phonological variation:

1. In any language there is LESS syntactic variation than phonological variation.
2. Syntactic variation is more difficult to measure and quantify.
3. A syntactic variable’s contexts of occurrence are harder to identify and define.
4. Variation between syntactic forms could be due to SEMANTIC differences.

It is these characteristics, for Silva‐Corvalán, that make the task of assigning social and/or stylistic meaning to (morpho)syntactic variation a difficult one. Instead, what research has uncovered is an intricate web of discourse‐pragmatic and syntactic factors as the primary influences on variation at this level. I would add, too, that there is often a logistical problem that makes studying social meaning and external factors difficult in cases of morphosyntactic variation. Precisely because (most) morphosyntactic phenomena are much rarer than phonological variables, researchers must use computerized corpora, oftentimes consisting of millions of words, in order to find and extract sufficient amounts of data for their analyses. Typically, such corpora provide only

very limited information about the social characteristics of their speakers, and in some cases they provide no such information whatsoever. As a result, there is a trade‐off between the quantity of data that one must examine in order to acquire enough data for one’s study and the information about external factors that is made available in the corpora. And unlike the case of phonological variation, where studies within what Eckert (2005) has called the “Third Wave” of variation analysis now focus on the speech of small groups or even that of a single speaker, and specifically on how speakers deploy phonological variables strategically and dynamically for stylistic effects, at the morphosyntactic level such studies are for the most part untenable. This is not to say, however, that external factors do not play a role, and perhaps even an important role, in instances of morphosyntactic variation. What it means is that when social factors are included in quantitative analysis of morphosyntax, they are usually just elements of what Eckert calls speakers’ “social addresses”: age, gender, education level, social class, etc. In‐depth social analysis of the kind encompassed by “Third Wave” studies, which typically employs ethnographic analysis of speakers as they interact in communities of practice, and looks in particular at how speakers use linguistic resources in order to both reflect and create meaning in their everyday lives (cf. Coupland 2007), is just not feasible when working with multimillion‐word corpora.

In what follows, we will look in some detail at three recent studies of morphosyntactic variation, to obtain a picture of what kinds of factors have been considered in such studies and also how they have been interpreted.
In each case, the factors represent operationalizations of hypotheses about the effects of contextual configurations on variant choice, and variable rule analysis is employed to disentangle the relative importance of the distinct factors, by ordering in hierarchical fashion both the individual factors that make up one factor group (as indicated by probability weights), and also the different factor groups themselves (as ordered by the range between the highest and lowest weights within a given factor group).

Epistemic Adverbs and Mood Choice in Three Spanish Dialects

For our first case study, we will consider variation between indicative and subjunctive mood. Mood choice in Spanish (and other languages) is a longstanding case of great interest for variation studies. Prior research on this variable, however, has focused mainly on the licensing of mood and mood contrasts in subordinate clauses (e.g. Lavandera 1975; Silva‐Corvalán 1994; Dunlap 2006) such as in conditional sentences, essentially ignoring factors affecting mood variation in main clauses. One main clause context where variation between indicative and subjunctive moods can be found in Spanish is in the scope of a small set of adverbs expressing notions of possibility and probability. Traditional grammars and Spanish‐language textbooks typically describe the subjunctive mood as “optional” in such contexts, and describe the difference between indicative and subjunctive as one of greater or lesser speaker certainty. Thus, in the same sentence, such as the one in (1), either of the two verb forms is grammatical, but the choice of subjunctive would convey greater speaker

uncertainty or doubt, while the indicative version would convey lesser uncertainty on the speaker’s part (cf. Haverkate 2002:34).2

(1) Tal vez el semestre termine (SUBJ)/termina (IND) hoy.

The problematic nature of this claim lies in its circularity: the speaker is using the subjunctive mood because she is more uncertain, and she is (or must be) more uncertain because she has chosen the subjunctive. The same circular reasoning would apply for the indicative with respect to greater speaker certainty. Obviously, the presence of the epistemic adverb tal vez already conveys some degree of uncertainty, so to say that the choice of indicative is to convey greater certainty is also problematic, since it conflicts with the semantic content expressed by the adverb. It is precisely in instances like these, where commonsense intuitions fail to provide an independently verifiable account of the variation, that multivariate analysis is called for. However, previous studies of mood variation with these adverbs have approached the issue by using a forced choice task (Studerus 1995), by studying mood choice by prominent authors (Woehr 1972; Renaldi 1977), or by limiting quantitative analysis to raw frequency counts (DeMello 1995). To my knowledge, there have been no large‐scale multivariate analyses of this realm of mood variation in Spanish.

To help redress this lack of prior analyses, King, McLeish, Zuckerman, and Schwenter (2008) studied variable mood choice with over 3000 tokens of five adverbs of possibility/probability (tal vez, quizá, quizás, posiblemente, probablemente), taken from three dialects of Spanish (Argentinian, Mexican, and Peninsular).3 All the tokens were extracted from the online Corpus de Referencia del Español Actual (CREA), which is available publicly on the website of the Real Academia Española (www.rae.es).
Each token was coded for nine independent variables: adverb, polarity, tense/aspect of verb, adjacency of adverb and verb, temporal reference of the verb, person/number, dialect, style, and verb. However, as one might expect, the tense/aspect of the verb interacted heavily with the temporal reference of the verb, which was a factor group that was included in order to take into account form‐function mismatches between tense and temporal reference, such as when a present tense form is used with future reference (Mañana comemos sopa). Separate analyses with each of these factors but not the other revealed that the temporal reference factor group was the most relevant to the variation; as a result the tense/aspect group was discarded in the final analysis. Table 1 presents the results of a multivariate analysis of mood choice with the five adverbs studied, including the overall results and the comparative results across the three dialects.

2 The scope of the adverb is crucial here: verbs preceding the epistemic adverb occur exclusively in the indicative: El trimestre termina/*termine tal vez hoy.

3 These dialects were chosen mainly on the basis of the number of tokens available for each of them. Many other dialects were not useable due to their low numbers of tokens for one or more of the adverbs under study.

Table 1: Significant Factors for the Choice of Subjunctive Mood with Five Epistemic Adverbs, Overall and Individual Country Results. Nonsignificant factor groups are in brackets [ ].

                     OVERALL     Argentina   Spain       Mexico
N                    3022        982         1083        957
Input                .49         .62         .44         .42
% Subj               49.5%       60.3%       44.7%       43.9%
LL                   -1825.62    -560.36     -694.51     -600.38

Temp. Reference
  Present            .63         .71         .61         .62
  Future             .41         .31         .42         .47
  Past               .28         .23         .30         .29
  Range              35          48          31          33
Adverb
  tal vez            .60         .66         .60         .58
  quizás             .53         .55         .54         .52
  quizá              .50         .38         .50         .57
  posiblemente       .47         .42         .47         .53
  probablemente      .39         .45         .39         .32
  Range              21          28          21          26
Country
  Argentina          .62         N/A         N/A         N/A
  México             .44
  España             .43
  Range              19
Person
  2nd                .55         [.69]       [.38]       .67
  Impersonal         .54         [.59]       [.55]       .57
  3rd                .50         [.49]       [.50]       .50
  1st                .42         [.48]       [.45]       .37
  Range              13                                  20
Adjacency
  adjacent           [.53]       [.55]       .53         [.53]
  non‐adjacent       [.49]       [.49]       .46         [.47]
  Range                                      7
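As an illustration of how the Range orders factor groups, the following minimal sketch recomputes the ranges for the OVERALL column of Table 1 and sorts the significant factor groups by them. The weights are copied from the table; the function and variable names are hypothetical and are not part of the original analysis, which was run in a variable rule program rather than in Python.

```python
# Factor weights copied from the OVERALL column of Table 1.
# The nonsignificant (bracketed) Adjacency group is left out.
overall_weights = {
    "Temporal Reference": {"Present": 0.63, "Future": 0.41, "Past": 0.28},
    "Adverb": {
        "tal vez": 0.60,
        "quizás": 0.53,
        "quizá": 0.50,
        "posiblemente": 0.47,
        "probablemente": 0.39,
    },
    "Country": {"Argentina": 0.62, "México": 0.44, "España": 0.43},
    "Person": {"2nd": 0.55, "Impersonal": 0.54, "3rd": 0.50, "1st": 0.42},
}


def factor_group_range(weights):
    """Range = (highest weight - lowest weight) x 100, rounded to an integer."""
    return round((max(weights.values()) - min(weights.values())) * 100)


def rank_factor_groups(groups):
    """Order factor groups from largest to smallest range."""
    ranges = {name: factor_group_range(w) for name, w in groups.items()}
    return sorted(ranges.items(), key=lambda item: item[1], reverse=True)


for name, rng in rank_factor_groups(overall_weights):
    print(f"{name}: Range {rng}")
```

The resulting order (Temporal Reference, Adverb, Country, Person) matches the order in which the significant factor groups are listed in the table.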

As can be seen from the input values across the three dialects and the Country factor group, only Argentina (.62) favors the subjunctive overall. Across all three dialects, however, the Temporal Reference FG is the one that has the greatest effect on the variation, as measured by the Range. And within this FG, there is a clear asymmetry with respect to the temporal

reference of the verb in the scope of the adverb: when the verb refers to a situation in the present, it creates a strong favoring context for the subjunctive (.63 in the Overall results). By contrast, both future (.41) and past temporal reference (.28) strongly disfavor the subjunctive and favor the indicative mood.

The cross‐dialectal comparison in Table 1 was then supplemented with a separate multivariate analysis of each of the five adverbs studied, again using the choice of mood (subjunctive vs. indicative) as the dependent variable, as shown in Table 2. The goal of this analysis was to uncover the similarities and differences among the adverbs as members of the same functional field, i.e. that of adverbial expressions of epistemic uncertainty, with respect to their licensing of indicative and subjunctive moods.

Table 2: Varbrul Analysis of the Factors Affecting Mood Choice with Five Epistemic Adverbs in Spanish

                     Probab.     Posib.      Quizá       Quizás      Tal Vez
N                    631         497         645         628         549
Input                .36         .45         .49         .53         .62
LL                   -358.71     -271.82     -399.11     -403.10     -385.70

Temporal Reference
  Present            .70         .69         .64         .58         .58
  Future             .42         .28         .24         .55         .51
  Past               .22         .26         .31         .30         .27
  Range              48          43          33          28          31
Dialect
  Argentina          .70         [.58]       [.54]       .62         .65
  Spain              .39         [.43]       [.49]       .45         .44
  Mexico             .40         [.51]       [.48]       .43         .41
  Range              32                                  19          24
Person
  Impersonal         [.53]       .61         [.48]       [.59]       [.51]
  3rd                [.49]       .52         [.50]       [.50]       [.50]
  1st & 2nd          [.53]       .24         [.49]       [.44]       [.47]
  Range                          37
Mode
  Oral               .61         .64         .40         [.48]       [.46]
  Written            .47         .47         .57         [.51]       [.51]
  Range              15          17          17
Adjacency
  Adjacent           [.46]       [.55]       .57         .57         [.49]
  Non‐Adjacent       [.55]       [.45]       .45         .45         [.52]
  Range                                      12          12


A common thread uniting all five adverbs was that in each case the factor group that showed most impact on the variation (as measured by the range between the highest and lowest factor weight) was Temporal Reference. As noted above, this was the same factor group found to most affect variation in mood choice across the three dialects studied. What is more, present temporal reference was for every single adverb a highly favoring context for the subjunctive. Only in the case of future temporal reference was there any deviation from the pattern in Table 1, where future reference disfavors the subjunctive, since both quizás (.55) and tal vez (.51) showed slightly favoring probabilities for the subjunctive in future temporal reference contexts.

In addition, our analysis revealed that the individual adverbs showed widely varying rates of subjunctive use, permitting the construction of an ordered epistemic scale for each dialect. When examining the variation cross‐dialectally, the ordering of the adverbs along this scale differed, most notably in Argentinian Spanish. A separate chi‐square revealed Argentinian Spanish to be significantly different from the other two dialects at the p [...]

[...] animate > inanimate (see Comrie 1979, Næss 2007)
• Specificity of Direct Object – is the referent of the DO uniquely identifiable (cf. Laca 2002)
• Definiteness of Direct Object – a discourse pragmatic property, is the DO referent associated with a definite expression that can be identified with an already introduced discourse item (von Heusinger & Kaiser 2003)
• Mass vs. Count Noun
• Number – singular vs. plural
• (Noun) Form of the Direct Object – pronoun, proper name, lexical noun
• Presence of a same‐referent clitic pronoun (“clitic doubling” of the DO) – yes or no

A Varbrul analysis was carried out on the extracted data using GoldVarbX (Sankoff, Tagliamonte, & Smith 2005), resulting in the configuration of significant factor groups presented in Table 3 (Buenos Aires) and Table 4 (Madrid) below:

Table 3: Factors Contributing to the Use of a in Buenos Aires
Input = .302 (39%); Log likelihood = -174.591

                          Prob    % a‐marked    N      % of data
Relative Animacy
  Equal or DO greater     .75     73%           300    49%
  Subject unspecified     .63     27%           44     7%
  Subject greater         .21     4%            270    44%
  Range                   54
Animacy of DO
  Animate                 .77     78%           275    45%
  Inanimate               .27     8%            339    55%
  Range                   50
Clitic Pronoun
  Present                 .89     94%           104    17%
  Absent                  .39     28%           510    83%
  Range                   50
Specificity of DO
  Specific                .61     46%           431    30%
  Nonspecific             .26     22%           183    70%
  Range                   35
Form of DO
  Pronoun/Proper N        .77     76%           152    25%
  Lexical N               .43     27%           462    75%
  Range                   34
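To see how an input value and a set of factor weights jointly yield a predicted rate of a‐marking for one contextual configuration, the sketch below combines them on the log‐odds scale. This assumes the logistic form conventionally associated with variable rule analysis, in which the log‐odds of the input and of each applicable factor weight are summed; the chapter itself does not spell out the formula, so the model form and all function names here are assumptions made for illustration. The numbers are taken from Table 3.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))


def predicted_probability(input_prob, factor_weights):
    """Sum the log-odds of the input and of one weight per factor group.

    A weight of .50 contributes nothing, a weight above .50 raises the
    prediction, and a weight below .50 lowers it.
    """
    log_odds = logit(input_prob) + sum(logit(w) for w in factor_weights)
    return inv_logit(log_odds)


BUENOS_AIRES_INPUT = 0.302  # input value reported with Table 3

# One factor per group: the most favoring and the most disfavoring
# factor in each of the five significant groups of Table 3.
most_favoring = [0.75, 0.77, 0.89, 0.61, 0.77]
most_disfavoring = [0.21, 0.27, 0.39, 0.26, 0.43]

print(predicted_probability(BUENOS_AIRES_INPUT, most_favoring))
print(predicted_probability(BUENOS_AIRES_INPUT, most_disfavoring))
```

Under this assumed model, a context in which every factor weight is .50 simply returns the input value, which is why weights are read as favoring or disfavoring relative to .50.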

Table 4: Factors Contributing to the Use of a in Madrid Input = .179 (33%); Log likelihood = ‐164.494

                          Prob    % a‐marked    N      % of data
Relative Animacy
  Equal/DO greater        .82     75%           198    35%
  Subject unspecified     .61     37%           67     12%
  Subject greater         .25     3%            296    53%
  Range                   57
Animacy of DO
  Animate                 .82     77%           216    38%
  Inanimate               .28     5%            345    62%
  Range                   54
Form of DO
  Pronoun/Proper N        .73     57%           106    19%
  Lexical N               .43     27%           455    81%
  Range                   30
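The factor weights in Tables 3 and 4 can be compared mechanically to check whether the two dialects rank the factors within each shared factor group in the same order. The sketch below is one hypothetical way of doing so; the weights are copied from the two tables, the factor label "Equal or DO greater" is used for both dialects for comparability, and the function names are illustrative only.

```python
# Weights for the three factor groups significant in both dialects,
# copied from Table 3 (Buenos Aires) and Table 4 (Madrid).
buenos_aires = {
    "Relative Animacy": {
        "Equal or DO greater": 0.75,
        "Subject unspecified": 0.63,
        "Subject greater": 0.21,
    },
    "Animacy of DO": {"Animate": 0.77, "Inanimate": 0.27},
    "Form of DO": {"Pronoun/Proper N": 0.77, "Lexical N": 0.43},
}
madrid = {
    "Relative Animacy": {
        "Equal or DO greater": 0.82,
        "Subject unspecified": 0.61,
        "Subject greater": 0.25,
    },
    "Animacy of DO": {"Animate": 0.82, "Inanimate": 0.28},
    "Form of DO": {"Pronoun/Proper N": 0.73, "Lexical N": 0.43},
}


def constraint_ranking(weights):
    """Factors ordered from most favoring to least favoring."""
    return sorted(weights, key=weights.get, reverse=True)


def same_rankings(dialect_a, dialect_b):
    """For each shared factor group, report whether the rankings match."""
    shared = dialect_a.keys() & dialect_b.keys()
    return {
        group: constraint_ranking(dialect_a[group])
        == constraint_ranking(dialect_b[group])
        for group in shared
    }


print(same_rankings(buenos_aires, madrid))
```

A comparison of this kind only looks at the direction of effect within each group; it says nothing about whether the differences in magnitude between the dialects are themselves significant.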

The constraint hierarchies and relative magnitude of the factor groups Relative Animacy, Animacy of the DO, and Form of the DO across the two dialects are remarkably similar. Looking at the internal constraint ranking of individual factors in all three of these FGs, we find the relative ordering of the factors is also the same across the two dialects. Despite these similarities, however, the results also evince some clear differences between the dialects (see the boldfaced FGs from Buenos Aires in Table 3 above). These will be discussed below, after first considering the FGs shared by the two dialects, as summarized in Table 5.

Table 5: Summary of Significant Factor Groups, by dialect

BUENOS AIRES                             MADRID
1. Relative Animacy                      1. Relative Animacy
2. Animacy of DO                         2. Animacy of DO
3. Clitic Pronoun (Presence/Absence)     3. Form of DO
4. Specificity of DO
5. Form of DO

Relative Animacy (of Subject and DO) and Animacy of DO were selected as the two most significant factor groups (as reflected in the Range) in both dialects examined. But how independent are the effects of these two FGs, given that the animacy of the DO is necessarily a subpart of the classification of DOs according to relative animacy? In other words, can these two FGs actually be considered orthogonal in any way, as we would desire

them to be? Or are they FGs that show heavy interaction, and thereby just reflect the same underlying tendency towards a‐marking on animate DOs? To answer this question, crosstabulations of the factor groups Relative Animacy and Animacy of DO were undertaken. The results of these crosstabs show that there is an independent effect of Relative Animacy on inanimate DOs. The overall rate of a‐marking on inanimate DOs is very low in both dialects (8% (26/339) in Buenos Aires; 5% (18/345) in Madrid), but the rate of a‐marking on inanimate DOs when the DO is equal in animacy to the Subject (i.e. both inanimate) is considerably higher in both dialects, at 35% (14/40) for Buenos Aires and 32% (7/22) for Madrid. This suggests an independent effect of Relative Animacy. In addition, the rates of a‐marking found when the DO is of greater animacy than the Subject (i.e. when the DO is necessarily human/animate and the Subject is not), as exemplified in (4) and (5), also suggest an independent effect for Relative Animacy.

(4) Sí, cuáles son los valores que rigen a esa gente (HCBA, 121)
‘Yes, what are the values that govern those people’

(5) [la televisión] reúne a la familia. (HCM, 120)
‘[television] gathers the family’

In Buenos Aires, 26/28 (93%) of such DOs were a‐marked, while in Madrid 10/11 (91%) were. Both of these figures are considerably higher than the overall rate of a‐marking for animate DOs in the two dialects, which was 78% and 77% for Buenos Aires and Madrid, respectively. We conclude, therefore, that there is a clear independent effect of Relative Animacy on a‐marking that goes beyond the marking due strictly to the Animacy of the DO.

The nominal form of the DO was the other factor group favoring a‐marking in both dialects. Table 6 shows the percentage of animate a‐marked DOs based on the form of the noun.
Table 6: Effect of Noun Form on a‐marking of animate DOs, both dialects

                 Lexical Noun      Pronoun/Proper Noun
Buenos Aires     68% (106/157)     92% (109/118)
Madrid           71% (109/154)     92% (57/62)

There are no significant differences between the two dialects for either the Lexical Noun or Pronoun/Proper Noun classes with regard to a‐marking of animate DOs. But as the table shows, a‐marking is much more frequent in both dialects for Pronouns and Proper Nouns than it is for Lexical Nouns. In addition, the rate of a‐marking on animates that are encoded by a Pronoun/Proper Noun in the two dialects considered together is 92%, which is considerably higher than the rate for all animates regardless of Noun Form at 78%.
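The percentages in Table 6, the pooled rates cited above, and the absence of a dialect difference can all be rechecked from the raw counts. The sketch below does so with an uncorrected Pearson chi‐square on each 2×2 table (dialect by a‐marked vs. unmarked); the chapter does not say which test was used, so the choice of test, the 3.841 critical value (df = 1, α = .05), and the function names are assumptions made for illustration.

```python
# (a-marked, total) counts of animate DOs, copied from Table 6.
table6 = {
    "Buenos Aires": {"Lexical Noun": (106, 157), "Pronoun/Proper Noun": (109, 118)},
    "Madrid": {"Lexical Noun": (109, 154), "Pronoun/Proper Noun": (57, 62)},
}

CRITICAL_05_DF1 = 3.841  # chi-square critical value, df = 1, alpha = .05


def rate(marked, total):
    """Percentage of a-marked tokens, rounded to the nearest integer."""
    return round(100 * marked / total)


def pooled_rate(forms):
    """Rate of a-marking over both dialects for the given noun forms."""
    marked = sum(table6[d][f][0] for d in table6 for f in forms)
    total = sum(table6[d][f][1] for d in table6 for f in forms)
    return rate(marked, total)


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def dialect_chi_square(form):
    """Compare Buenos Aires and Madrid on a-marking for one noun form."""
    m1, t1 = table6["Buenos Aires"][form]
    m2, t2 = table6["Madrid"][form]
    return chi_square_2x2(m1, t1 - m1, m2, t2 - m2)


for form in ("Lexical Noun", "Pronoun/Proper Noun"):
    chi2 = dialect_chi_square(form)
    print(form, round(chi2, 3), "significant" if chi2 > CRITICAL_05_DF1 else "n.s.")

print(pooled_rate(["Pronoun/Proper Noun"]))
print(pooled_rate(["Lexical Noun", "Pronoun/Proper Noun"]))
```

Because one expected cell count in the Pronoun/Proper Noun comparison falls just below five, an exact test might be preferred there in a full analysis; the uncorrected statistic is used here only to keep the sketch dependency‐free.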

However, additional crosstabs revealed that, for inanimate DOs across both dialects, there is no discernible effect of Noun Form; all are very rarely a‐marked no matter what their lexico‐morphological encoding (8% overall in Buenos Aires; 5% overall in Madrid).

After considering the factor groups shared across the two dialects, we are now led to ask whether there are independent effects of the additional favoring factors in Buenos Aires Spanish. Or could it be the case that these FGs are simply interacting with another FG, e.g. Animacy, and thereby not contributing separate effects? Let us first consider the effects of Specificity. While it has been argued by some authors that Specificity is a marginal factor in a‐marking (Leonetti 2004, Brugè & Brugger 1996), the data in Table 7 show that the overall rate of a‐marking in the class of animate DOs is significantly different across the two dialects when these are considered in the light of the specific vs. non‐specific distinction. Again, recall that Specificity was only selected as a significant FG in the Buenos Aires data, not Madrid.

Table 7: Rate of a‐marking in animate DOs in the two dialects, by Specificity

                 Specific          Non‐Specific
Buenos Aires     88% (175/200)     53% (40/75)
Madrid           79% (141/178)     66% (25/38)

χ² = 37.33, p