
J. Linguistics 22 (1986), 455-474. Printed in Great Britain

REVIEW ARTICLE

CARLOS GUSSENHOVEN
Instituut Engels-Amerikaans, University of Nijmegen
(Received 27 March 1986)

Elisabeth O. Selkirk, Phonology and syntax: the relation between sound and structure. Cambridge, Mass. & London: MIT Press, 1984. Pp. xvi+476.

1. INTRODUCTION

After the numerous theoretical proposals that have sought in recent years to revise the syntax-phonology mapping of Chomsky & Halle (1968), Selkirk here attempts a new and comprehensive definition of the relationship between syntax and phonology, one that incorporates much, but also leaves out much, of the theoretical apparatus that has been created in the last decade.[1] The most strikingly absent devices are metrical trees (Liberman & Prince, 1977) and phonological domains (Selkirk, 1978). In a bold move, she dispenses with all constituents between the syllable and the Intonational Phrase and makes the metrical grid the pivotal device in her description, taking as her point of departure Prince's (1983) proposal for a grid-only account of stress.

Her conception of the organization of the grammar, an aspect of the account that is treated with great care and explicitness, is set out in Chapter 1 ('The relation between syntax and phonology'). Syntactic surface structure, a representation with conventional labelled bracketing of phonologically specified words, is first provided with intonational structure to produce 'intonated surface structure'. Assignment of intonational structure to syntactic structure means that pitch accents are assigned to words and focus to syntactic constituents, such that the relation between pitch accents and focus is well formed as determined by the Focus Rules, and secondly that the sentence is divided into Intonational Phrases (IPs). Because both Focus Rules and IP formation refer to functional concepts like 'modifier' and 'argument', Selkirk assumes that this is the level of representation upon which meaning-form conditions are defined.

In this review I intend to go through the remaining seven chapters in the order they come in the book, raising questions and providing comment where I think this may be appropriate. Three topics are dealt with in some detail.
These are the 'rhythm rule' (Section 3), intonation (Section 4), and function words (Section 5). The description of word stress is left undiscussed. By way of compensation, I will go a little more deeply into the problem of the 'rhythm rule', and tentatively explore the potential of an alternative approach. Section 2 expresses a preliminary worry about the status of the grid as a form of phonological representation (with thanks to John Ohala).

[1] This research was supported by the Netherlands Organisation for the Advancement of Pure Research.

2. THE GRID AS A FORM OF PHONOLOGICAL REPRESENTATION

Chapter 2 ('Rhythmic patterns in language') discusses the metrical grid as a form of phonological representation. For Selkirk, an utterance is 'sounded' in tune with an idealized rhythmical score, and because that rhythm is hierarchical, with the longer-interval beats coinciding with the shorter-interval ones, the representation of the rhythmical score has the multi-levelled form that it has. This rhythm is abstract, in the sense of 'idealized', a claim that should not worry us, as psychological support for it is provided by Donovan and Darwin (1979), who found that perceived speech rhythm is in fact more regular than the physical rhythm. What is particularly suggestive about the idea of the grid as a rhythmical score and the imagery of things being 'sounded' in tune with it is the notion of 'temporal alignment point', and the analogy it evokes with prominence-centres of (accented) syllables (for this concept and references cf. Buxton, 1983). That is, we may have to stop thinking about duration as something that units, domains, etc. have, and see it rather as the time that elapses between two alignment points of the same rank.

While there is thus at least a potential interpretation of the grid, subsequent discussion shows that the notion of temporal alignment point is allowed to shade off to a vaguely interpreted notion of 'prominence', which is responsible for the fact that there seems to be no principled limit to the way the grid can be enriched in order to describe linguistic phenomena. For example, Selkirk's Nuclear Stress Rule places an extra grid level on the rightmost of a number of highest columns in a domain (say, in And now back to studio three, the word three would have one grid level more than studio). How should this be interpreted?
The 'score' view suggests that the grid column over three is the alignment point of the whole utterance, but this is an unlikely interpretation, not least because Darwin and Donovan (1980) found that the regularization effect is confined to tone groups. Similarly, in this description, a phrase-final preposition is aligned with two levels, plus a following silent demibeat ('silent', because not aligned with a syllable; 'demibeat' = 'first-level grid mark'), while a phrase-final verbal particle is aligned with three levels plus a following silent demibeat. Should we infer that in has different realizations in That's when he flew the PLANE in and This is the hangar he flew the PLANE in? If not, must we assume a set of realization rules whose effect is to neutralize this particular distinction? Or again, the grids for phonological RESEARCH and PHONOLOGICAL RESEARCH (where capitalization symbolizes the presence of a pitch accent) differ not only for phonological but also for research, which has one grid mark more in the second case. What does this tell us about its timing? In I don't CARE what sports contests Kahn tests the stretches contests and Kahn tests quite possibly have slightly different durational characteristics (cf. p. 105, note 5), but the difference in grid shape (cf. (1)) seems extravagant. But how can we tell whether indeed it is?

(1)

[Grid diagrams for sports contests and Kahn tests, including silent demibeats; the grid-mark columns are not recoverable from the source.]

3. THE 'RHYTHM RULE'

The core task of the grid is to account for the rhythmic structure and restructuring of speech. Selkirk presents a very elegant account of how the grid is built up and how restructuring can take place. With essentially the same rules and principles accounting for both word-level and phrase-level stress patterns, the grid is built up cyclically according to a set of Text-to-Grid Alignment (TGA) rules. The net effect of these rules is that a reduced syllable has a demibeat, an unreduced syllable has a 'basic beat' (second level grid mark), and a syllable with word stress a third-level grid mark. Moreover, in any domain, the last highest grid column is increased by one grid mark if there are other grid columns that would otherwise be equally high (Selkirk's version of the Nuclear Stress Rule), while the same goes for the first highest column in compounds (her Compound Rule). Since these rules are impervious to the demands of rhythmicity, another set of rules, the Grid Euphony (GE) rules, are enlisted to adjust the grid, subject to the constraint that on any cycle they may not produce configurations that contradict the TGA rules on that cycle. The GE rules, assumed to be parametric choices out of a range of universal rules, are Beat Movement (BM), Beat Addition (BA) and Beat Deletion (BD). Together they aspire to an ideal alternating pattern. The formulation of the English rules, given in (2), is representative of the uncluttered nature of Selkirk's description. (2)

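[Grid diagrams of the English rules Beat Movement (BM), Beat Addition (BA) and Beat Deletion (BD), each shown as an input grid and its output; the grid-mark layout is not recoverable from the source.]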
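The net effect of the TGA rules, together with the Nuclear Stress Rule and the Compound Rule as summarized above, can be illustrated with a minimal Python sketch. It is not Selkirk's formalism: the function names, the coding of grid columns as integer heights, and the syllable classification used for studio three are assumptions made here for illustration, and both the cyclic derivation and the GE rules are left out.

```python
from typing import List

# Assumed coding: height of the grid column assigned to each syllable type.
LEVELS = {"reduced": 1, "unreduced": 2, "stressed": 3}


def base_columns(syllable_kinds: List[str]) -> List[int]:
    """Demibeat, basic beat, or third-level mark for each syllable."""
    return [LEVELS[kind] for kind in syllable_kinds]


def promote(columns: List[int], rightmost: bool) -> List[int]:
    """Add one grid mark to the last (or first) of several equally high top columns.

    If a single column is already the highest, the grid is returned unchanged.
    """
    top = max(columns)
    positions = [i for i, height in enumerate(columns) if height == top]
    result = list(columns)
    if len(positions) > 1:
        target = positions[-1] if rightmost else positions[0]
        result[target] += 1
    return result


def nuclear_stress(columns: List[int]) -> List[int]:
    """Sketch of the Nuclear Stress Rule: promote the last highest column."""
    return promote(columns, rightmost=True)


def compound_rule(columns: List[int]) -> List[int]:
    """Sketch of the Compound Rule: promote the first highest column."""
    return promote(columns, rightmost=False)


# studio three, with an assumed classification of the syllables of studio
phrase = base_columns(["stressed", "reduced", "unreduced", "stressed"])
print(nuclear_stress(phrase))
```

On these assumptions the column over three ends up one level higher than the highest column of studio, which is the configuration questioned in Section 2.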

For the purposes of rhythmic structure above the word, the grid is further provided with 'silent' demibeats (first-level grid positions), such that the number of silent demibeats correlates with the magnitude of the syntactic boundary (for an example cf. (1)). Another potentially interesting feature is the status of pitch accents, which are freely assigned. While, naturally, the association point within the word they are assigned to is determined by the highest grid level, pitch accents are not in any way constrained by the grid, and are thus correctly divorced from the morphosyntactically determined stress pattern of the utterance. Rather, they determine the shape of the grid, in that any accented syllable has a higher grid column than any other syllable.

The chapter on the 'rhythm rule' (Chapter 4: 'Phrasal rhythmic prominence') begins with a useful discussion of the proposals of Liberman and Prince (1977) and Kiparsky (1979), and their shortcomings. Unfortunately, unlike Chapter 3 ('Word stress and word structure in English'), in which a well-documented analysis of word stress is given, Chapter 4 works with a restricted set of examples. The problem here may be the lack of a canon of data of the type that has been in existence for word stress since Chomsky & Halle (1968), on which analyses can be tested. Prince (1983), however, does provide four types of nominal structures and a discussion of how the 'rhythm rule' behaves in each of them. Let us therefore use his four types as a basis for the discussion and see how Selkirk's description deals with them.

In Type (a), a right-branching NP, exemplified by thirteen Japanese bamboo tables, her BM gets stuck after the first application in bamboo tables, cf. the circled grid mark. The syllables -nese and bam- now have equal prominence levels, as illustrated in (3). Note that application of BD or BA would only aggravate the situation.

(3) [Grid for Thirteen Japanese bamboo tables, with one grid mark circled; the grid-mark columns are not recoverable from the source.]

Selkirk points out (189) that BM may sometimes not apply because the word within which the shift is to take place is not accented. When a pitch accent is assigned to the word in question, and the grid is adjusted accordingly, BM can take place. Observe that this description accounts for Prince's Japanese bamboo (a phrase, not a section from (3)): in Japanese BAMBOO BM is not satisfied; in JAPANESE BAMBOO the columns over -nese and -boo are one level higher, enough for BM to cause stress shift. As Selkirk points out, in the former case stress shift does not give the preferred pronunciation. In the case of (3), however, we would, quite contrary to the spirit of her proposal, have to accent bamboo and tables, but NOT Japanese, in order to facilitate BM in the latter word (cf. (4)). (The pitch accent (pa) in bamboo is de-linked, and reassociated with the highest grid level after BM has taken place.)


(4) [Grid for (Thirteen) Japanese bamboo tables; the grid-mark columns are not recoverable from the source.]

It is pointed out that the failure of Selkirk's description in (3) is not general for Type (a) data. It only arises in right-branching structures in which the branches themselves branch. In Farrah Fawcett Majors, for instance, the correct pattern is derived thanks to the application of BA (cf. Hayes, 1984).

Type (b), containing a left-branching, finally accented, modifier, exemplified by (my example) New York Times Classified, works fine in Selkirk's description, and is fully exemplified on pp. 194 ff. Because of the cyclic application, BM first takes place in the modifier (from York to New) and then in the NP (from Times to New).

Type (c), containing a right-branching modifier, raises questions again. The structure is exemplified by one thirteen Jay Street and Maine-New York railroad, where the patterns of thirteen and New York characteristically fail to undergo stress shift (also noted in Giegerich, 1985: 208). Prince points out that the data are subtle: the examples should be compared to (my examples) John's thirteen cars and the main New York Railroad. Type (c) happens to be fully exemplified on pp. 177 ff., where it turns out that four patterns can be derived. To save space I give numerical versions of these outputs in (5).

(5) (a) 3231
    (b) 2431
    (c) 2341
    (d) 2331

For the examples given in Prince, certainly patterns (5 a) and (5 c) are wrong, and it cannot therefore be said that Selkirk's description deals adequately with this structure. It could be that her series of patterns seems more appropriate for the sort of examples she uses, which are of the type slightly underripe pears and rather lily-white hands. I am not sure, though, that her examples in fact represent a different case from Type (c). Let's first replace rather in the second example with convincingly (the word rather may naturally remain unaccented, and thus creates a possible source of confusion here; moreover, some speakers may find lily-white as a gradable adjective unacceptable, which engenders a further complication). Comparison of (6 a) and (6 b) does not really produce different results from a comparison of (7 a) and (7 b): silly appears stronger with respect to white than does lily. I would suggest that the best description would be one that predicts stress shift in (6 b) and (7 b), but even stress in (6 a) and (7 a). I conclude that Selkirk's description is not up to Type (c), and also that the multiple output of (5) may not be so desirable a feature of the description as she claims it is. (There are, to be sure, examples of Type (c) structures that do behave like (5 c), to be discussed below.)

(6) (a) convincingly lily-white hands
    (b) seventy silly white hands
(7) (a) one thirteen Jay (Street)
    (b) John's thirteen cars

Type (d), containing a compound modifier and exemplified by Tom Paine Street Blues, kangaroo rider's saddle, is of interest because of the lack of stress shift in the left-hand constituent of the compound, a fact Prince and Selkirk both agree on (p. 176; contra Liberman and Prince, 1977, who allowed stress shift in kangaroo rider's saddle). Selkirk points out that the grid for such structures 'is quite stable'. I do not understand how in her Asian law experts' article, for instance, BD could not apply to facilitate the BM. BD could apply to the circled grid element in (8 a), after which the squared ones meet BM (cf. p. 69).

(8) [Grid for Asian law experts' article, with one grid mark circled and two squared; the grid-mark columns are not recoverable from the source.]

Another sense in which the configuration is not stable is that pitch accents could be assigned so as to facilitate BM. If we assign pitch accents to law and article, which causes their highest grid columns to be upped by one level, the way is free for BM from law to Asian, again, in violation of the facts (cf. p. 190).[2] There is, moreover, another problem, one concerning the right-hand constituent of a compound modifier. Here, contrary to fact, stress shift is predicted in words like canteen, bamboo, campaign. In (9), for example, BM would apply to campaign, a situation that BD cannot step in to prevent. As in the case discussed under Type (c), it looks as if we would be better off with even stress.

[2] The Focus Domination of Pitch Accents constraint (282) does not prevent this shift if the accent on law is taken to signal focus for the whole of Asian law, a possibility that Selkirk allows (281). Even if she had disallowed it, though, the argument goes through: we could replace Asian law with a single word like antique, as in Prince's antique shop zoning board, where pitch accents on antique and zoning would lead to BM in antique. Incidentally, I do not believe that if lily-white is entirely focused, only white can have an accent, as Selkirk claims on p. 281.

(9)

[Grid for Republican election campaign blunders; the grid-mark columns are not recoverable from the source.]

This overview of the way Selkirk deals with Prince's four structures reveals serious shortcomings in the description. The question arises whether these shortcomings result from the particulars of the description, or whether they point to a deeper problem with the framework within which it is cast. Certainly the way Selkirk builds up the grid, with the division into TGA rules and GE rules, as well as the cyclic nature of the derivation, strikes one as intuitively satisfying, and I see no evident single detail that can be blamed for the problems. Our scepticism is increased by the consideration that there seems no way in which one of the most important generalizations about the 'rhythm rule' can be incorporated in the description. This generalization, given in (10), is suggested by Prince's Type (d) data, but goes significantly beyond it, in that it covers both compounding and other level-2 word formations (i.e. word formations with Class II affixes like -ness, -ish, -er (agent.), -hood, etc., cf. Kiparsky, 1982).

(10) The stress pattern of a level-2 derivation is inviolable

In Type (d) data it is only the highest grid column that may interact with stresses on higher domains, but WITHIN the compound readjustment takes place 'either not at all, or with palpable reluctance', to use Prince's formulation. In (11 a,b), the sort of compound cases discussed by Prince are given, while (11 c,d,e,f) illustrate how the generalization holds for other level-2 formations. Obviously, a theory that can incorporate (10) gets rid of all the problems discussed for Type (d) data above.[3]

(11) (a) Woodrow Wilson Street general store - Woodrow Wilson Avenue general store
     (b) Thirteen Club festivities - thirteen club membership cards
     (c) California-style fraud - California's fraud
     (d) the New Yorker's budget - New York's budget
     (e) the Forty-Niners' Annual Meeting - the Forty-Ninth Annual Meeting
     (f) Albert Hall-ish architecture - Albert Hall's architecture

[3] Selkirk gives a post-Kafkaesque novel, which is a counter-example to our claim. I find the pronunciation somewhat forced, but the example does illustrate the fact that the constraint may be broken in some types of speech. This does not invalidate it, however, in view of the clear difference in preference for stress shift in the examples in (11). Observe, too, that (10) may give us a way of distinguishing words (e.g. bisexual in bisexual relations) from phrasal compounds (e.g. bilateral in bilateral relations); cf. Giegerich (1985: 216).

Selkirk does not discuss the merits and demerits of a move that takes her 'pitch-accent-first' theory one step further. She correctly rejects Bolinger's view that the 'rhythm rule' should be explained as a preference for a particular intonation contour, the 'hat pattern' (190). However, Bolinger's proposal (1981, 1986: 60) also embodies the claim that what we call stress shift is really the effect of which of a number of possible accent positions is actually chosen for the realization of an accent, which claim can be interpreted as one that is independent of the intonational contour. That is, what if the 'rhythm rule' is really an accent rule in the sense of Goldsmith (1983)? Assuming we have a theory of accent assignment, what facts are unaccounted for if we phrase the 'rhythm rule' as a rule that deletes tonal association points (*'s) in particular configurations?

There is at least one theoretical advantage to such an account. The contribution of intonational structure to stress shift would be expressed purely in terms of accent POSITIONS. Note that * is simply a slot at which lexical insertion (of a tone) is to take place. By contrast, in Selkirk's theory, Pitch Accent Association is a rule inserting the tonal material itself. However, there is no evidence that the choice of tone influences the propensity to stress shift (as she herself points out on p. 191). The presence of the tonal material prior to readjustment suggests that such an effect is at least possible. Moreover, it necessitates the de-linking and subsequent realignment of the tonal material after the readjustment has taken place (cf. (2), and 269 ff., 283). However, there is nothing that could happen 'in between' these operations that could motivate their separation (cf. Poser 1982).
For an accentual account of the 'rhythm rule' to work, we would need a theory of accent assignment that would (i) assign accent to modifiers (lily in lily-white, old in an old man, unless these are [-focus]), and (ii) assign accent to the first unreduced syllable in the word, in addition to one in the word-accent position (i.e. a word like kangaroo, if focused, would have two accents). The latter rule, which we could refer to as Initial Accent Assignment, shows idiolectal and dialectal variation (cf. Hayes, 1984), with words of the phonological type Japanese almost always subject to it, words like antique often, and words like September rarely. Rule (12), if applied cyclically and iteratively from left to right, would then derive a significant part of the 'rhythm rule' data. The maximum domain of the rule would appear to be the phrase.

[4] Experimental evidence for the influence of the left-hand (but not for that of the right-hand) accent was found for Dutch in Gussenhoven (1983 a). Cooper and Eady (forthcoming) failed to find evidence for the influence of the right-hand accent in English.

(12) Rhythm Rule

* → Ø / * __ *
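Read as a rule that deletes an accent (*) when it is flanked by an accent on either side, (12) can be sketched in Python as follows. This reading is an assumption based on the surrounding description of the rule; the data representation, the function names and the convention of rescanning from the left after each deletion are likewise illustrative choices. Cyclic application, the phrasal domain and any rhythmic conditioning are not modelled.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, bool]  # (word, True if the word carries an accent *)


def apply_once(words: List[Word]) -> Optional[List[Word]]:
    """Delete the leftmost accent that has an accent on both sides.

    Returns None when the structural description is not met,
    i.e. when the phrase contains fewer than three accents.
    """
    accented = [i for i, (_, has_accent) in enumerate(words) if has_accent]
    if len(accented) < 3:
        return None
    target = accented[1]  # the leftmost flanked accent is the second one
    result = list(words)
    result[target] = (result[target][0], False)
    return result


def rhythm_rule(words: List[Word], max_applications: int = 1) -> List[Word]:
    """Apply the rule up to max_applications times, left to right."""
    current = list(words)
    for _ in range(max_applications):
        updated = apply_once(current)
        if updated is None:
            break
        current = updated
    return current


def show(words: List[Word]) -> str:
    """Render accented words in capitals."""
    return " ".join(w.upper() if has_accent else w for w, has_accent in words)


phrase = [("New", True), ("York", True), ("Daily", True), ("News", True)]
print(show(rhythm_rule(phrase, 1)))  # accent of York deleted
print(show(rhythm_rule(phrase, 2)))  # accent of Daily deleted as well
```

With a single application the accent on York is removed; a second, optional application also removes the one on Daily. A phrase with only two accents, such as three BLIND MICE with unaccented three, is left unchanged because the rule is not met.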

There is no reason why (12) should not be rhythmically constrained, and be sensitive to the proximity of the flanking accents. For instance, application seems more natural in the Pennsylvania railroad than in the main Pennsylvania railroad (Bolinger, 1986: 68).[4]

What are the empirical advantages of (12)? First, we can eliminate Selkirk's Focus Domination of Pitch Accents constraint, which prevents stress shift in situations like three BLIND MICE and lily-WHITE HANDS, where three and lily are outside the focus (i.e. are 'old information'). These words are not accented, and (12) is not met. Second, the difference between Montana cowboy and good-looking soldier is explained without further ado, as (12) is only met in the second example. Third, no problem can arise in structures like Chinese expert ('expert on Chinese'), as again (12) is not met. Fourth, we account for the fact, noted and explained by Selkirk, that stress shift is disfavoured in phonological RESEARCH (190). Fifth, the account correctly predicts alternative pronunciations. In New York Daily News, for instance, (12) would delete the * of York, giving New York Daily News. Thus it predicts a possible pronunciation with LH's (rises) in the accented positions (used to express incredulity), but not the impossible version with such rises on New, York and News. A second application on the same cycle may produce New York Daily News. Selkirk's description, incidentally, makes the wrong prediction here. If we want BM to apply in this phrase, we need to apply BD first (to the grid mark over Daily), after which BM can go through (from York to New). But this leaves York stronger than Daily, which, as noted, seems wrong. Sixth, we need no version of the NSR. Selkirk repeats the standard claim that the last stress/accent is stronger than the others (154), but I suspect that the main motivation for the rule is that it does a certain amount of work in her description. What intuition or fact is the NSR supposed to capture? Certainly in downstepped contours, naive listeners hardly ever hear the last accent as the strongest. Seventh, most of the cases that were problematic for Selkirk's description are correctly derived. Type (a), like Type (b), is straightforwardly dealt with, in cyclic derivations. Type (d) data are singularly easily dealt with in a word-level accent assignment theory. All we need to do is to postulate a level-2 accent rule, Initial Accent Deletion, which deletes * before *, guaranteeing the inviolability of such formations.

Type (c) data, however, pose two problems, one minor, the other major. The minor problem is that examples of the type [[a[bc]]d] do exist in which stress shift takes place in [bc]. Hayes gives an almost hard-boiled egg and a hundred thirteen men (cf. 16 a).


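(16) (a) [Metrical tree for almost hard-boiled egg, with nodes labelled w and s; the tree structure is not recoverable from the source.]
     (b) [Kiparsky's version of the Rhythm Rule; the formulation is not recoverable from the source.]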

The point must be made, however, that what Selkirk presents as a general type of counter-example to a tree-based description, in particular Kiparsky's version of the Rhythm Rule given in (16 b), really concerns a subset of Type (c) data (as should be clear from the fact that in none of Prince's examples does stress shift apply to the constituent [bc]). That is, in general we find that stress shift does NOT apply to the constituent concerned. Further examples are a Northern Japanese accent, with ethnic Chinese backing, a southern New York style of architecture. What we seem to be dealing with in (16 a), and other examples like it, is a genuine case of prosodic restructuring: they are simply pronounced AS IF they had right-branching structures. (Why such restructuring takes place in certain cases will not concern us here. It is noted, however, that it is de rigueur in common, as opposed to 'street number', numerals (cf. two thousand three hundred and twenty-seven dollars), and apparently adverbs of grade seem to invite it. In any case Hayes' Quadrisyllabic Rule cannot be the whole answer, in view of the minimal pairs that can be formed.) Thus the approach explored here should in principle have no problem with Type (c) data (cf. Selkirk's solution, which generates four patterns, without differentiating between different Type (c) data).

The major problem is that, while (12) correctly predicts the heterophony of examples (7 a) and (7 b) after the application of (12) on the lower cycle, it predicts homophony after the application on the higher cycle. Generally, the problem is that (12) does not have anything to say about the rhythmic properties of unaccented stretches of speech, whether occurring initially, medially (i.e. between accents), or finally.
For example, if there is only a single accent on tables in (3), we may nevertheless feel there are rhythmic beats on the first syllables of thirteen, Japanese and bamboo, even when the phrase is said on a constant low pitch. A similar observation can be made with regard to I don't HAVE thirteen Japanese bamboo tables. It is stressed, however, that, phonetically, we are treading very uncertain ground here. Gimson (1956) refers to Scott's (1939) report that listeners appeared 'most uncertain' as to which sentence was meant when presented with spoken versions of Are you SURE Wood imports wood and Are you SURE wood imports would. More recently, we have Huss (1975), who reports that, while statistically the durational differences between members of pairs like import (noun) and import (verb) spoken in post-nuclear position go in the right direction, the distinction must be considered to be neutralized, because his listeners sometimes unanimously voted for 'noun' when a verb was produced and vice versa. Huss (1978) investigated the same question in more detail. In a perception test with sentences like Now the SOLdiers insult counts and Now the SOLdiers' insult counts, produced by seven speakers of American English, the percentage of correct responses was 52.7 per cent, hardly better than chance. The interesting finding was that in 78.2 per cent of the cases listeners heard the sentence with the noun. It turned out that the explanation lay in the general syllabic rhythm of the carrier sentences. In his sentences, rhythmic alternation (soldiers insult counts) favoured the noun-pattern, quite regardless of whether the item in question was a noun or a verb. Further experimentation showed that the bias in the scores reflected both a perception effect (i.e. the same item produced markedly different results depending on whether it was spliced into the carrier sentence Say the word ___ again or What does English ___ mean?) and a production effect. Thus the underlying word-stress contrast is neutralized, and there is a rhythmic 'overlay' which is independent of whether syllables have word stress.

Now what do we want our surface phonological representation to express? The perception facts? If so, the facts as determined by 'objective' perception of isolated words? Or embedded in sentences whose content is neutral with regard to the pattern in question, as in Huss (1978)? Or embedded in sentences whose structure tells us what pattern to perceive, i.e. 'linguistic' perception? The last position, advocated by Chomsky and Halle (1968: 25) and endorsed by Selkirk (155), is problematic: just how far do we want to go in encoding such morphosyntactic awareness in the surface phonology? However, if we assume that surface forms encode the surface production facts, then the failure of the accentual account to deal with the rhythmic properties of unaccented speech is not so much a problem as an advantage.

4. INTONATION

The description of focus and pitch accent (Chapter 5: 'The grammar of intonation') is crucially not based on the grid. (Recall that Selkirk takes the position that the grid takes pitch accents as input to the grid-building rules.) As she points out, her description of focus is very different from the earlier generative treatments of accent and focus, in particular from those by Chomsky and Jackendoff. There is a clear separation of the representation of pitch accents from that of stress; a recognition that rules relating accents and focus are needed since there is no bi-uniqueness in the relation focus-accent (i.e. the recognition of 'focus structure'); that 'normal stress' is not a theoretical concept; that the fact that native speakers of English can have reliable intuitions about the focus structure of German sentences (e.g. that betrachtet in PETER hat das BUCH betrachtet can be 'new') suggests that these languages have, at least in part, the same rules; that the pitch accent-focus relation is not governed by word order in surface structure, and more. (Many of these positions are familiar from other treatments.)

Roughly, the proposal is this: 'focus' is a property that syntactic representations are tagged with in accordance with the presence of pitch accents on words. Two rules account for the relationship between focus annotation and pitch accents:

(a) a constituent (word) to which a pitch accent is assigned is a focus
(b) a constituent may be a focus if either
    (i) its head is a focus, or
    (ii) a constituent contained within it that is an argument of the head is a focus

Reference to both 'argument' and 'head' in (b) allows the focus to be sent up to a VP both by a pitch accent on a noun (which can pass it on to its NP, which, as an argument, can send it up to the VP) and by a pitch accent on a verb (which can send it up in its capacity as head of the VP). For instance, Rule (b) accounts for the two interpretations of She watched KOJAK, where either Kojak is a focus (i.e. it could be a reply to What did she watch?) or watched Kojak is (e.g. What did she do?). Similarly, we could send up the focus to the VP from the head, as in She WATCHED Kojak. Within the VP focus, we have embedded foci (e.g. Kojak in the broad-focus reading of She watched KOJAK) and/or embedded unfocused constituents (e.g. Kojak in She WATCHED Kojak). Notice that the way (bii) is formalized commits Selkirk to the view that subject-verb sentences with only an accent on the subject (My UMBRELLA'S been found) cannot have the VP in focus, since the argument (umbrella) is not contained within the VP.
This decision is understandable if it is based on the consideration that in subject-verb-object sentences the subject cannot impart focus to the VP (Our NEIGHBOUR'S installed a Jacuzzi), but seems otherwise unmotivated. (Selkirk concedes that SV sentences with just an accent on the subject can be uttered 'out of the blue', but offers no explanation of this fact.) A crucial feature of the description, then, is that focus can be passed on up the syntactic tree (in practice up to the level of the VP) and that within a larger focus there may be focused and unfocused smaller constituents. This notion of embedded foci is particularly useful, so Selkirk argues, in cases where we would wish to characterize the entire VP as focus, but cannot commit ourselves to considering all the constituents within the VP as focus. Thus, both She sent a book to MARY and She sent a BOOK to MARY can have intonational readings in which sent is included in the focus (for example 466


contexts see p. 211) and therefore the VP must be focus. Since they clearly differ where the focus status of a book is concerned, however, we need 'embedded foci'. She then observes that what is a focus and what is 'new' form a one-to-one relation only in the case of arguments: a verb need not be a focus, and yet can be interpreted as 'new' (as in She sent a book/BOOK to MARY), and it can be a focus and yet not be interpreted as 'new', as in Ladd's well-known A: Has John read Slaughterhouse Five? B: John doesn't READ books. Consequently, a Focus Interpretation Principle is needed, which Selkirk tentatively formulates as: F(argument) ⇔ new information. Now, recapitulating the path linking 'new' and 'pitch accent', from a pitch accent we derive a focus on a word (rule (a)). If this constituent is a head or an argument, its focus may be transmitted to directly commanding higher constituents, within which there may be embedded unfocused or focused constituents. In addition, focus is only bi-uniquely related to 'new information' in the case of arguments.

This description raises certain questions. For instance, why go to the trouble of having Rule (b) if only foci on arguments are interpreted as new information? Clearly, what needs to be added (and what is no doubt intended) is that any HIGHEST focus should be interpreted as new information. (This is necessary even in cases where the focus is not transmitted, but the verb must be interpreted as new.) Second, note that in this description we would appear to arrive at three structures for sentences of the type I don't SMOKE. If it is a reply to Cigarette?, the verb passes on its focus to the VP but is not itself interpreted as 'new' (i.e. only the VP is, in line with the analysis of Ladd's John doesn't READ books). If it is a reply to What are some of your GOOD qualities?, it passes its focus on to the VP, which is 'new', but the verb itself is also 'new', so is presumably interpreted as such.
If it is a reply to You don't DRINK, it does not pass on its focus, and is interpreted itself as 'new'. (The fourth possibility, where no 'newness' is interpreted at all, could be ruled out by an overriding principle by which some focus must be interpreted.) Somewhere in the above, I feel, we seem to have left language structure and entered the area of language use. The richness of this description (I am here interpreting Selkirk's description, not reproducing one of her examples) is suspect. Most probably, the reason is that by having both rules that derive focus from pitch accents, and rules that determine which focus means 'new information', we end up with two levels of representation mediating between pitch accents and 'newness'. This gives the theory too many degrees of freedom. It also seems self-defeating: part of the reason the Focus Rules are there is to capture our intuition that unaccented verbs can be new, while the Focus Interpretation Principle tells us we cannot tell newness from a verb anyway. Selkirk stresses that her proposal is a tentative one, and is not intended to provide the answers to all problems. Granting this, it cannot be very encouraging that, in spite of the richness of the theoretical apparatus for dealing with verbs, the proposal does not really come to grips with the focus properties of verbs.
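The way Rules (a) and (b) license focus can be made concrete with a small sketch. This is only an illustration under simplifying assumptions, not Selkirk's formalization: the `Node` class, the `may_be_focus` function and the `vp` helper are hypothetical names, arguments are modelled as direct daughters of the phrase, and the function only reports whether a constituent MAY be a focus (the rules are permissive, not obligatory).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A constituent: words carry `accented`, phrases carry `head`/`arguments`."""
    label: str
    accented: bool = False                    # pitch accent on a word
    head: Optional["Node"] = None             # head daughter, if any
    # Arguments of the head contained in this constituent (assumed to be daughters).
    arguments: List["Node"] = field(default_factory=list)


def may_be_focus(node: Node) -> bool:
    # Rule (a): a word to which a pitch accent is assigned is a focus.
    if node.accented:
        return True
    # Rule (b i): a constituent may be a focus if its head is a focus.
    if node.head is not None and may_be_focus(node.head):
        return True
    # Rule (b ii): ... or if a contained argument of the head is a focus.
    return any(may_be_focus(arg) for arg in node.arguments)


def vp(verb_accented: bool, object_accented: Optional[bool]) -> Node:
    """Build a toy VP; object_accented=None means the VP has no object."""
    verb = Node("V", accented=verb_accented)
    args = []
    if object_accented is not None:
        noun = Node("N", accented=object_accented)
        args.append(Node("NP", head=noun))
    return Node("VP", head=verb, arguments=args)


watched_KOJAK = vp(verb_accented=False, object_accented=True)   # She watched KOJAK
WATCHED_kojak = vp(verb_accented=True, object_accented=False)   # She WATCHED Kojak
been_found = vp(verb_accented=False, object_accented=None)      # My UMBRELLA'S been found
```

On this sketch the VP of She watched KOJAK may be a focus although its verb may not, the object NP of She WATCHED Kojak is an embedded unfocused constituent, and the VP of My UMBRELLA'S been found cannot be a focus, because the accented subject is not contained within it.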


Leaving this general problem aside, I would like to take up two minor points. First, by analysing e.g. by Beethoven in (12) as a focused PP with an embedded unfocused NP (224), Selkirk implicitly gives up the idea that languages like English and German have the same focus structures. What would be the focus in the German equivalent (13), or in the Dutch (14)? Would it not be better to say that the three languages here realize the same focus (namely, on the polarity of the embedded clause) by different accent placements, as in Gussenhoven (1983b)?

(12) I didn't know it was BY Beethoven
(13) Ich wußte nicht daß es von Beethoven geschrieben WAR (? DASS)
(14) Ik wist niet DAT het van Beethoven was (or WAS)

Second, while Selkirk concedes that her description is not up to Bresnan sentences of the type John asked what BOOKS Helen had written and John asked what Helen had WRITTEN, I fail to see how it accounts for the Bresnan sentence I have INSTRUCTIONS to leave (237). Selkirk, now assuming that if the verb is not separately focused it is 'old', predicts focus for the whole VP and 'old information' for the verb in both the complement reading (as an answer to e.g. Why are you going?) and in the relative clause reading (as an answer to e.g. What are you doing here?). But this seems wrong: only in the former case is to leave 'old information'.

Intonational phrasing is dealt with in a final, again tentative, section of this chapter. Selkirk's account is commendable in that it (a) defines what an Intonational Phrase (IP) is in terms of linguistic units (any stretch ending in a boundary tone) and (b) states a rule defining well-formed IPs. Like the Focus Rules, this rule refers to functional concepts, Selkirk's position being that IPs are not syntactically defined. The account sails through examples like This is the cat that ate the rat that stole the cheese, but is stymied by the fact that non-restrictive and restrictive modifiers behave differently.
There is one decision in the description that may raise questions. Selkirk follows Bing (1979) in reserving separate IPs for vocatives and other unaccented and 'extra-sentential' material (e.g. Minnie cried in 'The WHOLE NIGHT!', Minnie cried), but not for main clauses following modifiers (e.g. The WHOLE NIGHT Minnie cried), an asymmetry that is clearly unwarranted. There may be differences in junctural timing characteristics between one structure and the next, but any such differences should be accounted for by a theory of syntactic disjuncture, the subject of the next chapter.

Chapter 6 ('Syntactic timing: juncture and the grid') deals with the rule of Silent Demibeat Addition (SDA), and in so doing presents a phonological theory of syntactic disjuncture. An interesting prediction made by the description, which essentially amounts to the placement of silent demibeats after syntactic categories such that the magnitude of the boundary corresponds to the number of demibeats, is that final lengthening and pausing are


manifestations of the same thing. If this prediction is correct, it would be of great importance for phonetic research into syntactic disjuncture, since any observed variability in the duration of the final syllable or in the duration of the pause may be considerably reduced if the durational unit is taken to be the final syllable plus the pause. To support her analysis, Selkirk cites reports by Martin and Lehiste (312) that final lengthening and pausing are in fact perceptually equivalent. The place of the silent demibeats in structural descriptions of phonological rules is further argued for with the help of the English 'rhythm rule' (cf. Marcel proved... and Marcel Proust, where stress shift is normal in the second case, but disfavoured in the first). The discussion is flawed to the extent that Selkirk uses the well-known pair Take Grey to London and Take Greater London to illustrate the effect of SDA after words. The longer Grey relative to Grea- should, at least in part, be accounted for by a difference in syllabic structure: the first syllable of Greater is closed by a fortis obstruent, which shortens the vowel, whereas the syllable Grey is not. A pair like diet ten and die at ten provides a fairer comparison. Selkirk also proposes that rules of sandhi, which in other analyses are constrained either by syntactic constituency or by prosodic-domain constituency, should be made sensitive to silent demibeats. That is, a particular sandhi rule might be 'set' at 9 AT (where AT = Arbitrary Time Unit). If speech tempo, expressed as the timespan of a demibeat, is set at 6 AT, the rule would apply across a boundary marked by one silent demibeat (= 6 AT) but not across one marked by two (= 12 AT), but at a higher tempo, with a demibeat occupying 4 AT, the rule would also apply across a two-demibeat boundary (= 8 AT).
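The arithmetic behind this tempo-sensitive setting can be written out as a brief sketch. The function name `sandhi_applies` and its parameters are hypothetical, and the use of 'less than or equal to' for the comparison is an assumption: the text only tells us that 6 AT and 8 AT fall within a setting of 9 AT and that 12 AT does not.

```python
def sandhi_applies(rule_setting_at: int, demibeat_at: int, silent_demibeats: int) -> bool:
    """Return True if a sandhi rule 'set' at rule_setting_at (in AT) applies
    across a boundary marked by the given number of silent demibeats."""
    # Duration of the boundary: number of silent demibeats times the tempo.
    boundary_at = silent_demibeats * demibeat_at
    # Assumption: the rule applies when the boundary does not exceed the setting.
    return boundary_at <= rule_setting_at


# The cases discussed in the text, with the rule set at 9 AT.
slow_one = sandhi_applies(9, demibeat_at=6, silent_demibeats=1)   # boundary = 6 AT
slow_two = sandhi_applies(9, demibeat_at=6, silent_demibeats=2)   # boundary = 12 AT
fast_two = sandhi_applies(9, demibeat_at=4, silent_demibeats=2)   # boundary = 8 AT
```

The point of the design is that a single numerical setting per rule, together with a tempo value, yields both the syntactic and the rate sensitivity without any direct reference to syntactic constituency.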
The proposal seems an attractive one for sandhi rules that are sensitive to both syntactic constituency and speech tempo (rather than to stylistic register), such as Dutch Regressive Voicing, which applies as a function of syntactic boundary (Loots, 1983) and speech rate (Slis, 1985). Here, Selkirk also makes some interesting comments on Italian raddoppiamento sintattico and French liaison.

5. FUNCTION WORDS

The big question Selkirk is faced with in Chapter 7 ('Function words: destressing and cliticization') is this: is the absence of the reduced forms in sentences like those in (17) to be explained as caused by the syntactic gap appearing after the function word, as originally in King (1970), or by the phrase-final position of the function word?

(17) What were you thinking [of —]PP last night?   *[əv]
     She's not much taller than I [am —]VP         *[əm]
     I saw [his —]NP burn                          *[ɪz]

Changing her position relative to Selkirk (1972), she now claims that a gap-solution has little to recommend it. This is indeed convenient, for it is


easier to get ends of phrases to modify the grid than it is to get syntactic gaps to do so, since Silent Demibeat Addition is already available for this task. Moreover, not all syntactic gaps are associated with rhythmic disturbances, and the modification in the grid would too obviously amount to the diacritic use of a phonological representation. But do the facts really support the conclusion that a phrase-final solution is preferable to a gap-solution? A phrase-final solution encounters the following problems. (i) Personal pronouns that are phrases are phrase-final, yet do destress (as in you [ju] don't like John?). (ii) Object pronouns occur at the end of a VP (as well as being themselves NPs), yet do destress (as in John likes him [im]). (iii) Sentences like (18) would appear to demonstrate that a syntactic gap can block destressing of a function word in an environment in which the function word is not phrase-final.

(18) John is proud of his daughter, and he [is *[z] — of his son]VP, too

Selkirk solves the first problem by declaring phrasal pronouns to be honorarily subject to the Principle of Categorial Invisibility: as a result, no silent demibeat is placed after them, and they will reduce. For object pronouns there remains the problem that they are not just phrases by themselves but are also VP-final, and hence will get assigned silent demibeats after them after all. This she solves by postulating an encliticization rule: Pronoun Encliticization gives pronouns the status of a suffix to a verb or preposition. Both solutions work fine, and the second seems attractive in that it is theory-independent. However, the third problem is not really solved at all. Selkirk tries to neutralize the counter-evidence provided by sentences like (18) by appealing to the explanation she gives for data like (19).
(19) She's a better doctor than she {is / *[z]} a linguist

With regard to (19), she observes that the starred pronunciation is not in fact ungrammatical in everybody's English, and quotes the following sentence from a newspaper advertisement in a footnote: Looks as good as it's fun to play. She proposes that retention of the full form in sentences like these has a rhythmic explanation: because such sentences display 'focus pairing', they naturally acquire a certain cadence, causing speakers to have a rhythmic hiatus before the second of two paired foci. Before the hiatus, the unreduced form of the function word would be preferred. But now, in order to account for (18), she extends the 'paired focus' account to gapped sentences. This is undesirable for the following reasons. First, no dialectal variation has been reported for such sentences. It would appear that in any variety of English the unstressed form is ungrammatical. Second, the 'paired focus' rule would have the odd property of operating over linguistic constructs to which more


than one speaker can contribute. This situation may arise in A: He is proud of his daughter. B: Yes, and he is of his son, too. Third, no focus pairing need be present for the unstressed form to be ungrammatical: He's real PROUD of his son, and I am of MINE. Apparently, sentences like (18) provide genuine counter-examples to Selkirk's theory. What are the arguments against a gap-based solution? First, Selkirk shows that prepositions that function as constituents in verb compounds, as in He flew the PLANE in, are always unreduced, even when no gap follows. As she points out, such constituents need to be treated like real words anyway for the purposes of word stress, so this is not really a problem. Next, not all syntactic empty categories in fact block destressing. In particular, the subject wh-trace after complementizers in sentences of the type the girl who — left, as large as — could be found has no phonological effect. It turns out that this gap can be distinguished from the blocking gaps in a number of ways. Selkirk herself tentatively suggests (374) that it is the subject status of the wh-trace that could do this (cf. also Berendsen, 1985). Presumably, we could also require that the trace should c-command the function word. This points to what is the simplest generalization, one that in a sense exploits Selkirk's observations, namely that the function word should be followed by a gap WITHIN THE SAME PHRASE.

What Selkirk calls Auxiliary Deletion, the more complete segmental reduction of auxiliaries occurring in I'm ill and She'll go, is viewed as a purely phonological (grid-based) rule of encliticization, unlike the syntactically governed encliticizations of to in I wanna, not in I couldn't, them in I found them and have in I could've. While these four encliticization rules refer to lexical or syntactic properties of the host, Auxiliary Deletion applies in a given grid configuration, namely when the auxiliary is stressless and is not preceded by silent demibeats. The preceding syllable necessarily belongs to a monosyllabic pronoun, since only these were earlier declared honorarily subject to the Principle of Categorial Invisibility (i.e. unlike other words, they do not have silent demibeats after them).[5] Auxiliary Deletion applies to the auxiliary and modal forms am, are, is, has, had, have, would, will, and causes them to lose their syllabicity. Her motivation for a rhythmic description is that Auxiliary Deletion affects a SUBSET of the contexts in which auxiliaries are subject to destressing (i.e. there are no cases where the contracted form of has ([s/z]) is allowed, but the destressed form ([(h)əz]) is disallowed). Since Monosyllabic Destressing has a grid-based description, so will Auxiliary Deletion have to have a grid-based description if this generalization is to be expressed, given

[5] This declaration is given on p. 346, where it is said to apply to personal pronouns. Note that for the purposes of Auxiliary Deletion not just personal pronouns but all pronouns are taken to be subject to this Principle: on p. 403 where, how, why are assumed to have been invisible qua word to prevent them from having a third-level grid mark as well as a silent demibeat, a regular feature of all function words, but also qua phrase to prevent Silent Demibeat Addition from applying anyway.



the particular organization of syntax and phonology that Selkirk argues for in her book. After all, it would be theoretically dubious to first map the syntax into a phonological grid, and then write rules that take the grid as input, but IN ADDITION refer to syntax. Therefore, Auxiliary Deletion is formulated as a non-syntactic, grid-based rule, which means in particular that no reference can be made to syntactic properties of the host. Now the only syntactic information concerning the word preceding the auxiliary that has been encoded in the grid is 'phrasal pronoun'. Clearly, that information is an overriding factor: cf. She'll try that and *Mary'll [l] try that. But for the forms 's and 'd, syntactic information crucially does seem to be involved, as illustrated in the examples in (20), most of them taken from Kaisse (1985). Whatever it is (subject-hood, NP c-command over the auxiliary, some notion of government: cf. Kaisse (1985: ch. 3) for discussion), the explanation is clearly unlikely to be rhythmic, as Selkirk suggests on p. 405 on the basis of a consideration of examples like (20g): none of the other examples displays the rhythmic disjuncture that may accompany preposed adverbials like never in (20g).

(20) a. What day is *[z] today? - What day's convenient?
     b. What sign is *[z] Mary? - What sign's appropriate?
     c. How high had *[d] he jumped? - I didn't know how Di'd jumped
     d. So would *[d] I - Joe'd go
     e. Under the rug is *[z] an ugly crack - Under the rug's the safest spot
     f. No way is *[z] he going to do that - No way's better than a personal approach
     g. Never has *[z] he seen her - Leaver's seen her

In order to accommodate the fact that contracted [z/s] and [d] occur even when the preceding element is not a pronoun, as in (20), there is an additional rule that destroys that information if the auxiliary is is, has, had, would, thus predicting Auxiliary Deletion for these auxiliaries regardless of what precedes. A minor point is that auxiliary contraction after non-pronominals takes place regardless of the final segment of the host in the case of is, has, but takes place only after vowels in the case of had, would (Zwicky, 1970): Sue'd go, but *John'd go. Surely, the grid cannot be used for such differentiation? (On p. 401 Selkirk tentatively regards such facts as suppletive in nature.) I would conclude that not even the enriched notion of the grid appears to be the appropriate mechanism for describing the reduction of function words.

6. CONCLUSION

In the preceding sections, considerable scepticism has been expressed about the extent to which Selkirk's account actually describes what it is meant to describe. Yet the overall impression of this book is not unfavourable. It contains the most clearly articulated statement about the organization of syntax and phonology one can find anywhere. This theory, which Selkirk recapitulates in Chapter 8 ('The syntax-phonology mapping'), naturally constrains the way the facts of sentence phonology can be accommodated. The book should therefore primarily be seen as an experiment in applying a coherent theoretical framework to the largest possible domain of data that can reasonably be expected to be accommodated by it. This task is carried out with commendable and enviable care. As a result, in spite of its failure on various counts, the description has produced a number of theoretical positions that are likely to serve as reference points for further research, and it is therefore not a book that should be ignored.

REFERENCES

Berendsen, E. (1985). Tracing case in phonology. Natural Language and Linguistic Theory 3. 95-106.
Bing, J. (1979). Aspects of English prosody. Doctoral dissertation, University of Massachusetts at Amherst. Distributed by Indiana University Linguistics Club, Bloomington, Indiana.
Bolinger, D. (1981). Two kinds of vowels, two kinds of rhythm. Distributed by Indiana University Linguistics Club, Bloomington, Indiana.
Bolinger, D. (1986). Intonation and its parts: melody in language. Stanford, California: Stanford University Press.
Buxton, H. (1983). Temporal predictability in the perception of English speech. In Cutler, A. & Ladd, D. R. (eds.), Prosody: models and measurements. Berlin: Springer. 111-121.
Chomsky, N. & Halle, M. (1968). The sound pattern of English. New York: Harper & Row.
Cooper, W. E. & Eady, S. J. (forthcoming). Metrical phonology in speech production. JML.
Darwin, C. J. & Donovan, A. (1980). Perceptual studies of speech rhythm: isochrony and intonation. In Simon, J. C. (ed.), Spoken language generation and understanding. Proceedings of the NATO Advanced Study Institute held at Bonas, France, 1979. Dordrecht: Reidel. 77-85.
Donovan, A. & Darwin, C. J. (1979). The perceived rhythm of speech. Proceedings of the ninth international congress of phonetic sciences, vol. 2. Copenhagen. 268-274.
Giegerich, H. J. (1985). Metrical phonology and phonological structure. Cambridge: Cambridge University Press.
Gimson, A. C. (1956). The linguistic relevance of stress in English. Zeitschrift für Phonetik und allgemeine Sprachwissenschaft 9. 113-149. Reprinted in Jones, W. E. & Laver, J. (1973). Phonetics in linguistics: a book of readings. London: Longman.
Goldsmith, J. (1984). Tone and accent in Tonga. In Clements, G. N. & Goldsmith, J. (eds.), Autosegmental studies in Bantu tone. Dordrecht: Foris. 20-51.
Gussenhoven, C. (1983a). Stress shift in Dutch as a rhetorical device. Linguistics 21. 603-619. Reprinted in Gussenhoven, C. (1984). On the grammar and semantics of sentence accents. Dordrecht: Foris.
Gussenhoven, C. (1983b). Focus, mode and the nucleus. JL 19. 377-417. Reprinted in Gussenhoven, C. (1984). On the grammar and semantics of sentence accents. Dordrecht: Foris.
Hayes, B. (1984). The phonology of rhythm in English. LIn 15. 33-74.
Huss, V. (1975). Neutralisierung englischer Akzentunterschiede in der Nachkontur. Phonetica 32. 278-291.
Huss, V. (1978). English word stress in the post-nuclear position. Phonetica 35. 86-105.
Kaisse, E. M. (1985). Connected speech: on the relation between syntax and phonology. Orlando, Florida: Academic Press.
King, H. V. (1970). On blocking the rules for contraction in English. LIn 1. 134-136.
Kiparsky, P. (1979). Metrical structure assignment is cyclic. LIn 10. 421-442.
Kiparsky, P. (1982). Lexical morphology and phonology. In Linguistics in the morning calm, edited by the Linguistic Society of Korea. Seoul: Hanshin. 3-91.
Liberman, M. & Prince, A. (1977). On stress and linguistic rhythm. LIn 8. 249-336.
Loots, M. (1983). Syntax and assimilation of voice in Dutch. In Van der Broecke, M. P. R., Van Heuven, V. J. & Zonneveld, W. (eds.), Studies for Antonie Cohen: sound structures. Dordrecht: Foris. 173-182.
Poser, W. J. (1982). Phonological representation and action-at-a-distance. In Van der Hulst, H. & Smith, N. (eds.), The structure of phonological representations (Part II). Dordrecht: Foris. 121-158.
Prince, A. (1983). Relating to the grid. LIn 14. 19-100.
Scott, N. C. (1939). An experiment in stress perception. Le Maître Phonétique 67. 44-45.
Selkirk, E. O. (1972). The phrase phonology of English and French. Doctoral dissertation, Massachusetts Institute of Technology. New York: Garland.
Selkirk, E. O. (1978). On prosodic structure in relation to syntactic structure. In Fretheim, T. (ed.), Nordic prosody, vol. II. Trondheim: TAPIR.
Slis, I. H. (1985). The voiced-voiceless distinction and assimilation of voice in Dutch. Doctoral dissertation, University of Nijmegen.
Zwicky, A. (1970). Auxiliary reduction in English. LIn 1. 323-350.