Automatic Evaluation of two Intonational Phrasing Algorithms for Dutch
Erwin Marsi
Department of Language and Speech, University of Nijmegen
[email protected]

Abstract Intonational phrasing is a research topic in both phonological theory and speech synthesis. Both disciplines can benefit from a systematic evaluation and fine-tuning of intonational phrasing algorithms. This article presents a method for the evaluation of intonational phrasing algorithms. The method allows for a largely automatic evaluation. A number of advantages of this approach are discussed. The article reports on the results for two algorithms for intonational phrasing as applied to Dutch: Intonational Phrase Formation (Nespor & Vogel 1986: Chapter 7) and Association Domain Derivation (Gussenhoven 1988). A number of systematic errors are discussed. In addition, a new definition of phonological phrase formation for Dutch is proposed.

Introduction One of the factors that contributes significantly to the prosodic quality of speech is phrasing, i.e. the way the flow of speech is segmented into chunks by prosodic means1. One of these prosodic means is pitch, perceived as intonation. The chunks delimited by intonation are called intonational domains or intonational phrases; they are separated by intonational boundaries.2 An intonational domain has a coherent intonation contour, while its boundaries are characterized by an intonational discontinuity. Although a considerable amount of linguistic theory regarding intonational phrasing is available, most work is based on a small set of crucial observations and has not been verified on the basis of larger sets of data, e.g. Gussenhoven (1988), Hirst (1993), Ladd (1986), Nespor & Vogel (1986), Selkirk (1984), or has not been systematically compared to competing approaches, e.g. Altenberg (1990), Bachenko & Fitzpatrick (1990), Croft (1995). Moreover, most of them deal exclusively with English. At the applied end of the research spectrum, most phrasing algorithms for speech synthesis rely on general heuristics and produce only approximations of human phrasing, e.g. Dirksen (1993), Monaghan (1989), O’Shaughnessy (1989). Such an approach is reasonable within the context of text-to-speech, as it is still impossible to reliably extract detailed syntactic, semantic and discourse information from unrestricted text. But now that the emphasis of current research on speech synthesis has moved in the direction of spoken language generation and spoken dialogue systems, this parsing problem can be avoided. Within the context of such systems, detailed and reliable information about the linguistic structure is often readily available as a side effect of the generation of sentences and texts. This provides the setting for more advanced phrasing algorithms, which go beyond the level of general heuristics. 
In conclusion, both linguistic theory and spoken language systems could profit from a systematic evaluation and tuning of advanced phrasing algorithms. This article presents a method for systematic evaluation of phrasing algorithms. It reports on the results for two algorithms that both rely, in one way or another, on the notion of a prosodic constituent structure: N&V (Nespor & Vogel 1986: chapter 7) and GUS (Gussenhoven 1988). In order for this comparison to be useful to researchers working on speech synthesis, where the goal is not an approximate and acceptable phrasing, but an exact and optimal one, it is inevitable that a rather detailed error analysis is performed.

Method Overview The basic idea behind this approach to evaluation is similar to the approach in corpus linguistics (Altenberg 1990) or the optimization procedures used in automatic speech recognition: establish an empirical reference and test performance by comparing results to the reference. The method encompasses the following steps: I. Construction of the corpus: assemble a database of sentences; add their syntactic structures and their intonational structures (pitch accents and intonational phrase boundaries). II. Implementation of phrasing algorithms: implement several phrasing algorithms drawn from the literature. III. Evaluation of phrasing algorithms: (a) Automatic evaluation: apply each algorithm and compare the phrasing of the algorithm to the original phrasing; produce a report of the differences, a confusion matrix and an error rate. (b) Human evaluation: based on the results of the automatic evaluation, trace the origin of errors, find the advantages and disadvantages of each algorithm, and determine which algorithm has the best overall performance. These steps will be described in detail in sections 3, 4 and 5. Although the ultimate reason for doing an evaluation is of course to obtain an improved phrasing algorithm in the end, this is not part of the evaluation and will therefore not be discussed in this article. Advantages of automatic evaluation In pilot research, several phrasing algorithms were implemented, applied to (automatically generated) sentences, and evaluated by listening to these sentences as produced by a speech synthesizer. However, it proved to be fairly difficult to judge phrasing algorithms in this way, because of the problem encountered by most developers of synthesis systems: in the long run one’s judgment tends to adapt itself to the peculiarities of the speech synthesizer under development. In the context of intonational phrasing, this means that a developer gets used to the phrasing habits of a particular algorithm. 
One way to avoid this problem is to carry out an evaluation experiment with a substantial number of listeners. This, however, would be a rather time-consuming method of evaluation. An alternative is provided by the method of automatic evaluation as presented above. This method relies on a predetermined reference phrasing, with which the output of the phrasing algorithms can be compared (section 3.2 describes how such a reference was established). It has a number of advantages. First, it renders evaluation less subjective, because a reference is established prior to the actual evaluation. This avoids the problem of habituation mentioned earlier. Second, the evaluation is guaranteed to be fast and reliable, because the output of a phrasing algorithm can be automatically compared to the reference. There is no need to listen to all the sentences again every time another evaluation is required, a procedure that would be time-consuming and prone to human error. Third, automatic evaluation allows for tuning of continuous variables in the algorithms at a later stage of development, e.g. the length of an intonational phrase.
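The automatic comparison of step III(a) can be sketched in a few lines of Python. This is only an illustration, not the implementation used in the study: the label encoding ('#', '|' and '0' for no boundary), the function names and the definition of the error rate as the share of locations with differing labels are all assumptions made here.

```python
from collections import Counter

LABELS = ("#", "|", "0")  # obligatory, optional, no boundary (assumed encoding)


def confusion_matrix(reference, predicted):
    """Count (predicted, reference) label pairs, one pair per word-boundary location."""
    if len(reference) != len(predicted):
        raise ValueError("reference and prediction must cover the same locations")
    return Counter(zip(predicted, reference))


def error_rate(reference, predicted):
    """Share of locations where the labels differ (an assumed, simple definition)."""
    wrong = sum(r != p for r, p in zip(reference, predicted))
    return wrong / len(reference)


def report(matrix):
    """Format the matrix with predictions as rows and the reference as columns."""
    lines = ["pred\\ref " + " ".join(f"{ref:>4}" for ref in LABELS)]
    for pred in LABELS:
        cells = " ".join(f"{matrix[(pred, ref)]:>4}" for ref in LABELS)
        lines.append(f"{pred:<9}{cells}")
    return "\n".join(lines)


# Toy data: five locations, not taken from the corpus.
reference = ["#", "|", "0", "0", "#"]
predicted = ["#", "0", "0", "|", "|"]
matrix = confusion_matrix(reference, predicted)
print(report(matrix))
print(f"error rate: {error_rate(reference, predicted):.2f}")
```

Because the reference is fixed in advance, rerunning such a comparison after every change to an algorithm costs nothing, which is what makes later tuning of continuous parameters feasible.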

Material I: construction of the corpus Text The corpus consisted of 10 descriptions of flowers from a botanical handbook (Forey 1992). Each description consists of 4 to 8 sentences. The total number of sentences is 52 and the total number of words is 907, an average of 17.4 words per sentence. Most sentences are relatively long and have a rather complex syntactic structure. These properties make them a good choice for evaluating intonational phrasing, as their length and complexity are likely to lead to a large number of intonational phrases. In contrast, the discourse structure of the texts is fairly simple, essentially just the introduction of a particular flower and an elaboration on its properties, so that the influence of discourse structure on phrasing was bound to be limited.

Intonational structure A panel consisting of three phonetically experienced native speakers of Dutch was asked to produce an intonational specification of these texts. They worked on a preprocessed version of the texts: commas were removed, semicolons were changed to full stops, and abbreviations and numbers were expanded to their full lexical form. Their task was to specify the intonational structure to be used in a competent reading of the text. This intonational structure included the accent distribution and the boundary distribution, i.e. the locations of pitch accents and intonational phrase boundaries. Boundaries turned out to always coincide with word boundaries. A distinction was made between obligatory boundaries and optional boundaries. The panel solved disagreements by discussion and ultimately a consensus specification was arrived at. In total, there are 161 obligatory boundaries, 42 optional boundaries and 704 locations where no boundary was possible. Figure 1a gives an example.

Syntactic structure The sentences were parsed manually. The manner of syntactic analysis is the result of a compromise between two conflicting demands. The first is to keep the syntactic structures as simple as possible. The second is that all morphosyntactic information that any of the particular phrasing algorithms might need should be included. This amounts to an analysis in terms of major constituents (Sentence, Verb Phrase, Noun Phrase, Prepositional Phrase, Adjective Phrase, ADVerbial Phrase), their lexical heads (Verb, Noun, Preposition, Adjective, ADVerbial) and a number of functional categories (DETerminer, Quantifier, Complementizer, CONJunction). The analysis of clausal structure corresponds roughly to generative accounts of Dutch grammar. A number of features added to nodes extend the basic analysis: word-level features marking auxiliaries, copulas and pronominals; semantic features linking predicates to their arguments; and some exceptional features marking root sentences, nonrestrictive relative clauses and appositional elements. There are no empty nodes or traces. See Figure 1b.

Prosodic structure Phonological phrase formation according to Nespor&Vogel Conceptually, the phrasing algorithms used in this evaluation are embedded in the theory of prosodic phonology (Nespor & Vogel 1986). One of the central claims of prosodic phonology is that speech is hierarchically organized in a limited number of prosodic constituents. By analogy with a syntactic structure consisting of syntactic constituents, an analysis of an utterance in terms of prosodic constituents is called its prosodic structure. For current purposes, only two prosodic constituents are required: the intonational phrase (I) and the phonological phrase (ϕ).
See section 4.1 for the derivation of I. Nespor&Vogel's definition of ϕ is as follows.

(a) de REUzenbalsemien | is uit de HimaLAya naar EuROpa gebracht # en koloniSEERT hier WAterkanten in VEle geBIEden # ook PLAAtselijk in NEderland en BELgie. #
    the reuzenbalsemien is from the Himalayas to Europe brought and colonizes here watersides in many areas also locally in Netherlands and Belgium

(b) [Tree diagram: syntactic structure of the sentence in (a); node labels include S, NP, VP, V, CONJ, DET and N.]

(c) [Tree diagram: prosodic structure of the sentence in (a), with one U node dominating three I nodes.]

Figure 1.1: Example of one of the sentences in the corpus. (a) The accent and boundary distribution as stored in the corpus. Accented syllables are in capitals, optional boundaries are denoted by a vertical bar (|), obligatory boundaries by a hash sign (#). (b) The syntactic structure. (c) The prosodic structure. I (Intonational phrase) and ϕ (Phonological phrase) were automatically derived, U (Utterance) and w (Phonological Word) were added manually.

Phonological Phrase Formation (Nespor&Vogel)
I. ϕ domain: The domain of a ϕ consists of a lexical head X and all words on its nonrecursive side up to another lexical head outside the maximal projection of X.3
II. ϕ construction: Join into an n-ary branching ϕ all words included in a string delimited by the definition of the domain of ϕ.

Lexical heads are content words of the category N, V, or A. Although this definition is claimed to be part of universal grammar, applying it to Dutch reveals several problems. The first problem concerns the notion of ‘nonrecursive side’. Nespor&Vogel’s ϕ formation is not directly applicable, because it requires a specification of the recursive side of a language. It is, however, not clear whether Dutch is right-recursive or left-recursive, as recursive nodes occur on both sides in Dutch syntax. Nespor&Vogel acknowledge this problem (Nespor & Vogel 1986: 186, note 2), but offer no solution. The second problem is one of indeterminacy. Since Dutch seems to prefer recursion on the right-hand side, we might - for the sake of argument - assume that the nonrecursive side is the left-hand side. Consider the syntactic fragment in (1a), consisting of a lexical head V, followed by an NP containing a lexical head A and a lexical head N. If we determine the ϕ domain for red, we get (1b), but if we determine the ϕ domain for flower, we get (1c). Thus the definition of ϕ formation is non-deterministic: it produces incompatible ϕ domains. (1)

a. ... (V carries) (NP (DET a) (AP (A red)) (N flower))
b. ... carries] [ a red ]
c. ... carries] [ a red flower ]

Now assume - again for the sake of argument - that we always prefer the largest ϕ domain. So in (1), we prefer (1c) over (1b). In other words, such a ϕ domain includes all premodifiers of a lexical noun. Now as the number of premodifiers for a noun is in principle unlimited, it follows that the length of such a phonological phrase is in principle unlimited.4 This conclusion is absurd, given
the empirical observations below. Similar arguments can be made by substituting ‘right-hand side’ for ‘nonrecursive side’ or ‘V’ for ‘X’.

Phonological phrase formation for Dutch The problems mentioned above make it impossible to extract empirical predictions that can be verified. Therefore, an alternative definition of ϕ formation for Dutch is proposed below.

Phonological Phrase Formation for Dutch
I. Phonological phrase head: Every word that is
   a. a content word, and
   b. a syntactic head (N, V, A, or ADV), and
   c. not an adjacent modifier
   is a ϕ head.
II. Phonological phrase boundaries: Every location that is
   a. the start of an S; or
   b. the end of a ϕ head that is not the final ϕ head in an S; or
   c. the end of an S
   is a ϕ boundary.
III. Phonological phrase node: A ϕ node is an n-ary branching node directly dominating all words between two successive ϕ boundaries.

The class of content words excludes function words like determiners, quantifiers, complementizers, prepositions, pronouns, auxiliaries, etc. The definition of adjacent modifier is as follows.

Adjacent Modifier
X is an adjacent modifier if there is a Y such that
   a. Y is N, A, or ADV, and
   b. X is adjacent to Y, and
   c. X is dominated by the maximal projection of Y.

This algorithm for phonological phrase formation for Dutch proved to be adequate for the sentences in the corpus, as it produced no obviously ill-formed ϕs. Figure 1c gives the ϕs derived from the syntactic structure in Figure 1b. Notice that above the level of ϕ, there is a recursive level consisting of ϕ' nodes. This level is irrelevant, and therefore deleted, during the evaluation of the N&V algorithm. It is required, however, for evaluation of the GUS algorithm. The details of ϕ' formation fall outside the scope of this article. Suffice it to say that the dominance relations among ϕ's reflect the dominance relations that hold among syntactic constituents.

The remainder of this section will discuss a few examples of ϕ formation according to the new definition. Like syntactic constituents, ϕs cannot be observed directly. Their existence must be deduced from empirical phenomena. In order to test ϕ formation, we need to establish which empirical phenomena characterize a ϕ. Apart from certain segmental processes that take ϕ as their domain of application, the most important ‘diagnostic tool’ to check ϕs for Dutch is the accent distribution. The accent distribution within a ϕ is (optionally) modified by the rhythm rule (Gussenhoven 1991, Booij 1995: 161). Informally stated, the rhythm rule says: delete all accents in a ϕ except the leftmost one and the rightmost one. Thus if the rhythm rule can be applied to a sequence of accents, they must be in the same ϕ. Consider the sentence in (2a). In a normal, wide-focus reading of the sentence, three accents are required: on pen, on dik and on vle. The same information is captured in an NP in (2b). This time, a wide-focus reading requires only two accents: on dik and on pen; the accent on vle is optional. This is attributed to the rhythm rule and we conclude that the three accents on dik, vle and pen are in the same ϕ. This conclusion is confirmed by the definition of ϕ formation for Dutch. Given the syntactic structure in (2c), the ϕ in (2d) can be derived. Notice that the lexical head dikvlezige is an adjacent modifier of penwortel and therefore does not constitute a ϕ head itself. (2)

a. de PENwortel is DIKVLEzig
   the pen-root is thick-fleshy
b. een DIKvlezige PENwortel
   a thick-fleshy pen-root
c. (NP (DET een) (AP (A dikvlezige)) (N penwortel))
d. [ een DIKvlezige PENwortel ]

If we add another adjective, as in (3), deleting both accents on dik and vle is incompatible with a wide focus reading. Instead, we have to retain the accent on dik as in (3b). Given the syntactic structure in (3c), this observation follows from the application of ϕ formation and the rhythm rule (see 3d). Notice that lange meets the requirements for a ϕ head.

(3)
a. # een LANGe dikvlezige PENwortel
   a long thick-fleshy pen-root
b. een LANGe DIKvlezige PENwortel
c. (NP (DET een) (AP (A lange)) (AP (A dikvlezige)) (N penwortel))
d. [ een LANGe ] [ DIKvlezige PENwortel ]

Of course, these examples do not prove that ϕ formation for Dutch is correct, though it can be shown that a range of syntactic configurations is handled correctly in this way.
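The definition of ϕ formation for Dutch can be made concrete with a small Python sketch that reproduces the bracketings in (2d) and (3d). It is not the GRAMTSY implementation: the `Word` record, the `inside` field (a hypothetical flat encoding of "is dominated by the maximal projection of another word") and the simplification that the whole word list forms one S are assumptions made here for illustration.

```python
from dataclasses import dataclass

HEAD_CATS = {"N", "V", "A", "ADV"}  # categories that can be ϕ heads
HOST_CATS = {"N", "A", "ADV"}       # categories Y that can host an adjacent modifier


@dataclass(frozen=True)
class Word:
    form: str
    cat: str
    content: bool = True
    # Indices of OTHER words whose maximal projection dominates this word
    # (hypothetical encoding of the syntactic tree).
    inside: frozenset = frozenset()


def is_adjacent_modifier(i, words):
    """X is adjacent to some Y of category N, A or ADV whose projection dominates X."""
    return any(abs(i - y) == 1 and words[y].cat in HOST_CATS for y in words[i].inside)


def phi_heads(words):
    """Content words that are syntactic heads and not adjacent modifiers."""
    return [i for i, w in enumerate(words)
            if w.content and w.cat in HEAD_CATS and not is_adjacent_modifier(i, words)]


def phi_phrases(words):
    """Split one S (assumed: the whole list) at the end of every non-final ϕ head."""
    heads = phi_heads(words)
    final = heads[-1] if heads else None
    phrases, current = [], []
    for i, w in enumerate(words):
        current.append(w.form)
        if i in heads and i != final:
            phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    return phrases


# (2c): een dikvlezige penwortel
np2 = [Word("een", "DET", content=False, inside=frozenset({2})),
       Word("dikvlezige", "A", inside=frozenset({2})),
       Word("penwortel", "N")]

# (3c): een lange dikvlezige penwortel
np3 = [Word("een", "DET", content=False, inside=frozenset({3})),
       Word("lange", "A", inside=frozenset({3})),
       Word("dikvlezige", "A", inside=frozenset({3})),
       Word("penwortel", "N")]

print(phi_phrases(np2))
print(phi_phrases(np3))
```

In this encoding lange is not adjacent to penwortel, so it counts as a ϕ head and closes the first ϕ, whereas dikvlezige is an adjacent modifier and joins the ϕ headed by penwortel.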

Material II: Implementation of phrasing algorithms The following sections provide an overview of the two phrasing algorithms (N&V and GUS) that were implemented in step 2 of the evaluation. Lack of space makes it impossible to discuss both of them in depth. For the motivation, details and complications, see the original publications. Some of the problems that arose during implementation are mentioned in the end notes. All implementations were written in GRAMTSY (Grammatical Transformational System), a dedicated programming language that supports the use of transformational rules and grammars to manipulate labeled bracket structures.5

Nespor & Vogel (N&V) The intonational phrasing algorithm by Nespor and Vogel (1986: chapter 7) maps morpho-syntactic structure to Is. As input it requires the syntactic structure of an utterance, where root sentences and appositional elements can be identified, as well as an analysis of the utterance in terms of ϕs. The output consists of the locations of optional and obligatory I boundaries. The phrasing algorithm proceeds in two steps. In the first step, called I-formation, the initial boundaries are established.

Intonational Phrase Formation
Join into an n-ary branching I:
   a. all the ϕs in an appositional string;
   b. any remaining sequence of adjacent ϕs in a root sentence.

The set of appositional strings includes parenthetical expressions, nonrestrictive relative clauses, tag questions, vocatives, expletives, and certain moved elements. A root sentence is a node S that is not dominated by a node other than S.6 The next step, called Restructuring of Intonational Phrases, consists of optionally dividing Is into smaller Is by adding boundaries. Is may be restructured under the influence of factors like length, speech rate, style or contrastive prominence. Although this introduces a considerable amount of variability in intonational phrasing, there are nevertheless semantic and syntactic constraints on restructuring. According to the following definition, restructuring can only occur at certain syntactically defined positions.

Intonational Phrase Restructuring (optional)
Restructure after an NP if this does not interrupt another NP and does not separate an internal argument from its verb.7 Restructure before an S if this does not interrupt another NP.

In addition, there is a special provision for restructuring listed items.

List Restructuring (optional)
In a sequence of more than two constituents of the same type, i.e. x1, x2, ... xn, restructure before each repetition of the node X, i.e. before x2, x3, ... xn.

Applying the N&V algorithm to the sentence in Figure 1 gives the phrasing in (4). Notice that there is no optional boundary after the NP Europa, as this would separate the verb gebracht from its internal argument naar Europa. Also, no boundary occurs after the NP Nederland, as this would interrupt the dominating NP. (4)

de REUzenbalsemien | is uit de HimaLAya | naar EuROpa gebracht # en koloniSEERT hier WAterkanten in VEle geBIEden # ook PLAAtselijk in NEderland en BELgie. #

Gussenhoven (GUS) The phrasing algorithm by Gussenhoven (Gussenhoven 1988, Gussenhoven & Rietveld 1992) is part of a linguistic theory of intonation within the framework of autosegmental phonology. Gussenhoven considers an intonational domain as the stretch of speech in which the tones of one or more pitch accents are associated. It is therefore called an Association Domain (AD). Gussenhoven argues that an accent’s AD is primarily determined by the accent distribution,
more specifically, the location of the next accent in the utterance. Accent distribution is a factor that is not accounted for in the derivation of Is in prosodic phonology. In fact, few other phrasing algorithms include the accent distribution as a relevant factor. Although ADs do not belong to the prosodic hierarchy themselves, but are autonomous units, they are related to the prosodic structure in the sense that the end of an AD always corresponds to the end of a prosodic constituent. ADs are derived by means of an algorithm that links them to both the accent distribution and the prosodic constituent structure of an utterance. The input of this phrasing algorithm consists of an accent distribution and a prosodic structure. As already mentioned in section 3.4, this prosodic structure is somewhat different from the one assumed by N&V. It does not conform to the Strict Layer Hypothesis; there is recursion on the ϕ' level.8 Furthermore, its Is remain as derived by I-formation; there is no subsequent I-restructuring. Given these assumptions regarding the prosodic structure that serves as the input, the output of the algorithm consists of the locations of optional and obligatory AD boundaries. As in the previous algorithms, phrasing proceeds in two steps. The first step, known as AD-derivation, derives for every accent a corresponding unique boundary.

Association Domain Derivation
For every accent, place a boundary at the end of the highest prosodic constituent that dominates it, but which does not dominate the next accent.

The second step, AD-restructuring, joins one or more ADs to a single one by deleting boundaries. This accounts for the observation that often several accents share a common AD. Restructuring is optional. Furthermore, there is a constraint on the order of restructuring. Restructuring must give precedence to those boundaries that coincide with lower ranking prosodic constituents only (Gussenhoven & Rietveld 1992: 288).
This implies, for example, that ϕ-final AD-boundaries must be deleted before I-final AD-boundaries. Still, this formulation of restructuring is not specific enough to build an implementation on, because in principle every boundary can be deleted. Also, there is no notion of optional boundary, which makes it difficult to compare the results to those of the two other algorithms. What is needed is some kind of principle to determine which boundaries will be deleted or turned into optional ones. For the purpose of evaluation, the following implementation of restructuring was used, which relies on both accent distribution and prosodic constituency.

Association Domain Restructuring
   a. delete all boundaries within a prosodic constituent that contains at most three accents;
   b. optionally delete all boundaries within a prosodic constituent that contains four accents.

Notice that ‘optionally delete’ means that an obligatory boundary becomes an optional one. The result of applying the AD-derivation to the sentence in Figure 1 is (5a); after AD-restructuring we get (5b). (5)

a. de REUzenbalsemien # is uit de HimaLAya # naar EuROpa gebracht # en koloniSEERT # hier WAterkanten # in VEle # geBIEden # ook PLAAtselijk # in NEderland # en BELgie. #
b. de REUzenbalsemien is uit de HimaLAya naar EuROpa gebracht # en koloniSEERT | hier WAterkanten in VEle geBIEden # ook PLAAtselijk in NEderland en BELgie. #
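The two GUS steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the GRAMTSY implementation: the prosodic structure is a plain nested list, a word counts as accented if it contains at least two adjacent capitals, and the coarse bracketing of the Figure 1 sentence used below is hypothetical (it only keeps the constituents needed here, and the final full stop is dropped).

```python
import re


def leaves_and_spans(tree):
    """Flatten a nested-list tree into words and the (first, last) word span of every node."""
    words, spans = [], []

    def walk(node):
        if isinstance(node, str):
            words.append(node)
            spans.append((len(words) - 1, len(words) - 1))
            return
        start = len(words)
        for child in node:
            walk(child)
        spans.append((start, len(words) - 1))

    walk(tree)
    return words, spans


def accented(word):
    # Assumed convention: accented syllables are written in capitals.
    return re.search(r"[A-Z]{2,}", word) is not None


def derive(words, spans):
    """AD-derivation: one boundary per accent, at the end of the highest constituent
    that contains the accent but not the next accent."""
    accents = [i for i, w in enumerate(words) if accented(w)]
    boundaries = set()
    for k, i in enumerate(accents):
        nxt = accents[k + 1] if k + 1 < len(accents) else None
        candidates = [(s, e) for s, e in spans
                      if s <= i <= e and (nxt is None or not s <= nxt <= e)]
        _, end = max(candidates, key=lambda span: span[1] - span[0])
        boundaries.add(end)
    return accents, boundaries


def restructure(spans, accents, boundaries):
    """AD-restructuring as implemented for the evaluation: delete boundaries inside a
    constituent with at most three accents, make them optional inside one with four."""
    marks = {}
    for b in sorted(boundaries):
        counts = [sum(s <= a <= e for a in accents) for s, e in spans if s <= b < e]
        if any(c <= 3 for c in counts):
            continue  # boundary deleted
        marks[b] = "|" if any(c == 4 for c in counts) else "#"
    return marks


def render(words, marks):
    out = []
    for i, w in enumerate(words):
        out.append(w)
        if i in marks:
            out.append(marks[i])
    return " ".join(out)


# Hypothetical, simplified prosodic bracketing of the sentence in Figure 1.
tree = [
    ["de", "REUzenbalsemien", "is", "uit", "de", "HimaLAya", "naar", "EuROpa", "gebracht"],
    [["en", "koloniSEERT"], ["hier", "WAterkanten", "in", "VEle", "geBIEden"]],
    ["ook", "PLAAtselijk", "in", "NEderland", "en", "BELgie"],
]
words, spans = leaves_and_spans(tree)
accents, derived = derive(words, spans)
final = restructure(spans, accents, derived)
print(render(words, {b: "#" for b in derived}))
print(render(words, final))
```

With this bracketing the first printed line has the boundary locations of (5a) and the second those of (5b). Deletion is given priority over optional deletion here, so a boundary that lies inside both a three-accent and a four-accent constituent is removed.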

Results and discussion This section corresponds to step 3 in the evaluation procedure and discusses the results for evaluating N&V and GUS. To facilitate the discussion of examples, boundary locations will be notated as ‘X/Y’, where X denotes the reference and Y denotes the prediction of the algorithm. X and Y can be ‘#’ (an obligatory boundary), ‘|’ (an optional boundary) or ‘∅’ (no boundary). Thus for example ‘#/|’ denotes a location that is an obligatory boundary according to the reference and an optional boundary according to the algorithm. Whenever a comparison between X and Y is irrelevant for the discussion, only Y is mentioned. Furthermore, incorrect predictions of optional boundaries will be ignored, because these are considered less serious errors. After all, it does not really matter whether an optional boundary in the reference is predicted to be obligatory, optional or disallowed.

Nespor & Vogel General The results are summarized in Figure 1.2. What catches the eye is that N&V predicts no incorrect obligatory boundaries. Apparently, all appositional elements and nonrestrictive relative clauses are indeed delimited by an obligatory boundary in the reference. The other side of the coin is that, due to this conservative strategy for obligatory Is, 42 (26%) of the obligatory boundaries in the reference are only predicted to be optional, while 28 (17%) are not predicted at all. These figures become even more pronounced if we notice that 52 out of the 91 correctly predicted obligatory boundaries are in fact trivial, because these are sentence-final boundaries. In other words, only 39 out of 109 non-final obligatory boundaries were predicted correctly. If we next look at the optional boundaries in the reference, we see that N&V predicts 34 (81%) of them correctly, and misses only 8 (19%) of them. On the other hand, it generates far too many optional boundaries. It predicts 50 boundaries where no boundary should occur, and 42 where instead an obligatory boundary is required.

Absolute counts (rows: N&V, columns: REF)
N&V    obl    opt    non    tot
obl     91      0      0     91
opt     42     34     50    126
non     28      8    654    690
tot    161     42    704    907

Percentages (rows: N&V, columns: REF)
N&V      obl      opt      non
obl    56.52     0.00     0.00
opt    26.09    80.95     7.10
non    17.39    19.05    92.90
tot   100.00   100.00   100.00

Figure 1.2: Confusion matrices for the N&V algorithm; absolute and relative to the number of obligatory, optional and no boundaries in the reference.

Obligatory boundary in the reference, but no boundary according to N&V
N&V: # / ∅
between APs                               8
before a PP
   before an attributive PP               6
   before a non-attributive PP            3
between conjuncts
   between two NP conjuncts               6
   between two VP conjuncts               2
before a restrictive relative clause      3
Total                                    28
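The relative figures and the totals quoted above can be recomputed from the absolute counts in Figure 1.2. The short Python check below does only that; the variable names are arbitrary and nothing beyond the reported numbers is assumed.

```python
# Absolute counts from Figure 1.2: rows are N&V predictions, columns the reference.
counts = {
    "obl": {"obl": 91, "opt": 0, "non": 0},
    "opt": {"obl": 42, "opt": 34, "non": 50},
    "non": {"obl": 28, "opt": 8, "non": 654},
}
labels = ("obl", "opt", "non")

# Column totals: number of obligatory, optional and no-boundary locations in the reference.
column_totals = {ref: sum(counts[pred][ref] for pred in labels) for ref in labels}

# Percentages relative to the reference column, rounded to two decimals.
percentages = {
    pred: {ref: round(100 * counts[pred][ref] / column_totals[ref], 2) for ref in labels}
    for pred in labels
}

# Each of the 52 sentences ends in a trivially correct obligatory boundary.
sentence_final = 52
non_final_hits = counts["obl"]["obl"] - sentence_final
non_final_total = column_totals["obl"] - sentence_final

# The #/∅ cases per syntactic context, in the order of the breakdown table.
missed_breakdown = [8, 6, 3, 6, 2, 3]

print(column_totals)
print(percentages)
print(f"{non_final_hits} of {non_final_total} non-final obligatory boundaries correct")
print(sum(missed_breakdown))
```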

Eight of these obligatory boundaries occurred between APs. In general, there is no boundary between two attributive APs, but if the APs are sufficiently long and/or complex, there is. For example, the reference requires an obligatory boundary between the two APs in (6), groenachtig witte and netvormig geaderde. However, N&V can never predict a boundary at this position, because it is neither the end of an NP, nor the start of an S, nor located between two list items. (6)

... # en VIJF GROENachtig witte #/∅ NETvormig geaderde KROONbladen. #
and five greenish white clathrate veined petals

Nine obligatory boundaries in the reference occurred before a PP, either attributively or non-attributively used. According to N&V, no boundary can occur between a noun and an attributive PP, as this would break up an NP. So in (7), the algorithm fails to predict a boundary between the noun klimplant and the attributive PP met lange kantige en vertakte stengels. The absence of a boundary after klimplant in combination with a boundary after lange (caused by list restructuring) would result in an odd phrasing. The necessity of the boundary may be explained by the length and/or complexity of the premodifier borstelig behaarde plus the postmodifier met lange kantige en vertakte stengels. (7)

HEggerank | is een BORstelig behaarde KLIMplant #/∅ met LAnge | KANtige | en verTAKte STEngels | ...
Heggerank is a bristly hairy climber with long angular and branching stalks

Before a non-attributive PP, a boundary can occur only if it is preceded by an NP. Extraposed PPs occur after the verb cluster in Dutch (an SOV language), and are therefore often preceded by a verb. As a consequence, N&V cannot provide a boundary before such non-attributive PPs. An example of this is (8), for which the reference has an obligatory boundary between the verb plaats and the extraposed PP met losbloemige bebladerde trossen van purperroze bloemen.9 (8)

Bloei vindt in de zomer plaats #/∅ met losbloemige bebladerde trossen van purperroze bloemen. #
bloom does in the summer occur with flour-ish leaf-ish bunches of purple-pink flowers

A third type of discrepancy can be found in the context of conjunction, or coordination, of two constituents. (9a) shows an example of two NP conjuncts, and (9b) of two VP conjuncts. In both cases, the conjuncts are separated by an obligatory boundary in the reference. Although short conjuncts are normally not separated, they are when sufficiently long and/or complex. However, in (9a), where a boundary would break up the dominating NP, N&V disallows a boundary, whereas in (9b), the position after the verb gekweekt is simply not an appropriate location for a boundary. (9)

a. Ze bezitten AFstaande KELKbladen #/∅ en VIJF GROENachtig witte NETvormig geaderde KROONbladen. #
   they possess protruding sepals and five greenish white clathrate veined petals
b. Hij wordt wel | in tuinen gekweekt #/∅ en in muntsaus verwerkt. #
   he is also in gardens grown and in mint-sauce processed

A fourth type of discrepancy occurs with restrictive relative clauses. N&V predicts no boundary before a restrictive relative clause, as this would interrupt an NP. Nevertheless, such boundaries occur repeatedly in the reference. (10a) is an example where such a boundary is expected. In fact, the only case in which a boundary before a restrictive relative clause does not appear is when the clause is relatively short. An example of this is (10b), with the short restrictive clause die erop landen.

16

(10) a.

... # en dragen HALverwege een BLOEIkolf #/∅ and carry halfway a bloom-cob die SCHUIN AFstaat en VOLgepakt is met which slanted stands and packed is with onAANgenaam geurende | GEELachtige | unpleasantly smelling yellow-like TWEEslachtige BLOEmen. # androgynous flowers

b.

InSEKten die erop LANden ∅/| blijven STEken # ... insects that on-it land get stuck

Obligatory boundary in the reference, but optional according to N&V

N&V: # / |
between list elements               27
before a non-attributive PP          9
between two conjuncts
    between two PP conjuncts         2
    between two VP conjuncts         1
after a subject NP                   3
Total                               42

The majority of the boundaries that are obligatory in the reference but merely optional according to N&V occur between list items. It seems that if the list items are more complex, i.e. contain more than one lexical head, list restructuring becomes obligatory. (11) is an example of a list with moderately complex list items, which are separated by obligatory boundaries in the reference. There are sentences containing lists with much more complicated list items, even with list items that are lists themselves.

(11) AARmunt groeit op VOCHtige PLAATsen #/| langs WEgen #/| en op rudeRAle terREInen in GROte delen van EuROpa # ...
     aarmunt grows at humid places along roads and at rudimental grounds in large parts of Europe
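The discrepancy annotations used in the examples (reference symbol, slash, predicted symbol) lend themselves to mechanical counting. The following is a minimal sketch of such a tally, not the evaluation procedure actually used in this study; the function name `tally_discrepancies` and the assumption that each annotation is a whitespace-separated token are illustrative only.

```python
import re
from collections import Counter

# Hypothetical pattern: a token such as "#/|" or "∅/#", read as
# "reference symbol / predicted symbol". Tokens like a bare "#" or "|"
# (reference and prediction agree) are deliberately not matched.
PAIR = re.compile(r"^([#|∅])/([#|∅])$")


def tally_discrepancies(annotated):
    """Count (reference, predicted) pairs in an annotated example string."""
    counts = Counter()
    for token in annotated.split():
        match = PAIR.match(token)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Example (11), Dutch line only.
example_11 = (
    "AARmunt groeit op VOCHtige PLAATsen #/| langs WEgen #/| "
    "en op rudeRAle terREInen in GROte delen van EuROpa #"
)

print(tally_discrepancies(example_11))
```

Applied to (11), the tally contains two instances of an obligatory reference boundary that the algorithm rates as optional; summing such tallies over all sentences would give the off-diagonal cells of a confusion matrix.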

A second source of differences consists of obligatory boundaries before non-attributive PPs. These PPs are often preceded by an NP, and therefore by an optional boundary, but an optional boundary is not sufficient according to the reference. In (12), N&V inserts an optional boundary after the NP smaak en geurmiddel, but this ought to be an obligatory boundary to separate it from the long non-attributive PP in snoepgoed, likeuren, medicijnen en tandpasta.

(12) Deze olie | wordt gebruikt als SMAAK en GEURmiddel #/| in SNOEPgoed | liKEUren | mediCIJnen | en TANDpasta. #
     this oil is used as flavor and aromatic-substance in candy liqueurs medicine and toothpaste

Yet another type of discrepancy can be found in the context of conjunction, or coordination, of two constituents. (13) shows an example of two NP conjuncts. The optional boundary that separates them should be an obligatory one according to the reference.

(13) Ze bezitten AFstaande KELKbladen #/| en VIJF GROENachtig witte NETvormig geaderde KROONbladen. #
     they possess off-standing sepals and five greenish white net-shaped veined petals

Finally, there are some sentences where a subject NP is followed by an obligatory boundary in the reference. Again, this is because the subject is rather long and/or complex, as in (14).

(14) de GEELgroene BOvenzijde van de BLADschijf #/| is bedekt door een WEke | SLIJmige LAAG # ...
     the yellow-green topside of the leaf-slice is covered by a weak slimy layer

No boundary in the reference, but optional boundary according to N&V

N&V: ∅ / |
after a subject NP                  34
after a non-attributive PP           9
between two PP conjuncts             5
after a non-subject NP               1
before an S                          1
Total                               50

The bulk of the predicted optional boundaries that do not correspond to a boundary in the reference occur after a subject NP. Many subject NPs are rather short and uncomplicated, typically only a determiner and a noun (see 15), and are therefore not followed by an optional boundary in the reference, let alone an obligatory boundary. According to N&V, the subject is not an internal argument and is therefore insensitive to the constraint that forbids separating an internal argument from its verb. If this constraint applied to subjects as well, N&V would perform significantly better. Notice, however, that extending the constraint to subjects would be inconsistent with example (14). Some of the predicted optional boundaries are wrong because they create optional Is without an accent. This is the case in (12), where an optional boundary after the subject NP Deze olie creates an empty I at the start of the sentence. Is without an accent are generally ill-formed and valid only in a few exceptional cases.10

(15) Deze olie ∅/| wordt gebruikt als SMAAK en GEURmiddel | ...
     this oil is used as flavor and aromatic-substance

Other wrongly predicted optional boundaries occur after NPs in non-attributive PPs. Such a boundary occurs in (16) after the PP in het wild. If these PPs show up before an utterance-final verb, they are likely to create an I without accents, because verbs can often remain unaccented in Dutch. An example of this is the boundary after the PP om de pepermuntolie.

(16) Hij groeit in het WILD ∅/| op VOCHtige PLAATsen # en wordt OOK om de peperMUNTolie ∅/| gekweekt. #
     it grows in the wild in humid places and is also for the peppermint-oil grown

Finally, there are inappropriate optional boundaries generated between two PP conjuncts. (17) has wrong boundaries between the PP conjuncts in akkers and op braakliggende gronden as well as between in wegbermen and langs rivieren.

(17) ... # en is op VEEL plaatsen | INgeburgerd in AKkers ∅/| en op BRAAKliggende GROND | in WEGbermen ∅/| en langs riVIEren | ...
     and is at many places established in fields and at fallow ground at road-shoulders and along rivers

Gussenhoven

General

The results of GUS are in Figure 1.3. In contrast to the other two algorithms, it predicts 33 incorrect obligatory boundaries. On the other hand, it covers 91% of the obligatory boundaries in the reference. However, with a coverage of only 40%, GUS is not very good at predicting optional boundaries. In exchange for this, the overproduction of optional boundaries is fairly limited in comparison to the other algorithms. The errors tend to fall into two categories. Recall that GUS requires a prosodic structure that is recursive at the ϕ' level. Accordingly, a number of the errors can be traced back to an inadequate prosodic structure. The bulk of the errors, however, is caused by an accent distribution that incorrectly triggers restructuring or fails to trigger restructuring. Some of the errors of N&V and CAS return in GUS, for example the failure to predict obligatory boundaries between list elements or between two APs. Other errors, like incorrectly predicted boundaries between conjuncts, do not occur. Also, due to the role of accent distribution, empty intonational domains are avoided.

abs                   REF
              obl    opt    non    tot
GUS   obl     146     12     21    179
      opt      13     17     12     42
      non       2     13    671    686
      tot     161     42    704    907

%                     REF
              obl      opt      non
GUS   obl    90.68    28.57     2.98
      opt     8.07    40.48     1.70
      non     1.24    30.95    95.31
      tot   100      100      100

Figure 1.3: Confusion matrices for the GUS algorithm; absolute counts, and percentages relative to the number of obligatory, optional and no boundaries in the reference
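The relation between the two matrices in Figure 1.3 can be made explicit with a short calculation. The sketch below is only an illustration: the dictionary layout `counts[predicted][reference]` and the function name `column_percentages` are assumptions, while the counts themselves are those of the absolute matrix.

```python
LABELS = ("obl", "opt", "non")

# Absolute counts from Figure 1.3, indexed as counts[predicted][reference].
counts = {
    "obl": {"obl": 146, "opt": 12, "non": 21},
    "opt": {"obl": 13, "opt": 17, "non": 12},
    "non": {"obl": 2, "opt": 13, "non": 671},
}


def column_percentages(counts):
    """Express each cell as a percentage of its reference (column) total."""
    totals = {ref: sum(counts[pred][ref] for pred in LABELS) for ref in LABELS}
    return {
        pred: {ref: round(100 * counts[pred][ref] / totals[ref], 2) for ref in LABELS}
        for pred in LABELS
    }


percentages = column_percentages(counts)

# Obligatory predictions where the reference has an optional or no boundary.
incorrect_obligatory = counts["obl"]["opt"] + counts["obl"]["non"]

for pred in LABELS:
    print(pred, [percentages[pred][ref] for ref in LABELS])
print("incorrect obligatory boundaries:", incorrect_obligatory)
```

The diagonal cells of the percentage matrix correspond to the coverage figures quoted in the text (about 91% for obligatory and 40% for optional boundaries), and the off-diagonal cells of the first row sum to the 33 incorrect obligatory boundaries.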

Obligatory boundary in the reference, but no boundary according to GUS

GUS: # / ∅
after subject NP                     1
after non-attributive PP             1
Total                                2

These two errors are in (18) and (19). The (b) parts contain their prosodic structure, where round brackets delimit ϕs/ϕ's and square brackets delimit Is. Recall that AD restructuring deletes all boundaries within a prosodic constituent that contains at most three accents. In (18), there should be an obligatory boundary after the subject NP insekten die erop landen, but the dominating I contains only three accents and therefore all boundaries inside it are deleted. In (19), the PP modifiers in de bladoksels and aan verschillende planten belong to a single ϕ, which contains only three accents. As a consequence, the obligatory boundary that separates them cannot be predicted. Interestingly, (19) is somewhat exceptional, because the boundary is disambiguating: it prevents an attributive interpretation of the final PP.

(18) a. InSEKten die erop LANden #/∅ blijven STEken # ...
        insects that on-it land get stuck

     b. [(InSEKten) (die erop LANden) #/∅ (blijven STEken)]

(19) a. De MANnelijke en VROUwelijke BLOEmen # groeien in de BLADoksels #/∅ aan verSCHIllende PLANten. #
        the male and female flowers grow in the leaf-armpits at separate plants

     b. [((de MANnelijke) (en VROUwelijke BLOEmen)) # (groeien) ((in de BLADoksels) #/∅ (aan verSCHIllende PLANten))]

Obligatory boundary in the reference, but optional according to GUS

GUS: # / |
between list elements                6
between two APs                      4
between two VP conjuncts             2
before a non-attributive PP          1
Total                               13

The three list elements in (20) form separate ϕs, which are in turn dominated by a higher ϕ' (cf. 20b). This ϕ' contains four accents, so the boundaries separating the list elements become optional (cf. the principle of AD restructuring in section 4.2). In other words, whether the boundaries between list elements are correctly predicted depends, quite arbitrarily, on the number of accents in the list.

(20) a. ... op rechtOPstaande |/∅ ONbebladerde |/∅ tot VIJFtien centimeter hoge STEngels. #
        at erect leaf-less up-to fifteen centimeters high stalks

     b. ((op rechtOPstaande) |/∅ (ONbebladerde) |/∅ (tot VIJFtien centimeter hoge STEngels))
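The accent-counting behaviour described for (18) to (20) can be summarised in a small sketch. It is a simplification under stated assumptions, not the implementation evaluated here: accents are recognised by the capitalised syllables used in the examples (a run of at least two capital letters), the function names are hypothetical, and the thresholds follow the description in the text (at most three accents delete the boundaries, four make them optional, more than four leave them obligatory).

```python
import re

# Assumption: an accented word contains a run of two or more capitals,
# as in the transcription convention of the examples (e.g. "InSEKten").
ACCENT = re.compile(r"[A-Z]{2,}")


def count_accents(constituent):
    """Count accented words in a bracketed prosodic constituent."""
    return sum(1 for word in constituent.split() if ACCENT.search(word))


def restructured_boundary(n_accents):
    """Boundary status inside a constituent with n_accents accents."""
    if n_accents <= 3:
        return "∅"  # obligatory restructuring: boundaries are deleted
    if n_accents == 4:
        return "|"  # optional restructuring: boundaries become optional
    return "#"      # no restructuring: boundaries stay obligatory


structure_18b = "[(InSEKten) (die erop LANden) #/∅ (blijven STEken)]"
structure_20b = ("((op rechtOPstaande) |/∅ (ONbebladerde) |/∅ "
                 "(tot VIJFtien centimeter hoge STEngels))")

for structure in (structure_18b, structure_20b):
    n = count_accents(structure)
    print(n, restructured_boundary(n))
```

With these assumptions, the I in (18b) has three accents, so its internal boundaries are deleted, whereas the ϕ' in (20b) has four, so the boundaries between its list elements come out as optional. The sketch makes visible that the outcome depends only on the accent count and not on the syntactic configuration.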

For a sequence of two APs or VPs the problem is essentially the same. Most of them do not contain enough accents to retain the boundary that separates them.

No boundary in the reference, but obligatory boundary according to GUS

GUS: ∅ / #
before a PP
    before an attributive PP         6
    before a non-attributive PP      6
after a subject NP                   6
after a non-subject NP               2
after a verb                         1
Total                               21

With regard to the incorrect boundaries before an attributive PP, at least two of them may be attributed to a prosodic structure that is not sufficiently detailed, which can then be traced back to an insufficient syntactic analysis. The boundary before the final PP in (21), in Europa, would not appear if the prosodic constituents were as in (21c) instead of (21b), and this would be the case if the PP is syntactically analyzed as a modifier of krijt en kalksteenbodems instead of graslanden. The mirror image of this appears in (22). The boundary before the PP van elliptische bladen would be deleted with a prosodic analysis as in (22c) instead of (22b). All other incorrect boundaries before an attributive PP are again a matter of too many accents.

(21) a. ... # in KORte GRASlanden # op KRIJT en KALKsteenbodems ∅/# in EuROpa. #
        in short grasslands at chalk and limestone-grounds in Europe

     b. ((in KORte GRASlanden) # (op KRIJT en KALKsteenbodems) ∅/# (in EuROpa))

     c. ((in KORte GRASlanden) # ((op KRIJT en KALKsteenbodems) ∅/∅ (in EuROpa)))

(22) a. ... een roZET ∅/# van elLIPtische BLAden # met een tot TWINtig centimeter HOge | STEvige BLOEIstengel #
        a rosette of elliptical leaves with a up-to twenty centimeter high solid bloom-stalk

     b. ((een roZET) ∅/# (van elLIPtische BLAden) # (met een tot TWINtig centimeter HOge STEvige BLOEIstengel))

     c. (((een roZET) ∅/∅ (van elLIPtische BLAden)) # (met een tot TWINtig centimeter HOge STEvige BLOEIstengel))

The incorrect obligatory boundaries before non-attributive PPs invariably involve a PP extraposed across a verbal or adjectival predicate. An example of this is (23). An alternative prosodic structure, where the ϕs is bedekt and door een weke slijmige laag are joined into a higher ϕ', would solve the problem. Still, some of the PPs contain more than four accents and therefore become separated anyway.

(23) a. de GEELgroene BOvenzijde van de BLADschijf # is beDEKT ∅/# door een WEke SLIJmige LAAG # ...
        the yellow-green topside of the leaf-slice is covered by a weak slimy layer

     b. [((de GEELgroene BOvenzijde) (van de BLADschijf)) # (is beDEKT) ∅/# ((door een WEke) (SLIJmige LAAG))] # ...

After subject NPs, incorrect obligatory boundaries occur because the following ϕ contains too many accents, for example, a list as in (24).

(24) a. de BLAden ∅/# zijn GROOT | LANGwerpig | LANGgesteeld | en geTAND. #
        the leaves are big elongated long-stalked and toothed

     b. [(de BLAden) ∅/# ((zijn GROOT) | (LANGwerpig) | (LANGgesteeld) | (en geTAND))] #

No boundary in the reference, but optional according to GUS

GUS: ∅ / |
before a PP
    before an attributive PP         5
    before a non-attributive PP      2
after a subject NP                   2
rest                                 3
Total                               12

These errors are similar to those described in the previous section and their origin is essentially identical. The difference is that only four accents are involved, instead of three, so the incorrectly predicted boundaries are optional instead of obligatory.

Conclusion

Two intonational phrasing algorithms were evaluated by means of a partly automatic evaluation procedure. As a side effect of this work, a new definition of phonological phrase formation for Dutch was proposed. Although both phrasing algorithms perform reasonably well, they show a number of systematic errors. N&V fails to predict boundaries, both obligatory and optional, because the syntactically defined set of possible boundary locations is too restricted. Moreover, it predicts a considerable number of incorrect optional boundaries. In defense of Nespor & Vogel, we can add that they never claimed that all their optional boundaries are actually valid. In their opinion, this depends on factors like speech rate, length, style and prominence. For the same reason, some of their optional boundaries can in fact become obligatory. Such semantic and performance factors were not included in the implementations. Still, the fact that the set of possible boundary locations is too restricted remains a serious problem. N&V cannot account for boundaries between APs or conjuncts. Also, there is no way to avoid empty intonational domains. This poses no problem to the second algorithm, GUS. The problems with GUS are that it predicts too many obligatory boundaries and is not very good at predicting optional boundaries. This is partly because there is no adequate account of restructuring. The algorithm we used relies on counting accents within a prosodic constituent, but the number of accents that triggers restructuring, i.e. three for obligatory restructuring and four for optional restructuring, is entirely arbitrary. Besides this lack of motivation for restructuring, there is the additional problem that restructuring operates on the prosodic structure. This makes it impossible to detect certain syntactic configurations that need a particular phrasing, like lists, sequences of complex APs, or conjuncts. In other words, the prosodic structure is not informative enough for the purpose of intonational phrasing (see also Marsi, Coppen, Gussenhoven & Rietveld 1996). These results seem to point in the direction of a syntax-driven intonational phrasing algorithm, sensitive to certain particular syntactic configurations, the accent distribution, and the length of intonational phrases.

Endnotes

1 Thanks to Peter-Arno Coppen, Carlos Gussenhoven and Toni Rietveld for their support. A different version of this article appears in the proceedings of Nordic Prosody VII. The present article does not discuss the CAS phrasing algorithm (Caspers 1994), but instead contains a more detailed discussion of phonological phrase formation for Dutch.

2 An intonational boundary does not need to be exclusively tonal; it can be additionally marked by temporal means such as pause or prefinal lengthening. However, a boundary marked by temporal means only is not an intonational boundary.

3 Nespor & Vogel's original definition uses 'Clitic Groups' instead of 'words'. The difference is irrelevant for the discussion at hand.

4 There are some technical details. This reasoning assumes that a noun phrase is analyzed as an NP node dominating at least a node N and optionally some nodes which serve as determiner, premodifiers and postmodifiers: (NP ... determiner ... premodifiers ... (N ... ) ... postmodifiers ... ). This is also the kind of syntactic analysis that Nespor & Vogel presuppose (cf. Nespor & Vogel 1986: 171, ex. 10). An alternative analysis where an NP is analyzed as a complement of a Determiner Phrase (DP) would introduce another problem. Under such an analysis, an adjectival premodifier is adjoined to the NP, i.e. no longer within the maximal projection of N, and therefore necessarily starts a new ϕ domain. However, a sequence of adjective plus noun ought to be in a single ϕ; cf. example (2) further on.

5 GRAMTSY was written by Peter-Arno Coppen. Information can be found at http://iris1.let.kun.nl/TSpublic/coppen/GRAMTSY-home.html

6 The notion of 'root sentence' stems from a period in transformational generative syntax in which there were 'root transformations'. Root transformations, like 'Verb Second' and 'Topicalisation', only applied to root sentences in Dutch. In the syntactic structures that served as input, S nodes were marked by a feature 'root' whenever the result of root transformations could occur or actually occurred.

7 As already noted in (Caspers 1994: 181), it is not clear whether a subject is an internal argument. I follow Caspers who, based on example (18a) in (Nespor & Vogel 1986: 197), concludes that subjects are not internal arguments. I will also assume that the end of an NP embedded in another NP does not constitute an optional boundary, unless an intervening S node occurs (see Caspers 1994: 181).

8 The same assumption is made by Gussenhoven; see (Gussenhoven 1988: 93), example (5). A strictly layered prosodic structure would pose serious problems to AD-restructuring.

9 In fact, plaats is only the first part of the separable verb plaatsvindt ('place-takes' or 'takes place'). The second part, vindt, has moved to the second position of the root sentence, directly after the subject NP.

10 See Tone Copy in (Gussenhoven 1988: 96).

References

Altenberg, B. (1990). ‘Automatic text segmentation in tone units’. In: J. Svartvik (ed.), The London-Lund Corpus of Spoken English. Lund: Lund University Press. 287-323.
Bachenko, J., E. Fitzpatrick (1990). ‘Discourse-Neutral Prosodic Phrasing in English’. Computational Linguistics 16 (3): 155-170.
Booij, G. (1995). The phonology of Dutch. Oxford: Oxford University Press.
Caspers, J. (1994). Pitch Movements under Time Pressure. Den Haag: Holland Academic Graphics.
Croft, W. (1995). ‘Intonation units and grammatical structure’. Linguistics 33: 839-882.
Dirksen, A., H. Quené (1993). ‘Prosodic analysis: The next generation’. In: V. van Heuven, L. Pols (eds.). Analysis and synthesis of speech. Berlin: Mouton de Gruyter. 131-141.
Forey, P. (1992). Wilde Bloemen van Europa. Baarn: Thieme.
Gussenhoven, C. (1988). ‘Intonational phrasing and the prosodic hierarchy’. Phonologica 1988: 89-99.
Gussenhoven, C. (1991). ‘The English rhythm rule as an accent deletion rule’. Phonology 8: 1-35.
Gussenhoven, C., T. Rietveld (1992). ‘Intonation contours, prosodic structure and preboundary lengthening’. Journal of Phonetics 20: 283-303.
Hirst, D. (1993). ‘Detaching intonational phrases from syntactic structure’. Linguistic Inquiry 24 (4): 781-788.
Ladd, D.R. (1986). ‘Intonational phrasing: the case for recursive prosodic structure’. Phonology Yearbook: 311-340.
Marsi, E., P.A. Coppen, C. Gussenhoven, T. Rietveld (to appear 1996). ‘Prosodic and intonational domains in speech synthesis’. In: J. van Santen, R. Sproat, J. Olive, J. Hirschberg (eds.). Progress in speech synthesis. Springer Verlag.
Monaghan, A. (1989). ‘Phonological domains for intonation in speech synthesis’. Proceedings Eurospeech 1989: 502-505.
Nespor, M., I. Vogel (1986). Prosodic Phonology. Dordrecht: Foris Publications.
O’Shaughnessy, D. (1989). ‘Parsing with a small dictionary for applications such as text to speech’. Computational Linguistics 15: 97-108.
Selkirk, E. (1984). Phonology and syntax: the relation between sound and structure. Cambridge: MIT Press.