An experimental evaluation of two nuclear-tone taxonomies*

C. GUSSENHOVEN and A. C. M. RIETVELD

Abstract

Semantic-difference scores, as obtained in two auditory experiments in which native speakers of English were asked to estimate the semantic contrast in paired nuclear tones, were correlated with two sets of theoretical differences, as predicted by two recent theories of the structure of English intonation, Pierrehumbert (1980) and Gussenhoven (1983a). The latter theory proved to be a better predictor of both sets of experimental scores. Not all component elements in the theories turned out to correlate significantly with the experimental scores. In the former theory, only the element represented by the phrase accent appeared to account for some of the variation, while in the latter the main predictors were pitch range and tone modifications.

1. Introduction

Even if we restrict ourselves to single-accent contours, as possible on such structures as HelLO or TELL me please, the number of different intonation patterns of English is quite large. A number of analyses for these nuclear intonation contours are available. In some analyses, they are seen as indivisible units. In spite of their holistic approach, such analyses often group certain nuclear contours as variants of each other. The taxonomies may be primarily based on functional criteria, like Halliday (1967) and Brazil (1985), or, like Bolinger (1958, 1987) and Crystal (1969), be primarily informed by formal criteria. Partly as a result of these different emphases, the proposed taxonomies tend to be fairly divergent.

The advent of autosegmental phonology, in which intonation contours are seen as strings of level tone segments (H for 'high', L for 'low'; Leben 1976; Goldsmith 1976; Liberman 1975), has not led to more agreement in this area. Pierrehumbert (1980) and Beckman and

Linguistics 29 (1991), 423-449

0024-3949/91/0029-0423 $2.00 © Walter de Gruyter

Brought to you by | Radboud University Nijmege Authenticated | 131.174.187.135 Download Date | 1/16/14 9:56 AM


Pierrehumbert (1986) give an account formulated entirely in terms of the tone segments H and L. Gussenhoven (1983a) is likewise based on tone segments but incorporates elements reminiscent of the older analyses. Again, the agreement between these two theories is low, and they therefore make rather different predictions about the degree of relatedness between one nuclear contour and the next. The main reason for this continued lack of agreement is probably that it is not very clear what should count as evidence. Internal evidence will have to be based on either form or meaning, but in the case of intonation it is difficult to make a convincing case on the basis of either aspect. Phonological rules, which would be the prime source of evidence for the formal representation of nuclear contours, are controversial, because their postulation will be motivated in terms of the theory in which they are incorporated. In the analysis of Pierrehumbert, there are phonetic implementation rules that refer to tone segments (see below), but because of the partially abstract nature of the input to these rules, it is not clear that these rules capture natural classes. And because no phonological rules that delete, insert, or change tone segments are assumed at all by Pierrehumbert and Beckman, any such rules postulated in other theories could be argued to be superfluous. Semantic criteria are also problematic, because the great variability in communicative effect that the same intonation contour can have in different contexts makes it difficult to assign meaning to contours at all (Gunter 1972; Liberman 1975: 142). In short, collecting internal evidence to decide between different intonational theories is not easy. The situation for external evidence is hardly less precarious. One could try and tap native-speaker intuitions by means of a contour-sorting task (Collier 1975) or a similar experimental procedure. 
The problem with this approach is that it is not clear why native speakers should be able to give meaningful judgments about the categorial status of intonation contours (see Pierrehumbert 1980: 60; Collier 1989). That is, even though subjects perform the task, it is not clear that their decisions directly reflect the linguistic structure of the objects concerned. This objection also applies to a comparative-judgment task, in which raters are asked to estimate the degree to which two contours differ, either in meaning or in form. A task of this kind was used in Gussenhoven (1983b) with synthetic stimuli. Although the subject is no longer asked to give a categorial judgment on the sameness or difference of two contours and can express a graded judgment, we still do not really know what knowledge is being addressed. The kind of judgments we are interested in should ideally be based on some context-independent internal structure of the nuclear-tone inventory, rather than on the incidental lexical and situational context in which


the nuclear tone is used. But can we be sure that contextless utterances are judged in this way? Might not the use of contextless intonation patterns degenerate into a comparison of linguistically uninterpreted surface forms? This would obviously be undesirable, since, as is well known, surface similarity need not, and indeed often does not, correspond with structural similarity. Although we will attempt to obviate this problem in our experiment, we do not think that it can be resolved. Ultimately, theories of English intonation must be evaluated on the basis of larger collections of observations about the semantic properties of English utterances than are now available. At this point, however, we feel it is useful to confront theories with native-speaker judgments. Paradoxically, the very inaccessibility of the structure of intonational paradigms is precisely what makes it interesting to do an experiment. No one would tackle the problem of the structure of the English verbal phrase with the help of an experiment, because it is reasonably clear what its morphological structure is. A differential judgment task with forms like has been painting, will paint, was painted, etc., would probably give good results (that is, results that would reflect the standard analysis in terms of main verb, voice, mood, and aspect), regardless of whether the semantic difference or the phonological difference was used as the response variable. At the very least, the results of such a test with intonation contours should provide us with a somewhat more secure basis for particular analyses.

In this article, we report on an attempt to test the predictions made by the two autosegmental theories mentioned above by means of an experiment. These theories were Pierrehumbert (1980) and Beckman and Pierrehumbert (1986) on the one hand, and Gussenhoven (1983a) on the other.
We first conducted a comparative-judgment experiment, in which artificially produced contours on resynthesized speech were compared and rated for semantic difference. As we observed above, a potential danger here is that raters simply compare uninterpreted surface similarities. In order to have some idea of the extent to which this is the case, we also conducted a pencil-and-paper experiment with (non-English) subjects, who rated stylized visual representations of these contours for difference in shape. The idea here is, of course, that the structure of the English intonation contours is defined not only by the surface differences between one pitch pattern and the next, but also by more abstract relations holding among the different contours, that is, that morphological shape may in part be arbitrary, or noniconic. By conducting both experiments, we will be able to separate the contribution of the surface differences, as established on the basis of the visual task, from the results of our auditory test. Next, it would also be of interest to know the extent


to which decontextualized synthetic contours elicit judgments different from natural, contextualized stimuli. Therefore, we ran a third experiment in which the same contours appeared in naturally spoken answers to naturally spoken sentences. In section 2, we present the details of the two theories and translate the predictions they make about the relatedness of the nuclear contours into numerical terms. In section 3, we describe the experiments and discuss the results.

2. The theories

2.1. Pierrehumbert (1980); Beckman and Pierrehumbert (1986)

In Pierrehumbert (1980) and Beckman and Pierrehumbert (1986), henceforth theory P, intonation contours are described as phonetic implementations of sequences of tone segments. The only tone segments assumed in this analysis are H and L. A complete string of tone segments for a one-accent contour is three or four segments long. Such strings are built up as follows. For each accented syllable, a pitch accent is chosen. A pitch accent consists of either one or two tone segments. For two-tone pitch accents, either order of H and L occurs, while in addition either segment may be designated as associating with the accented syllable (that is, have the 'star'). There are thus six pitch accents: H*, L*, H* + L, H + L*, L* + H, and L + H*. When the accented syllable occurs finally in an intonational phrase, the pitch accent is followed by two further tone segments. The first is called the phrase accent, which is either H- or L- (where the hyphen is used as a diacritic for 'phrase accent'), and the second the boundary tone, which is either H% or L% (where % is a diacritic for 'boundary tone').1 The realization of the (6 × 2 × 2 =) 24 contours does not always follow a straightforward course from high pitch (H) to low pitch (L). The following phonetic implementation rules are applied:

1. A H after a two-tone pitch accent is lowered to mid. This effect is due to a rule called 'downstep'. For example, in H* + L H-H%, the H- would in fact have lower pitch than H*.

2. A L% after a H- is 'stepped up', that is, has the same pitch as the immediately preceding H-. Concomitantly, a H% after a H- is always higher than the preceding H-. For example, the realization of H* H-L% is a high-level contour, since all three tone segments are in effect 'high'. After a two-segment pitch accent, a sustained mid-level pitch occurs, which is the result of downstep and upstep operating together.


3. The L of H* + L is not interpreted phonetically (Pierrehumbert 1980: 86ff).

A perceptual experiment involving comparisons among 24 contours would be a fairly unmanageable task. Reduction of the inventory can easily be effected by leaving out all two-tone pitch accents which have the 'star' on the second tone segment. These contours really involve preaccentual pitch configurations, which in other theories are dealt with separately from the nuclear-tone contours proper (Lindsey 1985: 59). This leaves us with 16 contours. In Figure 1 we give diagrammatic representations of these 16 contours. The relative prominence of the contour is an orthogonal variable: Hs are scaled higher and Ls (but not L%) lower as the prominence increases. The contours in Figure 1 are all assumed to have the same prominence, or 'range'.

Pierrehumbert's theory makes clear predictions about the structure of the English inventory of nuclear-tone contours. The difference between one contour and the next is expressed by the number of different terms in each of the three tonal paradigms 'pitch accent', 'phrase accent', and 'boundary tone'. We could assume that a different first tone segment of the pitch accent (that is, H* instead of L*) counts as a difference of 1; a different second tone segment (that is, L* + H instead of L* or H* + L) could also count as 1. As a result, L* and L* + H, for instance, are characterized as more akin (difference of 1) than H* and L* + H (difference of 2). If we assign 1 to the phrase accent (H- vs. L-) as well as to the boundary tone, the maximum difference between any two nuclear contours becomes 4 (for example, H* + L L-L% and L* H-H%) and the minimum difference 1 (for example, H* + L H-L% and H* + L H-H%).

Figure 1. Diagrammatic representation of 16 nuclear tones in Pierrehumbert's theory (diagram not reproduced; the matrix crosses the pitch accents, including H* + L and L* + H, with the phrase accent and boundary tone combinations L-L%, L-H%, H-L%, and H-H%)

Not all the cells of the matrix in Figure 1 are represented in our experiment. Two nuclear-tone contours, H* + L L-L% and H* + L L-H%, must be excluded from our investigation, because the difference between them and the tones with single-segment pitch accents (H* L-L% and H* L-H%) only surfaces (a) if they are mapped onto multiword texts (Pierrehumbert 1980: 51), or (b) in nonfinal position, where the two-segment but not the one-segment pitch accent triggers downstep (Pierrehumbert 1980: 86). Moreover, we excluded L* L-L%, which is a low, falling tone (Pierrehumbert 1980: 198) and would appear to require a high syllable before the accented syllable. In view of the default low pitch which preceded the other contours, we decided not to include L* L-L% in the experiment.

Of the 13 remaining tones, two (H* + L H-L% and L* + H H-L%) have two different, though related, interpretations (Pierrehumbert 1980: 46), comparable to the different interpretations of Liberman's (1975: 104) 'warning/calling contour': a chanted and an ordinary (unchanted) interpretation. In either case, the two interpretations correspond to different nuclear tones in the rival theory. We therefore had to add two tones and ended up with 15 in all. Figure 2 gives a matrix for the 105 differences among the 15 tones calculated in the manner described above. The 'chanted' interpretations of H* + L H-L% and L* + H H-L% are marked 'c'. Of course, their numerical characterization does not differ from the corresponding nonchanted interpretation. This data set is referred to as THEORYP.
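The scoring scheme for theory P can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of each tone, the label strings, and the function names `theory_p_distance` and `distance_matrix` are hypothetical and do not come from the article. The sketch counts one unit for each differing slot (first pitch-accent tone, second pitch-accent tone, phrase accent, boundary tone), treating an absent second tone as differing from any present one.

```python
from itertools import combinations

# Hypothetical encoding (not from the article): each tone is a tuple of
# (first pitch-accent tone, second pitch-accent tone or None,
#  phrase accent, boundary tone). "(c)" marks the chanted interpretation.
TONES = {
    "H* L-L%": ("H", None, "L", "L"),
    "H* L-H%": ("H", None, "L", "H"),
    "H* H-L%": ("H", None, "H", "L"),
    "H* H-H%": ("H", None, "H", "H"),
    "H*+L H-L%": ("H", "L", "H", "L"),
    "H*+L H-L% (c)": ("H", "L", "H", "L"),
    "H*+L H-H%": ("H", "L", "H", "H"),
    "L* L-H%": ("L", None, "L", "H"),
    "L* H-L%": ("L", None, "H", "L"),
    "L* H-H%": ("L", None, "H", "H"),
    "L*+H L-L%": ("L", "H", "L", "L"),
    "L*+H L-H%": ("L", "H", "L", "H"),
    "L*+H H-L%": ("L", "H", "H", "L"),
    "L*+H H-L% (c)": ("L", "H", "H", "L"),
    "L*+H H-H%": ("L", "H", "H", "H"),
}


def theory_p_distance(a, b):
    """Count the tonal slots (0 to 4) in which two contours differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(tones):
    """Return the distance for every unordered pair of distinct tones."""
    return {
        (p, q): theory_p_distance(tones[p], tones[q])
        for p, q in combinations(tones, 2)
    }
```

With 15 tones, `distance_matrix(TONES)` holds one entry per unordered pair, which is where the count of 105 differences comes from. The chanted and nonchanted interpretations share an encoding, so their mutual distance is 0.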

           HLL   HLH   HHL   HHH  HLHL HLHLc  HLHH   LLH   LHL   LHH  LHLL  LHLH  LHHL LHHLc  LHHH
HLL          0
HLH          1     0
HHL          1     2     0
HHH          2     1     1     0
HLHL         2     3     1     2     0
HLHLc        2     3     1     2     0     0
HLHH         3     2     2     1     1     1     0
LLH          2     1     3     2     4     4     3     0
LHL          2     3     1     2     2     2     3     2     0
LHH          3     2     2     1     3     3     2     1     1     0
LHLL         2     3     3     4     3     3     4     2     2     3     0
LHLH         3     2     4     3     4     4     3     1     3     2     1     0
LHHL         3     4     2     3     2     2     3     3     1     2     1     2     0
LHHLc        3     4     2     3     2     2     3     3     1     2     1     2     0     0
LHHH         4     3     3     2     3     3     2     2     2     1     2     1     1     1     0

Figure 2. Theoretical distances between 15 tones according to theory P (tone labels concatenate the tone segments, for example HLHL = H* + L H-L%; 'c' marks the chanted interpretation)

2.2. Gussenhoven (1983a)

Gussenhoven (1983a), henceforth theory G, assumes that there are basic tone words, consisting of two or three tone segments, and a number of 'modifications' which produce variants of the basic tone words. Three tone words are assumed: H*L, L*H, and H*LH, in which the first tone segment is 'starred', that is, designated as associating with the accented syllable. The modifications are seen as affixes, whose phonological content is not necessarily expressed in terms of tone segments but may be an instruction of some sort. Specifically, the following modifications are postulated:

1. DELAY. The modification entails the association of the first (starred) tone segment to the right of the accented syllable, with concomitant rightward displacement of following tone segments.

2. HALF-COMPLETION. The trajectory described by the tone word is constrained so as not to cross the mid level.

3. STYLIZATION. Sustained level pitches are created, with concomitant lengthening. H*L becomes the falling chanted contour of other descriptions: two downstepping middish-level pitches; H*LH shares the first plateau of stylized H*L and is followed by a low plateau with a rise at the end; L*H yields a single mid-level pitch. The notion 'stylization' as an intonational morpheme meaning 'routine' was introduced by Ladd (1978), as were the characterizations of stylized H*L and stylized L*H given here.

The modifications are semantically characterized by their position on a single semantic continuum running from 'very significant' to 'very routine'. On this scale, the order of the modifications is DELAY, UNMODIFIED, HALF-COMPLETION, and STYLIZATION. In Figure 3, we give diagrammatic representations of the 12 nuclear-tone contours described by this theory. It, too, makes clear predictions about the degree to which different nuclear-tone contours are akin. Tone words are categorially different. The modifications have a graded effect on differences between contours. Thus, the difference between a stylized version of some tone word and a delayed version is greater than that between a stylized version and an unmodified one, which in turn is greater than that between a stylized version and a half-completed one. Range is assumed as a continuous variable. If the range increases, everything above the low baseline is raised proportionately, much as if the space between high and low were an elastic band, attached to a fixed low baseline and a variable high reference line.

Figure 3. Diagrammatic representation of 12 nuclear tones in theory G (diagram not reproduced; rows: the tone words H*L, H*LH, and L*H; columns: DELAYED, UNMODIFIED, HALF-COMPLETED, STYLIZED)

2.3. Interpreting the contours of theory P in terms of theory G

Below, we interpret each of the 16 contours of theory P in terms of theory G.

H* L-L%. This is the unmodified realization of H*L.

H* L-H%. This is the unmodified realization of H*LH.

H* H-L%. This is a high-level tone, a stylized L*H in theory G. The pitch height of H* H-L% is that of H*; but a stylized L*H would have middish pitch if equivalent in range to H*L. The tone therefore corresponds to a wide-range, stylized L*H.

H* H-H%. This tone has the same high level as the previous tone but has a rise at the end. While it clearly corresponds to some type of L*H, theory G cannot accommodate it (see House 1985): though end-points will vary with range, all (nonstylized) L*H's start from the base line. We will assume that there is a separate variable 'upper register', by which all pitch movements are carried out in the upper half of the speaker's register, that is, above a mid-level pitch. Since the tone's trajectory is situated entirely at and above the pitch for H*, we will in addition assume wide range. The tone therefore corresponds to a wide-range L*H, realized in the upper register.

H* + L L-L%. Equivalent to H* L-L%.

H* + L L-H%. Equivalent to H* L-H%.

H* + L H-L%. This tone falls from high to a mid-level pitch. The tone sequence can be interpreted in two ways (Pierrehumbert 1980: 46). In one interpretation it is the 'calling contour' or 'chanted pattern', that is, H*L with STYLIZATION in theory G. Since the first plateau of stylized H*L would have lower pitch than H in theory G, in this interpretation H* + L H-L% corresponds to a wide-range, stylized H*L. In the 'nonchanted' interpretation, the tone corresponds to a H*L with HALF-COMPLETION.

H* + L H-H%. This is a H*LH with HALF-COMPLETION.

L* L-L%. Not included. It corresponds to a low-range H*L preceded by a high 'prehead'.

L* L-H%. This is a narrow-range L*H in theory G. Since the rising movement comes at the very end, DELAY is assumed.

L* H-L%. This tone rises sharply and then trails off at a level pitch. In theory G, this corresponds to L*H with HALF-COMPLETION. Since L* H-L% reaches the pitch height of H*, wide range must be assumed.

L* H-H%. This is an unmodified L*H. Since the pitch rises to above the level of H*, wide range must be assumed.

L* + H L-L%. This tone corresponds to H*L with DELAY.

L* + H L-H%. This tone corresponds to H*LH with DELAY.

L* + H H-L%. This tone corresponds to H*L with DELAY, that is, the peak is reached in the syllable after the accented one. From the peak, the tone falls to middish pitch, so that we again have two interpretations: either HALF-COMPLETION or STYLIZATION ('chanted pattern').

L* + H H-H%. This tone corresponds to H*LH with DELAY and HALF-COMPLETION.

Table 1 summarizes these correspondences. In order to translate the linguistic differences in terms of theory G catalogued above into numerical differences, we assigned the following values to the various parameters:

Difference between tone words: 1
Difference between two adjacent modifications: 1
Difference of range: 1
Difference of register: 1

To take the difference between H* L-L% and H* H-L% as an example: H* L-L% is an unmodified H*L, while H* H-L% is a stylized, wide-range L*H. Therefore, the difference is 1 (for the difference between H*L and L*H), plus (2 × 1 =) 2 (UNMODIFIED and STYLIZATION are two modifications apart), plus 1 for the range difference, or 4. Tones with two modifications (for example, H*L with DELAY and HALF-COMPLETION) presented a problem. The sum of the differences with each of the tone's modifications and that, or those, of the tone with which it is compared would seem to be disproportionately large. It was decided to take the mean of the two smallest differences. Thus, a DELAYED HALF-COMPLETED tone differs from a DELAYED STYLIZED tone by DELAY-DELAY = 0, plus HALF-COMPLETION-STYLIZATION = 1, divided by 2, equals 0.5. In this way we established the numerical differences among all 15 tones, which set is referred to as THEORYG. It is given in Figure 4. As will be clear, there are rather large discrepancies between the two matrices. Pearson's correlation coefficient between THEORYP and THEORYG (that is, between Figure 2 and Figure 4) is 0.38, which means that one theory accounts for 14% of the variance predicted by the other.
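The theory G scoring can be sketched along the same lines. The following is a rough illustration and not the authors' procedure: the class `ToneG`, its field names, and the functions `modification_difference` and `theory_g_distance` are hypothetical. Two points are assumptions that go beyond the text. First, range and register are treated as plain labels, with a difference of 1 whenever the labels differ. Second, the rule for tones with two modifications is read as: take all pairwise differences on the modification scale and average the two smallest.

```python
from dataclasses import dataclass
from itertools import product

# Order of the modifications on the scale from 'very significant'
# to 'very routine', as given in the text.
SCALE = {"DELAY": 0, "UNMODIFIED": 1, "HALF-COMPLETION": 2, "STYLIZATION": 3}


@dataclass(frozen=True)
class ToneG:
    """Hypothetical record for a theory G tone (field names are assumptions)."""
    word: str                        # "H*L", "L*H" or "H*LH"
    mods: tuple = ("UNMODIFIED",)    # one or two modifications
    pitch_range: str = "normal"      # assumed labels, e.g. "normal", "wide"
    register: str = "normal"         # assumed labels, e.g. "normal", "upper"


def modification_difference(mods_a, mods_b):
    """Distance on the modification scale.

    With one modification on each side this is the plain scale distance.
    If either tone has two modifications, the mean of the two smallest
    pairwise differences is taken (an interpretation of the text's rule).
    """
    diffs = sorted(abs(SCALE[a] - SCALE[b]) for a, b in product(mods_a, mods_b))
    if len(diffs) == 1:
        return float(diffs[0])
    return (diffs[0] + diffs[1]) / 2


def theory_g_distance(a, b):
    """Sum the tone-word, modification, range and register differences."""
    return (
        (a.word != b.word)
        + modification_difference(a.mods, b.mods)
        + (a.pitch_range != b.pitch_range)
        + (a.register != b.register)
    )
```

For the worked example, `theory_g_distance(ToneG("H*L"), ToneG("L*H", ("STYLIZATION",), "wide"))` gives 1 + 2 + 1 = 4, and `modification_difference(("DELAY", "HALF-COMPLETION"), ("DELAY", "STYLIZATION"))` gives 0.5, matching the two figures quoted in the text.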

3. The experiments

Three experiments were run in which subjects rated pairs of stimuli for perceived difference. One of these involved visually presented stimuli, henceforth VIS, and two involved auditorily presented stimuli. The first of the auditory experiments used contextless utterances with artificial pitch contours, henceforth referred to as SYN, and the second used contextualized, naturally spoken stimuli, henceforth referred to as NAT. We will describe these experiments in the order VIS, SYN, and NAT.
