Pitch perception of sounds with different timbre

Alma Mater Studiorum University of Bologna, August 22-26 2006

Allan Vurma, Estonian Academy of Music and Theatre, Tallinn, Estonia

Jaan Ross, University of Tartu, Tartu, Estonia

sound, usually produced by a tuning fork, piano, or oboe. A musician must follow and maintain the accepted overall pitch standard during the whole performance.

ABSTRACT

This research examines the extent to which the perceived pitch of a sound is influenced by its timbre. Pairs of consecutive tones were presented to thirteen expert listeners, who had to decide whether the second tone in each pair was flat, sharp, or equal in pitch compared with the first. The tones in a pair were produced by a viola, a trumpet, or a singing voice. The fundamental frequency of the second tone was varied stepwise by up to half a semitone relative to the standard. On average, the pitch of the singing voice and trumpet sounds was perceived as about 20 cents higher than the pitch of the viola sound with the same fundamental frequency.

Pitch may be defined as the subjective quality of sounds that permits ordering them according to how high or low they are perceived (ASA, 1960). A succession of sounds with different pitches may form a melody. Pitch is closely related to the fundamental frequency of a sound, which in turn is inversely proportional to the period of the sound waveform. When pitch needs to be measured (or estimated), it is usually expressed in units of fundamental frequency, i.e. in cycles per second, or hertz (Hz).

In addition to the fundamental frequency, pitch may to some extent be influenced by other parameters of the sound, such as its spectrum or sound pressure level. Such influence is usually not very systematic and may depend on the particular listener (Terhardt, 1988). For example, Bannister (1934) found that the pitch of a complex sound is usually perceived as somewhat higher than the pitch of a sine tone, while Terhardt (1971) claimed the opposite. Evidence about the influence of timbre on pitch is likewise mixed. Warrier and Zatorre (2002) expressed the opinion that timbre can influence pitch when individual tones are presented in isolation, but that the same effect is difficult to observe with tones in a musical context.

Keywords

Pitch, tuning, timbre.

INTRODUCTION

The quality of a musical performance depends to a significant extent on the tuning of the performing instruments. This means that the intervals comprising a musical work must be tuned correctly, but also that notes played by different instruments and expected to have the same pitch must indeed be perceived as having equal pitch. It is common in musical practice that, before a performance, the pitch level of every participating instrument is adjusted to a reference

In: M. Baroni, A. R. Addessi, R. Caterina, M. Costa (2006) Proceedings of the 9th International Conference on Music Perception & Cognition (ICMPC9), Bologna/Italy, August 22-26 2006. © 2006 The Society for Music Perception & Cognition (SMPC) and European Society for the Cognitive Sciences of Music (ESCOM). Copyright of the content of an individual paper is held by the primary (first-named) author of that paper. All rights reserved. No paper from this proceedings may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information retrieval systems, without permission in writing from the paper's primary author. No other part of this proceedings may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information retrieval system, without permission in writing from SMPC and ESCOM.

ISBN 88-7395-155-4 © 2006 ICMPC

In music theory, a number of predetermined tuning systems, such as Pythagorean, just or equally tempered, have been discussed, each of them based on different principles of ordering the twelve tones within the span of an octave. Numerous studies demonstrate, however, that none of the predetermined tuning systems is maintained in the real performance of music. It seems that no fixed and measurable standards exist for the correct intonation of individual tones and intervals in musical performance, because of its subjective and context-dependent nature. For example, when two simultaneous complex tones need to be tuned in unison, a well-known objective aid in such a task is minimization of the beats evoked by the two sounds. When the same sounds are not sounded together but alternate with each other, no beats are available, and pitch adjustment may yield a different result because the sounds are influenced by contextual factors.
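The beat cue mentioned above can be illustrated with elementary acoustics: two simultaneous sinusoidal components with frequencies f1 and f2 produce an amplitude fluctuation at the rate |f1 - f2|, and for two complex tones the n-th harmonics beat n times faster than the fundamentals. The following minimal sketch is not part of the original study; the function name `beat_rate` and the 221 Hz example value are illustrative assumptions, while 220 Hz is the reference frequency used in the experiment.

```python
def beat_rate(f1_hz: float, f2_hz: float, harmonic: int = 1) -> float:
    """Beat rate in Hz between the same-numbered harmonics of two tones.

    Assumes ideal harmonic spectra, so harmonic n lies at n times
    the fundamental frequency of each tone.
    """
    return harmonic * abs(f1_hz - f2_hz)


reference_hz = 220.0   # A3, the standard used in the experiment
mistuned_hz = 221.0    # hypothetical, slightly sharp second tone

print(beat_rate(reference_hz, mistuned_hz))              # 1.0 beat per second
print(beat_rate(reference_hz, mistuned_hz, harmonic=3))  # 3.0 beats per second
```

When the tones are presented one after another, as in the present experiment, this cue is unavailable and listeners must rely on a comparison of remembered pitches.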

This work is aimed at comparing the pitches of sounds produced in different ways (viola, trumpet and tenor voice), in order to establish whether and to what extent timbre may influence the perceived pitch of a sound. By timbre we mean all differences between two sounds that originate from sources other than pitch, loudness and duration.

recorder SONY TCD10. The distance of the microphone from the mouth was about three centimeters in the latter case. The pitch of the original stimuli was manipulated in five-cent steps from the 220 Hz original, using the pitch correction routine of the WaveLab computer software. The five-cent value was chosen as a typical value of the difference limen for pitch (Terhardt, 1988). The step size was increased to 10 cents for the series with the singing voice, as the pilot tests showed much poorer discrimination ability in this case. The maximum deviation from the standard was a quarter tone (50 cents) in either direction. Loudness was adjusted to be subjectively equal for all sounds used in the experiment.
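To make the step sizes above concrete, the sketch below converts deviations in cents into fundamental frequencies using the standard relation f = f_ref * 2^(c/1200). It is only an illustration of the arithmetic, not the WaveLab procedure used by the authors; the function names `cents_to_hz` and `stimulus_grid` are hypothetical.

```python
def cents_to_hz(reference_hz: float, cents: float) -> float:
    """Frequency lying `cents` above (or below, if negative) the reference."""
    return reference_hz * 2.0 ** (cents / 1200.0)


def stimulus_grid(reference_hz: float = 220.0,
                  step_cents: float = 5.0,
                  max_cents: float = 50.0):
    """All (deviation in cents, frequency in Hz) pairs within +/- max_cents."""
    n_steps = int(round(max_cents / step_cents))
    return [(k * step_cents, cents_to_hz(reference_hz, k * step_cents))
            for k in range(-n_steps, n_steps + 1)]


instrument_grid = stimulus_grid(step_cents=5.0)   # viola and trumpet series
voice_grid = stimulus_grid(step_cents=10.0)       # singing voice series

for cents, hz in instrument_grid:
    print(f"{cents:+6.1f} cents -> {hz:8.3f} Hz")
```

With a 220 Hz standard, the quarter-tone limits correspond to roughly 213.7 Hz and 226.4 Hz, and a single five-cent step near 220 Hz is a change of about 0.64 Hz.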

The following timbre pairs were compared with each other: (1) viola-viola, (2) viola-trumpet, (3) trumpet-trumpet, (4) trumpet-viola, (5) tenor-viola, and (6) viola-tenor. The combinations tenor-trumpet and trumpet-tenor were omitted in order to keep the duration of the whole experiment within reasonable limits. The pitch of the first sound in a pair always corresponded to 220 Hz, while the pitch of the second sound was varied as described above. Pilot tests with two participants were used to detect the most likely pitch match of the second sound to the first. Four deviations of the second sound to each side of the expected match, plus the match itself, comprised a single block of nine stimuli, which was repeated five times during an experimental series. There was a total of six series, corresponding to the above sound pairs, and each series consisted of 5 x 9 = 45 sound pairs. For each presentation, the order of sound pairs within a series, as well as the order of the series themselves, was randomized. Care was taken to avoid a situation where the last pair of one block in a series and the first pair of the next block would be identical.
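The series construction described above can be sketched as follows. The paper does not specify how the randomization was implemented, so this is only one plausible reading: each nine-element block is shuffled independently and reshuffled if it would begin with the deviation that ended the previous block. The name `build_series` and its parameters are hypothetical.

```python
import random


def build_series(expected_match_cents: float,
                 step_cents: float = 5.0,
                 steps_each_side: int = 4,
                 repetitions: int = 5,
                 rng=None):
    """Return the deviations (in cents) of the second tone for one series."""
    rng = rng or random.Random()
    # Nine deviations: the expected match plus four steps to each side.
    block = [expected_match_cents + k * step_cents
             for k in range(-steps_each_side, steps_each_side + 1)]
    series = []
    for _ in range(repetitions):
        shuffled = block[:]
        rng.shuffle(shuffled)
        # Avoid identical pairs on both sides of a block boundary.
        while series and shuffled[0] == series[-1]:
            rng.shuffle(shuffled)
        series.extend(shuffled)
    return series


series = build_series(expected_match_cents=0.0, rng=random.Random(1))
print(len(series))  # 45
```

Passing a seeded `random.Random` instance makes a given order reproducible, which is convenient when the same series has to be regenerated for analysis.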

METHOD AND STIMULI

Six series of perception tests were conducted. Participants had to compare the pitches of two successive sounds using a three-alternative forced choice method. They had to decide whether the second sound was higher or lower than the first one, or equal to it. There were thirteen participants altogether, five female and eight male. All were professional musicians, including students from the Estonian Academy of Music and Theatre. Ten participants were from 21 to 26 years old and three participants from 45 to 50 years old. All participants had studied music for at least ten years. Their primary instruments included piano, violin, cello, double bass, oboe and trumpet, and some were singers or choir conductors.

The stimuli used in our experiment originated from two different sources. Viola and trumpet sounds were downloaded from a database at the Wendell Johnson Speech and Hearing Center, University of Iowa, USA, at http://theremin.music.uiowa.edu/MIS.html. Tenor voice sounds had been recorded earlier at the Estonian Academy of Music and Theatre for another experiment. All sounds initially had a pitch level corresponding to a fundamental frequency of 220 Hz, or A3. The duration of the viola and trumpet sounds was about two seconds and the duration of the tenor voice sounds about one second. The difference between the instrumental and the vocal sound durations was assumed not to affect pitch perception, because the effect of duration in pitch discrimination tasks is noticeable only for much shorter sounds (less than 100 ms) than those used in the present experiment (Plack & Oxenham, 2005). The tones of the viola were produced with a bow. The singer used the vowel /a/ and produced the notes with vibrato, with an amplitude of about one semitone and a frequency of 5.6 Hz.

Experiments were conducted within the framework of the perception test module of the well-known speech analysis software Praat4. Stimuli were presented diotically through headphones at about 65 dB. Participants could adjust the volume of the sound within a range of a few decibels. To make a forced choice, a participant had to click with the mouse on one of three buttons presented on the computer screen or, alternatively, to press one of three designated keys on the keyboard. After every ten stimulus pairs, the participants could take a rest of unrestricted duration. The overall duration of the experiment depended on the individual participant and varied between 25 minutes and one hour.

Viola and trumpet sounds had been recorded in an anechoic room using a Neumann KM 84 cardioid condenser microphone, a Mackie 1402-VLZ mixer and a Panasonic SV3800 DAT recorder. The tenor voice sounds had been recorded in the rehearsal room of the Tallinn Philharmonic Society using a head microphone AKG C420 and a DAT

RESULTS AND DISCUSSION

Figure 1 presents results for the two experimental series where the pitches of two sounds produced on the same instrument (viola in the left panel and trumpet in the right panel) were compared with each other. The results largely correspond to what might have been expected. The largest number of in-tune-answers was obtained when the fundamental frequencies of the two sounds were equal to each other. For the pair of viola sounds, the number of in-tune-answers was also equal to the maximum when the second sound in the pair was five cents higher than the first. This difference in responses between the viola-viola and trumpet-trumpet series, however, was not statistically significant (χ² = 15(8), p = 0.1). The higher-answers and the lower-answers together were symmetrical or nearly symmetrical with respect to zero deviation, which shows that neither was preferred over the other during the experiment. It is remarkable that the percentage of in-tune-answers is significantly lower than one hundred even in the case of zero deviation, for both the viola-viola (74 %) and the trumpet-trumpet sound pairs (82 %).

Figure 1. Results for the two experimental series where the pitches of two sounds produced by viola (left panel) or by trumpet (right panel) were compared with each other. Horizontal axis: deviation of the fundamental frequency from the unison (cents), vertical axis: percent of correct answers.

Figure 2. Results for the remaining four experimental series, where pitches of different timbres were compared with each other. The left panel presents results for the viola-trumpet and trumpet-viola pairs and the right panel results for the tenor-viola and viola-tenor sound pairs. Only percentages of in-tune-answers are depicted. Horizontal axis: deviation of the fundamental frequency from unison (cents), vertical axis: percent of correct answers.

Figure 2 displays results for the remaining four experimental series, where pitches of different timbres were compared with each other. The left panel presents results for the viola-trumpet and trumpet-viola pairs, and the right panel results for the tenor-viola and viola-tenor sound pairs. Only percentages of in-tune-answers are depicted. The results in Figure 2 show that timbre does influence the perceived pitch of a sound when it is compared with a sound of another timbre. When the sound of a trumpet or a tenor voice is followed by that of a viola, the largest proportion of in-tune-answers is obtained when the viola sound has a fundamental frequency about 15 to 20 cents higher than the fundamental frequency of the trumpet or the tenor voice. The same conclusion also holds when the sounds in a pair are presented in reverse order, i.e. for the viola-trumpet and viola-tenor pairs.

with different timbres need to be unequal in their fundamental frequencies in order to be perceived as having the same pitch? Secondly, why are the listeners' responses distributed differently depending on the order of presentation of the sounds in a pair?

Figure 3 shows the distributions of those answers which judged the two sounds to be unequal in pitch. The data are presented for the viola-trumpet and trumpet-viola pairs only. It can be seen that within the same series, the two curves for the lower- and higher-answers cross at about plus or minus 15-20 cents instead of crossing at zero (cf. Fig. 1). This confirms the pitch shift observed above in Fig. 2 for the sound pairs with different timbre. The distribution of the three types of answers (in tune, lower or higher) at zero deviation is significantly different for the viola-trumpet and trumpet-viola sound pairs (χ² = 66(2), p ≤ 0.001). The same holds for the viola-tenor and tenor-viola pairs (χ² = 54(2), p ≤ 0.001).
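The crossing point of the lower- and higher-answer curves can be estimated by linear interpolation between neighbouring stimulus levels. The sketch below shows one simple way of doing this; the response percentages are invented purely for illustration and are not the data of the present experiment, and the function name `crossover_cents` is hypothetical.

```python
def crossover_cents(deviations, pct_lower, pct_higher):
    """Deviation (cents) at which the 'lower' and 'higher' curves cross.

    Uses linear interpolation between adjacent stimulus levels and
    returns None if the curves never cross within the tested range.
    """
    diffs = [lo - hi for lo, hi in zip(pct_lower, pct_higher)]
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return float(deviations[i])
        if d0 * d1 < 0:  # sign change: the crossing lies in this interval
            frac = d0 / (d0 - d1)
            return deviations[i] + frac * (deviations[i + 1] - deviations[i])
    if diffs[-1] == 0:
        return float(deviations[-1])
    return None


# Invented example values (percent of answers at each deviation).
deviations = [0, 5, 10, 15, 20, 25, 30, 35, 40]
pct_lower = [80, 70, 55, 40, 25, 15, 8, 4, 2]
pct_higher = [2, 4, 8, 15, 30, 50, 65, 80, 90]

print(crossover_cents(deviations, pct_lower, pct_higher))
```

For same-timbre pairs such a crossing would be expected near zero, so its displacement gives a direct estimate of the timbre-induced pitch shift.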

[Figure 3 plot. Curves: vla-tr "flat", vla-tr "sharp", tr-vla "flat", tr-vla "sharp".]

Figure 4. Long-term average spectra (LTAS) for the three sounds (viola, trumpet and tenor voice) used in the present experiment. Horizontal axis: frequency (Hz), vertical axis: SPL (dB).

Figure 4 presents long-term average spectra (LTAS) for the three sounds (viola, trumpet and tenor voice) used in the present experiment. Both the trumpet and the tenor voice sounds have more sound energy at frequencies above 2 kHz than the viola sound. The spectral peaks also have different locations: at about 0.5 kHz for the viola and the tenor voice but at about 1.2 kHz for the trumpet. The tenor voice sound has an additional peak, the so-called singer's formant, at about 2.7 kHz, which the other two sounds lack. Thus the center of gravity of the spectra of the trumpet and tenor voice sounds is shifted somewhat towards higher frequencies compared with the spectrum of the viola sound. This may explain why the viola sound requires a slightly higher fundamental frequency than the sounds of the trumpet and the tenor voice in order to be equal to them in pitch. Our results are similar to the findings of Singh and Hirsh (1992), namely that a change in the spectral locus of a harmonic complex tone can deflect the perceived pitch in the same direction.
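The notion of a spectral center of gravity used above can be made concrete as the power-weighted mean frequency of the harmonics. The following sketch assumes an idealized harmonic spectrum and made-up harmonic levels; it does not reproduce the LTAS of the actual stimuli, and weighting by power rather than by amplitude is a choice made here for illustration. The name `spectral_centroid` is hypothetical.

```python
def spectral_centroid(f0_hz, harmonic_levels_db):
    """Power-weighted mean frequency (Hz) of a harmonic spectrum.

    harmonic_levels_db[k] is the level in dB of harmonic k + 1.
    """
    powers = [10.0 ** (level / 10.0) for level in harmonic_levels_db]
    freqs = [f0_hz * (k + 1) for k in range(len(powers))]
    return sum(f * p for f, p in zip(freqs, powers)) / sum(powers)


f0 = 220.0
# Invented spectra: levels fall by 6 dB or by 3 dB per harmonic.
dull_levels = [-6.0 * k for k in range(12)]
bright_levels = [-3.0 * k for k in range(12)]

print(round(spectral_centroid(f0, dull_levels), 1))
print(round(spectral_centroid(f0, bright_levels), 1))
```

Both example tones share the same fundamental frequency, yet the spectrum with the shallower roll-off has the higher centroid, which is the kind of difference the LTAS comparison points to.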

Figure 3. Comparison of viola and trumpet sounds: distribution of answers which indicated that the pitches of the two sounds were unequal. Horizontal axis: deviation of the fundamental frequency from unison (cents), vertical axis: percent of answers.

The in-tune-answers of the participants, represented by their distribution curves in Figure 2, have a somewhat different height and shape for the original and the reversed presentation order. When the viola sound precedes the trumpet or the tenor voice sound, the distribution has a higher and more pronounced peak. This shows that the reactions of the participants are less ambiguous in this case than when the sounds are presented in the reverse order. If we compare the distributions of all three types of answers between the viola-trumpet and trumpet-viola conditions, significant differences emerge (χ² = 12(2), p = 0.003). The same is true for the viola-tenor and tenor-viola comparisons (χ² = 10(2), p = 0.006).
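Comparisons of this kind are commonly made with a Pearson chi-square test on a table of answer counts (presentation order by answer type), which has (2 - 1) x (3 - 1) = 2 degrees of freedom. The sketch below is an assumption about the general procedure, not a reconstruction of the authors' analysis, and the counts are invented. For exactly two degrees of freedom the upper-tail probability has the closed form exp(-χ²/2), which is used here to avoid any external library.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    Assumes every row and column total is positive, so that all
    expected counts are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-stat / 2.0)


# Invented counts; columns are lower, in tune, higher.
table = [
    [10, 20, 30],   # hypothetical viola-trumpet answers
    [30, 20, 10],   # hypothetical trumpet-viola answers
]
stat, df = chi_square_independence(table)
print(stat, df)            # 20.0 2
print(p_value_df2(stat))
```

Applied to a statistic of 12 with two degrees of freedom, `p_value_df2` gives about 0.0025, the same order of magnitude as the value reported above.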

In order to explain the asymmetry of the response distributions depending on the order of presentation of the two sounds, we may recall habits from conventional musical practice. Some successions of musical sounds with different timbres occur more frequently there than others. For example, the sound of a string instrument is more often used as a reference for a sound produced by the human voice than the other way round. This means that in common practice, a singer needs to adjust his or her intonation to a string instrument and not vice versa. A requirement to compare the sound of a string instrument with a preceding sound of a human voice may therefore evoke a less determinate response from listeners. We may witness this in the results of the present experiment (see above).

We may also wish to characterise the listeners' tolerance of mistuning. For this purpose, we define the “correct” unison interval category for two stimuli as the range in which the number of in-tune-answers is greater than the number of higher- or lower-answers. The width of this category seems to depend on the timbre of the sounds: it is about 15 cents for the viola-viola or viola-trumpet pairs but as much as about 40 cents for the viola-tenor comparison. The unison category width becomes equal to about 15 cents when the singing voice is replaced in the above pair by a trumpet or a viola sound.

In order to explain the data, we require answers to at least two questions. Firstly, why do the two sounds


CONCLUSIONS

The results of the present experiment show that two sounds with different timbres have slightly different fundamental frequencies when judged by professional musicians to be equal in pitch. The sounds produced by a trumpet and by a tenor voice must be on average 20 cents lower than viola sounds in order to be perceived as having the same pitch as the latter. The distribution of the listeners' responses depends on the order of presentation of the stimuli. The in-tune-answers are more narrowly peaked at a deviation of 20 cents from the standard when a viola sound precedes a trumpet or a tenor voice sound than in the reverse order. The listeners are more tolerant of mistuning of the singing voice than of the viola or trumpet when these are assessed against the viola timbre.

ACKNOWLEDGMENTS

This work has been supported by research grant # 4712 from the Estonian Science Foundation. We gratefully acknowledge comments and language check by Professor Ilse Lehiste of the Ohio State University.

REFERENCES

ASA (1960). Acoustical terminology S1.1-1960. New York: American Standards Association.

Bannister, H. (1934). Auditory phenomena and their stimulus correlations. In C. Murchison (Ed.), Handbook of general experimental psychology. Worcester, MA: Clark University Press.

Plack, C.J. & Oxenham, A.J. (2005). The psychophysics of pitch. In Plack, C.J., Oxenham, A.J., Fay, R.R., & Popper, A.N. (Eds.), Pitch: neural coding and perception. New York, NY: Springer.

Singh, P.G. & Hirsh, I.J. (1992). Influence of spectral locus and F0 changes on the pitch and timbre of complex tones. The Journal of the Acoustical Society of America 92, 2650-2661.

Terhardt, E. (1971). Die Tonhöhe harmonischer Klänge und das Oktavintervall. Acustica 24, 126-136.

Terhardt, E. (1988). Intonation of tone scales. Archives of Acoustics 13, 147-156.

Warrier, C.M. & Zatorre, R.J. (2002). Influence of tonal context and timbral variation on perception of pitch. Perception and Psychophysics 64, 198-207.