Perception of musical pitch and lexical tones by Mandarin-speaking musicians Chao-Yang Leea兲 and Yuh-Fang Lee School of Hearing, Speech and Language Sciences, Ohio University, Athens, Ohio 45701
共Received 9 July 2009; revised 8 October 2009; accepted 16 October 2009兲 The relationship between music and language processing was explored in two perception experiments on the identification of musical notes and Mandarin tones. In the music task, Mandarin-speaking musicians were asked to identify musical notes of three timbres without a reference pitch. 72% of the musicians met the criterion for absolute pitch when an exact match was required, and 82% met the criterion when one-semitone errors were allowed. Accuracy of identification was negatively correlated with age of onset of musical training, and piano notes were identified more accurately than viola and pure tone stimuli. In the Mandarin task, the musicians were able to identify, beyond chance, brief Mandarin tone stimuli that were devoid of dynamic F0 information and cues commonly considered necessary for speaker normalization. Although F0 height detection was involved in both musical note and Mandarin tone identification, performances in the two tasks were not correlated. The putative link between absolute pitch and tone language experience was discussed. © 2010 Acoustical Society of America. 关DOI: 10.1121/1.3266683兴 PACS number共s兲: 43.75.Cd, 43.71.Hw 关DD兴
Music and language are distinctively human activities. Their parallels and contrasts have garnered much attention 共Patel, 2008兲. However, the relationship between music and language processing remains underspecified. In this study, we explored the music-language relationship by examining pitch perception in music and lexically contrastive tones. Pitch perception is an integral part of music perception. Pitch variations can also signal a difference in word meaning in lexical tone languages. For example, the syllable ma in Mandarin means “mother” with a level tone 共tone 1兲, “hemp” with a rising tone 共tone 2兲, “horse” with a low tone 共tone 3兲, and “scorn” with a falling tone 共tone 4兲. In other words, music perception and lexical tone perception both involve mapping pitch onto discrete musical or linguistic categories. Given the role of pitch in musical and linguistic contrasts, whether the same perceptual mechanism is implicated in both musical and linguistic pitch perception has important implications for the music-language relationship. In the current study, we conducted two perception experiments to address this issue: one on the perception of musical notes without a reference pitch, and the other on the perception of brief, isolated Mandarin tones produced by multiple speakers. Our main goal was to evaluate the hypothesis that absolute pitch is associated with tone language experience 共Deutsch, 2002, 2006; Deutsch et al., 2009, 2004a, 2006兲. The musical note identification experiment was a variant of the absolute pitch task developed by Deutsch et al. 共2006兲, which has been used to gauge the presence of absolute pitch 共Deutsch et al., 2009, 2006; Lee and Hung, 2008兲. The Mandarin tone identification experiment was developed to assess how well Mandarin tones could be identified with-
a兲
Author to whom correspondence should be addressed. Electronic mail:
[email protected].
J. Acoust. Soc. Am. 127 共1兲, January 2010
out dynamic F0 information 共Lee, 2009; Lee and Hung, 2008; Lee et al., 2008, 2009, 2010兲. In particular, isolated Mandarin syllables produced by multiple speakers were digitally processed to generate “onset-only” syllables. As the majority of the F0 contour was neutralized in these syllables, listeners were forced to use F0 height information for tone identification. Since the detection of F0 height is involved in both musical note identification and Mandarin tone identification, we examined the correlation of performances between the two tasks. If a common perceptual mechanism is implicated, performances in the two tasks should be correlated. A. Absolute pitch and tone language
The link between absolute pitch and tone language was first suggested by Deutsch and colleagues 共Deutsch, 2002, 2006; Deutsch et al., 2004a, 2006, 2009兲. Absolute pitch is “the ability to identify the frequency or musical name of a specific tone or, conversely, the ability to produce some designated frequency, frequency level, or musical pitch without comparing the tone with any objective reference tone” 共Ward, 1999兲. Only a small percentage of individuals in North America and Europe possess absolute pitch 共Profita and Bidder, 1988兲, and the genesis of this ability remains unclear. The idea proposed by Deutsch and colleagues is that individuals with absolute pitch are able to associate a particular pitch with a verbal label, just as tone language speakers can associate a particular pitch or a combination of pitches with a lexical tone category. Considering this functional similarity, tone language speakers could be said to possess a form of absolute pitch. In other words, absolute pitch is treated by tone language speakers as a feature of speech. Deutsch and colleagues further hypothesized that infants raised in a tone language environment acquire absolute pitch
0001-4966/2010/127共1兲/481/10/$25.00
© 2010 Acoustical Society of America
481
Author's complimentary copy
I. INTRODUCTION
Pages: 481–490
482
J. Acoust. Soc. Am., Vol. 127, No. 1, January 2010
and nonmusicians, and musical note identification by the musicians. In the Mandarin task, all participants were given a brief tutorial on Mandarin tones and were asked to identify the tones of the syllable sa produced by 32 speakers. The stimuli included intact syllables and acoustically modified syllables with limited F0 information 共i.e., silent-center and onset-only syllables兲. Despite having no prior experience with Mandarin, the musicians identified the intact and silentcenter tones beyond chance. The musicians also outperformed the nonmusicians, indicating that some kind of musical ability facilitated lexical tone identification. In the music task, the musicians were asked to identify synthesized musical notes of three timbres without a reference pitch. Their performance was at chance level, suggesting that their superior performance over nonmusicians in Mandarin tone identification did not arise from absolute pitch. However, since none of the musicians actually possessed absolute pitch, the lack of correlation found between the two tasks could not be fully interpreted. It should be noted that the absolute pitch hypothesis proposed by Deutsch and colleagues does not make explicit predictions about the association between lexical and musical pitch perception performances as were examined in Lee and Hung 共2008兲. In particular, Deutsch and colleagues hypothesized that for Mandarin speakers, absolute pitch for lexical tones is acquired as tones in a first tone language, whereas absolute pitch for musical tones is acquired as tones in a second tone language. Given the distinction between first and second languages, the hypothesis does not necessarily predict an association between absolute pitch and lexical tone processing in tone language users. The hypothesis, however, should favor such an association if absolute pitch for music and absolute pitch for speech share common brain mechanisms 共Deutsch, 2002兲. B. Perception of vocal pitch height
The perception of pitch height is not only relevant for music perception, but also implicated in lexical tone perception. For languages with level tones 共e.g., Cantonese兲, detection of F0 height is necessary for distinguishing the level tones. For languages with contour tones 共e.g., Mandarin兲, where tones could be identified based on dynamic F0 information, F0 height can still influence tone perception 共Gandour, 1983; Gandour and Harshman, 1978; Lee, 2009; Massaro et al., 1985兲. If the same perceptual mechanism is involved in musical pitch and lexical tone perception, one would expect a positive correlation between pitch height detection performance for musical notes and lexical tones. The use of F0 height is particularly relevant when dealing with lexical tones produced by multiple speakers. Since speaking F0 range differs across individuals, absolute F0 values of a particularly tone are likely to vary by speaker. For example, a phonologically low tone produced by a female speaker could be acoustically equivalent to a phonologically high tone for a male speaker. That means F0 height judgment will have to be determined with respect to the perceived F0 range of a speaker, a process commonly referred to as speaker normalization 共Johnson, 2005兲. Potential sources of C. Lee and Y. Lee: Perception of musical pitch and Mandarin tones
Author's complimentary copy
for lexical tones just like they would acquire other features of speech. When they receive musical training, they acquire absolute pitch for musical tones in the same way as they would acquire the tones of a second tone language. By contrast, for individuals reared in a nontone language environment, since no linguistically relevant absolute pitch has been acquired, those individuals would acquire absolute pitch for musical tones as if they would acquire the tones of a first tone language. Deutsch and colleagues therefore predicted more stable and precise pitch templates as well as higher prevalence of absolute pitch in speakers of a tone language. The evidence for the hypothesis has come from acoustic studies comparing the consistency of lexical pitch production by tone and nontone language speakers. In particular, Deutsch et al. 共2004a兲 found that the average F0 of word enunciations across recording sessions was more consistent for tone language speakers than nontone language speakers. In addition, while the nontone language speakers showed a significant decrease in pitch consistency across days relative to immediate repetition, the tone language speakers were equally consistent across days as on immediate repetition. This finding suggests that the pitch templates used by the tone language speakers were more stable over the long term than those used by the nontone language speakers. Burnham et al. 共2004兲 replicated the findings of Deutsch et al. 共2004a兲 with a variant of the word production task, but noted that the difference between tone and nontone languages was fairly small. Studies on the prevalence of absolute pitch in tone and nontone language speakers have also provided support for the putative link between absolute pitch and tone language. Deutsch et al. 共2006兲 asked English- and Mandarin-speaking musicians to identify piano notes without a reference pitch. They found a significantly higher percentage of absolute pitch possessors in the Mandarin speakers. For both groups of musicians, absolute pitch was more prevalent in individuals with an earlier onset of musical training. When the age of onset of musical training was controlled, absolute pitch was still more prevalent in the Mandarin speakers, further supporting the role of tone language experience in absolute pitch. To rule out the potential contribution of ethnic heritage, Deutsch et al. 共2009兲 used the same task to examine absolute pitch in musicians of Caucasian heritage 共without tone language experience兲 and of East Asian heritage 共with three levels of tone language proficiency: very fluent, fairly fluent, and nonfluent兲. They found that identification accuracy was positively associated with tone language experience. Furthermore, the accuracy of the East Asian nonfluent group was comparable to the accuracy of the Caucasian group, effectively ruling out a genetic interpretation. In addition, performance of the very fluent group was comparable to the performance of the Chinese musicians reported in Deutsch et al. 共2006兲, indicating that the superior performance was not likely due to the country where the musicians received their music education. To obtain a direct measure of both musical and lexical pitch perception abilities, Lee and Hung 共2008兲 examined Mandarin tone identification by English-speaking musicians
C. Summary and predictions
Our main goal in this study was to evaluate the hypothesis that absolute pitch is associated with tone language experience. To that end, two perception experiments were conducted with Mandarin-speaking musicians. In the absolute pitch task, participants were asked to identify synthesized musical tones of three timbres 共piano, pure tone, and viola兲 in the absence of a reference pitch. It was expected that the prevalence of absolute pitch in these musicians would be comparable to the Mandarin speaker data reported in Deutsch et al. 共2006, 2009兲. It was also expected that performance in the absolute pitch task would be negatively correlated with the age of onset of musical training. In the ManJ. Acoust. Soc. Am., Vol. 127, No. 1, January 2010
darin tone task, participants were asked to identify isolated, onset-only, multispeaker Mandarin tones. It was expected that the musicians would be able to identify the tones with accuracy beyond chance, just as their nonmusician counterparts did 共Lee, 2009兲. Finally, if a common perceptual substrate existed between pitch perception in music and language, performances in musical note identification and Mandarin tone identification should be positively correlated. II. EXPERIMENT 1: MUSICAL NOTE IDENTIFICATION A. Method 1. Materials
The materials used in this experiment were identical to those used in Lee and Hung 共2008兲. In particular, 36 notes that spanned a three-octave range from C3 共131 Hz兲 to B5 共988 Hz兲 were synthesized with three timbres 共pure tone, piano, and viola兲 for a total of 108 notes. The pure tones were synthesized using MATLAB 共The MathWorks, Natick, MA兲. The piano and viola notes were synthesized with a Kurzweil K2000 synthesizer tuned to the standard A4 at 440 Hz. The duration for all notes was 500 ms. For each timbre, the 36 notes were ordered such that any two consecutive notes were separated by more than an octave. The purpose of the separation was to prevent listeners from developing relative pitch as a reference for the task 共Deutsch et al., 2006兲. The 36 notes of a given timbre were divided into 3 blocks of 12 notes, with a 5 s interstimulus interval and a 10 s break between the blocks. Notes of a given timbre were always presented together. Six lists of stimulus presentation were created to counterbalance the order of timbre presentation. The 72 participants were randomly assigned to 1 of the 6 lists such that a given list was used for 12 participants. 2. Participants
Seventy-two musicians were recruited from the student population at National Taiwan Normal University in Taipei, Taiwan. They included 52 females and 17 males with an age range from 18 to 24 years 共mean= 20.6, SD= 1.6兲. All reported normal hearing and all identified Mandarin as their native language. The age at which they first received musical training ranged from 4 to 13 years 共mean= 6.7, SD= 2.2兲. Their major instrument or area of concentration included cello 共2 individuals兲, clarinet 共6兲, composition 共4兲, double bass 共3兲, erhu 共5兲, euphonium 共1兲, flute 共5兲, harp 共1兲, horn 共3兲, liuqin 共1兲, oboe 共2兲, percussion 共3兲, piano 共21兲, saxophone 共1兲, trombone 共1兲, trumpet 共4兲, tuba 共1兲, viola 共1兲, violin 共4兲, and vocal performance 共3兲. 3. Procedure
The synthesized musical notes were saved as individual audio files and were imported to the Brown Laboratory interactive speech system 共BLISS兲 共Mertus, 2000兲 for stimulus presentation. Twelve practice trials, synthesized as oboe sounds with the Kurzweil K2000 synthesizer, were given prior to the actual experiment to familiarize the participants with the presentation format. The practice notes were se-
C. Lee and Y. Lee: Perception of musical pitch and Mandarin tones
483
Author's complimentary copy
information for speaker normalization include syllableinternal F0 contours, external context, and familiarity with speaker voice through repeated exposure 共Fox and Qi, 1990; Leather, 1983; Lin and Wang, 1984; Moore and Jongman, 1997; Wong and Diehl, 2003; Zhou et al., 2008兲. To isolate the contribution of F0 height detection in lexical tone identification, it would be important to control for these factors. Lee 共2009兲 examined Mandarin tone identification from stimuli deprived of F0 contour, external context, and familiarity with speaker voice. The Mandarin syllable sa, produced by 32 speakers of both genders, was digitally processed to generate stimuli that included only the fricative and the first six glottal periods, thereby neutralizing the dynamic F0 contrasts among the four tones. Each stimulus was presented just once in isolation to native Mandarin listeners, who had no prior exposure to any of the speaker voices. Despite the impoverished nature of the stimuli, tone classification accuracy exceeded chance. Acoustic analyses showed that the high-onset 共tones 1 and 4兲 and low-onset tones 共tones 2 and 3兲 contrasted in F0, duration, and two voice quality measures 共i.e., F1 bandwidth and spectral tilt兲. Correlation analyses showed that F0 covaried with the voice quality measures and that tone classification based on F0 height also correlated with the voice quality measures. Since the same acoustic measures consistently distinguished the female from the male stimuli, speaker gender detection was proposed to be the basis of the F0 height judgment. Similar gender-based accounts for vocal pitch detection have also been noted in literature 共Deutsch, 1991; Deutsch et al., 1990; Dolson, 1994; Honorof and Whalen, 2005兲. What remains unresolved is whether the ability to use F0 height for lexical tone identification is associated with the ability to identify F0 height in music. Although Lee and Hung’s 共2008兲 onset-only Mandarin tone stimuli were used to evaluate the use of F0 height for lexical tone identification, those English-speaking participants were already exposed to all speaker voices from the intact and silent-center syllable presentations. In addition, none of the musicians actually possessed absolute pitch, making it difficult to interpret the lack of correlation between the performance in the absolute pitch task and the Mandarin tone task. In the current study, we extended Lee and Hung 共2008兲 and Lee 共2009兲 by investigating absolute pitch and identification of onset-only Mandarin tones in Mandarin-speaking musicians.
90 80
Accuracy (%)
70 60 50 40 30 20 Piano Pure tone Viola
10 0 0
1
2
3
Number of Semitone Errors Allowed
FIG. 1. A box plot showing the accuracy of musical note identification for piano, pure tone, and viola stimuli as a function of number of semitone errors allowed.
lected from the same pitch range 共C3 – B5兲 and were also 500 ms long. As in the actual experiment, any two consecutive notes were separated by more than an octave. None of the practice notes appeared in the actual experiment. No feedback was given. Participants were tested individually in a sound-treated booth in the Phonetics Laboratory at National Taiwan Normal University. They listened to the stimuli through a pair of headphones 共Philips SHP2500兲 connected to a Windows laptop computer 共Panasonic CF-Y2兲. The participants were told that they would be listening to 9 blocks of 12 notes of 3 timbres ranging from C3 to B5. Their task was to notate the notes that they had heard on a customized staff paper immediately after each note was played, and to apply accidental signatures if applicable. The participants were also told that there would be 5 s to respond to each note and a 10 s break between blocks. 4. Data analysis
The written responses were graded by the second author of this article 共a musician兲. One-way repeated measures analyses of variance 共ANOVAs兲 were conducted on response accuracy with timbre 共pure tone, piano, and viola兲 as a within-subject factor and participants as a random factor. To allow a direct comparison with Lee and Hung 共2008兲, four dependent variables were used: 共1兲 percentage of accurate note identification, 共2兲 percentage of accurate note identification allowing one-semitone errors, 共3兲 percentage of accurate note identification allowing two-semitone errors, and 共4兲 percentage of accurate note identification allowing threesemitone errors. When a main effect from the ANOVAs was significant, the Bonferroni post-hoc test was used for pairwise means comparisons to keep the family-wise type I error rate at 5%. B. Results
Figure 1 shows a box plot of the accuracy of musical note identification, arranged by timbre and the number of semitone errors allowed. When an exact match was required, 484
J. Acoust. Soc. Am., Vol. 127, No. 1, January 2010
average accuracy was highest for piano notes 共mean = 86% , SD= 21.3兲, followed by viola 共mean= 81% , SD = 24.1兲 and pure tone 共mean= 72% , SD= 25.5兲 notes. The ANOVA revealed a significant timbre effect 关F共2 , 142兲 = 35.34, p ⬍ 0.0001兴. All three pair-wise means comparisons were significant. Since there are 12 semitones within an octave, chance level performance is 8.3%. These averages therefore indicated that the musicians’ identification accuracy was substantially beyond chance. Inspection of individual data showed that 52 of the 72 participants 共72%兲 met the criterion for absolute pitch as defined in Deutsch et al. 共2006兲, i.e., achieving at least 85% accuracy in the piano task. When one-semitone errors were allowed, accuracy was expectedly higher. In particular, piano stimuli generated higher average accuracy 共mean= 91% , SD= 16.3兲 than viola 共mean= 88% , SD= 19.3兲 and pure tone 共mean= 86% , SD = 20.9兲 stimuli. The ANOVA revealed a significant timbre effect 关F共2 , 142兲 = 8.7, p ⬍ 0.0005兴. Pair-wise means comparisons showed that the two comparisons involving piano were significant. Permitting one-semitone errors effectively allows a range of three semitones, indicating a chance level of 25%. These averages indicated that the musicians’ identification accuracy was again substantially beyond chance, as expected from the results based on an exact match. Inspection of individual data showed that 59 of the 72 participants 共82%兲 met the 85% accuracy criterion for absolute pitch. When two-semitone errors were allowed, piano stimuli still generated higher average accuracy 共mean= 93% , SD = 12.9兲 than viola 共mean= 90% , SD= 15.6兲 and pure tone 共mean= 90% , SD= 17.5兲 stimuli. The ANOVA revealed a significant timbre effect 关F共2 , 142兲 = 5.68, p ⬍ 0.005兴. Pairwise means comparisons showed that the two comparisons involving piano were significant. Permitting two-semitone errors effectively allows a range of five semitones, indicating a chance level of 41.7%. These averages indicated that the musicians’ identification accuracy was again substantially beyond chance, as expected from the results based on an exact match. We do not report the number of individuals attaining 85% accuracy when two- or three-semitone errors are allowed because accuracy measured with these deviations is no longer considered qualifying for absolute pitch. When three-semitone errors were allowed, identification accuracy became comparable among the three timbres: piano 共mean= 93% , SD= 11.8兲, viola 共mean= 92% , SD= 12.1兲, and pure tone 共mean= 91% , SD= 15兲. The timbre effect was not significant 关F共2 , 142兲 = 3.02, p = 0.0052兴. Permitting three-semitone errors effectively allows a range of five semitones, indicating a chance level of 58.3%. These averages indicated that the musicians’ identification accuracy was again substantially beyond chance, as expected from the results based on an exact match. To evaluate the hypothesis that absolute pitch is associated with the age of onset of musical training 共Deutsch et al., 2006兲, Pearson’s correlation coefficients were derived between the four dependent variables and the age of onset of musical training. Fisher’s r to z transformation was then carried out to evaluate if the correlation coefficients were significantly different from zero. Consistent with the hypothC. Lee and Y. Lee: Perception of musical pitch and Mandarin tones
Author's complimentary copy
100
esis, the age of onset of musical training was negatively correlated with all four dependent measures: accuracy requiring exact match 共r = −0.222, p ⬍ 0.005兲, allowing onesemitone errors 共r = −0.237, p ⬍ 0.0005兲, two-semitone errors 共r = −0.232, p ⬍ 0.001兲, and three-semitone errors 共r = −0.232, p ⬍ 0.0005兲. These results indicated that the earlier the onset of musical training, the better the performance in the absolute pitch task.
in literature 共Lockhead and Byrd, 1981; Miyazaki, 1989兲. The current results are consistent with these findings, suggesting that the presence of timbre facilitates pitch height judgments. III. EXPERIMENT 2: MANDARIN TONE IDENTIFICATION A. Method 1. Materials
A considerable number of the Mandarin-speaking musicians in the current study met the criterion for absolute pitch: 72% when an exact match was required, and 82% when onesemitone errors were allowed. In contrast, with identical testing materials and procedure, Lee and Hung 共2008兲 found that none of the English-speaking musicians possessed absolute pitch. With piano stimuli only, Deutsch et al. 共2006兲 found significantly higher occurrence of absolute pitch in Mandarin-speaking musicians 共ranging approximately from 40% to 75%兲 than in English-speaking musicians 共ranging approximately from 0% to 15%兲 depending on age of onset of musical training. Deutsch et al. 共2009兲 showed that accuracy rate 共not percentage of listeners with absolute pitch兲 was at approximately 90% for the “tone very fluent” musicians, again depending on age of onset of musical training. This value is comparable to the accuracy obtained from the Mandarin-speaking musicians in the current study 共86%兲. Taken together, these results indicate that the occurrence of absolute pitch is positively associated with tone language fluency. However, there are differences other than tone language experience that could account for some of the aforementioned contrasts between tone and nontone language users in the present study. Age of onset of musical training, for example, appears to be an important factor. For example, the Mandarin-speaking musicians in the current study on average started their musical training earlier 共mean = 6.7 years, SD= 2.2兲 than the English-speaking musicians in Lee and Hung 共2008兲 共mean= 9.4 years, SD= 3.2兲. Inspection of data from Deutsch et al. 共2006, 2009兲 further showed that the occurrence of absolute pitch dropped significantly for musicians who started musical training after 8 or 9 years of age. Therefore, the lack of absolute pitch possessors in Lee and Hung 共2008兲 could reflect a joint contribution of tone language experience and age of onset of musical training. Nonetheless, the negative correlation between the age of onset of musical training and musical note identification accuracy is consistent with Deutsch et al. 共2006, 2009兲. The timbre effect found in the current study contrasted with the null finding from Lee and Hung 共2008兲. In particular, piano stimuli in the current study consistently resulted in the most accurate responses except when three-semitone errors were allowed. Viola stimuli also generated more accurate responses when an exact match was required. Compared to pure tones, complex sounds include a myriad of extraneous cues that may facilitate pitch identification 共Ward, 1999兲. The advantage of piano stimuli, in particular, has been noted J. Acoust. Soc. Am., Vol. 127, No. 1, January 2010
The materials used in this experiment were identical to the onset-only stimuli used in Lee and Hung 共2008兲. In particular, the Mandarin syllable sa, produced with all 4 tones by 16 female and 16 male native speakers, was used to generate the stimuli for this experiment. The recordings were made in a sound-treated booth with an Audio-technica AT825 field recording microphone connected through a preamplifier and analog-to-digital converter 共USBPre microphone interface兲 to a Windows personal computer. The speakers were instructed to read the syllables in citation form. The recordings were digitized with BLISS 共Mertus, 2000兲 at 44.1 kHz with 16-bit quantization. Each syllable was identified from the BLISS waveform display, excised from the master file, and saved as an audio file. The peak amplitude was normalized across syllables. Each sa syllable was digitally processed with BLISS to generate the onset-only syllables. In particular, all but the fricative and the first 15% of the voiced portion of the syllable was digitally silenced. There were no perceptible clicks as a result of the signal processing; therefore no further tapering procedure was applied. A total of 128 stimuli 共4 tones⫻ 32 speakers兲 were generated. Detailed acoustic analyses on these stimuli have been reported in Lee 共2009兲 and Lee and Hung 共2008兲. 2. Participants
The 72 musicians who participated in the musical note identification task also participated in this experiment. 3. Procedure
The stimuli were saved as individual audio files and imported to BLISS for stimulus presentation and response data acquisition. The 128 stimuli produced by the 32 speakers were assigned to 4 blocks such that each block included only 1 stimulus from a given speaker. In other words, each block had 32 stimuli and all stimuli were produced by different speakers. The purpose of this arrangement was to minimize familiarity with individual speaker voices that could be used for speaker normalization. Within each block, the number of male and female speakers was balanced 共16 males and 16 females兲, so was the number of the four tones 共8 stimuli for each of the 4 tones兲. Each participant was assigned a uniquely randomized presentation order such that no two participants received the same order of presentation. The order of presentation of the blocks was also randomized for each participant. A 10 s break was given between blocks. Participants were tested individually in the same soundtreated booth with the same equipments as in the previous
C. Lee and Y. Lee: Perception of musical pitch and Mandarin tones
485
Author's complimentary copy
C. Discussion
Analyses were conducted on response accuracy, reaction time, and tone confusion patterns. Identification response and reaction time were automatically recorded by BLISS. Since the duration of individual stimulus varied, reaction time was measured from stimulus offset to avoid the potential confound of the durational differences. Missing cells due to incorrect responses constituted 2.6% of the reaction time data; only correct responses were included in the reaction time analysis. ANOVAs were conducted on accuracy and reaction time with speaker gender 共female and male兲 and tone 共1, 2, 3, and 4兲 as within-subject factors, and participants 共F1兲 and stimulus items 共F2兲 as random factors. Responses to female and male stimuli were evaluated separately to allow a direct comparison with Lee 共2009兲. When a main effect from the ANOVAs was significant, the Bonferroni post-hoc test was used for pair-wise means comparisons to keep the familywise type I error rate at 5%. Tone confusion patterns were also analyzed to evaluate the types of tone errors made by the participants. Contingency table analyses with 2 tests were conducted to evaluate the dependency of tone identification responses on the stimuli.
B. Results 1. Accuracy
Figure 2 shows the accuracy of tone identification responses. The ANOVAs revealed significant main effects of speaker gender 关F1共1 , 71兲 = 44.77, p ⬍ 0.0001; F2共1 , 120兲 = 47.71, p ⬍ 0.001兴 and tone 关F1共3 , 213兲 = 77.23, p ⬍ 0.0001; F2共3 , 120兲 = 40.5, p ⬍ 0.0001兴 and a significant gender⫻ tone interaction 关F1共3 , 213兲 = 38.47, p ⬍ 0.0001; F2共3 , 120兲 = 10.63, p ⬍ 0.0001兴. In particular, female stimuli 共mean= 41% , SD= 22.7兲 were identified more accurately than male stimuli 共mean= 35% , SD= 19.2兲. When all stimuli were considered, tone 1 共52% , SD= 20.9兲 and tone 3 共47% , SD= 16.8兲 were identified more accurately than tone 4 共30% , SD= 18.3兲 and tone 2 共22% , SD = 12.9兲. All pair-wise means comparisons were significant except the tone 1-tone 3 comparison. The interaction arose from the result that male tone 3 was identified more accurately than female tone 3; in contrast, other female stimuli were identified more accurately than the male stimuli. 486
J. Acoust. Soc. Am., Vol. 127, No. 1, January 2010
Female stimuli Male stimuli
Accuracy (%)
80
60
40
20
0
1
2
1
2
3
4
3
4
1500
1200
900
600
300
0
Tone
FIG. 2. Average accuracy and reaction time 共+SE兲 of identification of onsetonly Mandarin tones for female and male stimuli 共N = 72兲.
2. Reaction time
Figure 2 also shows the reaction time of tone identification responses. The ANOVAs revealed significant main effects of speaker gender 关F1共1 , 57兲 = 26.39, p ⬍ 0.0001; F2共1 , 120兲 = 23.31, p ⬍ 0.0001兴 and tone 关F1共3 , 171兲 = 19.17, p ⬍ 0.0001; F2共3 , 120兲 = 9.82, p ⬍ 0.0001兴. There was no interaction effect. Consistent with the accuracy results, responses to female stimuli 共mean = 861 ms, SD= 480兲 were faster than those to male stimuli 共mean= 992 ms, SD= 492兲. When all stimuli were considered, responses to tone 3 共772 ms, SD= 427兲 were fastest, followed by those to tone 1 共896 ms, SD= 464兲, tone 4 共946 ms, SD= 498兲, and tone 2 共1094 ms, SD= 517兲. All pair-wise means comparisons were significant except the tone 1-tone 4 comparison. These results are generally consistent with the accuracy findings that higher accuracy is associated with faster reaction time. 3. Tone confusion patterns
To inspect the type of tone errors made in this task, tone responses were tabulated to generate contingency tables. To evaluate the null hypothesis that tone identification responses were not related to the stimulus tones, expected responses were also calculated for 2 tests. Figure 3 shows the numbers of expected and actual tone responses to the tone stimuli. The expected frequencies were calculated as follows: Suppose Eij is the expected frequency for the cell in row i and column j, Ri and C j are the corresponding row and column totals 共marginal totals兲, and N is the total number of observations, then Eij = RiC j / N 共Cohen, 1996兲. Separate 2 tests were conducted for the female and male stimuli to allow a direct comparison with Lee 共2009兲. Both tests showed that the null hypothesis should be reC. Lee and Y. Lee: Perception of musical pitch and Mandarin tones
Author's complimentary copy
4. Data analysis
100
Reaction Time (ms)
experiment. The participants were told that they would be listening to the syllable sa with all four tones produced by 32 female and male speakers. They were also told that the syllables had been digitally processed such that only the very beginning of the syllables was audible. Their task was to identify the tone of each stimulus by pressing the buttons labeled “1,” “2,” “3,” and “4” on the computer keyboard, representing the four Mandarin tones. All participants indicated that they were familiar with the convention of designating Mandarin tones with the numbers. The participants were further told that they had 5 s to respond to each stimulus and that their response would be timed; therefore they should response as quickly as possible.
Female
Male
350
1152 737
165
39
211
458
273
200
864 2
355
305
281
211
228
196
483
243
3
108
163
483
398
132
157
594
269
576
288 4
584
166
1
2
446
200
48
353
391
200
3 4 1 2 Response Tone (Observed)
212
348
3
4
372
270
0
293
302
207
446
200
213
293
302
206
372
270
3
446
200
213
293
302
207
372
270 288
446 1
200 2
213
293
302
206
3 4 1 2 Response Tone (Expected)
372 3
270 4
−150
1
2
3
4
0
Tone 1 Tone 2 Tone 3 Tone 4
Male
250
576
4
−50
350
864 2
50
−350
# Observed−Expected
Stimulus Tone
213
150
−250
1152 1
Female
250
221 # Observed−Expected
Stimulus Tone
1
150 50 −50 −150 −250
jected: female, 2 共9 , N = 4607兲 = 1311.53, p ⬍ 0.0001; male, 2 共9 , N = 4605兲 = 602.11, p ⬍ 0.0001. 共The unequal N resulted from one and three missing responses in the female and male data, respectively.兲 These results indicated that the listeners’ tone identification responses were related to the tone stimuli. That is, the overall accuracy was different from chance level performance. A second set of contingency tables was generated based on specific tones. To evaluate tone 1 identification, for example, both the stimuli and responses were coded as tone 1 or nontone 1. Therefore, all responses could be classified into “hit” 共tone 1 stimulus→ tone 1 response兲, “miss” 共tone 1 stimulus→ nontone 1 response兲, “false alarm” 共nontone 1 stimulus→ tone 1 response兲, and “correct rejection” 共nontone 1 stimulus→ nontone 1 response兲. For the remaining three tones, the same coding procedure was applied to generate similar 2 ⫻ 2 contingency tables. As before, 2 tests were performed to test the null hypothesis that the responses were not related to the stimuli. For the female stimuli, all four 2 tests indicated that the null hypothesis should be rejected: tone 1, 2 共1 , N = 4607兲 = 412.81, p ⬍ 0.0001; tone 2, 2 共1 , N = 4607兲 = 89.37, p ⬍ 0.0001; tone 3, 2 共1 , N = 4607兲 = 561.15, p ⬍ 0.0001; tone 4, 2 共1 , N = 4607兲 = 21.93, p ⬍ 0.0001. For the male stimuli, all but the tone 2 result indicated that the null hypothesis should be rejected: tone 1, 2 共1 , N = 4605兲 = 144.68, p ⬍ 0.0001; tone 2, 2 共1 , N = 4605兲 = 0.83, n.s.; tone 3, 2 共1 , N = 4605兲 = 259.61, p ⬍ 0.0001; tone 4, 2 共1 , N = 4605兲 = 39.04, p ⬍ 0.0001. In sum, these results showed that tone identification performance was above chance level except for the responses to tone 2 stimuli produced by the male speakers. A final set of 2 tests was conducted based on F0 height of the tones: high-onset tones included tones 1 and 4, and low-onset tones included tones 2 and 3. As the acoustic J. Acoust. Soc. Am., Vol. 127, No. 1, January 2010
−350
1
2
3
4
Stimulus Tone
FIG. 4. Difference between the numbers of observed and expected Mandarin tone identification responses as a function of stimulus tone. Greater values indicate more confusing tone pairs, whereas smaller values indicate less confusing tone pairs. See text for details.
analyses in Lee 共2009兲 showed, dynamic F0 information was absent from these brief stimuli, but the F0 height difference remained between the high- and low-onset tones. If the identification data show that the high-low distinction can be detected beyond chance, the ability to estimate F0 height is most likely the basis for the tone identification performance. Indeed, when both the stimuli and responses were coded by F0 height, 2 tests showed that the high-low judgments were not random: For the female stimuli, 2 共1 , N = 4607兲 = 625.1, p ⬍ 0.0001; for the male stimuli, 2 共1 , N = 4605兲 = 258.49, p ⬍ 0.0001. These results indicate that the listeners were capable of judging the high-low distinction above chance. The contingency tables also provided information about the confusion patterns among the four tones. To facilitate the interpretation of the patterns, Fig. 4 shows the difference between the expected and actual response counts as a function of stimulus tone. Each bar represents the result of subtracting the number of expected responses from the number of actual responses in each cell of Fig. 3. Positive or greater values indicate that the response tone is more confusable with the stimulus tone. In contrast, the smaller or more negative a value is, the less confusable the response tone is with the stimulus tone. There are slight differences between responses to the female and male stimuli, but the overall patterns are similar. In particular, tone 1 was least often misidentified as tone 3; tone 2 was most confused with tone 3 and less often identified as tone 1 or 4; tone 3 was rarely misidentified as tone 1; tone 4 was most often misidentified as tone 1 and least misidentified as tone 3. These results are consistent with the ob-
C. Lee and Y. Lee: Perception of musical pitch and Mandarin tones
487
Author's complimentary copy
FIG. 3. Contingency tables showing the number of observed 共top兲 and expected 共bottom兲 Mandarin tone identification responses. The expected frequencies are theoretical frequencies derived from the row and column totals from the observed contingency tables. See text for details.
C. Discussion
The response patterns of the Mandarin-speaking musicians closely resembled those of the Mandarin-speaking nonmusicians reported in Lee 共2009兲. In particular, they were able to identify onset-only, isolated, multispeaker Mandarin tones with accuracy beyond chance. With the exception of male tone 2 stimuli, tone identification responses were contingent on the tone stimuli, indicating the listeners’ ability to distinguish between target and non-target tones. Given the brevity of the stimuli, dynamic F0 information was not available 共Greenberg and Zee, 1979; Lee, 2009兲; therefore F0 height detection was the most likely basis for the identification performance. Indeed, the 2 analyses by F0 height showed that the listeners were able to distinguish between high- and low-onset tones beyond chance. This finding was further corroborated by the confusion pattern that tones with distinct F0 heights tended to be less confusable. In sum, musicians or not, native Mandarin listeners were able to use F0 height for the identification of these brief Mandarin tones. IV. CORRELATION BETWEEN MANDARIN TONE IDENTIFICATION AND MUSICAL NOTE IDENTIFICATION
Since F0 height detection was involved in both musical note identification and Mandarin tone identification, did those musicians with absolute pitch perform better in the Mandarin tone identification task? Correlation analyses were conducted to quantitatively evaluate the relationship between the musical note and Mandarin tone identification performances. Pearson’s correlation coefficients were derived between accuracy of musical note identification 共averaged across all three timbres; exact match required or onesemitone errors allowed兲 and that of Mandarin tone identification 共averaged across all stimuli兲. Fisher’s r to z transformation was carried out to evaluate if the correlation coefficients were significantly different from zero. Two sets of analyses were conducted, one with all musicians 共72 individuals兲 and the other with those who possessed absolute pitch 共the 53 individuals who achieved an average accuracy of 85% or higher in musical note identification兲. The correlation coefficients were generally low and none of them turned out significantly different from zero. In particular, with all musicians included, the correlation coefficients were ⫺0.03 共exact match required兲 and ⫺0.03 共onesemitone errors allowed兲. When only individuals with absolute pitch were included, the coefficients were ⫺0.16 共exact match required兲 and ⫺0.07 共one-semitone errors allowed兲. In other words, we did not find evidence that performance in the musical note identification task was associated with performance in the Mandarin tone identification task. V. GENERAL DISCUSSION
We evaluated Mandarin-speaking musicians’ identification of musical notes without a reference pitch and their 488
J. Acoust. Soc. Am., Vol. 127, No. 1, January 2010
identification of onset-only, isolated, multispeaker Mandarin tone stimuli that required detection of F0 height. The data supported several predictions made earlier. First, absolute pitch was quite prevalent in the Mandarin-speaking musicians, particularly when compared to the English-speaking musicians tested with the same materials and procedure 共Lee and Hung, 2008兲. The percentage of absolute pitch possessors among the Mandarin-speaking musicians was also comparable to the Mandarin-speaking musicians in Deutsch et al. 共2006兲 and the tone very fluent musicians in Deutsch et al. 共2009兲. These results, based on samples of musicians drawn from China, Taiwan, and the United States, consistently suggest that absolute pitch is associated with tone language experience; i.e., absolute pitch is more prevalent in musicians who speak a tone language fluently. As Deutsch et al. 共2009兲 noted, genetic factors are not likely the reason for the contrast because the “tone nonfluent” musicians of East Asian heritage in their study showed comparably low accuracy as the Caucasian musicians did in the absolute pitch task. The country where the musicians received their education is also not likely the cause because the tone very fluent United States musicians displayed comparable level of performance as the musicians did who were educated in China. Second, performance in the absolute pitch task was negatively correlated with the age of onset of musical training. This finding is consistent with Deutsch et al. 共2006, 2009兲 showing greater occurrences of absolute pitch in musicians who started their musical training early. The age of onset effect also supports the idea that the acquisition of absolute pitch is analogous to the acquisition of speech in that a critical period is implicated 共Deutsch et al., 2006兲. As noted earlier, the age of onset of musical training could have contributed to the absence of absolute pitch in the Englishspeaking musicians in Lee and Hung 共2008兲. In addition, Deutsch et al. 共2006兲 showed that absolute pitch was consistently more prevalent in Mandarin-speaking musicians when the age of onset of musical training was controlled, further affirming the contribution of tone language experience to absolute pitch. Third, the Mandarin-speaking musicians were able to identify the onset-only, isolated, multispeaker Mandarin tone stimuli with accuracy beyond chance, just as their nonmusician counterparts did 共Lee, 2009兲. As noted, tone identification from these brief stimuli was quite challenging because dynamic F0 information was neutralized. Consequently, listeners had to rely primarily on F0 height for tone identification. Furthermore, the decision of whether a tone is phonologically high or low had to be made with respect to individual speaking F0 range, which varied across speakers. For example, Lee’s 共2009兲 acoustic analyses showed considerable overlap between the female low-onset tones and the male high-onset tones. To decipher these acoustically similar tones, listeners had to come up with some estimate of speaking F0 range in order to decide whether a tone is phonologically high or low. Since the tone stimuli were presented in isolation and without repetition, there was no external context or familiarity with speaker voices to facilitate F0 range estimation. Nonetheless, both Mandarin-speaking nonmusicians 共Lee, 2009兲 and musicians 共the current study兲 were C. Lee and Y. Lee: Perception of musical pitch and Mandarin tones
Author's complimentary copy
servation that tone stimuli sharing a similar F0 height are more confusable, indicating that the high-low distinction could be detected by the listeners.
J. Acoust. Soc. Am., Vol. 127, No. 1, January 2010
revisiting because the same test materials and procedure were used as in the current study. In particular, they showed that English-speaking musicians were more accurate than nonmusicians in identifying Mandarin tones from intact and silent-center syllables. However, none of the musicians were found to possess absolute pitch, suggesting that their superior performance over nonmusicians did not arise from absolute pitch. Interestingly, the musicians were no more accurate than the nonmusicians in the onset-only tone stimuli, which, as we noted, were deprived of F0 contour and had to be identified based on F0 height. It would be of interest to find out if English-speaking musicians who actually possess absolute pitch will outperform nonmusicians in identifying onset-only Mandarin tones. The finding from the current Mandarin tone experiment replicated Lee 共2009兲 and is consistent with the idea that speakers of a linguistic community can acquire pitch class templates through long-term exposure to the prevalent speaking F0 of the community, and that these templates can influence both speech production and perception 共Deutsch, 1991; Deutsch et al., 2004b, 2009, 1990; Dolson, 1994; Honorof and Whalen, 2005兲. In particular, to judge whether a tone is high or low, a listener will have to consider the speaking F0 range of a speaker because a phonologically low tone for a female speaker could be acoustically equivalent to a phonologically high tone for a male speaker 共Lee, 2009兲. In the absence of cues commonly considered necessary for speaker normalization 共F0 contour, external context, and familiarity with speaker voice through repeated exposure兲, Mandarin speakers were nonetheless able to identify the tones beyond chance. A similar result was obtained by Honorof and Whalen 共2005兲, who showed that English-speaking listeners were able to locate an F0 reliably within a speaker’s F0 range without context or prior exposure to the speaker’s voice. Since F0 height judgments were positively correlated with the female and male F0 range values, Honorof and Whalen 共2005兲 noted that the ability to detect speaker gender could be the basis for the F0 judgment performance. This speculation was supported by Lee’s 共2009兲 acoustic data and correlation analyses. In particular, Lee 共2009兲 showed that F0 covaried with two voice quality measures 共F1 bandwidth and spectral tilt兲 and that Mandarin tone classification based on F0 height also correlated with these acoustic measures. These results suggest that the Mandarin tone identification performance could be due to the listeners’ successful identification of speaker gender as a precursor. That is, listeners could identify speaker gender based on the voice quality measures and then exploit the covariation between voice quality and F0 for tone identification. Once the gender decision is made, pitch class templates that are gender-specific could be invoked to compare to the tone stimuli. The height of the tone stimuli and thus tone identity could then be inferred with the templates as a reference frame. VI. CONCLUSIONS
With the use of pitch for lexical contrasts, tone languages provide a unique opportunity for exploring the
C. Lee and Y. Lee: Perception of musical pitch and Mandarin tones
489
Author's complimentary copy
able to identify the tones of the stimuli with accuracy exceeding chance, indicating their ability to judge F0 height from these rather impoverished tone stimuli. Since F0 height judgment was involved in both absolute pitch and Mandarin tone identification, it seems reasonable to assume that performance in the two tasks would be correlated. That is, musicians who scored higher in the absolute pitch task should be more accurate in the Mandarin tone identification task. However, results from the correlation analyses did not support this prediction. Correlation coefficients, whether obtained with all musicians or just those with absolute pitch, were generally low and did not differ significantly from zero. One possibility for the lack of correlation is that there was not sufficient performance variability in the absolute pitch and Mandarin tone identification measures. In particular, accuracy in the absolute pitch task was generally high and only a small number of individuals did not qualify for absolute pitch. In contrast, Mandarin tone identification accuracy was generally low even though it did exceed chance. Perhaps one or both of these measures were simply not sensitive enough to detect the correlation. However, the nonsignificant correlation could also reflect a true lack of association between absolute pitch and Mandarin tone identification. That is, even though both require associating pitch with a verbal label, distinct processing mechanisms may be implicated. Considering the functional difference between absolute pitch and lexical tone, this seems to be an equally plausible interpretation. In particular, while absolute pitch involves associating a particular pitch with a verbal label, lexical tone involves associating a pitch pattern and other acoustic cues to phonetic categories that bear linguistic significance. In addition to F0, other acoustic parameters such as duration and amplitude also contribute to the perception of lexical tone 共e.g., Whalen and Xu, 1992; Liu and Samuel, 2004兲. In other words, lexical tone perception is not just pitch perception, rather it involves mapping all relevant acoustic information from the output of a vocal tract onto linguistically significant phonetic categories. Furthermore, all normally developing speakers of a tone language will acquire the association between acoustic correlates of lexical tone and tonal categories without explicit instruction. The verbal labels for absolute pitch, however, have to be taught. Anecdotal experience also suggests that many tone language users are not able to identify or produce musical pitch reliably 共e.g., singing兲, yet they have no trouble identifying or producing lexical tones. These observations suggest that linguistic and nonlinguistic pitches may involve distinct processing mechanisms. These observations are also consistent with behavioral and neurophysiological evidence showing that how an acoustic attribute is processed depends on whether it is interpreted linguistically 共e.g., Remez et al., 1981; Gandour et al., 1998兲. However, the above interpretation does not imply that pitch processing in music and lexical tone is completely dissociated. Many studies have shown the advantage of musical training in lexical tone production and perception 共Alexander et al., 2005; Gottfried, 2007; Gottfried and Riester, 2000; Gottfried et al., 2001; Lee and Hung, 2008; Wong et al., 2007兲. Lee and Hung’s 共2008兲 results are particularly worth
ACKNOWLEDGMENTS
We would like to thank Naomi Ogasawara of National Taiwan Normal University for providing sound booths for the perception experiments. We also thank Diana Deutsch and an anonymous reviewer for their helpful comments. This research was supported by professional development funds from Ohio University. Alexander, J., Wong, P. C. M., and Bradlow, A. 共2005兲. “Lexical tone perception in musicians and nonmusicians,” Proceedings of Interspeech’ 2005—Eurospeech–9th European Conference on Speech Communication and Technology, Lisbon, Portugal. Burnham, D., Peretz, I., Stevens, K., Jones, C., Schwanhäusser, B., Tsukada, K., and Bollwerk, S. 共2004兲. “Do tone language speakers have perfect pitch?,” in Proceedings of the 8th International Conference on Music Perception and Cognition, Evanston, IL, edited by S. D. Libscomb, R. Ashley, R. O. Gjerdingen, and P. Webster 共Causal Productions, Adelaide, Australia兲. Cohen, B. H. 共1996兲. Explaining Psychological Statistics 共Brooks/Cole, Pacific Grove, CA兲. Deutsch, D. 共1991兲. “The tritone paradox: An influence of language on music perception,” Music Percept. 8, 335–347. Deutsch, D. 共2002兲. “The puzzle of absolute pitch,” Curr. Dir. Psychol. Sci. 11, 200–204. Deutsch, D. 共2006兲. “The enigma of absolute pitch,” Acoust. Today 2, 11– 19. Deutsch, D., Dooley, K., Henthorn, T., and Head, B. 共2009兲. “Absolute pitch among students in an American music conservatory: Association with tone language fluency,” J. Acoust. Soc. Am. 125, 2398–2403. Deutsch, D., Henthorn, T., and Dolson, M. 共2004a兲. “Absolute pitch, speech, and tone language: Some experiments and a proposed framework,” Music Percept. 21, 339–356. Deutsch, D., Henthorn, T., and Dolson, M. 共2004b兲. “Speech patterns heard early in life influence later perception of the tritone paradox,” Music Percept. 21, 357–372. Deutsch, D., Henthorn, T., Marvin, E., and Xu, H. 共2006兲. “Absolute pitch among American and Chinese conservatory students: Prevalence differences, and evidence for a speech-related critical period,” J. Acoust. Soc. Am. 119, 719–722. Deutsch, D., North, T., and Ray, L. 共1990兲. “The tritone paradox: Correlate with the listener’s vocal range for speech,” Music Percept. 7, 371–384. Dolson, M. 共1994兲. “The pitch of speech as a function of linguistic community,” Music Percept. 11, 321–331. Fox, R. A., and Qi, Y.-Y. 共1990兲. “Context effects in the perception of lexical tones,” J. Chin. Linguist. 18, 261–284. Gandour, J. 共1983兲. “Tone perception in Far Eastern languages,” J. Phonet-
490
J. Acoust. Soc. Am., Vol. 127, No. 1, January 2010
ics 11, 149–175. Gandour, J., and Harshman, R. A. 共1978兲. “Crosslanguage differences in tone perception: A multidimensional scaling investigation,” Lang Speech 22, 1–33. Gandour, J., Wong, D., and Hutchins, G. 共1998兲. “Pitch processing in the human brain is influenced by language experience,” NeuroReport 9, 2115– 2119. Gottfried, T. L. 共2007兲. “Effects of musical training on learning L2 speech contrasts,” in Language Experience in Second Language Speech Learning, edited by O.-S. Bohn and M. J. Munro 共John Benjamins, Amsterdam兲, pp. 221–237. Gottfried, T. L., and Riester, D. 共2000兲. “Relation of pitch glide perception and Mandarin tone identification,” J. Acoust. Soc. Am. 108, 2604. Gottfried, T. L., Staby, A. M., and Ziemer, C. J. 共2001兲. “Musical experience and Mandarin tone discrimination and imitation,” J. Acoust. Soc. Am. 115, 2545. Greenberg, S., and Zee, E. 共1979兲. “On the perception of contour tones,” UCLA Working Papers in Phonetics 45, 150–164. Honorof, D. N., and Whalen, D. H. 共2005兲. “Perception of pitch location within a speaker’s F0 range,” J. Acoust. Soc. Am. 117, 2193–2200. Johnson, K. A. 共2005兲. “Speaker normalization in speech perception,” in The Handbook of Speech Perception, edited by D. B. Pisoni and R. E. Remez 共Blackwell, Malden, MA兲, pp. 363–389. Leather, J. 共1983兲. “Speaker normalization in perception of lexical tone,” J. Phonetics 11, 373–382. Lee, C.-Y. 共2009兲. “Identifying isolated, multispeaker Mandarin tones from brief acoustic input: A perceptual and acoustic study,” J. Acoust. Soc. Am. 125, 1125–1137. Lee, C.-Y., and Hung, T.-H. 共2008兲. “Identification of Mandarin tones by English-speaking musicians and nonmusicians,” J. Acoust. Soc. Am. 124, 3235–3248. Lee, C.-Y., Tao, L., and Bond, Z. S. 共2008兲. “Identification of acoustically modified Mandarin tones by native listeners,” J. Phonetics 36, 537–563. Lee, C.-Y., Tao, L., and Bond, Z. S. 共2009兲. “Speaker variability and context in the identification of fragmented Mandarin tones by native and nonnative listeners,” J. Phonetics 37, 1–15. Lee, C.-Y., Tao, L., and Bond, Z. S. “Identification of acoustically modified Mandarin tones by non-native listeners,” Lang. Speech 53 共to be published兲. Lin, T., and Wang, W. S.-Y. 共1984兲. “Shengdiao ganzhi wenti 共The issue of tone perception兲,” Zhongguo Yuyan Xuebao 共Bull. Chinese Linguistics兲, 2, 59–69. Liu, S., and Samuel, A. G. 共2004兲. “Perception of Mandarin lexical tones when F0 information is neutralized,” Lang Speech 47, 109–138. Lockhead, G. R., and Byrd, R. 共1981兲. “Practically perfect pitch,” J. Acoust. Soc. Am. 70, 387–389. Massaro, D. W., Cohen, M. M., and Tseng, C.-Y. 共1985兲. “The evaluation and integration of pitch height and pitch contour in lexical tone perception in Mandarin Chinese,” J. Chin. Linguist. 13, 267–289. Mertus, J. A. 共2000兲. The Brown Lab Interactive Speech System 共Brown University, Rhode Island兲. Miyazaki, K. 共1989兲. “Absolute pitch identification: Effects of timbre and pitch region,” Music Percept. 7, 1–14. Moore, C. B., and Jongman, A. 共1997兲. “Speaker normalization in the perception of Mandarin Chinese tones,” J. Acoust. Soc. Am. 102, 1864–1877. Patel, A. D. 共2008兲. Music, Language, and the Brain 共Oxford University Press, New York兲. Profita, J., and Bidder, T. G. 共1988兲. “Perfect pitch,” Am. J. Med. Genet. 29, 763–771. Remez, R. E., Rubin, P. E., Pisoni, D. B., and Carrell, T. D. 共1981兲. “Speech perception without traditional speech cues,” Science 212, 947–950. Ward, W. D. 共1999兲. “Absolute pitch,” in The Psychology of Music, 2nd ed., edited by D. Deutsch 共Academic, New York兲, pp. 265–298. Whalen, D. H., and Xu, Y. 共1992兲. “Information for Mandarin tones in the amplitude contour and in brief segments,” Phonetica 49, 25–47. Wong, P. C. M., and Diehl, R. L. 共2003兲. “Perceptual normalization for interand intra-talker variation in Cantonese level tones,” J. Speech Lang. Hear. Res. 46, 413–421. Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., and Kraus, N. 共2007兲. “Musical experience shapes human brainstem encoding of linguistic pitch patterns,” Nat. Neurosci. 10, 420–422. Zhou, N., Zhang, W., Lee, C.-Y., and Xu, L. 共2008兲. “Lexical tone recognition with an artificial neural network,” Ear Hear. 29, 326–335.
C. Lee and Y. Lee: Perception of musical pitch and Mandarin tones
Author's complimentary copy
music-language relationship. The present study contributes to literature by providing further evidence on the relationship between absolute pitch and lexical tone perception. We showed that the majority of the Mandarin-speaking musicians met the criteria for absolute pitch in the musical note identification experiment. The performance in the musical note identification task also correlated negatively with the age of onset of musical training. Like their nonmusician counterparts 共Lee, 2009兲, the musicians were able to identify onset-only, isolated, multispeaker Mandarin tones beyond chance. Although the ability to detect F0 height was implicated in both musical note identification and Mandarin tone identification, performances in the two tasks were not correlated. With results from this series of studies 共Lee, 2009; Lee and Hung, 2008兲, we suggest that speaker gender identification is a precursor to Mandarin tone identification from multispeaker input, and that musical abilities other than absolute pitch contribute to the musicians’ superior performance in Mandarin tone identification.