Temporal Coding of Tonal Harmony in the Auditory Nerve
Mark Jude Tramo,*†‡ Peter A. Cariani,*† Bertrand Delgutte†‡
*Department of Neurology, Harvard Medical School and Massachusetts General Hospital, Edwards Research Building, Room 405, 55 Fruit Street, Boston, Massachusetts 02114-2696, USA
†Eaton-Peabody Laboratory of Auditory Physiology, Department of Otology & Laryngology, Harvard Medical School and Massachusetts Eye and Ear Infirmary, 243 Charles St., Boston, Massachusetts 02114, USA
‡Research Laboratory of Electronics, Department of Electrical Engineering and Computer Science, M.I.T., Cambridge, MA 02139-4307, USA
Word counts: Abstract paragraph (177), Figure captions (753), Body (1084) Figures: (4); References (30); Estimated length: 2.5-3 pages
Correspondence regarding this manuscript should be addressed to Dr. Mark Jude Tramo, Edwards Research Building, Room 405, Massachusetts General Hospital, 55 Fruit Street, Boston, Massachusetts 02114-2696, USA
Telephone: (617) 726-5409 Fax: (617) 726-5457 Email: [email protected]
Tramo, Cariani & Delgutte (2001)
One of the first applications of physical science to the study of perception was Pythagoras' discovery that simultaneous vibrations of two string segments sound harmonious when their lengths form small integer ratios (e.g. 1:2, 2:3, 3:4). Galileo1 postulated that tonal dissonance arises from temporal irregularities in eardrum vibrations that give rise to "ever-discordant impulses". Helmholtz2 observed that 20-130 Hz fluctuations in the acoustic envelope of dissonant tones sounded "unpleasant, jarring, and rough" and hypothesized that these rapid beats are "harsh and annoying to the auditory nerve".
Music theorists and psychophysicists have also emphasized the importance of pitch cues associated with fundamental frequencies of vibration, especially in the bass register (e.g., Rameau's3 basse fondamentale). Modern empirical formulations recognize the contributions of both roughness and pitch cues and distinguish between sensory and cognitive aspects of harmony perception4-7. In the present paper, we show that information about the roughness and pitch of musical intervals is present in the temporal discharge patterns of Type I auditory nerve fibers, which transmit information about sound from the inner ear to the brain.
One hundred and thirty-four fibers having a wide range of characteristic frequencies (CFs), intensity thresholds, and spontaneous discharge rates were sampled in seven Dial-anesthetized cats. We recorded responses evoked by simultaneous tones comprising four musical intervals, each having two different spectral configurations (pure tone versus harmonic complex tone, Fig. 1). These stimuli allowed us to critically evaluate which aspects of neural responses might account for patterns of perceptual judgments observed in twelve studies of consonance and dissonance from Germany, Japan, the Netherlands, and the United States8-18 (Figs. 3b, 4e).
Animal preparation, stimulus delivery, and microelectrode recording techniques are standard for our laboratory 19. Post-stimulus time (PST) histograms of single-fiber responses to a minor second composed of two pure tones, which is dissonant, showed fluctuations in discharge rate that followed the temporal envelope of the acoustic waveform (Figs. 2c,d). The global envelope periodicity, 34.1 ms (Fig. 2c, red bar), equaled the reciprocal of the frequency difference (29 Hz) between the 440 Hz and 469 Hz tones. This periodicity lies within the range of frequencies associated with perception of roughness (20-130 Hz).
Embedded within the global envelope periodicity is another periodicity (2.2 ms, purple bar), which corresponds to the mean frequency (454 Hz) and pitch of the tone combination. Discharge patterns evoked by a perfect fifth, which is consonant, contained pitch-related periodicities, including the pitch of the basse fondamentale, but none in the roughness range (Fig. 2d,e). Whereas temporal discharge patterns evoked by a pure tone interval were similar for all fibers sensitive to the interval's spectral components, those evoked by a complex tone interval differed across fibers with different CFs.
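The two periodicities reported for the pure-tone minor second follow from the identity sin(2πf1t) + sin(2πf2t) = 2 cos(π(f2 - f1)t) sin(π(f1 + f2)t): the envelope repeats at the difference frequency and the fine structure at the mean frequency. The following is a minimal sketch of that arithmetic, not the authors' analysis code; the function and variable names are hypothetical, and the upper tone is assumed to be exactly 16/15 of the 440 Hz root, as in Fig. 1.

```python
ROUGHNESS_RANGE_HZ = (20.0, 130.0)  # range quoted in the text


def two_tone_periodicities(f_low_hz, f_high_hz):
    """Envelope (beat) and fine-structure periods of two summed pure tones."""
    beat_hz = f_high_hz - f_low_hz            # envelope repetition rate
    mean_hz = (f_low_hz + f_high_hz) / 2.0    # carrier (fine-structure) rate
    return {
        "beat_hz": beat_hz,
        "envelope_period_ms": 1000.0 / beat_hz,
        "mean_hz": mean_hz,
        "fine_period_ms": 1000.0 / mean_hz,
        "in_roughness_range": ROUGHNESS_RANGE_HZ[0] <= beat_hz <= ROUGHNESS_RANGE_HZ[1],
    }


# Minor second in just intonation on a 440 Hz root (assumed exact ratio 16/15).
minor_second = two_tone_periodicities(440.0, 440.0 * 16.0 / 15.0)
print(round(minor_second["envelope_period_ms"], 1))  # 34.1
print(round(minor_second["fine_period_ms"], 1))      # 2.2
```

With the unrounded upper frequency (about 469.3 Hz) the beat rate is about 29.3 Hz, which is why the envelope period comes out as 34.1 ms rather than 1/29 s.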
Consistent with Helmholtz's theory, the response periodicity for a complex tone interval equaled the reciprocal of the frequency difference between the two partials closest to the fiber's CF. For example, a minor second composed of two harmonic complex tones, which is dissonant, contains several pairs of adjacent partials with frequency differences in the 20-130 Hz range (Fig. 1). For fibers with low CFs near the first two partials (440, 469 Hz), the global envelope periodicity was 29 Hz (Fig. 2g, red bar); for fibers with CFs near the fifth and sixth partials (1320 and 1408 Hz), it was 88 Hz (Fig. 2h, red bar). For perfect fourths and fifths composed of two complex tones, which are consonant, adjacent partials are spaced relatively far apart (Fig. 1)
and there is overlap between some partials (Fig. 1, green lines). Consequently, few fibers showed response periodicities in the roughness range, regardless of CF. For a tritone composed of two harmonic complex tones, an intermediate number of fibers showed response periodicities in the roughness range, consistent with this stimulus' intermediate number of closely spaced partials and its intermediate perceived consonance (Fig. 1).
If the observed fluctuations in the response envelopes encode roughness, the relative magnitudes of 20-130 Hz fluctuations evoked by different musical intervals should parallel their relative dissonance. We derived a quantitative estimate of physiological roughness by passing PST histograms through a bandpass filter whose response characteristic matched the population modulation transfer function of inferior colliculus (IC) neurons, which respond best to waveform envelope fluctuations in the frequency range associated with roughness (Fig. 3a)20. For both pure tone intervals and complex tone intervals, the relative magnitudes of the filter output summed across all frequency bands (Fig. 3b) correlated inversely with relative consonance (Fig. 3c; r = -0.89 for pure tones, r = -0.99 for complex tones). Similar results were also obtained when population-wide PST histograms were constructed first and then passed through the filter (r = -0.93 for pure tones, r = -0.97 for complex tones).
In view of the importance of pitch to harmony perception7,21-23, we also sought physiological correlates of the pitches associated with each musical interval. Musical instruments and the human vocal apparatus produce notes with harmonically related components that fuse together and evoke a strong sense of pitch. From a Gestalt perspective, Stumpf argued that consonance depends on the extent to which individual notes in an interval or chord fuse into a perceptual whole7,18,24. We derived a physiological measure of pitch fusion based on the salience of the
strongest pitch in each musical interval. We analyzed the distribution of all-order interspike intervals (ISIs) in the entire sample population of auditory nerve fibers (Figs. 4a-d). Dominant ISI patterns in the population distribution have been previously shown to predict the pitches of a wide variety of harmonic and inharmonic stimuli19,25. Consonant intervals composed of two harmonic complex tones produced population ISI distributions with simpler patterns of major and minor peaks (Fig. 4c, d) than did dissonant intervals (Fig. 4a, b). For consonant intervals, the dominant pattern (bars, Fig. 4c, d) corresponded to the implied fundamental bass of the interval. We based our physiological measure of pitch fusion on the strength of the dominant ISI pattern in the virtual pitch range. Our estimates of physiological pitch fusion for pure and complex tone intervals (Fig. 4e) corresponded well to perceived consonance ratings (Fig. 4f; r = 0.99 for pure tones, r = 0.94 for complex tones).
The present findings extend previous evidence for the temporal coding of complex sounds in the auditory nerve26,27 to include representations of musical intervals, the building blocks of harmonic structure in music. At the initial stage of neural input into the brain, the spectral sensitivity range of an auditory nerve fiber determines which acoustic features are represented in its temporal discharge pattern. Within a given frequency channel, partial information about both roughness and pitch is available. Since correspondences between perceptual attributes and discharge patterns are found for the entire ensemble of auditory nerve fibers, information about roughness and pitch may be integrated across frequency channels at a later stage of processing within the central auditory nervous system. Roughness could potentially be represented in more central stations by discharge rate fluctuations in populations of neurons in the inferior colliculus28 and primary auditory cortex29; at present, the central representation of pitch remains obscure.
Our findings point to two candidate codes for tonal consonance and dissonance in the auditory nerve that co-exist on different time scales within the same spike trains. The two codes may subserve different aspects of consonance perception. One takes the form of coarse temporal fluctuations in fiber discharge rates in the 20-130 Hz range. These rate fluctuations may give rise to the perception of roughness, which contributes to dissonance. The second candidate code carries fine-timing information in the interspike interval distribution of the entire auditory nerve fiber population. Regularities in interspike interval patterns may foster the perceptual fusion of salient pitches into a unitary, consonant whole.
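The coarse code can be related back to the stimuli with simple arithmetic on the partials. The following is a minimal sketch, not the authors' analysis: it lists the partials of each complex-tone interval from the 440 Hz root and the just-intonation ratios shown in Fig. 1, then counts adjacent partial pairs whose spacing falls in the 20-130 Hz roughness range. The number of harmonics per tone (six) and all names are assumptions made for illustration.

```python
from fractions import Fraction

ROOT_HZ = 440
N_HARMONICS = 6  # assumption: harmonics per complex tone, not stated in the text
ROUGH_LO_HZ, ROUGH_HI_HZ = 20, 130

RATIOS = {
    "minor second": Fraction(16, 15),
    "perfect fourth": Fraction(4, 3),
    "tritone": Fraction(45, 32),
    "perfect fifth": Fraction(3, 2),
}


def partials(ratio, n_harmonics=N_HARMONICS):
    """Sorted partial frequencies of both notes; coincident partials merge."""
    lower = {ROOT_HZ * k for k in range(1, n_harmonics + 1)}
    upper = {ROOT_HZ * ratio * k for k in range(1, n_harmonics + 1)}
    return sorted(lower | upper)  # exact rational arithmetic, so overlaps dedupe


def rough_pairs(ratio):
    """Adjacent partial pairs spaced within the roughness range."""
    freqs = partials(ratio)
    return [
        (float(a), float(b))
        for a, b in zip(freqs, freqs[1:])
        if ROUGH_LO_HZ <= b - a <= ROUGH_HI_HZ
    ]


counts = {name: len(rough_pairs(ratio)) for name, ratio in RATIOS.items()}
for name, n in counts.items():
    print(f"{name}: {n} adjacent pair(s) in the roughness range")
```

Under these assumptions the minor second yields four such pairs (spacings of about 29, 59, 88 and 117 Hz), the tritone two, and the perfect fourth and fifth none, which matches the ordering of the intervals described in the text.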
REFERENCES
1. Galileo. Dialogues Concerning Two New Sciences (Dover Publications Inc., 1954, New York, 1638).
2. Helmholtz, H. On The Sensations of Tone as a Physiological Basis for the Theory of Music (Dover reprint (1954), New York, 1885).
3. Rameau, J. P. Treatise on Harmony (Dover (1971 reprint), New York, 1722).
4. Terhardt, E. The Concept of Musical Consonance: A Link Between Music and Psychoacoustics. Music Perception 1, 276-295 (1984).
5. Krumhansl, C. L. Cognitive Foundations of Musical Pitch (Oxford University Press, New York, 1990).
6. Deutsch, D. (ed.) The Psychology of Music (Academic Press, San Diego, 1999).
7. Bregman, A. S. Auditory Scene Analysis: The Perceptual Organization of Sound (MIT Press, Cambridge, MA, 1990).
8. Malmberg, C. F. The Perception of Consonance and Dissonance. Psychol. Monogr. 25, 93-133 (1918).
9. Guernsey, M. The role of consonance and dissonance in music. Amer. J. Psychol. 40, 173-204 (1928).
10. Guthrie, E. R. & Morrill, H. The Fusion of Non-Musical Intervals. Amer. J. Psychol. 40, 624-625 (1928).
11. Pratt, C. C. Some qualitative aspects of bitonal complexes. Amer. J. Psychol. 32, 490-515 (1921).
12. Plomp, R. & Levelt, W. J. M. Tonal consonance and critical bandwidth. J. Acoust. Soc. Am. 38, 548-560 (1965).
13. Brues, A. M. The fusion of non-musical intervals. Amer. J. Psychol. 38, 624-638 (1927).
14. Van de Geer, J. P., Levelt, W. J. M. & Plomp, R. The connotation of musical consonance. Acta Psychologica 20, 308-319 (1962).
15. Kameoka, A. & Kuriyagawa, M. Consonance theory I. Consonance of dyads. J. Acoust. Soc. Am. 45, 1451-1459 (1969).
16. Butler, J. W. & Daston, P. G. Musical consonance as musical preference: A cross-cultural study. The Journal of General Psychology 79, 129-142 (1968).
17. Kameoka, A. & Kuriyagawa, M. Consonance theory I. Consonance of dyads. II. Consonance of complex tones and its calculation method. J. Acoust. Soc. Am. 45, 1451-1469 (1969).
18. DeWitt, L. A. & Crowder, R. G. Tonal fusion of consonant musical intervals: The oomph in Stumpf. Perception and Psychophysics 41, 73-84 (1987).
19. Cariani, P. A. & Delgutte, B. Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. J. Neurophysiol. 76, 1698-1716 (1996).
20. Delgutte, B., Hammond, R. M. & Cariani, P. in Psychophysical and Physiological Advances in Hearing (eds. Palmer, A. R., Rees, A., Summerfield, A. Q. & Meddis, R.) 521-527 (Whurr Publishers, London, 1998).
21. Parncutt, R. Harmony: A Psychoacoustical Approach (Springer-Verlag, Berlin, 1989).
22. Hartmann, W. M. in Auditory Function: Neurobiological Bases of Hearing (ed. Edelman, G. M.) 623-347 (John Wiley & Sons, New York, 1988).
23. Moore, B. C. J., Peters, R. W. & Glasberg, B. R. Thresholds for the detection of inharmonicity in complex tones. J. Acoust. Soc. Am. 77, 1985 (1985).
24. Schneider, A. in Music, Gestalt, and Computing (ed. Leman, M.) (Springer, Berlin, 1997).
25. Meddis, R. & Hewitt, M. J. Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I. Pitch identification. J. Acoust. Soc. Am. 89, 2866-2882 (1991).
26. Rose, J. E. in Neural Mechanisms of Behavior (ed. McFadden, D.) 1-33 (Springer Verlag, New York, 1980).
27. Young, E. D. & Sachs, M. B. Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory nerve fibers. J. Acoust. Soc. Am. 66, 1381-1403 (1979).
28. McKinney, M. F., Tramo, M. J. & Delgutte, B. in Physiological and Psychophysical Bases of Auditory Function (ed. Houtsma, A. J. M.) 71-77 (Shaker Publishing, Maastricht, 2001).
29. Fishman, Y. I., Reser, D. H., Arezzo, J. & Steinschneider, M. Complex tone processing in primary auditory cortex of the awake monkey. I. Neural ensemble correlates of roughness. J. Acoust. Soc. Am. 108, 235-246 (2000).
30. Pressnitzer, D., Patterson, R. D. & Krumbholz, K. The lower limit of melodic pitch. J. Acoust. Soc. Am. 109, 2074-2084 (2001).
Acknowledgments
This work was supported by the NIH NIDCD and the McDonnell-Pew Program in Cognitive Neuroscience. We gratefully acknowledge helpful comments from D. Hubel, N.Y.S. Kiang, M. Livingstone, M. McKinney, M.C. Liberman, M.C. Brown, and R.C. Reid, and technical assistance from L. Liberman, B. Kiang, J. Flibotte, and A. Kridler.
FIGURE LEGENDS
Figure 1 Stimuli. Four musical (harmonic) intervals, their names, frequency ratios (scale of just intonation), musical notation (G clef), and corresponding line magnitude spectra (all components 60 dB SPL, cosine phase). The fundamental frequency of the lowest note (root) in all intervals was 440 Hz (A4). Colours indicate harmonics of the 440 Hz root (blue), harmonics of the upper note (yellow), and their overlapping partials (green). Stimuli were digitally synthesized (16 bits, 100,000 samples/sec) and delivered to the ipsilateral eardrum using a calibrated, closed acoustic assembly (100 repetitions).
Figure 2 Responses of two auditory nerve fibers with different CFs (rows) to different musical intervals (columns). a, b, Frequency-intensity response area (gray). Coloured squares indicate frequency components present in the complex minor second stimulus, as shown in Fig. 1a. c-h, Post-stimulus time (PST) histograms of fiber responses during the steady-state portion of the response (20-120 ms). i-k, Stimulus waveforms for the pure tone minor second, pure tone fifth, and complex tone minor second. c, Response of the low-CF fiber (CF=350 Hz) to a minor second composed of two pure tones. Red bars, global envelope periodicity. Purple bars, local fine structure periodicity. d, Response of a higher-CF fiber (CF=1100 Hz) to a pure tone minor second. Temporal features of discharge are similar to those of the low-CF fiber in c. The period of the response envelope evoked by a pure tone minor second was constant across a wide range of fiber CFs, provided that the two tones fell within the fiber's response area. e, Response of the low-CF fiber to the pure tone fifth. f, Response of the higher-CF fiber to the pure tone fifth. g, Response of the low-CF fiber to the complex tone minor second. h, Response of the higher-CF fiber to the complex tone minor second.
Figure 3 Quantitative analysis of physiological roughness. a, left-to-right, Complex tone minor second stimulus; PST neurogram showing the responses of 95 fibers grouped into 8 CF bands; band-pass filtering (BPF) using the average modulation transfer function of central nucleus neurons in the inferior colliculus (IC)20, which reflects envelope processing of the auditory pathway up through the IC; the root-mean-square (RMS) amplitude of each filter output weighted (W) according to the human CF distribution and summed (Σ) across all fibers. Variances of physiological roughness measurements were assessed by random selection with replacement of 800 response spike trains (100 from each CF band) to form 100 trial PST ensembles. b, Bar graphs showing total physiological roughness for each musical interval (error bars = 1σ). c, Summary of consonance ratings derived from 12 studies (7 used pure tone intervals8-14,17, 5 complex tone intervals8,9,16-18). Magnitude estimates from each study were normalized by subtracting the mean consonance rating of the four intervals in the study and dividing by their standard deviation5. Error bars indicate the standard deviation across studies.
Figure 4 Quantitative analysis of physiological pitch fusion.
a-d, Interspike interval (ISI) histograms of responses of all fibers to musical intervals composed of two harmonic complex tones. All-order ISI distributions (autocorrelation histograms, binwidth=0.1 ms) from each fiber response were weighted by CF band in proportion to the human CF distribution, summed together to form pooled ISI distributions, and normalized by their means19. ISIs were weighted (w) according to their duration (d) to take into account the decline in pitch strength for lower periodicities30 (w = 1 - d/33). Pitch strength was then estimated by computing the ratio of the mean number of ISIs in bins associated with a given pitch to the mean number of ISIs per bin in the entire distribution. For example, ISI bins related to a 220 Hz pitch (bars above histogram d) would include those within 0.1 ms of the pitch period (4.5 ms) and its integer
multiples. ISI sieves for dominant periodicities are shown above each histogram [a: the mean frequency of the two notes, 454 Hz (between A4 and B4-flat), b: the tritone’s pseudo-fundamental at 88 Hz (~F2), c: the 4th’s fundamental bass at 147 Hz (D3), d: the 5th’s fundamental bass at 220 Hz (A3)]. The physiological salience of each pitch was computed for all periodicities between 30-800 Hz (1 Hz steps). The maximum salience was used as the index of physiological pitch fusion. Variances of physiological pitch fusion measurements were assessed by random selection with replacement of 800 response spike trains (100 from each CF band) to form 100 trial PST ensembles. e, Physiological pitch fusion for each musical interval (error bars = 1σ). f, Summary of perceived consonance ratings, as in Fig. 3c.
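The pitch-strength estimate described in the Figure 4 legend can be illustrated as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it takes d in the weight w = 1 - d/33 to be in milliseconds, truncates the histogram at 33 ms, treats a bin as belonging to a sieve when its centre lies within 0.1 ms of a multiple of the pitch period, and omits the CF-band weighting and pooling across fibers. The spike train is synthetic (perfectly periodic at 220 Hz) and all names are hypothetical.

```python
BIN_MS = 0.1        # histogram bin width given in the legend
MAX_ISI_MS = 33.0   # assumed cutoff: the weight 1 - d/33 reaches zero here
N_BINS = int(round(MAX_ISI_MS / BIN_MS))


def weighted_isi_histogram(spike_times_ms):
    """All-order ISI histogram; each interval d is weighted by 1 - d/33."""
    hist = [0.0] * N_BINS
    n = len(spike_times_ms)
    for i in range(n):
        for j in range(i + 1, n):
            d = spike_times_ms[j] - spike_times_ms[i]
            if d >= MAX_ISI_MS:
                break  # spike times are assumed sorted in ascending order
            b = min(int(d / BIN_MS), N_BINS - 1)
            hist[b] += 1.0 - d / MAX_ISI_MS
    return hist


def sieve_bins(pitch_hz, tol_ms=0.1):
    """Bins whose centres lie within tol_ms of a multiple of the pitch period."""
    period_ms = 1000.0 / pitch_hz
    bins = set()
    m = 1
    while m * period_ms < MAX_ISI_MS:
        target = m * period_ms
        centre_bin = int(target / BIN_MS)
        for b in (centre_bin - 1, centre_bin, centre_bin + 1):
            if 0 <= b < N_BINS and abs((b + 0.5) * BIN_MS - target) <= tol_ms:
                bins.add(b)
        m += 1
    return bins


def salience(hist, pitch_hz):
    """Mean weighted count in the sieve bins over the mean across all bins."""
    bins = sieve_bins(pitch_hz)
    overall = sum(hist) / len(hist)
    if not bins or overall == 0.0:
        return 0.0
    return (sum(hist[b] for b in bins) / len(bins)) / overall


def pitch_fusion_index(hist, lo_hz=30, hi_hz=800):
    """Maximum salience over candidate pitches in 1 Hz steps, with its pitch."""
    return max((salience(hist, float(f)), f) for f in range(lo_hz, hi_hz + 1))


# Synthetic, perfectly periodic spike train at 220 Hz (not recorded data).
toy_spikes = [k * 1000.0 / 220.0 for k in range(200)]
hist = weighted_isi_histogram(toy_spikes)
s220 = salience(hist, 220.0)
s300 = salience(hist, 300.0)
index, best_hz = pitch_fusion_index(hist)
```

Because salience is a ratio of means, rescaling the histogram (for example, normalizing it by its mean as in the legend) leaves the value unchanged. For the toy train, the 220 Hz sieve captures every histogram peak while the 300 Hz sieve misses them, so `s220` is well above 1 and exceeds `s300`.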
[Figure 1: Musical interval stimuli. Line magnitude spectra (0-60 dB SPL versus frequency, Hz) for two pure tones and for two complex tones: minor second (16/15; 440 and 469 Hz), perfect 4th (4/3; 440 and 587 Hz), tritone (45/32; 440 and 619 Hz), perfect 5th (3/2; 440 and 660 Hz).]
[Figure 2: Responses of single auditory nerve fibers. a, b, Pure tone tuning curves (sound pressure, dB SPL, versus frequency, Hz) for fibers with CF = 350 Hz and CF = 1100 Hz. c-h, PST histograms (discharge rate, sp/s, versus post-stimulus time, ms) for the pure tone minor second, pure tone perfect fifth, and complex tone minor second. i-k, Stimulus waveforms.]
[Figure 3: a, Physiological roughness estimation for the complex tone minor 2nd: stimulus frequencies, PST neurogram by CF band (Hz, versus post-stimulus time, ms), band-pass filter (BPF; magnitude, dB, versus frequency, Hz), RMS, and total roughness. b, Physiological roughness (total roughness) by musical interval (m2, 4, Tri, 5) for pure tones and complex tones. c, Perceived consonance (normalized consonance rating) by musical interval for pure tones and complex tones.]
[Figure 4: Physiological pitch fusion. a-d, Pooled ISI histograms (normalized interval density versus interspike interval, ms) for the complex minor second, complex tritone, complex fourth, and complex fifth. e, Physiological fusion (max. salience) by musical interval (m2, 4, Tri, 5) for pure tones and complex tones. f, Perceived consonance (normalized consonance rating) by musical interval for pure tones and complex tones.]