
Bispectrum of Musical Sounds: An Auditory Perspective

Shlomo Dubnov and Naftali Tishby, Institute for Computer Science and Center for Neural Computation
Dalia Cohen, Department of Musicology
Hebrew University, Jerusalem 91904, Israel
E-mail: [email protected]

1 Introduction

Research into musical timbre has recently become one of the major research topics, central to the understanding of music and of musical practice itself. As more and more musicians and scientists become involved, contributions that consider the artistic, psychophysical, cognitive and physical/mathematical properties of musical signals shed new light on this fascinating subject. In our research we chose to treat the issue mainly from the mathematical point of view, regarding the acoustic signal as a stochastic process and suggesting new aspects for its treatment. In contrast to other musical properties such as pitch, interval and meter, which we clearly perceive and for which we seek a physical characterization, the timbral parameter has neither a simple perceptual characterization nor obvious physical properties. For these reasons, timbre research sometimes takes a reverse research methodology: starting with a physical/mathematical characterization of the acoustic signal, we seek its perceptual meaning.

Stochastic processes, such as acoustic signals, are in general characterized by an infinite series of correlation functions. An important subset of these processes, known as Gaussian, is completely determined by the autocorrelation or, equivalently, by the power spectrum. Most acoustic signal processing so far has been based on power-spectral properties. This is mainly because linear systems are fully determined by their effect on the spectrum, linear systems are sufficient to describe most acoustic phenomena, and they are of course easier to understand. Yet musical instruments have highly nonlinear characteristics which affect their tone, timbre, and sound quality. In this paper we suggest the use of higher-order statistics (polyspectra) [1],[2] for the analysis and evaluation of acoustic signals and instruments. In recent years bispectral methods have been applied in various signal processing fields, such as sonar, radar, image processing, adaptive filtering, etc. [3],[4]. Surprisingly, polyspectra have hardly been used so far in auditory and acoustic signal processing, primarily due to the difficulties in their estimation and analysis.

These higher-order correlations, known as cumulants, and their associated Fourier transforms, known as polyspectra, not only reveal all the amplitude information of the process, but also retain the phase information. Since for Gaussian processes all cumulants of order higher than two vanish, the third- and fourth-order polyspectra provide an indication of the non-Gaussian nature of a random process. These mathematical facts have interesting parallels in the acoustic realizations of signals, on which we focus in this paper. Polyspectra are the natural mathematical generalization of the power spectrum and as such provide the next step in acoustic research. The bispectrum contains more information about the signal than the power spectrum, particularly about its phase. For skewed signals, for instance, which have a non-zero bispectrum, the signal can be reconstructed from its bispectrum up to a constant time shift, much more than can be achieved from the power spectrum alone. In addition, it is easy to analyze and manipulate the bispectral characteristics of signals in linear systems and in quadratic non-linear systems. From the physical point of view, the bispectral parameters correspond, in some models, to specific mechanisms, such as the characteristics of reverberant environments or the non-Gaussian nature of a source signal passing through a resonator. This correspondence gives us insight into the modeling of particular systems and suggests new techniques for signal manipulation and synthesis.

Finally, the acoustic perception of the bispectrum is one of the main issues of this study. We present several ideas and experiments concerning aspects of acoustic perception, sound quality, and the nature of the human auditory apparatus that are directly related to bispectra. The following issues are discussed in this paper:

Polyspectral criteria for "quality" design of musical instruments.

Effects of reverberation and chorusing on the perception of tone color. Within this framework an artificial all-pass reverberator is demonstrated.

Tone separation by means of bispectral detection: questions of timbral fusion/segregation are believed to be influenced by the presence of strong bispectral ingredients. The ear, though almost "blind" to phase, is sensitive to long-term phase behavior, and this phase coherence is clearly detected by bispectral estimators.

2 Mathematical Preliminaries

In this section, multiple correlations and cumulants of time signals are defined. To simplify matters, the discussion focuses on discrete signals. Some of the important relationships and properties of finite impulse response (FIR) signals and linear, time-invariant systems are briefly reviewed. The results for general non-linear systems are more complex and require the use of Volterra-Wiener system theory [6], which is beyond the scope of this presentation.

2.1 Multiple Correlations and Cumulants

Extracting information from a signal is a basic question in every branch of science. In many physical settings our knowledge of the signal is incomplete, owing to the nature of the observed signal or the type of measurement device. In information processing we encounter the inverse problem: given the signal, we want to extract information from it in order to perform basic tasks such as detection and classification. We presume that any biological information processing system acts in a similar manner. For instance, our ears analyze the acoustic signal by extracting pitch and timbre information from it. To understand our motivation for studying higher-order correlations, it is worthwhile to briefly recapitulate some of the reasons for using the ordinary double correlation. A customary assumption is that our ears perform spectral analysis of the incoming signal. Naturally, not all of the signal information is retained by our ears, and the simplest assumption is that the phase is neglected. It is well known that the squared magnitude of the Fourier spectrum equals the Fourier transform of the signal's autocorrelation. This double correlation in the time domain is the basic type of information extracted from the signal by our ears; in the frequency domain it corresponds to the signal's spectral envelope. We now intend to widen the scope of acoustical analysis by suggesting the use of triple, quadruple and higher correlations, which in the frequency domain are known as polyspectra. The kth-order correlation $h_k(i_1,\ldots,i_{k-1})$ of a signal $\{h(i)\}_{i=0}^{N-1}$ is defined as

$$h_k(i_1,\ldots,i_{k-1}) = \sum_{i=0}^{N-1} h(i)\,h(i+i_1)\cdots h(i+i_{k-1}) \tag{1}$$

and in the frequency domain it corresponds to the kth-order spectrum

$$H_k(\omega_1,\ldots,\omega_{k-1}) = \sum_{i_1,\ldots,i_{k-1}=-(N-1)}^{N-1} h_k(i_1,\ldots,i_{k-1})\,e^{-j(\omega_1 i_1+\cdots+\omega_{k-1} i_{k-1})} = H(\omega_1)\cdots H(\omega_{k-1})\,H(-\omega_1-\cdots-\omega_{k-1}) \tag{2}$$

where $H(\omega)$ denotes the Fourier transform of $h(i)$. Under some common assumptions (stationarity and ergodicity), the suitably normalized time-domain correlation converges to the kth-order moment of the process. The kth-order cumulant is derived from the kth- and lower-order moments, and the cumulants up to order k carry the same information about the process as the moments up to order k. We prefer to use cumulants in our definition of spectra since for Gaussian processes all cumulants of order higher than two vanish. For zero-mean sequences, the second- and third-order moments and cumulants coincide. Thus we arrive at an equivalent definition of the kth-order spectrum as the (k-1)-D Fourier transform of the respective kth-order cumulant of the process. Let $y(i)$ be the output of an FIR system $h(i)$ excited by an input $x(i)$, i.e.

$$y(i) = \sum_{j=0}^{N-1} h(j)\,x(i-j) \tag{3}$$

Using the definition (1), it is easy to show that

$$y_k(i_1,\ldots,i_{k-1}) = \sum_{j_1,\ldots,j_{k-1}=-(N-1)}^{N-1} h_k(j_1,\ldots,j_{k-1})\,x_k(i_1-j_1,\ldots,i_{k-1}-j_{k-1}) \tag{4}$$

where $y_k$, $h_k$, $x_k$ are defined as in (1). Further, employing (1) and (2), we arrive at the frequency-domain relation

$$Y_k(\omega_1,\ldots,\omega_{k-1}) = H_k(\omega_1,\ldots,\omega_{k-1})\,X_k(\omega_1,\ldots,\omega_{k-1}) \tag{5}$$

An important property of polyspectra (defined through cumulants) is that if we are given two signals $f$ and $g$ that originate from stochastically independent processes, and their sum signal $z = f + g$, then

$$Z_k(\omega_1,\ldots,\omega_{k-1}) = F_k(\omega_1,\ldots,\omega_{k-1}) + G_k(\omega_1,\ldots,\omega_{k-1}) \tag{6}$$

This property is important when considering the perception of simultaneously sounding independent signals, as will be discussed later.
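Equations (1), (2) and (5) can be checked numerically on short deterministic sequences. The following Python sketch is only an illustration: the function names and the test sequences are our own choices, not taken from the paper. It computes the kth-order correlation of Eq. (1) by brute force, takes its (k-1)-D Fourier transform at one frequency point, and compares the result with the product form of Eq. (2). It then filters x through h as in Eq. (3) and checks the product relation of Eq. (5) for k = 3. For k = 2 the same code reduces to the familiar identity that the transform of the autocorrelation is $|H(\omega)|^2$.

```python
import cmath
import itertools


def kth_correlation(h, k):
    """Eq. (1): h_k(i1, ..., i_{k-1}) = sum_i h(i) h(i + i1) ... h(i + i_{k-1}).

    h is treated as zero outside 0..N-1, so every lag lies in [-(N-1), N-1].
    Returns a dict mapping each lag tuple to its correlation value."""
    n = len(h)

    def at(i):
        return h[i] if 0 <= i < n else 0.0

    corr = {}
    for lags in itertools.product(range(-(n - 1), n), repeat=k - 1):
        total = 0.0
        for i in range(n):
            term = h[i]
            for lag in lags:
                term *= at(i + lag)
            total += term
        corr[lags] = total
    return corr


def polyspectrum_from_correlation(corr, omegas):
    """Left-hand side of Eq. (2): (k-1)-D Fourier transform of h_k at (w1, ..., w_{k-1})."""
    return sum(c * cmath.exp(-1j * sum(w * l for w, l in zip(omegas, lags)))
               for lags, c in corr.items())


def dtft(h, w):
    """H(w) = sum_i h(i) exp(-j w i)."""
    return sum(v * cmath.exp(-1j * w * i) for i, v in enumerate(h))


def polyspectrum_from_spectrum(h, omegas):
    """Right-hand side of Eq. (2): H(w1) ... H(w_{k-1}) H(-w1 - ... - w_{k-1})."""
    prod = dtft(h, -sum(omegas))
    for w in omegas:
        prod *= dtft(h, w)
    return prod


def fir_output(h, x):
    """Eq. (3), evaluated over the full support of y = h * x."""
    y = [0.0] * (len(h) + len(x) - 1)
    for j, hj in enumerate(h):
        for i, xi in enumerate(x):
            y[i + j] += hj * xi
    return y


h = [1.0, -0.5, 0.25, 0.8]      # illustrative FIR impulse response
x = [0.3, 1.0, -0.7]            # illustrative input
w = (0.7, 1.9)                  # one bifrequency point (w1, w2), in rad/sample
H3 = polyspectrum_from_correlation(kth_correlation(h, 3), w)
X3 = polyspectrum_from_correlation(kth_correlation(x, 3), w)
Y3 = polyspectrum_from_correlation(kth_correlation(fir_output(h, x), 3), w)
print(abs(H3 - polyspectrum_from_spectrum(h, w)))   # Eq. (2)
print(abs(Y3 - H3 * X3))                            # Eq. (5) with k = 3
```

Both printed differences should be at floating-point round-off level. The brute-force correlation grows roughly as N^k operations, so it is meant only for checking the identities on tiny examples.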

3 Sound Quality of Musical Instruments

The first attempts to use bispectral considerations for characterizing sound quality can be traced back to Gerzon [5]. For a musical tone, power spectrum analysis shows how the fundamental frequency and its higher harmonics combine to form the timbre of the tone. However, the power spectrum, being "phase-blind", cannot reveal the relative phases between the sound components. Although the human ear is almost deaf to static phase differences, it can perceive time-varying phase differences. The bispectral analyzer is the generalization of the power spectrum to the third-order statistics of the signal. The bispectrum reveals both the mutual amplitude and the phase relation between the frequency components $\omega_1$, $\omega_2$ and their sum $\omega_1+\omega_2$. If sound sources are stochastically independent, their combined bispectrum is the sum of their separate bispectra (Eq. (6)). For a bispectral analyzer to recognize the characteristic signature of the sound in the bispectral plane, the excitation at a given ($\omega_1$, $\omega_2$) should be distinguishable from the background noise. Thus, a "good" instrument is supposed to produce the maximum possible bispectral excitation for a given signal energy. Stating the problem as "can we predict the properties of a Stradivarius?", Gerzon claimed that the design requirement for musical instruments is that "they should have a third formant frequency region containing the sum of the first two formant frequencies". Surprisingly enough, this theoretical criterion seems to be satisfied by many orchestral instruments, for example particular cases of the Stradivarius violin (435 Hz, 535 Hz, 930 Hz), the contrabassoon (245 Hz, 435 Hz, 690 Hz) and the cor anglais (985 Hz, 2160 Hz, 3480 Hz). In a later work, Lohmann and Wirnitzer [11] analyzed two flutes by calculating their bispectra. Their results demonstrate that a higher intensity of the phase of the complex bispectrum is obtained for the flute of good quality. This also suggests that the intelligibility of speech could be assessed by looking at its bispectral signature, and might even be enhanced by adding an artificial third formant at the sum of the momentary two lowest formant frequencies. Such a device can easily be constructed by means of a quadratic filter or another non-linear speech clipping system. One must note that such a simple device will also modify the spectrum, which might be undesirable.
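Gerzon's suggestion of generating a third formant at the sum frequency with a quadratic device can be illustrated with a memoryless quadratic nonlinearity, y = x + a x². This is our own simplest stand-in for the "quadratic filter" mentioned above; the function names, the coefficient a = 0.3, the frame length and the bin choices are all assumptions. Two cosines at DFT bins k1 and k2 pass through the device. The output acquires a component at bin k1 + k2 whose phase equals φ1 + φ2, which is exactly the quadratic phase coupling a bispectral analyzer responds to. It also acquires extra components (at 2k1, 2k2, k2 - k1 and DC) that alter the power spectrum.

```python
import cmath
import math


def dft_bin(x, k):
    """Single DFT bin X[k] = sum_n x[n] exp(-2*pi*j*k*n/L)."""
    L = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / L) for n, v in enumerate(x))


def quadratic_device(x, a):
    """Hypothetical memoryless quadratic nonlinearity y = x + a*x**2."""
    return [v + a * v * v for v in x]


L, k1, k2 = 64, 5, 8        # frame length and the two 'formant' bins (assumed)
phi1, phi2 = 0.4, 1.1       # arbitrary phases of the two input components
a = 0.3                     # assumed strength of the quadratic term
x = [math.cos(2 * math.pi * k1 * n / L + phi1) + math.cos(2 * math.pi * k2 * n / L + phi2)
     for n in range(L)]
y = quadratic_device(x, a)

X_sum = dft_bin(x, k1 + k2)   # input: nothing at the sum frequency
Y_sum = dft_bin(y, k1 + k2)   # output: new component at k1 + k2, phase phi1 + phi2
print(abs(X_sum), abs(Y_sum), cmath.phase(Y_sum))
```

Because the bins are integers and none of the generated components lands on bin k1 + k2, that bin of the output equals (a L / 2) e^{j(φ1 + φ2)}. The nonzero bin at 2k1 shows the unwanted spectral side effect noted in the text.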

4 Effects of Reverberation and Chorusing

Other, more subtle problems of intelligibility can be considered by looking at the effects of reverberation and chorusing. This is an important musical issue; quoting Erickson [9], "there is nothing new about multiplicity and the choric effect. What is new is the radical extension of the massing idea in contemporary music, and the range of its musical applications; but a great deal more needs to be known before the choric effect is fully understood or adequately synthesized". As mentioned previously, if the sounds are stochastically independent, then their bispectra will simply be the sum of the separate bispectra. Assume a sound source with energies $S_1, S_2, S_3$ at frequencies $\omega_1, \omega_2, \omega_3 = \omega_1 + \omega_2$ and bispectrum level $B$ at $(\omega_1, \omega_2)$, subject to a reverberation effect. Let us assume that this effect can be modeled as a linear filter acting as a reverberator, whose output is added to the direct sound. Suppose that the only effect of the reverberation is to produce proportionate spectral energies $kS_1, kS_2, kS_3$ at $\omega_1, \omega_2, \omega_3$. A plausible model for the linear filter describing the reverberator part alone approximates its impulse response by a long sample of a random Gaussian process. Since the third-order cumulants of a Gaussian process vanish, the bispectral response of such a filter is zero, and by Eq. (5) the bispectrum of the reverberant output is zero as well. The total resultant signal contains a (stochastically independent) mixture of the direct and the reverberant sound. The spectral energies of the combined sound at $\omega_1, \omega_2, \omega_3$ will be $(1+k)S_1, (1+k)S_2, (1+k)S_3$, while the bispectrum level at $(\omega_1, \omega_2)$ remains $B$. Naturally, the proportion of bispectral energy to spectral energy in the signal deteriorates. For a signal with complex spectrum $I(\omega)$, the power spectrum equals $S(\omega) = |I(\omega)|^2$ and the bispectrum is $B(\omega_1,\omega_2) = I(\omega_1)\,I(\omega_2)\,I(-\omega_1-\omega_2)$. Taking a bicoherence index

$$b(\omega_1,\omega_2) = \frac{B(\omega_1,\omega_2)}{\left[S(\omega_1)\,S(\omega_2)\,S(-\omega_1-\omega_2)\right]^{1/2}}$$

we arrive at a dimensionless measure of the proportionate energy between the spectrum and the bispectrum of a signal. If $b = b_{in}$ for the original signal, then after reverberation $b_{out} = (1+k)^{-3/2}\,b_{in}$. Thus, for a reverberation energy gain $k$, the relative bispectral level is reduced by a factor $(1+k)^{-3/2}$ [5].

Now consider the very similar effect of chorusing. For $N$ identical but stochastically independent sound sources, the resulting spectral energies at $\omega_1, \omega_2, \omega_3 = \omega_1 + \omega_2$ are $NS_1, NS_2, NS_3$, and the resulting bispectrum is $NB$ at $(\omega_1, \omega_2)$. Comparing the bicoherence indexes again, we arrive at $b_{out} = N^{-1/2}\,b_{in}$, i.e. a relative attenuation of $N^{-1/2}$ due to this chorus effect. It is worth stressing once again the importance of stochastic independence. The chorusing described above should not be confused with a simple multiplication of the original signal energy by a gain factor $N$. Such amplified copies are not stochastically independent: the amplitude is multiplied by $N^{1/2}$, so the resulting bispectrum is augmented by $N^{3/2}$ instead of $N$ and the bicoherence is unchanged. Only a true lack of coherence between the replicated signals causes the resulting bispectrum to be $NB$.
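The scaling arguments above reduce to a few lines of arithmetic. The sketch below uses arbitrary illustrative levels for S1, S2, S3 and B (they are not measurements) and applies the three cases discussed: reverberation with energy gain k, chorusing with N independent voices, and a coherent gain of N in energy.

```python
import math


def bicoherence(B, S1, S2, S3):
    """b = B / (S(w1) S(w2) S(w1 + w2))**0.5, where S3 is the power at w1 + w2."""
    return B / math.sqrt(S1 * S2 * S3)


S1, S2, S3, B = 2.0, 1.5, 0.8, 1.1      # arbitrary illustrative levels, not data
b_in = bicoherence(B, S1, S2, S3)

# Reverberation with energy gain k: spectra scale by (1 + k), bispectrum stays B
k = 3.0
b_reverb = bicoherence(B, (1 + k) * S1, (1 + k) * S2, (1 + k) * S3)

# N independent chorus voices: spectra and bispectrum both scale by N
N = 4
b_chorus = bicoherence(N * B, N * S1, N * S2, N * S3)

# Coherent gain of N in energy (N**0.5 in amplitude): bispectrum scales by N**1.5
b_gain = bicoherence(N ** 1.5 * B, N * S1, N * S2, N * S3)

print(b_reverb / b_in, (1 + k) ** -1.5)
print(b_chorus / b_in, N ** -0.5)
print(b_gain / b_in)
```

With k = 3 the reverberant ratio is 4^(-3/2) = 1/8, with N = 4 the chorus ratio is 1/2, and the coherent gain leaves the bicoherence unchanged, whatever levels are chosen for S1, S2, S3 and B.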

4.1 Experimental results

To demonstrate the above effects, we analyzed sampled signals of a solo instrument (Solo Viola) and of an orchestral section of the same instruments (Arco Violas). (The signals were taken from a sample-player synthesizer and are believed to be true recordings of these instruments.) The two signals have very similar spectral characteristics, and the "chorusing" feature, dominant in the "Arco Violas" signal, cannot be extracted from the spectral information alone. It does, however, manifest itself in the signal's bispectral content. We plotted the amplitude of the bicoherence index for each of the two signals. As Fig. 1 clearly shows, the bispectral amplitude of the "Arco Violas" signal is significantly reduced. Note also that the bispectral excitation patterns of the two signals differ: the "Solo Viola" signal has a few clear peaks, while the "Arco Violas" signal has a much more spread-out, noise-like pattern.

Figure 1: Bicoherence index amplitude of the Solo Viola and Arco Violas signals (one panel each). The x-y axes are normalized to the Nyquist frequency (8 kHz).
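The chorus attenuation $N^{-1/2}$ can be reproduced on synthetic data. The sketch below is a hedged illustration and is not the analysis behind Fig. 1. It assumes a frame length of 32 samples, partials at integer DFT bins 3, 5 and 8 (so that 8 = 3 + 5), rectangular windows, and independent random phases in every frame; the function names are hypothetical. Each "voice" has the phase of its sum partial locked to the sum of the other two phases. The bispectrum and the power spectra are averaged over frames before the bicoherence index is formed.

```python
import cmath
import math
import random

L, K1, K2 = 32, 3, 5            # assumed frame length and partial bins (K1 + K2 = 8)
BINS = (K1, K2, K1 + K2)
TWIDDLE = {k: [cmath.exp(-2j * math.pi * k * n / L) for n in range(L)] for k in BINS}


def chorus_frame(rng, n_voices):
    """One frame: the sum of n_voices independent voices, each with partials at
    K1, K2, K1 + K2 whose phases are quadratically coupled (p3 = p1 + p2)."""
    frame = [0.0] * L
    for _ in range(n_voices):
        p1 = rng.uniform(0.0, 2 * math.pi)
        p2 = rng.uniform(0.0, 2 * math.pi)
        for k, p in zip(BINS, (p1, p2, p1 + p2)):
            for n in range(L):
                frame[n] += math.cos(2 * math.pi * k * n / L + p)
    return frame


def bicoherence_estimate(frames):
    """|<X1 X2 X3*>| / (<|X1|^2> <|X2|^2> <|X3|^2>)**0.5, with <.> a frame average.
    For a real signal X3* = X(-w1 - w2), matching B = I(w1) I(w2) I(-w1 - w2)."""
    acc_b = 0j
    acc_s = dict.fromkeys(BINS, 0.0)
    for frame in frames:
        X = {k: sum(v * t for v, t in zip(frame, TWIDDLE[k])) for k in BINS}
        acc_b += X[K1] * X[K2] * X[K1 + K2].conjugate()
        for k in BINS:
            acc_s[k] += abs(X[k]) ** 2
    m = len(frames)
    s1, s2, s3 = (acc_s[k] / m for k in BINS)
    return abs(acc_b / m) / math.sqrt(s1 * s2 * s3)


rng = random.Random(0)
b_solo = bicoherence_estimate([chorus_frame(rng, 1) for _ in range(200)])
b_chorus = bicoherence_estimate([chorus_frame(rng, 4) for _ in range(2000)])
print(b_solo, b_chorus)   # expected near 1 and near 4 ** -0.5 = 0.5
```

For a single voice every frame contributes the same phase-locked product, so the estimate equals 1 up to round-off. The four-voice estimate fluctuates around 0.5 with a spread that shrinks as more frames are averaged. Scaling a single voice by a constant gain, by contrast, leaves the estimate at 1, in line with the remark on stochastic independence above.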

4.2 Artificial all-pass filter

As seen from Eq. (5), the bispectrum of the output signal y(i), obtained by passing a signal x(i) through a linear filter h(i), equals the product of their respective bispectra. An equivalent relation holds for a linear random process, i.e. when the output signal results from passing a stationary random signal through a deterministic linear filter. Consider now a device whose impulse response resembles a long segment of a Gaussian process. Although the filter is deterministic overall, for any practical purpose it can be considered a random signal. Applying, for instance, a bispectral analyzer of finite temporal aperture to such an impulse response would average its bispectral content to zero, giving us a filter with zero bispectral characteristics. Consequently, the output signal obtained by passing a deterministic signal through such a filter will have a zero bispectrum. Since the impulse response resembles a white noise signal, its spectral characteristics are (on average) flat, giving us an all-pass filter. Also, by properly scaling the impulse response we can ensure that the filter gain equals 1. The following figure shows the result of passing the original "Solo Viola" signal through a linear filter whose impulse response was created by taking a 0.5 sec. sample of a Gaussian process. The bispectral analysis of the signal was performed by averaging over 32 frames of 16 msec. each. The subjective auditory result seems to resemble that of a reverberation device. Fig. 2 shows the bicoherence index of the signal after filtering.

Figure 2: Bicoherence index amplitude of the output signal resulting from passing the "Solo Viola" signal through a Gaussian, 0.5 sec. long filter.
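A sketch of the artificial all-pass reverberator's impulse response follows, under assumptions of our own. First, a 16 kHz sampling rate, implied by the 8 kHz Nyquist frequency quoted for Fig. 1. Second, unit-energy scaling, which is our reading of "filter gain equals 1". Third, the normalized zero-lag third moment as a crude single-point proxy for the filter's bispectral content; a full check would examine all lag pairs. For a unit impulse (the identity filter) that proxy equals 1, while for the Gaussian response it comes out close to 0.

```python
import math
import random

FS = 16000                  # assumed sampling rate (8 kHz Nyquist, as quoted for Fig. 1)
DURATION = 0.5              # 0.5 s impulse response, as in the text
rng = random.Random(1)

# Impulse response: a sample of white Gaussian noise
h = [rng.gauss(0.0, 1.0) for _ in range(int(DURATION * FS))]

# Scale to unit energy, so that a white input keeps its average power (gain 1)
norm = math.sqrt(sum(v * v for v in h))
h = [v / norm for v in h]

# Normalized zero-lag third-order moment, a crude proxy for bispectral content:
# it equals 1 for a unit impulse and is close to 0 for this Gaussian response
third = sum(v ** 3 for v in h) / sum(v * v for v in h) ** 1.5
print(len(h), third)
```

Applying the filter is then an ordinary 8000-tap FIR convolution, as in Eq. (3), which in practice would be done with an FFT-based routine rather than in pure Python.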

5 Tone Separation and Timbral Fusion/Segregation

Among the various questions dealing with the timbral characteristics of sounds, the problem of concurrent timbres [7],[8] is basic to musical practice itself. It manifests itself in daily orchestration practice, in the choice of instruments, and in the ability to perceive and discriminate individual instruments in a full orchestral sound. Orchestration manuals originally treated the problem semi-empirically and offered only vague criteria for evaluating orchestral choices. More recently, quantitative acoustical studies have pointed out several features in the temporal and spectral behavior of sounds which are pertinent to instrument recognition and to modeling spectral blend. None of these attempts, however, has exploited the power of polyspectral techniques for the analysis of spectral blend. One of the most basic applications of bispectral methods reported in the literature is the detection of phase coupling between harmonic components of a signal [3]. Such phase coupling exists in most sounds produced by a single musical instrument (idealized sine-tone generators excepted). As argued in Section 3, the two strongest low-frequency components of a musical signal are assumed to be related to another strong component at a higher frequency. In other words, strong coupling should exist between two harmonic components of a sound and a component at their sum frequency. Since the power spectrum suppresses all phase relations, it cannot detect such coupling. The bispectrum, however, is capable of detecting and quantifying phase coupling. An illustrative example of such an effect can be found in [3], where a `quadratic phase coupling' phenomenon is treated, which occurs due to quadratic nonlinearities in a stochastic process. A non-zero bispectrum does not depend on this particular mechanism; it arises from any statistical dependence between the phases. Such dependence is natural for musical sounds, as mentioned above. Having at our disposal such a powerful tool for detecting coherence between the spectral components of a signal, we claim that the ear groups the various spectral components present in the sound, relating strong bispectral peaks to one source or another. Thus, spectral blend would be a blend between bispectral patterns, and a bispectral analyzer could not separate sounds with similar bispectral signatures. Concluding this discussion, we must mention that this bispectral mechanism is only one among many that influence tone color separation/blending.
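A minimal sketch of the grouping idea follows, under assumptions of our own: a frame length of 32 samples, partials at integer DFT bins, phases drawn independently in every frame, and hypothetical function names. Two independent "instruments" are mixed, each contributing a quadratically phase-coupled triad (bins 3, 5, 8 and bins 4, 7, 11). The frame-averaged bicoherence is then evaluated at triads lying within one source and at a triad (bins 3, 4, 7) whose components come from different sources.

```python
import cmath
import math
import random

L = 32                                   # assumed frame length
SOURCE_A = (3, 5)                        # coupled triad at bins 3, 5, 8
SOURCE_B = (4, 7)                        # coupled triad at bins 4, 7, 11


def mixture_frame(rng):
    """One frame of two independent sources; each source adds partials at k1, k2
    and k1 + k2 with phases p1, p2 and p1 + p2 (quadratic phase coupling)."""
    frame = [0.0] * L
    for k1, k2 in (SOURCE_A, SOURCE_B):
        p1 = rng.uniform(0.0, 2 * math.pi)
        p2 = rng.uniform(0.0, 2 * math.pi)
        for k, p in ((k1, p1), (k2, p2), (k1 + k2, p1 + p2)):
            for n in range(L):
                frame[n] += math.cos(2 * math.pi * k * n / L + p)
    return frame


def dft_bin(frame, k):
    """Single DFT bin of one frame."""
    return sum(v * cmath.exp(-2j * math.pi * k * n / L) for n, v in enumerate(frame))


def bicoherence(frames, k1, k2):
    """Frame-averaged bicoherence index at bins (k1, k2)."""
    acc_b, s1, s2, s3 = 0j, 0.0, 0.0, 0.0
    for f in frames:
        x1, x2, x3 = dft_bin(f, k1), dft_bin(f, k2), dft_bin(f, k1 + k2)
        acc_b += x1 * x2 * x3.conjugate()
        s1, s2, s3 = s1 + abs(x1) ** 2, s2 + abs(x2) ** 2, s3 + abs(x3) ** 2
    m = len(frames)
    return abs(acc_b / m) / math.sqrt((s1 / m) * (s2 / m) * (s3 / m))


rng = random.Random(3)
frames = [mixture_frame(rng) for _ in range(400)]
print(bicoherence(frames, 3, 5))   # triad inside source A
print(bicoherence(frames, 4, 7))   # triad inside source B
print(bicoherence(frames, 3, 4))   # bins 3 (source A), 4 and 7 (source B)
```

The two within-source triads give a bicoherence of 1 (up to round-off), while the cross-source triad averages random phases and stays small. This is the kind of cue by which strong bispectral peaks could be attributed to one source or another. Real recordings would additionally need windowing and would not have partials exactly on DFT bins.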

6 Conclusions and Future Trends

This paper presented an important additional characteristic of musical timbre, which originates in the non-Gaussian and non-linear characteristics of the signal. The higher-order spectral characteristics provide an "explanation" of various phenomena in auditory perception that are basic to the understanding of the perception of tone color and are central to musical research and practice. Currently we are interested in applying the above ideas to a simple model that assumes a linear auto-regressive filter driven by a white non-Gaussian noise (WNG) excitation. Although the details of this work are beyond the scope of this presentation, we shall mention that this model enables us to "lump" all of the non-Gaussian/polyspectral properties of the signal into the characteristics of the input. The purpose of this work is to derive an extended feature representation of the musical signal, which would capture more signal characteristics than standard representations such as, for instance, the widely used LPC-coefficient-based representation [12],[13],[14]. The relevance of these ideas to music extends beyond the mathematical modeling of timbre. Direct applications lie in the field of musical tone synthesis, in understanding the mechanisms responsible for timbral fusion and segregation, and also in the field of music theory. We expect the bispectral characteristics to be closer to the musician's characterization of the `density' or `complexity' of the musical sound. Informally, one could say that the bispectrum contributes to some sort of `focusing' quality of the signal, thus enabling us to distinguish between `focused' timbre and `dispersed' or `chorused' timbre. These random fluctuations between the harmonic partials of the sound add a sense of vitality to the signal, and we might suggest that there exists an analogy between this dispersion and the known meaningful variations in other musical parameters, such as the quality of intonation [15].

There is plenty of room for research in this area, with several lines of investigation to be pursued. Mainly, there are many open psychoacoustical questions that require study and extensive experimentation. Naturally, the implications of such results for modeling the auditory mechanism could be very substantial. Also, it is widely recognized that most timbral characteristics are time dependent and thus cannot be analyzed by the stationary methods discussed in this work. An extension of polyspectral methods to transient signal analysis seems to yield promising results [16]. One could study the use of nonlinear filters [17],[18] for processing audio signals and obtaining better control over the higher-order spectra. Additionally, we suggest adapting the above techniques to the investigation of other musical parameters. It might also be possible to systematize the rules of orchestration or tone-color production in orchestral and electronic music by using a bispectral description. We hope to draw up a bispectral description in a manner similar to the way the spectrum-related musical staves have systematized speech. If such a far-fetched ideal could be accomplished, it would be a huge leap towards a `timbral composition' methodology, so desperately sought in our days.
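The AR/WNG model mentioned in the conclusions can be sketched in its simplest form. We assume a first-order AR filter (our choice; the paper does not specify the order) and use sample skewness as a zero-lag proxy for the bispectrum; the function names are hypothetical. The same filter is driven once by Gaussian and once by skewed (zero-mean exponential) white noise. The third-order statistics of the output are inherited entirely from the excitation: for AR(1) with coefficient a, the output skewness equals the input skewness times (1 - a²)^{3/2} / (1 - a³).

```python
import math
import random


def ar1(excitation, a):
    """Hypothetical first-order AR filter y[n] = a*y[n-1] + e[n]."""
    y, prev = [], 0.0
    for e in excitation:
        prev = a * prev + e
        y.append(prev)
    return y


def skewness(x):
    """Sample skewness: third central moment divided by variance**1.5."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    return sum((v - m) ** 3 for v in x) / n / var ** 1.5


rng = random.Random(2)
n, a = 50000, 0.5
gauss = [rng.gauss(0.0, 1.0) for _ in range(n)]
skewed = [rng.expovariate(1.0) - 1.0 for _ in range(n)]   # zero mean, skewness 2

s_gauss = skewness(ar1(gauss, a))
s_skewed = skewness(ar1(skewed, a))
# AR(1): output skewness = input skewness * (1 - a^2)^1.5 / (1 - a^3)
predicted = 2.0 * (1 - a * a) ** 1.5 / (1 - a ** 3)
print(s_gauss, s_skewed, predicted)
```

The Gaussian-driven output shows skewness near zero, while the skewed excitation yields an output skewness close to the predicted value of about 1.48. In this toy setting, the filter shapes the spectrum while the excitation alone supplies the non-Gaussian, higher-order content.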

References

[1] K. Hasselmann, W. Munk, G. MacDonald, Bispectra of Ocean Waves, in Proceedings of the Symposium on Time Series Analysis, Brown University, June 11-14, 1962 (ed. Rosenblatt), New York, Wiley, 1963.
[2] D.R. Brillinger, An Introduction to Polyspectra, Ann. Math. Stat., Vol. 36, 1361-1374, 1965.
[3] C.L. Nikias, M.R. Raghuveer, Bispectrum Estimation: A Digital Signal Processing Framework, Proceedings of the IEEE, Vol. 75, No. 7, July 1987.
[4] J.M. Mendel, Tutorial on Higher-Order Statistics (Spectra) in Signal Processing and System Theory, Proceedings of the IEEE, Vol. 79, No. 3, March 1991.
[5] M.A. Gerzon, Non-Linear Models for Auditory Perception, 1975, unpublished.
[6] M. Schetzen, Nonlinear System Modelling Based on Wiener Theory, Proceedings of the IEEE, Vol. 69, No. 12, December 1981.
[7] S. McAdams, A. Bregman, Hearing Musical Streams, Computer Music Journal 3(4):26-43, 60, 63, 1979.
[8] S. McAdams, Spectral Fusion, Spectral Parsing and the Formation of Auditory Images, Ph.D. dissertation, Stanford University, CCRMA Report no. STAN-M-22, Stanford, CA, 1984.
[9] R. Erickson, Sound Structure in Music, Berkeley, CA, University of California Press.
[10] F. Winckel, Music, Sound and Sensation, New York, Dover, 1967, pp. 12-23, 112-119.
[11] A. Lohmann, B. Wirnitzer, Triple Correlations, Proceedings of the IEEE, Vol. 72, No. 7, July 1984.
[12] R. Cann, An Analysis/Synthesis Tutorial, Computer Music Journal 3(3):6-11; 3(4):9-13, 1979; and 4(1):36-42, 1980.
[13] P. Lansky, Compositional Applications of Linear Predictive Coding, in Current Directions in Computer Music Research, MIT Press, 1989.
[14] R. Gray, A.H. Gray, G. Rebolledo, J.E. Shore, Rate-Distortion Speech Coding with a Minimum Discrimination Information Distortion Measure, IEEE Transactions on Information Theory, 27(6), November 1981.
[15] D. Cohen, Patterns and Frameworks of Intonation, Journal of Music Theory, 1969.
[16] J.R. Fonollosa, C.L. Nikias, Wigner Higher Order Moment Spectra: Definition, Properties, Computation and Application to Transient Signal Analysis, IEEE Transactions on Signal Processing, 41(1):245-266, January 1993.
[17] G.L. Sicuranza, Quadratic Filters for Signal Processing, Proceedings of the IEEE, 80(8):1263-1285, August 1992.
[18] I. Pitas, A.N. Venetsanopoulos, Nonlinear Digital Filters, Kluwer Academic Publishers, 1990.