Document not found! Please try again

An Efficient Algorithm for the Calculation of a ... - Wellesley College

15 downloads 0 Views 427KB Size Report
(Received 5 February 1992; revised 10 April 1992; accepted 16 June 1992) ... Graphical examples of the application of this calculation to musical signals are ...
An efficient algorithm for the calculation of a constant Q transform Judith C. Brown

Physics Department,IVellesley College,I4'ellesley, Massachusetts 01281andMediaLaboratory, Massachusetts Instituteof Technology, Cambridge, Massachusetts 02139 MillerS.

Puckette

IRCAM, 31 rue St Merri, Paris 75004, France

(Received5 February1992;revised10April 1992;accepted16June1992) An efficientmethodof transforminga discreteFouriertransform(DFT) into a constantQ transform,whereQ is the ratio of centerfrequencyto bandwidth,hasbeendevised.This methodinvolvesthe calculationof kernelsthat are thenappliedto eachsubsequent DFT. Only a fewmultiplesareinvolvedin thecalculationof eachcomponent of the constantQ transform, sothistransformationaddsa smallamountto the computation.In effect,this methodmakesit possible to takefull advantage of thecomputational efficiency of thefastFouriertransform (FFT). Graphicalexamplesof the applicationof thiscalculationto musicalsignalsare given for soundsproducedby a clarinetand a violin. PACS numbers:43.60.Gk, 43.75.Yy, 43.75.Dc, 43.75.Ef

I. THEORY

In many cases,suchas that of musical signals,a constantQ transformgivesa betterrepresentation of spectral data than the computationallyefficientfast Fourier transform. Various solutionsto this problem usingconstantQ filterbanksor a "boundedQ" Fourier transformhavebeen proposed(Harris, 1976; Schwede,1983; Mont-Reynaud, 1985).The musicgroupat Marseilleshasproposed a "wavelet transform" (Kronland-Martinet,

1988). Brown (1991)

describes resultsappliedto musicalsignalsbasedon a direct evaluationof the DFT for the desiredcomponents. We have calculateda constantQ transformbasedon transforminga fastFourier transforminto thelog frequency domain.The FFT is calculatedusinga standardFFT program,andtheentirecalculationtakesonlyslightlylongerto run than the FFT sincethereare few operationsinvolvedin the computationof the transformation.The transformation is baseduponthe following.A constantQ transformcanbe calculateddirectly(Brown, 1991) by evaluating: = xcq[kcq]

,

(1)

n=O

whereXC•[kcq ] is the k,• component of theconstant Q transform.Herex In ] isa sampledfunctionof time,and,for

eachvalueof kc,,w[n,kc• ] isa windowfunction of length N [ k,•]. Theexponential hastheeffect ofa filterforcenter frequency In a constantQ filterbankthecenterfrequencies aregeometricallyspaced;for musicalapplications, the calculation isoftenbasedon the frequencies of the equaltemperedscale

ofthewindowing function w[ n,kcq ], butit isinversely proportional toN [ kc•]. Wemaytherefore keepQconstant by choosing values of N [ k•q] inversely proportional to those

of o%. Oftenthebandwidth ischosen aso•%+t --co%, which isproportional tothecenter frequencies a•n,, because of their geometricspacing.In the caseof the equaltempered scale, this leads to

Q = 1/(2 (•/t2)- 1) -• 17. The direct evaluationof Eq. { 1) is computationallyinefficient. However, it can be shownthat for any two discrete functionsof time x [ n I andy[n ]: N--I

• x[nly*[n] =•oX[klY*[k],

(3)

whereX[ k] and Y[ k] arethediscreteFouriertransforms of x[n] andy[n], and Y* [k] isthecomplexconjugateofY[k]. Equation(3) isa formof Parseval's equation(Oppenheim, 1975).

We canuseEq. (3) to evaluateEq. ( 1) asfollows.Letting

w[n,k w]e-•%•" = •-f*[n,kcq ].

(4)

Equation (3) gives N--I

x•q[kc,]= • x[n]o,•f*[n,k,•] =--

Nk

o

X[• ]K*[•,•c•],

(5)

whereK [k,k•] is the discreteFourier transformof •9f [ n,k½• ]; thatis

with

N--I

r_o• •- (2(l/•2))•qco.•i.

(2)

for semitonespacing. The Q of a filter is definedasf/Af, whereAf denotes bandwidthandf thecenterfrequency.In thecaseof thefilter definedin Eq. ( 1), thisbandwidthdependsuponthe choice 2698

1 /V--I

J. Acoust.Sec. Am. 92 (5), November1992

K[k,k•,]= • w[n,k•,]d'%"e -•'•"/~.

(6)

WewillrefertotheK [ k,kcq] in thefrequency domain asthe spectral kernels ofthetransformation andto the•f [ n,k,v] asthe temporalkernels.We haveuseda Hammingwindow:

to[n,kcq ] = ct-- ( 1 -- ct)cos(2rrn/N[ kcq] ),

0001-4966/92/112698-04500.80

¸ 1992 AcousticalSocietyof America

2698

wherea = 25/46. In practice,we havechosenthe window andthe exponentialto be symmetricaboutthe centerof the

KERNELS:

MAGNITUDE

OF TRANSFORM

interval and thus evaluated

Here

the

window

is

57

zero

outside

the

interval

(N/2 -- N(kcq)/2, N/2 + N(k,q)/2). With thechoiceof

'--28.5

frequencies of Eq. (2), we are calculating12 constantQ components peroctavecorresponding to the 12 tonesof the equaltemperedscale.The choiceoff,i, = 174.6Hz correspondingto F3, the F belowmiddleC, leadsto 60 componentsof the constantQ transformfrom F3 to the Nyquist frequency. In somecasesquartertonespacingispreferable, and for this



f/

FIG. 2. Magnitude of kernels against numberof Fourierfrequency componentk. Eachfrequencybin represents 10.8Hz.

The spectralkernelsare real since •"[n,k,q] = o?d'* [ -- n,km]; thatis,thetemporal kernels areconjugate symmetric,which is the conditionfor a real discrete Fourier transform.We have evaluatedthe spectralkernels

usingonlytherealpartof thetemporalkernels,sinceweare lookingat positivefrequencycomponents only. Figure I is a displayof the real part of the temporal

Figure2 is a graphof themagnitudeof thespectralkernelsasdefinedin Eq. (6). The lowerdottedanduppersolid linesareartifactsof the plottingprogramandshouldbeignored. These kernelswere obtainedby taking a standard 1024point FFT of the temporalkernelsgraphedin Fig. 1.

kernels as defined in Eq. (4) for the first 30 kernels

Theverticalaxisislabeled withthekernelnumber kmcorre-

•"[n,k,,•], thatis,kmgoesfrom0 to 29in Eq.(4). The sponding to thefrequency ofthekernelasdefinedin Eq. (2), verticalaxisis labeled withthekernelnumberkcecorre- and the horizontalaxis is labeledwith the FFT frequency sponding to thefrequencyof thekernelasdefinedin Eq. (2), and the horizontalaxisis labeledwith the samplenumbern

asit appearsin Eq. (4). Thesekernelsarenormalizedwith the windowlengthsothat sinusoidalcomponents in the signal havingthesameamplitudewill occurwith thesameamplitudefor theirconstantQ transform.Thusthe amplitude in the figureincreases with the numberof the component. KERNELS:

TIME

WAVES

component numberk asusedin Eq. (7) or Eq. (6). These kernelsneedonly be calculatedonceand can thenbe called for usein Eq. (5). It isclearfromthisfigurethatthekernels are nearzeroovermostof the spectrumsotherefew multipliesinvolvedin theevaluationof Eq. (5). We haveestimatedthe error in keepingonly the values

FRACTIONAL

ERROR VS MINVAL

29

• 20

I e+03

2e+03

0

0.1

0.2 MINVAL

time (samples)->

0.3

0.4

0.5

->

FIG. 1.Realpartoftemporal kernels plottedagainst sample numberforthe

FIG. 3. Error in droppingsmallvaluesof kernels{definedin text) plotted

first 30 temporal kernels.

againstcutoffvalue.

2699

J. Acoust.Soc. Am., VoL 92, No. 5, November 1992

J.C. Brownand M. S. Puckotto:Algorithmfor O transform

2699

CLARINET

greaterthan an adjustableparametercalledMINVAL by summingthe absolutevaluesof the numberswhich are droppedanddividingby the sumof theabsolutevalueof all

SCALE 243

valuesof the kernel.The resultsare givenin Fig. 3 wherewe

haveplottedtheerrorin the approximationasdefinedabove againstMINVAL. ChoosingMINVAL equalto 0.15, then there are only two multiplications(or one in a few cases)for components up to Xc"[41 ] and two to six multiplicationsfor the remainingcomponents.In all thereare roughly280 multiplications for quarter tone spacing.Theseare negligiblecomparedto the multiplicationsinvolvedin the evaluationof the FFT. For a Q of 17at a frequencyof 175Hz (the lowestfrequency for which the constantQ transformis desired)and a sample rate of 11 025, a 1024point FFT is needed.This givesrise to 1024 log,_(1024)= 10 240 complexmultiplies. The directevaluationofEq. ( 1) involvessumsoverwindowsrangingfrom 1074samplesat thelow-frequencyendto 34 samplesat the upperend. For quarter tone spacingthis leadsto roughly 35 000 complexterms.Thus our method I leadsto anincrease of a factorof roughly32in computational efficiency. We have estimatedthe run time for our algorithm with

calculationscarried out on a 40-MHz Intel i860 using a hand-codedroutine.With a 512-pointFFT andquartertone spacingover three octaves,the FFT takes343/zs and the transform166-I- 2/xs (measuredon an oscilloscope). This canbe comparedto a time advanceof 25 msbetweenframes,

E

112

MIDINOTE

->

FIG. 4. ConstantQ transformfor a clarinetplayinga chromaticscaleplotted againstmidi note. The lowestnote is 48 correspondingto Ca with a frequencyof 130.8Hz. Eachtime frameon the verticalaxiscorresponds to 25 ms.

so the calculation can easily be carried out in real time. In a

subsequent article,we shalldescribeapproximationswhich increasethe computationalefficiencyand are appropriate for the applicationof fundamentalfrequencytracking. Although we only discussthe caseof constantQ and centered windows here, it is sometimes useful to make other

choices.For example,for low centerfrequenciesone often wishesto decreaseQ in order that the temporal kernelsdo not exceeda givenlengthof time. This requirementusually arisesfrom a needfor temporalprecisionandisnot relatedto our methodof calculatingthe transform.For real-timeapplications,in which delay must be kept to a minimum, the temporalkernelsmay be alignedto the end of the sample window insteadof centeringthem.

Dialoguede !'Ombre Double 243

II. EXAMPLES

'121•'

Graphicalexamplesof the outputaregivenin Figs.4 to 8 where we have plotted Fourier amplitude,as calculated usingEq. (5) versusmidi notefor timeframeswith a 25-ms separation.The useof midi notefor labelingis equivalentto that of frequencywith middle C (261.6 Hz) assignedthe valueof 60, and eachintegeraddedcorrespondsto a stepup

in theequaltempered scale(or to multiplication by 2•/u ). The sample rate was 11 025 for all examples.The sound sourceof Fig. 4 is a clarinetplayinga chromaticscale.Particularly prominentin this graphis the absenceof evenharmonics,which is a featureof the spectraof tubeswith one openendandoneclosedend.Figure5 is thetransformof the sameclarinetin a portionof a performanceof thepiece"Dialoguede l'Ombre Double," by Pierre Boulez. 2700

J. Acoust.Soc.Am.,Vol.92, No.5, November1992

48

•0 MIDINOTE

112 ->

FIG. 5. ConstantQ transformfor a clarinetplayinga portionof thepiece "Dialoguede I'OmbreDouble," by PierreBoulez.The axesare labeledasin Fig. 4.

J.C. BrownandM. S. Puckette: Algorithm for O transform

2700

VIOLIN

VIOLIN

VIBRATO

VIBRATO

100

1oo

53

83 MIDINOTE

53

83

MIDINOTE

->

FIG. 6. ConstantQ transformof a violinexecutingvibratoon thenoteD s. Axesarelabeledasin thetwopreceding figures,but thelowestfrequency is midi note53 corresponding to F• with a frequency of 175Hz.

VIOLIN

0

113

FIG. 8. ConstantQ transformof a violin executingvibratoon the noteD s. This calculationhasdoublethe Q of that of Fig. 6 and twice as many frequencybins.

Figures6 and7 represent theconstantQ spectrumof a violinwith examples ofvibratoin Fig. 6 andglissando in Fig. 7. The vibratois not resolvedin Fig. 6, sowe haveredonethe calculationwith quartertonespacingand an appropriately largerwindowsizein Fig.8.We havechosen theseparticular soundsasexamplessincetheywill be usedfor fundamental frequency trackingin a subsequent article.

GLISSANDO

L

113

->

59

ACKNOWLEDGMENT

JCB is very gratefulto Daniel P. W. Ellis of the MIT Media Lab for invaluablediscussions. Sheis alsogratefulto WellesleyCollege for its generousSabbaticalleave policy and to IRCAM for the use of its facilities during her stay there.

Brown,J. C. (1991). "Calculationof a ConstantQ SpectralTransform,"J. Acoust. Soc. Am. 89, 425-434.

Harris, F. J. (1976). "High-ResolutionSpectralAnalysiswith Arbitrary SpectralCentersand Arbitrary SpectralResolutions,"Cornpt. Elect. Eng. 3, 171-191. Kronland-Martinet, R. (1988). "The Wavelet Transform for Analysis, Synthesis, andProcessing of Speechand Music Sounds,"Cornput.Music J. 12, 11-20. 112

80

MIDINOTE

->

Mont-Reynaud,B. (1985). "The Bounded-QApproachto Time-Varying SpectralAnalysis,"Departmentof Music TechnicalReportSTAN-M28.

Oppenheim,A.V., and Schuler,R. W. (1975). Digital SignalProcessing FIG. 7. ConstantQ transformof a violinglissando from D s to A.s.Axesare labeledas beforewith the lowestmidi note 48 correspondingto C_•( 130.8 Hz).

2701

J. Acoust.Soc. Am., Vol. 92, No. 5, November1992

(Prentice-Hall, EnglewoodCliffs, NJ). Schwede,G. W. (1983). "An Algorithm and Architecturefor Constant-Q SpectrumAnalysis,"Proc. ICASSP 29.2, 1384-1387.

J.C. Brownand M. S. Puckette:Algorithmfor O transform

2701