WAVELET ALGORITHM FOR THE ESTIMATION OF PITCH PERIOD OF SPEECH SIGNAL

M. S. Obaidat, C. Lee, Y. Zhang, H. Khalid, and †D. Nelson
Department of Electrical Engineering, City University of New York, The City College, Convent Ave. at 140th Street, New York, NY 10031
obaida-ee-mail.engr.ccny.cuny.edu
†Department of Defense, 9800 Savage Rd., Ft. Meade, MD 20755-4000

ABSTRACT: An algorithm based on the dyadic wavelet transform (DyWT) has been developed for detecting pitch period. Pitch period is regarded as an important feature in designing and developing automatic speaker recognition/identification systems. In this paper, we have developed two methods for detecting the pitch period of synthetic signals. In the first method, the pitch period is estimated from the original signal; in the second, it is estimated from the power spectrum of the signal. Several experiments were performed, under noisy and ideal conditions, to evaluate the accuracy and robustness of the proposed methodology. The experiments showed that the proposed techniques were successful in estimating pitch periods.

KEY WORDS: Speech Processing, Wavelet Transform, Pitch Estimation.

I. INTRODUCTION: In a generic speaker recognition system, the desired features are first extracted from the speech signal. The extracted features are then used as input to another sub-system, which makes the decision regarding the verification or identification of the speaker. Feature extraction consists of extracting characterizing parameters of a signal to be used for speaker recognition. The aim is to extract features that remain invariant for a given speaker while staying distinct from those of an impostor. Traditional speaker identification techniques, such as the autocorrelation and cepstrum-based methods, fail to provide accurate results because of the wide range of variation present in real speech [1]. Pitch period is considered an important parameter that can be used reliably for the identification of the speaker [2,3]. Therefore, good performance of an automatic speaker recognition system is strongly related to the accuracy and reliability with which pitch periods of the speech signals can be detected and estimated.

Pitch period detection can, in some sense, be related to the edge detection problem in image processing. A procedure for obtaining an optimal edge detector was provided by Canny [4]. In subsequent work, Mallat [5] showed that multiscale Canny edge detection is equivalent to finding the local maxima of a wavelet transform. Kadambe and Boudreaux-Bartels [6] developed a wavelet-based scheme for pitch detection in speech recognition and showed that the wavelet-based method is superior to the traditional pitch estimation techniques. Brodzik, Obaidat, and Nelson [7] evaluated the performance of the Gaussian window (GW), first derivative Gaussian (DG), modulated Gaussian (MG), and one-sided exponential window (EW) wavelets. They observed that the first derivative Gaussian (DG) has the best estimation accuracy for the pitch period of speech signals.

In this paper, we present an algorithm based on the dyadic wavelet transform for detecting the pitch period of synthetic speech signals under ideal and noisy conditions. The dyadic wavelet transform (DyWT) is a scale-discretized version of the continuous wavelet transform (CWT), which was previously used to analyze phonocardiogram signals [8,9]. The next section of this paper describes the proposed wavelet-based algorithm for pitch detection. Section III presents the results and the relevant discussion. Finally, the last section deals with the conclusions.
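
For reference, the relationship between the CWT and the DyWT mentioned in the introduction can be written as follows. The notation and the $1/s$ normalization follow one common convention and are assumptions here, since the paper does not state which normalization it uses; $f$ denotes the signal and $\psi$ the analyzing wavelet.

$$
W f(s, t) = \frac{1}{s} \int_{-\infty}^{\infty} f(\tau)\, \psi\!\left(\frac{t - \tau}{s}\right) d\tau, \qquad s > 0 \qquad \text{(CWT)}
$$

$$
W f(2^{j}, t), \qquad j \in \mathbb{Z} \qquad \text{(DyWT: the same transform restricted to dyadic scales } s = 2^{j}\text{)}
$$

Under this reading, the time variable stays continuous (or uniformly sampled) and only the scale is discretized, which is why local maxima can still be located at each scale.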

ICECS '96 - 471

II. THE ALGORITHM: The flow chart for the proposed DyWT-based pitch detection algorithm is shown in Figure 1. The algorithm begins by taking n-millisecond samples of the synthesized speech signals (clean and noisy). For each extracted sample, the algorithm estimates the scale factor (also called the scale parameter or dilation parameter) S and the scale L, where $S = 2^L$. The calculations are performed iteratively with the goal of minimizing an objective function $e_S(n)$, chosen to be the mean-squared prediction error over N samples:

$e_S(n) = [E_S(n) - F(n)]^2$    (1)

where $F(n)$ is the signal and $E_S(n)$ is the energy of the time-varying signal at scale factor S. For a time-varying signal, the energy is defined as

$E_S(n) = \sum_{m} [\psi(2^L m)\, F(n-m)]^2, \quad m = 0, 1, 2, \ldots, N-1$    (2)

where $\psi(m)$ is the scaling function that relates to the wavelet. We chose $E_S(n)$ for computing the prediction error because it displays the time-varying amplitude properties of the speech signal and enhances the characteristics of the pitch points.

Next, the pitch detection algorithm computes the DyWT of the signal $F(n)$ and of its power $|F(n)|^2$, using the value of the scale L estimated in the preceding steps. Pitch points are selected heuristically from the DyWT as the amplitudes that are greater than or equal to 30% of the global peak value. The average distance between adjacent pitch points yields the pitch period for the extracted speech sample (clean or noisy).

III. RESULTS AND DISCUSSION: High frequency
noise decays greatly at large scales. Therefore, by using an appropriate threshold to detect maxima at large scales, the faulty detections caused by high-frequency noise were greatly reduced. In many of our experiments, we found that the large scales L+1, L+2, and L+3 are sufficient to detect the pitch period accurately. The DyWT was computed at scales L+1, L+2, and L+3 for the signal $F(n)$ (method 1) and for its power $|F(n)|^2$ (method 2). We then estimated the pitch period by measuring the time interval between local maxima of the DyWT. The second method provided improved performance in estimating the pitch period.

To illustrate the robustness of the proposed algorithm to noise, we added white Gaussian noise to the synthetic signals (/a/, /e/, /i/, /u/, /v/) and evaluated the pitch period again. We observed that pitch period estimation based on applying an appropriately scaled wavelet to the signal power was very accurate.

Experiments were performed on different synthetic signals that generate the sounds of the letters a, e, i, u, and v. The vocal tract used to generate these sounds was characterized by an autoregressive moving average (ARMA) model consisting of m poles and n zeros. Tables 1 and 2 compare the performance of the DyWT of the original speech signal (method 1) and the DyWT of the power of the signal (method 2). The comparison focuses on two criteria: 1) the accuracy with which the pitch period can be estimated, and 2) the accuracy when noise is added. The DyWT provides accurate estimates of the pitch period with both methods 1 and 2 for the clean speech signal. However, when noise is present, method 2 (DyWT of the power of the speech signal) performed better. The DyWT of the original speech signal cannot detect the variation of the pitch period because of the high noise level.
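
As an illustration of the two methods compared in this section, the following Python sketch builds a synthetic voiced signal, adds white Gaussian noise, and computes a wavelet transform of the signal (method 1) and of its squared magnitude (method 2) at three dyadic scales. Everything specific in it is an assumption rather than the configuration used in the experiments: the two-pole resonator stands in for the ARMA vocal tract model (whose poles and zeros are not given), the first-derivative-of-Gaussian wavelet with 1/s normalization is one possible analyzing wavelet, and the names `synth_voiced`, `dg_wavelet`, and `dywt`, the 80-sample period, the noise level, and the levels 2 to 4 are hypothetical.

```python
import numpy as np


def synth_voiced(period, n_samples, pole_radius=0.95, pole_angle=0.3):
    """Impulse train with the given pitch period (in samples), passed
    through a two-pole resonator (an assumed stand-in for the ARMA model)."""
    excitation = np.zeros(n_samples)
    excitation[::period] = 1.0                      # one impulse per pitch period
    a1 = -2.0 * pole_radius * np.cos(pole_angle)    # denominator coefficients of
    a2 = pole_radius ** 2                           # 1 + a1*z^-1 + a2*z^-2
    y = np.zeros(n_samples)
    for n in range(n_samples):
        y[n] = excitation[n]
        if n >= 1:
            y[n] -= a1 * y[n - 1]
        if n >= 2:
            y[n] -= a2 * y[n - 2]
    return y


def dg_wavelet(scale):
    """First derivative of a Gaussian, dilated to `scale` samples and
    normalized by 1/scale (assumed convention)."""
    half = int(4 * scale)                           # support of +/- 4 scale units
    t = np.arange(-half, half + 1) / scale
    return -t * np.exp(-0.5 * t ** 2) / scale


def dywt(x, level):
    """Wavelet transform of x at the dyadic scale 2**level, same length as x
    (the kernel must be shorter than x for that to hold)."""
    return np.convolve(x, dg_wavelet(2.0 ** level), mode="same")


rng = np.random.default_rng(0)
clean = synth_voiced(period=80, n_samples=1600)
# Assumed noise level: standard deviation equal to half that of the clean signal.
noisy = clean + 0.5 * clean.std() * rng.standard_normal(clean.size)

levels = (2, 3, 4)                                  # hypothetical stand-ins for L+1, L+2, L+3
method1 = {j: dywt(noisy, j) for j in levels}       # transform of the signal
method2 = {j: dywt(noisy ** 2, j) for j in levels}  # transform of its power
```

With this normalization, the response of the transform to white noise shrinks as the scale grows, which is the behavior that thresholding at large scales relies on.
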
The pitch points of the power of the speech signal are stretched by taking the square of the magnitude, so these points can still be detected. It is important to note from the tables that the wavelet transform (WT) modulus maxima in our experiments tend to vanish as the scale increases beyond a certain value. The scale L+2 was found to be suitable for use with the DyWT for the signals under study.

To show what the signals and their DyWTs look like for the vowels, consider Figure 2. It shows a synthesized signal (clean and noisy) for the vowel /u/, the power spectrum of the signal, and the DyWT of the signal and of its power spectrum. Again, observe that the DyWT of the power spectrum of the signal at scale L+2 provides a reasonably accurate estimate of the pitch period of the sound generated for the vowel /u/ under ideal and noisy conditions.
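
The pitch-point rule described in the algorithm section (keep maxima at or above 30% of the global peak, then average the spacing of adjacent pitch points) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `estimate_pitch_period`, the particular local-maximum test, and the sampling rate in the usage lines are hypothetical, and the input `w` is taken to be a transform output such as the DyWT at scale L+2.

```python
import numpy as np


def estimate_pitch_period(w, fs, rel_threshold=0.30):
    """Return the mean spacing (in seconds) between pitch points of w,
    or None if fewer than two pitch points are found.

    A pitch point is a local maximum whose amplitude is at least
    rel_threshold times the global peak of w.
    """
    w = np.asarray(w, dtype=float)
    if w.size < 3:
        return None
    interior = w[1:-1]
    # A sample is a local maximum if it exceeds its left neighbour
    # and is not below its right neighbour.
    is_peak = (interior > w[:-2]) & (interior >= w[2:])
    peaks = np.flatnonzero(is_peak) + 1
    # Keep only maxima at or above the relative threshold.
    peaks = peaks[w[peaks] >= rel_threshold * w.max()]
    if peaks.size < 2:
        return None
    # Average distance between adjacent pitch points, converted to seconds.
    return float(np.mean(np.diff(peaks))) / fs


# Usage with a toy transform output: strong peaks every 80 samples and
# weaker peaks in between that fall below the 30% threshold.
fs = 8000                       # hypothetical sampling rate in Hz
w = np.zeros(800)
w[40::80] = 1.0
w[60::80] = 0.1
period_seconds = estimate_pitch_period(w, fs)
```

The threshold is exposed as a parameter because the 30% value is described as a heuristic choice; other signals may need a different setting.
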


IV. CONCLUSIONS: In this paper, we presented an algorithm based on the DyWT for detecting the pitch points of a speech signal. The algorithm was evaluated on several synthetic speech signals and performed well even in a noisy environment. The DyWT of the power of the speech signal provided accurate estimates of the pitch period for signals corrupted by white Gaussian noise. The algorithm was also successful in calculating the scale L for the DyWT.

V. ACKNOWLEDGEMENT: This work is supported by NSA under Grant NSA-0-95-5.

VI. REFERENCES
[1] S. Kadambe, "The Application of Time-Frequency and Time-Scale Representations in Speech Analysis," Ph.D. Thesis, University of Rhode Island, Dept. of Elect. Eng., 1991.
[2] W. Hess, "Pitch Determination of Speech Signals: Algorithms and Devices," Berlin: Springer-Verlag, 1983.
[3] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals," Prentice-Hall, 1978.
[4] J. Canny, "A Computational Approach to Edge Detection," IEEE Trans. on PAMI, 8(6), pp. 679-698, Nov. 1986.
[5] S. Mallat and S. Zhong, "Characterization of Signals from Multiscale Edges," IEEE Trans. on PAMI, 14(7), pp. 710-732, July 1992.
[6] S. Kadambe and G. F. Boudreaux-Bartels, "Application of the Wavelet Transform for Pitch Detection of Speech Signals," IEEE Trans. on Information Theory, 38(2), pp. 917-924, March 1992.
[7] A. Brodzik, M. S. Obaidat, and D. Nelson, "Performance of Wavelet Algorithms for Speech Processing," Proceedings of the 1997 SCS Summer Computer Simulation Conference, July 1996 (in press).
[8] M. S. Obaidat, "Phonocardiogram Signal Analysis: Techniques and Performance Comparison," Journal of Med. Eng. and Tech., 17(6), pp. 221-227, Nov./Dec. 1993.
[9] M. S. Obaidat and M. M. Matalgah, "Performance of the Short-Time Fourier Transform and Wavelet Transform to Phonocardiogram Signal Analysis," Proceedings of 1992 ACM Symposium on Applied Computing, pp. 856-863, March 1992.


Table 1. Pitch Detection Accuracy (Signal without noise)
Columns: signal and scale; Method 1 at scales L+1, L+2, L+3; Method 2 at scales L+1, L+2, L+3.
Entries legible in the source, in source order: signal a (L=3); Method 1: 0.9995, 0.9995, 1; Method 2: 0.9998, 1.0000, 1.0000.

Table 2. Pitch Detection Accuracy (Signal with noise)
Columns: signal and scale; Method 1 at scales L+1, L+2, L+3; Method 2 at scales L+1, L+2, L+3.
[Row and column assignment of the entries is not recoverable from the source.] Legible row labels: a (L=3), L=4, L=5. Values in source order: 0, 0.6189, 1.0000, 1.0000, 0.9435, 0.6360, 0.9960, 0.9940, 0.9900, 0.9730, 0.6531, 0.9905, 0.9965, 0.9940, 0.9920, 0.982, 0.9960, 0.9940, 0.9900.

Fig. 1. Algorithm flow chart. Box labels legible in the source, in source order: compute scaling function with length L; compute DyWT (convolve the scaling function with the signal F and store as $E_S(n)$); calculate error $e_S(n) = E_S(n) - F(n)$; find the power of the signal by taking the square of the magnitude, $|F|^2$; compute spline function for the scaling parameter estimated above; "distance" (label fragment); stop.

Fig. 2: (i) Synthesized signal /u/ without noise, (iii) with 50% noise; (ii) power spectral signal /u/ without noise, (iv) with 50% noise; computed at dyadic scales 6, 7, 8 (since L=5).
