An Architecture for Wavelet-Packet Based Speech Enhancement for Hearing Aids
M.A. Trenas, J. Lopez, F. Arguello, E.L. Zapata

June 2000 Technical Report No: UMA-DAC-00/04

Published in: IEEE Int’l. Conf. on Acoustics, Speech, and Signal Processing (ICASSP’2000), Istanbul, Turkey, June 5-9, 2000

University of Malaga Department of Computer Architecture C. Tecnologico • PO Box 4114 • E-29080 Malaga • Spain

AN ARCHITECTURE FOR WAVELET-PACKET BASED SPEECH ENHANCEMENT FOR HEARING AIDS

Maria A. Trenas, Juan Lopez and Emilio L. Zapata
Dept. Computer Architecture, University of Malaga, Spain.

Francisco Arguello
Dept. Electronics and Computation, University of Santiago, Spain.

ABSTRACT

Wavelet packets have been applied to compensate the speech signal and improve intelligibility for listeners with a common hearing impairment known as recruitment of loudness, a sensorineural hearing loss of cochlear origin. We present an architecture that allows the best decomposition tree to be selected for each patient, in order to apply this wavelet-packet-based parametric compression algorithm.

1. INTRODUCTION

In this work we present the core of an architecture for wavelet-packet analysis designed to compensate for a hearing impairment known as recruitment of loudness. Recruitment of loudness commonly occurs in conjunction with sensorineural hearing impairments. Patients with sensorineural losses generally experience a high-frequency loss, resulting in a reduced dynamic range of hearing (see figure 1). Moreover, if a listener suffers from recruitment of loudness, perceived loudness grows more rapidly with increasing sound intensity than it does in the normal ear. Traditional hearing aids based on linear amplification run into problems, because linearly amplified speech may easily fall outside the patient's dynamic range of hearing. Clipping or automatic gain control, used to avoid exceeding the threshold of pain, distorts the signal in the frequency domain and reduces intelligibility. Another class of hearing aids tries to cope with this problem using amplitude compression with equalization. The idea is to map speech from the normal conversational range into the reduced range of an impaired listener. In most cases this non-linear processing uses parameters that are chosen based on normal conversational speech and remain fixed over time. Thus, compression whose gain is computed over time according to the input speech will perform better. The FFT-based approach [1] attempts to compress the signal spectrum into the impaired listener's dynamic range in an input-dependent way. Another example of parametric compression is TVFD [2], also based on the short-time Fourier transform.

Figure 1: Example of thresholds of hearing showing high-frequency losses. [Audiogram: hearing threshold (dB HL, from -10 to 90, with regions labeled Normal, Mild, Moderate, Mod-Sev. and Severe) versus frequency (125 to 8000 Hz).]

Wavelet-based compression is a kind of parametric compression in which the parameters are the wavelet coefficients. A gain is calculated for each wavelet coefficient in each frequency band such that the ratio of log intensity above the hearing threshold to the dynamic range of hearing is the same for the hearing-impaired and the normal listener (see figure 2):

$$ \frac{\delta^*}{\Delta^*} = \frac{\delta}{\Delta} \qquad (1) $$

where, in the notation of figure 2, $\delta = C - T_{nor}$ and $\Delta = T_{pain} - T_{nor}$ for the normal listener, and $\delta^* = C^* - T_{im}$ and $\Delta^* = T_{pain} - T_{im}$ for the hearing-impaired listener. From the above relationship, the compression gain for a coefficient can be derived as follows, using logarithmic representations:

$$ C^*_{mn} = T_{im} + \left(C_{mn} - T_{nor}\right)\frac{\Delta^*}{\Delta} \qquad (2) $$

where $C^*$ and $C$ are, respectively, the compensated and non-compensated wavelet coefficients; the gain applied to $C_{mn}$ is therefore $C^*_{mn} - C_{mn}$ in dB.
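Eq. (2) can be evaluated per band once the three thresholds at the band's central frequency are known. The sketch below does this in the dB domain; the function names and the threshold values used in the example are hypothetical, chosen only to make the mapping easy to follow.

```python
def compensated_level_db(c_db: float, t_nor: float, t_im: float, t_pain: float) -> float:
    """Eq. (2): compensated level C* (dB) of a coefficient whose level is C (dB).

    t_nor, t_im: normal and hearing-impaired thresholds of hearing for the band;
    t_pain: threshold of pain, assumed equal for both listeners (as in figure 2).
    """
    delta = t_pain - t_nor        # normal dynamic range, Delta
    delta_star = t_pain - t_im    # impaired dynamic range, Delta*
    return t_im + (c_db - t_nor) * delta_star / delta


def gain_db(c_db: float, t_nor: float, t_im: float, t_pain: float) -> float:
    """Compression gain C* - C in dB for a single coefficient."""
    return compensated_level_db(c_db, t_nor, t_im, t_pain) - c_db


if __name__ == "__main__":
    # Hypothetical band: T_nor = 0 dB, T_im = 60 dB, T_pain = 120 dB.
    for level in (0.0, 60.0, 120.0):
        print(level,
              compensated_level_db(level, 0.0, 60.0, 120.0),
              gain_db(level, 0.0, 60.0, 120.0))
```

With these hypothetical thresholds, a coefficient at the normal threshold is lifted to the impaired threshold (60 dB gain), one at 60 dB is mapped to 90 dB (30 dB gain), and one at the threshold of pain receives no gain. This is the level-dependent compression that the equal-ratio condition in Eq. (1) implies.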

Figure 2: Parameters in compression gain computation. [Plot: intensity (dB SPL, 20 to 120) versus frequency (Hz, up to 5512.5), showing the threshold of pain for both listeners (T_pain), the hearing-impaired threshold of hearing (T_im) as approximated with the standard WT and with a fixed-tree WP, and the normal threshold of hearing (T_nor). The quantities Δ, δ, C and Δ*, δ*, C* are marked, together with the standard-WT bands (3,0), (3,1), (2,1), (1,1) and the WP bands (2,0), (3,2), (3,3), (3,4), (3,5), (2,3), whose central frequencies are 689, 1723, 2412, 3101, 3790 and 4823 Hz.]

Wavelet-based methods have been shown to produce results similar to those of previous compensation schemes [3] [4]. Their advantage is that wavelet packets allow a more flexible choice of the frequency bands and, thanks to best-tree searching algorithms, a better matching of the speech waveform. With the Wavelet Packet Transform (WPT) we can achieve good spectral and temporal resolution in arbitrary regions of the time-frequency plane. If the patient has the hearing thresholds shown in figure 1, we could decide, after suitable testing, to use a decomposition tree with more terminal nodes in the 1000-4000 Hz range, where the threshold of hearing changes rapidly. In figure 2 we compare the use of a WP tree with terminal nodes (in frequency order) [(2,0), (3,2), (3,3), (3,4), (3,5), (2,3)] with the use of the standard WT (see figure 3). We mark the central frequency of each of the six bands in the chosen WP tree. The different parameters are evaluated at each of these central frequencies and used for the whole band. Note that with the WP tree the compression gain is computed from a better approximation to the patient's thresholds of hearing than with the standard WT, because the steps of the approximation are smaller in the slope region. Objective evaluation of this kind of WP-based compression gave Articulation Index values similar to those of the standard WT approach. But in subjective testing, with a suitable tree selection for a given patient, both intelligibility and perceived quality were improved [4].
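The band layout behind the central frequencies of figure 2 can be reproduced with a few lines of code. This sketch assumes a sampling rate of 11025 Hz, inferred from the 5512.5 Hz end of the frequency axis (the text rounds it to 11 kHz). It also assumes that node indices are frequency-ordered positions, as the text states for the terminal-node list. The function names are hypothetical.

```python
import math


def band_edges(node, fs):
    """Band [lo, hi) in Hz covered by WP node (i, j).

    Assumes j is the node's frequency-ordered position at level i.
    """
    i, j = node
    width = (fs / 2) / 2 ** i
    return j * width, (j + 1) * width


def band_centers(terminals, fs):
    """Central frequency of each terminal band, in increasing frequency.

    Raises ValueError unless the bands tile [0, fs/2] with no gap or overlap,
    i.e. unless the terminal nodes form a valid decomposition subtree.
    """
    bands = sorted(band_edges(t, fs) for t in terminals)
    edge = 0.0
    for lo, hi in bands:
        if not math.isclose(lo, edge, abs_tol=1e-9):
            raise ValueError("terminal nodes do not tile the frequency axis")
        edge = hi
    if not math.isclose(edge, fs / 2):
        raise ValueError("terminal nodes do not reach fs/2")
    return [(lo + hi) / 2 for lo, hi in bands]


if __name__ == "__main__":
    FS = 11025.0  # assumed sampling rate
    wp_tree = [(2, 0), (3, 2), (3, 3), (3, 4), (3, 5), (2, 3)]
    wt_tree = [(3, 0), (3, 1), (2, 1), (1, 1)]
    print([round(f) for f in band_centers(wp_tree, FS)])
    print([round(f) for f in band_centers(wt_tree, FS)])
```

For the WP tree this reproduces the six central frequencies marked in figure 2 (689, 1723, 2412, 3101, 3790 and 4823 Hz). The tiling check is a cheap way to reject a terminal-node list that does not describe a complete subtree.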

Figure 3: (a) Standard WT tree; (b) a possible WP tree for the example.

We have developed an architecture that computes both the direct and the inverse wavelet packet transform (DWPT/IWPT), with an intermediate stage for coefficient compression. Perfect reconstruction (without compression) is obtained by the use of QMF filters. The architecture has been designed so that it can compute the complete binary tree (see figure 4), as well as any decomposition subtree required by the patient's needs. It is suitable as the core of a hearing aid based on the parametric compression algorithm described above. In the following section we focus on the direct WPT and describe our architecture for computing it. Then, due to lack of space, we summarize the compression and inverse WPT stages, and we finish with some conclusions and future work.

2. AN ARCHITECTURE FOR THE DWPT

The direct discrete wavelet packet transform is often implemented using a tree-structured filter bank (see figure 4) where, at each octave level $j$, an input sequence $w_{j-1,0}(n)$ is fed into the low-pass and high-pass filters $H_0$ and $H_1$, respectively. From a multiresolution point of view, the output of the high-pass filter $H_1$ represents the detail information in the original signal at level $j$, denoted $w_{j,1}(n)$, and the output of the low-pass filter $H_0$ represents the remaining (coarse) information, denoted $w_{j,0}(n)$. Thus, at each resolution level, wavelets decompose the signal into approximation and detail signals at the next level. Experiments carried out using a sampling frequency of 11 kHz and Daubechies coefficients suggested $L = 16$ coefficients and $J = 3$ levels for obtaining good results. These parameters determine the storage and the number of processing elements required in our architecture. Using the notation introduced in figure 4, the samples produced at node $(i, j)$ of the tree are obtained from the outputs of their parent node $(i-1, \lfloor j/2 \rfloor)$ using the following expression:

$$ w_{i,j}(n) = \sum_{k=0}^{L-1} h_{j \% 2}(k)\, w_{i-1,\lfloor j/2 \rfloor}\!\left(n - 2^{i-1}k\right) \qquad (3) $$

where $h_0$ and $h_1$ are the impulse responses of $H_0$ and $H_1$, $n = 2^i l$ with $i, j, l \in \mathbb{Z}$, $0 < i \le J$ and $0 \le j < 2^i$, and $w_{0,0}$ are the input samples.
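Eq. (3) can be checked with a direct, non-optimized software model that indexes every node by the original input time $n$, exactly as the equation does. This is a reference sketch, not the architecture. It uses 2-tap Haar filters for readability instead of the 16-tap Daubechies filters mentioned above, and it assumes zero input before $n = 0$. The function name is hypothetical.

```python
import math


def dwpt_eq3(x, h0, h1, J):
    """Full J-level wavelet packet tree evaluated literally from Eq. (3).

    Returns {(i, j): {n: w_ij(n)}}, where n runs over input time indices that
    are multiples of 2**i. Assumption: input samples before n = 0 are zero.
    """
    filters = (h0, h1)                      # h_{j % 2}
    L = len(h0)
    w = {(0, 0): {n: float(v) for n, v in enumerate(x)}}
    for i in range(1, J + 1):
        for j in range(2 ** i):
            parent = w[(i - 1, j // 2)]     # parent node (i-1, floor(j/2))
            h = filters[j % 2]
            w[(i, j)] = {
                n: sum(h[k] * parent.get(n - 2 ** (i - 1) * k, 0.0) for k in range(L))
                for n in range(0, len(x), 2 ** i)   # n = 2**i * l
            }
    return w


if __name__ == "__main__":
    s = 1 / math.sqrt(2)                    # Haar taps (L = 2), illustration only
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    w = dwpt_eq3(x, [s, s], [s, -s], J=3)
    for i in range(1, 4):
        count = sum(len(w[(i, j)]) for j in range(2 ** i))
        print(f"level {i}: {count} samples")
```

Every level holds as many samples as the input (8 here). Each level halves the number of samples per node but doubles the number of nodes, which is why the per-level workload stays constant.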

Figure 4: A full decomposition 3-level WP tree. [Diagram: the input $w_{0,0}$ (rate $f_s$) is filtered by $H_0$ and $H_1$ and downsampled by 2 at each node, producing $w_{1,0}$, $w_{1,1}$ ($f_s/2$), $w_{2,0}$ to $w_{2,3}$ ($f_s/4$) and $w_{3,0}$ to $w_{3,7}$ ($f_s/8$).]

Figure 5: The word-serial architecture. [Diagram: processing elements PE0, PE1 and PE2 in a pipeline, fed by memories M0, M1 and M2, respectively; the outputs are $w_{3,0}$ to $w_{3,7}$.]

Figure 6: The processing element, including the modification for subtree selection. [Diagram: the input and the filter coefficient $h_{0/1}(i)$ feed a multiplier (*); an adder (+) accumulates into register D; a 0/1 multiplexer selects the output.]

Computation of the DWPT is periodic, with period $M = 8$ when $J = 3$ (i.e. the same set of operations repeats every 8 time indices), so we concentrate on one computing period. Observe in figure 4 that each node at level $j$ computes half as many samples as a node at level $j-1$, but the number of nodes (filters) doubles. As a consequence, the number of samples to be computed in each level during one computing period remains the same, $M$, and the number of samples to be computed during one sampling period is $J M / M = J$. Therefore, for real-time computation we can use $J$ filters and achieve full hardware utilization. Moreover, to obtain a real-time/on-line data flow while retaining the natural multiply-accumulate (MAC) structure of filtering, we use the ideas of the RPA algorithm [5]. By overlapping the computation of the different levels, we schedule each output at the earliest possible instant. We propose a folded word-serial architecture with the following characteristics (see figure 5):

• $J$ processing elements in a pipelined fashion. Each processing element computes one level, implementing both low-pass and high-pass filtering (folded). Processor $P_i$, $0 \le i \le J-1$, computes level $i+1$ of the WPT.
• Each processing element consists of a multiplier and an adder (see figure 6). It works in a word-serial way: computation of a new sample requires $L$ computing cycles, carried out within one sampling period. The first $L-1$ cycles store intermediate results in internal register D, and the last cycle outputs the resulting coefficient.
• Communication between stages is achieved through the memories $M_i$, $0 \le i \le J-1$. Memory $M_i$ is partitioned into $2^i$ logical blocks of size $L$. It stores the intermediate coefficients that processor $P_i$ requires to obtain the samples of the $2^{i+1}$ bands in level $i+1$ of the tree. The total memory requirement is therefore $\sum_{i=0}^{J-1} 2^i L = (2^J - 1)L$.
• Each memory supports simultaneous read and write accesses. As indicated before, computation of a new sample requires $L$ computing cycles: $P_i$ makes $L$ read accesses to $M_i$ (operands) and, in the last cycle, also a write access to $M_{i+1}$ (output). This is controlled by address generation circuits.
• We have scheduled the operations to minimize the lifetime of the samples, that is, the time they remain in memory, and therefore the storage requirements. This implies that processor $P_0$ interleaves the computation of samples $w_{1,0}$ and $w_{1,1}$. Then $P_1$ must interleave the computation of samples $w_{2,0}$, $w_{2,1}$, $w_{2,2}$ and $w_{2,3}$, and so on.
• Selection of a subtree is carried out by means of bypasses, without modifying the address generation circuits. In the final writing cycle, processor $P_i$ outputs one of the data it read from memory $M_i$ instead of the one it was supposed to compute. A study of dependences, such as the one shown in figure 7, is required to select which input data must be bypassed. In this figure we have assumed $L = 4$ for simplicity. The example assumes that node $(2,1)$ does not need to be decomposed into $(3,2)$ and $(3,3)$, so the hardware must replace the $w_{3,2}$ and $w_{3,3}$ samples by $w_{2,1}$ samples. From the figure we can deduce that the data to be bypassed is one of the last two data read, depending on which filter is being applied ($H_0$/$H_1$). This is implemented in the processing element by the multiplexer that selects the final value of the output. Of course, if a node is not to be decomposed, the same applies to all of its descendants.

Figure 7: Dependences for DWPT (example with $L = 4$). Each output sample $w_{3,2}(n)$ and $w_{3,3}(n)$, for $n = -8, 0, 8$, depends on $w_{2,1}(n)$, $w_{2,1}(n-4)$, $w_{2,1}(n-8)$ and $w_{2,1}(n-12)$. We select the diagonal with smaller latency.
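For the parameters chosen in section 2 ($L = 16$, $J = 3$, sampling at about 11 kHz), the quantities above work out as follows. This is our own arithmetic, assuming one memory word per stored coefficient:

$$
\begin{aligned}
\text{total memory} &= \sum_{i=0}^{J-1} 2^i L = (2^J - 1)\,L = 7 \times 16 = 112 \text{ words},\\
\text{samples per sampling period} &= \frac{J\,M}{M} = J = 3 \qquad (M = 8),\\
\text{MAC cycles per PE per second} &\ge L\,f_s \approx 16 \times 11\,000 = 176\,000.
\end{aligned}
$$

The last line follows because each of the $J$ processing elements produces one output per sampling period and spends $L$ cycles on it.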
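The bookkeeping behind subtree selection, i.e. which nodes each processing element must bypass and which terminal node's samples take their place, can be sketched as a small behavioral model. It follows the parent relation of Eq. (3), $(i, j) \to (i-1, \lfloor j/2 \rfloor)$. It does not model which of the last two operands the multiplexer forwards; the paper derives that from the dependence study in figure 7. Both function names are hypothetical.

```python
def terminal_ancestor(node, terminals):
    """Terminal node whose samples replace `node` in a pruned tree.

    The ancestor of (i, j) at level p is (p, j >> (i - p)), i.e. repeated
    application of (i, j) -> (i - 1, j // 2). Returns `node` if it is terminal.
    """
    i, j = node
    for p in range(0, i + 1):               # shallowest terminal on the path wins
        anc = (p, j >> (i - p))
        if anc in terminals:
            return anc
    raise ValueError(f"no terminal node on the path to {node}")


def bypassed_nodes(terminals, J):
    """Nodes (i, j), 1 <= i <= J, lying strictly below some terminal node.

    The PE computing level i forwards parent data instead of computing these,
    so pruning a node automatically prunes all of its descendants.
    """
    return [
        (i, j)
        for i in range(1, J + 1)
        for j in range(2 ** i)
        if any((p, j >> (i - p)) in terminals for p in range(i))
    ]


if __name__ == "__main__":
    # Figure 7 example: node (2,1) is not decomposed; the rest of the
    # 3-level tree is fully decomposed.
    terminals = {(3, 0), (3, 1), (2, 1), (3, 4), (3, 5), (3, 6), (3, 7)}
    print(bypassed_nodes(terminals, 3))
    print(terminal_ancestor((3, 2), terminals))
```

For the figure 7 example this reports $(3,2)$ and $(3,3)$ as bypassed and $(2,1)$ as their source. The address generators stay unchanged; only the output multiplexer setting depends on the chosen tree.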

3. ARCHITECTURES FOR COMPRESSION AND IWPT

To apply the compression gains, we have to convert the coefficient values from a linear to a decibel scale. This operation does not require high precision, so we approximate the function $\log(1+X)$ by the linear function $X \log(2)$, which agrees with it at $X = 0$ and $X = 1$. This solution results in very simple circuitry and small storage [1]. The IWPT architecture is similar to the one proposed for the DWPT: the processing elements, structure and memory requirements are analogous. The same address generation circuits can be reused with small modifications (delays from one level to the next, and an increment of the resulting read address). These modifications are imposed by the corresponding analysis of dependences.
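A minimal sketch of the linear-to-dB conversion with this approximation follows. It writes each magnitude as $2^e(1+X)$ with $0 \le X < 1$ and uses $\log_2(1+X) \approx X$, which is the base-2 form of $\ln(1+X) \approx X \ln 2$ (often called Mitchell's approximation). Taking the exponent/mantissa split from `math.frexp` is our assumption for illustration; a fixed-point datapath would typically obtain it by leading-one detection. Whether [1] uses exactly this scheme is not stated here. The name `db_approx` is hypothetical.

```python
import math

DB_PER_OCTAVE = 20.0 * math.log10(2.0)   # about 6.02 dB per factor of 2


def db_approx(x: float) -> float:
    """Approximate 20*log10(|x|) using log2(1 + X) ~= X for 0 <= X < 1.

    Write |x| = 2**e * (1 + X); then log2|x| ~= e + X.
    """
    if x == 0:
        return float("-inf")
    m, e = math.frexp(abs(x))     # |x| = m * 2**e with 0.5 <= m < 1
    mantissa = 2.0 * m - 1.0      # |x| = (1 + mantissa) * 2**(e - 1)
    return DB_PER_OCTAVE * ((e - 1) + mantissa)


if __name__ == "__main__":
    worst = max(abs(db_approx(v) - 20.0 * math.log10(v))
                for v in (k / 1000.0 for k in range(1, 100001)))
    print(f"worst-case error: {worst:.3f} dB")
```

The approximation is exact at powers of two and always slightly underestimates in between. The largest error occurs near $X \approx 0.44$ and stays below about 0.52 dB, which is consistent with the statement that this conversion does not require high precision.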

4. CONCLUSIONS AND FUTURE WORK

We have designed a pipelined word-serial architecture for the direct and inverse WPT that permits the selection of any subtree. Although many architectures have been proposed for the standard wavelet transform [5], we have not found any proposal for the WPT. The architecture's real-time/on-line processing, small storage requirements and reduced interconnection complexity make it suitable for VLSI implementation. The architecture has been simulated at a functional level to demonstrate its correctness. This design is suitable for any application requiring a flexible choice of the time-frequency (scale) tiling. In particular, we apply it as the core of a hearing-aid device using wavelet-based parametric compression. This work is a step towards the use of wavelet packets with best-tree searching, which allows a better matching of the speech waveform. In [4], a best-tree searching algorithm with a denoising preprocessing stage was shown to improve perceived quality in subjective tests, and intelligibility in both subjective tests and Articulation Index (AI) measurements.

5. REFERENCES

[1] J. A. Hidalgo, A. Daza, O. Oballe, J. C. Tejero, A. Gago, "A Microelectronic Core for a Programmable Digital Hearing Aid," in Proc. DCIS, 1997.
[2] J. C. Rutledge and M. A. Clements, "Compensation for Recruitment of Loudness in Sensorineural Hearing Impairments Using a Sinusoidal Model of Speech," in Proc. IEEE ICASSP, 1991, pp. 3641-3644.
[3] L. A. Drake and J. C. Rutledge, "Wavelet Analysis in Recruitment of Loudness Compensation," IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3306-3312, 1993.
[4] M. A. Trenas, J. C. Rutledge, N. A. Whitmal III, "Wavelet-Based Speech Enhancement for Hearing Aids," to appear in Proc. EMBEC, 1999.
[5] C. Chakrabarti, M. Vishwanath, R. M. Owens, "Architectures for Wavelet Transforms: A Survey," Journal of VLSI Signal Processing, vol. 14, pp. 171-192, 1996.
