An adaptive cochlear model for speech recognition.


Eliathamby Ambikairajah and Liam Kilmartin

Dept. of Electronic Engineering, Regional Technical College,

Athlone, Ireland.

ABSTRACT

This paper describes an adaptive cochlear model, in which the basilar membrane is modelled as a cascade of 128 digital filters, covering the frequency band from 70 Hz to 3.4 kHz. The output of the inner hair cell of each filter is used to vary the coefficients of that filter so that its Q-factor is modified. Thus, for a low-amplitude stimulus, the Q-factor is increased, while it is decreased for a high-amplitude stimulus. This modification takes place at a rate which simulates the continuous adaptation of the basilar membrane. Results are presented for sinusoidal stimuli of different amplitudes, and also for speech input signals.

Keywords: auditory modelling, speech processing.

1. INTRODUCTION

A number of passive auditory models have been proposed, which consist of cascade/parallel linear filtering stages, followed by non-linear stages (Lyon, 1982; Ambikairajah et al., 1989; Seneff, 1986). On the other hand, several researchers have proposed models which include active basilar membrane vibration (Davis, 1983; Neely and Kim, 1983); however, these models require considerable computation. Some researchers have attempted to overcome the computational burden by using analog hardware models (Zwicker, 1986; Lyon and Mead, 1988). A computational cochlear model consisting of cascaded filters was proposed by Hirahara and Komakine (1989), with linear filters (similar to Lyon (1982)) followed by adaptive-Q filters. A less computationally-intensive cochlear model with switching Q-factors was presented in Ambikairajah and Jones (1990).

In this paper, a variation of the model proposed by Ambikairajah and Jones is presented. The basilar membrane is modelled as a cascade of 128 digital filters, each of which has a different resonant frequency in the range 70 Hz to 3.4 kHz. The output (membrane displacement) of each section is transduced into electrical energy by an inner hair cell model. The level of the inner hair cell output of each filter is used to control the Q-factor of that filter in such a manner that it can vary between specified limits. If a frequency component of the input stimulus is of low amplitude, the Q-factor of the filter corresponding to that component is increased, thus boosting the filter's output. Conversely, if the frequency component is of high amplitude, the Q-factor is decreased. This behaviour attempts to mimic the properties of the mammalian cochlea. In the model presented here, the basilar membrane is represented by a cascade of non-linear filters, rather than linear filters followed by an adaptive-Q stage (Hirahara and Komakine, 1989). The model operates at a sampling frequency of 8 kHz. It has been tested with sinusoidal stimuli of different amplitudes, and also speech signals. When tested with speech signals, the model amplified the lower-amplitude frequency components in the stimulus.

2. THE PASSIVE COCHLEAR MODEL

The model used for the passive cochlea is that developed by Ambikairajah et al. (1989), in which the basilar membrane is modelled as a cascade of 128 digital filters (Fig. 1). The input stimulus travels down along the cascade of digital filters, and the pressure input to each filter is converted into mechanical displacement of that filter. The pressure transfer function for a single filter is:

$$\frac{V_o(z)}{V_i(z)} = K \cdot \frac{1 - a_0}{1 - a_0 z^{-1}} \cdot \frac{1 - b_1 + b_2}{1 - b_1 z^{-1} + b_2 z^{-2}} \cdot \frac{1 - a_1 z^{-1} + a_2 z^{-2}}{1 - a_1 + a_2} \qquad (1)$$

where $V_i$ is the pressure input to the filter, $V_o$ is the pressure output from the filter, $K$ is a gain factor, and $a_0$, $a_1$, $a_2$, $b_1$ and $b_2$ are the digital filter coefficients. The membrane displacement transfer function is given by:

$$\frac{V_m(z)}{V_i(z)} = K \cdot \frac{1 - a_0}{1 - a_0 z^{-1}} \cdot \frac{(1 - b_1 + b_2)\, z^{-1}}{1 - b_1 z^{-1} + b_2 z^{-2}} \qquad (2)$$

where $V_m$ is the membrane displacement. The model of the inner hair cell used in the present work is a capacitor model, in which the input voltage corresponds to the spatially differentiated displacement of the basilar membrane.
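As an illustration of equations (1) and (2), the following Python sketch evaluates the pressure and displacement transfer functions of a single section on the unit circle. It is a minimal sketch rather than the authors' implementation: the function name `section_response` and all coefficient values are hypothetical, and the $z^{-1}$ factor in the displacement numerator follows the reading of equation (2) assumed here.

```python
import cmath
import math


def section_response(freq_hz, fs, K, a0, a1, a2, b1, b2):
    """Complex gains (pressure, displacement) of one filter section at freq_hz."""
    zi = cmath.exp(-2j * math.pi * freq_hz / fs)  # z^-1 evaluated on the unit circle
    lowpass = (1 - a0) / (1 - a0 * zi)
    resonator = (1 - b1 + b2) / (1 - b1 * zi + b2 * zi * zi)
    notch = (1 - a1 * zi + a2 * zi * zi) / (1 - a1 + a2)
    pressure = K * lowpass * resonator * notch   # equation (1)
    displacement = K * lowpass * resonator * zi  # equation (2), assumed z^-1 numerator
    return pressure, displacement


# Hypothetical coefficients: pole pair near 1 kHz, zero pair near 1.2 kHz, fs = 8 kHz.
FS = 8000.0
coeffs = dict(
    K=1.0,
    a0=0.5,
    a1=2 * math.sqrt(0.8) * math.cos(2 * math.pi * 1200.0 / FS),
    a2=0.8,
    b1=2 * math.sqrt(0.9) * math.cos(2 * math.pi * 1000.0 / FS),
    b2=0.9,
)

p_dc, d_dc = section_response(0.0, FS, **coeffs)
p_res, d_res = section_response(1000.0, FS, **coeffs)
print(abs(p_dc), abs(d_dc))  # every factor is normalised at DC, so both gains equal K
print(abs(d_res))            # displacement gain near the pole frequency
```

Because each factor is divided by its own value at $z = 1$, the gain factor $K$ alone sets the low-frequency gain of a section, which keeps the cascade of 128 sections from accumulating gain errors.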

[Figure 1: INPUT SIGNAL applied at the BASE of a cascade of filters (FILTER 1 ... FILTER i ...) running to the APEX; the MEMBRANE DISPLACEMENT of each filter drives an INNER HAIR CELL (1 to 128), which produces an ELECTRICAL SIGNAL.]

Fig. 1. Block diagram of the passive cochlear model.

3. THE ADAPTIVE COCHLEAR MODEL

The adaptive cochlear model is of the same form as the passive model, except that the Q-factors of the digital filters are varied according to the level of the inner hair cell output energy (averaged over a certain time interval). If the (averaged) output of the inner hair cell of a certain filter is low, the Q-factor of that filter is increased; if the output is high, the Q-factor is decreased. The Q-factor of any filter is varied by changing the digital filter coefficients $a_1$, $a_2$, $b_1$ and $b_2$ (see equations (1) and (2)). The Q-factors vary between two fixed levels; for the results presented here, these two levels are separated by 20 dB. Thus, if the inner hair cell energy becomes very small, the Q-factor saturates at the upper level (high-Q), while if it becomes very large, the Q-factor saturates at the lower level (low-Q). The Q-factors are varied by changing the coefficients $a_2$ and $b_2$ in a linear manner between certain pre-calculated levels (see Fig. 2). The relationship between the pole Q-factor, $Q_p$, and the coefficient $b_2$ is as follows:

$$Q_p = -\frac{2\pi f_p}{f_s} \cdot \frac{1}{\ln(b_2)}$$

where $f_s$ is the sampling frequency and $f_p$ is the pole frequency. A similar relationship holds between the zero Q-factor and $a_2$. The coefficients $a_1$ and $b_1$ are re-calculated from the new values of $a_2$ and $b_2$, in order to preserve the centre frequency of the filter.

The manner in which the coefficients $a_2$ and $b_2$ vary is described (for coefficient $a_2$) by the following equations:

$$a_2 = \begin{cases} a_2^{HQ}, & P_{ihc} < P_{min} \\[4pt] \dfrac{a_2^{HQ} - a_2^{LQ}}{P_{min} - P_{max}}\,(P_{ihc} - P_{max}) + a_2^{LQ}, & P_{min} \le P_{ihc} \le P_{max} \\[4pt] a_2^{LQ}, & P_{ihc} > P_{max} \end{cases}$$

where

$a_2^{LQ}$ is the value of the $a_2$ coefficient for the low-Q state;

$a_2^{HQ}$ is the value of the $a_2$ coefficient for the high-Q state.

A similar equation holds for the adaptation of the $b_2$ coefficient. The inner hair cell output energy is averaged over the time interval between successive adaptations of the coefficients; this is the value $P_{ihc}$. The values of $P_{max}$ and $P_{min}$ for each filter are estimated as follows. A high-amplitude sinewave, of frequency equal to the centre frequency of the filter, is applied to the model with all of the filters fixed in the low-Q state. The value of the inner hair cell output energy after a steady state has been reached is given the symbol $e$. Then, the values of $P_{min}$ and $P_{max}$ are given by:

$$P_{min} = k_1 \cdot e, \qquad P_{max} = k_2 \cdot e$$

where $k_1$ and $k_2$ are constants, with $k_2 > k_1$ and both $k_1$ and $k_2$ between 0 and 1. This procedure is carried out for each of the 128 filters in the model. The adaptation algorithm for the ith filter is shown diagrammatically in Fig. 3.

[Figure 3: PRESSURE enters and leaves FILTER i; its MEMBRANE DISPLACEMENT drives INNER HAIR CELL i, whose ELECTRICAL SIGNAL (inner hair cell output energy) passes through an AVERAGER to give $P_{ihc}$; the ADAPTATION block returns a NEW SET OF COEFFICIENTS ($a_1$, $a_2$, $b_1$, $b_2$) to the filter.]

Fig. 3. Block diagram of the ith section of the cochlear model, including the adaptation mechanism.
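The link between the pole Q-factor and $b_2$, and the re-calculation of $b_1$ that keeps the centre frequency fixed, can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' code: it assumes the usual two-pole resonator approximation (pole radius $\sqrt{b_2}$, pole angle $2\pi f_p / f_s$), it assumes that a 20 dB separation means a factor of ten in Q, and the function names and the low-Q value are hypothetical.

```python
import math


def q_from_b2(b2, f_pole, fs):
    """Pole Q-factor for a given b2 (assumed two-pole resonator approximation)."""
    return -2.0 * math.pi * f_pole / (fs * math.log(b2))


def b2_from_q(q, f_pole, fs):
    """Inverse of q_from_b2: the b2 value that yields pole Q-factor q."""
    return math.exp(-2.0 * math.pi * f_pole / (fs * q))


def b1_from_b2(b2, f_pole, fs):
    """Re-calculate b1 so that the pole angle (centre frequency) is unchanged."""
    return 2.0 * math.sqrt(b2) * math.cos(2.0 * math.pi * f_pole / fs)


FS = 8000.0            # sampling frequency used by the model
F_POLE = 1000.0        # hypothetical pole frequency of one section
Q_LOW = 2.0            # hypothetical low-Q value
Q_HIGH = 10.0 * Q_LOW  # assumes 20 dB corresponds to a factor of ten in Q

b2_lq = b2_from_q(Q_LOW, F_POLE, FS)
b2_hq = b2_from_q(Q_HIGH, F_POLE, FS)
b1_lq = b1_from_b2(b2_lq, F_POLE, FS)
b1_hq = b1_from_b2(b2_hq, F_POLE, FS)
print(b2_lq, b1_lq)  # low-Q coefficient pair
print(b2_hq, b1_hq)  # high-Q coefficient pair
```

Pre-computing the low-Q and high-Q coefficient pairs once per filter means that the run-time adaptation only has to interpolate between stored values.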

Fig. 2. Relationship between inner hair cell output energy and filter coefficients (shown for the $a_2$ coefficient).
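The piecewise-linear characteristic of Fig. 2 and the estimation of $P_{min}$ and $P_{max}$ can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the values of $k_1$ and $k_2$, and the coefficient levels are hypothetical, and the averaged energy is assumed to be the mean of the squared inner hair cell output samples.

```python
def average_energy(samples):
    """Mean squared value over one adaptation interval (assumed energy measure)."""
    return sum(x * x for x in samples) / len(samples)


def thresholds(e_steady, k1=0.1, k2=0.5):
    """P_min and P_max from the steady-state low-Q energy e (k1, k2 hypothetical)."""
    if not 0.0 < k1 < k2 < 1.0:
        raise ValueError("require 0 < k1 < k2 < 1")
    return k1 * e_steady, k2 * e_steady


def adapt_coefficient(p_ihc, p_min, p_max, c_hq, c_lq):
    """Coefficient value for averaged energy p_ihc, saturating at both ends."""
    if p_ihc < p_min:
        return c_hq  # weak stimulus: high-Q state
    if p_ihc > p_max:
        return c_lq  # strong stimulus: low-Q state
    slope = (c_hq - c_lq) / (p_min - p_max)
    return slope * (p_ihc - p_max) + c_lq


p_min, p_max = thresholds(2.0)
p_ihc = average_energy([0.6, -0.8, 0.6, -0.8])  # hypothetical hair cell samples
mid = adapt_coefficient(0.6, p_min, p_max, c_hq=0.95, c_lq=0.80)
print(p_min, p_max, p_ihc, mid)
```

The same function serves for both $a_2$ and $b_2$, since the text states that a similar equation holds for each; only the stored high-Q and low-Q levels differ.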

4. RESULTS OF THE SIMULATION OF THE ADAPTIVE COCHLEAR MODEL

To demonstrate the behaviour of the adaptive cochlear model, a stimulus consisting of two sinusoids, of different amplitudes in the ratio of 100:1, was applied. Figures 4(a) and (b) show the response of the adaptive model to this stimulus, after the model has reached steady state. Fig. 4(a) shows the inner hair cell output energy and Fig. 4(b) the Q-factors of the filters at one particular time instant. Initially, every filter is in the high-Q state. The sinusoid corresponding to filter 29 (1.79 kHz) is of high amplitude, while that corresponding to filter 73 (522 Hz) is of low amplitude. Thus, at steady state, filter 29 saturates in the low-Q state; however, filter 73 is not in the low-Q state, because the 522 Hz component is of low amplitude. The $a_2$ and $b_2$ coefficients of filter 73 are still on the linear part of the characteristic shown in Fig. 2 (the Q-factor has not saturated). The response of the adaptive model to the 522 Hz component is boosted relative to the response of the passive model.

Fig. 4(a) Inner hair cell output energy vs. filter number for the adaptive-Q model.

Fig. 4(b) Plot of Q-factor vs. filter number for the adaptive-Q model.

The model was also tested using band-pass filtered (300 Hz to 3.4 kHz) speech samples as input. Figure 5(b) shows a pseudo-spectrogram of the inner hair cell output energy of the passive (low-Q) model, for the utterance "one". The inner hair cell output was smoothed before the pseudo-spectrogram was taken. The pseudo-spectrogram plots the trajectory of each component in the (smoothed) inner hair cell response whose energy value is above a certain threshold. Figure 5(a) shows a pseudo-spectrogram, for the same utterance, of the inner hair cell response of the adaptive-Q model. In this case, additional trajectories are visible because low-amplitude frequency components have been boosted by the adaptive model.

[Figure 5: panels (a) and (b), each titled INNER HAIR CELL OUTPUTS, with FRAME NUMBER on the horizontal axis.]

Fig. 5. Pseudo-spectrograms of the responses of the adaptive (Fig. 5(a)) and passive (Fig. 5(b)) cochlear models.

5. CONCLUSIONS

A computational adaptive cochlear model has been proposed in this paper. Pseudo-spectrograms obtained from both the adaptive model and a passive (low-Q) model have been compared. The spectrogram for the adaptive model indicates that low-amplitude frequency components are boosted. As the frequency components are represented more adequately in the output of the adaptive model, such a model could prove more useful for speech recognition purposes than the more traditional passive cochlear model.

ACKNOWLEDGEMENT

This research has been funded by EOLAS, the Irish Science and Technology Agency, under the Scientific Research Programme, 1989 - 1991.

REFERENCES

Ambikairajah, E., Black, N. and Linggard, R. (1989). "Digital filter simulation of the basilar membrane", Computer Speech and Language 3, 105-118.

Ambikairajah, E. and Jones, E. (1990). "An active cochlear model for speech recognition", Proc. Third Australian International Conference on Speech Science and Technology, Melbourne, 130-135.

Davis, H. (1983). "An active process in cochlear mechanics", Hearing Research 9, 79-90.

Hirahara, T. and Komakine, T. (1989). "A computational cochlear nonlinear preprocessing model with adaptive Q circuits", Proc. IEEE ICASSP, 496-499.

Lyon, R. (1982). "A computational model of filtering, compression and detection in the cochlea", Proc. IEEE ICASSP, 1282-1285.

Lyon, R. and Mead, C. (1988). "A CMOS VLSI cochlea", Proc. IEEE ICASSP, 2172-2175.

Neely, S. T. and Kim, D. O. (1983). "An active cochlear model showing sharp tuning and high sensitivity", Hearing Research 9, 123-130.

Seneff, S. (1986). "A computational model for the peripheral auditory system", Proc. IEEE ICASSP, 37.8.1-37.8.4.

Zwicker, E. (1986). "A hardware cochlear nonlinear preprocessing model with active feedback", J. Acoust. Soc. Am. 80, 146-153.
