Binaural application of microphone arrays for improved speech intelligibility in noise
Draft thesis (concept-proefschrift)
I.L.D.M. Merks

Laboratory of Acoustical Imaging and Sound Control
Department of Applied Sciences
Delft University of Technology

Delft, August 1999


Binaural application of microphone arrays for improved speech intelligibility in noise
Binaurale toepassing van microfoonarrays voor verbeterde spraakverstaanbaarheid in lawaai

This draft thesis has been approved by the supervisor (promotor): Prof. dr. ir. A.J. Berkhout

Author: ir. I.L.D.M. Merks
Laboratory of Acoustical Imaging and Sound Control
Faculty of Applied Sciences
Delft University of Technology
Lorentzweg 1
2628 CJ Delft
The Netherlands
tel: +31-15-2782021
fax: +31-15-2783251
e-mail: [email protected]

Contents

Summary

1  Introduction
   1.1  Problem description
   1.2  State of affairs at the start of this research
   1.3  Objective of this thesis

2  Hearing
   2.1  Localization of sound
   2.2  Speech intelligibility in noisy environments
   2.3  Hearing aid technology
   2.4  Summary

3  Array technology
   3.1  One-dimensional receiver arrays
   3.2  Targets and preliminary choices
   3.3  Microphone array geometry
   3.4  Measures in array technology
   3.5  Focusing and beamforming methods
   3.6  Application of head-sized microphone arrays
   3.7  Summary and application of array processing in this thesis

4  Binaural broadside array
   4.1  Double beamsteering method
   4.2  Preliminary perceptual evaluation of the double beamsteering method
   4.3  Implementation
   4.4  Measurements
   4.5  Conclusions

5  Binaural endfire array
   5.1  Broadband application of the gradient method
   5.2  Implementation
   5.3  Measurements
   5.4  Conclusions

6  Perceptual evaluation of the binaural arrays
   6.1  Localization test
   6.2  Speech intelligibility test
   6.3  Conclusions

7  Optimization of the directivity of the endfire array
   7.1  Introduction
   7.2  Optimization on the basis of recursive filters
   7.3  Optimization on the basis of non-recursive filters
   7.4  Directivity of the endfire array mounted on a KEMAR manikin
   7.5  Conclusions

8  Evaluation of the optimized endfire array
   8.1  Experimental set-up of an artificial diffuse noise field
   8.2  Physical measurements in the artificial diffuse noise field
   8.3  Speech intelligibility test with normal hearing subjects
   8.4  Speech intelligibility test with hearing impaired subjects
   8.5  Conclusions

9  Conclusions

Appendix A  Optimal beamforming
   A.1  Derivation of filters for the optimal beamforming method
   A.2  Taylor approximation of optimal filters for a two microphone endfire array

Appendix B  Results of the perceptual evaluations
   B.1  Localization tests
   B.2  Speech intelligibility test with one interfering noise source

References
List of symbols, abbreviations, and definitions
Samenvatting (summary in Dutch)
Curriculum vitae
Dankwoord (acknowledgements)

Summary

Binaural application of microphone arrays for improved speech intelligibility in noise

This thesis describes the design and evaluation of microphone arrays for the improvement of speech intelligibility in noise. As previous research at the Laboratory of Acoustical Imaging and Sound Control has shown, microphone arrays are very capable of improving speech intelligibility in noise. This thesis improves the recently developed microphone arrays following a two-fold strategy.

1. Miniaturization of the array. The microphone array has been designed such that it can be integrated in a cosmetically good-looking pair of spectacles. This implies that the length of the array is restricted to 14.0 cm for a broadside array and to 7.2 cm for an endfire array. Furthermore, the applied microphones are the smallest available omnidirectional microphones. The array has been designed as a front-end for a normal hearing aid.

2. Optimization of the quality of the array.
   I. Binaural hearing: the microphone array has a binaural output to enable the localization of sound sources and to increase the speech intelligibility in noise.
   II. Optimal directivity: the microphone array has an optimal directivity pattern to maximize the speech intelligibility in noisy environments.

Two arrays which enable binaural hearing have been designed. A binaural broadside array uses a double beamsteering method to realize a binaural output. This method realizes two main beams: one beam is steered slightly to the left and one beam is steered slightly to the right. These two beams realize an Interaural Level Difference. A binaural endfire array consists of an endfire array in each arm of the pair of spectacles. Each endfire array applies the gradient method such that broadband directivity is obtained. The binaural endfire array realizes an Interaural Time Difference.

The binaural broadside and endfire arrays have been implemented in analogue electronics and the measurements show good agreement with the simulations. Furthermore, both arrays have been evaluated in a localization and a speech intelligibility test. The localization test showed that localization is possible with both arrays, but subjects using the binaural endfire array localize with a much smaller localization error and find the localization task much easier. The speech intelligibility test with one noise source at several angles showed that the subjects' speech intelligibility improves considerably with the binaural endfire array in comparison with the diotic endfire array. The tests with the binaural broadside array showed a deterioration in comparison with the diotic broadside array. Hence, the binaural endfire array provides an Interaural Time Difference which is a useful binaural cue: it enables localization of sound sources and improves the speech intelligibility in noise.

Subsequently, the directivity of this endfire array with four omnidirectional microphones is optimized for recursive filters and for non-recursive filters. The optimization with the recursive filters showed that the directivity of the endfire array can be further increased with only a slightly more complex signal processing scheme. The non-recursive filters are digital Finite Impulse Response filters and they realize the optimal directivity. Given the progress in digital hearing aids, it is expected that complex digital filters will be implementable in hearing aids and assistive listening devices in the near future; therefore this optimized endfire array with the digital filters is used in the final evaluation.
The binaural endfire array with the digital filters has been evaluated in an artificial diffuse noise field which was set up in the Audiological Center of the Dijkzigt Hospital Rotterdam. Firstly, an objective evaluation showed that the array mounted on a KEMAR manikin attenuates the diffuse noise field by 7.7 dB (intelligibility-weighted average). Secondly, the speech intelligibility in noise has been evaluated with normal hearing and hearing impaired subjects. The Signal-to-Noise Ratios of subjects wearing the binaural endfire array have been compared with those of subjects wearing binaural hearing aids; these comparisons show large improvements: 7.5 dB for the normal hearing subjects and 6.2 dB for the hearing impaired subjects. The latter improvement restores the speech intelligibility in noise of most hearing impaired subjects.

In conclusion, this thesis has shown that it is possible to design an array which is cosmetically acceptable as an assistive listening device and which has good audiological properties as well. This binaural endfire array enables binaural hearing, and thereby it improves the speech intelligibility and enables the localization of sound sources. Moreover, the optimal directivity of the array results in very good speech intelligibility in noisy environments.

1 Introduction

This thesis describes the design and the evaluation of a microphone array for the improvement of speech intelligibility in noisy environments. This chapter describes the problem of speech intelligibility in noisy environments, it establishes the state of affairs at the start of this research and it explains the objective and the set-up of this thesis.

1.1 Problem description

Many people have great difficulty understanding speech in surroundings with background noise and/or reverberation. Eleven percent of the entire Dutch population has difficulty following a conversation in a group of more than 3 persons (Chorus et al., 1995). This is a larger part of the population than the hearing impaired part, which is about 5%. So it is not only hearing impaired people who face this problem, but also normal hearing people. This problem cannot be resolved by an ordinary hearing aid, because a hearing aid amplifies all sound and does not discriminate between desired sound (speech) and undesired sound. Moreover, amplification is unnecessary for many people: they can hear the sounds, but they cannot understand the desired speech. Hence, they need a better Signal-to-Noise Ratio (SNR) to understand the desired speech. Many assistive listening devices have been designed to address this problem and they use different techniques. An overview of these techniques is given in the next section.
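Since the SNR is the central quantity throughout this thesis, the following small sketch (added here as an illustration; the function name and test signals are ours, not the thesis's) shows how an SNR in decibels is computed from separate speech and noise waveforms:

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-Noise Ratio in decibels from two waveforms."""
    p_signal = np.mean(np.asarray(signal, dtype=float) ** 2)  # mean power of the speech
    p_noise = np.mean(np.asarray(noise, dtype=float) ** 2)    # mean power of the noise
    return 10.0 * np.log10(p_signal / p_noise)

# A noise waveform with twice the amplitude of the signal gives about -6 dB SNR.
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
speech = np.sin(2 * np.pi * 440 * t)       # stand-in for a speech signal
noise = 2.0 * np.sin(2 * np.pi * 100 * t)  # stand-in for background noise
print(round(snr_db(speech, noise), 1))     # -> -6.0
```

A device that raises this number by a few dB can make the difference between understanding and not understanding the desired speaker.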

1.2 State of affairs at the start of this research

A lot of research has been done to improve speech intelligibility in noise. This has led to different techniques. This section briefly describes the most important techniques and refers to


the literature for details. Most techniques are integrated in hearing aids and they can be divided into single microphone and multi-microphone techniques. The single microphone techniques use differences between the frequency spectra and modulations of the desired and undesired signals to enhance the desired signal; the multi-microphone techniques use differences between microphone signals.

The simplest single microphone technique consists of a high-pass filter: when the background noise has mainly low frequency components, it can be reduced by the high-pass filter. In many early hearing aids a manually adjustable high-pass filter was used for this purpose (Levitt et al., 1994). A more sophisticated approach is short-term Wiener filtering, which adapts the filter to the changing statistical properties of the speech and noise. Tests where the statistical properties of the signals were measured separately showed some improvement for hearing impaired subjects. Implementation of short-term Wiener filtering in practice is likely to be difficult since it is necessary to know both the speech and noise spectra in order to derive the appropriate filter (Levitt et al., 1994). A recent technique calculates the modulation of the signal in four frequency bands. When there is a high modulation in a band, the signal is assumed to be speech and the band is passed; otherwise the signal is assumed to be noise and the band is attenuated (Siemens, 1998). This technique is useful in situations where the noise is a continuous random signal, for example in a car. Several other single microphone techniques have been developed, but they hardly improve the speech intelligibility. In general, the success of the single microphone techniques is limited, because often the desired signal as well as the noise signals are speech signals and thus they have the same frequency spectrum and the same time structure.
Furthermore, these techniques need the original signal-to-noise ratio to be high in order to improve the speech intelligibility.

The multi-microphone techniques have different origins. They can be divided into binaural techniques and general signal processing techniques. Binaural techniques make use of physiological and psycho-acoustical models of the human perception of sound. General signal processing techniques are used in several fields of science and technological applications. The simplest binaural technique is the use of binaural hearing aids. Binaural fitting of hearing aids is widely recommended (Stach, 1998). Benefits from binaural hearing aids include enhancements in:
• audibility of speech originating from different directions;
• localization of sound;
• speech intelligibility in noisy environments.
This improved speech intelligibility due to binaural hearing is also described by the so-called


cocktail party effect (Blauert, 1983). At a cocktail party, several people are engaged in lively conversation in the same room. A listener is nonetheless able to focus attention on one speaker amidst the din of voices, even without turning toward the speaker. But if the listener plugs one ear, the speaker becomes much more difficult to understand.

An extension of binaural hearing is the use of the binaural input signals to enhance the speech signal (Bodden, 1993, 1996; Kollmeier et al., 1993; Kollmeier and Koch, 1994). Bodden has simulated the cocktail party effect electronically: the algorithm produces simulations of neural excitation patterns, including the spatial distribution of neural excitation. Further analysis is then rendered by a model of more central stages of the signal processing in the auditory system. As a result it is possible to predict the azimuth of sound incidence with respect to the listener's head. Further, the algorithm estimates parameters of the incoming signals specified by their respective azimuths. These parameters are then used to control the transfer function of a Wiener filter, so as to enhance one desired signal out of the spatial distribution of concurrent signals. Significant increases in comprehensibility were found with one or two competing speakers in an anechoic chamber. Kollmeier et al. (1993) use a directional and a dereverberation algorithm to increase speech intelligibility. The directional algorithm compares the interaural level and phase differences with the reference interaural level and phase differences resulting from a sound incident from the target direction. The deviation between actual and reference values serves as the input for a weighting function, which yields no attenuation of the respective frequency component if the deviation is small and an attenuation of up to 20 decibels (dB) if the deviation is large.
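As a rough sketch of such deviation-based weighting (the tolerance, ramp slope, and function name below are illustrative assumptions of ours, not Kollmeier's published parameters):

```python
def directional_weight(ild_db, ild_ref_db=0.0, tol_db=3.0, max_atten_db=20.0):
    """Attenuation in dB for one frequency component, driven by how far its
    interaural level difference (ILD) deviates from the reference ILD of the
    target direction: no attenuation for small deviations, up to 20 dB for
    large ones. The tolerance and linear ramp are illustrative choices."""
    deviation = abs(ild_db - ild_ref_db)
    if deviation <= tol_db:
        return 0.0
    return min(max_atten_db, 4.0 * (deviation - tol_db))

print(directional_weight(1.0))   # near-target component -> 0.0 dB attenuation
print(directional_weight(12.0))  # off-target component  -> 20.0 dB attenuation
```

The same weighting structure can be driven by interaural phase deviation or, as in the dereverberation algorithm described next, by interaural coherence.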
The dereverberation algorithm calculates the interaural coherence, and this coherence serves as the input for the weighting function in a similar way as the deviation does in the directional algorithm. The algorithms have been shown to be beneficial when the original SNR is intermediate or high. This means that especially subjects with more severe hearing loss benefit. Kollmeier and Koch (1994) have implemented a more sophisticated algorithm which utilizes the modulation spectra in each frequency band to reconstruct the spectrum of the target speaker in an off-line implementation. In general, the algorithms are beneficial when there is only a small number of competing sources and when the original SNR is intermediate or high. The drawbacks of these algorithms are the computational load and the need for a link between the two hearing aids.

An example of a general signal processing technique for noise suppression is the adaptive noise canceller (Widrow and Stearns, 1985). Weiss (1987) uses an adaptive noise canceller with two microphones. One microphone signal consists of the speech + noise signal; the other contains only the noise signal. An adaptive filter is used to remove all correlation between the speech + noise signal and the noise signal, so that, ideally, the speech signal remains. In practice, it is very difficult to obtain a noise signal which does not contain the speech signal. The adaptive noise canceller yields good results in rooms with little reverberation or a small number of noise sources, but the results deteriorate under true cocktail-party conditions, because the noise signal contains too much speech signal, hence the speech signal is cancelled as well.

A more thorough approach is the use of a microphone array, which is a spatially distributed set of microphones. Array processing is used for directional signal reception or generation and it is applied in many fields like sonar, radar, and seismics (Haykin, 1985; Skolnik, 1970; Berkhout, 1987). An example of array processing in acoustics is given by Boone (1987). It illustrates that array technology can be used to enhance the signal from one single source while attenuating those from other sources. Figure 1.1 illustrates the principle.

[Figure 1.1: Illustration of the application of time-shifts to the receiver signals from a single source (from Boone (1987)). The panels show the source and the microphone array, the received signals, the applied time-shifts (∆t), and the received signals after time-shifting.]

A source radiates sound which arrives at the microphone array. The received signals show that the sound has different travel times from source to array due to path differences. These different travel times are corrected by applying time-shifts to the signals. After the time-shifting, the signals of the microphones are in phase and a summation of these signals is constructive. The applied correction for the travel time is only correct for that single source position (target). For other source positions, the time-shifted signals are not in phase and in that case the summation of the signals is destructive. Thus, the array enhances the signal of one source while attenuating other source signals. This kind of signal processing is also called focusing or delay-and-sum processing.

Focusing or delay-and-sum processing is not the only signal processing method which can be used to achieve directional signal reception or directivity with microphone arrays. In general, three signal processing methods are used. Here, they are described briefly.

1. The focusing or delay-and-sum method is described above. This method has two advantages. Firstly, the signal processing is simple. Secondly, the signal processing improves the ratio between the target signal and the internal noise of the microphones (e.g. electronic noise). This ratio is called the internal SNR. The disadvantage is that the length of the microphone array has to be long with respect to the wavelengths in order to realize directivity.

2. The gradient method uses the gradient of the pressure of the sound field. For example, the first order gradient is calculated as the difference of two microphone signals. The gradient signal is proportional to the particle velocity of the sound field in the direction collinear with the array of the two microphones. The pressure and the velocity can be combined to realize directivity (Olson, 1979). The distance between the microphones has to be small with respect to the wavelength, so it is possible to achieve directivity with relatively small microphone arrays. A disadvantage of this method is the degradation of the internal SNR due to the subtraction of the microphone signals.

3. The optimal beamforming method processes the microphone signals such that the energy of the output signal is minimized under the constraint that the amplification of the signal coming from the target direction is unity (Cox et al., 1986). This technique achieves the highest directivity, but it has two disadvantages: the processing of the microphone signals requires a lot of computational power and it degrades the internal SNR.

The choice of signal processing method depends on the application of the microphone array. This thesis describes the application of a microphone array to enhance a speech signal in a noisy environment, like a cocktail party. In the case of a cocktail party, one is interested in the speech of one speaker and other noise sources have to be suppressed. Array processing is very suitable for this purpose. This has been demonstrated at our laboratory by an earlier Ph.D. study (Soede, 1990; Soede et al., 1993a, 1993b; Bilsen et al., 1993). The approach and results of that study are briefly summarized here. Soede developed microphone arrays which were mounted on a pair of spectacles.
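The delay-and-sum method can be sketched numerically as follows (a simplified illustration with integer-sample shifts and wrap-around delays; the variable names are ours, and a real design would use fractional-delay filters):

```python
import numpy as np

C = 343.0  # speed of sound in m/s

def delay_and_sum(mic_signals, mic_positions, target_dir, fs):
    """Minimal delay-and-sum beamformer for a far-field target.

    mic_signals:   (n_mics, n_samples) microphone waveforms
    mic_positions: (n_mics, 3) microphone positions in metres
    target_dir:    unit vector pointing from the array towards the target
    fs:            sample rate in Hz
    """
    sig = np.asarray(mic_signals, dtype=float)
    pos = np.asarray(mic_positions, dtype=float)
    d = np.asarray(target_dir, dtype=float)
    # relative arrival time at each microphone (earlier for mics nearer the target)
    arrival = -(pos @ d) / C
    # delay the earlier signals so that the target components line up in time
    shifts = np.round((arrival.max() - arrival) * fs).astype(int)
    out = np.zeros(sig.shape[1])
    for s, n in zip(sig, shifts):
        out += np.roll(s, n)  # integer-sample delay; np.roll wraps, fine for a sketch
    return out / len(sig)

# Two microphones two samples of travel time apart along the look direction:
fs = 8000
s = np.sin(2 * np.pi * 500 * np.arange(fs) / fs)
pos = np.array([[0.0, 0.0, 0.0], [2 * C / fs, 0.0, 0.0]])
sigs = np.stack([s, np.roll(s, -2)])  # the nearer mic receives the wave 2 samples earlier
out = delay_and_sum(sigs, pos, [1.0, 0.0, 0.0], fs)
print(np.allclose(out, s))  # -> True: the target signal is recovered at full amplitude
```

Steering the array to another direction only changes the applied shifts; signals arriving from other directions then sum out of phase and are attenuated.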

target L=14 cm directional microphone

L=10 cm

broadside array

a) Figure 1.2:

endfire array

b) Soede’s microphone arrays. a): broadside array. b): endfire array. Both microphone arrays contain five directional microphones with a cardioid directivity pattern.

Figure 1.2 shows Soede's broadside and endfire array. A broadside array has its microphones along a line perpendicular to the target direction and an endfire array has its microphones collinear with the target direction.

The array processing used by Soede is similar to the signal processing in Figure 1.1 (delay-and-sum processing). The signals of the microphones are corrected for time differences and summed. The time differences along the broadside array are very small, because the length of the broadside array is small in comparison with the distance to the target and the wavelength. Therefore the time-shifts can be neglected and only a summation is necessary. The endfire array does have time differences due to the travel time along the array. However, Soede applied time shifts which are 1.4 times larger than the time differences due to path differences, because endfire arrays have a higher directivity if such exaggerated time shifts are applied (Hansen and Woodyard, 1938). For both arrays, the signal processing is very simple and it has been implemented with analogue electronics.

It has been mentioned before that the disadvantage of the delay-and-sum method is that the length of the microphone array has to be long with respect to the wavelength in order to realize directivity. However, this is not the case with the microphone arrays applied by Soede. Directivity at low frequencies is obtained by using directional microphones, see Figure 1.3.

[Figure 1.3: a) Directional electret microphone (model Microtronic 61) with a front port and a rear port separated by ∆d = 5.6 mm and an acoustical time delay network on the rear port. b) Polar diagram (sensitivity in dB versus angle θ) with three possible directivity patterns of directional microphones: dipole (dashed line), cardioid (solid line), and hypercardioid pattern (dash-dot line).]
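The dipole, cardioid, and hypercardioid patterns of Figure 1.3b all belong to the first-order family R(θ) = a + (1 − a)·cos θ; the following sketch (an added illustration, not taken from the thesis) evaluates the three members:

```python
import numpy as np

def first_order_pattern(theta, a):
    """Sensitivity R(theta) = a + (1 - a) * cos(theta) of a first-order
    gradient microphone. a = 0 gives a dipole, a = 0.25 a hypercardioid,
    and a = 0.5 a cardioid."""
    return a + (1.0 - a) * np.cos(theta)

angles = np.radians([0.0, 90.0, 180.0])
print(first_order_pattern(angles, 0.5))   # cardioid: 1 at the front, null at the rear
print(first_order_pattern(angles, 0.25))  # hypercardioid: rear lobe of -0.5 (about -6 dB)
print(first_order_pattern(angles, 0.0))   # dipole: nulls at +/-90 degrees
```

Plotting 20·log10|R(θ)| in a polar diagram reproduces the dashed, solid, and dash-dot curves of Figure 1.3b.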

The directivity is obtained here with the gradient method. For that purpose, the directional microphone has two input ports and it measures the sound pressure at the front port as well as at the rear port. The pressure from the rear port is delayed by a time delay acoustical network. The amount of time delay and the distance between the ports (∆d) determine the directivity pattern, which is the sensitivity of an array as a function of the angle of incidence of the sound (θ). The directivity pattern is usually depicted in a polar diagram and the sensitivity is shown in decibels, where the maximum sensitivity is scaled to 0 dB. The possible directivity patterns of this kind of microphone are shown in Figure 1.3b. When the directional microphone has no time delay, the directivity pattern is a dipole. When the time delay is increased, the directivity pattern changes into a hypercardioid and, when the time delay is increased further, into a cardioid. The hypercardioid pattern has the highest directivity


of these three directivity patterns. Soede used directional microphones with a cardioid pattern to obtain a broadband directivity.

Soede tested the microphone array in an artificial diffuse noise field. The average improvement in SNR was 7 dB. This has also been confirmed with speech intelligibility tests with both normal hearing and hearing impaired persons. Soede's design did not lead to a commercial product due to cosmetic objections. The arrays are long in comparison with real spectacles. Furthermore, the directional microphones are too large to integrate into the arm of a pair of spectacles.

Soede's work attracted a lot of attention and many successors, who focused on the optimization of the array's directivity. Stadler and Rabinowitz (1993) optimized the directivity of the microphone arrays using the optimal beamforming method. Kates (1993) used the same theory to show that it is possible to realize a broadband directivity without directional microphones. The aforementioned research on arrays all uses fixed array processing: the signal processing does not change over time. It is also possible to adjust the processing of an array according to properties of the received sound signals. This so-called adaptive array processing is also used to improve the directivity of microphone arrays for hearing aid applications (DeBrunner and McKinney, 1995; Hoffman et al., 1994). Adaptive array processing is advantageous when there are just a few noise sources; the adaptive algorithm can then attenuate these sources very well. However, the performance deteriorates in a situation with many noise sources and/or reverberation, like a cocktail party. In that case, the improvement in directivity is low, but the computational costs are high.

Soede's work also drew a lot of attention from hearing aid manufacturers. There were already directional hearing aids containing one cardioid microphone.
These hearing aids had little directivity and were therefore not favored (Killion et al., 1998). Phonak developed a new directional hearing aid with two omnidirectional microphones which are used to realize directivity electronically (Vonlanthen, 1991). This hearing aid is more directive than the old directional hearing aids and therefore more successful. However, its directivity is small in comparison with genuine microphone arrays and the gain in SNR is small. Another trend in hearing aid technology is digital signal processing. Several manufacturers have developed digital signal processors (DSPs) (Neuteboom et al., 1997). The computational power of these DSPs is still limited and only suitable for single microphone signal processing, but the expectation is a fast growth in computational power and a reduction in costs, such that multi-microphone processing techniques can be used in hearing aids.

In summary, microphone arrays are widely used in the field of acoustics and they are also used to improve the speech intelligibility of hearing aids in noisy environments. At the start of this research, however, there was still no assistive listening device based on array technology. This thesis describes the design of microphone arrays which could result in such an assistive listening device. The next section describes the objective of this thesis in more detail.

1.3 Objective of this thesis

The objective of this thesis is the design of a microphone array to improve the speech intelligibility in noisy environments. This array has the following features.
• The array is integrated in an ordinary pair of spectacles. This has the advantage that the target direction of the array is linked to the position of the head, so the array is focused in the direction the listener is looking.
• The array must be integrated into the pair of spectacles in a cosmetically acceptable manner. Therefore only small omnidirectional microphones can be used, because directional microphones are too large to integrate in an ordinary pair of spectacles.
• The array has a high directivity in order to maximize the improvement in speech intelligibility. This directivity has to be achieved with the gradient and optimal beamforming methods, because the array can only contain omnidirectional microphones.
• The array enables binaural hearing to enhance the localization of sound and the speech intelligibility in noisy environments.

The outline of this thesis is as follows. The first two chapters are an introduction to the human perception of sound and to array processing technology. Chapter 2 gives an overview of perceptual research on binaural hearing, speech intelligibility in noise, and localization. It also describes the latest developments in the hearing aid industry. Chapter 3 describes the theory of array processing technology and focuses on array techniques for speech intelligibility in noisy environments. The remainder of the thesis describes the design and evaluation of several microphone arrays. Chapter 4 describes the design of a broadside array with binaural output. It uses a double beamsteering technique to realize a binaural output. Chapter 5 describes the design of a binaural endfire array. Two endfire arrays, one per ear, use first order pressure gradient processing to achieve two highly directional output signals.
Chapter 6 describes the perceptual evaluation of the binaural broadside and endfire arrays. It focuses on localization and speech intelligibility. Chapter 7 investigates the optimization of the directivity of the endfire array. It discusses several optimizations with different types of filters. The optimization with the Finite Impulse Responsefilters is used to test the improvement of speech intelligibility with normal hearing and hearingimpaired subjects. This evaluation has been done at the Audiological Center of the University Hospital Rotterdam. Chapter 9 draws the conclusions to this thesis.


2 Hearing

This thesis describes the design of a device to assist people in understanding speech in noisy environments. This chapter reviews topics about hearing which are relevant to this project and focuses on three of them:
• localization of sound;
• speech intelligibility in noise;
• hearing aid technology.
Relevant research with normal hearing and hearing-impaired subjects is reviewed, and important trends in hearing aid technology are explained. The topics are described briefly; the reader is referred to the literature for details.

2.1 Localization of sound

When a normal hearing person hears a sound, he can indicate almost instantly the direction of the sound source. This ability of the human auditory brain is called localization of sound. Localization is important in daily life, as Byrne and Noble (1998) illustrate with the following examples.
• Localization can play a vital part in understanding group conversations under difficult listening conditions. When the conversation switches from one person to another, the listener needs to locate the new speaker instantly; otherwise he misses the first part of each segment of the conversation, which may seriously reduce speech intelligibility.
• Localization is also important to warn of danger. For example, localization helps to perceive an approaching car in traffic.
• Localization is part of experiencing the environment in a natural way, and this may be very important subjectively.



People find localization an easy task, but research has revealed that localization has a complex background. Therefore, the definition of localization by Blauert (1983) is repeated here: "localization is the law or rule by which the location of an auditory event (e.g. its direction or distance) is related to a specific attribute or attributes of a sound event, or of another event that is in some way correlated with the auditory event." This section explains how people are able to localize; it reviews research into localization in general and into localization by the hearing impaired.

2.1.1 Physical phenomena behind localization

People can localize sound sources using the difference between the ear signals as well as properties of each of the signals. These properties can be divided into four types of cues:
• Interaural Time Difference (ITD);
• Interaural Level Difference (ILD);
• amplitude spectra of each ear signal;
• movements of the head.
The first two cues are important for localization in the horizontal plane and they are based on binaural hearing: the human auditory brain uses the difference between the left-ear and right-ear signal to localize sources in the horizontal plane. Figure 2.1 shows the origin of this difference.

Figure 2.1: Interaural Time and Level Difference: the origins of the difference between the sound field at the left and the right ear.

Figure 2.1 shows a sound field reaching a subject from the left. This field reaches both ears, but there is a difference between the sound field at the left and the right ear. The sound field reaches the left ear sooner than it does the right ear: this is the Interaural Time Difference (ITD). The sound field at the left ear also has a higher level than that at the right ear, due to screening by the head: this is the Interaural Level Difference (ILD). The ITD and ILD can be measured using, for instance, a KEMAR manikin (Knowles Electronics Manikin for Acoustic Research). This manikin is a model of an average human torso and head and it includes microphones at the positions of the eardrums. It has been developed for measurements with hearing aids (Burkhard and Sachs, 1975).
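The magnitude of the ITD sketched in Figure 2.1 can be approximated with a simple spherical-head model (Woodworth's formula). This is an illustrative sketch, not part of the measurements discussed here; the head radius below is an assumed typical value.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c_m_s=343.0):
    """Spherical-head approximation of the Interaural Time Difference:
    ITD = (a / c) * (theta + sin(theta)), for azimuths of 0..90 degrees."""
    theta = math.radians(azimuth_deg)
    return head_radius_m / c_m_s * (theta + math.sin(theta))
```

For a source straight ahead this gives no ITD; for a source at 90° it gives about 0.66 ms, the same order of magnitude as the KEMAR measurement shown below in Figure 2.2a.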


The transfer functions of the sound from a free-field source to these microphones have been measured for different angles of the free-field source in the horizontal plane. These transfer functions are called head-related transfer functions (HRTF's) and they can be measured in an anechoic chamber. Among others, Hulsebos (1998) has performed such a measurement in the anechoic chamber of the Delft University of Technology. From these measurements, the ITD and the ILD have been calculated; they are shown in Figure 2.2.

Figure 2.2: Measurement of the Interaural Time Difference (a) and the Interaural Level Difference (b) of a KEMAR manikin in the horizontal plane, at 0.5, 1, 2 and 4 kHz.

The ILD and ITD are zero for sound incident from the front (0°) and from the rear (180°), and they have extrema for sound incident from the left (90°) and from the right (−90°). The ITD is rather frequency-independent, but the ILD increases with frequency, because the shielding by the head increases at higher frequencies. The auditory brain uses both ITD and ILD to localize sound sources, but they are not sufficient to discriminate unambiguously between all directions. For example, sound coming from the front (0°) and sound coming from the rear (180°) cannot be distinguished on the basis of ITD and ILD, because both are zero. For front-rear discrimination and for localization in the vertical plane, the shapes of the HRTF's are the most important cues. The HRTF's are shaped characteristically for each sound direction, due to the screening by the head, the transfer by the pinna, and the reflections from the shoulder. So the spectrum of a sound coming from the back is shaped differently than the spectrum of the same sound coming from the front. This spectral shaping is a monaural effect and it occurs only at high frequencies (above 4 kHz). In addition, small movements of the head result in variations in ITD, ILD, and frequency spectra, and these variations eliminate any further ambiguity in localization.
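To make these cues concrete, the ITD and ILD can be estimated from a pair of measured ear signals or head-related impulse responses: the ITD as the lag that maximizes the interaural cross-correlation, the ILD as the broadband RMS level ratio in dB. The sketch below (plain Python, hypothetical function name) illustrates the idea, not the actual processing performed by the auditory system.

```python
import math

def estimate_itd_ild(left, right, fs_hz):
    """Estimate ITD (s, positive when the left ear leads) and ILD (dB,
    positive when the left ear is louder) from two equally long signals."""
    n = len(left)
    # ITD: lag of the maximum of the interaural cross-correlation
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-(n - 1), n):
        c = sum(left[i] * right[i + lag]
                for i in range(max(0, -lag), min(n, n - lag)))
        if c > best_corr:
            best_corr, best_lag = c, lag
    itd_s = best_lag / fs_hz          # positive lag: the right ear lags behind
    # ILD: broadband RMS level difference
    rms = lambda x: math.sqrt(sum(v * v for v in x) / len(x))
    ild_db = 20.0 * math.log10(rms(left) / rms(right))
    return itd_s, ild_db
```

For a source on the left, the right-ear signal is a delayed and attenuated copy of the left-ear signal, and the function returns a positive ITD and a positive ILD.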


As indicated above, the cues for horizontal and vertical localization are largely different, but they overlap to some extent. Spectral cues have some role in horizontal localization and interaural difference cues have some role in vertical localization. Thus, there is some redundancy in the auditory information used for localization. The role of these cues in sound source localization has been determined in different research projects, which are described in the next section.

2.1.2 Research on localization

One of the most important goals of research on localization is to establish how well people are able to localize. Among others, this has been done by determining the smallest change in source position a subject can notice. This smallest change is called the localization blur (Blauert, 1983). The results of these experiments are shown in Figure 2.3.

Figure 2.3: Localization blur in: a) the horizontal plane (∆θ ≈ 4° in front, ∆θ ≈ 10° to the side); b) the vertical plane (∆φ ≈ 9° in front, ∆φ ≈ 13° above). Data from Blauert (1983).

People localize sound sources in the horizontal plane with the highest precision. The horizontal localization blur is 4° for sources in front of the subject and 10° for sources to the side of the subject. The vertical localization blur is somewhat larger: 9° for sources in front of the subject and 13° for sources above the subject. It should be noted, however, that these measurements have been done with a fixed head. If a subject is allowed to move his head, the localization blur decreases.

Besides total localization performance, a lot of research has been done on the influence of ITD and ILD on localization. This influence is often tested with stimuli in which the ITD and ILD have been changed artificially, presented over headphones. But if a subject listens to such stimuli with headphones, the auditory event usually occurs inside his head. This effect is called "inside-the-head locatedness" or the internalization of the sound image. Internalization does not always occur when listening with headphones: when the HRTF's of a subject are measured and stimuli are made using these transfer functions, the subjects listening to these stimuli with headphones localize the auditory event (just) outside their head. In general, internalization does not occur when the HRTF's are hardly or not distorted (Hartmann and Wittenberg, 1996). When internalization occurs, people do not experience the environment in a natural way. This can bring a disconcerting feeling of being isolated (Byrne and Noble, 1998). When internalization occurs, the auditory event is placed along an imaginary line between the ears. This displacement of the auditory event is called lateralization. Figure 2.4 shows how subjects lateralize an auditory event when pure interaural time shifts (left) and level shifts (right) are applied.

Figure 2.4: Lateral displacement of the auditory event as a function of a pure interaural time shift (a) and a pure interaural level shift (b). From Blauert (1983).

The subjects use an ordinate to scale the displacement. Here 0 corresponds to the center of the head, and 5 to the average maximum, with the auditory event approximately at the entrance of the ear canal. Figure 2.4a shows that the lateral displacement is linear with the time delay for delays smaller than 0.6 ms; above 0.6 ms, the lateral displacement is constant. Figure 2.4b shows that the same linear displacement is found for interaural level differences smaller than 11 dB; above 11 dB, the displacement is constant.

It is often argued that lateralization is not natural and that therefore the results of lateralization tests cannot be extended to localization. Therefore a different task has been developed, in which the subject has to map an internalized auditory event to an apparent spatial position of a sound source. In this thesis this task is called judgement of apparent sound source direction, after Wightman and Kistler (1992). Wightman and Kistler investigated the role of ITD and ILD in localization. They measured the HRTF's of their subjects and manipulated these HRTF's to produce stimuli in which the interaural time and level cues conflicted. Their direction judgement tests with these stimuli have shown that interaural time differences are effective only at low frequencies. The results also suggest that when low-frequency interaural time cues are present, they override the interaural intensity and spectral shape cues present in both low and high frequencies, and thus provide the dominant cue. As a rule of thumb, the ITD is important in the localization of low frequencies, i.e. below 1500 Hz, with the ITD at 500-600 Hz being the most dominant (Bilsen and Raatgever, 1973). The ILD is important in the localization of high frequencies.

Normal hearing persons find localization an easy task which they can do with good accuracy. Localization by hearing-impaired subjects is discussed in the next section.

2.1.3 Localization and hearing impairment

Durlach et al. (1981) reviewed the literature on localization and hearing impairment. They concluded that, in general, it is difficult to draw conclusions from the papers due to differences in test methods and differences in impairment. However, they concluded that localization and lateralization are in general:
• degraded by unilateral deafness and bilateral asymmetry;
• degraded more by middle-ear disorders, and much more by auditory nerve lesions, than by cochlear impairments;
• not easily predicted on the basis of audiograms.
Recently, localization tests with unaided and aided hearing-impaired subjects have been performed to clarify and specify the roles of the degree of hearing loss in different frequency regions, and of the type of hearing loss, on various aspects of auditory localization (Byrne and Noble, 1998). Their findings agree with the localization theory of Section 2.1.1 and Section 2.1.2.
• Localization in the horizontal plane decreases as the hearing loss at the low frequencies increases; localization in the vertical plane decreases as the hearing loss at the high frequencies (> 4 kHz) increases. The vertical plane discrimination is readily compromised because most hearing impaired have a hearing loss in the 4-6 kHz frequency range. Also, the accuracy of front-rear discrimination is related to hearing in the 4 to 6 kHz frequency range, at least among sensorineural cases of hearing impairment.
Hearing-impaired subjects with a hearing aid tend to have a poorer horizontal localization than without a hearing aid. This deterioration of horizontal localization is caused by the distortion of the interaural time cues; therefore open earmoulds are advised to optimize horizontal localization. A further improvement of localization can be realized by binaural fitting of hearing aids. This is especially helpful for the moderately and severely hearing impaired. Again, it is important to disturb the interaural time cues as little as possible to optimize horizontal localization.

Aided vertical localization is much poorer for those hearing impaired who have a reasonable unaided vertical localization. This is due to:


• the distortion of the pinna reflections by the earmoulds;
• the limited amplification of hearing aids above 5 kHz;
• the attenuation of high-frequency sounds by the earmoulds.
In general, localization in the horizontal plane does not deteriorate due to hearing impairment, but localization in the vertical plane and front-rear discrimination are easily compromised, because most hearing impaired have a hearing loss in the 4 to 6 kHz range.

This section has shown that localization is an important function of the auditory system. Another important function of the human auditory brain is the intelligibility of speech.

2.2 Speech intelligibility in noisy environments

The most important function of the human auditory system is the understanding of speech. People who have lost this function often feel isolated and lose contact with other people. Speech intelligibility is assessed with different audiological tests. Tests for speech intelligibility are not universal, because subjects have to be tested in their native languages. Fortunately, a lot of research on speech intelligibility has been done in the Netherlands and reliable tests to measure speech intelligibility in quiet and in noise have been established. Therefore, this section discusses these Dutch tests, and the problems which the hearing impaired encounter with speech intelligibility are illustrated with the use of these tests. The tests discriminate between intelligibility in quiet (for example in a one-to-one conversation) and speech intelligibility in noise (at meetings, parties, etc.). First speech intelligibility in quiet and then speech intelligibility in noise is discussed.

2.2.1 Speech intelligibility in quiet

Good speech intelligibility in quiet is important in situations where a person is talking with only one other person and no background noise is present. The first tool to assess how well people understand speech in quiet is the pure-tone audiometry test. Pure tones are presented to a subject over headphones and the hearing threshold of the subject is established in an iterative procedure (Stach, 1998). The threshold is measured in dB HL (Hearing Level). This is a sound level normalized to the threshold of normal hearing persons: i.e. a normal hearing person has a hearing threshold of 0 dB HL. The thresholds are measured at 250, 500, 1000, 2000, 4000, and 8000 Hz. The most important frequencies for speech intelligibility in quiet are 500, 1000, and 2000 Hz. The mean of the pure-tone thresholds at these frequencies is called the Fletcher Index (FI) or Pure Tone Average (PTA) and it is an indication of the speech intelligibility in quiet.

A second test is speech audiometry, which measures the speech intelligibility in quiet with mono-syllabic words. These words are presented to the subject at a constant sound level. The subject has to repeat each word. Each correct phoneme of the word adds three points to the subject's score. After eleven words, the subject's total score at that sound level is known and the test is repeated at a sound level that is decreased by 10 dB. This is continued until the subject cannot reproduce any phoneme correctly. The goal of the test is to determine the level at which the subject can reproduce 50% of the phonemes. An example of speech audiometry is shown in Figure 2.5.
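As an illustration of this scoring rule, the sketch below computes the phoneme score for one presentation level. It assumes three phonemes per word, which is common for Dutch mono-syllabic (CVC) test material but is an assumption here; the function name is illustrative.

```python
def audiometry_score(correct_phonemes_per_word):
    """Phoneme score for one presentation level: 3 points per correct
    phoneme, eleven words of (assumed) three phonemes each, so the
    maximum is 99, read as a percentage on the audiometry diagram."""
    assert len(correct_phonemes_per_word) == 11
    assert all(0 <= p <= 3 for p in correct_phonemes_per_word)
    return 3 * sum(correct_phonemes_per_word)
```

A perfect run thus scores 99 points, which is plotted as (approximately) 100% on the diagram.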

Figure 2.5: An example of a speech audiometry diagram of a hearing-impaired person. The intelligibility score is shown as a percentage versus sound pressure level (SPL); the corresponding hearing level is also indicated, just above the 50% limit. The curve at the left is the average speech intelligibility for normal hearing.

The subject's intelligibility score is plotted versus the sound level. The 50% intelligibility score of this subject lies at 55 dB SPL, or 30 dB HL, and that level should confirm the Fletcher Index of the pure-tone audiometry. Pure-tone audiometry and speech audiometry are standard tools in audiology. However, they do not predict the speech intelligibility in noise, which is dealt with in the next section. In the remainder of this thesis, the term speech intelligibility refers to speech intelligibility in noise.

2.2.2 Speech intelligibility in noise

Many people complain that they cannot understand speech in situations with background noise. Eleven percent of the entire Dutch population finds it difficult to follow a conversation in a group of more than 3 persons (Chorus et al., 1995). However, a large number of these people have little or no problem understanding speech in quiet. Therefore, a different test is necessary to measure the speech intelligibility in noise. In the Netherlands, the speech intelligibility in noise is measured with the so-called Plomp test (Plomp and Mimpen, 1979). The test consists of phoneme-balanced Dutch sentences and the noise has the long-term spectrum of speech. The noise can be steady-state or it can be modulated like speech. The goal of the test is to determine the Speech Reception Threshold (SRT), which is the sound level at which a person understands 50% of the sentences. The SRT is usually expressed as an SNR, but sometimes it is expressed as an absolute sound level. In this thesis, the SRT expressed as an absolute level is called the absolute SRT and the SRT expressed as an SNR is simply called the SRT. The SRT is determined with an up-down procedure, as is shown in Figure 2.6a.

Figure 2.6: a) Measurement of the SRT using the up-down procedure (here SRT = 0.0 dB, s.d. 1.3 dB). Sentence number 14 is actually not presented to the listener, but its level is known from the answer on sentence 13; the SRT is the average SNR of sentences 5 to 14. b) Intelligibility score versus SNR, from Plomp and Mimpen (1979).
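The up-down procedure of Figure 2.6a can be written down as a short staircase routine. The sketch below is a simulation aid, not the clinical implementation; `respond` stands in for a real listener and returns True when a sentence presented at the given SNR is repeated correctly.

```python
def plomp_srt(respond, start_snr_db):
    """Simulate the up-down SRT procedure of Plomp and Mimpen (1979).

    Sentence 1 is repeated at +4 dB steps until it is reproduced
    correctly; sentences 2-13 go 2 dB down after a correct answer and
    2 dB up after an error. Sentence 14 is never presented, but its
    level follows from the answer on sentence 13. The SRT is the mean
    level of sentences 5-14."""
    snr = start_snr_db
    while not respond(snr):            # sentence 1, ascending in 4-dB steps
        snr += 4.0
    levels = [snr, snr - 2.0]          # sentence 1 (correct) and sentence 2
    for _ in range(12):                # levels of sentences 3..14
        step = -2.0 if respond(levels[-1]) else 2.0
        levels.append(levels[-1] + step)
    return sum(levels[4:]) / 10.0      # mean of sentences 5..14
```

With an idealized listener who understands every sentence at or above a true threshold, the staircase settles into a 2-dB oscillation straddling that threshold.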

The determination of the SRT starts with the speech level below the SRT. In the example of Figure 2.6a, the first sentence is presented to the subject at an SNR of -5.8 dB. The subject does not reproduce the sentence correctly, so the same sentence is repeated at an ever increasing level (+4 dB per repetition) until the listener is able to repeat the sentence correctly. Then the level of the next sentence (nr. 2) is decreased by 2 dB. If the listener is able to repeat this sentence correctly, the level of the next sentence is decreased by 2 dB; if he is not, the level of the next sentence is increased by 2 dB. This procedure is followed for 13 sentences. Sentence number 14 is actually not presented to the listener, but its level is known from the answer on sentence 13. The average level of sentences 5 up to 14 determines the SRT and its standard deviation. In this example, the SRT is 0 dB with a standard deviation (s.d.) of 1.3 dB. This standard deviation indicates that the margin in SNR between understanding and not understanding speech is very small. This is also illustrated in Figure 2.6b, which shows the intelligibility score (the percentage of correctly reproduced sentences) as a function of the SNR. The steepness of the curve shows that a small decrease in SNR results in a large decrease in intelligibility (20%/dB).

The usual procedure is to measure the SRT in quiet first. Then the noise level is set to 20 or 30 dB above the absolute SRT in quiet, in order to assure that the measurement of the SRT in noise is not influenced by the subject's hearing threshold. The SRT is measured with headphones as well as in the free field, and monaurally as well as binaurally. Especially the differences in SRT between the monaural and the binaural conditions are interesting, because it is well known that binaural hearing improves speech intelligibility in noise. This is also known as the cocktail-party or squelch effect (Markides, 1977): the human auditory brain uses the binaural input to suppress or squelch the spatially distributed noise sources.

The influence of binaural hearing on the speech intelligibility was examined by Bronkhorst and Plomp (1988) and Bronkhorst and Plomp (1992). They measured the SRT of normal hearing subjects in a situation with one noise source, where the noise source was positioned at different azimuths in the horizontal plane. They did not do free-field measurements; instead they used the HRTF's of a KEMAR manikin and manipulated these HRTF's such that they could measure the combined and separate influence of ILD and ITD on the speech intelligibility. The measured SRT versus the azimuth of the noise source is shown in Figure 2.7a.

Figure 2.7: a) SRT versus the azimuth of the noise source for three different noise types: FF (free field), dL (ILD only), dT (ITD only); from Bronkhorst and Plomp (1988). b) Difference between monaural and binaural SRT as a function of the number of noise sources; data from Bronkhorst and Plomp (1992).

Clearly, the azimuth of the noise source has a large influence on the SRT. The SRT decreases by more than 10 dB when the free-field noise source is to the side (90°). The influence of the ITD and ILD is also evident. The ITD decreases the SRT by an almost constant 5 dB for sources not at the front (0°) or at the rear (180°). The ILD has an angle-dependent influence and it decreases the SRT by at most 7.8 dB, at 90°. So binaural hearing with a spatially distributed noise source has a great effect on the SRT.

However, this advantage of binaural hearing decreases when the number of spatially distributed noise sources increases, as is shown by the research of Bronkhorst and Plomp (1992). They measured the SRT as a function of the number of noise sources, which were distributed around the subject in different configurations, for monaurally and binaurally listening subjects. From these measurements, the difference between the monaural and binaural SRT has been calculated; it is shown as a function of the number of noise sources in Figure 2.7b. The results show that subjects are well able to use binaural hearing to improve (i.e. decrease) the SRT. They also show that the difference between the monaural and binaural SRT decreases with an increasing number of noise sources. Thus, the advantage of binaural hearing decreases when the number of spatially distributed noise sources increases. However, the improvement is still 2.8 dB for 6 noise sources, and this already has a large impact on the speech intelligibility, as is shown in Figure 2.6b.

The aforementioned studies measured the SRT in several situations. However, it would be useful to predict the (improvement in) speech intelligibility instead of measuring it with elaborate perceptual tests. The next section describes a method which has been developed to predict the improvement in speech intelligibility by systems which improve the SNR.

2.2.3 Prediction of the speech intelligibility

Measuring the speech intelligibility is tedious and laborious and therefore research has focused on predicting the speech intelligibility in noise from physical measurements. For example, the speech intelligibility in rooms can be calculated as a function of the volume of the room, the room's reverberation time, the background noise, the talker's vocal output, and the talker-to-listener distance (Houtgast et al., 1980). This measure (the Speech Transmission Index) is based on research on the Articulation Index (Kryter, 1962). The Articulation Index is calculated as a weighted average of the performance in each frequency band and it takes into account the speech spectrum, the noise spectrum, and the masking spectrum of the noise. Greenberg et al. (1993) have adapted this method to characterize an effective Signal-to-Noise Ratio, which is called the (speech) intelligibility-weighted Signal-to-Noise Ratio SNR_I:

    SNR_I = ∑_{i=1}^{Nf} γ_i SNR_i ,                                    (2.1)

where SNR_i is the signal-to-noise ratio in the i-th frequency band, γ_i is the importance weight associated with the i-th frequency band, and Nf is the number of frequency bands. This measure is used to predict the performance of a system which is designed to improve the Signal-to-Noise Ratio. The intelligibility-weighted gain of such a system is:

    G_I = SNR_I,out − SNR_I,in ,                                        (2.2)

where SNR_I,in and SNR_I,out are the intelligibility-weighted SNR's at the input and output of the system. The weighted average is performed on the center frequencies of the one-third-octave bands spanning 180 to 4500 Hz (Stadler and Rabinowitz, 1993). Other authors perform the sum at the frequencies 500, 1000, 2000 and 4000 Hz (Struck, 1998). Figure 2.8 shows the weights γ_i from these two authors.

Figure 2.8: Weights provided by the Articulation Index according to Stadler and Rabinowitz (1993) (×) and Struck (1998) (+), normalized to one. a) Weights versus frequency. b) Cumulative weights versus frequency.
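Equations (2.1) and (2.2) amount to a weighted sum over frequency bands. The sketch below illustrates the computation; the four bands and the equal weights in the usage example are placeholders, not the published articulation-index values.

```python
def snr_weighted(band_snrs_db, weights):
    """Intelligibility-weighted SNR, Eq. (2.1): sum of gamma_i * SNR_i."""
    assert len(band_snrs_db) == len(weights)
    return sum(g * s for g, s in zip(weights, band_snrs_db))

def gain_weighted(snr_in_db, snr_out_db, weights):
    """Intelligibility-weighted gain, Eq. (2.2): SNR_I,out - SNR_I,in."""
    return snr_weighted(snr_out_db, weights) - snr_weighted(snr_in_db, weights)
```

For instance, with equal placeholder weights of 0.25 over four bands, an output SNR of [5, 6, 7, 8] dB against a 0-dB input in every band yields an intelligibility-weighted gain of 6.5 dB.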

The weights in Figure 2.8 show that the most important frequency range for speech intelligibility in noise is between 1500 and 3000 Hz, but Figure 2.8b shows that the frequencies below 1 kHz still account for 40% of the Articulation Index weights. It should be emphasized that the intelligibility-weighted metrics are intended to characterize a system, not necessarily the intelligibility improvement by that system (Greenberg et al., 1993). In order to predict the speech intelligibility improvement, the actual SNR improvement in the sound field has to be measured. Furthermore, this prediction does not give information about the influence of binaural hearing or the influence of hearing impairment on the speech intelligibility.

2.2.4 Speech intelligibility in noise and hearing impairment

Speech intelligibility in noise is a large problem for hearing-impaired people. This section illustrates the problem using the research of Duquesnoy and Plomp (1983). They measured the absolute SRT as a function of the noise level in a free-field situation with normal hearing and hearing-impaired subjects. The subjects listened monaurally, and the hearing-impaired subjects listened both without and with a hearing aid. The results of these measurements are shown in Figure 2.9.

Figure 2.9: SRT test for a group of normal hearing subjects and a group of hearing-impaired subjects without and with hearing aid (from Duquesnoy and Plomp (1983)). a) Absolute SRT versus noise level. b) Intelligibility score versus SNR, analogous to Plomp and Mimpen (1979).

Figure 2.9a shows the absolute SRT of the subjects versus the sound level of the noise. At low noise levels, the absolute SRT of the normal hearing subjects is determined by the subject's hearing threshold and not by the noise level; therefore the absolute SRT is constant for noise levels below 20 dB(A) (A-weighted decibels). When the noise level is higher than 20 dB(A), the absolute SRT increases linearly with the noise level. This means that the SRT is constant (-5.8 dB). The results of the hearing impaired are quite similar. Their absolute SRT is constant for low noise levels, but it is high due to the hearing impairment. For noise levels over 60 dB(A), the absolute SRT is again linear with the noise level, but the SRT is higher than for normal hearing (-1.9 dB). When the hearing-impaired subjects are aided with a hearing aid, the absolute SRT is lower at low noise levels due to the amplification of the hearing aid, but at high noise levels the SRT is worse than the unaided SRT (-0.7 dB).

In conclusion, hearing-impaired subjects have a higher SRT than normal hearing subjects. When the hearing impaired use a hearing aid, their performance is better in quiet, but worse in noise. The difference in the performance in noise between normal hearing and the hearing impaired is only a few dB, but Figure 2.9b shows the vast impact of this difference. Figure 2.9b shows the curves of intelligibility score versus SNR for the normal hearing, the hearing-impaired, and the hearing-impaired subjects with hearing aids, according to Plomp and Mimpen (1979). Here it is assumed that the steepness of the curves for normal hearing and hearing-impaired subjects is the same. For example, suppose that the SNR is -2 dB; then it follows from the curves that the normal hearing person understands almost 100%, the hearing-impaired person understands about 50%, and the hearing-impaired person with hearing aid understands only 25%. The hearing-impaired person is not able to take part in the conversation and the hearing aid only deteriorates the performance.

Of course, the aforementioned SRT's are not valid for all people whose speech intelligibility in noise is affected, but they illustrate two important aspects.
• A small deterioration in SRT results in a large loss of intelligibility.
• These people need a better SNR at the input of their auditory system in order to restore speech intelligibility in noise.
Duquesnoy (1982) stated that a prime requisite for an assistive listening device to provide substantial benefit in noise is that it increases the SNR by at least 5 dB. An improvement of the SRT is already achieved by binaural hearing or by binaural fitting of hearing aids. This binaural fitting should be done carefully, such that the advantage of binaural hearing remains intact. However, the gap in SRT with normal hearing persons remains, and therefore the need for an improvement in SNR remains. The design of an assistive listening device which provides a better SNR is the goal of this thesis. Such an assistive listening device does not have to be used together with an ordinary hearing aid, but its output has to be presented to the ear in one way or another. In the past, however, such assistive listening devices were always connected to or integrated in ordinary hearing aids. The hearing aid industry has provided the majority of hearing aids and assistive listening devices. An overview of these assistive listening devices and of the technology in the hearing aid industry is given in the next section.
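The worked example above (roughly 100%, 50%, and 25% intelligibility at an SNR of -2 dB) can be reproduced with a minimal linear-slope model around the SRT. The 20%/dB slope comes from the intelligibility curve of Figure 2.6b; the clamping to 0-100% is an obvious simplification of the real psychometric curve.

```python
def intelligibility_score(snr_db, srt_db, slope_per_db=0.20):
    """Fraction of sentences understood: 0.5 at the SRT, changing by
    `slope_per_db` per dB of SNR, clamped to the interval [0, 1]."""
    return max(0.0, min(1.0, 0.5 + slope_per_db * (snr_db - srt_db)))

# At an SNR of -2 dB: a normal hearing listener (SRT -5.8 dB) scores ~100%,
# an unaided hearing-impaired listener (SRT -1.9 dB) ~50%,
# and an aided listener (SRT -0.7 dB) ~25%.
```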

2.3 Hearing aid technology

This section reviews the most important technologies in hearing aids to improve speech intelligibility in noise, starting with current fitting practice. The review describes these topics briefly and refers to the literature for details.

2.3.1 Hearing aid fitting

The basic goal of hearing aid fitting is to restore complete speech intelligibility in quiet when the target source is at a normal conversation level of 60 dB SPL (Kapteyn et al., 1988). Hearing aids are usually fitted binaurally for the following reasons:
• audibility of speech originating from different directions;
• localization of sound;
• improved speech intelligibility in noisy environments.
Monaural fitting is only used if there is a large asymmetry in hearing loss or if the hearing-impaired person does not want two hearing aids. There is a large variety of hearing aids which are suited for different kinds of hearing impairment (Stach, 1998). In the last five years, many hearing aids have been developed with Digital Signal Processors (DSPs). Several manufacturers have developed DSPs which are often programmable, so audiologists have a much larger influence on the adjustment of the amplification and of the frequency response. Furthermore, hearing aids can contain more programs, so the hearing impaired can use different amplification schemes for different acoustic environments. The developments in digital signal processing are going very fast and the available computation power is still increasing. At the completion of this thesis, the state of the art is a DSP which can calculate a Finite Impulse Response filter of 64 points (Van Tasell, 1998). Other examples of DSPs have been developed by Philips Hearing Instruments and ReSound/Danavox (Neuteboom et al., 1997; Edwards, 1988). Despite these fast developments, most algorithms and amplification schemes of hearing aids are focused on speech intelligibility in quiet. The next section gives an overview of those hearing aids and assistive listening devices which are designed to improve speech intelligibility in noise.

2.3.2 Hearing aids and speech intelligibility in noise

The restoration of speech intelligibility in noise is not the first goal of hearing aid fitting, but the hearing aid industry has come up with some improvements. The simplest improvement is to use a high-pass filter to attenuate all low-frequency signals. Background noise often has many low-frequency components (noise from machines, cars, reverberation, etc.). Furthermore, high-pass filtering reduces the upward spread of masking, which is the masking of high-frequency components by low-frequency noise (Kapteyn et al., 1988). Another improvement is the use of a directional microphone in a hearing aid. This microphone is more sensitive to sound coming from the front than from the back, so it can attenuate background noise. Directional microphones were already used in the 1980s, but their directivity was low (Killion et al., 1998). Nowadays, a directional microphone is realized using two omnidirectional microphones (Vonlanthen, 1991). This has improved the directivity of directional hearing aids and therefore they are more successful, but they do not achieve the required 5 dB improvement in SNR. Another approach is to improve speech intelligibility in noise by restoring binaural hearing as well as possible. Starkey has developed the so-called CETERA system with two In-The-Ear hearing aids (Van Tasell, 1998). The transfer functions of the ear canals of the hearing-impaired person are measured and the DSPs in the hearing aids are programmed such that these transfer functions are matched as closely as possible. This should not only lead to an externalization of the sound image and good localization, but it should also restore speech intelligibility in noise. Besides solutions which are integrated in hearing aids, there are hand-held solutions. The most robust is a stand-alone microphone which is placed near the speaker; the microphone signal is sent to the hearing aid with an FM transmitter.
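The delay-and-subtract principle behind such a two-microphone directional element can be sketched as follows. The 1 cm spacing is an illustrative assumption, not a value from the cited products: the rear microphone signal is delayed by the acoustic travel time over the spacing and subtracted, which produces an exact null for sound from the rear (a cardioid-like pattern).

```python
import cmath
import math

def diff_pattern(theta, f, d=0.01, c=344.0):
    """Magnitude response of a delay-and-subtract pair of omnidirectional
    microphones: the rear microphone is delayed by d/c and subtracted from
    the front one.  theta is the angle of incidence measured from the
    look direction (0 = front); d = 1 cm is an illustrative spacing."""
    tau = d / c
    omega = 2.0 * math.pi * f
    # For a plane wave the rear microphone already lags by (d/c)*cos(theta),
    # so the total phase difference is omega*tau*(1 + cos(theta)).
    return abs(1.0 - cmath.exp(-1j * omega * tau * (1.0 + math.cos(theta))))

f = 1000.0
front = diff_pattern(0.0, f)
side = diff_pattern(math.pi / 2.0, f)
rear = diff_pattern(math.pi, f)  # applied delay matches the acoustic path: null
```

The front-to-back ratio is large, but a single first-order element of this kind gives only a modest broadband SNR gain, consistent with the remark that the required 5 dB is not reached.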


Another device is a hand-held microphone array which the hearing-impaired person carries around and points at the speaker; the signal is again sent to the hearing aid with an FM transmitter (Stach, 1998). The hand-held devices are effective, but they are very unpopular. Most hearing-impaired persons do not like to carry apparatus around and thereby expose their hearing impairment. A better solution is to integrate the device into a pair of spectacles (Zwicker and Beckenbauer, 1988; Soede, 1990). Spectacles can be designed such that they are cosmetically acceptable and easy to use. Furthermore, the spectacles move together with the head and are therefore directed at the target, because people usually look at the person they are listening to.

2.4 Summary

This chapter has reviewed several topics in hearing which are closely related to the problem of speech intelligibility. This section briefly summarizes the most important conclusions.
The first section has described the human ability to localize sound sources. People need localization not only to understand group conversations and to be warned of danger; localization is also part of experiencing the environment in a natural way. Localization in the horizontal plane is realized by binaural hearing: Interaural Time Differences and Interaural Level Differences are used to localize sound sources. Localization in the vertical plane is realized with the spectra of the ear signals, which are shaped differently for each direction. Normal-hearing persons can localize sources with great accuracy and with little effort. Most hearing-impaired persons can localize sound sources in the horizontal plane, but with deteriorated accuracy and with more effort than normal-hearing people. Most hearing-impaired persons cannot localize sound sources in the vertical plane due to affected hearing in the 4 to 6 kHz frequency range. The use of a hearing aid deteriorates horizontal localization and removes almost all vertical localization. Binaural hearing aids can improve horizontal localization.
The second section has described the auditory system's ability to understand speech in quiet as well as in noisy environments. Tests have been developed to measure intelligibility in quiet as well as in noise, because measurements of speech intelligibility in quiet do not predict speech intelligibility in noise. Measurements of speech intelligibility in noise have shown that small changes in SNR result in large changes in intelligibility. Binaural hearing improves speech intelligibility in noise; this binaural effect is also called the squelch or cocktail party effect.
People with deteriorated speech intelligibility in noise need an improvement in SNR of at least 5 dB to obtain substantial benefit.
The third section has described the use of hearing aids and the current hearing aid technology. The necessary improvement in SNR is not delivered by conventional hearing aids, because those hearing aids are only optimized to improve speech intelligibility in quiet. Two important trends can be seen in the hearing aid industry: the use of directional microphones to improve speech intelligibility in noise, and the use of Digital Signal Processors, which allows much more complicated signal processing techniques to be implemented in hearing aids. Assistive listening devices are effective, but they are usually unpopular because they are conspicuous and therefore not cosmetically acceptable. An assistive listening device which improves speech intelligibility in noise should have the following features in order to be successful:
• improve the SNR by at least 5 dB;
• enable binaural hearing;
• be easy to use and cosmetically acceptable.
This thesis presents such a device. It is integrated into a pair of spectacles and it uses multiple-microphone or array techniques to improve the SNR. The theory of array processing technology is described in the next chapter.


3 Array technology

This chapter explains the theory of array technology in acoustics and the current state of the art, with a focus on array technology for hearing-aid applications. The theory is a summary of the theory of general acoustics and array processing; a more extensive overview can be found in Berkhout (1987) and Ziomek (1995). An overview of the symbols used and the mathematical definition of the Fourier transform can be found in the "List of symbols, abbreviations, and definitions" on page 177.
Section 3.1 explains the theory of continuous one-dimensional receivers and of discrete one-dimensional receivers, which are usually called arrays. Section 3.2 describes the requirements the arrays need to meet in this application. Section 3.3 and Section 3.4 formulate the mathematical background to describe different performance measures of the array. Section 3.5 describes different methods to achieve directivity with microphone arrays. Section 3.6 compares these directivity methods and illustrates their properties with some examples from the literature. Section 3.7 summarizes this chapter and shows how array processing is used in this thesis.

3.1 One-dimensional receiver arrays

This section describes the theory of one-dimensional or line receivers. These receivers are applied in air, which is assumed to be an isotropic, homogeneous, linear medium. The section explains the response of a line receiver to one source and presents the Fraunhofer approximation, which allows the directivity pattern of a line receiver to be written as a Fourier transform of its spatial sensitivity distribution. Finally, it describes the difference between continuous line receivers and discrete line receivers (arrays).


3.1.1 Acoustic waves and sources

In acoustics, two fundamental wave forms are distinguished: the plane wave and the spherical wave. An example of a plane wave is shown in Figure 3.1.

Figure 3.1: Plane wave traveling in the direction of the propagation vector $\vec{k}$. The vector $\hat{n}$ is the unit vector in the direction of $\vec{k}$ and $k$ is the magnitude of $\vec{k}$.

The plane wave travels in the direction of the propagation vector $\vec{k}$. The magnitude of $\vec{k}$ is the wave number and can be written as

$$k = |\vec{k}| = \frac{\omega}{c} = \frac{2\pi f}{c} \qquad (3.1)$$

where $\omega$ is the radial frequency, $f$ is the frequency, and $c$ is the wave propagation velocity. The acoustic pressure of the plane wave at position $\vec{r}$ is described in the frequency domain by the following equation:

$$P(\vec{r},\omega) = S(\omega)\,e^{-j\vec{k}\cdot\vec{r}}. \qquad (3.2)$$

Here $\cdot$ denotes the inner product of two vectors, $j$ is the imaginary unit, and $S(\omega)$ is the spectrum of the signal that is transported by the wave. The time-domain solution is obtained by using the temporal Inverse Fourier Transform $F_t^{-1}$ as defined on page 184, giving

$$p(\vec{r},t) = s\left(t - \frac{\vec{r}\cdot\hat{n}}{c}\right) \qquad (3.3)$$

where $\hat{n}$ is the unit vector in the direction of $\vec{k}$. The spherical wave form is the wave form generated by a monopole point source. If there is a monopole at the origin, then the acoustic pressure of the spherical wave at position $\vec{r}$ is described in the frequency domain by the following equation:

$$P(\vec{r},\omega) = S(\omega)\,\frac{e^{-jkr}}{r}. \qquad (3.4)$$

Here $r$ is the distance to the sound source. The time domain solution is obtained by applying


the Inverse Fourier Transform to Equation (3.4):

$$p(\vec{r},t) = \frac{s\left(t - \dfrac{r}{c}\right)}{r}. \qquad (3.5)$$

Equation (3.5) shows that moving away from a monopole source has two effects on the sound pressure:
1. the sound pressure is inversely proportional to the distance r;
2. the sound pressure is delayed by r/c, due to the finite propagation velocity c of sound waves.
The response of a continuous line receiver to sound incident from a monopole is described in the next section.

3.1.2 Continuous line receiver

Figure 3.2 shows a monopole source in point A $(x_A, y_A, z_A)$ and a continuous line receiver.

Figure 3.2: A line receiver through the origin O of a cartesian coordinate system with a monopole source in point A. The length of the receiver is L; the position of the monopole is described by its distance to the origin $r_A$ and the angles $\theta_A$ and $\phi_A$.

The line receiver has length L and is placed on the x-axis symmetrically around the origin of the cartesian coordinate system. The receiver consists of a continuous line of infinitely small omnidirectional receivers whose outputs are summed. For a continuous receiver line with a distribution of omnidirectional receivers, the output U(ω) for sound incident from monopole A is mathematically expressed by




$$U(\omega) = S(\omega)\int_{-\infty}^{\infty} F(x,\omega)\,\frac{e^{-jkr(x)}}{r(x)}\,dx. \qquad (3.6)$$

Here r(x) is the distance from the source to coordinate x on the x-axis, and F(x, ω) represents the frequency-dependent line receiver sensitivity, with F(x, ω) = 0 outside the receiver. Equation (3.6) states that the output of a linear array can be calculated by integration of all receiver signals. Notice that the distance r(x) is different for each part of the line receiver; hence, this yields a complex calculation. Next, an approximation is presented which simplifies Equation (3.6) by linearizing the phase.
Figure 3.2 shows that the distance $r_A$ between A and the origin O can be described in cartesian coordinates:

$$r_A = r(x_A, y_A, z_A) = \sqrt{x_A^2 + y_A^2 + z_A^2}, \qquad (3.7)$$

and that the position vector $\vec{r}_A$ can be described in spherical coordinates:

$$\vec{r}_A = \begin{pmatrix} r_A \sin\theta_A \cos\phi_A \\ r_A \sin\theta_A \sin\phi_A \\ r_A \cos\theta_A \end{pmatrix}. \qquad (3.8)$$

Likewise, the distance r(x) between A and a point on the line L is defined as follows:

$$r(x) = r = \sqrt{(x_A - x)^2 + y_A^2 + z_A^2}. \qquad (3.9)$$

For small values of x around the origin, the phase of Equation (3.6) may be linearized:

$$kr \approx k\left(r_A + \left.\frac{\partial r}{\partial x}\right|_O x\right). \qquad (3.10)$$

The partial derivative can be computed from Equation (3.9) and Equation (3.8):

$$\left.\frac{\partial r}{\partial x}\right|_O = \left.-\frac{1}{2r}\,2(x_A - x)\right|_O = -\frac{x_A}{r_A} = -\cos\phi_A \sin\theta_A. \qquad (3.11)$$

After substituting Equation (3.11) in Equation (3.10), the linearized phase can be described by

$$kr \approx kr_A - k \sin\theta_A \cos\phi_A\, x \qquad (3.12)$$

or

$$kr \approx kr_A - k_x x, \qquad (3.13)$$

where $k_x = k \sin\theta_A \cos\phi_A$. Substituting the above results in Equation (3.6), and assuming that the amplitude can be approximated as $r \approx r_A$, yields
$$U(\omega) = S(\omega)\,\frac{e^{-jkr_A}}{r_A}\int_{-\infty}^{\infty} F(x,\omega)\,e^{jk_x x}\,dx \qquad (3.14)$$

or

$$U(\omega) = S(\omega)\,\frac{e^{-jkr_A}}{r_A}\,F_x\{F(x,\omega)\} \qquad (3.15)$$



∫ F ( x, ω )e

jkxm x

dx

,

(3.16)

–∞

in which $k_{xm} = k \sin\theta_m \cos\phi_m$. The output of a receiver array depends on the angles of the sources with respect to the receiver, the source spectra $S_m(\omega)$, the geometrical properties of the receiver, and the frequency responses of the individual receiver elements.
The Fraunhofer approximation is an important simplification, but it is only valid when the phase over the array can be linearized without introducing too large an error. The size of this error


can be determined with the use of Figure 3.3.

Figure 3.3: Line receiver with length L and monopole A at distance $r_A$. The Fraunhofer approximation introduces an error δ.

The Fraunhofer approximation treats waves from monopole A as plane waves. This introduces a maximum error δ, which can be approximated by using Pythagoras:

$$\delta = \sqrt{r_A^2 + \left(\frac{L}{2}\right)^2} - r_A \approx \frac{L^2}{8 r_A}. \qquad (3.17)$$

Berkhout (1987) states that this error is acceptable when the phase error $k\delta \ll \pi/4$, or

$$r_A \gg \frac{L^2}{\lambda_{min}}, \qquad (3.18)$$

where $\lambda_{min}$ denotes the minimum wavelength of interest. All sources complying with Equation (3.18) are in the Fraunhofer or far-field region of the receiver; Equation (3.18) is also called the far-field criterion.

3.1.3 Directivity pattern

The directivity pattern Γ(θ,φ,ω) of a receiver is defined by splitting Equation (3.16) into a source-dependent and a receiver-dependent part:

$$U(\omega) = \sum_m S_m(\omega)\,\frac{e^{-jkr_m}}{r_m}\,\Gamma(\theta_m, \phi_m, \omega) \qquad (3.19)$$

with

$$\Gamma(\theta, \phi, \omega) = \int_{-\infty}^{\infty} F(x,\omega)\,e^{jk(\cos\phi \sin\theta)x}\,dx = F_x\{F(x,\omega)\}. \qquad (3.20)$$

The directivity pattern describes the directional behavior of the receiver independently of the incident wave field: |Γ(θ,φ,ω)| shows the plane-wave amplitude in a receiver array as a function of the angles of incidence φ and θ. Notice that the definition of the directivity pattern is given in the Fraunhofer region; therefore, the directivity pattern is obtained by a spatial Fourier Transform of F(x, ω).


As an example, a function F(x, ω) which does not depend on ω is defined as:

$$F(x,\omega) = \begin{cases} 1 & |x| \le L/2 \\ 0 & |x| > L/2 \end{cases} \qquad (3.21)$$

The directivity of this line receiver in the x-z plane can be calculated with the spatial Fourier transform and $\phi = 0$. The spatial Fourier Transform of this rectangular window function is a sinc function:

$$\Gamma(\theta, \phi = 0, \omega) = \frac{\sin\left(k_x \frac{L}{2}\right)}{k_x / 2} \quad \text{with } k_x = k \sin\theta. \qquad (3.22)$$
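Equation (3.22) can be checked numerically. A minimal sketch, assuming NumPy is available (note that np.sinc(u) is the normalized sinc, sin(πu)/(πu), so the argument $k_x L/2$ must be divided by π):

```python
import numpy as np

def gamma_rect(sin_theta, f, L=0.14, c=344.0):
    """|Gamma| of Equation (3.22) for a rectangular window of length L,
    normalized to 1 at broadside.  np.sinc(u) = sin(pi*u)/(pi*u), hence
    the division of k_x*L/2 by pi below."""
    kx = 2.0 * np.pi * f / c * np.asarray(sin_theta, dtype=float)
    return np.abs(np.sinc(kx * L / (2.0 * np.pi)))

f = 4500.0
lam = 344.0 / f
on_axis = gamma_rect(0.0, f)            # 1.0: no attenuation towards the target
first_null = gamma_rect(lam / 0.14, f)  # ~0 at sin(theta) = lambda/L
```

This confirms the behavior of Figure 3.4b: the pattern is 1 at broadside and the first zero crossing lies at sin θ = λ/L.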

This spatial Fourier Transform is also depicted in Figure 3.4.

Figure 3.4: a) Line receiver of length L with a constant F(x, ω) across the receiver. b) Directivity pattern of the line receiver as a function of sin θ.

The directivity pattern in Figure 3.4b depends on the length L, the wavelength λ, and the angle of the incident sound wave. This means that the directivity pattern is frequency-dependent, which is also shown by depicting the directivity pattern in another way in Figure 3.5.


Figure 3.5: a) General layout of the kx-k diagram. b) kx-k diagram of a continuous line receiver.

The kx-k diagram is a way to show the frequency dependency of a line receiver. The diagram is only shown for $|k_x| \le k$, because $k_x/k = \sin\theta$ has to be between -1 and 1 to represent a real angle. This means that the kx-k diagram shows the directivity pattern for the angles from -90° ($k_x = -k$) to 90° ($k_x = k$). Due to symmetry, the array is not able to distinguish between sound coming from the front and from the back; hence, the directivity pattern from -90° to 90° (front) is the same as the directivity pattern from 90° to -90° (back). A straight line through the origin of the kx-k diagram represents the directivity pattern in one direction as a function of k. A horizontal line through the kx-k diagram represents the directivity pattern at one k-value or frequency.
Figure 3.5b shows the kx-k diagram of the line receiver. The legend shows that a black shade corresponds with high sensitivity and a white shade with low sensitivity. It is easy to see that there are more white shades at higher k-values; thus, the directivity increases for higher k-values (or frequencies). The kx-k diagram shows the directivity pattern as a function of frequency and angle, but it is not easy to extract information from the diagram. Figure 3.6 shows that this is much easier from


a polar diagram or a θ-f diagram of the directivity pattern.

Figure 3.6: Two representations of the directivity pattern of a line receiver in the x-z plane. Both diagrams show Γ(θ, φ = 0, ω) in dB with the maximum normalized to 0 dB. The length of the receiver is 0.14 m. a) Polar diagram for 1, 2, and 4 kHz. b) θ-f diagram.

Both diagrams show the directivity pattern Γ(θ, φ = 0, ω) as a function of the angle θ. In this example, the length of the receiver is L = 0.14 m. The polar diagram of Figure 3.6a shows the directivity pattern for the frequencies 1, 2, and 4 kHz. The directivity patterns become more directive towards 0 and 180 degrees, which means that the array is more sensitive for these directions. The lobe in the main direction 0° is called the main lobe or main beam. For frequencies above 2 kHz, there are also small lobes in other directions; these lobes are called side lobes. However, the level of these side lobes is 10 dB below the main lobe at a frequency of 4 kHz. A polar diagram shows the attenuation of the array versus the angle well. A better overview of the directivity pattern as a function of frequency is given in Figure 3.6b. This θ-f diagram shows the directivity pattern with gray shading: the width of the main beam of the array decreases with increasing frequency, and the directivity pattern contains more, smaller side lobes at high frequencies. In general, a polar diagram gives a better quantitative view of a directivity pattern, while a θ-f diagram gives a better view of the directivity pattern as a function of frequency. Clearly, Figure 3.6 shows that the receiver array is directive and that the amount of directivity depends on:
• the length of the array;
• the wavelength or the frequency of the sound;
• the geometrical and signal processing function F(x, ω).
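The onset of side lobes described above follows directly from Equation (3.22): a first null, and hence side lobes, only exists when sin θ = λ/L falls inside the visible region |sin θ| ≤ 1, i.e. when λ < L. A small sketch with the same L = 0.14 m:

```python
import math

L, c = 0.14, 344.0  # receiver length from Figure 3.6 and the speed of sound

def first_null_deg(f):
    """Angle of the first null of the rectangular-aperture pattern, at
    sin(theta) = lambda/L; returns None when lambda/L > 1, i.e. when the
    main lobe fills the whole front half-plane and no side lobes occur."""
    s = (c / f) / L
    return math.degrees(math.asin(s)) if s <= 1.0 else None

null_1k = first_null_deg(1000.0)  # None: lambda/L is about 2.5
null_4k = first_null_deg(4000.0)  # close to 38 degrees
```

With L = 0.14 m the transition lies at f = c/L ≈ 2.5 kHz, consistent with the observation that side lobes only appear for frequencies above about 2 kHz.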


The directivity also depends on whether discrete arrays are used instead of continuous line receivers. This is discussed in the next section.

3.1.4 Discretization of the line receiver

Continuous line receivers are rarely used in acoustics. Instead, an array is used. An array can be thought of as a sampled continuous receiver, that is, a continuous receiver that is excited only at points or in localized areas. An array consists of individual electroacoustic transducers, e.g. microphones. The sampling or discretization of continuous receivers is called spatial sampling, which can be described by (Berkhout, 1997):

$$F_{\Delta x}(x,\omega) = F(x,\omega)\,\Delta x \sum_n \delta(x - n\,\Delta x) = \sum_n F_n(\omega)\,\delta(x - n\,\Delta x), \qquad (3.23)$$

where δ(x) is the delta pulse, Δx is the sampling distance, and $F_n(\omega) = \Delta x\,F(n\,\Delta x, \omega)$ is called the nth sample or microphone.
Section 3.1.3 showed that Γ(θ,φ,ω), i.e. the array response to a plane wave arriving from the direction (θ,φ), is obtained by the spatial Fourier Transform of F(x, ω). For a continuous line receiver on the x-axis, this means that the directivity pattern in the x-z plane ($\phi = 0$) can be described as a function of $k_x$:

$$\Gamma(k_x, \omega) = F_x\{F(x,\omega)\}. \qquad (3.24)$$

The directivity pattern of a discrete array can be represented by $\Gamma_{\Delta x}$ as follows:

$$\Gamma_{\Delta x}(k_x, \omega) = F_x\{F_{\Delta x}(x,\omega)\} = \sum_n \Gamma\left(k_x - n\,\frac{2\pi}{\Delta x},\, \omega\right). \qquad (3.25)$$

In the remainder of this thesis, the subscript 'Δx' of $\Gamma_{\Delta x}$ is omitted, because from this point on only discrete arrays are considered. Equation (3.25) shows that the directivity pattern is periodic in $k_x$, which is also depicted in Figure 3.7.

Figure 3.7: a) kx-k diagram of a continuous line receiver. b) kx-k diagram of a discrete line array with spatial sampling distance Δx. The dark patches are the areas which are affected by spatial aliasing.


Figure 3.7a shows the kx-k diagram of a continuous line receiver. After sampling, the kx-k diagram has become periodic with period $2\pi/\Delta x$, see Figure 3.7b. This results in parts of the triangles overlapping each other, as shown with dark patches. This overlapping is called spatial aliasing, and it results in a loss of directional information of the incident wave field in the regions of overlap. However, the kx-k diagram is not disturbed up to the horizontal dotted line, which means that for $f < c/2\Delta x$ the response of a discrete array is identical to that of a continuous array. So, if spatial aliasing is to be avoided for frequencies up to $f_{max}$, the spacing of the microphones should meet the following condition:

$$\Delta x \le \frac{c}{2 f_{max}} = \frac{1}{2}\lambda_{min}, \qquad (3.26)$$

where $\lambda_{min}$ is the minimum wavelength present in the wave field.
This section has shown that arrays can be used for directive signal reception. The next section describes what kind of arrays are used in this thesis to improve speech intelligibility in noise.
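The aliasing condition of Equation (3.26) can be illustrated by evaluating the sampled directivity pattern directly. The sketch below assumes NumPy and uses an illustrative uniform array of five microphones with 35 mm spacing; at f = c/Δx a grating lobe with full sensitivity appears at 90°, while below c/2Δx the endfire response stays well attenuated:

```python
import numpy as np

def gamma_discrete(sin_theta, f, n_mics, dx, c=344.0):
    """|Gamma| of a uniformly spaced line array with unit weights,
    evaluated as the sum over the microphone positions and normalized
    to 1 at broadside (the sampled form behind Equation (3.25))."""
    kx = 2.0 * np.pi * f / c * sin_theta
    x = (np.arange(n_mics) - (n_mics - 1) / 2.0) * dx
    return abs(np.sum(np.exp(1j * kx * x))) / n_mics

dx, n = 0.035, 5
f_safe = 344.0 / (2.0 * dx)   # highest alias-free frequency, c/(2*dx)
f_alias = 344.0 / dx          # one octave higher: a grating lobe appears
endfire_safe = gamma_discrete(1.0, f_safe, n, dx)
endfire_alias = gamma_discrete(1.0, f_alias, n, dx)  # full sensitivity at 90 deg
```

At f_alias all microphone phases differ by exact multiples of 2π at endfire, so the array can no longer distinguish that direction from broadside, which is precisely the directional ambiguity shown by the overlapping triangles in Figure 3.7b.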

3.2 Targets and preliminary choices

The previous section showed that arrays can realize directivity: it is possible to amplify sound from one direction while attenuating sound from other directions. This property of microphone arrays is used in this thesis to improve speech intelligibility in noisy environments. It is the goal of this thesis to design a microphone array which can be used in daily life as an assistive listening device. Therefore, the microphone array has to comply with certain requirements.

A. The array is integrable into a normal pair of spectacles
There are two reasons to integrate the array into a normal pair of spectacles:
• A pair of spectacles is widely used and it is cosmetically acceptable.
• A wearer of spectacles usually faces the person whom he wants to hear. Therefore, the position of the array is fixed with regard to the desired direction.


The dimensions of a normal pair of spectacles are shown in Figure 3.8.

Figure 3.8: a) Dimensions of an average pair of spectacles in mm. b) Dimensions of a commercially available arm of a pair of spectacles with microphones integrated.

The microphone array can be integrated into the front or into the arms of a pair of spectacles, see Figure 3.8a. The array in the front is called a broadside array, because it is perpendicular to the direction of the desired speaker, who is in front of the array wearer. The length of the broadside array is limited to 140 mm. The array in the arm is called an endfire array, because it is collinear with the direction of the desired speaker. A detailed overview of a commercial arm of a pair of spectacles is given in Figure 3.8b. The length of the array in the arm is limited to 72 mm: there is no more room at the front of the arm because of the hinge to the front of the spectacles, and there is no more room at the back of the arm because the arm is very small in the neighborhood of the ear. The top view of the commercially available arm shows that there is also little room to conceal the microphones. This means that the smallest available microphones have to be used; at the completion of this thesis, this is the Microtronic electret model 9355, see Figure 3.9.

Figure 3.9: Microtronic electret microphone model 9355. The dimensions are in mm (from Microtronic (1995)).

The dimensions of model 9355 just fit in the arm of the pair of spectacles. So, it is not possible to use directional microphones such as shown in Figure 1.3.


Besides these general restrictions on the broadside and endfire arrays, there are also some differences between them. The broadside array has some disadvantages in comparison with the endfire array. Firstly, the broadside array is right in the face of the wearer, so the cosmetic restrictions on this array are more severe than on the endfire array, which is at the side of the head. Secondly, the wearer has limited choice in the frame of the spectacles when a broadside array has to be integrated; the endfire array, however, can be built into a separate arm which can be connected to different frames. Thirdly, the wiring of the broadside array has to go through the hinge of the pair of spectacles. A previously introduced pair of spectacles with microphones at the front had to be withdrawn from the market because the wiring snapped due to the repeated bending of the hinge (Den Dulk, 1992). Fourthly, the broadside array is close to the wearer's mouth: the wearer of a broadside array will hear his own voice much louder than with an endfire array or with a conventional hearing aid. An advantage of the broadside array is that it is closer to the target than an endfire array or a conventional hearing aid. Furthermore, the head is a screen against noise coming from the back; this screening is frequency-dependent, as is shown by the measurements of the ILD in Figure 2.2b on page 11. In general, an endfire array is more likely to provide a cosmetically acceptable array.

B. The array provides directivity in the frequency range of 180-4500 Hz
Section 2.2 has shown that the frequency range from 180-4500 Hz is important for speech intelligibility in noise. The goal is to provide directivity in that frequency range, with an emphasis on the range of 1000-3000 Hz, which is the most important frequency range for speech intelligibility.
The restrictions on the length of the array and on the frequency range of interest enable the calculation of the beginning of the Fraunhofer region: with L = 0.14 m and $\lambda_{min} = c/f_{max} = 344/4500 = 0.076$ m,

$$r_A \gg \frac{L^2}{\lambda_{min}} = 0.26\ \text{m}. \qquad (3.27)$$

Clearly, the sources in daily life are much further away from the array wearer than 0.26 m, so it is safe to state that all sources are in the Fraunhofer region. It is also possible to calculate the maximum spacing of the microphones in the array to avoid spatial aliasing, using Equation (3.26):

$$\Delta x < 0.038\ \text{m}. \qquad (3.28)$$

Hence, the broadside array should contain at least 5 microphones and the endfire array should contain at least 3 microphones.
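These design numbers follow directly from Equations (3.18) and (3.26); a short sketch reproducing them:

```python
import math

c, f_max = 344.0, 4500.0
lam_min = c / f_max                    # ~0.076 m

L_broadside, L_endfire = 0.140, 0.072  # array lengths in m from Figure 3.8

# Far-field criterion (3.18): sources beyond ~0.26 m are in the Fraunhofer region.
r_far = L_broadside**2 / lam_min       # Equation (3.27)

# Anti-aliasing spacing (3.26) and the resulting minimum microphone counts
# (number of spacings needed to cover the length, plus one):
dx_max = lam_min / 2.0                 # ~0.038 m, Equation (3.28)
n_broadside = math.ceil(L_broadside / dx_max) + 1
n_endfire = math.ceil(L_endfire / dx_max) + 1
```

The counts come out at 5 microphones for the broadside array (four spacings of 35 mm) and 3 for the endfire array, matching the statement above.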


C. The array has a flat frequency response for sound incident from the target direction
The microphone array is not a stand-alone device, but a front-end for a hearing aid or other listening device which delivers the sound to the listener's ear, see Figure 3.10. To avoid conflicts between the array output and the listening device, the frequency response of the array for sound incident from the target direction should be flat. The array can be thought of as a sophisticated directional microphone for a conventional listening device: the spectrum of the desired sound is the same as with an omnidirectional microphone.

D. The array enables binaural listening
Chapter 2 has shown that binaural listening is important for speech intelligibility in noise, localization, and the natural experience of sound. Therefore, the array has to be designed such that there are different output signals for each ear. A schematic overview of the microphone array is given in Figure 3.10.

72

array processing for right ear

hearing aid or listening device

array processing for left ear

hearing aid or listening device

desired sound source 140

plane wave

Figure 3.10: Schematic overview of a person wearing the microphone array with the desired sound source in front of him. The microphone contains only omnidirectional microphones and it has different signal processing for the left ear and right ear signal. The output signals of the array are sent to different hearing aids or listening devices. The desired sound source as well as the noise sources are supposed to be in the Fraunhofer region and therefore their waves are seen as plane waves.

The next section defines the mathematical descriptions of these arrays which are used to analyze their performance.

3.3 Microphone array geometry This section sets a mathematical framework to analyze microphone arrays. Several geometrical solutions are described. Because the microphone arrays are discrete, the mathematical formulation is done with vector and matrix notation which leads to compact equations.


Microphone arrays which are used to pick up sound from one source (target) and to attenuate interfering sound from other directions are divided into two types:
• the broadside array, which is positioned perpendicular to the direction of the target;
• the endfire array, which is positioned collinear with the direction of the target.
Figure 3.11 shows a broadside array.

Figure 3.11: a) Signal processing scheme of a broadside array. b) Geometry of a broadside array.

Figure 3.11a shows the signal processing scheme of a broadside array with N microphones. The nth microphone signal is processed by Fn(ω), which can be seen as a sample of F(x, ω) of Section 3.1. Next, all processed signals are summed into one output signal. The different signal processing functions are gathered in one column vector F(ω):

F(ω) = [F(x1, ω), F(x2, ω), …, F(xN, ω)]^T = [F1(ω), F2(ω), …, FN(ω)]^T.    (3.29)

The vector F(ω) can be interpreted as a vector of frequency-dependent weights or filters for each microphone signal. In other words, F(ω) represents the filters on the microphone signals. Figure 3.11b shows that a plane wave incident from direction (θ, φ) on the different microphones of the array is described by:

W(θ, φ, ω) = [e^{j(ω⁄c)x1 sinθ cosφ}, e^{j(ω⁄c)x2 sinθ cosφ}, …, e^{j(ω⁄c)xN sinθ cosφ}]^T.    (3.30)

The vector W(θ, φ, ω) describes the relative propagation delays from the source at (θ, φ) to each microphone. The propagation delay vector for the target direction (0, 0) is usually denoted by a separate symbol W(ω):

W(0, 0, ω) = W(ω) = [1, 1, …, 1]^T.    (3.31)

Figure 3.12 shows the geometry of an endfire array.

Figure 3.12: a) Signal processing scheme of an endfire array. b) Geometry of an endfire array.

The signal processing of the endfire array is similar to that of the broadside array. Each microphone signal is processed by its own filter. All filters are described in the filter vector F(ω). The geometry of an endfire array differs from that of the broadside array; with the help of Figure 3.12b the propagation delay vector can be defined:

W(θ, φ, ω) = [e^{j(ω⁄c)z1 cosθ}, e^{j(ω⁄c)z2 cosθ}, …, e^{j(ω⁄c)zN cosθ}]^T.    (3.32)

The phase factors in W(θ, φ, ω) depend only on the angle θ because of symmetry. Again, a separate definition exists for the target direction:

W(0, 0, ω) = W(ω) = [e^{j(ω⁄c)z1}, e^{j(ω⁄c)z2}, …, e^{j(ω⁄c)zN}]^T.    (3.33)

The former descriptions of W are valid for omnidirectional microphones. Although the use of directional microphones is excluded in this thesis, see Section 3.2, microphones are sometimes unintentionally directional. For example, when the microphone is integrated into a baffle, the baffle can cause the microphone to be directive when the size of the baffle is not small in relation to the wavelength. In that case, the microphone can be modeled as a directional microphone. Then, the phase factors are multiplied by the individual directivity patterns Γn(θ, φ, ω), n = 1…N, of the microphones. This yields for the endfire array:

W(θ, φ, ω) = [Γ1(θ, φ, ω)e^{j(ω⁄c)z1 cosθ}, Γ2(θ, φ, ω)e^{j(ω⁄c)z2 cosθ}, …, ΓN(θ, φ, ω)e^{j(ω⁄c)zN cosθ}]^T    (3.34a)

and for the broadside array:

W(θ, φ, ω) = [Γ1(θ, φ, ω)e^{j(ω⁄c)x1 sinθ cosφ}, Γ2(θ, φ, ω)e^{j(ω⁄c)x2 sinθ cosφ}, …, ΓN(θ, φ, ω)e^{j(ω⁄c)xN sinθ cosφ}]^T.    (3.34b)

The definitions of this section can be used to define the directivity pattern of the discrete array, similar to the definition of Section 3.1.3:

Γ(θ, φ, ω) = F^T(ω) W(θ, φ, ω),    (3.35)

where T represents the transpose. Remember that Γ(θ, φ, ω) is the array frequency response to a plane wave incident from (θ, φ). Therefore, the array frequency response ΓT(ω) to plane waves from the target direction, i.e. (0, 0) for an endfire array on the z-axis, is written as:

ΓT(ω) = F^T(ω) W(ω).    (3.36)

ΓT(ω) is called the array target response. In many situations its equivalent in dB is used:

10 log ( |ΓT(ω)|² ),    (3.37)

where log(·) is the logarithm with base 10. Hence, the directivity pattern is determined by the propagation delay vector W(θ, φ, ω) and the filter vector F(ω). Independent of the choice of the filter vector, it is already possible to mention some characteristics of broadside and endfire arrays. The broadside array and the endfire array are both cylindrically symmetric. The broadside array is cylindrically symmetric around the x-axis, which implies that the broadside array cannot differentiate between sound coming from the front (target) and sound coming from the back, see Figure 3.11. The endfire array is cylindrically symmetric around the z-axis and its directivity pattern is independent of the angle φ. This implies that the endfire array cannot differentiate between sound coming from the left (+θ) and sound coming from the right (−θ). Equation (3.32) confirms this, because cos(θ) = cos(−θ). The beamwidth of the main lobe in the horizontal x-z plane also depends on the propagation delay vector W(θ, φ, ω): the beam width of a broadside array is related to sinθ and the beam width of an endfire array is related to cosθ. In general, the main lobe of a broadside array is narrower than the main lobe of an endfire array, provided that the length and the number of microphones of the arrays are the same. The vector definitions of this section are used in the next section to describe array performance in matrix form.
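As a numerical illustration of these definitions, the steering vector of Equation (3.30) and the target response of Equation (3.36) can be evaluated directly. The sketch below assumes uniform delay-and-sum weights (introduced later in Section 3.5.1); the function name is ours:

```python
import numpy as np

def steering_broadside(x, theta, phi, f, c=344.0):
    # Propagation delay vector W(theta, phi, omega) for a broadside
    # array on the x-axis, cf. Eq. (3.30).
    w = 2 * np.pi * f
    return np.exp(1j * (w / c) * x * np.sin(theta) * np.cos(phi))

x = np.linspace(0, 0.14, 5)      # 5 microphones over 14 cm
F = np.full(5, 1 / 5)            # uniform weights (rectangular window)

# Target direction (0, 0): W is all ones, so Gamma_T = F^T W = 1 (flat response).
gamma_target = F @ steering_broadside(x, 0.0, 0.0, 2000.0)
# An off-axis plane wave is attenuated:
gamma_side = F @ steering_broadside(x, np.pi / 3, 0.0, 2000.0)
```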

3.4 Measures in array technology

Diagrams of the directivity pattern are visual aids to analyze the directivity of a microphone array. Several indices have also been defined to quantify the performance of the array. Two indices describe the amount of directivity: the directivity index and the front-random index.

3.4.1 Directivity index and front-random index

The directivity index is a dB measure of the directivity factor. The directivity factor is a measure of the directivity of acoustical devices (Beranek, 1954). The directivity factor Q is defined as the ratio of the maximum of the squared array response |Γ(θ, φ, ω)|² with respect to the angles θ, φ to the average squared array response |Γ(θ, φ, ω)|² due to sound incident from all directions:

Q(ω) = max_{θ,φ} { F^H(ω) W*(θ, φ, ω) W^T(ω, θ, φ) F(ω) } ⁄ ( F^H(ω) Szz(ω) F(ω) ),    (3.38)

where * represents the complex conjugate and H represents the Hermitian (complex conjugate) transpose. Szz(ω) is the cross-spectral density matrix for the background noise:

Szz(ω) = [Smn] = (1 ⁄ 4π) ∫₀^{2π} ∫₀^{π} Wm(θ, φ, ω) Wn*(θ, φ, ω) sinθ dθ dφ.    (3.39)

The subscripts m and n are matrix indices. This definition of Szz(ω) assumes that the background noise is uniform and isotropic. In other words, the background noise is assumed to be a diffuse sound field. For a broadside or endfire array with omnidirectional microphones, Smn becomes:

Smn = sin[ k(dm − dn) ] ⁄ [ k(dm − dn) ],    (3.40)

where dm is the position of the mth microphone. The directivity factor is usually transformed to the directivity index QI:

QI = 10 log Q.    (3.41)

The directivity index is expressed in decibels. When the array is applied in a diffuse noise field, the directivity index gives the attenuation of the background noise by the array in dB. The second index is the front-random or target-random factor, which is defined as the ratio of the squared array response |ΓT(ω)|² due to sound incident from the target direction to the average squared array response |Γ(θ, φ, ω)|² due to sound incident from all directions:

QFR(ω) = ( F^H(ω) W*(ω) W^T(ω) F(ω) ) ⁄ ( F^H(ω) Szz(ω) F(ω) ).    (3.42)

There is also a front-random index that represents the front-random factor in decibels:

QFR,I = 10 log QFR.    (3.43)
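The directivity index of Equations (3.38)-(3.41) can be computed directly for a uniformly weighted broadside array in a diffuse noise field, using the sinc coherence of Equation (3.40). A sketch (the function name is ours; uniform weights assumed):

```python
import numpy as np

def directivity_index_broadside_ds(d, f, c=344.0):
    # Directivity index (dB) of a uniformly weighted (delay-and-sum)
    # broadside array with microphone positions d, in a diffuse noise field.
    k = 2 * np.pi * f / c
    dmn = d[:, None] - d[None, :]
    Szz = np.sinc(k * dmn / np.pi)     # Eq. (3.40); np.sinc(x) = sin(pi x)/(pi x)
    F = np.full(len(d), 1 / len(d))    # uniform weights
    # The target response F^T W is 1, so Q = 1 / (F^H Szz F), cf. Eq. (3.38).
    Q = 1.0 / np.real(F @ Szz @ F)
    return 10 * np.log10(Q)

d = np.linspace(0, 0.14, 5)
# At half-wavelength spacing Szz is the identity, so Q = N and Q_I = 10 log N.
di_high = directivity_index_broadside_ds(d, 344 / 0.07)
di_low = directivity_index_broadside_ds(d, 100.0)   # hardly any directivity
```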

In general, the front-random index equals the directivity index, because the array is usually designed such that the target direction is the most sensitive direction of the array. In this thesis, it is assumed that the front-random index is the same as the directivity index; it is explicitly mentioned if they are not. When microphone arrays are used to enhance speech intelligibility in noise, it is necessary to have one broadband measure to assess the performance of the array. Such a measure can be defined with the use of the intelligibility-weighted gain of Section 2.2.3. Hence the intelligibility-weighted directivity index is:

QI = Σ_{i=1}^{Nf} γi QI(fi)    (3.44)

and the intelligibility-weighted front-random index is:

QFR,I = Σ_{i=1}^{Nf} γi QFR,I(fi).    (3.45)
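Numerically, Equation (3.44) is just a weighted sum over frequency bands. In the sketch below the band weights and per-band values are illustrative placeholders, not the actual Stadler and Rabinowitz (1993) values:

```python
import numpy as np

f_bands = np.array([250, 500, 1000, 2000, 4000])   # band centres (Hz)
gamma = np.array([0.05, 0.15, 0.30, 0.35, 0.15])   # hypothetical weights, sum to 1
QI_band = np.array([0.5, 1.2, 2.8, 5.5, 8.1])      # example per-band DI (dB)

QI_weighted = float(gamma @ QI_band)               # Eq. (3.44)
```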

In this thesis, the weights of Stadler and Rabinowitz (1993) are applied.

3.4.2 Noise sensitivity

Microphone arrays are used to suppress unwanted noise sources. The noise sources introduce a correlation between the microphone signals, as described by the vector W. Besides these correlated noise sources, there is also uncorrelated noise in the microphone signals. This noise is called internal noise and it arises from each microphone's pre-amplifier or from wind turbulence. A measure to quantify the amplification of this uncorrelated noise by the array is the noise sensitivity. Since the array response to mutually uncorrelated white noise sources in front of the microphones equals F^H F, and the array response to signals from the target direction is F^T W, the noise sensitivity can be written as:

Ψ(ω) = ( F^H(ω) F(ω) ) ⁄ ( F^H(ω) W*(ω) W^T(ω) F(ω) ).    (3.46)

The noise sensitivity is usually also expressed in dB:

ΨI = 10 log Ψ(ω).    (3.47)
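For the uniform delay-and-sum weights of the next section, Equation (3.46) reduces to 1/N; a short check (broadside case, variable names ours):

```python
import numpy as np

N = 5
F = np.full(N, 1 / N)     # uniform delay-and-sum weights
W = np.ones(N)            # target steering vector of a broadside array

# Eq. (3.46): noise sensitivity = F^H F / |F^T W|^2
psi = (F.conj() @ F) / abs(F @ W) ** 2
psi_dB = 10 * np.log10(psi)   # -10 log N, cf. Eq. (3.50)
```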

Besides the amplification of the internal noise, the noise sensitivity also reflects another property: the robustness of the array (Cox, 1973). The term robustness relates to the sensitivity of the array to errors in the signal processing and to deviations in the microphones' frequency responses. This effect can be understood by realizing that, to some extent, these errors and deviations are also uncorrelated between the microphones. Arrays with a high noise sensitivity are not robust, which results in a large decrease in directivity when there are deviations in the microphone frequency responses or in the signal processing. The amount of allowable noise sensitivity in a microphone array has to be determined by taking the following factors into account:
• The internal noise of the microphones themselves. The total internal noise level is the sum of the internal noise level of the microphones plus the noise sensitivity. When the microphones have a high noise level themselves, the noise sensitivity must be restricted.
• The external noise sources. When the sound level of the external noise sources is low, the internal noise becomes audible and therefore little noise sensitivity is allowed. When there is a lot of external noise, the noise sensitivity can be much higher before it becomes audible.
• The deviations in the frequency responses of the microphones. Microphones do not have exactly the same frequency response due to small deviations in the fabrication process. These deviations lead to a deterioration of the directivity, and arrays with high noise sensitivity are more sensitive to them. Thus, the better the frequency responses of the microphones match, the more successfully array processing with a high noise sensitivity can be applied.
• The deviations in the signal processing electronics. The filters F(ω) can often not be implemented exactly due to mismatches in resistor or capacitor values. These mismatches have the same influence as mismatches in the microphones' frequency responses.
It is often very difficult to make a good estimate of these factors. Then, it has to be tested in practice whether the noise sensitivity is acceptable or not. The measures of this section play an important role in the analysis of array configurations in this thesis. They are also used in the next section, which describes several signal processing methods to achieve directivity with microphone arrays.

3.5 Focusing and beamforming methods

The amount of directivity depends on:
• the length of the array;
• the wavelength or the frequency of the sound;
• the position and the number of microphones;
• the direction of the target;
• the signal processing functions in F(ω).
The length of the array and the frequency range of the sound are prescribed in this problem. So the question is how to place the microphones and how to choose F(ω) to obtain as much directivity as possible. Beamforming methods can be divided into fixed and adaptive methods. Fixed beamforming methods have an F(ω) which is fixed in time. Adaptive beamforming methods adapt the filters F(ω) to the noise sources that are present. This section treats only fixed beamforming methods. For information on adaptive beamforming methods, the reader is referred to Widrow and Stearns (1985) and Monzingo and Miller (1980). There are roughly three fixed beamforming methods:
• the focusing, delay-and-sum or beamsteering method;
• the gradient method;
• the optimal beamforming method.

3.5.1 Focusing, delay-and-sum or beamsteering method

The focusing, delay-and-sum or beamsteering method has different names depending on the application, but the signal processing functions are the same.

A. Focusing


In general, focusing is a way to observe one specific point in space by remote sensing, see Figure 3.13.

Figure 3.13: Focusing with a broadside array of length L on a source A which is at distance rA from the array.

The microphone signals are processed such that the signals from source A are added constructively. It must be noted that focusing on a point requires a planar receiver array. As the microphone arrays in this project are one-dimensional, the array responses are cylindrically symmetric around the x-axis. Thus, the array shown in Figure 3.13 focuses on all points on a circle with radius rA. Focusing is only possible when the travel times from the source to the microphones depend on the distance rA. This has two consequences:
• focusing is not possible with an endfire array;
• focusing is not possible for sources in the Fraunhofer area.
It is possible to use the focusing method for short broadside arrays and for endfire arrays, but this will lead to an infinite focus length; the array has no depth discrimination. Then, the method is called the delay-and-sum method.

B. Delay-and-sum

Berkhout (1997) has formulated the focusing operator. The same operator is used to determine the filters for delay-and-sum processing:

Fds(ω) = W*(ω) ⁄ ( W^H(ω) W(ω) ).    (3.48)

Using Equation (3.31) and Equation (3.33), the following filters for an endfire or broadside array with N microphones are derived:

Fds,endfire = (1 ⁄ N) [1, e^{−j(ω⁄c)(z2−z1)}, …, e^{−j(ω⁄c)(zN−z1)}]^T or Fds,broad = (1 ⁄ N) [1, 1, …, 1]^T.    (3.49)

The first microphone (for the endfire array the hindmost one) is labelled as the reference microphone and therefore its filter is set to unity. The filters show that the processing of the delay-and-sum method is very simple. The microphone signals of a broadside array only have to be summed, and the microphone signals of the endfire array need only a delay, which can be implemented as a tapped time delay line. The delay line contains N−1 delays and after each delay another microphone signal is added to the line. Another advantage of the delay-and-sum method is that it minimizes the noise sensitivity. Substituting Equation (3.48) into Equation (3.46) yields for the noise sensitivity of an endfire array as well as a broadside array:

Ψ(ω) = 1 ⁄ N or ΨI(ω) = −10 log N.    (3.50)

This means that the delay-and-sum method improves the SNR of the output signal by 10 log N with respect to the SNR of one microphone signal. Notice that this SNR relates to the internal noise sources and not to the external noise sources. The suppression of the external noise sources is described by the directivity index. The directivity index versus frequency of a broadside array and of an endfire array using the delay-and-sum method is shown in Figure 3.14.

Figure 3.14: Directivity index versus frequency of arrays using the delay-and-sum method. a) Broadside array of 14 cm long with 5, 6, and 7 omnidirectional microphones. b) Endfire array of 7.2 cm long with 3, 4, and 5 omnidirectional microphones.

The broadside array has a length of 14 cm and the directivity index has been calculated with 5, 6, or 7 microphones, which means that spatial aliasing is avoided. The directivity index increases with frequency; below 2 kHz there is hardly any directivity. This is also the main disadvantage of the delay-and-sum method: the array has to be long in comparison with the wavelength to realize directivity. The number of microphones does not really influence the directivity. The simulations with the endfire array of 7.2 cm and 3, 4, or 5 microphones confirm the same conclusions. There is no directivity at low frequencies and the number of microphones does not influence the directivity. It is, however, striking that the directivity index of the endfire array is approximately as high as the directivity index of the broadside array, while the broadside array is about twice as long as the endfire array. In general, endfire arrays are more directive than broadside arrays. This is further discussed in Section 3.6. Sometimes the target is not right in front of the array. Then a variant of the delay-and-sum method is used, called the beamsteering method, which is described in the next section.

C. Beamsteering

Broadside and endfire arrays are usually used with the target at θ = 0°, see Figure 3.11 and Figure 3.12. Sometimes, broadside arrays are used with the target off-axis. Then the delay-and-sum method is called beamsteering. Endfire arrays cannot apply beamsteering, see Figure 3.12. When the beam is steered in the direction (θbeam, φbeam), there is also a beam in the direction (−θbeam, φbeam) due to symmetry. So the beamsteering filters Fbeam(ω) for the broadside array look like:

Fbeam(ω) = (1 ⁄ N) [1, e^{−j(ω⁄c)(x2−x1) sinθbeam cosφbeam}, …, e^{−j(ω⁄c)(xN−x1) sinθbeam cosφbeam}]^T.    (3.51)

Notice that beamsteering does not steer the beam to a single direction. Due to the circular symmetry around the x-axis, the beam is steered to every direction which fulfills the condition sinθ cosφ = sinθbeam cosφbeam. The filters for the delay-and-sum and beamsteering methods are essentially phase filters. The amplitude of the filters is the same for all microphone signals. In other words, there is a rectangular amplitude window over the array. It is well known that the form of an amplitude window in the time domain determines the form of the spectrum in the frequency domain (Harris, 1978). Because the directivity pattern is a Fourier transform of the filters Fn(ω) with respect to the spatial coordinate x or z, the same windowing techniques can be applied here. An amplitude window influences the trade-off between the beam width of the main lobe and the level of the side lobes of the directivity pattern. Some windows realize a small main lobe with high side lobe levels and other windows realize low side lobe levels but with a wider main lobe. There are amplitude windows which increase the directivity index a little in comparison with a rectangular window, but they do not realize broadband directivity (Soede, 1990). For a more detailed overview of amplitude windows for arrays, the reader is referred to Ziomek (1995).
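The steering filters of Equation (3.51) can be verified numerically: steering a 14 cm broadside array to 40° gives full response in the steered direction, while the broadside direction θ = 0 is attenuated. A sketch with our own variable names (φbeam = 0 assumed):

```python
import numpy as np

c, f = 344.0, 2000.0
x = np.linspace(0, 0.14, 5)            # 5-microphone broadside array
theta_beam = np.deg2rad(40)
w = 2 * np.pi * f

# Beamsteering filters, Eq. (3.51), with phi_beam = 0:
F = np.exp(-1j * (w / c) * (x - x[0]) * np.sin(theta_beam)) / len(x)

def response(theta):
    # Plane-wave steering vector, Eq. (3.30), evaluated at phi = 0.
    W = np.exp(1j * (w / c) * x * np.sin(theta))
    return abs(F @ W)

# response(theta_beam) is 1; every direction with the same value of
# sin(theta)cos(phi) gets the same response (a cone around the x-axis).
```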


This section has shown that the delay-and-sum method realizes directivity only at high frequencies. The next section describes a method which realizes broadband constant directivity for short endfire arrays.

3.5.2 Gradient method

This section presents a method which realizes broadband constant directivity for endfire arrays which are short in comparison with the wavelength. This method makes use of the spatial gradient of the sound field, which is calculated from the microphone signals. In general, an (N−1)th-order gradient can be calculated with an N-microphone array. Arrays using gradient processing are also called Jacobi arrays (Weston, 1986). This section focuses on the most popular gradient method: the first-order gradient method, which uses two omnidirectional microphones. The first-order gradient method makes use of the principle that approximations of both the pressure and a component of the particle velocity of the wave field can be obtained from two omnidirectional microphones. This principle is also used in 2-channel intensity probes, as described in Fahy (1995). Figure 3.15 shows a two-microphone array placed on the z-axis with a plane wave with propagation vector k.

Figure 3.15: A plane wave encounters two omnidirectional microphones on the z-axis. The distance between the microphones is L, the angle of the incident wave is θ.

The gradient method uses the pressure and the particle velocity in the direction collinear with the array to obtain directivity. The pressure P(ω), halfway between the microphones, is approximated as:

P(ω) = ( P1(ω) + P2(ω) ) ⁄ 2,    (3.52)

where P1(ω) and P2(ω) are the pressures measured by microphones 1 and 2 respectively. The particle velocity in the direction collinear with the array, i.e. the z-direction, is found using Newton's law:

Vz(r, ω) = ( −1 ⁄ (jωρ0) ) ∂P(r, ω) ⁄ ∂z,    (3.53)

where ρ0 is the density of the medium. The particle velocity component Vz(ω), halfway between the microphones, is then obtained by a finite-difference approximation of Equation (3.53):

Vz(ω) ≈ ( −1 ⁄ (jωρ0 L) ) ( P2(ω) − P1(ω) ).    (3.54)

The pressure of the plane wave at position r can be described by Equation (3.2):

P(r, ω) = S(ω) e^{−j k·r}.    (3.55)
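The finite-difference estimates of Equations (3.52) and (3.54) are easy to check numerically. The sketch below also combines them as (P + ρ0·c·Vz)/2, the usual first-order cardioid combination; that combination and the sign conventions are our assumptions here, since the derivation continues beyond this excerpt:

```python
import numpy as np

def pressure_and_velocity(theta, f, L=0.014, c=344.0, rho0=1.21):
    # Two mics at z = -L/2 and +L/2; unit plane wave from angle theta
    # gives P_n = exp(-j k z_n cos(theta)).
    k = 2 * np.pi * f / c
    p1 = np.exp(-1j * k * (-L / 2) * np.cos(theta))
    p2 = np.exp(-1j * k * (+L / 2) * np.cos(theta))
    P = (p1 + p2) / 2                                   # Eq. (3.52)
    Vz = -(p2 - p1) / (1j * 2 * np.pi * f * rho0 * L)   # Eq. (3.54)
    return P, Vz

rho0, c = 1.21, 344.0
P_f, V_f = pressure_and_velocity(0.0, 1000.0)
P_b, V_b = pressure_and_velocity(np.pi, 1000.0)
front = (P_f + rho0 * c * V_f) / 2    # ~1 towards the target
back = (P_b + rho0 * c * V_b) / 2     # ~0 towards the rear (cardioid null)
```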

Substituting this in Equation (3.52) and Equation (3.54) and assuming that kL/2 ≪ 1

(p>0.05). Therefore

6.2 Speech intelligibility test


the results of both groups may be compared. The main interest of this test was the difference in SRT between the diotic and the binaural condition. Figure 6.9 shows this difference for the broadside and endfire array.

Figure 6.9: The mean difference in SRT, with standard deviation, between diotic and binaural arrays. a) Broadside array. b) Endfire array.

Figure 6.9a shows that the SRTs for the binaural broadside array are higher than the SRTs for the diotic broadside array. This means that the ILD of the binaural broadside array is not advantageous for speech intelligibility in comparison with the diotic broadside array. The binaural broadside array can also be compared with the diotic presentation of the channel of the binaural broadside array with the best SNR for source angles of 15° and 30° (conditions 4, 7, and 8). The mean SRTs of the binaural broadside array are -7.9 dB for 15° and -10.5 dB for 30°. The mean SRTs of the diotic presentation of the channel with the best SNR are -8.6 dB for 15° and -10.6 dB for 30°. The difference of 0.7 dB for the noise source at 15 degrees is significant (p=0.017) but the difference of 0.1 dB for the noise source at 30 degrees is not significant (p>0.5). This suggests that the subjects did not use only the ear signal with the best SNR when listening to the binaural broadside array, which is in agreement with the results of the determination of the masking threshold in Section 4.2. In conclusion, the subjects were not able to use the better SNR to achieve a lower SRT. An explanation could be that the main lobe is at 15° (not 0°) and therefore the frequency response in the target direction has a low-pass characteristic with a -3 dB cut-off frequency at 4 kHz. Figure 6.9b shows that the SRTs for the binaural endfire array are lower than the SRTs for the diotic endfire array. This means that the ITD of the endfire array is advantageous for speech intelligibility. This binaural advantage is at least 5 dB for the noise source positions 30, 60, and 90 degrees, and there is no significant difference between these positions (one-way analysis of variance, p>0.1).


6 Perceptual evaluation of the binaural arrays

This binaural advantage due to the ITD is also very close to the binaural advantage found by Bronkhorst and Plomp (1988). Their binaural advantage due to natural ITD was also 5 dB and almost independent of the angle of the competing noise source, see Figure 2.7a.

6.3 Conclusions

This chapter has presented a perceptual evaluation of the binaural broadside array and the binaural endfire array as presented in Chapter 4 and Chapter 5. The perceptual evaluation consisted of a localization test and a speech intelligibility test. The localization test showed that localization in the horizontal plane is possible with both the binaural broadside and the binaural endfire array. However, localization with the binaural endfire array is a much more natural task and it leads to significantly smaller errors. This is probably due to the presence of the Interaural Time Difference in the binaural endfire array. Localization with the binaural broadside array improves after training of the subjects. The training ensures unambiguous localization in the horizontal plane for the binaural broadside array too. The binaural endfire array and the binaural broadside array have been tested in a speech intelligibility test with normal-hearing subjects. The Speech Reception Thresholds have been determined with one noise source at different angles (0, 30, 60, 90 degrees) for the binaural and diotic broadside array and for the binaural and diotic endfire array. Twenty normal-hearing subjects performed the test with the endfire arrays and twenty normal-hearing subjects performed the test with the broadside arrays. The most important results are:
• the SRT of the binaural broadside array is higher than the SRT of the diotic broadside array: the ILD is not advantageous for speech intelligibility;
• the SRT of the binaural endfire array is lower than the SRT of the diotic endfire array: the ITD is advantageous for speech intelligibility;
• the SRT of the diotic broadside array is lower than the SRT of the diotic endfire array, which corresponds with the smaller main lobe of the broadside array in comparison with the endfire array.
In conclusion, the binaural endfire array has better binaural properties than the binaural broadside array.
Therefore, the binaural endfire array is chosen as a first prototype for an assistive listening device based on microphone arrays. Furthermore, Section 3.2 already provided several arguments in favor of the endfire array, the most important being that the endfire array is cosmetically more acceptable. The binaural endfire array of Chapter 5 has broadband directivity with very simple signal processing. It is, however, not clear whether this array has optimal directivity and whether it is possible to improve this directivity without an excessive extension of the signal processing scheme. Chapter 7 describes the optimization of the directivity of the endfire array using recursive and non-recursive filters for the processing of the microphone signals.


7 Optimization of the directivity of the endfire array

Section 3.5 has presented three beamforming methods to realize directivity with a microphone array: the focusing method, the gradient method, and the optimal beamforming method. The optimal beamforming method realizes maximum directivity given a certain allowed noise sensitivity. However, this method does not indicate the implementation of these optimal filters. Moreover, it is not clear how the performance of the optimal beamforming method compares with the other two beamforming methods or, for example, the endfire array of Chapter 5 which realizes directivity with a combination of the focusing method and the gradient method. This chapter presents the research on the optimization of the directivity of the endfire array with four omnidirectional microphones as presented in Section 5.1.3 on page 87. The research consists of approximation methods to implement the optimal beamforming method with recursive filters or with non-recursive filters. Furthermore, it investigates the directivity of the array when it is mounted near the head of a KEMAR manikin. This research has been performed in close cooperation with Dion de Roo as part of his graduation project (de Roo, 1998).

7.1 Introduction

The endfire array of Chapter 5 realizes broadband directivity with only four omnidirectional microphones and a simple analog signal processing scheme. However, two questions remain:
1. what is the limit to the directivity of this array?
2. what kind of signal processing scheme is required to reach that limit?


The first question can be answered with the optimal beamforming method of Section 3.5. This method calculates the signal processing filters such that the directivity is optimal for a microphone array of a certain configuration and a certain allowed noise sensitivity. The signal processing filters are calculated with Equation (3.63):

Fo,β^T(ω) = ( W^H(ω) ( Szz(ω) + β(ω)I )^{−1} ) ⁄ ( W^H(ω) ( Szz(ω) + β(ω)I )^{−1} W(ω) ).    (7.1)
The vector W(ω) consists of the relative propagation delays of the microphones in the target direction, Szz(ω) is the cross-spectral density matrix, usually assumed to be uniform and isotropic, and β(ω) is the assumed relative self-noise of the microphones. A realistic value has to be chosen for β(ω), otherwise the noise sensitivity becomes too large, which results in a high internal noise level and a high sensitivity to deviations in the transfer functions of the microphones and filters. The optimal beamforming method was used to calculate the potential increase of the directivity of the endfire array in comparison with the PGP-method. The optimal filters were calculated with β(ω) chosen such that the noise sensitivity of the optimal endfire array matches the noise sensitivity of the endfire array with the PGP-method, see Figure 5.7 on page 89. The directivity index of this optimal array is shown in Figure 7.1.
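Equation (7.1) is straightforward to evaluate. The sketch below builds the regularized optimal beamformer for a hypothetical 10 cm, four-microphone endfire array in a diffuse noise field and compares its front-random factor with delay-and-sum; the array length, frequency, and β are illustrative values, not the thesis configuration:

```python
import numpy as np

def optimal_filters(z, f, beta, c=344.0):
    # Regularized optimal beamformer F^T of Eq. (7.1) for an endfire
    # array on the z-axis in a diffuse (uniform isotropic) noise field.
    k = 2 * np.pi * f / c
    W = np.exp(1j * k * z)                 # target steering vector, Eq. (3.33)
    dz = z[:, None] - z[None, :]
    Szz = np.sinc(k * dz / np.pi)          # diffuse-field coherence, Eq. (3.40)
    A = np.linalg.inv(Szz + beta * np.eye(len(z)))
    FT = (W.conj() @ A) / (W.conj() @ A @ W)   # row vector F^T
    return FT, W, Szz

z = np.linspace(0, 0.1, 4)                 # 4 microphones, 10 cm (illustrative)
FT, W, Szz = optimal_filters(z, 1000.0, beta=1e-2)

# Front-random factor Q_FR = |F^T W|^2 / (F^H Szz F), Eq. (3.42):
Q_opt = abs(FT @ W) ** 2 / np.real(FT.conj() @ Szz @ FT)

# Delay-and-sum reference, Eq. (3.49):
F_ds = W.conj() / len(W)
Q_ds = abs(F_ds @ W) ** 2 / np.real(F_ds.conj() @ Szz @ F_ds)
```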

Figure 7.1: Directivity index of the endfire array with four omnidirectional microphones with the optimal beamforming method (solid) and the PGP-method (dash-dot), provided that the noise sensitivity of the optimal beamforming method matches the noise sensitivity of the PGP-method.

Figure 7.1 shows clearly that the optimal beamforming method provides more directivity than the PGP-method at the same noise sensitivity. The intelligibility-weighted front-random index is also higher: 8.0 dB versus 7.1 dB. Hence, it is worthwhile to use the optimal beamforming method to increase the directivity of the endfire array. So the second question remains: what kind of signal processing scheme is necessary to implement these optimal filters? The optimal filters have to be implemented as linear time-invariant filters. The next section briefly describes linear time-invariant filters and their characteristics.

7.1.1 Linear Time-Invariant Filters

Linear time-invariant filters are treated in many textbooks, e.g. (Oppenheim and Schafer, 1975; Orfanidis, 1996), and many tools have been developed to design and analyze filters with a computer (MATLAB, 1996b). This section and the following two sections briefly describe some characteristics of these filters and refer to the literature for more detailed information. In general, filters can be divided into continuous-time and discrete-time filters. Continuous-time filters are usually implemented in analog electronics and discrete-time filters are usually implemented in digital electronics, for example in a Digital Signal Processor (DSP). Therefore this section refers to continuous-time filters as analog filters and to discrete-time filters as digital filters. Analog filters are described with a differential equation and digital filters with a difference equation:

y(t) = b_0 x(t) + b_1 (d/dt) x(t) + b_2 (d²/dt²) x(t) + … + b_K (d^K/dt^K) x(t)
       − ( a_1 (d/dt) y(t) + a_2 (d²/dt²) y(t) + … + a_M (d^M/dt^M) y(t) )   (7.2a)

and

y[n] = b_0 x[n] + b_1 x[n−1] + b_2 x[n−2] + … + b_K x[n−K]
       − ( a_1 y[n−1] + a_2 y[n−2] + … + a_M y[n−M] ).   (7.2b)

Here x(t) and y(t) are the input and output signals of the analog filter, and x[n] and y[n] are the input and output signals of the digital filter at sample number n. The b_k's and a_m's are the filter coefficients; K and M are the numbers of b_k's and a_m's. The order of the filter is the maximum of M and K.
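The difference equation (7.2b) maps directly onto a few lines of code. The sketch below (in Python, rather than the analog electronics or DSP code used in this chapter) is a naive direct-form implementation; `b` holds b_0…b_K and `a` holds a_1…a_M.

```python
import numpy as np

def difference_filter(b, a, x):
    """Evaluate Eq. (7.2b): y[n] = sum_k b_k x[n-k] - sum_m a_m y[n-m]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[m - 1] * y[n - m] for m in range(1, len(a) + 1) if n - m >= 0)
        y[n] = acc
    return y

# impulse response of y[n] = x[n] + 0.5 y[n-1]  (b0 = 1, a1 = -0.5)
impulse = np.zeros(5)
impulse[0] = 1.0
h = difference_filter([1.0], [-0.5], impulse)   # -> 1, 0.5, 0.25, 0.125, 0.0625
```

A single recursive coefficient already yields an infinitely long impulse response, which is why such filters are called IIR filters in the digital domain.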


These filters can also be depicted in a signal processing scheme, see Figure 7.2.

Figure 7.2: General structure of a linear time-invariant filter (from Orfanidis (1996)). a) Analog filter with complex variable s in the s-plane (Laplace transform). b) Digital filter with complex variable z in the z-plane (z-transform).

Here s and z are the complex variables in the s- and z-plane. The filters have a non-recursive and a recursive part. The non-recursive part is the part of the filter to the left of the summation sign and it operates on the input signal. The recursive part is the part of the filter to the right of the summation sign and it operates on the output signal. From these signal processing schemes, the transfer functions of the filters can be derived:

H(s) = ( b_0 + b_1 s + b_2 s² + … + b_K s^K ) / ( 1 + a_1 s + a_2 s² + … + a_M s^M )   (7.3a)

H(z) = ( b_0 + b_1 z^{−1} + b_2 z^{−2} + … + b_K z^{−K} ) / ( 1 + a_1 z^{−1} + a_2 z^{−2} + … + a_M z^{−M} )   (7.3b)

Here the denominator is the recursive part and the numerator the non-recursive part of the filter. Another representation of the transfer function is the pole-zero notation:

H(s) = b_K (s − z_1)(s − z_2)…(s − z_K) / [ a_M (s − p_1)(s − p_2)…(s − p_M) ]   (7.4a)

H(z) = b_0 (1 − z^{−1} z_1)(1 − z^{−1} z_2)…(1 − z^{−1} z_K) / [ (1 − z^{−1} p_1)(1 − z^{−1} p_2)…(1 − z^{−1} p_M) ].   (7.4b)
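The substitution z = e^{jΩ} mentioned below is easy to check numerically. The following sketch (illustrative code, not part of the thesis) evaluates the digital transfer function (7.3b) on the unit circle:

```python
import numpy as np

def freq_response(b, a, Omega):
    """H(e^{jOmega}) from Eq. (7.3b); b = [b0..bK], a = [a1..aM]."""
    z = np.exp(1j * Omega)
    num = sum(bk * z**(-k) for k, bk in enumerate(b))
    den = 1.0 + sum(am * z**(-m) for m, am in enumerate(a, start=1))
    return num / den

# one-pole example H(z) = 1 / (1 - 0.5 z^-1): gain 2 at DC, 2/3 at the Nyquist frequency
H_dc = freq_response([1.0], [-0.5], 0.0)
H_nyq = freq_response([1.0], [-0.5], np.pi)
```

The DC gain of 2 is consistent with the impulse response 1, 0.5, 0.25, … of the same filter, whose sum is 2.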

Here p_m is the mth pole and z_k is the kth zero of the transfer function. The frequency responses of these filters are obtained by substituting s = jω into H(s) and z = e^{jΩ} into H(z), where Ω is the digital frequency. The linear filters can also be divided in another way:
• recursive filters, which can be subdivided into pole-zero filters and all-pole filters (b_1 = b_2 = … = b_K = 0);
• non-recursive or all-zero filters (a_1 = a_2 = … = a_M = 0).
In practice, recursive filters can be implemented in analog as well as in digital electronics, whereas non-recursive filters can only be implemented in digital electronics. The next two sections briefly describe the properties of recursive and non-recursive filters.

7.1.2 Recursive filters: analog filters or digital Infinite Impulse Response Filters

Recursive filters can be implemented as analog filters as well as digital filters. Digital recursive filters are called Infinite Impulse Response (IIR) filters. It is possible to translate filters from the analog to the digital domain by approximating derivatives by differences or by using the bilinear transformation (Oppenheim and Schafer, 1975). This chapter only treats the analog recursive filters. The most important properties of recursive filters are:
• Recursive filters of low order can already realize rather sharp transfer functions.
• Recursive filters have little control over the phase of the frequency response, which is very important for array processing.
• Recursive filters are not always stable.
Several computational tools have been developed to design recursive filters (MATLAB, 1996b). Most of these tools design standard recursive filters such as low-pass or band-pass filters, or they design filters which only approximate the amplitude of the desired response and disregard the phase. Fortunately, there is also a tool which can approximate a desired (complex) frequency response by an analog recursive filter of a given order. This tool is non-linear: it needs several iterations and it does not guarantee that the global optimum is reached. It is used in Section 7.2 to approximate the filters of the optimal beamforming method with analog recursive filters of low order.

The non-recursive filter operates only on the input signal and is implemented in digital electronics. Digital non-recursive filters are usually called Finite Impulse Response (FIR) filters. The most important properties of FIR filters are:
• FIR filters require a much higher order than recursive filters to achieve a given level of performance of the frequency response.
• FIR filters need a large number of taps to be applicable at low frequencies.
• FIR filters have very good control over the frequency response, both in amplitude and phase.
• FIR filters are always stable.
In general, FIR filters are precise but expensive to implement, whereas recursive filters are cheap to implement but imprecise. As a rule of thumb, an FIR filter with K+1 coefficients or taps performs well from a minimum frequency f_min upwards if the period 1/f_min fits twice in the time interval (K+1)/f_s covered by the filter, where f_s is the sample frequency. In formula,

f_min ≈ 2 f_s / (K + 1).   (7.5)

If the number of taps exceeds several hundred, it is more efficient to filter in the frequency domain, i.e. to use a Fast Fourier Transform (Orfanidis, 1996). Many computational tools have also been developed to design and analyze FIR filters (MATLAB, 1996b). These tools are usually linear and therefore do not need iterations to obtain the final result. There is a tool which designs standard filters and a tool which designs filters with an arbitrarily shaped (complex) frequency response. The latter will be used in Section 7.3 to design the filters of the optimal beamforming method as digital FIR filters. Until recently, the hearing aid industry could not use FIR filters because:
1. FIR filters could not be miniaturized in a chip small enough to integrate in a hearing aid;
2. FIR filters could not be designed such that their power consumption is low enough for hearing-aid applications.
Recently, Starkey has been able to design a DSP chip which performs a filter operation with a 64-tap FIR filter and which can be integrated in a hearing aid (Edwards, 1988). This development shows that FIR filters are becoming a feasible option for hearing aids and assistive listening devices such as microphone arrays. Because of these fast developments, no attempt will be made to design short FIR filters; instead, the FIR filters will be designed to obtain maximum directivity for the endfire array regardless of the necessary computational power.
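The rule of thumb in Equation (7.5) can be checked with a few lines of arithmetic (illustrative only):

```python
fs = 16000.0                        # sample frequency (Hz)

def f_min(taps):
    """Lowest usable frequency of a FIR filter with `taps` coefficients, Eq. (7.5)."""
    return 2.0 * fs / taps

fmin_256 = f_min(256)               # lowest frequency covered by a 256-tap filter
taps_180 = 2.0 * fs / 180.0         # number of taps needed to reach down to 180 Hz
```

A 256-tap filter at 16 kHz reaches down to about 125 Hz, and roughly 178 taps are needed for 180 Hz, in line with the ≈177-tap figure quoted in Section 7.3.1.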

7.2 Optimization on the basis of recursive filters

This section describes the design of the filters for the optimal beamforming method using recursive filters. Section 7.2.1 describes the design method, Section 7.2.2 the implementation, and Section 7.2.3 the measurements.

7.2.1 Design method

The optimal filters F_{oT,β}(ω) = [ F_1(ω) … F_N(ω) ], calculated with Equation (7.1), have to be approximated with recursive filters. The method of MATLAB (1996b) to approximate an arbitrary complex frequency response with a recursive filter is a least-squares method which minimizes the following criterion:

min_{b_K, a_M} J(b_K, a_M) = Σ_{i=1}^{N_f} w_i² | F_n(ω_i) − H(ω_i, b_K, a_M) |².   (7.6)

Here H(ω_i, b_K, a_M) is the frequency response of the transfer function in Equation (7.3a) with coefficients b_K = [ b_0 b_1 b_2 … b_K ]^T and a_M = [ a_1 a_2 … a_M ]^T, w_i is the weight for the ith frequency point ω_i, and N_f is the number of frequency points. The method contains two algorithms:
• The first algorithm is linear: it creates a system of linear equations from Equation (7.6) and solves that system with a linear least-squares method. However, this solution is not guaranteed to be stable. The solution is stabilized by mirroring the unstable poles in the imaginary axis of the s-plane. This stabilized solution is the initial estimate for the second algorithm.
• The second algorithm is non-linear: it iterates towards the optimum solution with the damped Gauss-Newton method.
The goal is to approximate the optimal filter F_n(ω) with a filter of low order. This requires the filters to be rather smooth as a function of ω; otherwise they cannot be approximated with a low filter order. Equation (7.1) shows that the optimal filters depend on W(ω), S_zz(ω), and β(ω). Because W(ω) and S_zz(ω) are determined by the array configuration and the assumed isotropic noise field, the self-noise β(ω) is the only parameter that controls the smoothness of the filters F_{oT,β}(ω). But β(ω) also controls the noise sensitivity, which is written according to Equation (3.46) as:

Ψ(ω) = F^H(ω) F(ω) / [ F^H(ω) W*(ω) W^T(ω) F(ω) ].   (7.7)

Because the optimal beamforming method assures that the array target response F_{oT,β}(ω) W(ω) = 1, the noise sensitivity simplifies to

Ψ(ω) = F^H_{o,β}(ω) F_{o,β}(ω).   (7.8)
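The first, linear algorithm can be sketched as follows. It uses the classical linearization of the criterion in Equation (7.6): instead of |F − B/A|², the residual |B(s_i) − F_i A(s_i)|² is minimized, which is linear in the coefficients. This is a simplified sketch; the MATLAB tool additionally applies weighting, pole stabilization, and the Gauss-Newton refinement.

```python
import numpy as np

def fit_recursive(F, omega, K, M):
    """Linearized least-squares fit of H(s) = B(s)/A(s) to samples F at frequencies omega,
    with A(s) = 1 + a1 s + ... + aM s^M and B(s) = b0 + ... + bK s^K.
    Minimizes sum_i |B(s_i) - F_i A(s_i)|^2, which is linear in the coefficients."""
    s = 1j * omega
    cols = [s**k for k in range(K + 1)] + [-F * s**m for m in range(1, M + 1)]
    A = np.stack(cols, axis=1)
    Ar = np.vstack([A.real, A.imag])              # stack real/imag parts for a real solve
    rhs = np.concatenate([F.real, F.imag])
    theta, *_ = np.linalg.lstsq(Ar, rhs, rcond=None)
    return theta[:K + 1], theta[K + 1:]           # b = [b0..bK], a = [a1..aM]

# sanity check: recover a known first-order filter H(s) = (1 + 0.5 s)/(1 + 0.3 s)
omega = np.linspace(0.1, 10.0, 50)
F = (1 + 0.5j * omega) / (1 + 0.3j * omega)
b, a = fit_recursive(F, omega, K=1, M=1)
```

Because the target response here lies exactly in the model class, the linearized solve recovers the coefficients exactly; for the optimal beamforming filters it only provides the initial estimate.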

Hence, the noise sensitivity depends only on the filter functions: when the filters F_{o,β}(ω) are smooth as a function of ω, so is Ψ(ω). By choosing β as a function of the radial frequency ω, the noise sensitivity as well as the smoothness of the filters F_{oT,β}(ω) can be controlled. This is also illustrated in Appendix A.2, which approximates the optimal filters of a two-microphone endfire array with a Taylor expansion. When β(ω) = α(ωL/c)² is chosen, the Taylor approximation delivers smooth first-order analog filters, and the choice of α determines the directivity pattern. Unfortunately, the optimal filters of the endfire array with four microphones cannot be approximated as easily with a Taylor expansion, because the mathematical expressions for the filters F_{oT,β}(ω) become too complicated and the condition ωL/c « 1 no longer applies over the complete frequency range. Therefore, the optimal filters of the four-microphone endfire array are approximated with β(ω) = αω^γ, with 10^{−6} < α < 10^{−2} and 0.2 < γ < 4. The parameters α and γ are determined in a non-linear optimization routine with constraints, see Figure 7.3. Additionally, this choice for β(ω) assures that the noise sensitivity is inversely proportional to ω, so the shape of the noise sensitivity is similar to that of the endfire array with the PGP-method.

Figure 7.3: Optimization scheme for the approximation of the optimal filters of the endfire array with recursive filters. Starting from initial values of the filter orders K and M and the parameters α and γ in β(ω) = αω^γ, the optimal filters F_n(ω), n = 1 … N, are calculated and approximated by filters H_n(ω); α and γ are updated until the criterion Q_I is maximal and the constraints Q_I(f) > 5.5 dB and Ψ_I(f) < 18 dB for 0.1 kHz ≤ f < 5 kHz are satisfied.

The optimization routine is a constrained maximization algorithm from MATLAB (1996a): a criterion is maximized such that certain constraints are fulfilled. Here the criterion is the intelligibility-weighted directivity index Q_I, see Equation (3.44), and the constraints are:
• the directivity index Q_I(f) > 5.5 dB for 0.1 kHz ≤ f < 5 kHz;
• the noise sensitivity Ψ_I(f) < 18 dB for 0.1 kHz ≤ f < 5 kHz.
The first constraint assures that there are no unexpected declines in the directivity index at certain frequencies, and the second constraint limits the noise sensitivity to values comparable with the noise sensitivity of the PGP-method. The approximated filters have been calculated for the endfire array with four microphones for filter orders increasing from 2 to 5; Figure 7.4 shows the resulting directivity index, target response, and noise sensitivity.
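A minimal sketch of such a constrained search is given below: a grid over α and γ, a directivity and noise-sensitivity evaluation per frequency via Equation (7.1), and the two constraints stated above. The geometry, frequency grid, and scoring are simplified placeholders for the actual MATLAB routine, not a reproduction of it.

```python
import numpy as np

c, pos = 343.0, np.array([0.0, 0.016, 0.088, 0.104])    # assumed array geometry (m)
freqs = np.array([250.0, 500.0, 1000.0, 2000.0, 4000.0])

def di_psi(f, beta):
    """Directivity index and noise sensitivity (dB) of the optimal filters, Eq. (7.1)."""
    k = 2 * np.pi * f / c
    W = np.exp(-1j * k * pos)
    dist = np.abs(pos[:, None] - pos[None, :])
    Szz = np.sinc(k * dist / np.pi)                      # isotropic noise coherence
    row = W.conj() @ np.linalg.inv(Szz + beta * np.eye(len(pos)))
    F = row / (row @ W)
    di = -10 * np.log10(np.real(F @ Szz @ F.conj()))
    psi = 10 * np.log10(np.real(F @ F.conj()))
    return di, psi

best = None
for alpha in np.logspace(-6, -2, 9):
    for gamma in np.linspace(0.2, 4.0, 9):
        vals = [di_psi(f, alpha * (2 * np.pi * f)**gamma) for f in freqs]
        di = np.array([v[0] for v in vals])
        psi = np.array([v[1] for v in vals])
        if di.min() > 5.5 and psi.max() < 18.0:          # the two constraints
            score = di.mean()                            # stand-in for the Q_I criterion
            if best is None or score > best[0]:
                best = (score, alpha, gamma)
```

Increasing β lowers both the noise sensitivity and the achievable directivity at each frequency, so the search trades the two quantities off across the band.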

Figure 7.4: Top row: results of the optimization for filter orders 2 (dash-dot) and 3 (dashed); the results of the PGP-method are also shown (solid). a) Directivity index. b) Array target response. c) Noise sensitivity. Bottom row: results of the optimization for filter orders 4 (dash-dot) and 5 (dashed); the results of the PGP-method are also shown (solid). d) Directivity index. e) Array target response. f) Noise sensitivity.

Figure 7.4a and Figure 7.4d show the directivity index of the endfire array with approximated filters. Clearly, the directivity index of the approximated filters is higher than that of the PGP-method. The increase manifests itself especially in the frequency range 1-3 kHz, which is the most important frequency range for speech intelligibility in noise, and results in an intelligibility-weighted directivity index of 7.4, 7.8, 8.1, and 8.2 dB for filter orders 2-5, compared with 7.1 dB for the PGP-method. It is striking that the second order filter performs well up to 2 kHz and that its performance deteriorates above that frequency. This is due to the fact that a second order filter can provide only a limited amount of phase shift, while especially the front microphones (nos. 3 and 4) need more phase shift to introduce the necessary delay. The same can be noticed in the PGP-method, where microphone signals 3 and 4 are delayed with a third order all-pass filter, see Section 5.1.3 on page 87. The higher order approximations have no problems above 2 kHz, as they can deliver enough phase shift. Figure 7.4b and Figure 7.4e present the array target response of the approximations: the target response is between −2 and +2 dB for all approximations, except for the second order approximation, which shows a slight increase above 2 kHz. Figure 7.4c and Figure 7.4f show the noise sensitivity for the different approximations and for the PGP-method. The approximations have less noise sensitivity below 1 kHz and more noise sensitivity above 1 kHz than the PGP-method. Over the entire frequency range, the approximations have the same amount of noise sensitivity as the PGP-method, so it should be possible to implement them without suffering from too much internal noise or being too sensitive to deviations in the transfer functions of the microphones or in the signal processing scheme. These simulations show that an approximation of the optimal filters gains directivity over the PGP-method for all filter orders considered. The next section describes the implementation of the third order approximation in analog electronics.

7.2.2 Implementation

This section presents the implementation of the third order approximated filters in analog electronics. The third order approximation has been chosen because it provides a substantial increase in directivity and because third order filters are still convenient to implement in analog electronics. Figure 7.5 shows the general signal processing scheme.

Figure 7.5: Analog signal processing with a microphone array using the third order recursive filters which are approximations of the optimal filters.

The endfire array with four omnidirectional microphones is depicted at the left. The ith microphone signal x_i(t) is pre-amplified and filtered by filter H_i(ω), the third order approximation of the optimal filter F_i(ω). Next, the filtered signals are summed and sent to the rear-end of the signal processing scheme, which consists of a low-cut filter and a power-amplifier. After the power-amplifier, the signal is presented to the ear over headphones or a hearing aid. The implementation of the approximated third order filters is accomplished with a design method developed at the Electronics Laboratory of Delft University. This method can implement a filter on the basis of its transfer function (Groenewold, 1992; Monna, 1996). The design method contains five steps:
1. Specifications. The starting point in the design trajectory is to establish the specifications. These contain, among other things, the minimum input signal level and the minimum required dynamic range, i.e. the ratio between the maximum and minimum signal level.
2. Transfer function. The filter transfer functions can be given in the form of numerator and denominator coefficients, see Equation (7.3a), or in the form of sets of poles and zeros with a gain factor, see Equation (7.4a).
3. Signal flow graph/filter topology. Once the filter transfer function is known, the filter topology has to be determined. The signal flow graph consists of various ideal integrators (the basic building blocks of the filter) which have to be interconnected so that the transfer function is realized. The topology already influences the sensitivity and dynamic range of the filter, and these properties are optimized using a state-space representation of the topology.
4. Mapping on ideal electronic building blocks. Once the topology of the filter is known and the dynamic range is optimized, the ideal integrators, which are still dimensionless, have to be implemented. The implementation of the integrators causes the filters to have either current or voltage dimension; in this case the dimension of the integrators is determined by the dimension of the input microphone signals, i.e. voltage. This step replaces the ideal integrators by voltage-to-voltage integrators and 'translates' the state-space representation into impedances and capacitances.
5. Implementation of electronic circuits. The ideal electronic components have yet to be implemented. The active part of the integrators is implemented by an operational amplifier (Op-Amp), and the resistors and capacitors are implemented by physical resistors and capacitors.
After the implementation, the endfire array was evaluated with measurements in the anechoic chamber.

7.2.3 Measurements

The directivity pattern of the endfire array with the recursive filters was measured in the anechoic chamber with the measurement set-up of Section 4.3.1 on page 73. Figure 7.6 shows the results.

Figure 7.6: Measured directivity pattern of the optimized endfire array with recursive filters. a) Polar patterns at 1 kHz (solid), 2 kHz (dash-dot), and 4 kHz (dashed). b) θ-f diagram with plot of measured (solid) and simulated (dash-dot) directivity index versus frequency.

The directivity pattern at 1 kHz is already more directive than the hypercardioid pattern of the PGP-method, and the directivity increases for higher frequencies, which results in a smaller main beam and more, but lower, side lobes. The θ-f diagram shows that the increase in directivity is largest below 2 kHz and that the directivity pattern is almost constant above 2 kHz. The directivity index versus frequency is also shown in Figure 7.7, together with the measured target response and the noise sensitivity.

Figure 7.7: Measured directivity index (a), array target response (b), and noise sensitivity (c) of the endfire array with the approximated recursive filters of order 3.

Figure 7.7a shows the simulated and measured directivity index. Measurement and simulation show good agreement from 400 Hz to 4 kHz. Below 400 Hz there is less directivity because step 5 of the design method rounds off the capacitors to values available in standard series; this rounding changes the transfer functions of the filters. Above 4 kHz there is less directivity because the omnidirectional microphones are not fully omnidirectional and because the microphones do not match exactly. The target response of the array is flat within the expected ±2 dB; there is some decrease at low frequencies because of the low-cut filter in the rear-end. Figure 7.7c presents the measured noise sensitivity, which is consistent with the simulated noise sensitivity; the small deviations correspond to the small deviations in the array target response. In conclusion, the approximation of the optimal filters with recursive filters increases the directivity of the array by about 1 dB compared with the PGP-method. This increase can be realized with only a small increase in the complexity of the signal processing scheme. Furthermore, the recursive filters are very useful for an initial digital implementation because of their low computational cost. When the DSPs for hearing aid applications are more advanced, it should be possible to use non-recursive FIR filters, which realize an even larger directivity, as the next section shows.


7.3 Optimization on the basis of non-recursive filters

7.3.1 Design method

Section 7.1.3 already explained that Finite Impulse Response (FIR) filters can approximate an arbitrary complex transfer function very well, provided that they have enough coefficients or taps. A widely applied method to design FIR filters is the window method (MATLAB, 1996b), which works as follows:
• The complex frequency response of the filter is calculated at a large number of frequency points between 0 Hz and the Nyquist frequency.
• This complex frequency response is converted into the impulse response with the inverse Fourier transform.
• The impulse response is multiplied with an amplitude window whose non-zero part equals the intended FIR filter length. The center of the amplitude window is chosen such that it coincides with the maximum of the impulse response. The resulting non-zero part of the impulse response forms the coefficients of the FIR filter.
This window method has been used to design FIR filters of different lengths which approximate the optimal filters. Figure 7.8 shows the resulting directivity index of 256-, 128-, 64-, and 32-tap FIR filters for the endfire array with four omnidirectional microphones and a sample frequency of 16 kHz.
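The steps above can be sketched as follows. This is a simplified sketch: the dense grid size, the Hanning window, and the pure-delay example response are illustrative choices, not the thesis design, where the sampled optimal filters F_n(ω) of Equation (7.1) would serve as the desired response.

```python
import numpy as np

def window_design(H_desired, taps, fs, N=1024):
    """FIR design by the window method: sample the desired response on [0, fs/2],
    inverse-transform to an impulse response, and keep `taps` windowed samples
    around the maximum of the impulse response."""
    f = np.linspace(0.0, fs / 2, N // 2 + 1)   # dense frequency grid
    h = np.fft.irfft(H_desired(f), n=N)        # impulse response
    h = np.roll(h, N // 2)                     # shift to avoid wrap-around at the edges
    peak = np.argmax(np.abs(h))
    start = peak - taps // 2                   # center the window on the maximum
    return h[start:start + taps] * np.hanning(taps)

fs = 16000.0
delay = 30 / fs                                # toy target: a pure 30-sample delay
coeffs = window_design(lambda f: np.exp(-2j * np.pi * f * delay), taps=64, fs=fs)
```

For this toy target the impulse response is a single spike, so the 64 coefficients contain one dominant tap at the window center; truncating the window is what limits the low-frequency behavior discussed below Figure 7.8.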

Figure 7.8: Directivity index (a), array target response (b), and noise sensitivity (c) of the four-microphone endfire array with optimal filters approximated by FIR filters with different numbers of taps.

The simulation shows that decreasing the number of FIR taps affects the directivity at low frequencies. This is due to the shorter time interval (or window length) covered by the FIR filter.


Equation (7.5) indicates that at least 177-tap FIR filters at a sample frequency of 16 kHz are needed to achieve broadband directivity (from 180 to 4500 Hz). The array target response is very flat, as expected, and the noise sensitivity is chosen somewhat higher than in the PGP-method, because the implementation of the FIR filters is very precise, as the next section shows. This higher noise sensitivity also results in a higher intelligibility-weighted directivity index of 9.1 dB.

7.3.2 Implementation

Figure 7.9 shows the digital signal processing scheme for the microphone array.

Figure 7.9: Digital signal processing scheme for the endfire array with four omnidirectional microphones using Finite Impulse Response (FIR) filters.

Each (analog) microphone signal x_i(t) is pre-amplified and low-pass filtered to avoid aliasing. Subsequently, the analog signal x_i(t) is converted to a discrete digital signal x_i[n] with an A/D converter, an 18-bit, 64-times-oversampling Σ∆ converter. The digital signal x_i[n] is filtered with a Finite Impulse Response filter into signal y_i[n]. The filtered signals y_i[n] are summed to y[n] and then converted to the analog signal y(t) with a D/A converter. This 18-bit D/A converter includes an 8-times-oversampling filter followed by a 64-times-oversampled one-bit modulator. The output of the one-bit modulator controls the polarity of an internal reference voltage, which is then passed through an ultra-linear low-pass filter. The analog output of the D/A converter is first-order low-pass filtered and the output signal y(t) is presented to the ear. The actual implementation of the FIR filters in Digital Signal Processors (DSPs) has been done with the DSP system developed for Wave Field Synthesis (Start, 1997). This system contains 12 floating point DSP processor boards (Motorola DSP96002). The processor boards are plugged into a PC (Intel 486DX2) which is coupled to a custom-made 16-input, 96-output A/D-D/A converter unit. The 18-bit input and output channels are sampled at 16 kHz. The communication with the DSPs has been implemented in Matlab routines.
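The per-channel filter-and-sum structure of Figure 7.9 reduces to a few lines of code. The sketch below uses a toy scenario (hypothetical signals and pure-delay filters, not the thesis coefficients) in which four impulses arriving at successive samples are re-aligned and summed:

```python
import numpy as np

def filter_and_sum(x, h):
    """x: (n_mics, n_samples) microphone signals; h: (n_mics, taps) FIR coefficients.
    Each channel is FIR-filtered and the outputs are summed, as in Figure 7.9."""
    y = np.zeros(x.shape[1] + h.shape[1] - 1)
    for xi, hi in zip(x, h):
        y += np.convolve(xi, hi)
    return y

# a wave travelling along the array hits mic i one sample later than mic i-1;
# the delay filters h[i] re-align the four impulses so they add coherently
x = np.zeros((4, 8))
h = np.zeros((4, 4))
for i in range(4):
    x[i, i] = 1.0          # impulse arrives at mic i at sample i
    h[i, 3 - i] = 0.25     # delay channel i by 3-i samples, weight 1/4
y = filter_and_sum(x, h)   # single peak of 1.0 at sample 3
```

Real-time operation on the DSP boards works block-wise rather than over the whole signal, but the underlying convolve-and-sum is the same.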


A filter program has been developed for these DSPs which performs FIR filtering with coefficients provided to the DSP via Matlab (Van den Heuvel, 1995). This filter program has been especially developed for array processing, meaning that a single DSP can perform several FIR filters simultaneously. The use of floating point DSPs allows very precise filtering, because the coefficients can be set with high accuracy. FIR filters with 256 taps are used, and an extra low-cut filter is incorporated in each FIR filter to attenuate sound at the lowest frequencies.

7.3.3 Measurements

This section presents the same kind of measurements as Section 7.2.3. Figure 7.10 presents the measurement of the directivity pattern and Figure 7.11 presents the measurement of the directivity index, array target response, and noise sensitivity.

Figure 7.10: Measured directivity pattern of the optimized endfire array with FIR filters. a) Polar patterns at 1 kHz (solid), 2 kHz (dash-dot), and 4 kHz (dashed). b) θ-f diagram with plot of measured (solid) and simulated (dash-dot) directivity index versus frequency.

The measurement of the directivity pattern shows that the endfire array with FIR filters indeed realizes a very high directivity. The directivity pattern at 1 kHz is already very directive and the directivity increases quickly above that frequency. This increase continues until 2 kHz, and the directivity pattern is almost constant between 2 and 4 kHz, as the polar diagrams in Figure 7.10a and the θ-f diagram in Figure 7.10b show. Figure 7.11 shows a more quantitative analysis of the measurement.


Figure 7.11: Measured directivity index (a), array target response (b), and noise sensitivity (c) of the four-microphone endfire array with optimal filters implemented as FIR filters.

The measured directivity index increases from 5 dB at 200 Hz to 10 dB at 4 kHz, and it agrees well with the simulation from 300 Hz to 3 kHz. At low frequencies the deviation is due to mismatch between the microphones, and at high frequencies the omnidirectional microphones are no longer completely omnidirectional. The measured array target response is very flat, as expected, except for a small decrease at low frequencies due to the low-cut filtering by the FIR filter. The measured noise sensitivity also matches the simulated noise sensitivity well, and the influence of the low-cut filter is visible here as well. In conclusion, the endfire array with non-recursive FIR filters achieves broadband high directivity (intelligibility-weighted 9.1 dB). Such filters cannot yet be implemented in hearing aid DSP's, but DSP's for hearing aid applications are advancing rapidly. Therefore, these FIR filters are used in a speech intelligibility test with normal hearing and hearing impaired subjects: a measurement of the Speech Reception Threshold in an artificial diffuse noise field, see Chapter 8. The subjects wear a pair of glasses with an endfire array in both arms (i.e. a binaural endfire array). Therefore, the influence of the head on the directivity must be known; this influence is examined in the next section.
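As an aside, an intelligibility-weighted index such as the 9.1 dB quoted above is a weighted average of per-band indices in dB. The sketch below illustrates the computation; the band values are roughly traced from Figure 7.11a and the importance weights are illustrative stand-ins, not the standardized articulation-index weights the thesis presumably uses.

```python
import numpy as np

# Octave-band directivity indices (dB), illustrative values only.
bands_hz = [250, 500, 1000, 2000, 4000]
di_db    = [5.5, 6.5, 8.0, 9.0, 10.0]

# Band-importance weights for speech intelligibility (illustrative;
# standardized weights emphasize the 1-3 kHz region similarly).
weights = np.array([0.10, 0.20, 0.25, 0.30, 0.15])
weights /= weights.sum()        # normalize so the weights sum to one

# The intelligibility-weighted index is the weighted average of the
# per-band indices in dB.
di_weighted = float(np.dot(weights, di_db))
print(round(di_weighted, 2))
```

With these illustrative numbers the mid and high bands dominate, which is why the weighted index tracks the directivity in the speech-important region rather than the low-frequency bands.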

7.4 Directivity of the endfire array mounted on a KEMAR manikin

The endfire arrays of the previous sections have been developed and measured under free-field conditions. However, the endfire arrays have been integrated in the arms of a pair of spectacles which is worn on the head. Therefore, this section examines the influence of the head on the directivity. Section 7.4.1 presents measurements of the endfire array on a KEMAR head, Section 7.4.2 presents a design method to include the influence of the head in the filter design, and Section 7.4.3 presents measurements with this new design.

7.4.1 Measurements with the endfire array mounted on a KEMAR manikin

This section presents measurements with two different endfire arrays mounted on a KEMAR head. The first is the endfire array used in the previous sections, see Figure 5.10 on page 92. The second endfire array was developed for the speech intelligibility test with the normal hearing and hearing impaired subjects. It has the same length and configuration and contains the same type of microphones as the first array. However, subjects with different head sizes and different haircuts must be able to wear the spectacles with the endfire arrays, and the arrays must be shock-proof. This results in a design in which the arrays sit somewhat further from the KEMAR head than in the first design: the distance of the first endfire array to the KEMAR head is 5 mm, whereas the distance of the second endfire array is 12 mm, similar to the distance of the arrays which Soede used in his speech intelligibility test (Soede, 1990). Figure 7.12 and Figure 7.13 show the measured directivity patterns of both endfire arrays mounted on a KEMAR manikin.



Figure 7.12: Measured directivity pattern of the optimized endfire array at 5 mm from the KEMAR head. a) Polar patterns at 1 kHz (solid), 2 kHz (dash-dot), and 4 kHz (dashed). b) θ-f diagram, with plot of the measured KEMAR-mounted (solid) and measured free-field (dash-dot) two-dimensional front-random index versus frequency.

The polar pattern shows that the head has a small influence at 1 and 2 kHz: the main beam of the directivity pattern broadens in comparison with the free-field measurement. The influence is larger at 4 kHz, where the directivity pattern is slightly directed to the right, which is the side of the endfire array. This is also visible in the θ-f diagram. The main beam is at 0° for frequencies below 3.5 kHz; then a second beam appears, which becomes the main beam at higher frequencies. This also has clear implications for the two-dimensional front-random index. A comparison between the free-field and the KEMAR two-dimensional front-random index shows that the difference is about 0.5 dB for frequencies below 3.5 kHz, but it increases to 1 dB for frequencies above 3.5 kHz. Figure 7.13 shows the directivity pattern of the endfire array at a distance of 12 mm from the head of the KEMAR manikin.



Figure 7.13: Measured directivity pattern of the optimized endfire array at 12 mm from the KEMAR head. a) Polar patterns at 1 kHz (solid), 2 kHz (dash-dot), and 4 kHz (dashed). b) θ-f diagram, with plot of the measured KEMAR-mounted (12 mm) (solid), measured KEMAR-mounted (5 mm) (dashed), and measured free-field (dash-dot) two-dimensional front-random index versus frequency.

The directivity of the second endfire array is hardly distorted by the head. The main beam of the directivity pattern does not broaden, and it remains at 0 degrees for all frequencies. The directivity is even higher than in the free-field measurement of the first endfire array for frequencies above 4 kHz, but this is due to the better matching of the microphones of the second endfire array. These measurements show that the directivity pattern of the endfire array is perturbed when it is very close to the head. It is possible to include these perturbations by the head in the design of the filters, and thereby optimize the directivity of the array when it is mounted near the head. However, the strong dependence of the perturbation on the distance to the head will make it difficult to design these head-related optimal filters such that they are suitable for all heads and head sizes. The next section designs head-related optimal filters for the first endfire array, which is at a distance of 5 mm from the head.

7.4.2 Design of head-related optimal filters

The head perturbs the directivity pattern of the endfire array when the array is mounted close to it. The head introduces sound diffraction and reflections, which modify the transfer function of the microphones as a function of the angle of the incident sound and of the frequency. Hence, the head effectively makes the (omnidirectional) microphones directive. This directivity depends on the frequency and on the position of the microphone with respect to the head. The directivity of the microphones can easily be included in the design of the optimal filters, as the array theory in Chapter 3 showed. Section 3.3 showed that the directivity pattern of the microphones has to be included in the vector W(θ, φ, ω) with the relative propagation delays, see Equation (3.34a):

$$\mathbf{W}(\theta,\phi,\omega) = \begin{bmatrix} \Gamma_1(\theta,\phi,\omega)\, e^{\,j\frac{\omega}{c} z_1 \cos\theta} \\ \Gamma_2(\theta,\phi,\omega)\, e^{\,j\frac{\omega}{c} z_2 \cos\theta} \\ \vdots \\ \Gamma_N(\theta,\phi,\omega)\, e^{\,j\frac{\omega}{c} z_N \cos\theta} \end{bmatrix} \qquad (7.9)$$

where Γn(θ,φ,ω) is the directivity pattern of microphone n. This new vector W(θ, φ, ω) makes it possible to calculate the head-related optimal filters using Equation (7.1). Therefore, the directivity pattern of each of the four microphones has been measured while the endfire array is mounted on the KEMAR manikin. Here, the directivity patterns have been measured in the horizontal plane only, and the optimal filters have then been calculated assuming symmetry around the z-axis. These head-related optimal filters are used to simulate the directivity pattern of the KEMAR-mounted endfire array, see Figure 7.14.
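The role of W(θ, φ, ω) in the filter design can be illustrated with a small sketch. Equation (7.1) is not reproduced in this chunk, so the sketch below uses the standard superdirective form (minimize the response to isotropic noise subject to unit gain on the target), which is one common way to realize such optimal filters; the microphone positions, diagonal loading, and free-field (unit) microphone directivities are assumptions.

```python
import numpy as np

C = 343.0                                  # speed of sound (m/s)
z = np.array([0.0, 0.033, 0.066, 0.1])     # hypothetical 4-mic endfire positions (m)

def steering(theta, f, gamma=None):
    """Vector W(theta, f) of Eq. (7.9): per-microphone directivity Gamma_n
    times the relative propagation phase along the array axis."""
    if gamma is None:
        gamma = np.ones(len(z))            # omnidirectional mics in free field
    return gamma * np.exp(1j * 2 * np.pi * f / C * z * np.cos(theta))

def optimal_weights(f, n_angles=180, mu=1e-3):
    """Superdirective sketch: minimize the response to isotropic noise
    subject to unit gain on the target at theta = 0; the diagonal loading
    mu bounds the noise sensitivity."""
    angles = np.linspace(0.0, np.pi, n_angles)
    # Isotropic-noise covariance, averaged over incidence angle
    # (sin(theta) weights the solid angle of each polar direction).
    R = sum(np.sin(t) * np.outer(steering(t, f), steering(t, f).conj())
            for t in angles) / n_angles
    R = R + mu * np.eye(len(z))
    d = steering(0.0, f)
    w = np.linalg.solve(R, d)
    return w / (w.conj() @ d)              # normalize to unit target response

w = optimal_weights(2000.0)
front = abs(w.conj() @ steering(0.0, 2000.0))
back = abs(w.conj() @ steering(np.pi, 2000.0))
print(front, back)
```

Replacing the unit `gamma` with measured per-microphone directivities, as in the head-related design described above, changes only the `steering` vector; the weight computation itself is unchanged.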

Figure 7.14: Simulated directivity pattern of the endfire array at 5 mm from the KEMAR head with head-related optimal filters. a) Polar patterns at 1 kHz (solid), 2 kHz (dash-dot), and 4 kHz (dashed). b) θ-f diagram, with plot of the head-related (solid), measured free-field (dashed), and measured KEMAR-mounted (dash-dot) two-dimensional front-random index versus frequency.

The simulated directivity pattern shows that the head-related optimal filters realize a narrower main beam than the ordinary optimal filters, and that the main beam of the directivity pattern is at 0 degrees for all frequencies. These improvements increase the two-dimensional front-random index such that it lies halfway between the free-field and the KEMAR-mounted two-dimensional front-random index.


7.4.3 Measurements

Figure 7.15 presents a measurement of the directivity pattern of the endfire array with the head-related optimal filters, to verify whether the simulated improvement also occurs in practice. The head-related optimal filters are implemented as FIR filters.



Figure 7.15: Measured directivity pattern of the endfire array at 5 mm from the KEMAR head with head-related optimal filters. a) Polar patterns at 1 kHz (solid), 2 kHz (dash-dot), and 4 kHz (dashed). b) θ-f diagram, with plot of the head-related (solid), measured free-field (dashed), and measured KEMAR-mounted (dash-dot) two-dimensional front-random index versus frequency.

Unfortunately, Figure 7.15 shows that there is a considerable difference between the simulation and the measurement of the directivity pattern of the endfire array with the head-related optimal filters. The main beam is not steady at 0 degrees, and the measured two-dimensional front-random index is also below the simulated one. The head-related filters are apparently not robust, and they do not provide more directivity than the ordinary optimal filters.

7.5 Conclusions

This chapter has presented research on the optimization of the directivity of the four-microphone endfire array with the optimal beamforming method. A simulation has shown that the optimal beamforming method improves the directivity of the endfire array with four omnidirectional microphones in comparison with the PGP-method, which achieves an intelligibility-weighted directivity index of 7.1 dB. The optimal filters have been implemented with recursive and non-recursive filters. Recursive filters are very efficient, but their responses cannot be prescribed very precisely for our purposes. The optimal filters have been approximated with recursive filters of different filter orders; these approximations result in an intelligibility-weighted directivity index of 7.3, 7.8, 8.1, and 8.2 dB for filter orders 2-5. The third-order approximation has been implemented in analogue electronics, and the measurements show good agreement with the simulations. Non-recursive filters are very precise, but they are not very efficient. They realize a very high intelligibility-weighted directivity index of 9.1 dB with digital FIR filters of 256 taps. These filters have also been implemented in a DSP-system with Motorola 96000 DSP's, and the measurements confirm the high directivity. These good results and the advancement in the computational power of DSP's for hearing aids have led to the use of this endfire array with digital FIR filters in speech intelligibility tests with normal hearing and hearing impaired subjects. These tests are presented in the next chapter. The speech intelligibility tests are in-situ tests, meaning that the subjects actually wear a pair of spectacles with endfire arrays in both arms. Therefore, the directivity of the endfire array has been investigated when the array is mounted on a KEMAR manikin. The measurements show that the directivity of the array strongly depends on the distance of the array to the head. When the array is very close to the head (5 mm), the directivity pattern is perturbed, especially at high frequencies. When the array is somewhat further from the head (12 mm), the influence of the head is minimal and the directivity pattern is almost as in free field.
To minimize the influence of the head when the array is close to it, the optimal filters have been redesigned to include the perturbations of the head. These head-related optimal filters showed an improvement in simulation, but this improvement could not be confirmed by measurements, probably because the calculated head-related filters depend strongly on the exact mounting of the endfire array on the head. In conclusion, the directivity of the endfire array is optimal when the filters of the optimal beamforming method are implemented as FIR filters; these optimal filters achieve an intelligibility-weighted directivity index of 9.1 dB. An optimal binaural endfire array, consisting of two optimal endfire arrays, is tested in a speech intelligibility test with normal hearing and hearing impaired subjects in the next chapter.


8 Evaluation of the optimized endfire array

This chapter presents the results of an evaluation of the optimized binaural endfire array with digital filters as presented in Chapter 7. The evaluation was performed in an artificial diffuse noise field and comprised both an objective and a perceptual part. The objective evaluation consisted of measurements of the front-random index of the optimized endfire array, as well as of an omnidirectional and a directional hearing aid, in the artificial diffuse noise field. The perceptual evaluation was a speech intelligibility test which measured the Speech Reception Threshold (SRT) of normal hearing and hearing impaired subjects when they were listening with hearing aids and when they were listening with the optimized binaural endfire array. Section 8.1 discusses the set-up of the artificial diffuse noise field, Section 8.2 presents the objective measurements, Section 8.3 presents the SRT-tests with the normal hearing subjects, and Section 8.4 presents the SRT-tests with the hearing impaired subjects. Finally, Section 8.5 draws the conclusions of this chapter. A part of this research was already published in Merks et al. (1999).

8.1 Experimental set-up of an artificial diffuse noise field

The optimized binaural endfire array has been designed to improve the speech intelligibility in noisy environments, like a cocktail party. Hence, the binaural endfire array had to be evaluated in a situation which resembled such a difficult noisy environment. Therefore, the same artificial diffuse noise field as developed by Soede (1990) was used for the evaluation; an additional advantage is that the results of the evaluation are comparable with Soede's results. This section briefly summarizes the set-up of this artificial diffuse noise field. Figure 8.1 shows the loudspeaker configuration which realizes the artificial diffuse noise field.

Figure 8.1: Set-up of loudspeakers to realize an artificial diffuse noise field. One loudspeaker (CL30) simulates the partner in discussion, while the other eight loudspeakers realize an artificial diffuse sound field at the center of the rectangular box.

The set-up consisted of 9 loudspeakers and was placed in the sound-insulated room of the Audiological Center at the Dijkzigt Hospital in Rotterdam. The artificial diffuse sound field was realized with 8 small loudspeakers (Philips Matchline CE-75 series) positioned at the boundaries of an imaginary rectangular box inside the room. The loudspeakers were fed with 8 independent noise signals of female speech noise, according to Plomp and Mimpen (1979) and IZF (1988). These eight loudspeakers realized a noise field that was practically diffuse at the center of the rectangular box (Soede, 1990). During the measurements, the KEMAR head was placed at that center position; during the speech intelligibility test, the subject was seated such that his head was at that center position. The ninth loudspeaker (Celestion CL30) was positioned at a height of 1.25 meter right in front of the subject, and it simulated the partner in a discussion during the speech intelligibility test. The noise and speech production during the measurements and the speech intelligibility test was done with the set-up shown in Figure 8.2.

Figure 8.2: Noise and speech production in the experimental set-up of the artificial sound field at the Audiological Center of the Dijkzigt hospital in Rotterdam.

The experimental set-up was divided between the sound-insulated room with the actual noise field and a control booth containing the DSP-system which controlled the noise field. The DSP-system in the control booth is the same as described in Section 7.3.2. Two of the twelve DSP processor boards (Motorola DSP96002) generated 8 independent noise signals with the long-term spectrum of speech (from track 41 of IZF (1988)); these signals were sent to the 8 loudspeakers in the sound-insulated room to realize the diffuse noise field. The (speech) signal for the ninth loudspeaker was provided by the SRT-system which is normally used for the SRT-tests at the Audiological Center in Rotterdam. The SRT-system uses a CD-player to present the speech (sentences), and the signal with which the SRT-system controls the CD-player was also sent to the DSP-system, to switch the artificial diffuse noise field on and off during the speech intelligibility tests. Section 7.3.2 already described how the DSP-system also processed the 8 microphone signals of the microphone arrays into two ear-signals. These ear-signals were sent to two Sennheiser EZI 120 induction plates, which transmitted the signals to the hearing aids of the subject. The subject answered via the intercom.

8.2 Physical measurements in the artificial diffuse noise field

Before the speech intelligibility tests, two metrics were measured to assess the performance of the array in an objective way:
• front-random index;
• insertion gain.
The front-random index is a measure of the directivity of the array, as discussed in Section 3.4.1, and the insertion gain is a measure of the (array) target response. The insertion gain is the frequency response of the hearing aid in comparison with the open-ear response of the KEMAR manikin. These frequency responses are measured with sound incident from the front (target): first the open-ear response of KEMAR is measured, and then the response of the KEMAR ear with the hearing aid; the insertion gain is the ratio of these two. In this case, the insertion gains of interest were the insertion gain of the hearing aid of the subject (e.g. Philips M71) and the insertion gain of the microphone array coupled to the hearing aid via an induction plate (coil). These insertion gains were compared to ensure that there were no differences between the target response of the array and that of the hearing aid which could influence the outcome of the speech intelligibility test.

8.2.1 Front-random index

The front-random index has been defined in Section 3.4.1 as the ratio of the gain of an acoustic device in its front direction and the average gain of the acoustic device over all directions. The front-random index is a measure of the actual attenuation of an artificial diffuse noise field by an acoustic device like a microphone array or a (directional) hearing aid. The spectrum with noise coming from the front is measured (front spectrum), and the spectrum in a diffuse noise field is measured (random spectrum). These spectra are measured with the acoustic device as well as with an omnidirectional microphone (reference). The four measurements are used to calculate the front-random index of the acoustic device:

$$Q_{FR,I} = U_{front,device} - U_{rand,device} - (U_{front,omni} - U_{rand,omni}). \qquad (8.1)$$

Here $U_{front,device}$ is the front spectrum of the device, $U_{rand,device}$ is the random spectrum of the device, $U_{front,omni}$ is the front spectrum of the omnidirectional microphone, and $U_{rand,omni}$ is the random spectrum of the omnidirectional microphone. The front spectra and the random spectra were measured in the artificial diffuse noise field using a B&K 2133 spectrum analyzer. Here, the acoustic devices were mounted on a KEMAR manikin in the center of the artificial diffuse noise field, and the reference measurements were performed with an omnidirectional B&K 4165 1/2-inch pressure microphone in the place of the KEMAR manikin. First, the front-random index of the ear of the KEMAR manikin itself was measured. Secondly, the front-random index of an omnidirectional Behind-The-Ear (BTE) hearing aid mounted on a KEMAR manikin was measured; this hearing aid is the Philips M71, and it was used in the speech intelligibility test with the normal hearing subjects in Section 8.3. Thirdly, the front-random index of a directional BTE hearing aid was examined: several directional hearing aids have been introduced into the hearing aid market, and the Siemens Prisma is one of the latest, using two omnidirectional microphones and digital signal processing to achieve directivity (Siemens, 1998). Finally, the front-random index of the optimized endfire array mounted on a KEMAR manikin was measured.
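Since all four spectra are expressed in dB, Equation (8.1) amounts to a simple per-band difference, as the following sketch illustrates with hypothetical band levels.

```python
import numpy as np

def front_random_index(front_dev, rand_dev, front_omni, rand_omni):
    """Eq. (8.1): per-band front-random index in dB, computed from four
    measured spectra (all in dB)."""
    return (np.asarray(front_dev) - np.asarray(rand_dev)
            - (np.asarray(front_omni) - np.asarray(rand_omni)))

# Hypothetical spectra (dB) in three bands: the device gains 6 dB on the
# frontal source relative to the diffuse field, while the omnidirectional
# reference microphone by definition gains nothing.
front_dev  = [70.0, 71.0, 69.0]
rand_dev   = [64.0, 65.0, 63.0]
front_omni = [68.0, 69.0, 67.0]
rand_omni  = [68.0, 69.0, 67.0]

q_fr = front_random_index(front_dev, rand_dev, front_omni, rand_omni)
print(q_fr)    # 6 dB in every band
```

The subtraction of the reference terms removes any coloration of the noise field itself, so only the directivity of the device under test remains.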


The results of the measurements with an ear of the KEMAR manikin and with an omnidirectional hearing aid (Philips M71) mounted on that ear are shown in Figure 8.3.

Figure 8.3: Front-random index of two acoustic devices mounted on a KEMAR manikin placed in the center of the artificial diffuse sound field. The last bar is the intelligibility-weighted front-random index. a) Left ear of the KEMAR manikin. b) Omnidirectional BTE hearing aid (Philips M71) mounted on the ear of a KEMAR manikin.

Figure 8.3 shows the measured front-random index versus frequency, with the intelligibility-weighted front-random index in the last bar. The measurement of the KEMAR ear shows that the front-random index is slightly negative for frequencies below 2 kHz and slightly positive for higher frequencies, due to shielding by the head and the pinna. This results in an intelligibility-weighted front-random index of -0.2 dB. Hence, a single ear has about the same front-random index as an omnidirectional microphone; it is, of course, the binaural processing of both ear-signals which really improves the speech intelligibility in noise. The measurement also shows that the front-random index of the omnidirectional hearing aid (Philips M71) resembles that of the KEMAR ear at low frequencies, but at high frequencies it is considerably lower, because the omnidirectional hearing aid is at an unfavorable position in comparison with the ear. This results in an intelligibility-weighted front-random index of -1.4 dB. Hence, this omnidirectional BTE hearing aid does not improve but rather degrades the speech intelligibility in noise. Figure 8.4 shows the measurements with a directional hearing aid (Siemens Prisma) and the optimized endfire array.

Figure 8.4: Front-random index of two acoustic devices mounted on a KEMAR manikin placed in the center of the artificial diffuse sound field. The last bar is the intelligibility-weighted front-random index. a) Directional BTE hearing aid (Siemens Prisma in directional mode). b) Optimized endfire array; the line is the simulated three-dimensional directivity index.

Clearly, the measurement shows that the directional hearing aid has a higher front-random index than the omnidirectional hearing aid. The intelligibility-weighted front-random index is 3.3 dB, a considerable improvement in comparison with the ear (-0.2 dB) and with the omnidirectional hearing aid (-1.4 dB). However, it does not provide the essential improvement of at least 5 dB which is necessary to provide substantial benefit, see Section 2.2.4. The front-random index of the optimized endfire array varies from 5 dB at 200 Hz to more than 10 dB at 4 kHz. The intelligibility-weighted front-random index is 7.7 dB. The front-random index is about 1 dB less than the simulated three-dimensional directivity index in Section 7.3, which is due to the influence of the KEMAR head.

8.2.2 Insertion gain

The goal of the speech intelligibility test was to measure the difference in SRT between listening with hearing aids and listening with the endfire array. The subjects listened to the output signals of the binaural endfire array through hearing aids: the output signals of the arrays were transmitted to the hearing aids with two Sennheiser induction plates (EZI 120), and the hearing aids were switched to tele-coil reception, which means that the hearing aid uses its tele-coil to pick up signals from an electromagnetic transducer. For a correct comparison, it is important that the target responses of the hearing aids are equal to the target responses of the arrays. The measurement of the array target response in Section 7.3.2 already showed that the array target response is flat. Next, the target response of the array coupled to the hearing aid and the target response of the hearing aid itself had to be compared. This comparison was done by measuring the insertion gain: the transfer function of the hearing aid in comparison with the open-ear response, measured with sound coming from the front (the target). First the open-ear response of KEMAR was measured, and then the response of the ear with the hearing aid; the ratio between the hearing-aid response and the open-ear response is the insertion gain. In this case, the insertion gains of interest were the insertion gain of a hearing aid (Philips M71) and the insertion gain of the microphone array coupled to the hearing aid via an induction plate (coil). Figure 8.5 shows the measured insertion gains.

Figure 8.5: Insertion gains measured with a Philips M71 hearing aid mounted on a KEMAR manikin. One insertion gain is measured with the internal microphone of the hearing aid (dash-dot), and the other is measured with the endfire array coupled to the hearing aid via the induction plate (solid).

The measurements show good agreement between the two insertion gains from 400 Hz upward. Below 400 Hz, the insertion gain of the microphone array is lower due to the electromagnetic coupling. However, the insertion gain of most hearing aids is set low in that frequency range, because low frequencies have a negative influence on the speech intelligibility (upward spread of masking) and on the comfort of the hearing impaired subject (irritating background noise).
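Since the responses in Figure 8.5 are expressed in dB, the insertion gain reduces to a per-band difference between the aided and the open-ear response. The band levels below are hypothetical, for illustration only.

```python
import numpy as np

def insertion_gain(aided_db, open_ear_db):
    """Insertion gain in dB: aided-ear response relative to the open-ear
    response, both measured with frontal sound incidence."""
    return np.asarray(aided_db) - np.asarray(open_ear_db)

# Hypothetical frontal responses (dB SPL at the eardrum) per octave band.
freqs_hz = [125, 250, 500, 1000, 2000, 4000]
open_ear = [60.0, 61.0, 62.0, 65.0, 72.0, 68.0]
aided    = [58.0, 63.0, 66.0, 70.0, 78.0, 74.0]

gain = insertion_gain(aided, open_ear)
print(dict(zip(freqs_hz, gain)))
```

A comparison such as the one above would flag any band where the array-plus-coil path and the internal-microphone path deliver different target levels to the ear.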

8.3 Speech intelligibility test with normal hearing subjects

After this objective evaluation, the arrays were evaluated with a speech intelligibility test. The speech intelligibility test was an SRT-measurement with the up-down procedure as explained in Section 2.2.2, using the speech and noise of Plomp and Mimpen (1979) (from IZF (1988)). The Speech Reception Thresholds (SRT) of normal hearing and hearing impaired subjects were measured while the subjects were listening with hearing aids and with the optimized binaural endfire array. The SRT-measurements were in-situ measurements: the subject was seated in a chair with a head-restraint such that his head was in the center of the artificial diffuse noise field. The subject was able to move his head, but he was asked to keep his head against the head-restraint. First, the speech intelligibility test was performed with normal hearing subjects.
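The up-down procedure itself is explained in Section 2.2.2 and not reproduced here; a minimal sketch of a one-up/one-down SRT track in the style of Plomp and Mimpen could look as follows, with the step size, number of sentences, and the deterministic toy listener all being assumptions for illustration.

```python
def srt_updown(intelligible, n_sentences=13, start_snr=0.0, step=2.0):
    """One-up/one-down adaptive SRT sketch: the SNR drops by one step
    after a correctly repeated sentence and rises by one step after an
    error, so the track oscillates around the 50%-correct point. The SRT
    estimate is the mean SNR over the later presentations."""
    snr = start_snr
    levels = []
    for _ in range(n_sentences):
        levels.append(snr)
        snr += -step if intelligible(snr) else step
    return sum(levels[3:]) / len(levels[3:])   # discard the approach phase

# Toy listener: sentences are understood whenever the SNR exceeds -8 dB
# (a deterministic stand-in for a real psychometric function).
listener = lambda snr: snr > -8.0
print(srt_updown(listener))
```

With this deterministic listener the track steps down to the threshold and then alternates between -6 and -8 dB, so the estimate settles near the threshold; a real subject introduces randomness, which the averaging over sentences smooths out.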


8.3.1 Selection criteria for normal hearing subjects

The normal hearing subjects were selected with the following criteria:
1. maximum hearing threshold of 15 dB HL at 0.5, 1, 2, and 4 kHz;
2. maximum interaural difference of 10 dB in hearing threshold at 0.5, 1, 2, and 4 kHz;
3. age between 16 and 75 years.
The resulting group of subjects consisted of 13 females and 5 males with ages ranging from 18 to 50 years (median: 29.5).

8.3.2 Listening conditions

The Speech Reception Threshold was determined under the following listening conditions:
1. absolute SRT with binaural hearing aids in quiet;
2. SRT with binaural hearing aids and noise level 30 dB above the absolute SRT in quiet of condition 1 (2x);
3. absolute SRT with the optimized binaural endfire array via binaural hearing aids in quiet;
4. SRT with the optimized binaural endfire array via binaural hearing aids and noise level 30 dB above the absolute SRT in quiet of condition 3 (2x).
First, the absolute SRT in quiet was measured, so that the level of the diffuse noise in the other conditions could be set 30 dB above that absolute SRT and the subject's threshold did not influence the SRT-measurement. The SRT is usually measured at 20 dB and 30 dB above the absolute SRT in quiet; here, the SRT was measured twice at +30 dB, because Soede reported that SRT's measured at +20 dB can be too close to the absolute SRT in quiet, which biases the measurements. The order of the conditions (i.e. with the hearing aids and with the endfire arrays) was varied to avoid effects of habituation and fatigue. The duration of the listening test was about 30 minutes. The hearing aids used were Philips M71 Behind-The-Ear (BTE) aids, and the sound was presented to the ear via a Libby horn with foam plug.

8.3.3 Results

Table 8.1 summarizes the results of the SRT-tests.

Table 8.1: Results of the SRT-test with normal hearing subjects

condition                                  number of subjects   SRT (dB)   s.d. (dB)
binaural hearing aids                              18             -5.3        1.3
binaural endfire array via hearing aids            18            -12.8        0.6

The main interest of the test is not the SRT's themselves, but the improvement due to the binaural endfire array. This improvement is the difference between the SRT with hearing aids and the SRT with the binaural endfire array. The distribution of this improvement is given in Figure 8.6 (18 subjects, mean 7.5 dB, s.d. 1.5 dB).

Figure 8.6: Number of normal hearing subjects classified with respect to the improvement of the SNR, resulting from comparative listening tests with two hearing aids with omnidirectional microphones and two endfire arrays via hearing aids.

The average subjective improvement of 7.5 dB can be compared with the measured objective improvement, which is the difference between the measured intelligibility-weighted front-random index of the endfire array (7.7 dB) and that of the Philips M71 hearing aid (-1.4 dB). A one-sided t-test shows that the subjective improvement of 7.5 dB is significantly smaller than the objective improvement of 9.1 dB (p
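This comparison can be reconstructed from the reported summary statistics alone. The sketch below performs a one-sided one-sample t-test of the subjective improvements against the objective 9.1 dB; it is a reconstruction from the mean, standard deviation, and group size, whereas the thesis test itself was of course computed from the individual scores.

```python
import math
from scipy import stats

# Reported summary statistics of the SRT improvement (Figure 8.6) and the
# objective improvement from the front-random indices: 7.7 - (-1.4) dB.
n, mean, sd = 18, 7.5, 1.5
objective = 7.7 - (-1.4)

# One-sided one-sample t-test of the subjective improvements against the
# objective value.
t = (mean - objective) / (sd / math.sqrt(n))
p_one_sided = stats.t.cdf(t, df=n - 1)
print(round(t, 2), p_one_sided < 0.05)
```

The strongly negative t-statistic confirms that the subjective improvement lies significantly below the objective improvement, consistent with the conclusion drawn in the text.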
