A ROBUST ADAPTIVE CROSS MICROPHONE ARRAY

Jianfeng Chen, Koksoon Phua, Louis Shue, Hanwu Sun
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
E-mail: [email protected]

0-7803-8834-8/05/$20.00 ©2005 IEEE.

ABSTRACT

In this paper, a robust adaptive microphone array system with a cone-shaped directionality pattern is presented. The array consists of two linear microphone arrays arranged in a cross and incorporates the CCAF-NCAF robust beamformer scheme of Hoshuyama et al. As indicated by the simulation results, the proposed cross microphone array overcomes the problem of spatial ambiguity (unlike a conventional linear array) and has the added advantage of 3-dimensional interference cancellation. In addition, the system benefits from accelerated convergence due to a reduced allowable capture region.

1. INTRODUCTION

In recent years, high-quality audio capture while 'on the move' has become increasingly important for various communication applications, such as speech recognition on hands-free devices, teleconferencing, and hearing aids. In particular, microphone arrays coupled with advanced beamforming technology have shown very promising results [1].

Compared with array processing in other domains, microphone array processing is more challenging because of the nonstationary characteristics of speech and the usually complicated acoustic environment. Contributing factors include (a) imperfect microphone array calibration; (b) acoustic multipath distortion or reverberation; (c) the inherently intermittent nature of the speech signal; (d) competing and interfering signals with characteristics similar to those of the target speech signal; (e) near- or mid-field conditions; and (f) the relatively broad bandwidth and large dynamic range of the speech signal. As a result, most of the signal models commonly used in conventional array processing cannot easily be adopted in practice, and these factors must be taken into account if sophisticated beamforming technology is to be applied successfully to microphone arrays.

Among the various beamforming methods for improving directionality in signal acquisition, adaptive beamforming is preferred over conventional fixed beamforming, mainly because adaptive methods tend to achieve much higher interference suppression and can adapt to a changing environment. For example, the Griffiths-Jim beamformer (GJBF) is a widely studied adaptive method. However, the GJBF has been observed to suffer from imperfectly calibrated arrays, which typically results in steering-vector error and target-signal cancellation [1]. Since array errors and model mismatches are inevitable in practice, researchers have turned their attention to sub-optimal solutions. One such class is the robust beamformer, whose focus is on tolerance to errors and model mismatch. A representative robust beamformer was recently proposed by Hoshuyama in 1999 [2]. This structure is a generalized sidelobe canceller (GSC) with an adaptive blocking matrix (BM) using coefficient-constrained adaptive filters (CCAFs) and a multiple-input canceller with norm-constrained adaptive filters (NCAFs). The method is attractive because: (a) it provides adequate tolerance for large target-direction errors, and the maximum allowable target-direction error can be specified by the user; (b) it can be implemented with only a few microphones; and (c) it exhibits good performance even in mid-field and reverberant environments.

In this paper, we apply the CCAF-NCAF scheme to a two-dimensional microphone array arranged in the shape of a cross. Two-dimensional arrays can overcome some of the inherent drawbacks of one-dimensional linear arrays; for example, the angle ambiguity that occurs in the plane perpendicular to a one-dimensional array is eliminated. Our simulation results also indicate that the adaptive process benefits from the 2-D array structure, resulting in increased robustness, superior noise cancellation and faster convergence.

The structure of the paper is as follows. Section 2 reviews the CCAF-NCAF scheme proposed by Hoshuyama [2]. Section 3 describes our proposed scheme for realizing cone-shaped directionality. Simulation results obtained in anechoic and reverberant environments are presented in Section 4. Finally, conclusions are given in Section 5.

2. REVIEW OF THE CCAF-NCAF BEAMFORMER

In this section, we briefly outline the salient features of the CCAF-NCAF beamformer. More details can be found in [2]-[4].

The structure of the CCAF-NCAF beamformer is shown in Fig. 1. Like the generalized sidelobe canceller (GSC), it includes a fixed beamformer (FBF), a blocking matrix (BM), and a multiple-input canceller (MC). In the upper branch, the output signals of the M sensors, xm(k) (m = 1, …, M), are fed into the FBF, which enhances the target signal and produces the output b(k) at time k. In the lower branch, the same signals xm(k) enter the BM, which eliminates the target signal and lets the interferences pass through. The BM outputs zm(k), which are correlated with the interference in the FBF output, are then subtracted in the MC from the delayed FBF output b(k-L1), where L1 is the number of delay samples needed for causality. The key difference between Hoshuyama's method and the GJBF lies in the design of the BM, where CCAFs are used to determine the allowable target-direction range. This is based on the fact that the filter coefficients for target-signal minimization vary significantly with the target direction [2]. Hence, the maximum allowable target-direction error can be easily controlled, and the method can be applied in microphone array systems that experience large target-direction errors in practice.
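The signal flow just described (FBF, BM built from CCAFs, MC built from NCAFs, delays L1 and L2) can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the implementation of [2]: we assume normalized LMS (NLMS) updates for both filter banks, element-wise clipping of every CCAF coefficient to hypothetical bounds `phi`/`psi` (in the method these bounds are what encode the allowable target-direction range), and a rescaling of all NCAF coefficients whenever their total squared norm exceeds a hypothetical limit `K`. The FBF is a plain average, i.e. the target is assumed to be already at broadside. Adaptation is split into a target-only stage for the CCAFs and an interference-only stage for the NCAFs, a simplified stand-in for the adaptation control of [3]. The names `ccaf_step`, `ncaf_step`, `run`, `mu_h` and `mu_w` are illustrative only.

```python
import numpy as np


def ccaf_step(h, x_delayed, b_buf, mu, phi, psi, eps=1e-8):
    """One coefficient-constrained (CCAF) NLMS step for one BM channel.

    z = x_m(k - L2) - h^T B(k); h is then clipped element-wise to the
    (hypothetical) bounds [phi, psi].
    """
    z = x_delayed - h @ b_buf
    h = h + mu * z * b_buf / (b_buf @ b_buf + eps)
    return z, np.clip(h, phi, psi)


def ncaf_step(W, Z, b_delayed, mu, K, eps=1e-8):
    """One norm-constrained (NCAF) NLMS step for the multiple-input canceller.

    d = b(k - L1) - sum_m W_m^T Z_m(k); W is rescaled if sum ||W_m||^2 > K.
    """
    d = b_delayed - np.sum(W * Z)
    W = W + mu * d * Z / (np.sum(Z * Z) + eps)
    omega = np.sum(W * W)
    if omega > K:
        W = W * np.sqrt(K / omega)
    return d, W


def run(X, H, W, phi, psi, L1=10, L2=5, mu_h=0.5, mu_w=0.1, K=10.0,
        adapt_h=True, adapt_w=True):
    """Process sensor signals X (M x T) sample by sample; returns d, z, H, W."""
    H, W = H.copy(), W.copy()
    M, T = X.shape
    taps = np.arange(H.shape[1])
    b = X.mean(axis=0)                 # FBF: plain average (target assumed at broadside)
    Zbuf = np.zeros_like(W)            # row m holds Z_m(k) = [z_m(k), ..., z_m(k-N'+1)]
    d, z_out = np.zeros(T), np.zeros((M, T))
    for k in range(T):
        idx = k - taps                 # B(k) = [b(k), ..., b(k-N+1)], zero before k = 0
        b_buf = np.where(idx >= 0, b[np.maximum(idx, 0)], 0.0)
        for m in range(M):
            xd = X[m, k - L2] if k >= L2 else 0.0
            if adapt_h:
                z_out[m, k], H[m] = ccaf_step(H[m], xd, b_buf, mu_h, phi, psi)
            else:
                z_out[m, k] = xd - H[m] @ b_buf
        Zbuf = np.roll(Zbuf, 1, axis=1)  # shift each Z_m(k) buffer by one sample
        Zbuf[:, 0] = z_out[:, k]
        bd = b[k - L1] if k >= L1 else 0.0
        if adapt_w:
            d[k], W = ncaf_step(W, Zbuf, bd, mu_w, K)
        else:
            d[k] = bd - np.sum(W * Zbuf)
    return d, z_out, H, W


rng = np.random.default_rng(0)
M, N, Np, L1, L2, K = 4, 16, 16, 10, 5, 10.0
phi, psi = np.full(N, -0.3), np.full(N, 0.3)   # hypothetical CCAF bounds
phi[L2], psi[L2] = 0.0, 1.2                    # the main tap is allowed to approach 1
H0, W0 = np.zeros((M, N)), np.zeros((M, Np))

# Stage 1: target only, from broadside (identical at every sensor); adapt CCAFs only.
s = rng.standard_normal(3000)
_, z1, H, W = run(np.tile(s, (M, 1)), H0, W0, phi, psi, L1, L2, K=K, adapt_w=False)

# Stage 2: interference only; sensor m sees it delayed by 2*m samples; adapt NCAFs only.
v = rng.standard_normal(10000)
Xv = np.stack([np.roll(v, 2 * m) for m in range(M)])
d2, _, _, W = run(Xv, H, W, phi, psi, L1, L2, K=K, adapt_h=False)
b2 = Xv.mean(axis=0)
reduction_db = 10 * np.log10(np.mean(b2[-3000:] ** 2) / np.mean(d2[-3000:] ** 2))

# Stage 3: target only again, all filters frozen; the target should pass to d(k).
s3 = rng.standard_normal(2000)
d3, _, _, _ = run(np.tile(s3, (M, 1)), H, W, phi, psi, L1, L2, K=K,
                  adapt_h=False, adapt_w=False)
target_err = np.mean((d3[L1:] - s3[:-L1]) ** 2) / np.mean(s3 ** 2)
print(f"interference reduction: {reduction_db:.1f} dB, target error: {target_err:.1e}")
```

Stage 3 checks the property the constraints are meant to protect: because the BM has learned to block the broadside target, the NCAFs have nothing target-correlated to subtract, so the target reaches d(k) essentially unchanged. The interference reduction in this toy setup is modest, partly because at very low frequencies an off-axis interference looks almost identical at every sensor and is therefore blocked together with the target. Stacking the X- and Y-arm signals as 2M rows of `X` gives a cross-array variant, since the plain average then has the 1/(2M) normalization used for the FBF output of the cross array.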

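The allowable target-direction range controlled by the CCAFs has a simple geometric reading, which also underlies the ring-shaped (linear array) and cone-shaped (cross array) regions discussed in Section 3. The sketch below is idealized: it treats the allowable region of a broadside-steered linear array as a hard band of directions within 15° of the plane perpendicular to the array axis, and the cross array's region as the intersection of the bands of its X- and Y-axis arrays. The hard threshold, the use of α as the polar angle from the array normal and β as the azimuth, and the function names `direction`, `allowed_by_linear` and `allowed_by_cross` are our assumptions, not definitions from the paper.

```python
import numpy as np

SIN_MAX = np.sin(np.radians(15.0))  # maximum allowable target-direction error of 15 degrees


def direction(alpha_deg, beta_deg):
    """Unit vector for polar angle alpha (from the array normal, +Z) and azimuth beta."""
    a, b = np.radians(alpha_deg), np.radians(beta_deg)
    return np.array([np.sin(a) * np.cos(b), np.sin(a) * np.sin(b), np.cos(a)])


def allowed_by_linear(u, axis, sin_max=SIN_MAX):
    """Idealized allowable region of a broadside-steered linear array along `axis`
    (0 = X, 1 = Y): the arrival direction may deviate from the plane perpendicular
    to the array axis by at most asin(sin_max). Accepts one vector or an (n, 3) batch."""
    return np.abs(u[..., axis]) <= sin_max


def allowed_by_cross(u, sin_max=SIN_MAX):
    """Common allowable region of the X- and Y-axis arrays (intersection of two rings)."""
    return allowed_by_linear(u, 0, sin_max) & allowed_by_linear(u, 1, sin_max)


# Fraction of all arrival directions (uniform on the sphere) that each geometry accepts.
rng = np.random.default_rng(1)
g = rng.standard_normal((200_000, 3))
dirs = g / np.linalg.norm(g, axis=1, keepdims=True)
frac_linear = allowed_by_linear(dirs, 0).mean()
frac_cross = allowed_by_cross(dirs).mean()
print(f"linear: {frac_linear:.3f}, cross: {frac_cross:.3f}")
```

For uniformly distributed directions the linear-array fraction is exactly sin 15° ≈ 0.259, because the direction cosine along any fixed axis is uniformly distributed on [-1, 1]; the cross array accepts a much smaller fraction. A direction such as `direction(60, 90)` lies in the YZ-plane: the X-axis array alone accepts it because it is at broadside to that array, while the cross array rejects it. `direction(180, 0)`, straight behind the array plane, is still accepted, which is the front-back ambiguity noted in the Conclusions.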

Fig. 1. The CCAF-NCAF adaptive beamformer structure proposed by Hoshuyama [2] (FBF: fixed beamformer; BM: blocking matrix; CCAF: coefficient-constrained adaptive filter; MC: multiple-input canceller; NCAF: norm-constrained adaptive filter).

3. CONE-SHAPED DIRECTIONALITY REALIZATION

Much of the current research on adaptive beamforming concentrates on linear arrays because of their simplicity; the complexity and cost of 2-D/3-D arrays also make their adoption difficult. Nevertheless, a linear array inherently suffers from spatial ambiguity: it is unable to differentiate signals impinging from the same azimuth but at different elevations. As shown in Fig. 2, the allowable region of a linear array is actually a broad ring, that is, the whole space except the two cones of the rejection regions. As a result, undesired signals that fall into this capture region will be present in the array output, because they are falsely regarded as the desired signal.

Fig. 2. An illustration of allowable and rejection regions for a linear array.

In view of these drawbacks of the linear array configuration, we propose to extend the adaptive beamformer to a 3-D situation by combining two linear arrays arranged perpendicular to each other, which adds a constraint in the vertical orientation. Fig. 3 illustrates the layout of this cross microphone array. Four sensors are used in each equispaced linear array (d = 4 cm). The whole array is quite small (12 cm in diameter) and is applicable in areas such as hands-free communication devices in cars.

Fig. 3. The layout of the cross microphone array (sensor spacing d = 4 cm).

The detailed structure of the adaptive beamformer for the cross microphone array is shown in Fig. 4. There are two sets of BMs, corresponding to the linear arrays along the X- and Y-axis, respectively (see Fig. 3). To avoid confusion, xm(k) denotes the mth sensor signal of the linear array along the X-direction and ym(k) the mth sensor signal of the array along the Y-direction, where m = 1, 2, …, M and M is the number of sensors in each linear array (M = 4 in Fig. 3). A single MC produces the beamformer output d(k).

The two adaptive beamformers for the two linear arrays are designed to operate simultaneously. The MC behind the two BMs adaptively cancels all components correlated with the BM outputs. Hence, any interference arriving from outside the desired capture region of either linear array is canceled. As a result, the common allowable region forms a 3-D cone-shaped capture space aligned along a steerable direction.

The differences between our proposed structure for the cross array and the original Hoshuyama beamformer for a linear array [2] are summarized as follows.

(a) The FBF output, b(k), is the normalized sum of the signals of the linear arrays along the X- and Y-directions, i.e.,

$$b(k) = \frac{1}{2M}\left(\sum_{m=1}^{M} x_m(k) + \sum_{m=1}^{M} y_m(k)\right) \tag{1}$$

(b) Since there are two sets of BMs, the CCAFs are represented in separate forms, namely,

$$z_m(k) = \begin{cases} x_m(k-L_2) - H_m^T(k)\,B(k), & m = 1, \ldots, M \\ y_{m-M}(k-L_2) - H_m^T(k)\,B(k), & m = M+1, \ldots, 2M \end{cases} \tag{2}$$

$$H_m(k) = \left[h_{m,0}(k),\ h_{m,1}(k),\ \ldots,\ h_{m,N-1}(k)\right]^T, \quad m = 1, \ldots, 2M \tag{3}$$

$$B(k) = \left[b(k),\ b(k-1),\ \ldots,\ b(k-N+1)\right]^T \tag{4}$$

where L2 is the number of delay samples for causality, N is the number of taps in each CCAF, zm(k) is the mth output signal of the BMs, and Hm(k) is the filter coefficient vector of the mth CCAF at time index k.

(c) The 2M outputs of the two BMs are fed into the NCAFs to be canceled from b(k), as follows:

$$d(k) = b(k-L_1) - \sum_{m=1}^{2M} W_m^T(k)\,Z_m(k) \tag{5}$$

$$W_m(k) = \left[w_{m,0}(k),\ w_{m,1}(k),\ \ldots,\ w_{m,N'-1}(k)\right]^T \tag{6}$$

$$Z_m(k) = \left[z_m(k),\ z_m(k-1),\ \ldots,\ z_m(k-N'+1)\right]^T, \quad m = 1, \ldots, 2M \tag{7}$$

where L1 is the number of delay samples for causality, N' is the number of taps in each NCAF, and Wm(k) and Zm(k) are the filter coefficient vector and the signal vector of the mth NCAF, respectively.

4. SIMULATIONS

To demonstrate the 3-D directionality of the proposed cross array, we carried out simulations for two environments: an anechoic environment and a simulated reverberant room. The 3-D directionality of the cross array was investigated using the array structure illustrated in Fig. 3. The following parameters, similar to those used in [2], were used: N = N' = 16, L1 = 10, L2 = 5. The sampling rate was 8000 Hz.

A. Anechoic environment

A band-limited (0.3-3.7 kHz) Gaussian noise was used as the test signal. We assumed that the desired signal comes from the direction normal to the cross array plane. The maximum allowable target-direction error was 15°. The half sphere was scanned at increments of 2° in both the X- and Y-directions. The control scheme for the adaptation of the CCAFs and NCAFs is similar to that used in [3], but the number of iterations for the two stages was reduced greatly: the CCAFs were adapted for only 10000 iterations, about five times fewer than in [2], and the NCAFs were adapted for 100000 iterations instead of 150000 as in [2]. Compared with the linear arrays used in [2]-[4], the faster convergence may result from the reduced allowable capture region and the additional enhancement of the signal reference in the upper branch of Fig. 4.

The polar plot of the total output power after convergence, normalized by the power for the assumed target direction, is shown in Fig. 5. Interference originating outside the cone (in this case, more than 15° from the normal of the plane) was attenuated by no less than 23 dB. The 3-D directionality pattern resembles that of a super-directional microphone capable of canceling most of the interference outside the cone-shaped region, in contrast to the ring-shaped region of linear arrays. As a result, the 3-D directionality scheme achieves a higher SNR gain than the linear array.

B. Reverberant environment

A more practical environment was simulated using a rectangular room (7 m × 6 m × 3 m) with uniform wall reflection coefficients. As shown in Fig. 6, a rectangular coordinate system with its origin in one corner and axes parallel to the walls was used as the reference in the room. The sensors were located at (1.00 m, 3.06 m, 1.50 m), (1.00 m, 3.02 m, 1.50 m), (1.00 m, 2.98 m, 1.50 m), (1.00 m, 2.94 m, 1.50 m), (1.00 m, 3.00 m, 1.56 m), (1.00 m, 3.00 m, 1.52 m), (1.00 m, 3.00 m, 1.48 m) and (1.00 m, 3.00 m, 1.44 m). The source signal, located at (3.00 m, 3.00 m, 1.50 m), was produced by playing a speech file selected from the TIMIT database.

Digital versions of the room impulse responses were generated with Allen and Berkley's image model [5], together with Peterson's modification [6]. The reverberation time was about 0.3 s. Referring to Fig. 6, an interference (band-limited Gaussian noise) was moved from (3.00 m, 1.80 m, 1.50 m) to (3.00 m, 3.00 m, 0.30 m) along an arc around the normal of the cross array, outside the capture region of the cross array. The signal-to-interference ratio (SIR) was about -8 dB. Note that the interference would have fallen into the allowable target region during its movement if a linear array had been used.

As shown in the resulting output waveform (Fig. 7), the interference was significantly suppressed after less than 0.2 s of adaptation. The output power during speech absence (e.g., around the 10th or 25th second) indicates that the achieved interference reduction is more than 21 dB in this experiment. Some distortion was observed because of the low SIR and imperfect adaptation control. Nevertheless, intelligibility was rather good in informal listening tests.

5. CONCLUSIONS

In this paper, a robust adaptive cross microphone array with a cone-shaped directionality pattern has been proposed. In this way, the spatial ambiguity of the linear array is avoided. Although the system still has an ambiguity between the front and rear, this problem can be resolved by using unidirectional microphones (such as cardioid microphones) instead of omnidirectional ones. The "dead" direction of each unidirectional microphone should face the rear of the cross array plane so as to cancel the noise coming from behind. Alternatively, a carefully designed installation scheme or acoustic shading can be considered to avoid this front-back ambiguity. A study of the adaptation control scheme and of near-field sound sources is currently in progress.

6. REFERENCES

[1] M. Brandstein and D. Ward, Microphone Arrays: Signal Processing Techniques and Applications, Springer-Verlag, Berlin Heidelberg New York, 2001.
[2] O. Hoshuyama, A. Sugiyama and A. Hirano, "A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters," IEEE Trans. on Signal Processing, Vol. 47, No. 10, pp. 2677-2684, Oct. 1999.
[3] O. Hoshuyama, B. Begasse, A. Sugiyama and A. Hirano, "A real time robust adaptive microphone array controlled by an SNR estimate," Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 6, pp. 3605-3608, 1998.
[4] O. Hoshuyama and A. Sugiyama, "A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters," Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 2, pp. 925-928, 1996.
[5] J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," J. Acoust. Soc. Am., Vol. 65, No. 4, pp. 943-950, April 1979.
[6] P. M. Peterson, "Simulating the response of multiple microphones to a single acoustic source in a reverberant room," J. Acoust. Soc. Am., Vol. 80, No. 5, pp. 1527-1529, May 1986.

Fig. 4. Diagram of the cross microphone array with the CCAF-NCAF adaptive beamformer structure (FBF; blocking matrices BM1 and BM2 with CCAFs H1(k), …, H2M(k) acting on x1(k-L2), …, xM(k-L2) and y1(k-L2), …, yM(k-L2); MC with NCAFs W1(k), …, W2M(k) producing d(k) from b(k-L1)).

Fig. 5. 3-D directionality of the cross array in Euclidean axes (top) and polar axes (bottom); response (dB) plotted against sinα cosβ and sinα sinβ.

Fig. 6. Cross array and sound source arrangement in the experiment (unit: meter).

Fig. 7. The adaptation result versus time (top: original signal; middle: noisy signal from one microphone; bottom: processed signal); amplitude (V) versus time (s), 0-25 s.