Optimum microphone placement for array sound capture

Daniel V. Rabinkin, Richard J. Renomeron, Joseph C. French and James L. Flanagan
Center for Computer Aids for Industrial Productivity, Rutgers University, P.O. Box 1390, Piscataway, NJ 08855-1390
World Wide Web: http://www.caip.rutgers.edu/multimedia/marrays

ABSTRACT

Microphone arrays can be used for high-quality sound pickup in reverberant and noisy environments. The beamforming capabilities of microphone array systems allow highly directional sound capture, providing a superior signal-to-noise ratio (SNR) compared with a single microphone. Microphone array system performance has two aspects: the ability of the system to locate and track sound sources, and its ability to selectively capture sound from those sources. Both aspects are strongly affected by the spatial placement of the microphone sensors. A method is therefore needed to optimize sensor placement based on the geometry of the environment and the assumed behavior of the sound sources. The objective of the optimization is to obtain the greatest average system SNR using a specified number of sensors. A method is derived to evaluate the performance of a given array configuration in terms of these two aspects, and an overall performance function is described based on the resulting metrics. A framework for optimum sensor placement, under practical constraints on where sensors can be placed and where sound sources may be located, is also characterized.

1. INTRODUCTION

The microphone array provides a means of localizing sound pickup and improving sound quality. An array uses multiple sensors distributed at spatially distinct locations to provide a signal-to-noise ratio (SNR) gain for sound from a desired spatial region. The SNR gain follows from the principles of sound propagation. Sound generated at the source arrives at each array sensor with a known delay that is a function of the sound propagation velocity and the distance from the source to that sensor. The delay-and-sum beamforming algorithm adds the signals captured by the array sensors, each with a corresponding delay, so that signal components originating from a desired location are combined coherently while those originating from other locations are combined incoherently. This gives the desired signal a gain over undesired noise that increases monotonically with the number of sensors. A key feature of the beamforming technique is its ability to move the focal pickup region without physically altering sensor placement, simply by applying the proper compensating signal delays. This makes the system adaptive and enables it to track a moving sound source. In an example application of speakerphone telephony, a properly mounted microphone array is able to track a user walking about a room and conversing, considerably reducing the degradation of captured sound caused by additive noise.

Effective beamforming relies on precise identification of the sound source coordinates. The microphone array system must be able to locate and track the sound source of interest accurately. As the number of sensors in the array increases, it becomes possible to achieve a beam with sharper spatial selectivity. A "tight" beam, however, requires more accurate position estimation: an error in the source position estimate causes the beam to focus on an incorrect position and results in SNR attenuation rather than SNR gain.
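The delay-and-sum principle described above can be illustrated with a short simulation. This is a minimal sketch, not the authors' implementation: it assumes a speed of sound of 343 m/s, a 16 kHz sampling rate, delays rounded to whole samples, a hypothetical 8-element line array with 5 cm spacing, and independent white noise at each sensor (a simpler noise model than the point interferers used later in this paper). Under those assumptions the measured SNR gain should come out close to 10·log10(N) dB.

```python
import math
import random

C = 343.0   # assumed speed of sound in air, m/s
FS = 16000  # assumed sampling rate, Hz


def delay_samples(src, mic, c=C, fs=FS):
    """Propagation delay from src to mic, rounded to whole samples (a simplification)."""
    return round(math.dist(src, mic) / c * fs)


def delay_and_sum(channels, delays):
    """Advance channel i by delays[i] samples so the source components line up, then average."""
    n_out = min(len(ch) for ch in channels) - max(delays)
    return [sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
            for n in range(n_out)]


def power(x):
    """Mean-square value of a sequence."""
    return sum(v * v for v in x) / len(x)


def snr_gain_db(num_mics, n_samples=20000, seed=0):
    """SNR at the beamformer output minus SNR at a single microphone, in dB."""
    rng = random.Random(seed)
    mics = [(0.05 * i, 0.0, 0.0) for i in range(num_mics)]  # hypothetical line array
    src = (0.4, 1.5, 0.0)                                    # hypothetical talker position
    delays = [delay_samples(src, m) for m in mics]
    s = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Signal-only and noise-only channels: the beamformer is linear, so the two
    # parts can be processed separately and their output powers compared.
    sig = [[0.0] * d + s[: n_samples - d] for d in delays]
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in mics]
    snr_in = power(sig[0][max(delays):]) / power(noise[0])
    snr_out = power(delay_and_sum(sig, delays)) / power(delay_and_sum(noise, delays))
    return 10.0 * math.log10(snr_out / snr_in)
```

For example, evaluating `snr_gain_db(n)` for n = 1, 2, 4 and 8 should give values near 0, 3, 6 and 9 dB. With spatially correlated noise, such as a single point interferer, the gain depends on geometry instead, which is what the metric developed below captures.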
It has been demonstrated that the placement of microphones affects array system performance [10]. Moreover, unlike the parameters of adaptive algorithms, which can adjust automatically to a changing environment, the microphone locations must be chosen during the design or installation of the system (or at least while the system is not in operation). It is therefore desirable to place the sensors so as to maximize performance, given the expected use of the system and the physical parameters of the enclosure. One of the primary uses of microphone array systems is high-fidelity sound capture. An objective measure of a system's performance for this task is the SNR of the processed signal, and the average expected SNR gain of an array, expressed as a function of the positions of a given set of N microphones, can serve as a performance metric. Such a metric, defining average SNR as a function of microphone position, will be proposed in Section 2. This metric in turn relies on source location accuracy, which will be described in Section 3. The performance of several array configurations under this metric will be evaluated in Section 4. Conclusions will be presented in Section 5.

2. PERFORMANCE METRIC

2.1. The SNR Metric for Array Capture

Array capture is illustrated in Figure 1. A free-space sound propagation model is used to compute the delay times between sound generation and sound arrival at the array sensors. The microphones in the array convert the sound pressure fluctuations produced by the sound source and the interfering noise sources into an electric-potential signal [4].

[Figure 1 (diagram): a sound source S reaches microphones M_i, M_{i+1}, M_{i+2}, ..., M_N by a direct path and by a reflection off a wall, while an interfering noise source adds interference.]

Figure 1. Sound captured by the array microphones is composed of direct signal arrival, reflected signal, and interfering noise components.

[Figure 2 (diagram): microphone signals X_1, X_2, ..., X_N → TDOAs {D_ij} (1st step) → source location s: {X, Y, Z} (2nd step).]

Figure 2. A two-stage source location algorithm based on time differences of arrival (TDOAs).

The microphone array approach assumes that the redundant information captured through the multiple sensors will enable the subtraction of the additive noise and reverberant components of the signals captured by the array sensors. The aggregate of the signals captured due to a single sound source and L interfering noise sources may be expressed in vector notation (all notation is summarized in Appendix C):

$$\mathbf{x}(t) = \boldsymbol{\alpha}_s \odot \mathbf{s}(t - \boldsymbol{\tau}_s) + \sum_{l=1}^{L} \boldsymbol{\alpha}_{n_l} \odot \mathbf{n}_l(t - \boldsymbol{\tau}_{n_l}) \qquad (1)$$

where vectors are indicated in bold. The vector $\mathbf{x}(t)$ is the column vector of the signals $x_i(t)$ captured at microphone $i$; $\boldsymbol{\alpha}_s$ is the scaling column vector for the sound source, reflecting the attenuation of sound with the distance between source and sensor; and $\{\boldsymbol{\alpha}_{n_l}\}$ is the set of L such attenuation vectors corresponding to the L noise source positions. The signal vector $\mathbf{s}(t - \boldsymbol{\tau}_s)$ is formed by time shifting the source signal $s(t)$ by the delay vector $\boldsymbol{\tau}_s$, whose elements $\tau_{s_i}$ are the sound propagation delays from the source to microphone $i$ and are determined by the system geometry. The vector $\mathbf{n}_l(t - \boldsymbol{\tau}_{n_l})$ is the additive noise column vector due to interfering source $l$, time shifted by the corresponding delay vector $\boldsymbol{\tau}_{n_l}$ and scaled by the propagation scaling vector $\boldsymbol{\alpha}_{n_l}$. The $\odot$ operator denotes an element-by-element product. All vectors have length N, corresponding to the N array sensors.
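Equation (1) can be evaluated numerically once the geometry is fixed. The sketch below is illustrative only: it assumes free-field spherical spreading, so each element of α is taken as 1/r for source-to-microphone distance r (the text above does not fix a particular attenuation law), and a speed of sound of 343 m/s. The function names `geometry_vectors` and `captured` are hypothetical.

```python
import math

C = 343.0  # assumed speed of sound in air, m/s


def geometry_vectors(position, mics, c=C):
    """Return (alpha, tau) for a point source: 1/r gains and r/c delays per microphone.

    The 1/r (free-field spherical spreading) law is an illustrative assumption.
    """
    dists = [math.dist(position, m) for m in mics]
    return [1.0 / r for r in dists], [r / c for r in dists]


def captured(t, mics, source, s, noises):
    """Evaluate Eq. (1) at time t, returning x_i(t) for every microphone.

    s is the source signal as a callable of time; noises is a list of
    (position, n_l) pairs, where each n_l is also a callable of time.
    """
    a_s, tau_s = geometry_vectors(source, mics)
    x = [a * s(t - d) for a, d in zip(a_s, tau_s)]  # alpha_s (.) s(t - tau_s)
    for pos, n_l in noises:
        a_n, tau_n = geometry_vectors(pos, mics)
        # add alpha_nl (.) n_l(t - tau_nl) for this interfering source
        x = [xi + a * n_l(t - d) for xi, a, d in zip(x, a_n, tau_n)]
    return x
```

Any callables can stand in for the signals, for instance a 500 Hz tone for s(t) and a 60 Hz hum for a single interferer; sampling `captured` on a time grid then gives a simulated multichannel recording under this free-field model.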

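Equation (1) can also be written as:

$$\mathbf{x}(t) = \boldsymbol{\alpha}_s \odot \mathbf{h}_s(t) * s(t) + \sum_{l=1}^{L} \boldsymbol{\alpha}_{n_l} \odot \mathbf{h}_{n_l}(t) * n_l(t) \qquad (2)$$

where

$$\mathbf{h}_s(t) = \boldsymbol{\delta}(t - \boldsymbol{\tau}_s) \qquad (3)$$

$$\mathbf{h}_{n_l}(t) = \boldsymbol{\delta}(t - \boldsymbol{\tau}_{n_l}) \qquad (4)$$

This notation reflects that $\{\mathbf{h}\}$ is the set of impulse responses corresponding to the propagation of the source and noise signals to the sensor set. The operator $*$ denotes convolution of $s(t)$ (or $n_l(t)$) with each element of $\mathbf{h}$. The frequency-domain equivalent of (2) is given as:†

$$\mathbf{X}(f) = \left(\boldsymbol{\alpha}_s \odot \mathbf{H}_s(f)\right) S(f) + \sum_{l=1}^{L} \left(\boldsymbol{\alpha}_{n_l} \odot \mathbf{H}_{n_l}(f)\right) N_l(f) \qquad (5)$$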
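The step from Equation (2) to Equation (5) rests on the fact that a pure delay δ(t − τ) has the transfer function exp(−j2πfτ). A quick numerical check is sketched below; it substitutes the discrete Fourier transform and a circular integer-sample delay for the continuous-time delay, an assumption made only so that the identity holds exactly on a finite signal.

```python
import cmath
import math
import random


def dft(x):
    """Direct O(n^2) discrete Fourier transform; adequate for short illustrative signals."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def circular_delay(x, d):
    """Return y with y[m] = x[(m - d) mod n]: the discrete analogue of x(t - tau)."""
    d %= len(x)
    return x[-d:] + x[:-d] if d else list(x)


def delay_transfer(k, d, n):
    """Transfer function of a d-sample delay at DFT bin k, the analogue of exp(-j2*pi*f*tau)."""
    return cmath.exp(-2j * math.pi * k * d / n)


rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(64)]
d = 5
X = dft(x)
Y = dft(circular_delay(x, d))
# Largest deviation between DFT(delayed x) and H(k) * DFT(x) over all bins.
max_err = max(abs(Y[k] - delay_transfer(k, d, len(x)) * X[k]) for k in range(len(x)))
```

`max_err` should be at the level of floating-point rounding. Applied per microphone, the same identity gives H_si(f) = exp(−j2πfτ_si), which is what turns the convolutions in (2) into the element-wise products in (5).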

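The beamforming technique performs time alignment of the sound-source portions of the captured signals by convolving Equation (2) with the inverse of $\mathbf{h}_s$. Let:‡

$$\mathbf{g}_s(t) = \boldsymbol{\delta}(t + \boldsymbol{\tau}_s) \qquad (6)$$

Then:

$$\mathbf{g}_s(t) * \mathbf{h}_s(t) = \boldsymbol{\delta}(t) \qquad (7)$$

$$\mathbf{G}_s(f) \odot \mathbf{H}_s(f) = \mathbf{1} \qquad (8)$$

and

$$\mathbf{y}(t) = \mathbf{g}_s(t) * \mathbf{x}(t) = \boldsymbol{\alpha}_s\, s(t) + \sum_{l=1}^{L} \boldsymbol{\alpha}_{n_l} \odot \mathbf{g}_s(t) * \mathbf{h}_{n_l}(t) * n_l(t) \qquad (9)$$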

The outputs of the time alignment operation are summed to obtain the recovered signal:

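$$\hat{s}(t) \triangleq \sum_{i=1}^{N} y_i(t) = s(t)\sum_{i=1}^{N}\alpha_{s_i} + \sum_{l=1}^{L}\left[\sum_{i=1}^{N}\alpha_{n_{li}}\, g_{s_i}(t) * h_{n_{li}}(t) * n_l(t)\right] \qquad (10)$$

$$\hat{S}(f) = \sum_{i=1}^{N} Y_i(f) = S(f)\sum_{i=1}^{N}\alpha_{s_i} + \sum_{l=1}^{L}\left[\sum_{i=1}^{N}\alpha_{n_{li}}\, G_{s_i}(f) H_{n_{li}}(f)\right] N_l(f) \qquad (11)$$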
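Equations (6) to (10) can be sketched in discrete time. Following the causality footnote, each channel is delayed by τ_max − τ_si (here τ_max is taken as the largest source delay, and all delays are assumed to be whole samples) rather than advanced by τ_si, and the aligned channels are then summed. The toy signal, gains and delays below are illustrative values, not data from the paper.

```python
def make_channels(s, alphas, delays):
    """Source-only captured signals, x_i[n] = alpha_i * s[n - d_i] (zero before arrival)."""
    return [[a * s[k - d] if k >= d else 0.0 for k in range(len(s))]
            for a, d in zip(alphas, delays)]


def align_and_sum(channels, delays):
    """Causal time alignment and summation.

    Channel i is delayed by (d_max - d_i) samples, the discrete counterpart of
    convolving with delta(t + tau_si - tau_max); the aligned channels are then
    added, as in Eq. (10).
    """
    d_max = max(delays)
    n = min(len(ch) for ch in channels)
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        shift = d_max - d
        for k in range(shift, n):
            out[k] += ch[k - shift]
    return out


s = [1.0, 2.0, 3.0, 4.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # toy source signal
alphas = [1.0, 0.5, 0.25]                                # illustrative gains alpha_si
delays = [0, 2, 3]                                       # illustrative delays, in samples
s_hat = align_and_sum(make_channels(s, alphas, delays), delays)
```

Here `s_hat` is the source delayed by τ_max (3 samples) and scaled by Σα_si = 1.75, which is the desired-signal term of (10); the extra τ_max delay is the price of causality.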

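Clearly, the first term of Equation (10) represents the "desired signal" portion of the recovered signal, while the second term represents noise. The SNR can then be written as:

$$\mathrm{SNR} \triangleq 10\log_{10}\left\{ \frac{\left\langle \left[ s(t)\sum_{i=1}^{N}\alpha_{s_i} \right]^2 \right\rangle}{\sum_{l=1}^{L}\left\langle \left[ \sum_{i=1}^{N}\alpha_{n_{li}}\, g_{s_i}(t) * h_{n_{li}}(t) * n_l(t) \right]^2 \right\rangle} \right\} \qquad (12)$$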

where $\langle f(t) \rangle$ denotes the expected time average of the function $f(t)$. A frequency-domain version of the SNR may be defined similarly from Equation (11):

$$\mathrm{SNR} = 10\log_{10}\left\{ \frac{\overline{\left| S(f)\sum_{i=1}^{N}\alpha_{s_i} \right|^2}}{\sum_{l=1}^{L}\overline{\left| \sum_{i=1}^{N}\alpha_{n_{li}}\, G_{s_i}(f) H_{n_{li}}(f)\, N_l(f) \right|^2}} \right\} \qquad (13)$$

† Upper-case variables denote the Fourier transforms of the corresponding lower-case variables.
‡ Expression (6) is non-causal. In practice $\boldsymbol{\delta}(t + \boldsymbol{\tau}_s - \tau_{\max})$ is used, with $\tau_{\max}$ chosen large enough to guarantee causality.

Here the overbar denotes an average over the spectrum. A frequency-dependent SNR, giving the signal-to-noise ratio in a particular frequency component, may be defined by taking the expression before spectral averaging:

$$\mathrm{SNR}(f) \triangleq 10\log_{10}\left\{ \frac{\left| S(f)\sum_{i=1}^{N}\alpha_{s_i} \right|^2}{\sum_{l=1}^{L}\left| \sum_{i=1}^{N}\alpha_{n_{li}}\, G_{s_i}(f) H_{n_{li}}(f)\, N_l(f) \right|^2} \right\} \qquad (14)$$

Equation (9) is a linear combination of the source signal and the noise signals. It can therefore be evaluated by applying the source signal and each interference signal separately and then combining the results.
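Because the terms separate, Equation (14) can be evaluated one noise source at a time and the results summed. The sketch below does this at a single frequency, assuming exact steering so that G_si(f)H_nli(f) = exp(−j2πf(τ_nli − τ_si)). It also forms a spectrum-averaged figure in the style of Equation (13) under the simplifying assumption that S(f) and every N_l(f) are flat over the chosen frequencies. The function names and the two-microphone example are hypothetical.

```python
import cmath
import math


def snr_terms(f, S, alpha_s, tau_s, noises):
    """Numerator and denominator of Eq. (14) at frequency f (Hz), assuming exact steering.

    S is the source spectrum value at f; noises is a list of (N_l, alpha_nl, tau_nl)
    triples, with delays in seconds. With G_si(f) = exp(+j2*pi*f*tau_si) and
    H_nli(f) = exp(-j2*pi*f*tau_nli), each noise term is rotated by tau_nli - tau_si.
    """
    num = abs(S * sum(alpha_s)) ** 2
    den = 0.0
    for N_l, a_n, t_n in noises:  # one interfering source at a time, then summed
        acc = sum(a * cmath.exp(-2j * math.pi * f * (tn - ts))
                  for a, ts, tn in zip(a_n, tau_s, t_n))
        den += abs(N_l * acc) ** 2
    return num, den


def snr_f_db(f, S, alpha_s, tau_s, noises):
    """Eq. (14): per-frequency SNR in dB."""
    num, den = snr_terms(f, S, alpha_s, tau_s, noises)
    return 10.0 * math.log10(num / den)


def band_snr_db(freqs, S, alpha_s, tau_s, noises):
    """Eq. (13)-style SNR: average numerator and denominator over freqs, then take the ratio.

    Assumes S and every N_l take the same value at all frequencies in freqs (flat spectra).
    """
    terms = [snr_terms(f, S, alpha_s, tau_s, noises) for f in freqs]
    return 10.0 * math.log10(sum(n for n, _ in terms) / sum(d for _, d in terms))
```

With two unit-gain microphones and unit spectra, a noise source whose delays match the source's gets no rejection (0 dB, the same as one microphone), while a noise source arriving a quarter period later at the second microphone gives 10·log10(2), about 3 dB.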

2.2. Beamforming Response

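Equation (6) assumes that $\boldsymbol{\tau}_s$ is known exactly. During array operation, the delays $\hat{\boldsymbol{\tau}}_s$ are computed from a position estimate, so they are likely to differ somewhat from $\boldsymbol{\tau}_s$. This implies an estimated inverse function $\hat{\mathbf{g}}_s(t)$ such that:

$$\hat{\mathbf{G}}_s(f) \odot \mathbf{H}_s(f) \approx \mathbf{1} \qquad (15)$$

where the equality no longer holds exactly. This causes (14) to become:

$$\mathrm{SNR}(f) = 10\log_{10}\left\{ \frac{\left| S(f)\sum_{i=1}^{N}\alpha_{s_i}\hat{G}_{s_i}(f)H_{s_i}(f) \right|^2}{\sum_{l=1}^{L}\left| \sum_{i=1}^{N}\alpha_{n_{li}}\hat{G}_{s_i}(f)H_{n_{li}}(f)\, N_l(f) \right|^2} \right\} \qquad (16)$$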

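From the properties of the Fourier transform of the delta function, note that:

$$\hat{G}_{s_i}(f)H_{s_i}(f) = e^{-j2\pi f(\tau_{s_i} - \hat{\tau}_{s_i})} \qquad (17)$$

and

$$\hat{G}_{s_i}(f)H_{n_{li}}(f) = e^{-j2\pi f(\tau_{n_{li}} - \hat{\tau}_{s_i})} \qquad (18)$$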

Substituting these into (16) results in a simplified expression:

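$$\mathrm{SNR}(f) = 10\log_{10}\left\{ \frac{\left| S(f)\sum_{i=1}^{N}\alpha_{s_i}e^{-j2\pi f(\tau_{s_i} - \hat{\tau}_{s_i})} \right|^2}{\sum_{l=1}^{L}\left| \sum_{i=1}^{N}\alpha_{n_{li}}e^{-j2\pi f(\tau_{n_{li}} - \hat{\tau}_{s_i})}\, N_l(f) \right|^2} \right\} \qquad (19)$$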
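The simplified expression makes the effect of position-estimation error explicit: each term of the source sum is rotated by its residual delay τ_si − τ̂_si, so the source terms add fully coherently only when the estimate is exact or off by a delay common to all sensors. The sketch below evaluates the inner sums and SNR(f) of this expression; the 1/r gains, the 343 m/s speed of sound, the 8-element line array, and all positions are illustrative assumptions, and the function names are hypothetical.

```python
import cmath
import math

C = 343.0  # assumed speed of sound in air, m/s


def geometry(position, mics, c=C):
    """Illustrative 1/r gains and r/c delays from a point to each microphone."""
    dists = [math.dist(position, m) for m in mics]
    return [1.0 / r for r in dists], [r / c for r in dists]


def steering_sum(f, alpha, tau, tau_hat):
    """sum_i alpha_i * exp(-j*2*pi*f*(tau_i - tau_hat_i)), the inner sums of the SNR(f) expression."""
    return sum(a * cmath.exp(-2j * math.pi * f * (t - th))
               for a, t, th in zip(alpha, tau, tau_hat))


def snr_f_steered_db(f, S, alpha_s, tau_s, tau_hat, noises):
    """SNR(f) in dB when the array is steered with estimated delays tau_hat.

    noises is a list of (N_l, alpha_nl, tau_nl) triples; delays are in seconds.
    """
    signal = abs(S * steering_sum(f, alpha_s, tau_s, tau_hat)) ** 2
    noise = sum(abs(N_l * steering_sum(f, a_n, t_n, tau_hat)) ** 2
                for N_l, a_n, t_n in noises)
    return 10.0 * math.log10(signal / noise)


mics = [(0.05 * i, 0.0, 0.0) for i in range(8)]    # hypothetical 8-element line array
alpha_s, tau_s = geometry((0.4, 1.5, 0.0), mics)   # true talker position (hypothetical)
_, tau_hat = geometry((0.7, 1.5, 0.0), mics)       # position estimate 0.3 m off
alpha_n, tau_n = geometry((-1.0, 1.0, 0.0), mics)  # one interfering source (hypothetical)
noise_sources = [(1.0, alpha_n, tau_n)]
snr_exact = snr_f_steered_db(2000.0, 1.0, alpha_s, tau_s, tau_s, noise_sources)
snr_missteered = snr_f_steered_db(2000.0, 1.0, alpha_s, tau_s, tau_hat, noise_sources)
```

Comparing `snr_exact` with `snr_missteered` shows how the estimate error changes SNR(f) for this particular geometry. The signal term itself can only shrink under mis-steering, since |Σα_i e^{jθ_i}| ≤ Σα_i for positive α_i, with equality when all residual phases agree.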
