H. Zhou, J. Lu, and X. Qiu, "Design of a Wideband Linear Microphone Array for High-Quality Audio Recording," J. Audio Eng. Soc., vol. 66, no. 3, pp. 154–166 (2018 Mar.). DOI: https://doi.org/10.17743/jaes.2018.0004
ENGINEERING REPORTS

Design of a Wideband Linear Microphone Array for High-Quality Audio Recording

HAORAN ZHOU,¹ JING LU,¹ AND XIAOJUN QIU,² AES Member
([email protected]) ([email protected]) ([email protected])

¹ Key Lab of Modern Acoustics, Institute of Acoustics, Nanjing University, Nanjing, China
² Centre for Audio, Acoustics and Vibration, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW 2007, Australia
Linear microphone arrays with unequally spaced elements achieve high wideband directivity with a small number of microphones. In this engineering report a general procedure is proposed for designing wideband linear microphone arrays with unequally spaced elements, steered at the endfire direction, for high-quality audio recording from 20 Hz to 16 kHz. The simulated annealing method is used to iteratively optimize the spatial distribution of microphones that need not have matched amplitude and phase responses. The challenge in the optimization is that the operations on the large matrices required by the wide bandwidth introduce a large accumulated numerical error, so the optimization process must be carefully regularized and the error restricted to a relatively small, tolerable level. The optimized microphone distributions and the beampatterns of arrays with different numbers of microphones are presented in simulations, and their directivities are compared with that of a shotgun microphone of the same length. Finally, experiments were carried out to demonstrate that a microphone array designed with the proposed method significantly outperforms the same-length shotgun microphone in directivity while maintaining a comparably low self-noise level.
0 INTRODUCTION

Shotgun microphones are the preferred choice for high-quality speech and audio recording in environments with intense ambient noise because of their superior directivity over other types of microphones. However, their low- and mid-frequency directivities are usually not high enough for practical use [1], because their interference tubes increase the directivity only above a frequency inversely proportional to the tube length [2]. Increasing the acoustic tube length can improve the directivity but hinders practical application [3].

Microphone arrays provide an alternative to shotgun microphones for directional sound pickup, and they have already been used in stereo audio recording [4], hearing aids [5], distant speech recognition [6], and sound field capture [7]. In most applications the directivity of the array is limited by restrictions on the array size and the number of microphones, so some kind of post-processing is needed to further suppress ambient noise and interference [8, 9]. Unfortunately, post-processing may degrade the sound quality, which makes it unsuitable for high-quality audio acquisition. Although large-aperture arrays with a great number of microphones have the benefit of sharp directivity, the sophistication of such systems prohibits their practical application [10, 11].

To be an alternative to shotgun microphones, a microphone array must have a similar or better frequency response, directivity, and self-noise level with a similar or shorter physical length and an acceptable number of elements. For arrays with uniformly spaced elements, a large number of microphones is required for the microphone intervals to meet the spatial Nyquist sampling theorem [12]. Arrays with unequally spaced elements can alleviate spatial aliasing and allow a smaller number of elements [13]. However, the cost function for the optimization of the element distribution contains exponential or trigonometric functions, and its non-convexity makes the optimization a nontrivial task. Many optimization algorithms have been proposed to achieve global optimization, for example, evolutionary programming [14], the genetic algorithm [15], and simulated annealing [16]. Other efforts include constructing an array with unequally spaced elements from the perspective of compressive sensing [17] and using a logarithmically spaced configuration [18]. Though these works all manage to optimize the microphone distribution for better grating-lobe and sidelobe attenuation, none of them is designed
Fig. 1. Structure of a linear microphone array.
for high-quality audio recording with convincing experiments.

In this engineering report a general procedure for optimizing unequally spaced microphone arrays is proposed as an alternative to shotgun microphones for high-quality audio recording at the endfire direction. The time-domain weighted least squares (WLS) cost function is chosen as the optimization target because preliminary tests show that different methods, such as WLS beamforming [19], superdirective beamforming [20], and the differential microphone array (DMA) [21], achieve similar directivities at low frequencies when a certain white noise gain (WNG) constraint is imposed, while the WLS beamformer assures a reasonable mainlobe width at high frequencies. Since the optimization needs to be performed from 20 Hz to 16 kHz, the matrices in the wideband cost function are usually of large size. To circumvent the influence of the considerable accumulated error, the optimization process is carefully regularized and the error is restricted to a tolerable level. The optimized microphone distributions and the performance as a function of the number of microphones are presented in simulations and compared with the directivity of a shotgun microphone of the same length. Finally, experiments were carried out to measure the performance of a 16-element array designed with the proposed method.
1 ARRAY STRUCTURE AND THE COST FUNCTION

The structure of a typical linear microphone array is shown in Fig. 1, where a unit-amplitude plane wave of normalized angular frequency ω is incident from the azimuth angle φ. The signals from the aligned microphones are filtered and added to enhance the signal from a certain direction by suppressing noise from other directions. N is the number of elements, Y_n(ω, φ) represents the signal received by the nth microphone, O is the origin of the coordinates, w_n is the vector containing the coefficients of the FIR filter for the nth microphone, d_n is the distance from the nth microphone to the reference point, and Z(ω, φ) is the array output for the expected enhanced signal. The directivity of the array, i.e., the equivalent transfer function between the incident signal and the array output, can be described as [19]

$$H(\omega, \varphi, \mathbf{d}) = \mathbf{w}^T \mathbf{g}(\omega, \varphi, \mathbf{d}), \tag{1}$$

where $\mathbf{w} = [\mathbf{w}_0, \mathbf{w}_1, \ldots, \mathbf{w}_{N-1}]^T$, $\mathbf{w}_n = [w_{n,0}, w_{n,1}, \ldots, w_{n,L-1}]$, $\mathbf{e}(\omega) = [1, \exp(-j\omega), \exp(-j2\omega), \ldots, \exp(-j(L-1)\omega)]$, $\mathbf{g}(\omega, \varphi, \mathbf{d}) = [\mathbf{e}(\omega)\exp(-j\omega\tau_0(\varphi, d_0)), \mathbf{e}(\omega)\exp(-j\omega\tau_1(\varphi, d_1)), \ldots, \mathbf{e}(\omega)\exp(-j\omega\tau_{N-1}(\varphi, d_{N-1}))]^T$, exp(·) denotes the exponential function, L is the length of each filter, $\mathbf{d} = [d_0, d_1, \ldots, d_{N-1}]$ represents the element distribution, $\tau_n(\varphi, d_n) = d_n f_s \cos\varphi / c$ denotes the relative delay in samples of the signal received by the nth microphone, c is the speed of sound, f_s is the sampling rate, and the superscript (·)^T denotes the transpose operation.

The cost function of the WLS beamformer is calculated in the time domain because a balance can be achieved between the filter length and the frequency resolution [19]. The normally utilized WLS error between the desired directivity and the actual directivity of the array is given by

$$J_{LS}(\mathbf{w}, \mathbf{d}) = \int_0^{\pi}\!\!\int_0^{\pi} F(\omega, \varphi)\,\big|H(\omega, \varphi, \mathbf{d}) - D(\omega, \varphi)\big|^2\, d\omega\, d\varphi = \mathbf{w}^T \mathbf{Q}_{LS}(\mathbf{d})\,\mathbf{w} - 2\mathbf{w}^T \mathbf{a}(\mathbf{d}) + b_{LS}, \tag{2}$$

where D(ω, φ) denotes the desired directivity, H(ω, φ, d) is the actual directivity for the specific array filters and element distribution, and F(ω, φ) is the error weight. The integration ranges of φ and ω are both from 0 to π. Q_LS(d), a(d), and b_LS are defined as [19]

$$\mathbf{Q}_{LS}(\mathbf{d}) = \int_0^{\pi}\!\!\int_0^{\pi} F(\omega, \varphi)\, \mathbf{G}_R(\omega, \varphi, \mathbf{d})\, d\omega\, d\varphi, \tag{3}$$

$$\mathbf{G}(\omega, \varphi, \mathbf{d}) = \mathbf{g}(\omega, \varphi, \mathbf{d})\, \mathbf{g}^H(\omega, \varphi, \mathbf{d}), \tag{4}$$

$$\mathbf{a}(\mathbf{d}) = \int_0^{\pi}\!\!\int_0^{\pi} F(\omega, \varphi)\, \big[D_R(\omega, \varphi)\, \mathbf{g}_R(\omega, \varphi, \mathbf{d}) + D_I(\omega, \varphi)\, \mathbf{g}_I(\omega, \varphi, \mathbf{d})\big]\, d\omega\, d\varphi, \tag{5}$$

$$b_{LS} = \int_0^{\pi}\!\!\int_0^{\pi} F(\omega, \varphi)\, \big|D(\omega, \varphi)\big|^2\, d\omega\, d\varphi, \tag{6}$$

where (·)_R means the real part and (·)_I the imaginary part.

The mismatches of the microphones are also taken into account to guarantee the robustness of the microphone array. The magnitude and phase characteristics of the nth microphone can be described by $A_n(\omega, \varphi) = a_n(\omega, \varphi)e^{j\psi_n(\omega, \varphi)}$, where a_n and ψ_n denote the amplitude and phase of A_n, with probability density functions (PDFs) f_a(a_n) and f_ψ(ψ_n) respectively, and f_A(A_n) denotes the PDF of A_n. The details of the derivation can be found in [19]; the main results are as follows. The mean WLS cost function $J_{LS}^{mean}(\mathbf{w}, \mathbf{d})$ can be written as [19]

$$J_{LS}^{mean}(\mathbf{w}, \mathbf{d}) = \int_{A_0}\!\!\cdots\!\int_{A_{N-1}} J_{LS}(\mathbf{w}, \mathbf{A}, \mathbf{d})\, f_A(A_0) \cdots f_A(A_{N-1})\, dA_0 \cdots dA_{N-1} = \mathbf{w}^T \mathbf{Q}_{mean}(\mathbf{d})\,\mathbf{w} - 2\mathbf{w}^T \mathbf{a}_{mean}(\mathbf{d}) + b_{LS}, \tag{7}$$
where $\mathbf{A} = [A_0, A_1, \ldots, A_{N-1}]$. Assuming independence between the amplitude and the phase, the integral parameters in Eq. (7) can be expressed as [19]

$$\mathbf{a}_{mean}(\mathbf{d}) = \mu_a \mu_\varphi^c\, \mathbf{a}(\mathbf{d}) + \mu_a \mu_\varphi^s\, \mathbf{a}_o(\mathbf{d}), \tag{8}$$

$$\mathbf{Q}_{mean}(\mathbf{d}) = \mathbf{Q}_{LS}(\mathbf{d}) \circ \begin{bmatrix} \sigma_a^2 \mathbf{1}_L & \mu_a^2 \sigma_\varphi^c \mathbf{1}_L & \cdots & \mu_a^2 \sigma_\varphi^c \mathbf{1}_L \\ \mu_a^2 \sigma_\varphi^c \mathbf{1}_L & \sigma_a^2 \mathbf{1}_L & \cdots & \mu_a^2 \sigma_\varphi^c \mathbf{1}_L \\ \vdots & \vdots & \ddots & \vdots \\ \mu_a^2 \sigma_\varphi^c \mathbf{1}_L & \mu_a^2 \sigma_\varphi^c \mathbf{1}_L & \cdots & \sigma_a^2 \mathbf{1}_L \end{bmatrix}, \tag{9}$$

$$\mathbf{a}_o(\mathbf{d}) = \int_0^{\pi}\!\!\int_0^{\pi} F(\omega, \varphi)\, D(\omega, \varphi)\, \mathbf{g}_I(\omega, \varphi, \mathbf{d})\, d\omega\, d\varphi, \tag{10}$$

$$\mu_a = \int a\, f_a(a)\, da,\quad \mu_\varphi^c = \int \cos\psi\, f_\psi(\psi)\, d\psi,\quad \mu_\varphi^s = \int \sin\psi\, f_\psi(\psi)\, d\psi,\quad \sigma_a^2 = \int a^2 f_a(a)\, da,\quad \sigma_\varphi^c = (\mu_\varphi^c)^2 + (\mu_\varphi^s)^2,\quad \sigma_\varphi^s = \mu_\varphi^s \mu_\varphi^c - \mu_\varphi^c \mu_\varphi^s = 0, \tag{11}$$

where $\mathbf{1}_L$ is an L × L matrix with all elements equal to 1 and ∘ denotes element-wise multiplication. The matrix Q_mean(d) is usually of high order, so a regularization term is added to alleviate the influence of a possibly ill-conditioned matrix. The modified cost function is

$$J_{LS,R}^{mean}(\mathbf{w}, \mathbf{d}) = \mathbf{w}^T \mathbf{Q}_{mean}(\mathbf{d})\,\mathbf{w} - 2\mathbf{w}^T \mathbf{a}_{mean}(\mathbf{d}) + b_{LS} + \lambda \mathbf{w}^T \mathbf{w} = \mathbf{w}^T \mathbf{Q}_{mean,R}(\mathbf{d})\,\mathbf{w} - 2\mathbf{w}^T \mathbf{a}_{mean}(\mathbf{d}) + b_{LS}, \tag{12}$$

where λ is a regularization parameter and $\mathbf{Q}_{mean,R}(\mathbf{d}) = \mathbf{Q}_{mean}(\mathbf{d}) + \lambda \mathbf{I}$. Minimizing $J_{LS,R}^{mean}(\mathbf{w}, \mathbf{d})$ with respect to w yields

$$\mathbf{w}_{LS}(\mathbf{d}) = \mathbf{Q}_{mean,R}^{-1}(\mathbf{d})\, \mathbf{a}_{mean}(\mathbf{d}). \tag{13}$$

Substituting Eq. (13) into the cost function Eq. (12), the minimum value $J_{LS,R}^{mean}(\mathbf{w}_{LS}, \mathbf{d})$ is obtained, which characterizes the performance of the element distribution. The cost function to be optimized with respect to d is

$$J_{LS,opt}^{mean}(\mathbf{d}) = J_{LS,R}^{mean}(\mathbf{w}_{LS}, \mathbf{d}). \tag{14}$$
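To make Eqs. (2)–(13) concrete, the following sketch solves a drastically scaled-down version of the design in Python with NumPy. The three element positions, the grid resolutions, and the plain rectangle-rule integration are illustrative assumptions; the paper itself uses 16 elements, 200-tap filters, an analytic integral over ω, and Simpson's rule over φ (Sec. 2.1). Microphone mismatch (Eqs. (7)–(11)) is omitted, so Q_LS and a stand in for Q_mean and a_mean.

```python
import numpy as np

# Toy regularized WLS beamformer design, Eqs. (2)-(13):
# 3 microphones, 16-tap filters, coarse omega/phi grids.
c, fs = 340.0, 44100.0
d = np.array([0.0, 0.05, 0.12])          # element positions (m), hypothetical
N, L = len(d), 16
omegas = np.linspace(0.1, np.pi, 60)     # normalized angular frequency grid
phis = np.linspace(0.0, np.pi, 90)       # incidence angle grid

def steering(om, phi):
    """g(omega, phi, d): stacked delayed FIR phase vectors, Eq. (1)."""
    e = np.exp(-1j * om * np.arange(L))
    tau = np.cos(phi) * d * fs / c       # relative delay in samples
    # exp(j*L*omega/2) shift (Sec. 2.2) keeps the optimal filters causal
    return np.exp(1j * L * om / 2) * np.concatenate(
        [e * np.exp(-1j * om * t) for t in tau])

def weights(phi):
    """Desired response D and error weight F of Eq. (20)."""
    deg = np.degrees(phi)
    if deg < 15:  return 1.0, 1.0        # passband
    if deg < 25:  return 0.0, 0.0        # transition band (unweighted)
    return 0.0, 1.0                      # stopband

# Rectangle-rule integration of Q_LS (Eq. 3) and a (Eq. 5); D is real here.
Q = np.zeros((N * L, N * L))
a = np.zeros(N * L)
dw, dp = omegas[1] - omegas[0], phis[1] - phis[0]
for om in omegas:
    for phi in phis:
        D, F = weights(phi)
        if F == 0.0:
            continue
        g = steering(om, phi)
        Q += F * np.real(np.outer(g, g.conj())) * dw * dp
        a += F * D * np.real(g) * dw * dp

lam = 0.01                                           # regularization, Eq. (12)
w = np.linalg.solve(Q + lam * np.eye(N * L), a)      # Eq. (13)

# Average passband (endfire) vs. stopband (broadside) response magnitudes
Hpass = np.mean([abs(w @ steering(om, 0.0)) for om in omegas])
Hstop = np.mean([abs(w @ steering(om, np.pi / 2)) for om in omegas])
print(Hpass > Hstop)   # the endfire response should dominate
```

Even at this toy scale, the solved filters yield a mean endfire response exceeding the mean broadside response, mirroring what the full-scale design optimizes.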
2 OPTIMIZATION

In this section the optimization of the element distribution is discussed in detail. Before the optimization, the numerical computation error must be analyzed to ensure a valid optimization result. In scenarios where only a narrowband cost function with a small number of microphones is needed, the computational error is negligible and need not be analyzed. However, for an optimization over such a broad bandwidth for high-quality audio recording, the matrices in the cost function are large, and the regularization term and the numerical accuracy must be analyzed carefully to constrain the computational error. Among the optimization methods described in the Introduction, the simulated annealing (SA) method is chosen because of its fast convergence and relatively small computational burden compared to other optimization methods.

2.1 Constraining the Computational Error

Given an element distribution d, the cost function $J_{LS,opt}^{mean}(\mathbf{d})$ must be computed numerically to evaluate that distribution. The errors caused by the numerical integrations in the calculation of Q_mean(d) and a_mean(d) have a detrimental effect and can even lead to unreasonable optimization results. Thus the regularization factor λ and the accuracy of the integration must be tuned carefully to guarantee that the computational error is significantly smaller than the cost function. Each element in Q_mean(d) and a_mean(d), which are derived from Q_LS(d) and a(d), can be obtained by the double integrals formulated in Eq. (3). Note that a_o(d) equals zero for the parameter set in this paper. The integral over ω can be calculated analytically for the F(ω, φ) and D(ω, φ) set in Eq. (20) [19], as elaborated in the Appendix, and the integral over φ can be computed numerically using the composite Simpson's rule while keeping the error smaller than the integral error δ_I [22].

Usually a large number of parameters needs to be optimized for such a wideband optimization. For example, for an array of 16 microphones with a 200-tap FIR filter acting on each input, the matrix Q_mean is of size 3200 × 3200 and the vector a_mean is of size 3200 × 1. The cost function is sensitive to error accumulation, so both the regularization parameter λ and the integral error δ_I need to be chosen carefully. In this paper λ = 0.01 and δ_I = 1 × 10⁻¹⁴, and their efficacy is justified as follows. Denote by Δ(·) the error operator representing the absolute computational error caused by the integration. From Eq. (12) and Eq. (13), the total error can be estimated as

$$\varepsilon_{all} = \Delta\!\left(\mathbf{w}_{LS}^T \mathbf{Q}_{mean,R}\,\mathbf{w}_{LS} - 2\mathbf{w}_{LS}^T \mathbf{a}_{mean} + b_{LS}\right) \approx \Delta\!\left(\mathbf{a}_{mean}^T \mathbf{w}_{LS}\right), \tag{15}$$

where b_LS is left out because no error accumulates when calculating this term. Neglecting the high-order terms, Eq. (15) can be approximated by

$$\varepsilon_{all} \approx \Delta\!\left(\mathbf{a}_{mean}^T\right)\mathbf{w}_{LS} + \mathbf{a}_{mean}^T \mathbf{Q}_{mean,R}^{-1}\,\Delta\!\left(\mathbf{a}_{mean}\right) + \mathbf{a}_{mean}^T\, \Delta\!\left(\mathbf{Q}_{mean,R}^{-1}\right)\mathbf{a}_{mean}. \tag{16}$$

During the optimization procedure, the upper limit of the elements of abs(w) is set to 1, where abs(·) denotes the element-wise absolute value; distributions yielding elements of abs(w) larger than 1 are abandoned. Thus the first term on the right side of Eq. (16) is smaller than 3200 × δ_I = 3.2 × 10⁻¹¹ and can be neglected compared to the cost function values shown in Sec. 3. The observed maximum value in abs(Q_mean,R⁻¹) is always smaller than 50 in the simulations with the parameters set in this paper, so the second term is smaller than 3200 × 50 × 3200 × δ_I ≈ 5.1 × 10⁻⁶, which is significantly smaller than the cost function value. To evaluate the last term of Eq. (16), the error propagation rule for matrix inversion [23] needs
to be used, which is described as

$$\frac{\left\|\Delta\!\left(\mathbf{Q}_{mean,R}^{-1}\right)\right\|_2}{\left\|\mathbf{Q}_{mean,R}^{-1}\right\|_2} \le \kappa\!\left(\mathbf{Q}_{mean,R}\right)\frac{\left\|\Delta\!\left(\mathbf{Q}_{mean,R}\right)\right\|_2}{\left\|\mathbf{Q}_{mean,R}\right\|_2} + O\!\left(\left\|\Delta\!\left(\mathbf{Q}_{mean,R}\right)\right\|_2^2\right), \tag{17}$$

where κ denotes the condition number, $\kappa(\mathbf{Q}_{mean,R}) = \|\mathbf{Q}_{mean,R}^{-1}\|_2 \|\mathbf{Q}_{mean,R}\|_2$, ‖·‖₂ denotes the l₂ norm, and O(·) is the big-O notation. Neglecting the high-order term, the inversion error can be estimated as

$$\left\|\Delta\!\left(\mathbf{Q}_{mean,R}^{-1}\right)\right\|_2 \le \left\|\mathbf{Q}_{mean,R}^{-1}\right\|_2^2 \left\|\Delta\!\left(\mathbf{Q}_{mean,R}\right)\right\|_2. \tag{18}$$

The observed maximum value of $\|\mathbf{Q}_{mean,R}^{-1}\|_2$, which depends strongly on the regularization factor λ, is found to be smaller than 101 in the simulations. Since the l₂ norm of a matrix is always equal to or smaller than its Frobenius norm, the l₂ norm of $\Delta(\mathbf{Q}_{mean,R}^{-1})$ is estimated to be smaller than 101² × (3200 × 3200 × δ_I²)^0.5 ≈ 3.3 × 10⁻⁷. Given the parameters set in this paper, the maximum value in abs(a_mean) equals 0.58; hence the last term can be bounded as

$$\left|\mathbf{a}_{mean}^T\, \Delta\!\left(\mathbf{Q}_{mean,R}^{-1}\right)\mathbf{a}_{mean}\right| \le \left\|\mathbf{a}_{mean}\right\|_2^2 \left\|\Delta\!\left(\mathbf{Q}_{mean,R}^{-1}\right)\right\|_2 \le 3.5 \times 10^{-4}, \tag{19}$$
which corresponds to a reasonable tolerance compared to the cost function value. From the above analysis, it can be seen that with the denoted λ and δI , the error accumulation can be kept at a trivial level and the convergence of the optimization process can be guaranteed. Other settings of the parameters can also achieve a tolerable error level. As a general rule, if λ is too large, the directivity will deteriorate. However, if λ is too small, ||Q−1 mean,R ||2 will be large and δI must be small enough to constrain the numerical error, which requires considerably more time for the numerical calculation of integrals. Based on the simulations and experiments conducted in this paper, the current parameter setting can achieve a satisfactory directivity with a reasonable computation time. 2.2 Element Distribution Optimization SA is chosen for optimizing the element distribution because of its relatively small computational burden. SA is a heuristic technique for finding an approximation of the global minimum. The basic idea is to explore the solution space and accept better solutions as well as worse solutions with a slowly decaying probability. The optimization procedure using SA is summarized as follows, which is similar to that used in [16] with some minor revisions. (a) Initialize the temperature T = Tmax , the cooling coefficient α, the minimum distance between adjacent elements ε, the maximum position perturbation δ, the upper bound of the filter elements’ absolute values δ2 , and the number of iterations Nmax . (b) Generate an initial distribution d by fixing two elements at the two ends while distributing other element positions randomly and keeping the interJ. Audio Eng. Soc., Vol. 66, No. 3, 2018 March
Fig. 2. Flow chart of the element distribution optimization using the simulated annealing (SA) method.
vals between the adjacent elements greater than the threshold ε. (c) Generate a neighboring distribution dnew by perturbing the element positions in d in random order while assuring that every position perturbation is less than δ and the intervals between the adjacent elements are greater than the threshold ε. (d) Compare the cost function values of the old distrimean (d) with that of the new distribution bution JLS,Opt mean JLS,Opt (dnew ), and choose one according to the acceptance rule. If the maximum absolute value of any of the filter coefficients w exceeds the threshold δ2 , the new distribution will be abandoned. (e) Repeat (b)–(d) above until reaching the maximum number of iterations. T is a parameter controlling the initial probability of accepting worse solutions and α is a parameter controlling the decay of that accepting probability. A detailed flow chart for optimizing the element distribution is illustrated in Fig. 2, in which i is the looping index for the number of 157
ZHOU ET AL.
ENGINEERING REPORTS
Fig. 3. (a) Value of the cost function during optimization. (b) Optimized distribution of the array elements.
optimization, s is an array containing the randomly generated perturbation order, and j is the looping index for perturbing different microphones. The optimization parameters are set as follows: the length of the array L_max = 27.9 cm, the number of elements N = 16, the number of taps of the FIR filters L = 200, the sampling rate f_s = 44.1 kHz, the speed of sound c = 340 m/s, the regularization factor λ = 0.01, the minimum distance between adjacent elements ε = 0.008 m, the maximum position perturbation δ = 0.02 m, and the upper limit δ₂ of abs(w) equals 1. The frequency range is from 20 Hz to 16 kHz, outside of which F(ω, φ) equals 0. The weighted desired directivity is

$$\begin{cases} D(\omega, \varphi) = 1,\; F(\omega, \varphi) = 1, & 0^\circ < \varphi < 15^\circ \; (\text{passband}) \\ D(\omega, \varphi) = 0,\; F(\omega, \varphi) = 0, & 15^\circ < \varphi < 25^\circ \; (\text{transition band}) \\ D(\omega, \varphi) = 0,\; F(\omega, \varphi) = 1, & 25^\circ < \varphi < 180^\circ \; (\text{stopband}) \end{cases} \tag{20}$$

In the calculation, the initial temperature T_max is chosen to be 0.1, the cooling coefficient α equals 0.99, and the number of iterations N_max is 1000. The PDF of the gain f_a(a_n) is a Gaussian distribution with mean 1 and standard deviation 0.3, and the PDF of the phase is a Gaussian distribution with mean 0 and standard deviation 0.3 rad. The deviations of the microphone characteristics are set based on measurements of the MEMS microphones used for the array.

Two techniques are used to improve the optimization result and accelerate the computation. The first is that all elements in the steering vector g(ω, φ, d) are multiplied by exp(jLω/2), with L the filter length, to guarantee the causality of the array filters. The second is implementing the computationally heavy part of the procedure in
C language. Since there are a great number of elements in Q_LS, a, and a_o, the calculation of these elements is implemented in C and the values are returned to Matlab through a mex function. With these techniques, a single evaluation of the cost function takes about five minutes on an Intel i5-3210M processor.
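The SA procedure of steps (a)–(e) can be sketched as follows. Evaluating the true cost $J_{LS,opt}^{mean}(\mathbf{d})$ takes about five minutes per candidate, so this sketch substitutes a cheap surrogate cost (the peak stopband sidelobe of a single-frequency delay-and-sum endfire pattern, a hypothetical stand-in) and a reduced element count; the spacing threshold ε, perturbation bound δ, T_max, α, and N_max follow the values given in Sec. 2.2.

```python
import numpy as np

rng = np.random.default_rng(0)
Lmax, N = 0.279, 8            # aperture as in the paper; toy element count
eps, delta = 0.008, 0.02      # min spacing / max perturbation (Sec. 2.2)

def valid(d):
    """All gaps between sorted elements at least eps."""
    return np.all(np.diff(np.sort(d)) >= eps)

def cost(d):
    # Surrogate for J_mean_LS,opt of Eq. (14): peak sidelobe of a
    # single-frequency (8 kHz) delay-and-sum endfire pattern over the
    # stopband angles of Eq. (20).
    k = 2 * np.pi * 8000.0 / 340.0
    phis = np.linspace(np.radians(25), np.pi, 180)
    B = np.abs(np.exp(1j * k * np.outer(np.cos(phis) - 1.0, d)).sum(axis=1))
    return B.max() / len(d)

# Step (b): random initial layout with the two end elements fixed
while True:
    d = np.sort(np.concatenate(([0.0, Lmax], rng.uniform(0, Lmax, N - 2))))
    if valid(d):
        break

T, alpha = 0.1, 0.99                      # T_max and cooling coefficient
J = cost(d)
for _ in range(1000):                     # N_max iterations
    for j in rng.permutation(np.arange(1, N - 1)):   # step (c): random order
        dn = d.copy()
        dn[j] += rng.uniform(-delta, delta)
        if not (0 < dn[j] < Lmax and valid(dn)):
            continue
        Jn = cost(dn)
        # Step (d): Metropolis rule accepts worse layouts with decaying prob.
        if Jn < J or rng.random() < np.exp((J - Jn) / T):
            d, J = dn, Jn
    T *= alpha

print(J)   # peak stopband sidelobe level after annealing
```

The filter-norm check of step (d) (abandoning layouts whose abs(w) exceeds δ₂) applies only to the full WLS cost and is therefore omitted from this surrogate sketch.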
3 SIMULATIONS

3.1 Element Distribution Analysis

Fig. 3(a) shows the value of the cost function as a function of the iteration number for an array of 16 microphones, and the optimized array distribution is shown in Fig. 3(b). The optimization process converges well, and the optimized element positions show a denser distribution in the middle of the array and a sparser distribution at the two ends. The frequency responses of the filters are shown in Fig. 4, where the filters for the sparsely distributed elements at the two ends have a low-pass tendency. This is reasonable because the spatial aliasing problem can be alleviated by assigning relatively smaller weights to the sparsely distributed elements at high frequencies. For the densely distributed elements in the middle, the weights are relatively small at low frequencies so that the system is more robust to low-frequency sound field fluctuations.

To validate the effectiveness of the optimized distribution, another simulation was conducted with an array of 16 equally spaced elements, and the beampatterns of the two arrays are compared in Fig. 5. Grating lobes above the spatial Nyquist frequency of the equally spaced array (9.1 kHz) can be clearly observed. This
Fig. 4. Frequency responses of the FIR filters for the microphones. (a) Filters for microphones indexed 0–3. (b) Filters for microphones indexed 4–7. (c) Filters for microphones indexed 8–11. (d) Filters for microphones indexed 12–15.
severely deteriorates the noise shielding capability of the array.
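The quoted 9.1 kHz onset for the equally spaced array follows directly from the spatial Nyquist criterion f = c / (2Δd):

```python
# 16 elements spread evenly over the 27.9 cm aperture give 15 gaps of
# 1.86 cm each; grating lobes appear above f = c / (2 * spacing).
c, L_max, N = 340.0, 0.279, 16
spacing = L_max / (N - 1)                        # 0.0186 m
f_nyq = c / (2 * spacing)
print(round(spacing * 100, 2), round(f_nyq))     # → 1.86 9140
```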
3.2 Performance with Different Numbers of Microphones

Optimizations with different numbers of microphones were also carried out with the parameters described in Sec. 2. The obtained optimal distributions, beampatterns, and cost function values are shown in Table 1 with the same color bar as used in Fig. 5. The performance of the array clearly improves as the number of microphone elements increases. It can also be seen that all optimizations yield denser distributions in the middle and sparser distributions at the two ends, consistent with the discussion in Sec. 3.1.

To further quantify the comparison, the directivity index (DI) is used, which measures the directivity improvement of a directional recording device over an omnidirectional microphone in a spherically isotropic sound field [21]. The discretized DI can be approximated as

$$DI(\omega) = 20\log_{10}\!\left(\frac{4\pi\, H(\omega, 0, \mathbf{d}_{opt})}{\int_{\theta=0}^{2\pi}\!\int_{\varphi=0}^{\pi} H(\omega, \varphi, \mathbf{d}_{opt})\sin(\varphi)\, d\varphi\, d\theta}\right) \approx 20\log_{10}\!\left(\frac{2M\, H(\omega, 0, \mathbf{d}_{opt})}{\pi \sum_{m=0}^{M-1} H\!\left(\omega, \frac{m}{M}\pi, \mathbf{d}_{opt}\right)\sin\!\left(\frac{m}{M}\pi\right)}\right), \tag{21}$$

where H(ω, φ, d_opt) is the beampattern of the optimized array with distribution d_opt and M is the number of measured angles from 0 to π. The DIs of all the optimized arrays, together with that measured for a shotgun microphone (Audio-Technica AT897), are shown in Table 2. Compared with the shotgun microphone, significantly better directivity is achieved above 250 Hz by the optimized arrays with more than 10 elements. For frequencies below 250 Hz, both the
Fig. 5. Beampatterns of the arrays with (a) the optimized element distribution of Fig. 3 and (b) equally spaced elements.
Table 1. The cost function values, configurations, and beampatterns of arrays with different numbers of elements. (The element-distribution diagrams and beampatterns of the original table are graphical and are not reproduced here.)

Number of elements     6      8      10     12     14     16
Cost function value    0.323  0.256  0.207  0.168  0.134  0.104
Table 2. Directivity index of microphone arrays with different numbers of microphones (dB)

Number of        Frequency (kHz)
elements     0.25   0.5    1      2      4      8      16
6            6.7    8.8    8.8    7.9    9.1    10.5   7.7
8            6.8    9.0    10.6   10.6   11.2   11.0   9.4
10           6.8    9.0    10.5   10.7   11.4   12.5   12.1
12           6.8    9.2    10.7   11.5   13.3   14.2   12.5
14           6.9    9.2    11.0   12.8   13.3   14.1   13.4
16           7.0    9.3    11.2   13.3   14.6   14.4   13.8
AT897        7.1    7.4    7.9    8.8    9.8    10.3   12.3
microphone array and the shotgun microphone show similar supercardioid directivity patterns.
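The discretized DI of Eq. (21) is straightforward to implement. The sketch below follows the paper's formula literally (the beampattern H enters directly, with a 20 log10 scaling) and sanity-checks it against two analytically simple patterns: an omnidirectional pattern, which must give 0 dB, and a cardioid, which evaluates to about 6.0 dB under this definition.

```python
import numpy as np

def directivity_index(H, M=360):
    """Discretized DI of Eq. (21). H maps angles phi in [0, pi] (array)
    to beampattern values; the look direction is phi = 0."""
    phis = np.arange(M) * np.pi / M
    denom = (np.pi / (2 * M)) * np.sum(H(phis) * np.sin(phis))
    return 20 * np.log10(H(np.array([0.0]))[0] / denom)

# Omnidirectional pattern: DI = 0 dB. Cardioid (1 + cos(phi)) / 2: ~6.02 dB.
print(round(directivity_index(lambda p: np.ones_like(p)), 2))      # → 0.0
print(round(directivity_index(lambda p: (1 + np.cos(p)) / 2), 2))  # → 6.02
```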
4 EXPERIMENTS

An optimized 16-element array, as described in Sec. 3.1, was built, and its directivity, equivalent noise level, and impulse response were compared with those of a typical shotgun microphone. The array was constructed with MEMS microphones, which have the merits of miniature size and well-matched frequency responses. The signal-to-noise ratio (SNR) of the MEMS element used in the array is 66 dBA. The tests of the optimized array and the shotgun microphone were carried out in an anechoic chamber, as shown in Fig. 6. In the experiments, a B&K PULSE Multi-Analyzer System (Type 3560D) was used to generate audio signals and record the microphones' output signals, and a B&K turntable (Type 9640) was used to adjust the relative angle between the tested device and the sound source.
Fig. 6. (a) The optimized array and the AT897 shotgun microphone. (b) The microphone array during test.
The frequency responses of both the microphone array and the shotgun microphone were calibrated to guarantee a flat amplitude response. For the microphone array, the mean frequency response of the MEMS microphones and the beamformer's frequency response were both used to compensate the array response. The shotgun microphone's frequency response was likewise calibrated using its measured frequency response.
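The report does not specify the algorithm behind this calibration; one common approach consistent with the description is a regularized spectral-division inverse filter, sketched below on a toy "measured" response (the response h and the regularization constant beta are hypothetical choices).

```python
import numpy as np

# Flatten a measured magnitude response by regularized spectral division.
fs, nfft = 44100, 1024
h = np.zeros(nfft)
h[0], h[4] = 1.0, 0.3                      # toy measured impulse response
H = np.fft.rfft(h)
beta = 1e-3                                 # regularization (hypothetical)
Hinv = np.conj(H) / (np.abs(H) ** 2 + beta) # regularized inverse spectrum
g = np.fft.irfft(Hinv)                      # compensation filter
flat = np.abs(np.fft.rfft(np.convolve(h, g)[:nfft]))
print(flat.std() < 0.05)                    # compensated magnitude is ~flat
```

The regularization keeps the inverse bounded where the measured response has dips, at the cost of a small residual ripple.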
4.1 Directivity Test

The beampatterns measured with pure-tone signals are shown in Fig. 7. There is good agreement between the theoretical and experimental results even though the microphones were not pre-adjusted by calibration to have the same amplitude and phase responses. This is mainly attributed to the good consistency of the microphones and the robustness of the beamformer. Compared with the shotgun microphone, the microphone array has a distinct advantage at frequencies from 500 Hz to 8 kHz, which agrees well with the DI shown in Table 2. It can also be seen that the optimized distribution effectively suppresses the grating lobes up to 16 kHz.

The test results using broadband signals are shown in Fig. 8, which contains the measured beampatterns and the waveforms recorded from a speech source in the presence of a white noise source. The speech source was located at the 0-degree direction of the tested device (φ = 0 in Fig. 1), and the noise source at a 60-degree incident direction. The optimized array clearly has a narrower mainlobe, greater attenuation in the stopband, and superior noise shielding performance.
4.2 Equivalent Self-Noise Level Test

Aside from directivity, the equivalent noise level is another crucial indicator when a device is used for high-quality audio recording. The equivalent self-noise level of the array, as for a single microphone, is expressed as an equivalent acoustic noise level stated in dBA [2] and is defined as

$$L_n = 20\log_{10}\frac{V_n}{M P_0}, \tag{22}$$

where V_n is the square root of the variance of the output voltage, M is the sensitivity of the microphone, and P_0 is the reference sound pressure. The sensitivities of the array and the shotgun microphone were measured in the anechoic chamber at 1 kHz; the results are −41.6 dB (V/Pa) for the shotgun microphone and −38.1 dB (V/Pa) for the microphone array. The frequency responses of the microphone array and the shotgun microphone were then calibrated as stated at the beginning of Sec. 4. The calibration is necessary because different frequency responses lead to different estimates of V_n. Finally, their self-noise signals were recorded and transformed to the frequency domain, and the A-weighted equivalent noise levels are shown in Fig. 9(a). The optimized array has a similarly low noise level (19.6 dBA) to that of the shotgun microphone (19.1 dBA).

The white noise gain (WNG) [21], which indicates the array's ability to suppress spatially uncorrelated noise, is calculated and shown in Fig. 9(b). In agreement with basic microphone array theory [11], a higher WNG is more easily achieved at high frequencies, so the self-noise of the optimized array has a clear low-pass character, very different from that of the shotgun microphone.

4.3 Impulse Response and Frequency Response

Impulse response and frequency response are also crucial factors when analyzing high-quality recording devices. The impulse responses of the array and the shotgun microphone were measured in the anechoic chamber and are shown in Fig. 10, and their frequency responses are shown in Fig. 11. Both devices have very short impulse responses, and the shotgun microphone has a slightly better transient
Fig. 7. Beampatterns tested using pure tone signals in an anechoic chamber. (a) The microphone array. (b) AT897.
response, which is hardly noticeable in recordings. After calibration, both the shotgun microphone and the microphone array have flat magnitude responses, and both are approximately linear-phase systems in their passbands. The shotgun microphone has a broader bandwidth than the microphone array above 16 kHz, a range that is generally inaudible to the human auditory system.
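As a numerical check on Eq. (22): with the array's measured sensitivity of −38.1 dB (V/Pa), a noise-floor voltage of about 2.37 µV RMS corresponds to the reported level near 19.6 dB (the voltage is a back-calculated, hypothetical figure, and the A-weighting applied in the paper is omitted here).

```python
import math

def self_noise_level(v_rms, sens_db_v_per_pa):
    """Equivalent self-noise level of Eq. (22), in dB re 20 uPa
    (unweighted; the paper reports A-weighted values)."""
    M = 10 ** (sens_db_v_per_pa / 20)   # sensitivity in V/Pa
    P0 = 20e-6                          # reference sound pressure, Pa
    return 20 * math.log10(v_rms / (M * P0))

print(round(self_noise_level(2.37e-6, -38.1), 1))   # → 19.6
```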
5 CONCLUSION

In this engineering report a general procedure for designing wideband microphone arrays for high-quality recording is proposed. The wide bandwidth, from 20 Hz to 16 kHz, leads to very large matrices during the optimization. To circumvent the influence of the considerable accumulated error this causes, the optimization process is
Fig. 8. Beampatterns tested using broadband signals in an anechoic chamber. (a) The microphone array. (b) AT897. Speech signal waveforms recorded in the presence of a white noise source. (c) The microphone array. (d) AT897.
Fig. 9. (a) Equivalent noise levels of the optimized array and the shotgun microphone. (b) WNG of the microphone array.
Fig. 10. Impulse responses of the (a) shotgun microphone and (b) microphone array.
Fig. 11. Frequency responses of the (a) shotgun microphone and (b) microphone array.
carefully regularized and a relatively small numerical error tolerance is set. The performance and distribution patterns of arrays with different numbers of elements optimized with this procedure are analyzed, and experimental results for a 16-element array designed with the proposed method are presented. The results show that the proposed method can produce microphone arrays with higher directivity than a shotgun microphone of the same length, with a comparably low self-noise level. Considering that the hardware cost of microphone arrays has decreased considerably in the last few years, the proposed system is expected to be a competitive choice for high-quality audio recording.

6 ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (Grants No. 11374156 and No. 11474163).

7 REFERENCES

[1] M. R. Bai and Y. Y. Lo, "Refined Acoustic Modeling and Analysis of Shotgun Microphones," J. Acoust. Soc. Amer., vol. 133, no. 4, pp. 2036–2045 (2013 Apr.).
[2] J. Eargle, The Microphone Handbook, 2nd ed. (Focal Press, Oxford, 2005).
[3] Y. Sasaki, T. Nishiguchi, et al., "Development of Shotgun Microphone with Extra-Long Leaky Acoustic Tube," presented at the 141st Convention of the Audio Engineering Society (2016 Sept.), convention paper 9639.
[4] E. Hulsebos, D. de Vries, and E. Bourdillat, "Improved Microphone Array Configurations for Auralization of Sound Fields by Wave-Field Synthesis," J. Audio Eng. Soc., vol. 50, pp. 779–790 (2002 Oct.).
[5] W. Soede, A. J. Berkhout, and F. A. Bilsen, "Development of a Directional Hearing Instrument Based on Array Technology," J. Acoust. Soc. Amer., vol. 94, no. 2, pp. 785–798 (1993 Aug.).
[6] K. Kumatani, J. W. McDonough, and B. Raj, "Microphone Array Processing for Distant Speech Recognition: From Close-Talking Microphones to Far-Field Sensors," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 127–140 (2012 Nov.).
[7] J. Merimaa, "Applications of a 3-D Microphone Array," presented at the 112th Convention of the Audio Engineering Society (2002 Apr.), convention paper 5501.
[8] M. Stolbov and S. Aleinik, "Speech Enhancement with Microphone Array Using Frequency-Domain Alignment Technique," presented at the AES 54th International Conference: Audio Forensics (2014 Jun.), conference paper 5-1.
[9] S. Goetze, K. D. Kammeyer, and V. Mildner, "A Psychoacoustic Noise Reduction Approach for Stereo Hands-Free Systems," presented at the 120th Convention of the Audio Engineering Society (2006 May), convention paper 6759.
[10] J. Meyer and G. W. Elko, "A Highly Scalable Spherical Microphone Array Based on an Orthonormal Decomposition of the Soundfield," Proc. ICASSP, vol. II, pp. 1781–1784 (2002).
[11] E. Weinstein, K. Steele, et al., "A 1020-Node Modular Microphone Array and Beamformer for Intelligent Computing Spaces," MIT, MIT/LCS Technical Memo MIT-LCS-TM-642 (2004).
[12] J. Benesty, J. Chen, and Y. Huang, Microphone Array Signal Processing (Springer-Verlag, Berlin, 2008).
[13] H. Unz, "Linear Arrays with Arbitrarily Distributed Elements," IEEE Trans. Antennas Propag., vol. 8, no. 2, pp. 222–223 (1960 Mar.).
[14] C. Kumar, S. S. Rao, and A. Hoorfar, "Optimization of Thinned Phased Arrays Using Evolutionary Programming," Proceedings of the 7th Int. Conf. on Evolutionary Programming, pp. 157–166 (1998 Mar.).
[15] K. Chen, X. Yun, Z. He, and C. Han, "Synthesis of Sparse Planar Array Using Modified Real Genetic Algorithm," IEEE Trans. Antennas Propag., vol. 55, no. 4 (2007 Apr.).
[16] M. Crocco and A. Trucco, "Stochastic and Analytic Optimization of Sparse Aperiodic Arrays and Broadband Beamformers with Robust Superdirective Patterns," IEEE Trans. Audio Speech Language Process., vol. 20, no. 9, pp. 2433–2447 (2012 Nov.).
[17] L. Carin, "On the Relationship between Compressive Sensing and Random Sensor Arrays," IEEE Antennas Propag. Mag., vol. 51, no. 5, pp. 72–81 (2009 Oct.).
[18] M. Van der Wal, E. W. Start, and D. de Vries, "Design of Logarithmically Spaced Constant-Directivity Transducer Arrays," J. Audio Eng. Soc., vol. 44, pp. 497–507 (1996 Jun.).
[19] S. Doclo, Multi-Microphone Noise Reduction and Dereverberation Techniques for Speech Applications, Ph.D. thesis, University of Leuven, Leuven, Belgium (2003).
[20] H. Cox, R. Zeskind, and T. Kooij, "Practical Supergain," IEEE Trans. Acoust. Speech Signal Process., vol. ASSP-34, pp. 393–398 (1986 Jul.).
[21] M. S. Brandstein and D. B. Ward, Microphone Arrays: Signal Processing Techniques and Applications (Springer-Verlag, Berlin, 2001).
[22] E. Süli and D. F. Mayers, An Introduction to Numerical Analysis (Cambridge University Press, Cambridge, 2003).
[23] J. Demmel, "The Componentwise Distance to the Nearest Singular Matrix," SIAM J. Matrix Anal. Appl., vol. 13, pp. 10–19 (1992 Jan.).

APPENDIX

In this section the calculation of the integrals in Eq. (3), Eq. (5), and Eq. (10) is briefly described. Given the values of D(ω, φ) and F(ω, φ) chosen in the paper, the integrals in those equations can generally be expressed in the form

I = \int_{\theta_1}^{\theta_2} \int_{\omega_1}^{\omega_2} \cos[\omega(\alpha_A + \beta_A \cos\theta) + \gamma_A] \, d\omega \, d\theta.   (A.1)

Case 1: \beta_A = 0, \alpha_A \neq 0.

I = (\theta_2 - \theta_1) \int_{\omega_1}^{\omega_2} \cos(\omega \alpha_A + \gamma_A) \, d\omega
  = (\theta_2 - \theta_1) \, \frac{\sin(\omega_2 \alpha_A + \gamma_A) - \sin(\omega_1 \alpha_A + \gamma_A)}{\alpha_A}.   (A.2)
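The Case 1 closed form of Eq. (A.2) is easy to sanity-check against a direct numerical evaluation of Eq. (A.1). The sketch below does this with composite Simpson's rule; the parameter values are illustrative assumptions, not values taken from the paper.

```python
import math

# Illustrative parameters (assumptions): beta_A = 0 puts us in Case 1.
alpha_A, gamma_A = 0.7, 0.3
theta1, theta2 = 0.0, math.pi / 3
omega1, omega2 = 2 * math.pi * 20, 2 * math.pi * 100

# Direct numerical evaluation of Eq. (A.1): with beta_A = 0 the integrand
# does not depend on theta, so the theta integral contributes a factor
# (theta2 - theta1) and only the omega integral needs quadrature.
n = 100_000  # even number of Simpson subintervals
h = (omega2 - omega1) / n
s = sum((1 if k in (0, n) else 4 if k % 2 else 2)
        * math.cos((omega1 + k * h) * alpha_A + gamma_A)
        for k in range(n + 1))
numeric = (theta2 - theta1) * h * s / 3

# Closed form, Eq. (A.2)
closed = (theta2 - theta1) * (math.sin(omega2 * alpha_A + gamma_A)
                              - math.sin(omega1 * alpha_A + gamma_A)) / alpha_A

print(f"difference: {abs(numeric - closed):.2e}")
```

The two values agree to within the Simpson discretization error, which for this smooth integrand is far below audio-relevant precision.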
Case 2: \beta_A = 0, \alpha_A = 0.

I = \cos\gamma_A \, (\theta_2 - \theta_1)(\omega_2 - \omega_1).   (A.3)

For the other cases,

I = \int_{\theta_1}^{\theta_2} \frac{\sin(\omega_2(\alpha_A + \beta_A\cos\theta) + \gamma_A)}{\omega_2(\alpha_A + \beta_A\cos\theta)} \, d\theta - \int_{\theta_1}^{\theta_2} \frac{\sin(\omega_1(\alpha_A + \beta_A\cos\theta) + \gamma_A)}{\omega_1(\alpha_A + \beta_A\cos\theta)} \, d\theta.   (A.4)

So the problem becomes solving

f(\omega, \theta) = \frac{\sin(\omega(\alpha_A + \beta_A\cos\theta) + \gamma_A)}{\omega(\alpha_A + \beta_A\cos\theta)},   (A.5)

I_\theta(\omega) = \int_{\theta_1}^{\theta_2} f(\omega, \theta) \, d\theta.   (A.6)

Case 3: \beta_A \neq 0, \beta_A \neq \alpha_A, and there exists \theta_n \in [\theta_1, \theta_2] yielding \alpha_A + \beta_A\cos\theta_n = 0. In this case the integration can be decomposed into several independent parts:

I_\theta(\omega) = \int_{\theta_1}^{\theta_n - \varepsilon_A} f(\omega,\theta)\,d\theta + \int_{\theta_n - \varepsilon_A}^{\theta_n + \varepsilon_A} f(\omega,\theta)\,d\theta + \int_{\theta_n + \varepsilon_A}^{\theta_2} f(\omega,\theta)\,d\theta, \quad \theta_n \in (\theta_1, \theta_2),
I_\theta(\omega) = \int_{\theta_n}^{\theta_n + \varepsilon_A} f(\omega,\theta)\,d\theta + \int_{\theta_n + \varepsilon_A}^{\theta_2} f(\omega,\theta)\,d\theta, \quad \theta_n = \theta_1,
I_\theta(\omega) = \int_{\theta_1}^{\theta_n - \varepsilon_A} f(\omega,\theta)\,d\theta + \int_{\theta_n - \varepsilon_A}^{\theta_n} f(\omega,\theta)\,d\theta, \quad \theta_n = \theta_2.   (A.7)

The last two forms of Eq. (A.7) are similar to the first one, so only the first one is discussed below. The first and last parts of the integration can be computed numerically without problem. By expanding f(ω, θ) around θ_n, the second part approximately equals [19]

I_{\theta,2} \approx \varepsilon_A \left( 2\omega\cos\gamma_A + \frac{\alpha_A \sin\gamma_A}{\alpha_A^2 - \beta_A^2} \right).   (A.8)

In this paper \varepsilon_A equals 10^{-20}, and the resulting error is negligible compared to the integration error.

Case 4: \beta_A = \alpha_A \neq 0. In this case the singularity is \theta_n = \pi, and the integration is divergent [19]. However, \beta_A = \alpha_A \neq 0 is very unlikely to happen, because \alpha_A is always an integer and

\beta_A = \frac{d}{c} f_s,   (A.9)

where d is a variable related to the microphone intervals.

Default case: the integration of f(ω, θ) over θ can be accomplished numerically without problem.
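The Case 3 decomposition of Eq. (A.7) can be sketched in code: the two regular sub-integrals are evaluated by ordinary quadrature, and the small interval around the singular point θ_n is replaced by the approximation of Eq. (A.8) as transcribed above. All parameter values here are illustrative assumptions, and the Simpson rule stands in for whatever quadrature the authors actually used.

```python
import math

# Illustrative parameters (assumptions): beta_A != 0, beta_A != alpha_A,
# and |alpha_A| < |beta_A| so a singular point theta_n exists in [0, pi].
alpha_A, beta_A, gamma_A = 1.0, 2.0, 0.4
omega = 2 * math.pi * 50
theta1, theta2 = 0.0, math.pi
eps = 1e-6  # excision half-width epsilon_A around the singularity

def f(theta):
    # Integrand of Eq. (A.5); singular where alpha_A + beta_A*cos(theta) = 0.
    x = alpha_A + beta_A * math.cos(theta)
    return math.sin(omega * x + gamma_A) / (omega * x)

def simpson(g, a, b, n=20_000):
    # Composite Simpson's rule on [a, b] with an even number of subintervals.
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, n))
    return h * s / 3

# Singular point: alpha_A + beta_A*cos(theta_n) = 0.
theta_n = math.acos(-alpha_A / beta_A)

# Eq. (A.7), first form: two regular parts plus the small-interval
# approximation of Eq. (A.8) around theta_n.
I_left = simpson(f, theta1, theta_n - eps)
I_right = simpson(f, theta_n + eps, theta2)
I_mid = eps * (2 * omega * math.cos(gamma_A)
               + alpha_A * math.sin(gamma_A) / (alpha_A**2 - beta_A**2))
I_theta = I_left + I_mid + I_right
print(I_theta)
```

Because the excised interval is tiny, the quadrature never evaluates f at the singular point, and the total remains finite even though f itself blows up at θ_n.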
THE AUTHORS

Haoran Zhou is a master's candidate at the Institute of Acoustics, Nanjing University. He received his B.S. degree in physics from Nanjing University in 2015. His main research interests include microphone array signal processing and speech signal processing.

Jing Lu received the B.S. degree from the Electronic Science and Technology Department in 1999 and the Ph.D. degree from the Institute of Acoustics in 2004, both from Nanjing University. He is currently the Deputy Head of the Department of Acoustical Science and Engineering. His main research interests include loudspeaker and microphone arrays, active noise control, speech enhancement, and DSP implementations of acoustical signal processing algorithms. He is a senior member of the Chinese Institute of Electronics and a fellow of the Chinese Institute of Acoustics.

Xiaojun Qiu received his Bachelor and Master degrees from Peking University in 1989 and 1992, and his Ph.D. from Nanjing University in 1995, all majoring in acoustics. He worked at the University of Adelaide, Australia, as a Research Fellow in the field of active noise control from 1997 to 2002, and has been with the Institute of Acoustics of Nanjing University as a professor of acoustics and signal processing since 2002. He visited the Institute of Technical Acoustics (RWTH Aachen), Germany, as a Humboldt Research Fellow in 2008, working in the field of sound field reproduction. He worked at RMIT University as a Professor of Design in audio engineering from 2013 to 2016, and is now a Professor of audio, acoustics, and vibration at the University of Technology Sydney. His main research areas include noise and vibration control, room acoustics, electro-acoustics, and audio signal processing, particularly applications of active control technologies. He is a member of the Audio Engineering Society and an elected director of the International Institute of Acoustics and Vibration, and serves as an Associate Technical Editor for the Journal of the Audio Engineering Society. He has published 2 books and 5 book chapters and more than 400 technical papers, has been the principal investigator for numerous projects, and has applied for more than 90 patents on audio acoustics and audio signal processing.