Dimensionality Reduction in HRTF by using Multiway Array Analysis

Martin Rothbucher, Hao Shen and Klaus Diepold
Abstract In a human centered robotic system, it is important to provide the robotic platform with multimodal human-like sensing, e.g. haptics, vision and audition, in order to improve interactions between the human and the robot. Recently, techniques based on Head Related Transfer Functions (HRTFs) have become a promising methodology for robotic binaural hearing, which is one of the most prominent concepts in human-robot communication. In complex and dynamical applications, due to their high dimensionality, it is inefficient to utilize the original HRTFs. To cope with this difficulty, Principal Component Analysis (PCA) has been successfully used to reduce the dimensionality of HRTF datasets. However, it generally requires a vectorization of the original dataset, which is a three-way array, and consequently might cause a loss of structure information of the dataset. In this paper we apply two multiway array analysis methods, namely the Generalized Low Rank Approximations of Matrices (GLRAM) and the Tensor Singular Value Decomposition (Tensor-SVD), to dimensionality reduction in HRTF-based applications. Our experimental results indicate that an optimized GLRAM significantly outperforms PCA and performs nearly as well as Tensor-SVD with less computational complexity.
1 Introduction

Head Related Transfer Functions (HRTFs) describe the spectral changes of sound waves as they enter the ear canal, caused by the diffraction and reflection properties of the human body, i.e. the head, shoulders, torso and ears. In far-field applications, they can be considered as complicated functions of frequency and two spatial variables (elevation and azimuth) [2]. Thus HRTFs can be considered as direction dependent
Institute for Data Processing, Technische Universität München, 80290 München, Germany. e-mail: {martin.rothbucher,hao.shen,kldi}@tum.de
Preprint: appeared at the Proceedings of Human Centered Robot Systems, 2009
filters. Since the geometric features of the head, shoulders, torso and ears differ from person to person, HRTFs are unique for each individual.

The human cues of sound localization can be used in telepresence applications, where humans control remote robots (e.g. for dangerous tasks). The sound localized by the robot is then synthesized at the human's site in order to reconstruct the auditory space around the robot. Robotic platforms would benefit from a human-based sound localization approach because of its noise tolerance and its ability to localize sounds in a three-dimensional environment with only two microphones, as well as to reproduce 3D sound with simple headphones. Consequently, it is of great use to improve the performance of sound localization and synthesis for telepresence systems.

Recently, researchers have invested great effort in the customization of HRTFs [4], which leads to better quality of sound synthesis for individual listeners [12]. However, for the purpose of HRTF customization, a large number of Head Related Impulse Response (HRIR) datasets of various test subjects is usually required. Therefore, in order to make a customization application more efficient, even on a computationally restricted system, e.g. a telepresence system or a mobile phone, dimensionality reduction methods can be used.

Since the pioneering paper [6], Principal Component Analysis (PCA) has become a prominent tool for HRTF reduction [8] and customization [4]. In general, applying PCA to HRTF datasets requires a vectorization of the original 3D dataset. As a consequence, some structure information of the HRTF dataset might be lost. To avoid this limitation, the so-called Tensor-SVD method, which was originally introduced in the community of multiway array analysis [7], has recently been applied to HRTF analysis, in particular for customization [3].
Meanwhile, in the community of image processing, the so-called Generalized Low Rank Approximations of Matrices (GLRAM) [13], a generalized form of 2DPCA, has been developed in competition with standard PCA. It has been shown that the GLRAM method is essentially a simple form of Tensor-SVD [10]. In this paper, we study both the GLRAM and Tensor-SVD methods for the purpose of dimensionality reduction of HRIR datasets and compare them with standard PCA.

The paper is organized as follows. Section 2 provides a brief introduction to the three dimensionality reduction methods, namely PCA, GLRAM and Tensor-SVD. In Section 3, the performance of the three methods is investigated in several numerical experiments. Finally, a conclusion is given in Section 4.
2 HRIR Reduction Techniques

In this section, we briefly describe three techniques of dimensionality reduction for HRIRs, namely PCA, GLRAM and Tensor-SVD. In general, each HRIR dataset can be represented as a three-way array $\mathcal{H} \in \mathbb{R}^{N_a \times N_e \times N_t}$, where the dimensions $N_a$ and $N_e$ are the spatial resolutions of azimuth and elevation, respectively, and $N_t$ is the time sample size. Using a Matlab-like notation, in this work we denote by $\mathcal{H}(i,j,k) \in \mathbb{R}$ the $(i,j,k)$-th entry of $\mathcal{H}$, by $\mathcal{H}(l,m,:) \in \mathbb{R}^{N_t}$ the vector of $\mathcal{H}$ with a fixed pair $(l,m)$, and by $\mathcal{H}(l,:,:) \in \mathbb{R}^{N_e \times N_t}$ the $l$-th slice (matrix) of $\mathcal{H}$ along the azimuth direction.
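For concreteness, this notation maps directly onto NumPy array slicing (0-based indices; the sizes below are those of the CIPIC database used later, and the variable names are ours):

```python
import numpy as np

# Toy HRIR array with the dimensions used in the text
Na, Ne, Nt = 25, 50, 200        # azimuth, elevation, time samples
H = np.zeros((Na, Ne, Nt))

entry = H[0, 1, 2]              # H(1, 2, 3): a single entry (1-based in the text)
hrir = H[3, 7, :]               # H(4, 8, :): one HRIR, a vector of length Nt
azimuth_slice = H[3, :, :]      # H(4, :, :): the 4th Ne x Nt slice along azimuth
```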
2.1 Principal Component Analysis

The dimensionality reduction of HRIRs by using PCA is described as follows. First of all, we construct the matrix
$$H := \left[\operatorname{vec}(\mathcal{H}(:,:,1)), \ldots, \operatorname{vec}(\mathcal{H}(:,:,N_t))\right]^\top \in \mathbb{R}^{N_t \times (N_a \cdot N_e)}, \qquad (1)$$
where the operator $\operatorname{vec}(\cdot)$ stacks a matrix into a vector. Let $H = [h_1, \ldots, h_{N_t}]^\top$, i.e. $h_i^\top$ is the $i$-th row of $H$. The mean of the rows of $H$ is then computed by
$$\mu = \frac{1}{N_t} \sum_{i=1}^{N_t} h_i. \qquad (2)$$
After centering each row of $H$, i.e. computing $\widehat{H} = [\widehat{h}_1, \ldots, \widehat{h}_{N_t}]^\top \in \mathbb{R}^{N_t \times (N_a \cdot N_e)}$ with $\widehat{h}_i = h_i - \mu$ for $i = 1, \ldots, N_t$, the covariance matrix of $\widehat{H}$ is computed as
$$C := \frac{1}{N_t} \widehat{H} \widehat{H}^\top. \qquad (3)$$
Now we compute the eigenvalue decomposition of $C$ and select the $q$ eigenvectors $\{x_1, \ldots, x_q\}$ corresponding to the $q$ largest eigenvalues. Then, denoting $X = [x_1, \ldots, x_q] \in \mathbb{R}^{N_t \times q}$, the HRIR dataset can be reduced by
$$\widetilde{H} = X^\top \widehat{H} \in \mathbb{R}^{q \times (N_a \cdot N_e)}. \qquad (4)$$
Note that the storage space for the reduced HRIR dataset depends on the value of $q$. Finally, to reconstruct the HRIR dataset, one simply computes
$$H_r = X \widetilde{H} + \mu \in \mathbb{R}^{N_t \times (N_a \cdot N_e)}, \qquad (5)$$
where $\mu$ is added to each row.
We refer to [5] for further discussions on PCA.
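The steps in Eqs. (1)-(5) can be sketched in a few lines of NumPy. This is our own illustrative implementation, not code from the paper, and the function names are ours:

```python
import numpy as np

def pca_reduce(H, q):
    """PCA reduction of an HRIR tensor H (Na x Ne x Nt), following Eqs. (1)-(5)."""
    Na, Ne, Nt = H.shape
    # Eq. (1): each row of Hmat is vec(H(:, :, k)),  Hmat in R^{Nt x (Na*Ne)}
    Hmat = H.reshape(Na * Ne, Nt, order="F").T
    mu = Hmat.mean(axis=0)                  # Eq. (2): mean of the rows
    Hc = Hmat - mu                          # centering
    C = Hc @ Hc.T / Nt                      # Eq. (3): Nt x Nt covariance matrix
    _, V = np.linalg.eigh(C)                # eigenvalues in ascending order
    X = V[:, ::-1][:, :q]                   # q leading eigenvectors, Nt x q
    Ht = X.T @ Hc                           # Eq. (4): reduced dataset, q x (Na*Ne)
    return X, Ht, mu

def pca_reconstruct(X, Ht, mu):
    """Eq. (5): restore the (approximate) HRIR matrix; mu is added to each row."""
    return X @ Ht + mu
```

With $q = N_t$ the reconstruction is exact; decreasing $q$ shrinks the storage at the cost of accuracy.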
2.2 Tensor-SVD of Three-way Arrays

Unlike the PCA algorithm, which vectorizes the HRIR dataset, Tensor-SVD keeps the structure of the original 3D dataset intact. Given an HRIR dataset $\mathcal{H} \in \mathbb{R}^{N_a \times N_e \times N_t}$, Tensor-SVD computes its best multilinear rank-$(r_a, r_e, r_t)$ approximation $\widehat{\mathcal{H}} \in \mathbb{R}^{N_a \times N_e \times N_t}$, where $N_a > r_a$, $N_e > r_e$ and $N_t > r_t$, by solving the following minimization problem
$$\min_{\widehat{\mathcal{H}} \in \mathbb{R}^{N_a \times N_e \times N_t}} \left\| \mathcal{H} - \widehat{\mathcal{H}} \right\|_F, \qquad (6)$$
where $\|\cdot\|_F$ denotes the Frobenius norm of tensors. The rank-$(r_a, r_e, r_t)$ tensor $\widehat{\mathcal{H}}$ can be decomposed as a trilinear multiplication of a rank-$(r_a, r_e, r_t)$ core tensor $\mathcal{C} \in \mathbb{R}^{r_a \times r_e \times r_t}$ with three full-rank matrices $X \in \mathbb{R}^{N_a \times r_a}$, $Y \in \mathbb{R}^{N_e \times r_e}$ and $Z \in \mathbb{R}^{N_t \times r_t}$, defined by
$$\widehat{\mathcal{H}} = (X, Y, Z) \cdot \mathcal{C}, \qquad (7)$$
where the $(i,j,k)$-th entry of $\widehat{\mathcal{H}}$ is computed by
$$\widehat{\mathcal{H}}(i,j,k) = \sum_{\alpha=1}^{r_a} \sum_{\beta=1}^{r_e} \sum_{\gamma=1}^{r_t} x_{i\alpha}\, y_{j\beta}\, z_{k\gamma}\, \mathcal{C}(\alpha, \beta, \gamma). \qquad (8)$$
Thus, without loss of generality, the minimization problem defined in (6) is equivalent to
$$\min_{X, Y, Z, \mathcal{C}} \left\| \mathcal{H} - (X, Y, Z) \cdot \mathcal{C} \right\|_F, \quad \text{s.t. } X^\top X = I_{r_a},\; Y^\top Y = I_{r_e},\; Z^\top Z = I_{r_t}. \qquad (9)$$
We refer to [9] for Tensor-SVD algorithms and further discussions.
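Problem (9) has no closed-form solution; one standard iterative solver is the higher-order orthogonal iteration (HOOI) discussed in [10], initialized with a truncated multilinear SVD [7]. The following NumPy sketch is our own illustration of that scheme (not the quasi-Newton methods of [9]):

```python
import numpy as np

def unfold(T, mode):
    """Mode-m matricization of a 3-way array."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def multiply(T, mats):
    """Trilinear multiplication (A0, A1, A2) . T, cf. Eqs. (7)-(8)."""
    for k, A in enumerate(mats):
        T = np.moveaxis(np.tensordot(A, T, axes=(1, k)), 0, k)
    return T

def hooi(H, ranks, n_iter=20):
    """Best rank-(ra, re, rt) approximation of H via HOOI, cf. Eq. (9)."""
    # Initialization (truncated HOSVD): leading left singular vectors of each unfolding
    U = [np.linalg.svd(unfold(H, m), full_matrices=False)[0][:, :r]
         for m, r in enumerate(ranks)]
    for _ in range(n_iter):
        for m in range(3):
            # Project H onto the subspaces of the other two modes ...
            mats = [np.eye(H.shape[k]) if k == m else U[k].T for k in range(3)]
            G = multiply(H, mats)
            # ... and update mode m with the leading left singular vectors
            U[m] = np.linalg.svd(unfold(G, m), full_matrices=False)[0][:, :ranks[m]]
    C = multiply(H, [u.T for u in U])   # core tensor, ra x re x rt
    return U, C
```

The approximation of Eq. (7) is then recovered as `multiply(C, U)`.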
2.3 Generalized Low Rank Approximations of Matrices

Similar to Tensor-SVD, GLRAM methods do not require destruction of a 3D tensor. Instead of compressing along all three directions as Tensor-SVD does, GLRAM methods work with two pre-selected directions of a 3D data array. Given an HRIR dataset $\mathcal{H} \in \mathbb{R}^{N_a \times N_e \times N_t}$, we assume to compress $\mathcal{H}$ in the first two directions. The task of GLRAM is then to approximate the slices (matrices) $\mathcal{H}(:,:,i)$, for $i = 1, \ldots, N_t$, of $\mathcal{H}$ along the third direction by a set of low rank matrices $\{X M_i Y^\top\} \subset \mathbb{R}^{N_a \times N_e}$, where the matrices $X \in \mathbb{R}^{N_a \times r_a}$ and $Y \in \mathbb{R}^{N_e \times r_e}$ are of full rank, and $\{M_i\} \subset \mathbb{R}^{r_a \times r_e}$ with $N_a > r_a$ and $N_e > r_e$. This can be formulated as the following optimization problem
$$\min_{X, Y, \{M_i\}_{i=1}^{N_t}} \sum_{i=1}^{N_t} \left\| \mathcal{H}(:,:,i) - X M_i Y^\top \right\|_F, \quad \text{s.t. } X^\top X = I_{r_a},\; Y^\top Y = I_{r_e}. \qquad (10)$$
Here, by abuse of notation, $\|\cdot\|_F$ denotes the Frobenius norm of matrices. Let us construct a 3D array $\mathcal{M} \in \mathbb{R}^{r_a \times r_e \times N_t}$ by assigning $\mathcal{M}(:,:,i) = M_i$ for $i = 1, \ldots, N_t$. The minimization problem defined in (10) can then be reformulated in a Tensor-SVD style, i.e.
$$\min_{X, Y, \mathcal{M}} \left\| \mathcal{H} - (X, Y, I_{N_t}) \cdot \mathcal{M} \right\|_F, \quad \text{s.t. } X^\top X = I_{r_a},\; Y^\top Y = I_{r_e}. \qquad (11)$$
We refer to [13] for more details on GLRAM algorithms.

Remark 1. GLRAM methods work on two pre-selected directions out of three. There are thus in total three different combinations of directions in which to implement GLRAM on an HRIR dataset. The performance of GLRAM in different directions might vary significantly. This issue is investigated and discussed in Section 3.2.1.
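Problem (10) is typically solved by alternating between $X$ and $Y$. The sketch below follows the alternating scheme of Ye [13] in spirit; it is our own simplified NumPy illustration, with a fixed number of iterations instead of a convergence test:

```python
import numpy as np

def glram(H, ra, re, n_iter=10):
    """GLRAM of the slices H(:, :, i), cf. Eq. (10): find X (Na x ra),
    Y (Ne x re) and the core slices M_i = X^T H(:, :, i) Y."""
    Na, Ne, Nt = H.shape
    Y = np.eye(Ne, re)                      # simple initialization
    for _ in range(n_iter):
        # Fix Y, update X: leading eigenvectors of sum_i A_i Y Y^T A_i^T
        SX = sum(H[:, :, i] @ Y @ Y.T @ H[:, :, i].T for i in range(Nt))
        X = np.linalg.eigh(SX)[1][:, ::-1][:, :ra]
        # Fix X, update Y: leading eigenvectors of sum_i A_i^T X X^T A_i
        SY = sum(H[:, :, i].T @ X @ X.T @ H[:, :, i] for i in range(Nt))
        Y = np.linalg.eigh(SY)[1][:, ::-1][:, :re]
    M = np.stack([X.T @ H[:, :, i] @ Y for i in range(Nt)], axis=2)
    return X, Y, M

def glram_reconstruct(X, Y, M):
    """Restore the approximation X M_i Y^T slice by slice, cf. Eq. (11)."""
    return np.einsum("ar,es,rst->aet", X, Y, M)
```

Note that only the two compressed modes require eigenvector updates; the third mode is left untouched, which is what makes GLRAM cheaper than a full Tensor-SVD.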
3 Numerical Simulations

In this section, we apply PCA, GLRAM and Tensor-SVD to an HRIR-based sound localization problem, in order to investigate the performance of these three methods for data reduction.
3.1 Experimental Settings

We use the CIPIC database [1] for our data reduction experiments. The database contains 45 HRIR tensors of individual subjects for both left and right ears, with a spatial resolution of 1250 points ($N_e = 50$ in elevation, $N_a = 25$ in azimuth) and a time sample size of $N_t = 200$. In our experiments, we use a convolution-based algorithm [11] for sound localization. Given two signals $S_L$ and $S_R$, representing the received left- and right-ear signals of a sound source at a particular location, correct localization is expected to be achieved by maximizing the cross-correlation between the signal $S_L$ filtered with the right-ear HRIRs and the signal $S_R$ filtered with the left-ear HRIRs.
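A minimal sketch of such a convolution-based matcher, under our own simplifying assumptions (single-block processing, exhaustive search over candidate directions; the array names `hrir_L`/`hrir_R` are ours and the details of [11] may differ):

```python
import numpy as np

def localize(SL, SR, hrir_L, hrir_R):
    """Pick the direction whose swapped-ear filtering maximizes the
    cross-correlation: SL is filtered with the right-ear HRIR and SR
    with the left-ear HRIR of each candidate direction d."""
    best_d, best_score = -1, -np.inf
    for d in range(hrir_L.shape[0]):
        a = np.convolve(SL, hrir_R[d])
        b = np.convolve(SR, hrir_L[d])
        score = np.max(np.correlate(a, b, mode="full"))
        if score > best_score:
            best_d, best_score = d, score
    return best_d
```

At the true direction, both filtered signals reduce to the source convolved with the same pair of HRIRs (convolution commutes), so their cross-correlation peaks.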
3.2 Experimental Results

In each experiment, we reduce the left- and right-ear KEMAR HRIR datasets (subject 21 in the CIPIC database) with one of the introduced reduction methods. By randomly selecting 35 locations in 3D space, a test signal, which is white noise in our experiments, is virtually synthesized using the corresponding original HRIRs. The convolution-based sound localization algorithm, fed with the restored databases, is then used to localize the signals. Finally, the localization success rate is computed.
Fig. 1 Contour plots of localization success rate using GLRAM in different settings: (a) GLRAM on (azimuth, elevation), (b) GLRAM on (azimuth, time), (c) GLRAM on (elevation, time).
3.2.1 GLRAM in HRIR Dimensionality Reduction

In this section, we investigate the performance of the GLRAM method for HRIR dataset reduction in the three different combinations of directions. Firstly, we reduce the HRIR datasets in the first two directions, i.e. azimuth and elevation. For a fixed pair of values $(N_{r_a}, N_{r_e})$, each experiment described above gives a localization success rate for the test signals. Then, for a given range of $(N_{r_a}, N_{r_e})$, the contour plot of the localization success rate with respect to the various pairs $(N_{r_a}, N_{r_e})$ is drawn in Fig. 1(a). The curves in Fig. 1(a) correspond to a set of fixed reduction rates. Similar results with respect to the pairs (azimuth, time) and (elevation, time) are plotted in Fig. 1(b) and Fig. 1(c), respectively.

According to Fig. 1(a), to reach a success rate of 90%, the maximal reduction rate of 80% is achieved at $r_a = 23$ and $r_e = 13$. We then summarize similar results from Fig. 1(b) and Fig. 1(c) for different success rates in Table 1.

Localization Success Rate      | 90% | 80% | 70%
GLRAM $(N_{r_a}, N_{r_e})$     | 80% | 81% | 82%
GLRAM $(N_{r_a}, N_{r_t})$     | 80% | 83% | 85%
GLRAM $(N_{r_e}, N_{r_t})$     | 93% | 93% | 94%

Table 1 Reduction rate achieved by using GLRAM at different localization success rates.

It is obvious that
applying GLRAM to the pair of directions (elevation, time) outperforms the other two combinations. A reduction in the azimuth direction, which has the smallest dimension, leads to a great loss of localization accuracy. This might indicate that the differences (Interaural Time Delays) present in the HRIRs between neighboring azimuth angles have a stronger influence on localization cues than the differences between neighboring elevation angles.
3.2.2 PCA and Tensor-SVD Reduced HRIRs

The previous section showed that applying GLRAM in the directions of elevation and time performs best; here we compare this optimal GLRAM with standard PCA and Tensor-SVD. First of all, the localization success rate of the test signals using PCA reduction, as a function of the number of eigenvectors, is shown in Fig. 2. It is known from Table 1 that, to reach a success rate of 90%, the optimal GLRAM achieves a maximal reduction rate of 93%, which is equivalent to PCA with 17 eigenvectors. As shown in Fig. 2, with 17 eigenvectors only a localization success rate of 9% is reached. On the other hand, to reach a success rate of 90%, PCA requires 37 eigenvectors, which gives a reduction rate of only 82%. It is thus clear that the optimal GLRAM outperforms PCA remarkably.

Since GLRAM is a simple form of Tensor-SVD that leaves one direction out, in our last experiment we investigate the effect of the third direction on the performance of data reduction. We fix the dimensions in elevation and time to $r_e = 15$ and $r_t = 55$, which are the dimensions of the optimal GLRAM achieving a 90% success rate, see Fig. 1(c). Fig. 3 shows the localization success rate of Tensor-SVD with a changing dimension in azimuth. Decreasing the dimension in azimuth leads to a consistently large loss of localization accuracy. In other words, with fixed reductions in the directions (elevation, time), GLRAM outperforms Tensor-SVD. Similar to the observations in the previous subsection, localization accuracy appears to be more sensitive to reduction in the azimuth direction than in the other two directions.
Fig. 2 Localization success rate by PCA
Fig. 3 Localization success rate by Tensor-SVD
4 Conclusion and Future Work

In this paper, we address the problem of dimensionality reduction of HRIR datasets using PCA, Tensor-SVD and GLRAM. Our experiments demonstrate that an optimized GLRAM outperforms the standard PCA reduction by a significant margin. Meanwhile, GLRAM is also competitive with Tensor-SVD, offering nearly equivalent performance at less computational complexity.

Applying GLRAM to HRIR data, two possible applications for future work are recommended. First, in order to accelerate the localization process on mobile robotic platforms, GLRAM methods can be used to shorten the HRTF filters. Secondly, GLRAM methods could also be beneficial in HRTF customization.

Acknowledgements This work was fully supported by the German Research Foundation (DFG) within the collaborative research center SFB453 "High-Fidelity Telepresence and Teleaction".
References

1. V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano. The CIPIC HRTF database. In IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, pages 21–24, New Paltz, NY, USA, 2001.
2. J. Blauert. An introduction to binaural technology. In Binaural and Spatial Hearing, R. Gilkey and T. Anderson, Eds., pages 593–609, Lawrence Erlbaum, Hillsdale, NJ, USA, 1997.
3. G. Grindlay and M. A. O. Vasilescu. A multilinear (tensor) framework for HRTF analysis and synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume 1, pages 161–164, Honolulu, Hawaii, USA, 2007.
4. L. Hu, H. Chen, and Z. Wu. The estimation of personalized HRTFs in individual VAS. In Proceedings of the 2008 Fourth International Conference on Natural Computation, pages 203–207, Washington, DC, USA, 2008.
5. I. T. Jolliffe. Principal Component Analysis. Springer, second edition, 2002.
6. D. J. Kistler and F. L. Wightman. A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction. Journal of the Acoustical Society of America, 91(3):1637–1647, 1992.
7. L. De Lathauwer, B. De Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000.
8. J. Middlebrooks and D. Green. Observations on a principal components analysis of head-related transfer functions. Journal of the Acoustical Society of America, 92(1):597–599, 1992.
9. B. Savas and L.-H. Lim. Best multilinear rank approximation of tensors with quasi-Newton methods on Grassmannians. Technical Report LITH-MAT-R-2008-01-SE, Department of Mathematics, Linköping University, 2008.
10. B. N. Sheehan and Y. Saad. Higher order orthogonal iteration of tensors (HOOI) and its relation to PCA and GLRAM. In Proceedings of the 2007 SIAM International Conference on Data Mining, pages 355–366, Minneapolis, Minnesota, USA, 2007.
11. M. Usman, F. Keyrouz, and K. Diepold. Real time humanoid sound source localization and tracking in a highly reverberant environment. In Proceedings of the 9th International Conference on Signal Processing, pages 2661–2664, Beijing, China, 2008.
12. E. Wenzel, M. Arruda, D. Kistler, and F. Wightman. Localization using nonindividualized head-related transfer functions. Journal of the Acoustical Society of America, 94:111–123, 1993.
13. J. Ye. Generalized low rank approximations of matrices. Machine Learning, 61(1-3):167–191, 2005.