Implementation of HRIR Interpolations on DSP Board TMS320C5535 eZdsp™

Hugeng1,a, Febryan Laya1,b, Wahidin Wahab2,c, Dadang Gunawan2,d

1 Computer Engineering Department, Universitas Multimedia Nusantara, Tangerang, Indonesia
2 Electrical Engineering Department, University of Indonesia, Depok, Indonesia

a [email protected], b [email protected], c [email protected], d [email protected]
Keywords: HRTF interpolation; implementation of HRTF on DSP board.
Abstract. Increased interest in virtual-space reproduction for games and simulation programs motivates much research on the Head-Related Transfer Function (HRTF) for producing 3D sound. Since HRTF characteristics differ between individuals, intensive per-individual HRTF measurements are normally required. HRTF interpolation, performed in either the time domain or the frequency domain, reduces the number of measurements needed. In this research, the CIPIC HRTF Database was used and three HRTF interpolation techniques are discussed. The linear vertical technique performed best according to mean square error, with 7.7818%, versus 92.83423% for the linear horizontal technique and 88.5655% for the bilinear technique. Interpolated HRTFs can be implemented easily on a DSP board for reproducing 3D sound. Subjective tests using the DSP board were also performed on 6 subjects, resulting in 55.83% correct localization for the linear vertical technique and 52.5% for both the linear horizontal and bilinear techniques.

Introduction
Human ears can be used to detect the location of a sound source. Localization of sound using the ears and eyes creates a sense of space [1]. Research in 3D sound, especially for 3D games and virtual-space simulation, remains exciting. Unlike the real environment, where sound can come from anywhere, a recording reproduced via speakers or headphones carries no information about the location of the sound source. The Fourier-transform pair of the HRTF in the time domain is called the Head-Related Impulse Response (HRIR). Using a pair of HRIRs, a monaural sound wave can be rendered as 3D sound that is perceived as coming from a certain location in space. However, massive HRIR measurements require intensive effort from each individual, who must stay still for a long period of time while measurements are taken from all points from which sound may come. With interpolation, we can predict non-measured points from a set of HRIR measurements; thus we need to measure fewer HRIRs, which reduces the effort and cost [2].
In this research, the CIPIC HRTF Database and three interpolation techniques are used, and the results are implemented on the DSP board TMS320C5535 eZdsp™. The DSP board was chosen for its portability and value. In later development, the DSP can be built into a real-time system that acts as an add-on for smartphones, game consoles, etc., to produce 3D sound. The TMS320C5535 is a low-power, low-cost DSP that comes with an FFT co-processor and an LCD controller, which can be used to implement an interactive real-time system.
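As a sketch of the rendering step described above, a mono signal convolved with a left/right HRIR pair yields two-channel binaural audio. This is an illustrative NumPy/SciPy snippet, not the paper's fixed-point DSP code; the function name and signal lengths are ours:

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair to obtain a
    two-channel binaural signal (illustrative sketch)."""
    left = fftconvolve(mono, hrir_left)    # 'full' convolution per ear
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)

# Example with placeholder data: 1 s of noise and 200-tap HRIRs
mono = np.random.randn(44100)
hrir_l = np.random.randn(200)
hrir_r = np.random.randn(200)
binaural = render_binaural(mono, hrir_l, hrir_r)
print(binaural.shape)  # (44299, 2): length N + L - 1 per channel
```

On the DSP board itself this filtering would be done in 16-bit fixed point, as described in the methodology below.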
HRTF Interpolations

Linear Interpolations
Linear interpolation uses HRTFs from two adjacent points on a straight line to calculate the HRTF at the desired point. The linear horizontal technique uses two points to the right and left of the desired point as references, whereas the linear vertical technique uses two points above and below the desired point. The interpolated HRIR, ĥ(k), at a desired point (θ, ϕ) is calculated as

ĥ(k) = r·ha(k) + (1 − r)·hb(k),  0 ≤ r ≤ 1  (1)

where ha(k) and hb(k) are the two reference HRIRs and r is the dividing ratio, determined by the distances from the desired point to the two reference points.

Bilinear Interpolations
Another way to perform HRTF interpolation is the bilinear method [2, 3], which computes the HRTF at a given point on the reference sphere as a weighted mean of the measured HRTFs associated with the four nearest points that circumscribe the desired point. If the corresponding set of HRIRs has been measured over a spherical grid with steps θgrid and ϕgrid in azimuth and elevation, respectively, the estimate of the HRIR at an arbitrary coordinate (θ, ϕ), as shown in Fig. 1, is obtained by

ĥ(k) = (1 − cθ)(1 − cϕ)·ha(k) + cθ(1 − cϕ)·hb(k) + cθcϕ·hc(k) + (1 − cθ)cϕ·hd(k)  (2)
where ha(k), hb(k), hc(k) and hd(k) are the HRIRs associated with the four nearest points to the desired position. The parameters cθ and cϕ are computed as

cθ = (Cθ mod θgrid) / θgrid  (3)

cϕ = (Cϕ mod ϕgrid) / ϕgrid  (4)

where Cθ and Cϕ are the relative angular positions defined in Fig. 1.
Fig. 1. Graphical Interpretation for Bilinear Interpolation of HRIRs [2]
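The two interpolation rules, Eqs. 1 through 4, can be sketched in NumPy as follows; the function names are ours, and the grid steps in the example are illustrative only:

```python
import numpy as np

def linear_interp_hrir(h_a, h_b, r):
    """Eq. 1: interpolated HRIR = r*h_a(k) + (1 - r)*h_b(k),
    with 0 <= r <= 1 the dividing ratio between the two references."""
    return r * np.asarray(h_a) + (1.0 - r) * np.asarray(h_b)

def bilinear_interp_hrir(h_a, h_b, h_c, h_d, C_theta, C_phi,
                         theta_grid, phi_grid):
    """Eqs. 2-4: weighted mean of the HRIRs at the four grid points
    circumscribing the desired position; C_theta and C_phi are the
    relative angular positions of Fig. 1."""
    c_t = (C_theta % theta_grid) / theta_grid   # Eq. 3
    c_p = (C_phi % phi_grid) / phi_grid         # Eq. 4
    return ((1 - c_t) * (1 - c_p) * np.asarray(h_a)   # Eq. 2
            + c_t * (1 - c_p) * np.asarray(h_b)
            + c_t * c_p * np.asarray(h_c)
            + (1 - c_t) * c_p * np.asarray(h_d))

# Eq. 1 with r = 0.5 averages the two references
print(linear_interp_hrir([1.0, 0.0], [0.0, 1.0], 0.5))  # [0.5 0.5]

# A desired point lying exactly on grid point a returns h_a
h = [np.full(3, v) for v in (1.0, 2.0, 3.0, 4.0)]
print(bilinear_interp_hrir(*h, C_theta=0.0, C_phi=0.0,
                           theta_grid=5.0, phi_grid=5.625))  # [1. 1. 1.]
```

Note that bilinear interpolation reduces to a linear rule along one axis whenever the desired point lies on a grid line (cθ = 0 or cϕ = 0).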
Research Methodology
This research was divided into two approaches. The first, objective, approach calculated the mean square error (MSE) between the interpolated HRIRs and the measured HRIRs from the database, both taken at the same sound-source locations. The MSE is calculated as

e(θ, ϕ) = 100% × ||h(θ, ϕ) − ĥ(θ, ϕ)||² / ||h(θ, ϕ)||²  (5)
where h(θ, ϕ) is the measured HRIR at a location (θ, ϕ) and ĥ(θ, ϕ) is the corresponding interpolated HRIR at the same location. Calculation of ĥ(θ, ϕ) was performed in Matlab using Eq. 1 for linear interpolation and Eq. 2 for bilinear interpolation. The Matlab data were then converted into 16-bit fixed-point form and implemented on the DSP board TMS320C5535 eZdsp™. Software development was done using Code Composer Studio v4.2.4, provided by Texas Instruments. The second (subjective) approach involved six subjects who listened to sound reproduced from the DSP board, covering eight locations for linear vertical and bilinear interpolation and six locations for linear horizontal interpolation, and then reported where they perceived each sound to come from. To achieve better individualization, nine anthropometric parameters of each subject were compared with the corresponding parameters of each subject in the database using the Euclidean distance. These distances showed that subjects A, HSF, and AL had the nearest anthropometric parameters to those of database subject 154, subject D to subject 155, subject F to subject 028, subject LJS to subject 018, and subject DCL to subject 061. Thus, in the subjective test, each subject used the HRIRs of the database subject with the nearest anthropometric parameters.

Experiments' Results and Discussion
Interpolated HRIRs were calculated at all points for all subjects using Eq. 1 or Eq. 2, then compared with the measured HRIRs available in the database. In this section, interpolated HRIRs at selected points and subjects are used to analyze the results of the three interpolation methods.

A. Linear Horizontal Interpolation
As seen in Fig.
2a, the shapes of the measured and interpolated HRIRs and HRTFs at (−45°, 0°) for Subject 028 are similar, although some peaks differ. The measured and interpolated HRIRs and HRTFs at (0°, 0°) show similar shapes for the left ear, but the right-ear graphs show large differences in shape and peaks, as seen in Fig. 2b. Comparing the interpolated HRIRs and HRTFs at the same point (−45°, 0°) for Subject 028 (Fig. 2a) and Subject 154 (Fig. 3a), we observe similar performance. Mean square errors, as defined in Eq. 5, between the interpolated HRIRs and the corresponding measured HRIRs were calculated at all points for all subjects. The average MSE per subject ranges from 53.32% to 141.71%. Overall, the average MSE across subjects for the linear horizontal interpolation technique is 92.83%.
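The relative error measure of Eq. 5 can be sketched as follows; the function name and example values are ours:

```python
import numpy as np

def mse_percent(h_meas, h_interp):
    """Eq. 5: e = 100% * ||h - h_hat||^2 / ||h||^2, the energy of the
    interpolation error relative to the energy of the measured HRIR."""
    h_meas = np.asarray(h_meas, dtype=float)
    h_interp = np.asarray(h_interp, dtype=float)
    return 100.0 * np.sum((h_meas - h_interp) ** 2) / np.sum(h_meas ** 2)

# Toy example: ||h||^2 = 9, error energy = 2.25, so e = 25%
h = np.array([1.0, 2.0, 2.0])
h_hat = np.array([1.0, 2.0, 0.5])
print(mse_percent(h, h_hat))  # 25.0
```

Because the error is normalized by the measured HRIR's energy, values above 100% (as seen for some subjects above) simply mean the error energy exceeds the signal energy.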
Fig. 2. Interpolated HRIRs and HRTFs from Subject 028 Using Linear Horizontal Interpolation: (a) position (−45°, 0°); (b) position (0°, 0°)
Fig. 3. Interpolated HRIRs and HRTFs from Subject 154: (a) linear horizontal interpolation at (−45°, 0°); (b) linear vertical interpolation at (0°, 0°)

B. Linear Vertical Interpolation
Like the linear horizontal technique, the linear vertical interpolation technique uses HRIRs at two vertically adjacent reference points to calculate the interpolated HRIR at the desired point. As observed in Fig. 3b, Fig. 4a, and Fig. 4b, with linear vertical interpolation the interpolated HRIRs and HRTFs are almost identical to their corresponding measured counterparts, both across sound-source positions (Fig. 4a and Fig. 4b) and across subjects (compare the same position (0°, 0°) for Subject 028 in Fig. 4b and for Subject 154 in Fig. 3b). In line with these visual observations, the overall average MSE for the linear vertical technique is 7.78%, much better than the overall MSE of the linear horizontal interpolation discussed above.
Fig. 4. Interpolated HRIRs and HRTFs from Subject 028 Using Linear Vertical Interpolation: (a) position (−45°, 0°); (b) position (0°, 0°)
C. Bilinear Interpolation
Different from the other two techniques, the bilinear technique uses four reference points instead of only two adjacent points, which requires more calculation.
Fig. 5. Interpolated HRIRs and HRTFs at (−45°, 0°) from Subject 028 Using Bilinear Interpolation

Fig. 5 shows an improved result for the interpolated HRIRs and HRTFs at point (−45°, 0°) for Subject 028 using bilinear interpolation, compared with the linear horizontal technique shown in Fig. 2a. The MSE of the left-ear HRIR at this point is 50.47%, versus 75.88% with the linear horizontal technique. Overall, the average MSE using bilinear interpolation is 88.56%, slightly better than that of linear horizontal interpolation.

D. Subjective Listening Test Results
For the listening test, interpolated HRIRs at the points relevant to each subject were implemented on the DSP board TMS320C5535 eZdsp™. The subjective listening test was carried out on six subjects; for each subject and each technique, 20 sound sources were played back from the DSP board. The percentage of correct answers out of the 20 sound sources was taken as the result. These results can be seen in Table 1.

Table 1. Subjective Listening Test Results
             Subject 154                          Subject 155        Subject 018        Subject 061        Average
             MSE      A        HSF      AL       MSE      D         MSE      LJS       MSE      DCL       MSE      Subj. Test
Horizontal   84.02%   65.00%   35.00%   60.00%   69.94%   60.00%    71.98%   40.00%    102.91%  55.00%    92.83%   52.50%
Vertical     6.35%    75.00%   45.00%   70.00%   9.95%    70.00%    6.14%    30.00%    9.59%    45.00%    7.78%    55.83%
Bilinear     80.08%   60.00%   40.00%   60.00%   67.19%   50.00%    68.79%   45.00%    97.55%   60.00%    88.57%   52.50%

(MSE columns give the objective MSE of the interpolated HRIRs of the database subject; the named columns give the percentage of correct localizations by each test subject using that database subject's HRIRs.)
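The average subjective scores in Table 1 follow directly from the six per-subject percentages; a quick check:

```python
# Per-subject correct-localization percentages from Table 1
scores = {
    "Horizontal": [65.00, 35.00, 60.00, 60.00, 40.00, 55.00],
    "Vertical":   [75.00, 45.00, 70.00, 70.00, 30.00, 45.00],
    "Bilinear":   [60.00, 40.00, 60.00, 50.00, 45.00, 60.00],
}
averages = {name: round(sum(s) / len(s), 2) for name, s in scores.items()}
print(averages)  # {'Horizontal': 52.5, 'Vertical': 55.83, 'Bilinear': 52.5}
```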
As seen from Table 1, the results of the subjective listening test do not follow the results of the objective MSE experiments. The best objective results were achieved with linear vertical interpolation, followed by bilinear interpolation, with linear horizontal interpolation the worst. Subjects A, AL, and D achieved about 60% correctness; these subjects could localize the sound-source positions quite well.
Conclusion
The performances of three interpolation techniques were compared using the mean square error as the objective performance parameter. Linear vertical interpolation came out as the best performer according to its average mean square error. Implementation of HRIRs on the DSP board was performed by pre-processing the HRIR data to comply with the DSP specification. A subjective listening test carried out on six subjects showed that the percentages of correctness in localizing the sound source were almost the same among the interpolation techniques and did not follow the objective performances.

Acknowledgment
The authors gratefully thank the CIPIC Interface Laboratory of the University of California at Davis, USA, for providing the CIPIC HRTF Database.

References
[1] G. S. Kendall, "A 3-D Sound Primer: Directional Hearing and Stereo Reproduction," Computer Music Journal, Vol. 19, No. 4, 1995.
[2] F. P. Freeland, L. W. P. Biscainho, and P. S. R. Diniz, "Efficient HRTF Interpolation in 3D Moving Sound," in Proc. Audio Eng. Soc. 22nd Int. Conf. on Virtual, Synthetic and Entertainment Audio, Espoo, Finland, June 2002.
[3] R. L. Martin and K. McAnally, "Interpolation of Head-Related Transfer Function," Defence Science and Technology Organisation, Fishermans Bend, Victoria, Australia, 2007.
[4] I. Verbauwhede, P. Schaumont, C. Piguet, and B. Kienhuis, "Architectures and Design Techniques for Energy Efficient Embedded DSP and Multimedia Processing," in Proc. DATE, Feb. 2004.
[5] L. Wang, F. Yin, and Z. Chen, "Head-Related Transfer Function Interpolation through Multivariate Polynomial Fitting of Principal Component Weights," Acoust. Sci. & Tech., Vol. 30, No. 6, 2009, pp. 395-403.
[6] V. R. Algazi, R. O. Duda, D. P. Thompson, and C. Avendano, "The CIPIC HRTF Database," in Proc. IEEE WASPAA'01, New Paltz, NY, 2001, pp. 99-102.
[7] W. G. Gardner and K. Martin, "HRTF Measurements of a KEMAR," J. Acoust. Soc. Amer., Vol. 97, pp. 3907-3908, 1995.
[8] CIPIC Interface Laboratory, Documentation for the UCD HRIR Files, University of California at Davis, October 1998.
[9] Texas Instruments, TMS320C5535 eZdsp™ USB Development Kit, 2011.
[10] Hugeng, W. Wahab, and D. Gunawan, "The Effectiveness of Chosen Partial Anthropometric Measurements in Individualizing Head-Related Transfer Functions on Median Plane," ITB J. ICT, Vol. 5, No. 1, 2011, pp. 35-56.
[11] D. A. Burgess, Real-Time Audio Spatialization with Inexpensive Hardware, Georgia Institute of Technology, 1992.
[12] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization, MIT Press, Cambridge, MA, 1983.