Developing Common Attributes for Surround Sound Recording Evaluation
Kamekawa et al.
DEVELOPING COMMON ATTRIBUTES TO EVALUATE SPATIAL IMPRESSION OF SURROUND SOUND RECORDINGS

TORU KAMEKAWA (1), ATSUSHI MARUI (2)

(1) Musical Creativity and the Environment, Tokyo University of the Arts, Tokyo, Japan
(2) Musical Creativity and the Environment, Tokyo University of the Arts, Tokyo, Japan

For the evaluation of spatial impression, several attributes are used. However, in more critical evaluations of spatial impression, such as comparisons of surround microphone settings, it is very difficult for listeners to share a common meaning for each perceptual attribute. The authors attempted to elicit common attributes from surround sound recordings by a triadic elicitation procedure. Three attributes, “brightness,” “temporal separability,” and “spatial homogeneity,” were elicited. Pairwise comparison was implemented to evaluate five different microphone placements for surround recordings using these attributes. From the results of ANOVA, significant differences between microphone placements and interaction among subjects were observed for all attributes. After removing the subjects who had circular triads and applying a cluster analysis procedure, 60 to 70 percent (depending on the attribute) of the professional subjects remained. This is more stable than the result for the student subjects for all three attributes. It is suggested that training is necessary for naïve listeners to share the same meanings of these attributes. Focusing on one of the elicited attributes, “spatial homogeneity,” which showed significant differences between pairs of recording excerpts, the authors studied the correspondence with physical factors. D/R (the direct-to-reverberant ratio) was calculated via IACC, which was measured from a binaural recording of music filtered into third-octave bands. It is hypothesized that the differences of mean D/R in a certain frequency band (around 3 kHz), measured between different listening positions, correspond to “spatial homogeneity”.
INTRODUCTION

When we evaluate recording techniques for surround sound, many attributes are used, such as ‘brightness’, ‘powerfulness’, and so on [1–4]. In particular, spatial impression is one of the most important factors when surround sound is compared with conventional two-channel stereo. The typical attributes of spatial impression are the following (also described in Figure 1):
- Localization: the apparent location of the sound source,
- Depth: the apparent spatial distance of the sound source from the listener,
- Width: the width of the frontal image, similar to ASW (apparent source width),
- Envelopment: the feeling of being enveloped, surrounded laterally and from behind, similar to LEV (listener envelopment) [6],
- Presence: the impression of being at the actual performance.

As for Depth and Width, individual sizes and ensemble sizes can be distinguished (as shown in Figure 1). However, it is difficult to evaluate the perceptual differences between actual music excerpts, such as differences in microphone setup [4, 5]. Subjective evaluation tests using these attributes often result in no significant differences because, while the differences between the excerpts are small, the differences between listeners' impressions of these attributes are rather large. In this paper, the authors discuss attributes of spatial impression whose meaning listeners can share during a subjective evaluation of surround sound recordings.
Figure 1: The typical attributes of spatial impression.
AES 40th International Conference, Tokyo, Japan, 2010 October 8–10
1 ELICITATION OF THE ATTRIBUTES
1.1 Repertory Grid Technique

The attributes were elicited using the Repertory Grid Technique (RGT). RGT consists of two parts. The first is an elicitation part, in which the subjects describe in what way the sounds differ most from each other. In the second part, the stimuli are rated on the elicited descriptors. A triadic elicitation procedure was implemented following the method of Berg and Rumsey [7,8]. The subjects were presented with a triplet of sounds and instructed to indicate which of the three sounds differed most from the other two, and a pair of words describing their similarities and differences was obtained for each triplet. In addition to the conventional RGT, the subject was then also asked why the sound differed from the others.

1.2 Playback Material

Nine musical excerpts were used for the experiment. These excerpts were recorded in our university’s concert hall, ‘Sogakudo’ (1,140 seats), which is equipped with a movable ceiling system that can control the volume of the hall and consequently the reverberation time. The microphone array “Omni+8” was used, which consists of four omni-directional microphones and one figure-8 microphone [9] (Figure 2).
Figure 2: Omni+8. Two omni-directional microphones for left and right, a figure-8 microphone for center, and two omni-directional microphones for rear.
The microphone placement was optimized for each ceiling type to match the piano performance (Scherzo No.2 by Chopin), which was also adjusted for each ceiling type by a professional pianist. An automatic piano player was used to record and play back the piano performances so that the performance stayed the same during these recordings. This treatment resulted in nine excerpts in total. To compare a different type of music, another piece sequenced as MIDI data (Invention No.1 by J. S. Bach) was also recorded. This performance was also kept constant over the different ceiling types and microphone placements. The two musical pieces differ in their use of chords and in dynamic range. In addition, the Bach piece was prepared in a studio and the performance was not adjusted for the Sogakudo concert hall. For the triadic elicitation procedure, three excerpts were randomly chosen from these nine recordings (of the same musical piece) for each trial. The duration of each excerpt was approximately twenty seconds.

1.3 Experimental setup

Seven subjects, five students and two faculty members who major in or teach acoustics, participated in the experiment. The experiment was conducted at the Sound Production Studio located at the Senju Campus of Tokyo University of the Arts (a.k.a. Tokyo Geidai). The studio was built to the listening room specifications in ITU-R BS.1116. The nine five-channel stimuli (three types of ceilings by three microphone placements) were presented via five active full-bandwidth loudspeakers (Genelec 8050) arranged according to ITU-R BS.775-1, at a height of 1.2 m from the floor and at a radius of 2.6 m from the central listening position (no LFE signal was prepared). Calibration of loudspeaker level was performed using an audio analyzer (Klark Teknik DN6400 with NTI N2010 microphone) with A-weighting and fast response. Each loudspeaker output was individually calibrated to 79 dBA using a -10 dBFS pink noise input signal, giving 85 dBA for the total of all five speakers. The average listening level of each music excerpt was adjusted to approximately 85 dB (LAeq). A Digidesign ProTools HD system with a D-Command mixing console was used as the surround sound reproduction system at 48 kHz/24 bits.
The recording conditions of the music excerpts were all combinations of the following three types of ceilings with three microphone placements adjusted for each ceiling type (Figures 3, 4, and 5).
- Ceiling type A: highest ceiling, usually used for pipe organ concerts
- Ceiling type B: middle ceiling height, for orchestra concerts
- Ceiling type C: lower ceiling height, for solo piano and vocal music
Figure 3: The ceiling configurations of “Sogakudo” for the recording. Values indicate the height of each ceiling panel from the stage in meters.

Figure 4: Microphone placements for the recording. Units are centimeters.

Figure 5: Microphone placements for the recording from the side view. Units are centimeters.

1.4 Discussion and Concentration on Common Meaning

After the triadic elicitation procedure mentioned above, the authors conducted an interview session with all subjects who participated in this experiment, in which the subjects discussed the words elicited by each other. This part differs from the typical RGT and was inspired by the Quantitative Descriptive Analysis method. Table 1 shows a sample of the words obtained (originally in Japanese). During the discussion session, each word was checked as to whether it could share a common meaning among all participants listening to the same excerpts. For example, the word “hard,” which was selected to distinguish excerpt I from the other excerpts (II and III), was discussed as to whether it could be used similarly in evaluation by all other participants. A subsequent listening test was conducted to confirm that all participants could choose the same excerpts by the same criteria. In this way, the words that did not reach an agreement were excluded from the lists, and the following three attributes were consolidated, in order of importance, at the end of the session.
- Brightness
- Temporal Separability
- Spatial Homogeneity

Excerpts      Reason of choice      Common of others
I  II  III    hard                  soft
I  II  III    light                 heavy
I  II  III    narrow sound image    wide
I  II  III    bright                dark
I  II  III    rich spaciousness     poor spaciousness

Table 1: An example questionnaire and examples of the evaluation words.

2 SUBJECTIVE LISTENING TEST USING THE ELICITED ATTRIBUTES
2.1 Preliminary experiment

To confirm whether the elicited attributes can be used to express common impressions of music excerpts, a listening test was conducted as a preliminary experiment. Six subjects who had participated in the previous elicitation sessions evaluated nine music excerpts, using nine faders on a mixing console to indicate the rank order of the recorded samples for each attribute. The faders were used solely to express the attribute ratings for the excerpts and were not used to control signal levels or any other parameters. Because all nine excerpts were played simultaneously, a participant could use the respective “solo” switch on a fader to switch between them whenever he/she chose to. The listening level was adjusted to approximately 85 dB (LAeq), the same as in the previous experiment.
               SS      df   MS      F0      p
BR (Chopin)    3.946   8    0.493   1.153   0.348
TS (Chopin)    3.941   8    0.493   1.196   0.323
SH (Chopin)    1.396   8    0.174   0.403   0.913
BR (Bach)      2.402   8    0.300   0.698   0.691
TS (Bach)      4.038   8    0.505   1.151   0.349
SH (Bach)      5.288   8    0.661   1.602   0.151

Table 2: ANOVA results for each attribute and piece of music. “BR” means “Brightness”, “TS” means “Temporal Separability”, and “SH” means “Spatial Homogeneity”.

As the results of the one-way analysis of variance in Table 2 show, there was no significant difference between the nine excerpts for any of the three attributes. The authors considered the following possible reasons:
- Nine excerpts were too many to evaluate simultaneously, and the differences between the excerpts were too small,
- Subjects were not familiar enough with surround music to distinguish the differences, and/or
- The attributes were not adequate to evaluate surround sound.

2.2 Discussion between the surround professionals

To examine the adequacy of the elicited attributes, the same kind of discussion was held with two professionals working in surround music recording and surround post-production. From the triadic elicitation procedure implemented using the same music excerpts, the following criteria regarding spatial impression were listed:
- Potential to imagine the room shape,
- Continuity between front and rear, and
- Balance of reverberation.

Furthermore, the following three words were elicited, in order of importance:
- Balance of reverberation,
- Amount of reverberation, and
- Timbre of reverberation.

These words are surprisingly closely related to the attributes elicited in the previous discussion by the students:
- Spatial Homogeneity,
- Temporal Separability, and
- Brightness.

However, it is remarkable that the order of importance is the opposite of the order given by the students. The authors concluded that the three elicited attributes could be used to express a common impression of surround music recordings.

2.3 Pair-wise comparison

The subsequent experiment was designed in light of the previous results. It was implemented using the pairwise comparison method [10] with the three elicited attributes, ‘Brightness’, ‘Temporal Separability’, and ‘Spatial Homogeneity.’ The experiment system implemented Scheffe's paired comparison using the “A-B Comparison mode” of the “STEP” program by Audio Research Labs [11]. In each trial, the listener compared two excerpts and rated the difference on a 7-point response scale (Figure 6).

Figure 6: Screenshot of the “STEP” program.

2.4 Music sources

Five music excerpts recorded in a medium-sized concert hall (Casals Hall, Tokyo) were used. First, the same microphone setup used in the previous recording (Omni+8) was placed at the position judged optimal by ear by a professional recording engineer (called the ‘reference’). Additional samples were recorded by moving the microphone array 30 cm from the reference position backward, forward, up, and down (Figure 7). The automated player piano was employed to play “Impromptu Opus” by Chopin (bars 29 to 42), which resulted in a total of five samples that were used in the subsequent experiment.

Figure 7: Microphone placement for the recording at Casals Hall. Units are centimeters.
2.5 Playback environment

Twenty-one subjects, 11 students and 10 professional recording engineers and researchers, evaluated all pairwise combinations of the five recorded excerpts (5x(5-1)/2=10 pairs) in the same playback environment as described in Section 1.3. The duration of each sample was approximately 20 seconds. A subject could listen to each piece until he/she was satisfied with the evaluation of each pair. Each subject evaluated the three attributes sequentially with the pairwise comparison procedure (10 pairs by 3 attributes, making 10x3=30 trials).

3 RESULTS

Figures 8 through 10 show the results of the experiments, and Tables 3 through 5 show the results of the analysis of variance (Nakaya variation of Scheffe’s ANOVA [10]) of each attribute for all subjects. Significant differences between microphone placements and effects of subjects’ individual differences were observed for all attributes.
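The trial count stated in Section 2.5 can be checked by enumerating the pairs directly. The following Python sketch is only an illustration: the short placement labels follow the five microphone positions of Section 2.4, and the variable names are assumptions of this example, not part of the test software.

```python
from itertools import combinations

# Five microphone placements (Section 2.4) and the three elicited attributes.
placements = ["Ref", "Back", "Forward", "Up", "Down"]
attributes = ["Brightness", "Temporal Separability", "Spatial Homogeneity"]

# Every unordered pair of placements: k(k-1)/2 pairs for k placements.
pairs = list(combinations(placements, 2))

# Each subject rates every pair once per attribute.
trials = [(attr, a, b) for attr in attributes for a, b in pairs]

print(len(pairs), "pairs,", len(trials), "trials per subject")
```

With 21 subjects this gives 210 ratings per attribute, which matches the total degrees of freedom listed in Tables 3 through 5.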
Figure 8: Mean ratings for the microphone placements regarding ‘Brightness’ by all subjects. The Y-axis is the score on the 7-point response scale (-3 to +3). Whiskers indicate 95% confidence intervals.

                          SS       df    MS      F0
“Brightness”              40.86    4     10.21   5.03 *
Subject                   261.94   80    3.27    1.61 *
“Brightness” x subject    19.43    6     3.24
Error                     243.77   120   2.03
Total                     566      210
F(4, 120; 0.05) = 2.4472, F(80, 120; 0.05) = 1.3922, Y(0.05) = 0.5538

Table 3: ANOVA of the comparison of the microphone placements regarding “Brightness” by all subjects.

Figure 9: Mean ratings for the microphone placements regarding ‘Temporal Separability’ by all subjects. The Y-axis is the score on the 7-point response scale (-3 to +3). Whiskers indicate 95% confidence intervals.

                                     SS       df    MS      F0
“Temporal Separability”              128.90   4     32.22   22.09 *
Subject                              220.30   80    2.75    1.89 *
“Temporal Separability” x subject    5.77     6     0.96
Error                                175.03   120   1.46
Total                                530.0    210
F(4, 120; 0.05) = 2.4472, F(80, 120; 0.05) = 1.3922, Y(0.05) = 0.4692

Table 4: ANOVA of the comparison of the microphone placements regarding “Temporal Separability” by all subjects.

Figure 10: Mean ratings for the microphone placements regarding ‘Spatial Homogeneity’ by all subjects. The Y-axis is the score on the 7-point response scale (-3 to +3). Whiskers indicate 95% confidence intervals.

                                   SS       df    MS      F0
“Spatial Homogeneity”              71.14    4     17.79   10.54 *
Subject                            267.66   80    3.35    1.98 *
“Spatial Homogeneity” x subject    6.67     6     1.11
Error                              202.53   120   1.69
Total                              548.0    210
F(4, 120; 0.05) = 2.4472, F(80, 120; 0.05) = 1.3922, Y(0.05) = 0.5048

Table 5: ANOVA of the comparison of the microphone placements regarding “Spatial Homogeneity” by all subjects.

3.1 Circular triad

To check the error in each subject, the existence or nonexistence of circular triads was confirmed [10,12]. Circular triads are caused by internal inconsistency, for example when the subject rated A higher than B, B higher than C, and C higher than A (see top-right panel of Figure 11). The following formula (1) was used to calculate the number of circular triads made by each subject.

\[ d = \frac{1}{6}k(k-1)(k-2) - \frac{1}{2}\sum_{i=1}^{k} a_i(a_i-1) \tag{1} \]

Here d is the number of circular triads, k is the number of nodes (in our case, nodes are microphone placements, thus k=5), and a_i is the number of outward arrows from each node to the others (see Figure 11). The subjects who had more than two circular triads were removed from the list. Finally, the following numbers of subjects remained:
- Brightness: 13 out of 21,
- Temporal Separability: 16 out of 21,
- Spatial Homogeneity: 17 out of 21.
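Formula (1) can be written out as a short Python sketch. This is a minimal illustration, not the authors' analysis code. It assumes that every pair has a strict preference (a rating of 0 on the 7-point scale would need a separate rule), and the function names and the `wins` matrix layout are hypothetical. A brute-force count over all triples is included as a cross-check.

```python
from itertools import combinations

def circular_triads(wins):
    """Number of circular triads d from formula (1).

    wins[i][j] is True when placement i was rated higher than placement j.
    Assumes a strict preference for every pair (no ties).
    """
    k = len(wins)
    # a[i]: number of outward arrows from node i (placements that i beats).
    a = [sum(1 for j in range(k) if j != i and wins[i][j]) for i in range(k)]
    return k * (k - 1) * (k - 2) // 6 - sum(ai * (ai - 1) for ai in a) // 2

def circular_triads_brute_force(wins):
    """Count the triples whose three preferences form a cycle."""
    count = 0
    for i, j, l in combinations(range(len(wins)), 3):
        forward = wins[i][j] and wins[j][l] and wins[l][i]
        backward = wins[j][i] and wins[l][j] and wins[i][l]
        if forward or backward:
            count += 1
    return count

def tournament(k, beats):
    """Build a k-by-k preference matrix from a rule beats(i, j)."""
    return [[i != j and beats(i, j) for j in range(k)] for i in range(k)]

# A fully consistent ranking: lower index always preferred.
consistent = tournament(5, lambda i, j: i < j)
# A maximally inconsistent case: each node beats the next two around a ring.
cyclic = tournament(5, lambda i, j: (j - i) % 5 in (1, 2))

print(circular_triads(consistent), circular_triads(cyclic))
```

For k=5 the count ranges from 0 (a fully transitive ordering) to 5 (every placement preferred over exactly two others), so the removal threshold used above sits near the middle of the possible range.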
Figure 11: Principle of the circular triad. Compared to the top-left triangle, the top-right triangle has a circular triad. The pentagon at the bottom shows two circular triads (colored).

3.2 Grouping by Cluster Analysis

The analysis of variance after removing the subjects with inconsistencies still showed an effect of subjects’ individual differences. The remaining subjects were therefore divided into two or three groups by cluster analysis until the effect of subjects’ individual differences was no longer significant. The cluster analysis was computed using the “kmeans” command in Matlab. For example, the subjects for “Brightness” were classified into two groups. The results of these two groups show different tendencies (Figures 12 and 13). Final results for “Temporal Separability” and “Spatial Homogeneity” are shown in Figures 14 and 15, and the analyses of variance after the groupings are shown in Tables 6, 7, and 8. The significance of the differences between microphone placements increased for all three attributes (compare the F0 values in Tables 3 to 5 with those in Tables 6 to 8). Regarding “Brightness,” “forward” was rated the highest, and “down” has the highest “Temporal Separability.” Meanwhile, “reference” (Ref) is superior to the other microphone placements in “Spatial Homogeneity.” The impressions of “Brightness” and “Temporal Separability” show similar results, and these two attributes contrast with “Spatial Homogeneity.” It is also interesting to note that, although the reference is at the middle location of both up-and-down and back-and-forward, the ratings do not follow this spatial order.

Figure 12: Comparison of the microphone placements regarding “Brightness” by one of the groups of subjects classified by cluster analysis. The Y-axis is the score on the 7-point response scale (-3 to +3). Whiskers indicate 95% confidence intervals.

Figure 13: Comparison of the microphone placements regarding “Brightness” by another group of subjects classified by cluster analysis. The Y-axis is the score on the 7-point response scale (-3 to +3). Whiskers indicate 95% confidence intervals.
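The grouping step in Section 3.2 can be illustrated with a small k-means sketch in plain Python. It is not a reproduction of the Matlab “kmeans” call used in the study. The paper does not state which features were clustered, so treating each subject's vector of pair ratings as a point is an assumption. The deterministic farthest-point initialisation, the function names, and the toy data are also choices made only for this example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two rating vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def init_centers(points, k):
    """Deterministic start (an assumption): the first point, then the points
    farthest from the centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy data: five hypothetical subjects, four pair ratings each (-3 to +3).
ratings = [
    [3, 2, 3, 1],
    [2, 3, 3, 1],
    [3, 3, 2, 1],
    [-3, -2, -3, 0],
    [-2, -3, -3, 0],
]
labels, centers = kmeans(ratings, k=2)
print(labels)
```

In the procedure described above, an analysis of variance would then be repeated on the largest cluster, and the number of clusters increased from two to three if the subject effect remained significant.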
Figure 14: Comparison of the microphone placements regarding ‘Temporal Separability’ by selected subjects. The Y-axis is the score on the 7-point response scale (-3 to +3). Whiskers indicate 95% confidence intervals.

Figure 15: Comparison of the microphone placements regarding ‘Spatial Homogeneity’ by selected subjects. The Y-axis is the score on the 7-point response scale (-3 to +3). Whiskers indicate 95% confidence intervals.

                          SS      df   MS      F0
“Brightness”              59.80   4    14.95   8.79 *
Subject                   46.2    28   1.65    0.97
“Brightness” x subject    11.58   6    1.93
Error                     71.43   42   1.70
Total                     189     80
F(4, 42; 0.05) = 2.5943, F(28, 42; 0.05) = 1.7450, Y(0.05) = 0.8423

Table 6: ANOVA of the comparison of the microphone placements regarding “Brightness” by selected subjects.

                                     SS       df    MS      F0
“Temporal Separability”              131.00   4     32.75   29.52 *
Subject                              37.00    36    1.03    0.93
“Temporal Separability” x subject    9.10     6     1.52
Error                                59.90    54    1.11
Total                                237      100
F(4, 54; 0.05) = 2.5429, F(36, 54; 0.05) = 1.6347, Y(0.05) = 0.6031

Table 7: ANOVA of the comparison of the microphone placements regarding “Temporal Separability” by selected subjects.

                                   SS       df    MS      F0
“Spatial Homogeneity”              121.88   4     30.47   22.46 *
Subject                            97.72    48    2.04    1.50
“Spatial Homogeneity” x subject    2.74     6     0.46
Error                              97.66    72    1.36
Total                              320      130
F(4, 72; 0.05) = 2.4989, F(48, 72; 0.05) = 1.5313, Y(0.05) = 0.5804

Table 8: ANOVA of the comparison of the microphone placements regarding “Spatial Homogeneity” by selected subjects.
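The entries of the ANOVA tables are linked by MS = SS/df and F0 = MS/MS(error), and an effect is marked with an asterisk when F0 exceeds the 5% critical value quoted under the table. The Python sketch below recomputes the “Temporal Separability” values for the selected subjects from the reported sums of squares. It only checks the arithmetic of the table and does not implement the Nakaya variation of Scheffe's method; the function name and dictionary layout are assumptions of this example.

```python
def f_ratios(ss, df, error_key="Error"):
    """Return F0 = MS_effect / MS_error for every non-error row."""
    ms = {name: ss[name] / df[name] for name in ss}
    return {name: ms[name] / ms[error_key] for name in ss if name != error_key}

# Sums of squares and degrees of freedom as reported for
# "Temporal Separability" (selected subjects).
ss = {
    "Temporal Separability": 131.00,
    "Subject": 37.00,
    "Temporal Separability x subject": 9.10,
    "Error": 59.90,
}
df = {
    "Temporal Separability": 4,
    "Subject": 36,
    "Temporal Separability x subject": 6,
    "Error": 54,
}

# 5% critical values quoted with the table.
critical = {"Temporal Separability": 2.5429, "Subject": 1.6347}

f0 = f_ratios(ss, df)
for name, crit in critical.items():
    verdict = "significant" if f0[name] > crit else "not significant"
    print(f"{name}: F0 = {f0[name]:.2f} ({verdict} at 5%)")
```

The subject effect falls below its critical value here, which is the stopping condition for the grouping described in Section 3.2.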
3.3 Discussion

Table 9 shows the percentage of selected subjects for each attribute elicited in this study. For “Brightness,” 38% of all subjects shared the meaning, compared with 48% for “Temporal Separability” and 62% for “Spatial Homogeneity.” Some subjects informally told the authors that their criterion of evaluation drifted between different parts of a music excerpt, especially for “Brightness.” This comment may be related to the percentages obtained above. The breakdown of the selected subjects (those who were consistent throughout the experiment) comprised 60 to 70% of all professionals and 18 to 55% of all students (Figure 16). The contrast between these two groups is remarkable, since at least 60% of the professionals had common impressions of these attributes. It is suggested that training is necessary to share the same meanings of these attributes.
Attribute                CT            CA            N    Pro          Student
Brightness               62% (13/21)   38% (8/21)    2    60% (6/10)   18% (2/11)
Temporal Separability    76% (16/21)   48% (10/21)   3    60% (6/10)   36% (4/11)
Spatial Homogeneity      81% (17/21)   62% (13/21)   2    70% (7/10)   55% (6/11)

Table 9: Percentage of selected subjects for each attribute. “CT” means the subjects, out of all subjects, who had less than two circular triads. “CA” means the subjects who belong to the major group found by cluster analysis, and “N” means the number of clusters. “Pro” and “Student” mean the percentage of professionals and students in each group. The numbers in parentheses denote the ratio of the selected subjects to the population.
Fig. 16: Percentage of selected subjects in the professional and student groups. “Pro” and “Student” mean the percentage of professionals and students in each group.

4 CORRESPONDENCE BETWEEN PHYSICAL FACTORS AND SPATIAL IMPRESSIONS

4.1 Running IACC and D/R

Using a pair of recording samples, “Ref” (reference) and “Forward,” which differ significantly in “spatial homogeneity,” the correspondence between physical factors and spatial impressions was studied. Among the factors related to spatial impression, ASW and LEV are well known to correspond to IACC (inter-aural cross-correlation coefficient) [6,13]. To consider a more ecological listening environment, the “running IACC” proposed by D. Griesinger [14] was employed. Running IACC is used to investigate the temporal variation of inter-aural cross-correlation within a time-varying signal such as music or speech; hence it is not calculated from an impulse response signal at the ears. In our case, the music was recorded with a dummy head (B&K Type 4128) in the same room in which the subjective rating experiments were done. The signal was first filtered into third-octave bands, and an IACC value was calculated for each band over a 10 ms window that steps forward in 5 ms steps. The result is an IACC value for every 5 ms of the signal. The direct-to-reverberant ratio in dB (D/R) was found from the IACC by the following formula [14].

\[ D/R = 10\log_{10}\left(\frac{1}{1-\mathrm{IACC}}\right) \tag{2} \]

Fig. 17: D/R ratio calculated from running IACC, comparing “Ref” (red solid line) and “Forward” (blue dotted line) at 3.2 kHz.

Since the D/R value fluctuates from moment to moment and the samples are therefore difficult to compare directly (Fig. 17), histograms of each sample’s D/R were used (Fig. 18). The authors expected that D/R values below 3 dB (equivalent to IACC < 0.5) correspond to “spatial homogeneity,” and thus compared the proportion of such values in each frequency band (Fig. 19). The figure shows that the appearance probability of D/R below 3 dB is very small below 800 Hz and increases with frequency. Furthermore, there is a salient difference between “Ref” and “Forward” at 3.2 kHz.

Fig. 18: Histogram of the appearance of each level of D/R ratio calculated from running IACC in the 3.2 kHz 1/3 octave band, comparing “Ref” and “Forward”.
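The processing chain of Section 4.1 (windowed IACC, conversion to D/R with formula (2), and the proportion of frames below 3 dB) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the analysis code used in the study. The third-octave filtering is omitted, the lag search range of ±1 ms is a conventional choice that the text does not specify, IACC is clipped just below 1 to keep the logarithm finite, and all function and parameter names are hypothetical.

```python
import math

def running_iacc(left, right, fs, win_ms=10.0, hop_ms=5.0, max_lag_ms=1.0):
    """IACC per frame: maximum absolute normalised cross-correlation
    within +/- max_lag_ms, for a window that steps forward by hop_ms."""
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    values = []
    for start in range(0, len(left) - win + 1, hop):
        l = left[start:start + win]
        r = right[start:start + win]
        norm = math.sqrt(sum(x * x for x in l) * sum(y * y for y in r))
        if norm == 0.0:
            values.append(0.0)  # silent frame: treated as uncorrelated
            continue
        best = 0.0
        for lag in range(-max_lag, max_lag + 1):
            if lag >= 0:
                s = sum(l[n] * r[n + lag] for n in range(win - lag))
            else:
                s = sum(l[n - lag] * r[n] for n in range(win + lag))
            best = max(best, abs(s) / norm)
        values.append(best)
    return values

def dr_from_iacc(iacc):
    """Formula (2): D/R in dB. IACC is clipped below 1 (an assumption)."""
    iacc = min(iacc, 1.0 - 1e-12)
    return 10.0 * math.log10(1.0 / (1.0 - iacc))

def fraction_below(dr_values, threshold_db=3.0):
    """Proportion of frames whose D/R lies below the threshold."""
    return sum(1 for v in dr_values if v < threshold_db) / len(dr_values)

# Toy check: identical ear signals give IACC of 1 in every frame.
fs = 8000
tone = [math.sin(2.0 * math.pi * 500.0 * n / fs) for n in range(800)]
iacc_frames = running_iacc(tone, tone, fs)

# IACC = 0.5 maps to about 3 dB, the threshold used in the text.
print(len(iacc_frames), round(dr_from_iacc(0.5), 2))
```

Applying `fraction_below` to the D/R frames of each third-octave band would give one value per band and per recording, which is the quantity compared between “Ref” and “Forward”.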
Fig. 19: Histogram of D/R below 3 dB for each 1/3 octave band, comparing “Ref” and “Forward”.

4.2 Plural point of IACC (D/R)

To confirm the “homogeneity” of IACC (D/R) around the listening position, the authors measured sets of IACC with the dummy head rotated to the left and right at several head and body angles (±15 and ±30 degrees). To compute the overall differences caused by head rotation from the reference position, the difference in D/R between each rotated position and the reference position was calculated for each 1/3 octave band and then averaged over the head angles; the result is called the “mean D/R difference”. From the histogram of mean D/R differences, almost all differences lie within 3 dB (Fig. 20). To compare “Ref” and “Forward” at higher resolution, the proportion of mean D/R differences below 1 dB was calculated for each third-octave band (Fig. 21).

Fig. 20: Histogram of mean D/R differences in the 3.2 kHz 1/3 octave band, comparing “Ref” and “Forward”.

A salient difference between the histograms of “Ref” and “Forward” at 3.2 kHz was found again. It can be hypothesized that differences of D/R around 3 kHz between different listening positions correspond to “spatial homogeneity”. Though the differences below 800 Hz are also salient, it is considered that these differences do not affect spatial impressions because of their small value shown in Fig. 18.
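One possible reading of the “mean D/R difference” in Section 4.2 is sketched below in Python. The text does not say whether signed or absolute differences were averaged, nor how frames from different head angles were aligned. This example assumes absolute differences and frame-by-frame alignment within one third-octave band; the function name, the data layout, and the toy values are hypothetical.

```python
def mean_dr_differences(dr_by_angle, reference_angle=0):
    """Per-frame mean absolute D/R difference from the reference position.

    dr_by_angle maps a head angle in degrees to a list of D/R values (dB),
    one per analysis frame, for a single third-octave band.
    """
    ref = dr_by_angle[reference_angle]
    rotated = [angle for angle in dr_by_angle if angle != reference_angle]
    return [
        sum(abs(dr_by_angle[angle][n] - ref[n]) for angle in rotated) / len(rotated)
        for n in range(len(ref))
    ]

# Toy values in dB for three frames at the five head angles used in the study.
dr_by_angle = {
    0: [3.0, 4.0, 5.0],
    15: [4.0, 4.0, 3.0],
    -15: [2.0, 5.0, 5.0],
    30: [3.0, 6.0, 5.0],
    -30: [3.0, 4.0, 9.0],
}

diffs = mean_dr_differences(dr_by_angle)
share_below_1db = sum(1 for d in diffs if d < 1.0) / len(diffs)
print(diffs, round(share_below_1db, 2))
```

Under this reading, a larger share of frames below 1 dB means that D/R changes little as the head turns, which is the sense in which the sound field around the listening position would be called homogeneous.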
Fig. 21: Histogram of mean D/R differences below 1 dB for each 1/3 octave band.

5 CONCLUSION

The attributes for evaluating surround sound microphone setups were discussed. Attributes were elicited by a triadic elicitation procedure using actual surround recordings. Through the process of excluding words that did not reach an agreement, the following three attributes resulted:
- Brightness,
- Temporal Separability, and
- Spatial Homogeneity.

Pairwise comparison was implemented to evaluate five different microphone placements for surround sound recording. From the results of ANOVA, significant differences between microphone placements and interaction among subjects were observed for all attributes. To reduce the significant differences in the interaction of subjects, subjects who had circular triads were removed, and the rest were divided into two or three groups by cluster analysis. The remaining subjects comprised 60 to 70% of the professionals, which is more stable than the result for the students for all three attributes. It is suggested that training is highly necessary for naïve listeners to share the same meanings of these attributes.

Using a pair of recording samples that differ significantly in “spatial homogeneity”, the correspondence between physical factors and spatial impressions was studied. The D/R value was calculated from running IACC, which was measured from a binaural recording of music filtered into third-octave bands. From histograms of each sample’s D/R, the proportion of values below 3 dB (equivalent to IACC < 0.5), which is considered to correspond to “spatial homogeneity,” was compared across frequency bands. The appearance ratio of D/R below 3 dB in each third-octave band indicates a salient difference between “Ref” and “Forward” at 3.2 kHz. To confirm the “homogeneity” of IACC (D/R) around the listening position, a dummy head was rotated left and right (±15 and ±30 degrees) and measured at each position. The appearance ratio of D/R differences below 1 dB in each third-octave band was calculated from the differences in D/R between the front-facing and rotated positions. The salient difference in appearance ratios was found at 3.2 kHz. It is hypothesized that differences of D/R around 3 kHz between different listening positions correspond to “spatial homogeneity”.

This study is still in progress, and the final goal is the establishment of training methods for evaluating the spatial impression of surround sound recordings. As the next step, confirmation of the appropriateness of the elicited attributes is planned. The authors will conduct further listening experiments using music excerpts with controlled D/R (IACC) and/or other relevant physical parameters.

REFERENCES

[1] F. Rumsey, “Spatial Audio”, Chapter 2 (pp. 21-51), Focal Press, 2001.
[2] S. Bech, N. Zacharov, “Perceptual Audio Evaluation”, Appendix B (pp. 363-366), John Wiley & Sons, Ltd, 2006.
[3] F. Rumsey, “Spatial Quality Evaluation for Reproduced Sound: Terminology, Meaning, and a Scene-Based Paradigm”, J. Audio Eng. Soc. Vol.50, No.9, 2002.
[4] W. Martens and S. Kim, “Verbal Elicitation and Scale Construction for Evaluating Perceptual Differences between Four Multichannel Microphone Techniques”, AES 122nd Convention, Vienna. Preprint 2007.
[5] T. Kamekawa, A. Marui, and H. Irimajiri, “Correspondence Relationship between Physical Factors and Psychological Impressions of Microphone Arrays for Orchestra Recording”, AES 123rd Convention, New York. Preprint 2007.
[6] M. Morimoto, H. Fujimori, and Z. Maekawa, “Discrimination between auditory source width and envelopment”, J. Acoust. Soc. Japan. Vol.46, No.6, pp. 443-457 (1990). (Available only in Japanese language)
[7] J. Berg and F. Rumsey, “Identification of Quality Attributes of Spatial Audio by Repertory Grid Technique,” J. Audio Eng. Soc. Vol.54, No.5, May 2006.
[8] S. Choisel and F. Wickelmaier, “Extraction of Auditory Features and Elicitation of Attributes for the Assessment of Multichannel Reproduced Sound”, J. Audio Eng. Soc. Vol.54, No.9.
[9] T. Kamekawa, “The Effect on Spatial Impression of the Configuration and Directivity of Three Frontal Microphones Used in Multi-channel Stereophonic”, AES 28th International Conference, Pitea, Sweden, 2006.
[10] S. Satoh, “Statistical Sensory Testing.” Union of Japanese Scientists and Engineers Publishing, 1985. (Available only in Japanese language)
[11] “STEP User Manual”, Audio Research Labs.
[12] W. Martens, A. Marui, and S. Kim, “Investigating Contextual Dependency in a Pairwise Preference Choice Task”, AES 28th International Conference, pp. 307-316.
[13] T. Hanyu, “Room acoustical parameters”, J. Acoust. Soc. Japan. Vol.60, No.2, pp. 72-77 (2004). (Available only in Japanese language)
[14] D. Griesinger, “Measurement of acoustic properties through syllabic analysis of binaural speech”, PowerPoint document for IAC2004, http://www.davidgriesinger.com/