Preference and loudness judgments of multi-tone sounds and their relationship to psychoacoustical metrics
Stephan Töpken and Reinhard Weber
Carl von Ossietzky University Oldenburg – Acoustics Group, Carl-von-Ossietzky-Str. 9-11, 26129 Oldenburg, Germany
ABSTRACT
In sound quality assessments it is often of interest to find objective, algorithm-based psychoacoustical descriptors reflecting the results from listening tests. In this paper psychoacoustic metrics are compared with the results of subjective loudness and preference judgments. In separate experiments the points of subjective equality (PSEs) for loudness and for preference are determined for multi-tone test sounds as level differences compared to a fixed reference sound. The spectral content of the 25 test sounds, each consisting of up to 460 partials, is varied. For this set of test sounds the values of six loudness metrics and six other psychoacoustic metrics are calculated as possible descriptors. The metrics include the existing standards for the loudness of stationary and time variant signals as well as roughness, fluctuation strength, sharpness and tonality measures. The results show that the subjective loudness judgments are best reflected by loudness values based on the DIN 45631 / ISO 532 standard for stationary signals. The preference judgments are best reflected by the sharpness metrics.
1. INTRODUCTION
Multi-tone sounds produced by rotating machinery are a part of our everyday life. For the evaluation of their sound quality the assessment by subjective listening tests is normative. Nevertheless, it is also desirable to find objective metrics reflecting the subjective impressions and judgments. One important aspect of a sound is its pleasantness/unpleasantness or acceptance. It is generally accepted that the dB(A)-level of a sound describes neither its unpleasantness nor its loudness. Nevertheless, correlations between the unpleasantness and the loudness of sounds are often found. However, previous experiments of the authors have demonstrated that there is a clear difference between the loudness and the unpleasantness of sounds [1]. The unpleasantness of multi-tone sounds has been determined by preference judgments in a paired comparison paradigm with a fixed reference sound. In this paper the relationship between the subjective loudness and preference judgments of 25 multi-tone sounds and 12 different psychoacoustic metrics is presented.
2. METHODOLOGICAL SETTING
2.1 Determination of the subjective loudness and preference of multi-tone sounds
In two separate listening experiments the loudness and the preference of multi-tone sounds are determined. In a paired comparison the points of subjective equality (PSEs) for loudness and preference of multi-tone sounds are measured with respect to a constant reference sound. The PSEs are measured with an adaptive paradigm varying the level of the multi-tone test sounds. For the measurement of the preference the method is based on the following characteristics:
1. The dB(A)-level does not describe the loudness or the unpleasantness of a sound, so two sounds with the same dB(A)-level may differ in loudness as well as in unpleasantness. A change of the dB(A)-level of a sound may nevertheless change its loudness and its unpleasantness.
2. If the level of the multi-tone test sound is the same as the level of the reference (74 dB(A)), then the multi-tone sound is less preferred than the reference sound due to its unpleasant sound character.
3. If the level of the multi-tone test sound is reduced, then at some level change it becomes more preferred than the reference sound, simply because it is less loud and more pleasant than the reference.
Based on these characteristics the point of subjective equality for preference is measured with an adaptive procedure varying the level of the test sounds; an example is shown in figure 1. During the paired comparison experiment the participants are asked to decide which of the two sounds, the multi-tone sound or the reference sound, they prefer. Depending on the answer the level of the test sound is varied: it is reduced if the test sound is not preferred and increased if the reference sound is not preferred. This simple 1-up 1-down staircase procedure is designed to converge at the 50 percent point of the psychometric function, which is the PSE. The level step size is halved after each upper reversal point of the level curve over the trials.
In a similar way the PSE for loudness can be determined. In this case the participants are simply asked: “Which sound is louder?” Here the level of the test sound is reduced if it is louder and increased if the reference is louder.
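The adaptive procedure described above can be illustrated with a short simulation. The following Python sketch is not the Matlab routine used in the experiments; the start step of 8 dB, the minimum step of 1 dB, the number of trials, the averaging of the last six reversals as PSE estimate and the noise-free `ideal_listener` are all assumptions made for illustration only.

```python
def staircase_pse(prefers_test, start_level=74.0, start_step=8.0,
                  min_step=1.0, n_trials=40, n_average=6):
    """Run a 1-up 1-down staircase and return (track, reversals, pse)."""
    level, step = start_level, start_step
    track = []           # test sound level presented in each trial
    reversals = []       # levels at which the direction of change flipped
    last_direction = 0   # +1: level was raised, -1: level was lowered
    for _ in range(n_trials):
        track.append(level)
        # Test sound preferred -> raise its level, otherwise lower it.
        direction = 1 if prefers_test(level) else -1
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)
            # Upper reversal: the track was rising and now turns downward.
            if last_direction == 1:
                step = max(step / 2.0, min_step)
        level += direction * step
        last_direction = direction
    tail = reversals[-n_average:]
    if not tail:
        raise ValueError("no reversal occurred, PSE cannot be estimated")
    # Assumption: the PSE is estimated as the mean of the last reversals.
    pse = sum(tail) / len(tail)
    return track, reversals, pse


def ideal_listener(level, true_pse=62.0):
    """Hypothetical noise-free listener: prefers the test sound below true_pse."""
    return level < true_pse


track, reversals, pse = staircase_pse(ideal_listener)
delta_l = pse - 74.0   # level difference relative to the 74 dB(A) reference
print(f"PSE estimate: {pse:.1f} dB(A), delta L = {delta_l:.1f} dB")
```

For the loudness task the same routine applies if `prefers_test` is replaced by a function that returns True when the reference is judged louder.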
Figure 1 – Exemplary development of the test sound level over the trials of the staircase method. Symbols indicate the answer of the participant.
The result of the matching procedure is a level difference ΔL. Depending on the question asked during a task, it is a relative measure for the loudness or for the pleasantness of the test sound compared to the reference sound. The results of such an evaluation of the preference and the loudness are dB(A)-level differences ΔLpreference and ΔLloudness necessary to make the test sound equally preferred to the reference sound (figure 2, I) and equally loud (figure 2, II), respectively. Generally the test sounds need a larger reduction in level to become equally preferred than to become equally loud.
Figure 2 – Exemplary level structure resulting from the paired comparison of test sound and reference sound. The level of the reference is kept constant at 74 dB(A). In comparison to the level of the reference sound, the dB(A)-level of the multi-tone test sound has to be attenuated by the level difference ΔLloudness in order to obtain the same loudness as the reference sound at the point of subjective equality for loudness (II). An even higher level decrease ΔLpreference is required for the multi-tone test sound to achieve an equal preference with respect to the reference sound (I). The level difference ΔLsound character = ΔLpreference - ΔLloudness is attributed to the differences in sound character between the multi-tone test sound and the reference sound (III).
An additional attenuation of ΔLsound character = ΔLpreference - ΔLloudness is necessary to make an equally loud test sound also equally preferred. This additional portion in level difference between the two judgments can thus be attributed to the sound character (figure 2, III). Measuring the loudness and the preference judgments on the same dB-scale makes a quantification of this in-between portion possible and allows for a quantitative differentiation between the two assessments.
2.2 Listening setup and participants
The listening tests take place in the anechoic chamber of the University of Oldenburg, which has a lower limiting frequency of 50 Hz. The task itself is implemented as a Matlab® routine on a computer positioned outside the anechoic chamber. An external soundcard (M-Audio, Fast Track Pro) supplies the audio signals to an active loudspeaker (Mackie, HR 824) positioned in front of the participant seated inside the anechoic chamber. The experimental routine is operated by the participant via a computer keyboard and a TFT-screen placed underneath the loudspeaker.
The listening tests have been carried out in multiple studies with 38 to 47 participants, aged between 18 and 31 years. The ratio of female to male participants is kept as balanced as possible.
2.3 Stimuli
Synthetic multi-tone sounds consisting of up to 460 superposed partials, composed of two complex tones (CX1 and CX2) and additional combination tones (CTs), are used as test sounds. All partials are based on two fundamental frequencies, $f_{10} = 100$ Hz for the complex tone CX1 and $f_{01} = 132.66$ Hz for CX2. The frequency components in detail are given by:
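CX1: $f_{i0} = i \cdot f_{10}, \quad i = 1, \dots, 30$ (1)
CX2: $f_{0j} = j \cdot f_{01}, \quad j = 1, \dots, 30$ (2)
CTs: $f_{ij} = i \cdot f_{10} + j \cdot f_{01}, \quad i, j = 1, \dots, 20$ (3)
$f_{10} = 100\,\mathrm{Hz}$ (4)
$f_{01} = 132.66\,\mathrm{Hz}$ (5)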
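As a plausibility check of the partial count, the frequency components can be enumerated in a few lines of Python. This is only a sketch: the index ranges (1 to 30 for the complex tones, 1 to 20 for the combination tones) and the summation form of the combination tones are assumptions read from the equations above, chosen because they yield 30 + 30 + 400 = 460 partials, in line with the "up to 460 partials" stated in the text.

```python
F10 = 100.0    # fundamental frequency of CX1 in Hz (from the text)
F01 = 132.66   # fundamental frequency of CX2 in Hz (from the text)


def partial_frequencies(n_complex=30, n_comb=20):
    """Return the frequency lists (cx1, cx2, cts) in Hz.

    n_complex and n_comb are assumed index limits, not confirmed values.
    """
    cx1 = [i * F10 for i in range(1, n_complex + 1)]
    cx2 = [j * F01 for j in range(1, n_complex + 1)]
    # Assumption: combination tones are sums of the harmonics of both tones.
    cts = [i * F10 + j * F01
           for i in range(1, n_comb + 1)
           for j in range(1, n_comb + 1)]
    return cx1, cx2, cts


cx1, cx2, cts = partial_frequencies()
all_partials = cx1 + cx2 + cts
print(len(cx1), len(cx2), len(cts), len(all_partials))
```

Under these assumptions no two partials coincide in frequency, so the full set contains 460 distinct components.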
The starting phases for all partials are taken from one set of uniformly distributed random values between zero and 2π. To determine the relationship between subjective judgments and psychoacoustic metrics, 25 different multi-tone test sounds differing in the proportion of the complex tones and combination tones and in the spectral envelope are prepared. The reference sound is a noise signal with a spectral slope of approximately -6 dB per octave up to 1 kHz and -12 dB per octave above 1 kHz. It has a constant level of 74 dB(A).
2.4 Psychoacoustic metrics
Overall six loudness metrics (table 1) and six other psychoacoustic metrics (table 2) are calculated for the 25 test sounds. The calculations of the metrics are carried out with the help of a loudness toolbox for Matlab® [7] and commercial acoustic software [6]. All calculations are based on sound samples with durations of five seconds and a sound pressure level of 74 dB(A). For the psychoacoustic metrics (roughness, fluctuation strength, sharpness and tonality), the reported value is the 95th percentile of the values calculated over time. Only the last second of the samples (4 s to 5 s) is evaluated because of overshoot effects in some of the metrics.
Table 1 – Calculated loudness metrics, authors and software for the comparison with the loudness judgments in terms of level differences ΔLloudness. The first two loudness metrics (i, ii) are identical to the two (a, b) included in the set of eight psychoacoustic metrics in table 2.
ID | Loudness metric | Author | Software
(i) | DIN 45631 | Zwicker | ARTEMIS 10, HEAD acoustics
(ii) | ISO 532 | Zwicker | ARTEMIS 10, HEAD acoustics
(iii) | ANSI S3.4, 2007 | Moore et al. | MATLAB, Genesis Acoustics
(iv) | DLM, short term loudness | Fastl, Chalupper | MATLAB
(v) | Time varying loudness, N5 | Moore et al. | MATLAB, Genesis Acoustics
(vi) | Time varying loudness, N5 | Zwicker | MATLAB, Genesis Acoustics
Table 2 – Calculated psychoacoustic metrics (and the software settings) for the comparison with the pleasantness judgments in terms of level differences ΔLpreference. All metrics are calculated by Artemis 10 (HEAD acoustics) [6]. The first two metrics (a, b) are the same loudness metrics as (i, ii) in the set of six loudness metrics in table 1.
ID | Metric | Properties
(a) | Loudness DIN 45631 | DIN method, Soundfield: free
(b) | Loudness ISO 532 | ISO method (FFT/ISO 532B), Window: Hanning, FFT-size: 4096, Overlap: 50%, Soundfield: free
(c) | Roughness | Artemis method
(d) | Fluctuation strength | Artemis method, Resolution: 1/1 Bark
(e) | Sharpness (Aures) | Calculation method: DIN, Sharpness method: Aures, Soundfield: free
(f) | Sharpness (von Bismarck) | Calculation method: DIN, Sharpness method: von Bismarck, Soundfield: free
(g) | Sharpness (DIN 45692) | Calculation method: DIN, Sharpness method: DIN, Soundfield: free
(h) | Tonality | Artemis method, Overlap: 50%
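The reduction of a metric's time series to a single value, as described in section 2.4, can be sketched as follows. This Python example does not reproduce the Artemis or Matlab toolbox calculations; the nearest-rank percentile definition, the interpretation of the 95% percentile as the value below which 95% of the samples lie, and the names `metric_value` and `frame_rate` are assumptions for illustration.

```python
import math


def percentile_nearest_rank(values, q):
    """q-th percentile (0 < q <= 100) using the nearest-rank definition."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100.0)   # 1-based rank
    return ordered[max(rank, 1) - 1]


def metric_value(series, frame_rate, t_start=4.0, t_end=5.0, q=95.0):
    """Percentile of a metric time series restricted to [t_start, t_end).

    series:     metric values over time, one value per analysis frame
    frame_rate: number of metric values per second (hypothetical parameter)
    """
    first = int(round(t_start * frame_rate))
    last = int(round(t_end * frame_rate))
    return percentile_nearest_rank(series[first:last], q)


# Hypothetical 5 s series with 100 values per second (a simple ramp).
series = list(range(500))
print(metric_value(series, frame_rate=100))
```

Restricting the window to the last second discards the onset of the time series, where some metrics overshoot.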
3. RESULTS
In order to identify possible objective descriptors for the subjective loudness and pleasantness (preference), the results of the listening tests in terms of level differences are presented in relation to the calculated values of the different psychoacoustic metrics. The subjective loudness judgments (ΔLloudness) are presented with the loudness values calculated by the six different loudness metrics shown in table 1. The subjective preference judgments (ΔLpreference) are presented with the eight psychoacoustic metrics of table 2, including those two loudness metrics which reflect the loudness judgments best (table 1, i and ii).
3.1 Relationship between loudness judgments and psychoacoustic metrics
Figure 3 shows the mean values of the subjective loudness judgments expressed by ΔLloudness plotted over the six different loudness metrics of table 1. The error bars indicating the standard error of the calculated mean values are very small. In most cases they are smaller than the marker size and therefore hidden behind the marker. The mean subjective loudness judgments are best described by the loudness calculated based on the DIN 45631 (i) and the ISO 532 standard (ii), with a statistically significant correlation coefficient of r = -0.8* in both cases. The well-known relationship of a loudness doubling for a level increase of 10 dB is fairly well reflected by the linear regressions of all metrics with slopes between -8 dB/20 sone and -12 dB/20 sone [5].
Figure 3 (i-vi) – Mean values of the loudness judgments for 25 multi-tone sounds plotted as ΔLloudness over six different loudness metrics (table 1). The standard errors are very small: the error bars indicating the standard error of the calculated mean are mostly hidden behind the symbols. Loudness metrics for stationary sounds are shown in the upper row (i-iii) and metrics for time varying sounds in the lower row (iv-vi). Correlation coefficients are given in the lower left corner. The highest correlations between judged and calculated loudness are found for the DIN 45631 (i) and the ISO 532 (ii) standard, with r = -0.8* in both cases.
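The correlation coefficients and regression slopes reported in section 3.1 are standard statistics. The following Python sketch shows how they could be computed; the five data points are hypothetical and are not taken from the listening tests.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(x, y):
    """Least-squares slope of y over x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical values: calculated loudness in sone, judged level difference in dB.
loudness_sone = [30.0, 35.0, 40.0, 45.0, 50.0]
delta_l_loudness = [-2.0, -5.0, -6.0, -10.0, -12.0]

r = pearson_r(loudness_sone, delta_l_loudness)
slope_per_20_sone = 20.0 * regression_slope(loudness_sone, delta_l_loudness)
print(f"r = {r:.2f}, slope = {slope_per_20_sone:.1f} dB per 20 sone")
```

The slope is scaled to dB per 20 sone so that it can be compared directly with the range quoted in the text.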
3.2 Relationship between preference judgments and psychoacoustic metrics
The relationship between the preference judgments and the eight psychoacoustic metrics (see table 2) is shown in figure 4. Plotted are the mean values of the preference judgments ΔLpreference over the calculated values for the eight metrics. The negative values of ΔLpreference for all test sounds indicate that all sounds are less preferred than the reference sound at the level of 74 dB(A). A level change between -2 dB and -22 dB is necessary to make the multi-tone sounds equally preferred to the reference sound. Our data show that the pleasantness of the multi-tone sounds decreases with increasing values for loudness, roughness and sharpness, which is well in line with literature data [5, 8]. For the current test sounds the subjective preference judgments are best described by the sharpness metric based on the Aures method (e), with a statistically significant correlation coefficient of r = -0.95* between the preference judgments and the calculated sharpness values. Statistically significant correlation coefficients are also found for the other psychoacoustic metrics, except for the fluctuation strength.
Figure 4 (a-h) – Mean values of the preference judgments for 25 multi-tone sounds plotted as ΔLpreference over values calculated from eight psychoacoustic metrics (table 2). Error bars indicate the standard error of the calculated mean. Correlation coefficients are given in the lower left corner. The highest correlation coefficient is found for the sharpness metric calculated with the Aures method (e), r = -0.95*.
The basis for the combined loudness-preference method is the fact that loudness as well as preference changes if the dB(A)-level of a multi-tone sound is varied. It turns out that different level changes are required to make the multi-tone sounds equally loud and to make them equally preferred. If the preference depended only on loudness, then the status of equal loudness would also reflect the status of equal preference. The difference between the attenuations of the multi-tone sounds required for equal loudness and for equal preference is therefore attributed to the sound character of the different multi-tone sounds: ΔLsound character = ΔLpreference - ΔLloudness (see figure 2, III). The mean values of ΔLsound character are shown in figure 5, again plotted over the eight psychoacoustic metrics (from table 2). These level differences ΔLsound character are best reflected by the sharpness metric based on the method of von Bismarck (f) and the DIN 45692 standard (g), with a statistically significant correlation coefficient of r = -0.91*. The shared variance with the calculated loudness values (a, b) drops from r² ≈ 61% for ΔLpreference to r² ≈ 30% for ΔLsound character. The correlation coefficients for the roughness (c) and for the sharpness (e, f and g) remain on a similar level.
Figure 5 (a-h) – Mean values of the level differences for 25 multi-tone sounds plotted as ΔLsound character = ΔLpreference – ΔLloudness over values calculated from eight psychoacoustic metrics (table 2). Error bars indicate the standard error of the calculated mean. Correlation coefficients are given in the lower left corner. The highest correlation coefficient is found for the sharpness metric calculated with the von Bismarck (f) and the DIN 45692 method (g), both with a value of r = -0.91*.
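The two quantities used in section 3.2, the level difference attributed to the sound character and the shared variance, follow from simple arithmetic. The Python sketch below uses hypothetical level differences; only the relation ΔLsound character = ΔLpreference - ΔLloudness and the shared variances of about 61% and 30% are taken from the text.

```python
import math


def sound_character(delta_l_preference, delta_l_loudness):
    """Per-sound level difference attributed to the sound character, in dB."""
    return [p - l for p, l in zip(delta_l_preference, delta_l_loudness)]


def shared_variance(r):
    """Shared variance r^2 of two variables with correlation coefficient r."""
    return r * r


# Hypothetical level differences in dB for three test sounds.
delta_l_preference = [-10.0, -15.0, -21.0]
delta_l_loudness = [-4.0, -6.0, -9.0]
print(sound_character(delta_l_preference, delta_l_loudness))

# Magnitudes of r that correspond to the shared variances quoted in the text.
for r_squared in (0.61, 0.30):
    print(f"r^2 = {r_squared:.2f} corresponds to |r| = {math.sqrt(r_squared):.2f}")
```

A negative ΔLsound character means that the test sound has to be attenuated beyond the point of equal loudness before it is equally preferred.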
4. SUMMARY
The loudness and the pleasantness of multi-tone sounds are determined in listening tests with up to 48 participants. In an adaptive paradigm the level of each multi-tone test sound is varied until the sound is equally loud (experiment 1) and equally preferred (experiment 2) as a reference sound which is kept constant in level (74 dB(A)). The results are thus expressed as level differences ΔLloudness and ΔLpreference. Overall 25 multi-tone sounds consisting of two complex tones and additional combination tones serve as test sounds. The level ratio of the partials and the spectral envelope are varied as parameters to generate a wide variety of sounds differing particularly in spectral content. An analysis of the relationship between the subjective loudness and preference judgments and psychoacoustic metrics (as possible objective descriptors) leads to the following conclusions:
- The subjective loudness judgments in terms of ΔLloudness are best described by the loudness calculated by the DIN 45631 and the ISO 532 standard.
- Loudness metrics alone are not appropriate to describe the preference judgments.
- The subjective preference judgments in terms of ΔLpreference are best described by the sharpness metric (Aures method).
- The level differences ΔLsound character are best correlated with the sharpness metric (von Bismarck/DIN 45692 method).
REFERENCES
[1] S. Toepken and R. Weber, “Differentiating between loudness and preference in the case of multi-tone stimuli”, Proceedings of the 21st International Congress on Acoustics, ICA 2013, Montreal, Canada (2013)
[2] ISO 532:1975, “Acoustics - Method for calculating loudness level”
[3] DIN 45631/A1, “Calculation of loudness level and loudness from the sound spectrum - Zwicker method - Amendment 1: Calculation of the loudness of time variant sound”, 2008
[4] ANSI S3.4-2007, “American National Standard Procedure for the Computation of Loudness of Steady Sound”, 2007
[5] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models, 3rd ed. (Springer, Berlin/Heidelberg, 2007)
[6] Artemis 10, HEAD acoustics®
[7] Loudness Toolbox for Matlab®, Genesis, www.genesis-acoustics.com
[8] W. Aures, “Der sensorische Wohlklang als Funktion psychoakustischer Empfindungsgrößen (Sensory pleasantness as a function of psychoacoustic sensations)”, Acustica, 58, 282–290 (1985)