ACTA ACUSTICA UNITED WITH ACUSTICA, Vol. 97 (2011) 432–440
DOI 10.3813/AAA.918424
Comparison of Different Calculation Methods of Effective Duration (τe) of the Running Autocorrelation Function of Music Signals
Shin-ichi Sato, Shuoxian Wu
State Key Laboratory of Subtropical Building Science, South China University of Technology, Guangzhou 510640, China.
Summary
This paper discusses methods to calculate the effective duration τe of the running autocorrelation function (ACF) of music signals. An iterative method is newly proposed and compared with the methods of previous studies. An index R, describing the accuracy of the τe regression, is also introduced. The iterative method was found to improve the accuracy of the τe regression for orchestral music signals with a wide variation of ACF envelopes. The resulting τe is discussed with reference to music tempo; the τe value increased as the music tempo decreased. Also, a longer inter-onset interval led to a longer τe value, and staccato playing resulted in a shorter τe value.
PACS no. 43.75.Zz, 43.60.Cg
1. Introduction
The preferred conditions of the reflections and reverberation in a concert hall depend on the nature of the source signal. Ando found that the preferred delay time of a single reflection depends on the music motif, and that it corresponds to the effective duration of the autocorrelation function (ACF) of the source signal [1]. The effective duration of the ACF is defined as the delay τe after which the envelope of the normalized ACF stays below 0.1. The preferred conditions of sound fields with multiple reflections and reverberation also depend on the music motif in terms of the ACF of the source signal [2, 3, 4, 5].
The above studies analyzed the long-time ACF of source signals (integration interval 2T = 33–35 s). Ando et al., however, introduced a running ACF (r-ACF) analysis of a music signal because the τe values of the long-time ACF and the r-ACFs of Japanese Shakuhachi music were found to be very different [6]. The analysis showed that the results for the preferred delay time of a single reflection can be described by the minimum effective duration of the r-ACF, (τe)min, with 2T = 2.0 s, which roughly relates to the “psychological response” for every moment at which we may perceive certain subjective attributes. Kuroki et al. used 10 different musical pieces for their preference tests [7]. They found that the most-preferred delay time of a single reflection and the most-preferred reverberation time can be calculated from the (τe)min of the musical pieces.
Received 2 September 2010, accepted 29 January 2011.
Not only the subjective preference explained above, but also other subjective attributes can be explained by the factors extracted from the ACF. Other important factors extracted from the ACF are the energy at the origin of the delay Φp(0), the delay time and amplitude of the first major peak of the normalized ACF, τ1 and φ1, respectively, and the width of the amplitude φ(τ) around the origin of the delay time, Wφ(0), defined at a value of 0.5 [8]. The ACF is known as a method for estimating the fundamental frequency of a sound signal, as determined by τ1. The pitch strength can be explained by φ1. Also, the annoyance of noises has been investigated in relation to their ACF factors [9, 10].
Practical problems of r-ACF analysis include how to calculate the ACF and how to evaluate the initial envelope decay for the τe value. Kato et al. discussed the procedure for calculating the r-ACF and the definition of “the initial part” for calculating its (τe)min [11]. However, their investigation only used singing voices with a stationary amplitude and pitch, and the music signals analyzed by the r-ACF were therefore limited. Studies on the relationship between subjective preferences and the ACF of the source signals used musical pieces with durations of only a few seconds, and most of the music signals consisted of a single instrument or a vocal signal.
Therefore, this study investigates the r-ACF analysis of orchestral music signals that last a few minutes and show a wide variation in tempo and composition. First, the τe of the orchestral music signals is calculated using the previous methods investigated by Ando et al. [6] and Kato et al. [11]. An indicator to express the accuracy of the τe calculation, that is, the accuracy of the regression of the peaks of the initial envelope of the r-ACF, is introduced because such an indicator has yet to be investigated. Then, the iterative method to calculate the τe of the r-ACF is newly proposed to make up for the shortcomings of the previous methods. Furthermore, the resulting τe is discussed with reference to the music tempo. The tempo of the orchestral music was then changed and the r-ACF analyzed. Finally, an accordion music performance with different tempos was analyzed using the r-ACF to confirm the proposed method.

© S. Hirzel Verlag · EAA

2. Procedure
2.1. Sound signal
As listed in Table I, eight orchestral music signals from the CD (DENON PG-6006) were analyzed by the r-ACF. The long-time ACF of these music signals has previously been investigated, and the relationship with the music tempo has been clarified [12]. The duration of the signals, the number of frames for the r-ACF analysis, and the music tempo τs, which is defined by the average time required per quarter note, are shown in Table I.

Table I. The eight music signals. Frames: number of frames for the r-ACF analysis. τs: average duration of quarter notes calculated by Hidaka et al. [12].
No. | Title | Composer | Duration | Frames | τs [ms]
1 | L'Arlésienne, Suite No. 2 | Bizet | 4'12 | 2494 | 872
2 | Prelude to Act 1, La Traviata | Verdi | 3'26 | 1994 | 1038
3 | Overture, Le nozze di Figaro | Mozart | 4'18 | 2551 | 215
4 | Pizzicato Polka | Johann & Josef Strauss | 2'34 | 1400 | 691
5 | Prélude à l'après-midi d'un Faune | Debussy | 1'53 | 1070 | 1352
6 | Overture, Ruslan and Lyudmila | Glinka | 5'21 | 3045 | 198
7 | 4th mov., Symphony No. 3 in A minor Op. 56 'Scottish' | Mendelssohn | 2'18 | 1358 | 472
8 | 1st mov., Symphony No. 4 in E flat major 'Romantic' | Bruckner | 1'40 | 957 | 441

Figure 1. Examples of τe regression by the Ando method. (a) Δτ = 5 ms (Music 6); (b) Δτ = 10 ms (Music 6). These figures also show examples of the high (a) and low (b) R values of the peaks used for the straight line regression. [Axes: delay time [ms], 0 to 200; 10 log10 |φp(τ; t, T)| [dB], 0 to −15.]
2.2. Procedure for calculating r-ACF
The r-ACF as a function of the time lag τ is calculated as [13]

\[ \phi_p(\tau; t, T) = \frac{\Phi_p(\tau; t, T)}{\sqrt{\Phi_p(0; t, T)\,\Phi_p(0; \tau + t, T)}}, \tag{1} \]

where

\[ \Phi_p(\tau; t, T) = \frac{1}{2T} \int_{0}^{2T} p'(s)\, p'(s + \tau)\, ds. \tag{2} \]
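As a rough illustration of equations (1) and (2), the following is a minimal sketch of a directly computed, normalized ACF for one frame of a sampled signal. It is not the FFT-based procedure used in this study: the function name `running_acf`, the discrete frame indexing, and the toy sine input are assumptions made for illustration, and the A-weighting filter is omitted (the input is taken to be the already filtered signal).

```python
import math

def running_acf(x, start, frame_len, max_lag):
    """Normalized ACF of the frame starting at sample `start`, lags 0..max_lag.

    Hypothetical helper: `x` stands for the (already filtered) signal p'(s).
    """
    def big_phi(lag, t0):
        # Discrete counterpart of Eq. (2): average of x[s] * x[s + lag]
        # over a frame of frame_len samples (the interval 2T) starting at t0.
        return sum(x[t0 + s] * x[t0 + s + lag] for s in range(frame_len)) / frame_len

    energy_t = big_phi(0, start)
    acf = []
    for lag in range(max_lag + 1):
        # Eq. (1): normalize by the geometric mean of the energy of the frame
        # at t and the energy of the frame shifted by the lag.
        denom = math.sqrt(energy_t * big_phi(0, start + lag))
        acf.append(big_phi(lag, start) / denom if denom > 0.0 else 0.0)
    return acf


# Toy input: a sine with a period of 80 samples (not one of the music signals).
signal = [math.sin(2.0 * math.pi * n / 80.0) for n in range(2000)]
acf = running_acf(signal, start=0, frame_len=800, max_lag=160)

# Logarithmic level of the absolute ACF, as plotted in the regression figures.
level_db = [10.0 * math.log10(abs(v)) if v != 0.0 else float("-inf") for v in acf]
```

With this normalization the Cauchy-Schwarz inequality bounds every value by 1 in magnitude, and the value at zero lag equals 1, which is the property the normalization is meant to provide. The direct sum is slow for long frames, which is one reason an FFT-based route is attractive in practice.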
Here, 2T represents the integration interval and p′(t) = p(t) ∗ s(t), where ∗ denotes convolution. Function p(t) denotes the amplitude of the original waveform of the signal, and function s(t) was chosen as the impulse response of the A-weighting filter corresponding to ear sensitivity. The r-ACF defined by equations (1) and (2) ensures that the normalized ACF satisfies φp(0) = 1 and φp(τ) ≤ 1 for τ > 0. The r-ACFs were obtained by the FFT method based on the Wiener-Khinchin theorem after obtaining the power density spectrum of each signal segment of length 2T. A rectangular time window was used. As described in Appendix B of [11], “FFT method C” was used to obtain the ACF corresponding to the direct method. Although the choice of 2T is another problem to
be clarified, 2T was fixed at 2.0 s here because the purpose of the study was to investigate the procedure to calculate the τe of the r-ACF. The running interval was 100 ms, and the maximum time lag τmax used to obtain the τe was set at 200 ms.
This study focused on the τe calculation of music signals, for which the envelope decay of the initial part of the logarithm of the r-ACF can be approximated by linear regression. Not every r-ACF envelope of a sound signal is linear in this sense. For example, the envelope of the ACF of bandpass noise is not exponential [14, 15].

2.3. Calculation of τe
Ando (1989) method
Figure 1 shows examples of the logarithm of the absolute value of the r-ACF as a function of the time lag τ. τe is defined as the time lag at which the envelope of the normalized r-ACF reaches −10 dB. The envelope is obtained by a straight line regression of the maxima found in successive Δτ intervals over the delay range up to τmax. The original definition by Ando et al. used 5 and 50 ms for Δτ and τmax, respectively [6]. Because there are many variations of the initial envelope of the r-ACF of the orchestral music signals, different Δτ values (5 to 10 ms in 1 ms steps) were examined, with τmax = 10Δτ. The peak of the first interval (0 to Δτ ms) was not used for the straight line regression because some r-ACF frames showed a prominent peak at τ = 0, as shown in Figure 1b.
Because the music signals analyzed sometimes included a blank (silence) between the notes or the bars, all frames whose energy Φp(0) was 30 dB or more below the maximum were excluded. Also, there were some cases where the frame was extracted so that only the last part had large amplitudes (Figure 2). In such cases, the calculated τe was considerably short. Therefore, frames whose 90 percentile value of the power of the amplitude was 30 dB or more below the maximum were also excluded.
To describe the accuracy of the τe calculation, the negative value of the correlation coefficient (−r) of the peaks used for the straight line regression was defined as R, because the correlation coefficient of the peaks usually shows a negative value. Thus, a larger R value indicates better accuracy of the τe regression. An R value higher than 0.85 practically assures sufficient accuracy. In cases of incorrect regression, the slope of the regression line is opposite, and the R value is negative (the correlation coefficient is positive), as shown in Figure 3.

Figure 2. Example of a frame which has large amplitudes only in the last part (Music 4). In such cases, τe becomes considerably short. [Axes: time [s] for the waveform panel; delay time [ms] for the r-ACF panel.]

Table II. Median and minimum R values of each regression method (average for the eight music signals; A5–A10: Ando method with Δτ = 5–10 ms; Kato method; iterative method), as well as the number of frames whose R value is less than 0.85 and the number of incorrect regression cases shown in Figure 3 (total number for the eight music signals).
Method | Median | Minimum | Frequency of R < 0.85 (%) | Incorrect regression
A5 | 0.83 | −0.21 | 7353 (49.5) | 170
A6 | 0.89 | −0.25 | 4576 (30.8) | 118
A7 | 0.91 | −0.12 | 4104 (27.9) | 71
A8 | 0.92 | 0.11 | 2938 (19.8) | 21
A9 | 0.94 | 0.18 | 2338 (15.7) | 19
A10 | 0.93 | 0.11 | 2363 (15.9) | 15
Kato | 0.96 | 0.01 | 2131 (14.5) | 9
Iterative | 0.97 | 0.82 | 17 (0.114) | 0

Table II shows the median and the minimum R values for each Δτ of the Ando method. The R value increases as Δτ
increases. Also, the number of incorrect regression cases decreases as Δτ increases. However, because of the wide variation of the orchestral music, a fixed Δτ was not suitable for accurate τe calculation.

Figure 3. Example of an incorrect regression case (Music 1). The slope of the regression line is opposite (R value is negative). [Axes: delay time [ms]; 10 log10 |φp(τ; t, T)| [dB].]
Figure 4. Examples of τe regression by the Kato method. (a) Music 6; and (b) Music 5. [Axes: delay time [ms]; 10 log10 |φp(τ; t, T)| [dB]. Panel annotations: 5 dB (a), 100 ms (b).]

Kato (2006) method
Kato et al. determined Δτ according to the fundamental period of the voice signal [11]. Figure 4 shows examples of τe regression by the Kato method. The major local peaks corresponding to multiples of the fundamental period, in
the range down to 5 dB below the amplitude of the first major peak, were used for a straight line regression. The fundamental period was calculated as τ1, the delay time of the maximum peak of the r-ACF in the range from the first zero-crossing to 30 ms. If all of the major local peaks up to τmax = 100 ms exceeded the amplitude of the first major peak minus 5 dB, then a straight line was fitted to all the major local peaks (Figure 4b). (Note that Kato et al. used τmax = 50 ms.) Usually, the τe of an instrument signal is longer than that of a voice signal.
As shown in Table II, the median R value was greater than that of the Ando method. Also, the number of incorrect regression cases was smaller than that of the Ando method. However, the minimum R value was still low, and thus the accuracy of the τe regression was insufficient. Figure 5 shows the relationship between τ1 and the R value obtained by the Kato method. Lower R values were observed for shorter τ1. As shown in Figure 6, there were cases where the Δτ determined by τ1 did not correspond to the envelope of the r-ACF.

Figure 5. Relationship between τ1 and R obtained by the Kato method. The number in the legend corresponds to the music signal number in Table I. There were 9 cases of regression error (shown in Figure 3). [Axes: τ1 [ms], 0 to 30; R, 0.0 to 1.0.]
Figure 6. Case where the period of τ1 does not correspond to the envelope of r-ACF (Music 4). [Axes: delay time [ms]; 10 log10 |φp(τ; t, T)| [dB].]
Figure 7. Illustration of the iterative method. [Flowchart: τe is calculated for Δτ = 3, 4, ..., 10 ms and determined by the Δτ giving the maximum R; if that R exceeds 0.85 it is accepted. Otherwise τe is calculated for Δτ = 20, 30, ..., 50 ms in turn, each being accepted if its R exceeds 0.85. If none does, the τe with the maximum R among Δτ = 3–10, 20, 30, 40, and 50 ms is used.]

Iterative method
As investigated above, the fixed Δτ of the Ando method was not suitable for detecting the peaks for the τe regression because of the wide variation in envelope decay of the r-ACF of the orchestral music signals. Also, the Kato method, which determined the Δτ according to the fundamental period τ1, showed a problem with the regression
accuracy, especially when Δτ is shorter, because the fundamental period of the orchestral music signal was not as simple as that of the voice signal. Therefore, an iterative method for τe regression was newly proposed:
1. τe is calculated with Δτ = 3–10 ms (in 1 ms steps), with τmax = 10Δτ.
2. The τe with the Δτ that gives the maximum R value is selected.
3. When the R value in 2) is below 0.85, τe is calculated with Δτ = 20 ms and τmax = 200 ms, and the τe with the larger R value among 2) and 3) is selected. When only the first two peaks are used for the regression, the R value is always 1.0, but it is difficult to judge whether the regression is correct. Therefore, the first three peaks in the ranges τ = 20–40, 40–60, and 60–80 ms are always used for the straight line regression. A peak after τ = 80 ms is used if its amplitude is greater than the amplitude at τ1 minus 5 dB.
4. When the R value in 3) is still below 0.85, τe is calculated with Δτ = 30 ms and τmax = 200 ms, and the τe with the larger R value among 3) and 4) is selected. The first three peaks in the ranges τ = 30–60, 60–90, and 90–120 ms are always used for the regression. A peak after τ = 120 ms is used if its amplitude is greater than the amplitude at τ1 minus 5 dB.
5. Step 4) is repeated up to Δτ = 50 ms.
The range of Δτ (3–50 ms) was determined so that no incorrect regression (shown in Figure 3) occurs. The procedure of the iterative method is illustrated in Figure 7.
Figure 8 shows an example of the comparison of the different τe regression methods (Music 4). The R values
of the iterative method were greatly improved when compared with the previous methods. The three curves of the τe look similar, although there were 391 and 776 cases for the Ando and Kato methods, respectively, where the difference in τe from the iterative method was larger than 10%. In total, for the eight music signals, there were 6346 and 7547 cases for the Ando and Kato methods, respectively, where the τe difference from the iterative method was larger than 10%. As shown in Table II, the number of frames whose R was less than 0.85 greatly decreased, while there was only a 0.01 difference in the median of the R values of the Kato and iterative methods. Without an index such as R to describe the accuracy of the τe regression, each τe value has to be checked by visual inspection to see whether the regression line properly fits the envelope of the r-ACF, and it is not easy to check thousands of r-ACF frames in this way.

Figure 8. Example of the comparison of the three procedures for calculating the τe of the r-ACF (Music 4). (a) Ando method; (b) Kato method; and (c) iterative method. Left: τe; right: R. Values less than 10 ms in τe and 0.0 in R indicate the regression error shown in Figure 3 (three cases for the Ando method and four cases for the Kato method, respectively). [Axes: time [s], 0 to 150; τe [ms], logarithmic, 10 to 1000; R, 0.0 to 1.0.]

Table III. Percentile τe of the eight music signals as well as (τe)min [ms]. The music number corresponds to that of Table I.
No. | (τe)min | 5% τe | 25% τe | 50% τe | 75% τe | 95% τe
1 | 46 | 73 | 113 | 163 | 249 | 457
2 | 24 | 37 | 59 | 84 | 116 | 201
3 | 25 | 36 | 58 | 80 | 112 | 190
4 | 14 | 50 | 66 | 82 | 119 | 393
5 | 33 | 79 | 119 | 175 | 446 | 1227
6 | 12 | 29 | 51 | 73 | 104 | 293
7 | 26 | 43 | 59 | 76 | 96 | 136
8 | 48 | 74 | 101 | 139 | 189 | 328

Figure 9 shows the τe of the r-ACF of the eight music signals. The percentile τe values, as well as (τe)min, are shown in Table III. It is noted that subjective preference theory emphasizes the importance of (τe)min because such parts include the densest information and are thus deeply connected to the subjective response. This is because subjective judgments have been conducted using music pieces of several seconds in length. However, music signals of more than a few minutes in length were analyzed in this study. The subjective preferences or hearing impressions regarding the different reverberation times and initial time delay gaps depend on what part of the passage is chosen for the hearing test.
Wide variations in the tempo and composition of the music signals used in this study produced a wide variation in the envelope of the r-ACF. Therefore, the fixed Δτ and the Δτ according to τ1 were not suitable for detecting the periodical peaks for the τe regression accurately. The
newly proposed iterative method showed reasonable accuracy in the τe regression. Only eight pieces were used for the r-ACF analyses in this study; however, the number of frames for the r-ACF analysis was 14869 in total, and the τe ranged from 12 to 4614 ms. Therefore, the amount of data analyzed in this study is considered sufficient to show the applicable range of the iterative method. For a wider variation of signals, the range of Δτ in the iterative method may need to be changed.
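The peak regression, the index R, and the iterative choice of Δτ described in Section 2.3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this study: the function names and the uniformly sampled input `level_db` (values of 10 log10 |φp(τ)| at a lag step of `dt_ms`) are hypothetical, and for Δτ ≥ 20 ms the sketch uses every interval peak up to 200 ms instead of applying the rule based on the amplitude at τ1 minus 5 dB.

```python
import math

def interval_peaks(level_db, dt_ms, delta_ms, tau_max_ms):
    """Maximum of the log-ACF in each delta_ms-wide interval up to tau_max_ms.

    The first interval (0 to delta_ms) is skipped, as described in the text.
    """
    n_per = max(1, int(round(delta_ms / dt_ms)))
    n_max = min(int(round(tau_max_ms / dt_ms)), len(level_db))
    peaks = []
    for lo in range(n_per, n_max - n_per + 1, n_per):
        seg = level_db[lo:lo + n_per]
        k = max(range(len(seg)), key=seg.__getitem__)
        peaks.append(((lo + k) * dt_ms, seg[k]))
    return peaks

def fit_tau_e(peaks):
    """Least-squares line through the peaks; returns (tau_e_ms, R) with R = -r."""
    n = len(peaks)
    if n < 3:  # two points always give |r| = 1, so they are not informative
        return None, 0.0
    mx = sum(t for t, _ in peaks) / n
    my = sum(y for _, y in peaks) / n
    sxx = sum((t - mx) ** 2 for t, _ in peaks)
    syy = sum((y - my) ** 2 for _, y in peaks)
    sxy = sum((t - mx) * (y - my) for t, y in peaks)
    if sxx == 0.0 or syy == 0.0:
        return None, 0.0
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    tau_e = (-10.0 - intercept) / slope  # lag where the fitted line reaches -10 dB
    return tau_e, -r

def iterative_tau_e(level_db, dt_ms, threshold=0.85):
    """Simplified iterative selection; returns (tau_e_ms, R, delta_ms)."""
    trials = [fit_tau_e(interval_peaks(level_db, dt_ms, d, 10 * d)) + (d,)
              for d in range(3, 11)]           # steps 1 and 2
    best = max(trials, key=lambda c: c[1])
    for d in (20, 30, 40, 50):                 # steps 3 to 5
        if best[1] >= threshold:
            break
        trial = fit_tau_e(interval_peaks(level_db, dt_ms, d, 200)) + (d,)
        if trial[1] > best[1]:
            best = trial
    return best

# Toy envelope decaying by 0.1 dB per ms, sampled every 1 ms up to 200 ms.
toy_level_db = [-0.1 * t for t in range(201)]
tau_e, r_value, delta = iterative_tau_e(toy_level_db, dt_ms=1.0)
```

For the toy envelope the fitted line coincides with the decay itself, so τe comes out at about 100 ms with R close to 1. A rising set of peaks gives a negative R, which corresponds to the incorrect regression case of Figure 3.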
Figure 9. τe of the r-ACF of the eight music signals. The music number corresponds to that of Table I. [Eight panels, one per music signal. Axes: time [s]; τe [ms], logarithmic, 10 to 1000.]

3. Discussion with relation to music tempo
The τe of a music signal depends on the characteristics of the performing style, such as music tempo, articulation, and vibrato. Hidaka et al. investigated the long-time ACF of orchestral music signals including the eight music signals used in this study [12]. They found a relationship between τe and the music tempo (the average duration of quarter notes τs). In this study, the correlation coefficients between the τs values and the 1, 25, 50, 75, and 99 percentile τe values of the r-ACF of the eight music signals were 0.54, 0.61, 0.62, 0.73, and 0.75, respectively. The 75 and 99 percentile τe showed higher correlation coefficients with τs than the τe of the long-time ACF calculated in [12]. A part of the music signal with a longer τe reflects the music tempo. Taguti and Ando investigated the r-ACF of a piano signal played in various performing styles [15]. They showed that a fast tempo resulted in a short τe value, and a slow tempo led to a long τe value.
To examine the effect of music tempo on τe, the tempo of the music signals used in this study was changed, and the τe was investigated. The tempo was modified so that the duration of the signals became 0.8, 0.9, 1.1, and 1.2 times that of the original signal (1.0). To modify the music tempo without a pitch shift, the signal was divided into pieces of short duration, and then these pieces were overlapped using software (“Stretch Effect” in Adobe Audition). Figure 10 shows the 5, 50, and 95 percentile τe of the eight music signals. The τe roughly increased as the tempo decreased. The results of an analysis of variance showed a significant increase of τe for the tempo changes from 0.8 to 0.9 and from 0.9 to 1.0.
To further investigate the relationship between τe and the music tempo, accordion music signals played at different tempos were analyzed by the r-ACF. The music signal was the Chinese song “MoLiHua (Jasmine Flower)”. The accordion performer was asked to play this music 1)
at normal speed, 2) as slow as possible, and 3) as fast as possible. The sound signals were recorded by a 1/2" condenser microphone placed 0.5 m in front of the instrument and were stored on a PC through a microphone amplifier and an audio interface. The τe of the r-ACF was calculated by the iterative method, as described above. Figure 11 shows the τe of the r-ACF of the accordion music signals at different tempos. As expected, longer τe appeared more frequently as the tempo decreased, while the (τe)min value did not show a large difference (62, 63, and 64 ms for fast, normal, and slow tempos, respectively). This lower limit may be caused by the sound-producing mechanism of the instrument. It should be noted that the lower limit of τe for the piano was about 60 ms [14].
When the accordion music was performed at different tempos, not only the inter-onset interval but also performing styles such as articulation and attack changed. To examine the effect of the performing style on τe, the original signals were stretched or compressed so that the duration of the signals was modified to that of the other tempos. The modification was done by the method described above. By such manipulation, the inter-onset interval can be changed while maintaining the articulation and attack of the performance. Figure 12 shows the percentile τe according to the music tempo. Even though the signal played at a fast tempo was stretched, the τe value did not increase because the short τe was realized by staccato rather than by a short inter-onset interval. When the signal played at a slow tempo was compressed, on the other hand, the τe value decreased as the tempo increased. When the accordion music was played at a slow tempo, each note was played with less attack. Thus, the inter-onset interval mainly affected the τe value. Regarding the accuracy of the τe regression, the R values were not less than 0.73, and 99.4% of the frames had R values above 0.85, indicating sufficient accuracy.

Figure 10. Percentile τe of the orchestra music signals with different tempos. [Three panels: 5, 50, and 95 percentile τe. Axes: music tempo, 0.8 to 1.2; τe [ms]. Legend: music signals 1 to 8.]
Figure 11. τe of the accordion music signals with different tempos. [Three panels: Fast, Normal, Slow. Axes: time [s], 0 to 50; τe [ms], logarithmic, 10 to 1000.]
Figure 12. Percentile τe of the accordion music signals with different tempos. For example, “N to F” means that a signal played with Normal tempo was modified (compressed) so that its duration became the same as that of a Fast tempo signal. F: Fast; N: Normal; and S: Slow. [Categories: Fast, N to F, S to F, F to N, Normal, S to N, F to S, N to S, Slow. Series: Min., 5%, Median, 95%, Max. Vertical axis: τe [ms].]
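The correlation between tempo and percentile τe quoted at the start of this section can be checked, for the percentiles that are tabulated, with a short calculation. The sketch below assumes the values are read from Table I (τs) and Table III (25, 50, and 75 percentile τe); the helper `pearson` and the variable names are illustrative only.

```python
import math

# tau_s [ms] from Table I, music signals 1 to 8.
tau_s_ms = [872, 1038, 215, 691, 1352, 198, 472, 441]

# Percentile tau_e [ms] from Table III, music signals 1 to 8.
tau_e_ms = {
    25: [113, 59, 58, 66, 119, 51, 59, 101],
    50: [163, 84, 80, 82, 175, 73, 76, 139],
    75: [249, 116, 112, 119, 446, 104, 96, 189],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

correlations = {p: pearson(tau_s_ms, values) for p, values in tau_e_ms.items()}
for percentile, r in sorted(correlations.items()):
    print(f"{percentile} percentile tau_e vs tau_s: r = {r:.3f}")
```

The three coefficients can be compared with the 0.61, 0.62, and 0.73 reported in the text. Small differences are to be expected because the tabulated τe values are rounded to whole milliseconds.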
4. Conclusions
Methods to calculate the τe of the r-ACF for music signals showing a wide variation in tempo and composition were investigated. The iterative method, newly proposed in this study, was compared with previous methods. Index R, defined by the correlation coefficient of the peaks used for the regression and expressing the accuracy of the τe regression, was also introduced. The iterative method showed better R values than those of the previous methods. The resulting τe was discussed with reference to music tempo. The percentile τe values correlated with the tempo of the music signals. The new τe regression method was also confirmed using accordion music signals which were played with different tempos. The τe value became shorter as the tempo increased, and a longer inter-onset interval led to a longer τe value, while staccato playing led to a shorter τe value.

Appendix
Ando et al. discussed other methods to obtain the τe of the r-ACF: (1) the Hilbert envelope and (2) the integration of the squared ACF φ²(τ) [6]. Figure A1 shows a comparison of the r-ACF and its Hilbert envelope. As already mentioned in [6], the Hilbert envelope still leaves too many fine structures in the envelope. τe was calculated from the Hilbert envelope using the iterative method. Table A1 shows the correlation coefficient of τe between the r-ACF and its Hilbert envelope. Figure A2 shows a comparison of the τe values calculated from the r-ACF and its Hilbert envelope (Music 7). Even though the correlation coefficient of this music signal is the lowest among the eight signals, the behaviours of the two curves are similar to each other. However, the Hilbert envelope method requires a rather long computation time.
Regarding the integration method, Kato et al. found that the integration range affects the range of τe [11]. As shown in Figure A3, τe was obtained from the backward integration over ranges of 50 and 200 ms. Figure A4 shows the running τe obtained by the integration method (Music 3). The integration method highlights the problem associated with the dynamic range of the τe values, and their correlation coefficients with those of the iterative method were low (Table A1). Thus, the τe obtained by the integration method does not reflect the initial envelope of the r-ACF. The integration method is derived from the Schroeder integration for calculating the reverberation time [16]. The integration of the amplitude of the impulse response is mathematically reasonable; however, the integration of the correlation value does not have a mathematical background. Therefore, the regression of peaks discussed in the main sections should be used to calculate the τe of the r-ACF.

Figure A1. Comparison of r-ACF (top) and its Hilbert envelope (bottom). [Axes: delay time [ms], 0 to 200; 10 log10 |φp(τ; t, T)| [dB], 0 to −15.]
Figure A2. Running τe obtained by r-ACF (a) and its Hilbert envelope (b) (Music 7). [Axes: time [s]; τe [ms], logarithmic, 10 to 1000.]
Figure A3. Integration method. [Curves: integration (50 ms) and integration (200 ms). Axes: delay time [ms], 0 to 200; 10 log10 |φp(τ; t, T)| [dB], 0 to −15.]
Figure A4. Running τe obtained by the integration method (Music 3). [Curves: 50 ms and 200 ms. Axes: time [s], 0 to 250; τe [ms], logarithmic, 10 to 1000.]

Table A1. Correlation coefficients with the τe values obtained by the iterative method. The music number corresponds to that of Table I.
No. | Hilbert envelope | Int. (50 ms) | Int. (200 ms)
1 | 0.94 | 0.31 | 0.32
2 | 0.92 | 0.63 | 0.22
3 | 0.96 | 0.72 | 0.48
4 | 0.92 | 0.52 | 0.64
5 | 0.99 | 0.41 | 0.30
6 | 0.89 | 0.53 | 0.48
7 | 0.68 | 0.50 | 0.31
8 | 0.79 | 0.48 | 0.48
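To make the dependence on the integration range concrete, the following is a minimal sketch of a Schroeder-type backward integration applied to a squared ACF. It only produces the integrated decay curve in dB; how τe is read from that curve is not specified here, so the sketch stops at the curve. The function name, the sample-based range argument, and the toy exponentially decaying ACF are assumptions for illustration.

```python
import math

def backward_integration_db(acf, n_range):
    """Backward integration of the squared ACF over lags 0..n_range-1.

    Returns the remaining energy from each lag to the end of the range,
    in dB relative to the total over the range.
    """
    squared = [v * v for v in acf[:n_range]]
    total = sum(squared)
    remaining, running = [], 0.0
    for v in reversed(squared):
        running += v
        remaining.append(running)
    remaining.reverse()
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf")
            for e in remaining]

# Toy ACF envelope (not measured data): exponential decay, one sample per ms.
toy_acf = [0.98 ** k for k in range(200)]
curve_50 = backward_integration_db(toy_acf, 50)    # 50 ms integration range
curve_200 = backward_integration_db(toy_acf, 200)  # 200 ms integration range
```

Even for this single toy envelope the two curves differ at the same lag, because the shorter range discards the energy beyond 50 ms. This is consistent with the observation above that the integration range affects the resulting τe.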
Acknowledgement
The authors would like to thank Professor Yoichi Ando for his useful comments. The work was supported by the National Natural Science Foundation of China (No. 50938003).

References
[1] Y. Ando: Subjective preference in relation to objective parameters of music sound fields with a single echo. J. Acoust. Soc. Am. 62 (1977) 1436–1441.
[2] Y. Ando, D. Gottlob: Effects of early multiple reflections on subjective preference judgments of music sound fields. J. Acoust. Soc. Am. 65 (1979) 524–527.
[3] Y. Ando, M. Imamura: Subjective preference tests for sound fields in concert halls simulated by the aid of a computer. J. Sound Vib. 65 (1979) 229–239.
[4] Y. Ando, K. Otera, Y. Hamana: Experiments on the universality of the most preferred reverberation time for sound fields in auditoria (in Japanese). J. Acoust. Soc. Jpn. 39 (1983) 89–95.
[5] Y. Ando, M. Okura, K. Yuasa: On the preferred reverberation time in auditoriums. Acustica 50 (1982) 134–141.
[6] Y. Ando, T. Okano, Y. Takazoe: The running autocorrelation function of different music signals relating to preferred temporal parameters of sound fields. J. Acoust. Soc. Am. 86 (1989) 644–649.
[7] S. Kuroki, M. Hamada, H. Sakai, Y. Ando: Individual preference in relation to the temporal and spatial factors of the sound field: factors affecting individual differences in subjective preference judgments. J. Temporal Des. Arch. Environ. 4 (2004) 29–40.
[8] Y. Ando: Auditory and visual sensations. Springer, New York, 2009.
[9] K. Fujii, J. Atagi, Y. Ando: Temporal and spatial factors of traffic noise and its annoyance. J. Temporal Des. Arch. Environ. 2 (2002) 33–41.
[10] S. Sato, J. You, J. Y. Jeon: Sound quality characteristics of refrigerator noise in real living environments with relation to psychoacoustical and autocorrelation function parameters. J. Acoust. Soc. Am. 122 (2007) 314–325.
[11] K. Kato, T. Hirawa, K. Kawai, T. Yano, Y. Ando: Investigation of the relation between (τe)min and operatic singing with different vibrato styles. J. Temporal Des. Arch. Environ. 6 (2006) 35–48.
[12] T. Hidaka, K. Kageyama, S. Masuda: Recording of anechoic orchestral music and measurements of its physical characteristics based on the auto-correlation function. Acustica 67 (1988) 68–70.
[13] Y. Ando, H. Alrutz: Perception of coloration in sound fields in relation to the autocorrelation function. J. Acoust. Soc. Am. 71 (1982) 616–618.
[14] Y. Ando: Concert hall acoustics. Springer-Verlag, Heidelberg, 1985.
[15] T. Taguti, Y. Ando: Characteristics of the short-term autocorrelation function of sound signals in piano performances. In: Music and concert hall acoustics. Y. Ando, D. Noson (eds.). Academic Press, London, 1997.
[16] M. R. Schroeder: New method of measuring reverberation time. J. Audio Eng. Soc. 35 (1987) 299–305.