Buenos Aires – 5 to 9 September, 2016. Acoustics for the 21st Century…
PROCEEDINGS of the 22nd International Congress on Acoustics
Psychological and Physiological Acoustics (others): Paper ICA2016-150
Dimensional approach of musical emotion recognition in relation to the running autocorrelation parameters

Esteban Zanardi (a), Shin-ichi Sato (b), Florent Masson (c)

(a) Universidad Nacional de Tres de Febrero, Argentina, [email protected]
(b) Universidad Nacional de Tres de Febrero, Argentina, [email protected]
(c) Universidad Nacional de Tres de Febrero, Argentina, [email protected]
Abstract

In this study, the dimensional approach of music emotion recognition (MER) is related to running autocorrelation function (r-ACF) parameters. A two-dimensional valence-arousal plane is used to classify the perceived emotions. Ranking-based subjective tests were conducted to carry out the correlation analysis with acoustical parameters. Eight excerpts of non-classical music were used for the test, selected to cover a wide variety of musical genres, including Rock, Pop, Tango, Jazz, and Argentinean Folklore. The analyses showed that significant correlations were achieved between the arousal dimension and the delay time of the maximum amplitude of the ACF (τ1). A lower but still high correlation with arousal was found for the amplitude of the first peak of the ACF (ϕ1). Finally, valence recognition showed acceptable regression coefficients with the mean values of the width of the initial ACF peak around the origin (WΦ(0)).
Keywords: r-ACF, Music emotion recognition.
1 Introduction

Music emotion recognition (MER) is a technique utilized nowadays in a growing number of devices and software programs [1]. One of the main trends in this field is to estimate the perceived emotion from objective parameters and to relate them to the results of subjective tests for machine learning [1,2]. According to previous studies by psychologists [3], the perceived emotion is less influenced by the situational and personal factors of listening than the felt (or evoked) emotion. Additionally, Thayer's two-dimensional arousal-valence emotion plane [4] is a typical representation for musical emotion classification used in the literature [1,5] and in existing mobile applications. While arousal refers to bodily intensity, with quiescent or sleepy emotions at its lower values and excitement or annoyance at the higher values, valence is related to positive or negative emotions.

Nowadays, a large set of audio features is available regarding frequency and temporal characteristics [1,6,7]. Music signals have been analysed with running autocorrelation function (r-ACF) parameters to relate them to the perceived emotions indirectly [8,9]. Some r-ACF parameters were related to timbre or loudness [8], which have a known connection with valence and arousal [1]. Although the r-ACF has been used to characterize sound signals, little research has related its parameters directly to MER [1,5]. Therefore, this research aims to use r-ACF parameters to classify musical emotion. A ranking-based approach [10] was used to evaluate the correlation between objective magnitudes and the two dimensions of the valence-arousal emotion plane; it requires less cognitive effort from the listener while delivering representative results.
2 Procedure

2.1 Source signals
Eight non-classical music excerpts were used for the test, covering a wide variety of musical genres (Table 1). Classical music was avoided in favour of popular music, which is expected to be closer to the subjects' everyday listening. The number of samples relates directly to the subjective evaluation procedure explained in Section 2.3. To limit the influence of stereo image and loudness on the judgments, all audio files with CD quality (44.1 kHz, 16 bit) were converted to monaural signals by mixing the two channels at equal level and then peak-normalized to 0 dBFS (dB full scale). To keep the test representative and to avoid emotion fluctuations within a stimulus, only 10 to 20 s of each song were extracted, taking the most representative part of each track, which in most cases was the chorus. This also helped to shorten the test duration.
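As an illustration, the preprocessing chain described above (equal-level mono downmix, peak normalization to 0 dBFS, excerpt extraction) can be sketched as follows. This is a minimal sketch, not the authors' actual code; the use of soundfile/numpy, the file name, and the start time are assumptions.

```python
import numpy as np
import soundfile as sf  # assumed I/O library; any WAV reader would do

def prepare_excerpt(path, start_s, dur_s):
    """Downmix to mono, peak-normalize to 0 dBFS and cut an excerpt."""
    x, fs = sf.read(path)               # CD quality: fs = 44100 Hz, 16 bit
    if x.ndim == 2:                     # stereo: mix both channels at equal level
        x = 0.5 * (x[:, 0] + x[:, 1])
    x = x / np.max(np.abs(x))           # peak normalization to 0 dBFS
    i0 = int(start_s * fs)
    return x[i0:i0 + int(dur_s * fs)], fs

# e.g. a 12 s chorus excerpt of song 1 (file name and start time hypothetical)
excerpt, fs = prepare_excerpt("song1.wav", 60.0, 12.0)
```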
2.2 Stimuli presentation
The subjective test was conducted in a quiet room whose background noise was expected to be below 50 dBA according to community noise guidelines [11]. SONY MDR-7506 headphones and an M-Audio Fast Track Pro audio interface were used to present the sound stimuli. The output level was set by each listener so as to be comfortable while avoiding disturbance from background noise, and it was kept constant throughout the test.
Table 1: Eight songs selected for the test.

| N° | Title | Performer | Genre | Duration [s] |
|----|-------|-----------|-------|--------------|
| 1 | Antiguo Reloj de Cobre | Osvaldo Pugliese | Tango | 12 |
| 2 | Children's Song | Chick Corea | Jazz | 13 |
| 3 | Pickpocket | At The Drive-In | Punk Rock | 11 |
| 4 | Unforgiven | Beck | Rock | 17 |
| 5 | Como la cigarra | Mercedes Sosa | Argentinian Folklore | 18 |
| 6 | Dear 'Ol Dad | Blind Melon | Rock | 15 |
| 7 | It Might As Well Be Spring | Stan Getz, João and Astrud Gilberto | Bossa Nova / Jazz | 16 |
| 8 | Your Company | Marketa Irglova | Folk Rock / Pop | 17 |

2.3 Subjective test
A total of 16 subjects (20-40 years old) without hearing problems participated in the test. Seven of them had been playing an instrument for a minimum of two years and were considered musically skilled; the other nine did not have any musical training. The main question of the test was about the perceived valence and arousal. Therefore, prior to the test, the difference between perceived and felt emotion, as well as the concepts of arousal and valence, was fully explained to the subjects. The evaluation was made in a ranking-based manner, as used by Yang and Chen (Figure 1) [10]. It consists of a total of 7 playoff pairs (N − 1, where N = 8 music signals) for each subjective parameter. The eight music signals were presented randomly and each pair was played only once for each listener. The overall duration of each test was less than 10 min.
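Since Figure 1 describes the procedure as playoff matches feeding a resulting matrix, one plausible reading of the N − 1 scheme is a single-elimination bracket (4 + 2 + 1 = 7 matches for 8 songs). The sketch below follows that reading only as an illustration; the exact bracket used by Yang and Chen may differ, and the `prefer` callback is a hypothetical stand-in for the listener's judgment.

```python
import random

def run_playoff(songs, prefer):
    """Single-elimination bracket: N songs -> N - 1 pairwise matches.

    prefer(a, b) returns whichever song the listener ranks higher on the
    current dimension (valence or arousal). The result is a preference
    matrix with wins[a][b] = 1 when a beat b in a match.
    """
    wins = {a: {b: 0 for b in songs} for a in songs}
    round_ = random.sample(songs, len(songs))    # random initial pairing
    while len(round_) > 1:
        next_round = []
        for a, b in zip(round_[::2], round_[1::2]):
            w = prefer(a, b)                     # each pair is played only once
            wins[w][b if w == a else a] = 1
            next_round.append(w)
        round_ = next_round
    return wins

# toy run with a hypothetical judgment (higher song index "wins")
matrix = run_playoff(list(range(1, 9)), prefer=max)
```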
Figure 1: Ranking-based emotion tournament [10]. Playoff matches (a) and resulting matrix (b). Source: (Yang and Chen, 2011).
3 Objective measurements

3.1 r-ACF analysis
Acoustic parameters were calculated using a self-developed code according to [12]. The r-ACF parameters evaluated were: τ1, the delay time associated with the first major peak of the ACF; ϕ1, the amplitude of that peak; Φ(0), the sound energy at the time origin of the ACF; WΦ(0), the width of the ACF around the origin of the delay time; and τe, the effective duration of the ACF, defined as the delay at which the envelope of the normalized ACF decays to 10% of its initial value (Figure 2).
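A minimal sketch of how these parameters can be extracted from a single analysis frame is given below, following the standard definitions in [12]. The peak picking and the regression used for τe are simplifications for illustration, not the authors' actual implementation.

```python
import numpy as np
from scipy.signal import argrelmax  # local maxima of the normalized ACF

def racf_parameters(frame, fs):
    """Return Phi(0), phi1, W_Phi(0), tau1 and tau_e for one frame."""
    n = len(frame)
    acf = np.correlate(frame, frame, mode="full")[n - 1:]   # one-sided ACF
    phi0 = acf[0]                        # Phi(0): signal energy at zero delay
    nacf = acf / phi0                    # normalized ACF, nacf[0] = 1

    # W_Phi(0): twice the first delay at which the normalized ACF drops to 0.5
    below = np.where(nacf < 0.5)[0]
    w_phi0 = 2e3 * below[0] / fs if below.size else np.nan   # [ms]

    # tau1, phi1: delay and amplitude of the maximum peak after the origin
    peaks = argrelmax(nacf)[0]
    peaks = peaks[nacf[peaks] > 0]       # keep positive maxima only
    k = peaks[np.argmax(nacf[peaks])]
    tau1, phi1 = 1e3 * k / fs, nacf[k]   # tau1 in [ms]

    # tau_e: delay at which the ACF envelope decays to 0.1 (-10 dB), from a
    # linear regression on the peak amplitudes in logarithmic scale (Fig. 2b)
    db = 10 * np.log10(nacf[peaks])
    slope, intercept = np.polyfit(peaks / fs, db, 1)
    tau_e = 1e3 * (-10 - intercept) / slope                  # [ms]

    return phi0, phi1, w_phi0, tau1, tau_e
```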
Figure 2: Definitions of τ1, ϕ1 and WΦ(0)/2 for the normalized ACF (a) and the effective duration, τe, for the normalized ACF in logarithmic scale (b). Source: (Ando, 2009).

The mean and standard deviation of each parameter were obtained to correlate with the arousal and valence responses (Table 2). Although the deviation is large in some cases, a clear tendency can still be observed.

Table 2: Mean and standard deviation of the r-ACF parameters for the eight songs.
| Song | Relative Φ(0) [dB] | ϕ1 | WΦ(0) [ms] | τ1 [ms] | τe [ms] |
|------|--------------------|----|------------|---------|---------|
| 1 | -10.0 ± 2.32 | 0.44 ± 0.07 | 0.66 ± 0.18 | 14.0 ± 10.8 | 158 ± 50.8 |
| 2 | -7.1 ± 0.83 | 0.72 ± 0.07 | 0.66 ± 0.10 | 29.4 ± 10.6 | 446 ± 176.5 |
| 3 | -4.1 ± 0.20 | 0.38 ± 0.06 | 0.42 ± 0.16 | 0.6 ± 0.2 | 42 ± 14.1 |
| 4 | 0.0 ± 0.37 | 0.76 ± 0.06 | 2.83 ± 1.28 | 32.4 ± 25.7 | 586 ± 593.1 |
| 5 | -8.8 ± 2.56 | 0.53 ± 0.15 | 1.09 ± 0.29 | 20.4 ± 13.1 | 187 ± 117.1 |
| 6 | -5.8 ± 0.52 | 0.34 ± 0.19 | 0.54 ± 0.21 | 5.4 ± 10.7 | 44 ± 30.1 |
| 7 | -10.4 ± 1.78 | 0.71 ± 0.09 | 1.40 ± 0.61 | 25.2 ± 12.0 | 485 ± 36.9 |
| 8 | -9.9 ± 1.67 | 0.60 ± 0.13 | 1.22 ± 0.30 | 35.1 ± 31.9 | 437 ± 325.1 |
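For reference, the aggregation behind Table 2 is straightforward once the running analysis yields one parameter set per frame. The sketch below reuses `racf_parameters` from the earlier sketch; the frame length and hop size are assumptions, since the paper does not state them.

```python
import numpy as np

def running_stats(signal, fs, frame_s=0.5, hop_s=0.25):
    """Mean and standard deviation of the r-ACF parameters over frames."""
    size, hop = int(frame_s * fs), int(hop_s * fs)
    values = np.array([racf_parameters(signal[i:i + size], fs)
                       for i in range(0, len(signal) - size + 1, hop)])
    return values.mean(axis=0), values.std(axis=0)   # one entry per parameter
```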
Correlations among the ACF parameters were then calculated to identify whether different parameters carry similar information. Table 3 shows a clear relation between several of them. The strongest correlations were found between τ1 and τe and between each of these and ϕ1 (p < 0.01); a high and significant correlation was also observed between WΦ(0) and τe (p < 0.05).
Table 3: Correlation analysis among the acoustical parameters.

|        | Φ(0)  | ϕ1     | WΦ(0) | τ1     | τe |
|--------|-------|--------|-------|--------|----|
| Φ(0)   | –     |        |       |        |    |
| ϕ1     | 0.10  | –      |       |        |    |
| WΦ(0)  | 0.49  | 0.70   | –     |        |    |
| τ1     | -0.07 | 0.89** | 0.65  | –      |    |
| τe     | 0.12  | 0.97** | 0.76* | 0.92** | –  |

*p < 0.05, **p < 0.01
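The structure of Table 3 can be reproduced with a short script over the per-song mean values of Table 2. The use of scipy's pearsonr is an assumption about the tooling, and the resulting r values may differ slightly from Table 3, which is based on the full measured data.

```python
import numpy as np
from scipy.stats import pearsonr

# per-song means from Table 2: Phi(0) [dB], phi1, W_Phi(0) [ms], tau1 [ms], tau_e [ms]
params = np.array([
    [-10.0, 0.44, 0.66, 14.0, 158.0],
    [ -7.1, 0.72, 0.66, 29.4, 446.0],
    [ -4.1, 0.38, 0.42,  0.6,  42.0],
    [  0.0, 0.76, 2.83, 32.4, 586.0],
    [ -8.8, 0.53, 1.09, 20.4, 187.0],
    [ -5.8, 0.34, 0.54,  5.4,  44.0],
    [-10.4, 0.71, 1.40, 25.2, 485.0],
    [ -9.9, 0.60, 1.22, 35.1, 437.0],
])
names = ["Phi(0)", "phi1", "W_Phi(0)", "tau1", "tau_e"]

# pairwise Pearson correlations with significance markers as in Table 3
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r, p = pearsonr(params[:, i], params[:, j])
        stars = "**" if p < 0.01 else "*" if p < 0.05 else ""
        print(f"{names[i]:>8} vs {names[j]:<8} r = {r:+.2f}{stars}")
```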