Lyrics-based Emotion Classification using Feature Selection by Partial Syntactic Analysis
Minho Kim, Hyuk-Chul Kwon
Dept. of Computer Science, Pusan National University, Busan, Republic of Korea
e-mail: {karma, hckwon}@pusan.ac.kr
Abstract—Songs feel emotionally different to listeners depending on their lyrical content, even when melodies are similar. Existing music emotion classification methods, which rely on melody-related features such as tempo, rhythm, tune, and musical note, therefore have difficulty classifying such emotions accurately. This paper proposes a method for lyrics-based emotion classification using feature selection by partial syntactic analysis. Based on an existing emotion ontology, four kinds of syntactic analysis rules were applied to extract emotion features from lyrics. The precision and recall rates of the emotion feature extraction were 73% and 70%, respectively. The extracted emotion features were used with the NB, HMM, and SVM machine learning methods, yielding a maximum accuracy of 58.8%.
Keywords-emotion classification, lyrics, feature selection, emotion ontology, text mining
I. INTRODUCTION
The production of digital sound sources is increasing with the development of computers and Internet technology, and the introduction of smart phones has further enhanced access to music. Many companies are endeavoring to take financial advantage of this phenomenon by offering their customers music through such phones. Existing music search methods have expanded from searches for basic information on music to recommendation of music desired by users. Music information searches bundle genre, title, lyrics, singer, and album name into metadata, over which an index is then composed. Methods for recommending music include ranking by popularity, ordering by recency of registration, and user-defined albums in which users choose music in advance. Recently, methods of recommending music according to the particular atmospheres and emotions preferred by users have become popular. This type of system requires extensive classification of music: when music is classified according to emotions such as pleasure, sadness, and love, and genres such as classical and jazz, it is possible to recommend music suitably tailored to users' needs and situations [1]. Classification of musical genres has long been studied, but classification of music-derived emotions has interested researchers only recently [2][3][4]. Previous inquiries into the classification of emotion have utilized melody-related features such as tempo, beat, tune, musical note, and rhythm. However, lyrics also are important: songs affect listeners differently depending on their lyrical content, even where melodies are similar [5]. Lyrics, indeed, can be considered to play the role of a song's emotion conveyor. Clearly, there is a research lacuna as regards lyrics-based emotion classification.

Lyrics-based emotion classification is text-based. Methods of inferring emotions from texts are being researched by means such as machine learning and dictionary-based strategies, for particular domains including newspapers, blogs, movies, and conversations. For example, research on emotion ontology, which defines methods of emotion classification, comprehends emotion vocabulary, emotion specifications, and their characteristic values.

This paper, presenting the results of a study on lyrics-based emotion classification, is organized as follows. Section 2 examines the domestic and foreign emotion models researched heretofore and introduces the emotion ontology chosen for the present investigation. Section 3 describes the use of that emotion ontology, the methods of extracting emotion features, and the evaluation of those features' performance. Section 4 evaluates the performance of the models used in this study, and Section 5 draws conclusions.
II. RELATED WORKS
Classification of music emotions first requires the choice of an emotion model. Considerations in that regard include how accurately the model represents emotions and how well those emotions reflect actual music. Most of the existing research on music emotion classification has utilized the Thayer model [6] or the Tellegen-Watson-Clark model [7]. The Thayer model is simple and efficient, with two axes representing stress and energy; accordingly, the model expresses listeners' feelings of music's strength, tune, and tempo two-dimensionally. Research on text-based treatment of emotion has been conducted through classification of emotion polarity (positive, negative, or neutral) and of more specific emotions over various corpora. For emotion polarity, there is classification using machine learning methods such as Naive Bayes and SVM [8][9] applied to
blogs, newspapers, movie reviews, and product reviews; classification using dictionaries such as WordNet [10][11]; and classification of polarity using natural language processing. As for research on more specific emotion classification, work on 9 to 12 emotions in emotion-rich conversation-type corpora using machine learning is being conducted [12][13]. For enhanced classification performance, research on the treatment of negative words and word-weighting methods [14], and on effective feature selection [15], is ongoing. These approaches use machine learning with feature groups such as vocabulary, a priori emotion vocabulary, and context features. Use of these features has the disadvantage that satisfactory performance requires learning from large volumes of data, and in natural language processing the treatment of negative words is limited. As for language resources, there is SentiWordNet, established by tagging three values (positive, negative, and neutral) onto the word senses of WordNet 2.0 for English [16]. However, music emotion classification requires more specific emotions than positive, negative, and neutral tags. For Korean, no such language resource had been established until, recently, Yoon & Kwon introduced a text-based emotion ontology that defines an emotion classification framework and an emotion library [17]. The present study extracted emotions through the application of syntactic analysis rules and classified lyrics on that basis according to Yoon & Kwon's emotion ontology.
Figure 1. Yoon & Kwon’s emotion ontology
III. EMOTION FEATURE EXTRACTION UTILIZING EMOTION ONTOLOGY
Yoon & Kwon's emotion ontology includes an emotion classification framework and an emotion expression library. The emotion classification framework is a classification system defined to make possible the logical analysis of human emotion. The emotion expression library is a database established by analyzing and tagging description-type and conversation-type texts using the emotion classification framework. Yoon & Kwon built their text-based ontology on existing emotion research: the model of Plutchik [18] was adopted, but after problem analysis and changes to emotion names and intensity classification, eight basic emotions and 17 combination emotions were redefined, with five intensity levels for the basic emotions and three intensity levels for the combination emotions. The attributes of an emotion include its polarity (positive, negative, or neutral), the subject of the emotion (speaker or listener), the type of skill (linguistic or non-linguistic), and the corresponding method (descriptive, expressional, or iconic). With six classifications of skill types and methods, 25 emotion specifications, intensity, and polarity attributes, emotions shown in texts can be classified clearly by experiencer, subject of skill and method, and linguistic attribute. This design, appropriate for a multimodal environment, makes the ontology suitable as a text-based emotion classification framework.

For the classification of lyrics, Yoon & Kwon's framework, which is appropriate for extracting emotions from text and representing textually expressed emotions, was used. The present study enhanced the accuracy of emotion extraction by applying rules governing the linguistic characteristics of the Korean language. Controlling canonical-form emotion extraction by means of such rules can reduce extraction errors for expressions whose emotions are removed or changed by the surrounding lexical context. Three hundred (300) songs were randomly collected, and changes of emotions according to the relations between emotion vocabularies and contextual vocabularies in the lyrics were observed. On that basis, four kinds of syntactic analysis rules were formulated, as follows.

Negative word combination: Negative sentences are formed with negative adverbs such as 'ani (not)' or 'mot (not)', or with declinable words such as 'anida', 'anihada (anda)', 'mothada', or 'malda'. Negative sentences can be interpreted in many ways, and simply assigning the opposite emotion specification is risky.

Time of emotions: In Korean, the present, past, and future tenses are expressed through the prefinal endings '-neun-', '-eot-', and '-get-', respectively. Emotion expression vocabularies used with '-neun-', indicating the present tense, can express the emotion of the present speaker, but '-eot-', indicating the past tense, or '-get-', indicating the future tense (guess, will, or expectation), can incur emotion-expression errors.

Emotion condition change: When emotion expression vocabularies are combined with verbs of changing emotion condition, such as 'jinanda (pass)', 'tteonada (leave)', and 'sarajida (disappear)', in sentence structures that would otherwise express the present emotion, emotion-expression errors can result, because such sentences show the emotion change of the present speaker. Twenty-five such verbs of emotion-condition change were defined, and when they appeared together with emotion vocabularies, emotion extraction was restricted.

Interrogative sentence: When a lyric line forms an interrogative sentence, it does not express the emotion of the speaker but rather asks about the emotion, and thus requires confirmation.
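As an illustration only, the four rules above can be realized as simple filters over morphologically analyzed lyric lines. The Python sketch below assumes a hypothetical candidate structure (an emotion-bearing word, its emotion specification from the library, and the morphemes of its clause); the helper names, data layout, and morpheme inventories are illustrative, not the implementation used in this study.

```python
# Minimal sketch of rule-based control of emotion extraction (illustrative only).
# Each candidate is assumed to carry the emotion-bearing word, the emotion specification
# assigned from the ontology library, and the morphemes of its surrounding clause.

NEGATION = {"ani", "mot", "anida", "anihada", "mothada", "malda"}   # rule 1: negative words
NON_PRESENT_ENDINGS = {"-eot-", "-get-"}                            # rule 2: past / future tense
CHANGE_VERBS = {"jinanda", "tteonada", "sarajida"}                  # rule 3: subset of the 25 verbs

def control_extraction(candidates):
    """Mark emotion candidates as suppressed according to the four syntactic rules."""
    controlled = []
    for cand in candidates:
        morphs = set(cand["clause_morphemes"])
        suppressed = (
            morphs & NEGATION               # negative word combination
            or morphs & NON_PRESENT_ENDINGS # tense other than the present
            or morphs & CHANGE_VERBS        # emotion-condition change
            or cand.get("interrogative")    # interrogative sentence
        )
        # Suppressed emotions are kept separately rather than discarded, because they are
        # later reused as "rule-controlled" learning features (see Table VII).
        controlled.append({**cand, "suppressed": bool(suppressed)})
    return controlled

# Hypothetical analysis of one lyric line:
line = [{"word": "saranghaetda", "emotion": "LOVE",
         "clause_morphemes": ["sarang", "-ha-", "-eot-", "-da"], "interrogative": False}]
print(control_extraction(line))  # LOVE is suppressed because of the past-tense ending '-eot-'
```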
To use emotion features for music emotion classification, the accuracy of emotion extraction is important, so the performance of emotion-feature extraction was evaluated. A comparison was made before and after the application of the rules reflecting syntactic characteristics, to show the efficiency of the rules, and a test on external data shows their generality. The test corpus consisted of corpus A (the lyrics of 300 songs) and corpus B (the lyrics of 225 songs); emotions in the corpora were tagged to compose the test data. Overall, 2,714 emotions were extracted from the 300 songs of corpus A, and another 2,344 from the 225 songs of corpus B.

Table I shows that when only the surface-form emotion library was used, the precision rate was high but the recall rate was low. Next, the emotion libraries of the surface and canonical forms were used together, and the emotion extraction performances were compared.

TABLE I. PRECISION/RECALL RATE COMPARISON FOR CORPUS A (EMOTION LIBRARY)
          | Use of form | Use of canonical form | Use of form + canonical form
Precision | 72.38%      | 48.16%                | 62.46%
Recall    | 24.43%      | 65.96%                | 70.08%

TABLE II. PRECISION/RECALL COMPARISON FOR CORPUS A BEFORE AND AFTER APPLICATION OF SYNTACTIC ANALYSIS RULE
          | Before application of syntactic analysis rule | After application of syntactic analysis rule
Precision | 62.46% | 72.96%
Recall    | 70.08% | 70.01%

Table II shows that there was an improvement of 10.5% in the precision rate through the application of the syntactic analysis rules. Table III shows a comparison of the precision/recall rates before and after the application of the rules to corpus B.

TABLE III. PRECISION/RECALL COMPARISON FOR CORPUS B BEFORE AND AFTER APPLICATION OF SYNTACTIC ANALYSIS RULE
          | Before application of syntactic analysis rule | After application of syntactic analysis rule
Precision | 62.46% | 72.96%
Recall    | 70.08% | 70.01%

In the results, there was an improvement of about 7% in the precision rate, and the recall rate was not significantly lowered. This shows that the rules reflecting syntactic characteristics were effective. The extracted emotion features were then used as learning features in machine learning for the lyrics-based classification of emotions, as described in the following section.

IV. EMOTION CLASSIFICATION USING EMOTION FEATURES

The emotion features extracted from the lyrics were used to classify the emotions of the lyrics. The corpus used in the experiment was established by having taggers assign representative emotions to the texts of 525 randomly selected Korean-language songs. Because the same song can elicit different feelings from different listeners, three persons participated in the tagging; Table IV summarizes the result.

TABLE IV. NUMBER OF SONGS IN EMOTION TAGGING
             | Correspondence of three persons | Correspondence of two or more persons | Tagging of different emotions
No. of songs | 213                             | 425                                   | 100

All three persons tagged the same emotion for 213 songs (41%), and two or more persons agreed for 425 songs (81%). To ensure the reliability of the experimental corpus, the 425 songs on which two or more persons agreed were used as the experimental corpus; Table V shows its composition. Use of Yoon & Kwon's emotion ontology has the advantage that emotions can be grouped (through emotion groups and emotion mapping) when broader classifications are needed. For the 425 songs on which two or more persons agreed, the corresponding emotions, the emotion information of the remaining person, and information locating the emotions on the emotion palette can be used to distinguish among them. Therefore, as shown in Figure 2, the emotions were divided into eight groups.

The commonly employed machine learning methods include NB (Naive Bayes), SVM (Support Vector Machine), and HMM (Hidden Markov Model). NB is the most representative probability model and assumes strong independence among learning features. SVM finds the hyperplane that best separates the data and is accordingly used frequently in face recognition and emotion classification. Moon & Zhang [13] used HMM to analyze multiple emotions expressed in one sentence, whereas the present research used HMM to reflect information on time flow: HMM determines its output from states and state-transition probabilities, and this can reflect the temporal flow of emotions. For the evaluation index, accuracy measured through 10-fold cross-validation was used, defined as the ratio of songs whose emotion was classified correctly to the total number of songs.
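As a rough sketch of this evaluation setup, the Python fragment below runs 10-fold cross-validation with NB and SVM over per-song feature dictionaries. It assumes scikit-learn, and the feature names and synthetic data are invented for illustration; the real features come from the ontology-based extraction and the 425 tagged songs.

```python
# Sketch of 10-fold cross-validation accuracy measurement (illustrative only).
import random
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

random.seed(0)
EMOTIONS = ["LOVE", "SADNESS", "REMORSE", "OPTIMISM"]

def synthetic_song(label):
    # Hypothetical per-song features: counts of extracted emotion specifications
    # plus a polarity attribute, standing in for the ontology-derived features.
    feats = {e: random.randint(0, 2) for e in EMOTIONS}
    feats[label] += random.randint(2, 4)                  # make the representative emotion dominant
    feats["polarity_neg"] = feats["SADNESS"] + feats["REMORSE"]
    return feats

labels = EMOTIONS * 25                                    # stand-in for the 425 tagged songs
X = DictVectorizer(sparse=False).fit_transform([synthetic_song(l) for l in labels])

for name, clf in [("NB", MultinomialNB()), ("SVM", LinearSVC())]:
    scores = cross_val_score(clf, X, labels, cv=10, scoring="accuracy")  # 10-fold CV accuracy
    print(f"{name}: {scores.mean():.3f}")
```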
TABLE V. NUMBER OF SONGS DEPENDING ON REPRESENTATIVE EMOTIONS OF LYRICS (425 SONGS)
No. | Representative emotion | No. of songs
1   | LOVE            | 106
2   | REMORSE         | 99
3   | SADNESS         | 83
4   | OPTIMISM        | 53
5   | DISAPPOINTMENT  | 22
6   | JOY             | 18
7   | TRUST           | 7
8   | PRIDE           | 6
9   | ANGER           | 6
10  | ANXIETY         | 4
11  | ANTICIPATION    | 4
12  | GRATEFULNESS    | 3
13  | BOREDOM         | 3
14  | PESSIMISM       | 2
15  | SENTIMENTALITY  | 2
16  | CURIOSITY       | 2
17  | FEAR            | 2
18  | GUILT           | 1
19  | CYNICISM        | 1
20  | CONTEMPT        | 1
    | Total           | 425

Figure 2. Emotion groups (the emotion specifications divided into eight groups)
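The grouping step itself can be sketched as a simple mapping from emotion specifications to group labels. The assignment below is purely hypothetical (the actual eight-group memberships are those given in Figure 2); it only illustrates how the 25 specifications are collapsed into coarser classes before classification.

```python
# Hypothetical mapping from emotion specifications to coarser groups (illustrative only;
# the actual eight-group assignment is the one defined in Figure 2 of the paper).
EMOTION_GROUP = {
    "LOVE": "Group 1", "TRUST": "Group 1",
    "SADNESS": "Group 2", "REMORSE": "Group 2", "DISAPPOINTMENT": "Group 2",
    "JOY": "Group 3", "OPTIMISM": "Group 3",
    "ANGER": "Group 4", "CONTEMPT": "Group 4",
    # ... remaining specifications assigned to Groups 5-8
}

def to_group(representative_emotion):
    """Collapse a representative emotion into its group label (unknowns kept as-is)."""
    return EMOTION_GROUP.get(representative_emotion, representative_emotion)

print(to_group("REMORSE"))   # -> "Group 2" under this illustrative mapping
```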
For comparison, the experiment used as learning features both the vocabularies employed in existing research and the emotions extracted with the emotion ontology suggested in this study. When vocabularies were selected as learning features, the emotion vocabularies suggested in [16] were used. For the extracted emotions, the attribute information used consisted of emotion specification values and emotion polarity values. When extracted emotions were used as features, the cases before and after the application of the rules reflecting syntactic characteristics were distinguished; the results are shown in Table VI.

TABLE VI. ACCURACY COMPARISON DEPENDING ON LEARNING METHOD AND FEATURE SELECTION
Learning method | Emotion vocabulary | Extracted emotion (before application of syntactic analysis rule) | Extracted emotion (after application of syntactic analysis rule)
NB  | 35.8% | 38.8% | 44.2%
SVM | 32.9% | 48.7% | 53.6%
HMM | 33.9% | 43.6% | 52.2%

As shown in Table VI, when emotion vocabularies were used as features, the performance was 32.9%~35.8% depending on the learning method, not much higher than 25%, the result obtained when all songs are classified as LOVE, the most frequent emotion. The reason is that the learning data, 425 songs, was very small compared with the number of emotion vocabularies, about 2,000. On the other hand, when the extracted emotions were used as features, the performance was 38.8%~48.7% before and 44.2%~53.6% after the application of the rules reflecting syntactic characteristics, indicating that a small amount of learning data can still yield high performance.

HMM showed higher performance than NB, the BOW (bag of words)-based probability model, and the analysis indicates that the time flow of emotions in the lyrics was significant. That is, listeners feel differently according to the order of the expressed emotions: the experimental corpus showed a tendency for the emotion that appears later in a song to be the one felt strongly. HMM can capture this information on emotion flow. Among the machine learning methods, SVM showed the highest performance because, compared with NB and HMM, it is less prone to overfitting.
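To make the time-flow argument concrete, the sketch below scores a song's sequence of extracted emotions under a small discrete HMM using the forward algorithm; one such model per class (representative emotion) could be trained, with the most likely model chosen at test time. The two-state topology and all probabilities are invented for illustration; in practice the transition and emission parameters would be estimated from the training lyrics.

```python
# Minimal discrete-HMM forward algorithm over a song's emotion sequence (illustrative only).

def forward_likelihood(obs, start, trans, emit):
    """P(obs) under a discrete HMM given start, transition, and emission probabilities."""
    alpha = {s: start[s] * emit[s].get(obs[0], 1e-6) for s in start}
    for o in obs[1:]:
        alpha = {
            s2: sum(alpha[s1] * trans[s1][s2] for s1 in alpha) * emit[s2].get(o, 1e-6)
            for s2 in start
        }
    return sum(alpha.values())

# Toy two-state model for a hypothetical "SADNESS" class: states loosely for early/late song parts.
start = {"early": 0.7, "late": 0.3}
trans = {"early": {"early": 0.6, "late": 0.4}, "late": {"early": 0.1, "late": 0.9}}
emit = {
    "early": {"LOVE": 0.5, "SADNESS": 0.3, "REMORSE": 0.2},
    "late":  {"LOVE": 0.1, "SADNESS": 0.6, "REMORSE": 0.3},
}

# A song whose later emotions are sadder scores higher under this model, reflecting
# the observation that the later-appearing emotion tends to dominate.
print(forward_likelihood(["LOVE", "SADNESS", "SADNESS"], start, trans, emit))
```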
As noted above, when emotion extraction is controlled by the rules reflecting syntactic characteristics, the suppressed information can itself carry meaning. For example, when the canonical forms of 'love' and 'sadness' appear in one song together with many negative words, past-tense endings, and verbs of emotion change, there is a significant possibility that feelings remote from love and sadness will be felt. To examine this experimentally, the emotion information whose extraction was controlled by the rules was added as a learning feature; the results are shown in Table VII.

TABLE VII. SYSTEM ACCURACY WITH ADDED EMOTION FEATURE CONTROLLED BY SYNTACTIC ANALYSIS RULE
Learning method | Emotion controlled by rule added as learning feature
NB  | 45.1%
SVM | 58.8%
HMM | 54.2%

Compared with the case before the addition of the rule-controlled emotion feature, there was a performance improvement of 1%~5%, indicating that effective learning features can be derived from the extraction rules. The proposed emotion classification system was also compared with that of Hu and Downie [19], although such a comparison is only indirect because the systems differ in their emotion classification schemes, social environments, and answer-corpus construction. Table VIII shows that the system of Hu and Downie used 2,829 songs to classify emotions into 18 kinds, with an accuracy of 60.4%. The proposed system used data from 425 songs along with the proposed learning features, and for the 8-group and 25-specification classifications the accuracy
rates were 58.8% and 53.6%, respectively, indicating good performance for a small amount of data.
TABLE VIII. PERFORMANCE COMPARISON WITH THE SYSTEM OF HU AND DOWNIE
Classification                    | Hu and Downie | Proposed system
Number of songs                   | 2,829 songs   | 425 songs
Learning model                    | SVM           | SVM
Learning feature                  | vocabulary    | emotion
Number of emotion classifications | 18 kinds      | 8 kinds / 25 kinds
Accuracy                          | 60.4%         | 58.8% / 53.6%

V. CONCLUSIONS AND FUTURE RESEARCH

This study used an emotion ontology and machine learning methods to investigate lyrics-based emotion classification. Unlike existing research, which has used vocabularies, the present study used emotions extracted from lyrics as learning features. For emotion extraction, an established, widely used emotion ontology was utilized, and category use was limited depending on the types of skill and the corresponding methods. Additionally, the differences between conversation-type text and lyrics were analyzed, and the emotion library was accordingly employed selectively. For enhanced extraction performance, extraction was performed with both the surface and canonical forms; when the canonical form was extracted, extraction rules based on syntactic characteristics were created and applied so that the surrounding context was considered. In the experiment using the proposed method, the performance was 58.8% (SVM) with a small amount of learning data (425 songs). This suggests that, compared with another system that used learning data composed of 2,829 songs and showed a performance of 60.4%, the proposed method is efficient.

The four proposed extraction rules yielded 73% precision and 70% recall of emotion extraction. This suggests that the extraction rules are still insufficient, and it is expected that supplementing the rules will improve the performance of lyric emotion classification. It should also be noted that emotions can be felt differently depending on individuals' dispositions; experimentation with larger amounts of data established by participants with various characteristics is therefore important, and is a priority for future research. Various emotions are felt when listening to a song: emotions such as tranquility and vigor can be felt in the melody, while sadness and loneliness can be felt in the lyrics. Users draw on this information without distinguishing its source when searching for songs, and the proposed system needs to meet such needs. Analysis of users' queries, combined with the emotion classification of lyrics, will lead to a music recommendation system that is much more satisfactory to users.

ACKNOWLEDGMENT
This work was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MOST) (No. 2010-0028784).

REFERENCES
[1] Rainer Typke, Frans Wiering, Remco C. Veltkamp, "A survey of music information retrieval systems," Multimedia Information Retrieval, 2005.
[2] T. Li and M. Ogihara, "Detecting Emotion in Music," International Society for Music Information Retrieval, 2003.
[3] Y. Feng, Y. Zhuang, Y. Pan, "Music information retrieval by detecting mood via computational media aesthetics," IEEE/WIC International Conference on Web Intelligence, 2003.
[4] Min-Joon Yoo, Hyun-Ju Kim, In-Kwon Lee, "Music Exploring Interface using Emotional Model," Human Computer Interaction 2009, 2009.
[5] So-Young Jin, Byungchuel Choi, "The Influences of Song Lyrics for the Perceived Preferences of Music Listening," Korean Journal of Music Therapy, Vol. 8, No. 1, 2006.
[6] R. Thayer, "The biopsychology of mood and arousal," Oxford University Press, 1989.
[7] A. Tellegen, D. Watson, and L. Clark, "On the dimensional and hierarchical structure of affect," Psychological Science, 1999.
[8] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment Classification using Machine Learning Techniques," Empirical Methods in Natural Language Processing, 2002.
[9] K. Dave, S. Lawrence, D. M. Pennock, "Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews," Proceedings of the 12th International Conference on World Wide Web, 2003.
[10] WordNet, http://wordnet.princeton.edu
[11] J. Kamps, M. Marx, R. J. Mokken, and M. D. Rijke, "Using WordNet to measure semantic orientation of adjectives," Proceedings of the 4th International Conference on Language Resources and Evaluation, 2004.
[12] Sungsoo Lim, Sung-Bae Cho, "User Emotion Inference From SMS Based on Computing with Words," Proceedings of the Korea Computer Congress 2007, Vol. 34, No. 1, 2007.
[13] Hyun-Ku Moon, Byoung-Tak Zhang, "Emotional States Recognition of Text Data Using Hidden Markov Models," Proceedings of the Korean Institute of Information Scientists and Engineers, Vol. 28, No. 2, 2001.
[14] Yuchul Jung, Yoonjung Choi, Sung-Hyon Myaeng, "A Study on Negation Handling and Term Weighting Schemes and Their Effects on Mood-based Text Classification," The Korean Journal of Cognitive Science, Vol. 19, No. 4, 2008.
[15] Hongmin Park, "Determination of the Effective Feature Set and Category for the Emotion Classification of Korean Utterance," Master's Thesis, Sogang University, 2009.
[16] Bruno Ohana, Brendan Tierney, "Sentiment classification of reviews using SentiWordNet," Dublin Institute of Technology, 2009.
[17] Aesun Yoon, Hyuk-Chul Kwon, "Component Analysis for Constructing an Emotion Ontology," The Korean Journal of Cognitive Science, Vol. 21, No. 1, 2010.
[18] R. Plutchik, H. Kellerman, Emotion: Theory, Research, and Experience, Academic Press, 1980.
[19] Xiao Hu, J. Stephen Downie, Andreas F. Ehmann, "Lyric Text Mining in Music Mood Classification," Proceedings of the International Society for Music Information Retrieval, 2009.