Music Information Retrieval Using a GA-based Relevance Feedback∗

Seungmin Rho1, Eenjun Hwang2, Minkoo Kim3

1 Graduate School of Information and Communication, Ajou University, Suwon, Korea
[email protected]
2 School of Electrical Engineering, Korea University, Seoul, Korea
[email protected]
3 College of Information Technology, Ajou University, Suwon, Korea
[email protected]

Abstract
Recently, there has been increased interest in query reformulation using relevance feedback with evolutionary techniques such as genetic algorithms for multimedia information retrieval. However, these techniques have not yet been widely exploited in the field of music retrieval. In this paper, we propose a novel music retrieval scheme that incorporates user relevance feedback with a genetic algorithm to improve retrieval performance, and we develop a prototype system based on it. Our system also provides easy-to-use graphical user interfaces; for example, users can browse and play query results easily using markers that indicate the parts of the music matched by the query. Through various experiments, we show the effectiveness and efficiency of the proposed scheme.
1. Introduction

With the explosive growth of digital music and audio content, efficient retrieval of such data is attracting more and more attention, especially in large-scale multimedia database applications. In the past, music information retrieval was based on textual metadata such as titles, composers, singers, or lyrics. However, metadata-based schemes and techniques for music retrieval have shown limitations for various reasons, such as users' incomplete knowledge or personal bias. Compared with traditional keyword-based music retrieval, content-based music retrieval provides more flexibility and expressiveness for retrieving audio and music data.

Content-based music retrieval is usually based on a set of extracted music features such as pitch, duration, and rhythm. QBH (Query by Humming) is a popular content-based retrieval method for large music databases. A QBH system takes a user's acoustic input (a short clip of singing, whistling, or humming) via a microphone and retrieves the desired song from a music database. It is very useful when a user wants to find a song or music segment but has forgotten its title or the name of the singer.

One common approach to content-based music retrieval is to represent music as a string of characters using three possible values for the pitch change: U(p), D(own), and S(ame) or R(epeat). In our previous work [1, 2], we discussed the limitations of the existing UDR notation based on the pitch contour and proposed alternative notations, uUdDr and LSR, to overcome those restrictions. Given the large amount of music data, a music retrieval system should also consider response time and the storage space needed for indexing. Therefore, in our previous work [1, 2], we also proposed a dynamic index scheme called FAI, built from frequently queried melody tunes, for fast query processing.
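As a simple illustration (a sketch, not the actual implementation used in our system), the following Python function derives such a contour string from a sequence of MIDI note numbers:

    def pitch_contour(pitches):
        """Map each successive pitch change to U(p), D(own), or S(ame)."""
        contour = []
        for prev, curr in zip(pitches, pitches[1:]):
            if curr > prev:
                contour.append('U')
            elif curr < prev:
                contour.append('D')
            else:
                contour.append('S')
        return ''.join(contour)

    # Opening of "Twinkle, Twinkle, Little Star": C C G G A A G
    print(pitch_contour([60, 60, 67, 67, 69, 69, 67]))  # -> SUSUSD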
Relevance feedback (RF) is a well-known technique from information retrieval (IR) that reformulates a query based on the documents selected by users as relevant. In recent years, relevance feedback has been widely adopted to improve the performance of both text and multimedia information retrieval. Many RF methods have been studied in CBIR (Content-Based Image Retrieval) systems [3, 4] since such methods first emerged in text retrieval systems. However, rather few works have applied RF methods to music information retrieval to improve retrieval performance.

The Genetic Algorithm (GA) is a powerful problem-solving technique in artificial intelligence based on Darwin's theory of evolution and the principles of biological inheritance. Very few researchers have tried to use evolutionary algorithms such as genetic algorithms in the field of music information retrieval. Previous attempts [5, 6] to use GAs have focused only on automatic music composition and have not considered adapting the query melody representation.

In this paper, we present a novel music retrieval scheme that supports a GA-based relevance feedback mechanism to improve the quality of query results by reformulating the user's query. Moreover, the system provides easy-to-use querying and result browsing interfaces to improve query usability.

The rest of this paper is organized as follows. In Section 2, we present an overview of ongoing research on MIR systems. Typical issues in the system design are introduced in Section 3. Section 4 presents the query reformulation method and the genetic algorithm used in the system. In Section 5, we describe our prototype system and report some experimental results. Section 6 concludes the paper and points out future directions.

∗ This work was supported by the MIC, Korea, under the ITRC support program supervised by the IITA and by the Seoul R&BD Program. This research was also supported by the Ubiquitous Computing and Network (UCN) Project, the Ministry of Information and Communication (MIC) 21st Century Frontier R&D Program in Korea.
2. Related work

In this section, we review some of the research and systems for music information retrieval. Several prototype Music Information Retrieval (MIR) systems exist. For instance, Ghias et al. [7] developed a QBH system capable of processing an acoustic input to extract the necessary query information. However, this system used only three types of contour information to represent melodies.

The MELDEX [8] system was designed to retrieve melodies from a database using a microphone. It first transforms acoustic query melodies into music notation and then searches the database for tunes containing the hummed (or a similar) pattern. This web-based system provides several match modes, including approximate matching for interval, contour, and rhythm.

Hoashi et al. [9, 10] used relevance feedback for music retrieval based on the tree-structured vector quantization method (TreeQ) developed by Foote [11]. The TreeQ method trains a vector quantizer instead of modeling the sound data directly. Habich et al. [12] developed the Eyes4Ears music application server, which offers both keyword-based and content-based search and allows the similarity measure to be adapted to the user's intuition of the query. Nelson [13] tried to find and compose structured musical works using a GA, employing fitness functions to generate the works. Johanson [14] also composed music automatically using GP (Genetic Programming) with fitness functions.

3. Overview of the proposed system

Figure 1 shows the overall architecture of our prototype system for music retrieval.

Figure 1. Overall architecture of the proposed prototype system

The system consists of three main components: Interface, Analyzer, and GA Engine. A typical query is processed as follows:
1. The user first issues an initial query through one of four query interfaces: QBE, QBH, QBMN, or QBC. These interfaces are explained in detail in Section 5.
2. Given a query, the Analyzer interprets it as a signal or a sequence of notes and extracts audio features such as the pitch and time contours. The extracted features are then transformed into uUdDr and LSR strings.
3. For the transformed string, the FAI index is looked up first, and the music database is searched only if the index lookup fails. If a match is found in the index, its reference counter is incremented and the melody is put into the result set. If a melody is instead found in the database and the user confirms it, the query tune is inserted into the FAI for that music and its reference counter is initialized. A more detailed description of the FAI indexing scheme can be found in Rho and Hwang [2].
4. Matched melodies are displayed in order of rank on the browse interface.
5. When the user selects a melody or one of its segments as the most relevant, the GA engine generates new music segments and evaluates the fitness of each segment using our genetic algorithm. A modified query is generated from the user's relevance judgment via the feedback interface, and the whole query process is repeated until the user is satisfied.
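The lookup-then-search logic of step 3 can be sketched as follows. This is a simplified illustration; FAIIndex, music_db.search, and user_confirms are hypothetical stand-ins, not the system's actual API:

    class FAIIndex:
        """Maps frequently queried contour strings to (melody, reference counter) entries."""
        def __init__(self):
            self.entries = {}

        def lookup(self, contour):
            return self.entries.get(contour)

        def insert(self, contour, melody):
            # A newly confirmed tune enters the index with its counter initialized.
            self.entries[contour] = {'melody': melody, 'ref_count': 0}

    def process_query(contour, fai_index, music_db, user_confirms):
        """Try the FAI index first; fall back to a full database search on a miss."""
        entry = fai_index.lookup(contour)
        if entry is not None:
            entry['ref_count'] += 1        # frequently queried tunes stay indexed
            return [entry['melody']]
        melody = music_db.search(contour)  # approximate uUdDr/LSR string matching
        if melody is not None and user_confirms(melody):
            fai_index.insert(contour, melody)
        return [melody] if melody is not None else []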
4. Genetic Algorithm in Music Retrieval

As mentioned above, we have implemented a GA-based relevance feedback scheme in our system to improve retrieval performance. The algorithm starts by initializing the population and then evaluates it using a fitness function, which returns for each chromosome a value indicating its quality as a solution to the problem. We calculate the fitness value of a chromosome with the following formula:

$$\mathrm{Fitness} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{relevance}(M_i)\cdot\frac{1}{i}\sum_{j=1}^{i}\mathrm{relevance}(M_j)\right) \qquad (1)$$

where N is the total number of music objects retrieved from population P and relevance(M) is a function that returns the relevance of music object M:
$$\mathrm{relevance}(M) = \frac{1}{n}\sum_{i=1}^{n}\frac{\mathrm{QueryLength} - \mathrm{Cost}(LD_i)}{\mathrm{QueryLength}} \qquad (2)$$

where M is a music object in population P and n is the number of matched melody segments in M. The relevance value ranges from 0 to 1: a value of 1 means the music is fully relevant to the user's query, and 0 means it is not relevant at all. The Levenshtein Distance (LD) [15] measures the distance between two strings as the minimum number of deletions, insertions, or substitutions required to transform one string into the other.
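Read directly from Eqs. (1) and (2), the following Python sketch computes both quantities; it assumes the query and matched segments are uUdDr strings and that Cost(LD) never exceeds QueryLength, so each term in Eq. (2) lies in [0, 1]:

    def levenshtein(a, b):
        """Minimum number of deletions, insertions, and substitutions (Cost(LD) in Eq. 2)."""
        m, n = len(a), len(b)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + cost) # substitution
        return d[m][n]

    def relevance(query, segments):
        """Eq. (2): mean normalized edit-distance similarity over the n matched segments."""
        return sum((len(query) - levenshtein(query, s)) / len(query)
                   for s in segments) / len(segments)

    def fitness(relevances):
        """Eq. (1): relevance values of the ranked list, weighted by running precision."""
        total = sum(r * sum(relevances[:i]) / i
                    for i, r in enumerate(relevances, start=1))
        return total / len(relevances)

In this form, Eq. (1) rewards chromosomes whose highly relevant melodies appear early in the ranked result list.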
Figure 2 shows our genetic algorithm in pseudocode:

Genetic_Algorithm(query) {
    initialize population P;
    evaluate fitness(P);                       // Eq. (1) scores each chromosome
    while (not termination-condition) {
        c_genomes = select query and Gr in P;  // parents for crossover
        crossover_result = Crossover(c_genomes);
        m_genome = select Gr in P;             // genome to mutate
        mutation_result = Mutation(m_genome);
        add(P, crossover_result);
        add(P, mutation_result);
        P.sort();                              // keep P ordered by fitness
    }
}

Crossover(genomes) {
    g1 = genomes[1];
    g2 = genomes[2];
    g_locus = Sim_Func(g1, g2);                // crossover locus from string similarity
    if (1 < g_locus
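As an illustration of the crossover step, a minimal, hypothetical Python sketch of a one-point crossover on uUdDr strings, assuming Sim_Func yields a single cut position valid for both strings:

    def one_point_crossover(g1, g2, locus):
        """Swap the tails of two contour strings at the given locus."""
        return g1[:locus] + g2[locus:], g2[:locus] + g1[locus:]

    # Recombining a query genome with a retrieved genome at locus 2:
    print(one_point_crossover('uUdDr', 'UUdrr', 2))  # -> ('uUdrr', 'UUdDr')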