AUTOMATIC CONTENT-BASED HYPERMETRIC RHYTHM RETRIEVAL APPROACH
Jaroslaw Wojcik
Dmitry Zhatukhin
[email protected]
[email protected]
Wroclaw University of Technology, Department of Information Systems
Wybrzeze Wyspianskiego 27, Wroclaw 50370, Poland
ABSTRACT
The recurrence of melodic and rhythmic patterns in various representations of music is investigated, and a hybrid method of hierarchical rhythm retrieval, employing the most promising set of ranking methods, is proposed. On the basis of this approach and the authors’ former research on metric rhythm, concerning the rhythmic salience of sounds, an application called DrumAdd, which accepts a symbolic representation of music as input, is proposed. The system automatically generates a drum accompaniment to a given melody on the basis of the hypermetric hypothesis ranked first among all hypotheses. Other studies on rhythm retrieval are described with regard to their applicability to a system of automatic drum accompaniment. Details of the experimental setup and the results obtained are presented, and conclusions are drawn concerning the quality of the engineered methods.
Keywords: automatic drum accompaniment, content-based music analysis, hypermetric rhythm retrieval, automatic music indexing
1. INTRODUCTION
Early music retrieval methods were based on text indexes associated with musical files; the indexes were prepared by humans, and full-text retrieval methods were used to search within them. Currently, researchers work on content-based retrieval models, a class of methods in which characteristic features of musical objects are extracted automatically. Music can be retrieved with respect to the most informative elements of a musical piece, i.e. melody or rhythm. Melody retrieval methods already work well, and commercial systems have been successfully introduced. The authors of this paper analyze the existing rhythm retrieval approaches and propose a method for retrieving the hypermetric structure of rhythm, which consists of long-period rhythmic levels such as phrases, sentences and periods. On the basis of a formal rhythmic analysis, performed automatically on the musical content, the authors propose a prototype system that automatically creates a drum accompaniment to a given melody.
The fundamental terms in the area of musical rhythm retrieval are accent and rhythm; the problem with both notions is that they are defined in a number of ways. Lerdahl and Jackendoff, in A Generative Theory of Tonal Music [8], define three types of accent: phenomenal, structural and metrical. According to the authors of this paper, the phenomenal accent, associated with the duration of notes, is the type of accent that can be used in computational approaches retrieving the entire hierarchical structure of rhythm. This type of accent has a metrical nature, which means that it is associated with a certain rhythmic level. If a musical event is accented at a rhythmic level, it is also accented at lower rhythmic levels; this dependency, however, does not always hold in the opposite direction.
According to Fraisse [5], “a precise, generally accepted definition of rhythm does not exist”. Cooper & Meyer [2] propose a textual description of rhythm as “the way in which one or more unaccented beats are grouped in relation to an accented one”. In their proposition, each pattern consists of one or two unstressed (unaccented) beats grouped around one accented beat; the five basic rhythmic groupings are iamb, anapest, trochee, dactyl and amphibrach. The term rhythm is also commonly used to denote the type of pattern occurring most frequently in a given piece; the pattern then determines the genre of the piece, e.g. waltz or tango. Although the notion of rhythm is understood in several ways, it is always associated with the time (temporal) domain of music, just as melody is related to the frequency domain.
A sequence of equally spaced rhythmic events is called a rhythmic level by the authors of this paper. A rhythmic level is characterized by its period and by the onset of the first sound belonging to the level. The period is constant for each rhythmic level. The atomic period is the period of the rhythmic level with the smallest period in a piece. Levels are grouped in a hierarchical relation: two rhythmic levels are related if the period of one is a natural multiple of the period of the other. A set of related rhythmic levels is a rhythmic hypothesis; the highest-ranked hypothesis is assumed to be the correct rhythm of a piece.
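The definitions above can be illustrated with a minimal Python sketch. The names `RhythmicLevel`, `are_related` and `is_hypothesis` are hypothetical and not taken from the paper. The sketch assumes that periods and onsets are expressed as whole multiples of the atomic period, that "related" means one period divides the other, and that a hypothesis requires all of its levels to be pairwise related; onset alignment is not checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RhythmicLevel:
    """A sequence of equally spaced rhythmic events (hypothetical helper type)."""
    period: int  # spacing of events, in multiples of the atomic period
    onset: int   # position of the first event, in atomic periods


def are_related(a: RhythmicLevel, b: RhythmicLevel) -> bool:
    """Assumed reading: one period is a natural multiple of the other."""
    longer, shorter = max(a.period, b.period), min(a.period, b.period)
    return longer % shorter == 0


def is_hypothesis(levels: list[RhythmicLevel]) -> bool:
    """Assumption: a rhythmic hypothesis is a set of pairwise related levels."""
    return all(
        are_related(x, y)
        for i, x in enumerate(levels)
        for y in levels[i + 1:]
    )


beat = RhythmicLevel(period=4, onset=0)
bar_triple = RhythmicLevel(period=12, onset=0)  # three beats per bar
bar_duple = RhythmicLevel(period=16, onset=0)   # four beats per bar

print(are_related(beat, bar_triple))       # True: 12 = 3 * 4
print(are_related(bar_triple, bar_duple))  # False: 16 is not a multiple of 12
print(is_hypothesis([beat, bar_triple, RhythmicLevel(period=24, onset=0)]))  # True
```

Under these assumptions, a triple-meter bar level and a duple-meter bar level cannot belong to the same hypothesis, which is why competing hypotheses have to be ranked against each other.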
2. RELATED WORK
Since the notion of rhythm may be understood in a number of ways, the task of computational rhythm retrieval is formulated differently by various researchers. In its simplest form, the task is reduced to retrieving the sequence of onset times and/or durations of sounds from musical performance data; this process is called quantization. In another approach, the time signature is retrieved on the basis of musical content: methods of this class usually find the period that divides the stream of sounds into repeating fragments. This task may additionally be combined with phenomenal accent retrieval, so that the phase of the phenomenal accents in a piece is found. If the accents found line up with the locations where humans tap their foot to the melody, it may be concluded that bar lines are found, i.e. that the rhythmic level whose period equals the meter is retrieved. The next step is to retrieve the metric rhythm, i.e. the hierarchical structure of related rhythmic levels. If such a structure reaches high rhythmic levels such as phrases, sentences and periods, the hypermetric structure of rhythm is retrieved; on its basis, automatic drum accompaniment applications can be developed.
In real-life performances the onsets of sounds are not equally spaced because of the performer’s inaccuracies. Quantization rounds the inter-onset intervals (IOIs) and durations of sounds to a time grid. An example of a constant-tempo quantization method is given by Roads [3]. In the work by Parncutt [9], phenomenal accents are retrieved with the use of an exponentially thresholded function taking time, amplitude, timbre and pitch into account. The idea of oscillators, primarily used to model biological processes such as heartbeat, is also used for musical rhythm retrieval. A method of finding downbeats in rhythmic patterns with a network of nonlinear oscillators is proposed e.g. by Eck [4].
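As a rough illustration of constant-tempo quantization, the following sketch rounds performed onsets to the nearest point of a fixed time grid. It is a generic nearest-grid rounding under stated assumptions, not the specific method described in [3]; the function names `quantize` and `inter_onset_intervals`, the millisecond units and the sample onsets are invented for the example.

```python
def quantize(onsets_ms, grid_ms):
    """Round each onset to the nearest multiple of grid_ms.

    Assumes a constant tempo, so a single grid step fits the whole piece.
    Exact halfway cases follow Python's round-half-to-even behaviour.
    """
    return [round(t / grid_ms) * grid_ms for t in onsets_ms]


def inter_onset_intervals(onsets):
    """Distances between consecutive onsets (IOIs)."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


# Invented performance data: a player aiming at one note every 250 ms.
performed = [0, 248, 510, 740, 1005]
quantized = quantize(performed, grid_ms=250)

print(quantized)                         # [0, 250, 500, 750, 1000]
print(inter_onset_intervals(quantized))  # [250, 250, 250, 250]
```

After quantization the IOIs are equal, so the onsets form an equally spaced sequence that later stages can analyze.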
In the work by Brown [1], the autocorrelation function is used to determine the meter. The method analyzes the occurrences of musical events over time; the peak of the function indicates the events located on the downbeat of a measure (the location where the measure begins). The experiments were conducted on single melodic lines extracted from the score. The sounds are given weights depending linearly on their durations. The correct meter was found for sixteen out of nineteen pieces. A method of finding metrical rhythm is proposed by van Zaanen et al. [11]. In this research rhythms are represented by metrical trees, and a few memory-based models are examined with regard to how they can be adapted to parse the metrical structure of rhythm. The methods were evaluated on 16-note fragments from 105 national anthems in various meters. In experiments with the Dopdis parser, 50% of beats were found correctly.
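The autocorrelation idea can be sketched in a few lines of Python. This is a simplified, unnormalized autocorrelation written for illustration, not a reproduction of the procedure in [1]. It assumes that notes are given as (onset, duration) pairs measured in atomic periods, and that each onset is weighted by its duration. All function names and the toy melody are hypothetical.

```python
def weighted_series(notes, length):
    """Place each note's duration (its weight) at its onset; zero elsewhere."""
    series = [0.0] * length
    for onset, duration in notes:
        series[onset] += duration
    return series


def autocorrelation(series, lag):
    """Unnormalized autocorrelation: sum of series[i] * series[i + lag]."""
    return sum(a * b for a, b in zip(series, series[lag:]))


def estimate_period(notes, length, max_lag):
    """Return the lag with the highest autocorrelation and all lag scores."""
    series = weighted_series(notes, length)
    scores = {lag: autocorrelation(series, lag) for lag in range(1, max_lag + 1)}
    return max(scores, key=scores.get), scores


# Toy melody: four bars, each a half note followed by a quarter note
# (durations 2 and 1, measured in quarter notes).
notes = [(0, 2), (2, 1), (3, 2), (5, 1), (6, 2), (8, 1), (9, 2), (11, 1)]
period, scores = estimate_period(notes, length=12, max_lag=6)

print(period)     # 3
print(scores[3])  # 15.0
```

For this toy input the strongest peak falls at a lag of three quarter notes, which corresponds to a triple meter. Real pieces would need normalization and a rule for choosing between a period and its multiples.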
In the work by Goto [6], a beat tracking system is proposed that recognizes the hierarchical structure of rhythm up to the measure level. The system processes, in real time, popular music sampled from compact discs; the method also handles audio performances that contain drum tracks. Goto notes the possibility of creating an automatic drummer system on the basis of the retrieved metric rhythm. Temperley & Sleator [10], the authors of the preference-rule approach to metrical rhythm retrieval, built their model on three rules (event, length and regularity). In this method the highest rhythmic levels retrieved reach phrases or sentences (two levels above the tactus). The method engineered by the authors of this paper generates rhythmic hypotheses up to the level whose length equals half of the piece. For pieces of about twenty bars or more, motives, phrases, sentences and rhythmic periods may be found; the endings of high rhythmic levels such as sentences indicate the locations where the automatic drum accompanist inserts a fill-in.
3. HYPERMETRIC RHYTHM RETRIEVAL
The authors of this paper use the methods proposed in the former research by Kostek & Wojcik [7], i.e. the periodicity of melorhythmic patterns and the salience-based method of ranking rhythmic hypotheses. For each rhythmic hypothesis, eight ranking positions are calculated, one per ranking method. The ranking methods, together with their accuracies, are:
(A) sound duration-based method (85.3%),
(B) repeating melodic patterns with melody created from the upper sounds of the chords, melody represented as a sequence of frequencies (86.4%),
(C) repeating melodic patterns with melody created from the upper sounds of the chords, melody represented as a sequence of intervals (78.1%),
(D) repeating melodic patterns with melody created from the upper sounds of the chords, melody represented as a sequence of directions of intervals (72.9%),
(E) repeating melodic patterns with melody created from the lower sounds of the chords, melody represented as a sequence of frequencies (83.8%),
(F) repeating melodic patterns with melody created from the lower sounds of the chords, melody represented as a sequence of intervals (75.1%),
(G) repeating melodic patterns with melody created from the lower sounds of the chords, melody represented as a sequence of directions of intervals (72.2%),
(H) repeating rhythmic binary patterns (83.9%).
The experiments conducted in that research indicate that the recurrence of patterns in various representations is a promising direction towards hypermetric rhythm retrieval. ‘Sequence of intervals’ and ‘sequence of directions of intervals’ appeared to be oversimplified representations for the task of retrieving the entire hypermetric structure. In this paper, the consecutive simplifications of music are verified experimentally with regard to their usability for obtaining the time signature from a polyphonic piece.
3.1. Approximate Intervals Melody Representation
In this research, the recurrence of patterns in representations (A)-(H) is applied to the task of time signature retrieval. New representations are also proposed: intervals are reduced and classified depending on their size and shift. The idea of approximate intervals is presented in Table 1. The first column of the table, C(1_0), presents the consecutive intervals, expressed in semitones, which is analogous to the ‘intervals’ representation. The second column simplifies intervals to whole tones. Since intervals approximated to whole tones (a range of two semitones) can be rounded to the upper semitone or the lower one, two shifts are distinguished in the table, namely C(2_0) and C(2_1). The method works analogously with approximation to a range of three semitones; in this case there are three possible shifts. The next column in the table would be C(4_0). The values in the table are equal to the lowest interval in each range, read from the first column.
The ‘sequence of frequencies’ representation deals with recurrences of exactly matching patterns. ‘Sequence of intervals’, analogous to C(1_0), deals with transposed patterns, but the intervals between consecutive sounds must be equal to within one semitone. Further simplifications deal with patterns that are not exactly matched.

(C) C(1_0)   (I) C(2_0)   (J) C(2_1)   (K) C(3_0)   (L) C(3_1)   (M) C(3_2)
    -2           -2           -3           -3           -2           -4
    -1           -2           -1           -3           -2           -1
     0            0           -1            0           -2           -1
     1            0            1            0            1           -1
     2            2            1            0            1            2
     3            2            3            3            1            2
Table 1. Approximate intervals music representation

3.2. Experiments
The representations (A)-(H), together with the approximate-interval representations (I)-(Q) proposed by the authors of this paper, are tested on a database of polyphonic MIDI files of 51 national anthems. The methods are evaluated in the following way: a hypothesis is acknowledged to be correct if the time signature (duple or triple) is present in the hypothesis. Since a number of the generated hypotheses contain the correct time signature, the ranking position RP is taken to be the position of the highest-ranked hypothesis among all hypotheses containing the correct time signature. For each ranking method, the hypotheses are sorted in descending order of their ranking values, and each hypothesis receives a ranking position RP in the range 1..NoH, where NoH is the number of hypotheses. The accuracy for a piece is calculated with Formula (1), and a single method is validated by averaging, over all pieces, the accuracy of its hypothesis ranking function.

$$\mathrm{Accuracy} = 100\% - \frac{RP - 1}{NoH} \qquad (1)$$

The results of the experiments can be read from Table 2.

Method                                 Accuracy [%]
(A) Duration-based                     97.02
(B) Sequence of frequencies            99.59
(C) Sequence of intervals, C(1_0)      99.75
(D) Seq. of directions of intervals    99.82
(H) Binary rhythmic patterns           99.78
(I) C(2_0)                             99.75
(J) C(2_1)                             99.75
(K) C(3_0)                             99.75
(L) C(3_1)                             99.82
(M) C(3_2)                             99.75
(N) C(4_0)                             99.78
(O) C(4_1)                             99.75
(P) C(4_2)                             99.75
(Q) C(4_3)                             99.75
Table 2. Accuracy of time signature retrieval methods

The percentages in Table 2 are not comparable to those of Kostek & Wojcik cited in Section 3 of this paper, because the authors of this paper evaluated a method of retrieving the correct time signature from the hypermetric rhythm structure, whereas Kostek & Wojcik [7] investigated the entire hypermetric structure. However, on the basis of both sets of results, it is possible to compare the adequacy of melodic patterns in various representations for the retrieval of lower rhythmic levels (time signature) and of higher ones (hypermetric structure).
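The approximate-interval mapping of Table 1 (Section 3.1) can be written as a single floor-division expression. The formula below is inferred from the values in Table 1 and is not stated explicitly in the paper. The names `approximate_interval` and `approximate_melody` are hypothetical, and pitches are assumed to be MIDI note numbers so that intervals are whole semitones.

```python
def approximate_interval(interval: int, size: int, shift: int) -> int:
    """Lowest interval of the range of `size` semitones containing `interval`.

    `size` and `shift` correspond to n and s in the column label C(n_s).
    Python's // floors towards minus infinity, which keeps the ranges
    regular for negative (descending) intervals as well.
    """
    return size * ((interval - shift) // size) + shift


def approximate_melody(pitches, size, shift):
    """Convert a pitch sequence into a sequence of approximate intervals."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [approximate_interval(i, size, shift) for i in intervals]


# Reproduce the rows of Table 1: columns C(1_0), C(2_0), C(2_1), C(3_0), C(3_1), C(3_2).
columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
for interval in range(-2, 4):
    print([approximate_interval(interval, n, s) for n, s in columns])

# Two invented patterns that are only approximately transposed:
# their exact intervals differ, but they coincide under C(3_1).
first = [60, 62, 63, 61]   # intervals  2, 1, -2
second = [65, 66, 68, 67]  # intervals  1, 2, -1
print(approximate_melody(first, 3, 1) == approximate_melody(second, 3, 1))  # True
```

The last comparison shows why coarser representations can detect recurrences that the exact ‘sequence of intervals’ representation misses, at the cost of occasionally matching patterns that are unrelated.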
3.3. Conclusions and Future Work
The results presented in the previous section indicate that the duration-based method is not as good for time signature retrieval as for the retrieval of the entire hypermetric structure. Another finding is that simplified representations, i.e. (C) and (D), retrieve the time signature with higher accuracy than the sequence of frequencies (B). Both remarks, which concern the time signature retrieval experiment, differ from those drawn from the retrieval of the entire hypermetric structure; it might thus be concluded that different methods should be used for time signature retrieval and for the entire hypermetric structure. Another reason for the inconsistency between the results for time signature retrieval and for entire hypermetric structure retrieval might be that different experimental corpora were used in the two studies. Patterns represented as approximate intervals (I)-(Q) retrieve the time signature with good accuracy, approximately equal to that of (C) or (D); this may indicate that simplified representations of melodic patterns capture one of the basic compositional rules, namely approximate transposition of intervals in consecutive melodic patterns. These remarks need further experimental verification on a larger database, however.
The accuracy of several hypothesis ranking methods is relatively high. For the hypermetric structure, the promising methods might be (A), (B), (E) and (H). The time signature is retrieved with the highest accuracy by methods (D), (L), (H) and (N). Thus, it is worth combining those methods into a hybrid approach, which could result in high accuracy of the final method. The hybrid approach can be stated as follows. A set of hypothesis ranking methods {M1, M2, … , Mp}, a set of pieces {U1, U2, …, Uq}, and rhythmic hypotheses {H1(Uj), H2(Uj), …, Hr(Uj)} for each piece Uj are given. Hypothesis Hk is ranked with method Mi; the position occupied by Hk is RankPos(Mi). For each hypothesis of a piece, the sum of RankPos(Mi) is computed over all methods Mi in a set of promising methods PromMeth; in this case PromMeth = {A, B, E, H} for hypermetric rhythm retrieval or {D, L, H, N} for time signature retrieval. For each set PromMeth, the hypotheses are sorted in ascending order of the sum of ranking positions. Then the accuracy of the hybrid is calculated analogously to the accuracy of a single method, i.e. with the aid of Formula (1).
The hypermetric rhythm retrieval approach is successfully applied in the DrumAdd system, which automatically generates a drum accompaniment to a given melody in the way proposed by Kostek & Wojcik in their work [7].
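The hybrid ranking and its evaluation with Formula (1) might be sketched as follows. This is an illustration under stated assumptions, not the DrumAdd implementation. The function names and the toy scores are invented, ties are broken by input order, and the fraction (RP − 1) / NoH in Formula (1) is assumed to be expressed in percent, since otherwise accuracies below 99% could not occur.

```python
def rank_positions(scores):
    """Map each hypothesis to its position (1 = best) under one method.

    `scores` maps hypothesis name -> ranking value; higher values rank first.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {hyp: pos for pos, hyp in enumerate(ordered, start=1)}


def hybrid_ranking(scores_by_method, promising):
    """Order hypotheses by ascending sum of RankPos over the promising methods."""
    positions = {m: rank_positions(scores_by_method[m]) for m in promising}
    hypotheses = list(scores_by_method[promising[0]])
    totals = {h: sum(positions[m][h] for m in promising) for h in hypotheses}
    return sorted(hypotheses, key=totals.get)


def accuracy(ranking, correct):
    """Formula (1), with (RP - 1) / NoH scaled to percent (an assumption).

    RP is the position of the best-ranked hypothesis among the correct ones.
    """
    rp = min(ranking.index(h) for h in correct) + 1
    return 100.0 - 100.0 * (rp - 1) / len(ranking)


# Invented ranking values for three hypotheses of one piece.
scores_by_method = {
    "D": {"H1": 0.9, "H2": 0.4, "H3": 0.7},
    "L": {"H1": 0.6, "H2": 0.8, "H3": 0.5},
    "H": {"H1": 0.7, "H2": 0.2, "H3": 0.9},
}
ranking = hybrid_ranking(scores_by_method, promising=["D", "L", "H"])

print(ranking)                    # ['H1', 'H3', 'H2']
print(accuracy(ranking, {"H1"}))  # 100.0
```

Summing positions rather than raw ranking values keeps methods with different score scales comparable, which is presumably why the approach is formulated in terms of RankPos.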
4. ACKNOWLEDGEMENTS
This work was supported by the Polish Ministry of Science and Education within the research project No. 3 T11F02729. We want to express our gratitude to the head of the project, Prof. Bozena Kostek from Gdansk University of Technology, for her support and her influence on our intellectual growth. We also want to thank Marcin Grudzinski for the development of the experimental environment and for much invaluable advice.
5. REFERENCES
[1] Brown, J., “Determination of the meter of musical scores by autocorrelation”, Journal of the Acoustical Society of America, 94(4):1953–1957, 1993
[2] Cooper, G. and Meyer, L., “The rhythmic structure of music”, University of Chicago Press, Chicago, 1960
[3] Roads, C., “The Computer Music Tutorial”, MIT Press, Cambridge, 1996
[4] Eck, D., “A network of relaxation oscillators that finds downbeats in rhythms”, Tech. report IDSIA-06-01, IDSIA, Lugano, Switzerland, 2001
[5] Fraisse, P., “Rhythm and Tempo”, in The Psychology of Music, edited by Deutsch, D., Series in Cognition and Perception, Academic Press, 1982
[6] Goto, M., “An Audio-Based Real-Time Beat Tracking System for Music with or without Drum-Sounds”, Journal of New Music Research, 30:159–171, 2001
[7] Kostek, B. and Wojcik, J., “Automatic Retrieval of Musical Rhythmic Patterns”, 119th Audio Engineering Society Convention, 2005
[8] Lerdahl, F. and Jackendoff, R., “A Generative Theory of Tonal Music”, MIT Press, Cambridge, MA, 1983
[9] Parncutt, R., “A perceptual model of pulse salience and metrical accent in musical rhythms”, Music Perception, 11(4):409–464, 1994
[10] Temperley, D. and Sleator, D., “Modeling meter and harmony: A preference-rule approach”, Comp. Music J., 15 #1:10–27, 1999
[11] van Zaanen, M., Bod, R. and Honing, H., “A Memory-Based Approach to Meter Induction”, Proceedings of ESCOM, pp. 250–253, 2003