Autocorrelation in meter induction: The role of accent structure a)
Petri Toiviainen and Tuomas Eerola
Department of Music, P.O. Box 35(M), 40014 University of Jyväskylä, Jyväskylä, Finland
(Received 16 April 2005; revised 14 October 2005; accepted 7 November 2005)

The performance of autocorrelation-based meter induction was tested with two large collections of folk melodies, consisting of approximately 13 000 melodies for which the correct meters were available. The performance was measured by the proportion of melodies whose meter was correctly classified by a discriminant function. Furthermore, it was examined whether including different melodic accent types would improve the classification performance. By determining the components of the autocorrelation functions that were significant in the classification it was found that periodicity in note onset locations was the most important cue for the determination of meter. Of the melodic accents included, Thomassen’s melodic accent was found to provide the most reliable cues for the determination of meter. The discriminant function analyses suggested that periodicities longer than one measure may provide cues for meter determination that are more reliable than shorter periodicities. Overall, the method predicted notated meter with an accuracy reaching 96% for binary classification and 75% for classification into nine categories of meter. © 2006 Acoustical Society of America. [DOI: 10.1121/1.2146084]

PACS number(s): 43.75.Cd [DD]
Pages: 1164–1170
I. INTRODUCTION
Most music is organized to contain temporal periodicities that evoke a percept of regularly occurring pulses or beats. The period of the most salient pulse is typically within the range of 400 to 900 ms (Fraisse, 1982; Parncutt, 1994; van Noorden and Moelants, 1999). The perceived pulses are often hierarchically organized and consist of at least two simultaneous levels whose periods have an integer ratio. This gives rise to a percept of regularly alternating strong and weak beats, a phenomenon referred to as meter (Cooper and Meyer, 1960; Lerdahl and Jackendoff, 1983). In Western music, the ratio of the pulse lengths is usually limited to 1:2 (duple meter) and 1:3 (triple meter). Meter in which each beat has three subdivisions, such as 6/8 or 9/8, is referred to as compound meter.

A number of computational models have been developed for the extraction of the basic pulse from music. Modeling of meter perception has, however, received less attention. Large and Kolen (1994) presented a model of meter perception based on resonating oscillators. Toiviainen (1997) presented a model of competing subharmonic oscillators for determining the meter (duple versus triple) from an acoustical representation of music. Brown (1993) proposed a method for determining the meter of musical scores by applying autocorrelation to a temporal function consisting of impulses at each tone onset whose heights are weighted by the respective tone durations. A shortcoming of Brown’s (1993) study is that she does not provide any explicit criteria for the determination of meter from the autocorrelation functions.

Although there is evidence that the pitch information present in music may affect the perception of pulse and meter (Dawe et al., 1993; Thomassen, 1982; Hannon et al., 2004), most models of pulse finding developed to date rely only on note onset times and durations. Dixon and Cambouropoulos (2000), however, proposed a multi-agent model for beat tracking that makes use of pitch and amplitude information. They found that including this information when determining the salience of notes significantly improved the performance of their model.

Vos et al. (1994) applied autocorrelation to the determination of meter in isochronous or almost isochronous music. They utilized a method similar to that proposed by Brown (1993), except for using the melodic intervals between subsequent tones instead of tone durations as the weighting factor in the autocorrelation analysis. Using a corpus of 30 compositions by J. S. Bach, they found that the maxima of the obtained autocorrelation functions matched the respective bar lengths as indicated in the musical score. As the majority of music is nonisochronous, reliance on mere melodic interval structure in meter induction is probably a special case. In the general case, meter determination can be expected to be based on both temporal and pitch structure. To address the question of the relative importance of temporal and pitch structure in meter determination, it would be necessary to use a rhythmically more variable set of stimuli.

According to a commonly adopted notion, meter can be inferred from phenomenal, structural, and metrical accents (Lerdahl and Jackendoff, 1983). Phenomenal accents are the primary source of meter; they are related to the surface structure of music and arise from changes in duration, pitch, timbre, and dynamics. Pitch-related phenomenal accents, more commonly referred to as melodic accents, arise from “changes in pitch height, pitch interval or pitch contour” (Huron and Royal, 1996). Using a correlational study, Huron and Royal (1996) investigated the extent to which different types of pitch-related accent correspond to respective metrical positions as indicated by musical notation. Of eight different types of melodic accent, they found that the empirically derived accent by Thomassen (1982) was the only one to correlate significantly with metric position. The lack of correlation for the other accent types does not, however, exclude the possibility that various metric accents exhibit periodic structure that may serve as an additional cue for meter induction. For instance, different types of accents may contain periodicities of equal length, while being phase shifted with respect to each other. This is the case, for instance, with the accents 1, 4, and 5 in Fig. 3.

It must be noted that temporal structure and pitch information are not the sole determinants of meter, as it is also influenced by other features such as phrasing and lyrics as well as accents introduced by the performer. These aspects are, nonetheless, beyond the scope of the present study.

A shortcoming of meter induction models presented to date is that they have not been evaluated with large sets of musical material. While Vos et al. (1994) utilized a corpus of 30 compositions, Brown (1993) presented only a handful of short musical excerpts to visualize the performance of her model. The present study investigated the performance of autocorrelation-based meter induction with large collections of folk melodies (consisting of thousands of items), for which the notated meters are available. The performance was assessed by the proportion of melodies that were correctly classified in terms of their meter. The components of the autocorrelation function that are significant in the classification were determined. Moreover, a number of different types of melodic accent and combinations thereof were applied to the classification to assess the significance of each of them in the induction of meter. Finally, confusions made by the algorithm between different types of meter were investigated in detail.

a) Portions of this work were presented in “The role of accent periodicities in meter induction: a classification study,” Proceedings of the 8th International Conference on Music Perception and Cognition, Evanston, IL, August 2004; and “Classification of musical metre with autocorrelation and discriminant functions,” Proceedings of the 6th International Conference on Music Information Retrieval, London, 2005.

J. Acoust. Soc. Am. 119 (2), February 2006

II. AUTOCORRELATION AND METER
Below, the method for constructing the autocorrelation function for meter induction is described. For the original description, see Brown (1993). Let the melody consist of $N$ notes with onset times $t_i$, $i = 1, 2, \ldots, N$. Each note is associated with an accent value $a_i$, $i = 1, 2, \ldots, N$; in Brown (1993), $a_i$ equals the duration of the respective note. The onset impulse function $f$ is a time series consisting of impulses of height $a_i$ located at each note onset position:

$$f(n) = \sum_{i=1}^{N} a_i \delta_i(n), \quad n = 0, 1, 2, \ldots, \tag{1}$$

where

$$\delta_i(n) = \begin{cases} 1, & n = [t_i/dt], \\ 0, & \text{otherwise}, \end{cases} \tag{2}$$

where $dt$ denotes the sampling interval and $[\ ]$ denotes rounding to the nearest integer.

Autocorrelation refers to the correlation of two copies of a time series that are temporally shifted with respect to each other. For a given amount of shift (or lag), a high value of autocorrelation suggests that the series contains a periodicity with length equaling the lag. In the present study, the autocorrelation function $F$ was defined as

$$F(m) = \sum_{n} f(n) f(n - m) \Big/ \sum_{n} f(n)^2, \tag{3}$$

where $m$ denotes the lag in units of sampling interval; the denominator normalizes the function to $F(0) = 1$ irrespective of the length of the sequence. Often, the lag corresponding to the maximum of the autocorrelation function provides an estimate of the meter. This is the case for the melody depicted in Fig. 1. Sometimes the temporal structure alone is not sufficient for deducing the meter. This holds, for example, for isochronous and temporally highly aperiodic melodies. In such cases, melodic structure may provide cues for the determination of meter. This is the case, for instance, with the melody depicted in Fig. 2. With this isochronous melody, the autocorrelation function obtained from the duration-weighted onset impulse function fails to exhibit any peaks, thus making it impossible to determine the meter. Including information about pitch content in the onset impulse function leads, however, to an autocorrelation function with clearly discernible peaks.

FIG. 1. Excerpt from a melody, its onset impulse function weighted by durational accents, $f$, and the corresponding autocorrelation function, $F$. The maximum of the autocorrelation function at the lag of 4/8 indicates duple meter.

FIG. 2. Excerpt from an isochronous melody. (a) Onset impulse function weighted by durational accents, $f$, and the corresponding autocorrelation function, $F$, showing no discernible peaks. (b) Onset impulse function weighted by interval size, $f$, and the corresponding autocorrelation function, $F$. The maximum of the autocorrelation function at the lag of 12/8 indicates triple or compound meter.

III. STUDY 1

A. Material

The material used in the first study consisted of folk melodies in MIDI file format taken from two collections: the Essen collection (Schaffrath, 1995), consisting of mainly European folk melodies, and the Digital Archive of Finnish Folk Tunes (Eerola and Toiviainen, 2004a), subsequently referred to as the Finnish collection. For the present study, all melodies in either duple (2/4, 4/4, 4/8, etc.; $2^n$ eighth notes per measure) or triple/compound (3/8, 3/4, 6/8, 9/8, 12/8, etc.; $3n$ eighth notes per measure) meter were chosen. Consequently, a total of 5507 melodies in the Essen collection were used in the study, of which 3121 (56.7%) were in duple and 2386 (43.3%) were in triple/compound meter. From the Finnish collection, 6861 melodies were used, of which 5518 (80.4%) were in duple and 1343 (19.6%) were in triple/compound meter.

B. Method

For each of the melodies in the two collections, we constructed a set of onset impulse functions weighted by various accent types [Eqs. (1) and (2)]. In each case the sampling interval was set to 1/16 note. The accents consisted of (1) durational accent ($a_i$ equals tone duration), (2) Thomassen’s (1982) melodic accent, (3) interval size in semitones between previous and current tone (see, e.g., Vos et al., 1994), (4) pivotal accent ($a_i = 1$ if melody changes direction, $a_i = 0$ otherwise), and (5) gross contour accent ($a_i = 1$ for ascending interval, $a_i = -1$ for descending interval, $a_i = 0$ otherwise). Since the note onset times alone, without regard to any accent structure, provide information about metrical structure, we further included (6) constant accent ($a_i = 1$). The analysis was carried out using the MIDI Toolbox for Matlab (Eerola and Toiviainen, 2004b).

For each melody, each of the onset impulse functions was subjected to autocorrelation. The components of the obtained autocorrelation functions corresponding to lags of 1, 2, ..., 16 eighth notes were included in the subsequent analyses. Figure 3 depicts the onset impulse functions and the respective autocorrelation functions constructed from a melodic excerpt using each of the accent types described above.

The extent to which these autocorrelation functions could predict the meter of each melody was assessed by means of stepwise discriminant function analyses, in which various subsets of autocorrelation functions were used as independent variables and the meter (duple versus triple/compound) as the dependent variable. The leave-one-out cross-validation scheme (Lachenbruch and Mickey, 1968) was utilized. The measures observed in the analyses were the percentage of correctly classified cases, the order in which variables entered into the discriminant function, and the discriminant function coefficients.

FIG. 3. (a) Onset impulse functions constructed from a melodic excerpt using the six accent types described in the text; (b) the respective autocorrelation functions. As can be seen, the melodic accents frequently fail to co-occur either with each other or with the durational accents. All the autocorrelation functions, however, have maxima at lags of either 6/8 or 12/8, indicating triple or compound meter.

C. Results
The analysis was started by considering the autocorrelation function based on durational accents. First, only the components corresponding to lags of 4 and 6 eighth notes were used as independent variables. This yielded a correct classification rate of 80.9% for the Essen collection and 84.6% for the Finnish collection. Thus, a significant proportion of the melodies was misclassified, suggesting that periodicities of 4/8 and 6/8 in durational accent are not sufficient for reliable classification. This can be clearly seen in Fig. 4, where the values of these components are displayed as scatter plots, showing a significant overlap between melodies representing the two types of meter.

Next, all the components of the durational-accent-based autocorrelation function were entered into the analysis. This yielded a correct classification rate of 90.5% for the Essen collection and 93.1% for the Finnish collection. Inclusion of all 16 components as independent variables thus considerably improved classification performance. The first components to enter into the discriminant function were lags of 8/8, 12/8, and 16/8 (in this order) for the Essen collection, and 4/8, 12/8, and 16/8 for the Finnish collection. This suggests that periodicities longer than one bar may provide cues for meter determination that are more reliable than shorter periodicities.

Subsequently, discriminant function analyses were carried out with the autocorrelation functions obtained using each of the remaining accents, one at a time. The results are summarized in Table I. As can be seen, a significant proportion of the components of the autocorrelation functions that entered first in the stepwise analysis correspond to relatively long time lags. In particular, for all accent types lag 12/8 is among the three most important components. For lag 16/8 the same holds true for all but one accent type. This again suggests that periodicities exceeding the span of one bar seem to offer highly important cues for meter induction. Somewhat unexpectedly, the autocorrelation function obtained by ignoring any accent structure (constant accent) yielded the highest classification rate for both collections. In light of this result, the onset function constructed using a constant accent may be slightly more efficient in meter induction than the function originally introduced by Brown (1993). In terms of correct classification rate, the constant accent was followed by durational accent and Thomassen’s melodic accent, in this order.

Finally, all autocorrelation functions were used together as independent variables. This yielded a correct classification rate of 95.3% and 96.4% for the Essen collection and the Finnish collection, respectively. The first variable to enter into the discriminant function was lag 8/8 with durational accent for the Essen collection and lag 4/8 with constant accent for the Finnish collection. The next three variables to enter were the same for both collections: lag 12/8 with constant accent, lag 16/8 with constant accent, and lag 6/8 with melodic accent, in this order. This analysis contained a total of 96 independent variables (16 for each of the 6 accent types); due to the large number of cases, a large number of variables were entered into the stepwise analyses.

To obtain a simpler model for meter classification, we performed a further discriminant function analysis in which the six most prominent components for the two most prominent accent types from the previous analysis were used. These corresponded to the values of the autocorrelation functions at lags of 3/8, 4/8, 6/8, 8/8, 12/8, and 16/8 using constant accent and Thomassen’s melodic accent.
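The features used in this simpler model are values of the autocorrelation function of Eq. (3) at selected lags. The following is a minimal Python sketch of Eqs. (1)–(3) with a constant accent; it is not the MIDI Toolbox implementation used in the study. The function names and the four-bar example rhythm are hypothetical, and onset times are expressed in eighth notes, so that a sampling interval of 0.5 corresponds to a 1/16 note.

```python
import math


def onset_impulse_function(onsets, accents, dt):
    """Eqs. (1)-(2): impulses of height a_i at sample index [t_i / dt]."""
    # floor(x + 0.5) rounds to the nearest integer (ties upward; an assumption,
    # since the text does not say how ties are resolved).
    indices = [math.floor(t / dt + 0.5) for t in onsets]
    f = [0.0] * (max(indices) + 1)
    for n, a in zip(indices, accents):
        f[n] += a
    return f


def autocorrelation(f, max_lag):
    """Eq. (3): F(m) = sum_n f(n) f(n - m) / sum_n f(n)^2 for m = 0..max_lag."""
    energy = sum(v * v for v in f)
    if energy == 0:
        raise ValueError("onset impulse function is all zeros")
    return [
        sum(f[n] * f[n - m] for n in range(m, len(f))) / energy
        for m in range(max_lag + 1)
    ]


# Hypothetical rhythm: quarter, eighth, eighth, quarter, repeated over four
# bars of six eighth notes each (onset times in eighth notes).
bar = [0, 2, 3, 4]
onsets = [b + 6 * k for k in range(4) for b in bar]
dt = 0.5  # one sixteenth note, in eighth-note units

f = onset_impulse_function(onsets, [1.0] * len(onsets), dt)  # constant accent
F = autocorrelation(f, max_lag=32)  # up to a lag of 16 eighth notes

# Autocorrelation values at the lags (in eighth notes) used in the simpler model.
lags_in_eighths = [3, 4, 6, 8, 12, 16]
features = {lag: F[int(lag / dt)] for lag in lags_in_eighths}
print(features)
```

In this example rhythm, which repeats every six eighth notes, the value at lag 6/8 exceeds the value at lag 4/8. Other accent types would only change the `accents` argument.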
Using both collections together ($N = 12\,368$), we obtained the following discriminant function,

$$\begin{aligned}
\Delta = {} & -1.042 + 0.318\,F_{\mathrm{const}}(3/8) + 5.240\,F_{\mathrm{const}}(4/8) - 0.630\,F_{\mathrm{const}}(6/8) \\
& + 0.745\,F_{\mathrm{const}}(8/8) - 8.122\,F_{\mathrm{const}}(12/8) + 4.160\,F_{\mathrm{const}}(16/8) \\
& - 0.978\,F_{\mathrm{mel}}(3/8) + 1.018\,F_{\mathrm{mel}}(4/8) - 1.657\,F_{\mathrm{mel}}(6/8) \\
& + 1.419\,F_{\mathrm{mel}}(8/8) - 2.205\,F_{\mathrm{mel}}(12/8) + 1.568\,F_{\mathrm{mel}}(16/8),
\end{aligned} \tag{4}$$

where $F_{\mathrm{const}}$ and $F_{\mathrm{mel}}$ denote autocorrelation functions obtained with constant and Thomassen’s melodic accent, respectively. With this discriminant function, the correct classification rates were 92.9% and 94.8% for the Essen and the Finnish collections, respectively. This simpler discriminant function thus yielded correct classification rates that were merely 2.4 and 1.6 percentage points lower than those obtained with 96 predictive variables. Variables that received the largest coefficients in the discriminant function were $F_{\mathrm{const}}(12/8)$, $F_{\mathrm{const}}(4/8)$, $F_{\mathrm{const}}(16/8)$, $F_{\mathrm{mel}}(12/8)$, and $F_{\mathrm{mel}}(6/8)$, in this order. In concordance with the aforementioned results, this suggests that temporal structure above the bar level produces important cues for meter determination. Further, the fact that the most significant components of the melodic accent autocorrelation function correspond to multiples of 3/8 lags suggests that, especially for triple and compound meters, melodic accent structure provides additional cues.

FIG. 4. Scatter plot of the values for lags 4/8 and 6/8 of the durational-accent-based autocorrelation function for the melodies in the Essen collection and the Finnish collection that were used in the present study. Circles and crosses represent melodies in duple and triple meter, respectively.

TABLE I. Proportions of correctly classified melodies (classification rate) and components of the autocorrelation function that entered first in each analysis (primary components; numbers refer to lags in units of one eighth note) for both collections and each accent type.

| Accent type | Essen collection (N = 5507): classification rate (%) | Essen collection: primary components | Finnish collection (N = 6861): classification rate (%) | Finnish collection: primary components |
|---|---|---|---|---|
| (1) Duration | 90.5 | 8, 12, 16 | 93.1 | 4, 12, 16 |
| (2) Thomassen | 86.8 | 16, 12, 8 | 90.1 | 8, 12, 4 |
| (3) Interval | 85.9 | 16, 12, 4 | 87.6 | 8, 12, 16 |
| (4) Pivot | 78.5 | 16, 12, 6 | 79.4 | 6, 16, 12 |
| (5) Contour | 73.1 | 16, 12, 8 | 77.9 | 12, 6, 16 |
| (6) Constant | 91.5 | 8, 12, 16 | 94.7 | 4, 12, 16 |

TABLE II. Confusion matrices between notated and predicted meters for both collections. Each element shows the number of melodies with the respective notated meter (rows) and predicted meter (columns). For each meter, the recall, precision, and F score values are given.

Essen collection (N = 5592)

| Notated meter | 2/4 | 3/2 | 3/4 | 3/8 | 4/1 | 4/2 | 4/4 | 6/4 | 6/8 | Total | Recall | Precision | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2/4 | 1130 | 0 | 21 | 9 | 0 | 0 | 124 | 0 | 1 | 1285 | 0.88 | 0.86 | 0.87 |
| 3/2 | 0 | 65 | 11 | 0 | 0 | 8 | 16 | 0 | 0 | 100 | 0.65 | 0.92 | 0.76 |
| 3/4 | 52 | 0 | 930 | 30 | 0 | 0 | 91 | 106 | 6 | 1215 | 0.77 | 0.90 | 0.83 |
| 3/8 | 23 | 0 | 9 | 168 | 0 | 0 | 0 | 0 | 91 | 291 | 0.58 | 0.53 | 0.55 |
| 4/1 | 0 | 1 | 0 | 0 | 36 | 2 | 0 | 0 | 0 | 39 | 0.92 | 0.75 | 0.83 |
| 4/2 | 0 | 4 | 0 | 0 | 11 | 148 | 10 | 0 | 0 | 173 | 0.86 | 0.87 | 0.86 |
| 4/4 | 98 | 1 | 32 | 0 | 1 | 12 | 1452 | 1 | 1 | 1598 | 0.91 | 0.85 | 0.88 |
| 6/4 | 0 | 0 | 21 | 0 | 0 | 0 | 9 | 80 | 0 | 110 | 0.73 | 0.43 | 0.54 |
| 6/8 | 15 | 0 | 12 | 109 | 0 | 0 | 1 | 1 | 643 | 781 | 0.82 | 0.87 | 0.84 |
| Total | 1318 | 71 | 1036 | 316 | 48 | 170 | 1703 | 188 | 742 | | | | |

Finnish collection (N = 7351)

| Notated meter | 2/4 | 3/2 | 3/4 | 3/8 | 4/4 | 5/2 | 5/4 | 6/4 | 6/8 | Total | Recall | Precision | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2/4 | 2439 | 17 | 62 | 0 | 740 | 12 | 14 | 5 | 4 | 3293 | 0.74 | 0.69 | 0.71 |
| 3/2 | 5 | 45 | 4 | 0 | 15 | 3 | 0 | 2 | 0 | 74 | 0.61 | 0.44 | 0.51 |
| 3/4 | 98 | 23 | 693 | 2 | 16 | 2 | 2 | 65 | 1 | 902 | 0.77 | 0.86 | 0.81 |
| 3/8 | 9 | 0 | 17 | 47 | 0 | 0 | 0 | 1 | 55 | 129 | 0.36 | 0.48 | 0.42 |
| 4/4 | 958 | 7 | 5 | 0 | 1203 | 25 | 0 | 5 | 2 | 2205 | 0.55 | 0.60 | 0.57 |
| 5/2 | 0 | 0 | 0 | 0 | 11 | 26 | 1 | 1 | 0 | 39 | 0.67 | 0.38 | 0.49 |
| 5/4 | 32 | 6 | 1 | 0 | 0 | 0 | 374 | 0 | 0 | 413 | 0.91 | 0.95 | 0.93 |
| 6/4 | 4 | 1 | 9 | 0 | 26 | 0 | 0 | 38 | 0 | 78 | 0.49 | 0.32 | 0.39 |
| 6/8 | 14 | 3 | 16 | 48 | 0 | 0 | 1 | 2 | 134 | 218 | 0.61 | 0.68 | 0.65 |
| Total | 3559 | 102 | 807 | 97 | 2011 | 68 | 392 | 119 | 196 | | | | |

IV. STUDY 2
The aim of study 2 was to assess the capability of the autocorrelation-based meter induction method to carry out a more detailed classification. More specifically, instead of mere classification as duple versus triple, the dependent variable used in this study was the actual notated meter. In the analysis, special attention was paid to the pattern of confusion between meters.
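As a point of reference for the binary case that study 2 extends, the simpler discriminant function of study 1 [Eq. (4)] is a weighted sum of twelve autocorrelation values. The sketch below only evaluates that sum with the coefficients printed in Eq. (4). The names `discriminant`, `f_const`, and `f_mel` and the example feature values are hypothetical, and because the text does not state which sign of Δ corresponds to which class, no class label is assigned here.

```python
# Lags of the autocorrelation function used in Eq. (4), as fractions of a whole note.
LAGS = ("3/8", "4/8", "6/8", "8/8", "12/8", "16/8")

# Coefficients transcribed from Eq. (4).
INTERCEPT = -1.042
COEF_CONST = dict(zip(LAGS, (0.318, 5.240, -0.630, 0.745, -8.122, 4.160)))
COEF_MEL = dict(zip(LAGS, (-0.978, 1.018, -1.657, 1.419, -2.205, 1.568)))


def discriminant(f_const, f_mel):
    """Evaluate Delta of Eq. (4).

    f_const, f_mel: dicts mapping each lag in LAGS to the autocorrelation value
    obtained with constant accent and with Thomassen's melodic accent.
    """
    return INTERCEPT + sum(
        COEF_CONST[lag] * f_const[lag] + COEF_MEL[lag] * f_mel[lag]
        for lag in LAGS
    )


# Hypothetical feature values, for illustration only (not taken from the study).
f_const = {"3/8": 0.2, "4/8": 0.7, "6/8": 0.3, "8/8": 0.6, "12/8": 0.3, "16/8": 0.5}
f_mel = {"3/8": 0.1, "4/8": 0.4, "6/8": 0.2, "8/8": 0.4, "12/8": 0.2, "16/8": 0.3}
print(discriminant(f_const, f_mel))
```

Most of the positive coefficients sit on lags that are multiples of 4/8 and the largest negative ones on 6/8 and 12/8, which is consistent with larger values of Δ indicating duple meter; that reading is an inference from the coefficients, not a statement from the text.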
A. Material
As in study 1, the material was taken from the Essen collection and the Digital Archive of Finnish Folk Tunes. From each collection, melodies that consisted of a single notated meter were included. Moreover, for each collection only meters that contained more than 30 exemplars were included. Consequently, a total of 5592 melodies in the Essen collection were used, representing nine different notated meters (2/4, 3/2, 3/4, 3/8, 4/1, 4/2, 4/4, 6/4, 6/8). From the Finnish collection, 7351 melodies were used, representing nine different notated meters (2/4, 3/2, 3/4, 3/8, 4/4, 5/2, 5/4, 6/4, 6/8). For each collection, the number of melodies representing each notated meter is shown in Table II.

B. Methods

The classification of meters was based on the discriminant function computed from the autocorrelation functions obtained using all the accent types. The performance was assessed by means of a confusion matrix. Furthermore, for both collections the precision and recall values as well as the F score were calculated for each meter (Salton and McGill, 1983). For a given meter, precision is defined as the number of melodies having the meter and being correctly classified divided by the total number of melodies being classified as representing the meter. A high value of precision thus indicates that, of the melodies classified as being notated in a given meter, a large proportion is correctly classified. Similarly, for each meter, recall is defined as the number of melodies being notated in the meter and being correctly classified divided by the total number of melodies being notated in the meter. A high value of recall thus indicates that, of the melodies being notated in a given meter, a large proportion is correctly classified. The F score is defined as the harmonic mean of precision and recall and is regarded as an overall measure of classification performance (see, e.g., Salton and McGill, 1983).
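The three measures defined above follow directly from a confusion matrix. The sketch below assumes that rows index the notated meter and columns the predicted meter; the function name and the 3 × 3 example matrix are hypothetical and serve only to illustrate the definitions.

```python
def precision_recall_f(confusion):
    """Per-class (precision, recall, F score) from a square confusion matrix.

    confusion[i][j] is the number of melodies notated in meter i that were
    classified as meter j (rows: notated, columns: predicted).
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        correct = confusion[k][k]
        notated = sum(confusion[k])                         # row total
        predicted = sum(confusion[i][k] for i in range(n))  # column total
        recall = correct / notated if notated else 0.0
        precision = correct / predicted if predicted else 0.0
        if precision + recall:
            # Harmonic mean of precision and recall.
            f_score = 2 * precision * recall / (precision + recall)
        else:
            f_score = 0.0
        scores.append((precision, recall, f_score))
    return scores


# Hypothetical confusion matrix for three meters (illustration only).
example = [
    [8, 2, 0],
    [1, 6, 3],
    [1, 0, 9],
]
for meter, (p, r, f_score) in zip(("A", "B", "C"), precision_recall_f(example)):
    print(meter, round(p, 2), round(r, 2), round(f_score, 2))
```

For the second class, for instance, 6 of the 10 notated melodies are recovered (recall 0.60) and 6 of the 8 melodies assigned to it are correct (precision 0.75).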
C. Results
Overall, 83.2% of the melodies from the Essen collection and 68.0% of those from the Finnish collection were correctly classified. These proportions are lower than the corresponding figures in the first study because of the larger number of classes used in this study (nine) than in the first (two). The notably low correct classification rate for the Finnish collection can be mainly attributed to the fact that a large proportion (43.4%) of the melodies representing 4/4 meter were classified as being 2/4 (see below).

To obtain a more detailed view of the classification performance, we calculated the confusion matrices for both collections. They are displayed in Table II. The table also shows the precision and recall values as well as the F scores for each meter. In terms of the F score, the most accurately classified meters were 4/4 and 2/4 for the Essen collection and 5/4 and 3/4 for the Finnish collection. Similarly, the least accurately classified meters were 6/4 and 3/8 for both collections.

Table II reveals that the most frequent confusions were made within the groups of duple and triple/compound meters, as defined in study 1, whereas confusions across these groups were significantly less frequent. For both collections, meters 2/4 and 4/4 displayed the highest mutual confusion rate, followed by meters 3/4 and 6/4. A large proportion of these misclassifications can probably be attributed to the effect of tempo on the choice of notated meter (cf. London, 2002). Certain confusions imply more severe misattributions by the algorithm. For instance, 11.7% of the melodies in the Essen collection notated in 3/4 meter were misclassified as representing binary meter (4/4 or 2/4), the corresponding figure for the Finnish collection being 12.6%. In general, duple meters were less frequently misclassified as representing triple/compound meter than vice versa. This asymmetry is in line with the results obtained in study 1. Further research would be needed to account for this phenomenon.

As the confusion matrices contain an abundance of numbers, the relationship between meters may be difficult to see. Therefore we visualized the relations between meters by performing separate hierarchical cluster analyses for both collections. To this end, we calculated the distance between each pair of meters from the confusion matrix according to the formula

$$d_{ij} = 1 - \left( \frac{c_{ij} + c_{ji}}{c_{ii} + c_{jj}} \right), \tag{5}$$

where $d_{ij}$ denotes the distance between meters $i$ and $j$, and $c_{ij}$ is the number of cases where a melody in meter $i$ has been classified as being in meter $j$. By definition, the larger the ratio of the number of melodies confused between the meters, $c_{ij} + c_{ji}$, to the number of melodies correctly classified for both meters, $c_{ii} + c_{jj}$, the smaller the distance $d_{ij}$ between the meters. Figure 5 displays the dendrograms obtained from the clustering algorithms.

FIG. 5. Dendrograms obtained from the confusion matrix using the similarity measure of Eq. (5). The leftmost column displays the average note durations in quarter notes for the melodies representing each meter.

In the dendrograms of Fig. 5, the stage at which given meters cluster together reflects the algorithm’s rate of confusion between the meters. For both collections, the meters to first cluster together are 3/8 and 6/8. For the Essen collection, this is followed by the clustering of the meters 3/4 and 6/4 as well as 2/4 and 4/4, in this order. Also for the Finnish collection these pairs of meters cluster next, albeit in reverse order, that is, the clustering of 2/4 and 4/4 precedes that of 3/4 and 6/4. A further similar feature between the two dendrograms is that the last clustering occurs between the cluster formed by the meters 3/8 and 6/8 and the cluster formed by all the other meters. This suggests that, in terms of the autocorrelation functions, meters 3/8 and 6/8 are most distinct from the other meters.

One peculiar feature of the dendrogram for the Essen collection is the relatively late clustering of meters 4/1 and 4/2 with meters 2/4 and 4/4. In particular, the former two meters cluster with meter 3/2 before clustering with the latter two meters. A potential explanation for this is the difference in the average note durations between the meters, shown in the leftmost column of Fig. 5. More specifically, the average note durations for meters 4/1, 4/2, and 3/2 exceed those of meters 2/4 and 4/4 by a factor of 2. This anomaly, however, is not significant, as meters 4/1, 4/2, and 3/2 constitute merely a minor proportion of the whole collection.
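The distance measure of Eq. (5) can be computed directly from a confusion matrix. The sketch below transcribes the Essen matrix of Table II (rows: notated meter, columns: predicted meter) and lists the closest pairs of meters. It does not reproduce the hierarchical clustering itself, since the linkage method is not stated in the text; the function and variable names are hypothetical.

```python
METERS = ["2/4", "3/2", "3/4", "3/8", "4/1", "4/2", "4/4", "6/4", "6/8"]

# Essen collection confusion matrix from Table II
# (rows: notated meter, columns: predicted meter, both in the order of METERS).
ESSEN = [
    [1130, 0, 21, 9, 0, 0, 124, 0, 1],
    [0, 65, 11, 0, 0, 8, 16, 0, 0],
    [52, 0, 930, 30, 0, 0, 91, 106, 6],
    [23, 0, 9, 168, 0, 0, 0, 0, 91],
    [0, 1, 0, 0, 36, 2, 0, 0, 0],
    [0, 4, 0, 0, 11, 148, 10, 0, 0],
    [98, 1, 32, 0, 1, 12, 1452, 1, 1],
    [0, 0, 21, 0, 0, 0, 9, 80, 0],
    [15, 0, 12, 109, 0, 0, 1, 1, 643],
]


def meter_distance(c, i, j):
    """Eq. (5): d_ij = 1 - (c_ij + c_ji) / (c_ii + c_jj)."""
    return 1.0 - (c[i][j] + c[j][i]) / (c[i][i] + c[j][j])


# All pairwise distances, smallest (most confused) first.
pairs = sorted(
    (meter_distance(ESSEN, i, j), METERS[i], METERS[j])
    for i in range(len(METERS))
    for j in range(i + 1, len(METERS))
)
for d, a, b in pairs[:3]:
    print(f"{a} and {b}: d = {d:.3f}")
```

The three smallest distances belong to the pairs 3/8 and 6/8, 3/4 and 6/4, and 2/4 and 4/4, in this order, which agrees with the order of the first clusterings described above for the Essen collection.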
V. CONCLUSIONS
We studied the classification performance of the autocorrelation-based meter induction model, originally introduced by Brown (1993). Using discriminant function analysis, we provided an explicit method for the classification. Furthermore, we applied the algorithm to investigate the role of melodic accent structure in meter induction. In conformance with the general view, we found that periodicity in note onset locations was the most important cue for the determination of meter. The results also imply that periodicities longer than one measure provide additional information for meter induction. A somewhat unexpected finding was that ignoring the accent structure in the onset impulse function yielded the best classification rate. This result is difficult to explain and calls for further investigation.

Furthermore, the results suggest that periodicity in melodic accent structure may serve as an additional cue in the determination of meter. In particular, including Thomassen’s (1982) melodic accent was found to improve classification performance to some extent. This, however, does not necessarily imply that melodic and durational accents need to co-occur but rather that they exhibit similar periodic structure with a possible mutual phase shift.

An additional finding that calls for further study was the significant difference between the correct classification rates for melodies in duple and triple/compound meter. More specifically, melodies in duple meter were more often correctly classified than melodies in triple/compound meter. A detailed investigation of misclassified melodies could provide insight into this question.

The material used in the present study consisted of melodies predominantly in duple, triple, and compound meters, although study 2 utilized a few hundred Finnish melodies in either 5/4 or 5/2 meter. To obtain a deeper insight into the role of accent structure in meter induction, a study with nonregular meters such as those present in the folk music of the Balkan region could be carried out.

An apparent limitation of the method presented in this article is its inability to deal with melodies that contain changes of meter. For a melody that, say, starts in 2/4 meter and changes to 3/4 meter, the algorithm gives unpredictable results. This is due to the fact that the algorithm considers the melody as a whole. The limitation may be overcome by applying a windowed analysis analogous to algorithms used in pitch estimation from acoustical signals, in which the autocorrelation is applied to short windowed segments of the melody, with the window moving gradually throughout the melody.

The present study utilized melodies that were represented in symbolic, temporally quantized form. The choice of stimuli was mainly based on the availability of correct (notated) meters for the melodies in the collections. In principle the method could, however, be applied to performed music in acoustical form as well, at least with a monophonic input. This would require algorithms for onset detection (e.g., Klapuri, 1999), pitch estimation (e.g., Brown and Puckette, 1994; Klapuri, 2003), and beat tracking (e.g., Dixon, 2001; Large and Kolen, 1994; Toiviainen, 1998, 2001).

ACKNOWLEDGMENTS
This work was supported by the Academy of Finland (Project No. 102253).

Brown, J. C. (1993). “Determination of meter of musical scores by autocorrelation,” J. Acoust. Soc. Am. 94, 1953–1957.
Brown, J. C., and Puckette, M. S. (1994). “A high resolution fundamental frequency determination based on phase changes of the Fourier transform,” J. Acoust. Soc. Am. 94, 662–667.
Cooper, G., and Meyer, L. B. (1960). The Rhythmic Structure of Music (Univ. of Chicago, Chicago).
Dawe, L. A., Platt, J. R., and Racine, R. J. (1993). “Harmonic accents in inference of metrical structure and perception of rhythm patterns,” Percept. Psychophys. 54, 794–807.
Dixon, S. (2001). “Automatic extraction of tempo and beat from expressive performances,” J. New Music Res. 30, 39–58.
Dixon, S., and Cambouropoulos, E. (2000). “Beat tracking with musical knowledge,” in ECAI 2000: Proc. 14th European Conference on Artificial Intelligence (IOS, Amsterdam, Netherlands), pp. 626–630.
Eerola, T., and Toiviainen, P. (2004a). Digital Archive of Finnish Folk Tunes, University of Jyväskylä, Jyväskylä, Finland. Available at: http://www.jyu.fi/musica/sks/
Eerola, T., and Toiviainen, P. (2004b). MIDI Toolbox: MATLAB Tools for Music Research, University of Jyväskylä, Jyväskylä, Finland. Available at: http://www.jyu.fi/musica/miditoolbox/
Fraisse, P. (1982). “Rhythm and tempo,” in Psychology of Music, edited by D. Deutsch (Academic, New York), pp. 149–180.
Hannon, E., Snyder, J., Eerola, T., and Krumhansl, C. L. (2004). “The role of melodic and temporal cues in perceiving musical meter,” J. Exp. Psychol. Hum. Percept. Perform. 30, 956–974.
Huron, D., and Royal, M. (1996). “What is melodic accent? Converging evidence from musical practice,” Music Percept. 13, 489–516.
Klapuri, A. (1999). “Sound onset detection by applying psychoacoustic knowledge,” Proc. IEEE Int. Conf. Acoustics Speech and Sig. Proc. (ICASSP), Phoenix, AZ, pp. 3089–3092.
Klapuri, A. (2003). “Multiple fundamental frequency estimation by harmonicity and spectral smoothness,” IEEE Trans. Speech Audio Process. 11, 804–816.
Lachenbruch, P. A., and Mickey, M. R. (1968). “Estimation of error rates in discriminant analysis,” Technometrics 10, 1–11.
Large, E. W., and Kolen, J. F. (1994). “Resonance and the perception of musical meter,” Connection Sci. 6, 177–208.
Lerdahl, F., and Jackendoff, R. (1983). A Generative Theory of Tonal Music (MIT, Cambridge, MA).
London, J. (2002). “Cognitive constraints on metric systems: some observations and hypotheses,” Music Percept. 19, 529–550.
Parncutt, R. (1994). “A perceptual model of pulse salience and metrical accent in musical rhythms,” Music Percept. 11, 409–464.
Salton, G., and McGill, M. (1983). Introduction to Modern Information Retrieval (McGraw Hill, New York).
Schaffrath, H. (1995). The Essen Folksong Collection in Kern Format (computer database), edited by D. Huron (Center for Computer Assisted Research in the Humanities, Menlo Park, CA).
Thomassen, J. M. (1982). “Melodic accent: Experiments and a tentative model,” J. Acoust. Soc. Am. 71, 1596–1605.
Toiviainen, P. (1997). “Modelling the perception of metre with competing subharmonic oscillators,” Proc. Third Triennial ESCOM Conference, Uppsala, Uppsala University, pp. 511–516.
Toiviainen, P. (1998). “An interactive MIDI accompanist,” Comput. Music J. 22, 63–75.
Toiviainen, P. (2001). “Real-time recognition of improvisations with adaptive oscillators and a recursive Bayesian classifier,” J. New Music Res. 30, 137–148.
van Noorden, L., and Moelants, D. (1999). “Resonance in the perception of musical pulse,” J. New Music Res. 28, 43–66.
Vos, P. G., van Dijk, A., and Schomaker, L. (1994). “Melodic cues for metre,” Perception 23, 965–976.