
THE ROLE OF ACCENT PERIODICITIES IN METER INDUCTION: A CLASSIFICATION STUDY Petri Toiviainen & Tuomas Eerola Department of Music, University of Jyväskylä, Finland

ABSTRACT We tested the performance of autocorrelation-based meter induction with two large collections of folk melodies (N = 12368), for which the correct meter was available. Furthermore, we examined whether including different melodic accent types would improve the classification performance, as measured by the proportion of melodies whose meter was correctly classified by a discriminant function. The components of the autocorrelation functions that were significant in the classification were determined. It was found that periodicity in note onset locations was the most important cue for the determination of meter. Of the melodic accents included, Thomassen's melodic accent was found to provide the most reliable cues for the determination of meter. The discriminant function analyses suggested that periodicities longer than one measure may provide cues for meter determination that are more reliable than shorter periodicities.

1. INTRODUCTION

Most music is organized to contain temporal periodicities that evoke a percept of regularly occurring pulses, or beats. The period of the most salient pulse is typically within the range of 400 to 900 msec [1,2,3]. The perceived pulses are often hierarchically organized and consist of at least two simultaneous levels, whose periods have an integer ratio. This gives rise to a percept of regularly alternating strong and weak beats, a phenomenon referred to as meter [4,5]. In Western music, the ratio of the pulse lengths is usually limited to 1:2 (duple meter) and 1:3 (triple meter). Meter in which each beat has three subdivisions, such as 6/8 or 9/8, is referred to as compound meter. A number of computational models have been developed for the extraction of the basic pulse from music. Modeling of meter perception has, however, received less attention. Large [6] presented a model of meter perception based on resonating oscillators. Toiviainen [7] presented a model of competing subharmonic oscillators for determining the meter (duple vs. triple) from an acoustical representation of music. Brown [8] proposed a method for determining the meter of musical scores by applying autocorrelation to a temporal function consisting of impulses at each tone onset whose heights are weighted by the respective tone durations. It must be noted that this method is only capable of producing an estimate of the meter, that is, the period of strong beats, but cannot be used for determining the locations of strong beats.

ISBN 1-876346-50-7 © 2004 ICMPC


Although there is evidence that the pitch information present in music may affect the perception of pulse and meter [9,10], most models of pulse finding developed to date rely only on note onset times and durations. Dixon and Cambouropoulos [11], however, proposed a multi-agent model for beat tracking that makes use of pitch and amplitude information. They found that including this information when determining the salience of notes significantly improved the performance of their model. Vos, van Dijk and Schomaker [12] applied autocorrelation to the determination of meter in isochronous music. They utilized a method similar to that proposed in [8], except that they used melodic intervals between subsequent notes to represent the accent of each note. Using a corpus of 30 compositions by J. S. Bach, they found that the maxima of the obtained autocorrelation functions matched the respective bar lengths as indicated in the musical score. Using a correlational study, Huron and Royal [13] investigated the extent to which different types of pitch-related accent correspond to respective metrical positions as indicated by musical notation. Of eight different types of melodic accent, they found that the empirically derived accent by Thomassen [10] was the only one to correlate significantly with metric position. The lack of correlation of the other accent types does not, however, exclude the possibility that various metric accents exhibit periodic structure that may serve as an additional cue for meter induction (for an example, see Fig. 2).

The present study investigated the performance of autocorrelation-based meter induction with large collections of folk melodies, for which the correct (notated) meters are available. The performance was assessed by the proportion of melodies that were correctly classified in terms of their meter (duple vs. triple/compound). Furthermore, the components of the autocorrelation function that are significant in the classification were determined.
Finally, a number of different types of melodic accent and combinations thereof were applied to the classification to assess the significance of each of them in the induction of meter.

2. AUTOCORRELATION AND METER

Below, the method for constructing the autocorrelation function for meter induction is described. For the original description, see [8]. Let the melody consist of N notes with onset times $t_i$, $i = 1, 2, \ldots, N$. Each note is associated with an accent value $a_i$, $i = 1, 2, \ldots, N$; in [8], $a_i$ equals the duration of the respective note. The onset impulse function f is a time series consisting of impulses of height $a_i$ located at each note onset position:

ICMPC8, Evanston, IL, USA

August 3-7, 2004

$$f(n) = \sum_{i=1}^{N} a_i \delta_i(n), \quad n = 0, 1, 2, \ldots \tag{1}$$

where

$$\delta_i(n) = \begin{cases} 1, & n = [t_i / dt] \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

Here dt denotes the sampling interval and [ ] denotes rounding to the nearest integer. Autocorrelation refers to the correlation of two copies of a time series that are temporally shifted with respect to each other. For a given amount of shift (or lag), a high value of autocorrelation suggests that the series contains a periodicity with length equaling the lag. In the present study, the autocorrelation function F was defined as

$$F(m) = \frac{\sum_n f(n) f(n - m)}{\sum_n f(n)^2} \tag{3}$$

where m denotes the lag in units of sampling interval; the denominator normalizes the function to F(0) = 1 irrespective of the length of the sequence. Often, the lag corresponding to the maximum of the autocorrelation function provides an estimate of the meter. This is the case for the melody depicted in Fig. 1. Sometimes the temporal structure alone is not sufficient for deducing the meter. This holds, for example, for isochronous melodies. In such cases, melodic structure may provide cues for the determination of meter.

Fig. 1. Excerpt from a melody, its onset impulse function weighted by durational accents (black bars) and the corresponding autocorrelation function (blue bars). The maximum of the autocorrelation function at the lag of 4/8 indicates duple meter.

3. MATERIAL

The material used in the experiment consisted of folk melodies in MIDI file format taken from two collections: the Essen collection [14], consisting of mainly European folk melodies, and the Digital Archive of Finnish Folk Tunes [15]. For the present experiment, all melodies in either duple (2/4, 4/4, 4/8, etc.; $2^n$ eighth notes per measure) or triple/compound (3/8, 3/4, 6/8, 9/8, 12/8, etc.; $3n$ eighth notes per measure) meter were chosen. Consequently, a total of 5507 melodies in the Essen collection were used in the experiment, of which 3121 (56.7%) were in duple and 2386 (43.3%) were in triple/compound meter. From the Finnish collection, 6861 melodies were used, of which 5518 (80.4%) were in duple and 1343 (19.6%) were in triple/compound meter.

4. METHOD

For each of the melodies in the two collections, we constructed a set of onset impulse functions weighted by various accent types (Eqs. 1 and 2). In each case the sampling interval was set to 1/16 note. The accents consisted of (1) durational accent ($a_i$ equals tone duration); (2) Thomassen's (1982) melodic accent; (3) interval size in semitones between previous and current tone (see e.g. Vos et al., 1994); (4) melodic closure (Narmour, 1990); (5) pivotal accent ($a_i = 1$ if melody changes direction, $a_i = 0$ otherwise); and (6) gross contour accent ($a_i = 1$ for ascending interval, $a_i = -1$ for descending interval, $a_i = 0$ otherwise). Since the note onset times alone, without regard to any accent structure, provide information about metrical structure, we further included (7) constant accent ($a_i = 1$). The analysis was carried out using the MIDI Toolbox for Matlab [16]. For each melody, each of the onset impulse functions was subjected to autocorrelation. The components of the obtained autocorrelation functions corresponding to lags of 1, 2, ..., 16 eighth notes were included in the subsequent analyses. Figure 2 depicts the onset impulse functions and the respective autocorrelation functions constructed from a melodic excerpt using each of the accent types described above.
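The construction described above (Eqs. 1 to 3 with a 1/16-note sampling interval, read out at lags of 1 to 16 eighth notes) can be sketched in a few lines of Python. This is only an illustrative sketch and not the MIDI Toolbox implementation used in the study: all function names are hypothetical, onsets are assumed to be given in eighth-note units, the first note is assumed to receive zero interval, contour and pivot accent, and a "change of direction" is assumed to mean that the interval into a note and the interval out of it have opposite signs. Thomassen's accent and melodic closure are omitted because they require the full models from the cited sources. The toy rhythm at the end is invented for illustration.

```python
def onset_impulse(onsets, accents, dt):
    """Eqs. 1-2: place an impulse of height a_i at sample [t_i / dt]."""
    # int(x + 0.5) rounds non-negative values to the nearest integer
    # (how ties are resolved is an assumption; the text does not say).
    length = int(max(onsets) / dt + 0.5) + 1
    f = [0.0] * length
    for t, a in zip(onsets, accents):
        f[int(t / dt + 0.5)] += a
    return f


def autocorrelation(f, max_lag):
    """Eq. 3: F(m) = sum_n f(n) f(n - m) / sum_n f(n)^2 for m = 0..max_lag."""
    energy = sum(v * v for v in f)
    if energy == 0:
        return [0.0] * (max_lag + 1)  # all-zero accents: nothing to measure
    return [
        sum(f[n] * f[n - m] for n in range(m, len(f))) / energy
        for m in range(max_lag + 1)
    ]


def interval_accent(pitches):
    """Accent (3): interval size in semitones from the previous tone."""
    return [0] + [abs(b - a) for a, b in zip(pitches, pitches[1:])]


def contour_accent(pitches):
    """Accent (6): +1 for ascending, -1 for descending, 0 otherwise."""
    return [0] + [(b > a) - (b < a) for a, b in zip(pitches, pitches[1:])]


def pivot_accent(pitches):
    """Accent (5): 1 where the melody changes direction, 0 otherwise."""
    c = contour_accent(pitches)
    out = [0] * len(pitches)
    for i in range(1, len(pitches) - 1):
        # Direction into note i is opposite to the direction out of it.
        if c[i] != 0 and c[i + 1] != 0 and c[i] != c[i + 1]:
            out[i] = 1
    return out


def acf_eighth_lags(onsets, accents, max_lag=16):
    """Autocorrelation at lags 1..max_lag eighth notes, 1/16-note sampling."""
    dt = 0.5  # one sixteenth note, expressed in eighth-note units
    F = autocorrelation(onset_impulse(onsets, accents, dt), 2 * max_lag)
    return {k: F[2 * k] for k in range(1, max_lag + 1)}


# Invented toy rhythm: (quarter, eighth, eighth) over four 2/4 measures.
onsets = [0, 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 15]
durations = [2, 1, 1] * 4
acf_const = acf_eighth_lags(onsets, [1] * len(onsets))  # accent (7)
acf_dur = acf_eighth_lags(onsets, durations)            # accent (1)
best_lag = max(range(1, 9), key=lambda k: acf_const[k])
print(best_lag, acf_const[best_lag])
```

For this toy rhythm the strongest component among lags 1 to 8 falls at 4 eighth notes, the length of one 2/4 measure, which is the kind of cue the classification analyses rely on.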

Fig. 2. Onset impulse functions constructed from a melodic excerpt using the seven accent types described in the text (black bars); the respective autocorrelation functions (blue bars). As can be seen, the melodic accents frequently fail to co-occur either with each other or with the durational accents. All the autocorrelation functions, however, have maxima at lags of either 6/8 or 12/8, indicating triple or compound meter.

The extent to which these autocorrelation functions could predict the meter of each melody was assessed by means of stepwise discriminant function analyses, in which various subsets of autocorrelation functions were used as independent variables and the meter (duple vs. triple/compound) as the dependent variable. The measures observed in the analyses were the percentage of correctly classified cases, the order in which variables entered into the discriminant function, and the discriminant function coefficients.
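To make the classification step concrete, the following sketch fits a plain two-class Fisher linear discriminant on two autocorrelation components and reports the proportion of correctly classified cases. It is a simplification under stated assumptions: the study used stepwise discriminant function analysis, whereas this sketch has no variable selection, handles exactly two features, and places the decision threshold midway between the class means (equal priors). The function names and the feature values are invented purely for illustration and are not data from either collection.

```python
from statistics import mean


def fit_fisher_discriminant(class_a, class_b):
    """Fit w, b so that w.x + b > 0 favours class_a (two features only)."""
    mean_a = [mean(col) for col in zip(*class_a)]
    mean_b = [mean(col) for col in zip(*class_b)]
    # Pooled within-class scatter matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, mu in ((class_a, mean_a), (class_b, mean_b)):
        for x in rows:
            d = [x[0] - mu[0], x[1] - mu[1]]
            for r in range(2):
                for c in range(2):
                    s[r][c] += d[r] * d[c]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    # Discriminant direction: inverse scatter times difference of means.
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(mean_a[0] + mean_b[0]) / 2, (mean_a[1] + mean_b[1]) / 2]
    b = -(w[0] * mid[0] + w[1] * mid[1])  # threshold at the midpoint
    return w, b


def classification_rate(w, b, class_a, class_b):
    """Proportion of cases that fall on the correct side of the boundary."""
    score = lambda x: w[0] * x[0] + w[1] * x[1] + b
    correct = sum(score(x) > 0 for x in class_a)
    correct += sum(score(x) <= 0 for x in class_b)
    return correct / (len(class_a) + len(class_b))


# Invented (F(4/8), F(6/8)) pairs, NOT taken from the study's data.
duple = [(0.8, 0.2), (0.7, 0.3), (0.9, 0.1), (0.6, 0.3)]
triple = [(0.3, 0.7), (0.2, 0.8), (0.4, 0.6), (0.3, 0.9)]
w, b = fit_fisher_discriminant(duple, triple)
rate = classification_rate(w, b, duple, triple)
print(w, b, rate)
```

With real autocorrelation components the two classes overlap, so the rate would fall below 1; the toy values are deliberately well separated.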

5. RESULTS

The analysis was started by considering the autocorrelation function based on durational accents. First, only the components corresponding to lags of 4 and 6 eighth notes were used as independent variables. This yielded a correct classification rate of 80.9% for the Essen collection and 84.6% for the Finnish collection. Thus, a significant proportion of melodies were misclassified, suggesting that periodicities of 4/8 and 6/8 in durational accent are not sufficient for reliable classification. This can be clearly seen in Figure 3, where the values of these components are displayed as a scatter plot for each melody in the Essen collection, showing a significant overlap between melodies representing the two types of meter.

Fig. 3. Scatter plot of the values for lags 4/8 and 6/8 of the durational-accent-based autocorrelation function for the melodies in the Essen collection that were used in the present study (N = 5507). Blue and red circles represent melodies in duple and triple/compound meter, respectively.

Next, all the components of the durational-accent-based autocorrelation function were entered into the analysis. This yielded a correct classification rate of 90.5% for the Essen collection and 93.1% for the Finnish collection. Inclusion of all 16 components as independent variables thus improved classification performance significantly. The first components to enter into the discriminant function were lags of 8/8, 12/8, 16/8, and 4/8 (in this order) for the Essen collection, and 4/8, 12/8, 16/8, and 2/8 for the Finnish collection. This suggests that periodicities longer than one measure may provide cues for meter determination that are more reliable than shorter periodicities.

Subsequently, discriminant function analyses were carried out with the autocorrelation functions obtained using each of the remaining accents, one at a time. The results are summarized in Table 1. Somewhat unexpectedly, the autocorrelation function obtained by ignoring any accent structure (constant accent) yielded for both collections the highest classification rate. This was followed by durational accent and Thomassen's melodic accent, in this order.

Finally, all autocorrelation functions were used together as independent variables. This yielded a correct classification rate of 95.3% and 96.4% for the Essen collection and the Finnish collection, respectively. The first variable to enter into the discriminant function was lag 8/8 with durational accent for the Essen collection and lag 4/8 with constant accent for the Finnish collection. The next three variables to enter were the same for both collections: lag 12/8 with constant accent, lag 16/8 with constant accent, and lag 6/8 with melodic accent, in this order. This analysis contained a total of 128 independent variables (16 for each of the 8 accent types); due to the large number of cases, a large number of variables were entered into the stepwise analyses.

To obtain a simpler model for meter classification, we performed a further discriminant function analysis in which lags 3/8, 4/8, 6/8, 8/8, 12/8 and 16/8 with constant accent and Thomassen's melodic accent were included. Using both collections together (N = 12368), we obtained the following discriminant function:

$$\begin{aligned} \Delta = {} & -1.042 + 0.318 F_{const}(3/8) + 5.240 F_{const}(4/8) - 0.630 F_{const}(6/8) \\ & + 0.745 F_{const}(8/8) - 8.122 F_{const}(12/8) + 4.160 F_{const}(16/8) \\ & - 0.978 F_{mel}(3/8) + 1.018 F_{mel}(4/8) - 1.657 F_{mel}(6/8) \\ & + 1.419 F_{mel}(8/8) - 2.205 F_{mel}(12/8) + 1.568 F_{mel}(16/8) \end{aligned}$$

where $F_{const}$ and $F_{mel}$ denote autocorrelation functions obtained with constant and Thomassen's melodic accent, respectively. With this discriminant function, the correct classification rates were 92.9% and 94.8% for the Essen and the Finnish collection, respectively. Variables that received the largest coefficients in the discriminant function were $F_{const}(12/8)$, $F_{const}(4/8)$, $F_{const}(16/8)$, $F_{mel}(12/8)$, and $F_{mel}(6/8)$, in this order. This suggests that temporal structure at a hyper-measure level, such as at the phrase level, may produce cues for meter determination. Further, melodic accent structure may provide additional cues, especially for triple and compound meters.
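As an illustration of how the reported discriminant function could be evaluated, the sketch below computes its value from the twelve autocorrelation components. The coefficients are copied from the function given above; the function name and the dictionary layout (lag in eighth notes mapped to autocorrelation value) are hypothetical. The text does not state which sign of the score corresponds to which meter class or where the decision threshold lies, so the sketch returns only the raw score and leaves the class assignment open.

```python
# Coefficients as reported in the text, keyed by lag in eighth notes.
INTERCEPT = -1.042
COEFFS_CONST = {3: 0.318, 4: 5.240, 6: -0.630, 8: 0.745, 12: -8.122, 16: 4.160}
COEFFS_MEL = {3: -0.978, 4: 1.018, 6: -1.657, 8: 1.419, 12: -2.205, 16: 1.568}


def discriminant_score(f_const, f_mel):
    """Evaluate the reported discriminant function.

    f_const, f_mel: dicts mapping lag (in eighth notes) to the value of the
    autocorrelation function obtained with constant accent and with
    Thomassen's melodic accent, respectively.
    """
    score = INTERCEPT
    score += sum(c * f_const[lag] for lag, c in COEFFS_CONST.items())
    score += sum(c * f_mel[lag] for lag, c in COEFFS_MEL.items())
    return score


# Hypothetical input: no periodicity at all except a full-strength
# constant-accent component at lag 4/8.
zeros = {lag: 0.0 for lag in COEFFS_CONST}
only_lag_4 = dict(zeros)
only_lag_4[4] = 1.0
print(discriminant_score(zeros, zeros))
print(discriminant_score(only_lag_4, zeros))
```

The signs of the coefficients hint at the structure of the function: components at 4/8 and 16/8 push the score up, while components at 6/8 and 12/8 push it down.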

6. CONCLUSIONS

In conformance with the general view, it was found that periodicity in note onset locations was the most important cue for the determination of meter. The results also imply that periodicities longer than one measure provide additional information for meter induction. A somewhat unexpected finding was that ignoring the accent structure in the onset impulse function yielded the best classification rate. This result is difficult to explain and calls for further investigation. Furthermore, the results suggest that periodicity in melodic accent structure may serve as an additional cue in the determination of meter. In particular, including Thomassen's melodic accent was found to improve classification performance to some extent. This, however, does not necessarily imply that melodic and durational accents need co-occur.

| Collection | Measure | Duration (1) | Thomassen (2) | Interval (3) | Closure (4) | Pivot (5) | Contour (6) | Constant (7) |
|---|---|---|---|---|---|---|---|---|
| Essen Collection (N = 5507) | Classification rate (%) | 90.5 | 86.8 | 85.9 | 68.1 | 78.5 | 73.5 | 91.5 |
| Essen Collection (N = 5507) | Primary components | 8, 12, 16 | 16, 12, 8 | 16, 12, 4 | 16, 12, 4 | 16, 12, 6 | 16, 12, 8 | 8, 12, 16 |
| Finnish Folk Tunes (N = 6861) | Classification rate (%) | 93.1 | 90.1 | 87.6 | 75.2 | 79.4 | 78.1 | 94.7 |
| Finnish Folk Tunes (N = 6861) | Primary components | 4, 12, 16 | 8, 12, 4 | 8, 12, 16 | 12, 16, 6 | 6, 16, 12 | 12, 6, 16 | 4, 12, 16 |

Table 1. Proportions of correctly classified melodies (classification rate) and components of autocorrelation function that entered first in each analysis (primary components; numbers refer to lags in units of one eighth note) for both collections and each accent type.

An additional finding that calls for further study was the significant difference between the correct classification rates for melodies in duple and triple/compound meter. More specifically, melodies in duple meter were more often correctly classified than melodies in triple/compound meter. A detailed investigation of misclassified melodies could provide insight into this question. The present study considered only the classification between duple and triple/compound meter. We plan to apply the method to more detailed classification, such as between simple duple (e.g., 2/4), simple triple (e.g., 3/4), compound duple (e.g., 6/8) and compound triple (e.g., 9/8) meters. Additionally, other meters such as 5/4 and 7/4 should be investigated in subsequent studies.

7. REFERENCES

1. Fraisse, P. (1982). Rhythm and tempo. In Deutsch, D. (Ed.), Psychology of music (pp. 149-180). New York: Academic Press.

2. Parncutt, R. (1994). A perceptual model of pulse salience and metrical accent in musical rhythms. Music Perception, 11, 409-464.

3. van Noorden, L., & Moelants, D. (1999). Resonance in the perception of musical pulse. Journal of New Music Research, 28, 43-66.

4. Cooper, G., & Meyer, L. B. (1960). The rhythmic structure of music. Chicago: University of Chicago Press.

5. Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.

6. Large, E. W., & Kolen, J. F. (1994). Resonance and the perception of musical meter. Connection Science, 6(1), 177-208.

7. Toiviainen, P. (1997). Modelling the perception of metre with competing subharmonic oscillators. In A. Gabrielsson (Ed.), Proceedings of the Third Triennial ESCOM Conference. Uppsala: Uppsala University, 511-516.

8. Brown, J. C. (1993). Determination of meter of musical scores by autocorrelation. Journal of the Acoustical Society of America, 94, 1953-1957.

9. Dawe, L. A., Platt, J. R., & Racine, R. J. (1993). Harmonic accents in inference of metrical structure and perception of rhythm patterns. Perception and Psychophysics, 54, 794-807.

10. Thomassen, J. M. (1982). Melodic accent: Experiments and a tentative model. Journal of the Acoustical Society of America, 71, 1596-1605.

11. Dixon, S., & Cambouropoulos, E. (2000). Beat tracking with musical knowledge. In ECAI 2000: Proceedings of the 14th European Conference on Artificial Intelligence (626-630). IOS Press.

12. Vos, P. G., van Dijk, A., & Schomaker, L. (1994). Melodic cues for metre. Perception, 23, 965-976.

13. Huron, D., & Royal, M. (1996). What is melodic accent? Converging evidence from musical practice. Music Perception, 13(4), 489-516.

14. Schaffrath, H. (1995). The Essen Folksong Collection in Kern Format. [computer database]. D. Huron (ed.). Menlo Park, CA: Center for Computer Assisted Research in the Humanities.

15. Eerola, T., & Toiviainen, P. (2004). Digital archive of Finnish folk tunes. University of Jyväskylä: Jyväskylä, Finland. Available at: http://www.jyu.fi/musica/sks/

16. Eerola, T., & Toiviainen, P. (2004). MIDI Toolbox: MATLAB Tools for Music Research. University of Jyväskylä: Jyväskylä, Finland. Available at: http://www.jyu.fi/musica/miditoolbox