A geometric approach to computing repeated patterns in polyphonic music∗

David Meredith, Department of Computing, City University, London. [email protected]

Kjell Lemström, Department of Computer Science, University of Helsinki. [email protected]

Geraint A. Wiggins, Department of Computing, City University, London. [email protected]

December 28, 2001

Contents

1 Introduction
2 The importance of repetition in music perception and analysis
3 Most repetitions in music are not perceptually significant
4 The class of perceptually significant repetitions is very diverse
5 A survey of algorithms for discovering repetitions in music
  5.1 Introduction
  5.2 Some string-processing concepts and terminology
  5.3 String-processing algorithms for discovering repetitions in music
6 Representing music using multidimensional datasets
7 The functions computed by SIA and SIATEC
8 The SIA and SIATEC algorithms
  8.1 The SIA algorithm
  8.2 The SIATEC algorithm
9 Running SIA and SIATEC on music data
10 Isolating perceptually significant repetitions
11 Summary and possible directions for further work
  11.1 Summary
  11.2 Some possible directions for further work

∗ The algorithms described in this paper are the subject of an active patent application submitted on 23 May 2001 (Meredith et al., 2001).

1 Introduction

In this paper we address the problem of designing an algorithm that, when given a representation of a passage of music as input, discovers all and only instances of perceptually significant repetition in the passage. The algorithm must be able to discover perceptually significant instances of repetition where the occurrences of the pattern are identical (exact repetition) and instances where the occurrences differ but are nonetheless perceived to be ‘versions’ of the same thing (modified repetition). Such an algorithm would have both scientific and engineering applications. One possible scientific use is as an important component in a computational model of expert music cognition. For example, if one were interested in evaluating hypotheses about the cognitive mechanisms underlying the perception of repetitions in music, one could embody these hypotheses as repetition discovery algorithms and then run the algorithms on datasets representing musical works. A repetition discovery algorithm could also be used to build useful tools for music analysts and composers. For example, a music analyst might find such an algorithm useful for discovering characteristic structural features in the works of a particular composer or in a particular genre. An analyst might also use the algorithm for analysing the structure of a work into its elemental ‘building blocks’. A composer who is suffering from ‘writer’s block’ could use such an algorithm to find significant repeated structures in an unfinished work thus discovering a fresh perspective that might stimulate further progress on the work. Another important practical use for such an algorithm is for music database indexing. The repeated structures discovered by the algorithm could be used for sparse indexing in cases where the databases used for content-based music retrieval are enormous. To enable interactive queries to such databases, some kind of indexing is certainly required. 
However, even if a highly effective indexing structure is used, such as the most economical version of suffix trees (Kurtz, 1999), the index will be of excessive size (Lemström, 2000). For example, in the case of a database of 100,000 MIDI songs (which by current-day standards is only a moderately-sized database) (Bainbridge et al., 1999), a suffix tree would require at least 5 GB of main memory, which is well beyond that typically available on current-day personal computers. Therefore, it would seem that a sensible strategy in such a case would involve an indexing structure storing only the musically most significant patterns. In a straightforward solution all the recurring patterns could be stored in such a structure, whereas a more complicated but more efficient solution would involve using analysis heuristics to store only those patterns that listeners perceive to be the most significant. If we assume that the perceptually significant patterns will typically be used as queries, then in this way we might be able to achieve interactive performance using only a moderately sized indexing structure. If the query cannot be located in the indexing structure, a fast online technique could be used such as those described by Lemström (2000), perhaps in combination with the approach outlined by Lemström et al. (2001).

There are thus good scientific and practical reasons for attempting to develop an algorithm that can discover the perceptually significant instances of repetition in a musical passage.

2 The importance of repetition in music perception and analysis

Many music analysts and music psychologists (see, for example, Bent and Drabkin, 1987; Lerdahl and Jackendoff, 1983; Nattiez, 1975; Ruwet, 1972) have stressed that the identification of perceptually significant instances of repetition is an essential step in the process by which an expert listener achieves a rich interpretation of a musical work. One of the four main components of Lerdahl and Jackendoff’s Generative Theory of Tonal Music is devoted to what they call grouping structure (Lerdahl and Jackendoff, 1983, pp. 36–67). They describe the task of ‘grouping’ a musical surface as being ‘an auditory analog of the partitioning of


Figure 1: Opening measures from Beethoven’s Quartet, Op. 18, No. 1 (after Lerdahl and Jackendoff, 1983, p. 51).

the visual field into objects, parts of objects, and parts of parts of objects’ (Lerdahl and Jackendoff, 1983, p. 36). Lerdahl and Jackendoff’s grouping structure theory consists of two sets of rules: five ‘grouping well-formedness rules’ (GWFRs) and seven ‘grouping preference rules’ (GPRs). The GWFRs ‘define the formal notion group by stating the conditions that all possible grouping structures must satisfy’ (Lerdahl and Jackendoff, 1983, p. 37). The GPRs, on the other hand, ‘establish which of the formally possible structures that can be assigned to a piece correspond to the listener’s actual intuitions’. Of the seven GPRs, numbers 1 to 4 deal with local detail and numbers 5 to 7 deal with higher-level grouping. Of the three higher-level grouping rules, one in particular stands out as being of special relevance to the main problem addressed in this paper. This is GPR 6, the ‘Parallelism’ rule: GPR 6 (Parallelism) Where two or more segments of the music can be construed as parallel, they preferably form parallel parts of groups. (Lerdahl and Jackendoff, 1983, p. 51) Lerdahl and Jackendoff point out that ‘GPR 6 says specifically that parallel passages should be analyzed as forming parallel parts of groups rather than entire groups’ and explain that it is stated this way so that it can ‘deal with the common situation in which groups begin in parallel fashion and diverge somewhere in the middle, often in order for the second group to make a cadential formula’. Figure 1 shows an example of this. Measure 3 is an exact repetition of measure 1 and measure 5 is an exact transposition up an octave of measure 1. Measures 1, 3 and 5 are therefore segments that ‘can be construed as parallel’ and should therefore, according to GPR 6, ‘preferably form parallel parts of groups’. Since measure 1 can only form the beginning of a group (it begins the piece), measures 3 and 5 must therefore each form the beginning of its group. 
This example demonstrates that the recognition of parallelism—and, in particular, the identification of certain instances of exact and exact transposed repetition—is ‘important in establishing intermediate-level groupings’. Lerdahl and Jackendoff claim that GPR 6 (and, by implication, the identification and cataloguing of repetition) is also the major factor in all large-scale grouping. For example, it recognizes the parallelism


between the exposition and the recapitulation of a sonata movement, and assigns them parallel groupings at a very large level, establishing major structural boundaries in the movement. (Lerdahl and Jackendoff, 1983, p. 52) Lerdahl and Jackendoff state that the importance of parallelism in musical structure cannot be overestimated. The more parallelism one can detect, the more internally coherent an analysis becomes, and the less independent information must be processed and retained in hearing or remembering a piece. (Lerdahl and Jackendoff, 1983, p. 52) Unfortunately, they admit that they are unable to state precisely the conditions that must be satisfied by two passages before they can legitimately be ‘construed as parallel’. Nonetheless, they are able to say that two passages that are ‘identical’ (i.e., related to each other by exact or exact transposed repetition) ‘certainly count as parallel’. Lerdahl and Jackendoff are not the only authors who have emphasized that identifying perceptually significant repetitions (or ‘parallelism’) is an important part of the process by which a listener achieves a rich interpretation of a musical work. In his overview of the field of music analysis for The New Grove, Ian Bent defines music analysis as the resolution of a musical structure into relatively simpler constituent elements, and the investigation of the functions of those elements within that structure. (Bent and Drabkin, 1987, p. 1) He points out that ‘the phrase ‘musical analysis’, taken in a general sense, embraces a large number of diverse activities’ that ‘represent fundamentally different views of the nature of music’ (Bent and Drabkin, 1987, p. 1). Nevertheless, in his view, the ‘central activity’ of music analysis is comparison and ‘the central analytical act is thus the test for identity’ (Bent and Drabkin, 1987, p. 5). 
This is especially true of musical semiotics, a field of analysis pioneered by Nicolas Ruwet (1972) and developed further by a number of authors including, in particular, Jean-Jacques Nattiez (1975). Ruwet first described the basic method of semiotic analysis in 1966 in his paper Méthodes d’analyse en musicologie (Ruwet, 1966). The aim of the method is to take a monophonic musical surface and derive from this surface a syntax that can account for its structure. The first step in this process is to segment the surface and Ruwet uses the principle of repetition as his ‘principal criterion for segmentation’: Je. . . choisirai, comme principal critère de division, la répétition. Je partirai de la constatation empirique du rôle énorme joué en musique, à tous les niveaux, par la répétition. . . (Ruwet, 1972, p. 111) [I choose to use repetition as my principal criterion for segmentation. I will take as my starting point the empirical observation that repetition plays an enormously important role in music at all levels.] In this respect, Ruwet was following the precedent set by Gilbert Rouget in his study of African chromaticism:


Figure 2: Measures 1–6 of Rachmaninoff’s Prelude in C sharp minor, Op. 3, No. 2.

c’est sur la répétition – ou l’absence de répétition – qu’est fondé notre découpage. Lorsqu’une suite de sons est énoncée à deux ou plusieurs reprises, avec ou sans variante, elle est considérée comme une unité. Corollairement, une suite de sons énoncée une seule fois, quels que soient sa longueur et le nombre apparent de ses articulations (notamment les silences) est considérée elle aussi comme une unité. . . (Rouget, 1961, p. 41) [Our segmentation is based on repetition – or the absence of repetition. When a sequence of sounds is stated two or more times, with or without variation, it is considered to be a unit. It follows from this that a sequence of sounds stated once, regardless of how long it may be or how many points of articulation (particularly rests) it may seem to contain, is also considered to be a unit. . . ]

3 Most repetitions in music are not perceptually significant

Our discussion in the previous section suggests that many music analysts and psychologists of music agree that the identification and cataloguing of significant instances of repetition within a musical work forms an important step in the process by which a listener achieves a rich interpretation of that work. However, not all instances of repetition within a work are perceptually significant. Consider, for example, the passage shown in Figure 2. The fourth measure (excluding the first quaver) is an exact repetition of the third measure and the fifth and sixth measures are modifications of the third measure. Most listeners, performers and music analysts would probably agree that recognizing these instances of repetition enriches one’s interpretation of the music. However, the pattern consisting of the notes in square boxes in Figure 2 is an exact transposed repetition of the pattern consisting of the circled notes. The ‘circled note pattern’ can be transformed into the ‘square box pattern’ by translating it 7 crotchets along the time axis and then transposing it up a minor ninth. Most listeners, performers and music analysts would probably not notice this instance of repetition unless it were first pointed out to them—we discovered it using the SIATEC algorithm described below. It is actually hard to hear the repetition even when one knows it is there, unless the notes in the pattern


are emphasized by, for example, playing them louder. Moreover, most listeners would probably agree that knowing about this instance of repetition does not enrich one’s understanding of the music. Indeed, hearing this repetition requires so much concentration and distracts the listener so much from the rest of the music that it is almost impossible to attend simultaneously to the perceptually significant repetitions in the passage such as the exact and modified re-statements of measure 3. In fact, it seems that in a typical piece of Western music, the vast majority of exact repetitions are not perceptually significant (see section 10 below). Therefore, the task of developing an algorithm that isolates all and only the perceptually significant instances of repetition in a work involves formally characterising what it is about these interesting repetitions that distinguishes them from the many repetitions that the listener and analyst either do not hear or do not consider to be perceptually significant.

4 The class of perceptually significant repetitions is very diverse

Unfortunately, it appears that the class of perceptually significant repetitions is a very diverse set. There are at least two reasons for this. First, the patterns involved in such repetitions vary widely in their structural characteristics. Second, there are many different ways in which a pattern can be modified to give a second pattern that is perceived to be a version of it. In section 6 below, we shall properly formalise the concept of a pattern. However, for now let us say that an object may be called a pattern if and only if it is a set of notes within a piece of music. In general, a repetition involves two patterns. If p1 and p2 are the two patterns involved in a repetition then we can fully specify the repetition by specifying p1 and a transformation t that transforms p1 into p2. For example, in the case of the repetition indicated in Figure 2, p1 would be the pattern consisting of the circled notes and t would be a translation of 7 crotchets along the time axis followed by a transposition up a minor ninth. In this case, the durations and relative onset times of the notes in p1 are preserved in p2, so this is an example of exact transposed repetition.
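As a rough illustration of specifying a repetition by a pattern p1 and a transformation t, the following sketch treats each note as an (onset, pitch) pair, with onsets measured in crotchets and pitches in semitones. This encoding, the function names and the note values are assumptions made here for illustration only; they are not the circled notes of Figure 2, and the formal representation is the one introduced in section 6. A translation of 7 crotchets and 13 semitones (a minor ninth) maps p1 onto p2.

```python
from typing import FrozenSet, Optional, Tuple

Note = Tuple[float, int]        # (onset in crotchets, pitch in semitones), assumed encoding
Pattern = FrozenSet[Note]


def translate(pattern: Pattern, dt: float, dp: int) -> Pattern:
    """Shift every note by dt along the time axis and dp along the pitch axis."""
    return frozenset((t + dt, p + dp) for (t, p) in pattern)


def find_translation(p1: Pattern, p2: Pattern) -> Optional[Tuple[float, int]]:
    """Return the vector (dt, dp) that maps p1 exactly onto p2, or None."""
    if not p1 or len(p1) != len(p2):
        return None
    a, b = min(p1), min(p2)     # lexicographically smallest note of each pattern
    vector = (b[0] - a[0], b[1] - a[1])
    return vector if translate(p1, *vector) == p2 else None


# Hypothetical three-note pattern and its exact transposed repetition.
p1 = frozenset({(0.0, 61), (0.5, 64), (1.0, 68)})
p2 = translate(p1, 7.0, 13)     # 7 crotchets later, a minor ninth higher
```

Because a translation preserves the differences between onsets and between pitches, any pair (p1, p2) for which `find_translation` returns a vector is an exact (possibly transposed) repetition in the sense used above, as far as onsets and pitches are concerned; durations are not modelled in this sketch.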


Figure 3: Measures 1–4 of Barber’s Sonata for Piano, Op.26, showing 5 occurrences of the rising bass pattern A1–A5.

We shall now present some examples of perceptually significant repetition that give some idea of


just how diverse this class of phenomena is. The patterns involved in a perceptually significant repetition may each contain only a few notes or a substantial fraction of all the notes in a piece. If the repeated pattern is just a very small motif, consisting of no more than a few notes, then usually, for such patterns to be perceived as being significant structural units, they have to occur frequently over the course of a piece. Figure 3 shows an example of this occurring in the opening measures of Barber’s Sonata for Piano, Op.26, where a rising left-hand motif in octaves (A1) occurs 5 times within the first 4 bars, both at pitch (A1–A3) and transposed up a perfect fourth (A4 and A5). On the other hand, in a sonata-form movement, it is typical for the whole exposition to be stated twice (as happens, for example, in most of Beethoven’s piano sonatas) and then repeated again in a modified form as the recapitulation. The exposition of a sonata-form movement may contain a significant fraction of all the notes in the movement. In the first movement of Beethoven’s Sonata in F minor, Op.2, No.1, for example, the exposition contains nearly a quarter of all the notes in the movement.

Figure 4: Measures 14–16 of the Fugue in C major from Book 1 of J.S. Bach’s Das Wohltemperirte Klavier.

In voiced music, a pattern involved in a perceptually significant repetition may only contain notes from one voice or it may contain notes from two or more voices. Figure 4 shows measures 14–16 from the Fugue in C major from Book 1 of J.S. Bach’s Das Wohltemperirte Klavier. This stretto passage involves four statements of a truncated version of the Fugue’s subject. Each of these statements only contains notes from a single voice. On the other hand, each of the patterns A1–A3 in Figure 5 (which shows mm.16–19 of the first movement of Mozart’s Symphony in G minor, K.550) involves the whole orchestra and contains notes from 13 voices. The patterns involved in a perceptually significant repetition may overlap in time as they do, for example, in Figure 4; or they may occur consecutively as they do in Figure 5; or they may be widely separated in time as they are in the case of the exposition and recapitulation of a sonata-form movement. Figure 5 demonstrates that a pattern involved in a perceptually significant repetition may be temporally compact —that is, it may contain all the notes in the piece that occur within the time interval spanned by the pattern. On the other hand, each statement of the subject in Figure 4 contains all and only those notes in a single voice that occur within the time interval spanned by the pattern. In baroque and renaissance music, it was common for singers and instrumentalists to embellish their parts by replacing a long note with a sequence of shorter notes usually beginning with a note with the same pitch as the original long note. This technique is called diminution. For example, if the two patterns in Figure 6 occurred within the same piece, pattern B occurring after pattern A, then B might be interpreted as an embellished repetition of A. In particular, notes 1, 5, 9 and 13 in pattern B might be heard as being a re-statement of pattern A. 
This suggests that a pattern involved in a perceptually significant repetition may not even contain all the notes in a single voice that occur within the time interval spanned by the pattern.
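The three situations just described (a pattern containing every note in its time span, every note of one voice in its time span, or only some notes of a voice) can be told apart mechanically. The sketch below is only an illustration under stated assumptions: notes are encoded as hypothetical (onset, pitch, voice) triples, the time interval spanned by a pattern is taken to run from its earliest to its latest onset, and the pattern is assumed to be a subset of the piece.

```python
from typing import Set, Tuple

Note = Tuple[float, int, int]   # (onset, pitch, voice), a hypothetical encoding


def notes_in_span(pattern: Set[Note], piece: Set[Note]) -> Set[Note]:
    """All notes of the piece whose onset lies within the pattern's onset span."""
    onsets = [n[0] for n in pattern]
    lo, hi = min(onsets), max(onsets)
    return {n for n in piece if lo <= n[0] <= hi}


def is_temporally_compact(pattern: Set[Note], piece: Set[Note]) -> bool:
    """True if the pattern contains every note of the piece within its span."""
    return set(pattern) == notes_in_span(pattern, piece)


def is_compact_in_voice(pattern: Set[Note], piece: Set[Note]) -> bool:
    """True if the pattern is exactly the notes of one voice within its span."""
    voices = {n[2] for n in pattern}
    if len(voices) != 1:
        return False
    voice = next(iter(voices))
    in_voice = {n for n in notes_in_span(pattern, piece) if n[2] == voice}
    return set(pattern) == in_voice


# Invented two-voice fragment.
piece = {(0, 60, 1), (0, 48, 2), (1, 62, 1), (1, 50, 2), (2, 64, 1), (3, 52, 2)}
upper = {(0, 60, 1), (1, 62, 1), (2, 64, 1)}            # one complete voice segment
block = {n for n in piece if n[0] <= 1}                 # everything up to onset 1
skeleton = {(0, 60, 1), (2, 64, 1)}                     # skips a note of voice 1
```

Here `block` is temporally compact, `upper` is compact only within its voice, and `skeleton` is neither, which corresponds to the embellished case where the repeated pattern omits notes of its own voice.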


Figure 5: Measures 16–19 of the first movement of Mozart’s Symphony in G minor, K.550.


Figure 6: Example of a pattern being embellished using the technique of diminution.


Figure 7: Measures 1–5 of Contrapunctus VI from Die Kunst der Fuge by J.S. Bach.

The term diminution is also (confusingly) used to signify a baroque contrapuntal technique in which the durations of the notes in a melodic theme are all divided by some constant value. Bach uses this technique in Contrapunctus 6 from Die Kunst der Fuge (Figure 7) where the first alto entry of the subject is obtained from the opening bass entry by halving the durational values of the notes. Bach uses the inverse of diminution—augmentation—in Contrapunctus 7 of Die Kunst der Fuge where the first bass entry is obtained from the alto entry by multiplying the durational values by 4. If a piece is represented as a set of points in a two-dimensional space where the dimensions are pitch and time and each point represents a note (see section 6 below), then the musical transformations of augmentation and diminution correspond to the geometrical transformation of central dilatation (see Borowski and Borwein, 1989, p. 162, s.v. dilatation). Figure 7 demonstrates that in baroque music one can also find cases of repetition where the two patterns involved are related by inversion: the first alto entry of the subject in Figure 7 is an inversion of the soprano entry. Again, if a musical passage is represented as a set of points in a two-dimensional space where the dimensions represent time and pitch, then the musical transformation of inversion corresponds to the geometrical transformation of reflection about an axis parallel to the time axis. The techniques of augmentation/diminution and inversion were sometimes used in combination in baroque music. For example, the soprano entry in Figure 7 is an inverted diminution of the bass entry. It is also possible to find perceptually significant repetitions in which one pattern involved in the repetition is the reverse of the other. This occurs, for example, in one of the canons in Bach’s Musikalisches Opfer. The complete canon is reproduced in Figure 8. This is a two-part canon in which one part is exactly the reverse of the other. 
Again, if we represent a musical passage as a set of points in a two-dimensional space in which the dimensions represent pitch and time, then we can see that the musical transformation of reversal corresponds to a geometrical transformation of reflection about an axis parallel to the pitch axis. There are many other ways in which a perceptually significant musical pattern may be modified to give a second pattern that is perceived to be a version of it. However, we hope that the foregoing discussion has at least served to convince the reader that the class of perceptually significant musical repetitions is a very diverse set. We have also shown that if the music to be analysed is appropriately represented as a set of points in a multidimensional space, then certain types of common thematic modification (e.g., diminution, augmentation, inversion, reversal and transposition) correspond directly to geometrical transformations within such a multidimensional representation.

Figure 8: The ‘Crab Canon’ from J.S. Bach’s Musikalisches Opfer.
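The correspondence between thematic modifications and geometrical transformations can be made concrete with a small sketch. It assumes that a pattern is a set of (onset, pitch) points; the function names, the choice of reflection axes and the example theme are all hypothetical. Augmentation and diminution are realised here simply by scaling onset times about a fixed origin, which is one possible reading of the dilatation mentioned in the text and not a definitive one.

```python
from typing import FrozenSet, Tuple

Point = Tuple[float, float]     # (onset time, pitch), assumed encoding
Pattern = FrozenSet[Point]


def invert(pattern: Pattern, axis_pitch: float) -> Pattern:
    """Inversion: reflect about the line pitch = axis_pitch (parallel to the time axis)."""
    return frozenset((t, 2 * axis_pitch - p) for (t, p) in pattern)


def reverse(pattern: Pattern, axis_time: float) -> Pattern:
    """Reversal: reflect about the line time = axis_time (parallel to the pitch axis)."""
    return frozenset((2 * axis_time - t, p) for (t, p) in pattern)


def scale_time(pattern: Pattern, factor: float, origin: float = 0.0) -> Pattern:
    """Augmentation (factor > 1) or diminution (factor < 1) of onset times about origin."""
    return frozenset((origin + factor * (t - origin), p) for (t, p) in pattern)


# Invented four-note theme.
theme = frozenset({(0, 60), (1, 62), (2, 64), (4, 67)})

inverted = invert(theme, 60)                    # mirror the contour around pitch 60
reversed_theme = reverse(theme, 2)              # play the onsets back to front
augmented = scale_time(theme, 2)                # onsets twice as far apart
inverted_diminution = invert(scale_time(theme, 0.5), 60)
```

Each function is a plain coordinate map, so combinations such as the inverted diminution are obtained by composing the maps; reflections applied twice about the same axis return the original pattern.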

5 A survey of algorithms for discovering repetitions in music

5.1 Introduction

In section 8 we present two new algorithms, SIA and SIATEC, that can be used for discovering certain classes of perceptually significant repetition in music. We believe that there are types of perceptually significant repetition that can be discovered using SIA and SIATEC that cannot be discovered using any other existing algorithms. In this section we present a survey of algorithms that can be used for discovering repetitions in music. For each algorithm, we identify the classes of perceptually significant musical repetition that it can be used to discover. Most of these algorithms are string-processing algorithms—that is, algorithms that are based on the assumption that the data to be analysed is represented in the form of a string of symbols or a set of symbol strings. Therefore, before taking a closer look at these algorithms, we define some basic string-processing concepts and terminology.

5.2 Some string-processing concepts and terminology

We say that $\Sigma$ is an alphabet if and only if it is a finite set of symbols. We then say that $x$ is a string over the alphabet $\Sigma$ if and only if it is the empty sequence, $\varepsilon$, or a sequence, $x = x_1 \ldots x_n$, such that $x_i \in \Sigma$ for $1 \le i \le n$. The set of all strings over an alphabet $\Sigma$ is denoted by $\Sigma^*$. If $x = x_1 \ldots x_n$ is a string then the length of $x$, denoted by $|x|$, is $n$. The set of all strings of length $m$ over $\Sigma$ is denoted by $\Sigma^m$. If $x$, $w$, $u$ and $v$ are strings over $\Sigma$ then:

1. $w$ is a factor of $x$ if and only if $\exists u, v \mid x = uwv$.
2. $x$ is a superstring of $w$ if and only if $w$ is a factor of $x$.
3. $w$ is a prefix of $x$ if and only if $\exists v \mid x = wv$.
4. $w$ is a suffix of $x$ if and only if $\exists v \mid x = vw$.
5. $w$ is a subsequence of $x$ if and only if $x$ can be transformed into $w$ by deleting zero or more symbols in $x$.
6. $w$ is a border of $x$ if and only if $w$ is both a prefix and a suffix of $x$. $\varepsilon$ and $x$ are trivial borders of $x$.

If $x = uwv$ then we say that the string $w$ occurs at position $|u| + 1$ of $x$. If $x = uv$ then $x$ is a concatenation of $u$ and $v$. The concatenation of $k$ copies of $x$ is denoted by $x^k$. If $x$ and $y$ are strings over an alphabet $\Sigma$ such that $x = x_1 \ldots x_n$ and $y = y_1 \ldots y_m$ and $x_{n-i+1} \ldots x_n = y_1 \ldots y_i$ for some $i \ge 1$, then the string $z = x_1 \ldots x_n y_{i+1} \ldots y_m = x_1 \ldots x_{n-i} y_1 \ldots y_m$ is the superposition of $x$ and $y$. In this particular case, $z$ is ‘the superposition of $x$ and $y$ with $i$ overlapping symbols’.

If $x$ is a string $x_1 \ldots x_n$ and $w = w_1 \ldots w_m$ is a factor of $x$ then $w$ is a maximal repeated factor in $x$ if and only if there exist at least two integers $i$ and $j$ such that the following four conditions are satisfied:

1. $1 \le i < j \le n - m + 1$.
2. $w = x_i \ldots x_{i+m-1} = x_j \ldots x_{j+m-1}$.
3. $x_{i+m} \neq x_{j+m} \lor j + m > n$.
4. $x_{i-1} \neq x_{j-1} \lor i = 1$.

A factor $w$ of a string $x$ is called a period of $x$ if $x$ is a prefix of $w^k$ for some value of $k$ or, equivalently, if $x$ is a prefix of $wx$. Formally, if $x$ is the string $x_1 \ldots x_n$ then the prefix of $x$, $x_1 \ldots x_p$, $1 \le p < n$, is a period of $x$ if and only if $x_i = x_{i+p}$ for all $1 \le i \le n - p$. For example, abc is a period of abcabcabca. The period of a string $x$ is the shortest period of $x$. A string $x$ is a square if there exists a string $w$ such that $x = w^2$. A string $w$ is primitive if there exists no string $u$ such that $w = u^k$ for $k > 1$.

If $x$ is a string and $w$ is a factor of $x$ then $w$ is a periodic factor of $x$ if there exists a primitive string $u$ and an integer $k > 1$ such that $w = u^k$.¹ If a periodic factor $w = u^k$ ($|u| = p$) occurs at position $i$ in a string $x$ then $w$ is a maximal periodic factor of $x$ if $u^{k+1}$ does not occur at position $i$ in $x$ and either $i - p \le 0$ or $u$ does not occur at position $i - p$.² A periodic factor $w = u^k$ that occurs at position $i$ in a string $x$ is an end-maximal periodic factor if $u^{k+1}$ does not occur at position $i$ in $x$.³

A factor $w$ of a string $x$ is called a cover of $x$ if $x$ can be constructed from concatenations and superpositions of $w$. For example, abca is a cover of abcabcaabca. The shortest-cover problem (also known as the superprimitivity test) is that of computing the shortest cover of a string. The all-covers problem is that of computing all the covers of a string. A factor $w$ of a string $x$ is called a seed of $x$ if there exists a superstring of $x$ which can be constructed from concatenations and superpositions of $w$. For example, abca is a seed of abcabcaabc.

A string $x$ is quasiperiodic if there exists a string $w$ that is a cover of $x$. Alternatively, one can say that a string $x$ is quasiperiodic if there exists another string $w \neq x$ such that every position of $x$ falls within some occurrence of $w$ in $x$. For example, if $x$ = abaababaababa then $x$ is quasiperiodic because every position in $x$ is within an occurrence of $w$ = aba.
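Several of the definitions above translate directly into short, brute-force checks. The sketch below is an illustration only and makes no attempt at efficiency; the function names are hypothetical, Python's 0-based indices replace the 1-based positions of the text, and the last two conditions for a maximal repeated factor are read as inequalities (the two occurrences cannot both be extended by the same symbol to the right, or to the left).

```python
from typing import List


def occurrences(w: str, x: str) -> List[int]:
    """0-based start positions of all (possibly overlapping) occurrences of w in x."""
    return [i for i in range(len(x) - len(w) + 1) if x.startswith(w, i)]


def borders(x: str) -> List[str]:
    """All borders of x, including the trivial borders '' and x itself."""
    return [x[:k] for k in range(len(x) + 1) if x[:k] == x[len(x) - k:]]


def is_period(x: str, p: int) -> bool:
    """True if the prefix of length p (1 <= p < |x|) is a period of x."""
    return 1 <= p < len(x) and all(x[i] == x[i + p] for i in range(len(x) - p))


def is_cover(w: str, x: str) -> bool:
    """True if x can be built from concatenations and superpositions of w."""
    if not w:
        return False
    reach = 0                       # x[:reach] is covered so far
    for i in occurrences(w, x):
        if i > reach:               # a gap that no occurrence of w fills
            return False
        reach = i + len(w)
    return reach == len(x)


def is_maximal_repeated_factor(w: str, x: str) -> bool:
    """True if some pair of occurrences of w cannot be extended left or right."""
    n, m = len(x), len(w)
    occ = occurrences(w, x)
    for a, i in enumerate(occ):
        for j in occ[a + 1:]:
            right = j + m == n or x[i + m] != x[j + m]
            left = i == 0 or x[i - 1] != x[j - 1]
            if right and left:
                return True
    return False
```

For instance, `is_period("abcabcabca", 3)` and `is_cover("abca", "abcabcaabca")` reproduce the two examples given in the text, and `borders("abcabcabca")` lists the non-trivial borders a, abca and abcabca between the two trivial ones.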
$z$ is a quasiperiodicity of $x$ if and only if $z$ is a quasiperiodic factor of $x$. $w$ is the quasiperiod of $z$ if and only if $z$ is a quasiperiodicity and $w$ is a cover of $z$. If $z$ is a quasiperiodicity of $x$ then it can be denoted by an ordered triple $\langle i, w, |z| \rangle$ where $i$ is the starting position of $z$, $w$ is the quasiperiod of $z$ and $|z|$ is called the span of $z$. If $x$ is a string and $z = \langle i, w, |z| \rangle$ is a quasiperiodicity in $x$ then $z$ is maximal if and only if:

1. $wa$ does not cover $za$ when $a = x_{i+|z|}$; and
2. there is no quasiperiodicity $\langle j, w, |z'| \rangle$ in $x$ such that $j < i$ and $|z'| \ge i - j + |z|$ (cf. Iliopoulos and Mouchard, 1999, p. 264).

For example, if $x$ = aabaabaaabaabaabaaa (positions 1 to 19) then the following are quasiperiodicities in $x$:

$q_1 = \langle 9, \mathrm{aba}, 9 \rangle$
$q_2 = \langle 9, \mathrm{abaa}, 10 \rangle$
$q_3 = \langle 2, \mathrm{abaa}, 17 \rangle$
$q_4 = \langle 1, \mathrm{aabaabaa}, 18 \rangle$

$q_1$ is not maximal because it does not satisfy the first condition of maximality given above. $q_2$ is not maximal because it does not satisfy the second condition of maximality: it is embedded in $q_3$ which has the same quasiperiod, abaa. Both $q_3$ and $q_4$, however, are maximal: $q_3$ is embedded in $q_4$ but $q_4$ cannot be covered by abaa. $q_4$ shows that for a quasiperiodicity $z = \langle i, w, |z| \rangle$, $w$ is not necessarily the shortest cover of $z$. (The shortest cover of $q_4$ is aabaa.)

Two strings are said to be $k$-approximately similar if the latter can be obtained from the former (or vice versa) by using $k$ or fewer editing operations (e.g., deletion, insertion, substitution) (Crochemore and Rytter, 1994). The minimum number of edit operations required to transform one string into another is called the edit distance between the two strings.

If $\Sigma$ is an alphabet of integers and $\delta$ and $\gamma$ are integers, then we say that two symbols, $a, b \in \Sigma$, are $\delta$-approximate, denoted by $a =_\delta b$, if and only if $|a - b| \le \delta$. Two strings, $x, y \in \Sigma^*$, are $\delta$-approximate, denoted by $x \overset{\delta}{=} y$, if and only if $|x| = |y|$ and $x_i =_\delta y_i$ for all $i \in \{1 \ldots |x|\}$. $x$ and $y$ are $\gamma$-approximate, denoted by $x \overset{\gamma}{=} y$, if and only if $|x| = |y|$ and

$$\sum_{i=1}^{|x|} |x_i - y_i| < \gamma.$$

We say that the strings $x$ and $y$ are $\{\delta, \gamma\}$-approximate if and only if $x \overset{\delta}{=} y$ and $x \overset{\gamma}{=} y$.

If $x = x_1 \ldots x_n$ is a string and $\delta$ and $\gamma$ are integers then a $\delta$-approximate square occurs at position $j$ in $x$ if and only if there exists a string $u \in \Sigma^m$ (called the root of the square) such that

$$x_j \ldots x_{j+m-1} \overset{\delta}{=} u \quad \text{and} \quad x_{j+m} \ldots x_{j+2m-1} \overset{\delta}{=} u.$$

A $\{\delta, \gamma\}$-approximate square occurs at position $j$ in $x$ if and only if there exists a string $u$ (again, the root of the square) such that

$$x_j \ldots x_{j+m-1} \overset{\delta,\gamma}{=} u \quad \text{and} \quad x_{j+m} \ldots x_{j+2m-1} \overset{\delta,\gamma}{=} u.$$

¹ This is slightly different from what Crochemore (1981, p. 244) calls a ‘repetition’.
² This is the same as what Crochemore (1981) calls a ‘maximal repetition’.
³ This is the same as what Crochemore (1981) calls a ‘repetition’.

String-processing algorithms for discovering repetitions in music

In his General Computational Theory of Musical Structure (GCTMS), Cambouropoulos (1998) presents what amounts to a computational model of his own interpretation of the semiotic method of analysis of Ruwet, Nattiez and their colleagues (as discussed in section 2 above). In GCTMS a musical surface is first segmented according to local grouping principles embodied in a system called the ‘Local Boundary Detection Module’ (Cambouropoulos, 1998, pp. 64–81). The surface is then analysed by a ‘Sequential Pattern Induction Algorithm’ (SPIA) that identifies repetitions in the surface. A ‘Selection Function’ is then used to assign a value to each instance of repetition found by the SPIA. This value increases with the length of the pattern and its frequency of occurrence and decreases with the degree to which instances of the pattern overlap. (We propose similar heuristics for isolating perceptually significant patterns in section 10 below.) Finally, the results generated by the SPIA and the Local Boundary Detection Module are used to produce a segmentation of the musical surface. This segmentation is given as input to another program called Unscramble which compares the segments in the segmentation and categorises them, placing similar segments into the same category.

A fundamental assumption in Cambouropoulos’s GCTMS is that the music to be analysed is represented in the form of a string. There are many different meaningful and potentially useful ways of representing a piece of music as a string (or as a set of strings). However, it seems that almost every one of the string-based music representations that have been proposed in the literature falls into one of the following two categories:

1. Event strings: Each symbol in the string $x$ represents a musical event. In this case, if $x_i$ and $x_j$ are symbols in $x$, then $j > i$ if and only if the event represented by $x_j$ occurs later in the music than the event represented by $x_i$.
2. Interval strings: Each symbol $x_k$ represents the transformation required to transform an event $e_1(x_k)$ into an event $e_2(x_k)$ that follows it in the music. If $x_i, x_j$ are any two symbols in such a string $x$, then $j > i$ if and only if $e_1(x_i) \prec e_2(x_i) \preceq e_1(x_j) \prec e_2(x_j)$, where $\prec$ means ‘precedes’ and $\preceq$ means ‘precedes or is equal to’.

Perhaps the most obvious example of an event string representation is one that represents the sequence of pitches in a single monophonic voice, each symbol representing the pitch of a note event.⁴ A rhythm can be encoded using an even simpler event string consisting simply of a sequence of durations. A slightly more sophisticated representation might encode both the pitch and duration of each note in a given voice. A set of such event strings could be used as an abstraction of the score of a polyphonic piece, a separate string being used for each voice. A completely homophonic passage of music could be represented by a single event string in which each symbol represents the set of pitches in a single chord. Or each symbol could be an ordered pair representing the pitch content and duration of a chord. A non-homophonic polyphonic passage can also crudely be represented as a single event string in which each symbol represents the set of pitches of all notes starting at a given instant in the music (Dovey, 1999; Dovey and Crawford, 1999, 2000; Lemström, 2000). Although crude, this method of representation often suffices for music information retrieval purposes.

⁴ The pitch of the note may be represented in various ways. For example, one may use diatonic or chromatic pitch, MIDI number, pitch class, or some more complex pitch representation such as Cambouropoulos’s (1996) GPIR or one of Meredith’s (1999; 2001) MIPS representations or one of Brinkman’s (1986; 1990) representations.

Conklin and Anagnostopoulou (2001) describe a pattern discovery technique based on the music representation formalism of multiple viewpoints (Conklin and Witten, 1995). In this approach, a number of different strings are used to represent any given passage of music, each string representing a particular viewpoint. For example, one of the string representations used by Conklin and Anagnostopoulou describes contour, another represents pitch, a third represents inter-onset time interval between consecutive events and so on. Conklin and Anagnostopoulou also use event strings to represent, in a rudimentary way, musical structure on hierarchical levels higher than the note level. For example, in one of their representations, only the first note in each crotchet beat is represented; and in another, only the first note in each bar is represented.

The most obvious example of an interval string representation is one in which each successive symbol represents the pitch interval between two consecutive notes in a single voice, the whole string thus representing the pitch sequence of a complete voice. One well-known advantage of this pitch interval string representation over the pitch event string representation is that it is much simpler to find transposition-invariant occurrences of a pattern in a pitch interval string representation (Lemström, 2000, p. 24).⁵ Such a representation may be augmented by adding duration information and a polyphonic piece could be encoded as a set of such interval strings or even as a string of bitvectors, each bitvector representing the set of intervals that occur between the set of pitches that occur at a given instant and the set of pitches that occur at the next instant at which a new pitch is sounded (Lemström and Tarhio, 2000).⁶

There are many other interval systems on which one could base an interval string music representation. Lewin (1987, Chapter 2) gives a number of examples of Generalized Interval Systems (GISs) on which such representations could be based. One could, for example, represent the rhythm of a melodic line as a sequence of numbers, each number representing the ratio of the duration of a note to the one that precedes it (Lewin, 1987, p. 23). Indeed, interval and rational duration information may be encoded using a single value allowing both transposition and tempo invariances to be obtained using linear strings (Lemström et al., 1999).

There is thus a multiplicity of ways in which one can represent musical passages using event strings and interval strings. Cambouropoulos (1998, pp. 88–90) claims that his SPIA can profitably be used on a variety of different event string and interval string representations. The original version of the SPIA component of Cambouropoulos’s GCTMS was an implementation of a pattern induction algorithm described by Crow and Smith (1992, pp. 46–52). This algorithm computes all the maximal repeated factors in a string. In a later version of the SPIA, Cambouropoulos replaced the Crow-Smith algorithm with a more efficient algorithm based on a partitioning technique for refinements of equivalence relations described by Cardon and Crochemore (1982) and Crochemore (1981). The algorithm is given in full and analysed by Iliopoulos and Mouchard (1999, pp. 265–267).
This algorithm finds all occurrences of all the distinct factors in a string in a worst-case running time of $O(n \log n)$ for a string of length $n$.

⁵ Although there are occasions when it is more appropriate to calculate the intervals ‘on the fly’ from event strings in which each symbol represents the pitch of an event (Lemström and Ukkonen, 2000).
⁶ If one represents a polyphonic piece as a set of pitch interval strings and one needs to be able to determine the interval relationships between notes in different voices, then one also has to encode at least the first pitch in each voice.

This partitioning algorithm and Crow and Smith’s (1992) algorithm are two examples of string-processing algorithms that can be used for discovering various classes of repeated factors in strings. Other algorithms of this type include two other algorithms by Crochemore (1981): one for discovering all the end-maximal periodic factors in a string and another for computing all the maximal periodic factors in a string. Both of these algorithms have a worst-case running time of $O(n \log n)$. Apostolico and Preparata (1983) give another algorithm that computes the same output as Crochemore’s partitioning algorithm but uses more complex data structures.

Apostolico and Ehrenfeucht (1993) described an algorithm based on that of Apostolico and Preparata (1983) for computing all the maximal quasiperiodic factors in a string in time $O(n \log^2 n)$. Iliopoulos and Mouchard (1999) give an $O(n \log n)$ algorithm that computes all the maximal quasiperiodicities in a string. Their algorithm closely follows the square detection algorithm of Crochemore (1981). It is faster by a factor of $O(\log n)$ than the algorithm described by Apostolico and Ehrenfeucht (1993) for solving the same problem. Iliopoulos and Mouchard (1999, p. 263) claim that the computation of maximal quasiperiodicities ‘has direct application in music analysis’. For example, they suggest that one might want to find all repeated factors in a piece that are at most $k$ positions apart or find all repeated non-overlapping factors of length greater than $k$ for some given integer $k$.

Other algorithms for finding various classes of repeated factors in strings include Apostolico, Farach and Iliopoulos’s (1991) linear-time algorithm for solving the shortest-cover problem (superprimitivity test) and Breslauer’s (1992) linear-time on-line algorithm for solving the same problem. Breslauer (1994) also presented a parallel algorithm that computes the shortest cover of a string in $O(\log \log n)$ time using an $\frac{n \log n}{\log \log n}$–processor concurrent-read-concurrent-write parallel random access machine (CRCW PRAM). Iliopoulos and Park presented optimal $O(\log \log n)$–time (thus work-time optimal) parallel algorithms for the shortest-cover problem (Iliopoulos and Park, 1994) and the all-covers problem (Iliopoulos and Park, 1996).
Moore and Smyth (1994) presented a linear-time algorithm for the all-covers problem and Li and Smyth (1999) gave a linear-time on-line algorithm for computing all the covers of all prefixes of a string. Iliopoulos et al. (1993) introduced the idea of a seed and gave an $O(n \log n)$–time algorithm for computing all the seeds in a string. A parallel algorithm for solving the same problem was presented by Ben-Amram et al. (1994) that requires $O(\log n)$ time and $O(n \log n)$ work.

Hsu et al. (1998) used a dynamic programming technique to find repeating factors in strings representing monophonic melodies. Although their algorithm has an $O(n^4)$ worst-case time complexity, it works much more efficiently in practice. In their study, Hsu et al. did not consider transposition invariance, but this can be obtained straightforwardly by using an interval string representation instead of an event string representation. In a first phase they use a correlative matrix, whose processing takes $O(n^2)$ time. The output of this phase is a candidate set including all the patterns contained in S (even those that appear only once), together with their frequency. In a second phase, each candidate which is a subset of another candidate $c$ (and whose frequency does not exceed the frequency of $c$) is removed from the candidate set. In a pathological case the number of required operations for this phase can be $O(n^4)$. However, there are considerably fewer candidates to be considered in practice.

These and other algorithms for finding repeated factors in strings can be used to find a restricted (but nonetheless important) class of perceptually significant musical repetitions.
For example, if the stretto passage in Figure 4 were represented as a set of pitch interval strings, each string representing a voice and each symbol in each string representing the diatonic interval between two consecutive notes in a voice, then the occurrences of the fugue subject in this passage would be discovered by an algorithm that only found the exactly repeated factors in a string. The patterns A1–A3 in Figure 5 could also be discovered by such an algorithm if the passage were represented as an event string in which each symbol represented the set of pitches of all notes starting at a particular instant. The repetition in Figure 6 could be discovered using an algorithm for discovering exactly repeated factors if one were to adopt one of Conklin and Anagnostopoulou’s (2001) suggestions and represent the music as a string in which each symbol represents the pitch of the first note in a crotchet beat. In baroque contrapuntal keyboard works, the voice structure of the music is often very clearly indicated in the score by cues such as the directions of note stems (see, for example, Figure 4).

[Figure 9 (music notation): four monophonic voices, with patterns labelled B1–B10.]

Figure 9: Measures 1–4 of Barber’s Sonata for Piano, Op. 26, showing one reasonably natural way of partitioning the notes into monophonic voices.

However, in nineteenth and twentieth century piano music, the voicing is often ambiguous and not explicitly represented in the score. If one wants a ‘neutral’ representation of a score that does not represent just one particular encoder’s interpretation of the score, then one should omit information that is not explicitly represented in the score. A neutral representation of a keyboard work will therefore often contain no information about voicing. A symbolic representation of what a listener hears when he or she listens to a performance of a work for piano or harpsichord must also not contain any explicit voicing information. Such a representation might be used, for example, as input to a repetition-discovery algorithm forming one part of a computational model of an expert listener. Many music databases contain music in the form of MIDI files but a MIDI representation of a polyphonic keyboard work is often structured so that all the notes are in a single channel. Such a file therefore contains no voicing information. A repetition-discovery algorithm should therefore be capable of finding perceptually significant repeated patterns in a musical representation that does not contain voicing information. Figure 9 shows one reasonably natural way of partitioning the notes in the passage in Figure 3 into monophonic voices. The music in Figure 9 could be represented using four pitch interval strings, each string representing the sequence of pitch intervals in one of the four monophonic parts shown. However, if the music were represented in this way, then a string processing algorithm for finding exactly repeated factors might discover the ten statements of pattern B1 shown in Figure 9 but no algorithm for finding exactly repeated factors would discover the five occurrences (A1–A5) of the rising bass pattern in Figure 3. 
If we want to use an algorithm that only finds exactly repeated factors for discovering the set of patterns A1–A5 in Figure 3, then we have to allow the ‘voices’ into which we partition the music to be polyphonic. Figure 10 shows a reasonably natural way of partitioning the notes in the passage into three parts, one of which is polyphonic. If each of the three polyphonic ‘streams’ in Figure 10 is represented by an interval string in which each symbol represents the transformation required to convert a chord into the next chord that occurs in the part then all five occurrences of the boxed pattern would be represented by the same factor in the interval string representation of the third part. However, before analysing a work, the user has no idea which particular ways of partitioning it into polyphonic ‘streams’ will result in the discovery of perceptually significant repetitions. To be sure of discovering all such repetitions, the user would therefore have to use a multiple representation in which every possible partitioning of the work into polyphonic ‘streams’ were represented. This would be completely impractical for a work with many parts such as an orchestral work.

[Figure 10 (music notation): three polyphonic streams, with patterns labelled A1–A5.]

Figure 10: Measures 1–4 of Barber’s Sonata for Piano, Op. 26, showing one plausible way of partitioning the notes into polyphonic voices.
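The dependence of exact-repetition discovery on the chosen string representation, discussed above, can be illustrated with a small sketch. It converts a pitch event string into a pitch interval string and lists the exactly repeated factors of each by brute force. The function names and the six-note melody are hypothetical, and the naive enumeration below is not one of the $O(n \log n)$ algorithms cited in this section.

```python
from typing import Dict, List, Sequence, Tuple


def intervals(pitches: Sequence[int]) -> Tuple[int, ...]:
    """Pitch interval string: the difference between each pair of consecutive pitches."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))


def repeated_factors(s: Sequence[int], min_len: int = 2) -> Dict[Tuple[int, ...], List[int]]:
    """Map every factor of s of length >= min_len that occurs more than once
    to the (0-based) positions at which it starts. Naive enumeration."""
    positions: Dict[Tuple[int, ...], List[int]] = {}
    n = len(s)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            positions.setdefault(tuple(s[i:j]), []).append(i)
    return {f: p for f, p in positions.items() if len(p) > 1}


# Hypothetical melody: a three-note rising motif followed by the same
# motif transposed up a perfect fourth (pitches given as MIDI numbers).
melody = [60, 62, 64, 65, 67, 69]

print(repeated_factors(melody))             # event string: no repeated factor
print(repeated_factors(intervals(melody)))  # interval string: the factor (2, 2) recurs
```

In this toy case the transposed restatement is invisible in the event string but appears as a repeated factor in the interval string, which is the transposition-invariance property noted earlier in this section.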

The foregoing discussion demonstrates that the class of perceptually significant musical repetitions that can be discovered using an algorithm that discovers the exactly repeated factors in a string is heavily dependent upon the particular type of string representation used. To discover all those repetitions in which the repeated patterns can be represented as identical factors in a string, one has to use not only a variety of algorithms but also a multitude of distinct representations.

So far we have only discussed algorithms that discover exactly repeated factors in strings. There also exist algorithms for discovering various types of approximately repeated factors. For example, Cambouropoulos et al. (1999, pp. 137–140) give two pattern induction algorithms, one for computing all the δ–approximate squares in a string and another for computing all the {δ, γ}–approximate squares in a string. Both algorithms have a worst-case running time of $O(n^2)$. These algorithms could be modified to find all δ–approximate and {δ, γ}–approximate factors in a string.

In order to be able to measure the overall similarity between two strings representing (monophonic) musical passages, Mongeau and Sankoff (1990) modified the well-known dynamic programming approach for calculating the edit distance between two strings (Ukkonen, 1985). In addition to the traditional tool-box of editing operations (containing insertion, deletion, and substitution) they introduced two novel musically motivated operations called fragmentation and consolidation. By limiting the number of notes that can be involved in a fragmentation or consolidation with suitably selected constants, the original time complexity of the dynamic programming approach (i.e., $O(mn)$, where $m$ and $n$ denote the lengths of the two string representations being compared) can be preserved.
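For reference, the basic dynamic programming computation of edit distance mentioned above can be sketched as follows. This is the plain unit-cost version with insertion, deletion and substitution only; it does not include Mongeau and Sankoff’s fragmentation and consolidation operations or their musically motivated weights, and the function names are illustrative assumptions.

```python
from typing import Sequence


def edit_distance(x: Sequence, y: Sequence) -> int:
    """Unit-cost edit distance between x and y in O(mn) time and O(n) space."""
    m, n = len(x), len(y)
    prev = list(range(n + 1))              # distances from the empty prefix of x
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete x[i-1]
                          curr[j - 1] + 1,     # insert y[j-1]
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[n]


def k_approximately_similar(x: Sequence, y: Sequence, k: int) -> bool:
    """True if y can be obtained from x using k or fewer edit operations."""
    return edit_distance(x, y) <= k


# Hypothetical example: a three-note figure and a version with two passing notes.
plain = [60, 62, 64]
embellished = [60, 61, 62, 63, 64]
print(edit_distance(plain, embellished))
print(k_approximately_similar(plain, embellished, 1))
```

The sequences may be strings or lists of integers such as MIDI pitch numbers, since only element equality is used.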


Mongeau and Sankoff also showed that the problem of finding local similarities in any two monophonic musical passages given to the algorithm as arguments can be solved by a straightforward modification of their method. Thus, by making both arguments of the algorithm the same, all the k–approximately similar repeated factors can be discovered.

Another approach to discovering approximately repeated patterns in music is presented by Rolland (1999). Like Mongeau and Sankoff (1990), Rolland uses an extended toolbox of editing operations. In his method, the length of the patterns involved in the discovered repetitions is limited by setting a minimal and maximal length for them.⁷ Then, all the patterns (factors) falling in the allowed range are extracted from the musical string. Once a node has been created for a pattern, the pattern is compared against other patterns (whose length is close enough to that of the considered pattern). If their similarity exceeds a given threshold value, an arc between the two nodes corresponding to these patterns is created and the weight of this arc is set according to their similarity. In this way, the related patterns form stars in a similarity graph. Finally, the patterns are sorted by their prominence: for every pattern in the similarity graph, the weights of the arcs by which it is connected to other nodes are accumulated. Then, the patterns are listed in descending order according to the accumulations. Rolland claims an overall time complexity of $O(n^2)$. However, if the algorithm is required to find patterns of any size this time complexity rises to (at least) $O(n^4)$, making it less efficient in the worst case than SIATEC. Also, unlike SIATEC and SIA, Rolland’s algorithm is currently limited to the discovery of patterns in monophonic sources.

It was mentioned above (p. 15) that an algorithm for finding exactly repeated factors could be used to find the repetition in Figure 6 but only if the music were represented as a string in which only the first note in each crotchet beat were represented. However, such an ad hoc expedient would not suffice for detecting the fact that pattern B in Figure 11 is an embellished restatement of pattern A. This is because the notes in pattern A do not all fall on the beat. The only way to discover a repetition like this using a string-processing algorithm is to employ an approach such as Rolland’s that discovers k–approximately similar factors.

[Figure 11 (music notation): two patterns, labelled A and B.]

Figure 11: Pattern B is an embellished repetition of pattern A.

Such an algorithm judges two patterns to be ‘similar’ if and only if the number of edit operations required to transform one into the other does not exceed some threshold $k$. However, for the two patterns in Figure 11 to be judged ‘similar’ by such an algorithm, this threshold $k$ would have to be set to at least 14 to allow for the 14 insertions required to transform pattern A into B. But this threshold is far too high in general, because, with such a high threshold, the algorithm would judge many highly dissimilar patterns to be ‘similar’. An algorithm for discovering k–approximately similar patterns in strings therefore cannot be used to find repetitions in which one pattern is a highly embellished version of the other. However, such an algorithm can be used to find repetitions in which the two patterns differ by only one or two notes.

⁷ In SIA and SIATEC there are no such bounds: the patterns discovered may be of any size.

So far, we have only considered algorithms that discover repeated factors in strings. We now briefly review algorithms that discover repeated subsequences, that is, ‘patterns with gaps’. The problem of discovering repeated patterns within a string is closely related to the following well-known string-processing problem: given a finite set $W$ of words, find a pattern $p$ that is the longest substructure (i.e., factor or subsequence) of every word in $W$. Note that $W$ may have been obtained from a long string $S$ that has been divided into $|W|$ factors. It is known (Crochemore and Rytter, 1994, p. 25) that if $|W|$ is not constant and the substructures to be considered are subsequences, the problem becomes NP-complete. However, if factors instead of subsequences are considered, the problem is solvable in polynomial time. This illustrates that finding repetitions with ‘gaps’ (i.e., repeated subsequences) is much more complex than finding repetitions without gaps (i.e., repeated factors).

Very little effort has been made so far to solve the problem of discovering repeated subsequences in strings. However, an algorithm for discovering subsequences might be able to discover repetitions such as the one shown in Figure 11 without also computing many spurious instances of repetition.
The problem is somewhat related to the problem dealt with in molecular biology, where several DNA sequences are to be aligned in an optimal way (and therefore the common subsequences of the considered DNA sequences are to be found). These problems are inherently different from each other and, in some respects, the discovery of patterns in DNA sequences and proteins is, in fact, somewhat simpler than the discovery of musically significant patterns in polyphonic music. This is because both proteins and nucleic acids can, at least at the primary level of structure, be appropriately modelled as 1-dimensional strings of symbols taken from a highly restricted alphabet; whereas much polyphonic music cannot even be appropriately represented as a set of 1-dimensional strings and the musical ‘alphabets’ used are (at least in principle if not generally in practice) infinite and multidimensional.

Nevertheless, we shall outline, as an example, a string matching approach to the discovery of repeated subsequences in 1-dimensional strings given by Floratos and Rigoutsos (2000). The idea is to start with initial patterns of a given length (possibly also containing ‘don’t care’ characters), and proceed recursively to generate longer and longer patterns appearing in the data set. Floratos and Rigoutsos attempt to avoid the inherent NP-hardness of the problem by limiting the length of the considered patterns. With the aid of two suffix structures, they attempt to extend the considered patterns both backwards (by assigning a prefix to the pattern) and forwards (by assigning a suffix). Finally, generated patterns that represent exactly the same subsequences in different ways are combined into a maximal pattern.
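To make concrete the contrast drawn earlier in this section between factors and subsequences, the following sketch solves the factor version of the problem (find the longest factor common to every word in a finite set $W$) by direct enumeration, which is clearly polynomial in the total length of the words. It is a naive illustration under our own naming, not an algorithm taken from the literature cited here, and no comparably simple polynomial-time routine is known for the subsequence version when $|W|$ is not constant.

```python
from typing import Sequence


def longest_common_factor(words: Sequence[str]) -> str:
    """Return a longest string that occurs as a factor of every word.

    Candidates are the factors of the shortest word, tried from longest
    to shortest, so the first candidate found in all words is returned."""
    if not words:
        return ""
    shortest = min(words, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in w for w in words):
                return candidate
    return ""


# Hypothetical words, e.g. factors cut from one longer string S.
print(longest_common_factor(["xabcy", "zabcw", "abc"]))
```

If several common factors share the maximal length, this sketch returns the one that occurs first in the shortest word.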

6 Representing music using multidimensional datasets

It appears that most previous approaches to repetition-discovery in music have been based on the assumption that the music to be analysed is represented as a string or a set of strings. As shown above, there exist string-processing algorithms for discovering both exactly and approximately repeated factors and subsequences. However, to be sure of finding all instances of certain classes of perceptually significant repetition in a musical passage using a string-based approach, one has to run a variety of algorithms on a multitude of different representations of the passage. Moreover, when the music to be analysed is polyphonic, there are certain classes of repetition that cannot be discovered if the music is represented using strings. This implies that the string-based approach is not suitable for discovering certain classes of repetition in polyphonic works or databases of polyphonic works.

We have avoided the problems inherent in string-based approaches to repetition-discovery by developing a new approach in which the music to be analysed is represented as a multidimensional dataset. This allows many different classes of perceptually significant repetition to be discovered efficiently by running appropriately designed algorithms (such as SIA and SIATEC) on orthogonal projections of a single multidimensional representation of the music. Repetitions such as the one in Figure 11 can easily be found using this approach. It is also possible to use this approach to find repeated (possibly overlapping) polyphonic patterns ‘with gaps’ in unvoiced polyphonic music such as keyboard music.⁸ The so-called ‘distributed matching’ problem (Crawford et al., 1998, p. 84) can also be solved using this new geometric approach. By representing music using multidimensional datasets we can dispense with the need for multiple representations, analyse complex polyphonic music as simply and efficiently as monophonic music and discover repetitions in the timbre, dynamic and rhythmic structure of a passage as well as its pitch structure.

Informally, a multidimensional dataset can be visualized as a set of points in a multidimensional space. Before giving a formal definition, we need to define some basic concepts.⁹ An object is a vector if and only if it is a finite ordered set of numbers. An object is a k-dimensional vector if and only if it is a vector of cardinality $k$. We assume here that every number in a vector is integral, rational or real. An object is a vector set if and only if it is a set of vectors.
An object is a k-dimensional vector set if and only if it is a vector set in which every vector has cardinality $k$. An object may be called a pattern or a dataset if and only if it is a k-dimensional vector set. An object may be called a datapoint if and only if it is a vector in a pattern or a dataset. We usually reserve the term dataset for a k-dimensional vector set that represents some complete set of data that we are interested in processing. We usually reserve the term pattern for a k-dimensional vector set that is a subset of some specified dataset or a transformation of some subset of a dataset. Also, if we have two k-dimensional vector sets $P$ and $D$ and we wish to search for occurrences of $P$ in $D$ then we would usually refer to $P$ as a pattern and $D$ as a dataset. Every pattern is a k-dimensional vector set and every dataset is a k-dimensional vector set but we only call a k-dimensional vector set a pattern or a dataset if the vectors it contains are intended to be interpreted as position vectors (i.e., datapoints).

There are many different appropriate ways of representing a passage of music as a multidimensional dataset. Figure 13 shows a 5-dimensional dataset that represents the music in Figure 12. Each datapoint in Figure 13 represents a single note in Figure 12. The first element in each datapoint represents the onset time of the note in terms of the number of semiquavers that have elapsed by the time the note occurs. The second element in each datapoint represents the chromatic pitch of the note as defined by Meredith (1999, 2001). The chromatic pitch of a note can be understood to be a numerical representation of the key on a normal piano keyboard that would have to be pressed to play the note. The chromatic pitch of A0, which is usually the lowest note on a piano keyboard, is defined to be 0. The chromatic pitch of B♭0, the note one semitone above A0, is therefore 1 and the chromatic pitch of middle C (C4) is therefore 39.
An increase in pitch of one semitone results in an increase of 1 in chromatic pitch. The concept of chromatic pitch is similar to Brinkman’s continuous pitch code (Brinkman, 1990, p. 122). The third element in each datapoint represents the morphetic pitch of the note as defined by Meredith (1999, 2001). The morphetic pitch of a note indicates the position of the notehead of the note on the staff—a rise of one step on a staff increases the morphetic pitch by one. The morphetic pitch of A0 is defined to be 0. The morphetic pitch of middle C (C4 ) 8 This

corresponds to the class of repetition-discovery problem that Crawford et al. (1998, p. 88) call ‘unstructured repetitions’. 9 Some well-known mathematical concepts (e.g., set and ordered set) are not defined here. For definitions and explanations of such well-known concepts see, for example, Borowski and Borwein (1989) or Cormen et al. (1990, Chapter 5).


[Score excerpt with patterns labelled A, B and C.]

Figure 12: The first two measures of the Prelude in C minor from Book 2 of J.S. Bach’s Das Wohltemperirte Klavier, BWV 871/1.

{
⟨0, 27, 16, 2, 2⟩, ⟨1, 46, 27, 1, 1⟩, ⟨2, 39, 23, 2, 2⟩, ⟨2, 44, 26, 1, 1⟩, ⟨3, 46, 27, 1, 1⟩,
⟨4, 32, 19, 2, 2⟩, ⟨4, 47, 28, 1, 1⟩, ⟨5, 44, 26, 1, 1⟩, ⟨6, 39, 23, 2, 2⟩, ⟨6, 42, 25, 1, 1⟩,
⟨7, 44, 26, 1, 1⟩, ⟨8, 30, 18, 2, 2⟩, ⟨8, 46, 27, 1, 1⟩, ⟨9, 42, 25, 1, 1⟩, ⟨10, 39, 23, 2, 2⟩,
⟨10, 41, 24, 1, 1⟩, ⟨11, 42, 25, 1, 1⟩, ⟨12, 29, 17, 2, 2⟩, ⟨12, 44, 26, 1, 1⟩, ⟨13, 41, 24, 1, 1⟩,
⟨14, 38, 22, 2, 2⟩, ⟨14, 39, 23, 1, 1⟩, ⟨15, 41, 24, 1, 1⟩, ⟨16, 27, 16, 1, 2⟩, ⟨16, 42, 25, 2, 1⟩,
⟨17, 34, 20, 1, 2⟩, ⟨18, 32, 19, 1, 2⟩, ⟨18, 51, 30, 2, 1⟩, ⟨19, 34, 20, 1, 2⟩, ⟨20, 35, 21, 1, 2⟩,
⟨20, 44, 26, 2, 1⟩, ⟨21, 32, 19, 1, 2⟩, ⟨22, 30, 18, 1, 2⟩, ⟨22, 51, 30, 2, 1⟩, ⟨23, 32, 19, 1, 2⟩,
⟨24, 34, 20, 1, 2⟩, ⟨24, 42, 25, 2, 1⟩, ⟨25, 30, 18, 1, 2⟩, ⟨26, 29, 17, 1, 2⟩, ⟨26, 51, 30, 2, 1⟩,
⟨27, 30, 18, 1, 2⟩, ⟨28, 32, 19, 1, 2⟩, ⟨28, 41, 24, 2, 1⟩, ⟨29, 29, 17, 1, 2⟩, ⟨30, 27, 16, 1, 2⟩,
⟨30, 50, 29, 2, 1⟩, ⟨31, 29, 17, 1, 2⟩
}

Figure 13: 5-dimensional dataset representation of the music in Figure 12.


Figure 14: A two-dimensional orthogonal projection of the dataset in Figure 13 onto the plane defined by the onset time and chromatic pitch dimensions.

is therefore 23, D above middle C has a morphetic pitch of 24, and so on. The concept of morphetic pitch is similar to Brinkman’s continuous name code (Brinkman, 1990, p. 126). The fourth element in each datapoint represents the duration of the note measured in semiquavers and the fifth element represents the voice in which the note occurs (here, a value of 1 indicates the upper voice and a value of 2 indicates the lower voice).

Each orthogonal projection of a multidimensional dataset is itself a multidimensional dataset. This implies that if one has a rich multidimensional dataset representation D of a musical passage and a repetition-discovery algorithm such as SIA or SIATEC that can process datasets of any dimensionality, then one is equipped for discovering repeated structures not only in the original complete dataset D but also in any orthogonal projection of D. This can be useful. For example, most listeners would undoubtedly consider patterns B and C in Figure 12 to be perceptually significant repetitions of pattern A. If one considers only the first two elements in each datapoint in Figure 13 then one gets the projection shown in Figure 14, which represents the onset time and chromatic pitch of each note. We say that two patterns P1 and P2 are translationally equivalent if and only if there exists a vector v such that P1 translated by v gives P2. Note that patterns B and C in Figure 14 (which correspond to patterns B and C respectively in Figure 12) are not translationally equivalent to A. This implies that an algorithm such as SIA or SIATEC, which only discovers instances of repetition in which the two patterns are translationally equivalent, would not discover the repetitions indicated in Figure 14. However, if one considers only the first and third elements in each datapoint in Figure 13 then one gets the 2-dimensional projection shown in Figure 15.

Patterns A, B and C in Figure 15 are translationally equivalent and therefore would be discovered by an algorithm that only finds instances of repetition in which the repeated patterns are translationally equivalent. This example also shows that instances of approximate repetition in one projection of a dataset might correspond to instances of exact repetition in some other projection of the dataset.
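The effect of choosing a projection can be illustrated with a small sketch. The Python below is not taken from the authors' implementation: the helper names `project` and `translation_between` are hypothetical, and the two three-note patterns are invented toy data (onset, chromatic pitch, morphetic pitch), not the patterns of Figure 12. It shows a diatonic sequence that is not translationally equivalent in the onset/chromatic-pitch projection but is in the onset/morphetic-pitch projection.

```python
from typing import Optional

Point = tuple[int, ...]


def project(dataset: set[Point], dims: tuple[int, ...]) -> set[Point]:
    """Orthogonal projection: keep only the listed (0-based) dimensions."""
    return {tuple(p[d] for d in dims) for p in dataset}


def translation_between(p1: set[Point], p2: set[Point]) -> Optional[Point]:
    """Return the vector v such that p1 translated by v equals p2, else None."""
    if not p1 or len(p1) != len(p2):
        return None
    # A translation preserves lexicographic order, so the smallest point of
    # p1 must land on the smallest point of p2; that fixes the only candidate.
    v = tuple(b - a for a, b in zip(min(p1), min(p2)))
    shifted = {tuple(x + dx for x, dx in zip(p, v)) for p in p1}
    return v if shifted == p2 else None


# Hypothetical toy data: (onset, chromatic pitch, morphetic pitch).
motif = {(0, 39, 23), (1, 41, 24), (2, 43, 25)}      # C4, D4, E4
sequence = {(4, 38, 22), (5, 39, 23), (6, 41, 24)}   # B3, C4, D4

chromatic = translation_between(project(motif, (0, 1)), project(sequence, (0, 1)))
morphetic = translation_between(project(motif, (0, 2)), project(sequence, (0, 2)))
print(chromatic)  # None
print(morphetic)  # (4, -1)
```

The chromatic intervals differ (two whole tones against a semitone and a whole tone), so no single vector maps one pattern onto the other, whereas both patterns rise by one staff step per note.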


Figure 15: A two-dimensional orthogonal projection of the dataset in Figure 13 onto the plane defined by the onset time and morphetic pitch dimensions.

[Plot: x and y axes each running from 0 to 3, with datapoints a = ⟨1, 1⟩, b = ⟨1, 3⟩, c = ⟨2, 1⟩, d = ⟨2, 2⟩, e = ⟨2, 3⟩ and f = ⟨3, 2⟩.]

Figure 16: A simple two-dimensional dataset containing 6 datapoints.


[Score excerpt with patterns labelled A, B, C and D.]

Figure 17: The first two measures of the Prelude in C minor from Book 2 of J.S. Bach’s Das Wohltemperirte Klavier, BWV 871/1.

7 The functions computed by SIA and SIATEC

In this section we develop mathematical expressions for the functions computed by the SIA and SIATEC algorithms.

Let D be a dataset and let d1 and d2 be any two datapoints in D. The vector from d1 to d2 is given by d2 − d1, where the minus sign denotes vector subtraction. For example, in the dataset in Figure 16, the vector from a = ⟨1, 1⟩ to f = ⟨3, 2⟩ is ⟨3, 2⟩ − ⟨1, 1⟩ = ⟨3 − 1, 2 − 1⟩ = ⟨2, 1⟩. If v = d2 − d1 then d2 = v + d1, which expresses the fact that the datapoint d1 can be translated by the vector v to give the datapoint d2.

If A is an ordered set (such as a vector) then we denote the cardinality of A by |A| and the ith element of A by A[i]. If u and v are two vectors such that |u| = |v| = k then we say that u is less than v, denoted by u < v, if and only if there exists an integer i such that 1 ≤ i ≤ k and u[i] < v[i] and u[j] = v[j] for 1 ≤ j < i. For example, ⟨1, 1⟩ < ⟨1, 2⟩ < ⟨2, 1⟩.

We say that a pattern P is translatable by a vector v in a dataset D if and only if P can be translated by v to give a pattern that is a subset of D. For example, in the dataset shown in Figure 16, the only non-zero vectors by which the pattern {a, c} is translatable are ⟨1, 1⟩ and ⟨0, 2⟩. The maximal translatable pattern (MTP) for a vector v in a dataset D, denoted by MTP(v, D), is the largest pattern translatable by v in D. Formally,

    MTP(v, D) = {d | d ∈ D ∧ d + v ∈ D}.    (1)

For example, if we consider the dataset in Figure 16, then the MTP for the vector ⟨1, 0⟩ is {a, b, d} and the MTP for ⟨1, 1⟩ is {a, c}. In music, MTPs often correspond to the patterns involved in perceptually significant repetitions. For example, each of the patterns A, B, C and D in Figure 17 corresponds to an MTP in the two-dimensional dataset representation shown in Figure 18.

Our goal in developing SIA was to design an efficient algorithm for computing all the non-empty MTPs in a dataset. The MTP for a vector v in a dataset D is non-empty if and only if there exist at least two datapoints d1 and d2 in D such that v = d2 − d1. This implies that the complete set of non-empty MTPs for a dataset D is given by

    P(D) = {MTP(d2 − d1, D) | d1, d2 ∈ D}.    (2)
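Eqs. 1 and 2 can be evaluated directly by brute force. The following Python sketch does so for the six-point dataset of Figure 16; the function names (`add`, `sub`, `mtp`, `all_mtps`) are illustrative choices rather than names from the paper, and this cubic-time evaluation is only a restatement of the definitions, not the SIA algorithm itself.

```python
Point = tuple[int, ...]

# The six datapoints a, b, c, d, e, f of Figure 16.
DATASET: set[Point] = {(1, 1), (1, 3), (2, 1), (2, 2), (2, 3), (3, 2)}


def add(p: Point, v: Point) -> Point:
    """Translate the point p by the vector v."""
    return tuple(x + y for x, y in zip(p, v))


def sub(p: Point, q: Point) -> Point:
    """Vector from q to p, i.e. p - q."""
    return tuple(x - y for x, y in zip(p, q))


def mtp(v: Point, dataset: set[Point]) -> set[Point]:
    """Eq. 1: every datapoint that is still in the dataset after moving by v."""
    return {d for d in dataset if add(d, v) in dataset}


def all_mtps(dataset: set[Point]) -> dict[Point, set[Point]]:
    """Eq. 2, keyed by vector: one MTP for each difference vector d2 - d1."""
    vectors = {sub(d2, d1) for d1 in dataset for d2 in dataset}
    return {v: mtp(v, dataset) for v in vectors}


print(sorted(mtp((1, 0), DATASET)))  # [(1, 1), (1, 3), (2, 2)]
print(sorted(mtp((1, 1), DATASET)))  # [(1, 1), (2, 1)]
```

Python compares tuples lexicographically, which coincides with the ordering u < v defined above, so `sorted` lists the datapoints in the paper's increasing order.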


Figure 18: A two-dimensional orthogonal projection of the dataset in Figure 13 onto the plane defined by the onset time and chromatic pitch dimensions.

In the dataset in Figure 18, pattern A is the MTP for the vector ⟨16, −12⟩. If we translate pattern A by the vector ⟨16, −12⟩ we get pattern D, which is the MTP for the vector ⟨−16, 12⟩. Let us denote by τ(P, v) the pattern that results when the pattern P is translated by the vector v. Formally,

    τ(P, v) = {d + v | d ∈ P}.    (3)

We will now prove that, in general, if we translate by v the MTP for v, we get the MTP for the vector −v.

Lemma 1 If D is a dataset and v is a vector then

    τ(MTP(v, D), v) = MTP(−v, D).    (4)

Proof From Eq.3 we deduce that

    τ(MTP(v, D), v) = {d1 + v | d1 ∈ MTP(v, D)}.    (5)

Substituting Eq.1 into Eq.5, we find that

    τ(MTP(v, D), v) = {d1 + v | d1 ∈ {d2 | d2 ∈ D ∧ d2 + v ∈ D}}
                    = {d2 + v | d2 ∈ D ∧ d2 + v ∈ D}.    (6)

If we let d3 = d2 + v and substitute this into Eq.6, we deduce that

    τ(MTP(v, D), v) = {d3 | d3 − v ∈ D ∧ d3 ∈ D}.    (7)

Eqs.7 and 1 together imply τ(MTP(v, D), v) = MTP(−v, D). □
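Lemma 1 can also be checked numerically on a small dataset. The sketch below is only a sanity check, not a substitute for the proof; it uses the dataset of Figure 16, and the helper names (`translate`, `lemma1_holds`) are hypothetical.

```python
Point = tuple[int, ...]

# The dataset of Figure 16.
DATASET: set[Point] = {(1, 1), (1, 3), (2, 1), (2, 2), (2, 3), (3, 2)}


def add(p: Point, v: Point) -> Point:
    return tuple(x + y for x, y in zip(p, v))


def mtp(v: Point, dataset: set[Point]) -> set[Point]:
    """Eq. 1: the maximal translatable pattern for v."""
    return {d for d in dataset if add(d, v) in dataset}


def translate(pattern: set[Point], v: Point) -> set[Point]:
    """Eq. 3: the pattern obtained by moving every point by v."""
    return {add(d, v) for d in pattern}


def lemma1_holds(dataset: set[Point]) -> bool:
    """Check Eq. 4 for every difference vector between two datapoints."""
    for d1 in dataset:
        for d2 in dataset:
            v = tuple(b - a for a, b in zip(d1, d2))
            minus_v = tuple(-x for x in v)
            if translate(mtp(v, dataset), v) != mtp(minus_v, dataset):
                return False
    return True


print(lemma1_holds(DATASET))  # True
```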


Lemma 1 tells us that if we compute MTP(d2 − d1, D) then we can find MTP(d1 − d2, D) simply by translating MTP(d2 − d1, D) by d2 − d1. It is also clear that MTP(0, D) = D where 0 is the zero vector. These two facts imply that our MTP-discovery algorithm only really needs to compute the set

    P′(D) = {MTP(d2 − d1, D) | d1, d2 ∈ D ∧ d1 < d2}.    (8)

However, if our algorithm simply generated the set P′(D), then we would not be able to tell which vector each pattern was the MTP for. Therefore, our SIA algorithm actually computes the set

    S(D) = {⟨d2 − d1, MTP(d2 − d1, D)⟩ | d1, d2 ∈ D ∧ d1 < d2}.    (9)

Each member of S(D) is an ordered pair in which the first element is a vector v and the second element is the MTP for v in D. Figure 19 shows S(D) for the dataset in Figure 16.

{
⟨⟨0, 1⟩, {⟨2, 1⟩, ⟨2, 2⟩}⟩,
⟨⟨0, 2⟩, {⟨1, 1⟩, ⟨2, 1⟩}⟩,
⟨⟨1, −2⟩, {⟨1, 3⟩}⟩,
⟨⟨1, −1⟩, {⟨1, 3⟩, ⟨2, 3⟩}⟩,
⟨⟨1, 0⟩, {⟨1, 1⟩, ⟨1, 3⟩, ⟨2, 2⟩}⟩,
⟨⟨1, 1⟩, {⟨1, 1⟩, ⟨2, 1⟩}⟩,
⟨⟨1, 2⟩, {⟨1, 1⟩}⟩,
⟨⟨2, −1⟩, {⟨1, 3⟩}⟩,
⟨⟨2, 1⟩, {⟨1, 1⟩}⟩
}

Figure 19: The set S(D) for the dataset in Figure 16.

[Score excerpt with patterns labelled A to H.]

Figure 20: The first two measures of the Prelude in C minor from Book 2 of J.S. Bach’s Das Wohltemperirte Klavier, BWV 871/1.

Figure 21 is a 2-dimensional dataset representation of the music in Figure 20. In this representation only the morphetic pitch and onset time of each note event is represented. Patterns A–H in Figure 20 correspond to patterns A–H in Figure 21. SIA tells us that pattern A in Figure 21 is


Figure 21: A two-dimensional dataset representation of the music in Figure 20 showing the onset time and morphetic pitch of each note.

the MTP in this dataset for the vector ⟨28, −10⟩. In other words, SIA discovers that a repetition of pattern A in Figure 20 occurs 28 semiquavers later, diatonically transposed down an eleventh. However, SIA tells us nothing about all the other transposed occurrences of this pattern (patterns B–G) in this two-bar passage. Ideally, we would like to know not only that pattern A in Figure 21 is an MTP but also all the other patterns in the dataset that are translationally equivalent to A.

If P is a pattern in a dataset D then we say that the translational equivalence class (TEC) of P in D, denoted by TEC(P, D), is the set that contains all and only those patterns in D that are translationally equivalent to P. For example, in Figure 21 the TEC of pattern A is the set of patterns labelled A–H. Formally, we say that two patterns P1 and P2 are translationally equivalent, denoted by P1 ≡τ P2, if and only if there exists a vector v such that τ(P1, v) = P2. The TEC of a pattern P in a dataset D is then given by

    TEC(P, D) = {Q | Q ≡τ P ∧ Q ⊆ D}.    (10)

Our goal in developing SIATEC was to design an efficient algorithm for computing the set

    T(D) = {TEC(MTP(d2 − d1, D), D) | d1, d2 ∈ D}.    (11)

It can be shown that the translational equivalence relation is a true equivalence relation in the mathematical sense, that is, it is reflexive, transitive and symmetric (Meredith et al., 2001). This implies that the translational equivalence relation partitions the power set of a dataset into translational equivalence classes. This means that every pattern in a dataset is a member of exactly one TEC. However, from Lemma 1 we know that τ(MTP(d2 − d1, D), d2 − d1) = MTP(d1 − d2, D). Therefore TEC(MTP(d2 − d1, D), D) = TEC(MTP(d1 − d2, D), D). Moreover, we know that MTP(0, D) = D and therefore TEC(MTP(0, D), D) = {D}, which is a trivial translational equivalence class. Therefore, instead of computing T(D) as defined in Eq.11, our SIATEC algorithm actually computes the set

    T′(D) = {TEC(MTP(d2 − d1, D), D) | d1, d2 ∈ D ∧ d1 < d2}.    (12)

It can be shown that T(D) = T′(D) ∪ {{D}} (Meredith et al., 2001).

If P is a pattern in a dataset D then we say that v is a translator of P in D if and only if P is translatable by v in D. The set of translators for P in D, which we shall denote by T(P, D), is the set that contains all and only the vectors by which P is translatable. Formally,

    T(P, D) = {v | τ(P, v) ⊆ D}.    (13)

For example, in Figure 16 the set of translators for the pattern {a, c} is {⟨0, 0⟩, ⟨1, 1⟩, ⟨0, 2⟩} and the set of translators for the pattern {a, b, d} is {⟨0, 0⟩, ⟨1, 0⟩}.

The TEC of a pattern P in a dataset D can therefore be represented efficiently by the ordered pair ⟨P, T(P, D)⟩. For any given TEC, E, there are |E| such representations, one for each pattern in E. In general, this ordered-pair representation for a TEC can be much more space-efficient than explicitly writing out every member pattern of the TEC in full. For example, if there are 20 patterns in a dataset that are translationally equivalent to a pattern P1 containing 10 datapoints then printing out the TEC for P1 in full would involve printing 200 datapoints. However, if this TEC were represented as the ordered pair ⟨P1, T(P1, D)⟩ then only 10 + 20 = 30 vectors would need to be printed. This example illustrates how an algorithm such as SIATEC that discovers TECs can be used as the basis of an algorithm for compressing multidimensional data.

In the output of SIATEC, each distinct TEC, E, in T′(D) is therefore represented as an ordered pair ⟨P, T(P, D)⟩ where P is a member of E and T(P, D) is the set of translators for P in D. Figure 22 shows T′(D) for the dataset shown in Figure 16.

{
⟨{⟨1, 1⟩, ⟨1, 3⟩, ⟨2, 2⟩}, {⟨0, 0⟩, ⟨1, 0⟩}⟩,
⟨{⟨1, 1⟩, ⟨2, 1⟩}, {⟨0, 0⟩, ⟨0, 2⟩, ⟨1, 1⟩}⟩,
⟨{⟨2, 1⟩, ⟨2, 2⟩}, {⟨0, 0⟩, ⟨0, 1⟩}⟩,
⟨{⟨1, 1⟩}, {⟨0, 0⟩, ⟨0, 2⟩, ⟨1, 0⟩, ⟨1, 1⟩, ⟨1, 2⟩, ⟨2, 1⟩}⟩
}

Figure 22: The set T  (D) for the dataset in Figure 16.
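The ordered-pair representation ⟨P, T(P, D)⟩ can be sketched in a few lines. The Python below evaluates Eq. 13 directly from its definition and then expands the pair back into the full TEC; the function names (`translators`, `expand_tec`) are hypothetical, and this naive evaluation is not the method SIATEC uses to obtain the translators.

```python
Point = tuple[int, ...]

# The dataset of Figure 16.
DATASET: set[Point] = {(1, 1), (1, 3), (2, 1), (2, 2), (2, 3), (3, 2)}


def add(p: Point, v: Point) -> Point:
    return tuple(x + y for x, y in zip(p, v))


def sub(p: Point, q: Point) -> Point:
    return tuple(x - y for x, y in zip(p, q))


def translators(pattern: set[Point], dataset: set[Point]) -> set[Point]:
    """Eq. 13: every v such that the whole pattern, moved by v, lies in the dataset."""
    anchor = min(pattern)  # any point of the pattern would do as the anchor
    candidates = {sub(d, anchor) for d in dataset}
    return {v for v in candidates if all(add(p, v) in dataset for p in pattern)}


def expand_tec(pattern: set[Point], dataset: set[Point]) -> set[frozenset]:
    """Rebuild every member of the TEC from one pattern and its translators."""
    return {
        frozenset(add(p, v) for p in pattern)
        for v in translators(pattern, dataset)
    }


a_c = {(1, 1), (2, 1)}
print(sorted(translators(a_c, DATASET)))  # [(0, 0), (0, 2), (1, 1)]
for member in sorted(sorted(m) for m in expand_tec(a_c, DATASET)):
    print(member)
```

For the pattern {a, c} the pair stores 2 + 3 = 5 vectors, whereas writing out the three member patterns in full takes 6 datapoints; the saving grows with the size of the pattern and the number of occurrences.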

8 The SIA and SIATEC algorithms

In this section we describe the SIA and SIATEC algorithms. Meredith et al. (2001) provide a more complete description and analysis of these algorithms.

8.1 The SIA algorithm

When given a multidimensional dataset, D, as input, SIA efficiently computes S(D) as defined in Eq.9 above. For a k-dimensional dataset containing n datapoints, the worst-case running time of SIA is O(kn² log₂ n) and its worst-case space complexity is O(kn²). The algorithm consists of the following four steps.

SIA: Step 1 – Sorting the dataset

The first step in SIA is to sort the dataset D to give an ordered set 𝐃 that contains all and only the datapoints in the dataset in increasing order. For the dataset in Figure 16, the result of this first step would be the ordered set

    𝐃 = ⟨⟨1, 1⟩, ⟨1, 3⟩, ⟨2, 1⟩, ⟨2, 2⟩, ⟨2, 3⟩, ⟨3, 2⟩⟩.    (14)

For a k-dimensional dataset of size n, it is possible to accomplish this first step of the algorithm using merge sort (Cormen et al., 1990, pp. 12–15) in a worst-case running time of O(kn log₂ n) (see footnote 10).

SIA: Step 2 – Computing the vector table

The second step in SIA is to compute the set

    V = {⟨𝐃[j] − 𝐃[i], i⟩ | 1 ≤ i < j ≤ |𝐃|}.    (15)

Note that each member of V is an ordered pair in which the first element is the vector from datapoint 𝐃[i] to datapoint 𝐃[j] and the second element is the index of the ‘origin’ datapoint, 𝐃[i], in 𝐃. For the dataset in Figure 16, V contains all the elements below the leading diagonal in Table 1.

                                       From
To       ⟨1,1⟩        ⟨1,3⟩        ⟨2,1⟩        ⟨2,2⟩        ⟨2,3⟩        ⟨3,2⟩
⟨1,1⟩
⟨1,3⟩    ⟨⟨0,2⟩,1⟩
⟨2,1⟩    ⟨⟨1,0⟩,1⟩    ⟨⟨1,−2⟩,2⟩
⟨2,2⟩    ⟨⟨1,1⟩,1⟩    ⟨⟨1,−1⟩,2⟩   ⟨⟨0,1⟩,3⟩
⟨2,3⟩    ⟨⟨1,2⟩,1⟩    ⟨⟨1,0⟩,2⟩    ⟨⟨0,2⟩,3⟩    ⟨⟨0,1⟩,4⟩
⟨3,2⟩    ⟨⟨2,1⟩,1⟩    ⟨⟨2,−1⟩,2⟩   ⟨⟨1,1⟩,3⟩    ⟨⟨1,0⟩,4⟩    ⟨⟨1,−1⟩,5⟩

Table 1: A vector table showing the set V for the dataset shown in Figure 16.

We call a table like the one in Table 1 a vector table. Each element in this table is an ordered pair ⟨v, i⟩ where i gives the number of the column in which the element occurs and v is the vector from the datapoint at the head of the column in which the element occurs to the datapoint at the head of the row in which the element occurs. For a k-dimensional dataset of size n, this second step of the algorithm involves computing n(n − 1)/2 vector subtractions. It can be accomplished in a worst-case running time of O(kn²).

SIA: Step 3 – Sorting the vectors

If ⟨u, i⟩ and ⟨v, j⟩ are any two elements in the set V computed in the second step of the algorithm (Eq.15) then we define that ⟨u, i⟩ is less than ⟨v, j⟩, denoted by ⟨u, i⟩ < ⟨v, j⟩, if and only if u < v, or u = v and i < j. The third step in SIA is to sort V to give an ordered set 𝐕 that contains the elements of V in increasing order. For example, the column headed 𝐕[i] in Figure 23 gives 𝐕 for the dataset in Figure 16.

An examination of Table 1 reveals that the vectors increase as one descends a column and decrease as one goes from left to right along a row. In our implementation of SIA we use a two-dimensional linked list to represent V as a vector table like the one in Table 1. We then use a modified version of merge sort that exploits the fact that the columns and rows in this vector table are already sorted, to accomplish this third step of the algorithm more rapidly than would be achievable using plain merge sort on the completely unsorted set V. The worst-case running time of this step of the algorithm is O(kn² log₂ n).

SIA: Step 4 – Printing out S(D)

If A is an ordered set of ordered sets then A[i, j] denotes the jth element of the ith element of A. For example, if A = ⟨⟨a, b, c⟩, ⟨d, e⟩, ⟨f⟩⟩ then A[1, 3] = c, A[2, 1] = d and A[3, 1] = f. As pointed out above, the column headed 𝐕[i] in Figure 23 gives 𝐕 for the dataset in Figure 16. For each of these ordered pairs, 𝐕[i], the datapoint 𝐃[𝐕[i, 2]] is printed next to it in the third column in Figure 23. For example, 𝐕[1] = ⟨⟨0, 1⟩, 3⟩ in Figure 23, so 𝐕[1, 2] = 3 and 𝐃[𝐕[1, 2]] = ⟨2, 1⟩, the third datapoint in the ordered set 𝐃 for the dataset shown in Figure 16.

10 When merge sort is implemented using arrays, it requires linear extra memory and the additional work spent copying to and from the temporary array throughout the algorithm has the effect of slowing down the sort considerably. However, we use a special implementation of merge sort that employs linked lists and in this implementation, no extra memory is required and no copying of data is performed.

i     𝐕[i]           𝐃[𝐕[i, 2]]
1     ⟨⟨0, 1⟩, 3⟩     ⟨2, 1⟩   }
2     ⟨⟨0, 1⟩, 4⟩     ⟨2, 2⟩   } = MTP for ⟨0, 1⟩
3     ⟨⟨0, 2⟩, 1⟩     ⟨1, 1⟩   }
4     ⟨⟨0, 2⟩, 3⟩     ⟨2, 1⟩   } = MTP for ⟨0, 2⟩
5     ⟨⟨1, −2⟩, 2⟩    ⟨1, 3⟩   } = MTP for ⟨1, −2⟩
6     ⟨⟨1, −1⟩, 2⟩    ⟨1, 3⟩   }
7     ⟨⟨1, −1⟩, 5⟩    ⟨2, 3⟩   } = MTP for ⟨1, −1⟩
8     ⟨⟨1, 0⟩, 1⟩     ⟨1, 1⟩   }
9     ⟨⟨1, 0⟩, 2⟩     ⟨1, 3⟩   }
10    ⟨⟨1, 0⟩, 4⟩     ⟨2, 2⟩   } = MTP for ⟨1, 0⟩
11    ⟨⟨1, 1⟩, 1⟩     ⟨1, 1⟩   }
12    ⟨⟨1, 1⟩, 3⟩     ⟨2, 1⟩   } = MTP for ⟨1, 1⟩
13    ⟨⟨1, 2⟩, 1⟩     ⟨1, 1⟩   } = MTP for ⟨1, 2⟩
14    ⟨⟨2, −1⟩, 2⟩    ⟨1, 3⟩   } = MTP for ⟨2, −1⟩
15    ⟨⟨2, 1⟩, 1⟩     ⟨1, 1⟩   } = MTP for ⟨2, 1⟩

Figure 23: Reading the second column from top to bottom gives V for the dataset shown in Figure 16. The third column gives D[V[i, 2]] for each element V[i] in the second column. The right-hand side of the third column shows how the non-empty MTPs may be derived directly from V.

As indicated on the right-hand side of the third column in Figure 23, the MTP for a vector v is the set of consecutive datapoints 𝐃[𝐕[i, 2]] in the third column that corresponds to the set of consecutive ordered pairs 𝐕[i] in the second column for which 𝐕[i, 1] = v. The complete set S(D) as defined in Eq.9 can be printed out using the algorithm in Figure 24. In our pseudocode, block structure is indicated by indentation and the symbol ‘←’ indicates assignment. Figure 25 shows the output generated by this algorithm for the dataset in Figure 16.

SIA discovers the set P′(D) of non-empty MTPs defined in Eq.8 and from Figure 23 it can easily be seen that SIA accomplishes this simply by sorting the set V defined in Eq.15. It is clear from Table 1 that, for a dataset of size n, the number of elements in V is n(n − 1)/2. Therefore, if we use P to denote an MTP in P′(D),

$$ \sum_{P \in P'(D)} |P| = \frac{n(n-1)}{2}. $$

Therefore the total number of vectors that have to be printed when S(D) is printed is n(n − 1)/2 plus one vector for each MTP in P′(D). Since |P′(D)| ≤ n(n − 1)/2, the total number of vectors to be printed out is certainly less than or equal to n(n − 1)/2 + n(n − 1)/2 = n(n − 1). Therefore, for a k-dimensional dataset containing n datapoints, S(D) can be printed out in a worst-case running time of O(kn²).
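The four steps of SIA can be summarised in a compact sketch. The Python below follows the structure described above (sort the dataset, build the lower triangle of the vector table, sort it, read off runs of equal vectors), but it is an illustration rather than the authors' ANSI C implementation: it uses Python's built-in sort instead of the linked-list merge sort that exploits the partially sorted vector table, it uses 0-based indices, and the function name `sia` is simply a label chosen here.

```python
from itertools import groupby

Point = tuple[int, ...]


def sia(dataset: set[Point]) -> list[tuple[Point, list[Point]]]:
    """Return S(D) as a list of (vector, MTP) pairs in increasing vector order."""
    # Step 1: sort the datapoints lexicographically.
    pts = sorted(dataset)
    n = len(pts)
    # Step 2: the vector table entries below the leading diagonal, each
    # stored as (vector, index of origin datapoint).
    table = [
        (tuple(b - a for a, b in zip(pts[i], pts[j])), i)
        for i in range(n)
        for j in range(i + 1, n)
    ]
    # Step 3: sort by vector, then by origin index.
    table.sort()
    # Step 4: each run of equal vectors yields one MTP, whose members are the
    # origin datapoints of the run (already in increasing order).
    return [
        (vector, [pts[i] for _, i in run])
        for vector, run in groupby(table, key=lambda entry: entry[0])
    ]


FIGURE_16 = {(1, 1), (1, 3), (2, 1), (2, 2), (2, 3), (3, 2)}
for vector, pattern in sia(FIGURE_16):
    print(vector, pattern)
```

Run on the dataset of Figure 16, the sketch yields nine pairs that match the entries of Figure 19, with 15 datapoints in total across the MTPs, in agreement with n(n − 1)/2 for n = 6.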

8.2 The SIATEC algorithm

When given a multidimensional dataset, D, as input, SIATEC efficiently computes T′(D) as defined in Eq.12 above. For a k-dimensional dataset containing n datapoints, the worst-case running time

m ← |𝐕|
i ← 1
PRINT NEW LINE
PRINT(‘{’)
while i ≤ m
    PRINT(‘⟨’)
    PRINT VECTOR(𝐕[i, 1])
    PRINT(‘, {’)
    PRINT VECTOR(𝐃[𝐕[i, 2]])
    j ← i + 1
    while j ≤ m and 𝐕[j, 1] = 𝐕[i, 1]
        PRINT(‘,’)
        PRINT VECTOR(𝐃[𝐕[j, 2]])
        j ← j + 1
    PRINT(‘}⟩’)
    if j ≤ m
        PRINT(‘,’)
        PRINT NEW LINE
    i ← j
PRINT(‘}’)

Figure 24: An algorithm for printing out S(D) using 𝐕 and 𝐃.

{⟨⟨0, 1⟩, {⟨2, 1⟩,⟨2, 2⟩}⟩,
⟨⟨0, 2⟩, {⟨1, 1⟩,⟨2, 1⟩}⟩,
⟨⟨1, −2⟩, {⟨1, 3⟩}⟩,
⟨⟨1, −1⟩, {⟨1, 3⟩,⟨2, 3⟩}⟩,
⟨⟨1, 0⟩, {⟨1, 1⟩,⟨1, 3⟩,⟨2, 2⟩}⟩,
⟨⟨1, 1⟩, {⟨1, 1⟩,⟨2, 1⟩}⟩,
⟨⟨1, 2⟩, {⟨1, 1⟩}⟩,
⟨⟨2, −1⟩, {⟨1, 3⟩}⟩,
⟨⟨2, 1⟩, {⟨1, 1⟩}⟩}

Figure 25: The output of the algorithm in Figure 24 for the dataset in Figure 16.


of SIATEC is O(kn³) and its worst-case space complexity is O(kn²). The algorithm consists of the following seven steps.

SIATEC: Step 1 – Sorting the dataset

This is exactly the same as Step 1 of SIA as described in Section 8.1 above.

SIATEC: Step 2 – Computing 𝐖

The second step in SIATEC is to compute the ordered set of ordered sets

    𝐖 = ⟨⟨𝐖[1, 1], …, 𝐖[1, |𝐃|]⟩, …, ⟨𝐖[|𝐃|, 1], …, 𝐖[|𝐃|, |𝐃|]⟩⟩

where

    𝐖[i, j] = ⟨𝐃[j] − 𝐃[i], i⟩.    (16)

W can be visualized as a vector table like Table 2 (which shows W for the dataset in Figure 16). The only difference between the vector table (W) computed by SIATEC and the vector table computed in Step 2 of SIA (see Table 1) is that in SIATEC, all the elements in this table are filled in; whereas in SIA, only those elements below the leading diagonal are computed. Computing the whole table rather than just the region below the leading diagonal allows us to compute more efficiently the set of translators for each MTP (see Step 7 below).

                                            From
To       ⟨1,1⟩         ⟨1,3⟩         ⟨2,1⟩         ⟨2,2⟩          ⟨2,3⟩          ⟨3,2⟩
⟨1,1⟩    ⟨⟨0,0⟩,1⟩     ⟨⟨0,−2⟩,2⟩    ⟨⟨−1,0⟩,3⟩    ⟨⟨−1,−1⟩,4⟩    ⟨⟨−1,−2⟩,5⟩    ⟨⟨−2,−1⟩,6⟩
⟨1,3⟩    ⟨⟨0,2⟩,1⟩     ⟨⟨0,0⟩,2⟩     ⟨⟨−1,2⟩,3⟩    ⟨⟨−1,1⟩,4⟩     ⟨⟨−1,0⟩,5⟩     ⟨⟨−2,1⟩,6⟩
⟨2,1⟩    ⟨⟨1,0⟩,1⟩     ⟨⟨1,−2⟩,2⟩    ⟨⟨0,0⟩,3⟩     ⟨⟨0,−1⟩,4⟩     ⟨⟨0,−2⟩,5⟩     ⟨⟨−1,−1⟩,6⟩
⟨2,2⟩    ⟨⟨1,1⟩,1⟩     ⟨⟨1,−1⟩,2⟩    ⟨⟨0,1⟩,3⟩     ⟨⟨0,0⟩,4⟩      ⟨⟨0,−1⟩,5⟩     ⟨⟨−1,0⟩,6⟩
⟨2,3⟩    ⟨⟨1,2⟩,1⟩     ⟨⟨1,0⟩,2⟩     ⟨⟨0,2⟩,3⟩     ⟨⟨0,1⟩,4⟩      ⟨⟨0,0⟩,5⟩      ⟨⟨−1,1⟩,6⟩
⟨3,2⟩    ⟨⟨2,1⟩,1⟩     ⟨⟨2,−1⟩,2⟩    ⟨⟨1,1⟩,3⟩     ⟨⟨1,0⟩,4⟩      ⟨⟨1,−1⟩,5⟩     ⟨⟨0,0⟩,6⟩

Table 2: A vector table showing 𝐖 for the dataset shown in Figure 16.

Computing 𝐖 for a k-dimensional dataset of size n involves computing n² vector subtractions. Each of these vector subtractions involves carrying out k scalar subtractions so the overall worst-case running time of this step is O(kn²).

SIATEC: Step 3 – Computing V

The third step of SIATEC is to compute the set V as defined in Eq.15. This is the same set as that computed in Step 2 of SIA. In SIATEC, V can be computed directly from 𝐖 since

    V = {𝐖[i, j] | 1 ≤ i < j ≤ |𝐃|}.    (17)

In fact, in our implementation of SIATEC, V is computed at the same time as 𝐖.

SIATEC: Step 4 – Sorting V to produce 𝐕

This step is exactly the same as Step 3 of SIA.

SIATEC: Step 5 – ‘Vectorizing’ the MTPs

𝐕 is effectively a sorted representation of S(D) (Eq.9) (see Step 4 of SIA and Figure 23). The purpose of SIATEC is to compute T′(D) (Eq.12), which is the set that contains every TEC of an MTP in P′(D) and nothing else. P′(D) can be obtained from 𝐕 but it is possible for two or more MTPs in P′(D) to be translationally equivalent. For example, the MTPs in the dataset in Figure 16 for the vectors ⟨0, 2⟩, ⟨1, −1⟩ and ⟨1, 1⟩ are translationally equivalent (see Figure 23). If two patterns are translationally equivalent then they are members of the same TEC. Therefore, if we naïvely compute the TEC of each MTP in P′(D), we run the risk of computing the same TEC more than once, which is inefficient. We therefore partition P′(D) into translational equivalence classes and then select just one MTP from each of these classes, discarding the others.

If P is a pattern then let SORT(P) be the function that returns the ordered set that contains all and only the datapoints in P sorted into increasing order. If 𝐏 is an ordered set of datapoints then let VEC(𝐏) be the function that returns the ordered set of vectors

    VEC(𝐏) = ⟨𝐏[2] − 𝐏[1], 𝐏[3] − 𝐏[2], …, 𝐏[|𝐏|] − 𝐏[|𝐏| − 1]⟩.    (18)

If P1 and P2 are two patterns in a dataset, then

    VEC(SORT(P1)) = VEC(SORT(P2)) ⟺ P1 ≡τ P2.    (19)

We say that VEC(SORT(P)) is the vectorized representation of the pattern P. In the ordered set 𝐕 computed in Step 4 of SIATEC, each MTP P is represented in its sorted form as SORT(P) = 𝐏 (see Figure 23). Therefore, if we want to use Eq.19 to partition P′(D) we first have to compute VEC(𝐏) for each of the sorted MTPs 𝐏 in 𝐕. Step 5 of SIATEC is therefore to compute

    X = {⟨i, VEC(SORT(P))⟩ | ⟨v, P⟩ ∈ S(D) ∧ 𝐕[i, 1] = v ∧ (i = 1 ∨ 𝐕[i − 1, 1] ≠ v)}.    (20)

If 𝐕[i] and 𝐕[j] are two distinct elements of 𝐕 and 𝐕[i] < 𝐕[j] but 𝐕[i, 1] = 𝐕[j, 1] (i.e., the vectors in 𝐕[i] and 𝐕[j] are the same) then 𝐕[i, 2] < 𝐕[j, 2], which implies that 𝐃[𝐕[i, 2]] < 𝐃[𝐕[j, 2]]. This means that the datapoints within each MTP in the 𝐕 representation of S(D) are sorted in increasing order, as can be seen in the output of SIA (Figure 25) generated by the algorithm in Figure 24.

m ← |𝐕|
i ← 1
𝐗 ← ⟨⟩
while i ≤ m
    Q ← ⟨⟩
    j ← i + 1
    while j ≤ m and 𝐕[j, 1] = 𝐕[i, 1]
        Q ← Q ⊕ ⟨𝐃[𝐕[j, 2]] − 𝐃[𝐕[j − 1, 2]]⟩
        j ← j + 1
    𝐗 ← 𝐗 ⊕ ⟨⟨i, Q⟩⟩
    i ← j

Figure 26: An algorithm for computing X using 𝐕 and 𝐃.

X can be efficiently computed directly from 𝐕 and 𝐃 using the algorithm in Figure 26, which exploits the fact that the MTPs in 𝐕 are already sorted. In Figure 26, the set X is actually represented as an ordered set 𝐗. When the algorithm in Figure 26 has terminated, the ordered set 𝐗 contains all and only the elements of X (with no duplicates). In Figure 26, ⟨⟩ denotes the empty ordered set and the symbol ⊕ denotes concatenation: if A and B are two ordered sets such that A = ⟨a1, …, am⟩ and B = ⟨b1, …, bn⟩ then A ⊕ B = ⟨a1, …, am, b1, …, bn⟩. Figure 27 shows the state of 𝐗 for the dataset in Figure 16 at the termination of Step 5 of SIATEC. For a k-dimensional dataset of size n, the worst-case running time of the algorithm in Figure 26 is O(kn²).
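The vectorized representation of Eqs. 18 and 19 is easy to experiment with. The following Python sketch defines a hypothetical helper `vec` that combines SORT and VEC, and applies it to some of the MTPs of Figure 23; it illustrates the equivalence test only and does not reproduce the index bookkeeping of Figure 26.

```python
Point = tuple[int, ...]


def vec(pattern: set[Point]) -> tuple[Point, ...]:
    """VEC(SORT(P)): the differences between consecutive sorted datapoints."""
    pts = sorted(pattern)
    return tuple(
        tuple(b - a for a, b in zip(p, q))
        for p, q in zip(pts, pts[1:])
    )


# MTPs for the vectors <0,2>, <1,-1> and <1,0> in the dataset of Figure 16.
mtp_0_2 = {(1, 1), (2, 1)}
mtp_1_m1 = {(1, 3), (2, 3)}
mtp_1_0 = {(1, 1), (1, 3), (2, 2)}

print(vec(mtp_0_2))   # ((1, 0),)
print(vec(mtp_1_m1))  # ((1, 0),)
print(vec(mtp_1_0))   # ((0, 2), (1, -1))
print(vec(mtp_0_2) == vec(mtp_1_m1))  # True: translationally equivalent
```

Because the vectorized form is a plain tuple, it can serve as a dictionary key or sort key, which is what makes grouping translationally equivalent MTPs inexpensive.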

⟨
⟨1, ⟨⟨0, 1⟩⟩⟩,
⟨3, ⟨⟨1, 0⟩⟩⟩,
⟨5, ⟨⟩⟩,
⟨6, ⟨⟨1, 0⟩⟩⟩,
⟨8, ⟨⟨0, 2⟩, ⟨1, −1⟩⟩⟩,
⟨11, ⟨⟨1, 0⟩⟩⟩,
⟨13, ⟨⟩⟩,
⟨14, ⟨⟩⟩,
⟨15, ⟨⟩⟩
⟩

Figure 27: The ordered set X for the dataset in Figure 16.

SIATEC: Step 6 – Sorting 𝐗

Let Q1 and Q2 be any two ordered sets in which each element is a k-dimensional vector. We define that Q1 is less than Q2, denoted by Q1 < Q2, if and only if one of the following two conditions is satisfied:

1. |Q1| < |Q2|.
2. |Q1| = |Q2| and there exists an integer 1 ≤ i ≤ |Q1| such that Q1[i] < Q2[i] and Q1[j] = Q2[j] for all 1 ≤ j < i. (See Section 7 for a definition of the expression u < v when u and v are vectors.)

In Step 6 of SIATEC, the ordered set 𝐗 generated by the algorithm in Figure 26 is sorted to produce the ordered set 𝐘 which satisfies the following two conditions:

1. 𝐘 contains all and only the elements of 𝐗.
2. If 𝐘[i] and 𝐘[j] are any two distinct elements of 𝐘 then i < j ⟺ 𝐘[i, 2] < 𝐘[j, 2] ∨ (𝐘[i, 2] = 𝐘[j, 2] ∧ 𝐘[i, 1] < 𝐘[j, 1]).

Figure 28 shows 𝐘 for the dataset in Figure 16. For a k-dimensional dataset of size n, this step of the algorithm can be accomplished in a worst-case running time of O(kn² log₂ n) using merge sort.

⟨
⟨5, ⟨⟩⟩,
⟨13, ⟨⟩⟩,
⟨14, ⟨⟩⟩,
⟨15, ⟨⟩⟩,
⟨1, ⟨⟨0, 1⟩⟩⟩,
⟨3, ⟨⟨1, 0⟩⟩⟩,
⟨6, ⟨⟨1, 0⟩⟩⟩,
⟨11, ⟨⟨1, 0⟩⟩⟩,
⟨8, ⟨⟨0, 2⟩, ⟨1, −1⟩⟩⟩
⟩

Figure 28: The ordered set Y for the dataset in Figure 16.

We know that

    MTP(𝐕[𝐘[i, 1], 1], D) ≡τ MTP(𝐕[𝐘[j, 1], 1], D) ⟺ 𝐘[i, 2] = 𝐘[j, 2].

So Figure 28 tells us, for example, that the MTPs for the vectors 𝐕[3, 1] = ⟨0, 2⟩, 𝐕[6, 1] = ⟨1, −1⟩ and 𝐕[11, 1] = ⟨1, 1⟩ are translationally equivalent since the vectorized representation of each of these patterns is ⟨⟨1, 0⟩⟩. This implies that we only have to compute the TEC of one of these patterns and the other two can be disregarded.
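The ordering used in Step 6 maps naturally onto a composite sort key. The sketch below starts from the contents of 𝐗 as listed in Figure 27 (keeping the paper's 1-based indices into 𝐕), sorts them by length, then by vector sequence, then by index, and picks the first entry of each run of equal vectorized representations. It relies on Python's built-in sort rather than the merge sort mentioned in the text, and the variable names are illustrative.

```python
# X as in Figure 27: (index into V, vectorized representation of the MTP).
X = [
    (1, ((0, 1),)),
    (3, ((1, 0),)),
    (5, ()),
    (6, ((1, 0),)),
    (8, ((0, 2), (1, -1))),
    (11, ((1, 0),)),
    (13, ()),
    (14, ()),
    (15, ()),
]

# Shorter vector sequences first, then lexicographic order of the sequences,
# then the index into V as the tie-breaker.
Y = sorted(X, key=lambda entry: (len(entry[1]), entry[1], entry[0]))

# One representative per translational equivalence class: the first entry of
# each run of identical vectorized representations.
representatives = [
    index
    for position, (index, q) in enumerate(Y)
    if position == 0 or Y[position - 1][1] != q
]

print([index for index, _ in Y])  # [5, 13, 14, 15, 1, 3, 6, 11, 8]
print(representatives)            # [5, 1, 3, 8]
```

The four representatives correspond to the four TECs of Figure 22: single datapoints, the two-point patterns with internal vector ⟨0, 1⟩, those with internal vector ⟨1, 0⟩, and the three-point pattern.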

 1  r ← |𝐘|
 2  m ← |𝐕|
 3  i ← 1
 4  PRINT NEW LINE
 5  PRINT(‘{’)
 6  if r > 0
 7      repeat
 8          j ← 𝐘[i, 1]
 9          I ← ⟨⟩
10          while j ≤ m and 𝐕[j, 1] = 𝐕[𝐘[i, 1], 1]
11              I ← I ⊕ ⟨𝐕[j, 2]⟩
12              j ← j + 1
13          PRINT(‘⟨’)
14          PRINT PATTERN(I)
15          PRINT(‘,’)
16          PRINT SET OF TRANSLATORS(I)
17          PRINT(‘⟩’)
18          repeat
19              i ← i + 1
20          until i > r or 𝐘[i, 2] ≠ 𝐘[i − 1, 2]
21          if i ≤ r
22              PRINT(‘,’)
23              PRINT NEW LINE
24      until i > r
25  PRINT(‘}’)

Figure 29: An algorithm for printing out T  (D).

SIATEC: Step 7 – Printing out T′(D)

The final step of SIATEC is to print out T′(D). This can be done using the algorithm in Figure 29. Recall that each TEC in T′(D) is represented as an ordered pair ⟨P, T(P, D)⟩ where P is an MTP and T(P, D) is the set of translators for P in the dataset D (see Eq.13 and the discussion in Section 7 above). In Figure 29, each MTP is printed out using the algorithm PRINT PATTERN called in line 14. This algorithm is given in Figure 30.

PRINT PATTERN(I)
    p ← |I|
    PRINT(‘{’)
    PRINT VECTOR(𝐃[I[1]])
    for k ← 2 to p
        PRINT(‘,’)
        PRINT VECTOR(𝐃[I[k]])
    PRINT(‘}’)

Figure 30: The PRINT PATTERN algorithm.


The set of translators for each TEC is printed out using the algorithm PRINT SET OF TRANSLATORS called in line 16 of Figure 29. This algorithm, which is given in Figure 31, exploits the fact that

$$ T(\{\mathbf{D}[i]\}, D) = \bigcup_{j=1}^{|\mathbf{D}|} \{\mathbf{W}[i, j, 1]\}. $$

That is, the set of translators for a datapoint 𝐃[i] is the set that contains all and only the vectors that occur as the first element in an ordered pair in the ith column in the vector table computed in Step 2 of SIATEC (see Table 2). In Figure 29, each MTP is represented as a set of indices, I: the pattern represented by I is simply {𝐃[i] | i ∈ I}. The set of translators for the pattern represented by I is therefore

$$ \bigcap_{i \in I} T(\{\mathbf{D}[i]\}, D) = \bigcap_{i \in I} \left( \bigcup_{j=1}^{|\mathbf{D}|} \{\mathbf{W}[i, j, 1]\} \right). \tag{21} $$

In other words, the set of translators for a pattern is the set that contains all and only the vectors that occur in all the columns in the vector table corresponding to the datapoints in the pattern. For example, the set of translators for the set {a, c} = {⟨1, 1⟩, ⟨2, 1⟩} is the set that contains all and only the vectors that occur in both the first and third columns in Table 2:

    T({⟨1, 1⟩, ⟨2, 1⟩}, D)
        = {⟨0, 0⟩, ⟨0, 2⟩, ⟨1, 0⟩, ⟨1, 1⟩, ⟨1, 2⟩, ⟨2, 1⟩} ∩ {⟨−1, 0⟩, ⟨−1, 2⟩, ⟨0, 0⟩, ⟨0, 1⟩, ⟨0, 2⟩, ⟨1, 1⟩}
        = {⟨0, 0⟩, ⟨0, 2⟩, ⟨1, 1⟩}

The algorithm PRINT SET OF TRANSLATORS is an efficient algorithm for computing the expression on the right-hand side of Eq.21. Using the algorithms in Figures 29 to 31, Step 7 can be accomplished in a worst-case running time of O(kn³) for a k-dimensional dataset of size n. Figure 32 shows the output generated by the algorithm in Figure 29 for the dataset in Figure 16.
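Eq. 21 can be sketched as an intersection over columns of the full vector table. The Python below builds the table for the dataset of Figure 16 and intersects the columns of the pattern's datapoints using hash sets. This is a simplification: PRINT SET OF TRANSLATORS instead walks the already sorted columns with one pointer per column, and the names `vector_table` and `translators_from_table` as well as the 0-based indexing are assumptions of this sketch.

```python
Point = tuple[int, ...]


def vector_table(points: list[Point]) -> list[list[Point]]:
    """W[i][j] = points[j] - points[i]; W[i] plays the role of column i of Table 2."""
    return [
        [tuple(b - a for a, b in zip(p, q)) for q in points]
        for p in points
    ]


def translators_from_table(W: list[list[Point]], indices: list[int]) -> list[Point]:
    """Eq. 21: the vectors common to the columns of every datapoint in the pattern."""
    columns = [set(W[i]) for i in indices]
    return sorted(set.intersection(*columns))


# The sorted dataset of Eq. 14 (0-based positions 0..5).
D_SORTED = [(1, 1), (1, 3), (2, 1), (2, 2), (2, 3), (3, 2)]
W = vector_table(D_SORTED)

# Pattern {a, c} = {<1,1>, <2,1>} sits at positions 0 and 2.
print(translators_from_table(W, [0, 2]))  # [(0, 0), (0, 2), (1, 1)]
```

Building the table takes n² vector subtractions, after which the translators of any pattern are obtained without touching the dataset again, which is the reason given above for filling in the whole table rather than only the part below the leading diagonal.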

9 Running SIA and SIATEC on music data

SIA and SIATEC have been implemented in ANSI C (Kernighan and Ritchie, 1988), compiled using the GNU C compiler and run on a 500MHz Sparc machine with 1GB RAM. Table 3 shows the results of running these implementations of the algorithms on a set of 52 datasets that we shall denote by S. As can be seen in Table 3, S contains 2-, 3-, 4- and 5-dimensional datasets (column headed k) ranging in size from 6 to 3456 datapoints (column headed n). The notes in Table 3 provide an explanation of the data presented. Table 4 gives more information about the dataset files used in the experiments. As shown in this table, many of the files used in the experiment were derived automatically from MIDI files created by Tomita (1996).

Figure 33 gives a graphical representation of the data in the columns headed n and t_SIA/s in Table 3. In Figure 33, the vertical axis of each graph represents running time t in seconds, and the horizontal axis represents the number of datapoints n in the dataset given as input to SIA. The smooth curve in each graph represents the function t = c_SIA kn² log₂ n where k is the dimensionality of the input dataset and c_SIA is a constant computed using the equation:

$$ c_{SIA} = \frac{1}{|S|} \sum_{D \in S} \frac{t_{SIA}(D)}{k(D)\,(n(D))^2 \log_2 (n(D))} $$

where: t_SIA(D) is the running time of our implementation of SIA for the dataset D;

9

RUNNING SIA AND SIATEC ON MUSIC DATA

PRINT SET OF TRANSLATORS(I)
    p ← |I|
    n ← |D|
    if p = 1
        PRINT(‘{’)
        PRINT VECTOR(W[I[1], 1, 1])
        for k ← 2 to n
            PRINT(‘,’)
            PRINT VECTOR(W[I[1], k, 1])
        PRINT(‘}’)
    else
        PRINT(‘{’)
        J ← ⟨⟩
        for k ← 1 to p
            J ← J ⊕ ⟨1⟩
        FINISHED ← FALSE
        FIRST VECTOR ← TRUE
        k ← 2
        while not FINISHED
            if J[k] ≤ J[k − 1]
                J[k] ← J[k − 1] + 1
            while J[k] ≤ n and W[I[k], J[k], 1] < W[I[k − 1], J[k − 1], 1]
                J[k] ← J[k] + 1
            if J[k] > n
                FINISHED ← TRUE
            else if W[I[k], J[k], 1] > W[I[k − 1], J[k − 1], 1]
                k ← 2
                J[1] ← J[1] + 1
            else if k = p
                if not FIRST VECTOR
                    PRINT(‘,’)
                else
                    FIRST VECTOR ← FALSE
                PRINT VECTOR(W[I[k], J[k], 1])
                for k ← 1 to p
                    J[k] ← J[k] + 1
                if J[k] > n
                    FINISHED ← TRUE
                else
                    k ← 2
            else
                k ← k + 1
        PRINT(‘}’)

Figure 31: The PRINT SET OF TRANSLATORS algorithm.

Figure 32: The output of the algorithm in Figure 29 for the dataset in Figure 16.


File name             K  k  SD     N      n      |S(D)|   |O_SIA|    t_SIA/s  |O_SIATEC|  |T'(D)|  t_SIATEC/s
simple dataset2d.dat  2  2  11     6      6      9        24         0        21          4        0
101mac2d.dat          2  2  11     101    101    2,423    7,473      0.07     5,699       791      0.19
invention1-2d.dat     2  2  11     117    117    2,697    9,483      0.09     8,171       994      0.34
202mac2d.dat          2  2  11     202    202    6,041    26,342     0.3      27,087      2,705    1.39
404-2d.dat            2  2  11     404    404    14,295   95,701     1.12     110,028     7,983    9.37
rach4d.dat            4  2  1001   1,725  564    12,427   171,193    2.2      233,104     2,292    21.42
maple2d.dat           2  2  11     575    575    17,914   182,939    2.34     231,558     12,677   24.98
WTCII19A.dat          2  2  11     689    689    28,616   265,632    3.47     302,781     19,287   41.87
bachprel5d.dat        5  2  10100  691    691    15,749   254,144    3.39     290,809     13,079   38.88
bachprel5d.dat        5  2  11000  691    691    24,940   263,335    3.45     309,670     19,838   42.63
bachfugue5d.dat       5  2  10100  736    729    23,478   288,834    3.81     340,640     16,150   45.13
bachfugue5d.dat       5  2  11000  736    729    34,558   299,914    3.94     348,489     21,572   47.87
Wtcii19b.dat          2  2  11     758    758    30,386   317,289    4.18     358,358     21,671   54.04
808mac2d.dat          2  2  11     808    808    31,544   357,572    4.89     414,365     19,934   59.9
Wtcii20b.dat          2  2  11     869    869    132,023  509,169    6.12     409,520     39,358   68.62
Wtcii24a.dat          2  2  11     899    899    114,822  518,473    6.53     467,422     39,173   80.32
Wtcii21b.dat          2  2  11     966    966    40,038   506,133    6.96     571,629     28,310   102.23
Wtcii24b.dat          2  2  11     1,041  1,041  78,774   620,094    8.2      634,145     35,173   120.75
WTCII23A.dat          2  2  11     1,097  1,097  108,391  709,547    9.36     685,594     40,076   138.51
Wtcii12b.dat          2  2  11     1,121  1,121  50,073   677,833    9.74     768,487     37,160   156.16
WTCII22A.dat          2  2  11     1,184  1,184  46,353   746,689    10.73    852,566     38,507   179.43
1212mac2d.dat         2  2  11     1,212  1,212  52,084   785,950    10.97    911,228     33,334   181.42
Wtcii23b.dat          2  2  11     1,311  1,311  59,729   918,434    13.31    1,032,894   44,474   233.89
WTCII12A.dat          2  2  11     1,473  1,473  90,425   1,174,553  16.85    1,318,412   57,194   325.16
WTCII20A.dat          2  2  11     1,542  1,542  119,906  1,308,017  19.1     1,555,868   90,144   406.45
Wtcii18b.dat          2  2  11     1,578  1,578  155,703  1,399,956  19.9     1,418,070   62,027   376.15
Wtcii01a.dat          2  2  11     1,658  1,658  86,953   1,460,606  21.5     1,638,495   47,939   433.7
rach4d.dat            4  2  1010   1,725  1,722  60,909   1,542,690  22.57    1,856,952   48,235   504.79
rach4d.dat            4  2  1100   1,725  1,722  96,142   1,577,923  22.69    1,891,199   73,393   537.18
Wtcii22b.dat          2  2  11     1,754  1,754  85,985   1,623,366  23.89    1,816,177   65,691   516.98
Wtcii01b.dat          2  2  11     1,934  1,934  117,872  1,987,083  30.01    2,143,243   51,538   644.62
WTCII18A.dat          2  2  11     2,616  2,616  234,956  3,655,376  58.77    3,815,574   108,453  1,605.51
WTCII21A.dat          2  2  11     3,456  3,456  168,787  6,139,027  108.88   6,803,022   139,791  ?
bachprel5d.dat        5  3  10110  691    691    44,104   282,499    4.52     313,280     24,173   47.39
bachprel5d.dat        5  3  11010  691    691    61,566   299,961    4.68     315,377     30,899   48.84
bachprel5d.dat        5  3  11100  691    691    32,010   270,405    4.51     320,370     21,589   46.4
bachfugue5d.dat       5  3  11100  736    729    39,852   305,208    5.22     355,413     22,452   51.68
bachfugue5d.dat       5  3  10110  736    736    100,021  370,501    5.66     302,339     29,349   49.06
bachfugue5d.dat       5  3  11010  736    736    121,801  392,281    5.91     281,838     32,705   47.66
rach4d.dat            4  3  1110   1,725  1,722  135,816  1,617,597  29.46    2,031,369   87,941   590.25
rach4d.dat            4  3  1011   1,725  1,724  253,542  1,738,768  30.63    2,116,659   84,719   547.71
rach4d.dat            4  3  1101   1,725  1,724  341,460  1,826,686  30.97    1,851,950   101,779  530.41
cantfindthis4d.dat    4  4  1111   17     17     97       233        0        104         14       0
beethoven4d.dat       4  4  1111   24     24     244      520        0.01     41          3        0
cantfindthis2-4d.dat  4  4  1111   26     26     289      614        0.01     101         15       0
barber4d.dat          4  4  1111   74     74     2,187    4,888      0.07     568         70       0.06
bachprel5d.dat        5  4  11110  691    691    70,098   308,493    5.78     305,066     30,162   50.1
bachfugue5d.dat       5  4  11110  736    736    127,338  397,818    7.09     270,800     31,346   48.7
rach4d.dat            4  4  1111   1,725  1,724  392,188  1,877,414  38.57    1,785,796   99,412   536.24
mozart5d.dat          5  5  11111  130    130    5,022    13,407     0.24     2,586       231      0.31
bachprel5d.dat        5  5  11111  691    691    90,387   328,782    7.03     282,606     31,912   49.7
bachfugue5d.dat       5  5  11111  736    736    173,889  444,369    8.55     191,536     27,367   39.58

Notes:

1. The column headed K gives the dimensionality of the dataset stored in the file given in the first column.
2. The column headed k gives the dimensionality of the orthogonal projection of the dataset stored in the file that is analysed.
3. Each entry in the column headed SD is a bit vector indicating which particular projection is being analysed. The jth dimension is included if and only if the jth digit from the left in the bit vector is a 1. For example, the bit vector 11010 indicates a three-dimensional orthogonal projection of a 5-dimensional dataset in which the first, second and fourth dimensions are included in the projection.
4. The column headed N gives the size of the original dataset stored in the file.
5. The column headed n gives the size of the orthogonal projection analysed.
6. The column headed |S(D)| gives the number of ordered pairs in the set S(D) computed by SIA for that dataset.
7. The column headed |O_SIA| gives the total number of vectors printed out by SIA. This gives a measure of the size of the output.
8. The column headed t_SIA/s gives the running time of SIA in seconds.
9. The column headed |O_SIATEC| gives the total number of vectors printed out by SIATEC. This gives a measure of the size of the output.
10. The column headed |T'(D)| gives the number of TECs discovered by SIATEC.
11. The column headed t_SIATEC/s gives the running time of SIATEC in seconds.
12. The running time of SIATEC for the file WTCII21A.dat was not measured accurately. It took approximately 2 hours for the program to run on this file but the amount of CPU time used was not measured.

Table 3: Results of running SIA and SIATEC on 52 datasets.
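The SD bit vectors described in note 3 can be read as a simple coordinate filter. The sketch below shows one way such an orthogonal projection might be computed; the function name `project` and the sample datapoints are hypothetical and are not taken from the datasets in Table 3. Treating the projection as a set, so that coincident points merge, is an assumption, though it would be one plausible reason why n can be smaller than N.

```python
def project(dataset, sd):
    """Keep the j-th coordinate iff the j-th digit (from the left) of `sd` is '1'."""
    keep = [j for j, bit in enumerate(sd) if bit == "1"]
    # A set removes points that coincide after projection (assumed behaviour).
    return sorted({tuple(point[j] for j in keep) for point in dataset})


# Hypothetical 5-dimensional datapoints, invented for illustration only.
notes = [(0, 60, 35, 2, 1), (0, 64, 37, 2, 1), (2, 60, 35, 1, 2), (2, 60, 35, 2, 2)]

print(project(notes, "11010"))  # first, second and fourth dimensions
print(project(notes, "11000"))  # first and second dimensions only
```

In this invented example the second projection has three points rather than four, because two of the datapoints differ only in dimensions that are dropped.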


Name of file          K  SD     N     Description                                                          Source
simple dataset2d.dat  2  -      6     Dataset in Figure 16.                                                1
cantfindthis4d.dat    4  11110  17    Example in Figure 6.                                                 1
beethoven4d.dat       4  11110  24    Extract in Figure 1.                                                 1
cantfindthis2-4d.dat  4  11110  26    Example in Figure 11.                                                1
barber4d.dat          4  11110  74    Extract in Figure 3.                                                 1
101mac2d.dat          2  11000  101   First 101 datapoints from Wtcii01a.dat.                              2
invention1-2d.dat     2  11000  117   First 5 1/2 measures of J. S. Bach’s Two-Part Invention in C major.  1
mozart5d.dat          5  11111  130   Extract in Figure 5.                                                 1
202mac2d.dat          2  11000  202   First 202 datapoints from Wtcii01a.dat.                              2
404-2d.dat            2  11000  404   First 404 datapoints from Wtcii01a.dat.                              2
maple2d.dat           2  11000  575   Second section of Scott Joplin’s Maple Leaf Rag.                     1
WTCII19A.dat          2  11000  689   Prelude in A major from Book II of J. S. Bach’s WTK.                 2
bachprel5d.dat        5  11111  691   Prelude in C minor from Book II of J. S. Bach’s WTK.                 1
bachfugue5d.dat       5  11111  736   Fugue in C major from Book I of J. S. Bach’s WTK.                    1
Wtcii19b.dat          2  11000  758   Fugue in A major from Book II of J. S. Bach’s WTK.                   2
808mac2d.dat          2  11000  808   First 808 datapoints from Wtcii01a.dat.                              2
Wtcii20b.dat          2  11000  869   Fugue in A minor from Book II of J. S. Bach’s WTK.                   2
Wtcii24a.dat          2  11000  899   Prelude in B minor from Book II of J. S. Bach’s WTK.                 2
Wtcii21b.dat          2  11000  966   Fugue in B flat major from Book II of J. S. Bach’s WTK.              2
Wtcii24b.dat          2  11000  1041  Fugue in B minor from Book II of J. S. Bach’s WTK.                   2
WTCII23A.dat          2  11000  1097  Prelude in B major from Book II of J. S. Bach’s WTK.                 2
Wtcii12b.dat          2  11000  1121  Fugue in F minor from Book II of J. S. Bach’s WTK.                   2
WTCII22A.dat          2  11000  1184  Prelude in B flat minor from Book II of J. S. Bach’s WTK.            2
1212mac2d.dat         2  11000  1212  First 1212 datapoints from Wtcii01a.dat.                             2
Wtcii23b.dat          2  11000  1311  Fugue in B major from Book II of J. S. Bach’s WTK.                   2
WTCII12A.dat          2  11000  1473  Prelude in F minor from Book II of J. S. Bach’s WTK.                 2
WTCII20A.dat          2  11000  1542  Prelude in A minor from Book II of J. S. Bach’s WTK.                 2
Wtcii18b.dat          2  11000  1578  Fugue in G sharp minor from Book II of J. S. Bach’s WTK.             2
Wtcii01a.dat          2  11000  1658  Prelude in C major from Book II of J. S. Bach’s WTK.                 2
rach4d.dat            4  11110  1725  S. Rachmaninoff’s Prelude in C sharp minor, Op.3, No.2.              1
Wtcii22b.dat          2  11000  1754  Fugue in B flat minor from Book II of J. S. Bach’s WTK.              2
Wtcii01b.dat          2  11000  1934  Fugue in C major from Book II of J. S. Bach’s WTK.                   2
WTCII18A.dat          2  11000  2616  Prelude in G sharp minor from Book II of J. S. Bach’s WTK.           2
WTCII21A.dat          2  11000  3456  Prelude in B flat major from Book II of J. S. Bach’s WTK.            2

Notes:

1. The column headed K gives the dimensionality of the dataset.
2. The column headed SD indicates what the dimensions represent in the dataset:
   (a) If the first digit is a one, then the onset time of each note is encoded.
   (b) If the second digit is a one, then the chromatic pitch of each note is encoded.
   (c) If the third digit is a one, then the morphetic pitch of each note is encoded.
   (d) If the fourth digit is a one, then the duration of each note is encoded.
   (e) If the fifth digit is a one, then the voice to which each note belongs is encoded.
3. The column headed N gives the number of datapoints in the dataset.
4. The abbreviation ‘WTK’ in the column headed ‘Description’ stands for ‘Das Wohltemperirte Klavier’.
5. A ‘1’ in the column headed ‘Source’ indicates that the file was encoded by Dave Meredith.
6. A ‘2’ in the column headed ‘Source’ indicates that the file was derived automatically from a MIDI file encoded by Yo Tomita.

Table 4: The dataset files used in the experiments.


Figure 33: Graphs of running time against input size for SIA.


Figure 34: Graphs of running time against input size for SIATEC.

k(D) is the dimensionality of D; n(D) is the size of D; and S is the complete set of datasets used in the experiments, as defined above. That is, $c_{SIA}$ is the mean value of $t_{SIA}/(kn^2 \log_2 n)$ for all the datasets in S. The points in the graphs in Figure 33 give the observed running times of our SIA implementation for the datasets in S. The results support our claim that the worst-case running time of SIA is $O(kn^2 \log_2 n)$. This particular implementation takes less than 2 minutes to process a 2-dimensional dataset containing nearly 3500 datapoints (i.e., a piece of music containing 3500 notes).

Figure 34 gives a graphical representation of the data in the columns headed n and $t_{SIATEC}$/s in Table 3. In Figure 34, the vertical axis of each graph represents running time t and the horizontal axis represents dataset size n. The smooth curve in each graph represents the function $t = c_{SIATEC}\,kn^3$, where k is again the dimensionality of the input dataset and $c_{SIATEC}$ is a constant computed using the equation

$$c_{SIATEC} = \frac{1}{|S'|} \sum_{D \in S'} \frac{t_{SIATEC}(D)}{k(D)\,(n(D))^3}$$

where: $t_{SIATEC}(D)$ is the running time of our implementation of SIATEC for the dataset D; k(D) is the dimensionality of D; n(D) is the size of D; and $S'$ is the set that contains all the datasets in S apart from the one stored in the file WTCII21A.dat. (We were not able to obtain an accurate measure of the running time of SIATEC for this file.) In other words, $c_{SIATEC}$ is the mean value of $t_{SIATEC}/(kn^3)$ for all the datasets in $S'$. The points in the graphs in Figure 34 give the observed running times of our SIATEC implementation for the datasets in $S'$. The results support our claim that the worst-case running time of SIATEC is $O(kn^3)$. This particular implementation takes about 13 minutes to process a 2-dimensional dataset containing 2000 datapoints (i.e., a piece of music containing 2000 notes).
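The fitting constants defined above are plain means of a normalised running time. The following sketch shows the calculation for $c_{SIA}$ on a handful of rows copied from Table 3. Because it uses only four of the 52 rows, the value it produces is not the constant used to draw the curves in Figure 33; the function name `mean_constant` is hypothetical.

```python
from math import log2

# (k, n, t_SIA in seconds) for four rows copied from Table 3; a small subset only.
rows = [(2, 101, 0.07), (2, 404, 1.12), (2, 1658, 21.5), (5, 736, 8.55)]


def mean_constant(rows, cost):
    """Mean of t / cost(k, n) over the given rows."""
    return sum(t / cost(k, n) for k, n, t in rows) / len(rows)


# c_SIA for this subset, using the model t = c * k * n^2 * log2(n).
c_sia = mean_constant(rows, lambda k, n: k * n**2 * log2(n))

# Running time the fitted model would predict for a 2-dimensional dataset of 3456 points.
predicted = c_sia * 2 * 3456**2 * log2(3456)
print(c_sia, predicted)
```

The same helper could be reused for $c_{SIATEC}$ by passing `lambda k, n: k * n**3` together with the SIATEC timings, leaving out the row for which no accurate time was measured.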

10 Isolating perceptually significant repetitions

The power set of a dataset D contains $2^{|D|}$ distinct patterns. On the other hand, the number of MTPs generated by SIA for a dataset D is less than $|D|^2/2$. Therefore, for all but the very smallest datasets, the set of non-empty MTPs computed by SIA contains only a tiny proportion of all the patterns in the dataset. Moreover, our experiments suggest that many of the patterns involved in perceptually significant repetitions in music are MTPs and are therefore discovered by SIA and SIATEC. For example, as we have already pointed out (see page 24 above) each of the patterns A, B, C and D in Figure 17 corresponds to an MTP in the two-dimensional dataset representation shown in Figure 18. If the extract in Figure 1 is represented as a multidimensional dataset, then the pattern labelled A in this figure corresponds to an MTP. If the extract in Figure 3 is represented as a two-dimensional dataset in which the dimensions represent chromatic pitch and onset time, then the pattern A1 in this figure is an MTP. If the examples in Figures 6 and 11 are represented in the same way, then the patterns labelled A in these examples are both MTPs. Figure 35 shows the first five bars of Bach’s two-part Invention in C major (BWV 772). The set of patterns indicated by boxes in the figure corresponds to one of the TECs discovered by SIATEC for this dataset. These examples demonstrate that it is possible to use SIA and SIATEC to discover perceptually significant musical repetitions that are difficult to find using string-based algorithms. However, as shown in Table 3, SIATEC typically discovers tens of thousands of TECs even in relatively short pieces such as those in Bach’s Das Wohltemperirte Klavier and usually only a very small proportion of these TECs would be considered perceptually significant or analytically interesting by a musician, listener or music analyst.
We are therefore currently experimenting with systems that evaluate the output generated by SIATEC and isolate those TECs that correspond to particular classes of perceptually significant repetition. However, it seems that there are various ways in which a repeated musical structure might be “perceptually significant” or “analytically interesting”. In other words, it seems that there are various interesting types of structural feature that a repeated musical pattern might be able to tell us something about. For example, the TEC shown in Figure 35 obviously corresponds to the statements of the opening subject and therefore tells us something interesting about the thematic structure of the music. On the other hand, the largest MTP in the Prelude in C major from Book II of Bach’s Das Wohltemperirte Klavier tells us that there are 115 occasions in the piece when a note is followed precisely one bar later by a note that is a major second lower. At first, this might not seem to be significant. However, it actually reflects the analytically interesting fact that there are many occasions in the piece when a bar-long fragment is immediately repeated one tone lower—a baroque contrapuntal technique known as sequence.
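The size comparison made at the start of this section can be checked against two rows of Table 3. The worked figures below assume that the column |S(D)| counts the MTPs found by SIA, which is one reading of note 6 of that table.

```latex
% Smallest dataset in Table 3 (simple dataset2d.dat, |D| = 6)
\[
2^{|D|} = 2^{6} = 64, \qquad \frac{|D|^{2}}{2} = \frac{36}{2} = 18, \qquad |S(D)| = 9 < 18 .
\]
% Largest dataset in Table 3 (WTCII21A.dat, |D| = 3456)
\[
2^{|D|} = 2^{3456} \approx 10^{1040}, \qquad \frac{|D|^{2}}{2} = 5{,}971{,}968, \qquad |S(D)| = 168{,}787 < 5{,}971{,}968 .
\]
```

Even for the six-point dataset the MTPs are a small subset of the 64 possible patterns, and for the largest dataset the gap spans more than a thousand orders of magnitude.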

Figure 35: Bars 1–5 of Bach’s two-part Invention in C major (BWV 772) showing one of the TECs discovered by SIATEC.


This suggests that there is probably no single set of rules or heuristics that is capable of isolating all and only those TECs computed by SIATEC that correspond to perceptually significant musical repetitions. We therefore need to develop various algorithms, each one designed to isolate repetitions of a particular type. We are currently in the early stages of developing algorithms for isolating TECs that contain “theme-like” or “motif-like” patterns, since it seems likely that these are the patterns that one would wish to include in the index of a music database. Most of the perceptually significant patterns shown in the musical examples in this paper are theme-like or motif-like (see Figures 1, 3, 5, 6, 11, 12, 17, 20 and 35). Each of these patterns can be unambiguously specified by drawing a horizontal rectangle around it in the score. In other words, each of these theme-like or motif-like patterns contains all the notes that occur within the smallest horizontal rectangle that contains the pattern, whereas a patently non-theme-like pattern like the ones in Figure 2 cannot be specified in this way.

In geometry, the smallest hypercuboid with edges parallel to the co-ordinate axes that contains a set of points is called the bounding box of that set of points (Cormen et al., 1990, p. 889). In two dimensions, the bounding box of a set of points S is simply the smallest rectangle with edges parallel to the x-axis and y-axis that contains S (see Figure 36).

Figure 36: Example of a bounding box in 2 dimensions.

We say that a pattern P in a dataset is bounding-box compact (BBC) if and only if it contains all the datapoints in the dataset that occur within the bounding box of P. We have designed and implemented an algorithm that, when given a dataset D and the set $T'(D)$ as input, outputs the set $\{E \mid E \in T'(D) \wedge (\exists P \mid P \in E \wedge P \text{ is BBC})\}$. That is, it searches the set of MTP TECs generated by SIATEC and selects all and only those TECs that contain bounding-box compact patterns. The set of patterns shown in Figure 35 corresponds to a TEC that contains a bounding-box compact MTP in this dataset. Preliminary observations certainly suggest that compactness is one important feature that is common to most theme-like patterns. However, there are certainly theme-like patterns that are not bounding-box compact and there are also non-theme-like patterns that are bounding-box compact. The MTP labelled A in Figure 18 is an example of a theme-like pattern that is not bounding-box compact and the pattern labelled Z in Figure 21 is an example of a non-theme-like pattern that is bounding-box compact. Although pattern A in Figure 18 is not bounding-box compact, it does contain all the points in the dataset that occur within the smallest convex polygon that can be drawn around it. In geometry, the convex hull of a set Q of points in a two-dimensional Euclidean space is the smallest convex


polygon P for which each point in Q is either on the boundary of P or in its interior (Cormen et al., 1990, p. 898). More generally, the convex hull of a subset A of a real vector space is the intersection of all convex sets containing A (Borowski and Borwein, 1989, p. 123, s.v. convex hull). Figure 37 shows the convex hull of a set of points in two dimensions.

Figure 37: Example of a convex hull in 2 dimensions.

We say that a pattern P in a dataset is convex hull compact if and only if it contains all and only those datapoints in the dataset that occur within the convex hull of P. Cases like pattern A in Figure 18 suggest that it might be interesting to develop an algorithm that searches the MTP TECs generated by SIATEC and selects all and only those TECs that contain convex-hull compact patterns. Alternatively one could isolate those TECs that contain temporally compact patterns (see page 7 above). Instead of simply eliminating those TECs in the output of SIATEC that do not satisfy certain conditions, one could, instead, use a set of heuristics to calculate a value Q for each TEC that is designed to be an indication of how theme-like the patterns in the TEC are. One could then sort the TECs into decreasing order of Q and output this sorted list. Q might depend upon factors such as frequency of occurrence, pattern size, overlap between pattern occurrences, compactness and so on (cf. Cambouropoulos’s (1998, p. 89) ‘Selection Function’). We have designed and implemented algorithms along these lines and preliminary experiments have yielded promising results. For example, we designed an algorithm that computes a value of Q for each TEC based on measures of pattern size, frequency of occurrence, compactness and pattern overlap. When this algorithm was run on a two-dimensional dataset representing the passage in Figure 35, the TEC shown in this figure achieved the highest value of Q. Also, when the same algorithm was run on the dataset shown in Figure 18, the TEC containing pattern A was assigned one of the highest values of Q. We are only in the early stages of experimenting with heuristics for isolating particular classes of perceptually significant repetition.
However, our preliminary results suggest that, by using SIATEC in conjunction with algorithms embodying carefully chosen heuristics, it may be possible to isolate various desired classes of perceptually significant repetition automatically from the raw musical surface.
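The bounding-box compactness test defined earlier in this section lends itself to a direct check. The sketch below is one possible reading of that definition and of the TEC filter built on it, not the implementation referred to in the text. The function names, the representation of a TEC as a list of patterns, and the six-point dataset `D` are all assumptions made for illustration.

```python
def bounding_box(pattern):
    """Smallest axis-parallel box containing the pattern, as (lower, upper) corners."""
    dims = range(len(pattern[0]))
    lower = tuple(min(p[i] for p in pattern) for i in dims)
    upper = tuple(max(p[i] for p in pattern) for i in dims)
    return lower, upper


def is_bbc(pattern, dataset):
    """True iff the pattern contains every datapoint lying inside its bounding box."""
    lower, upper = bounding_box(pattern)
    inside = {d for d in dataset
              if all(lo <= x <= hi for lo, x, hi in zip(lower, d, upper))}
    return inside == set(pattern)


def tecs_with_bbc_pattern(tecs, dataset):
    """Keep the TECs that contain at least one bounding-box compact pattern."""
    return [tec for tec in tecs if any(is_bbc(p, dataset) for p in tec)]


# Assumed six-point dataset and two hand-written TECs, for illustration only.
D = [(1, 1), (1, 3), (2, 1), (2, 2), (2, 3), (3, 2)]
pair_tec = [[(1, 1), (2, 1)], [(1, 3), (2, 3)], [(2, 2), (3, 2)]]
triple_tec = [[(1, 1), (1, 3), (2, 2)], [(2, 1), (2, 3), (3, 2)]]

print(is_bbc([(1, 1), (2, 1)], D))          # True
print(is_bbc([(1, 1), (1, 3), (2, 2)], D))  # False
kept = tecs_with_bbc_pattern([pair_tec, triple_tec], D)
```

A convex-hull variant would replace `bounding_box` and the containment test with a point-in-convex-polygon check, and a scoring approach would return a numeric value per TEC instead of a boolean filter.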

11 Summary and possible directions for further work

11.1 Summary

1. There are good practical and scientific reasons for attempting to develop an algorithm that can discover the perceptually significant instances of exact and modified repetition in a musical passage.

2. Many music analysts and psychologists have stressed that the identification of perceptually significant instances of repetition in music is an essential step in the process by which an expert listener achieves a rich interpretation of a musical work.

3. Typically, the vast majority of exact repetitions that occur within a musical work are not perceptually significant.

4. The class of perceptually significant repetitions is a very diverse set because the patterns that are involved in such repetitions vary widely in their structural characteristics and because there are many different ways in which a pattern can be modified to give a second pattern that is perceived to be a version of it.

5. Most previous approaches to pattern discovery in music have been based on the assumption that the music to be analysed is represented using strings. It has further been assumed in these approaches that the strings used to represent the music are either event strings or interval strings. We have shown that there are certain classes of perceptually significant repetition that occur routinely in music that cannot be discovered using these string-based approaches. It is also true that if one wishes to discover a broad range of types of repetition using a string-based approach then one typically has to run a variety of algorithms on many different representations of the music.

6. We propose a geometric approach to repetition discovery in which the music to be analysed is represented as a multidimensional dataset (a set of points in a multidimensional Euclidean space). Certain classes of perceptually significant musical repetition that cannot be discovered using string-processing algorithms can easily be discovered using appropriately designed algorithms that process multidimensional datasets.

7. By adopting this new geometric approach, we dispense with the need for multiple representations because the same repetition discovery algorithms can be run on various different orthogonal projections of a single rich multidimensional representation of a musical passage. This new approach allows complex polyphonic music to be analysed as simply and efficiently as monophonic music and allows for the discovery of repeated patterns in the timbre, dynamic and rhythmic structure of a passage as well as the pitch structure.

8. We introduce the concept of a maximal translatable pattern (MTP). Given a multidimensional dataset D, a set of points (that is, a pattern) P is the MTP for a vector v if and only if it is the largest pattern in D that can be translated by v to give another pattern in D.

9. We also introduce the concept of a translational equivalence class (TEC). Two patterns P1 and P2 are translationally equivalent if and only if P2 can be obtained by translating P1. Given a multidimensional dataset D and a pattern P, the TEC of P in D contains all and only those patterns in D that are translationally equivalent to P.

10. We present two new algorithms: SIA and SIATEC. SIA takes a multidimensional dataset D as input and computes all the non-empty MTPs in D. SIATEC takes a dataset D as input and computes the set of TECs that contain non-empty MTPs in D. For a k-dimensional dataset of size n, the worst-case running time of SIA is $O(kn^2 \log_2 n)$ and the worst-case running time of SIATEC is $O(kn^3)$.

11. SIA and SIATEC have been implemented in C and compiled on a variety of platforms. The programs have been run on datasets ranging in dimensionality from 2 to 5 and in size from 6 to nearly 3500 datapoints. When run on a 500MHz Sparc machine, SIA takes less than 2 minutes to process a dataset containing 3500 two-dimensional datapoints and SIATEC takes around 13 minutes to process a dataset containing 2000 two-dimensional datapoints.

12. Typically, the number of MTPs computed by SIA is a very tiny fraction of all the patterns in a dataset. Moreover, many of the patterns involved in perceptually significant repetitions in music are MTPs that can be discovered using SIA and SIATEC. Nevertheless, SIATEC typically discovers tens of thousands of TECs even in relatively short pieces such as those in Bach’s Das Wohltemperirte Klavier and usually only a very small proportion of these TECs would be considered perceptually significant by a musician, listener or music analyst. It is therefore necessary to develop heuristics or algorithms for isolating those TECs that correspond to perceptually significant or analytically interesting repetitions.
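The definitions of MTP and TEC summarised in points 8 to 10 can be illustrated with a naive Python sketch. It is not the SIA or SIATEC implementation described in this paper: it groups difference vectors in a dictionary instead of sorting a vector table, it makes no attempt to match the stated worst-case running times, and the function names are hypothetical. The dataset `D` is an assumption, taken to be the six points implied by the translator example in Section 8.

```python
from collections import defaultdict


def sub(q, p):
    """Vector from point p to point q."""
    return tuple(b - a for a, b in zip(p, q))


def add(p, v):
    """Point p translated by vector v."""
    return tuple(a + b for a, b in zip(p, v))


def sia(dataset):
    """Map each positive difference vector v to its MTP (the points p with p + v in D)."""
    points = sorted(set(dataset))
    mtps = defaultdict(list)
    for i, p in enumerate(points):
        for q in points[i + 1:]:  # q follows p in sorted order
            mtps[sub(q, p)].append(p)
    return dict(mtps)


def tec(pattern, dataset):
    """All translates of `pattern` that lie entirely inside the dataset."""
    points = set(dataset)
    first = pattern[0]
    occurrences = set()
    for d in points:  # every translator must map `first` onto some datapoint d
        moved = frozenset(add(p, sub(d, first)) for p in pattern)
        if moved <= points:
            occurrences.add(moved)
    return frozenset(occurrences)


def siatec(dataset):
    """The set of distinct TECs of the MTPs found by `sia`."""
    return {tec(pattern, dataset) for pattern in sia(dataset).values()}


D = [(1, 1), (1, 3), (2, 1), (2, 2), (2, 3), (3, 2)]
mtps = sia(D)
tecs = siatec(D)
print(len(mtps), len(tecs))  # 9 4
```

For this assumed dataset the sketch finds 9 MTPs and 4 TECs, which agrees with the values of |S(D)| and |T'(D)| given in the first row of Table 3.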

11.2 Some possible directions for further work

There are many ways in which the ideas reported here could be developed further. For example, the SIA and SIATEC algorithms could be used as the basis for new computational models and software tools for repeated pattern discovery, database indexing and compression. These models and tools could be used on data representing images, audio recordings, video, 3D molecular models and, in general, anything that can appropriately be represented as a multidimensional dataset. We suggest below some specifically music-related research projects that could arise out of the work reported in this paper.

1. There does not yet exist a well developed formal language for describing, representing and modelling repetition in music. Such a language could be used to investigate the mathematical properties of various classes of repetition and to develop new algorithms for discovering these classes of repetition.

2. We have noted that the class of perceptually significant repetitions is a diverse set because the patterns involved in such repetitions vary widely in their structural features and because there are many ways in which a pattern might be modified to give a second pattern that is perceived to be a version of it. It is necessary to formally characterise the classes of pattern that can be involved in a perceptually significant repetition and also the classes of transformation that can be applied to each class of pattern to give perceptually significant repetitions.

3. SIA and SIATEC currently only discover exactly repeated patterns in multidimensional datasets. Versions of these algorithms need to be developed that can discover approximately repeated patterns such as those that would occur, for example, in a MIDI file representation of a performance of a musical work.

4. The concept of a “perceptually significant repetition” needs to be satisfactorily explicated. One way of doing this would be to carry out experiments to identify which repeated musical patterns various categories of listener (e.g., casual listeners who cannot play an instrument, performing musicians, musicologists and music analysts) consider to be significant.

5. SIA and SIATEC could be used as the basis of a system for indexing music databases (Lemström et al., 2001). However, to be effective, such an indexing system would have to discover those patterns that users of such databases typically give as queries. Experiments therefore need to be done to determine what classes of pattern users give as queries to music databases. Algorithms and heuristics (possibly based on SIA and SIATEC) can then be developed to extract these classes of pattern from music representations.

6. It would be interesting to see if algorithms and/or heuristics could be developed for isolating those repetitions that occur in music from non-Western cultures that are recognized as being significant by various classes of listener in those cultures. In non-Western musics, one might well find that the rhythmic or timbral features of a repeated pattern more strongly determine whether or not the pattern is perceptually significant than its pitch-structural features.

7. An in-depth investigation needs to be carried out to determine the classes of musical repetition that can and cannot be discovered by processing various string-based music representations using various different algorithms. Such an investigation would clearly reveal the classes of perceptually significant musical repetition that cannot be discovered using string-processing algorithms but that can be discovered using SIA-like algorithms that process multidimensional datasets.

8. Algorithms and heuristics (possibly based on SIA and SIATEC) need to be developed for discovering and isolating various classes of perceptually significant repetition (e.g., theme-like repeated patterns, inversions, retrograde repetitions, augmentations, diminutions etc.).

9. Finally, there may be ways of improving the running times of SIA and SIATEC. In particular, it might be possible to achieve much faster running times by developing parallel versions of the algorithms or by exploiting techniques such as word-level parallelism (see, for example, Rahman and Raman, 1998).

Acknowledgements

David Meredith is supported by grant GR/N08049/01 from the EPSRC. Kjell Lemström was partially supported by grant number 48313 from the Academy of Finland and by grant GR/R25316 from the EPSRC. The authors would like to thank Marcus Pearce for his helpful comments on an earlier draft of this paper.

References Alberto Apostolico and Andrzej Ehrenfeucht. Efficient detection of quasiperiodicities in strings. Theoretical Computer Science, 119(2):247–265, October 1993. Alberto Apostolico, M. Farach, and Costas S. Iliopoulos. Optimal superprimitivity testing for strings. Information Processing Letters, 39(1):17–20, 1991. Alberto Apostolico and Franco P. Preparata. Optimal off-line detection of repetitions in a string. Theoretical Computer Science, 22(3):297–315, February 1983. D. Bainbridge, C.G. Neville-Manning, I.H. Witten, L.A. Smith, and R.J. McNab. Towards a digital library of popular music. In Proceedings of the 4th ACM conference on digital libraries, pages 161–169, Berkeley, CA., 1999. A. Ben-Amram, Omer Berkman, Costas S. Iliopoulos, and Kunsoo Park. The subtree max gap problem with application to parallel string covering. In Proceedings of the 5th ACM-SIAM Annual Symposium on Discrete Algorithms, Arlington, VA., pages 501–510, 1994. Ian Bent and William Drabkin. Analysis. New Grove Handbooks in Music. Macmillan, 1987. Ephraim J. Borowski and Jonathan M. Borwein. Dictionary of Mathematics. Collins, 1989. D. Breslauer. An on-line string superprimitivity test. Information Processing Letters, 44(6):345–347, 1992.

Dany Breslauer. Testing string superprimitivity in parallel. Information Processing Letters, 49(5):235–241, 1994. Available online at http://citeseer.nj.nec.com/301674.html.

Alexander R. Brinkman. A binomial representation of pitch for computer processing of musical data. Music Theory Spectrum, 8:44–57, 1986.

Alexander R. Brinkman. PASCAL Programming for Music Research. The University of Chicago Press, Chicago and London, 1990. ISBN 0-226-07508-7.

Emilios Cambouropoulos. A general pitch interval representation: Theory and applications. Journal of New Music Research, 25:231–251, 1996.

Emilios Cambouropoulos. Towards a General Computational Theory of Musical Structure. PhD thesis, University of Edinburgh, February 1998.

Emilios Cambouropoulos, Maxime Crochemore, Costas S. Iliopoulos, Laurent Mouchard, and Yoan J. Pinzon. Algorithms for computing approximate repetitions in musical sequences. In Rajeev Raman and J. Simpson, editors, Proceedings of the 10th Australasian Workshop on Combinatorial Algorithms (AWOCA’99), pages 129–144, Perth, WA, Australia, 1999.

A. Cardon and Maxime Crochemore. Partitioning a graph in O(|A| log₂ |V|). Theoretical Computer Science, 19(1):85–98, 1982.

Darrell Conklin and Christina Anagnostopoulou. Representation and discovery of multiple viewpoint patterns. In Proceedings of the International Computer Music Conference, 2001, Havana, Cuba, 2001.

Darrell Conklin and Ian Witten. Multiple viewpoint systems for music prediction. Journal of New Music Research, 24(1):51–73, 1995.

Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. M.I.T. Press, Cambridge, Mass., 1990. ISBN 0-262-53091-0.

Tim Crawford, Costas S. Iliopoulos, and Rajeev Raman. String matching techniques for musical similarity and melodic recognition. Computing in Musicology, 11:73–100, 1998.

Maxime Crochemore. An optimal algorithm for computing the repetitions in a word. Information Processing Letters, 12(5):244–250, 1981.

Maxime Crochemore and Wojciech Rytter. Text Algorithms. Oxford University Press, Oxford, 1994.

Daniel Crow and Barbara Smith. DB Habits: Comparing minimal knowledge and knowledge-based approaches to pattern recognition in the domain of user-computer interactions. In Russell Beale and Janet Finlay, editors, Neural Networks and Pattern Recognition in Human-Computer Interaction, pages 39–63. Ellis Horwood, New York and London, 1992.

Matthew Dovey. An algorithm for locating polyphonic phrases within a polyphonic musical piece. In Proceedings of the AISB’99 Symposium on Musical Creativity, pages 48–53, Edinburgh, 1999.

Matthew Dovey and Tim Crawford. Heuristic models of relevance ranking in searching polyphonic music. In DIDEROT FORUM on Mathematics and Music: Computational and Mathematical Methods in Music, pages 111–123, 1999.

Matthew Dovey and Tim Crawford. Heuristic models of relevance ranking in searching polyphonic music. Presented at the London Strings Days 2000 workshop, April 2000.

Aristidis Floratos and Isidore Rigoutsos. Method and apparatus for pattern discovery in 1-dimensional event streams, August 2000. U.S. Patent no. 6,108,666.

Jia-Lien Hsu, Chih-Chin Liu, and Arbee L.P. Chen. Efficient repeating pattern finding in music databases. In Proceedings of the 1998 ACM 7th International Conference on Information and Knowledge Management, pages 281–288. Association of Computing Machinery, 1998.

Costas S. Iliopoulos, Dennis W.G. Moore, and Kunsoo Park. Covering a string. In Alberto Apostolico, Maxime Crochemore, Zvi Galil, and U. Manber, editors, Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching, Padova, Italy, volume 684 of Lecture Notes in Computer Science, pages 54–62. Springer, 1993.

Costas S. Iliopoulos and Laurent Mouchard. An O(n log n) algorithm for computing all maximal quasiperiodicities in strings. In Proceedings of CATS’99: ‘Computing: Australasian Theory Symposium’, volume 213 of Lecture Notes in Computer Science, pages 262–272, Auckland, New Zealand, 1999. Springer-Verlag.

Costas S. Iliopoulos and Kunsoo Park. An optimal O(log log n)-time algorithm for parallel superprimitivity testing. Journal of the Korean Information Science Society, 21(8):1400–1404, 1994.

Costas S. Iliopoulos and Kunsoo Park. A work-time optimal algorithm for computing all string covers. Theoretical Computer Science, 164(1–2):299–310, 1996.

Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language. Prentice-Hall International, London, 1988. ISBN 0-13-110362-8.

S. Kurtz. Reducing the space requirement of suffix trees. Software—Practice and Experience, 29(13):1149–1171, 1999.

K. Lemström, P. Laine, and S. Perttu. Using relative interval slope in music information retrieval. In Proceedings of the 1999 International Computer Music Conference, pages 317–320, Beijing, 1999.

Kjell Lemström. String Matching Techniques for Music Retrieval. PhD thesis, University of Helsinki, Faculty of Science, Department of Computer Science, 2000. Report A-2000-4.

Kjell Lemström and Jorma Tarhio. Searching monophonic patterns within polyphonic sources. In Content-Based Multimedia Information Access Conference Proceedings (RIAO’2000), April 12–14, 2000, volume 2, pages 1261–1279, Collège de France, Paris, April 2000.

Kjell Lemström and Esko Ukkonen. Including interval encoding into edit distance based music comparison and retrieval. In Proceedings of the AISB 2000 Symposium on Creative and Cultural Aspects and Applications of AI and Cognitive Science (April 17–18), pages 53–60, University of Birmingham, 2000.

Kjell Lemström, Geraint A. Wiggins, and David Meredith. A three-layer approach for music retrieval in large databases. In Second Annual International Symposium on Music Information Retrieval (ISMIR 2001), Indiana University, Bloomington, Indiana, 2001.

Fred Lerdahl and Ray Jackendoff. A Generative Theory of Tonal Music. M.I.T. Press, Cambridge, Mass., 1983.

David Lewin. Generalized musical intervals and transformations. Yale University Press, New Haven and London, 1987.

Yin Li and W.F. Smyth. Computing the cover array in linear time, 1999. To appear.

David Meredith. The computational representation of octave equivalence in the Western staff notation system. In Cambridge Music Processing Colloquium, September 1999. http://www-sigproc.eng.cam.ac.uk/music proc/submissions/meredith.pdf.

David Meredith. MIPS: A formal language for the mathematical investigation of pitch systems (version 2001-09-10), 2001. Available online at http://www.titanmusic.com/papers/public/mips20010910.pdf.

David Meredith, Kjell Lemström, and Geraint Wiggins. SIA and SIATEC: Efficient algorithms for translation-invariant pattern discovery in multi-dimensional datasets, 2001. Document submitted to UK Patent Office on May 23, 2001. Available online at http://www.titanmusic.com/papers/public/patent.pdf.

Marcel Mongeau and David Sankoff. Comparison of musical sequences. Computers and the Humanities, 24:161–175, 1990.

Dennis W.G. Moore and W.F. Smyth. Computing the covers of a string in linear time. In Proceedings of the 5th ACM-SIAM Symposium on Discrete Algorithms, pages 511–515, 1994.

Jean-Jacques Nattiez. Fondements d’une sémiologie de la musique. Union Générale d’Éditions, Paris, 1975.

Naila Rahman and Rajeev Raman. An experimental study of word-level parallelism in some sorting algorithms, 1998. Available online at http://citeseer.nj.nec.com/rahman98experimental.html.

Pierre-Yves Rolland. Discovering patterns in musical sequences. Journal of New Music Research, 28(4):334–350, 1999.

Gilbert Rouget. Un chromaticisme africain. l’Homme, I(3), 1961.

Nicolas Ruwet. Méthodes d’analyse en musicologie. Revue belge de musicologie, 20:65ff., 1966. Republished in (Ruwet, 1972, 100–134); English translation in Music Analysis, Vol. 5, 1986.

Nicolas Ruwet. Langage, Musique, Poésie. Éditions du seuil, 27, rue Jacob, Paris VI., 1972.

Yo Tomita. The Second Book of the ‘48’ (MIDI files of the works in Book II of J. S. Bach’s ‘Das Wohltemperirte Klavier’), 1996. Available online at http://www.music.qub.ac.uk/~tomita/midi.html.

Esko Ukkonen. Algorithms for approximate string matching. Information and Control, 64:100–118, 1985.
