Visualization in Comparative Music Research

Petri Toiviainen and Tuomas Eerola
Department of Music, University of Jyväskylä, Finland
{ptoiviai, ptee}@campus.jyu.fi

(Published in A. Rizzi & M. Vichi (Eds.), COMPSTAT 2006 - Proceedings in Computational Statistics. Heidelberg: Physica-Verlag, pp. 209-221)

Summary. Computational analysis of large musical corpora provides an approach that overcomes some of the limitations of manual analysis related to small sample sizes and subjectivity. The present paper aims to provide an overview of the computational approach to music research. It discusses the issues of music representation, musical feature extraction, digital music collections, and data mining techniques. Moreover, it provides examples of visualization of large musical collections.

Key words: music, computational musicology, musical data mining, visualization

1 Introduction

A great deal of research in musicology has concentrated on the analysis and comparison of different musical styles, genres, and traditions. This paradigm stems from the comparative and systematic musicology of the late 19th century. Typical research questions in this area of inquiry involve the evolution of a musical style, typical musical features in the works of a composer, or similarities and differences across music traditions from various geographical regions.

Research aimed at tackling these kinds of questions has traditionally been based on visual analysis of notated scores (when these have been available) or aural analysis of music recordings. While studies utilizing these methods have undoubtedly shed light on similarities and differences along both temporal and spatial dimensions, they have two potential limitations. First, visual or aural analysis of music is time-consuming; consequently, studies utilizing these methods are necessarily based on relatively small sets of musical material, which may not be representative of the musical styles or traditions in question. Second, these kinds of analysis methods may be subjective or prone to errors, both of which can hinder the replicability of a study.

A possible way of overcoming these limitations is to adopt a computational approach. This includes the use of large digital collections of appropriate musical material, computational extraction of relevant musical features from this material, and the subsequent application of, for instance, statistical methods to the extracted features. Such computational approaches to the analysis of large collections of music have been utilized since the 1980s [Mar83, VT89]. In addition to testing specific hypotheses concerning music, large musical collections can be used as material for exploratory research, the aim of which is to find interesting structures within musical collections, or similarities and differences between them. To this end, methods of data mining can be applied.

The present paper aims to provide an overview of the computational approach to comparative music research. First, issues related to forms of music representation, musical feature extraction, digital music collections, and data mining techniques are discussed. Second, examples of visualization of large musical collections are presented.

2 Music representations

There are several alternatives for the digital representation of music. On a general level, music representations can be divided into three categories based on their degree of structuredness: (1) notation-based, (2) event-based, and (3) signal representations. Notation-based representations (e.g., **kern, SCORE, GUIDO, NIFF, DARMS, Common Music Notation) consist of discrete musical events such as notes, chords, and time values, and describe these events in relation to formalized concepts of music theory. Event-based representations (e.g., MIDI, MIDI File) are somewhat less structured than notation-based ones, containing information about pitch, onset and offset times, dynamics (velocity), and timbre (channel).
Signal representations (e.g., AIFF, WAV, MP3, AAC) result from audio recordings and contain no structured information about the music. From the viewpoint of computational music analysis, each of the three representation categories has its advantages and shortcomings. Notation-based and event-based representations are especially suitable for the investigation of high-level musical phenomena such as melodic, harmonic, and tonal structure. Signal representations are best suited for the analysis of, for instance, timbre, rhythmic structure, and, to some degree, harmony and tonality. Although limited success has been achieved in extracting instrument parts and melodic lines from music recordings [Kla05], this problem remains largely unsolved.

For each of these three main representation types, there are tools available for computational analysis of music. For notation-based representations, perhaps the best known is Humdrum [Hur95], a versatile collection of UNIX-based tools for musicological analysis. For event-based representations, the MIDI Toolbox [ET04a], containing about 100 functions for cognitively oriented analysis of MIDI files, is available on the Internet¹. With the IPEM Toolbox [LLT00], signal representations of music can be analyzed in terms of, for instance, their spectral structure, roughness, tone onset structure, and tonal centres.

3 Musical databases

There is a relatively long tradition of organizing musical material into various kinds of collections. For instance, A Dictionary of Musical Themes [BM48] contains the opening phrases of ca. 10,000 compositions, organized in a manner that allows searches based on musical content. The largest digital database of music is the RISM incipits database [RIS97], which was initiated in the 1940s and currently contains ca. 450,000 works by ca. 20,000 composers. The compositions are encoded in the database using a simple notation-based representation that includes pitch, time value, location of bar lines, and key and meter signatures. Musical databases that are freely available on the Internet, such as Melodyhound² and Themefinder³, are not quite as extensive, containing a few thousand items from the classical repertoire. On the web pages of Ohio State University one can find a few thousand classical works⁴. In the field of folk music, the most extensive collection is the Digital Archive of Finnish Folk Tunes⁵ [ET04b], containing ca. 9000 folk melodies and related metadata. Another extensive digital collection of folk music is the Essen Folk Song Collection [Sch95], which consists of ca.
6000 folk melodies of mainly European origin; this collection also contains extensive metadata concerning each melody. The MELDEX collection contains ca. 2000 folk melodies from the Digital Tradition collection, comprising traditional music mainly from the British Isles.

¹ http://www.jyu.fi/musica/miditoolbox/
² http://www.musipedia.org/
³ http://www.themefinder.org
⁴ http://kern.humdrum.net/
⁵ http://www.jyu.fi/musica/sks/index_en.html
Although the number of music recordings greatly exceeds the number of notation-based or event-based representations of music, organized music databases in signal representations are as yet less common than databases in other representations. This is mainly due to the large storage requirements of audio. However, the Variations2 project⁶ at Indiana University aims at creating a digital music library that will contain the entire catalogue of Classical, Jazz, and Asian digital recordings of the record company Naxos, amounting to about three terabytes of digital music information. In addition, the Real World Computing (RWC) Music Database [GHN02] contains works in pop, rock, jazz, and classical styles in both acoustic and event-based forms.

4 Musical feature extraction

In comparative research based on musical databases, the first step in the investigation is to extract relevant features from the musical material. The choice of features is dictated mainly by the type of representation of the musical material at hand and by the research questions one aims to study. As indicated above, the set of musical features that can be reliably extracted with computational algorithms depends on the type of music representation. On a general level, features can be divided into low-level features related to, for instance, spectrum, roughness, and pitch, and high-level features such as texture and rhythmic, melodic, and tonal structure. Another distinction can be made between temporal and static features. Temporal features represent aspects of the sequential evolution of the music; examples include the melodic contour vector [Juh00] and the self-similarity matrix [CF02]. Static features are overall descriptors of a musical piece collapsed over time, such as spectrum histograms [PDW04], statistical distributions of pitch classes, intervals, and time values [PPI04, ET01], and periodicity histograms [DPW03, TE06].
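The self-similarity matrix mentioned above can be illustrated with a short sketch. It rests on assumptions that are not from the source: each frame is a per-bar pitch-class vector, and similarity is measured with the cosine measure, which is one common choice; the function name `self_similarity` is hypothetical. Entry (s, t) of the result is high when bars s and t are similar, so repeated sections appear as off-diagonal blocks.

```python
import numpy as np

def self_similarity(frames: np.ndarray) -> np.ndarray:
    """Cosine self-similarity matrix of a (T x D) sequence of feature frames.

    Entry (s, t) is the cosine similarity between frame s and frame t.
    """
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    # All-zero frames (e.g., silent bars) are left as zero vectors instead of dividing by zero.
    unit = frames / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

# Hypothetical per-bar pitch-class vectors (C ... B) for a four-bar melody in A A B A form.
a = np.array([1.0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0])   # C, E, G
b = np.array([0.0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0])   # D, F, A
ssm = self_similarity(np.vstack([a, a, b, a]))
print(np.round(ssm, 2))
```

Bars 1, 2, and 4 are identical and share no pitch class with bar 3, so the matrix contains only ones and zeros in this toy case.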
An overview of the state of the art in computational feature extraction of music can be obtained at the website of ISMIR (International Conference on Music Information Retrieval)⁷. The musical feature extraction process results in a musical feature matrix $\mathbf{M} = (m_{ij})$, $i = 1, \dots, N$, $j = 1, \dots, M$. This is an $N \times M$ matrix in which each of the $N$ musical items is represented by an $M$-component feature vector, so that $m_{ij}$ is the value of feature $j$ for item $i$. This matrix is the starting point of subsequent analyses.
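As an illustration of how a static feature yields one row of the feature matrix, the sketch below computes a pitch-class distribution for each melody and stacks the results into an N × 12 matrix. It assumes melodies are given as MIDI note numbers with durations in beats; weighting each note by its duration, normalizing each row to sum to 1, and the function names are illustrative choices, not taken from the tools cited above.

```python
import numpy as np

def pitch_class_distribution(pitches, durations) -> np.ndarray:
    """Duration-weighted pitch-class distribution of one melody.

    pitches: MIDI note numbers (60 = middle C); durations: note lengths in beats.
    Returns a 12-component vector (C, C#, ..., B) that sums to 1.
    """
    pcd = np.zeros(12)
    for p, d in zip(pitches, durations):
        pcd[int(p) % 12] += d          # fold all octaves onto one pitch class
    return pcd / pcd.sum()

def feature_matrix(melodies) -> np.ndarray:
    """Stack one feature vector per melody into an N x M matrix (here M = 12)."""
    return np.vstack([pitch_class_distribution(p, d) for p, d in melodies])

melodies = [
    ([60, 62, 64, 65, 67], [1, 1, 1, 1, 4]),   # C D E F G, long final G
    ([69, 71, 72, 76, 69], [1, 1, 2, 2, 2]),   # A B C E A
]
features = feature_matrix(melodies)
print(features.shape)   # (2, 12)
```

Counting notes without duration weighting is an equally plausible variant; either way, each row is one $M$-component feature vector.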
⁶ http://dml.indiana.edu
⁷ http://www.ismir.net/all-papers.html
5 Data mining

Depending on the research approach, the obtained musical feature matrix can be subjected to either confirmatory or exploratory data analysis. If one has specific hypotheses concerning, for instance, aspects in which two musical collections differ, these can be tested using a deductive approach, that is, using inferential statistics. If, however, there are no clear hypotheses concerning the data, an inductive, exploratory approach can be adopted. The aim of this latter approach is to find interesting structures in the data set, such as clusters, trends, correlations, and associations, and to raise questions (rather than answers) and hypotheses for further study. To this end, methods of data mining can be useful. Data mining can be described as a collection of methods for the exploratory analysis of large data sets. Central methods utilized in data mining include projection, clustering, estimation, and visualization. The first three are summarized below; visualization is illustrated with examples in Section 6.

5.1 Projection
In many cases, the musical feature matrix has a large number of feature dimensions. To reduce the number of feature dimensions, various methods of projection can be applied. The projection methods differ in their criteria for choosing the projection directions in the high-dimensional space. Typical methods used for dimensionality reduction include the following:

• Principal Components Analysis (PCA). PCA is a standard projection method that uses maximal variance as the projection criterion and produces orthogonal projection directions.
• Independent Component Analysis (ICA; [HKO01]). ICA utilizes a latent variable model to project the data onto statistically independent dimensions.
• Fisher Discriminant Function (FDF). If the data consist of items belonging to different classes, and the class labels are available, the FDF can be used to project the data onto dimensions that maximize the ratio of between-class variance to within-class variance, thus producing maximal separation between the classes.
• Projection Pursuit (PP; [Fri87]). PP attempts to find projection directions according to a criterion of "interestingness". A typical such criterion is that the distribution of the projected data be maximally non-Gaussian.
• Self-Organizing Map (SOM; [Koh95]). The SOM utilizes an unsupervised learning algorithm to produce a non-linear projection of the data set that maximizes the local variance.

The projections obtained by each of the aforementioned methods can be visualized to allow exploratory study of the data. Moreover, the projection directions themselves contain information about the musical features that are significant for the projection criterion of the particular projection method.

5.2 Clustering
If the musical collection under investigation is large, it is often useful to reduce the amount of information by representing the items with a smaller number of representative exemplars. To this end, various clustering methods are available.

• Hierarchical clustering methods proceed successively by merging small clusters into larger ones. This results in a tree of clusters referred to as a dendrogram, which shows how the clusters are related.
• Partitional clustering methods attempt to decompose the data set into a predefined number of clusters. This is usually carried out by minimizing some measure of dissimilarity between the items within each cluster, or by maximizing the dissimilarity between clusters. A well-known example of partitional clustering is k-means clustering.
• The Self-Organizing Map (SOM), in addition to performing a non-linear projection of the data set, carries out clustering by representing the data set with a reduced set of prototype vectors. The combination of projection and clustering makes the SOM particularly suitable for data visualization.

5.3 Estimation
Musical feature matrices with a high feature dimension (M) can be visualized as, for instance, scatter plots on two (or three) projection directions. If the number of items (N) is large, it may, however, be difficult to observe the structure of the data set due to extensive overlapping of markers. In other words, one may observe mainly the outliers rather than the bulk of the data. This problem may be overcome by estimating the probability density of the projected data set with a nonparametric method such as kernel density estimation [Sil86]. Kernel density estimation is carried out by summing kernel functions located at each data point, which in the present case are the projections of the musical feature vectors: $\hat{f}(\mathbf{y}) = \frac{1}{N h^{d}} \sum_{i=1}^{N} K\big((\mathbf{y} - \mathbf{y}_i)/h\big)$, where $\mathbf{y}_i$ is the projection of the $i$th feature vector, $d$ is the dimensionality of the projection, and $h$ is the kernel width (bandwidth). The kernel function $K$ is often a (one- or two-dimensional) Gaussian. The result of the estimation is a smooth curve or surface, depending on the dimensionality of the projection, the visualization of which may facilitate the observation of interesting structures in the data set.

6 Examples of visualization of musical collections

This section presents examples in which methods of musical feature extraction, projection, clustering, and estimation have been applied to musical collections.

6.1 Pitch-class distributions and SOM
Pitch-class distributions enable a detailed analysis of the importance of different tones in a musical corpus. Fig. 1 displays the component planes of a SOM with 12 × 18 cells that was trained with the pitch-class distributions of 2240 Chinese, 2323 Hungarian, 6236 German, and 8613 Finnish melodies. The musical feature matrix used to train the SOM was thus a 19412 × 12 matrix. Each of the 12 subplots of the figure corresponds to one pitch class, from C to B, and the colour displays the value of the respective component in the cells' prototype vectors, red denoting a high value and blue a low value. For instance, the lower left region of the SOM contains cells whose prototype vectors have high values for the pitch classes G and A. Consequently, melodies in which these pitch classes are frequently used are mapped to this region.
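A minimal sketch of SOM training, using the classic online Kohonen update with a Gaussian neighbourhood, is given below. The learning-rate and neighbourhood schedules, the random initialization, the function name `train_som`, and the synthetic two-cluster data are assumptions for illustration and do not reproduce the settings used for Fig. 1. A component plane for one pitch class is simply the corresponding slice of the prototype array.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=3000, lr0=0.5, seed=0):
    """Train a rectangular Self-Organizing Map with the online Kohonen rule.

    data: (N, M) feature matrix. Returns prototype vectors of shape (rows, cols, M).
    """
    rng = np.random.default_rng(seed)
    n, m = data.shape
    sigma0 = max(rows, cols) / 2.0
    # Initialize the prototypes with randomly chosen data items.
    protos = data[rng.integers(0, n, rows * cols)].astype(float)
    # (row, col) grid position of each cell, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
        x = data[rng.integers(0, n)]
        bmu = np.argmin(((protos - x) ** 2).sum(axis=1))   # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)          # squared grid distance to BMU
        h = np.exp(-d2 / (2.0 * sigma ** 2))                # Gaussian neighbourhood
        protos += lr * h[:, None] * (x - protos)            # pull cells towards x
    return protos.reshape(rows, cols, m)

# Synthetic pitch-class data: one group stresses G and A, the other C and E.
rng = np.random.default_rng(1)
centre_a = np.zeros(12); centre_a[[7, 9]] = 0.5
centre_b = np.zeros(12); centre_b[[0, 4]] = 0.5
data = np.vstack([centre_a + 0.01 * rng.standard_normal((100, 12)),
                  centre_b + 0.01 * rng.standard_normal((100, 12))])
protos = train_som(data, rows=4, cols=4)
g_plane = protos[:, :, 7]   # component plane for pitch class G, cf. Fig. 1
```

Batch training and PCA-based initialization are common alternatives to the online rule used here.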
Fig. 1. The component planes of a SOM trained with pitch-class distributions of 19412 folk melodies.
Differences in the pitch-class distributions between the collections can be investigated by visualizing the number of melodies that are mapped to each cell. This is shown in Fig. 2. As can be seen, the melodies of each collection occupy largely different regions of the map, suggesting substantial differences in pitch-class usage between these collections.
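Counting, for each collection separately, how many items have each cell as their best-matching unit gives maps of the kind shown in Fig. 2. The sketch below assumes the prototypes are stored as a (rows, cols, M) array; `hit_histogram` is a hypothetical name. Because the four collections differ in size (2240 to 8613 melodies), dividing each map by its collection size may ease comparison; that normalization is a suggestion, not something stated for Fig. 2.

```python
import numpy as np

def hit_histogram(prototypes, data):
    """Number of items in `data` whose best-matching unit is each SOM cell.

    prototypes: (rows, cols, M) array; data: (N, M) feature matrix of one collection.
    Returns a (rows, cols) integer array of counts.
    """
    rows, cols, m = prototypes.shape
    flat = prototypes.reshape(-1, m)
    # Squared Euclidean distances via |x|^2 - 2 x.p + |p|^2, an (N, rows*cols) array
    # computed without an (N, rows*cols, M) intermediate.
    d2 = (data ** 2).sum(axis=1)[:, None] - 2.0 * data @ flat.T + (flat ** 2).sum(axis=1)[None, :]
    bmu = d2.argmin(axis=1)
    return np.bincount(bmu, minlength=rows * cols).reshape(rows, cols)

# Toy map with two cells (1 x 2) in a 2-D feature space.
protos = np.array([[[0.0, 0.0], [1.0, 1.0]]])
collection_a = np.array([[0.1, 0.0], [0.0, 0.2], [0.9, 1.0]])
collection_b = np.array([[1.0, 1.0], [0.8, 0.9]])
hits_a = hit_histogram(protos, collection_a)   # counts 2 and 1
hits_b = hit_histogram(protos, collection_b)   # counts 0 and 2
share_a = hits_a / len(collection_a)           # proportions, comparable across collections
```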
Fig. 2. Number of melodies mapped to each cell of the SOM of Fig. 1 for each of the four collections.
6.2 Metrical structure and PP
Most music exhibits a hierarchical periodic grouping structure, commonly referred to as meter. The metrical structure of a piece of music can be represented by, for instance, an autocorrelation-based function [Bro93, TE06]. Fig. 3a displays a visualization of metrical structures in a collection of Finnish folk melodies. To obtain the visualization, 8613 melodies from the Digital Archive of Finnish Folk Tunes [ET04b] were subjected to autocorrelation analysis using the method of [TE06]. This resulted in 32-component autocorrelation vectors representing the metrical structure of each melody in the collection. Subsequently, the autocorrelation vectors were projected using PP, and the probability density of the projected data was estimated with kernel density estimation.
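The autocorrelation idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of [TE06]: onsets are placed on a grid with a fixed number of samples per beat, each onset is weighted by the duration of its note (a simple durational accent), and the autocorrelation at lags 1 to 32 is normalized by its lag-0 value to give a 32-component vector. The function names are hypothetical.

```python
import numpy as np

def onset_function(onsets, durations, resolution=4):
    """Impulse train sampled at `resolution` points per beat; each onset is
    weighted by the duration of its note (a simple durational-accent assumption)."""
    idx = np.round(np.asarray(onsets, dtype=float) * resolution).astype(int)
    f = np.zeros(idx.max() + 1)
    np.add.at(f, idx, durations)   # accumulate weights, also for coinciding onsets
    return f

def autocorrelation_vector(f, n_lags=32):
    """Autocorrelation of f at lags 1..n_lags, normalized by the lag-0 value."""
    f = np.asarray(f, dtype=float)
    acf = np.correlate(f, f, mode="full")[len(f) - 1:]    # lags 0, 1, 2, ...
    acf = np.pad(acf, (0, max(0, n_lags + 1 - len(acf))))  # zero beyond the melody length
    return acf[1:n_lags + 1] / acf[0]

# Eight bars of 3/4, each a half note followed by a quarter note (one sample per beat).
onsets, durations = [], []
for bar in range(8):
    onsets += [3 * bar, 3 * bar + 2]
    durations += [2.0, 1.0]
vec = autocorrelation_vector(onset_function(onsets, durations, resolution=1))
print(np.argmax(vec[:8]) + 1)   # lag with the strongest periodicity, in beats: 3
```

Stacking such vectors for all melodies gives an N × 32 feature matrix of the kind that was projected for Fig. 3.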
Fig. 3. Visualization of metrical structures in (a) the Digital Archive of Finnish Folk Tunes, and its (b) Folk songs and (c) Rune songs subcollections.
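The projection-pursuit step can be sketched in a deliberately crude form. The data are first sphered (whitened) so that every direction has unit variance. Then, among random candidate unit directions, the one whose projection has the largest absolute excess kurtosis (one simple index of non-Gaussianity) is chosen. Friedman's index [Fri87] and practical PP implementations use other indices and proper optimization, so this only illustrates the criterion. The synthetic data and function names are assumptions.

```python
import numpy as np

def whiten(X):
    """Centre X and sphere it (ZCA whitening) so that its covariance is the identity.
    Assumes a non-singular covariance; drop redundant features first (e.g., one
    component of a pitch-class distribution, whose components sum to 1)."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    return Xc @ W

def excess_kurtosis(y):
    z = (y - y.mean()) / y.std()
    return np.mean(z ** 4) - 3.0   # 0 for a Gaussian distribution

def projection_pursuit_1d(X, n_candidates=2000, seed=0):
    """Return the random unit direction (in whitened space) with maximal |excess kurtosis|,
    together with the projected data."""
    rng = np.random.default_rng(seed)
    Z = whiten(X)
    dirs = rng.standard_normal((n_candidates, Z.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    scores = [abs(excess_kurtosis(Z @ w)) for w in dirs]
    best = dirs[int(np.argmax(scores))]
    return best, Z @ best

# Synthetic data: one bimodal (strongly non-Gaussian) feature and two Gaussian ones.
rng = np.random.default_rng(1)
n = 2000
bimodal = rng.choice([-3.0, 3.0], size=n) + 0.3 * rng.standard_normal(n)
X = np.column_stack([bimodal, rng.standard_normal(n), rng.standard_normal(n)])
w, y = projection_pursuit_1d(X)
print(np.round(np.abs(w), 2))   # the first component should dominate
```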
The obtained probability density shows an interesting structure, with three arms growing from the central body. Inspection of the projection directions suggests that the three arms can be associated with 2/4, 3/4, and 5/4 meters. The probability densities for the Folk song and Rune song subcollections (Figs. 3b-c) suggest differences in the distribution of meters between these subcollections.

6.3 Melodic contour and SOM
Melodic contour, or the overall temporal development of pitch height, is one of the most salient features of a melody (Dowling 1971). Some contour shapes have been found to be more frequent than others. For instance, Huron [Hur96] investigated the melodies of the Essen collection and found that an arch-shaped contour (i.e., ascending pitch followed by descending pitch) was the most frequent contour form in the collection. The SOM can be used to study and visualize typical contour shapes. Fig. 4a displays the prototype vectors of a SOM with 6 × 9 cells that was trained with 64-component melodic contour vectors. The material consisted of 9696 melodic phrases from Hungarian folk melodies and 13861 melodic phrases from German folk melodies; the musical feature matrix was thus a 23557 × 64 matrix. As can be seen, the arch-shaped contour is prevalent on the right side of the map, whereas the left side is partly occupied by descending and ascending contours. To compare the distribution of contour types between the two collections, the number of melodic phrases mapped to each cell is displayed in Figs. 4b-c. The arch-shaped contour types are somewhat more prevalent in the German collection than in the Hungarian one, whereas the opposite holds for the descending contour types.
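One simple way of turning a melodic phrase into a fixed-length contour vector, such as the 64-component vectors mentioned above, is to sample its pitch-versus-time step function at equally spaced points. The sketch below assumes onsets in beats and MIDI pitches, and subtracts the mean so that transposed phrases receive the same vector. These choices and the name `contour_vector` are illustrative assumptions and may differ from the representation actually used for Fig. 4.

```python
import numpy as np

def contour_vector(onsets, pitches, end_time, n_samples=64):
    """Sample a phrase's pitch-vs-time step function at n_samples equally spaced
    points and subtract the mean pitch (transposition invariance).

    onsets: sorted note onset times (beats); pitches: MIDI note numbers;
    end_time: time at which the last note ends.
    """
    onsets = np.asarray(onsets, dtype=float)
    pitches = np.asarray(pitches, dtype=float)
    # Sample at the midpoints of n_samples equal-length time bins.
    t = onsets[0] + (np.arange(n_samples) + 0.5) * (end_time - onsets[0]) / n_samples
    # Index of the note sounding at each sample time (last onset <= t).
    idx = np.searchsorted(onsets, t, side="right") - 1
    contour = pitches[idx]
    return contour - contour.mean()

# An arch-shaped phrase C D E G E D C in equal note values.
c = contour_vector(onsets=[0, 1, 2, 3, 4, 5, 6],
                   pitches=[60, 62, 64, 67, 64, 62, 60], end_time=7)
```

Stacking the vectors of all phrases row by row yields a phrases × 64 feature matrix for SOM training.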
Fig. 4. (a) The prototype vectors of a SOM trained with 23557 melodic contour vectors. Number of melodic phrases mapped to each cell for (b) Hungarian and (c) German melodies.
6.4 Spatial estimation of musical features
If a musical database contains precise information about the geographical origin of each musical piece, geographical variation of musical features can be studied by applying methods of spatial estimation. In [AH01] visualizations of the geographical variation of various musical features in the Essen collection were presented. The Digital Archive of Finnish Folk Tunes [ET04b] contains detailed geographical information about the origin of each tune. Fig. 5 shows visualizations obtained using this information and kernel density estimation.
Fig. 5. (a) The proportion of melodies in minor mode in different regions of Finland. The red colour denotes a high proportion and the blue colour a low proportion. (b) The proportion of melodies starting with a tonic.
Fig. 5a displays the geographical variation of the proportion of melodies in minor mode in the Folk song subcollection (N = 4842). As can be seen, melodies in minor are considerably more prevalent in the northeast than in the southwest. Fig. 5b displays the proportion of melodies that start with the tonic; the highest proportion of such melodies is found in the western part of the country.
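The text states only that kernel density estimation was used for Fig. 5. One plausible way to obtain a proportion map, sketched below as an assumption, is to divide a Gaussian kernel density estimate of the minor-mode melodies by one of all melodies, using the same bandwidth. This ratio equals Nadaraya-Watson kernel regression of a 0/1 indicator. The sketch also assumes planar coordinates (e.g., kilometres) rather than raw latitude and longitude; `kernel_proportion` is a hypothetical name.

```python
import numpy as np

def kernel_proportion(coords, has_feature, grid, bandwidth):
    """Kernel-smoothed proportion of items having a binary feature (e.g., minor mode).

    coords: (N, 2) planar coordinates of each melody's place of origin;
    has_feature: (N,) booleans; grid: (G, 2) evaluation points;
    bandwidth: Gaussian kernel width in the same units as coords.
    Returns a (G,) array; points far from every melody give 0/0 = NaN and can be masked.
    """
    d2 = ((grid[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)   # (G, N)
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))
    with np.errstate(invalid="ignore", divide="ignore"):
        # Ratio of kernel sums: density of feature items over density of all items.
        return (K @ has_feature.astype(float)) / K.sum(axis=1)

# Toy data: a mostly major group in the west and an all-minor group in the east (km).
coords = np.array([[0, 0], [5, 0], [0, 5], [100, 0], [105, 0], [100, 5]], dtype=float)
is_minor = np.array([False, False, True, True, True, True])
grid = np.array([[0.0, 0.0], [100.0, 0.0]])
p = kernel_proportion(coords, is_minor, grid, bandwidth=10.0)
```

The bandwidth controls how strongly regional differences are smoothed, so it is worth checking that conclusions such as the northeast-southwest contrast hold over a range of bandwidths.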
7 Conclusion

This article has provided an overview of visualization methods in comparative music research. The application of computational methods to the investigation of large musical collections has the potential to afford insights into the material that would be difficult to obtain through manual analysis of musical notation or aural analysis of recorded material. It also avoids some pitfalls of traditional methods by allowing one to study larger, and thus more representative, sets of musical material with objective methods. Exploratory investigation of properly visualized collections may help to discover interesting structures, such as clusters, trends, correlations, and associations in various musical feature dimensions. These can in turn generate hypotheses for further studies, in which additional methodologies can be used.

References

[AH01] Aarden, B., Huron, D.: Mapping European folksong: Geographical localization of musical features. Computing in Musicology, 12, 169-183 (2001)
[BM48] Barlow, S.H., Morgenstern, S.: A Dictionary of Musical Themes. Crown Publishers, New York (1948)
[Bro93] Brown, J.C.: Determination of meter of musical scores by autocorrelation. Journal of the Acoustical Society of America, 94, 1953-1957 (1993)
[CF02] Cooper, M., Foote, J.: Automatic music summarization via similarity analysis. In: Proceedings of the 3rd International Conference on Music Information Retrieval (ISMIR 2002), 81-85 (2002)
[DPW03] Dixon, S., Pampalk, E., Widmer, G.: Classification of dance music by periodicity patterns. In: Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR 2003), 159-165 (2003)
[ET01] Eerola, T., Toiviainen, P.: A method for comparative analysis of folk music based on musical feature extraction and neural networks. In: Lappalainen, H. (ed.) Proceedings of the VII International Symposium of Systematic and Comparative Musicology and the III International Conference on Cognitive Musicology. University of Jyväskylä (2001)
[ET04a] Eerola, T., Toiviainen, P.: MIDI Toolbox: MATLAB tools for music research. University of Jyväskylä, available at: http://www.jyu.fi/musica/miditoolbox (2004)
[ET04b] Eerola, T., Toiviainen, P.: The Digital Archive of Finnish Folk Tunes. University of Jyväskylä, Jyväskylä, available at: http://www.jyu.fi/musica/sks (2004)
[Fri87] Friedman, J.H.: Exploratory projection pursuit. Journal of the American Statistical Association, 82, 249-266 (1987)
[GHN02] Goto, M., Hashiguchi, H., Nishimura, T., Oka, R.: RWC Music Database: Popular, Classical, and Jazz Music Databases. In: Proceedings of the 3rd International Conference on Music Information Retrieval (ISMIR 2002), 287-288 (2002)
[Hur95] Huron, D.: The Humdrum Toolkit: Reference Manual. Center for Computer Assisted Research in the Humanities, Menlo Park, CA (1995)
[Hur96] Huron, D.: The melodic arch in Western folksongs. Computing in Musicology, 10, 3-23 (1996)
[HKO01] Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. John Wiley & Sons, New York (2001)
[Juh00] Juhász, Z.: Contour analysis of Hungarian folk music in a multidimensional metric-space. Journal of New Music Research, 29, 71-83 (2000)
[Kla05] Klapuri, A.: Automatic music transcription as we know it today. Journal of New Music Research, 33, 269-282 (2005)
[Koh95] Kohonen, T.: Self-Organizing Maps. Springer-Verlag, Berlin (1995)
[LLT00] Leman, M., Lesaffre, M., Tanghe, K.: The IPEM Toolbox Manual. University of Ghent, IPEM (2000)
[Mar83] Marillier, C.G.: Computer assisted analysis of tonal structure in the classical symphony. Haydn Yearbook, 14, 187-199 (1983)
[PDW04] Pampalk, E., Dixon, S., Widmer, G.: Exploring music collections by browsing different views. Computer Music Journal, 28, 49-62 (2004)
[PPI04] Ponce de León, P.J., Pérez-Sancho, C., Iñesta, J.M.: A shallow description framework for music style recognition. Lecture Notes in Computer Science, 3138, 876-884 (2004)
[RIS97] RISM: Répertoire international des sources musicales: International inventory of musical sources. In: Series A/II Music manuscripts after 1600 [CD-ROM database]. K. G. Saur Verlag, Munich (1997)
[Sch95] Schaffrath, H.: The Essen folksong collection in kern format [computer database]. Edited by D. Huron. Center for Computer Assisted Research in the Humanities, Menlo Park, CA (1995)
[Sil86] Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman and Hall, London (1986)
[TE06] Toiviainen, P., Eerola, T.: Autocorrelation in meter induction: The role of accent structure. Journal of the Acoustical Society of America, 119, 1164-1170 (2006)
[VT89] Vos, P.G., Troost, J.M.: Ascending and descending melodic intervals: statistical findings and their perceptual relevance. Music Perception, 6, 383-396 (1989)