Soft Comput (2012) 16:2049–2056 DOI 10.1007/s00500-012-0871-z
FOCUS
Corpus-based recombinant composition using a genetic algorithm
Arne Eigenfeldt
Published online: 15 June 2012
© Springer-Verlag 2012
Abstract
A generative real-time composition system is described that uses a genetic algorithm to create a population of melodic and rhythmic phrases that are combined by intelligent musical agents. The initial population is derived from an offline analysis of a corpus; the population undergoes continual breeding using rules derived from the population itself. The system’s role in the generation of musical material for the acoustic composition Other, Previously, for string quartet, is discussed.
Keywords Evolutionary music · Genetic algorithms · Corpus-based composition
A. Eigenfeldt (✉), School for the Contemporary Arts, Simon Fraser University, Vancouver, Canada; e-mail: [email protected]
1 Introduction
Other, Previously, for string quartet, was composed using Kinetic Engine, an interactive musical performance software system developed by the author, which generates rhythmic and melodic polyphony using musically intelligent algorithms (Eigenfeldt 2006, 2007, 2009). This version (KE3) uses an evolving population of phrases produced using a genetic algorithm derived from analysis of source material. Although it is a real-time system, in which individuals from the population are selected based upon their suitability to the current musical environment (see Sect. 2), for Other, Previously it was used to generate a fixed, fully notated score.
Genetic algorithms (GA) have been used in a variety of ways to create music, both in performance (real-time) and
in the studio. Todd and Werner (1999) provide a good overview of the earlier musical explorations using such approaches, while Miranda and Biles (2007) provide a more recent survey. Very few of these approaches have been compositional in nature, that is, using evolutionary methods to generate entire compositions rather than portions of compositions; instead, their focus is upon generating melodies, harmonies, or timbre.
Biles, in his discussion of his improvisation system, GenJam (Biles 1994), points out many of the idiosyncratic factors of using GAs in a musical context, including the notion that any fitness test derives from aesthetic judgment, which is inherently difficult to encode (Biles 2001). Biles’ system is, perhaps, the most famous (and arguably successful) musical application of evolutionary algorithms; however, its use is within a narrowly defined problem space of mid-twentieth century jazz.
Waschka gives a revealing description of the problem faced by composers of contemporary concert music, their relation to their material, and the notion of the ‘well-defined problem space’ (Waschka 2007). To paraphrase, he points out that the desired solution of “good new music” is not, in itself, clearly defined: “Most composers, upon hearing a piece, even for the first time, feel confident of their ability to judge its quality and believe they will be able to point out what things about the piece worked well and what did not. However, such estimations differ significantly from knowing, a priori, what will make a good, non-formulaic, experimental, or avant-garde piece”. This points directly to the inherent complication of using evolutionary algorithms within music: the difficulty in designing a non-interactive fitness function. Waschka solves this problem in GenDash by avoiding the issue entirely, and selecting individuals for reproduction through random methods. While this may solve a computational
problem, a clear aesthetic choice (embracing randomness) has been made that other artists may not welcome.
Martins and Miranda (2007) suggest that composers need computational models not as “tools to generate efficient solutions to problems automatically”, but as “tools to explore a vast space of possible outcomes”. They also point out that “generating music from algorithms that were not designed for music seldom produces significant pieces of music”, and that composers should instead modify A-Life algorithms to address musical issues.
As mentioned, a good deal of research exists in this area, although the applications (and, in rare cases, systems) are primarily limited to very narrow problem spaces. For example, Thywissen (1996) describes a system that allows composers to interactively explore musical structures through evolution. Apart from Waschka, it is difficult to find references to notated music, specifically concert music, that has been created, either entirely or in large part, through evolutionary algorithms. The reason for this could be twofold: first, the number of composers of concert music able to create their own evolutionary music systems is limited; second, those who have created production systems have not been artistically satisfied with their output. As Mozer suggests regarding his own CONCERT system, the resulting music tends to be “compositions that only their mother could love” (Mozer 1994).
KE3 is, first and foremost, an artistic production system, in which a modified GA is employed for musical means: the goal is not any single solution, but the continual evolution of a given data set that is heard as it evolves. It has been used as a live musical production system, as well as for producing fixed musical scores that have been performed in public concerts.
2 Description
KE3 generates a population of available phrases (individuals) which are derived from an initial analysis of a provided corpus. Because all individuals are generated based upon rules derived from a corpus, they are all assumed to be strong, and thus no fitness test is required to assess which individuals are allowed to mate. Individuals are selected from the population for performance based upon their suitability to the current environment; those individuals that have already been performed are considered “stale”, and have a greater likelihood of being culled from the population prior to the next breeding. New individuals are bred during performance from single individuals using derived Markov transitions, as well as a musically useful mutation function. The entire system was written in MaxMSP, and is available on the author’s website.
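The selection of individuals for performance, which Sect. 2.4 describes as a K-nearest neighbour ranking followed by a roulette-wheel choice, might be sketched as follows. This is only an illustration in Python and not the author's MaxMSP implementation: the dictionary layout, the 0 to 1 scaling of density and complexity, the Euclidean distance, and the linear rank weights are all assumptions made for the example.

```python
import math
import random

def rank_by_fit(population, density, complexity, k):
    """Return the k individuals closest to the requested density and
    complexity, nearest first (Euclidean distance is an assumption)."""
    def distance(ind):
        return math.hypot(ind["density"] - density,
                          ind["complexity"] - complexity)
    return sorted(population, key=distance)[:k]

def roulette_pick(ranked, rng=random):
    """Choose one individual from the ranking; higher ranks receive a
    larger slice of the wheel (linear weights are an assumption)."""
    weights = [len(ranked) - i for i in range(len(ranked))]
    return rng.choices(ranked, weights=weights, k=1)[0]

# Hypothetical population: names and values are invented for illustration.
population = [
    {"name": "A", "density": 0.20, "complexity": 0.10},
    {"name": "B", "density": 0.50, "complexity": 0.50},
    {"name": "C", "density": 0.90, "complexity": 0.90},
    {"name": "D", "density": 0.55, "complexity": 0.40},
]
ranked = rank_by_fit(population, density=0.5, complexity=0.5, k=3)
chosen = roulette_pick(ranked, rng=random.Random(0))
```

Because the final choice is weighted rather than deterministic, repeated calls under identical conditions need not return the same individual, which matches the behaviour the text attributes to the selection procedure.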
Although Kinetic Engine has been described in academic and scientific journals and papers, its purpose, first and foremost, is as a music production tool. Almost all of the design decisions have an artistic motivation, and there are many examples of heuristic decisions made for purely musical considerations. For example, the use of a genetic algorithm was not initially within the design of this version; instead, the system was envisioned as one that derived formative material from a corpus. Rather than stitching together segments in direct recombination (Cope 1996), it was decided that these segments could be continually evolved, and a genetic algorithm was then included, as it is ideally suited for such a task.
The system comprises two modules: an offline analysis module (see Sect. 2.1), and a real-time module (see Sects. 2.2–2.6) that generates polyphonic material using four autonomous Player agents.
2.1 Analysis
The use of a corpus allows for the creation of rules derived through analysis of user-provided MIDI files. This material can be very explicit (i.e. a set of patterns to use for a section of music) or general (i.e. a transcription of an entire piece of music). In the case of Other, Previously, the corpus consists of the nine individual instrumental parts of the traditional Javanese gamelan composition Ladrang Wilugeng (Sorrell 1990), of which 16 measures were transcribed into standard MIDI files.
The derived rules are variable order Markov transitions (Ames 1989) with separate phrase databases for pitch and rhythm. The musical result is a piece of music that reflects many of the tendencies of the original, without direct quotation.
Kinetic Engine represents a rhythmic phrase as a chromosome structure of indices into a database of potential rhythmic subdivisions of a beat, currently limited to sixteenth and triplet subdivisions (see Fig. 1). Individuals can be of varying length, based upon the phrase lengths detected by the analysis module.
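To make the idea of deriving Markov transitions from index-based rhythm phrases concrete, a minimal Python sketch follows. It is a simplification under stated assumptions: it uses first-order transitions only, whereas the system uses variable order transitions, and the function names are hypothetical. The single corpus phrase is the rhythmic representation (6 10 6 10 6 10 6 3) quoted in the text.

```python
import random
from collections import defaultdict

def build_transitions(phrases):
    """Count first-order transitions between successive beat indices.
    (Assumption: first order only; the paper uses variable order.)"""
    table = defaultdict(lambda: defaultdict(int))
    for phrase in phrases:
        for current, following in zip(phrase, phrase[1:]):
            table[current][following] += 1
    return table

def generate_phrase(table, start, length, rng=random):
    """Walk the table from `start`, choosing each continuation with a
    probability proportional to how often it was observed."""
    phrase = [start]
    while len(phrase) < length:
        options = table.get(phrase[-1])
        if not options:  # no continuation was observed for this index
            break
        choices = list(options)
        weights = [options[c] for c in choices]
        phrase.append(rng.choices(choices, weights=weights, k=1)[0])
    return phrase

corpus = [[6, 10, 6, 10, 6, 10, 6, 3]]  # rhythmic phrase quoted in the text
table = build_transitions(corpus)
child = generate_phrase(table, start=6, length=8, rng=random.Random(1))
```

Every phrase produced this way contains only transitions that occur in the corpus, which is one way to read the claim that the output reflects the tendencies of the original without quoting it directly.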
Fig. 1 The first 16 subdivisions within the subdivision array
Thus, the phrase shown in Fig. 2, taken from the gender 1 part of Ladrang Wilugeng, is represented rhythmically as (6 10 6 10 6 10 6 3).
Pitch data are similarly extracted based upon patterns found within a beat. However, unlike rhythmic combinations, the number of possible pitch combinations within a beat is exponentially large, and thus representations are assigned as they are found. The first group of pitches found in the first beat is assigned an index of 1; the next unique group of pitches found in a beat is assigned an index of 2, and so on. Therefore, the pitches of Fig. 2 are represented as (1 2 3 2 1 2 1 4), with specific pitches being stored separately (see Fig. 3).
As the performance module assumes repetition of phrases (an aesthetic decision), analysis algorithms search for repetitions within the source material. Individual pitch and rhythmic phrases are segmented based upon found repetitions (i.e. 1, 2, 4, or 8 measures). The occurrences of new phrases are noted within the overall material, and are considered variation points. Successive phrases are compared for similarity, thus determining the amount of variation between consecutive phrases.
The analysis data are written to disk as XML files. When the databases are read at the beginning of the performance, they form the initial populations for individual Player agents (see Sect. 2.3).
2.2 Performance
Fig. 3 The pitch representation of Fig. 2
Fig. 4 The design scheme for Kinetic Engine v.3 performance system (diagram labels: XML analysis file, Population, Generation, Performance, Selection)
The performance module is outlined in Fig. 4.
2.3 Population generation
Along with the original source material, which exists in the database as generation 0, Player agents generate initial populations using the Markov transition tables: thus, generation 1 closely resembles the source material (see Fig. 5). Chromosomes that are found at the beginning of phrases in the corpus are used as initial queries to the database, and phrases are generated from acceptable continuations that meet the transition criteria. Terminal chromosomes are also taken from the corpus, to maintain musically logical phrase endings.
Fig. 2 Example phrase from corpus
Fig. 5 Example of a parent phrase, and three first generation children generated using Markov transitions
Pitch material is treated separately, but linked to rhythmic generation. Given the three first generation individuals of Fig. 5, the pitch generation algorithm will construct appropriately matching pitch individuals. Pitch transitions are initially stored in a multi-dimensional matrix that takes into account the number of onsets within each beat. For example, transitions from two notes to three notes are stored separately from transitions from two notes to two notes. Thus, for individual C in Fig. 5, a pitch individual needs to be constructed that has two pitches in each of the first four chromosomes (see Fig. 6).
Once a population has been generated, it is immediately analysed (in a background process) for new transitions that form the ruleset for the next generation. New transitions will only appear due to mutation (see Sect. 2.6). Population analysis also includes an individual’s density (the number of events per rhythmic pattern compared to the maximum possible), and complexity (the degree of syncopation
within the individual beats and placement within the pattern): both are used by the Player in its selection criteria.
Fig. 6 Example of a parent pitch phrase, and three first generation children based upon Markov transitions
2.4 Selection of individuals for performance
Individuals chosen from the population for performance depend upon the musical environment: Player agents look to the user-defined global density and complexity parameters, and use a K-nearest neighbour algorithm to create a ranking of individuals from the population that best match these values. A roulette-wheel selection is made from this ranking, favouring the higher-ranked individuals. Using such a selection procedure ensures that, given the same conditions, the same selection will not necessarily be made each time.
New individuals are chosen during performance when the global density and complexity parameters change, or when a variation point (calculated during analysis) has been reached. The amount of variation required at this point, similarly derived from analysis, entails selecting a new individual whose similarity to the current individual most closely matches the required variation parameter.
Agents keep track of which individuals have been heard, as well as how many repetitions. This information is used to determine which individuals to cull prior to breeding.
The elimination algorithm is a variation of roulette-wheel selection: the amount of “culling” is user defined (the default is 50 %). The population is sorted by usage, and a random value is exponentially scaled to select overused patterns. This method guarantees neither the elimination of all weak candidates (those that are stale), nor the prolongation of strong candidates (those that remain fresh). Musically, this maintains an unpredictable balance between the repetition of previous material and the introduction of new material.
2.6 Reproduction and mutation
Crossover, a standard method of generating new individuals by splicing together chromosomes from strong parents, is not used in KE3, as it was felt to produce unmusical results. Instead, the genetic material of a single parent is used to intelligently create a variation through continually derived Markov transitions. Special attention is paid to beginnings and endings of rhythmic phrases through independent transition matrices for these points.
Although diversity depends upon the source material (the basis of the original population), it is further maintained through two mutation methods: the substitution of null rhythms (rests) at the beginning or end of the individual, and the substitution of related chromosomes within the individual, with similarity being determined by an algorithm that compares weighted onsets and density within the patterns themselves. In both cases, the degree of mutation is constrained during performance: this amounts to a compositional control, as it has lasting effects.
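The two mutation methods can be illustrated with a short Python sketch. Several details here are assumptions rather than features of KE3: the index used for a null rhythm (`REST`), the `related` lookup (standing in for the similarity algorithm, whose details are not given), the `rate` parameter used to constrain the degree of mutation, and the example index values.

```python
import random

REST = 0  # hypothetical index standing for a null rhythm (a beat of rest)

def mutate_edge_rest(individual, rng=random):
    """Substitute a rest for the first or the last chromosome."""
    child = list(individual)
    position = rng.choice([0, len(child) - 1])
    child[position] = REST
    return child

def mutate_related(individual, related, rate, rng=random):
    """Replace each chromosome by a related one with probability `rate`.
    `related` maps an index to indices judged similar; here it is a
    hand-written stand-in for the similarity comparison."""
    child = []
    for gene in individual:
        alternatives = related.get(gene, [])
        if alternatives and rng.random() < rate:
            child.append(rng.choice(alternatives))
        else:
            child.append(gene)
    return child

parent = [6, 10, 6, 10, 6, 10, 6, 3]
related = {6: [5, 7], 10: [9, 11]}  # invented similarity groups
rested = mutate_edge_rest(parent, rng=random.Random(0))
varied = mutate_related(parent, related, rate=0.25, rng=random.Random(0))
```

Keeping `rate` as an explicit argument mirrors the point made above that the degree of mutation acts as a compositional control: a value of 0 returns the parent unchanged, while larger values move the child further from it.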
3 Compositional use
Although KE3 functions as a performance system, in the case of Other, Previously, it was used to generate material for a fixed, fully notated composition. The system was used “live” in the composer’s studio, and the real-time MIDI output was captured by a notation program. An informal performance of the work can be seen at http://youtu.be/gaQfyhOiRio.
3.1 Description of corpus
2.5 Selection for elimination (culling)
New populations are bred during performance at instances initiated by the user. However, rather than attempting to determine which individuals are most fit, a simpler determination is made as to which individuals remain “fresh” (selected for performance less often, or not at all), versus those that are considered “stale” (those that have been presented many times).
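The usage-based culling can be sketched in Python as below. The text states only that the population is sorted by usage and that a random value is exponentially scaled to select overused patterns; the particular scaling (raising a uniform random number to a power), the `exponent` value, and the function and variable names are assumptions made for illustration.

```python
import random

def cull(population, usage, fraction=0.5, exponent=2.0, rng=random):
    """Remove roughly `fraction` of the population, biased towards the
    most frequently performed ("stale") individuals.

    population: list of hashable individual identifiers
    usage: dict mapping an identifier to the number of times performed
    """
    # Most-used individuals come first in the sorted list.
    survivors = sorted(population, key=lambda ind: usage.get(ind, 0),
                       reverse=True)
    n_remove = int(len(survivors) * fraction)
    for _ in range(n_remove):
        # Raising a uniform value in [0, 1) to a power > 1 skews it
        # towards 0, i.e. towards the stale end of the list.
        r = rng.random() ** exponent
        survivors.pop(int(r * len(survivors)))
    return survivors

population = ["p%d" % i for i in range(8)]      # invented identifiers
usage = {"p0": 5, "p1": 3, "p2": 1}             # the rest were never played
survivors = cull(population, usage, fraction=0.5, rng=random.Random(0))
```

Since the choice is probabilistic, a heavily used individual can survive and a fresh one can be removed, which is consistent with the statement that the method guarantees neither the elimination of all stale candidates nor the survival of all fresh ones.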
Sixteen measures of the traditional Javanese gamelan composition Ladrang Wilugeng, containing nine separate parts—gerongan, rebab, gender panerus (left hand, right hand), gender barung (left hand and right hand), bonang barung and bonang panerus, and gambang—served as the corpus. Some instrumental parts have a great deal of repetition: the first of four phrases for gender panerus right hand is
given in Fig. 7 (along with its interlocking left hand); the remaining three phrases are only subtle variations of this phrase. The rebab part, on the other hand, has little repetition, and is a continuous, free-flowing melody. The pitch material of the original composition is limited to a major pentatonic scale in C, with 11 discrete pitches.
Fig. 7 Interlocking gender parts
3.2 Real-time performance
To maintain clarity between contrapuntal and heterophonic material, only two or three agents were used in the studio performance, each agent producing monophonic material. Different combinations of analysis were used to generate seven recordings:
• gambang and rebab;
• bonang barung and bonang panerus (two versions, one with two voices, one with three);
• rebab and saron;
• two gender;
• two sarons and rebab;
• gambang and other instruments.
The recordings varied in length from 1′26″ (27 measures) to 4′45″ (101 measures). A rotating pitch field was created and triggered during performance, which essentially caused modal modulations to occur. This generated some harmonic variety that was not found in the original corpus.
2053
The analysis produced three unique parents from the rhythm material of gender panerus, and three from gender barung (see Figs. 8, 9). These individuals served as generation 0 to the subsequent population of 19 individuals in each agent. In both cases, the initial generation remained within the population (as individuals only die when they are performed). The agent using the gender panerus corpus produced 16 new individuals: six from parent 1, seven from parent 2, and three from parent 3. Similarly, the three parents from gender barung produced 16 new individuals: four from parent 1, six from parent 2, and four from parent 3. The generation algorithm selects parents for breeding using a weighted probability based upon their prevalence in the original corpus. Given the performance variables of density and complexity, agents choose individuals to perform that best match user requests. The first six measures of this section is shown in Fig. 10: the similarity to the original is clear. After these six measures, a new population was bred, which introduced new individuals through mutation. At the same time, the performance parameter of density was increased, thereby forcing a selection of new individuals that had a greater number of onsets. As there were no individuals that contained more onsets, this request could not be met, and the generative capabilities of the system were initiated. 3.5 Generative aspects of system When individuals in the existing population do not adequately meet the musical requirements set by the user during performance—for example, density or complexity requests—agents can compose variations on their performance material. Agents create intentions, which is an array
3.3 Composition Fourteen sections were selected from these recordings and stitched together, without altering any pitches or rhythms. An eight-measure introduction was composed that also served as a refrain, bridging different sections. Dynamics and articulations were added, creating the original composition for cello and guitar; this work was later re-orchestrated for string quartet.
Fig. 8 Gender 1 rhythm, generation 0
3.4 Relation of composition to corpus The first machine-composed section was taken from the material generated from the two gender parts, beginning in the nineteenth measure. This section will be discussed in more detail.
Fig. 9 Gender 2 rhythm, generation 0
123
2054
A. Eigenfeldt
Fig. 10 Two voice generation of gender 1 and gender 2, limited to generation 0 and generation 1
Fig. 11 Two voice generation of gender 1 and gender 2, demonstrating generative properties of the system
of future events derived from the available rhythms and pitches in its population best suited to the current musical environment. When the environment changes, agents will attempt to temporarily adapt their intentions by altering an individual’s density and/or complexity. As these variations occur outside the population, they are not retained within it, and are thus improvisational responses to the environment. Figure 11 presents the final six measures of the opening section, demonstrating the generation of new material based upon individuals within the population. Figure 12 presents this material within the final composition, as orchestrated for string quartet.
4 Evaluation
Evaluation within computational creativity is a difficult notion, detailed elsewhere (Eigenfeldt and Pasquier 2012). Essentially, there are at least five viewpoints to consider:
1. The designer: the designer of the system accepts the output as artistically valid;
2. The audience: the work is presented publicly, and the audience accepts the work;
3. The academic experts: the system is described in a technical peer-reviewed paper and accepted for conference or journal publication;
4. The domain experts: the system receives critical attention through the media and non-academic artists;
5. Controlled experiments: the system is validated through scientifically accepted empirical methods, using statistical analysis of the results in order to accept or reject the hypothesis made about the system.
Other, Previously was presented in concert: as such, the designer was satisfied with its output; it was selected by a jury for a performance not limited to computer-generated music; the composer received good feedback from the performers and audience (although that, in itself, means little, as critical feedback is rare in the North American concert setting); and the audience assumed it was human-composed rather than machine-composed (see Eigenfeldt and Pasquier 2012).
Fig. 12 Measures 20–23 of the composition, Other, Previously, incorporating the generative material shown in Fig. 11, and combined with generations derived from two additional gender pairings. Articulations and dynamics were added by the composer
Restricting our current discussion to the viewpoint of the designer, the author would like to be able to state that the system is a complete success, but cannot do so without qualification. On the one hand, it produced music that the author may have been able to produce on his own, but did not. The attraction of any algorithm and generative art system is to allow its designers to set processes in motion, rather than focus upon the details; to this extent, KE3 is a successful music production system.
On the other hand, after producing only two complete (and publicly presented) compositions with it, the author has moved on. This is partly because the system was never made to be the conclusive composition system for the designer: artists tend to be somewhat mercurial in their goals and desires, and this system, which was over a year in development, addressed the needs of the designer at that time; the artistic direction of the author has since changed.
Another reason to question its success is the difficulty of creating a system whose user interface is transparent enough that it can be used effortlessly. When the system reached the state of producing interesting music, the author wanted to use it right away, and not spend several months creating a high-level user interface; the result, unfortunately, is an often frustrating experience in which the creative process is slowed down because, for example, file-handling is clunky.
Last, artist/software designers have the additional task of having to quickly “switch hats” when it comes to their relationship with the system: as the designer, one tends to consider the larger issues, while as the user, one focuses upon the details. Once the system is ready to use, it is difficult not to continually switch back to designer, and add features that may seem immediately desirable from a user perspective, when these features may not have any long-term benefit to the whole.
Clearly, KE3 would benefit greatly from third party user testing; however, as the author is not interested in releasing
the system publicly, this is unlikely. The different versions of Kinetic Engine—and indeed all of the author’s music production systems—have always been rather idiosyncratic and extremely personal systems that generate music that the author wishes to hear. No attempt has ever been made to make them universal or general purpose (although the author recognises that this would not only improve them, but even suggest new ways of working for the author himself).
5 Conclusion
Other, Previously is one example of corpus-based recombinant composition using a genetic algorithm. Using a corpus of existing music, in this case the Javanese traditional gamelan composition Ladrang Wilugeng, new music can be created that reflects the rhythmic and pitch tendencies of the source, without directly quoting from it.
References
Ames C (1989) The Markov Process as a compositional model: a survey and tutorial. Leonardo 22(2):175–187
Biles J (1994) GenJam: a genetic algorithm for generating Jazz Solos. In: Proceedings of the international computer music conference, San Francisco, Computer Music Association, pp 131–37
Biles J (2001) Autonomous Genjam: eliminating the fitness bottleneck by eliminating fitness. http://igm.rit.edu/~jabics//GECCO01/. Accessed on 10 April 2012
Cope D (1996) Experiments in musical intelligence. A-R Editions, Middleton
Eigenfeldt A (2006) Kinetic Engine: toward an intelligent improvising instrument. In: Proceedings of the sound and music computing conference, Marseille: GMEM - Centre National de Création Musicale, pp 97–100
Eigenfeldt A (2007) Drum circle: intelligent agents in Max/MSP. In: Proceedings of the international computer music conference, Copenhagen. San Francisco, Computer Music Association, pp 9–12
Eigenfeldt A (2009) The evolution of evolutionary software: intelligent rhythm generation in Kinetic Engine. In: Application of Evolutionary Computing, LNCS, vol 5484. Springer, Berlin, pp 498–507
Eigenfeldt A, Pasquier P (2012) Evaluating musical metacreation in a live performance context. In: Proceedings of the international conference on computational creativity, Dublin (forthcoming)
Martins J, Miranda E (2007) Emergent rhythmic phrases in an A-Life environment. Available at http://cmr.soc.plymouth.ac.uk/publications/MusicAL_Martins.pdf. Accessed on 12 April 2012
Miranda E, Biles J (eds) (2007) Evolutionary computer music. Springer, London
Mozer M (1994) Neural network music composition by prediction: exploring the benefits of psychoacoustic constraints and multiscale processing. Connect Sci 6(2–3):247–280
Sorrell N (1990) A guide to the Gamelan. Faber and Faber, London
Thywissen K (1996) GeNotator: an environment for investigating the application of genetic algorithms in computer assisted composition. In: Proceedings of the 1996 ICMC, San Francisco, pp 274–277
Todd P, Werner G (1999) Frankensteinian methods for evolutionary music composition. In: Griffith N, Todd P (eds) Musical networks: parallel distributed perception and performance. MIT Press, Cambridge, pp 313–339
Waschka R (2007) Composing with genetic algorithms: GenDash. Evolutionary Computer Music. Springer, London, pp 117–136