A WORKFLOW FOR THE COMPUTER REPRESENTATION OF MUSICAL PERFORMANCE

Stuart Pullinger; Nicholas J. Bailey; Douglas McGilvray; Jennifer MacRitchie
Centre for Music Technology, University of Glasgow

© 2007 Austrian Computer Society (OCG).

ABSTRACT

The field of music performance analysis has been slow to benefit from the development of computer graphics due to the difficulty of representing musical performance alongside the score inside a computer. A workflow is presented for the computer representation of musical performance using tools being developed at the Centre for Music Technology at the University of Glasgow. The process begins with the recording of a MIDI file of a performance. This file is converted to an XML representation which is combined with an XML representation of the score. A notematching algorithm creates links between the performed notes and the score notes. The data is transferred to a database where it is queried for musically significant passages and their performance traits. A graphical score in SVG format is produced using the Lilypond typesetting software and overlaid with performance data from the database query.

1 INTRODUCTION

The purpose of the system presented here is to enable the objective measurement of traits in musical performance and the display of the performance data alongside the score. To this end, the system represents both the score and the performance inside the same data structure. The combination of score analysis with performance analysis will provide new possibilities for the study of musical performance. The use of computers for this task will free the music analyst from the more laborious parts of performance analysis, thus allowing more sophisticated queries of a performance to be presented. The workflow detailed here represents the first steps towards the development of a suite of tools for computer-aided analysis of musical performance.

2 THE SYSTEM WORKFLOW

This workflow incorporates several stages. Performance data is acquired in MIDI format (Section 2.1). This data is converted to XML for notematching, after which the data is transferred to a database for querying. The Lilypond typesetting software [9] is used to generate a score in Scalable Vector Graphics (SVG) [20] format. Finally, the performance data is overlaid onto the SVG representation of the score.
2.1 Data Acquisition

Acquisition of performance data is achieved using a Moog Piano Bar [37]. This device rests on the end cheeks of a normal acoustic piano, spanning the length of the keyboard. It has "teeth" which are positioned between the black keys and lie just above the white keys. The "teeth" project infrared beams onto the white keys and through the black keys and detect the beams' return with sensors. A MIDI note-on for a particular key is triggered by the beam being broken (for white notes) or being detected (for black notes). The device also outputs note-on velocity information. An additional magnetic sensor lies beneath the pedals and detects the depression of the una corda and sustain pedals. The Rosegarden [19] sequencer records the MIDI data and the multitrack recorder Ardour [1] records the audio. Video recording is performed by a high-framerate camera and the Coriander [2] software. Future work will incorporate audio and video analyses into the performance analysis.

2.2 Representing the Score and the Performance

The intermediate representation for the score and the performance is created in eXtensible Markup Language (XML) [3]. Using XML allows an existing XML music format to be extended for the needs of the present system. There also exist several standard APIs for the access [22] [23] and processing [27] [26] [24] of XML data which are implemented in a number of programming languages including C++/Java [25], Python [18] and Perl [13] among others. This wide support and multitude of tools make XML the most attractive choice for music representation and processing.

The score is represented in the MusicXML [32] format. MusicXML provides a near-complete representation of the score in common western notation. MusicXML is widely supported by both commercial [21] [4] and open-source [11] [19] software packages as an interchange format, which makes the present system robust against the rise and fall of software companies.
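To illustrate the kind of score information involved, the sketch below parses a small hand-written MusicXML fragment with Python's standard ElementTree API and reads out the pitch of each note. The fragment and the code are illustrative only and are not taken from the system described here.

# A minimal sketch (not the system's actual code): reading pitch
# information from a hand-written MusicXML fragment with the
# standard-library ElementTree API.
import xml.etree.ElementTree as ET

MUSICXML_FRAGMENT = """
<measure number="1">
  <note>
    <pitch><step>F</step><alter>1</alter><octave>4</octave></pitch>
    <duration>1</duration>
    <accidental>sharp</accidental>
  </note>
  <note>
    <pitch><step>G</step><octave>4</octave></pitch>
    <duration>1</duration>
  </note>
</measure>
"""

measure = ET.fromstring(MUSICXML_FRAGMENT)
for note in measure.findall("note"):
    step = note.findtext("pitch/step")
    alter = int(note.findtext("pitch/alter", default="0"))
    octave = note.findtext("pitch/octave")
    print(step, alter, octave)   # e.g. "F 1 4" - F sharp in octave 4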
MusicXML contains no explicit timing information: the musical timing of notes or events is implied by the order in which they occur in the file rather than explicitly specified (see [33] for a discussion of the issues surrounding explicit vs. implicit timing in music). Note onset times are therefore inserted into the MusicXML in "score time" - the number of beats since the beginning of the piece. This eases the process of matching score notes with performed notes. Each score note is also given a unique identity (the XML "id" attribute performs this function; for an XML file to be valid, an id attribute must be unique within that file).

The musical performance is represented in Performance Markup Language (PML) [12]. PML is an extension of MusicXML to include performance data. A separate section is added to the end of the file giving a list of performance events (notes in this case), their onset and offset times, MIDI note numbers and unique ids. The performance section is generated from the MIDI recording of the performance. Once the file has been processed by the notematcher (see Section 2.3), a reference to a score note's id is added to each performed note that was successfully matched to a score note.

2.3 Matching the Score and the Performance

After the addition of the performance information to the PML file, it is processed with a notematching algorithm to create references from the performed notes to their corresponding score notes. The matching algorithm is based on the work of Bolton [28] and Large [35], extended to operate on polyphonic music. The algorithm uses dynamic time warping [30] to calculate the optimal correspondence between the two sequences of notes. On matching a performed note to a score note, a reference is added to the performed note stating the id of the matched score note.
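The note matcher itself is not reproduced here, but the following sketch illustrates the dynamic-programming idea behind dynamic time warping on two monophonic sequences of MIDI note numbers: a cost matrix is filled and the optimal alignment path is recovered by backtracking. The cost function and the monophonic encoding are simplifying assumptions made for this example; the actual algorithm operates on polyphonic PML data.

# A minimal dynamic-time-warping sketch (illustrative only, monophonic):
# aligns a sequence of performed MIDI note numbers against score notes.
def dtw_align(score, perf, cost=lambda a, b: 0 if a == b else 1):
    n, m = len(score), len(perf)
    INF = float("inf")
    # d[i][j] = cheapest cost of aligning score[:i] with perf[:j]
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = cost(score[i - 1], perf[j - 1]) + min(
                d[i - 1][j],      # score note skipped by the performer
                d[i][j - 1],      # extra performed note
                d[i - 1][j - 1],  # match / substitution
            )
    # backtrack to recover the optimal correspondence
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min((d[i - 1][j - 1], (i - 1, j - 1)),
                   (d[i - 1][j], (i - 1, j)),
                   (d[i][j - 1], (i, j - 1)))
        i, j = step[1]
    return list(reversed(path))

# Example: the performer repeats the second note (62).
print(dtw_align([60, 62, 64, 65], [60, 62, 62, 64, 65]))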
2.4 Converting to a Database

The PML file is parsed to create a database. Two tables are created - one to hold the score information from the MusicXML section and one to hold the performance data from the PML section. The parsing is performed by a program written in the Python [17] scripting language. The tables store one score or performance note per row, with the columns given by each possible MusicXML/PML tag, with the exception of the pitch as described below (Section 2.4.1). The database enables the data to be queried easily in many varied ways. Although it is possible to query the data whilst it is still in XML form using XQuery, the comparative compactness of SQL syntax means it is favoured for the current application, where queries are frequently entered manually.

2.4.1 The Binomial Representation System

The data stored in the score table of the database should be available for musical analysis. This enables the matching of musical phrases, their transpositions and inversions, with their performance traits. Many performance analysis systems use MIDI data as the sole source for analysis [34] [10]. While MIDI data provides a useful means of acquiring a transcription of keyboard performance events, it does not allow the distinction between enharmonically equivalent pitches - a vital prerequisite for music analysis. (For a more thorough investigation of the problems of MIDI see [36].)

The MusicXML representation stores the pitch step along with an accidental tag and/or a pitch alter tag. For example, an F♯ would have a pitch step tag containing 'F', an accidental tag containing 'sharp' and an alter tag containing '1'. The alter tag signifies a raising of the pitch by one semitone and the accidental tag indicates the presence of the '♯' symbol on the stave. A second F♯ in the same bar would have the same pitch step and alter values but would not have an accidental tag, which conforms to conventional western music notation. However, the presence of the accidental tag was not found to be consistent across all implementations of MusicXML. The pitch step and pitch alter tags provide the means for describing enharmonically equivalent pitches. In addition, the current application requires that the pitch data are easily processed in a computer. The binomial representation described in [29] captures the same information in a form which is more suitable for computer processing than the textual representation used in MusicXML, and is used here.

In the binomial system, pitches are represented by integer pairs of the form ⟨pc, nc⟩ where pc is the pitch class and nc is the name class. The pitch class is a number from 0 to 11 corresponding to the pitch at which the note sounds, from C=0, C♯/D♭=1, ... to B=11. The name class is a number from 0 to 6 representing the staff position of the pitch, with C=0, D=1, E=2, ... B=6. Combining the pitch class and name class, the scale of C major is given by ⟨0,0⟩, ⟨2,1⟩, ⟨4,2⟩, ⟨5,3⟩, ⟨7,4⟩, ⟨9,5⟩, ⟨11,6⟩. Musical intervals can be calculated by subtracting one pair from another and transpositions by adding two pairs, with pitch classes and name classes cycling modulo 12 and modulo 7 respectively. The system allows melodic inversion and diatonic operations. A more thorough explanation is given in [29].
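As a concrete illustration of the arithmetic involved, the following sketch manipulates binomial pairs directly (the function names are invented for this example and are not taken from the system):

# Illustrative sketch of binomial pitch arithmetic (names invented here,
# not taken from the system). A pitch is a (pitch_class, name_class) pair.
def transpose(pitch, interval):
    """Transpose a binomial pitch by a binomial interval."""
    pc, nc = pitch
    ipc, inc = interval
    return ((pc + ipc) % 12, (nc + inc) % 7)

def interval(lower, higher):
    """Binomial interval from `lower` up to `higher`."""
    return ((higher[0] - lower[0]) % 12, (higher[1] - lower[1]) % 7)

C4  = (0, 0)   # C
E4  = (4, 2)   # E
Eb4 = (3, 2)   # E flat: same name class as E, different pitch class
Ds4 = (3, 1)   # D sharp: enharmonically equal to E flat, distinct pair

print(interval(C4, E4))       # (4, 2) - a major third
print(interval(C4, Eb4))      # (3, 2) - a minor third
print(interval(C4, Ds4))      # (3, 1) - an augmented second
print(transpose(C4, (7, 4)))  # (7, 4) - C transposed up a perfect fifth is G

The pairs (3, 2) and (3, 1) sound identical in MIDI but remain distinct here, which is precisely the property required for enharmonic analysis.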
2.4.2 The Database

The PostgreSQL [15] database is used in the current system. There are several benefits to using PostgreSQL compared with most other relational database systems. Although the workflow described here does not require a database so rich in features, PostgreSQL has many properties which would enable deployment of the current methodology in anticipated, increasingly sophisticated query contexts. Using stored functions and triggers, the previously described operations could be moved inside the database, allowing the matching, querying and presentation stages to be accessed through a single interface. The database runs in the most popular computing environments (Linux, BSD, Macintosh, UNIX and even Windows) and is accessible from many computer programming languages such as C [7], C++ [8], Java [6], Perl [14] and Python [16], making it no less interoperable than XML.

The extension of the current system into audio and video processing is facilitated by PostgreSQL's support for Binary Large OBjects (BLOBs), which allow a filesystem-like interface to binary objects whilst storing them in the database. It is not possible to store binary data inside an XML file without re-encoding it as text. Such schemes are not attractive due to the increased processing and storage requirements which they demand: base64 encoding, for example, increases the data size by one third. It is therefore not feasible to store large (multi-gigabyte) files such as uncompressed video inside XML, where the increase in disk usage and processing would harm the performance of the system.
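The one-third overhead quoted above is easy to verify; the short sketch below (illustrative only) base64-encodes a block of arbitrary bytes and reports the size ratio.

# Illustrative check of the base64 storage overhead mentioned above:
# every 3 input bytes become 4 output characters, a ~33% increase.
import base64
import os

raw = os.urandom(3 * 1024 * 1024)   # 3 MiB of arbitrary binary data
encoded = base64.b64encode(raw)
print(len(encoded) / len(raw))      # -> 1.333...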
2.5 Querying the Database

The database is queried by sending an SQL string from a Python script. The database can be queried for motifs and their transpositions and inversions. Each query is formed so that it returns the score note id alongside the value calculated from the performance data (examples are given in Section 3.1). This enables the returned data to be matched later to its corresponding note in the score. The results are returned as a list of tuples. Interesting performance traits can be queried with quite simple SQL queries.
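The following sketch shows the general shape of such a query from Python. The psycopg2 driver and the connection parameters are assumptions made for this example (the system itself references the pyPgSQL interface [16]); the table and column names follow the query shown in Figure 3.

# Illustrative sketch only: sending an SQL string from Python and reading
# the results as a list of tuples. The psycopg2 driver and the connection
# details are assumptions made for this example.
import psycopg2

QUERY = """
    SELECT score_notes.note_id,
           perf_notes.note_end - perf_notes.note_onset AS note_duration
    FROM score_notes, perf_notes
    WHERE score_notes.align = 'correct'
      AND score_notes.pnote = perf_notes.event_id;
"""

conn = psycopg2.connect(dbname="performance", user="analyst")
try:
    cur = conn.cursor()
    cur.execute(QUERY)
    rows = cur.fetchall()        # list of (note_id, note_duration) tuples
    for note_id, duration in rows:
        print(note_id, duration)
finally:
    conn.close()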
2.6 Presenting the Results

The result of the query is combined with the score in Scalable Vector Graphics (SVG) format. The SVG specification describes an XML syntax for the display of vector graphics. The Lilypond typesetting software produces the score output. Lilypond parses text files written in the Lilypond language and produces high-quality typeset scores. It can be considered a musical equivalent of the LaTeX language for text and is freely available under the GNU General Public Licence [5].

The current application requires the ability to trace a note from the input Lilypond file to the output SVG file so that the performance data can be aligned with it for display. The most obvious way to achieve this would be the addition of extra XML tags in the SVG output: SVG allows the inclusion of any properly namespaced XML tags alongside the graphical data, so it would be possible to include MusicXML and PML data alongside their graphical representations. Unfortunately Lilypond does not currently support the inclusion of such tags, so a work-around was used. Lilypond can already create links in its PostScript output. The links point to the location of the command in the source file which created the notehead; this allows a user to click on a notehead and be taken to the corresponding line in the source file to aid debugging. The Lilypond source code was modified to create these links in the SVG output using the SVG link tag. The cycle is completed by including the id of each note as a comment in the Lilypond source file. With this method it is possible to locate all of the links surrounding noteheads in the SVG file, locate the line in the Lilypond file which produced each one, and read the note id from the comment at the end of that line, thereby matching the graphical notehead with the note described in the database. This allows the performance data which has been matched with that score note to be added to the file, aligned with the notehead. A single-line drum staff is added above the normal staff to provide an origin for the performance data. It is a strength of XML that the SVG file could be loaded and edited using standard XML tools without the need for much SVG-specific code.
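A sketch of this post-processing step is given below. It is illustrative only: the structure of the link URL and the '% id=' comment convention are assumptions made for the example rather than the exact output of the modified Lilypond.

# Illustrative sketch only: matching noteheads in the SVG output back to
# note ids via Lilypond source-file links. The link URL format and the
# "% id=..." comment convention are assumptions made for this example.
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

def notehead_ids(svg_path, lilypond_path):
    """Map each linked notehead in the SVG to the note id commented
    at the end of the Lilypond source line that produced it."""
    source_lines = open(lilypond_path).read().splitlines()
    tree = ET.parse(svg_path)
    mapping = {}
    for link in tree.iter("{%s}a" % SVG_NS):
        href = link.get("{%s}href" % XLINK_NS, "")
        match = re.search(r":(\d+):", href)          # line number in the URL
        if not match:
            continue
        line = source_lines[int(match.group(1)) - 1]
        comment = re.search(r"%\s*id=(\S+)", line)   # e.g. "... % id=note42"
        if comment:
            mapping[link] = comment.group(1)
    return mapping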
3 AN EXAMPLE

The workflow was enacted in the analysis of a performance of Bach's Invention number 1 (BWV 772). The piece was chosen for its simplicity and evident musical structure [31]. The performance was played by a university student and recorded using MIDI, audio and video in the manner described in Section 2.1. The tools described above were used to analyse the performance and produce the data shown in Figure 4.
3.1 Results

Figure 1 shows a simple SQL query which searches for repetitions of the first 3 notes (C, D, E) in the right hand.

select * from score_notes as note1
  where note1.pitch_class=0 and note1.name_class=0
  and exists (select * from score_notes as note2
    where note2.pitch_class=2 and note2.name_class=1
    and note2.note_id=note1.note_id+1
    and exists (select * from score_notes as note3
      where note3.pitch_class=4 and note3.name_class=2
      and note3.note_id=note2.note_id+1));
Figure 1. An SQL query to find the sequence 'C D E'

The query returns the first note row of each matching sequence. This type of query can be combined with an overlay of the performance data to compare how a motif is played in its original and transposed forms. Figure 2 shows the first 9 notes of the right hand and its repetition raised by a perfect fifth. A simple query to display performed note durations above each score note is shown in Figure 3, and its result is displayed in Figure 4. Interesting performance traits are evident. A very clear change in tempo can be seen in the second half of bar 3, where the entry and exit from the phrase are played more slowly than the middle. A slight accelerando and ritardando can be observed over the phrase and over its (transposed) repeat in the next bar. Interesting phrasing can be seen in bar 12 in the right hand, where emphasis is given to the first and third notes of each group of semiquavers by slightly elongating them. Certain anomalies in the performance data can be seen where the matching algorithm has matched a very short performed note to a score note. This occurs where a mordent is marked in the original score, such as at the penultimate note in the left hand of bar 2 and the third note in the left hand of bar 5. The note matcher matches the score note to the first note of the mordent and ignores the rest. Mordent markings are not currently maintained through the analysis process; future work intends to address this.

Figure 2. The first 9 notes of Bach's Invention 1 with 2 examples of the same phrase, transposed, overlaid with performed note durations

select note_id, perf_notes.note_end - perf_notes.note_onset as note_duration
  from score_notes, perf_notes
  where score_notes.align='correct'
  and score_notes.pnote=perf_notes.event_id;

Figure 3. An SQL query to find correctly matched score notes and their durations

Figure 4. The first 12 bars of Bach's Invention 1 overlaid with performed note durations

4 CONCLUSION
The workflow discussed above provides a means for the presentation of computer analysis of musical performance. The examples show successful, if simple, analyses of a performance with the data displayed above the corresponding notes in the score.

There are many areas where the current workflow could be improved. The data acquisition and note matching processes could be moved within the database. This would ease the creation of a simple interface to the system which would enable its use outside of the engineering laboratory. The inclusion of metadata through the Lilypond typesetting process would ease matching the performed notes with their graphical score representations and allow for the inclusion of a far richer set of data in the output, which would facilitate its use in interactive applications.

5 REFERENCES

[1] Ardour - the new digital audio workstation. http://www.ardour.org/.
[2] Coriander - GUI for FireWire camera control and capture. http://damien.douxchamps.net/ieee1394/coriander/index.php.
[3] Extensible Markup Language (XML). http://www.w3.org/XML/.
[4] Finale - music notation software. http://www.finalemusic.com/.
[5] The GNU General Public Licence. http://www.gnu.org/licenses/gpl.html.
[6] jdbc3-postgresql - JDBC driver for PostgreSQL. http://jdbc.postgresql.org/.
[7] libpq - PostgreSQL libraries. http://www.postgresql.org/.
[8] libpqxx - C++ client API for PostgreSQL. http://thaiopensource.org/development/libpqxx/.
[9] Lilypond typesetter software. http://lilypond.org.
[10] Music and Audio Retrieval Tools (MAART). http://maart.sourceforge.net/.
[11] NoteEdit - a score editor. http://noteedit.berlios.de/.
[12] Performance Markup Language. http://www.n-ism.org/Projects/pml.php.
[13] Perl XML. http://perl-xml.sourceforge.net/.
[14] pgperl - native Perl interface to PostgreSQL. http://gborg.postgresql.org/project/pgperl/projdisplay.php.
[15] PostgreSQL - open source relational database system. http://www.postgresql.org/.
[16] pyPgSQL - Python interface to PostgreSQL. http://pypgsql.sourceforge.net/.
[17] The Python programming language. http://www.python.org/.
[18] Python/XML libraries. http://pyxml.sourceforge.net/.
[19] Rosegarden: music software for Linux. http://www.rosegardenmusic.com/.
[20] Scalable Vector Graphics (SVG) - XML graphics for the Web. http://www.w3.org/Graphics/SVG/.
[21] Sibelius - software for writing, playing, printing and publishing music notation. http://www.sibelius.com/.
[22] Simple API for XML. http://www.saxproject.org/about.html.
[23] W3C Document Object Model. http://www.w3.org/DOM/.
[24] W3C XML Query (XQuery). http://www.w3.org/XML/Query/.
[25] Xerces C++ parser. http://xml.apache.org/xerces-c/.
[26] XML Path Language (XPath). http://www.w3.org/TR/xpath.
[27] XSL Transformations (XSLT). http://www.w3.org/TR/xslt.
[28] Jered Bolton. Gestural Extraction from Musical Audio Signals. PhD thesis, Department of Electronics and Electrical Engineering, University of Glasgow, 2004.
[29] Alexander R. Brinkman. A binomial representation of pitch for computer processing of musical data. Music Theory Spectrum, 8:44–57, Spring 1986.
[30] Simon Dixon. An on-line time warping algorithm for tracking musical performances. In International Joint Conference on Artificial Intelligence, 2005.
[31] Laurence Dreyfus. Bach and the Patterns of Invention. Harvard University Press, 1996.
[32] Michael Good. The Virtual Score, volume 12 of Computing in Musicology, chapter 8, pages 113–124. The MIT Press, 2001.
[33] H. Honing. Issues in the representation of time and structure in music. Contemporary Music Review, 1993.
[34] Henkjan Honing. POCO: an environment for analysing, modifying, and generating expression in music. In Proceedings of the 1990 International Computer Music Conference, 1990.
[35] E. W. Large. Dynamic programming for the analysis of serial behaviors. Behavior Research Methods, Instruments, & Computers, 25(2):238–241, 1993.
[36] F. Richard Moore. The dysfunctions of MIDI. Computer Music Journal, 12(1):19–28, Spring 1988.
[37] Will Mowat. Bob Moog Piano Bar. Sound On Sound, March 2005.