Reusable Metadata and Software Components for Automatic Audio Analysis

György Fazekas, Chris Cannam and Mark Sandler
Centre for Digital Music, Queen Mary University of London
Mile End Road, London, E1 4NS, United Kingdom
{gyorgy.fazekas, chris.cannam, mark.sandler}@elec.qmul.ac.uk

ABSTRACT

Content-based metadata is becoming increasingly important for managing audio collections in digital library applications. While Music Information Retrieval (MIR) research provides means for extracting metadata from audio recordings, no common practice has emerged for representing analysis results or exchanging algorithms. This paper argues for the need for modularity, through interoperable components and data publishing methods, in MIR applications. We demonstrate the use of a common API for audio analysis, enhanced with easily extended Semantic Web ontologies for describing results and configuration. Built on the extensible ontological framework provided by the Music Ontology [1], our system allows for the representation of diverse information such as musical facts, features or analysis parameters in a uniform, reusable and machine-interpretable format. Our demonstration uses SAWA, a Web application available for researchers interested in these technologies.
1. INTRODUCTION
Primary reasons for collecting musical metadata include the effective organisation of musical assets such as elements of personal collections, historical recordings, or sounds in a sample library. These examples represent cases of content-based information management and retrieval. Another category can be designated as metadata extraction for creative applications, which is often ignored by existing metadata standards. Several organisations have defined standards modelling various aspects of the audio and multimedia domain. They provide schemas, definitions of those schemas in a particular syntax, and various ways of encoding information. The result is a plethora of largely incompatible methods and standards. Since each typically addresses a narrow area of the audio domain with a specific set of requirements, interoperability between applications is difficult, even where sharing otherwise overlapping information would be useful. The problem lies in the use of non-normative development and publishing techniques rather than in flaws in schema design. We argue that common approaches to overcoming these issues, such as standardising on XML syntax, do not provide sufficient ground for modularity and interoperability. The ad-hoc definition of terms is one common problem; the inability to establish meta-level relationships (such as equivalence or a hierarchical model) is another. These often prevent interoperability or the reuse of data expressed in these formats. Our answer to this problem is Vamp [2], a common API for audio analysis, enhanced with easily extended Semantic Web ontologies for describing both analysis results and configuration. These technologies are demonstrated using SAWA, a Web-based system for audio analysis. Our motivation in developing this system is twofold. On one hand, we provide an easily accessible exposition of Music Information Retrieval (MIR) algorithms implemented as Vamp plugins. On the other hand, we aim to show how Semantic Web technologies can facilitate interoperability between service components within an MIR system while also providing access to external systems. These technologies can be reused and extended in the future towards a wider distributed MIR research environment.

1 SAWA stands for "Sonic Annotator Web Application".
2 http://www.isophonics.net/sawa/

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. JCDL '09, June 15–19, 2009, Austin, Texas, USA. Copyright 2009 ACM ...$10.00.
2. THE SEMANTIC WEB MODEL
The Uniform Resource Identifier (URI) is a key enabling concept in the success of the World Wide Web. It solves the problem of identifying and linking resources (documents, data, or services). Together with the access mechanism of HTTP, it enables the formation of a large interlinked network of documents: the Web as we know and use it today. However, this infrastructure is not yet used as effectively as it could be. The flow of data and access to networked services are cluttered by incompatible formats and interfaces. The vision of the Semantic Web [3] is to resolve this issue by creating a "Giant Global Graph" of machine-interpretable data. Since information on the Web can stand for just about anything, Semantic Web developers are faced with a major challenge: how to represent and communicate diverse information so that it can be understood by machines? The answer lies in standardising how information is published, rather than trying to arrange all human knowledge into rigid data structures.

3 XML: the eXtensible Markup Language.
4 The Hypertext Transfer Protocol (HTTP) provides the basic methods for obtaining resources identified by HTTP URIs.

We believe that representing musical information presents similar challenges. We cannot presume to plan for all possible uses of a system. Therefore, we need data structures that are interoperable with other systems and extensible even by users. This poses problems similar to those faced in the development of the Semantic Web, hence our interest in these techniques.
3. ONTOLOGIES
The Semantic Web, an interlinked web of heterogeneous information, permits rich web services using data from different sources. It can also bring intelligence to the Web through automated reasoning. However, this requires common ground on how information is published, rather like designing a schema for a database. Yet, because of the diversity of information represented on the Web, no individual or organisation could possibly create a single structured schema for all of it. Still, the Semantic Web's answer to this complex and apparently unsolvable problem is surprisingly simple: the Resource Description Framework (RDF) [5].
3.1 RDF and the need for Ontologies
RDF is a conceptual data model providing the flexibility required for publishing diverse semi-structured data – that is, just about anything on the Semantic Web. It is based on the idea of expressing statements in the form subject – predicate – object. The elements of these statements are literals or resources named by URIs. This gives the model an unambiguous way of referring to things and, through HTTP dereferencing, access to any further data a resource may hold. However, in order to be precise in RDF statements and avoid ambiguities, we need to be able to define and later refer to concepts such as a Song, a Composer, a Plugin for audio processing, or the FFT size parameter of an algorithm using the Fast Fourier Transform. We also have to specify relationships pertinent to our application. Ontologies are the tools for establishing these necessary elements of a domain model. Semantic Web ontologies are created using the same conceptual model that is used for communicating the data. However, additional vocabularies are needed for expressing formal ontologies. A hierarchy of languages is recommended by the W3C, including RDF Schema for defining classes and properties of RDF resources, and OWL for making RDF semantics more explicit.
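The triple model is easy to picture in code. The sketch below stores statements as plain Python tuples and answers simple pattern queries; the resource URIs are invented examples, and no RDF library is involved:

```python
# Minimal sketch of the RDF data model: each statement is a
# (subject, predicate, object) tuple, with resources named by URIs
# (written here as strings; the example.org URIs are invented).
triples = [
    ("http://example.org/track/1", "rdf:type", "mo:Track"),
    ("http://example.org/track/1", "dc:title", "Example Song"),
    ("http://example.org/track/1", "foaf:maker", "http://example.org/artist/7"),
    ("http://example.org/artist/7", "foaf:name", "Example Artist"),
]

def match(triples, s=None, p=None, o=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# Who made track 1? Pattern queries like this are the basis of
# everything built on top of RDF, including SPARQL.
makers = match(triples, s="http://example.org/track/1", p="foaf:maker")
```

Because the model is uniform, new kinds of statement (a plugin parameter, a key change) need no schema migration: they are simply more triples.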
3.2 The Music Ontology
Recent efforts [4] toward integrating music-related data and web services have led to the creation of the Music Ontology, a standard base ontology for describing musical information. We can readily use the Music Ontology to express a wide range of concepts, including high-level editorial data about songs or artists, production data about musical recordings and, finally, detailed structural information about music using events and timelines. The Music Ontology does not cover everything we can say about music. Rather, it provides ways of plugging new terms in under existing concepts, which can then be used to describe a sub-domain. In the context of our current discussion, the most important related ontologies are the Event and Timeline ontologies (section 3.3), and the Audio Features Ontology (section 3.4), which can be used to express content-based audio descriptors.

5 The World Wide Web Consortium (http://www.w3.org/)
6 For example, using OWL-DL we can impose restrictions on the range and domain types of properties, or constraints on cardinality, the number of individuals linked by a property.
3.3 Event and Timeline Ontologies
The Event and Timeline ontologies provide us with vocabularies for relating audio features to a recording. The Event Ontology [6] is a generic framework for classifying patterns or regions in space or time. The Timeline Ontology [7] can describe relative timelines: for example, a regularly sampled timeline such as that defined by the sampling rate of an audio recording. It also allows expressing start times and durations with reference to a particular timeline. This provides an unambiguous but contextually sensitive description of the time extents of an event.
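For example, an event located on a regularly sampled (frame-based) timeline is positioned by a frame index; mapping it onto the underlying signal's timeline requires only the step size and sample rate. A minimal illustrative sketch (plain arithmetic, not ontology code):

```python
def frame_to_seconds(frame_index, step_size, sample_rate):
    """Map a position on a regularly sampled (frame-based) timeline
    to a time point, in seconds, on the underlying signal timeline."""
    return frame_index * step_size / sample_rate

# A beat detected at analysis frame 430, with a step of 512 samples
# at 44100 Hz, falls about 4.99 seconds into the recording.
t = frame_to_seconds(430, 512, 44100)
```

The ontologies make this relationship explicit in the data itself, so a consumer never has to guess which timeline a timestamp refers to.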
3.4 Audio Features Ontology
The Audio Features Ontology (AF) [8] can be used to express both acoustical and musicological features. It allows publishing content-derived data about audio recordings. AF provides concepts such as Segment, Beat, and KeyChange on top of the Event Ontology. Where we need to publish features that have no predefined term in this ontology, we can synthesise a new class within our RDF document as a subclass of an appropriate class in the Event Ontology. This ensures that our features can be interpreted correctly as time-based events, even where further semantic associations are unavailable.
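The subclassing step amounts to publishing a couple of extra statements alongside the feature data. A minimal sketch using triples as Python tuples; the Tremolo class and example.org URIs are invented for illustration, while the Event class URI is the Event Ontology's:

```python
# Declaring a new feature type, unknown to the AF ontology, as a
# subclass of the Event Ontology's Event class, then typing an
# instance with it. The Tremolo term and example URIs are invented.
EVENT = "http://purl.org/NET/c4dm/event.owl#Event"

def declare_feature_class(new_class_uri):
    """Triples making a new feature class interpretable as a kind of Event."""
    return [(new_class_uri, "rdf:type", "owl:Class"),
            (new_class_uri, "rdfs:subClassOf", EVENT)]

stmts = declare_feature_class("http://example.org/af-ext#Tremolo")
# A concrete feature instance typed with the new class:
stmts.append(("http://example.org/feature/42", "rdf:type",
              "http://example.org/af-ext#Tremolo"))
```

A consumer that knows nothing about Tremolo can still follow the rdfs:subClassOf link and treat the feature as a generic time-based event.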
4. SOFTWARE COMPONENTS
4.1 Vamp plugins
Vamp is a plugin system designed for audio feature extraction. It consists of a binary plugin interface with C linkage, and a C++ SDK for plugin and host development [2]. Vamp plugins accept audio data as input and produce structured data as output. The features returned by a Vamp plugin are rich enough to contain the data needed to represent musically meaningful features such as beat or key-change information. However, they are not rich enough to describe what those features represent: they do not provide enough information for a host program to place them in context among other musical features. While it may be evident to a human that a plugin named "key change detector" detects key changes, a program cannot know that the outputs of this plugin will be comparable to other sources of key information. In order to establish this relationship and ensure that the returned features are correctly understood, the plugin's output needs to be associated with relevant concepts in the Audio Features Ontology. The terms used to make this connection are found in the Vamp Plugin Ontology.
4.1.1 Vamp Plugin and Transform Ontologies
The Vamp Plugin Ontology [9] is used to express information about feature extraction plugins. Its most useful aspect is the association of a plugin's outputs with terms in the AF Ontology, to express what each output describes. These may be distinct event types like note onsets, features describing aspects of the whole recording such as an audio fingerprint, or dense signal data such as a chromagram.
Figure 1: The Vamp Transform Ontology.

The Vamp Transform Ontology is published as part of the Vamp Plugin Ontology, though it is conceptually separate. It contains terms to describe how a plugin may be configured and run. Any host can use this information to drive its processing, identifying parameter values and other details such as audio block and step sizes. This information is expressed and stored in the same RDF format as the results, without imposing any additional encoding requirements. Any agent reading the results can therefore be certain about how they were generated, a very valuable detail in the context of MIR research.
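For illustration, a transform description in this ontology looks roughly like the Turtle fragment below. The property names are reproduced from memory of published transform files and the plugin URI is an example; the ontology documentation has the authoritative terms.

```turtle
@prefix vamp: <http://purl.org/ontology/vamp/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <#> .

# Run the example percussion onset detector with explicit framing
# parameters and one plugin parameter overridden.
:transform a vamp:Transform ;
    vamp:plugin <http://vamp-plugins.org/rdf/plugins/vamp-example-plugins#percussiononsets> ;
    vamp:step_size "512"^^xsd:int ;
    vamp:block_size "1024"^^xsd:int ;
    vamp:parameter_binding [
        vamp:parameter [ vamp:identifier "sensitivity" ] ;
        vamp:value "40"^^xsd:float
    ] .
```

Because the same fragment can be stored with the results, anyone receiving the output can reproduce the exact computation.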
4.2 Vampy
Vampy is a Python wrapper for Vamp plugins. The Vampy wrapper is an ordinary Vamp plugin which may be installed in the usual manner. When it is installed, any appropriately structured Python scripts found in its script directory will be presented as if they were individual Vamp plugins for any Vamp host to use. Vampy permits Vamp plugins to be rapidly and dynamically developed using Python libraries for numerical and scientific computation such as NumPy and SciPy.
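As a rough illustration of the idea, the skeleton below is shaped like a Vampy plugin script: a plain Python class whose method names mirror the Vamp C++ API. The method names, the feature-dictionary shape and the toy RMS computation are all illustrative sketches and may differ in detail from any particular Vampy release:

```python
import math

# Sketch of a Vampy-style plugin: a Python class with methods
# mirroring the Vamp API. Names and return conventions here are
# illustrative, not guaranteed to match real Vampy scripts.
class RmsEnergy:
    def __init__(self, inputSampleRate=44100.0):
        self.rate = inputSampleRate

    def getIdentifier(self):
        return "rms-energy-sketch"

    def getName(self):
        return "RMS Energy (sketch)"

    def initialise(self, channels, stepSize, blockSize):
        self.step = stepSize
        return True

    def process(self, inputbuffers, timestamp):
        # Return one feature per block: the RMS of channel 0.
        samples = inputbuffers[0]
        rms = math.sqrt(sum(x * x for x in samples) / len(samples))
        return [{"values": [rms]}]

plugin = RmsEnergy()
plugin.initialise(1, 512, 512)
features = plugin.process([[0.5] * 512], 0)  # constant signal: RMS is 0.5
```

In a real deployment, the Vampy wrapper loads such a script and presents it to any Vamp host exactly as if it were a compiled plugin.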
4.3 Sonic Visualiser
Sonic Visualiser [10] is an audio analysis application and Vamp plugin host. It is also capable of loading and viewing features calculated by other programs. In particular, it can load multiple sets of features associated with a single audio file from an RDF document generated by SAWA or Sonic Annotator. It therefore serves as a possible “offline” visualisation interface for the SAWA system.
4.4 Sonic Annotator
Sonic Annotator [11] (see Figure 2) is a “batch” audio analysis application which applies Vamp feature extraction plugins to audio data. It is built using libraries originally written as part of Sonic Visualiser. The two applications therefore share capabilities such as broad audio format support, network retrieval of audio files, and support for feature extraction specifications in RDF using the Vamp Transform Ontology (see 4.1.1).
4.5 Mopy
Mopy [12] is a Python interface for the Music Ontology and related ontologies. It generates Python classes associated with ontology terms, which simplifies interaction with RDF data. SAWA uses an adapted version of Mopy to generate Vamp plugin configuration pages.
Figure 2: Sonic Annotator and other technologies in relation to the Semantic Web.
5. THE SAWA SERVICE
SAWA was developed with the intention of providing an easy-to-use human interface, in contrast to systems that expose only a Web Services API. It works with a small collection of audio files uploaded by the user. It allows for the selection and configuration of one or more Vamp outputs, and for executing transforms on previously uploaded files. Results are returned as RDF data. This can be examined using an RDF browser such as Tabulator, imported into Sonic Visualiser and viewed in the context of the audio, or published on the Semantic Web. SAWA's plugin selection interface can be seen in Figure 3. Optionally, the system attempts to identify the audio using a MusicDNS fingerprint and its associated identifier. This identifier is matched against the MusicBrainz database to obtain editorial metadata, which can be added to the results. For copyright reasons, SAWA deletes all uploaded files when the user session expires.
5.1 System architecture
The SAWA system builds on several software components and ontologies addressing the issues described previously. Sonic Annotator, a Vamp host and batch feature extractor, together with Vamp plugins, provides the signal-processing back-end of our system. The corresponding ontologies (the Vamp Plugin, Vamp Transform and Audio Features ontologies) are used for representing and publishing analysis results and configuration data. The Music Ontology allows linking the data to other services on the Semantic Web. Together with a third-party library, CherryPy, Mopy helps in generating a dynamic web interface for any number of plugins installed on the system. Use of the Linked Data concept is key in creating the modular architecture depicted in Figure 4. The server-side application uses RDF data to communicate with other components such as Sonic Annotator, and interprets RDF data to generate its user interface dynamically. The advantages of using this format in the context of our system are manifold:

7 A transform can be seen as an algorithm, available as a Vamp output, associated with a set of parameters.
8 http://www.musicdns.com/
9 http://www.musicbrainz.org/
10 http://www.cherrypy.org/
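One advantage of driving everything with RDF is that a transform description fully determines how results are computed, so a digest of the audio content combined with the transform description can serve as a lookup key for cached results in an RDF store. The following is an illustrative sketch of that idea, not SAWA's actual implementation:

```python
import hashlib

def cache_key(audio_bytes, transform_rdf):
    """Derive a cache key: the same audio analysed with the same
    transform description always maps to the same key."""
    h = hashlib.sha256()
    h.update(hashlib.sha256(audio_bytes).digest())
    # Normalise superficially so whitespace differences do not defeat
    # the cache (a real system would canonicalise the RDF properly).
    h.update(" ".join(transform_rdf.split()).encode("utf-8"))
    return h.hexdigest()

k1 = cache_key(b"\x00\x01", "vamp:step_size 512 .")
k2 = cache_key(b"\x00\x01", "vamp:step_size  512 .")  # same key as k1
```

A cache hit then means the stored RDF results can be returned directly, skipping the feature extraction step entirely.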
• Generic interface design: The interface can be generated for any number and type of plugins and their output configurations.
• Generic implementation: The system may rely on any suitable Vamp plugin host or computation engine.
• Efficiency: Cached results may be returned from a suitable RDF store instead of repeating the computation.
• Extensibility: The system can access other linked data services to augment the results with different types of metadata.

Figure 3: Vamp plugin selection and configuration in SAWA.

Figure 4: RDF data flow in SAWA.

6. CONCLUSIONS
We presented a set of interoperable tools for the analysis and semantic annotation of audio. An easy-to-use web interface, SAWA, was built to demonstrate how these tools work together, and how Semantic Web technologies can be used for communication between MIR system components. Future work includes matching transform queries against cached results in SAWA and providing web-based visualisations; finally, we plan to implement an online workbench for Vamp plugin development using Vampy.

7. ACKNOWLEDGEMENTS
The authors acknowledge the support of the School of Electronic Engineering and Computer Science, Queen Mary, University of London; the EPSRC-funded ICT project OMRAS2 (EP/E017614/1); and the NEMA project funded by the Andrew W. Mellon Foundation.

8. REFERENCES
[1] Y. Raimond, S. Abdallah, and M. Sandler. "The Music Ontology". Proceedings of the 8th International Conference on Music Information Retrieval, Vienna, Austria, 2007.
[2] C. Cannam. "The Vamp Audio Analysis Plugin API: A Programmer's Guide". http://vamp-plugins.org/guide.pdf
[3] T. Berners-Lee, J. Hendler, and O. Lassila. "The Semantic Web". Scientific American, pp. 34–43, May 2001.
[4] Y. Raimond and M. Sandler. "A Web of Musical Information". Proceedings of the 9th International Conference on Music Information Retrieval, Philadelphia, USA, September 14–18, 2008.
[5] O. Lassila and R. Swick. "Resource Description Framework (RDF) Model and Syntax Specification", 1998. http://citeseer.ist.psu.edu/article/lassila98resource.html
[6] Y. Raimond and S. Abdallah. "The Event Ontology". http://motools.sourceforge.net/event/event.html
[7] Y. Raimond and S. Abdallah. "The Timeline Ontology". http://motools.sourceforge.net/timeline/timeline.html
[8] Y. Raimond. "Audio Features Ontology Specification". http://motools.sourceforge.net/doc/audio_features.html
[9] C. Cannam. "The Vamp Plugin Ontology". http://omras2.org/VampOntology
[10] C. Cannam. "Sonic Visualiser". http://sonicvisualiser.org/
[11] C. Cannam. "Sonic Annotator". http://omras2.org/SonicAnnotator
[12] C. Sutton and Y. Raimond. "Mopy: Music Ontology Python Interface". http://sourceforge.net/projects/motools/mopy