Music in Time-Based Hypermedia

Jacco van Ossenbruggen, Anton Eliëns
Vrije Universiteit, Dept. of Mathematics and Computer Science
De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
email: [email protected], [email protected]

ABSTRACT The paper describes the extension of a hypermedia class library with music as a new component type, but focuses on the development of a software wrapper object that serves as an application programmer's interface to the Csound software sound synthesis program. This wrapper provides the flexible, interactive and object oriented interface needed by a hypermedia system. Additionally, some consequences of the fundamental difference between static and time-based media are discussed.

KEYWORDS: time-based hypermedia, object oriented programming, software sound synthesis

1 BACKGROUND The project DejaVu [2] at the Vrije Universiteit aims at providing a framework for the development of hypermedia systems which are: open, so users can easily extend a system to make it fit their own needs; heterogeneous, which means that systems consist of multiple, loosely coupled components; distributed, so systems can run on a network of computers; intelligent, by which we mean that logic-based retrieval and navigation mechanisms should be provided; object oriented, which indicates that the software is developed according to the object oriented programming paradigm. As part of this project, the students Martin Kalkman and Edwin G.K. Rijvordt developed a C++ [10] class library for constructing hypermedia user interfaces, as an extension to the InterViews [6] library. They have called the toolkit HyperViews [9] to reflect their intention to provide for hypermedia what the InterViews library provides for graphical user interfaces (GUIs). This library currently supports bitmaps, ordinary text and Unix manual pages as component types. It provides user-friendly mechanisms to define links between components of these types. Links to Unix commands or μ-law¹ audio files are also possible. HyperViews is an open system in that similar support can easily be defined for other component types. Like the open hypermedia model of Microcosm [1], HyperViews does not impose any form of mark-up upon the data: link information is stored external to the file containing the component's contents. As a consequence, components may be created with traditional editors, and (private) links may be created by all readers, even if they do not have permission to modify the corresponding files. But unlike the Microcosm system, HyperViews focuses on the application programmer's interface offered by its class library. The objects representing the various component types and those representing their graphical instantiations are both implemented by classes which all inherit their common functionality from the same abstract component or instantiation base class².

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copyright is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1994 ACM 0-89791-xxx-x/xx/xxxx...

2 TIME-BASED VS STATIC MEDIA Media like audio, video, music and choreography are called time-based or dynamic media. They are called time-based because they have basic, time-related attributes like a duration or a starting time. Static media, like text or still graphics, lack such attributes. Static media could be defined as dynamic media which consist of just one sample frame. Dynamic media, however, can be regarded as a sequence of sample frames. Digital audio data, for example, consist of a number of samples that represent a quantized approximation of an analog audio signal, video normally comes at a rate of fifty frames per second, music can be defined as a sequence of musical events (like notes and chords), and even for choreography there exist (notational) methods to decompose a dance into a number of body movements. The graphical representation of music in standard music notation can be considered a static medium. Needless to say, music itself is a time-based medium.

¹ μ-law encoding corresponds to CCITT G.711, and is the standard for voice data used by telephone companies in the United States, Canada and Japan; μ-law data are sampled at a rate of 8000 samples per second with 12-bit precision, compressed to 8-bit samples.

² The terminology proposed by the Dexter Reference Model will be used as much as possible throughout this article. The Dexter term component refers to the data-containing 'nodes' of the hypermedia database. In contrast, the word instantiation will refer to the presentation of a component to the user.

3 HYPERMUSIC We have extended the HyperViews system with music as a new component type. Instantiations of this component type are able to represent a musical score graphically on a screen, and to generate an audio signal so that the musical component can actually be played. Hyperlinks can be made from and to all other implemented component types. For the graphical representation various classes from the InterViews library were used, and for the ability to define hyperlinks, hardly any new code had to be written: inheritance did its job the way it should. To play a music component, it is written to a file in the Scot file format. Such a file can be read by an external program called Csound [11], which is invoked by HyperViews to generate the events necessary to synthesize the audio signal. The actual sound of the notes defined in the music component file is determined by a second, so-called orchestra file, which is really a computer program that defines the instruments used to play the notes. Csound runs in real time on a SPARCstation, producing 8000 bytes of μ-law data per second. The format used to store music components is also based on the Scot file format. It is extended to provide some (graphical) presentational information: the presenting class should know where to break staves, whether it has to use bass or treble clefs, etc. Additionally, primitives to store the title of the score and the name of the author are included. Note that the audio presentation information resides in the orchestra file mentioned before. A standard Scot score file requires some technical definitions (like waveform function tables) that logically belong in the orchestra file. They are placed in the score file only to enable Csound to save some memory space: Csound was developed when memory was scarce. Music component files lack such definitions; instead, some default definitions are added to the file when it has to be played.

4 SOFTWARE WRAPPERS The DejaVu project tries to use existing software wherever possible. However, in many cases available software does not provide a convenient application programmer's interface (API). Software wrappers can provide such APIs by embedding the program involved in one or more (C++) objects in order to hide its complex details behind a clean class interface. Additionally, application programmers will be able to easily alter the interface if it does not exactly fit their needs.

4.1 Icsound The method of playing music components described above, for example, is not flexible enough to satisfy the needs of a real hypermedia system. To provide the desired flexibility, we have developed a software wrapper with an object oriented interface around Csound. This wrapper, called Interactive Csound (Icsound), allows musical events to be processed in the flexible manner required by a hypermedia system. The wrapper provides the necessary functionality to play arbitrary fragments of musical scripts that are generated in real time. Additionally, this interface gives access to information about the way playing proceeds: how many notes have been played, which notes are being played at this very moment, how long it will take to play the rest of the notes, etc. For the implementation of this wrapper it is, in principle, not necessary to modify the Csound program. Instead, the interface runs Csound in a special mode in which it continuously reads its input for incoming events and continuously fills the audio buffer. A C++ object executes Csound in another process and provides streams to write events to the Csound process and to read its output. An arbitrary fragment of a Csound script can be played by writing it to the input stream. The wrapper object analyzes the output messages produced, in order to provide the real-time information described above. Programmers can install their own handler object to use this information in application programs. Higher level classes, derived from those described above, provide primitives to (re)play fragments starting at an arbitrary moment in time and to perform other useful operations upon these fragments.
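The process-and-streams arrangement described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the Icsound source: the names ProcessWrapper, OutputHandler and LineCollector are hypothetical, the POSIX calls fork, pipe, dup2 and execvp stand in for whatever mechanism Icsound actually uses, and the command to run is supplied by the caller rather than fixed to any particular Csound command line.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical handler interface: receives each output line of the child.
class OutputHandler {
public:
    virtual ~OutputHandler() = default;
    virtual void onLine(const std::string& line) = 0;
};

// Example handler that simply stores every line it is given.
class LineCollector : public OutputHandler {
public:
    std::vector<std::string> lines;
    void onLine(const std::string& line) override { lines.push_back(line); }
};

// Hypothetical wrapper: runs a program in a child process and connects
// to its standard input and standard output through two pipes.
class ProcessWrapper {
public:
    explicit ProcessWrapper(const std::vector<std::string>& command) {
        assert(!command.empty());
        int in[2], out[2];
        if (pipe(in) != 0 || pipe(out) != 0) throw std::runtime_error("pipe failed");
        pid_ = fork();
        if (pid_ < 0) throw std::runtime_error("fork failed");
        if (pid_ == 0) {  // child: wire the pipes to stdin/stdout, then exec
            dup2(in[0], STDIN_FILENO);
            dup2(out[1], STDOUT_FILENO);
            close(in[0]); close(in[1]); close(out[0]); close(out[1]);
            std::vector<char*> args;
            for (const std::string& a : command)
                args.push_back(const_cast<char*>(a.c_str()));
            args.push_back(nullptr);
            execvp(args[0], args.data());
            _exit(127);  // only reached if exec failed
        }
        close(in[0]);   // parent keeps the write end of the input pipe
        close(out[1]);  // and the read end of the output pipe
        toChild_ = in[1];
        fromChild_ = out[0];
    }
    ProcessWrapper(const ProcessWrapper&) = delete;
    ProcessWrapper& operator=(const ProcessWrapper&) = delete;

    ~ProcessWrapper() {
        closeInput();
        close(fromChild_);
        int status = 0;
        waitpid(pid_, &status, 0);
    }

    void setHandler(OutputHandler* handler) { handler_ = handler; }

    // Write one event (one line of script text) to the child's input.
    void send(const std::string& event) {
        const std::string line = event + "\n";
        const char* p = line.data();
        std::size_t left = line.size();
        while (left > 0) {
            ssize_t n = write(toChild_, p, left);
            if (n < 0) throw std::runtime_error("write failed");
            p += n;
            left -= static_cast<std::size_t>(n);
        }
    }

    // Signal end of input to the child.
    void closeInput() {
        if (toChild_ >= 0) { close(toChild_); toChild_ = -1; }
    }

    // Read the child's output until end of file and pass every complete
    // line to the installed handler (an unterminated last line is dropped).
    void drain() {
        std::string pending;
        char buf[256];
        ssize_t n;
        while ((n = read(fromChild_, buf, sizeof buf)) > 0) {
            pending.append(buf, static_cast<std::size_t>(n));
            std::size_t nl;
            while ((nl = pending.find('\n')) != std::string::npos) {
                if (handler_) handler_->onLine(pending.substr(0, nl));
                pending.erase(0, nl + 1);
            }
        }
    }

private:
    pid_t pid_ = -1;
    int toChild_ = -1;
    int fromChild_ = -1;
    OutputHandler* handler_ = nullptr;
};
```

With a stand-in program such as cat as the command, sending two score-like lines, closing the input and calling drain() hands the same two lines to the installed handler. A real-time wrapper would instead poll the output stream while events are still being written; the blocking drain() only keeps the sketch short.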

4.2 Hush The approach followed in developing the Icsound wrapper was inspired by the development of hush [3], a C++ application programmer's interface to the Tcl/Tk toolkit [7, 8]. Despite its elegant design (InterViews was designed using the object oriented programming paradigm from the very beginning), the InterViews library turned out to be rather complex to use. Another disadvantage of InterViews is that it supports only a limited number of the widgets commonly used in graphical user interfaces. In contrast, the Tcl/Tk toolkit offers a large number of widgets and a very flexible environment for rapidly developing graphical user interfaces by means of Tcl scripts. A C++ interface to Tcl/Tk has been developed in a style reminiscent of InterViews. This interface, called the hyper utility shell (hush), has a class structure which is considerably less complex than the InterViews class structure. Hush enables the C++ programmer to take full advantage of the rich functionality offered by Tcl/Tk in a natural way. Tcl scripts, for example, can be reused without modification by interpreting them in a C++ program. By installing handler objects, programmers are able to use C++ member functions in a hush script in a type-safe way, and to add, modify or delete widgets in their C++ program, even if these widgets were originally created by the script.

5 SUPPORTING TIME-BASED AUTHORING The most fundamental limitations of the current HyperViews system result from the fact that the hypermedia model used does not have any notion of the typical time-based characteristics of the music components. Several of the problems caused by this severe limitation will also arise when HyperViews is extended with other dynamic component types. These problems are discussed in this section. A system which cannot deal with attributes related to the dimensions length and width cannot support users in the design of the document layout either. The same holds for dynamic media: it may be possible to design a useful system that does not need any knowledge about the details of the (dynamic) components it deals with, but a system should be able to deal with the most fundamental attributes related to the dimension time. For instance, synchronizing various types of media proves to be very complicated and time consuming. To be useful, an authoring system should support the author in this process at a high level. It can only do so if it knows exactly how to deal with typical time-related attributes like the durations and relative starting times of its components. So, like the spatial dimensions, time needs to be integrated in the core of the hypermedia model itself. Currently, a length function is defined for all HyperViews components. It returns the length of the component in the X or Y dimension. This function can simply be extended to return the length of the component in the X, Y and time dimensions. In this way, the duration of an object is integrated in a natural way. Most of the attributes related to a spatial dimension can be defined to apply to temporal aspects as well.
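The extension of the length function to a third dimension might look like the sketch below. It is an illustration only and does not reproduce the HyperViews class interface: the names Dimension, Component, BitmapComponent and MusicComponent are hypothetical, the units (pixels for X and Y, seconds for time) are assumptions, and so is the choice to give a static component a duration of zero.

```cpp
#include <cassert>
#include <stdexcept>

// The two spatial dimensions plus time as a third, equal dimension.
enum class Dimension { X, Y, Time };

// Hypothetical abstract base class for all component types.
class Component {
public:
    virtual ~Component() = default;
    // Extent of the component along one dimension.
    // Assumed units: pixels for X and Y, seconds for Time.
    virtual double length(Dimension d) const = 0;
};

// A static component: it has a width and a height but no duration.
class BitmapComponent : public Component {
public:
    BitmapComponent(double width, double height)
        : width_(width), height_(height) {
        assert(width >= 0 && height >= 0);
    }
    double length(Dimension d) const override {
        switch (d) {
            case Dimension::X:    return width_;
            case Dimension::Y:    return height_;
            case Dimension::Time: return 0.0;  // assumption: static means zero duration
        }
        throw std::logic_error("unknown dimension");
    }
private:
    double width_;
    double height_;
};

// A time-based component: the score has a size on screen and a duration.
class MusicComponent : public Component {
public:
    MusicComponent(double width, double height, double seconds)
        : width_(width), height_(height), seconds_(seconds) {
        assert(width >= 0 && height >= 0 && seconds >= 0);
    }
    double length(Dimension d) const override {
        switch (d) {
            case Dimension::X:    return width_;
            case Dimension::Y:    return height_;
            case Dimension::Time: return seconds_;
        }
        throw std::logic_error("unknown dimension");
    }
private:
    double width_;
    double height_;
    double seconds_;
};
```

Code that lays out a document can then query every component through the same function, whether it asks for a width or for a duration.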

6 COMPOSITE DYNAMIC COMPONENTS The DejaVu framework thus offers application programs the ability to use scripts, interpreted by an embedded interpreter, both to define the GUI and to play arbitrary musical fragments. In some systems, scripting languages are also used to define how a compound object should be constructed from other objects, by explicitly programming the timing and location information. However, this approach ignores the inherent modularity in the structure of the hypermedia component. An approach which makes this structure explicit, and the actual timing and location information implicit, is offered by the Amsterdam Hypermedia Model [5], which describes an extension of the Dexter model. One of the major limitations of HyperViews is that it does not support structured or composite components at all. At this moment, HyperViews only supports the definition of referential links between two arbitrary components. Structured components could be implemented by the use of so-called structural links. These links are essential for the production of linearized versions of hyperdocuments, and allow for user-defined semantic checks. The ability to author composite components has proven to be very useful in 'ordinary' multimedia systems and will still be useful in a hypermedia environment, even for dynamic media like music.
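The idea of making structure explicit and timing implicit can be illustrated with a small sketch. It is only loosely inspired by the composition notions discussed above and is not the definition given by the Amsterdam Hypermedia Model: the classes Node, Atom, Composite, Sequence and Parallel are hypothetical, and durations are assumed to be in seconds. The author states only how components are nested; durations and starting times are derived from that structure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical node in a composition tree.
class Node {
public:
    virtual ~Node() = default;
    virtual double duration() const = 0;  // seconds (assumed unit)
};

// A leaf with a fixed duration, e.g. one musical fragment.
class Atom : public Node {
public:
    explicit Atom(double seconds) : seconds_(seconds) { assert(seconds >= 0); }
    double duration() const override { return seconds_; }
private:
    double seconds_;
};

// A composite only records its children; timing is derived, not stored.
class Composite : public Node {
public:
    void add(std::shared_ptr<Node> child) { children_.push_back(std::move(child)); }
    // Starting time of child i relative to the start of this composite.
    virtual double startOf(std::size_t i) const = 0;
protected:
    std::vector<std::shared_ptr<Node>> children_;
};

// Children play one after another.
class Sequence : public Composite {
public:
    double duration() const override {
        double total = 0.0;
        for (const auto& c : children_) total += c->duration();
        return total;
    }
    double startOf(std::size_t i) const override {
        assert(i < children_.size());
        double start = 0.0;
        for (std::size_t k = 0; k < i; ++k) start += children_[k]->duration();
        return start;
    }
};

// Children all start together; the longest one determines the duration.
class Parallel : public Composite {
public:
    double duration() const override {
        double longest = 0.0;
        for (const auto& c : children_) longest = std::max(longest, c->duration());
        return longest;
    }
    double startOf(std::size_t i) const override {
        assert(i < children_.size());
        return 0.0;
    }
};
```

Changing the duration of one fragment, or inserting a new one, then requires no edits to explicit timing values elsewhere: every derived starting time follows from the structure.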

7 FUTURE WORK Instantiations of music components have to be edited with a traditional text editor in a user-unfriendly format, because no graphical editor has been implemented yet. Only the most basic aspects of standard music notation are implemented. However, it should not be a serious problem to implement a graphical editor, and we expect the class which represents the score graphically to be easily extensible with more elaborate elements of music notation. Because the DejaVu project is moving its focus from the InterViews toolkit towards a hush-based environment, future work on the graphical user interface will likely be done in this context. To support the development of time-based applications, the hush library itself will have to include time-related features. On top of the current Csound class interface, new classes will be built to represent (and play) high-level musical concepts. At the moment, the Icsound library is being reimplemented using a client/server architecture. While this will hardly alter the programmer's interface, it will result in better performance because it will be possible to run the Csound (server) process and the application (client) process on different hosts. Additionally, this implementation will make it possible to simultaneously run different applications which all use the Icsound library. This is not possible at the moment, because the digital-to-analog converter is regarded as an exclusive device. In the client/server implementation, there will simply be many clients communicating with one server process, which alone has access to the audio device.

8 CONCLUSIONS Software wrappers provide a clean object oriented class interface to existing software. These interfaces can simply be modified by defining new classes using the original class as a base class. An open system like HyperViews can easily be extended with new types of static media. But music and other time-based media differ fundamentally from static media. Static media should be regarded as a special case of the more general time-based media, and not the other way round. In attempts to extend systems designed for static media with dynamic media, these media cannot simply be implemented as leaf-node classes: if the underlying structure does not represent time, the resulting system will not be able to take full advantage of the many possibilities time-based media could add [4]. In the future, the DejaVu framework will provide the facilities to integrate time in the core of its hypermedia model, enabling full support for the process of authoring dynamic media documents.

References

[1] H. Davis, W. Hall, I. Heath, G. Hill, R. Wilkins. "Towards an Integrated Information Environment with Open Hypermedia Systems". ECHT'92: Proceedings of the Fourth ACM Conference on Hypertext, Milan, Italy, November 1992.
[2] A. Eliëns. "DejaVu - A Distributed Hypermedia Application Framework". Project proposal, Vrije Universiteit, 1993. Obtainable via anonymous ftp at ftp.cs.vu.nl:eliens/DejaVu.ps
[3] A. Eliëns. "Hush - A C++ API for Tcl/Tk". The X Resource (submitted 1994). Obtainable via anonymous ftp at ftp.cs.vu.nl:eliens/hush-api.ps
[4] L. Hardman, D.C.A. Bulterman, G. van Rossum. "The Amsterdam Hypermedia Model: Extending Hypertext to Support REAL Multimedia". CWI Report CS-R9306, Amsterdam, January 1993.
[5] L. Hardman, D.C.A. Bulterman, G. van Rossum. "The Amsterdam Hypermedia Model: Adding Time and Context to the Dexter Model". Communications of the ACM, Vol 37, No 2, pp. 50-62, February 1994.
[6] Mark A. Linton, Paul R. Calder, John A. Interrante, Steven Tang, John M. Vlissides. "InterViews Reference Manual". Version 3.1-Beta, June 26, 1992.
[7] J.K. Ousterhout. "Tcl: An Embeddable Command Language". USENIX, 1990.
[8] J.K. Ousterhout. "An X11 Toolkit Based on the Tcl Language". USENIX, 1991.
[9] G.K. Rijvordt, M. Kalkman. "HyperViews - building a Hypermedia System using InterViews". Thesis, Vrije Universiteit Amsterdam, November 1992.
[10] Bjarne Stroustrup. "The C++ Programming Language, 2nd Edition". Addison-Wesley, 1991.
[11] Barry Vercoe. "Csound, A Manual for the Audio Processing System and Supporting Programs with Tutorials". Media Lab, M.I.T., 1993.