Music via Motion: Transdomain Mapping of Motion and Sound for Interactive Performances

KIA C. NG, MEMBER, IEEE

Invited Paper
This paper presents a framework called Music via Motion (MvM) designed for the transdomain mapping between physical movements of the performer(s) and multimedia events, translating activities from one creative domain to another—for example, from physical gesture to audio output. With a brief background of this domain and prototype designs, the paper describes a number of inter- and multidisciplinary collaborative works for interactive multimedia performances. These include a virtual musical instrument interface, exploring video-based tracking technology to provide an intuitive and nonintrusive musical interface, and sensor-based augmented instrument designs. The paper also describes a distributed multimedia-mapping server which allows multiplatform and multisensory integrations and presents a sample application which integrates a real-time face tracking system. Ongoing developments and plausible future explorations on stage augmentation with virtual and augmented realities as well as gesture analysis on the correlations of musical gesture and physical gesture are also discussed.

Keywords—Augmented reality, computer vision, gesture interface, multimedia systems, music, sensors, virtual reality.
I. INTRODUCTION

In general, most traditional musical instruments, e.g., orchestral instruments such as the violin, viola, timpani, and trombone, are designed around the human body and the physical nature of audio production, which dictate the timbre (and pitch/frequency range) of the particular instrument. The effectiveness of a musical instrument interface is largely determined by the controllability of, and interaction with, the instrument. Hence, body motion and gesture contribute, directly and indirectly, to various important factors of an artistic performance. However, due to the physical and analog generation of the audio output, the pitch range of an instrument also dictates the overall size of the instrument.
Manuscript received February 5, 2003; revised November 12, 2003. This work was supported in part by the Arts Council England.
The author is with the Interdisciplinary Centre for Scientific Research in Music (ICSRiM), School of Computing & School of Music, University of Leeds, Leeds LS2 9JT, U.K. (e-mail: [email protected]; [email protected]).
Digital Object Identifier 10.1109/JPROC.2004.825885
For example, a double bass can produce much lower pitches than a violin, and it is also much larger in physical size.

This paper presents a research framework, Music via Motion (MvM), for the transdomain mapping between physical movements of the performer(s) and multimedia events [1], [2], detecting and tracking physical movements and translating the detected activities into another creative domain, such as audio and graphics, in accordance with a set of mapping rules and algorithms. The framework intends to retain the controllability, interface qualities, and expressivity of a musical instrument while removing the physical constraints imposed by an analog musical instrument. Imagine the possibility of a five-year-old child practicing a double bass without the requirements of physical strength, minimum finger length, and others.

Starting with a brief introduction to the background of this domain, the paper describes the MvM prototype design and presents several inter- and multidisciplinary collaborative works for interactive multimedia performances. These include an interactive multimedia performance involving dance, music, and costume design, which uses the proposed framework with a motion- and color-sensitive system, applying video-based machine vision technology to provide an intuitive and nonintrusive musical interface.

Besides a virtual musical interface based on motion-tracking techniques, this paper describes a sensor-based system for an augmented instrument, exploring the enhancement of traditional instruments with sensor technologies. It presents an augmented drum prototype with flex sensors embedded in the drum brushes to enable additional control and manipulation of real-time multimedia outputs, e.g., audio processing and Musical Instrument Digital Interface (MIDI) events, for a drum player.

The paper also presents the design and development of a distributed multimedia-mapping server (dMMS) with a dynamic mapping interface to provide effective integration with
many existing computer vision prototypes which were developed on different platforms with different operating systems. Ongoing and new developments, together with plausible future explorations related to gesture-controlled interfaces and expressive gesture analysis, are also discussed. These include a study of expressive piano performance and a study of expressive ballet modeling and simulation.
II. BACKGROUND

With the advancements in electronic and computing technologies, there has been increasing interest in new musical instrument design: augmenting traditional instruments [3], [4] with new capabilities, for example, controlling synthesis parameters or visual output, or triggering sound samples [5], as well as new interface designs (both physical and virtual) [6] that provide better ergonomics and/or offer simpler instrumental control to a wider range of users. With such systems, the modes of interface, sensitivities, and reactions (output) are highly flexible and can be configured or personalized, allowing better access to musical instrument playing with a shorter learning time.

Research activities into gestural control for interactive performance have been very active, particularly in the past few years [7], [8]. This is reflected by newly formed conferences specializing in this domain, such as the International Conference on New Interfaces for Musical Expression (NIME) [8] and the Music and Gesture International Conference [9]. It is also reflected by the number of commercially available sensors-to-MIDI devices, such as AtoMIC Pro [10], [11], I-Cube [12], MIDIcreator, MIDIbox, SensorLab, and others [13].

Examples of video-based performance systems include STEIM's BigEye [14], Rokeby's Very Nervous System [15]–[20], and EyesWeb [21]–[23]. These systems explore musical output based on motion, translating/mapping human body movements onto musical sound with various configurable mapping strategies. The Very Nervous System has been used in various installations in galleries and public outdoor spaces, as well as in some performances. EyesWeb aims to model interaction and analyze expressive content in movement. It is an open, extensible platform with a number of integrated hardware and software modules, and it supports multiple video cameras and various sensors. BigEye is designed to convert real-time video (and also prerecorded QuickTime movies) into MIDI events (on the Macintosh). The system detects and tracks the movement of up to 16 simultaneous objects based on color, brightness, and size.

Some of these systems focus on a part of the human body. For example, Lyons et al. [24], [25] used the shape of the performer's mouth as a musical controller to control musical effects. Their system (called the mouthesizer) uses a camera to track the opening and closing of the mouth to provide additional control to a performer without requiring the hands or fingers, which are usually needed for playing the
musical instrument. It was claimed that the mouth controller is more intuitive and easier to use than a foot pedal for controlling "wah-wah" and distortion effects on the electric guitar.

Examples of sensor-based performance systems include the DIEM Digital Dance system [26]–[28], the Toy Symphony,1 and many others [29]–[33]. The Digital Dance system consists of 14 bending sensors which measure the angles of the performer's limbs. The captured data is transmitted by a small wearable wireless transmitter to control music, lighting, or live processing for the performance. The Toy Symphony uses sensor-embedded toys as interfaces for musical expressivity, to widen access to musical expression. The devices (toys) offer players the ability to experience music without the time-consuming stages of acquiring the complex motor control necessary for traditional instruments. While this could be useful as a taste of musical expression, [34] claimed that hard-to-play instruments provide better expressive capabilities and virtuosity.

As mentioned in the last section, this paper also considers musical instrument augmentation using sensors. The design makes use of an instrument interface which is already familiar to, or mastered by, the performer and provides additional capability on top of the existing interface. Related works which explore traditional musical instruments with embedded sensors include the Sho with a bidirectional breath sensor [4], as well as nonmusical objects with embedded sensors for musical playing—for example, the Sensor Shoe project [35]–[37]. With the breath sensor [4], the performance of a traditional musical instrument (the Sho) is used to control digital multimedia events with live graphical display, further extending the functionality of the instrument and providing a new interface and enhanced expressivity. Besides the above-mentioned Sensor Shoe project, there have also been many attempts to use computer controllers such as graphics tablets, joysticks, and others as musical instruments/interfaces.

A wide range of multimedia outputs and control interfaces have been studied in this domain, including audio (as discussed in this paper) and computer graphics, as well as robotics and other art forms such as installation art. An application of a robotic interface is presented in [38], where the robotic interface is regarded as a "moving instrument" to provide interaction, using MIDI to control robot navigation and generate music.

A comprehensive background of related technologies, projects, and developments can be found in [11], [23], [39], and [40]. Reviews and surveys of the development of various sensing technologies and mapping models can be found in [7], [11], and [41]–[49]. Discussions on complex mapping strategies for expert instrument interaction can be found in [50]–[52].
1See http://www.media.mit.edu/hyperins/ToySymphony/
Nowadays, signal processing and synthesis techniques are capable of generating physically demanding (or impossible) sounds with ease. It is necessary that the interfaces (virtual or otherwise) to these new musical instruments are equally equipped with new functionalities and controllability to interface with these new dimensions without physical constraints or limitations.

III. MUSIC VIA MOTION

A. The MvM Framework

The fundamental idea of the framework is simply to map/translate detected changes in an environment to multimedia events. The MvM framework consists of five main modules, as shown in Fig. 1.
• A data acquisition module, which is responsible for communication with the imaging and/or sensor hardware.
• A motion-tracking module, which detects relevant activities and changes.
• A multimedia-mapping module, which consists of an extensible set of mapping submodules for translating detected changes onto musical events.
• A graphical user interface (GUI) module (a front end to the above modules), which enables online configuration and control of the multimedia-mapping module. It provides overall control of the mapping functions used and their parameters. Musical controls and basic filtering include scale type (tonality), note filters, and pitch and volume ranges.
• An output module, which is responsible for the multimedia output—for example, playing sound samples or MIDI data.

Fig. 1. Main modules of MvM.

The framework aims to provide a modular approach to transdomain mapping, reusing existing software modules. The actual processing module (which is typically context dependent) is left out of the framework, with the assumption that the mapping of detected activities is responsible for the appropriate processing; basic output (MIDI events) with direct mapping can be carried out by connecting the output of the multimedia-mapping module to a standard MIDI synthesizer or a multimedia sound card.

Music is inherently a cultural product that consists of sequences of events with frequency and temporal changes. In our case, this issue is viewed simplistically by defining and linking one or more detected changes/events to one or more outputs. It is believed that the pilot study of expressive gesture (piano playing and ballet) will provide better insight into a body motion model for application in this context (see Section V). This section presents the related sensing techniques, and the next section presents several implementations of the MvM framework for different applications. Fig. 2 illustrates the overall structure of the discussion, starting from the framework, the three systems under discussion, and their contexts, including future directions.

B. MvM Prototype

Under the framework, the initial prototype uses input from a video camera, processes video frames acquired in real time, detects and tracks visual changes, and makes use of the recognized movements to generate interesting and relevant musical events, based on an extensible set of mapping functions. The prototype is equipped with motion and color detection modules [53]–[56]. The motion detection and tracking submodules include standard frame differencing and background subtraction.
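To make the motion-detection step concrete, the following is a minimal sketch of frame differencing combined with a running-average background model, written with NumPy and OpenCV; it is not the MvM implementation, and the threshold and background-adaptation rate are illustrative assumptions.

    import cv2
    import numpy as np

    DIFF_THRESHOLD = 25    # illustrative per-pixel change threshold (0-255)
    BG_LEARN_RATE = 0.05   # illustrative adaptation rate for the background model

    def motion_mask(gray, prev_gray, background):
        """Return a binary mask of moving pixels and the updated background model."""
        frame_diff = cv2.absdiff(gray, prev_gray)                 # frame differencing
        bg_diff = cv2.absdiff(gray, background.astype(np.uint8))  # background subtraction
        moving = (frame_diff > DIFF_THRESHOLD) | (bg_diff > DIFF_THRESHOLD)
        background = (1 - BG_LEARN_RATE) * background + BG_LEARN_RATE * gray
        return moving.astype(np.uint8) * 255, background

    if __name__ == "__main__":
        cap = cv2.VideoCapture(0)
        ok, frame = cap.read()
        if not ok:
            raise SystemExit("no camera frame available")
        prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        background = prev.astype(np.float32)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            mask, background = motion_mask(gray, prev, background)
            prev = gray
            # The mask (e.g., its centroid or bounding box) would be passed to the mapping module.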
C. Musical Mapping

MvM is equipped with several mapping functions, including a straightforward distance-to-MIDI mapping with various configurable parameters. With the default settings, the horizontal axis is used to control pitch and the vertical axis is used to control volume (a simple sketch of such a mapping is given at the end of this subsection). Imagine a virtual keyboard in front of a user: by waving a hand from left to right, the user performs a series of notes, from a lower pitch to a higher pitch. MvM also offers the user a set of user-configurable active regions, whereby detected visual activities in certain areas can be mapped onto different MIDI channels. Fig. 3 illustrates a user controlling four different MIDI channels with four active regions, with the left and right hands in different regions.

Constant one-to-one direct mapping of movement can also be tiresome and uninspiring. For the MvM/Coat of Invisible Notes (CoIN) performance (described later), a background layer of music was specially composed to provide a coherent structure, with various timed intervals for MvM to perform its solo cadenza.

Basic expressive features are being added to the MvM prototype. These include an accent detector module, which keeps a history of the region size of the detected visual changes, the directions and speeds of the motion, and their means. Sudden changes in these parameters are used to control factors in the audio generation. Musical mapping can also be enhanced with a database of composed musical phrases, and several mapping layers can be overlaid in order to produce multilayered and polyphonic effects (see Fig. 4).

In general, mapping strategies can be categorized as follows:
• one-to-one mapping;
• one-to-many mapping;
• many-to-one mapping; and
• many-to-many mapping.
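The sketch below illustrates the default distance-to-MIDI behavior described above: the horizontal position of a detected motion centroid selects a pitch within a chosen scale, the vertical position sets the MIDI velocity, and quadrant-style active regions select the channel. The scale table, value ranges, and function names are illustrative assumptions rather than MvM's actual code.

    # Illustrative sketch of a distance-to-MIDI mapping (not MvM's actual code).
    C_MAJOR = [0, 2, 4, 5, 7, 9, 11]          # scale-type (tonality) filter

    def position_to_midi(x, y, width, height,
                         base_note=48, octaves=3, scale=C_MAJOR):
        """Map a motion centroid (x, y) in a frame of size (width, height)
        to a (note, velocity) pair: horizontal axis -> pitch, vertical -> volume."""
        degrees = [base_note + 12 * o + d for o in range(octaves) for d in scale]
        index = int(x / width * (len(degrees) - 1))      # left = low, right = high
        note = degrees[index]
        velocity = int((1.0 - y / height) * 127)         # top of frame = loud
        return note, max(1, min(velocity, 127))

    def region_to_channel(x, y, width, height):
        """Four active regions (quadrants), each mapped to its own (zero-based) MIDI channel."""
        return (0 if x < width / 2 else 1) + (0 if y < height / 2 else 2)

    # Example: a centroid at (500, 120) in a 640x480 frame.
    note, vel = position_to_midi(500, 120, 640, 480)
    chan = region_to_channel(500, 120, 640, 480)
    # The (note, vel, chan) triple would be sent as a MIDI note-on by the output module.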
Fig. 2. MvM implementations and contexts.
Fig. 3. Active regions for controlling different MIDI channels.

Mapping strategy is vital for new musical interface design [7], [20], [47]. MvM/CoIN explores various modes of mapping depending on the context of the specific application and the number of input and output/control parameters. For example, in order to provide a clear and direct relationship between motion and sound, the MvM/CoIN interactive dance performance starts with a simple one-to-one direct mapping with simple motion. As the performance progresses, the mapping strategies become more abstract, exploiting complex relationships with multilayered mappings. This seems to work well, in this case, as it allows audiences to follow the performance with increasing understanding and appreciation. However, this is by no means a generic model. Each mapping model has its purpose and usage depending on the context of the application. For example, a piano keyboard interface to pitch uses a one-to-one mapping, since each key represents a specific pitch.

D. Distributed Multimedia-Mapping Server

It was found that many existing research prototypes in visual tracking and sensing could be integrated into the tracking module. To seamlessly and effectively integrate these existing research systems, which were developed on various platforms with different operating systems, the data communications between the main modules were redesigned using sockets, to enable cross-platform and distributed processing. The mapping module was reimplemented as a separate dMMS, which listens for input data via a stream socket connection on a specific port and processes the data using the originally designed mapping module. With standard networking protocols and the Internet, a distributed MvM (see Fig. 5), which allows remote collaborative performances, was developed to enable platform-independent integration. Each main module (e.g., the tracking module, the mapping module) can be executed on machines with different operating systems and at remote geographical locations (as illustrated in Fig. 6).
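As an indication of what such a socket-based listener involves, here is a minimal sketch of a dMMS-style server that accepts one tracking client and reads line-oriented text messages over a TCP stream socket; the port number and message format are illustrative assumptions, not the protocol actually used by dMMS.

    import socket

    HOST, PORT = "0.0.0.0", 9000   # illustrative port; the actual dMMS port is not specified here

    def handle_message(line: str):
        """Parse one tracking message (assumed format: 'x y size') and hand it to the mapping stage."""
        x, y, size = (float(v) for v in line.split())
        # ...here the values would be handed to the mapping functions (cf. the earlier sketches)...
        print(f"tracked change at ({x:.0f}, {y:.0f}), size {size:.0f}")

    def serve():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind((HOST, PORT))
            srv.listen(1)
            conn, addr = srv.accept()               # one tracking client at a time
            with conn, conn.makefile("r") as stream:
                for line in stream:                 # one message per line
                    if line.strip():
                        handle_message(line.strip())

    if __name__ == "__main__":
        serve()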
IV. MVM IMPLEMENTATIONS AND CASE STUDIES

This section discusses three implementations of the MvM framework under different contexts:
• CoIN, which integrates costume design, music, and dance;
• Interactive Music Head, which explores facial expression tracking; and
• Augmented Drum (AugDrum), which uses sensors to provide additional multimedia control for the player.
A. Coat of Invisible Notes

CoIN is a collaborative project designed to bring together multiple creative domains, combining specially designed costumes, music, and dance within an interactive audiovisual performance environment driven by MvM (see Figs. 7 and 8). For the MvM/CoIN performances, MvM is configured to detect and track the color of the regions where visual changes are detected. This feature is particularly apparent in a section of the choreography where the dancers are divided into two groups (see Fig. 7, bottom), wearing different-colored costumes.
Fig. 4. Multilayered mapping.
Fig. 5. An example of distributed MvM configuration.
Fig. 7. A public MvM/CoIN performance. Two groups of dancers generating a musical interlude (bottom image).
Fig. 6. Distributed MvM modules for the Interactive Music Head.
The contrasting movements and interactions between the two groups create interesting musical dialogues with two different musical sounds. The detected colors are used to control the choice of musical sound and effects. Hence, the visual changes of the costumes can also be used to control the character of the musical responses. The costume designs feature reversible and modular parts, allowing the dancers to reconfigure and reassemble the costumes to create different visual effects; at the same time, these transformations are detected and reacted to by MvM. When a dancer chooses a costume in a different color or reconfigures/reassembles the costume, s/he is in effect changing his/her musical instrument. Each color is precalibrated with MvM before the performance and assigned to a separate MIDI channel, which contains audio samples and files with distinct timbres.
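A minimal sketch of how precalibrated costume colors might be assigned to MIDI channels is shown below, classifying pixels by hue; the hue ranges, saturation/value thresholds, and channel numbers are illustrative assumptions and not the calibration used in the performances.

    import colorsys

    # Illustrative precalibrated costume colors (hue ranges in degrees) -> MIDI channel.
    COLOR_CHANNELS = [
        ((  0,  30), 1),   # "red" costumes   -> channel 1
        (( 90, 150), 2),   # "green" costumes -> channel 2
        ((200, 260), 3),   # "blue" costumes  -> channel 3
    ]

    def channel_for_pixel(r, g, b):
        """Return the MIDI channel calibrated for this costume color, or None."""
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hue_deg = h * 360.0
        if s < 0.3 or v < 0.2:            # ignore unsaturated/dark pixels (background, shadows)
            return None
        for (lo, hi), channel in COLOR_CHANNELS:
            if lo <= hue_deg <= hi:
                return channel
        return None

    # Example: a strongly blue pixel maps to channel 3.
    print(channel_for_pixel(30, 60, 220))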
Fig. 8. A public MvM/CoIN performance.
Fig. 10. Configurable mapping module.
Fig. 9. Real-time face tracker system with spline curves classifying primary face structures.

B. Interactive Music Head

The dMMS was first tested in the Interactive Music Head collaborative project, which integrates MvM with a real-time face (and expression) tracker that aims to create a synthetic talking head for mediating interaction between humans and machines [57], [58] (see Fig. 9). This system tracks the shape and position of facial features, producing sets of coordinates approximating the outlines of the eyes, eyebrows, nose, and mouth. A module has been developed for MvM which packages these coordinates and transmits them to the mapping module as another input stream of the MvM system. This allows a performer to control musical events, or to influence the mood of a performance, merely through changes in facial expression.

A dynamically controllable GUI was designed in order to provide a flexible mapping configuration (see Fig. 10). Each tracked feature (e.g., the mouth, the left eye) can be used to produce an independent input to the mapping module, and each of these input features can be mapped to control an extensible set of multimedia events with independent configurations. For example, the opening of the mouth can be mapped to control a low-pass filter. A musical interface such as this would provide alternative routes to musical creativity for many people, including those for whom conventional instruments are not an option due to physical constraints.
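As an illustration of this kind of feature-to-parameter mapping, the sketch below converts a tracked mouth-opening distance into a MIDI control-change value that could drive a low-pass filter cutoff; the calibration range and controller number are illustrative assumptions, not values from the Interactive Music Head system.

    # Illustrative feature-to-parameter mapping (not the Interactive Music Head code).
    MOUTH_CLOSED_MM = 2.0     # assumed calibration: opening distance when closed
    MOUTH_OPEN_MM = 40.0      # assumed calibration: fully open
    CUTOFF_CC = 74            # MIDI CC 74 is commonly used for filter cutoff/brightness

    def mouth_opening_to_cc(opening_mm: float) -> int:
        """Map a tracked mouth-opening distance to a 0-127 control-change value."""
        span = MOUTH_OPEN_MM - MOUTH_CLOSED_MM
        norm = (opening_mm - MOUTH_CLOSED_MM) / span
        return max(0, min(127, int(norm * 127)))

    # Example: a half-open mouth gives a mid-range cutoff value on CC 74.
    print(CUTOFF_CC, mouth_opening_to_cc(21.0))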
C. Augmented Drum

In addition to camera input, physical sensors can be used to provide raw data as input to the mapping module. For the MvM/AugDrum project, a flex sensor (see Fig. 11) is embedded in a drum brush to track the bending angle of the brush during performance. The sensor's resistance changes exponentially, from about 10 to 35 kΩ, with respect to its bending between 0° and 90°. An interface circuit based on a PIC16x84 microcontroller was designed and built in order to detect the bend of the drum brush and communicate the data via a serial port of a PC. The overall function of the system is to convert drum-brush pressure into a serial bit stream. This signal is then used as an input to the dMMS.
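A minimal sketch of how such a sensor reading might be turned into a control value is given below, assuming the roughly exponential 10-35 kΩ response over 0-90° described above; the constants are fitted to those endpoints only and do not come from the AugDrum firmware.

    import math

    R_FLAT_KOHM = 10.0      # assumed resistance with the brush straight
    R_BENT_KOHM = 35.0      # assumed resistance at full bend
    MAX_ANGLE_DEG = 90.0

    def resistance_to_angle(r_kohm: float) -> float:
        """Invert R(angle) = R_flat * (R_bent/R_flat)**(angle/90) to estimate the bend angle."""
        r_kohm = max(R_FLAT_KOHM, min(r_kohm, R_BENT_KOHM))
        return MAX_ANGLE_DEG * math.log(r_kohm / R_FLAT_KOHM) / math.log(R_BENT_KOHM / R_FLAT_KOHM)

    def angle_to_midi(angle_deg: float) -> int:
        """Scale the estimated bend angle to a 0-127 MIDI control value."""
        return int(max(0.0, min(angle_deg, MAX_ANGLE_DEG)) / MAX_ANGLE_DEG * 127)

    # Example: a reading of about 18.7 kOhm corresponds to roughly half bend.
    print(angle_to_midi(resistance_to_angle(18.7)))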
Fig. 11. Flex sensor and microcontroller interface box.

With the dMMS, the flex sensor data can be mapped onto different control parameters to give different effects or other multimedia events. The interface provides the user with the ability to play a drum with an additional layer of multimedia events (e.g., MIDI or sample data, or control of graphics/video projection). In this case, the user could improvise a melodic line with one sensor-embedded brush and at the same time control the accompaniment with the other sensor and the real drum. It also allows the performer to play "in the air," drumming without a physical drum surface. The distributed MvM system allows this input to be used in combination with any other set of input modules, for example, adding a camera trained on the player. Building up components in this way allows the creation of completely customized performance tools to suit the situation in which they will be used. The advantage of augmenting a traditional instrument is that the player is provided with a familiar interface which can be used to control additional multimedia events.2

2Short video sequences from a public demo and performance of the AugDrum [59] can be found at http://www.leeds.ac.uk/icsrim/mvm/maxis02.htm

D. Wireless AugDrum

The AugDrum prototype was very enthusiastically received by various drummers. From the feedback of these users, we found two main limitations.
• The wire connecting the brush handle to the microcontroller box sometimes interferes with the movement, typically after a lengthy performance or rehearsal.
• The serial cable linking the microcontroller box to the computer represents another weak point. Users tend to forget to remove the wearable microcontroller box (since it is lightweight) and exert pressure on this interconnection.
These issues have led to the design and development of a wireless version of the AugDrum, which represents the work in hand. Fig. 12 illustrates the design of an RF-linked drum brush which uses a microcontroller with an inbuilt RF transmitter (rfPIC12C509AG). The flex sensor voltage is sampled at 9600 Hz, which is set by the baud rate used for the serial transmission; the maximum clock rate of the microcontroller, 4 MHz, defines the upper limit. The frequency used is in the 433-MHz band, which is license-exempt in the U.K. The circuit design is straightforward: an op-amp-based signal conditioning stage, an analog-to-digital converter, a microcontroller, and an RF transmitter.

E. Evaluation of MvM Implementations
This section has described three implementations of the MvM framework (Interactive Music Head, CoIN, AugDrum), each with a slightly different context. There have been various public performances of CoIN. With basic computing resources under performance-tour conditions, it was found that around 15 frames/s of video processing is sufficient to offer realistic interactive performances. For a choreographed performance, sound design is an important factor in providing coherent and relevant feedback. Constant fast-moving notes with a single mapping strategy do not provide good feedback in the longer term. The overall performance requires form and structure, with different and appropriate mapping strategies for different sections of the performance.

The Interactive Music Head implementation remains a laboratory demo due to the requirement for a fast network (for the network camera as well as the parallel computer for the facial tracking processing). The AugDrum implementation has seen various performances and has also been used in installations. The hardware design is robust and flexible, and it is easy to include various sensors. Currently the setup is being used to track tempo (by the rate of finger tapping on the sensor) to drive a MIDI accompaniment for a singer. The sensor-based and video-based implementations can also be integrated to provide an interactive environment, using video tracking for overall movement and sensor tracking for detailed and localized control.

V. FUTURE DIRECTIONS

Besides multimedia and interactive performances, we are also interested in three-dimensional (3-D) gesture tracking and analysis to explore the performance and communication
Fig. 12. An illustration of the RF linked drum-brush design, using a microcontroller with an inbuilt RF transmitter (rfPIC12C509AG).
factors of physical gesture and movement for stage performances. From the current work, we are working toward the analysis and modeling of expressive gestures, exploring the correlations between musical gestures and physical gestures for musical instrument performance, as well as the modeling and simulation of dance gestures and expressivity. From the prototypes, it is clear that a model of expressive motion is required for transdomain mapping. Currently, expressive motion is being studied in two contexts: ballet and piano playing.

In the case of piano gesture, we are capturing the 3-D position of the upper body of the players and looking for relationships between the body motion (relative position, relative speed, and acceleration) and the musical changes (e.g., tempo). The experimental designs aim to explore the correlation between physical gesture and musical gesture, and to find out how to detect and model when one gesture leads another and when one gesture is affected by the other. The reason for studying expressive ballet motion is that ballet consists of standard key poses which are notated (in a dance score), and we hypothesize that the expressive intent must be contained in the movement between the key poses. The experimental designs aim to find the differences in pathway and time required between key poses for different expressive intentions.

A. Expressive Motion Analysis

Few musicians perform without body movement. Besides the mechanics of instrumental playing, the body of the performer tends to move or sway in an individualistic manner, consciously or unconsciously, which may or may not reflect the music. Currently, we are working on a study to analyze the body motion of pianists with respect to expressive timing/dynamics. The pianists' movements were captured with a set of three 3-D IR cameras, while the timing and volume (MIDI velocities) of their playing were recorded using a Yamaha Disklavier [60]. A photograph of a data acquisition session is presented in Fig. 13.
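As a simple illustration of the kind of relationship being investigated, the sketch below computes the correlation between one body-motion feature (e.g., head speed per beat) and the corresponding beat-level tempo extracted from the MIDI recording; the data arrays are placeholders, and a single-feature correlation is only a first step toward the analysis described above.

    import numpy as np

    def pearson_correlation(motion_feature, tempo_bpm):
        """Correlate one per-beat body-motion feature with per-beat tempo."""
        x = np.asarray(motion_feature, dtype=float)
        y = np.asarray(tempo_bpm, dtype=float)
        return float(np.corrcoef(x, y)[0, 1])

    # Placeholder per-beat measurements (not real capture data):
    head_speed = [0.12, 0.18, 0.25, 0.22, 0.30, 0.27, 0.35]   # e.g., metres/second per beat
    tempo = [96, 98, 104, 102, 110, 108, 116]                 # beats per minute per beat

    print(f"r = {pearson_correlation(head_speed, tempo):.2f}")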
Fig. 13. An expressive piano data capture session with 3-D IR camera system.
B. Expressive Ballet

Similar work is currently under development for ballet. Collections of standard ballet motions with different emotional intents are captured with live dancers in order to learn the principal parameters which could encapsulate the expression. The learned model [61]–[63] is intended to drive 3-D virtual dancers [64] for a number of applications, including the following:
• virtual choreography;
• augmented dance, with interaction between live and synthetic dancers [65]; and
• e-learning environments and educational resources.

From the basic interactive performance system, the research has also expanded into the study of motion and expression, and other transdomain mappings. Current collaborative works include the following:
• the expressive dance project, which aims to analyze, model, and generate virtual simulations of dance with expression (see Fig. 14);
• the expressive piano performance project, which explores the relationship between gestural motions, musical outputs, and expressions; and
• transdomain mapping of sound and graphics.

C. Augmented Stage

Video accompaniment for stage performances is now popularly used [66] to provide alternative representations or virtual interaction. Work in hand includes the use of live intermediate image-processing data with overlaid graphics to augment the stage visually, and the application of a 3-D model of the real environment [in Virtual Reality Modeling Language (VRML) format] to transform the stage into a virtual interactive world.

Fig. 15 shows an example of a photorealistic 3-D model digitized from the real environment [67]–[70].
Fig. 14. An expressive ballet data capture session.
Fig. 15. Three-dimensional reconstruction of real environment using laser and video technologies. (Top) 3-D photorealistic textured model. (Bottom) Wire-frame view of the model.
Since the whole scene is represented by graphical primitives, the surfaces can easily be modified and/or animated. Operations such as transformation, translation, and many other functions are possible. Physically demanding or impossible scenarios can be virtually created, projected onto the stage, and dynamically altered by the mapping module, interacting with the performer.
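To make the idea of manipulating the digitized scene concrete, here is a minimal sketch (independent of any VRML toolkit) that applies a rotation and translation to a set of scene vertices; the vertex array is a placeholder, and in practice such a transform would be driven by the output of the mapping module.

    import numpy as np

    def transform(vertices, angle_deg=30.0, offset=(0.0, 1.5, 0.0)):
        """Rotate scene vertices about the vertical (y) axis and translate them."""
        a = np.radians(angle_deg)
        rot_y = np.array([[ np.cos(a), 0.0, np.sin(a)],
                          [ 0.0,       1.0, 0.0      ],
                          [-np.sin(a), 0.0, np.cos(a)]])
        return vertices @ rot_y.T + np.asarray(offset)

    # Placeholder vertices of one surface patch of the digitized scene (metres).
    patch = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [1.0, 2.0, 0.0],
                      [0.0, 2.0, 0.0]])
    print(transform(patch))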
VI. CONCLUSION

This paper presented the MvM framework and discussed a number of sample applications which can be grouped under the theme of interactive music [13], [71], [72]. It described three prototypes designed under the MvM framework:
• a motion- and color-sensitive video tracking system;
• an interactive multimedia face tracker; and
• an augmented drum with a flex sensor.
The main features of these works all involve interactive control of multimedia events (e.g., musical sound), utilizing tracking technologies (both video-based and sensor-based ones were described) to sense the performance and translate the detected activities into relevant output according to the mapping strategies used.

The first two systems use real-time video and can be viewed as examples of virtual instruments: instruments without physical constraints, with dynamic and flexible configuration and mapping. In the first case, the whole body of the user/player can be an instrument; hence, dance was naturally one of the first application domains. The system was successfully utilized in a collaborative project, using the framework to integrate choreography, costume design, sound design, and composition. The second case uses features of the face (eyes, mouth) to control synthesis parameters—for example, the opening distance of the mouth to influence pitch. The third system is a sensor-based system (MvM/AugDrum) which uses standard drum brushes with an embedded electronic sensor to measure the bending of the brush, in order to provide additional control to the drummer.

With the advancement of processing power and the capability of real-time multimedia processing, it is very exciting to see the exploration of various electronic interfaces as musical interfaces, for example, datagloves [73] and others [74]. At the same time, many gestural controls for music have been developed, for example [8], [13], [75], and [76]. The development in this wide and varied interdisciplinary field has prompted a European Concerted Research Action, Gesture Controlled Audio Systems (Con-GAS), to contribute to the advancement of different gesture data analysis and capture/actuation aspects connected to the control of digital sound and music processing.

MvM explores the interdisciplinary integration of music, engineering, and interactive multimedia technologies to provide intuitive and nonintrusive musical interfaces and control mechanisms to simulate virtual and augmented musical instruments, as well as interactive audiovisual environments for stage performance and installation. It aims to enable new collaborations, integrating multiple creative domains and crossing the boundaries of science and the arts, to widen participation in and access to musical creativity and music playing without physical constraints [77], [78]. It is hoped that the active progress and advancement in this inter- and multidisciplinary domain will provide new possibilities for musical creativity and expressivity for all.

ACKNOWLEDGMENT
The author would like to thank the RESOLV consortium for the 3-D models, and V. Devin, A. Galata, and D. Hogg for the Interactive Music Head collaboration involving the
real-time face tracker system. The author would also like to thank the MvM/CoIN and MvM/AugDrum project teams, including I. Symonds, R. Sage, A. Bohn, J. Cook, J. Scott, and R. Neagle.

REFERENCES

[1] K. C. Ng, “Music via motion,” presented at the XIII CIM 2000—Colloq. Musical Informatics, L’Aquila, Italy, 2000.
[2] ——, “Music via Motion (MvM): virtual and augmented performances,” in Proc. Electronic Imaging: The Visual Arts and Beyond (EVA2002), pp. 1–9.
[3] B. Schoner, C. Cooper, and N. Gershenfeld, “Cluster-weighted sampling for synthesis and cross-synthesis of violin family instruments,” in Proc. Int. Computer Music Conf. (ICMC 2000), pp. 376–379.
[4] Y. Nagashima, “Composition of ‘visional legend’,” presented at the Int. Workshop on Human Supervision and Control in Engineering and Music, Kassel, Germany, 2001.
[5] J. A. Paradiso, K.-Y. Hsiao, J. Strickon, and P. Rice, “New sensor and music systems for large interactive surfaces,” in Proc. Int. Computer Music Conf. (ICMC 2000), pp. 277–280.
[6] H. Livingston, “Paradigms for the new string instrument: digital and materials technology,” Int. J. Music Technol.: Organized Sound, vol. 5, no. 3, pp. 135–147, 2000.
[7] A. Hunt and R. Kirk, “Mapping strategies for musical performance,” in Trends in Gestural Control of Music, M. Wanderley and M. Battier, Eds. Paris, France: IRCAM—Centre Pompidou, 2000.
[8] Int. Conf. New Interfaces for Musical Expression (NIME2002), Dublin, Ireland.
[9] Int. Conf. Music and Gesture, Norwich, U.K., 2003.
[10] E. Fléty, “AtoMIC Pro: a multiple sensor acquisition device,” in Proc. Int. Conf. New Interfaces for Musical Expression, 2002, pp. 96–101.
[11] M. M. Wanderley and M. Battier, Eds., Trends in Gestural Control of Music. Paris, France: IRCAM—Centre Pompidou, 2000.
[12] A. Mulder, “The I-Cube system: moving toward sensor technology for artists,” presented at the 6th Symp. Electronic Arts (ISEA 95), Montreal, QB, Canada, 1995.
[13] ICMA/EMF Working Group on Interactive Systems and Instrument Design in Music (ISIDM) [Online]. Available: www.igmusic.org
[14] A Real-Time Video to MIDI Macintosh Software, BigEye. [Online]. Available: http://www.steim.nl/bigeye.html
[15] D. Rokeby, “The construction of experience: interface as content,” in Digital Illusion: Entertaining the Future With High Technology, C. Dodsworth Jr., Ed. New York: ACM, 1997.
[16] ——. (1995) Lecture for ‘Info art’. Kwangju Biennale, Kwangju, Korea. [Online]. Available: http://www.interlog.com/drokeby/install.html
[17] ——, Body language: ACM SIGGRAPH art show, Atlanta, GA, 1988.
[18] ——, Transforming mirrors: Subjectivity and control in interactive media. [Online]. Available: http://www.interlog.com/~drokeby/mirrorsintro.html
[19] T. Winkler, “Creating interactive dance with the Very Nervous System,” presented at the 1997 Connecticut College Symp. Art and Technology, New London, CT, 1997.
[20] ——, Composing Interactive Music: Techniques and Ideas Using Max. Cambridge, MA: MIT Press, 1998.
[21] Proc. AIMI Int. Workshop Kansei: The Technology of Emotion, A. Camurri, Ed., Genova, Italy, 1998.
[22] A. Camurri, S. Hashimoto, M. Ricchetti, A. Ricci, K. Suzuki, R. Trocca, and G. Volpe, “EyesWeb: toward gesture and affect recognition in interactive dance and music systems,” Comput. Music J., vol. 24, no. 1, pp. 57–69, 2000.
[23] A. Camurri, P. Coletta, M. Peri, M. Ricchetti, A. Ricci, R. Trocca, and G. Volpe, “A real-time platform for interactive performance,” presented at the Int. Computer Music Conf. (ICMC 2000), Berlin, Germany.
[24] M. J. Lyons, M.
Haehnel, and N. Tetsutani, “The mouthesizer: a facial gesture musical interface,” in Conf. Abstract SIGGRAPH 2001, p. 230. [25] M. J. Lyons and N. Tetsutani, “Facing the music: a facial action controlled musical interface,” in Proc. Conf. Human Factors in Computing Systems, 2001, pp. 309–310.
[26] W. Siegel and J. Jacobsen, “The challenges of interactive dance, an overview and case study,” Comput. Music J., vol. 22, no. 4, pp. 29–43, 1998. [27] , “Composing for the digital dance interface,” in Proc. Int. Computer Music Conf. (ICMC 1999), pp. 276–277. [28] W. Siegel, “Two compositions for interactive dance,” in Proc. Int. Computer Music Conf. (ICMC 1999), pp. 56–59. [29] D. Rubine and P. McAvinney, “The videoharp,” in Proc. Int. Computer Music Conf. (ICMC 1988), pp. 49–55. [30] A. Sato, T. Harada, S. Hashimoto, and S. Ohteru, “Singing and playing in musical virtual space,” in Proc. Int. Computer Music Conf. (ICMC 1991), pp. 289–292. [31] H. Sawada, S. Ohkura, and S. Hashimoto, “Gesture analysis using 3D acceleration sensor for music control,” in Proc. Int. Computer Music Conf. (ICMC 1995), pp. 257–260. [32] D. Keane and P. Gross, “The MIDI baton,” in Proc. Int. Computer Music Conf. (ICMC 1989), pp. 151–154. [33] E. J. Paulos, “Personal tele-embodiment,” Ph.D. dissertation, Univ. California, Berkeley, CA, 2001. [34] J. Ryan, “Effort and expression,” in Proc. Int. Computer Music Conf. (ICMC 1992), pp. 414–416. [35] S. J. Morris and J. A. Paradiso, “Shoe-integrated sensor system for wireless gait analysis and real-time feedback,” in Proc. 2nd Joint IEEE Engineering in Medicine and Biology Soc. and Biomedical Engineering Soc. Conf., 2002, pp. 2468–2469. [36] J. Paradiso and E. Hu, “Expressive footwear for computer-augmented dance performance,” presented at the 1st Int. Symp. Wearable Computers, Cambridge, MA, 1997. [37] S. Pardue and J. A. Paradiso, “Musical navigatrics: new musical interactions with passive magnetic tags,” in Proc. New Interfaces for Musical Expression 2002 Conf. (NIME-02), pp. 168–169. [38] K. Suzuki, K. Tabe, and S. Hashimoto, “A mobile robot platform for music and dance performance,” in Proc. Int. Computer Music Conf. 2000, pp. 539–542. [39] J. Paradiso, “Electronic music interfaces: new ways to play,” IEEE Spectr., vol. 34, pp. 18–30, Dec. 1997. [40] M. M. Wanderley and P. Depalle, “Gestural control of computergenerated sound,” Proc. IEEE (Special Issue on Engineering and Music: Supervisory Control and Auditory Communication), vol. 92, pp. 632–644 , Apr. 2004. [41] A. Hunt, “Radical user interfaces for realtime musical control,” D.Phil. dissertation, University of York, York, U.K., 1999. [42] A. Hunt and R. Kirk, “Radical user interfaces for real-time control,” in Proc. 25th Euromicro Conf. (EUROMICRO ’99), vol. 2, pp. 2006–2012. [43] T. Marrin-Nakra, “Searching for meaning in gestural data: interpretive feature extraction and signal processing for affective and expressive content,” in Trends in Gestural Control of Music, M. Wanderley and M. Battier, Eds. Paris, France: IRCAM—Centre Pompidou, 2000. , “Inside the conductor’s jacket: Analysis, interpretation and [44] musical synthesis of expressive gesture,” Ph.D. dissertation, Media Lab, Massachusetts Inst. Technol., Cambridge, MA, 2000. [45] T. Marrin and J. Paradiso, “The digital baton: a versatile performance instrument,” presented at the Int. Computer Music Conf. ’97, Thessaloniki, Greece. [46] M. M. Wanderley, “Gestural control of music,” presented at the Int. Workshop on Human Supervision and Control in Engineering and Music, Kassel, Germany, 2001. [47] , “Performer-instrument interaction: Applications to gestural control of sound synthesis,” Ph.D. dissertation, University of Paris, Paris, France, 2001. [48] K. Warwick, “Linking human neural processes directly with technology—the future,” in Proc. Int. 
Conf. Artificial Neural Nets and Genetic Algorithms, 1999, pp. 8–13. [49] D. Wenn, R. J. Mitchell, and M. Gabb, “Toward the control room of the future,” in Proc. IEE Conf. People in Control (PIC 2001), pp. 79–85. [50] A. Hunt, M. Wanderley, and R. Kirk, “Toward a model for instrumental mapping in expert musical interaction,” in Proc. Int. Computer Music Conf. (ICMC 2000), pp. 209–212. [51] S. Maruyama, “How is a symphony started by the orchestra? An empirical study for interpersonal coordination between the professional orchestral conductor and player,” presented at the Int. Workshop on Human Supervision and Control in Engineering and Music, Kassel, Germany, 2001.
[52] H. Morita, S. Hashimoto, and S. Ohteru, “A computer music system that follows a human conductor,” IEEE Computer, vol. 24, pp. 44–53, July 1991. [53] Y. Raja, S. McKenna, and S. Gong, “Segmentation and tracking using color mixture models,” in Proc. Asian Conf. Computer Vision (ACCV), vol. 1, 1998, pp. 607–614. [54] M. S. Drew, J. Wei, and Z.-N. Li, “Illumination-invariant color object recognition via compressed chromaticity histograms of normalized images,” in Proc. 6th Int. Conf. Computer Vision, 1998, pp. 533–540. [55] D. Chai and K. N. Ngan, “Locating facial region of a head-andshoulders color image,” in Proc. 3rd Int. Conf. Automatic Face and Gesture Recognition, 1998, pp. 124–129. [56] J. L. Crowley and J. M. Bedrune, “Integration and control of reactive visual processes,” in Proc. 3rd Eur. Conf. Computer Vision, vol. 2, 1994, pp. 47–58. [57] V. E. Devin and D. C. Hogg, “Reactive memories: An interactive talking-head,” School Comput., Univ. Leeds, Leeds, U.K., Rep. 2001.09, 2001. [58] K. Sobottka and I. Pitas, “Face localization and feature extraction based on shape and color information,” in Proc. IEEE Int. Conf. Image Proceeding, 1996, pp. 483–486. [59] K. C. Ng, I. Symonds, J. Scott, J. Cook, A. Bohn, and R. Sage, “Music via motion: an interactive multimedia installation with video and sensor,” presented at the MAXIS: Festival of Sound and Experimental Music, Sheffield, U.K., 2002. [60] M. Das, D. M. Howard, and S. L. Smith, “Motion curves in music: the statistical analysis of MIDI data,” in Proc. 25th Euromicro Conf. (EUROMICRO ’99), vol. 2, pp. 2013–2019. [61] N. Johnson, “Learning object behavior models,” Ph.D. dissertation, School Comput. Stud., Univ. Leeds, Leeds, U.K., 1998. [62] N. Johnson, A. Galata, and D. Hogg, “The acquisition and use of interaction behavior models,” in Proc. IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition, 1998, pp. 866–871. [63] R. J. Mitchell, J. M. Bishop, D. A. Keating, and K. Dautenhahn, “Cybernetic approaches to artificial life,” Kunstliche Intelligenz (Special Edition on Artificial Life), vol. 1, pp. 5–11, 2000. [64] N. I. Badler, R. Bindiganavale, J. Rourne, J. Allbeck, J. Shi, and M. Palmer, “Real time virtual humans,” presented at the 4th Int. Conf. Digital Media Futures, National Museum of Photography, Film and Television, Bradford, U.K., 1999. [65] P. Volino and N. Magnenat-Thalmann, “3D fashion design and the virtual catwalk,” presented at the 4th Int. Conf. Digital Media Futures, National Museum of Photography, Film and Television, Bradford, U.K., 1999. [66] V. J. Vincent. Mandala virtual reality system. VIVID Group. [Online]. Available: http://www.vividgroup.com [67] R. J. Neagle, K. C. Ng, and R. A. Ruddle, “Notation and 3D animation of dance movement,” in Proc. Int. Computer Music Conf. (ICMC 2002), pp. 459–462. [68] K. C. Ng, V. Sequeira, S. Butterfield, D. C. Hogg, and J. G. M. Gonçalves, “An integrated multi-sensory system for photo-realistic 3d scene reconstruction,” in Proc. ISPRS Int. Symp. Real-Time Imaging and Dynamic Analysis, 1998, pp. 356–363. [69] K. C. Ng, V. Sequeira, E. Bovisio, N. Johnson, D. Cooper, J. G. M. Gonçalves, and D. C. Hogg, “Playing on a holo-stage: toward the interaction between real and virtual performers,” in Digital Creativity: A Reader, C.Colin Beardon and L.Lone Malmborg, Eds. Lisse, Switzerland: Swets & Zeitlinger, 2002. [70] V. Sequeira, K. C. Ng, E. Wolfart, J. G. M. Gonçalves, and D. C. Hogg, “Automated reconstruction of 3d models from real environment,” ISPRS J. 
Photogrammetry Remote Sensing, vol. 54, no. 1, pp. 1–22, 1999.
[71] G. Johannsen, “Human supervision and control in engineering and music,” presented at the Int. Workshop Human Supervision and Control in Engineering and Music, Kassel, Germany, 2001. [72] Parasite: Event for invaded and involuntary body, Stelarc. (1997). [Online]. Available: http://www.stelarc.va.com.au/parasite/index.htm [73] A. Mulder, “Design of virtual three-dimensional instruments for sound control,” Ph.D. dissertation, Simon Fraser Univ., Burnaby, BC, Canada, 1998. [74] F. Babiloni, F. Cincotti, L. Lazzarini, J. del R. Millán, J. Mouriño, M. Varsta, J. Heikkonen, L. Bianchi, and M. G. Marciani, “Linear classification of low-resolution EEG patterns produced by imagined hand movements,” IEEE Trans. Rehab. Eng. (Special Issue on Brain–Computer Interfaces), vol. 8, pp. 186–188, June 2000. [75] L. Tarabella and B. Graziano, “Giving expression to multimedia performance,” presented at the ACM Multimedia Workshop, Marnia Del Rey, CA, 2000. [76] M. Feldmeier, M. Malinowski, and J. A. Paradiso, “Large group musical interaction using disposable wireless motion sensors,” in Proc. Int. Computer Music Conf. 2002, pp. 83–87. [77] T. Anderson, “Using music performance software with flexible control interfaces for live performance by severely disabled musicians,” in Proc. 25th Euromicro Conf. (EUROMICRO ’99), vol. 2, pp. 2020–2029. [78] L. M. Taylor and J. M. Bishop, “Computer mediated communication use by the deaf,” in Proc. World Organization Systems and Cybernetics, 1999, pp. 185–189.
Kia C. Ng (Member, IEEE) received the B.Sc. (Hons.) degree in computational science and music and the Ph.D. degree in computer vision from the University of Leeds, Leeds, U.K.
He is currently Director of the Interdisciplinary Centre for Scientific Research in Music (ICSRiM) and Senior Lecturer at the School of Computing and School of Music, University of Leeds. He is also Chairman of Music Imaging Ltd., U.K. He is a leading expert in optical music recognition, working on computer recognition and translation of printed and handwritten music manuscripts. His Music via Motion system has been widely featured in the media, including the BBC’s News 24, the BBC’s Tomorrow’s World Plus, and Sky TV’s Science Review, and in the Financial Times, the New York Times, and others. He is an Editorial Consultant of the Computer Music Journal and the MIT Press, Cambridge, MA. His research links together work in the schools of computing, music, and electronic and electrical engineering on multimedia, computer vision, computer music, and digital media.
Dr. Ng is a Member of the IEEE Computer Society, the British Computer Society, and the Institute of Directors, and a Fellow of the Royal Society for the encouragement of Arts, Manufactures and Commerce. In 2003, he organized the Third International Conference on Web Delivering of Music (WEDELMUSIC-2003) and the Second International Festival and Symposium of Sound and Experimental Music (MAXIS-2003). He is the General Chair of the AISB 2004 Convention on Motion, Emotion, and Cognition (Artificial Intelligence and the Simulation of Behavior) (http://www.kcng.org). His paper on 3-D reconstruction of real environments won the prestigious U.V. Helava Award (Best Paper 1999, ISPRS Journal, Elsevier Science) and a Young Author’s Award at the International Society for Photogrammetry and Remote Sensing Commission V Symposium on Real-Time Imaging and Dynamic Analysis.