Exploring New Perspectives in Network Music Performance: The DIAMOUSES Framework

Chrisoula Alexandraki∗ and Demosthenes Akoumanakis∗∗

∗Department of Music Technology and Acoustics
Technological Educational Institute of Crete
E. Daskalaki 1, Perivolia
74100 Rethymnon, Crete, Greece
[email protected]

∗∗Department of Applied Informatics and Multimedia
Technological Educational Institute of Crete
Estavromenos
71500 Heraklion, Crete, Greece
[email protected]


Computer Music Journal, 34:2, pp. 66–83, Summer 2010.
© 2010 Massachusetts Institute of Technology.

This article reports on the recent results of a research and development project on network music performance (NMP), named DIAMOUSES. DIAMOUSES is more than a standalone application for real-time exchange of audiovisual streams through the network, as it forms an open framework that aims to enable a wide range of applications and services for intuitive music expression and collaborative performance among geographically separated musicians. The realization of the DIAMOUSES open framework for NMP addresses several technological issues and contributes to the solution of a number of problems relating to human perception and human-computer interaction. It also contributes to the establishment of social connections between peers who collaborate through networks in which music is dynamically created, negotiated, and exchanged on the fly. This article describes DIAMOUSES in terms of technical orientation and validation, as well as in terms of user experience as revealed through a series of empirical studies. Collectively, these results provide useful insights into a variety of research questions on NMP and into a number of issues pending further attention.

Introduction

Physical proximity of musicians and co-location in a physical space are typical prerequisites for collaborative music performance. Despite technological advancements and the proliferation of the Internet, networked music performance has still remained a challenge. The main technological barriers to implementing realistic NMP systems concern the fact that these systems are (1) highly sensitive in terms of latency and synchronization, because of the requirement for real-time communication, and (2) highly demanding in terms of bandwidth availability and error alleviation, because of the acoustic properties of musical signals. Latency is the most important hurdle to be dealt with, as it is introduced throughout the entire process of capturing, transmitting, receiving, and reproducing audio streams. Latency may be due to hardware and software equipment, network infrastructures, and the physical distance separating the communicating musicians. Even worse, latency variation, referred to as network jitter, forms an additional barrier to ensuring smooth and timely signal delivery.

A further challenge relates to the actual experience of collaborative NMP and the value resulting from making this practice a virtual endeavor in cyberspace. Specifically, networked media may facilitate mechanisms such as sharing, feedback, and feed-through, thus catalyzing not only how music is produced and marketed but also how it is conceived, negotiated, made sense of, and ultimately created.

Presently, the majority of NMP systems that have been developed and explored are working on the technical challenges of NMP. Although a few of these systems manage to reduce latencies to the minimum possible value, none can actually guarantee timely—within certain limits—exchange of audio streams over large geographical distances. DIAMOUSES does not offer any major novelty in this regard, as it turns out that current technology cannot cope effectively with the geographical dispersion of performers in NMP scenarios. Nevertheless, despite the compromise with the inevitable physical and technological obstacles, DIAMOUSES proceeds one step further to provide a framework that we believe will have, under affordable technological conditions, a greater impact on forming community networks for music performance.

The main idea of the DIAMOUSES approach originates from the fact that music performance may occur in different contexts and for various purposes, such as live concerts, improvisation and jam sessions, recording sessions, interactive compositional sessions, music lessons, and master classes. Clearly, different contexts of use raise different requirements, both in terms of the underlying technological infrastructures and in terms of the interaction practices that must be supported. This is evident from the fact that research in different music performance contexts brings up significantly different issues and research priorities. For instance, in live music performance the key research issue is the timely and accurate delivery of audible streams (Cáceres and Chafe 2009). However, in remote music-learning scenarios the research focus is on providing appropriate pedagogical paradigms and on exploring methods for the evaluation of student progress (Ng and Nesi 2008). Furthermore, in collaborative music composition, a key challenge is to represent musical events effectively and to devise appropriate symbolic musical notations (Hajdu 2006). DIAMOUSES presents a different angle by facilitating a variety of research priorities and collaboration practices over a single customizable framework.

Current Trends and Research Focus

Although musicians and researchers have always been fascinated by the idea of playing music over a network, realistic bidirectional musical interactions were made possible only around 2001, after the advent of high-speed network backbones for academia. Because there exist reports dedicated to the origins of NMP and follow-up advances (Barbosa 2003; Föllmer 2005), the current section does not elaborate on such efforts. In the following subsections we first provide an overview of key research issues and related works that motivate our current work, and then we sketch the boundaries and the distinctive focus of the DIAMOUSES framework.

Realistic vs. Non-Realistic NMP Systems

A substantial body of recent research (Carôt and Werner 2007; Renaud, Carôt, and Rebelo 2007) classifies NMP systems by considering their approach to dealing with audio latency, thereby distinguishing between "realistic" NMP solutions and "latency-accepting" solutions. The first category refers to systems that aim to provide low-latency conditions comparable to co-located performances. The distinguishing characteristic of such systems is that the audio latency between performers is kept below the so-called Ensemble Performance Threshold (EPT), which has been psychoacoustically measured and estimated to be around 25 msec (Chafe et al. 2004). The second category of NMP systems refers to solutions that accept compromises in audio latency. These latency-accepting solutions are anchored either in investigating how well users can adapt to the introduced latencies or in exploring how to creatively manipulate latencies in experimental music performances (Tanaka 2006).

The first systems to take advantage of the reliability of the Internet2 backbone in order to conduct realistic NMP experiments were the JackTrip application developed by the SoundWire research group at Stanford University's Center for Computer Research in Music and Acoustics (Cáceres and Chafe 2009) and the Distributed Immersive Performance (DIP) project at the Integrated Systems Center of the University of Southern California (Sawchuck et al. 2003). Both these systems focus on high-quality and low-latency audio and video stream exchange. JackTrip has recently become freely available and is being used by the majority of network musicians. DIP focuses on transmitting 3-D audio by involving relevant sound-localization processing techniques.
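To make the EPT constraint concrete, the sketch below estimates one-way latency from propagation distance and audio buffering and compares it against the 25-msec threshold. It is illustrative only: the fiber propagation speed (roughly 200 km per msec, about two-thirds the speed of light) and the lumped `other_ms` term covering drivers, codecs, and jitter buffering are our assumptions, not figures from DIAMOUSES.

```cpp
// Illustrative one-way latency estimate for an NMP link.
// The propagation speed and the lumped "other" delay are assumptions.
double one_way_latency_ms(double distance_km, int buffer_samples,
                          double sample_rate_hz, double other_ms) {
    const double kFiberKmPerMs = 200.0;  // light in fiber: roughly 2/3 c
    double propagation_ms = distance_km / kFiberKmPerMs;
    double buffering_ms = 1000.0 * buffer_samples / sample_rate_hz;  // one audio block
    return propagation_ms + buffering_ms + other_ms;  // other_ms: drivers, codec, jitter buffer
}

// Ensemble Performance Threshold, around 25 msec (Chafe et al. 2004).
bool under_ept(double latency_ms) { return latency_ms < 25.0; }
```

Under these assumptions, a 100-km link with a 64-sample buffer stays comfortably below the EPT, whereas intercontinental propagation delay alone exceeds it.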


Although technically competent, neither of these systems integrates different communication channels (audio, video, and chat) and collaboration practices (community awareness, score manipulation, etc.) in a single software application. As a result, they require extra effort from the performers to cope with the graphic representations of the various running programs that are necessary for efficient multimodal communication and collaboration. In multipart performances, this task may be considerably bothersome.

As soon as realistic NMP was ensured by successfully supporting high-quality and within-time-limits exchange of live audio streams, the research focus appeared to move in new directions. For example, substantial work is being devoted to integrating compression codecs in NMP systems. Examples of such work are the Soundjack application, which uses the Fraunhofer Ultra-Low Delay (ULD) codec (Krämer et al. 2007), and the exploitation of the WavPack codec by researchers at the Technical University of Braunschweig (Kurtisi and Wolf 2008). Increasingly, the development of new audio codecs is taking into account the real-time requirements of NMP systems. The recently released Constrained-Energy Lapped Transform (CELT) codec (http://www.celt-codec.org) provides evidence of this tendency.

At the other end of the spectrum, non-realistic NMP approaches handle latency by requiring some or all of the musicians participating in a performance session to adapt to their auditory feedback being delayed with respect to their motor-sensory interaction with their musical instruments. Systems of this kind, though less interesting academically, form the bulk of the currently popular solutions for NMP. Representative examples are eJamming AUDiiO (http://www.ejamming.com) and Ninjam (http://www.ninjam.com).
The former company has recently released versions that claim to minimize latencies, whereas the latter system adopts an approach of increasing latencies even further and requires performers to adapt to performing one measure ahead of what they are hearing.

With respect to adapting to latency, significant research has recently been carried out in the neurological domain to investigate the relationships between auditory feedback and motor interactions in music performance (Zatorre, Chen, and Penhune 2007). These relationships are being studied for different kinds of music, which are characterized by the speed with which pitches and rhythms change. As first indicated by Lazzaro and Wawrzynek (2001), because the pipe organ has a sound-generation latency on the order of seconds, even high delays may be tolerable, depending on the participating instruments and the kind of music performed. However, although musicians can learn to adapt to constant latencies, they cannot adapt to varying latencies caused by network jitter, which is a reason some approaches prefer to further increase latency so as to reach more deterministic values.

Architectural Challenges: Peer-to-Peer vs. Star Topologies

A further classification of NMP systems may be made on the basis of their architectural models. Currently, most NMP systems, both realistic and latency-accepting, adopt the peer-to-peer communication model. The alternative to this type of communication is to feature a streaming server as an intermediate node in the communication of the audiovisual streams. The Ninjam Project provides an example of an NMP system that uses a streaming server. In this project, each performer's audio stream is compressed at the performer's site and then sent to the streaming server. The server does not perform any stream processing, and therefore increased CPU power is not a requirement. However, the server essentially provides increased outbound network bandwidth compared to the outbound bandwidth of each of the connected clients.

It is worthwhile considering that, as demonstrated in Figure 1, if N performers are using direct peer-to-peer connectivity, then each performer needs to receive N–1 streams and transmit N–1 copies of the stream generated on-site. However, if a server is utilized, then each peer transmits only one stream (to the server) and receives N–1 streams (from the server). This second communication scheme is more affordable, since in conventional networks the upload speed is much lower than the download speed. Moreover, if the server is located

Figure 1. Peer-to-peer vs. star topologies in NMP.

on a privileged network, with sufficient outbound bandwidth in addition to inbound, then the entire communication process is made less burdensome. Additionally, if the incoming audio streams are mixed at the streaming server's site, then even the outbound bandwidth of the server is further reduced. Audio stream mixing can take into account performers' requirements to individually control the relative levels of the other participants, and it can potentially incorporate spatial rendering of the various instruments.

As will be seen in the sections that follow, DIAMOUSES experiments may optionally incorporate a streaming server. The reason for this is that the framework exploits an experimental digital TV platform, which allows for broadcasting network music performances. In such cases, the role of the streaming server is to mix and multiplex the audio streams arriving from dispersed musicians into a single stream that forms the input to the digital TV broadcasting center.

Collaboration Toolkits for NMP

A different aspect of NMP relates to the concern that network communication will never be as good as face-to-face communication, and therefore that instead of trying to emulate physical proximity, virtual collaboration environments should try to augment user experience and foster a cooperative perspective on music expression (Wright 2005). In this direction, a variety of opportunities may be exploited, such as privacy and anonymity, asynchrony, enabling collaboration among people who have never met, and so on. A direct benefit of this is the emergence of new forms of music expression enabled by cultural exchanges among different music traditions, resulting in enhanced creativity through collective experience and practice.

In this respect, the Open Sound Control protocol has provided a number of possibilities for music performance, composition, and collaboration, as well as alternative types of musical interaction, through the exchange of low-bandwidth control data instead of high-quality audio streams (Wright 2005). For example, the Quintet.net project (Hajdu 2005) and the Auracle software application (Freeman et al. 2005) present interaction practices that rely on network communication. One of the features of the Quintet.net environment is that it allows a conductor to synchronously send performers directives in the form of musical notation, thereby presenting an alternative form of music improvisation that is not possible otherwise. Auracle, on the other hand, is a Web-based application that allows anyone, musician or non-musician, to control a networked sound instrument with their voice.

In light of this, the integration of existing software development toolkits in NMP applications can be remarkably productive for a number of research purposes, such as computer-assisted music composition, learning, and audience involvement. Such toolkits include the jMusic toolkit for conventional music notation (http://jmusic.ci.qut.edu.au/), the JSyn toolkit (http://www.softsynth.com/jsyn/) for sound-synthesis algorithms, the Phidgets hardware and application programming interface (http://www.phidgets.com/) for sensor-based interaction in musical instrument learning, and so on. Although generic environments such as Max/MSP (http://www.cycling74.com/) provide the ability to develop external objects that implement specific functionalities such as score editing (Didkovsky and Hajdu 2008) or gesture recognition (Jensenius, Godøy, and Wanderley 2005), these environments require some specialized knowledge on the part of the end user.

Research Focus

From the discussion thus far, it becomes evident that NMP is an active research area catalyzed by existing knowledge in psychoacoustics and by progress in telecommunications, multimedia, and user-interface software technologies. Nevertheless, the synergistic exploitation of these advancements has yet to materialize. To this end, DIAMOUSES adopts a more encompassing research agenda by aiming for a global approach to NMP across a variety of virtual settings. This translates to addressing issues under three broad and interrelated research directions. The first direction involves the conception of a hybrid architectural abstraction that will enable a variety of NMP scenarios with different requirements. The second targets the development and integration of suitable software toolkits that facilitate collaborative music practice across the different virtual settings designated by the envisaged scenarios. Finally, the third direction is concerned with advancing an alternative community-based model that frames NMP as a distributed collective practice and capitalizes upon the collective intelligence of the music performers.
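The low-bandwidth control data that Open Sound Control exchanges (mentioned in the toolkits discussion) has a simple binary layout: a padded address string, a padded type tag string, and big-endian arguments. The sketch below encodes an OSC 1.0 message carrying int32 arguments only; the address `/diamouses/level` is a hypothetical example, not part of the DIAMOUSES protocol.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Pad the buffer with zero bytes to a multiple of 4, as OSC 1.0 requires.
static void pad4(std::vector<uint8_t>& buf) {
    while (buf.size() % 4 != 0) buf.push_back(0);
}

// Append a NUL-terminated, 4-byte-aligned OSC string.
static void put_string(std::vector<uint8_t>& buf, const std::string& s) {
    buf.insert(buf.end(), s.begin(), s.end());
    buf.push_back(0);  // terminating NUL
    pad4(buf);
}

// Encode a minimal OSC message with int32 arguments only.
std::vector<uint8_t> osc_message(const std::string& address,
                                 const std::vector<int32_t>& args) {
    std::vector<uint8_t> buf;
    put_string(buf, address);
    std::string tags = ",";
    for (size_t i = 0; i < args.size(); ++i) tags += 'i';  // one 'i' per int32
    put_string(buf, tags);
    for (int32_t v : args) {  // arguments are big-endian
        buf.push_back((v >> 24) & 0xFF);
        buf.push_back((v >> 16) & 0xFF);
        buf.push_back((v >> 8) & 0xFF);
        buf.push_back(v & 0xFF);
    }
    return buf;
}
```

For instance, `osc_message("/diamouses/level", {80})` yields a 28-byte packet that could serve directly as a UDP payload, which illustrates why control data is so much cheaper to transmit than audio.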

Methodology

DIAMOUSES was conceived as an exploratory effort aiming, on the one hand, to unfold requirements across radically different contexts of NMP use and, on the other hand, to explore technical alternatives through pilot developments. To gain the required insight, a survey was conducted using qualitative instruments, including questionnaires, site visits, and semi-structured interviews with representatives of the target end-user community (i.e., performers, conductors, composers, sound engineers, and the general public/audience). Subsequently, reference scenarios were devised to provide contexts for design.

The questionnaires were designed to familiarize users with the concept of NMP and to investigate their requirements in the hypothetical scenario of being physically separated from their peers and having to collaborate through a robust NMP infrastructure. It was anticipated that user requirements would vary greatly, depending not only on user roles but also on the kind of music with which the users were acquainted. Consequently, the requirements analysis distinguished among different music genres and provided insights into the diversity of the collaboration practices taking place for different kinds of music. The questionnaires contained questions aimed at eliciting users' subjective opinions on various issues, including the facilitation of visual and auditory cues for synchronizing with their peers during performance, preferences in sound-monitoring facilities, use of music scores or alternative musical notation schemes, use of music controllers, etc.

The key findings from this line of research, elaborated in previous works (Alexandraki and Kalantzis 2007; Alexandraki et al. 2008), were used to devise three reference scenarios for the purposes of informing technical pilot developments. These scenarios—a jazz rehearsal, a live electroacoustic performance, and a piano lesson—were chosen to allow exploration of the diversity that exists in musical interactions across dissimilar collaboration purposes, music genres, and technological infrastructures. Table 1 summarizes the main attributes of each scenario. The technical details, as well as the findings from experimenting with these scenarios, are elaborated upon in the section titled "Pilot Developments and Technical Assessment."

DIAMOUSES Architecture

The overall architecture of the DIAMOUSES framework is depicted in Figure 2. The central entities of the distributed environment are the DIAMOUSES Collaboration Server, the DIAMOUSES Streaming Server, and the DVB-T Broadcast Center. (DVB-T refers to Digital Video Broadcasting – Terrestrial.)

Figure 2. The overall architecture of the DIAMOUSES environment.

Table 1. DIAMOUSES Reference Scenarios

Reference Scenario  Music Genre      User Role(s)              Streams Exchanged  Networking        Streaming Server?
Music rehearsal     Jazz             Performers (two or more)  Audio, Video       LAN               Yes
Live performance    Electroacoustic  Performers (two or more)  MIDI, Video        WLAN              Yes
                                     Audience                  Audio, Video       WLAN, WAN, DVB-T
Piano lesson        Classical        Teacher, Student          Audio, Video       ADSL              Yes

As DIAMOUSES is designed to facilitate NMP community support functions, it exploits a customized version of a Java-based open-source content management system to realize community-oriented policies such as registration, role-taking, asynchronous notification tasks, and information-sharing. Through a dedicated portal offered by the DIAMOUSES Collaboration Server, users obtain roles such as performers or as the audience of a digital TV transmission. The users accordingly make use of downloadable software components that facilitate the collaborative engagement of peers in


Figure 3. Configuring the DIAMOUSES Streaming Server.

synchronous sessions realizing the designated NMP scenarios. During a synchronous NMP session, each music performer, represented in the bottom left of Figure 2, is equipped with a set of hardware devices (a microphone, speakers or headphones, a camera, possibly MIDI controllers, and a personal computer). Performers communicate through the network connection of their personal computer, where a two-layered software application is executed. The component at the bottom layer, denoted as NMP API (for "application programming interface"), is responsible on the client side for the exchange of live audiovisual streams. The component denoted as Collaborative Session Manager implements the synchronous groupware functions (i.e., session management, replication, floor control) and the graphical user interface (GUI).

The DIAMOUSES Streaming Server is an optional component, because the NMP API can be set up to communicate audiovisual streams either through this server or directly among communicating peers. Therefore, in DIAMOUSES the architectural model may be changed on occasion. There are two reasons for communicating through the DIAMOUSES Streaming Server: either to take advantage of its potentially increased outbound network bandwidth or to avoid the replication of necessary stream processing on multiple peers.

For the digital TV broadcast, DIAMOUSES exploits an existing platform for digital terrestrial transmission. This platform is used for live broadcasts of distributed performances as well as for interactive video-on-demand (VOD) services offered through a low-bandwidth return path (Pallis et al. 2004). For broadcasting, the distributed performers' individual audio and video streams arriving at the DIAMOUSES Streaming Server are multiplexed and sent as a combined performance (a single stream) to the Multimedia Server of the DVB-T platform. The Multimedia Server then de-multiplexes the stream and supplies the resulting individual streams to the rest of the DVB platform. These streams are then encoded to the appropriate formats for HDTV and transmitted to the network of digital-TV viewers.


Implementation

The entire framework is currently implemented on the Linux platform. The following subsections provide details about the implementation of the main components of the DIAMOUSES architecture.

DIAMOUSES Streaming Server

The DIAMOUSES Streaming Server is a customized version, implemented in C++, of the Apple Open Source Darwin Streaming Server (Apple Inc. 2009). The customizations of Darwin for the DIAMOUSES environment include functionality not originally supported by Darwin, such as audio stream mixing, stream multiplexing and de-multiplexing, and storage of the live audio, video, and MIDI streams in appropriate multimedia files. This functionality has been implemented as server extensions and can be activated or deactivated through appropriate XML files. Figure 3 presents an extract of such a Darwin

configuration file. The parameters that configure the DIAMOUSES server extensions use the prefix "DMSS."

Table 2. The Third-Party Libraries Used in Implementing the NMP API

Library name  Functionality                                                    URL
PortAudio     Capture and playback of audio streams                            http://www.portaudio.com/
PortMIDI      Routing of MIDI data to compliant software and hardware modules  http://www.cs.cmu.edu/~music/portmusic/
JRTPLIB       RTP communication                                                http://research.edm.uhasselt.be/~jori/page/index.php?n=CS.Jrtplib
UPnP SDK      Capture and playback of video streams                            http://upnp.sourceforge.net/

NMP API

As presented in Figure 2, a two-layered software application is executed at each player's site during the course of a synchronous NMP session. The component at the bottom layer, denoted "NMP API," is implemented as a reusable C++ API and represents the client software's core functionality. Specifically, this API configures network and hardware/driver parameters; triggers audiovisual stream capture, stream transmission to the network, and stream reception from the network; and activates or deactivates stream reproduction.

In DIAMOUSES, audio stream exchange uses the Real-time Transport Protocol (RTP) on top of the User Datagram Protocol (UDP). RTP streams may carry audio, video, and MIDI information. The audio streams are uncompressed PCM audio, sampled in stereo at 44,100 Hz with 16-bit resolution, with the possibility to raise the quality to 48,000 Hz, 32-bit stereo. The audio buffer size can be set as low as 64 samples per channel, which corresponds to a blocking delay of 1.3–1.5 msec, depending on the sampling rate. Audio compression was not originally considered in DIAMOUSES, in an attempt to eliminate processing latencies. However, our current research efforts seek to provide an appropriate audio encoding scheme that can reduce the required network bandwidth without introducing significant delays.

As for the video streams, DIAMOUSES uses Axis network cameras (http://www.axis.com), providing

low-resolution MJPEG over HTTP and MPEG-4 over RTP. We used MJPEG at a bit rate of 640 kbps. MIDI streams are captured from pluggable MIDI controllers and reproduced on MIDI sound modules, through appropriate USB-to-MIDI interfaces. Table 2 presents the third-party libraries that were used to implement the stream functionality.

Figure 4 shows a running instance of this API, using a basic GUI to illustrate this functionality. The client API does not assume a connection to any server. Therefore, it allows the communication model to be changed from star to peer-to-peer and vice versa. Although this is not evident from Figure 4, the communication model can be changed simply by defining the IP addresses of the other peers, instead of the IP address of the streaming server, as remote hosts.

The NMP API can be used for the development of GUIs that are more advanced than the one shown in Figure 4, allowing the client application to support different collaboration practices. For example, the GUI shown in Figure 5 was developed for a multipart music rehearsal. It uses the NMP API for fast exchange of live audiovisual streams, and the DIAMOUSES Collaboration Server for supporting users' collaboration through well-defined interaction practices. Such practices allow performers to see who is participating in the rehearsal, to exchange instant messages, to activate or deactivate visual communication, to start an acoustic metronome, etc. GUIs are implemented using open-source Java toolkits, and Java Native Interface (JNI) technology is used for embedding the NMP API functionality. These more advanced interfaces that require communication with the DIAMOUSES Collaboration Server form the upper component of the client software, denoted as Collaborative Session Manager in Figure 2.
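The stream parameters quoted above can be checked with two back-of-the-envelope formulas: the blocking delay of one audio buffer (buffer length divided by sampling rate) and the raw bit rate of uncompressed PCM.

```cpp
// Blocking delay of one audio buffer, in milliseconds.
double blocking_delay_ms(int buffer_samples, double sample_rate_hz) {
    return 1000.0 * buffer_samples / sample_rate_hz;
}

// Raw bit rate of uncompressed PCM, in kbps (before RTP/UDP/IP overhead).
double pcm_bitrate_kbps(double sample_rate_hz, int bits_per_sample, int channels) {
    return sample_rate_hz * bits_per_sample * channels / 1000.0;
}
```

A 64-sample buffer yields about 1.45 msec at 44.1 kHz and about 1.33 msec at 48 kHz, matching the 1.3–1.5 msec range above, while CD-quality stereo costs roughly 1,411 kbps per performer, which is why the text treats compression as a pending research direction.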


Figure 4. Basic graphical user interface demonstrating the functionality of the NMP API.

Figure 5. Graphical user interface for a rehearsal scenario.



Community Support Functions and Collaboration Server

As already stated, the DIAMOUSES Collaboration Server supports two distinct and interrelated functionalities: (1) it offers asynchronous collaboration practices through a dedicated portal; and (2) it provides performers with the session state information required by the replicated synchronized objects that allow synchronous collaboration practices to be employed. The DIAMOUSES Collaboration Server is based on a customized version of the Liferay (http://www.liferay.com) content management system, which is the primary community support system. This takes the form of a portal set-up with augmented functionality. Specifically, we have implemented a variety of customized portlets to suit various asynchronous tasks, such as registration, role-based access rights, and notification services (which extend invitations to group members and notify them of progress made).

The DIAMOUSES room (Alexandraki and Valsamakis 2009) is a useful abstraction for synchronizing activities within each of the three scenarios listed in Table 1. The primary function of DIAMOUSES rooms is to act as logical hosts of synchronous activities such as music rehearsals, live performances, and music lessons. Creating, scheduling, and declaring participation in a DIAMOUSES room is achieved through the portal. Once registered to participate in a music performance room, users are prompted to download and set up the client software, which comprises the NMP API and a GUI designed for the particular type of DIAMOUSES room. For example, for a music rehearsal room, the downloaded application will have the interface depicted in Figure 5. Another user interface, for music learning, is shown in Figure 6. This GUI uses the Collaboration Server to support collaborative editing, annotation, and highlighting of a score, as well as the sharing of other multimedia material such as audio and video clips or various electronic scores.
The editable score is replicated across the connected participants and remains synchronized at all times. Access to this object by dispersed members is coordinated by a floor manager. Once the floor is requested and occupied by a user, the object becomes editable for the specific user. The rest of the participants are informed of modifications to this object once floor control is released.

The data required for implementing the functionality of these collective practices is exchanged through a servlet deployed on the Collaboration Server. This servlet accepts HTTP and SOAP requests processed by the Apache Axis2 framework (http://ws.apache.org/axis2/). Details of this toolkit are presented in Akoumianakis et al. (2008).
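The floor-control policy described above reduces to a small state machine: at most one holder at a time, edits by non-holders rejected, and changes propagated on release. The class below is a toy sketch of that behavior under our own naming, not the DIAMOUSES implementation.

```cpp
#include <string>

// Toy floor-control manager for a shared, replicated object (e.g., a score).
class FloorManager {
public:
    // Grant the floor if it is free; return whether the request succeeded.
    bool request(const std::string& user) {
        if (!holder_.empty()) return false;  // someone else holds the floor
        holder_ = user;
        return true;
    }

    // Only the current holder may edit the replicated object.
    bool may_edit(const std::string& user) const {
        return !holder_.empty() && user == holder_;
    }

    // Release the floor; this is the point at which other participants
    // would be informed of the modifications made while it was held.
    bool release(const std::string& user) {
        if (user != holder_) return false;
        holder_.clear();
        return true;
    }

private:
    std::string holder_;  // empty means the floor is free
};
```

In a distributed setting, the Collaboration Server would arbitrate these requests centrally so that all replicas agree on who holds the floor.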

Pilot Developments and Technical Assessment

This section describes the three experiments that were outlined in the Methodology section. These experiments were used as pilots to investigate both technical efficiency and user experience. The technical assessment was based on a number of network quality-of-service (QoS) measurements. The measuring process involved the transmission of scheduled ping requests, of appropriate packet sizes, sent from the server to each of the connected network nodes during the course of the experiment. The statistics that were recorded consisted of (1) the average round-trip times (Avg RTT); (2) their mean deviation (MDev RTT), which provides a representation of network jitter; (3) the percentage of lost network packets; and (4) the average outbound bandwidth of the DIAMOUSES Streaming Server. User experience, on the other hand, was assessed by interviewing the users after the completion of each scenario. The framework supports collaboration among multiple performers; however, to simplify the measuring processes, all three experiments involved only two dispersed performers.

Jazz Rehearsal Scenario

The first scenario concerns the rehearsal of a jazz duet (double bass and electric piano) performing through a 100-Mbps local-area network (LAN). The communication among the musicians, shown in Figure 7, was based on high-quality audio and low-quality video streams. Audio was exchanged through the DIAMOUSES Streaming Server, whereas the video streams were delivered directly from one musician to the other.
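The QoS statistics recorded in these experiments are straightforward to compute from the ping samples. The sketch below derives the average RTT, its mean deviation, and the packet-loss percentage; the mean deviation is computed here as the mean absolute deviation, which is one common reading of "MDev" (ping implementations may use a slightly different formula), and the function names are ours.

```cpp
#include <cmath>
#include <vector>

// Summary statistics in the style of the recorded QoS measurements.
struct QoSStats {
    double avg_rtt_ms;
    double mdev_rtt_ms;      // mean absolute deviation, a proxy for jitter
    double packet_loss_pct;
};

// Compute the statistics from the RTTs of answered probes plus a count
// of probes that received no reply.
QoSStats compute_stats(const std::vector<double>& rtts_ms, int lost_probes) {
    QoSStats s{0.0, 0.0, 0.0};
    int sent = static_cast<int>(rtts_ms.size()) + lost_probes;
    if (sent > 0) s.packet_loss_pct = 100.0 * lost_probes / sent;
    if (rtts_ms.empty()) return s;
    for (double r : rtts_ms) s.avg_rtt_ms += r;
    s.avg_rtt_ms /= rtts_ms.size();
    for (double r : rtts_ms) s.mdev_rtt_ms += std::fabs(r - s.avg_rtt_ms);
    s.mdev_rtt_ms /= rtts_ms.size();
    return s;
}
```

Feeding in the scheduled ping responses collected during an experiment yields figures directly comparable to those reported for each scenario.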

Alexandraki and Akoumanakis


Figure 6. Graphical user interface for a learning scenario.

Table 3 summarizes the network measurements taken during the jazz rehearsal scenario. Because latency and packet loss are insignificant, it can be discerned that the DIAMOUSES framework performs fairly well in local-area networks. With respect to user experience, both performers were satisfied with the result, and they felt that overall they could perform as well through DIAMOUSES as when they are collocated. When the live audio streams were upgraded from CD quality (44.1 kHz, 16-bit resolution, stereo) to a 48-kHz sampling rate with 32-bit resolution in stereo, both performers felt a significant improvement in the way they could sense each other's performance. These particular performers did not need visual communication through video; they said that they often keep their eyes closed when performing, which helps them concentrate on their musical output. They did not use the score-scrolling functionality, nor did they require a metronome.
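The bandwidth implications of this quality upgrade are straightforward to quantify for uncompressed PCM streams. This is a back-of-the-envelope sketch that ignores packet headers and any coding the system may apply:

```python
def pcm_bitrate_mbps(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM bit rate in Mbps, ignoring packetization overhead."""
    return sample_rate_hz * bits_per_sample * channels / 1e6

cd_quality = pcm_bitrate_mbps(44100, 16, 2)   # CD quality: ~1.41 Mbps
upgraded = pcm_bitrate_mbps(48000, 32, 2)     # 48 kHz, 32-bit: ~3.07 Mbps
```

The upgrade roughly doubles the per-stream bandwidth, which is negligible on a 100-Mbps LAN but would matter on the ADSL link used in the third scenario.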


Computer Music Journal

Figure 7. Network diagram for the jazz rehearsal scenario.

Table 3. Summary of QoS Measurements for the Jazz Rehearsal Scenario

                               Audio (CD quality)   Video
Packet Size (bytes)            268                  1036
Avg RTT (msec)                 0.340                0.668
MDev RTT (msec)                0.250                0.159
Avg Packet Loss (%)            0.000                0.885
Avg Outbound Server Bandwidth  5.82 Mbps

Live Electroacoustic Music Performance Scenario

The second scenario involved a structured improvisation of two electroacoustic music performers connected through a wireless LAN, with a live broadcast to an audience of DVB-T viewers. The musicians communicated through MIDI streams and low-quality video. Each performer used a keyboard controller connected to a laptop running a Max/MSP patch. The keyboard controlled the patch, which applied extensive sound-manipulation and processing techniques. MIDI data was sent from the Max/MSP patch to a DIAMOUSES client laptop through appropriate USB-to-MIDI interfaces, and the client laptop transmitted and received the MIDI streams communicated from and to the Max/MSP patch, respectively. As shown in Figure 8, the communication with the DVB-T Broadcast Center was unidirectional, carrying two MIDI streams and two low-quality video streams. To avoid the need for stream synchronization at the receiving end, these four streams were multiplexed at the streaming server before being sent to the DVB-T platform. The two servers were connected through the GRNET2 (http://www.grnet.gr) wide-area network (WAN), a high-speed (up to 2.5 Gbps) optical-fiber network connecting research and academic institutions in Greece. Because the terrestrial transmission of digital TV does not inherently offer a return channel, viewers could not interact with the content of the broadcast stream. However, some interactivity was granted to the broadcaster: during the live broadcast, a dedicated application running on the "Multimedia Server" allowed the video display to be switched from showing one performer to showing the other or both of them.
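The server-side multiplexing step can be illustrated with a simple timestamp-ordered packet interleaver. The packet layout (timestamp, stream ID, payload) is our own simplification for the sketch, not the actual DIAMOUSES wire format:

```python
import heapq

def multiplex(*streams):
    """Merge timestamped packet streams into a single sequence ordered
    by timestamp, tagging each packet with the index of its source
    stream so the receiver can demultiplex without re-synchronizing."""
    heap = []
    for stream_id, packets in enumerate(streams):
        for timestamp, payload in packets:
            heapq.heappush(heap, (timestamp, stream_id, payload))
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

Multiplexing the two MIDI and two video streams this way yields one stream whose packets arrive in presentation order, which is why the DVB-T receiver needs no further synchronization.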


Figure 8. Network diagram for the live electroacoustic music performance scenario.

Table 4. Summary of QoS Measurements for the Live Electroacoustic Performance Scenario

                               MIDI        Video
Packet Size (bytes)            268         1036
Avg RTT (msec)                 2.987       0.666
MDev RTT (msec)                3.000       0.163
Avg Packet Loss (%)            0.000       2.238
Avg Outbound Server Bandwidth  5.97 Mbps

Table 4 summarizes the measurements taken from the performer who was sending MIDI to the server through the wireless connection (the performer on the right in Figure 8). The required outbound bandwidth of the server was comparable to that of the first scenario, owing both to the video transmitted through the server and to the fact that the server was sending the combined performance of the two musicians to a third node, i.e., the multimedia server of the DVB-T platform. A further conclusion is that although the wireless network did not affect the quality of the delivered MIDI streams (no packet loss), the latency introduced in the delivery of the MIDI stream was significantly higher than in the first experiment.

With respect to user experience, this experiment was more challenging. The primary concern of both performers was that they were not confident that the other performer and the audience were receiving the same sound feedback that they were. This concern was related to the fact that the auditory communication was based on MIDI messages controlling parameters of each other's patch; in other words, MIDI was used as a control protocol rather than as a sound representation. In addition, both performers expressed a strong requirement for visual communication, and the physical separation and the absence of any form of feedback from the audience prevented the performers from having the feeling of a live performance.

Figure 9. Network diagram for the piano lesson scenario.

Table 5. Summary of QoS Measurements for the Piano Lesson Scenario

                               Audio (CD quality)   Video
Packet Size (bytes)            268                  1036
Avg RTT (msec)                 63.510               38.125
MDev RTT (msec)                49.392               14.834
Avg Packet Loss (%)            3.984                12.754
Avg Outbound Server Bandwidth  4.20 Mbps

Piano Lesson Scenario

The third scenario, shown in Figure 9, concerned a piano lesson over the Internet. Although the teacher and the student were in fact playing electronic stage pianos, they could have been playing any acoustic instrument, because audio data was transmitted rather than MIDI. The communication between the music teacher and her student was based on an ADSL connection reaching 24 Mbps, through which CD-quality audio and low-quality MJPEG video were transmitted. Network measurements are summarized in Table 5. It appears that a commonly used network infrastructure introduces significant latency and data loss. The latency of 63 msec round trip (31 msec one way) in the transmission of the audio streams is slightly higher than the EPT (described earlier under "Realistic vs. Non-Realistic NMP Systems"). Furthermore, the packet loss

in the audio streams resulted in considerable sound distortion, which prevented the student and the teacher from clearly perceiving one another's interpretation of the piece. In contrast, the video distortion introduced by network packet loss was tolerable. This scenario differed significantly from the other two in terms of the collaboration practices involved: the two musicians did not attempt to perform the piece simultaneously until they were asked to do so. In normal, collocated piano lessons, the teacher and the student are unable


to simultaneously perform the same piece while watching each other, but in this scenario they both discovered that simultaneous performance can greatly help student progress. However, due to high latency and jitter in the communication, the teacher noticed that the tempo of the student's performance was unstable, and she was unsure whether this was because the student occasionally lost the tempo or because of network lags. When using the DIAMOUSES metronome, the teacher and the student felt that they could more easily lock to a common tempo. The score-scrolling and annotation functions were greatly appreciated. In particular, the two participants felt that the GUI improved their communication, as they could simultaneously see the music score and each other's video image on the same display, although they felt that the coverage of the video camera was limited. This type of collaboration cannot easily be achieved by combining different software applications, which is nowadays the most common approach in network performances.

Consolidation of Recent Experience

Our pilot scenarios and their assessment thus far indicate that DIAMOUSES offers primary technical improvements for networked music performance as well as secondary benefits driving ongoing and future research. Regarding the former, by providing an integrated frame of reference, DIAMOUSES alleviates the challenges resulting from the lack of interoperability between networked music performance tools. At present, network music performances are typically facilitated through a collection of different tools enabling communication through different sensory modalities and different collaboration practices. DIAMOUSES integrates such capabilities in a single customizable framework, one that is flexible enough to support different music performance scenarios and abstract enough to allow co-engagement in music among different user perspectives (composers, performers, and audience).
In terms of secondary contributions and benefits, DIAMOUSES attempts to provide insights into bridging the divide between community management functions (i.e., creating and maintaining community, and sharing objects, artifacts, and social interactions) and the actual practice of networked music performance (making sense of music materials, performing music). This makes it possible to codify, assess, and capitalize upon individual performances, but also to appropriate the collective intelligence of the group as it emerges during asynchronous and synchronous collaboration sessions. DIAMOUSES offers various mechanisms for articulating collective intelligence of this type. For instance, during asynchronous collaboration the room (in the portal) hosts the group's social interaction and history of co-engagement, typically codified as shared music scores, prior rehearsals, instructions, background and reference material, and so on. In synchronous sessions of networked music performance, many of these artifacts are rendered into the performers' client toolkit for reference and reflection. Additionally, as the music score is replicated across sites in synchronous sessions, its online manipulation by members represents a socially reconstructed resource that drives the collective achievement (i.e., the rehearsed music performance) of the participants. Phrased differently, DIAMOUSES advocates the notion of networked music performance as a socially relevant and "living" form of cultural expression (Tanaka 2006), one that is informative of historical context as well as of intellectual, emotional, and physical interaction of which we might otherwise not have been fully aware (Sterne 2006). Although our recent evaluation efforts were aimed at verifying the technical feasibility of DIAMOUSES, and thus do not offer substantial insight into social and cognitive issues, ongoing research seeks to provide more elaborate accounts of how the collective intelligence of networked music performers drives their physical actions as well as their online behavior.

Discussion and Concluding Remarks

NMP is a multifaceted endeavor with many open research challenges across various disciplines. In this article, we have attempted to portray the


current status of NMP research and to sketch current challenges and new perspectives as revealed through collaborative research and development. The development of the DIAMOUSES framework, presented in the main body of the article, provided the baseline for experimenting with different NMP scenarios, each posing essentially different requirements. Empirical evidence indicates both the technical capability of this framework to cope effectively with the designated requirements and its scalability to facilitate novel NMP contexts. Consequently, the contributions of the present work can be summarized as follows.

First, with regard to network topologies, DIAMOUSES presents the possibility of adopting a hybrid architectural model that allows different network nodes to exchange live streams either directly (peer-to-peer) or through a streaming server (star topology), in which audio mixing may be activated or deactivated on occasion. Going even further, in DIAMOUSES a given performing session may adopt the peer-to-peer communication model for one type of stream (e.g., video) and the star topology for another (e.g., audio). In addition, some of the connected musicians may communicate their streams directly while others communicate through the server. This allows exploration of the dynamics of music performance ensembles through various network communication schemes.

Second, with respect to user interaction and virtual collaboration, the focus of DIAMOUSES is to allow integration of various toolkits, suitably extended beyond their original capabilities in order to support the interactive manifestation of collaborative music objects.
These objects are designed to enable making sense of the shared virtual space as well as to facilitate the enactment, transmission, processing, and reconstruction in virtual settings of a variety of established music practices (e.g., collaborative score editing in music composition, possibly motion capture and gesture analysis in musical instrument learning, pitch recognition for student evaluation, instant messaging, capturing or exchanging multimedia content, and participation in discussion forums). To this end, an open and reusable API was developed to implement the required functionality

for timely communication of audiovisual streams. This functionality may be integrated with existing toolkits to accommodate the requirements of different NMP scenarios.

In light of this, DIAMOUSES makes a third and distinct contribution to the existing literature on NMP: the advancement of a novel community-based model for appropriating the benefits of collective intelligence during music performance. As discussed by Alexandraki and Valsamakis (2009), the popular media for creating and sustaining music communities are social networking sites and videoconferencing platforms (Anderson and Ellis 2005). Nevertheless, such settings offer very limited insight into the actual practice of collaborative music performance. DIAMOUSES makes inroads toward an alternative model, framing NMP as a distributed collective practice in virtual settings by bridging the gap between community management (fostered by the DIAMOUSES portal) and practice management (supported by the downloadable application suites).

Finally, it is important to reflect upon our experience with user-based assessments, which have revealed some of the new challenges being addressed in ongoing research and development efforts. One issue that stands out immediately is devising schemas for expressing semantic information that will reveal session state and network configurations in distributed music performance groups. The possibility of exchanging such information in a uniform manner through open standards will ease the integration of upcoming software technologies in NMP applications. Moreover, querying and retrieving shared audiovisual content using such semantic descriptions and metadata will enhance the creativity of partners engaged in NMP scenarios.
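The hybrid architectural model described above, in which the topology is chosen per stream type within a single session, can be captured by a small routing rule. The names and API here are illustrative, not the DIAMOUSES implementation:

```python
# Illustrative sketch of a hybrid topology: each stream type in a session
# is independently assigned either peer-to-peer or star routing.
PEER_TO_PEER = "peer-to-peer"
STAR = "star"

class SessionTopology:
    def __init__(self, default=STAR):
        self.default = default
        self.rules = {}          # stream type -> topology

    def assign(self, stream_type, topology):
        self.rules[stream_type] = topology

    def destinations(self, stream_type, sender, peers, server):
        """Where a node should send a stream: directly to every other
        peer (peer-to-peer) or to the streaming server (star)."""
        if self.rules.get(stream_type, self.default) == PEER_TO_PEER:
            return [p for p in peers if p != sender]
        return [server]
```

With this rule, the jazz scenario's configuration (audio through the server, video peer-to-peer) is simply two `assign` calls, and per-node exceptions could be layered on top.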

Acknowledgments

The DIAMOUSES project was implemented in the context of the Regional Operational Programme of Crete, and it was co-funded by the European Regional Development Fund (ERDF) and the Crete Region, coordinated by the General Secretariat


for Research and Technology of the Ministry of Development of Greece. We would especially like to thank our former students Panayotis Koutlemanis, Petros Gasteratos, Giannis Milolidakis, Dimitris Kotsalis, and George Vellis for their involvement in the implementation work.

References

Akoumianakis, D., et al. 2008. "Distributed Collective Practices in Collaborative Music Performance." Proceedings of the 3rd International Conference on Digital Interactive Media in Entertainment and Arts. Athens: Association for Computing Machinery, pp. 368–375.

Alexandraki, C., and I. Kalantzis. 2007. "Requirements and Application Scenarios in the Context of Network Based Music Collaboration." Proceedings of the AXMEDIS 2007 Conference. Florence: Firenze University Press, pp. 39–46.

Alexandraki, C., et al. 2008. "DIAMOUSES: An Experimental Platform for Network-Based Collaborative Musical Interactions." Proceedings of the ICEIS 2008 International Conference on Enterprise Information Systems. Porto: INSTICC Press, pp. 30–37.

Alexandraki, C., and N. Valsamakis. 2009. "Enabling Music Performance Communities." In D. Akoumianakis, ed. Virtual Community Practices and Social Interactive Media: Technology Lifecycle and Workflow Analysis. New York: IGI Global Inc., pp. 376–397.

Anderson, A. J., and A. Ellis. 2005. "Desktop Video-Assisted Music Teaching and Learning: New Opportunities for Design and Delivery." British Journal of Educational Technology 36(5):915–917.

Apple Inc. 2009. "Open Source – Server – Streaming Server." Available on-line at http://developer.apple.com/opensource/server/streaming/index.html. Accessed 28 February 2009.

Barbosa, A. 2003. "Displaced Soundscapes: A Survey of Network Systems for Music and Sonic Art Creation." Leonardo Music Journal 13:53–59.

Cáceres, J.-P., and C. Chafe. 2009. "JackTrip: Under the Hood of an Engine for Network Audio." Proceedings of the 2009 International Computer Music Conference. San Francisco, California: International Computer Music Association, pp. 509–512.

Carôt, A., and C. Werner. 2007. "Network Music Performance—Problems, Approaches and Perspectives." Proceedings of the Music in the Global Village Conference. Available on-line

at http://globalvillagemusic.net/2007/wp-content/uploads/carot_paper.pdf. Accessed 18 February 2009.

Chafe, C., et al. 2004. "Effect of Time Delay on Ensemble Accuracy." Proceedings of the 2004 International Symposium on Musical Acoustics. Nara, Japan, pp. 207–211.

Didkovsky, N., and G. Hajdu. 2008. "MaxScore: Music Notation in Max/MSP." Proceedings of the 2008 International Computer Music Conference. San Francisco, California: International Computer Music Association, pp. 483–486.

Föllmer, G. 2005. "Electronic, Aesthetic and Social Factors in Net Music." Organised Sound 10(3):185–192.

Freeman, J., et al. 2005. "Auracle: A Voice-Controlled Networked Sound Instrument." Organised Sound 10(3):221–231.

Hajdu, G. 2005. "Quintet.net: An Environment for Composing and Performing Music on the Internet." Leonardo 38(1):23–30.

Hajdu, G. 2006. "Automatic Composition and Notation in Network Music Environments." Proceedings of the 2006 Sound and Music Computing Conference. Marseille: Centre National de Création Musicale, pp. 109–114.

Jensenius, A. R., R. I. Godøy, and M. M. Wanderley. 2005. "Developing Tools for Studying Musical Gestures within the Max/MSP/Jitter Environment." Proceedings of the 2005 International Computer Music Conference. San Francisco, California: International Computer Music Association, pp. 282–285.

Krämer, U., et al. 2007. "Network Music Performance with Ultra-Low-Delay Audio Coding under Unreliable Network Conditions." Proceedings of the 123rd Audio Engineering Society Convention. New York: Curran Associates, pp. 338–348.

Kurtisi, Z., and L. Wolf. 2008. "Using WavPack for Real-time Audio Coding in Interactive Applications." Proceedings of the 2008 International Conference on Multimedia & Expo (IEEE ICME 2008). Hannover: IEEE Publishing, pp. 1381–1384.

Lazzaro, J., and J. Wawrzynek. 2001. "A Case for Network Musical Performance." Proceedings of NOSSDAV '01 [International Workshop on Network and Operating Systems Support for Digital Audio and Video]. Port Jefferson, New York: Association for Computing Machinery, pp. 157–166.

Ng, K., and P. Nesi. 2008. "I-Maestro Framework and Interactive Multimedia Tools for Technology-Enhanced Learning and Teaching for Music." Proceedings of the Fourth International Conference on Automated Solutions for Cross Media Content and Multi-Channel Distribution,

Axmedis 2008. Florence: Firenze University Press, pp. 266–269.

Pallis, E., et al. 2004. "Digital Switchover: An Alternative Solution Towards Broadband Access for All Citizens." Proceedings of the 2004 International Conference on E-Business and Telecommunication Networks. Setúbal, Portugal: INSTICC Press, pp. 31–40.

Renaud, A., A. Carôt, and P. Rebelo. 2007. "Networked Music Performance: State of the Art." Proceedings of the AES 30th International Conference. Saariselkä, Finland: Audio Engineering Society. Available on-line at www.aes.org/e-lib/browse.cfm?elib=13914.

Sawchuk, A. A., et al. 2003. "From Remote Media Immersion to Distributed Immersive Performance." Proceedings of the ACM SIGMM 2003 Workshop on

Experiential Telepresence. New York: ACM Press, pp. 110–120.

Sterne, J. 2006. "The MP3 as Cultural Artifact." New Media and Society 8(5):825–842.

Tanaka, A. 2006. "Interaction, Experience, and the Future of Music." In K. O'Hara and B. Brown, eds. Consuming Music Together: Social and Collaborative Aspects of Music Consumption Technologies. Dordrecht, Netherlands: Springer, pp. 267–288.

Wright, M. 2005. "Open Sound Control: An Enabling Technology for Musical Networking." Organised Sound 10(3):193–200.

Zatorre, R. J., J. L. Chen, and V. B. Penhune. 2007. "When the Brain Plays Music: Auditory–Motor Interactions in Music Perception and Production." Nature Reviews Neuroscience 8(7):547–558.
