11th IEEE Symposium on Object Oriented Real-Time Distributed Computing (ISORC)

Realization of an Adaptive Distributed Sound System Based on Global-time-based Coordination and Listener Localization

Emmanuel Henrich¹, Panasonic Avionics Corporation, USA ([email protected])
Juan A. Colmenares², UC Irvine, USA ([email protected])
Keizo Fujiwara¹, Microsoft Japan ([email protected])
Chansik Im¹, Intel Corporation, USA ([email protected])
K. H. (Kane) Kim, UC Irvine, USA ([email protected])
Liangchen Zheng, UC Irvine, USA ([email protected])
Abstract
This paper discusses the benefits of exploiting 1) the principle of global-time-based coordination of distributed computing actions (TCoDA) and 2) a high-level component-/object-based programming approach in developing real-time embedded computing software. The benefits are discussed in the context of a concrete case study. A new major type of distributed multimedia processing applications, called Adaptive Distributed Sound Systems (ADSSs), is presented here to show the compelling nature of the TCoDA exploitation. High-quality ADSSs impose stringent real-time distributed computing requirements; in particular, they require a global-time base with precision better than 100 µs. The TMO programming scheme and associated tools are shown to be highly useful for efficient implementation. In addition, a prototype TMO-based ADSS has been developed and its most important quality attributes have been empirically evaluated. The prototype ADSS has also turned out to be a cost-effective tool for assessing the quality of service of a TMO execution engine.

Keywords: TCoDA, TMO, real-time, multimedia, global time, quality of service, distributed computing, synchronous playback, software components

¹ Participated in the work reported here when they were students in the DREAM Laboratory, EECS Department, UC Irvine.
² Also with the Applied Computing Institute, School of Engineering, University of Zulia.

978-0-7695-3132-8/08 $25.00 © 2008 IEEE. DOI 10.1109/ISORC.2008.87

1 Introduction

As the variety and sophistication of multimedia processing applications have grown steadily in the past decade, the concern with the quality of service (QoS) of these applications has also been growing [Fah00, Cha03]. Customers' awareness of application-level QoS has become quite keen.

1.1 TCoDA

One engineering principle that has great potential for positive impacts on the QoS of quite a few major types of multimedia applications is the principle of global-time-based coordination of distributed computing actions (TCoDA) [Kop97]. A global-time base should provide information on the current time such that it can be referenced from anywhere within a distributed computing (DC) system with well-understood error bounds. A global-time base is an abstract entity, which is approximated by local clocks in DC nodes with a known error bound [Kop97]. To keep the approximation errors within a target bound, the distributed clocks are synchronized periodically. When TCoDA is exploited, a module X.1 running on node X may be designed to take an action AX.1 during the time-window [t1, t2], while a module Y.2 running on node Y may be designed to take an action AY.2 during [t3, t4], if a certain logical condition is satisfied before t3. For example, two DC nodes can be designed to simultaneously start a certain course of actions at 10 am, without exchanging any synchronization messages, if they observe certain conditions by 9:59 am. Regardless of whether the two nodes are separated by one meter or 100 miles, they will take the actions in good synchrony at 10 am. The TCoDA principle was advocated by Hermann Kopetz more than 20 years ago [Kop87, Kop97], but only a small fraction of the potential of TCoDA in the field of DC has been realized. Compared to asynchronous hand-shaking message-based coordination, the TCoDA approach can be significantly more efficient and reliable, especially in environments with large communication delays. Establishing a global-time base of sufficiently high precision for various applications is no longer difficult, and the costs are expected to fall continuously [Kop87, Kop97, Kim02].
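To make the coordination pattern concrete, it can be sketched as follows. This is a minimal illustration with names of our own choosing, not code from the TMO framework; a real implementation would read a synchronized global-time service rather than the local clock, which stands in for it here.

```python
import time

def global_time() -> float:
    """Stand-in for a synchronized global-time base (here: the local clock)."""
    return time.time()

def tcoda_action(condition_met_by, start_at, condition, action):
    """Take a pre-agreed action at a pre-agreed global time, with no messages.

    The enabling condition must be observed before the agreed deadline
    (e.g., by 9:59 am); the action is then taken at the agreed start time
    (e.g., 10 am) on every node that observed the condition.
    """
    # 1) Watch for the enabling condition until the agreed deadline.
    while global_time() < condition_met_by:
        if condition():
            break
        time.sleep(0.001)
    else:
        return False  # condition not satisfied in time: take no action
    # 2) Wait silently (no synchronization messages) until the start time.
    while global_time() < start_at:
        time.sleep(0.001)
    action()  # all participating nodes reach this point in good synchrony
    return True
```

Two nodes running this code with well-synchronized clocks act together within the clock-precision bound, whether they are one meter or 100 miles apart.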
1.2 Overview of the ADSS developed

Figure 1. A configuration of a distributed sound system: a controller node with a global-time reference and a GUI is connected via a LAN to Mini-ITX player nodes, each equipped with a microphone and a speaker playing one track of a song (guitar.wav, piano.wav, bass.wav, or drum.wav).

The TCoDA principle has been exploited in recent years in several major types of multimedia processing applications (e.g., audio and video streaming [KDH02, Kim05b] and on-line games [KSJ05]). We also applied it in developing a digital music ensemble application [Kim05], also called a distributed sound system (DSS). In this application a set of single-board computers equipped with speakers acts as a music band in which each member plays a different instrument or vocal part of a song (see Figure 1). The player nodes receive audio streams supplied by the controller node, which possesses massive data storage. The key factor determining the quality of this ensemble is the degree of synchrony among the players' actions, which is essential to the harmony of the music.

In this paper we present a major extension of the digital music ensemble application in which the exploitation of the TCoDA principle is even more compelling. This extension involves adding a microphone to each player node and operating the network of microphone nodes to localize the origin of a special sound. The approach of using a microphone array for localizing a sound source was discussed in [Bra97], but that approach involved a centralized computer to which an array of microphones was connected as peripherals. Here we use multiple computers connected via an Ethernet local area network (LAN). One advantage of this DC approach is the potential for adopting wireless LAN technologies once their QoS becomes sufficiently high, which will enable elimination of all cables running across the room. Sound-source localization is performed by comparing the times at which the sound reaches the different microphone nodes. Considering the relatively high speed of sound (343 m/s, or roughly 1.1 ft/ms), it is evident that a global-time base of high precision (at least 100-microsecond-level precision) is needed to achieve sound-source localization with one-foot-level precision.

The microphone network can be used to localize a target listener who signals his/her position by generating a special sound, such as a hand clap. It is then possible to compensate for the undesirable effects of sound propagation in the air, e.g., attenuation and propagation delay, for this particular target listener. Thus, by integrating the microphone network with the existing music ensemble application, we can realize a system that offers a highly optimized music experience to the listener; that is, the application QoS is increased substantially. The resulting system is called here an adaptive distributed sound system (ADSS). The ADSS can be viewed as a new major type of multimedia processing application. The important measures of QoS for this application are: 1) the degree of synchrony among the players, 2) the latency of the music playback incurred after the user orders a new music play, and 3) the precision of the sound localization.

In order to facilitate exploitation of TCoDA in efficient manners, research is also needed to establish effective programming models, associated application programming interfaces (APIs), and execution engines. A concrete programming model for facilitating the TCoDA design paradigm, formulated by co-author Kane Kim together with his research collaborators, is the TMO (Time-triggered Message-triggered Object) programming and specification scheme [Kim97, Kim00]. TMO combines the complexity-management benefits of the object-oriented structuring paradigm with the ability to explicitly specify temporal constraints, in natural forms, in terms of global time. A TMO execution engine consists of hardware, a widely used operating system (OS) kernel, and an adaptation of TMOSM (TMO Support Middleware), a middleware model devised to support TMOs. TMO execution engines (available at http://dream.eng.uci.edu/TMOdownload) have been developed on the basis of several major OS kernel platforms (e.g., Windows XP, Windows CE, and Linux) [Kim02, Jen07]. An execution engine honors the temporal-constraint specifications by managing execution resources judiciously. In addition, the TMOSL (TMO Support Library), consisting of C++ classes that wrap the services of the middleware and collectively serve as an approximation of an idealistic TMO programming language, has been developed. The TCoDA approach can be easily implemented by using the TMO programming scheme. Several types of multimedia applications have been developed in the form of TMO networks to validate the TCoDA approach and demonstrate its effectiveness [KDH02, Kim05, Kim05b, KSJ05].

The ADSS discussed in this paper has turned out to be a cost-effective means of evaluating the QoS of the middleware supporting TCoDA; the quality of the ADSS is tightly tied to the QoS of the middleware. Among the most important factors impacting the degree of achievable synchrony of the players and the precision of the sound localization is the precision of the global-time base established across the network.

The paper starts with a brief discussion of how the TCoDA principle is exploited in an important distributed multimedia processing application, the ADSS. Section 3 describes the various techniques used for the realization of the ADSS. Section 4 provides a brief overview of the TMO programming scheme and a description of an ADSS implementation as a TMO network; the performance of the system is also evaluated and discussed in Section 4. The paper concludes in Section 5.
2 Exploitation of TCoDA in a new major type of distributed multimedia processing applications

2.1 Digital music ensemble via a speaker network

Ideally, the players in a digital music ensemble must be perfectly synchronized and coordinated. The user of an ensemble application must be able to choose a song from a list, play it back, pause it, stop it, and even rotate the players. Rotating the players means that each player node is assigned to play an instrument, vocal part, or channel of a song that was previously assigned to another node. In addition, the music must start playing with the shortest practical delay after the user gives the play command. Rotating the players must also be done with the shortest practical delay and without playback interruption.

Instead of storing on each player node copies of the music files containing the wave-format data to be played, the files are kept only in the conductor/controller node and streamed to the player nodes at playback time. Each audio packet is sent to a player node along with its target play time (TPT). The TPT is expressed in terms of the global-time base of the system and specifies the time at which the associated music fragment must be played by the player node. More details about the implementation of the digital music ensemble application can be found in [Kim05]. This paper focuses mainly on the techniques used to implement a new extension, i.e., the microphone network.

2.2 Microphone network and a distributed sound system optimized for a particular listening location

The use of TPTs in the digital music ensemble allows the player nodes to achieve highly precise, globally synchronous playback of music. If we are to aim for another significant quality improvement of the distributed sound system (DSS), we should consider the effects of acoustic propagation in the air, because if the listener is not sitting at the center of the speaker network, the sounds from the speakers may not reach the listener's ears in well-synchronized forms. Listeners can easily notice two effects of sound propagation. First, the amplitude of a sound attenuates as the sound travels in the air: 6 dB are lost every time the distance doubles. As a result, the listener feels that the player node closer to him/her is louder, even though all nodes play at the same volume. The second effect is the propagation delay. Sound travels in the air at a speed of about 34 centimeters per millisecond. Even though the nodes play synchronously, the listener will first hear the sound produced by the speaker closest to him/her.

If the position of the listener relative to the player nodes is known, the effects of acoustic propagation can be compensated. This introduces the concept of the target listener location, at which the music sounds originating from all player nodes are perfectly synchronized and of consistent intensity. By knowing the physical characteristics of sound propagation and the distances between the player nodes and the target listener location, it is possible to determine the individual TPTs and volumes that will result in optimal synchrony of the music sounds reaching the target listener location. The distances between the player nodes and the target listener location could be measured manually. However, measurements of sound propagation delays, which we want to compensate during music playback, can be used to estimate those distances automatically.

By taking advantage of the global-time base and by equipping our player nodes with microphones, we create a network of microphones that can be used to detect special sounds and localize their origin. To localize the source of a special sound, each player node needs to issue a detection timestamp corresponding to the arrival time of the sound signal at the node. Then the player node sends its detection timestamp to the controller node. After collecting the detection timestamps from all the player nodes, the controller node determines the propagation delays between the sound source and the player nodes and finally deduces the position of the sound source.

The microphone network can use this technique to determine the locations of its own player nodes as well. This "self-localization" of the player nodes is achieved by letting each of the player nodes take a turn in generating a special sound such that three of the other player nodes can determine the location of the sound producer. The self-localization of the nodes (i.e., the calibration of the system) must occur before the localization of the target listener. The integration of the digital music ensemble [Kim05] and the microphone network results in an Adaptive Distributed Sound System (ADSS).
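The TPT mechanism described above can be sketched as follows. This is a minimal sketch under assumptions of our own, not the actual ADSS code: the controller stamps each audio fragment with a global-time TPT, and every player releases each fragment when the global clock reaches its TPT, so playback is synchronous without any player-to-player handshaking.

```python
import heapq
import time

def stamp_fragments(fragments, first_tpt, fragment_dur):
    """Controller side: attach a target play time (TPT) to each fragment."""
    return [(first_tpt + i * fragment_dur, frag) for i, frag in enumerate(fragments)]

def play_stamped(stamped, clock=time.time):
    """Player side: emit each fragment when the global time reaches its TPT."""
    out = []
    heap = list(stamped)
    heapq.heapify(heap)              # packets may arrive out of order
    while heap:
        tpt, frag = heapq.heappop(heap)
        while clock() < tpt:         # wait for the globally agreed play time
            time.sleep(0.0005)
        out.append(frag)             # stands in for the audio output device
    return out
```

Every player that holds the same stamped fragments emits them at the same global times, to within the precision of the global-time base.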
3 Techniques for realization of an adaptive distributed sound system

This section assumes that the ADSS consists of one controller node and four player nodes, all connected in a LAN. Each player node is equipped with an identical microphone and an identical set of speakers. This corresponds to the smallest configuration. Further, we assume that a player node, its microphone, and its speakers are in the same location (i.e., their coordinates are the same).

3.1 Sound detection and origin localization

The microphone network performs two functions under the control of the user: 1) determining the positions of its player nodes (i.e., self-localization), and 2) localizing the target listener. To realize these two functions, the following four tasks need to be integrated into the existing music ensemble application.

3.1.1 Sound capturing and detection timestamps

Immediately after the player nodes are ordered by the controller node to go into the detection mode, each player starts capturing audio data at a rate of 22,050 samples per second and timestamps the precise moment at which it initiated its audio capture. The players then periodically read the captured audio data and analyze them in real time until they detect a special sound signal.

The easiest way to produce a detection timestamp would be to check the current global time as soon as the application detects a special sound signal in the captured audio data. Unfortunately, the accuracy of timestamps produced with this approach would be quite low. This is mostly because the application cannot get 100% of the CPU time, and also because the delay between the capture of an audio sample and its processing by the application is not constant and is difficult to estimate.

The alternative solution employed in our ADSS involves the use of:
• TSSC: the timestamp generated when the audio capture started (this time is recorded by each player node with good precision), and
• NBDS: the number of audio samples captured before the detection sample that marks the beginning of the detected special sound signal.
The value of the detection timestamp DT sent to the controller node is therefore:
DT = TSSC + (NBDS / SamplingRate)
This solution assumes that the audio sampling rate of the sound card capturing the data is exactly 22,050 Hz. In reality this rate may vary slightly from one player to another, which can somewhat affect the accuracy of the detection timestamp. Overall, this solution has been found to offer good performance without being too complicated.

The real-time analysis of the captured audio data involves simple signal-processing operations aimed at reducing the impact of noise. The signal-processing algorithms and integration filters have been fine-tuned to detect high-intensity, narrow-frequency-band sounds such as clapping sounds, and to ignore ambient noise. After identifying a series of audio samples appearing to correspond to a clapping sound, the algorithm locates the first audio sample of that sound. This sample is then used for the calculation of the detection timestamp.

3.1.2 Sound signaling of the player nodes during self-localization

The signal used here is a short mono-frequency sound. The corresponding audio data is stored on each of the player nodes before run time and played when the controller node gives the order. The controller node starts the self-localization of the nodes by sending to all player nodes the order to start listening, i.e., to start capturing audio data. Shortly thereafter, the controller node commands one of the players to emit the signal. Once the controller node has received a detection timestamp from each of the players, it repeats the same procedure, asking the next player node to signal. Note that the player node that emits the signal needs to detect its own signal and issue a timestamp as well.
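The timestamping scheme of Section 3.1.1 can be sketched as follows. The variable names mirror TSSC and NBDS but are our own, and the simple threshold test stands in for the actual clap-detection filtering described there.

```python
SAMPLING_RATE = 22_050  # samples per second, as assumed by the ADSS

def detection_timestamp(t_ssc, samples, threshold):
    """Return DT = TSSC + NBDS / SamplingRate for the first loud sample.

    t_ssc is the global time at which audio capture started; NBDS is the
    index of the first sample whose magnitude crosses the threshold.
    Deriving DT from the sample index avoids the jitter of reading the
    clock at processing time.
    """
    for n_bds, s in enumerate(samples):
        if abs(s) >= threshold:      # crude stand-in for clap detection
            return t_ssc + n_bds / SAMPLING_RATE
    return None                      # no special sound detected
```

With a capture started at global time 10.0 s and the detection sample at index 22,050, the detection timestamp is 11.0 s, regardless of when the application got around to processing the buffer.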
3.1.3 Processing sound detection timestamps for self-localization of the player nodes

Once the controller node has collected the four sets of four self-localization timestamps, it can deduce the relative positions of the player nodes. In the first step, the distances between the nodes are determined. In Figure 2, A, B, C, D, E, and F denote those distances.

Figure 2. Geometrical representation (1): the quadrilateral formed by the player nodes N1–N4, with inter-node distances A–F and interior angles α, β, γ, θ.

Let Tij, with i, j = 1, …, 4, be the timestamp taken by player node j (Nj) as soon as it heard the signal emitted by player node i (Ni). We define Dik = Tik − Tii, where i, k = 1, …, 4. Dik is the time difference between the timestamp of node Nk and the timestamp of the signaling node Ni; Dik is thus the travel time of the special sound between nodes Ni and Nk. Then we can determine the distances between the nodes as follows:
A = [(D12 + D21)/2] × Vs
B = [(D13 + D31)/2] × Vs
C = [(D34 + D43)/2] × Vs
D = [(D24 + D42)/2] × Vs
E = [(D23 + D32)/2] × Vs
F = [(D14 + D41)/2] × Vs
where Vs is the speed of sound in the air. In our calculations we approximate the speed of sound with a constant, Vs = 343 m/s. Once the distances A, B, C, D, E, and F are known, we can determine the coordinates of all the nodes in the same reference frame (O, x, y) by triangulation.

Figure 3. Reference frame: node N1 is placed at the origin O of the (x, y) frame, with N2 on the x axis.

As shown in Figure 3, we assume that N1 is at the origin of the reference coordinate frame (i.e., N1.x = 0, N1.y = 0) and that the x axis and the vector N1N2 have the same direction. The coordinates of N2 are then (N2.x = A, N2.y = 0). By using the cosine rule, we determine the interior angles of the quadrilateral formed by the player nodes. For instance, in Figure 2 the angle β can be obtained as follows:
β = cos⁻¹[(A² + E² − B²) / (2 × A × E)]
Given the length of the side E and the angle β, the coordinates of N3 can be derived as (N3.x = N2.x − E×cos(β), N3.y = −E×sin(β)). Similarly, the angle α and the coordinates of N4 can be derived.

At this point, we have obtained a first set of coordinates for the locations of the player nodes. This set of coordinates corresponds to the quadrilateral Q1 = {N1.1, N2.1, N3.1, N4.1} shown in Step 1 of Figure 4. Q1 has been derived without taking into account the distance C. Thus, in order to balance the effects of errors in all the distance measurements, we use C to obtain the quadrilateral Q2 = {N1.2, N2.2, N3.1, N4.2} as follows (see Figure 4):
• Step 2: From locations N2.1 and N3.1, and distances C, D, and E, we obtain N4.2, the second location estimate of player node 4.
• Step 3: From N3.1 and N4.2, and distances B, C, and F, we obtain N1.2, the second location estimate of player node 1.
• Step 4: From N4.2 and N1.2, and distances A, D, and F, we obtain N2.2, the second location estimate of player node 2.

Figure 4. Determining the coordinates of the player nodes in four steps (quadrilaterals Q1 and Q2).

Finally, the location of each player node is estimated by averaging the coordinates of the node in Q1 and Q2. Note that the coordinates of N3 remain the same in both quadrilaterals. This averaging is not a perfect solution but is considered to be a reasonable heuristic for handling sensor errors, as validated to some extent through the experiments discussed in Section 4.
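A minimal sketch of these computations follows (our own code, not the authors'). It shows the pairwise-distance estimation from the timestamp matrix and the Q1 estimate of Step 1; the Q2 refinement and the final averaging are analogous.

```python
import math

VS = 343.0  # speed of sound in air, m/s (constant approximation)

def pair_distance(T, i, k):
    """Distance between nodes i and k from the 4x4 timestamp matrix T,
    averaging the two one-way travel times D_ik and D_ki."""
    d_ik = T[i][k] - T[i][i]      # travel time of node i's signal to node k
    d_ki = T[k][i] - T[k][k]      # travel time of node k's signal to node i
    return ((d_ik + d_ki) / 2.0) * VS

def q1_coordinates(A, B, D, E, F):
    """First coordinate estimate (Q1): N1 at the origin, N2 on the x axis;
    N3 and N4 placed below the axis via the cosine rule (C is unused)."""
    beta = math.acos((A*A + E*E - B*B) / (2*A*E))   # angle at N2 in N1-N2-N3
    alpha = math.acos((A*A + F*F - D*D) / (2*A*F))  # angle at N1 in N2-N1-N4
    n1 = (0.0, 0.0)
    n2 = (A, 0.0)
    n3 = (A - E * math.cos(beta), -E * math.sin(beta))
    n4 = (F * math.cos(alpha), -F * math.sin(alpha))
    return n1, n2, n3, n4
```

For four nodes placed on a square, the recovered Q1 coordinates reproduce the square (up to the chosen reference frame), which is a quick sanity check on the trigonometry.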
3.1.4 Processing of sound detection timestamps for localization of the target listener and calculation of play delays and volumes

The controller node performs this task after it has collected the detection timestamps from the player nodes. The situation is depicted in Figure 5. Computation details are omitted in this paper due to the space limit.

Figure 5. Geometrical representation (2): the sound source S(x, y) and the distances d1–d4 from S to the player nodes N1(x1, y1), …, N4(x4, y4).

Once the position of the target listener relative to the players is known, the controller node assigns a playback time delay and a volume to each player. This compensates for the differences in distance and sound propagation delay between the different players and the target listener. For the node furthest from the listener, this delay is null. For the other nodes, the delay is assigned such that the time at which a sound from any of those nodes arrives at the listener becomes the same as that for the corresponding sound from the node furthest from the listener. It can be expressed as follows:
TDi = [max(d1, …, d4) − di] / Vs
where TDi is the time delay for Ni, di is the distance from Ni to the listener, and Vs is the speed of sound. The amplitude of a sound attenuates as the sound travels in the air; 6 dB are lost every time the distance doubles. To compensate for this phenomenon and make sure that the sounds coming from the different speakers arrive at the listener with the same volume, we set:
VOLi = VOLRef − 19.93 × log10(di / max(d1, …, d4))
where VOLi is the output volume of node i in dB, VOLRef is the reference output volume for all the nodes, and di is the distance from node i to the target listener.

3.2 Clock synchronization

As mentioned before, a clock precision better than 100 µs is needed for the microphone network to achieve localization with 10-centimeter-level precision. The clock-synchronization scheme that we adopted as a part of the development of the ADSS is based on the use of message broadcasts and broadcast-reception timestamps [Kop87]. Different clocks tend to drift at different rates and thus must be periodically resynchronized; a challenge in periodic resynchronization is that it must be done while the real-time applications are executing. Middleware and OS kernels must be structured and operated in judicious manners to enable resynchronizations leading to a high-precision global-time base. Some basic techniques are discussed in [Kim02]. When this synchronization scheme is incorporated into a relatively small distributed system like the one used for the digital music ensemble (i.e., one controller node and only four players), it provides a clock precision of about 50 µs.

We refined the above clock synchronization further. History was used to compensate for the clock drift on each node. In addition, instead of a single synchronization command message, a burst of synchronization command messages was used. This allows averaging the transmission delays, thereby filtering out some of the transmission jitter. These refinements resulted in a clock-synchronization precision of about 40 µs. The measurements obtained with our prototype implementation are presented in Section 4.4.

4 Implementation of an ADSS via a high-level component-/object-oriented programming approach

4.1 Overview of the TMO programming scheme

TMO [Kim97, Kim00, Jen07] is a natural, syntactically minor, and semantically major distributed computing extension of the pervasive basic object structure. As depicted in Figure 6, the basic TMO structure consists of four parts:
• ODS-sec (Object-data-store section). This section contains the data-container variables shared among the methods of a TMO. The variables are grouped into ODS segments (ODSSs), which are the units that can be locked for exclusive use by a TMO method in execution. The access rights of TMO methods for ODSSs are explicitly specified, and the execution engine analyzes them to exploit maximal concurrency.
• EAC-sec (Environment access capability section). Contained here are "gate objects" providing efficient call paths to remote TMO methods, logical multicast channels called real-time multicast and memory-replication channels (RMMCs) [Kim00], and I/O device interfaces.
Figure 6. Basic TMO structure (adapted from [Kim97]): a TMO comprises an object data store (ODS) partitioned into ODSSs; capabilities (EAC) for accessing other TMOs and the network environment, including logical multicast channels and I/O devices; time-triggered spontaneous methods (SpMs) operating in the "absolute time domain"; and message-triggered service methods (SvMs) operating in the "relative time domain", with deadlines, service request queues, and concurrency control.

• SpM-sec (Spontaneous method section). This section contains time-triggered methods whose executions are initiated during specified time-windows. All time references are global-time references. An example of an execution-time specification is: "for t = from 10am to 10:50am, every 30min, start-during (t, t+5min) and finish-by t+10min", which has the same effect as: {"start-during (10am, 10:05am) finish-by 10:10am", "start-during (10:30am, 10:35am) finish-by 10:40am"}.
• SvM-sec (Service method section). This section contains service methods, which can be called by other TMOs and are subject to guaranteed execution time bounds (GETBs). Deadlines for the return of the service results can be specified in the clients' calls for service methods.

An execution rule called the basic concurrency constraint (BCC) is an integral part of the TMO scheme. Under BCC, the activation of an SvM triggered by a message from a client is allowed only when there is no possibility of running into data (ODSS) conflicts with any SpM execution.

Figure 7. TMO application: the Conductor TMO contains Control_SpM (period 60 ms, start time 0, deadline 30 ms) and Send_SpM (period 60 ms, start time 30 ms, deadline 30 ms), which communicate with the Play TMOs over RMMCs for commands, detection timestamps, and audio-packet streaming; each Play TMO contains Play_SpM (period 60 ms, start time 60 ms, deadline 30 ms), with ODSSs supporting music play, sound detection, sound localization, audio data management, and a state machine.
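As an illustration, the expansion of such an SpM execution-time specification into concrete activation windows can be sketched as follows. This is a hypothetical helper of our own, not part of TMOSL; times are expressed in minutes since midnight for brevity.

```python
def expand_spm_spec(t_from, t_to, every, start_within, finish_by):
    """Expand "for t = from t_from to t_to, every `every`, start-during
    (t, t+start_within) and finish-by t+finish_by" into concrete
    (start_earliest, start_latest, deadline) triples."""
    windows = []
    t = t_from
    while t <= t_to:
        windows.append((t, t + start_within, t + finish_by))
        t += every
    return windows
```

For the specification quoted above (10 am = minute 600, 10:50 am = minute 650), expand_spm_spec(600, 650, 30, 5, 10) yields the two windows {(600, 605, 610), (630, 635, 640)}, i.e., the 10:00 and 10:30 activations given in the text.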
4.2 Implementation platform

Our prototype ADSS consists of one controller node and four player nodes, all connected through an isolated 100-Mbps Ethernet LAN. The controller node is a Pentium III PC with a 500-MHz CPU and 512 MB of RAM, running Windows XP. It controls the player nodes, stores and down-streams the music files to the player nodes, and provides the user interface. The player nodes are identical Mini-ITX boards running Windows CE .NET, each equipped with identical speakers and microphones. Note that each node's microphone is placed as close as possible to the node's speakers. We have also developed a version of the ADSS in which the player nodes run on Linux; in our experience, both versions have similar performance.

4.3 Implementation and system structure

The prototype ADSS was structured as a network of one Conductor TMO and four Play TMOs, as depicted in Figure 7. The Conductor TMO has two SpMs, named Control_SpM and Send_SpM. Each Play TMO has a single SpM, named Play_SpM. The four Play TMOs differ only by their statically assigned identification numbers. All SpMs in the system have the same iteration period, 60 ms, and the same relative deadline of 30 ms. Control_SpM is the first to start in the system, at T0; it is followed by Send_SpM, which starts right after T0 + 30 ms, the completion deadline of Control_SpM. The Play_SpM methods all start at T0 + 60 ms.

Control_SpM has four main functions: 1) it receives and interprets users' commands; 2) it sends orders to the Play TMOs (e.g., play, pause, and localize); 3) it prepares the audio packets to be sent to the Play TMOs; and 4) it processes the detection timestamps to determine the position of the listener. Control_SpM is therefore a part of both the music ensemble and the microphone network.

In each Play TMO, Play_SpM supports three main functions: 1) it plays the audio packets provided by the Conductor TMO at the specified TPT; 2) it emits the special sound signal used for self-localization of the player nodes; and 3) it analyzes the audio data captured by the microphone and sends a timestamp when a special sound signal is detected.
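The per-player delay and volume assignment computed by the controller (Section 3.1.4) can be sketched as follows, implementing those formulas verbatim (Python, our own code rather than the actual Control_SpM implementation).

```python
import math

VS = 343.0  # speed of sound in air, m/s

def play_delays(dists):
    """TD_i = [max(d) - d_i] / Vs: the node furthest from the listener
    gets a null delay; nearer nodes are delayed so all sounds arrive
    at the target listener location at the same time."""
    d_max = max(dists)
    return [(d_max - d) / VS for d in dists]

def play_volumes(dists, vol_ref):
    """VOL_i = VOL_ref - 19.93 * log10(d_i / max(d)), in dB, as given in
    Section 3.1.4 (19.93 = 6 dB per doubling of distance)."""
    d_max = max(dists)
    return [vol_ref - 19.93 * math.log10(d / d_max) for d in dists]
```

For distances of 2 m and 4 m, the formula adjusts the nearer node's volume by about 6 dB relative to the furthest node, matching the stated 6 dB-per-doubling attenuation model.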
Clock Difference [microsecond]
4.4
30
Experimental Results
20 10 0 -10 -20 -30
This section presents the results of the quality assessment of our TMO-based ADSS.
0
50
100
150
200
250
300
Synchronization Round
Figure 8. Clock synchronization Sound Localization 3
2.5
2 Microphone1 Y-axis [m]
4.4.1 Startup delay and playback latency The startup delay is defined as the lapse of time between the moment when the controller node interprets the user's command for playing a song and the moment when the player nodes begin to play the first audio fragments of the song. The playback latency of the music ensemble corresponds to the lapse of time between the moment when audio fragments have been extracted from the music files by the controller node and the moment when the player nodes begin to play those fragments. In our prototype ADSS the playback latency is quite close to the startup delay. Experiments have shown that both the startup delay and the playback latency are between 130 ms and 160 ms. The integration of the microphone network and the music ensemble to form the prototype ADSS has not altered these values.
Microphone2 Microphone3
1.5
Microphone4 Sound Localization 1
Sound Source 12.7 cm radius from Sound Source
0.5
0 0
0.5
1
1.5
2
2.5
X-axis [m]
4.4.2 Degree of synchrony The clock difference between the controller node and the player nodes was measured periodically (before each synchronization round) over several minutes. In Figure 8, we observe that all the clock differences measured range between ±20 µs. From there we judge that our measurable clock precision is 40 µs. In comparison, the clock precision achieved in the earlier version of the music ensemble application [Kim05], which was relying on a simple Master-Slaves clock synchronization scheme used in older versions of TMOSM, was about 155 µs.
Figure 9. Results of the localization tests (real microphone positions, real sound-source positions, and estimated sound-source positions; axes in meters, with a 12.7 cm radius circle around each real sound-source position)

4.4.3 Precision of sound-source localization
The overall quality of the microphone network is assessed by measuring the precision of the sound-source localization that it achieves. Since the sampling rate of the sound-capturing component of a player node is 22,050 samples per second, the precision of the sound-detection time-stamping is at best 45 µs (i.e., approximately one sampling period). Also, a sound signal might not be detected by one or more player nodes. Experiments have shown that in most cases the detection uncertainty ranges between 1 and 5 samples (i.e., 45 µs to 225 µs). An experiment was carried out to measure the average and maximum errors of sound-source localization. This error is defined as the distance, expressed in meters, between the position estimated by the system and the real position measured manually with a measuring tape. The maximum error observed in this experiment is considered equivalent to the precision of sound-source localization. In this experiment, the real positions of the nodes were entered into the system manually rather than estimated with the self-localization feature. A user then produced a sound at various locations within the
boundary formed by the four microphones. For each location, the experiment was repeated five times. Figure 9 displays the real positions of the microphones, the real positions of the sound sources, and the corresponding estimated positions of the sound sources. Table 1 contains the maximum and average localization errors derived from those measurements.
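To make the localization step concrete, the sketch below estimates a source position from per-microphone detection timestamps via time differences of arrival (TDOA). This is an illustrative brute-force grid search, not the algorithm used in the prototype (closed-form TDOA methods exist; see [Bra97]); the speed-of-sound value and all function names are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value at room temperature

def detection_times(source, mics, t0=0.0):
    """Ideal detection timestamps: emission time plus propagation delay."""
    return [t0 + math.dist(source, m) / SPEED_OF_SOUND for m in mics]

def localize(mics, stamps, width=2.5, height=3.0, step=0.01):
    """Estimate the sound-source position from detection timestamps.

    Brute-force grid search: pick the grid point whose TDOAs relative to
    the first microphone best match the measured ones (least squares).
    The TDOAs do not depend on the unknown emission time t0.
    """
    tdoas = [s - stamps[0] for s in stamps]
    best, best_err = None, float("inf")
    for ix in range(int(width / step) + 1):
        for iy in range(int(height / step) + 1):
            p = (ix * step, iy * step)
            d0 = math.dist(p, mics[0])
            err = sum(((math.dist(p, m) - d0) / SPEED_OF_SOUND - t) ** 2
                      for m, t in zip(mics, tdoas))
            if err < best_err:
                best, best_err = p, err
    return best
```

Note that a one-sample (45 µs) timestamp error corresponds to roughly 343 m/s × 45 µs ≈ 1.5 cm of path-length error, which is consistent with the centimeter-scale localization errors reported in Table 1.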
Table 1. Precision of localization
Sound-source localization error:  Average 4.2 cm;  Maximum 12.7 cm

5 Conclusions
In this paper, we presented an Adaptive Distributed Sound System (ADSS) that exploits the TCoDA principle in a compelling way. The development of the ADSS has also served as a test case for the TMO programming scheme and tools, which have been developed to support the methodical and efficient design of complex real-time distributed computing (RTDC) applications. The ADSS imposes stringent RTDC requirements. Experimental results indicate that our prototype implementation of the ADSS achieves a maximum playback latency of 160 ms, a global-time precision of about 40 µs, and a sound-source localization precision below 13 cm. The measured quality of the ADSS is an indication that TMO execution engines (e.g., TMOSM [Kim99, Jen07]) now exhibit degrees of reliability and performance that are acceptable to a considerable number of practitioners in the field of embedded computing software development. However, much further research is needed to establish various conceivable support tools for the TMO programming scheme, such as techniques and tools for abstract specification, timing analysis, and optimization of execution engines.

Acknowledgments
The research reported here is supported in part by the NSF under Grant Numbers 03-26606 (ITR) and 05-24050 (CNS). Juan A. Colmenares also thanks the University of Zulia (LUZ) for supporting his participation in this research. No part of this paper represents the views and opinions of the sponsors mentioned above.

References
[Bra97] M.S. Brandstein and H. Silverman, "A Practical Methodology for Speech Source Localization with Microphone Arrays," Computer Speech and Language, 11, 1997, pp. 91-126.
[Cha03] C. Chafe, "Distributed Internet Reverberation for Audio Collaboration," AES (Audio Engineering Society) 24th Int'l Conference on Multichannel Audio, October 2003.
[Fah00] H. Fahmi, W.G. Aref, M. Latif, A. Ghafoor, P. Liu, and L. Hsu, "Distributed Framework for Real-Time Multimedia Object Communication," Proc. 3rd IEEE Int'l Symposium on Object-Oriented Real-Time Distributed Computing (ISORC 2000), March 2000, pp. 252-259.
[Gan03] S. Ganeriwal, R. Kumar, and M.B. Srivastava, "Timing-Sync Protocol for Sensor Networks," Proc. 1st Int'l Conference on Embedded Networked Sensor Systems, 2003, pp. 138-149.
[Jen07] S.F. Jenks, K.H. Kim, Y. Li, et al., "A Middleware Model Supporting Time-Triggered Message-Triggered Objects for Standard Linux Systems," Real-Time Systems, 36(1), July 2007, pp. 75-99.
[KDH02] D.H. Kim, K.H. Kim, S. Liu, and J.H. Kim, "A TMO-Based Approach to Tolerance of Transmission Jitters in Tele-Audio Services," Computer System Science & Engineering, 17(6), Nov. 2002, pp. 325-333.
[Kim97] K.H. Kim, "Object Structures for Real-Time Systems and Simulators," IEEE Computer, 30(8), 1997, pp. 62-70.
[Kim99] K.H. Kim, M. Ishida, and J. Liu, "An Efficient Middleware Architecture Supporting Time-Triggered Message-Triggered Objects and an NT-based Implementation," Proc. 2nd IEEE Int'l Symposium on Object-Oriented Real-Time Distributed Computing (ISORC'99), May 1999, pp. 54-63.
[Kim00] K.H. Kim, "APIs for Real-Time Distributed Object Programming," IEEE Computer, June 2000, pp. 72-80.
[Kim02] K.H. Kim, C. Im, and P. Athreya, "Realization of a Distributed OS Component for Internal Clock Synchronization in a LAN Environment," Proc. 5th IEEE Int'l Symposium on Object-Oriented Real-Time Distributed Computing (ISORC 2002), May 2002, pp. 263-270.
[Kim05] K.H. Kim, E. Henrich, et al., "Distributed Computing Based Streaming and Play of Music Ensemble Realized Through TMO Programming," Proc. 10th IEEE Int'l Workshop on Object-Oriented Real-Time Dependable Systems (WORDS'05), February 2005, pp. 129-138.
[Kim05b] K.H. Kim, S. Liu, M.H. Kim, and D.-H. Kim, "A Global-Time-Based Approach for High-Quality Real-Time Video Streaming Services," Proc. 7th IEEE Int'l Symposium on Multimedia (ISM'05), December 2005, pp. 802-810.
[KSJ05] S.J. Kim, F. Kuester, and K.H. Kim, "A Global Timestamp-Based Approach for Enhanced Data Consistency and Fairness in Collaborative Virtual Environments," ACM/Springer Multimedia Systems Journal, 10(3), March 2005, pp. 220-229.
[Kop87] H. Kopetz and W. Ochsenreiter, "Clock Synchronization in Distributed Real-Time Systems," IEEE Transactions on Computers, 36(8), 1987, pp. 933-940.
[Kop97] H. Kopetz, Real-Time Systems: Design Principles for Distributed Embedded Applications, Kluwer Academic Publishers, 1997, Chapter 3, pp. 45-70.