Scheduling Algorithms for Real-time Sound Generation in Virtual Environments

By Hesham Fouad

B.S. May 1984, The American University, Washington, D.C.
M.S. May 1986, The American University, Washington, D.C.
A Dissertation submitted to The Faculty of The School of Engineering and Applied Science of The George Washington University in partial satisfaction of the requirements for the degree of Doctor of Science
September 30, 1997
Dissertation directed by James K. Hahn Associate Professor of Engineering and Applied Science
Abstract

Current tools for integrating spatial sound into Virtual Environment interfaces have not adequately addressed the problem of resource management in the sound generation process. Such systems place a hard limit on the number of sounds they support or, in some cases, fail to maintain real-time evaluation rates when computational resources are exceeded. The research described in this dissertation addresses this problem. In order to establish a rigorous approach to resource management, a real-time scheduling strategy is employed. The sound generation process is expressed in terms of the hard real-time periodic job model and, because the process is monotonic, the imprecise computation model is used. Three components are necessary in order to utilize the imprecise computation model: first, an ordering scheme is established so that the active sounds in the environment can be rated according to their importance; second, a method for iteratively evaluating sounds is devised; and finally, a scheduling algorithm which minimizes the perceptible effects of degradation is established. The Least Utilization (LU) algorithm minimizes the average error given the characteristics of this problem. The behavior of the algorithm, however, produces perceptible discontinuities in the quality of the sounds during overload conditions. This behavior is quantified using a fairness index formulated in this work. The scheduling algorithm must minimize both the average error and the fairness index. These two factors, however, are inversely related; reducing the fairness index raises the average error. The Priority Allocation (PA) algorithm developed in this research minimizes the fairness index while raising the average error somewhat. A listener study was conducted in order to examine the effectiveness of these algorithms. It shows that under overload conditions, sounds generated using the PA algorithm were rated higher than those generated by the LU algorithm.
Acknowledgment I would like to thank first and foremost my wife Jo-Anne for enduring six long years of marriage to a student. Without her unwavering love and support I could not have completed this work. I would also like to thank my family for their encouragement and support. I consider myself very fortunate to have what I consider a second family at George Washington University. I thank all my friends and faculty for their help and encouragement. I also thank my advisor Dr. James Hahn for being a great teacher and coach. The research presented here was conducted at the Naval Research Laboratory. I’d like to thank Dr. Allen Duckworth for supporting this work. I would especially like to thank Dr. James Ballas for taking such an interest in this research.
Examining Committee
Dr. James K. Hahn (Advocate) Dr. John L. Sibert Dr. Bhagirath Narahari Dr. James A. Ballas Dr. Abdou Youssef
Table of Contents

ABSTRACT  ii
ACKNOWLEDGEMENT  iii
EXAMINING COMMITTEE  iv
TABLE OF CONTENTS  v
TABLE OF FIGURES  vii
CHAPTER 1 INTRODUCTION  1
1.1 MOTIVATION  2
1.2 OBJECTIVE  3
1.3 PROBLEM DOMAIN  3
1.3.1 Modeling  5
1.3.2 Sound Generation  6
1.3.3 Rendering  9
1.3.3.1 Spatialization  10
1.3.3.2 Environmental Effects  12
CHAPTER 2 RELATED WORK  14
2.1 SPATIAL SOUND GENERATION SYSTEMS  14
2.2 SOUND SYNTHESIS SYSTEMS  16
2.3 REAL-TIME SYSTEMS  21
CHAPTER 3 FRAMEWORK  30
3.1 MODELING THE SONIC ENVIRONMENT  31
3.1.1 Auditory Actors  32
3.1.2 Sound Sources  33
3.2 VAS DEVICES  34
3.3 THE VAS SCHEDULER  35
3.3.1 Parallel Architecture  37
3.3.2 Runtime Metrics  38
CHAPTER 4 REAL-TIME EVALUATION OF SYNTHETIC SOUNDS  42
4.1 PRIORITIZING SOUNDS  43
4.2 ITERATIVE EVALUATION OF SYNTHETIC SOUNDS  46
4.3 SCHEDULING ALGORITHMS  48
4.3.1 Problem Characteristics of Sound Generation  49
4.3.2 Algorithm Evaluation Test Environments  51
4.3.3 Dynamic Task Allocation  53
4.3.4 Processor Scheduling  66
4.3.5 Subjective Evaluation of Algorithms  76
CHAPTER 5 CONCLUSION  82
5.1 CONTRIBUTIONS  84
5.2 FUTURE WORK  85
BIBLIOGRAPHY  87
APPENDIX A – RATING SHEET  A-1
APPENDIX B – SUBJECTIVE ALGORITHM EVALUATION DATA  B-1
Table of Figures

FIGURE 1-1 SPATIAL SOUND GENERATION PROBLEM DOMAIN  4
FIGURE 2-1 AN EXAMPLE OF A TIMBRE TREE  19
FIGURE 3-1 THE VAS SYSTEM ARCHITECTURE  31
FIGURE 3-2 VAS SCHEDULER ARCHITECTURE  35
FIGURE 3-3 A SAMPLE OF THE SCHEDULER'S LOG  39
FIGURE 3-4 VAS'S GRAPHICAL USER INTERFACE  41
FIGURE 4-1 MEASURING THE ANGLE BETWEEN THE LISTENER'S GAZE VECTOR AND THE SOUND SOURCE  45
FIGURE 4-2 IBUFFER AFTER THREE ITERATIONS  47
FIGURE 4-3 THE INTERPOLATION STEP AND OUTPUT BUFFER. THE IBUFFER IS DEPICTED AFTER THREE ITERATIONS. THE SAMPLES IN MINI-BUFFERS 0, 2, AND 4 ARE EVALUATED. THOSE IN 1 AND 3 ARE INTERPOLATED.  48
FIGURE 4-4 BEHAVIOR OF SOUNDS IN THE CITY SONIC ENVIRONMENT  50
FIGURE 4-5 BEHAVIORS OF SOUNDS IN THE WAVES SONIC ENVIRONMENT  52
FIGURE 4-6 PROCESSOR UTILIZATION OF THE CITY, WAVES, AND HEAVY SONIC ENVIRONMENTS  53
FIGURE 4-7 THE M-D SCHEDULING ALGORITHM  64
FIGURE 4-8 PROCESSOR LOADS WITH AND WITHOUT DYNAMIC LOAD BALANCING  65
FIGURE 4-9 AVERAGE ERROR OF THE LU ALGORITHM WITH MIN. ITERATIONS VARYING BETWEEN 1 AND 6 ITERATIONS  67
FIGURE 4-10 ITERATIONS ASSIGNED TO SOUNDS IN THE WAVES SONIC ENVIRONMENT BY THE LU ALGORITHM  69
FIGURE 4-11 ITERATIONS ASSIGNED TO SOUNDS IN THE WAVES SONIC ENVIRONMENT BY THE PO ALGORITHM  70
FIGURE 4-12 THE PRIORITY ORDER ALGORITHM  71
FIGURE 4-13 THE PRIORITY ALLOCATION ALGORITHM  74
FIGURE 4-14 AVG. ERROR AND F PRODUCED BY THE LU, PO, AND PA ALGORITHMS  75
FIGURE 4-15 RESULTS OF ANOVA ANALYSIS OF THE SOUND QUALITY EVALUATION STUDY  79
FIGURE 4-16 INTERACTION BAR PLOT FOR THE ALGORITHM EFFECT  79
FIGURE 4-17 INTERACTION BAR PLOT FOR ALGORITHM EFFECT BY STIMULUS  80
FIGURE 4-18 F AND AVG. ERROR VS. RATING  81
FIGURE 4-19 CORRELATION OF RATING WITH AVG. ERROR AND F  81
Chapter 1 Introduction
Virtual Environment (VE) interfaces immerse users in interactive computer-generated worlds. The information content that a VE can convey to those users depends largely on the sensory channels utilized. While emphasis has historically been on the visual channel, researchers are beginning to recognize the importance of sound as a tool for conveying information to users and for enhancing the immersive qualities of VEs. Researchers have just recently begun concentrating on the problems of integrating sound in VE systems. Research efforts to date have concentrated mainly on techniques for localizing sounds: giving the listener the impression of a sound emanating from a particular direction. Regarding future research efforts, the National Research Council's report on Virtual Reality technology made the following recommendations:
...the accomplishments and needs associated with the auditory channel differ radically from those associated with the visual channel. Specifically, in the auditory channel, the interface devices (earphones and loudspeakers) are essentially adequate right now. In other words, from the viewpoint of synthetic environment (SE) systems, there is no need for research and development on these devices and no need to consider the characteristics of the peripheral auditory system to which such devices must be matched. What is needed, however, is better understanding of what sounds should be presented using these devices and how these sounds should be generated [1].
1.1 Motivation

In order to facilitate the exploration of the uses of sound in VE interfaces, a general framework is needed for studying sound in VEs. This work focuses on the problem of real-time sound generation within the context of such a framework. Most current sound generation systems for VE limit their representation of sound sources to digitally sampled waveforms, commonly referred to as sampled sounds. Sampled sounds are computationally inexpensive to generate, and in many cases can produce good results with little effort. Synthetic sounds, on the other hand, are computationally expensive to generate and difficult to specify. Nonetheless, the power of a general parameterized representation of sound, which synthetic sounds provide, is essential in the exploration of the sonic content of VEs. The use of synthetic sounds in VE opens new avenues of research which are not available with the use of sampled sounds. Physically based and heuristic sounds can be expressed and readily parameterized to correspond to motion events in VE. Data sonification can be achieved by mapping data to synthetic sound parameters.

The creation of complex sonic environments requires that a sound generation system support a large number of sound sources. Given limited computational resources, some form of resource management must be performed so that real-time response is guaranteed. As we shall see, current sound generation systems do not adequately address the resource management problem. The inclusion of computationally expensive synthetic sound sources exacerbates the problem, making resource management increasingly important.
1.2 Objective

The primary objective of this research is to define a framework for integrating sound into VEs that enables the exploration of the sonic content of VEs by addressing current technological barriers. The research described in this dissertation establishes a framework that can support a large number of synthetic as well as sampled sound sources while minimizing the perceptible effects of overload conditions. In order to achieve these objectives, the first research goal is to establish a rigorous approach to resource management which minimizes the perceptible effects of overload conditions. Three primary components are required in order to do this: first, a prioritization scheme has to be established among active sounds in the sonic environment. This enables sounds to be rated according to their importance, which is crucial to minimizing the perceptible effects of overload conditions. Second, a real-time scheduling approach is necessary to manage the computational resources assigned to active sounds. Finally, a method has to be established for gracefully degrading sounds during overload conditions so that perceptible discontinuities in the sound field can be avoided.

The second research goal is to establish modeling abstractions for the specification of the sonic environment. This enables the creation of complex sonic environments and facilitates the integration of image and sound in creating a coherent auditory-visual environment.
1.3 Problem Domain

Research efforts in sound for VEs have focused primarily on techniques for localizing sounds. While this is an important problem, it is certainly not the whole picture. Three basic problem areas need to be addressed by a VE sound system (fig. 1-1):
• Modeling the sonic environment. Modeling abstractions describe the static and dynamic properties of the elements comprising a sonic environment. Those abstractions should provide a rich programming toolkit for describing the environment while accommodating varying levels of expertise by the programmer. Beyond that, modeling abstractions should facilitate the mapping of elements in the sonic environment to graphical constructs forming the visual environment. This helps the programmer to establish a coherent audio-visual environment.
Figure 1-1 Spatial sound generation problem domain. (The figure shows the pipeline from control parameters through modeling the auditory world, real-time generation of sampled and synthetic sounds, and real-time rendering to the output sample stream.)
• Real-time sound generation. Sound generation is the process of evaluating the representation of a sound at fixed intervals in order to produce an audio sample stream. Given that the number of concurrently active sound sources in an environment is not known a priori, it is possible that a point is reached where the available computational resources are not sufficient to maintain real-time evaluation of the sample streams. The sound generation
mechanism must manage the computational resources so that the perceptible effects of an overload condition are minimized.
• Real-time sound rendering. Rendering sounds entails two problems: localization and the simulation of environmental effects. Localization is the process of recreating spatial auditory cues so that sounds appear to emanate from a particular direction in 3D space. This can be done by recreating the sound sources around the listener using an array of strategically placed loudspeakers or by using filter convolution to simulate Head Related Transfer Functions (HRTFs) [7] and replaying the resultant sound over headphones. The suitability of these two techniques depends largely on the intended application. Calculating environmental effects requires that sound waves be traced from source to listener taking into account reflection, diffraction, and attenuation. A VE sound system should support a variety of existing rendering techniques and also facilitate the exploration of new ones.
1.3.1 Modeling

Integrating sound into VE interfaces requires that abstractions be defined which facilitate the definition and control of the various elements comprising a sound field. Little attention has been paid to this problem and thus current systems provide low-level programming interfaces for defining the sound field. Recently, however, there has been an effort to integrate modeling abstractions for sound into the VRML [2] standard. This standard is based on the Inventor toolkit [3], which models a scene using a tree structure of nodes. In order to support sound, special nodes have been developed for modeling the elements comprising a sound field. The modeling abstractions provided by VRML are limited to modeling only sound sources and the listener.
In [4] we have proposed an actor-based model of the auditory world which closely parallels many of the constructs developed for modeling the visual world, making the process of integrating sound into existing VE simulations as fast and painless as possible. Models of the visual world consist of geometric models, light sources, and cameras. Geometric models represent the visual entities in the world. They are given material properties and may have behavior that manifests itself in the production of motion. Light sources illuminate the geometric models and the camera renders a view-specific version of the world based on its position and orientation. In the auditory world, auditory actors are associated with the visual entities in the scene. They give them auditory properties in the form of a sound repertoire, and exhibit behavior that manifests itself in sound events consisting of the starting and stopping of sounds and variations in the sound's timbral qualities. The listener renders a view-specific version of the auditory world based on its position and orientation.

Having a close correspondence between the modeling abstractions of the visual and auditory worlds facilitates the integration process because the programmer need not force one view of the world into compliance with an opposing view. Instead the two models naturally correspond and their constituent objects are functionally similar. Auditory actors can be easily associated with the geometric models comprising the scene, giving them aural behavior. The listener object is associated with the camera and is moved and oriented in the same fashion.

1.3.2 Sound Generation

Sound generation is concerned with modeling sound at its source without taking into account its propagation through the environment. This problem has been well studied in the realm of electronic music, which had its inception more than three decades ago with the advent
of analog synthesizers. These gave way to the digital synthesis methods in use today. Two basic techniques have been devised for modeling sounds for the purpose of musical applications: sampled and synthetic representations.

Sampled sounds are digital recordings of sounds that are temporally mapped and played back on demand. Like image-based texture maps in graphics, sampled sounds have gained wide popularity because they can be used to easily create complex sounds that would otherwise be difficult or impossible to synthesize. There are, however, drawbacks to this representation: the control parameters available to modify the characteristics of sampled sounds are limited due to the static nature of the representation. Filtering and time scale expansion and contraction can be applied to sampled sounds in order to affect the timbre and pitch of the sound. However, varying a filter's parameters to achieve specific results is non-intuitive, and filter convolution is an expensive operation often requiring specialized hardware to achieve real-time performance. Finally, sampled sounds are storage intensive due to the large number of samples required to recreate a high-quality rendition of a digitized sound. This is particularly problematic for distributed applications such as VRML [2] clients. Sending a stream of high-quality sounds over a network consumes a great deal of bandwidth, usually requiring reduced sampling rates in order to achieve continuous playback.

Synthetic sounds are procedurally defined, one-dimensional functions in the temporal domain. They are equivalent to procedural textures in the visual realm and thus have the same advantages: their representation is inherently compact, which makes them appealing for distributed applications such as VRML clients. Their representation can be arbitrarily parameterized, which makes them useful for data sonification (mapping data to sounds) and for
synthesizing sounds from physical parameters. Finally, a procedural representation is a powerful construct giving the sound designer complete control over the shaping of the sound's timbre. The primary disadvantage of synthetic sounds is that they are difficult to specify. They require a deep understanding of the sound synthesis process and an intuitive sense of how to achieve desired results. Another drawback of synthetic sounds is that they are computationally expensive. Generating a synthetic sound requires the evaluation of a procedural representation of the sound once for each sample in the output stream. In order to produce the sound in real-time at the standard sampling rates for high-quality audio, the generation process must produce samples at a rate of 44,100 Hz or 48,000 Hz.

Manufacturers of musical instruments address this problem by using dedicated Digital Signal Processing (DSP) hardware with a fixed number of available concurrent sounds and by limiting the possible synthesis algorithms to those supported by the hardware configuration. Using these techniques, some modern synthesizers are capable of producing up to 128 concurrent sounds. Because these devices can be controlled remotely through the use of the Musical Instrument Digital Interface (MIDI), one can integrate sound into VE interfaces by externally controlling such a device in order to produce the desired sounds. In fact, this approach has been adopted by several VE sound systems as we shall later see. The drawback of such an approach lies in the loss of generality imposed by the hardware-based synthesis required in order to achieve such high performance. Modern synthesizers use a hardware pipeline in order to synthesize sounds. A digital sample stream is routed through a fixed number of signal processing stages, each implementing a predetermined algorithm. The resultant streams are combined digitally and converted to an analog signal for playback. The source of the sample stream may either be a simple wave generator or a digitally recorded
sample. Some advanced synthesizers such as the Kurzweil K2500 [5] allow the designer the flexibility of creating new sounds by choosing a combination of algorithms to process the sample stream. The algorithms themselves, however, are hardware-implemented and thus fixed, as is the number of possible algorithms in the synthesis pipeline.

Sound synthesis researchers, on the other hand, have developed software synthesis systems allowing complete generality in the synthesis algorithms. Specialized languages have been developed which allow the designer complete flexibility in designing the synthesis algorithms. These systems, however, have classically been batch-oriented, with the samples calculated off-line and stored for later playback. With increasing processor speeds, however, some current systems can support a small number of sounds generated in real-time. A general synthesis model such as that provided by software synthesis facilitates the exploration of sound synthesis techniques for use in VE interfaces. The problem is that where real-time performance is achieved using specialized hardware, the generality of the representation is severely limited. Ideally, a sound generation system for VE should provide a general approach to synthesis while maintaining real-time performance.

1.3.3 Rendering

The term sound rendering was coined in a paper by Takala and Hahn [6]. It refers to the process of simulating the propagation of sound waves through the environment from source to listener and spatializing the sounds, creating the illusion of directional sound. This has been the most actively researched problem in VE sound, and a number of systems have been developed for this purpose.
1.3.3.1 Spatialization

Before delving into spatialization techniques, it is important to understand how humans localize sounds. The human auditory system uses a large set of cues in order to determine the location of a sound. Many of these cues are based on a sound's interaction with the listener's head and outer ear. They are collectively referred to as the Head Related Transfer Functions (HRTFs) [7] and consist of the following:
• Interaural Delay Time [8] – IDT is the difference between the time at which a sound reaches one ear and the time at which it reaches the other. For sounds directly in front of or behind the head this difference is zero.
• Head Shadow [9] – Head shadow is the effect of a sound having to pass through or around the head in order to reach the far ear. It affects the overall intensity of the sound and acts as a linear filter.
• Pinna Response [10] – The pinna of the ear has a response that varies based on the direction of the sound.
• Shoulder Echoes [11] – Certain frequencies of sound (1–3 kHz) reflect from the shoulders and upper body. These echoes reach the ear with a delay that is dependent on the elevation of the source. This reflection also has an effect on the spectrum of the reflected sound that is direction-dependent.
The loudness of a sound is also used as a cue to determine the distance of a source,
although it is not part of the set of cues comprising the HRTF set. The effectiveness of these cues varies widely based on the position of the sound source, the listener’s familiarity with the sound, and the frequency content of the sound. For example, the loudness of a sound is used as a localization cue, but is only effective for familiar sounds. The interaural delay time and phase
difference is an important cue. It is, however, only effective for low-frequency sounds below 1500 Hz which do not lie along the median plane of the listener's head. Interaural intensity difference is effective for sounds above 1500 Hz. Finally, the pinna response is an important cue, especially for sounds occurring along the median plane. Its effectiveness, however, is also related to the listener's familiarity with a sound, where unfamiliar sounds tend to be perceived as being behind the listener irrespective of their actual position [12]. Integrating all these variables, some of which are intangible such as familiarity, into a computational model of three-dimensional hearing is a difficult problem. Instead, empirical approaches have been used to address the problem. These fall into two general categories. One is the recreation of the HRTF cues through filter convolution, and the other is the recreation of the sound field using free-field loudspeakers. Both these methods are discussed below.

HRTFs are modeled using finite impulse response filters (FIR filters). These filters are generated by actually recording the sound reaching the eardrum using a set of probe microphones placed in a listener's ears. A set of noise pulses is generated from locations surrounding the head and recorded inside the ear. In order to eliminate the effect of reverberant sounds reaching the listener's ears, the recording takes place inside an anechoic chamber. The spectral, intensity, and phase changes in the recorded sound represent the effect of the HRTFs on the original sound. These changes are captured using a set of FIR filters corresponding to the locations of the sound sources around the listener's head during the recording process. During playback, sounds are localized to a certain location by finding the corresponding filter or by interpolating the coefficients of four neighboring filters in order to obtain a filter for that location. The resultant filter is convolved with the sound signal and, when heard over headphones, gives the impression of a directional sound source.
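As a concrete illustration of the convolution step, the following sketch applies a measured left/right impulse-response pair to a mono source. The nearest-neighbor lookup, the dictionary layout, and all names are illustrative assumptions only; a practical system would interpolate between the four neighboring filters as described above and would use a far more efficient convolution.

```python
import numpy as np

def spatialize(mono, hrtf_db, azimuth, elevation):
    """Convolve a mono signal with the impulse-response pair nearest to the
    requested direction.  hrtf_db maps (azimuth, elevation) -> (left_ir, right_ir).
    Nearest-neighbor selection stands in for the coefficient interpolation a
    real system would perform."""
    # Pick the measured direction closest to the requested one (illustrative).
    key = min(hrtf_db, key=lambda d: (d[0] - azimuth) ** 2 + (d[1] - elevation) ** 2)
    left_ir, right_ir = hrtf_db[key]
    # FIR filtering is a direct convolution of the source with each impulse response.
    left = np.convolve(mono, left_ir)
    right = np.convolve(mono, right_ir)
    return left, right

# Usage: a 1 kHz tone placed 45 degrees to the listener's right (toy impulse responses).
fs = 44100
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
toy_db = {(45, 0): (np.array([1.0, 0.5]), np.array([0.4, 0.2]))}
left, right = spatialize(tone, toy_db, 45, 0)
```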
A major problem with this approach is that the application of the FIR filter to the source sound is an expensive computation and cannot be done in real-time on general workstations. Crystal River Engineering and NASA Ames Research Center have designed a hardware device called the Convolvotron which is capable of performing this task in real-time. Unfortunately the cost of this device prohibits its widespread use. Another problem with this approach is that the HRTFs are generated from the response of a particular person's head and ears. When recreated for other persons the effect may be suboptimal. HRTF-based systems often fail to externalize sounds, so that they appear to occur inside the listener's head. Localization along the median plane is also poor.

Another approach to localization was introduced in [13]. In this approach a set of loudspeakers is located at the four corners of a square surrounding the listener. In order to simulate the effect of angular location and distance, the amplitude of the sound emanating from each speaker is scaled such that the resultant sound appears to be emanating from a particular direction. While this approach is cost-effective and does not require a great deal of computational power, speaker-based systems give only a weak impression of a moving sound source [24].

1.3.3.2 Environmental Effects

In order to determine the sounds reaching the listener's ears, the rendering step must simulate the interaction of sound waves with the environment in which the sound propagates. Environmental effects consist of spectral changes, intensity changes, and delay, which is perceived as reverberation, echo, and Doppler shift. Several approaches have been suggested for simulating some of these effects.
In [6] Takala and Hahn suggest the use of sound threads to simulate the effects of sound reverberating within an environment. Whenever a sound is reflected off a surface, a new instance of that sound is instantiated (a sound thread) and is made to emanate from the point of reflection with the appropriate delay and attenuation.

An alternative technique was suggested by [7] whereby HRTFs are used to model the acoustical effects of the environment. Normally HRTFs are generated in an anechoic chamber such that no reflections are present. In this approach HRTFs are recorded in an ordinary room, thus capturing the effects of that room on the sound. The problem with this approach is that different HRTFs are required for each sound source position and user position in the room. Different HRTFs are also required for each room to be modeled. Another problem is that the size of the FIR filter is proportional to the size of the room being simulated.

Finally, in [14] an approach is suggested which uses a modified radiosity algorithm to calculate the impulse response of an environment. Impulse responses completely describe the acoustical characteristics of a room. The idea that a radiosity algorithm would work for acoustical simulation is based on three factors: First, the principle of radiosity is that light energy is conserved within an enclosure. This also reflects the nature of room acoustics. Second, the radiosity approach is view-independent. This is appropriate for sound since diffuse reflections of sound dominate over specular reflections due to sound's longer wavelength. Finally, the radiosity algorithm is not restricted to dealing with a finite set of rays, but instead considers the overall energy distribution based on the global geometry configuration.
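Returning to the sound-thread idea above, the sketch below shows how one reflection might be instantiated: the delay follows from the length of the reflected path and the attenuation from the surface's reflectivity and the distance traveled. The 1/r roll-off and the function signature are illustrative assumptions, not the formulation used in [6].

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second at room temperature

def reflection_thread(source, reflection_point, listener, reflectivity):
    """Delay and attenuation for one 'sound thread' spawned at a reflection point.
    Geometry is given as (x, y, z) tuples in meters; the attenuation model is a
    simple surface loss times distance roll-off, chosen only for illustration."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    path = dist(source, reflection_point) + dist(reflection_point, listener)
    delay = path / SPEED_OF_SOUND               # seconds before the echo arrives
    attenuation = reflectivity / max(path, 1.0) # surface loss plus distance roll-off
    return delay, attenuation

# Usage: a source 3 m from a wall point, listener 4 m from the same wall point.
print(reflection_thread((0, 0, 0), (3, 0, 0), (3, 4, 0), reflectivity=0.7))
```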
Chapter 2 Related Work
In the following sections we will first examine the current state of the art in systems developed specifically for sound generation for VE. We will then cover applicable results in the area of synthetic sound generation and real-time systems in order to establish the necessary background for the approach taken in this work. In addressing the problems of resource management for real-time sound generation, we must draw on results from a number of disparate fields. As we shall show, current systems have not addressed this problem directly and therefore there is no real precedent for the approach taken here.
2.1 Spatial Sound Generation Systems

Several sound servers have been developed specifically for VE applications. Each has addressed a subset of the spatial sound generation problem domain depicted in fig. 1-1. The Audition sound server [15] is an actor-based system that supports multiple synthetic as well as sampled sounds. It primarily addresses the sound generation problem and provides limited abstractions for modeling the sonic environment. Actors model the sounds in the environment and provide constructs for controlling and synchronizing those sounds in real-time. Audition's use of the actor paradigm for modeling sounds represents an important step towards addressing the modeling problem. The NA3 audio server [16] incorporates multiprocessing techniques in order to support software-based spatialization using HRTFs. The server supports sampled sound sources as well as externally fed sample streams through a network socket. This system is geared primarily
towards addressing the spatialization problem. It does not provide abstractions for modeling the auditory world or for generating synthetic sounds. The Acoustitron [17], a commercially available system developed by Crystal River Engineering, provides hardware-based HRTF spatialization of multiple sampled sound sources. A low-level interface enables clients to interface with the spatialization hardware. The Acoustitron supports only sampled sounds and places a hard limit on the number of sounds, which varies based on the hardware configuration. The NPSNET-PAS server [18] extends an existing Distributed Interactive Simulation (DIS) [19] based simulation system to provide spatialized audio cues. The system uses the MIDI protocol to control a sample playback device in order to generate sound. Sounds are spatialized using six speakers surrounding the listener. A similar approach is used by Personal Audio [20], a commercially available system developed by VSI. Because both systems use a sample playback unit for sound generation, they are constrained to supporting only sampled sounds and are limited in the number of concurrent sounds they can produce.

The main problem with these approaches is that they focus primarily on localization, and do not address the whole process of integrating sound into VEs. The lack of resource management for real-time sound generation, for example, forces programmers to either limit the number of sound sources they use or to devise ad hoc methods for prioritizing sounds in the environment. The adverse effects of the lack of a resource management scheme in these systems depend on whether they are hardware or software-based. Hardware-based systems such as the Acoustitron II, NPSNET-PAS, and Personal Audio provide a fixed number of channels, which varies based on the hardware configuration. Any sounds beyond the number supported by the
hardware must be discarded. Software-based systems, on the other hand, do not fix the number of sounds supported. If, however, the processing capabilities of the hardware are exceeded, sound generation degrades and silent gaps begin to appear in the output. These gaps are highly apparent due to the human ear's exceptional sensitivity to discontinuities in the amplitude envelope of sound [12].
2.2 Sound Synthesis Systems

Sound synthesis has been actively researched since the 1960s in the realm of computer music. Early synthesizers were analog devices composed of voltage controlled oscillators (VCO) for creating waveforms, voltage controlled filters (VCF) for filtering these waveforms and thereby modifying the relative strengths of their harmonics, and finally voltage controlled amplifiers (VCA) for dynamically shaping the amplitude of the waveforms. Sounds were created by physically connecting the above components using patch bays into a network of signal producing and signal modifying nodes.

Sound synthesis entered the digital domain when researchers began creating programming languages and systems for the synthesis of sound. The earliest and most successful of these efforts is a suite of systems beginning with the Music I system and ending with the most recent variant, the Music V system. They are collectively named Music N. These systems were designed for use by composers in creating digitally synthesized music. The systems work by having the user enter a description of what is termed the orchestra, which is a description of all the instruments, followed by a musical score. Samples comprising the score are generated and stored to disk. The digital recording can then be played back through a digital to analog
converter (DAC) [21]. The Music N system has spawned a number of variants, most notably CSound [22] and cmusic [23].

At the core of the Music V system is the way in which the instruments are created. Synthetic instruments are basically algorithms that are specified as a network of building blocks called unit generators. A unit generator is a simple signal processing function such as oscillation, gain scaling, filtering, or mixing. These unit generators are interconnected into a configuration forming a graph that then constitutes an instrument. The signal flows through the components of the graph in a data flow fashion until a signal is produced at the output. Scratch pad memory is used to store the signal between nodes. One may note the similarity between this approach and that of the analog synthesizers on which it was based. When an instrument is instantiated due to a note statement being executed, it is passed a number of command line parameters that control the synthesis algorithm of the instrument.

Some of the drawbacks of the Music V system and its variants are that they are not interactive, and that the languages used in the specification of the orchestra (instruments) and the score are different, forcing the user to master two languages. In an attempt to circumvent these shortcomings, two languages have been developed: Fugue [24] and MAX [25]. Fugue is a lisp-based language for music synthesis. The main advantage of Fugue is that both the score and orchestra are specified using the same language. Another feature of Fugue is its behavioral abstraction: behaviors that describe how to generate a sound can be defined and applied to instruments to control when and how they are played. The behavior abstraction allows composers not only to control the time and duration of a note but also the articulation, loudness, and pitch.
The MAX system is a graphical programming environment for music synthesis. Its salient features are a graphical interface for programming and its real-time response through the integration of DSP processors. Similar to the concept of instruments in the Music V system, MAX has the concept of a patch. A patch is a collection of interconnected boxes that are represented graphically in the user interface. The boxes represent some function and communicate with the other boxes through their input and output ports by message passing. The function boxes perform functions ranging from user input controls to signal processing functions similar to the unit generators in Music V.

We turn now to the problem of sound generation for VE. Unfortunately, the sound synthesis systems described thus far are not well suited for this problem. Synthesis techniques developed in these systems are geared toward the generation of synthesized music. They are therefore all based on the note as the specification of an acoustical event. This model is not well suited for modeling sound in computer animation or VE applications. Furthermore, the parameters available for the dynamic control of sounds are also geared towards musical performance and do not readily allow for the manipulation of sounds along the dimensions that specify events in a synthetic environment.

In the realm of computer animation, work has been done in [26, 46, 47] on a representation of sound using a functional composition of sound signals. At the core of the technique is the idea of a Timbre Tree (fig. 2-1) that represents a sound as a tree composed of signal processing functional units. The approach is again similar to that of its predecessors, Music V and particularly MAX. A single Timbre Tree represents a class of sounds and is parameterized such that instances of this class are created by instantiating a tree with a specific parameter set. The parameterization of Timbre Trees achieves two important goals: by varying a small number
of intuitive parameters, variations of a general class of sounds can be easily explored. This facilitates the creation of libraries of Timbre Trees, each representing a general class of sounds. Timbre Tree parameters can also be used to map motion to sound. Timbre Trees can be readily constructed so that tree parameters modify the generated sound according to some physical property of the motion event that activates the sound. Collision sounds, for example, can be coded so that the collision force is a tree parameter. Timbre Trees can also support data sonification. By mapping data values to sound parameters, a sound's properties can be dynamically modified to reflect changes in the input data. This approach was used in an electronic warfare simulation system to sonify radar detectability of a target. Timbre Trees were coded so that the amplitude and timbre of the generated sounds varied according to radar type and amount of exposure of a target.
Figure 2-1 An example of a Timbre Tree, encoding the expression (sine (+ 500 (* 100 (sine (* 1.5 t)))))
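To make the representation concrete, here is a minimal sketch of how the tree in figure 2-1 could be evaluated sample by sample. The literal node semantics used here (the sine node treats its input directly as a phase) are an assumption for illustration; the actual Timbre Tree node definitions in [26] may differ.

```python
import math

def eval_tree(node, t):
    """Recursively evaluate a Timbre-Tree-like expression at time t (seconds).
    Nodes are ('sine', child), ('+', a, b), ('*', a, b), a number, or the
    symbol 't'.  This is a sketch of the evaluation idea, not the actual
    Timbre Tree implementation."""
    if node == 't':
        return t
    if isinstance(node, (int, float)):
        return node
    op, *args = node
    vals = [eval_tree(a, t) for a in args]
    if op == 'sine':
        return math.sin(vals[0])
    if op == '+':
        return sum(vals)
    if op == '*':
        return vals[0] * vals[1]
    raise ValueError('unknown node: %r' % (op,))

# The tree of figure 2-1: (sine (+ 500 (* 100 (sine (* 1.5 t)))))
tree = ('sine', ('+', 500, ('*', 100, ('sine', ('*', 1.5, 't')))))

# One second of samples at 44.1 kHz, produced by evaluating the tree once per sample.
fs = 44100
samples = [eval_tree(tree, n / fs) for n in range(fs)]
```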
One method of synthesis, which can be particularly useful in graphics applications, is to generate physically based sounds from physical parameters of the simulation. Physically based sound generation can be very useful for modeling that class of incident sounds which occur due to the vibration of objects in the environment. Examples of this class of sound include collision sounds, scraping sounds, and wind sounds. Accurately simulating the physical processes that generate sounds is at best a difficult task and is generally not possible under real-time constraints using current hardware. It is, however, often not necessary to model the processes involved in sound creation in order to generate realistic sounds. After all, most of the algorithms used in the rendering of computer-generated imagery are based on heuristics and are not physically correct.

In [6], Takala and Hahn use modal analysis of simple vibrating objects in order to construct sounds based on a weighted sum of the modes of vibration. In order to simulate the scraping sound of rough objects in contact, 1/f noise is used. Sawtooth waves are used to simulate the sound of objects alternating between sticking and sliding, as a violin bow does. In their more recent work [26], the authors represent a class of physical sounds based on a heuristic feel for how the sound is generated, which can then be modified through the Timbre Tree's parameters for a specific instance of that sound. In [27] Gaver suggests an alternative form of synthesis algorithm based on auditory event perception. Through the use of acoustical analysis of impact events and their perceptible attributes, Gaver determined the dimensions that affected the perception of that event. A synthesis model was then created that allowed the impact sounds to be specified along these dimensions. The idea of using heuristics for the modeling of incident sounds works well in many instances. It requires the animator to make judgments on the processes of sound creation and
develop heuristics that approximate that sound. Furthermore, the application of this technique to VE requires real-time generation, which is not currently supported by these systems.
2.3 Real-time Systems

In order to devise a rigorous approach to resource management of the sound generation processes, we must turn to scheduling techniques developed for real-time systems. A real-time scheduling approach is generally necessary when the correctness of a computation depends not only on the results produced but also on the time at which they are produced. Clearly sound generation falls under these constraints. In order to generate a continuous signal at the output, real-time generation of the sound samples must be ensured. A real-time scheduling strategy provides a rigorous framework so that real-time generation of the sound samples is guaranteed when possible. When real-time constraints cannot be met, a real-time scheduling strategy ensures that the condition is detectable and that the system will fail in a predictable fashion. This enables the sound generation process to react to overload conditions.

In the study of different real-time scheduling strategies, workload models are used to describe the characteristics of the processes to be executed. In those models, an independent unit of computation is referred to as a job. Jobs are classified as hard or soft real-time. Hard real-time jobs are those where the results must be completed by a deadline or they are considered in error. For soft real-time jobs, this restriction is relaxed and the validity of the results decreases gradually as the deadline passes. Jobs are further classified as periodic when their execution consists of a periodic sequence of identical requests for execution termed tasks. The rate at which tasks are submitted for execution is the job's period.
For our purposes we will consider the hard real-time periodic workload model for scheduling the execution of a set of jobs. In this workload model, a job set J = {Jk} consists of a set of jobs, each making periodic requests for the same execution. Thus each job Jk generates a set of tasks Tk,j for j = 1, 2, 3, . . . . The start time αk,j of each task is the time before which its
execution cannot begin. The period of a job p k,j is the time interval between two requests and can be expressed as αk,j + 1 - αk,j. The execution time of a task τk is the time required for the task to complete execution. Finally, the deadline of each task Tk,j is the start time of task Tk, j+1. A job can thus be specified by the two-tuple (pk, τk). The scheduling problem can then be stated as follows: Given a job set J, each task Tk,j must be scheduled for execution such that it begins execution at some point at or after its start time αk,j and completes it’s execution before αk,j + 1. A schedule where these requirements are met is termed a precise schedule. A real-time scheduling algorithm must determine if a precise schedule is possible and if so, establish an ordering of the execution of the task set such that the above conditions are met. In general when a precise schedule is not possible, the scheduler rejects the job set. In some cases however it may be possible to partially execute the task set, making use of intermediate results. In a seminal paper Chung et.al [28] describe a technique for evaluating monotone processes using a model for imprecise computations. A monotone process is one that is guaranteed to produce increasingly accurate results as it is allowed to execute longer. The imprecise model partitions a task into a mandatory part and an optional part. The mandatory part is that required to produce results at the minimum acceptable precision. The mandatory task set is scheduled as a hard real-time task set and a precise schedule is obtained. Any remaining time
in the schedule is used to schedule the optional parts of the task set. The resultant schedule is termed a feasible schedule. The workload model for the imprecise computation model is based on the workload model for periodic hard real-time jobs described above. It differs from that model in that a minimum execution time mk is specified for each task set Tk. This is the time that a task Tk,j in Tk must execute in order to produce minimally acceptable results. The scheduling strategy for the imprecise computation model requires that each task be executed for a minimum time of mk and that any remaining time in the schedule be assigned to tasks such that some error metric is minimized.

The characteristics of the job set determine the error metric used and thus the scheduling strategy. Jobs are classified according to their characteristics as type N or type C jobs. Type N jobs are those where the error incurred due to a task not completing its execution does not accumulate over time. An example of a type N job is one that periodically receives, enhances, and transmits video frames. The error incurred due to a task not completing its execution will be limited to one frame and will not accumulate to subsequent frames. For type C jobs, on the other hand, errors do accumulate and therefore require that periodically some task Tk,j be allowed to execute to completion. An example of a type C job is a radar-tracking program. When a task is prematurely terminated, the program produces coarse estimates of the target's position, velocity, and acceleration. Since subsequent estimates of velocity and acceleration are based on the coarse estimate produced in the current period, it is essential that at some point a task is allowed to execute to completion; otherwise, the estimated position may diverge completely from the target's actual position.
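To make the workload model concrete, the following sketch represents one such job and the linear error that results when a task receives only part of its required processor time. The field names and dataclass layout are illustrative assumptions; the error expression anticipates the linear error function given later in Equation 2-2.

```python
from dataclasses import dataclass

@dataclass
class ImpreciseJob:
    """A periodic job in the imprecise computation workload model: every p
    seconds a task is released that wants tau seconds of processor time, of
    which at least m seconds are mandatory."""
    name: str
    p: float     # repetition period p_k
    tau: float   # full execution time tau_k
    m: float     # minimum (mandatory) execution time m_k

    def error(self, sigma):
        """Linear error of one task given sigma seconds of processor time:
        1 - (sigma - m) / (tau - m); zero when the task runs to completion."""
        return 1.0 - (min(sigma, self.tau) - self.m) / (self.tau - self.m)

# A sound evaluated every 100 ms: 30 ms for full quality, 5 ms for the coarsest result.
hum = ImpreciseJob('engine', p=0.100, tau=0.030, m=0.005)
print(hum.error(0.030), hum.error(0.005))   # 0.0 (precise) and 1.0 (mandatory only)
```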
In chapter 4 we will show that the sound generation problem fits the characteristics of a type N job. We will therefore only consider the scheduling strategies developed for those types of jobs. In scheduling type N jobs, each job Jk is partitioned into two independent jobs, a mandatory job Mk and an optional job Ok. The mandatory job Mk(pk, mk) consists of the first portion of Jk’s task set required to produce acceptable results. The optional job Ok(pk, τk - mk), consists of the remaining time required to complete the tasks in Jk. The Mk job is scheduled as a hard real-time job and a precise schedule is obtained. The Ok job is scheduled as a soft real-time job. In devising an approach for scheduling type N jobs, two scheduling strategies are necessary, one to precisely schedule the mandatory job set, and another to schedule the optional job set such that some error metric is minimized. A good error metric for measuring the performance of a scheduling strategy for type N jobs is the average error of all results. Given that the M job set must be scheduled precisely, the error produced by any scheduling strategy is determined by how the O job set is scheduled. In [28] a number of preemptive, priority driven strategies are presented for scheduling a job set J on n processors using the imprecise computation model. A preemptive, priority driven strategy determines the execution order of a set of jobs by assigning each a priority based on the scheduling strategy. Whenever a task is available which is of a higher priority than the currently running task, the running task is suspended and the newly requested task is started. The specification of scheduling algorithms consists of the specification of methods for assigning priorities to tasks.
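A minimal sketch of the preemptive, priority-driven dispatching rule described above: whenever the ready set changes, the highest-priority task gets the processor. The function signature and the priority callback are illustrative assumptions, not the interface used in [28].

```python
def dispatch(ready, running, priority):
    """Preemptive, priority-driven dispatch: if a newly ready task outranks the
    running task, the running task is suspended and the newcomer is started.
    `priority` is whatever ordering the particular algorithm defines (for
    example, rate-monotone priorities for the mandatory job set)."""
    if not ready:
        return running
    best = max(ready, key=priority)
    if running is None or priority(best) > priority(running):
        return best      # preempt: the higher-priority task runs next
    return running       # the current task keeps the processor
```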
According to Dhall et al., algorithms for scheduling periodic hard real-time tasks on multiprocessors have been shown to have unacceptably poor worst-case performance [29]. Instead, a partitioning approach is taken where jobs are first assigned to processors and then each processor is scheduled independently of the others. The problem is then to find a partitioning of the job set J such that the number of processors is minimized. This problem is equivalent to the bin packing problem and hence is NP-hard. Instead, a heuristic approach is used. The rate-monotone next-fit (or first-fit) algorithm [29] prioritizes jobs according to their repetition rate, with shorter repetition rates having higher priority. The jobs are then assigned to processors on a next-fit or, alternatively, a first-fit basis. In deciding whether or not a job fits a processor, only the mandatory job Mk of Jk is considered. Thus the scheduling strategy for the mandatory job set is invoked to determine whether the inclusion of the newly assigned job in the existing job set on that processor can yield a feasible schedule.

The rate monotone algorithm is a priority-driven, preemptive algorithm that has been shown to be optimal among fixed-priority algorithms [28]. Fixed-priority algorithms assign a priority to each job such that the priorities of the tasks generated by that job never change. The rate monotone algorithm assigns priorities to jobs in the same fashion as the rate-monotone next-fit algorithm does. These priorities, however, determine the execution order of the mandatory jobs assigned to a processor.

A number of non-partitioning approaches to scheduling periodic hard real-time jobs have been suggested. In [30] a heuristic-based algorithm is presented which assigns m tasks to n processors directly. The SA1 algorithm segments time into blocks that are the GCD of the deadlines of all the tasks. Each task is assigned its average time requirement within each block. The average requirement of each task is T · Ci/Di, where T is the GCD of the deadlines, Di is the
task's deadline, and Ci is its required execution time. Assuming processors are numbered 1..n, tasks are assigned their average requirement on the processors sequentially, starting with processor 1. When the average requirement for a task exceeds the available time on the current processor, the task is split such that it uses the remaining schedulable time on the current processor, and the remaining execution time required to meet its average requirement is scheduled on the next processor. In determining whether a set of m tasks is schedulable on n processors precisely, the condition U ≤ n was determined to be sufficient [30], where U is the multiprocessor utilization factor

$$U = \sum_{i=1}^{m} \frac{C_i}{D_i}$$

Equation 2-1 Multiprocessor utilization factor

The SA1 algorithm exhibits O(m) complexity where m is the number of tasks. For a dynamic system with tasks entering and leaving the system on-line, the SA1 algorithm can be run for a newly arrived task in O(1) time if no tasks have left the system and time is available on one processor to schedule the task without partitioning it. Otherwise the full algorithm must be evaluated in O(m) time. In [31] the SA1 algorithm is extended to handle imprecise computations. The problem is formulated as a min-cost-max-flow network flow problem. While this approach yields a minimum error schedule, the complexity of the approach makes it impractical for any real use.

Finally, in [32] the myopic algorithm is presented which can schedule a job set with resource constraints in O(n) time. The problem of finding a precise schedule is formulated as a search tree and the branch and bound technique is used to search the tree for a precise schedule. The algorithm orders the set of tasks based on a heuristic function H. Possible heuristics for H
include: minimum deadline first, minimum processing time first, earliest start time first, minimum laxity first, and finally some weighted combination of the above. Starting with an empty schedule, the task with the smallest H value is added to the schedule and a determination is made whether or not the resultant schedule is strongly feasible. A partial schedule is strongly feasible if all the schedules obtained by extending this schedule with one of the remaining tasks are also feasible. When a partial schedule is reached which does not satisfy this constraint, backtracking is performed to traverse other possible branches. Normally such an algorithm would have an O(n²) complexity. However, as the name implies, the algorithm does not consider all the remaining tasks when evaluating the H value and when determining whether a partial schedule is strongly feasible. Instead the tasks are ordered by increasing deadline and only the first k tasks are considered in the evaluation, where k is fixed. Surprisingly, this algorithm performs as well as the original O(n²) version.

We now return to the partitioning approach adopted for the imprecise computation model in [28]. Given that we have good strategies for scheduling the mandatory job set M precisely, we now focus on the problem of scheduling the optional job set O such that error is minimized. As mentioned earlier, for type N jobs, the average error over all the results is a good metric for evaluating the effectiveness of any scheduling strategy. An average error formulation for a type N job Jk is
$$E_k = \frac{p_k}{p} \sum_{j = l - p/p_k + 1}^{l} \varepsilon_k(\sigma_{k,j})$$

where p is the least common multiple of the repetition periods of the jobs in J, pk is the repetition period of job Jk, l is the period in which the error is being calculated, and εk(σk,j) is the error in the jth task of job Jk, given by

$$\varepsilon_k(\sigma_{k,j}) = 1 - \frac{\sigma_{k,j} - m_k}{\tau_k - m_k}$$

where mk is the minimum acceptable execution time of job Jk.

Equation 2-2 Error for type N jobs
The total error for the system is

$$E = \sum_{k=1}^{K} w_k E_k$$

Equation 2-3 Total error for all jobs

where wk is a normalized, nonnegative constant weight signifying the relative importance of each job. The above general expressions can be simplified to
$$E = 1 - \frac{1}{p} \sum_{k=1}^{K} \frac{w_k}{v_k}\, \sigma_k(O)$$

where vk is the utilization factor of the optional job Ok, given by vk = (τk − mk)/pk, and σk(O) is the total processor time assigned to the tasks in Ok over the period in which the error is measured (p/pk periods), given by

$$\sigma_k(O) = \sum_{j = l - p/p_k + 1}^{l} \sigma_{k,j} \;-\; \frac{p\, m_k}{p_k}$$

Equation 2-4 Simplified form of the error function
When the jobs are ordered in non-decreasing order of their weighted utilization factors vk/wk, it is evident that E is minimized when the maximum processing time is allocated to O1, which corresponds to the smallest weighted utilization factor. This observation leads to the Least Utilization algorithm. This algorithm statically assigns higher priorities to the jobs with the smaller weighted utilization factors vk/wk. It is shown that this algorithm minimizes the average error when the error function is linear and when all the jobs have the same repetition period. We have seen a number of strategies for scheduling real-time periodic jobs using imprecise computations. In chapter 4 we will evaluate the effectiveness of these strategies for the real-time sound generation problem.
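The following sketch illustrates the Least Utilization idea for a single scheduling period: after the mandatory parts are provided for, the leftover processor time is handed to the optional parts in order of increasing weighted utilization vk/wk. The one-shot greedy allocation stands in for the priority-driven schedule over one period, and the dictionary layout and numbers are illustrative assumptions.

```python
def least_utilization_allocation(jobs, leftover):
    """Grant leftover optional processor time in order of increasing v_k / w_k
    (the Least Utilization priority order) and report each job's linear error.
    Each job is a dict with period p, execution time tau, mandatory time m,
    and weight w."""
    for job in jobs:
        job['v'] = (job['tau'] - job['m']) / job['p']   # optional utilization v_k
        job['sigma'] = 0.0                              # optional time granted
    for job in sorted(jobs, key=lambda j: j['v'] / j['w']):
        grant = min(leftover, job['tau'] - job['m'])    # never more than it needs
        job['sigma'] = grant
        leftover -= grant
        if leftover <= 0:
            break
    # Linear error of each job for this period (0 = precise, 1 = mandatory only).
    return {j['name']: 1.0 - j['sigma'] / (j['tau'] - j['m']) for j in jobs}

jobs = [
    {'name': 'waves',  'p': 0.1, 'tau': 0.04, 'm': 0.01, 'w': 0.5},
    {'name': 'engine', 'p': 0.1, 'tau': 0.03, 'm': 0.01, 'w': 0.5},
]
print(least_utilization_allocation(jobs, leftover=0.03))
```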
Chapter 3 Framework
In order to facilitate the study of real-time scheduling strategies for synthetic sounds, we have developed the Virtual Audio Server (VAS). VAS is a real-time, distributed, sound generation server for VEs. The system supports both sampled and synthetic representations of sound sources so that a wide range of options is available in determining the content of the sonic environment. Synthetic sound sources are modeled using Timbre Trees [26]. This representation was chosen because it is general and thus facilitates the exploration of different sound synthesis techniques for VE. VAS provides the user with high level actor-based abstractions for modeling the auditory world, requiring little or no knowledge of underlying audio hardware. VAS incorporates a scheduler that facilitates the exploration of various scheduling strategies for the sound generation process. Finally, an extendible architecture allows the system to support a variety of existing as well as future rendering techniques. VAS is partitioned into four functional areas as depicted in fig. 3-1. Remote Objects provide client applications with access to the server. Each server object with which the client interacts has a corresponding Remote Object on the client’s machine. This object acts as a local representative, mirroring the state of its server object and communicating with that object when necessary. This approach maintains an object-oriented interface to the server and minimizes the communication required between client and server.
Figure 3-1 The VAS system architecture (a VAS Client's Remote Objects, comprising the Auditory World, Auditory Actors, and Sounds, communicate via RPC with the server's Sonic Scene Elements, its Devices, including the Abstract Device, Specialized Device, and Spatialization Device, and the VAS Scheduler)
Sonic Scene Elements model the sonic environment. They consist of the Auditory World, which maintains the state of the sonic environment including all the objects within it. Auditory Actors model high-level scene elements that consist of listeners, spaces, and sound producing entities. Sounds evaluate sound samples and write the resultant samples to Devices. An instance of a Device is attached to each Sound and provides it with a device independent interface to any rendering mechanism used by the server. Finally, the Scheduler manages the real-time evaluation of active sounds in the environment.
3.1 Modeling the Sonic Environment VAS incorporates an actor-based model of the sonic environment that closely parallels many of the constructs developed for modeling the visual environment. This facilitates the
integration process because the two models naturally correspond and their constituent objects are functionally similar. 3.1.1 Auditory Actors VAS models any sound producing entity in the world using the Auditory Actor. Auditory Actors give the visual entities in the scene auditory properties in the form of a sound repertoire. The Auditory Actors’ interface provides control primitives for positioning the actor in 3D space, for adding and removing sounds from the actor’s sound repertoire, for controlling the play state of those sounds, and for controlling a sound’s timbre through its parameter space. Auditory Actors can be controlled directly using the primitives described above, or their actions can be scripted. Each Auditory Actor includes a script manager that maintains a set of time-stamped auditory events. Each auditory event represents an invocation of one of the Auditory Actor’s interface methods. The Auditory World maintains a global clock that is used by the script manager to invoke the auditory events at their assigned time. The VAS Listener is a specialized version of the Auditory Actor. The only added property is the head orientation of the user. This allows the Listener to be a sound producing entity, which can be a very useful feature for communicating information to the user such as motion cues (footsteps for example) and collision sounds. The Auditory Space is another specialized Auditory Actor. It models distinct sonic spaces within the environment. An Auditory Space consists of one or more barriers, each modeling an occluding surface. A barrier is essentially a bounded plane with which we associate reflectivity and attenuation values. Ambient sounds, which are attached to Auditory Spaces, help distinguish sonic regions by giving them a unique character.
3.1.2 Sound Sources The VAS system supports both sampled and synthetic sound sources. Synthetic sounds are modeled using Timbre Trees, a functional representation of sound that was originally designed for use in computer animation. Timbre Trees represent sounds using a tree structure where the internal nodes of the tree are signal-processing nodes and the leaf nodes of the tree are signal-generating nodes. Any node in the tree can be parameterized by inserting a named parameter in the node’s definition. A parameter’s value can be modified at run time through the Auditory Actor’s interface. VAS models sound sources as active objects, so each Sound object has a thread of execution associated with it. This thread is responsible for evaluating the sound signal and writing the resultant samples to its attached Device. Upon instantiation, sound source objects register an evaluation routine with the VAS Scheduler. When executed, this routine generates a sample stream representing the sound signal at the source. The scheduler controls the execution of the evaluation routines. Samples are evaluated by the evaluation routines in blocks corresponding to 100 ms at the global sampling rate of the server. Synthetic sounds evaluate this block over a series of 10 iterations, each improving on the resolution of the previous iteration. Upon the completion of each iteration, the evaluation routine checks with the scheduler in order to determine if another iteration is allowed. This facilitates the multi-resolution evaluation of synthetic sounds, which is the basis of the graceful degradation scheme used by the scheduler. After a single block of samples has been evaluated, the evaluation routines write the resultant samples to a VAS device object, which is responsible for rendering the sounds in order to produce the final product heard by the listener.
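As an illustration of this evaluation scheme, a much-simplified version of a sound's evaluation routine is sketched below. The interfaces (SchedulerHandle, DeviceHandle, refine) are hypothetical stand-ins for the VAS objects described above, not actual VAS code.

#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the VAS Scheduler and Device interfaces.
struct SchedulerHandle { std::function<bool()> anotherIterationAllowed; };
struct DeviceHandle   { std::function<void(const std::vector<float>&)> write; };

// One sound's evaluation routine: produce a 100 ms block (4410 samples at 44.1 kHz)
// over up to 10 iterations, each improving on the previous resolution, and check
// with the scheduler after every iteration whether another one is allowed.
void evaluateBlock(SchedulerHandle& sched, DeviceHandle& dev, double samplingRate,
                   const std::function<void(std::vector<float>&, int)>& refine) {
    const std::size_t blockSize = static_cast<std::size_t>(samplingRate * 0.100);
    const int maxIterations = 10;
    std::vector<float> block(blockSize, 0.0f);

    for (int iter = 0; iter < maxIterations; ++iter) {
        refine(block, iter);                        // evaluate this resolution level of the sound
        if (iter + 1 < maxIterations &&
            !sched.anotherIterationAllowed())       // graceful degradation: stop early if told to
            break;
    }
    dev.write(block);                               // hand the (possibly degraded) block to the Device
}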
3.2 VAS Devices In designing VAS, we placed significant emphasis on having the system support a variety of rendering techniques. An abstract Device object defines the interface between any rendering mechanism and VAS. New devices are added to VAS by specializing the Device object through inheritance. The specialized object implements the abstract device’s primitives by interfacing with an external physical device through its supplied programming interface or by implementing them directly in software. This architecture, while general, requires that the spatialization device accept multiple sample streams to be rendered and combined in some fashion depending on the rendering technique. Many hardware-based localization devices, however, either do not support this functionality or only support a limited number of external inputs. Their architecture is optimized for sound sources consisting of sampled sounds that reside locally in the device. This scheme has the advantage of off-loading all the audio processing onto the hardware device. However, it constrains the available audio processing to whatever the hardware device supports, limiting the user’s options and requiring expensive hardware upgrades to increase the functionality of the available audio processing. As an alternative to hardware-based spatialization, VAS provides a no-cost spatialization option using the loudspeaker approach discussed earlier. A specialized Device object was created which uses the four-channel output available on Silicon Graphics computers. Each Device object generates four channels of output properly scaled to position the sound at the desired location in the sonic environment. The spatialization algorithm determines the scaling for each channel.
3.3 The VAS Scheduler The VAS scheduler design is a crucial component in the creation of a framework for the study of real-time scheduling strategies. The scheduler was designed according to an object-oriented methodology, so it is composed of a set of classes representing the entities that comprise the scheduling problem. Figure 3-2 depicts the general architecture of the VAS scheduler and the salient functions of each of its classes.
Figure 3-2 VAS Scheduler architecture (shadowed objects have multiple instances). The figure shows the Scheduler class (RegisterSound, StartSound, StopSound, RemoveSound), the Job class (StartJob, StopJob, CreateTask, DestroyTask, GetPriority), the Task class (Block, Unblock, Iterate), the Processor class (AssignTask, RemoveTask, ScheduleTasks), and the sample evaluation procedure, which loops: evaluate a sample block, write the block to the device, and block if instructed to.
The Scheduler class represents the scheduler and is responsible for adding and removing sounds from the system, and for controlling the play state of those sounds. The Scheduler class also performs load balancing among processors. A sound source is represented in the Scheduler using the Job class, which assigns one or more Tasks to each sound source. A Task represents a lightweight thread which executes the sound’s evaluation function. While multiple Tasks can be assigned to evaluate one sound, only one Task per sound was used in the scheduling algorithms studied here. Finally, the Processor class represents a physical CPU on the multiprocessor system.
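A minimal C++ skeleton of these four classes, using only the method names listed in fig. 3-2, is given below. The signatures and comments are assumptions made for illustration; they are not the actual VAS headers.

// Skeleton of the scheduler classes from fig. 3-2; signatures are illustrative.
class Task {
public:
    void Block();       // suspend the lightweight thread running the evaluation routine
    void Unblock();     // resume it
    bool Iterate();     // ask whether another refinement iteration may run
};

class Job {
public:
    void   StartJob();
    void   StopJob();
    Task*  CreateTask();         // one Task per sound in the algorithms studied here
    void   DestroyTask(Task*);
    double GetPriority() const;  // importance rating of the sound (section 4.1)
};

class Processor {
public:
    void AssignTask(Task*);      // add the task to this CPU's task list
    void RemoveTask(Task*);
    void ScheduleTasks();        // run the real-time schedule for one period
};

class Scheduler {
public:
    Job* RegisterSound();        // add a sound to the system
    void StartSound(Job*);
    void StopSound(Job*);
    void RemoveSound(Job*);
};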
Upon instantiation, a Processor object isolates its assigned physical CPU from executing any processes except those specifically assigned to execute on it. It then suspends the operating system’s scheduler and initiates its own real-time scheduler on that CPU. As Tasks are created, they are each assigned to a Processor according to the load balancing strategy employed by the Scheduler. The Processor object adds the Task to its task list and limits the execution of the Task’s evaluation routine to run only on the CPU represented by that Processor object. In order to employ real-time scheduling of the evaluation routines, the scheduler must have reliable execution metrics upon which scheduling decisions can be based. This is accomplished in the VAS scheduler by a combination of benchmarking and monitoring. Upon instantiation, the evaluation routines evaluate a block of samples and measure the execution time required. This measure is used as an initial estimate for scheduling the first block of samples. The scheduler then monitors the evaluation and rendering times for each block of samples and updates the timing information for that task. In this fashion the scheduling decisions for each period of execution are based on the execution time of the previous period. As mentioned earlier, the evaluation routines for both sampled and synthetic sounds must evaluate blocks of samples of some fixed size before writing the resultant samples to the output device. The lower bound of this block size is constrained by the scheduling overhead: a small block size results in fine-grained parallelism, which adversely affects the performance of the system. The upper bound on the block size is constrained by the minimum perceivable latency from the instant a request is made to start a sound until the instant the sound is heard. In [35] an experimental study was conducted to measure the amount of delay or advance which would result in a perceptible auditory-visual desynchrony. Tests were conducted using two scenarios: one with speech and the other with a hammer hitting a peg. In both cases the audio was advanced
or delayed gradually and the subjects were asked to press a button at their first perception of desynchrony between the audio and the images. The results of the study are shown in table 3-1. The smallest delay resulting in a detectable desynchrony occurred in the case of the hammer, at 187.5 ms. As a conservative measure, the block size in the evaluation process was fixed to be equivalent to 100 ms at the current sampling rate. If, for example, the system is operating at a sampling rate of 44.1 kHz, then the evaluation routine must generate blocks of 4410 samples. This block size results in a delay of 100 ms, which is well within the limits found in [35].
                     Voice                               Hammer
        Auditory delay   Auditory advance   Auditory delay   Auditory advance
            257.9             131.1             187.5              74.8

Table 3-1 Means of detected desynchrony in ms from [35]
3.3.1 Parallel Architecture We do not consider the problem of generating a single synthetic sound source to be compute intensive, since we can evaluate relatively complex Timbre Trees on a single processor. Generating multiple sound sources, however, certainly can be compute intensive. The VAS scheduler was therefore devised so that it can schedule the execution of the sounds on a multiprocessor. The choice of parallel architecture is generally dictated by the characteristics of the problem. In this case, a MIMD, shared-memory machine was determined to be ideal. The problem of evaluating multiple synthetic sounds is clearly a MIMD problem since each sound requires the execution of unique code corresponding to that sound.
A shared memory architecture was chosen due to the high data bandwidth required for this problem, and the flexibility which this architecture affords us in task migration. Each generated sound source produces a sample stream of R * N * S bytes per second, where R is the sampling rate, typically 44100 or 48000; N is the number of channels used for localization, which can vary from 2 to 4; and S is the sample size, which can be 2 or 3 bytes. The output rate per sound source then varies from 176K to 576K bytes per second. On a distributed memory machine, this sample stream would have to be routed through the interconnection network to the location of the sound output device. Clearly this places a heavy burden on the network and creates a bottleneck at the receiving CPU. Using a shared memory machine, the samples are simply written to shared memory where an output device can access them directly. In chapter 4 we discuss the dynamic task allocation strategy used in order to balance load across processors. Due to the dynamic nature of this problem, a migratory policy is used where running tasks are migrated between processors. The overhead of doing this on a distributed memory machine makes the technique infeasible. On a shared memory machine, however, there is little overhead associated with migration since the tasks do not have to be moved. The disadvantage of using a shared memory machine is that such machines are typically not scalable to a large number of processors. Given that we can generate on the order of 1 to 5 sounds per processor, depending on the complexity of the sound, a relatively small number of processors (8-64) is sufficient for this problem. 3.3.2 Runtime Metrics In order to study the effectiveness of different scheduling strategies, some form of statistics gathering capability is required. The VAS system provides two facilities for
performance evaluation. One is a logging facility in the scheduler. Throughout the scheduling cycle and task execution cycle, the scheduler writes time-stamped events to a logging facility provided by the Processor object. Typical events include the start of the scheduling cycle, the time allocated to each task in a processor’s task list, the runtime and priority of each task, the start time of each task, and any exceptions such as schedule overruns. In order to minimize the intrusion of the logging facility on scheduler operations, the logging facility maintains logged events in memory until a maximum number is reached, at which point the events are written to disk as a single block. This reduces the potential for the scheduler’s execution being blocked due to I/O. Figure 3-3 depicts the contents of an event log file.

Event           SndId   Type        Time       NIter   Priority   Iter Time   Time Left
Begin Sched       0     ---------   22202562     0      0              0           0
Initial Sched     3     Sampled     22202738     1      1.11       20000       74299
Initial Sched     4     Synthetic   22202798     1      1.53        1500       72799
Initial Sched     5     Synthetic   22202840     1      1.7         3812       68987
Added Iter        4     Synthetic   22202853    10      1.53        1500       55487
Added Iter        5     Synthetic   22202863    10      1.7         3812       21179
Starting          3     Sampled     22202889     0      0              0           0
End Sched         0     ---------   22202948     0      0            372           0
Starting          4     Synthetic   22211139     0      0              0           0
Starting          5     Synthetic   22230892     0      0              0           0
Idle              0     ---------   22251479     0      0              0           0
Begin Sched       0     ---------   22304782     0      0              0           0
Initial Sched     3     Sampled     22304844     1      1.11        6202       85554
Initial Sched     4     Synthetic   22304888     1      1.53        1808       83746
Initial Sched     5     Synthetic   22304921     1      1.76        1893       81853
Added Iter        5     Synthetic   22304932    10      1.76        1893       64816
Added Iter        4     Synthetic   22304940    10      1.53        1808       48544
Starting          3     Sampled     22304963     0      0              0           0

Figure 3-3 A sample of the scheduler's log
A number of post-processing analysis tools were developed to extract meaningful data from the events logged during runtime. One such tool reports load information on each processor
during runtime. Another collects vital statistics for a specific sound source, including its priority, runtime characteristics, and allotted time each period. Finally, a tool was developed for collecting the error produced due to the allotment of execution time to each sound during a period. The average error and the average equivalence index over the runtime of the system are also reported. This collection of tools proved to be extremely useful in the analysis of the different algorithms. The other facility provided by VAS is a graphical user interface (GUI) which reports useful information and is appropriate both for debugging scheduling algorithms and for visually evaluating what an algorithm is doing. This is often difficult to do just by listening to the resultant sounds. By providing visual feedback while listening, a GUI can assist an algorithm designer in ascertaining the sonic effects of a scheduling algorithm’s behavior. Fig. 3-4 is a snapshot of the VAS GUI during runtime. The GUI is partitioned into three separate areas. The Chart window displays each sound source’s name along with the time it has been active, its priority, the amount by which it is being degraded, and its play state. The World window displays a three dimensional rendition of the sonic environment with four views: three orthographic views looking down each axis and a perspective view from the listener’s point of view. Each sound source is represented as a sphere that is either red when the sound is stopped or green when the sound is playing. The sphere is intersected by a plane that is color coded to match the sound’s entry in the Chart window. The listener’s gaze vector is visualized using an arrow protruding from the listener’s head. Finally, a message window displays any information or warning messages output by the system.
Figure 3-4 VAS's Graphical User Interface
Chapter 4 Real-time Evaluation of Synthetic Sounds
Given limited computational resources, we are faced with the eventuality of exceeding those limits as the number of active sounds in the system increases. Electronic musical instrument designers have addressed this problem: their approach is simply to drop the oldest sustained sound when a newly introduced sound exceeds the limits of the hardware. While this approach may be appropriate for musical applications, it is not optimal for VE applications since the dropped sound may be one the listener is attending to. A better approach is to devise a mechanism that attempts to guarantee the continuous playback of the active sounds by gracefully degrading them while minimizing the perceptible effects of the degradation. The approach presented here makes use of real-time scheduling strategies to achieve this goal. When resources are not sufficient to fully evaluate all the active sounds in the sonic environment, the quality of those sounds is non-uniformly degraded based on an approximation of what sounds the listener is most likely attending to. During overload conditions, where it becomes necessary to completely discard some sounds, they are discarded in priority order so that the perceptible effects are minimized. Guaranteeing the continuous playback of sounds requires that the execution of the sound evaluation routines be closely monitored and controlled. This can be accomplished using real-time scheduling techniques if the sound evaluation process can be expressed in terms of a real-time workload model. In fact, the problem of scheduling concurrent sounds can be easily expressed in terms of the hard real-time, periodic workload model. The job set consists of the sounds in the environment. Each sound evaluation routine periodically submits requests for the
evaluation of a sample block of fixed size. These submissions form each job’s task set. Because the block size is fixed for all sounds, the periods of all the jobs are identical. The computation of a sample block must be completed in a time not to exceed this period. A precise schedule is, therefore, one where all the active sounds complete the evaluation of their respective sample blocks before their deadline. This may not be possible if the available computational resources are exceeded. In order to address this problem we make use of a graceful degradation scheme to maintain real-time evaluation rates. The evaluation of a synthetic sound in real-time can be expressed using the imprecise computation model. Because evaluating the sound signal at successively higher sampling rates produces increasingly better results, the evaluation routine is a monotone process. The error introduced by degrading the signal is linearly proportional to the sampling rate and hence the error function is linear. Three primary questions must be addressed in order to make the approach described above possible. The first is how priorities are assigned to sounds in the environment. The second is how synthetic sounds can be iteratively evaluated. The final question is how resources can be allocated to sounds such that the perceptible effects of overload conditions are minimized. While this work addresses all three issues, our primary focus is on the problem of resource allocation. In the following sections we address each question and present our results.
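The mapping just described can be summarized in a small illustrative structure; the field names are assumptions made for exposition, not VAS types.

// Illustrative mapping of one active sound onto the hard real-time, periodic,
// imprecise-computation workload model. All sounds share the same period (the
// playback time of one block); only the execution times differ between jobs.
struct SoundJob {
    double period;    // playback time of one sample block, identical for all jobs
    double fullTime;  // tau: time to evaluate the block at the full sampling rate
    double minTime;   // m: time for one iteration, the minimum acceptable result
    double weight;    // w: importance of the sound (section 4.1)
};

// Linear error of an imprecise result given sigma units of assigned execution time,
// following Equation 2-2: 0 when fully evaluated, 1 when only the minimum was run.
double impreciseError(const SoundJob& j, double sigma) {
    return 1.0 - (sigma - j.minTime) / (j.fullTime - j.minTime);
}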
4.1 Prioritizing Sounds Ideally, what the priority algorithm must do is determine which sounds in the environment the listener is attending to. This, of course, is impossible to do precisely since attention is subjective; the listener cognitively decides what to pay attention to.
The best we can do is to guess what the listener may be paying attention to, based on the state of the listener and the sounds in the environment. Psychoacoustic principles can give us some hints as to how to determine what sounds a listener is paying attention to. In determining what factors can be used to predict a listener’s attention, we are constrained to using data that is readily available to the scheduler. The following three factors, which are used to predict the focus of a listener’s attention, are based on data that is readily available at runtime. The orienting response [33] is a human response to aural stimuli in which the listener attempts to support the perception of aural stimuli through visual correspondence. In effect, listeners will turn their head so that they can see what they are listening to. This reaction can be very useful in determining what the listener is attending to. Our priority algorithm approximates this reaction by weighting a sound using a cos θ scaling factor, where θ is the angle between the listener’s gaze vector and the sound (fig. 4-1). The intensity of a sound is important due to the masking phenomenon [12]. A higher intensity sound will tend to mask a lower intensity sound if the two sounds are within the same frequency band. The relative intensity of a sound is calculated as the ratio of the sound’s intensity and a reference intensity. The highest intensity sound within the same frequency band is used as the reference. Determining the frequency content of a sound, however, requires a Fourier transform to transform the signal into the frequency domain where the analysis must take place. The problem with this approach is that the computational expense of performing this transform would more than offset any gains made by the degradation of sounds. Furthermore, the determination of the relative priority of the active sounds would require n² comparisons, where n is the number of active sounds. Given the computational expense of accurately calculating the masking of sounds
we opted to approximate this factor by simply giving louder sounds a higher scale factor. This determination takes into account distance-based attenuation.
Figure 4-1 Measuring the angle θ between the listener’s gaze vector and the sound source
The final scaling factor is based on the adaptation response of the human aural system [34]. Our sensitivity to aural stimuli decreases the longer the stimulus persists. This process continues for approximately three minutes, after which it levels off. In effect, we adapt to persistent sounds in our environment, making a sound’s age important. The scaling factor for this component begins at its maximum level at the start of a sound and decreases linearly until it reaches zero at the end of three minutes. After three minutes it is no longer a factor in the rating of the sound. Given these three scalars, the priority of a sound can be calculated as follows:
$$P(s_i) = \begin{cases} W_g \cos\theta_i + W_t (3 - T_i) + W_l \dfrac{I_i}{I_o} & \text{if } 0 \le T_i \le 3 \\[6pt] W_g \cos\theta_i + W_l \dfrac{I_i}{I_o} & \text{otherwise} \end{cases}$$

where:
$\theta_i$ is the angle shown in fig. 4-1
$I_i$ is the intensity of the sound at the listener
$I_o$ is the reference intensity
$T_i$ is the amount of time, in minutes, that the sound has been playing
$W_g$, $W_t$, $W_l$ are weights used to vary the contribution of each component

Equation 4-1 Evaluation of a sound's priority

The problem of predicting a listener’s attention from environmental factors is necessarily speculative due to the complexity of the human hearing process, especially when cross-modal perception is considered. We consider this approach a starting point; further work is necessary to study the effectiveness of this scheme with experimental data.
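A minimal sketch of Equation 4-1 in code follows; the structure fields and the default weights are illustrative placeholders rather than values prescribed by the system.

#include <cmath>

// Per-sound data assumed to be available to the scheduler at runtime.
struct SoundState {
    double angleToGaze;        // theta_i: angle between the gaze vector and the sound (radians)
    double intensity;          // I_i: intensity of the sound at the listener
    double referenceIntensity; // I_o: reference (loudest) intensity
    double minutesPlaying;     // T_i: how long the sound has been playing, in minutes
};

// Priority of a sound per Equation 4-1. Wg, Wt, and Wl weight the orienting-response,
// adaptation, and loudness components respectively.
double priority(const SoundState& s, double Wg = 1.0, double Wt = 1.0, double Wl = 1.0) {
    double p = Wg * std::cos(s.angleToGaze) + Wl * (s.intensity / s.referenceIntensity);
    if (s.minutesPlaying >= 0.0 && s.minutesPlaying <= 3.0)
        p += Wt * (3.0 - s.minutesPlaying);  // adaptation term fades out over three minutes
    return p;
}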
4.2 Iterative Evaluation of Synthetic Sounds The imprecise computation model requires that we iteratively evaluate a Timbre Tree for a bounded region in the temporal domain at successively higher resolutions. If the iteration is stopped before full resolution is reached, the evaluation routine will not have generated all the samples. The missing samples must be calculated using interpolation.
Figure 4-2 IBuffer after three iterations (the stacked mini-buffers 0, 2, and 4 hold evaluated samples; mini-buffers 1 and 3 remain empty)
To facilitate this, a buffer structure was devised which we call the Interpolating Buffer or IBuffer. The IBuffer consists of a set of mini-buffers in a stacked configuration as depicted in fig. 4-2. The depth of this stack determines the number of iterations necessary before full resolution is reached. On each iteration, the evaluation routine calculates a block of samples equivalent to 100 ms at 1/d of the global sampling rate, where d is the depth of the buffer. The resultant sample block is written to the IBuffer, which fills one of its mini-buffers with the samples. The order in which the mini-buffers are filled is predetermined by the IBuffer to minimize the number of empty mini-buffers between any two full mini-buffers. The starting time of the evaluation of each iteration is offset by an amount
$$\frac{1}{\text{Rate}} \times \text{Depth}$$

where Rate is the global sampling rate, and Depth is the depth of the mini-buffer within the IBuffer. Once the evaluation routine has completed its last iteration, any empty mini-buffers are filled by linearly interpolating between the two closest full mini-buffers. The content of the mini-buffers is then copied to an output buffer in an interleaved fashion as shown in fig. 4-3. Calculating missing samples by linear interpolation introduces aliasing artifacts in the resultant sound. This is due to the lack of low-pass filtering before the resampling process. A better approach would be to convolve a low-pass filter with the evaluated samples and resample the result at the higher sampling rate. The problem with this approach is that the filter convolution process is computationally too expensive given real-time constraints. Any gains made due to under-sampling the signal would be lost to the filter convolution.
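The fill-and-interleave step can be sketched as follows. This is a simplified stand-in for the IBuffer, assuming linear interpolation between the nearest evaluated mini-buffers (and sample-and-hold at the boundaries); at least one mini-buffer is assumed to have been evaluated.

#include <cstddef>
#include <vector>

// 'mini' holds one mini-buffer per depth level; 'filled' marks the levels that were
// actually evaluated. Empty levels are linearly interpolated from the nearest filled
// neighbours, then all levels are interleaved into the output block so that sample i
// of mini-buffer j becomes output sample i*depth + j.
std::vector<float> interleave(std::vector<std::vector<float>> mini,
                              const std::vector<bool>& filled) {
    const int depth = static_cast<int>(mini.size());
    const int n = static_cast<int>(mini[0].size());

    for (int j = 0; j < depth; ++j) {
        if (filled[j]) continue;
        int lo = j, hi = j;
        while (lo >= 0 && !filled[lo]) --lo;         // nearest evaluated level below
        while (hi < depth && !filled[hi]) ++hi;      // nearest evaluated level above
        for (int i = 0; i < n; ++i) {
            if (lo >= 0 && hi < depth) {
                float t = float(j - lo) / float(hi - lo);
                mini[j][i] = (1.0f - t) * mini[lo][i] + t * mini[hi][i];
            } else {
                mini[j][i] = (lo >= 0) ? mini[lo][i] : mini[hi][i];   // hold at the edge
            }
        }
    }

    std::vector<float> out(static_cast<std::size_t>(depth) * static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < depth; ++j)
            out[static_cast<std::size_t>(i) * depth + j] = mini[j][i];
    return out;
}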
Figure 4-3 The interpolation step and output buffer. The IBuffer is depicted after three iterations: the samples in mini-buffers 0, 2, and 4 are evaluated, those in mini-buffers 1 and 3 are interpolated, and the mini-buffers are then interleaved into the output buffer (S0,0 … S4,0, S0,1 … S4,1, and so on).
4.3 Scheduling Algorithms Before considering the effectiveness of scheduling algorithms for imprecise computation, we first look at the characteristics of the problem domain of sound evaluation. We
have described how this problem fits the imprecise computation model and how it can be expressed in terms of a hard real-time periodic model. We now make some observations concerning the characteristics of the job set. 4.3.1 Problem Characteristics of Sound Generation Sound evaluation routines each submit a series of requests for the evaluation of a block of sound wave sample points. The size of this sample block is fixed across all jobs and thus occupies the same playback time in the output device. If the evaluation routine does not produce a new sample block before the current one has completed playback, silent gaps will appear in the output. The deadline for each of the requests is thus the playback time of a sample block at the output device. Because the period of a job is defined as the time between two consecutive deadlines, all the jobs will have the same period, namely the playback time of a sample block. The execution time, on the other hand, depends on the complexity of the Timbre Tree being evaluated and varies across jobs. In our model, each active sound in the system corresponds to a job in the hard real-time workload model. Given the transient nature of the sounds in the environment, we can expect that the system will exhibit a very dynamic behavior where jobs frequently enter and leave the scheduler. Furthermore, given the method used for evaluating the relative importance of the sounds in the environment, we can expect that the priority of the jobs will also change frequently. The dynamic nature of this problem must be carefully considered when devising a scheduling strategy. Finally, users of a spatial sound system may have domain-specific knowledge which would lead them to want to impose an ordering scheme on the sounds in the environment
independently of the one determined by the system. This requires that a mechanism be provided such that users can specify the relative importance of sounds. We propose a two-tiered strategy where sounds are classified as either critical or non-critical. Non-critical sounds are only scheduled when all the critical sounds have been evaluated at full resolution. This gives the user some measure of control over the relative importance of the sounds in the environment. Given the above characteristics, we will consider the suitability of various scheduling strategies for managing the sound generation process. Before delving into this, however, we first describe the sonic environments that were created in order to test the scheduling algorithms under consideration.
Figure 4-4 Behavior of sounds in the City sonic environment (priority of Sounds 1-5 plotted against period)
4.3.2 Algorithm Evaluation Test Environments Three sonic environments were developed in VAS so that the effectiveness of real-time scheduling algorithms could be examined. Each environment consisted of a number of synthetic as well as sampled sound sources. The driving criterion in developing these environments was that they create overload conditions of varying degrees for examining the behavior of both uniprocessor and multiprocessor scheduling algorithms. All environments were designed to exhibit dynamic behavior with multiple sounds starting and stopping. This was accomplished using the scripting capability of VAS’s Auditory Actors. The City sonic environment was developed to simulate the sounds one might experience in an urban environment. Fig. 4-4 depicts the priorities of the sounds in that sonic environment during the periods when they were active. The prominent feature of the City sonic environment is the appearance of two bell sounds at around the 48th period. Both these sounds dominate the sonic environment and have approximately the same priority. The City sonic environment creates moderate overload conditions for testing uniprocessor scheduling strategies. The Waves sonic environment was created to examine the effects of priority crossover between two prominent sounds. The two dominant sound sources, sounds 1 and 2, cross priorities at around the 40th period (fig. 4-5). Sound 1, which previously had a lower priority than sound 2, now attains a much higher priority. This is accomplished by raising the loudness level of sound 1. The Waves sonic environment was designed to place a heavy load on the processor in order to facilitate the examination of scheduling algorithms under heavy load conditions.
Figure 4-5 Behaviors of sounds in the Waves sonic environment (priority of Sounds 1-6 plotted against period)

The third sonic environment was designed in order to examine the processor allocation and load balancing schemes developed for VAS. The Heavy sonic environment consists of ten sound sources. The primary design objective for this environment was that it exhibit a dynamic nature in order to simulate sounds created due to a user’s interaction with a VE. The processor utilization factors for the Heavy environment as well as the City and Waves sonic environments are plotted in fig. 4-6. Clearly the processor utilization for the Heavy environment varies dramatically while sufficiently loading two processors so that the effectiveness of a load-balancing scheme can be examined.
Figure 4-6 Processor utilization of the City, Waves, and Heavy sonic environments (utilization plotted against period)
4.3.3 Dynamic Task Allocation In chapter 2 we presented two possible approaches for scheduling real-time jobs on a multiprocessor. One approach is to use a multiprocessor scheduling algorithm. The primary advantage of a multiprocessor algorithm is that load imbalances can be avoided because all of the processors are considered on each scheduling cycle, and error can be globally minimized by considering all the processors when assigning execution time to the optional job set. The problem of scheduling a set of tasks on a multiprocessor in order to minimize error is, however, a difficult one. Scheduling algorithms that are optimal for the single processor case, such as the
Rate Monotonic algorithm, perform poorly when extended to multiprocessors [29]. An algorithm that has been suggested for multiprocessor scheduling for imprecise computation [31] formulates the problem as a min-cost-max-flow network flow problem. A minimum error, feasible schedule is constructed by solving the optimization problem. The complexity of this algorithm, however, makes its use impractical for any real application. Another approach, suggested in [29], partitions the job set first and then statically assigns jobs to processors. Each processor is then scheduled using a uniprocessor scheduling algorithm. Algorithms devised for partitioning a task set, such as RMNF and RMFF which were presented in chapter 2, attempt to find a partition of the task set that minimizes the number of processors used while guaranteeing that a feasible schedule is possible on each processor. Using this approach for the sound evaluation problem has a number of pitfalls. The dynamic nature of the job set will almost certainly lead to load imbalances due to the static nature of the assignment of jobs to processors. Due to the non-deterministic nature of a user’s interactions within a VE, we cannot predict a priori the behavior of the sounds in the system. In other words, we cannot predict at any point what sounds may be active. Therefore, a static partitioning of the job set can only consider the potential load on any processor. This will not reflect the actual load conditions at runtime because they vary dynamically. It is generally agreed that maintaining a balanced load across processors is beneficial. In order to determine the validity of this assumption for the imprecise computation model we must analyze the effect of load imbalances on the average error. The following theorem relates the average error with load distribution.
Theorem 4-1: When the total utilization factor of all jobs U > 1, a partitioning which assigns each of N processors a load of U/N is optimal only if U ≤ N. We refer to optimality here in the sense that no other assignment of load to processors will produce less error.
Proof: In order to prove this theorem, we consider two cases of U:
Case U ≤ N: In this case a balanced partitioning will assign each processor a load of

$$\frac{U}{N} \le 1$$
It has been shown in [44] that u ≤ 1 is sufficient to guarantee that the job set is precisely schedulable. Therefore under this partitioning all the optional jobs will be assigned their full execution time and hence the error will be 0. Clearly no other partitioning can produce less error. Case U > N: The formulation for average error presented in equations 2-2 and 2-3 simplifies to
$$E = \frac{1}{p} \sum_{k=1}^{K} \frac{w_k}{v_k}\, \varepsilon_k(O)$$

where

$$\varepsilon_k(O) = \sum_{j=l-\frac{p}{p_k}+1}^{l} \left[ (\tau_k - m_k) - (\sigma_{k,j} - m_k) \right]$$

Equation 4-2 Average error in terms of unassigned execution time

$\varepsilon_k(O)$ is the portion of job $J_k$'s optional job which is not executed. Suppose that we have an optimal partitioning of the job set that allocates a load of U/N to each processor; we should then expect the average error to be minimized. If this supposition is correct, then moving some job Ji from processor Pj to Pk will result in a load imbalance and hence an increase in the average error. We can formulate the difference in average error resulting from moving job Ji from processor Pj to Pk as
$$\Delta E = \frac{1}{p}\left[ \frac{w_i}{v_i}\,\varepsilon'_i(O) + \sum_{k=1}^{K} \frac{w_k}{v_k}\,\Delta T_k - \left( \frac{w_i}{v_i}\,\varepsilon_i(O) + \sum_{j=1}^{J} \frac{w_j}{v_j}\,\Delta T_j \right) \right]$$

where:
$w_i$ is the weight of job $J_i$
$v_i$ is the total utilization factor of $O_i$
$\varepsilon_i(O)$ is the error due to $J_i$'s assigned time on processor $P_j$
$\varepsilon'_i(O)$ is the error due to $J_i$'s new assigned time on processor $P_k$
$\Delta T_k$ is the difference in assigned time to job $k$ after adding $J_i$ to $P_k$
$\Delta T_j$ is the difference in assigned time to job $j$ after removing $J_i$ from $P_j$

Equation 4-3 Change in average error due to a job migration
We now consider the following job sets assigned to two processors P1 and P2 respectively:

J1 = {(2, .5, .2), (4, .2, .3), (5, .1, .1), (6, 2, .4)}
J2 = {(2, .2, .8), (7, .1, .1), (8, .1, .1)}

Each job is specified as a tuple (τ, m, w), consisting of the job’s total execution time, its minimum acceptable execution time, and its weight respectively. The total utilization factor on each processor is 1.7 and hence the load is ideally balanced. According to equation 4-3, moving job j2 (4, .2, .3) from P1 to P2 results in a decrease in average error of the jobs assigned to P1 of .20369. The corresponding increase in average error that results from adding j2 to P2’s job set is .056314. The net change in error due to this migration is ∆E = .056314 - .20369 = -.14738. The total average error was decreased due to the job migration and hence we have a contradiction. ∎
The idea of optimizing the partitions so that the number of processors used is minimized may be a good constraint for the general real-time scheduling problem but is not realistic for this
problem. We choose instead an approach that minimizes the error given a fixed number of available processors. The RMNF (or RMFF) algorithm does not consider the resultant error in the imprecise computation when assigning jobs to processors, since only the mandatory job set is considered. Finally, RMNF (or RMFF) assigns jobs to processors based on their period. As was described earlier, all of the jobs in our workload model have the same period. These algorithms will therefore lead to an arbitrary assignment of jobs to processors. If static analysis of the job set will not lead to a good partitioning, then we must consider a dynamic approach. A dynamic load-balancing scheme is concerned with distributing processes among the processors in a parallel or distributed system such that the load is evenly balanced. The features that distinguish a load balancing policy include whether or not it is preemptive, where the policy decisions are made and who initiates load balancing, the content and scope of the load information upon which policy decisions are made, and finally how processes are distributed once load balancing is initiated. The shared memory machine architecture chosen for this problem offers a number of advantages over distributed memory machines with regard to dynamic load balancing. Because processors in a shared memory machine all share a common memory, it is much simpler to collect load information centrally and make global load balancing decisions. Migrating processes physically is not necessary since all the processes reside in shared memory and can switch processors without being moved. This makes dynamic load balancing a much simpler problem on these machines. A major distinguishing feature of dynamic load balancing schemes is whether or not they are preemptive. A preemptive, or migratory, policy will preempt a running process in order to move it when a load imbalance is detected in the system. Non-migratory policies, on the other
hand, only move non-executing processes. A migratory policy affords the system better responsiveness to load imbalances since processes can be moved at any time. The problem is that moving a running process is not an easy task and involves a great deal of overhead. A study of migratory policies [36] has shown that major performance gains are not attainable using preemptive policies over non-preemptive policies. In the case of a shared memory machine, however, the processes need not be moved because they all reside in the same memory. It is therefore feasible to reassign a process to a new processor at any time without incurring the cost of moving a running process. Load balancing strategies can be classified, based on who does the load balancing, as centralized or distributed. In a centralized strategy all the system load information is collected at one processor which executes the scheduling algorithm and initiates process migration. In a distributed scheme each processor collects load information and initiates process migration. Centralized schemes, although simpler, are not scalable to a large number of processors due to the increasing load on the centralized scheduling processor and, in the case of a distributed system, to communication bottlenecks. Distributed load balancing strategies, on the other hand, do not suffer from these drawbacks. The major problem with distributed load balancing schemes is that they place the overhead of gathering the system load information and running the scheduling algorithm on the individual processors. An important design issue in the collection of system load information is the question of what statistics to gather and how to sample those statistics. Load balancing decisions can be based on CPU load, memory demand, and communication load, among others, for both the source and destination processors. Sampling these statistics can occur by event sampling, where statistics are gathered at the onset or completion of certain events, or by interval sampling, where
vital statistics are gathered at regular fixed intervals. Considering the sound generation problem on a shared memory architecture, communication load and memory demand are not applicable to the problem. The primary statistic of interest is CPU load. As mentioned earlier, the runtime of each task is monitored for each period of execution. The utilization factor for a task is calculated as follows:

$$u_{i,j} = \frac{\tau_{i,j}}{P}$$

where:
$u_{i,j}$ is the utilization factor for task $i$ of job $j$
$\tau_{i,j}$ is the time required to fully execute task $i$ of job $j$
$P$ is the period, which is fixed for all tasks

Equation 4-4 Utilization factor of a job
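As a minimal sketch (assuming the per-task runtimes measured by the scheduler are expressed in the same units as the period), the per-task utilization of Equation 4-4 and the per-processor total used by the migration policy described below can be computed as:

#include <vector>

// Utilization factor of one task per Equation 4-4: measured runtime over the fixed period.
double taskUtilization(double measuredRuntime, double period) {
    return measuredRuntime / period;
}

// A processor's utilization is the sum over its active tasks; a value above 1.0
// marks the processor as overloaded for the migration threshold used later.
double processorUtilization(const std::vector<double>& runtimes, double period) {
    double total = 0.0;
    for (double r : runtimes) total += taskUtilization(r, period);
    return total;
}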
The utilization factor of any processor is simply the sum of the utilization factors of the tasks assigned to that processor. This value is sampled at the end of each period and can then be used to perform load balancing at the start of the next period. The granularity of the parallelism in a distributed load balancing system is significant. It must be large enough to offset the overhead of the scheduling activities incurred by the processors. It is, therefore, appropriate to utilize medium to large grain parallelism such as that exhibited by this problem. Finally, we must consider the issue of when and where to move processes in order to achieve load balancing. This has been an active area of research in dynamic load balancing and a number of algorithms have been suggested. The algorithms vary in the amount of load information they require, how that information is collected, whether they are sender or receiver-
initiated, and the policy they use to decide when to initiate load balancing. At the simplest level, a random allocation strategy, which requires no load information, has been proposed in [38]: new work is randomly allocated to processors in the system. This algorithm works surprisingly well given a uniform grain size such as that exhibited by our problem. Other load balancing strategies include the gradient model [39], adaptive contracting within a neighborhood (ACWN) [40], the sender algorithm, the receiver algorithm [37], the symmetric algorithm [41], and the periodic symmetrically initiated algorithm [42]. Most of these algorithms are threshold based. They define minimum and maximum load thresholds that specify when a processor is overloaded or underloaded. When the load of the processor exceeds the threshold, load balancing is initiated. The issue of where to send a process is an important one. A simple strategy is to randomly poll a number of surrounding processors for their load information and either send (sender-initiated) or request (receiver-initiated) a process based on the load information gathered. More sophisticated algorithms such as the gradient model and ACWN use the load information gathered within a neighborhood to determine where to send a process. These algorithms, however, only pertain to distributed systems where global load information is not available. Based on these observations, a heuristic load-balancing algorithm was devised and incorporated into the VAS scheduler. Due to the shared memory architecture chosen for this problem, the algorithm is able to use global load information across all processors and to use a preemptive migration policy. The following policies were defined for task allocation and task migration. The task allocation policy is based on the random allocation presented above. This is based on the observation that client applications typically allocate all sounds during an initialization stage. At runtime sounds are dynamically started and stopped, and finally, before exiting, sounds are deallocated. This means that at task allocation time, when new sounds are
created, no sounds are yet active and hence no actual load information exists on which to base a task allocation. The static load information is of little use since it does not reflect the actual load during runtime. Finally, as was shown in [38], the random allocation policy does a good job of distributing the load for a uniform grain size. While the sound evaluation routines can differ in grain size based on the complexity of the Timbre Tree, experience has shown that the amount of variation they exhibit is fairly limited. We have not yet considered when load balancing should be initiated. A static invocation policy is one that is synchronous; load balancing occurs at regular predetermined intervals. A dynamic invocation policy is asynchronous, such that load balancing can occur at any time. A dynamic invocation policy is more responsive and can reduce scheduling overhead since load balancing is only initiated when needed. Unfortunately, a dynamic policy can lead to serious problems for a real-time scheduler. Once each processor’s scheduler has determined an allocation of time to its assigned jobs for a given period, migrating a job during that period will invalidate the schedules. A dynamic invocation policy is therefore not appropriate for a real-time system. Instead a static invocation policy is used where load balancing is initiated at the start of each period. The overhead incurred due to this policy is actually minimal: in highly dynamic sonic environments load imbalances may occur frequently enough that a dynamic policy would invoke load balancing as often as a static policy, in which case both algorithms would impose the same overhead. In other cases, the overhead is minimized by only initiating load balancing when at least one of the processors meets the task migration threshold. The task migration policy chosen is a sender-based preemptive policy, which affords better response to overload conditions. In order to determine when a processor is to be considered overloaded, we use the total utilization factor of the mandatory and optional jobs assigned to that
processor. Obviously only active jobs (i.e., only those jobs whose sounds are playing) are considered. The threshold used for initiating the send procedure is Up > 1, where Up is the total utilization factor assigned to processor P. The determination of where to send a job and which job to send is made as follows. The algorithm first finds the least utilized processor Pleast. A job is migrated from processor Pfrom to processor Pleast if there exists a job j on Pfrom that satisfies the following two constraints. In order to ensure that the migration will reduce the difference in load between the two processors, we require uj < Up - Up_least, where uj is the total utilization factor of job j and Up_least is the utilization factor of the least utilized processor. The second constraint is based on theorem 4-1, which shows that migrating jobs in order to balance load might increase the average error. In general, equation 4-3 gives the exact change in error, ∆E, due to migrating a job between two processors. Ideally we would like to guarantee that a given job migration will not only result in a negative ∆E but also make ∆E as negative as possible. Unfortunately, an exact solution requires that for each job that is being considered for migration, the post-migration schedules on both processors be generated in order to evaluate ∆E. The resulting overhead makes that an unreasonable approach. Instead we must resort to using a heuristic to determine which job, if any, should be migrated. In order to examine the effect that our choice of job has on ∆E, we reformulate equation 4-3 as follows:
$$\Delta E = \frac{w_i}{v_i}(\sigma - \sigma') + \sum_{k=1}^{K} \frac{w_k}{v_k}\,\Delta T_k - \sum_{j=1}^{J} \frac{w_j}{v_j}\,\Delta T_j$$

Our choice of job will determine the value of $w_i/v_i$ and the value of $\sigma$. In order to make $\frac{w_i}{v_i}(\sigma - \sigma')$ as negative as possible, we want to choose a job such that $w_i/v_i$ is maximized, $\sigma$ is minimized, and $\sigma'$ is maximized. Clearly $\sigma'$ is maximized when $w_i/v_i > w_f/v_f$, where $f$ is defined for processor $P_{least}$ as follows. Assuming the job set $J = \{j_1, j_2, \ldots, j_K\}$ is ordered by the weighted utilization factor $v_k/w_k$ such that it is non-decreasing in $k$, there exists an integer $f$ such that an optimal assignment will allocate the first $f$ optional jobs $O_k$, $k = 1 \ldots f$, their full execution time. More formally, we have

$$\sum_{k=1}^{f} v_k \le 1 - u < \sum_{k=1}^{f+1} v_k$$

Equation 4-5 Definition of the integer f

where $u$ is the utilization factor of the $K$ mandatory jobs, and $v_k$ is the full utilization factor of an optional job. When $w_i/v_i > w_f/v_f$ the migrated job will receive more execution time $\sigma'$ than its current execution time $\sigma$, and hence $\frac{w_i}{v_i}(\sigma - \sigma')$ will be negative. $w_i/v_i$ is maximized and $\sigma$ is minimized when job $j_{g+1}$ is migrated, because it is the job with the highest $w_i/v_i$ that is not receiving its full execution time; $g$ has the same definition as $f$ but for processor $P_{from}$. Based on these observations, our load balancing algorithm finds the first job $j_i$ on $P_{from}$ such that $i > g$ and $u_i < U_{p\_from} - U_{p\_least}$. Job $j_i$ is migrated if $w_i/v_i > w_f/v_f$. Because the algorithm tries to minimize the difference in load between any two processors, we termed it the Minimum Difference or M-D algorithm. Pseudo code for the M-D algorithm follows:
M-D()
begin
    P := {p1, p2, …, pN}                    /* P is the set of N processors */
    for i := 1 to N do
        p := P(i)
        if p.utilization > 1 then
            pmin := FindMinUtilized(P)      /* returns the least utilized processor */
            if pmin = p then
                next i
            else
                u := p.utilization - pmin.utilization
                /* GetJobByUtilization finds the job on p with the smallest vj/wj that is */
                /* not receiving its full execution time and whose total utilization is ≤ u */
                j := GetJobByUtilization(p, u)
                if (j ≠ null) and (wj/vj > wf/vf) then
                    Migrate(j, p, pmin)     /* migrates job j from p to pmin */
                endif
            endif
        endif
    endfor
end

Figure 4-7 The M-D scheduling algorithm
The M-D algorithm was incorporated into the VAS scheduler and processor loads were measured during the evaluation of the Heavy sonic environment on two processors. In fig. 4-8 the loads assigned to each processor by the M-D algorithm and by a static allocation policy are depicted. The static strategy that was used simply allocates an incoming job to the processor with the least potential load. Also plotted in fig. 4-8 is the optimal load assignment of U/N.
Figure 4-8 Processor loads with and without dynamic load balancing (utilization of processors 1 and 2, with and without load balancing, and the optimal U/N assignment, plotted against period)
In order to measure the effectiveness of these two strategies, the mean of the difference between the load assigned to the two processors was measured. Clearly a mean value of 0 implies that both processors received identical load and hence we have the optimal assignment of U/N. The mean value for the static allocation policy was found to be 0.8. The corresponding mean value for the dynamic allocation was 0.2. The dynamic policy clearly balanced the load much more effectively than the static policy. This is also evident by examination of the plots in fig. 4-8. The processor loads for the dynamic allocation policy were much closer to the optimal case than those for the static policy.
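The imbalance metric used above can be sketched as follows, assuming the per-period loads are taken from the scheduler's log and that the difference is taken as an absolute value in each period.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute difference between the loads assigned to two processors over a run.
// A value of 0 means both processors received identical load in every period, i.e.
// the optimal U/N assignment; larger values indicate poorer balance.
double meanLoadDifference(const std::vector<double>& proc1Load,
                          const std::vector<double>& proc2Load) {
    std::size_t periods = std::min(proc1Load.size(), proc2Load.size());
    if (periods == 0) return 0.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < periods; ++i)
        sum += std::fabs(proc1Load[i] - proc2Load[i]);
    return sum / static_cast<double>(periods);
}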
The effect of the load assignments on error was measured by calculating the total average error produced by both policies. The total average error was calculated simply as the sum of the average error produced on each processor over a time of 100 periods. The total error produced by the static allocation strategy was 0.74046. The corresponding error generated using the dynamic task migration was .609861. As expected the M-D algorithm balanced the load while significantly reducing the average error. 4.3.4 Processor Scheduling Having assigned each processor a job set, we must now turn to the problem of scheduling the job set so that the perceptible effects of overload are minimized. The Least Utilization (LU) algorithm discussed in chapter 2 is clearly a promising approach. In [28], it was shown that for type N jobs where all the jobs have the same repetition period and linear error functions, the LU algorithm minimizes the average error. This makes the LU algorithm a promising candidate for scheduling jobs in this problem domain since it meets the above criteria. In order to employ an imprecise computation model for scheduling the evaluation of sounds, we must first define a minimum acceptable precision for the evaluation processes. As mentioned earlier, sounds are evaluated in an iterative fashion with ten iterations required to fully evaluate a sound. Clearly the lower bound of the number of iterations of the sound evaluation process has to be one iteration. This is the minimum necessary to produce any sound at the output. Furthermore, as the minimum acceptable execution time (mk) of the job set is raised, we show in Theorem 4-2 that under overload conditions where the total utilization factor of the job set U > 1 the average error increases with mk. The average error values produced by varying the minimum number of iterations assigned to each sound are plotted in figure 4-9. As
expected, the average error rises as the minimum number of iterations is raised. This leads us to conclude that setting the minimum number of iterations to one is a good choice.

Figure 4-9 Average error of the LU algorithm with the minimum number of iterations varying between 1 and 6 (average error ranges from roughly 0.59 to 0.65)
Theorem 4-2: When U > 1, the average error E increases with the minimum execution time for all jobs mk. Proof: We consider two cases of U: Case U ≤ 1: The job set can be precisely scheduled and the error is 0. Case U > 1: Over time p, the first f optional jobs are assigned their full execution time pvk out of the time left over from executing the mandatory job set, p - pu. Given the integer f that was defined in equation 4-5, optional job f+1 will receive the following partial execution time:
$$p\left(1 - u - \sum_{k=1}^{f} v_k\right)$$

where:
$p$ is a time equal to an integral number of periods
$u$ is the total utilization of the mandatory job set
$v_k$ is the utilization of an optional job
Clearly as mk and hence u is increased, there is less time available for executing the optional job set. We consider two cases for an increase ∆m in the mandatory jobs yielding m’k and u’. Case 1:
$$\sum_{k=f+2}^{K} \Delta m \leq p\,v_{f+1} - \Delta m$$
In this case the extra execution time allocated to the f+2..K mandatory jobs can be deducted from the time allocated to the optional job Of+1. This new allocation will produce a new error value E'. Because wf+1 / vf+1 ≤ wf+2 / vf+2 ≤ … ≤ wK / vK, and by definition of the error function, we have E' ≥ E. Equality will occur in the case where wf+1 / vf+1 = wf+2 / vf+2 = … = wK / vK.

Case 2:
$$\sum_{k=f+2}^{K} \Delta m > p\,v_{f+1} - \Delta m$$
In order to satisfy equation 4-5, f must decrease, yielding a new value f'. For a given value of f, the error produced is
$$E(f) = 1 - \sum_{k=f+1}^{K} \frac{w_k\,\sigma_k(O)}{v_k}$$
this is because the first f jobs are assigned their full execution time and hence produce no error. Because f > f', it follows that E(f) < E(f'). ∎
Figure 4-10 Iterations assigned to sounds in the Waves sonic environment by the LU algorithm (number of iterations assigned to sounds 1, 2, and 3, plotted over periods 0 to 100).
The LU algorithm was incorporated into the VAS scheduler. The time allocated to the synthetic sounds (sounds 1, 2, and 3) in the Waves sonic environment is presented in Fig. 4-10. The results produced by the LU algorithm exhibit a number of problems that are especially prevalent in the Waves sonic environment. Because the LU algorithm assigns execution time to jobs based on the Greedy method, it tends to generate large disparities between the processing time allocated to sounds that are close in priority. The majority of resources are allocated to the two sounds with the smallest weighted utilization factor, while a third sound of higher priority is
allocated the minimum number of iterations. Because a listener is capable of attending to multiple sounds simultaneously, the effect of a highly degraded sound is pervasive.
Figure 4-11 Iterations assigned to sounds in the Waves sonic environment by the PO algorithm (number of iterations assigned to sounds 1, 2, and 3, plotted over periods 0 to 100).
Another problem with the behavior of the LU algorithm is that it bases its job ordering on the weighted utilization factors, and hence the job with the highest priority is not guaranteed to receive the largest share of processing time. In evaluating the Waves sonic environment, the LU algorithm allocated sound 2 only one iteration even though it had the highest priority of the three sounds.
An alternate algorithm was devised which orders jobs purely on the basis of their weight. Within each period, the task set is ordered in non-increasing values of weight wk, such that w1 ≥ w2 ≥ … ≥ wK. Jobs are assigned execution time in order by assigning the maximum possible execution time to each optional job in increasing k (decreasing w). Because of the way jobs are ordered, we call this algorithm Priority Order (PO). Pseudo-code for this algorithm follows:

Priority_Order()
begin
    T := {t1, t2, …, tK}           /* Task set ordered in non-increasing w */
    P := time per period           /* P maintains the available time in this period */
    for i := 1 to K do             /* assign each job its mandatory part */
        t := T(i)
        t.execution_time := mk     /* mk is the minimum acceptable execution time */
        P := P - mk
    endfor
    for i := 1 to K do             /* hand out the remaining time, highest weight first */
        t := T(i)
        temp := min(P, t.total_execution_time - mk)
        P := P - temp
        t.execution_time := t.execution_time + temp
    endfor
end

Figure 4-12 The Priority Order algorithm

The execution times allocated by the PO algorithm are plotted in fig. 4-11. As expected, the assigned times correlate more closely with the priority of the sounds. A large disparity, however, still exists between the time assigned to sounds 1 and 2, which have the highest priorities, and sound 3, which is of slightly lower priority.
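For concreteness, a small executable Python version of the PO assignment of figure 4-12 is sketched below. The Task record, its field names, and the example weights and period length are assumptions made for this sketch; the ten-iteration full evaluation follows the surrounding text, and the sketch is not part of the VAS implementation.

    # Executable sketch of the Priority Order (PO) assignment of figure 4-12.
    # The Task record and its fields are hypothetical; a task's total execution
    # time is the cost of fully evaluating the sound (10 iterations here).

    from dataclasses import dataclass

    @dataclass
    class Task:
        name: str
        weight: float              # priority weight w_k
        min_time: float            # minimum acceptable execution time m_k
        total_time: float          # execution time of a full evaluation
        execution_time: float = 0.0

    def priority_order(tasks, period):
        # Order tasks in non-increasing weight: w_1 >= w_2 >= ... >= w_K.
        tasks = sorted(tasks, key=lambda t: t.weight, reverse=True)
        remaining = period
        for t in tasks:                     # every task gets its mandatory part
            t.execution_time = t.min_time
            remaining -= t.min_time
        for t in tasks:                     # highest weight first gets the rest
            extra = min(max(0.0, remaining), t.total_time - t.min_time)
            t.execution_time += extra
            remaining -= extra
        return tasks

    sounds = [Task("sound 1", 0.40, 1.0, 10.0),
              Task("sound 2", 0.35, 1.0, 10.0),
              Task("sound 3", 0.25, 1.0, 10.0)]
    for t in priority_order(sounds, period=18.0):
        print(t.name, t.execution_time)

With these hypothetical numbers the sketch assigns roughly 10, 7, and 1 time units to the three sounds, illustrating the large disparity between sounds of similar priority that the text describes.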
In general, any scheduling algorithm that assigns execution time to jobs based on the Greedy method will exhibit problematic behavior when applied to the sound evaluation problem. Consider some ordering of the optional job set O = {o1, o2, …, on} with corresponding processing times per iteration T = {t1, t2, …, tn}. For some integer value 0 ≤ f ≤ n, a Greedy method algorithm will always assign the first f optional jobs their full execution time. Optional job f+1 will receive partial execution time. The remaining jobs will receive the minimum acceptable execution time. The problem with this behavior is that for small values of f, important sounds are heavily degraded. Also, sounds that are close in priority may receive drastically different execution times. Finally, when priority crossover occurs between two sounds on the f boundary, a perceptible discontinuity occurs in the quality of the two sounds. The above problems can be alleviated if a fair strategy is employed in assigning execution time to the optional jobs. A fair strategy allocates each optional job an execution time that is proportional to its weight. This ensures that large disparities do not occur between the times assigned to jobs of similar priority. High priority jobs receive a larger portion of the execution time. Finally, priority crossover does not produce a discontinuity in the quality of the sounds since their assigned times are similar. The degree of fairness exhibited by an algorithm can be expressed as follows:
$$f = \frac{1}{nK}\sum_{j=1}^{n}\sum_{k=1}^{K}\left|\frac{\sigma_{k,j} - m_k}{T_p - \sum_{k=1}^{K} m_k} - w_{k,j}\right|$$
where
n is the number of periods over which f is measured
K is the number of jobs
σk,j is the time assigned to optional job k in period j
mk is the minimum acceptable precision of job k
Tp is the total time available each period
wk,j is the weight of job k in period j

Equation 4-6 A formulation of the average fairness of a schedule

The above expression measures the average fairness of a schedule in terms of the time assigned to each optional job and the priority of that job. We refer to f as the fairness index of a schedule. Because the weights of the jobs sum to one (∑wk = 1), f is minimized when each optional job receives a portion of the available execution time that is proportional to its weight. Hence smaller values of f indicate a schedule that is increasingly fair. The above expression suggests the Priority Allocation algorithm (PA). The PA algorithm assigns each optional job an execution time that is approximately proportional to its priority. The actual time assigned to an optional job may be less than that indicated by its weight because the proportion of assigned time based on a job's weight may exceed the total execution time of the optional job. Variations also occur due to rounding of the assigned time to the nearest multiple of the iteration time of the sound. Any resulting free time in the schedule is assigned based on the LU algorithm. Having satisfied the fairness constraint, the LU algorithm assigns the remaining time in the schedule so that the average error is reduced. Pseudo code for this algorithm follows:
Priority_Allocation()
begin
    T := {t1, t2, …, tK}           /* Task set ordered in non-increasing w/v */
    P := time per period           /* P maintains the available time in this period */
    for i := 1 to K do             /* share approximately proportional to priority, capped at 10 iterations */
        t := T(i)
        t.number_iterations := (t.priority * P) / t.iteration_time
        t.number_iterations := min(t.number_iterations, 10)
        t.execution_time := t.number_iterations * t.iteration_time
        P := P - t.execution_time
    endfor
    for i := 1 to K do             /* remaining time handed out in w/v order (LU rule) */
        t := T(i)
        temp := min(P, t.total_execution_time - t.execution_time)
        P := P - temp
        t.execution_time := t.execution_time + temp
    endfor
end
Figure 4-13 The Priority Allocation algorithm
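An executable Python sketch of the PA assignment follows. The Task record, its field names, and the example weights and period are hypothetical, and the proportional share here is computed against the full period time, which is one reading of the proportional-assignment rule described above; leftover time is handed out in non-increasing w/v order as the text specifies.

    # Executable sketch of the Priority Allocation (PA) assignment of figure 4-13.
    # Fields are hypothetical; iterations are capped at the 10 needed for a full
    # evaluation, and leftover time is handed out in non-increasing w/v order.

    from dataclasses import dataclass

    @dataclass
    class Task:
        name: str
        weight: float              # priority weight w_k
        iteration_time: float      # cost of one evaluation iteration
        min_iters: int = 1         # minimum acceptable number of iterations
        assigned_time: float = 0.0

        @property
        def total_time(self):      # full evaluation = 10 iterations
            return 10 * self.iteration_time

    def priority_allocation(tasks, period):
        remaining = period
        # Pass 1: give each task a share proportional to its weight, rounded to
        # a whole number of iterations and capped at a full evaluation.
        for t in tasks:
            iters = int(round((t.weight * period) / t.iteration_time))
            iters = max(t.min_iters, min(iters, 10))
            t.assigned_time = iters * t.iteration_time
            remaining -= t.assigned_time
        # Pass 2: assign any free time in non-increasing w/v order (LU rule).
        for t in sorted(tasks, key=lambda t: t.weight / t.total_time, reverse=True):
            extra = min(max(0.0, remaining), t.total_time - t.assigned_time)
            t.assigned_time += extra
            remaining -= extra
        return tasks

    sounds = [Task("sound 1", 0.40, 1.0),
              Task("sound 2", 0.35, 1.0),
              Task("sound 3", 0.25, 1.0)]
    for t in priority_allocation(sounds, period=18.0):
        print(t.name, t.assigned_time)

With the same three hypothetical sounds and period used in the PO sketch above, this assignment yields roughly 8, 6, and 4 time units rather than 10, 7, and 1, illustrating the more even spread that the fairness index rewards.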
Theorem 4-3: The PA algorithm minimizes f.

Proof: Based on equation 4-6, f is minimized when each optional job is assigned the execution time
$$\sigma_{k,j} - m_k = w_{k,j}\left(T_p - \sum_{k=1}^{K} m_k\right)$$
The time assigned to each optional job by the PA algorithm varies from this optimal assignment only when it exceeds the optional job's execution time or due to rounding to the nearest multiple of the sound's iteration time. Because these constraints must be observed, no other scheduling algorithm can better approximate the optimal assignment. Hence the theorem follows. ∎
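The fairness index of equation 4-6 is easy to compute for a recorded schedule. The Python sketch below evaluates it for two hypothetical single-period schedules of the three sounds used in the earlier sketches, one greedy-style and one proportional-style; all numbers are assumptions for illustration, not measurements from the VAS scheduler.

    # Sketch of the fairness index f of equation 4-6 for recorded schedules.
    # schedule[j][k] is the total time assigned to job k in period j.
    # The example schedules, weights, and period are hypothetical.

    def fairness_index(schedule, weights, min_times, period_time):
        optional_time = period_time - sum(min_times)
        total = 0.0
        for sigma in schedule:                      # one entry per period
            for s, m, w in zip(sigma, min_times, weights):
                total += abs((s - m) / optional_time - w)
        return total / (len(schedule) * len(weights))

    weights = [0.40, 0.35, 0.25]
    min_times = [1.0, 1.0, 1.0]
    period_time = 18.0

    greedy_like = [[10.0, 7.0, 1.0]]        # one period, greedy-style assignment
    proportional_like = [[8.0, 6.0, 4.0]]   # one period, proportional-style assignment

    print("greedy-like f      :", round(fairness_index(greedy_like, weights, min_times, period_time), 3))
    print("proportional-like f:", round(fairness_index(proportional_like, weights, min_times, period_time), 3))

The proportional-style schedule produces a markedly smaller index, which is the behavior the PA algorithm is designed to achieve.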
Figure 4-14 Avg. Error and f produced by the LU, PO, and PA algorithms for the City and Waves sonic environments.
The average error and average fairness index (f) produced by the LU, PO, and PA algorithms are plotted in fig. 4-14. In comparison to the LU and PO algorithms, PA produces a higher average error. This is expected since the PA algorithm deviates from the optimal assignment in terms of average error. The average fairness index, on the other hand, is considerably reduced by the PA algorithm compared to the LU and PO algorithms. The PA algorithm achieves its intended result of minimizing f, but at the cost of producing a higher average error. Determining the perceptual effects of the average fairness and the average error requires subjective evaluation.

4.3.5 Subjective Evaluation of Algorithms

The minimization of the average fairness index is only significant if there is evidence to support the assertion that it impacts the perceived quality of sound. In order to establish this evidence, a listening test was conducted that compared the perceived quality of the sounds produced by the three algorithms presented above: PO, LU, and PA. Listening tests are used by the audio industry for subjective assessments of audio equipment. Such tests are conducted by presenting listeners with two versions of an audio segment, the object and the reference. Listeners are then asked to rate the object in relation to the reference. The reference audio segment is an unimpaired signal; CD quality audio is generally used. The object audio segment is an impaired version of the reference. The source of the impairment varies based on the equipment being tested. For our purposes we considered methods used for testing low bit-rate audio Codecs. An audio Codec is a device which converts an audio signal from an analog to a digital representation and vice versa. The impairment introduced by such Codecs is due to undersampling of the sound. Because the impairment introduced by the
graceful degradation technique utilized in the VAS scheduler is also due to undersampling, the methods used for evaluating low bit-rate audio Codecs are applicable. We chose a test developed by the Swedish Broadcasting Corporation and used by ISO MPEG/Audio in the establishment of the international MPEG standard for the storage and retrieval of moving pictures on digital media [43]. The method employed is a Triple Stimulus, Hidden Reference, Double Blind test. Subjects are presented with three items A-B-C. Each item consists of an audio segment. Item A is always the reference. Items B and C contain the object and a hidden reference. Because the test utilizes a hidden reference design, the subjects do not know which of B and C is the hidden reference. Subjects rate the amount of impairment detected between items A-B and A-C using a five point, continuous scale with one decimal. One advantage of this method is that it provides a measure of the degree of impairment that the subjects detected in the object. The other advantage is that the hidden reference allows the subject's ability to judge impairments in the object to be taken into account. The rating sheet used, along with the five point scale, is reproduced in Appendix A. In preparing the test, twelve audio sequences were generated, each consisting of the three A-B-C items. The items were generated by recording the result of evaluating the City and Waves sonic environments. In order to generate the reference items, the sonic environments were evaluated using two processors so that degradation did not occur. In generating the object items, the sonic environments were evaluated using only one processor, resulting in degradation. Each of the sonic environments was evaluated using the PO, LU, and PA algorithms. This resulted in six sequences, one for each algorithm and sonic environment combination. An additional six sequences were generated by swapping the order of the object and reference items
so that any order effects would be accounted for. The resulting twelve sequences were then presented to the subjects in a random order. Twenty subjects were presented with the generated sequences and asked to rate each of items B and C in comparison to A. The sounds were presented to the subjects using loudspeakers in a group session, with 10 subjects participating in each session. The subjects were given a practice session so that they were familiar with the procedure. The following instructions were given to the subjects prior to the test:
• Three sounds A, B, and C will be played. Rate the sounds B and C in comparison to A using the 5-point scale supplied.
• The 5-point scale used in this experiment is continuous with one decimal point. Remember to use the whole scale.
• When rating sounds, do not rate the sounds themselves but the relative difference between the sound pairs A-B and A-C.
The data was collected and analyzed using a within-subjects ANOVA design (the data are presented in Appendix B). In the analysis of the ratings, the difference between a subject's rating of the reference item and the object item was used. Because the resultant ratings measure the difference between the rating of the reference and the object, smaller numbers indicate a better rating. This approach allows the variation in the subjects' abilities to be taken into account. Fig. 4-15 contains the results of the ANOVA analysis. The results showed significance for the algorithm used and for the stimulus presented, which was expected. The order of presentation of the reference and object also had significance. Interestingly, ratings were consistently better when the presentation order was reference-object-reference. Finally, the interaction of the
stimulus with the algorithm had significance. This is also expected because the three algorithms behave differently under the heavier load conditions imposed by the Waves sonic environment.

Source                   SS          df    MS          F          p
Total                    691.2614    163
Subjects                 68.38449    19
Order                    15.83121    1     15.83121    15.54491
Algorithm                14.87113    2     7.435565    9.406112
Stimulus                 10.26721    1     10.26721    5.771519
Order x Algorithm        0.839563    2     0.419782    0.094046
Order x Stimulus         0.396907    1     0.396907    0.045348
Stimulus x Algorithm     19.44296    2     9.721482    2.599215
Error Order              19.34993    19    1.018417
Error Alg.               30.03914    38    0.790504
Error Stim.              33.79993    19    1.778944
Error Ord. x Alg.        169.6155    38    4.463566
Error Ord. x Stim.       166.2974    19    8.752493
Error Stim. x Alg.       142.1261    38    3.740161