Algorithms for Aural Representation and Presentation of Quantitative Data to Complement and Enhance Data Visualization

by

Christopher Richard Volpe

A Thesis Submitted to the Graduate Faculty of Rensselaer Polytechnic Institute in Partial Fulfillment of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY

Major Subject: Computer Science

Approved by the Examining Committee:

Ephraim P. Glinert, Thesis Adviser
Joseph E. Flaherty, Member
Kenneth E. Jansen, Member
Mukkai Krishnamoorthy, Member

Rensselaer Polytechnic Institute
Troy, New York

April 2002 (For Graduation May 2002)

© Copyright 2002 by Christopher Richard Volpe All Rights Reserved


Contents

List of Tables .......... vii
List of Figures .......... ix
Acknowledgements .......... xi
Abstract .......... xiii
1 Introduction .......... 1
   1.1 Why Auralize .......... 2
   1.2 Our contributions .......... 3
   1.3 The structure of this document .......... 7
2 Historical Review .......... 11
   2.1 Auditory components of user interfaces .......... 12
   2.2 Auditory representations of virtual worlds .......... 13
   2.3 Using sound for data understanding .......... 16
      2.3.1 Bly’s experiments .......... 16
      2.3.2 Audification of raw data .......... 16
      2.3.3 Auralization of program behavior .......... 19
      2.3.4 Scientific auralization by McCabe and Rangwalla .......... 20
   2.4 Analysis techniques .......... 21
   2.5 Software systems .......... 24
      2.5.1 Kyma .......... 25
      2.5.2 Ssound .......... 26
      2.5.3 The “Listen” Toolkit .......... 26
      2.5.4 The Visualization Toolkit .......... 28
3 Illusions for Aural Representation .......... 33
   3.1 Pitch class illusion .......... 34
   3.2 Pulse Rate Illusion .......... 35
   3.3 Representation of multi-variate data .......... 36
   3.4 Other techniques .......... 37
4 Preliminary Research .......... 39
   4.1 Motivation .......... 39
   4.2 Visualization with stream tubes .......... 40
   4.3 Fooling the ear .......... 41
   4.4 Vorticity auralization .......... 44
      4.4.1 Creating the illusion .......... 44
      4.4.2 Mapping the vorticity .......... 46
   4.5 Initial Results .......... 47
5 Auralization Architecture .......... 51
   5.1 Motivation .......... 51
   5.2 Sound data representation .......... 54
      5.2.1 Waveform data formats .......... 54
      5.2.2 “Signal-Data” and “Extend-Data” .......... 56
   5.3 Sound data pipeline and data flow .......... 56
      5.3.1 Upstream data propagation .......... 57
      5.3.2 Downstream data propagation .......... 59
      5.3.3 Requirements imposed on sources and filters .......... 59
   5.4 Sound rendering process .......... 61
      5.4.1 Background .......... 61


      5.4.2 Multi-threading and concurrency issues .......... 63
      5.4.3 Handling delays with Extend-Data .......... 67
6 Sound Generation Algorithms .......... 75
   6.1 Efficient waveform generation .......... 75
      6.1.1 Periodic function lookup tables .......... 75
      6.1.2 Fixed point arithmetic .......... 76
   6.2 Pitch-class illusion revisited .......... 77
   6.3 Pulse-rate illusion filter .......... 78
      6.3.1 Generalization of pitch-class illusion principles .......... 78
      6.3.2 Application of illusion principles to pulse rate perception .......... 79
      6.3.3 Computing the required amount of input .......... 82
      6.3.4 Breaking input into pulses .......... 84
      6.3.5 Crossing the 2π boundary .......... 84
      6.3.6 Pulse border processing .......... 84
7 Effectiveness Testing .......... 87
   7.1 Testing approach .......... 88
   7.2 Test application and User Interface .......... 90
   7.3 Data sets used for testing .......... 92
   7.4 Test scenarios .......... 95
   7.5 Test definitions .......... 96
   7.6 Defining task performance .......... 96
   7.7 Experimental design .......... 98
   7.8 Data interpretation .......... 100
   7.9 Basic statistics .......... 100
   7.10 Paired T-tests across all data .......... 101
   7.11 One-way ANalysis Of VAriance (ANOVA) .......... 102
   7.12 Subjective questionnaire .......... 103
   7.13 Informal testing of the pulse rate illusion .......... 105
8 Discussion and Conclusions .......... 109
9 Future Work and Open Questions .......... 113
   9.1 Expanding palette of tools .......... 113
      9.1.1 Timbre-modifying filters .......... 113
      9.1.2 Alternate waveform sources .......... 113
      9.1.3 Wave file readers and writers .......... 113
      9.1.4 Format-modifying filters .......... 114
      9.1.5 Content-modifying filters .......... 114
      9.1.6 Miscellaneous filters .......... 114
   9.2 Alternative wave sample representation .......... 115
   9.3 Pluggable waveform architecture .......... 115
   9.4 Relaxation of pipeline rules and restrictions .......... 115
   9.5 Expansion to other application domains .......... 116
   9.6 Porting mapper to alternate platforms .......... 116
References .......... 117
A Software System .......... 125
   A.1 Software architecture .......... 125
   A.2 System requirements .......... 125
   A.3 Testing and data exploration tool .......... 128
   A.4 Software composition .......... 130
   A.5 Obtaining the software .......... 132


B Test Instructions and Data .......... 133
   B.1 Instructions for test subjects .......... 133
   B.2 Raw test data .......... 135
C Source Code .......... 143
   C.1 vtkDualToneSource.h .......... 143
   C.2 vtkDualToneSource.cxx .......... 144
   C.3 vtkPeriodicFunctionLookupTable.h .......... 146
   C.4 vtkPeriodicFunctionLookupTable.cxx .......... 148
   C.5 vtkPitchClassIllusionSource.h .......... 149
   C.6 vtkPitchClassIllusionSource.cxx .......... 151
   C.7 vtkPolyLineRangeSelector.h .......... 157
   C.8 vtkPolyLineRangeSelector.cxx .......... 158
   C.9 vtkPulseRateIllusion.h .......... 160
   C.10 vtkPulseRateIllusion.cxx .......... 162
   C.11 vtkSineLookupTable.h .......... 170
   C.12 vtkSineLookupTable.cxx .......... 171
   C.13 vtkSoundData.h .......... 172
   C.14 vtkSoundData.cxx .......... 181
   C.15 vtkSoundDataSource.h .......... 223
   C.16 vtkSoundDataSource.cxx .......... 224
   C.17 vtkSoundDataToSoundDataFilter.h .......... 227
   C.18 vtkSoundDataToSoundDataFilter.cxx .......... 228
   C.19 vtkSoundMapper.h .......... 230
   C.20 vtkSoundMapper.cxx .......... 232
   C.21 vtkWin32DirectSoundMapper.h .......... 234
   C.22 vtkWin32DirectSoundMapper.cxx .......... 238
   C.23 vtkWin32MMTimer.h .......... 262
   C.24 vtkWin32MMTimer.cxx .......... 264
   C.25 vtkWin32Semaphore.h .......... 266
   C.26 vtkWin32Semaphore.cxx .......... 267


List of Tables

1 Test Scenarios .......... 95
2 Test definitions .......... 97
3 Allowable ranges of answers per test .......... 98
4 Number of errors per modality .......... 101
5 Subjective questionnaire .......... 104
6 Software component break-down by category .......... 130
7 Breakdown of base architecture .......... 130
8 Breakdown of auralization-specific components .......... 131
9 Breakdown of test application components .......... 132
10 Subject #1 Test Results .......... 136
11 Subject #2 Test Results .......... 136
12 Subject #3 Test Results .......... 137
13 Subject #4 Test Results .......... 137
14 Subject #5 Test Results .......... 138
15 Subject #6 Test Results .......... 138
16 Subject #7 Test Results .......... 139
17 Subject #8 Test Results .......... 139
18 Subject #9 Test Results .......... 140
19 Subject #10 Test Results .......... 140
20 Subject #11 Test Results .......... 141
21 Subject #12 Test Results .......... 141


List of Figures

1 Streamtube .......... 41
2 Streamtube obscured .......... 42
3 Pitch class circle .......... 43
4 Amplitude Envelope .......... 45
5 Tone generation pseudocode .......... 48
6 vtkSoundData with multiple voices .......... 54
7 Pipeline data flow .......... 57
8 High-level view of mapper threads .......... 64
9 Complex waveform: sin(x) + sin(1.5x) .......... 69
10 Complex waveform as discrete samples .......... 70
11 Pseudocode for streaming extend-data .......... 72
12 Pulses as function of theta .......... 81
13 Pulse amplitude and duration envelope .......... 82
14 Pulse input computation .......... 83
15 SetDeltaTheta function for Pulse Rate Illusion .......... 85
16 Change in twist direction .......... 89
17 Testing application GUI .......... 91
18 Tube segment used as a “ring cursor” .......... 92
19 “LOx Post” Data Set .......... 93
20 “Bluntfin” Data Set .......... 93
21 “Comb” Data Set .......... 94
22 “Office” Data Set .......... 94
23 Paired-T results .......... 102
24 ANOVA results .......... 103
25 TCL script for pulse rate illusion .......... 106
26 TCL script for combined illusion .......... 108
27 Object diagrams: vtkSoundData & vtkWin32DirectSoundMapper .......... 126
28 Object Diagrams: Auxiliary classes .......... 126
29 Object Diagrams: Sources & Filters .......... 127


Acknowledgements

There are many people without whom I would not have been able to conduct this research. I’d like to thank my wife, Debbie, who received far less of my attention over the past few years than she deserved. My grandparents instilled in me an appreciation and respect for hard work, and my Great-Aunt Josephine served as a constant reminder of the importance that my great-grandfather placed on education. My parents, Richard and Rosemarie Volpe, made me who I am today and supported everything I’ve ever done. In particular, my father brought out my interest in science and technology, and my mother made certain that her fourth-grade slacker didn’t ignore his schoolwork. I would like to thank my wife, Debbie, for helping me prepare for my defense of this thesis in numerous ways.

I must also thank the management team at GE for their support of my work in the form of tuition reimbursement and for paying my expenses for a conference presentation of my work. For that, and for their moral support, I owe a debt of gratitude to Pete Meenan, Kirby Vosburgh, Bijan Dorri, Cathy Forth, and Mark Grabb. I also thank my co-workers, who helped by volunteering to be participants in my experiments, and extend a special thanks to one in particular, Pascale Rondot, who used her expertise in user experiments to assist me in the experimental set-up and the arrangement of tests among test subjects. And once again, I need to thank my wife Debbie for enduring all those nauseating sounds coming from the computer for all those months while I was developing and testing my software.

I would like to thank my doctoral committee, and in particular, my advisor, Ephraim Glinert, for his helpful advice and assistance, and for keeping me on the right track. Finally, I need to thank my wife Debbie once again for helping me every step of the way and keeping me sane.


Abstract

Large quantities of complex data are part of almost every industry and science. For some time now, experts in these domains have relied on computers to present their data to them in a form which is easier to understand. One way this is done is through a process called visualization, which refers to generating graphical images that capture essential characteristics of the data and highlight interesting relationships. Another approach, which has received far less attention, is to use sound as a means of presenting complex information. This approach, called auralization, is the auditory analog of visualization.

This thesis involves novel techniques for auralization of scientific data and focuses on the general concept of an auditory illusion as a means of addressing some of the issues associated with more conventional techniques. In this thesis we propose, and implement, a general framework for incorporating auralization techniques with scientific visualization. We define a platform-independent software architecture that allows for a pipeline of sound processing components, including sources, filters, and device-mappers. We implement the necessary infrastructure that permits new auralization techniques to be developed and tested, including a hardware-specific sound device mapper that facilitates interactive sound presentation synchronous with an animated visualization.

We examine an auditory illusion which produces a sound that seems to ascend or descend endlessly in pitch, and show how to use visualization data to control this sound. We demonstrate the applicability of this illusion for presenting Computational Fluid Dynamics data. We also show that an analysis of this illusion leads to general principles that can be adapted to construct another auditory illusion that may be used for auralization, one that produces pulses that seem to increase or decrease endlessly in rate. And we demonstrate through formal user testing and statistical analysis that an aural data presentation using an auditory illusion can improve performance in locating key data characteristics, a task that demonstrates a certain level of understanding of the data. Our data show that this holds true even when the user expresses a subjective preference and greater confidence in a visual presentation.

Finally, we discuss some open questions and opportunities for future work, and provide information on how to obtain our software.


Chapter 1 Introduction

The continuing advances in computer processing speed and graphics rendering capability have led scientists and engineers to rely on computers more than ever before, not only to generate and analyze large quantities of information, but also to present that information to users in a form that is easy to understand. Visualization is the process of transforming data from a form in which it is difficult to comprehend, either because of its large size or because of its complexities and hidden internal relationships, into a visual form which is easier to understand. This is done by generating graphical images that capture the essential characteristics of the data and highlight interesting relationships. Large quantities of complex data are a part of almost every industry, from manufacturing to medicine and to finance, not to mention the pure sciences such as physics, chemistry, and biology. Over the past ten years or so, a great deal of research has gone into discovering new and better ways of visualizing information from many different fields. From domain-specific algorithms to general purpose toolkits, much work has gone into the tailoring of information for the human visual system.

And not without good reason. The evolutionary process has developed the human visual system into a remarkable tool, responsible in no small part for the survival of our species. The brain’s extraordinary visual pattern recognition ability, which allows us to instantly recognize and distinguish strangers from friends and family members, far surpasses all attempts to accomplish the same thing artificially through algorithms and software. The brain’s ability to fuse two stereoscopic images for depth perception and our ability to perceive color increase the value of our visual system as a tool for acquiring and interpreting large quantities of information simultaneously.

Our sense of hearing, while almost equally important, has received comparatively little attention when it comes to presenting abstract data from a computer system to a human user. Research into the field of auralization, the auditory analog of visualization, has only recently begun to flourish, and it is still dwarfed by the amount of research devoted to visualization.


1.1 Why Auralize

Auralization can be useful for a number of reasons. The most obvious reason is that it can serve as an alternative to visualization for people with visual disabilities. But auralization can be a useful companion to visualization even for people with normal vision. To provide a convincing argument for the usefulness of auditory information display in general, one need only examine a few of the unique strengths of the auditory system relative to the visual system.

• It is capable of gathering input from all directions and all ranges simultaneously, whereas the visual system is limited in both angular extent and depth of field at any one time. Seeing anything outside this range requires turning the head, re-orienting the eyeball, or adjusting the focus of the ocular lens. The auditory system has no such limitations. Sounds can be heard from all directions simultaneously. Near sound sources can be heard in conjunction with sound sources of any distance away, so long as the sound intensity is strong enough.

• It excels at being able to segregate the spectral components of multiple superimposed inputs. One can easily focus one’s attention on a particular instrument playing in an orchestra or listen to a person with whom one is conversing despite the presence of numerous other conversations taking place in the same room. Conversely, if one were to superimpose multiple images on top of each other, it would be difficult to make sense of the contents.

• It has excellent temporal resolution. Whereas visual stimuli lasting less than about 30 ms cannot be perceived, the ear can detect signals whose duration is only a few milliseconds [Kramer92a]. Furthermore, the ears don’t “blink” and are therefore immune to one of the ways in which short-lived information can be missed; unlike the problem of losing information on account of mental distraction, this drawback of the visual system is something which is almost completely beyond the viewer’s control.


In addition to the advantages that the auditory system has over the visual system in certain circumstances, it may also be desirable to use auralization as a means of preventing visual overload. In graphical visualization of complex data sets it is often possible to overload the visual channel, either by providing too much visual information at one time (overloading the viewer’s cognitive resources for visual processing) or by mapping information in such a way that information is lost in the visual form (overloading the display capabilities of the visual device). In either case, an opportunity exists for improving the understanding of data by off-loading the visual channel and conveying some of the information through auditory means instead.

A noteworthy demonstration of the usefulness of an auditory data representation is found in the experimental work done by Brown et al., who compared the effectiveness of a visual versus an aural stimulus as a cue for where to search for a target string of characters presented on a visual display. They determined that in cases where the cue correctly indicated the location of the target string, an auditory cue worked as well as a visual cue in enabling the subject to find the target string. However, when a cue incorrectly indicated the location of the target string, the incorrect visual cue distracted the subject and hindered the ability to locate the target string, whereas incorrect aural cues were less distracting and had less of an adverse impact on performance [Brown89]. This indicates that, under some circumstances, an aural presentation of data can be superior to a visual presentation.

1.2 Our contributions

Although there is a broad range of uses of sound in the human-machine interface, from simple warning indicators on machinery to audio equivalents of graphical user interfaces for the visually impaired, it is the intent of this research to explore the opportunities to represent and convey complex data through the auditory channel in a way that complements and augments visual representations. Exploration of ways to use sound requires devising generic algorithms that can transform various forms of data into effective aural representations. In this context, “generic” implies usefulness to multiple application domains. This involves the synthesis of sound by using the data to control various independent parameters of the sound. Such parameters include obvious characteristics such as pitch, amplitude, duration, and stereo balance, but also some lesser known characteristics such as timbre, and attack and decay rates.
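To make the idea of mapping data onto independent sound parameters concrete, the following is a minimal sketch of how a single normalized data value might drive several of the parameters just listed. The structure and function names are illustrative placeholders invented for this example; they are not the classes developed later in this thesis (those appear in Chapter 5 and Appendix C).

#include <algorithm>

// Illustrative only: a bundle of independently controllable sound parameters.
struct SoundParameters
{
  double frequencyHz;   // perceived as pitch
  double amplitude;     // 0.0 to 1.0, perceived as loudness
  double durationSec;   // length of the tone
  double pan;           // -1.0 (left) to +1.0 (right) stereo balance
  double attackSec;     // time to reach full amplitude
  double decaySec;      // time to fade back to silence
  int    timbreIndex;   // selects among a set of waveform shapes
};

// Map one data value, normalized to [0,1], onto a subset of the parameters.
// Which data variables drive which parameters is an application-level choice.
SoundParameters MapValueToSound(double v)
{
  v = std::min(1.0, std::max(0.0, v));
  SoundParameters p;
  p.frequencyHz = 220.0 * (1.0 + 3.0 * v);  // 220 Hz to 880 Hz (two octaves)
  p.amplitude   = 0.2 + 0.8 * v;
  p.durationSec = 0.25;
  p.pan         = 2.0 * v - 1.0;
  p.attackSec   = 0.01;
  p.decaySec    = 0.05;
  p.timbreIndex = 0;
  return p;
}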


We wanted to manipulate these characteristics in novel ways, in particular to create certain classes of “auditory illusions”, for the purpose of overcoming some of the current obstacles to representing certain kinds of data with sound. We did so in our preliminary investigation, in which we applied our approach to data drawn from the field of computational fluid dynamics. The results from this investigation were encouraging, and gave rise to the additional research that followed.

Our research has made multiple contributions to the field of data auralization. Primarily, it has led to the development of a new conceptual framework for constructing auralization algorithms whose practical use is demonstrated via specific algorithms applied to specific application domains. Pertaining to this, our contributions include:

1. New classes of auralization algorithms have been made available to help scientists and engineers better understand their data.

2. These new techniques have been experimentally tested to determine the degree of their effectiveness in helping to understand data.

Achieving these goals has naturally required the development of a software architecture and infrastructure upon which new auralization techniques could be implemented and tested, and has also required addressing the issues involved with combining graphical and sonic output. By “infrastructure” we mean the support software that provides the core set of functionality that new and existing aural representation algorithms can utilize to shield the auralization researcher from the details of common tasks such as data management and device abstraction. This infrastructure addresses issues such as providing smooth and continuous audio output despite the computational drain imposed by the generation of graphical output, and the problem of synchronizing the generation of graphical frames to the appropriate points in the audio. Hence, this infrastructure itself comprises algorithms that facilitate auralization, beyond those algorithms which transform scientific data into audible waveform data.


By “architecture” we mean the precise interface specification and conventions that a data auralization module must conform to in order to interoperate with the supporting infrastructure. We knew, based on our goals and based on what was available, that we would have to develop this architecture and infrastructure ourselves. So, once our preliminary investigation justified proceeding with this research, we began designing and developing this system. This development work has provided a common framework for implementing new techniques and simplifies the integration of these techniques with visualization mechanisms. As a side benefit to the primary focus of our research, this framework may form the basis of future end-user applications incorporating auralization with visualization. In fact, we will make this software available to the community, and we provide contact information for acquiring the system we developed. (See “Obtaining the software” on page 132.) So another resulting contribution of our work is that researchers in the field of auralization now have an extensible software architectural framework upon which to build additional auralization techniques.

Testing the effectiveness of new ideas is an important part of any proposed paradigm for human-computer interaction. Established techniques, with varying degrees of psychometric rigor, can be employed. On the less formal end of the scale, subjective feedback can be solicited from volunteers on the “helpfulness” of aural components of proposed information displays. This subjective feedback can come from domain experts with a specific background in the field to which the data display pertains, or from people of arbitrary background responding to combined visual/aural displays in a more generic context. Alternatively, more formal approaches can be taken. These would involve objectively measuring the performance of some volunteer on some task aided by visual and/or aural feedback. These tasks can be “high level” tasks such as the emergency room scenario carried out by Fitch and Kramer [Fitch92], in which a highly domain-specific and realistic (at least, in principle) problem was shown to be made easier through the use of an aural display. Or, the tasks can be of a more “low level” form, such as that described by Pollack and Ficks [Pollack54], who measured the information-conveying capability of certain aural displays more directly, rather than in the context of a particular application domain.


As we shall see later, we have adopted a combination of techniques. We have conducted formal quantitative testing and followed it up with a subjective questionnaire pertaining to the formal tests. The goal behind the formal testing was to quantify the ability of the test subject to garner a certain level of understanding of the data via different presentation mechanisms. The formal testing involved volunteer test subjects who had to navigate through a complex set of data using three different presentation modalities: an aural representation only, a visual representation only, and a combination of aural and visual representations of the same information simultaneously. While navigating through the data, they were given the task of locating specific characteristics in the data. Our test results show that the subjects were able to perform the tasks with greater speed and greater accuracy when relying on our aural presentation than when relying on a visual one.

So, we can summarize all these contributions as follows.

• We have designed an architecture that allows new auralization techniques to be easily developed and integrated (a simplified sketch of this source/filter/mapper pipeline style follows this list).

• We have defined data structures and developed the requisite algorithms to support a software infrastructure that enables high-performance interactive aural/visual data presentation, and then implemented the complete code in order to demonstrate that our approach works.

• We have proposed the adoption of auditory illusions as a means of addressing some of the issues and limitations of conventional auralization techniques.

• We have conducted formal testing with volunteer test subjects to quantitatively measure enhanced data understanding through task performance and carried out a statistical analysis of the results to demonstrate that improved performance results from an aural presentation mechanism based on an auditory illusion.

• We have extrapolated the principles that allow this auditory illusion to work, allowing us to predict the existence of a novel auditory illusion in mathematical terms, design an algorithm to produce it, and prove its existence, as well as demonstrate its ability to co-exist with the original illusion in a single simultaneous presentation.
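The following is a rough sketch of the kind of source/filter/mapper pipeline interface referred to in the first contribution above. The class and method names are simplified placeholders invented for illustration; the actual VTK-based classes and their interfaces are described in Chapter 5 and listed in Appendix C.

// Illustrative pipeline roles only; not the actual thesis API.
class SoundData { /* buffered waveform samples plus format information */ };

// A source produces sound data from scientific data or synthesis parameters.
class SoundSource
{
public:
  virtual ~SoundSource() {}
  virtual SoundData* GetOutput() = 0;   // produce or update waveform data
};

// A filter consumes the output of an upstream source and transforms it,
// so filters can be chained to form a processing pipeline.
class SoundFilter : public SoundSource
{
public:
  void SetInput(SoundSource* upstream) { this->Input = upstream; }
protected:
  SoundSource* Input = nullptr;
};

// A mapper sits at the base of the pipeline; it pulls samples through the
// pipeline and hands them to a platform-specific audio device, keeping
// playback synchronized with the animated visualization.
class SoundMapper
{
public:
  virtual ~SoundMapper() {}
  void SetInput(SoundSource* upstream) { this->Input = upstream; }
  virtual void Render() = 0;
protected:
  SoundSource* Input = nullptr;
};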


1.3 The structure of this document

In Chapter 2, “Historical Review”, we present a synopsis of some of the work done by other researchers in the field. We begin with an introduction to the uses of sound as user interface components. We then transition from the 2D realm into the use of sound in 3D virtual worlds. In the remaining subsections, we explore areas of research that are more directly relevant to our focus: using sound for data understanding, techniques for analyzing the effectiveness of auditory displays, and software systems for presenting auditory information. The final subsection describes a visualization software system that embodies some characteristics similar to those we envisage for a software auralization infrastructure.

In Chapter 3, “Illusions for Aural Representation”, we introduce the concept of auditory illusions and provide a motivation for their use as a means for addressing some problems in auralizing certain kinds of data. We discuss a known illusion and hypothesize the existence of an additional illusion based on similar principles. We then discuss possible ways in which their use can be combined.

Chapter 4, “Preliminary Research”, presents the results of preliminary research performed prior to the main body of work discussed here. A discussion of this early work helps justify the subsequent research and places it in the proper context. In this chapter we discuss a particular form of auditory illusion, called a “pitch class illusion”, and its application to aural data representation. We show how certain forms of auditory representation are inadequate for certain kinds of data, and that application of some of the ideas discussed in chapter three can overcome these deficiencies, using data drawn from the field of computational fluid dynamics as an example.

In Chapter 5, “Auralization Architecture”, we discuss in detail the auralization architecture that we have designed, as well as the algorithms and data structures behind the infrastructure that we have developed. We will cover the sound data representation and the data flow within the sound processing pipeline, consisting of sources, filters, and a sound mapper.


We’ll finish with a detailed description of the sound rendering process that occurs within the sound mapper at the base of the pipeline.

In Chapter 6, “Sound Generation Algorithms”, we discuss the algorithms that pertain specifically to sound generation. We’ll begin by briefly discussing an efficient approach to waveform generation using a table lookup for the trigonometric sine function, and a fixed point representation for phase angles. We’ll then recap the algorithm for generating an auditory illusion used during our early work, discussing some details that were not present during our preliminary investigation. We’ll show how an analysis of the principles that make this illusion work can lead us to a derivation for a new type of auditory illusion, called the “pulse rate illusion”, and we’ll lay out the mathematical foundation for generating it.

Chapter 7, “Effectiveness Testing”, mainly covers our methodology for testing the effectiveness of the original pitch-class auralization technique, and presents an analysis of the results. We discuss an interactive combined aural/visual data exploration and testing environment built from the components described in chapters 5 and 6. We describe the variety of data sets we use and what they represent. We discuss in detail the task performance being measured, which consists of navigating through data presented via either the visual or aural modalities, or a combination of the two. We then describe the experimental design and the assignment of specific tests to volunteer test subjects. Following that, we continue with a discussion of the results of the aforementioned testing, and draw some conclusions from the results. We describe the application of statistical tests such as the “Paired T-Test” and “Analysis of Variance” tests to discern differences in task performance based on presentation modality. We then discuss the subjective questionnaire given to the test subjects about their impressions of the different presentation modalities, and point out some interesting discrepancies between subjective impressions and objective empirical results. Finally, we wrap up the chapter with some brief notes regarding the informal testing we did on the capabilities of the pulse rate illusion.

In Chapter 8, “Discussion and Conclusions”, we summarize our accomplishments and outline what we’ve learned.


Chapter 9, “Future Work and Open Questions”, the final chapter in the main body, discusses some future work that could benefit the auralization community and some opportunities for further research in the form of open questions.

Appendix A provides some additional software details, such as the development effort required for the various components, the nominal system requirements for using this software, and some details on the software architecture in the form of C++-language object models that depict the component, or class, hierarchy. We also provide additional information about the testing and data exploration tool we developed. This appendix ends with contact information and instructions on obtaining the software.

Appendix B provides the instructions given to test subjects, and the raw data that resulted from our user testing.

Appendix C contains an archive of all the C++ code developed for this system, in the form of VTK class definitions and implementations.


Chapter 2 Historical Review

The literature related to the uses of auditory data display can roughly be divided into two categories. The first category is the use of sound as part of a user interface, and a lot of the work in this area is being done from a human factors standpoint. The “data” in this case are usually of a discrete form, typically consisting of the occurrence or non-occurrence of events. The other category is the use of sound to aid in the understanding of data. It is this latter category that is properly referred to as “auralization”. This form of auditory display is usually applied to continuous data. It should be noted, however, that such a division is, to a large extent, arbitrary, and that much of the work going on in the field encompasses aspects of each. This division is similar to the division between qualitative and quantitative auditory display proposed by Kramer, who also points out that this distinction is actually a continuum [Kramer92a].

Orthogonal to the above division of usage categories are issues related to the mechanisms by which the different forms of data mentioned above are displayed aurally. One such mechanism that is the subject of a great deal of research is the use of “3D sound” in virtual environments. This work attempts to simulate sound sources localized at different spatial positions by modelling the physics of sound wave propagation and the spectral filtering effects of the human outer ear. This kind of capability can enhance both categories of auditory display outlined above. Not only is this useful for providing realistic immersive environments that incorporate simulated sound producing objects as part of a “3D user interface”, but 3D sound position can also be used as a display dimension for conveying abstract data.

Regardless of one’s purpose for using auditory display, a fair amount of work goes into building a basic infrastructure for sound generation suitable to one’s particular needs. Some people have attempted to address this duplication of effort by building general purpose systems that others can use for their auralization research. Some of the interesting challenges posed in this area include making the sound generation algorithms general enough to be applicable to a multitude of tasks, making the system portable to multiple platforms with different capabilities in hardware, and making the system interactive enough to generate complex auralizations on the fly as a user changes relevant parameters while keeping the audio output synchronized with an animated visualization.


Despite the fact that our focus is specifically on auralization, a great deal can be learned from the work done in designing and implementing auditory interfaces, of both the 2D and 3D variety. The lessons learned can help researchers design more effective auralization techniques. Therefore, a brief survey of the state of the art in auditory interfaces is desirable. Following that, we will discuss prior work specifically in our areas of interest, namely the investigation of innovative ways to map data to sound and approaches for analyzing the effectiveness of these methods. We will then examine software systems to facilitate auralization use and research. Finally, we will discuss a software system for building visualization applications and discuss some of its characteristics which make it a good candidate for the incorporation of auralization capability.

2.1 Auditory components of user interfaces

Gaver coined the term “auditory icons” to refer to everyday sounds that convey information about events in a computer by analogy with real, everyday sound-producing events. He incorporated these auditory icons into an enhanced version of the Macintosh “Finder” application called the “Sonic Finder”. Unlike static visual icons, Gaver’s auditory icons are parameterized by characteristics of sound-producing events. So, for example, the selection of a file in Sonic Finder can create a tapping sound, but the size of the file can influence the amplitude of the sound and the file type can influence the choice of synthetic material properties [Gaver86]. Gaver also describes algorithms for synthesizing various types of real-life sounds [Gaver92]. Although these algorithms are developed for the creation of auditory icons, it is possible to generate continuously varying versions of these for representing quantitative scientific data, and using different object and material types to represent different dimensions of the data.

Blattner’s “earcons” are similar in the use of short discrete tones, but are different in two ways. One is that the sounds do not attempt to mimic the sounds produced by real objects.


The other is that multiple sequences can be combined in a variety of ways via syntactic rules to convey new messages. In this way, earcons are a form of language [Blattner89]. Although Blattner et al. used simple tones, the experimental work done by Brewster et al. [Brewster92] revealed that assigning different musical timbres to different earcons enhanced their effectiveness. This conclusion was later supported, again by Blattner et al., who used earcons to represent structures in two-dimensional maps. Different timbres were used to represent different aspects of the map data. For example, a saxophone timbre was used to represent an administrative building, and a tom-tom drum was used to represent access restrictions. Quantitative information was related by changing various parameters of the sounds. For example, loudness of an earcon was used to indicate the number of corresponding objects within a selected region of a map, while speed and pitch would convey different levels of access restriction.

Brewster also developed a sonically enhanced interface toolkit based on earcons [Brewster96]. The toolkit consists of a set of graphical, sound producing widgets, each with a characteristic chord structure and rhythm. Timbre and stereo balance parameters are assigned on a per-application basis. With this arrangement, all occurrences of, say, piano notes coming from the left channel could be recognized as coming from actions within a word processor, for example, while trumpet notes coming from the right channel would correspond to a spreadsheet. In either case, a particular chord, such as the two notes “C” and “E”, would indicate a user interaction with a particular type of widget, such as a scrollbar, for example. Testing indicates that these sonically enhanced widgets are not only superior in terms of user preference, but also improve the user’s ability to recover from mistaken actions.

2.2 Auditory representations of virtual worlds

In addition to the use of sound in enhancing ordinary two-dimensional graphical user interfaces, sound has also been studied for use in virtual reality environments in which 3D spatial cues are used to aid in the location of virtual sound-producing objects. Being able to determine the location of sound sources is an important capability in real life, and is a capability that can be quite useful in an auditory user interface, whether it be for training simulations or for user interfaces for the visually impaired.


A system designed at the NASA Ames Research Center, the Convolvotron, accomplishes this goal [Wenzel90]. The technique involves taking the sound emitted from a virtual sound source and passing it through a pair of finite impulse response (FIR) filters, one for each ear. These filters model the head-related transfer functions (HRTFs) that govern how sounds are modified from the time they leave the sound source to the time they are picked up by the inner ears. These HRTFs comprise three different causes of sound signal modification.

The first are interaural intensity differences, or IIDs. These differences are due to the fact that the sound intensity received in the inner ear is greater in the ear facing towards the sound source than in that of the ear facing away from the sound source. This effect is not constant across all sound frequencies, but rather is frequency-dependent due to the fact that low frequency waves diffract better around obstacles, and therefore the head more effectively blocks higher frequencies from sources on one side of the head travelling towards the ear on the other side of the head. Thus, the effect provides information on the location of high frequency sound sources.

The second effect on the sounds received in the inner ears is interaural time differences, or ITDs. These differences are due to the fact that one ear is physically closer to the source than the other ear, which results in a phase difference between the signals received by each ear, by an amount that varies from zero, in the case of a sound source directly in front of or directly behind the listener, to a maximum amount given by the distance between the ears, in the case of a sound source directly to the left or right of the listener. This effect, too, is frequency dependent. If the wavelength of the sound is greater than the distance between the ears (i.e. the diameter of the head), then the signals received by the two ears will always be less than one wavelength apart, and the phase difference between the ears will uniquely determine the difference in path length from the source to each ear. If, however, the wavelength of the sound is shorter than the distance between the ears, then a given phase difference between the signals could be produced by more than one path length difference from source to ears, or equivalently, more than one source location. Therefore, ITDs provide information about the position of low-frequency sounds.
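The frequency limit below which the interaural phase is unambiguous follows directly from the geometry just described. The numbers below assume a typical interaural distance of roughly 20 cm and a speed of sound of about 343 m/s; they are illustrative estimates, not values taken from the cited work:

\lambda = \frac{c}{f} > d \quad\Longrightarrow\quad f < \frac{c}{d} \approx \frac{343\ \mathrm{m/s}}{0.2\ \mathrm{m}} \approx 1.7\ \mathrm{kHz}, \qquad \mathrm{ITD}_{\max} \approx \frac{d}{c} \approx 0.6\ \mathrm{ms}.

Above roughly this frequency the phase difference no longer determines the path-length difference uniquely, which is why ITDs are most informative for low-frequency sources while IIDs take over at high frequencies.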


The final effect accounts for the fact that we can detect elevation and front/back information even when there are no interaural differences, i.e. when the source of the sound is somewhere in the plane that is the perpendicular bisector of the line passing through both ears. The reason we are able to discern different source positions in this plane is that the shape of the outer ear, or pinna, filters the incoming sound in a highly direction-dependent fashion.

These HRTFs are typically generated by placing small microphones in a test subject’s ear canals and measuring the signal received from sources placed at various locations around the subject. Results are most accurate when the HRTFs are tailored to each individual, but this is not always practical. Furthermore, accuracy is improved, particularly in discerning front/back relationships, when the sound delivered to each ear incorporates reflections (echoes) from the environment, for example, the walls, ceiling, and floor of a room [Wenzel92b].

In the VR Toolkit produced by IGD [Astheimer93], this is done in one of two ways. Image source algorithms compute the mirror images of virtual sound sources. In a rectangular room, there are six such virtual sources to consider, one for the ceiling, the floor, and each of the four walls. These mirror-image sources, along with the original virtual sound source, can then be treated as separate sound-producing entities as if they each existed outside of the closed-room virtual environment. Ray tracing algorithms trace the propagation of sound waves, or rays, in a scene, and rays that come close to each of a discretized set of positions in a virtual environment contribute to the function that maps sound source locations to received signals at those positions. By emitting sound source waves in a multitude of directions from each source, its effect on all possible listening positions can be determined.

Astheimer’s comparison of this technique to visual ray tracing is, however, somewhat misleading. In visual ray tracing, rays are back-propagated from the destination (i.e. the eye point), through the screen pixel locations, until they intersect visible objects or light sources. In aural ray tracing, rays are emitted from the sources and forward-propagated through the virtual environment. Astheimer correctly points out that these techniques enhance the natural simulation of imaginary worlds. However, he also makes the obvious, yet questionably relevant, observation that these realistic sound effects yield a “broader acceptance and easier comprehension than early examples of technical data sonification” [Astheimer93], suggesting that the former obviates or supersedes the latter.


Since precisely the same comparison can be made between visual flight simulation or ground vehicle simulation versus gas pressure isosurface visualization from an aircraft engine combustor, it is clearly seen that nothing could be further from the truth. Each has its place, despite the fact that the isosurface visualization is useful to a handful of engineers while the flight or ground vehicle simulation benefits thousands of military and commercial pilot trainees.

2.3 Using sound for data understanding

2.3.1 Bly’s experiments

An early experiment into the use of sound to aid in the understanding of data was done by Bly, who investigated the use of sound to represent three different kinds of data: multivariate, logarithmic, and time varying [Bly82]. Her experiments into multivariate data demonstrated the ability of test subjects to classify different objects by the sound produced when independent characteristics of the objects were mapped onto independent dimensions of sound. In this particular case, the classification was of different species of flowers, and the data consisted of measurements of four different characteristics of each sample.

The dimension of sound frequency provided a natural means for representing data with a large dynamic range. The exponential range of earthquake magnitude was “linearized” by mapping it onto frequency, whose relationship to the perception of pitch is logarithmic. This magnitude was also mapped onto loudness and duration of sound.

The tests involving time-varying data used battle simulations to generate data for the sound mapping. These tests showed that the classification abilities of test subjects using aural displays were as good as those using visual displays, and that using both provided a distinct advantage over using either form alone.
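To see why such a mapping “linearizes” an exponential range, note that perceived pitch grows roughly as the logarithm of frequency. The notation below is ours, and the exact mapping used in [Bly82] may differ, but if a quantity x spanning several orders of magnitude is mapped directly onto frequency,

f(x) = a \cdot x \quad\Longrightarrow\quad \mathrm{pitch}(x) \propto \log_2 f(x) = \log_2 a + \log_2 x,

then every factor-of-two change in x is heard as the same one-octave step, so a huge numeric range is compressed into an evenly spaced perceptual range.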


2.3.2 Audification of raw data

A different approach to the auditory display of seismic data was taken by Hayward who, rather than map earthquake magnitude to frequency as was done by Bly, decided to preserve all the recorded seismic data in the auditory display by playing back recorded seismic vibrations directly as sound samples [Hayward92], a process Kramer refers to as audification [Kramer92a]. The technique is applicable not only to seismic data related to earthquakes, but also to seismic activity resulting from other sources, such as explosions and artificially-generated shocks from hammers and gun shots, which are useful for obtaining information about rock structures below the earth’s surface. Seismic data audification is one of the rare circumstances in which directly playing back raw data as audio samples is practical, because the physics and mathematics of sound wave propagation through air is similar to that of seismic wave propagation.

Although there are some difficulties with this method of display, the difficulties are not insurmountable. Indeed, two of the difficulties, namely the unusually large size of datasets (often in the billions of samples) and the subsonic frequencies of the vibrations, solve each other: playing back the seismic wave samples at an accelerated rate, also known as time compression, can drastically reduce the amount of time necessary for the scientist to aurally process all of the information, while at the same time bringing the highest energy components of the waves into the audible frequency range.

Hayward notes several reasons why auditory display of seismic information is useful. Training of analysts and education of seismology students can be enhanced. If the use of an auditory display can help people learn and identify seismic patterns quicker and better, then that benefit will remain even if auditory displays are not used later on in the field. Of course, using auditory displays in practice provides even greater benefits. Being able to tell the difference between an earthquake and an explosion is an important task. This classification problem may be simplified, resulting in a more accurate analysis in less time, with the use of an auditory, or combined auditory and visual, display. Furthermore, this classification problem extends beyond distinguishing seismic phenomena to include the problem of recognizing noise in the data, including noise generated by the recording instruments themselves.

In addition to the problems alluded to earlier, a number of other problems present themselves when audifying raw seismic data, particularly data from earthquakes. Despite the fact that earthquake signals span more than seventeen octaves, well beyond the approximately ten octave range of human hearing, recording devices are generally capable of covering a range of only three octaves. And despite the large quantities of data that are typically acquired, the more interesting sections of recordings of reflected waves tend to last only a couple of seconds.

Many signals have a range of greater than 100 dB over a short period of time, which is akin to trying to hear a whisper shortly after listening to the climax of an orchestra [Ridge94]. Some even have a dynamic range of greater than 140 dB. This surpasses the range between the threshold of hearing and the threshold of pain [Ridge94], causing problems not only for the human listeners, but for conventional recording equipment as well. To put this dynamic range into perspective, it is the difference between the loudness of normal speech and that of a rocket engine. By comparison, compact discs have an effective dynamic range in the vicinity of 95 dB.

In addition to the time compression technique mentioned earlier, Hayward discusses a number of other techniques useful for dealing with the problems associated with audification of seismic data. The problem of excessive dynamic range is handled by a process called automatic gain control (AGC). This process compresses the dynamic range by attenuating periods of high amplitude signals and amplifying periods of low amplitude signals. Although this brings the dynamic range into a comfortable listening level, it also reduces the information content of the signal, as far as the human listener is concerned, because waveforms whose intensities were marginally discernible before the compression will be indistinguishable after the compression.

Although the subsonic nature of the recordings can usually be corrected through time compression, sometimes the size of the datasets is not large enough to allow this. Compression would result in signals of such short duration that no information could be gleaned from them. Furthermore, it may sometimes be necessary to slow down the playback of some portion of the data to study it more closely. In this situation it would be desirable to play the data back more slowly without changing its sound, almost as if a pianist were playing back a piece at a reduced tempo. Unfortunately, simply playing back the samples at a slower rate reduces the frequencies of the waveforms that make up the signal. To combat this, Hayward suggests a technique known as frequency doubling, which, based on certain trigonometric relationships, permits the generation of a signal whose frequency is double that of a given signal. Although this technique works well for simple tones, Hayward points out that it has difficulty with more complicated signals.
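The passage summarized here does not spell out which trigonometric relationship is meant, but one standard identity with this property comes from squaring a sinusoid:

$\sin^2(\omega t) = \tfrac{1}{2}\bigl(1 - \cos(2\omega t)\bigr)$

Squaring a pure tone (and discarding the constant term) therefore yields a tone at twice the original frequency. For a signal containing many components, however, the same operation also produces sum and difference frequencies between every pair of components, which is consistent with the difficulty reported for more complicated signals.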

As an alternative to speed-halving and frequency doubling for listening to short segments more closely, Hayward offers the suggestion of playing short segments repeatedly in a loop. Hayward concludes that these techniques are useful for the analysis of seismological data, but points out that the true value will only be known when experiments are done to compare the actual performance of analysts using graphical analysis tools with and without accompanying audification.

2.3.3 Auralization of program behavior

Whereas the majority of the work described in the last two sections involved data from natural origins, the work by Jameson described here involves data of man-made origin. In this case, the data consists of the flow of execution of computer software. His "Sonnet", an audio-enhanced program debugger, demonstrated that program execution could be followed more easily by mapping events in the execution of a program to sound-producing events [Jameson92]. He claims that a user has a good idea of what the execution should sound like, and thus, deviations from the expected results can indicate the existence, and possibly the location, of bugs. In the Sonnet system, two types of information are displayed aurally to the user. The first is code execution. This maps different forms of programming constructs onto different sounds. For example, a tone can be initiated at the beginning of a loop and continue until the loop terminates normally. Different actions within the loop can influence the tone's parameters. A tone that does not terminate when expected can indicate a problem with the termination condition (i.e. an infinite loop) or an abnormal termination (such as raising an exception and transferring to exception handling code). The other type of information displayed aurally is program data, i.e. variables. Some of the interesting possibilities Jameson suggests are tracking trends in the value of a variable, determining when a variable is read or modified, and aurally conveying the high-level structure of complex data types such as trees.

A somewhat different approach in this same area was taken by Vickers and Alty in the development of the CAITLIN system [Vickers96], a tool designed to help novice programmers with debugging. Rather than using a separate debugger application
to auralize program behavior, the CAITLIN system incorporates the sound generation directly into the program to be auralized by acting as a preprocessor for source code. The system generates musical signatures for different programming constructs via a MIDI-enabled sound card. Tests performed with faculty member subjects demonstrated a reasonably good ability to discern the overall behavior of a program from the generated auralization, without any visual feedback on the nature of the program. Although the system was designed to aid in debugging, the authors do not discuss the system's benefits specifically in this area, but rather only evaluate subjects' abilities to interpret the overall structure of the program.

2.3.4 Scientific auralization by McCabe and Rangwalla

Moving further into the scientific and engineering arenas, McCabe and Rangwalla investigated auditory displays of CFD data and discussed two case studies [McCabe92]. The first study involved the analysis by NASA Ames of a computer model of an artificial heart pump developed by Pennsylvania State University. Audio displays were used to represent the pressure generated by the pump (indicated by a continuous tone with a nominal frequency of 440 Hz, modulated by the pressure value), moments when blood cells encountered dangerous vorticity levels (with a tapping sound for each occurrence), and instances of valves opening and closing (conveyed by the sound of a bass drum). They concluded that when a visualization was accompanied by a corresponding auralization, the understanding of the data was improved. The second study involved the analysis of data from a "rotor-stator interaction" simulation inside a jet turbine. This simulation generated complex pressure patterns that depend on the frequency of turbine blade motion and other physical characteristics of the turbine, and the resulting data were represented aurally. This was a particularly interesting application of auralization because the data were not used as parameters for a series of notes but rather, like Hayward's work with seismic data, were aggregated into a waveform from which sound was directly produced. Since a large quantity of pressure samples can be played back rapidly to produce a short sequence of sound, this allowed the user to hear all of the data,
whereas an analogous visualization would have resulted in a filtering of the data due to visual display resolution and limitations in the eye's ability to see the minute changes in the data that occur from one sample to the next adjacent sample.

2.4 Analysis techniques

Although a great deal of research is being conducted in creating new ways of conveying information with sound, formal effectiveness testing is still an area receiving relatively little attention. However, some of the work done by Fitch and Kramer focused on the analysis of existing techniques and produced interesting results [Fitch92]. In their study, subjects were given the task of monitoring various vital signs of a computer simulation of a medical patient in an operating room and responding to emergency situations to keep the "patient" alive. The goal of the study was to compare the performance between monitoring a standard visual display used in hospitals and monitoring a custom audio display designed by the experimenters. But rather than try to make the two types of displays as equivalent as possible in their information-conveying characteristics, they tried to exploit the unique capabilities offered by each modality to determine which approach offers greater potential. The subjects had to monitor eight different physiological variables, including body temperature, heart rate, blood pressure, and respiratory rate, among others. In the visual display, these variables were represented graphically with a labelled strip chart. In the aural display, the variables controlled key parameters of distinct sounds associated with each variable. For example, heart rate was represented naturally as the rate of synthetic heartbeat-like tones. Respiratory rate was conveyed by the rate at which the loudness of a band-limited noise source changed. Some variables controlled more than one characteristic of a single sound. For example, body temperature was conveyed by the "pitch" (center frequency) of the noise that represented the breathing rate. Although none of these mappings were novel in themselves, their combination in this manner was new. The participants were subjected to random complications consisting of dangerous changes in a single variable or three variables simultaneously. They then had to
identify what variable or variables were changing, determine what problem(s) were causing the change, and take appropriate steps to correct the problem. The relative effectiveness of the two display modalities was evaluated through quantitative analysis of the experimental results. During the experiments, data were taken on the response times of subjects in correcting complications as they arose. Statistical techniques (ANOVA, or analysis of variance) were employed to factor out variance in response times due to testing order and training order, which were determined not to be significant. The conclusion was that there was a significant difference in response times as a function of display modality, and that the response times were shorter (i.e. better) with the pure aural display than they were with either the pure visual or combined visual/aural displays. Furthermore, calculations showed that in addition to the response time improvement, fewer mistakes were made with the aural display than with the visual display. And the superiority of the aural display was most pronounced in the situations where three vital sign changes occurred simultaneously. Fitch and Kramer hypothesize that this difference is due to the auditory system's capability to process and interpret multiple overlapping sonic stimuli in parallel, whereas the visual system has to scan each of the vital sign indicators serially.

It should be noted that the comparison done here was between one auditory display and one visual display. Although it demonstrates that the auditory display was superior in this particular instance, it does not prove that auditory displays are universally superior to visual displays in general. It should also be noted that although the quantitative results are compelling, the qualitative results were somewhat less encouraging. Although subjective statements about the ease of use confirmed the quantitative results, there were widely disparate opinions on the aesthetic qualities of the sound mappings chosen. So, despite the quantitative validation of the techniques and of auditory display in general, little was learned in terms of overall guidelines for auditory displays. Nevertheless, the rigor with which the experimental results were quantitatively analyzed serves as a model for future aural display validation.


A number of other approaches to quantifying auditory display effectiveness have been proposed. Frysinger describes the early work of Pollack and Ficks, who took an information-theory approach to the quantification of auditory display effectiveness [Frysinger90]. They presented subjects with tones whose parameters, such as frequency, intensity, duration, pulse rate (number of interruptions per second), and apparent spatial orientation, varied among a variable number of levels. For each display parameter, the subject had to categorize the parameter value as being equal to one of n levels, where n was either two, three, or five. Presumably, these numbers were chosen so as to demonstrate an excess of a certain integral number of bits of information transmitted by the display. For example, being able to classify the pitch of a tone as belonging to one of two different levels demonstrates that more than zero bits of information are being transmitted via that sound parameter. Similarly, being able to distinguish between three different levels indicates that more than one bit of information is being transmitted, and being able to distinguish between five different levels indicates that more than two bits of information are being transmitted. By choosing, for each configuration, the number of different sound parameters to vary and the number of different levels of values for each parameter, Pollack and Ficks were able to control the total number of bits of information that could potentially be received by the subject in the given configuration. The number of bits actually received for a given configuration was determined by the subject's ability to correctly categorize the different dimensional levels in a configuration. The experimenters came to the conclusion that the number of bits of information transmitted by an auditory display is more effectively increased by increasing the number of display dimensions than it is by subdividing each display dimension into a greater number of levels.
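To make the arithmetic behind these level counts explicit: an errorless choice among $n$ equally likely levels conveys $\log_2 n$ bits, and $\log_2 2 = 1$, $\log_2 3 \approx 1.58$, and $\log_2 5 \approx 2.32$, which is presumably why two, three, and five levels were used to demonstrate transmission of more than zero, one, and two bits, respectively.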

Frysinger also discusses his own approach to the analysis of display effectiveness [Frysinger90]. Rather than measure the potential number of bits of information that a given display conveys, Frysinger decided to evaluate the robustness of a particular display technique in the presence of noise. This allowed one display technique to be compared against another by determining which technique permitted a test subject to extract a given signal from a greater amount of noise. In Frysinger's study, the "signal" was the presence of a correlation between two different data dimensions, and "noise" was the degree to which the correlation constant of the data differed from unity. The subject's task was to listen to two different data displays in succession, and determine which one of them contained a non-zero correlation between two variables of the display. A series of these tests was conducted in which the correlation coefficient of the non-zero correlation display would vary in response to the subject's guesses as to which display contained a non-zero correlation. The tests would become harder (more noise, less correlation) or easier (less noise, more correlation) in response to correct and incorrect responses, respectively, by the test subject. By manipulating the correlation coefficient in the non-zero correlation data set, Frysinger was able to converge upon a correlation value for which the probability of two consecutive correct responses was 50%. Since a 50% probability of obtaining a correct response on any single trial can be attributed to random chance, Frysinger calls this converged-upon signal level the threshold of pattern detection. Using this operational definition, the relative effectiveness of two different data displays can be evaluated by determining which one has a lower threshold of pattern detection.
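The description above does not specify the exact adaptive rule, but a minimal sketch of one standard procedure consistent with it, a two-down/one-up staircase, which settles near the level where the probability of two consecutive correct responses is about 50%, might look like the following; runTrial is a hypothetical stand-in for presenting the two displays and scoring the subject's answer:

    #include <algorithm>

    // Hypothetical trial: present one display with correlation 'corr' and one
    // with none, and return true if the subject identifies the correlated one.
    bool runTrial(double corr);

    void runStaircase(int nTrials)
    {
        double corr = 0.9;          // starting correlation ("signal" strength)
        const double step = 0.05;   // how much to raise or lower the correlation
        int consecutiveCorrect = 0;

        for (int t = 0; t < nTrials; ++t) {
            if (runTrial(corr)) {
                if (++consecutiveCorrect == 2) {      // two in a row: make it harder
                    corr = std::max(0.0, corr - step);
                    consecutiveCorrect = 0;
                }
            } else {                                  // a miss: make it easier
                corr = std::min(1.0, corr + step);
                consecutiveCorrect = 0;
            }
        }
        // Late in the run, the visited correlation values cluster around the
        // threshold of pattern detection in the sense described above.
    }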


As new aural data representation techniques are developed, quantitative measurements of their effectiveness will become necessary. The aforementioned techniques provide several different perspectives on determining the effectiveness of an auditory display, and can serve as a conceptual starting point for developing new testing procedures.

2.5 Software systems

The work discussed in the previous sections has, by and large, been achieved through the development of custom sound-generating applications specific to the task at hand. They have, for the most part, been developed with particular computing platforms in mind, and for sending output to specific sound-generating devices, usually MIDI synthesizers. The constraints imposed by the state of the art in both hardware and software have prevented end-users of visualization techniques from taking advantage of the benefits offered by auralization research without themselves embarking on tedious auralization architecture development.

Recently, the entertainment industry has been driving advances in sound generation hardware. Although more sophisticated sound-producing hardware is quickly becoming more available on conventional mainstream computing platforms, such as the virtually ubiquitous "Wintel" platform, the capabilities of these now-standard components have hardly been explored outside the gaming industry. As serious visualization work is now being done possibly as much on PCs (often with the help of new inexpensive 3D graphics hardware) as it is on high-end Unix-based workstations, it is becoming increasingly important to take advantage of sound capabilities in a platform-independent manner. Some work has been done in the development of portable software toolkits for sound generation, but we are still quite far from being able to develop combined visualization and auralization applications that integrate both modalities seamlessly in a portable fashion. In the next few sections we review some of the software systems developed for auralization purposes and discuss some of the limitations that we hope to address with this research. We also give an overview of one visualization system whose architecture can serve as a model upon which to base an auralization toolkit architecture.

2.5.1 Kyma

Scaletti's Kyma System is a general purpose sound specification system consisting of a hardware component called the Capybara, and a software component called Kyma, which is a language for specifying the flow of sound streams through a pipeline of sound manipulation objects [Scaletti91]. The Capybara is a multiprocessor consisting of nine Motorola digital signal processors (DSPs) which execute software sound synthesis algorithms in parallel and in real time. Scaletti points out that the use of "general purpose" DSPs provides greater flexibility and control over the sound than would be possible if a MIDI synthesizer were used to generate sounds in hardware. However, the Capybara is arguably special-purpose hardware itself (in the sense that few people already have one), and thus the Kyma system lacks a certain degree of portability. Another limitation of the Kyma System is the absence of integration with a visualization system. The sound-enhanced animations produced as a result of the work she discusses are created by a two-step process. First, a sequence of images is generated to comprise the animation and sent to a post production facility.
Then, the data that were used to generate the visualization are used as input to the Kyma System to create a corresponding auralization. The result is also sent to the post production facility, where the video and audio streams are synchronized, combined, and recorded to videotape. This process is adequate for producing demonstration videos depicting the results of visualization research, but is not interactive enough to be useful to the engineers and scientists who use visualization tools in their work.

2.5.2 Ssound

The Ssound system developed by Minghim and Forrest is an application designed to produce a combined visualization and auralization of surface data generated from a regular volumetric grid [Minghim95]. It provides interactive combined visual and aural display, unlike the off-line production process used in the Kyma System. However, it is highly platform and hardware dependent, running on a Macintosh and sending MIDI instructions to a Korg synthesizer. The MIDI-centric design restricts its ability to produce more arbitrary data-to-sound mappings. Furthermore, it is unclear whether the system is adaptable to other purposes besides auralization of surface data.

2.5.3 The "Listen" Toolkit

Of all the auralization systems currently available, the Listen toolkit by Wilson and Lodha appears to be the most portable and general purpose [Wilson96]. Indeed, Listen is not designed around any particular use or application domain, but instead allows users to map arbitrary data onto sound parameters in a variety of ways. Although the authors describe it as a toolkit, it should be pointed out that this term can mean a number of different things. In the context of Listen, it appears to mean "a collection of stand-alone programs" (with one exception). The programs are denoted Listen 1 through Listen 4, except that Listen 4 does not appear to be a stand-alone program but rather a collection of objects and methods that can be used to build an application or to include in another application. It is this latter attribute that we commonly associate with the word "toolkit", and therefore it is Listen 4 that bears the most resemblance to one of the stated goals of our research, which is to produce an architecture and infrastructure to support the development of combined visualization/auralization applications. Thus, it is important to point out
the ways in which our goals differ from those of Listen 4. But first, let us briefly examine the stand-alone programs that make up the first three components of the Listen toolkit.

Listen 1 and Listen 2 are designed to take advantage of the capabilities of an SGI's audio chip. Listen 1 is controlled via command line parameters which allow the user to specify a file data source and various sound parameters. Listen 2 uses a graphical user interface and provides some more complicated mappings. Since these are described as written for the SGI platform, and in particular its audio chip, it is not clear how portable these applications are. Also, since these are stand-alone applications, they can be integrated into a visualization system only in the loosely coupled sense of having the visualization system output data files from which these two applications can later extract the data to be auralized.

Listen 3 is also a stand-alone program, but generates sound through a MIDI device. (Whether it supports a MIDI device in addition to the SGI audio chip, or in place of it, is unclear.) In addition to providing MIDI support, Listen 3 provides a wider variety of mapping possibilities. Whereas Listen 1 and Listen 2 allow only linear mappings of data to sound parameters, Listen 3 provides non-linear transfer functions for data mapping, as well as the ability to define custom transfer functions. This too, being a stand-alone program, can be integrated with visualization systems only in a loosely-coupled manner.

Listen 4 is the most interesting of the four components of Listen. This component is a C++ library that can be directly integrated within a visualization application. When this is done, it is no longer necessary to output visualization data to a file to be read in later by Listen. Rather, the application can send the data directly to the Listen module. However, the Listen 4 software module appears to be overly constrained in the kinds of mappings it provides. The sound parameters controllable by the data stream are pitch, duration, volume, and stereo location. In the case of a MIDI output device, a timbre can be associated with each data stream. It is unclear whether arbitrary waveforms, such as a sawtooth waveform or a square wave, or even
band-limited noise, can be generated, or if continuously variable waveforms can be generated by interpolating sound parameters between data values on a per-sound-sample basis. Furthermore, it is not known whether a sound stream processing pipeline architecture is incorporated that would permit a series of sophisticated manipulations to be performed on a sound stream.

Another apparent limitation of the Listen system arises from not being designed in conjunction with an architecturally-similar visualization system, thereby requiring it to be retrofitted onto existing visualization systems in an ad-hoc way. Consider a visualization pipeline in which data is regenerated on an as-needed basis depending on which objects in the pipeline have had their associated parameters altered since the last time data was generated. Unless the auralization components understood the protocols of this pipeline, the auralization and visualization algorithms in the pipeline would not operate together in an event-driven fashion. If, however, an auralization pipeline architecture existed that was designed around such a visualization architecture, auralization components that were influenced by derived or computed visualization data would automatically regenerate their sound data when, and only when, their associated visualization data was regenerated. The result would be an efficient, tightly-coupled visualization and auralization pipeline that would allow visualization and auralization algorithms to be combined in novel ways and interactively explored and used.

2.5.4 The Visualization Toolkit

The Visualization Toolkit, or vtk, is a freely-distributable portable software library for building visualization applications. It was developed by three researchers at the General Electric Research and Development Center, and is available on CD-ROM with an accompanying textbook [Schroeder96a]. As we shall see from the following brief description of its design goals and design decisions, the philosophy behind vtk makes it an ideal companion for what we desire as an architecture for conducting auralization research that incorporates both auralization and visualization.

Vtk is, as its name suggests, a toolkit. This implies a collection of well-defined pieces that can be easily assembled into sophisticated applications. Ensuring that
the interfaces are simple and well-defined is done via object-oriented design techniques and the use of the object-oriented language C++ as the implementation vehicle. Vtk is also portable. By building abstraction layers on top of windowing system dependencies and graphics libraries, applications can be written to run on new platforms by providing an abstraction layer for that platform, without having to rewrite any application-specific code. Vtk currently runs on Intel-based Windows NT/9X/2000 systems using Microsoft’s windowing API, and on a variety of Unix systems using X windows. It originally supported various underlying graphics APIs, such as OpenGL on the Microsoft operating systems and on SGIs, XGL on Sun machines, and Starbase on HP systems, but the industry has mainly settled on OpenGL across a wide range of hardware vendors. Yet applications need know nothing about the intricacies of these low-level graphics libraries. Vtk defines a number of basic objects for encapsulating essential graphics and visualization functionality. Lights, Cameras, and Actors are the basic elements that comprise a scene. A camera defines the viewing transformations that control how the viewer is oriented with respect to the objects in the scene. Lights illuminate objects in the scene based on their positions and other characteristics. Actors represent the actual objects in the scene. An actor has a property, which controls visual characteristics such as color and material properties, a transform, which controls the position and orientation of an actor in a scene, and a mapper, which contains the geometric representation of the object. A renderer coordinates the activities of the actors, cameras, and lights, and displays the final image in a render window. Some of these objects have device-specific “peer” objects that are created automatically behind the scenes to interface to the platform-specific capabilities. The mapper objects obtain their geometric data from an associated visualization pipeline. Mapper objects are termination objects, or “sinks”, in a visualization pipeline. The other components of a visualization pipeline are filters and sources. Sources either generate specific geometry in an algorithmic way (such as generating a sequence of polygons that comprise the surface of a sphere based on some
parameters such as resolution or number of polygons), or provide geometry by reading geometric data from files of a number of different standardized data file formats. Filters are the most interesting types of objects in the pipeline because they can be extended and combined in limitless ways. One filter, for example, can take volumetric data as its input and generate a polygonal surface consisting of all the points having a constant value of some scalar data attribute defined over the original volumetric data. Such a surface is called an isosurface. Another filter can be defined to remove, or clip, a subset of polygons passed into it on the basis of some spatial characteristic. By creatively combining these filters we can, for example, generate skin and bone surfaces from CT (computed tomography) or MRI (magnetic resonance image) data and cut away a portion of the skin to reveal the underlying bone. In vtk, pipeline components are connected to form a complete pipeline by invoking methods that assign the input to one component (say a filter or a mapper) to be the output of another component (say a filter or a source). The strong type checking of C++ ensures that only compatible objects can be connected in this way. For example, a filter that generates a streamline from volumetric velocity vector data and a seed point would not know what to do if given a polygonal isosurface, and the type checking rules of the language prevent this.
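The connection pattern just described can be made concrete with an abridged C++ sketch in the style of the vtk API of that era; the file name and isovalue below are placeholders, and error handling and object cleanup (Delete calls) are omitted for brevity.

    #include "vtkStructuredPointsReader.h"
    #include "vtkContourFilter.h"
    #include "vtkPolyDataMapper.h"
    #include "vtkActor.h"
    #include "vtkRenderer.h"
    #include "vtkRenderWindow.h"

    int main()
    {
        // Source: read a volumetric dataset from a file.
        vtkStructuredPointsReader *reader = vtkStructuredPointsReader::New();
        reader->SetFileName("combustor.vtk");            // placeholder file name

        // Filter: extract an isosurface at a chosen scalar value.
        vtkContourFilter *iso = vtkContourFilter::New();
        iso->SetInput(reader->GetOutput());
        iso->SetValue(0, 1200.0);                        // placeholder isovalue

        // Sink: a mapper terminates the pipeline and feeds an actor.
        vtkPolyDataMapper *mapper = vtkPolyDataMapper::New();
        mapper->SetInput(iso->GetOutput());

        vtkActor *actor = vtkActor::New();
        actor->SetMapper(mapper);

        vtkRenderer *ren = vtkRenderer::New();
        ren->AddActor(actor);
        vtkRenderWindow *win = vtkRenderWindow::New();
        win->AddRenderer(ren);
        win->Render();            // demand-driven execution of the pipeline starts here
        return 0;
    }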


Once a pipeline is set up in vtk, it can be executed. Execution consists of a sequence of operations in which each component in the pipeline retrieves data from the previous component, operates on it in some manner, and passes it on to the next component of the pipeline. Execution occurs when the last component of a pipeline, a mapper, is asked to map its data. (This usually occurs when the renderer is asked to render an image.) However, pipeline components do not execute unconditionally, for it would be inefficient for a component to perform the same operation on the same data as it did the last time the network executed. Therefore, each pipeline component keeps track of a modification time and an execution time. The modification time is updated whenever a parameter affecting that component's behavior is modified, such as if the desired isovalue for a computed isosurface is changed by the user. The execution time is updated whenever the pipeline component actually generates new data. When a pipeline component receives a request for data from one of the components downstream (a pipeline component can provide the input for more than one other component), the first component examines its modification time and the modification time of its source upstream, and compares them to its own execution time. If either is more recent than the execution time, the component regenerates new data. Otherwise, it returns to its requestor a reference to existing previously-generated data. This mechanism for maintaining efficiency of computation is referred to as implicit execution.

Whereas implicit execution can achieve computational efficiency, memory efficiency is achieved through the use of reference counting. This allows pipeline components to share data sets, or parts of datasets. For example, one filter object may take data from an upstream component consisting of vector and scalar data at a particular set of points. The filter's function may be to replace the incoming scalar field with a new scalar field consisting of the gradient of the vector field, and pass the vector field on through unchanged. Using reference counting makes it unnecessary for the filter to duplicate the unmodified vector data. Rather, all that it needs to do is register its use of the data; the filter then shares ownership of it, almost as if it had allocated its own copy.

The ability to connect multiple filter inputs to a single filter output is a powerful feature, since it relieves the application designer of the necessity to create multiple duplicate filter pipelines to generate the same data input for two different requesting components. This capability is also very useful for the integration of aural pipeline components into a combined auralization/visualization pipeline architecture. Consider, for example, a stream line filter controlled by a seed point parameter. This filter's generated data can be used not only to render a graphical image through a mapper, but can also be re-used as input to an auralization algorithm to generate an aural representation of some characteristic data along the streamline, such as fluid vorticity. The work described in Chapter 4 was done without the benefit of such a combined architecture, but rather in an ad-hoc fashion consisting of printing out the contents of internal visualization data structures, cutting and pasting to data files, and submitting those data files to a custom sound mapping tool. This was far less convenient and far more time consuming than it should have been, especially while exploring different seed points to find stream lines with interesting aural representations.
Furthermore, the sound had to be generated without the benefit of a concurrent complementary visualization. In the next chapter, in addition to discussing our main focus of auralization techniques, we also briefly discuss our plan for addressing some of these issues in the form of an auralization toolkit tightly coupled to a visualization infrastructure.


Chapter 3 Illusions for Aural Representation

In this chapter we explain the concept of an auditory illusion and discuss how it can be used to address some problems associated with conventional auralization techniques. The vast majority of the algorithms used for aural representation of data today consist of a mapping of scalar values onto a single sound parameter. Multivariate data representation is done by performing several such mappings simultaneously. One major problem with a straightforward mapping of scalar values onto sound parameters is that such parameters have a very limited useful range. For pitch, the human ear cannot hear anything outside a certain range, nominally 20 Hz to 20 kHz. For duration, there are both physical and practical limitations: sounds of too short a duration are difficult to distinguish (a physical limitation), whereas sounds of too long a duration can be annoying and inconvenient due to the time required for the presentation (a practical limitation).

One might be tempted to map the entire data range onto the usable region of the sonic parameter space being used. Although this would, by definition, prevent data values from being mapped onto out-of-range sound parameters (such as 0.5 Hz pitch, or 60 kHz pitch, or a microsecond of duration), it has the adverse effect of compressing potentially subtle-yet-crucial data fluctuations onto a parameter space range in which the differences are too small to be perceived. When this occurs, we have a range problem with respect to the parameter being used for auditory presentation, whether it be, for example, pitch or duration. This "range problem" requires new approaches for aural representation and mapping of data. We need to be able to create sounds that change in an identifiable way for small changes in data without making the sound impractical or impossible to listen to for large changes in data. How can we do this? We believe that the required algorithms can be organized within a conceptual framework based on auditory illusions. The goal in using auditory illusions is to trick the listener into believing that he or she is hearing continuous monotonic changes in a parameter of a particular sound when in fact the sonic parameters are recurring.
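To make the naive approach referred to above concrete: a straightforward linear mapping of a data value $x$ in $[x_{\min}, x_{\max}]$ onto, say, pitch is

$f(x) = f_{\min} + \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}\,(f_{\max} - f_{\min})$

and when the data range is very large, fluctuations that matter to the analyst are compressed into frequency differences too small to be heard.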


3.1 Pitch class illusion

One mechanism that we explored early in our research, with tangible results, is the use of pitch class to represent scalar values that are either continually changing without bound or that follow a modular arithmetic rule (i.e. values that form an equivalence class under the remainder after division by a value called the modulus). Pitch can be thought of as having two independent dimensions, pitch height and pitch class. The former refers to the octave in which a tone occurs, and the latter refers to the position within an octave, usually denoted by the letters A through G of the musical scale, where G is followed by A of the next octave [Deutsch92]. By combining tones of various pitch heights and a single pitch class, one can create a composite tone that appears to vary only in pitch class, and can therefore appear to ascend or descend endlessly in pitch [Shepard64].

The technique was demonstrated with scalar data that obeys modular arithmetic at the IEEE Visualization '97 conference [Volpe97]. In this demonstration, Computational Fluid Dynamics (CFD) data were used to generate twisting streamlines, where the twisting represented vorticity in the fluid flow. To visualize the fluid vorticity, one can use a thick twisting stream tube. But to reduce visual clutter, we drew thin streamlines and represented the vorticity by mapping the rotational orientation of the stream line to pitch class. The result was the ability to hear clockwise and counter-clockwise rotations of the stream line as endless increases or decreases in pitch class. This technique is discussed in greater detail in Chapter 4.

This use of a perceptual illusion in the auditory system suggests the possibility that other illusions might be developed and harnessed for data representation. Of course, the creation of additional types of illusions would be aided by a theoretical framework that would suggest how, in general, such illusions arise. To determine this framework, we observe the characteristics of the pitch-class illusion and generalize them so that they can be applied to other sound parameters besides pitch. In the pitch class illusion, the frequency of individual component tones increases while the emphasis shifts to the lower frequency components by increasing their volume. In a way, the listener is presented with contradictory information about the nature of the change of the parameter of interest (in this case, pitch). It is reasonable to speculate that this contradictory information contributes in a large part
to the illusion of continuous pitch increase. It is then also reasonable to suspect that the idea of presenting contradictory information in other ways could be used to create different kinds of auditory illusions. We discuss one such way below.

3.2 Pulse Rate Illusion

A sequence of "pulses" or "ticks" can be used to indicate the rate of occurrence of some events of interest, as in, for example, a Geiger counter. But as the inter-pulse spacing diminishes, the pulses lose their individual nature and they blend into a continuous tone. Retaining the individuality of the pulses may be of interest for preserving the discrete character of the source data. Thus, we seek a way to provide the illusion of speeding up the pulse rate indefinitely without actually doing so. By applying the considerations of the previous section, we see that we must provide contradictory information to the listener about how the pulse separation is changing. We can do this by, for example, decreasing the spacing of consecutive pulses but emphasizing the larger spacing between every nth pulse, for some value of n. The latter can be done by increasing the relative volume of every nth pulse while decreasing the relative volume of the other n-1 pulses. To give a concrete example, every fourth pulse can remain at its full amplitude, while the three other pulses in each group of four can fade out. The user will perceive less and less of each of the fading pulses until eventually he or she hears only every fourth pulse. If the rate at which the pulses are delivered increases by a factor of four during this process, the sequence being presented to the user at the end will be identical to the sequence presented at the beginning, although the listener should perceive nothing but a continuous increase in pulse rate. The process can repeat by increasing the rate at which this subset of pulses is presented, then gradually fading out three fourths of those pulses, and so on. Naturally, this process can be carried out in reverse to achieve the opposite effect. We discuss this in greater detail in section 6.3.
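One way to realize a single cycle of this scheme is sketched below; the particular rate sweep and linear fade are our own illustrative choices, not requirements of the technique.

    #include <cmath>
    #include <vector>

    // One cycle of the pulse-rate illusion: the instantaneous rate sweeps from
    // r0 to 4*r0 over T seconds while three out of every four pulses fade out.
    std::vector<float> pulseRateCycle(double r0, double T, int sampleRate)
    {
        std::vector<float> out(static_cast<size_t>(T * sampleRate), 0.0f);
        double phase = 0.0;          // accumulated number of pulses so far
        long pulseIndex = 0;
        for (size_t i = 0; i < out.size(); ++i) {
            double t = static_cast<double>(i) / sampleRate;
            double u = t / T;                         // progress through the cycle, 0..1
            double rate = r0 * std::pow(4.0, u);      // instantaneous pulses per second
            phase += rate / sampleRate;
            if (phase >= pulseIndex + 1) {            // time to emit the next pulse
                ++pulseIndex;
                double amp = (pulseIndex % 4 == 0) ? 1.0 : (1.0 - u);  // off-beats fade
                out[i] = static_cast<float>(amp);     // one-sample click; a short burst also works
            }
        }
        return out;
    }

In practice the cycle length T and base rate r0 would be chosen so that a cycle contains a whole number of groups of four pulses, allowing consecutive cycles to be spliced without a discontinuity.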

Each of these techniques has numerous potentially useful applications. The applications themselves are not within the scope of this research, but for motivational purposes we present two of them from the growing field of virtual environments:

• Human-assisted path planning for part manipulation. The complexity of the problems of automated planning of collision-free robotic motion and assembly
part removal prompts research into ways to allow a human to assist the automatic process. This requires feedback to the user. Although a visual representation of the virtual environment would accurately depict the direct spatial relationship (e.g. Euclidean distance) between a part or robot and the desired location, it might not accurately convey actual progress towards the goal state. Thus, aural feedback of goal progress could be an ideal companion for a visual environmental context.



• Surgical training. A future surgical training tool could aurally present feedback to the trainee on the correct position and orientation of surgical instruments. While the trainee is observing a visual representation of a virtual environment containing tools and anatomical structures, he or she can learn to associate the aurally-presented correctness or incorrectness feedback with the visual scene.

3.3 Representation of multi-variate data

One aspect of these auralization techniques that deserves investigation is the ability to utilize more than one of them at the same time. This could be useful for representing multivariate data. Researchers and data analysts seldom deal with a single type of data at a time, and the ability to perceive simultaneously the changing relationships between two or more independent variables of interest provides enormous opportunities for increased understanding. There are several different possibilities for doing this. One way would be to superimpose two auditory illusions of the same type, such as a pitch class illusion, with each utilizing different bands of the audible frequency spectrum. This may be difficult with the pitch class illusion due to the fact that each instance contains multiple tones with a constant frequency ratio between them, which tends to consume a large portion of the spectrum. The bandwidth requirements of each instance could possibly be reduced by using a frequency ratio between component tones that is smaller than two, but this sacrifices a certain amount of purity of the composite tone because the components are no longer separated by an octave. Another possibility would be to assign different timbres to the different instances, but different
timbres are created via additional harmonics, which therefore increases the bandwidth requirements of each instance of the illusion.

Another possible way to represent multivariate data would be to superimpose two illusions of different types. For example, the illusions of endlessly rising and falling pitch can be presented at the same time as a pulse rate illusion, using a pulse of constant pitch, which therefore has very modest bandwidth requirements so as not to conflict with the high bandwidth requirements of the pitch class illusion. This could, however, become a distraction due to information overload. A more promising alternative would be to combine aspects of two different illusions into a single presentation. Consider a presentation of a pulse rate illusion where each pulse consists not of a single constant pitch, but rather of a composite tone whose pitch class is controlled by another variable. This would keep the bandwidth requirements minimal while not forcing the listener to focus his or her attention onto only one of several audio signals. We discuss this alternative further in section 7.13. Initially, we believed that this would require that the pitch class illusion not consist of a single stream of continuously changing pitch class, but rather that the pitch class change discontinuously at the pulse boundaries. However, as a consequence of the auralization data pipeline we discuss in section 5.3, this turns out not to be the case.

3.4 Other techniques

In our early investigation, we speculated about some other potential techniques that could be used. There were thoughts about using stereo sound and the same principles of the preceding illusions to create an illusion of perpetual side-to-side motion. Further reflection revealed that using stereo balance to place a single frequency waveform at multiple left/right positions would cause waveform superposition that would result in a waveform at a single position in the left/right continuum. To prevent this would require that the different component waveforms have varying frequencies, and we would then simply have the pitch class illusion spread out between the left and right stereo speakers.


We also speculated about using another technique, called asymptotic mappings, as an alternative to auditory illusions for handling the data range problem in auralization. The central idea is that a potentially infinite or semi-infinite range of raw data could be mapped onto a finite range of sonic parameters in such a way that the sonic parameters asymptotically approach a predefined limit as the raw data extends linearly towards greater and greater (possibly unbounded) values. Although we do not see any specific problems with this idea that prevent it from being pursued, we decided to restrict our focus to the more interesting tasks of demonstrating and testing auditory illusions so that we would have sufficient time to treat them thoroughly.
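One possible form for such a mapping, offered purely as an illustration rather than anything prescribed above, squashes a semi-infinite data range $x \ge 0$ into a finite pitch range:

$f(x) = f_{\min} + (f_{\max} - f_{\min})\bigl(1 - e^{-x/s}\bigr)$

where the scale constant $s$ controls how quickly the parameter approaches, without ever reaching, its limit $f_{\max}$.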


Chapter 4 Preliminary Research

When we introduced the concept of the pitch class illusion in section 3.1, we mentioned that we had explored this mechanism with tangible results. In this chapter we describe the preliminary research that brought about those tangible results. We begin by describing some issues that motivate the use of an auditory illusion, as opposed to simpler aural techniques, with certain types of data. Then we provide some background information on a technique for visualizing fluid flow containing vortices, and point out some limitations that make it suitable for aural augmentation. We follow that with a more detailed explanation of the pitch class illusion, and show how to correlate this illusion with the vorticity information we wish to present aurally. Finally, we provide a discussion of our initial results.

4.1 Motivation

Linear mapping of scalar data values onto fundamental sound parameters is a common technique for auralization. Although this works well for some types of information, it works less well for others. For example, if the quantity in question represents the orientation of an object in a plane, mapping a rotational offset from a reference orientation onto a sound parameter such as pitch becomes problematic. As the object is rotated clockwise or counter-clockwise, the pitch should increase or decrease. But a problem arises when the object continues to rotate past a full 360 degrees. Either the pitch must continue in the same direction, or it must return to the starting point. The former case has the drawbacks that the pitch will eventually exceed the listener's hearing capabilities, and that different sounds are used to represent the same physical object orientation. The latter case has the drawback that a discontinuity occurs in the auditory display, with a small change in orientation resulting in a large change in pitch in the opposite direction. This chapter presents a means by which this kind of information can be mapped onto sound in such a way that it does not suffer from the drawbacks previously mentioned.
The technique can be used to auralize the vorticity of a streamline through a vector field.

4.2 Visualization with stream tubes

Schroeder et al. [Schroeder91b] describe a graphical technique for visualization of vector fields known as a stream polygon. It builds upon a streamline, a curve in a three-dimensional volume which depicts the path that a massless particle would take in the vector field. The stream polygon is a regular n-sided polygon that is swept along a streamline. The result is either a discrete series of polygons, or a continuous volume known as a streamtube. In either case, rotation of the streamtube or set of polygons is used to represent the stream vorticity, which measures, loosely speaking, the local angular velocity of the fluid flow at each point. Figure 1 depicts such a streamtube being used to visualize the vector field of burning fuel velocity in an aircraft engine combustor (whose wireframe outline is only partly visible due to the viewing location). The streamtube was generated using the VISAGE scientific visualization system [Schroeder92]. Visible in the image are places where the streamtube bends due to changes in the local velocity vector and places where it rotates due to fluid vorticity. In Figure 2 the same streamtube is viewed from another vantage point, in which the portion containing the high degree of vorticity is obscured. This demonstrates one drawback of the technique: important aspects of the data can easily be overlooked if the image is rendered from an unfortunate vantage point, either because the object partially obscures itself, or because the object obscures, or is obscured by, other objects in the scene. This is an example of the second form of visual overloading described earlier: a 2D display device invariably requires some loss of information when that information is inherently 3D in nature. This effect can be partly compensated for by reducing the thickness of the streamtube, but this has the disadvantage of making the vorticity harder to see due to resolution limitations. Another way around this problem is to represent vorticity as color changes along a thin streamline. Although it is not apparent in the black and white images, the VISAGE system uses color along the streamtube to represent temperature within the combustor.


Figure 1. Streamtube in engine combustor showing vorticity.

It would be possible to encode temperature as the intensity of a particular color component, such as blue, thereby allowing the red and green components to encode vorticity magnitude and direction. But then it becomes difficult to visually segregate the individual components from the composite color, resulting in the first form of visual overloading described earlier.

4.3 Fooling the ear

Deutsch describes a perceptual anomaly in the human auditory system that involves an ambiguity that occurs when a listener is presented with a certain combination of sounds of varying pitch [Deutsch92]. The "pitch" of a sound can be thought of as consisting of values along two orthogonal dimensions, known as pitch class and pitch height. Pitch height is an indicator of which octave a tone occurs in, whereas pitch class refers to the tone's position within an octave.


Figure 2. Streamtube with high vorticity component partially obscured.

Musicians denote different pitch classes with letters from A through G, interspersed with sharps (denoted by the pound sign, ‘#’) and flats (denoted by a symbol that somewhat resembles a lower case ‘b’). But the pitch class dimension is actually a continuum of values, not merely the discrete set of twelve semitones commonly used in musical compositions. This continuum of pitch classes is topologically a circle, with the notes A through G located at specific positions along this circle, and with G followed by A. This can be seen in Figure 3, which shows the pitch class circle containing the standard set of twelve semitones commonly used in western music. (Each of the “sharp” notes is also known as a corresponding flat note. For example, “C sharp” is the same note as “D flat”. This, however, is not indicated in the figure.)


Figure 3. Pitch class circle showing the twelve standard semitones used in western music (C, C#, D, D#, E, F, F#, G, G#, A, A#, B), with solfège names given for the non-sharp notes: C (do), D (re), E (mi), F (fa), G (sol), A (la), B (ti).
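For reference, if a point on the circle is described by a pitch class value $c \in [0, 12)$ measured in semitones, then under the usual equal-tempered tuning a tone with pitch class $c$ and pitch height (octave) $k$ has frequency $f = f_{\mathrm{ref}} \cdot 2^{\,k + c/12}$, where $f_{\mathrm{ref}}$ is the frequency assigned to pitch class zero in the reference octave. Traversing the circle once multiplies the frequency by two, which is precisely the octave ambiguity that the tones discussed below exploit.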

By combining sinusoidal waves of equivalent pitch class but different pitch height, one can construct tones that appear to have a definite pitch class but whose pitch height is ambiguous. Shepard first demonstrated this with a series of such tones, generated by a computer, which appears to ascend endlessly [Shepard64]. Deutsch also describes a paradox, called the “tritone paradox”, in which two tones on opposite sides of the pitch class circle (i.e. tones separated by six semitones, or a “tritone”) have completely ambiguous relative pitch. That is, some people judge the first to be higher than the second, while others judge the second to be higher than the first.


4.4 Vorticity auralization

Since the pitch class dimension consists of values along a circular continuum, it provides us with a natural mapping of streamtube rotation onto sound. Different rotational offsets of a streamtube from a starting orientation can be represented by tones of different pitch class. With this information presented aurally, large self-obscuring streamtubes can be replaced with simple streamlines, permitting simultaneous viewing of other objects in the scene, or even of multiple streamlines. Before discussing the specific requirements of vorticity mapping, let's first look at how one can create a tone that appears to ascend or descend endlessly.

4.4.1 Creating the illusion

Although Shepard gives complete equations for generating the set of complex tones, we did not have Shepard's work available to us at the time we prototyped the technique, and therefore our equations are not quite the same. One subtle difference is that his equations are expressed in a form convenient for producing a discrete sequence of complex tones, whereas we sought to produce a continuously varying tone with which to map continuously varying rotational offsets. Another distinction is in the choice of mapping from frequency to amplitude. An overview of the reasoning by which our equations were derived may be worthwhile.

It is not difficult to imagine what the requirements must be of a composite tone that appears to ascend endlessly. We already know that it should move forward in the pitch class circle but contain ambiguous pitch height. A tone with a definite pitch class but ambiguous pitch height would need to be composed of a summation of component tones, each a harmonic of a fundamental frequency identifying the pitch class. In practice, however, it is not necessary for the tone to have a specific pitch class. Rather, all that is necessary is that incremental changes in all of the components be perceived as getting higher or lower, and that the component tones produce a pleasant sounding (or at least non-irritating) combination. In light of this, it is not necessary that the frequency ratio between successive components be equal to two (i.e. one octave). Ratios of 3/2, 4/3, and 5/4 yield agreeable results. These are roughly equal to the frequency ratios of the intervals C-G, C-F, and C-E, respectively.


In order to produce an incremental increase in the apparent pitch, the frequency of each component tone would have to increase by some constant factor. Also, after several incremental increases in the frequency of each component, we want to be able to return to the sound with which we started without causing any discontinuities. This requirement implies that the lowest frequency component must start off with zero amplitude and that the highest frequency component must decay to zero amplitude as the frequency of each component approaches the original frequency of the next higher frequency component. Additionally, we want the component tones in the middle of the frequency range to have the highest amplitude. One way to achieve these goals is to map the entire range of frequencies, on a logarithmic scale, to the range 0 to π and let the sine of the result be the amplitude of the component tone. This relationship of amplitude as a function of frequency is shown in Figure 4, in which the components of a particular tone (in this case, there are four of them) and of an incrementally higher pitched tone are shown as vertical lines positioned at the frequency of each component, with a height equal to the amplitude of that component.

Figure 4. Amplitude envelope: amplitude as a function of the logarithm of frequency for two tones, each with four components (the initial tone components and the components giving an incrementally higher pitch).
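A minimal sketch of the envelope computation just described is given below; the function and variable names are ours, and the frequency ratio, base frequency, and number of components are left as the free parameters that the next subsection calls R, BF0, and n.

    #include <cmath>
    #include <vector>

    struct Component { double freq; double amp; };

    // Components of an endlessly rising tone. 'offset' in [0,1) says how far each
    // component has slid toward the starting frequency of the next component;
    // wrapping offset from 1 back to 0 reproduces the original configuration.
    std::vector<Component> risingToneComponents(double baseFreq, double ratio,
                                                int n, double offset)
    {
        const double pi = 3.14159265358979;
        std::vector<Component> out;
        for (int i = 0; i < n; ++i) {
            double pos = (i + offset) / n;                 // position on the log-frequency scale, 0..1
            Component c;
            c.freq = baseFreq * std::pow(ratio, i + offset);
            c.amp  = std::sin(pi * pos);                   // zero at both extremes, peak in the middle
            out.push_back(c);
        }
        return out;
    }

Sweeping offset from 0 toward 1 raises every component while the sine envelope transfers energy from the highest components to the lowest, and at offset = 1 the set of (frequency, amplitude) pairs is identical to the set at offset = 0, so the sweep can repeat indefinitely.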


4.4.2 Mapping the vorticity

As can be seen in Figure 4, as the vertical lines move to the right, they arrive at a configuration equal to that at which they started. Since the same holds true for an object as it is rotated through 2π radians, it is natural to map this range of rotation onto the range of motion of the vertical lines representing the sound components. Mapping streamline vorticity onto sound then becomes straightforward: for a given streamline or streamtube orientation θ (i.e. rotational offset from a starting orientation), determine the fraction of the "distance" (in the frequency domain) between successive sound components that each component must be shifted. Then determine the frequency and amplitude of the sound component at that location. To determine the actual frequency and amplitude of each tone, several parameters must be chosen. These are the ratio R of frequencies between successive tone components, the base frequency of the first component (BF0), and the number of components (n) comprising the sound. The base frequency (corresponding to θ = 0) of each of the n components is given in terms of the base frequency for the first component (component number zero) as follows:

$BF_i = BF_0 \times R^i, \qquad 0 \le i < n$
