NASA Technical Memorandum 103835

Three-Dimensional Virtual Acoustic Displays

Elizabeth M. Wenzel
Ames Research Center, Moffett Field, California

July 1991

National Aeronautics and Space Administration
Ames Research Center
Moffett Field, California 94035-1000

SUMMARY

The development of an alternative medium for displaying information in complex human-machine interfaces is described. The three-dimensional virtual acoustic display is a means for accurately transferring information to a human operator using the auditory modality; it combines directional and semantic characteristics to form naturalistic representations of dynamic objects and events in remotely-sensed or simulated environments. Although the technology can stand alone, it is envisioned as a component of a larger multisensory environment and will no doubt find its greatest utility in that context. The general philosophy in the design of the display has been that the development of advanced computer interfaces should be driven first by an understanding of human perceptual requirements, and later by technological capabilities or constraints. In expanding on this view, the paper addresses why virtual acoustic displays are useful, characterizes the abilities and potential uses of such displays, reviews some recent approaches to their implementation and application, describes the current research project at NASA Ames in some detail, and finally outlines some critical research issues for the future.

INTRODUCTION

Rather than focus on the "multi" part of multimedia interfaces, this paper will emphasize the justification and development of a particular medium, the three-dimensional virtual acoustic display.

Although the technology can stand alone, it is envisioned as a component of a larger multisensory environment and will no doubt find its greatest utility in that context. The general philosophy in the design of the display has been that the development of advanced computer interfaces should be driven first by an understanding of human perceptual requirements, and later by technological capabilities or constraints. In expanding on this view, I will address why virtual acoustic displays are useful, characterize the abilities of such displays, review some recent approaches to their implementation and application, describe the current research at NASA Ames in some detail, and finally outline some critical research issues for the future. Since these goals are rather ambitious, I apologize in advance for neglecting any important work or issues in an area that seems to be rapidly gaining momentum.

WHY VIRTUAL ACOUSTIC DISPLAYS?

The recent burgeoning of computing technology requires that people learn to interpret and control increasingly complex systems of information across increasingly complex machines. One approach to this problem has been to develop direct-manipulation, spatially-organized graphical interfaces, exemplified by the ubiquitous combination of the desktop metaphor and the mouse. Such interfaces can provide familiarity and consistency across complex computer applications, thus avoiding much of the task-dependent learning of the older text-oriented displays. Lately, a considerable amount of attention has been devoted to a more ambitious type of reconfigurable interface called the virtual display. Despite the oft-touted "revolutionary" nature of this field, the research has many antecedents in previous work in three-dimensional computer graphics, interactive input/output devices, and simulation technology. Some of the earliest work in virtual interfaces was done by Sutherland (1968) using binocular head-mounted displays. Sutherland characterized the goal of virtual interface research, stating, "The screen is a window through which one sees a virtual world. The challenge is to make that world look real, act real, sound real, feel real." As technology has advanced, virtual displays have adopted a three-dimensional spatial organization, in order to provide a more natural means of accessing and manipulating information. A few projects have taken the spatial metaphor to its limit by directly involving the operator in a data environment (e.g., Furness, 1986; Brooks, 1988; Fisher et al., 1988). For example, Brooks (1988) and his colleagues have worked on a three-dimensional interface in which a chemist can visually and manually interact with a virtual model of a drug compound, attempting to discover the bonding site of a molecule by literally seeing and feeling the interplay of the chemical forces at work. It seems that the kind of "artificial reality" once relegated solely to the specialized world of the cockpit simulator is now being seen as the next step in interface development for many types of advanced computing applications (Foley, 1987).

Often the only modalities available for interacting with complex information systems have been visual and manual. Many investigators, however, have pointed out the importance of the auditory system as an alternative or supplementary information channel (e.g., Garner, 1949; Deatherage, 1972; Doll et al., 1986). Most recently, attention has been devoted to the use of non-speech audio as an interface medium (Patterson, 1982; Gaver, 1986; Begault and Wenzel, 1990; Blattner et al., 1989; Buxton et al., 1989). For example, auditory signals are detected more quickly than visual signals and tend to produce an alerting or orienting response (Mowbray and Gebhard, 1961; Patterson, 1982). These characteristics are probably responsible for the most prevalent use of non-speech audio in simple warning systems, such as the malfunction alarms used in aircraft cockpits or the siren of an ambulance. Another advantage of audition is that it is primarily a temporal sense, and we are extremely sensitive to changes in an acoustic signal over time (Mowbray and Gebhard, 1961; Kubovy, 1981).

This feature tends to bring a new acoustical event, or a state change in an ongoing event, to our attention and, conversely, allows us to relegate sustained or uninformative signals to the background. Thus a display with sound is particularly suited to monitoring state changes over time; for example, when a car engine suddenly begins to malfunction. Non-speech sounds have the potential to provide an even richer and more informationally-rich audio medium if they are carefully designed with human perceptual abilities in mind. Just as a movie with sound is much more compelling than a silent film, so could a computer interface be enhanced by an appropriate "sound track" to the task at hand. If used properly, sound need not be distracting, cacophonous, or merely uninformative. Principles of design for auditory icons and auditory symbology can be gleaned from the fields of music, psychoacoustics (Carterette and Friedman, 1978; Patterson, 1982), and psychological studies of the acoustical determinants of perceptual organization (Bregman, 1981, 1990; Kubovy, 1981; Buxton et al., 1989). For example, following from Gibson's (1979) ecological approach to perception, one can conceive of the audible world as a collection of acoustic "objects." Various acoustic features, such as temporal onsets and offsets, timbre, pitch, intensity, and rhythm, can specify the identities of the objects and convey meaning about discrete events or ongoing actions in the world and their relationships to one another. One could systematically manipulate these features, effectively creating an auditory symbology which operates on a continuum from "literal" everyday sounds, such as the clunk of mail in your mailbox (e.g., Gaver's "Sonic Finder," 1986), to a completely abstract mapping of statistical data into sound parameters (Bly, 1982; Smith et al., 1990; Blattner et al., 1989).

Such a display could be further enhanced by taking advantage of the auditory system's ability to segregate, monitor, and switch attention among simultaneous sources of sound (Mowbray and Gebhard, 1961). One of the most important determinants of acoustic segregation is an object's location in space (Kubovy and Howard, 1976; Bregman, 1981, 1990; Deutsch, 1982). A three-dimensional auditory display may be most usefully applied in contexts where the representation of spatial information is important, particularly when visual cues are limited or absent and workload is high. Such displays can potentially enhance information transfer by combining directional with iconic information in a quite naturalistic representation of dynamic objects in the interface. Borrowing a term from Gaver (1986), an obvious aspect of "everyday listening" is the fact that we live and listen in a three-dimensional world. A primary advantage of the auditory system is that it allows us to monitor and identify sources of information from all possible locations, not just the direction of gaze. In fact, I would like to suggest that a good rule of thumb for knowing when to provide acoustic cues is to recall how we naturally use audition to gain information and explore the environment; that is, "the function of the ears is to point the eyes." Thus the auditory display can act as a more coarsely-tuned directional mechanism to direct the attention of our more finely-tuned visual analyses. For example, Perrott et al. (1991) have recently reported that aurally-guided visual search for a target in a cluttered visual display is superior to unaided visual search, even for objects in the central visual field.

Such features will be especially useful in inherently spatial tasks, such as air traffic control (ATC) displays for the tower or cockpit. For example, research at NASA Ames, in collaboration with the Federal Aviation Administration, will emphasize two types of ATC displays and the likelihood that they will provide significant benefits to current ATC systems. One example is a proposed display in which the controller hears incoming communications from aircraft in positions which correspond to their actual locations in the terminal area. In such a display, it should be more immediately obvious to the listener when aircraft are on a potential collision course, because they would be heard in their true spatial locations, and their routes could be tracked over time. Because acoustic signals can be processed in parallel with the visual displays, such a system could also help controllers integrate the flow of incoming traffic in increasingly complex situations, such as heavy traffic in the terminal area, while maximizing the simplicity of their conceptual representation of the airspace. A second example involves acoustic warning signals; an auditory icon with a unique alerting rhythm and temporal pattern could signal urgent events like potential runway incursions, and the warning could be emphasized by placing the signal close to the listener's head, e.g., within the boundaries of their "personal space" (Begault and Wenzel, 1990).

A second advantage of the binaural system, often referred to as the "cocktail party" effect, is that it improves the intelligibility of sources in noise and assists in the segregation of multiple sound sources (Cherry, 1953; Bronkhorst and Plomp, 1988). This effect could be critical in applications involving the kind of encoded non-speech messages proposed for scientific "visualization," the acoustic representation of multi-dimensional data (e.g., Bly, 1982; Blattner et al., 1989; Smith et al., 1990), or the development of alternative interfaces for the visually impaired (Edwards, 1989; Loomis et al., 1990).

Another aspect of auditory spatial cues is that, in conjunction with the other senses, they can act as potentiators of information in a display. For example, visual and auditory cues together can reinforce the information content of a display and provide a greater sense of presence or

realism in a manner not readily achieved by either modality alone (Colquhoun, 1975; O'Leary and Rhodes, 1984; Warren et al., 1981). Similarly, in direct-manipulation tasks, auditory cues can provide supporting information for the representation of force-feedback (Wenzel et al., 1990), a quite difficult interface problem for multimodal displays which is only beginning to be solved (e.g., Minsky et al., 1990). Intersensory synergism will be particularly useful in telepresence applications, including advanced teleconferencing (Ludwig et al., 1990), shared electronic workspaces (Fisher et al., 1988; Gaver and Smith, 1990), monitoring telerobotic activities in remote or hazardous situations (Wenzel et al., 1990), and entertainment environments (Kendall and Martens, 1984; Kendall and Wilde, 1989; Cooper and Bauck, 1989). Thus, the combination of veridical spatial cues with good principles of iconic design could provide an extremely powerful and information-rich display which is also quite easy to use. Here, the term veridical is used to indicate that spatial cues are both realistic and result in the accurate transfer of information; e.g., the presentation of such cues results in accurate estimates of perceived location by human listeners.

From the above considerations, one can attempt to define a virtual acoustic display and list some of the goals to keep in mind when developing such displays and conducting the related perceptual research. A virtual acoustic display is a medium for accurately transferring information to a human operator using the auditory modality; it combines directional and semantic characteristics to form naturalistic representations of dynamic objects and events in remotely-sensed or simulated environments. As with visual displays, this definition does not necessarily mean that the virtual representation must be indistinguishable from reality. Rather, it implies that the display should provide a functional equivalence in the context of the task to be performed. To achieve this goal, we must know a great deal about our sensory biases; that is, the what, when, and how of the acoustic information used by the human listener. It also means that we must systematically verify in psychophysical studies that the displays we develop are perceptually viable. Therefore the display must: (1) adequately reproduce the audible spectrum in frequency resolution and dynamic range, (2) present information which can be either static or moving, (3) be capable of representing multiple acoustic sources or streams, (4) be real-time and interactive; that is, responsive to the ongoing needs of the user, (5) be head-coupled, or correlated with the head movements of the user, to provide a stable acoustic environment, and (6) be flexible in the type of acoustic information which can be displayed; for example, speech, real environmental sounds, or patterns of multidimensional data.

A corollary to this approach is that such a virtual display need not be limited by normal perceptual capabilities. For example, Durlach (1990; Durlach and Pang, 1986) has proposed that localization cues could be artificially magnified or enhanced to create a kind of super auditory localization ability, which may potentially be used to enhance the representation of information in auditory space.

ANTECEDENTS OF THREE-DIMENSIONAL VIRTUAL ACOUSTIC DISPLAYS

As noted above, the utility of a 3D auditory display greatly depends on the user's ability to localize the various sources of information in auditory space. While compromises obviously have to be made to achieve a practical system, the particular features or limitations of the latest hardware should be considered subservient to human sensory and performance requirements. Thus, designers of such interfaces must carefully consider the acoustic cues needed by listeners for accurate localization and ensure that these cues will be faithfully (or at least adequately, in a human performance sense) transduced by the synthesis device, rather than letting current technology drive the implementation. In fact, knowledge about sensory requirements might actually save processing power in some cases and indicate others to which more resources should be devoted.

Psychoacoustical Antecedents

Much of the research on human sound localization over the last 25 years emphasizes the role of two primary cues, interaural differences in time of arrival at low frequencies and interaural differences in intensity at high frequencies, a view summarized in the classic "duplex theory" (Lord Rayleigh, 1907) (see Blauert, 1983, for an extensive review of spatial hearing). However, there are serious limitations to this approach. For example, it cannot account for the ability of subjects to localize sounds on the vertical median plane, where interaural cues are minimal (Blauert, 1969; Butler, 1969; Oldfield and Parker, 1986). Similarly, when subjects listen to sounds over headphones, the stimuli are usually perceived as being inside the head even though interaural temporal and intensity differences appropriate to an external source location are present (Plenge, 1974). Many studies now suggest that these deficiencies of the duplex theory reflect the important contribution to localization of the direction-dependent filtering which occurs when incoming sound waves interact with the outer ears, or pinnae. Experiments have shown that the spectral shaping by the pinnae is highly direction dependent (Shaw, 1974), that the absence of pinna cues degrades localization accuracy (Gardner and Gardner, 1973; Oldfield and Parker, 1984b), and that pinna cues are primarily responsible for externalization, or the "outside-the-head" sensation (Plenge, 1974). Such data suggest that perceptually-veridical localization over headphones should be possible if the spectral shaping by the pinnae, as well as the interaural difference cues, are adequately synthesized.

Approaches to Implementation

In general, the approaches to simulating veridical localization cues in an acoustic display derive from binaural recording and reproduction and, eventually, from the measurement of the Head-Related Transfer Function (HRTF) by various means, including the development of normative manikins (e.g., Hudde and Schroter, 1981). The HRTF comprises the direction-dependent acoustic effects imposed on an incoming signal by the outer ears; its nature will be considered later in more detail.

Prior to the development of current techniques for simulating out-of-head localization, there were some early attempts at creating what we might now call a virtual acoustic display. One of these was the rather amazing pseudophone apparatus used during World War I for detecting and locating enemy aircraft (fig. 1). It is an early example of the use of enhanced, directional pinnae and an expanded interaural axis in an acoustic display. Much later, just after World War II, a less elaborate display for instrument flying, called FLYBAR (FLYing By Auditory Reference), was developed by Forbes (1946). This system used only crude left/right intensity panning, along with pitch and temporal pattern changes, to indicate turn, bank, and air speed.

Figure 1. Photo of the pseudophone apparatus used for detecting and localizing aircraft during World War I (from Scientists in Power, Spencer R. Weart, Harvard University Press, Cambridge, Mass.; reproduced with permission, Niels Bohr Library, American Institute of Physics, New York, NY).

One class of techniques derives from binaural recording, in which stimuli are transduced by microphones placed in the ears of manikins such as the KEMAR (Knowles Electronics, Inc.) and other artificial heads, used for applications like assessing concert hall acoustics (see Blauert, 1983). Recent examples of a real time version of this approach in information display include the work by Doll at the Georgia Institute of Technology (Doll et al., 1986) and the Gehring AL100 system developed for the Super Cockpit Project at Wright-Patterson Air Force Base (see Calhoun et al., 1987). These projects used a movable artificial head to simulate moving sources and correlated head-motion; the listener heard headphone signals transduced in the ears of a manikin which was mechanically coupled to the motion of the listener's own head. Another type of real time virtual display, which worked relatively well in an active navigation task, is the analog system developed by Loomis et al. (1990) as a navigation aid for the blind. In this system, spatial cues were approximated using various types of simple filters, with interaural time and intensity differences dynamically linked to head motion. The display also included simple distance and reverberation cues, such as an intensity rolloff with distance and the ratio of direct to reflected energy.

Much of the recent synthesis work since the early 80s has been devoted to the measurement of HRTFs and to real time digital filtering. Techniques for creating digital filters based on measurements of finite impulse responses in the ear canals of either individual subjects or artificial heads have been under development since the late 70s. But it is only with the advent of powerful new digital signal-processing (DSP) chips that a few real-time systems have appeared in the last few years in Europe

and the United States. In general, these systems are intended for headphone delivery and use time-domain convolution to achieve real time performance.

One example is the Creative Audio Processor, a kind of binaural mixing console, developed by AKG in Austria and based on ideas proposed by Blauert (1984). The CAP 340M is aimed at applications like audio recording, acoustic design, and psychoacoustic research (Persterer, 1989). This particular system is rather large, involving an entire rack of digital signal processors and related hardware. The system is also rather powerful in that up to 32 channels can be independently "spatialized" in azimuth and elevation, along with variable simulation of room response characteristics. Figure 2, for example, illustrates the graphical interface of the system for specifying characteristics of the binaural mix for a collection of independently-positioned musical instruments. A collection of HRTFs is offered, derived from measurements taken in the ear canals of both manikins and individual subjects. AKG's original measurements were made by Blauert and his colleagues (Blauert, personal communication). In a new product which simulates an ideal control room for headphone reproduction, the BAP 1000, the user has the option of having his/her individual transforms programmed onto a PROM card. Interestingly, AKG's literature mentions that best results are achieved with individual transforms. Currently there are plans for the system to be used in an October 1991 mission of the Russian Space Program. The AUDIMIR study examines whether acoustic cues for orientation can eliminate mismatch of auditory and vestibular cues and thus counteract space sickness (AKG Report, Nov. 1989).

Other projects in Europe derive from the efforts of a group of researchers in Germany. This work includes the most recent efforts of Jens Blauert and his colleagues at the Ruhr University at Bochum (Boerger et al., 1977; Lehnert and Blauert, 1989; Posselt et al., 1986). The group at Bochum has been working on a prototype PC-based DSP system, again a kind of binaural mixing console, whose proposed features include real time convolution of HRTFs for up to four sources, interpolation between transforms to simulate motion, and room modeling. The group has devoted quite a bit of effort to measuring HRTFs for both individual subjects and artificial heads (e.g., the Neumann head), as well as developing computer simulations of transforms.

Another researcher in Germany, Klaus Genuit, worked at the Institute of Technology of Aachen and later went on to form his own company, HEAD Acoustics. HEAD Acoustics has also produced a real time, four-channel binaural mixing console based on a version of an artificial head (Gierlich and Genuit, 1989). Genuit's work is particularly notable for his development of a structurally-based model of the acoustic effects of the pinnae (e.g., Genuit, 1986). That is, rather than use individualized HRTFs, Genuit has developed a parameterized, mathematical description (based on Kirchhoff's diffraction integrals) of the acoustic effects of the torso, shoulder, head, and pinnae, including ear canal resonances. The effects of the structures have been simplified; for example, the outer ears are modeled as three cylinders of different diameters and length. The parameterization adds some flexibility to this technique, as well as a new way of analyzing the acoustic effects of the model, and Genuit states that the calculated transforms are within the variability of directly-measured HRTFs. Similar projects for room acoustics and simulator applications are currently in progress.

In the United States, for example, McKinley and Ericson (1988) developed a prototype system at Wright-Patterson Air Force Base which synthesizes a single source in azimuth in real time. The system uses HRTFs based on measurements from a KEMAR manikin made at 1° intervals in azimuth, with a head-tracker to achieve source stabilization.

Figure 2. Illustration of the graphical interface of AKG's Creative Audio Processor for specifying characteristics of the binaural mix for a collection of independently-positioned musical instruments (adapted from product literature for the CAP 340 M).

Gary Kendall and his colleagues at Northwestern University have also been working on a real time system aimed at spatial room modeling for recording and entertainment (Kendall and Martens, 1984). Recently, Gehring Research has offered a software application for anechoic simulation using a Motorola 56001-based DSP card, which uses two sets of HRTFs with the filters truncated to conform to the limitations of the DSP chip. One set is from a KEMAR manikin measured by Kendall's group and the other is from an individual subject measured at the University of Wisconsin, Madison.

THE NASA AMES 3-D AUDITORY DISPLAY PROJECT

Since 1986, our group at NASA Ames, as part of the Virtual Interface Environment Workstation (VIEW) project (Fisher et al., 1988), has been working on a real time system for use in both basic research in human sound localization and applied studies of acoustic information display in advanced human-computer interfaces. The research began as a collaborative effort between myself as project director, Scott Foster of Crystal River Engineering (Groveland, Calif.), and Fred Wightman and Doris Kistler of the University of Wisconsin, Madison, and, since 1988, Durand Begault and Philip Stone at NASA Ames. The research has taken a four-part approach: (1) develop the signal-processing technology required to implement a virtual acoustic display based on psychoacoustic principles, (2) in parallel, develop a technique for synthesizing localized stimuli in real time, (3) perceptually validate the synthesis technique with basic psychophysical studies, and (4) use the real time device as a research tool for evaluating and refining the approach to synthesis in both basic and applied contexts.

As noted above, one technique for capturing veridical 3-D auditory cues involves binaural recording, with microphones placed in both pinnae or the ear canals of a manikin or of a human subject (Plenge, 1974; Butler and Belendiuk, 1977; Blauert, 1983; Doll et al., 1986). When stimuli recorded in this way are presented over headphones, there is an immediate and veridical perception of 3-D auditory space (Plenge, 1974; Butler and Belendiuk, 1977; Doll et al., 1986). Our procedure is closely related to binaural recording. Rather than record stimuli directly, we measure the acoustical transfer functions, from free-field to eardrum, at many source positions, and use these transfer functions as the basis of filters with which localized stimuli can be synthesized.

Head-Related Transfer Functions (HRTFs), in the form of Finite Impulse Responses (FIRs), are measured using techniques adapted from Mehrgardt and Mellert (1977) (see fig. 3). Small probe microphones are placed near each eardrum of a human listener who is seated in an anechoic chamber. Wide-band test stimuli are presented from one of 144 equidistant speaker locations in the chamber, and a new pair of impulse responses is measured for each location in the spherical array at intervals of 15° in azimuth and 18° in elevation. HRTFs are estimated by deconvolving the loudspeaker, test stimulus, and microphone responses from the recordings made with the probe microphones (Wightman and Kistler, 1989a). The advantage of this technique is that it preserves the complex pattern of interaural differences over the entire spectrum of the stimulus, thus capturing the effects of filtering by the pinnae, head, shoulders, and torso.

The insets in figure 3 show a pair of FIR filters measured for one subject for a source directly to the left and at ear level; that is, at -90° in azimuth and 0° in elevation. As you would expect, the waveform from this source arrived first and was larger in the left ear than the response measured in the right ear. The frequency-dependent effects can be analyzed by applying the Fourier Transform to these temporal waveforms. Figure 4 shows how interaural amplitude and phase (or, equivalently, time) vary as a function of frequency for four different locations in azimuth at 0° in elevation. For example, the top-left panels show that for 0° in azimuth, or directly in front of the listener, there is very little difference in the amplitude or phase responses between the two ears. On the other hand, in the top-right panels for 90°, or directly to the listener's right, one can see that, across the frequency spectrum, the amplitude and phase responses for the right ear are larger and lead in time (phase) with respect to the left ear.

In order to synthesize localized sounds, a map of "location filters" is constructed from all 144 pairs of FIR filters by first transforming them to the frequency domain, dividing out the spectral effects of the headphones using Fourier techniques, and then transforming back to the time domain.
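As an illustration of this filter-construction step, the sketch below (Python with NumPy; the function and array names, FFT size, and regularization floor are assumptions for illustration, not part of the original system) builds one headphone-corrected "location filter" from a measured impulse response by dividing out the headphone response in the frequency domain.

```python
import numpy as np

def location_filter(hrir, headphone_ir, n_fft=512, floor=1e-6):
    """Correct one measured head-related impulse response (HRIR) for the
    spectral effects of the playback headphones, returning a time-domain
    FIR "location filter" as described in the text.

    hrir:         measured impulse response for one ear, one position
    headphone_ir: impulse response of the headphone-to-eardrum path
    floor:        regularization to avoid dividing by near-zero bins
    """
    # Transform both responses to the frequency domain.
    h_loc = np.fft.rfft(hrir, n_fft)
    h_phone = np.fft.rfft(headphone_ir, n_fft)

    # Divide out the headphone spectrum, flooring very small magnitudes
    # so the correction does not blow up at spectral notches.
    mag = np.maximum(np.abs(h_phone), floor)
    h_corr = h_loc / (mag * np.exp(1j * np.angle(h_phone)))

    # Return to the time domain as an FIR filter.
    return np.fft.irfft(h_corr, n_fft)

# The full map would apply this to all 144 measured positions,
# for the left and right ears separately.
```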

Figure 3. Illustration of the technique for synthesizing virtual acoustic sources with measurements of the head-related transfer function: pinnae (outer ear) responses are measured with probe microphones, the pinnae transforms are digitized as finite impulse response (FIR) filters, and the filters are then used to generate synthesized cues. An example of a pair of finite impulse responses measured for a source location at -90° to the left and 0° elevation (at ear level) is shown in the insets for the left and right ears.
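The interaural analysis plotted in figure 4 amounts to comparing the Fourier transforms of the two ears' impulse responses. A minimal sketch follows (Python/NumPy; the 50-kHz rate is taken from the system description below, while the FFT size and phase-unwrapping choice are illustrative assumptions).

```python
import numpy as np

def interaural_spectra(hrir_left, hrir_right, fs=50000, n_fft=512):
    """Interaural amplitude (dB) and phase (radians) differences as a
    function of frequency, of the kind plotted in figure 4."""
    h_l = np.fft.rfft(hrir_left, n_fft)
    h_r = np.fft.rfft(hrir_right, n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    amp_db = 20.0 * np.log10(np.abs(h_l) / np.abs(h_r))
    # Unwrap so interaural phase is a smooth function of frequency;
    # dividing by 2*pi*f would convert it to an equivalent time delay.
    phase = np.unwrap(np.angle(h_l) - np.angle(h_r))
    return freqs, amp_db, phase
```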

Figure 4. Relative amplitude (dB) and phase (radians) responses plotted as a function of frequency for the left and right ears, comparing four source azimuths at 0° elevation.

The Real Time System: The Convolvotron

In the real time system, designed by Scott Foster of Crystal River Engineering, the map of corrected FIR filters is downloaded from an 80286- or 80386-based host computer to the dual-port memory of a real time digital signal-processor known as the Convolvotron (fig. 5). This set of two printed-circuit boards converts one or more monaural analog inputs to digital signals at a rate of 50 kHz (16-bit resolution). Each data stream is then convolved with filter coefficients determined by the coordinates of the desired target locations and the position of the listener's head, thus "placing" each input signal in the perceptual 3-space of the listener. The resulting data streams are mixed, converted to left and right analog signals, and presented over headphones. This processing allows up to four independent and simultaneous localized sources, with an aggregate computational speed of more than 300 million multiply-accumulates per second. The current configuration is sufficient for simulating relatively small reverberant environments, and the hardware speed can be scaled upward to accommodate the longer filter lengths required for larger enclosures.
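The core per-source operation is simply a pair of running FIR convolutions. The sketch below (Python/NumPy; the class name, block streaming, and history bookkeeping are illustrative assumptions, not the Convolvotron's actual firmware, which runs on dedicated DSP hardware) shows how one monaural block would be filtered through a left/right location-filter pair and mixed with other sources.

```python
import numpy as np

class BinauralSource:
    """Streams one monaural source through a left/right FIR pair,
    keeping input history across blocks so the output is continuous."""

    def __init__(self, fir_left, fir_right):
        self.firs = (np.asarray(fir_left, float), np.asarray(fir_right, float))
        self.n_hist = max(len(f) for f in self.firs) - 1
        self.history = np.zeros(self.n_hist)

    def process(self, block):
        block = np.asarray(block, float)
        # Prepend saved samples so the convolution has full context.
        padded = np.concatenate([self.history, block])
        if self.n_hist:
            self.history = padded[-self.n_hist:]
        outs = []
        for fir in self.firs:
            y = np.convolve(padded, fir)
            # Keep only the samples aligned with the current block.
            outs.append(y[len(padded) - len(block):len(padded)])
        return np.stack(outs)  # shape (2, block_len): left, right

def mix(sources, block):
    """Sum the binaural outputs of several sources, as the Convolvotron
    mixes its four independent data streams."""
    return sum(src.process(block) for src in sources)
```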

Figure 5. Block diagram of the Convolvotron system designed by Scott Foster for synthesizing three-dimensional virtual acoustic displays in real time. An 80386 host and head-tracker feed a TMS320/C25 processor, which updates the 4-source geometry, interpolates HRTF coefficients, and controls I/O and timing; a high-speed digital convolution engine then performs the FIR filtering and mixing of the 4 independent analog source inputs into left and right outputs. Key specifications: flexible processing resources with a maximum rate of ~300 MIPS; 16-bit A/D conversion; 50-kHz sampling rate; estimated latencies of 50 ms for the head-tracker and 30-40 ms for the host and DSP.

Motion trajectories and static locations at greater resolution than the empirical measurements are simulated by selecting the four measured positions nearest to the target location and interpolating with linear weighting functions. The interpolation algorithm effectively computes a new coefficient at the sampling interval (every 20 μsec) so that changes in position are free from artifacts such as clicks or switching noises. When integrated with the magnetic head-tracking system (Polhemus 3-Space Isotrack), the listener's head position can be monitored in real time so that the four simultaneous sources are stabilized in fixed locations or in motion trajectories relative to the user. Such head-coupling should help to enhance the simulation, since previous studies suggest that head movements are important for localization (e.g., Wallach, 1940; Thurlow et al., 1967; Thurlow and Runge, 1967). This degree of interactivity, especially when coupled with simulations of simple reverberant environments, is apparently unique to the Convolvotron system.

Pilot studies at Wisconsin suggest that the interpolation approach is perceptually-viable; simple two-way linear interpolations between locations as far apart as 60° in azimuth are perceptually indistinguishable from stimuli synthesized with measured filters, while, for elevation, localization performance begins to degrade at separations of 36°. These data suggest that the HRTF map of a real time display could tolerate interpolation separations of as much as 60° in azimuth (currently a maximum of 45° in the Convolvotron), but that the resolution of the map in elevation should probably be smaller than 36° (18° in the Convolvotron). More comprehensive evaluations of the perceptual consequences of interpolation are underway at NASA Ames.

As with any system synthesizing data "on the fly," the term real time is a relative one. The Convolvotron has a computational latency of about 30-40 msec, depending upon such factors as the speed of the host computer, the number of simultaneous sources, and the complexity of the source-listener geometry; the filters can only be updated at this rate. An additional delay of at least 50 msec, the minimum latency of the head-tracker, is also introduced. This accumulation of delays has important implications for how well the system can simulate realistic moving sources or realistic head-motion. For example, if the duration required to compute a relative movement of a source to a new location approaches 90 msec, a source moving at 360 deg/sec corresponds to an angular displacement of 32° or greater for a given update, one moving at 180 deg/sec corresponds to 16° or greater, and so on. Such update delays may or may not result in a perceptible lag, depending upon how sensitive humans are to changes in angular displacement for a given velocity. Recent work on the perception of auditory movement by Perrott and others, using real sound sources (moving loudspeakers), suggests that minimum audible movement angles for a 500-Hz tone-burst ranged from about 4 to 21°, respectively, for source speeds ranging from 8 to 360 deg/sec (Perrott, 1982; Perrott and Tucker, 1988). Thus, these computational latencies should be acceptable for moderate velocities, with perceptible lags expected mainly when relative speeds approach 360 deg/sec or larger, or when especially long delays accumulate while multiple sources (e.g., the simulation of simple reverberant rooms) are being generated.

Currently, the Convolvotron is being used in a variety of other government, university, and industry research labs besides ours, including the NASA Ames Crew Station Research and Development Facility, the Psychoacoustics Lab at the Research Laboratory of Electronics at MIT directed by Durlach, and Bellcore (Ludwig et al., 1990). The system also forms part of the "Audiosphere" component of VPL Research's virtual reality system.
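A minimal sketch of the four-nearest-neighbor interpolation described above is given below (Python/NumPy; the grid spacing, map layout, and bilinear weighting are illustrative assumptions rather than the Convolvotron's exact algorithm).

```python
import numpy as np

def interpolate_hrir(hrir_map, az, el, az_step=30.0, el_step=18.0):
    """Interpolate an impulse response for (az, el) from the four
    measured positions that bracket it, using linear weighting functions.

    hrir_map: dict mapping (az_deg, el_deg) -> np.ndarray impulse response
    (Wraparound at 360 degrees azimuth is omitted for brevity.)
    """
    az0 = np.floor(az / az_step) * az_step
    el0 = np.floor(el / el_step) * el_step
    fa = (az - az0) / az_step          # fractional position in azimuth
    fe = (el - el0) / el_step          # fractional position in elevation

    # Linear weights for the four surrounding measured positions.
    corners = [
        ((az0, el0), (1 - fa) * (1 - fe)),
        ((az0 + az_step, el0), fa * (1 - fe)),
        ((az0, el0 + el_step), (1 - fa) * fe),
        ((az0 + az_step, el0 + el_step), fa * fe),
    ]
    return sum(w * hrir_map[key] for key, w in corners)
```

Recomputing the weighted coefficients at the 20-μsec sampling interval, as the text describes, is what keeps a moving source free of clicks; a block-rate implementation would instead crossfade between successive interpolated filters.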

PSYCHOPHYSICAL EVALUATION OF THE SYNTHESIS TECHNIQUE

The working assumption of our synthesis technique is that if, using headphones, we could produce ear canal waveforms identical to those produced by a free-field source, we would duplicate the free-field experience. Presumably, individualized HRTFs would be the most likely approach to replicate the free-field experience for a given listener. The only conclusive test of this assumption must come directly from psychophysical studies in which listening to free-field and synthesized free-field sources is compared.

Validation for Static Sources Using Individualized HRTFs

A recent study by Wightman and Kistler (1989b) confirmed the perceptual adequacy of the basic synthesis technique for static sources. The stimuli were spectrally-scrambled noisebursts transduced either by loudspeakers in an anechoic chamber or by headphones. In both free-field and headphone conditions, the subjects indicated the apparent spatial position of a sound source by calling out numerical estimates of azimuth and elevation (in degrees) using a modified spherical coordinate system. For example, a sound heard directly in front would produce a response of "0, 0," a sound heard directly to the left and somewhat elevated might produce "-90 azimuth, +15 elevation," while one far to the rear on the right and below might produce "+170 azimuth, -30 elevation." Subjects were blindfolded and no feedback was given. Detailed explanations of the procedure and results can be found in the original paper.

The data analysis of localization experiments is complicated by the fact that the stimuli and responses are represented by points in three-dimensional space; in particular, as points on the surface of a unit-sphere, since distance remained constant in this experiment. For these spherically-organized data, the usual statistics of means and variances are potentially misleading. For example, an azimuth error of 15° on the horizontal plane is much larger in terms of absolute distance than a 15° error at an elevation of 54°. Thus, it is more appropriate to apply the techniques of spherical statistics to characterize these psychophysical data (Fisher et al., 1987). The spherical statistic used here, the judgement centroid, is a unit-length vector with the same direction as the resultant, the vector sum of all the unit-length judgement vectors. The direction of the centroid, described by an azimuth and an elevation, can be thought of as the "average direction" of a set of judgements from the origin, the subject's position. Two indicators of variability, K⁻¹ and the average angle of error, were also computed; these results will not be discussed here, and the reader is referred to the original paper.

Another type of error, observed in nearly all localization studies, is the presence of front-back "confusions." These are responses which indicate that a source in the front hemisphere, usually near the median plane, is perceived to be in the rear hemisphere. Occasionally, the reverse situation is also found. It is difficult to weight these types of errors accurately. Since the confusion rate is often low (e.g., Oldfield and Parker, 1984a), reversals have generally been resolved when computing descriptive statistics; that is, the responses are coded as if the subjects had indicated the correct hemisphere, as in the analyses of table 1 and figure 6. Otherwise, estimates of error would be greatly inflated. On the other hand, if we assume that subjects' responses correctly reflect their perceptions, resolving such confusions could be misleading. Thus, the rate of confusions is usually reported as a separate statistic.

Table 1. Summary statistics comparing resolved localization judgements of free-field and virtual sources for 8 subjects; free-field values are listed first, with virtual-source values in parentheses. (Adapted from Wightman and Kistler, 1989b)

ID    Goodness of fit   Azimuth correlation   Elevation correlation   Percent front-back reversals
SDE   0.93 (0.89)       0.98 (0.97)           0.68 (0.43)             12 (20)
SDH   0.95 (0.95)       0.96 (0.95)           0.92 (0.83)              5 (13)
SDL   0.97 (0.95)       0.98 (0.98)           0.89 (0.85)              7 (14)
SDM   0.98 (0.98)       0.98 (0.98)           0.94 (0.93)              5 (9)
SDO   0.96 (0.96)       0.99 (0.99)           0.94 (0.92)              4 (11)
SDP   0.99 (0.98)       0.99 (0.99)           0.96 (0.88)              3 (6)
SED   0.96 (0.95)       0.97 (0.99)           0.93 (0.82)              4 (6)
SER   0.96 (0.97)       0.99 (0.99)           0.96 (0.94)              5 (8)
Mean                                                                  5.6 (11)

Here, table 1 provides a general overview of the results of Wightman and Kistler (1989b), comparing summary statistics of the eight subjects' resolved judgements of location for real (free-field) and synthesized stimuli. Note that overall goodness of fit between the actual and estimated source coordinates is quite comparable: 0.89 or better for the synthesized stimuli and 0.93 or better for free-field sources. The two correlation measures indicate that while source azimuth appears to be synthesized nearly perfectly, synthesis of source elevation is more problematic, particularly for SDE, who also has difficulty judging elevation in the free field.

Examples of the range of patterns of localization behavior can be seen in figure 6, in which actual source azimuth (and, in the insets, elevation) versus judged azimuth is plotted for subjects SDO and SDE of Wightman and Kistler (1989b). The panel on the left plots free-field judgements and the panel on the right shows judgements for the stimuli synthesized from the subjects' own transfer functions. On each graph, the positive diagonal, or a straight line with a slope of 1.0, corresponds to perfect performance. The confusion rates for resolved judgements (table 1) were relatively low, with average rates of about 6 and 11% for free-field and synthesized stimuli, respectively. Reversal rates for the synthesized stimuli tended to be greatest for subjects who also had higher rates in the free field. While individual differences do occur, the pattern of results for a given subject is consistent across the free-field and synthesized conditions; it appears that Butler and Belendiuk's (1977) observation of "good" and "bad" localizers is supported by these data.

Figure 6. Scatterplots of actual source azimuth (and, in the insets, elevation) versus judged source azimuth for subjects SDO and SDE in both free-field and headphone conditions. The plot on the left shows free-field judgements and the plot on the right shows judgements for the stimuli synthesized from the subjects' own transfer functions. Each data point represents the centroid of at least 6 judgements. 72 source positions are plotted in each plot. Data from 6 different source elevations are combined in the azimuth plots and data from 24 different source azimuths are combined in the insets. Note that the scale is the same in the azimuth and elevation plots. (After Wightman and Kistler, 1989b.)
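For concreteness, the centroid computation described above can be sketched as follows (Python/NumPy; the axis conventions are an assumption for illustration, since the paper specifies only a modified spherical coordinate system).

```python
import numpy as np

def judgement_centroid(azimuths_deg, elevations_deg):
    """Return the (azimuth, elevation) of the judgement centroid: the
    unit vector in the direction of the resultant (vector sum) of the
    unit-length judgement vectors."""
    az = np.radians(azimuths_deg)
    el = np.radians(elevations_deg)

    # Unit vectors for each judgement (x-axis toward 0 az, 0 el).
    x = np.cos(el) * np.cos(az)
    y = np.cos(el) * np.sin(az)
    z = np.sin(el)

    rx, ry, rz = x.sum(), y.sum(), z.sum()   # the resultant
    r = np.sqrt(rx**2 + ry**2 + rz**2)       # its length

    # Variability measures such as K**-1 are functions of r relative
    # to the number of judgements (see Fisher et al., 1987).
    centroid_az = np.degrees(np.arctan2(ry, rx))
    centroid_el = np.degrees(np.arcsin(rz / r))
    return centroid_az, centroid_el
```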

Acoustic Determinants of Performance

Individual differences in localization behavior suggest that there may be acoustic features peculiar to each subject's HRTFs which influence performance. Thus, the use of averaged transforms, or even measurements derived from normative manikins such as the KEMAR, may or may not be an optimum approach for simulating free-field sounds. For example, figure 7 illustrates the between-subjects variability in the left and right-ear magnitude responses for a single source location (after Wenzel et al., 1988a). Obviously, any straightforward averaging of these functions would tend to smooth the peaks and valleys, thus removing potentially significant features in the acoustic transforms.

Figure 7. Magnitude responses for a single source position are shown for 8 subjects. The left and right ears are plotted separately.

On the other hand, it may be possible to identify specific acoustic features which result in good or bad localization. The psychophysical data indicate that elevation coding is particularly difficult, especially for subject SDE. A preliminary analysis suggests that there is an acoustic basis for this poor performance. Figure 8 plots "interaural elevation dependency" functions for four subjects' magnitude data. The computational derivation of these functions can be found in the description of Wightman and Kistler's (1989b) figure 10. Essentially, the six functions on each graph show how interaural intensity changes for different elevations when the magnitude responses are collapsed across all azimuths; the functions are normalized to zero elevation, so the reference elevation appears as the flat function. In spite of the large intersubject variability illustrated in figure 7, the dependency functions for the better localizers (shown in the top three graphs) are quite similar to each other and show clear elevation dependencies. SDE's functions, on the other hand, are different from the other subjects and show little change with elevation. Thus, it appears that SDE's poor performance in judging elevation for both real and synthesized stimuli may be due to a lack of distinctive acoustic features correlated with elevation in his HRTFs.

Figure 8. Interaural elevation dependency functions plotted for 4 subjects. From top to bottom, the functions within a panel represent elevations of +54, +36, +18, 0 (the reference elevation), -18, and -36°.

The analysis of individual differences in pinna cues brings up a topic which has often been conjectured about but rarely directly tested (see Butler and Belendiuk, 1977, for an early example). That is, can one manipulate localization performance simply by listening through another person's ears? Or put another way, can we adapt to and take advantage of a set of good HRTFs even if we are a bad localizer? The following data from Wenzel et al. (1988b) illustrate the kind of "cross-ear listening" paradigm that is possible using our synthesis technique. Again, the subjects provided absolute judgements of location as in the experiment by Wightman and Kistler (1989b).

Figure 9 shows what happens to resolved azimuth and elevation judgements when a good localizer listens to stimuli synthesized from another good localizer's pinna transforms. Azimuth is plotted in the top panels and elevation is on the bottom. The left and far-right graphs plot centroids for SDP's and SDO's judgements vs. the target locations when the stimuli were synthesized from their own HRTFs; front-back confusions have been resolved as described above. As can be seen, both SDP and SDO localize the stimuli based on their own HRTFs quite well. The center graphs show what happens when SDP listens "through" SDO's ears: azimuth degrades somewhat, but not a great deal. Elevation degrades somewhat more, suggesting that elevation cues are not as robust as azimuth cues across a sample of individuals, but an overall correspondence between real and perceived locations remains intact.

Figure 10 compares performance when a good localizer, SDO, listens to stimuli synthesized from the HRTFs of bad localizer SDE. Again, for azimuth there is little degradation. However, for elevation, it seems that SDE's pinnae provide poor elevation cues for SDO as well, supporting the notion that acoustic features of the transforms do determine localization. If acoustic features do determine localization, one might conjecture that cross-ear listening is not a symmetrical effect; that is, SDE could actually improve his performance if he could learn to listen "through" a better set of ears. Figure 11 plots these data. Again, azimuth judgements are accurate; SDE performs nearly as well with SDO's transforms as with his own HRTFs. However, elevation performance remains quite poor compared to SDO's, even though SDE was provided with the presumably better elevation cues of SDO's pinnae. Even further, SDE still could not take advantage of these cues after about 50 hr of testing, compared to only 2 hr for the good localizers. These data are hardly conclusive, since they are based on a sample size of one; only SDE of the eight subjects in Wightman and Kistler (1989b) showed such poor elevation performance to begin with. But they are suggestive. It may be that there is a critical period for localization which, once past, can never be regained. Perhaps more likely is that, analogous to the experiments with prisms in visual adaptation (see Welch, 1978), SDE would need prolonged and consistent exposure to SDO's pinnae in order to learn to discriminate the subtle acoustic cues he does not normally experience. Apparently, a few hours of testing a day, especially in the absence of either verbal feedback or correlated information from the other senses, are not enough to allow adaptation to occur.

Figure 9. Judged versus target position (deg): azimuth (top panels) and elevation (bottom panels) judgements for SDP and SDO listening with their own transforms, and for SDP listening through SDO's transforms (center panels).

Figure 10. Judged versus target position (deg) when good localizer SDO listens to stimuli synthesized from the HRTFs of bad localizer SDE.

Figure 11. Judged versus target position (deg) when bad localizer SDE listens to stimuli synthesized from good localizer SDO's HRTFs.
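As a concrete rendering of the interaural elevation dependency analysis described above (fig. 8), the sketch below (Python/NumPy; the array layout and dB convention are assumptions for illustration, and the exact derivation appears in Wightman and Kistler, 1989b) computes, for each elevation, the interaural level difference averaged across azimuths and normalized to the 0° elevation function.

```python
import numpy as np

def elevation_dependency(mag_left, mag_right, elevations, ref_el=0.0):
    """Interaural elevation dependency functions.

    mag_left, mag_right: arrays of shape (n_el, n_az, n_freq) holding
        one subject's HRTF magnitude spectra for the left and right ears.
    Returns an array (n_el, n_freq) of dB differences, normalized so
    the reference-elevation function is flat (all zeros).
    """
    # Interaural level difference in dB, collapsed across all azimuths.
    ild_db = 20.0 * np.log10(mag_left / mag_right)
    ild_vs_el = ild_db.mean(axis=1)

    # Normalize each function to the reference (0 deg) elevation.
    ref = ild_vs_el[list(elevations).index(ref_el)]
    return ild_vs_el - ref
```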

Inexperienced Listeners and Nonindividualized HRTFs

In practice, measurement of each potential listener's HRTFs may not be feasible. It may also be the case that the user of a 3D auditory display will not have the opportunity for extensive training. Thus, a critical research issue for virtual acoustic displays is the degree to which the general population of listeners can readily obtain adequate localization cues from stimuli based on nonindividualized transforms. In the worst case, using a set of HRTFs other than the listener's own could degrade localization relative to the listener's inherent perceptual ability. A reasonable approach is to use the HRTFs from a subject whose measurements are "behaviorally-calibrated" and are thus correlated with known perceptual ability; that is, the transforms of a proven "good localizer."

Recently, using a variant on the cross-ear listening paradigm, Wenzel et al. (1991) completed a more extensive study in which 16 inexperienced listeners judged the apparent spatial location of sources presented over loudspeakers in the free field or over headphones. The headphone stimuli were digitally generated using HRTFs measured in the ear canals of SDO, the "good localizer" from the experiment of Wightman and Kistler (1989b), with stimulus levels roved over a 20-dB range; no feedback was given. In general, localization with the nonindividualized transforms was quite good. When front-back confusions are resolved, 12 of the 16 subjects show nearly identical performance in both free-field and headphone conditions; figure 12 illustrates the localization behavior of a representative subject. Like SDE in Wenzel et al. (1988b), 2 of the subjects show poor elevation accuracy in both free-field and headphone conditions. The remaining 2 subjects show a third response pattern, in which elevation accuracy is inconsistent: quite good for the free-field stimuli but poor for the synthesized stimuli (fig. 13). This third pattern, with poor elevation performance confined to the simulation, would be a problem for virtual displays if it turns out to be common in the general population. A further summary is illustrated in figure 14; here, only the synthesized data are shown, again based on analyses in which front/back confusions are resolved.

Note, though, that the existence of front/back confusion errors is much more of a problem here: the inexperienced listeners exhibit average reversal rates of about 19 vs. 31% for free-field and headphone stimuli, respectively, compared to rates of about 6 vs. 11% for the experienced listeners in the Wightman and Kistler (1989b) study. It is possible, as Asano et al. (1990) have claimed, that these reversals are not strictly the result of the simulation, and that such errors diminish as subjects adapt to the unusual listening conditions provided by static, anechoic sources, whether real or simulated. The difference in free-field confusion rates between the inexperienced listeners of this experiment and the more experienced subjects of Wightman and Kistler tends to support this view. Thus, it may be that some form of adaptation or training will be required before listeners can take full advantage of a virtual acoustic display. In general, though, these data suggest that most listeners can obtain useful directional information from an auditory display without requiring the use of individually-tailored HRTFs, particularly for azimuth; the data are not without a caveat, however, since elevation accuracy and front/back confusion rates may be compromised for some listeners.

Figure 12. Scatterplots of actual source azimuth (and, in the insets, elevation) versus judged source azimuth for subject SIK in both free-field and headphone conditions. The plot on the left plots free-field judgements and the plot on the right shows judgements for the stimuli synthesized from nonindividualized transfer functions ("SDO's filters, 20 dB rove"). Each data point represents the centroid of 9 judgements. 24 source positions are given in each plot. Data from 6 different source elevations are combined in the azimuth plots and data from 18 different source azimuths are combined in the elevation insets. Note that the scale is the same in the azimuth and elevation plots.

Figure 13. Scatterplots of actual source azimuth (and, in the insets, elevation) versus judged source azimuth for subject SID in both free-field and headphone conditions. The plot on the left plots free-field judgements and the plot on the right shows judgements for the stimuli synthesized from nonindividualized transfer functions ("SDO's filters, 20 dB rove"). Each data point represents the centroid of 9 judgements. 24 source positions are given in each plot. Data from 6 different source elevations are combined in the azimuth plots and data from 18 different source azimuths are combined in the elevation insets. Note that the scale is the same in the azimuth and elevation plots.