NASA Technical Memorandum 103835

Three-Dimensional Virtual Acoustic Displays

Elizabeth M. Wenzel
Ames Research Center, Moffett Field, California

July 1991

National Aeronautics and Space Administration
Ames Research Center
Moffett Field, California 94035-1000
SUMMARY
The development of an alternative medium for displaying information in complex human-machine interfaces is described. The three-dimensional virtual acoustic display is a means for accurately transferring information to a human operator using the auditory modality; it combines directional and semantic characteristics to form naturalistic representations of dynamic objects and events in remotely-sensed or simulated environments. Although the technology can stand alone, it is envisioned as a component of a larger multisensory environment and will no doubt find its greatest utility in that context. The general philosophy in the design of the display has been that the development of advanced computer interfaces should be driven first by an understanding of human perceptual requirements, and later by technological capabilities or constraints. In expanding on this view, the paper addresses the potential uses of virtual acoustic displays, characterizes the abilities of such displays, reviews some recent approaches to their implementation and application, describes the current research project at NASA Ames in some detail, and finally outlines some critical research issues for the future.
INTRODUCTION
Rather than focus on the "multi" part of multimedia interfaces, this paper will emphasize the justification and development of a particular medium, the three-dimensional virtual acoustic display.
Although the technology can stand alone, it is envisioned as a component of a larger multisensory environment and will no doubt find its greatest utility in that context. The general philosophy in the design
of the display has been driven first by an understanding of human perceptual requirements, and later by technological capabilities or constraints. In expanding on this view, I will address why virtual acoustic displays are useful, characterize the abilities of such displays, review some recent approaches to their implementation and application, describe the current research project at NASA Ames in some detail, and finally outline some critical research issues for the future. Since these goals are rather ambitious, I apologize in advance for neglecting any important work or issues in an area that seems to be rapidly gaining momentum.

WHY VIRTUAL ACOUSTIC DISPLAYS?

The recent burgeoning of computing technology increasingly requires that people learn to interpret and control complex systems of information across increasingly complex machines. One approach to this problem has been to develop direct-manipulation, spatially-organized graphical computer interfaces, exemplified by the ubiquitous desktop metaphor and the mouse. Such interfaces can provide much useful information through a combination of familiarity and consistency, thus avoiding much of the task-dependent learning of the older text-oriented displays. Lately, a considerable amount of attention has been devoted to a more ambitious type of reconfigurable interface called the virtual display. Despite the oft-touted
"revolutionary"
nature
of this field,
the research
has many
antecedents
in previous
work in three-dimensional computergraphics,interactiveinput/outputdevices,andsimulationtechnology.Someof theearliestwork in virtual interfaceswasdoneby Sutherland(1968)usingbinocular head-mounteddisplays.Sutherlandcharacterizedthe goalof virtual interfaceresearch,stating, "The screenis a window throughwhich oneseesa virtual world.The challengeis to makethatworld look real,actreal,soundreal,feel real." As technologyhasadvanced,virtual displayshaveadopted a three-dimensionalspatialorganization,in orderto providea morenaturalmeansof accessingand manipulatinginformation.A few projectshavetakenthespatialmetaphorto its limit by directly involving the operatorin a dataenvironment(e.g.,Furness,1986;Brooks, 1988;Fisheret al., 1988). For example,Brooks(1988)andhis colleagueshaveworkedon a three-dimensionalinterfacein which a chemistcanvisually andmanuallyinteractwith a virtual modelof a drugcompound, attemptingto discoverthe bondingsite of a moleculeby literally seeingandfeelingthe interplayof thechemicalforcesat work. it seemsthatthekind of "artificial reality" oncerelegatedsolelyto the specializedworld of the cockpitsimulatoris nowbeingseenasthe next stepin interfacedevelopmentfor manytypesof advancedcomputingapplications(Foley, 1987). Oftenthe only modalitiesavailablefor interactingwith complexinformationsystemshavebeen visual andmanual.Many investigators,however,havepointedout the importanceof the auditory systemasanalternativeor supplementaryinformationchannel(e.g.,Garner,1949;Deatherage, 1972;Doll et al., 1986).Most recently,attentionhasbeendevotedto the useof non-speechaudioas an interfacemedium(Patterson,1982;Gaver,1986;BegaultandWenzel, 1990;Blattneret a1.,1989; Buxton et al., 1989).For example,auditorysignalsaredetectedmorequickly thanvisual signalsand tendto producean alertingor orientingresponse(Mowbray andGebhard,1961;Patterson,1982). 
These characteristics are probably responsible for the most prevalent use of non-speech audio in simple warning systems, such as the malfunction alarms used in aircraft cockpits or the siren of an ambulance. Another advantage of audition is that it is primarily a temporal sense, and we are extremely sensitive to changes in an acoustic signal over time (Mowbray and Gebhard, 1961; Kubovy, 1981). This feature allows us to relegate sustained or merely uninformative signals to the background. A new acoustical event or a state change over time, for example, when a car engine suddenly begins to malfunction, tends to bring the signal to our attention; thus audition is particularly suited to monitoring. Non-speech sounds have the potential to provide an even richer, more informationally-rich "sound track" to the task at hand, if they are carefully designed with human perceptual abilities in mind. Just as a movie with an appropriate sound track is much more compelling than a silent film, so could a computer interface be enhanced with sound. If used properly, sound need not be distracting, cacophonous, or merely uninformative. Principles of design for auditory icons and auditory symbology can be gleaned from the fields of music (Deutsch, 1982; Blattner et al., 1989), psychoacoustics (Carterette and Friedman, 1978; Patterson, 1982), and psychological studies of the acoustical determinants of perceptual organization (Bregman, 1981, 1990; Kubovy, 1981; Buxton et al., 1989). For example, following from Gibson's (1979) ecological approach to perception, one can conceive of the audible world as a collection of acoustic "objects." Various acoustic features, such as temporal onsets and offsets, timbre, pitch, intensity, and rhythm, can specify the identities of the objects and convey meaning about discrete events or ongoing actions in the world and their relationships to one another. One could systematically manipulate these features, effectively creating an auditory symbology which operates on a continuum from "literal" everyday sounds, such as the clunk of mail in your mailbox (e.g., Gaver's "SonicFinder," 1986), to a completely abstract mapping of statistical data into sound parameters (Bly, 1982; Smith et al., 1990; Blattner et al., 1989).

Such a display could be further enhanced by taking advantage of the auditory system's ability to segregate, monitor, and switch attention among simultaneous sources of sound (Mowbray and Gebhard, 1961). One of the most important determinants of acoustic segregation is an object's location in space (Kubovy and Howard, 1976; Bregman, 1981, 1990; Deutsch, 1982). A three-dimensional auditory display may be most usefully applied in contexts where spatial information is important, particularly when visual cues are limited or absent and workload is high. Such displays can potentially enhance information transfer by combining directional with iconic information in a quite naturalistic representation of dynamic objects in the interface. Borrowing a term from Gaver (1986), an obvious aspect of "everyday listening" is the fact that we live and listen in a three-dimensional world. A primary advantage of the auditory system is that it allows us to monitor and identify sources of information from all possible locations, not just the direction of gaze. In fact, I would like to suggest that a good rule of thumb for knowing when to provide acoustic cues is to recall how we naturally use audition to gain information and explore the environment; that is, "the function of the ears is to point the eyes." Audition can provide a more coarsely-tuned alerting mechanism to direct the attention of our more finely-tuned visual analyses. For example, Perrott et al. (1991) have recently reported that aurally-guided visual search for a target in a cluttered visual display is superior to unaided visual search, even for objects in the central visual field.

Thus acoustic cues will be especially useful in inherently spatial tasks, particularly when visual workload is heavy, such as air traffic control (ATC) displays for the tower or cockpit. For example, in collaboration with the Federal Aviation Administration, research at NASA Ames will emphasize two types of acoustic displays for ATC because of their conceptual simplicity and the likelihood that they will provide significant benefits to current ATC systems. One example is an ATC display in which the controller hears the communications from incoming aircraft in positions which correspond to their actual locations in the terminal area. In such a display, it should be more immediately obvious to the listener when aircraft are on a potential collision course because they would be heard in their true spatial locations, and their routes could be tracked over time, even in increasingly complex situations such as the triple parallel runway landing patterns proposed to maximize the flow of incoming traffic. An auditory icon could also be used as a warning signal, for example, to alert the controller to potential runway incursions; the alerting effect and urgency of the signal could be emphasized with a unique temporal pattern or rhythm and by placing the signal close to the listener's head, e.g., within the boundaries of their "personal space" (Begault and Wenzel, 1990). A second advantage of the binaural system, often referred to as the "cocktail party" effect, is that
it improves the intelligibility of sources in noise and assists in the segregation of multiple sound sources (Cherry, 1953; Bronkhorst and Plomp, 1988). This effect could be critical in applications involving
the kind of encoded non-speech messages proposed for scientific "visualization," the acoustic representation of multi-dimensional data (e.g., Bly, 1982; Blattner et al., 1989; Smith et al., 1990), or the development of alternative interfaces for the visually impaired (Edwards, 1989; Loomis et al., 1990).

Another aspect of auditory spatial cues is that, in conjunction with the other senses,
they can act as potentiators of information in a display. For example, visual and auditory cues together can reinforce the information content of a display and provide a greater sense of presence
or realism in a manner not readily achieved by either modality alone (Colquhoun, 1975; O'Leary and Rhodes, 1984; Warren et al., 1981). Similarly, in direct-manipulation tasks, auditory cues can provide supporting information for the representation of force-feedback (Wenzel et al., 1990), a quite difficult interface problem for multimodal displays which is only beginning to be solved (e.g., Minsky et al., 1990). Intersensory synergism will be particularly useful in telepresence applications, including advanced teleconferencing (Ludwig et al., 1990), shared electronic workspaces (Fisher et al., 1988; Gaver and Smith, 1990), monitoring telerobotic activities in remote or hazardous situations (Wenzel et al., 1990), and entertainment environments (Kendall and Martens, 1984; Kendall and Wilde, 1989; Cooper and Bauck, 1989). Thus, the combination of veridical spatial cues with good
principles of iconic design could provide an extremely powerful and information-rich display which is also quite easy to use. Here, the term veridical is used to indicate that spatial cues are both realistic and result in the accurate transfer of information; e.g., the presentation of such cues results in accurate
estimates of perceived location by human listeners.

From the above considerations, one can attempt to define some of the goals of a virtual acoustic display and list the related perceptual research to keep in mind when developing the technology and conducting psychophysical studies. A virtual acoustic display is a medium for accurately transferring information to a human operator using the auditory modality; it combines directional and semantic characteristics to form naturalistic representations of dynamic objects and events in remotely-sensed or simulated environments. As with visual displays, this definition does not necessarily mean that the virtual representation must be indistinguishable from reality. Rather, it implies that the display should provide a functional equivalence in the context of the task to be performed. To achieve this goal, we must know a great deal about our sensory biases; that is, the what, when, and how of the acoustic information used by the human listener. It also means that we must systematically verify that the displays we develop are perceptually viable. Therefore the display must: (1) adequately reproduce the audible spectrum in frequency resolution and dynamic range, (2) present information which can be either static or moving, (3) be capable of representing multiple acoustic sources or streams which create the ongoing patterns of a dynamic acoustic environment, (4) be real-time and interactive, that is, responsive to the needs of the user, (5) be head-coupled, correlated with head motion to provide a stable acoustic environment, and (6) be flexible in the type of acoustic information which can be displayed; for example, real environmental sounds, speech, or acoustic icons which may potentially represent information or objects. A corollary to this approach is that such cues could be used to enhance normal perceptual capabilities. For example, Durlach (1990; Durlach and Pang, 1986) has proposed that localization cues could be artificially magnified to create a kind of super auditory localization ability.

As noted above, the utility of a 3D auditory display of multidimensional information greatly depends on the user's ability to localize the various sources or streams of information in the three dimensions of display space. While compromises obviously have to be made to achieve a practical system, the particular features or limitations of the latest hardware should be considered subservient to human sensory and performance requirements. Thus, designers of such interfaces must carefully consider the acoustic cues needed by listeners for accurate localization and ensure that these cues will be faithfully (or at least adequately, in a human performance sense) transduced by the synthesis device rather than letting current technology drive the implementation. In fact, knowledge about sensory requirements might actually save processing power in some cases and indicate others to which more resources should be devoted.

ANTECEDENTS OF THREE-DIMENSIONAL VIRTUAL ACOUSTIC DISPLAYS
Psychoacoustical Antecedents

Much of the research on human sound localization over the last 25 years is summarized in the classic "duplex theory," which emphasizes the role of two primary cues: interaural differences in time of arrival at low frequencies and interaural differences in intensity at high frequencies (Lord Rayleigh, 1907). However, there are serious limitations with this approach (see Blauert, 1983, for an extensive review of spatial hearing). For example, it cannot account for the ability of subjects to localize sounds on the vertical median plane, where interaural cues are minimal (e.g., Blauert, 1969; Butler and Belendiuk, 1977; Oldfield and Parker, 1986). Similarly, when subjects listen to stimuli over headphones, the sounds are perceived as being inside the head even though interaural temporal and intensity differences appropriate to an external source location are present (Plenge, 1974). Many studies now suggest that deficiencies of the duplex theory reflect the important contribution to localization of the direction-dependent filtering which occurs when incoming sound waves interact with the outer ears or pinnae. Experiments have shown that spectral shaping by the pinnae is highly direction-dependent (Shaw, 1974), that the absence of pinna cues degrades localization accuracy (Gardner and Gardner, 1973; Oldfield and Parker, 1984b), and that pinna cues are primarily responsible for externalization, or the "outside-the-head" sensation (Plenge, 1974). Such data suggest that perceptually-veridical localization over headphones should be possible if the spectral shaping by the pinnae, as well as the interaural temporal and intensity difference cues, are adequately synthesized.

Approaches to Implementation

Prior to the development of current techniques, some investigators began to think about ways of synthesizing and enhancing localization cues in an acoustic display. One of these early attempts at creating what we might now call a virtual acoustic display was the rather amazing pseudophone apparatus (fig. 1) used during World War I for detecting and locating enemy aircraft; it is an early example of the use of enhanced interaural cues in a directional display. A less elaborate display called FLYBAR (FLYing By Auditory Reference), developed by Forbes (1946) just after World War II for simulating instrument flying, used only crude left/right intensity differences along with pitch and temporal pattern changes to indicate turn, bank, and air speed.

In general, the approaches for reproducing veridical localization cues have concentrated on various means of capturing, and eventually synthesizing, the effects of the Head-Related Transfer Function (HRTF); that is, the direction-dependent acoustic effects imposed on an incoming signal by the outer ears. The nature of the HRTF will be considered later in more detail. One class of techniques derives from binaural recording in stereo, using measurements made in the ear canals of manikins such as the KEMAR (Knowles Electronics, Inc.) and Neumann heads, and other normative artificial heads (e.g., Hudde and Schroter, 1981).
Figure 1. Photo of the pseudophone apparatus used for detecting and localizing aircraft during World War I (from Scientists in Power, Spencer R. Weart, Harvard University Press, Cambridge, Mass.; reproduced with permission, Niels Bohr Library, American Institute of Physics, New York, NY).

Such artificial heads have been used for applications like assessing concert hall acoustics (see Blauert, 1983). Recent examples of a real time version of this approach in information display include the work by Doll at the Georgia
Institute of Technology (Doll et al., 1986) and the AL100 system developed for the Super Cockpit Project at Wright-Patterson Air Force Base (see Calhoun et al., 1987). These projects used a movable artificial head to simulate moving sources and correlated head-motion; the listener heard headphone signals transduced in the ears of a manikin which was mechanically coupled to the motion of the listener's own head.

Another type of real time virtual display is the work by Loomis et al. (1990) on a navigation aid for the blind. In this analog system, directional cues were approximated using various types of simple filters with interaural time and intensity differences dynamically linked to head motion. The display also included simple distance and reverberation cues, such as an intensity rolloff with distance and the ratio of direct to reflected energy, and the system worked well in an active navigation task.

Much of the recent work since the early 80s has been devoted to the measurement of HRTFs, in the form of finite impulse responses measured in the ear canals of either individual subjects or artificial heads, and to techniques for creating digital filters and real time synthesis based on these measurements. Such systems have been under development since the late 70s. But it is only with the advent of powerful new digital signal-processing (DSP) chips that a few real-time systems have appeared in the last few years in Europe
and the United States. In general, these systems are intended for headphone delivery and use time-domain convolution to achieve real time performance.

One example is the Creative Audio Processor, a kind of binaural mixing console, developed by AKG in Austria and based on ideas proposed by Blauert (1984). The CAP 340M is aimed at applications like audio recording, acoustic design, and psychoacoustic research (Persterer, 1989). This particular system is rather large, involving an entire rack of digital signal processors and related hardware. The system is also rather powerful in that up to 32 channels can be independently "spatialized" in azimuth and elevation along with variable simulation of room response characteristics. Figure 2, for example, illustrates the graphical interface of the system for specifying characteristics of the binaural mix for a collection of independently-positioned musical instruments. A collection of HRTFs is offered, derived from measurements taken in the ear canals of both manikins and individual subjects. AKG's original measurements were made by Blauert and his colleagues (Blauert, personal communication). In a new product which simulates an ideal control room for headphone reproduction, the BAP 1000, the user has the option of having his/her individual transforms programmed onto a PROM card. Interestingly, AKG's literature mentions that best results are achieved with individual transforms. Currently there are plans for the system to be used in an October 1991 mission of the Russian Space Program. The AUDIMIR study examines whether acoustic cues for orientation can eliminate mismatch of auditory and vestibular cues and thus counteract space sickness (AKG Report, Nov. 1989).
Other projects in Europe derive from the efforts of a group of researchers in Germany. This work includes the most recent efforts of Jens Blauert and his colleagues at the Ruhr University at Bochum (Boerger et al., 1977; Lehnert and Blauert, 1989; Posselt et al., 1986). The group at Bochum has been working on a prototype PC-based DSP system, again a kind of binaural mixing console, whose proposed features include real time convolution of HRTFs for up to four sources, interpolation between transforms to simulate motion, and room modeling. The group has devoted quite a bit of effort to measuring HRTFs for both individual subjects and artificial heads (e.g., the Neumann head), as well as developing computer simulations of transforms.

Another researcher in Germany, Klaus Genuit, worked at the Institute of Technology of Aachen and later went on to form his own company, HEAD Acoustics. HEAD Acoustics has also produced a real time,
four-channel binaural mixing console for room acoustics and simulator applications (Gierlich and Genuit, 1989), as well as a new version of an artificial head. Genuit's work is particularly notable for his development of a structurally-based model of the acoustic effects of the pinnae (e.g., Genuit, 1986). That is, rather than use individualized HRTFs, Genuit has developed a parameterized, mathematical description (based on Kirchhoff's diffraction integrals) of the acoustic effects of the pinnae, ear canal resonances, torso, shoulder, and head. The effects of the structures have been simplified; for example, the outer ears are modeled as three cylinders of different diameters and length. The parameterization adds some flexibility to this technique, and Genuit states that the calculated transforms are within the variability of directly-measured HRTFs. Various projects with the model are currently in progress.

In the United States, McKinley and Ericson (1988) developed a prototype system at Wright-Patterson Air Force Base which synthesizes a single source in azimuth in real time. The system uses HRTFs based on measurements from a KEMAR manikin made at 1° intervals in azimuth, with a head-tracker to achieve source stabilization.
Figure 2. Illustration of the graphical interface of AKG's Creative Audio Processor for specifying characteristics of the binaural mix for a collection of independently-positioned musical instruments (adapted from product literature for the CAP 340 M).
Gary Kendall and his colleagues at Northwestern University have also been working on a real time system aimed at spatial room modeling for recording and entertainment (Kendall and Martens, 1984). Recently, Gehring Research has offered a software application for anechoic simulation using a Motorola 56001-based DSP card which uses two sets of HRTFs with the filters truncated to conform to the limitations of the DSP chip. One set is from a KEMAR manikin measured by Kendall's group and the other is from an individual subject measured at the University of Wisconsin, Madison.

THE NASA AMES 3-D AUDITORY DISPLAY PROJECT

Since 1986, our group at NASA Ames has been working on a real time system for use in both basic research in human sound localization and applied studies of acoustic information display in advanced human-computer interfaces. The research began as part of the Ames Virtual Interactive Environment Workstation (VIEW) project (Fisher et al., 1988) and has been a collaborative effort between myself as project director, Scott Foster of Crystal River Engineering (Groveland, Calif.), Fred Wightman and Doris Kistler of the University of Wisconsin, Madison, and since 1988, Durand Begault and Philip Stone at NASA Ames. To achieve our objective, we have taken a four-part approach: (1) develop a technique for synthesizing localized, binaural stimuli based on psychoacoustic principles, (2) in parallel, develop the signal-processing technology required to implement the synthesis technique in real time, (3) perceptually validate the synthesis technique with basic psychophysical studies, and (4) use the real time device as a research tool for evaluating and refining the approach to synthesis in both basic and applied contexts.

As noted above, one technique for capturing veridical acoustic cues involves binaural recording (Plenge, 1974; Butler and Belendiuk, 1977; Doll et al., 1986). For example, there is an immediate and veridical perception of 3-D auditory space when stimuli are recorded with microphones placed in the ear canals of a manikin or of a human listener and presented over headphones (Plenge, 1974; Butler and Belendiuk, 1977; Blauert, 1983; Doll et al., 1986). Our procedure for synthesizing localized stimuli is closely related to binaural recording. Rather than record stimuli directly, we measure the acoustical transfer functions, from free-field to eardrum, at many source positions, and use these transfer functions as the basis of filters with which we synthesize binaural stimuli.

The Head-Related Transfer Functions (HRTFs), in the form of Finite Impulse Responses (FIRs), are measured using techniques adapted from Mehrgardt and Mellert (1977) (see fig. 3). Small probe microphones are placed near each eardrum of a human subject who is seated in an anechoic chamber (Wightman and Kistler, 1989a). Wide-band test stimuli are presented from 144 equidistant locations in the anechoic chamber, and a new pair of impulse responses is then measured for each location in the spherical array at intervals of 15° in azimuth and 18° in elevation. HRTFs are estimated by deconvolving the loudspeaker, test stimulus, and microphone responses from the recordings made with the probe microphones (Wightman and Kistler, 1989a). The advantage of this technique is that it preserves the complex pattern of interaural differences over the entire spectrum of the stimulus, thus capturing the effects of filtering by the pinnae, head, shoulders, and torso. The insets in figure 3 show a pair of FIR filters measured for one subject for a source location directly to the left and at ear level, that is, at -90° in azimuth and 0° in elevation. As you would expect, the waveform from this source arrived first and was larger in the left ear than the response measured in the right ear. The frequency-dependent effects can be analyzed by applying the Fourier Transform to these temporal waveforms. Figure 4 shows how interaural amplitude and phase (or equivalently, time) varies as a function of frequency for four different locations in azimuth at 0° in elevation. For example, the top-left panels show that for 0° in azimuth, or directly in front of the listener, there is very little difference in the amplitude or phase responses between the two ears. On the other hand, in the top-right panels for 90°, or directly to the listener's right, one can see that, across the frequency spectrum, the amplitude and phase responses for the right ear are larger and lead in time (phase) with respect to the left ear.

In order to synthesize localized sounds, a map of "location filters" is constructed from all 144 pairs of FIR filters by first transforming them to the frequency domain, dividing out the spectral effects of the headphones using Fourier techniques, and then transforming back to the time domain.
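The headphone-correction step just described — transform each measured FIR to the frequency domain, divide out the headphone response, and transform back to the time domain — can be sketched as follows. This is an illustrative reconstruction with synthetic data, not the actual NASA Ames processing code; the function name, filter lengths, and the regularization constant are assumptions.

```python
import numpy as np

def correct_location_filter(hrtf_ir, headphone_ir, n_fft=512, eps=1e-8):
    """Divide the headphone response out of a measured HRTF impulse
    response in the frequency domain, returning a corrected FIR filter."""
    H = np.fft.rfft(hrtf_ir, n_fft)       # HRTF spectrum
    P = np.fft.rfft(headphone_ir, n_fft)  # headphone transfer function
    # Regularized spectral division avoids blow-up where |P| is tiny.
    C = H * np.conj(P) / (np.abs(P) ** 2 + eps)
    return np.fft.irfft(C, n_fft)[: len(hrtf_ir)]

# Synthetic example: a delayed, attenuated impulse as the "HRTF" and a
# spectrally flat (unit impulse) "headphone" response.
hrtf = np.zeros(128); hrtf[10] = 0.8
phones = np.zeros(128); phones[0] = 1.0
fir = correct_location_filter(hrtf, phones)
```

The small `eps` term is a common regularization that prevents the division from blowing up at frequencies where the headphone response is nearly zero; with the idealized flat headphone response above, the corrected filter simply returns the original HRTF.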
[Figure 3 flow diagram: pinnae (outer ear) responses measured with probe microphones in the left and right ears; pinnae transforms digitized as finite impulse response (FIR) filters; synthesized cues.]

Figure 3. Illustration of the technique for synthesizing virtual acoustic sources with measurements of the head-related transfer function. An example of a pair of finite impulse responses measured for a source location at -90° to the left and 0° elevation (at ear level) is shown in the insets for the left and right ears.
[Figure 4: panels plotting relative amplitude (dB) and phase (radians) against frequency for the left and right ears at four source azimuths, 0° elevation.]
The Real Time System: The Convolvotron

In the real time system, designed by Scott Foster of Crystal River Engineering, the map of corrected FIR filters is downloaded from an 80286- or 80386-based host computer to the dual-port memory of a real time digital signal-processor known as the Convolvotron (fig. 5). This set of two
printed-circuit boards converts one or more monaural analog inputs to digital signals at a rate of 50 kHz (16-bit resolution). Each data stream is then convolved with filter coefficients determined by the coordinates of the desired target locations in the perceptual 3-space of the listener and the position of the listener's head, thus "placing" each input signal in the perceptual space of the listener. The resulting data streams are mixed, converted to left and right analog signals, and presented over headphones. This processing allows up to four independent and simultaneous sources with an aggregate computational speed of more than 300 million multiply-accumulates per second. The current configuration is sufficient for simulating relatively small reverberant environments, and the hardware can be scaled upward to accommodate the longer filter lengths required for larger enclosures.
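The core signal path just described — convolving each monaural input with its left/right pair of location filters and mixing the filtered streams into a stereo output — can be sketched in a few lines. This is an illustrative model only (the Convolvotron performs this sample-by-sample in DSP hardware at a 50-kHz rate); the function name and the toy filters are assumptions.

```python
import numpy as np

def spatialize_and_mix(sources, filters):
    """Convolve each monaural source with its pair of FIR location
    filters and mix into a single stereo (left, right) stream.

    sources: list of 1-D arrays (monaural inputs)
    filters: list of (left_fir, right_fir) pairs chosen from the HRTF map
    """
    n = max(len(s) + max(len(fl), len(fr)) - 1
            for s, (fl, fr) in zip(sources, filters))
    left = np.zeros(n)
    right = np.zeros(n)
    for s, (fl, fr) in zip(sources, filters):
        out_l = np.convolve(s, fl)   # one multiply-accumulate stream per ear
        out_r = np.convolve(s, fr)
        left[: len(out_l)] += out_l  # mixing = summing the filtered streams
        right[: len(out_r)] += out_r
    return left, right

# Two impulse sources with trivial one-tap "location filters": the first
# weighted toward the left ear, the second toward the right.
s1, s2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
f1 = (np.array([1.0]), np.array([0.25]))
f2 = (np.array([0.1]), np.array([1.0]))
L, R = spatialize_and_mix([s1, s2], [f1, f2])
```

With real 2-tap-or-longer HRTF pairs in place of the one-tap gains, the same loop reproduces the full interaural time and intensity pattern for each source before the streams are summed.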
[Figure 5 block diagram: a head tracker and analog source inputs feed an 80386 host, which updates the 4-source geometry and interpolates HRTF coefficients from the HRTF map; a TMS320/C25 processor controls I/O and timing; and a high-speed real-time digital convolution engine performs FIR filtering and mixing of the 4 independent sources into left and right outputs. Flexible processing resources, maximum rate ~300 MIPS; 16-bit A/D conversion; 50-kHz sampling rate; estimated latencies: head tracker, 50 ms; host and DSP, 30-40 ms.]

Figure 5. Block diagram of the Convolvotron system designed by Scott Foster for synthesizing three-dimensional virtual acoustic displays in real time.
Motion trajectories and static locations at greater resolution than the empirical measurements are simulated by selecting the four measured positions nearest to the target location and interpolating with linear weighting functions. The interpolation algorithm effectively computes a new coefficient at the sampling interval (every 20 μsec) so that changes in position are free from artifacts such as clicks or switching noises. When integrated with the magnetic head-tracking system (Polhemus 3-Space Isotrack), the listener's head position can be monitored in real time so that the four simultaneous sources are stabilized in fixed locations or in motion trajectories relative to the user. Such head-coupling should help to enhance the simulation, since previous studies suggest that head movements are important for localization (e.g., Wallach, 1940; Thurlow et al., 1967; Thurlow and Runge, 1967). This degree of interactivity, especially coupled with simulations of simple reverberant environments, is apparently unique to the Convolvotron system.

Pilot studies at Wisconsin suggest that the interpolation approach is perceptually viable; two-way linear interpolations between simple stimuli synthesized for locations as far apart as 60° in azimuth are perceptually indistinguishable from stimuli generated from measured filters, while localization performance begins to degrade at separations of 36°. These data suggest that the HRTF map of a real time display could tolerate interpolation separations of as much as 60° in azimuth (currently a maximum of 45° in the Convolvotron), but that the resolution of the map in elevation should probably be smaller than 36° (18° in the Convolvotron). More comprehensive evaluations of the perceptual consequences of interpolation are underway at NASA Ames.

As with any system required to compute data "on the fly," the term real time is a relative one. The Convolvotron, for example, has a computational delay of about 30-40 msec, depending upon such factors as the number of simultaneous sources, the speed of the host computer, the duration of the HRTFs used as filters, and the complexity of the simulated source-listener geometry. An additional latency of at least 50 msec is introduced by the head-tracker. This accumulation of delays has important implications for how well the system can simulate realistic moving sources or realistic head-motion, depending upon how sensitive humans are to changes in angular displacement (the minimum audible movement angle) for a given velocity. At the maximum delay of about 90 msec, a source moving at 360 deg/sec can be updated to a new location only about every 32° or greater of angular displacement, while a speed of 180 deg/sec corresponds, in turn, to an angular resolution of 16° or greater, and so on. Recent work by Perrott and others using real sound sources (moving loudspeakers) suggests that such delays are well within acceptable limits for moderate velocities; minimum audible movement angles for a 500-Hz tone-burst ranged from about 4 to 21°, respectively, for source speeds ranging from 8 to 360 deg/sec (Perrott, 1982; Perrott and Tucker, 1988). Thus, the latencies of the Convolvotron may or may not result in a perceptible lag in the perception of auditory motion; delays should begin to be audible for sources approaching the faster velocities, especially when multiple sources or reverberant environments (e.g., simulation of simple rooms) are being generated.

Currently, the Convolvotron is being used in a variety of other government, university, and industry research labs besides ours, including the NASA Ames Crew Station Research and Development Facility, the Psychoacoustics Lab at the Research Laboratory of Electronics at MIT directed by Durlach, and Bellcore (Ludwig et al., 1990). The system also forms part of VPL Research's "Audiosphere," a component of their virtual reality system.
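The four-nearest-position linear weighting scheme can be sketched as a bilinear interpolation over an azimuth-elevation grid. The grid spacing, the helper name `interpolate_hrir`, and the toy grid values below are assumptions for illustration; the Convolvotron's actual update recomputes coefficients at the 20-μsec sampling interval rather than per call.

```python
import numpy as np

def interpolate_hrir(az, el, grid, az_step=15.0, el_step=18.0):
    """Estimate the FIR filter for an arbitrary direction by bilinearly
    weighting the four measured positions that bracket it.

    grid : dict mapping (azimuth_deg, elevation_deg) -> impulse response
    """
    az0 = np.floor(az / az_step) * az_step   # lower measured azimuth
    el0 = np.floor(el / el_step) * el_step   # lower measured elevation
    fa = (az - az0) / az_step                # fractional azimuth distance
    fe = (el - el0) / el_step                # fractional elevation distance
    weights = {(az0, el0):                     (1 - fa) * (1 - fe),
               (az0 + az_step, el0):           fa * (1 - fe),
               (az0, el0 + el_step):           (1 - fa) * fe,
               (az0 + az_step, el0 + el_step): fa * fe}
    return sum(w * grid[pos] for pos, w in weights.items())

# Illustrative grid of constant length-8 "responses" (value = az + el):
grid = {(a, e): np.full(8, a + e)
        for a in np.arange(0.0, 91.0, 15.0)
        for e in np.arange(-36.0, 55.0, 18.0)}
h = interpolate_hrir(22.5, 9.0, grid)   # halfway between four grid points
```

With the constant-valued toy grid, the result at (22.5°, 9.0°) is the average of the four bracketing values, which makes the weighting easy to verify by hand.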
PSYCHOPHYSICAL EVALUATION OF THE SYNTHESIS TECHNIQUE

The working assumption of our synthesis technique is that if, using headphones, we could produce ear canal waveforms identical to those produced by a free-field source, we would duplicate the free-field experience. Presumably, the approach most likely to replicate the free-field experience is one in which individualized HRTFs come directly from psychophysical measurements of the listener. The only conclusive test of this assumption is a validation study in which free-field and synthesized, free-field-like listening experiences are compared.

Validation for Static Sources Using Individualized HRTFs

A recent study by Wightman and Kistler (1989b) confirmed the perceptual adequacy of the basic synthesis technique for static sources. The stimuli were spectrally-scrambled noisebursts transduced either by loudspeakers in an anechoic chamber or by headphones. In both free-field and headphone conditions, the subjects indicated the apparent spatial position of a sound source by calling out numerical estimates of azimuth and elevation (in degrees) using a modified spherical coordinate system. For example, a sound heard directly in front would produce a response of "0, 0," a sound heard to the left and somewhat elevated might produce "-90 azimuth, +15 elevation," while one far to the rear on the right and below might produce "+170 azimuth, -30 elevation." Subjects were blindfolded and no feedback was given. Detailed explanations of the procedure and results can be found in the original paper.

The data analysis of localization experiments is complicated by the fact that the stimuli and responses are represented by points in three-dimensional space; in particular, as points on the surface of a unit-sphere, since distance remained constant in this experiment. For these spherically-organized data, the usual statistics of means and variances are potentially misleading. For example, an azimuth error of 15° on the horizontal plane is much larger in terms of absolute distance than a 15° error at an elevation of 54°. Thus, it is more appropriate to apply the techniques of spherical statistics to characterize the psychophysical data (Fisher et al., 1987). The spherical statistic used here, the judgement centroid, is a unit-length vector with the same direction as the resultant, the sum of all the unit-length judgement vectors. The direction of the centroid, described by an azimuth and an elevation, can be thought of as the "average direction" of a set of judgements from the origin, the subject's position. Two indicators of variability, K⁻¹ and the average angle of error, were also computed. These results will not be discussed here; the reader is referred to the original paper.

Another type of error, observed in nearly all localization studies, is the presence of front-back "confusions." These are responses which indicate that a source in the front hemisphere, usually near the median plane, is perceived to be in the rear hemisphere. Occasionally, the reverse situation is also found. It is difficult to weight these types of errors accurately. Since the confusion rate is often low (e.g., Oldfield and Parker, 1984a), reversals have generally been resolved when computing descriptive statistics; that is, the responses are coded as if the subjects had indicated the correct hemisphere, as in the analyses of table 1 and figure 6. Otherwise, estimates of error would be greatly inflated. On the other hand, if we assume that subjects' responses correctly reflect their perceptions, resolving such confusions could be misleading. Thus, the rate of confusions is usually reported as a separate statistic.
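The judgement centroid defined above, the unit vector in the direction of the resultant of the unit-length judgement vectors, can be computed directly. The Cartesian coordinate convention below is one common choice and is an assumption; the original analysis may use different axes.

```python
import numpy as np

def judgement_centroid(azimuths_deg, elevations_deg):
    """Return the azimuth and elevation (degrees) of the judgement
    centroid: the unit-length vector in the direction of the resultant
    (the vector sum of the unit vectors for each judgement)."""
    az = np.radians(np.asarray(azimuths_deg))
    el = np.radians(np.asarray(elevations_deg))
    # Unit vectors: x toward 0° azimuth / 0° elevation, y toward 90°
    # azimuth, z up (an assumed convention).
    x = np.cos(el) * np.cos(az)
    y = np.cos(el) * np.sin(az)
    z = np.sin(el)
    rx, ry, rz = x.sum(), y.sum(), z.sum()
    r = np.sqrt(rx**2 + ry**2 + rz**2)       # length of the resultant
    centroid_az = np.degrees(np.arctan2(ry / r, rx / r))
    centroid_el = np.degrees(np.arcsin(rz / r))
    return centroid_az, centroid_el

# Three judgements scattered symmetrically about (30°, 0°):
az, el = judgement_centroid([28.0, 30.0, 32.0], [-2.0, 0.0, 2.0])
```

The resultant length r (relative to the number of judgements) is also the basis of the dispersion statistics such as K⁻¹ mentioned in the text.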
Table 1. Summary statistics comparing resolved localization judgements of free-field (boldface type) and virtual (in parentheses) sources for 8 subjects. (Adapted from Wightman and Kistler, 1989b)

  ID     Goodness of fit   Azimuth correlation   Elevation correlation   Percent front-back reversals
  SDE    0.93 (0.89)       0.98 (0.97)           0.68 (0.43)             12 (20)
  SDH    0.95 (0.95)       0.96 (0.95)           0.92 (0.83)              5 (13)
  SDL    0.97 (0.95)       0.98 (0.98)           0.89 (0.85)              7 (14)
  SDM    0.98 (0.98)       0.98 (0.98)           0.94 (0.93)              5 (9)
  SDO    0.96 (0.96)       0.99 (0.99)           0.94 (0.92)              4 (11)
  SDP    0.99 (0.98)       0.99 (0.99)           0.96 (0.88)              3 (6)
  SED    0.96 (0.95)       0.97 (0.99)           0.93 (0.82)              4 (6)
  SER    0.96 (0.97)       0.99 (0.99)           0.96 (0.94)              5 (8)
  Mean                                                                  5.6 (11)
Here, table 1 provides a general overview of the results of Wightman and Kistler (1989b). Summary statistics comparing the eight subjects' resolved judgements of location for real (free-field) and synthesized stimuli are shown; the numbers in bold-faced type are for the free-field data and the numbers in parentheses are for the synthesized conditions. Note that overall goodness of fit between the actual and estimated source co-ordinates is quite comparable, 0.89 or better for the synthesized stimuli and 0.93 or better for free-field sources. The two correlation measures indicate that while source azimuth appears to be synthesized nearly perfectly, synthesis of source elevation is more problematic, particularly for SDE, who also has difficulty judging elevation in the free field.

Examples of the range of patterns of localization behavior can be seen in figure 6. Actual source azimuth (and, in the insets, elevation) versus judged azimuth is plotted for subjects SDO and SDE of Wightman and Kistler (1989b). The panel on the left plots free-field judgements and the panel on the right shows judgements for the stimuli synthesized from the subjects' own transfer functions. On each graph, the positive diagonal, or a straight line with a slope of 1.0, corresponds to perfect performance. The confusion rates for resolved judgements (table 1) were relatively low, with average rates of about 6 and 11% for free-field and synthesized judgements, respectively. Similar to the location judgements, reversal rates tended to be greatest for subjects who also had higher rates in the free field. While individual differences do occur, the pattern of "good" and "bad" localization for a given subject is consistent across free-field and synthesized conditions; it appears that Butler and Belendiuk's (1977) observation that localization accuracy depends on the particular pinnae involved is supported by these data.
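The hemisphere-resolution rule discussed above, coding a response as if the subject had indicated the correct front/rear hemisphere, can be sketched as a reflection of the judged azimuth across the interaural axis. The function and its coordinate convention are illustrative assumptions, not the authors' exact coding procedure.

```python
def resolve_front_back(target_az, judged_az):
    """Resolve a front-back confusion by mirroring the judged azimuth
    across the interaural (left-right) axis when the judgement falls in
    the opposite front/rear hemisphere from the target. Azimuths are in
    degrees, -180 < az <= 180, with 0 straight ahead (an assumed
    convention)."""
    def in_front(a):
        return abs(a) < 90.0
    if in_front(target_az) != in_front(judged_az):
        # Mirror across the left-right axis: az -> 180 - az (signed),
        # which preserves the left/right component of the judgement.
        return 180.0 - judged_az if judged_az >= 0 else -180.0 - judged_az
    return judged_az

resolved = resolve_front_back(20.0, 150.0)  # rear response, front target
```

A judgement of 150° to a target at 20° resolves to 30°, so only the residual angular error survives into the descriptive statistics while the confusion itself is tallied separately, as the text describes.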
Figure 6. Scatterplots of actual source azimuth (and, in the insets, elevation) versus judged source azimuth for subjects SDO and SDE in both free-field and headphone conditions. The plots on the left show free-field judgements and the plots on the right show judgements for the stimuli synthesized from the subjects' own transfer functions. Each data point represents the centroid of at least 6 judgements. 72 source positions are plotted in each plot. Data from 6 different source elevations are combined in the azimuth plots and data from 24 different source azimuths are combined in the elevation insets. Note that the scale is the same in the azimuth and elevation plots. (After Wightman and Kistler, 1989b.)
Acoustic Determinants of Performance

Individual differences in localization behavior suggest that there may be acoustic features peculiar to each subject's HRTFs which influence performance. Thus, the use of averaged transforms, or even measurements derived from normative manikins such as the KEMAR, may or may not be an optimum approach for simulating free-field sounds. For example, figure 7 illustrates the between-subjects variability in the left- and right-ear magnitude responses for a single source location (after Wenzel et al., 1988a). Obviously, any straightforward averaging of these functions would tend to smooth the peaks and valleys, thus removing potentially significant features in the acoustic transforms.
Figure 7. Magnitude responses for a single source position for 8 subjects. The left and right ears are plotted separately.

On the other hand, it may be possible to identify specific acoustic features which result in good or bad localization. The psychophysical data indicate that elevation is particularly difficult to judge for subject SDE. A preliminary analysis of elevation coding in Wightman and Kistler's (1989b) HRTFs suggests that there is an acoustic basis for this poor performance. Figure 8 plots "interaural elevation dependency" functions for four subjects' data. The computational derivation of these functions can be found in the description of figure 10 of Wightman and Kistler (1989b). Essentially, the six functions on each graph show how interaural intensity changes for different elevations when the magnitude responses are collapsed across all azimuths and normalized to zero elevation, the flat function. In spite of the large intersubject variability illustrated in figure 7, the dependency functions for the better localizers (shown in the top three graphs) are quite similar to each other and show clear elevation dependencies. SDE's functions, on the other hand, are different from the other subjects' and show little change with elevation. Thus, it appears that SDE's poor performance in judging elevation for both real and synthesized stimuli may be due to a lack of distinctive acoustic features correlated with elevation in his HRTFs.
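The collapse-and-normalize computation described above can be sketched as follows. The array layout, the middle-row reference index for 0° elevation, and the random placeholder data are all assumptions for illustration.

```python
import numpy as np

def elevation_dependency(ild_db):
    """Compute interaural elevation dependency functions: interaural
    level differences (dB) averaged across azimuth and expressed
    relative to the 0° elevation function, which then plots as a flat
    line at 0 dB.

    ild_db : array of shape (n_elevations, n_azimuths, n_freq_bins),
             with 0° elevation assumed to sit at the middle row.
    """
    collapsed = ild_db.mean(axis=1)     # collapse across all azimuths
    ref = collapsed.shape[0] // 2       # index of the 0° reference row
    return collapsed - collapsed[ref]   # normalize to zero elevation

# Illustrative data: 7 elevations (+54° down to -54° in 18° steps),
# 12 azimuths, 64 frequency bins.
rng = np.random.default_rng(2)
ild = rng.standard_normal((7, 12, 64))
dep = elevation_dependency(ild)
```

With this normalization, a subject whose rows barely differ from the reference (as described for SDE) would show nearly flat functions at every elevation.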
Figure 8. Interaural elevation dependency functions plotted for 4 subjects. From top to bottom, the functions within a panel represent elevations of +54, +36, +18, 0 (the reference elevation), -18, and -36°.
The analysis of individual differences in pinna cues brings up a topic which has often been conjectured about but rarely directly tested (see Butler and Belendiuk, 1977, for an early example). That is, can one manipulate localization performance simply by listening through another person's ears? Or put another way, can we adapt to and take advantage of a set of good HRTFs even if we are a bad localizer? The following data from Wenzel et al. (1988b) illustrate the kind of "cross-ear listening" paradigm that is possible using our synthesis technique. Again, the subjects provided absolute judgements of azimuth and elevation as in the experiment by Wightman and Kistler (1989b).

Figure 9 shows what happens to resolved judgements of location when a good localizer listens to stimuli synthesized from another good localizer's pinna transforms. Azimuth is plotted in the top panels and elevation is on the bottom. The left and far-right graphs plot centroids for SDP's and SDO's judgements vs. the target locations when the stimuli were synthesized from their own HRTFs. Front-back confusions have been resolved as described above. As can be seen, both SDP and SDO localize the synthesized stimuli based on their own HRTFs quite well. The center graphs show what happens when SDP listens "through" SDO's ears. Localization of azimuth degrades somewhat, but not a great deal. Elevation performance degrades further, suggesting that elevation cues are not as robust across a sample of individuals as azimuth cues, but the overall correspondence between real and perceived locations remains intact.

Figure 10 compares performance when a good localizer, SDO, listens to stimuli synthesized from the HRTFs of bad localizer SDE. Again, for azimuth there is little degradation. For elevation, however, it seems that SDE's pinnae provide poor elevation cues for SDO as well, supporting the notion that acoustic features of the transforms determine localization. If acoustic features do determine localization, one might conclude that SDE could actually improve his performance if he could listen "through" SDO's presumably better pinnae. Figure 11 plots these data. Again, SDE, whose azimuth judgements are accurate to begin with, performs nearly as well for azimuth when provided with SDO's cues. But it appears that cross-ear listening is not a symmetrical effect: even after about 50 hr of testing, compared to only 2 hr for the good localizers, SDE still could not take advantage of SDO's presumably good elevation cues. These data are hardly conclusive, since they are based on a sample size of one; only SDE of the eight subjects in Wightman and Kistler (1989b) showed such poor elevation performance. But they are suggestive. It may be that there is a critical period for localization which, once past, can never be regained. Perhaps more likely is that, analogous to the experiments with prisms in visual adaptation (see Welch, 1978), SDE would need prolonged and consistent exposure to SDO's pinnae in order to learn to discriminate the subtle acoustic cues he does not normally experience. Apparently, a few hours of testing a day, especially in the absence of either verbal feedback or correlated information from the other senses, are not enough to allow adaptation to occur.
[Figures 9, 10, and 11: scatterplots of judged position (deg) versus target position for the cross-ear listening conditions described in the text.]
Inexperienced Listeners and Nonindividualized HRTFs

In practice, measurement of each potential listener's HRTFs may not be feasible. It may also be the case that the user of a 3D auditory display will not have the opportunity for extensive training. Thus, a critical research issue for virtual acoustic displays is the degree to which the general population of listeners can readily obtain adequate localization cues from stimuli based on nonindividualized transforms. A reasonable approach is to use the HRTFs of a subject whose measurements have been "behaviorally-calibrated" and are thus correlated with known perceptual ability in both free-field and headphone conditions. Recently, a more extensive study was completed using a variant of the cross-ear listening paradigm (Wenzel et al., 1991); 16 inexperienced listeners judged the apparent spatial location of sources presented over loudspeakers in the free field or over headphones. The headphone stimuli were generated digitally using HRTFs from a representative subject measured by Wightman and Kistler (1989b), the "good localizer" SDO.

In general, the localization performance of 12 of the 16 subjects was quite good in both free-field and headphone conditions. Figure 12 illustrates the behavior of a representative subject. When front-back confusions are resolved, localization accuracy with the nonindividualized stimuli is nearly identical to free-field accuracy, particularly for azimuth. Like SDE in Wenzel et al. (1988b), 2 of the subjects show poor elevation accuracy in both free-field and headphone conditions. Another 2 subjects show a third pattern of behavior in which elevation accuracy is good in the free field but poor with the synthesized stimuli (fig. 13). The latter phenomenon, if it turns out to be common, would be a problem requiring the use of individually-tailored HRTFs, particularly for elevation. In general, though, these data suggest that most listeners may be able to obtain useful directional information from a virtual acoustic display without requiring individualized HRTFs, particularly for azimuth.

However, a caveat is important here. Again, the results plotted in figures 6 and 9 through 14 are based on analyses in which front/back confusions are resolved. Both the inexperienced listeners of this experiment and the more experienced listeners of the Wightman and Kistler studies exhibit substantial rates of front/back confusions: for free-field versus synthesized stimuli, the experienced listeners show average rates of about 6 vs. 11%, while the inexperienced listeners show rates of about 19 vs. 31%. Note, though, that the existence of free-field confusions indicates that these reversals are not strictly the result of the simulation. It is possible, as Asano et al. (1990) have claimed, that these errors diminish as subjects adapt to the unusual listening conditions provided by static anechoic sources, whether real or simulated. The difference in free-field confusion rates between inexperienced and the more experienced listeners tends to support this view. Thus, it may be that some form of adaptation will be required before listeners can take full advantage of a virtual acoustic display.
Figure 12. Scatterplots of actual source azimuth (and, in the insets, elevation) versus judged source azimuth for subject SIK in both free-field and headphone conditions. The plot on the left plots free-field judgements and the plot on the right shows judgements for the stimuli synthesized from nonindividualized transfer functions. Each data point represents the centroid of 9 judgements. 24 source positions are given in each plot. Data from 6 different source elevations are combined in the azimuth plots and data from 18 different source azimuths are combined in the elevation insets. Note that the scale is the same in the azimuth and elevation plots.
Figure 13. Scatterplots of actual source azimuth (and, in the insets, elevation) versus judged source azimuth for subject SID in both free-field and headphone conditions. The plot on the left plots free-field judgements and the plot on the right shows judgements for the stimuli synthesized from nonindividualized transfer functions. Each data point represents the centroid of 9 judgements. 24 source positions are given in each plot. Data from 6 different source elevations are combined in the azimuth plots and data from 18 different source azimuths are combined in the elevation insets. Note that the scale is the same in the azimuth and elevation plots.