From Whereware to Whence- and Whitherware: Augmented Audio Reality for Position-Aware Services

Michael Cohen, University of Aizu, e-mail: [email protected]
Julián Villegas, University of the Basque Country, e-mail: [email protected]
ABSTRACT

Since audition is omnidirectional, it is especially receptive to orientation modulation. Position can be defined as the combination of location and orientation information. Location-based or location-aware services do not generally require orientation information, but position-based services are explicitly parameterized by angular bearing as well as place. “Whereware” [7] suggests using hyperlocal georeferences to give applications location-awareness; “whence- and whitherware” suggests the potential of position-awareness to enhance navigation and situation awareness, especially in realtime high-definition communication interfaces such as spatial sound augmented reality applications. Combining literal direction effects and metaphorical (remapped) distance effects in whence- and whitherware position-aware applications invites oversaturation of interface channels, encouraging interface strategies such as audio windowing, narrowcasting, and multipresence.

Index Terms: H.5.5 [Information Interfaces and Presentation]: Sound and Music Computing—Methodologies and techniques; Signal analysis, synthesis, and processing; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—Audio input/output; H.5.2 [Information Interfaces and Presentation]: User Interfaces—Auditory (non-speech) feedback

1 INTRODUCTION

“Location-based services” (LBS) typically refers to information delivered to laptop (nomadic) or phone (mobile) communication technologies such as (2G) GSM and (3G) UMTS: CDMA (WANs, wide area networks), WiFi (WLANs, wireless local area networks), and, increasingly, WiMAX (IEEE 802.16) and LTE (MANs, metropolitan area networks), such as the NTT DoCoMo “Xi” (“crossy”) service launched in late 2010. The placelessness of purely virtual information limits its usefulness to LBS. “Hyperlocality,” as originally coined by Bruce Sterling, encourages the use of georeferences, web databases stuffed with geographic coördinates and geospatial data: geotagged photos, panos, object movies, etc. Services such as Foursquare, Gowalla, and Facebook Places use realtime GPS-derived location information to mash up navigation and social networking.

Location-based or -aware services nominally use translational location, rectangularly representable as x, y, z (or sway, surge, and heave in a right-handed, z-up system). Position, as suggested by its cognate “pose,” is described by orientation as well as translation state. Orientation in 3-space is commonly described as roll, pitch, and yaw, also known as roll, elevation, and azimuth. These spatial dimensions are summarized by Table 1.

Of course, optical or video augmented reality “see-through” systems, such as that visualized by Figure 1 [8], need full position information, including orientation tracking, to properly align composited layers— using trackers or machine vision techniques such
as fiducials, markers, or optical feature tracking. Besides GPS, devices such as mobile phones can use cameras along with gyroscopes, accelerometers, magnetometers (compasses), and even dead reckoning, perhaps integrated with some kind of sensor fusion, to estimate location and orientation. A receiver needs (geometric, photometric, acoustic, . . . ) alignment or calibration with the real world to properly register overlaid objects and scenes. Important issues for researchers and developers include static (geometric) error and drift, rendering (photometric) error, dynamic error (minimization of time lag and jitter), and auditory error, all perhaps somewhat mitigated by a forgiving user or a non-literal user interface, allowing discrepancies within the bounds of “suspended disbelief” plausibility.

“Whereware” [7] suggests using complete positional data to directionalize and spatialize media streams so that the relative arrangement of sources and sinks corresponds with actual or notional circumstances. For a simple example, a voice might be processed as part of its display to a remote user to denote the source’s direction. Such applications of spatial sound [19, 22, 23, 24, 38], consistent with and reinforcing one’s natural sense of direction and proprioceptive sense, are well known to improve situation awareness. Especially in polyphonic soundscapes, in the presence of multiple audio channels, such spatial sound can enable the “cocktail party effect,” by which speaker separation enhances discriminability and intelligibility. Humans can “hear out” sound sources, with resolution determined by minimum audible angle and angular just noticeable difference (localization blur) [14], and can apprehend several simultaneously, perhaps limited by some attention-juggling, multitasking threshold akin to “Miller’s magic number” (around 7).

Whereware denotes location-aware applications, including LBS and augmented reality applications. “Whenceware” (from the word “whence,” meaning “from where”) denotes location-aware applications that reference a source; “whitherware” (from “whither,” meaning “to where”) denotes location-aware applications that reference a destination. These capabilities are perhaps especially relevant to interfaces with spatial sound capability. For example, whenceware-enhanced voicemail could directionalize a recording so that the message apparently comes from the sender’s location. Looser mappings are also possible: the virtual source location need not correspond to the geographic location of the sender, but can be mapped into an individualized space of the sink, so that important messages come from a direction in front of the listener, while ‘junk’ mail comes from somewhere behind. Whence- and whitherware navigation systems, primed with hyperlocal geotags of intended destinations, can auditorily display sonic beacons— landmarks, warnings of situated hazards, and “come hither”s like classical sirens, beckoning travelers to a goal or checkpoint, like the Pied Piper of Hamelin.
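As an illustrative sketch (ours, not a description of any particular system), the geographically literal mapping above amounts to computing the great-circle bearing from the sink (listener) to a georeferenced source — a voicemail sender or a navigation waypoint — and presenting it relative to the listener’s heading; the function names, parameters, and example coördinates below are hypothetical and approximate.

import math

def bearing_deg(lat1, lon1, lat2, lon2):
    # Initial great-circle bearing from point 1 to point 2, in degrees clockwise from north.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0

def relative_azimuth(listener_lat, listener_lon, listener_heading_deg, src_lat, src_lon):
    # Azimuth of a georeferenced source relative to the listener's current heading.
    absolute = bearing_deg(listener_lat, listener_lon, src_lat, src_lon)
    return (absolute - listener_heading_deg) % 360.0

# Example: directionalize a voicemail recorded in Aizu-Wakamatsu (approx. 37.49 N, 139.93 E)
# for a listener in Bilbao (approx. 43.26 N, 2.93 W) who is currently facing east.
print(relative_azimuth(43.26, -2.93, 90.0, 37.49, 139.93))

In a whitherware navigation display, the same relative azimuth would instead steer a sonic beacon toward the next waypoint.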
2 CHALLENGES FOR AUGMENTED AUDIO REALITY
A mixed reality system must handle the alignment of real and virtual worlds: geometric registration (position, pose [orientation or attitude], and occlusion), photometric registration (tone, illumination, shadow, and reflectance), temporal or dynamic registration (minimized latency with respect to the observer’s motion and the motion of surrounding objects, for overall stability of registration), and
Table 1: Physically Spatial Dimensions: taxonomy of positional degrees of freedom, including cinematographic gestures. Each row lists the static (posture) scalar, the dynamic (gesture) term, the corresponding camera motion, the associated axis and plane, and the directions (force).

Location (Displacement) — Translation
• lateral (transverse): abscissa x; sway; track (“crab”); along the x axis, perpendicular to the median (sagittal) plane; left ↔ right
• frontal (longitudinal): ordinate y; surge; dolly; along the y axis, perpendicular to the frontal (coronal) plane; out, back (aft): retreat (drag) ↔ in, forth (fore): advance (thrust)
• vertical (height): altitude z; heave; boom (“crane”); along the z axis, perpendicular to the horizontal (transverse) plane; up: ascend (lift) ↔ down: descend (weight)

Orientation or Attitude — Rotation
• elevation or altitude φ; pitch (tumble, flip); tilt; about the x axis, in the median (sagittal) plane; climb / dive
• roll ψ; roll (flop); “barrel roll”; about the y axis, in the frontal (coronal) plane; left / right
• azimuth θ; yaw (whirl, twist); pan; about the z axis, in the horizontal (transverse) plane; CCW / CW

Location and Orientation — Translation and Rotation
• focal pivot x, y, θ; orbit with phase-locked attitude (“spin-around” or inspection); in the horizontal (transverse) plane; CCW / CW
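To make the taxonomy concrete, the following minimal sketch (ours, not part of the original taxonomy; all names hypothetical) represents a position as location plus orientation in the paper’s right-handed, z-up convention:

from dataclasses import dataclass

@dataclass
class Location:
    x: float  # lateral (sway) axis
    y: float  # frontal (surge) axis
    z: float  # vertical (heave) axis

@dataclass
class Orientation:
    roll: float   # psi, rotation about the frontal axis
    pitch: float  # phi, elevation, rotation about the lateral axis
    yaw: float    # theta, azimuth, rotation about the vertical axis

@dataclass
class Position:
    # Position = location + orientation: location-aware services use only Location,
    # while position-aware ("whence- and whither-") services use both.
    location: Location
    orientation: Orientation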
(a) Both panes are configurable to show an endo-, ego-, or exocentric stereographic IBR (image-based rendering), stereographic CG rendering, or mixed perspective display.

(b) The sampled and synthesized scenes are aligned.

Figure 1: Perspective Displays by the “Just Look at Yourself!” Visualizer and Emulator. “Just Look at Yourself!” is a computer graphic browser intended to emulate a stereographic display, allowing more flexible camera positions, including endocentric (1st-person: from the point of view of the avatar), tethered (2nd-person: attached to but separate from the avatar), and exocentric (3rd-person: totally detached from the avatar) perspectives. We use it to model the panoramic projection, including stereographic capability through side-by-side image pairs. A humanoid in the scene, a figurative avatar, stands at the location corresponding to a pair of panoramic nodes. Cylinder pairs with texture maps corresponding to the viewpoint node are instantiated in the scenegraph as “goggles” donned by the avatar, textured with the respective panoramic image and centered at the eyes of the avatar at that node. Back-face culling is disabled to use a single polygon with a double face (bifaceted), so the rendered texture map is also visible exocentrically. An integrated spatial audio display superimposes a directionally consistent soundscape.
auditory registration. Enhancing or augmenting reality by juxtaposing artificial aural information implies recreating a nonexistent medium perturbation and adequately mixing it with the current environment. This challenge is intensified by the generally different spatial characteristics of the virtual and real environments. Reverberation, resonance, damping, and intensity, among other sound and medium qualities, must be matched to combine virtual and real auditory events— the more carefully, the more seamlessly.

Figure 2: Preoccupied by his conversation, this everyman is in danger of striding off the pedestal of progress. (© 2003 The New Yorker Collection from cartoonbank.com. All rights reserved.)

Audio display form factors, in roughly decreasing order of intimacy, include
• stereo headsets (in, on, or over the ears)
• bone conduction headphones
• “nearphones,” near earphones, mounted near enough to the ear to avoid cross-talk
• “sound bells,” parabolic or hyperbolic
• parametric speakers with ultrasonic microspeaker arrays
• stereo loudspeakers
• loudspeaker arrays (telematics, as in a car)
  – discrete (home theater, 5.1, etc.)
  – VBAP, Vector Base Amplitude Panning [31]
  – DirAC, Directional Audio Coding [32]
  – HOA, Higher-Order Ambisonics
  – WFS, Wave Field Synthesis [33, 18]

The selected reproduction apparatus (most commonly, a loudspeaker array or headphones) also imposes restrictions and complications on the integration. On the one hand, loudspeaker arrays seem less invasive, increasing the naturalness of a solution based on such a technique. Loudspeaker arrays are best suited for situations where listeners are confined to a predetermined space and the delivered audio is independent of the listener’s position (as in an audio installation, as opposed to a cinema theater). In that way, corrective DSP can be applied to eliminate loudspeaker-induced artifacts and successfully integrate virtual and real audio. On the other hand, head-tracking headphones, although somewhat cumbersome and isolating, allow greater freedom of movement while maintaining an individualized, correct audio display.

One common technique for achieving a suitable combination of real and virtual audio is to model the auditory illusion as a composition of a sound source and a cascaded chain of filters that reshape it. The effects of the sensor (i.e., a microphone when the sound source is not synthesized), the room or space, the head, torso, & pinnae of the listener, and the transducers (loudspeakers or headphones) must all be considered (by convolving their impulse responses with each sound source) to create a veridical auditory illusion.

Spatial sound is almost always generated using spherical coördinates, effectively separating direction (azimuth & elevation) and distance (radius). To reify the notion of a sound source projecting audible sound to a sink— a spatial sound receiver, delegated as an avatar in an exocentric virtual environment— it is natural to map the direction directly, but to take some liberties by scaling the range. A visual analog would be tags, textual markers, sketches, outlines, or other visual information superimposed upon photographic scenes, like those featured by so-called augmented reality apps such as Sekai Camera. Even setting aside issues of virtual depth such as vergence and accommodation in binocular or stereographic visual displays, the mark-up is adjusted for normalization or to somehow accommodate projected range.
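A minimal sketch of the cascaded-filter model above (ours, with hypothetical variable names, assuming NumPy and SciPy are available and that measured impulse responses are at hand): the dry source is convolved in turn with the room response, the listener’s HRIRs for the desired direction, and the transducer response.

import numpy as np
from scipy.signal import fftconvolve

def render_virtual_source(dry, room_ir, hrir_left, hrir_right, headphone_ir):
    # Cascade of convolutions: source -> room/space -> head, torso, & pinnae -> transducer.
    wet = fftconvolve(dry, room_ir)            # room or space impulse response
    left = fftconvolve(wet, hrir_left)         # left-ear HRIR for the source direction
    right = fftconvolve(wet, hrir_right)       # right-ear HRIR
    left = fftconvolve(left, headphone_ir)     # transducer (headphone) impulse response
    right = fftconvolve(right, headphone_ir)
    return np.stack([left, right])             # binaural pair

Since the filters are linear and time-invariant, the chain is associative: the impulse responses could equally be pre-combined into a single filter per ear.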
2.1 Distance Effects

In auditory displays, the direction of virtual sources can be simply literal, but to make a source audible it must be granted a mythical intensity, a figurative loudness. Extraordinary sounds such as volcanic eruptions can be as loud as 200 dB SPL and audible hundreds or sometimes thousands of kilometers away, but ordinary sounds such as speaking voices and music rarely exceed 100 dB SPL and cannot be heard much more than a kilometer away. Despite technological improvements in 3D audio reproduction, most current solutions create a satisfactory illusion for any azimuth and elevation angle, but fall short in distance reproduction or recreation. Apparent distance of a stationary sound source [28] is determined mainly by
• overall sound level: the intensity of a point source varies with the inverse of the squared distance [27],
• interaural level differences (ILD): closer sources present a larger ILD [5],
• reverberation: distance perception seems to depend on lateral reflections [29],
• direct-to-reverberant energy ratio: in an environment with reflecting surfaces, sources far from the listener present roughly the same reverberation level, whereas the direct sound level decreases approximately with the inverse of the squared distance [42],
• head orientation: distance estimations are better when a source is located over the interaural axis [20],
• familiarity with the environment and the sound source: distance estimation is worse for unfamiliar sound sources [12], and
• source dullness (i.e., sharpness [17]): far sources are duller because of the high-frequency absorption of the air [13].
When either the sound source or the listener is moving, temporal intensity variation and Doppler shift (i.e., apparent changes in the pitch-height of a source) also contribute to distance estimation: approaching sources have a positive Doppler shift accompanied by intensity increments [30]. There is no consensus on the salience of monaural, binaural, stationary, and dynamic cues, but a combination of them seems to be the mechanism for judging distances. One of the challenges of augmented audio reality for position-aware services is to improve distance reproduction. Our group has been working in this field and has developed an HRIR (head-related impulse response) filter for creating virtual auditory images via headphones [41], but there is still invitingly sufficient room for new improvements in this field!
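A minimal sketch (ours, with hypothetical parameter values) of how two of these cues might be synthesized for a virtual source at range r: inverse-square level attenuation relative to a reference distance, and Doppler shift for a source approaching at a given radial speed.

import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature

def distance_gain_db(r, r_ref=1.0):
    # Inverse-square (point source) attenuation: about -6 dB per doubling of distance.
    return -20.0 * math.log10(max(r, r_ref) / r_ref)

def doppler_frequency(f_source, radial_speed):
    # Observed frequency for a source approaching (radial_speed > 0) a stationary listener.
    return f_source * SPEED_OF_SOUND / (SPEED_OF_SOUND - radial_speed)

# A 440 Hz source 8 m away, approaching at 10 m/s:
print(distance_gain_db(8.0))           # about -18 dB
print(doppler_frequency(440.0, 10.0))  # about 453 Hz

Remapping range — the “liberties” of scaling noted above — amounts to substituting a nonliteral r into such a model.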
2.2 Orientation Tracking

Unless a listener needs to be aware of second-order speaker attributes such as directivity, orientation sharing can be deferred until such subtle effects are expected. For music, it is not currently considered important that a soundstage not rotate with the listener’s head when they are wearing headphones. However, the listener’s own direction (bearing, heading, azimuth, yaw, orientation, angle) must be monitored to properly render periphonically distributed sources.

3 STEREOTELEPHONY

“Stereotelephony” means putting multichannel audio into a realtime communication network, using stereo effects in telephones. For example, decibel 151 [25, 39] is an art installation and a music interface that uses spatial audio technology and ideas of social networking to turn individuals into walking soundtracks as they move around each other in a shared real space and listen to each other in a shared virtual space. As audio codecs improve and bandwidth increases in internet telephony (such as Skype’s SILK or Google Talk’s Speex [www.speex.org] from Xiph.org), asymptotically high-fidelity, high-definition sound will become increasingly: broadband (high sampling rate for full spectrum), broadly dynamically ranged (appropriate bit depth), clear (minimal noise), transparent (uncolored by artifacts of sampling, quantization, codecs, or network transmission) and therefore effectively indistinguishable from “live” events, and persistent (always available, “anytime anywhere,” 24/7, 360°/±90°: around the clock, around the world).
Stereotelephony encourages directional or spatial sound. Spatialization can be done in a peer-to-peer (“P2P”) style, by multimedia processors directly exchanging streams at the edge of the network [10, 11], or in a client-server style, letting network servers push preprocessed data to thin clients, as in network (cloud-based) audio engines such as the Wonderland “DarkStar” voice bridge or the Asterisk (www.asterisk.org) PBX. Client-server 3D voice engines include Vivox (vivox.com) (used in Second Life), Dolby Axon (dolby.com), and Mumble (mumble.sourceforge.net) [37]. Differentiated spatialization is indicated [37]: directly spatializing predetermined clips or loops (locally stored or synthesized) and mixing such multichannel signals with network-delivered multichannel streams. Even though a conferencing media server might be called something like a “voice bridge,” in acknowledgment of tuning for speech, it can still be applied to other kinds of audio sources, albeit with perhaps limited fidelity. For instance, radio-quality music, sound effects (SFX), and auditory icons [4, 26] can all be streamed over the net.

3.1 Audio Windowing and Narrowcasting

Such resources present an over-tempting invitation to abuse, or at least a potential for sensory overload. Audio windowing treats soundscapes as separate but compositable [3], as formalized by the expressions in Figure 3. Soundscape composition is encouraged by multipresence. A conference might be arranged in a configuration resembling the desks of the respective colleagues in an office, irrespective of the actual positions of the session joiners. Soundscapes can be combined simply by summing, although in practice some scaling (amplification, attenuation), normalization, or equalization might yield more user-friendly results. To make the composition manageable, some sources might be partially muzzled or totally muted; some sinks might be partially muffled or totally deafened [6]. Relaxedly shared position data might be filtered to adjust the avatars only relatively, using angular displacement instead of absolute azimuth.

3.2 Multipresence

Increasing fineness of communication, from journaling through microblogging to life-streaming, suggests the desirability of realtime communication via persistent channels for media streams, especially audio. Independence of location and orientation can be exploited to flatter auditory localization. For instance, a user might designate multiple avatars as “self,” individually displaceable or repositionable but sharing an orientation, absolute or relative. Such orientation could be derived from the position of a human subject— the bearing of one’s seat via a chair-tracker [15, 16, 21], the direction of a motion platform [36], or a vehicle [40]— or of a subject’s head, via such technology as Microsoft’s Kinect, a set-top console accessory that uses near-infrared cameras and software to follow gamers’ motions. The advantage of separating translation and orientation is that directionalizability is preserved even across multiple frames of reference. For example, one might configure a soundscape corresponding to one’s office and another corresponding to one’s home. In Figure 4, a single user has delegated multiple avatars, akin to bookmarks, to browse a world music collection [34, 35]. These avatars can be collectively coupled with a real-life rotary motion platform [21], like that shown along with its “digital analog” in Figure 5. Turning in one’s swivel seat or one’s car can rotate (but deliberately not translate) multiple sinks, maintaining consistent proprioceptive sensation, even across forked doppelgängers. A technique for the integration of such multiple identities is explained in [9].
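A minimal sketch (ours; names hypothetical) of this shared-orientation idea: several sinks (“self” avatars) hold independent locations in different spaces but share one yaw, so a turn of the tracked chair rotates every soundscape consistently, and each source is rendered at an azimuth relative to that shared bearing.

import math

class Sink:
    def __init__(self, space, x, y):
        self.space, self.x, self.y = space, x, y

class MultipresentUser:
    # Multiple sinks share a single orientation, but are translated independently.
    def __init__(self, sinks):
        self.sinks = sinks
        self.shared_yaw = 0.0  # degrees, e.g. from a chair-tracker or vehicle heading

    def turn(self, delta_deg):
        self.shared_yaw = (self.shared_yaw + delta_deg) % 360.0  # rotate all sinks together

    def relative_azimuth(self, sink, src_x, src_y):
        # Bearing of a source in the sink's space, measured clockwise from the +y (frontal)
        # axis, relative to the shared orientation.
        absolute = math.degrees(math.atan2(src_x - sink.x, src_y - sink.y)) % 360.0
        return (absolute - self.shared_yaw) % 360.0

user = MultipresentUser([Sink("office", 0.0, 0.0), Sink("home", 2.0, 1.0)])
user.turn(90.0)  # swiveling the chair re-orients both soundscapes, translating neither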
Figure 6: Localized Beacons and Directionalization for Vehicular Wayfinding— Spatial Sound Whereware.
4 CONCLUSION

We predict that a standard will emerge to send location information upon setting up an “ordinary” phone call, “POTS” (“plain old telephone service”). Presumably such metainformation is already carry-able by non-POTS services such as VoIP (Skype, Microsoft Messenger, Google Voice, Apple iChat, etc.), including voice chat for MMORPGs (massively multiplayer online role-playing games) and other conferencing systems. SIP systems, which separate signaling side channels from realtime media streams, could easily be configured to convey such information [1]. Applications include work (teleconferencing), play (chat spaces, MMORPGs), and utility (way-finding, situation awareness).
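As an illustrative sketch (ours, not the mechanism of [1]), position metadata might ride along at call setup as extra SIP headers. The headers below are hypothetical X- extensions and the message is abbreviated; a standards-based alternative would be the SIP location-conveyance (Geolocation) extension carrying a location body.

def invite_with_position(callee, caller, lat, lon, heading_deg):
    # Compose a toy, abbreviated SIP INVITE carrying position as hypothetical X- headers
    # (mandatory Via/Call-ID/CSeq/Max-Forwards headers omitted for brevity).
    return "\r\n".join([
        f"INVITE sip:{callee} SIP/2.0",
        f"From: <sip:{caller}>",
        f"To: <sip:{callee}>",
        f"X-Location: geo:{lat},{lon}",          # geo URI: latitude, longitude
        f"X-Orientation: heading={heading_deg:.1f}",  # degrees clockwise from north
        "Content-Length: 0",
        "", "",
    ])

print(invite_with_position("[email protected]", "[email protected]", 37.49, 139.93, 270.0))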
Figure 3: Formalization of narrowcasting and selection functions in predicate calculus notation, where ‘¬’ means “not,” ‘∧’ means conjunction (logical “and”), ‘∃’ means “there exists,” ‘⇒’ means “implies,” and ‘⇔’ means mutual implication (equivalence). The suite of inclusion and exclusion narrowcast commands for sources and sinks are like analogs of burning and dodging (shading) in photographic processing. The duality between source and sink operations is tight, and the semantics are identical: an object is inclusively enabled by default unless a) it is explicitly excluded (with mute for a source or deafen for a sink), or b) peers are explicitly included (with select [solo] for sources or attend [confide or harken] for sinks) when the respective avatar is not. “Privacy” has two interpretations. The first association, with sources, is security of information (preventing leaks and protecting secrets). But a second interpretation, with sinks, means “freedom from disturbance,” in the sense of solitude: protection from disruption, not being bothered by irrelevance, distraction, or interruption. Narrowcasting attributes are not mutually exclusive, and the dimensions are orthogonal. Because a source or a sink is active by default, invoking exclude and include operations simultaneously on an object results in its being disabled. For instance, a sink might first be attended, perhaps as a member of some non-singleton subset of a space’s sinks, then later deafened, so that both attributes are simultaneously applied. (As audibility is assumed to be a revocable privilege, such a seemingly conflicted attribute state disables the respective sink, whose attention would be restored upon resetting its deafen flag.) Symmetrically, a source might be selected and then muted, akin to making a “short list” but being relegated to backup.
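A minimal executable rendering (ours, with hypothetical names) of the narrowcasting predicates summarized in Figure 3: an object is active unless it is explicitly excluded, or some peer in the same space is explicitly included while it is not.

from dataclasses import dataclass

@dataclass
class MediaObject:
    # A source or a sink with narrowcasting flags: mute/deafen ~ exclude, select/attend ~ include.
    name: str
    excluded: bool = False   # mute (source) or deafen (sink)
    included: bool = False   # select/solo (source) or attend (sink)

def active(obj, peers):
    # active(x) = not excluded(x) and (if any peer is included, x must be included too)
    anyone_included = any(p.included for p in peers)
    return (not obj.excluded) and ((not anyone_included) or obj.included)

sinks = [MediaObject("office"), MediaObject("home"), MediaObject("car")]
sinks[1].included = True                  # attend the "home" sink
print([active(s, sinks) for s in sinks])  # [False, True, False]
sinks[1].excluded = True                  # deafen it as well: exclusion dominates
print(active(sinks[1], sinks))            # False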
Figure 4: Multipresence in “Music in Wonderland.” Multiple avatars, associated with and piloted semi-independently by a single human user, are distributed across a virtual space or virtual spaces.
(a) Rotary motion platform. (Developed with Mechtec, www.mechtec.co.jp)

(b) Mixed reality simulation compositing panoramic imagery into dynamic CG. (Model by Daisuke Kaneko.)

Figure 5: Information furniture: For its haptic output modality, servomotors render kinesthetic force display, rotating each Schaire under networked control. Note the nearphones straddling the headrest.
For example, as seen in Figure 6, a virtual source can guide a driver around a corner.

Exponential improvement of mobile, nomadic, and roaming devices surpasses the better-known Moore’s law, and seems just as inevitable, thanks to standards such as the Universal Mobile Telecommunications System (UMTS) and multiple-input multiple-output (MIMO), which boost bandwidth while maximizing antenna usage [2]. ABC, “always best connected,” suggests a possible proliferation of persistent circuits, like intercoms to one’s family or intimate colleagues.

“Ontogeny recapitulates phylogeny,” the biological maxim coined by German zoölogist Ernst Haeckel, can also describe the development and evolution of media technologies. Vision and then audition are the human-preferred sensory modalities. As if paralleling the biological recapitulation theory, technical developments in a given sensory modality (technological ontogeny) follow the same path as the evolutionary history of its prevalence (i.e., its technological phylogeny). Nevertheless, perception research indicates that integration of information acquired by different modalities is deeply entangled in the brain, and that a percept in one sensory modality can modulate one acquired through another (i.e., something we hear can affect how we see, and vice versa). In a similar vein, it is relevant to recall the French director Jean-Luc Godard’s remark on cinema: “Photography is truth. The cinema is truth twenty-four times per second.” This hyperbole can be extrapolated to think of 3D cinema/vision as hyper-truth and 3D cinema with spatial sound as über-truth, in Nietzsche’s parlance, “the highest state to which men might aspire.” (Of course Nietzsche was not thinking of multimodal, augmented reality systems!)

Augmented reality visual interfaces include endocentric (self-referenced) displays such as egocentric first-person views and tethered
second-person views as well as exocentric, third-person views such as a map. They are mutually strengthened by augmented reality auditory interfaces, particularly those featuring position-aware (and not just location-aware) realtime spatial sound, faithfully suggesting direction and metaphorical distance, modulated and composed with audio windowing, narrowcasting, and multipresence. The haptic modality is also relevant, as proprioception, one’s natural sense of direction and situation awareness, is stimulated by motion such as that inducible by a motion platform, a vehicle, or simple locomotion. And we defer consideration of olfaction and gustation, as well as of pseudo-senses such as one’s sense of irony or sense of humor!

REFERENCES

[1] S. Alam, M. Cohen, J. Villegas, and A. Ashir. Narrowcasting in SIP: Articulated Privacy Control. In S. A. Ahson and M. Ilyas, editors, SIP Handbook: Services, Technologies, and Security of Session Initiation Protocol, chapter 14, pages 323–345. CRC Press: Taylor and Francis, 2009. ISBN-10 1-4200-6603-X, ISBN-13 978-1-4200-6603-6, http://www.crcpress.com/product/isbn/9781420066036.
[2] K. Baker. Torque Kills! Future Control of the Ambient Electromagnetic Spectrum. IEEE Multimedia, 14(1):4–8, Jan.–March 2007.
[3] D. R. Begault. 3-D Sound for Virtual Reality and Multimedia. Academic Press, 1994. ISBN 0-12-084735-3.
[4] M. M. Blattner, D. A. Sumikawa, and R. M. Greenberg. Earcons and Icons: Their Structure and Common Design Principles. Human-Computer Interaction, 4(1):11–44, 1989.
[5] D. S. Brungart and W. M. Rabinowitz. Auditory Localization of Nearby Sources. Head-Related Transfer Functions. J. Acoust. Soc. Am., 106(3):1465–1479, 1999.
[6] M. Cohen. Multipresence narrowcasting operations comprise a media meta-mixer exponentiating interface value. In Proc. CIT: Fifth Int. Conf. on Computer and Information Technology, pages 535–542, Shanghai, Sept. 2005. ISBN 0-7695-2432-X.
[7] M. Cohen. Wearware, whereware, everyware, and awareware: Mobile interfaces for location-based services and presence. In A. D. Cheok, editor, ACM SIGCHI MobileHCI: Proc. 9th Int. Conf. on Human Computer Interaction with Mobile Devices and Services, Singapore, Sept. 2007. ISBN 978-1-59593-862-6. [8] M. Cohen, N. A. Bolhassan, and O. N. Fernando. A Multiuser Multiperspective Stereographic QTVR Browser Complemented by Java3D Visualizer and Emulator. Presence: Teleoperators and Virtual Environments, 16(4):414–438, Aug. 2007. ISSN 1054-7460, http: //www.mitpressjournals.org/toc/pres/16/4. [9] M. Cohen and O. N. N. Fernando. Awareware: Narrowcasting Attributes for Selective Attention, Privacy, and Multipresence. In P. Markopoulos, B. de Ruyter, and W. Mackay, editors, Awareness Systems: Advances in Theory, Methodology and Design, HumanComputer Interaction Series, chapter 11, pages 259–289. Springer, 2009. Human Computer Interaction, ISBN 1-84882-476-9, ISBN 9781-84882-476-8, ISSN 1571-5035, E - ISBN 978-1-84882-477-5. [10] M. Cohen and N. Gy˝orbir´o. Personal and portable, plus practically panoramic: Mobile and ambient display and control of virtual worlds. Innovation: The Magazine of Research & Technology, 8(3):33–35, 2008. [11] M. Cohen and N. Gy˝orbir´o. Mobile Narrowcasting Spatial Sound. In Y. Suzuki, D. Brungart, H. Kato, K. Iida, D. Cabrera, and Y. Iwaya, editors, Proc. IWPASH: Int. Wkshp. on the Principles and Applications of Spatial Hearing, Zao, Miyagi; Japan, Nov. 2009. eISBN 978-9814299-31-2, http://eproceedings.worldscinet.com/ 9789814299312/9789814299312_0057.html. [12] P. D. Coleman. Failure to Localize the Source Distance of an Unfamiliar Sound. J. Acoust. Soc. Am., 34(3):345–346, 1962. [13] P. D. Coleman. Dual Rˆole of Frequency Spectrum in Determination of Auditory Distance. J. Acoust. Soc. Am., 44(2):631–632, 1968. [14] A. Daniel, R. Nicol, and S. McAdams. Multichannel Audio Coding Based on Minimum Audible Angles. In AES: Audio Engineering Society Conv. (40th Int. Conf.), Tokyo, Oct. 2010. [15] K. Doi and M. Cohen. Visual affective sensing of rotary chair. In KEIS: Proc. First Int. Conf. on Kansei Engineering & Intelligent Systems, pages 257–258, Aizu-Wakamatsu, Sept. 2006. [16] K. Doi and M. Cohen. Control of navigable panoramic imagery with information furniture: Chair-driven 2.5D steering through multistandpoint Q TVR multinode panoramas. In 3DUI: Proc. 3D User Interfaces Symp. (Poster Demonstration), Charlotte, NC; USA, Mar. 2007. [17] H. Fastl and E. Zwicker. Psychoacoustics: Facts and Models. Springer series in information sciences. Springer, Berlin, 3rd edition, 2007. [18] J. A. Frank Melchior and S. Spors. Spatial Audio Reproduction: From Theory to Production, Part I. In AES: Audio Engineering Society Conv. 129th Conv., San Francisco, Nov. 2010. http://www.deutsche-telekom-laboratories.de/ ˜sporssas/publications/talks/AES129_Tutorial_ Spatial_Audio_Reproduction_Part1.pdf. [19] S. Holland, D. R. Morse, and H. Gedenryd. AudioGPS: Spatial Audio Navigation with a Minimal Attention Interface. PUC: Personal and Ubiquitous Computing, pages 253–259, 2002. [20] R. E. Holt and W. R. Thurlow. Subject Orientation and Judgment of Distance of a Sound Source. J. Acoust. Soc. Am., 46(6):1584–5, Dec 1969. [21] N. Koizumi, M. Cohen, and S. Aoki. Japanese patent #3042731: Sound reproduction system, Mar. 2000. [22] J. M. Loomis, C. Hebert, and J. G. Cicinelli. Active localization of virtual sounds. J. Acous. Soc. Amer., 88(4):1757–1763, Oct. 1990. [23] J. M. Loomis, C. Hebert, and J. 
G. Cicinelli. Personal guidance system for the visually impaired using GPS, GIS, and VR technologies. In H. J. Murphy, editor, Proc. Virtual Reality and Persons with Disabilities, San Francisco, CA, 1993. [24] K. Lyons, M. Gandy, and T. Starner. Guided by Voices: An Audio Augmented Reality System. In Proc. ICAD, Int. Conf. on Auditory Display, 2000. [25] M. Magas, R. Stewart, and B. Fields. decibel 151. In Proc. SIGGRAPH, New Orleans, 2009. http://www.siggraph. org/s2009/galleries_experiences/information_
aesthetics/index.php. [26] D. K. McGookin, S. A. Brewster, and P. Priego. Audio bubbles: Employing non-speech audio to support tourist wayfinding. In HAID ’09: Proc. of the 4th Int. Conf. on Haptic and Audio Interaction Design, pages 41–50, Dresden, Germany, 2009. Springer-Verlag. [27] D. H. Mershon and L. E. King. Intensity and Reverberation as Factors in the Auditory Perception of Egocentric Distance. Perception & Psychophysics, 18(6):409–415, 1975. [28] D. R. Moore and A. J. King. Auditory perception: The near and far of sound localization. Current Biology, 9(10):R361–R363, 1999. [29] S. H. Nielsen. Auditory Distance Perception in Different Rooms. In Proc. Audio Eng. Soc. Convention 92. Audio Eng. Soc., 3 1992. [30] C. Porschmann and C. Storig. Investigations Into the Velocity and Distance Perception of Moving Sound Sources. Acta Acustica united with Acustica, 95(4):696–706, August 2009. [31] V. Pulkki. Virtual source positioning using vector base amplitude panning. J. Aud. Eng. Soc., 45(6):456–466, June 1997. [32] V. Pulkki, M.-V. Laitinen, J. Vilkamo, J. Ahonen, T. Lokki, and T. Pihlajam¨aki. Directional audio coding— perceptionbased reproduction of spatial sound. In Y. Suzuki, D. Brungart, H. Kato, K. Iida, D. Cabrera, and Y. Iwaya, editors, Proc. IWPASH: Int. Wkshp. on the Principles and Applications of Spatial Hearing, Zao, Miyagi; Japan, Nov. 2009. eISBN 978-9814299-31-2, http://eproceedings.worldscinet.com/ 9789814299312/9789814299312_0056.html. [33] R. Rabenstein and S. Spors. Sound field reproduction. In J. Benesty, M. M. Sondhi, and Y. Huang, editors, Springer Handbook of Speech Processing, chapter 53, pages 1095–1114. Springer, 2008. ISBN 9783-540-49125-5. [34] R. Ranaweera, M. Cohen, and M. Frishkopf. Virtual world music— music browser in wonderland. In iED: Immersive Education Initiative Boston Summit, Boston, Apr. 2010. http://mediagrid.org/ summit/2010_Boston_Summit_program_full.html. [35] R. Ranaweera, M. Cohen, N. Nagel, and M. Frishkopf. (Virtual [World) Music]: Virtual World, World Music— Folkways in Wonderland. In Y. Suzuki, D. Brungart, H. Kato, K. Iida, D. Cabrera, and Y. Iwaya, editors, Proc. IWPASH: Int. Wkshp. on the Principles and Applications of Spatial Hearing, Zao, Miyagi; Japan, Nov. 2009. eISBN 978-9814299-31-2, http://eproceedings.worldscinet.com/ 9789814299312/9789814299312_0045.html. [36] R. Ranaweera, I. Jayasingha, S. Amarakeerthi, C. Karunathilake, and M. Cohen. Event script interpreter for synchronized “roller-cam” graphical display and rotary motion platform. In A. Marasinghe, editor, Proc. HC-2008: 11th Int. Conf. on Humans and Computers, pages 91–98, Nagaoka, Japan, Nov. 2008. [37] B. Seo, M. M. Htoon, R. Zimmermann, and C.-D. Wang. “Spatializer”: A Web-based Position Audio Toolkit. In Proc. ACE, Int. Conf. on Advances in Computer Entertainment Technology, Taipei, Nov. 2010. http://ace2010.ntpu.edu.tw. [38] J. M. Speigle and J. M. Loomis. Auditory distance perception by translating observers. In VR93: Proc. IEEE Symp. on Research Frontiers in Virtual Reality (in conjunction with IEEE Visualization), pages 92–99, San Jose, CA, Oct. 1993. [39] R. Stewart, M. Levy, and M. Sandler. 3D interactive environment for music collection navigation. In Proc. DAFx: 11th Int. Conf. on Digital Audio Effects, Espoo, Finland, Sept. 2008. [40] J. Villegas and M. Cohen. “GABRIEL”: Geo-Aware BRoadcasting for In-Vehicle Entertainment and Localizability. In AES 40th Int. Conf. “Spatial Audio: Sense the Sound of Space”, Tokyo, Oct. 2010. http: //www.aes.org/events/40/. 
[41] J. Villegas and M. Cohen. HRIR~: Modulating range in headphone-reproduced spatial audio. In VRCAI: Proc. of the 9th Int. Conf. on Virtual-Reality Continuum and Its Applications in Industry, Seoul, Dec. 2010. ISBN 978-1-4503-0459-7, http://vrcai2010.org. [42] P. Zahorik, D. S. Brungart, and A. W. Bronkhorst. Auditory Distance Perception in Humans: A Summary of Past and Present Research. Acta Acustica united with Acustica, 91(3):409–420, June 2005.