HAVE 2005 – IEEE International Workshop on Haptic Audio Visual Environments and their Applications Ottawa, Ontario, Canada, 1-2 October 2005
Haptic-Auditory Distance Cues in Virtual Environments: Exploring Adjustable Parameters for Enhancing Experience
Kent Walker & William L. Martens
Sound Recording Area, Department of Theory, Faculty of Music, McGill University 555 Rue Sherbrooke Ouest, Montreal, QC, H3A 1E9, Canada
[email protected]
This paper investigates two subjective responses to a recorded percussive musical instrument sound displayed in a haptic-auditory virtual environment; the responses were Apparent Motion Towards (AMT) and Powerfulness. The system employed a whole-body actuator and a full-range multichannel stereophonic sound system compliant with ITU standards. AMT was experienced when whole-body vibratory components were slightly delayed in relation to sound components. Furthermore, the recorded percussive hit was initially observed to vary in its apparent power with variation in intermodal delay values and level; participants, therefore, made ratings of Powerfulness. Threshold intermodal delays for experiencing AMT were found to depend strongly upon intermodal level values, but ratings of Powerfulness were found to depend primarily upon whole-body vibration levels. These results can be incorporated into a structure for control response parameters in multimodal environmental display systems incorporating sound and whole-body vibration.
Keywords – auditory distance perception, auditory source motion, haptic-auditory interactions, multichannel audio, music reproduction, whole-body vibration
I. INTRODUCTION
The technology that allows creation of synthetic multimodal environments enables the augmentation of sensory stimulation. Augmented realities can include perception of virtual events that seem more powerful to the observer than authentic reproduction affords. Augmented reality has been extensively discussed as integral to the aesthetics of sound recording [7]. Indeed, if enhancement were not an important part of making high-quality sound recordings, producers of recordings would simply generate measurements of musical sound and present them to listeners via headphones or speakers. Even purists engage in the euphonic enhancement of their recordings. In multimodal music reproduction, manipulation of intermodal delay can generate motion percepts with a designer-specified amount of force, analogously enhancing experience.
The series of experiments reported here investigated enhancement of perceived auditory distance in a multimodal music display system incorporating speaker playback and a whole-body actuator. Apparent Motion Towards (AMT) is a percept related to auditory distance and is specifically the sensation that an auditory source is moving towards the observer. While there is a large body of research in the area of auditory distance perception (though less than that on auditory direction perception) [4], this research has largely ignored the reality of interacting modalities.
The second percept reported here is Powerfulness, which is related to the percept of loudness; overall sound level is considered to be one of the main cues used in auditory distance perception. In this case, it is possible that if the multimodal stimulus is perceived as being more powerful, it will also be perceived as being closer to the observer. The point of interest here is that both percepts, while related to auditory source distance perception, appear to be a function of interacting modalities, namely, differences in synchrony and level between haptic and auditory components.
0-7803-9377-5/05/$20.00 ©2005 IEEE
II. METHODS
The system in question incorporated a commercially available motion simulation platform capable of movement in 3 degrees of freedom and a calibrated multichannel stereophonic sound system compliant with ITU standards, with full-range capabilities at all standard angles (-110, -30, 0, 30, and 110 degrees relative to the median plane) [6]. In this study, only foot-to-head translational vibration was used. Synonyms for this type of whole-body vibration include: heave, longitudinal, vertical, headward/tailward, and upward/downward [8]. All listening tests were conducted in a treated room. The vibrating platform was outfitted with a comfortable soft-cushioned chair, and participants placed their shoes directly on the platform. All stimuli were generated from a monophonic recording of a single 20-inch kick-drum hit, struck with a hard plastic beater. A low-frequency channel, with content below 50 Hz, was created and split between the vibrating platform and a subwoofer located in front of the observer (at 0 degrees azimuth, and -15 degrees elevation).
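The creation of the low-frequency channel described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not specify the filter that was used, so a second-order Butterworth low-pass at 50 Hz is assumed here, and the function names and the "platform" and "subwoofer" labels are hypothetical.

```python
import math

def butterworth_lowpass_coeffs(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass biquad (bilinear transform), with a0 normalized to 1."""
    w0 = 2.0 * math.pi * cutoff_hz / fs_hz
    q = 1.0 / math.sqrt(2.0)  # quality factor of a Butterworth section
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cos_w0) / (2.0 * a0), (1.0 - cos_w0) / a0, (1.0 - cos_w0) / (2.0 * a0)]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a

def lowpass(signal, cutoff_hz, fs_hz):
    """Filter a list of samples with the biquad above (direct form I)."""
    b, a = butterworth_lowpass_coeffs(cutoff_hz, fs_hz)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in signal:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out

def split_low_frequency_channel(mono, fs_hz, cutoff_hz=50.0):
    """Assumed routing: the same low-passed signal feeds the platform and the subwoofer."""
    low = lowpass(mono, cutoff_hz, fs_hz)
    return {"platform": list(low), "subwoofer": list(low)}
```

In practice the two feeds would also need separate gain calibration, which this sketch does not attempt.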
All high-frequency components were also reproduced by a frontally-located loudspeaker, positioned just above the subwoofer. Intermodal delay steps of 10 ms between the vibratory and auditory components were subsequently created, giving 14 values ranging from -54 ms (vibration early) to +76 ms (vibration late), relative to the measured point of objective simultaneity, the point at which the whole-body and airborne vibration arrived at participants simultaneously. Envelopes of the objectively measured and aligned airborne and structural vibratory components of the kick-drum stimulus are shown in Figure One.
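As a minimal sketch of the delay grid described above, the following Python generates the 14 intermodal delay values and offsets one signal against the other. It assumes the two input signals are already aligned at the point of objective simultaneity; the function names and the zero-padding approach are illustrative assumptions, not the authors' MATLAB implementation.

```python
def delay_grid_ms(first_ms=-54, last_ms=76, step_ms=10):
    """Delay values in ms; negative means vibration early, positive means vibration late."""
    return list(range(first_ms, last_ms + 1, step_ms))

def apply_intermodal_delay(audio, vibration, delay_ms, fs_hz):
    """Offset vibration relative to audio by delay_ms, keeping both lists the same length.

    Assumes both inputs are lists of samples already aligned at objective simultaneity.
    """
    shift = round(abs(delay_ms) * fs_hz / 1000.0)
    pad = [0.0] * shift
    if delay_ms >= 0:
        # Vibration late: pad the start of the vibration signal.
        return audio + pad, pad + vibration
    # Vibration early: pad the start of the audio signal instead.
    return pad + audio, vibration + pad
```

Calling `delay_grid_ms()` with its defaults yields the values -54, -44, and so on up to +76 ms.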
For each of these 14 vibratory delay values, stimuli were prepared at 7 vibration levels, decreasing in 3 dB steps from the maximum level, which was measured to give a vertical acceleration RMS value of 1.3 m/s². This resulted in a total of 98 possible stimuli. All processing, playback, and testing were conducted using custom MATLAB scripts.
The first experiment used an adaptive staircase method to track the threshold vibratory delay at which AMT first occurred. Three participants with no known deficiencies, previously trained in haptic-auditory intermodal delay tasks, participated on a voluntary basis. Each staircase began with a randomly selected delay value within the -54 ms to +76 ms range. Participants were instructed to adjust intermodal delay values to find a delay value at which they experienced the source as moving towards them. Then, by increasing and decreasing the intermodal delay in 10 ms steps, they were asked to find the point at which increasing by 10 ms changed the perceptual response from that of “no motion experienced” to “motion experienced.” Independent sessions were conducted for each of the seven magnitudes of whole-body vibration. The sound level was held constant throughout all trials.
In order to test the hypothesis that AMT might be related to estimates of the apparent Powerfulness of the virtually-reproduced event, a second experimental task was completed by the same participants in which a subset of the same stimuli were presented. Participants were asked to estimate the Powerfulness for each of 48 stimuli (eight delay values crossed with six levels of vibration magnitude). Participants were presented with these stimuli one at a time, and asked to rate each along a bipolar scale using a continuous GUI-based slider; the anchoring adjectives at each end of the slider’s scale were “weak” and “powerful”. Four sets of 48 trials were completed by each participant. The sound level was again held constant throughout all trials.
Figure One: Filtered envelopes of the measured airborne and structural vibration at the listening position for the stimulus aligned according to the point of objective simultaneity (POS). Normalized energy over time and the Cross-Correlation Coefficients (CC) of the envelopes of the low-pass-filtered signals are shown (cutoff frequency was 50 Hz).
III. RESULTS AND ANALYSIS
Figure Two shows average vibration delay thresholds for AMT at five vibration levels, data having been pooled across all three participants, each of whom supplied 5 settings per stimulus. Results indicate that this percept depends upon both intermodal delay and whole-body vibration levels: as the magnitude of vibration decreases, the intermodal delay time required to produce apparent motion increases. Intermodal delay times required to produce this motion percept are tightly grouped after the point of objective simultaneity. For this stimulus, AMT presented itself when the whole-body vibration followed the acoustic component by between 6 ms and 66 ms, depending on the level of whole-body vibration. Participants remarked that when AMT was strongest they were more inclined to feel as if they were present with the instruments (as opposed to present in the environment).
Figure Two: Average vibration delay thresholds for AMT at 5 vibration levels (circular symbols). These values are pooled across three participants, each of whom produced 5 threshold settings. The inter-quartile range (IQR) of the settings is indicated with horizontal lines. IQR is the difference between the 75th and the 25th percentiles of the set of settings. (Note that the IQR is a robust estimate of the spread of the data, since changes in the upper and lower 25% of the data do not affect it; i.e., it is less sensitive to outliers in the data than is the standard deviation.)
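The IQR defined in the caption above can be computed with a short sketch like the one below. The threshold settings are hypothetical values chosen for illustration, not data from the experiment, and the percentile interpolation convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the paper does not state which convention was used.

```python
import statistics

def interquartile_range(settings):
    """Difference between the 75th and 25th percentiles of a set of settings."""
    q1, _median, q3 = statistics.quantiles(settings, n=4, method="inclusive")
    return q3 - q1

# Hypothetical threshold settings in ms (NOT data from the paper).
settings_ms = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# The same settings with the largest value replaced by an extreme outlier.
settings_with_outlier_ms = [1, 2, 3, 4, 5, 6, 7, 8, 900]

iqr_clean = interquartile_range(settings_ms)
iqr_outlier = interquartile_range(settings_with_outlier_ms)
sd_clean = statistics.stdev(settings_ms)
sd_outlier = statistics.stdev(settings_with_outlier_ms)
```

With these hypothetical values the IQR is identical for both lists, while the standard deviation grows substantially once the outlier is introduced, which is the robustness property the caption describes.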
Figure Three summarizes the results of the Powerfulness rating experiment. These ratings were submitted to a regression analysis in order to create a smooth response surface suitable for creation of a contour plot. The iso-Powerfulness contours describe a surface fit to the average estimates of three subjects presented at 48 combinations of vibration delay and level (indicated by the position of the “+” symbols in the plot). It is clear that Powerfulness can be predicted well in terms of vibration level, and that ratings showed only a mild dependence upon vibration delay (much milder than the experimenters’ initial expectations).
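One way such a response surface could be obtained is sketched below. The paper does not state the form of the regression model, so a first-order (planar) model of rating against delay and level is assumed purely for illustration; the function names are hypothetical and the fit uses ordinary least squares via the normal equations.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

def fit_plane(delays_ms, levels_db, ratings):
    """Least-squares fit of rating = b0 + b1 * delay + b2 * level (assumed model)."""
    rows = [[1.0, d, l] for d, l in zip(delays_ms, levels_db)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ratings)) for i in range(3)]
    return solve_linear_system(xtx, xty)

def predict(coeffs, delay_ms, level_db):
    """Evaluate the fitted plane at one delay and level pairing."""
    b0, b1, b2 = coeffs
    return b0 + b1 * delay_ms + b2 * level_db
```

Evaluating `predict` on a dense grid of delay and level values would give the smooth surface from which iso-Powerfulness contours can be drawn; a higher-order model could be substituted if the ratings showed curvature.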
Figure Three: Contour plot based upon regression results for subjective estimates of Powerfulness. The iso-Powerfulness contours describe a surface fit to the average estimates of three subjects presented with 48 multimodal stimuli, defined by the 6 by 8 factorial combination of presented vibration delay and level values, respectively. The location on the plot of these 48 pairings of vibration delay and level values is indicated by the “+” symbols. The lightest (yellow) contour corresponds to a response of 90 on the 100-point scale of Powerfulness, the next darker contour (light green) to a response of 70, the next darker contour (dark green) to a response of 50, and the darkest contour (blue) to a response of 30.
IV. DISCUSSION
As reported in previous studies, tolerance for intermodal delay is lowest with higher whole-body vibration levels [9]. Tolerances are also greater when vibration follows an acoustic component: the least acceptable intermodal delay is one in which structural vibration of great magnitude precedes an acoustic component. The findings reported here are consistent with these previously discovered tolerances in that AMT was not reported when structural vibration preceded airborne vibration [10]. Also, the shape of the psychometric function is not unlike the shape of the functions of levels of acceptability. It should be noted, however, that differences in the temporal and spectral envelopes of musical stimuli play an important role in intermodal delay detection and subjective simultaneity ratings [4, 5]. It is quite likely that with a less transient stimulus, AMT would be much weaker or absent.
V. CONCLUSIONS
AMT is a haptic-auditory percept that causes sources to appear to move closer when structural vibration slightly follows airborne vibration. With this percussive stimulus, AMT was present for intermodal asynchrony between 6 ms and 66 ms and is associated with later intermodal delay values as structural vibration is decreased. Implementation of a parameter to control AMT in a multimodal display system can be accomplished by slightly delaying structural vibration in relation to airborne vibration, depending on intermodal level differences. In contrast, Powerfulness varies mainly according to changes in vibration level, although as structural vibration is delayed in relation to sound, slightly less structural vibration is needed to produce iso-Powerfulness.
Intermodal coordination is paramount to creating convincing haptic-acoustic virtual environments. Such coordination requires that a given system be able to perform in such a way that it is objectively coordinated. However, providing parameters for control related to synchrony, such as AMT, may provide opportunity for enhancement as well as opportunities for more convincing displays of auditory source distance.
VI. ACKNOWLEDGEMENTS
The authors would like to thank Sungyoung Kim for creating MATLAB scripts for the experiment. This research was supported in part by Valorisation-Recherche Québec and the Centre for Interdisciplinary Research in Music Media and Technology.
REFERENCES
[1] Davide F., Riva G., & Ijsselsteijn W.A. (Eds), Being There: Concepts, Effects and Measurement of User Presence in Synthetic Environments, Ios, Amsterdam, 2003.
[2] Martens W.L. & Woszczyk W., “Guidelines for Enhancing the Sense of Presence in Virtual Acoustic Environments,” in Proc. 9th Int. Conf. on Virtual Systems and Multimedia, pp. 306-313, Montreal, QC, Oct. 15-17, 2003.
[3] Shinn-Cunningham B., “Distance Cues for Virtual Auditory Space,” in Proceedings of the First IEEE Pacific-Rim Conference on Multimedia, pp. 227-230, Sydney, Australia, December 2000.
[4] Walker K. & Martens W.L., “Sensitivity to inter-modal asynchrony between acoustic and structural vibration,” in Proceedings of 149th Meeting of the Acoustical Society of America, volume 117, p. 2392, Vancouver, Canada, May 2005.
[5] Martens W.L., Walker K., & Woszczyk W., “Tolerance for delay between whole-body vibration and audio reproduction of musical sound,” in Proceedings of the Twelfth International Congress on Sound and Vibration, Lisbon, Portugal, July 2005.
[6] International Telecommunications Union, Recommendation BS.775-1, Multichannel stereophonic sound systems with and without accompanying picture, 1994.
[7] Dickreiter M., Tonmeister Technology: Recording Environments, Sound Sources, and Microphone Techniques, Temmer Enterprises Inc, 1989.
[8] Griffin M.J., Handbook of Human Vibration, Elsevier Academic Press, London, England, 1990.
[9] Martens W.L., “Human centered design of acoustic and vibratory components for multimodal display systems,” in Proceedings of the Canadian Acoustical Association's “Acoustics Week in Canada” Conference, Ottawa, Ontario, Canada, 2004.
[10] Martens W.L. & Woszczyk W., “Psychophysical Calibration of Whole-body Vibration in the Display of Impact Events in Auditory and Haptic Virtual Environments,” in Proceedings of Haptic Audio Visual Environments, 2004, Ottawa, Ontario, Canada, 2004.