19th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007
AUDITORY-VISUAL PERCEPTION OF ROOM SIZE IN VIRTUAL ENVIRONMENTS

PACS: 43.55.Hy

Larsson, Pontus(1); Väljamäe, Aleksander(1)
(1) Applied Acoustics, Chalmers University of Technology, SE-41296 Göteborg, Sweden; pontus.larsson, [email protected]
ABSTRACT

It is generally believed that the effectiveness of Virtual Environments (VEs) relies on their ability to faithfully reproduce the multisensory experience of the physical world. An important aspect of this experience is the perception of size and distance. In an architectural application, for example, it is of great interest that the user gets a correct impression of room size. However, for visual perception in VEs it is not yet fully understood which system parameters control perceived room size. Some investigations on auditory distance perception have been carried out, but there is also an obvious lack of research concerning auditory room size perception. In addition, it is far from understood how audition and vision interact when sensing an indoor environment. The current paper reviews an experiment aimed at exploring aspects of auditory, visual and auditory-visual room size perception in VEs. In line with previous research, it is found that people in general seem to underestimate room size when exposed to a visual VE. It is also shown that there seems to be a tendency to overestimate room size in auditory VEs, and finally that the combination of auditory and visual stimuli allows for a more accurate room size perception.
INTRODUCTION

Recreating the sensation of spaciousness of an environment using Virtual Reality (VR) technology is not an easy task. Imagine, for example, the solemn sensation of entering a large concert hall or the claustrophobic feeling of being trapped inside a small closet; such situations can hardly be conveyed properly using today's technologies and knowledge. In the case of recreating the large concert hall, it is likely that if the medium is purely visual, people will perceive the hall as being much smaller than intended. Research in the area of visual spatial perception does not currently offer a clear consensus on which parameters control room size percepts, and corresponding research in the auditory domain is relatively sparse. Moreover, it is likely that in order to achieve a full understanding of how we perceive the size and spaciousness of rooms, one has to consider all sensory inputs and the cross-modal effects that may arise between them. With such knowledge, however, one could possibly enhance current visual technologies in a relatively cost-efficient manner by adding the proper room acoustics, which in turn may greatly improve end users' general sensation and sense of being “inside” the Virtual Environment (VE). The current paper aims at exploring the potential auditory-visual cross-modal effects in virtual room size perception, as well as the individual modalities' ability to induce a proper room size perception.
BACKGROUND

There is a rather large body of research dealing with auditory spatial perception and auditory distance perception. When it comes to room size perception, the amount of research in the auditory domain is limited, although auditory room size seems to be closely linked to the above-mentioned topics. Regarding cues to auditory room size perception, it seems intuitive that reverberation time should be one of the main controlling factors. Of course, reverberation time is related not only to room size but also to the average absorption in the room, so two rooms of very different size can have the same reverberation time. If similar room furnishings and room surfaces are used, one can however expect that reverberation time increases with room volume [1]. The direct sound to reverberant level ratio is another factor which is linked to
room size, in the way that bigger rooms usually have a lower reverberant field level. Hameed et al. conducted a study of how the reverberation time and the direct to reverberant ratio affect room size perception [2]. In their listening tests, Hameed et al. used speech stimuli spatialized using a 3D 16-loudspeaker setup in an anechoic room. Both early reflections and late reverberation were simulated. The results showed that reverberation time is the most dominant cue in room size perception, and that the direct to reverberant ratio had very little, if any, effect. Hameed et al. somewhat surprisingly suggest that the early reflections do not contribute significantly to room size perception. The timing and level of early reflections are nonetheless properties which physically relate very well to room size, and since they seem to contribute to distance perception they should also contribute to room size perception [1]. Work in progress in our research lab does confirm that when early wall reflections are delayed, people perceive the room as being bigger. This is also supported by the work of Cabrera et al. [3,4], where early-late energy measures (C80, C50/D50) are shown to correlate negatively with room size perception; i.e., the lower the clarity, the bigger the room is perceived to be. So far, there is little evidence that spatial distribution measures such as IACC and LEF are correlated with room size perception, but it seems reasonable that such measures should have some relation to room size since, e.g., IACC tends to increase as room volume increases [1]. Moreover, the low frequency response may be of importance, since smaller rooms often have more pronounced resonance peaks in the audible low frequency range. Ongoing research in our lab does, however, suggest that such low frequency cues are perceptually rather weak in providing room size information.
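The quantities discussed above are straightforward to compute from a room impulse response. The following minimal Python sketch (the function names are our own, not those of any particular acoustics package) computes the C80 clarity measure from an impulse response and the Sabine estimate of reverberation time:

```python
import numpy as np

def clarity_c80(ir, fs):
    """C80 in dB: ratio of early (first 80 ms) to late energy.

    Assumes `ir` is a room impulse response starting at the direct
    sound. Lower C80 (less clarity) tends to be judged as a bigger room.
    """
    n80 = int(0.080 * fs)
    early = np.sum(ir[:n80] ** 2)
    late = np.sum(ir[n80:] ** 2)
    return 10.0 * np.log10(early / late)

def sabine_rt60(volume, absorption_area):
    """Sabine's estimate T60 = 0.161 * V / A (V in m^3, A in m^2 sabins).

    With similar surface treatment (a fixed average absorption
    coefficient), A grows only with surface area, so T60 increases with
    room volume, as noted in the text.
    """
    return 0.161 * volume / absorption_area
```

For example, a rapidly decaying synthetic impulse response yields a strongly positive C80, while a slow decay yields a negative one, consistent with the clarity-versus-room-size trend described above.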
Visual perception of room size

Visual spatial perception is a complex and mature field of research, and a complete account of the work within this area would be out of the scope of this paper. We may, however, focus on some interesting findings related to size and distance perception in VEs. One of the most prominent findings is that virtual worlds often appear smaller than intended [5]. Results from studies using egocentric distance estimation paradigms support this common belief (see [5, 6] and references therein); egocentric distance in VEs, the distance from the observer to a target, is generally underestimated. However, it is currently unclear which visual simulation parameters determine distance (and room size) perception. Texture is believed by some to be a strong cue to distance [7], and the lack of a continuous textured surface may cause inaccurate distance judgments in the real world [8]. It is not entirely clear to what extent or in what way textures influence distance perception in VEs [9], and there is research showing that, for example, correct distance estimation is not contingent on the amount of texture [5]. Field-of-view (FOV) is another parameter which intuitively seems to affect size and distance perception in VEs. Nonetheless, a recent study using Head Mounted Displays (HMDs) concludes that FOV does not affect distance perception [6]. There is, however, research indicating that eyepoint height affects the perception of distances and dimensions in VEs, in that eyepoint height is negatively correlated with distance perception [10]. That is, the higher the eyepoint, the lower the estimations of distances. It therefore seems necessary to adapt and fix the eyepoint height to the user in order for the visual spatial percept to be correct. One may conclude that visual size and distance perception in VEs (and in the real world) is a complex topic, and that there is no solid consensus regarding which VE system parameters control visual spatial perception.
Cross-modal effects

So far, we have considered audition and vision in isolation, but it is a well-known fact that the human perceptual system involves complex integration and interaction between stimuli reaching the different sensory modalities. There is a large body of research based on rather simple stimuli (such as flashes and beeps) showing that interactions between visual and auditory input occur already at a low level of processing (see e.g. [11]). In the area of room acoustics, scholars have only recently begun to acknowledge the fact that perceptual processes are inherently multimodal. To the authors' knowledge, the first systematic study in this field was performed by Nathanail and Lavandier [12], who investigated the relation between visually and auditorily perceived distance to a sound source in a virtual concert hall. In their experiment, participants rated auditory apparent distances in a sound-only condition and in two auditory-visual conditions having different visual distances to the stage / sound source. The results from this study indicate that auditory apparent distance is to some extent influenced by visual distance; auditory distance judgments were lower when the stage appeared close and higher
when the stage was distant. Thus, this may be seen as an effect of “distance ventriloquism” and suggests a dominance of vision over audition in spatial percepts. The existence of distance ventriloquism is further substantiated by Brown et al. [13], where a similar effect was found for more basic types of stimuli. Cross-modal effects in room size perception are yet a rather unexplored area. Larsson [14] concludes from experiments using VR and real environments that there are visual influences on room size perception in both cases, but that this effect may be greater in real environments.
EXPERIMENT

We hypothesize that a combined auditory and visual representation of a room will give a more correct perception of the virtual room's size than when only one modality is stimulated. Participants' perception of room width (in meters) is used as the primary measure of room size, since we believe that width is easier to assess than volume in absolute units. Furthermore, room length cannot be used in this case, since this would imply that participants more or less rate the distance to the sound source instead of the room size.

Stimuli

Auditory stimuli were created with CATT-Acoustic. Using this software, three rooms with the dimensions (4 x 7 x 3) m, (15 x 20 x 15) m and (25 x 60 x 30) m were modelled. Source and receiver were placed at opposite ends of the rooms. To enable multichannel loudspeaker playback, the B-format receiver option in CATT-Acoustic was used. Thus, for each room, four impulse responses were simulated, corresponding to the four WXYZ channels of the B-format. Reverberation times and C80 (clarity) values based on the W-channel (omni) room impulse responses are shown in Figures 1 and 2. The impulse responses for each room were then transferred to a Lake Huron CP4 system, where they were convolved in real time with anechoic sound (a trumpet sound and a cello sound played in succession) using the Convolver application. The resulting signals were then decoded for the loudspeaker array (see “Participants, instrumentation and procedure” below) using the Speaker Decoder application.
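The B-format representation used here can be illustrated with a short Python sketch. This is our own illustration, not the actual processing in CATT-Acoustic or the Lake Huron: a mono signal is encoded into the four WXYZ channels for a given source direction, and a simple first-order decode (one common textbook convention) produces the feeds for a horizontal loudspeaker ring.

```python
import numpy as np

def encode_bformat(signal, azimuth, elevation):
    """Encode a mono signal into first-order B-format (W, X, Y, Z).

    Angles in radians. W is the omnidirectional channel, conventionally
    attenuated by 1/sqrt(2); X, Y, Z are figure-of-eight components
    pointing along the front, left and up axes.
    """
    w = signal / np.sqrt(2.0)
    x = signal * np.cos(azimuth) * np.cos(elevation)
    y = signal * np.sin(azimuth) * np.cos(elevation)
    z = signal * np.sin(elevation)
    return np.stack([w, x, y, z])

def decode_basic(bformat, speaker_azimuths):
    """Simple first-order decode to a horizontal loudspeaker ring.

    A basic textbook decode; practical decoders (such as the Speaker
    Decoder application mentioned above) apply further weighting.
    """
    w, x, y, _ = bformat
    feeds = [0.5 * (np.sqrt(2.0) * w + np.cos(az) * x + np.sin(az) * y)
             for az in speaker_azimuths]
    return np.stack(feeds)
```

For a source straight ahead (azimuth 0), the decode gives the largest feed to the front loudspeaker and essentially nothing to the rear one, which is the spatial behaviour the convolved room impulse responses exploit.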
[Plots omitted: C80 (%) and reverberation time (s) per octave band, 63 Hz to 8 kHz, for the 4 m, 15 m and 25 m wide rooms.]
Figure 1: Clarity (C80) values for the three simulated rooms.
Figure 2: Reverberation time values for the three simulated rooms.
The visual stimuli, shown in Figure 3, were created in CATT 3D viewer and presented using large screen projection (see “Participants, instrumentation and procedure” below). As can be seen in Figure 3, the two larger rooms have a somewhat church-like appearance, while the small room is a simple shoebox shaped room. Participants were instructed that they are standing with their back close to the rear wall in each room and that the sound is coming from the wall opposite to this rear wall.
Figure 3: The three images used as visual stimuli in the experiment. From left to right: Small room (4 x 7 x 3 m), Medium room (15 x 20 x 15 m) and Large room (25 x 60 x 30 m).
Participants, instrumentation and procedure

Fifteen participants (8 female), aged 19-64, took part on a voluntary basis. The experiment was conducted in the anechoic chamber at the Department of Applied Acoustics, Chalmers University of Technology, which has the dimensions 10 x 10 x 8 m (see Figure 4). Auditory stimuli were reproduced using an Ambisonic loudspeaker array consisting of five Genelec 8030A active monitors. Visual stimuli were presented with PowerPoint on a 248 x 205 cm sound-transparent screen (Euroscreen One Acoustic) using an ASK M3 projector mounted above and behind the participant.
Figure 4: The anechoic chamber used in the experiment. The screen, the chair in which the participants sat and one of the five loudspeakers can be seen.
As the primary measure, participants were asked to rate the perceived width of the simulated room (in meters). To give participants a frame of reference for their ratings, they were informed about the actual width of the screen and of the anechoic chamber before the experiment started. Two additional tasks/conditions were also used in the main experiment, in which participants were asked to select the most appropriate room acoustic simulation for a given visual room and, vice versa, the most appropriate visual room for a given acoustic simulation, but the results from those tasks are reported elsewhere [15].
Participants were briefed verbally about the test procedure and instructed on the use of the measures (assessing the perceived width of the room and selecting the most appropriate visual/acoustic room simulation), and a short training session was performed before the experiment started. After the playback of each sound excerpt and/or the presentation of a picture, participants were asked to give their ratings. All participants performed each subpart of the experiment in succession (visual condition, auditory condition and auditory-visual condition). After completing the experiment, participants were debriefed and thanked for their participation.

Results

First, to determine the overall effects, participants' ratings of room width were submitted to a 3 (Room size) x 3 (Modality: Visual, Auditory and Auditory-Visual) within-participants ANOVA. Bonferroni's method was used to adjust for multiple comparisons. The analysis firstly showed that participants were able to discriminate between the three room sizes; statistically significant differences in width ratings were found between all room sizes (M = 5.99, SE = 0.581 for the small room vs. M = 13.2, SE = 2.25 for the medium room and M = 19.0, SE = 3.26 for the large room, p < .01 for small vs. medium/large and p
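The within-participants analysis above can be sketched in numpy. This simplified version handles a single within-participants factor (the study's full design is 3 x 3), and the ratings below are hypothetical illustration data, not the study's actual measurements:

```python
import numpy as np

def repeated_measures_anova(data):
    """One-way repeated-measures ANOVA on `data` of shape
    (n_subjects, n_conditions). Returns the F statistic for the
    condition effect, with between-subject variability partialled out.
    """
    n_subj, n_cond = data.shape
    grand = data.mean()
    # Sums of squares for conditions, subjects, and residual error.
    ss_cond = n_subj * np.sum((data.mean(axis=0) - grand) ** 2)
    ss_subj = n_cond * np.sum((data.mean(axis=1) - grand) ** 2)
    ss_total = np.sum((data - grand) ** 2)
    ss_error = ss_total - ss_cond - ss_subj
    df_cond = n_cond - 1
    df_error = (n_cond - 1) * (n_subj - 1)
    return (ss_cond / df_cond) / (ss_error / df_error)

# Hypothetical width ratings (rows: participants, columns: small,
# medium, large room); the clear separation yields a large F.
ratings = np.array([
    [5.0, 13.0, 19.0],
    [6.0, 14.0, 18.0],
    [7.0, 12.0, 20.0],
    [6.0, 13.0, 19.0],
])
f_stat = repeated_measures_anova(ratings)
```

Partialling out the subject effect is what makes the design within-participants: each person serves as their own baseline, so individual differences in overall width estimation do not inflate the error term.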