Stereo Vision for Computer Graphics: The Effect that Stereo Vision has on Human Judgments of Visual Realism. C. H. Lo and A. Chalmers, University of Bristol.
Abstract Stereo vision is a fundamental part of the human visual system, and yet this aspect is largely ignored when rendering high quality images. This paper investigates whether viewing in stereo affects our judgements of the visual realism of computer-generated images. Three experiments, concerning the visual factors of illumination direction, number of light sources, and view point position, were carried out to compare the responses of two groups of human subjects under different viewing conditions. One group viewed stereoscopic pairs of images (a different view for each eye) while the other viewed identical pairs of images (the same view for each eye). Subjects were asked to respond verbally with "Real" or "Not Real" to each visual stimulus presented, and the data recorded in the experiments was the subjects' response time. The results from all three experiments reveal that under the stereo viewing condition, the subjects took more time to decide on the visual realism of the synthetic images. This outcome provides insight into how we should consider rendering highly realistic images of real scenes, as opposed to images that simply approximate a photograph of such a scene.
Keywords: Visual Perception, Stereo Vision, Realistic Computer Graphics.

1 Introduction

1.1 Overview From the pair of 2-D images formed on the retinas, the brain is capable of synthesizing a rich 3-D representation of our visual surroundings. The horizontal separation of the two eyes gives rise to small positional differences, called binocular disparities, between corresponding features in the two retinal images. These disparities provide a powerful source of information about 3-D scene structure, and alone are sufficient for depth perception.

On the other hand, the outputs of traditional synthetic images, and indeed photographs, are predominantly shown on a 2D display. With two eyes receiving an identical visual stimulus from the single image on the 2D computer screen, we perceive a flat image rather than a real scene. Moreover, most of the studies in 3D computer graphics have focused on monocular visual cues such as shading, texturing, and shadowing to provide depth information.

Advanced rendering algorithms such as ray tracing and radiosity have been developed to simulate the physical lighting conditions in the real world, providing some convincing looking results. However, since the output of such techniques is almost always a 2D image, these results are thus closer to a photograph of the scene than to the real scene itself, and indeed a photograph may not be a faithful representation of reality [25]. The aim of this research is to investigate the impact binocular stereopsis has on our ability to judge the realism of computer generated images, and to show how the insights from such an investigation could benefit conventional rendering techniques. Three visual factors, illumination directions, light source numbers, and view point variances, have been chosen for the experiments. The hypothesis of the study is: viewing in stereo will affect our judgement of the visual realism of computer-generated images.

1.2 Structure of the paper The paper is organized as follows. The next section describes the nature of stereo vision and binocular disparity. We then discuss previous work in computer graphics related to stereo vision, followed by a review of some studies concerned with measuring the visual realism of computer-generated images. Section 3 describes the experimental design and procedures, including the relevant equipment and software that has been used. Section 4 reports the results of the experiments. It is divided into three sub-sections containing results respectively for the three experiments. Section 5 is the discussion of the outcome of the experiments, including their possible implications and limitations. The final section draws conclusions and suggests future work.
{ cheng | alan }@cs.bris.ac.uk
Copyright © 2003 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail
[email protected]. © 2003 ACM 1-58113-861-X/03/0004 $5.00
2 Background
2.1 Binocular Disparity

Animals, including humans, with overlapping visual fields have stereoscopic information available to them from a comparison of the images obtained at the two eyes. Each eye sees a slightly different view of the world due to the horizontal separation of the two eyes. Figure 1 shows how the geometry of binocular vision gives rise to slightly different images in the two eyes. If the two eyes are fixating on a point P, then the images cast by P fall at the centre of the fovea in each eye. Now consider a second point Q. If the images of Q fell (say) 5 degrees away from the fovea in both eyes we should say that Q stimulated corresponding points in the two eyes, and that Q had zero disparity. If instead the image was located 6 degrees away from the fovea in one eye but 5 degrees away in the other, we should say that Q stimulated disparate or non-corresponding points and that Q produced a disparity of 1 degree. In general, if Q's image falls α degrees from the fovea in the left eye and β degrees from the fovea in the right eye, then the binocular disparity is (β − α), measured in degrees of visual angle. The amount of disparity depends on the physical depth (d) of Q relative to the fixation point P. In fact, disparity is approximately proportional to this depth difference divided by the square of the viewing distance (v). Thus disparity increases with the amount of depth, but decreases rapidly with increasing viewing distance.

Figure 1: The geometry of binocular (uncrossed) disparity: fixation point P, a second point Q at depth d beyond P, viewing distance v, and the angles α and β that separate the images of P and Q in the left and right views.

2.2 Psychological Investigation

Recently more and more psychophysical experiments have been incorporated into computer graphics research. Many researchers believe that the extensive previous work in visual perception identifying perceptual criteria for perceiving realism can be used to help develop more efficient and accurate perceptually guided rendering algorithms. Several frameworks have been proposed for conducting such psychophysical experiments, comparing different computer graphics images with each other, with photographs, or even with the real scene they are attempting to portray.

Only recently have computer scientists started to look at depth perception when viewing renderings of 3D environments. Wanger et al. [37] compared the effectiveness of six different depth cues for three simple tasks, finding that for each task a different combination of depth cues was important. Kjelldahl [22] studied a different set of depth cues and found illumination and object placement were the most effective cues for conveying relative depth information. However, neither of these studies compared their depth cues to stereopsis.

Servos et al. [35] examined only stereopsis and found binocular vision to be significantly important for distance judgments in a grasping task. Hubona et al. [17] compared the effectiveness of stereo, cast shadows, and background images for distance judgments in 3D inferencing. However, these studies examined depth cues for tasks other than the scene realism studied in this paper. The set of depth cues that is important varies depending on the type of task involved. Most importantly, it is unclear whether any of these previous findings will hold when considering the visual realism of a 3D scene. There are few other recent works in the computer graphics literature dealing with human visual perception related to visual realism measures. Rushmeier et al. [31] proposed a framework that applied perceptually-based image metrics to differentiate between a pair of images, in order to evaluate the accuracy of synthetic renderings of real-world scenes. In Thompson et al. [15], shadows and other visual cues are tested against subjects' ability to discriminate properties such as object orientation or proximity. Horvitz [16] measured subjects' responses to various settings of image quality, in order to optimise the computation in an efficient renderer. Mcnamara et al. [25] compared computer-generated images with real, physical scenes (viewed directly) to evaluate the perceptual fidelity of the renderings, in a manner similar to Meyer et al. [26]. Rademacher et al. [34] presented an experimental method that directly asked participants whether an image is real (photographic) or not real (CG),
and a number of experiments were conducted to explore several visual factors, including shadow softness, surface smoothness, number of objects, variety of object shapes, and number of light sources. However, none of these perceptually-based approaches has taken binocular visual factors into account.
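The proportionality described in Section 2.1 can be made concrete with a short sketch. This is an illustrative simplification (a symmetric two-eye geometry with an assumed 6.5 cm eye separation), not code from the study:

```python
import math

def binocular_disparity(depth_q, viewing_distance, eye_sep=0.065):
    """Angular disparity (degrees) of a point Q lying depth_q behind a
    fixation point P at viewing_distance, for two eyes separated
    horizontally by eye_sep (all distances in metres).

    Each eye sits at x = +/- eye_sep / 2, z = 0; P is at
    (0, viewing_distance) and Q at (0, viewing_distance + depth_q).
    The disparity is the difference between the P-to-Q angles seen
    from the two eyes.
    """
    def angle(eye_x, z):
        # Direction of a point on the midline at distance z, from this eye.
        return math.atan2(-eye_x, z)

    left = (angle(-eye_sep / 2, viewing_distance + depth_q)
            - angle(-eye_sep / 2, viewing_distance))
    right = (angle(eye_sep / 2, viewing_distance + depth_q)
             - angle(eye_sep / 2, viewing_distance))
    return math.degrees(right - left)

# Disparity grows with the depth difference d, and falls off roughly
# with the square of the viewing distance v, as described in the text.
```

For example, doubling the viewing distance while keeping the depth difference fixed shrinks the disparity by nearly a factor of four, which is the v-squared behaviour the text describes.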
3 Method

3.1 Visual Stimulus
Fig. 2, ten sample images of the visual stimulus, with ten different illumination directions.
Fig. 3, five sample images used for the visual stimulus; each scene is illuminated with a different number of light sources, from one (left) to five (right).
Fig. 4, six sample images of the same scene, rendered from six different viewing positions: 15, 30, 45, 60, 75, and 90 degrees relative to a fixed pivot point located at the centre of the ground among the objects.
Fig. 5 Interlaced image with two identical views
Fig. 6, Interlaced image with different views, one for each eye.
The images used for the visual stimuli were created using Radiance [38], a physically based 3D modelling and lighting simulation system. To eliminate the effect of other variables, the images contained only simple objects and were rendered in black and white without any texture mapping. While the visual factors of interest were varied across the images, the geometry and the spatial arrangement of the objects remained fixed.
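The scenes themselves are not published; for illustration, a minimal Radiance input of the kind described (a grey diffuse object, no textures, and a point-like light whose position can be varied) might look like the following fragment. The material and object names are our own:

```
# grey diffuse material: no colour, no specularity, no texture
void plastic grey_mat
0
0
5 .5 .5 .5 0 0

# a simple sphere resting on the ground plane
grey_mat sphere ball
0
0
4 0 0 0.25 0.25

# a small emitting sphere; its position is the experimental variable
void light bright
0
0
3 100 100 100
bright sphere lamp
0
0
4 -2 -2 3 0.05

# Typical rendering pipeline (one view per eye for the stereo pairs):
#   oconv scene.rad > scene.oct
#   rpict -vp 0 -3 1 -vd 0 1 -0.2 scene.oct > left.hdr
```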
There are 21 visual stimuli in total for each group. They are categorised into three visual factors of the scene: illumination directions (10 images, see Fig. 2), light source numbers (5 images, see Fig. 3), and view points (6 images, see Fig. 4). For the group with the stereo viewing condition, all of the stimuli were generated in pairs with view points offset in the horizontal direction, in order to obtain a left and right view for each eye. For the group without the stereo viewing condition, only one view of the scene was rendered and then duplicated to make a pair containing two identical views. After rendering, each pair of images was taken into stereo imaging software (we used "3D Combine" [40]) to make an interlaced stereo image (see Figs. 5 and 6) to be viewed through the shutter glasses.

3.2 Subject and Grouping

There were a total of 40 subjects participating in the experiment. They were recruited from the students taking the computer graphics course in the Department of Computer Science at the University of Bristol, and thus had general knowledge of 3D computer graphics. They all had normal or corrected-to-normal vision. The 40 subjects were divided into two groups: one viewed the visual stimuli in stereo, while the other viewed the stimuli generated from two identical images. Both groups of subjects wore shutter glasses throughout the experiments (see Fig. 7). The shutter glasses achieve stereo using a frame-sequential technique, enabling the subjects to see line-interlaced stereoscopic images. The glasses alternately "shutter", i.e. block, the viewer's left and right eyes from seeing an image, while the stereoscopic image is shown in the sequence left-image, right-image, synchronously with the shuttering of the glasses.

Fig. 7: The experimental setup

The visual stimuli were displayed on a 19-inch CRT monitor with the refresh rate set to 120 Hz, to reduce as much as possible the flicker caused by the shutter glasses. The interlaced images were 640x640 pixels and were displayed at the centre of the screen, with the screen resolution set to 1024x768. These settings remained the same for all experiments. Both groups of subjects were asked to wear the shutter glasses throughout the experimental process. By doing so we removed two factors which could possibly have affected the experimental results. Firstly, any side effects caused by the shutter glasses themselves, such as the light being reduced by half (resulting in dimmer images) or ghost flicker caused by a low refresh rate of the display, were the same for both groups. Secondly, the subjects could not guess whether they would be viewing stereoscopic images or not.
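The interlacing step described in Section 3.1 amounts to taking alternate pixel rows from the left- and right-eye views. A minimal sketch with NumPy (array shapes and names are illustrative; the experiments used the "3D Combine" package, not this code):

```python
import numpy as np

def interlace(left, right):
    """Row-interlace a stereo pair: even pixel rows are taken from the
    left-eye view and odd rows from the right-eye view, matching the
    line-interlaced format the shutter glasses present to each eye."""
    if left.shape != right.shape:
        raise ValueError("stereo pair must have matching dimensions")
    out = left.copy()
    out[1::2] = right[1::2]  # overwrite odd rows with the right view
    return out

# For the non-stereo group, the "pair" is a single view duplicated,
# so interlacing it leaves the image unchanged.
```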
3.3 Experiment Procedures

To run the experiments we used a standard Pentium IV PC running Presentation [27], a Windows-based program for stimulus delivery and experiment control. It enables the experimenter to present the visual stimuli to the subjects in a random order, and it records the timing of the subjects' responses.
The experiments took place in a dark room with no light sources other than the CRT monitor used for displaying the visual stimuli. Participants carried out the experiments individually (see Fig. 7). The experimenter controlled the presentation of the visual stimuli. Subjects were given clear instructions to report "Real" or "Not Real" verbally to the experimenter when a visual stimulus was shown on the screen. The experimenter then immediately entered the response into the Presentation program using pre-programmed mouse buttons. The first experiment carried out was illumination directions, the next was light source numbers, followed by view point variances. A few pre-trials were carried out so that the participants could become familiar with the whole procedure. All of the images in each experiment were presented on the screen in a random order.
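Presentation is a commercial package, but the core of the procedure above (randomized presentation with millisecond response-time logging) can be sketched as follows. The function names are our own, not Presentation's API:

```python
import random
import time

def run_block(stimuli, show, get_response, rng=random):
    """Present stimuli in random order and record response times.

    `show(stim)` displays a stimulus; `get_response()` blocks until the
    experimenter keys in the subject's verbal "Real"/"Not Real" answer
    and returns it.  Times are logged in milliseconds, as in the
    experiments.
    """
    order = list(stimuli)
    rng.shuffle(order)  # randomize presentation order per subject
    log = []
    for stim in order:
        show(stim)
        start = time.perf_counter()
        answer = get_response()
        rt_ms = (time.perf_counter() - start) * 1000.0
        log.append((stim, answer, rt_ms))
    return log
```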
4 Results
The data collected in the experiments was the subjects' response time, recorded in milliseconds by the Presentation program. The analysis of the data is described in the following sections.

4.1 Illumination Directions VS Stereo

The result of the first experiment is shown in Diagram 1. It consists of a bar chart with the horizontal axis representing the ten different illumination directions (D1-D10) and the vertical axis representing the subjects' average response time (in milliseconds). The grey bar represents the average response time computed from the subjects' performance in the group with the stereo viewing condition. The hatched bar shows the results for the group viewing without stereo.

Diagram 1: Average response time (ms) against illumination direction (D1-D10), for the stereo and non-stereo groups.

As can be seen in Diagram 1, the subjects generally took more time to make a judgement about whether the image they were viewing was real or not. This phenomenon is repeated across all the different illumination directions in the scene.

4.2 Light Source Numbers VS Stereo

Diagram 2 shows the results from the second experiment, light source numbers with or without stereo.

Diagram 2: Average response time (ms) against number of light sources (L1-L5), for the stereo and non-stereo groups.

Again, right across the different light-source-number settings, the results showed that the participants took more time to decide whether they were viewing real or computer-generated images when the visual stimuli were stereoscopic images.

4.3 View Point Variances VS Stereo

Diagram 3 illustrates the results of the third experiment. Here the horizontal axis represents the variance in view points.

Diagram 3: Average response time (ms) against view angle (15-90 degrees, fixed pivot point), for the stereo and non-stereo groups.

Of the six different view-point settings, the averages of subjects' response times were again all found to be higher in the stereo-viewing group.
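The group comparisons above are tested in Section 5 (Table 1) with independent-samples t tests. A pooled-variance version, the form whose degrees of freedom match Table 1, can be sketched as follows (in practice a statistics package such as SciPy's `stats.ttest_ind` would be used):

```python
import math

def independent_t(a, b):
    """Pooled-variance independent-samples t statistic and its
    degrees of freedom."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (mean_a - mean_b) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# With 10 stimuli x 20 subjects per group, as in the illumination
# experiment, the degrees of freedom are 200 + 200 - 2 = 398,
# matching the first row of Table 1.
```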
5 Discussion

The findings of the experiments have confirmed that binocular visual factors do indeed affect our judgments of the visual realism of computer generated images. This
is somewhat expected, as such stereo viewing is closer to our real-life viewing experience and as such should be considered when developing more realistic rendering techniques. Table 1 shows the results of the independent-samples t tests used to compare the mean results from all the experiments. As can be seen, there is a significant difference between the results from the stereo and non-stereo viewing conditions.

Experiment                 degrees of freedom    p (two-tailed)
Illumination directions    398                   .0001
Light source numbers       198                   .047
View point variances       238                   .035

Table 1: Independent-samples t tests between groups (alpha = 0.05)

Many of the modern rendering techniques have been criticized as inefficient, in that they compute a great deal of information that the human visual system is simply unable to process. In [39], Yee et al. presented a method to accelerate global illumination computation in pre-rendered animations by taking advantage of limitations of the human visual system. The results from our experiments suggest we may be able to go even further in reducing unnecessary computation, such as soft shadowing, when binocular depth cues are present.

Our results show that it takes at least 20 more seconds to decide on the realism of a scene when using stereo vision. This implies that when presented with a dynamic scene, our visual system should be able to determine even less detail in stereo than we are able to process in the same time when viewing monocularly. Of course, the reality of a scene itself is difficult to define, and everyone could have his or her own definition. This makes it difficult to quantify visual realism. In this paper we chose to focus on analyzing the timing data rather than the "Real" or "Not Real" answers themselves, so as not to over-simplify the complex nature of visual realism.

Rather than experimenting with specific tasks that involve depth perception, as presented in several previous studies, this paper has emphasized the visual realism related to stereo vision. The findings have supported the initial hypothesis of this study. This therefore serves as a preliminary study to bridge the gap between photorealism and reality. Moreover, from a computational point of view, by perceiving real depth from binocular disparity we might be able to reduce the computation of monocular depth cues such as shadows, while maintaining the same perceptual response.

6 Conclusion and Future Work

A new way of looking into the visual realism of computer generated imagery has been proposed. Studies in computer graphics for achieving realistic rendering have mainly been concerned with monocular cues. We believe that true realism (subjects feeling as if they are viewing a real scene rather than a photograph) should be achieved by considering stereo vision, which is the dominant human viewing experience.

Future work will consider comparing the virtual stereoscopic scenes with real scenes. We will design further experiments to test the visual realism of the computer-generated images against real scenes rather than photographs. We will also investigate just how much computational effort we can avoid, such as computing soft shadows, when binocular depth cues are present in the scene.

Acknowledgements

We would like to thank all the participants in the Department of Computer Science for taking part in the experiments.

References

[1] K. Arthur, K. S. Booth, and C. Ware. Evaluating 3D Task Performance for Fish Tank Virtual Worlds. ACM Transactions on Information Systems, 11(3):239-265, 1993.

[2] C. Barbour and G. Meyer. Visual Cues and Pictorial Limitations for Computer Generated Photorealistic Images. The Visual Computer, 9:151-165, 1992.

[3] M. L. Braunstein, G. J. Anderson, M. W. Rouse, and J. S. Tittle. Recovering viewer-centered depth from disparity, occlusion and velocity gradients. Perception & Psychophysics, 40:216-224, 1986.

[4] A. L. Bridges and J. M. Reising. Three-dimensional stereographic pictorial visual interfaces and display systems in flight simulation. In True Three-Dimensional Imaging Techniques and Display Technologies, D. F. McAllister and W. E. Robbins, Eds. Proceedings of SPIE, vol. 761, SPIE Press, Bellingham, WA, 102-109, 1987.
[5] M. E. Brown and J. J. Gallimore. Visualization of three-dimensional structure during computer-aided design. International Journal of Human-Computer Interaction, 7(1):37-56, Jan.-Mar. 1995.

[6] V. Bruce, P. Green, and M. Georgeson. Visual Perception: Physiology, Psychology, and Ecology. Psychology Press, East Sussex, UK, 1996.

[7] N. Bruno and J. E. Cutting. Minimodularity and the perception of layout. Journal of Experimental Psychology, 117:161-170, 1988.

[8] K. Chiu and P. Shirley. Rendering, Complexity, and Perception. In Proc. of the 5th Eurographics Rendering Workshop. Springer-Wien, New York, NY, 1994.

[9] J. J. Clark and A. L. Yuille. Data Fusion for Sensory Information Processing Systems. Kluwer Academic, Dordrecht, Netherlands, 1990.

[10] C. Erkelens. Fusional limits for a large random-dot stereogram. Vision Research, 28:345-353, 1988.

[11] D. H. Fender and B. Julesz. Extension of Panum's fusional area in binocularly stabilized vision. Journal of the Optical Society of America, 57:819-830, 1967.

[12] D. Ferster. A comparison of binocular depth mechanisms in areas 17 and 18 of the cat visual cortex. Journal of Physiology, 311:623-655, 1981.

[13] C. F. Foard and D. C. K. Nelson. Holistic and analytic modes of processing: The multiple determinants of perceptual analysis. Journal of Experimental Psychology, 113:94-111, 1984.

[14] W. R. Garner and G. L. Felfoldy. Integrality of stimulus in various types of information processing. Cognitive Science, 1:225-241, 1970.

[15] J. M. Hollerbach, W. B. Thompson, and P. Shirley. The convergence of robotics, vision, and computer graphics for user interaction. International Journal of Robotics Research, 18:1088-1100, November.

[16] E. Horvitz and J. Lengyel. Perception, Attention, and Resources: A Decision-Theoretic Approach to Graphics Rendering. In Proc. of the Thirteenth Conf. on Uncertainty in AI, pp. 238-249, Providence, 1997.

[17] G. S. Hubona, P. N. Wheeler, G. W. Shirah, and M. Brandt. The relative contributions of stereo, lighting, and background scenes in promoting 3D depth visualization. ACM Transactions on Computer-Human Interaction, 6(3):214-242, September 1999.

[18] I. Gordon. Theories of Visual Perception. John Wiley & Sons, New York, NY, 1997.

[19] E. B. Johnston, B. G. Cumming, and A. J. Parker. Integration of depth modules. Vision Research, 33(5/6):813-826, 1993.

[20] C. A. Kelsey. Detection of visual information. In Perception of Visual Information, W. R. Hendee and P. Wells, Eds. Springer-Verlag, Vienna, Austria, 30-51, 1993.

[21] J. T. Kajiya. The rendering equation. In D. C. Evans and R. J. Athay, editors, Computer Graphics (SIGGRAPH '86 Proceedings), volume 20, pages 143-150, August 1986.

[22] L. Kjelldahl and M. Prime. A study on how depth perception is affected by different presentation methods of 3D objects on a 2D display. Computers & Graphics, 19(2):199-202, March 1995.

[23] J. Lengyel. The Convergence of Graphics and Vision. IEEE Computer, July 1998.

[24] G. R. Lockhead. Processing dimensional stimuli: A note. Psychological Review, 79:410-419, 1972.

[25] A. Mcnamara, A. Chalmers, T. Troscianko, and I. Gilchrist. Comparing Real & Synthetic Scenes using Human Judgement of Lightness. In Proc. of the Eurographics Workshop on Rendering. Springer-Verlag, 2000.

[26] G. Meyer, H. Rushmeier, M. Cohen, D. Greenberg, and K. Torrance. An Experimental Evaluation of Computer Graphics Imagery. ACM Transactions on Graphics, 5(1):30-50, 1986.

[27] Neurobehavioral Systems, Presentation program. http://www.neurobehavioralsystems.com/

[28] K. N. Ogle. Researches in Binocular Vision. Hafner, New York, 1964.

[29] R. Patterson. Human Stereopsis. Human Factors, 34(2):669-692, 1992.

[30] S. Pattanaik, J. Ferwerda, K. Torrance, and D. Greenberg. Validation of Global Illumination Solutions Through CCD Camera Measurements. In Proc. of the 5th Color Imaging Conf., Society for Imaging Science and Technology, pp. 250-253, 1997.

[31] H. Rushmeier, G. Larson, C. Piatko, P. Sanders, and B. Rust. Comparing Real and Synthetic Images: Some Ideas About Metrics. In Proc. of the Eurographics Rendering Workshop 1995. Springer-Wien, New York, NY, 1995.

[32] C. M. Schor and C. W. Tyler. Spatial-temporal properties of Panum's fusional area. Vision Research, 21:683-692, 1981.

[33] C. M. Schor and I. Wood. Disparity range for local stereopsis as a function of luminance spatial frequency. Vision Research, 23:1649-1654, 1983.

[34] P. Rademacher, J. Lengyel, E. Cutrell, and T. Whitted. Measuring the Perception of Visual Realism in Images. Rendering Techniques 2001, 235-248.

[35] P. Servos, M. A. Goodale, and L. S. Jakobson. The role of binocular vision in prehension: A kinematic analysis. Vision Research, 32(8):1513-1521, 1992.

[36] H. S. Smallman and D. I. A. MacLeod. Size-disparity correlation in stereopsis at contrast threshold. Journal of the Optical Society of America A, 11:2169-2183, 1994.

[37] L. Wanger, J. Ferwerda, and D. Greenberg. Perceiving spatial relationships in computer generated images. IEEE Computer Graphics and Applications, 12(3):44-58, May 1992.

[38] G. J. Ward. The RADIANCE lighting simulation and rendering system. In Proceedings of SIGGRAPH '94 (Orlando, Florida), A. Glassner, Ed., Computer Graphics Proceedings, Annual Conference Series, July 1994, pp. 459-472.

[39] H. Yee, S. Pattanaik, and D. P. Greenberg. Spatiotemporal Sensitivity and Visual Attention for Efficient Rendering of Dynamic Environments. ACM Transactions on Graphics, 20(1), January 2001.

[40] 3D Combine program. http://www.3dcombine.com/index.htm