
Towards Quantifying Depth and Size Perception in 3D Virtual Environments

Jannick P. Rolland*, Christina A. Burbeck°, William Gibson*, and Dan Ariely°

Departments of *Computer Science, CB 3175, and °Psychology, CB 3270, University of North Carolina, Chapel Hill, NC 27599, U.S.A.

Abstract

With the rapid advance of real-time computer graphics, head-mounted displays (HMDs) have become popular tools for 3D visualization. One of the most promising and challenging future uses of HMDs, however, is in applications where virtual environments enhance rather than replace real environments. The difficulty comes from the fact that registering real and virtual environments relative to each other is a difficult task from both a computational and a perceptual point of view, the two being tightly coupled. Given a computational model for stereoscopic viewing, a better understanding of depth and size perception in virtual environments is called for, as well as of the relationships between the perceived depths and sizes of real and virtual objects sharing a common visual space. This paper first discusses the important parameters within the model that need to be carefully evaluated and set for accurate calibration of the system. We then describe the experimental paradigm used for assessing depth perception of two generic objects, in this case a cube and a cylinder. Finally, experimental results on the perceived depth of real and virtual generic objects are presented.

1. Introduction

The advance of 3D visualization techniques presents us with the problem of integrating simulated 3D environments with real 3D environments that we would like to experience simultaneously. If a 3D image is viewed directly on a CRT display, both the display and the image are seen simultaneously. If a 3D image is viewed through a see-through HMD, the intermingling of the real and simulated 3D worlds is even more complex. To provide a seamless integration of the real and virtual images, the absolute depths and sizes of the objects in the two images must correspond appropriately. Then, and only then, will the subject's overall percept be consistent across the 3D space.

Theoretically, see-through can be accomplished using an optical or a video see-through HMD. If video is used to combine real and virtual information, however, the real world is often replaced by a limited field of view projection of the real world, and a new set of problems unfolds. Moreover, there are situations where awareness of the real world is desirable or necessary. For those situations, the optical see-through HMD is superior to both the opaque and the video see-through HMD because useful information can be provided to the user in the visual context of the real world. The main potential drawback of optical see-through HMDs is that virtual objects cannot in principle occlude real objects. But it is necessary in general to distinguish between physical and perceptual properties, and some solutions to the occlusion problem may be found once more basic perceptual properties have been quantified.

In his book on binocular vision, R. W. Reading speaks of visual space as opposed to physical space to clearly distinguish between how things may appear given different
human-made visual stimuli, and how they are in the physical world (Reading, 1983). He defines visual space and physical space as follows: "visual space is made of appearances and scaled off in terms of apparent relationships of objects in the field of view," and "physical space is that form which can be measured with rulers and goniometers." In some cases, the exact nature of the visual space is not very relevant for operating in a physical space. Because humans have the ability to adapt, that is, they can learn to adjust motor responses to what is seen, what is important is that there exists a fixed relationship between the visual and physical worlds. The most common example of this adaptation process is the adjustments people make when getting a new prescription of eyeglasses. When virtual and physical worlds are superimposed and made to interact, however, the intermingling of the real and simulated worlds renders normal adaptations of this type useless. A fixed relationship between the two worlds is insufficient to provide a seamless integration of the real and simulated images; rather, the absolute depths and sizes of the real and virtual objects must correspond exactly. Only then will the observer's overall percept be consistent across the 3D space.

Most psychophysical studies of perception in virtual environments have been carried out as human factors studies of Air Force pilots. The use of virtual displays has been investigated as a potential means to create more effective displays, given the task of detecting visual targets. Two types of display, Head-Up Displays (HUDs) and HMDs, have been used to display symbols during flight runs. The use of virtual displays to visualize symbology was thought to be a means of relieving the pilot of the need to constantly shift attention between the cockpit instruments and the environment outside the window of the aircraft, thereby increasing the pilot's chances of detecting targets in his or her environment. Haines and colleagues have shown, however, that despite the fact that pilots are supposed to have access to symbology and physical target information at the same time, the symbology most often captures the pilot's attention and targets are missed (Haines et al., 1980). This simple example suggests that superimposing virtual and real information in an effective manner is difficult to achieve. Moreover, effectiveness needs to be measured in terms of the task to be performed.

In more recent years, HMDs have become more and more popular in aviation for visualizing simulated scenes for flight training. Several articles by Stanley Roscoe published in the human factors literature describe some of the problems found with virtual displays for flight simulation (Roscoe, 1948, 1979, 1984, 1985, 1987, 1991; Roscoe et al., 1966). His main finding is that pilots have a tendency to overestimate the subjective distance to virtual objects in collimated displays. Pilots also reported a decrease in the apparent size of virtual objects. More specifically, he found that imaged objects such as airport runways appear smaller and farther away than objects subtending the same visual angle seen by direct vision. The decrease in apparent size of objects would then be the cause of the depth overestimation, because objects of given familiar physical sizes appear farther away as their apparent size decreases. Roscoe attributed this discrepancy to inappropriate accommodation.
In such displays, the optical images of the two display screens, on which the simulated images are drawn, are formed at optical infinity. From a purely optical point of view, this would mean that the pilots are accommodating (focusing) at optical infinity to form sharp images of the displays on their retinas. Research has shown, however, that pilots' eyes do not automatically focus at optical infinity (Leibowitz, 1975; Roscoe, 1976; Benel, 1979; Simonelli, 1980; Randle, 1980); rather, they are more likely to focus at their resting point of accommodation, sometimes referred to as their dark focus, which can lie anywhere between 1 m and optical infinity, depending on the individual. This issue of inappropriate accommodation causing a decrease in apparent size may not be, however, the correct or the only explanation for the increase in perceived distance of
virtual objects. Other researchers claim that the coordinated change in accommodation, vergence, and pupil diameter that occurs when looking at nearby objects may explain the phenomenon (Enright, 1989; Lockhard and Wolbarsh, 1989). Other studies have also reported that optically generated displays that present synthetic images of the outside world have the characteristic of producing systematic errors in size and depth judgments (Palmer and Cronn, 1973; Sheehy and Wilkinson, 1989; Hadani, 1991).

We have experienced in our own laboratory some discrepancies between the theoretical modeling of some environments and what was actually perceived in the HMD. An example is the case of navigation in a walkthrough experiment using an opaque HMD (no direct view of the physical environment is then available) with about a 90 degree binocular FOV. Several subjects reported that the floor of the virtual room they walked on seemed lower than they expected, and they behaved as if they had to take a step down to move forward, while no steps were present or visible. It proved to be difficult to navigate comfortably in this environment and to get a good idea of objects' positions, room width, length, and depth. This percept may be the product of inaccurate calibration, since the effect was not reported systematically by all subjects and, moreover, no interpupillary distance adjustments were available on the system. The resulting distorted space, however, could also be due, at least in part, to the relationships of the objects within the virtual environment itself, as well as to the objects' characteristics, such as the types of texture, illumination, and color used.

In this paper we first review the important parameters within the computational model used to generate the stereo images for the two eyes and identify potential sources of inaccuracy in displaying the stereo images that would result in mislocation of objects in virtual space. We then describe the experimental setup and the calibration procedure performed to test our computational model. Finally, we describe the experimental paradigm used to assess the perceived depth of generic objects (real or virtual) in real and virtual environments and present the experimental results.

2. Definitions

Aperture: An opening or hole through which radiation or matter may pass.

Aperture stop: A physical constraint, often a lens retainer, that limits the diameter of the axial light bundle allowed to pass through a lens.

Entrance pupil: In a lens or other optical system, the image of the aperture stop as seen from the object space.

Exit pupil: In a lens or other optical system, the image of the aperture stop as seen from the image space.

Pupil: 1. In the eye, the opening in the iris that permits light to pass and be focused on the retina. 2. In a lens, the image of the aperture stop as seen from object and image space.

Chief ray: Any ray that passes through the center of the entrance pupil, the pupil, or the exit pupil.

Field of view: The maximum area that can be seen through a lens or an optical instrument.

Focal point: That point on the optical axis of a lens to which an incident bundle of parallel light rays will converge.

Principal planes: The two conjugate planes in an optical system of unit positive linear magnification.

Principal points: The intersections of the principal planes with the optical axis of a lens.

Nodal points: The intersections with the optical axis of the two conjugate planes of unit positive angular magnification.

3. Computational Model

Generating accurate stereoscopic images for an HMD is a difficult task to perform because of the multiple potential sources of error that can occur at several stages of the computation. The basic idea behind the computational model assumes that if the virtual images of the display screens are centered on the eyes of the subject and the center pixels of the two display screens are turned on, the lines joining the centers of perspective chosen for the eyes to those bright pixels will be parallel and the subject will see a point of light at infinity. Any 3D point of light at a finite distance can be rendered by turning on the appropriate pixels on each display such that the lines of sight of the eyes passing through those pixels converge at that distance. This concept, very simple in nature, is actually far from simple to implement with accuracy in an HMD (as we have ourselves discovered). A more complete description of the computational model used in HMDs can be found in (Robinett and Rolland, 1992). A computational model that details the appropriate transformation matrices in the most general case, where the eyes may be decentered with respect to the virtual image of the optics and the optical axes of the viewer may be non-parallel, is under preparation (Robinett and Holloway, 1993).

Sources of inaccuracy in the depth location of 3D virtual objects include ignoring the optical system used to magnify the images, underestimating or overestimating the field of view (FOV), inaccurately specifying the center of perspective, and finally ignoring the variations in interpupillary distance (IPD) between subjects. The approach of tweaking the software parameters until things look about right is sometimes all one needs to get a subjective percept of depth. Such an approach, however, is far too subjective for many virtual environment applications (Bajura et al., 1992). Rather, an accurate calibration becomes absolutely necessary. While this is especially true for cases where virtual and real environments meet and interact, as with see-through HMDs, it will benefit opaque HMDs as well.

With the sense of vision as our main emphasis here, an HMD must be thought of as a four-component system: the display screens, the optical magnifiers, the eyes of the viewer, and the head-tracking device. For the display software to generate the correct set of stereo pairs to be imaged on the retinas, the software must model the hardware components involved in creating the images presented to the eyes. This includes the characteristics of the frame buffer where the images are rendered, the display screens' parameters, some specifications of the imaging optics, and the position of the optics with respect to the eyes of the wearer. An important point to note is that the spatial relationship of these components relative to each other is most important to the calibration of the system. If more than the sense of vision is considered, additional components could be used for audio, touch, manipulation, force feedback, and smell. Even though we shall only speak of the sense of vision, some of the issues addressed here may be generalized to systems addressing other senses as well.

3.1 Image Formation

We shall give a brief description of the image-forming process in order to understand how it can affect the measurements made on the images, such as those of our 3D visual system. An optical system can be described most generally by its principal points, its nodal points, and its focal length; a good review of the basics of geometrical optics can be found in (Longhurst, 1973). Let's first consider a monocular system. Given the position of the display screen with respect to the first (object) principal point of the optics, the position
and size of the virtual image can be determined with respect to the second (image) principal point. The display screens and their corresponding virtual images are said to be conjugates of one another. In the case of a binocular system, one would apply the same argument to each eye. We shall point out that this general description of optical imaging only applies to the imaging process between two conjugate planes. Moreover, a plane in the image space can only be the conjugate of a single plane in object space. This means that if one is to image multiple planes in object space onto one plane in image space, only objects within one plane in object space will be in sharp focus at the image plane. Other objects will be out of focus, and each point on those objects will be imaged as a blurry spot at the image plane. In this case, the locations of the entrance and exit pupils of the optical system need to be known, since the center of the blurry spot corresponds to the intercept of the chief ray with the image plane, rather than the intercept of the corresponding nodal ray with the image plane. A concrete application of this point will be encountered when we discuss the choice of the eyepoint location for the computational model.

3.2 Display Screen Alignment and Positioning

We shall assume in the following discussion that 1. the eyes of the subject are centered on the optical axes of the optics, and 2. the eyes are reduced to one physical point that overlaps the theoretical center of projection used to calculate the stereo projections for the left and right images. A discussion of the effect of the mislocation of the eyes is given in section 3.3. Given the above assumptions, in order for the images to be easily fused and for computational depths to correlate accurately with the displayed depths, the display screens must be 1. centered on the optical axes of the optics, which will also maximize the monocular FOVs; 2. kept from unwanted rotations around the optical axis; and 3. magnified equally through the optics.

Any lateral misalignment of the display screens with respect to the optical axis will cause an error in depth location unless compensated for by the software. An existing lateral misalignment may be due to the fact that the displays are physically too large to be mounted centered on the optics, as in the VPL and Flight Helmets (Vision Research), or may simply be the result of mechanical limitations in mounting the displays centered on the optics with a precision of a fraction of a pixel. In the case of our experiments, we built an optical bench prototype HMD using off-the-shelf optical components in order to have complete access to all the parameters of the system and to eliminate the lateral misalignments that are often found in commercially available systems. Even though we used 3 inch diagonal LCD displays, we adopted a geometry for the system that allows the displays to be centered on the optics to within residual mechanical and assembly errors. A picture of the setup is shown in Figure 3. The exact amount of residual lateral misalignment has proven to be difficult to estimate in number of pixels, and a calibration technique derived from our computational model is described in detail in section 4.2.

A vertical misalignment of the displays with respect to the optics will either 1. shift the objects vertically within the FOV if the amount of offset is equal for the two eyes, or 2. affect the ability to fuse the images due to induced dipvergence of the eyes of the wearer. While the eyes have a wide latitude in fusing images with various degrees of lateral shift, the slightest difference in vertical position causes eye strain, possibly headaches, and if the discrepancy is more than 0.34 degree of visual angle the time needed to fuse the images increases exponentially (Burton and Home, 1980). In the particular case of our system, the vertical dimension of the display area was 40 mm, and an angle of 0.34 degree at the eyes
between the two displays corresponds to a vertical displacement of one display with respect to the other of about 0.5 mm, which corresponds to roughly 6 addressable vertical lines out of the total of 442 addressable lines available. An unwanted rotation of the display screens around the optical axis can prevent the fusion of the two images for severe angles. In any case, it introduces distortions of the 3D objects being formed through stereo fusion that may be beyond tolerances. Finally, the positioning of the display screens with respect to the optics must be the same for the two eyes since it determines the magnification of the monocular images. A difference in magnification for the two eyes will cause a systematic error in depth location for the objects being viewed. A complete model of computational errors in HMDs, and of their effect on object sizes and depth locations, is under way (Holloway, doctoral dissertation in progress).

3.3 Eyepoint

In computer graphics, the center of perspective projection is often referred to as the eyepoint since it is either assumed to be the eye location of the viewer or the location of the video cameras (acting as two eyes separated by an intraocular distance) used for the acquisition of stereo images. Even though the simplest optical system, known as the pinhole camera (Barrett and Swindell, 1981), reduces to a single point that serves as the center of projection or eyepoint, this is generally not the case for most optical systems. For the purpose of accurate calibration of HMDs, the eyepoint for the perspective projection used in the computational model must be defined in terms of a precise location within the eye, a standard eye being about 25 mm in diameter (Levine, 1985). There is much controversy in the literature about which point within the eye must be chosen for the center of projection. Let's look at the meaning of such a point from a computer graphics point of view: searching for a center of projection within the eyes comes from the assumption that stimulating a region of the retina produces a perception of light that is projected out into the surrounding environment through "a center of projection". This means that there exists a point-to-point relationship between a point object in the surrounding environment and a point image on the retina. According to this purely geometrical projection, the centers of projection for the object and image points are the first and second nodal points of the eye, respectively, as we and others have claimed (Robinett and Rolland, 1991; Deering, 1992). The controversy arises from the fact that not all points in the field of view are in focus at the same time, as has been pointed out by Ogle (1962). In this case, any point in the surrounding environment that is out of focus corresponds to a blurry spot at the retina. The important point is that the center of the blurry spot is determined by the corresponding chief ray passing through the eye. That is, the fact that the nodal points correspond to the optical center of projection of an optical system is only true for precisely conjugate planes. For this reason, Ogle suggests that choosing the entrance pupil of the eye as the center of projection in object space is more realistic.
This subtle point is related to the fact that the nodal point of an optical system such as the eye is a useful geometrical construct for describing the imaging properties of conjugate planes in an optical system, but has little to do with the way the retina is stimulated by light coming from the surrounding environment at different depths. Note that the first nodal point of the human eye lies only about 3.5 mm behind the entrance pupil in a standard eye (Ogle, 1962). However, the precise localization of the center of projection may have some impact on the value given to the FOV.
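To give a rough sense of scale for this effect, here is a minimal Python sketch (our own illustration, not part of the original analysis); the 40 mm image height and 1 m image distance are assumed example values, and the 3.5 mm nodal-point offset is the figure quoted above.

import math

def subtense_deg(half_height_mm, distance_mm):
    # Full angular subtense of an image of the given half-height viewed from the given distance.
    return 2.0 * math.degrees(math.atan(half_height_mm / distance_mm))

half_height = 20.0        # mm, half of an assumed 40 mm tall virtual image
pupil_to_image = 1000.0   # mm, assumed distance from the entrance pupil to the virtual image
nodal_offset = 3.5        # mm, the first nodal point lies about 3.5 mm behind the entrance pupil

from_pupil = subtense_deg(half_height, pupil_to_image)
from_nodal = subtense_deg(half_height, pupil_to_image + nodal_offset)
print(from_pupil, from_nodal, from_pupil - from_nodal)  # the difference is small for a distant image

For collimated images the choice makes no difference at all to the FOV, as discussed in the next section; for near virtual images the discrepancy grows as the image distance shrinks.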


3.4 Field of View (FOV)

The vertical and horizontal FOVs of an optical system working at finite conjugates are generally specified by 1. the vertical (V) and horizontal (H) dimensions of the display screens, respectively; 2. the focal length (f) of the magnifier; 3. the distance (D) of the display screens with respect to the first principal plane of the optics; and 4. the location (L) of the entrance pupils of the eyes with respect to the virtual images of the display screens formed by the optical system. One may combine D and f to calculate the position D' of the images of the displays formed through the optics with respect to the second principal plane, and thus the magnification m of the optical system for this set of conjugate planes. The FOV is then most generally given by

FOV = 2Θ = 2 tan⁻¹(my / L),                (1)

where y equals V/2 or H/2 for the vertical and horizontal FOVs, respectively. We shall, however, describe two optical configurations that are exceptions to this rule: one where the FOV is independent of the eye location, and another where the FOV is independent of the display screen location with respect to the optics, in other words, independent of the system's optical magnification.


Figure 1. Model of a lens by its principal planes P and P' and its focal points F and F'. For the display screen at F, the virtual image of the display is at infinity and subtends a visual angle of 2Θ, independent of the position of the eye.

The FOV subtended at the eyes is independent of the position of the eyes with respect to the optics, as shown in Fig. 1, when L becomes infinite. The images are then referred to as collimated images, the magnification m becomes infinite as well, and one characterizes the size of an object in the FOV by its angular subtense at the eyepoint. In this particular case, the FOV depends solely on the size of the display and the focal length of the optical system. If we denote the focal length of the optics P'F' as f', the FOV is now given by

FOV = 2Θ = 2 tan⁻¹(y / f').                (2)

We shall note, however, that given a physical dimension for the lens and a size for the pupil of the eye, only a limited range of eyepoint locations will produce an unvignetted FOV. Rays of light emitted by the display screens are said to be vignetted by the lens if they intercept the lens plane at a height larger than the semi-diameter of the lens. As the eyes get farther away from the lens, the effective rays from a point on the screens to the pupil of the eyes intercept the lens closer to its edge and will eventually be vignetted first and then completely occluded. The other exception to the rule is the case where the FOV is independent of the position of the display screens with respect to the optics. This occurs when the eyepoints are located at the focal points of the lens, as shown in Fig. 2. In this case, the FOV can also be calculated using Eq. 2.
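As a concrete illustration of Eqs. 1 and 2, the following minimal Python sketch (our own; it models the magnifier as a thin lens, so the principal planes coincide, and the display-to-lens distance and eye relief used in the first case are assumed example values) computes the FOV for a finite-conjugate configuration and for the collimated case.

import math

V = 40.0        # mm, vertical dimension of the display screen (value quoted in section 4)
f = 86.24       # mm, focal length of the magnifier (value quoted in section 4)
y = V / 2.0     # half-height used in Eqs. 1 and 2

def fov_finite_conjugates_deg(y, f, display_to_lens, eye_to_lens):
    # Eq. 1: the display sits at a distance display_to_lens (< f) from the lens and forms a
    # magnified virtual image on the display side; L is the eye-to-virtual-image distance.
    image_to_lens = f * display_to_lens / (f - display_to_lens)   # thin-lens virtual image distance
    m = image_to_lens / display_to_lens                           # transverse magnification
    L = eye_to_lens + image_to_lens
    return 2.0 * math.degrees(math.atan(m * y / L))

def fov_collimated_deg(y, f):
    # Eq. 2: display at the focal plane, so the FOV is independent of the eye position.
    return 2.0 * math.degrees(math.atan(y / f))

print(fov_finite_conjugates_deg(y, f, display_to_lens=80.0, eye_to_lens=25.0))  # assumed example
print(fov_collimated_deg(y, f))   # about 26.1 degrees for the bench prototype described below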


Figure 2. Model of a lens by its principal planes P and P' and its focal points F and F'. For the eye position at F', the FOV is given by 2Θ, independent of the position of the display screen with respect to the lens.

The computed images are rendered into a frame buffer before being scanned out onto the display screens. The default assumption of the graphics library is that the frame buffer (512x640 pixels in our case) maps exactly to the vertical and horizontal dimensions of the screens. In practice, this is often not the case, and the exact overscan of the frame buffer must be measured. What we mean by overscan, in this case, is the region of the frame buffer that does not get drawn onto the display area. This overscan can be accounted for by defining a sub-viewport within the frame buffer that corresponds to the display capability of the screens. The FOV can then be specified as the FOV corresponding to this sub-viewport, which indeed corresponds to the screen size. One common convention is to specify the vertical FOV and a pixel aspect ratio instead of the horizontal FOV. The pixel aspect ratio can be calculated from the physical dimensions of the screens and the vertical and horizontal numbers of scan lines of the frame buffer that are actually visible on the screens.

3.5 Interpupillary Distance

We may want to distinguish between three types of distances: the human subject's interpupillary distance (IPD); the lateral separation of the optical axes of the monocular optical systems, which we shall refer to as the optics baseline; and the lateral separation of the computational eyepoints, which we shall refer to as the computational baseline since it is used to compute objects' positions and shapes on the screens. In most systems, the optics baseline is fixed, the subject's IPD is ignored, and the computational baseline is set to a small value of about 62 mm to allow most users to fuse the images. This still creates problems for people with small IPDs, since they must work harder at fusing the images, and it creates shifts in depth perception for almost everyone since none of the parameters actually match. Again, this may be acceptable for opaque systems but is unacceptable for see-through HMDs. In a calibrated setup, the optics baseline and the computational baseline should be set to the IPD of the subject. Moreover, depending on whether the images are collimated or not, the computational model may have to assume a certain position for the eyes of the subject with respect to the monocular virtual images of the display screens for the FOV to be correctly specified, as described in section 3.4. In the case of collimated images only, we have seen that the FOV subtended at the eyes is independent of the eye location. This can also be said of any point in the FOV. When the two eyepoints are combined to determine the depth location of an object, however, simple geometry shows that virtual objects follow the eyes; that is, they theoretically occupy the same depth as measured from the eyes, regardless of where the latter reside, but in the case of a lateral displacement of the eyes with respect to the optical axes the 3D virtual objects will be laterally displaced by the same amount.

4. Experimental Setup and Calibration

4.1. Experimental Setup

All studies described here were carried out on an optical bench prototype HMD that we designed and built using off-the-shelf optical components. The layout of the optical setup for one eye is given in Fig. 3, while the setup itself is shown in Fig. 4. We strongly believed that building our own setup was necessary in order to have complete access to, and control over, all the different parameters of the system. The displays used for the study were Memorex LCD color displays, while the computer graphics were generated by Pixel-Planes 5, a massively parallel graphics engine developed at UNC-CH under the direction of Henry Fuchs and John Poulton (Fuchs et al., 1989). In addition, the geometry adopted for the layout of the optical system allowed us to eliminate the lateral misalignments that are often found in commercially available systems. Even though the displays used were about 3 inches diagonal, this geometry allowed the displays to be centered on the optics with minimal residual mechanical and assembly errors. The exact amount of residual lateral misalignment has proven difficult to estimate in number of pixels, and a calibration technique derived from our computational model is described below. The vertical and horizontal dimensions of the display screens used were 40 mm and 52.65 mm, respectively, and the focal length of the magnifier was 86.24 mm. The numbers of vertical and horizontal addressable scan lines were measured to be 441 and 593, respectively. For collimated images, which is the case we are presenting in this paper, the vertical FOV is 26.11 degrees as calculated using Eq. 2. A pixel aspect ratio of 1.02 can be calculated from the physical dimensions of the display area and the numbers of vertical and horizontal addressable lines.
We shall point out that neglecting the overscan of the frame buffer would result in an error of about 5 degrees in the specified FOV, which would create objects that look larger and closer than theoretically intended. This could indeed be detected by the calibration procedure described below.
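For reference, the short Python computation below (our own check; it uses the display dimensions, focal length, and line counts quoted in this section together with the 512x640 frame-buffer size mentioned in section 3.4) reproduces the quoted vertical FOV and pixel aspect ratio, and illustrates the kind of FOV error that neglecting the overscan introduces.

import math

V_mm, H_mm = 40.0, 52.65     # visible display dimensions
f_mm = 86.24                 # focal length of the magnifier
lines_v, lines_h = 441, 593  # addressable lines actually visible on the screens
fb_v = 512                   # vertical frame-buffer size assumed by the graphics library

def fov_deg(half_height_mm):
    # Eq. 2 (collimated case): FOV = 2 tan^-1(y / f')
    return 2.0 * math.degrees(math.atan(half_height_mm / f_mm))

vertical_fov = fov_deg(V_mm / 2.0)                     # ~26.1 degrees
pixel_aspect = (V_mm / lines_v) / (H_mm / lines_h)     # ~1.02

# If the overscan were neglected, the full 512-line frame buffer would be treated as if it
# filled the 40 mm display; the FOV it actually corresponds to is several degrees larger.
full_buffer_height = V_mm * fb_v / lines_v
fov_full_buffer = fov_deg(full_buffer_height / 2.0)
print(vertical_fov, pixel_aspect, fov_full_buffer - vertical_fov)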


Figure 3. Optical layout of the magnifier for one eye.

A nice feature of the optical setup was the ability to adjust the optics and computational baselines to match the subject's IPD, down to 60 mm. All of our observers had IPDs between 61 mm and 68 mm, measured with a pupilometer. Any asymmetry between the two eyes was taken into account.

4.2. Calibration Technique

Once the FOV has been set, and the optics and computational baselines have been matched to the IPD of the subject, a quick calibration of the system can be performed to determine whether any computational offsets of the displayed images are necessary to account for unwanted lateral displacements of the displays with respect to the optical axes. If the system has been carefully aligned and the parameters within the computational model correctly set, the magnitude of the offsets should be very small, on the order of a few pixels. The calibration consists of looking monocularly at a symmetrical pattern drawn on the displays in such a way that it would project at a very large distance if seen binocularly (greater than or equal to 200 m is recommended; 6 m, which is often referred to as optical infinity, proved to be insufficient). A symmetrical pattern is recommended since the symmetry is used in the alignment process described below. While the subject is checking the alignment of his/her eyes with the optical axes of the magnifiers, one may draw on a sheet of paper two crosses separated by the IPD of the observer, as well as the midpoint between the two crosses. As the observer looks through the displays monocularly, one positions the two crosses at any depth in space such that, while looking with the right eye, say, the right cross falls in the exact center of the virtual pattern displayed to the right eye. As the observer switches eyes, the center of the left virtual pattern and the left cross should exactly superimpose. If the superimposition for each eye does not hold, some offsets for the
images being displayed must be set in the software until a perfect match occurs. This is best accomplished using a systematic procedure to offset the images.

Figure 4. Bench prototype optical setup designed and built by J.P. Rolland.

Once the calibration at infinity is completed, one can check for consistency by looking at the virtual pattern, which is now projected at a nearby distance. The two physical crosses are now positioned at the same depth from the eyepoints as the virtual pattern. While alternately closing one eye, the observer should perceive the center of the virtual pattern to fall exactly on the midpoint between the two crosses. This calibration technique was derived directly from the basic concept of the computational model of stereo images illustrated in Fig. 5. Note that the calibration procedure never uses both eyes simultaneously, to avoid confusing calibration issues with perception issues; the two eyes are only used sequentially. This consistency check will only be strictly valid, however, if the optical distortion of the optical system has been compensated for (Robinett and Rolland, 1990; Rolland and Hopkins, 1993). The calibration using the optical infinity condition, on the other hand, is always valid regardless of optical distortion, as long as the center of the symmetrical pattern coincides with the center of optical distortion to within a few pixels, which is the case in our experiment. A variant of the above technique could also be used to verify the FOV of the system (Ellis, 1993). It consists of comparing the size of a virtual object with that of an identical real object, again under monocular viewing. Caution must be exercised, however, if optical distortion is present in the display, since monocular images will appear larger in the case of pincushion distortion, which often limits HMD systems. This method, however, can be used for distortion-free systems.
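A residual calibration offset can be related to a depth error with simple convergence geometry. The sketch below is our own back-of-the-envelope estimate, not a procedure from the paper: it assumes collimated, distortion-free images, takes the angular size of one pixel from the horizontal FOV and the number of visible columns, and treats a horizontal offset of the two images as a change in the convergence angle.

import math

H_mm, f_mm, cols = 52.65, 86.24, 593     # display width, focal length, visible columns (section 4)
ipd_mm = 64.0                            # assumed example IPD / baseline

h_fov_rad = 2.0 * math.atan((H_mm / 2.0) / f_mm)   # horizontal FOV (Eq. 2, collimated)
pixel_angle = h_fov_rad / cols                     # angular size of one pixel

def perceived_depth_mm(true_depth_mm, offset_pixels):
    # Depth at which the lines of sight converge when the total horizontal disparity is off
    # by offset_pixels; an uncrossed error (images pushed apart) makes the percept farther.
    theta = ipd_mm / true_depth_mm                  # nominal convergence angle, small-angle form
    return ipd_mm / (theta - offset_pixels * pixel_angle)

for n in (0.5, 1.0, 2.0):
    print(n, perceived_depth_mm(800.0, n) - 800.0)  # depth error in mm at a 0.8 m viewing distance

Under these assumptions a one-pixel residual offset corresponds to roughly a centimetre of depth error at 0.8 m, which illustrates why the residual offsets must be kept small.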


[Figure 5 appears here: the left and right eyes, separated by the IPD, view the two screens through the optics (modeled by principal planes P and P'); an object at 2 m, its center marked by a cross located at IPD/2, is drawn at the corresponding cross locations on the left and right screens, and its virtual images are seen by the left and right eyes.]
Figure 5. Computational model of stereo pairs and its direct application to calibration.
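The geometry of Fig. 5 can be written down directly. The following minimal sketch (our own illustration of the model, assuming collimated, distortion-free images, centered optics, and matched baselines; the 64 mm IPD is an example value) computes the horizontal position, in pixels from each screen center, at which the center cross of an object at a given lateral position and depth should be drawn.

import math

H_mm, f_mm, cols = 52.65, 86.24, 593               # display width, focal length, visible columns
ipd_mm = 64.0                                      # assumed example IPD (= optics and computational baselines)
pix_per_rad = cols / (2.0 * math.atan((H_mm / 2.0) / f_mm))

def screen_offsets_pixels(x_mm, z_mm):
    # Horizontal pixel offsets from the left and right screen centers for a point at lateral
    # position x (measured from the midpoint between the eyes) and depth z; positive is to the
    # right in each eye's image.
    left = math.atan((x_mm + ipd_mm / 2.0) / z_mm) * pix_per_rad
    right = math.atan((x_mm - ipd_mm / 2.0) / z_mm) * pix_per_rad
    return left, right

print(screen_offsets_pixels(0.0, 2000.0))    # cross of an object at 2 m, centered between the eyes
print(screen_offsets_pixels(0.0, 6000.0))    # at 6 m the offsets are still several pixels per eye
print(screen_offsets_pixels(0.0, 200000.0))  # at 200 m they are a small fraction of a pixel
print(screen_offsets_pixels(0.0, 1e12))      # at infinity both patterns sit at the screen centers

The last three lines show why the text recommends projecting the calibration pattern at 200 m or more rather than at the conventional 6 m "optical infinity": at 6 m the pattern is still drawn noticeably off the screen centers.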

5. Experimental Studies

In our day-to-day experience in the real world, looking at nearby objects triggers two processes that are put into action almost simultaneously: the eyes converge by rotating inward, and the crystalline lenses accommodate to the location of the object. Accommodation is needed to bring details of the objects into focus. It may seem natural that the actions of accommodation and convergence are definite, and that for any specific accommodation there must be an exactly corresponding convergence. This is not, however, how vision works in most HMDs. In HMDs, one needs to distinguish between the monocular and stereo aspects of vision. With respect to each eye separately, the virtual images of the two LCD displays mounted on the head are formed by the optics at a definite plane in space. The position of that plane depends solely on the distance of the LCD displays from the optical viewers. That distance usually remains fixed once it has been set to a desired value.

A particular setting is that of infinity viewing, or zero accommodation, where the displays are positioned at the focal planes of the lenses. The monocular virtual images are then said to be collimated. Such a setting is often preferred not only because it sets the eyes to a relaxed state of accommodation, but also because what is perceived in the HMD is then less dependent on the exact position of the eyes of the wearer behind the optics, as shown in sections 3.3 and 3.4. Indeed, most stereoscopes are so adjusted that the viewing lenses act as collimators (McKay, 1948).

With respect to the two eyes combined, vision in HMDs is similar to our vision in the real world in that convergence is driven by the location of the 3D objects perceived from the disparate monocular images. The fact that accommodation and convergence are divorced from one another in most HMD settings brings up the question of whether or not the distance at which the monocular images are optically formed has any impact on the perceived depths and sizes of virtual objects embedded in the physical environment, and/or creates distortions in the depths and sizes of virtual spaces. This is especially important in the case of see-through HMDs, where the observer has access to both the simulated and the physical world simultaneously. As mentioned in section 1, discrepancies in distance and size judgments of virtual objects have been reported by Roscoe and others. We started investigating this issue further by studying the case of generic, non-overlapping objects. To study the impact of the uncoupling between convergence and accommodation, we considered the most extreme case of uncoupling, that is, when 3D real and virtual objects are placed at nearby distances while the 2D virtual monocular images are optically collimated.

Paradigm. We asked the subjects to judge the proximity in depth of a virtual object displayed via our see-through HMD with respect to that of a physical object presented simultaneously. The two objects were laterally separated by 110 mm so that, given the subject's viewpoint and the range of distances the virtual object occupied, no overlap of the two objects ever occurred during the experiment. Moreover, such a lateral separation prevented the subjects from basing their judgments upon an edge of the object; the contrast of such edges is very dependent upon the shape of the objects and the lighting model configuration. Generic objects of different shapes, in our case a cube and a cylinder, were chosen for comparison to prevent the subjects from assessing depth solely on the basis of the perceived size of one object relative to the other. For example, if the two objects were of similar shapes but of different sizes, one may be biased to always see a small cube farther away than a larger one. On the other hand, if two cubes of identical size were chosen, then depth assessment could be based solely upon their respective sizes. The virtual and real objects were both white.
Shading in the virtual environment closely matched the shading of the physical objects. We considered three conditions for the objects: both objects real, both virtual, or one real and one virtual. These three conditions are referred to as real/real, virtual/virtual, and real/virtual. One of the objects was always fixed in location, and the other was moved in depth between trials. The screens displaying the virtual objects and the real objects were blanked for a few seconds between trials. No other objects were ever perceivable within the field of view.

Task. Subjects performed a two-alternative forced-choice task. They were asked to indicate (via a key press) whether the object on the right (the one translated in depth between trials) was closer or farther than the object on the left.

Method. We used a method of constant stimuli (MOC) to present the stimuli in any of the three conditions. The depth values presented were chosen before each run to capture the whole psychometric function for each subject and experiment, so that at least 4 points fell along the steepest part of the psychometric curve. Each experiment consisted of 31 blocks of 10 trials, each block corresponding to a random sequence of ten possible depths around a mean value. Subjects used a chin rest to maintain a constant viewing distance. Viewing duration was unlimited and terminated on a decision key press. The subjects used a two-button device for entering their responses. The data were analyzed using probit analysis on the last 30 blocks; the first block was always discarded. The resulting psychometric function was plotted for each subject after each experiment to ensure that at least 4 points lay on the linear part of the psychometric function. Points on the shoulders of the psychometric function were used as anchor points for the subject.

Subjects. Data were collected from three observers at the two viewing distances of 0.8 m and 1.2 m. The subjects were three volunteers, highly interested in the outcomes of these studies. Two were students: one a graduate psychology major with no experience with HMDs, the other a computer science major working on the HMD team. The third subject was a faculty member also working with the HMD team. All observers were tested for stereoscopic vision using a standard test (Julesz random-dot stereograms).

We define the veridicality of perceived depth as the difference between the nominal depth as specified in the computational model and the measured value. We shall denote it ∆PSE since it represents the departure from the expected point of subjective equality. The precision of perceived depth is defined as the smallest difference in perceived depth that can be detected reliably at the 84% level.
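For the analysis stage, a probit-style fit of the two-alternative forced-choice data yields both quantities: the PSE (hence ∆PSE) and the 84% threshold. The sketch below is a minimal illustration of this kind of fit, written by us with a maximum-likelihood cumulative-normal model in place of a classical probit package; the data values are invented for the example.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Invented example data: depth offsets of the movable object relative to the fixed one (m),
# trials per offset, and number of "farther" responses.
offsets   = np.array([-0.06, -0.04, -0.02, 0.00, 0.02, 0.04, 0.06])
n_trials  = np.array([30, 30, 30, 30, 30, 30, 30])
n_farther = np.array([ 2,  5, 10, 16, 23, 27, 29])

def neg_log_likelihood(params):
    mu, sigma = params
    sigma = abs(sigma)                                   # keep the spread positive during the search
    p = norm.cdf(offsets, loc=mu, scale=sigma)           # probability of a "farther" response
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return -np.sum(n_farther * np.log(p) + (n_trials - n_farther) * np.log(1 - p))

fit = minimize(neg_log_likelihood, x0=[0.0, 0.02], method="Nelder-Mead")
mu, sigma = fit.x[0], abs(fit.x[1])
print("PSE:", mu)               # shift of the 50% point; its departure from 0 is the ∆PSE
print("84% threshold:", sigma)  # one standard deviation of the fitted cumulative normal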

Results. Results are shown in Figures 6 and 7 for the veridicality and precision of perceived depth, respectively. Figures 6a, 6b, and 6c show the veridicality of perceived depth for the three basic conditions, real/real, virtual/virtual, and real/virtual, respectively. Similarly, Figures 7a, 7b, and 7c show the precision of perceived depth, plotted this time on a log scale, for the three basic conditions. Each of the three graphs shows the performance of each of the three subjects separately. Summary plots are shown in Figures 6d and 7d for the veridicality and precision of perceived depth, respectively, where the performances per condition were averaged over the three subjects. Figures 6a, 6b, and 6c show that there is on average no mean error in perceived depth if both objects are real or both are virtual, even though the variance associated with the mean values in the virtual/virtual condition is significantly higher. The upward shift of the data points in Figures 6c and 6d means that there is a shift in perceived depth for virtual objects as compared to real objects. A positive ∆PSE, in this case, means that the virtual objects are perceived farther away than the real objects. Figure 7d, which summarizes Figures 7a, 7b, and 7c, shows a significant increase in discrimination thresholds for the two conditions virtual/virtual and virtual/real as compared to the real/real condition.


[Figure 6 appears here: four panels plotting ∆PSE (m) as a function of distance in depth (m) for the real/real, virtual/virtual, and real/virtual conditions and for the average over the three subjects.]
Figure 6. The veridicality of perceived depth, graphed for the three conditions (a) REAL/REAL, (b) VIRTUAL/VIRTUAL, and (c) REAL/VIRTUAL, is plotted as a function of the two depths 0.8 m and 1.2 m for three subjects. Figure (d) summarizes the results for the three conditions after averaging the data over the three subjects. The shifts of the data in (c) and (d) mean that virtual objects are seen farther away than real objects.


[Figure 7 appears here: four panels plotting the depth discrimination threshold (m), on a log scale, as a function of distance in depth (m) for the real/real, virtual/virtual, and real/virtual conditions and for the average over the three subjects.]
Figure 7. The precision of perceived depth, graphed for the three conditions (a) REAL/REAL, (b) VIRTUAL/VIRTUAL, and (c) REAL/VIRTUAL, is plotted as a function of the two depths 0.8 m and 1.2 m for three subjects. Figure (d) summarizes the results for the three conditions after averaging the data over the three subjects. These figures show that depth discrimination thresholds are elevated from about 2 mm in condition (a) to roughly 15 mm in conditions (b) and (c). Moreover, no systematic elevation of thresholds with depth was observed.


6. Discussion

The results on the veridicality of perceived depth indicate that virtual objects are seen farther away than real objects when both objects are presented at the same depth in visual space. Similar findings were reported by Roscoe and others for objects familiar to the subjects, such as airport runways. One fundamental difference between this study and others is that two generic objects, here a cube and a cylinder, were chosen as stimuli rather than more familiar objects. The question is whether this finding is purely a perceptual phenomenon or whether it can be predicted, totally or in part, by taking into account parameters of our system that we have not yet discussed.

One parameter that we have not yet discussed is optical distortion. At the time this first set of studies was done, optical predistortion of the display had not yet been implemented in our laboratory. Even though the optical distortion should not have affected the calibration of the system, as explained earlier, it did affect the placement in depth of the virtual objects, since the objects were laterally separated by 110 mm and fell away from the center of distortion in at least one of the two displays. An illustration of the amount of distortion caused by the optics is shown in Fig. 9. Because we know the exact layout and parameters of the optical system, we can estimate the effect of residual distortion on the depth assigned to objects. An exact calculation of the impact of optical distortion on the results reported in Figures 6 and 7 was carried out using an in-house optical ray-tracing program. The calculations show that pincushion distortion of the individual monocular images has the effect of bringing the 3D virtual objects created from the two disparate monocular images closer to the eyes. The calculated values were about 11 mm and 7 mm for the three subjects at the viewing distances of 0.8 m and 1.2 m, respectively. This means that we underestimated the amount of the shifts in depth of the virtual objects with respect to the real objects. New data are now being acquired using predistorted images (Robinett and Rolland, 1992; Rolland and Hopkins, 1993).

Another more subtle, yet relevant, parameter that could have biased our results is illumination. Illumination in real environments is often punctual (it comes from a point source of light), while illumination in virtual environments is more likely to be directional (it is not spatially localized). Moreover, the spectral range of the illumination can differ in subtle ways. We noticed, for example, that the illumination in the virtual environment was slightly bluish, while the illumination of the real objects was slightly more reddish. This is of importance since it is commonly known that binocular observation of a red light and a blue light located at the same physical depth will often cause the red one to appear closer than the blue, a phenomenon known as chromostereopsis (Vos, 1960). Experimental data reported by Vos show that the maximum amplitude of the effect over 4 observers was 4 mm at a 0.8 m viewing distance. Moreover, half of the observers reported that the red object (a slit in Vos's case) was perceived behind the blue slit.
Given the very slight difference in the spectral range of our illumination sources in the real and virtual environments, and the measurements reported by Vos in saturated environments, we are fairly confident that if chromostereopsis explains part of our findings, it is surely a very small, if not negligible, part. Experiments are under way to confirm our beliefs. An elevation in variance in the measure of veridicality of perceived depth of virtual objects was recorded. It can be interpreted as a decrease in the strength of the percept of depth for virtual objects in the context of that specific experiment. Related to this effect is the need for anchor points on the psychometric functions. The use of anchor points was mentioned in section 5. In their absence, the subjects had a tendency to drift toward larger differences between perceived and physical depth, and to be more inconsistent across repeated measures. This is indeed a point of interest because it suggests that the percept of
depth of the virtual objects was somewhat unstable. That is, it lacked strength in the sense that the objects did not seem anchored to the ground but rather seemed to float somewhat in space. Further investigations of this finding are necessary.

Figure 9. Illustration of pincushion optical distortion.

Depth discrimination thresholds were found to be higher in virtual than in real environments. The elevation in thresholds, however, is less than the elevation that one would predict based on the addressability of the display. This can be explained by the fact that antialiasing is performed on the images. Using a high-resolution display and the same computational algorithm to render the images, a systematic study of the elevation in discrimination thresholds as a function of the amount of blur in the images could set a lower limit on the display resolution actually needed in virtual environments to mimic depth assessments in real environments.

Size and depth perception are intimately related. As a virtual or real object is moved closer it tends to look bigger, or similarly, as it becomes bigger it tends to look closer. This is generally true for familiar objects of known size but less true for generic objects (e.g., a cube or a cylinder) or for unfamiliar objects. Even though the subjects could not compare the relative sizes of the two objects to assess depth, since we eliminated that possibility by choosing two objects of different shapes, they may still have used the size change in trying to assess depth. To determine if size change was an important factor in making depth judgments, we ran some pilot studies under two conditions: 1. the angular subtense of the virtual object being translated in depth from trial to trial was kept constant, or 2. the size of the moving object was randomly selected from trial to trial within a 20% range of the mean value. In the former case, the displayed object was of constant size when using one eye. When using stereo vision, on the other hand, the object appeared to actually shrink in size as it moved forward. This constituted a counter-cue to depth, since real objects look bigger when viewed closer. Under this condition, one of the subjects noticed what was happening but the results seemed unchanged. The second subject was at first fooled and at
first reported the objects as always farther away than the real object. In an attempt to record the new range of perceived depth of the virtual object using a staircase method, the subject was able to recalibrate himself, and the final results remained unchanged as compared to the normal viewing condition. More data need to be collected to fully capture the impact of size perception on the perceived depth of generic objects. In the latter case, no significant changes in the results were observed for the few studies that we ran. More data need to be collected to confirm this finding.

7. Conclusion

We have described the set of parameters that need to be carefully considered, evaluated, and set before any attempt to quantify depth and size perception in HMDs can even be considered. In a first set of experiments, we have quantified depth perception in three conditions, real/real, virtual/virtual, and real/virtual, using an optical bench prototype HMD. Measurements were made at the two viewing distances of 0.8 m and 1.2 m. Our results indicate that, even for generic objects, virtual objects are perceived farther away than real objects. Moreover, no error in either calibration, image generation, or illumination could possibly account for the amplitude of the finding. Those measurements were made for collimated images. We intend to repeat those measurements for predistorted, collimated and uncollimated images. It has been suggested in the literature that inappropriate accommodation may be the main cause of such a perceptual bias. If that is the case, further results may drive the technology for HMDs in new directions, such as building in an autofocus capability so that accommodation and convergence operate in a more natural way.

Acknowledgments

We thank the HMD and Pixel-Planes teams at UNC for their support with the various components, hardware and software, necessary to carry out the studies. Especially, we would like to thank Vern Chi, John Thomas, John Hughes, and David Harrison for their help with the experimental setup. We thank Anselmo Lastra for his stimulating discussions about Pixel-Planes 5 and the graphics library. We thank Erik Erickson and Mark Mine for providing help with part of the software used to carry out the psychophysical studies. We thank Warren Robinett for his constant encouragement during this research and for his help with deriving a reliable and accurate calibration method for the system. We would also like to thank Irene Snyder for analyzing the data, and Russ Taylor and Rich Holloway for volunteering as subjects at different phases of the study. Finally, I would like to give special thanks to my husband Bruce Scher for helping with the alignment of the system and for his constant support during this work. This research was supported by NIH PO1 CA47982-04, DARPA DAEA 18-90-C-0044, NSF Cooperative Agreement #ASC-8920219, DARPA: "Science and Technology Center for Computer Graphics and Scientific Visualization," ONR N00014-86-K-0680, and NIH 5-R24-RR02170.

References

Bajura, M., H. Fuchs, and R. Ohbuchi, "Merging virtual objects with the real world," Computer Graphics (Proceedings of SIGGRAPH '92), 26 (2), 203-210 (1992).

Barrett, H.H., and W. Swindell, Radiological Imaging: The Theory of Image Formation, Detection, and Processing, vol. 1, Academic Press (1981).

Benel, R.A., and D.C.R. Benel, "Accommodation in untextured stimuli fields," Technical Report Eng. Psy. 79-1/AFOSR-79-1, Champaign, IL: University of Illinois at Urbana-Champaign, Department of Psychology (1979).

Burton, G.J., and R. Home, "Binocular summation, fusion and tolerances," Optica Acta, 27, 809 (1980).

Cogan, D.G., "Accommodation and the autonomic nervous system," Archives of Ophthalmology, 18, 739-766 (1937).

Deering, M., "High resolution virtual reality," Computer Graphics, 26 (2) (1992).

Ellis, R.S., personal communications (1993).

Enright, J.T., "The eye, the brain, and the size of the moon: toward a unified oculomotor hypothesis for the moon illusion," in The Moon Illusion, Hershenson, M. (Ed.), Hillsdale, NJ: Erlbaum (1989).

Fuchs, H., J. Poulton, J. Eyles, T. Greer, J. Goldfeather, D. Ellsworth, S. Molnar, and L. Israel, "Pixel-Planes 5: A heterogeneous multiprocessor graphics system using processor-enhanced memories," Computer Graphics (Proc. of SIGGRAPH '89), 23 (3), 79-88 (1989).

Hadani, I., "Corneal lens goggles and visual space perception," Applied Optics, 30 (28), 4136-4147 (1991).

Haines, R.F., E. Fisher, and T. Price, "Head-up transition behavior of pilots with and without head-up display in simulated low visibility approaches," NASA TP-1720 (1980).

Leibowitz, H.W., and D.A. Owens, "New evidence for the intermediate position of relaxed accommodation," Documenta Ophthalmologica, 46, 133-147 (1978).

Levine, M.D., Vision in Man and Machine, McGraw-Hill, New York, p. 61 (1985).

Lockhard, G.D., and M.L. Wolbarsh, "The moon and other toys," in The Moon Illusion, Hershenson, M. (Ed.), Hillsdale, NJ: Erlbaum (1989).

Longhurst, R.S., Geometrical and Physical Optics, New York: Longman (1973).

Mandelbaum, J., "An accommodation phenomenon," Archives of Ophthalmology, 63, 923-926 (1960).

McKay, H.C., Principles of Stereoscopy, American Photographic Publishing Company, Boston (1948).

Palmer, E., and F.W. Cronn, "Touchdown performance with computer graphics night visual attachment," in Proceedings of the AIAA Visual and Motion Simulation Conference, 1-6, New York: American Institute of Aeronautics and Astronautics (1973).

Randle, R.J., S.N. Roscoe, and J. Petitt, "Effects of accommodation and magnification on aimpoint estimation in a simulated landing task," NASA-TP-1635, Washington, DC: National Aeronautics and Space Administration (1980).

Reading, R.W., Binocular Vision: Foundations and Applications, Butterworth Publishers (1983).

Robinett, W., and R. Holloway, "The visual display computation for virtual reality," (in preparation).

Robinett, W., and J.P. Rolland, "A computational model for the stereoscopic optics of a head-mounted display," Presence, 1 (1), 45-62 (1992).

Rolland, J.P., and C.A. Burbeck, "Depth and size perception in virtual environments," invited talk at the Optical Society of America annual meeting, New Mexico (abstract) (1992).

Rolland, J.P., and T. Hopkins, "A method of computational correction for optical distortion in head-mounted displays," Technical Report TR-93-045, Dept. of Computer Science, University of North Carolina at Chapel Hill (1993).

Roscoe, S.N., "The effect of eliminating binocular and peripheral monocular visual cues upon airplane pilot performance in landing," Journal of Applied Psychology, 32, 649-662 (1948).

Roscoe, S.N., S.G. Hasler, and D.J. Dougherty, "Flight by periscope: making takeoffs and landings; the influence of image magnification, practice, and various conditions of flight," Human Factors, 8, 13-40 (1966).

Roscoe, S.N., L.A. Olzak, and R.J. Randle, "Ground-referenced visual orientation with imaging displays: monocular versus binocular accommodation and judgments of relative size," Proc. of the AGARD Conf. on Visual Presentation of Cockpit Information Including Special Devices for Particular Conditions of Flying, pp. A5.1-A5.9, Neuilly-sur-Seine, France: NATO (1976).

Roscoe, S.N., "When day is done and shadows fall, we miss the airport most of all," Human Factors, 21 (6), 721-731 (1979).

Roscoe, S.N., "Judgments of size and distance with imaging displays," Human Factors, 26 (6), 617-629 (1984).

Roscoe, S.N., "Bigness is in the eye of the beholder," Human Factors, 27 (6), 615-636 (1985).

Roscoe, S.N., "Improving visual performance through volitional focus control," Human Factors, 29 (3), 311-325 (1987).

Sheehy, J.B., and M. Wilkinson, "Depth perception after prolonged usage of night vision goggles," Aviat. Space Environ. Med., 60, 573-579 (1989).

Simonelli, N.M., "The dark focus of visual accommodation: its existence, its measurement, its effects," Doctoral dissertation, University of Illinois at Urbana-Champaign (1980).

Vision Research, Flight Helmet, 3193 Belick Street #2, Santa Clara, CA 95054.

Vos, J.J., "Some new aspects of color stereoscopy," JOSA, 50, 785-790 (1960).