Realistic Perspective Projections for Virtual Objects and Environments

FRANK STEINICKE and GERD BRUDER, University of Würzburg
SCOTT KUHL, Michigan Technological University

Computer graphics systems provide sophisticated means to render virtual 3D space to 2D display surfaces by applying planar geometric projections. In a realistic viewing condition, the perspective applied for rendering should appropriately account for the viewer's location relative to the image. As a result, an observer would not be able to distinguish between a rendering of a virtual environment on a computer screen and a view "through" the screen at an identical real-world scene. Until now, little effort has been made to identify perspective projections that human observers judge to be realistic. In this article we analyze observers' awareness of perspective distortions of virtual scenes displayed on a computer screen. These distortions warp the virtual scene and make it differ significantly from how the scene would look in reality. We describe psychophysical experiments that explore a subject's ability to discriminate between different perspective projections and that identify the projections which most closely match an equivalent real scene. We found that the field of view used for perspective rendering should match the actual visual angle of the display in order to provide users with a realistic view. However, we also found that slight changes of the field of view, in the range of 10-20% for two classes of test environments, did not cause a distorted mental image of the observed models.

Categories and Subject Descriptors: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism

General Terms: Human Factors

Additional Key Words and Phrases: Perspective projection, scene perception, psychophysics

This work was partly funded by grants from the Deutsche Forschungsgemeinschaft (DFG) in the scope of the LOCUI project.
Authors' addresses: F. Steinicke (corresponding author) and G. Bruder, Immersive Media Group, University of Würzburg, Germany; email: [email protected]; S. Kuhl, Department of Computer Science, Michigan Technological University.
© 2011 ACM 0730-0301/2011/10-ART112 $10.00. DOI 10.1145/2019627.2019631 http://doi.acm.org/10.1145/2019627.2019631

ACM Reference Format:
Steinicke, F., Bruder, G., and Kuhl, S. 2011. Realistic perspective projections for virtual objects and environments. ACM Trans. Graph. 30, 5, Article 112 (October 2011), 10 pages. DOI = 10.1145/2019627.2019631 http://doi.acm.org/10.1145/2019627.2019631

1. INTRODUCTION

In computer graphics one is often concerned with representing spatial scenes and relationships on a flat display screen. When we view imagery on a typical desktop screen, the screen subtends a relatively small visual angle at the observer's eye compared to the effective visual field of humans (approximately 200 degrees horizontally and 150 degrees vertically [Warren and Wertheim 1990]). We refer to this angle as the Display Field Of View (DFOV) (see Figure 2). Most of today's computer screens have a diagonal size of roughly 13 to 30 inches; assuming an ergonomic viewing distance of 65cm, the DFOV thus ranges from about 28 to 60 degrees diagonally.

In order for a virtual scene to be displayed on a computer screen, the computer graphics system must determine which part of the scene should be displayed where on the screen. In 3D computer graphics, planar geometric projections are typically applied, which make use of a straightforward mapping of graphical entities in a 3D "view" region (the so-called viewing or view frustum) to a 2D image plane. During the rendering process, objects inside the view frustum are projected onto the 2D image plane; objects outside the view frustum are omitted. The exact shape of the view frustum is usually a symmetric or asymmetric truncated rectangular pyramid. The angle at the peak of the pyramid, often denoted as the Geometric Field Of View (GFOV) [McGreevy et al. 1985], should match the DFOV for the imagery to be projected in a geometrically correct way.

Perspective projections of virtual environments on a computer screen are affected by the interplay between the geometric field of view and the field of view provided by the display on which the virtual 3D environment is projected (see Figure 1). For instance, if the GFOV is greater than the DFOV, rendered objects appear smaller or "stretched" on the computer screen, with inverse effects caused by a smaller GFOV. Although the perceptual consequences of rendering virtual scenes with different GFOVs are not clear, graphics designers and developers often choose GFOVs that vary from the DFOV. The choice of a certain GFOV is often driven by aesthetics, by the necessity to display a certain amount of virtual space, or by a rule-of-thumb. This observation is confirmed by statements of developers as well as by a review of existing 3D modeling applications and games.
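As a quick sanity check of these figures (a sketch of ours, not part of the original article), the visual angle subtended by a screen diagonal viewed from distance d is 2·atan(diagonal/(2·d)):

```python
import math

def visual_angle_deg(extent_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by a screen extent viewed at a given distance."""
    return math.degrees(2.0 * math.atan(extent_cm / (2.0 * distance_cm)))

# Screen diagonals of 13" and 30", viewed from an ergonomic distance of 65cm.
for diagonal_inch in (13.0, 30.0):
    diagonal_cm = diagonal_inch * 2.54
    print(f"{diagonal_inch:.0f}in diagonal -> {visual_angle_deg(diagonal_cm, 65.0):.1f} deg DFOV")
# Prints roughly 28.5 and 60.8 degrees, matching the 28-60 degree range above.
```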


Fig. 1. Which Utah teapot best depicts the physical counterpart? The teapot is rendered with different Geometric Fields Of View (GFOVs): panels (a)-(f) show GFOVs from 10° to 60°. The vertical field of view increases by 10° from left (10°) to right (60°); the horizontal field of view is adjusted such that the image ratio is maintained. The virtual camera is translated forward or backward to ensure that the projected teapot covers the same amount of screen space.

Fig. 2. Illustration of the geometric (GFOV), display (DFOV), and object (OFOV) fields of view. The bottom-left inset shows a top view of the scene; the bottom-right inset shows a generic view frustum with its near and far clipping planes and the left, right, top, and bottom extents of the projection area.

Though correct perception of metric properties is not always intended or necessary in the field of entertainment, it is essential for application scenarios in which the virtual scene serves as the basis for physical reconstruction, as is often the case in the virtual prototyping, 3D modeling, architecture, and computer-aided design domains. Due to the known problems with perspective renderings, these applications often also provide orthographic projections, which preserve the relative size of virtual geometry during rendering on the image plane regardless of depth, but do not support linear perspective depth cues. Therefore, only a limited 3D impression can be conveyed by this kind of projection. As a result, switching between orthographic and perspective projections is often required to ensure a realistic spatial impression and accurate perception of scene metrics, emphasizing the importance of undistorted perspective projections.

Perspective projections with matching DFOV and GFOV present a viewer with a geometrically correct image, as if the user were viewing the virtual scene through a window defined by the frame of the screen. It is frequently assumed that, in such a setup, observers perceive the metric properties of a virtual scene almost in the same way as they would perceive the properties of a corresponding real-world scene. However, this assumption has not been confirmed in previous research, in particular for Virtual Reality (VR) display environments [Steinicke et al. 2009; Franke et al. 2008; Ries et al. 2009]. For instance, in immersive VR systems larger GFOVs can be used [Kuhl et al. 2009] to compensate for the compression effects in distance perception that users often experience in VR systems [Loomis and Knapp 2003; Loomis et al. 1996; Willemsen et al. 2009; Witmer and Kline 1998; McGreevy et al. 2005]. In the context of desktop-based environments, little research has been conducted to identify perspective projections that appear most realistic to users and that support correct perception of computer-generated virtual scenes.

In this article we investigate how accurately a user is able to detect perspective distortions when asked to compare a real scene to an equivalent virtual scene rendered with varying amounts of distortion. These experiments also allow us to measure the amount of distortion that people are reliably able to detect.

The article is organized as follows. Section 2 provides necessary background information and presents an overview of related work. Section 3 introduces an exposure study, which reveals problems of using arbitrary or default GFOVs in rendering environments. Section 4 describes psychophysical experiments in which we analyze realistic projections for virtual objects and scenes. Section 5 discusses the experimental results. Section 6 concludes the article and discusses future work.

2. BACKGROUND AND RELATED WORK

2.1 Natural Perspectives in Photography

It is a challenging task to project a 3D space onto a planar display surface in such a way that metric properties of scene objects are perceived correctly by human observers. In order to extract 3D depth, size, and shape cues about visual scenes from planar representations, the human visual system combines various sources of information [Murray 1994]. Since 2D pictorial or monocular depth cues, for example, linear perspective or retinal image size, are interpreted by the visual system as 3D depth, the information they present may be ambiguous [Kjelldahl and Prime 1995]. These ambiguities make it difficult to extract correct depth cues and are intentionally introduced in many common optical illusions [Gillam 1980]. For instance, painters have long known that linear perspective may produce pictures that convey incorrect object properties. Leonardo da Vinci formulated a rule-of-thumb which states that the best distance for depicting an object is about twenty times the size of the object [Zorin and Barr 1995].

In photography, different lenses are used to convey different spatial impressions. For example, in images taken with a wide angle of view, an object close to the lens appears abnormally large relative to more distant objects, whereas in distant shots with a narrow angle of view, the viewer cannot discern relative distances between distant objects. Although such distortions are often unintended, perspective distortions are sometimes applied deliberately for artistic purposes in photography and cinematography [Hagen 1980; Pirenne 1970]. Extension (wide-angle) distortion is often used to emphasize some element of the scene by making it appear larger and spatially removed from the other elements. Compression (telephoto) distortion is often used to give the appearance of compressed distance between distant objects in order to convey a feeling of congestion. Conversely, a normal lens provides a natural perspective, which approximates that of the human eye. As a rule-of-thumb, for a camera with a 35mm image size (43mm diagonal) a normal lens has a focal length of about 50mm, though focal lengths between 40mm and 58mm are still regarded as normal [Stroebel 1999]. In cinematography, focal lengths of about twice the diagonal of the image size are regarded as normal, which corresponds to the typically greater viewing distance of about twice the diagonal of the screen. Although the retinal image changes when a projected 2D image is viewed from different viewpoints, humans are in principle able to perceive stable object properties because of Emmert's law [Vishwanath et al. 2005]. However, it has only been shown that the estimation of object properties does not change when changing the viewpoint; it is not clear whether the estimation was correct in the first place.
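As a rough check of this rule-of-thumb (our own sketch; the helper name is ours), the diagonal angle of view of a lens follows from the image diagonal and the focal length:

```python
import math

def angle_of_view_deg(image_extent_mm: float, focal_length_mm: float) -> float:
    """Angle of view (degrees) of a lens for a given image extent and focal length."""
    return math.degrees(2.0 * math.atan(image_extent_mm / (2.0 * focal_length_mm)))

# Diagonal angle of view on a 35mm frame (43mm diagonal) for "normal" focal lengths.
for focal_length in (40.0, 50.0, 58.0):
    print(f"{focal_length:.0f}mm -> {angle_of_view_deg(43.0, focal_length):.1f} deg")
# The 50mm lens covers about 46.6 degrees diagonally, comparable to the DFOV of a
# typical desktop screen viewed from an ergonomic distance.
```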

2.2 View Frustums

The perspective view frustum defines the 3D volume in space which is projected onto the image plane. As illustrated in Figure 2, a view frustum can be specified in view space by the distances of the near (near ∈ R+) and far (far ∈ R+) clipping planes, and by the extents of the projection area on the near plane (left, right, top, bottom ∈ R) [Shreiner 2009]. Alternatively, a symmetrical view frustum can be specified using only the vertical geometric field of view¹

    GFOV = 2 · atan(top/near) ∈ (0°, 180°),    (1)

and the image ratio (ratio ∈ R+).

The nominal DFOV of a computer screen can be derived from the width (w) and height (h) of the screen as specified by the manufacturer, and from the distance (d) of the user's head to the screen. Assuming a symmetrical view frustum, the vertical display field of view can be calculated as

    DFOV = 2 · atan(h/(2 · d)),    (2)

whereas the horizontal display field of view can be calculated by replacing h with w in Eq. (2).

Ideally, changes of the user's head pose would be tracked and used to update the perspective projection in such a way that the view frustum is always aligned with the half lines extending from the user's eyes (for stereoscopic projections) to the corners of the physical screen (refer to Figure 2) [Burdea and Coiffet 2003; Holloway and Lastra 1995]. Implementing such constantly adjusting projections requires additional hardware, that is, stereoscopic viewing and tracking technology, which is often not available in typical desktop-based environments.

¹For simplicity, we refer to GFOV and DFOV as the vertical fields of view, if not stated otherwise.
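A minimal numeric illustration of Eqs. (1) and (2) (a sketch of ours; the function names are ours), using the 40cm × 30cm screen and 65cm viewing distance of the setup described in Section 3.2:

```python
import math

def gfov_deg(top: float, near: float) -> float:
    """Eq. (1): vertical geometric field of view of a symmetric frustum, in degrees."""
    return math.degrees(2.0 * math.atan(top / near))

def dfov_deg(screen_extent_cm: float, distance_cm: float) -> float:
    """Eq. (2): display field of view for one screen extent (height or width), in degrees."""
    return math.degrees(2.0 * math.atan(screen_extent_cm / (2.0 * distance_cm)))

# Screen used in the experiments below: 40cm x 30cm, viewed from 65cm.
print(dfov_deg(30.0, 65.0))  # vertical DFOV, about 25.99 degrees
print(dfov_deg(40.0, 65.0))  # horizontal DFOV, about 34.22 degrees

# A symmetric frustum whose near-plane half-height is tan(DFOV/2) * near
# reproduces that DFOV as its GFOV:
near = 0.1
top = math.tan(math.radians(25.99) / 2.0) * near
print(gfov_deg(top, near))   # about 25.99 degrees
```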

2.3 Minification and Magnification

Most modern computer screens provide relatively narrow fields of view in comparison to the effective visual field of humans (refer to Section 1). As a result, if a virtual world is rendered such that the GFOV matches the DFOV, a user would see less of the virtual space than he or she would see in the real world with an unrestricted view. Therefore, many graphics applications use a GFOV that is larger than the DFOV to provide a wider view of the virtual scene.

Fig. 3. Illustration of the relationship between the DFOV and GFOV used for perspective rendering. The left image shows a magnification, the right image a minification of the virtual object.

When the GFOV and DFOV differ, the virtual imagery is mini- or magnified [Polys et al. 2005]. As illustrated in Figure 3 (left), if the GFOV is smaller than the DFOV, the displayed image appears magnified on the computer screen, because the image has to fill a larger subtended angle in real space than in virtual space. Conversely, if the GFOV is larger than the DFOV, a larger portion of the virtual scene has to be displayed on the same screen space, and the image appears minified (see Figure 3 (right)). Differences between the visual fields also change the spatial sampling of a rendered image; more or fewer pixels are devoted to different parts of the scene. Mini- or magnification changes several visual cues that provide information about metric properties of objects, such as distances, sizes, shapes, and angles [Kjelldahl and Prime 1995]. For example, minification changes the visual cues in a way that can potentially increase perceived distances to objects [Kuhl et al. 2009]. Magnification, on the other hand, changes these cues in the opposite direction and can potentially decrease perceived distances.

The benefits of distorting the GFOV are not clear. A series of studies has evaluated the role of the GFOV in accurate spatial judgments. Relative azimuth and elevation judgments in a perspective projection were less accurate for GFOVs greater than the DFOV [McGreevy et al. 1985]. This effect has also been noted in see-through stereoscopic displays that match real-world viewing with synthetic elements [Rolland et al. 1995]. Conversely, room size and distance estimation tasks were aided by a larger GFOV [Neale 1996], and the sense of presence also appears to be linked to an increased GFOV [Hendrix and Barfield 1996]. For other tasks, such as estimating the relative skew of two lines, a disparity between DFOV and GFOV was less useful [Rosenberg and Barfield 1995].

Mini- or magnification of the graphics is caused by changing the extents of the near plane, for example, by increasing or decreasing the vertical and horizontal GFOVs (see Figures 2 and 3). The described mini- and magnification can be implemented by means of field of view gains. The gain gF ∈ R+ denotes the ratio between the geometric and display fields of view, gF = GFOV/DFOV. If the GFOV is scaled by a gain gF (and the horizontal GFOV is modified accordingly using ratio), we can determine the mini-/magnification with the following equation:

    m = tan(GFOV/2) / tan((gF · GFOV)/2).    (3)

The mini-/magnification m denotes the amount of scaling that is required to map the viewport (rendered with a certain GFOV and ratio) to the display (defined by its DFOV and ratio). If this mini-/magnification equals 1.0, a person will perceive a spatially accurate image.
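To make Eq. (3) concrete, the sketch below (a minimal illustration of ours; the function name is ours) applies a field of view gain to a GFOV that initially matches the 25.99° DFOV of the setup described later, and reports the resulting mini-/magnification:

```python
import math

def minification(gfov_deg: float, gain: float) -> float:
    """Eq. (3): m = tan(GFOV/2) / tan((gain * GFOV)/2)."""
    half = math.radians(gfov_deg) / 2.0
    return math.tan(half) / math.tan(gain * half)

GFOV = 25.99  # a GFOV matching the DFOV of the experimental setup (degrees)
print(minification(GFOV, 0.8))  # gain < 1: m > 1, magnification
print(minification(GFOV, 1.0))  # matched fields of view: m = 1
print(minification(GFOV, 1.2))  # gain > 1: m < 1, minification
```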

Table I. Single Virtual Objects Used in the Exposure Study

    Object    C1 (GFOV = 25.99°)    C2 (GFOV = 45°)
    (a)       +0.35%                +2.47%
    (b)       −6.98%                −8.63%
    (c)       +1.19%                −2.37%
    (d)       +3.67%                +16.00%
    (e)       −5.57%                −5.09%
    (f)       +2.76%                +9.17%
    (g)       +0.63%                −6.73%
    (h)       +2.27%                +6.77%
    (i)       +2.65%                −0.44%

The values show the amount of distortion perceived by the subjects under the conditions (C1) GFOV = 25.99° and (C2) GFOV = 45°.

Figure 3 (left) illustrates the magnification (m > 1) of an image when the GFOV is decreased (gF < 1), whereas Figure 3 (right) illustrates minification (m < 1) with an increased GFOV (gF > 1). Since we are interested in identifying the observer's ability to discriminate between different perspective projections rather than between minified or magnified environments, we compensate for mini- and magnification by translating the camera along the view direction. The amount of camera translation was selected such that the ratio between the visual angle of the rendered objects, which we refer to as the Object Field Of View (OFOV), and the GFOV remains identical (refer to Figure 1). In a symmetric viewing condition, the distance between the camera and the center of the virtual object can be calculated by top/tan(GFOV/2) for a given GFOV. In cinematography this technique is referred to as the dolly-zoom or vertigo technique [Stroebel 1999]. However, the scenes are displayed with different amounts of distortion (see Figure 1), and can vary significantly from the view of a real-world replica.
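The following sketch (our own minimal illustration, not the authors' code) shows this dolly-zoom compensation: for each field of view gain, the camera distance is recomputed so that the object keeps the same on-screen extent while the perspective distortion changes. The object's half-height is a hypothetical value chosen for illustration.

```python
import math

def camera_distance(half_extent_m: float, gfov_deg: float) -> float:
    """Distance from camera to object center so the object spans the frustum vertically."""
    return half_extent_m / math.tan(math.radians(gfov_deg) / 2.0)

BASE_GFOV = 25.99   # GFOV matching the DFOV, in degrees
HALF_EXTENT = 0.10  # hypothetical half-height of the object, in meters

for gain in (0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6):
    gfov = gain * BASE_GFOV
    d = camera_distance(HALF_EXTENT, gfov)
    print(f"gain {gain:.1f}: GFOV {gfov:5.2f} deg -> camera distance {d:.3f} m")
# Smaller GFOVs pull the camera back, larger GFOVs move it closer, so the
# projected object always covers the same screen space (constant OFOV/GFOV ratio).
```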

3. EXPOSURE STUDY

In an exposure study we analyzed how far discrepancies between GFOV and DFOV affect the perception of metric properties of virtual scenes. To this end, we displayed nine virtual 3D objects (see Table I) with the 3D modeling application 3D Studio Max 2010 SP1 developed by Autodesk.²

²Autodesk's 3D Studio Max 2010 SP1 serves as one example. The default GFOV of the 3D preview within 3D Studio Max is 45°. Similar default GFOVs can be found in Cinema4D, AutoCAD, SketchUp, etc.

3.1 Participants

For the experiment we recruited nine male and four female (age 23-56, ∅: 28) experts in the domains of computer graphics, architectural design, 3D modeling, and CAD, each with at least two years of professional experience. Prior to the experiment, subjects filled out questionnaires to rate their levels of expertise. All had normal or corrected-to-normal vision; six wore glasses or contact lenses. Six subjects rated themselves as experts in computer games, nine as experts in 3D modeling, and four as experts in CAD. All subjects were naïve to the experimental conditions. The total time per subject, including prequestionnaire, instructions, training, experiment, breaks, and debriefing, was approximately one hour for each experiment (refer to Sections 3.2, 4.1.1, and 4.2.1). Subjects were allowed to take breaks at any time, and were informed that they should focus on accuracy rather than speed.


3.2 Material and Methods

Following the ergonomic guidelines described by Ankrum [1999], we positioned subjects in front of a computer screen. Each subject's head was fixed on a chin-rest in such a way that subjects assumed a posture that is both visually and posturally comfortable. We adjusted the chin-rest such that the distance from the eyes to the computer screen was 65cm for each subject. We used a Fujitsu Siemens Scenicview P20-2 with a resolution of 1600 × 1200 and a screen size of 40cm × 30cm. The vertical and horizontal maximum sync rates were 76Hz and 82kHz, with a response time of 8ms. The image was displayed with a contrast ratio of 800:1, an image color temperature of 9300K, and a brightness of 300cd/m². The viewing angle to the monitor was 32.5° below horizontal eye level, and the monitor tilt was 32.5°. This setup resulted in a symmetric viewing condition with a DFOV of 25.99°, whereas 3D Studio Max 2010 SP1 provided a default geometric field of view of 45° for the 3D preview. Considering the DFOV provided by our setup (DFOV = 25.99°), the default GFOV provided by 3D Studio Max corresponds to a field of view gain of gF = 1.73 applied to the GFOV that matches the DFOV.

In order to determine the amount of perceived distortion, subjects had to estimate relations within the tested 3D objects. For instance, when viewing the tire (see Table I(b)), a subject's task was to estimate the ratio between the inner and outer circle; when viewing the fan (see Table I(g)), the task was to estimate the length of the fan blades relative to the stand. For the remaining objects, subjects estimated the ratios of similar radii or side lengths. Subjects were allowed to change their virtual viewpoint by using the trackball metaphor provided by the modeling application. We tested similar relations for all objects under two different conditions: (C1) GFOV = 25.99° (i.e., the GFOV that matches the DFOV), and (C2) GFOV = 45° (i.e., the default GFOV provided by the application). The independent variables were the field of view used, that is, conditions C1 and C2, as well as the considered objects (see Table I(a)-(i)). As the dependent variable we measured the relative deviation of the estimation from the correct metric. We hypothesized that subjects would extract distorted metric properties of a 3D object when the DFOV varies from the GFOV.

H0. Estimation of metric properties improves under condition C1 in comparison to C2.

3.3 Results

Table I shows the discrepancy between the subjects' estimations and the real metric relations of the nine objects (a)-(i) under conditions C1 and C2. We analyzed the metric estimation with a two-way analysis of variance (ANOVA), testing the within-subjects effects of the field of view and the virtual object. The analysis revealed a significant main effect of the different fields of view in general (F(1, 495) = 4.24, p < 0.05) as well as of the fields of view for individual virtual objects (F(8, 740) = 6.34, p < 0.01). Post-hoc analysis with the Tukey test showed that subjects made significantly smaller errors in estimating the object relations for objects (d), (e), and (f) (p < 0.01) under viewing condition C1 compared to viewing condition C2. We found no significant difference between the other objects under conditions C1 and C2. These results support hypothesis H0 that the subjects' ability to estimate object metrics improves under condition C1 in comparison to C2. On average, subjects were about 4% better at estimating the metrics of a 3D object displayed with a GFOV that matches the DFOV (C1) than of an object displayed with the default GFOV of the modeling application (C2). For 7 of the 9 virtual objects, subjects were better at judging metric properties under condition C1. The relative errors made by subjects under condition C1 average 2.90%, whereas under condition C2 they average 6.41%.

3.4 Discussion

The findings of the exposure experiment suggest that subjects have difficulty estimating the metrics of objects correctly when the objects are rendered with default geometric fields of view as used in current modeling applications such as 3D Studio Max, which often differ from the DFOV. The results suggest that the GFOV should be adjusted to the physical viewing condition to provide an undistorted mental impression of displayed objects. However, the results do not reveal whether the estimation of object metrics is optimal for a GFOV that matches the DFOV (gF = 1) or for some other gain (considering perceptual biases [Loomis and Knapp 2003; Willemsen et al. 2009; Steinicke et al. 2009]), nor how aware humans are of perspective distortions of virtual environments displayed on a computer screen. The latter closely relates to the question of how an observer's mental impression of a model is affected by slight field of view mismatches, such as those caused by head movements in front of a display: in particular, whether there is a steep perceptual drop-off from the optimal GFOV towards greatly increased or decreased GFOVs, that is, whether tracking the user's head in front of a display is a necessity, or whether a precalibration of the field of view can provide an accurate mental impression unaffected by slight head movements. We address these questions in the following experiments.

4. PSYCHOPHYSICAL EXPERIMENTS: REALISTIC PROJECTIONS

We performed two psychophysical experiments to identify the optimal GFOV for rendering, and to determine human awareness of perspective distortions of virtual environments displayed on a computer screen. The subject's task was to compare a real-world view of a physical scene to two renderings (with different GFOVs) of a corresponding virtual replica of the same physical scene, and to identify the perspective rendering that matched the real-world view more accurately. In order to manipulate the GFOV, we applied a field of view gain gF ∈ R+ to the virtual camera frustum by replacing the vertical angle fovy of the view frustum with gF · fovy. The horizontal angle fovx was manipulated accordingly with respect to the ratio of the viewport. Since it is rather difficult to visually estimate whether a perspective distortion is the result of a field of view that is too large or too small, we decided to use a two-interval forced-choice task for the experiment. We used the method of constant stimuli, in which the applied gains are not related from one trial to the next, but presented randomly and uniformly distributed [Ferwerda 2008]. Alternately, we presented the subjects with two projections of a virtual scene, both rendered with different gains while preserving the ratio of OFOV and DFOV as described earlier (refer to Figure 1). The two visual stimuli we tested represented typical CAD, 3D modeling, or architecture applications. Since the goal of our experiments was to analyze whether subjects can discriminate different perspective projections of virtual scenes from physical counterparts, we required physical scenes which could be presented simultaneously with their virtual counterparts. In experiment E1 (refer to Section 4.1) we used a Utah teapot, and in experiment E2 (refer to Section 4.2) we presented a replica of our real laboratory. We applied different gains to the GFOV used for the perspective projection of both virtual scenes, ranging between 0.4 and 1.6 in steps of 0.2.


We considered each gain constellation six times, but omitted constellations of equal gains, resulting in 7 × 6 × 6 trials overall for all remaining constellations, presented in randomized order. The subjects had to choose between one of two possible responses, that is, “Which one of the two virtual scenes is a more realistic model of the physical scene: scene A or scene B?”; responses like “I can't tell.” were not allowed. In the case of uncertainty, when subjects cannot detect the signal, they must guess, and will be correct on average in 50% of the trials. The gain at which a subject responds “scene A” in half of the trials is taken as the Point of Subjective Equality (PSE), at which the subject perceives both projections of the scene as equally realistic or unrealistic, respectively. We hypothesized that subjects would perceive virtual scenes as identical to their physical counterparts if the DFOV equals the GFOV.

H1. The average PSE of all subjects is gF = 1.

As the gain decreases or increases from this value, the ability of the subject to detect the distorted projection of the scene increases, resulting in a psychometric curve for the discrimination performance. The amount of change in the stimulus required to produce a noticeable sensation is defined as the Just Noticeable Difference (JND). However, stimuli at values close to thresholds will often be detectable. Therefore, thresholds are considered to be the gains at which the manipulation is detected only some proportion of the time. In psychophysical experiments, usually the point at which the curve reaches the middle between the chance level and 100% is taken as the threshold. Therefore, we define the Detection Threshold (DT) for gains smaller than the PSE to be the value of the gain at which the subject has a 75% probability of choosing the “scene A” response correctly, and the detection threshold for gains greater than the PSE to be the value of the gain at which the subject chooses the “scene A” response in only 25% of the trials (since the correct response was then chosen in 75% of the trials). In this article we focus on the range of gains over which the subject cannot reliably detect the difference, and in particular the gain at which subjects perceive the physical and virtual scene as identical. The 25% to 75% range of gains gives us an interval of possible perspective projections which can be used for rendering without users noticing a distortion of the scene. The PSE gives indications about how to map a DFOV to a GFOV such that a perspective projection appears to match the real-world scene.
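As an illustration of this design, the following sketch (our own, not the authors' experiment software) generates the randomized trial list for the method of constant stimuli: seven gains from 0.4 to 1.6 in steps of 0.2, all ordered pairs of unequal gains, each repeated six times.

```python
import itertools
import random

# Seven field of view gains: 0.4, 0.6, ..., 1.6.
gains = [round(0.4 + 0.2 * i, 1) for i in range(7)]

# All ordered pairs of unequal gains (7 * 6 constellations), each repeated 6 times.
trials = [(a, b) for a, b in itertools.product(gains, repeat=2) if a != b] * 6
random.shuffle(trials)  # method of constant stimuli: randomized presentation order

assert len(trials) == 7 * 6 * 6  # 252 trials overall
print(trials[:3])  # first three (scene A gain, scene B gain) pairs
```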

4.1 Experiment E1: Single Object

In this experiment we tested subjects' ability to discriminate between different perspective distortions when a single virtual object is displayed. In the tradition of various computer graphics researchers comparing rendering algorithms, we used a virtual Utah teapot as the representative reference object in this experiment. The interaction with such a single object is a typical situation in 3D modeling or CAD applications. Ten of the participants (4 females, age 23-33, ∅: 27) of the exposure study participated in the experiment (refer to Section 3.1).

4.1.1 Material and Methods of E1.

Setup. Figure 4 illustrates the setup used in the experiment. The viewing setup in this study was similar to the setup described in Section 3.2. In order to allow subjects to compare the view of the physical teapot with the view of the virtual teapot, we displayed the virtual and real views side-by-side. The physical prop representing the virtual model was a white chinaware teapot manufactured by Melitta (see Figure 4). The virtual teapot that we used consisted of 6,325 triangles and was slightly adapted by a professional 3D modeler such that the virtual and physical teapots were congruent to within a submillimeter range.

Fig. 4. Illustration of the experimental setup: a subject in front of the screen with the virtual teapot on the left and the screen frame with the physical teapot on the right. The view of both teapots is identical from the subject's perspective.

The angle between the view to the virtual and the view to the physical teapot was approximately 35°. To ensure equal viewing conditions, we placed the physical teapot inside the frame of another Fujitsu Siemens Scenicview P20-2 screen, which had been disassembled for the purpose of this experiment. The physical teapot was placed on a stand in such a way that the plane defined by the frame intersected the center of the teapot. As illustrated in Figure 4, we attached a black cloth to the frame and over the stand to provide the same background for the virtual and the real teapot. Both teapots were displayed at the same height, each rotated around the yaw axis by 45° and slightly tilted by 20° such that the extents of the teapot were visible. We defined the initial perspective and the virtual teapot in such a way that GFOV and DFOV were identical, and that the visual angle of the physical teapot as seen by the subjects and the OFOV of the virtual teapot were identical. For half of the subjects the computer screen was to the left, and for the other half to the right, of the physical teapot. The virtual teapot was rendered with Crytek's CryEngine 3. As illustrated in Figure 4, we manually adjusted the global illumination parameters of the rendering such that it mimicked the real-world lighting conditions affecting the real teapot in the laboratory.

Procedure. We used a within-subject design in the experiment. At the beginning of the experiment, subjects were told to view and memorize the size and shape of the physical teapot for 1 minute. During this first minute their heads were not constrained by the chin-rest. Afterwards, the subjects were positioned in front of the screens with their heads fixed by the chin-rest as described in Section 3.2, and the trials started. In each trial, subjects saw two images of the teapot rendered with different perspective projections as described before. To switch between the two renderings, subjects used a PowerMate manufactured by Griffin Technology (see Figure 4). A clockwise rotation by six degrees activated the second rendering; a counter-clockwise rotation switched back to the first rendering. In order to prevent subjects from directly comparing both projections of the teapot and relying on size cues only, we displayed a blank gray image for 80ms between the renderings of the teapot as a short interstimulus interval [Rensink et al. 1997]. In each trial, the subject's task was to decide, based on the stimuli, which of the two virtual teapots was a more realistic model of the real teapot.


Fig. 5. Pooled results of the discrimination between (a) virtual and physical object and (b) virtual and physical scene. The x-axis and y-axis show the applied field of view gains gF; the color shows the probability that subjects estimated the virtual object or scene rendered with the larger GFOV to be a more realistic model of the physical object or scene, respectively. Note that equal gains were not tested, but are shown with a probability of 0.5.

In order to explain the task, we started with two example trials in which clearly distorted virtual teapots were shown. To compare both renderings with the physical view, subjects could switch between them as often as desired. Subjects were told that the time was not measured, and that they should focus on the accuracy of their judgment. When a subject was confident that the currently displayed virtual teapot was the more realistic model of the physical teapot, he or she pushed the button on the PowerMate to indicate the end of the trial. Before the next trial started, we displayed another gray image for 240ms as an interstimulus interval. We used this transition to prevent subjects from directly comparing the visual stimuli of two subsequent trials [Rensink et al. 1997].

4.1.2 Results of E1. Figure 5(a) shows the pooled results of the subjects for all gain conditions, with the field of view gains gF applied to the first rendered teapot on the x-axis and those applied to the second rendered teapot on the y-axis. The color shows the probability that subjects judged the virtual teapot rendered with the larger GFOV to be a more realistic model of the physical object. The results show that subjects have difficulty detecting perspective distortions when a single object is rendered.

Figure 7(a) shows the pooled results for the discrimination task between virtual and physical teapots together with the standard error, here considering only the gain conditions in which one of the applied field of view gains was gF = 1.0. The x-axis shows the applied field of view gain gF that varied from gF = 1.0. The y-axis shows the probability that subjects perceived the virtual teapot as more realistic when it was rendered (1) magnified (GFOV smaller than DFOV: gF < 1) or (2) minified (GFOV greater than DFOV: gF > 1). The solid line shows the fitted sigmoid function of the form f(x) = 1/(1 + e^(a·x+b)) with real numbers a and b (start values a = −9.5 and b = 10). We found no dependency on whether the physical teapot was arranged to the left or to the right of the virtual teapot, and therefore pooled the two conditions.

From the sigmoid function we determined a slight bias for the point of subjective equality (PSE = 0.9744). The PSE shows that subjects are quite accurate in discriminating between different perspective distortions; it lies where the GFOV almost matches the DFOV. For individual subjects, we found PSEs of 0.79, 0.80, 0.83, 1.00, 1.17, 1.00, 1.06, 1.10, 1.00, and 1.17 (∅: 0.9932). The DTs were at gains of 0.8593 and 1.1096 for responses in which subjects judged the teapot rendered with the larger field of view as a more realistic model.

Fig. 6. Illustration of the subject's side-by-side view of (a) the virtual laboratory and (b) the real laboratory seen through the screen frame.

These results show that gain differences within this range cannot be reliably detected, that is, when the GFOV varies from the tested DFOV (25.99°) between 22.33° and 28.84°.
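For completeness, the following sketch (ours; the response proportions are illustrative placeholders, not the study's raw data, and SciPy is assumed to be available) fits the stated sigmoid and derives the PSE and the 25%/75% detection thresholds from the fitted parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, a, b):
    """Psychometric function f(x) = 1 / (1 + exp(a*x + b))."""
    return 1.0 / (1.0 + np.exp(a * x + b))

# Illustrative placeholder data: proportion of trials in which the stimulus
# rendered with the larger GFOV was judged more realistic, per gain.
gains = np.array([0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6])
p_larger = np.array([0.95, 0.90, 0.70, 0.50, 0.30, 0.10, 0.05])

(a, b), _ = curve_fit(sigmoid, gains, p_larger, p0=(10.0, -10.0))

pse = -b / a                     # f(PSE) = 0.5
dt_low = (-np.log(3.0) - b) / a  # f = 0.75: lower detection threshold
dt_high = (np.log(3.0) - b) / a  # f = 0.25: upper detection threshold
print(f"PSE = {pse:.4f}, DTs = [{dt_low:.4f}, {dt_high:.4f}]")
```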

4.2 Experiment E2: Virtual Environment

In this experiment we tested the subjects' ability to discriminate between different perspective distortions when an entire virtual environment is displayed. In order to again present subjects with a view of a corresponding real-world setup, we modeled our laboratory room as the virtual environment. The exploration of such room-size virtual environments is a typical situation in architectural design and CAD applications. Twelve of the participants (4 females, age 23-56, ∅: 28) of the exposure study participated in the experiment (refer to Section 3.1).

4.2.1 Material and Methods of E2. The viewing setup in this study was similar to the setup described in Section 3.2. Again, we used a disassembled monitor side-by-side with the monitor displaying the virtual laboratory, and arranged both on a table in the laboratory. Subjects were seated in front of the display analogously to the setup described in Section 3.2. Their heads were fixed on the same chin-rest such that the distance from the eyes to the computer screen was 65cm for each subject. The perspective on the real room and on the virtual replica were identical. The virtual camera was positioned in the center of the virtual laboratory and oriented towards one of the walls (see Figure 6).


Fig. 7. Pooled results of the discrimination between (a) virtual and physical teapots and (b) virtual and physical scenes, respectively. The x-axis shows the applied field of view gain gF; the y-axis shows the probability that subjects estimated the virtual teapot/laboratory rendered with the larger GFOV to be a more realistic model of the physical teapot/laboratory.

Figure 6 (left) shows a subject's view of the virtual setup, and Figure 6 (right) a view of the real setup. The virtual replica consisted of more than 50,000 texture-mapped polygons. The texture maps were obtained from a mosaic of digital photographs of the walls, ceiling, and floor of the laboratory. All floor and wall fixtures, for example, door knobs, furniture, and computer equipment, were represented true to the original as detailed, textured 3D objects. The virtual and physical laboratory were congruent to within millimeter range.

4.2.2 Results of E2. Figure 5(b) shows the pooled results of the subjects for all gain conditions, with the field of view gains gF applied to the first rendered scene on the x-axis and those applied to the second rendered scene on the y-axis. The color shows the probability that subjects judged the virtual scene rendered with the larger GFOV to be a more realistic model of the physical scene. The results show that subjects have problems detecting perspective distortions when a virtual scene is rendered.

Figure 7(b) shows the pooled results for the discrimination task between the virtual and physical scene together with the standard error, here considering only the gain conditions in which one of the applied field of view gains was gF = 1.0. The x-axis shows the applied field of view gain gF that varied from gF = 1.0. The y-axis shows the probability that subjects perceived the virtual laboratory as more realistic when it was rendered (1) magnified (GFOV smaller than DFOV: gF < 1) or (2) minified (GFOV greater than DFOV: gF > 1). The solid line shows the same fitted sigmoid function as used in experiment E1. We found no dependency on whether the view of the real laboratory was arranged to the left or to the right of the view of the virtual replica, and therefore pooled the two conditions.

From the sigmoid function we determined a slight bias for the point of subjective equality (PSE = 1.0481). This PSE shows that subjects perceive the virtual and real laboratory as identical when the GFOV almost matches the DFOV. For individual subjects, we found PSEs of 0.99, 1.30, 0.70, 1.21, 1.15, 1.00, 1.00, 0.93, 1.38, 0.89, 1.00, and 1.01 (∅: 1.047). The DTs were at gains of 0.8902 and 1.2261 for responses in which subjects judged the virtual laboratory rendered with the larger field of view as a more realistic model.

These results show that gain differences within this range cannot be reliably detected, that is, when the GFOV varies from the DFOV (25.99°) between 23.14° and 31.87°.

5. DISCUSSION

The exposure study revealed that a deviation between GFOV and DFOV induces substantial problems when judging object properties. The results from experiments E1 and E2 show that when the GFOV approximates the DFOV, users perceive the virtual scene as an identical representation of its physical counterpart, in contrast to representations that use a different GFOV. We observed only a minimal shift of the PSEs from gF = 1 in experiments E1 and E2. A simple t-test analysis showed that this shift is significant neither in experiment E1 (p = 0.88) nor in experiment E2 (p = 0.40), which supports hypothesis H1 that the perception of virtual scenes matches the perception of corresponding real-world scenes when the GFOV matches the DFOV.

In the results of both experiments (refer to Sections 4.1.2 and 4.2.2), we observed that the detection thresholds are almost symmetrical around the PSE, which indicates that mini- and magnification of scenes yield symmetrically perceived distortions. The results of the experiments indicate that subjects could detect perspective distortion less reliably for an individual object than for an entire visual scene. Indeed, the psychometric functions (see Figures 7(a) and (b)) are similar, since we considered only cases in which one stimulus was rendered with a gain that equaled one. However, in Figure 5, which depicts all gain constellations, it can be seen that the area which corresponds to uncertain answers of the subjects is larger in Figure 5(a) (E1) than in Figure 5(b) (E2). In experiment E1, only a single object was displayed in the center of the screen with OFOV
