Evaluation of Depth of Field for Depth Perception in DVR
Pascal Grosset, Mathias Schott, Georges-Pierre Bonneau, Charles Hansen
To cite this version: Pascal Grosset, Mathias Schott, Georges-Pierre Bonneau, Charles Hansen. Evaluation of Depth of Field for Depth Perception in DVR. IEEE Pacific Visualization 2013, Feb 2013, Sydney, Australia. IEEE, pp. 81-88, 2013.
HAL Id: hal-00762548, https://hal.inria.fr/hal-00762548, submitted on 7 Dec 2012
To appear in an IEEE VGTC sponsored conference proceedings
Evaluation of Depth of Field for Depth Perception in DVR
A. V. Pascal Grosset∗, School of Computing and SCI Institute, University of Utah
Mathias Schott†, NVIDIA; School of Computing and SCI Institute, University of Utah
Georges-Pierre Bonneau‡, LJK, Grenoble University, INRIA
Charles D. Hansen§, School of Computing and SCI Institute, University of Utah
ABSTRACT
In this paper, we present a user study on the use of Depth of Field for depth perception in Direct Volume Rendering. Direct Volume Rendering with Phong shading and perspective projection is used as the baseline. Depth of Field is then added to see its impact on the correct perception of ordinal depth. Accuracy and response time are used as the metrics to evaluate the usefulness of Depth of Field. The on-site user study has two parts: static and dynamic. Eye tracking is used to monitor the gaze of the subjects. From our results we see that, though Depth of Field does not act as a proper depth cue in all conditions, it can be used to reinforce the perception of which feature is in front of the other. The best results (high accuracy and fast response time) for correct perception of ordinal depth are obtained when the front feature (of the two features the users were to choose from) is in focus and perspective projection is used.

Index Terms: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Color, shading, shadowing, and texture; I.3.m [Computer Graphics]: Miscellaneous; I.4.10 [Image Processing and Computer Vision]: Image Representation—Volumetric

1 INTRODUCTION
As Direct Volume Rendering (DVR) gains acceptance and becomes more pervasive, it is becoming increasingly important to test the effectiveness of the different rendering techniques that claim to improve perception in DVR. While it is easy to judge the horizontal and vertical positions of features in an image, estimating the depth of these features is more difficult. As pointed out by Boucheny et al. [4], estimating depth even in a trivial highly transparent scene can be very hard and DVR images often have highly transparent features. To aid depth perception, humans rely on a number of depth cues [13], many of which are present in DVR images. Table 1 describes the different depth cues available in static monocular scenes. For solid surfaces, shadows are very useful in helping us estimate relative ordering of features. Lindemann et al. [17] conducted a user study on the use of different illumination methods and found that Directional Occlusion Shading [27] (DOS) brings about 20% improvement in ordinal depth perception. However, for highly translucent surfaces, as shown in Figure 1, DOS is less effective. Depth of Field (DoF) is another visual cue which may aid depth perception. There are different kinds of depth descriptors. Absolute descriptions are usually quantitative and are defined in units like meters or relative to the viewer’s body. Relative descriptions relate one property to another. This can be further broken down into relative
and ordinal. "Relative descriptions relate one perceived geometric property to another (e.g., point a is twice as far away as point b). Ordinal descriptions are a special case of relative measure in which the sign, but not the magnitude, of the relations is all that is represented." [29]. In this paper, we present a study on the use of DoF and its impact on the perception of ordinal depth in DVR, and we try to establish under which conditions it is most beneficial. DoF causes regions further from the focal plane to be increasingly blurred while the focal plane region remains sharp. Photographers often use a shallow depth of field, that is, having an object which is relatively close to the camera in focus while the background is not, to put emphasis on a subject or feature they want viewers to focus on. Many researchers have carried out studies to understand the role of blur [6] and DoF in perceiving depth [11, 12, 19, 21], but their findings are by no means conclusive. The conclusions range from blur being at best a weak depth cue [20] to "an important depth cue which has been previously underestimated" [11]. For DVR, the only work we found that studies the application of DoF to improve depth perception was by Ropinski et al. [25], which used a modified DoF and focussed only on angiography images described in

Figure 1: Flame dataset (a) original rendering, (b) with occlusion shading. We see that ambient occlusion does not improve ordinal depth perception in this type of image. (c) shows the bonsai dataset, with well defined surfaces. We can see that applying directional occlusion shading (d) helps with depth perception in this case.
∗e-mail: [email protected]
†e-mail: [email protected]
‡e-mail: [email protected]
§e-mail: [email protected]
the related work section. To the best of our knowledge, this is the first comprehensive user study on the use of DoF as a depth cue in DVR. The main contributions of this paper are:
• Establish whether DoF is a useful depth cue in DVR images.
• Determine for which kinds of datasets DoF is most helpful.

The DoF technique proposed by Schott et al. [26] was implemented in the SLIVR renderer of the VisIt visualization software [18]. The experiment is run on site to be able to control most experimental parameters, including screen quality, luminosity and the attention of subjects. The 3 main hypotheses we want to test are:

1. HYP1: DoF will help improve the accuracy of ordinal depth perception in a volume rendered image where there are multiple features.
2. HYP2: DoF will help improve the speed of ordinal depth perception in a volume rendered image where there are multiple features.
3. HYP3: If users view a moving focal plane, correct perception of ordinal depth will improve.

An experiment with two parts, static and dynamic, is set up to test these hypotheses. In the static part, to test the first two hypotheses, we used 150 static images of 5 different datasets. In each image two features are selected and the subject is asked to identify which one is closer. For the dynamic part of the experiment, to test the third hypothesis, we change the focal length of the lens and create a video where the focal plane sweeps through the volume from the front to the back and to the front again. We use a video instead of giving the test subject direct control over the focal plane in order to avoid possible biases introduced by user interaction [30]. We use 20 videos of about 17 seconds for this part.

The rest of the paper is structured as follows: Section 2 discusses the findings of previous research on depth perception and DoF, and the different approaches for DoF in computer graphics and DVR. Section 3 describes how DoF has been implemented for this user study. In Section 4, we describe the setup of the experiment, state our results and discuss their implications. The last section, Section 5, states the conclusions and proposes some future work.

Table 1: Monocular static depth cues
Depth of Focus: Objects in focus appear sharp while those not in focus appear blurred.
Familiar Size: We use our prior knowledge of the world to judge distances, e.g. the smaller a plane looks in the sky, the further away from us we know it is.
Occlusion: Objects in front of others overlap those at the back and so hide part of the back objects.
Perspective: Parallel lines appear to converge at infinity.
Relative Size: Objects which are far away from us take up a smaller area in our field of view.
Shading Brightness: Objects which are far from us tend to appear more dimly lit than objects that are close to us.
Shadow: When we know where the light source is, we use where the shadow falls to decide where the object is.
Texture Gradient: Fine details are clearly seen for objects which are close to us compared to objects that are far away.

2 RELATED WORKS

2.1 Depth perception
To be able to perceive the depth of objects in the real world and in a synthetic image, the human visual system makes use of a number of depth cues. Depth cues have been studied thoroughly [13, 29] and are generally grouped into two categories: monocular and binocular. Binocular cues rely on the offset between the two eyes to perceive depth in a scene, while monocular depth cues are those available from a single image of the scene. For this user study, only one view is shown on the screen. Furthermore, there is no motion, so we only have static monocular depth cues. These are explained in Table 1. Depth of focus is one of the depth cues the human visual system uses to perceive depth. In our eyes, the shape of the lens is distorted by ciliary muscles to focus light on the retina. The process of modifying the shape of the lens of our eye is called accommodation. The human visual system gets information about the amount of distortion of the lens from the ciliary muscles and can use this as a depth cue, especially for very close objects [29]. Also, the parts of the image not in focus, in front of and behind the focal plane, will appear blurred. Using the amount of blurriness of an image as a depth cue is not a reliable indication of how close or far an object is: as indicated by Mather et al. [21], blur discrimination is poor in human vision.

2.2 Depth of Field in Graphics
Several attempts have been made to reproduce DoF in computer graphics. Barsky et al. [2] provide a comprehensive study of different DoF techniques in computer graphics. Some of the earliest applications of DoF are from Cook et al. [7], who simulated DoF in distributed ray tracing by sending multiple rays. In interactive computer graphics, a common technique is to sort the scene along depth and apply different amounts of blur to different levels of depth, as Barsky et al. [1] and Kraus et al. [16] did. In DVR, DoF has been proposed by Crassin et al. [8] for large volumes in the GigaVoxels framework. The work of Ropinski et al. [25] is of greater interest since, in addition to implementing DoF, they also conducted a user study on the use of a modified DoF to enhance spatial understanding; their application only looked at angiography images. In their work, they try to determine where the user is focussing in an image. The part deemed to be behind the region in focus has DoF effects applied to it; consequently, there are no DoF effects in front of the focal plane. From their user study they find that DoF helps to improve the correctness of depth perception, but they also saw an increase in response time. They attribute this to the user having to get used to some part of the image appearing out of focus; DoF is only one of several techniques for improving the correct perception of depth that they study. To implement DoF in DVR, we referred to the work of Schott et al. [26], which describes a DoF technique for slice-based volume rendering. A view-aligned slice based volume renderer is used where the view-aligned slices are blurred according to their distance from the viewer. Though this technique can suffer from over-blurring, its results are quasi-indistinguishable from a Monte Carlo ray tracer, and so we chose this approach to implement DoF.

2.3 Perceptive studies for DoF
On the subject of whether DoF is a valid depth cue, the literature seems to be quite undecided. Table 2 shows a summary of the previous results. The general agreement among researchers is that DoF helps improve depth perception when combined with other depth cues. Mather et al. [21] established that blur discrimination is poor in humans. However, increasing the number of depth cues improves the perception of depth [22]. In their study, the subjects were presented with a set of tiles, and a combination of blur, contrast and interposition was used as depth cues. They were asked to identify which tiles appeared closer to them by clicking on them, starting with the closest. Blur on its own gives about 65% correct depth results, but when combined with other depth cues like interposition and contrast, this rises to about 90%. This was confirmed by Held et al. [12], who conclude that blur on its own does not help to estimate absolute or relative distance but, coupled with perspective, it is much more effective. In their latest study, Held et al. [11] claim
that combined with perspective, blur can even be used to guess distances. Since DVR often uses perspective, DoF could prove to be a very valuable addition. The only study on the impact of blur in DVR that we have found is from Ropinski et al. [25].

Table 2: Previous findings on the impact of DoF and blur as a depth cue
Held et al. 2012 [11]: Blur is used to make depth estimation significantly more precise throughout visual space.
Mather et al. 2004 [22]; Held et al. 2010 [12]: Blur, when combined with a number of other depth cues, improves depth perception.
Mather 1996 [20]; Mather et al. 2002 [21]: Blur can only be used to establish ordinal depth.
3 DEPTH OF FIELD FOR DVR
The mechanics of DoF can be derived from the thin-lens equation:

\frac{1}{f} = \frac{1}{s} + \frac{1}{z_f} \quad (1)

where f is the focal length, s is the distance from the lens to the focal plane and z_f is the distance from the lens to the object, as shown in Figure 2 (a). When light from a point in the scene passes through a camera lens, it should ideally be focussed on a single point on the image plane. However, if the image plane is not at the correct position, the point is mapped to a circular region instead, c in the diagram in Figure 2 (a). The diameter of this region, the circle of confusion, is given by Equation 2:

c(z) = A \, \frac{|z - z_f|}{z} \quad (2)

where c(z) is the diameter of the circle of confusion, z is the distance from the lens to the point, and A is the aperture of the lens, as shown in Figure 2 (b). This figure is quite revealing from a perception point of view: it shows that on both sides of the focal distance z_f we can have regions with a similar circle-of-confusion diameter, which translates to the same amount of blur.

DoF is implemented as shown in Figure 3 and explained in [26]. A GPU slice-based volume renderer is used and the scene is broken down into two sections, before and after the focal plane. For the part between the focal plane and the front of the volume (where the camera is), the volume is processed from the front of the volume to the focal plane (in a front-to-back manner) and each slice is blurred according to its position. For the part behind the focal plane, the volume is processed from the end of the volume to the focal plane (in a back-to-front manner). Each rendered slice is blended with a blurred region of the previous slice, and the blur kernel's size decreases as the slice approaches the focal plane. Processing in two directions is needed to ensure that the in-focus region does not contribute to the blurred regions. A more detailed description of the algorithm, with pseudocode, can be found in [26].

Figure 3: Geometric setup of the DVR implementation of DoF (image courtesy of Mathias Schott [26])
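To make the slice-based approach concrete, the following is a minimal CPU sketch, in Python, of how the circle-of-confusion diameter from Equation 2 could drive the two-pass, incrementally blurred compositing described above. It is only an illustration under simplifying assumptions (the incremental Gaussian kernel and the pixels-per-unit scale are ad hoc, and all helper names are hypothetical), not the GPU renderer of Schott et al. [26] that was actually used in the study.

```python
# Toy CPU illustration of CoC-driven, two-pass slice compositing (not the paper's GPU code).
import numpy as np
from scipy.ndimage import gaussian_filter


def coc_diameter(z, z_f, aperture):
    """Equation 2: circle-of-confusion diameter, identical on both sides of z_f."""
    return aperture * abs(z - z_f) / z


def over(front, back):
    """'Over' compositing of premultiplied RGBA images (front occludes back)."""
    alpha = front[..., 3:4]
    return front + back * (1.0 - alpha)


def dof_composite(slices, depths, z_f, aperture, px_per_unit=200.0):
    """slices: view-aligned premultiplied-RGBA arrays ordered front (near camera) to back.
    depths: distance of each slice from the lens.
    Each sweep starts at the end of the volume farthest from the focal plane and moves
    toward it, so the in-focus region never contributes to the blurred regions."""

    def sweep(order, slice_is_nearer):
        acc = np.zeros_like(slices[0], dtype=float)
        prev_coc = None
        for i in order:
            coc = coc_diameter(depths[i], z_f, aperture)
            if prev_coc is not None:
                # Crude incremental kernel: blur the accumulated image by the decrease
                # in CoC since the previous slice; it shrinks to zero at the focal plane.
                sigma = max(prev_coc - coc, 0.0) * 0.5 * px_per_unit
                if sigma > 0.0:
                    acc = gaussian_filter(acc, sigma=(sigma, sigma, 0.0))
            cur = slices[i].astype(float)
            # The slice closer to the camera is composited in front of the accumulation.
            acc = over(cur, acc) if slice_is_nearer else over(acc, cur)
            prev_coc = coc
        return acc

    back = sorted((i for i, z in enumerate(depths) if z > z_f), key=lambda i: -depths[i])
    front = sorted((i for i, z in enumerate(depths) if z <= z_f), key=lambda i: depths[i])
    return over(sweep(front, slice_is_nearer=False), sweep(back, slice_is_nearer=True))
```

Sweeping z_f over time with such a function would also reproduce the kind of focal-plane animation used in the dynamic part of the study.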
Figure 2: (a) Lens setup, (b) Circle of confusion

Figure 4: (a) Backpack dataset displayed within PsychoPy. The features to choose from have been circled. (b) Side view of the same dataset (focal plane shown as a dashed line), not shown in the experiment. We can see that the features are quite far apart.

4 USER STUDY
The aim of the experiment is to determine whether DoF provides a better understanding of the ordinal depth of different features in a volume rendered image. More specifically, we want to check these three hypotheses: 1) DoF helps improve the accuracy of the determination of ordinal depth; 2) DoF helps improve the speed of the determination of ordinal depth; 3) if users can change the position of the focal plane, correct perception of ordinal depth will improve. Our focus is only on ordinal depth because absolute depth can mean very different things for different datasets, even though the datasets may appear to be of similar size in a volume rendered image. For example, the size of the backpack dataset could be about 40 cm while the size of the aneurism dataset would be about 10 cm. To test these three hypotheses, we carry out two experiments. In the static part, we show the test subjects a number of images in which we ask them to select which of two circled features (located at different depths in the image) is in front. Figure 4 shows an example of this. In the dynamic part, we show a video of a DVR dataset where the focal plane sweeps from the front to the back and back to the front. Here again two features are circled at different depths and the subject is asked to decide which one is
in front. As discussed by Knill [14], the influence of depth cues varies depending on the task, and what might weigh more for one task might be less important for another. Consequently, to make our experiment as general as possible, we decided to have minimal interaction. Sections 4.1 to 4.3 describe the parameters and setup of the experiment.

Figure 5: The 6 datasets used: (a) aneurism, (b) backpack, (c) bonsai, (d) flame, (e) Richtmyer-Meshkov instability & (f) thorax. The reason for having a different background color is explained in Section 4.1.

Table 3: Images description
aneurism: Has lots of occlusion, which should be quite helpful to perceive depth, but the complex network of veins is quite confusing.
backpack: The earbuds are totally disconnected and appear to float in mid air, which is quite unfamiliar. Moreover, even people who are familiar with depth of field normally see it as a progression over the image. This is not the case here: the background is not blurred, only the volume is. However, there is still a certain amount of occlusion that could be helpful.
bonsai / Richtmyer-Meshkov instability: People are familiar with these kinds of shapes. The Richtmyer-Meshkov instability looks like a landscape (though it is not one) and everyone is familiar with seeing trees. Moreover, they tend to have similar depth cues, as shown in Table 4.
flame: This is a combustion dataset that is extremely hard to understand. The chosen transfer function is what chemical engineers have used to visualize this data. One of the complaints was that it is very hard to understand this static image in a publication, and so they would like to know if DoF could help in this case.
thorax: This image is complex due to the presence of many structures, some of which are quite transparent and faded completely when DoF was applied. So, we used it only in the videos.

4.1 Stimuli Description
To generate the stimuli, DoF was implemented as described in Section 3 in the SLIVR renderer of VisIt 2.4.2. PsychoPy 1.7.4 [24] was used to present the images and movies to the subjects, and to collect their answers. As can be seen in Figure 5, the background for each dataset is different. The background (except for the flame dataset, where the background interfered too much due to the highly transparent nature of the image) was carefully chosen by computing the Michelson contrast. This was done as follows: the image is generated in VisIt with a background which can be easily removed from the image, such as pure green (RGB: 0 1 0) or pure blue (RGB: 0 0 1), depending on the image. The Michelson contrast [23] is then computed for all the colors which do not match the pure green or blue as follows:

M_c = \frac{L_{max} - L_{min}}{L_{max} + L_{min}} \quad (3)
where M_c is the Michelson contrast, L_{max} is the maximum luminance and L_{min} is the minimum luminance, with the luminance [5] calculated as follows:

L = 0.2126 \cdot Red + 0.7152 \cdot Green + 0.0722 \cdot Blue \quad (4)

where Red, Green and Blue are the RGB components of the color. The background color is assigned to be a grey RGB value (same red, green and blue) for which computing the Michelson contrast with the background included gives the same result as computing the Michelson contrast of the dataset alone, ignoring the background. Five of the six datasets were used for the first, static-images part of the experiment. This is firstly due to time constraints: we did not want the experiment to last too long but still manage to have enough data per dataset. Secondly, some features of the thorax dataset would disappear when blurring was applied for DoF; some of the circled features would disappear and the subject would see an empty ellipse, or sometimes the feature behind the selected feature. The datasets and their associated transfer functions were selected so that we have a mixture of shapes which are familiar (like the bonsai, which is tree shaped) and unfamiliar (like the flame dataset) to non-volume-rendering users. Also, each resulting image has a different number and type of depth cues. The depth cues that we can expect are: Occlusion, Perspective, Relative Size, Familiar Size and Texture Gradient. For this test we did not use shadows or changes in shading. The perspective projection settings are controlled so that all datasets have the same settings. Table 4 shows a taxonomy of the depth cues for the images (Figure 5) that we are using, and Table 3 describes each of the images.

Table 4: Images and their associated depth cues. We have 3 levels for each: High, Med and Low, indicating how useful each depth cue is expected to be in each image.
aneurism: Occlusion High, Relative Size Med, Familiar Size Med, Texture Gradient Med
backpack: Occlusion Low, Relative Size Med, Familiar Size Low, Texture Gradient Low
bonsai: Occlusion High, Relative Size High, Familiar Size High, Texture Gradient High
flame: Occlusion Low, Relative Size Med, Familiar Size Low, Texture Gradient Low
thorax: Occlusion Med, Relative Size Med, Familiar Size Low, Texture Gradient Low
Richtmyer-Meshkov instability: Occlusion Med, Relative Size Med, Familiar Size Med, Texture Gradient High

The minimum separation between the features to be selected in an image was 5% of the whole depth of the volume and the maximum separation was 60%. The number of test images for each separation range is as follows: [0 - 10%): 12, [10% - 20%): 23, [20% - 30%): 22, [30% - 40%): 37, [40% - 50%): 49, [50% - 60%]: 7. With perspective projection, objects which are closer to the viewer appear bigger than objects which are far away. This is the baseline to which DoF is added to see whether ordinal depth is perceived better. We have also included orthographic projection to allow us to verify the net impact of only having DoF. We also expect to see a performance improvement from orthographic to perspective projection, which would match what other researchers have found: increasing the number of depth cues increases correct perception. Furthermore, one of the things a good user study [15] should take care of is verifying that the participants are committed to answering truthfully. That can be done in this case by verifying that perspective projection gives an improvement compared to using orthographic projection.
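As a concrete illustration of the background selection described above, here is a small Python sketch that computes the luminance of Equation 4 and the Michelson contrast of Equation 3, and then searches for a grey level whose inclusion leaves the contrast unchanged. The key-color removal and the brute-force search over grey levels are assumptions made for illustration; the authors' exact procedure is only described in prose.

```python
# Sketch of the grey-background selection of Section 4.1 (illustrative, not the authors' script).
import numpy as np


def luminance(rgb):
    """Equation 4, per pixel, for RGB values in [0, 1]."""
    return 0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1] + 0.0722 * rgb[..., 2]


def michelson(lum):
    """Equation 3 over an array of luminance values."""
    l_max, l_min = float(lum.max()), float(lum.min())
    return (l_max - l_min) / max(l_max + l_min, 1e-9)


def pick_grey_background(img, key=(0.0, 1.0, 0.0), tol=1e-3):
    """img: H x W x 3 float image rendered over a pure key color (green or blue).
    Returns a grey level g such that the Michelson contrast of the image with the key
    pixels replaced by (g, g, g) matches the contrast of the dataset pixels alone."""
    is_key = np.all(np.abs(img - np.asarray(key)) < tol, axis=-1)
    target = michelson(luminance(img[~is_key]))       # contrast ignoring the background
    best_g, best_err = 0.5, np.inf
    for g in np.linspace(0.0, 1.0, 256):
        composite = img.copy()
        composite[is_key] = g                         # grey: same red, green and blue
        err = abs(michelson(luminance(composite)) - target)
        if err < best_err:
            best_g, best_err = g, err
    return best_g
```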
4.2 Experimental Procedure
Environment setup. The experiment was conducted on site so as to ensure similar environment settings for all participants. The study was carried out in a room where the curtains were closed to shut out light from the outside (which varies during the day) and which was illuminated by white fluorescent light. This makes the screen more visible and increases the performance of the eye-tracker, which requires specific lighting conditions for better accuracy. Eye tracking is not the most important part of the study but was included to see if it would help us understand the results of the experiment. The T2T (Talk to Tobii) [10] package was used to interface with the Tobii T60 eye-tracker [28].
Apparatus. A MacBook Pro connected to the eye-tracker was used to run PsychoPy. A gamepad was used as the input device, as it is more ergonomic than a keyboard and mouse. On the gamepad, the subject presses any of the left buttons to select the left feature and any of the right buttons to select the right feature. This is important since a subject spends on average 30 minutes on the whole experiment, so we want to make it as easy and untiring as possible in order to minimize the impact of boredom and fatigue. Also, interaction with a gamepad is less likely to introduce perceptual bias, as it offers a simple and straightforward method of entering a selection.
Participants & Design. 25 subjects (6 females, 19 males) participated in the user study. All of them had good eyesight or corrected vision; 7 wore glasses and none were color blind. All but one of the test subjects reported being right handed. The age range was as follows: 15 - 20: 1, 21 - 30: 21, 31 - 40: 3. All the participants had some experience with computer graphics through games or movies, but most of them were not familiar with volume rendering: none of them were students, researchers or users developing or working with volume rendering. A within-subject design was used where all the participants did all the tests. The experiment, carried out over 2 days with 25 participants, used a two-alternative forced-choice methodology [3]. Each participant spent approximately 30 minutes doing the experiment (including calibrating the eye-tracker and training) and each test was followed by a debriefing session.
Figure 6: (a) Average correctness for the different conditions (with static images) of the experiment with standard error. (b) Average response time for the different datasets and conditions (with static images) of the experiment with standard error.
4.3 Tasks
Static Experiment. To test the first two hypotheses, whether DoF helps improve the speed and accuracy of perception of ordinal depth in a scene, we conduct the following experiment. We have 150 images of 5 datasets (aneurism, backpack, bonsai, flame and Richtmyer-Meshkov instability) under 6 different conditions, namely:
1. orthographic projection
2. orthographic projection & front feature in focus (DoF Front)
3. orthographic projection & back feature in focus (DoF Back)
4. perspective projection
5. perspective projection & front feature in focus (DoF Front)
6. perspective projection & back feature in focus (DoF Back)
In each image, we present the subject with two features. Each feature is surrounded by an ellipse and located at a different horizontal position, such that it is clearly distinguishable which one is on the left and which one is on the right. The features are located at different depths and participants are asked to choose the one which they perceive as being in front. To select the left feature, the subject just has to press any of the left buttons on the gamepad, and likewise for the right feature. Figure 4 (a) shows an example image from the user study. On choosing a feature, the color of its ellipse changes. The order in which the different images were shown was randomized so as not to have all images under one specific condition, or for a particular dataset, following each other. However, all participants were shown the test images in the same pre-randomized order. The submitted answer and the time taken are recorded for each test image.
Dynamic Experiment. To test the third hypothesis, whether being able to change the position of the focal plane will improve the correct perception of relative depth, we carry out a second experiment. To minimize user interaction that could introduce bias in the experiment [30], we made a video of the focal plane sweeping from the front to the back and back to the front. This is referred to as the dynamic part of the experiment. We have 20 videos of the 6 datasets (aneurism, backpack, bonsai, flame, Richtmyer-Meshkov instability & thorax), each lasting approximately 17 seconds. We record only the answers, not the time, as the subjects are asked to watch each video completely before answering.
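For illustration, a minimal PsychoPy sketch of one static trial is given below: it shows an image, waits for a left/right choice and records the response time. Keyboard arrow keys stand in for the gamepad's left and right buttons used in the study, and the file names are placeholders.

```python
# Minimal PsychoPy sketch of one static two-alternative forced-choice trial (illustrative).
from psychopy import visual, core, event

win = visual.Window(fullscr=True, color='grey', units='pix')
clock = core.Clock()


def run_trial(image_path):
    """Show one stimulus and return (chosen side, response time in seconds)."""
    stim = visual.ImageStim(win, image=image_path)
    stim.draw()
    win.flip()
    clock.reset()
    keys = event.waitKeys(keyList=['left', 'right'], timeStamped=clock)
    side, rt = keys[0]
    return side, rt


# Placeholder file names; all participants saw the same pre-randomized order.
trials = ['backpack_persp_dof_front_03.png', 'bonsai_ortho_no_dof_12.png']
results = [(img,) + run_trial(img) for img in trials]

win.close()
core.quit()
```

The dynamic trials would follow the same pattern with a movie stimulus instead of a static image, recording only the answer.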
Both experiments were conducted as follows:
1. Calibration of the eye-tracker. The subject is asked to stare at a red circle on the screen which appears for 4 seconds at one position and then jumps to another spot. The calibration stage lasts about one minute.
2. Training for the static part of the experiment. First, the overall task is explained to the subject. The input device (gamepad) is introduced and the subject is briefed on how to use it. Next, 7 training images are presented, 4 with DoF and 3 without. In each image, two features are circled with an ellipse and the subject is asked to select which one is in front. We require the correct answer for each before proceeding to the next image, to ensure that the subject understands the task at hand. On completing this phase, we inform the subject that the experiment will start and that they will no longer be required to give the correct answer to proceed to the next image.
3. The static experiment. Each of the 150 images is shown to the participant, who is asked to select which feature appears to be in front. The ellipse around the chosen feature changes color on being selected and the next test image appears. Response times and answers are recorded along with the eye tracking data.
4. Training for the dynamic experiment. A short animation describing the motion of the focal plane is presented to the subject along with an explanation of what they have to do.
5. The dynamic experiment. Each of the 20 videos is shown and the participant is asked to choose which of the two circled features appears to be in front. The answer is recorded along with the eye tracking data.
6. The experiment ends with a debriefing session where the subject is asked for verbal feedback on the experiment, which is noted down by the experimenters.

4.4 Experiment 1: Static
4.4.1 Results
The average correctness and completion times recorded during the experiment are used to analyze the results. The overall comparison of average correctness under the 6 conditions for all the subjects reveals that there is a significant difference in the results for average correctness [one-way, repeated-measures ANOVA, F(2.2, 52.8) = 49.754, p