D.V.S.X. De Silva, W.A.C. Fernando, G. Nur, E.Ekmekcioglu and S.T. Worrall. I-Lab, Multimedia Communications Research Group, Faculty of Engineering and ...
Proceedings of 2010 IEEE 17th International Conference on Image Processing
September 26-29, 2010, Hong Kong
3D VIDEO ASSESSMENT WITH JUST NOTICEABLE DIFFERENCE IN DEPTH EVALUATION D.V.S.X. De Silva, W.A.C. Fernando, G. Nur, E.Ekmekcioglu and S.T. Worrall I-Lab, Multimedia Communications Research Group, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford, GU2 7XH, UK. {D.Desilva, W.Fernando, G.Nur, E.Ekmekcioglu, and S.Worrall}@ surrey.ac.uk ABSTRACT The ability to provide a realistic perception of depth is the core added functionality of modern 3D video display systems. At present, there is no standard method to assess the perception of depth in 3D video. Existence of such methods would immensely enhance the progression of 3D video research. This paper focuses on the depth perception assessment in color plus depth representation of 3D video. In this paper, we subjectively evaluate the depth perceived by the users on an auto stereoscopic display, and analyze its variation with the impairments introduced during the compression of the depth images. The variation of the subjective perception of depth is explained based on another evaluation that is carried out to identify the Just Noticeable Difference in Depth (JNDD) perceived by the subjects. The JNDD corresponds to the sensitivity of the observers to the changes in depth in a 3D video scene. Even though only the effects of compression artifacts are considered in this paper, the proposed assessment technique, based on the JNDD values can be used in any future depth perception assessment work. Index Terms— Depth images, 3D video, Depth Perception
1. INTRODUCTION 3-Dimensional Television (3DTV) provides improved experience to its viewers by providing the sensation of depth. Thus, 3DTV is a technology based on exploiting the properties of human depth perception [1]. Therefore, it is important to provide 3DTV services in a way that it enables high quality natural depth perception. Quality assessment is a key component of any video related service system. While subjective test results still remain the ‘Gold Standard’ of quality measurement in video research, objective metrics are of utmost importance for the timely development of 3D video related technologies. Quality assessment of 3D video is posing new challenges to the 3DTV research community. Among these, assessment of the perception of depth remains an unanswered problem. In this paper, we address this issue to a certain extent in the context of color plus depth representation of 3D video [2], where a monoscopic color image and a per-pixel depth image is used to represent 3D video. Using these two components, two slightly different perspectives of the same scene, known as stereoscopic views, are rendered at the receiver. While humans use different physiological and psychological cues to perceive depth [3], binocular stereopsis is the most important additional depth cue provided by modern stereoscopic 3D displays, in comparison to traditional 2D displays. Binocular stereopsis is the sensation by which the brain interprets depth information, by making use of the two slightly different views seen
978-1-4244-7994-8/10/$26.00 ©2010 IEEE
by the two eyes. Thus, the human brain is able to perceive depth by analyzing the disparities of different objects in stereoscopic views. One of the earliest studies reported in [4], explored the variation of human experience on three specific perceptual attributes of 3D video. These attributes are perception of depth, presence and naturalness of the depth. It was shown that the increase in depth would lead to an enhanced sense of presence, provided that the depth is perceived as natural (i.e. depth cues introduced are consistent and within natural bounds). The effect of depth image quality on 3D perception was presented in [5]. Here, the depth images were quantized at different bit rate settings and subjectively evaluated for overall 3D perception. It was concluded in [5], that the depth image can be significantly quantized without affecting the quality of 3D perception. In [6], the correlation between existing 2D video quality metrics and subjective scores on perceptual attributes of color plus depth based stereoscopic video, was analyzed. It was concluded in [6] that out of many 2D metrics considered, the Video Quality Metric (VQM) [7] of the color component correlates strongly with the overall perception of depth perception. Further, there was no quality metric among the ones that were tested that could be applied on the depth image alone, which correlates strongly with the viewer ratings of depth perception in 3D video. Similarly, in [8] it was reported that for 3D video depth distortions are less significant than the color distortions. Although, it is reported in [5] [6] and [8] that the depth perception does not change with different quantization levels of the depth image, none of them explain the reason. This paper investigates in detail the reaction of observers to the changes in the depth images incurred due to the compression artifacts. To explain the variation of the perceived depth, we subjectively investigate the behavior of the Just Noticeable Difference in Depth (JNDD) perceived by humans, in a 3D video scene. The reason for observers to not perceive any significant depth changes, due to the impairments in the depth images, is explained with the findings of the JNDD investigation. The rest of the paper is organized as follows. The Section 2 describes the subjective experiment carried out to investigate the variation of the perception of depth with different quantization levels and in section 3 the experiment to evaluate the JNDD values is presented. Section 4 explains the variation of the depth perception with the JNDD results. The Section 5 concludes the paper with some directions for future work.
2. EVALUATION OF DEPTH PERCEPTION This section provides details of a set of subjective experiments carried out to evaluate the user perception of depth in 3D video. In particular, we investigate how the users react to the impairments in the depth images, which are caused during compression.
4013
ICIP 2010
Mean Opinion Score
5
(a) The ‘Chess’ Sequence
(b) The ‘World Cup’ Sequence Fig. 1. Color and depth images of two of the sequences used for the experiments 2.1. The Experimental Setup
Breakdancers
Chess
World cup
Interview
4.5
4
3.5 20
25
30 35 40 Quantization Parameter
45
Fig. 2. Mean Opinion Score for depth perception of 3D video sequences with impaired depth maps at different QP settings.
The subjective experiment is performed according to the ITU-T BT.500-11 [9] and 16 observers (6 experts and 10 non experts) participate for all tests. In contrast to the standard, some experts are also selected for the experiment considering the specific nature of 3D video. All participants have a good visual acuity (> 0.7, as tested with a Snellen eye chart), good stereo vision (< 30 seconds of arc, as tested with the Randot stereo test) and good colour vision (as tested with the Ishihare test). The experiment is performed on a 42” Philips WoWvx multi-view auto-stereoscopic display. The aspect ratio is 16:9 and the peak luminance is 200 cd/m2. The viewing distance is set at 3m from the screen plane, which is the optimum for display optics. Inputs to this display system are in the form of color plus depth representation of 3D video. For this representation, the depth map has 256 (0-255) distinct depth levels. Video objects that have a depth level of 128, have zero disparity (co-planar with the screen plane).The measured environmental illumination is 200 lux, which is the recommended in [9] for home environments. 2.2 The Experiment Several 3D video sequences are considered for subjective testing. However, the results are presented for sequences “Breakdancers” (high motion/ high depth variation with static camera), “Chess” (low motion/ high depth variation and parallel camera motion), “World Cup” (medium motion/ low depth variation and Static Camera) and “Interview” (low motion/ low depth variation with static camera), considering their different characteristics. The thumbnails of two of the sequences are shown in Fig. 1. In each subjective evaluation test, observers are asked to assess the perceived depth in the impaired video sequence with respect to the one without any impairment. . The Double Stimulus Continuous Quality Scale (DSCQS) method [9], which has a scale ranging from one to five, is used in the experiments. H.264 reference Joint Model software (JM Version 15.1) is utilized to encode the depth map sequence at 5 different Quantization Parameter (QP) settings (QP = 22, 27, 32, 37, 42). The color images are not compressed. 2.3 The Results The results of the subjective experiments are presented in Fig. 2. As illustrated in Fig. 2, the curves representing the QP versus Mean Opinion Scores (MOSs) for the 3D video sequences show a
4014
consistent and a very slight decreasing trend with increasing QP. Since the decrease in the MOS is so small, it can be concluded that depth perception does not change even if the depth images are highly compressed. In general, the absolute MOS for the sequences ‘Breakdancers’ and ‘Chess’ is higher than the other two sequences. The reason could be that ‘Breakdancers’ and ‘Chess’ contain more distinct depth level than ‘Interview’ or ‘World Cup’, where the number of distinct depth levels is around three.
3. EVALUATION OF THE JNDD BEHAVIOUR In order to explain the subjective results presented in the previous section, we performed a subjective experiment to understand the sensitivity of humans to changes of depth in a 3D video scene. The binocular stereopsis cue is the most significant additional depth cue provided by 3D displays. As described earlier, it is a cue in which the brain interprets depth information by making use of the two different views seen by the two eyes. This section it is describes how the just noticeable difference in depth that is perceivable with the binocular stereopsis cue is experimentally evaluated. 3.1. The Experimental Setup The set up consists of the same display that is described in section 2.1. The near clipping plane and far clipping plane of the display used are both placed approximately 15.25 cm apart from the screen plane. The depth levels 0 and 255 are displayed 15.25 cm behind and in front of the screen respectively. 18 subjects in total participated in this experiment. 3.2. The JNDD Evaluation Experiment Subjects are asked to watch a synthetic image sequence with two objects, both representing a synthetic image of a car, as shown in Fig. 3 (a). In a natural scene, there are many other cues to enable depth perception. However, in this experiment we measure only the effect of binocular disparity cue, which is provided by a stereoscopic display. Therefore, since it is necessary to isolate any other cues that enable depth perception, a plane image as shown in the Fig. 3 (a) is used for this experiment.
Table I: Summary of the Experiments and Results Experiments (a) Texture Image
(b) Initial depth map (t = 0)
Variation of Object Depth Levels
Reading Number (c) Depth map at t = 90s (d) Depth map at t = 180s Fig. 3. An example of a texture image and the corresponding depth maps used for the JNDD experiment Initially, the two objects are placed at the same depth level relative to the screen, as shown in Fig. 3(b). The depth of the right object is gradually changed (increased or decreased) at a predetermined rate, while the depth of the left hand side object is kept unchanged. This is illustrated in Fig. 3(c) and Fig. 3(d). The subjects signal to the coordinator, when they sense a change in the depth level difference between the two objects. They are asked if the right object moves towards the front or back relative to the left object. To ensure the reliability of the experimental outcomes, the responses of the subjects (2 in total) who answer this question incorrectly are discarded from the final analysis. In the experiment, the depth of the right object is changed in increments of one depth level. The color image is not changed/ altered during the entire sequence. The depth level of the right object is changed every 1.5 second (i.e. 3 frames in the depth map for each depth level and the sequence is played at 2 fps). 3.3. Results of the JNDD Evaluation Experiment The Results of the experiment described in 3.2 are summarized in Table I. The results are presented in the units of depth levels, where each depth level in this experimental setup corresponds to approximately 1.2mm (30.5cm / 255). The JNDD corresponds to the number of unnoticed depth levels. The JNDD is a minimum at the screen level at which the disparity is zero, and increases towards the near and far clipping planes, which can be explained by Weber’s Law. The JNDD values in Table I are display dependant and should be verified for a different type of 3D display.
4. EXPLANATION OF THE DEPTH PERCEPTION RESULTS This section proposes a simple method to assess how depth impairments would affect the perception of depth in a 3D video scene. Based on the JNDD investigation that was reported in section 3, the following text explains the reason for the depth perception results that were presented in section 2. During the encoding and decoding process, the pixel values of the depth image change from its original, causing distortions on the stereoscopic views that are rendered with it. These distortions trigger changes in the perceived depth. However, the JNDD investigation done in the previous section suggests that observers do not perceive a change of depth until the depth level is increased or decreased by at least 16 levels (~7%). Thus, pixel changes in the depth image do not cause distortions to the perceived depth, unless those changes exceed a particular threshold. This threshold corresponds to the JNDD value, measured in units of depth levels.
4015
1 Object 2 Moving Forward 3 4 5 Object 6 Moving Backward 7 8
Left Car
Right Car
0 64 128 192 64 128 192 255
0 -> 128 64 -> 192 128 -> 255 192 -> 255 64 -> 0 128 -> 0 192 -> 64 255 -> 128
Results (Averaged & Rounded) Just Number of Noticed Unnoticed Depth Depth Level Levels 22 22 84 20 148 20 214 22 43 21 112 16 176 16 233 22
4.1 Determination of the Regions of Perceivable Depth Distortion In analyzing the effect of pixel errors on the depth perception, it is important to identify the regions in which there is perceivable distortion. For this purpose, we produce a ‘Distortion Map’ that highlights the pixel positions with a perceivable distortion. For the following explanation, let X and Y denote the original and the distorted depth images, respectively. The perceivable distortion (E) caused by a change of pixel value located at position (i,j) is calculated as given in the Equation (1). 0 E (i, j ) = ® X ( i , j ) Y ( i, j ) − D JND − ¯
if if
X (i, j ) − Y (i, j ) < D JND (1) X (i, j ) − Y (i, j ) ≥ D JND
In Equation (1), DJND is the just noticeable difference in depth or the number of unnoticed depth levels corresponding to the depth at (i,j), denoted by X(i,j). This DJND is calculated according to the JNDD values derived experimentally in the previous section. Ideally, DJND should change according to the depth level of each pixel (X(i,j), in this case). However, to reduce the computational complexity, we calculated DJND discretely, considering a specific set of bins. These bins are illustrated in Equation (2).
D JND
21 °19 ° =® °18 °¯20
if if if if
0 ≤ X (i, j ) < 64 64 ≤ X (i, j ) < 128
: Bin 1 : Bin 2
(2)
128 ≤ X (i, j ) < 192 : Bin 3 192 ≤ X (i, j ) < 255 : Bin 4
The DJND value for each of the bins identified in Equation (2) is calculated as the average of the number of unnoticeable depth levels, at the beginning and end depth level of each bin. This calculation is further shown in Table II. Since we deal with integers, DJND for each bin, as shown in Table II, is rounded to the nearest integer number. The Distortion Maps given in Fig. 4 illustrate the locations in which there is a perceivable distortion of depth. In other words, the Distortion Map indicates the pixels where E(i,j) > 0. This distortion map is useful to identify the regions of perceivable depth distortions.
Table II. Calculation of DJND value for each bin Testing Object Object Unnoticed Bin 1 Bin 2 Bin 3 Bin 4 Depth Moving Moving Depth (0- (64 - (128- (192Level Forward Backward Levels 64) 128) 192) 255) 0 64 128 192 255
22.33 20.29 19.67 21.88 N/A
N/A 20.58 16.13 16.00 21.96
22.33 20.44 17.90 18.94 21.96
21.38 19.17
(a) ‘Interview’ at QP=42
(b) ‘Breakdancers’ at QP=42
18.42 20.45
4.2. Discussion All the Distortion Maps shown in the Figure 4, which correspond to the highest impairment level considered in the experiments, indicate that the perceivable distortions occur only on the object boundaries. There are almost no perceivable distortions within the objects of interest. Even the existing distortions on the object boundaries are at most only 2-3 pixels in width. Depth differences of these few pixels are of very low significance to the overall perception of depth, since the depth of the objects of interest have not changed at all. Thus, the overall perception of depth, in general, is not affected by the compression process. However, the slight decrease in depth perception can also be explained using the Distortion maps. The slight distortion on the object boundaries, as seen in Fig. 4, would cause some rendering distortions that may be visible to the human eye. This leads to slight flickering on object boundaries. Furthermore, this inconsistency of the depth and rendering distortions would cause eye strain. This could well be the reason for some subjects to rate a slightly lower depth perception at higher quantization levels. Thus, the compression artefacts introduced by the codec used in this experiment do not cause significant perceptual depth changes. However, for different types of depth image impairments and different viewing conditions, this observation may not hold.
(c) ‘World Cup’ at QP=42 (b) ‘Chess’ at QP=42 Fig. 4. Distortion Maps: Images that indicate pixel positions with perceptually noticeable depth changes. However, for different viewing conditions, such as for different displays, the JNDD values need to be verified again. At the moment, we are working on using the JNDD results for compression and processing of depth maps and hope to validate them on different types of 3D displays in the future.
ACKNOWLEDGMENTS This work has been supported by the MUSCADE Integrating Project, funded under the European Commission ICT 7th Framework Program. Authors would like to thank Microsoft Research for providing the ‘Breakdancers’ sequence and Mr. S.L.P. Yasakethu for producing the synthetic sequences used in the JNDD experiment.
REFERENCES
5. CONCLUSIONS This paper investigated the variation of the subjective quality of depth perception with the quantization level of the depth maps. In particular the paper focused on the color plus depth representation of 3D video. A subjective experiment, which was carried out on an auto stereoscopic display, revealed that the depth perception does not significantly change with the quantization level of the depth image, when the color image is kept unchanged. To explain this observation another subjective evaluation was carried out on the same display to determine how sensitive the humans are to changes in depth. We experimentally derived the Just Noticeable Difference in Depth (JNDD) values at various depth levels. The JNDD results suggest that for the viewing conditions considered in the paper, the humans are not sensitive to slight variations of depth. On average a depth pixel should change its value by at least 7% (16 levels at 128 testing depth level) for a human to perceive any change of depth. Based on these JNDD results we proposed a method of assessing the effect of depth impairments with the aid of Distortion maps and discussed why the depth perception does not change significantly with the quantization level. It could be concluded that for the conditions described in this paper, the perception of depth will not be affected by the quantization of the depth map.
[1] [2]
[3] [4]
[5]
[6]
[7]
[8]
[9]
4016
L. Onural, “Television in 3-D: What Are the Prospects?,” Proceedings of the IEEE, vol. 95, 2007, pp. 1143-1145. Christoph Fehn, “Depth-Image-Based Rendering (DIBR), Compression and Transmission for a New Approach on 3D-TV,” Proceedings of the SPIE, vol. 5291, 93, 2004. V. Bruce, P.R. Green, and M.A. Georgeson, “Visual perception, ” Psychology Press, 4th Edition, 2003. W. IJsselsteijn, H. de Ridder, R. Hamberg, D. Bouwhuis, and J. Freeman, “Perceived depth and the feeling of presence in 3DTV,” Displays, vol. 18, May. 1998, pp. 207-214. G. Leon, H. Kalva, and B. Furht, “3D Video Quality Evaluation with Depth Quality Variations,” 3DTV Conference: The True Vision Capture, Transmission and Display of 3D Video, 2008, pp. 301-304. C. Hewage, S. Worrall, S. Dogan, S. Villette, and A. Kondoz, “Quality Evaluation of Color Plus Depth Map-Based Stereoscopic Video,” IEEE Journal of Selected Topics in Signal Processing, vol. 3, 2009, pp. 304-318. M. Pinson and S. Wolf, “A new standardized method for objectively measuring video quality,” IEEE Transactions on Broadcasting, vol. 50, 2004, pp. 312-322. A. Tikanmaki, A. Gotchev, A. Smolic, and K. Muller, “Quality assessment of 3D video in rate allocation experiments,” IEEE International Symposium on Consumer Electronics, 2008, pp. 1-4. ITU-R Recommendation,“BT.500-11 Methodology for the subjective assessment of the quality of television pictures,” 2002.