Display Dependent Preprocessing of Depth Maps Based on Just Noticeable Depth Difference Modeling D. V. S. X. De Silva, E. Ekmekcioglu, W. A. C. Fernando, and S. T. Worrall
Abstract—This paper addresses the sensitivity of human vision to spatial depth variations in a 3-D video scene, seen on a stereoscopic display, based on an experimental derivation of a just noticeable depth difference (JNDD) model. The main target is to exploit the depth perception sensitivity of humans to suppress unnecessary spatial depth details, hence reducing the transmission overhead allocated to depth maps. Based on the derived JNDD model, depth map sequences are preprocessed to suppress the depth details that are not perceivable by the viewers and to minimize the rendering artefacts that arise due to optical noise, where the optical noise is triggered by inaccuracies in the depth estimation process. Theoretical and experimental evidence is provided to illustrate that the proposed depth adaptive preprocessing filter does not alter the 3-D visual quality or the view synthesis quality for free-viewpoint video applications. Experimental results suggest that the bit rate for depth map coding can be reduced by up to 78% for depth maps captured with depth-range cameras and by up to 24% for depth maps estimated with computer vision algorithms, without affecting the 3-D visual quality or the arbitrary view synthesis quality.

Index Terms—3-D video, depth map, depth perception, just noticeable depth difference (JNDD), view synthesis.
I. INTRODUCTION

TECHNOLOGICAL breakthroughs in display technologies, such as auto-stereoscopic and light-field (mainly holographic) type displays, have enabled 3-D video systems to be easily deployed. However, the lack of 3-D video content is a major barrier to the expansion of such display systems into consumer mass markets. New formats for 3-D scene representation are being considered by standardization bodies to cater for these emerging market needs. A 3-D scene representation format, popularly known as 2-D-plus-Depth or Video-plus-Depth, was introduced in [1] and standardized by MPEG [2], in which a 2-D color texture video and the corresponding per-pixel depth map video sequence (see Fig. 1) are used to represent 3-D video. This kind of scene representation is known to be both bandwidth efficient and backward compatible with legacy displays.
Manuscript received March 11, 2010; accepted April 27, 2010. Date of publication January 24, 2011; date of current version March 16, 2011. This work was supported by the MUSCADE Integrating Project, funded under the European Commission ICT 7th Framework Program. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jan Biemond. The authors are with the I-Lab, Multimedia Communications Research Group, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford, GU2 7XH, U.K. (e-mail:
[email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSTSP.2011.2108113
Fig. 1. Video-plus-Depth type representation of 3-D video.
Depth maps are not only used to assist the generation of stereoscopic views in 3-D-TV, but are also exploited in synthesizing novel camera viewpoints in the context of multi-view free-viewpoint video applications. Depth maps are utilized in projecting the image coordinates of one camera viewpoint to the image coordinates of another camera viewpoint through the 3-D space coordinates. This process is usually referred to as Depth Image Based Rendering (DIBR). The accuracy of the depth maps used for view synthesis directly affects the structural and temporal consistency of the video objects in the synthesized views. Hence, assuring a certain level of depth map quality is necessary to minimize the amount of structural distortions in the synthesized novel camera viewpoints.

3-D video provides an additional experience of depth to its viewers, in comparison to 2-D video. Humans perceive the depth of different objects in a scene by making use of different cues. 3-D display systems provide additional cues to their viewers that enhance the viewers' perception of depth in a video scene. The most important of these additional cues is the binocular disparity, which is obtained by providing two views of the same scene, from slightly different perspectives, to each eye of the viewer. Head motion parallax, which enables users to see different views while moving the head sideways, is another additional cue provided by modern 3-D display systems that enhances depth perception.

Some recent studies have suggested that the quality of the depth map is less important for the perception of depth in 3-D video [3], [4]. These studies showed that users do not perceive a considerable change in depth, even when the depth maps are coarsely quantized. However, coarser quantization is likely to result in a degradation of the quality of the novel camera views that are generated using the corresponding depth maps. This is mainly due to the blocking distortion generated along the edges of the depth map, where such edges correspond to depth discontinuities.

The depth map of a video scene can be estimated in two ways: active depth estimation and passive depth estimation.
In active depth estimation, depth maps are directly captured (measured) with a depth-range camera, whereas in passive depth estimation, depth maps are generated by applying computer vision algorithms to color texture video frames and utilizing the inter-camera correspondences. Optical noise in the depth maps, caused by differences in the reflectivity measured by infrared (IR) sensors according to the color variations of objects, is a major problem with active depth estimation [5]. Spatial and temporal inconsistencies within depth maps constitute another problem, especially in passive depth estimation. The effect of such inaccuracies in the depth estimation techniques needs to be eliminated to ensure high-quality virtual view generation.

Preprocessing of depth maps is required mainly to ensure high-quality virtual view generation, as in stereoscopic video and free-viewpoint video. Most of the preprocessing techniques proposed for depth maps consider limited aspects of their usage, such as hole filling or compression. However, in usage scenarios like broadcasting or live streaming, a single video-plus-depth stream is delivered to various types of displays, such as stereoscopic displays, auto-stereoscopic multi-view displays, and displays with free-viewpoint capability. Therefore, a depth map preprocessing technique should be generic and robust enough to handle such use cases.

A 3-D video display system has a characteristic far clipping plane and near clipping plane that define the maximum distances at which objects can be displayed behind the screen and in front of the screen, respectively. The depth map contains the corresponding disparity information of a particular color texture image, is usually represented as a gray scale image, and contains 255 distinct depth planes. However, users are not able to perceive all 255 depth levels, but only a few distinct depth levels. The number of perceivable depth planes depends upon the display characteristics, especially the difference between the near clipping plane and the far clipping plane. Humans cannot usually perceive sufficiently small depth changes in a scene with binocular stereopsis. Models that define and explain the just noticeable depth difference (JNDD) perceived by humans using different depth cues have been developed within psycho-physical (psychological and physiological) studies of perception [6].

Similar to the case of depth perception in stereoscopic video, view synthesis in free-viewpoint video is also sensitive only to depth differences above a certain value, which affect the reconstruction of background-foreground object separation regions in the rendered video sequences. The higher the depth difference between two horizontally neighboring pixels, the higher the chance that the connectivity between the corresponding pixels is lost after perspective projection. Hence, visual distortions are more likely to be triggered in this case.

In this paper, we propose an experimentally derived JNDD model for a stereoscopic 3-D video display system. We show that the proposed JNDD model can be adapted to suit various types of stereoscopic displays. Using the derived JNDD model as a base, we design a novel depth map preprocessing technique that enables depth maps to be compressed at significantly lower bit rates, while ensuring that the perceived 3-D quality and the view synthesis quality are not compromised.
The rest of the paper is organized as follows. In Section II, we provide the details of relevant work carried out within the scope of this paper. Section III gives a brief introduction to how humans perceive depth with binocular stereopsis. Section IV describes the details of the derivation of the JNDD model. In Section V, we propose a depth map preprocessing technique based on the derived JNDD model. We discuss the applicability of the derived model to different types of stereoscopic displays in Section VI. Finally, we conclude the paper in Section VII with a discussion of some future research directions in the related area.

II. RELATED WORK

There is a considerable amount of work in the literature concerning various aspects of human depth perception. These works mainly originate from psychological and physiological research aimed at understanding human perception. In [6], a quantitative analysis of different cues for depth perception was introduced. However, to the best of our knowledge, there is no published work on a just noticeable depth difference model as applied to stereoscopic viewing on a 3-D video display.

Depth maps can be preprocessed in certain ways to reduce the amount of visual holes (dis-occluded regions) generated due to the dis-occlusions in virtual camera viewpoints, which are common with depth image based rendering techniques. Further, depth maps need to be preprocessed to eliminate rendering artefacts that arise due to inaccuracies in the depth map estimation process. Smoothing the depth map with a symmetric Gaussian filter was proposed in [1] as a method to reduce the number of visual holes generated during stereoscopic view pair generation. This method causes an uneven enlargement of objects in the 3-D scene, which is known as the rubber sheet effect. An asymmetric Gaussian filter was used in [7] to smooth the depth maps. The strength of the smoothing filter in [7] is kept low in the horizontal direction in comparison to the vertical direction, so objects are comparatively less deformed in the virtual viewpoints. A distance-dependent depth map filtering was introduced in [8] as a method for improving the quality of stereoscopic viewpoint generation. In this method, the strength of the Gaussian filter is reduced in frame regions that are far away from the video object boundaries.

Free-viewpoint video applications, which deal with much wider camera baseline distances, require the video object boundaries to be kept as sharp as possible for view synthesis. Hence, sharpening the depth map object boundaries is desired, rather than smoothing. In particular, the authors of [9] proposed to apply a video object shape adaptive up-sampling filter to depth maps encoded at reduced spatial resolution, in order to decrease the blurring effect on the video object boundaries of the depth maps after up-scaling. Hence, the structural consistency of video objects is not compromised after perspective projection, and the visual quality of the synthesized arbitrary camera viewpoints is improved.

A common problem with current depth estimation techniques is that the object boundaries in the color texture frame do not coincide with those in the corresponding depth map frame.
In [10], a three-step process is introduced to align the depth maps with their corresponding color texture frames. The depth maps are down-sampled, filtered using a joint bilateral filter, and then up-sampled back to their original spatial resolution, again using a joint bilateral filter. Another problem with modern multi-view depth estimation methods is that the 3-D inter-camera correspondences are not exploited jointly to obtain the most accurate depth map information. In [11], the authors proposed a video content adaptive multi-dimensional median filter applied to raw multiview depth map frames to enhance the visual quality of the synthesized arbitrary free-viewpoint videos.

III. BINOCULAR STEREOPSIS AND 3-D VISION

Humans are able to perceive the depth of different objects in space. Many psychological and physiological cues are used to perceive depth. Among these cues, binocular stereopsis is an important physiological cue that enables depth perception with the aid of the two eyes. Furthermore, binocular stereopsis is the most significant additional cue provided by modern 3-D display systems. In this section, we discuss the basics of binocular stereopsis as a cue of depth perception for humans.

Fig. 2. Geometry of binocular stereopsis. An object's depth is perceived relative to the fixation point. As the object comes toward the fixation point, the angular disparity is reduced. The angular disparity is proportional to the depth perceived relative to point P.

Fig. 2 illustrates how the geometry of binocular vision gives rise to slightly different images for the two eyes. If both eyes fixate on the point P, then the images cast by that point fall on the center of the fovea of each eye. Assume that the point Q casts its image α degrees away from one eye's fovea and β degrees away from the other eye's fovea. The binocular disparity in this case is α - β, measured in degrees of visual angle. Binocular disparity acts as the stimulus that enables depth perception with binocular stereopsis; it has both a magnitude and a direction. Let the angular disparity stimulated by point Q be denoted by η_Q:

η_Q = α - β.    (1)

Geometrically, it can be proven that the disparity η_Q is proportional to the relative depth δ of the point Q with respect to point P, and inversely proportional to the square of the viewing distance D [12]:

η_Q ∝ δ / D².    (2)

Similarly, the angular disparity η_R caused by the point R, with its retinal offsets α' and β' defined analogously, is given as

η_R = α' - β'.    (3)
According to (1) and (3), η_Q > 0 and η_R < 0. The brain interprets this sign difference as the relative positioning of points Q and R with respect to point P. When the binocular disparity is greater than zero, the brain interprets that the point is behind the fixation point, and vice versa. Thus, binocular disparity provides the stimulus for the brain to perceive the relative depth of objects with respect to a fixation point.

Equation (2) brings another relationship into focus. According to (2), as the viewing distance increases, the binocular disparity decreases. Hence, at larger viewing distances, to perceive a certain depth difference between a fixation point and an object, the fixation point and the object have to be farther apart than at smaller viewing distances. In other words, the just noticeable depth difference increases with increasing viewing distance.

According to the graphs given in [6], the JNDD perceived with binocular disparity varies with the viewing distance as given in (4). In (4), δ_JNDD denotes the just noticeable depth difference and D denotes the viewing distance, both measured in meters (m). Hereinafter, we refer to the relationship given in (4) as the JNDD model for real world viewing scenarios (JNDD_R). The variation of JNDD_R with the viewing distance is illustrated in Fig. 3. According to this relationship, binocular disparity is a dominant depth cue only in close proximity. However, this model, which is developed for real world viewing scenarios, has limited applicability to stereoscopic video display systems. Therefore, in the next section we derive a JNDD model that is applicable to a stereoscopic video display.

IV. JNDD MODELING FOR STEREOSCOPIC DISPLAYS

The JNDD model discussed in the previous section (JNDD_R) has limited applicability to a stereoscopic display in the context of depth perception. The reason is that, when a viewer watches 3-D video on a stereoscopic display, he/she always fixates his/her eyes on the screen and, therefore, the viewing distance does not change, whereas the JNDD_R model in (4) describes how the just noticeable depth difference changes when the viewing distance varies. Therefore, in this section, we describe and discuss an exploratory experiment conducted to understand how sensitive users are in perceiving depth changes in a 3-D video scene shown on a stereoscopic display.
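As a rough illustration (ours, not taken from [6] or the original text), and assuming the angular disparity threshold itself stays approximately constant, the inverse-square dependence in (2) already conveys the scale of this effect:

η ∝ δ / D²  implies that, for a fixed physical offset δ, the disparity at D = 3 m is only (1/3)² = 1/9 of its value at D = 1 m,

so the depth separation needed to reach the same disparity threshold, i.e., the JNDD, is roughly nine times larger at 3 m than at 1 m.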
Fig. 3. JNDD model for real world viewing (JNDD_R). These graphs illustrate the variation of JNDD with the viewing distance. As the viewing distance increases, the JNDD increases exponentially. (a) JNDD_R model in the log scale. (b) JNDD_R model in the physical scale.
A. Experimental Setup

The experiment is performed on a 42-in Philips WoWvx multi-view auto-stereoscopic display. The display resolution is 1920 × 1080 and the aspect ratio of the screen is 16:9. The peak luminance of the display is 200 cd/m². The viewing distance for all subjects is set at 3 m from the screen. The near clipping plane and the far clipping plane of the display are both placed approximately 15.25 cm from the screen plane. Video objects that have a depth level of 128 have zero disparity (co-planar with the screen plane). The depth levels 0 and 255 are displayed 15.25 cm behind and in front of the screen, respectively. In total, 18 subjects (6 experts and 12 non-experts) participate in this experiment.

Subjects are asked to watch a synthetic image sequence with two objects, both representing a synthetic image of a car, as shown in Fig. 4(a). Initially, the two objects are placed at the same depth level relative to the screen. The depth of the right object is gradually changed (increased or decreased) at a predetermined rate, while the depth of the left-hand side object is kept unchanged. The subjects are asked to signal the coordinator as soon as they sense a change in the depth level difference between the two objects. They are then asked whether the right object moved to the front or to the back, relative to the left-hand side object. To ensure the reliability of the experimental outcomes, the responses of subjects who answer this question incorrectly are discarded from the final analysis.

In our experiment, the depth of the right object is changed in increments of one depth level. The color texture image is not altered during the entire sequence. The depth level of the right object is changed every 1.5 seconds (three frames in the depth map for each depth level, with the sequence played at 2 fps). Each subject is given at least four rounds of training before the actual experiment.
Fig. 4. Example of a texture image and the corresponding depth maps used for the JNDD experiment. Two objects are shown side by side to the subject, and the depth of the right object is increased gradually, until the subject notices a depth difference between the two objects. (a) Texture Image. (b) Initial depth map (t = 0). (c) Depth map at t = 90 s. (d) Depth map at t = 180 s.
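For orientation (our own illustration, not from the paper), the relation between 8-bit depth levels and physical displacement on this display can be sketched as below, assuming the mapping is piecewise linear through the three anchor points stated above (level 0 at 15.25 cm behind the screen, level 128 on the screen plane, level 255 at 15.25 cm in front); the helper name is ours.

# Minimal sketch (assumption: piecewise linear depth-level-to-displacement mapping).
DEPTH_RANGE_CM = 15.25      # near/far clipping planes are +/- 15.25 cm from the screen
ZERO_DISPARITY_LEVEL = 128  # depth level that is co-planar with the screen

def level_to_displacement_cm(level: int) -> float:
    """Displacement from the screen plane in cm for an 8-bit depth level.

    Positive values are in front of the screen (towards the viewer),
    negative values are behind it; the mapping passes exactly through the
    three anchor points stated in the experimental setup.
    """
    if not 0 <= level <= 255:
        raise ValueError("depth level must be an 8-bit value")
    if level >= ZERO_DISPARITY_LEVEL:
        return (level - ZERO_DISPARITY_LEVEL) / 127.0 * DEPTH_RANGE_CM
    return (level - ZERO_DISPARITY_LEVEL) / 128.0 * DEPTH_RANGE_CM

# Example: one depth level near the screen corresponds to roughly 0.12 cm.
if __name__ == "__main__":
    print(level_to_displacement_cm(129) - level_to_displacement_cm(128))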
The experiment is performed with several depth levels, and eight readings are taken from each subject, as given in Table I. The experimental conditions and the corresponding results are summarized in Table I. Fig. 4 shows an example of a sequence that simulates the right object moving toward the user.

B. Results of the JNDD Modeling

The results of the experiment described in Section IV-A are summarized in Table I and further illustrated in Fig. 5. According to Fig. 5(a), subjects are more sensitive to objects moving backwards than to objects moving forward from the initial depth level. Fig. 5(b) illustrates the average number of unnoticed depth level differences at each testing depth level. The variation illustrated in Fig. 5(b) is hereinafter referred to as the JNDD model for stereoscopic displays (JNDD_S).

C. Discussion of Results

In this subsection, the behavior of the JNDD model for stereoscopic displays (JNDD_S) is discussed. When the subjects watch 3-D video, they fixate their eyes on the screen. As illustrated in Fig. 6, the subjects see the point Q behind the screen (or, in the opposite case, in front of the screen), due to the binocular disparity.
TABLE I SUMMARY OF THE EXPERIMENTS AND RESULTS
Fig. 5. Variation of the just noticeable depth difference for a 3-D video display at various testing depth levels. The unnoticed depth level difference is a minimum at the screen level, where there is no disparity. As the depth of objects relative to the screen increases, the unnoticed depth level difference increases. The testing depth level is the initial depth level of the objects. (a) Unnoticed depth level difference for objects moving in opposite directions from the initial depth level. (b) Average unnoticed depth level difference at various depth levels (JNDD_S).
Fig. 6. Explanation of the JNDD model for stereoscopic displays. In contrast to real world depth perception, users always fixate their eyes on the display screen. Depth is perceived relative to the screen, and as the initial depth of the object (testing depth level) increases, the number of unnoticeable depth levels also increases.
The binocular disparity η_Q caused by point Q is calculated as in (1). η_Q is the stimulus that enables the subject to initially perceive that the point Q is behind the screen. Next, the point Q is moved a distance δ from its original position to the point Q', as shown in Fig. 6. The change in depth, δ, corresponds to a stimulus difference of Δη. The subjects do not perceive a change in the depth level of the point until the stimulus difference Δη reaches a specific threshold.

When the point Q is on the screen, i.e., when the depth level is 128, no binocular disparity with respect to the screen is stimulated. Therefore, the perceivable depth difference is at its minimum at the depth level of 128. However, the unperceivable depth difference is not zero. This can be described with the just noticeable depth difference model for real world viewing (JNDD_R), explained before. According to this model, to notice a depth difference with the aid of binocular stereopsis at a viewing distance of 3 m, the video objects should be at least 4.7 cm apart.

It is clear from the results in Fig. 5(b) that, as the tested depth level increases, the unnoticed depth level difference also increases. This relationship is qualitatively consistent with Weber's law, which quantifies the perception of change in a given stimulus [13]. According to Weber's law, at a larger initial stimulus, a larger stimulus difference is required for a subject to perceive a change in the initial stimulus. When the testing depth level increases, the initial stimulus η also increases; hence, a larger Δη is required to perceive a change in η. Therefore, the unnoticed depth level difference is greater at larger testing depth levels.

This phenomenon is the same whether the testing depth level is positioned behind the screen or in front of the screen.
Fig. 7. Depth range versus maximum allowable physical depth difference.
The reason is that, in these scenarios, as described in Section III, the direction of the binocular disparity is different, whereas Weber's law deals only with the magnitude. Therefore, the shape of the JNDD graph in Fig. 5(b) is almost symmetric about the zero disparity level (128).

D. Applicability to View Synthesis

In the context of free-viewpoint video and view synthesis, the just noticeable depth difference corresponds to the maximum depth level difference between two neighboring pixels for which the connectivity between the corresponding two pixels remains unchanged after perspective projection. In other words, for depth level differences above the just noticeable depth difference, the pixel separation between two originally neighboring pixels becomes greater than zero after projection to another viewpoint.

Consider two neighboring pixels (x, y) and (x+1, y) in the source viewpoint, for which there is a difference in their corresponding depth levels. The distance between them after projecting (warping) to another viewpoint depends directly on their original depth levels and on the distance of the source camera to the projected camera. It is assumed that the multi-viewpoint video set consists of cameras arranged on a line and that the virtual camera viewpoints also lie between the real cameras. Hence, during view synthesis, the disparity causes shifts of pixels only in the horizontal direction, and therefore only the neighborhood in the horizontal (x) direction is of concern. Using the pinhole camera model, the raw image coordinate of a pixel (x, y), originally present in a source viewpoint s, after warping to the 3-D world coordinates becomes

[u, v, w]^T = R^{-1} ( A^{-1} [x, y, 1]^T z_w(x, y) - T_s )    (5)

and, once the 3-D world coordinates are mapped back to the image coordinates of the target viewpoint t, it becomes

[x_t, y_t, 1]^T z_w(x, y) = A ( R [u, v, w]^T + T_t ).    (6)

Putting (5) in (6), the resulting horizontal coordinate becomes

x_t = x + [ A (T_t - T_s) ]_x / z_w(x, y)    (7)

where z_w(x, y) corresponds to the real world (physical) depth value of the pixel and [·]_x denotes the horizontal component. The camera indices of the 3×3 affine (intrinsic) matrix A and of the inverse of the 3×3 rotation matrix R^{-1} are dropped, as these matrices are the same for all viewpoints in the multi-viewpoint array under concern. T corresponds to the 3×1 translational matrix of each viewpoint. For simplicity, the second term in (7) is defined as

K(x, y) = [ A (T_t - T_s) ]_x / z_w(x, y).    (8)

Accordingly, the horizontal pixel distance Δx_t between the two neighboring pixels (x, y) and (x+1, y) after warping to the coordinates of the target viewpoint is

Δx_t = 1 + ( K(x+1, y) - K(x, y) ).    (9)

The second term in the resulting equation decides whether the connectivity of the two neighboring pixels (x, y) and (x+1, y) will be broken in the target viewpoint, as the first term (equal to 1) is the ideal distance. The second term is a floating point number; it is computed and treated as the nearest integer value. If this term falls under 0.50, then Δx_t becomes 1; otherwise, Δx_t becomes larger than 1 and the connectivity is broken.

Fig. 7 shows the maximum allowed physical depth difference for maintaining the connectivity under concern, i.e., the just noticeable depth difference, versus the depth range in a 3-D video scene. This illustration is specific to the Akko test sequence (one of the test video sequences used to evaluate the proposed preprocessing technique described later in the paper), for which the camera parameters as well as the viewpoint warping baseline distance are exploited. This maximum allowed difference tends to decrease towards the capturing camera. Fig. 8 illustrates the corresponding maximum allowed depth luminance difference versus the depth range. This graph provides an upper bound for the luminance value changes allowed in a depth map. In other words, it shows the maximum depth luminance difference that a preprocessing operation is allowed to introduce without triggering structural distortions in synthesized virtual viewpoints.

Fig. 8. Depth range versus maximum allowable depth level difference. This graph provides an upper bound for the luminance difference in depth pixels, to not cause any structural distortions in synthesized virtual views.
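A minimal numerical sketch of this connectivity test is given below (our own illustration, not the authors' code). It assumes a rectified, purely horizontally translated camera pair, for which the per-pixel shift term K in (8) reduces to f_x · b / z_w, with f_x the horizontal focal length in pixels and b the baseline between the source and target viewpoints; the focal length and baseline values in the example are placeholders, since the Akko&Kayo camera parameters are not reproduced in the text.

# Minimal sketch (rectified, horizontally translated cameras assumed).
def horizontal_shift(z_cm: float, fx_px: float, baseline_cm: float) -> float:
    """Horizontal warping shift (in pixels) of a pixel at physical depth z_cm."""
    return fx_px * baseline_cm / z_cm

def connectivity_preserved(z_left_cm, z_right_cm, fx_px, baseline_cm) -> bool:
    """True if two horizontally neighbouring pixels stay adjacent after warping.

    The projected distance is 1 plus the shift difference; as in (9), the
    difference is rounded to the nearest integer, so adjacency survives as
    long as its magnitude stays below 0.5 pixel.
    """
    diff = horizontal_shift(z_right_cm, fx_px, baseline_cm) - \
           horizontal_shift(z_left_cm, fx_px, baseline_cm)
    return abs(diff) < 0.5

def max_depth_difference(z_cm, fx_px, baseline_cm) -> float:
    """Largest physical depth increase at z_cm that keeps neighbours connected
    (the quantity plotted against the depth range in Fig. 7)."""
    limit = 1.0 / z_cm - 0.5 / (fx_px * baseline_cm)
    return float("inf") if limit <= 0 else 1.0 / limit - z_cm

# Example with placeholder camera values: focal length 1200 px, baseline 5 cm.
if __name__ == "__main__":
    print(connectivity_preserved(300.0, 302.0, 1200.0, 5.0))   # True
    print(round(max_depth_difference(300.0, 1200.0, 5.0), 2))  # ~7.69 cm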
TABLE II CALCULATION OF THE Φ VALUE FOR EACH BIN
Fig. 9. Example of the preprocessing operation applied on depth maps.
V. PREPROCESSING OF DEPTH MAPS BASED ON THE JNDD_S MODEL

In this section, we introduce a preprocessing method based on the JNDD model derived in the previous section. The aim of this method is to preprocess the depth maps so as to reduce the bit rate required to transmit them, while affecting neither the stereoscopic view generation nor the arbitrary view synthesis. Ideally, the proposed filter is an edge preserving low-pass filter similar to the bilateral filter; however, in the proposed method the edge threshold is adaptively varied according to the JNDD model. The performance of the proposed filter is compared against several depth processing techniques that have been proposed in the literature, as well as against non-adaptive bilateral filtering [14].

A. Preprocessing Filter Design

Let the original luminance value at pixel position (x, y) be denoted as d(x, y). The new luminance value (the result of the preprocessing operation) of the pixel position (x, y) in the depth map, denoted d'(x, y), is calculated as follows:

d'(x, y) = [ Σ_{(i,j) in W(x,y)} C(i, j) d(i, j) ] / [ Σ_{(i,j) in W(x,y)} C(i, j) ]    (10)

In (10), W(x, y) is the w × w window centered at (x, y), w is the window width in pixels, d(i, j) is the luminance value of pixel position (i, j), and C(i, j) is found as follows:

C(i, j) = 1, if |d(i, j) - d(x, y)| ≤ Φ
C(i, j) = 0, if |d(i, j) - d(x, y)| > Φ    (11)

In (11), Φ is the just noticeable depth difference, or the unnoticed depth level difference, corresponding to d(x, y), which is calculated according to the model derived experimentally in the previous section. Ideally, Φ should change according to the depth level of each pixel (d(x, y), in this case), as given by the graph in Fig. 5(b). However, to reduce the computational complexity, we calculated Φ discretely, considering a specific set of bins. If the number of bins is larger, the accuracy of the results is higher; on the other hand, the complexity is greater. To find a compromise between the accuracy and the complexity, we decided upon four bins. These bins, which partition the depth level range of d(x, y) into Bin 1 to Bin 4, are illustrated in (12).
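A minimal sketch of this filtering operation is given below (our own illustration, not the authors' implementation). The bin boundaries and per-bin thresholds in PHI_BINS are hypothetical placeholders, since the actual values come from Table II and Fig. 5(b), which are not reproduced here; the window width of 15 matches the value reported below.

import numpy as np

# Minimal sketch of the JNDD-adaptive filter of (10)-(12): each depth pixel is
# replaced by the mean of the window samples whose difference from the centre
# value stays within the JNDD threshold for the centre pixel's depth bin.
PHI_BINS = [(0, 64, 7), (64, 128, 5), (128, 192, 5), (192, 256, 7)]  # hypothetical values

def phi_for(level: int) -> int:
    """JNDD threshold for a given depth level, per the binned JNDD_S model."""
    for low, high, phi in PHI_BINS:
        if low <= level < high:
            return phi
    return PHI_BINS[-1][2]

def jndd_filter(depth: np.ndarray, phi=phi_for, window: int = 15) -> np.ndarray:
    """Apply the depth-adaptive averaging filter of (10)-(11) to an 8-bit map."""
    half = window // 2
    padded = np.pad(depth.astype(np.float64), half, mode="edge")
    out = np.empty(depth.shape, dtype=np.float64)
    for y in range(depth.shape[0]):
        for x in range(depth.shape[1]):
            block = padded[y:y + window, x:x + window]
            centre = float(depth[y, x])
            mask = np.abs(block - centre) <= phi(int(centre))  # mask C of (11)
            out[y, x] = block[mask].mean()                     # average of (10)
    return np.round(out).astype(depth.dtype)

Passing a different phi lookup makes the same routine reusable for other display types (see Section VI).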
The value of Φ for each of the bins identified in (12) is calculated as the average of the unnoticeable depth level differences at the beginning and end depth levels of that bin. This calculation is further shown in Table II. Since we deal with integers, the Φ for each bin, as shown in Table II, is rounded to the nearest integer. In these experiments, a window width of 15 is used for the preprocessing operation. Also, note that the Φ for each bin is less than or equal to the upper bound provided by the graph in Fig. 8, so that this preprocessing operation does not affect the view synthesis process.

The usage of this preprocessing filter is illustrated in Fig. 9. To calculate the preprocessed depth value for the highlighted pixel position, a window that is symmetrically placed around the pixel is considered. Next, based on the original depth value (i.e., 125 in the example), Φ is determined. This value is used to generate the mask C in Fig. 9. Finally, the preprocessed depth value is calculated by taking the average of the luminance values in the window, masked by C.

B. Near Lossless Performance Analysis

A near-lossless performance analysis is performed to measure the amount of data contained in the depth maps processed with the different techniques. To do so, the depth maps are compressed by the Intra mode of the Joint Model (JM) H.264/AVC reference software, version 15.1, at a Quantization Parameter (QP) value equal to zero.
In this setting, the JM coder can be considered a near-lossless video encoder. The comparison results are illustrated in Table III and in Fig. 10. The proposed algorithm is compared against the symmetric Gaussian filter [1], the asymmetric Gaussian filter [7], and the bilateral filter [14] at two different edge threshold values (Sigma = 0.02 and Sigma = 0.1). For each of these filters, the window size is selected following [7]. The two edge threshold values are selected to reflect low and medium edge thresholds.

TABLE III JM INTRA MODE QP = 0 BIT RATE (kbits) FOR DEPTH MAPS

Fig. 10. Amount of bits necessary to Intra code (JM) depth maps at QP = 0.

According to the results illustrated in Table III and in Fig. 10, the symmetric Gaussian and asymmetric Gaussian filtered depth maps need the least amount of bits for near-lossless compression. Further, as the edge threshold of the bilateral filter is increased, the bit rate is reduced. However, these preprocessing techniques affect the depth perception in different ways, and this is analyzed in the next subsection. For the proposed filter, there is a near-lossless coding bit rate gain of 21%, 19%, 55%, and 40% for the sequences Breakdancers, Ballet, Orbi, and Interview, respectively, in comparison with the unprocessed depth maps.

In the following illustrations, Sigma refers to the edge threshold introduced earlier. Visual results are illustrated in Fig. 11, showing the effect of the different preprocessing techniques considered on the depth maps. Note that the smoothness of the depth map increases with the edge threshold of the bilateral filter.

C. Analysis of the Perceived Depth Distortion

In order to assess the effect on the perception of depth caused by the different filtering techniques described in the previous section, a simple technique based on the JNDD modeling is proposed here. In this technique, we measure the area of the image (number of pixels) in which the perception of depth is affected or distorted. For the following explanation, let d_o and d_d denote the original and the distorted depth images, respectively. The perceivable distortion D(x, y) caused by a change of the pixel value located at position (x, y) is calculated as

D(x, y) = 1, if |d_o(x, y) - d_d(x, y)| > Φ
D(x, y) = 0, if |d_o(x, y) - d_d(x, y)| ≤ Φ    (13)

In (13), Φ is the just noticeable difference in depth, or the number of unnoticed depth levels, corresponding to the depth d_o(x, y). This is the same Φ as given in (12). A pixel for which D(x, y) > 0 is considered to be a pixel where there is a distortion in the perception of depth. The number of pixels for which D(x, y) > 0 is taken as the distorted area of depth perception.

For the different preprocessing techniques considered in the previous subsection, the distorted area of depth perception is presented in Table IV, as a percentage of the total number of pixels in the image. According to the results in Table IV, it can be seen that both asymmetric and symmetric Gaussian filtering, which were originally introduced as solutions for filling the dis-occluded regions, cause a very high percentage of perceivable depth distortions. Non-adaptive bilateral filtering of depth images causes distortions in the perception of depth once it exceeds a particular smoothing strength. The proposed method, which can be considered an adaptive bilateral filter, can be used to preprocess depth maps while ensuring that there will be no perceptual depth distortions.

D. Rate Distortion Analysis

A rate-distortion analysis is performed to measure the compression efficiency of the preprocessed depth maps. The preprocessed depth map sequences are compressed with the Joint Model (JM) H.264/AVC reference software, version 15.1. The experimental conditions are listed in Table V. The experiments are performed with several test sequences, and the results for four sequences that are commonly used in the stereoscopic video literature are reported. The sequences are "Ballet" and "Breakdancers" from Microsoft Research [15] and "Interview" and "Orbi" from the ATTEST project [1]. Results are presented at 15 fps for "Ballet" and "Breakdancers" and at 25 fps for "Interview" and "Orbi," which are the frame capture rates for these sequences.

Rate-distortion performance curves for these sequences are illustrated in Fig. 12. In these graphs, the horizontal axis represents the bit rate required to encode the depth map and the vertical axis represents the average PSNR of the reconstructed depth map.
Fig. 11. Comparison of different preprocessing techniques with the original depth map. (a) Asymmetric Gaussian filtering. (b) Symmetric Gaussian filtering. (c) Bilateral filtering at Sigma = 0.02. (d) Bilateral filtering at Sigma = 0.1. (e) Proposed filtering technique. (f) Original depth map (unprocessed).

TABLE IV AREA% OF DISTORTED DEPTH PERCEPTION
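As a companion to the filter sketch in Section V-A (our own illustration, not the authors' code), the distorted-area percentage reported in Table IV can be computed directly from (13); any per-depth-level threshold lookup, such as the phi_for() placeholder defined earlier, can be passed in.

import numpy as np

def distorted_area_percent(original: np.ndarray, processed: np.ndarray, phi) -> float:
    """Percentage of pixels whose depth change exceeds the JNDD threshold, per (13).

    `phi` is a per-depth-level threshold lookup, e.g. the phi_for() helper
    from the filter sketch in Section V-A.
    """
    orig = original.astype(int)
    thresholds = np.vectorize(phi)(orig)
    distorted = np.abs(orig - processed.astype(int)) > thresholds
    return 100.0 * float(distorted.mean())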
The four points correspond to the Quantization Parameter (QP) values 10, 20, 30, and 40 for Predictive (P) frames. Intra (I) frames are quantized with a QP of 10 for all points. For each sequence, the proposed filtering technique is compared against the unprocessed depth maps as well as against the bilateral filter at two distinct edge thresholds.

The depth maps of the "Interview" and "Orbi" sequences are captured with a depth-range camera [1]. There is a significant amount of high frequency texture in these depth maps. These high frequencies usually correspond to either perceptually unnoticeable depth differences or optical noise [5] that is inherent to such depth-range capture technologies. The proposed spatial filter is suitable for preprocessing this kind of depth map, which contains significant amounts of optical noise as well as unperceivable depth variations.
For the "Interview" sequence, an average of 62% (calculated according to [16]) of the depth map coding rate is saved at high bit rates. Similarly, an average depth map coding bit rate saving of 79% is observed for the "Orbi" sequence compared to the unprocessed depth maps. Fig. 13(a) and (b) provide visual examples of the original and preprocessed depth maps of the "Orbi" sequence.

The depth maps for "Ballet" and "Breakdancers" are estimated with computer vision techniques [15] that take into account the relative disparity between texture viewpoints captured with multi-view camera arrays. These depth maps consist of piecewise smooth frames and contain less optical noise, compared to the depth maps captured with depth-range cameras. Average bit rate savings of 22%-24% are observed for these video sequences by utilizing the preprocessed depth maps.
Fig. 12. Rate-distortion analysis for unprocessed (original) and preprocessed depth maps. (a) “Ballet” sequence. (b) “Breakdancers” sequence. (c) “Interview” sequence. (d) “Orbi” sequence.
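The average savings quoted above are computed with the Bjontegaard measure [16]. A common implementation of that measure is sketched below (our own sketch, not code from the paper); it fits third-order polynomials to the rate-distortion points in the log-rate domain and integrates the gap over the overlapping PSNR range. A negative result indicates a bit rate saving of the test configuration (here, the preprocessed depth maps) relative to the anchor (the unprocessed ones).

import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Bjontegaard average bit rate difference (%) of the test curve vs the anchor."""
    la, lt = np.log10(rate_anchor), np.log10(rate_test)
    pa, pt = np.polyfit(psnr_anchor, la, 3), np.polyfit(psnr_test, lt, 3)
    lo = max(min(psnr_anchor), min(psnr_test))   # overlapping PSNR interval
    hi = min(max(psnr_anchor), max(psnr_test))
    ia, it = np.polyint(pa), np.polyint(pt)
    avg_a = (np.polyval(ia, hi) - np.polyval(ia, lo)) / (hi - lo)
    avg_t = (np.polyval(it, hi) - np.polyval(it, lo)) / (hi - lo)
    return (10 ** (avg_t - avg_a) - 1) * 100.0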
Fig. 13(c) and (d) provide visual examples of the original and preprocessed depth maps of the "Ballet" sequence. Other than the optical noise, the difference in the bit rate gains between the Microsoft sequences and the ATTEST sequences can also be explained by the difference in the quantization of the depth maps from floating point real values (as captured by the depth estimation mechanism) into 8-bit values. The Microsoft sequences are piecewise smooth images, which are uniformly quantized. The ATTEST sequences, on the other hand, are non-uniformly quantized and therefore have more spatial detail at the near end. This leaves more depth details that are not visible to humans at the near end. Therefore, the preprocessing filter has more effect on such depth maps, since they contain more visually unperceivable depth details that can be suppressed.

As illustrated in Fig. 12, the rate-distortion performance of the bilateral filter improves with increasing edge threshold (Sigma). However, as discussed in the previous subsection, the increase in the edge threshold leads to distortions in depth perception in certain regions, because humans have different levels of sensitivity at different simulated depth levels. Therefore, it is not acceptable to opt for bilateral filters with high edge thresholds considering only their rate-distortion performance. Thus, it is justified to adaptively select the edge threshold, as in the proposed technique, to maximize the R-D performance without affecting the depth perception in 3-D video.
E. Visual Analysis of Generated Stereoscopic Views

The depth maps are used to generate stereoscopic views in 3-D video applications. To analyze the quality of the stereoscopic view generation, an uncompressed color texture image is rendered with the corresponding depth map (all depth maps compressed at the same QP value), according to the MPEG informative recommendations [2]. The virtual view generation process is shown in Fig. 14. In this process, the original image point at location (x_c, y) is transferred to new locations (x_l, y) and (x_r, y) for the left and right views, respectively. This process is defined in (14)-(16), where p is the pixel parallax and x_B is the distance between the left and right virtual cameras, or the eye separation, which is assumed to be 6 cm. D is the viewing distance (250 cm) and z denotes the depth value of each pixel in the reference view. k_near and k_far specify the range of the depth information, respectively, behind and in front of the picture, relative to the screen width.
Fig. 13. Examples of depth maps preprocessed with the proposed method. Note that the high frequency texture areas of the original depth maps are smoothed out in the preprocessed ones. (a) "Orbi" original unprocessed depth map. (b) "Orbi" preprocessed depth map. (c) "Ballet" original unprocessed depth map. (d) "Ballet" preprocessed depth map.
Fig. 14. Virtual view generation in DIBR process.
N_pix is the screen width measured in pixels. The virtual cameras are selected so that the epipolar lines are horizontal, and thus the vertical coordinate is constant. Equation (16) is in accordance with the MPEG informative recommendations in [2]. The dis-occluded regions (visual holes) are filled by the background pixel extrapolation method described in [1].

TABLE V EXPERIMENTAL CONDITIONS FOR R-D ANALYSIS

Visual examples are provided for two of the sequences in Figs. 15-19. These figures correspond to either the left eye view or the right eye view, as displayed on a stereoscopic display. The rendering consistency is improved by eliminating optical noise using a bilateral filter, as shown in Fig. 16(a) and (b). Further, the bit rate required to transmit the depth maps can also be reduced by using such a de-noising filter. However, the level of rendering consistency and the bit rate gain depend upon the edge threshold, as illustrated in Fig. 16(a) and (b). The proposed technique adaptively selects the edge threshold to maximize the bit rate gain and the rendering consistency, while not affecting the perceived depth in the 3-D video scene. Thus, the proposed preprocessing technique reduces the rendering artefacts originating especially from the optical noise that is inherent in depth maps, while reducing the bit rate required for encoding the depth maps.
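To make the rendering step concrete, the sketch below (our own illustration, not the authors' renderer) warps a texture into a left or right eye view given a precomputed per-pixel parallax map; the parallax itself would be computed from (14)-(16) following [2], which is not reproduced here. The symmetric half-parallax split between the two views and the simple left-to-right hole filling are assumptions standing in for the shift-sensor setup and the background pixel extrapolation of [1].

import numpy as np

def render_view(texture: np.ndarray, parallax: np.ndarray, sign: int) -> np.ndarray:
    """Shift each pixel horizontally by sign * parallax / 2 and fill holes."""
    h, w = texture.shape[:2]
    view = np.zeros_like(texture)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xt = int(round(x + sign * parallax[y, x] / 2.0))
            if 0 <= xt < w:
                view[y, xt] = texture[y, x]
                filled[y, xt] = True
        # crude background pixel extrapolation: copy the last filled sample into holes
        last = texture[y, 0]
        for xt in range(w):
            if filled[y, xt]:
                last = view[y, xt]
            else:
                view[y, xt] = last
    return view

def render_stereo_pair(texture, parallax):
    """Left and right eye views for display on a stereoscopic screen."""
    return render_view(texture, parallax, +1), render_view(texture, parallax, -1)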
F. Subjective Analysis of 3-D Video

A subjective evaluation is carried out to assess the suitability of the proposed preprocessing algorithm. The evaluation is carried out according to the Double Stimulus Continuous Quality Scale (DSCQS) method, as described in ITU-R Recommendation BT.500-11 [17]. However, a few changes are made to the methodology for a better adaptation to 3-D evaluation. The experimental conditions for this analysis are listed in Table VI. The reference signal is the 3-D scene rendered with the original, i.e., unprocessed, depth maps. Subjects are asked to rate the image quality, as well as the perceived depth, of the 3-D scene rendered with the preprocessed depth maps, with respect to the reference signal. The subjects are asked to rate the 3-D sequences on the scale shown in Fig. 20.

In the subjective evaluations, 18 subjects are used, consisting of 6 expert and 12 non-expert viewers. This is in accordance with ITU-R Recommendation BT.1663 [18], which recommends conducting the experiments with at least five expert subjects.
Fig. 15. "Interview" left eye view rendered with the unprocessed depth map compressed at 691 kbits. The dotted rendering artefacts are due to the optical noise in the original depth map.
Fig. 16. (a) "Interview" left eye view rendered with the depth map preprocessed by the bilateral filter (Sigma = 0.02), compressed at 507 kbits. (b) "Interview" left eye view rendered with the depth map preprocessed by the bilateral filter (Sigma = 0.1), compressed at 397 kbits.
This recommendation lists certain advantages of using expert subjects, such as a reduction in the duration of the experiments and the better identification of certain salient differences and artefacts.

According to the results in Table VII, it is clear that neither the image quality nor the depth perception of the 3-D scenes is affected by the proposed preprocessing algorithm. The "Ballet" sequence originally contains significant amounts of temporal inconsistencies in the depth maps, which cause flickering.
The preprocessing algorithm was able to reduce some of these artefacts, owing to the smoothing effect of the filter. This is the reason for the small increase in the subjective image quality with the utilization of the proposed preprocessing algorithm.

G. View Synthesis Results

To illustrate the performance of the proposed filter in the view-synthesis-based free-viewpoint video scenario, the multi-view test videos Akko&Kayo (provided by Nagoya University, Japan) and News (provided by GIST, Korea) are utilized. Two viewpoints are used for the coding experiments and for the rate-distortion performance evaluation.
Fig. 17. "Interview" left eye view rendered with the preprocessed depth map compressed at 410 kbits. The dotted rendering artefacts are reduced due to the preprocessing of the original depth map.
Fig. 18. "Orbi" left eye view rendered with the unprocessed depth map compressed at 947 kbits. The dotted artefacts arise due to optical noise that exists in the original depth map.
Fig. 19. "Orbi" left eye view rendered with the preprocessed depth map compressed at 429 kbits. The dotted artefacts are significantly reduced, due to the removal of the optical noise in the original depth map by the preprocessing operation.
Fig. 20. Scale for subjective evaluation.
JM version 15.1 is used with the same encoder configurations as in Section V-D for this experiment. For evaluating the depth map coding performance, the quality of the rendered virtual viewpoint in between the two real camera viewpoints is considered.
The peak-signal-to-perceived-noise ratio (PSPNR) [19], as adopted by the MPEG FTV group, as well as the classic PSNR, is utilized to measure the quality of the synthesized views. It is clear from the comparison of the performance curves of the two schemes (with and without the proposed preprocessing method), given in Table VIII, that the proposed preprocessing scheme can achieve the same view synthesis quality at a lower bit rate for encoding the depth map. Both the PSPNR-based and the PSNR-based comparisons show similar gains.

The performance of the symmetric and asymmetric Gaussian filters is not acceptable for view synthesis, even though the bit rate required to encode such depth maps is minimal.
TABLE VI SUBJECTIVE EVALUATION CONDITIONS
TABLE VII RESULTS OF THE SUBJECTIVE EXPERIMENTS
Fig. 21. Illustration of various stereoscopic display types. (a) Reference system. (b) Viewing distance and depth perception is doubled. (c) Depth perception is doubled at the same viewing distance.
Non-adaptive bilateral filtering with a high edge threshold appears to achieve the same view synthesis quality at a lower bit rate than the proposed method. However, such filtering is suitable only in scenarios such as free-viewpoint video (FVV) that do not provide 3-D depth perception. Therefore, for application scenarios that require depth perception as well as arbitrary viewpoint synthesis, the proposed adaptive filter is more suitable than the compared preprocessing techniques.

VI. APPLICABILITY OF THE JNDD MODEL TO DIFFERENT TYPES OF STEREOSCOPIC DISPLAYS

The JNDD model that we derived in Section IV is based on a flat panel auto-stereoscopic display. However, this model may be extended to a variety of stereoscopic display types.
In this section, we discuss how the model can be applied to different types of stereoscopic displays. Stereoscopic displays may vary from each other in terms of how the rendering is done. Fig. 21 illustrates three different types of variations [2]. For the purpose of illustration, we assume that Fig. 21(a) represents the type of display that is used for the JNDD model derivation in Section IV. On a stereoscopic display, there are two pixels that represent each point in 3-D space; these are denoted as p_l and p_r in Fig. 21. The distance between p_l and p_r, which is referred to as the pixel parallax and denoted in Fig. 21 as p, is proportional to the depth of a space point as perceived by a viewer.
TABLE VIII VIEW SYNTHESIS QUALITY FOR DEPTH MAPS CODED AT QP = 20

The two eyes of the subject are assumed to be separated by the eye separation distance, as depicted in Fig. 21.

Fig. 21(b) illustrates how the perceived depth can be doubled in a stereoscopic display when the viewing distance is doubled. In these types of displays, for the same pixel parallax as in the reference system, users perceive a much greater depth. However, since the pixel parallax does not change, the disparity at the eyes does not change either. Therefore, for the JNDD model for this type of display, we can expect a minimum at the screen level and slopes that are parallel to those of the reference system on either side of the minimum. However, the unnoticed depth level difference at the minimum (screen level) is higher than in the reference. This is because, when the viewing distance is greater than that of the reference, according to JNDD_R, the unnoticed depth level difference is higher at the screen level. The expected JNDD behavior for this type of display is illustrated as Display type 1 in Fig. 22.

Fig. 21(c) illustrates a scenario in which the perceived depth is doubled at the same viewing distance as the reference. Depth perception is doubled by increasing the pixel parallax. As the pixel parallax increases, the angular disparity at both eyes increases as well. This causes the slopes on both sides of the minimum, at the screen level, to be steeper than for the reference system. However, since the viewing distance does not change, the unnoticed depth level difference at the minimum (at screen level) should be almost the same as that of the reference display system. The expected JNDD behavior for this type of display is illustrated as Display type 2 in Fig. 22.

According to the discussion above, the JNDD model will vary depending on the stereoscopic display type. The preprocessing filter parameters need to be adapted to suit these varying display types. In particular, the Φ values for the bins given in (12) need to be changed depending on the JNDD model for the particular display. Accordingly, for the display types illustrated in Fig. 21(b) and (c), the depth maps can be smoothed more than for the reference type, without affecting the viewing quality. However, when the depth maps are preprocessed for a display of type 1 or 2, they are not suitable for a display of the reference type. Thus, the JNDD for a particular display type provides an upper bound for the tolerable depth pixel value differences that may be caused by the preprocessing operation.

Fig. 22. Expected JNDD models for different kinds of stereoscopic displays shown in Fig. 21.
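As a brief follow-on to the filter sketch in Section V (our own illustration, with hypothetical numbers), adapting the preprocessing to one of these display types amounts to supplying a threshold table measured for that display:

# Hypothetical per-bin thresholds for a "Display type 1" JNDD curve (Fig. 22);
# the filter routine (jndd_filter above) is reused unchanged.
PHI_BINS_TYPE1 = [(0, 64, 12), (64, 128, 9), (128, 192, 9), (192, 256, 12)]

def phi_type1(level: int) -> int:
    for low, high, value in PHI_BINS_TYPE1:
        if low <= level < high:
            return value
    return PHI_BINS_TYPE1[-1][2]

# preprocessed = jndd_filter(depth_map, phi=phi_type1)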
VII. CONCLUSION AND FUTURE WORK

This paper has investigated the sensitivity of the human vision system to depth changes in a 3-D video scene shown on an auto-stereoscopic display. In general, viewers' sensitivity to depth changes varies at different depth levels (depth ranges), and depth changes are perceived most readily at the screen's depth level, where the binocular disparity is zero. Viewers react almost identically to depth changes that occur at depth levels behind and in front of the screen's depth level (the zero disparity depth level). While depth maps may be preprocessed for different requirements, the objective of this paper is to preprocess them to reduce the bandwidth required for their transmission. Based on the JNDD model derived, a depth adaptive preprocessing filter is designed to suppress unperceivable depth details in a depth map. Certain high frequency texture areas that correspond to either unperceivable depth details or optical noise in the depth maps are smoothed by the proposed preprocessing filter, while the object edges are preserved.

The proposed filter, which is in essence an edge preserving bilateral filter with an adaptive edge threshold, is compared against non-adaptive bilateral filtering as well as several other methods of preprocessing depth maps. It is seen that preprocessing methods that were primarily aimed at reducing dis-occluded regions (visual holes) in DIBR applications cause distortions to the perception of depth in such scenes.
Thus, it is important to fill the dis-occluded regions in ways that do not affect the perception of depth. Bilateral filtering is suitable for preprocessing depth maps such that the rendering consistency is improved and the bit rate required for the transmission of depth maps is reduced. However, the performance of bilateral filtering depends solely on the smoothing strength, or edge threshold, that is used: at a low edge threshold it does not yield much bit rate reduction, and at high thresholds it affects the perception of depth. Therefore, it is important to adaptively change the edge threshold based on the depth, and the proposed filtering method adaptively selects the edge threshold based on the JNDD model.

Results of the conducted subjective tests suggest that the proposed depth map preprocessing technique preserves the originally perceived depth and the visual quality of 3-D video scenes. The rendering artefacts that arise due to inaccuracies in the depth map are reduced, and the arbitrary view synthesis quality, as applicable to free-viewpoint video, is also improved with the preprocessed depth maps. Most importantly, the bit rate for depth map coding can be reduced by 24% to 78%, depending on the content, compared to the coding rate of the original (unprocessed) depth maps. Thus, the proposed filtering method reduces the bit rate required to transmit depth maps by preprocessing them in such a way that neither the depth perception nor the arbitrary view synthesis quality is affected.

Future work in this area will consider the extension of the JNDD_S model by taking into account the temporal sensitivity to depth changes. Furthermore, the application of the proposed preprocessing algorithm in the temporal dimension, in addition to the filtering in the spatial dimensions, will be investigated, by considering the differences in the sensitivity to depth changes observed for objects moving forward and backward.
ACKNOWLEDGMENT The authors would like to thank L. Yasakethu for the support given in creating the synthetic sequences that were used in the JNDD modeling experiment.
REFERENCES [1] C. Fehn, “Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3-D-TV,” Proc. SPIE, vol. 5291, p. 93, 2004. [2] ISO/IEC JTC 1/SC 29/WG 11, Committee Draft of ISO/IEC 23002-3 Auxiliary Video Data Representations. WG 11 Doc. N8038 Apr. 2006. [3] C. Hewage, S. Worrall, S. Dogan, and A. Kondoz, “Prediction of stereoscopic video quality using objective quality models of 2-D video,” Electron. Lett., vol. 44, pp. 963–965, 2008. [4] C. Hewage, S. Worrall, S. Dogan, S. Villette, and A. Kondoz, “Quality evaluation of color plus depth map-based stereoscopic video,” IEEE J. Sel. Topics Signal Process., vol. 3, no. 2, pp. 304–318, Apr. 2009. [5] S.-Y. Kim, E.-K. Lee, and Y.-S. Ho, “Generation of ROI enhanced depth maps using stereoscopic cameras and a depth camera,” IEEE Trans. Broadcasting, vol. 54, no. 4, pp. 732–740, Dec. 2008.
[6] J. Cutting and P. Vishton, “Perceiving layout and knowing distances: The interaction, relative potency, and contextual use of different information about depth,” in Perception of Space and Motion, W. Epstein and S. Rogers, Eds. New York: Academic, 1995, pp. 69–117. [7] L. Zhang and W. Tam, “Stereoscopic image generation based on depth images for 3-D TV,” IEEE Trans. Broadcasting, vol. 51, no. 2, pp. 191–199, Jun. 2005. [8] I. Daribo, C. Tillier, and B. Pesquet-Popescu, “Distance dependent depth filtering in 3-D warping for 3-DTV,” in Proc. IEEE 9th Workshop Multimedia Signal Process. MMSP 2007, 2007, pp. 312–315. [9] E. Ekmekcioglu, M. Mrak, S. Worrall, and A. M. Kondoz, “Utilisation of edge adaptive up-sampling in compression of depth map videos for enhanced free-viewpoint rendering,” in Proc. IEEE Int. Conf. Image Process., Cairo, Egypt, Nov. 2009, pp. 733–736. [10] O. Gangwal and R. Berretty, “Depth map post-processing for 3-DTV,” in Dig. Technical Papers Int. Conf. Consumer Electron., ICCE’09, 2009, pp. 1–2. [11] E. Ekmekcioglu, V. Velisavljevic, and S. Worrall, “Efficient edge, motion and depth range adaptive processing for enhancement of multiview depth map sequences,” in Proc. IEEE Int. Conf. Image Process., Cairo, Egypt, Nov. 2009, vol. ICIP 2009, pp. 3537–3540. [12] V. Bruce, P. R. Green, and M. A. Georgeson, Visual Perception. London, U.K.: Psychology Press, 2003. [13] “Weber’s Law” (Psychology)—Britannica Online Encyclopedia. [14] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proc. IEEE Int. Conf. Comput. Vis., Washington, DC, 1998, pp. 839–846. [15] C. L. Zitnick, S. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High-quality video view interpolation using a layered representation,” ACM SIGGRAPH and ACM Trans. Graphics, pp. 600–608, Aug. 2004. [16] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” in Proc. VCEG Meeting, Austin, TX, Apr. 2001. [17] Methodology for the Subjective Assessment of the Quality of Television Pictures, ITU-R Recommendation BT.500-11, 2002. [18] Expert Viewing Methods to Assess the Quality of Systems for the Digital Display of Large Screen Digital Imagery in Theatres, ITU-R Recommendation BT.1663, 2003. [19] Y. Zhao and L. Yu, Perceptual measurement for evaluating quality of view synthesis Apr. 2009, ISO/IEC JTC1/SC29/WG11/M16407.
Varuna De Silva received the B.Sc. engineering degree (first class honors) in electronic and telecommunications engineering from the University of Moratuwa, Moratuwa, Sri Lanka, in 2007. He is currently pursuing the Ph.D. degree at the I-Lab, Multimedia Communications Research Group, University of Surrey, Guildford, U.K. After graduating, he was an Engineer in the mobile communications industry. Currently, he is working on 3-D and multiview video processing, coding, and transmission-related issues. He is a contributor to the MUSCADE Integrating Project funded under the European Commission ICT 7th Framework Program. Mr. De Silva was awarded the Overseas Research Scholarships Award (ORSAS) by the Higher Education Funding Council of England to pursue the Ph.D. degree at the University of Surrey.
Erhan Ekmekcioglu received the B.Sc. degree in electrical and electronics engineering from Middle East Technical University, Ankara, Turkey, in 2006 and the Ph.D. degree from University of Surrey, Guildford, UK, in 2010. Currently, he is a Research Fellow in the I-Lab, Multimedia Communications Group, Center for Vision, Speech, and Signal Processing (CVSSP), University of Surrey, Guildford, U.K. His research interests include 3-D/multi-view video and depth map coding, scalable video coding, free-viewpoint TV (FTV), and 3-D video quality assessment. He has a number of publications in the related areas.
W. A. C. Fernando received the B.Sc. Engineering degree (first class) in electronic and telecommunications engineering from the University of Moratuwa, Moratuwa, Sri Lanka, in 1995, the M.Eng. degree (with distinction) in telecommunications from the Asian Institute of Technology (AIT), Bangkok, Thailand, in 1997, and the Ph.D. degree from the Department of Electrical and Electronic Engineering, University of Bristol, Bristol, U.K., in February 2001. Currently, he is a Senior Lecturer in signal processing at the University of Surrey, Guildford, U.K. Prior to that, he was a Senior Lecturer at Brunel University, U.K., and an Assistant Professor at AIT. His current research interests include 3-D video coding, video quality assessment, quality of experience, distributed video coding (DVC), intelligent video encoding for wireless communications, channel coding, and modulation schemes for wireless channels. He has published more than 200 international papers in these areas. Dr. Fernando is a fellow of the HEA, U.K. He is also a member of the EPSRC College.
S. Worrall received the M.Eng. electronic engineering degree from the University of Nottingham, Nottingham, U.K., in 1998 and the Ph.D. degree from the Centre for Communication Systems Research (CCSR), University of Surrey, Guildford, U.K., in 2001. He joined the Centre for Communication Systems Research (CCSR) in 1998 as a research student. From 2001 to 2003, he continued to work within CCSR as a Research Fellow and was appointed as a Lecturer in multimedia communications in 2003. His research interests include error robust video coding, multiple description coding, video compression, transmission of video over wireless networks, and multiview video coding.