Disparity Estimation and Disparity-coherent Watermarking

Hasan Sheikh Faridul, Gwenaël Doërr, and Séverine Baudry
Technicolor R&D France – Security & Content Protection Lab

ABSTRACT

In the context of stereo video, disparity-coherent watermarking has been introduced to provide superior robustness against virtual view synthesis, as well as to improve perceived fidelity. Still, a number of practical considerations have been overlooked, in particular the role of the underlying depth estimation tool on performances. In this article, we explore the interplay between various stereo video processing primitives and highlight a few takeaway lessons that should be accounted for to improve the performances of future disparity-coherent watermarking systems. In particular, we highlight how correspondences lost during the stereo warping process impact watermark detection, thereby calling for innovative designs.

Keywords: Stereo video, depth, disparity, disparity-coherent, watermarking, view synthesis
1. INTRODUCTION

Traitor tracing consists of inserting forensic watermarks encoding a unique identifier within entertainment content.1 As a result, when a pirate copy is found on some unauthorized distribution platform, it is possible to extract the embedded identifier from the pirated content and trace back the source of the leak. In this context, stereo video content raises unique technical challenges due to the high correlation between the left and right views. Prior works indeed suggest that the watermarks embedded in the two views should be coherent with respect to the disparity information.2–5 Disparity coherence refers to the fact that a physical point of the captured scene should always carry the same watermark sample regardless of where it appears in the left/right views. The advantage of producing disparity-coherent watermarks is twofold. First, it produces pairs of stereoscopic views that are more in line with what would naturally occur in reality and thereby causes less visual discomfort, e.g. headaches. Second, and perhaps more importantly, disparity-coherent watermarks are expected to exhibit superior robustness against view synthesis. View synthesis consists in generating a virtual view in-between available views, e.g. the left and right views in stereo video. This mechanism can be used to adjust the depth rendering range of a stereoscopic display or to generate the many views required to feed an auto-stereoscopic display.
1.1 Disparity-coherent Stereo Video Watermarking

Existing watermarking proposals to achieve disparity coherence can be classified according to the underlying representation of stereo video content, namely depth image-based rendering (DIBR)3,4 vs. multi-view.2,5 In DIBR, stereo video is represented by a central 2D view and a depth map that associates, to each pixel, the depth of the corresponding physical point in the captured scene. At rendering time, it is possible to generate the left and right views that are shown to the observer to provide depth perception, using standard depth-based view synthesis mechanisms.6 A straightforward idea is then to watermark the central view with any legacy watermarking system so that the embedded watermark is naturally exported to the left and right views during view synthesis, thereby producing the expected disparity-coherent watermarks.3 In a variant, two other inverse-warped versions of the reference watermark pattern, one associated to the left view and the other to the right view, are also embedded in the central view. As a result, at the receiver side, the detector can look directly for the reference watermark pattern in the left and right views without caring for any warping distortion.4

On-going research suggests that the DIBR representation has the potential to provide a better compression rate-distortion trade-off than the conventional multi-view left-plus-right representation. Nevertheless, DIBR remains hardly deployed nowadays. Most stereo video content is still delivered as a combination of two views (left and right) today, possibly encoded with dedicated video codecs such as MVC. This is typically the case with Blu-ray
3D and 3D broadcast, for instance. In other words, DIBR-based watermarking techniques do not address the current need of the market.

When stereo video content is represented as a couple of 2D views, the camera geometry needs to be properly accounted for to transport a watermark pattern embedded in one of the views to the other one, e.g. from the left to the right, in a disparity-coherent way.2 In a nutshell, disparity coherence is achieved through a ray tracing mechanism that connects a pixel from the reference view to the physical scene and back to the view to be generated. This requires access to the intrinsic and extrinsic parameters of the cameras as well as the depth information of the scene. Since such side information may not always be available, a fall-back solution is to rely on conventional disparity estimation techniques to obtain the information necessary to perform this warping operation.5 In other words, a reference watermark is first embedded in one of the two views, say the left one, and then warped to the second view using the estimated disparity information.
1.2 Dealing with Imperfect Disparity Estimation

The flip side of relying on disparity estimation is that these tools are, by nature, imperfect. In some cases, it is not possible to reliably assign a disparity value to a pixel, e.g. in occluded areas that only appear in one of the views or when the estimation confidence is very low. Another aspect is that disparity can be estimated either from the left to the right or from the right to the left. Even with some regularization constraints, such paired estimations routinely feature inconsistencies, i.e. the round trip of a pixel, from one view to the other and back, does not bring it to its original position.

The objective of this study is to investigate how such lost correspondences may impact the performances of a disparity-coherent watermarking system. After reviewing a baseline watermarking framework in Section 2, we first analyze the watermark detection performances against view synthesis and showcase a correlation with some properties of the disparity maps using two estimation tools in Section 3. We also survey how the properties of the estimated disparity map affect robustness after lossy compression. Keeping in mind that disparity estimation will actually be performed both by the watermarker and the attacker, we review in Section 4 the impact of using matching or mismatching disparity estimation tools on detection performances. Finally, we summarize our findings in Section 5 and list a few takeaway lessons.
2. BASELINE DISPARITY-COHERENT VIDEO WATERMARKING SYSTEM

For the sake of simplicity, we will use throughout this article a system that embeds additive spread-spectrum watermarks in a disparity-coherent way, with a blind detection procedure that relies on the computation of a horizontal cross-correlation array.5 On the emitter side, a reference watermark pattern w_L is first embedded in the left view:

$$v_L^{(w)} = v_L + \alpha \cdot w_L, \qquad w_L \sim \mathcal{N}(0, 1), \tag{1}$$

where the superscript (w) indicates watermarked quantities, the subscript L (resp. R) denotes quantities related to the left (resp. right) view, α > 0 is the embedding strength, and w_L is normally distributed with zero mean and unit variance. Next, the reference watermark is warped from the left view to the right view:

$$w_R = \mathrm{warp}(w_L, d_L, \theta_L, \theta_R), \tag{2}$$
where the warp(·) operator takes as input the view/image to be warped (in this case the reference watermark pattern w_L), the depth map d_L associated to it, and the source and destination camera parameters θ_L and θ_R (intrinsic and extrinsic). Since the depth map of the left view is usually not readily available, it needs to be estimated through an evaluation of the disparity between the left and right views. Indeed, for rectified stereo video content, disparity essentially reduces to a horizontal shift whose amplitude is inversely proportional to the depth associated to a pixel. Eventually, the warped watermark pattern is simply added to the right view:

$$v_R^{(w)} = v_R + \alpha \cdot w_R. \tag{3}$$
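To make the emitter side concrete, the following minimal numpy sketch illustrates Equations (1)-(3); it is our own illustration rather than the actual implementation. It assumes rectified views and a dense left disparity map; the disparity sign convention, the nearest-integer rounding, and the zero-filling of unmatched pixels are assumptions.

```python
import numpy as np

def embed_left(v_L, alpha=3.0, seed=0):
    """Additive spread-spectrum embedding in the left view, Eq. (1)."""
    rng = np.random.default_rng(seed)
    w_L = rng.standard_normal(v_L.shape)  # w_L ~ N(0, 1)
    return v_L + alpha * w_L, w_L

def warp_watermark(w_L, disp_L):
    """Disparity-driven warp of the watermark pattern, standing in for Eq. (2).

    For rectified stereo, the warp reduces to a per-pixel horizontal shift
    by the estimated disparity. Pixels that land outside the frame leave
    holes, i.e. unmarked zero samples; colliding sources simply overwrite
    each other in this crude sketch.
    """
    h, w = w_L.shape
    w_R = np.zeros_like(w_L)
    rows = np.broadcast_to(np.arange(h)[:, None], (h, w))
    cols = np.broadcast_to(np.arange(w)[None, :], (h, w))
    target = cols - np.rint(disp_L).astype(int)  # sign convention is an assumption
    ok = (target >= 0) & (target < w)
    w_R[rows[ok], target[ok]] = w_L[rows[ok], cols[ok]]
    return w_R

def embed_right(v_R, w_R, alpha=3.0):
    """Add the warped watermark to the right view, Eq. (3)."""
    return v_R + alpha * w_R
```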
In practice, we used an embedding strength α = 3 in our experiments to keep the embedding distortion imperceptible. It should be noted that this baseline watermarking framework could be enriched with conventional add-ons well known to those skilled in the art, e.g. perceptually modulating the embedding strength to better accommodate the human visual system or canceling host interference for improved detection statistics. This being said, in this article, we keep this baseline watermarking framework to its crudest incarnation for simplicity.

On the receiver side, the detection engine first computes the horizontal cross-correlation between an input view v and the reference watermark pattern w_L:

$$\rho[o] = \mathrm{lc}\left(v, \mathrm{shift}(w_L, o)\right), \tag{4}$$

where lc(·) is the linear correlation operator and shift(·) horizontally shifts an input array by a specified offset o. Previous work showed that this strategy is helpful to pick up watermark energy that may have been scattered by the warping process, e.g. during view synthesis.5 Moreover, empirical observations have clearly highlighted the benefit of high-pass filtering the tested view v to improve detection statistics. The role of this pre-filtering operation is to cancel out undesired host interference, in particular the low-frequency component naturally present in the correlation array ρ because of the horizontal correlation of visual content. In the ideal non-blind detection case, one could simply subtract the original view, v ← v − v_orig, prior to computing the correlation array. Alternately, in the blind detection scenario, a straightforward strategy consists in removing a low-pass version of the view, i.e. v ← v − L(v), where L(·) is a low-pass filter operator. In our experiments, we used an 11 × 11 Gaussian kernel with standard deviation 3. To aggregate scattered watermark energy, the detector then simply sums the components of the correlation array ρ that exceed a specified threshold τ_agg:

$$\rho = \sum_{o \in \mathcal{O}} |\rho[o]|, \qquad \mathcal{O} = \{o : |\rho[o]| > \tau_{agg}\}. \tag{5}$$
Eventually, to assess the presence or absence of the reference watermark pattern w_L in the input view v, the detector simply compares the aggregated correlation statistic ρ to a threshold τ_det. In contrast with watermarked content, original content is indeed expected to yield a correlation score ρ close to zero. In our experiments, empirical observations have shown that the settings τ_agg = 0.07 and τ_det = 0.25 yield acceptable watermark detection statistics.
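The receiver side can be sketched in the same spirit. The snippet below follows Equations (4)-(5) with the parameter values quoted above; the offset search range, the normalization of the linear correlation by the number of pixels, and the circular boundary handling of the shift are our assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def detect(v, w_L, tau_agg=0.07, tau_det=0.25, max_offset=64):
    """Blind detection via a horizontal cross-correlation array, Eqs. (4)-(5)."""
    # Blind high-pass pre-filter: subtract an 11x11 Gaussian low-pass version
    # of the view (sigma = 3; truncate = 5/3 yields a radius-5, 11-tap kernel).
    v = v.astype(float)
    v = v - gaussian_filter(v, sigma=3, truncate=5 / 3)
    n = v.size
    # Eq. (4): linear correlation against horizontally shifted copies of w_L.
    rho = np.array([np.sum(v * np.roll(w_L, o, axis=1)) / n
                    for o in range(-max_offset, max_offset + 1)])
    # Eq. (5): aggregate the components exceeding tau_agg.
    score = float(np.sum(np.abs(rho)[np.abs(rho) > tau_agg]))
    return score, score > tau_det
```

With α = 3 and a unit-variance pattern, a pristine watermarked view should yield a single correlation peak on the order of the embedding strength α, well above τ_det.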
3. WATERMARK DETECTION AFTER VIRTUAL VIEW SYNTHESIS

Apart from improved visual comfort, a key benefit of disparity-coherent watermarking compared to other constructions is the ability to detect the embedded watermark in virtual views synthesized in-between the original left and right views. In Subsection 3.1, we first dive into the inner mechanics of virtual view synthesis to reveal that the detection statistic ρ results from the fusion of two contributions originating from the left and right views. This fusion can possibly be affected by the quality of the estimated disparity maps. As a result, in Subsection 3.2, we compare two disparity estimation tools in an effort to highlight how characteristic properties of the disparity maps impact detection performances. Eventually, Subsection 3.3 assesses how much watermark detection performances are affected by lossy compression.
3.1 A Deeper Look into Virtual View Synthesis

MPEG's View Synthesis Reference Software (VSRS7) is a standard tool to synthesize views. It takes as input two views, v_L and v_R, their associated depth maps, d_L and d_R, and their camera parameters, θ_L and θ_R, as well as the parameters θ_S of the view to be synthesized. Internally, VSRS first warps the input left and right views to the desired camera position of the view to be synthesized using the provided geometric information:

$$v_S^{(L)} = \mathrm{warp}(v_L, d_L, \theta_L, \theta_S), \tag{6}$$

$$v_S^{(R)} = \mathrm{warp}(v_R, d_R, \theta_R, \theta_S). \tag{7}$$

The view synthesis engine then generates the final view v_S by fusing the two temporary views v_S^{(L)} and v_S^{(R)} together pixel by pixel.
Figure 1. Watermark detection statistics in synthesized views for the sequence Balloons (y-axis: correlation score ρ; x-axis: view synthesis parameter, from left to right). The view synthesis parameter defines the virtual camera position between the input views. The contributions from the left (v_S^{(L)}) and right (v_S^{(R)}) views have been isolated to better appreciate how the inner behavior of view synthesis translates into the watermark detection score.
The fusion process can be very crude, e.g. selecting the pixel value of one of the two views, or more elaborate, e.g. using some alpha-blending technique. Due to occlusions and various other effects, the synthesized view may still feature some holes that need to be filled using inpainting techniques.

Neglecting some details, virtual view synthesis can therefore be assimilated to the fusion of two contributions originating from the left and right views. To better appreciate how each contribution participates in watermark detection, we instrumented VSRS to output v_S^{(L)}, v_S^{(R)}, and v_S, and computed the detection statistic ρ defined in Equations (4)-(5) for each one of these views. As mentioned earlier, VSRS requires access to the depth maps of the left and right views, which may not be available. When needed, these depth maps have been estimated using MPEG's Depth Estimation Reference Software (DERS8).

As an example, Figure 1 depicts the recorded average detection statistics for the video sequence Balloons on the original left and right views as well as nine equidistant virtual views placed in-between. In line with previously reported experimental results, the baseline disparity-coherent video watermarking system showcases some ability to withstand view synthesis, as illustrated by the solid blue line with 'triangle' markers. The correlation score ρ gracefully degrades as the synthesized view moves further away from the left. This being said, the instrumentation of VSRS provides a finer understanding of what is going on.

The left view v_L contains the full-frame reference watermark pattern w_L and the corresponding correlation array ρ therefore features a Dirac impulse. When warping this view to some intermediary position to the right, the reference pattern is not shifted as a whole but is broken into pieces according to depth information. The richer the depth information in the scene, the more pieces are created. This partitioning of the watermark pattern translates into the Dirac impulse of the left view being distributed over several components of the correlation array ρ. The further away the synthesized view is from the left one, the more scattered the watermark energy. In addition, some parts of the watermark pattern are lost during the warping process, e.g. in occluded areas and border portions of the scene. This phenomenon is incarnated in Figure 1 by the solid red line (with 'star' markers) that captures how much of the watermark pattern is found in v_S^{(L)}. Correlation is gradually lost due to the appearance of parts of the frame not carrying watermark samples and to components ρ[o] that are discarded because they fall below the aggregation threshold τ_agg.

On the other hand, the right view v_R contains a watermark w_R, which is essentially a damaged version of w_L due to the warping process applied during disparity-coherent watermarking. This is the reason why the correlation score for the right view is lower than for the left one. The solid magenta line (with 'dot' markers) in Figure 1 illustrates the correlation detected in v_S^{(R)} for different camera positions and clearly indicates that the correlation score only marginally increases when warping the right view back to the left. Indeed, the parts of the reference watermark which have been lost during the initial warping from the left to the right cannot be recovered. The increase of correlation is due to the concentration of the watermark energy scattered during the first warping operation, thereby permitting components of the correlation array ρ to rise above the aggregation threshold and thus to be taken into account.
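The scattering effect described above can be reproduced with a toy experiment (our own illustration, not drawn from the paper's data): a frame with only two depth layers splits the single unit correlation peak into two smaller ones whose amplitudes reflect the area of each layer.

```python
import numpy as np

rng = np.random.default_rng(1)
h, w = 256, 512
w_L = rng.standard_normal((h, w))

# Toy scene with two depth layers: the background shifts by 4 pixels,
# a foreground band (rows 96:160) shifts by 12 pixels during warping.
warped = np.roll(w_L, 4, axis=1)
warped[96:160] = np.roll(w_L[96:160], 12, axis=1)

def corr(a, b):
    return float(np.sum(a * b) / a.size)

rho = [corr(warped, np.roll(w_L, o, axis=1)) for o in range(17)]
# The unit peak at offset 0 is now split between offsets 4 and 12, with
# amplitudes (~0.75 and ~0.25) proportional to the area of each layer.
print([f"{o}:{r:.2f}" for o, r in enumerate(rho) if abs(r) > 0.05])
```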
Figure 2. Depth maps estimated using DERS and MatchBox for a frame of the sequence Balloons: (a) video frame, (b) DERS, (c) MatchBox.
This being said, it is important to notice that the first warping operation (watermarking process) uses the depth map d_L whereas the second one (view synthesis process) uses d_R. The down-to-earth issue is that depth estimation is not perfect and the two depth maps therefore feature inconsistencies with respect to each other, thereby making the watermark energy consolidation imperfect on the way back to the left.

Eventually, VSRS combines the two temporary views v_S^{(L)} and v_S^{(R)} to produce the final synthesized view v_S. In this particular example, the fusion strategy is rather simple: the pixel value provided by the nearest reference view is kept when available. As a result, the actual correlation score computed from the synthesized view v_S alternates between the correlation profiles obtained for v_S^{(L)} and v_S^{(R)}, with a breaking point in the middle.
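A sketch of this nearest-view fusion rule is given below; the argument names and the boolean hole maps are our assumptions, and inpainting of pixels missing in both warps is omitted.

```python
import numpy as np

def fuse_nearest(v_S_L, v_S_R, hole_L, hole_R, s):
    """Crude pixel-wise fusion keeping the nearest reference view.

    s in [0, 1] locates the virtual camera (0 = left, 1 = right);
    hole_L / hole_R flag pixels the corresponding warp could not fill.
    Pixels missing in both warps would still require inpainting.
    """
    if s < 0.5:
        primary, fallback, hole = v_S_L, v_S_R, hole_L
    else:
        primary, fallback, hole = v_S_R, v_S_L, hole_R
    return np.where(hole, fallback, primary)
```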
3.2 Impact of Lost Correspondences

As mentioned earlier, correspondences lost during depth estimation are likely to hamper detection statistics of disparity-coherent watermarks. Lost correspondences may occur for a number of reasons:

1. some pixels are not visible in both the left and right views, e.g. at the border of the frames or because of occlusion;
2. depth cannot be estimated with enough confidence and therefore remains unknown for some pixels, e.g. in poorly textured areas;
3. the left-to-right and right-to-left depth estimations are not consistent, thereby resulting in pixels not going back to their original location after a warping round trip.

To better appreciate the impact of such lost correspondences on disparity-coherent watermark detection, we benchmarked our baseline watermarking system using two alternate disparity estimation tools, namely DERS8 and MatchBox.9 Those two tools adopt distinct depth estimation strategies and thereby produce quite different depth maps, as illustrated in Figure 2. DERS essentially solves a global optimization problem that incorporates a left-right consistency check and a temporal smoothing operator to avoid temporal flicker. The resulting depth maps therefore feature uniform regions whose boundaries are rather imprecise, as shown in Figure 2(b). In contrast, MatchBox is a local optimization method that is computationally efficient and better preserves the shape of the edges in the disparity map. However, the lack of global regularization may yield low-frequency depth noise, as can be seen in the background of Figure 2(c).

Using those two depth estimation techniques, we embedded a disparity-coherent watermark in the original left and right views of the video. We then recorded the correlation score ρ in the central view, i.e. the synthesized view located exactly in the middle between the left and right views, as well as the percentage of pixels for which we detected a loss of correspondence. This experimental protocol has been applied to six sequences of the MPEG-FTV dataset10–13 (Balloons, Kendo, Lovebird1, Newspaper, Poznan Hall2, and Poznan Street) that offer a wide variety of configurations with respect to acquisition geometry, camera motion, scene complexity, etc. The results are summarized in Figure 3.
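The round-trip consistency check underlying reason 3 above can be sketched as follows; the disparity sign conventions, the one-pixel tolerance, and the folding-in of out-of-frame pixels are our assumptions.

```python
import numpy as np

def lost_correspondences(disp_L, disp_R, tol=1.0):
    """Fraction of left-view pixels failing the warp round trip.

    disp_L maps left pixels to the right view and disp_R maps right
    pixels back to the left view; a pixel is consistent if the round
    trip returns within `tol` pixels of its starting position.
    """
    h, w = disp_L.shape
    cols = np.arange(w)[None, :]
    x_right = cols - disp_L                            # position in the right view
    out_of_frame = (x_right < 0) | (x_right > w - 1)   # reason 1 losses
    xr = np.clip(np.rint(x_right).astype(int), 0, w - 1)
    rows = np.broadcast_to(np.arange(h)[:, None], (h, w))
    x_back = xr + disp_R[rows, xr]                     # position after the round trip
    inconsistent = np.abs(x_back - cols) > tol         # reason 3 losses
    return float(np.mean(inconsistent | out_of_frame))
```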
Figure 3. Scatter-plot capturing the relationship between the amount of lost correspondences (x-axis, in %) and the recorded watermark detection score ρ (y-axis) for the central synthesized view, for both DERS and MatchBox. Each data point corresponds to a frame of one of the video sequences in the experimental dataset.
For both tools, the scatter-plot clearly highlights that more lost correspondences usually induce a lower watermark detection score. As a matter of fact, MatchBox appears to be more sensitive to lost correspondences than DERS. This being said, the many differences between the two estimation techniques make a comparison between them quite difficult. While MatchBox seems to produce fewer lost correspondences in general, DERS still provides better depth maps in some cases. Moreover, even when MatchBox yields fewer lost correspondences, this does not necessarily convert into higher watermark detection scores, as exemplified in the lower right corner of the scatter-plot.

In summary, while it is apparent that the quality of the estimated depth maps has an impact on watermark detection statistics, it remains marginal to some extent. Lost correspondences of types #1 and #2 directly translate into decreasing watermark detection scores. However, they are fairly equal for the two depth estimation tools and thereby barely affect the comparison. In contrast, DERS routinely yields more lost correspondences of type #3 compared to MatchBox. Inconsistent depth maps result in watermark samples coming back slightly off their original position, which roughly translates into low-pass filtering the correlation array ρ. Thanks to the aggregation mechanism and the natural concentration of the watermark energy in ρ, such smoothing is unlikely to result in major watermark detection loss except in some very specific corner cases.
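This compensation effect can again be illustrated with a toy experiment (our own sketch): a per-row round-trip jitter of ±1 pixel smears the correlation peak over three offsets, yet the aggregation of Equation (5) recovers nearly all of the original unit correlation.

```python
import numpy as np

rng = np.random.default_rng(2)
h, w = 256, 512
w_L = rng.standard_normal((h, w))

# Round-trip inconsistencies: each row comes back jittered by -1, 0, or +1 px.
jitter = rng.integers(-1, 2, size=h)
back = np.stack([np.roll(w_L[r], jitter[r]) for r in range(h)])

rho = np.array([np.sum(back * np.roll(w_L, o, axis=1)) / back.size
                for o in range(-3, 4)])
# The single peak is smeared over offsets -1..1 (~1/3 each), but the
# aggregated sum above tau_agg = 0.07 recovers close to 1.0.
print(np.round(rho, 2), round(float(np.sum(np.abs(rho)[np.abs(rho) > 0.07])), 2))
```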
3.3 Robustness to Multi-view Coding by MV-HEVC

While we previously established that better depth maps yield better watermark detection statistics, it is unclear whether this robustness advantage survives subsequent processing. To investigate this issue, we first embedded disparity-coherent watermarks using either DERS or MatchBox into the video sequences of the dataset, as described in Section 2. We then compressed the resulting watermarked stereo pairs using MPEG's reference codec Multi-view HEVC (MV-HEVC14,15) with various quantization parameter values, namely QP = {10, 13, 16, 19, 22}; a higher quantization parameter value indicates a stronger compression. Subsequently, we synthesized the central view using the left and right views obtained by decoding the MV-HEVC bit stream. Eventually, we ran the watermark detection algorithm and recorded the watermark detection scores.

To exemplify the impact of MV-HEVC lossy compression, we focused this study on the two video sequences that feature stark depth estimation differences. Depth maps for the sequence Poznan Hall 2 feature on average 22% lost correspondences using DERS compared to 10% using MatchBox. Conversely, DERS yields 4% lost correspondences on Lovebird1 compared to 9% with MatchBox. The recorded experimental results are illustrated in Figure 4. As could be expected, the detection score gradually decreases as lossy compression becomes stronger.
Figure 4. Average correlation score ρ recorded in the central view of the sequences Lovebird1 and Poznan Hall 2 after lossy compression of the stereo pair using MV-HEVC with different quantization parameter values (QP = 10, 13, 16, 19, 22), for watermarks embedded using either DERS or MatchBox.
Moreover, the competitive advantage granted by a better depth map is preserved across compression levels, even if the watermark detection gain decreases at higher compression levels. This being said, such a clear separation between the two depth estimation tools happens because one of them yields notably better depth maps for these video sequences. For video sequences where both tools are comparable with respect to depth estimation, the original watermark detection gain is usually much smaller and may quickly vanish, or even be reversed, after lossy compression.
4. MISMATCHING DEPTH ESTIMATION ALONG THE PIRACY WORKFLOW

Depth estimation is the cornerstone of processing stereo video pairs. On the one hand, the watermarker needs a depth map to warp the reference pattern from one view to the other in a disparity-coherent way, as given by Equation (2). On the other hand, the attacker needs a depth map to warp the original views to the location of a virtual camera and thereby synthesize virtual views. So far, in all experiments reported in this article, depth estimation has been performed only once, that is to say on the original stereo video pair, and the resulting depth map has been used both for disparity-coherent watermarking and virtual view synthesis. This hardly reflects a real-world scenario. First, the attacker would have to re-estimate the depth based on the watermarked stereo video pair. Second, she may use an alternate depth estimation tool compared to the one used by the watermarker.

To evaluate watermark detection performances in this more realistic scenario, we re-ran the embedding, view synthesis, and detection procedures using all the combinations of depth estimation with DERS and MatchBox. The detection results are summarized in Figure 5 for the video sequence Balloons with a box-plot that captures the distribution of the watermark detection scores. The central line in a box indicates the median value of the correlation scores whereas the edges of the box correspond respectively to the 25th and 75th percentiles; the whiskers extend to the most extreme detection scores not considered as outliers. For reference, Figure 5 also includes the watermark detection statistics when both disparity-coherent watermarking and virtual view synthesis use the same depth maps, estimated from the original stereo video pairs. The labels of the x-axis correspond to the depth estimation strategy, the depth estimation tools being represented by their initial. A single letter indicates that depth estimation has only been performed once on the original stereo video pair, whereas two letters indicate that disparity-coherent watermarking and virtual view synthesis use different depth maps. When two letters are used, the first one indicates which algorithm has been used for disparity-coherent watermarking and the second one which algorithm has been used for virtual view synthesis.

Quite counter-intuitively, Figure 5 clearly highlights that depth re-estimation by the attacker does not hamper watermark detection performances but actually improves them. This is due to the fact that the depth estimation process is now biased by the presence of a disparity-coherent watermark that has been embedded using d_L. This low-energy noise lifts some ambiguities during the depth estimation process and the pair of depth maps estimated to perform virtual view synthesis therefore features fewer disparity inconsistencies than the pair of depth maps estimated directly from the original content.
Figure 5. Impact of depth map re-estimation on disparity-coherent watermark detection performances for the video sequence Balloons (y-axis: correlation score ρ). The labels of the x-axis (D, DD, DM, MD, MM, M) correspond to the depth estimation strategy, using the initials of the depth estimation tools.
In a way, this is similar to the bias of motion estimation toward the null vector when the same watermark pattern is repeatedly embedded in the frames of a video sequence.16 This reduction of inconsistencies naturally translates into a marginal gain of the watermark detection score ρ.

One can also notice that, when the watermarker and the attacker use mismatching depth estimation tools, the estimation of the attacker has more influence on the detection score. For this particular video sequence, MatchBox indeed provides better depth maps and therefore yields higher detection scores when it is used for view synthesis. In contrast, even if the disparity-coherent watermark has been embedded using a depth map estimated with MatchBox, watermark detection statistics are negatively affected if the attacker uses DERS for view synthesis.
5. CONCLUSION

In this paper, we conducted an in-depth survey to assess how much disparity-coherent watermark detection statistics depend on the quality of depth estimation. By looking at the inner mechanics of virtual view synthesis, it indeed becomes apparent that the underlying working assumption of disparity-coherent watermarking is that the contributions from the left and right views to the correlation array ρ pile up nicely and thereby rise above the aggregation threshold τ_agg. However, practical depth estimation tools are imperfect and thus yield lost correspondences that deviate from this assumption. While the quality of depth estimation has been shown to affect watermark detection performances, the impact remains marginal. Indeed, depth estimation tools mostly differ with respect to the number of pixels whose left-to-right and right-to-left depth estimations are inconsistent. Such inconsistency does not directly translate into a loss of correlation but rather smooths the correlation array, which is somewhat compensated for by the aggregation mechanism of the detection procedure. Moreover, contrary to intuition, the fact that the attacker has to re-estimate the depth maps for virtual view synthesis actually boosts detection performances rather than hampering them.

In future work, we intend to investigate how to avoid the imbalance of the watermarking system towards the view hosting the reference watermark compared to the one carrying the warped version. We will also explore means to automatically adjust the aggregation threshold according to the statistics of the correlation array ρ in order to avoid conservative settings that discard watermark energy that could be aggregated in low-noise conditions. Moreover, we will conduct subjective quality tests to evaluate how imperceptible disparity-coherent watermarking is to the human visual system compared to other watermarking strategies.
ACKNOWLEDGMENTS

The authors want to thank Thierry Borel from Technicolor's 3D Excellence Center for fruitful discussions about real-life settings for stereo video content transmission/display. They also want to thank Valter Drazic from Technicolor Research & Innovation for providing, and helping install, his optimized disparity estimation tool MatchBox.9
REFERENCES

[1] Furon, T. and Doërr, G., "Tracing pirated content on the Internet: Unwinding Ariadne's thread," IEEE Security & Privacy 8, 69–71 (September/October 2010).
[2] Koz, A., Cigla, C., and Alatan, A. A., "Watermarking of free-view video," IEEE Transactions on Image Processing 19, 1785–1797 (June 2010).
[3] Halici, E. and Alatan, A. A., "Watermarking for depth-image-based rendering," in [Proceedings of the IEEE International Conference on Image Processing], 4217–4220 (November 2009).
[4] Lin, Y.-H. and Wu, J.-L., "A digital blind watermarking for depth-image-based rendering 3D images," IEEE Transactions on Broadcasting 57, 602–611 (June 2011).
[5] Burini, C., Baudry, S., and Doërr, G., "Blind detection for disparity-coherent stereo video watermarking," in [Media Watermarking, Security, and Forensics XVI], Proceedings of SPIE 9028 (February 2014).
[6] Fehn, C., "A 3D-TV approach using depth-image-based rendering (DIBR)," in [Proceedings of the IASTED Conference on Visualization, Imaging, and Image Processing], I, 482–487 (September 2003).
[7] MPEG, "View synthesis reference software (VSRS) – version 3.5." http://wg11.sc29.org/svn/repos/Explorations/FTV/vsrs/trunk/.
[8] MPEG, "Depth estimation reference software (DERS)." http://wg11.sc29.org/svn/repos/Explorations/FTV/ders/.
[9] Drazic, V. and Sabater, N., "A precise real-time stereo algorithm," in [Proceedings of the 27th Conference on Image and Vision Computing New Zealand], 138–143 (November 2012).
[10] ISO/IEC JTC1/SC29/WG11, "Call for proposals on 3D video coding technology." MPEG2011/N12036, http://3d-codec.multimedia.edu.pl/doc/w12036.pdf (March 2011).
[11] Ho, Y.-S., Lee, E.-K., and Lee, C., "Multiview video test sequence and camera parameters." ISO/IEC JTC1/SC29/WG11 MPEG2008/M15419 (April 2008).
[12] Domański, M., Grajek, T., Klimaszewski, K., Kurc, M., Stankiewicz, O., Stankowski, J., and Wegner, K., "Poznań multiview video test sequences and camera parameters." ISO/IEC JTC1/SC29/WG11 MPEG2009/M17050 (October 2009).
[13] Kang, Y.-S., Lee, E.-K., Jung, J.-I., Lee, J.-H., Shin, I.-Y., and Ho, Y.-S., "3D video test sequence and camera parameters." ISO/IEC JTC1/SC29/WG11 MPEG2009/16949 (October 2009).
[14] Chen, Y., Tech, G., Wegner, K., and Yea, S., "Test model 9 of 3D-HEVC and MV-HEVC." ISO/IEC JTC1/SC29/WG11 N14704 – JCT-3V and Video Subgroup (July 2014).
[15] ISO/IEC JTC1/SC29/WG11, "Reference implementation of 3D-HEVC and MV-HEVC by JCT-3V and Video Subgroup – version HTM12.0." https://hevc.hhi.fraunhofer.de/svn/svn_3DVCSoftware/tags/HTM-12.0/.
[16] Pankajakshan, V., Doërr, G., and Bora, P. K., "Detection of motion incoherent component in video streams," IEEE Transactions on Information Forensics and Security 4, 49–58 (March 2009).