DES2 - 1 Invited
Motion-based verification of visually lossless display stream compression for HDR imagery

Wei Wang, David M. Hoffman, Dale Stolitzka, Wei Xiong
Samsung Display, 3655 North First Street, San Jose, CA 95134, USA

Keywords: visually lossless, image quality, display stream compression, consumer devices

ABSTRACT
Display stream compression (DSC) must be stable enough that it produces no scintillation artifacts in motion sequences that force constant re-encoding of the data. We evaluated different image motion testing paradigms, including panning imagery, to explore DSC visual quality on demanding HDR imagery.

1. INTRODUCTION
In the past decade, display systems have rapidly increased their resolution and frame rate, and will soon use high bit depth imagery. These imaging improvements outpace the ability of the links to transmit all the necessary data. Compact form factors and cost-competitiveness have reduced the appeal of expanding the number of data lanes. An emerging solution is to use visually lossless compression in the display links, which employs light compression (~3:1) to reduce the bandwidth of the data without noticeable visual artifacts.

In prior work, a subjective assessment method based on in-place flicker was used to define "visually lossless" visual quality, on the assumption that a temporal component is critical for a relevant evaluation [1]. That work also assumed that the compression noise would be uncorrelated from frame to frame, and that any losses would be more visually noticeable in static imagery than in video sequences. In this paper, these assumptions are tested using several dynamic subjective assessment methodologies to determine whether these compression routines achieve their stated visual quality goal.

Another important consideration for display stream compression (DSC) testing is to determine whether it is effective in upcoming image formats such as high dynamic range (HDR) [5].
HDR imaging technology has strongly motivated deeper color bit depth to carry data with expanded dynamic range and color gamut more effectively. The expanded container volume raises its own questions for effective HDR testing. For example, how do display stream compression losses propagate through the HDR image processing? Furthermore, the existing method is designed for testing standard dynamic range (SDR) images on computer monitors and RGB format-friendly consumer devices. There are obstacles to applying it to HDR imagery due to equipment modes and content availability. In this paper, as part of the testing workflow, image quality of the compressed image/video is evaluated after applying the image processing needed to remap HDR data to a signal appropriate for the test display.

This paper describes a study to evaluate the validity of four visual assessment methods and to explore the feasibility of detecting display artifacts in what should be visually lossless compressed images and videos after using display stream compression on HDR image data.

2. METHODS
2.1 Subjective assessment
We used an updated version of the ISO/IEC 29170-2:2015 [2] method with experimental revisions for UHD TV viewing (ISO/IEC 29170-2:2015/PDAM1 [3] and ISO/IEC 29170-2:2015/PDAM2 [4]). We used a Samsung TV set (JS9500) in IT mode, configured to present 3840×2160 RGB444 image data with minimal processing. Observers were positioned with a chinrest at a fixed distance of 122 cm such that each pixel subtends 1/60 deg.

We used paired comparison testing in this study. A test sequence is presented on one side of the screen and an uncompressed reference sequence is shown on the other side. The two alternatives are randomly assigned to the left or right side. Observers were asked to select the side with the best video quality, meaning the side most free of flicker.

2.2 Image presentation methods
We tested four different image presentation methods. Three are based on a single frame, and one is based on a video sequence. The flicker method used previously [1] has been shown to be sensitive to differences between still images, but it is unclear how its results correlate with various types of motion imagery. Motion effectively masks many image compression artifacts, both perceptually (drawing attention away from image artifacts) and optically (motion blur, image re-encoding). In this study, we assessed four subjective assessment methods to determine which are most likely to reveal weaknesses in a compression codec. These methods are:
1) Compressed/reference image flicker
2) Panning image compression
3) Panning image compression with stabilization
4) Video playback with compression
ISSN-L 1883-2490/23/1285 © 2016 ITE and SID
IDW/AD ’16
In the compressed/reference image flicker method, a compressed and reconstructed image is interleaved with a reference image (the uncompressed original) [2]. The alternation occurs every three frames (30 Hz display refresh rate), giving a flicker cycle of 5 Hz.

In the panning image compression method, a still image (the same one used for the compressed/reference image flicker) is shifted diagonally in each frame and then compressed. The reconstructed image is cropped at a fixed frame location so that there is diagonal motion (lasting 3 sec) in the cropping aperture (Figure 1a). This method does not contain local motion to mask image artifacts. Panning motion is widely prevalent; it occurs whenever imagery is shown within a window.

The panning image compression with stabilization method is based on the same frame data as the panning image compression method, but the crop from each frame is shifted in unison with the image shift so that there is no motion in the resulting video sequence (Figure 1b). This method introduces frame-by-frame re-encoding artifacts (similar to the compressed/reference image flicker) but ensures that the actual frame-to-frame noise is representative of a DSC system. Compared to the panning method, this screen-stabilized panning reduces some potential display-pixel temporal issues, such as panel overdrive or motion blur, that could obscure compression artifacts.

In the video playback with compression method, each frame of a video sequence was individually compressed. The clips were played forwards and backwards (bounce method) to avoid discontinuity. The clip duration was 3 seconds. The video playback method examines the visibility of the same type of image artifacts in the presence of local motion.
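The difference between the two panning methods comes down to where the crop window sits relative to the shifted image. The following is a minimal sketch of that idea, not the authors' test software. The function names, the wrap-around edge handling, the one-pixel-per-frame step and the `identity_codec` stand-in for a DSC encode/decode round trip are all assumptions made for illustration.

```python
def shift_image(image, dx, dy):
    """Shift a 2-D image (list of rows) right by dx and down by dy.

    Edges wrap around; this is an assumption, the paper does not say
    how image borders were handled.
    """
    h, w = len(image), len(image[0])
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]


def crop(image, x0, y0, width, height):
    """Return the width x height window whose top-left corner is (x0, y0)."""
    return [row[x0:x0 + width] for row in image[y0:y0 + height]]


def identity_codec(frame):
    """Hypothetical placeholder for a DSC encode + decode round trip."""
    return frame


def panning_sequences(image, n_frames, crop_box, codec=identity_codec, step=1):
    """Build the panning and the stabilized-panning crops from one still.

    Every frame is the still shifted diagonally by k * step pixels and
    passed through the codec. The panning crop stays at a fixed location,
    so the content moves inside it. The stabilized crop moves with the
    shift, so the content stays put and only codec noise changes.
    """
    x0, y0, cw, ch = crop_box
    panning, stabilized = [], []
    for k in range(n_frames):
        d = k * step
        decoded = codec(shift_image(image, d, d))
        panning.append(crop(decoded, x0, y0, cw, ch))
        stabilized.append(crop(decoded, x0 + d, y0 + d, cw, ch))
    return panning, stabilized


if __name__ == "__main__":
    still = [[y * 16 + x for x in range(16)] for y in range(16)]
    pan, stab = panning_sequences(still, n_frames=5, crop_box=(2, 2, 4, 4))
    print("stabilized frames identical:", all(f == stab[0] for f in stab))
    print("panning frames differ:", pan[1] != pan[0])
```

With the identity codec the stabilized frames are all identical, so any frame-to-frame change seen with a real codec in that sequence would come from re-encoding alone.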
Figure 1. Image presentation methods: panning image compression (top row) and panning image compression with stabilization (bottom row)

2.3 Image pipeline
Images were converted from their initial format to the HDR10 image container. This container uses 10 bits of precision per channel, Rec. 2020 color space limit, and SMPTE 2084 EOTF. Within the larger container, the image data exercised the color volume 0 to 350 nits and Rec. 709 color space. Prior to DSC compression, the image was either left in RGB 4:4:4 format or converted to YCbCr 4:2:2 using the standard MPEG-2 method with chroma samples co-sited with luma samples. Image and sequence compression
used the codec configuration in Table 1.

Table 1. DSC compression parameters
Code ver 1.48               RGB 4:4:4    YCbCr 4:2:2
Picture parameters          Standard     Native 4:2:2
Bits/component              10           10
Bits/pixel (compressed)     8.0          7.0
Compression ratio           3.75         ≈2.86
Slice height x width        108 lines x 1 horizontal line

Compression results between 4:4:4 and 4:2:2 are not easily comparable. The bit rate for 4:2:2 is about 12% less than for 4:4:4. The applied values are based on test recommendations from VESA [5]. Analysis of image quality results at different bit rates for the same content is beyond the scope of this paper.

After compression, the image data is reconstructed and, for the YCbCr case, converted to RGB 4:4:4. We then apply the appropriate linear transformations on the image to shift from the Rec. 2020 with SMPTE 2084 EOTF format to an SDR-formatted signal for the TV (Figure 2).
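The compression ratios and the bit-rate difference quoted for Table 1 follow from simple arithmetic on bits per pixel. The sketch below reproduces those numbers. The helper names are illustrative and are not part of any DSC tool. It counts three samples per pixel for 4:4:4 and two for 4:2:2, since the two chroma components are each sampled at half the horizontal rate.

```python
# Average number of samples carried per pixel for each chroma format.
SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:2": 2.0}


def uncompressed_bpp(bits_per_component, chroma_format):
    """Bits per pixel before compression."""
    return bits_per_component * SAMPLES_PER_PIXEL[chroma_format]


def compression_ratio(bits_per_component, chroma_format, compressed_bpp):
    """Uncompressed bits per pixel divided by compressed bits per pixel."""
    return uncompressed_bpp(bits_per_component, chroma_format) / compressed_bpp


def bitrate_reduction(reference_bpp, test_bpp):
    """Fractional bit-rate saving of test_bpp relative to reference_bpp."""
    return 1.0 - test_bpp / reference_bpp


if __name__ == "__main__":
    print(compression_ratio(10, "4:4:4", 8.0))            # 30 / 8
    print(round(compression_ratio(10, "4:2:2", 7.0), 2))  # 20 / 7
    print(bitrate_reduction(8.0, 7.0))                    # 1 - 7/8
```

The 4:2:2 stream therefore carries 12.5% fewer compressed bits per pixel than the 4:4:4 stream, even though its compression ratio relative to its own source format is lower.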
Figure 2. Image pipeline in display stream compression of HDR images (YUV4:2:2 shown)

2.4 Content
Cinema content was selected from the movies generously made available by the Blender Foundation, including Big Buck Bunny, Sintel and Tears of Steel (Figure 3). Additional content was adapted from the challenging images used in previous work for DSC testing [1]. For this subset of still images, video playback testing was not possible. A total of 12 scenes were tested in this study (7 cinema and 5 challenge scenes). All were selected for their difficult-to-compress image features. All the scenes were cropped to a rectangle (600×400 pixels) to emphasize a region of interest that highlights a particular image impairment. A collection of the image crops from the cinema and challenge scenes is shown in Figures 3 and 4, respectively.

2.5 Observers
Eight observers participated in this experiment. Four were familiar with the experimental hypothesis, and the other four were naïve observers. The four observers with knowledge of DSC artifacts and the experimental hypothesis evaluated each image/method combination 20 times, while the four naïve observers saw each image/method combination 10 times. One of the naïve observers was dropped from further analysis due to low overall sensitivity to artifacts in all conditions.
Figure 3. Thumbnails of cinema scene crops (Bunny indicates Big Buck Bunny and Tears indicates Tears of Steel).

Figure 4. Thumbnails of challenge image crops.

3. RESULTS
Observers were asked to select the image that showed an artifact, and for each sequence/method condition, we computed the response rate for each observer. The response rate is the fraction of all trials in which the observer selected the compressed image. We then calculated the average across all observers. Additionally, we note the highest response rate among the four observers with DSC knowledge. We expect these response rates to cluster near 0.5 when the compression is visually lossless and observers respond at chance. When the response rate is reliably higher than 0.75, we can conclude the compression is not visually lossless.

The response rate differed across the image presentation methods and the RGB 4:4:4/YCbCr 4:2:2 conditions. Since there is no video sequence in the challenge scenes, the analysis of response rate is broken down into two groups: cinema scenes/sequences and challenge scenes.

3.1 Cinema scene/sequence
For the seven cinema scenes/sequences, results varied greatly by the specific scene/sequence. Two of them (Sintel Shoulder and Sintel Wiseman) are plotted in Figures 5 and 6 since they most strongly illustrate several trends.
1) Video playback effectively masks DSC artifacts, and has a much lower response rate than other methods.
2) There are some significant differences between observers, with at least some of the observers detecting artifacts that are not widely visible.
3) Of the different methods based on a single image, the average ratings for the three methods are fairly similar, but for at least some observers, the panning method reveals the most artifacts.

Figure 5. Results plot for Sintel Shoulder. Response rate is shown on the ordinate for each visual assessment method. A circle represents the mean of all observers, and the error bars represent the standard deviation. A triangle indicates the highest response rate achieved by experienced observers.
Figure 6. Results plot for Sintel Wiseman.

3.2 Challenge images
For the five challenge images, the response rate (sensitivity to the artifacts) was high. Three challenge images (Hintergrund, Tools, Color Text) are illustrated in Figures 7-9 as examples showing several dominant trends.
1) Observers are able to detect artifacts in many of the images above the 0.75 threshold for declaring visual lossiness.
2) Panning image compression and panning image compression with stabilization are more sensitive methods than the compressed/reference image flicker.
3) Artifacts are more visible using RGB 4:4:4 compared to YCbCr 4:2:2 at the tested regions (especially in Color Text).
The panning and panning-stabilized methods were generally similar to the flicker testing in most cases, but there were several clear departures in which the panning methodologies were much more sensitive. In the case of the Color Text image, using the flicker method might erroneously lead us to pass the image with DSC whereas it would fail with the panning methods. Also, in
the Tools image, the prominent difference in YCbCr 4:2:2 response rates suggests that for at least some observers, the panning motion could have made a substantial difference. The difference between RGB 4:4:4 and YCbCr 4:2:2 was typically subtle, and the cases were split as to which format had the higher response fraction. However, in several cases, the result was heavily skewed toward RGB 4:4:4 having the higher response fraction. The Color Text image in Figure 9 is one clear example. The YCbCr 4:2:2 case often met the standard for visually lossless. However, one reason for this is that the chroma subsampling with colored text is particularly noticeable and leads to a noticeable judder (which manifests as a flicker) with 1-pixel translations. This judder masks the more subtle compression artifacts.
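The response-rate analysis described in Section 3 can be sketched in a few lines. This is an illustrative reconstruction, not the authors' analysis script. The function names and the trial data are hypothetical, and the paper's "reliably higher than 0.75" criterion is simplified here to a comparison of the mean response rate against 0.75.

```python
from statistics import mean, stdev

CHANCE_RATE = 0.5             # expected rate when compression is visually lossless
NOT_LOSSLESS_THRESHOLD = 0.75


def response_rate(choices):
    """Fraction of trials in which the observer picked the compressed side.

    `choices` is a sequence of booleans, True when the compressed
    image was selected.
    """
    return sum(1 for picked in choices if picked) / len(choices)


def summarize(trials_by_observer, experienced):
    """Summarize one sequence/method condition across observers."""
    rates = {obs: response_rate(t) for obs, t in trials_by_observer.items()}
    values = list(rates.values())
    return {
        "rates": rates,
        "mean": mean(values),
        "std": stdev(values) if len(values) > 1 else 0.0,
        "max_experienced": max(rates[obs] for obs in experienced),
        # Simplification: the paper asks for a rate "reliably" above 0.75.
        "exceeds_threshold": mean(values) > NOT_LOSSLESS_THRESHOLD,
    }


if __name__ == "__main__":
    # Hypothetical data: experienced observers ran 20 trials, naive ones 10.
    trials = {
        "E1": [True] * 18 + [False] * 2,
        "E2": [True] * 13 + [False] * 7,
        "N1": [True] * 5 + [False] * 5,
    }
    print(summarize(trials, experienced=["E1", "E2"]))
```

Reporting the highest experienced-observer rate alongside the mean matters because, as the results above note, some observers detect artifacts that are not widely visible.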
Figure 7. Results plot for Hintergrund.
Figure 8. Results plot for Tools.

Figure 9. Results plot for Color Text.

4. DISCUSSION
This paper has attempted to explore the sensitivity of four image evaluation methods in visually lossless image coding during display stream compression in HDR formats. The testing differs from prior work in two main ways. 1) We are using equipment and a viewing distance that are representative of consumer TVs; 2) We are considering how DSC artifacts could propagate through HDR processing to influence the final image quality on the screen.

We found that for the evaluation of visually lossless coding, image panning methodologies, either stabilized or with motion, are most likely to reveal visible image artifacts. This is true for the cinema imagery and the challenge images, and suggests that for robust testing it is important to consider frame-to-frame errors as opposed to fully emphasizing frame-to-reference differences.

Applying the panning methods instead of the flicker method does present some new challenges for visual evaluation. It demands full processing of every frame, and thus is much less suitable for sanity checking algorithm performance. For final approval of a revision, a panning-based method seems like it could represent a stricter standard.

Regarding the overall quality of DSC, it appears that it can be visually lossless in a great deal of HDR content, especially for natural imagery. Some challenge images designed to stress features in the codec reveal artifacts when tested with the methodologies used in this study.

The DSC codec has proven itself to be effective at compression in video playback. Artifacts introduced in video sequences containing local motion are far less visible than in the methods based on one frame, and in all but a few cases meet the visually lossless criterion.

This paper also explored the use of DSC for encoding of HDR imagery; these signals require significant differential gain between color channels and a change in the transfer function. We found that these manipulations, which likely do exacerbate image impairments, still leave the impairments below threshold for cinematic video sequences. The use of DSC with HDR as part of an IT workflow, such as rendering small font and colored text, may need continued development.
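The change in transfer function mentioned above refers to the SMPTE 2084 (PQ) EOTF used by the HDR10 container in Section 2.3. As a rough illustration of how steeply that curve maps luminance to signal level, the sketch below implements the PQ curve and its inverse using the constants published in SMPTE ST 2084. The function names are illustrative, and this is not the processing code used in the study.

```python
# Constants from SMPTE ST 2084.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0  # luminance represented by a full-scale PQ signal


def pq_encode(nits):
    """Luminance in cd/m^2 -> nonlinear PQ signal in [0, 1] (inverse EOTF)."""
    y = max(nits, 0.0) / PEAK_NITS
    yp = y ** M1
    return ((C1 + C2 * yp) / (1.0 + C3 * yp)) ** M2


def pq_decode(signal):
    """Nonlinear PQ signal in [0, 1] -> luminance in cd/m^2 (EOTF)."""
    ep = signal ** (1.0 / M2)
    y = (max(ep - C1, 0.0) / (C2 - C3 * ep)) ** (1.0 / M1)
    return PEAK_NITS * y


if __name__ == "__main__":
    for nits in (1.0, 100.0, 350.0, 10000.0):
        print(f"{nits:8.1f} nits -> PQ signal {pq_encode(nits):.4f}")
```

Content limited to 350 nits, as in this study, occupies roughly the lower two thirds of the PQ signal range, so remapping it to an SDR-formatted signal applies substantial gain to the decoded data.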
REFERENCES
[1] D. M. Hoffman and D. Stolitzka. "A new standard method of subjective assessment of barely visible image artifacts and a new public database". Journal of the SID, Vol. 22, No. 12, 631-643 (2015).
[2] ISO/IEC 29170-2:2015 Evaluation procedure for nearly lossless coding. (http://www.iso.org/iso/home/store.htm).
[3] ISO/IEC 29170-2:2015/PDAM1 Parameters for nearly lossless coding of high dynamic range media. Document ISO/IEC JTC 1/SC 29 N 16019 (https://www.itscj.ipsj.or.jp/sc29/).
[4] ISO/IEC 29170-2:2015/PDAM2 Evaluation procedure for nearly lossless coding of image sequences. Document ISO/IEC JTC 1/SC 29 N 16020 (https://www.itscj.ipsj.or.jp/sc29/).
[5] DSC 1.2 Display Stream Compression Standard (VESA: San Jose, CA) (https://www.vesa.org/store/).