Efficient High Dynamic Range Texture Compression

Kimmo Roimela, Tomi Aarnio and Joonas Itäranta
Nokia Research Center∗
Figure 1: Artifacts resulting from high contrast diagonal edges. Left to right: original, Roimela et al. [2006], our method, and Munkberg et al. [2007]. The method of Roimela et al. exhibits luminance distortion at the edge, whereas the latter two images are lossless for all practical purposes. Image courtesy of OpenEXR (Tree.exr).
Abstract

We present a novel compression method for high dynamic range (HDR) textures, targeted for future graphics hardware. Identifying that the existing solutions provide either very high image quality or very simple hardware implementation, we aim to achieve both features in a single solution.

Our approach improves upon an existing technique by incorporating a simple chrominance coding that allows overall image quality on par with the state of the art, but at a substantially lower encoding and decoding complexity. The end result is what we believe to be an excellent compromise between image quality and efficiency of hardware implementations.

We evaluate our compression method using common test images and established HDR image quality metrics. Additionally, we complement these results with error measurements in the CIE L*a*b* color space in order to separately assess the quality of luminance and chrominance information.

CR Categories: I.3.1 [Computer Graphics]: Hardware Architecture—Graphics processors; I.3.6 [Computer Graphics]: Methodology and Techniques—Graphics data structures and data types; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Color, shading, shadowing, and texture; I.4.2 [Image Processing and Computer Vision]: Compression (Coding)—Approximate methods

Keywords: HDR, high dynamic range, texture, image, compression, graphics hardware

∗e-mail: {kimmo.roimela|tomi.aarnio|joonas.itaranta}@nokia.com

© ACM, 2008. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version is published in the Proceedings of the 2008 Symposium on Interactive 3D Graphics, February 2008.

1 Introduction

Texturing is a key feature in real-time rendering, and texture bandwidth is one of the most common bottlenecks in graphics performance. With the transition towards full high-dynamic-range (HDR) processing in modern graphics accelerators, HDR texturing is becoming increasingly attractive. This typically means even more texture data and even higher bandwidth requirements.

While texture compression is a standard bandwidth-saving feature in mainstream graphics hardware, the currently available compressed formats are not directly suited to compressing HDR textures. The first HDR texture compression methods have been presented in recent years, but none of them have been adopted by hardware manufacturers as of this writing. Since we consider this a relevant feature for future hardware, we speculate that there is still room for improvement in the current approaches to the problem—either in their image quality or their suitability to hardware implementation.

In this paper, we look at how we could make HDR texture compression more attractive for future hardware designs. In particular, we propose chrominance and luminance coding improvements to a previously presented straightforward compression scheme, bringing image quality close to the current state of the art with notably lower decoding complexity. We believe that this results in an HDR compression format that offers an excellent balance between image quality and the feasibility of incorporating the format into mainstream graphics hardware.
2 Previous work
In this section, we briefly discuss relevant previous work on texture compression, HDR texture compression in particular, and metrics commonly used for measuring HDR image quality.
2.1 Texture compression
Texture compression reduces the memory bandwidth required for accessing texture data in real-time rendering. This is achieved through the use of a compressed image format that is decoded in the graphics hardware at rendering time, as first introduced by Beers et al. [1996]. The texture compression methods currently available in graphics hardware include S3TC [Iourcha et al. 1999], PVRTC [Fenney 2003], and iPACKMAN [Ström and Akenine-Möller 2005]. All of these compress 24-bit RGB data into 4 bits per pixel (bpp) with little loss in visual quality, enabling savings of 80–90% in texture bandwidth.

Traditional image compression methods, such as JPEG, do not support random access to individual pixels, and are therefore not suitable for texture compression. This is also true of HDR image formats such as JPEG 2000 HDR [Xu et al. 2005] and JPEG-HDR [Ward 2005]. Traditional low-dynamic-range texture compression can be trivially applied to HDR content stored in the RGBE format [Ward 1992], but better image quality is achieved through dedicated HDR compression.

Compressed formats dedicated to HDR textures have been presented by Munkberg et al. [2006; 2007], Roimela et al. [2006], and Wang et al. [2007]. All three are based on separating the image into chrominance and HDR luminance prior to encoding, but beyond that, each approach has a slightly different focus. The first two are conceptually similar to the low-dynamic-range methods, enabling 6:1 compression of 48-bit HDR data (into 8 bpp) and aiming for a decoder implementation in graphics hardware. By contrast, Wang et al. have designed their format for decoding in pixel shaders on existing graphics hardware, leveraging the de-facto standard DXTC (S3TC) format. Despite using 16 bits per pixel, their format does not quite reach the hardware-based methods in image quality [Munkberg et al. 2007]. We see the approach of Wang et al. as a good solution for the transition phase, before dedicated hardware support is widely available, but in the context of this paper, the first two are more interesting.

Starting from the same initial constraints, Munkberg et al. have aimed at the maximum image quality possible, whereas Roimela et al. focus on the simplicity of the decoding step to facilitate efficient hardware designs. The most notable feature employed by Munkberg et al. is the sophisticated shape transformation in their chrominance coding [Munkberg et al. 2006], whereby they generate additional chrominance values from only two stored chrominance samples. The image quality of the original Munkberg et al. format is excellent, and their new version improves it further still [Munkberg et al. 2007]. The approach of Roimela et al., on the other hand, is based on a simple, hardware-friendly color space transformation and on exploiting the properties of floating-point bit patterns. Their decoding step can be implemented almost entirely without floating-point operations, using simple integer shifts and additions only [Roimela et al. 2006]. As a result, the format is very simple both to encode and to decode, but somewhat lacking in image quality compared to that of Munkberg et al.
2.2 HDR image quality metrics
Measuring image quality is a major research area of its own, and HDR images add further variables due to the large luminance variations possible between closely neighboring regions of images. While LDR metrics can be verified through visual inspection and user tests, it is very difficult to objectively assess the severity of HDR image distortions. There is a large variety of tone-mapping operators for LDR displays, and true HDR displays are not widely available yet. We are therefore largely dependent on artificial metrics and subjective visual inspection when measuring the effect of lossy compression schemes. In the following, we briefly describe the most relevant quality metrics used in previous HDR work.

HDR-VDP [Mantiuk et al. 2005] is perhaps the most robust HDR image quality metric at the moment. It simulates features of the human visual system to compute a likelihood of detection for each change between two images. However, HDR-VDP only works with the luminance information of images, so even large changes in chrominance values can go unnoticed.

Based on the logarithmic response of the human visual system, Xu et al. have used the log[RGB] color space root-mean-square-error metric in their JPEG 2000 HDR work [Xu et al. 2005]. Essentially, they convert the pixels into a logarithmic RGB color space and compute the traditional RMS error in that domain. A similar log space error metric was also used by Wang et al. [2007].

The mPSNR metric [Munkberg et al. 2006] works by converting the original HDR image into multiple LDR images at different exposures, then computing an average of the peak signal-to-noise ratios (PSNR) of the individual exposures. This takes into account both the highlights and the shadows of the image, but of course does not consider properties of the human visual system.

Ward has measured HDR image quality using a modification of the CIE delta E metric where the reference white level for each pixel is taken from the brightest pixel within a certain radius [Ward 2007]. While not directly derived from the human visual system, this seems to provide results that have some correlation with visual expectations. There are also other promising, perceptually motivated image quality metrics, such as the Structural Similarity Index [Wang et al. 2004] and Perceptual Image Diff [Yee 2004]. As far as we know, however, they have not been applied to HDR content yet.
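To make the mPSNR procedure concrete, here is a small sketch following the description above, applied to a single-channel image for brevity. The exposure range, the 2.2 gamma curve, and all names are our illustrative assumptions; Munkberg et al. [2006] define the exact parameters and aggregation.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Expose by 'stops' f-stops, clamp to [0,1], gamma-map, quantize to 8 bits.
    static int toLDR(float v, int stops) {
        float exposed = v * std::pow(2.0f, (float)stops);
        float clamped = std::min(std::max(exposed, 0.0f), 1.0f);
        return (int)std::lround(std::pow(clamped, 1.0f / 2.2f) * 255.0f);
    }

    // Average of the per-exposure PSNRs between a reference and a test image.
    double mPSNR(const std::vector<float>& ref, const std::vector<float>& test,
                 int minStops, int maxStops) {
        double sum = 0.0;
        int exposures = 0;
        for (int stops = minStops; stops <= maxStops; ++stops) {
            double mse = 0.0;
            for (size_t i = 0; i < ref.size(); ++i) {
                int d = toLDR(ref[i], stops) - toLDR(test[i], stops);
                mse += (double)d * d;
            }
            mse /= (double)ref.size();
            if (mse > 0.0) {                    // skip exposures with no difference
                sum += 10.0 * std::log10(255.0 * 255.0 / mse);
                ++exposures;
            }
        }
        return exposures ? sum / exposures : INFINITY;
    }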
3 Our approach

In this paper, we chose to focus on HDR compression methods designed primarily for incorporation into future graphics hardware. Given that the proposals of Munkberg et al. and Roimela et al. have already demonstrated very high image quality and very simple implementation, respectively, the logical direction for improvement is to combine both aspects into a single format. In order to achieve the best of both worlds, we can either attempt to simplify the method of Munkberg et al. without losing image quality, or improve the image quality achieved by Roimela et al. without introducing much complexity. Identifying that the chrominance coding used by Roimela et al. clearly offers room for improvement, we chose to pursue the latter alternative by designing a more efficient chrominance coding.
3.1 The compression of Roimela et al.

We briefly summarize the compression scheme used by Roimela et al. here; for a complete treatment, refer to the original paper [Roimela et al. 2006]. Each 4 × 4 block of pixels is compressed individually. A straightforward color space transformation is performed on the original (Rp, Gp, Bp) pixels:

    Lp = 0.25 Rp + 0.5 Gp + 0.25 Bp    (1)

and

    rQ = Rp / (4 Lp)    (2)
    bQ = Bp / (4 Lp)    (3)

[Figure 2: bar chart of per-image luminance, chrominance, and multi-exposure YCrCb PSNR improvements]
Figure 2: The image quality improvement of our HDR texture compression scheme, compared to the method of Roimela et al. [2006] in terms of luminance, chrominance, and multi-exposure PSNR. Luminance fidelity is improved by more than 6 dB on the average.
The chrominance coordinates rQ and bQ for each 2 × 2 sub-block are averaged and directly quantized into seven bits each. The luminance channel is encoded by first subtracting a per-block bias value from the bit pattern of each floating-point luminance Lp . From these differential luminance values, a per-block selectable number of leading zero bits are discarded, and four bits of differential luminance information are retained per pixel.
[Table 1: bit layout of the 128-bit compressed block. Bytes 0..10 hold Lbias, nzeros, and the per-pixel luminance fields lum0..lum15; bytes 11..15 hold rbias, bbias, czeros, and the per-sample chrominance fields.]

Table 1: New bit allocation for the compressed block. The luminance terms in bytes 0..10 are as originally described by Roimela et al. [2006], except for changes in bit field sizes and the fact that nzeros ranges from 1 to 8. Our new chrominance coding, bytes 11..15, is described in this paper.
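To illustrate equations (1)–(3), the following is a minimal sketch of the forward transform and its algebraic inverse. The names are ours, and a real codec must also special-case Lp = 0 (pure black) before dividing.

    // Color-space transform of equations (1)-(3) and its algebraic inverse.
    struct RGB { float r, g, b; };
    struct LumChrom { float L, rQ, bQ; };

    LumChrom forwardTransform(RGB p) {
        float L = 0.25f * p.r + 0.5f * p.g + 0.25f * p.b;   // (1)
        return { L, p.r / (4.0f * L), p.b / (4.0f * L) };   // (2), (3); L > 0 assumed
    }

    RGB inverseTransform(LumChrom c) {
        float r = 4.0f * c.L * c.rQ;                        // invert (2)
        float b = 4.0f * c.L * c.bQ;                        // invert (3)
        float g = 2.0f * c.L - 0.5f * (r + b);              // solve (1) for G
        return { r, g, b };
    }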
While this is a very straightforward design, we can observe that the coding is clearly sub-optimal. In particular, the chrominance coding uses 56 bits, almost half of the 128-bit block, for the color data, and each 4 × 4 block can contain any four colors from the entire color space. Since chrominance tends to vary much more smoothly than luminance, there is considerable correlation within each image block that is currently unexploited.
3.2 An improved chrominance coding

Observing that the chrominance values within each image block are typically local to a very small subspace of the entire color space, we should focus the coding on that subspace rather than the entire color space. Coding schemes in texture compression often store two colors per block, defining either a line [Iourcha et al. 1999] or a more complex shape [Munkberg et al. 2006] that is used to compute additional inferred values. Since we still wanted to keep the decoding complexity low, we decided against schemes that require interpolation between endpoints, even if that could have given a higher coding efficiency. Instead, we adopted a coding scheme similar to the one used by Roimela et al. for luminance ([Roimela et al. 2006], section 3.1), and applied it to the chrominance coordinates as well.

In our new coding scheme, we effectively select for each block a square subspace of the entire chrominance space, and quantize each of the chrominance coordinates rQ and bQ into this subspace. First, we find the minimum values min(rQ) and min(bQ) from among the four chrominance values within the 4 × 4 block. These are quantized to six bits each, denoted rbias and bbias, and subtracted from each rQ and bQ, respectively, to compute the differential chrominance coordinates

    r'Q = rQ − rbias    (4)
    b'Q = bQ − bbias    (5)
We then find the smaller of the numbers of leading zero bits in the maximum differential values max(r'Q) and max(b'Q). This count is denoted czeros and clamped to [0, 7]. Finally, the czeros most significant bits of each r'Q and b'Q are dropped, and the remaining values are truncated to three bits each, as shown in Table 1. Three bits per chrominance coordinate may not seem like much, but we attempted a number of different bit allocations and found this one to work extremely well in practice.

The new coding scheme gives a maximum of ten bits of precision per chrominance coordinate when the data is smooth and czeros is at its maximum, compared to the fixed seven bits in the original [Roimela et al. 2006], which actually improves overall chrominance precision slightly. More importantly, the reduction in the number of chrominance bits allows us to significantly improve the quality of the luminance information.
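As a concrete illustration, the sketch below encodes the four per-2 × 2-sub-block chrominance samples of one block, assuming they have already been quantized to 10-bit fixed point. Field widths follow Table 1; the rounding details of the real format are simplified away.

    #include <algorithm>
    #include <cstdint>

    struct ChromaBlock {
        uint8_t rbias, bbias;  // 6-bit quantized minima of rQ and bQ
        uint8_t czeros;        // shared leading-zero count, clamped to [0, 7]
        uint8_t r[4], b[4];    // 3-bit differential chrominance samples
    };

    // Count leading zeros within a 10-bit value.
    static int leadingZeros10(uint32_t v) {
        int n = 0;
        for (uint32_t mask = 1u << 9; mask != 0 && (v & mask) == 0; mask >>= 1) ++n;
        return n;
    }

    // Encode four 10-bit fixed-point (rQ, bQ) samples as described in the text.
    ChromaBlock encodeChroma(const uint32_t rQ[4], const uint32_t bQ[4]) {
        ChromaBlock blk;
        uint32_t rmin = *std::min_element(rQ, rQ + 4);
        uint32_t bmin = *std::min_element(bQ, bQ + 4);
        blk.rbias = (uint8_t)(rmin >> 4);              // quantize minima to 6 bits
        blk.bbias = (uint8_t)(bmin >> 4);
        uint32_t dr[4], db[4], rmax = 0, bmax = 0;
        for (int i = 0; i < 4; ++i) {                  // equations (4) and (5)
            dr[i] = rQ[i] - ((uint32_t)blk.rbias << 4);
            db[i] = bQ[i] - ((uint32_t)blk.bbias << 4);
            rmax = std::max(rmax, dr[i]);
            bmax = std::max(bmax, db[i]);
        }
        int cz = std::min(leadingZeros10(rmax), leadingZeros10(bmax));
        blk.czeros = (uint8_t)std::min(cz, 7);
        int shift = 10 - (int)blk.czeros - 3;          // keep 3 bits after the zeros
        if (shift < 0) shift = 0;
        for (int i = 0; i < 4; ++i) {
            blk.r[i] = (uint8_t)((dr[i] >> shift) & 0x7);
            blk.b[i] = (uint8_t)((db[i] >> shift) & 0x7);
        }
        return blk;
    }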
                        Roimela et al. 2006   Munkberg et al. 2007   Our
    exp2 (2^x, fixed)            -                     1              -
    mul (float-fixed)            3                     3              3
    mul (fixed-fixed)            -                     8              -
    table lookup                 -                     2              -
    add (fixed)                  3                    12              5
    bit shift                    4                     -              6
Table 2: The number of arithmetic operations required in each decoder. Note that any additional decoding logic is omitted from the table: for example, the different encoding modes supported by the Munkberg et al. format are not represented here.

Concerning decoding complexity, Table 2 shows the arithmetic operations required in our new decoder, compared to those of Roimela et al. [2006] and Munkberg et al. [2006; 2007]. While the number of floating-point times fixed-point multiplications is the same in all three alternatives, our proposed method has significantly fewer operations overall than that of Munkberg et al., and only adds a couple of integer shifts and additions to that of Roimela et al. Taking into account the additional logic required for the different coding modes of Munkberg et al., we can say with some confidence that our codec remains significantly less complex.
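To see why the decoder stays within shifts and adds, reconstructing one chrominance coordinate from the stored fields can be sketched as follows, under the same 10-bit fixed-point assumption as in the encoding sketch above:

    #include <cstdint>

    // Rebuild a 10-bit fixed-point chrominance coordinate from the stored fields.
    uint32_t decodeChromaSample(uint8_t stored3, uint8_t bias6, uint8_t czeros) {
        int shift = 10 - (int)czeros - 3;          // position of the 3 stored bits
        if (shift < 0) shift = 0;
        return ((uint32_t)bias6 << 4)              // undo the 6-bit bias quantization
             + ((uint32_t)stored3 << shift);       // undo equations (4)/(5)
    }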
3.3 Luminance improvements

Our new chrominance coding frees a total of 17 bits, previously used for chrominance, for luminance encoding. Since the per-pixel luminance quality is an obvious bottleneck in the original algorithm, 16 bits are added to the per-pixel luminance values, giving a total of five bits of luminance per pixel. As can be seen in Figure 1, this eliminates practically all luminance artifacts visible to the naked eye without magnification, bringing luminance fidelity on par with that of Munkberg et al.

In allocating the final available bit, we gathered statistics on the coded blocks produced by the original method, using both natural images and extreme test cases. We found that of the block parameters, the Lbias values are fairly evenly distributed across the value range in most images. The one available bit was therefore allocated there, adding further precision to luminance encoding, particularly in smooth image regions.

Another observation from the statistics was that the nzeros parameter [Roimela et al. 2006] received a value of zero only in artificial test images, and even then in only a few blocks with an extreme dynamic range, whereas the maximum value of seven was frequently encountered. We decided to add a bias of one to the nzeros field, changing the representable range from [0, 7] to [1, 8]. This reduces the dynamic range representable within a single block, but in exchange we can represent smooth blocks with even higher accuracy.
3.4 Encoder optimizations
In the original paper by Roimela et al., the encoder is very straightforward, and there is no optimization after computing the quantized luminance values based on the full dynamic range of each block [Roimela et al. 2006]. This way, a single bright or dark pixel in an image block can cause significant precision loss in the other 15 pixels. It would seem that the overall result could be improved by reducing the dynamic range of the block in order to gain more precision for the majority of the pixels. With our new chrominance coding using a similar scheme, this may also be relevant for the chrominance parameters.

We implemented such an optimization for both luminance and chrominance information in our encoder. We attempted to narrow down the dynamic range of the block by adjusting the scale (nzeros, czeros) parameter by a few steps and finding a corresponding bias (Lbias, rbias, bbias) parameter that yields a better encoding. This optimization was guided by either the CIE Lab color space MSE with L, a, and b weighted at a ratio of 4:1:1, a multiple-exposure MSE metric similar to mPSNR (although with fewer exposures), or a combination of both.

In practice, the results of this optimization were usually negligible. The PSNR (Lab space or mPSNR) of the image was typically increased by less than half a dB, and the subjective visual quality was the same or, in some cases, slightly worse. In a few images, such as that shown in Figure 4, we did get a significant improvement when the optimization stage corrected chrominance artifacts visible in the mPSNR results, improving the result by several dB. It can be argued, however, that this was not a real error in the first place—refer to Section 6 for further discussion.

Figure 4: Artifacts exhibited at high exposures. Clockwise from top left: original, Roimela et al. [2006], Munkberg et al. [2007], and our method. The artifacts in the second image are caused by chrominance values quantized to the locations of the RGB primaries, which results in the viewer software saturating these chrominances to the primaries rather than white; refer to Section 6. Image courtesy of OpenEXR (Desk.exr).
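The search itself can be sketched generically: given a block encoder parameterized by a scale offset and an error functional, try a few offsets around the default and keep the best result. This is our illustrative restatement of the procedure, not the actual encoder code.

    #include <functional>
    #include <initializer_list>

    // Try scale offsets in [-2, 2] and keep the encoding with the lowest error.
    // 'encodeWithOffset' adjusts nzeros/czeros (and refits the biases); 'error'
    // is, e.g., the 4:1:1-weighted Lab MSE or a multi-exposure MSE.
    template <typename Encoded>
    Encoded optimizeBlock(const std::function<Encoded(int)>& encodeWithOffset,
                          const std::function<double(const Encoded&)>& error) {
        Encoded best = encodeWithOffset(0);
        double bestErr = error(best);
        for (int step : { -2, -1, 1, 2 }) {
            Encoded candidate = encodeWithOffset(step);
            double err = error(candidate);
            if (err < bestErr) { best = candidate; bestErr = err; }
        }
        return best;
    }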
4 Measuring luminance and chrominance fidelity

The HDR-VDP metric does a good job of measuring perceived luminance fidelity, but it does not evaluate the chrominance data at all. It would also be desirable for us to be able to separately assess the quality of our luminance and chrominance information, so as to be able to determine the exact effect of our modified coding scheme.

[Figure 3: bar chart of per-image luminance, chrominance, and multi-exposure YCrCb PSNR differences]
Figure 3: The image quality of our method compared against the current state of the art [Munkberg et al. 2007] in terms of PSNR. We achieve higher fidelity in luminance, but are unable to match the sophisticated chrominance coding of Munkberg et al. The difference in chrominance quality is most evident in the artificial test images Orientation and Tubes, which contain unnaturally sharp chrominance boundaries.

In order to determine the error in the chrominance channel, we must first separate the luminance and chrominance information. Furthermore, we need to measure differences between two chrominance samples using a metric that is as uniform as possible across the entire color space. In order to achieve this, we convert our test images into the CIE 1976 L*a*b* color space—for brevity, we omit the asterisks and refer to this as "Lab" in the rest of this paper. Lab is, at least in theory, a perceptually linear color space. This means that the difference between any two color coordinates should correspond directly to the perceived difference between the respective colors.

To perform measurements on chrominance fidelity, we first convert our linear-RGB images into Lab. From the Lab images, we then compute the peak signal-to-noise ratio (PSNR) separately for the luminance and chrominance data:

    PSNR_L = 10 log10( Lmax^2 / MSE_L )    (6)
    PSNR_ab = 10 log10( max(amax, bmax)^2 / MSE_ab )    (7)

where the subscript "max" denotes the maximum value within the original image, and the mean square error (MSE) terms are computed as

    MSE_L = (1/N) Σ_{i=1..N} ΔLi^2    (8)
    MSE_ab = (1/(2N)) Σ_{i=1..N} (Δai^2 + Δbi^2)    (9)
Here, the delta terms denote the difference between the original and compressed pixel i, and N is the number of pixels in the image.
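Equations (6)–(9) translate directly into code. The sketch below assumes the images have already been converted to Lab and flattened into arrays of equal length; following equation (7), the peak terms are taken from the original image only.

    #include <algorithm>
    #include <cmath>
    #include <limits>
    #include <vector>

    struct LabPixel { double L, a, b; };

    // Equations (6) and (8): luminance PSNR over Lab-converted images.
    double psnrLuminance(const std::vector<LabPixel>& ref,
                         const std::vector<LabPixel>& cmp) {
        double Lmax = 0.0, mse = 0.0;
        const double N = (double)ref.size();
        for (size_t i = 0; i < ref.size(); ++i) {
            Lmax = std::max(Lmax, ref[i].L);          // L_max of the original
            double dL = ref[i].L - cmp[i].L;
            mse += dL * dL;                           // accumulates (8)
        }
        return 10.0 * std::log10(Lmax * Lmax / (mse / N));          // (6)
    }

    // Equations (7) and (9): chrominance PSNR over the a and b channels.
    double psnrChrominance(const std::vector<LabPixel>& ref,
                           const std::vector<LabPixel>& cmp) {
        double cmax = std::numeric_limits<double>::lowest(), mse = 0.0;
        const double N = (double)ref.size();
        for (size_t i = 0; i < ref.size(); ++i) {
            cmax = std::max({ cmax, ref[i].a, ref[i].b });  // max(a_max, b_max)
            double da = ref[i].a - cmp[i].a;
            double db = ref[i].b - cmp[i].b;
            mse += da * da + db * db;                        // accumulates (9)
        }
        return 10.0 * std::log10(cmax * cmax / (mse / (2.0 * N)));  // (7)
    }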
5 Results

To compare the image quality of our method against the prior art [Munkberg et al. 2007; Roimela et al. 2006], we collected a set of eighteen images from the OpenEXR package and various sources on the web. Most of the images are photographic, reflecting the typical use cases of HDR textures, but a few images with synthetic content and text are also included. The images range in resolution from 0.1 to 3.0 million pixels, with an average of 1.0 megapixels. Their dynamic range varies between 3.2 and 6.9 orders of magnitude, and their global luminance levels, as reported by HDR-VDP, from 10^-5 to 0.8 cd/m^2.

Based on visual inspection, all photographic images in the test set were virtually lossless when encoded with either our method or that of Munkberg et al. Artifacts could be seen in very few images, and even then only by rapidly flipping between the compressed and original versions at very high levels of zoom and exposure. Compared to the method of Roimela et al., the increased luminance bit allocation in our method (5 vs. 4 bits per pixel) yields significantly better quality, particularly along high-contrast edges; see Figure 1 for an example.

As photographic content produced very minor coding artifacts in most cases, making it difficult to see real differences between the codecs, we also included some non-typical content in the test set. As expected, line drawings and text turned out to be more susceptible to error than photographs, but the quality is still quite acceptable; see Figure 6 for a worst-case scenario with overlapping fonts in different colors. Note, though, that this is an unlikely use case for HDR texturing.

To complement visual inspection, we used five different error metrics: log[RGB] RMSE, luminance PSNR, chrominance PSNR, multi-exposure PSNR, and HDR-VDP. Luminance and chrominance were measured in the Lab color space, and mPSNR in YCrCb. We also measured mPSNR in the RGB color space; the results were essentially the same, except for one anomaly that seemed to give an unfair advantage to our method (15 dB better on UHLibStatue for no apparent reason).

For HDR-VDP evaluation, we fixed the global adaptation luminance at 10.0 cd/m^2 by adjusting the --multiply-lum parameter individually for each image. This level of exposure is high enough to ensure that errors in dark regions do not go unnoticed. From the Probability of Detection Map (PDM) produced by HDR-VDP, we computed the average probability of detection over all pixels.

The PSNR and VDP results are listed in Table 3 and illustrated in Figures 2, 3, and 5. log[RGB] RMSE scores are not included due to space limitations, but they are available from the authors on request. On the average, the RMSE was 0.21, 0.37, and 0.32 for Munkberg et al., Roimela et al., and our method, respectively. It should be pointed out that a few test images were badly distorted by the Munkberg et al. codec, although we could not identify anything in their compression format that would explain these anomalies. As the artifacts invariably occur in very dark image regions, we suspect a computational precision issue in the implementation that we used. The problematic images were excluded from the average results.

[Figure 5: logarithmic bar chart of average HDR-VDP detection probability per image for Munkberg 2007, Roimela 2006, and our method]
Figure 5: The average probability of detection according to HDR-VDP for each image and compression method. Smaller values indicate better image quality. Note also that the scale is logarithmic. The spikes at Bottles and MountainDew are probably due to an error in the Munkberg et al. encoder implementation we used rather than the format itself; see Section 5.
Figure 6: Text and line encoding artifacts. While not representative of typical HDR content, this serves as a worst-case scenario for the HDR codecs. Clockwise from top left: original, Roimela et al. [2006], Munkberg et al. [2007], and our method. Note the color distortion and blocking artifacts that are present regardless of the compression method. Image courtesy of OpenEXR (Orientation.exr).
We also benchmarked our method against JPEG 2000 HDR [Xu et al. 2005] to see how we measure up to the state of the art in still image coding. Not surprisingly, JPEG 2000 generally outperforms our method at the same bitrate (8 bpp). However, our method actually scored better on the HDR-VDP average, because many images in the test set were badly distorted by JPEG 2000; see Figure 7 for an example. We have not investigated the reasons behind this behavior, but suspect that the use of log[RGB] RMSE for rate-distortion optimization is likely to be a factor.

To summarize the results, we achieve an order-of-magnitude improvement in luminance fidelity over earlier HDR texture compression methods, and noticeable improvements in chrominance and mPSNR over Roimela et al. [2006]. The chrominance encoding of Munkberg et al. remains superior in PSNR terms, but whether that translates into a noticeable difference in observed image quality is a subject for further study.
                  Luminance PSNR [dB]       Chrominance PSNR [dB]     Multi-exposure PSNR [dB]  HDR-VDP Avg. Error [%]
                  Munkberg Roimela  Our     Munkberg Roimela  Our     Munkberg Roimela  Our     Munkberg Roimela  Our
    Backyard        57.3     50.8  58.0      42.5     39.2   40.1      55.7     50.4   53.5       0.01    0.22   0.01
    BigFogMap       63.6     57.9  65.2      48.3     44.7   45.5      56.4     51.8   54.5       0.01    0.26   0.01
    Bottles         67.8     62.6  68.8      62.3     59.7   59.8      55.7     51.3   52.7       4.99    0.38   0.06
    Desk            50.0     45.8  51.9      40.8     37.5   37.1      46.3     37.9   41.8       0.28    1.36   0.15
    GoldenGate      67.1     64.0  70.4      52.2     47.9   48.2      52.9     47.5   48.1       0.02    0.27   0.01
    Memorial        59.4     55.3  61.5      45.2     41.7   41.9      50.8     45.8   46.7       0.13    0.81   0.06
    MountainDew     68.8     65.8  74.1      53.1     48.1   51.0      58.6     64.7   68.5      62.65    1.18   0.03
    Needle          71.1     67.7  73.7      56.9     53.6   53.8      51.2     47.0   47.4       0.08    0.32   0.02
    Ocean           59.7     56.6  63.0      39.7     35.3   35.4      47.8     42.9   43.7       0.13    0.54   0.04
    Orientation     44.5     42.6  48.7      34.5     23.7   23.8      46.2     34.8   34.6       0.14    0.64   0.09
    PiggyLamp       66.4     59.1  67.5      48.1     43.0   44.6      48.8     45.5   46.7       0.67    0.44   0.02
    StillLife       65.3     61.2  67.5      51.7     46.4   46.4      40.3     38.2   38.1       1.75    0.88   0.18
    StreetLight     68.5     63.0  69.6      54.0     49.7   49.9      55.2     50.4   52.6       0.03    0.34   0.02
    Tree            45.9     43.5  49.5      30.4     27.1   27.2      40.0     34.7   34.9       0.26    0.72   0.08
    Tubes           41.4     39.1  44.9      35.0     24.3   24.3      41.6     33.7   33.6       5.89    8.46   1.85
    UHLibStatue     49.3     43.9  50.5      38.2     33.1   33.3      52.6     47.1   48.6       0.07    0.61   0.04
    Window          53.2     47.8  55.3      37.2     32.5   32.8      54.3     49.2   50.6       0.04    0.35   0.02
    Yucca           55.0     49.4  56.1      38.2     34.0   34.2      51.6     45.8   47.8       0.08    0.67   0.07
    Average         57.3     53.0  59.6      43.3     38.4   38.7      49.5     43.9   45.2       0.60    1.06   0.17
Table 3: Our method compared against those of Munkberg et al. [2007] and Roimela et al. [2006] using four different image quality metrics. Luminance and chrominance PSNR are measured in the CIE Lab color space, whereas multi-exposure PSNR is computed in YCrCb. Recall that better image quality is indicated by larger PSNR values and smaller VDP values. Bottles and MountainDew are excluded from the average results due to anomalous results from the Munkberg et al. codec; see Section 5.

6 Discussion

The main drawback of our new compression format is probably the low spatial resolution of the chrominance information. While this is not a problem in the grand majority of images, certain low-resolution test images exhibit visible artifacts. To overcome the problem, it would certainly be possible to incorporate an alternate bit allocation, as done by Munkberg et al. in their improved method [2007]. There are currently a few unused bit combinations of the Lbias and nzeros parameters that we could use to signal a different encoding mode, but this has not been investigated further as of this writing.

A topic we have expressly ignored in this paper is HDR texture filtering. High-end consumer graphics hardware already incorporates bilinear filtering of HDR textures, so it is conceivable that this can be performed as a post-process after the floating-point pixels have been decoded. However, floating-point filtering is computationally very expensive compared to filtering of non-HDR, fixed-point data. Therefore, both Wang et al. [2007] and Munkberg et al. [2007] have discussed HDR filtering approaches that work directly on the separated luminance and chrominance components. The latter approach is particularly interesting, as it should be easily applicable to our HDR format as well, but evaluating it in this context is a subject for further study.

In conjunction with algorithmic developments, we still see a need for an HDR quality metric that is based on human perception and takes chrominance information into account. While the various RMSE and PSNR metrics are practical overall measures of image distortion, they do not always agree with visually perceived image quality—see Figure 7 for an example. Naturally, this is also true of our Lab space PSNR measurements, which are similarly based on mean square error. As we measure chrominance independently of luminance, our Lab space chrominance metric does detect certain chrominance errors that go unnoticed in the other metrics, but it is still not clear how prominent these errors would be in an HDR context. Of the metrics we used, HDR-VDP is still, subjectively, the closest to reality.

Regarding the high-exposure chrominance artifacts exhibited in Figure 4, we can argue that this is not an error in the encoded image, but rather in the viewing software used. As discussed by Glassner [2001], the proper way to render bright RGB colors is to adjust the saturation so that the desired luminance level is achieved; in HDR tone mapping, this is even more important, as the false colors introduced by merely clamping to the RGB color cube can be very distracting. Naturally, the same approach should be taken in any HDR error measurements.

Finally, upon investigating some of our test images, we were somewhat surprised to discover that many darker images contain large numbers of denormal floating-point values. Denormals can often be ignored or clamped to zero in algorithms working with full 32-bit floats, but it seems that the smaller dynamic range of the 16-bit half-float format makes them common enough that they must be handled properly in any algorithm working with that format.
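As an aside on the denormal issue, a half-float decoder must treat the zero-exponent case explicitly rather than flushing it to zero. The conversion below is the standard IEEE-754 half-to-float mapping (our own summary, not code from any of the codecs discussed):

    #include <cmath>
    #include <cstdint>

    // Decode a 16-bit half float, handling denormals (exp == 0) explicitly.
    float halfToFloat(uint16_t h) {
        uint32_t sign = (h >> 15) & 0x1;
        uint32_t exp  = (h >> 10) & 0x1F;
        uint32_t mant = h & 0x3FF;
        float value;
        if (exp == 0) {
            value = std::ldexp((float)mant, -24);    // denormal: mant * 2^-10 * 2^-14
        } else if (exp == 31) {
            value = mant ? NAN : INFINITY;           // NaN / infinity
        } else {
            value = std::ldexp((float)(mant | 0x400), (int)exp - 25);  // normal
        }
        return sign ? -value : value;
    }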
Figure 7: An example of mPSNR and log[RGB] RMSE conflicting with observed image quality. From the left: original image; our method (mPSNR = 38 dB, RMSE = 0.84); JPEG2000-HDR [Xu et al. 2005] at 8 bpp (mPSNR = 40 dB, RMSE = 0.55). In other words, the severe blurring in the rightmost image is considered less harmful than the few imperceptible changes in the bright areas of the middle image. The distortion is better picked up by L*a*b* luminance PSNR (13 dB to our advantage) and particularly HDR-VDP (40% average error for JPEG2000 vs. 0.2% for our method). Image courtesy of OpenEXR (StillLife.exr).

7 Conclusion

We have presented a high dynamic range texture compression scheme that delivers image quality comparable with the existing state of the art in the field. Still, the decoding complexity of our scheme remains low enough to facilitate a simple, efficient hardware decoder. We firmly believe that our method strikes an excellent balance between image quality and simplicity of implementation.

Our results were verified through a number of HDR metrics. However, apart from HDR-VDP, the commonly used metrics still do not have a direct link to the human visual system. While we can measure and detect distortions in the image signal itself, it is not obvious how those distortions manifest themselves in the final HDR viewing environment. In some cases, the signal-to-noise ratios are clearly not in line with subjectively observed image quality.

We briefly touched upon details of HDR processing and rendering that are not common in the LDR domain. In particular, care should be taken in tone-mapping HDR rendered content to an LDR display in order to avoid unnatural color shifts in bright, colorful scenes.

In the future, we hope to see yet more work on perceptually motivated HDR quality metrics that could reliably assess image fidelity across a wide range of scenes. Subjectively, we feel that the image quality of the current HDR compression offerings is already good enough for the grand majority of use cases, but a solid metric would allow us to quantify the effect of any future developments.
Acknowledgements

We would like to thank Jacob Munkberg for help with the Munkberg et al. texture compressor and the mPSNR tool, and the anonymous reviewers for their valuable input.
References

BEERS, A. C., AGRAWALA, M., AND CHADDHA, N. 1996. Rendering from compressed textures. In SIGGRAPH '96: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, ACM Press, New York, NY, USA, 373–378.

FENNEY, S. 2003. Texture compression using low-frequency signal modulation. In HWWS '03: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, Eurographics Association, Aire-la-Ville, Switzerland, 84–91.

GLASSNER, A. S. 2001. A change of scene. IEEE Computer Graphics and Applications 21, 3, 86–92.

IOURCHA, K., NAYAK, K., AND HONG, Z., 1999. System and method for fixed-rate block-based image compression with inferred pixel values. US Patent 5,956,431.

MANTIUK, R., DALY, S., MYSZKOWSKI, K., AND SEIDEL, H.-P. 2005. Predicting visible differences in high dynamic range images - model and its calibration. In Human Vision and Electronic Imaging X, IS&T/SPIE's 17th Annual Symposium on Electronic Imaging, vol. 5666, 204–214.

MUNKBERG, J., CLARBERG, P., HASSELGREN, J., AND AKENINE-MÖLLER, T. 2006. High dynamic range texture compression for graphics hardware. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers, ACM Press, New York, NY, USA, 698–706.

MUNKBERG, J., CLARBERG, P., HASSELGREN, J., AND AKENINE-MÖLLER, T. 2007. Practical HDR texture compression. Tech. rep., Lund University, August.

ROIMELA, K., AARNIO, T., AND ITÄRANTA, J. 2006. High dynamic range texture compression. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers, ACM Press, New York, NY, USA, 707–712.

STRÖM, J., AND AKENINE-MÖLLER, T. 2005. iPACKMAN: High-quality, low-complexity texture compression for mobile phones. In HWWS '05: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, ACM Press, New York, NY, USA, 63–70.

WANG, Z., BOVIK, A. C., SHEIKH, H. R., AND SIMONCELLI, E. P. 2004. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4, 600–612.

WANG, L., WANG, X., SLOAN, P.-P., WEI, L.-Y., TONG, X., AND GUO, B. 2007. Rendering from compressed high dynamic range textures on programmable graphics hardware. In SIGGRAPH Symposium on Interactive 3D Graphics and Games, ACM Press, New York, NY, USA.

WARD, G. 1992. Real pixels. In Graphics Gems II, J. Arvo, Ed.

WARD, G. 2005. JPEG-HDR: A backwards-compatible, high dynamic range extension to JPEG. In Proceedings of the Thirteenth Color Imaging Conference.

WARD, G., 2007. High dynamic range image encodings. http://www.anyhere.com/gward/hdrenc/hdr_encodings.html (Oct. 2007).

XU, R., PATTANAIK, S. N., AND HUGHES, C. E. 2005. High-dynamic-range still-image encoding in JPEG 2000. IEEE Computer Graphics and Applications 25, 6, 57–64.

YEE, H. 2004. A perceptual metric for production testing. Journal of Graphics Tools 9, 4, 33–40.