Differential Coding of Alpha Planes with Adaptive Quantization
Laurent Piron and Murat Kunt
Signal Processing Laboratory
ABSTRACT: Alpha planes are used to compose objects in order to build synthetic images. They contain the shape and the transparency of the different objects. The scene is composed by superimposing objects in a given order, and the alpha planes specify the transparencies of the objects. In order to compress these objects independently, the alpha planes must also be coded. In this paper, we present a way of coding alpha planes using 2D differential coding. The alpha planes are first quantized; this must be done carefully so as not to introduce visible artifacts in the composed image. The proposed method has been compared to the scheme used in the MPEG-4 Verification Model (VM) for several sequences using different types of alpha planes. Depending on the sequence, the PSNR improves by up to 3 dB compared to the VM. The scheme is also much simpler than the VM.
1 INTRODUCTION
The new technologies for image manipulation make it possible to merge different objects to compose an image. These objects may be segmented out from natural scenes (a moving car), or they may be computer generated (a logo). The scene is composed by superimposing objects in a given order. The alpha planes specify the relative transparencies and positions of the objects. An alpha plane is associated with each object or layer in the scene, except the background. Each object is described by its texture, shape, relative transparency and relative depth to the background. The alpha plane contains two of these four pieces of information: the shape and the relative transparency to the background. In an object-oriented encoder, the alpha plane must therefore be coded. As it includes the transparency information, an alpha plane is more "complex" than just a binary shape, so the techniques used to encode binary shapes are not sufficient.
This paper is organized as follows: Section 2 presents some interesting properties of alpha planes, and Section 3 explains the proposed compression techniques, which take into account the special properties of alpha planes. Section 4 presents results from some experiments. Finally, we present some conclusions in Section 5.
2 ALPHA PLANE PROPERTIES
Alpha planes can be very different. The images in Fig. 1 show four possible types of alpha planes. The two examples on top come from a fully synthetic sequence, "Destruction", in which each layer is a graphic object. The first example (top left) corresponds to the beginning of an explosion, and the second one (top right) corresponds to falling rain. The two bottom examples are extracted from the sequences "Weather" and "Children", respectively. Both scenes have been composed using natural images and synthetic data. In the "Weather" example, the goal is to superimpose a speaker onto a synthetic map. For most alpha planes, the transparency is almost constant for all pixels except in the border region, where the transparency tapers down to zero. This process, also called feathering, removes aliasing effects. The transparency of a pixel ranges from 0 to 255 and is represented with eight bits. An alpha value equal to 0 means that the object is fully transparent at the considered position, while an alpha value of 255 means that the object is completely opaque (the background is not visible). The shape of the object can be deduced from these transparency values: all the pixels with a weight different from 0 belong to the object. The alpha values are weights assigned to the texture pixels of the object and are used when composing the image. In order to build the final image, the objects are
combined recursively, according to their assigned depths in the scene. First, the object closest to the background and the background are "merged". The result of this first operation becomes the new background. The next object in the stack is then merged with this new background, and so on until all the objects have been processed. The merging operation is done using the following rule:
$$P'_{back} = \frac{(255 - \alpha)\,P_{back} + \alpha\,P_{obj}}{255} \qquad (1)$$
where $\alpha$ is the alpha value associated with $P_{obj}$ (a pixel of the object), $P_{back}$ is the pixel in the current background and $P'_{back}$ is the resulting pixel in the new background. Eq. 1 shows that the influence of an alpha plane on the final image is indirect: the alpha plane contains weights that determine how much each pixel value contributes. It also contains the shape information, so its visual effect can be significant. Alpha planes can be compressed more efficiently if these properties are taken into account.
Figure 1: Examples of alpha planes. The first and second examples (top) are extracted from sequence "Destruction" (layers 6 and 8), the third one (bottom left) is from sequence "Weather" and the last one from sequence "Children".
The examples in Fig. 1 show that two different types of frequency information coexist in an alpha plane:
1. Large flat areas. These regions correspond to low frequencies and can be coded efficiently with several coding schemes (DCT, predictive coding, etc.).
2. Very abrupt changes in the transparency values. These parts of the alpha planes are more difficult to code. A transformation into the frequency domain results in significant high-frequency coefficients, which, in most cases, are affected by the quantization.
The second type of information (very abrupt changes) represents the major problem in alpha plane coding. These changes appear mostly on the border of the object, where the information is important for the composition of the final image. As most schemes have difficulty dealing with high frequencies, the shape is coded in a lossy fashion. This can result in undesired border artifacts in the reconstructed image. In some cases, it can also increase the bit rate.
In order to improve the compression of alpha planes, pre-processing the alpha values can be useful. The alpha values are represented using 256 levels. We apply a pre-processing algorithm to reduce the dynamic range of the alpha values. For an alpha value $\alpha$, the new value $\alpha'$ is given by
$$\alpha' = \begin{cases} K\left(1 + \left\lfloor \frac{\alpha - 1}{K} \right\rfloor\right) & \text{if } \alpha \neq 0 \\ 0 & \text{otherwise} \end{cases}$$
where $K$ is the quantization step size and $\lfloor\cdot\rfloor$ denotes integer (Euclidean) division. The values are then saturated to $K\lfloor 255/K \rfloor$. After this process, the dynamic range is reduced to $\lfloor 255/K \rfloor + 1$ values including zero, i.e. 16 levels for $K = 16$. In order to preserve the shape of the object, no value except zero is quantized to zero. Fig. 2 shows a comparison of three images: in the left-hand column the images are composed using alpha planes with an 8-bit dynamic range, while the right-hand images were obtained using 4 bits. Although the "quantized" images are different from the original ones, the visual artifacts are acceptable. The resulting PSNR values are presented in Table 1.
Image                        PSNR (dB)
"Destruction" (explosion)    50.97
"Children"                   41.26
"Weather"                    58.23
Table 1: PSNR of the composed images using a uniform quantization.
The low PSNR for the "Children" example results from the quantization of the large area between the letters to a different value: the original alpha value 101 has been quantized to 112. The resulting image is acceptable and the error is not visible, but the change in transparency changes the pixel values, and hence lowers the PSNR.
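As an illustration, the uniform pre-processing rule can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: alpha planes are assumed to be nested lists of 8-bit integers, and the function names are hypothetical. With K = 16 it reproduces the 101 → 112 mapping mentioned for "Children", yields 16 levels, and shows that 255 is saturated to 240.

```python
def quantize_alpha(alpha: int, k: int) -> int:
    """Map one 8-bit alpha value to the reduced range (uniform rule)."""
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must lie in [0, 255]")
    if alpha == 0:
        return 0  # zero stays zero, so the shape is preserved
    # Every non-zero value maps to a multiple of K that is >= K.
    level = k * (1 + (alpha - 1) // k)
    # Saturate at the largest multiple of K not exceeding 255.
    return min(level, k * (255 // k))


def quantize_plane(plane, k):
    """Apply the uniform rule to every pixel of an alpha plane (nested lists)."""
    return [[quantize_alpha(a, k) for a in row] for row in plane]


if __name__ == "__main__":
    K = 16
    print(quantize_alpha(101, K))  # 112, the "Children" value
    print(quantize_alpha(255, K))  # 240: 15 levels below full opacity
    print(len({quantize_alpha(a, K) for a in range(256)}))  # 16 levels
```

The last check shows why the uniform rule alone is not enough: fully opaque pixels lose 15 levels, far more than the tolerance for the value 255 discussed later in this section.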
Figure 2: Comparison between original images (on the left) and images composed with alpha planes containing 16 levels for the weights instead of 256 (on the right).
As illustrated by the above examples, a uniform quantization changes the alpha values without taking into account the presence of large uniform areas. Another way of reducing the dynamic range is to keep only the most significant alpha values in the histogram of the image. With these values, large uniform areas are approximated perfectly. One drawback is that these values must be included in the bitstream in order to reconstruct the alpha plane. Another drawback is that when no uniform areas are present, the resulting quantization is not acceptable. One way to solve this problem is to adapt the uniform quantization: if an alpha value is significant, the closest quantized value of the uniform quantizer is replaced by this value. An alpha value $i$ is considered significant, and is therefore kept, if
$$X_i > \bar{X} + 2\sigma_X$$
where $X_i$ is the number of pixels of the shape with alpha value $i$, and $\bar{X}$ and $\sigma_X$ are the mean and the standard deviation of the histogram of the alpha values in the shape. Using this adaptation (see Table 2), the composed image "Children" presents a higher PSNR, while the PSNR is not changed for the "Destruction" example because no value occurs more frequently than the defined threshold.
Image                        PSNR (dB)
"Destruction" (explosion)    50.97
"Children"                   50.22
"Weather"                    58.90
Table 2: PSNR of the composed images using a non-uniform quantization.
However, this reduction of the dynamic range must be performed carefully. The image in Fig. 3 shows an example: the quantized alpha plane has only 17 levels, and the value 255 (fully opaque) has been quantized to 247. In this image the white line behind the lady, which should be invisible, can be seen, mainly in the right part of the image.
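The histogram-based adaptation of the uniform quantizer described above can be sketched as follows. This is a hypothetical Python sketch resting on two assumptions that the text leaves open: the histogram is computed over the non-zero alpha values of the shape (mean and standard deviation taken over the counts of the values that occur), and "the closest quantized value" is read as the level to which the significant value would be mapped by the uniform rule, so that the significant value itself is reconstructed exactly.

```python
from collections import Counter
from statistics import mean, pstdev


def quantize_alpha(alpha: int, k: int) -> int:
    """Uniform rule: non-zero values map to K*(1 + (alpha-1)//K), saturated."""
    if alpha == 0:
        return 0
    return min(k * (1 + (alpha - 1) // k), k * (255 // k))


def significant_values(plane):
    """Alpha values whose count X_i exceeds mean + 2 * std of the counts.

    Assumption: only non-zero (in-shape) values are counted.
    Returned most frequent first.
    """
    counts = Counter(a for row in plane for a in row if a != 0)
    if not counts:
        return []
    freqs = list(counts.values())
    threshold = mean(freqs) + 2 * pstdev(freqs)
    kept = [v for v, c in counts.items() if c > threshold]
    return sorted(kept, key=lambda v: -counts[v])


def quantize_adaptive(plane, k):
    """Uniform quantization in which the level a significant value falls
    into is replaced by that value (one reading of the rule in the text)."""
    remap = {}
    for v in significant_values(plane):
        # If two significant values share a level, the more frequent wins.
        remap.setdefault(quantize_alpha(v, k), v)
    return [[remap.get(q, q) for q in (quantize_alpha(a, k) for a in row)]
            for row in plane]
```

In a real coder the retained values would also have to be written to the bitstream, as noted in the text; that part is omitted here.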
Figure 3: Example of a composed image with 17 levels in the alpha plane, where fully opaque objects are not preserved due to quantization.
The reduction of the dynamic range must take into account two very important alpha values, 0 and 255. An alpha value different from 0 must remain so after quantization, as this value is used to define the shape. An alpha value equal to 255 must also retain this value after quantization (or, at worst, be quantized to a very similar value). According to the composition rule of Eq. 1, in the extreme case (i.e., a white object on a black background), the resulting pixel has a gray level equal to the quantized alpha value (for instance 247, as in Fig. 3) instead of 255. In addition, as in the case shown in Fig. 3, the white lines on the dark map intensify this undesired transparency effect: on a uniform background the transparency would not produce strange composition effects, but high-contrast lines behind the object become visible through it. Based on these observations, and considering composed images with different quantization levels for the weight 255, it appears that a maximum difference of five levels between a quantized alpha value and the original value of 255 is acceptable. With these restrictions, the reduction of the number of levels can be used as the pre-processing stage of a compression scheme.
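The extreme case above follows directly from Eq. 1. The short Python sketch below evaluates the rule for a white object on a black background and for a dark object in front of a white line, and encodes the five-level tolerance for the value 255 as a simple check. The helper names are hypothetical, and values are kept as real numbers without rounding.

```python
def compose_pixel(p_back: float, p_obj: float, alpha: int) -> float:
    """Eq. 1: weighted merge of an object pixel onto the current background."""
    return ((255 - alpha) * p_back + alpha * p_obj) / 255


def opaque_level_ok(alpha_q: int, max_diff: int = 5) -> bool:
    """Accept a quantized value for alpha = 255 only if it stays within
    max_diff levels of 255 (the tolerance stated in the text)."""
    return 255 - alpha_q <= max_diff


if __name__ == "__main__":
    # White object on a black background: the result equals alpha itself.
    print(compose_pixel(0, 255, 247))   # 247.0 instead of 255
    # Dark object over a white line: the line leaks through by 8 levels.
    print(compose_pixel(255, 0, 247))   # 8.0 instead of 0
    print(opaque_level_ok(247), opaque_level_ok(251))  # False True
```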
3 DIFFERENTIAL CODING
3.1 Lossless JPEG
The lossless JPEG coding scheme is based on differential coding followed by entropy coding [1]. This algorithm is completely different from the usual JPEG algorithm based on a DCT decomposition. Seven different modes exist, each corresponding to a specific prediction of the current pixel. Fig. 4 shows the neighboring pixels A, B and C used to predict pixel X.
Figure 4: Position of the pixels used for prediction in the lossless mode of JPEG: C is the upper-left neighbor, B the upper neighbor and A the left neighbor of the current pixel X.
The seven modes are as follows:
1. $X = A$
2. $X = B$
3. $X = C$
4. $X = A + B - C$
5. $X = A + (B - C)/2$
6. $X = B + (A - C)/2$
7. $X = (A + B)/2$
After prediction, the residual error is entropy coded. The bit rate of this lossless scheme is very high, but the shape can be reconstructed perfectly. If the transparency is modified appropriately, the resulting image after composition is still acceptable. Pre-processing the transparency values can therefore decrease the cost while maintaining perfect shape information.
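For reference, the seven predictors and the residual computation can be written as a small Python sketch. Two details are assumptions rather than part of the text: the halving in modes 5 to 7 is done with an arithmetic right shift, and pixels outside the image are treated as 0 instead of using the JPEG boundary rules. The decoder mirrors the encoder by predicting from pixels it has already reconstructed.

```python
def predict(mode: int, a: int, b: int, c: int) -> int:
    """Lossless JPEG predictor for X from A (left), B (above), C (upper left)."""
    if mode == 1:
        return a
    if mode == 2:
        return b
    if mode == 3:
        return c
    if mode == 4:
        return a + b - c
    if mode == 5:
        return a + ((b - c) >> 1)  # halving by arithmetic shift (assumption)
    if mode == 6:
        return b + ((a - c) >> 1)
    if mode == 7:
        return (a + b) >> 1
    raise ValueError("mode must be 1..7")


def _neighbours(img, y, x):
    """Return (A, B, C); pixels outside the image count as 0 (assumption)."""
    def px(yy, xx):
        return img[yy][xx] if yy >= 0 and xx >= 0 else 0
    return px(y, x - 1), px(y - 1, x), px(y - 1, x - 1)


def residuals(plane, mode):
    """Encoder side: prediction error for every pixel."""
    return [[plane[y][x] - predict(mode, *_neighbours(plane, y, x))
             for x in range(len(plane[0]))] for y in range(len(plane))]


def reconstruct(res, mode):
    """Decoder side: raster order, so A, B and C are already decoded."""
    h, w = len(res), len(res[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = res[y][x] + predict(mode, *_neighbours(out, y, x))
    return out
```

Any of the seven modes gives an exact round trip; the modes differ only in how small, and therefore how cheap to entropy code, the residuals are.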
3.2 2D Differential Coding
The principle of the JPEG lossless scheme can be applied to the quantized alpha planes. Before the coding scheme is applied, the dynamic range of the alpha values is compressed using the rules presented in Section 2. Therefore, the proposed scheme is not lossless; however, the quality of the decoded alpha plane is quite high. In our experiments, the dynamic range has been reduced from 256 levels to 16 levels.
To reduce the bit rate, the bounding box of the object is computed, and the pixels outside the bounding box are not coded at all. The likely presence of large uniform areas equal to 0 or 255 is also exploited. The bounding box is divided into 8×8 blocks. If all the values in a block are equal to 0, or all are equal to 255, a specific code word is used for the entire block. Otherwise, differential coding is applied and the resulting differences are entropy coded using adaptive arithmetic coding. We have performed some experiments to determine the best prediction scheme: mode 7 of the lossless JPEG coding scheme appears to be the most accurate in most cases.
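The bounding-box and block logic of this section can be sketched in Python as follows. This is a hypothetical sketch: the token names stand in for the specific code words, the adaptive arithmetic coder itself is omitted, and mode-7 residuals are computed over the whole bounding box (so prediction crosses block borders) with pixels outside the box treated as 0. These choices are assumptions, not details given in the text.

```python
def bounding_box(plane):
    """Return (top, bottom, left, right), end-exclusive, of non-zero pixels."""
    rows = [y for y, row in enumerate(plane) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(plane[0])) if any(row[x] for row in plane)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def mode7_residuals(box):
    """Residuals of the mode-7 predictor (A + B) >> 1; outside pixels are 0."""
    def px(y, x):
        return box[y][x] if y >= 0 and x >= 0 else 0
    return [[box[y][x] - ((px(y, x - 1) + px(y - 1, x)) >> 1)
             for x in range(len(box[0]))] for y in range(len(box))]


def block_symbols(plane, block=8):
    """Per-block symbols: ("ALL0",) / ("ALL255",) mark uniform blocks,
    ("DIFF", residuals) would be passed to the arithmetic coder."""
    bb = bounding_box(plane)
    if bb is None:
        return []  # empty alpha plane: nothing to code
    top, bottom, left, right = bb
    box = [row[left:right] for row in plane[top:bottom]]
    res = mode7_residuals(box)
    h, w = len(box), len(box[0])
    symbols = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cells = [(y, x) for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            values = {box[y][x] for y, x in cells}
            if values == {0}:
                symbols.append(("ALL0",))
            elif values == {255}:
                symbols.append(("ALL255",))
            else:
                symbols.append(("DIFF", [res[y][x] for y, x in cells]))
    return symbols
```

Blocks at the right and bottom edges of the bounding box may be smaller than 8×8; the sketch simply classifies whatever pixels they contain.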
3.3 Two Layer Differential Coding
The problem with 2D differential coding is that once the number of levels has been defined, the compression ratio cannot be specified. One way to solve this problem is to sub-sample the alpha plane and to apply differential coding to each layer. In the first layer, only one in four pixels is kept, and the scheme described in Section 3.2 is applied to this smaller shape. The remaining pixels are then coded in the second layer using the same scheme. The sub-sampled alpha plane alone can be decoded to obtain an approximation of the shape and transparency; in this case the shape coding is lossy, but the error is not larger than one pixel. If the second layer is also decoded, the resulting alpha plane is the same as that obtained with 2D differential coding. If both layers are encoded, this approach is somewhat more expensive, because the correlation between the alpha values in the sub-sampled alpha plane is weaker. However, the overhead is not very high, and the possibility of progressively decoding only an approximation of the alpha plane may be of interest.
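The layer split can be illustrated with a short Python sketch. It assumes that the pixel kept in the first layer is the one at even row and even column of each 2×2 group, and that a decoder holding only the first layer replicates each kept pixel over its 2×2 group; both are hypothetical choices, and in the scheme described above each layer would additionally be coded with the method of Section 3.2.

```python
def split_layers(plane):
    """Layer 1: one pixel in four (even rows and columns, an assumption).
    Layer 2: all other pixels, keyed by position."""
    base = [row[0::2] for row in plane[0::2]]
    rest = {(y, x): v for y, row in enumerate(plane) for x, v in enumerate(row)
            if y % 2 or x % 2}
    return base, rest


def approximate(base, h, w):
    """Decode layer 1 only: each kept pixel fills its 2x2 group."""
    return [[base[y // 2][x // 2] for x in range(w)] for y in range(h)]


def merge(base, rest, h, w):
    """Decode both layers: the full (pre-processed) alpha plane."""
    out = approximate(base, h, w)
    for (y, x), v in rest.items():
        out[y][x] = v
    return out
```

Because every approximated pixel copies a neighbor at most one row and one column away, object borders in the approximation move by at most one pixel, which is consistent with the bound stated in the text.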
4 EXPERIMENTAL RESULTS
We have compared the two proposed schemes for coding alpha planes with the Intra mode of the MPEG-4 Verification Model Version 7 [2].
4.1 MPEG-4 Verification Model
In the VM, several algorithms are used to encode an alpha plane. Some algorithms are only suitable for feathered alpha planes, for example the one in the bottom left of Fig. 1. Only one algorithm is general enough to be applied to all types of alpha planes: the shape information is extracted from the alpha plane and a binary mask is created. This mask is coded using the standard binary shape coding technique, the CAE scheme [3]. The rest of the information (the transparency) is coded using the algorithm described in the texture coding mode. The VM supports the 8×8 padded DCT or the 8×8 Shape-Adaptive DCT for coding the texture information. In the first case, the blocks that straddle the border of the shape are first padded using a specific technique.
4.2 Comparisons
One frame from each of the sequences used to perform the comparisons is shown in Fig. 1. These sequences use different kinds of alpha planes and are of different lengths, given in Table 3.
Sequence                  # frames
Destruction (layer 8)     168
Destruction (layer 6)     28
Weather (lady)            300
Children (logo)           296
Table 3: Sequence lengths.
In our experiments, we have used the padded-DCT-based technique, and the shape has been losslessly coded. The quality is evaluated on the reconstructed frames: the composition algorithm is applied with the decoded alpha plane and the original textures for the background and the foreground, so that only the decoded alpha plane influences the quality. Fig. 5 shows the PSNR comparisons between the method described in Section 3.2 and the VM at similar bitrates. The quantization parameters (QP) used for the MPEG-4 compression of the alpha planes are indicated in the respective plots. The results differ considerably depending on the sequence. For the first example, the "Destruction" sequence (layer 6, an explosion), the MPEG-4 VM performs better than differential coding. In this sequence no large uniform areas appear (except the opaque area), so the adaptive quantization is not very effective; after quantization the alpha plane has no large uniform areas, and differential coding is not well suited. In the other examples, the quantization reduces the range of the levels without significant degradation, and large uniform areas appear, which are well suited for differential coding. For these sequences, the quality is up to 3 dB better than that obtained with the VM. Another important point of comparison is the complexity. The proposed schemes are much simpler to implement than the DCT-based schemes used in MPEG-4 for coding alpha planes; even for comparable quality of reconstructed images, the complexity is much lower.
Figure 5: PSNR comparison between the proposed method and the MPEG-4 VM at the same bitrate. From top to bottom, the sequences are "Destruction" layer 6, "Destruction" layer 8, "Weather" and "Children".
In Fig. 6, the four plots show the bitrates necessary to code the four test sequences using the layered scheme proposed in Section 3.3. MPEG-4 performs better in these tests, but the difference is not very large; it corresponds to the classical overhead cost of scalability. In return, our scheme enables partial decoding of the alpha planes (the shape is then recovered with an error of at most one pixel). In addition, at the same bitrate, this approximation presents a PSNR similar to that obtained using the VM.
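The evaluation protocol of Section 4.2 can be summarized in a few lines of Python. This is a sketch under assumptions not spelled out in the text: a single object layer, composition by Eq. 1 without rounding, and a reference image composed with the original alpha plane, so that only the decoded alpha plane affects the PSNR. The function names are hypothetical.

```python
import math


def compose(background, obj, alpha):
    """Eq. 1 applied pixel-wise to one object layer over a background."""
    return [[((255 - a) * b + a * o) / 255 for b, o, a in zip(rb, ro, ra)]
            for rb, ro, ra in zip(background, obj, alpha)]


def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    errors = [(r - t) ** 2 for rr, tr in zip(ref, test) for r, t in zip(rr, tr)]
    mse = sum(errors) / len(errors)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def alpha_plane_psnr(background, obj, alpha_orig, alpha_decoded):
    """Same textures, two alpha planes: only the decoded alpha plane
    can make the two composed images differ."""
    return psnr(compose(background, obj, alpha_orig),
                compose(background, obj, alpha_decoded))
```

For example, a white object on a black background whose opaque value 255 is decoded as 240 everywhere gives an error of 15 gray levels per pixel, i.e. 10·log10(255²/225) ≈ 24.6 dB.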
5 CONCLUSION
In this paper, we have presented two schemes for compressing alpha planes. Alpha planes have some specific properties which can be exploited to design a pre-processing scheme that reduces the alpha plane complexity and facilitates its coding. This pre-processing scheme has been combined with a 2D differential coding technique. Comparisons with the MPEG-4 VM show that, depending on the type of alpha plane, this algorithm allows a quality improvement of up to 3 dB. Moreover, the complexity of the proposed scheme is much lower than that of the DCT-based schemes used in the VM. Our pre-processing scheme can also be combined with other compression techniques such as wavelet decomposition.
References
[1] Gregory K. Wallace. "The JPEG still picture compression standard". Communications of the ACM, Vol. 34, No. 4, pp. 31-44, April 1991.
[2] Ad hoc group on MPEG-4 video VM editing. MPEG-4 Verification Model Version 7. International Organisation for Standardisation, ISO/IEC JTC1/SC29/WG11, May 1997.
[3] F. Bossen, N. Brady and N. Murphy. "Context-based Arithmetic Encoding of 2D Shape Sequences". To be published in ICIP'97.
Figure 6: Increase of the bitrate between the MPEG-4 VM and the method using 2 layers. From top to bottom, the sequences are \Destruction" layer 6, \Destruction" layer 8, \Weather" and \Children".