Blue Sky Detection for Picture Quality Enhancement

Bahman Zafarifar^{2,3} and Peter H. N. de With^{1,2}

1 Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, The Netherlands, {B.Zafarifar, P.H.N.de.With}@tue.nl
2 LogicaCMG, PO Box 7089, 5600 JB Eindhoven, The Netherlands
3 Philips Innovative Applications (CE), Pathoekeweg 11, 8000 Bruges, Belgium
Abstract. Content analysis of video and still images is attractive for multiple reasons, such as enabling content-based actions and image manipulation. This paper presents a new algorithm and feature model for blue-sky detection. The algorithm classifies the sky areas by computing a pixel-accurate sky probability. Such a probabilistic measure matches well with the requirements of typical video-enhancement functions in TVs. The algorithm enables not only content-adaptive picture-quality improvement, but also more advanced applications, such as content-based annotation of, and retrieval from, image and video databases. When compared to existing algorithms, our proposal shows considerable improvements in the correct detection/rejection rates of sky areas, and an improved consistency of the segmentation results.
1 Introduction
Sky is among the objects of high visual importance, appearing often in video sequences and photographs. A sky-detection system can be used for various applications. At the semantic level, sky detection can contribute to image understanding, e.g. for indoor/outdoor classification or automatic detection of image orientation. At this level, applications of sky detection include content-based actions such as image and video selection and retrieval from databases, or object-based video coding. At the pixel level, sky detection can be used for content-based image manipulation, like picture-quality improvement using color enhancement and noise reduction, or as background detection for 3D depth-map generation. Content-adaptive processing in general, and sky detection in particular, can be used in high-end televisions. Modern TVs employ a variety of signal-processing algorithms for improving the quality of the received video signal. The settings of these processing blocks are often globally constant, or adapted to some local pictorial features, like color or the existence of edges in the direct neighborhood. Such features are often too simple to deal with the diverse content of video sequences, leading to a sub-optimal picture quality as compared to a system that locally adapts the processing to the content of the image. This local adaptation can be realized if the image is analyzed by a number of object
detectors, after which areas of similar appearance are segmented and processed with algorithms optimized to the features of each area [1]. Due to its smooth appearance, noise and other artifacts are clearly visible in sky regions. This motivates using appropriate image-enhancement techniques specifically in the sky regions. The existence of special circuits in high-end TVs for improving colors in the range of sky-blue also illustrates the subjective importance of sky. Our objective is to develop a sky-detection algorithm suitable for image enhancement of video sequences. This implies that the detection must be pixel-accurate and consistent, and allow for real-time embedded implementation. Previous work on sky detection includes a system [2][3] based on calculating an initial “sky belief map” using color values (in this paper, “color” denotes all color components; when a distinction between chromaticity and gray values is required, we use the terms “luminance” and “chrominance”) and a neural network, followed by connected-area extraction. These areas may be accepted or rejected using texture and vertical color analysis, and the degree of fit to a two-dimensional (2D) spatial model. While this method yields useful results in annotating sky regions, we found it unsuitable for the requirements of video applications concerning spatial consistency. The algorithm takes crisp classification decisions per connected area, leading to abrupt changes in the classification result. As an example, patches of sky may be rejected when their size reduces during a camera zoom-out. A second system, proposed in [4][5], is based on the assumption that sky regions are smooth and are normally found at the top of the image. Using predefined settings, an initial sky probability is calculated based on color, texture and vertical position, after which the settings are adapted to regions with higher initial sky probability. These adapted settings are used for calculating a final sky probability. The employed pixel-oriented technique (as opposed to the connected-area approach of the first system) makes this system suitable for video applications. However, due to its simple color modeling, this method often leads to false detections, such as accepting non-sky blue objects as sky, and false rejections, like a partial rejection of sky regions when they cover a large range in the color space. We propose an algorithm that builds upon the above-mentioned second system and exploits its suitability for video applications, while considerably improving the false detection/rejection rates. The proposed sky detector is confined to blue-sky regions, which include both clear blue sky and blue sky containing clouds. Experimental simulations of our new proposal indicate a substantial improvement in the correct detection of sky regions covering a large color range, and in the correct rejection of non-sky objects, when compared to the algorithm proposed in [4][5], as well as an improved spatial consistency with respect to the system described in [2][3].
The remainder of the paper is organized as follows. Section 2 characterizes the sky features, Section 3 describes the proposed algorithm, Section 4 presents the results and Section 5 concludes the paper.
Fig. 1. Various appearances of sky, from left to right: dark, light, large color range, occluded.
2 Observation of Sky Properties
In this section, we discuss the features of sky images and address the challenges for modeling the sky. Sky can have a variety of appearances, such as clear sky, cloudy sky, and overcast sky (see Fig. 1). Sky color can cover a large part of the color space, from saturated blue to gray, or even orange and red during sunset. Consequently, a system based on temporally-fixed color settings is likely to fail in correctly detecting different sky appearances. In addition, sky regions can vary significantly in color within an image: a wide-shot clear-sky image tends to be more saturated at the top and becomes less saturated near the horizon, while the luminance tends to increase from the top of the image towards the horizon. As a result, a sky detector using a spatially-fixed color is likely to reject parts of the sky region when the sky color changes considerably within one image. An additional challenge is the partial occlusion of sky by foreground objects, cutting the sky into many disconnected parts. In order to prevent artifacts in the post-processed video, it is important that all sky areas are assigned coherent probabilities. Another non-trivial task is distinguishing between sky and objects that look similar to sky but are actually not part of it. Examples are areas of water, reflections of sky, or other objects with similar color and texture as sky. In the following section, we propose a system that addresses the aforementioned issues.
3 Algorithm Description
3.1 Sky Detector Overview
We propose a sky-detection system based on the observation that blue-sky regions are more likely to be found at the top of the image, cover a certain part of the color space, have a smooth texture, and show limited horizontal and vertical gradients in their pixel values. The algorithm consists of three stages, as depicted in Fig. 2.
Fig. 2. Block diagram of the Sky detector, divided in three stages.
Stage 1: Initial sky probability. In this stage, an initial sky-probability map is calculated based on the color, vertical position and texture of the image pixels. The texture analysis also includes horizontal and vertical gradient measures. The settings of this stage are fixed, and are chosen such that all targeted sky appearances can be captured.

Stage 2: Analysis and sky-model creation. In this stage, the fixed settings of the first stage are adapted to the image under process. As such, the settings for the vertical-position probability and the expected sky color are adapted to the areas with high sky probability. For the expected color, a spatially-varying 2D model is created that prescribes the sky color for each image position.

Stage 3: Final sky probability. In this stage, a pixel-accurate sky probability is calculated based on the color, vertical position and (optionally) texture of the image, using the adaptive model created in Stage 2.

With respect to implementation, we have adopted the YUV color space, because the sky chrominance components tend to traverse linearly in the UV plane, from saturated blue through gray to red, along the vertical direction of the image. In order to reduce the amount of computation, the image is down-scaled to QCIF resolution for use in Stages 1 and 2. Stage 3, however, uses the image at the original resolution in order to produce pixel-accurate results. Sections 3.2, 3.3 and 3.4 describe the three stages of the algorithm in more detail; the sketch below illustrates the overall stage structure.
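For orientation, the following minimal Python/NumPy sketch shows the stage structure and the resolution handling. The function names, the index-sampling scaler and the callable-based wiring are our own illustrative scaffolding under the assumptions stated in the comments, not the actual implementation.

```python
import numpy as np

QCIF_ROWS, QCIF_COLS = 144, 176  # QCIF resolution used for Stages 1 and 2

def downscale(img, rows=QCIF_ROWS, cols=QCIF_COLS):
    # Simple index-sampling downscale, standing in for a proper scaler.
    r = np.linspace(0, img.shape[0] - 1, rows).astype(int)
    c = np.linspace(0, img.shape[1] - 1, cols).astype(int)
    return img[np.ix_(r, c)]

def detect_sky(yuv, stage1, stage2, stage3):
    """yuv: float array of shape (rows, cols, 3) holding the Y, U, V planes.
    stage1..stage3 are callables implementing Sections 3.2-3.4."""
    small = downscale(yuv)             # Stages 1 and 2 run on the QCIF copy
    p_initial = stage1(small)          # fixed-setting probability map
    model = stage2(small, p_initial)   # per-image adaptive model
    return stage3(yuv, model)          # full-resolution final map
```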
3.2 Initial Sky Probability
Using predefined settings, an initial sky probability ($P_{skyInitial}$) is calculated on a down-scaled version of the image. We combine color, vertical position and texture to compute the initial sky probability as $P_{skyInitial} = P_{color} \times P_{position} \times P_{texture}$.

1. The color probability is calculated using a three-dimensional Gaussian function of the Y, U and V components, centered at predetermined positions $Y_0$, $U_0$ and $V_0$ (representing the expected sky color), with corresponding standard deviations $\sigma_{y1}$, $\sigma_{u1}$ and $\sigma_{v1}$. The settings are chosen such that all desired sky appearances are captured. The color probability is defined as

\[ P_{color} = e^{-\left[\left(\frac{Y-Y_0}{\sigma_{y1}}\right)^2 + \left(\frac{U-U_0}{\sigma_{u1}}\right)^2 + \left(\frac{V-V_0}{\sigma_{v1}}\right)^2\right]}. \]

2. The vertical-position probability is defined by a Gaussian function centered at the top of the image, starting at unity and decreasing to $e^{-1} \approx 0.37$ at the bottom of the image:

\[ P_{position} = e^{-\left(\frac{r}{height}\right)^2}, \]

where $r$ is the vertical coordinate of the current pixel ($r = 0$ at the top of the image) and $height$ denotes the total number of rows (i.e. TV lines) of the image.

3. The calculation of the texture probability is based on a multi-resolution analysis of the luminance channel of the image. The analysis assigns low probabilities to parts of the image containing high luminance variation, or excessive horizontal or vertical gradients. This probability can be used to eliminate the textured areas from the initial sky probability. More specifically, three down-scaled (by factors of 2) versions of the luminance channel are analyzed using a fixed window size (5×5 pixels), and the results are combined at the lowest resolution using the minimum operator. The texture analysis uses the following two measures.

SAD: The local smoothness of the image can be measured by the luminance variation. Using the Sum of Absolute Differences (SAD) between horizontally-adjacent and vertically-adjacent pixels in the analysis window, we calculate the luminance variation around the current pixel. The horizontal and vertical SAD ($SAD_{hor}$ and $SAD_{ver}$) lead to a probabilistic measure $P_{SAD}$ as follows:

\[ SAD_{hor}(r,c) = \frac{1}{N_{SAD}} \sum_{i=-w}^{w} \sum_{j=-w}^{w-1} \left| Y(r+i, c+j) - Y(r+i, c+j+1) \right|, \]

\[ SAD_{ver}(r,c) = \frac{1}{N_{SAD}} \sum_{i=-w}^{w-1} \sum_{j=-w}^{w} \left| Y(r+i, c+j) - Y(r+i+1, c+j) \right|, \]

\[ P_{SAD} = e^{-\left(\left[SAD_{hor} + SAD_{ver} - T_{SAD}\right]_0^{\infty}\right)^2}. \]

Here, $r$ and $c$ are the coordinates of the pixel in the image, $w$ defines the size of the analysis window (window size $= 2w+1$), and $i$ and $j$ are indices within the window. The factor $1/N_{SAD}$ normalizes the SAD to the total number of pixel differences within the window ($N_{SAD} = (2w+1) \cdot 2w$), and $T_{SAD}$ is a noise-dependent threshold level. The symbol $[\,\cdot\,]_a^b$ denotes a clipping function defined as

\[ [f]_a^b = \min(\max(f, a), b). \]
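As an illustration, the following NumPy sketch evaluates the color, position and SAD measures. The parameter values (expected color, standard deviations, window size and $T_{SAD}$) are placeholders of our own choosing, not the paper's trained settings.

```python
import numpy as np

def p_color(Y, U, V, y0=160.0, u0=150.0, v0=110.0, sy=40.0, su=20.0, sv=20.0):
    # 3D Gaussian around a fixed expected sky color (placeholder parameters).
    d = ((Y - y0) / sy) ** 2 + ((U - u0) / su) ** 2 + ((V - v0) / sv) ** 2
    return np.exp(-d)

def p_position(rows, cols):
    # Unity at the top row, decaying to about e^-1 at the bottom row.
    r = np.arange(rows, dtype=float).reshape(-1, 1)
    return np.broadcast_to(np.exp(-(r / rows) ** 2), (rows, cols)).copy()

def p_sad(Y, w=2, t_sad=4.0):
    # Thresholded, clipped SAD of adjacent pixels in a (2w+1)x(2w+1) window.
    n_sad = (2 * w + 1) * 2 * w
    dh = np.abs(np.diff(Y, axis=1))   # horizontal neighbour differences
    dv = np.abs(np.diff(Y, axis=0))   # vertical neighbour differences
    rows, cols = Y.shape
    out = np.zeros((rows, cols))
    for r in range(w, rows - w):
        for c in range(w, cols - w):
            sad_h = dh[r - w:r + w + 1, c - w:c + w].sum() / n_sad
            sad_v = dv[r - w:r + w, c - w:c + w + 1].sum() / n_sad
            out[r, c] = np.exp(-np.clip(sad_h + sad_v - t_sad, 0.0, None) ** 2)
    return out

# P_skyInitial = p_color(Y, U, V) * p_position(*Y.shape) * p_texture,
# with p_texture = p_sad * p_grad (see the gradient measure below).
```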
Gradient: We observe that luminance values of the sky regions have limited horizontal and vertical gradients, and that the luminance often increases in the top-down direction. We define the vertical gradient ($grad_{ver}$) as the difference between the sum of pixel values in the upper half of the analysis window and the sum of pixel values in the lower half of the analysis window. The horizontal gradient ($grad_{hor}$) is defined similarly, using the pixels in the left half and the right half of the analysis window. For pixel coordinate $(r, c)$ this leads to

\[ grad_{hor}(r,c) = \frac{1}{N_{grad}} \left( \sum_{i=-w}^{w} \sum_{j=-w}^{-1} Y(r+i, c+j) - \sum_{i=-w}^{w} \sum_{j=1}^{w} Y(r+i, c+j) \right), \]

\[ grad_{ver}(r,c) = \frac{1}{N_{grad}} \left( \sum_{i=-w}^{-1} \sum_{j=-w}^{w} Y(r+i, c+j) - \sum_{i=1}^{w} \sum_{j=-w}^{w} Y(r+i, c+j) \right), \]

where the factor $1/N_{grad}$ normalizes the gradient to the size of the window ($N_{grad} = w \cdot (2w+1)$). Using appropriate threshold levels, the horizontal and vertical gradients are translated into a probability $P_{grad}$, calculated as

\[ P_{grad} = e^{-\left( [T_{vl} - grad_{ver}]_0^{\infty} + [grad_{ver} - T_{vu}]_0^{\infty} + [\,|grad_{hor}| - T_h\,]_0^{\infty} \right)^2}, \]

where $T_{vl}$ and $T_{vu}$ are the threshold levels for the lower and upper bounds of the vertical gradient, respectively, and $T_h$ is the threshold level for the horizontal gradient. These thresholds are fixed values, determined using a set of training images. Using separate thresholds for the upper and lower bounds in the vertical direction allows an increase, and penalizes a decrease, of the luminance in the downward image direction. Finally, the texture probability $P_{texture}$ combines $P_{SAD}$ and $P_{grad}$ as

\[ P_{texture} = P_{SAD} \times P_{grad}. \]
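A corresponding sketch of the gradient measure is given below. The threshold values are again illustrative placeholders (the paper determines them from training images), and the multi-resolution combination (minimum over three scales) is omitted for brevity.

```python
import numpy as np

def p_grad(Y, w=2, t_vl=-8.0, t_vu=2.0, t_h=4.0):
    # Half-window luminance gradients; an increase of luminance in the
    # downward direction (negative grad_ver) is tolerated down to t_vl,
    # while a decrease (positive grad_ver) is penalized beyond t_vu.
    n_grad = w * (2 * w + 1)
    clip0 = lambda f: max(f, 0.0)     # the clipping operator [f]_0^inf
    rows, cols = Y.shape
    out = np.ones((rows, cols))
    for r in range(w, rows - w):
        for c in range(w, cols - w):
            win = Y[r - w:r + w + 1, c - w:c + w + 1]
            g_hor = (win[:, :w].sum() - win[:, w + 1:].sum()) / n_grad
            g_ver = (win[:w, :].sum() - win[w + 1:, :].sum()) / n_grad
            pen = clip0(t_vl - g_ver) + clip0(g_ver - t_vu) + clip0(abs(g_hor) - t_h)
            out[r, c] = np.exp(-pen ** 2)
    return out

# Combined texture probability:
# p_texture = p_sad(Y) * p_grad(Y)
```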
3.3 Analysis and Sky-Model Creation
In this stage, the initial sky probability (calculated in Stage 1) is analyzed in order to create adaptive models for the color and vertical position used in the final sky-probability calculation. This involves the following steps.

1. Adaptive threshold level and global sky confidence metric: the initial sky probability needs to be segmented in order to create a map of regions with high probability. Simple measures for threshold determination, such as using the maximum of the sky-probability map as proposed in [5], can perform inadequately, for example by favoring small objects with high sky probability over larger non-perfect sky regions. In order to avoid this problem, we propose a more robust method that takes both the size and the probability of sky regions into account, by computing an adaptive threshold and a global sky confidence metric $Q_{sky}$. The confidence metric yields a high value if the image contains a significant number of pixels with high initial sky probability. This prevents small sky-blue objects from being accepted as sky in images where no large areas with high sky probability are present. The calculation steps are as follows: first, the Cumulative Distribution Function (CDF) of the initial sky probability is computed, after which it is weighted using a function that emphasizes the higher sky-probability values and decreases to zero towards the lower sky-probability values. Due to this weighting, the position of the maximum of the resulting function (the weighted CDF) encodes our preference for higher probability values, while remaining dependent on the distribution of the initial sky-probability values. Therefore, this position can be used to determine the desired adaptive threshold. The maximum amplitude of the weighted CDF depends on the number of pixels with relatively high sky probability, and thus can be used for determining the aforementioned confidence metric $Q_{sky}$ (see the first sketch at the end of this section).

2. Adaptive vertical position: the areas with high sky probability are segmented by thresholding the initial sky-probability map with the threshold level described in the previous paragraph, after which the mean vertical position of the segmented areas is computed. This adaptive vertical position is used to define a function which equals unity at the top of the image and linearly decreases towards the bottom of the segmented sky region. This function is then used for computing the final sky probability.

3. Adaptive expected sky color: as mentioned in Section 2, the sky detector needs to deal with the wide range of sky-color values within and between different frames. In [5], it is proposed to use frame-adaptive, but otherwise spatially-constant, expected colors. This method addresses the problem of large color variation between frames, but fails when the sky covers a considerable color range within one frame, resulting in a partial rejection of the sky areas. To address this problem, we propose to use a spatially-adaptive expected sky color. To this end, each signal component (Y, U and V) is modeled by a spatially-varying 2D function that is fitted to a selected set of pixels with high sky probability. An example of a model-fitting technique is as follows. Using a proper adaptive threshold, the initial sky probability is segmented to select sky regions with high sky probability. Next, the segmented pixels are sampled with a decreasing density in the top-down direction. This exploits our assumption that the pixels at the top are more important for model fitting than those near the bottom, and ensures that the model parameters are less influenced by reflections of sky or other non-sky blue objects below the actual sky region. The last step is to use the values of the (Y, U, V) signal components of these selected pixels to fit the 2D function of the corresponding signal component (see the second sketch at the end of this section). The choice of the color model and the fitting strategy of the 2D functions depend on the required accuracy and the permitted computational complexity. We implemented (1) a 2D second-degree polynomial, in combination with a least-squares optimization for estimating the model parameters, and (2) a model which uses a matrix of 23×18 values per color component for representing the image color [6]. The second-degree polynomial model offers sufficient spatial flexibility to represent typical sky colors, but is computationally expensive (the results presented in this paper use this model). The second model also offers the necessary flexibility, and is in addition more suitable for hardware implementation.
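To make step 1 concrete, the sketch below implements one plausible reading of the weighted-CDF scheme: the fraction of pixels at or above each probability level is weighted by a ramp that is zero at low probabilities; the argmax gives the threshold, and the (scaled) peak value gives $Q_{sky}$. The ramp weight and the scaling constant are our assumptions; the paper does not specify them.

```python
import numpy as np

def adaptive_threshold_and_confidence(p_initial, n_levels=64):
    # Fraction of pixels whose initial probability is at or above each level.
    levels = np.linspace(0.0, 1.0, n_levels)
    frac_above = np.array([(p_initial >= t).mean() for t in levels])
    # Ramp weight emphasizing high-probability levels (assumed form).
    weighted = frac_above * levels
    k = int(np.argmax(weighted))
    threshold = float(levels[k])
    # The peak amplitude grows with the number of high-probability pixels;
    # the 0.25 normalization is an ad-hoc choice for this sketch.
    q_sky = float(np.clip(weighted[k] / 0.25, 0.0, 1.0))
    return threshold, q_sky
```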
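Step 3's polynomial variant can be sketched as a plain least-squares fit per signal component. The top-weighted pixel sampling is omitted here, and a boolean mask of high-probability pixels is assumed to be given.

```python
import numpy as np

def fit_sky_color_model(channel, mask):
    """Fit a 2D second-degree polynomial
    c(r, x) = a0 + a1*r + a2*x + a3*r^2 + a4*r*x + a5*x^2
    to one color component over the pixels selected by `mask`."""
    r, x = np.nonzero(mask)
    A = np.stack([np.ones_like(r), r, x, r * r, r * x, x * x], axis=1).astype(float)
    coeffs, *_ = np.linalg.lstsq(A, channel[mask].astype(float), rcond=None)
    return coeffs

def evaluate_sky_color_model(coeffs, shape):
    # Expected color value for every pixel position: this yields the
    # spatially-varying Y0/U0/V0 used in Stage 3.
    r, x = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0 + a1 * r + a2 * x + a3 * r * r + a4 * r * x + a5 * x * x
```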
3.4 Final Sky Probability
Using the adaptive model created in Stage 2, we compute a pixel-accurate final sky-probability map as $P_{skyFinal} = P_{color2} \times P_{position2} \times P_{texture2} \times Q_{sky}$, where $Q_{sky}$ denotes the sky confidence metric. The required pixel accuracy is achieved by using the original image resolution, and by applying a moderate texture measure to prevent distortion in the final sky-probability map near the edges of non-sky objects. The following paragraphs further describe the features applied in this stage.

1. The color probability is calculated using a 3D Gaussian function of the Y, U and V components, centered at the spatially-varying values $Y_{0,(r,c)}$, $U_{0,(r,c)}$ and $V_{0,(r,c)}$ (representing the expected sky color at spatial position $(r, c)$), with corresponding standard deviations $\sigma_{y2}$, $\sigma_{u2}$ and $\sigma_{v2}$. In order to reduce false detections, these standard deviations are reduced with respect to the values of Stage 1.

2. As opposed to the fixed vertical-position function used for the initial sky probability, the final stage uses an adaptive vertical-probability function, which is tuned to cover the areas with high sky probability, as calculated in Stage 2.

3. The inclusion and the type of texture measure depend on the application for which the sky-detection output is used. For some applications, using a texture measure in the final sky-probability calculation could lead to undesirable effects in the post-processed image, while other applications may require some form of texture measure. For example, for noise removal in the sky regions, we found it necessary to reduce the sky probability of pixels around the edges of objects adjacent to sky, in order to retain the edge sharpness. This was done by taking the Sobel edge detector as the texture measure. A sketch of this final combination follows.
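Pulling the pieces together, a minimal sketch of the final combination is shown below. The tightened standard deviations are placeholder values, and the optional edge term is indicated as a comment rather than a committed choice.

```python
import numpy as np

def final_sky_probability(Y, U, V, y0_map, u0_map, v0_map, p_position2, q_sky,
                          sy=20.0, su=10.0, sv=10.0):
    # Gaussian color match against the spatially-varying expected color
    # (y0_map/u0_map/v0_map evaluated from the fitted model), with tighter
    # sigmas than in Stage 1 to reduce false detections.
    d = ((Y - y0_map) / sy) ** 2 + ((U - u0_map) / su) ** 2 + ((V - v0_map) / sv) ** 2
    p = np.exp(-d) * p_position2 * q_sky
    # Optionally: p *= p_texture2, e.g. an edge measure derived from a
    # Sobel operator, to keep the edges of adjacent objects sharp.
    return p
```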
Fig. 3. Examples of improved correct detection, left: input, middle: the algorithm of [4][5], right: our algorithm.
Fig. 4. Examples of improved correct rejection, left: input, middle: the algorithm of [4][5], right: our algorithm.
Fig. 5. Examples of improved spatial accuracy, left: input, middle: the algorithm of [2][3] (courtesy of Eastman Kodak), right: our algorithm.
4 Experimental Results
We applied the proposed algorithm to more than 200 sky images. The images were selected to present a large variety of sky appearances, many including sky reflections and other challenging situations. Figure 3 compares our results to [4][5]. In Fig. 3-top, the halo (top-middle) results from the spatially-constant color model used in [4][5], while the spatially-adaptive color model employed in our algorithm is capable of dealing with the large color range of the sky area (top-right). A similar difference in the results can be seen in Fig. 3-bottom, where, in addition, the reflection of the sky is removed by the gradient analysis. Figure 4 shows the improved correct rejection of non-sky objects (areas of water in Fig. 4-top and mountains in Fig. 4-bottom), which has been achieved by the multi-resolution texture analysis. Lastly, Fig. 5 shows the greatly improved spatial accuracy of our results in comparison to [2][3]. This is due to the two-pass approach for calculating the sky probability, in which the second pass uses the original image resolution and a moderate texture measure. When compared to [4][5], our experiments indicate a substantial improvement in the correct detection of sky regions covering a large color range, due to the spatially-varying color model, and an improved correct rejection of non-sky objects, due to the multi-resolution texture analysis. When compared to [2][3], we observed an improved spatial consistency of the segmentation results. Here, a notable improvement in the correct detection was observed in 16 out of the 23 images for which a side-by-side visual comparison was made. In the remaining cases, our proposal performed comparably to the existing system. In many of these cases we still prefer our proposal, as it is based on a smooth probability measure, whereas the existing system produces crisp results, which is more critical in the case of false detections for video applications. An experiment with a video sequence indicated that the spatial consistency also improves the temporal behavior of the system. More algorithmic tuning and experiments will have to be conducted to validate this conjecture. A simplified version of the proposed algorithm is currently being implemented as a real-time embedded system, using FPGA technology. Preliminary mapping results indicate that a real-time implementation is feasible on a standard FPGA device.
5 Conclusions
Sky detection for video sequences and still images can be used for various purposes, such as automatic image manipulation (e.g. picture-quality improvement) and content-based actions (e.g. interactive selection and retrieval from multimedia databases). The main problems with the existing algorithms are incomplete detection of sky areas with large color ranges, false detection of sky reflections or other blue objects, and inconsistent detection of small sky areas. This paper has presented a sky-detection algorithm which significantly reduces the mentioned problems, and has suitable properties for video applications. This was
achieved by constructing a sky model that incorporates a 2D spatially-varying color model, while reusing the vertical-position probability from an existing method. Moreover, we have introduced a confidence metric for improving the consistency and for the removal of small blue objects. False detection of sky reflections and other non-sky objects has been reduced by employing a gradient analysis of the luminance component of the sky. Experimental results show that the proposed algorithm is capable of handling a broad range of sky appearances. The two primary advantages of the proposed algorithm are increased correct detection/rejection rates, and an improved spatial accuracy and consistency of the detection results. Our future work includes developing additional measures for meeting the requirements of real-time video applications. In particular, the key parameters of the system, such as the vertical-position model, the color model and the confidence metric, need to be kept consistent over time. Furthermore, the algorithm will be optimized for implementation in consumer television systems.
6 Acknowledgement
The authors gratefully acknowledge Dr. Erwin Bellers for his specific input on existing algorithms. We also thank Dr. Jiebo Luo for providing us with the results of the sky-detection algorithm described in [2][3] on a number of sample images.
References
1. S. Herman and J. Janssen, “System and method for performing segmentation-based enhancements of a video image,” European Patent EP 1 374 563, date of publication: January 2004.
2. A.C. Gallagher, J. Luo, and W. Hao, “Improved blue sky detection using polynomial model fit,” in IEEE International Conference on Image Processing, October 2004, pp. 2367–2370.
3. J. Luo and S. Etz, “Method for detecting sky in images,” European Patent EP 1 107 179, date of publication: February 2001.
4. S. Herman and E. Bellers, “Locally-adaptive processing of television images based on real-time image segmentation,” in IEEE International Conference on Consumer Electronics, June 2002, pp. 66–67.
5. S. Herman and E. Bellers, “Adaptive segmentation of television images,” European Patent EP 1 573 673, date of publication: September 2005.
6. B. Zafarifar and P.H.N. de With, “Adaptive modeling of sky for video processing and coding applications,” in 27th Symposium on Information Theory in the Benelux, June 2006, pp. 31–38.