A two-level classification-based color constancy

SIViP DOI 10.1007/s11760-013-0574-7

ORIGINAL PAPER

A two-level classification-based color constancy

Mohammad Mehdi Faghih · Mohsen Ebrahimi Moghaddam

Received: 12 March 2013 / Revised: 12 October 2013 / Accepted: 13 October 2013 © Springer-Verlag London 2013

Abstract This paper addresses color constancy, the problem of finding the true color of objects independently of the light illuminating the scene. Although many algorithms exist in this scope, they are all based on specific assumptions and none of them is universal. Therefore, in order to achieve better performance, some new color constancy methods have been proposed, most of which are combinational algorithms. In this paper, a new combinational method is proposed; it consists of two steps: first, a classifier is used to determine the best group of color constancy algorithms for the input image; then, some of the algorithms in this group are combined to estimate the scene illuminant. In this way, the method always combines algorithms that perform well on the input image, and as a result, the overall performance increases. The proposed method has been evaluated using multiple benchmark datasets, and the experimental results show that the proposed approach outperforms current state-of-the-art algorithms.

Keywords Color constancy · Illuminant estimation · Scene classification · Support vector machine

1 Introduction

The color of an object in the real world depends on the physical characteristics of the object as well as the color of the light source illuminating the scene containing the object. Thus, the object color may appear different if the color of the illuminant changes.

M. M. Faghih · M. E. Moghaddam (B)
Electrical and Computer Engineering Department, Shahid Beheshti University, 1983963113 G. C., Tehran, Iran
e-mail: [email protected]
M. M. Faghih
e-mail: [email protected]

The human visual system can, to some extent, remove the effect of the scene illuminant on the color of objects, and therefore a human can recognize the color of an object as the same even if the scene illuminant changes considerably. This ability is called color constancy, and it is very important in color-based computer vision tasks such as color object recognition [1]. There are many color constancy methods in the literature; see [2] for a complete review. In general, it is possible to categorize color constancy algorithms into three main groups. The first group contains algorithms that use information gained from a learning phase to estimate the illuminant, while the methods of the second group are based on low-level image features. Algorithms that try to reach better results by combining other algorithms or by selecting the best algorithm for a given image are placed in the third group. Our previous works [3–5] and other neural network-based approaches [6–8] are instances of the first group. Gamut-based methods [9,10], genetic-based algorithms [11], and probabilistic methods (Color by Correlation) [12] are also subsets of the first group. Examples of the second group are the Gray-World [13,14], White-Patch [15], Shades of Gray [16], and Gray-Edge [17] algorithms. The NIS [18] and CAS [19] algorithms are two recent algorithms in the third group. More details about these methods are given in Sect. 3. The problem with the second group of color constancy methods is that all of them are based on specific assumptions. This limits their use to conditions that satisfy those assumptions. For example, the Gray-World algorithm assumes that, on average, the world is gray. This assumption is correct only when there is a sufficiently large number of different colors in the image. If this is not the case, the Gray-World algorithm does not work correctly [20]. For example, Gray-World is not suitable for images with large regions of color that


are far from gray. Algorithms in the third group solve this problem to some extent. But when combining multiple algorithms for a given image, the results are not as good as expected if a good algorithm is combined with a bad one. So, it is important to combine only the algorithms that perform well on the input image. The state-of-the-art algorithms in the third group (e.g., CAS and NIS) do not take this into account. They only assign a weight to each predefined method and combine all of them. Although the algorithm may assign a near-zero weight to bad methods, a very bad method with a near-zero weight may still decrease the accuracy of the combination output. Also, considering the bad methods in the process of determining combination weights increases the complexity of this process, and as a result, the algorithm may not be able to determine the weights precisely. In this paper, a new color constancy method is proposed, which falls under both the second and the third group of color constancy methods. For a given image, the method involves two steps. In the first step, the best group of color constancy algorithms for that image is determined by a classifier based on image features; then, some of the algorithms in this group are combined using a neural network to estimate the scene illuminant. In this way, the proposed method always tries to combine algorithms that perform well on the input image, and as a result, the overall performance increases, as shown by the experimental results. The rest of the paper is organized as follows: Sect. 2 describes the color constancy concept and the reflection model. Related works are presented in Sect. 3. Section 4 discusses the proposed approach, and in Sect. 5, the proposed approach is evaluated using two benchmark color constancy datasets. Finally, Sect. 6 concludes the paper.

2 Color constancy

Assuming Lambertian reflectance, an image taken by a digital camera can be seen as a function f that depends on three physical factors: the illuminant spectral power distribution e(λ), the surface spectral reflectance s(λ), and the camera sensitivity function ρ(λ). Using this notation, the sensor response at the pixel with coordinate (x, y) is formed as follows [18]:

f(x, y) = \int_{\omega} e(\lambda)\, s(x, y, \lambda)\, \rho(\lambda)\, d\lambda    (1)

where ω is the visible spectrum. Assuming that the observed color of the light source e depends on the color of the light source e(λ) and the camera sensitivity function ρ(λ), color constancy is equivalent to estimating e using the following equation [18]:


e = (e_R, e_G, e_B)^T = \int_{\omega} e(\lambda)\, \rho(\lambda)\, d\lambda    (2)

After estimating the color of the light source, the image colors can be corrected using this estimate to produce a new image of the scene as if it were taken under a perfect white light, i.e., (1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3}). Under the assumption of the diagonal model [9], the correction is simple: first, the image values are divided by e, and then the white light is multiplied into the image. The problem of color constancy is an under-constrained problem because, in general, both e(λ) and ρ(λ) are unknown. Therefore, it is not possible to solve this problem without further assumptions. Hence, most color constancy algorithms assume that a hypothesis is met while trying to estimate e.
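As an illustration of the diagonal-model correction described above, the following Python sketch divides each channel by the estimated illuminant and rescales toward a perfect white light. The function name and the use of NumPy are our own choices, not code from the paper.

import numpy as np

def correct_image(img, e_est):
    """Apply a diagonal (von Kries) correction with an estimated illuminant.

    img   : H x W x 3 float array in linear RGB.
    e_est : length-3 estimate of the illuminant color (any positive scale).
    Returns the image as it would appear under a perfect white light
    (1/sqrt(3), 1/sqrt(3), 1/sqrt(3)).
    """
    e = np.asarray(e_est, dtype=float)
    e = e / np.linalg.norm(e)          # normalize the illuminant estimate
    white = np.ones(3) / np.sqrt(3)    # perfect white light
    # Divide by the estimated illuminant, then multiply by the white light.
    return img / e * white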

3 Related works

Gray-World [13], which is a popular color constancy algorithm, is based on the assumption that the average reflectance in a scene is achromatic. This assumption corresponds to the following equation:

\frac{\int s(\lambda, x)\, dx}{\int dx} = k    (3)

Having achromatic reflectance for a scene makes it possible to compute the color of the light source as follows:

\frac{\int f(x)\, dx}{\int dx} = \frac{1}{\int dx} \int \int_{\omega} e(\lambda)\, s(\lambda, x)\, \rho(\lambda)\, d\lambda\, dx    (4)

= \int_{\omega} e(\lambda) \left( \frac{\int s(\lambda, x)\, dx}{\int dx} \right) \rho(\lambda)\, d\lambda    (5)

= k \int_{\omega} e(\lambda)\, \rho(\lambda)\, d\lambda = k \cdot e    (6)

Here, k is a constant chosen such that the illuminant color e has unit length. Another common color constancy algorithm is White-Patch [15], which assumes that the surface in the scene with the highest luminance (the white patch) reflects maximally and uniformly over the spectrum. Using this assumption, the color of the light source can be approximated by the color of the brightest patch in the image [20]. In [16], it is shown that the Gray-World and White-Patch algorithms are two different instantiations of a more general color constancy algorithm based on the Minkowski norm. This algorithm is called Shades of Gray and can be computed using the following formula:

\left( \frac{\int (f(x))^p\, dx}{\int dx} \right)^{1/p} = k \cdot e    (7)


where p is the Minkowski norm; by setting p = 1, Eq. (7) is equal to the Gray-World assumption, and for p = ∞, it is equivalent to White-Patch. Van de Weijer et al. [17] proposed the Gray-Edge algorithm, which uses higher-order statistics of the image in the form of image derivatives. It is based on the Gray-Edge assumption, which states that the average of reflectance differences in a scene is achromatic. The formal description of this assumption is as follows [17]:

\frac{\int \left| s_x^{\sigma}(\lambda, x) \right| dx}{\int dx} = k    (8)

It is clear from this assumption that Gray-Edge works well on images with high detail. Here s_x^{\sigma} indicates the spatial derivative, defined as the convolution of the image with the derivative of a Gaussian filter with scale parameter σ [21]:

\frac{\partial^{s+t} f^{c,\sigma}}{\partial x^s \partial y^t} = f^c * \frac{\partial^{s+t} G^{\sigma}}{\partial x^s \partial y^t}    (9)

where * denotes convolution, and s + t is equal to the order of the derivative. Using Eq. (8), the scene light source color can be computed [using an approach similar to Eqs. (4), (5), and (6)] as follows:

\frac{\int \left| f_x^{\sigma}(x) \right| dx}{\int dx} = k \cdot e    (10)

where |f_x(x)| = (|R_x(x)|, |G_x(x)|, |B_x(x)|)^T. In reference [17], a more general framework has been proposed based on the Minkowski norm that represents well-known methods such as Eq. (7) as well as methods based on higher-order statistics using the following formula:

\left( \int \left| \frac{\partial^n f^{\sigma}(x)}{\partial x^n} \right|^p dx \right)^{1/p} = k \cdot e^{n,p,\sigma}    (11)

In Eq. (11), n is the order of the derivative, σ is the scale parameter of the Gaussian filter, the division by \int dx is incorporated into the constant k, and it is assumed that the pth Minkowski norm of the nth-order derivative of the reflectance in a scene is achromatic:

\left( \int \left| \frac{\partial^n s^{\sigma}(x)}{\partial x^n} \right|^p dx \right)^{1/p} = k    (12)

Equation (11) is a framework that represents the Gray family of color constancy algorithms, and by changing its parameters, different color constancy algorithms are generated, such as:

(1) e^{0,1,0} is equal to the Gray-World algorithm.
(2) e^{0,\infty,0} is equivalent to the White-Patch algorithm.
(3) e^{0,p,0} is the Shades of Gray algorithm.
(4) e^{0,p,\sigma} is the family of zero-order color constancy algorithms.
(5) e^{1,p,\sigma} is the family of first-order color constancy algorithms.
(6) e^{2,p,\sigma} is the family of second-order color constancy algorithms.
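For concreteness, the following Python sketch evaluates the general framework of Eq. (11) for a given order n, Minkowski norm p, and smoothing scale σ. It is a minimal sketch using SciPy's Gaussian-derivative filter; the function name and the handling of σ = 0 in the derivative branch are our own simplifications, not code from the paper.

import numpy as np
from scipy.ndimage import gaussian_filter

def gray_framework(img, n=0, p=6, sigma=0):
    """Estimate the illuminant e^{n,p,sigma} of Eq. (11).

    img : H x W x 3 float array in linear RGB.
    n   : derivative order (0, 1, or 2).
    p   : Minkowski norm.
    Returns a unit-length illuminant estimate (e_R, e_G, e_B).
    """
    e = np.zeros(3)
    for c in range(3):
        chan = img[..., c].astype(float)
        if n == 0:
            resp = np.abs(gaussian_filter(chan, sigma) if sigma > 0 else chan)
        else:
            # n-th order Gaussian-derivative responses in x and y
            dx = gaussian_filter(chan, max(sigma, 1), order=(0, n))
            dy = gaussian_filter(chan, max(sigma, 1), order=(n, 0))
            resp = np.sqrt(dx ** 2 + dy ** 2)
        e[c] = np.power(np.mean(resp ** p), 1.0 / p)   # Minkowski mean
    return e / np.linalg.norm(e)

# Example instantiations of the framework:
#   gray_framework(img, n=0, p=1)            ~ Gray-World
#   gray_framework(img, n=0, p=6)            ~ Shades of Gray
#   gray_framework(img, n=1, p=6, sigma=2)   ~ Gray-Edge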

Edge-based color constancy has recently been modified by computing a weighted average of the edges [22]. Reference [22] showed that different types of edges in an image (material edges, intensity shadow edges, color shadow edges, specular edges, and inter-reflection edges) carry different amounts of information about the color of the light source, and that the performance of edge-based color constancy can be improved by properly weighting the different edge types. It showed that, among the different types of edges, using specular edges in the Gray-Edge algorithm results in near-perfect color constancy, and that intensity shadow edges are more favorable than the three other types. Based on this fact, the reference suggested using different types of edges with different weights to improve edge-based color constancy. The main problem with this approach is that the photometric edge classification method used to determine the edge types assumes that the scene is illuminated by a white light source; therefore, automatic detection of the edge type becomes erroneous when the color of the light source is not white (which is the case prior to applying color constancy). Therefore, reference [22] proposed an iterative weighting scheme that sequentially estimates the color of the light source and updates the computed edge weights. The rationale is that after applying color constancy, the illuminant should be neutral (at least in theory). Therefore, the input image is first corrected with an estimated illuminant, and the weights are then computed using this color-corrected image. An updated estimate of the illuminant is calculated by the weighted Gray-Edge algorithm based on this weighting scheme. This process is applied iteratively: the weighted Gray-Edge is applied in each iteration, and a new instantiation of the weighting scheme is computed based on the corrected image. After some iterations, the estimate approximates a white light source and its accuracy no longer increases, at which point the method converges. Neural network-based methods train a neural network using features extracted from a dataset of images and the real scene illuminants as network inputs and outputs, respectively. The network is then used to estimate the scene illuminant of an unseen image. For example, Barnard et al. [23] proposed a multilayer perceptron (MLP) neural network-based approach for color constancy. They used the color histogram of images as training features, and to reduce the complexity of the network, they trained their network on color histograms in rg-chromaticity space. However, the proposed network architecture is still complex;


it consists of 3,600 input nodes, 400 neurons in the first hidden layer, 40 neurons in the second hidden layer, and 2 output neurons. Another drawback of this approach is that using the rg-chromaticity space discards all intensity information, while intensity information can help in estimating the illuminant [23]. Some other neural network-based approaches to color constancy [6,8] also used the color histogram of images in rg-chromaticity space, and hence they all suffer from the mentioned drawbacks. The NIS [18] method is based on the fact that the distribution of edge responses of an image can be modeled by a Weibull distribution as follows [24]:

w(s_x) = \frac{\gamma}{\beta} \left( \frac{s_x}{\beta} \right)^{\gamma - 1} e^{-\left( s_x / \beta \right)^{\gamma}}    (13)

where s_x is the edge response in a single color channel, β > 0 is the scale parameter of the distribution, and γ > 0 is the shape parameter. The parameters of the Weibull distribution are representative of the scene statistics when it is fitted to the edge responses of an image. The parameter β represents the contrast of the image, and the parameter γ indicates the grain size. A higher value of β indicates more contrast, whereas a higher value of γ indicates a smaller grain size (finer textures). In the NIS paper, it is also shown that the Weibull parameters of images are useful for determining the best algorithm for each image: for a group of images for which one algorithm is the best, the Weibull parameters cluster together. So, in the NIS algorithm, a classifier is learned using the Weibull parameters, and the learned classifier is then used to select the best algorithm for new images. CAS [19] is another method that uses a decision tree to select the best algorithm for a given image. In this approach, a color constancy algorithm is placed in each leaf of the decision tree, and every intermediate node contains a criterion, determined in the training phase based on image features. For a given image, the tree traversal begins from the root; in each node, the feature corresponding to that node is extracted from the image, and the next node is selected by comparing the extracted feature with the node criterion. The traversal continues until a leaf node is reached, which contains an algorithm that is expected to be the best algorithm for the given image. The CAS and NIS algorithms also provide a way of combining multiple algorithms. The classifier used in NIS and the decision tree of CAS assign a weight to each pre-selected algorithm, and the combination can be done by a weighted average of the outputs of the algorithms.
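A minimal sketch of the NIS-style scene statistics of Eq. (13): the Weibull parameters β (contrast) and γ (grain size) can be obtained by fitting a Weibull distribution to the Gaussian-derivative edge responses of a channel. The function name, the use of SciPy's weibull_min.fit, and the positive-response threshold are our own assumptions, not code from the paper.

import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import weibull_min

def weibull_scene_statistics(channel, sigma=1.0):
    """Fit a Weibull distribution (Eq. 13) to the edge responses of one channel.

    channel : 2-D float array (one color or opponent channel).
    Returns (beta, gamma): the scale (contrast) and shape (grain size) parameters.
    """
    # Gaussian-derivative edge responses in x and y
    dx = gaussian_filter(channel, sigma, order=(0, 1))
    dy = gaussian_filter(channel, sigma, order=(1, 0))
    edges = np.sqrt(dx ** 2 + dy ** 2).ravel()
    edges = edges[edges > 1e-6]        # the Weibull model applies to positive responses
    # weibull_min.fit returns (shape, loc, scale); the location is fixed at 0
    gamma, _, beta = weibull_min.fit(edges, floc=0)
    return beta, gamma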


All the color constancy algorithms mentioned above share a common assumption: the scene is illuminated by a single, uniform light source. In some cases, this may not be a proper assumption. Therefore, some color constancy methods have been proposed to handle cases in which the assumption of uniform illumination is not met. For example, reference [25] proposed an algorithm that uses information from both surface reflectance and illumination variation to solve the color constancy problem. This method first uncovers the illumination variation in an image and then uses the additional constraints to obtain better color constancy results. It incorporates knowledge about the set of plausible illuminants and from this set derives information about the possible chromaticity changes within a region of uniform reflectance due to a change in illumination. Once the illumination variation is uncovered, it is combined with other constraints arising from the set of colors found in the image. Therefore, the algorithm can provide good color constancy results when there is sufficient variation in surface reflectances, sufficient illumination variation, or a combination of both. In this paper, the focus is on methods with the assumption of uniform illumination; the interested reader can refer to [25] for the nonuniform illumination scenario.

4 Proposed approach

The proposed method is a combinatorial approach that tries to combine Gray algorithms that perform better than the other ones. The method is based on an observation: when a Gray color constancy algorithm of order n is the algorithm with the best performance on a natural image, most of the time the next k best Gray algorithms for that image are also of order n. In other words, instead of saying that the best Gray algorithm for a given image is of order n, we can say that for the given image, the k best Gray algorithms have order n. This observation is the result of an experiment using the Gray-Ball color constancy dataset [26] (a dataset that contains 11,346 natural images). In this experiment, 75 different random Gray algorithms (25 zero-order, 25 first-order, and 25 second-order algorithms) were generated by setting the parameters of Eq. (11) as follows:

Zero order   e^{0,p,\sigma}:  p = random integer in [1, 13],  σ = random integer in [0, 11]
First order  e^{1,p,\sigma}:  p = random integer in [1, 13],  σ = random integer in [1, 11]
Second order e^{2,p,\sigma}:  p = random integer in [1, 13],  σ = random integer in [1, 11]

In the parameter adjustment, we tried to select parameters in such a way that all cases of related work in the literature were covered. For example, the maximum value of σ that has been used in the literature is 7 [17]; therefore, the range for random values of this parameter was chosen as [1, 11] to cover all cases. The same approach was used for the random values of p, and its range was chosen as [1, 13] (note that the valid range for p is [1, ∞] and p = ∞ converts the algorithm to White-Patch, which is not a widely used method, so it is excluded from the test algorithms).
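A minimal sketch of how such a pool of random Gray algorithms could be generated, following the ranges listed above; the helper name, the fixed seed, and the use of Python's random module are our own assumptions.

import random

def random_gray_pool(per_order=25, seed=0):
    """Generate (n, p, sigma) triples for random Gray algorithms of orders 0-2."""
    rng = random.Random(seed)
    pool = []
    for n in (0, 1, 2):
        for _ in range(per_order):
            p = rng.randint(1, 13)
            # zero-order algorithms may also use sigma = 0 (no smoothing)
            sigma = rng.randint(0 if n == 0 else 1, 11)
            pool.append((n, p, sigma))
    return pool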


Then, the performance of all 75 Gray algorithms was evaluated on the dataset, and the dataset was divided into three subsets: the zero-order subset contains the images for which the best algorithm is of order zero, the first-order subset contains the images for which the best algorithm is of order one, and the second-order subset contains the images for which the best algorithm is of order two. The percentages of dataset images that fall into each subset are 41, 23, and 36 %, respectively. These values were calculated by selecting the order of the best algorithm for each image (the algorithm with the minimum angular error on that image among all 75 algorithms) as the category it falls into. Then, for each subset, the average percentages of images in the subset for which the orders of the two, three, four, or five best algorithms are the same were calculated and are shown in Table 1. As can be seen in Table 1, when a zero-order algorithm is the best algorithm for an image, on average, in 99.5 % of cases the second best algorithm for that image is also a zero-order algorithm. This value for the first-order and second-order algorithms is about 90 %. Also, the percentages of images in the zero-order subset for which the three, four, and five best algorithms have the same order are 99.1, 98.3, and 97.9 %, respectively. For first- and second-order algorithms, these values are somewhat lower. This is because the first- and second-order algorithms are very similar in nature. As shown in Table 2, when a higher-order (first- or second-order) algorithm is the best algorithm for a set of images, on average, the second best algorithm for 98.1 % of the set is also of higher order. Also, the average percentages of images for which the three, four, and five best algorithms are higher-order algorithms are 97.1, 94.2, and 93.9 %, respectively.

Table 1 The average percentages of images in the zero-order, first-order, and second-order subsets of the Gray-Ball dataset for which the two, three, four, or five best algorithms have the same order

                        Two best (%)   Three best (%)   Four best (%)   Five best (%)
Zero-order subset       99.5           99.1             98.3            97.9
First-order subset      90.6           81.1             66.1            58.8
Second-order subset     89.9           82.9             74.4            61.1

Table 2 The average percentages of images in the zero-order and higher-order subsets of the Gray-Ball dataset for which the two, three, four, or five best algorithms have the same order

                        Two best (%)   Three best (%)   Four best (%)   Five best (%)
Zero-order subset       99.5           99.1             98.3            97.9
Higher-order subset     98.1           97.1             94.2            93.9

From Tables 1 and 2, it can be concluded that, in general, all five best Gray algorithms for each natural image are either zero-order or higher-order ones. So, in order to choose the best Gray algorithms for combination, we must determine the best order and then select some of the algorithms of that order. This approach is much simpler than previous approaches such as NIS [18] and CAS [19]. The NIS and CAS methods try to select the best algorithm for a given image from a list of predefined algorithms, which is a more complex task compared with determining the best order. Determining the best order only needs a simple classifier with two or three classes, while the number of classes in the classifier required for selecting the best algorithm must be equal to the number of predefined algorithms. The proposed method consists of two steps. In the first step, a classifier is used to determine the best order of Gray algorithms for the input image, and some of the algorithms of this order are combined in the second step to estimate the scene illuminant. The proposed method can be applied using two schemas: in the first schema, the images are considered in three groups (images with the best Gray algorithm of order 0, 1, and 2) and the classifier is trained to determine the best group of algorithms for a given image among the zero-order, first-order, and second-order Gray algorithms. The second schema splits the images into two groups (images with the best Gray algorithm of zero order or higher order) and trains the classifier to determine the best group of algorithms for a given image as either the zero-order or the higher-order Gray algorithms.

4.1 Determining the best group of Gray algorithms for a given image

First, the feature vector and the corresponding class (i.e., the best group of Gray algorithms) for each image in a training dataset should be determined, and then a classifier should be trained to predict the class to which new images belong. To achieve good classification with this approach, the feature vector should accurately encode the properties of each class. So, various image features should be included in the feature vector, and consequently, the feature vector may have high dimension. Therefore, to decrease the dimensionality of the feature vectors and also the complexity of the required classifier, a dimension reduction algorithm named DGPP [27] is used in the proposed approach.

4.1.1 Image features

The color information, edge and intensity information, and natural image statistics are some of the main image properties on which the Gray algorithms depend. Therefore, to achieve high classification accuracy, the feature vector should capture these properties. Hence, we have considered


the scene classification literature and selected some of the features from this literature that can encode these properties. To encode the color information, the number of colors and the rg-chromaticity color histogram have been selected, and the natural image statistics have been encoded using the Weibull parameters. The edge and intensity information are encoded using the wavelet LEH, the edge direction histogram, and biologically inspired features. Note that these features may be very long, and they may also have some overlap and low-information dimensions. The proposed approach solves this problem using DGPP, which discards the overlapping and low-information dimensions and extracts the most discriminative part of the feature vector. These features carry a lot of information, and by extracting this information using DGPP, it is possible to determine the best derivative order of Gray algorithms for each image; the evaluation on multiple benchmark datasets supports this claim. Each image feature and the properties motivating it are explained in the following sections, and in cases where the feature has been used before in color constancy, the corresponding reference is cited.

4.1.1.1 Number of colors  The color range of an image can be represented by the number of distinct colors in the image. This parameter is chosen to be in the feature vector of the proposed approach because the zero-order algorithms are all based on the Gray-World assumption and this parameter is an indication of whether the Gray-World assumption holds for the given image [20]. If an image contains many different colors, then the average color is likely to be a gray value [20]. While computing this feature, the RGB color channels are quantized to remove small variations in color appearance and to decrease the influence of noise. The quantization is done by considering only the six most significant bits of each channel. Thus, the maximum number of different colors that can be discriminated is (2^6)^3 = 262,144. This feature has been used before in [19].

4.1.1.2 rg-chromaticity color histogram  The color histogram represents the color distribution of the image, and it is one of the most widely used image descriptors [6,8,19,23]. It encodes several useful properties that make it a robust visual feature. It is also invariant and robust with respect to geometric transformations of the original image such as rotation and scale. Using the full color histogram in RGB space as a feature is practically impossible because it requires 256^3 bins. Therefore, the image values are transformed into the rg-chromaticity space before computing the color histogram. In this way, the number of bins required for the color histogram decreases to 256^2. Also, the rg-chromaticity color histogram is usually sparse; therefore, it is possible to use a quantized color histogram as part of the feature vector. In the proposed approach, the color histogram in rg-chromaticity


space is quantized by uniformly dividing each color axis into 32 intervals. The rg-chromaticity square is thus subdivided into 4,096 smaller squares, and each of the original colors is mapped to the square into which it falls. This means that the color histogram part of the feature vector is a vector of size 1,024.

4.1.1.3 Weibull parameters  Natural image statistics (the distribution of edge responses) indicate the type of scene [28]; Geusebroek and Smeulders [24] showed that the natural statistics of an image can be modeled using the Weibull distribution (Eq. 13). The Weibull parameters should be computed for the derivative of each channel of the RGB image separately. But, since the RGB channels are highly correlated [18], images should be mapped to a decorrelated color space before computing the Weibull parameters. Therefore, the decorrelated opponent color space [18] is used in the proposed approach. The edge responses are computed by convolving the O1, O2, and O3 channels of the image with the derivative of a Gaussian filter with scale parameter σ as in Eq. (9). The Weibull parameters are obtained from the first- and second-order derivatives of each color channel considering four different values for the parameter σ of the Gaussian filter. The values of σ are experimentally chosen as 1, 2, 3, and 5. Hence, the Weibull part of the feature vector consists of 2 × 2 × 3 × 4 = 48 positive, real-valued elements (two Weibull parameters for two derivatives of three color channels for each of the four values of σ).

4.1.1.4 Wavelet local energy histogram  The wavelet decomposition extracts information about the textures and structures within the image [29–31]. The multiresolution wavelet decomposition is a process that is applied to the LL (low-pass-filtered version of the image) sub-band in a recursive manner. It can be repeated until the LL sub-band cannot be processed further or until a specified number of wavelet decompositions is reached. In each level of the wavelet decomposition, three high-pass bands are created, so considering the LL band, the total number of bands after a multiresolution wavelet decomposition with L levels is equal to 3L + 1. To extract wavelet features in the proposed approach, a three-level wavelet decomposition is applied to the luminance image with the Daubechies 1 (db1) filter bank, producing a total of 10 bands. The local energy histogram (LEH) [31] of each sub-band is used in the feature vector. The LEH extracts the local (Norm-1) energy features within S × S neighborhoods in each sub-band. Typically, in the jth high-pass sub-band of size Ω_i^j × Ω_i^j at the ith scale, the local energy feature is defined by [31]:

E_{Loc}^{i,j}(l, k) = \frac{1}{S^2} \sum_{u=1}^{S} \sum_{v=1}^{S} \left| w_{i,j}(l + u - 1,\, k + v - 1) \right|    (14)


where 1 ≤ l, k ≤ Ω_i^j − S + 1 and w_{i,j}(m, n) is the wavelet coefficient at location (m, n) in the sub-band. The local energy features in the low-pass sub-band, denoted by E_{Loc}^{LL} for clarity, are extracted in the same manner according to (14). All the above local energy features are non-negative, and the average amplitude of the local energy values increases almost exponentially with the scale i. Therefore, E_{Loc}^{i,j} should be regularized by multiplying by the factor 1/2^i in order to obtain a uniform measure for local energy features at different scales [31]. Because the average amplitude of the local energy feature values in the low-pass sub-band is much higher than in the high-pass sub-bands, the regularization factor for E_{Loc}^{LL} is taken as 1/4^L. For a particular wavelet sub-band with M local energy features (e_1, e_2, ..., e_M), the LEH, which is capable of modeling the probability density function of the local energy features, can be computed by taking the normalized histogram of the e_i, 1 ≤ i ≤ M [31]. Considering 10 sub-bands and a total of 100 bins in the LEH for each sub-band, the length of the wavelet part of the feature vector is equal to 1,000.

4.1.1.5 Biologically inspired features  Systems based on biologically inspired features try to mimic the process of the visual cortex in recognition tasks by simulating the C1 and S1 units of the visual cortex [27,32,33]. The C1 units correspond to complex cells in the visual cortex, and the S1 units correspond to simple cells in the S1 layer of the visual cortex [32]. C1 units use a maximum operation and keep the maximum response of a local area of S1 units of the same orientation and scale. The S1 units are represented using Gabor functions because these functions are similar to the receptive field profiles of mammalian cortical simple cells [27]. The Gabor mother function is

F(x, y) = \exp\left( -\frac{x_0^2 + \gamma^2 y_0^2}{2 \delta^2} \right) \cos\left( \frac{2 \pi x_0}{\lambda} \right)

where x_0 = x cos(θ) + y sin(θ) and y_0 = −x sin(θ) + y cos(θ). The range of x and y determines the scale of the Gabor filter, and θ controls the orientation. As in [27], a pyramid of Gabor filters with eight scales is used in the proposed approach. The filter sizes range from 7 × 7 pixels to 21 × 21 pixels with a step of two pixels. Four values (0°, 45°, 90°, and 135°) are considered for θ, which results in 8 × 4 = 32 Gabor filters. By applying these Gabor filters to the initial input image, 32 feature maps are obtained for the S1 units. By applying a maximum operation over two adjacent scales of S1 units with matching orientations, 16 feature maps for the C1 units are obtained. Finally, the normalized histogram of each C1 unit feature map with a total of 50 bins is used as

part of the feature vector. Hence, the total length of the biologically inspired part of the feature vector is 16 × 50 = 800.

4.1.1.6 Edge direction histogram  Edges in images of different types of scenes are different, and they carry valuable information for image classification. For example, strong edges can be found in buildings, roads, and other man-made structures, which usually have a definite directional pattern. Conversely, the objects in pictures of natural scenes usually have no clear structure and do not show a specific pattern; therefore, these pictures usually do not contain strong edges. The edge direction histogram is a good tool for determining the edge structures within an image and therefore allows us to distinguish between different image classes. In order to obtain the edges, the derivative of a Gaussian filter with σ = 1 is convolved with the luminance image in both the x and y directions (G_x, G_y). The edge orientation at edge position (x, y) is then computed using the following equation:

\theta(x, y) = \arctan\left( \frac{G_y(x, y)}{G_x(x, y)} \right)    (15)

The 36-bin edge direction histogram is then obtained by quantizing the orientations into intervals of 5°. Also, to ensure that only sufficiently strong edges are used in computing the histogram, only the orientations belonging to edges with magnitude above a given threshold are considered. The threshold is experimentally chosen to be 0.6. This feature has been used before in [23] with a lower quantization resolution.

4.1.2 Dimension reduction

As shown in Fig. 1, the dimension of the feature vector obtained in the previous section is very high. It consists of 2,909 features: one feature is the number of colors, 1,024 features are related to the rg-chromaticity color histogram, 48 features correspond to the Weibull parameters, 1,000 features are extracted from the LEH of the wavelet decomposition, 800 features represent the biologically inspired features, and 36 features are related to the edge direction histogram. Using such a long feature vector not only increases the complexity of the classifier, but the possible low-information dimensions in the feature vectors may also decrease the classification accuracy. Therefore, the length of the feature vector should be decreased by extracting the most discriminative parts before training the classifier. Many dimension reduction algorithms exist in the literature, such as locally linear embedding (LLE) [34] and Discriminative and Geometry Preserving Projection (DGPP) [27].

Fig. 1 The high-dimensional feature vector for each image (number of colors: 1; rg-chromaticity histogram: 1,024; Weibull parameters: 48; wavelet decomposition LEH: 1,000; biologically inspired features: 800; edge direction histogram: 36). The dimensionality of this feature vector must be reduced by DGPP before training the classifier
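To make two of the features above concrete, the following Python sketch computes the number-of-colors feature (6-bit quantization per channel) and a 32 × 32 rg-chromaticity histogram. The function names are ours, and the sketch makes the stated assumptions about the input image.

import numpy as np

def number_of_colors(img8):
    """Count distinct colors after keeping the 6 most significant bits per channel.

    img8 : H x W x 3 uint8 image.
    """
    q = (img8 >> 2).reshape(-1, 3).astype(np.uint32)     # 6-bit quantization
    codes = (q[:, 0] << 12) | (q[:, 1] << 6) | q[:, 2]    # pack channels into one code
    return np.unique(codes).size                          # at most (2**6)**3 colors

def rg_chromaticity_histogram(img, bins=32):
    """Normalized rg-chromaticity histogram (bins x bins entries, flattened)."""
    rgb = img.reshape(-1, 3).astype(float)
    s = rgb.sum(axis=1)
    rgb = rgb[s > 0]                                       # avoid division by zero
    r = rgb[:, 0] / rgb.sum(axis=1)
    g = rgb[:, 1] / rgb.sum(axis=1)
    hist, _, _ = np.histogram2d(r, g, bins=bins, range=[[0, 1], [0, 1]])
    return (hist / hist.sum()).ravel()                     # length bins*bins = 1,024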


For the purpose of classification, a good dimension reduction algorithm should model both the intraclass geometry and the interclass discrimination. It must make the distances between interclass samples as large as possible while keeping the distances between intraclass samples as small as possible. Also, it should preserve the local geometry of intraclass samples as much as possible. DGPP is a recent dimension reduction method that can do these tasks precisely. It outperforms previous methods in classification accuracy while using the same classifier, and it never meets the under-sampled problem when training samples are insufficient [27]. Hence, DGPP is used in the proposed approach. In summary, for n samples x_i, i = 1...n, in the high-dimensional space R^H, DGPP maps the input samples to the low-dimensional space R^L using a linear mapping defined by a projection matrix U ∈ R^{H×L} as follows:

X = [x_i]_{1 \le i \le n} \in R^{H \times n}    (16)

Y = U^T X \in R^{L \times n}

Each column of Y is the low-dimensional representation of an input sample, i.e., y_i = U^T x_i ∈ R^L. In DGPP, there are c classes and the ith sample x_i in the high-dimensional space is associated with a class label m_i ∈ {1, 2, ..., c}. To implement the discrimination preservation in DGPP, the average weighted pairwise distance between samples in different classes should be maximized, while the average weighted pairwise distance between samples in an identical class should be minimized [27]:

y_i = \arg\max_{y_i,\, 1 \le i \le n} \left( \sum_{i=1}^{n} \sum_{j:\, m_j \ne m_i} \frac{1}{n} \| y_i - y_j \|^2 \; - \; \sum_{i=1}^{n} \sum_{j:\, m_j = m_i} c_{i,j} \left( \frac{1}{n_{m_i}} - \frac{1}{n} \right) \| y_i - y_j \|^2 \right) = \arg\max_{y_i,\, 1 \le i \le n} \sum_{i=1}^{n} \sum_{j} h_{i,j} \| y_i - y_j \|^2    (17)

where the weighting factor h_{i,j} encodes both the distance weighting information and the class label information:

h_{i,j} = \begin{cases} c_{i,j} \left( 1/n_l - 1/n \right) & \text{if } m_j = m_i = l \\ 1/n & \text{if } m_j \ne m_i \end{cases}    (18)

c_{i,j} = e^{ -\| x_i - x_j \|^2 / \delta^2 }    (19)

Also, to implement the local geometry preservation, it is assumed that each sample can be reconstructed from the samples within the same class:

x_i \approx \sum_{j:\, m_j = m_i} w_{i,j}\, x_j + \varepsilon_i    (20)


where ε_i is the reconstruction error for x_i, and w_{i,j} is obtained by minimizing \sum_{i=1}^{n} \| \varepsilon_i \|^2:

w_{i,j} = \arg\min_{w_{i,j}} \sum_{i=1}^{n} \left\| x_i - \sum_{j:\, m_j = m_i} w_{i,j}\, x_j \right\|^2    (21)

Using these assumptions with some matrix operations (see [27]), the matrix U can be obtained from:

U = \arg\max_{U \in R^{H \times L}} \operatorname{tr}\left( U^T X M X^T U \right)    (22)

where the unified coefficient matrix M = (D - H^T)(D - H^T)^T - \lambda (I - W^T)(I - W^T)^T is a symmetric matrix, H = [h_{i,j}] ∈ R^{n×n}, W = [w_{i,j}] ∈ R^{n×n}, and D ∈ R^{n×n} is a diagonal matrix whose ith diagonal entry is \sum_{j=1}^{n} h_{i,j}. By imposing U U^T = I_d on (22), the solution is given by the standard eigenvalue decomposition of X M X^T, and U is formed by the eigenvectors associated with the L largest eigenvalues. Note that DGPP is not a feature selection algorithm; it reduces the dimensionality of the feature vectors by projecting them onto another coordinate system with lower dimension, and hence, it is not possible to determine which features are more important. See [27] for further details.
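A minimal sketch of the final projection step of Eq. (22): given the feature matrix X and a precomputed coefficient matrix M, the projection U is formed from the eigenvectors of X M X^T with the L largest eigenvalues. Building M itself (from H, W, D, and λ) is omitted here; the function name and the use of NumPy are our own.

import numpy as np

def dgpp_projection(X, M, L):
    """Return the H x L projection matrix U of Eq. (22).

    X : H x n matrix of high-dimensional feature vectors (one per column).
    M : n x n symmetric unified coefficient matrix.
    L : target dimensionality.
    """
    S = X @ M @ X.T                       # H x H symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    U = eigvecs[:, -L:][:, ::-1]          # eigenvectors of the L largest eigenvalues
    return U

# Low-dimensional representation of the training features: Y = U.T @ X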


4.1.3 The classifier: support vector machine

There are many choices for an appropriate classifier, from model-based methods to learning algorithms. Among these, SVMs appear to be a good candidate because of their ability to generalize in high-dimensional spaces without the need to add prior knowledge [35]. The appeal of SVMs is based on their strong connection to the underlying statistical learning theory, and for several pattern classification applications, SVMs have been shown to provide better generalization performance than traditional techniques such as neural networks [35]. Since the feature vector used in the proposed approach is long, the SVM classifier is used because of its generalization performance in high-dimensional spaces. Note that although the proposed approach uses DGPP to decrease the dimensionality of the feature vector, as the experiment in Sect. 5.5 shows, the dimensionality of the feature vector after reduction is still high. The SVM is a binary classifier; it trains a classifier by finding an optimal separating hyperplane, which maximizes the margin between two classes of data in the kernel-induced feature space. In the case of multiclass classification, one approach is to reduce the single multiclass problem into multiple binary classification problems. A common method for such a reduction is to build C(C − 1)/2 binary classifiers, which distinguish between every pair of classes (pairwise approach). The classification is then done by a max-wins voting strategy; every classifier assigns the instance to one of its two classes, the vote for the assigned class is increased by one, and finally, the class with the highest number of votes determines the instance classification. Each training image has a low-dimensional feature vector obtained by applying DGPP to the high-dimensional vector containing the features mentioned in Sect. 4.1.1. The proposed approach uses a multiclass SVM classifier to determine the order of the best group of Gray algorithms for a given image. The method only considers Gray algorithms of orders up to two, and it has been implemented in two different ways: the first and the second schema. In the first schema, the classifier classifies the images into three classes such that the class c_i of each training image i is set to the order of the Gray algorithm with the maximum performance on that image, and the proposed method uses the pairwise approach for SVM multiclass classification. Consequently, 3(3 − 1)/2 = 3 binary classifiers must be constructed, and each classifier is trained with data from two different classes. In the second schema, images are classified into two classes, so the proposed method uses a single binary SVM classifier in which the class c_i of each training image i is set to 0 or 1: the first class (0) contains images whose best Gray algorithm is of zero order, and the second class (1) contains images whose best Gray algorithm is of higher order (first or second order).

4.2 Algorithm combination

After determining the best group of Gray algorithms for the given image, the proposed method combines the output of some Gray algorithms of that group to achieve a new estimate of the color of the light source. The simplest method for generating a new estimate of the scene illuminant using the output of multiple algorithms is to take the average of the estimates over all algorithms. A straightforward extension is to take the weighted average of the estimated illuminants. A better approach is to use a nonlinear combination: a well-trained nonlinear neural network may produce better results than the weighted average. The proposed method uses three (first schema) or two (second schema) nonlinear multilayer perceptron (MLP) neural networks (in the first schema, one network for each order of Gray algorithms; in the second schema, one network for zero-order Gray algorithms and one network for higher-order Gray algorithms). The inputs to the neural networks are the estimates of the scene illuminant obtained by the Gray algorithms, and the output is the new estimate of the scene illuminant vector. The training procedure for such a neural network is very important. The back-propagation training algorithms for MLP [36]

usually use the mean square error (MSE) as the fitness function. The training methods normalize the elements of the input and target vectors to [0,1] or [−1,1] as a preprocessing step, but they do not normalize the length of the output vectors during training. Hence, the training process with the MSE fitness differentiates between two illuminant vectors of the same color but with different intensities. In other words, if an output vector of the neural network is parallel to its corresponding target vector (same color) but their lengths differ (different intensity), the training procedure considers this difference an error and tries to compensate for it by updating the weights of the neurons. For our purpose, this is incorrect, because the task of color constancy is to estimate the color of the scene illuminant and not its intensity. The intensity of the light source is not important because the process of image correction always transforms the image in such a way that it appears to be taken under a white light source with standard intensity [i.e., (1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3})]. Therefore, the training process must be changed so that it does not take the intensity information of the output vectors into account. One way is to use another fitness function instead of MSE; another way is to add an additional step to the back-propagation process that normalizes the lengths of the output vectors before computing the MSE. These two approaches require considerable mathematical work and justification, and it is not clear that they are applicable. A simpler way that does not require any changes to the back-propagation process is to change the problem space. The only needed change is to convert the input and target vectors from the 3D Cartesian coordinates of the RGB cube to 3D spherical coordinates (r, θ, ϕ), then discard the radial distance (r) and use only the polar (θ) and azimuthal (ϕ) angles for training. In this way, the back-propagation process minimizes the MSE of the angular deviation between the output and target vectors. The output vector of the neural network can be converted back to the normalized scene illuminant vector in Cartesian coordinates by setting the radial distance to 1. The proposed approach uses one neural network for each group of Gray algorithms. Each neural network is trained using its corresponding subset of training images (i.e., the training images for which the order of the best algorithm equals the order of the algorithms that the network combines). In this way, the complexity of the function to be approximated by each neural network decreases, and the generalization performance of each neural network increases. Figure 2 shows the architecture of the neural network for each order of algorithms. The inputs to the neural network are the polar and azimuthal angles of the illuminant estimates produced by the algorithms of the same order, and the outputs of the neural network are the new estimated angles of the scene illuminant.
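The following sketch shows the coordinate change described above (RGB illuminant vector to polar/azimuthal angles and back), together with the angular error of Eq. (25) in Sect. 5.3. It is our own illustration in Python, not code from the paper.

import numpy as np

def rgb_to_angles(e):
    """Convert an RGB illuminant vector to (theta, phi), discarding intensity."""
    r, g, b = np.asarray(e, dtype=float)
    radius = np.sqrt(r * r + g * g + b * b)
    theta = np.arccos(b / radius)          # polar angle
    phi = np.arctan2(g, r)                 # azimuthal angle
    return theta, phi

def angles_to_rgb(theta, phi):
    """Convert (theta, phi) back to a unit-length RGB illuminant vector."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def angular_error(e_est, e_true):
    """Angular error (in degrees) between estimated and ground-truth illuminants."""
    a = np.asarray(e_est, float) / np.linalg.norm(e_est)
    b = np.asarray(e_true, float) / np.linalg.norm(e_true)
    return np.degrees(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))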


Fig. 2 The architecture of the neural network for each order of algorithms: the θ and ϕ angles estimated by algorithms Alg1 ... Algn feed a single hidden layer, which outputs the combined estimate of θ and ϕ

4.3 The proposed method in summary

Figure 3 shows an overview of the proposed method (first schema). In summary, the training phase for the first schema of the proposed method consists of the following steps:

1. Choose n_0 + n_1 + n_2 = n_alg Gray algorithms (n_0 zero-order, n_1 first-order, and n_2 second-order).
2. For each training image img_i, i ∈ [1, N], and Gray algorithm alg_j, j ∈ [1, n_alg], determine the vector e_{i,j} ∈ R^{1×2} and the performance ε_{i,j} (angular error) of alg_j on img_i. Here e_{i,j} contains the angles θ and ϕ of the estimated scene illuminant in spherical coordinates, and N is the total number of training images.
3. Form the label vector L ∈ R^{N×1} with each element L_i equal to the order of the algorithm with the highest performance on img_i:

L_i = \operatorname{order}\left( \arg\min_{alg_j} \varepsilon_{i,j} \right)    (23)

4. Train the classifier C using the following steps:
   a. Extract the high-dimensional feature vector f_i^H ∈ R^{H×1} for each image img_i using the process defined in Sect. 4.1.1. Build the matrix F^H ∈ R^{H×N} such that each column is equal to f_i^H. Here H is the dimension of the high-dimensional feature vector, which is equal to 2,909 in the proposed method.
   b. Apply DGPP to the matrix F^H to obtain the projection matrix U ∈ R^{H×L} and the matrix F^L ∈ R^{L×N} (F^L = U^T F^H). Each column f_i^L of the matrix F^L is the low-dimensional representation of the feature vector of a training image. L is the dimension of the low-dimensional (reduced) feature vector.
   c. Train the classifier C with the matrix F^L and the vector L as the inputs and labels, respectively.


5. Split the training set into three subsets S_0, S_1, and S_2 based on the labels of the images (place the images with labels 0, 1, and 2 into the subsets S_0, S_1, and S_2, respectively).
6. For each subset S_k, k = 0...2, train the neural network NN_k using the following steps:
   a. Build the matrix E_k ∈ R^{(n_k·2)×N_k} such that each column is equal to the concatenation of the e_{i,j} vectors obtained by applying all order-k algorithms to each img_i, i ∈ S_k. Here N_k is the number of images in S_k and n_k is the number of algorithms of order k.
   b. Build the matrix T_k ∈ R^{2×N_k} such that each column contains the θ and ϕ angles of the spherical-coordinate representation of the real scene illuminant vector of an image img_i, i ∈ S_k.
   c. Train the network NN_k using the matrix E_k and the matrix T_k as the training inputs and targets, respectively.

After the training phase, the normalized scene illuminant for a given image can be estimated using the following steps:

1. Extract the high-dimensional feature vector f^H ∈ R^{H×1} for the given image using the process defined in Sect. 4.1.1.
2. Use the projection matrix U obtained in step 4.b of the training phase to obtain f^L ∈ R^{L×1} (the low-dimensional representation of f^H) by the following formula:

f^L = U^T f^H    (24)

3. Feed f^L to the trained classifier to determine the best order of Gray algorithms for the image; call it o.
4. Build the vector E ∈ R^{(n_o·2)×1} by concatenating the e_{i,j} vectors obtained by applying all order-o algorithms to the image. Here n_o is the number of algorithms of order o.


Fig. 3 Overview of the proposed approach (first schema). a Training phase: the training set of images must be labeled and split into three subsets, one subset for each considered order of Gray algorithms. The neural network for each order of Gray algorithms should only be trained with its corresponding subset of training images, while the training of the classifier should be done using the complete set of training images. b Test phase: the normalized estimate of the scene illuminant for a given image can be obtained by first predicting the best order of Gray algorithms for that image using the trained classifier and then feeding the outputs of the Gray algorithms of the best order to the corresponding trained neural network and obtaining its outputs

5. Feed the vector E to the corresponding trained neural network (NN_o) to estimate the polar and azimuthal angles of the scene illuminant.
6. The normalized estimate of the scene illuminant is obtained by converting the estimated polar and azimuthal angles back to Cartesian RGB coordinates while setting the radial distance to 1.
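The test phase above can be summarized in a short Python sketch. The helper components (the feature extractor, the trained classifier, the Gray algorithms, and the trained networks) are hypothetical placeholders standing in for the pieces described in the text, not actual implementations from the paper.

import numpy as np

def estimate_illuminant(img, U, classifier, gray_algorithms, networks):
    """Test-phase pipeline of the first schema (a sketch with hypothetical helpers).

    U               : H x L DGPP projection matrix from training.
    classifier      : trained multiclass SVM predicting the best order (0, 1, or 2).
    gray_algorithms : dict order -> list of functions img -> (theta, phi).
    networks        : dict order -> trained MLP mapping concatenated angles
                      to the combined (theta, phi) estimate.
    """
    f_high = extract_features(img)                 # hypothetical: Sect. 4.1.1 features
    f_low = U.T @ f_high                           # Eq. (24)
    order = int(classifier.predict(f_low.reshape(1, -1))[0])
    angles = np.concatenate([alg(img) for alg in gray_algorithms[order]])
    theta, phi = networks[order].predict(angles.reshape(1, -1))[0]
    # Convert back to a unit-length RGB illuminant (radial distance set to 1).
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])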

For the second schema, the steps of the proposed method are identical to the steps of the first schema, with the difference that the number of image subsets, classes in the classifier, and neural networks must be equal to 2.

5 Experiments

5.1 Algorithms to be combined

The first step in the proposed method is to determine the best group of algorithms for the given image among some predefined algorithms. This can be done among the Gray algorithms with satisfactory precision because of their natural similarity. Also, the Gray algorithms are known as a family of algorithms with good performance relative to their simplicity and computational efficiency [37]. Therefore, the proposed method is designed for the combination of Gray algorithms only. The number of Gray algorithms that can be


Table 3 The Gray algorithms and their performances (median angular error) on the Gray-Ball dataset

Zero-order algorithms      Algorithm:     e^{0,9,0}   e^{0,8,0}   e^{0,11,0}   e^{0,5,0}   e^{0,9,3}
                           Median error:  5.3         5.3         5.4          5.5         5.6
First-order algorithms     Algorithm:     e^{1,2,1}   e^{1,5,3}   e^{1,7,3}    e^{1,5,4}   e^{1,9,3}
                           Median error:  4.6         5.8         6.0          6.1         6.2
Second-order algorithms    Algorithm:     e^{2,1,2}   e^{2,3,3}   e^{2,3,5}    e^{2,5,3}   e^{2,7,3}
                           Median error:  4.9         5.1         5.2          5.3         5.4

These algorithms have been used in the evaluation section

Fig. 4 Examples of proposed algorithm results for the Ciurea and Funt real-world image dataset (original image, ideal correction, and the result of the proposed approach, with the angular error of each corrected image)

combined with the proposed approach is somewhat arbitrary. In the experiments, we chose the number of combined algorithms to be 15 (five zero-order, five first-order, and five second-order algorithms, as shown in Table 3) because the experiments explained in Sect. 4 showed that in almost 70 % of cases, the five best Gray algorithms for an image are in the same group (have the same order).

5.2 Datasets

The evaluation of the proposed approach has been done using two public color constancy datasets containing real-world images. The first dataset is the Gray-Ball dataset [26], a large dataset with 11,346 images. The images in this dataset have been extracted from 15 video clips taken at different locations. The real scene illuminant for each image in the dataset is acquired using the small gray sphere at the bottom right corner of the images. Some sample images of this dataset are shown in Fig. 4. Note that while evaluating the proposed approach, the gray sphere is omitted from the images to avoid biasing the algorithms. The Gray-Ball dataset is widely used for the evaluation of color constancy algorithms, but it has some disadvantages: the main disadvantage is that the images have been extracted from video sequences, and therefore some of the images are correlated. Another drawback of this dataset is that the images have been processed by an


Fig. 5 Examples of proposed algorithm results for the Color-Checker image dataset (original image, ideal correction, and the result of the proposed approach, with the angular error of each corrected image)

unknown post-processing procedure including gamma mapping and lossy compression, and the quality of the images is low. Reference [37] reprocessed the images in this dataset and converted them to linear images using reverse gamma correction, assuming a gamma value of 2.2. This reference also recomputed the ground-truth illumination of the linear images and made the illuminations publicly available for download [38]. But the quality of these linear images is still low, and the lossy compression is still in place. Therefore, as a second dataset, the proposed approach has been tested using the Color-Checker dataset [39]. The Color-Checker dataset consists of 568 indoor and outdoor images. For each image, a Macbeth Color-Checker is placed in the scene in order to obtain the real scene illuminant for that image. Although this dataset contains a limited number of images compared to the Gray-Ball dataset, the images in this dataset are not correlated and their quality is high. The Color-Checker dataset is available in RAW format as well as tiff images in sRGB format. Shi and Funt [40] reprocessed the RAW data to obtain linear images with a higher (12-bit) dynamic range. The proposed approach is tested on these linear images, and to avoid biasing the algorithm, the Color-Checker is omitted from the images in the evaluation process. Some gamma-corrected example images (γ = 2.2) of this dataset are shown in Fig. 5. Another dataset has been generated based on the spectral reflectance data presented in [41]. The dataset in [41] con-


tains surface and illuminant spectra, and the first step is to convert them into (R, G, B) values to obtain pixel colors. Using these generated pixel colors, and to simulate the statistics of real-world images, several Mondrian-like images have been created with different properties in the number of edges, the amount of texture, and the contrast. This dataset is called the Mondrian dataset in the remainder of the paper. Figure 6 shows a few examples of these images.

Fig. 6 Examples of the images in the Mondrian dataset

5.3 Performance measure

The angular error ε is used to measure the accuracy of the estimated illuminant:

\varepsilon = \cos^{-1}\left( \hat{e}_l \cdot \hat{e}_e \right)    (25)

where ê_e is the normalized estimated illuminant, ê_l is a normalized vector representing the real scene illuminant, and "·" denotes the dot product of two vectors. The best value of the angular error is image dependent, but since a better measure is not available in the literature, the median angular error has been employed to measure the overall performance of an algorithm on a dataset, because this measure is known as the most appropriate measure for this aim [18]. Although the median angular error is an appropriate measure for comparing the overall performance of algorithms on a dataset, it does not tell us everything about the distribution of errors. Therefore, it would be better to somehow compare the statistical significance of the whole error distribution of the algorithms. To this end, the Wilcoxon sign test [42] has been used. Given two samples of random variables A and B, the Wilcoxon sign test is used to test the null hypothesis H_0: p = P(A > B) = 0.5, which means that the probability that A has a value larger than B is 50 %. In the case of color constancy algorithms, the random variables A and B are the angular error results of two different algorithms, and the test can be used to determine whether the performance of the algorithms is the same (the null hypothesis holds) or whether one algorithm performs significantly better than the other (the null hypothesis is rejected). The decision to accept or reject the null hypothesis at a given significance level α is made based on the number of times the values of the random variable A are greater than the corresponding values of B.

5.4 Neural network parameters

5.4 Neural network parameters

The first schema of the proposed method uses three neural networks, all of which are MLPs with one hidden layer, since one hidden layer is sufficient for nearly all problems [43]. The proper architecture of an MLP depends on the dimensionality of the function it has to model as well as on the amount and quality of the training data. In order to model highly complex functions, the network must have a large number of neurons. However, increasing the number of neurons significantly increases the training time and requires a much larger training set. Another problem with a large network is its tendency to memorize the relationship between the input and target values, which leads to poor generalization. In contrast, a small network cannot fully model the input–target mapping.

Each network in the proposed approach should model the mapping between the outputs of multiple Gray algorithms of the same order and the real scene illuminant. This mapping is independent of the classifier used to determine the best order of Gray algorithms for a given image. Therefore, in order to determine the optimum number of hidden layer neurons for each network, we examined multiple networks with different numbers of hidden layer neurons while assuming an ideal classifier. Figure 7 shows the results of this experiment. As can be seen in Fig. 7, the networks with 15–30 hidden neurons achieved comparable color constancy performance on the Gray-Ball dataset, and the number of hidden neurons can vary within the range 20–30 without affecting the overall performance. Therefore, the number of neurons in the hidden layer of the networks is chosen to be 25.

The number of hidden neurons for the networks of the second schema is determined to be 35 using a similar approach. The number of neurons is higher than in the first schema because, in the second schema, the function that should be approximated by the neural network of the higher-order group (i.e., the mapping between the outputs of algorithms with different orders and the real scene illuminant) is more complex than the functions approximated by the networks of the first schema (i.e., the mapping between the outputs of algorithms with the same order and the real scene illuminant). All neurons have a sigmoidal activation function (tangent hyperbolic); the fitness function used for training the networks is the MSE, and the networks are trained using the improved Levenberg–Marquardt algorithm [36], with the damping factor (learning rate) set dynamically during learning according to Marquardt's recommendation [44].
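As a rough illustration of such a network, the sketch below uses scikit-learn's MLPRegressor with 25 tanh hidden units. The library does not provide the improved Levenberg–Marquardt training of [36], so a quasi-Newton solver stands in for it, and the training arrays are placeholders rather than the actual feature construction of the paper.

```python
from sklearn.neural_network import MLPRegressor

# One hidden layer with 25 tanh neurons (35 for the second schema),
# trained by minimizing the MSE between predicted and real illuminants.
net = MLPRegressor(hidden_layer_sizes=(25,),
                   activation='tanh',   # tangent hyperbolic units
                   solver='lbfgs',      # stand-in for Levenberg-Marquardt
                   max_iter=2000)

# X_train: per-image outputs of the Gray algorithms of one derivative order,
# y_train: the corresponding real scene illuminants (both placeholders here).
net.fit(X_train, y_train)
illuminant_estimates = net.predict(X_test)
```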

Fig. 7 The color constancy performance of the first schema (median angular error in degrees, on the Gray-Ball and Color-Checker datasets) versus the number of hidden layer neurons, assuming an ideal classifier

Fig. 8 Classification accuracy (cross-validation) of the first and second schemas of the proposed approach versus the length of the DGPP low-dimensional feature vector. The overall classification accuracy of the second schema is higher than that of the first schema

5.5 Choosing the best DGPP parameters

The dimension reduction algorithm (DGPP) extracts the discriminative part of the feature vector. In this process, one of the most important parameters is the length of the low-dimensional vector representing the discriminative part. The original feature vector length in the proposed approach is 2,909, and it is important to choose the best length for its low-dimensional representation. The classification accuracy depends strongly on this parameter, and its best value has to be found experimentally. Therefore, in our experiments, we varied this parameter from 2,909 (no reduction) to 3 (99 % reduction) and computed the classification accuracy for both schemas of the proposed approach using k-fold cross-validation on the Gray-Ball dataset. In k-fold cross-validation, the training data are divided into k parts; the classifier is trained on k − 1 parts and tested on the remaining part. This procedure is repeated k times, so every image appears in the test set exactly once and every image is used either for training or for testing in each run. One of the basic assumptions in machine learning is that the distribution of the test data should be similar to the distribution of the training data.


Therefore, the variety of images used to train the algorithm should be similar to the variety of images used to test it. At the same time, to ensure a fair evaluation, the training images and the test images should not be similar: when the training and test images are correlated, the test does not assess the generalization performance of the method. The images of the Gray-Ball dataset come from 15 different clips, and the images within each clip are correlated. Therefore, we used a 15-fold cross-validation test to evaluate the performance of the proposed approach while keeping all correlated images (images extracted from the same clip) in the same fold. Hence, no similarity exists between the training data and the test data.
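A sketch of this grouped cross-validation, assuming scikit-learn with clip indices as groups, an SVM classifier, and PCA as a stand-in for the DGPP reduction; the feature matrix, labels, and clip identifiers are placeholders:

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def grouped_cv_accuracy(features, labels, clip_ids, reduced_dim, n_folds=15):
    """Cross-validation accuracy with all frames of a clip kept in one fold."""
    scores = []
    for train_idx, test_idx in GroupKFold(n_splits=n_folds).split(
            features, labels, groups=clip_ids):
        # PCA is only a stand-in for the DGPP reduction used in the paper.
        reducer = PCA(n_components=reduced_dim).fit(features[train_idx])
        clf = SVC().fit(reducer.transform(features[train_idx]), labels[train_idx])
        scores.append(clf.score(reducer.transform(features[test_idx]),
                                labels[test_idx]))
    return np.mean(scores)

# Sweep the length of the low-dimensional feature vector, as in Fig. 8.
for dim in (3, 50, 100, 200, 210, 500, 1000):
    print(dim, grouped_cv_accuracy(features, labels, clip_ids, dim))
```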


The results of this experiment for both schemas of the proposed approach are shown in Fig. 8. As expected, the overall classification accuracy of the second schema is higher than that of the first schema. One reason is that the number of classes in the second schema is smaller than in the first schema; another is that two of the three classes in the first schema are rather similar. As shown in Fig. 8, the classification accuracy with no dimensionality reduction is about 50 and 60 % in the first and second schemas, respectively. The reason for this poor performance is that classifying high-dimensional feature vectors requires a complex classifier, and the high-dimensional feature vector contains low-information (non-discriminative) parts that prevent the classifier from converging. Decreasing the length of the low-dimensional feature vector from 2,909 to about 500 does not improve the classification accuracy much, whereas the accuracy in both schemas increases as the length decreases from 500 to about 200. The maximum accuracy for the first schema is 92.6 %, obtained with a low-dimensional feature vector of length 200; the maximum accuracy for the second schema is 97.7 %, obtained with a length of 210. Decreasing the length of the low-dimensional feature vector from 200 down to 3 sharply reduces the classification accuracy, because the reduction then removes necessary discriminative information. Since the maximum classification accuracy was obtained with lengths of 200 and 210 for the first and second schemas, respectively, the proposed approach uses these values.

5.6 Evaluation on the Ciurea and Funt Gray-Ball dataset

To evaluate the performance of the proposed approach on this dataset, 15-fold cross-validation has been used on the linear version of the dataset with the recomputed illuminations. To further evaluate the proposed approach, a cross-dataset test has also been performed by training the method on the Mondrian dataset and testing it on this dataset. Table 4 shows the results of applying several algorithms to the Gray-Ball dataset, together with the results of the Wilcoxon sign test on this dataset at the 99 % confidence level, i.e., α = 0.01 (the results of the algorithms other than the proposed approach have been obtained from [38]).

A plus sign (+) in the ith row and jth column of the table means that algorithm i is statistically better than algorithm j according to the Wilcoxon test; a minus sign (−) implies that it is worse, and an empty cell means that the two algorithms are statistically equivalent. Based on the results in Table 4, it can be concluded that although the maximum error of the proposed method is higher than that of some other methods, overall both schemas of the proposed approach outperform the other algorithms; the median and mean angular errors as well as the Wilcoxon sign test results support this claim. Note that the second schema performs worse than the first schema, even though its classification accuracy is higher. One reason is that the neural network of the higher-order group in the second schema has to model a more complex function and, as a result, has lower generalization performance. Another reason is that in the second schema the first- and second-order Gray algorithms are placed in one group, so the proposed method cannot determine the best algorithm as precisely as in the first schema when the best algorithm for an image is of first or second order. Also note that the authors of the CAS algorithm report the performance of CAS on this dataset as 3.21, but they used the nonlinear version of the dataset and did not use the whole dataset for the evaluation; in order to decrease the correlation between test images, they selected a subset of the dataset containing a total of 1,135 images. Therefore, this algorithm has been omitted from Table 4. The authors of the weighted Gray-Edge algorithm reported the performance of their algorithm on this dataset as 9.0, also using a subset of the dataset containing a total of 1,135 images.
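The pairwise entries of Tables 4 and 5 can, in principle, be reproduced along the following lines: for each pair of algorithms, count the images on which one has the smaller angular error and test that count against chance at α = 0.01. The sketch below uses a two-sided binomial (sign) test as an approximation of the authors' procedure, and the error arrays are placeholders.

```python
import numpy as np
from scipy.stats import binomtest

def pairwise_sign_table(error_lists, alpha=0.01):
    """error_lists[i][k]: angular error of algorithm i on image k.

    Returns a matrix of '+', '-', or '' following the convention of
    Tables 4 and 5: row i gets '+' over column j if i has significantly
    lower errors than j.
    """
    n = len(error_lists)
    table = [['' for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = np.asarray(error_lists[i]) - np.asarray(error_lists[j])
            diff = diff[diff != 0]            # ties carry no information
            if diff.size == 0:
                continue
            wins = int(np.sum(diff < 0))      # images where i beats j
            p = binomtest(wins, n=len(diff), p=0.5).pvalue
            if p < alpha:
                table[i][j] = '+' if wins > len(diff) / 2 else '-'
    return table
```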

Table 4 The performance of various color constancy methods on the Gray-Ball image dataset (linear); angular errors in degrees

     Method                                               Median error   Mean error   Max error
1    Gray-World                                           11.0           13.0         63.0
2    Shades of Gray (p = 4)                               9.7            11.6         58.1
3    Edge-based Gamut (σ = 9)                             10.9           12.8         58.3
4    First-order Gray-Edge (p = 1, σ = 1)                 8.8            10.6         58.4
5    Second-order Gray-Edge (p = 1, σ = 1)                9.0            10.7         56.1
6    Weighted Gray-Edge (p = 1, σ = 1)                    8.6            10.5         54.2
7    NIS                                                  7.7            9.9          56.1
8    Proposed method—first schema (cross-validation)      7.0            8.9          55.2
9    Proposed method—second schema (cross-validation)     7.2            9.2          56.1
10   Proposed method—first schema (cross-dataset)         7.3            9.3          55.6
11   Proposed method—second schema (cross-dataset)        7.5            9.8          56.4

Table 5 The performance of various color constancy methods on the Color-Checker dataset; angular errors in degrees

     Method                                               Median error   Mean error   Max error
1    Gray-World                                           6.3            6.4          24.8
2    Shades of Gray (p = 12)                              4.0            4.9          22.4
3    Edge-based Gamut (σ = 3)                             5.0            6.5          29.0
4    First-order Gray-Edge (p = 1, σ = 1)                 4.5            5.3          26.4
5    Second-order Gray-Edge (p = 1, σ = 2)                4.4            5.1          23.9
6    Weighted Gray-Edge (p = 1, σ = 1)                    4.3            6.3          22.4
7    NIS                                                  3.1            4.2          26.2
8    CAS                                                  2.9            3.9          22.3
9    Proposed method—first schema (cross-validation)      2.5            3.8          37.8
10   Proposed method—second schema (cross-validation)     2.8            4.1          38.1
11   Proposed method—first schema (cross-dataset)         2.8            4.0          39.0
12   Proposed method—second schema (cross-dataset)        3.0            4.3          39.3

However, they made the source code of their algorithm publicly available [38], and we used this source code to compute the performance of the weighted Gray-Edge algorithm on this dataset. Also note that the proposed approach performs somewhat worse in the cross-dataset test than in the cross-validation test; nevertheless, its cross-dataset performance is still better than that of the other methods, both in terms of the error measures and according to the statistical significance test.

5.7 Evaluation on the Color-Checker dataset

To evaluate the performance of the proposed approach on this dataset (the Shi and Funt dataset [40]), a threefold cross-validation test as well as a cross-dataset test has been used; threefold cross-validation was selected because the authors of this dataset divided the images into three categories. The median, mean, and maximum angular errors, as well as the results of the Wilcoxon sign test with a 99 % confidence interval, for various color constancy algorithms on this dataset are shown in Table 5 (the results of the algorithms other than the proposed approach have been obtained from [38]). The authors of the weighted Gray-Edge algorithm reported the performance of their algorithm on the original nonlinear version [39] of this dataset, so we used their provided source code to compute its performance on this dataset. Again, both schemas of the proposed approach outperform the other algorithms, with the first schema performing better than the second schema and the cross-validation performance being better than the cross-dataset performance.

5.8 Discussion and further analysis

In the previous sections, the performance of the proposed approach has been evaluated on three color constancy datasets and compared to state-of-the-art methods. The evaluation showed that the proposed method outperforms the state-of-the-art algorithms in terms of median and mean angular error and that the improvement is statistically significant. In this section, we analyze the proposed approach to answer two questions: Is the combination phase necessary, or is the classification phase sufficient? How much of the performance improvement is due to the classification phase, and how much is due to the combination phase?

Table 6 shows the misclassification cost for the three best algorithms of Table 3 in terms of median angular error increment on the Gray-Ball dataset. As can be seen, the misclassification cost involving the zero-order algorithms is very high. For example, for the images for which e0,9,0 (a zero-order algorithm) is the best algorithm, using e1,2,1 (a first-order algorithm) instead of e0,9,0 increases the median angular error by 3.73°.
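A sketch of how such a misclassification cost matrix could be computed, assuming per-image angular errors for each candidate algorithm and the identity of the best algorithm per image are available; all names below are placeholders.

```python
import numpy as np

def misclassification_cost(errors_by_algorithm, best_algorithm):
    """errors_by_algorithm[name][k]: angular error of algorithm `name` on image k.
    best_algorithm[k]: name of the truly best algorithm for image k.

    Returns cost[a][b]: increase in median angular error when algorithm b is
    applied to the images whose best algorithm is a (cf. Table 6).
    """
    names = list(errors_by_algorithm)
    best = np.asarray(best_algorithm)
    cost = {a: {} for a in names}
    for a in names:
        idx = np.where(best == a)[0]   # images where a is the best algorithm
        base = np.median(np.asarray(errors_by_algorithm[a])[idx])
        for b in names:
            med_b = np.median(np.asarray(errors_by_algorithm[b])[idx])
            cost[a][b] = med_b - base
    return cost
```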


Table 6 Misclassification cost for three different algorithms in terms of median angular error increment (degrees)

Best algorithm   Predicted algorithm
                 e0,9,0   e1,2,1   e2,1,2
e0,9,0           0        3.73     3.95
e1,2,1           3.10     0        0.93
e2,1,2           4.11     0.87     0

Table 7 The performance of the proposed method with and without the combination phase on the Gray-Ball dataset (median angular error in degrees)

                          First schema   Second schema   Absolute difference
Without combination       3.96           4.04            0.08
With combination          2.9            3.1             0.20
Combination improvement   1.06           0.94            0.12

This can explain why the performance of the zero-order algorithms on the Gray-Ball dataset is worse than that of the first- or second-order algorithms, even though the fraction of the dataset for which the best algorithm is zero order is larger than the corresponding fractions for the first- and second-order algorithms (the fractions for the zero-, first-, and second-order algorithms are 41, 23, and 36 %, respectively). In fact, the zero-order algorithms tend to perform either very well or very poorly on a given image and rarely show intermediate performance. Therefore, the high classification accuracy of the proposed approach (discussed in Sect. 5.5) is very important.

To further evaluate the proposed method, another test has been performed. In this test, we used the best algorithms of Table 3 (i.e., e0,9,0, e1,2,1, and e2,1,2) and replaced the combination phase by selecting and applying one of these algorithms based on the order determined by the classification phase. In this way, we can, to some extent, separate the performance gain that comes from the classification phase from the gain that comes from the combination phase. Table 7 shows the results of this test. The combination phase improves the performance of the proposed approach by about one degree, and the remaining improvement over the other methods comes from the classification phase. Also note that, without the combination phase, the absolute difference between the performances of the two schemas is 0.08°, while the absolute difference between their combination improvements is 0.12°. This means that, of the two reasons that make the second schema less effective than the first schema, the complexity of the neural network matters more than the selection between the first- and second-order algorithms.

6 Conclusion

We have proposed a method for combining color constancy algorithms, motivated by the observation that better performance can be achieved by combining only algorithms that perform well on the input image. To this end, we have shown that, among the Gray family of color constancy algorithms, the derivative orders of the best color constancy algorithms for each image are the same.

Therefore, we have proposed a method that first determines the derivative order of the best algorithms for the input image and then combines several algorithms of this order to achieve better performance. The method is based on a classifier that determines the best derivative order using a feature vector containing the discriminative parts of multiple image features. After determining the best derivative order for the given image, a neural network combines the outputs of several algorithms of this order and provides a new estimate of the color of the light source. We have evaluated the proposed approach on three different color constancy datasets (two benchmark datasets with real-world images and one dataset with synthetic images), and the results showed that it outperforms the state-of-the-art algorithms in terms of mean and median angular error and that the improvement is statistically significant.

Acknowledgments We would like to thank the Iran National Science Foundation for its financial support of this research.

References

1. Muselet, D., Funt, B.: Color invariants for object recognition. In: Fernandez-Maloigne, C. (ed.) Advanced Color Image Processing and Analysis, pp. 327–376. Springer, New York (2013)
2. Foster, D.H.: Color constancy. Vis. Res. 51, 674–700 (2011)
3. Faghih, M.M., Moghaddam, M.: Neural gray edge: improving gray edge algorithm using neural network. In: IEEE International Conference on Image Processing (ICIP), Brussels, Belgium (2011)
4. Akhavan, T., Moghaddam, M.: A new combining learning method for color constancy. In: International Conference on Image Processing Theory Tools and Applications (IPTA), pp. 421–425 (2010)
5. Akhavan, T., Moghaddam, M.: A color constancy method using fuzzy measures and integrals. Opt. Rev. 18, 273–283 (2011)
6. Agarwal, V., Gribok, A.V., Abidi, M.A.: Machine learning approach to color constancy. Neural Netw. 20, 559–563 (2007)
7. Cardei, V., Funt, B.V., Barnard, K.: Estimating the scene illumination chromaticity using a neural network. J. Opt. Soc. Am. 19, 2374–2386 (2002)
8. Stanikunas, R., Vaitkevicius, H., Kulikowski, J.J.: Investigation of color constancy with a neural network. Neural Netw. 17, 327–337 (2004)
9. Gijsenij, A., Gevers, T., van de Weijer, J.: Generalized gamut mapping using image derivative structures for color constancy. Int. J. Comput. Vis. 86, 127–139 (2010)
10. Finlayson, G.D., Hordley, S.D.: Gamut constrained illuminant estimation. Int. J. Comput. Vis. 67, 93–109 (2006)
11. Ebner, M.: Evolving color constancy. Pattern Recognit. Lett. 27, 1220–1229 (2006)
12. Finlayson, G.D., Hordley, S.D., Hubel, P.M.: Color by correlation: a simple, unifying framework for color constancy. IEEE Trans. Pattern Anal. Mach. Intell. 23, 1209–1221 (2001)
13. Buchsbaum, G.: A spatial processor model for object colour perception. J. Frankl. Inst. 310, 1–26 (1980)
14. Provenzi, E., Gatta, C., Fierro, M., Rizzi, A.: A spatially variant white-patch and gray-world method for color image enhancement driven by local contrast. IEEE Trans. Pattern Anal. Mach. Intell. 30, 1757–1770 (2008)
15. Land, E.: The retinex theory of color vision. Sci. Am. 237, 108–128 (1977)


16. Finlayson, G.D., Trezzi, E.: Shades of gray and colour constancy. In: Color Imaging Conference, Scottsdale, Arizona (2004)
17. van de Weijer, J., Gevers, T., Gijsenij, A.: Edge-based color constancy. IEEE Trans. Image Process. 16, 2207–2214 (2007)
18. Gijsenij, A., Gevers, T.: Color constancy using natural image statistics and scene semantics. IEEE Trans. Pattern Anal. Mach. Intell. 99, 687–698 (2010)
19. Bianco, S., Ciocca, G., Cusano, C., Schettini, R.: Automatic color constancy algorithm selection and combination. Pattern Recognit. 43, 695–705 (2010)
20. Ebner, M.: Color Constancy. Wiley-IS&T Series in Imaging Science and Technology (2007)
21. Freeman, W.T., Adelson, E.H.: The design and use of steerable filters. IEEE Trans. Pattern Anal. Mach. Intell. 13, 891–906 (1991)
22. Gijsenij, A., Gevers, T., van de Weijer, J.: Improving color constancy by photometric edge weighting. IEEE Trans. Pattern Anal. Mach. Intell. 34, 918–929 (2012)
23. Barnard, K., Cardei, V., Funt, B.: A comparison of computational color constancy algorithms; part one: methodology and experiments with synthesized data. IEEE Trans. Image Process. 11, 972–984 (2002)
24. Geusebroek, J.-M., Smeulders, A.: A six-stimulus theory for stochastic texture. Int. J. Comput. Vis. 62, 7–16 (2005)
25. Barnard, K., Finlayson, G., Funt, B.: Colour constancy for scenes with varying illumination. In: Buxton, B., Cipolla, R. (eds.) Computer Vision—ECCV '96, vol. 1065, pp. 1–15. Springer, Berlin (1996)
26. Ciurea, F., Funt, B.: A large image database for color constancy research. In: Proceedings of the 11th Color Imaging Conference, pp. 160–164 (2003)
27. Dongjin, S., Dacheng, T.: Biologically inspired feature manifold for scene classification. IEEE Trans. Image Process. 19, 174–184 (2010)
28. Torralba, A., Oliva, A.: Statistics of natural image categories. Network (Bristol, England) 14, 391–412 (2003)
29. Bayazit, U.: Adaptive spectral transform for wavelet-based color image compression. IEEE Trans. Circuits Syst. Video Technol. 21, 983–992 (2011)
30. Idris, F., Panchanathan, S.: Storage and retrieval of compressed images using wavelet vector quantization. J. Vis. Lang. Comput. 8, 289–301 (1997)


31. Yongsheng, D., Jinwen, M.: Wavelet-based image texture classification using local energy histograms. IEEE Signal Process. Lett. 18, 247–250 (2011)
32. Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., Poggio, T.: Robust object recognition with cortex-like mechanisms. IEEE Trans. Pattern Anal. Mach. Intell. 29, 411–426 (2007)
33. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42, 145–175 (2001)
34. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)
35. Abe, S.: Support Vector Machines for Pattern Classification. Springer, New York (2010)
36. Wilamowski, B.M., Hao, Y.: Improved computation for Levenberg–Marquardt training. IEEE Trans. Neural Netw. 21, 930–937 (2010)
37. Gijsenij, A., Gevers, T., van de Weijer, J.: Computational color constancy: survey and experiments. IEEE Trans. Image Process. 20, 2475–2489 (2011)
38. Color Constancy website. http://www.colorconstancy.com
39. Gehler, P.V., Rother, C., Blake, A., Minka, T., Sharp, T.: Bayesian color constancy revisited. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2008)
40. Shi, L., Funt, B.: Re-processed version of the Gehler color constancy dataset of 568 images. http://www.cs.sfu.ca/~colour/data/
41. Barnard, K., Martin, L., Funt, B., Coath, A.: A data set for color research. Color Res. Appl. 27, 147–151 (2002)
42. Hogg, R.V., Tanis, E.A.: Probability and Statistical Inference. Prentice Hall, Englewood Cliffs, NJ (2001)
43. Anthony, M., Bartlett, P.L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge (1999)
44. Levenberg, K.: A method for the solution of certain nonlinear problems in least squares. Q. Appl. Math. 2, 164–168 (1944)
