ESTIMATING ILLUMINATION CHROMATICITY via KERNEL REGRESSION

Vivek Agarwal, Andrei V. Gribok, Andreas Koschan, and Mongi A. Abidi
Imaging, Robotics and Intelligent Systems Laboratory, The University of Tennessee, Knoxville

ABSTRACT

We propose a simple nonparametric linear regression tool, known as kernel regression (KR), to estimate the illumination chromaticity. We design a Gaussian kernel whose bandwidth is selected empirically. Previously, nonlinear techniques such as neural networks (NN) and support vector machines (SVM) have been applied to estimate the illumination chromaticity. However, neither technique was compared with linear regression tools. We show that the proposed method estimates chromaticity better than the NN, SVM, and linear ridge regression (RR) approaches on the same data set.

Index Terms: Kernel regression, Color constancy

1. INTRODUCTION

A color image can be represented as a product of three functions of the wavelength $\lambda$ over a visible spectrum $\omega$.
$$E_k(x, y, \lambda) = \int_{\omega} R(x, y, \lambda)\, L(\lambda)\, S_k(\lambda)\, d\lambda, \tag{1}$$

where $R(x, y, \lambda)$ is the surface reflectance, $L(\lambda)$ is the illumination property, and $S_k(\lambda)$ is the sensor characteristics. The subscript $k$ represents the sensor's response in the $k$th channel and $E_k(x, y, \lambda)$ is the image
corresponding to the kth channel (k = 1, 2, 3). If constant surface reflectance and known sensor characteristics are assumed, then any variation in illumination will change the color appearance of the image. In color constancy research, efforts are directed towards discounting the effect of illumination and obtaining a canonical color appearance. Color constancy is of significant importance in areas like video tracking, target detection, and object recognition. Barnard et al. [2] [3] provided a discussion of various techniques used to achieve color constancy, including neural networks. Initial learning approaches to the color constancy problem were based on neural networks. Cardei et al. [5] [6] and Funt et al. [10] proposed a multilayer perceptron (MLP) neural network approach in the chromaticity space. They
1424404819/06/$20.00 ©2006 IEEE
showed that neural networks achieved better color constancy than color by correlation [5]. Moore et al. [12] developed a neural network to deal with multiple illuminations. Nayak et al. [13] proposed an approach in RGB space to achieve color correction for skin tracking. Stanikunas et al. [14] investigated color constancy using a neural network and compared it to the human vision system. Funt et al. [9] proposed an algorithm based on support vector regression (SVR) to achieve color constancy. While neural networks and support vector machines have been successfully applied to achieve color constancy, they have not been compared with linear regression tools like ridge regression and kernel regression.

In this paper, we present kernel regression, a linear learning method that learns the functional relation between the input and the output. It is used to predict the output response of previously unseen test images [4]. The computation is performed in the chromaticity space to provide a fair comparison. The test data set [4] and the training data set [15] are independent. A few example training images are shown in Figure 1. Given the chromaticity estimate, the diagonal model [8] is used to obtain a color corrected image. We show through our experiment that kernel regression performs better than ridge regression and established nonlinear approaches such as neural networks and support vector machines on the same set of test images.

The rest of the paper is laid out as follows. Kernel regression is presented in Section 2. The implementation of kernel regression and experimental results are shown in Section 3. Finally, conclusions are drawn in Section 4.

2. KERNEL REGRESSION
Kernel regression is a locally weighted polynomial and nonparametric regression technique [1] [16]. There are two forms of local regression models, namely univariate regression and multivariate regression. We use multivariate regression to estimate the illumination chromaticity. The multivariate kernel regression model is given by

$$y_i = f(x_i, \beta(q)) + \varepsilon_i, \tag{2}$$

where $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$ is a vector of predictors for
ICIP 2006
Figure 1. A few example images from a training sample of 2330 images [15] used for training purposes.
the ith of n observations, m denotes the number of bins, $\varepsilon_i$ is noise, assumed to be normally and independently distributed, $y_i$ is the ith response of n observations, $\beta(q)$ are the kernel coefficients at every query q, and $f(\cdot)$ is the function relating the values of the response y to the predictors. $f(x, \beta(q))$ is a local model and can have a different set of parameters $\beta(q)$ for each query. In kernel regression, every computation is performed with respect to a query. Unless a query is provided, no further computation is performed. For this reason, it is also referred to as a lazy learning approach. The output response at every query for a nonlinear model is obtained by minimizing the cost function,
$$C(q) = \sum_{i=1}^{n} \left[ \left( f(x_i, \beta(q)) - y_i \right)^2 K_W\!\left( d(x_i, x_q) / H \right) \right], \tag{3}$$

where $K_W$ is the kernel function, $d$ is the distance function, $H$ is a diagonal bandwidth matrix containing n bandwidth values h, and $x_q$ is the query. The query is a point in the case of a univariate model and a vector in the case of a multivariate model. The best estimate of $y_i$ is obtained by minimizing the cost at every query $x_q$. Thus, from equation (3), the estimation of the response in multivariate kernel regression involves (i) defining a distance function, (ii) selecting the kernel function, (iii) selecting the bandwidth of the kernel function, and (iv) selecting the order of the polynomial fit. There are different distance measures [1]. A scalar Euclidean distance measure between the given data and the query $x_q$, used in this paper, is given as

$$d(x_i, x_q) = \sqrt{\sum_{j=1}^{m} (x_{ij} - x_{qj})^2} = \sqrt{(x_i - x_q)^T (x_i - x_q)}. \tag{4}$$

The computed distance is weighted using a kernel function. There exist different kernel functions [1] [16]. However, the choice of the kernel function becomes insignificant if the data set is large [1]. We choose the Gaussian kernel function,

$$w_q = K_H\!\left( d(x_i, x_q) / H \right) = \exp\!\left( -\left( d(x_i, x_q) / H \right)^2 \right), \tag{5}$$

where $w_q$ is the weight computed at a particular query and $K_H$ is the Gaussian kernel function. We perform a zero order polynomial surface fit on the data set. The kernel regression coefficients are obtained using

$$\beta(q) = (X_q^T W_q X_q)^{-1} X_q^T W_q Y, \tag{6}$$

where $Y = (y_1, \ldots, y_n)^T$ is the output vector, $W_q = \mathrm{diag}(w_{q1}, w_{q2}, \ldots, w_{qn})$ is the weight matrix,

$$X_q = \begin{bmatrix} 1 & x_1 - x_q & \cdots & (x_1 - x_q)^p \\ \vdots & \vdots & & \vdots \\ 1 & x_n - x_q & \cdots & (x_n - x_q)^p \end{bmatrix},$$

and $p$ is the order of the polynomial fit. The output estimate $\hat{Y}$ is obtained using the model

$$\hat{Y}(x_q, p, H) = e^T \beta(q), \tag{7}$$

where $e$ is the $(p + 1) \times 1$ vector having 1 in the first entry and 0's elsewhere.
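As a minimal sketch of equations (4) to (7), the following NumPy code computes the zero order kernel regression estimate at a single query. It assumes a single global bandwidth h in place of the matrix H, and it reads the Gaussian kernel as exp(-(d/h)^2). With p = 0 the matrix X_q is a column of ones, so equation (6) reduces to a kernel-weighted average of the training responses; both forms are shown. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_weights(X, x_q, h):
    """Gaussian kernel weights w_i = exp(-(d(x_i, x_q) / h)^2), as in eq. (5)."""
    d = np.sqrt(np.sum((X - x_q) ** 2, axis=1))  # Euclidean distance, eq. (4)
    return np.exp(-((d / h) ** 2))


def kernel_regression_zero_order(X, Y, x_q, h):
    """Zero order (p = 0) estimate at the query x_q.

    With p = 0, X_q is a column of ones, so eq. (6) reduces to a
    kernel-weighted average of the training responses.
    """
    w = gaussian_weights(X, x_q, h)
    return (w[:, None] * Y).sum(axis=0) / w.sum()


def kernel_regression_wls(X, Y, x_q, h):
    """The same estimate computed literally from eqs. (6) and (7) with p = 0."""
    w = gaussian_weights(X, x_q, h)
    X_q = np.ones((X.shape[0], 1))   # design matrix for p = 0
    W_q = np.diag(w)                 # diagonal weight matrix
    beta = np.linalg.solve(X_q.T @ W_q @ X_q, X_q.T @ W_q @ Y)  # eq. (6)
    e = np.array([1.0])              # selects the first coefficient
    return e @ beta                  # eq. (7)


# Toy data (hypothetical): 50 binary histograms of length 16, 2-D (r, g) targets.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(50, 16)).astype(float)
Y_train = rng.uniform(0.2, 0.5, size=(50, 2))
x_query = X_train[0]

est_avg = kernel_regression_zero_order(X_train, Y_train, x_query, h=2.0)
est_wls = kernel_regression_wls(X_train, Y_train, x_query, h=2.0)
```

Because the weights are recomputed for every query, nothing is fitted ahead of time, which is the lazy learning behaviour described above. A very large h makes all weights nearly equal, so the estimate tends to the plain mean of the training responses.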
3. EXPERIMENTAL RESULTS
In this section, we explain how kernel regression is used to estimate the illumination chromaticity. All the computation is performed in the chromaticity space. A training data set [15] of 2330 images and a test data set [4] of 320 images are used in our experiment. The two data sets are independent of each other. Each training image is converted into chromaticity space, sampled, and binarized to obtain a 2D binary chromaticity histogram. The bin width was selected empirically, and 32 by 32 bins provided the best results. Larger bin width selections did not improve the results much but slowed down the process drastically, while smaller bin width selections performed poorly. This bin width selection is in accordance with [5]. In the 32 by 32 binary histogram, 1's represent the presence of the chromaticity information and 0's represent its absence. The histogram is converted into a column vector of 1024 points. Thus an input design matrix X of size 2330 by 1024 is obtained. The output vector Y for the training images is computed as in [10].

Each column vector is then selected sequentially as a query vector, and the kernel coefficients $\beta$ are computed for every query vector for a fixed bandwidth matrix. The value of the bandwidth affects the output estimate, in the sense that it overfits or underfits the data. Therefore, optimal selection of the bandwidth is essential to obtain the right model. When a single bandwidth h is used for the entire data set, it is referred to as a global bandwidth. There are many techniques [1] to perform optimal bandwidth selection. We selected the global bandwidth empirically. In the test phase, the optimal bandwidth obtained during training is used to estimate the output illumination chromaticity of the 320 calibrated test images in the data set. The actual chromaticity values for the data set [4] are known.
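The histogram construction described above can be sketched as follows. This is an illustration under stated assumptions and not the authors' code: it assumes the usual definitions r = R / (R + G + B) and g = G / (R + G + B), bins both chromaticities uniformly on [0, 1], and skips pixels whose channel sum is zero. The function name and the test image are hypothetical.

```python
import numpy as np


def binary_chromaticity_histogram(rgb, bins=32):
    """Return a flattened bins*bins binary histogram of (r, g) chromaticities.

    Assumes r = R / (R + G + B) and g = G / (R + G + B); pixels whose
    channel sum is zero carry no chromaticity and are skipped.
    """
    pixels = rgb.reshape(-1, 3).astype(float)
    total = pixels.sum(axis=1)
    valid = total > 0
    r = pixels[valid, 0] / total[valid]
    g = pixels[valid, 1] / total[valid]

    # Map chromaticities in [0, 1] to bin indices 0 .. bins - 1.
    r_idx = np.minimum((r * bins).astype(int), bins - 1)
    g_idx = np.minimum((g * bins).astype(int), bins - 1)

    hist = np.zeros((bins, bins), dtype=np.uint8)
    hist[r_idx, g_idx] = 1      # 1 = chromaticity present, 0 = absent
    return hist.ravel()         # vector of bins * bins entries


# Hypothetical 4 x 4 test image with two distinct colours.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:2] = (200, 100, 100)   # r = 0.5,  g = 0.25
image[2:] = (50, 50, 100)     # r = 0.25, g = 0.25
feature = binary_chromaticity_histogram(image)
```

Stacking one such 1024-element vector per training image, row by row, would give a design matrix with the 2330 by 1024 shape mentioned in the text.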
The root mean square (RMS) rg error between the estimated and actual illumination chromaticity is computed using equation (8),

$$\mathrm{RMS}_{rg} = \left( \frac{1}{N} \sum_{i=1}^{N} \frac{1}{K} \sum_{k=1}^{K} \left( e_i^{(k)} - t_i^{(k)} \right)^2 \right)^{1/2}, \tag{8}$$

where N is the number of images, K is the number of channels (K = 2 for chromaticity space and K = 3 for RGB space), e is the estimated illuminant chromaticity, and t is the target illuminant chromaticity. Given the chromaticity estimate, a diagonal model [8] is used to obtain color corrected images. The intensity of the image pixels is adjusted such that the average intensity of the image remains constant.

A neural network of 1024-10-2 architecture using a scaled conjugate gradient training algorithm is used to compute the illumination chromaticity. In the architecture, 10 represents the number of hidden nodes in the single hidden layer and 2 represents the number of output nodes corresponding to the chromaticity r and g respectively.

Support vector regression seeks a continuous function that learns the input-output functional association by mapping them linearly in a higher dimensional feature space. The feature space for nonlinear SVR is defined by kernel functions. Due to its performance in terms of algorithm convergence and robustness, we selected the Radial Basis Function (RBF) kernel. In our experiments, the C++ implementation of SVR suggested by Collobert et al. [7] is used.

The columns of the input design matrix X are highly correlated. Therefore, the matrix $(X^T X)$ in equation (9) is rank deficient and its inverse does not exist. An approach known as ridge regression for solving a rank deficient system was proposed by Hoerl et al. [11],

$$W_{ridge} = (X^T X + \lambda I)^{-1} X^T Y, \tag{9}$$

where I is the identity matrix and $\lambda$ is the regularization parameter. Figure 2 shows an example of the results obtained using each algorithm and Table I shows the computed RMS rg errors. Table I also contains the error value computed when no correction is performed, i.e., the output image is the same as the input image.
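The ridge regression baseline of equation (9) can be illustrated with a short NumPy sketch. The data below are synthetic and hypothetical: a small design matrix whose last column duplicates the first, so that X^T X is rank deficient, as the paper states for the real 2330 by 1024 matrix. The value of the regularization parameter is an arbitrary choice for illustration, since the paper does not report it.

```python
import numpy as np


def ridge_fit(X, Y, lam):
    """Solve (X^T X + lam * I) W = X^T Y for W, as in eq. (9)."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ Y)


# Hypothetical rank-deficient design matrix: the last column duplicates the first.
rng = np.random.default_rng(1)
base = rng.normal(size=(20, 3))
X = np.hstack([base, base[:, :1]])
Y = rng.uniform(0.2, 0.5, size=(20, 2))

rank = np.linalg.matrix_rank(X.T @ X)   # below 4 because of the duplicate column
W_ridge = ridge_fit(X, Y, lam=0.1)      # well defined for lam > 0
Y_hat = X @ W_ridge                     # fitted (r, g) responses
```

Solving the linear system directly avoids forming the explicit inverse, which is a common numerical choice and not something the paper specifies.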
Figure 2. An example showing the color correction using each algorithm on the "Macbeth" real image. (a) Target image [4], (b) Input image [4], (c) Kernel Regression, (d) Ridge regression, (e) Support vector machine, and (f) Neural network.
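The color correction step behind results such as those in Figure 2 can be sketched with a diagonal model: each channel is scaled by the ratio of a canonical illuminant to the estimated illuminant, and the result is rescaled so that the average intensity is unchanged. The sketch below makes several assumptions that the paper does not state: the canonical illuminant is equal-energy white, b is recovered as 1 - r - g, the average intensity is the mean over all pixel values, and every component of the estimated illuminant is strictly positive. The function name and the toy scene are hypothetical.

```python
import numpy as np


def diagonal_correct(rgb, est_rg, canonical_rg=(1 / 3, 1 / 3)):
    """Colour-correct an image with a diagonal (per-channel scaling) model.

    est_rg is the estimated illuminant chromaticity (r, g); b = 1 - r - g.
    The canonical illuminant defaults to equal-energy white, which is an
    assumption of this sketch.
    """
    img = rgb.astype(float)
    est = np.array([est_rg[0], est_rg[1], 1.0 - est_rg[0] - est_rg[1]])
    can = np.array([canonical_rg[0], canonical_rg[1],
                    1.0 - canonical_rg[0] - canonical_rg[1]])
    corrected = img * (can / est)   # diagonal transform, one gain per channel

    # Rescale so that the average intensity of the image is unchanged.
    corrected *= img.mean() / corrected.mean()
    return corrected


# Hypothetical grey scene lit by a reddish illuminant of chromaticity (0.5, 0.3).
scene = np.ones((2, 2, 3)) * np.array([100.0, 60.0, 40.0])
out = diagonal_correct(scene, est_rg=(0.5, 0.3))
```

In this toy case the three channels of the corrected image come out equal, which is what a grey surface should look like once the color cast of the illuminant is removed.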
Table I: RMS error of (r, g) chromaticity space of all the test images applying each algorithm.

Algorithm                  RMS rg error
No correction              0.124
Kernel regression          0.052
Ridge regression           0.060
Neural network             0.071
Support vector machines    0.066
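The error measure reported in Table I can be sketched as below. The code reads equation (8) as the square root of the squared error averaged first over the K channels of each image and then over the N images. That reading is an assumption, as are the function name and the example values, which are not taken from the paper's data.

```python
import numpy as np


def rms_rg(estimated, target):
    """RMS error of eq. (8): average the squared error over the K channels of
    each image, average over the N images, then take the square root."""
    estimated = np.asarray(estimated, dtype=float)
    target = np.asarray(target, dtype=float)
    per_image = np.mean((estimated - target) ** 2, axis=1)  # (1/K) sum over k
    return np.sqrt(np.mean(per_image))                      # (1/N) sum over i


# Hypothetical (r, g) estimates and targets for N = 2 images, K = 2 channels.
est = [[0.30, 0.30], [0.40, 0.20]]
tgt = [[0.30, 0.30], [0.30, 0.30]]
error = rms_rg(est, tgt)   # sqrt(0.005), roughly 0.0707
```

The "No correction" row of the table would correspond to using the chromaticity of the uncorrected input in place of an estimate.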
From the example in Figure 2 and the RMS rg errors in Table I, we observe that kernel regression performed better chromaticity estimation than NN, SVM and RR.

4. CONCLUSIONS

In this paper, we presented how kernel regression can be applied to estimate the illumination chromaticity of images, acquired under unknown conditions, based on the trained model. It is a linear technique that can work with nonlinear data. In our experiments, this method obtained better illumination chromaticity estimation compared to neural networks, support vector machines and ridge regression. Kernel regression is computationally less expensive than NN and SVM. We also observed that linear learning techniques performed better than nonlinear learning techniques. The ability of the kernel regression to handle nonlinearity was possibly the reason that it performed better than the ridge regression which cannot handle nonlinear data.

5. ACKNOWLEDGEMENTS

The authors would like to thank the computer vision laboratory at Simon Fraser University for making the image data set publicly available. The training image data base can be found at http://www-2.cs.cmu.edu/~chuck/nips-2003/. This work was supported by the DOE University Research Program in Robotics under grant DOE-DE–FG522004NA25589 and by FAA/NSSA Program, R01-134448/49.

6. REFERENCES

[1] C.G. Atkeson, A.W. Moore, and S. Schaal, “Locally Weighted Learning,” Artificial Intelligence Review, vol. 11, pp. 11-73, 1997.
[2] K. Barnard, V.C. Cardei, and B.V. Funt, “A Comparison of Computational Color Constancy Algorithms – Part I: Methodology and Experiments with Synthesized Data,” IEEE Transactions on Image Processing, vol. 11, no. 9, pp. 972-983, 2002.
[3] K. Barnard, L. Martin, A. Coath, and B.V. Funt, “A Comparison of Computational Color Constancy Algorithms – Part II: Experiments with Image Data,” IEEE Transactions on Image Processing, vol. 11, no. 9, pp. 985-996, 2002.
[4] K. Barnard, L. Martin, B. Funt, and A. Coath, “A Data Set for Color Research,” Color Research and Application, vol. 27, no. 3, pp. 147-151, 2002.
[5] V. Cardei, B.V. Funt, and K. Barnard, “Estimating the Scene Illumination Chromaticity Using a Neural Network,” Journal of the Optical Society of America A, vol. 19, no. 12, pp. 2374-2386, 2002.
[6] V. Cardei, B.V. Funt, and K. Barnard, “Modeling Color Constancy with Neural Networks,” in Proceedings of International Conference on Vision, Recognition, and Action: Neural Models of Mind and Machine, May 29-31, 1997.
[7] R. Collobert and S. Bengio, “Support Vector Machines for Large-Scale Regression Problems,” Journal of Machine Learning Research, vol. 1, pp. 143-160, 2001.
[8] G.H. Finlayson, M.S. Drew, and B.V. Funt, “Color Constancy: Generalized Diagonal Transforms Suffice,” Journal of Optical Society of America A, vol. 11, no. 11, pp. 3011-3020, 1994.
[9] B.V. Funt and W. Xiong, “Estimating Illumination Chromaticity via Support Vector Regression,” in Proceedings of Twelfth Color Imaging Conference: Color Science and Engineering Systems and Applications, pp. 47-52, 2004.
[10] B.V. Funt and V. Cardei, “Bootstrapping Color Constancy,” in Proceedings of SPIE, Electronic Imaging IV, vol. 3644, pp. 421-428, 1999.
[11] A.E. Hoerl and R.W. Kennard, “Ridge Regression: Biased Estimation for Nonorthogonal Problems,” Technometrics, vol. 12, no. 1, pp. 55-67, 1970.
[12] A. Moore, J. Allman, and R.M. Goodman, “A Real Time Neural System for Color Constancy,” IEEE Transactions on Neural Networks, vol. 2, no. 2, pp. 237-247, 1991.
[13] A. Nayak and S. Chaudhuri, “Self-induced Color Correction for Skin Tracking under Varying Illumination,” in Proceedings of International Conference on Image Processing, pp. 1009-1012, 2003.
[14] R. Stanikunas, H. Vaitkevicius, and J.J. Kulikowski, “Investigation of Color Constancy with a Neural Network,” Neural Networks, vol. 17, pp. 327-337, 2004.
[15] C. Rosenberg, M. Herbert, and A. Ladsariya, “Bayesian Color Constancy with Non-Gaussian Models,” appeared as Poster in Proceedings of NIPS, 2003.
[16] M.P. Wand and M.C. Jones, “Kernel Smoothing,” CRC Press LLC, Florida, 2000.