Feature Extraction and Classification in Surface Grading Application Using Multivariate Statistical Projection Models

José M. Prats-Montalbán (a), Fernando López (b), José M. Valiente (b) and Alberto Ferrer (a)

(a) Department of Applied Statistics (DEIOAC)
(b) Department of Computer Engineering (DISCA)
Technical University of Valencia, Camino de Vera s/n, 46022 Valencia, Spain

ABSTRACT

In this paper we present an innovative way to simultaneously perform feature extraction and classification for the quality control issue of surface grading by applying two well known multivariate statistical projection tools (SIMCA and PLS-DA). These tools have been applied to compress the color texture data describing the visual appearance of surfaces (soft color texture descriptors) and to directly perform classification using statistics and predictions computed from the extracted projection models. Experiments have been carried out using an extensive image database of ceramic tiles (VxC TSG). This image database comprises 14 different models, 42 surface classes and 960 pieces. A factorial experimental design has been carried out to evaluate all the combinations of several factors affecting the accuracy rate. Factors include tile model, color representation scheme (CIE Lab, CIE Luv and RGB) and compression/classification approach (SIMCA and PLS-DA). In addition, a logistic regression model is fitted from the experiments to compute accuracy estimates and study the effects of the factors. The results show that PLS-DA performs better than SIMCA, achieving a mean accuracy rate of 98.95%. These results outperform those obtained in a previous work [1], where the soft color texture descriptors in combination with the CIE Lab color space and the k-NN classifier achieved an accuracy of 97.36%.

Keywords: SIMCA, PLS-DA, Soft color texture descriptors, Design of experiments, Logistic regression

1. INTRODUCTION

At present, there are many industries manufacturing flat surface materials that need to split their production into homogeneous series grouped by the global appearance of the final product. These kinds of products are used as wall and floor coverings. Some of them are natural products such as marble, granite or wooden boards, and others are artificial, such as ceramic tiles. At the moment, these industries rely on human operators to carry out the task of surface grading. However, human grading is subjective and often inconsistent between different graders (reproducibility) and even within the same grader (repeatability) [2]; thus, automatic and reliable systems are needed. Also, real-time compliance to inspect the overall production at on-line rates is an important issue.

In a recent work [1] we successfully approached the issue of surface grading using a set of soft color texture descriptors which were able to achieve a global success ratio of 97.36% in the VxC TSG image database. The work was performed using two well established statistical tools, experimental design [3] and logistic regression [4], in order to test the k-NN classifier, several color spaces, cross-validation schemes and all the possible combinations of color texture features. In all color spaces, the best results were achieved using all the color texture descriptors. Thus, no feature selection was performed in the set of color texture features.

As reducing computational costs is an important issue in inspection applications, we have now focused our attention on statistical methods able to carry out an efficient compression of the feature space, that is, able to perform an efficient feature extraction. Thus, in this paper we present a study based on Multivariate Statistical Projection Models which make this compression possible. Moreover, we apply SIMCA and PLS-DA methods not only to carry out the feature extraction on color texture descriptors, but also to directly perform classification using statistics and predictions computed from the extracted projection models.

The paper is organized as follows. Section 2 presents an overview of the literature related to the surface grading issue and a presentation of the soft color texture descriptors method. In Section 3 the Multivariate Statistical Projection Approaches, SIMCA and PLS-DA, are introduced. Section 4 presents the experimental design and logistic regression. Section 5 introduces the VxC TSG database, used as ground truth in the experiments. Section 6 deals with the experiments carried out and their results. Finally, Section 7 concludes the paper.

Further author information: (Send correspondence to José M. Prats-Montalbán) E-mail: [email protected]

2. BACKGROUND

2.1. Surface Grading

Surface grading is related to the automatic classification of flat pieces presenting random surface patterns. The aim of surface grading is to split the production into different classes sorted by their global appearance, which depends on color and texture properties. In recent years many approaches to surface grading have been reported (see Table 1). Boukouvalas et al. [5] proposed color histograms and dissimilarity measures of these distributions to grade ceramic tiles.

Table 1. Summary of surface grading literature.

work          ground truth      features        time study
Boukouvalas   ceramic tiles     color           no
Baldrich      polished tiles    color/texture   no
Lumbreras     polished tiles    color/texture   no
Peñaranda     polished tiles    color/texture   yes
Kauppinen     wood              color           yes
Kyllönen      wood              color/texture   no
Lebrun        marble            color/texture   no
Kukkonen      ceramic tiles     color           no

accuracy (reported values): 92.0% 93.3% 80.0% 98.0% 80.0%

Other works consider a specific type of ceramic tile: polished porcelain tiles, which imitate granite. These works include texture features. Baldrich et al. [6] proposed a perceptual approximation based on the use of discriminant features defined by human classifiers at the factory. These features mainly concerned grain distribution and size. The method included grain segmentation and feature measurement. Lumbreras et al. [7] joined color and texture through multiresolution decompositions on several color spaces. They tested combinations of multiresolution decomposition schemes (Mallat's, à trous and wavelet packets), decomposition levels and color spaces (Grey, RGB, Ohta and Karhunen-Loève transform). Peñaranda et al. [8] used the first and second histogram moments of each RGB space channel.

Kauppinen [2] developed a method for grading wood based on the percentile (or centile) features of histograms calculated for RGB channels. Kyllönen et al.'s approach [9] used color and texture features. They chose centiles for color, and LBP (Local Binary Pattern) occurrence histograms for texture description. Lebrun and Macaire [10] described the surfaces of the Portuguese 'Rosa Aurora' marble using the mean color of the background and the mean color, absolute density and contrast of marble veins. They achieved good results but their approach was very dependent on the visual properties of this marble. Finally, Kukkonen et al. [11] presented a system for grading ceramic tiles using spectral images. However, spectral images have the drawback of producing great amounts of data.

In the literature review, we found that many of these approaches specialized in a specific type of surface, others were not accurate enough or simply did not provide accuracy information, most of them performed only brief experimental work, and yet others did not take into account the time restrictions of a real inspection process at the factory. Thus, we thought surface grading was still an open field where more contributions were possible. In this sense, we attempted to fill these gaps in the literature by presenting a generic method suitable for use in a wide range of random surfaces. We also carried out extensive experimentation and achieved good accuracy results with a representative data set of ceramic tiles. Our approach [1] used what we called soft color texture descriptors, which are global color and texture statistics that are simple and fast to compute. Thus, it is also appropriate for real-time compliance.

2.2. Soft Color Texture Descriptors Method

The soft color texture descriptors method is simple: a set of statistical features describing color and texture properties is collected [12]. The features are computed in a perceptually uniform color space (CIE Lab or CIE Luv). These statistics form a feature vector used in the classification stage, where the well known k-NN was chosen as classifier.

CIE Lab and CIE Luv were designed to be perceptually uniform. The term 'perceptual' refers to the way that humans perceive colors, and 'uniform' implies that the perceptual difference between two coordinates (two colors) will be related to a measure of distance, which commonly is the Euclidean distance. Thus, color differences can be measured in a way close to the human perception of colors. These spaces were chosen to provide an accurate and perceptual approach to color difference computation. As the data set images were originally acquired in RGB, conversion to CIE Lab or CIE Luv coordinates was needed. This conversion is performed using the standard RGB to CIE Lab and RGB to CIE Luv transformations [13]. Following the ITU-R Recommendation BT.709, we used the illuminant D65 in the formulae.

We proposed several statistical features for describing surface appearance. For each color channel we chose the mean and the standard deviation. Also, by computing the histogram of each color channel, we were able to calculate histogram moments. The nth moment of z about the mean is defined as:

$$\mu_n(z) = \sum_{i=1}^{L} (z_i - m)^n \, p(z_i)$$

where z is the random variable, p(z_i), i = 1, 2, ..., L, is the histogram, L is the number of different variable values and m is the mean value of z. We chose histogram moments from 2nd to 5th, which are related to textural information [12].
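The descriptor computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 256-bin histogram, the 0-255 value range and the ordering of the resulting feature vector are all hypothetical choices, and the synthetic image merely stands in for a tile already converted to the chosen color space.

```python
import numpy as np

def histogram_moment(channel, n, bins=256, value_range=(0.0, 255.0)):
    """n-th histogram moment about the mean: sum_i (z_i - m)^n p(z_i)."""
    counts, edges = np.histogram(channel, bins=bins, range=value_range)
    p = counts / counts.sum()                # normalised histogram p(z_i)
    z = 0.5 * (edges[:-1] + edges[1:])       # bin centres used as the values z_i
    m = np.sum(z * p)                        # mean value of z
    return np.sum((z - m) ** n * p)

def soft_descriptors(image):
    """Mean, standard deviation and 2nd to 5th histogram moments per channel."""
    features = []
    for c in range(image.shape[2]):
        channel = image[:, :, c].astype(float).ravel()
        features.append(channel.mean())
        features.append(channel.std())
        features.extend(histogram_moment(channel, n) for n in range(2, 6))
    return np.array(features)

# Synthetic stand-in for a three-channel tile image (assumption, not real data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3))
features = soft_descriptors(image)           # 3 channels x 6 statistics
```

For CIE Lab or CIE Luv data the `value_range` argument would have to match the range of each channel, which is why it is exposed as a parameter here.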

3. MULTIVARIATE STATISTICAL PROJECTION APPROACHES

3.1. SIMCA (Soft Independent Modelling of Class Analogy)

The SIMCA approach, Soft Independent Modelling of Class Analogy [14], consists of building one Principal Component Analysis (PCA) [15] model for each of the studied surface grades, in order to determine whether or not new observations belong to one of them. This is performed by calculating the distance to each of the built models, computed as the Residual Sum of Squares (RSS), and assigning the new observation (ceramic tile) to the model showing the lowest distance. In this case, when creating each of the PCA models, each surface grade has been mean centered and scaled by its own mean and standard deviation.

In order to understand the mechanism of this approach, let us introduce the PCA model. PCA is a well known method that projects the original variables onto new ones, called latent variables, which express the inner relationship between the original ones by maximising the variance in the data structure. A PCA model can be expressed in matrix notation as:

$$X = A B^T + R$$

where A (I×F) is the score matrix, B (J×F) is the loading matrix, R is the residual matrix and F is the number of latent variables, called principal components. These new latent variables are extracted from the data according to their eigenvalues, extracting and compressing the most relevant information in a few orthogonal vectors. This is why PCA has been traditionally used in Pattern Recognition as a feature extraction technique since, by combining the weights of the original variables in the new ones, two objectives are achieved: on the one hand, a reduction of the data dimensionality; on the other hand, a fulfilment of the independence and normality assumptions required by most of the classifiers used a posteriori to perform the classification [16].

However, in SIMCA, a training procedure is performed building a PCA model for each surface grade of a given tile model. Once the loading matrix B has been obtained, the scores for the feature vector of any new testing image X_new are calculated by projecting this vector onto B:

$$A_{new} = X_{new} B$$

When the scores of the new image are obtained, the residuals can be computed as:

$$E_{new} = X_{new} - A_{new} B^T$$

From these residuals (E_new) associated to each image, the RSS statistic is computed [17], which measures the squared Euclidean distance to each built model. This way, the lower the RSS value of a new image to a PCA model, the closer it is to that model. Thus, the SIMCA approach works by assigning the image to the surface grade related to the PCA model with the lowest RSS. Here, PCA models are directly used as a classifier as well as a feature extraction tool.
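The SIMCA decision rule described above (one PCA model per grade, assignment by lowest RSS) can be sketched as follows. This is a hedged illustration only: the function names, the use of an SVD to obtain the loadings, the choice of two components and the synthetic data are assumptions made for the example, not details taken from the paper.

```python
import numpy as np

def fit_class_pca(X, n_components):
    """Fit one PCA model on the training features of a single surface grade."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant features
    Z = (X - mean) / std                     # class-wise centring and scaling
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    B = Vt[:n_components].T                  # loading matrix B (J x F)
    return mean, std, B

def rss(x_new, model):
    """Residual sum of squares of a new feature vector for one PCA model."""
    mean, std, B = model
    z = (x_new - mean) / std
    a_new = z @ B                            # scores: A_new = X_new B
    e_new = z - a_new @ B.T                  # residuals: E_new = X_new - A_new B^T
    return float(e_new @ e_new)

def simca_predict(x_new, models):
    """Assign the observation to the grade whose model gives the lowest RSS."""
    return min(models, key=lambda grade: rss(x_new, models[grade]))

# Synthetic data (assumption): three grades, each close to its own 2-D subspace.
rng = np.random.default_rng(0)
train, test = {}, {}
for grade, offset in enumerate([0.0, 10.0, 20.0]):
    latent = rng.normal(size=(40, 2))
    loadings = rng.normal(size=(2, 6))
    data = offset + latent @ loadings + 0.1 * rng.normal(size=(40, 6))
    train[grade], test[grade] = data[:30], data[30:]

models = {g: fit_class_pca(X, n_components=2) for g, X in train.items()}
hits = [simca_predict(x, models) == g for g, X in test.items() for x in X]
accuracy = float(np.mean(hits))
```

Because each grade is scaled with its own mean and standard deviation, the RSS values are computed in different scaled spaces; the sketch follows the text in comparing them directly.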

3.2. PLS-DA (Partial Least Squares Discriminant Analysis)

Another Multivariate Projection approach is the discriminant version of the Partial Least Squares (PLS) [18] model, called Partial Least Squares Discriminant Analysis (PLS-DA) [19]. PLS is a multivariate predictive model, similar to PCA in the sense that it also seeks to reduce the size of the original data structure, while at the same time generating new latent variables that explain the inner relations between an input matrix X (I×J) and an output response matrix Y (I×C), by maximising the covariance between X and Y. In our case, X relates to the feature matrix, and Y to the class matrix obtained from the surface grades.

In the PLS model, the weight matrix W (J×F) is the one maximising the covariance between X and Y. This matrix makes it possible to obtain the feature extraction vectors or scores T (I×F) by means of

$$T = X W (P^T W)^{-1}$$

where P (J×F) is the loading matrix for X, which makes it possible to test whether or not a new image belongs to the built model. Thus, a reject criterion can be applied. X can be expressed as:

$$X = T P^T + R$$

where R (I×J) is a residual matrix. On the other hand, Y can also be expressed in a latent or feature space as:

$$Y = U Q^T + S$$

where U (I×F) and Q (C×F) are the score and loading matrices for Y (I×C) and S (I×C) is a residual matrix. Finally, using the inner relationship between the feature vectors T and U gathered by B (F×F), U = TB, it is possible to predict the response matrix Y:

$$Y_{pred} = T B Q^T$$

By means of the PLS-Discriminant Analysis (PLS-DA) [19] approach, PLS models can be used for classification purposes. In this case, it is necessary to build a Y matrix formed by as many dummy variables as classes (surface grades) present in the training set. These dummy variables are column vectors with value 1 for those images (feature vectors) linked to the corresponding class (surface grade), and 0 for the rest. This way, the PLS-DA model tries to find those directions that maximise the separation between the different classes in the training set. Finally, by projecting any new testing image onto the PLS-DA model, its prediction with respect to each class is obtained.

One traditionally applied classifier in Pattern Recognition is LDA [4], which is used after a feature extraction process in order to fulfil the LDA requirements linked to data multicollinearity, normality assumptions and homogeneity between classes, as well as the relation between the number of samples and variables. When using LDA, the dimensionality reduction model used is PCA, not PLS. Thus, some latent variables providing discriminating information may not be taken into account when extracting the feature vectors, unless many latent variables are extracted (and this is not our aim, since we are also interested in dimensionality reduction). This happens because PCA does not look for discriminating directions (latent variables) between classes, but for latent variables maximising the variance for each class. However, this hidden information may be important for the discrimination between classes.

PLS-DA is different from the more traditional Fisher's Linear Discriminant Analysis (LDA) [20], and has advantages since it does not suffer from the restrictions mentioned above. In PLS-DA, since maximising the covariance between X and Y is the aim, this information will be directly gathered by the first latent variables of the PLS-DA model. Thus, by using PLS-DA we achieve two goals at the same time: extracting the features that help in segregating the classes (surface grades) in the model, and minimising the dimension of the feature extraction data structure. A comparison of different discriminant analysis techniques can be found in [21].
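The PLS-DA procedure above can be illustrated with a compact NumPy sketch. It is an assumption-laden example rather than the authors' code: it uses a common SVD-based variant of the PLS2 algorithm in which the inner relation B is folded into the Y loadings (so the prediction is written T Q^T), assigns each observation to the class with the largest predicted dummy value, and runs on synthetic data. All function names are hypothetical.

```python
import numpy as np

def pls_fit(X, Y, n_components):
    """PLS2 on centred X (I x J) and Y (I x C); returns W, P, Q."""
    X, Y = X.copy(), Y.copy()
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of X^T Y, i.e. the
        # direction in X with maximal covariance with Y.
        w = np.linalg.svd(X.T @ Y, full_matrices=False)[0][:, 0]
        t = X @ w                            # score vector
        p = X.T @ t / (t @ t)                # X loading
        q = Y.T @ t / (t @ t)                # Y loading (inner relation folded in)
        X -= np.outer(t, p)                  # deflate X
        Y -= np.outer(t, q)                  # deflate Y
        W.append(w); P.append(p); Q.append(q)
    return np.column_stack(W), np.column_stack(P), np.column_stack(Q)

def plsda_train(X, labels, n_components):
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)  # dummy matrix
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    W, P, Q = pls_fit(X - x_mean, Y - y_mean, n_components)
    R = W @ np.linalg.inv(P.T @ W)           # so that T = X W (P^T W)^-1
    return classes, x_mean, y_mean, R, Q

def plsda_predict(X_new, model):
    classes, x_mean, y_mean, R, Q = model
    T_new = (X_new - x_mean) @ R             # extracted features (scores)
    Y_pred = T_new @ Q.T + y_mean            # predicted dummy values
    return classes[np.argmax(Y_pred, axis=1)]

# Synthetic data (assumption): three well separated classes in six dimensions.
rng = np.random.default_rng(0)
centres = np.zeros((3, 6))
centres[1, 0] = 8.0
centres[2, 1] = 8.0
labels = np.repeat(np.arange(3), 30)
X = centres[labels] + rng.normal(size=(90, 6))

model = plsda_train(X, labels, n_components=2)
accuracy = float(np.mean(plsda_predict(X, model) == labels))
```

Only two latent variables are extracted from the six original features here, which mirrors the point made in the text: the first PLS directions already carry the information that separates the classes.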

4. DESIGN OF EXPERIMENTS AND LOGISTIC REGRESSION

In this work we have 14 different ceramic tile models to classify, three possible color spaces to use (CIE Lab, CIE Luv and RGB) and two compression/classification approaches to apply (SIMCA and PLS-DA). In order to efficiently analyze the influence of these factors on the accuracy, a Design Of Experiments (DOE) was required [3]. In this case, we opted to use a complete factorial experimental design, which includes all the possible combinations of the factor levels in the model. This way, each studied effect related to each factor and its interactions with the other factors is orthogonal to the rest, making it possible to analyze the statistical significance of every effect separately, and to find the combination that maximizes the accuracy in surface grading for each ceramic tile model. Thus, a 14^1 × 3^1 × 2^1 complete factorial design has been built, leading to an 84-run Design Of Experiments.

Once the classification accuracy has been registered for each run (corresponding to each ceramic tile model, color space and classification approach), an adequate tool to analyse these results is needed. Commonly, a linear regression model is used to predict the response variable. But when dealing with the analysis of percentages, as is the case of the classification accuracy, the proper tool to employ is the logistic regression model, where the original percentage variable p is transformed by the logit function into the variable log[p/(1 − p)]. This transformation ensures that the prediction will lie in [0, 1], hence avoiding estimation and interpretation problems [20], since the p variable does not follow a normal distribution, as shown in Figure 1.

A multiple logistic regression model, where several explanatory variables X_i are involved (it does not matter whether they correspond to simple factors or interactions), can be expressed as:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \sum_i \beta_i X_i$$

Expressing the model in terms of p:

$$p = \frac{e^{\beta_0 + \sum_i \beta_i X_i}}{1 + e^{\beta_0 + \sum_i \beta_i X_i}}$$

where the expression β_0 + Σβ_i X_i is called alpha (α).

The interpretation of this model depends on the nature of the variables involved. When X_i corresponds to a quantitative variable, and assuming constant values for the rest of the variables, a positive β_i value means that as the value of X_i increases, the probability of accurate classification also increases. If β_i is negative, the higher the value of X_i, the lower the probability. However, when dealing with qualitative variables, as is our case, the interpretation is different, since we need to create, for each factor with K categories, K − 1 dummy variables (binary variables) that evaluate the difference in accuracy with respect to the k-th category used as a reference. This way, for one specific categorical variable, a positive (or negative) β_i value related to one of the categories of the variable means that the probability of accurate classification in that category is higher (or lower) than that achieved by the category used as a reference. In order to decide which terms should be part of the final model and to remove the non-statistically significant ones, the forward selection stepwise method has been applied [20].
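As a small illustration of the design and of the transformation discussed above, the following Python sketch enumerates the complete factorial design and implements the logit function, its inverse and the K − 1 dummy coding of a categorical factor. The helper names are hypothetical; the factor levels are those listed in the paper.

```python
import itertools
import math

tile_models = ["Agata", "Antique", "Berlin", "Campinya", "Firenze", "Lima",
               "Marfil", "Mediterranea", "Oslo", "Petra", "Santiago",
               "Somport", "Vega", "Venice"]
color_spaces = ["CIE Lab", "CIE Luv", "RGB"]
approaches = ["SIMCA", "PLS-DA"]

# Complete factorial design: every combination of the factor levels.
runs = list(itertools.product(tile_models, color_spaces, approaches))

def logit(p):
    """Transform a proportion p in (0, 1) into log[p / (1 - p)]."""
    return math.log(p / (1.0 - p))

def inverse_logit(alpha):
    """Recover p from alpha; algebraically equal to e^alpha / (1 + e^alpha)."""
    return 1.0 / (1.0 + math.exp(-alpha))

def dummy_code(level, levels, reference):
    """K - 1 binary variables for a factor with K categories."""
    return [1.0 if level == other else 0.0
            for other in levels if other != reference]

n_runs = len(runs)
petra_dummies = dummy_code("Petra", tile_models, reference="Agata")
```

Whatever the value of α, `inverse_logit` returns a number strictly between 0 and 1, which is the property that makes the logistic model suitable for accuracies expressed as proportions.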

[Plot: probability (vertical axis, from −1 to 2) against the explanatory variable (horizontal axis, from 0 to 200), comparing the logistic regression and linear regression curves.]

Figure 1. Logistic vs linear regression.

5. VXC TSG IMAGE DATABASE

All the experiments were carried out using the VxC TSG image database (VxC Tiles for Surface Grading). This image database is based on samples taken from the ceramic tile industry and comprises 14 ceramic tile models, 42 surface grades and 960 pieces. It was built in the VxC laboratory in collaboration with Keraben S.A. (a large ceramic tile company) and is an extensive image database of ceramic tiles representing the wide range of surface classes in the ceramic tile industry. VxC TSG is also intended to be a tool for the scientific community working on surface grading or texture recognition. It is public and available at miron.disca.upv.es/vision/vxctsg/.

Each model in the database has three different surface classes given a priori by specialized grading operators at the factory (see Table 2 and Figure 2). Every model has two close classes and another one distant from them. Surface classes or grades are represented by numbers, and close numbers mean close classes. Thus, the database includes difficult cases within every model. Models were chosen to represent the extensive variety of surfaces that factories can produce. A catalogue of 700 models is common in these industries. However, in spite of this great number of models, almost all of them imitate one of the following mineral textures: marble, granite or stone.

Digital images of tiles were acquired using a spatially and temporally uniform acquisition system. This system was composed of high quality components: a scan-line color camera (Dalsa Trillium TR-31-02k25) and an illumination system (Mercrom FXC2372-2) formed by two special high frequency and spatially uniform fluorescents. To overcome variations over time, the power supply of the fluorescents was automatically regulated by a photoresistor located near them.

Table 2. Ground truth of ceramic tiles.

model          classes       tiles/class   size (cm)   aspect
Agata          13, 37, 38    16            33x33       marble
Antique        4, 5, 8       14            23x33       stone
Berlin         2, 3, 11      24            16x16       granite
Campinya       8, 9, 25      30            20x20       stone
Firenze        9, 14, 16     20            20x25       stone
Lima           1, 4, 17      24            16x16       granite
Marfil         27, 32, 33    14            23x33       marble
Mediterranea   1, 2, 7       30            20x20       stone
Oslo           2, 3, 7       24            16x16       granite
Petra          7, 9, 10      28            16x16       stone
Santiago       22, 24, 25    28            19x19       stone
Somport        34, 35, 38    28            19x19       stone
Vega           30, 31, 37    20            20x25       marble
Venice         12, 17, 18    20            20x25       marble

Figure 2. Samples from the ground truth; Petra (top) and Marfil (bottom) models. Each sample corresponds to one of the three classes within each model.

Spatial and temporal uniformity is very important in the surface grading application [5, 6, 8] because slight variations in acquisition conditions, e.g. illumination, can produce different shades for the same surface and hence misclassifications. To demonstrate the reliability of the acquisition system we carried out the following experiment. Six tiles, each one corresponding to a different model, were captured repeatedly. The tiles were chosen to cover a wide range of surface types and colors. The complete set of tiles was acquired at random moments, 23 times, over 54 hours. We extended the experiment over 54 hours (two days and six hours) because this is the mean period during which factories produce a specific model, and we wanted to study the spatial and temporal uniformity for a complete surface grading session. Environmental conditions were held constant using an air conditioning system for temperature and a closed cabin for illumination.

In order to study the temporal response we measured the mean CIE Lab color of each piece. Also, to study the spatial response we randomly oriented the pieces in each capture. CIE Lab is a perceptually uniform color space and we can measure the perceptual difference between two colors using the Euclidean distance in this space [13]. Thus, color differences can be measured in a very similar way to the human perception of colors. Mahy and Oosterlink [22] established that in CIE Lab a noticeable difference of color (for humans) begins at Euclidean distances of 2.3 or greater. Based on this assertion, we consider the system sufficiently stable if, for a given tile, no Euclidean distance between the first sample and each of the remaining samples is above 2.3. In the results (see Figure 3) there was no distance above 2.3 for any tile, and the distances also remained significantly far from this limit. Thus, we determined that the acquisition system was sufficiently uniform spatially and temporally.

[Plot: Euclidean distance (vertical axis, 0 to 5) against time in hours (horizontal axis, 0 to 50) for six tiles (granito, mediterranea, somport, vega, blue venice, green venice), with the noticeable difference threshold marked.]

Figure 3. System response over 54 hours using fluorescents.
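The stability criterion used in this experiment is easy to state in code. The sketch below is illustrative only: the function names are hypothetical and the mean CIE Lab values are invented for the example, not measurements from the experiment.

```python
import numpy as np

NOTICEABLE_DIFFERENCE = 2.3   # CIE Lab threshold reported in [22]

def distances_to_first(lab_means):
    """Euclidean distances between the first capture and all later ones."""
    lab_means = np.asarray(lab_means, dtype=float)
    return np.linalg.norm(lab_means[1:] - lab_means[0], axis=1)

def is_stable(lab_means, threshold=NOTICEABLE_DIFFERENCE):
    """True if no capture reaches the noticeable-difference threshold."""
    return bool(np.all(distances_to_first(lab_means) < threshold))

# Hypothetical mean CIE Lab colors of one tile over three captures.
captures = [[52.0, 3.1, 10.2],
            [52.4, 3.0, 10.5],
            [51.7, 3.3, 9.9]]
stable = is_stable(captures)
```

Applying `is_stable` to each of the six tiles, with one row per capture, corresponds to the check described in the text.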

6. EXPERIMENTS AND RESULTS

We carried out the design of experiments described in Section 4, and computed the corresponding logistic regression model. Accuracies were compiled using the leave-one-out cross-validation scheme, which yields nearly unbiased accuracy rates and is recommended for small data sets (the number of samples for each tile model ranges from 48 to 90). Table 3 shows the fitted model.

In the model, no statistically significant difference is observed between the color spaces, possibly because both PCA and PLS models are able to combine the effect of the different color channels in a proper way when creating the latent variables. So it seems plausible that no statistical difference appears among the color spaces, since each is a transformation of the others. This is an important result, since we could use the color space that is fastest to compute without affecting the classification performance. RGB is the fastest color space, as for CIE Lab and CIE Luv we need to apply transformation formulae to convert the original RGB data into these spaces.

Regarding the classification approach, the PLS-DA model shows in all cases a better classification performance than the SIMCA approach. Thus, PLS-DA should be used rather than SIMCA. This also suggests a very important conclusion: there is some discriminating information not taken into account by the PCA models, probably because its variance in the data structure is small.

All ceramic tile models show statistically significant differences in accuracy with respect to the Agata model, which was used as the reference. This means that, although the accuracies are quite close, the differences between models are consistent from a statistical point of view.
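The leave-one-out scheme mentioned above can be sketched generically in Python. In this illustration the classifier is a simple nearest-centroid rule, used purely as a stand-in so that the example is self-contained; it is not one of the methods evaluated in the paper, and the data are synthetic.

```python
import numpy as np

def leave_one_out_accuracy(X, labels, fit, predict):
    """Hold out each sample once, train on the rest, and count the hits."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        model = fit(X[keep], labels[keep])
        hits += int(predict(X[i], model) == labels[i])
    return hits / len(X)

# Stand-in classifier (assumption): assign to the nearest class centroid.
def fit_centroids(X, labels):
    return {c: X[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict_centroid(x, centroids):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Synthetic data (assumption): two clearly separated classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(10, 3)),
               rng.normal(10.0, 1.0, size=(10, 3))])
labels = np.repeat([0, 1], 10)
accuracy = leave_one_out_accuracy(X, labels, fit_centroids, predict_centroid)
```

Any pair of `fit` and `predict` functions with the same shape can be passed in, so the same loop could wrap a SIMCA or PLS-DA model; the model is rebuilt from scratch for every held-out sample.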

Table 3. Logistic regression model.

Dependent variable: accuracy
Factors: Model, Space, Approach

Estimated Regression Model (Maximum Likelihood)

Parameter            Estimate    Standard Error   Estimated Odds Ratio
CONSTANT             2,3199      0,4688
Model=ANTIQUE        14,4076     374,0390         1,81E6
Model=BERLIN         1,3831      0,8532           3,9873
Model=CAMPINYA       2,0218      0,9878           7,5524
Model=FIRENZE        -0,0144     0,6235           0,9857
Model=LIMA           1,0882      0,7766           2,9699
Model=MARFIL         -0,4089     0,6287           0,6644
Model=MEDITERRANEA   -0,3314     0,5527           0,7179
Model=OSLO           14,9027     365,9240         2,97E6
Model=PETRA          -0,9213     0,5279           0,3980
Model=SANTIAGO       1,7258      0,9091           5,6171
Model=SOMPORT        -0,14583    0,5719           0,8643
Model=VEGA           3,0099      1,7972           20,2850
Model=VENICE         0,6680      0,7283           1,9503
Approach=PLS-DA      1,8861      0,3431           6,5935

Analysis of Deviance

Source          Deviance   Df   P-Value
Model           112,692    14   0,0000
Residual        35,201     69   0,9998
Total (corr.)   147,893    83

Percentage of deviance explained by model = 76,1984
Adjusted percentage = 55,9135

Finally, Table 4 presents the predicted and observed (averaged for the three color spaces) percentages achieved by PLS-DA for the different ceramic tile models.
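To make the link between the fitted coefficients and the predicted accuracies explicit, the following Python sketch evaluates the logistic model of Section 4 with a few of the estimates from Table 3 (written with decimal points). The function and variable names are hypothetical; Agata is the reference tile model, so its effect is zero, and SIMCA is taken as the reference approach.

```python
import math

# Estimates copied from Table 3 (decimal commas written as points).
CONSTANT = 2.3199
APPROACH_PLS_DA = 1.8861
MODEL_EFFECT = {
    "Agata": 0.0,        # reference category
    "Campinya": 2.0218,
    "Marfil": -0.4089,
    "Petra": -0.9213,
}

def predicted_accuracy(model, pls_da=True):
    """p = e^alpha / (1 + e^alpha), with alpha the sum of the active terms."""
    alpha = CONSTANT + MODEL_EFFECT[model] + (APPROACH_PLS_DA if pls_da else 0.0)
    return 1.0 / (1.0 + math.exp(-alpha))

agata = 100.0 * predicted_accuracy("Agata")    # about 98.53
marfil = 100.0 * predicted_accuracy("Marfil")  # about 97.81
petra = 100.0 * predicted_accuracy("Petra")    # about 96.39
odds_ratio_pls_da = math.exp(APPROACH_PLS_DA)  # about 6.59
```

These values agree with the predicted accuracies listed for PLS-DA in Table 4, and the exponential of the Approach=PLS-DA estimate reproduces the odds ratio given in Table 3.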

Table 4. Predicted and observed accuracies for the ceramic tile models using PLS-DA.

tile model     predicted accuracy   observed accuracy
Agata          98.53%               100.00%
Antique        100.00%              100.00%
Berlin         99.63%               100.00%
Campinya       99.80%               100.00%
Firenze        98.51%               98.89%
Lima           99.50%               99.54%
Marfil         97.81%               96.83%
Mediterranea   97.97%               98.15%
Oslo           100.00%              100.00%
Petra          96.39%               95.63%
Santiago       99.74%               99.60%
Somport        98.30%               97.62%
Vega           99.93%               100.00%
Venice         99.24%               99.44%
global mean    98.95%               98.98%

7. CONCLUSIONS

This work has analysed the benefits of using Multivariate Statistical Projection Models for surface grading, when using soft color texture descriptors as features. In particular, the SIMCA and PLS-DA approaches, based on Principal Component Analysis and Partial Least Squares models, have been applied to automatically classify three different grades of 14 ceramic tile models. These PCA and PLS models make it possible to perform the classification by directly projecting the new observations (images) onto them, and computing some statistics or predictions. The leave-one-out cross-validation method has been used to build the models and compute the classification accuracies.

In order to study the possible differences between the accuracies achieved for each ceramic tile model and color space (CIE Lab, CIE Luv or RGB), and to compare the two approaches applied (SIMCA or PLS-DA), a complete factorial design of experiments has been performed, leading to an 84-run experiment. Since the response variable, the accuracy, is a percentage, the correct way of analysing the design of experiments is via a logistic regression model.

The results showed statistically significant differences among the tile models. Thus, although a very good accuracy rate is achieved in all models, the tile models differ with regard to classification. However, the most important result is that PLS-DA shows better performance than SIMCA, giving strength to the idea that using inferential (covariance based) models for classification provides better results than using compression (variance based) models. One possible reason for PLS-DA to perform better than SIMCA is that surface grades show slight differences that do not affect the general internal data structure of the images. So, they may not be gathered by the PCA models. But if these differences have an influence on the segregation of the surface grades, they will be captured by the PLS-DA model. This could be the reason why, for classes showing some common structure, inferential models should work better than compression models.

The logistic regression model predictions have shown accuracies very close to the averaged observed ones, which gives confidence in the good results obtained, with predicted values between 96.39% and 100% and an averaged predicted value of 98.95%. These results also point to the soft color texture descriptors and the Multivariate Statistical Projection Models as a simple and easy to implement technique in industrial environments for monitoring purposes.

Finally, it must be pointed out that PLS-DA has outperformed the results achieved in [1], where the k-NN classifier was used instead. In that work, the global accuracy achieved was 97.36%. Moreover, both the SIMCA and PLS-DA approaches present the additional advantage of being able to use a reject criterion [16], so they can reject any new sample that does not belong to any of the classes used to build the models, for some pre-defined type I risk. However, this reject criterion has not been applied in this work, in order to be able to compare the results with the ones in [1].

REFERENCES

[1] F. López, J.M. Valiente and J.M. Prats. Surface grading using soft color texture descriptors. 10th Iberoamerican Congress on Pattern Recognition, CIARP 2005. Lecture Notes in Computer Science, 3773, pp. 13-23, 2005.
[2] H. Kauppinen. Development of a color machine vision method for wood surface inspection. PhD Thesis, Oulu University, 1999.
[3] D.C. Montgomery. Design and analysis of experiments (4th edition). John Wiley & Sons, Inc., New York, 1997.
[4] R. Christensen. Log-linear models and logistic regression (2nd edition). Springer-Verlag, New York, 1997.
[5] C. Boukouvalas, J. Kittler, R. Marik and M. Petrou. Color grading of randomly textured ceramic tiles using color histograms. IEEE Transactions on Industrial Electronics, 46(1):219-226, 1999.
[6] R. Baldrich, M. Vanrell and J.J. Villanueva. Texture-color features for tile classification. EUROPTO/SPIE Conference on Color and Polarisation Techniques in Industrial Inspection, Germany, 1999.
[7] F. Lumbreras, J. Serrat, R. Baldrich, M. Vanrell and J.J. Villanueva. Color texture recognition through multiresolution features. 5th International Conference on Quality Control by Artificial Vision, France, 2001.
[8] J.A. Peñaranda, L. Briones and J. Florez. Color machine vision system for process control in ceramics industry. SPIE, 3101:182-192, 1997.
[9] J. Kyllönen and M. Pietikäinen. Visual inspection of parquet slabs by combining color and texture. Proceedings of IAPR Workshop on Machine Vision Applications, Japan, 2000.
[10] V. Lebrun and L. Macaire. Aspect inspection of marble tiles by color line-scan camera. 5th International Conference on Quality Control by Artificial Vision, France, 2001.
[11] S. Kukkonen, H. Kälviäinen and J. Parkkinen. Color Features for Quality Control in Ceramic Tile Industry. Optical Engineering, 40(2):170-177, 2001.
[12] R.C. Gonzalez and P. Wintz. Digital image processing (2nd edition). Addison-Wesley, 1987.
[13] G. Wyszecki and W.S. Stiles. Color science: concepts and methods, quantitative data and formulae (2nd edition). Wiley, New York, 1982.
[14] S. Wold, C. Albano, W.J. Dunn, U. Edlund, K. Esbensen, P. Geladi, S. Hellberg, E. Johansson, W. Lindberg and M. Sjöström. Multivariate Data Analysis in Chemistry. In: B.R. Kowalski (ed.), Chemometrics: Mathematics and Statistics in Chemistry. D. Reidel Publishing Company, Dordrecht, Holland, 1984.
[15] J.E. Jackson. A User's Guide to Principal Components. Wiley, New York, 1991.
[16] A.K. Jain, R.P.W. Duin and J. Mao. Statistical Pattern Recognition: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):4-37, 2000.
[17] P. Nomikos and J.F. MacGregor. Multivariate SPC Charts For Monitoring Batch Processes. Technometrics, 37(1):41-59, 1995.
[18] H. Wold. PLS Regression. In: N.L. Johnson and S. Kotz (eds.), Encyclopedia of Statistical Sciences, vol. 6, 581-591. Wiley, New York, 1984.
[19] M. Sjöström, S. Wold and B. Söderström. PLS Discriminant Plots. Proceedings of PARC in Practice, Amsterdam, June 19-21, 1985. Elsevier Science Publishers B.V., North-Holland, 1986.
[20] J.D. Jobson. Applied Multivariate Data Analysis: Categorical and Multivariate Methods. Springer-Verlag, Berlin, 1992.
[21] J.M. Prats-Montalbán, A. Ferrer, J. Gorbeña and J.L.J. Malo. A comparison of different discriminant analysis techniques in a steel industry welding process. Chemometrics and Intelligent Laboratory Systems, 80:109-119, 2006.
[22] M.L. Mahy and E.A. Oosterlink. Evaluation of uniform color spaces developed after the adoption of CIELAB and CIELUV. Color Research and Application, 19(2):105-121, 1994.
[23] R.O. Duda and P.E. Hart. Pattern classification and scene analysis. John Wiley and Sons, New York, 1973.