Proceedings of the 2003 IEEE International Symposium on Intelligent Control, Houston, Texas, October 5-8, 2003
Visual Recognition of Hexagonal Headed Bolts by Comparing ICA to Wavelets

Pier Luigi Mazzeo, Nicola Ancona, Ettore Stella, Arcangelo Distante
Institute on Intelligent Systems for Automation, CNR - Italy, 70126 Bari, Via Amendola 166/5
E-mail: {mazzeo, ancona, stella, distante}@ba.issia.cnr.it
Abstract- In this paper we present vision-based techniques to automatically detect the absence of the fastening bolts that secure the rails to the sleepers. The inspection system uses images from a digital line scan camera installed under a train. This application is part of the more general problem of object recognition. In object recognition, as in supervised learning, we often extract new features from the original ones for the purpose of reducing the feature space dimensions and achieving better performances. The goal of this paper is to compare two techniques within the context of hexagonal-headed bolt recognition in railway maintenance. The first technique is the Wavelet Transform (WT); the second is Independent Component Analysis (ICA), a new method that produces spatially localized and statistically independent basis vectors. The coefficients of the new representation in the ICA and WT subspaces are supplied as input to a Support Vector Machine (SVM). An SVM classifier analyses the images in order to evaluate which pre-processing technique gives the highest rate in detecting the presence of the bolts. Results in terms of detection rate and false positive rate are given in the paper.
I. Introduction
In the last decade a large variety of algorithms for object detection problems have been studied by the computer vision community, especially for industrial inspection processes. This kind of detection problem can be reduced to the flat object recognition problem from 2-D intensity images. Usually, edge detection, border following, thinning algorithms, straight line extraction, but also the use of active contours (snakes), are the low level processes preparing the image for high level processes, such as grouping or perceptual organization [3]. However, these methods fail if the patterns are distorted by the imaging process, viewpoint changes, lighting changes or large intra-class variation among the patterns. In order to overcome these problems, the most widely used approaches in object recognition are based on feature extraction by a pre-processing technique. Each pattern is represented in terms of features or attributes and it is considered as a point in a multidimensional space. The efficacy of the representation space is determined by evaluating the separation among patterns from different
classes. In this way, object detection can be seen as a classification problem, because our ultimate aim is to determine a separating surface, optimal under certain conditions, which is able to separate object views from image patterns that are not instances of the object. Vapnik [4] has introduced a new learning scheme, called Support Vector Machine (SVM), to approach classification and regression problems. The basic idea of Vapnik's theory is related to regularization [5]: for a finite set of training examples, the search for the best model or approximating function has to be constrained by an appropriately small hypothesis space, that is the set of functions the machine implements. Furthermore, SVMs are robust in the presence of noise, and they do not require a geometrical model of the searched pattern. The goal of this paper is to compare a WT technique to a newer one based on Independent Component Analysis (ICA) and Principal Component Analysis by means of an SVM classifier, in order to find the best one for our application domain. In order to achieve this, the SVM classifier has been trained to recognize the hexagonal-headed bolts. In this case, examples of hexagonal-headed bolts having different orientations have been extracted from the images to make the training set. This type of bolt does not appear in the image in a fixed orientation, therefore as many different examples as possible must be considered. The image patterns are firstly pre-processed before being passed as input to the classifiers. The Wavelet Transform (WT) and Independent Component Analysis (ICA) have been applied to the patterns in order to significantly represent them in a smaller number of coefficients. The ICA of a multidimensional vector is a linear transformation minimizing the statistical dependence between its components.
This type of representation proves useful in a significant number of applications, such as data analysis and compression, blind source separation, blind deconvolution and denoising [6,7,8,9]. This representation projects our data into a space where the components have maximized statistical independence and, in many problems, sparse distributions. Over the last ten years,
several researchers have tried to demonstrate the possibility of applying ICA to feature extraction. More specifically, ICA has been applied by Bell to the study of natural scenes [10], to face recognition [11] and to general object recognition and classification in [12]. In these works ICA reduces the data dimension and it is well suited for feature extraction. In a previous work [13] the Wavelet Transform has been successfully applied in a railway context for the recognition of fastening elements. An SVM classifier has been utilized to evaluate the performance of both pre-processing methods on a validation set of test images different from those used for the training set. Major details will be given in the experimental section. The paper is organized as follows. In section 2 an overview of the system is presented. In section 3 the pre-processing algorithms used for the fundamental feature extraction are briefly described. The classification techniques are introduced in section 4 and, finally, experimental results and some concluding remarks are given.
Fig. 1. Images of rail fixed to the sleeper by hexagonal-headed bolts.

Fig. 2. Sample image patterns of the hexagonal-headed bolts extracted from the original image.
II. System Overview
As introduced in the previous section, we have developed a vision system which automatically detects the fastening elements (track bolts) securing the rail to the sleepers. This system is important for track maintenance as it provides information about the possible absence of these bolts. This type of anomaly is very dangerous because it can provoke changes in the rail structure with serious consequences for safety. As can be seen in fig. 1 there are two hexagonal-headed bolts on the sleeper. Object recognition by using a learning-from-examples technique has a relevant concern with computational complexity. In order to achieve real-time performance, the computational time needed to classify patterns has to remain low. One of the parameters responsible for high computational complexity is certainly the input space dimension. A reduction of the input space is the first step to successfully speed up the classification process. This requirement can be answered by using a feature extraction algorithm able to store all the necessary information of the input patterns in a small set of coefficients. The Wavelet Transform (WT) and Independent Component Analysis (ICA) are two approaches which allow reducing the dimension of the input space while capturing the significant variations of the input patterns in a smaller number of coefficients. In our approach we compare and apply WT and ICA to pre-process the image patterns extracted from the original image. After the pre-processing step, the obtained fundamental features are fed into the SVM classifiers for the recognition phase.
Fig. 3. The decomposition of the image with a 2-level Wavelet Transform.
III. Pre-Processing
In the pre-processing step some transformations of the image are made to allow the extraction of its intrinsic features. First of all, only the subimages containing the object to recognize are extracted from the original image. In the case of the hexagonal-headed bolts the subimages containing the bolts are 72 x 72 pixel windows (see fig. 2). WT and ICA are applied to the extracted subimages in order to represent them only through coefficients containing the most discriminant information. The objective is to characterize the images with a small number of features so as to process the data more quickly. In the following two subsections we briefly review the WT and ICA properties. According to the characteristics of the present problem, some techniques will turn out to be more appropriate than others to extract the significant information. For this reason we have applied different pre-processing strategies to the images in order to establish the best technique for our application domain.
A. Wavelet Transform

The WT is an extension of the Fourier Transform containing not just frequency information but spatial information too [14]. The Wavelet Transform operator F: L2(R) -> L2(R) can be defined as follows:

(F f)(s, t) = Integral f(x) Psi_{s,t}(x) dx    (1)

Psi_{s,t}(x) = (1/sqrt(s)) Psi((x - t)/s)    (2)

where, when s varies, the frequencies on which the function Psi operates change, and when t varies, the function Psi is moved over the whole support of the function f.
In this work we have used a Discrete Wavelet Transform which supplies a hierarchical representation of the image, implemented with the iterative application of two filters: a low pass filter (approximation filter) and its complementary in frequency (detail filter). A bidimensional WT breaks an image down into four subsampled, or decimated, images. In figure 4 the final result of a 2-level WT is shown. In each subimage the capital letters refer to the filters applied to the image of the previous level: H stands for a High pass filter, L stands for a Low pass filter. The first letter is the filter applied in the horizontal direction, while the second letter is the filter applied in the vertical direction. The band LL is a coarser approximation of the original image. The bands LH and HL record the changes of the image along the horizontal and vertical directions. The band HH shows the high frequency components of the image. Decompositions are iterated on the LL subband, which contains the low-frequency information of the previous stage. For example, after applying a 2-level WT, an image is decomposed into subbands of different frequency components as shown in figure 3. Numerous filters can be used to implement the WT; we have chosen the Daubechies filters for their simplicity and orthogonality.
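The iterated low-pass/high-pass decomposition described above can be sketched with PyWavelets. This is a minimal illustration, not the authors' implementation: the 72x72 window size follows section III, the specific Daubechies member (`db2`) and the random stand-in patch are assumptions.

```python
import numpy as np
import pywt  # PyWavelets

# Hypothetical 72x72 grayscale bolt sub-window; a random array stands in
# for a real image patch here.
patch = np.random.rand(72, 72)

# 3-level decomposition with a Daubechies filter; the coarse LL band is
# what the paper later feeds to the classifier as "LL level 3" features.
coeffs = pywt.wavedec2(patch, wavelet='db2', level=3)
ll3 = coeffs[0]              # approximation band (LL, level 3)
detail_levels = coeffs[1:]   # (LH, HL, HH) tuples, coarsest level first

# Flatten the LL band to obtain a compact feature vector.
features = ll3.ravel()
```

Each decomposition level roughly halves both image dimensions, so the LL3 band is a small fraction of the original 5184 pixels.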
B. Independent Component Analysis

Independent Component Analysis (ICA) is a statistical method to transform an observed multidimensional random vector into statistically independent components. If we denote by x = (x_1, x_2, ..., x_m)^T a zero mean m-dimensional random variable that can be observed and, likewise, by s = (s_1, s_2, ..., s_n)^T its n-dimensional transformation, then the problem is to determine a constant weight matrix W so that the linear transformation of the observed variables

s = Wx    (3)

produces statistically independent transformed components. In this way, we can consider equation (3) as a linear
transformation of the data performed by the projection of the observed data on the rows of the matrix W. This interpretation is suitable for feature extraction and pattern recognition.
Fig. 4. The 2-level Wavelet Transform on a subimage containing the hexagonal-headed bolt.
Fig. 5. Some statistically independent ICA basis images extracted.
Fig. 6. Some basis images for the ICA factorial representation, obtained by training on the first 200 principal component coefficients of the bolt images, oriented in the columns of the input. The bases are contained in the columns of A = W^-1.

The fundamental consideration to be made is that this projection on the basis vectors (i.e. the rows of W) is not fixed a priori but is estimated starting from the data. In other words, this approach is similar to Principal Component Analysis but it is different from many other linear transformations (like Fourier, Wavelet and Gabor) where the basis vectors are independent from the data. The principal difference with PCA lies in the way the coefficients of matrix W are determined. In PCA the covariance matrix (a second order statistic) and the
coefficients are evaluated assuming that the resulting s_i are uncorrelated; in ICA higher order statistics are needed and the coefficients of matrix W are estimated (not calculated) assuming that the s_i are independent (a requirement stronger than being uncorrelated). The main problem in ICA is concerned with the estimation of the weights W, so that the transformed components are independent. The method is based on objective (or contrast) functions calculated on the basis of some statistical properties of the data. The minimization or maximization of these functions and the relative adaptive change of the weights allow the final estimation of W. Several objective functions have been proposed for the estimation of the projection matrix. Basically, these functions are based on likelihood, network entropy, mutual information, and approximations of these [1,6,7,15]. In [2], Hyvärinen presents a robust scheme for the minimization of mutual information through the successive estimation of projection pursuit directions. The implementation of this scheme is a fast and reliable algorithm called FastICA. Figure 5 shows eight basis images generated by the FastICA algorithm.
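The estimation of W by FastICA can be sketched with scikit-learn's implementation of the algorithm. This is an illustrative stand-in, not the paper's code: the random data, the sample count and the choice of eight components are assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Toy stand-in for the positive training patches: 200 flattened images of
# 72*72 = 5184 pixels each (random data, for illustration only).
rng = np.random.RandomState(0)
X = rng.rand(200, 5184)

# Estimate a handful of statistically independent components; the rows of
# the unmixing matrix W play the role of the ICA basis images.
ica = FastICA(n_components=8, random_state=0, max_iter=500)
S = ica.fit_transform(X)   # independent coefficients, one row per patch
W = ica.components_        # unmixing matrix (8 x 5184)
A = ica.mixing_            # mixing matrix, A = W^-1 (a pseudo-inverse here)
```

Projecting a new patch onto the rows of W (i.e. computing `ica.transform(...)`) yields the coefficient vector used as the classifier's input.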
IV. Support Vector Machine for Classification

In this section we briefly review the basic concepts of SVM for classification. Refer to [4] for a more detailed description. We are given a training set S = {(x_i, y_i)}, i = 1, ..., l, where x_i belongs to R^n and y_i belongs to {-1, 1}. In other words, we assume that the examples in S belong to either of two classes. SVM maps the input data x into a higher dimensional feature space H by using a non linear function phi(x) and finds in H the optimal separating hyperplane:

f(x) = w . phi(x) + b    (4)

maximizing the margin, i.e. the distance between the hyperplane and the closest points of the training set, and minimizing the number of misclassified patterns. The trade-off between these two factors is controlled by a user defined regularization parameter C > 0. Let K be a function of two variables x and y of the input space which computes the inner product between their corresponding images phi(x) and phi(y) in the feature space, that is:

K(x, y) = phi(x) . phi(y).

Then, the optimal separating hyperplane in the feature space (4) can be written as a non linear separating surface in the input space, represented as a linear combination of kernel functions centered on the support vectors only, i.e. the points x_i of S with lambda_i > 0:

f(x) = sum_i lambda_i y_i K(x_i, x) + b.

Specifying the kernel function K used in the SVM is equivalent to specifying the set of possible classifiers that the machine implements, or the complexity of the function space in which the final classifier lives. The classification of a new datum x involves the evaluation of the decision function y = sign(f(x)).

V. Experimental Results

The images of the rail have been obtained by a DALSA line scan camera with 512 pixels of resolution, installed under a diagnostic train during its maintenance route. In order to reduce the effects of variable natural lighting conditions, an appropriate illumination setup equipped with six OSRAM 41850 FL light sources has been installed too. In this way the system should be robust against changes in the natural illumination. Besides, in order to synchronize data acquisition, a trigger is sent to the TV camera by the wheel encoder. The spatial resolution of the trigger is 2 mm. A pixel resolution of 2x2 mm can be obtained choosing a TV camera with a focal length of 6 mm. The integration time of the TV camera has been properly set to acquire images also when the train speed is maximum, i.e. 200 Km/h.
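The SVM classification scheme of section IV can be sketched with scikit-learn. This is an illustrative toy, not the paper's experiment: the synthetic well-separated features, the 81-coefficient dimensionality and the use of a large finite C to approximate the unbounded regularization parameter are all assumptions of the example.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for pre-processed features (e.g. 81 LL3 wavelet
# coefficients per patch): two well-separated classes labelled +1 / -1.
rng = np.random.RandomState(0)
X_pos = rng.normal(loc=+1.0, size=(50, 81))   # "bolt present" examples
X_neg = rng.normal(loc=-1.0, size=(50, 81))   # "bolt absent" examples
X = np.vstack([X_pos, X_neg])
y = np.array([+1] * 50 + [-1] * 50)

# Linear kernel K(x, y) = x . y; a very large C stands in for the
# unbounded regularization parameter used in the paper's experiments.
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

# Decision function f(x) = sum_i lambda_i y_i K(x_i, x) + b; class = sign(f).
n_sv = clf.support_vectors_.shape[0]          # support vectors found
train_accuracy = (clf.predict(X) == y).mean()
```

As in the paper, the number of support vectors `n_sv` gives a rough measure of how hard the classes are to separate under a given pre-processing.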
A long video sequence of a rail network of about 50 km has been obtained. A number of sample images has been extracted from the sequence to create the training sets and the validation sets for the SVM. Positive and negative examples of bolts have been manually extracted from the training images. Each example consists of an n x m pixel subwindow, where n and m depend on the dimension of the considered bolt in the image (see section 3). The training set for the classifier trained to recognize the hexagonal-headed bolt contains 301 positive examples and 351 negative examples. The different pre-processing strategies described in sections III.A and III.B have been applied to the image examples. The Support Vector Machine has been trained on the previously specified training set. The set of positive examples has been used to calculate the ICA basis with the FastICA algorithm proposed by Hyvärinen [2]. Two different architectures have been utilized to perform ICA on the bolt images [16]: an architecture to find statistically independent basis images (ICA1); an architecture to find a factorial code (ICA2). The first architecture generates 199 Independent Components of the image set (200 positive example set), which provide a set of statistically independent basis images (figure 5 shows eight of the generated basis images).
In the second architecture, instead of performing ICA directly on the 5184 image pixels, ICA has been performed on the first 200 PCA coefficients of the bolt images in order to reduce the dimensionality. The first 200 PCA coefficients accounted for over 96% of the variance in the images. The FastICA algorithm found a 200 x 200 weight matrix W that produced a set of independent coefficients in the output. The basis functions for this representation consisted of the columns of A = W^-1. A sample of the basis set is shown in figure 6, where the principal component reconstruction has been used to visualize the bases as images. All the training examples are projected on these bases (or on some subsets of them) and the resulting coefficients are the features used to train the Support Vector Machine. Furthermore, the WT has been applied up to the 3rd level and the obtained approximation coefficients (LL level 3) have been directly considered as input features for the SVM classifier. The Support Vector Machine has been trained on the approximation (LL level 3) coefficients of the training examples. In all of our experiments, we have used the same training set with a linear kernel function K(x, y) = x . y and an unbounded regularization parameter C. To measure the generalization capabilities of the learning machine, defined as the ability of the machine to correctly classify image patterns never seen before, we tested the classifier on 801 positive examples and 801 negative examples of hexagonal-headed bolts. In this way we evaluated the classifier generalization ability and the effects of the different pre-processing strategies on the images. Tables 1 and 2 report the results of the experiments. In the first column of both tables the pre-processing strategies are listed: in the case of WT the first row refers to the pre-processing carried out by using the approximation coefficients (LL) at the 3rd level of decomposition. In the case of ICA pre-processing the rows refer to the different architectures utilized (ICA1 and ICA2) and to the different subsets of the basis vectors, generated by the FastICA algorithm, used to project the images.
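The two-stage ICA2 architecture (PCA compression followed by ICA on the PCA coefficients) can be sketched as below. The patch sizes and component counts here are scaled-down stand-ins for the paper's 5184 pixels and 200 coefficients, and the random data is purely illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Toy stand-in for flattened bolt patches (120 patches of 32x32 pixels).
rng = np.random.RandomState(0)
X = rng.rand(120, 1024)

# Step 1: PCA compresses each patch to its leading coefficients.
pca = PCA(n_components=40)
X_pca = pca.fit_transform(X)
retained = pca.explained_variance_ratio_.sum()   # variance kept by PCA

# Step 2: FastICA on the PCA coefficients yields the factorial code (ICA2);
# the resulting independent coefficients are the classifier features.
ica = FastICA(n_components=40, random_state=0, max_iter=1000)
codes = ica.fit_transform(X_pca)
```

Working on PCA coefficients rather than raw pixels keeps the ICA estimation problem small (here 40 x 40 instead of 1024 x 1024), which is the point of the factorial-code architecture.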
In the last two rows the results obtained by using the PCA pre-processing strategies are reported. In the second column of table 1, the number of input coefficients for the SVM classifier is listed. In the third column of table 1 the number of support vectors, generated by the SVM during the training phase, is reported. All examples of the training set are correctly classified by the trained SVM. In the last two columns of table 2 the percentage detection rates obtained from the test on the validation set are reported. Detection rates are given in terms of True Positive (TP) rate and True Negative (TN) rate. The detection rates refer to the results of a single run on the validation set. From table 2 it can be seen that the SVM classifiers
perform well in almost all the considered cases of image pre-processing. However, the best result obtained is shown in boldtype in one row of table 2. In particular, the approximation coefficients obtained by applying the low pass filter in both directions (vertical and horizontal) at level 3 of the WT decomposition have provided the best results in the classification process for the hexagonal bolt. Furthermore, the percentage of correct classification with ICA pre-processing increases as the number of basis vectors rises (the number of coefficients fed to the SVM). The grey cells in table 2 allow comparing the ICA, PCA and WT pre-processing strategies with the same number of features (coefficients). We can note that the WT is better than ICA and PCA since the classification percentage of true positive and true negative examples is higher (see grey cells in table 2). In the same way the number of Support Vectors is lower for the WT pre-processing (see grey cells in table 1). The main problem of ICA is that there is no systematic method to select the basis. For example, in Principal Component Analysis, the eigenvalues associated to each basis vector give information on the importance of that vector; then the most significant basis vectors can be selected by choosing the ones having the highest weights. For example, in table 2 (grey cells of the ICA2 pre-processing) we have selected the first 81 basis vectors, obtaining a detection rate of true positive examples equal to 99.00%. However, in the row ICA2* an ordering between the independent components has been introduced. One way is to use the norms of the columns of the mixing matrix, which give the contributions of the independent components to the variances of each component of x (see eq. 3). Ordering the components according to the descending norm of the corresponding columns of A, for example, gives an ordering reminiscent of PCA [17]. We can note from table 2 that the ICA2* method works better than the ICA2 pre-processing strategy with the same number of independent basis vectors.
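The ICA2* ordering described above (sorting components by the norm of the corresponding column of the mixing matrix A) amounts to a few lines of numpy. The matrix dimensions follow the paper's 5184-pixel patches and 199 components, but the random A is a hypothetical stand-in.

```python
import numpy as np

# Hypothetical mixing matrix A: its columns are the independent basis
# vectors (199 of them, in a 5184-dimensional pixel space).
rng = np.random.RandomState(0)
A = rng.rand(5184, 199)

# The norm of each column of A measures how much that independent
# component contributes to the variance of the observed data x = A s
# (the inverse view of eq. 3, s = W x).
norms = np.linalg.norm(A, axis=0)

# Sort descending, PCA-style, and keep only the leading 81 basis vectors,
# mimicking the ICA2* selection.
order = np.argsort(norms)[::-1]
top81 = A[:, order[:81]]
```

The same ordering can then be applied to the coefficient vectors, so that truncating to the first 81 features keeps the highest-variance components.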
It should be observed, then, that higher performances could be obtained with the same number of basis vectors, supposing that only the most significant ones are selected. Future work will be addressed in this direction.
VI. Conclusion
In this paper we have described a vision-based system to automatically detect the absence or presence of the hexagonal-headed bolts securing the rails to the sleepers. The periodical inspection of railway infrastructures is very important to prevent dangerous situations. The inspection system uses images acquired by a digital line scan camera installed under the train. SVM classifiers were trained to recognize hexagonal-headed bolts. Firstly, the images were pre-processed by using methods based on WT, PCA and ICA. Then the obtained detecting system was tested on a validation set to establish the best pre-processing technique for the hexagonal-headed bolt. Different experiments were carried out to address the problem of feature reduction in order to increase the system's computational performance. The results showed that Independent Component Analysis is well suited for the pre-processing step because the classifier revealed high classification performance. However, comparing these results to the ones obtained with WT pre-processing, we noticed that the latter method worked better.
References

[1] S. Amari, A. Cichocki and H. Yang, A new learning algorithm for blind separation, Advances in Neural Information Processing Systems, 8, pp. 757-763, 1996.
[2] A. Hyvärinen, Fast and robust fixed-point algorithms for Independent Component Analysis, Neural Computation, 11, pp. 1483-1492, 1999.
[3] Juan Andrade-Cetto and Avinash C. Kak, "Object Recognition", in Wiley Encyclopedia of Electrical Engineering, vol. Sup. 1, pp. 449-470, 2000.
[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, 1995.
[5] T. Evgeniou, M. Pontil, and T. Poggio, A unified framework for regularization networks and support vector machines, Technical Report A.I. Memo No. 1654, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 1999.
[6] A. Bell and T. Sejnowski, An information-maximization approach to blind separation and blind deconvolution, Neural Computation, 7, pp. 1129-1159, 1995.
[7] J. Cardoso, Infomax and maximum likelihood for source separation, IEEE Letters on Signal Processing, 4, pp. 112-114, 1997.
[8] A. Hyvärinen, Sparse code shrinkage: denoising of non-gaussian data by maximum likelihood estimation, Neural Computation, 11, pp. 1739-1769, 1999.
[9] C. Jutten, J. Herault, Blind separation of sources part 1: an adaptive algorithm based on neuromimetic architecture, Signal Processing, 24, pp. 1-10, 1991.
[10] Anthony J. Bell, Terrence J. Sejnowski, The Independent Components of natural scenes are edge filters, Advances in Neural Information Processing Systems, 9, pp. 831-837, 1996.
[11] K. Baek, Bruce Draper, J.R. Beveridge, K. She, PCA vs ICA: a comparison on the FERET data set, Joint Conference on Information Sciences (JCIS '02), Durham, North Carolina, pp. 824-827, Mar. 2002.
[12] M. Bressan, David Guillamet, Jordi Vitrià, Using an ICA representation of high dimensional data for object recognition and classification, Pattern Recognition (accepted for publication, April 2002).
[13] E. Stella, P.L. Mazzeo, M. Nitti, G. Cicirelli, A. Distante, T. D'Orazio, Visual Recognition of Missing Fastening Elements for Railroad Maintenance, proceedings of the ITSC-IEEE International Conference on Intelligent Transportation Systems, Sept. 2002, Singapore.
[14] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1999.
[15] P. Comon, Independent Component Analysis - a new concept?, Signal Processing, 36, pp. 287-314, 1994.
[16] M.S. Bartlett, H.M. Lades and T.J. Sejnowski, Independent component representations for face recognition, proceedings of the SPIE Conference on Human Vision and Electronic Imaging III, San Jose, 1998.
[17] A. Hyvärinen, Survey on Independent Component Analysis, Helsinki University of Technology.

TABLE I. Number of SVs of the trained SVM (columns: Pre-Processing | Number of Coefficients | Number of Support Vectors).

TABLE II. Classification on the validation test: projection on 199 and 81 ICA basis vectors and 200 and 81 PCA basis vectors (columns: Pre-Processing | Number of Coefficients | True Positive | True Negative).