INTERNATIONAL SYMPOSIUM ON ROBOTS AND AUTOMATION 2002


Visual feature extraction via PCA-based parameterization of wavelet density functions

Matthew Browne, Saeed Shiry, Matthias Dorn, Robert Ouellette
GMD-Japan Research Laboratory, Collaboration Center, 2-1 Hibikino, Wakamatsu-ku, Kitakyushu-city, JAPAN

Abstract—This paper describes a novel method of feature extraction for visual sensor processing, and its application to crack detection by a sewer maintenance robot equipped with an infra-red camera. A space-frequency distribution 'signature' is generated via a three-step process: space-frequency decomposition, density function estimation, and parameter extraction. These steps are achieved, respectively, via the wavelet packet transform, empirical cumulative distribution functions, and principal components analysis. The 'wavelet signature' method is shown to be superior to conventional methods as a feature extractor for a logistic regression model in a crack discrimination task.

Index Terms—wavelet transform, principal components analysis, PCA, empirical cumulative distribution function, CDF, image classification

I. Introduction

The present research is concerned with feature extraction for the visual system of a KURT2 [1] mobile robot equipped with an infra-red video camera. The task of the robot is the online detection of cracks and other faults in sewer pipes. The cracks are defined by a relatively distinctive space-frequency structure. However, they are often embedded in a variety of complex textures and other spurious features. Lighting and reflection effects present an additional source of difficulty. Further, the range of possible translations and rotations makes robust crack detection in sewer pipes an interesting application in robot vision.

For image classification problems of this kind, statistical characterization of the differences between image classes is a significant challenge. Methods for generating concise yet informative parameters of natural images are of interest for a broad range of classification problems. Effective pre-processing / transformation of data (also referred to as feature extraction) is necessary for both dimensionality reduction and better class separability. These may be regarded as two essential criteria for good classifier performance [2], [3]. Although feature extraction has long been an important topic in pattern recognition and image processing, there are circumstances where previous methods do not work well [4]. In particular, issues of concern include a) computational complexity, due to the high-dimensional nature of the data, and b) robustness to lighting effects and to translation, scaling, and rotation of the objects of interest [5].

The utility of the wavelet transform (WT) and principal components analysis (PCA) for feature extraction has been demonstrated. Both the WT and PCA linearly transform inputs onto a set of basis vectors, representing an orthogonal rotation of the coordinate axes. However, the basis vectors of the WT are determined a priori by considerations of time (or space) - frequency regularity, while PCA basis functions are determined empirically according to a variance maximization criterion. Wavelets have been demonstrated to be a useful tool for various classification and detection problems [6], [7], as well as dimensionality reduction [8], [9]. The wavelet packet transform (WPT), a generalization of the original algorithm [10], appears to be a particularly powerful method for feature extraction and classification applications. For example, the WPT has been used as a feature extractor for classification of faces [11], textures [12] and biosignals [13]. While computationally expensive, PCA is perhaps the most popular technique for dimensionality reduction, with successful application to topics in robot vision [14], [15].

Linear transformations such as the WPT and PCA would both appear to be potentially very useful for the present problem. Investigations by the authors have indicated that both methods do well at separating sources of sensor variance (lighting effects, pipe textures, spurious and bona fide cracks). However, straightforward application of either method to the data is, in itself, an unsatisfactory feature extraction method. Although cracks have a relatively homogeneous structure, their translation and rotation vary considerably across samples. Linear transforms alone must model such features with an equivalently large set of basis functions, resulting in poor feature extraction and dimensionality reduction. Chen and Bui [16] describe an interesting application of combined Fourier-wavelet methods which provides a translation-, rotation- and scale-invariant character descriptor.
However, the method of [16] does not generalize well beyond character recognition. Taking the squared magnitude of the Fourier transform represents a simple method of generating a translation-invariant descriptor of an image [17] by removing phase information. However, the well-known lack of spatial / temporal information severely limits the applicability of Fourier methods to many signal classification problems. Many features of interest (including cracks and other faults in sewer-pipe systems) are of limited spatial extent. At a particular frequency range, these features represent a characteristic distribution of energy over space.

The problem at hand appears to be the combination of two properties required of a single feature extraction system:
1) translation invariance: various translations of the input should be mapped to a similar region in feature space;
2) spatial resolution: the model should retain as much information as possible regarding the distribution of energy across various spatial translations.

Recent research in image processing has utilized parameters of wavelet distributions as low-dimensional image descriptors. Do and Vetterli [18] parameterize the distribution of coefficients in wavelet sub-bands using generalized Gaussian models for texture similarity measurement. Sebe and Lew [19] studied Gaussian, exponential, and Cauchy distributions for extracting parameters of wavelet distributions. The parameters were used to generate a feature vector for subsequent classification. Application of these texture classification methods to general translation-invariant image classification would appear to require an expanded dictionary of space-frequency decomposition atoms. One example would be the library of time-frequency bases provided by the wavelet transform (WT) [20]. Further, parameters such as Gaussian mean and standard deviation appear to be insufficient to function as a general distribution descriptor, given a sample drawn from an unknown probability distribution. Empirical wavelet distributions are usually found to be non-Gaussian.

The present study involves the application of PCA for extracting parameters from a raw empirical cumulative probability distribution of wavelet coefficients. It will be shown that PCA is capable of extracting features of wavelet distributions to an arbitrary degree of precision, and that the extracted features are effective discriminators.
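The tension between translation invariance and spatial resolution noted above for Fourier descriptors can be illustrated with a short Python sketch. It assumes NumPy, circular (wrap-around) shifts, and a toy 16x16 patch containing a short line segment as a crude stand-in for a crack; the helper names are hypothetical and not taken from the paper.

```python
import numpy as np

def power_spectrum(img):
    """Squared magnitude of the 2-D Fourier transform (phase is discarded)."""
    return np.abs(np.fft.fft2(img)) ** 2

def make_patch(row, col, size=16):
    """Toy patch (an assumption for illustration): a 3-pixel vertical segment."""
    img = np.zeros((size, size))
    img[row:row + 3, col] = 1.0
    return img

a = make_patch(2, 3)
b = np.roll(a, shift=(5, 7), axis=(0, 1))  # same segment, translated

# The power spectrum cannot tell the two patches apart ...
same_spectrum = np.allclose(power_spectrum(a), power_spectrum(b))
# ... although the pixel data differ.
same_pixels = np.allclose(a, b)
```

The descriptor is identical for both patches, which is the desired invariance, but it also carries no record of where, or over what spatial extent, the energy was concentrated.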
It will be shown that this parameterized space-frequency energy distribution compares favorably with related feature-extraction methods such as the Fourier transform (e.g. used by [21], [22]), or PCA-based texture analysis of raw pixels (e.g. [23]).

II. Method

A. Generation of a time-frequency signature

The present study utilizes the WPT [24], [20] as a primary step in the decomposition process. Wavelet packets comprise a family of orthonormal bases for discrete functions of $L^2(\mathbb{R})$. For a particular wavelet $\psi$ and scaling function $\phi$, wavelet packet functions are of the form $W_n(2^\ell x - k)$, indexed by scale (frequency center) $\ell \in \mathbb{Z}$, spatial localization $k \in \mathbb{Z}$ and oscillation (frequency spread) $n \in \mathbb{N}$ [25]. Given orthonormal quadrature mirror filters $h_k$ and $g_k$, wavelet analysis functions are generated recursively:

$$W_{2n}(x) = \sqrt{2} \sum_k h_k W_n(2x - k) \qquad (1)$$

$$W_{2n+1}(x) = \sqrt{2} \sum_k g_k W_n(2x - k). \qquad (2)$$

$W_0(x)$ and $W_1(x)$ correspond to the generating functions $\phi$ and $\psi$, respectively. Following Coifman and Wickerhauser [25], an iteration of the WPT may be described as the operation of filters combined with a down-sampling operation

$$F_0\{s_k\}(2i) = \sum_k s_k h_{k-2i} \qquad (3)$$

$$F_1\{s_k\}(2i) = \sum_k s_k g_{k-2i}. \qquad (4)$$
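A minimal Python sketch of the filter-and-downsample step in equations (3) and (4) is given below. It assumes NumPy, the Haar filter pair, and periodic extension at the signal boundary; the paper does not state which wavelet or boundary treatment was used, and the function names are hypothetical.

```python
import numpy as np

def wpt_step(s, h, g):
    """One filter-and-downsample step: returns (F0 s, F1 s).

    Implements out[i] = sum_k s[k] * f[k - 2i] with periodic wrap-around
    (the boundary handling is an assumption of this sketch).
    """
    s = np.asarray(s, dtype=float)
    n = len(s)
    low = np.zeros(n // 2)
    high = np.zeros(n // 2)
    for i in range(n // 2):
        for m in range(len(h)):
            k = (2 * i + m) % n
            low[i] += s[k] * h[m]
            high[i] += s[k] * g[m]
    return low, high

def wavelet_packet_tree(s, h, g, depth):
    """Split every node at each level, giving 2**depth leaf subspaces."""
    level = [np.asarray(s, dtype=float)]
    for _ in range(depth):
        next_level = []
        for node in level:
            low, high = wpt_step(node, h, g)
            next_level.extend([low, high])
        level = next_level
    return level

# Haar quadrature mirror filters (assumed for illustration).
h_haar = np.array([1.0, 1.0]) / np.sqrt(2.0)
g_haar = np.array([1.0, -1.0]) / np.sqrt(2.0)

signal = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0])
low, high = wpt_step(signal, h_haar, g_haar)
leaves = wavelet_packet_tree(signal, h_haar, g_haar, depth=2)
```

Unlike the ordinary wavelet transform, which would split only the lowpass branch again, the packet tree here splits both branches at every level.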

Recursive splitting of a signal into highpass $F_1$ and lowpass $F_0$ subspaces generates a dyadic tree structure of wavelet coefficients for 1-dimensional signals. When applied to 2-dimensional signals (i.e. video images), the operations are performed on both rows and columns, resulting in a four-way (quadtree) rather than a dyadic tree structure. For an $N \times N$ image with $\log_2 N \in \mathbb{Z}^+$, the full WPT outputs subspace sets $S_{\ell,n} = \{s(\ell, n, k, j)\}$ consisting of coefficients corresponding to space translations $k, j \in \{1, 2^{n+1}, 2 \cdot 2^{n+1}, \ldots, N\}$. Note that the number of frequency subspaces $\ell$ increases logarithmically with $n$, as the number of coefficients within each subspace decreases. The WPT, by providing estimates of band-limited activity at a range of translations, provides combined space-frequency information. The subspaces $S_{\ell,n}$ describe an empirical distribution of activity across space at scale $\ell$ with a space-frequency trade-off indexed by $n$.

We introduce the vector $s$, the entries of which are the sorted members of a particular wavelet coefficient set $S_{\ell,n}$. The series $s = s\{l\}$, $l \in \{k, j\}$ may be interpreted as an inverse CDF, since $s\{l\}$ is a function of the cumulative probability: $s\{l\} = f(p)$ with $p = l/L$, and $p(s) = f^{-1}(s)$. Thus, simply sorting wavelet coefficients within each wavelet subspace yields the inverse empirical CDF. Unlike other possible forms of parameterization of the observed wavelet distributions (e.g. Gaussian- or entropy-based statistics), $s$ represents a relatively complete description of the statistical properties of an observed distribution, since it makes no assumptions or abstractions regarding the properties of the underlying distribution. Further, $s$ discards information regarding the original x-y spatial co-ordinates of the features, generating a translation-invariant descriptor (see footnote 1).
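As a rough illustration of the sorting step, the following Python sketch builds a sorted-coefficient signature from a 16x16 patch. It assumes NumPy, Haar filters, a two-level full packet decomposition, and circular shifts; none of these choices is stated in the paper, and all function names are hypothetical.

```python
import numpy as np

def haar_split_2d(img):
    """One 2-D Haar analysis step on rows and columns: four half-size subbands."""
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # lowpass in both directions
    lh = (a - b + c - d) / 2.0   # highpass along columns
    hl = (a + b - c - d) / 2.0   # highpass along rows
    hh = (a - b - c + d) / 2.0   # highpass in both directions
    return [ll, lh, hl, hh]

def wavelet_packet_2d(img, depth):
    """Full packet decomposition: every subband is split again at each level."""
    bands = [np.asarray(img, dtype=float)]
    for _ in range(depth):
        bands = [sub for band in bands for sub in haar_split_2d(band)]
    return bands

def wavelet_cdf_signature(img, depth=2):
    """Sort the coefficients of each subband (inverse empirical CDF) and concatenate."""
    bands = wavelet_packet_2d(img, depth)
    return np.concatenate([np.sort(band.ravel()) for band in bands])

rng = np.random.default_rng(0)
patch = rng.normal(size=(16, 16))

sig = wavelet_cdf_signature(patch)
# Circular shift by multiples of 2**depth = 4 pixels only permutes coefficients.
sig_shift4 = wavelet_cdf_signature(np.roll(patch, shift=(4, 8), axis=(0, 1)))
# A 1-pixel shift changes the block alignment, so the signature generally changes.
sig_shift1 = wavelet_cdf_signature(np.roll(patch, shift=1, axis=1))
```

Under these assumptions a 16x16 patch yields 16 subbands of 16 sorted coefficients each, i.e. a 256-element vector, and the signature is unchanged by circular shifts that are multiples of 4 pixels.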
Let $w$ represent the concatenated subspace distributions for all wavelet subspaces with ranked coefficients, $w = \{s_{\ell,n}\}$, characterizing the density distribution for each wavelet subspace with respect to space. For particular wavelet packet parameters, a training set of $J$ shape vectors $\{w_j\}$, $j \in \{1, \ldots, J\}$ may be generated. We wish to parameterize the features of the collected wavelet CDFs in order to reduce the dimensionality as much as possible whilst preserving the essential characteristics of the observed CDFs. We utilize PCA as an assumption-free, data-driven method for extracting the primary axes along which variation in the cumulative distributions occurs. Dimensionality reduction is achieved by extracting the first $q$ principal axes $u_n$ of the data set, being those orthonormal axes onto which the retained variance is maximal. The sample covariance matrix $C = \sum_j (w_j - \bar{w})(w_j - \bar{w})^T / J$ is calculated from the training data set $\{w_j\}$. The principal components $u_q$ are the eigenvectors with the greatest associated eigenvalues of $C$, such that $C u_n = \lambda_n u_n$. The basis functions $u_q$ yield coefficients $\lambda_q$, which represent a mapping to a lower-dimensional subspace while maintaining the retained variance. Points in this subspace may be used as input to a classifier system with $q$-dimensional input, $J$ training cases, and associated class labels.

Footnote 1: Note that 'translation-invariant' is used here with reference to the distribution model generated by the inverse CDF. The wavelet series $\{s^j_{\ell,n}\}$ is invariant to translations which are a factor of $2^\ell$.

The generation of a space-frequency energy signature may be summarized:
1. Transformation of the original image into a series of space-frequency subspaces $S_{\ell,n}$ via the wavelet packet transform.
2. Generation of sorted wavelet coefficient sequences $\{s^j_{\ell,n}\}$, corresponding to inverse empirical CDFs, and thus representing an estimate of the cumulative distribution across space for each frequency subspace.
3. Parameterization / dimensionality reduction of the collected subspaces $w$ via application of PCA, yielding features $\lambda_q$ for input to a classifier system.
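The PCA step summarized above can be sketched in Python as an eigendecomposition of the sample covariance matrix. The sketch assumes NumPy and uses synthetic random vectors in place of real wavelet signatures; the function names and the data are hypothetical.

```python
import numpy as np

def pca_fit(W, q):
    """Return the mean, the first q principal axes (as columns) and their eigenvalues.

    W holds one signature vector per row.
    """
    W = np.asarray(W, dtype=float)
    mean = W.mean(axis=0)
    centered = W - mean
    C = centered.T @ centered / W.shape[0]   # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1][:q]    # largest eigenvalues first
    return mean, eigvecs[:, order], eigvals[order]

def pca_project(W, mean, U):
    """Map signature vectors onto the retained principal axes."""
    return (np.asarray(W, dtype=float) - mean) @ U

# Synthetic stand-in for a training set of J = 200 signatures of length 16.
rng = np.random.default_rng(1)
W_train = rng.normal(size=(200, 16)) * np.linspace(3.0, 0.5, 16)

mean, U, eigvals = pca_fit(W_train, q=3)
features = pca_project(W_train, mean, U)     # shape (200, 3)
```

In this sketch the variance of each projected feature equals the corresponding eigenvalue, which is one way to check that the retained axes are those of maximal variance.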


B. Application to robot vision

The present study takes place in the context of the development of a robot-vision system for a sewer maintenance robot. The task of the robot is to navigate sewer pipes, scan the walls of the pipe, and detect cracks and other defects. The development of the present feature extraction algorithm was motivated by the need for a translation-invariant feature extractor for video image data which nonetheless was sensitive to differences between spatial distributions. For example, a crack in a sewer wall is characterized by a particular space-frequency distribution, but a wide variety of translations and orientations. The space-frequency distribution is used as a low-level feature detector for discrimination of crack-like structures at a low spatial level (see footnote 2).

Footnote 2: Note that a higher-level classifier system is ultimately used to make the final decision regarding the presence of pipe-wall damage. However, discussion of this system is beyond the scope of this paper.

Video data was gathered from a conventional sewer-inspection robot during inspection of damaged sewer pipes in the city of Kitakyushu, Japan. Video was converted to a 30 fps, 320x240 pixel digital representation. Nineteen video frames at a variety of time points were manually analyzed, each pixel being labeled as part of a crack structure or not. Each video frame was divided into a tiled map of 300 16x16 pixel sections. Each section provided the raw input to the space-frequency signature algorithm. A section was classified as a 'crack' if at least one pixel in it was manually determined to be part of a crack structure. Figure 1 displays a single training frame: in shot is a clear vertical crack, a horizontal pipe join (not a fault), along with some lighting / reflection effects.

Fig. 1. Example video frame comprising 300 16x16 pixel subsections

Space-frequency signatures $\lambda_q$ were generated for 5700 training examples, and used as input to linear and nonlinear classifiers. The linear classifier consisted of a logistic regression model. Logistic linear regression models

$$y = \frac{\exp(\alpha + \beta_1 \chi_1 + \cdots + \beta_n \chi_n)}{1 + \exp(\alpha + \beta_1 \chi_1 + \cdots + \beta_n \chi_n)} \qquad (5)$$

are the preferred method for performing linear prediction on a dichotomous dependent variable [26]. The nonlinear classifier consisted of an artificial neural network (ANN) with 15 hidden neurons. Tan-sigmoid activation functions were used for the hidden layer, with a log-sigmoid activation function, suitable for binary-valued targets, used for the single output neuron.

III. Results

A variety of feature-representation spaces on the video image sections were generated for comparison purposes:
1. Raw pixel data (RAW)
2. Sorted raw pixel data (CDF)
3. Absolute Fourier frequency spectrum (FFT)
4. Sorted wavelet space-frequency distribution (WAVCDF)

PCA was applied to each of the four types of image representation, reducing the dimensionality of the input data from 256 to 30 in each case. Performance was assessed using a single run of logistic regression, and ten (10) runs of the ANN classification model, using different random starting weights. ANNs were trained for a limited period of 200 epochs.

Figure 2 shows the classification performance of the linear classifier for each of the four kinds of input representations. Adjusted classification rates are shown, given equal weightings for each class (i.e. the performance shown is the mean of the classification rate over classes). Figure 3 displays the mean performance of each batch of 10 trained ANNs, with the 95% confidence interval of the mean. Three one-tailed t-tests were performed, testing the null hypothesis that WAVCDF performance was not higher than each of the other forms of pre-processing. The WAVCDF pre-processing method was associated with significantly better performance, p < 0.001 in each case.

Fig. 2. Classification performance of logistic regression with various input transformations

TABLE I
Performance of optimized ANN with WAVCDF input

            Total   Correct   Perc.
Non-crack    5074      5072   99.96%
Crack         626       580   92.65%
Total        5700      5652   99.16%
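The logistic model of equation (5) and the class-averaged ("adjusted") classification rate used in the results can be sketched in plain Python as follows. The function names are hypothetical, the 0.5 decision threshold is not specified in the paper, and the counts are the ones listed in Table I for the optimized ANN, used here only to show how the two rates differ.

```python
import math

def logistic_probability(alpha, betas, features):
    """Equation (5): y = exp(z) / (1 + exp(z)), z = alpha + sum_i beta_i * chi_i."""
    z = alpha + sum(b * x for b, x in zip(betas, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))    # same value, avoids overflow for large z
    e = math.exp(z)
    return e / (1.0 + e)

def adjusted_classification_rate(correct_per_class, total_per_class):
    """Mean of the per-class classification rates (equal weight per class)."""
    rates = [c / t for c, t in zip(correct_per_class, total_per_class)]
    return sum(rates) / len(rates)

# Counts from Table I: (non-crack, crack).
correct = [5072, 580]
total = [5074, 626]

raw_rate = sum(correct) / sum(total)                    # dominated by the majority class
adjusted_rate = adjusted_classification_rate(correct, total)
```

Because non-crack sections greatly outnumber crack sections, the raw rate mostly reflects non-crack performance, whereas the adjusted rate gives the crack class equal weight.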

Fig. 3. Classification performance of the ANN with various input transformations

In order to assess the potential performance of the WAVCDF method, the best-performing network was trained for a further 400 epochs, resulting in the MSE being reduced from 0.0158 to 0.0091. The classification performance of this network is shown in Table I.

IV. Discussion

A feature extraction method was presented, based on PCA-extracted parameters of sorted wavelet coefficient distributions (which represent inverse empirical CDFs). This represents a rather new and unusual approach to feature extraction for two reasons. Firstly, there is an intermediate stage of feature extraction, constituting estimation of the empirical CDF of each wavelet scale. Secondly, characterizing the features of the CDFs is accomplished using PCA rather than using parameters such as variance, mean, skewness, etc. The PCA approach makes no assumptions regarding the underlying shape of the distribution, outputting uncorrelated coefficients that optimally represent the characteristics of the distribution.

The wavelet - empirical CDF method compared favorably with other feature extraction methods. Both linear and non-linear classifier systems performed better using the wavelet - CDF descriptor, achieving an average class-weighted performance of 90% in the case of the ANN. The results support several conclusions. The WAVCDF method performed significantly better than the FFT or the CDF, indicating that a combined space-frequency representation is more useful than either by itself. Generally, more sophisticated pre-processing led to better classifier performance.

It is well known that effective pre-processing / feature extraction leads to better classifier performance. However, considering that dimensionality reduction / decorrelation is achieved in any case via the PCA step, it is worthwhile to enquire why better performance is achieved. Note that the key characteristic of each pre-processing technique appears to be its non-linearity (i.e. sorting, taking Fourier magnitudes), which amounts to discarding information (permutation and phase information, respectively). Better classification performance is obtained by throwing (less useful) information away. Thus, the classification problem may be simplified during the pre-processing stage. This interpretation is supported by the result that the differences in classification performance were more pronounced for the linear than for the non-linear classifier.
The powerful function-approximating properties of the ANN were able to offset, to some degree, the more complex discrimination problem presented by less sophisticated pre-processing methods. Conversely, the WAVCDF and FFT pre-processing methods increased the linear separability of the discrimination problem. However, overall performance of the non-linear classifier was significantly higher than that of the linear classifier, indicating that, even with sophisticated pre-processing, the discrimination problem remained non-linear. Finally, it is noted that for the less sophisticated forms of pre-processing (CDF and RAW), the variance of the ANN performance was significantly greater (see Fig. 3). This supports the interpretation that more sophisticated pre-processing simplifies the decision boundary, rather than 'adding' to the discriminant information. It is suggested that there exists a complex decision boundary for discriminating CDF and RAW pixel data, resulting in more variable classifier performance.

It appears that the space-frequency distribution of wavelet coefficients is a useful form of feature extraction for some image discrimination problems. Further, parameterization of CDFs may be satisfactorily achieved through use of PCA.

References

[1] F. Kirchner and J. Hertzberg, “A prototype study of an autonomous robot platform for sewerage system maintenance,” Autonomous Robots, vol. 4, no. 4, pp. 319–331, 1997.
[2] P.F. Hsien and D. Landgrebe, Classification of high dimensional data, Ph.D. thesis, School of Electrical and Computer Engineering, Purdue University, 1998.
[3] T.Y. Young and K.S. Fu, Handbook of Pattern Recognition and Image Processing, College of Engineering, University of Miami, 1986.
[4] C. Lee and D. Landgrebe, “Feature extraction and classification algorithms for high dimensional data,” Tech. Rep. TR-EE 93-1, School of Electrical Engineering, Purdue University, 1993.
[5] X. Luo and G. Mirchandani, “An integrated framework for image classification,” in Proceedings of ICASSP, 2000.
[6] A. Mojsilovic, M. Popovic, A. Neskovic, and A. Popovic, “Wavelet image extension for analysis and classification of infarcted myocardial tissue,” IEEE Trans. Biomedical Engineering, vol. 44, no. 9, September 1997.
[7] P. Gunatilake, M. W. Siegel, A. G. Jordan, and G. W. Podnar, “Image understanding algorithms for remote visual inspection of aircraft surfaces,” in Proceedings of the SPIE Conference on Machine Vision Applications in Industrial Inspection V, San Jose, February 1996, SPIE, pp. 2–13.
[8] F. Murtagh and J.L. Starck, “Image processing through multiscale analysis and measurement noise modeling,” Statistics and Computing, vol. 10, pp. 95–103, 2000.
[9] F. Murtagh, J.L. Starck, and M. Berry, “Overcoming the curse of dimensionality in clustering by means of the wavelet transform,” The Computer Journal, pp. 107–120, 2000.
[10] Mladen Victor Wickerhauser, “INRIA lectures on wavelet packet algorithms,” in Problèmes Non-Linéaires Appliqués, Ondelettes et Paquets D’Ondes, Pierre-Louis Lions, Ed., pp. 31–99. INRIA, Roquencourt, France, 17–21 June 1991, Minicourse lecture notes.
[11] C. Garcia, G. Zikos, and G. Tziritas, “Wavelet packet analysis for face recognition,” Image and Vision Computing, vol. 18, no. 4, pp. 289–297, February 2000.
[12] A. Laine and J. Fan, “Texture classification by wavelet packet signatures,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 5, no. 11, pp. 1186–1191, November 1993.
[13] M. Hilton, “Wavelet and wavelet packet compression of electrocardiograms,” IEEE Transactions on Biomedical Engineering, vol. 44, pp. 394–402, May 1997.
[14] Aleix M. Martinez and Avinash C. Kak, “PCA versus LDA,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228–233, 2001.
[15] D. L. Swets and J. Weng, “Using discriminant eigenfeatures for image retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831–836, 1996.
[16] G. Y. Chen and T. D. Bui, “Invariant fourier-wavelet descriptor for pattern recognition,” Pattern Recognition, vol. 32, pp. 1083–1088, 1999.
[17] A.D. Marshall and R.R. Martin, Computer Vision, Models and Inspection, World Scientific Publishing, 1992.
[18] M. N. Do and M. Vetterli, “Wavelet-based texture retrieval using generalized gaussian density and kullback-leibler distance,” in IEEE Transactions on Image Processing, February 2002, vol. 217.
[19] N. Sebe and M. Lew, “Wavelet based texture classification,” in International Conference on Pattern Recognition, 2000, pp. 959–962.
[20] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, San Diego, 1998.
[21] S.S. Wang, P.C. Chen, and W.G. Lin, “Invariant pattern-recognition by moment fourier descriptor,” PR, vol. 27, no. 12, pp. 1735–1742, December 1994.
[22] O.R. Mitchell and T. Grogan, “Global and partial shape discrimination for computer vision,” OptEng, vol. 23, no. 5, pp. 484–491, September 1984.
[23] J. Hurri, A. Hyv, J. Karhunen, and E. Oja, “Image feature extraction using independent component analysis,” 1996.
[24] R. R. Coifman, Y. Meyer, S. R. Quake, and M. V. Wickerhauser, “Signal processing and compression with wavelet packets,” in Progress in Wavelet Analysis and Applications, Y. Meyer and S. Roques, Eds., Proceedings of the International Conference “Wavelets and Applications,” Toulouse, France, 8–13 June 1992, pp. 77–93. Editions Frontières, Gif-sur-Yvette, France, 1993.
[25] M. V. Wickerhauser and R. R. Coifman, “Entropy based methods for best basis selection,” IEEE Trans. on Inf. Theory, vol. 38, no. 2, pp. 719–746, 1992.
[26] S. Menard, Applied Logistic Regression Analysis, Sage Publications, Thousand Oaks, CA, 1995.

Matthew Browne completed his PhD studies in 2001 at Griffith University, Brisbane, Australia. For his dissertation, he investigated the use of wavelets for the analysis of psychophysiological signals. He likes working in artificial intelligence using statistical, signal processing, and other numerical methods. He is currently working as a researcher with the GMD-Japan research laboratory, focusing on processing of robot sensor data.
