INTERNATIONAL SYMPOSIUM ON ROBOTS AND AUTOMATION 2002


Feature extraction via PCA of wavelet densities

M. Browne, S. Shiry, M. Dorn, R. Ouellette and T. Christaller
GMD-Japan Research Laboratory, Collaboration Center, 2-1 Hibikino, Wakamatsu-ku, Kitakyushu-city, JAPAN

Abstract—This paper describes a novel method of feature extraction for visual sensor processing, together with an application to crack detection by a sewer maintenance robot equipped with an infra-red camera. A space-frequency distribution signature is generated via a three-step process: 1. space-frequency decomposition, 2. density function estimation, and 3. parameter extraction. These steps are achieved, respectively, via the wavelet transform, empirical density estimation, and principal components analysis. Extraction of wavelet distribution features is shown to be superior to conventional methods for classification. Performance was assessed using both linear (logistic regression) and non-linear (ANN) classifiers in a crack discrimination task.

Index Terms—wavelet transform, principal components analysis, PCA, empirical cumulative distribution function, CDF, image classification

I. Introduction

The present research is concerned with feature extraction for the visual system of a KURT2 [1] mobile robot designed for sewer inspection and equipped with an infrared video camera. The task of the robot is the online detection of cracks and other faults in sewer pipes. The cracks are defined by a relatively distinctive space-frequency structure. However, they are often embedded in a variety of complex textures and other spurious features. Lighting and reflection effects present an additional source of difficulty. Further, the range of possible translations and rotations makes robust crack detection in sewer pipes an interesting application in robot vision.

For image classification problems of this kind, statistical characterization of the differences between image classes is a significant challenge. Methods for generating concise yet informative parameters of natural images are of interest for a broad range of classification problems. Effective pre-processing / transformation of data (also referred to as feature extraction) is necessary for both dimensionality reduction and better class separability. These may be regarded as two essential criteria for good classifier performance [2], [3]. Although feature extraction has long been an important topic in pattern recognition and image processing, there are circumstances where previous methods do not work well [4]. In particular, issues of concern include a) computational complexity, due to the high-dimensional nature of the data, and b) robustness to lighting effects, and to translation, scaling, and rotation of the objects of interest [5].

The utility of the wavelet transform (WT) and principal components analysis (PCA) for feature extraction has been demonstrated. Both the WT and PCA linearly transform inputs onto a set of basis vectors, representing an orthogonal rotation of the coordinate axes. However, the basis vectors of the WT are determined a priori by considerations of time (or space) - frequency regularity, while PCA basis functions are determined empirically according to a variance maximization criterion. PCA is perhaps the most popular technique for dimensionality reduction, with successful application to topics in robot vision [14], [15]. Wavelets have been demonstrated to be a useful tool for various classification and detection problems [6], [7], as well as dimensionality reduction [8], [9]. The WT and the wavelet packet transform, a generalization of the original algorithm [10], appear to be particularly powerful methods for feature extraction and classification applications. For example, wavelets have been used as a feature extractor for classification of faces [11], textures [12] and biosignals [13].

Linear transformations such as the WT and PCA would both appear to be potentially very useful as components of a robot vision system. Investigations by the authors have indicated that both methods do well at separating sources of sensor variance (lighting effects, pipe textures, spurious and bona-fide cracks). However, straightforward application of either method to the data is, in itself, an unsatisfactory feature extraction method. Although cracks have a relatively simple basic structure, their translation and rotation vary considerably across samples. Linear transforms alone must model such features with an equivalently large set of basis functions, resulting in poor feature extraction and dimensionality reduction. Further, computational considerations in machine vision processing are of concern: it is of interest to determine the least expensive transform that performs effectively.
Chen and Bui [16] describe an interesting application of combined Fourier-wavelet methods which provide a translation, rotation and scale invariant character descriptor. However, the method of [16] does not generalize well beyond character recognition. Taking the squared value of the Fourier transform represents a simple method of generating a translation-invariant descriptor of an image [17] by removing phase information. However, the well-known lack of spatial / temporal information severely limits the applicability of Fourier methods to many signal classification problems. Many features of interest (including cracks and other faults in sewer-pipe systems) are of limited spatial extent. At a particular frequency range, these features represent a characteristic distribution of energy over space. The problem at hand appears to be the combination of two properties required of a single feature extraction system: 1) translation insensitivity: various translations of the input should be mapped to a similar region in feature space, 2) spatial resolution: the model should retain as much information as possible regarding the distribution of energy across various spatial translations.

Recent research in image processing has utilized parameters of wavelet distributions as low-dimensional image descriptors. Do and Vetterli [18] parameterize the distribution of coefficients in wavelet sub-bands using generalized Gaussian models for texture similarity measurement. Sebe and Lew [19] studied Gaussian, exponential, and Cauchy distributions for extracting parameters of wavelet distributions. The parameters were used to generate a feature vector for subsequent classification. Application of these texture classification methods to general translation-invariant image classification requires an expanded dictionary of space-frequency decomposition atoms. The use of parameters such as Gaussian mean and standard deviation requires the assumptions that a) the data follow a particular distribution, and b) differences between classes exist in terms of the model parameters. In certain analysis situations these assumptions may be justified, but in general such parameters appear to be insufficient as a general distribution descriptor, given a sample drawn from a mixture of unknown probability distributions. For example, the wavelet distribution drawn from an image of a crack embedded in a background texture would be best modelled as a mixture of two (or more) distributions. The present study involves the application of PCA for extracting parameters from raw empirical cumulative densities of wavelet coefficients. The purpose is to use these parameters as features for discrimination.
Cumulative density functions describe an empirical distribution without fitting a model, and without requiring any assumptions regarding the underlying generative distribution. It will be shown that PCA is capable of extracting features of wavelet distributions to an arbitrary degree of precision, and that the extracted features are effective discriminators. It will be shown that this parameterized space-frequency energy distribution compares favorably with related feature-extraction methods such as the Fourier transform (e.g. used by [21], [22]), or PCA-based texture analysis of raw pixels (e.g. [23]).

II. Method

A. Generation of a time-frequency signature

The present study utilizes the WT [24], [20] as a primary step in the decomposition process. Wavelets comprise a family of orthonormal bases for discrete functions of $L^2(\mathbb{R})$. For a particular wavelet $\psi$ and scaling function $\phi$, wavelet functions are of the form $W_n(2^{\ell} x - k)$, indexed by scale (frequency center) $\ell \in \mathbb{Z}$, spatial localization $k \in \mathbb{Z}$ and oscillation (frequency spread) $n \in \mathbb{N}$ [25]. Given orthonormal quadrature mirror filters $h_k$ and $g_k$, wavelet analysis functions are generated recursively:

$$W_{2n}(x) = \sqrt{2} \sum_k h_k W_n(2x - k) \tag{1}$$

$$W_{2n+1}(x) = \sqrt{2} \sum_k g_k W_n(2x - k). \tag{2}$$

$W_0(x)$ and $W_1(x)$ correspond to the generating functions $\phi$ and $\psi$, respectively. Following Coifman and Wickerhauser [25], an iteration of the WT may be described as the operation of filters combined with a down-sampling operation:

$$F_0\{s_k\}(2i) = \sum_k s_k h_{k-2i} \tag{3}$$

$$F_1\{s_k\}(2i) = \sum_k s_k g_{k-2i}. \tag{4}$$
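As an illustration of the filter-and-downsample step in equations (3) and (4), the following is a minimal sketch for a 1-dimensional signal. It assumes the Haar filter pair, which is chosen here only because it is the shortest orthonormal quadrature mirror pair; the paper does not state which wavelet was used, and the names `H`, `G` and `analysis_step` are hypothetical.

```python
import math

# Haar quadrature mirror filters (an assumption for illustration only).
H = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # low-pass h_k
G = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # high-pass g_k


def analysis_step(s, filt):
    """Return the sequence sum_k s[k] * filt[k - 2i] for i = 0, 1, ...

    This is one filtering pass followed by down-sampling by two.
    The signal length is assumed to be even.
    """
    out = []
    for i in range(len(s) // 2):
        acc = 0.0
        for k in range(2 * i, 2 * i + len(filt)):
            acc += s[k] * filt[k - 2 * i]
        out.append(acc)
    return out


signal = [4.0, 2.0, 5.0, 5.0]
low = analysis_step(signal, H)    # F_0: smooth (low-pass) half
high = analysis_step(signal, G)   # F_1: detail (high-pass) half
```

Because the assumed filters are orthonormal, the squared coefficients of `low` and `high` together sum to the energy of `signal`. Applying `analysis_step` again to `low` only gives the WT, while applying it to both halves gives the packet transform.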

Recursive splitting of a signal into highpass $F_1$ and lowpass $F_0$ subspaces generates a dyadic tree-structure of wavelet coefficients for 1-dimensional signals. When applied to 2-dimensional signals (i.e. video images), the operations are performed on both rows and columns, resulting in a quadratic rather than a dyadic tree structure. For an $N \times N$ image with $\log_2 N \in \mathbb{Z}^+$, the complete wavelet transform (i.e. packet transform) outputs subspace sets $S_{\ell,n} = \{s(\ell, n, k, j)\}$ consisting of coefficients corresponding to space translations $k, j \in \{1, 2^{n+1}, 2 \cdot 2^{n+1}, \ldots, N\}$. The WT selectively decomposes the low-pass subspaces only, resulting in high frequency structures well localized in space/time and low frequency structures well localized in frequency. Note that the number of frequency subspaces $\ell$ increases logarithmically with $n$, as the number of coefficients within each subspace decreases. The WT, by providing estimates of band-limited activity at a range of translations, provides combined space-frequency information.

The subspaces $S_{\ell,n}$ describe an empirical distribution of activity across space at scale $\ell$ with a space-frequency trade-off indexed by $n$. We introduce the vector $s$, the entries of which are the sorted members of a particular wavelet coefficient set $S_{\ell,n}$. The series $s = s\{l\}$, $l \in \{k, j\}$ may be interpreted as an inverse CDF, since $s\{l\}$ is a function of the cumulative probability: $s\{l\} = f(p) \,|\, p = l/L$, $p(s) = f^{-1}(s)$, where $L$ is the number of coefficients in the subspace. Thus, simply sorting wavelet coefficients within each wavelet subspace yields the inverse empirical CDF. Unlike other possible forms of parameterization of the observed wavelet distributions (e.g. Gaussian- or entropy-based statistics), $s$ represents a relatively complete description of the statistical properties of an observed distribution, since it makes no assumptions or abstractions regarding the properties of the underlying distribution.
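The link between sorting and the inverse empirical CDF can be sketched in a few lines. The coefficient values below are invented for illustration, and the helper names `inverse_ecdf` and `ecdf` are hypothetical, not taken from the paper.

```python
def inverse_ecdf(coeffs):
    """Sorted coefficients: entry l (0-based) is the quantile at p = (l + 1) / L."""
    return sorted(coeffs)


def ecdf(sorted_coeffs, value):
    """Empirical CDF: the fraction of coefficients less than or equal to value."""
    count = sum(1 for c in sorted_coeffs if c <= value)
    return count / len(sorted_coeffs)


# Made-up coefficients standing in for one wavelet subspace.
subspace = [0.3, -1.2, 0.0, 2.5, -0.4, 0.3, 1.1, -0.1]
s = inverse_ecdf(subspace)

# The same coefficients at different positions give the same descriptor.
moved = subspace[3:] + subspace[:3]
same_descriptor = inverse_ecdf(moved) == s

# Reading the pair (s[3], 4/8) in either direction recovers the CDF or its inverse.
p_at_zero = ecdf(s, 0.0)
```

Here `s[3]` is `0.0` and `ecdf(s, 0.0)` is `0.5`, so the sorted sequence and the empirical CDF carry the same information. The positional rearrangement is applied directly to the coefficients; a translation of the input image does not in general act on the coefficients this simply.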
Further, $s$ discards information regarding the original x-y spatial co-ordinates of the features, generating a descriptor less sensitive to translations of the input (see footnote 1). Let $w$ represent the catenated subspace distributions for all wavelet subspaces with ranked coefficients, $w = \{s_{\ell,n}\}$, characterizing the density distribution for each wavelet subspace with respect to space. For particular wavelet packet parameters, a training set of $J$ shape vectors $\{w_j\}$, $j \in \{1, \ldots, J\}$ may be generated. We wish to parameterize the features of the collected wavelet CDFs in order to reduce the dimensionality as much as possible whilst preserving the essential characteristics of the observed CDFs. We utilize PCA as an assumption-free, data-driven method for extracting the primary axes along which variation in cumulative distributions occurs. Dimensionality reduction is achieved by extracting the first $q$ principal axes $u_n$ of the data set, being those orthonormal axes onto which the retained variance is maximal. The sample covariance matrix $C = \sum_j (w_j - \bar{w})(w_j - \bar{w})^T / N$ is calculated from the training data set $\{w_j\}$. The principal components $u_q$ are the eigenvectors with the greatest associated eigenvalues of $C$, such that $C u_n = \lambda_n u_n$. The basis functions $u_q$ yield coefficients $\lambda_q$, which represent a mapping to a lower dimensional subspace, while maintaining the retained variance. Points in this subspace may be used as input to a classifier system with $q$-dimensional input, $J$ training cases, and associated class labels.

Footnote 1: Note that a fully 'translation-invariant' descriptor is not normally obtained by applications of the wavelet transform. The wavelet series $\{s^j_{\ell,n}\}$ is variant to translations $t$ for which $2^{\ell} \bmod t \neq 0$.

The generation of a space-frequency energy signature may be summarized:
1. Transformation of the original image into a series of space-frequency subspaces $S_{\ell,n}$ via the wavelet transform.
2. Generation of sorted wavelet coefficient sequences $\{s^j_{\ell,n}\}$, corresponding to inverse empirical CDFs, and thus representing an estimate of the cumulative distribution across space for each frequency subspace.
3. Parameterization / dimensionality reduction of the collected subspaces $w$ via application of PCA, yielding features $\lambda_q$ for input to a classifier system.
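The PCA step of the summary above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: it extracts only the first principal axis (q = 1) by power iteration, the training vectors are tiny invented examples rather than real wavelet CDFs, the covariance is divided by the number of training vectors, and all function names are hypothetical.

```python
import math


def mean_vector(rows):
    """Component-wise mean of the training vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, mean):
    """Sample covariance C = sum_j (w_j - mean)(w_j - mean)^T / J."""
    centered = [[x - m for x, m in zip(r, mean)] for r in rows]
    d = len(mean)
    return [[sum(c[a] * c[b] for c in centered) / len(rows)
             for b in range(d)] for a in range(d)]


def top_component(C, iters=200):
    """Leading eigenvector of the symmetric matrix C by power iteration."""
    d = len(C)
    u = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        v = [sum(C[a][b] * u[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break  # C is zero: no variance to explain
        u = [x / norm for x in v]
    return u


def project(row, mean, u):
    """Feature value: coordinate of the centered vector along axis u."""
    return sum((x - m) * a for x, m, a in zip(row, mean, u))


# Invented sorted-coefficient vectors (each row is already ascending).
W = [[0.0, 1.0, 2.0], [0.0, 2.0, 4.0], [0.0, 3.0, 6.0], [0.0, 4.0, 8.0]]
w_bar = mean_vector(W)
u1 = top_component(covariance(W, w_bar))
features = [project(w, w_bar, u1) for w in W]
```

In this toy data all variation lies along the direction (0, 1, 2), so a single feature per vector retains all of the variance. For q > 1, further axes could be obtained by deflation or by a library eigen-solver; a uniform starting vector is used here for simplicity and would fail only if it happened to be orthogonal to the leading axis.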

B. Application to robot vision

The present study takes place in the context of the development of a robot-vision system for a sewer maintenance robot. The task of the robot is to navigate sewer pipes, scan the walls of the pipe, and detect cracks and other defects. The development of the present feature extraction algorithm was motivated by the need for a translation-invariant feature extractor for video image data, which nonetheless was sensitive to differences between spatial distributions. For example, a crack in a sewer wall is characterized by a particular space-frequency distribution, but a wide variety of translations and orientations. The generation of the space-frequency distribution is used as a low-level feature detector for discrimination of crack-like structures at a low spatial level (see footnote 2).

Video data was gathered from a conventional sewer-inspection robot during inspection of damaged sewer pipes in the city of Kitakyushu, Japan. Video was converted to a 30 frames per second, 320x240 pixel digital representation. Nineteen video frames at a variety of time points were manually analyzed, pixels being determined as part of a crack structure or not. Each video frame was divided into a tiled map of 300 16x16 pixel sections. If at least one pixel in the section was manually determined to be part of a crack structure, the entire section was classified as a crack. The section provided the raw input to the space-frequency signature algorithm. Figure 1 displays a single training frame: in shot is a clear vertical crack, a horizontal pipe join (not a fault), along with some lighting / reflection effects.

Footnote 2: Note that a higher-level classifier system is ultimately used to make the final decision regarding the presence of pipe-wall damage. However, discussion of this system is beyond the scope of this paper.

Fig. 1. Example video frame comprising 300 16x16 pixel subsections

Space-frequency signatures $\lambda_q$ were generated for 5700 training examples, and used as input to linear and nonlinear classifiers. The linear classifier consisted of a logistic regression model. Logistic linear regression models

$$y = \frac{\exp(\alpha + \beta_1 \chi_1 + \cdots + \beta_n \chi_n)}{1 + \exp(\alpha + \beta_1 \chi_1 + \cdots + \beta_n \chi_n)} \tag{5}$$

are the preferred method for performing linear prediction on a dichotomous dependent variable [26]. The nonlinear classifier consisted of an artificial neural network (ANN) with 15 hidden neurons. Tan-sigmoid activation functions were used for the hidden layer, with a log-sigmoid activation function, suitable for binary valued targets, used for the single output neuron.

III. Results

A variety of feature-representation spaces on the video image sections were generated for comparison purposes:
1. Raw pixel data (RAW).
2. Sorted raw pixel data (CDF).
3. Absolute Fourier frequency spectrum (FFT).
4. Sorted wavelet space-frequency distribution (WAVCDF).

PCA was applied to each of the four types of image representation, reducing the dimensionality of the input data from 256 to 30 in each case. Performance was assessed using a single run of logistic regression, and ten (10) runs of the ANN classification model, using different random starting weights. ANNs were trained for a limited period of 200 epochs. Figure 2 shows the classification performance of the linear classifier for each of the four kinds of input representations. Adjusted classification rates are shown, given equal weightings for each class (i.e. performance shown is the mean of classification rate over classes).
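The adjusted classification rate described above, the mean of the per-class classification rates, can be sketched as follows. The labels are invented to show why the adjustment matters when one class (here, non-crack sections) dominates; the function name `balanced_rate` is hypothetical.

```python
def balanced_rate(y_true, y_pred):
    """Mean over classes of the fraction of that class classified correctly."""
    rates = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        rates.append(correct / len(idx))
    return sum(rates) / len(rates)


# Invented labels: 1 = crack section, 0 = non-crack section.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

raw_rate = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
adjusted = balanced_rate(y_true, y_pred)
```

In this example the raw rate is 0.9 while the adjusted rate is 0.75, because half of the crack sections are missed. Weighting the classes equally prevents a classifier from scoring well simply by favouring the majority class.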

Fig. 2. Classification performance of logistic regression with various input transformations

Figure 3 displays the mean performance of each batch of 10 trained ANNs, with the 95% confidence interval of the mean. Three one-tailed t-tests were performed, testing the null hypothesis that WAVCDF performance was not higher than each of the other forms of pre-processing. The WAVCDF preprocessing method was associated with significantly better performance, p < 0.001 in each case.

Fig. 3. Classification performance of ANN classifier with various input transformations. Mean and standard error of 10 runs is shown.

IV. Discussion

A feature extraction method was presented, based on PCA-extracted parameters of sorted wavelet coefficient distributions (which represent inverse empirical CDFs). This represents a rather new and unusual approach to feature extraction for two reasons. Firstly, it includes an intermediate stage of feature extraction, constituting estimation of the empirical CDF of each wavelet scale. Secondly, characterizing the features of the CDFs is accomplished using PCA rather than using parameters such as variance, mean, skewness, etc. The PCA approach makes no assumptions regarding the underlying shape of the distribution, outputting uncorrelated coefficients that represent the characteristics of the distribution optimally in the sense of retained variance.

The wavelet - empirical CDF method compared favorably with other feature extraction methods. Both linear and non-linear classifier systems performed better using the wavelet - CDF descriptor, achieving an average class-weighted performance of 90% in the case of the ANN. The results support several conclusions. The WAVCDF method performed significantly better than the FFT or the CDF, indicating that a combined space-frequency representation is more useful than either by itself. Generally, more sophisticated pre-processing led to better classifier performance.

It is well known that effective pre-processing / feature extraction leads to better classifier performance. However, considering that dimensionality reduction and decorrelation are achieved in any case via the PCA step, it is worthwhile to enquire why better performance is achieved. Note that the key characteristic of each pre-processing technique appears to be its non-linearity (i.e. sorting, taking Fourier magnitudes), which amounts to discarding information (permutation and phase information, respectively). Better classification performance is obtained by throwing (less useful) information away.
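The two information-discarding operations mentioned above can be checked on a toy signal. The following sketch uses an invented 1-dimensional signal and a circular shift as the translation, both of which are assumptions made for illustration; the helper `dft_magnitudes` is a plain discrete Fourier transform written out with the standard library.

```python
import cmath


def dft_magnitudes(x):
    """Magnitudes of the discrete Fourier transform of x (phase is dropped)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                    for t in range(n)))
            for f in range(n)]


# Invented signal with one localized feature, and a circularly shifted copy.
x = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0]
shifted = x[-3:] + x[:-3]

# Sorting discards permutation information.
sorted_equal = sorted(x) == sorted(shifted)

# Taking magnitudes discards phase information.
mags_equal = all(abs(a - b) < 1e-9
                 for a, b in zip(dft_magnitudes(x), dft_magnitudes(shifted)))
```

The raw vectors `x` and `shifted` differ, yet both descriptors coincide, which is the sense in which each non-linear step removes a source of variation before PCA is applied.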
Thus, the classification problem may be simplified during the pre-processing stage. This interpretation is supported by the result that the differences in classification performance were more pronounced for the linear, compared to the non-linear classifier. The function approximating properties of the ANN were able to offset, to some degree, the more complex discrimination problem presented by less sophisticated preprocessing methods. Conversely, the WAVCDF and FFT preprocessing methods increased the linear separability of the discrimination problem. However, overall performance of the non-linear classifier was significantly higher than the linear classifier, indicating that, even with sophisticated pre-processing, the discrimination problem remained nonlinear. Finally, it is noted that for the less sophisticated forms of preprocessing (CDF and RAW), the variance of the ANN performance was significantly greater (see Fig. 3). This supports the interpretation that more sophisticated pre-processing simplifies the decision boundary, leading to a less complex error surface. It is reasonable to suppose that the more variable classifier performance observed for the CDF and RAW data pre-processing methods is a result of the more complex decision boundary.

Parameterization of the density distribution of wavelet coefficients via PCA appears to be a useful method of feature extraction for the present machine vision application. We are currently investigating this and similar approaches for fault detection at a low spatial level (16x16 pixel resolution). The second component of the fault detection system involves incorporation of this information across multiple 16x16 regions, yielding a final decision regarding the presence of a fault. The feature extraction method described has the advantage that no model of the wavelet coefficient distribution need be specified. PCA coefficients characterize the distribution 'blindly' according to the variance maximization criterion. Thus, the method of generating wavelet density parameters is automatically tailored to the data set at hand. The method should be useful for classifier pre-processing in cases where classes are differentiated by different time-frequency energy distribution patterns. Because estimation of densities involves eliminating permutation information, it is expected to be less useful in cases where the order (or relative spatial location) comprises the critical discriminant information.

References

[1] F. Kirchner and J. Hertzberg, “A prototype study of an autonomous robot platform for sewerage system maintenance,” Autonomous Robots, vol. 4, no. 4, pp. 319–331, 1997.
[2] P.F. Hsien and D. Landgrebe, Classification of high dimensional data, Ph.D. thesis, School of Electrical and Computer Engineering, Purdue University, 1998.
[3] T.Y. Young and K.S. Fu, Handbook of Pattern Recognition and Image Processing, College of Engineering, University of Miami, 1986.
[4] C. Lee and D. Landgrebe, “Feature extraction and classification algorithms for high dimensional data,” Tech. Rep. TR-EE 93-1, School of Electrical Engineering, Purdue University, 1993.
[5] X. Luo and G. Mirchandani, “An integrated framework for image classification,” in Proceedings of ICASSP, 2000.
[6] A. Mojsilovic, M. Popovic, A. Neskovic, and A. Popovic, “Wavelet image extension for analysis and classification of infarcted myocardial tissue,” IEEE Trans. Biomedical Engineering, vol. 44, no. 9, September 1997.
[7] P. Gunatilake, M. W. Siegel, A. G. Jordan, and G. W. Podnar, “Image understanding algorithms for remote visual inspection of aircraft surfaces,” in Proceedings of the SPIE Conference on Machine Vision Applications in Industrial Inspection V, San Jose, February 1996, SPIE, pp. 2–13.
[8] F. Murtagh and J.L. Starck, “Image processing through multiscale analysis and measurement noise modeling,” Statistics and Computing, vol. 10, pp. 95–103, 2000.
[9] F. Murtagh, J.L. Starck, and M. Berry, “Overcoming the curse of dimensionality in clustering by means of the wavelet transform,” The Computer Journal, pp. 107–120, 2000.
[10] Mladen Victor Wickerhauser, “INRIA lectures on wavelet packet algorithms,” in Problèmes Non-Linéaires Appliqués, Ondelettes et Paquets D’Ondes, Pierre-Louis Lions, Ed., pp. 31–99. INRIA, Rocquencourt, France, 17–21 June 1991, Minicourse lecture notes.
[11] C. Garcia, G. Zikos, and G. Tziritas, “Wavelet packet analysis for face recognition,” Image and Vision Computing, vol. 18, no. 4, pp. 289–297, February 2000.
[12] A. Laine and J. Fan, “Texture classification by wavelet packet signatures,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 5, no. 11, pp. 1186–1191, November 1993.
[13] M. Hilton, “Wavelet and wavelet packet compression of electrocardiograms,” IEEE Transactions on Biomedical Engineering, vol. 44, pp. 394–402, May 1997.
[14] Aleix M. Martinez and Avinash C. Kak, “PCA versus LDA,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228–233, 2001.
[15] D. L. Swets and J. Weng, “Using discriminant eigenfeatures for image retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831–836, 1996.
[16] G. Y. Chen and T. D. Bui, “Invariant fourier-wavelet descriptor for pattern recognition,” Pattern Recognition, vol. 32, pp. 1083–1088, 1999.
[17] A.D. Marshall and R.R. Martin, Computer Vision, Models and Inspection, World Scientific Publishing, 1992.
[18] M. N. Do and M. Vetterli, “Wavelet-based texture retrieval using generalized gaussian density and kullback-leibler distance,” in IEEE Transactions on Image Processing, February 2002, vol. 217.
[19] N. Sebe and M. Lew, “Wavelet based texture classification,” in International Conference on Pattern Recognition, 2000, pp. 959–962.
[20] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, San Diego, 1998.
[21] S.S. Wang, P.C. Chen, and W.G. Lin, “Invariant pattern-recognition by moment fourier descriptor,” PR, vol. 27, no. 12, pp. 1735–1742, December 1994.
[22] O.R. Mitchell and T. Grogan, “Global and partial shape discrimination for computer vision,” OptEng, vol. 23, no. 5, pp. 484–491, September 1984.
[23] J. Hurri, A. Hyvärinen, J. Karhunen, and E. Oja, “Image feature extraction using independent component analysis,” 1996.
[24] R. R. Coifman, Y. Meyer, S. R. Quake, and M. V. Wickerhauser, “Signal processing and compression with wavelet packets,” in Progress in Wavelet Analysis and Applications, Y. Meyer and S. Roques, Eds., Proceedings of the International Conference “Wavelets and Applications,” Toulouse, France, 8–13 June 1992, pp. 77–93. Editions Frontières, Gif-sur-Yvette, France, 1993.
[25] M. V. Wickerhauser and R. R. Coifman, “Entropy based methods for best basis selection,” IEEE Trans. on Inf. Theory, vol. 38, no. 2, pp. 719–746, 1992.
[26] S. Menard, Applied Logistic Regression Analysis, Sage Publications, Thousand Oaks, CA, 1995.

Matthew Browne completed his PhD studies in 2001 at Griffith University, Brisbane, Australia. For his dissertation, he investigated the use of wavelets for the analysis of psychophysiological signals. He likes working in artificial intelligence using statistical, signal processing, and other numerical methods. He is currently working as a researcher with the GMD-Japan research laboratory, focusing on processing of robot sensor data.