Proceedings of the Federated Conference on Computer Science and Information Systems, pp. 25–31. ISBN 978-83-60810-22-4

Sparse PCA for gearbox diagnostics

Anna Bartkowiak

Radosław Zimroz

Institute of Computer Science, University of Wrocław, and Wrocław High School of Applied Informatics Wrocław, Poland Email: [email protected]

Wrocław University of Technology Vibro-Acoustic Science Laboratory Wrocław, Poland Email: [email protected]

Abstract—The paper presents our experience in using sparse principal components (PCs) (Zou, Hastie and Tibshirani, 2006) for visualization of gearbox diagnostic data recorded for two bucket wheel excavators, one in bad and the other in good state. The analyzed data had 15 basic variables. Our result is that two sparse PCs, based on 4 basic variables, yield a display similar to that given by the classical pair of first two PCs using all fifteen basic variables. Visualization of the data with Kohonen's SOMs confirms the conjecture that a smaller number of variables reproduces the overall structure of the data quite well. Specificities of the applied sparse PCA method are discussed.

I. INTRODUCTION

Nowadays we are witnessing a growing interest in the predictive assessment of industrial machinery. Machines work everywhere and are critical for maintaining the environmental and living conditions of humans. Machines use gearboxes, which in turn are essential for the functioning of these devices. Malfunctioning is connected with huge economic losses; therefore early detection of malfunctioning is crucial from both the economic and the safety point of view. Although there are many general rules for how to assess the problem, the working devices are quite different and different approaches are needed. As concerns machinery faults, vibration analysis has become almost the universal method to assess the state of a machine. See [1], [2], [3], [4], [5], [8], [6], [7] for methods of fault analysis used in this domain. Further references may be found in [10].

We will be concerned with the diagnosis of gearboxes used in bucket wheel excavators working in surface mining. These are huge and expensive machines. In the following we will analyze vibration signals obtained from two machines: one in bad state (machine A), and one in good state (machine B). We will show that, on the basis of vibration signals, it is possible to assess the state of the machine: good or bad. This will be done by two combined methods: self-organizing maps (SOMs) and sparse principal components (sPCs).

The next section describes how the data were acquired; it also contains some statistical characterization of the recorded data. A preliminary approach to dimensionality reduction – using Kohonen's self-organizing maps (SOMs) – is shown in Section 3. The principles of constructing classical and sparse PCs are presented in Section 4. Results of applying sparse PCA to the recorded data, and a discussion of the results, may be found in Section 5. An overall discussion of the applied methods and their validity is presented in Section 6.

* This work was partially supported by the (Polish) State Committee for Scientific Research in 2009–2012 as a research project of R. Zimroz.

II. THE DATA

A. Collection of the data

Data were recorded in an experiment carried out during research conducted in the Vibro-Acoustics Science Laboratory of Wroclaw University of Technology [9]. Two complex multistage gearboxes used in two bucket wheel drive units – working in surface mining – were investigated. The Bruel&Kjaer Pulse system was used in the experiment. The signals were measured in two auxiliary channels (tachometer signal and electric current signal) and four vibration channels. The signal duration was 60 s and the sampling frequency 16384 Hz. After data acquisition, the speed profile and the vibration signal data were processed. This was done by (a) signal segmentation (according to the cyclicity of the digging process), and (b) feature extraction.

The recorded vibration signal is shown in the top exhibit of Fig. 1. The exhibit contains two graphs depicting speed [RPM] and acceleration [m/s2] viewed as functions of time [s]. To achieve stationarity, the recorded signal was cut into segments (two cuts are also shown in the top exhibit of Fig. 1). Next, each of the obtained segments was subjected to spectral analysis using the Matlab function psd. This yielded a power spectrum for each segment. Two such power spectra are shown in Fig. 1, bottom exhibit. One may notice some periodicity in the appearance of the peaks of the spectrum: they all appear at equally spaced values of the mesh frequency (expressed in Hz). This can be explained by taking into account the architecture of the gearbox and its way of working (to learn more, see [9], [8], where a preliminary analysis of the data may also be found).

As a result of this stage of data recording and preprocessing, two sets of data vectors were obtained, each with 15 components. Each data vector corresponds to the recorded variables pp1, . . . , pp15 obtained from the spectral analysis carried out for one segment of the vibration recording. The bad data yielded nA = 1232, and the good data nB = 951 segments. The obtained two sets of data vectors constitute two matrices named A and B. These matrices, of size [nA × 15] and [nB × 15], are the basis of our further analysis.



The number of spectral components extracted from a spectrum depends on many factors (design of the machine, its condition, external load, signal-to-noise ratio, etc.). In the literature usually 5–6 components were used. Generally, the obtained variables pp1, . . . , pp15 are expected to indicate whether the device is in bad or good state; in our case they are expected to be associated with the state of a particular element of the working device, i.e. the planetary stage. It seems that so far this problem (how many spectral components to take for the analysis) has not been investigated in a systematic way. As concerns the damage of gearbox A, it was found that all the rolling bearings had exceeded the allowable radial backlash and most of the gears had scuffs and micro cracks on their teeth [8].
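To make the preprocessing concrete, the following Matlab-style sketch illustrates the segment-wise spectral feature extraction described above. It is only a sketch: the segmentation helper detect_segments, the mesh frequency value and the signal variables acc and rpm are hypothetical placeholders, since the actual values follow from the digging-cycle detection and the gearbox geometry discussed in [8], [9].

% Sketch of the preprocessing: cut the signal into segments and take the
% amplitudes of the power spectrum at 15 mesh harmonics.
% detect_segments is a hypothetical helper; fmesh is a placeholder value.
fs   = 16384;                      % sampling frequency [Hz]
segs = detect_segments(acc, rpm);  % [start, stop] sample indices per segment
nSeg = size(segs, 1);
P    = zeros(nSeg, 15);            % feature matrix: pp1, ..., pp15
for s = 1:nSeg
    x = acc(segs(s,1):segs(s,2));
    [Pxx, f] = pwelch(x, [], [], [], fs);  % power spectrum (psd in older Matlab)
    fmesh = 35;                            % hypothetical mesh frequency [Hz]
    for k = 1:15
        [~, idx] = min(abs(f - k*fmesh));  % bin closest to the k-th harmonic
        P(s, k) = Pxx(idx);
    end
end

Each row of P then corresponds to one segment and plays the role of one data vector of matrix A or B.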


B. Preliminary analysis of the data: simple statistics and visualization


Firstly, univariate characteristics such as means, medians and boxplots were calculated. This was done both for raw and normalized data. Each data matrix (X) was first centered, so that the mean of each column equals zero, and next normalized, so that the length of each column equals 1 (this is the way used by [18] and [19] in their sparse PCA). In consequence, every column of X has variance equal to 1/n. The boxplots obtained for both the raw and the normalized data matrices are shown in Fig. 2. Looking at the boxplots constructed from the raw data of sets A and B, one may notice that the distributions differ (in absolute value) considerably in these two sets. For example, in set A the most characteristic are the very large values of variables 5 and 7, while in set B the largest values are exhibited by variable 2 (which contains an exceedingly large portion of outliers) and variable 4.
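The centering and normalization just described can be written in a few lines of Matlab; the sketch below assumes X is one of the data matrices A, B or C.

% Column-wise centering and scaling to unit Euclidean length, as in [18], [19].
[n, d] = size(X);
Xc = X - repmat(mean(X, 1), n, 1);              % every column now has mean 0
Xn = Xc ./ repmat(sqrt(sum(Xc.^2, 1)), n, 1);   % every column now has length 1
% consequently sum(Xn(:,j).^2) == 1 for each j, i.e. each column has variance 1/n

The normalized matrix Xn is the input used below for both the classical and the sparse PCA.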


Fig. 1. Data acquisition. Top exhibit: recorded vibration signals shown as the dependence of speed [RPM] and acceleration [m/s2] on time [s]. Bottom: two samples of power spectra obtained from the Matlab psd procedure; the amplitude of the spectra is depicted as a function of frequency [Hz].


Fig. 2. The top two boxplots depict groups A (bad) and B (good) with raw data, i.e. non-standardized. The bottom two exhibits depict set C containing the sets A and B merged into one common group. Two boxplots are shown for the set C: firstly for raw data, next for normalized data.


On the other hand, looking at variables no. 9–15, which have very small amplitudes but differentiated distributions, it is difficult to say anything definite about their ability to differentiate between groups A and B. The medians calculated for all 15 variables are shown in Fig. 3. There are two plots, constructed for the raw and the standardized data of both groups. The bottom plot, exhibiting the medians for the normalized data, is especially interesting. One may notice that: (i) variable no. 1 has practically the same value in both groups, therefore it is hard to expect that this variable will contribute to the differentiation between groups A and B; the same can be said about variable no. 13; (ii) variable no. 2 has an inverted value as compared to the remaining variables; why this happens needs further exploration; (iii) the remaining variables, nos. 3–12 and 14–15, exhibit approximately the same difference between their medians, thus any one of them is a candidate for a good discriminator.


III. MULTIVARIATE DATA VISUALIZATION USING KOHONEN'S SELF-ORGANIZING MAPS

To visualize the considered 15-dimensional data, we use two methods: canonical discriminant analysis and Kohonen's self-organizing maps.

Canonical discriminant analysis is a supervised method; it needs as input the two data matrices A and B containing the two groups of data. The question to answer is: do the variables pp1–pp15 have some discriminative power with respect to the bad and good state of the machine? The problem was considered in [10] using all 15 variables pp1–pp15 and the so-called Fisher's LDA criterion. The method yields one canonical discriminant variate which – combined with another one without discriminative power – permits constructing a scatterplot in which one may see the location of the data points belonging to the two considered groups A and B. It happens that these two sets are separated nearly perfectly: only 5 data points are misclassified [10]. A generic sketch of the Fisher criterion is given below.

Kohonen's self-organizing maps (the SOM method) are constructed in an unsupervised way. The algorithm obtains as input only one set of data points and tries to depict their mutual neighborhoods in a predefined map. Maps obtained for set C of our data, when considering all 15 variables, and a reduced set of 7 variables, are shown in Fig. 4. The first row of maps is based on 15, and the second row on 7 variables.
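As a complement to the description of the canonical discriminant analysis above, here is a minimal, generic Matlab sketch of the Fisher criterion for two groups; it is not necessarily the exact procedure used in [10], and the plotting details are left out.

% Generic two-group Fisher discriminant: direction w maximizing the ratio of
% between-group to within-group scatter, then projection of both groups on w.
mA = mean(A, 1);  mB = mean(B, 1);
SW = (size(A,1)-1)*cov(A) + (size(B,1)-1)*cov(B);  % pooled within-group scatter
w  = SW \ (mA - mB)';                              % Fisher (canonical) direction
zA = A * w;  zB = B * w;                           % canonical scores of groups A and B
% plotting zA and zB against any auxiliary axis reproduces the kind of
% scatterplot described above, with the two groups almost perfectly separated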


Fig. 3. Medians for the raw (top) and normalized (bottom) groups A and B of the recorded data.

Fig. 4. Kohonen’s self-organizing maps visualizing groups A and B when taking (top) all 15 variables, and (bottom) only variables 2–8. Notice the topological similarity between the two rows of maps.


For the construction of the maps we have used the Matlab SOM Toolbox [11]. The architecture of the map is hexagonal; the map is composed of 24 × 10 hexagons. The data were standardized statistically, that is, to have zero mean and unit variance in each column.

The maps displayed in Fig. 4 are based on 24 × 10 nodes, which subdivide the total area of the map into 240 sub-areas having the shape of hexagons organized in a grid with 24 rows and 10 columns. For each hexagon a prototype vector, called by Kohonen a codebook vector, is found during training of the map. Each codebook represents a certain number (n_kl, k = 1, . . . , 24; l = 1, . . . , 10) of input vectors located nearby in the data space. We may find which of the data vectors are represented by a given codebook. We may also split each count n_kl into the sum n_kl = n^A_kl + n^B_kl, which shows how many of the data points located at the node <k, l> belong to group A and how many to group B. Having the counts n^A_kl and n^B_kl, we may insert into each hexagon constituting the map two other hexagons with sizes proportional to the counts n^A_kl and n^B_kl, each inserted hexagon painted with a different color. The respective maps constructed that way are shown in the right exhibits of Fig. 4.

One may notice that the data coming from group A and group B are differentiated quite nicely. In the upper map only two nodes (no. 217 and 218) have mixed group content, i.e. represent data vectors belonging both to A and to B. In the bottom map the topological locations are inverted with respect to the y-axis (this happens quite often; both maps are topologically equivalent). In this map only two hexagons (no. 1 and 2) have mixed group content, the others are perfectly disjoint. Apart from this, both maps show a decided separation of hexagons containing either bad (group A, color red) or good (group B, color green) data.

We have also constructed another kind of maps, showing directly – by color – the distances of neighboring prototype vectors in the data space. These maps are shown in the left exhibits of Fig. 4. The distances are indicated by colors as shown by the adjacent colorbars. One may notice that the two groups are separated quite distinctly by an empty space; also, group A is more homogeneous than group B.

The quality of the maps is measured by two indices: q – the quantization error, and t – the topographic error (see [11] for definitions). The respective errors amount to:

when using all 15 variables:  q = 1.075,  t = 0.041
when using variables 2–8:     q = 0.586,  t = 0.040

The quantization error q measures the average Euclidean distance of the data points (located in one node) to their prototype, that is, to their codebook vector. It is obvious that with more components (variables taken for the analysis) this distance is larger. The topographic error t reflects the fidelity of the topographic representation in the map compared with the true topology in the data space. Surprisingly or not, this error is practically the same for the maps based on 15 and on 7 variables.

The above considerations have shown that the number of considered variables can be reduced without losing the essential information about the topological location of the data points. A principled way of finding the relevant variables is presented in the next section.
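For readers who want to reproduce such maps, the following sketch shows how the SOM Toolbox [11] can be driven; the function names are those of the toolbox, while the choice of map size and normalization simply mirrors the description above.

% Training a 24 x 10 hexagonal SOM on the merged data set C = [A; B]
% and computing the quantization and topographic errors.
sD   = som_data_struct(C);            % wrap the data matrix in a SOM data struct
sD   = som_normalize(sD, 'var');      % zero mean, unit variance per column
sMap = som_make(sD, 'msize', [24 10], 'lattice', 'hexa');
[qe, te] = som_quality(sMap, sD);     % q (quantization) and t (topographic) errors
som_show(sMap, 'umat', 'all');        % U-matrix: distances between neighboring prototypes

Repeating the call on C(:, 2:8) gives the reduced-variable map discussed above.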

IV. ORDINARY AND SPARSE PRINCIPAL COMPONENTS

A. The classical PCA

Principal component analysis (PCA) is one of the most common and widespread methods for multivariate linear data analysis. It serves for investigating data structure, data mining, data smoothing and approximation, and also for exploring data dimensionality. The method permits building new features, called principal components (PCs), which may serve for visualization of the data [12].

Let X of size n × d denote the observed data matrix. For simplicity of presentation, assume that n > d and that X is of full rank. It is advisable to standardize or normalize the matrix X. Following [18], [19], we assume that the data matrix is column-wise centered and has columns of unit length. This means that all columns have means equal to 0 and variances equal to 1/n.

The PCA starts from computing the eigenvalues (λ) and eigenvectors (v) of the cross-product matrix S = X^T X, satisfying the matrix equation (S − λI)v = 0. This results in d eigenvalues

λ_1 ≥ λ_2 ≥ · · · ≥ λ_d,    (1)

and d associated eigenvectors

v_j = (v_{1j}, . . . , v_{dj})^T,  j = 1, . . . , d.    (2)

The eigenvectors constitute the loading matrix V = [v_1, . . . , v_d]. The two fundamental PCA paradigms are:

1) Feature construction:

Z^{(K)}_{n×K} = X [v_1, . . . , v_K],  1 ≤ K ≤ d.    (3)

The new features, identified as the columns of Z^{(K)}, are called principal components (PCs).

2) Data reconstruction:

X̂^{(K)}_{n×d} = Z^{(K)} [v_1, . . . , v_K]^T.    (4)

Taking K = d, the full original data matrix X_{n×d} is reconstructed. For K < d the best linear approximation of X_{n×d} by a rank-K matrix X̂^{(K)}_{n×d} is obtained – best in the sense of the L2 norm.

The constructed PCs have the major advantage that they are uncorrelated, which permits analyzing each of them separately, without referring to the others. Principal components can also be computed via the SVD: X = UDV^T, where Z = UD contains the PCs, and V the loadings.
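As an aside, the whole of equations (1)–(4) amounts to a few lines of Matlab when the SVD route is taken; the sketch below assumes the normalized data matrix Xn from Section II.

% Classical PCA of the normalized matrix Xn via the SVD (X = U*D*V').
[U, D, V] = svd(Xn, 'econ');
Z      = U * D;                       % scores: principal components, Z = Xn*V
lambda = diag(D).^2;                  % eigenvalues of S = Xn'*Xn, in decreasing order
K      = 2;                           % number of retained components
Xhat   = Z(:, 1:K) * V(:, 1:K)';      % rank-K reconstruction, equation (4)
explained = sum(lambda(1:K)) / sum(lambda);   % fraction of total variance explained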


Applications. Concerning our domain of interest, that is machine condition monitoring and fault detection: the authors of [13] considered statistical characteristics of vibration signals recorded from an internal-combustion engine (sound analysis) and an automobile gearbox (vibration analysis); after performing a reduction of the dimensionality of the data by PCA they obtained indices suitable for machine condition monitoring. PCA also proved to be very useful in vibration-based damage detection in an aircraft wing and in implementing a procedure for automatic detection and identification of ball bearing faults of some rotating machinery [14], [15].

A detailed analysis of the considered data (sets A, B, C described in Section 2) is shown in [10]. Among others, it was found that the first three PCs calculated from set C explain 86% of the total variance of the data. When displaying a scatterplot of the first two PCs (i.e. PC1 and PC2), the point-projections of sets A and B are practically separated and it is possible to draw a straight line separating the two sets with only 5 misclassified data points (out of the total nA = 1232 and nB = 951 data points).

B. Sparse PCA

The idea of sparse PCA: To construct a PC, one needs all the variables contained in the data set. Is this necessary? It would be desirable to reduce the number of explicitly used variables. This idea has been floating for some time among data analysts and some proposals were offered (see [12], [19], [18]). A reasonable approach with a fast implementation algorithm was proposed in [18]. The authors proposed a regression approach. The idea starts from the observation that each PC is a linear combination of all d variables. Thus, for known values of a given PC (given as a vector z, being a column of the matrix Z defined in formula (3)), and for known values of the matrix X, we are able to recover the coefficients of that combination by applying a regression method. If we want a sparse PC (which means that it is based on only a few original variables), we should apply here a sparse regression method. The authors of [18] proposed for this purpose the sparse elastic net regression [17], [19].

The computational algorithm proposed by Zou, Hastie and Tibshirani: The sparse PCs are computed in an iterative way. The computations are for a fixed K (1 ≤ K ≤ d) chosen by the user. The starting point of the reasoning is the observation that the eigenvectors v_j, j = 1, 2, . . . , K, constituting the matrix V^{(K)} = [v_1, . . . , v_K], appear in PCA in two roles: when constructing the features (formula (3)),

Z^{(K)} = X V^{(K)},    (5)

or when reconstructing the original data matrix (formula (4)),

X̂^{(K)} = Z^{(K)} (V^{(K)})^T.    (6)

Let B denote the matrix V^{(K)} used in the construction of Z^{(K)} (formula (5)), and A the matrix V^{(K)} used in the reconstruction of X̂ (formula (6)). To obtain sparse PCs, the authors of [18] propose to estimate the loading matrix V^{(K)} in an alternating way, working in two stages: Stage 1. Given A, compute B; and Stage 2. Given B, compute A. The initial estimate of A may be obtained, e.g., by ordinary PCA.

Stage 1. Given A, compute B. Having A, we may perform for each j = 1, . . . , K the following evaluations:

• compute the values (scores) of the j-th feature Z_j appearing in the j-th column of Z^{(K)} as z*_j = X a_j;

• notice that the values (scores) of the j-th feature Z_j may also be computed as

z_j = X b_j,    (7)

• notice that at this moment we know z*_j, and so we may substitute z_j = z*_j in equation (7), obtaining

z*_j = X b_j,    (8)

• since equation (8) may be viewed as a linear regression problem in the unknown vector of regression coefficients b_j, obtain an estimate of b_j by applying a sparse regression algorithm (e.g. larsen [17]). In such a way it is enforced that only a few columns of X will appear in the regression equation, because only a few elements of the vector b_j will be non-zero. The degree of sparseness of the regression depends on the algorithm. The larsen algorithm permits the user to declare how many original variables (columns of X) should be retained.

Performing the estimation of the sparse regression coefficient vectors b_j for j = 1, . . . , K, we obtain K successive vectors composing the estimate B̂ = [b̂_1, . . . , b̂_K] of the sought matrix B – which was desired at this stage of the calculations. Now we pass to Stage 2.

Stage 2. Given B, compute A. If B is fixed, then the problem is to find such a matrix A that minimizes the quadratic form ‖X − (XB)A^T‖^2 subject to the restriction that A^T A = I_{K×K}. It is shown in [18] that this is obtained by a reduced-rank form of the Procrustes rotation: compute the SVD of (X^T X)B,

(X^T X)B = UDV^T,

and substitute Â = UV^T.

Stages 1 and 2 are repeated alternately until a final criterion of convergence is met. Convergence may be declared when the maximal difference computed for the respective elements B_ik (i = 1, . . . , d; k = 1, . . . , K) is smaller than an assumed small number ε (say, ε = 1e−6):

|B^old_ik − B^new_ik| < ε,

where B^old and B^new denote the matrices of loadings obtained in two successive iterations. This condition might be combined with another condition that the total number of iterations carried out so far is smaller than a declared maximal number of iterations.
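A schematic Matlab rendering of this alternation is given below. It is only a sketch: the sparse regression step is delegated to a hypothetical helper sparse_regress standing in for the larsen function of [17], [19], and no claim is made that this matches the reference implementation in every detail.

% Two-stage alternation for K sparse PCs of the normalized matrix Xn.
% sparse_regress(X, z, lambda, nz) is a hypothetical stand-in for larsen:
% it should return a coefficient vector with at most nz non-zero elements.
K = 3; nz = 2; lambda = 1; epsilon = 1e-6; maxIter = 200;
[~, ~, V] = svd(Xn, 'econ');
A = V(:, 1:K);                             % initial A from ordinary PCA
B = zeros(size(Xn, 2), K);
for iter = 1:maxIter
    Bold = B;
    for j = 1:K                            % Stage 1: sparse regression of Xn*a_j on Xn
        zj = Xn * A(:, j);
        B(:, j) = sparse_regress(Xn, zj, lambda, nz);
    end
    [U, ~, W] = svd((Xn' * Xn) * B, 'econ');   % Stage 2: reduced-rank Procrustes
    A = U * W';
    if max(abs(B(:) - Bold(:))) < epsilon, break; end
end
B = B ./ repmat(sqrt(sum(B.^2, 1)), size(B, 1), 1);   % step alpha: unit-length loadings
Zsparse = Xn * B;                          % sparse principal components (before step beta)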


TABLE I
RESULTS OF SPCA FOR THE GEARBOX DATA: CHOSEN VARIABLES AND (ADJUSTED) EXPLAINED VARIANCE AV – FOR 3 VALUES OF lambda. THE VARIANCE EXPLAINED BY THE FIRST 3 ORDINARY PCS EQUALS 86.05%.

              chosen   AV     chosen   AV     chosen   AV     Inversion   Σ
lambda=1      7, 10    0.13   1, 4     0.06   12, 14   0.02   3, 2, 1     20.66%
lambda=10     12, 14   0.12   7, 10    0.03   5, 8     0.01   2, 3, 1     15.54%
lambda=100    12, 14   0.12   7, 10    0.03   5, 8     0.01   2, 3, 1     15.54%

V. SPARSE PCA – RESULTS FOR THE GEARBOX DATA

For the calculations we have used the Matlab software implemented by Karl Sjöstrand and available at http://www2.imm.dtu.dk/˜ksjo/kas/software/index.html [19]. The software works extremely fast and, being in the form of M-files, can be quite easily adapted to the wishes of its user. Sparse principal components are calculated by the function spca. The most important parameters to be declared by the user are: K, the number of desired sparse principal components; lambda, a parameter for the function larsen performing the elastic net regression (see the description of Stage 1 of the algorithm presented in Section 4); stop, a parameter specifying the stopping criteria, containing also the option of how many non-zero components to retain in each loading vector (eq. (2)) of the retained PCs. The algorithm relies heavily on the function larsen realizing the algorithm for elastic net regression [18], [19], which is an extension of least angle regression [16], [17].

Assuming K=3 (compute 3 sparse principal components), stop=2 (each PC uses only 2 original variables) and lambda=1 (lambda is a parameter needed by the function larsen called in Stage 1 of the algorithm), we obtained for our data the (constructed) feature matrix Z_{n×3} containing in its columns three sparse principal components. They are depicted in Fig. 5. The displays look very similar to those obtained by ordinary PCA (shown in [10]). Similar displays (not shown here) were obtained when using lambda=10 and lambda=100. Details of the results, for the same values of K (=3) and stop (=2) but different values of lambda, are shown in Table 1.
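In code, the experiment just described boils down to a single call of spca followed by a projection of the data. The argument order shown below is only indicative – the exact signature may differ between versions of the toolbox [19] – and the output names SL (sparse loadings), SD, and the index vectors idxA, idxB are our own labels.

% Three sparse PCs with 2 non-zero loadings each, on the normalized data Xn.
K = 3; lambda = 1; stop = -2;               % negative stop: request 2 non-zero loadings per PC
[SL, SD] = spca(Xn, [], K, lambda, stop);   % SL: sparse loading matrix B, SD: adjusted variances
Z3 = Xn * SL;                               % the three sparse principal components
plot(Z3(idxA, 1), Z3(idxA, 2), 'r+'); hold on     % idxA: rows of set A (bad)
plot(Z3(idxB, 1), Z3(idxB, 2), 'go'); hold off    % idxB: rows of set B (good)
% a 3D plot of Z3(:,1), Z3(:,2), Z3(:,3) reproduces the bottom panel of Fig. 5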

When analyzing the results, one should consider: (i) which original variables were chosen by the applied spca algorithm to constitute the sparse principal components? (ii) how much of the total variance is explained by the constructed sparse PCs?

Before discussing the results, we should emphasize that the applied method yields sparse PCs that have different properties than the classic ones. This happens because the classical PCs are constructed with the criterion of extracting the maximum amount of the total variance of the data, while in the construction of sparse PCs this principle is de facto not used. To say it plainly, the columns of B (used for the construction of Z) are based on another principle: the sparse regression. As a consequence, the variances of the sparse PCs need not decrease with the number of the PC; also, the variances of the sparse PCs need not sum to the total variance, given as the trace of S = X^T X. To make them act similarly to the classical PCs, after convergence of the 2-stage algorithm the following actions are undertaken:

α) normalization of the loadings appearing in B, to have unit Euclidean length;

β) ordering of the columns of the Z matrix according to their decreasing variance, and next orthogonalizing them – to obtain adjusted increments (AV) of explained variance.

Fig. 5. Sparse principal components PC1, PC2 and PC3. Legend: a red '+' denotes a 'bad' data point (set A); a green hexagon denotes a 'good' data point (set B). Top: scatterplot of (PC1, PC2). Middle: zoom of the overlapping parts of the 'bad' and 'good' data clouds; the bad points, i.e. red pluses, are additionally circled. Bottom: 3D display of (PC1, PC2, PC3).

Now let us look at Table 1. We find that

• Considering point (i) formulated above: The variables


chosen for two pairs of the sparse PCs are the same for lambda=1 and lambda=10, but one pair of chosen variables differs.

• Considering point (ii) formulated above: An inversion happened in finding the PCs with the largest variances. In the experiment using 3 different values of lambda, the sparse PC explaining the maximum variance was found as the third one (for lambda=1) or the second one (for lambda=10 and lambda=100).

• Generally, the total percentage of explained variance, called AV (adjusted variance), is low; however, the graphically displayed information on the data is practically the same as that obtained from the first three classical PCs, which together reconstruct 86.05% of the total variance.

This leads to the conclusion that the sparse PCA is able to recover the proper information on the topological structure of the data. This is interesting and needs further elaboration. Probably this is due to the specificity of our data. It was noticed already in Section 2 that there are several variables which carry the same information. The classical PCA is able to notice them at once and absorb all their information into the first PCs. The sparse PCA algorithm deals with only part of the variables and constructs dependent PCs. However, despite this, the sparse PCA algorithm is able to recover the proper information on the topological structure of the analyzed gearbox data.

VI. DISCUSSION AND CONCLUDING REMARKS

The necessity and usefulness of producing sparse PCs has long been considered as part of so-called factor analysis, which seeks to group the analyzed variables into subgroups that appear jointly, indicating a hidden, non-measurable variable that manifests itself through the observed and measurable phenomena [12] (e.g. 'intelligence' is such a hidden variable). One very popular method consisted of performing rotations of the loading matrix obtained from PCA (e.g., the varimax rotation). Another method acted by applying simple thresholds to the elements of the loading matrix; e.g., elements smaller than 70% of the maximum over all elements were simply set to zero.

The sparse algorithm proposed in [18] is mathematically very sophisticated. It can be implemented to work effectively and fast when using Matlab [19]. The algorithm is relatively new and not yet in wide use. We have applied this algorithm with the aim of reducing the number of variables needed when making a diagnosis of gearbox functioning. It seems that we were successful and that the set of used variables can be reduced. However, we have noticed that the results of the analysis depend on a few initial parameters needed by the spca algorithm. For the moment the parameters are found by a kind of cross-validation. The results may also depend on the structure of the data. Therefore, before recommending the algorithm for an automated procedure, some further research on the behavior of the algorithm is needed.

SCoTLASS is another interesting method, developed at about the same time [20]. It seems to be easier in determining the input parameters for a sparse PCA; however, the numerical optimization of the applied criterion is more complex.


SCoTLASS has elucidated a difficult problem connected with the structure of yeast DNA [21], which could not be identified previously.

REFERENCES

[1] R.B. Randall, "State of the art in monitoring rotating machinery – Part 1", Sound and Vibration, March 2004, pp. 14–20.
[2] R.B. Randall, "State of the art in monitoring rotating machinery – Part 2", Sound and Vibration, May 2004, pp. 10–16.
[3] R.B. Randall and J. Antoni, "Rolling element bearing diagnostics – A tutorial", Mechanical Systems and Signal Processing, Vol. 25, 2009, pp. 485–512.
[4] K. Worden, W.J. Staszewski and J.J. Hensman, "Natural computing for mechanical system research: A tutorial overview", Mechanical Systems and Signal Processing, Vol. 25, 2009, pp. 4–111.
[5] D.A. Clifton, L.A. Clifton and P.R. Bannister, "Automated novelty detection in industrial systems", Studies in Computational Intelligence (SCI), Vol. 116 (www.springerlink.com), 2008, pp. 269–296.
[6] A. Bartkowiak and R. Zimroz, "Outlier analysis and one class classification approach for planetary gearbox diagnosis" (9th Int. Conf. on Damage Assessment of Structures DAMAS 2011, Oxford, UK), J. Phys. Conf. Ser., Vol. 305, 2011, 012031, pp. 1–10, IOP Publishing, doi:10.1088/1742-6596/305/1/012031.
[7] R. Zimroz and A. Bartkowiak, "Investigation on spectral structure of gearbox vibration signals by principal component analysis for condition monitoring purposes", J. Phys. Conf. Ser., Vol. 305, 2011, 012075, pp. 1–10, IOP Publishing, doi:10.1088/1742-6596/305/1/012075.
[8] W. Bartelmus, F. Chaari, R. Zimroz and M. Haddar, "Modelling of gearbox dynamics under time-varying nonstationary load for distributed fault detection and diagnosis", European J. of Mechanics A/Solids, Vol. 29, 2010, pp. 637–646.
[9] W. Bartelmus and R. Zimroz, "A new feature for monitoring the condition of gearboxes in non-stationary operating conditions", Mechanical Systems and Signal Processing, Vol. 23, 2009, pp. 1528–1534.
[10] R. Zimroz and A. Bartkowiak, "Two simple multivariate procedures for monitoring planetary gearboxes in nonstationary operating conditions", Manuscript, 2011, pp. 1–18, unpublished.
[11] J. Vesanto, J. Himberg, E. Alhoniemi and J. Parhankangas, SOM Toolbox for Matlab 5, SOM Toolbox team, Helsinki University of Technology, Finland, Libella Oy, Espoo, 2000, pp. 1–54.
[12] I.T. Jolliffe, Principal Component Analysis, 2nd Edition, Springer, New York, 2002.
[13] Q. He, R. Yan, F. Kong and R. Du, "Machine condition monitoring using Principal Component representations", Mechanical Systems and Signal Processing, Vol. 23, No. 2, 2009, pp. 446–466.
[14] I. Trendafilova, M. Cartmell and W. Ostachowicz, "Vibration based damage detection in an aircraft wing scaled model using principal component analysis and pattern recognition", Journal of Sound and Vibration, Vol. 313, No. 3–5, 2008, pp. 560–566.
[15] I. Trendafilova, "An automated procedure for detection and identification of ball bearing damage using multivariate statistics and pattern recognition", Mechanical Systems and Signal Processing, Vol. 24, 2010, pp. 1858–1869.
[16] B. Efron, T. Hastie, I. Johnstone and R. Tibshirani, "Least angle regression", Annals of Statistics, Vol. 32, No. 2, 2004, pp. 407–451.
[17] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning. Data Mining, Inference and Prediction, 2nd Edition, New York, Springer, 2010.
[18] H. Zou, T. Hastie and R. Tibshirani, "Sparse principal component analysis", J. of Computational and Graphical Statistics, Vol. 15, No. 2, 2006, pp. 265–286.
[19] K. Sjöstrand, M.B. Stegmann and R. Larsen, "Sparse principal component analysis in medical shape modeling", International Symposium on Medical Imaging 2006, San Diego, CA, USA, Proc. SPIE 6144, 61444X, 2006, pp. 1–12, doi:10.1117/12.651658.
[20] N.T. Trendafilov and I.T. Jolliffe, "Projected gradient approach to the numerical solution of the SCoTLASS", Computational Statistics and Data Analysis, Vol. 50, 2006, pp. 242–253.
[21] A. Bartkowiak and N.T. Trendafilov, "Feature extraction by the SCoTLASS: an illustrative example", In: Intelligent Information Processing and Web Mining, Springer Series 'Advances in Soft Computing', Vol. 31, 2005, pp. 3–11, doi:10.1007/3-540-32392-9_1.
