Proc. of the Second World Congress on Intelligent Manufacturing Processes & Systems, Budapest, HUN, 1997, Springer Verlag, pp. 82-87.

WAVELET NETWORKS FOR SENSOR SIGNAL INTERPRETATION IN FLANK WEAR ASSESSMENT

S. Pittner, S. V. Kamarthi and Q. Gao
Department of Mechanical, Industrial and Manufacturing Engineering
Northeastern University, Boston, MA 02115, USA

Abstract: It is known that the vibration sensor signals in a turning process are sensitive to the gradually increasing flank wear. Accordingly, this paper investigates a

flank wear assessment technique in turning through vibration signals. To overcome some of the limitations associated with former methods based on sensor data fusion and neural networks for continuous flank wear assessment, a so-called wavelet network is investigated. The basic idea in this new method is to simultaneously optimize the wavelet parameters and the parameters for the signal interpretation (equivalent to neural network weights) to eliminate the feature extraction phase without increasing the computational complexity of the neural network. A neural network architecture similar to a standard one-hidden-layer feedforward neural network is used to relate sensor signal measurements to flank wear classes. A novel training algorithm for such a network is developed. This research investigates for the first time the application of wavelet networks to manufacturing process monitoring; its results can also be useful for developing signal interpretation schemes in machine tool monitoring, critical component monitoring and product quality monitoring.

Keywords: Wavelet networks, Flank wear assessment

1. Introduction

On-line flank wear assessment is one of the main problems in automating turning processes. Recently, methods based on sensor data fusion and artificial neural networks for continuous flank wear assessment were developed (Kamarthi and Pittner, 1996b). These methods are attractive with regard to practical implementation, speed, and accurate estimation. They involve four important steps: sensor data acquisition, sensor data preprocessing, sensor data representation, and flank wear assessment. Among these four steps, sensor data representation and flank wear assessment are the most critical and time-consuming steps for obtaining accurate estimates. It turned out that the so-called wavelet transform provides a very good means for sensor data representation, while certain neural networks are appropriate means for reliable flank wear assessment.

Tool wear in a turning process occurs in various forms, predominantly due to three wear mechanisms: adhesion, abrasion and diffusion. Compared to all other forms of wear, flank wear has the most deleterious influence on the quality of a workpiece (Boothroyd and Knight, 1989). Flank wear is generally quantified by the average height VB of the flank wear land on the cutting tool. In this paper, this value VB is assessed from vibration sensor signals. Depending on the desired dimensional accuracy and surface finish of the workpiece, an upper limit VBmax is fixed, beyond which the tool is considered worn. This paper focuses on flank wear assessment with VBmax = 0.012 inches.

The concept of wavelet networks was introduced by Zhang and Benveniste (1992) as a computational scheme that combines the mathematical rigor of wavelet theory with the adaptive learning properties of conventional neural networks in a single unit. In this paper, the ideas presented by Szu et al. (1992) and Dickhaus and Heinrich (1996), which have been successfully applied to speech recognition (Szu et al., 1992; Kadambe and Srinivasan, 1992; Kadambe et al., 1993) and the classification of electrocardiogram signals (Dickhaus and Heinrich, 1996; Vilim and Wegerich, 1995), are extended to flank wear classification in turning. This new scheme relies on the former concept of sensor-based flank wear assessment (Kamarthi and Pittner, 1996a), but combines the two major steps, sensor data representation and flank wear assessment, into a single computational unit. The wavelet parameters are learned during neural network training since they are part of the neural network. Consequently, the sensor data representation step prior to flank wear assessment can be dropped, while the flank wear assessment step can still be performed with a simple and efficient neural network.

(Contact: S. Pittner, Tel.: 617-373-3699, FAX: 617-373-2921, e-mail: [email protected])

2. Neural networks for classification

It follows from a result in linear algebra that the simple perceptron-like neural network architecture shown in Fig. 1, with at least two hidden nodes and the novel activation function g(x) = x^2, can continuously interpolate every real-valued function on a set of at most T linearly independent discrete signals, where s(1), s(2), ..., s(T) stand for equidistant sampling values of the signal to be interpreted. This feedforward neural network has T input nodes and a single output node with no activation function, and it contains no thresholds. Due to its simplicity and the interpolation property stated above, this network is used for flank wear classification.

Depending upon the level of flank wear VB, the cutting tools were categorized into N = 2 classes, fresh or worn. To classify cutting tools, a separate neural network was designed for each class. Each of the N networks possesses the same structure as shown in Fig. 1 and works independently from the other neural network modules by transforming input patterns to output values. In that way every input pattern is tested for its membership in each one of the N classes represented by the N networks. The N outputs of this system of neural network modules are compared with N different threshold values in a final decision procedure to decide about the class of the input pattern, i.e. the class of the cutting tool as fresh or worn. The determination of the weights of such a network by the backpropagation algorithm (Rumelhart and McClelland, 1986) is often prohibited by the large number of weights in the lower part of the

Fig. 1. Neural network architecture for classification

network, which leads to very long training times. Therefore the number of parameters in the lower part was reduced and a new training method was developed. The simplicity and independence of the ensemble of specialized neural networks, which allows for parallel processing, lead to relatively fast performance both for training and classification during the operational stage.
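As an illustration, the ensemble of class modules described in this section can be sketched as follows. This is a minimal sketch, not the paper's implementation: the parameter values are random, and the names `module_output`, `classify`, `W` and `v` are illustrative.

```python
import numpy as np

def module_output(s, W, v):
    """Output z of one class module (Fig. 1) for a sampled signal s of length T.

    W : (H, T) weights of the lower part (H >= 2 hidden nodes)
    v : (H,)   weights of the upper part (single linear output node, no bias)
    """
    hidden = (W @ s) ** 2        # novel activation g(x) = x^2
    return v @ hidden            # output node has no activation function

def classify(s, modules, thresholds):
    """Final decision procedure: each of the N independent modules votes for
    its own class; the signal is assigned to the class whose module output
    exceeds its threshold by the largest margin."""
    margins = [module_output(s, W, v) - th
               for (W, v), th in zip(modules, thresholds)]
    return int(np.argmax(margins))

# Tiny usage example with random parameters (T = 8 samples, H = 2 hidden nodes,
# N = 2 classes: fresh / worn)
rng = np.random.default_rng(0)
s = rng.standard_normal(8)
modules = [(rng.standard_normal((2, 8)), rng.standard_normal(2)) for _ in range(2)]
label = classify(s, modules, [0.0, 0.0])
```

Because the modules are independent, they can be evaluated in parallel during the operational stage, which is the property the text exploits.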

3. Internal neural network structure used for training

An earlier technique that has been successfully used for flank wear estimation (Kamarthi and Pittner, 1996a) and other process monitoring applications relies on extracting features from the squares of the wavelet transform coefficients (Daubechies, 1992) of certain process signals during the data representation step. These features serve as the input to a neural network or a certain statistical procedure which performs the interpretation of the input process signals in the assessment step. In contrast, in this research a feedforward neural network structure with one hidden layer, as shown in Fig. 2, has been designed to perform both the feature extraction and the signal interpretation steps for flank wear classification. The lower part of this network provides (approximations of) the wavelet coefficients as input to the hidden nodes. After squaring these wavelet coefficients, the hidden nodes pass them on to the upper part of the network, where they are transformed to a final class indicator z. This means that the wavelet network for classification consists of two parts: a lower part for feature extraction and an upper part for classification. Such a network can be constructed and trained for each one of the N flank wear classes, and it can be reduced to the feedforward neural network described in section 2 after the training by simple function evaluations.

3.1. Neural Network Parameters

The general form of a finite orthogonal wavelet


Fig. 2. Neural network architecture for training

expansion of a signal s \in L^2(\mathbb{R}) is given by

s(x) \approx \sum_{k=1}^{K} \alpha_k \, \psi\left(\frac{x - b_k}{a_k}\right) + \sum_{l=1}^{L} \tilde{\alpha}_l \, \phi\left(\frac{x - \tilde{b}_l}{\tilde{a}_l}\right)    (1)

with a_k, b_k, \tilde{a}_l, \tilde{b}_l \in \mathbb{R} for a special basic wavelet function \psi \in L^2(\mathbb{R}) and a corresponding scaling function \phi \in L^2(\mathbb{R}) (Daubechies, 1992). The wavelet coefficients, which can be considered as the signal features, are given by the inner products

\alpha_k = \left\langle s(x), \psi\left(\frac{x - b_k}{a_k}\right) \right\rangle = \int s(x) \, \psi\left(\frac{x - b_k}{a_k}\right) dx \approx \sum_{t=1}^{T} s(t) \, \psi\left(\frac{t - b_k}{a_k}\right)    (2)

and

\tilde{\alpha}_l = \left\langle s(x), \phi\left(\frac{x - \tilde{b}_l}{\tilde{a}_l}\right) \right\rangle = \int s(x) \, \phi\left(\frac{x - \tilde{b}_l}{\tilde{a}_l}\right) dx \approx \sum_{t=1}^{T} s(t) \, \phi\left(\frac{t - \tilde{b}_l}{\tilde{a}_l}\right),    (3)

where s(t) denotes the measurement of the input signal s at time t. If these approximations for the wavelet coefficients \alpha_k (as proposed by Szu et al. (1992)) and \tilde{\alpha}_l are computed in the hidden layer, then the wavelet coefficients for a very flexible set of basis functions are embedded in the resulting wavelet network. Moreover, the weights in the lower part of the network structure simply take the forms \psi((t - b_k)/a_k) and \phi((t - \tilde{b}_l)/\tilde{a}_l) for t = 1, 2, \ldots, T, i.e. they are parameterized by the parameters a_k, b_k, \tilde{a}_l and \tilde{b}_l. The value of the output of the network corresponding to an input sensor signal s is given by

z = \sum_{k=1}^{K} v_k \, g(\alpha_k) + \sum_{l=1}^{L} \tilde{v}_l \, g(\tilde{\alpha}_l) = \sum_{k=1}^{K} v_k \alpha_k^2 + \sum_{l=1}^{L} \tilde{v}_l \tilde{\alpha}_l^2,    (4)

where v_k, \tilde{v}_l are the weights of the interconnections in the upper part of the neural network. The advantages of this wavelet network lie in the following facts:

- The basic wavelet function \psi and the scaling function \phi can be chosen to have a fast decay in the time domain, so that the parameters of the neural network are easy to interpret. On one hand, in the lower part of this network, the parameters a_k and \tilde{a}_l are proportional to the lengths, and the parameters b_k and \tilde{b}_l denote the positions, of the functions \psi((x - b_k)/a_k) and \phi((x - \tilde{b}_l)/\tilde{a}_l). On the other hand, the parameters v_k and \tilde{v}_l are the coefficients of a linear discriminant function of the wavelet features.

- The total number of parameters for each network is 3(K + L), which is in general much smaller than the number (T + 1)(K + L) of weights in the original network structure (shown in Fig. 1), so that a drastic reduction in training time can be expected.
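For concreteness, the forward pass defined by relations (2) and (4) can be sketched as below. This is a hedged sketch, assuming the ψ-only form that Section 4.1 adopts (no scaling-function terms) and the Morlet wavelet of relation (5); the signal and the parameter values are arbitrary illustrations.

```python
import numpy as np

def morlet(t):
    # real-valued Morlet wavelet of relation (5)
    return np.cos(1.75 * t) * np.exp(-t ** 2 / 2)

def forward(s, a, b, v):
    """Wavelet network output z for a sampled signal s(1), ..., s(T).

    a, b, v : length-K vectors holding the three parameters of each hidden
    node (dilation a_k, translation b_k, upper weight v_k), i.e. three
    parameters per node, matching the 3(K + L) count given in the text.
    """
    t = np.arange(1, len(s) + 1)
    # lower part, relation (2): alpha_k ~ sum_t s(t) * psi((t - b_k)/a_k)
    alpha = np.array([s @ morlet((t - bk) / ak) for ak, bk in zip(a, b)])
    # hidden layer squares the coefficients; upper part combines them, relation (4)
    return v @ alpha ** 2

rng = np.random.default_rng(1)
s = rng.standard_normal(64)                # T = 64 samples
a = np.full(4, 0.71)                       # K = 4 hidden nodes
b = np.array([8.0, 24.0, 40.0, 56.0])      # translations spread over the time domain
z = forward(s, a, b, v=np.ones(4))
```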

4. Neural network training

We have used the same procedure to independently train each one of the N neural network modules introduced in the preceding section. Therefore, one can discuss this training procedure for a neural network module corresponding to a certain class C_n (1 \le n \le N).

4.1. Choice of the Wavelet Basis Functions

Our basic wavelet function is the same as the one proposed by Szu et al. (1992), namely the real-valued Morlet wavelet

\psi(t) = \cos(1.75t) \, e^{-t^2/2}    (5)

shown in Fig. 3.

It can be shown with the theory of frames (Daubechies, 1992) that, in analogy to relation (1), any signal s \in L^2(\mathbb{R}) can be approximated by

s(x) \approx \sum_{k=1}^{K} \alpha_k \, \psi\left(\frac{x - b_k}{a_k}\right).    (6)

Consequently, only coefficients of the form \alpha_k = \langle s(x), \psi((x - b_k)/a_k) \rangle and none of the form \tilde{\alpha}_l are included in the wavelet network structure.

4.2. Choice of Error Function

The output z of the nth network is evaluated during the final decision procedure by comparing it with a threshold \theta_n. The input signal s is considered to belong to the flank wear class C_n if z is greater than \theta_n. If this is not the case, then the

flank wear state is assumed not to belong to the class C_n. For this criterion the half-sum-of-squares error function usually applied for training feedforward neural networks is too restrictive (Telfer and Szu, 1994). Instead, the criterion function

J(\bar{z}_1, \bar{z}_2, \sigma_1^2, \sigma_2^2) := \frac{(\bar{z}_1 - \bar{z}_2)^2}{\sigma_1^2 + \sigma_2^2}    (7)

was maximized, where \bar{z}_i and \sigma_i^2 denote the mean and variance of the network output for signals belonging to C_n if i = 1 and those of all other signals if i = 2; this criterion fits the situation better. With this criterion the network training can be interpreted as determining the network parameters in such a way that the network output z has the maximum between-class scatter and the minimum within-class scatter for a set of training signals s. It has the great advantage that the parameters in the lower and the upper part of the neural network shown in Fig. 2 can be determined separately, as will be described in the following subsections.

4.3. Determination of Lower Network Weights

We note that according to relation (4) the output z of the nth network is the orthogonal projection of the feature vector y := (\alpha_1^2, \alpha_2^2, \ldots, \alpha_K^2, \tilde{\alpha}_1^2, \tilde{\alpha}_2^2, \ldots, \tilde{\alpha}_L^2) onto any line in \mathbb{R}^{K+L} in the direction of the weight vector v := (v_1, v_2, \ldots, v_K, \tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_L). If one considers the one-parametric set of hyperplanes in \mathbb{R}^{K+L} with normal vector v, it follows immediately that the regarded wavelet network performs


Fig. 3. The basic wavelet

a perfect discrimination on a set of training signals if and only if the corresponding feature vectors y computed in the hidden layer are linearly separable in \mathbb{R}^{K+L}. A measure for the degree of separability, similar to the one presented in relation (7) for dimensions greater than one, is given by

J_h(\{y\}) := \mathrm{trace}(S_{w_n}) + (\mathrm{trace}(S_b))^{-1},    (8)

where S_{w_n} denotes the within-class scatter matrix only for the class C_n corresponding to the nth neural network, and S_b denotes the between-class scatter matrix of the feature vectors y (Devijver and Kittler, 1982). This criterion function J_h, which was chosen according to our experimental experience, has to be minimized for the given set of training signals s. For the computation of the parameters in the lower part of the neural network, an iterative gradient descent method has been developed. It starts with initial values for the weights a_k, b_k, \tilde{a}_l and \tilde{b}_l, which are updated successively through 30 iterations with the formulas

a_k^{new} = a_k^{old} - \eta \frac{\partial J_h}{\partial a_k}, \quad b_k^{new} = b_k^{old} - \eta \frac{\partial J_h}{\partial b_k}, \quad \tilde{a}_l^{new} = \tilde{a}_l^{old} - \eta \frac{\partial J_h}{\partial \tilde{a}_l}, \quad \tilde{b}_l^{new} = \tilde{b}_l^{old} - \eta \frac{\partial J_h}{\partial \tilde{b}_l}    (9)

to minimize the criterion function J_h in (8) for the set of training signals s. Here, the so-called learning rate \eta was initialized with 0.1. Each time the value of J_h was greater than the previous one during our training algorithm, the last step was canceled and the learning rate \eta was reduced by a factor of 10.

Our decision was to use 16 hidden nodes which have functions of the form \psi_{a_k b_k}(t) := \psi((t - b_k)/a_k). For the corresponding pairs (a_k, b_k) of wavelet parameters, the form (0.71, p_k) was used for one half of the nodes and (0.478, q_k + 0.327) for the other half as initial values. The parameters p_k and q_k were selected as integers that cover the whole time domain of the regarded discrete signals s. The choice of these pairs of initial values has the property that

\psi_{a_k b_k}(r) \approx \begin{cases} \max_{t \in \mathbb{R}} \psi(t) & \text{for } r = p_k \\ \min_{t \in \mathbb{R}} \psi(t) & \text{for } r = p_k \pm 1 \\ 0 & \text{for } r \in \mathbb{Z} \setminus [p_k - 1, p_k + 1] \end{cases}    (10)

for the first choice and

\psi_{a_k b_k}(r) \approx \begin{cases} -\min_{t \in \mathbb{R}} \psi(t) & \text{for } r = q_k \\ \min_{t \in \mathbb{R}} \psi(t) & \text{for } r = q_k + 1 \\ 0 & \text{for } r \in \mathbb{Z} \setminus \{q_k, q_k + 1\} \end{cases}    (11)

for the second choice. It is obvious from relation (2) that in both cases the values \alpha_k^2 are larger for discrete signals s with locally alternating sign than for those which are constant around t = b_k.
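The effect of these initial choices can be verified numerically. The sketch below evaluates \psi_{a_k b_k} at integer positions r for both pairs, assuming the Morlet wavelet of relation (5); the integers p and q are arbitrary illustrative centers.

```python
import numpy as np

def morlet(t):
    # real-valued Morlet wavelet of relation (5)
    return np.cos(1.75 * t) * np.exp(-t ** 2 / 2)

def node(r, a, b):
    # hidden-node weight psi_{a b}(r) = psi((r - b)/a)
    return morlet((r - b) / a)

p, q = 10, 10   # arbitrary integer centers

# first choice (a, b) = (0.71, p): maximum of psi at r = p,
# negative (near-minimal) values at r = p +/- 1
assert node(p, 0.71, p) == 1.0
assert node(p + 1, 0.71, p) < 0 and node(p - 1, 0.71, p) < 0

# second choice (a, b) = (0.478, q + 0.327): opposite signs at r = q and r = q + 1
assert node(q, 0.478, q + 0.327) > 0 > node(q + 1, 0.478, q + 0.327)

# both choices decay rapidly at integers away from the center
assert abs(node(p + 3, 0.71, p)) < 0.01
```

Signals that alternate in sign locally therefore produce large squared coefficients \alpha_k^2, as stated above.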

Consequently, with our initial choices, the wavelet features \alpha_k^2 computed in the hidden layer of the wavelet network represent simple frequency characteristics of the processed signals, whose quality can be improved during training with regard to an optimal classification performance.

4.4. Determination of Upper Network Weights

After the lower part of the wavelet network has been determined, it is used to compile the set of feature vectors y = (\alpha_1^2, \alpha_2^2, \ldots, \alpha_K^2, \tilde{\alpha}_1^2, \tilde{\alpha}_2^2, \ldots, \tilde{\alpha}_L^2) corresponding to the individual training signals. For the calculation of the weights in the upper part of the wavelet network, it was decided to apply the formula used for determining the generalized Fisher linear discriminant function (Duda and Hart, 1973). This procedure does not require any special distribution of the feature vectors, and it determines a weight vector v := (v_1, v_2, \ldots, v_K, \tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_L) such that the wavelet network performs optimally with respect to the criterion function J in (7). The vector v is simply given by the formula

v = S_w^{-1}(\bar{z}_1 - \bar{z}_2),    (12)

where \bar{z}_1 and \bar{z}_2 are the means of the feature vectors y belonging to the class C_n and of the feature vectors belonging to all other classes, respectively, and the matrix S_w denotes the within-class scatter matrix of the feature vectors y.
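The computation of relation (12) can be sketched as follows; the feature data here is synthetic, and the names `fisher_weights`, `Y1`, `Y2` are illustrative, not from the paper.

```python
import numpy as np

def fisher_weights(Y1, Y2):
    """Generalized Fisher discriminant weights, relation (12).

    Y1 : (n1, d) feature vectors y of class Cn
    Y2 : (n2, d) feature vectors y of all other classes
    """
    m1, m2 = Y1.mean(axis=0), Y2.mean(axis=0)
    # within-class scatter matrix: scatter of each group around its own mean
    Sw = (Y1 - m1).T @ (Y1 - m1) + (Y2 - m2).T @ (Y2 - m2)
    # v = Sw^{-1} (zbar_1 - zbar_2), solved without forming the inverse
    return np.linalg.solve(Sw, m1 - m2)

rng = np.random.default_rng(2)
Y1 = rng.standard_normal((30, 4)) + 1.0   # synthetic class-Cn features
Y2 = rng.standard_normal((30, 4)) - 1.0   # synthetic features of the other classes
v = fisher_weights(Y1, Y2)
```

No distributional assumption on the features is needed, which is why this closed-form step is preferred over iterative training of the upper part.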

5. Sensor data acquisition and preprocessing

Turning experiments were conducted on a 20 HP LeBlond lathe to obtain the necessary data. Vibration signals were sampled at a frequency of 26 kHz. The record length of each sample was 512 points. Once the cutting was started with a fresh insert edge, VB values and vibration signals were recorded once every minute until the cutting edge was considered totally worn out, i.e. VB \ge 0.018 inches. The VB values were measured off-line with a toolmaker's microscope. The experiments were divided into two different sets, Set 1 and Set 2, which consisted of data collected from 15 cutting edges and 9 cutting edges, respectively. The 168 signals from Set 1 were used for training the neural network modules and the 95 signals from Set 2 for testing them. Since time series data obtained from the sensors are usually noise-laden due to nonhomogeneities in the workpiece material and electric interference during the signal transmission through cables and instrumentation, they were properly filtered by a Bessel bandpass filter (Johnson et al., 1980) before they were input to each neural network.

The flank wear values were divided into two classes: fresh cutting tools with VB between 0 and VBmax = 0.012 inches belong to class C_1, and worn cutting tools with VB beyond VBmax belong to class C_2. Both Set 1 and Set 2 contain approximately the same number of measurements from tools belonging to class C_1 and from tools belonging to class C_2.

6. Experimental results

The proposed method was implemented in MATLAB 4.2c on a Sun SPARC-20 workstation LX. Two wavelet networks (one for each class) were trained for each of the two vibration signals in the cutting and feed directions. The CPU-times for training the two wavelet networks together for each sensor signal direction are provided in Table 1. During the training, in all four neural networks, 6 automatic adaptations of the learning rate \eta had to be performed. While all parameters a_k were located in the interval [-0.1, 3.6] and seemed to be normally distributed around their initial choices after training, almost all of the parameters b_k changed only slightly. It should be noted that the sign of the parameters a_k is irrelevant for the neural networks because the function \psi is symmetric with respect to zero.

In the final decision procedure a threshold \theta_n was subtracted from the network output o_n to decide about the class membership of the corresponding input signal. A suitable value for this threshold turned out to be

\theta_n := \bar{z}_j - \sigma_j,    (13)

where j is such that \bar{z}_j = \max\{\bar{z}_1, \bar{z}_2\} and \bar{z}_i, \sigma_i are as defined earlier in subsection 4.2. A signal s is considered to belong to the class corresponding to the mean \bar{z}_j if o_n - \theta_n is positive and to the other class if o_n - \theta_n is nonpositive. The results o_n - \theta_n for the networks corresponding to class C_1 were summed up for both vibration directions. This summation was also performed for the networks corresponding to class C_2. Then each signal of Set 2 was assigned to the class with the higher sum. The classification results for Set 2 are summarized in Table 2.

Table 1. CPU-times in minutes for training pairs of wavelet networks for 30 iterations

  Vibration signal direction:   cutting   feed
  CPU-time (min):               40.1      40.9

Table 2. Percentage of correct flank wear classification results for Set 2 provided by the system of wavelet network modules

  All vibration signals:   69%
  Class C1 signals:        67%
  Class C2 signals:        72%

The outputs of each pair of neural networks were computed sequentially for a single testing signal in MATLAB within 7 milliseconds. Since this performance is among the fastest of all existing methods for flank wear assessment with the same level of accuracy (Kamarthi and Pittner, 1996a), the presented method can be a valuable alternative, especially for tool wear assessment in real time.
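The final decision procedure described in subsection 4.2 and above can be sketched as follows. All numeric values are illustrative, and the sketch simplifies by using one threshold per class across both vibration directions.

```python
import numpy as np

def threshold(zbar1, zbar2, sigma1, sigma2):
    """Relation (13): theta_n = zbar_j - sigma_j with zbar_j = max(zbar_1, zbar_2)."""
    j = 0 if zbar1 >= zbar2 else 1
    return (zbar1, zbar2)[j] - (sigma1, sigma2)[j]

def decide(outputs, thetas):
    """Sum the margins o_n - theta_n over both vibration directions and
    assign the signal to the class with the higher sum.

    outputs[d][n] : output o_n of the class-n module for direction d
                    (d = 0: cutting, d = 1: feed)
    """
    margins = [sum(outputs[d][n] - thetas[n] for d in range(len(outputs)))
               for n in range(len(thetas))]
    return int(np.argmax(margins))   # 0 = fresh (C1), 1 = worn (C2)

# hypothetical module outputs and training statistics
outputs = [[0.9, 0.2],    # cutting direction: (C1 module, C2 module)
           [0.8, 0.4]]    # feed direction
thetas = [threshold(1.0, 0.3, 0.2, 0.1),   # theta_1 = 1.0 - 0.2 = 0.8
          threshold(0.9, 0.2, 0.3, 0.1)]   # theta_2 = 0.9 - 0.3 = 0.6
label = decide(outputs, thetas)            # C1 margin exceeds C2 margin here
```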

7. Discussion

The feasibility of flank wear classification using the proposed wavelet network was examined. This wavelet network led to new insights into sensor-based methods for flank wear assessment, and it is very amenable to parallel processing applications. It is more powerful than common wavelet decompositions because, unlike in the latter, the parameters of the inherent wavelet functions are allowed to vary without any restrictions. The simple neural network structure, as well as its fast performance, makes the proposed methodology also suitable for machine tool monitoring in milling and drilling, surface roughness estimation of workpieces, critical component monitoring and product quality monitoring.

References

Boothroyd, G. and Knight, W. A. (1989). Fundamentals of Machining and Machine Tools, Marcel Dekker, New York.

Daubechies, I. (1992). Ten Lectures on Wavelets, SIAM Press, Philadelphia.

Devijver, P. A. and Kittler, J. (1982). Pattern Recognition - A Statistical Approach, Prentice-Hall, Englewood Cliffs.

Dickhaus, H. and Heinrich, H. (1996). Classifying Biosignals with Wavelet Networks - A Method for Noninvasive Diagnosis, IEEE Engineering in Medicine and Biology Magazine, Vol. 15, No. 5, pp. 103-111.

Duda, R. O. and Hart, P. E. (1973). Pattern Classification and Scene Analysis, John Wiley, New York.

Johnson, D. E., Johnson, J. R. and Moore, H. P. (1980). A Handbook of Active Filters, Prentice-Hall, Englewood Cliffs.

Kadambe, S. L. and Srinivasan, P. (1992). Application of Adaptive Wavelets for Speech, Optical Engineering, Vol. 33, No. 7, pp. 2204-2211.

Kadambe, S. L., Srinivasan, P., Telfer, B. and Szu, H. H. (1993). Representation and Classification of Unvoiced Sounds Using Adaptive Wavelets, Proc. of the International Society for Optical Engineering, Vol. 1961, pp. 324-335.

Kamarthi, S. V. and Pittner, S. (1996a). Fast Fourier and Wavelet Transform for Flank Wear Estimation - A Comparison, unpublished manuscript, Department of Mechanical, Industrial, and Manufacturing Engineering, Northeastern University, Boston.

Kamarthi, S. V. and Pittner, S. (1996b). Sensor Data Representation Schemes for Flank Wear Estimation in Turning Processes, Technical Report, Department of Mechanical, Industrial, and Manufacturing Engineering, Northeastern University, Boston.

Rumelhart, D. E. and McClelland, J. L. (1986). Parallel Distributed Processing - Explorations in the Microstructure of Cognition, Vol. 1, MIT Press, Cambridge.

Szu, H. H., Telfer, B. and Kadambe, S. L. (1992). Neural Network Adaptive Wavelets for Signal Representation and Classification, Optical Engineering, Vol. 31, No. 9, pp. 1907-1916.

Telfer, B. A. and Szu, H. H. (1994). Energy Functions for Minimizing Misclassification Error with Minimum-Complexity Networks, Neural Networks, Vol. 7, No. 5, pp. 809-817.

Vilim, R. B. and Wegerich, S. W. (1995). A Neural Network Classifier with Analytic Translation and Scaling Capabilities for Optimal Signal Viewing, in: Intelligent Engineering Systems Through Artificial Neural Networks, Volume 5 - Fuzzy Logic and Evolutionary Programming (C. H. Dagli, M. Akay, C. L. P. Chen, B. R. Fernandez and J. Ghosh, Eds.), ASME Press, New York, pp. 719-726.

Zhang, Q. and Benveniste, A. (1992). Wavelet Networks, IEEE Transactions on Neural Networks, Vol. 3, No. 6, pp. 889-898.
