
Efficient training of neural network models in classification of electromyographic data

C. S. Pattichis¹  C. Charalambous¹  L. T. Middleton²

¹University of Cyprus, Kallipoleos 75, PO Box 537, Nicosia, Cyprus
²The Cyprus Institute of Neurology and Genetics, PO Box 3462, Nicosia, Cyprus

Keywords—Conjugate gradient back-propagation algorithm, Electromyography, Motor unit action potential, Neural networks

Med. & Biol. Eng. & Comput., 1995, 33, 499-503

1 Introduction

The application of artificial neural networks (ANN) in the diagnosis of neuromuscular disorders based on electromyography (EMG) has recently been proposed (Schizas et al., 1990; Pattichis, 1992). Artificial neural network models have been trained to diagnose normal (NOR), motor neuron disease (MND) and myopathy (MYO) subjects successfully (Pattichis et al., 1990; Pattichis, 1992). The momentum back-propagation (MBP) training algorithm was used as proposed by Rumelhart et al. (Rumelhart et al., 1986). The method has a number of limitations: heavy computational and memory requirements, as well as the absence of design methodologies for determining the values of the learning coefficient λ, the momentum coefficient μ, the number of hidden layers, and the architecture size. In addition, the algorithm can exhibit oscillatory behaviour or can even diverge, depending on the values of λ and μ. It is clear that the arbitrary method of choosing the values of λ and μ should be eliminated in order to derive an improved learning algorithm. A method is required that automatically adjusts the values of λ and μ so that the resulting algorithm is efficient and reliable. The conjugate gradient methods (Fletcher and Reeves, 1964; Polak, 1971) are a class of methods for unconstrained optimisation that fulfil this requirement and rest on a sound theoretical basis. With a fairly accurate line search algorithm, these methods are guaranteed to find a local minimum, and with a fast rate of convergence. It has recently been demonstrated (Battiti, 1990; Johansson et al., 1991; Charalambous, 1992; Møller, 1993) that the performance of the conjugate gradient back-propagation neural network learning algorithm (CGBP) is superior to that of the standard MBP. The purpose of this study is to apply the CGBP learning algorithm in building neural network models excited with EMG data, and to compare the results to those obtained with the MBP learning algorithm.

2 Materials and methods

Neuromuscular disorders, although relatively uncommon, constitute a significant cause of disability in both the young and the old. Of the many disorders that have been clinically identified, two basic pathological processes have been found: muscle fibres are either lost through a degenerative process, or there is a loss of the motor neurons and their axons. When muscle fibres are lost, it is termed a myopathy; when neurons or their axons are lost, it is a neuropathy or neurogenic process. A large number of clinically distinct muscle disorders are now recognised. For this study, NOR subjects and patients suffering from MND and MYO were selected for investigation.

First received 29 November 1993 and in final form 1 November 1994
© IFMBE: 1995
Medical & Biological Engineering & Computing

May 1995

2.1 Motor neuron disease

This is a rapidly progressive neurogenic disorder due to a loss of cells associated with the voluntary motor system, including the anterior horn cells. The process is typically seen in the older age groups and is, in general, not a hereditary disorder, although a familial form is recognised. The diagnosis can be made on clinical grounds.

2.2 Myopathies

This is a group of diseases that primarily affect skeletal muscle fibres; they are divided into two groups, according to whether they are inherited or acquired. Muscular dystrophies are inherited diseases causing severe degenerative changes in the muscle fibres. In this group of diseases, there are four main types: Duchenne, Becker, fascioscapulohumeral and limb girdle. They show a progressive clinical course from birth or after a variable period of apparently normal infancy. The most frequent acquired myopathy is polymyositis, which is characterised by an acute or sub-acute onset, with muscle weakness progressing slowly over a matter of weeks.

The above disorders cause structural reorganisation of the motor unit, the smallest functional unit of the muscle. Motor unit morphology can be studied by recording its electrical activity, a procedure known as EMG. In this work, EMG

Table 1 MUAP feature vector parameters from NOR, MND and MYO subjects

                        NOR8           MND6           MYO7
                        MN     SD      MN     SD      MN     SD
duration, ms            9.45   1.74    13.65  4.00    6.57   1.22
spike duration, ms      5.63   1.85    5.82   2.38    2.90   1.07
amplitude, mV           0.33   0.22    0.72   0.50    0.43   0.40
area, mVms              0.37   0.15    1.22   0.86    0.23   0.16
spike area, mVms        0.26   0.09    0.70   0.51    0.15   0.15
phases                  2.30   0.47    3.60   1.19    3.10   1.33
turns                   2.75   1.07    4.05   1.43    3.10   1.17

was recorded in NOR, MND and MYO subjects from the biceps brachii muscle. Data were collected at slight voluntary contraction for 5 s using a needle electrode. Motor unit action potentials (MUAPs) were identified and selected from the EMG recording, based on predetermined criteria (Pattichis, 1992). A parametric pattern recognition algorithm, based on MUAP features, was applied to recognise similar MUAPs and form MUAP sets (Pattichis, 1992). MUAP set features were then applied to neural network models to obtain a diagnosis. The MUAP features measured automatically were (Fig. 1):

(i) duration (dur): the beginning and end of the MUAP were identified by sliding a measuring window of 3 ms in length and 10 μV in width.
(ii) spike duration (spdur): measured from the first to the last positive peak.
(iii) amplitude (amp): maximum peak-to-peak measure of the MUAP.
(iv) area: sum of the rectified MUAP integrated over the duration.
(v) spike area (sparea): sum of the rectified MUAP integrated over the spike duration.
(vi) phases (ph): number of baseline crossings that exceed 25 μV, plus one.
(vii) turns (t): number of positive and negative peaks separated from the preceding and following peak by 25 μV.

Twenty MUAP sets were recorded from the biceps muscle of each subject; this number is considered acceptable for sampling the whole muscle (Buchthal et al., 1954). A 14-element feature vector for each subject was formed by calculating the mean (MN) and the standard deviation (SD) of the above features over the 20 MUAP sets. Typical MUAP findings for NOR, MND and MYO subjects are shown in Table 1. A total of 880 MUAPs were recorded and analysed from 14 NOR subjects, 16 patients suffering from MND and 14 patients suffering from various forms of MYO. Eight subjects from each
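As a rough illustration, the phase and turn counts defined in (vi) and (vii) can be computed from a sampled MUAP waveform as follows. This is a simplified sketch: the exact baseline-crossing and peak-separation rules of the original measurement system are not fully specified here, so the threshold handling is an assumption.

```python
def count_phases(x, thresh=25.0):
    # Phases: baseline (zero) crossings whose preceding excursion
    # exceeds `thresh` (microvolts), plus one.
    crossings, seg_max, prev_sign = 0, 0.0, 0
    for v in x:
        s = 1 if v > 0 else (-1 if v < 0 else prev_sign)
        if prev_sign != 0 and s != prev_sign:
            if seg_max >= thresh:
                crossings += 1
            seg_max = 0.0
        seg_max = max(seg_max, abs(v))
        prev_sign = s
    return crossings + 1

def count_turns(x, thresh=25.0):
    # Turns: local extrema separated from both the preceding and the
    # following extremum by at least `thresh` microvolts.
    ext = [x[0]]
    for i in range(1, len(x) - 1):
        if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0:
            ext.append(x[i])
    ext.append(x[-1])
    return sum(
        1
        for i in range(1, len(ext) - 1)
        if abs(ext[i] - ext[i - 1]) >= thresh
        and abs(ext[i + 1] - ext[i]) >= thresh
    )
```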

group were used for training the ANN models, whereas the remaining subjects were used for evaluating their diagnostic performance. A two-dimensional scatter plot of the parameters mean duration and mean amplitude for each subject is shown in Fig. 2. It illustrates the complexity of the data under investigation, showing that no clear boundaries enclosing each group can be drawn. The mean duration of NOR subjects varies from 8 to 12 ms, the mean amplitude from 0.280 to 0.520 mV, and the mean number of phases from 2 to 4. Myopathy patients usually have MUAPs with short duration, low amplitude and a small number of phases, whereas MND patients have MUAPs with long duration, high amplitude and a large number of phases.

The following metrics have been used for measuring the performance of the neural network diagnostic system (Eberhart and Dobbins, 1990). For a given decision, suggested by a certain output neuron, four possible alternatives exist: true positive (TP), false positive (FP), true negative (TN) and false negative (FN). In our study, a TP decision occurs when the positive diagnosis of the system coincides with a positive diagnosis according to the physician. An FP decision occurs when the system makes a positive diagnosis that does not agree with the physician's. A TN decision occurs when both the system and the physician suggest the absence of a positive diagnosis. An FN decision occurs when the system makes a negative diagnosis that does not agree with the physician's. From these four measures, the following percentages have been calculated for the N cases in the evaluation set:

%CCs = 100 × (TP + TN)/N
%FPs = 100 × FP/(TN + FP)
%FNs = 100 × FN/(TP + FN)
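The three percentages follow directly from the four counts; a minimal sketch (the function name is illustrative, not from the paper):

```python
def diagnostic_yield(tp, fp, tn, fn):
    """Diagnostic performance percentages from TP/FP/TN/FN counts."""
    n = tp + fp + tn + fn          # number of cases in the evaluation set
    return {
        "%CCs": 100.0 * (tp + tn) / n,
        "%FPs": 100.0 * fp / (tn + fp),
        "%FNs": 100.0 * fn / (tp + fn),
    }
```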

3 Conjugate gradient back-propagation

The following gives an overview of the theory of the CGBP learning algorithm (Charalambous, 1992).

Fig. 1 MUAP parameters
Fig. 2 Two-dimensional scatter plot of mean duration and mean amplitude for each subject

Fig. 3 shows an N-input neuron. At time k, the components of the kth input pattern vector

x_k = [x_0k x_1k x_2k ... x_Nk]^T,  k = 1, 2, ..., p    (1)

are weighted by the corresponding components of the weight vector

w = [w_0 w_1 ... w_N]^T    (2)

to produce the kth analogue output y_k

y_k(w) = w_0 + Σ_{i=1}^{N} w_i x_ik = w^T x_k    (3)

The superscript T denotes transposition, p is the number of input patterns, and w_0 is the bias weight connected to a constant input x_0 = 1; w_0 controls the threshold level. The analogue output is then passed through the sigmoid function

f(x) = (1 - e^{-x}) / (1 + e^{-x}) = tanh(x/2)    (4)

to obtain a scaled output between -1 and +1. For each input pattern vector x_k, the network output y_k is compared to its target value or class (desired response) d_k. This leads to the kth error function

e_k(W) = y_k(W) - d_k    (5)

between the kth actual output and the kth target value. Ideally, we would like to choose the weight vector W that gives zero errors for all input patterns. In an attempt to force the errors to become zero, or as close to zero as possible in the least-squares sense, we minimise the least-squares error function over all p patterns

Φ(W) = Σ_{k=1}^{p} e_k²(W)    (6)

In general, neurons are highly interconnected to produce multilayer feedforward networks with one or more layers of nodes between the input and output. These additional layers contain hidden units, or nodes, that are not directly connected to both the input and output nodes. A three-layer neural network with two layers of hidden units is shown in Fig. 4. For an M-output network, the least-squares error function becomes

Φ(W) = Σ_{k=1}^{p} Σ_{j=1}^{M} (y_jk - d_jk)²    (7)

where W is an n-dimensional column vector whose elements are the interconnecting weights of the network. The CGBP learning algorithm minimises the objective function Φ(W).

Fig. 4 Multilayer feedforward neural network with 14 inputs, two hidden layers and three outputs; the 14-input feature vector consists of the MN and SD of seven MUAP parameters; the three output nodes correspond to NOR, MND and MYO
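Eqns. 3, 4 and 6 for the single N-input neuron can be sketched directly. This is a toy illustration of the formulas above, not the paper's implementation:

```python
import numpy as np

def neuron_output(w, x):
    # eqn 3: analogue output; w[0] is the bias weight paired with the
    # constant input x0 = 1
    a = w[0] + np.dot(w[1:], x)
    # eqn 4: sigmoid f(a) = (1 - e^-a)/(1 + e^-a) = tanh(a/2),
    # scaling the output to (-1, +1)
    return np.tanh(a / 2.0)

def sum_squared_error(w, X, d):
    # eqn 6: least-squares error summed over all p input patterns
    return sum((neuron_output(w, x) - t) ** 2 for x, t in zip(X, d))
```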

The main steps of the algorithm are:

Step 1: [Initialisation] set W = W^(0) (the starting point), and

D^(0) = -∇Φ(W^(0))    (8)

Step 2: [Line search] at time k (k is the iteration counter), starting at point W^(k), search along the line parallel to direction D^(k) to determine a step length α_k, relative to its value at α = 0, that will 'sufficiently' decrease the value of the single-variable function

LF(α) = Φ(W^(k) + αD^(k))    (9)

Step 3: [Update the estimate of the minimum point] set

W^(k+1) = W^(k) + α_k D^(k)    (10)

Step 4: [Direction of search] set

D^(k+1) = -∇Φ(W^(k+1)) + β_{k+1} D^(k)    (11)

β_{k+1} = ∇Φ(W^(k+1))^T [∇Φ(W^(k+1)) - ∇Φ(W^(k))] / [∇Φ(W^(k))^T ∇Φ(W^(k))]  (Polak, 1971)    (12)

Step 5: [Stopping criterion] if ||∇Φ(W^(k+1))|| < ε, stop; otherwise set k = k + 1 and go to Step 2.

The back-propagation algorithm updates its weight vector using the formula

W^(k+1) = W^(k) - λ∇Φ(W^(k)) + μ(W^(k) - W^(k-1))

where λ and μ are the learning and momentum coefficients, respectively, which are fixed constants assigned by the user. The original back-propagation algorithm of Rumelhart et al. corresponds to the case where μ = 0, in which case the algorithm becomes the steepest-descent algorithm with fixed step size λ (Rumelhart et al., 1986). Comparing the back-propagation and the conjugate gradient algorithms, the conjugate gradient algorithm can be considered as the back-propagation algorithm with coefficients λ and μ adjusted at each iteration: λ_k = α_k and μ_k = α_k β_k.

Fig. 3 N-input neuron
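Steps 1 to 5 can be sketched end-to-end for the single-neuron case of eqns. 3, 4 and 6. This is a toy illustration only: the crude backtracking line search and the non-negativity clamp on β are simplifications for readability, not the exact procedure of Charalambous (1992).

```python
import numpy as np

def phi(W, X, d):
    # sum-of-squares error (eqn 6) for one tanh neuron
    y = np.tanh((W[0] + X @ W[1:]) / 2.0)
    return np.sum((y - d) ** 2)

def grad_phi(W, X, d):
    # analytic gradient of phi; d/da tanh(a/2) = (1 - tanh^2(a/2))/2
    a = W[0] + X @ W[1:]
    y = np.tanh(a / 2.0)
    delta = (y - d) * (1.0 - y ** 2)
    g = np.empty_like(W)
    g[0] = delta.sum()
    g[1:] = X.T @ delta
    return g

def cgbp(X, d, n_weights, eps=1e-4, max_iter=500):
    # Step 1: initialise W and the first search direction D = -grad
    W = np.zeros(n_weights)
    g = grad_phi(W, X, d)
    D = -g
    for _ in range(max_iter):
        # Step 2: backtracking line search for a step length alpha
        # that sufficiently decreases phi along D
        alpha, f0 = 1.0, phi(W, X, d)
        while phi(W + alpha * D, X, d) >= f0 and alpha > 1e-10:
            alpha *= 0.5
        # Step 3: update the estimate of the minimum point
        W = W + alpha * D
        g_new = grad_phi(W, X, d)
        # Step 5: stopping criterion on the gradient norm
        if np.linalg.norm(g_new) < eps:
            break
        # Step 4: new direction, with a Polak-Ribiere beta (eqn 12)
        beta = g_new @ (g_new - g) / (g @ g)
        D = -g_new + max(beta, 0.0) * D
        g = g_new
    return W
```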


4 Results

The ANN models trained with the CGBP algorithm are shown in Table 2. The neural network architectures are expressed as strings, showing the number of inputs, the number of nodes in each of the hidden layers, and the number of outputs. The number of inputs was 14 for all the models, i.e. the MN and SD of the seven MUAP parameters, and the number of outputs was always three, corresponding to NOR, MND and MYO (Fig. 4). The number of function evaluations and the number of iterations k are tabulated. The function Φ under consideration is the total sum of squares of the differences between the actual and the desired outputs. As outlined in the description of the CGBP algorithm (Charalambous, 1992), a line search along the chosen direction of search is performed at each iteration. During the line search, a number of function evaluations are carried out with different step lengths α, until a value α_k is found that sufficiently decreases the function value. With the exception of model 18, three to four function evaluations were usually carried out per iteration. One function evaluation in the CGBP algorithm is equivalent to one epoch in the MBP algorithm. In one epoch, all 24 patterns were applied at random to the neural network models. The training process for each model was stopped when the final function value decreased below 0.09. The CGBP models given in Table 2 achieved 100% correct classification (%CCs) on the training set. The evaluation set performance of these models is based on the diagnostic yield metrics defined in Section 2. In addition to %CCs, the percentage of false positives (%FPs) and the percentage of false negatives (%FNs) were computed. A zero %FPs score indicates that no negative subject was misclassified as positive; a zero %FNs score indicates that no positive subject was misclassified as negative.

The learning performance of the CGBP on models 7, 9 and 18 of Table 2 is compared to that of the MBP algorithm with different values of the λ and μ coefficients in Figs. 5, 6 and 7, respectively.
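The 'weights' column of Table 2 can be reproduced from the architecture strings. The sketch below assumes only layer-to-layer interconnecting weights are counted (no bias terms), which matches the tabulated values:

```python
def n_weights(arch):
    """Interconnecting weight count for an architecture string
    such as '14-5-10-3' (biases excluded)."""
    layers = [int(n) for n in arch.split("-")]
    # sum over consecutive layer pairs: fully connected, so a*b each
    return sum(a * b for a, b in zip(layers, layers[1:]))
```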

5 Discussion

The CGBP algorithm was used for training ANN models for EMG diagnosis; the results of the CGBP models in Table 2 are discussed below. With this algorithm, both one and two hidden layer networks were trained. The number of hidden nodes for the one hidden layer networks varied from 3 to 25 (models 1 to 5, Table 2). The percentage of correct classifications for the training set was 100%, and for the evaluation set it varied from 75 to 95%. Model 1, with architecture 14-3-3, achieved learning within 134 function evaluations, and the value of Φ dropped to 0.069. It is interesting to note that, although the architecture of this model is very small, a solution was found. Model 14-10-3 gave a 90% correct classification score, with %FPs = 0 and %FNs = 14, within 129 function evaluations, and the value of Φ dropped to 0.072. Increasing the number of nodes in the first hidden layer further brought no improvement in diagnostic performance. The learning performance of the one hidden layer CGBP models illustrates that Φ dropped to less than 1 within 100 function evaluations.

In most of the two hidden layer CGBP models investigated, the percentage of correct classifications for the evaluation set was 90%. With the exception of model 18, the number of function evaluations for the two hidden layer models goes up to 200, with the value of Φ in the range 0.050 to 0.090. The performance of the two hidden layer ANN models is examined in three groups. The first group consists of models 6, 7 and 8; the value of %CCs for the evaluation set is in the range 75 to 95%, and model 7 is the only one that gives a 95% success rate on the evaluation set. The second group consists of models 9, 10, 11 and 12. The best %CCs for the evaluation set is 90%, with %FPs = 0 and %FNs = 14, for both models 9 and 10; %CCs drops to 85% when the number of nodes in the second hidden layer is increased further. These models achieved learning within 80 to 150 function evaluations. The third group consists of models 13 to 17. With the exception of model 14, a 90% correct classification was obtained for the evaluation set. For model 14, %FPs = 33 and %FNs = 7, whereas for models 15 to 17, %FPs = 17 and %FNs = 7. All of these models achieved learning within about 130 function evaluations.

The learning performance of the CGBP algorithm is compared to that of the MBP algorithm trained with different values of λ and μ for neural network models with architectures 14-5-10-3 (Fig. 5), 14-10-5-3 (Fig. 6) and 14-40-10-3 (Fig. 7). For models 14-5-10-3 and 14-10-5-3, the CGBP algorithm outperformed the MBP algorithm in the number of function evaluations required to achieve learning; the percentage of correct classifications was of the same order for both algorithms. For the 14-40-10-3 CGBP model, 860 function evaluations were needed to drop Φ to 0.074, and the percentage of correct classifications for the training and evaluation sets was 100% and 90%, respectively. However, the MBP models with λ = 0.1, μ = 0.1 and with λ = 0.5, μ = 0.5 learned faster than the CGBP model. The oscillatory behaviour of the MBP model with λ = 0.5 and μ = 0.5, as well as the monotonically decreasing curve of model 18 (Table 2), are shown in Fig. 7.

Table 2 CGBP EMG diagnostic models

model  ANN          weights    k   function      Φ      training   evaluation
                                   evaluations          %CCs       %CCs   %FPs   %FNs
1      14-3-3          51     43       134      0.069     100        85     17     14
2      14-10-3        170     53       129      0.072     100        90      0     14
3      14-15-3        255     48       120      0.086     100        85      0     21
4      14-20-3        340     30        86      0.018     100        75      0     36
5      14-25-3        425     58       163      0.085     100        85     17     14
6      14-5-5-3       110     42       118      0.087     100        85     33      7
7      14-5-10-3      150     63       190      0.067     100        95      0      7
8      14-5-15-3      190     60       178      0.055     100        75     33     21
9      14-10-5-3      205     51       133      0.049     100        90      0     14
10     14-10-10-3     270     30        76      0.065     100        90      0     14
11     14-10-15-3     335     48       124      0.077     100        85     17     14
12     14-10-25-3     465     51       152      0.088     100        85     17     14
13     14-15-10-3     390     36        99      0.080     100        90      0     14
14     14-20-10-3     510     43       123      0.080     100        85     33      7
15     14-20-25-3     855     47       126      0.076     100        90     17      7
16     14-25-15-3     770     38       101      0.080     100        90     17      7
17     14-35-20-3    1250     34       121      0.080     100        90     17      7
18     14-40-10-3     990    100       860      0.074     100        90     17      7

Fig. 5 Performance of CGBP and MBP algorithms on the ANN with architecture 14-5-10-3 (MBP with λ = 0.01, μ = 0.01; λ = 0.01, μ = 0.10; λ = 0.05, μ = 0.10)
Fig. 6 Performance of CGBP and MBP algorithms on the ANN with architecture 14-10-5-3 (MBP with λ = 0.01, μ = 0.01; λ = 0.01, μ = 0.10; λ = 0.05, μ = 0.50; λ = 0.10, μ = 0.50)

It can be concluded that:

(a) unlike the MBP algorithm, where several trials have to be run with different values of λ and μ to achieve learning, in the CGBP algorithm the selection of λ and μ is completely eliminated. As previously stated, the CGBP algorithm is the same as the MBP algorithm, but with the values of λ_k and μ_k adjusted at each iteration.
(b) the CGBP algorithm does not exhibit any oscillatory behaviour during learning, unlike the MBP algorithm. The value of Φ decreases monotonically because of the line search procedure of the CGBP algorithm.
(c) models trained with the CGBP algorithm required a smaller architecture size and thus less computational effort.
(d) the diagnostic performance of the CGBP models is comparable to that of the MBP models.

6 Conclusions


The usefulness of artificial neural networks trained with the momentum back-propagation algorithm in the classification of electromyographic data has recently been demonstrated. In this study, a conjugate gradient back-propagation learning algorithm was applied in the training of ANN models, and the results are compared to those of the momentum back-propagation algorithm. Both algorithms gave similar diagnostic yields. However, the conjugate gradient back-propagation learning algorithm significantly reduced the training time and the model architecture size.

Acknowledgments—This research work was supported by the Cyprus Institute of Neurology and Genetics.

References

Battiti, R. (1990): 'Optimisation methods for back-propagation: automatic parameter tuning and faster convergence.' Proc. Int. Joint Neural Network Conf., Washington DC, 1, pp. 593-596
Buchthal, F., Pinelli, P., and Rosenfalck, P. (1954): 'Action potential parameters in normal human muscle and their physiological determination,' Acta Physiol. Scand., 32, pp. 219-229
Charalambous, C. (1992): 'A conjugate gradient algorithm for efficient training of artificial neural networks,' IEE Proc. Circ. Dev. Syst., 139, (3), pp. 301-310
Eberhart, R. C., and Dobbins, R. W. (1990): 'Neural network PC tools: a practical guide' (Academic Press)
Fletcher, R., and Reeves, C. M. (1964): 'Function minimization by conjugate gradients,' Comput. J., 7, pp. 149-154
Johansson, E. M., Dowla, F. U., and Goodman, D. M. (1991): 'Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method,' Int. J. Neur. Syst., 2, (4), pp. 291-302
Møller, M. F. (1993): 'A scaled conjugate gradient algorithm for fast supervised learning,' Neur. Netw., 6, pp. 525-533
Pattichis, C. S. (1992): 'Artificial neural networks in clinical electromyography.' PhD Thesis, Queen Mary and Westfield College, University of London, UK
Pattichis, C., Schizas, C., Middleton, L., and Fincham, W. (1990): 'Computer aided clinical electromyography.' Proc. 12th Ann. Conf. of the IEEE EMBS, 12, (5), pp. 2229-2231
Polak, E. (1971): 'Computational methods in optimisation: a unified approach' (Academic Press, New York)
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986): 'Parallel distributed processing: explorations in the microstructure of cognition' (MIT Press), Vol. 1, pp. 318-362
Schizas, C. N., Pattichis, C. S., Schofield, I. S., Fawcett, P. R., and Middleton, L. T. (1990): 'Artificial neural nets in computer-aided macro MUAP classification,' IEEE EMBS Mag., 9, (3), pp. 31-38

Fig. 7 Performance of CGBP and MBP algorithms on the ANN with architecture 14-40-10-3 (MBP with λ = 0.01, μ = 0.01; λ = 0.10, μ = 0.10; λ = 0.50, μ = 0.50)