Knowledge extraction from SID epidemiological data using neural networks

B. Solaiman*, A. Hillion*, Dr C. LeBot** and Dr D. Alix**

* Ecole Nationale Supérieure des Télécommunications de Bretagne, Dept MSC, BP 832, 29285 BREST - Cedex
** Centre Hospitalier Regional & Universitaire Morvan, 5, Av Foche, 29200 BREST - Cedex

Abstract: Knowledge extraction is an important problem that has received little attention in neural network research. In this work, we analyse the knowledge stored in a trained three-layer perceptron. It is shown, by analysing the internal structure of the network, that the classification solution realised by the network may be optimal in terms of classification results yet not optimal in terms of knowledge representation. A simple method is proposed to reorganise and extract the knowledge stored in the synaptic weights.

I- Introduction: Neural networks (NN) trained with the back-propagation (BP) learning algorithm have been successfully used for pattern classification [1]. In this work, the problem of the Sudden Infant Death (SID) syndrome is considered. This syndrome is the main cause of death in the first year of life, accounting for the deaths of about 2000 babies every year in France. A medical data base called "Save The Babies" [2] has been created at the Regional Hospital Center of Brest in France; it contains information on all the babies in the following cases: SID, Near Miss (an unexpected accident representing an imminent death for the observer) and Sibling Related (a child born after the SID death of a brother or a sister). The data base covers the health record of the child, antecedent cases, and the results of tests carried out during pregnancy and the first months of life. The main objective of this study is to extract knowledge from a neural network trained on this data base. The proposed method is composed of two levels: a classification level and a knowledge extraction level. The first level consists of classifying the SID epidemiological data using a simple multilayer perceptron. The second level consists of extracting knowledge from the internal representation of the trained neural network.

II- Learning base and network description: The data base "Save The Babies" contains several hundred files. The features describing each baby (file) are of different natures: discrete-valued features (sex, sleep position, use of artificial milk, ...) and continuous-valued features (age, weight, quantity of milk, ...). In our study, only 32 features have been used. Only files with small amounts of missing data are retained. This leads to 78 Sudden Infant Death files, 94 Near Miss files and 22 Sibling Related files. All continuous-valued features are normalised so as to take values in the interval [0, +1]. The neural network that was tested is a three-layer, fully connected, feed-forward perceptron with one hidden layer. The first layer contains 32 input neurons, the hidden layer 6 neurons and the output layer 3 neurons. Each output neuron refers to one class: SID, NM (Near Miss) or SR (Sibling Related). This is shown in figure 1.
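The [0, +1] normalisation of the continuous-valued features described above can be sketched as a simple min-max rescaling (the feature values below are illustrative placeholders, not taken from the data base):

```python
def min_max_normalise(values):
    """Rescale a list of continuous feature values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant feature: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative values for a continuous feature such as age in days.
ages_in_days = [30, 60, 90, 120, 365]
print(min_max_normalise(ages_in_days))  # every value now lies in [0, 1]
```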

[Figure 1: the three-layer perceptron, with input features 1 to 32, hidden neurons 0 to 5, and three output neurons for the SID, NM and SR classes.]

Figure 1. The three-layer perceptron architecture.

The input neurons output the input data without modification. The input/output characteristic of the hidden and output neurons is specified by the sigmoid function f(x) = (1 + tanh(x)) / 2.
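A forward pass through this 32-6-3 perceptron, with the sigmoid f(x) = (1 + tanh(x)) / 2 applied on the hidden and output layers, can be sketched as follows (the weights here are random placeholders, not the trained ones):

```python
import numpy as np

def sigmoid(x):
    """f(x) = (1 + tanh(x)) / 2, maps activations into (0, 1)."""
    return (1.0 + np.tanh(x)) / 2.0

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(6, 32))   # 32 inputs -> 6 hidden neurons
b_hidden = np.zeros(6)
W_out = rng.normal(size=(3, 6))       # 6 hidden -> 3 outputs (SID, NM, SR)
b_out = np.zeros(3)

def forward(x):
    """One forward pass through the three-layer perceptron."""
    h = sigmoid(W_hidden @ x + b_hidden)
    return sigmoid(W_out @ h + b_out)

x = rng.uniform(0.0, 1.0, size=32)    # one normalised 32-feature file
y = forward(x)
print(y)                              # three outputs, each in (0, 1)
```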

III- Results and internal representation of the trained NN: As stated above, the proposed method for extracting knowledge from the learning data is composed of two levels. The first level consists of classifying the data using the three-layer perceptron described in the previous section; that is, we treat the problem as a pattern classification one. The second level consists of extracting knowledge from the internal representation of the trained neural network.

III-a- Classification results: The three-layer perceptron was trained using the BP learning algorithm. The best results were obtained with 6 neurons in the hidden layer. The classification results are summarised in the following confusion matrix:


          SID    NM    SR
  SID      54    24     0
  NM        0    93     1
  SR        0     6    16

Confusion matrix

Test results showed that it is not worthwhile to add more than 6 neurons to the hidden layer in order to improve the classification results.
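From the confusion matrix above, the overall classification rate can be read off directly (a small sketch using the counts reported in the matrix):

```python
# Rows: true class (SID, NM, SR); columns: predicted class.
confusion = [
    [54, 24, 0],   # SID
    [0, 93, 1],    # NM
    [0, 6, 16],    # SR
]

correct = sum(confusion[i][i] for i in range(3))   # diagonal entries
total = sum(sum(row) for row in confusion)         # all 194 files
print(f"{correct}/{total} correctly classified "
      f"({100.0 * correct / total:.1f}%)")
# -> 163/194 correctly classified (84.0%)
```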

III-b- Internal analysis of the trained network: The choice of a three-layer perceptron in this study is due to the fact that, when the network is trained, the hidden neurons determine the positions of the best linear separation surfaces (or hyperplanes) in the data space. The synaptic weights between the hidden layer and the output layer then realise a kind of logical function in order to formulate the decision functions for the different classes. In this study, our main objective is to understand how the network built its internal representation of the knowledge contained in the learning base, in order to extract the most significant information about the SID syndrome. So, only the SID class output is analysed here. The proposed method for analysing the internal representation consists of three steps, described as follows.

Step 1) Suppression of "meaningless" hyperplanes. Hidden neurons connected to the output neuron with a nearly zero synaptic weight have no influence on the decision made by the network concerning this class, so they can be suppressed. In our case, hidden neuron "0" determines a meaningless hyperplane and is therefore suppressed in the following analysis.

Step 2) Two (or three) dimensional projection. The two (or three) dimensional projection of the hyperplanes and the training data is made in order to have a "visual" interpretation of how the network distributed its hyperplanes in the data space. This is not easy to do in general, because it implies a severe reduction of the dimensionality of the data space (in our case, a reduction from 32 dimensions to two). The projection is realised by positioning lines (the projections of the hyperplanes) and placing the training data on one side or the other of these lines, depending on the output value of the corresponding hidden neuron when each pattern is presented to the network: if the output value is greater than 0.5, the pattern is placed in the D+ area, otherwise in the D- area.
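Steps 1 and 2 can be sketched as follows: hidden neurons whose weight to the SID output is nearly zero are suppressed, and each training pattern is placed on the D+ or D- side of each remaining hyperplane according to whether the hidden neuron's output exceeds 0.5. The suppression threshold, weights and data below are illustrative, not the trained values:

```python
import numpy as np

def sigmoid(x):
    return (1.0 + np.tanh(x)) / 2.0

def analyse_hidden_layer(W_hidden, w_sid, X, eps=0.1):
    """For each retained hidden neuron, return which side (D+ or D-)
    of its hyperplane each training pattern falls on.

    W_hidden : (n_hidden, n_features) input-to-hidden weights
    w_sid    : (n_hidden,) hidden-to-SID-output weights
    X        : (n_patterns, n_features) training patterns
    eps      : threshold below which a hidden neuron is "meaningless"
    """
    sides = {}
    for j, w in enumerate(w_sid):
        if abs(w) < eps:              # Step 1: suppress near-zero connections
            continue
        h = sigmoid(X @ W_hidden[j])  # hidden neuron j's output per pattern
        sides[j] = np.where(h > 0.5, "D+", "D-")  # Step 2: side assignment
    return sides

# Illustrative toy case: 2 features, 3 hidden neurons, one near-zero weight.
W_hidden = np.array([[1.0, -1.0], [2.0, 0.5], [-1.0, 1.0]])
w_sid = np.array([-3.0, 0.01, 4.0])   # neuron 1 gets suppressed
X = np.array([[0.9, 0.1], [0.1, 0.9]])
print(analyse_hidden_layer(W_hidden, w_sid, X))
```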


This projection is shown in figure 2.

[Figure 2: two-dimensional projection showing the hyperplane projections D1 to D5 (each with its D+ and D- sides), axes Z0 to Z5, and the training patterns, marked as well-classified or badly classified SID, Near Miss and Sibling Related patterns.]

Fig. 2. Two-dimensional projection of the hyperplanes and data.

Fortunately, in our case, only four "patterns" (or files), all from the Near Miss class, could not be visualised. One of these patterns was misclassified as a Sibling Related pattern.

Step 3) Internal interpretation. This two-dimensional projection is fundamental for the interpretation of the obtained results and of the distribution of the data space. The contribution of each hidden-layer neuron to the decision made by the output neuron can be determined as a function of the synaptic weight connecting them. In our case, the connection weights of the five non-suppressed neurons are (-7.1, +1.7, -3.0, +3.0, +4.0). This can be interpreted as follows:
- D1+ contains patterns that must not be interpreted as belonging to the SID class (synaptic weight = -7.1). In fact, nearly 100% of the NM and SR patterns are in this area.
- D5+ contributes strongly to the decision of the SID class (synaptic weight = +4.0). Notice that D5+ contains 8 patterns, all from the SID class.
- ...
The classification solution found by the network is not optimal in terms of knowledge representation. This can easily be seen by noticing that D1- contains 52 patterns from the SID


class (66%), and that D1 only plays the role of telling the SID neuron that D1+ contains patterns of other classes. The decision made by the SID neuron is based on D2+, D4+ and D5+. This means that optimising the classification rate does not amount to optimising the knowledge representation.

III-d- SID important features: The main results obtained in this section concerning the SID syndrome are due to the internal interpretation made in the last section. Given that 66% of the SID class is linearly separable from the other data of the learning base, we focus our attention in this section on hidden neuron "1", which realises this separation. A simple adaline is used to realise the linear separation of the SID class from the other two classes. The non-linear function is the same [0, 1] sigmoid function used previously. The adaline is initialised with the synaptic weights connecting the input neurons to hidden neuron "1" in the three-layer perceptron used previously. The Widrow-Hoff algorithm has been used to minimise the LMS error, with a desired output of "0" chosen for the SID class. The obtained results are summarised in the following confusion matrix:

          SID   NM/SR
  SID      55     23
  NM/SR     0    116
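The Widrow-Hoff (LMS) update used to train the adaline can be sketched as follows. This is a linear-output sketch only (the paper additionally applies the same [0, 1] sigmoid); the learning rate, toy data and zero initial weights are illustrative, whereas in the paper the weights are initialised from hidden neuron "1" of the trained perceptron:

```python
import numpy as np

def widrow_hoff(X, d, w0, lr=0.1, epochs=100):
    """Train a linear unit with the Widrow-Hoff (LMS) rule.

    X  : (n_patterns, n_features) input patterns
    d  : (n_patterns,) desired outputs (0 for SID, 1 for NM/SR here)
    w0 : (n_features,) initial weights
    """
    w = w0.copy()
    for _ in range(epochs):
        for x, target in zip(X, d):
            y = x @ w                     # linear output
            w += lr * (target - y) * x    # LMS weight update
    return w

# Illustrative linearly separable toy problem.
X = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]])
d = np.array([0.0, 0.0, 1.0, 1.0])       # desired output 0 for the first class
w = widrow_hoff(X, d, w0=np.zeros(2))
print(np.round(X @ w))                   # close to the desired outputs
```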

Negative synaptic weights between the inputs and the output neuron indicate features that are important for the SID decision. The following features are found to be of great importance: age, use of artificial milk, usual quantity of milk, frequent fever, frequent transpiration, and the interval between the last two pregnancies. Some of these parameters had already been pointed out as important by medical experts. These results are currently under study and validation.
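Reading off the important features from the sign of the input weights can be sketched as follows (the feature names and weight values here are illustrative placeholders, not the trained values):

```python
# Features with a negative input-to-output weight push the output towards 0,
# the desired value for SID, so they are flagged as important for the SID
# decision. Names and weights below are illustrative only.
weights = {
    "age": -2.1,
    "artificial_milk": -1.4,
    "usual_milk_quantity": -0.9,
    "sleep_position": 0.6,
    "birth_weight": 1.2,
}

# Most negative (most important) first.
important = [name
             for name, w in sorted(weights.items(), key=lambda kv: kv[1])
             if w < 0.0]
print(important)
```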

IV- Conclusions: In this study, knowledge extraction from a trained three-layer perceptron is considered. The main result obtained here is that the knowledge "stored" in the synaptic weights must be treated carefully if the objective is to extract this knowledge and not only to realise pattern classification. Concerning the SID syndrome, the important features obtained during this study are currently under validation by the medical community.

Acknowledgment: The authors are grateful to Mme C. Ghesquière (CRITT GBM) for fruitful discussions. This work is supported in part by a grant from the "CRITT Génie Biologique et Médical" in Rennes, Région Bretagne.

References:
[1] D.E. Rumelhart, G.E. Hinton and R.J. Williams, "Learning internal representations by error propagation", Parallel Distributed Processing, Vol. 1, ch. 8, 1986.
[2] P. Aubergier, D. Alix, B. Solaiman, A. Hillion and A. Lasquellec, "Trying to understand the reasons for cot death", IEEE 10th Annual International Conference EMBS, November 4-7, 1988, New Orleans, USA.
