Unsupervised Clustering Methods for Medical Data: An Application to Thyroid Gland Data

Songül Albayrak
Computer Engineering Department, Yildiz Technical University, Istanbul, Turkiye
Abstract. The purpose of this paper is to examine unsupervised clustering methods on medical data. Neural networks and statistical methods can be used to develop an accurate automatic diagnostic system. The Self-Organizing Feature Map, a neural network model, and K-means, a statistical model, are tested on their ability to recover well-defined classes. Thyroid gland data is used to test the diagnostic system. The clustering algorithms group patients into three classes: normal, hyperthyroid and hypothyroid function.
1 Introduction

The amount of data is growing rapidly as a result of scientific measurements and experiments. In medicine, the devices used to obtain data by experimental measurement keep developing, traditional manual data analysis has become inefficient, and methods for efficient computer-based analysis are indispensable. Neural networks, statistical methods and machine learning algorithms are currently being tested on many medical prediction problems to achieve accurate automatic diagnosis [7]. In the literature, data sets on breast cancer, heart disease, hepatitis, liver disorders, lymphography and Pima Indians diabetes have been investigated, usually with statistical methods such as decision trees, the Bayes classifier and standard linear discrimination. Furthermore, Baxt used backpropagation to identify myocardial infarction on a coronary artery disease database; Rosenberg et al. found the performance of a radial basis function network to be comparable with that of human experts and superior to various backpropagation methods; and for breast cancer detection, researchers have successfully applied backpropagation, ART and fractal analysis [7].

In this work, unsupervised clustering methods are used to group patients into three clusters using the thyroid gland data obtained by Dr. Coomans [1]. To measure thyroid gland function, five different tests were applied to the patients, and the test results have been used by other researchers for classification. In a very recent work, L. Ozyilmaz and T. Yildirim investigated supervised classification methods to develop a medical diagnostic system on this data [2]. In the work presented here, the self-organizing feature map and K-means algorithms are used as unsupervised clustering methods to cluster the patients. The resulting clusters correspond to normal, hyperthyroid and hypothyroid function.

This paper also gives some background on the clustering algorithms used in the application, K-means and the Self-Organizing Feature Map, and on the thyroid gland data.
2 K-means Clustering

The ability to determine characteristic prototypes or cluster centers in a given set of data plays a central role in the design of pattern classifiers based on the minimum-distance concept. The K-means algorithm is based on the minimization of the sum of squared distances from all points in a cluster domain to the cluster center. The procedure consists of the following steps [3].

Step 1: Choose K initial cluster centers z1(1), z2(1), ..., zK(1). These are arbitrary and are usually selected as the first K samples of the given sample set.

Step 2: At the t-th iterative step, distribute the samples {x} among the K cluster domains, using the relation
$$x \in S_j(t) \quad \text{if} \quad \|x - z_j(t)\| < \|x - z_i(t)\| \tag{1}$$
for all i = 1, 2, ..., K, i ≠ j, where Sj(t) denotes the set of samples whose cluster center is zj(t).

Step 3: From the results of Step 2, compute the new cluster centers zj(t+1), j = 1, 2, ..., K, such that the sum of the squared distances from all points in Sj(t) to the new cluster center is minimized. In other words, the new cluster center zj(t+1) is computed so that the performance index, which is this sum of squared distances, is minimized. The zj(t+1) which minimizes this performance index is simply the sample mean of Sj(t). Therefore, the new cluster center is given by

$$z_j(t+1) = \frac{1}{N_j} \sum_{x \in S_j(t)} x, \qquad j = 1, 2, \ldots, K \tag{2}$$
where Nj is the number of samples in Sj(t). The name “K-means” is derived from the manner in which the cluster centers are sequentially updated.

Step 4: If zj(t+1) = zj(t) for j = 1, 2, ..., K, the algorithm has converged and the procedure is terminated. Otherwise go to Step 2.
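
The four steps above can be sketched in Python as follows. This is a minimal illustration and not the Pascal program used in this work: the names `euclidean` and `k_means`, the iteration cap `max_iter`, and the toy data are assumptions introduced for the example. An empty cluster keeps its previous center, a case the description above does not address.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_means(samples, k, max_iter=100):
    """Return (centers, labels) for a list of equal-length numeric tuples."""
    # Step 1: take the first K samples as the initial cluster centers.
    centers = [tuple(float(v) for v in s) for s in samples[:k]]
    labels = [0] * len(samples)
    for _ in range(max_iter):  # max_iter is a safety cap (assumption)
        # Step 2: assign each sample to its nearest center (Eq. 1).
        labels = [min(range(k), key=lambda j: euclidean(x, centers[j]))
                  for x in samples]
        # Step 3: each new center is the sample mean of its cluster (Eq. 2).
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Step 4: stop once no center changes.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Hypothetical toy data: two well-separated groups in the plane.
toy = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
centers, labels = k_means(toy, k=2)
print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
print(labels)   # [0, 1, 0, 1]
```

Because the initial centers are simply the first K samples, the result depends on the order of the data; reordering the samples can lead to a different partition.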
3 Self-Organizing Feature Map (SOFM) Algorithm

The self-organizing map is a kind of neural network based on competitive learning. The output neurons of the network compete among themselves to be activated, with the result that only one output neuron, or one neuron per group, wins the competition. The output neurons that win the competition are called winner-take-all neurons. One way of inducing winner-take-all competition among the output neurons is to use lateral inhibitory connections between them [4,5,6].
[Figure labels: Input; Two-dimensional array of neurons]

Fig. 1. Two-dimensional lattice of neurons that are fully connected to the inputs.

In a self-organizing feature map, the neurons are placed at the nodes of a lattice that is usually one- or two-dimensional. The neurons become selectively tuned to various input patterns in the course of a competitive learning process. The locations of the neurons so tuned (i.e. the winning neurons) tend to become ordered with respect to each other in such a way that a meaningful coordinate system for different input features is created over the lattice.

There are three basic steps involved in the application of the algorithm after initialization: sampling, similarity matching and updating. These steps are repeated until the map formation is completed. The algorithm is summarized as follows:

1. Initialization: Choose random values for the initial weight vectors wj(0). The only restriction here is that the wj(0) must be different for j = 1, 2, ..., N, where N is the number of neurons in the lattice. It may be desirable to keep the magnitude of the weights small.

2. Sampling: Draw a sample x from the input distribution with a certain probability; the vector x represents the sensory signal.

3. Similarity Matching: In the formulation of an adaptive algorithm, it is convenient to normalize the weight vectors wj to a constant Euclidean norm (length). In such a situation, the best-matching criterion described here is equivalent to the minimum Euclidean distance between vectors. If we use the mapping i(x) to identify the neuron that best matches the input vector x (the winning neuron), we may determine i(x) by applying the following condition:
$$i(x) = \arg\min_j \|x(t) - w_j(t)\|, \qquad j = 1, 2, \ldots, N \tag{3}$$
where t represents the iteration count.

4. Updating: Adjust the synaptic weight vectors of all neurons, using the update formula
$$w_j(t+1) = \begin{cases} w_j(t) + \eta(t)\,[x(t) - w_j(t)], & j \in A_{i(x)}(t) \\ w_j(t), & \text{otherwise} \end{cases} \tag{4}$$
where η(t) is the learning-rate parameter, and Ai(x)(t) is the neighborhood function centered around the winning neuron i(x); both η(t) and Ai(x)(t) are varied dynamically during learning for best results.

5. Continuation: Continue with step 2 until no noticeable changes in the feature map are observed.

3.1 Selection of Parameters

The learning process involved in the computation of a feature map is stochastic in nature, which means that the accuracy of the map depends on the number of iterations of the SOFM algorithm. Moreover, the success of map formation depends critically on how the main parameters of the algorithm, namely the learning-rate parameter η and the neighborhood function Ai, are selected. Unfortunately, there is no theoretical basis for the selection of these parameters; they are usually determined by a process of trial and error [5].

Learning-rate parameter: The learning-rate parameter η(t) used to update the synaptic weight vector wj(t) should be time varying. In particular, during the first 1000 iterations or so, η(t) should begin with a value close to unity and thereafter decrease gradually, while staying above 0.1. The exact form of the variation of η(t) with t is not critical; it can be linear, exponential, or inversely proportional to t. It is during this initial phase of the algorithm that the topological ordering of the weight vectors wj(t) takes place; this phase of the learning process is called the ordering phase. The remaining iterations of the algorithm are needed principally for fine tuning of the computational map; this second phase of the learning process is called the convergence phase. During the convergence phase, η(t) should be maintained at a small value for a fairly long period of time, typically thousands of iterations [5].

Neighborhood function: For topological ordering of the weight vectors wj(t) to take place, careful consideration has to be given to the neighborhood function Ai(t). Generally, Ai(t) is taken to include the neighbors in a square region around the winning neuron, as illustrated in Fig. 2. For example, a “radius” of one includes the winning neuron plus its eight neighbors. However, the function may take other forms, such as hexagonal or even a continuous Gaussian shape. In any case, the neighborhood function Ai(t) usually begins such that it includes all neurons in the network and then gradually shrinks with time [5].
Fig. 2. Square topological neighborhood Ai, of varying size (Ai = 0, 1, 2, 3), around the “winning” neuron i, identified as a black circle.
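
The SOFM steps and the square neighborhood of Fig. 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in this work: the names `train_sofm` and `best_matching_node`, the fixed iteration count `n_iter` (used in place of the "no noticeable change" stopping rule), the initial weight range, and the linear shrinking of the neighborhood radius are all choices made for the example. The learning rate follows η(t) = 1/√t, the schedule reported in the experiments.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def best_matching_node(weights, x):
    """Similarity matching (Eq. 3): lattice position of the winning neuron."""
    return min(weights, key=lambda node: euclidean(x, weights[node]))

def train_sofm(samples, rows, cols, n_iter=2000, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # 1. Initialization: small random weights, one vector per lattice node.
    weights = {(r, c): [rng.uniform(-0.1, 0.1) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    max_radius = max(rows, cols) - 1  # wide enough to cover the whole lattice
    for t in range(1, n_iter + 1):
        # 2. Sampling: draw one input vector at random.
        x = rng.choice(samples)
        # 3. Similarity matching: find the winning neuron.
        win_r, win_c = best_matching_node(weights, x)
        # Assumed schedules: eta(t) = 1/sqrt(t); radius shrinks linearly to 0.
        eta = 1.0 / math.sqrt(t)
        radius = round(max_radius * (1.0 - t / n_iter))
        # 4. Updating (Eq. 4): move nodes in the square neighborhood toward x.
        for (r, c), w in weights.items():
            if abs(r - win_r) <= radius and abs(c - win_c) <= radius:
                for d in range(dim):
                    w[d] += eta * (x[d] - w[d])
    return weights

# Hypothetical toy data and a 1-D map of 5 nodes.
toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
som = train_sofm(toy, rows=1, cols=5, n_iter=500)
winner = best_matching_node(som, toy[0])
```

After training, each sample is assigned to the lattice node returned by `best_matching_node`; how nodes are then grouped into the final clusters is a separate design decision that this sketch leaves open.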
4 The Functions and Properties of the Thyroid Gland

The thyroid gland is the biggest gland in the neck. It is situated in the front of the neck below the skin and muscle layers. The thyroid gland takes the shape of a butterfly, with the two wings being represented by the left and right thyroid lobes, which wrap around the trachea. The sole function of the thyroid is to make thyroid hormone. This hormone has an effect on nearly all tissues of the body, where it increases cellular activity. The function of the thyroid is therefore to regulate the body’s metabolism [9].

The thyroid gland is prone to several very distinct problems, some of which are extremely common. Production of too little thyroid hormone causes hypothyroidism, and production of too much thyroid hormone causes hyperthyroidism.

In this work, the thyroid database [10] is clustered by unsupervised methods. This data set contains 3 classes and 215 samples. The classes correspond to the hyper-, hypo- and normal function of the thyroid gland. The following 5 tests are applied to patients to measure thyroid function, giving a 5-dimensional feature vector x = [x1, x2, x3, x4, x5]:

1. T3-resin uptake test (a percentage).
2. Total serum thyroxin as measured by the isotopic displacement method.
3. Total serum triiodothyronine as measured by radioimmunoassay.
4. Basal thyroid-stimulating hormone (TSH) as measured by radioimmunoassay.
5. Maximal absolute difference of the TSH value after injection of 200 micrograms of thyrotropin-releasing hormone as compared to the basal value.
5 Experimental Results

K-means, a distance-based statistical method, and the SOFM algorithm were used to evaluate unsupervised clustering on the thyroid gland data. Pascal programs were written for this purpose, and clustering was performed on the 215 thyroid samples. After clustering, the correct classification rate obtained with the K-means algorithm was 78.1%. Two different structures, a 1-D map and a 2-D map, were used for the SOFM. The 1-D map is a lattice of 20 nodes and the 2-D map is a 7x7 lattice of nodes. The same learning rate, η(t) = 1/√t, was used for both structures. The correct classification rate obtained with the 2-D SOFM was 96.3%, while it was 93.0% for the 1-D SOFM. The overall results are given in Table 1. As can be seen from the results, the SOFM gives better results than K-means clustering.

Table 1. Performance comparison on the thyroid gland data
Classifier             Correctly classified samples   Correct classification rate
K-means                168                            78.1 %
1-D SOM (20 nodes)     200                            93.0 %
2-D SOM (7x7 nodes)    207                            96.3 %
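
The rates in Table 1 follow from dividing each count of correctly classified samples by the 215 samples in the data set. The Python sketch below reproduces that arithmetic and shows one common way of scoring an unsupervised result against known classes: label each cluster with the majority class of its members and count the agreements. The paper does not state how clusters were matched to classes, so the majority-vote rule and the names `classification_rate` and `count_correct_by_majority` are assumptions made for illustration.

```python
from collections import Counter

def classification_rate(correct, total):
    """Percentage of correctly classified samples, rounded to one decimal."""
    return round(100.0 * correct / total, 1)

def count_correct_by_majority(cluster_ids, true_classes):
    """Count samples whose class equals the majority class of their cluster.

    Assumption: each cluster is labeled with its most frequent true class.
    """
    members = {}
    for cid, cls in zip(cluster_ids, true_classes):
        members.setdefault(cid, Counter())[cls] += 1
    # Each cluster contributes the size of its largest class.
    return sum(counts.most_common(1)[0][1] for counts in members.values())

# Counts taken from Table 1; the data set has 215 samples.
table_1 = {"K-means": 168, "1-D SOM (20 nodes)": 200, "2-D SOM (7x7 nodes)": 207}
rates = {name: classification_rate(n, 215) for name, n in table_1.items()}
print(rates)
# {'K-means': 78.1, '1-D SOM (20 nodes)': 93.0, '2-D SOM (7x7 nodes)': 96.3}

# Hypothetical example: 5 samples in 2 clusters, 4 agree with the majority.
print(count_correct_by_majority([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"]))  # 4
```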
6 Conclusion

In this work, unsupervised clustering methods have been used for the purpose of medical diagnosis, with thyroid gland data as the application. The SOFM, an unsupervised neural network structure, gives better results than the statistically based K-means clustering method for this specific problem. In the application, the size of the lattice, the learning rate and the neighborhood function of the SOFM are the most important factors in the performance of the clustering process. If the size of the lattice is increased, the performance of the clustering process increases while the speed of the system decreases. However, because the structure of the SOFM algorithm is well suited to it, parallel programming can be used to speed the algorithm up. This work has demonstrated the motivation for research in medical diagnosis using unsupervised clustering methods.
References

1. Coomans, D., Broeckaert, I., Jonckheer, M., Massart, D.L.: Comparison of Multivariate Discrimination Techniques for Clinical Data - Application to the Thyroid Functional State. Methods of Information in Medicine, Vol. 22 (1983) 93-101
2. Ozyilmaz, L., Yildirim, T.: Diagnosis of Thyroid Disease using Artificial Neural Network Methods. Proceedings of the 9th International Conference on Neural Information Processing (ICONIP'02) (2002)
3. Tou, J.T., Gonzalez, R.C.: Pattern Recognition Principles. Addison-Wesley Publishing Company (1974)
4. Schalkoff, R.J.: Pattern Recognition: Statistical, Structural and Neural Approaches. John Wiley & Sons, Inc. (1992)
5. Haykin, S.: Neural Networks: A Comprehensive Foundation. Macmillan College Publishing Company, New York (1994)
6. Bose, N.K., Liang, P.: Neural Network Fundamentals with Graphs, Algorithms and Applications. McGraw-Hill (1996)
7. Carpenter, G.A., Markuzon, N.: ARTMAP-IC and medical diagnosis: Instance counting and inconsistent cases. Neural Networks 11 (1998) 323-336
8. Deng, D., Kasabov, N.: On-line Pattern Analysis by Evolving Self-Organizing Maps (2000)
9. www.endocrineweb.com/thyroid.html, 2002
10. www.ics.uci.edu/pub/ml-repos/machine-learning-database/, 2001