Complexity International, Volume 12 (2005)

Communal Neural Network for Ovarian Cancer Mutation Classification

Muhammad Shoaib B. Sehgal, Iqbal Gondal, and Laurence Dooley
GSCIT, Monash University, VIC 3842, Australia
Email: {Shoaib.Sehgal, Iqbal.Gondal, Laurence.Dooley}@infotech.monash.edu.au

Abstract

Microarrays are used to measure the expression of thousands of genes at a time, which helps diagnose and treat many diseases with higher accuracy using diagnostic classifiers. However, around 90% of gene expression datasets contain multiple missing values because of slide scratches, hybridization errors and image corruption. These missing values degrade classifier accuracy, as most classifiers either ignore the missing values or replace them with zero. In this paper we present an innovative Communal Neural Network (ComNN) model that estimates the missing values based on the genetic correlation principle. The classification accuracy of the proposed system is compared with other well-known techniques such as Support Vector Machine (SVM), Generalized Regression Neural Network (GRNN), Probabilistic Neural Network (PNN) and a novel Parallel PNN (PPNN) for the classification of BRCA1, BRCA2 and Sporadic mutations for ovarian cancer. The results indicate that ComNN outperformed SVM, GRNN, PNN and PPNN when cross-validated on the aforementioned data containing multiple missing values.

1. Introduction

Microarrays are used to take simultaneous measurements of thousands of genes. This genetic data is helpful for the study of different malignant tumour cells and cancer-causing mutations under different conditions (Hellem 2004). A reliable test for the detection of cancer mutation types would be very helpful for the early detection and cure of cancer. However, such a test requires differential data for several genes at a time, so microarray data has attracted many scientists to the classification of different cancer mutations. Machine learning algorithms have already proven their significance for the molecular classification of microarray cancer data. For example, Golub (1999), Toure (2001), Shipp (2002), Pomeroy (2002), Bhattacharjee (2001), Khan (2001) and Ramaswamy (2001) all provided a broad range of examples where machine learning has been applied to the classification of leukemia, lymphoma, brain cancer, lung cancer, multiple primary tumours and small round blue cell tumours. Despite the successful introduction of machine learning algorithms for the classification of microarray data, the associated problem with microarrays is that they always generate data with multiple missing values. The possible causes are slide scratches, image corruption or insufficient resolution (Hellem 2004), and these missing values impact significantly on the overall performance of machine learning algorithms (Acuna 2004). One approach is to simply ignore the samples containing missing values; however, with a limited number of samples, this is not always feasible. An alternative is to build a classifier that can estimate a gene value based on some correlation between the samples or genes. For example, Gustavo (2003) demonstrated that by estimating missing values from nearest-neighbour genetic values, the classification accuracy of the KNN and Linear Discriminant Analysis classifiers dramatically increased. Unfortunately, current classifiers impute zero or the row average in place of missing values, which results in misclassification; there is therefore a growing need for a classifier that handles missing values in a principled way. In this paper we present an innovative Communal Neural Network (ComNN) that has shown robustness when classifying ovarian cancer microarray data containing missing values. The classification accuracy of ComNN is compared with Support Vector Machines (SVM), the Probabilistic Neural Network (PNN) and the Parallel PNN. The reason for including SVM in the comparison is that SVM has shown promising results in a variety of biological classification tasks, including gene expression microarrays (Brown 1997 and Mukherjee 2000). Byvatov (2003), for example, proposed the use of SVM over BPN in identifying small organic molecules that potentially modulate the function of G-protein coupled receptors. The fundamental problems with SVM, however, remain its inability to treat large numbers of missing values and its performance dependency upon the selected kernel, with no universally accepted kernel for all types of data. The results will confirm the superior performance of ComNN compared with the aforementioned machine learning algorithms for the mutation classification associated with ovarian cancer. It is worth noting that the motivation for classifying mutations is that mutations in BRCA1 and BRCA2, as well as Sporadic cases, can lead to carcinogenesis through different molecular pathways (Amir 2002), and detection of these mutations is ultimately helpful for treating these diseases efficiently. The major challenge is that the new model must be able to assist in diagnosing the genetic mutation type in the presence of multiple missing values. The remainder of this paper is organized as follows: section 2 reviews the GRNN, PNN and SVM classification techniques, while the newly proposed ComNN methodology is presented in section 3. Section 4 describes the experimental methodology, section 5 discusses the quantitative results, and some general conclusions are presented in section 6.

2. Background of Applied Classification Techniques

This section describes the theoretical background of the classification techniques used in this paper for comparison with our proposed ComNN.

2.1 Generalized Regression Neural Network

Generalized regression neural networks are a paradigm of Radial Basis Function (RBF) networks used in functional approximation (Specht 1991 and Timmerman 1999). To apply a GRNN to classification, an input vector x (BRCA1, BRCA2 or Sporadic microarray data) is formed and weight vectors W are calculated using Eq. (2). The output y(x) is the weighted average of the target values t_i of the training samples x_i close to the given input sample x, as given by:

$$y(x) = \frac{\sum_{i=1}^{n} t_i W_i}{\sum_{i=1}^{n} W_i} \qquad (1)$$

$$\text{where } W_i = \exp\left[ \frac{-\left\| x - x_i \right\|^{2}}{2h^{2}} \right] \qquad (2)$$

The only parameters that need to be learned are the smoothing parameters h of the RBF units in Eq. (2), which are set using a simple grid search method. The distance between the computed value y(x) and each value in the set of target values T is then evaluated, where:

$$T = \{1, 2\} \qquad (3)$$

The values 1 and 2 correspond to the training class and all other classes respectively. The class corresponding to the target value with the least square distance is chosen. The results presented in section 5 show, however, that this formulation only works well for binary classification, because the GRNN exhibits a strong bias towards the target value nearest to the mean value µ of T. This was also the reason for choosing target values 1 and 2, since both have the same absolute distance from µ.
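To make the formulation concrete, the following sketch implements Eqs. (1)-(3) directly. This is our illustrative Python/NumPy rendering, not the authors' code; the function names and the small guard against an all-zero weight vector are our additions.

```python
import numpy as np

def grnn_output(X_train, t_train, x, h):
    """GRNN output y(x), Eqs. (1)-(2): weighted average of target
    values with Gaussian RBF weights centred on the training samples."""
    sq_dist = np.sum((X_train - x) ** 2, axis=1)   # ||x - x_i||^2
    W = np.exp(-sq_dist / (2.0 * h ** 2))          # Eq. (2)
    return (W @ t_train) / (W.sum() + 1e-12)       # Eq. (1)

def grnn_classify(X_train, t_train, x, h, targets=(1, 2)):
    """Eq. (3): pick the target value (1 = training class, 2 = all
    other classes) with least square distance to y(x)."""
    y = grnn_output(X_train, t_train, x, h)
    return min(targets, key=lambda t: (y - t) ** 2)
```

For example, `grnn_classify(X, t, sample, h=0.5)` returns 1 when the GRNN output lands closer to the positive-class target than to 2.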

2.2 Probabilistic Neural Network

Probabilistic neural networks are also a type of radial basis network suitable for classification problems (Byvatov 2003). They can be interpreted as a function that approximates the probability density of the underlying example distribution. A PNN has an Input, a Pattern, a Summation and an Output layer. Each training example has one corresponding node in the pattern layer. Every pattern node forms the product of the weight vector W and the given example to be classified, where the weights entering a node come from a particular training example. The product is then passed through the activation function Fk, given by:

$$F_k = \exp\left[ \frac{X^{T} W_k}{\sigma^{2}} - 1 \right] \qquad (4)$$

where X is the input vector of gene expression data (BRCA1, BRCA2 or Sporadic) and W_k is the weight vector of pattern node k. In the summation layer each node receives the outputs from the pattern nodes associated with a given class (BRCA1, BRCA2 or Sporadic). The PNN output nodes are binary neurons that produce the classification decision using Eq. (5).


O=

N

∑F

(5)

k

i=1

In Eq. (4) σ, the standard deviation of the normal function, is the only parameter that must be preset for training. There is a trade-off: small values of σ may produce very spiky approximations that do not generalize well, while large deviations smooth out detail. To select an optimal smoothing factor, a grid search method is used.
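A minimal sketch of the pattern and summation layers described above, using Eq. (4) as reconstructed here. Unit-normalising the vectors (so the dot product is a cosine similarity) is a standard PNN convention we assume, `y_train` is assumed to be a NumPy array of class labels, and all names are illustrative.

```python
import numpy as np

def pnn_classify(X_train, y_train, x, sigma):
    """Minimal PNN: pattern-layer activations (Eq. 4) summed per
    class (Eq. 5); the class with the largest sum wins."""
    # Normalise so that X.W_k is a cosine similarity in [-1, 1].
    Xn = X_train / np.linalg.norm(X_train, axis=1, keepdims=True)
    xn = x / np.linalg.norm(x)
    F = np.exp(Xn @ xn / sigma ** 2 - 1.0)   # Eq. (4), one F_k per pattern node
    classes = np.unique(y_train)
    sums = {c: F[y_train == c].sum() for c in classes}   # Eq. (5) per class
    return max(sums, key=sums.get)
```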

2.3 Support Vector Machine

Support vector machines perform well in many practical classification problems (Vapnik 1998 and Evgeniou 2000). An SVM transforms the input vector space ℝⁿ into a higher dimensional space (Vapnik 1995 and Haykin 1999) and attempts to locate a separating hyperplane there (Burges 1998, Cristianini 2000 and Gunn 1998). The gene expression data Z is transformed into the higher dimensional space, where the separating hyperplane satisfies:

$$W \cdot Z_i + b = 0 \qquad (6)$$

To maximize the margin between the genetic mutation classes, Eqs. (7) and (8) are used:

$$\max \frac{1}{\left\| W \right\|^{2}} \qquad (7)$$

$$\text{subject to the condition } y_i \left( W \cdot Z_i + b \right) \ge 1 \qquad (8)$$

Using the Kuhn-Tucker conditions (Gunn 1998) and the Lagrange multiplier method, Eq. (7) is equivalent to solving the dual problem

$$\max_{\alpha} \left[ \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \right] \qquad (9)$$

where $0 \le \alpha_i \le C$, $l$ is the number of inputs, $i = 1, \ldots, l$, and $\sum_{i=1}^{l} \alpha_i y_i = 0$. In Eq. (9), $K(x_i, x_j)$ is the kernel function; in our experiments linear, polynomial and RBF kernels were used. C is the only adjustable parameter in the above formulation, and a grid search method is used to select the most appropriate value of C.
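The SVM experiments (linear, polynomial and RBF kernels, with C tuned by grid search) can be reproduced in outline with scikit-learn. The data below is a random stand-in for the ovarian dataset, and the candidate grid for C is our assumption.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical stand-in data: rows = samples, columns = log2 gene
# expression values; labels 1 = target mutation, 2 = all others.
rng = np.random.default_rng(0)
Z = rng.normal(size=(61, 200))
y = rng.integers(1, 3, size=61)

# Grid search over the penalty C (the only adjustable parameter here),
# repeated for the three kernels compared in this paper.
for kernel in ("linear", "poly", "rbf"):
    search = GridSearchCV(SVC(kernel=kernel),
                          {"C": [0.1, 1, 10, 100]}, cv=4)
    search.fit(Z, y)
    print(kernel, search.best_params_, round(search.best_score_, 3))
```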


3. Communal Neural Network

Our proposed model is a committee of GRNN models integrated in a parallel arrangement. The ComNN is organized into three layers, namely Input, Hidden and Output, as in Fig. 1. The input layer transmits both the input data X and the target vector T to the hidden layer. The CoV neurons find the missing values in the data and compute the absolute diagonal covariance of all rows with the row containing a missing value. The number of CoV neurons is equal to the number of rows containing missing data. The rows computed by each CoV neuron are then ranked according to their absolute covariance value; absolute covariance is used because the higher the absolute covariance, the more correlated the rows are. The values of the k most correlated rows are passed on to the imputation neurons, whose number equals that of the CoV neurons. The imputation neurons estimate the missing values ω of the sample based on the values of the most correlated row X, using Eqs. (10), (11) and (12).

$$\omega = \frac{\delta}{\gamma} \qquad (10)$$

$$\delta = \frac{1}{n-1} \sum_{j=1}^{n} \left( X_j - \bar{X} \right)\left( Y_j - \bar{Y} \right) \qquad (11)$$

$$\gamma = \frac{1}{n-1} \sum_{j=1}^{n} \left( X_j - \bar{X} \right)^{2} \qquad (12)$$

The second layer of neurons in the hidden layer imputes the missing values to form the complete data Xc, which is passed to each GRNN in the hidden layer. The GRNNs then generate output using Eqs. (1) and (2). This output is passed to the neurons connected to each GRNN; these neurons calculate the absolute distance from the output y of the connected GRNN to each value in T. The target value C from T having the minimum distance from y is then passed to the output layer. A neuron in the output layer is only activated when y has the least square distance from the first target value (the class). The class corresponding to the active neuron is the class of X. The ComNN is trained in such a way that a target class M is selected from the set of classes to be identified. The system is then trained on the data of M, serving as positive examples, while the data of the remaining classes provide the negative examples. The class labels are assigned so that all data samples from class M have label 1 and samples from the remaining classes have label 2. Each class has its corresponding GRNN node in ComNN, trained in the aforementioned way. These trained GRNN nodes, one per target class, are joined in parallel to form the ComNN architecture shown in Fig. 1.
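The paper gives no pseudocode for the CoV and imputation neurons, so the following is one plausible reading of this section and Eqs. (10)-(12): complete candidate rows are ranked by absolute covariance with the incomplete row (the CoV neuron), and each missing entry is estimated by scaling the corresponding entry of a correlated row by the regression coefficient ω = δ/γ. How ω is applied, the averaging over the k rows, and all names are our assumptions.

```python
import numpy as np

def impute_row(data, r, k=5):
    """Sketch of one CoV/imputation neuron pair for row r of `data`
    (NaN marks a missing value). Complete rows are ranked by absolute
    covariance with row r; the k best estimate each missing entry
    via omega = delta / gamma, Eqs. (10)-(12)."""
    target = data[r]
    miss = np.isnan(target)
    obs = ~miss
    complete = [i for i in range(data.shape[0])
                if i != r and not np.isnan(data[i]).any()]

    def abs_cov(i):                      # |covariance| over observed positions
        X, Y = data[i, obs], target[obs]
        return abs(np.sum((X - X.mean()) * (Y - Y.mean())) / (len(X) - 1))

    best = sorted(complete, key=abs_cov, reverse=True)[:k]
    estimates = []
    for i in best:
        X, Y = data[i, obs], target[obs]
        delta = np.sum((X - X.mean()) * (Y - Y.mean())) / (len(X) - 1)  # Eq. (11)
        gamma = np.sum((X - X.mean()) ** 2) / (len(X) - 1)              # Eq. (12)
        estimates.append((delta / gamma) * data[i, miss])               # Eq. (10)
    filled = target.copy()
    filled[miss] = np.mean(estimates, axis=0)   # average over the k rows
    return filled
```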


Fig. 1. Communal neural network

4. Methodology

The classification performance of ComNN was compared with the well-established techniques reviewed in section 2, including GRNN, PNN, PPNN and SVM. The ovarian cancer data of Amir (2002), comprising 61 samples, was used for this purpose. Because microarray expression data is asymmetric, the log2 of the input data is used as the input to the classifier to identify the mutation of the genetic data. To measure classification accuracy reliably, the data was divided evenly into k folds and the system then went through k iterations. In each iteration, k−1 folds were used for training and the remaining fold for testing, such that the selection probability Pv of each fold becoming part of the validation data for a particular iteration is:

$$P_v = \frac{\eta}{N \times L} \qquad (13)$$

while the probability Pt of the remaining subset being selected as training data for a particular iteration is:


$$P_t = \frac{(k-1) \times \eta}{N \times L} \qquad (14)$$

where N is the total number of data items per class, k the number of folds, L the number of classes and η the number of samples in each subset. After k iterations every subset has been part of the validation set, so the overall probability of the data appearing as test data is unity, giving the results a better confidence level. The classification accuracy was calculated as:

$$\text{Accuracy} = \frac{1}{k} \sum_{i=1}^{k} Acc_i \qquad (15)$$

where Acc_i is the accuracy of the i-th iteration. The motivation for using the k-fold cross validation method over the classical hold-out or random resampling methods is that it uses the data evenly for both training and testing, thereby giving a better estimate of the classification rates.
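The validation scheme of Eqs. (13)-(15) reduces to a short loop. This is our sketch: `StratifiedKFold` mirrors the even per-class division the paper describes, and `classify` stands in for any of the compared classifiers.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def kfold_accuracy(X, y, classify, k=4):
    """Eq. (15): Accuracy = (1/k) * sum of per-fold accuracies Acc_i.
    `classify(X_train, y_train, X_test)` returns predicted labels."""
    accs = []
    for train, test in StratifiedKFold(n_splits=k).split(X, y):
        pred = classify(X[train], y[train], X[test])
        accs.append(np.mean(pred == y[test]))   # Acc_i for this fold
    return float(np.mean(accs))
```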

5. Results and Discussion

The classification results obtained by GRNN and PNN were below 50% for the BRCA1, BRCA2 and Sporadic mutations of the ovarian and breast cancer data, because they overfitted the training data and suffered from the curse of dimensionality. To address this, we arranged PNNs in the parallel manner described in section 3 to form a Parallel PNN (PPNN); the accuracy improved to 50%, as shown in Table 1. PPNN misclassified 50% of the data (for all types of mutations) because, when each node in PPNN was trained using a one-versus-all technique, it exhibited a strong bias towards the target class T and misclassified all other samples as T.

Table 1. Classification accuracies (%)

Classification Algorithm               BRCA1    BRCA2    Sporadic
Neural Nets   ComNN                    93.75    75       81.25
              PPNN                     50       50       50
SVM           Linear Kernel            93.75    65.63    78.13
              Polynomial Kernel        93.75    68.75    81.25
              RBF Kernel               87.5     78.12    75

The results presented in Table 1 show that ComNN has the best overall performance for classifying ovarian cancer mutations. The SVM kernels showed promising results for one mutation but failed to provide similar performance for the others. For instance, the RBF kernel performed well in detecting the BRCA2 mutation but was outperformed by ComNN in the classification of BRCA1 and Sporadic. This is because SVM tries to insert a separating hyperplane when it transforms data from the Euclidean space ℝⁿ to a higher dimensional space, which was not possible for some of the data samples. Another reason is that the BRCA2 samples contained only 381 missing values, while the BRCA1 and Sporadic samples contained 1541 and 2292 missing values respectively, which demonstrates the ability of ComNN to handle a large number of missing values. The only training parameter that must be stipulated for ComNN is the smoothing parameter h, for which we used a grid search method. Although grid search is slow, taking time linear in the number of candidate values, in genetic mutation classification accuracy is a more vital requirement than speed.
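The grid search over h amounts to a single loop; the logarithmic grid and the cross-validated scoring function in this sketch are our illustrative choices, not values from the paper.

```python
import numpy as np

def grid_search_h(score, grid=np.logspace(-2, 1, 30)):
    """Evaluate a user-supplied scoring function (e.g. k-fold
    cross-validated accuracy of the GRNN) at each candidate h and
    return the best one; cost is linear in the grid size."""
    scores = [score(h) for h in grid]
    return float(grid[int(np.argmax(scores))])

# e.g., combining the earlier sketches:
# best_h = grid_search_h(lambda h: kfold_accuracy(X, y,
#     lambda Xtr, ttr, Xte: np.array(
#         [grnn_classify(Xtr, ttr, x, h) for x in Xte])))
```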

6. Conclusion

This paper has presented a novel neural network model, the Communal Neural Network, which demonstrated consistently superior classification performance when applied to ovarian cancer microarray data sets, compared with other popular machine learning algorithms including SVM, PNN and GRNN. Using a k-fold cross validation technique, ComNN provided accuracies of 94%, 75% and 81% for the BRCA1, BRCA2 and Sporadic mutations respectively on the ovarian cancer data. The classification results were consistently better than those of the other techniques, confirming the model's ability to handle the missing values present in genetic data. At the same time, ComNN avoids problems such as kernel selection in SVM. The only design parameter required for training is the smoothing parameter, whose optimal value can be computed using a grid search method that is simple yet powerful enough to find the optimum.

References

Acuna, E. & Rodriguez, C. (2004), The treatment of missing values and its effect on the classifier accuracy. Classification, Clustering and Data Mining Applications, Springer-Verlag, Berlin-Heidelberg, 639-648.

Amir A. J., Yee C. J., Sotiriou C., Brantley K. R., Boyd J. & Liu E. T. (2002), Gene expression profiles of BRCA1-linked, BRCA2-linked, and sporadic ovarian cancers. Journal of the National Cancer Institute, vol. 94 (13).

Bhattacharjee A., Richards W. G., Staunton J., Li C., Monti S., Vasa P., Ladd C., Beheshti J., Bueno R., Gillette M., Loda M., Weber G., Mark E. F., Lander E. S., Wong W., Johnson B. E., Golub T. R., Sugarbaker D. J. & Meyerson M. (2001), Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl. Acad. Sci. USA, 98:13790-13795.

Brown M. P. S., Grundy W. N., Lin D., Cristianini N., Sugnet C., Furey T. S., Ares M. & Haussler D. (1997), Knowledge-based analysis of microarray gene expression data using support vector machines. Proc. Natl. Acad. Sci., 262-267.

Burges C. J. C. (1998), A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, vol. 2 (2), 121-167.

Byvatov E. & Schneider G. (2003), Support vector machine applications in bioinformatics. Applied Bioinformatics, vol. 2, 67-77.

Cristianini N. & Shawe-Taylor J. (2000), An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press.


Evgeniou T., Pontil M. & Poggio T. (2000), Regularization networks and support vector machines. Advances in Computational Mathematics, 13:1-50.

Golub T. R., Slonim D. K., Tamayo P., Huard C., Gaasenbeek M., Mesirov J. P., Coller H., Loh M. L., Downing J. R., Caligiuri M. A., Bloomfield C. D. & Lander E. S. (1999), Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286(5439):531-537.

Gunn S. (1998), Support vector machines for classification and regression. ISIS Technical Report, Image Speech and Intelligent Systems Group, University of Southampton.

Gustavo B. & Monard C. M. (2003), An analysis of four missing data treatment methods for supervised learning. Applied Artificial Intelligence, 17(5-6):519-533.

Haykin S. (1999), Neural Networks. Prentice Hall.

Hedenfalk I., Duggan D., Chen Y., Radmacher M., Bittner M., Simon R., Meltzer P., Gusterson B., Esteller M., Kallioniemi O. P., Wilfond B., Borg A. & Trent J. (2001), Gene-expression profiles in hereditary breast cancer. N. Engl. J. Med., 344(8):539-548.

Hellem T., Dysvik B. & Jonassen I. (2004), LSimpute: accurate estimation of missing values in microarray data with least squares methods. Nucleic Acids Research, vol. 32(3), Oxford University Press.

Khan J., Wei J. S., Ringner M., Saal L. H., Ladanyi M., Westermann F., Berthold F., Schwab M., Antonescu C. R., Peterson C. & Meltzer P. S. (2001), Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med, 7(6):673-679.

Mukherjee S., Tamayo P., Slonim D., Verri A., Golub T., Mesirov J. P. & Poggio T. (2000), Support vector machine classification of microarray data. Technical Report, Artificial Intelligence Laboratory, Massachusetts Institute of Technology.

Pomeroy S. L., Tamayo P., Gaasenbeek M., Sturla L. M., Angelo M., McLaughlin M. E., Kim J. Y., Goumnerova L. C., Black P. M., Lau C., Allen J. C., Zagzag D., Olson J., Curran T., Wetmore C., Biegel J. A., Poggio T., Mukherjee S., Rifkin R., Califano A., Stolovitzky G., Louis D. N., Mesirov J. P., Lander E. S. & Golub T. R. (2002), Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415(24):436-442.

Ramaswamy S., Tamayo P., Rifkin R., Mukherjee S., Yeang C. H., Angelo M., Ladd C., Reich M., Latulippe E., Mesirov J. P., Poggio T., Gerald W., Loda M., Lander E. S. & Golub T. R. (2001), Multiclass cancer diagnosis using tumour gene expression signatures. Proc. Natl. Acad. Sci. USA, 98(26):15149-15154.

Shipp M. A., Ross K. N., Tamayo P., Weng A. P., Kutok J. L., Aguiar R. C., Gaasenbeek M., Angelo M., Reich M., Pinkus G. S., Ray T. S., Koval M. A., Last K. W., Norton A., Lister T. A., Mesirov J., Neuberg D. S., Lander E. S., Aster J. C. & Golub T. R. (2002), Diffuse large B-cell lymphoma outcome prediction by gene expression profiling and supervised machine learning. Nat Med, 8(1):68-74.

Specht D. F. (1991), A generalized regression neural network. IEEE Trans. on Neural Networks, 568-576.

Timmerman D., Verrelst H., Bourne T. H., De Moor B., Collins W. P., Vergote I. & Vandewalle J. (1999), Artificial neural network models for the preoperative discrimination between malignant and benign adnexal masses. Ultrasound Obstet Gynecol, 13:17-25.

Toure A. & Basu M. (2001), Application of neural network to gene expression data for cancer classification. IEEE.


Vapnik V. N. (1995), The Nature of Statistical Learning Theory. John Wiley & Sons.

Vapnik V. N. (1998), Statistical Learning Theory. John Wiley & Sons.
