
2017 3rd International Conference on Science and Technology - Computer (ICST)

The Application of Wavelet Recurrent Neural Network for Lung Cancer Classification

Devi Nurtiyasari, Dedi Rosadi, and Abdurakhman

Department of Mathematics, Universitas Gadjah Mada, Yogyakarta, Indonesia

Abstract-Lung cancer is one of the deadliest types of cancer in the world. Detecting lung cancer is necessary to decide the next steps in treating a patient. One method for lung cancer detection is classification based on lung images. Most models for lung cancer classification from lung images are neural network variants that use binarization as image pre-processing. Because images contain noise, the noise should be removed from the original image before binarization. Wavelets can be used to remove this noise, a process called image denoising. The Recurrent Neural Network (RNN) extends the neural network model by feeding the network output back into the network as input. The RNN architecture used here is the Elman network, which has a feedback link from the hidden layer to the input layer. The combination of wavelets and the RNN is called the Wavelet Recurrent Neural Network (WRNN). It can be used for lung cancer classification by applying wavelets for lung image denoising and the RNN for classification. For lung cancer classification, the WRNN achieves sensitivity, specificity, and accuracy of 93.75%, 66.67%, and 84%, respectively, on the training data and 88.24%, 75%, and 84% on the testing data.

Keywords-wavelet neural network, image processing

I. INTRODUCTION

Lung cancer is one of the most common types of cancer in the world. Data from the World Health Organization (WHO) [1] show that lung cancer has the highest mortality rate of all cancers. Of 8,201,030 cancer deaths (both sexes, all ages), the leading causes were:

- lung cancer, with 1,589,000 deaths (19.4%);
- liver cancer, with 745,517 deaths (9.1%);
- stomach cancer, with 723,027 deaths (8.8%);
- colorectal cancer, with 693,881 deaths (8.5%);
- breast cancer, with 521,817 deaths (6.4%).

An early diagnostic test for lung cancer is a lung radiology test (lung image) called the chest X-ray test. A lung image shows different results for a normal lung and an abnormal lung. Nodules in the lung image indicate that the lung is not normal. However, these nodules are not always lung cancer, because other diseases such as pneumonia or tuberculosis can also cause them [2]. Nodules detected in the lung fall into two categories: non-cancerous nodules (benign) and cancerous nodules (malignant).

Early detection may increase the survival rate of patients significantly. Most models for lung cancer classification from lung images are neural network variants that use binarization as image pre-processing. Previous research on lung cancer classification with a neural network model reached an accuracy of 80% [3]. Another study using a recurrent neural network model reached 81.33% [4]. These studies did not take image characteristics into account. Because images contain noise, the noise should be removed or reduced before binarization. Wavelet-based methods have been developed for image denoising, and they can remove or reduce noise based on the characteristics of the image. The Recurrent Neural Network extends the neural network model with a feedback link inside the network. The model that combines wavelets and the Recurrent Neural Network is called the Wavelet Recurrent Neural Network. It can be used for lung cancer classification by applying wavelets for lung image denoising and the Recurrent Neural Network for classification. This combined model is expected to increase classification accuracy.

II. WAVELET RECURRENT NEURAL NETWORK (WRNN)

978-1-5386-1874-5/17/$31.00 ©2017 IEEE

A. Image Noise and Image Denoising

An image is a matrix of pixels arranged in rows and columns. Each pixel has a row and column location and an intensity, written $p(x, y)$, where $x$ is the row location and $y$ is the column location of the pixel. Based on intensity, images fall into three types:

- RGB images ($0 \le p(x, y) \le 255$, 3 channels);
- grayscale images ($0 \le p(x, y) \le 255$, 1 channel);
- binary images (0 for black and 1 for white).

Noise is not part of the image itself. One noise model is additive noise, which can be formulated as follows [5]:

$$p(x, y) = p_a(x, y) + p_n(x, y) \qquad (1)$$

where $p(x, y)$ is the pixel of the original image, $p_a(x, y)$ is the pixel of the image without noise, and $p_n(x, y)$ is the pixel of the image noise. Image denoising is the problem of recovering $p_a(x, y)$ from the original image $p(x, y)$, which still contains the noise $p_n(x, y)$.
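The additive model (1) also shows why noise matters before binarization. The NumPy sketch below adds noise to a clean image and counts how many binary pixels change. Its assumptions are not from the paper: the noise is zero-mean Gaussian, the clean image is a synthetic two-tone 64×64 array, and the binarization threshold is 128. The function names `add_noise` and `binarize` are hypothetical.

```python
import numpy as np


def add_noise(clean, sigma, seed=0):
    """Additive noise model (1): p = p_a + p_n, with Gaussian p_n (assumed)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=np.shape(clean))
    return np.asarray(clean, dtype=float) + noise, noise


def binarize(gray, threshold=128):
    """Binary image: 1 (white) where intensity >= threshold, else 0 (black)."""
    return (np.asarray(gray) >= threshold).astype(np.uint8)


# hypothetical clean 64x64 grayscale image: dark left half, bright right half
p_a = np.zeros((64, 64))
p_a[:, :32], p_a[:, 32:] = 50, 200

p, p_n = add_noise(p_a, sigma=40)  # observed image and the noise component
flipped = int((binarize(p) != binarize(p_a)).sum())
print(f"pixels whose binary value changed because of noise: {flipped}")
```

The clean image binarizes perfectly. The noisy one has scattered wrong pixels, and these would then reach the feature extraction step.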

B. Image Extraction

Image extraction is a technique for collecting features from an image. The Gray Level Co-occurrence Matrix (GLCM) method is one way to perform image extraction. It yields 14 features [6]:

- energy;
- contrast;
- correlation;
- sum of squares variance;
- Inverse Difference Moment (IDM);
- sum average;
- sum entropy;
- sum variance;
- entropy;
- difference variance;
- difference entropy;
- maximum probability;
- homogeneity;
- dissimilarity.

All of the feature extraction formulas are shown in Table I. In these formulas, $p(x, y)$ is not a pixel intensity as in Section A. It is the $(x, y)$ entry of the normalized co-occurrence matrix, that is, the probability that gray levels $x$ and $y$occur in neighboring pixels. The other symbols are defined as follows:

- $\mu_x, \mu_y$ and $\sigma_x, \sigma_y$ are the means and standard deviations of the marginal distributions of $p$;
- $\mu$ is the mean of $p$;
- $p_{x+y}(k)$ is the probability that the two gray levels sum to $k$;
- $p_{x-y}(k)$ is the probability that the two gray levels differ by $k$ in absolute value.

TABLE I. IMAGE FEATURES EXTRACTION

Feature | Formula
Energy | $\sum_x \sum_y p(x,y)^2$
Contrast | $\sum_x \sum_y p(x,y)\,(x-y)^2$
Correlation | $\dfrac{\sum_x \sum_y xy\,p(x,y) - \mu_x \mu_y}{\sigma_x \sigma_y}$
Sum of squares variance | $\sum_x \sum_y p(x,y)\,(x-\mu)^2$
Inverse Difference Moment | $\sum_x \sum_y \dfrac{p(x,y)}{1+(x-y)^2}$
Sum average | $\sum_k k\,p_{x+y}(k)$
Sum variance | $\sum_k \left(k - \text{sum entropy}\right)^2 p_{x+y}(k)$
Sum entropy | $-\sum_k p_{x+y}(k) \log p_{x+y}(k)$
Entropy | $-\sum_x \sum_y p(x,y) \log p(x,y)$
Difference variance | $\operatorname{var}\left(p_{x-y}(k)\right)$
Difference entropy | $-\sum_k p_{x-y}(k) \log p_{x-y}(k)$
Maximum probability | $\max_{x,y} p(x,y)$
Homogeneity | $\sum_x \sum_y \dfrac{p(x,y)}{1+|x-y|}$
Dissimilarity | $\sum_x \sum_y p(x,y)\,|x-y|$

C. Wavelet

A wavelet is a small wave whose energy is concentrated in time. This makes it a tool for analyzing transient, non-stationary, or time-varying phenomena [7]. Wavelets can concentrate image energy in a few coefficients. The coefficients then fall into two categories: coefficients with much energy and coefficients with little energy. The coefficients with little energy can be removed, because they carry only insignificant information. A wavelet function $\psi_{j,k}(x)$ must satisfy the following properties [8]:

1) The integral of $\psi_{j,k}(x)$ over $(-\infty, \infty)$ is 0:

$$\int_{-\infty}^{\infty} \psi_{j,k}(x)\,dx = 0 \qquad (2)$$

2) The square of $\psi_{j,k}(x)$ integrates over $(-\infty, \infty)$ to unity:

$$\int_{-\infty}^{\infty} \psi_{j,k}^{2}(x)\,dx = 1 \qquad (3)$$

The basic principle of the wavelet transform is to divide a signal according to its frequency (the decomposition process). Wavelets form a family of translated and dilated functions generated from a single function $\psi(t)$, called the mother wavelet [9]:

$$\psi_{j,k}(t) = 2^{j/2}\,\psi\!\left(2^{j} t - k\right), \qquad j, k \in \mathbb{Z},\ j \ge 0 \qquad (4)$$

where $j$ is the frequency (scale) parameter, which measures the degree of compression, and $k$ is the translation parameter, which sets the time location of the wavelet. The Haar wavelet is the simplest wavelet and serves as the basis for other wavelet types. Image denoising uses the 2D wavelet transform. The denoising algorithm used is the Tree-Adapted Wavelet Shrinkage (TAWS) algorithm, a simple but highly effective wavelet-based image denoising algorithm [9]. The Haar wavelet transform of an image can be calculated when the image has an even number of rows and columns. Consider an image $f$ with $M$ rows and $N$ columns:

$$f = \begin{pmatrix} f_{1,M} & f_{2,M} & \cdots & f_{N,M} \\ \vdots & \vdots & \ddots & \vdots \\ f_{1,2} & f_{2,2} & \cdots & f_{N,2} \\ f_{1,1} & f_{2,1} & \cdots & f_{N,1} \end{pmatrix} \qquad (5)$$

The first level of the Haar wavelet transform starts by computing the 1D Haar transform of each row of $f$, which produces a new image. The same 1D transform is then computed on each column of this new image. The first-level Haar transform of an image $f$ can be written as:

$$f \mapsto \left(\begin{array}{c|c} h^1 & d^1 \\ \hline a^1 & v^1 \end{array}\right) \qquad (6)$$

where the sub-images $h^1$, $d^1$, $a^1$, and $v^1$ each have $M/2$ rows and $N/2$ columns. They are created as follows:

- $a^1$: trends along the rows of $f$, followed by trends along the columns.
- $d^1$: fluctuations along both the rows and the columns.
- $h^1$: trends along the rows of $f$, followed by fluctuations along the columns.
- $v^1$: the reverse of $h^1$, that is, fluctuations along the rows followed by trends along the columns.

Then, as in the pyramid algorithm, the next level of the Haar transform is obtained by applying the first-level transform to the trend sub-image $a^1$:


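$$a^1 \mapsto \left(\begin{array}{c|c} h^2 & d^2 \\ \hline a^2 & v^2 \end{array}\right) \qquad (7)$$

$$f \mapsto \left(\begin{array}{c|c} h^1 & d^1 \\ \hline \begin{array}{c|c} h^2 & d^2 \\ \hline a^2 & v^2 \end{array} & v^1 \end{array}\right) \qquad (8)$$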
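The first-level transform (6) and the pyramid step (7)-(8) can be sketched with NumPy. This sketch uses Walker's normalization: trend $(a+b)/\sqrt{2}$ and fluctuation $(a-b)/\sqrt{2}$. It returns the sub-images as separate arrays rather than arranging them in the quadrant layout. Rows are indexed top-down here, so the signs of column fluctuations may differ from the bottom-up indexing in (5). The `hard_threshold_denoise` function is a simple stand-in that zeroes small detail coefficients. It is not the TAWS algorithm cited above, and its threshold is a free parameter. All function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def _split(x, axis):
    """1D Haar step along `axis`: trend (a+b)/sqrt2 and fluctuation (a-b)/sqrt2."""
    if x.shape[axis] % 2:
        raise ValueError("each level needs an even number of rows and columns")
    first = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    second = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (first + second) / SQRT2, (first - second) / SQRT2


def _merge(trend, fluct, axis):
    """Inverse of _split along `axis`: interleave the reconstructed pairs."""
    shape = list(trend.shape)
    shape[axis] *= 2
    out = np.empty(shape)
    idx = [slice(None)] * trend.ndim
    idx[axis] = slice(0, None, 2)
    out[tuple(idx)] = (trend + fluct) / SQRT2
    idx[axis] = slice(1, None, 2)
    out[tuple(idx)] = (trend - fluct) / SQRT2
    return out


def haar2d_level(f):
    """First-level 2D Haar transform: sub-images a, h, v, d of (6)."""
    row_trend, row_fluct = _split(np.asarray(f, dtype=float), axis=1)  # each row
    a, h = _split(row_trend, axis=0)  # trends / fluctuations along columns
    v, d = _split(row_fluct, axis=0)
    return a, h, v, d


def inverse_haar2d_level(a, h, v, d):
    return _merge(_merge(a, h, axis=0), _merge(v, d, axis=0), axis=1)


def haar2d(f, levels):
    """Pyramid algorithm: re-apply the first-level transform to the trend a^k."""
    a, details = np.asarray(f, dtype=float), []
    for _ in range(levels):
        a, h, v, d = haar2d_level(a)
        details.append((h, v, d))
    return a, details


def inverse_haar2d(a, details):
    for h, v, d in reversed(details):
        a = inverse_haar2d_level(a, h, v, d)
    return a


def hard_threshold_denoise(f, levels, threshold):
    """Zero detail coefficients below `threshold`, then invert (NOT TAWS)."""
    a, details = haar2d(f, levels)
    kept = [tuple(np.where(np.abs(c) >= threshold, c, 0.0) for c in lvl)
            for lvl in details]
    return inverse_haar2d(a, kept)
```

The transform is orthonormal, so it preserves the sum of squared pixel values and inverts exactly. Applying level 6 as in Section III requires both image dimensions to be divisible by $2^6 = 64$.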

D. Recurrent Neural Network (RNN)

An RNN is a neural network with a feedback link. It has four parts: the input layer, the hidden layer, the output layer, and the feedback link. Activation functions connect one layer to the next. The feedback link lets the network output be fed back into the network as input. In this paper, the researchers use the Elman network, an RNN whose feedback link runs from the hidden layer back to the input layer. The hidden-layer outputs are stored in context units and presented together with the next input. The RNN architecture with the Elman network is shown in Fig. 1.
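The Elman feedback link can be sketched as a forward pass in NumPy. Several assumptions here are not given in the paper:

- one layer of tanh hidden units and a sigmoid output unit;
- random, untrained weights (the paper's training procedure is not described here, so training is omitted);
- 14 inputs, to match the GLCM features.

The value 3 for `n_hidden` mirrors the "3 hidden layers" selected in Section III, read here as hidden units, which is also an assumption. The class and parameter names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ElmanNetwork:
    """Forward pass of an Elman network with context units (untrained sketch)."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.5, (n_hidden, n_inputs))    # input -> hidden
        self.W_ctx = rng.normal(0.0, 0.5, (n_hidden, n_hidden))   # context -> hidden
        self.W_out = rng.normal(0.0, 0.5, (n_outputs, n_hidden))  # hidden -> output
        self.b_h = np.zeros(n_hidden)
        self.b_o = np.zeros(n_outputs)
        self.context = np.zeros(n_hidden)

    def reset(self):
        """Clear the context units before presenting a new sequence."""
        self.context = np.zeros_like(self.context)

    def step(self, x):
        # the hidden layer sees the current input plus the previous hidden state
        h = np.tanh(self.W_in @ x + self.W_ctx @ self.context + self.b_h)
        self.context = h.copy()  # feedback link: hidden outputs become context
        return sigmoid(self.W_out @ h + self.b_o)


net = ElmanNetwork(n_inputs=14, n_hidden=3, n_outputs=1)
x = np.ones(14)   # placeholder feature vector, e.g. 14 GLCM features
y1 = net.step(x)
y2 = net.step(x)  # same input, different output because of the context units
```

Presenting the same input twice gives different outputs, which is the effect of the feedback link. Calling `reset()` restores the initial state.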

Fig. 1. RNN architecture (Elman network), from input layer to output layer.

E. Sensitivity, Specificity, and Accuracy

The possible outcomes of a diagnostic test are shown in Table II. There are four outcomes:

- true positive (a): a sick patient correctly identified as sick;
- false positive (b): a healthy patient incorrectly identified as sick;
- false negative (c): a sick patient incorrectly identified as healthy;
- true negative (d): a healthy patient correctly identified as healthy.

Sensitivity, specificity, and accuracy are computed as

$$\text{Sensitivity} = \frac{a}{a+c}, \qquad \text{Specificity} = \frac{d}{b+d}, \qquad \text{Accuracy} = \frac{a+d}{a+b+c+d}$$

TABLE II. DIAGNOSTIC TEST

Measure Performance | True Situation: Indicator Present | True Situation: Indicator Absent
Positive | True Positive (a) | False Positive (b)
Negative | False Negative (c) | True Negative (d)
Total | a+c | b+d

III. THE APPLICATION OF WRNN FOR LUNG CANCER CLASSIFICATION (RESULTS AND DISCUSSION)

This paper uses 100 lung images from the Japanese Society of Radiological Technology [10]: 35 normal, 33 benign, and 32 malignant lung images. They were divided into training data (75 lung images) and testing data (25 lung images). Applying the WRNN for lung cancer classification involved three steps:

1. Denoise the lung images with wavelets.
2. Extract features from the denoised images with the GLCM technique.
3. Use the GLCM features for classification with the RNN model.

A. Wavelet for Lung Image Denoising

The lung images denoised with Haar wavelet levels 1 to 10 are shown in Fig. 2. The denoised images become more blurred as the level increases. The researchers chose the image denoised at Haar wavelet level 6. The next step was extracting features from this denoised image with the GLCM technique. The 14 extracted features were then classified by the RNN model.

Fig. 2. (a) Normal lung image; (b) to (k) denoised lung image with Haar wavelet level 1 to level 10, respectively.
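The GLCM extraction step, covering the 14 features of Table I, can be sketched with NumPy. This is a minimal sketch, not the authors' implementation. It makes several assumptions the paper does not state:

- gray levels are quantized to a small number of levels;
- co-occurrences are counted for the horizontal neighbor (offset 0, 1) and made symmetric;
- natural logarithms are used;
- "difference variance" is computed as the variance of the distribution $p_{x-y}$;
- $\mu$ in the sum of squares variance is taken as $\mu_x$.

All function names are hypothetical.

```python
import numpy as np


def quantize(img, levels=8):
    """Map 0..255 intensities to integer gray levels 0..levels-1 (assumed binning)."""
    img = np.clip(np.asarray(img, dtype=float), 0, 255)
    return np.minimum((img * levels / 256).astype(int), levels - 1)


def glcm(img, levels):
    """Normalized, symmetric GLCM for the horizontal offset (0, 1)."""
    img = np.asarray(img, dtype=int)
    P = np.zeros((levels, levels))
    # count each (left, right) pair of horizontally adjacent gray levels
    np.add.at(P, (img[:, :-1].ravel(), img[:, 1:].ravel()), 1)
    P = P + P.T  # symmetric: count each pair in both directions
    return P / P.sum()


def _entropy(p):
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-(p * np.log(p)).sum())


def glcm_features(P):
    """The 14 features of Table I for a normalized GLCM P."""
    L = P.shape[0]
    x, y = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    mu_x, mu_y = (x * P).sum(), (y * P).sum()
    sd_x = np.sqrt(((x - mu_x) ** 2 * P).sum())
    sd_y = np.sqrt(((y - mu_y) ** 2 * P).sum())
    # p_{x+y}(k) for k = 0..2L-2, and p_{x-y}(k) = P(|x - y| = k) for k = 0..L-1
    p_sum = np.bincount((x + y).ravel(), weights=P.ravel(), minlength=2 * L - 1)
    p_diff = np.bincount(np.abs(x - y).ravel(), weights=P.ravel(), minlength=L)
    k_sum, k_diff = np.arange(2 * L - 1), np.arange(L)
    sum_entropy = _entropy(p_sum)
    diff_mean = (k_diff * p_diff).sum()
    return {
        "energy": (P ** 2).sum(),
        "contrast": (P * (x - y) ** 2).sum(),
        "correlation": ((x * y * P).sum() - mu_x * mu_y) / (sd_x * sd_y),
        "sum_of_squares_variance": (P * (x - mu_x) ** 2).sum(),
        "inverse_difference_moment": (P / (1 + (x - y) ** 2)).sum(),
        "sum_average": (k_sum * p_sum).sum(),
        "sum_variance": ((k_sum - sum_entropy) ** 2 * p_sum).sum(),
        "sum_entropy": sum_entropy,
        "entropy": _entropy(P.ravel()),
        "difference_variance": ((k_diff - diff_mean) ** 2 * p_diff).sum(),
        "difference_entropy": _entropy(p_diff),
        "maximum_probability": P.max(),
        "homogeneity": (P / (1 + np.abs(x - y))).sum(),
        "dissimilarity": (P * np.abs(x - y)).sum(),
    }


# Haralick's classic 4x4 example image with 4 gray levels
example = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]])
features = glcm_features(glcm(example, levels=4))
print(round(features["contrast"], 4))  # 0.5833
```

For a denoised lung image, the pipeline would be `glcm_features(glcm(quantize(image), levels))`. The 14 dictionary values would then form the classifier's input vector. The offset and number of gray levels are design choices that change the feature values.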

B. Wavelet Recurrent Neural Network for Lung Cancer Classification

The first step of the RNN (Elman network) classification was choosing how many hidden layers to use for lung image classification. The MSE of the RNN learning on the training data and the testing data is shown in Table III.


TABLE III. MSE HIDDEN LAYER

Hidden Layers | MSE Training | MSE Testing
1 | 0.46722 | 0.64285
2 | 0.41538 | 0.51201
3 | 0.38206*) | 0.45553*)
4 | 0.34526 | 0.58408
5 | 0.29248 | 0.69655

*) selected configuration
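Writing out the selection rule with the Table III values makes it explicit. The minimum testing MSE occurs at 3 hidden layers, while the training MSE keeps decreasing up to 5. The minimal Python check below uses only the values transcribed from the table.

```python
# MSE values transcribed from Table III
hidden = [1, 2, 3, 4, 5]
mse_train = [0.46722, 0.41538, 0.38206, 0.34526, 0.29248]
mse_test = [0.64285, 0.51201, 0.45553, 0.58408, 0.69655]

# min over (mse, n_hidden) pairs picks the configuration with the lowest MSE
best_by_test = min(zip(mse_test, hidden))[1]
best_by_train = min(zip(mse_train, hidden))[1]
print(best_by_test, best_by_train)  # 3 5
```

Choosing by testing error favors the configuration that generalized best among those tried. Training error alone would pick the largest network.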

As shown in Table III, the RNN model with 3 hidden layers gave the smallest MSE on the testing data (0.45553). Its training MSE (0.38206) is not the smallest, because training MSE keeps decreasing as hidden layers are added. The RNN with 3 hidden layers was therefore used for the next step, input elimination.

TABLE IV. MSE FOR INPUT ELIMINATION

All models in Table IV use 3 hidden layers.

No. | Number of Inputs | MSE Training | MSE Testing
1 | 14 features | 0.38206*) | 0.45553*)
2 | 13 features | 0.4666 | 0.65983
3 | 13 features | 0.49975 | 0.56336
4 | 13 features | 0.46274 | 0.66101

As shown in Table IV, the RNN model with 3 hidden layers and all 14 extracted features as input variables gave the smallest MSE on both the training data and the testing data. This configuration was therefore used for the final model. The last step was measuring the sensitivity, specificity, and accuracy of the model to show how effective the WRNN model is for lung cancer classification.

TABLE V. RESULTS

Data | Sensitivity | Specificity | Accuracy
Training Data | 93.75% | 66.67% | 84%
Testing Data | 88.24% | 75% | 84%

As shown in Table V, the WRNN model for lung cancer classification achieved the following results:

- training data: sensitivity 93.75%, specificity 66.67%, and accuracy 84%;
- testing data: sensitivity 88.24%, specificity 75%, and accuracy 84%.

The accuracy of the WRNN model was higher than that reported by previous research using an NN [3] (80%) and an RNN [4] (81.33%).

IV. CONCLUSION

The procedure for lung cancer classification using the Wavelet Recurrent Neural Network consists of four steps:

1. Define the input data (training data and testing data).
2. Denoise the lung images using wavelets.
3. Extract features from the denoised lung images using the GLCM technique.
4. Determine the classification using the RNN.

This paper used the level-6 Haar wavelet for image denoising and the Elman network for classification with the Recurrent Neural Network. The best WRNN model for lung cancer classification used 3 hidden layers and the 14 extracted features as input variables. It achieved sensitivity, specificity, and accuracy of 93.75%, 66.67%, and 84%, respectively, on the training data and 88.24%, 75%, and 84% on the testing data.

This research can be extended in several ways:

- other wavelet families for image denoising, such as Daubechies, Symlet, or Coiflet;
- other image feature extraction methods;
- other recurrent networks, such as the Hopfield network.

REFERENCES

[1] International Agency for Research on Cancer, World Cancer Report 2014, edited by B. W. Stewart and C. P. Wild (World Health Organization, Lyon, 2014), pp. 10-19.

[2] K. A. G. Udhesani, R. G. N. Meegama, and T. G. I. Fernando, International Journal of Image Processing (IJIP), 5, pp. 2.

[3] M. Miah and M. A. Yousuf, "Detection of Lung Cancer from CT Image Using Image Processing and Neural Network," 2nd Int'l Conf. on Electrical Engineering and Information & Communication Technology.

[4] D. Nurtiyasari, "The Application of Recurrent Neural Network Model for Lung Cancer Classification," unpublished.

[5] A. Bovik, Handbook of Image & Video Processing (Academic Press, Austin, 2000), pp. 117.

[6] R. M. Haralick, K. Shanmugam, and I. Dinstein, IEEE Transactions on Systems, Man, and Cybernetics, 3, 610 (1973).

[7] C. S. Burrus et al., Introduction to Wavelets and Wavelet Transforms (Prentice Hall, Upper Saddle River, 1998), pp. 1-3.

[8] D. B. Percival and A. T. Walden, Wavelet Methods for Time Series Analysis (Cambridge University Press, London, 2000), pp. 2.

[9] J. S. Walker, A Primer on Wavelets and their Scientific Applications (Chapman & Hall/CRC, Boca Raton, 2008), pp. 99, 133-134.

[10] Japanese Society of Radiological Technology, Digital Image Database (1997).