STACKED SPARSE AUTOENCODER (SSAE) BASED FRAMEWORK FOR NUCLEI PATCH CLASSIFICATION ON BREAST CANCER HISTOPATHOLOGY

Jun Xu1, Lei Xiang1, Renlong Hang1, Jianzhong Wu2

1 Nanjing University of Information Science and Technology, Nanjing 210044, China. 2 Jiangsu Cancer Hospital, Nanjing 210000, China.

ABSTRACT

In this paper, a Stacked Sparse Autoencoder (SSAE) based framework is presented for nuclei classification on breast cancer histopathology. SSAE works well in learning useful high-level features for a better representation of the input raw data. To show the effectiveness of the proposed framework, SSAE+Softmax is compared with the conventional Softmax classifier, PCA+Softmax, and single-layer Sparse Autoencoder (SAE)+Softmax in classifying nuclei and non-nuclei patches extracted from breast cancer histopathology. SSAE+Softmax yields an accuracy of 83.7%, an F1 score of 82%, and an AUC of 0.8992, outperforming the Softmax classifier, PCA+Softmax, and SAE+Softmax.

Index Terms— Deep learning, Sparse Autoencoder, Breast Cancer Histopathology

1. INTRODUCTION

With the recent advent and cost-effectiveness of whole-slide digital scanners, tissue histopathology slides can now be digitized and stored in digital image form. Digital pathology makes computerized quantitative analysis of histopathology imagery possible. In the context of breast cancer (BC), the size, arrangement, and morphology of nuclei in breast histopathology are important biomarkers for predicting patient outcome [1]. However, the manual detection of BC nuclei in histopathology is a tedious and time-consuming process that is unfeasible in the clinical setting. It is therefore important to develop efficient methods for automatically detecting BC nuclei. Previous approaches to nuclei or cell segmentation, including region growing, thresholding, clustering, level sets [2], supervised color-texture based methods, and watershed based methods, are not very robust to the highly variable shapes and sizes of BC nuclei, nor to artifacts introduced by the histological fixing, staining, and digitization processes. In [1], we presented a semi-automated nuclear detection scheme based on the Expectation Maximization (EM) algorithm. These previous works in segmentation or classification of nuclei are mostly based on supervised learning.

For histopathological images, it is usually expensive to obtain enough labeled data for training. On the other hand, with the rapid development of digital pathology technology, it is easy to acquire large amounts of unlabeled data. Moreover, histopathological images are generally of high resolution. The performance of current supervised discriminative models could be greatly improved by an efficient way to exploit such large amounts of unlabeled, highly structured data. One solution to this problem is to learn a good feature representation that captures much of the structure in the unlabeled input data; the discriminative model then works in this new feature space for the subsequent classification of the desired objects. Recently, significant progress has been made on learning representations of images from the pixel (low) level in order to identify high-level features in images. These high-level features are often learned as hierarchical representations from large amounts of unlabeled data. Deep learning is such a hierarchical learning approach: it learns, from raw pixel intensities, high-level features that are sufficiently useful for a classifier to differentiate objects. Deep learning has shown great accomplishments in vision and learning since the first deep autoencoder network was proposed by Hinton et al. [3], and it has attracted great attention from researchers in both industry and academia. Recently, a deep max-pooling convolutional neural network was presented for detecting mitosis in breast histology images [4]; similar work from the same team won the ICPR 2012 mitosis detection competition.

Inspired by these works, in this paper we present a Stacked Sparse Autoencoder (SSAE) framework for nuclei classification on breast histopathology.
2. METHOD

An autoencoder is an unsupervised feature learning algorithm which aims to develop a better feature representation of high-dimensional input data by finding correlations within the data. Basically, an autoencoder is simply a multi-layer feedforward neural network trained with back-propagation to reproduce its input. By applying back-propagation, the autoencoder tries to decrease as much as possible the discrepancy
This work is supported by the National Science Foundation of China (No. 61273259) and the Six Major Talents Summit of Jiangsu Province (No. 2013XXRJ-019). Email: [email protected].
978-1-4673-1961-4/14/$31.00 ©2014 IEEE
between the input and its reconstruction by learning an encoder and a decoder (see Figure 1), which yields a set of weights W and biases b.

Fig. 1: The architecture of the basic Sparse Autoencoder (SAE) for nuclei classification.

2.1. The Basic Sparse Autoencoder [5]

Let X = (x(1), x(2), ..., x(N))^T be the entire set of training (unlabeled) patches, where x(k) ∈ R^{d_x}; N and d_x are the number of training patches and the number of pixels in each patch, respectively. h^{(l)}(k) = (h_1^{(l)}(k), h_2^{(l)}(k), ..., h_{d_h}^{(l)}(k))^T denotes the learned high-level feature at layer l for the k-th patch, where d_h is the number of hidden units in layer l. Throughout this paper, the superscript and subscript on a symbol denote the hidden layer and the unit within that layer, respectively; for instance, in Figure 1, h_i^{(1)} represents the i-th unit in the first hidden layer. For simplicity, we denote by x and h^{(l)} an input patch and its representation at hidden layer l, respectively.

The architecture of the basic Sparse Autoencoder (SAE) is shown in Figure 1. In general, the input layer of the autoencoder feeds an encoder which transforms the input x into the corresponding representation h, and the hidden layer h can be seen as a new feature representation of the input data. The output layer is effectively a decoder trained to reconstruct an approximation x̂ of the input from the hidden representation h. Training an autoencoder amounts to finding the optimal parameters by minimizing the discrepancy between the input x and its reconstruction x̂, which is described by a cost function. The cost function of a Sparse Autoencoder comprises three terms [6]:

$$L_{SAE}(\theta) = \frac{1}{N}\sum_{k=1}^{N} L\big(x(k),\, d_{\hat{\theta}}(e_{\check{\theta}}(x(k)))\big) + \alpha \sum_{j=1}^{n} KL(\rho \,\|\, \hat{\rho}_j) + \beta \|W\|_2^2 \qquad (1)$$

The first term is an average sum-of-squares error term which describes the discrepancy between the input x(k) and the reconstruction x̂(k) over the entire data set. The encoder e_θ̌(·) maps the input x ∈ R^{d_x} to the hidden representation h ∈ R^{d_h}, and is defined by h = e_θ̌(x) = s(Wx + b_h), where W is a d_h × d_x weight matrix and b_h ∈ R^{d_h} is a bias vector; the encoder is parameterized by θ̌ = (W, b_h). The decoder d_θ̂(·) maps the resulting hidden representation h back into the input space: x̂ = d_θ̂(h) = s(W^T h + b_x), where W^T is a d_x × d_h weight matrix and b_x ∈ R^{d_x} is a bias vector. Here s(·) is the activation function, defined as the sigmoid logistic function s(z) = 1/(1 + exp(−z)), where z is the pre-activation of a neuron. The decoder is thus parameterized by θ̂ = (W^T, b_x). Since the weight matrix of the reverse mapping is the transpose of W, the autoencoder is said to have tied weights, which effectively halves the number of weight parameters. The pre-activation of the output layer of the autoencoder can therefore be written in terms of the three parameters θ = (W, b_h, b_x) as y = W^T s(Wx + b_h) + b_x, and the reconstruction by the decoder is x̂ = s(y). Training the autoencoder finds the optimal parameters θ = (W, b_h, b_x) simultaneously by minimizing the reconstruction error described in the first term, where the cost L(·,·) measures the discrepancy between the input x and the reconstruction x̂.

In the second term, n is the number of units in the hidden layer and the index j sums over the hidden units in the network. KL(ρ ∥ ρ̂_j) is the Kullback-Leibler (KL) divergence between ρ̂_j, the average activation (averaged over the training set) of hidden unit j, and the desired activation ρ, defined as KL(ρ ∥ ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j)).

The third term is a weight decay term which tends to decrease the magnitude of the weights and helps prevent overfitting. Here

$$\|W\|_2^2 = \sum_{l=1}^{n_l} \sum_{i=1}^{s_{l-1}} \sum_{j=1}^{s_l} \big(w_{i,j}^{(l)}\big)^2 = \mathrm{tr}(WW^T) \qquad (2)$$

where n_l is the number of layers and s_l is the number of neurons in layer l; w_{i,j}^{(l)} represents the connection between the i-th neuron in layer l − 1 and the j-th neuron in layer l. For the SAE studied in this paper, as Figure 1 shows, the parameters are n_l = 2, s_1 = 1156, and s_2 = 500.

2.2. Stacked Sparse Autoencoder (SSAE) [5]

A stacked autoencoder is a neural network consisting of multiple layers of basic SAEs, in which the outputs of each layer are wired to the inputs of the successive layer. In this paper, we construct a two-layer SSAE consisting of two basic SAEs; its architecture is shown in Figure 2. For simplicity, the decoder part of each basic SAE is not shown in the figure.
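As a concrete illustration, the cost of Eq. (1) with the tied-weight encoder/decoder of Sec. 2.1 can be sketched in NumPy. This is a minimal sketch rather than the MATLAB implementation used in this paper, and the hyper-parameter defaults (alpha, beta, rho) are illustrative placeholders, not the values used in our experiments:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sae_cost(W, b_h, b_x, X, alpha=0.1, beta=3e-3, rho=0.05):
    """Sparse Autoencoder cost of Eq. (1) with tied weights.

    X   : (N, d_x) matrix of N vectorized patches
    W   : (d_h, d_x) encoder weight matrix (decoder uses W.T)
    b_h : (d_h,) hidden bias;  b_x : (d_x,) output bias
    """
    N = X.shape[0]
    H = sigmoid(X @ W.T + b_h)          # encoder: (N, d_h) hidden activations
    X_hat = sigmoid(H @ W + b_x)        # decoder with tied weights W^T

    # first term: average reconstruction error, L(x, x_hat) = 0.5 * ||x - x_hat||^2
    recon = np.sum((X - X_hat) ** 2) / (2.0 * N)

    # second term: KL sparsity penalty on the average hidden activations
    rho_hat = H.mean(axis=0)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

    # third term: weight decay ||W||_2^2 of Eq. (2)
    decay = np.sum(W ** 2)
    return recon + alpha * kl + beta * decay
```

Minimizing this cost with back-propagation (e.g. gradient descent or L-BFGS) yields the parameters θ = (W, b_h, b_x).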
Fig. 2: The architecture of the Stacked Sparse Autoencoder (SSAE) and Softmax classifier for nuclei classification.

The SSAE yields a function f : R^{d_x} → R^{d_{h(2)}} that transforms the raw pixels of an input patch into a new feature representation h^{(2)} = f(x) ∈ R^{d_{h(2)}}. In the first (input) layer, the input is the raw pixel intensity of a square patch, represented as a column vector of size 1156 × 1; there are d_x = 34 × 34 = 1156 input units in the input layer. The first and second hidden layers have d_{h(1)} = 500 and d_{h(2)} = 100 hidden units, respectively.

3. DATA SET AND EXPERIMENTAL DESIGN

A total of 37 H&E stained breast histopathology images were collected from a cohort of 17 patients and scanned into a computer using a high-resolution whole-slide scanner at 20x optical magnification.

3.1. The Generation of Training and Testing Sets

We extract two classes of square patches from the histopathology images: nuclei and non-nuclei patches. The size of each patch is 34 × 34. Each nuclei patch contains one nucleus at its center; non-nuclei patches contain no nuclei or only parts of nuclei. For the training set, 14421 nuclei and 28032 non-nuclei patches were extracted from the 37 H&E stained breast histopathology images. To show the effectiveness of our method, we chose challenging patches for testing: the testing set contains 2000 nuclei and 2000 non-nuclei patches.

3.2. Training the SSAE

The architecture of the SSAE is shown in Figure 2. We employ the greedy layer-wise approach for pretraining the SSAE, training each layer in turn. After pre-training, the trained SSAE is employed to classify the nuclei and non-nuclei patches in the testing set.
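The greedy layer-wise pretraining can be wired up as in the following sketch. Here `train_sae` is a hypothetical stand-in for an optimizer of Eq. (1) (e.g. back-propagation or L-BFGS); it is included only to make explicit the data flow between the two SAEs and the final classifier, and is not the paper's actual training routine:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sae(X, d_h):
    """Placeholder for minimizing Eq. (1) on data X; returns encoder params.
    A real implementation would run back-propagation / L-BFGS."""
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((d_h, X.shape[1]))
    b = np.zeros(d_h)
    return W, b

def pretrain_ssae(X, d_h1=500, d_h2=100):
    # Step 1: train the first SAE on raw pixels -> primary features h^(1)
    W1, b1 = train_sae(X, d_h1)
    H1 = sigmoid(X @ W1.T + b1)
    # Step 2: treat h^(1) as "raw input" of a second SAE -> secondary features h^(2)
    W2, b2 = train_sae(H1, d_h2)
    H2 = sigmoid(H1 @ W2.T + b2)
    # Step 3: H2 becomes the "raw input" of the softmax (or SVM) classifier
    return (W1, b1), (W2, b2), H2

X = np.random.rand(10, 1156)   # 10 vectorized 34x34 patches
_, _, H2 = pretrain_ssae(X)
print(H2.shape)                # (10, 100)
```

Stacking the two trained encoders and a softmax layer then gives the full 1156 → 500 → 100 → label pipeline of Figure 2.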
First, an SAE is trained on the raw inputs x to learn the primary features h^{(1)}(x) by adjusting the weights W^{(1)}. Next, the raw input is fed into this trained sparse autoencoder to obtain the primary feature activations h^{(1)}(x) for each input patch x. These primary features are used as the "raw input" to another sparse autoencoder, which learns secondary features h^{(2)}(x) on these primary features. Following this, the primary features are fed into the second SAE to obtain the secondary feature activations h^{(2)}(x) for each h^{(1)}(x) (which corresponds to the primary features of the respective input patch x). These secondary features are treated as the "raw input" to a softmax or SVM classifier, which is trained to map them to class labels. Finally, all three layers are combined to form an SSAE with two hidden layers and a final softmax or SVM classifier layer capable of classifying nuclei and non-nuclei patches as desired.

3.3. Experiments and Performance Evaluation

There are two aims in designing the experiments. The first is to show that a high-level feature representation learned by the SSAE is much more useful to a classifier than raw pixel intensities. We compare SSAE with raw pixel-based classification, in which a softmax classifier is applied directly to the raw input pixel intensities: the classifier is learned from the training set to classify candidate patches into nuclei and non-nuclei patches using raw pixel intensities. In the SSAE-based framework, the features learned by the SSAE are treated as the "raw input" to the Softmax classifier; we use SSAE+Softmax to denote this framework. The second aim is to show the efficiency of the SSAE in learning high-level features from raw pixels. Principal Component Analysis (PCA) is one of the most commonly used unsupervised learning algorithms for identifying a low-dimensional subspace of maximal variation within unlabeled data.
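For the PCA baseline, projecting vectorized patches onto the top principal components might look like the following minimal sketch (not the paper's implementation; the sample counts and the choice of 100 components are illustrative):

```python
import numpy as np

def pca_features(X, k):
    """Project patches X (N, d_x) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                        # center the data
    # rows of Vt are principal directions (SVD of the centered data matrix)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # (N, k) low-dimensional features

X = np.random.rand(200, 1156)                      # 200 vectorized 34x34 patches
Z = pca_features(X, k=100)
print(Z.shape)                                     # (200, 100)
```

The resulting k-dimensional features play the same role as the SSAE's h^{(2)} when fed to the Softmax classifier.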
We compare SSAE with PCA in representing high-level features from the raw pixels. To attain these aims, SSAE+Softmax is compared against three methods on the classification of nuclei versus non-nuclei patches: (1) Softmax (SM): the Softmax classifier applied to raw pixels; (2) PCA+Softmax: features learned by PCA from the raw data are treated as the "raw input" to the Softmax classifier; (3) SAE+Softmax: high-level features learned by a single SAE from the raw data are treated as the "raw input" to the Softmax classifier.

The classification results of the different methods were evaluated in terms of Precision, Recall, F1 Score, and Accuracy, defined respectively as Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1 Score = 2 × (Precision × Recall) / (Precision + Recall), and Accuracy = (TP + TN) / (TP + TN + FP + FN). Here TP is the number of nuclei patches that were correctly classified according to the ground truth; FP, TN, and FN are defined in similar ways.

4. RESULTS AND DISCUSSION

Qualitative results of the high-level features learned from the training patches with SSAE are shown in Figures 3(a) and 3(b). The SSAE model discovers very interesting structure in the data. The quantitative evaluation of SSAE and the compared methods is shown in Table 1. The Precision-Recall plane of SSAE and the other methods is shown in Figure 4(a), and the Receiver Operating Characteristic (ROC) curves are shown in Figure 4(b). As expected, SSAE yields the highest F1 score (82.0%), AUC (0.8992), and accuracy (83.7%). These results support our assumption that SSAE works well in learning useful high-level features for a better representation of the data. All experiments were carried out on a PC (Intel Core(TM) 3.4 GHz processor with 16 GB of RAM); the software implementation was in MATLAB.

Table 1: Precision (Prec), Recall (RC), F1 Score (F1), Area Under the Curve (AUC), Accuracy (Acc), and Execution Time (ET) (in minutes (m) or hours (h)) for the four methods.

Method     Prec (%)  RC (%)  F1 (%)  AUC     Acc (%)  ET
SM         71        79      74.5    0.8501  75.3     1 m
PCA+SM     89        72      79.6    0.8616  77.5     0.5 m
SAE+SM     74        84      78.7    0.8942  80.1     1.1 h
SSAE+SM    76        89      82      0.8992  83.7     2.8 h
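For reference, the Precision, Recall, F1, and Accuracy formulas of Sec. 3.3 can be computed from the four confusion-matrix counts as in this sketch; the counts below are illustrative, not those of our experiments:

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, Recall, F1, and Accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# illustrative counts on a 2000 nuclei + 2000 non-nuclei testing set
p, r, f1, acc = classification_metrics(tp=1780, fp=560, tn=1440, fn=220)
```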
Fig. 3: The learned feature representations of raw pixels with SSAE. Fig. 3(a) shows the learned feature representation in the first hidden layer (with 500 units); the learned high-level feature representation in the second hidden layer (with 100 units) is shown in Fig. 3(b). As expected, Fig. 3(a) shows detailed boundary features of nuclei and other tissue, while Fig. 3(b) shows high-level features of nuclei.

Fig. 4: The performance of our approach compared to the others in the Precision-Recall plane is shown in Fig. 4(a). The ROC curves of our approach and the others are shown in Fig. 4(b).

5. CONCLUDING REMARKS

In this paper, a Stacked Sparse Autoencoder (SSAE) framework is proposed for nuclei patch classification on breast cancer histopathology. Most current classification methods for this problem are based on supervised learning and pixel-wise classification. The SSAE framework learns a high-level feature representation of the raw data in an unsupervised way. These high-level features enable the classifier to work very efficiently in classifying nuclei from non-nuclei patches. The evaluation results show that SSAE+Softmax outperforms the conventional Softmax classifier, PCA+Softmax, and SAE+Softmax.

6. REFERENCES

[1] A. Basavanhally, J. Xu, et al., "Computer-aided prognosis of ER+ breast cancer histopathology and correlating survival outcome with Oncotype DX assay," in ISBI 2009. IEEE, 2009, pp. 851–854.

[2] H. Fatakdawala, J. Xu, et al., "Expectation-maximization-driven geodesic active contour with overlap resolution (EMaGACOR): Application to lymphocyte segmentation on breast cancer histopathology," IEEE TBME, vol. 57, no. 7, pp. 1676–1689, 2010.

[3] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.

[4] D. C. Ciresan et al., "Mitosis detection in breast cancer histology images with deep neural networks," in MICCAI 2013, vol. 8150 of LNCS, pp. 411–418. Springer, 2013.

[5] A. Ng, "Sparse autoencoder," CS294A Lecture Notes, p. 72, 2011.

[6] Y. Bengio et al., "Representation learning: A review and new perspectives," IEEE TPAMI, vol. 35, no. 8, pp. 1798–1828, 2013.