Cite this paper as: Gong B., He B., Hu Q., Jia F. (2015) Automatic Liver Localization based on Classification Random Forest with KNN for Prediction. In: Jaffray D.
Automatic Liver Localization based on Classification Random Forest with KNN for Prediction
Benwei Gong, Baochun He, Qingmao Hu and Fucang Jia*
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
Abstract— Robust localization of the liver in 3D-CT images is a prerequisite for automatic liver segmentation. Accurate, robust liver localization is challenging because of the variation in appearance and shape and the ambiguous boundaries between the liver and its neighboring organs. A fully automatic approach is proposed. In the first stage, the interface between the thoracic cavity and the abdomen is detected with a differential model, and the relative structural prior of the liver region is derived. In the second stage, a random forest is constructed, and each testing sample is predicted with a k-nearest neighbor (KNN) model based on the relative structural prior in the same leaf node of the random forest. Experimental results showed that the proposed method obtained comparable or better liver localization performance than AdaBoost and a traditional random forest.
Keywords— Liver localization, Structural prior, Random forest, K nearest neighbor
I. INTRODUCTION
Accurate and robust liver localization plays an important role in automatic liver segmentation. Liver segmentation in three-dimensional computed tomography (3D-CT) images is essential for liver surgery planning and intraoperative navigation. Statistical shape model-based segmentation has proved to be an efficient liver segmentation method [1], and robust liver localization is essential for initializing the statistical shape model. Failed or inaccurate segmentation is often caused by wrong liver localization.
The liver is difficult to localize in CT images for several reasons. First, its appearance and shape vary greatly. Second, the boundaries between the liver and the heart, the spleen, and other organs are ambiguous. Third, liver lesions such as tumors inside or on the boundary of the liver cause inhomogeneity in appearance. Automating liver localization in a robust manner therefore remains a great challenge.
Prior knowledge captured from expert-labelled medical datasets has a strong impact on automatic organ localization. Atlas registration has been popular for its robustness in registering an atlas image to the image to be segmented [2], but the method is time-consuming. Recently, machine learning methods have played an important role in anatomy and lesion detection; for example, AdaBoost-based tumor segmentation was ranked first in the MICCAI 2008 liver tumor segmentation challenge [3]. A discriminative generalized Hough transform, composed of the generalized Hough transform (GHT) and a discriminative training technique, was proposed to locate organs efficiently and robustly [4].
Random forest (RF) is an efficient ensemble tree model for high-dimensional datasets [5]. Typically, a random forest consists of a multitude of independent decision trees built during the training phase, and each tree makes a prediction of the targets in the testing phase. Random forests often require a large number of balanced training samples to avoid over-fitting. Recently, random forests and Hough forests have been studied for general object detection in computer vision [6, 7]. Criminisi et al. used RF to automatically localize multiple anatomical structures by bounding boxes [8]. In another study, both random regression and classification forests were used to locate kidneys following a coarse-to-fine strategy [9]. While random forests perform comparably to other classifiers, image features still play an important role. Recently, forests were combined with spatial context information to obtain more accurate and robust localization [10-13]. Atlas random forest encoded each single atlas with a classification forest to label multiple anatomical structures, which captured prior knowledge and features in a more compact way [14]. However, other organs are often wrongly classified as liver tissue, which hinders the follow-up segmentation.
In this work, we present a new approach that exploits the spatial relationship of abdominal organs to enhance the classification accuracy of random forest. On the one hand, lung-liver interface localization effectively prevents the heart from being wrongly classified as liver tissue because of the similar intensity range caused by inferior imaging contrast. On the other hand, K-nearest neighbor (KNN) is used to integrate the relative spatial relationship into the classification random forest.
Compared to other state-of-the-art methods, the proposed method is computationally efficient and has comparable or better localization accuracy.
II. METHODS
The proposed approach consists of two phases: interface localization and KNN-based classification random forest for prediction. First, the interface between the abdominal cavity and the thoracic cavity is located by a differential model. A classification RF is then constructed to maximize the information gain. During testing, the prediction for each test sample is determined by the relative structural prior of the liver or by the KNN training samples in the same leaf node (Fig. 1).
© Springer International Publishing Switzerland 2015 D.A. Jaffray (ed.), World Congress on Medical Physics and Biomedical Engineering, June 7-12, 2015, Toronto, Canada, IFMBE Proceedings 51, DOI: 10.1007/978-3-319-19387-8_46
$$ f_{I,j}\left(P_K^1(s) - P_I^1(j)\right) = \exp\left(-\frac{\left(P_K^1(s) - P_I^1(j)\right)^2}{2\delta^2}\right) \tag{5} $$
Fig. 1 The proposed workflow. The training phase comprises the lung-liver interface localization and the construction of the classification RF. A KNN model is used in the leaf nodes of the random forest to make predictions.
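The leaf-level KNN step named in the workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the training samples that reached a given leaf are already available as (relative coordinate, label) pairs, and the names `LeafSample` and `knn_leaf_predict`, the majority-vote rule, and the tie-breaking rule are hypothetical choices made for this example. Only the Euclidean distance on relative spatial coordinates and the neighbor size K = 16 come from the paper.

```python
import math
from collections import Counter
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float, float]


class LeafSample(NamedTuple):
    """Hypothetical record for a training sample stored in a leaf node."""
    position: Point  # coordinates relative to the reference point
    label: int       # 1 = liver, 0 = background


def knn_leaf_predict(query: Point, leaf: List[LeafSample], k: int = 16) -> int:
    """Majority vote among the k spatially nearest training samples of a leaf."""
    if not leaf:
        raise ValueError("leaf node holds no training samples")
    # Rank the leaf's samples by Euclidean distance in relative coordinates.
    nearest = sorted(leaf, key=lambda s: math.dist(query, s.position))[:k]
    votes = Counter(s.label for s in nearest)
    # Assumption: ties are resolved in favour of background (label 0).
    return 1 if votes[1] > votes[0] else 0
```

In a full pipeline, each tree would first route the test sample to a leaf, and a function of this kind would then be applied to that leaf's stored samples; how the per-tree votes are combined is not shown here.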
A. Interface Localization and Structural Prior
The liver lies in the abdominal cavity, while the heart and the lungs lie in the thoracic cavity above it. Since the intensity of the lung region is low, we compute the percentage of low-intensity voxels over the right half of each axial plane $z$ as $R_I(z)$, and the location of the interface $L_I^*$ for each image $I$ should satisfy (Fig. 2):
$$ \{\, z \mid 0 \le \Delta R_I(z) \le \delta,\ \Delta R_I(z) = R_I(z) - R_I(z-1) \,\} \tag{1} $$
$$ R_I(L_I^*) \le \alpha \max\{ R_I(z) \}, \quad 0 < \alpha < 1 \tag{2} $$
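As a rough illustration of the interface search, the sketch below implements one possible reading of Eqs. (1) and (2): a slice qualifies when the slice-to-slice change of the low-intensity ratio lies in [0, delta] and the ratio itself is at most alpha times its maximum. Several things here are assumptions rather than details from the paper: the inequality directions, the intensity threshold of -500, the slice order (index 0 is taken to be the most superior slice), the choice of the first half of each row as the patient's right side, and the function names `low_intensity_ratio` and `locate_interface`.

```python
from typing import List, Optional, Sequence

Plane = Sequence[Sequence[float]]  # one axial slice, stored as rows of intensities


def low_intensity_ratio(plane: Plane, threshold: float) -> float:
    """R_I(z): fraction of voxels below `threshold` in one half of the plane."""
    count = 0
    total = 0
    for row in plane:
        # Assumption: the patient's right side is the first half of each row.
        half = row[: len(row) // 2]
        total += len(half)
        count += sum(1 for value in half if value < threshold)
    return count / total if total else 0.0


def locate_interface(volume: Sequence[Plane], delta: float, alpha: float,
                     threshold: float = -500.0) -> Optional[int]:
    """Return the first slice index satisfying the assumed reading of Eqs. (1)-(2)."""
    ratios: List[float] = [low_intensity_ratio(p, threshold) for p in volume]
    if not ratios:
        return None
    peak = max(ratios)
    # Assumption: slices are ordered from head to foot.
    for z in range(1, len(ratios)):
        change = ratios[z] - ratios[z - 1]  # Delta R_I(z)
        if 0.0 <= change <= delta and ratios[z] <= alpha * peak:
            return z
    return None
```

With real CT data the threshold and the two parameters would need tuning, and the returned index depends directly on the assumed inequality directions.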
After the interface localization, the reference point in each image is set as $P_I^0 = (0, 0, L_I^*)$, which is the origin of the reference coordinate system for the structural prior. A number of voxels $\{P_I^0(j)\}_{j=1}^{N}$ were randomly sampled from the liver region in each image and transferred to the reference coordinate system as:
$$ P_I^1(j) = P_I^0(j) - P_I^0 = \left(x_I(j),\ y_I(j),\ z_I(j) - L_I^*\right) \tag{3} $$
In a 3D-CT image $K$, each new voxel $P_K^0(s)$ is transferred to the reference coordinate system as $P_K^1(s) = P_K^0(s) - P_K^0$, and the probability of the voxel belonging to the liver is computed as
$$ \mathrm{prior}\left(P_K^1(s)\right) = \frac{1}{N_\varepsilon} \sum_{I,j} f_{I,j}\left(P_K^1(s) - P_I^1(j)\right) \tag{4} $$
with $P_I^1(j) \in N(P_K^1(s), \varepsilon)$, where $N_\varepsilon$ is the neighbor size of the voxel $P_K^1(s)$ in the reference coordinate system, $\varepsilon$ is the distance threshold of the neighborhood, and $f_{I,j}$ is the Gaussian kernel function of Eq. (5).
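The structural prior of Eqs. (3) to (5) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the squared difference in Eq. (5) is taken to be the squared Euclidean distance, $N_\varepsilon$ is taken to be the number of training points within distance `eps` of the query, and a query with no neighbors is given a prior of 0. The function names `to_reference` and `liver_prior` and the parameter names are hypothetical.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def to_reference(p: Point, interface_z: float) -> Point:
    """Eq. (3): shift a voxel so that the interface slice becomes z = 0."""
    x, y, z = p
    return (x, y, z - interface_z)


def liver_prior(query: Point, train: Iterable[Point],
                eps: float, delta: float) -> float:
    """Eq. (4): mean Gaussian kernel response over training points within eps."""
    responses = []
    for p in train:
        d = math.dist(query, p)
        if d <= eps:
            # Eq. (5): Gaussian kernel with width delta.
            responses.append(math.exp(-d * d / (2.0 * delta * delta)))
    # Assumption: no neighbors within eps means no evidence for liver.
    return sum(responses) / len(responses) if responses else 0.0
```

Both the query voxel and the training voxels are expected to be in reference coordinates, so `to_reference` would be applied to each with the interface location of its own image.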
Fig. 2 Interface localization between the abdominal cavity and the thoracic cavity. a, b) The red line represents the interface in the 3D-CT image. c) The percentage of low-intensity voxels in different axial planes. d) The detection of the interface over all testing images with different $\delta$.
B. KNN Based Classification Random Forest
A random forest is an ensemble of independent decision trees that are trained in a random manner. Random forests are efficient and robust to noise and errors, and they can be trained on large, high-dimensional datasets with little over-fitting. During forest construction, each tree is trained on a randomly chosen subset of the entire dataset and of the feature vectors, so the trees are able to make independent predictions. During testing, different trees in the classification RF contribute differently to the final result because of the random selection of training sets and features. Moreover, although each testing sample is described by a full set of features, each tree in the classification RF uses only a few of them to make its prediction. Samples in the same leaf node therefore share only a few features (D features or fewer) and may differ greatly in the other features. Although, statistically, averaging the scores of more trees improves the prediction, this is not sufficient when the number of training samples is limited. In medical image analysis, anatomical structures share a strong spatial context relationship, and many methods, such as atlas-based RF, have been proposed to exploit such spatial information. We applied a KNN method that relies on the hypothesis that the samples in the same leaf node not only
share such features, but are also located close to each other in space. The distance of the KNN model was defined as the Euclidean distance, which depends on the relative spatial location. Each new sample in image $I$ with original coordinates $P_I^0(j)$ has been transferred [...] coordinates [...]
[...] used for each tree training. Other main parameters were optimized as: T = 100 trees, the number of features per tree is 12, the maximum depth d = 11, the minimum size of stopping criteria for leaf node [...], and neighbor size K = 16. The neighbor size did not have much impact on the localization performance, as shown in Fig. 3, which indicates that our method is robust to the parameter K. The number of weak classifiers was set to 100, equal to that of AdaBoost and the traditional random forest. Since the localization is considered an initial segmentation, the performance of our method was evaluated with the Dice coefficient and the Hausdorff distance. The mean metric scores are shown in Table 1. Compared with AdaBoost and the traditional random forest, our KNN-based random forest (KRF) approach has a higher Dice coefficient and a smaller Hausdorff distance on average. A paired t-test was used to compare these metric values. The difference between KRF and AdaBoost was statistically significant in Dice coefficient (p