Fast k-NN Classification for Multichannel Image Data

Simon Warfield
Abstract
A new fast and exact algorithm for determining the k-NN classification of multichannel image data, and a new distance transform algorithm, are described. Complexity analysis and empirical studies with magnetic resonance images (MRI) demonstrate the effectiveness of the new classification algorithm.
Key words: k-nearest neighbour rule, distance transform, pattern classification, magnetic resonance image (MRI) segmentation.
1 Introduction
The k-Nearest Neighbour (k-NN) classification rule is a technique for nonparametric supervised pattern classification. Given a training data set consisting of N prototype patterns (vectors) of dimension D and the corresponding correct classification of each prototype into one of C classes, a pattern of unknown class is classified as class c if most of its k closest prototype patterns are from class c (Cover and Hart, 1967). Distance is measured with a distance metric appropriate to the problem domain. It has been shown that, under certain statistical assumptions, in the infinite training data case the conditional risk of the 1-NN rule is at most twice the Bayes risk R* (the minimum possible risk), and that the conditional risk of the k-NN rule approaches R* as k increases (Cover, 1968). The k-NN rule is effective for the multichannel image data commonly used in medical imaging and remote sensing applications. Each pattern is a D-dimensional vector constructed from the value of the pixel in each of the D channels of the image. The range of possible pixel values in each channel is limited by the image acquisition process.

University of New South Wales, School of Computer Science and Engineering, Sydney 2052 Australia, [email protected]. Appeared in ‘Pattern Recognition Letters’, Vol. 17, Num. 7, pp. 713–721, 1996.
A typical MRI data set of the human brain consists of 1-3 channels of data recorded for a 3D volume of voxels. A training data set is obtained by having an expert select typical pixels for 4-6 types of tissue. In this domain, the number of possible patterns is much smaller than the number of patterns to classify. This makes it efficient to compute a lookup table entry for every possible pattern that could occur in the image data, and then classify the volume by finding the location of the value of each voxel in the lookup table (Cline et al., 1990).
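To make the lookup table idea concrete, the following is a minimal sketch (names are hypothetical; it assumes two 8-bit channels, so the table has 256 x 256 cells):

```python
import numpy as np

def classify_volume(volume, table):
    """Classify every voxel of a two-channel volume by table lookup.
    volume: integer array of shape (..., 2) with values in 0..255.
    table:  (256, 256) array of class labels, precomputed once with the
            k-NN rule; classification is then a constant-time array index."""
    return table[volume[..., 0], volume[..., 1]]
```

Computing the table with the k-NN rule is the expensive step; classifying each voxel afterwards costs a single lookup.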
The k-NN rule has higher accuracy and stability for MRI data than other common statistical classifiers, but has a slow running time (Clarke et al., 1993). Slow classification is impractical because it significantly limits the rate at which data sets can be processed, and because it limits the effectiveness of interactive selection of classification parameters during training.
The new fast k-NN algorithm described here makes it practical to interactively select k and the training data prototypes to maximize the classification accuracy. Cline et al. (1990) report the effectiveness of an interactive classification step in which the training data set is modified by a trained observer on the basis of classification results. The accuracy of the classification is assessed by having the expert inspect the classification of an MRI data set, and this immediate feedback is used to modify the training data. In this methodology it is critically important that the time to compute the classification of the lookup table be small, so that it is convenient for an experienced operator to tune the classification parameters. Once the expert is satisfied with the accuracy of the classification, further data sets can be classified without intervention.

The success of nearest neighbour classification in many applications and the slow speed of obvious algorithms have stimulated work on efficient algorithms for the 1-NN and k-NN rules. Branch and bound algorithms have been applied for computing the nearest neighbours (Jiang and Zhang, 1993; Fukunaga and Narendra, 1975). A tree structure of disjoint subsets of the training data is constructed and then efficiently searched. In these schemes, the k-NN of each pattern of unknown class are determined independently. Belkasim et al. (1992) described an efficient k-NN rule that makes use of the natural clustering of training data to reduce the number of training patterns to which distances must be computed to classify a pattern of unknown class. In the worst case the partitioning of the training data has no benefit. Friedman et al. (1975) describe an algorithm for finding the k-NN based on ordering the training data along the dimension exhibiting maximum local sparsity for each test pattern. The expected number of distance calculations is O(F k^{1/D} N^{(D-1)/D}).
Most applications of the segmentation of MRI (such as volumetric measurement of diseased tissue) require high classification accuracy. In order to maintain a low error rate, algorithms that quickly provide the exact answer are sought, rather than those that use approximations to gain speed. In application domains that do not require maximum accuracy, it is possible to make use of editing and condensing to reduce the number of training data prototypes considered. The goal of these algorithms is to minimize the number of training data prototypes whilst maintaining as much accuracy as possible (essentially a clustering problem), and they have been applied successfully with the 1-NN rule (Gates, 1972). The new k-DT based k-NN algorithm, which is suitable for low dimensionality problem domains, has a worst case performance that is independent of the number of different training data prototypes, and so provides fast, accurate classification for multichannel image data without editing of training data sets.
The algorithms for classification with the k-NN rule described here generate identical output and so are not assessed on the basis of classification accuracy. The performance of each algorithm is primarily determined by the number of distance calculations carried out. The factors that affect performance are:

- k, the number of nearest neighbours to consider,
- N, the number of training data patterns,
- F, the number of unknown patterns to classify,
- D, the dimensionality of the feature space,
- the speed and number of evaluations of the distance metric.
This study compared the new k-DT based algorithm for the k-NN classification of multichannel image data with the brute force algorithm of Cover and Hart (1967) and the algorithm of Friedman et al. (1975). A complexity analysis for each important parameter is presented. In order to give a meaningful assessment of the performance of each algorithm on practical classification problems, the performance of each algorithm was empirically determined on a typical multichannel image data classification problem.

It was found that the new k-DT based k-NN algorithm significantly reduces the number of distance calculations for the classification of MRI data as compared to the algorithms of Cover and Hart (1967) and Friedman et al. (1975).
It allows k-NN classification to be used interactively on multichannel image data, making it practical to interactively modify the classification parameters so as to achieve the maximum accuracy. The k-DT based k-NN algorithm makes it possible to take advantage of as much training data as can be acquired, improving the error rate without a significant performance penalty, since the worst case number of distance calculations is independent of the number of different training data patterns. Once the parameters are optimized and a suitably accurate classification lookup table has been computed, new data sets can be classified efficiently by table lookup without further calculations.
2 Description of Algorithms for the k-NN Rule

Each pixel of a multichannel image is a vector, consisting of the values of the picture element in each channel. The range of pixel values in a given channel is limited by the physics of image formation and image acquisition. Each channel may have a different number of possible pixel values. The k-NN rule is used to construct a lookup table of class labels, indexed by pixel value; each pixel vector is a pattern, corresponding to one cell in this lookup table. Euclidean distance is commonly used for classification in the multichannel image data domain, and a table lookup implementation of this function has been used in the experiments reported here.
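A table lookup distance might be sketched as follows (an illustrative guess at such an implementation, not the paper's code; the names SQ and squared_distance are invented here). Since channel values are small integers, every possible squared difference can be precomputed:

```python
import numpy as np

# All possible squared differences of 8-bit channel values, computed once.
SQ = (np.arange(256)[:, None] - np.arange(256)[None, :]) ** 2

def squared_distance(p, q):
    """Squared Euclidean distance between two integer patterns via table
    lookup; monotonic in Euclidean distance, so the nearest neighbours
    found are unchanged and no square root is needed."""
    return sum(int(SQ[a, b]) for a, b in zip(p, q))
```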
An important parameter affecting both the accuracy and execution time of the k-NN classification rule is k, the number of nearest neighbours to consider. There is a tradeoff in the selection of k. When k is chosen to be small, the classification of each pattern is based on a small number of prototypes and may not make effective use of the information present in the prototypes. When k is chosen to be large, the k nearest prototype patterns may not be from a close and small region of the feature space, and consequently the classification is not based on a local estimate of the underlying probability density function.

When computing the classification of each cell in a lookup table, the number of cells in the lookup table, F, will affect the execution time of the algorithm. The k-DT based k-NN algorithm is designed to take advantage of the need to classify an entire lookup table of patterns.
The N prototype patterns affect the accuracy and efficiency of k-NN algorithms. As described above, under certain statistical assumptions the error rate approaches the minimum possible error rate as k and N approach infinity, but in practice the number of prototypes used is limited by the availability of accurate prototype data, by the time needed to select prototypes and by the execution time of the k-NN rule.
2.1 Brute Force
The k-NN algorithm described by Cover and Hart (1967) provides a standard against which other algorithms may be compared. With this algorithm the distance to each training prototype is computed for each test pattern, and the nearest k are selected.
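A minimal, self-contained sketch of this brute force rule (illustrative names; classification is by majority vote among the k nearest):

```python
from collections import Counter

def brute_force_knn_classify(pattern, prototypes, labels, k):
    """Classify 'pattern' by majority vote among its k nearest prototypes,
    computing the distance to every prototype (Cover and Hart, 1967)."""
    def d2(p, q):  # squared Euclidean distance
        return sum((a - b) ** 2 for a, b in zip(p, q))
    dists = sorted(((d2(pattern, p), c) for p, c in zip(prototypes, labels)),
                   key=lambda t: t[0])
    votes = Counter(c for _, c in dists[:k])
    return votes.most_common(1)[0][0]
```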
2.2 Ordering Prototypes

The algorithm described by Friedman et al. (1975) reduces the number of distance calculations required to find the k-NN. First the prototypes are ordered on the values of each coordinate axis, generating D ordered sets. In order to classify a test pattern x, an estimate of the local sparsity in the region of x for each coordinate axis is made, and the ordering corresponding to the largest sparsity is chosen. The location of the test pattern in the ordered prototypes is determined and the search for the k-NN is carried out by considering the prototypes nearest the test pattern using the selected order. The search for the k-NN can be terminated when the distance to the next pattern along the coordinate axis exceeds that of the k'th nearest pattern already found. Statistical analysis indicates the expected number of distance calculations is O(F k^{1/D} N^{(D-1)/D}).
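A sketch of this search strategy (illustrative and self-contained; 'axis' is the coordinate of maximum local sparsity chosen for this test pattern, and axis_order is the corresponding precomputed ordering):

```python
import bisect

def friedman_knn(pattern, prototypes, axis_order, axis, k):
    """Sketch of the ordering-based k-NN search of Friedman et al. (1975).
    axis_order: indices of prototypes sorted by their value on 'axis'."""
    def d2(p, q):  # squared Euclidean distance
        return sum((a - b) ** 2 for a, b in zip(p, q))
    keys = [prototypes[i][axis] for i in axis_order]
    lo = bisect.bisect_left(keys, pattern[axis]) - 1
    hi = lo + 1
    best = []  # up to k pairs (squared distance, prototype index), sorted
    while lo >= 0 or hi < len(keys):
        gap_lo = pattern[axis] - keys[lo] if lo >= 0 else float('inf')
        gap_hi = keys[hi] - pattern[axis] if hi < len(keys) else float('inf')
        gap = min(gap_lo, gap_hi)
        # Terminate: no remaining prototype can beat the k'th distance found,
        # since its full distance is at least its separation along the axis.
        if len(best) == k and gap * gap >= best[-1][0]:
            break
        if gap_lo <= gap_hi:
            i, lo = axis_order[lo], lo - 1
        else:
            i, hi = axis_order[hi], hi + 1
        best.append((d2(pattern, prototypes[i]), i))
        best.sort()
        del best[k:]
    return [i for _, i in best]
```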
2.3 The k Distance Transform Algorithm
The k-NN classification strategies described previously classify each pattern in the lookup table independently. However, patterns next to each other in the table are separated by only a small distance. It is possible to speed up the k-NN classification by taking advantage of the computations done to classify nearby cells (grouping together similar patterns that have unknown class).

The distance transform of an arbitrary dimensionality image consisting of feature and non-feature pixels is an image where the value of each pixel is the distance to the nearest feature. Several distance transform algorithms have been proposed (Paglieroni, 1992). Distances are calculated by a series of local propagations. Distance transform algorithms are fully described by the number and shape of the masks which determine how the distances are propagated locally. These masks are scanned over the space in order to compute the distance transform. Figure 1 illustrates the masks suitable for a two channel distance transform (Borgefors, 1984). A pass is made over the image from top left to bottom right with the forward mask, and at each pixel the mask is centred on the cell labelled 0.
The new value of the centre cell is then the minimum of the set of values obtained by adding each mask cell to the value of the map cell under it.

[Figure 1: Masks used to compute the 2-dimensional distance transform (forward and backward masks, with cell values A and B around the centre cell 0).]
The values of the mask cells alter the distance function being computed: A = 1, B = 1 gives a chessboard distance function, A = 1, B = 2 gives a city block distance function, and A = 3, B = 4 gives an approximation to a Euclidean distance function. Another pass is made from bottom right to top left with the backward mask, updating the computed distances, and the transform is complete. An exact discrete Euclidean distance map can be obtained by propagating the displacement vector of each feature pixel rather than incrementally updating the distance with local propagations, and the distance map computed is then identical to that obtained without local propagations (Mullikin, 1992).
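A minimal sketch of this two-pass chamfer distance transform (illustrative names; assuming integer local distances A and B, with the default A = 3, B = 4 approximating Euclidean distance scaled by 3):

```python
import numpy as np

def chamfer_dt(feature_mask, A=3, B=4):
    """Two-pass chamfer distance transform (after Borgefors, 1984).
    feature_mask: 2D boolean array, True at feature pixels.
    Returns an integer map of approximate distances to the nearest feature."""
    INF = 10**9
    d = np.where(feature_mask, 0, INF).astype(np.int64)
    rows, cols = d.shape
    # Forward pass: top left to bottom right, forward mask centred on (r, c).
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                d[r, c] = min(d[r, c], d[r-1, c] + A)
                if c > 0:        d[r, c] = min(d[r, c], d[r-1, c-1] + B)
                if c < cols - 1: d[r, c] = min(d[r, c], d[r-1, c+1] + B)
            if c > 0:
                d[r, c] = min(d[r, c], d[r, c-1] + A)
    # Backward pass: bottom right to top left, mirrored mask.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                d[r, c] = min(d[r, c], d[r+1, c] + A)
                if c > 0:        d[r, c] = min(d[r, c], d[r+1, c-1] + B)
                if c < cols - 1: d[r, c] = min(d[r, c], d[r+1, c+1] + B)
            if c < cols - 1:
                d[r, c] = min(d[r, c], d[r, c+1] + A)
    return d
```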
The fast k-NN classification algorithm described here makes use of a new extended distance transform algorithm that calculates the distances to the k nearest neighbours. This is equivalent to calculating k distance maps, where map i, i ∈ {1, …, k}, is a map of the distance to the i'th nearest neighbour. Rather than computing maps of distances, maps of identifiers of training prototypes are computed, so that after computing the k-DT of a lookup table, the k nearest training prototypes are directly available. The masks for any distance transform algorithm may be used. The implementation reported here uses the masks described by Borgefors (1984) for two channel problems. For higher dimensionality problems, the masks described by Ragnemalm (1993) are used. This algorithm requires D passes to be made and, since these masks are separable, is particularly efficient on a parallel machine when each pass can be done simultaneously.

Algorithm 1 is a schema for the k distance transform (k-DT) algorithm and an example of its operation is shown in figure 2. Given a set of “objects” (such as training data prototypes) located in an arbitrary dimensionality image (such as a classification lookup table), the k-DT algorithm computes a map of the k nearest objects at every location in the image. Each object (training data pattern) is stored in an array, and its index in the array is its unique identifier. The identifiers are inserted into a map. A set of DT masks is then passed across the map, but rather than propagating a distance, the identifiers of each object are propagated.
[Figure 2: Computing identifiers of the k-NN. (a) Original image; (b) distance transform; (c) ID of nearest feature pixel; (d) ID of second nearest feature pixel. The feature pixels are numbered in raster scan order starting from zero.]
Insert training data pattern identifiers into the map.
for all distance transform mask scans do
    for all cells in the map do
        Propagate the k-NN identifiers from each mask edge cell to the centre cell c
        Compute the distance from c to each of the propagated training patterns
        Sort in order of increasing distance
        Select the identifiers of the k nearest patterns

Algorithm 1: Description of the algorithm for computing the k Distance Transform (k-DT)
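The following Python sketch mirrors Algorithm 1 for a two channel (2D) lookup table. It is illustrative only: the list-based k-NN sets, the function names and the use of squared Euclidean distance are choices of this sketch, not necessarily the published implementation. Ties are broken by identifier, matching the raster scan convention described in the text.

```python
def k_dt(table_shape, prototypes, k):
    """Sketch of the k-DT on a 2D lookup table: for every cell, find the
    identifiers of the k nearest training prototypes by local propagation.
    prototypes: list of (row, col) integer patterns; identifier = list index."""
    rows, cols = table_shape
    knn = [[[] for _ in range(cols)] for _ in range(rows)]  # k-NN id sets
    for i, (r, c) in enumerate(prototypes):
        knn[r][c].append(i)  # several prototypes may share one cell
    def dist2(r, c, i):
        pr, pc = prototypes[i]
        return (r - pr) ** 2 + (c - pc) ** 2
    def update(r, c, neighbours):
        # Propagate identifiers from neighbouring cells, then keep the k
        # nearest to (r, c); distances are recomputed, never propagated.
        cand = set(knn[r][c])
        for nr, nc in neighbours:
            if 0 <= nr < rows and 0 <= nc < cols:
                cand.update(knn[nr][nc])
        knn[r][c] = sorted(cand, key=lambda i: (dist2(r, c, i), i))[:k]
    for r in range(rows):              # forward scan
        for c in range(cols):
            update(r, c, [(r-1, c-1), (r-1, c), (r-1, c+1), (r, c-1)])
    for r in range(rows - 1, -1, -1):  # backward scan
        for c in range(cols - 1, -1, -1):
            update(r, c, [(r+1, c-1), (r+1, c), (r+1, c+1), (r, c+1)])
    return knn
```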
This k-DT algorithm allows more than one pattern to be at a particular location in the image. Any distance metric that computes the distance between two patterns can be used, since only identifiers, not distances, are propagated. When there are a number of feature pixels at an equal distance from a particular cell (such as row 3, column 4 in figure 2, which is equidistant from feature pixels 1, 3 and 5), the ordering of the neighbours is based on the raster scan ordering of the feature pixels. Of course, when a classification is to be carried out from the k-NN set, all such feature pixels are included as part of the k-NN set. The data structure used to represent the k-NN of each cell in the map needs to be selected in order to maximise the speed of the propagate and select operations, whilst minimizing memory requirements. When the number of training prototypes is a small fixed number, a bit-vector can be used to record the identifiers of the prototypes. When a large number of training prototypes is used, a hash table is suitable.
The k-DT k-NN algorithm can be used to construct a classification lookup table by inserting the training data into a map, computing the k-DT of the map, and then applying the k-NN classification rule to each cell in the map and placing the classification into a lookup table.
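Putting the pieces together, a sketch of constructing the classification lookup table with the k-DT (this relies on the hypothetical k_dt function sketched above, and uses majority vote as in the k-NN rule):

```python
import numpy as np
from collections import Counter

def build_classification_table(prototypes, labels, shape, k):
    """Build a lookup table of class labels: compute the k-DT of the table,
    then classify each cell by majority vote over its k nearest prototypes."""
    knn = k_dt(shape, prototypes, k)   # the k-DT sketch above
    table = np.empty(shape, dtype=int)
    for r in range(shape[0]):
        for c in range(shape[1]):
            votes = Counter(labels[i] for i in knn[r][c])
            table[r, c] = votes.most_common(1)[0][0]
    return table

# Example: two-channel 8-bit data gives a 256 x 256 table, e.g.
# table = build_classification_table(train_patterns, train_labels, (256, 256), k=5)
```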
3 Summary of Complexity Analysis of the k-NN Algorithms
Table 1 is a summary of the complexity analysis for each algorithm. The performance of each k-NN classification algorithm is reported in terms of the number of distance function evaluations (which scales linearly with the time needed to evaluate the distance metric) and the number of comparison operations. When the distance metric is particularly fast to evaluate, the number of comparison operations may dominate the running time.

Measure                | Cover and Hart | Friedman et al.                       | k-DT
distance calculations  | FN             | DN log(N) + O(F k^{1/D} N^{(D-1)/D})  | O(kDF(D+1))
comparison operations  | O(FNk)         | O(F k^{1/D} N^{(D-1)/D} log(N))       | O(F(D+1)k log((D+1)k))

Table 1: Dependence of each k-NN classification algorithm on the number of distance calculations and comparison operations, where N is the number of training prototypes, F is the number of patterns to classify, k is the number of nearest neighbours and D is the table dimensionality.
4 Empirical Performance of k-NN Algorithms

A 3D double echo spin echo MRI of a human brain was used to provide data to compare the performance of each of the three algorithms on a practical problem. The pixel values in each of the two channels of data ranged from 0 to 255, giving a lookup table of 65536 cells to classify (F = 65536). An expert observer selected 250 training pixels for each of four tissue classes (background, white matter, grey matter and cerebrospinal fluid). The classification lookup table was then computed with each algorithm in turn, using a Sun SPARCstation 20/612 with 2x60MHz SuperSPARC processors and 128MB RAM, which is sufficient memory to ensure that the lookup table can be held in memory. Distances were computed using a table lookup implementation of the Euclidean distance function.
When applying the k-NN algorithm to classify MRI images, some parameters are fixed by the problem domain and by the required accuracy. For example, for double echo spin echo MRI data, D = 2 and a pixel range from 0 to 255 is sufficient to capture the intensity ranges for soft tissue of the brain. Quantizing the MRI data further leads to a lack of differentiation between different tissue types, but a larger range leads to a larger lookup table without improving the discrimination between soft tissue types. The experiments illustrate which algorithm is faster for this problem, and also how changes in the parameters k, N and F affect the time to compute the table lookup classification. They demonstrate that the complexity analysis results apply to this practical problem, and indicate the performance of these algorithms if applied to another problem domain.
For each experiment, two of the parameters k, N and F were fixed, and the third was varied over a wide range. The number of distance calculations required to compute the lookup table was recorded.
5 Results

In order to investigate the classification error rate, the training data of 1000 prototypes was divided into 10 groups of 100 prototypes. The k-NN classification error rate was then determined by selecting in turn each group as test data and classifying it using the other groups as training data. This procedure was repeated for a range of values of k. The average error rate of the ten trials as a function of k is shown in figure 3.
Figure 3: Error rate of k-NN classification for MRI segmentation.
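A sketch of this error rate estimation procedure (illustrative names; 'classify' can be any k-NN implementation, e.g. the brute force sketch from Section 2.1):

```python
import numpy as np

def knn_error_rate(prototypes, labels, classify, k, n_groups=10):
    """Estimate the k-NN error rate by holding out each of n_groups groups
    in turn and classifying it with the remaining groups as training data."""
    indices = np.arange(len(prototypes))
    errors = 0
    for g in np.array_split(indices, n_groups):
        held_out = set(g.tolist())
        train = [i for i in indices if i not in held_out]
        train_p = [prototypes[i] for i in train]
        train_l = [labels[i] for i in train]
        for i in held_out:
            if classify(prototypes[i], train_p, train_l, k) != labels[i]:
                errors += 1
    return errors / len(prototypes)
```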
To investigate the variation in performance as a function of the number of training prototypes N, the number of distance function evaluations needed to compute a classification lookup table of F = 65536 cells was determined for each algorithm over a range of values of N. The ratio of the number of distance function evaluations for the k-DT algorithm and for that of Friedman et al. (1975) to that of the brute force algorithm as a function of N is shown in figure 4.
Figure 4: Ratio of the number of distance calculations for each algorithm to that of the brute force algorithm (F = 65536, k = 1).

To investigate the variation in performance as a function of k, the lookup table size and the number of training data prototypes were fixed at typical values (F = 65536, N = 1000) and k was varied from 1 to 10. The ratio of the number of distance function evaluations for the k-DT algorithm and for that of Friedman et al. (1975) to that of the brute force algorithm as a function of k is shown in figure 5.
To investigate the variation in performance as a function of the lookup table size F, k and the number of training data prototypes N were fixed at typical values (k = 1, N = 1000) and F was varied over a wide range. The ratio of the number of distance function evaluations for the k-DT algorithm and for that of Friedman et al. (1975) to that of the brute force algorithm as a function of F is shown in figure 6.

The performance of the algorithms has been discussed in terms of the number of distance calculations, which has the advantage of being independent of any particular implementation or computer. In order to confirm that this measure reflects actual execution time, the time required to compute the classification of a table (F = 65536, N = 100) was determined for a range of values of k. The ratio of the execution time for the k-DT algorithm and for that of Friedman et al. (1975) to that of the brute force algorithm is shown in figure 7.
Figure 5: Ratio of number of distance calculations for each algorithm to the brute force algorithm (F = 65536, N = 1000).
Figure 6: Ratio of number of distance calculations for each algorithm to the brute force algorithm (k = 1, N = 1000).
Figure 7: Ratio of execution time of each algorithm to the brute force algorithm (F = 65536, N = 100).
6 Discussion
Algorithms for the classification of multichannel image data using the k-NN classification rule have been described and compared. A novel distance transform algorithm has been presented.
The new k-DT algorithm computes k distance maps, where each location in map i has the value of the distance to the i'th nearest neighbouring object. The new k-DT algorithm extends other distance transform algorithms from computing just the distance to the nearest “object” to computing the distance to the k nearest “objects”. It removes the restriction of the distance transform algorithm that only one “object” can appear at each location in space.
The k distance transform (k-DT) algorithm is the basis for an extremely fast k-NN classification algorithm for multichannel image data. The k-DT k-NN algorithm has worst case performance that is independent of the number of training data patterns. The behaviour of each algorithm has been studied empirically on the classification of a typical MRI data set. The k-DT k-NN algorithm has the best performance of those compared here. The results indicate that the k-DT k-NN algorithm is an excellent choice for multichannel image data. Its speed makes it practical to use routinely and interactively for k-NN classification.
The k-DT algorithm is useful in other problem domains. Its application to clustering in colour space and to the robust registration of 2D and 3D data sets is currently under investigation.
References

Belkasim, S., Shridhar, M., and Ahmadi, M. (1992). Pattern Classification Using an Efficient KNNR. Pattern Recognition, 25(10):1269–1274.

Borgefors, G. (1984). Distance Transformations in Arbitrary Dimensions. Computer Vision, Graphics, and Image Processing, 27:321–345.

Clarke, L. P., Velthuizen, R. P., Phuphanich, S., Schellenberg, J. D., Arrington, J. A., and Silbiger, M. (1993). MRI: Stability of Three Supervised Segmentation Techniques. Magnetic Resonance Imaging, 11:95–106.

Cline, H. E., Lorenson, W. E., Kikinis, R., and Jolesz, F. (1990). Three-Dimensional Segmentation of MR Images of the Head Using Probability and Connectivity. Journal of Computer Assisted Tomography, 14(6):1037–1045.

Cover, T. M. (1968). Estimation by the Nearest Neighbor Rule. IEEE Transactions on Information Theory, IT-14(1):50–55.

Cover, T. M. and Hart, P. E. (1967). Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory, IT-13(1):21–27.

Friedman, J. H., Baskett, F., and Shustek, L. J. (1975). An Algorithm for Finding Nearest Neighbors. IEEE Transactions on Computers, C-24(10):1000–1006.

Fukunaga, K. and Narendra, P. (1975). A Branch and Bound Algorithm for Computing k-Nearest Neighbors. IEEE Transactions on Computers, C-24:750–753.

Gates, G. W. (1972). The Reduced Nearest Neighbor Rule. IEEE Transactions on Information Theory, pages 431–433.

Jiang, Q. and Zhang, W. (1993). An improved method for finding nearest neighbours. Pattern Recognition Letters, 14:531–535.

Mullikin, J. C. (1992). The Vector Distance Transform in Two and Three Dimensions. CVGIP: Graphical Models and Image Processing, 54:526–535.

Paglieroni, D. W. (1992). Distance Transforms: Properties and Machine Vision Applications. CVGIP: Graphical Models and Image Processing, 54(1):56–74.

Ragnemalm, I. (1993). The Euclidean distance transform in arbitrary dimensions. Pattern Recognition Letters, 14:883–888.