
Supervised Evaluation Methodology for Curvilinear Structure Detection Algorithms

Xiaoyi Jiang and Daniel Mojon

Department of Electrical Engineering and Computer Science, Technical University of Berlin, Germany
Department of Neuro-Ophthalmology and Strabismus, Kantonsspital St. Gallen, Switzerland

Abstract

Curvilinear structures are useful features in a variety of applications. Compared to other commonly used features such as edges, there is relatively little work on curvilinear structure detection and its performance evaluation. In this paper we propose a novel supervised methodology for evaluating the performance of curvilinear structure detection algorithms. We consider the two aspects of performance, namely detection rate and detection accuracy, separately, in contrast to their mixed handling in earlier approaches, which typically produces a biased impression of detection quality. By doing so, the proposed performance measures give a more informative and precise performance characterization. We demonstrate the advantages of our approach using both synthetic and real examples.

1 Introduction

The term curvilinear structure denotes a line or a curve with some width. Curvilinear structures are useful features in a variety of applications, e.g. in finding roads or rivers in aerial images, blood vessels or bones in medical images, and characters in text images. Compared to other commonly used features such as edges, there are relatively few algorithms for curvilinear structure detection [4, 6, 10] and few methods for the evaluation of their performance. For instance, from 1993 through 1995 alone the authors of [2] counted 21 edge detection algorithms published in only three major journals. They also listed 12 edge detection evaluation methods. The purpose of this paper is to discuss the drawbacks of current methods for the evaluation of curvilinear structure detection algorithms and to propose a novel supervised evaluation methodology.

The efforts of performance evaluation in computer vision can generally be classified into four distinct categories [2, 8]: theory-based, human evaluation, ground-truth based, and task-based. Our methodology falls into ground-truth based evaluation. These approaches measure the difference between algorithmic results and ground truth. Because the ground truth is specified manually, the term supervised is also used in this context. We propose performance measures that indicate the difference between the detected curvilinear structures and the ground truth more informatively and precisely than earlier approaches. In the following, our discussion will be exemplified by the task of detecting blood vessels in fundus images. It is important to point out that our approach is applicable in the general context of the evaluation of curvilinear structure detection algorithms.

The work was supported by the Stiftung OPOS Zugunsten von Wahrnehmungsbehinderten, St. Gallen, Switzerland.

2 Drawbacks of previous approach

In [3, 5] the performance of blood vessel detection is measured as follows. Given a machine-segmented result image (MS) and its corresponding hand-labeled ground truth image (GT), any pixel which is marked as vessel in both MS and GT is counted as a true positive. Any pixel which is marked as vessel in MS but not in GT is counted as a false positive. The true positive rate (TPR) is obtained by dividing the number of true positives by the total number of vessel pixels in GT. The false positive rate (FPR) is computed by dividing the number of false positives by the total number of non-vessel pixels in GT.

Similar pixel-wise comparison has also been used for evaluating binarization methods [7] and building extraction from aerial imagery [9]. While this approach is suitable there for comparing large regions, its application to curvilinear structures, which are elongated and thin regions, is more questionable. This is illustrated in Figure 1, which shows two modified versions of a GT vessel image. For both MS1 and MS2 we obtain TPR=0.851 and FPR=0.000, indicating an equal rate of 85.1% correct detection and no spurious vessels. But in reality there are substantial differences between the two MS images. The image MS1 is generated by thinning GT at some places. In this case the entire vessel network is correctly detected, but some vessels have a smaller width than in GT. In contrast, MS2 results from deleting parts of GT and therefore equals GT perfectly except for the deleted parts. A more objective performance measure would be TPR(MS1)=1.00 and TPR(MS2)<1.00, indicating the percentage of the correctly detected part of the vessel network. The correctly detected parts of the vessel network can be further evaluated with respect to the detection accuracy, i.e. the width error. Then, we would expect a non-zero width error for MS1 and a zero width error for MS2.

Figure 1. MS1: partial thinning of GT. MS2: deletions in GT

A second situation in Figure 2 illustrates a related problem. Again, the two MS images MS3 and MS4 have equal performance measures TPR=1.00 and FPR=0.012, implying a full detection of the vessel network and 1.2% spurious vessels in both cases. Here MS3 emerges from GT by locally expanding GT, while MS4 equals GT plus eight spurious (diagonal) vessel parts. Unlike in MS4, the spurious vessel pixels in MS3 cause vessel width errors but do not change the vessel network structure in any way. Intuitively, a measure FPR(MS3)=0 and FPR(MS4)>0 thus makes more sense. Accordingly, MS3 is associated with a non-zero width error and MS4 with a zero width error.

Figure 2. MS3: partial expanding of GT. MS4: insertions in GT

The examples above clearly show the drawbacks of the evaluation method from [3, 5]. Because curvilinear structures are elongated and thin regions, a pixel-wise comparison is not feasible. Mixing two different aspects of performance, namely detection rate and detection accuracy, in the overall performance measures TPR and FPR results in a biased impression of detection quality. In this work we consider the two aspects separately and introduce corresponding performance measures. By doing so, we are able to provide a more informative and precise performance characterization.

3 Supervised evaluation methodology

The basic assumption is that, for each test image, we have a corresponding GT image with the curvilinear structures manually specified. For fundus images, for instance, twenty images of both normal and abnormal (pathological) cases together with GT are publicly available (see footnote 1). In this paper we use these images for test and illustration purposes.

1 http://www.ces.clemson.edu/~ahoover/stare/probing/

Given a blood vessel image V, we perform a thinning operation to obtain a thinned vessel image of one-pixel width. The vessel network structure is described by this thinned image. Supplementing each vessel pixel of the thinned image with width information then provides a full description of the vessel network. Given a MS and a GT, we propose to measure the detection rate by comparing the vessel network structures of MS and GT only, i.e. how much of the GT vessel network structure is detected in MS. In a second step the widths of matched MS and GT vessel pixels are compared to give a measure of detection accuracy. A pseudo-code description of the evaluation procedure is given in Figure 3.

To measure the detection rate we have to determine how many of the vessel pixels in thinned GT are detected in MS. Due to the uncertainty in both manual ground-truthing and vessel detection, a strict pixel-wise comparison is not useful. Traditionally, point matching is based on some threshold, i.e. two points less than the threshold apart are considered as matched. This is particularly true in the evaluation of contour detection algorithms [1, 11]. In our case the characteristic of curvilinear structures gives us a natural limiting threshold without any manual specification. We consider a vessel pixel p in thinned GT as detected (true positive, TP) if p is also a vessel pixel in MS. In practice we only require that the window W(p) around p contains at least one vessel pixel in MS, in order to tolerate positional bias in the case of very thin, say one-pixel wide, curvilinear structures.

Then, the true positive rate (TPR) results from dividing the number of true positives by the total number of vessel pixels in thinned GT. Undetected vessel pixels in thinned GT are counted as false negatives (FN, missing vessels). A division by the total number of vessel pixels in thinned GT supplies the false negative rate (FNR). In a similar manner, false positives (FP, spurious vessels) are those vessel pixels q in thinned MS for which there is no vessel pixel in GT in the window W(q) around q.

In addition to computing FN and FP it is also interesting to ask about the characteristics, for instance the width, of the missing and spurious vessels. It is probably more problematic to miss or erroneously detect thick vessels than thin ones. To obtain this information we can establish a width histogram of the false negatives and false positives. Here the width is defined by the Euclidean distance to the nearest pixel in the background.

    /* Detection rate */
    for each vessel pixel p in thinned GT
        if there is at least one vessel pixel in W(p) in MS
            TP++; mark p as detected;
        else
            FN++; FNHist[width(p)]++;
    TPR = TP / (number of vessel pixels in thinned GT);
    FNR = FN / (number of vessel pixels in thinned GT);
    for each vessel pixel q in thinned MS
        if there is no vessel pixel in W(q) in GT
            FP++; FPHist[width(q)]++;

    /* Detection accuracy */
    for each detected vessel pixel p in thinned GT
        find q in thinned MS nearest to p;
        sum1 += distance(p,q);
        sum2 += |width(p) - width(q)|;
    AvgPosAccuracy = sum1 / #(detected vessels in thinned GT);
    AvgWidthAccuracy = sum2 / #(detected vessels in thinned GT);

Figure 3. Evaluation procedure.

                     MS1                    MS2
    TPR              99.9%                  77.7%
    FNR              0.1% (4 pixels)        22.3% (1726 pixels)
    FNHist           1:50%, 2:25%, 3:25%    1:62%, 2:34%, 3:4%
    FP               0                      0
    pos. accuracy    0.00                   0.01
    width accuracy   0.26                   0.00

                     MS3                    MS4
    TPR              100.0%                 100.0%
    FNR              0.0%                   0.0%
    FP               0                      408
    FPHist           -                      1:8%, 2:8%, 3:8%, 4:21%, 5:20%, 6:29%, 7:6%
    pos. accuracy    0.05                   0.00
    width accuracy   0.28                   0.00

Table 1. Evaluation results for MS1, MS2, MS3, and MS4

In the second part of the evaluation we measure the detection accuracy, which consists of position accuracy and width accuracy. Only those vessel pixels in thinned GT marked as detected are involved. For each such point p, its match q is defined to be the vessel pixel in thinned MS with the smallest Euclidean distance to p. Then, the position accuracy is simply the Euclidean distance between p and q. Similarly, the width accuracy is given by the difference between the widths of p and q. In both cases we finally compute the average accuracy through a division by the total number of detected vessel pixels in thinned GT.
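The procedure of Figure 3 can be sketched in Python as follows. This is a minimal brute-force illustration, not the authors' implementation. The function names, the string-based image format ('#' for vessel pixels), the 3x3 window (radius=1), the rounding of widths for the histograms, and the absolute value in the width error are all assumptions made for the example. The thinned images are supplied by the caller rather than computed. A pixel-wise helper in the style of [3, 5] is included for comparison.

```python
import math
from collections import Counter


def vessel_pixels(img):
    """Set of (row, col) coordinates marked '#' (vessel) in a list of strings."""
    return {(r, c) for r, row in enumerate(img)
            for c, v in enumerate(row) if v == "#"}


def make_width(img):
    """Return width(p): Euclidean distance from p to the nearest background pixel.

    Brute force for clarity; assumes img has at least one background pixel.
    """
    background = [(r, c) for r, row in enumerate(img)
                  for c, v in enumerate(row) if v != "#"]
    return lambda p: min(math.dist(p, b) for b in background)


def in_window(p, pixels, radius):
    """True if the (2*radius+1) x (2*radius+1) window centred on p meets pixels."""
    r, c = p
    return any((r + dr, c + dc) in pixels
               for dr in range(-radius, radius + 1)
               for dc in range(-radius, radius + 1))


def evaluate(gt, ms, gt_thin, ms_thin, radius=1):
    """Detection rate and accuracy; radius=1 (3x3 window) is an assumption."""
    gt_px, ms_px = vessel_pixels(gt), vessel_pixels(ms)
    gt_skel, ms_skel = vessel_pixels(gt_thin), vessel_pixels(ms_thin)
    width_gt, width_ms = make_width(gt), make_width(ms)

    # Detection rate: thinned-GT pixels with an MS vessel pixel in their window.
    detected = {p for p in gt_skel if in_window(p, ms_px, radius)}
    missed = gt_skel - detected
    # Spurious vessels: thinned-MS pixels with no GT vessel pixel in their window.
    spurious = {q for q in ms_skel if not in_window(q, gt_px, radius)}

    # Detection accuracy over detected pixels only.
    pos_sum = width_sum = 0.0
    for p in detected:
        q = min(ms_skel, key=lambda s: math.dist(p, s))  # nearest thinned-MS pixel
        pos_sum += math.dist(p, q)
        width_sum += abs(width_gt(p) - width_ms(q))

    n = len(detected)
    return {
        "TPR": n / len(gt_skel),
        "FNR": len(missed) / len(gt_skel),
        "FP": len(spurious),
        # Histogram bins use rounded widths (an assumption of this sketch).
        "FNHist": Counter(round(width_gt(p)) for p in missed),
        "FPHist": Counter(round(width_ms(q)) for q in spurious),
        "pos_accuracy": pos_sum / n if n else 0.0,
        "width_accuracy": width_sum / n if n else 0.0,
    }


def pixelwise_rates(gt, ms):
    """Pixel-wise TPR and FPR as described for the method of [3, 5]."""
    gt_px, ms_px = vessel_pixels(gt), vessel_pixels(ms)
    total = sum(len(row) for row in gt)
    tpr = len(ms_px & gt_px) / len(gt_px)
    fpr = len(ms_px - gt_px) / (total - len(gt_px))
    return tpr, fpr


# Toy example: GT is a band three pixels thick, MS detects it one pixel thick.
GT = [".........", ".........", ".#######.", ".#######.",
      ".#######.", ".........", "........."]
MS = [".........", ".........", ".........", ".#######.",
      ".........", ".........", "........."]
SKELETON = MS  # hand-made thinned image, shared by GT and MS in this example

if __name__ == "__main__":
    print(pixelwise_rates(GT, MS))
    print(evaluate(GT, MS, SKELETON, SKELETON))
```

On this toy input the pixel-wise TPR is 7/21, although the whole structure was found, whereas the sketch reports TPR = 1.0 together with an average width error of 5/7 pixel. For real images, a thinning routine and a distance transform from an image-processing library would typically replace the hand-made skeleton and the brute-force width computation.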

4 Experimental results

First we show how our method evaluates the four images in Figures 1 and 2, see Table 1. As desired, MS1 has a TPR near 100%, implying a full detection of the vessel network structure. The fact that the detected vessels are thinner than GT is expressed by the width error of 0.26 (pixel). In contrast, MS2 leads to TPR=77.7% only, and accordingly 22.3% of the vessel network structure is undetected. Based on the histogram of false negatives we see further that the missing vessels are relatively thin; 96% of the missing parts have a width up to 2. Here the width error is zero, indicating a perfect detection of 77.7% of the vessel network. The interpretation of these evaluation measures is exactly what we postulated for a more informative and precise performance evaluation compared to the evaluation method in [3, 5].

Similar improvement can also be observed for MS3 and MS4. No false positive is found in MS3, compared to 408 in MS4. Furthermore, the spurious vessels in MS4 are mainly thick ones, with 76% being at least 4 pixels wide. Although both MS3 and MS4 have 100% TPR, MS3 is not a perfect detection. The width error of 0.28 indicates some detection inaccuracy.

Now we demonstrate our evaluation method on real vessel detection results. Figure 4 shows a fundus image, the corresponding ground truth, and the vessel detection results (MS) from two different algorithms. Using the evaluation method from [3, 5] we obtain:

    First MS:   TPR=0.919, FPR=0.040
    Second MS:  TPR=0.803, FPR=0.019

Figure 4. Top: fundus image and GT. Bottom: two detection results (MS).

There is a large difference (11.6%) in TPR. Also, the first MS contains about twice as many false positives as the second. The evaluation measures based on our approach are listed in Table 2. Actually, the first MS detects only 4.7% more of the vessel network structure than the second. The much larger difference of 11.6% above is explained by the fact that the first MS tends to be thicker than GT. Thus, it produces a better pixel-wise matching result, but also a higher FPR. Measured by our method, this is expressed by a larger width error for the first MS than for the second. On the other hand, the two MS images contain essentially the same amount of spurious vessels, although those in the first MS are slightly thicker than in the second. Here our performance measures again provide a more detailed and precise description of the differences between algorithmic results and ground truth.

                     First MS                     Second MS
    TPR              89.0%                        84.3%
    FNR              11.0%                        15.7%
    FNHist           1:94%, 2:4%, 3:2%            1:96%, 2:3%, 3:1%
    FP               594                          620
    FPHist           1:27%, 2:48%, 3:17%, 4:8%    1:81%, 2:15%, 3:4%
    pos. accuracy    0.54                         0.45
    width accuracy   0.75                         0.51

Table 2. Evaluation results for the two MS images in Figure 4

5 Conclusion

Compared to other commonly used features such as edges, there is relatively little work on curvilinear structure detection and its performance evaluation. In this paper we have proposed a novel supervised methodology for evaluating the performance of curvilinear structure detection algorithms. We consider the two aspects of performance, namely detection rate and detection accuracy, separately, in contrast to their mixed handling in earlier approaches, which typically produces a biased impression of detection quality. By doing so, the proposed performance measures give a more informative and precise performance characterization. Both synthetic and real examples have been used to demonstrate the advantages of our approach. The experimental work has been embedded in the context of blood vessel detection in fundus images. It is important to point out that our approach is applicable in the general context of the evaluation of curvilinear structure detection algorithms.

References

[1] S. Dougherty and K.W. Bowyer, Objective evaluation of edge detectors using a formally defined framework, in: Empirical Evaluation Techniques in Computer Vision (K.W. Bowyer and P.J. Phillips, Eds.), IEEE Computer Society Press, 210–234, 1998.
[2] M.D. Heath, S. Sarkar, T. Sanocki, and K.W. Bowyer, A robust visual method for assessing the relative performance of edge-detection algorithms, IEEE Trans. on PAMI, 19(12): 1338–1359, 1997.
[3] A. Hoover, V. Kouznetsova, and M. Goldbaum, Locating blood vessels in retinal images by piece-wise threshold probing of a matched filter response, IEEE Trans. on Medical Imaging, 19(3): 203–210, 2000.
[4] J.-H. Jang and K.-S. Hong, Linear band detection based on the Euclidean distance transform and a new line segment extraction method, Pattern Recognition, 34(9): 1751–1764, 2001.
[5] X. Jiang and D. Mojon, Adaptive local thresholding by verification-based multi-threshold probing with application to vessel detection in retinal images, under revision for IEEE Trans. on PAMI, 2002.
[6] Th.M. Koller, G. Gerig, G. Szekely, and D. Dettwiler, Multiscale detection of curvilinear structures in 2-D and 3-D image data, Proc. of ICCV, 864–869, 1995.
[7] S.U. Lee, S.Y. Chung, and R.H. Park, A comparative performance study of several global thresholding techniques for segmentation, Computer Vision, Graphics, and Image Processing, 52: 171–190, 1990.
[8] M.C. Shin, D.B. Goldgof, K.W. Bowyer, and S. Nikiforou, Comparison of edge detection algorithms using a structure from motion task, IEEE Trans. on SMC, Part B: Cybernetics, 31(4): 589–601, 2001.
[9] J.A. Shufelt, Performance evaluation and analysis of monocular building extraction from aerial imagery, IEEE Trans. on PAMI, 21(4): 311–326, 1999.
[10] C. Steger, An unbiased detector of curvilinear structures, IEEE Trans. on PAMI, 20(2): 113–125, 1998.
[11] C. Wiedemann, C. Heipke, and H. Mayer, Empirical evaluation of automatically extracted road axes, in: Empirical Evaluation Techniques in Computer Vision (K.W. Bowyer and P.J. Phillips, Eds.), IEEE Computer Society Press, 172–187, 1998.

1051-4651/02 $17.00 (c) 2002 IEEE