Journal of Electronic Imaging 17(3), 031104 (Jul–Sep 2008)
Practical use of receiver operating characteristic analysis to assess the performances of defect detection algorithms

Yann Le Meur
Jean-Michel Vignolle
Trixell, 38430 Moirans, France
E-mail: [email protected]

Jocelyn Chanussot
GIPSA-Lab, Département Images et Signaux, 38402 Saint Martin d'Hères, France
Abstract. Defect detection in images is a common task in quality control and is often integrated in partially or fully automated systems. Assessing the performances of defect detection algorithms is thus of great interest. However, because this assessment is application- and context-dependent, it remains a difficult task. We describe a methodology to measure the performances of such algorithms on large images in a semi-automated defect inspection situation. Considering standard problems occurring in real cases, we compare typical performance evaluation methods. This analysis leads to the construction of a simple and practical receiver operating characteristic (ROC) based method. This method extends the pixel-level ROC analysis to an object-based approach by dilating the ground truth and the set of detected pixels before calculating the true-positive and false-positive rates. These dilations are computed thanks to the a priori knowledge of a human-defined ground truth and give the true-positive and false-positive rates more consistent values in the semi-automated inspection context. Moreover, the dilation process is designed to adapt automatically to the object's shape, so that it can be applied to all types of defects without any parameter to be tuned. © 2008 SPIE and IS&T. [DOI: 10.1117/1.2952590]
1 Introduction
Quality-control tasks are among the main application fields of digital image processing, particularly detection theory. The increasing number of new image processing techniques applied to industrial inspection is relevant proof of the interest taken by both the industrial and academic communities in this problem. The leading applications of these techniques are defect detection on textile, wood, or other industrial materials by automated inspection on digital images.1,2 These images can be acquired by simple optical imaging, x-ray imaging, or nondestructive methods like ultrasound reflection on the surfaces to be inspected. This paper considers the retrieval of defects on digital x-ray detectors. Digital detectors are
now used in x-ray radiography to acquire digital images. The advantages of this fully digital system are obvious: Lower exposure is required than with film systems; the images have a better quality; the digital format enables easy storage and transmission; digital processing algorithms can be used in order to enhance diagnostic reliability, etc. Since the production process of such devices is lengthy and requires human intervention, an important issue is to check the quality of the detector's output images, particularly to search for potential defects in these images. The detection and localization of such defects can be achieved by image processing algorithms. Defects from digital x-ray detectors produce spurious features in the output images, with various shapes and properties. Their detection thus remains a difficult problem, and several algorithms must consequently be considered, evaluated, and compared. Different methods to quantify the detection performances of an algorithm have been described in the literature. In the context of text detection and recognition, Wolf and Jolion3 assess the performances by rectangle matching and performance graphs. Liu and Haralick4 proposed a simple method based on neighborhood inspection to evaluate edge detection performances. Nascimento and Marques5 classify types of detection errors in order to build a metric for the evaluation of object detection algorithms in surveillance applications. More general methods use common metrics merged in a basic way6 or with fuzzy logic.7 Even though they can be useful for specific applications, the reliability of these methods vanishes when considering the task of detecting different objects with various shapes. We take a practical view of how defect detection algorithms can be evaluated and propose an assessment method based on the well-known ROC analysis and on object morphology. An overview of the inspection task is followed by a brief description of the ROC methodology, and an original method derived from this methodology is presented and discussed. Finally, we illustrate how to use such a method to process an automated thresholding of detection images. This paper is an extended
version of our presentation8 in 2007 at the Eighth International Conference on Quality Control by Artificial Vision (QCAV).

2 The Inspection Task
Digital x-ray detectors provide large grayscale images, larger than 3,000 × 3,000 pixels. The inspection task consists of finding defects in these images. The considered images are acquired with a nominal x-ray dose and without any subject between the x-ray generator and the detector: Consequently, the images are composed only of the acquisition noise, which we will consider as background, and of the potential defects we want to detect. In this case, the goal of detection algorithms is to localize defect areas so that a human expert can then inspect them more specifically. The quality-control task is not fully automated: The detection algorithms bring assistance to make human decisions faster and more reliable. This is a typical situation in an industrial context: The detection algorithms are designed to catch the attention of a human operator and to focus it only on potential defect areas in order to make the inspection task less tedious. The evaluation of the algorithm performances must take this context into account and provide a measure suitable for various defects. The main assumptions introduced by this specific application are as follows:

• The perfect location of all the defective pixels is not required: Actually, the human expert just needs some pixels at each defect location to identify it. On the other hand, the whole set of defects must be identified by the detection of at least one pixel lying inside each target. A target, or a defect within the ground-truth map, is defined as a connected set of defective pixels.

• The borders of each target are approximately defined: Since the design of a ground truth remains subjective, the areas near the borders of a defect can be seen as either defective pixels or background pixels. Moreover, some defects have naturally fuzzy borders; a precise and certain ground truth can thus not be defined.

These observations imply particular definitions in order to use the ROC methodology, which we will describe in Section 4. In our specific application, there are two kinds of defects to detect:

1. punctual defects: isolated pixels or little clusters of pixels with abnormal statistics (strong luminance);

2. extended defects: a spatially correlated set of pixels that are not statistically atypical when considered individually. They come in the shape of lines, columns, spots, or gathered clusters. They contain several pixels, which are not necessarily punctual defects.

Figure 1 shows examples of synthetic defects with the associated defect map designed by a human expert. These examples have been "handmade" to illustrate extreme cases of defects that could occur in any kind of imaging system. This figure spotlights the various ways to build the defect maps, depending on the type of defect. For punctual defects, the defect map is defined precisely, whereas for extended defects, like the fuzzy spot, the defect map is more subjective. When several clusters are gathered [as on
Fig. 1 Different kinds of defects and corresponding defect map drawn by a human expert. One can notice the subjectivity of this task: borders are delimited approximately, and clusters of defective pixels can be gathered to form a single defective cluster.
the right side of Fig. 1(a)], the defect map includes the clusters and some nondefective pixels in one single object. It also underlines the scale adaptation required to identify high-level structures of defects. As a matter of fact, the clusters of defective pixels on the right side of Fig. 1(a) are not identified as several punctual defects, but rather as a single defective area made of punctual defective pixels. These remarks should be considered when designing a method to assess the performance of defect detection algorithms in such a semi-automated inspection task.

3 The ROC Analysis
3.1 Definitions and ROC Curves
Initially developed for the evaluation of radar detection systems in the 1950s, ROC analysis was first described in terms of signal detection theory in the mid-1960s.9,10 Over the years, it has become a standard technique to evaluate detection performances.11,12 First used to measure the diagnostic performances of medical imaging systems, especially in radiological imaging,13–15 the ROC methodology has since been extended to various detection systems. For a single-target (a defective area in our case) problem, the ROC analysis consists of measuring the binary response of the detection system (target present or not) to one stimulus, in our case an image, by calculating the true-positive rate tpr and the false-positive rate fpr with

$\mathrm{tpr} = \dfrac{\text{true positives}}{\text{total positives}}$,  (1)

$\mathrm{fpr} = \dfrac{\text{false positives}}{\text{total negatives}}$.  (2)
Figure 2 presents the classical representation of a confusion matrix. A pair (fpr, tpr) corresponds to one point in the ROC plane. ROC curves are computed for varying parameters of the detection system: tpr and fpr are computed for each value of the parameter. The ROC analysis is an appropriate tool to deal with detection performances since it takes the prevalence of each class into account and provides two complementary and intuitive measures that are meaningful for the system calibration.
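To make the computation of one ROC point concrete, here is a minimal sketch (our illustration, not code from the paper; the array names decision and truth are our own) that tallies the confusion matrix of Fig. 2 over two binary images and returns the (fpr, tpr) pair of Eqs. (1) and (2):

```python
import numpy as np

def confusion_counts(decision, truth):
    """Tally the four cells of the confusion matrix (Fig. 2) for
    per-pixel binary decisions (True = defect) against the ground truth."""
    tp = np.count_nonzero(decision & truth)    # hits
    fn = np.count_nonzero(~decision & truth)   # misses
    fp = np.count_nonzero(decision & ~truth)   # false alarms
    tn = np.count_nonzero(~decision & ~truth)  # correct rejections
    return tp, fn, fp, tn

def roc_point(decision, truth):
    """Eqs. (1) and (2): tpr = TP / total positives, fpr = FP / total negatives."""
    tp, fn, fp, tn = confusion_counts(decision, truth)
    return fp / (fp + tn), tp / (tp + fn)
```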
Fig. 2 The confusion matrix represents the true positive and the false positive for a defect detection task. These two quantities are plotted on ROC curves.
In a defect detection context, there can be many defects in one image. The true positives and false positives can then be computed on this image following the Free-ROC methodology16 (FROC). Free-ROC is an extension of the ROC methodology to target localization, while ROC only deals with target detection. In our application, the advanced theory of Free-ROC is not needed, so we will only consider basic tools of the ROC methodology. Considering an image with several targets—the defects—and a defect detection algorithm providing a pixel-by-pixel classification with two classes (defect or no defect), there are four cases for each pixel pi of the image:

1. pi is classified as defect and is a defect in the ground-truth image; it is a true positive, also called a hit.
2. pi is classified as background (no defect) and is a background pixel in the ground-truth image; it is a true negative.
3. pi is classified as defect and is a background pixel in the ground-truth image; it is a false positive, also called a false alarm.
4. pi is classified as background and is a defect pixel in the ground-truth image; it is a false negative.

The main advantage of ROC analysis is that the two quantities tpr and fpr are normalized to the number of positive and negative samples, respectively. Then, unlike traditional measures like accuracy (the percentage of pixels correctly classified), tpr and fpr cannot be biased by a small prevalence of one class compared to the other. To illustrate this statement, let us consider a detection task on a 1,000 × 1,000-pixel image with one single defective pixel. An algorithm that systematically detects nothing is actually very accurate: 99.9999% of the pixels are correctly classified. On the other hand, both its fpr and tpr will be 0%, thus revealing really bad detection performances. In conclusion, the sole accuracy of an algorithm does not provide enough information to ensure a reliable estimation and is biased in this case by the small prevalence of the defect class. ROC analysis features in one curve the sensitivity (equivalent to the true-positive rate tpr) of the detection system versus fpr, which are the two quantities of interest in a quality-control context. It indicates how many false
Fig. 3 Examples of ROC curves. In this example, four ROC curves have been plotted. The first one, from algo1, shows that this detection algorithm provides a perfect result (100% of true positives for all decision thresholds). The curve from algo4 has been computed from a detection algorithm that decides randomly between the two possible classes (defect or no defect). The two other curves come from two other detection algorithms. Algo2 performs a better detection since its curve always stays above algo3's curve.
alarms are generated by the system for a given detection sensitivity. Moreover, an ROC curve shows the dynamic behavior of the system with respect to a change in the decision threshold. This information can be used to choose between two detection systems: The ROC curve of the better detection system is always higher than the other curve in the ROC plane. The ROC curves of four detection algorithms are displayed in Fig. 3. The perfect detection algorithm is "algo1," whose ROC curve is a step function (100% tpr for any fpr). In the case where the two classes, defects and background, are equally distributed, an algorithm corresponding to a random decision has the ROC curve "algo4" (the ascending diagonal of the ROC plane). Between random and ideal decision, "algo2" performs better than "algo3." A practical measure of the global performance of an algorithm is given by the area under its ROC curve. This area under the curve (AUC) is commonly used to quantify with one single number the overall performance of a detection algorithm.13,17,18

4 Comparison of the Masks
In Section 3, the ROC analysis was presented as a useful tool to assess the performances of a detection algorithm. However, a major problem remains: How can tpr and fpr be estimated? In other words, which pixels should be considered as true positives or false positives? At each decision threshold, tpr and fpr are calculated by comparing a binary detection mask (with ones for defects and zeros for the background) with a ground-truth mask. In the following, the detected defect mask is called the test mask, Mi,j. It results from a pixel-wise decision produced by the detection algorithm. The manually designed ground-truth mask is called the target mask, Ti,j. In practice, the simplest way to compare these two masks is to make a pixel-level comparison, thus exactly fitting the definitions of false positive and true positive given in Section 3.
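Before turning to the mask-comparison rules themselves, the following sketch (ours, with hypothetical names) shows how an ROC curve and its AUC are typically obtained in practice: the decision threshold is swept over the grayscale detection image, one (fpr, tpr) point is computed per threshold with a chosen comparison rule, and the area under the resulting curve is integrated numerically.

```python
import numpy as np

def roc_curve(score_img, truth, comparator, n_thresholds=100):
    """Sweep the decision threshold over a grayscale detection image;
    `comparator` maps (binary test mask, target mask) to one (fpr, tpr)."""
    thresholds = np.linspace(score_img.min(), score_img.max(), n_thresholds)
    points = [comparator(score_img >= t, truth) for t in thresholds]
    fpr, tpr = (np.asarray(v) for v in zip(*points))
    order = np.argsort(fpr)            # integration needs fpr ascending
    return fpr[order], tpr[order]

def auc(fpr, tpr):
    """Area under the ROC curve by trapezoidal integration."""
    return float(np.trapz(tpr, fpr))
```

With roc_point from the previous sketch passed as comparator, this traces the pixel-level ROC curve; the object-based comparators discussed below can be plugged in the same way.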
Fig. 4 tpr and fpr: In each image, the targets to detect are in gray and the detected pixels are in white. The first two images have the same tpr, and the last two have the same number of false-alarm pixels. Nevertheless, in our inspection context, the configurations of (a) and (c) are considered better detection results since all the targets have been detected and there is only one true false alarm. Indeed, the detected pixels around the target in (c) should not be counted as false alarms, as they are very close to the target.
As discussed in Section 2, the human expert does not need the detector to provide pixel-wise precision for the localization of the defects. Based on this assumption, alternative methods have been proposed to calculate tpr and fpr: Theiler et al.19 suggest transforming the test mask in order to consider each object of the target mask as 100% detected as soon as at least one pixel is detected in this object. In a similar way, Harvey and Theiler20 proposed dilating the test mask by a fixed factor, in order to include some of the "near-hit" pixels of the test mask in the tpr count. The construction of such alternative methods becomes significant in our semi-automated inspection context, where we have to be more specific about the definition of a true positive and a false negative.

First, let us consider the true positives: In a multitarget situation (i.e., there is more than one target in the ground truth), the tpr should be 100% if the detected pixels lead to the visual localization of all targets, even if not all the defective pixels are detected. In the example in Fig. 4(a), about 66% of the defective pixels are detected (hatched areas), but the detection allows the expert to localize all the targets. The global tpr in Fig. 4(b) is also 66%, but for the human expert the detection is clearly worse since the bottom target has been completely missed.

Now, considering false alarms, we should make a distinction between isolated false alarms, false alarms close to a target ("near hits"), and clusters of false alarms. For the human expert, the number of false alarms is the number of observation windows, called AOIs ("areas of interest"), that the system will display to be checked. Each window displayed with no true defect will be considered a false alarm. In the example in Fig. 4, the two images of Figs. 4(c) and 4(d) both have 10 false-alarm pixels. But for Fig. 4(c), the human expert will consider the detection to have only one false alarm: The few detected pixels near the target cannot be seen as a false alarm because the associated AOI will include a part of the target. The other detected pixels form a cluster that will be embedded in only one AOI. This detected AOI consequently raises the only false alarm of the image. In Fig. 4(d), there are only isolated false alarms: The expert has to check 10 AOIs that do not include a defect; the detection system then raises 10 false alarms.

Taking these observations into account, we propose a method to compare binary masks that penalizes less the detected pixels near the object borders. These
pixels are counted as false alarms by all the previous methods. The main purpose of such a new method is to extend the pixel-based ROC analysis to an object-based analysis while remaining within the ROC framework. The proposed method does not require any extra parameter to be tuned and can be applied to cases where multiple targets of different sizes and shapes are to be found in a single image. In the following parts, three mask comparison methods are described and discussed, namely, the simple pixel-level comparison, Theiler's method, and our proposed method. The proposed method focuses on the problems raised by the semi-automated inspection task. Harvey's comparison method requires a strong a priori knowledge of the size of the targets; it is thus not further developed here.

4.1 Pixel-Level Mask Comparison
The first and most intuitive method is to compute the binary comparison between target and test masks, without any preprocessing. Considering the binary target mask Ti,j with P defective pixels (pixels with value 1) and N background pixels (with value 0), the pixel-level mask comparison is described by Fig. 5. The "pixel count" box returns the number of pixels with value 1 in the input image, and the "not" box stands for the binary complement operator. tpr and fpr are thus computed as follows:
$\mathrm{tpr} = \dfrac{1}{P} \sum_{i,j} M_{i,j} \cdot T_{i,j}$,  (3)

$\mathrm{fpr} = \dfrac{1}{N} \sum_{i,j} M_{i,j} \cdot (1 - T_{i,j})$.  (4)
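A direct transcription of Eqs. (3) and (4), given as our own illustrative sketch of the pipeline of Fig. 5 (not the authors' code):

```python
import numpy as np

def pixel_level_comparison(M, T):
    """Pixel-level mask comparison of Fig. 5: M is the binary test mask
    (1 = detected), T the binary target mask (1 = defect)."""
    M = M.astype(float)
    T = T.astype(float)
    P = T.sum()                       # number of defective pixels
    N = T.size - P                    # number of background pixels
    tpr = (M * T).sum() / P           # Eq. (3)
    fpr = (M * (1.0 - T)).sum() / N   # Eq. (4)
    return fpr, tpr
```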
The pixel-level mask comparison is computed on the three synthetic defects in Figs. 6(a)–6(c). The corresponding ground truth is displayed in gray on the second line of this figure, with the test mask appearing in white. On these latter images, missed pixels are in gray and detected pixels are always in white, whether they are false alarms or true positives. The three defects chosen are

• punctual defects: six isolated defective pixels,
• fuzzy spot defect: a bright spot with fuzzy borders,
• cluster defects: a cluster of defective pixels forming an area identified by the human expert as one single defective object.

For the purpose of clarity, the ground-truth pixels for the punctual defects are pointed out in Fig. 6(d) by arrows. Table 1 presents the computed tpr and fpr for the three defects with this pixel-level mask comparison. For the isolated pixels, the simple pixel-level mask comparison provides satisfactory results. For these kinds of defects, the exact location is required, and false alarms are raised even if the detection is close to the defect. In practice, the system must be very precise for this kind of defect in order to direct the human expert's attention to the exact pixel, because of the very small size of the defects. In this case the pixel-level comparison performs well, giving a tpr
Fig. 5 The pixel-level mask comparison computes the tpr and fpr by direct comparison between target and test masks. “Not” stands for binary complement, and “count” returns the number of white pixels in the input image.
of one out of two and an fpr that correctly represents the number of times the expert will focus on a nondefective pixel. The situation of the fuzzy spot defect is more ambiguous. In the present example, some pixels inside the target are actually detected, but not all of them. In the meantime, the ground truth is set as a circular area that includes the fuzzy spot. In this case, the algorithm's detection is almost perfect: The number of good detections and their location at the center of the defect are sufficient information for the human expert to properly identify the defect. But due to the ground-truth definition, the fpr and tpr computed at a pixel level are far from the fairly good expected values. Moreover, one can consider the detected pixel at the bottom left of the defect as a near hit, which should then not be treated as a false alarm. As a matter of fact, the assessment of detection performances faces the high subjectivity linked to the design of the ground truth, especially in this case where the defect's borders are fuzzy. This subjectivity is not integrated in the pixel-level comparison.
Fig. 6 Three kinds of defects and corresponding target and test masks. Pixels in gray are those from the target mask, drawn by a human expert. Pixels in white are the pixels detected as defective by a detection algorithm.
Table 1 tpr and fpr computed with the pixel-level mask comparison.

Defect      fpr (%)    tpr (%)
Punctual    0.12       50.0
Spot        0.03       3.8
Cluster     0.12       1.5

Table 2 tpr and fpr for Theiler's mask comparison.

Defect      fpr (%)    tpr (%)
Punctual    0.12       50
Spot        0.03       100
Cluster     0.12       100
Similar remarks hold for the example of the cluster defect. Again, some pixels (on the left and right sides of the cluster) are counted as false alarms when they are too close to the borders to be objectively considered as such. Second, many pixels are detected inside the cluster (mainly the punctual defects inside this cluster). Nevertheless, the tpr barely reaches 2%, whereas a human expert would identify the whole defect area thanks to these detected pixels. These examples underline the unsuitability of the pixel-level comparison to assess detection performances in a complex context. The main reason is the following: The method does not take into account the assumptions underlined in Section 2.

4.2 Theiler's Mask Comparison
As the simple pixel-level comparison leads to an inappropriate assessment of detection performances, there is a need for a technique that somehow mimics the human expert. To this end, Theiler proposed a metric that performs a higher-level interpretation of the test mask. This technique relies on a "filling-in" process: All of a target's pixels are considered detected if the detection algorithm detects at least one of them (see Fig. 7). tpr and fpr are thus computed as follows:

$\mathrm{tpr} = \dfrac{1}{P} \sum_{i,j} \mathrm{Fill}(M_{i,j}) \cdot T_{i,j}$,  (5)

$\mathrm{fpr} = \dfrac{1}{N} \sum_{i,j} M_{i,j} \cdot (1 - T_{i,j})$,  (6)
where Fill is the filling-in operator. Following this approach, tpr and fpr are computed on the defects of Fig. 6. The corresponding results are displayed in Table 2. For punctual defects, Theiler's method provides satisfactory results, similar to those obtained with the pixel-level mask comparison. For the other two examples, the fpr according to Theiler's comparison is unchanged, but the tpr is now 100%. As a matter of fact, in each case, at least one pixel is detected inside the target. This strategy is well suited in some cases: For the fuzzy spot defect, the detected pixels are centered on the defect and are distributed over an area that is not too different from the true defect. The expert will then consider all these detected pixels as one detection, allowing him to find the defect. The tpr of 100% is a correct measure of the detection performances. On the other hand, the limits of this filling-in strategy arise when considering the cluster defect. In this case, mainly the bottom part of the defect is detected, while the top part has no detection. The expert can miss one part of the defect due to a lack of detected pixels over the whole area of the ground truth. This fact is not expressed, since Theiler's method attributes a tpr of 100% to the detection on the cluster defect.
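Theiler's filling-in operator can be sketched with connected-component labeling; the following is our reading of Eqs. (5) and (6) using scipy.ndimage (the paper does not specify an implementation):

```python
import numpy as np
from scipy import ndimage

def theiler_comparison(M, T):
    """Theiler's mask comparison (Fig. 7): a target counts as fully
    detected as soon as at least one of its pixels is detected."""
    M, T = M.astype(bool), T.astype(bool)
    labels, n = ndimage.label(T)               # one label per target
    filled = np.zeros_like(M)
    for k in range(1, n + 1):
        target = labels == k
        if (M & target).any():                 # at least one hit
            filled |= target                   # fill in the whole target
    P = np.count_nonzero(T)
    N = T.size - P
    tpr = np.count_nonzero(filled & T) / P     # Eq. (5)
    fpr = np.count_nonzero(M & ~T) / N         # Eq. (6)
    return fpr, tpr
```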
Fig. 7 Theiler’s mask comparison diagram. It adds “fill-in” operators to the previous pixel-level method. Then, if at least one pixel is detected in a target, all the pixels of this target will be counted as true positives. Journal of Electronic Imaging
Fig. 8 Original soft mask comparison. Target and test masks are dilated by a dilation factor computed in accordance with the size of each target. This process mimics the fpr and tpr counts that a human expert would make in an inspection context.
This problem may occur especially with extended defects and in a multitarget context. The fpr computation follows the same rule as the pixel-level comparison, with the drawbacks described in the previous section. To conclude, Theiler's comparison extends the pixel-level approach to an object-based approach by simply adding a filling-in process to the test and target mask comparison. In a single-target context, and with targets that are small compared to the image size, this comparison performs an acceptable assessment in terms of inspection. Nevertheless, it does not take into account the distribution of the detected pixels inside the target, which is a criterion to examine before declaring the target 100% detected. Moreover, the false-alarm computation remains inappropriate.

4.3 The Proposed Soft Mask Comparison
In this section, we present an original method that provides a practical response to the performance assessment requirements and overcomes the limitations of the standard methods previously described. The principle of our method is illustrated by Fig. 8. The method requires two new operators: target dilation and test dilation. The target dilation is applied on the target mask and aims at extending the areas of targets to take into account the subjectivity of the "ground-truthing" task and the fuzzy nature of target borders. The test dilation is applied on the test mask, which contains the pixels detected by the detection algorithm. This process mimics the way a human expert would analyze the detection result—she would focus her attention not only on the detected pixels, but also on the pixels around them. These dilation processes require the computation of a distance map, as explained in the following. This distance map needs to be computed only once for a given target mask. The computation of several ROC points (for varying test masks) then requires only one distance map computation and one target dilation. The proposed method thus only adds a few operations compared to the pixel-level methods: the dilations on the test mask.
4.3.1 Distance map computation
The distance map is computed on the ground truth (target mask). It provides a map with a dilation factor assigned to each target. The computation of the distance map is done in two steps. First, a Euclidean distance transform21 of the target mask is computed, and then the directed Hausdorff distance22 of each target is determined. These two steps are shown in Fig. 9.

1. Euclidean distance transform. The Euclidean distance between pixels (i, j) and (m, n) is defined as

$d_e((i,j),(m,n)) = \sqrt{(i-m)^2 + (j-n)^2}$.

The Euclidean distance transform of the target mask calculates the Euclidean distance between the pixels of the target and the background. The pixel values of the resulting image D(i, j) are computed as follows:

$D(i,j) = \min_{m,n} \{ d_e((i,j),(m,n)) \mid T(m,n) = 0 \}$.
This value represents the minimum distance of each pixel of the target mask to the background. Figure 9 illustrates an example of a Euclidean distance transform on an image with two targets. The Euclidean distance of a
Fig. 9 Distance map computation. The Euclidean distance transform of the target mask is computed first. It represents the Euclidean distance of each target pixel (in white) to the background (in black). For each target, the maximum of the Euclidean distance transform is taken; it is the directed Hausdorff distance of the target, which will be stored in the distance map.
Fig. 10 Target dilation process. 共a兲 The two targets 共in white兲 are dilated with respect to the dilation factors stored in the distance map. The structuring elements used for these dilations are displayed in gray. 共b兲 The result of this dilation. The target dilation process allows a dilation of the target mask suited to the shape of each target, thanks to the distance map previously computed.
Fig. 11 Test dilation process. Each pixel in the test mask [white pixels in (a)] is dilated by a circular structuring element (gray dashed lines) according to the dilation factor stored in the distance map. (b) The target mask (ground truth) is overlaid as a continuous gray line. Pixels that did not hit a target are not dilated.
target to the background depends on the shape of the target.

2. Directed Hausdorff distance. The final step is to extend the values computed at a pixel level to the whole target. For each target, we take the maximum value of the distance transform image over the pixels belonging to that target. Let L(i, j) be a labeled version of the target mask: L(i, j) = k if the pixel (i, j) belongs to target k, k = 1, ..., n targets, and L(i, j) = 0 if (i, j) belongs to the background. The value stored in the final distance map K(i, j) is

$K(i,j) = \max_{m,n} \{ D(m,n) \mid L(m,n) = L(i,j) \}$.
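The two steps can be sketched with scipy.ndimage (our illustration under the definitions above; distance_transform_edt computes the distance to the nearest background pixel, and label extracts the connected targets):

```python
import numpy as np
from scipy import ndimage

def distance_map(T):
    """Sec. 4.3.1: Euclidean distance transform of the target mask,
    then the per-target maximum (the directed Hausdorff distance)
    propagated over every pixel of the corresponding target."""
    D = ndimage.distance_transform_edt(T)   # distance to background
    L, n = ndimage.label(T)                 # labeled targets, 1..n
    K = np.zeros_like(D)
    for k in range(1, n + 1):
        target = L == k
        K[target] = D[target].max()         # directed Hausdorff distance
    return K, L, n
```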
4.3.2 Target dilation
Target dilation aims at dilating each target of the target mask T(i, j) according to the dilation factor computed by the process described earlier (see Fig. 10). For each target, a circular structuring element of radius Ki,j is used for the dilation. This target dilation creates an area around each target where detected pixels will not be counted as false alarms. The target-adapted aspect of these dilations is one of the main contributions of the proposed method: It does not require any human intervention to set an arbitrary dilation factor, as proposed in previous articles on this subject (see, e.g., Ref. 20). Moreover, this dilation allows us to use the method in multitarget situations, since the dilation factors are adapted to each target of T(i, j).
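A sketch of the target dilation, reusing K, L, and n from the distance_map sketch above (disk and target_dilation are our own hypothetical helper names):

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Circular structuring element of the given radius."""
    r = int(np.ceil(radius))
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x * x + y * y <= radius * radius

def target_dilation(T, K, L, n):
    """Dilate each target by its own factor from the distance map, so
    that near-hit pixels will not be counted as false alarms."""
    dilated = np.zeros(T.shape, dtype=bool)
    for k in range(1, n + 1):
        target = L == k
        radius = K[target].max()            # this target's dilation factor
        dilated |= ndimage.binary_dilation(target, structure=disk(radius))
    return dilated
```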
4.3.3 Test dilation
The test dilation is done on the test mask M(i, j). Each pixel of this mask is dilated by a circular structuring element whose radius is determined by the value K(i, j) stored in the distance map, as illustrated by Fig. 11. This test dilation expresses the fact that an exhaustive detection of the target is not required for our semi-automated inspection task, as discussed in the first point of Section 2. One detected pixel thus has more impact on the tpr in our proposed method or in Theiler's method than in the pixel-level method. But, contrary to Theiler's method, the test dilation does not permit a tpr of 100% to be reached in all situations. It is suited to the observation window, which we consider to have a size similar to the size of the object. This is why a dilation factor computed with a directed Hausdorff distance is used for the test dilation. This is a reasonable choice since, for the inspection of one detected pixel, the human expert will adapt his observation scale to the context surrounding the pixel, i.e., to the target. The tpr and fpr are thus computed as follows:

$\mathrm{tpr} = \dfrac{1}{P} \sum_{i,j} \mathrm{TestDil}(M_{i,j}, K_{i,j}) \cdot T_{i,j}$,  (7)

$\mathrm{fpr} = \dfrac{1}{N} \sum_{i,j} M_{i,j} \cdot (1 - \mathrm{TargDil}(T_{i,j}))$,  (8)

where TestDil stands for the test dilation process and TargDil for the target dilation process. The results of the proposed soft comparison method on the three test defects are shown in Fig. 12. The first line of this figure represents the target dilation required for the computation of fpr: The initial target is in light gray and the dilated target in dark gray, while the detected pixels are in white. The target dilation is designed to expand the target's borders while preserving the global shape of the target. Here, the dilation approximately doubles the Hausdorff distance of each target. Two questions may then be raised:

1. Why does the dilation factor depend on the target's size?
2. Why should the dilation factor be set as described (i.e., one times the maximum distance to the borders of each target)?

The following answers can be given:

1. A large defect is observed at a larger scale. Consequently, the ground truth is less precise than for a very small defect. Thus, a larger error for the near-hit pixels (not counted as false alarms) should be allowed.
2. We can make the reasonable assumption that a defect is observed with a window whose dimension is approximately two times larger than the defect. The target dilation process mimics the observation window by allowing detected pixels in an area of such size around the defect.
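Putting the pieces together, the whole soft comparison of Fig. 8 can be sketched as follows (again our illustration, reusing distance_map, disk, and target_dilation from the sketches above): detected pixels that hit a target are dilated by that target's factor, in line with Fig. 11, before Eqs. (7) and (8) are evaluated.

```python
import numpy as np
from scipy import ndimage

def soft_comparison(M, T):
    """Soft mask comparison (Fig. 8, Eqs. (7) and (8))."""
    M, T = M.astype(bool), T.astype(bool)
    K, L, n = distance_map(T)
    # Test dilation: detections inside target k are dilated by its
    # factor; isolated detections are left unchanged (Fig. 11).
    test = M.copy()
    for k in range(1, n + 1):
        inside = M & (L == k)
        if inside.any():
            r = K[L == k].max()
            test |= ndimage.binary_dilation(inside, structure=disk(r))
    P = np.count_nonzero(T)
    N = T.size - P
    tpr = np.count_nonzero(test & T) / P                           # Eq. (7)
    fpr = np.count_nonzero(M & ~target_dilation(T, K, L, n)) / N   # Eq. (8)
    return fpr, tpr
```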
Fig. 12 Dilated target mask superposed with test mask 共first line兲 and dilated target mask superposed with dilated test mask 共second line兲. The dilation of the test mask leads to an increase in the tpr in the case of pixels detected inside a compact target. For isolated pixels, as we require a high precision in their localization, the test mask dilation has no effect on the tpr.
This dilation is an auto-adapted scheme that does not require any parameter. The result of this target dilation is that some detected pixels near the cluster or spot defects are no longer considered false alarms, since they now lie within the dilated target mask. The second line of Fig. 12 shows the result of the test dilation on the detected pixels (in white), with the dilated target mask superposed in gray. Each detected pixel lying in an initial target (in light gray) is dilated before the computation of tpr. This dilation mimics the human expert's behavior: The focus is not only on the detected pixels, but also on the surrounding pixels. It is therefore consistent to consider the area around these detected pixels for the computation of tpr, as explained in the test dilation process (Section 4.3.3). A consequence of this test dilation is that only some central points have to be detected in order to reach a 100% tpr for a target. Typically, the detection of the skeleton23,24 of the target is sufficient to reach such a tpr, which is in accordance with the visual inspection task: The central points of a target are sufficient to identify the full target.

4.3.4 ROC point computation by the soft method
The corresponding values of fpr and tpr are shown in Table 3. For punctual defects, there is no change compared to the previous methods. For spot defects, the tpr reaches 100%, which is in great accordance with a human interpretation.
Since the defect is small, the pixels flagged by the detection algorithm will lead to the identification of the defect: The human expert will focus the observation window on these pixels and will see the whole defect. The fpr has slightly dropped due to the pixels at the bottom of the defect, which are excluded from the fpr count; indeed, they are near hits. For the cluster defect, the fpr has dropped for the same reason, while the tpr now reaches 80%. As a matter of fact, the top part of the defect is not considered detected by our method. This is a relevant interpretation since the cluster defect is extended and is made of two defective areas, of which only the bottom one is actually detected by the algorithm. In this situation, Theiler's method gives a tpr of 100%: It does not take into account the missed top part of the defect. In the presented cases, the proposed soft mask comparison gives tpr and fpr results that are consistent with the expert's requests for detection performance assessment.
Table 3 tpr and fpr for the soft mask comparison.

Defect      fpr (%)    tpr (%)
Punctual    0.12       50
Spot        0.03       100
Cluster     0.07       80
Fig. 13 Comparison of the detection performance of two algorithms. The first column displays the original image of the defect, the second column is the output of the detection algorithms (grayscale image), and the third column is the thresholding of the previous image at a particular decision threshold, with the corresponding AOIs (gray square windows) that the human expert will check. The last column shows the ground truth with the AOIs to be checked. Algorithm A raises one false alarm.
This method performs a morphological dilation of the target and test masks and is auto-adapted to the multitarget problem, since the dilation factors are computed only once for each target without any parameter.

5 Application of the Proposed Method
5.1 Performance Assessment of Detection Algorithms by ROC Curves
In this section, we explain in detail two possible uses of our soft ROC method in the field of detection performance assessment. First, our method is used for the performance assessment of two algorithms on one kind of defect (Section 5.1.1), and for comparing the detection results of one algorithm on two types of defect (Section 5.1.2). The second use of our soft method focuses on the calibration of a detection algorithm: We show how our method can be used as a framework to properly calibrate algorithms in accordance with the defect inspection application (Section 5.2).
5.1.1 Assessment of the performances of two algorithms on one type of defect
ROC analysis is a simple tool to compare the performances of different detection algorithms by comparing their respective AUC. In this example, we want to compare two detection algorithms, namely "algorithm A" and "algorithm B," with ROC analysis. The algorithm outputs are displayed in the second column of Fig. 13; pixels detected as defective take high gray-level values (bright pixels). The corresponding ROC curves are plotted in Fig. 14, with the corresponding AUC reported in Table 4. A thresholding of these outputs is shown in the third column of Fig. 13. The ROC curves and AUC computed with the pixel-level method clearly show that algorithm A provides better detection performance than algorithm B. On the contrary, our soft method leads to the opposite conclusion. Theiler's method gives no information; due to its hit-or-miss strategy, both algorithms are said to be perfect, so in our application this method is not appropriate. To resolve the ambiguity raised by
Fig. 14 ROC curves computed with the pixel-level, Theiler's, and soft mask comparison methods for algorithm A (black curve) and algorithm B (dashed gray curve). The pixel-level curves suggest a better detection performance for algorithm A, whereas our soft method suggests a better detection performance for algorithm B. Theiler's mask comparison cannot discriminate between the two algorithms, since both are said to be perfect with this method.
Table 4 AUC of algorithms A and B computed with the three ROC measures: pixel-level, Theiler's, and our soft method.

Algorithm     AUC pixel-level ROC   AUC Theiler's ROC   AUC soft ROC
Algorithm A   0.81                  1                   0.945000
Algorithm B   0.72                  1                   0.999947
the pixel-level and soft ROC analyses, we show in Fig. 13 that algorithm B, as a matter of fact, performs better than algorithm A in this situation. The third column features the result of a threshold applied to the images of the second column. The threshold was set as follows:

• for algorithm A: the decision threshold chosen is the one that maximizes the tpr while raising only one false alarm;
• for algorithm B: the decision threshold chosen is the one that maximizes the tpr without raising any false alarm.

Comparing the test masks obtained with the ground truth (fourth column of Fig. 13), one can notice that algorithm B allows a better detection of the defect. In the meantime, algorithm A generates a false alarm, which is close to the defect but sufficiently disconnected to be considered a false alarm. In fact, for our semi-automated inspection task, algorithm B gives better results than algorithm A. In conclusion, the proposed soft ROC analysis ensures an evaluation of detection algorithms that better meets the user's needs.
5.1.2 Assessment of one algorithm on two types of defect
As explained in Section 3, ROC analysis can be performed on data with unbalanced class repartition, as the tpr and fpr are not sensitive to a particular class prevalence. ROC analysis can thus be used to compare the detection results of one algorithm when facing different kinds of defects with various shapes and sizes. Figure 15 shows the results of a detection algorithm on the defects called "Lines" and "Spot," respectively. The corresponding defect image, the detection result (grayscale image), the test mask (thresholding of the previous image), and the comparison with the target mask are displayed. To determine on which defect the algorithm performs best, ROC curves are plotted for the pixel-level method, Theiler's method, and the proposed soft method (see Fig. 16). The corresponding AUC are reported in Table 5. From the results obtained by the pixel-level ROC analysis, the conclusion would be that the Lines defect is detected better than the Spot defect (the corresponding AUC is larger). No conclusion can be drawn from the results of Theiler's method (both AUC are too close to 1). In many practical cases, we have observed that Theiler's method is not discriminant enough: Very often, detection performances are considered perfect, thus preventing any useful comparison. Finally, if we look at the results obtained by our soft method, a conflicting conclusion can be drawn: Soft ROC analysis gives a better score to the detection on Spot than on Lines. As explained in Fig. 15, Spot is actually better detected than Lines with this algorithm. In this figure, two test masks have been extracted (third column). Considering the Spot defect, some pixels have been detected at the center of the defect, without any false alarms. For the human expert, who will check the detected pixels with a visualization window (gray square boxes in Fig. 15), the Spot defect will be fully detected. On the other hand, considering the Lines
Fig. 15 Comparison of detection performance on the Lines and Spot defects. The first column displays the original image of the defects, the second column is the output of the detection algorithm (grayscale image), and the third column is the thresholding of the previous image at a particular decision threshold, with the corresponding AOIs (gray square windows) that the human expert will check. The last column shows the ground truth with the AOIs to be checked. For the Lines defect, some of those AOIs are false alarms.
Fig. 16 ROC curves computed with the pixel-level, Theiler's, and soft mask comparison methods for the Lines (black curve) and Spot (dashed gray curve) defects. The pixel-level curves suggest better detection for the Lines defect, whereas our soft method suggests better detection for the Spot defect. Theiler's mask comparison cannot distinguish between the two detections, since both are considered perfect with this method.
defect, Fig. 15 shows that the defect is detected only partially. In the meantime, the detection raised false alarms. This pair of test masks demonstrates that, for our semi-automated inspection task, the detection algorithm considered reaches a better performance on Spot than on Lines, which is the conclusion given by the soft ROC analysis. In this case, the pixel-level ROC analysis would cause an assessment mistake. In conclusion, the proposed soft ROC analysis gives a better performance assessment than the standard evaluation method.

Table 5 AUC measured on the Lines and Spot defects with the three ROC measures: pixel-level, Theiler's, and our soft method.

Defect   AUC pixel-level ROC   AUC Theiler's ROC   AUC soft ROC
Lines    0.885                 1                   0.986
Spot     0.763                 1                   1.000

5.2 Automated Calibration of Detection Algorithms
Our soft mask comparison method to compute the tpr and fpr was introduced in the previous sections. In this section, we show how to use this method to perform an automatic thresholding of images. Detection algorithms usually provide grayscale images where bright pixels are defective pixels and dark ones are background pixels. To get a test mask from this grayscale detection image, we need to set a decision threshold in order to binarize the detection image. When the target mask is known, we can use the ROC methodology to automatically set the decision threshold at a value leading to a given tpr or fpr. In this situation, the soft mask comparison allows us to get test masks that are more consistent with the given tpr and fpr. Considering the defects introduced in Fig. 6, we want to threshold the grayscale image provided by a detection algorithm at a tpr of 100% (full target image) with the minimum value of the fpr. This is a typical image observed to evaluate the detection performances of an algorithm. Figure 17 shows the full target images obtained, for the cluster and Gauss defects, with two of the mask comparison methods introduced earlier: the pixel-level method (Section 4.1) and the proposed soft method (Section 4.3).
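A possible reading of this calibration procedure, sketched under the same assumptions as the earlier snippets (comparator would be, e.g., soft_comparison; the threshold grid is an arbitrary choice of ours):

```python
import numpy as np

def calibrate_threshold(score_img, T, comparator, n_thresholds=200):
    """Pick the decision threshold that reaches tpr = 100% with the
    smallest fpr under the given mask-comparison rule (Sec. 5.2)."""
    best = None
    for t in np.linspace(score_img.min(), score_img.max(), n_thresholds):
        fpr, tpr = comparator(score_img >= t, T)
        if tpr >= 1.0 and (best is None or fpr < best[1]):
            best = (t, fpr)
    return best  # (threshold, fpr), or None if 100% tpr is never reached
```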
For the cluster defect, Fig. 17(b) spotlights the high sensitivity of the pixel-level method. To get tpr = 100% with our detection algorithm (images in the left column of Fig. 17), nearly all the pixels of the image must be detected. Consequently, numerous false alarms are raised, and one could believe that the algorithm performs rather poorly on this defect. This is not the case, and the soft mask comparison [Fig. 17(c)] allows us to avoid such a mistake: The thresholded image shows a perfect detection in our semi-automated inspection context (the detected pixels are sufficient to localize the whole defect) with only a few false alarms. The same conclusion can be drawn from the thresholded images of Figs. 17(e) and 17(f). In these two cases, the pixel-level comparison, due to its pixel sensitivity, leads to overdetection (too many false alarms), while the proposed method, using the new definition of tpr, gives relevant binarized images.

6 Possible Extensions
Our method is a first step toward a high-level mask comparison that would be fully adapted to the postprocessing inspection made by the human expert. Some immediate extensions of this method may be developed. First, for the fpr computation, we should take into account the size of the AOIs that will be presented to the expert for visual inspection. Our method makes a pixel-by-pixel count of false alarms, which corresponds to a pixel-by-pixel inspection of these alarms, i.e., to an AOI size of 1 pixel. We should rather consider the real size of the inspection window to gather clusters of false alarms into one single false alarm. This could be done by dividing the image into square windows of the same size as the AOIs and by counting the number of windows where false alarms actually occur; the number of false alarms would then measure the number of times an AOI without any real defect nevertheless has to be checked (see the sketch below). Following the same idea, we could also perform a dilation of the false alarms by the size of the AOIs in order to merge clusters of false alarms into one single false alarm.
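The window-based false-alarm count suggested above could look like the following sketch (ours; the AOI size of 64 pixels is an arbitrary placeholder):

```python
import numpy as np

def count_false_alarm_windows(M, T, aoi=64):
    """Tile the image with aoi x aoi windows and count the windows
    that contain detections but no true defect: each one is an AOI
    the expert would check in vain."""
    n_fa = 0
    for i in range(0, M.shape[0], aoi):
        for j in range(0, M.shape[1], aoi):
            win_m = M[i:i + aoi, j:j + aoi]
            win_t = T[i:i + aoi, j:j + aoi]
            if win_m.any() and not win_t.any():
                n_fa += 1
    return n_fa
```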
Fig. 17 Thresholded images at tpr = 100% with respect to two mask comparison methods: the pixel-level method and our proposed soft mask comparison. The pixel-level method leads to overdetection, with a high resulting fpr. On the contrary, the soft method provides the expected images for tpr = 100%: no more pixels are needed to localize and identify the cluster and Gauss defects.
In this scheme, we have to manage the problem of normalizing the number of false alarms before computing the fpr. Second, the tpr has been normalized by the number P [see Eq. (7)] of defective pixels in the image. This choice was made in order to give a quick interpretation of the tpr, but it can be too restrictive in certain cases. Consider an image made of two defects—one large and one punctual (one single defective pixel). A detection algorithm that correctly detects the first defect but misses the second one will achieve a fairly high tpr, whereas one target out of two has actually been missed. To avoid such situations, we should rather use an object-based normalization: The tpr is computed for each target, and the overall tpr for the image is computed by averaging all these rates. Then targets of different sizes with the same share of well-detected pixels would have the same impact on the overall tpr value. Moreover, if the AOI size is known, we should rather define a constant dilation factor for the test dilation that fits this size. To conclude, several improvements to the proposed method can be made by using the a priori knowledge potentially available at the different stages of the detection system. However, the global performance assessment scheme remains unchanged, since we still consider fuzzy areas for the fpr and extended detected areas to compute the tpr.

7 Conclusion
The inspection of defects on large images is a very tedious task. Thus, in order to help the human expert, many automated
processes and image processing algorithms have been developed to detect potentially defective areas. Assessing the actual quality and performance of these detection algorithms is then of the utmost importance and must be dealt with in light of the inspection context. ROC analysis is a proven methodology to compare such algorithms, but it has some limitations when facing complex situations (various sizes, shapes, and types of defects). To overcome these limitations, we propose a method to compute true-positive and false-positive rates in a way that is consistent with the semi-automated inspection application. This method uses simple object-based morphological dilations to extend the pixel-level definitions of the ROC quantities to more object-related ones. Thus, fuzzy areas are automatically defined around each object to exclude near hits from the false-alarm count. In the meantime, true positives are linked to the visual inspection problem by a dilation scheme designed to mimic the human expert's inspection. This way of using the ROC methodology on practical cases provides a more reliable assessment of defect detection algorithms and allows a better calibration of semi-automated quality-control systems.
References
1. A. Kumar and G. K. Pang, "Defect detection in textured materials using optimized filters," IEEE Trans. Syst., Man, Cybern., Part B: Cybern. 32(5), 553–570 (2002).
2. D.-M. Tsai and T.-Y. Huang, "Automatic surface inspection for statistical textures," Image Vis. Comput. 21, 307–323 (2003).
3. C. Wolf and J.-M. Jolion, "Object count/area graphs for the evaluation of object detection and segmentation algorithms," Int. J. Doc. Anal. Recog. 8(4), 280–296 (2006).
4. G. Liu and R. Haralick, "Assignment problem in edge detection performance evaluation," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR 2000), pp. 1026–1031 (2000).
5. J. C. Nascimento and J. S. Marques, "Novel metrics for performance evaluation of object detection algorithms," in Proc. 1st ISR Workshop on Systems, Decision and Control, Robotic Monitoring and Surveillance (2005).
6. V. Y. Mariano, J. Min, J.-H. Park, R. Kasturi, D. Mihalcik, H. Li, D. Doermann, and T. Drayer, "Performance evaluation of object detection algorithms," in Proc. Int. Conf. Pattern Recog. 3, 30965 (2002).
7. J. M. Keller and P. Gader, "A fuzzy logic approach to detector scoring," in Proc. Fuzzy Info. Process. Soc. (NAFIPS) 20, 339–344 (1998).
8. Y. Le Meur, J.-M. Vignolle, and J. Chanussot, "A practical use of ROC analysis to assess the performances of defects detection algorithms," in Proc. SPIE 6356, 635616 (2007).
9. D. Green and J. Swets, Signal Detection Theory and Psychophysics, Wiley, New York (1966).
10. D. Dorfman and E. J. Alf, "Maximum likelihood estimation of parameters of signal detection theory—a direct solution," Psychometrika 33, 117–124 (1968).
11. J. Egan, Signal Detection Theory and ROC Analysis, Academic Press, New York (1975).
12. J. Hanley, "Receiver operating characteristic (ROC) methodology: The state of the art," Crit. Rev. Diagn. Imaging 29(3), 307–335 (1989).
13. J. Hanley and B. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology 143, 29–36 (1982).
14. C. E. Metz, "ROC methodology in radiologic imaging," Invest. Radiol. 21, 720–733 (1986).
15. T. Fawcett, "ROC graphs: Notes and practical considerations for researchers," Technical report, HP Labs (2004). http://home.comcast.net/~tom.fawcett/public_html/papers/ROC101.pdf.
16. J. M. Irvine, "Assessing target search performance: The free-response operator characteristic model," Opt. Eng. 43, 2926–2934 (2004).
17. P. A. Flach, "Tutorial on the many faces of ROC analysis in machine learning," in Proc. Int. Conf. Machine Learning (http://www.cs.bris.ac.uk/flach/ICML04tutorial/) (2004).
18. C. Cortes and M. Mohri, "AUC optimization vs. error rate minimization," in Proc. Adv. Neural Info. Process. Syst. (NIPS 2003) 16, MIT Press, Cambridge (2003).
19. J. Theiler, N. Harvey, and J. M. Irvine, "Approach to target detection based on relevant metric for scoring performance," in Proc. 33rd Appl. Imagery Pattern Recog. Workshop (AIPR'04), pp. 184–189 (2004).
20. N. R. Harvey and J. Theiler, "Focus-of-attention strategies for finding discrete objects in multispectral imagery," Proc. SPIE 5546, 179–189 (2004).
21. A. Jain, Fundamentals of Digital Image Processing, Prentice Hall, Englewood Cliffs, NJ (1995).
22. W. Rucklidge, Efficient Visual Recognition Using the Hausdorff Distance, Springer-Verlag, New York (1996).
23. H. Blum, "A transformation for extracting new descriptors of shape," in Models for the Perception of Speech and Visual Form, W. Wathen-Dunn, Ed., pp. 362–380, MIT Press, Cambridge (1967).
24. R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed., Prentice Hall, Englewood Cliffs, NJ (2002).
Jean-Michel Vignolle graduated in general engineering from Ecole Centrale Paris, France, in 1987, with a speciality in bioengineering, and in the same year received his MS degree in spectrochemical analysis methods from Paris VI University. In 1988–1989, he worked at Thomson Central Research Labs on materials for neural network hardware implementation, then on fiber optic sensors. In 1990, he joined Thales Avionics, where he was responsible for various LCD display designs for projection and direct view. His activities included microelectronics design, electrical design, mechanical design, and optical design. In 1998, he joined Trixell as technical project manager on various x-ray detector design projects. Since 2002, he has been in charge of the Image Group, a group of engineers dedicated to the development of image processing, image correction algorithms, and image quality measurement tools and methods.

Jocelyn Chanussot graduated in electrical engineering from the Grenoble Institute of Technology (INP Grenoble), France, in 1995. He received his PhD degree from Savoie University, Annecy, France, in 1998. He was with the Automatics and Industrial Micro-Computer Science Laboratory (LAMII). In 1999, he worked at the Geography Imagery Perception Laboratory (GIP) for the Délégation Générale de l'Armement (DGA—French National Defense Department). Since 1999, he has been with INP Grenoble as an assistant professor (1999–2005), associate professor (2005–2007), and professor (2007–) of signal and image processing. He conducts his research at GIPSA-Lab (Grenoble Image Speech Signals and Automatics Laboratory). His research interests include statistical modeling, classification, image processing, nonlinear filtering, remote sensing, and data fusion. Dr. Chanussot is currently serving as an associate editor for the IEEE Transactions on Geoscience and Remote Sensing and for Pattern Recognition. He is the co-chair of the GRS Data Fusion Technical Committee and a member of the Machine Learning for Signal Processing Technical Committee of the IEEE Signal Processing Society. He has authored or co-authored over 65 publications in international journals and conferences. He is a senior member of the IEEE.
Yann Le Meur graduated in electrical engineering from the Grenoble Institute of Technology (INP Grenoble), France, in 2004 and received his MS degree in signal and image processing from INP Grenoble the same year. In 2004, he led a six-month MS thesis project at the Centre National d'Etudes Spatiales (the French space agency), Toulouse, France, where he worked on multitemporal remote sensing images. He is now a PhD candidate at GIPSA-Lab (Grenoble Image Speech Signals and Automatics Laboratory) and Trixell, Moirans, France. His research interests include image processing, object detection, image statistical analysis, image quality assessment, and data fusion, especially kernel-based methods.