Performance Characterization of Image Understanding Algorithms

by

Visvanathan Ramesh

A dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

University of Washington

1995

Approved by (Chairperson of Supervisory Committee)

Program Authorized to Offer Degree

Date
© Copyright 1995
Visvanathan Ramesh
In presenting this dissertation in partial fulfillment of the requirements for the Doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with fair use as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to University Microfilms, 1490 Eisenhower Place, P.O. Box 975, Ann Arbor, MI 48106, to whom the author has granted the right to reproduce and sell (a) copies of the manuscript in microform and/or (b) copies of the manuscript made from microform. Signature Date
University of Washington

Abstract
Performance Characterization of Image Understanding Algorithms

by

Visvanathan Ramesh

Chairperson of Supervisory Committee:
Professor Robert M. Haralick
Department of Electrical Engineering
Image Understanding (IU) systems are complex, and they are composed of different algorithms applied in sequence. A system for model-based recognition has three essential components: feature extraction, grouping and model matching. In each of these components, tuning parameters (thresholds) are often used. These parameters have been traditionally chosen by trial and error or from empirical data. In this dissertation we discuss a methodology for the analysis and design of IU algorithms and systems that follows sound systems engineering principles. We illustrate how the algorithm parameters can be optimally selected for a given image understanding algorithm sequence that accomplishes an IU task. The essential steps for each of the algorithm components involved are: component identification (performance characterization) and application domain characterization (achieved by an annotation). There is an optimization step that is used to optimize a criterion function relevant to the final task. Performance characterization of an algorithm involves the establishment of the correspondence between random perturbations in the input and the random perturbations in the output. This involves the setup of the model for the output random perturbations for a given ideal input model and input random perturbation model. Given these models and a criterion function, it is possible to characterize the performance of the algorithm as a function of its tuning parameters and automatically set the tuning parameters. The specification of the model for the population of ideal input data varies with the problem domain. Domain-specific prior information on the
parameters that describe the ideal input data is gathered during the annotation step. Appropriate theoretical approximations for the prior distributions are then specified, validated and utilized in computing the performance of the algorithm over the entire input population. Tuning parameters are selected to optimize the performance over the input population.
TABLE OF CONTENTS

List of Figures

Chapter 1: Introduction
    1.1 Background
    1.2 Systems Engineering Methodology
    1.3 Contributions
    1.4 Organization

Chapter 2: Computer Vision Algorithm Sequences
    2.1 Example Vision Algorithm Sequences
        2.1.1 Sequence based on line/arc feature extraction
        2.1.2 Sequence based on interest point extraction
        2.1.3 Sequence based on Segmentation
    2.2 Feature Extraction and Grouping Techniques
        2.2.1 Edges & Edge detection
        2.2.2 Gray Scale Corner Detection
        2.2.3 Interest point detection in a contour
        2.2.4 Edge pixel Grouping methods
    2.3 Matching Techniques
    2.4 Performance Evaluation of Vision Algorithms -- Review

Chapter 3: Methodology for Performance Characterization
    3.1 Introduction
    3.2 Problem Statement
        3.2.1 IU Algorithm Sequences
        3.2.2 Optimization of Performance of Algorithm Sequences
    3.3 Building Detection Example

Chapter 4: Performance Characterization of Edge Detection Schemes
    4.1 Introduction
        4.1.1 Edge detection -- Ideal Data & Perturbation Model
        4.1.2 Output Ideal Model & Perturbation Model
    4.2 Relationship between Output Perturbations and Input Perturbations (Gradient Based Edge Detector)
        4.2.1 Probability of Misdetection of a Gradient Edge
        4.2.2 Probability of False alarm at the edge detector output
        4.2.3 Edgel Orientation Estimate Distribution
        4.2.4 Positional Error Analysis of Gradient Based Edge Detectors
    4.3 Relationship of Output Perturbations to Input Perturbations (Edge Detection with Hysteresis Linking)
        4.3.1 Analysis of Edge Operator with Hysteresis Thresholds
    4.4 Relationship between Output Perturbations and Input Perturbations (Morphological edge detectors)
        4.4.1 Blur-min edge detector -- Description
        4.4.2 Distributions of Grayvalues in Dilation and Erosion Residues
        4.4.3 Distribution of Edge Strength
        4.4.4 Probability of False alarm
        4.4.5 Probability of Misdetection
    4.5 Experimental Protocol
        4.5.1 Image generation
        4.5.2 Edge pixel localization error -- Evaluation Procedure
    4.6 Results
    4.7 Conclusion

Chapter 5: Random Perturbation Models for Line Extraction Sequence
    5.1 Introduction
    5.2 Boundary Extraction -- Ideal Data & Perturbation Model
    5.3 Edge linking or grouping step -- Analysis
        5.3.1 Probability of correct edge grouping
    5.4 Perturbation model at edge detector/linker output -- Misdetection
        5.4.1 Perturbation model at edge detector/linker output -- Properties
    5.5 Gap filling algorithm -- Analysis
    5.6 Perturbation Models for Random Entities
    5.7 Perturbation Models incorporating dependencies between estimates
    5.8 Discussion
    5.9 Conclusion

Chapter 6: Experimental Protocol for performance characterization of Boundary Extraction Schemes
    6.1 Introduction
    6.2 Theoretical Results -- Brief Review
        6.2.1 Random Perturbation models for a line finding sequence
    6.3 Experimental Protocol
        6.3.1 Image generation
        6.3.2 Line detection sequence -- Evaluation procedure details
    6.4 Experiments and Results
    6.5 Conclusion

Chapter 7: Theoretical Analysis of the Bayesian Corner Finder
    7.1 Introduction
    7.2 Bayesian Corner Detector
    7.3 Theoretical Derivations
    7.4 Probability of False Alarm
    7.5 Distribution of the estimated break point index
    7.6 Discussion
    7.7 Conclusion

Chapter 8: Random Perturbation models for Corner Extraction Sequence
    8.1 Introduction
    8.2 Features and the annotation process
    8.3 Image Domain and Annotation
        8.3.1 Conventions used in the annotation process
        8.3.2 Groundtruth Generation and statistics computation
    8.4 Algorithm Sequence
    8.5 Perturbation Models
        8.5.1 Perturbation models for I/P at the edge detector
        8.5.2 Perturbation models for O/P at the edge detector and linker
        8.5.3 Perturbation models for O/P and I/P data of the Corner Detector
    8.6 Statistics Computation from Groundtruth and Parameter Selection
        8.6.1 Edge Detection -- Statistics and Threshold Selection
        8.6.2 Thinning -- Parameter Selection
        8.6.3 Chain Length Threshold selection
        8.6.4 Corner Extraction -- Statistics & Parameter Selection
        8.6.5 Results on RADIUS Dataset
    8.7 Conclusion

Chapter 9: Integrated Gradient Boundary Extraction
    9.1 Introduction
    9.2 Motivation and Algorithm
        9.2.1 Estimation of Edgel Orientation -- Problems
        9.2.2 Measure of Edge Strength
        9.2.3 Basic Algorithm
        9.2.4 Physical Model
    9.3 Theoretical Analysis
    9.4 Experimental Protocol
    9.5 Results/Limitations/Extensions
    9.6 Conclusion

Chapter 10: Conclusion
    10.1 Summary
    10.2 Ongoing & Future Work

Bibliography
LIST OF FIGURES

4.1 Edge detector Evaluation -- illustrates how detected edge pixels are associated with groundtruth pixels.
4.2 Theoretical and Empirical Comparisons: (a) gradient based operator gradient estimate distribution, (b) mean orientation estimate (vs. signal to noise ratio), (c) precision of the orientation estimate, (d) edge strength distribution for the blur-minimum operator.
4.3 False alarm vs. Misdetection plots: (a) gradient based operator, (b) blur-min operator (without correlation effects), (c) comparison and (d) comparison (for 3 by 3 window size).
4.4 Empirical results on synthetic data and example real images used: (a) empirical results, (b) Image 1 and (c) Image 2.
4.5 Results with real images: (a) gradient based operator, 5 by 5 window, T = 7, Image 1, (b) blur-min operator (3 by 3 window, T = 7), Image 1, (c) Image 2 -- gradient based operator (5 by 5 window, T = 7), and (d) Image 2 -- blur-min operator (3 by 3 window, T = 7).
4.6 Plot of mean edge run length vs. edge strength threshold for various signal to noise ratios. Orientation of the true edge was 15 degrees; window size 5 by 5 for the Morphological operator.
4.7 Plot of mean gap run length vs. edge strength threshold for various signal to noise ratios. Orientation of the true edge was 15 degrees; window size 5 by 5 for the Morphological operator.
4.8 Plot of mean pixel positional error vs. edge strength threshold for various signal to noise ratios. Orientation of the true edge was 15 degrees; window size 5 by 5 for the Morphological operator.
4.9 Plot of mean edge run length vs. edge strength threshold for various signal to noise ratios. Orientation of the true edge was 15 degrees; window size 5 by 5 for the Gradient operator.
4.10 Plot of mean gap run length vs. edge strength threshold for various signal to noise ratios. Orientation of the true edge was 15 degrees; window size 5 by 5 for the Gradient operator.
4.11 Plot of mean pixel positional error vs. edge strength threshold for various signal to noise ratios. Orientation of the true edge was 15 degrees; window size 5 by 5 for the Gradient operator.
6.1 Line detector Evaluation -- illustrates how detected segments are associated with a groundtruth segment.
6.2 Mean length of largest detected segments vs. T.
6.3 Mean number of breaks vs. T.
6.4 Mean Segment Length vs. T.
6.5 Mean Gap Length vs. T.
7.1 Histogram of estimated corner index (various corner angles).
7.2 Histogram of estimated corner index (various noise levels).
8.1 Original image.
8.2 Annotated Image.
8.3 Image Analysis Algorithm Sequence.
8.4 Perturbation models and Criterion Functions.
8.5 Statistics computation from Training Data. The top left figure illustrates statistics related to edges. The top right figure illustrates chain length statistics computation from groundtruth chains. The bottom figure illustrates chain length statistics computation for clutter features.
8.6 Empirical Distribution of Edge Gradients.
8.7 Empirical Distribution of Chain lengths.
8.8 Pixel chains detected for an example image. Subimage of a model board image (top left); detected edges by setting σ_g = 0 with a false alarm rate of 10 percent and a misdetect rate of 5 percent (top right); detected edges for a false alarm/misdetect rate of 10 percent (bottom left); detected edges for a false alarm/misdetect rate of 5 percent (bottom right).
9.1 False Alarm vs. Misdetection Characteristics of various Edge strength estimators.
9.2 Original Image.
9.3 Inverted Maximum Integrated Gradient Image -- dark regions are areas of high integrated gradient.
9.4 Inverted Minimum Integrated Gradient Image -- dark areas are areas of high minimum integrated gradient.
9.5 Ratio of Minimum to Maximum Integrated Gradient Image.
9.6 1.0 - P(Edge); dark pixels correspond to edge pixels.
9.7 Edge strength image (before thinning, thresholded at P(Edge) = 0.5).
9.8 Aircraft Image.
9.9 Results on Aircraft Image (Gradient Estimate -- Standard Least Squares Planar Fit).
9.10 Results on Aircraft Image (Gradient Estimate -- Robust).
ACKNOWLEDGMENTS

This dissertation was made possible by funding from several organizations. Initial work was funded by the Boeing Corporation. I also thank IBM for awarding me a Manufacturing Research Fellowship from 1991 to 1993. The rest of the project was funded through a grant from the Advanced Research Projects Agency. A number of people have assisted me in this endeavor. Special thanks are due to my advisor, Professor Robert Haralick, and to Professor Linda Shapiro. Professor Haralick has been a source of constant inspiration and guidance. His drive for scientific excellence has pushed me to aspire for the same. Professor Shapiro has been of invaluable assistance and was instrumental in bringing me to Seattle. I cannot thank her enough for her countless gestures of kindness. I thank the other members of my committee, Professors Steven Tanimoto, Eve Riskin, and Les Atlas, for graciously agreeing to serve on my committee. Special thanks are due to Professor Daryl Lawton. Despite his busy schedule he agreed to be on the reading committee. Even though my IUE committee work was not directly related to this dissertation, the discussions related to the design of the current IUE were very useful in shaping this dissertation in some form. I should therefore thank the IUE committee members. Terry Boult taught me about the importance of asking questions. Tom Binford constantly provided me encouragement to pursue performance characterization research. I also thank my colleagues in the Intelligent Systems Laboratory. I thank the members of the performance characterization group, Ken Thornton, Xining Zhang, Xufei Liu and Anand Bedekar, for their valuable inputs. I thank K. Govindarajan for help with experiments. I also thank Bharath Modayur for comments on earlier drafts of my papers. Special thanks go to Mauro and
Wendy Costa for their friendship and support throughout my graduate studies. Their children, Nicholas and Melissa, have brightened many of my weekends. Last but not the least, I thank Mercedes for always being there for me. I thank my parents, my brother, Prakash, and my sister, Latha, for their unconditional love and support.
Chapter 1
INTRODUCTION

1.1 Background

Image Understanding (IU) research is focused on developing algorithms and systems for information extraction and interpretation from two-dimensional image signal data. The two-dimensional image signals are processed to determine the geometric structure of 3D objects and to infer 3D relationships between them. IU systems are composed of several algorithms (modules), often applied in sequence. Low-level vision modules extract features (such as edges or boundaries) from images, while higher-level vision modules perform some form of aggregation (grouping) of the low-level features and match the grouped features with pre-stored models for specific object classes. Although IU researchers have concentrated on developing specific modules, little performance characterization of these modules has been done. This dissertation is intended to fill this void in IU research. Several researchers have referred to the need for IU algorithm benchmarking, and there is considerable effort underway in several institutions. Building a reliable IU system is a complicated task. The overall objectives of an IU system may vary with the problem domain and with the specific requirements. Several fundamental questions can be posed:
How can one guarantee that an IU system meets a specified performance criterion?
The IU system has several tuning parameters, and its performance is directly related to these parameters. How can one adaptively select these tuning parameters so that the IU system performance is optimized over an entire population of inputs? For example, a number of neighborhood operators have been proposed in order to perform local smoothing and derivative estimation for edge detection. Although a variety of techniques have been developed to extract
different features from images based on gray-tone characteristics by examining local neighborhoods, not much has been done in exactly quantifying the effects of the free parameters used in these techniques. For example, in most image processing operations, neighborhood size is a free parameter, and often the choice is left to the user. One may have to resort to trial and error in order to find the best neighborhood size to use.
What is the meaning of performance characterization with respect to IU systems, and how is the performance of one sub-system related to the total system's performance? For example, what is an appropriate way to evaluate boundary extraction schemes, and what is the appropriate way to characterize the errors in the inputs and outputs of each sub-system?
This dissertation addresses some of these questions by following a sound systems engineering methodology.

1.2 Systems Engineering Methodology

An IU system is complex and is composed of many subsystems coupled together. In this situation, special attention should be given to the interactions between the subsystems in addition to their individual characteristics. The performance of an IU system is a complicated function of the inputs, the nature of the random perturbations, and the system parameters. In general, a change in the input produces a change in many of the system's output variables. A very important part of the systems engineering effort is to identify the appropriate deterministic or probabilistic models for the various components of the system. Mathematical models describing the characteristics of the output data for given input data should be developed for each individual component in the system. The derivation of these models is called system identification. System identification involves the specification and validation of appropriate data models for the inputs and outputs of each component. These models should include a stochastic component to account for perturbations in the input. Given an input perturbation model and a system component, the system component is completely identified only if the output perturbations can be described as a function of the input perturbations and the component tuning parameters.
In order to illustrate the quantitative characteristics of the systems engineering problem, let us consider an object recognition system. An object recognition system has the following steps: 1. a fundamental feature detection step (a low-level process), 2. a feature grouping step (a mid-level process), and 3. an interpretation step (a high-level process). Each step in the sequence has free parameters, which are often thresholds chosen by using empirical data or by trial and error. Often, due to variations from one image to another, a threshold which is good for one image fails for the other. Also, due to contrast changes within an input image, a threshold good for detecting one feature entity is not suitable for detecting another feature entity. So in order to get the best results one may have to resort to using multiple thresholds, one for each feature entity in the image. From this discussion it is clear that a general theory for selecting the free parameters in an operation or an operation sequence needs to be developed. Let us suppose that the system requirement is that its false alarm rate be less than α and its misdetection rate be less than β. The false alarm rate and the misdetection rate of the system are a function of the parameters describing the input image population (P), the tuning constants employed at various stages in the system (T), and the algorithms utilized in the system (A). The problem is to select T and A for which the system requirement is satisfied. In this dissertation we do not address the issue of how an algorithm sequence for a particular problem can be chosen. There has been prior work by Goad [22], Joo [50] and Draper [16] in that area. Draper [16] describes a learning system that starts from training images to select the optimal algorithm sequence for a particular task. We, instead, focus on how one can set up the tuning constants T given A. In addition, the parameters describing the variability in the input image population (P) are often not known. These are to be estimated from a database of real images. This dissertation addresses these estimation issues as well. Crucial to the development of a system that automatically selects the free parameters used in each stage of an algorithm sequence is the characterization of the performance of each algorithm employed in the sequence.
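To fix ideas, the sketch below (a toy illustration, not code from the dissertation; all names are hypothetical) shows the selection step in its simplest form: given false alarm and misdetection rates that have already been estimated over the input population for a grid of candidate tuning constants, it returns a setting that meets the α/β requirement.

```python
import numpy as np

def select_tuning(thresholds, false_alarm, misdetect, alpha, beta):
    """Return a threshold T with false_alarm(T) < alpha and
    misdetect(T) < beta, or None if no setting is feasible; among
    feasible settings, minimize the combined error rate."""
    feasible = (false_alarm < alpha) & (misdetect < beta)
    if not feasible.any():
        return None
    combined = np.where(feasible, false_alarm + misdetect, np.inf)
    return thresholds[np.argmin(combined)]

# Hypothetical rate curves estimated over a population of training images.
T = np.linspace(0.0, 20.0, 41)
fa = np.exp(-T / 4.0)           # false alarms fall as the threshold rises
md = 1.0 - np.exp(-T / 10.0)    # misdetections rise with the threshold
print(select_tuning(T, fa, md, alpha=0.1, beta=0.7))
```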
This brings us to ask the question: What does performance characterization mean for an algorithm which might be used in a machine vision system? The algorithm is designed to accomplish a specific task. If the input data is perfect and has no noise and no random variation, the output produced by the algorithm ought also to be perfect. Otherwise, there is something wrong with the algorithm. So measuring how well an algorithm does on perfect input data is not interesting. Performance characterization has to do with establishing the correspondence between the random variations and imperfections which the algorithm produces on the output data and the random variations and imperfections of the input data. Essentially, performance characterization solves the system identification problem. An important problem in performance characterization is that the algorithm employed at one stage of a vision sequence changes the data unit. For example, an edge-linking process changes the data from the unit of a pixel to the unit of a group of pixels. An arc segmentation/extraction process applied to the groups of pixels produced by an edge-linking process produces fitted curve segments. This unit change means that the representation used for the random variation of the output data set may have to be entirely different from the representation used for the random variation of the input data set. In our edge-linking/arc extraction example, the input data might be described by the false alarm/misdetection characteristics produced by the preceding edge operation, as well as the random variation in the position and rotation of the correctly detected edge pixels. The random variation in the output data from the extraction process, on the other hand, must be described in terms of fitting errors (random variation in the fitted coefficients) and segmentation errors. The representation of the segmentation errors must be natural and suitable for the input of the next process in high-level vision, which might be a model-matching process, for example. What should this representation be to make it possible to characterize the identification accuracy of the model matching as a function of the input segmentation errors and fitting errors? Questions like these are addressed in this dissertation.
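As a toy numerical illustration of this input-to-output correspondence for a fitting stage (our construction, not an example from the dissertation), the sketch below pushes Gaussian positional perturbations on ideal edge pixels through a least-squares line fit; the induced spread of the fitted coefficients is exactly the kind of output perturbation model discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ideal input: edge pixels on the line y = 0.5 x + 2 (hypothetical values).
x = np.linspace(0.0, 10.0, 50)
y_ideal = 0.5 * x + 2.0

# Input perturbation model: i.i.d. Gaussian positional noise on each pixel.
sigma = 0.3
fits = np.array([np.polyfit(x, y_ideal + rng.normal(0.0, sigma, x.size), 1)
                 for _ in range(2000)])

# Output perturbation model: the induced distribution of (slope, intercept).
print("mean fitted (slope, intercept):", fits.mean(axis=0))
print("coefficient covariance:\n", np.cov(fits, rowvar=False))
```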
1.3 Contributions

We believe that there are several important contributions in this dissertation. First, this dissertation lays out a methodology for systematic evaluation of IU algorithm components and systems. This is the first time that a systems engineering approach to computer vision has been attempted. The bulk of this research was concentrated on developing random perturbation models for the input and output data at each vision module, since understanding the performance of the module is related to comprehending how the output data has been perturbed from its true value due to perturbations in the input data. Performance measures at various stages in the system are related to the perturbation model parameters. Some of the previous statistical approaches usually assume a particular model for the structure and perturbations in the data. Little attention was given to performance related issues. In contrast, this work illustrates how the error in the output at each stage is related to the error at the input stage and thus allows one to specify appropriate criterion functions for evaluating the performance of each stage. Specific algorithms for edge, line, and corner extraction have been analyzed in the dissertation. This is the first time that such rigorous theoretical analysis has been attempted. Benchmarking of computer vision algorithms is possible only if the vision community agrees on what criterion functions are appropriate to evaluate each vision component. One of the contributions of this dissertation is to provide the vision community with such criterion functions. These measures could serve as a standard for evaluation of vision algorithm components. We are not saying that the measures used in this dissertation should be the ones to be used by the community, but rather that the measures provide a starting point for comparison of existing algorithms. In fact, we realize that the criterion functions are often user and application dependent. Before one can set up tuning parameters that are optimal over an entire population of inputs, one has to devise schemes for accurate modelling of the input population. This dissertation is unique in that it develops estimation schemes for obtaining prior distributions for image feature parameters from a database of training images. These prior distributions are then approximated by theoretical expressions, and the expressions are used to derive the performance of algorithm components over the entire population of images. Tuning parameters are automatically set by using the expressions for the overall theoretical performance. The performance evaluation work done in this dissertation has given several insights into the fundamental problems in the algorithm components. Development of better IU systems will require feedback. That is, improvements to algorithms and the system
are often made by using the insights gained during the systematic evaluation. In fact, this dissertation provides a new boundary extraction algorithm that makes use of the insights gained during the performance evaluation. Preliminary results indicate that this algorithm is superior to existing boundary extraction schemes.

1.4 Organization

This dissertation is organized in the following fashion. In chapter 2 we illustrate common elements in computer vision algorithm sequences and give a few examples of feature extraction sequences. We break down typical vision algorithm sequences into three essential components: feature extraction, feature grouping, and matching. We then give a list of image features used in the current literature and review methods for extracting these features. Further, we give a brief introduction to feature grouping methods and matching schemes. A literature review of current and past work on performance characterization is also included at the end of the chapter. In chapter 3 we discuss the methodology for performance characterization in more detail. Chapter 4 provides a theoretical and empirical comparison of edge detection schemes. We compare a morphological edge detector with a gradient-based edge operator. Chapters 5 and 6 discuss the problem of how one can derive perturbation models for a boundary extraction sequence involving edge finding, linking, and gap filling. The UMass line finder is also analyzed as part of this work. Chapter 5 focuses on theoretical results, whereas chapter 6 focuses on the empirical evaluation and the experimental protocol used to evaluate line finders. Chapter 7 discusses the theoretical evaluation of the Bayesian corner finder developed by Zhang et al [109]. A discussion of how the free parameters for the algorithm can be set is also given. Chapter 8 discusses the issues involved in modelling the population of input data. Specific methods for estimation of prior distributions for features belonging to several object classes are outlined. The derived distributions are combined with the theoretical analysis in chapters 4 through 7 to select sub-optimal tuning parameters at each step of a vision algorithm sequence involving edge detection, linking and corner extraction. Chapter 9 discusses a new algorithm for boundary extraction that is based on integrated gradients. The algorithm was developed by using insights provided by the
performance evaluation in chapters 4 through 6. Theoretical and empirical evaluation of the performance of the algorithm is the main focus of this chapter. Chapter 10 provides a summary of the dissertation and concludes with open research issues that should be pursued.
Chapter 2
COMPUTER VISION ALGORITHM SEQUENCES

In this chapter, we give a discussion of the essential elements in vision algorithm sequences. Typical vision algorithm sequences, whether for industrial inspection or for scene analysis, have similarities in the types of operators used, even though their ultimate goals may be different. The sequences often include a combination of the following basic steps: 1. a fundamental feature detection step, 2. a feature grouping step, and 3. an interpretation step. In this chapter we give a few example sequences and then provide a brief review of the feature detectors, grouping methods and interpretation techniques used in the computer vision literature.

2.1 Example Vision Algorithm Sequences

We mentioned above that vision sequences contain three basic steps: feature detection, grouping and interpretation. The detection step is carried out by the assumption of an analytical model for the feature. Here we use the term feature to denote any graytone image feature, like an edge pixel or corner pixel. For example, the approximate analytical model for the graytone intensity profile in the edge pixel's neighborhood may be a linear function of the row and column coordinates. The grouping process works under the assumption that similarity among neighboring feature units indicates that they belong to a single larger cluster unit. We use the term "feature unit" for a basic feature unit such as an edge pixel or a corner pixel. We use the term "feature cluster unit" or "feature entity" to denote the units obtained after grouping the feature units. For example, each line segment obtained after
the grouping process is such a cluster unit. The interpretation process treats each cluster unit as an entity and tries to match groups of such entities to pre-stored models by utilizing their relationships, topological as well as non-topological. We use the term "feature entities" to describe basic high-level features, such as line segments, junctions, etc., that are used in the matching stage. Higher-level feature descriptions such as L, T, and Fork junctions used in the literature are combinations of feature entities. We give below a few examples of vision sequences which employ the three fundamental steps mentioned. The first example is based on line/arc feature extraction. The second example is based on interest point extraction. The last example is based on a sequence which involves image segmentation. The first and second sequences are typical of what would be used in an industrial inspection system. The third is a typical sequence that would be used in scene analysis.

2.1.1 Sequence based on line/arc feature extraction
Most feature extraction schemes do the following:

1. Detect some fundamental unit of the feature. Detection is based on having a model for the fundamental unit.

2. Group these basic units, using some similarity measure, to form a cluster of units. For example, line/arc segments are formed by grouping adjacent edge pixels with similar orientation.

3. Group the unit clusters to form larger feature entities based on other similarity measures. For example, one may group the line/arc segments based on their position, orientation, and curvature to form long feature segments. Note that gap filling is a form of grouping.

4. Hypothesize matches of groups of feature entities to pre-stored models of objects. Each such hypothesis determines the position and orientation of the object in the image.

2.1.2 Sequence based on interest point extraction
A computer vision sequence based on interest points has the following steps:
1. Detect interest points in the boundary of the object. This step is usually done in several steps as given below.

   (a) Detect edge pixels in the input image.
   (b) Link edge pixels and obtain contours of edge pixels.
   (c) Thin these contours to make them one-pixel wide.
   (d) Use some contour-following technique and locate points of high curvature.
2. Use these image interest points and obtain the correspondence between the image points and model (object) points. The correspondence determines the position and orientation of the object in the image. An example of a matching algorithm which uses interest points is the affine-invariant matching algorithm; see [47], [11].

2.1.3 Sequence based on Segmentation
An operation sequence for scene analysis would consist of:

1. Segmenting an image based on the graytone values in the image. Areas of the image with similar graytone intensity constitute the regions of interest, and the segmented image would consist of the region details and a label for each region. Operations which perform segmentation employ region growing. Region growing is actually a grouping process, where grouping is done based on some function of the observed values in the image.

2. Providing an interpretation of the segmented image. This operation involves matching the regions to pre-stored models of objects in the scene.

Note that vision algorithm sequences where a binary image is obtained by thresholding also fall into this category. In such sequences, the thresholding process is nothing but the segmentation step, and the other processing steps often perform noise cleaning and shape discrimination. A combination of basic morphological operators is often used to perform noise cleaning and shape discrimination (a minimal sketch is given below). For a review of operators on binary images, the interested reader is referred to Haralick and Shapiro [40].
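The sketch below shows the minimal numpy/scipy form of such a thresholding-style sequence; the global threshold t and the 3 by 3 structuring element are arbitrary choices made for illustration, not prescriptions from the text.

```python
import numpy as np
from scipy.ndimage import binary_opening, label

def threshold_segment(image, t):
    """Threshold to a binary image, clean small noise with a morphological
    opening, and label the surviving connected regions."""
    binary = image > t
    cleaned = binary_opening(binary, structure=np.ones((3, 3), dtype=bool))
    labels, n_regions = label(cleaned)
    return labels, n_regions
```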
2.2 Feature Extraction and Grouping Techniques

The above discussion shows that there are similar components among vision sequences. We now give a brief review of feature extraction and grouping techniques. Our aim is not to give an exhaustive survey of these techniques but rather to illustrate the major operators and their salient differences. A rather exhaustive survey is given in [40]. Since our dissertation considers mainly sequences involving edge point and line/arc features, we briefly discuss the major operators for edge detection here.

2.2.1 Edges & Edge detection
Ideal intensity edges are defined by relatively steep intensity changes between two regions, and these intensity changes may be modeled as step, ramp or roof functions. There is a direct relationship between edges in an image and the physical properties in a scene; hence edge detection is an important step of a computer vision system. An edge operator converts a gray scale image to a binary image. Subsequent vision processes may make use of the simple form, instead of dealing with the gray scale image directly. Edge detectors can be broadly classified into two classes: gradient operators and second derivative operators. For a general survey the interested reader is referred to Haralick and Shapiro [40]. Gradient operators respond with a broad peak at the edge location, and the output of these operators requires a thinning or maximum detection step, which degrades resolution. Second derivative operators, on the other hand, respond at an edge location (the location of the zero crossing of the second derivative), which can be determined to sub-pixel precision depending on the signal to noise ratio. A number of edge detectors have been proposed. Significant among them are the facet edge operator developed by Haralick [35], the Laplacian of a Gaussian operator developed by Marr and Hildreth [60] and the Canny edge operator [10]. These edge detectors characterize and detect edges by considering the intensity density over a broad area around candidate edge pixels. Canny [10] discusses the problem of combining outputs from a set of edge operators of varying size and orientation, and he decides on the size of his edge operator by setting a minimum acceptable error rate. He chooses the smallest operator with signal to noise ratio greater than the threshold determined by the error rate.
Facet Edge Operator
Haralick [35] discusses an edge detector using the facet model. Pixels which are part of regions have simple gray tone intensity surfaces over their areas, and pixels that lie on an edge have complex gray tone intensity surfaces in their neighborhoods. Specifically, an edge is present if and only if there is some point in the pixel's neighborhood having a negatively-sloped zero crossing of the second directional derivative taken in the direction of a non-zero gradient at the pixel's center. To determine whether or not a pixel is an edge pixel, its underlying gray tone intensity surface is estimated on the basis of the pixels in its neighborhood. Then the directional derivatives for the intensity surface are computed, and the edge pixel is located at places where a zero crossing of the second directional derivative is present. As noted, the second directional derivative is taken in the direction of a non-zero gradient at the pixel's center. Thus this kind of edge detector will detect weak local gradients.
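A minimal numpy/scipy sketch of the slope-facet part of this idea: a least-squares plane fitted to each window yields the local gradient estimate (for a symmetric window the fit reduces to two correlation masks). The zero-crossing test on the second directional derivative, which the full operator adds on top of this, is omitted here.

```python
import numpy as np
from scipy.ndimage import correlate

def slope_facet_gradient(image, half=2):
    """Least-squares planar fit I(r, c) ~ a + b*r + g*c over each
    (2*half+1) x (2*half+1) neighborhood; (b, g) estimates the gradient."""
    rr, cc = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # For a symmetric window the least-squares solution reduces to
    # correlating the image with these two masks.
    f_r = correlate(image.astype(float), rr / (rr ** 2).sum())
    f_c = correlate(image.astype(float), cc / (cc ** 2).sum())
    return f_r, f_c, np.hypot(f_r, f_c)
```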
Marr-Hildreth Edge Operator

Marr and Hildreth [60] suggest an edge operator based on the zero crossings of a generalized Laplacian. In effect, this is a non-directional or isotropic second derivative zero crossing operator. Marr and Hildreth selected a limited range of spatial frequencies by blurring the image with Gaussian filters, and they identified edges as the locus of points where the directional derivative of the filtered image has a peak, which implies a zero crossing of the second derivative. This technique is important because of its ability to detect edges at several different resolutions, determined by the standard deviations of the Gaussian filters. This technique locates infinite straight edges with linear illuminations exactly. The mask for the generalized Laplacian operator is given by sampling the kernel

$$\left(1 - k\,(r^2 + c^2)\right) \exp\left(-\frac{r^2 + c^2}{2\sigma^2}\right)$$

at the row, column coordinates $(r, c)$. Intensity changes are detected in image $I(r, c)$ by finding the zero crossings in $\nabla^2 G(r, c) * I(r, c)$, where $*$ denotes convolution. Wherever an intensity change occurs, there is a peak in the first directional derivative and a zero crossing in the second directional derivative. Detection of intensity changes is then reduced to finding the zero crossings in the second derivative of the intensity in the direction of maximum slope at the zero crossing.
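A small sketch of sampling such a mask; the default k = 1/(2σ²), which recovers the usual Laplacian-of-Gaussian shape, is an assumption made for illustration.

```python
import numpy as np

def generalized_laplacian_mask(size, sigma, k=None):
    """Sample (1 - k*(r^2 + c^2)) * exp(-(r^2 + c^2)/(2*sigma^2)) on a
    size x size grid of (row, column) offsets centered at (0, 0)."""
    if k is None:
        k = 1.0 / (2.0 * sigma ** 2)   # standard LoG shape (assumed)
    half = size // 2
    r, c = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    rc2 = r ** 2 + c ** 2
    mask = (1.0 - k * rc2) * np.exp(-rc2 / (2.0 * sigma ** 2))
    return mask - mask.mean()  # zero mean: flat regions give no response
```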
Canny Edge Operator
This operator consists of several steps. The first step involves convolving the original image with a gradient of Gaussian kernel. The two-dimensional Gaussian is given by:

$$g(r, c) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{r^2 + c^2}{2\sigma^2}\right)$$

This function can be separated into two one-dimensional functions of the form:

$$g(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right)$$

The gradient of the one-dimensional Gaussian function is given by:

$$g'(x) = -\frac{x}{\sigma^2}\, g(x)$$

By convolving the image $f(r, c)$ in the row direction with $g'(r)$ and then in the column direction with $g(c)$, the smoothed partial derivative of the image function with respect to $r$, $f_r$, is obtained. Similarly, convolving with $g'(c)$ in the column direction followed by $g(r)$ in the row direction gives the smoothed partial derivative of the image function with respect to $c$, $f_c$. The gradient vector is given by $(f_r, f_c)$. The gradient magnitude $M(r, c)$ and the direction of maximum rate of change, $\theta$, can be obtained from $f_r$ and $f_c$:

$$M(r, c) = \sqrt{f_r^2 + f_c^2}, \qquad \theta = \tan^{-1}\!\left(\frac{f_c}{f_r}\right)$$
The second step in the Canny edge operator is to perform non-maxima suppression. This is done by computing the direction $\theta$ at each point and interpolating between the values of the two eight-neighbors having direction nearest to it. This is done for the $\theta + 180$ degree direction also. A point is designated as a non-edge point if the gradient magnitude at the point is not greater than the interpolated gradients $M_\theta$ and $M_{\theta+180}$. The third step is to scan for possible edge points. Each point whose magnitude is above a user-defined threshold is marked as an edge point. If an 8-neighbor is a
possible edge point and has magnitude above the threshold, then it is also labeled as an edge point and the search continues among all the neighbors. The search terminates when a point is not a possible edge or has magnitude below the threshold.
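The convolution step above translates directly into a few lines of numpy/scipy. The sketch below stops at the gradient magnitude and direction; the non-maxima suppression and the threshold scan are omitted, and the kernel truncation radius is an arbitrary choice.

```python
import numpy as np
from scipy.ndimage import convolve1d

def gaussian_and_derivative(sigma):
    """Sampled 1-D Gaussian g and its derivative g'(x) = -(x/sigma^2)*g(x)."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    return g, -x / sigma ** 2 * g

def smoothed_gradient(image, sigma=1.0):
    """Separable convolution with g and g' gives f_r and f_c, from which
    the magnitude M and the direction theta follow."""
    g, dg = gaussian_and_derivative(sigma)
    f_r = convolve1d(convolve1d(image.astype(float), dg, axis=0), g, axis=1)
    f_c = convolve1d(convolve1d(image.astype(float), dg, axis=1), g, axis=0)
    return np.hypot(f_r, f_c), np.arctan2(f_c, f_r)
```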
2.2.2 Gray Scale Corner Detection

A survey of techniques for corner detection can be found in Kitchen and Rosenfeld [52]. Kitchen and Rosenfeld categorize techniques for corner detection into binary and gray level. They note that gray level schemes performed better than binary techniques in terms of detection, localization and stability. Singh and Shneier [95] categorize gray scale corner detection techniques into template-based techniques and intensity-gradient-based techniques. Template-based corner detection involves determining the correlation between a given template and subwindows in the image. The image is convolved with a mask (template) and the resulting image is thresholded to obtain regions of high correlation with the template. Gradient based methods rely on measurement of edge curvature. Points of significant curvature change along an edge are the corners. Zuniga and Haralick [111] proposed a gray scale corner detector based on the facet model. They identify corners as edge points where a significant change in gradient direction takes place. This change should ideally be measured as an incremental change along the edge boundary. They compute this incremental change in three ways:
- Incremental change in gradient direction along the tangent to the edge at the point which is a corner candidate.
- Incremental change along the contour line which passes through the corner candidate.
- Instantaneous rate of change in gradient direction in the direction of the tangent line.
They show that the second method works the best.
2.2.3 Interest point detection in a contour
In the above section we categorized corner detection schemes into binary and gray level. In this section we give a review of interest-point-extraction algorithms. Given a sequence of points belonging to a contour, the interest-feature extraction algorithm traces through the contour and produces a smaller list of points called "interest points". Interest points are points on the contour with some special property that makes them useful in matching. Normally, points of sharp curvature change (corners) are the interest points. Phillips and Rosenfeld [74] describe a curve partitioning algorithm which produces these interest points as output. Given a point P on the curve and a fixed arc length k, there is a set of chords that have arc length k and span the part of the curve containing P. Let d(P, C) be the perpendicular distance from a point P to a chord C whose span includes P, and let M(P) be the maximum of d(P, C) over all such chords. P is a partition point of the curve if the value of M(P) is a local maximum (for the given k) and also exceeds a threshold t(k). This method finds points of high curvature along the contour. Haralick et al [37] locate corners by using line fitting. They plot the variance bound for the angle of a fitted line as a function of the number of points used for line fitting and note that the variance bound is a minimum at a corner. A number of other algorithms for segmenting a contour into sub-pieces are described in [40].
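A small sketch of the chord-based measure just described, on a digital contour given as an (n, 2) point array; approximating arc length by index distance and using a simple three-point local-maximum test are simplifications of this sketch, not part of the original algorithm's specification.

```python
import numpy as np

def partition_points(contour, k, t):
    """For each point p, M[p] is the maximum perpendicular distance from
    p to chords spanning k steps whose span contains p; p is reported
    where M[p] is a local maximum exceeding the threshold t."""
    pts = np.asarray(contour, dtype=float)
    n = len(pts)
    M = np.zeros(n)
    for p in range(n):
        for a in range(max(0, p - k), min(p, n - 1 - k) + 1):
            dr, dc = pts[a + k] - pts[a]
            length = np.hypot(dr, dc)
            if length > 0.0:
                # perpendicular distance from pts[p] to the chord line
                d = abs(dr * (pts[p, 1] - pts[a, 1])
                        - dc * (pts[p, 0] - pts[a, 0])) / length
                M[p] = max(M[p], d)
    return [p for p in range(1, n - 1)
            if M[p] > t and M[p] >= M[p - 1] and M[p] >= M[p + 1]]
```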
2.2.4 Edge pixel Grouping methods

Most edge linking techniques are based on the Hough transform. The Hough transform algorithm requires an accumulator array whose dimension corresponds to the number of unknown parameters in the equation of the family of curves being sought. For example, finding line segments using the equation y = mx + b requires finding two parameters for each segment: m and b. The two dimensions of the accumulator array for this family would correspond to quantized values for m and quantized values for b. Using an accumulator array A, the Hough procedure examines each pixel and its neighborhood in the image. It determines if there is enough evidence of an edge at that pixel, and if so, calculates the parameters of the specified curve that passes through this pixel. In the straight line example with equation y = mx + b, it would
estimate the m and the b of the line passing through the pixel being considered if the measure of edge strength (such as the gradient) at that pixel were high enough. Once the parameters at a given pixel are estimated, they are quantized to corresponding values M and B, and the accumulator A(M, B) is incremented. Some schemes increment by one and some by the strength of the gradient at the pixel being processed. After all pixels have been processed, the accumulator array is searched for peaks. The peaks indicate the parameters of the most likely lines in the image. A number of variations on the Hough transform have been proposed, and a comprehensive review can be found in [40]. In this dissertation we will be analyzing direction-based line finders such as the Burns line finder [9]. The algorithm consists of partitioning the edge orientation space into a fixed number of bins, obtaining connected components in the partitioned image, performing a weighted planar fit to the gray tone surface in the region of interest, and then intersecting the fitted plane with the flat plane of height equal to the weighted mean grayvalue. Another grouping technique of a similar nature is the ellipsoidal clustering algorithm described in Ramesh and Haralick [80]. This algorithm views the process of edge linking as a clustering process, where edge pixels belonging to the same cluster are part of the same line segment. The algorithm assumes that the result of an edge operation produces not only the estimates of the orientation and position but also their variances and covariances. So each edge element becomes an ellipsoid in the parameter space. The problem then becomes one of clustering these ellipsoids in the parameter space.
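A minimal accumulator sketch of the (m, b) voting scheme just described; the names, the binning and the unit vote weight are assumptions of this sketch rather than specifications from the text. The local gradient direction supplies the slope estimate, so each sufficiently strong pixel casts a single vote.

```python
import numpy as np

def hough_accumulate(edge_strength, grad_dir, m_bins, b_bins, t):
    """Vote in the quantized (m, b) space of y = m*x + b, one vote per
    pixel whose edge strength exceeds t."""
    acc = np.zeros((len(m_bins) - 1, len(b_bins) - 1))
    ys, xs = np.nonzero(edge_strength > t)
    for y, x in zip(ys, xs):
        theta = grad_dir[y, x]
        if abs(np.sin(theta)) < 1e-6:
            continue                        # near-vertical: m is unbounded
        m = -np.cos(theta) / np.sin(theta)  # line is normal to the gradient
        b = y - m * x
        i = np.searchsorted(m_bins, m) - 1
        j = np.searchsorted(b_bins, b) - 1
        if 0 <= i < acc.shape[0] and 0 <= j < acc.shape[1]:
            acc[i, j] += 1   # some schemes add edge_strength[y, x] instead
    return acc               # peaks mark the most likely lines
```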
2.3 Matching Techniques

A complete review of matching techniques is beyond the realm of this dissertation, but we will discuss very briefly different frameworks for matching schemes used in computer vision. The matching problem is one of establishing the correspondence between model-feature entities and image-feature entities by utilizing the relationships between image-feature entities and pre-stored descriptions of model-feature primitives and their relationships. Essentially this problem falls into a class of problems called the consistent labelling problem; see [30]. This problem is NP-complete, and a number of papers discuss methods for reducing the complexity of the problem by using a variety of constraints; for a review see Haralick and Shapiro [40]. General frameworks for matching include the relational distance approach to matching (see Shapiro and Haralick [92]), ordered structural matching [93], and hypothesize and test methods (see [58], [23], [8]). Hypothesize and test methods perform matching by first finding a limited number of correspondences between the model and the image, computing the hypothetical transformation matrix that describes the position and orientation of the object with respect to the camera, testing the hypothesis by projecting the model to the image plane via the transformation, and evaluating the goodness of the fit of the transformed model to the image.

2.4 Performance Evaluation of Vision Algorithms -- Review

In this section we give a brief review of recent papers related to our problem. For most algorithms, no performance characterization has been established and published in the computer vision research literature. In general, performance characterization of feature extraction operators has been done mainly by plotting empirical curves of performance. Quantitative performance evaluation of edge operators was first performed by Abdou and Pratt [1]. A quantitative performance evaluation methodology for line detection schemes has been proposed by Kanungo et al [51]. They present a methodology for performance evaluation by looking at a specific case where a line detection algorithm is used to detect the presence or absence of a vertical edge in the presence of a masking grating. They study the performance of a line detection algorithm based on the Hough transform by conducting experiments and obtaining a set of operating curves for a representative set of algorithm parameter values. A number of computer vision researchers have recently stressed the need for benchmarking and performance evaluation of computer vision algorithms. Papers by Price and Huertas [78], Haralick [44], Jain and Binford [48], and Petkovic et al have also stressed the need for performance evaluation of computer vision algorithms. Petkovic et al state that machine vision systems are very hard to model or simulate accurately, and so realistic, large scale experiments are the only reliable means of assessing their accuracy. Haralick [38] gives tables for the number of trials required to test a near-perfect machine in order to guarantee that it satisfies a given performance requirement. He illustrates how this number could be as high as 100,000 if one wishes to
guarantee an error rate of less than one in ten thousand with 95% confidence. More recently, Haralick [42] stresses the need for a methodology to conduct systematic evaluation of IU algorithms. Moreover, he discusses the meaning of an experimental protocol and illustrates an example protocol by taking thinning algorithms as a case in point. Since the beginning of this dissertation there has been considerable effort in performance characterization at the University of Washington. Jaisimha et al give a performance characterization of thinning algorithms, while Zhang, Haralick and Ramesh [109] have done a systematic performance evaluation of corner detectors. Not much qualitative evaluation of the performance of vision algorithms was done prior to the beginning of this dissertation. Grimson and Huttenlocher provide a theoretical analysis of matching algorithms (see Grimson et al [24], Grimson et al [25]). The only paper which is related to the parameter selection problem among these papers is by Grimson et al [24]. For the matching stage, Grimson and Huttenlocher rigorously derive conditions under which a hypothesized match should be accepted. They relate the probability of a match occurring at random to the fraction of model features accounted for by the match, as a function of the number of model features, the number of image features, and a bound on the degree of sensor noise. They use this measure to set a proper matching threshold (the fraction of model features that must be matched in order to limit the probability of random matching). More recent work involves the theoretical error analysis by Sarachik and Grimson [91] of geometric hashing techniques. They assume that the error in extracted point locations is Gaussian distributed and rigorously determine tuning constants for the geometric hashing scheme. Wang and Binford [104] perform a theoretical analysis of the Canny edge detector and illustrate how one can select the gradient threshold. Their analysis is very similar to ours, but there are some fundamental differences. In fact, the theoretical results in chapter 8 of this dissertation and their paper are almost identical. The main difference is that we use the slope facet model for estimating the gradient magnitude whereas they use the derivative of a Gaussian filter (Canny edge filter) to estimate the gradient magnitude. Our theoretical results for the tuning parameters of the Canny hysteresis linking step are more general in the sense that our analysis shows how one can take into account the prior distributions of gradients in areas of interest and non-interest. In our dissertation we are more concerned with the proper choice of parameters
19 at each stage of a vision algorithm sequence, since the performance at a high-level stage is directly related to the performance at the lower-level stages.
Chapter 3
METHODOLOGY FOR PERFORMANCE CHARACTERIZATION 3.1 Introduction We have seen in chapters 1 and 2 that IU systems are complex and are composed of dierent algorithms applied in sequence. An IU system for model-based recognition has three essential components: feature extraction, grouping and model matching. In each of these components, tuning parameters (thresholds) are often used. These parameters have been traditionally chosen by trial and error or from empirical data. In this chapter we discuss details of the systems engineering methodology for the analysis and design of IU algorithms and systems. For a given image understanding task and an algorithm sequence that accomplishes the task we illustrate how the algorithm parameters can be optimally selected. The essential steps for each of the algorithm components involved are: component identi cation (performance characterization) and application domain characterization (achieved by an annotation). There is an optimization step that is used to optimize a criterion function relevant to the nal task. Performance characterization of an algorithm involves the establishment of the correspondence between random perturbations in the input to the random perturbations in the output. This involves the setup of the model for the output random perturbations for a given ideal input model and input random perturbation model. Given these models and a criterion function, it is possible to characterize the performance of the algorithm as a function of its tuning parameters and automatically set the tuning parameters. We use the term \application domain characterization " to describe the process by which theoretical probabilistic models describing the population of inputs to an IU system are derived and estimated from training data. The speci cation of the theoretical model for the population of ideal input data varies with problem domain. Domain-speci c prior information on the parameters that describe the ideal input
21 data can be gathered during an annotation step. The annotation procedure is one in which ground truth information is manually entered by an user. Appropriate theoretical approximations for the prior distributions can be then speci ed, validated and utilized in computing the performance of the algorithm sequence over the entire input population. Tuning parameters can be selected to optimize the performance over the input population. This chapter is organized as follows. First, we provide a statement of our problem. We then proceed to describe, in detail, our methodology and the necessary steps required to design optimal IU algorithms. 3.2 Problem Statement Let A denote an algorithm. At the abstract level, the algorithm takes inputs, a set of observations, called input units UIn and produces a set of output units UOut. Associated with the algorithm is a vector of tuning parameters T. The algorithm can be thought of as a mapping A : (UIn; T) ! UOut. Under ideal circumstances, if the input data is ideal (perfect), the algorithm will produce the ideal output. In this situation, doing performance characterization is meaningless. In reality the input data is perturbed, perhaps due to sensor noise or perhaps due to the fact that the implicit model assumed in the algorithm is violated. Hence the output data is also perturbed. Under this case the inputs to (and the outputs from) an algorithm are observations of random variables. Hence, we view the algorithm as a mapping: A : (U^In ; T) ! U^Out, where the^symbol is used to indicate that the data values are observations of random variables. This brings us to the verbal de nition of performance characterization with respect to an algorithm: \Performance characterization for an algorithm involves establishing the correspondence between the random variations and imperfections on the output data and the random variations and imperfections on the input data." More speci cally, the essential steps for performance characterization of an algorithm include: 1. the speci cation of a model for the ideal input data. 2. the speci cation of a model for the ideal output data.
22 3. the speci cation of an appropriate perturbation model for the input data. 4. the derivation of the appropriate perturbation model for the output data (for the given input perturbation model). 5. the speci cation and the evaluation of an appropriate criterion function to characterize the performance of the algorithm. The main challenge is in the derivation of appropriate perturbation models for the output data and relating the parameters of the output perturbation model to the input perturbation, the algorithm tuning constants, and the ideal input data model parameters. This is due to the fact that the speci cation of the perturbation model must be natural and suitable for ease of characterization of the performance of the subsequent higher-level process. Once an output perturbation model is speci ed, estimation schemes for obtaining the model parameters have to be devised. In addition, the model has to be validated, since theoretical derivations may often involve approximations. The speci cation of an appropriate criterion function is by itself an interesting problem. The issue here is what appropriate measures can be used to compare the ideal expected result with the perturbed output from an algorithm. Benchmarking of IU algorithms is possible if the IU community agrees on a set of appropriate measures (criterion functions) that should be used to evaluate an algorithm. 3.2.1 IU Algorithm Sequences
Having discussed the meaning of performance characterization with respect to a single algorithm, we now turn to the situation where simple algorithms are cascaded to form complex systems. First we specify the essential components of typical vision algorithm sequences (feature detection, grouping and matching) and note the similarities between them. Then we discuss the nature of input and output perturbations at each stage. Feature extraction involves the classi cation of image pixels into atomic feature entities (for example, edge/non-edge, corner, etc.). The extraction is done by the assumption of a speci c model for the image feature characteristics. For example,
23 an ideal intensity edge may be modelled as a step function or an ideal corner can be modelled as being generated by the intersection of two line segments. Feature grouping involves the assignment of group labels to individual atomic feature entities. The basis for such an assignment is criteria such as proximity, orientation dierence, etc. At a conceptual level all of these algorithms perform a clustering task, by utilizing appropriate distance measures (metrics/non-metrics) that describe similarity between atomic feature entities. For example, groups of pixels that form line segments or arc segments, can be visualized as ellipsoids in the high-dimensional space of feature attributes. Feature matching also involves the assignment of categories to individual feature entities. Features that are labelled to the same category are, perhaps, part of the same (or belong to a class of) object(s). Relative to the IU task, some of the objects are of interest and some are of non-interest. From the classi cation point of view, at each stage of a typical IU system, features belonging to both the objects of interest and objects of non-interest are detected, grouped and passed on to the matching stage. Relative to the task, one can de ne classi cation errors as follows. The possible errors in a detection step are:
Mislabeling a true atomic feature unit, belonging to an object of interest, as a non-feature unit.
Mislabeling a true non-feature unit, due to noise or belonging to an object of non-interest, as a feature unit.
The possible errors in a grouping step are:
the introduction of clutter cluster units belonging to objects of non-interest. This may also be caused by correlated noise.
the merging of two true cluster units into a single cluster. This is mainly due
to the inability of the algorithm to dichotomize the clusters since the similarity measures used provide reasonable evidence to suggest otherwise.
the splitting of a single true cluster unit into multiple cluster units. An error occurs in the interpretation process if:
24
The interpretation process identi es the object in the image incorrectly. This is a misclassi cation.
The interpretation process falsely states that there is an object in the scene due to a random match (perhaps to an non-interesting object). This is false alarm . 1
We have posed performance characterization of an algorithm as analysis of the algorithm's sensitivity to perturbations in the input data. We have also stressed the dierences in the nature of the speci cation of perturbation models at dierent stages in an image understanding algorithm sequence. The statement of the problem, in its present form, does not present the whole picture. The ideal input data is often speci ed by a model with parameters speci ed by a vector D and the algorithm is often an estimator of these parameters. First, we note that the ideal input data is nothing but a sample from a population of ideal inputs. The characteristics of this population, i.e. the exact nature of the probability distributions for D, is dependent on the problem domain. The process of generation of a given ideal input can be visualized as the random sampling of a value of D according to a given probability distribution FD. Let PIn denote the vector of parameters for the input perturbation model and T denote the vector of (unknown) tuning parameters. Let QOut(T; PIn; D) denote the criterion function that is to be optimized . Then the problem is to select T so as to optimize the performance measure Q, over the entire population, that is given by: Z Q(T; PIn) = QOut(T; PIn; D)dFD (3:1) 2
In the situation where the perturbation model parameters, PIn, are not xed, but have a speci c prior distribution, one can evaluate the overall performance measure by integrating out PIn. That is: Z Q(T) = Q(T; PIn)dFPIn (3:2) False alarm is another type of misclassi cation. When there are only two classes of objects (objects of interest/non-interest) the two types of errors are the false alarm and misdetection. 2 Note that the input data U ^In is not one of the parameters in the criterion function. This is correct if all the input data do not violate any of the assumptions about the distribution(s) of D and PIn.
1
25 We now focus on the criterion function and its choice for dierent IU algorithms. In general, the problem solved by model-based IU algorithms involves the identi cation and localization of instances of a given object model. The feature detection, feature grouping, and model matching steps can be visualized as classi cation tasks. Thus, one can use standard decision-theoretic methods such as the Neyman-Pearson theory in our problem. Under Neyman-Pearson theory, one would set the threshold to the value that corresponds to a class error probability of . For example, one could choose an optimal threshold that would set the probability of false alarm of an edge operator to 0.05. On the contrary, if one wishes to obtain a balance between the misdetection and false alarm characteristics the appropriate criterion function could be a convex combination of the false alarm and misdetection error probabilities, or if dierent costs are associated with false alarm and misdetection, the criterion could be a weighted combination of the false alarm rate and misdetection rate. 3.2.2 Optimization of Performance of Algorithm Sequences i Let denote the collection of all algorithms. Let A i 2 , then A i : UIni ! UOut i is the mapping of the input data UIni to the output UOut . Note that the unit for i i UIn may not be the same as the unit for UOut and perturbations in the input unit type causes perturbations in the output unit type. A performance measure, Q i , is associated with Ai. Associated with each algorithm is the set of input parameters T i . The performance measure is a function of the parameters T i . An algorithm sequence, S , is an ordered tuple: ( )
( )
( )
( )
( )
( )
( )
( )
( )
( )
( )
S : (A ; A ; : : : ; A n ) (1)
(2)
( )
where n is the number of algorithms utilized in the sequence. Associated with an algorithm sequence is a parameter vector sequence
T : (T ; T ; : : :; T n ) (1)
(2)
( )
and a ideal input data model parameter sequence:
D : (D ; D ; : : :; T n ): (1)
(2)
( )
26 The performance at one step of the sequence is dependent on the tuning parameters, and the perturbation model parameters at all previous stages. So:
Qi = fi(T i ; T i; ; : : : ; T ; PIn i; ; : : :; PIn ): ( )
(
1)
(1)
(
1)
(1)
So the overall performance of the sequence is given by:
Qn(T; PIn) = fn (T n ; T n; ; : : :; T ; PIn n; ; : : :; PIn ): ( )
(
1)
(
(1)
1)
(1)
The free parameter selection problem can now be stated as follows: Given an algorithm sequence S along with the parameter sequence T and performance measure Qn, select the parameter vector T that maximizes Qn . Note that Qn is actually the integral:
Q Z n(TZ; PIn) = : : : fn (T n; ; : : :; T ; PIn n; ; : : :; PIn ; D n; ; : : : ; D )dFD n; : : : dFD : (
1)
(1)
(
1)
(1)
(
1)
(1)
(
1)
(1)
Note that at each stage a dierent set of prior distributions FD i comes into play. Also, the perturbation model parameters PIn i is a function gi(T i; ; PIn i; , D i; ; A i; ). In other words, the perturbation model parameters at the output of stage i is a function of the tuning parameters at stage i ; 1, the input perturbation model parameters in stage i ; 1, the ideal input data model parameters, and the algorithm employed in stage i ; 1. It is important to note that the functions gi depend on the algorithm used. No assumption is made about the form of the function gi. ( )
( )
(
1)
(
(
1)
(
1)
1)
3.3 Building Detection Example We have just discussed the problem of setting up appropriate tuning constants for an IU system. We now turn to a concrete problem and illustrate the details. We assume aerial image analysis as our problem domain. Speci cally, we take a look at the problem of recognizing buildings in aerial image data. The input image(s) to a computer vision algorithm may contain dierent categories of object classes, with each object class having an idealization, an associated set of free variables also termed as ideal input data model parameters, and an associated random perturbation model. The same is true for the output data produced by the vision algorithm.
27 To illustrate what we mean consider the buildings in an aerial image. Buildings in an aerial image constitute an object class. The idealization of a 3D building is a polyhedral 3D spatial object whose sides are vertical and whose roof boundary (not the entire roof) lies in a horizontal plane. The idealization of a building on an aerial image is that of an object whose boundary is the perspective projection of the 3D building idealization. The free variables of the 3D spatial object model are the length, widths, and angles of the building faces. The 3D scene has an imaging sensor attached to it. We can think of deriving a population of images of a given site by capturing images at various positions and orientations of the imaging sensor. One could add time as another parameter as it gives an idea of where the light source (in this case the sun) is with respect to the world. Thus, the imaging sensor has associated with it the free variables of position and viewing angle. The imaging sensor has intrinsic parameters that we assume are not variable. For example, in the case of a simple pin-hole camera model, the intrinsic parameters include the focal length, dimensions of the ccd array, and the number of rows and columns in the sensor array. Given the distribution for the values of the sensor free variables, the distribution of the lengths and angles of the 3D building edges translates into 2D distributions for the length of the boundary segments, the angles between segments, and the depth of the open area adjacent to any boundary segment. We have just described essentially the geometric part of what the imaging sensor does. Along with the 3D geometric model, we need to describe the re ectance properties of the surfaces of the building. One assumption could be that the re ectance is constant on each building surface. In addition, one needs to specify the characteristics of the light source. In aerial image analysis, we may assume that we are given the time and the day at which the image was obtained. Combining this information with the approximate latitude and longitude of the location being imaged, we can determine the position of the sun. The distribution of light source positions and the distribution of the sensor positions together induce a joint gray-level distribution on the building faces and the roof. This model is a stochastic model that may, depending on the imaging sensor and buildings of interest, describe the gray-level spatial distribution of the faces, and assume an independence between the gray levels of one face and those of another face or roof. It may also describe the contrast of the gray-level
28 distribution between faces and some measure of the gray-level spatial dependencies. The gray-level spatial distribution would also specify the distribution associated with the gradients of step edges and the width of the gradient regions. The random perturbation model for the perturbations in the sensor would describe the nature of the gray level pixel noise, how much of it is additive, how much of it is replacement, how it is correlated, and how large it is as an eect. Systematic perturbations may be introduced in the sensor due to geometric distortion in the sensor. The probability model for the boundary segments of an aerial building would give the probability distributions relating to occlusion . It indicates how much of a boundary will appear on the image, in how many pieces it will appear, and what is the conditional size distribution of the pieces. To illustrate a more complete view of how these data models might be, we take a building recognition sequence consisting of edge detection, edge linking, corner detection, line classi cation, and building recognition. Given a description of idealization of the object class and of the random perturbation of the object class, it is possible to analytically determine for a given kind of edge operator the conditional probability distributions for each boundary segment relating to how much of it gets detected, how many pieces it is detected in, and the size distribution of the pieces. For those detected pieces there is the distribution describing the location perturbation of the correctly detected edges. Small perturbations can be adequately captured by assuming a Normal distribution for the additive perturbation with a covariance matrix being the key distribution parameter . In any case, each of these distributions will be a function of the tuning parameters of the edge operator. Thus for a zero-crossing facet edge operator, the tuning parameters might include: neighborhood size, order of polynomial t, radius within which the zero-crossing is searched for, and gradient threshold or contrast threshold. Following edge detection is an edge linking stage that groups together edges belonging to the same boundary and at the same time closes some of the gaps on the boundary. The data at this stage could be characterized by the length distributions of the boundary pieces, the location perturbation distributions, and perhaps the cur3
3
We do not use the term random perturbation model to distinguish the fact that occlusion is a deterministic process. The variability in the number of boundary segments that are visible is due to the variation in the sensor pose and light source pose.
29 vature distributions. Then there is a corner detection stage which segments the boundaries into lineal pieces. Associated with the lineal pieces are the distributions relating to the included angles and the location perturbation of the detected boundaries relative to the ideal boundaries. Following this there may be a classi cation stage which uses the detected lineal segments and the neighboring lineal segments to classify whether or not any lineal segment is likely to be part of a building or not. This stage's results can be characterized by the false alarm rate and misdetect rate. Finally at the last stage, there is building recognition. This stage selects and groups together line segments which had previously been classi ed as likely to be part of buildings. It determines those groupings which are consistent with being part of the perspective projection of the kind of polyhedra we initially described. This operation results in a building misdetect and false alarm rate. And for the correctly detected buildings there are associated measures of the distribution for the number of faces and number of boundary lineal segments that are correct and the number that are not correct. And for the correctly detected segments there is a covariance matrix associated with the line segment end point positions. In the case of scalable vision systems, there is another element of complexity with respect to the tuning parameters of the algorithms. Here, one of the tuning parameters will be associated with scale. And the algorithm must adaptively set this parameter based on what it can learn by probing each spatial area with operations over dierent neighborhood sizes. How the algorithm adaptively sets the scale parameter will typically depend on another's free parameter. Hence we see that this does not change the nature of the perturbation models or the idealizations of the data at any point in the vision algorithm sequence. It only changes the complexity of the perturbation propagation calculation. In summary, we have seen that computer vision algorithms have multiple steps. Each step typically has some tuning parameters. The input data to each step can be considered to be randomly perturbed. The random perturbation on the output data produced by each step is a function of the input random perturbation and the tuning parameters. Associated with the purpose of the vision algorithm is a criterion function. The tuning parameters must be chosen to optimize the criterion function for the given kinds of input perturbations.
Chapter 4
PERFORMANCE CHARACTERIZATION OF EDGE DETECTION SCHEMES 4.1 Introduction Previous chapters of this dissertation reviewed the current literature on IU algorithms and proposed a methodology for the engineering of IU systems. In this chapter we discuss one of the most fundamental steps in IU systems: edge detection. Ideal intensity edges are de ned by relatively steep intensity changes between two regions in an image. These intensity changes are often modelled as step, ramp or roof functions. Examples of classic edge detectors include: the Marr-Hildreth edge detector [60], Facet edge detector [35] and the Canny edge detector [10]. Edge detection using morphological techniques are attractive because they can be eciently implemented in near real time machine vision systems that have special hardware support [56]. A number of edge detectors have been discussed in the computer vision literature. However, little performance characterization of edge detectors has been done. In general, performance characterization of edge detectors has been done mainly by plotting empirical curves of performance. Quantitative performance evaluation of edge detectors was rst performed by Abdou and Pratt [1]. In this chapter we perform a theoretical and empirical comparison of gradientbased edge detectors and morphological edge detectors. The dierence between the two schemes is essentially in the estimation scheme for edge strength. Gradient based edge detectors estimate the gradient magnitude by using linear lters, whereas morphological operators involve non-linear ltering. Theoretical analysis of morphological schemes are therefore more involved than their linear counterparts. Following the methodology outlined in chapter 3, our rst step is to specify an appropriate idealization for the input data, specify a model for the perturbations in the input data, specify an appropriate model for the ideal output data, specify an perturbation model for the output data, and relate output perturbation model parameters to input perturbation model parameters. By assuming that an ideal edge
31 is corrupted with additive noise we derive theoretical expressions for the probability of misdetection (the probability of labelling of a true edge pixel as a non-edge pixel in the output). Further, we derive theoretical expressions for the probability of false alarm (the probability of labelling of a non-edge pixel as an output edge pixel) by assuming that the input to the operator is a region of at graytone intensity corrupted with additive noise. In addition we derive the theoretical expression for the distribution of pixel positioning error as a function of the signal to noise ratio, gradient threshold, and the neighborhood size employed in the edge operator. Finally, we provide an experimental protocol for our experiments on synthetic data and provide performance curves. 4.1.1 Edge detection { Ideal Data & Perturbation Model
Our ideal edge is a ramp edge of scale (width of the ramp) K pixels. Speci cally in 1dimension, the intensity values are viewed as a function I : D ! Z . Here the domain of the function is speci ed by the 1d-interval neighborhood around the edge pixel. The domain is the index set, D = ;(K ; 1)=2; : : : ; 0; : : : ; (K ; 1)=2. We assume that K is an odd integer K ; 1 is even.
I (x) = a + G(x) for x = ;K ; 1=2; : : :; K ; 1=2 = a ; G(K ; 1)=2 for x < ;K ; 1=2 = a + G(K ; 1)=2 for x > K ; 1=2
(4.1)
In the analysis that follows, we assume that no other edge is present within an interval of width W (W > K ) pixels around the center of the current edge pixel. This is done to make the analysis a little simpler. It is possible to relax this assumption and rigorously analyze eects of interfering edges. Assuming that a neighborhood operator of appropriate window size K is used, the 1d estimate of gradient magnitude (for perfect data) would be the sequence of values G(x) and it is clear that G(0) is maximum within a 1d interval neighborhood of K pixels wide. Thus, where appropriate we will use G(x); x = ;(K ; 1)=2; : : : ; (K ; 1)=2, as the true value of the gradient magnitude at pixel x. De ne G(0) = Gt, then G(x) < Gt; 8x 6= 0. a is the mean gray level in the edge neighborhood.
32 In 2d, the ideal model has to account for the orientation of the edge. It is assumed that the ideal image data is a function, wherein the gradient magnitude is given by Gt and the edge direction is . The ideal input image values are given by It(r; c) = r + c + , where = Gtcos and = Gt sin and is the mean gray value in the K by K neighborhood. The input image gray values are assumed to be corrupted with noise which may be modelled as a Gaussian distribution with zero mean and standard variance . That is: I (r; c) = It(r; c) + (r; c) (4:2) where, I (r; c) is the observed image gray value, It(r; c) is the true gray value and (r; c) is the noise component. (r; c)'s are independent and identically distributed Gaussian random variables with zero mean and standard deviation . This noise model is realistic when one is dealing with standard gray-scale cameras. 4.1.2 Output Ideal Model & Perturbation Model
The ideal edgel output is characterized by two parameters its true position (r; c) and orientation . The ideal output edge image can be viewed as a function O : (Zr Zc ) ! f0; 1g [0; 2 ). The ideal edge image is speci ed by the function that maps all ideal edge pixel location integer tuples to the label 1 (Edge) and an ideal orientation attribute . Let Do = f(r; c)jO(r; c) = 1g Zr Zc . Do is the set of all true edge pixels. Viewing the edge detector as a functional that takes in a function as input and produces a function as output, we can see that the output function O^ is a perturbed version of the ideal expected function O. The characteristics of the perturbations are: 1
Misdetection { An ideal edge pixel was not detected. This means that if (r; c) 2 Do , then the output function estimate (r; c) is not an element of D^ o.
False alarm { An ideal non-edge pixel was detected. This means that if (r; c) 2 (Zr Zc) ; Do , then (r; c) 2 D^ o . 1
This orientation attribute can normally be inferred to a speci c precision from the topology of the set Do = f(r; c)jO(r; c) = 1g, but we show as an attribute for speci c reasons. It is not exactly clear what the orientation is at junctions, or in corners. So we assume that the ideal edgel orientation is speci ed as a separate attribute.
33
Detection { For those edge pixels correctly detected, we have an error in the
estimated location of the edge pixel. Due to gray scale perturbations, the estimated orientation is perturbed and the estimated location of a gradient maximum is also perturbed. This is also re ected in D^ o and in the estimated function O^ .
The parameters that characterize these perturbations include: the probability of misdetection, probability of false alarm, and the covariance matrix of estimated edge position and orientation. We assume that perturbations in the edge position are dominant along the direction of the edge intensity pro le. Thus, we approximate edge positional deviations to be along the direction orthogonal to the direction tangent to the edge element (edgel). 4.2 Relationship between Output Perturbations and Input Perturbations (Gradient Based Edge Detector) 4.2.1 Probability of Misdetection of a Gradient Edge
In this section, we derive an expression for the probability of misdetection for a gradient-based edge detection scheme. As seen above, the probability of misdetection is one of the parameters that describe the perturbation characteristics in the output of an edge detector. We assume that the gradient at a particular pixel is estimated by computing a least squares t to the gray levels in the pixel's neighborhood. If we approximate the image graytone values in the pixel's neighborhood by a plane p r + c + , then the gradient value Gt = g = + . To estimate and we use a least squares criterion. On the basis of these estimates, we can derive the density function for the estimated gradient magnitude. Under our assumptions about the noise model in the input image, it can be shown that the tted parameters ^ and ^ are Gaussian random variables with means ; and variances ; respectively. We use the notation Ui to denote unit normal random variables with zero mean and unit variance. Under this notation we can rewrite the expressions for and as: = + U and = + U . Note that: = when a square neighborhood is used in the t and they are related to the input noise variance by the expression: 2
2
2
2
1
2
2
2
2
34
= r : (4:3) r c Here r cr is the summation of the squared row index values over the neighborhood used in the least squares t. By de nition Pvi Ui , where Ui's are i.i.d. unit normal variables, is distributed as a chi-square distribution with v degrees of freedom. Also, Pvi (Ui + i) is distributed as a non-central chi-square distribution with non-centrality parameter Pvi i . Now: G^ = ^ + ^ = (U + ) + (U + ) (4.4) = : () + : ( ) 2
2
2
2
2
2
2
=1
2
2
2
2 2
2
1
1
2
2
2
2
2
2
1
2
2
where: = and = . Now, (U +) and (U + ) are non-central chi-square distributed with noncentrality parameters and and 1 degree of freedom. The distribution of the sum of two non-central chi-square distributed random variables is also non-central chi-square distributed. Press [77] has shown that the distribution of linear functions of independent non-central chi-square variates with positive coecients can be expressed as mixtures of distributions of central chi-square's. In addition, the non-central chi-square distribution is a special case of a type I Bessel function distribution and the more general form of the distribution of a linear function of Bessel independent random variables is derived in Springer [97]. We have shown that the distribution for the gradient can be derived from rst principles. In the situation where the input noise is additive zero mean Gaussian noise we have shown that the ratio G = is a non-central chi-square distribution. with 2 degrees of freedom and non-centrality parameter C = ( + )= . Now, the gradient edge detection process labels a pixel as an edge if G^ > T . Hence a pixel is labeled as an edge pixel if G^ = > T = . But = =r cr and G^ = is distributed like a (C ). Therefore the probability of detecting the edge can be given by ! T r c r ^ (4:5) P (G > T ) = Prob (C ) > and the probability of misdetecting the edge is given by ! T r c r Pmisdetection = Prob (C ) < (4:6) . 2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
35 4.2.2 Probability of False alarm at the edge detector output
Another parameter that describes the perturbations in the edge detector output is the probability of false alarm of the edge detector. To determine the probability of false alarm, we assume that the input data at the edge detection step is a region of constant gray tone values with additive Gaussian noise. Since a pixel is labelled an edge pixel if the estimated gradient value, G, is greater than a speci ed threshold, T , the probability of false detection is Prob(G > T ). The coecients and of the facet model described in chapter 2 are normally distributed with zero mean. If the input noise variance is then the variance of , is equal to: 2
2
XX
= 2
The variance of , , is equal to:
r
c
r: 2
(4:7)
c:
(4:8)
2
XX
= 2
r
c
2
Note that the summations are done over the index set for r and c. Since G = + , if we assume a square neighborhood then the G = is chi-square distributed with 2 degrees of freedom. So the probability of labelling of a noise pixel as an edge pixel can be computed once we know the variance for the parameter . Speci cally, the probability of false alarm is given by: PPr! T r c Pfalsealarm = Prob > (4:9) 2
2
2
2
2
2
2
2
2
2
Note that only when the operator uses a square neighborhood the estimates of the variances for and are equal. The above simpli cation is possible only under this condition. On the other hand when a rectangular neighborhood is used the only dierence is that G is distributed as a linear combination of two chi-square distributed random variables. 2
4.2.3 Edgel Orientation Estimate Distribution When the orientation ^ is estimated by the expression:
^ ^) ^ = tan; ( = 1
36 it can be shown that the distribution of the orientation estimate under our perturbation model assumptions is the VonMises distribution. In fact, the conditional distribution for the orientation estimate given the estimated gradient and the true gradient value is: P (^ = j ; ; g; g^) = 0
1 expcos ; ; < : 2I () (
0)
0
Here the precision parameter is equal to: gg^r;cr = , is the true orientation value and I (x) is the modi ed Bessel function of rst kind and order 0. The above expression can be derived by examining the joint distribution of ^ and ^. The joint density function of ^ and ^ is given by: ; ; ; 1 p(^; ^) = 2 e (4:10) 2
2
0
0
(^
)2 +( ^ 2 2
)2
2
Applying the transformation: ^ = g^ cos ^ and ^ = g^ sin ^ and making use of the fact that = g cos and = g sin , we can write the joint probability density function, p(^g ; ^): 0 1 ! g g ! ; 1 g ^ g g ^ @ gg e(gg ; = )A I e (4:11) 2I Note that the second term in the above expression is a probability density function corresponding to the non-central chisquare random variate g^ . The rst term is the conditional pdf p(^jg; g^; ) and this pdf has the Von-Mises form. It is clear from the conditional pdf that the orientation estimate has signi cantly high variance when the true gradient magnitude is low. It can be seen that as g tends to in nity the precision parameter tends to in nity. As g ! 0, the distribution function approaches the uniform distribution. ^ cos( ^
0
) (
^
2
)2
2
0
^2 + 2 2 2
2
2
4.2.4 Positional Error Analysis of Gradient Based Edge Detectors
In this section we derive the expression for the mean error in the edge pixel location. We consider an ideal edge model that has an one-dimensional intensity pro le of a ramp. Speci cally, the intensity pro le is de ned by:
I (x) = a + Gx
(4.12)
37 for x = ;K ; 1=2; : : :; K ; 1=2 = a ; G(K ; 1)=2 for x < ;K ; 1=2 = a + G(K ; 1)=2 for x > K ; 1=2 We assume that the edge detection is performed by computing the gradient by tting a planar surface to the grayscale values as in [82]. In 1-dimension this problem is equivalent to tting a line to the data for each 1 by K neighborhood. There are two kinds of errors that are introduced in the t, one error is the systematic bias that is introduced in the t due to the approximation of the function I (x) by a linear t in the 1 by K neighborhood and the other error is the error introduced due to the additive noise in the input. Let G(x) be the gradient estimate obtained when the least squares t is performed for the window of ideal data I (i); i = x ; (K ; 1)=2; : : : ; x + (K ; 1)=2. Clearly, G(0), the gradient estimate when the neighborhood overlaps the entire ramp, is equal to the true slope G. Also, G(x); jxj > K is equal to zero. In addition, one can note that G(x) is a symmetric function since I (x) is symmetric. When the discrete samples are corrupted with additive i.i.d Gaussian noise with zero mean and variance , then the estimates for the gradient values, G^ (x), are normal random variables with true mean G(x) and variance = P i where the sum is taken over values of i = ;(K ; 1)=2; : : : ; (K ; 1)=2. Neighboring gradient estimates, G^ (x) and G^ (x + j ), are dependent random variables because of the overlap in the neighborhoods used during the estimation procedure. If we viewed the sequence of 2K ; 1 random variables G^ (x); x = ;(K ; 1); : : : ; K ; ^ then G^ is distributed as a multivariate Gaussian random 1 as a random vector G variable with mean vector G(x) and covariance matrix AA0, where the matrix A is obtained from tting kernel coecients as described in the appendix and is the covariance matrix of the additive noise vector which is assumed to be I. The matrix A captures the dependence between the adjacent gradient estimates. In order to compute the error in the edge pixel position, we assume that the pixel with the maximum gradient magnitude along the gradient direction is labelled as an edge, while all the other pixels are labelled as non-edge pixels. That is, the edge pixel's index is ep when: 2
2
2
2
G^ (ep) > G^ (x) 8 ; (K ; 1) < x < (K ; 1); x 6= ep
(4.13)
38 Hence the probability that the location i is labelled as edge is given by a multivariate integral with appropriate limits speci ed by the gradient threshold used. That is, the probability is given by the expression: (ep = i) = Z Z Prob 1 Z xi : : : (G(x); AA0 )dxidxj x T xj =0
i=
+
(4.14)
j6=i
Z ;T
Z
xi =;1
0
xj =xi
:::
Z
(G(x); AA0)dxidxj
j6=i
where is the multivariate normal distribution function. has two sums in the integral because the threshold T is actually on the absolute value of the gradient. The mean error in the edge pixel location is then given by:
=
KX ;1 i=;(K ;1)
i Prob(ep = i)
(4:15)
Note that the true index of the edge pixel is at 0. The expressions for Prob(ep = i) gives the probability mass function of the estimated edge pixel index i. 4.3 Relationship of Output Perturbations to Input Perturbations (Edge Detection with Hysteresis Linking) A popular edge detector is the Canny edge detector [10]. We provided a brief review of the Canny edge detector in our literature review. The Canny edge detector includes a linking stage (often called hysteresis linking). In the following sub-sections we derive expressions for the probability of false alarm and misdetection in the edge linker output. 4.3.1 Analysis of Edge Operator with Hysteresis Thresholds
In this section we show how the above analysis can be used to derive expressions for the false alarm and misdetection probabilities when the hysteresis linking idea of Canny [10] is used. Canny uses two thresholds:
a high gradient threshold, T , to mark potential edge candidates, and 1
39
a low gradient threshold, T , that assigns edge label to pixels if there exists at 2
least one pixel in the pixel's neighborhood that has gradient magnitude greater than T . 1
More formally:
O(r; c) = 1 if G(r; c) > T or if G(r; c) > T and 9(R; C ) 2 Nr;c 3 G(R; C ) > T : = 0 elsewhere 1
2
1
(4.16)
Let Fg denote the cumulative distribution function for the gradient magnitude. Let W denote the number of pixels in the neighborhood. Then the probability of labelling a pixel as an edge pixel is given by :
P (edge) = 1 ; Fg(T ) + ((Fg(T ) ; Fg (T ))(1 ; fFg(T )gW ; )) 1
1
2
1
1
(4:17)
The term 1 ; Fg(T ) is the probability that the gradient magnitude is greater than T . The rest of the term is the probability that the current pixel being examined has a gradient magnitude between T and T and there exists at least another pixel with gradient value greater than T . Here we assume that the candidate pixels considered in the window have similar orientation estimates. That is, their edge orientation estimates are close to each other. One can relax this assumption and include the eects of the noise on the orientation estimate. The cumulative distribution would then be on two variables, the orientation and gradient magnitude. 1
1
2
1
1
Probability of misdetection
Using equation ( 4.17) we can write the expression for the probability of misdetection (when hysteresis linking is used) as:
Pmisdetection = qh = Fg(T ) + (Fg (T ) ; Fg (T ))Fg (T )W ; 2
1
2
1
1
(4:18)
A glance at the above expression indicates that this probability is going be smaller than the misdetection probability for an edge operator with a single gradient threshold. The probability of misdetection, when hysteresis linking is not used, is given by Fg (T ). Since T is much less than T , Fg (T ) is less than Fg (T ). The second term in the above expression can be at most equal to Fg(T ) ; Fg(T ). Hence qh x; i = 1; : : : M ; 1) (1)
If the Yi 's are independent and identically distributed with cdf Fy , we have:
Fy (x) = Prob(min(Yi; i = 1; : : : ; M ; 1) x) = 1 ; fProb(Yi > x)gM ; = 1 ; f1 ; Fy (x)gM ; (1)
1
1
(4.27)
The cumulative distribution function of the maximum of i.i.d. Yi's (the highest order statistic Y M ; ) is given by: (
1)
Prob(max(Yi ; i = 1; : : :; M ; 1) x) Prob(Y M ; x) Prob(Yi x; i = 1; : : : M ; 1) Fy (x)M ;
Fy m; (x) = = = = (
1)
(
1)
1
(4.28)
When the Yi's are independent, but not identically distributed, then the cdf of the minimum value of Yi 's is given by:
Fy (x) = 1 ; (1)
MY ;1 i=1
(1 ; Fyi (x))
(4:29)
where Fyi is the cdf for Yi . Similarly, the cdf of the maximum of Yi's is given by:
Fy M ; (x) = (
1)
MY ;1 i=1
Fyi (x):
(4:30)
44 Distribution of Yd and Ye
We know that: Ye = ;minfY ; 0g. Hence Ye can be written as: (1)
Ye = ;Y if Y 0 = 0 elsewhere (1)
(1)
(4.31)
The cdf of Ye is given by the following expression when the Yi 's are i.i.d random variables:
FYe (x) = (1 ; Fy (;x))M ; = 1 ; (1 ; Fy (0))M ; 1
1
x>0 x=0
(4.32)
When the Yi's are independent but not identically distributed: the cdf of Ye is given by:
FYe (x) =
MY ;1 i=1
= 1;
(1 ; Fyi (;x))
x>0
MY ;1 i=1
(1 ; Fyi (0))
x=0
(4.33)
Similarly, Yd = maxfY M ; ; 0g. Hence Yd can be written as: (
1)
Yd = Y M ; if Y M ; > 0 = 0 elsewhere (
1)
(
1)
(4.34)
Then the cdf of Yd, for i.i.d Yi 's, is given by:
FYd (x) = Fy (x)M ;
1
x0
(4:35)
When Yi 's are independent, but not identically distributed, FYd is given by:
FYd (x) =
MY ;1 i=1
Fyi (x)
x0
(4:36)
45 4.4.3 Distribution of Edge Strength
The output edge strength is given by O = min(Yd; Ye ). The cdf for O can be easily obtained if Yd and Ye were independent. In order to derive the expression for the cdf of O, we rewrite the expression for O as:
O = max(min(;Y ; Y M ; ); 0): (1)
(
(4.37)
1)
This is done in order to bring out the fact that the distribution of edge strength is dependent on the joint distribution of the min and max of the samples in the detector window. The joint density of the minimum and maximum of the Yi's can be written (for i.i.d Yi 's) as:
f (x ; x ) = (M ; 1)(M ; 2)fy (x )fy (x )[Fy (x ) ; Fy (x )]M ; 1
2
1
2
2
1
x x
3
1
(4:38)
2
The cdf for O (for iid Yi's) is then given by the following expression: Z ;x Z 1 FO (x) = 1 ; f (x ; x )dx dx ;1 x = (1 ; Fy (;x))M ; + (Fy M ; (x) ; (Fy (x) ; Fy (;x))M ; ) x > 0(4.39) 1
2
1
2
1
1
1
But the Yi's are in fact dependent (since each value has a common term Xj (the current pixel value) that is being subtracted from Xi ). To handle the situation, we write the joint density of the minimum, maximum of the original data samples Xi's (i 6= j; i = 1; : : : ; M ), and the random value Xj : (M ; 1)(M ; 2)fx(x )fx(x )[Fx(x ) ; Fx(x )]M ; fx(x ) x x ; ;1 < x < 1 (4.40)
g(x ; x ; x ) = 1
2
3
1
1
2
2
2
3
2
3
3
Note the use of fx and Fx instead of Fy 's here. Integration of the above density function within the limits of x = ;1 to x = x ; z and x = x + z to 1 and x = ;1 to x = 1 gives 1:0 ; FO(z), where FO(z) is the cdf of the edge strength estimate. Thus using the result given (for iid samples) above: Z1 FO (z) = (1 ; Fx(x ; z))M ; + FxM ; (x + z) ; (Fx(x + x) ; Fx(x ; x))M ; dFx(x ) ;1 (4:41) 1
3
1
3
2
3
3
3
1
1
3
3
3
1
3
46 The cdf for O for independent, but not identical Yi 's, is given by the expression ( 4.39) with f (x ; x ) equal to: 1
2
X
f (x ; x ) = 1
2
i;j );i6=j
(
fyi (x )fyj (x ) 1
2
MY ;1 k=1;k6=i;k6=j
Fyk (x ) ; Fyk (x ) 2
1
(4:42)
The expression when the eect of dependence between Yi's is considered gets more complicated in this case. 4.4.4 Probability of False alarm
In this section, we use the results of the previous section(s) to derive the expression for the probability that a non-edge pixel gets labelled as an edge pixel. We assume that the input to the detector is a sequence of i.i.d Gaussian samples with zero mean and variance z . The probability of falsely labelling a noise pixel as an edge pixel is given by: p = 1 ; FO(T ), where T is the edge strength threshold used, where FO (T ) is the cumulative distribution function for the edge strength obtained when the Yi 's are Gaussian with zero mean and variance 2 , where = z =M . 2
2
2
2
4.4.5 Probability of Misdetection
Using the results obtained in previous sections, we can derive the expression for the probability of misdetection of the blur-min edge detector. Let the true gradient value (the slope of the ramp) be Gt. Let the neighborhood size be M pixels. Assuming that the slope spans the entire neighborhood the gray values in the ramp can be written as: I + ((Gt )i); i = 0; : : : ; M ; 1, where I is the gray value in the left most pixel in the window. We assume that the image values are corrupted with additive Gaussian noise with zero mean and variance z . We can see that the Xi's in section ( 4.4.2) are nothing but Gaussian random variables with mean I + (Gti) and variance . The dierences of Xi 's, Yi's, are also Gaussian random variables with means, i , and variance 2 . Here the i 's are speci ed by the sequence: Gt(M ; 1)=2; Gt (M ; 2)=2; : : : ; Gt; ;Gt; : : :; ;Gt(M ; 2)=2; ;Gt (M ; 1)=2. Note that there are only M ; 1 Yi's, because the dierence from the center pixel is zero. Since the Yi's are independent, we can use equations ( 4.29) and ( 4.30) to obtain the cdf's for the minimum and maximum of Yi 's. The derivation for the cdf of the output edge 1
1
2
1
2
2
47 strength is analogous to the derivation in section ( 4.4.2). If FO is the cdf for the output edge strength, the probability of misdetection, when a threshold T is used, is given by: Pmisdetection = FO (T ) (4:43) As was seen in section ( 4.4.2) we need to numerically integrate f (x ; x ), the joint pdf of the minimum and maximum of the random variables, in order to compute the above probability. When the noise standard deviation is less than Gt=4, we can use the following approximation for computing the distribution of the edge strength: 1
Prob(O x) = 1 ; Prob(Z > x; Z > x) 1
2
2
(4:44)
where Z and Z are independent random variables with cdf's corresponding to the CDF's of the erosion and dilation residues. Z and Z are the dilation and erosion residues and the edge strength distribution is given by: 1
2
1
2
Prob(O x) = 1 ; (1 ; Fz (x))(1 ; Fz (x)) 1
2
(4:45)
The above simpli cation is possible because modes of the density functions for the M th sample and the rst sample are well separated and the min and the max of the samples may be considered to be independent of each other. Even in this situation, the dilation and erosion residues are dependent random variables (because the residues have a common variable, the center pixel in the neighborhood). If we want to take this into account, one has to integrate the joint density of the random variates X and XM ; and Xj in a manner similar to the derivation of equation 4.41. When is comparable to Gt then equation ( 4.39) will have to be used and the probability has to be computed by numerical integration. (1)
1
4.5 Experimental Protocol In the previous sections we derived theoretical relationships describing the performance of an edge detector as a function of the input data parameters, perturbation model parameters and tuning parameters. This section describes the experimental protocol employed to verify the theory and to perform empirical comparison of the edge detectors. The objective of the experiments (to evaluate edge detectors) is to plot performance measures such as: edge pixel localization error, probability of false alarm, probability of misdetection as functions of the input signal to noise ratio.
48 4.5.1 Image generation
Synthetic images of size 51 rows by 51 columns were generated with step edges at various orientations passing through the center pixel (R; C ) = (26; 26) in the image. The gray value, I (r; c), at a particular pixel, (r; c), in the synthetic image was obtained by using the function where (r; c) = (r ; R)cos() + (c ; C )sin().
I (r; c) = Imin ; (r; c) < 0 = Imax; otherwise:
(4.46)
Imin and Imax are the gray values in the left and right of the step edge. The variables R and C designate a point in the image on which the step edge boundary lies. In our experiments we set Imin to be 100 and Imax to be 200. We used orientation () values of 0, 15, : : :, 175 degrees. To generate ramp edges, we averaged images containing the step edges with a kernel of size 4 by 4 so that the resulting ramps have 5 pixels width. To these ramp edge images we added additive Gaussian noise to obtain images with various signal to noise ratios. We de ne the signal to noise ratio as: s (4:47) SNR = 20log n where s is the standard deviation of the gray values in the input image and n is the noise standard deviation. We used SNR values of 0, 5, 10, 20 dB. They correspond to s=n values of 1, 1.78, 3.162, and 10 respectively. Groundtruth edge images were generated by using the following function where (r; c) = (r ;R)cos()+(c;C )sin(). I (r; c) = = I (r; c) = = I (r; c) = 1
2
0 (r; c) < ;0:5 1 otherwise: 0 (r; c) < 0:5 1 otherwise: I (r; c) exor I (r; c) 1
(4.48)
2
4.5.2 Edge pixel localization error { Evaluation Procedure
Given binary edge data of the true line segment and the data from the edge detector, before one can compute the edge localization error, one has to establish the correspondence between ground truth pixels and detected edge pixels. This correspondence is
49 P1
G1
W
Figure: Illustrates an ideal line segment and the neighborhood around the segment used for establishing groundtruth to detected edge pixel correspondence. Groundtruth pixel G1 has correspondence to P1 (the closest detected pixel in the direction of maximum intensity change). Detected pixels Ground truth Segment
W - Edge operator width
Figure 4.1: Edge detector Evaluation { Figure illustrates how a detected edge pixels are associated with groundtruth pixels. not necessarily one to one, since we have both false alarms and misdetections. For example, a given ground truth pixel may not have a corresponding pixel in the set of detected pixels due to a miss. On the other hand, a given pixel in the edge detector output may not correspond to any ground truth pixel, because it was a false alarm. In order to compute the edge localization accuracy, we need a convention by which this groundtruth to detected pixel correspondence can be established. The edge accuracy evaluation proceeded as follows. The edge pixel location error E is de ned as the distance along the gradient direction from the true edge pixel to the nearest labelled edge pixel (if one exists, in the edge detector output). A given ground truth edge pixel is assumed to be missing in the detector output if if there are
50 no edge pixels in the detector output within an interval centered on the ground truth edge pixel. The interval is oriented along the gradient direction and the number of pixels in the interval is equal to the edge operator width. We will refer to this interval, as the \valid zone" for each pixel. In addition to the computation of edge pixel location error as given above, we also compute the following statistics from the output image. We visualize the edge and non-edge labellings encountered as one walks along the valid zone as a sequence of alternating 0 and 1 runs. We compute the mean and variances for the lengths of the gaps and the edge segments. In the ideal case when there is no error, the edge segment lengths will have mean value of 1 and a variance of zero, whereas the gap segment lengths will have a mean value equal to the bW=2c, where W is the window operator neighborhood size. At low levels of edge gradient threshold the edge detector responses are thick regions and the edge segment length values may vary from 1 to W . The segment length and gap length statistics capture this aspect. The edge operators employed included the gradient-based operator and the morphological blur-minimum operator. We used 5 by 5 and 3 by 3 neighborhoods for the operators. 4.6 Results We rst illustrate that the theoretical expressions derived in this chapter agree with experimental results. This is done by plotting the empirical distribution function for the edge strength/orientation estimates and comparing it with theory. Figure 4.2(a) gives the cumulative distribution function for the scaled squared gradient estimate (obtained from 10000 trials) and the theoretical distribution. It can be seen that there is complete agreement between theory and experiment. Since the orientation estimate distribution was conditioned on the estimated gradient value, instead of obtaining the empirical conditional distribution we evaluate the rst and second moments of the observed orientation distribution and compare with the expected mean orientation and the expected precision parameter for the VonMises distribution. We know that the expected precision parameter value is given by: E (gg^= ) = (g= )E (^g). Making use of the fact that g^ = is distributed as a non-central chisquare distribution with 2 degrees of freedom, the expected value for g^ was found (See [53], for the derivation of the moments of the non-central chisquare distribution). Figure 4.2(b) gives the 2 1
2
2 1
2 1
51 plot of the expected orientation estimate and the observed empirical estimate along with the standard deviations. Note that the expected orientation estimate and its circular variance were calculated according to the estimation procedure outlined in Mardia [59] (pages 25-27). The precision parameter for the VonMises distribution was estimated by using the maximum likelihood estimation procedure outlined in Mardia [59] (pages 122-123). Figure 4.2(c) gives the plot of the estimated precision parameter and the theoretical expected precision parameter value. It can be seen that the theory agrees with the experiment. We compared the theoretical and experimental cdf's for the edge strength estimate of the morphological operator. We found that the observed cdf matches theoretical predictions expressed in equation 4.41. It was seen that ignoring dependence between the random variables in the order statistic calculation given in equation 4.37 resulted in large departures from the experimental data. Figure 4.2(d) illustrates the comparison of the theoretical and experimental cdf of the edge strength estimate when the input was a i.i.d gaussian noise eld. No blurring of the noise eld was done. The standard deviation was 20 and the mean value 100. The neighborhood size used was 5 by 5. The x axis for the cdf was standardized by the standard deviation value. It can be seen that the theory agrees with the experiment. We also plotted Pmisdetection and Pfalsealarm against gradient or edge strength threshold T , for various noise variances z . We also plot the theoretical operating curve Pmisdetection vs Pfalsealarm for these edge detectors. The theoretical plots were obtained by varying/setting the values for the parameters in the ranges speci ed below: 2
Ideal edge gradient values: 0, 3, 3.5, 4, 8, 10, 100
Noise variance values: 1, 10, 25, 100
Gradient threshold values: 0.1 to 50.0
Edge operator window size: 5 x 5, 3 x 3
Hysteresis threshold value: $T_2 = 0.5\,T_1$
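A hedged sketch of how such a sweep can be computed under the scaled squared-gradient model (central chi-square with 2 degrees of freedom under no edge, noncentral chi-square under a true edge of slope $g$). This simplified model ignores the window-size dependence of the estimate variance, so the numbers are illustrative rather than a reproduction of the plots.

```python
import numpy as np
from scipy.stats import chi2, ncx2

sigma2 = 25.0                          # graylevel noise variance
thresholds = np.linspace(0.1, 50.0, 500)
t2 = thresholds ** 2 / sigma2          # scaled squared threshold
p_fa = chi2.sf(t2, df=2)               # P(label edge | no edge)

for g in [3.0, 4.0, 8.0]:              # ideal edge gradient values
    nc = g ** 2 / sigma2               # noncentrality parameter
    p_md = ncx2.cdf(t2, df=2, nc=nc)   # P(miss | edge of slope g)
    idx = np.searchsorted(p_md, 0.10)  # operating point at 10% misdetection
    print(f"g={g}: P_fa at 10% misdetection ~ {p_fa[idx]:.3f}")
```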
From the theoretical analysis, it is clear that gradient edge detection with hysteresis linking is superior to gradient edge detection without hysteresis linking. As has
been pointed out by Hancock and Kittler [27], the output from the hysteresis linking algorithm may consist of short segments obtained due to correlated noise. For a given threshold $T_1$ the probability of false alarm with hysteresis linking is higher. This is expected, since we are admitting more pixels as edge pixels based on contextual information. However, if $T_1$ is sufficiently large, the false alarm rate obtained with hysteresis linking is comparable to that obtained without hysteresis linking. The plots obtained confirm the above points.
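For concreteness, here is a minimal sketch of one standard way to implement hysteresis linking with $T_2 = 0.5\,T_1$. This is our formulation, not necessarily the exact linking rule used in the experiments: weak responses are retained only when their connected component contains at least one strong response.

```python
import numpy as np
from scipy import ndimage

def hysteresis_edges(strength, t1):
    t2 = 0.5 * t1
    strong = strength > t1
    weak = strength > t2
    labels, n = ndimage.label(weak)          # 4-connected components of weak pixels
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True   # components seeded by a strong pixel
    keep[0] = False                          # background label
    return keep[labels]

# Toy example: the weak pixel (value 6) next to a strong one survives;
# the isolated weak pixel does not.
s = np.array([[12, 6, 0],
              [ 0, 0, 0],
              [ 0, 6, 0]], dtype=float)
print(hysteresis_edges(s, t1=10.0).astype(int))
```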
Figure 4.3(a) gives the theoretical false alarm vs misdetection plot for a conventional gradient-based edge detector. The graylevel noise variance was set at 25, and a 5 by 5 window size was used. The true edge slope was varied from 2 to 5. In a normal image only a fraction of the pixels are true edge pixels, and hence the absolute count of the number of pixels falsely labelled as edge pixels would be quite high. Figure 4.3(b) gives the theoretical false alarm vs misdetection plot for the blur-minimum edge operator. (Note that these plots ignore the dependence between the samples in the order statistic calculation and noise correlation effects. This results in probability of false alarm estimates that are upper bounds of the actual false alarm probability for a range of edge strength threshold values.) The expression of equation 4.41 is a better approximation to use when computing the probability of false alarm, but that expression assumes no noise correlation. The false alarm probability obtained using that expression is a stronger bound, but is still an upper bound of the true probability of false alarm for a range of edge strength values. We validated this empirically. Figure 4.3(c) gives the plots comparing the detectors for an edge slope of 4.0, noise variance of 25 and window size of 5 by 5. When the edge slope is equal to 4, the false alarm rate of the gradient-based edge detector, for a misdetection rate of 10%, is approximately 2 percent. For the same misdetection rate of 10%, the corresponding false alarm rate for the blur-minimum edge detector, when correlation effects (due to blurring) are not considered, is as much as 18%. Since this value was seen empirically to be a loose upper bound for a range of edge strength threshold values, no quantitative statement about the false alarm rate of the blur-minimum edge operator can be made. Figure 4.3(d) shows the comparison of the edge detectors when a window size of 3 by 3 is used. It can be seen that the gradient-based detector performs the worst. An intuitive explanation for this is as follows: the blurmin operator uses min and max to estimate the edge strength, and
as the sample size grows (i.e. the window size is bigger) the estimates for the min and max will tend towards $-\infty$ and $\infty$. Thus the estimated edge strength will be large even though the noise variance is not as high. On the other hand, with the gradient-based scheme a larger window size implies a better fit and lower variance for the estimates. Hence one would expect the false alarm probability to decrease with increasing window size for the gradient-based operator, whereas the probability would increase (if one ignores the effects of correlation) with increasing window size for the morphological operator. In order to verify our theory we generated test ramp images with varying levels of additive noise and plotted the false alarm vs misdetection characteristics. One such plot, obtained when the true edge slope was 4.0 and the input noise variance was 100, is given in figure 4.4(a). Figure 4.4(a) shows that the performance of the blur-minimum operator is the best, followed by the gradient-based edge operator. The performance of the blur-minimum operator when the noise is not correlated is poor, as predicted by theory. From the expressions for the false alarm and misdetection probabilities for the morphological edge detector, it is not easy to infer whether it is superior to the other detectors. Since the theoretical false alarm characteristic only gives an overestimate of the probability of false alarm for the blur-minimum operator, one cannot say anything about its superiority over the gradient-based edge detection scheme. Currently, our theoretical model assumes that the output after blurring is a ramp edge with additive Gaussian noise; the added noise at each pixel is assumed to consist of i.i.d. Gaussian samples. The effects of correlation between neighboring pixels should be addressed in a subsequent paper. Figure 4.5 illustrates the results obtained by applying the edge detectors to the Dr. Einstein image and the brain image. The operator window size was set at 3 by 3 and the threshold varied. One can view the output from both edge detectors as estimates of the edge strength; the fundamental difference between the two detectors is the way in which the edge gradient estimation is done. The morphological method estimates the edge gradient by a non-linear technique, whereas the conventional least squares fitting method uses a linear filter.
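The contrast between the two estimators can be made concrete with a small sketch (illustrative only, not the dissertation's code). The blur-minimum formula below is one common formulation, min(blur − erosion(blur), dilation(blur) − blur), and the gradient estimate is the 3 by 3 least-squares plane fit, which reduces to a linear filter.

```python
import numpy as np
from scipy import ndimage

def ls_gradient_strength(img):
    # 3x3 least-squares plane fit: the fitted gradients are linear filters.
    kx = np.array([[-1.0, 0.0, 1.0]] * 3) / 6.0
    return np.hypot(ndimage.convolve(img, kx), ndimage.convolve(img, kx.T))

def blur_min_strength(img, size=3):
    # One common blur-minimum formulation (assumed here, see lead-in).
    blur = ndimage.uniform_filter(img, size)
    ero = ndimage.grey_erosion(blur, size=(size, size))
    dil = ndimage.grey_dilation(blur, size=(size, size))
    return np.minimum(blur - ero, dil - blur)

profile = 4.0 * np.clip(np.arange(16.0) - 6, 0, 3)   # ideal ramp edge, slope 4
noisy = np.tile(profile, (16, 1)) + np.random.default_rng(1).normal(0, 1, (16, 16))
print(ls_gradient_strength(noisy)[8].round(1))       # one row of each strength map
print(blur_min_strength(noisy)[8].round(1))
```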
In order to evaluate the detection schemes on real images, we fix a particular threshold, $T$, for the blur-minimum edge operator and then vary $T_g$, the threshold for the gradient-based edge operator, until the same degree of false alarm (due to texture in the data) is obtained. It can be seen from figure 4.5 that the blur-minimum edge operator output captures more of the structure in the brain image. Also, the edges obtained by using the blurmin operator are thinner. A thorough evaluation of the performance of the operators on real images can be done if the images are obtained by controlled experiments and if ground truth information is available. We address the issues concerning ground truth and performance evaluation on real data sets in a subsequent chapter. The results obtained from the experiments are given in figures 4.6 through 4.11. The curves were obtained by taking the running mean of adjacent samples. The window size for the running mean operation was 5. The results shown in the plots are the results obtained after 10 replications. Figures 4.6 and 4.9 illustrate how the mean length of the run of edge pixels varies with edge strength threshold for the morphological operator and the gradient-based operator. It is clear from the plots that as the edge strength threshold is increased, the run length drops to a value of 1. When the gradient threshold is high, we label fewer pixels as edge pixels in the output, and hence the runs encountered are of small width. Another point that the plots illustrate is that as the signal to noise ratio increases from -5 to +20 dB, the slope of the curve increases. This effect is due to the fact that the noise smooths the ideal run-length profile. Ideally, we expect the run length to be a linear function of the threshold (since the input consists of linear ramp edges). Figures 4.7 and 4.10 illustrate how the mean gap length varies with the edge strength threshold. As expected, the mean gap length monotonically increases as a function of the edge strength threshold. In the ideal case we expect the mean gap length to be a linear function of the edge strength threshold; in the presence of a large degree of noise this ideal function is blurred. Figures 4.8 and 4.11 illustrate how the mean edge pixel positional error varies with edge strength threshold. It is clear that the error drops to zero when the signal to noise ratio is high. When the signal to noise ratio is 0 or 5 dB, it can be seen that the mean error is as much as 0.5 pixels. A comparison of the plots for the morphological and gradient-based operators indicates that the gradient-based scheme is superior for signal to noise levels of 0 dB and higher. The gradient-based scheme has comparable errors when the signal to noise ratio is -5 dB. We concluded above that the morphological operator had superior false alarm vs misdetection characteristics. The experiments here point out
[Figure 4.2: validation of theory against experiment. Panels: (a) comparison of empirical and theoretical CDF, $F((g/\sigma_o)^2)$ vs $(g/\sigma_o)^2$, for the gradient-based edge operator (5 by 5 window size, input SNR = 1); (b) mean orientation (radians) vs signal to noise ratio $g/\sigma$, experimental and theoretical; (c) precision of the orientation estimate vs SNR; (d) blur-minimum operator, validation of theory: empirical and theoretical CDF of the edge strength (i.i.d. sample case, window size 5 by 5, noise sd = 1.0).]
The gap lengths that remain after gap filling (those of length at least $L$) have probability density function:

$$P(X = x) = \lambda_2 e^{-\lambda_2 (x - L)} \quad \text{if } x \ge L, \qquad = 0 \text{ otherwise.} \tag{5.38}$$
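Equation 5.38 is consistent with the memorylessness of the exponential distribution: gaps that survive filling are distributed as $L$ plus an exponential with the same rate. A quick Monte Carlo check, with illustrative parameter values of our choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
lam2, L = 0.5, 3.0
gaps = rng.exponential(1 / lam2, 200_000)
survivors = gaps[gaps >= L]
# By memorylessness, E[gap | gap >= L] = L + 1/lam2 = 5.0.
print("mean of surviving gaps:", survivors.mean())
```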
Distribution of the edge segment lengths after gap filling
Suppose that exactly $i$ gaps were filled to produce a single segment in the output. Hence there are exactly $i+1$ edge segments and $i$ gap lengths between these segments. The gap lengths were all less than $L$; otherwise they would not have been filled. Let $X_j,\ j = 1, \ldots, i+1$ denote the sequence of random variables for the edge segment lengths in the input and $X_k',\ k = 1, \ldots, i$ denote the sequence of gap lengths in the input. Then the length of a single output segment is given by:

$$\sum_{j=1}^{i+1} X_j + \sum_{k=1}^{i} X_k'. \tag{5.39}$$
But we know that the $X_j$'s are i.i.d. exponential random variables with parameter $\lambda_1$ and the $X_k'$'s are i.i.d. truncated exponential random variables with probability density function:

$$P(X_k' = x) = \frac{\lambda_2 e^{-\lambda_2 x}}{1 - e^{-\lambda_2 L}} \quad \text{if } x < L, \qquad = 0 \text{ otherwise.} \tag{5.40}$$
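Equations 5.39 and 5.40 can be simulated directly. The sketch below (illustrative parameters, not the dissertation's code) samples the truncated exponential by inverting its CDF and forms the output segment length as the sum in equation 5.39:

```python
import numpy as np

rng = np.random.default_rng(3)
lam1, lam2, L, i, n = 1.0, 0.5, 3.0, 2, 100_000

def trunc_exp(lam, L, size):
    # Inverse-CDF sample of an exponential restricted to [0, L), as in (5.40).
    u = rng.uniform(0.0, 1.0, size)
    return -np.log(1.0 - u * (1.0 - np.exp(-lam * L))) / lam

# Output segment length, equation 5.39: (i+1) segments plus i filled gaps.
total = (rng.exponential(1 / lam1, (n, i + 1)).sum(axis=1)
         + trunc_exp(lam2, L, (n, i)).sum(axis=1))
print("mean output segment length:", total.mean())
```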
It can be shown that the probability density function for the sum of $i+1$ exponentially distributed random variables (the $X_j$'s), with parameter $\lambda_1$, is given by:

$$P(Y = y) = \frac{(\lambda_1 y)^i}{i!}\, \lambda_1 e^{-\lambda_1 y}, \qquad 0 \le y.$$
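The density above is the Erlang (gamma) density with shape $i+1$ and rate $\lambda_1$. A quick check against scipy's gamma density, with illustrative parameters:

```python
import math
import numpy as np
from scipy.stats import gamma

lam1, i = 1.0, 2
y = np.linspace(0.5, 8.0, 4)

# The density above, evaluated directly ...
direct = lam1 * (lam1 * y) ** i * np.exp(-lam1 * y) / math.factorial(i)
# ... equals the gamma (Erlang) density with shape i+1 and scale 1/lam1.
print(np.allclose(direct, gamma.pdf(y, a=i + 1, scale=1 / lam1)))  # True
```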