Morphological Scale-Space With Application to Three-Dimensional Object Recognition

Paul Thomas Jackway

B.E. (Electronics Engineering), Royal Melbourne Institute of Technology (RMIT)
G.Dip. (Applied Statistics), RMIT
M.Ap.Sc. (Mathematical Modelling & Data Analysis), RMIT

Signal Processing Research Centre, School of Electrical & Electronic Systems Engineering, Queensland University of Technology, G.P.O. Box 2434, Brisbane, Queensland 4001, Australia.

Submitted as a requirement for the degree of Doctor of Philosophy (Ph.D.), Queensland University of Technology, 1994.

Key words             

Scale-space, mathematical morphology, computer vision, object recognition, pattern recognition, image analysis, image processing, multiscale, multiresolution, multidimensional signal processing, range image, face recognition, digital elevation map.


Abstract

This thesis develops and demonstrates an original approach to scale-space theory. A new scale-space theory based on a unified multiscale morphological dilation-erosion smoothing operator is presented. The essential scale-space causality property for local extrema of a signal under this operation is proved. This result holds for signals on $\mathbb{R}^2$ and higher dimensions and for negative as well as positive scales. When applied to grayscale images we show that structuring functions from the "elliptic poweroids" lead to favourable dimensionality and semi-group properties. Paraboloids, in particular, allow efficient computation of the scale-space, and such an algorithm is presented. The generalised frequency response of this signal smoother, which is similar to that of a Butterworth filter (with an amplitude dependent corner frequency), is obtained. The filter is statistically characterised by obtaining second-order statistical properties of the output signal with independent and identically distributed uniform noise input. Similar scale-space results are obtained for the multiscale morphological closing-opening operator, and we show that the resulting scale-space fingerprints are identical to those of the dilation-erosion.

To demonstrate the utility of the new theory, we present an approach for the recognition of multiple 3-D objects in range data via the local matching of surfaces. In this approach the reduced morphological scale-space fingerprint is used as the primitive for matching. The resulting recognition process is invariant to translation, rotation, limited scaling, and partial occlusion. Results of the proposed object recognition method, showing the recognition of a scene containing nine faces at various positions, angles and scales, are presented. In a second demonstration we show the recognition of eight mountains in a digital elevation map.


Contents

Key words
Abstract
Abbreviations and symbols
Publications
Authorship
Acknowledgements
Outline

1 Introduction
  1.1 The Field of Computer Vision
  1.2 Scale and Signal Processing
    1.2.1 Example: image edge detection
  1.3 Scale-Space Filtering
  1.4 The Research Tasks
  1.5 Contributions of the Thesis
  1.6 Summary

2 A Review of Scale-Space Filtering, Mathematical Morphology and Object Recognition
  2.1 Introduction
  2.2 Scale-Space Filtering
    2.2.1 Scale-Space Filtering
    2.2.2 Multiresolution Image Processing
    2.2.3 Wavelet Theory
    2.2.4 Summary and Conclusions
  2.3 Mathematical Morphology
    2.3.1 Dilation and Erosion
    2.3.2 Morphological Filtering
    2.3.3 Morphology with Scaled Structuring Functions
    2.3.4 Summary and Conclusions
  2.4 Object Recognition in Range Images
    2.4.1 Introduction
    2.4.2 Previous Work in Object Recognition
    2.4.3 Conclusion
  2.5 Summary

3 Scale-Space Properties of the Multiscale Dilation-Erosion
  3.1 Introduction
  3.2 Multiscale Morphology
    3.2.1 Foundation and Notation
    3.2.2 Scale Dependent Morphology
    3.2.3 Multiscale Dilation-Erosion Scale-Space
  3.3 Properties of Multiscale Dilation-Erosion
    3.3.1 Properties of the Filter Support
    3.3.2 Continuity and Order Properties of the Scale-Space Image
    3.3.3 Signal Extrema in Scale-Space
  3.4 Summary and Conclusion

4 Structuring Functions for Multiscale Dilation-Erosion
  4.1 Introduction
  4.2 Semi-Group Properties
  4.3 A More General Umbra
  4.4 Dimensionality
  4.5 Dimensionality Properties in Scale-Space
  4.6 Second Derivative Properties
  4.7 The Paraboloid Structuring Function
  4.8 Computation
    4.8.1 A naive approach
    4.8.2 An improved algorithm
    4.8.3 Further improvements
    4.8.4 The support region
    4.8.5 Illustrative run times of the algorithms
  4.9 Summary

5 Spectral and Statistical Properties of the Scaled Dilation-Erosion
  5.1 Introduction
  5.2 Generalised Frequency Response
    5.2.1 Definitions and Notations
    5.2.2 The Output Waveform
    5.2.3 Asymptotic Results
  5.3 Statistical Properties
    5.3.1 General Results for Noise Input
    5.3.2 Results for uniform noise and powerbolic structuring function weights
  5.4 Summary

6 A Multiscale Closing-Opening Scale-Space
  6.1 Introduction
  6.2 Background
  6.3 Scale-Space Properties of the Multiscale Closing-Opening
    6.3.1 General Properties of the Multiscale Closing-Opening
    6.3.2 Causality Theorems for the Multiscale Closing-Opening
  6.4 Fingerprints in Morphological Scale-Space
    6.4.1 Equivalency of Fingerprints
    6.4.2 The Reduced Fingerprint as a List
  6.5 Conclusion

7 Object Recognition Using Morphological Scale-Space Fingerprints
  7.1 Introduction
  7.2 Fingerprint Extraction
  7.3 Methodology
    7.3.1 Introduction
    7.3.2 Feature extraction and matching
    7.3.3 Data Structures for the Method
  7.4 Face Recognition in Range Images
    7.4.1 Introduction
    7.4.2 Methodology
    7.4.3 Results
  7.5 Digital Elevation Map Analysis
    7.5.1 Introduction
    7.5.2 Methodology
    7.5.3 The effect of noise
    7.5.4 Results
  7.6 Summary and Conclusions

8 Summary and Conclusions
  8.1 Summary of the Thesis
  8.2 Thesis Contributions
  8.3 Limitations of the Approach
    8.3.1 Continuous Theory - Discrete Implementation
    8.3.2 Noise Tolerance
    8.3.3 Scale and Human Perception
    8.3.4 Comparisons with Other Methods
  8.4 Recommendations for Future Research

A Transformation Estimation by Least Squares
B Circle of Best Fit by Least Squares
Bibliography

List of Figures

2.1 A Gaussian scale-space fingerprint: the plot of zero-crossings of $F(x, \sigma)$.
3.1 A chord from the origin to any point on the structuring function should lie on, or below, the structuring function (shown in 2-D).
3.2 Typical 2-D circular structuring functions: (a) flat (i.e. cylinder); (b) sphere; (c) poweroid with $\alpha = 2$ (i.e. paraboloid); (d) poweroid with $\alpha = 4$ (i.e. quartoid).
3.3 Smoothing of a 1-D signal by Multiscale Morphological Dilation-Erosion.
3.4 A multiscale dilation-erosion fingerprint: the plot of the extrema of $F(x, \sigma)$ for a random signal.
4.1 Shape depends on X, Y scaling.
4.2 Various structuring functions of the 2-D circular poweroid family, $g(x, y) = -(x^2 + y^2)^{\alpha/2}$: (a) cone ($\alpha = 1$); (b) paraboloid ($\alpha = 2$); (c) quartoid ($\alpha = 4$); (d) cylinder ($\alpha = \infty$).
4.3 A multiscale dilation-erosion fingerprint: the plot of the extrema of the scale-space image.
4.4 An example of dimensionality in scale-space. (a) A random signal. (b) This signal with an affinity of size 4.0. (c) & (d) The multiscale dilation-erosion fingerprints of the above signals with a non-dimensional (spherical) structuring function. Note the connectivity and structure of the fingerprints differ because of the affinity: (c) has 19 closed loops and (d) 20 closed loops (the difference is in the 8th loop from the right), indicating the break-down of dimensionality. (e) & (f) The multiscale dilation-erosion fingerprints of the above signals with a dimensional (parabolic) structuring function. Note the structure of the fingerprints remains similar (with 19 closed loops), indicating the conservation of dimensionality.
4.5 Various structuring functions of the powerbolic family $g(x) = |x|^{\alpha}$.
4.6 The naive code.
4.7 Improved algorithm (version 1).
4.8 Improved algorithm (version 2).
4.9 Improved algorithm (version 3).
5.1 Output waveform of the scaled dilation $f_\sigma(t) = (f \circledast g_\sigma)(t)$. Input signal $f(t) = \cos t$.
5.2 Generalised frequency response of the scaled dilation-erosion.
5.3 Generalised frequency response of the scaled dilation-erosion for $\alpha = 2$ and $\alpha = 4$ when preceded by a differentiator.
5.4 Mean and variance for output versus scale, for $U[0, 1]$ noise input and parabolic and quartic structuring function weights.
5.5 Autocorrelation function for output noise, $\alpha = 2$.
5.6 Autocorrelation function for output noise, $\alpha = 4$.
5.7 Autocorrelation function for output noise, $\alpha = \infty$.
6.1 A counter-example to the proof of the theorem in Chen & Yan (1989). $f$ has 3 irregular points; $f$ […] $g_1$ has 2 irregular points; while $f$ […] $g_1$ has 1 irregular point.
6.2 A counter-example to the theorem of Jang & Chin (1991). The opening of $f(x) = (|x| \bmod 2 - 1)^2$, $x \in \mathbb{R}$, by structuring function $g(x) = -x^2$, and the second derivatives of these functions.
6.3 Geometrical interpretation of the opening and closing with partitioning. (a) parabolic structuring function $g(x)$; (b) signal $f(x)$; (c) opening $f \circ g$; (d) closing $f \bullet g$.
6.4 A comparison of fingerprints: (a) zero-crossings of second derivative in Gaussian scale-space; (b) zero-crossings of second derivative in multiscale closing-opening scale-space; (c) local extrema in multiscale dilation-erosion scale-space; (d) local extrema in multiscale closing-opening scale-space.
6.5 The reduced morphological scale-space fingerprint.
7.1 Finding the scale of a local maximum on a signal.
7.2 C code to extract the reduced morphological scale-space fingerprint from local maxima of a 2-D function.
7.3 Models for matching with scene.
7.4 Range scene with nine faces for recognition.
7.5 DEM scene with eight regions.
7.6 Models for matching with DEM scene.

List of Tables

1.1 Levels of analysis in computer vision.
1.2 Citations of Witkin's original paper (Witkin 1983).
4.1 Running times for the algorithms.
4.2 Running times for the algorithms (seconds). IBM PC/AT 80286/7 16MHz. Picture of "Lena" 256 × 256 × 8 bits. Language: Topspeed Modula-2 V3.01.
7.1 Scene description for the face recognition problem.
7.2 Recovered scene description and true scene description parameters from the face recognition experiment.
7.3 Feature numbers in the fingerprint files.
7.4 Approximate times taken for the stages of the algorithm on a Silicon Graphics Personal Iris 4D/35 computer.
7.5 Mountain recognition: exact transformation parameters between the database objects and the test scene.
7.6 Mountain recognition: recovered scene parameters, and true scene parameters from the recognition process.
7.7 Approximate times taken for the stages of the algorithm on a Silicon Graphics Personal Iris 4D/35 computer.

Abbreviations and symbols

Abbreviations and acronyms

MM          mathematical morphology,
MMDE        multiscale morphological dilation-erosion,
MMCO        multiscale morphological closing-opening,
SSF         scale-space filtering,
D.C.        direct current,
DEM         digital elevation map,
RMIT        Royal Melbourne Institute of Technology,
B.E.        Bachelor of Engineering,
G.Dip.      Graduate Diploma,
M.Ap.Sc.    Master of Applied Science,
Qld.        Queensland,
cdf         cumulative distribution function,
pdf         probability distribution function,
sf          structuring function,
i.i.d.      independent and identically distributed (random variables).

Symbols

$A, B$              point sets in Euclidean space,
$a, b, v$           points in Euclidean space (Hadwiger's notation),
$a, b, z$           points in Euclidean space,
$\lfloor x \rfloor$ the largest integer less than the number $x$,
$\{a : […]\}$       the set of elements $a$ such that […],
$F(\omega)$         the frequency response,
[…]                 the autocovariance function,
$\sigma$            scale parameter,
$\nabla^2$          the Laplacian operator, $\nabla^2 f(x, y) = \partial^2 f/\partial x^2 + \partial^2 f/\partial y^2$,
[…]                 a transformation,
[…]                 the symmetrical set of $B$ with respect to the origin, $\{b : -b \in B\}$,
$A_x$               the set $A$ translated by $x$, $A_x = \{a + x : a \in A\}$,
$A^c$               the complement of set $A$, $A^c = \{a : a \notin A\}$,
[…]                 the reflection of set $A$, $\{a : -a \in A\}$,
$\mathbb{Z}$        the set of integers, $\{-\infty, \ldots, -1, 0, 1, \ldots, \infty\}$,
$\mathbb{R}$        the real numbers,
$\mathbb{R}^k$      k-dimensional Euclidean space,
$\mathbb{Z}^k$      k-dimensional discrete space,
$L$                 partially ordered set (poset),
$K$                 subset of a poset,
[…]                 operators on posets,
$D, G$              subsets of Euclidean space; domain of a function,
$S$                 structuring function support,
$M$                 bound on a function,
$\oplus$            morphological dilation; Minkowski set addition,
$\ominus$           morphological erosion; Minkowski set subtraction,
$\circ$             morphological opening,
$\bullet$           morphological closing,
$\circledast$       multiscale morphological dilation-erosion,
[…]                 multiscale morphological closing-opening,
$f$                 function; image; signal,
$g$                 structuring function,
$g_\sigma$          scaled structuring function,
[…]                 reflected structuring function, $x \mapsto g(-x)$,
$\forall$           for all,
$\exists$           there exists,
[…]                 end of proof mark; end of definition mark.

Publications

These are the publications which have been produced by, or in conjunction with, the author, during his Ph.D. candidacy:

Refereed Journal Papers

1. Jackway, P. T. (1993a), 'Multiscale image processing: A review and some recent developments', Journal of Electrical and Electronics Engineering, Australia 13(2), 88-98.
2. Jackway, P. T. (1994a), 'On dimensionality in multiscale morphological scale-space with elliptic poweroid structuring functions', Journal of Visual Communication and Image Representation, to appear.
3. Jackway, P. T. (1994b), 'Properties of multiscale morphological smoothing by poweroids', Pattern Recognition Letters, 15(2), 135-140.
4. Jackway, P. T., & Deriche, M. (1994), 'Scale-Space properties of the multiscale morphological dilation-erosion', IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear.

Refereed Conference Papers

5. Forbes, K., Jackway, P. T., & Anh, V. V. (1991), Automatic counting of nuclear tracks using a PC, in 'Proceedings of Statcomp - Biostats 91', Coolangatta, Queensland, pp. 278-281.
6. Jackway, P. T. (1992a), Morphological scale-space, in 'Proceedings of the 11th IAPR International Conference on Pattern Recognition', The Hague, The Netherlands, pp. C252-255.
7. Jackway, P. T. (1992b), Scale space properties of the multiscale morphological closing-opening filter, in 'Proceedings of the 2nd Singapore International Conference on Image Processing (ICIP '92)', Singapore, pp. 278-281.
8. Jackway, P. T., Boles, W. W., & Deriche, M. (1993a), 3-D object recognition using morphological scale-space fingerprints, in 'Proceedings of IEEE Workshop on Visual Signal Processing and Communications', Melbourne, pp. 291-294.
9. Jackway, P. T., Boles, W. W., & Deriche, M. (1993b), Morphological scale-space fingerprints and their use in 3-D object recognition, in 'Proceedings of the 2nd Conference on Digital Image Computing: Techniques and Applications, (DICTA-93)', Sydney, pp. 382-389.
10. Jackway, P. T., Deriche, M., & Boles, W. W. (1993), Object recognition in range images using the morphological scale-space fingerprint, in 'Proceedings of the SPRC Workshop on Signal Processing and its Applications (WoSPA 93)', Brisbane, pp. 89-95.
11. Jackway, P. T., Boles, W. W., & Deriche, M. (1994), Morphological scale-space fingerprints and their use in object recognition in range images, in 'Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, (ICASSP'94)', Adelaide, pp. V5-8.

Internal Technical Reports

12. Jackway, P. T. (1991a), Parametric morphological smoothing: Statistical and spectral properties, Research Working Paper 91/S5, School of Mathematics, Queensland University of Technology, Brisbane, Australia.
13. Jackway, P. T. (1991b), Scale-space properties of a parametric morphological smoother, Research Working Paper 91/S2, School of Mathematics, Queensland University of Technology, Brisbane, Australia.

Authorship

The work contained in this thesis has not been previously submitted for a degree or diploma at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Signed: ........................................ Date: ....................

Acknowledgements

My doctoral programme was started in the School of Mathematics and concluded in the School of Electrical and Electronic Systems Engineering at Queensland University of Technology. From my time in the School of Mathematics I wish to thank my roommate Keith Forbes, the head of school Professor Tony Pettitt, and especially my supervisor Vo Anh, who convinced me to do a PhD at QUT, and associate supervisor H.T. Tsui, who first explained to me the wonders of scale-space.

From my time at engineering I acknowledge the support and infrastructure provided by Professors Miles Moody and Boualem Boashash, the comradeship and academic input of the other Image Laboratory students, Richard Buse, Peter Holmes, Quang Tieng, Mohammed Bennamoun and Daniel Bell, and most importantly my supervisors, Doctors Wageeh Boles and Mohamed Deriche, who in turn provided encouragement and guidance throughout the course of this project.

During the time of my PhD candidature I am grateful for the support of the QUT Doctoral Students Association, and the relaxation provided by beaches at Surfers Paradise, Noosa, and Byron Bay. Above all, the love and support of my partner in doctoral studies, and life, my wife Rosalie Boyce, made this Queensland adventure an absolute pleasure.

This thesis has been typeset by the author using the LaTeX [1] package. The text font used is 12pt Computer Modern Roman [2]; other fonts from the Computer Modern Roman family are used where required, with some special symbols from the AMS-TeX [3] macro package.

[1] LaTeX (Lamport 1986) is a "descriptive markup" document processing system based on the TeX (Knuth 1986) typesetting system. TeX and LaTeX are publicly available via FTP on the network at address: [email protected].
[2] The Computer Modern (CM) family of fonts was designed by Donald Knuth especially for the TeX system.
[3] AMS-TeX (AMS 1991) is a macro package for TeX which defines a wide range of specialist mathematical symbols.

Outline

The first chapter will introduce the field of computer vision and the topic of scale-space filtering, and will conclude with a statement of the research task to be followed in this project.

The second chapter will review the literature on scale-space filtering and related areas, and state the research question of the thesis. Necessary background reviews of other topics of direct relevance to the thesis will then be presented.

Chapter 3 will contain the first major theoretical results and contribution. After establishing the foundation and notation for the chapter we will introduce a scale-dependent morphology via a scaled structuring function. We will define two important new multiscale operations and formally develop properties pertinent to their use in a new scale-space theory. We will demonstrate that this scale-space satisfies standard scale-space axioms.

Chapter 4 will present a treatment of several topics which relate to the selection of a suitable morphological structuring function for the new multiscale operator. The use of the paraboloid structuring function in particular gains several advantages, which are outlined in this chapter along with an efficient algorithm for implementation of the new smoother.

Having established a new operation and proposed its use as a signal smoother, Chapter 5 contains the results of spectral and statistical analyses of this smoother.

Chapter 6 will critically review published work on the multiscale opening operation. A scale-space causality theorem for the multiscale closing-opening is obtained. This result augments the previous results for the dilation-erosion. The scale-space fingerprints from the two approaches are then compared.

A practical use for the above theory is then demonstrated in Chapter 7, where we will outline in detail the methodology for applying this reduced fingerprint to the recognition of objects in range images by the matching of surface features. This methodology is then applied to several interesting pattern recognition problems.

Finally, in the last chapter, we summarise the preceding work, outline the thesis contributions, and make recommendations for future work.

Chapter 1 Introduction

1.1 The Field of Computer Vision

Computer vision and image analysis is a very broad and rapidly expanding area of study; a bibliography of selected major U.S. or international journals and conference proceedings for the year 1991 alone runs to nearly 1200 citations (Rosenfeld 1992), and by 1992 this had increased to nearly 1900 citations per year (Rosenfeld 1993). Levine (1985) describes computer vision: "Computer vision largely deals with the analysis of pictures in order to achieve results similar to those obtained by man." Levine (1985) further divides the field into the following levels of analysis:

LEVEL     DESCRIPTION
M+3       3-D scene interpretation
M+2       3-D scene description
M+1       2-D image description
6 to M    Higher level aggregation and model matching
5         Discovery of structural relationships
4         Feature classification
3         Image segmentation and feature detection
2         Preprocessing and restoration
1         Sensor representation
0         Scene

Table 1.1: Levels of analysis in computer vision.

Levels 0 to 3 are known as picture or image processing and as low-level vision, and the higher levels as picture or image interpretation and high-level vision. Level 4 is also known as pattern recognition. Inasmuch as the low-level vision steps deal with digital signals, the techniques used here are clearly also part of the field of digital signal processing. Of course, with any signal processing we are really concerned with the processing of information, so we need different processing and analysis techniques depending on how information is carried by the signal. This is why image processing has developed its own set of techniques apart from those used more widely in signal processing.

For example, suppose we have two signals expressed as functions of time, $f_1(t)$ and $f_2(t)$, where $f_1(t)$ is, say, an underwater transient signal, and $f_2(t)$ is a raster line from an image. An appropriate analysis procedure for $f_1(t)$ may be to expand this signal into the time-frequency plane, $f_1(t) \to F_1(t, \omega)$; this may be done, for example, by using the Wigner-Ville distribution (Boashash & O'Shea 1989).


In contrast, to analyse $f_2(t)$ we may choose to expand this signal into scale-space, $f_2(t) \to F_2(t, \sigma)$, by convolution with a scaled Gaussian kernel (Witkin 1983). The point is that we must match the analysis technique to the way in which information is being carried in the signal (Baraniuk 1994). Importantly, in the analysis of images (Koenderink 1984) we would often be dealing with multidimensional signals, for example functions in 2-D, $f(x, y)$, rather than 1-D functions as used in the previous example.

The comprehensive survey of computer vision and image analysis previously cited (Rosenfeld 1993) provides a useful guide to the various divisions of the field. The survey is arranged as:

- General References
- Related Topics
- Applications
- Architectures
- Computational techniques
- Feature detection and segmentation; image analysis
- Two-dimensional shape; pattern
- Colour and texture
- Matching and stereo
- Three-dimensional recovery and analysis
- Three-dimensional shape
- Motion

As an indication, the major results of chapter 3 of this thesis, which were presented at an international conference (Jackway 1992a), have been classified under the section "computational techniques" and sub-section "image operations: scale space". The author agrees with this categorisation but would argue that scale-space is more of a theory than a computational technique, and that although the motivation for multiscale techniques comes primarily from image processing applications, scale-space theory is general and, as we will see, can be applied to 1-D signals and higher dimensional signals.

This thesis is directly concerned with scale-space and in fact introduces a new scale-space theory. We will introduce scale-space shortly; however, we should first discuss what "scale" is and why it is important in signal (image) processing.
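The expansion of a signal into Gaussian scale-space mentioned above can be sketched numerically. The following Python fragment is a minimal illustration and not code from the thesis: it assumes a sampled 1-D signal, a truncated Gaussian kernel renormalised to unit sum, and replicated end samples at the boundaries. The names `gaussian_kernel` and `gaussian_scale_space`, and the test signal, are hypothetical choices made for this sketch.

```python
import numpy as np

def gaussian_kernel(sigma, truncate=4.0):
    """Sampled Gaussian with standard deviation `sigma` (in samples), normalised to sum to 1."""
    radius = max(1, int(np.ceil(truncate * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def gaussian_scale_space(f, sigmas):
    """Stack smoothed copies of the 1-D signal `f`, one row per scale in `sigmas`."""
    f = np.asarray(f, dtype=float)
    rows = []
    for sigma in sigmas:
        kernel = gaussian_kernel(sigma)
        radius = len(kernel) // 2
        # Replicate the end samples so the output has the same length as the input
        # (a boundary-handling assumption, not something the text prescribes).
        padded = np.pad(f, radius, mode="edge")
        rows.append(np.convolve(padded, kernel, mode="valid"))
    return np.vstack(rows)

# Illustrative "raster line": a slow oscillation plus fine detail.
t = np.linspace(0.0, 1.0, 256)
f2 = np.sin(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 40 * t)
F2 = gaussian_scale_space(f2, sigmas=[1.0, 4.0, 16.0])  # F2[i] is f2 at the i-th scale
```

Each row of `F2` is the signal viewed at one scale; as the scale grows, the fine detail is progressively suppressed while the slow oscillation survives longest.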


1.2 Scale and Signal Processing

Many fundamental problems in signal and image processing and analysis rely on a smoothing or regularisation step in their solutions. Examples include: the determination of the spectrum of a stationary signal by smoothing the periodogram (Brockwell & Davis 1987); the "early vision" problems of structure-from-X (where X = stereo, motion, shading, or texture), which all need regularisation (Bertero, Poggio & Torre 1988); and image edge detection by zero crossings of the Laplacian of Gaussian (LOG) operator, where the Gaussian filter acts as a smoothing kernel (Torre & Poggio 1986). Informally we may say that the outputs of such procedures are "too noisy" unless either the inputs or outputs are smoothed in some way. Mathematically we speak of such problems as being "ill-posed", a notion attributed to Hadamard (1923), meaning that the solution: a) may not exist, or b) may not be unique, or c) may show large changes in output for arbitrarily small changes in input. In addition, the vision-related problems above can also be regarded as "inverse problems" in that we are trying to recover properties $z$ of objects (for example: shape, edges) from images $y$ of these objects; that is, we have the imaging relation

$$y = Az \tag{1.1}$$

which relates object properties to image properties via some system operator, $A$. The straightforward solution is to use the inverse, $z = A^{-1}y$, but in practice such problems are often ill-posed. In general, inverse problems are characterised by having incomplete information: we need to impose further conditions related to physical constraints or a priori knowledge to enable a solution to be found. Ill-posed inverse problems are an extremely important class of mathematical problems, and various general methods such as "maximum entropy" (Jaynes 1984) and "regularisation" (Tikhonov & Arsenin 1977) have been developed. A statistical perspective on these problems has recently been published (O'Sullivan 1986). Poggio, Torre, Bertero and Koch (Bertero et al. 1988, Torre & Poggio 1986, Poggio, Torre & Koch 1985) first realised that many of the problems and proposed solutions developed independently in the fields of computer vision and image analysis were in fact specific applications of the above more general methods from mathematics.

We are interested here in the fact that these methods all depend on the value of a single parameter that determines the amount of smoothing. Large values of this parameter produce large amounts of smoothing in the signal, which removes "small-scale" features from the signal ("scale" is used here in the intuitive sense); therefore we can refer to this parameter as a scale parameter. For example, in regularisation we have to find the $\hat{z}$ that minimises a weighted combination of two quantities (Tikhonov & Arsenin 1977):

jAz^ ? yj2 + jP z^ j2;

 > 0:

(1.2)

The first term expresses the requirement that the proposed model, $Az$, should be close (in the least squares sense) to the data $y$, and the second term, $Pz$, is chosen so that implausible or irregular (or unsmooth!) behaviour of the solution is penalised. In the maximum entropy methods (Jaynes 1968) the second term is chosen on information theoretic grounds to be the (neg)entropy functional $-\int z(t) \log z(t)\,dt$. The weighting or smoothing parameter $\lambda$, which is of central interest here, must be selected by the user. If statistical properties of the data are known or assumed, it may be possible to estimate $\lambda$ as the value which minimises some quantity such as the predictive mean square error (Bertero et al. 1988, O'Sullivan 1986). Otherwise $\lambda$, which is certainly data dependent, is chosen either a priori or adaptively to give satisfactory results.

As a second example consider the problem of detecting edges in images. Intensity edges occur in an image where the gradient is a local maximum; therefore, to perform edge detection we use derivative operators of various orders (Torre & Poggio 1986). It is well known that derivatives "amplify" noise; equivalently, in the frequency domain, we see that derivative operators are high-pass filters. Mathematically speaking, if $z$ is the derivative of $y$ then from calculus we have

$$ y(t) = \int_{-\infty}^{t} z(\tau)\,d\tau = Az \tag{1.3} $$

which is of the form of equation (1.1). More precisely, it is an integral equation of the "first kind" and notoriously ill-posed (Torre & Poggio 1986, Tikhonov & Arsenin 1977, Poggio, Voorhees & Yuille 1988). These three ways of looking at differentiation lead to the same conclusion: smoothing (low-pass filtering or regularisation) is needed. Therefore, all practical edge detectors have associated with them a scale parameter, say $\sigma$, which governs the degree of smoothing. Usually, this parameter is chosen to suit the image (or part of the image) involved and to give satisfactory results (Torre & Poggio 1986, Poggio et al. 1988, Canny 1986, Nalwa & Binford 1986).

We wish to stress that a smoothing procedure is an essential element of many data and signal processing operations. This smoothing depends upon a scale parameter which must be selected to give optimum results. Implicit in the preceding sentence is an important, but often overlooked, assumption: that a single value of the smoothing parameter is appropriate. This restriction certainly makes the mathematical formulation of the problem easier, but may be overly narrow from a wider perspective. Why not allow for the processing of the signal at more than one scale?

Although Rosenfeld and Thurston published early papers on image analysis using multi-scale operators (Rosenfeld & Thurston 1971, Rosenfeld, Thurston & Lee 1972), the first use of multi-resolution images as a whole probably belongs to Kelly (1971). The pioneering work along this direction was brought to computer vision by David Marr and colleagues (Marr 1976, Marr & Hildreth 1980, Marr & Poggio 1979, Marr, Ullman & Poggio 1979, Marr 1982), who appreciated that the multi-scale analysis of images offers many benefits. Earlier work on human vision had suggested that visual information may be processed in parallel by a number of spatial-frequency-tuned channels (Campbell & Robson 1968). It seemed worthwhile to apply this same approach to computer vision problems. To be more specific, let us look in detail at multi-scale techniques as applied to the detection of intensity edges in images.
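The role of the smoothing parameter in equation (1.2) can be made concrete with a minimal sketch. It rests on two assumptions that are ours and not part of the text: the operator $A$ is taken to be the identity and $P$ is taken to be the first-difference operator on a sampled signal. The function names `tikhonov_smooth` and `roughness` are hypothetical. Under these assumptions the minimiser satisfies a tridiagonal linear system, which is solved below with the Thomas algorithm.

```python
import math
import random


def tikhonov_smooth(y, lam):
    """Minimise sum((z - y)^2) + lam * sum((z[i+1] - z[i])^2) over z.

    Assumed setting (not from the text): A is the identity and P is the
    first-difference operator D. Setting the gradient to zero gives the
    tridiagonal system (I + lam * D^T D) z = y.
    """
    n = len(y)
    if n < 2 or lam == 0:
        return [float(v) for v in y]
    # D^T D has diagonal [1, 2, ..., 2, 1] and off-diagonal entries -1.
    diag = [1.0 + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -lam
    c = [0.0] * n  # modified super-diagonal (Thomas algorithm)
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (y[i] - off * d[i - 1]) / denom
    # Back substitution.
    z = [0.0] * n
    z[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        z[i] = d[i] - c[i] * z[i + 1]
    return z


def roughness(z):
    """Sum of squared first differences, i.e. the penalty term |Pz|^2."""
    return sum((b - a) ** 2 for a, b in zip(z, z[1:]))


if __name__ == "__main__":
    random.seed(0)
    noisy = [math.sin(0.1 * i) + random.uniform(-0.2, 0.2) for i in range(100)]
    for lam in (0.0, 0.5, 10.0):
        print(lam, roughness(tikhonov_smooth(noisy, lam)))
```

Larger values of `lam` give a smoother solution at the cost of fidelity to the data, which is the sense in which the regularisation weight behaves as a scale parameter.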

1.2.1 Example: image edge detection

Intensity edges are an extremely important feature of images and convey much information, as illustrated by the human ability to interpret cartoons. Edges are a key feature of the "raw primal sketch" of Marr (1976), which is the first image description step in Marr's theory of vision (Marr 1982). Edge detection is also complementary to image segmentation since edges can be used to break up images into different regions (Horn 1986). As already stated, the detection of intensity edges in images involves the use of derivative operators. It may be advantageous if the derivative operator used is rotationally invariant (isotropic). The Laplacian operator,

$$ \nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \tag{1.4} $$

is the lowest-order linear combination of partial derivatives that is rotationally invariant (Horn 1986). The Laplacian is therefore widely used in edge detectors (Torre & Poggio 1986). Since this is a second-derivative based operator, an edge is indicated by a zero-crossing of the output. For the smoothing part of the procedure, the Gaussian filter,

$$ G(x,y) = (2\pi\sigma^2)^{-1} \exp\left(-\frac{x^2+y^2}{2\sigma^2}\right), \tag{1.5} $$

is widely used. The parameter $\sigma$ controls the degree of smoothing. Marr & Hildreth (1980) have shown that the Gaussian filter optimises the conflicting requirements of both spatial localisation and frequency domain localisation, which are related by the "uncertainty principle" (Wilson & Granlund 1984). Additionally, the Gaussian is the only rotationally invariant filter which is the product of two one-dimensional filters (Horn 1986), and the serial application of two Gaussian filters of scale $\sigma_1$ and $\sigma_2$ is equivalent to a single Gaussian filter of scale $\sigma_3$, where $\sigma_3^2 = \sigma_1^2 + \sigma_2^2$ (Gaussian filters are closed under convolution) (van den Boomgaard 1992). These latter properties facilitate efficient computation of multiscale Gaussian filtering.

The frequency response of a derivative operator is of a high-pass form; when it is placed in series with a low-pass (smoothing) filter the result is a band-pass filter (in computer vision terms: a spatial frequency channel). The centre frequency of this band-pass filter is determined by the degree of smoothing. To illustrate, consider the commonly used Laplacian of Gaussian (LOG) edge detector: the operator of equation (1.4) applied to the filter of equation (1.5). With the LOG filter, an edge is detected by looking for zero-crossings of the Laplacian of an image $I(x,y)$ convolved with a Gaussian operator $G(x,y)$, that is, $\nabla^2 (G(x,y) * I(x,y))$, which, because differentiation commutes with convolution, can also be written $(\nabla^2 G(x,y)) * I(x,y)$, giving the LOG filter:

$$ \mathrm{LOG} = \nabla^2 G(x,y) = -\frac{1}{\pi\sigma^4} \left[ 1 - \frac{x^2+y^2}{2\sigma^2} \right] \exp\left(-\frac{x^2+y^2}{2\sigma^2}\right). \tag{1.6} $$

The Fourier transform of this filter is

$$ F(f_x,f_y) = -4\pi^2 (f_x^2 + f_y^2) \exp\{-2\pi^2\sigma^2 (f_x^2 + f_y^2)\}, \tag{1.7} $$

where $f_x$ and $f_y$ are the spatial frequencies in the $x$ and $y$ directions. This result shows that LOG is a band-pass filter with a centre (peak) frequency of $f_x^2 + f_y^2 = (2\pi^2\sigma^2)^{-1}$. Interestingly, Marr & Hildreth (1980) present evidence from studies of the human visual process to support the use of the LOG filter, perhaps implemented in the nervous system by the "difference-of-Gaussians" (DOG) filter (Marr, Poggio & Hildreth 1980, Young 1987) or even the difference-of-offset-Gaussians (DOOG) filter (Young 1987). The work of Campbell & Robson (1968) on the human visual system has been extended to give a "four mechanism model" of vision (Wilson & Bergen 1979), although more recent studies have shown that some aspects of human visual behaviour cannot be totally explained by the concept of spatial frequency channels (Caelli & Moraglia 1987). Nonetheless, the multi-channel approach is of wide utility in image analysis problems.

That this LOG edge detector appears in the frequency domain as a band-pass filter also lends support to the idea of using several such filters in parallel, centred on different frequencies so as to cover the total information content of the signal. The zero-crossings of band-limited signals are actually very rich in information content. A theorem due to Logan (1977) states that (under some general conditions) a one-dimensional signal of bandwidth less than one octave can be represented (up to a multiplicative constant) by the positions of its zero-crossings. This shows that a bank of ideal one-octave wide filters, with zero-crossing detectors, covering the bandwidth of a signal would encode all the information in the signal. Logan's result was extended to images (2-D signals) by Rotem & Zeevi (1986), who showed how to encode and reconstruct images (with a bandwidth of less than one octave) from samples of their zero-crossing contours. Although the LOG filter is not an ideal band-pass filter (having instead a Gaussian shape in the frequency domain), the above theory shows that much information is contained in the positions of the zero-crossings of an image filtered in this way. Since many such filters in parallel may be needed to cover the bandwidth of the image, this provides general theoretical support for the multi-scaling approach of Marr (1982).

Another advantage of a multiscale analysis is that it provides a degree of invariance of the analysis to changes in overall signal scaling. For example, it is widely accepted that the analysis of an enlarged photograph should give the same results as the original even though all the objects in the photograph are now larger. This is possible with multiple filters since a feature originally appearing in a certain channel now appears in some higher channel, but the relative positions of the image features in scale (and position) do not change.

A potential problem with any multi-scale signal processing or analysis is how to relate the description of the signal at one scale to that at other scales. An interesting solution to this problem was introduced to computer vision by Witkin (1983) under the title "scale-space filtering".
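The band-pass character of the LOG filter discussed in this section can be checked numerically. The following sketch evaluates equations (1.6) and (1.7) directly; the function names, the grid-search resolution, and the truncation of the kernel at eight standard deviations are illustrative choices of ours rather than anything prescribed by the text.

```python
import math


def log_kernel(x, y, sigma):
    """Laplacian of Gaussian in the spatial domain, equation (1.6)."""
    r2 = (x * x + y * y) / (2.0 * sigma ** 2)
    return -(1.0 / (math.pi * sigma ** 4)) * (1.0 - r2) * math.exp(-r2)


def log_spectrum(f, sigma):
    """Equation (1.7) as a function of radial frequency f = sqrt(fx^2 + fy^2)."""
    return -4.0 * math.pi ** 2 * f * f * math.exp(-2.0 * math.pi ** 2 * sigma ** 2 * f * f)


def peak_frequency_numeric(sigma, f_max=2.0, steps=20000):
    """Grid search for the radial frequency at which |F| is largest."""
    best_f, best_mag = 0.0, 0.0
    for k in range(steps + 1):
        f = f_max * k / steps
        mag = abs(log_spectrum(f, sigma))
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f


def peak_frequency_analytic(sigma):
    """Radial peak frequency implied by f^2 = 1 / (2 pi^2 sigma^2)."""
    return 1.0 / (math.sqrt(2.0) * math.pi * sigma)


def kernel_sum(sigma, half_width=8.0, step=0.1):
    """Riemann sum of the LOG kernel over a square of side 2 * half_width * sigma.

    Since F(0, 0) = 0 the filter has no DC response, so the sum should be
    close to zero.
    """
    n = int(round(half_width / step))
    h = step * sigma
    total = 0.0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            total += log_kernel(i * h, j * h, sigma)
    return total * h * h


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        print(sigma, peak_frequency_numeric(sigma), peak_frequency_analytic(sigma))
    print(kernel_sum(1.0))
```

Doubling $\sigma$ halves the peak frequency, which is why a bank of LOG filters at several scales acts as a set of spatial frequency channels.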

1.3 Scale-Space Filtering

Scale-space filtering provides a way to associate signal descriptions across multiple scales. The idea is elegant: if scale is considered as a continuous variable rather than a parameter, then a signal feature at one scale is identified with that at another scale if they lie on the same feature path in "scale-space". This technique provides a means for ". . . managing the ambiguity of scale in an organized and natural way" (Witkin 1983). A central idea in Witkin's work is that important signal features would persist through to relatively coarse scales even though their location may be distorted by the filtering process. However, by "coarse-to-fine tracking" they could be tracked back down a path in scale-space to zero scale to be located exactly on the original signal. In this way the benefit of large smoothing to detect the major features could be combined with precise localisation.


In scale-space filtering, the signal is "expanded" into a higher dimensional space, that is, if $f(t)$ is a signal then the scale-space image of the signal is

$$ F(t,\sigma) = \int_{-\infty}^{\infty} f(\tau)\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(t-\tau)^2}{2\sigma^2}}\, d\tau. \tag{1.8} $$

$F$ defines a surface on the $(t,\sigma)$ plane, the surface swept out as the Gaussian's standard deviation is smoothly varied (Witkin 1984). After expanding the signal into scale-space we "condense" the signal description by concentrating on only the features in the signal at all scales. Although it is not normally appreciated in the literature, the choice of appropriate signal feature is very closely related to the type of signal smoothing used. With the correct feature-smoother pairing we may obtain a causality result which guarantees that signal features can only disappear and never appear with increasing scale. This property means that coarse-to-fine tracking can always be used, as every signal feature, once present at some scale, can be tracked through scale-space to zero-scale on the original signal. In fact, in this thesis we will define a scale-space as being any smoother-feature pairing which possesses such a causality property. It has been shown that for 1-D signals and the Gaussian filter as specified by equation (1.8), the zero-crossing of the $n$-th signal derivative $(n = 0, 1, 2, \ldots)$ is the appropriate feature detector (Babaud, Witkin, Baudin & Duda 1986).

Following its publication in 1983, the scale-space idea has caused an explosion of interest which has endured. A citation search on Witkin's original paper using the "Science Citation Index", which covers only the major journals in computer vision, produced the results presented in table 1.2.

Year    No. of Citations
1984    3
1985    8
1986    16
1987    16
1988    19
1989    21
1990    20
1991    20
1992    29
1993    24

Table 1.2: Citations of Witkin's original paper (Witkin 1983)


Unfortunately, Witkin's scale-space has several limitations, particularly in 2-D and higher dimensional signals. These limitations will be treated in detail in the next chapter, where we present a more technical review of scale-space theory and notation; however, we can summarise, without discussion, the limitations of Gaussian scale-space as:

• Zero-crossing contours in higher dimensional signals can split into two with increasing scale.
• There is no scale-space causality property for local extrema of a signal.
• Computation of the fingerprint involves the smoothing of the signal at various scales (although this can be done incrementally).
• Scale is non-negative only.

Therefore the task to be undertaken in this thesis is the construction and demonstration of a new scale-space theory which overcomes some of the above limitations.

1.4 The Research Tasks

In the subsequent chapters we will undertake the following tasks:

Chapter 2 A literature review of Gaussian scale-space concentrating on its limitations. A brief review of related contemporary multi-resolution techniques. A review of the background, principles, and major operations and results of mathematical morphology. A review of the current state-of-the-art in object recognition, particularly in range images. The introduction of notation and symbolisms to be used throughout the thesis.

Chapter 3 The introduction of a new scale-space based on a multiscale morphological dilation-erosion smoothing operation. The mathematical proof of a number of important properties of this smoothing, culminating in the proof of a scale-space causality property.

Chapter 4 Arising from the introduction in chapter 3 of a new morphological operation, we will consider semi-group and dimensionality properties which relate to the selection of a suitable morphological structuring function.

Chapter 5 In order to relate this new smoother to more familiar linear smoothers which have well known spectral and statistical properties, we will investigate such properties of the multiscale morphological dilation-erosion.


Chapter 6 We will investigate critically some related results in the literature and, inspired by the investigation in chapter 3, we will attempt the construction of a multiscale morphological closing-opening scale-space and examine the relationship with the dilation-erosion scale-space.

Chapter 7 The demonstration of one application of the realisation of the previous theory to the recognition of objects in range images. Demonstration 1 will be on human face recognition, and demonstration 2 will recognise mountains in digital elevation maps.

Chapter 8 A summary of the preceding work. The drawing of conclusions. The presentation of suggestions for further work.

1.5 Contributions of the Thesis

The major contributions of this thesis will be:

• a new and original scale-space theory;
• the proof of continuity and order properties in this scale-space;
• dimensionality results which motivate the choice of scaled structuring function;
• a semi-group property of the structuring function which carries over to the filtering itself;
• an efficient algorithm to perform the smoothing and therefore generate the scale-space;
• initial results in the spectral analysis of this new signal smoothing operation;
• a preliminary statistical treatment of this filter;
• a related class of morphological scale-space based on the multiscale closing-opening;
• an explanation of the relationship between these scale-spaces with respect to fingerprints;
• the introduction of "reduced fingerprints";
• an efficient algorithm for the extraction of the reduced fingerprints;
• the demonstration of a novel, general purpose, object recognition system using reduced fingerprints.


1.6 Summary

This chapter has traced the development of multi-scale signal and image processing from its beginnings to the latest developments. The point of departure was the problem of obtaining the optimum value of a smoothing or regularisation parameter, common in the solution of many ill-posed inverse problems. The pioneering work of David Marr in recognising the value of the multiscale approach to computer vision was then outlined. An overview of the scale-space theory of Andrew Witkin was presented and the major limitations of this technique were listed. An outline of the organisation of the remainder of this thesis and the research contributions made was then given. The following chapter presents a review of the three topics of scale-space filtering, mathematical morphology, and object recognition.

Chapter 2 A Review of Scale-Space Filtering, Mathematical Morphology and Object Recognition


2.1 Introduction

Computer vision and image analysis is a very broad and rapidly expanding area of study; a bibliography of major sources for the year 1991 alone runs to nearly 1200 citations (Rosenfeld 1992) and for 1992 has increased to nearly 1900 citations (Rosenfeld 1993). Of this immense body of literature we will conduct reviews on three topics.

The next section contains the first, and broadest, review. We review the topic of multiscale image and signal processing, where we discover the mathematical rationale for multiscale signal processing, how multiscale analysis was first applied to images, and the seminal work of Witkin (1983) in developing "scale-space" theory. We will examine this theory closely and investigate its strengths and weaknesses. Out of this review will come the motivation for the work contained in this thesis. Where possible, we have attempted to cite the very beginnings of the ideas involved to give a sense of historical perspective to the review; we also cite the latest developments to indicate the current "state of the art". Between these two chronological extremes the author will attempt to cite all work that has made a contribution to the development of the theory. Part of the material in this review has been published in an Australian journal (Jackway 1993).

The second review is of mathematical morphology, a recent branch of mathematical theory. The review here seeks to outline the essential elements and tools of the theory in preparation for the theoretical work in chapters 3 to 6. Another important function of this review is to set the necessary definitions and notation for the remainder of the thesis. Note that notation and terminology have been particularly problematic in mathematical morphology, a point we discuss in some detail in section 2.3.1.

The third review deals with the very practical subject of object recognition in images, particularly range images, since this will be our major interest in chapter 7. There have been many good reviews of object recognition so, after establishing a definition of the object recognition problem from the literature and reviewing the necessary theory, we will examine in detail only those methods which relate most directly to our proposed approach.


2.2 Scale-Space Filtering

In this major section we will review scale-space filtering and other related multi-resolution approaches. In this technical review we will introduce the notation to be used throughout this thesis and outline in some detail the limitations of existing scale-space theory.

2.2.1 Scale-Space Filtering

Scale-space filtering has already been introduced in the previous chapter; here we will present a more thorough and technical treatment. The ideas behind scale-space first appeared in a report on expert systems by Stansfield (1980), who was looking at ways to extract features from graphs of commodity prices. The scale-space concept was named, formalised, and brought to image analysis by Witkin (Witkin 1983, Witkin 1984). It provides a way to associate signal descriptions across multiple scales. A central idea in Witkin's work is that important signal features would persist through to relatively coarse scales even though their location may be distorted by the filtering process. In a way these linkages across scale are used to cheat the "uncertainty principle", which states that spatial localisation and frequency domain localisation are conflicting requirements (Wilson & Granlund 1984). In terms of images we can talk about scale-space bridging the "pixel-region gap" (Rosenfeld 1984), since coarse scales refer to regions whereas fine scales refer to individual pixels.

Importantly, we require that a signal feature, once present at some scale, must persist all the way through scale-space to zero-scale. Otherwise the feature would be spurious: being caused by the filter and not the original signal. This is often called a "causality" property (Koenderink 1984). We can also see this as a mathematical monotonic property, since the number of features must be a monotone decreasing function of scale. Listing the major requirements of the scale-space approach we have:

• A method of smoothing a signal to remove small-scale features (it is best if the smoothing parameter corresponds to the intuitive notion of "scale"),
• A method of detecting features at each level of smoothing,
• A causality result which states that the features at a higher level of smoothing are related to (caused by) features at a finer level of resolution, although the reverse need not be true.
Depending on the application, other desirable properties of the scale-space approach may include a semi-group property of the smoothing operator (Lindeberg 1990), and stability, uniqueness, and invertibility of the resulting signal representation (Hummel & Moniot 1989, Yuille & Poggio 1985).

We now establish the notation and terminology. Suppose we have a signal $f(x): \mathbb{R}^n \to \mathbb{R}$ and a smoothing kernel $g(x,\sigma): \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$. The signal smoothed at scale $\sigma$ is then given by $F: \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$, $F(x,\sigma) = f(x) * g(x,\sigma)$, where $*$ denotes convolution. $F$ is a function on the $(n+1)$-dimensional space called "scale-space". $F(x,\sigma)$ is known as the "scale-space image" of the signal (Witkin 1983). If we detect signal features (edges) by zero-crossings (Marr & Hildreth 1980), then it has been shown that the derivatives of Gaussian filters are the unique convolution kernels possessing a causality property for 1-D signals (Babaud et al. 1986). Let $Z$ denote the point set of zero-crossings, that is,

$$ Z_f(\sigma) = \{x : f(x) * g(x,\sigma) \text{ is a zero-crossing point}\}. \tag{2.1} $$

The causality property (Yuille & Poggio 1985) for 1-D signals can now be stated as:

Definition 2.1 (Zero-Crossing Causality in 1-D): Let $\mathrm{card}[Z]$ denote the cardinality of the set $Z$; then the zero-crossing causality property is obeyed if, for all $0 < \sigma_1 < \sigma_2$,

$$ \mathrm{card}[Z_f(\sigma_1)] \ge \mathrm{card}[Z_f(\sigma_2)]. \tag{2.2} $$



If we plot $Z(\sigma)$ against $\sigma$, we obtain a set of meandering lines in scale-space which have been termed the "fingerprint" of the signal by Yuille & Poggio (1985). Figure 2.1 shows such a plot. As outlined in the introductory chapter, zero-crossings are rich in information about the signal. With regard to Logan's theorem (Logan 1977), with Gaussian scale-space we do not have a strictly band-pass filter but we do have a continuous range of centre frequencies. A natural question is whether the fingerprint diagram, which is a track of zero-crossings across continuous scale, is indeed a complete and unique representation of the signal. Yuille & Poggio (1985) answered this question in the affirmative. In fact the scale-space fingerprint contains redundant information, as suggested by Witkin (1983), and only the derivatives of the zero-crossing at two points are theoretically needed to reconstruct the signal (for polynomial signals) (Yuille & Poggio 1985). Hummel (1986) expanded the signal recovery results to a wider class of functions and showed that in general the inversion of the fingerprint is, not surprisingly, ill-posed! Later results by Hummel & Moniot (1989) show that stable reconstruction is possible if gradient data along the zero-crossings are used as well, although this is at the cost of any data compression!
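To make the fingerprint construction concrete, the following is a minimal sketch for a sampled 1-D signal. Several choices here are assumptions of ours: the Gaussian is sampled and truncated at four standard deviations, boundaries are handled by reflection, and the feature is taken to be a zero-crossing of the discrete second difference. A sampled Gaussian only approximates the continuous theory, so the strict monotonicity of Definition 2.1 is not guaranteed at every pair of scales in this discrete setting.

```python
import math
import random


def gaussian_kernel(sigma):
    """Sampled, normalised 1-D Gaussian truncated at 4 sigma (an assumption)."""
    radius = max(1, int(math.ceil(4.0 * sigma)))
    w = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    s = sum(w)
    return [v / s for v in w]


def smooth(signal, sigma):
    """Discrete version of F(x, sigma) = f * g with reflected boundaries."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            # Reflect indices that fall outside [0, n - 1].
            while j < 0 or j >= n:
                j = -j - 1 if j < 0 else 2 * n - 1 - j
            acc += w * signal[j]
        out.append(acc)
    return out


def second_difference(values):
    """Discrete second derivative at the interior samples."""
    return [values[i - 1] - 2.0 * values[i] + values[i + 1]
            for i in range(1, len(values) - 1)]


def zero_crossings(values):
    """Indices at which consecutive non-zero samples change sign."""
    crossings = []
    prev_sign = 0
    for i, v in enumerate(values):
        sign = (v > 0) - (v < 0)
        if sign == 0:
            continue
        if prev_sign != 0 and sign != prev_sign:
            crossings.append(i)
        prev_sign = sign
    return crossings


def fingerprint(signal, sigmas):
    """Map each scale to the zero-crossing positions of the smoothed second difference."""
    return {s: zero_crossings(second_difference(smooth(signal, s))) for s in sigmas}


if __name__ == "__main__":
    random.seed(0)
    sig = [math.sin(2 * math.pi * i / 64) + random.uniform(-0.3, 0.3) for i in range(256)]
    for s, z in fingerprint(sig, [1.0, 2.0, 4.0, 8.0, 16.0]).items():
        print(s, len(z))
```

Plotting the crossing positions against scale for a dense set of scales gives the meandering fingerprint lines described above.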


Figure 2.1: A Gaussian scale-space fingerprint: the plot of zero-crossings of $F(x,\sigma)$.

Koenderink (1984) related causality to the identification of points of equal luminance (in an image under blurring) and he showed that, under reasonable constraints, this implies that the signal evolution through scale-space should be governed by the "heat equation", which in turn leads uniquely to Gaussian smoothing. Studying signal feature propagation via differential equations is a powerful approach and mathematical tools such as the "maximum principle" (Hummel 1986) can be used to show causality. Arising from this work are some new methods based on "anisotropic diffusion" where the diffusion coefficient, instead of being constant, is made to depend locally on $F(x,y,\sigma)$ (Perona & Malik 1990, Whitaker & Pizer 1993). In this way smoothing can be enhanced within regions and reduced across boundaries, thus actually sharpening image edges with increased smoothing. The causality property in linear scale-spaces is an active research area and new results continue to appear (Wu & Xie 1990, ter Haar Romeny, Florack, Koenderink & Viergever 1991, Anh, Shi & Tsui 1993). Lindeberg (1990) has established an axiomatic formulation of scale-space for discrete signals. In this case, the appropriate filter is not a sampled Gaussian; rather, the diffusion equation itself should be discretised, and the resulting 1-D filter is $T(n,\sigma) = e^{-\sigma} I_n(\sigma)$, where $I_n(\cdot)$ are the modified Bessel functions of integer order (Lindeberg 1990).

The weakness of the Gaussian scale-space approach, especially for 2-D signals (images), deserves closer attention. Zero-crossings, which are points on a 1-D signal, are closed contours in 2-D and $Z$ is a set of closed curves. For images, the corresponding scale-space image is 3-dimensional and the fingerprints are now surfaces in 3-space. In particular, Yuille & Poggio (1986) have found:


"In two dimensions, with the Laplacian operator, the Gaussian is the only filter obeying the conditions which never create zero crossings as the scale increases...(however) a closed zero-crossing contour can split into two as the scale increases, just as the trunk of a tree may split into two branches."

This is a problem, as two separate contours at a coarse scale may in fact be caused by the same signal feature (see the diagrams in Lifshitz & Pizer (1990)). It is true that no new intensities are created, which satisfies Koenderink's causality (Koenderink 1984); however, it seems to be odd semantics to claim that one closed contour splitting into two is not the creation of a new zero-crossing in practice! As a result of this difficulty, the chief applications of Gaussian scale-space are to problems involving 1-D signals: the description and recognition of planar curves (Mokhtarian & Mackworth 1986), histogram analysis (Carlotto 1987), signal matching (Witkin, Terzopoulos & Kass 1987), ECG signal analysis (Tsui, Choy & Ho 1988), the pattern matching of 2-D shapes (Morita, Kawashima & Aoki 1991), boundary contour refinement in MRI images (Raman, Sarkar & Boyer 1991), the analysis of facial profiles (Campos, Linney & Moss 1993), and the matching of motion trajectories (Rangarajan, Allen & Shah 1993).

Instead of using zero-crossings of the Laplacian (a second derivative based operator) (Marr & Hildreth 1980) we could also detect features (edges) by detecting local extrema of a first derivative based operator, an approach suggested by Witkin (1984). Then we would look for local extrema of the derivative of the signal in scale-space. One immediate advantage is that a local extremum of a signal of any dimensionality is in general an isolated point. So, is there a linear scale-space filter that gives a causality property for local extrema? Unfortunately not!
As discussed in Lifshitz & Pizer (1990), there is no convolution kernel with the property that it does not introduce new extrema with increasing scale in 2-D. This is a major drawback with the linear scale-space formulations.

The proven uniqueness of the Gaussian filter (NB: amongst linear filters only!) has closed off the research on alternative types of scale-space for some time; thus the latest literature on scale-space consists mainly of reports of applications. Recently, however, some progress has been made by looking at some nonlinear types of signal smoothing. These smoothers are not convolutions (at least not in the normal sense) but arise in connection with certain basic operations from the field of "Mathematical Morphology." For example, Chen & Yan (1989) have used the morphological "opening" operation [to be discussed in detail in the next section of this review] with multiscale disks to show a scale-space causality theorem for the zero-crossings of the boundary curvature of objects in a binary image. This result has since been found to apply to any scaled compact and convex structuring elements by Jang & Chin (1991). By Sternberg's "umbra" (see the next section of this chapter), we can apply this theorem to zero-crossings of the second derivative of 1-D functions. However, in common with Gaussian scale-space, the causality property using zero-crossings is not readily extended to 2-D and higher dimensional signals. We foreshadow here that we will outline some problems with the results of both Chen & Yan (1989) and Jang & Chin (1991) when we treat their papers in depth in chapter 6. Jang & Chin (1992) have also published a recent comprehensive review of Gaussian and morphological scale-space which emphasises the signal-feature and signal-smoother aspects of various scale-space formulations.

In an interesting and different approach, van den Boomgaard (1992) has considered "morphological propagators" in which a scale-space is formed by propagating a signal by erosions and dilations with a multi-scale parabolic structuring function. The evolution is governed by a non-linear differential equation of the same form as Burgers' equation for shock waves. The features in scale-space are the singularities of the signal. A limited causality result is available in that the fingerprint lines cannot split with increasing scale; however, they can start at non-zero scale, and so may not pertain to features in the original signal. At present, this work has not been extended into a full scale-space theory.

The multiscale analysis of images (and signals) has become exceedingly popular in recent times; a thorough but now dated review of multiscale image understanding is given by Dyer (1987). A literature review would not be complete without some mention of other contemporary multi-scale techniques. We briefly review multiresolution pyramids and wavelet theory in the following sections.
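As an informal illustration of smoothing by dilation with a multi-scale parabolic structuring function, the sketch below applies a brute-force grey-scale dilation to a sampled 1-D signal and records the positions of its strict local maxima. The particular scaling of the parabola, $g_\sigma(t) = -t^2/\sigma$, the function names, and the quadratic-time implementation are all assumptions made for this example; they are not taken from the works reviewed above.

```python
import random


def dilate_parabolic(signal, sigma):
    """Grey-scale dilation max_y [ f(y) + g_sigma(x - y) ] of a sampled signal.

    The structuring function g_sigma(t) = -t^2 / sigma is an assumed scaling,
    chosen only so that larger sigma gives a flatter parabola and hence
    more smoothing.
    """
    n = len(signal)
    return [max(signal[y] - (x - y) ** 2 / sigma for y in range(n)) for x in range(n)]


def local_maxima(values):
    """Interior indices that are strictly greater than both neighbours."""
    return {i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] > values[i + 1]}


if __name__ == "__main__":
    random.seed(0)
    f = [random.random() for _ in range(200)]
    print("original", len(local_maxima(f)))
    for sigma in (0.05, 0.2, 1.0, 5.0, 25.0):
        print(sigma, len(local_maxima(dilate_parabolic(f, sigma))))
```

In this discrete sketch the surviving maxima at a coarser scale always sit at positions that were already maxima at every finer scale, so the count can only fall as the scale grows and each surviving maximum can be traced back to the original signal.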

2.2.2 Multiresolution Image Processing

In parallel developments to scale-space the field of "Multiresolution Image Processing" has also appeared (Rosenfeld 1984). Multi-resolution, as opposed to multi-scale, image processing has come to imply a reduction in sampling rate as the scale becomes coarser. The Nyquist Sampling Theorem (Oppenheim & Schafer 1989) states that a bandlimited signal is uniquely determined by its samples if the sampling period, $T$, is less than $1/(2B)$, where $B$ is the maximum frequency in the signal. Thus if this signal has its bandwidth reduced by a low-pass filter, the sampling rate (resolution) can be correspondingly reduced without any additional loss of information. Multiresolution Image Processing is strongly associated with data structures or image representations called "Pyramids", first used by Tanimoto & Pavlidis (1975). A "Gaussian Pyramid" (multiresolution low-pass filter) can be constructed as follows (Rosenfeld 1984): Consider an image of $2^n \times 2^n$ pixels.


Store this image in a $2^n \times 2^n$ array; this is the base (level 0) of the pyramid. For the next level up, each pixel is obtained by a weighted averaging over a $5 \times 5$ square of pixels on the level below. Repeat this procedure to obtain a pyramid of $n + 1$ levels. The sampling distance is doubled at each step so the $k$-th level is held in an array of size $2^{n-k} \times 2^{n-k}$. The $n$-th level is a single pixel. Other pyramid structures such as the "Laplacian Pyramid" (multiresolution band-pass filter) can be easily obtained from the Gaussian pyramid (Burt & Adelson 1983).

One problem with pyramids in general is that the data at separate levels is correlated and there is no clear model which handles this correlation. It is difficult to know whether a similarity between the image details at different resolutions is due to a property of the image itself or to the intrinsic redundancy of the representation (Mallat 1989). The pyramid approach offers many useful properties and permits many efficient and useful image operations (Burt & Adelson 1983, Uhr 1987, Rosenfeld 1984, Bister, Cornelis & Rosenfeld 1990), including the segmentation of range images (Sabata, Arman & Aggarwal 1993). Lately, adaptive pyramids, in which the relationship between levels depends on the data, have been defined (Jolion & Montanvert 1992).
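The pyramid construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the text specifies a weighted average over a $5 \times 5$ neighbourhood but not the weights, so the separable binomial weights $[1, 4, 6, 4, 1]/16$, the edge-replicating border treatment, and the function names `reduce_level` and `gaussian_pyramid` are choices made here for the example.

```python
import numpy as np

def reduce_level(img):
    """One pyramid step: 5 x 5 weighted average, then keep every second pixel."""
    # Assumed weights: separable binomial kernel [1, 4, 6, 4, 1] / 16.
    w = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    h, wd = img.shape
    p = np.pad(img, 2, mode="edge")                       # assumed border handling
    rows = sum(w[k] * p[:, k:k + wd] for k in range(5))   # horizontal pass
    out = sum(w[k] * rows[k:k + h, :] for k in range(5))  # vertical pass
    return out[::2, ::2]                                  # sampling distance doubles

def gaussian_pyramid(img):
    """Levels 0..n for a 2^n x 2^n image; level n is a single pixel."""
    levels = [np.asarray(img, dtype=float)]
    while levels[-1].shape[0] > 1:
        levels.append(reduce_level(levels[-1]))
    return levels

image = np.arange(64, dtype=float).reshape(8, 8)          # n = 3
for k, level in enumerate(gaussian_pyramid(image)):
    print(k, level.shape)
```

For an $8 \times 8$ input the loop produces $n + 1 = 4$ levels, the $k$-th of size $2^{3-k} \times 2^{3-k}$.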

2.2.3 Wavelet Theory

The "wavelet" representation, a more general multiresolution representation theory, has become popular in recent times. Wavelets were introduced to image analysis by Mallat (1989), but like many good ideas had already been introduced previously in mathematics (Grossmann & Morlet 1984). To understand the notation surrounding wavelet theory it is necessary to deal with some mathematical preliminaries; the following definitions are taken from Brockwell & Davis (1987, chapter 2).

Definition 2.2 (Inner-Product Space): A complex vector space $H$ is said to be an inner-product space if for any pair of elements $x$ and $y$ in $H$, there is a complex number $\langle x, y \rangle$, called the inner-product of $x$ and $y$, such that
1. $\langle x, y \rangle = \overline{\langle y, x \rangle}$, the bar denoting complex conjugation,
2. $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$ for all $x, y, z \in H$,
3. $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$ for all $x, y \in H$ and $\alpha \in C$,
4. $\langle x, x \rangle \geq 0$ for all $x \in H$,
5. $\langle x, x \rangle = 0$ if and only if $x = 0$.




Definition 2.3 (Norm): The norm of an element $x$ of an inner-product space is defined to be $\|x\| = \sqrt{\langle x, x \rangle}$.

Definition 2.4 (Cauchy Sequence): A sequence $\{x_n, n = 1, 2, \ldots\}$ of elements of an inner-product space is said to be a Cauchy sequence if

$$\|x_n - x_m\| \to 0 \quad \text{as } m, n \to \infty. \tag{2.3}$$



Definition 2.5 (Hilbert Space): A Hilbert space $H$ is an inner-product space which is complete, i.e. an inner-product space in which every Cauchy sequence $\{x_n\}$ converges in norm to some element $x \in H$.

Now we can proceed with a brief overview of wavelet theory which is mainly taken from the introduction in Mallat (1991). Let $L^2$ denote the Hilbert space of measurable, square-integrable one-dimensional functions. For $f : R \to R$, $f(x) \in L^2$ and $g(x) \in L^2$, the inner-product of $f(x)$ with $g(x)$ is

$$\langle g(x), f(x) \rangle = \int_{-\infty}^{+\infty} g(x) f(x)\, dx. \tag{2.4}$$

The norm of $f(x) \in L^2$ is given by

$$\|f\|^2 = \int_{-\infty}^{+\infty} |f(x)|^2\, dx. \tag{2.5}$$

The wavelet transform is a linear operator that decomposes a signal into components that appear at different scales. This transform is based on the convolution of the signal with a dilated filter. A wavelet is a function $\psi(x) \in L^2$ such that

$$\int_{-\infty}^{+\infty} \psi(x)\, dx = 0. \tag{2.6}$$

Let us denote by $\psi_s(x)$ the "dilation"¹ of $\psi(x)$ by a factor $s$:

$$\psi_s(x) = \frac{1}{s}\, \psi\!\left(\frac{x}{s}\right). \tag{2.7}$$

The wavelet transform of a function $f(x)$ at the scale $s$ and position $x$ is given by the convolution product

$$W_s f(x) = f * \psi_s(x). \tag{2.8}$$

¹ The word "dilation" is used here in its ordinary sense of "make wider or larger; cause to expand or swell; stretch" (Guralnik 1982) and does not refer to the morphological operation of the same name as used widely throughout this thesis.

Grossmann & Morlet (1984) have shown that the wavelet transform satisfies an energy conservation equation and that $f(x)$ can be reconstructed from its wavelet transform (justifying the use of the term "transform"). When the scale $s$ decreases, the support of $\psi_s(x)$ decreases, so the wavelet transform $W_s f(x)$ is sensitive to finer details. The scale $s$ characterises the size and regularity of the signal features extracted by the wavelet transform.

The wavelet transform depends on two parameters $s$ and $x$ that vary continuously over the set of real numbers. For practical applications these parameters must be discretised. For a particular class of wavelets, the scale parameter can be sampled along the dyadic sequence $(2^j)_{j \in Z}$, without modifying the overall properties of the transform. The wavelet transform at the scale $2^j$ is given by

$$W_{2^j} f(x) = f * \psi_{2^j}(x). \tag{2.9}$$

By imposing that

$$\sum_{j=-\infty}^{+\infty} |\hat{\psi}(2^j \omega)|^2 = 1, \tag{2.10}$$

where $\hat{f}(\omega) = \int_{-\infty}^{+\infty} f(x) e^{-i\omega x}\, dx$ is the Fourier transform of $f(x)$, we ensure that the whole frequency axis is covered by a dilation of $\hat{\psi}(\omega)$ by the scales $(2^j)_{j \in Z}$. Any wavelet satisfying equation (2.10) is called a dyadic wavelet. The function $f(x)$ can be reconstructed from its dyadic wavelet transform:

$$f(x) = \sum_{j=-\infty}^{+\infty} W_{2^j} f * \psi_{2^j}(-x). \tag{2.11}$$

Let us call $\theta(x)$ a smoothing function if it is the impulse response of a low-pass filter. Mallat (1991) shows that if we use the second derivative of this function as our wavelet function,

$$\psi(x) = \frac{d^2 \theta(x)}{dx^2}, \tag{2.12}$$

the zero-crossings of $W_s f(x)$ correspond to the inflection points of $(f * \theta_s)(x)$. If the smoothing function is a Gaussian, then the wavelet function is a second derivative of a Gaussian, and detecting the zero-crossings is equivalent to the LOG edge detector as previously discussed in subsection 2.2.2.

The wavelet theory, being firmly based in mathematics and hence quite general and powerful, has become popular at the moment and is the subject of much active research, particularly in regard to applications (Ben-Arie & Rao 1992, Chang & Kuo 1992, Boles & Tieng 1993). In common with other multiscale approaches, it shares the problem of relating information obtained at one scale or level to that obtained at another. Witkin's scale-space theory (Witkin 1983), developed in image analysis before wavelets were known, turns out to be a particular instance of wavelet theory, as the Gaussian, or derivative of Gaussian, kernels of scale-space can be seen as wavelet functions. The scale-space approach emphasises the relationship between signal descriptions across scale and the existence of the causality property, whereas the wavelet approach emphasises the signal representation (or transform) aspect of the technique.
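The zero-crossing property attributed to Mallat (1991) above can be illustrated numerically. The following is a minimal sketch, assuming a unit-variance Gaussian as the smoothing function $\theta$, a sampled convolution on a symmetric grid, and a smooth test edge $\tanh(2x)$; the function names are hypothetical and chosen for this example only.

```python
import numpy as np

def theta(x):
    """Assumed smoothing function: unit-variance Gaussian."""
    return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

def psi(x):
    """Second derivative of theta, used as the wavelet as in equation (2.12)."""
    return (x**2 - 1.0) * theta(x)

def wavelet_transform(f, x, s):
    """Sampled W_s f = f * psi_s, with psi_s(x) = psi(x / s) / s as in (2.7).

    The grid x must be symmetric about 0 with an odd number of samples,
    so that the kernel is centred for mode="same".
    """
    dx = x[1] - x[0]
    return np.convolve(f, psi(x / s) / s, mode="same") * dx

x = np.linspace(-10.0, 10.0, 1001)   # x = 0 is the centre sample
f = np.tanh(2.0 * x)                 # rising edge; the smoothed edge has its inflection at x = 0
w = wavelet_transform(f, x, s=1.0)
centre = len(x) // 2
print(w[centre - 25], w[centre], w[centre + 25])
```

The transform changes sign across the centre sample, which is where the smoothed edge $(f * \theta_s)(x)$ has its inflection point.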

2.2.4 Summary and Conclusions

Witkin's Gaussian scale-space has been introduced as a basis for the coherent description of signals across scales, but this approach has some problems. We can summarise the limitations of Gaussian scale-space as:

• Zero-crossing contours in higher dimensional signals can split into two with increasing scale.
• There is no scale-space monotonic property for local extrema of a signal.
• Computation of the fingerprint involves the smoothing of the signal at various scales (although this can be done incrementally).
• Scale is non-negative only.

The fundamental research problem to be addressed, then, can be stated as:

How can a full scale-space theory be constructed that possesses a monotonic property in higher dimensional signals?

Additional desirable properties might include the existence of an efficient algorithm for obtaining the fingerprint, and perhaps the extension of scale-space to negative scales. The existence of at least one other scale-space, based on morphological operations (Chen & Yan 1989), gives an indication of a useful line of enquiry. The morphological approach is taken up in the next chapter, but first we review necessary concepts in mathematical morphology. Finally, for completeness, we have briefly covered other contemporary multi-resolution techniques and wavelets.

2.3 Mathematical Morphology


"Mathematical morphology" grew out of theoretical investigations of a geometrical or probabilistic nature needed in the analysis of spatial data from geology. The work was carried out by a team at the Fontainebleau research centre of the Paris School of Mines from 1964. This theoretical work was first released widely with the publication of the book "Random Sets and Integral Geometry" by Matheron (1975). A more practical book related to image analysis was later published by Serra (1982), followed by a second volume on theoretical advances (Serra 1988). In the next subsection we will introduce the basic morphological operations of dilation and erosion. We will conduct this introduction in an approximately chronological order, which provides an opportunity to discuss and resolve the confusion in the literature over notation and terminology.

2.3.1 Dilation and Erosion

There has been, and continues to be, considerable confusion in the literature over the notation for, and the definitions of, the basic morphological operations of dilation and erosion. Part of this confusion comes from the existence of two closely related set operations: Minkowski addition and subtraction. Minkowski (1903) introduced a set addition operation with obscure notation and later Hadwiger (1950) introduced a corresponding subtraction and gave the names "Minkowski addition" and "Minkowski subtraction" to these operations. In Hadwiger's notation: let $R^k$ denote Euclidean $k$-space; given two point sets $A$ and $B$ in $R^k$, the "Minkowski addition" of $A$ and $B$ is the set $\{a + b : a \in A, b \in B\}$ and the "Minkowski subtraction" of $A$ and $B$ is the set $\{v : v + b \in A, b \in B\}$, the condition holding for every $b \in B$ (Scherk 1951). It is of interest that Hadwiger considered, without naming, the composite operations and found $(A - B) + B \subseteq A \subseteq (A + B) - B$.

Matheron (1975) defines Minkowski addition as above and uses the symbol $\oplus$. For $x \in R^k$, $A \oplus \{x\}$ is the translate of $A$, which is denoted by $A_x$. Matheron defines the "symmetrical set" of $B$ with respect to the origin: $\check{B} = \{-x : x \in B\}$. However, the Minkowski subtraction, denoted by $\ominus$, is re-defined here as the complementation dual to Minkowski addition, that is, $A \ominus B = (A^c \oplus B)^c$. It is easy to show that $A \ominus B = \{z : (\check{B})_z \subseteq A\} = \{z : z - b \in A, b \in B\}$, which differs from Hadwiger's definition. Most importantly, Matheron defines the "dilatation" of $A$ by $B$ as $A \oplus \check{B}$ and shows that $A \oplus \check{B} = \{z : A \cap B_z \neq \emptyset\} = \{a - b : a \in A, b \in B\}$. Matheron defines the "erosion" of $A$ by $B$ as $A \ominus \check{B}$, the complementation dual of dilatation, and shows that $A \ominus \check{B} = \{z : B_z \subseteq A\} = \{z : z + b \in A, b \in B\}$. Matheron (1975) then defines the opening $A_B$ of $A$ by $B$ and the closing $A^B$ of $A$ by $B$ as follows: $A_B = (A \ominus \check{B}) \oplus B$; $A^B = (A \oplus \check{B}) \ominus B$.

The books by Serra adopt this notation except that the more modern term "dilation" is used for $A \oplus \check{B}$. The advantage of this approach is that the Minkowski addition and subtraction are duals with respect to complementation, as are the dilation and erosion, that is, $A \oplus B = (A^c \ominus B)^c$ and $A \oplus \check{B} = (A^c \ominus \check{B})^c$. In, say, a binary image this means that dilating the foreground is equivalent to eroding the background with the same structuring element. One disadvantage is that the notation for dilation and erosion, opening and closing is rather untidy with the "hats" on the structuring elements. For a more formal disadvantage we need to first discuss the alternative notation and terminology.

The second main school of terminology belongs to Sternberg (Sternberg 1986, Haralick, Sternberg & Zhuang 1987) from the United States of America. Here we have the Minkowski addition and subtraction defined as $X \oplus B = \{x + b : x \in X, b \in B\}$ and $X \ominus B = \{z : B_z \subseteq X\} = \{z : z + b \in X, b \in B\}$, which agree with the original definitions of Hadwiger (1950), although this point is not appreciated since Haralick et al. (1987) compare their erosion with that of Serra et al. who, as we have seen, re-defined the Minkowski subtraction. Most importantly, in the work of Haralick et al. (1987), the transformation $X \mapsto X \oplus B$ is called the "dilation" and $X \mapsto X \ominus B$ is called the "erosion". Thus dilation and erosion are actually synonymous with the Minkowski operations, which is good since the same symbols, $\oplus$ and $\ominus$, are being used. This means that when compared with Serra's notation the symbol $\oplus$ has the same meaning while the word "dilation" does not, and the symbol $\ominus$ means something different while the word "erosion" turns out to mean the same. No wonder there has been some confusion!

The best justification of Sternberg's notation and terminology comes from Heijmans & Ronse (1990). They consider the basis of the morphological operations much more generally, from an algebraic approach. The morphological operations can be defined on several different mathematical object spaces, for example: sets; grey-level functions; convex sub-sets; closed sub-sets. Heijmans & Ronse (1990) have shown that the common factor necessary is that the space is a "complete lattice". To understand this more general morphology some definitions are necessary (see Heijmans & Ronse (1990)).

Definition 2.6 (Poset): Consider a set $\mathcal{L}$; a binary relation $\leq$ on $\mathcal{L}$ is called a partial order relation if it is
1. reflexive: for any $X \in \mathcal{L}$, $X \leq X$;
2. antisymmetric: for any $X, Y \in \mathcal{L}$, if $X \leq Y$ and $Y \leq X$, then $X = Y$;
3. transitive: for any $X, Y, Z \in \mathcal{L}$, if $X \leq Y$ and $Y \leq Z$, then $X \leq Z$.
We then say that $(\mathcal{L}, \leq)$ is a partially ordered set, or in brief a poset.

The reverse relation $\geq$ (defined by $X \geq Y$ if and only if $Y \leq X$) is also a partial order relation. Given $L, U \in \mathcal{L}$ and $\mathcal{K} \subseteq \mathcal{L}$, we say that $U$ is an upper bound of $\mathcal{K}$ if for any $K \in \mathcal{K}$ we have $U \geq K$, and that $L$ is a lower bound of $\mathcal{K}$ if for any $K \in \mathcal{K}$ we have $L \leq K$. A supremum of $\mathcal{K}$ is the least upper bound, and an infimum is the greatest lower bound.

Definition 2.7 (Complete Lattice): The poset $\mathcal{L}$ is a complete lattice if every non-void subset $\mathcal{K}$ of $\mathcal{L}$ has a supremum and an infimum.

Two elements of the complete lattice $\mathcal{L}$ are important: the universal bounds. They are the greatest element $I$ and the least element $O$, defined by $O \leq X \leq I$ for every $X \in \mathcal{L}$.

Definition 2.8 (Duality Principle): The reverse $\geq$ of an order relation $\leq$ is itself an order relation. This reversion extends then to the supremum and infimum, since we have for any $\mathcal{K} \subseteq \mathcal{L}$:

$$\sup_{(\mathcal{L}, \geq)} \mathcal{K} = \inf_{(\mathcal{L}, \leq)} \mathcal{K}, \qquad \inf_{(\mathcal{L}, \geq)} \mathcal{K} = \sup_{(\mathcal{L}, \leq)} \mathcal{K}.$$

The universal bounds of $(\mathcal{L}, \geq)$ are those of $(\mathcal{L}, \leq)$, but interchanged. Thus $(\mathcal{L}, \geq)$ is called the dual lattice of $(\mathcal{L}, \leq)$, and to every definition, property, or statement on $(\mathcal{L}, \leq)$ corresponds a dual one on $(\mathcal{L}, \geq)$. This is the duality principle.

Dilations and erosions can now be defined very generally on complete lattices:

Definition 2.9 (Dilation and Erosion): Let $\psi$ be an operator, i.e. a function $\mathcal{L} \to \mathcal{L}$.
1. $\psi$ is a dilation if for every $\mathcal{F} \subseteq \mathcal{L}$, $\psi(\sup \mathcal{F}) = \sup_{X \in \mathcal{F}} \psi(X)$.
2. $\psi$ is an erosion if for every $\mathcal{F} \subseteq \mathcal{L}$, $\psi(\inf \mathcal{F}) = \inf_{X \in \mathcal{F}} \psi(X)$.



We can see from this definition that dilations and erosions are dual operations in the sense just defined. Note that this is, in general, different from the "duality by complementation" used by Matheron and Serra in defining their erosion from the dilation. One final definition is needed to complete the result:


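Definition 2.10 (Adjunction): Let $\delta, \varepsilon$ be operators $\mathcal{L} \to \mathcal{L}$. Then we say that $(\varepsilon, \delta)$ is an adjunction if for every $X, Y \in \mathcal{L}$ we have $\delta(X) \leq Y \Leftrightarrow X \leq \varepsilon(Y)$.

The Minkowski addition and subtraction, $X \mapsto X \oplus B$ and $X \mapsto X \ominus B$, form an adjunction. Heijmans & Ronse (1990) show that they are the only translation invariant dilations and erosions of a Euclidean or digital space. The duality based on adjunctions is general whereas the duality based on complementation (cf. Matheron 1975) cannot be extended to the non-Boolean lattice. This is a theoretical justification for the Sternberg notation and terminology. In this thesis we will follow the notation and terminology of Haralick et al. (1987), that is:

Definition 2.11 (Dilation and Erosion): Let $R^k$ denote Euclidean $k$-space; given two point sets $A$ and $B$ in $R^k$, then

$$A \oplus B = \{a + b : a \in A, b \in B\} = \bigcup_{b \in B} A_b,$$
$$A \ominus B = \{z : B_z \subseteq A\} = \bigcap_{b \in B} A_{-b},$$

denote the dilation of $A$ by $B$ and the erosion of $A$ by $B$ respectively.

Some basic properties of the erosion and dilation are listed below; the proofs are straightforward and can be found in Haralick et al. (1987) or Serra (1982). Here $\check{B} = \{-b : b \in B\}$ denotes the reflection of $B$.

(2.13) $A \oplus B = \bigcup_{b \in B} A_b$
(2.14) $A \oplus B = B \oplus A$
(2.15) $A \oplus (B \oplus C) = (A \oplus B) \oplus C$
(2.16) $A \oplus (B \cup C) = (A \oplus B) \cup (A \oplus C)$
(2.17) $A \oplus (B \cap C) \subseteq (A \oplus B) \cap (A \oplus C)$
(2.18) $A \oplus B \supseteq A$ if $0 \in B$
(2.19) $A \subseteq B \Rightarrow A \oplus C \subseteq B \oplus C$
(2.20) $B \subseteq C \Rightarrow A \oplus B \subseteq A \oplus C$
(2.21) $A \ominus B = \bigcap_{b \in B} A_{-b}$
(2.22) $A \ominus B \neq B \ominus A$
(2.23) $(A \ominus B) \ominus C = A \ominus (B \oplus C)$
(2.24) $A \ominus (B \cup C) = (A \ominus B) \cap (A \ominus C)$
(2.25) $A \ominus (B \cap C) \supseteq (A \ominus B) \cup (A \ominus C)$
(2.26) $A \ominus B \subseteq A$ if $0 \in B$
(2.27) $A \subseteq B \Rightarrow A \ominus C \subseteq B \ominus C$
(2.28) $B \subseteq C \Rightarrow A \ominus B \supseteq A \ominus C$
(2.29) $A \ominus B = (A^c \oplus \check{B})^c$
(2.30) $A \oplus B \subseteq C \Leftrightarrow A \subseteq C \ominus B$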
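Definition 2.11 and several of the listed properties can be checked directly on small finite point sets. The sketch below is an illustration only: the representation of point sets as Python sets of integer tuples, the function names, and the example sets are assumptions made for this example, and the erosion is computed from the intersection form of definition 2.11, which requires a non-empty structuring element.

```python
def translate(A, v):
    """The translate A_v = {a + v : a in A}."""
    return {tuple(ai + vi for ai, vi in zip(a, v)) for a in A}

def dilate(A, B):
    """A (+) B = {a + b : a in A, b in B}."""
    return {tuple(ai + bi for ai, bi in zip(a, b)) for a in A for b in B}

def erode(A, B):
    """A (-) B = intersection over b in B of A_{-b}; B must be non-empty."""
    translates = [translate(A, tuple(-bi for bi in b)) for b in B]
    return set.intersection(*translates)

# Hypothetical example sets: a 4 x 4 block and two 2-point structuring elements.
A = {(i, j) for i in range(4) for j in range(4)}
B = {(0, 0), (1, 0)}
C = {(0, 0), (0, 1)}

print(dilate(A, B) == dilate(B, A))                        # commutativity (2.14)
print(dilate(dilate(A, B), C) == dilate(A, dilate(B, C)))  # chain rule (2.15)
print(erode(erode(A, B), C) == erode(A, dilate(B, C)))     # chain rule (2.23)
print(dilate(erode(A, B), B) <= A)                         # instance of the adjunction (2.30)
```

Decomposing the $2 \times 2$ element $B \oplus C$ into two 2-point elements, as in the chain-rule lines, is the kind of saving the decomposition of a large structuring element is meant to provide.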

Of the properties (2.13) to (2.30), the "chain rules" (2.15) and (2.23) are perhaps the most practically useful, as they may allow the dilation or erosion with a large structuring function to be decomposed into a series of operations with smaller structuring functions, with computational savings. In chapter 4 we will also see how these rules lead to a semi-group property for morphological scale-space.

Mathematical morphology can be applied to functions either via the "umbra" approach of Sternberg (1986) or directly via the complete lattice approach of Heijmans & Ronse (1990). In fact Heijmans (1991) comments:

"Some authors (Giardina & Dougherty 1988, Haralick et al. 1987, Serra 1982, Sternberg 1980, Sternberg 1986) prefer to describe morphological operators for gray-level functions in terms of umbras instead of working with the numerical functions explicitly. Although such an approach may help to obtain a geometrical picture of the operations, it is in fact an unnecessary intermediate step and a source of many mistakes (Ronse 1990, Section 1)."

In this thesis we consider only functions and thus use directly the following definition:

Definition 2.12 (Dilation and Erosion of Functions): Let $f : D \to R$, $D \subseteq R^n$, and $g : G \to R$, $G \subseteq R^n$, then:

$$(f \oplus g)(x) = \sup_{t \in G \cap \check{D}_x} \{ f(x - t) + g(t) \},$$
$$(f \ominus g)(x) = \inf_{t \in G \cap D_{-x}} \{ f(x + t) - g(t) \},$$

denote the dilation of the function $f(x)$ by the structuring function $g(x)$ and the erosion of the function $f(x)$ by the structuring function $g(x)$. Here $\sup\{f\}$ and $\inf\{f\}$ refer to the supremum (least upper bound) and infimum (greatest lower bound) of the function $f$ (DePree & Swartz 1988).
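For sampled 1-D signals the supremum and infimum of definition 2.12 become a maximum and a minimum over finitely many offsets. The sketch below assumes integer sample positions, a structuring function sampled at $t = -r, \ldots, r$, and hypothetical function names and test data; the index tests `0 <= x - t < n` and `0 <= x + t < n` play the role of the domain restrictions in the definition.

```python
import numpy as np

def dilate(f, g):
    """(f (+) g)(x) = max over t of f(x - t) + g(t), with x - t inside the domain of f."""
    n, r = len(f), len(g) // 2            # g holds the samples g(-r), ..., g(r)
    out = np.full(n, -np.inf)
    for x in range(n):
        for k, t in enumerate(range(-r, r + 1)):
            if 0 <= x - t < n:
                out[x] = max(out[x], f[x - t] + g[k])
    return out

def erode(f, g):
    """(f (-) g)(x) = min over t of f(x + t) - g(t), with x + t inside the domain of f."""
    n, r = len(f), len(g) // 2
    out = np.full(n, np.inf)
    for x in range(n):
        for k, t in enumerate(range(-r, r + 1)):
            if 0 <= x + t < n:
                out[x] = min(out[x], f[x + t] - g[k])
    return out

f = np.array([0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 0.0])   # hypothetical test signal
t = np.arange(-2.0, 3.0)
g = -(t ** 2)                                        # parabolic structuring function, g(0) = 0
d = dilate(f, g)
e = erode(f, g)
print(d)
print(e)
```

Because $g(0) = 0$ here, the dilation lies on or above the signal and the erosion on or below it.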


Sometimes in the literature, the symbols "$\bigvee f$" and "$\bigwedge f$" are used (Heijmans 1991), and, in less mathematical treatments, max and min (Haralick et al. 1987). Theoretically, the supremum (infimum) must be used where the function does not attain its "maximum" ("minimum"), for example, if $f(x) = x$ for $0 < x < 1$ then [...]

[...] $\alpha > 1$, particularly the circular paraboloids $\alpha = 2$;

• elliptic poweroids: $g_\sigma(x) = -|\sigma| \left( \sqrt{x'Ax} / |\sigma| \right)^{\alpha}$ for $\alpha > 1$, where $A$ is a symmetric positive definite matrix.

Examples of various 2-D circular structuring functions are shown in figure 3.2.


Figure 3.1: A chord from the origin to any point on the structuring function should lie on, or below, the structuring function (shown in 2-D). [Plot labels: $x$, $\alpha x$, $g(x)$, $\alpha g(x)$, $g(\alpha x)$.]

3.2.3 Multiscale Dilation-Erosion Scale-Space

With a scaled structuring function of the previous section we propose to join dilation and erosion at zero scale to form a single multiscale operation which unifies the morphological operations as follows:

Definition 3.3 (Multiscale Dilation-Erosion): The multiscale dilation-erosion of the signal $f(x)$ by the scaled structuring function $g_\sigma(x)$ is denoted¹ by $f \circledast g_\sigma$, and is defined by

$$(f \circledast g_\sigma)(x) = \begin{cases} (f \oplus g_\sigma)(x) & \text{if } \sigma > 0; \\ f(x) & \text{if } \sigma = 0; \\ (f \ominus g_\sigma)(x) & \text{if } \sigma < 0. \end{cases}$$

Figure 3.2: Typical 2-D circular structuring functions $g(x, y)$. (a) flat (i.e. cylinder); (b) sphere; (c) poweroid with $\alpha = 2$ (i.e. paraboloid); (d) poweroid with $\alpha = 4$ (i.e. quartoid).

That is, for positive scales we perform a dilation, for negative scales an erosion. With this method scale may be negative; it is $|\sigma|$ which corresponds to the intuitive notion of scale. Unlike linear operators, dilation and erosion are "non-self-dual" (Serra 1988), so positive and negative scales in scale-space contain differing aspects of the information in a signal. As we shall see, positive scales pertain to local maxima in the signal, whereas negative scales pertain to local minima. In the same way scaled morphological opening and closing can be combined:

Definition 3.4 (Multiscale Closing-Opening): The multiscale closing-opening of the signal $f(x)$ by the scaled structuring function $g_\sigma(x)$ is denoted² by $f \odot g_\sigma$, and is defined by

$$(f \odot g_\sigma)(x) = \begin{cases} (f \bullet g_\sigma)(x) & \text{if } \sigma > 0; \\ f(x) & \text{if } \sigma = 0; \\ (f \circ g_\sigma)(x) & \text{if } \sigma < 0. \end{cases}$$

This multiscale operation has been defined in a similar manner by van den Boomgaard (1992) and is mentioned here only for completeness. In chapter 6 we will present a new scale-space constructed from the multiscale closing-opening. However, for the remainder of this chapter we will concentrate on the multiscale dilation-erosion of definition 3.3 and the scale-space image $F : D \subseteq R^n \times R \to R$ defined by:

$$F(x, \sigma) = (f \circledast g_\sigma)(x) \tag{3.15}$$

where the $(n + 1)$-dimensional space given by $D \times R$ is called the "multiscale dilation-erosion scale-space".

¹ The symbol "$\circledast$" has previously been used by Serra (1982) to refer to the "Hit or Miss Transform", which does not appear in this thesis. Since we prefer to use a standard LaTeX or AMS-TeX symbol rather than define a new one, we will reuse "$\circledast$".
² The symbol "$\odot$" has previously been used by Serra (1982) to refer to the "thickening", which does not appear in this thesis.
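The scale-space image of equation (3.15) can be computed by brute force for a short sampled 1-D signal. The sketch below is hedged in two respects: the scaled structuring function is assumed to be the parabola $g_\sigma(t) = -t^2/|\sigma|$ (the thesis's own scaling definition is given earlier in the chapter and is not restated here), and the function names and test signal are hypothetical. The $O(n^2)$ evaluation is chosen for clarity, not efficiency.

```python
import numpy as np

def multiscale_dilation_erosion(f, sigma):
    """F(x, sigma) of definition 3.3 for a signal sampled on the grid 0..n-1.

    Assumed structuring function: g_sigma(t) = -t**2 / |sigma| (parabolic).
    """
    f = np.asarray(f, dtype=float)
    if sigma == 0:
        return f.copy()
    idx = np.arange(len(f))
    # penalty[x, y] = (x - y)**2 / |sigma| = -g_sigma(x - y)
    penalty = (idx[:, None] - idx[None, :]) ** 2 / abs(sigma)
    if sigma > 0:
        return np.max(f[None, :] - penalty, axis=1)   # dilation: sup_y f(y) + g_sigma(x - y)
    return np.min(f[None, :] + penalty, axis=1)       # erosion:  inf_y f(y) - g_sigma(y - x)

def scale_space_image(f, sigmas):
    """Stack F(., sigma) for each scale, one row per scale."""
    return np.stack([multiscale_dilation_erosion(f, s) for s in sigmas])

f = np.array([0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 0.0])
sigmas = [-4.0, -1.0, 0.0, 1.0, 4.0]
F = scale_space_image(f, sigmas)
print(F.round(2))
```

Each row of `F` is the signal smoothed at one scale; the row for $\sigma = 0$ reproduces the input signal.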


In line with the previous discussion, to be properly called a "scale-space" we need to demonstrate a causality property in this scale-space. Since we are using the operations of mathematical morphology to smooth a signal, the well known geometric visualizations of dilation and erosion are intuitively helpful. For the moment, take the scaled structuring function to be an n-dimensional ball with the radius as a scale parameter: a positive radius corresponds to rolling the ball along the top of the "surface" of the signal, and a negative radius to rolling the ball along the underneath. The smoothed signal can be visualised as the surface traced out by the centre of the ball when it is rolled over the top (dilation) or underneath (erosion) of the "surface" of the signal. We illustrate this operation for a 1-D signal in figure 3.3.

Figure 3.3: Smoothing of a 1-D signal by Multiscale Morphological Dilation-Erosion.

Intuitively, this new surface is smoother (in the sense of having smoother and fewer hills) than the original signal, and furthermore the larger the radius the smoother the "filtered" surface becomes. In the limit, as the radius approaches zero the original signal is recovered (for continuous signals), and as the radius approaches infinity the output becomes flat. It should be apparent that if the ball touches the top of a hill (local maximum) then a hill will appear on the output at exactly that point. If however the radius is such that the ball is prevented from touching that hill by nearby hills, then no hill will appear at that point on the output, and more importantly that hill cannot reappear for any increased value of radius, $r$. In more precise terms: we propose that the number of local maxima is a monotone decreasing function of $r$. This suggests a scale-space causality result which is treated more rigorously in the next section.

3.3 Properties of Multiscale Dilation-Erosion

The properties of the smoothing will be introduced formally in the form of propositions building to a theorem which is the central result of this chapter. In outlining the results of this chapter we will often make use of the duality between dilation and erosion (already given at equation 2.31):

$$f \ominus g = -((-f) \oplus \check{g}), \tag{3.16}$$

where $\check{g}$ indicates the reflection of $g$, which is defined by equation 2.32, and the negation of the function is $(-f)(x) = -(f(x))$ for all $x$. Therefore many results on $f \circledast g_\sigma$ for $\sigma > 0$, which corresponds to dilation, can be immediately applied to $\sigma < 0$, corresponding to erosion. In practice most structuring functions are symmetrical about the origin so that $\check{g} = g$, and in this case the erosion of a signal by a structuring function may be obtained by negating the signal, performing a dilation with that same structuring function, and negating the result, that is, $f \ominus g = -((-f) \oplus g)$.

3.3.1 Properties of the Filter Support

The sup/inf operations in the multiscale dilation-erosion may not have to be taken over the full domain $G$ but only within a smaller area $S \subseteq G$. This region, by analogy with linear filters, can be called the "support region" of the smoother. In general, if we can minimise the area of this support region we decrease the time for computation of the smoothing operation. Assume that the signal is non-negative and bounded, that is, there exists some $M \geq 0$ such that $0 \leq f(x) \leq M$ for all $x \in D$.

Proposition 3.1 (Jackway 1991b): If we let $S_p = \{t \in G : g_\sigma(t) \geq f(x) - M\}$ and $S_n = \{t \in G : g_\sigma(t) \geq -f(x)\}$ then

$$(f \circledast g_\sigma)(x) = \begin{cases} \sup_{t \in S_p \cap \check{D}_x} \{ f(x - t) + g_\sigma(t) \} & \text{if } \sigma > 0; \\ f(x) & \text{if } \sigma = 0; \\ \inf_{t \in S_n \cap D_{-x}} \{ f(x + t) - g_\sigma(t) \} & \text{if } \sigma < 0. \end{cases}$$

Proof: From (3.2) we have $0 \in S_p$. Therefore,

$$\sup_{t \in S_p} \{ f(x - t) + g_\sigma(t) \} \geq f(x) + g_\sigma(0) = f(x). \tag{3.17}$$

Let $S_p^c = G \setminus S_p$, the complement of $S_p$ in $G$, so that, using $f(x - t) \leq M$ in the second step and noting that taking the supremum may turn the strict inequality into a non-strict one,

$$\begin{aligned} g_\sigma(t) &< f(x) - M && \text{for all } t \in S_p^c \\ \Rightarrow\ f(x - t) + g_\sigma(t) &< f(x) + f(x - t) - M && \text{for all } t \in S_p^c \\ \Rightarrow\ f(x - t) + g_\sigma(t) &< f(x) && \text{for all } t \in S_p^c \\ \Rightarrow\ \sup_{t \in S_p^c} \{ f(x - t) + g_\sigma(t) \} &\leq f(x). \end{aligned} \tag{3.18}$$

Therefore,

$$\sup_{t \in S_p^c} \{ f(x - t) + g_\sigma(t) \} \leq \sup_{t \in S_p} \{ f(x - t) + g_\sigma(t) \}. \tag{3.19}$$

So,

$$\sup_{t \in G} \{ f(x - t) + g_\sigma(t) \} = \max \left( \sup_{t \in S_p^c} \{ f(x - t) + g_\sigma(t) \},\ \sup_{t \in S_p} \{ f(x - t) + g_\sigma(t) \} \right) = \sup_{t \in S_p} \{ f(x - t) + g_\sigma(t) \}. \tag{3.20}$$

This completes the proof for $\sigma > 0$; the proof for $\sigma < 0$ proceeds in a similar manner using inf instead of sup, $-f(x)$ instead of $f(x) - M$, and reversing the appropriate inequalities.

Using the inequality $0 \leq f(x) \leq M$ for all $x \in D$, a more general support $S_g$ can be found that is independent of position $x$. Putting the extreme values for $f(x)$ into the formulae for $S_p$ and $S_n$ we obtain:

$$S_g = [S_p]_{f(x) = 0} = [S_n]_{f(x) = M} = \{ t \in G : g_\sigma(t) \geq -M \}. \tag{3.21}$$

It is mathematically convenient to have the support regions as closed sets, so from now on we let $S_p$, $S_n$ or $S_g$ denote the closure of the respective regions if necessary. Using the support region $S_p$, $S_n$ or $S_g$ enables the computation of the multiscale dilation-erosion to be made more efficient. These support regions depend on scale; in particular, from equations (3.5) and (3.9) we see that they approach a single point at the origin as $\sigma \to 0$. Importantly,

$$S_g \to \{0\} \quad \text{as } \sigma \to 0. \tag{3.22}$$
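As a worked illustration of (3.21) and (3.22), consider a parabolic structuring function. The scaling convention $g_\sigma(t) = |\sigma|\, g(t/|\sigma|)$ with $g(t) = -\|t\|^2$ is an assumption made here for the example; the thesis's own definition of the scaled structuring function is given earlier in the chapter and is not restated here.

```latex
\begin{align*}
g_\sigma(t) &= |\sigma|\, g\!\left(\frac{t}{|\sigma|}\right) = -\frac{\|t\|^2}{|\sigma|}, \\
S_g &= \{\, t \in G : g_\sigma(t) \geq -M \,\}
     = \{\, t \in G : \|t\|^2 \leq M\,|\sigma| \,\}
     = \{\, t \in G : \|t\| \leq \sqrt{M\,|\sigma|} \,\}.
\end{align*}
```

Under this assumption the support is a closed ball of radius $\sqrt{M|\sigma|}$, which shrinks to the origin as $\sigma \to 0$, in agreement with (3.22). For a signal bounded by $M = 255$ and $|\sigma| = 1$, for instance, only offsets with $\|t\| \leq \sqrt{255} \approx 15.97$ need to be examined.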


3.3.2 Continuity and Order Properties of the Scale-Space Image

The multiscale dilation-erosion (definition 3.3) consists of three parts corresponding to positive, zero, or negative values of the scale parameter. Since the structuring function is non-negative at the origin [see eq. (3.2)], the dilation is extensive and the erosion anti-extensive, that is, $(f \oplus g_\sigma) \geq f$ and $(f \ominus g_\sigma) \leq f$. Thus we have the result,

$$(f \circledast g_\sigma)_{\sigma < 0} \leq f \leq (f \circledast g_\sigma)_{\sigma > 0}. \tag{3.23}$$

However, the justification for the joining of the multiscale dilation and erosion would be considerably strengthened if we can show the stronger result that both operations approach $f(x)$ as the scale parameter approaches zero from above and below. In other words it is natural to consider the behaviour of the scale-space image $F(x, \sigma)$ across the "seam" at $\sigma = 0$ in scale-space. This can be done on the grounds of continuity and we have the following proposition.

Proposition 3.2 (Jackway 1991b): If the bounded signal $f(x)$ is continuous at some $x \in D$, then the scale-space image $F(x, \sigma)$ is continuous with respect to $\sigma$ at $\sigma = 0$. That is, at points $x$ where $f(x)$ is continuous, $F(x, \sigma) \to f(x)$ as $\sigma \to 0$.

Proof: For $\sigma > 0$,

$$F(x, \sigma) = \sup_{t \in S_g} \{ f(x - t) + g_\sigma(t) \}. \tag{3.24}$$

From the definition of "supremum", there exists some $\tau \in S_g$ such that $F(x, \sigma) = \lim_{t \to \tau} \{ f(x - t) + g_\sigma(t) \}$ and, since $g_\sigma(\cdot)$ is non-positive and the dilation is extensive, we have

$$f(x) \leq F(x, \sigma) \leq \lim_{t \to \tau} f(x - t). \tag{3.25}$$

Now as $\sigma \to 0^+$, from relation (3.22) $\tau \to 0$, and since $f(\cdot)$ is continuous at $x$,

$$\lim_{\sigma \to 0^+} \lim_{t \to \tau} f(x - t) = f(x), \tag{3.26}$$

and from (3.25), $F(x, \sigma) \to f(x)$ as $\sigma \to 0^+$. A similar proof follows for $\sigma < 0$ and we have $F(x, \sigma) \to f(x)$ as $\sigma \to 0^-$, which completes the proof.


A careful study of the proof of proposition 3.2 reveals that a slightly stronger result holds; in fact full continuity of the signal is not necessary for the one-sided limits to converge to the signal. At points $x = x_u$ where $f(x_u)$ is "upper semi-continuous" (u.s.c.), $F(x_u, \sigma) \to f(x_u)$ as $\sigma \to 0^+$, and at points $x = x_l$ where $f(x_l)$ is "lower semi-continuous" (l.s.c.), $F(x_l, \sigma) \to f(x_l)$ as $\sigma \to 0^-$. We recall that a function $f$ is said to be "upper semi-continuous" at $c$ if the non-deleted limit $\limsup_{x \to c} f(x) = f(c)$ (Bartle 1964). Further, $f$ is "lower semi-continuous" at $c$ iff $-f$ is u.s.c. at $c$. Upper semi-continuous functions are often used to model "pictures" (Serra 1982) since their threshold sets, $F(t) = \{x : f(x) \geq t\}$, are closed sets. If the structuring function is sufficiently smooth this property transfers to the scale-space image and we have the following proposition:

Proposition 3.3 (Jackway 1991b): If the structuring function $g(t)$ is a continuous function on $R^n$ then the scale-space image is continuous for all $x \in D$, $\sigma \neq 0$.

Proof: Consider $g_{\sigma + \delta\sigma}(x + \delta x) = g_\sigma(x) + E(x)$ where, assuming the continuity of $g(t)$, which implies the continuity of $g_\sigma(t)$, we have

$$\lim_{\substack{\|\delta x\| \to 0 \\ \delta\sigma \to 0}} E(x) = 0 \quad \text{for all } x. \tag{3.27}$$

For $\sigma > 0$, setting $u = t - \delta x$,

$$\begin{aligned} F(x + \delta x, \sigma + \delta\sigma) &= \sup_t \{ f(x + \delta x - t) + g_{\sigma + \delta\sigma}(t) \} \\ &= \sup_u \{ f(x - u) + g_{\sigma + \delta\sigma}(u + \delta x) \} \\ &= \sup_u \{ f(x - u) + g_\sigma(u) + E(u) \}. \end{aligned} \tag{3.28}$$

So, in the limit as $\|\delta x\| \to 0$, $\delta\sigma \to 0$, equation (3.27) holds, hence $F(x + \delta x, \sigma + \delta\sigma) \to F(x, \sigma)$, establishing the continuity. A similar argument holds for $\sigma < 0$, completing the proof.

The following point-wise order properties of the scale-space image follow directly from the extensivity and increasing properties of the morphological dilation and erosion (Haralick et al. 1987).

Proposition 3.4 (Jackway 1991b): The scale-space image $F(x, \sigma) = (f \circledast g_\sigma)(x)$ possesses the following properties:
1. $F(x, 0) = f(x)$ for all $x \in D$,
2. $F(x, \infty) = \sup_{t \in D} \{ f(t) \}$ for all $x \in D$,
3. $F(x, -\infty) = \inf_{t \in D} \{ f(t) \}$ for all $x \in D$,
4. $\sigma_p > \sigma_q \Rightarrow F(x, \sigma_p) \geq F(x, \sigma_q)$ for all $\sigma_p, \sigma_q \in R$, $x \in D$.

Proof: Property 1 is by definition 3.3; properties 2 and 3 follow by putting $g_\sigma(x) = 0$ (equation (3.11)) in definitions 3.1 and 3.2. To prove property 4 we note that, by substituting equation (3.10) in definition 3.1, we obtain for $0 < \sigma_q < \sigma_p$,

$$F(x, \sigma_q) = \sup_t \{ f(x - t) + g_{\sigma_q}(t) \} \leq \sup_t \{ f(x - t) + g_{\sigma_p}(t) \} \leq F(x, \sigma_p); \tag{3.29}$$

likewise, using definition 3.2, if $\sigma_q < \sigma_p < 0$, then

$$F(x, \sigma_q) \leq F(x, \sigma_p), \tag{3.30}$$

thus proving the proposition.

3.3.3 Signal Extrema in Scale-Space

Propositions 3.2{3.4 show that the scale-space image has good continuity and order properties but we have yet to show the essential scale-space causality property. The major result of this chapter is a theorem which shows in a precise way how f ~ g becomes smoother with increasing jj. Further we show that the causality property holds for local extrema of the signal so this is the signal \feature" appropriate to the multiscale dilation-erosion scale space. Prior to presenting this theorem some necessary partial results are obtained. The rst result relates the position and amplitude of a local maximum (or minimum) in the ltered signal to that in the original signal.

Proposition 3.5 (Jackway 1992a): Let the structuring function have a single maximum at the origin, that is, $g(x)$ is a local maximum implies $x = 0$. Then:
(a) If $\sigma > 0$ and $(f \circledast g_\sigma)(x_{\max})$ is a local maximum, then $f(x_{\max})$ is a local maximum of $f(x)$ and $(f \circledast g_\sigma)(x_{\max}) = f(x_{\max})$;
(b) If $\sigma < 0$ and $(f \circledast g_\sigma)(x_{\min})$ is a local minimum, then $f(x_{\min})$ is a local minimum of $f(x)$ and $(f \circledast g_\sigma)(x_{\min}) = f(x_{\min})$.


Proof: Recall that the function $(f \circledast g_\sigma)(x)$ is said to have a local maximum (minimum) at $x_0$ iff $(f \circledast g_\sigma)(x_0) \ge (\le)\ (f \circledast g_\sigma)(x)$ for all $x$ in some $\epsilon$-neighbourhood of $x_0$, $N(x_0, \epsilon) = \{x : \|x - x_0\| < \epsilon\}$ (DePree & Swartz 1988). Take case (a) first. Consider a point on $f \circledast g_\sigma$ within an $\epsilon$-neighbourhood of a local maximum $x_{\max}$. We have:
\[ \sup_t \{ f(x_{\max} - t) + g_\sigma(t) \} \ge \sup_t \{ f(x_{\max} - t + e) + g_\sigma(t) \} \quad \text{for all } e \in N(0, \epsilon). \tag{3.31} \]
Assuming suitable continuity within the brackets, the sup on the LHS occurs for, say, $t = \tau$. We then have
\[ \begin{aligned} f(x_{\max} - \tau) + g_\sigma(\tau) &\ge f(x_{\max} - \tau + e) + g_\sigma(\tau) \\ \Rightarrow\quad f(x_{\max} - \tau) &\ge f(x_{\max} - \tau + e) \quad \text{for all } e \in N(0, \epsilon). \end{aligned} \tag{3.32} \]
Therefore,
\[ f(x) \text{ has a local maximum at } x = x_{\max} - \tau. \tag{3.33} \]
Substituting $u = t - e$ on the RHS, equation (3.31) becomes
\[ \sup_t \{ f(x_{\max} - t) + g_\sigma(t) \} \ge \sup_u \{ f(x_{\max} - u) + g_\sigma(u + e) \}, \quad e \in N(0, \epsilon). \tag{3.34} \]
Putting $u = \tau$ into equation (3.34),
\[ \begin{aligned} f(x_{\max} - \tau) + g_\sigma(\tau) &\ge f(x_{\max} - \tau) + g_\sigma(\tau + e) \\ \Rightarrow\quad g_\sigma(\tau) &\ge g_\sigma(\tau + e), \quad e \in N(0, \epsilon). \end{aligned} \tag{3.35} \]
Therefore $g_\sigma(u)$ has a local maximum at $u = \tau$. From the conditions of the proposition this implies that $\tau = 0$, so that from the previous result, $f(x)$ has a local maximum at $x = x_{\max}$, proving part of the proposition. We have already used
\[ (f \circledast g_\sigma)(x_{\max}) = \sup_t \{ f(x_{\max} - t) + g_\sigma(t) \} = f(x_{\max} - \tau) + g_\sigma(\tau). \tag{3.36} \]
So, using $\tau = 0$ and $g_\sigma(0) = 0$, it is easy to show
\[ (f \circledast g_\sigma)(x_{\max}) = f(x_{\max}), \tag{3.37} \]
which proves the remainder of part (a). Part (b) (for $\sigma < 0$) is proved in an analogous way using the morphological erosion. □



We are now able to relate a signal feature at non-zero scale to the original signal (zero scale). However to obtain a causality result we need the next proposition.

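Proposition 3.6 (Jackway 1992a): Let the structuring function have a single local maximum at the origin, that is, $g(x)$ is a local maximum implies $x = 0$. Then:
(a) If $\sigma_0 > \sigma > 0$ and $(f \circledast g_{\sigma_0})(x_{\max})$ is a local maximum, then $(f \circledast g_\sigma)(x_{\max})$ is a local maximum and $(f \circledast g_\sigma)(x_{\max}) = (f \circledast g_{\sigma_0})(x_{\max})$;
(b) If $\sigma_0 < \sigma < 0$ and $(f \circledast g_{\sigma_0})(x_{\min})$ is a local minimum, then $(f \circledast g_\sigma)(x_{\min})$ is a local minimum and $(f \circledast g_\sigma)(x_{\min}) = (f \circledast g_{\sigma_0})(x_{\min})$.

Proof: Take part (a) first. From equation (3.10), $g_\sigma(x) = g_{\sigma_0}(x) - E(x)$ where $E(x) \ge 0$ for all $x \in G_\sigma$, and $E(0) = 0$. Consider $(f \circledast g_\sigma)(x_{\max})$ with $\sigma_0 > \sigma > 0$:
\[ \begin{aligned} (f \circledast g_\sigma)(x_{\max}) &= \sup_t \{ f(x_{\max} - t) + g_\sigma(t) \} \\ &= \sup_t \{ f(x_{\max} - t) + g_{\sigma_0}(t) - E(t) \} \\ &\le \sup_t \{ f(x_{\max} - t) + g_{\sigma_0}(t) \} \\ &= (f \circledast g_{\sigma_0})(x_{\max}). \end{aligned} \tag{3.38} \]
We have shown in the proof of proposition 3.5 that the sup occurs in the RHS of equation (3.38) for $t = 0$; however, for $t = 0$,
\[ f(x_{\max} - t) + g_\sigma(t) = f(x_{\max} - t) + g_{\sigma_0}(t), \tag{3.39} \]
so the sup also occurs in the LHS of equation (3.38) for $t = 0$ and
\[ (f \circledast g_\sigma)(x_{\max}) = (f \circledast g_{\sigma_0})(x_{\max}). \tag{3.40} \]
Now, in the neighbourhood of $x_{\max}$ we have, from proposition 3.4,
\[ (f \circledast g_\sigma)(x_{\max} + e) \le (f \circledast g_{\sigma_0})(x_{\max} + e), \quad e \in N(0, \epsilon). \tag{3.41} \]
Since $(f \circledast g_{\sigma_0})(x_{\max})$ is a local maximum, the RHS of (3.41) is no greater than $(f \circledast g_{\sigma_0})(x_{\max})$, which by (3.40) equals $(f \circledast g_\sigma)(x_{\max})$. Therefore $(f \circledast g_\sigma)(x_{\max})$ is a local maximum, proving part (a) of the proposition. Part (b) (for $\sigma_0 < \sigma < 0$) is proved in an analogous way using the inf operation. □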


These propositions provide very important scale-space results because they enable coarse-to-fine tracking in the scale-space image (Witkin 1983). If a signal feature appears at some scale $\sigma_0$ it also appears at zero scale and all scales in between. Stated as a monotonic property, the number of features does not decrease as scale approaches zero. The hard work being done, this property is worth emphasising in a theorem.

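Theorem 3.1 [Scale-Space Causality (Jackway 1992a)]: Let $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ denote a bounded function, $g_\sigma : G_\sigma \subseteq \mathbb{R}^n \to \mathbb{R}$ a scaled structuring function satisfying the conditions of proposition 3.6, and let the point sets
\[ E_{\max}(f) = \{x : f(x) \text{ is a local maximum}\}, \qquad E_{\min}(f) = \{x : f(x) \text{ is a local minimum}\} \]
denote the local extrema of $f$. Then, for any $\sigma_1 < \sigma_2 < 0 < \sigma_3 < \sigma_4$,
\[ E_{\min}(f \circledast g_{\sigma_1}) \subseteq E_{\min}(f \circledast g_{\sigma_2}) \subseteq E_{\min}(f), \tag{3.42} \]
\[ E_{\max}(f \circledast g_{\sigma_4}) \subseteq E_{\max}(f \circledast g_{\sigma_3}) \subseteq E_{\max}(f). \tag{3.43} \]
Proof: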

Suppose the theorem is false and $E_{\max}(f \circledast g_{\sigma_4}) \not\subseteq E_{\max}(f \circledast g_{\sigma_3})$ for some $0 < \sigma_3 < \sigma_4$. Then there exists some $x_{\max} \in D$ such that $F(x_{\max}, \sigma_4)$ is a local maximum but $F(x_{\max}, \sigma_3)$ is not, which contradicts proposition 3.6(a). The case for $E_{\min}$ is proved similarly using proposition 3.6(b). □

We have shown that the number of features may not increase with increasing scale but we have not shown that they decrease! Indeed, with some (artificial) signals the number of extrema is constant across all scales. For example, if $f(x) = \sin(x)$, $x \in D$, then $E_{\max}(f \circledast g_\sigma) = \{x \in D : \sin(x) = 1\} = \{x \in D : x = \pi/2 + 2\pi i,\ i \in \mathbb{Z}\}$ and $E_{\min}(f \circledast g_\sigma) = \{x \in D : \sin(x) = -1\} = \{x \in D : x = -\pi/2 + 2\pi i,\ i \in \mathbb{Z}\}$, which are independent of $\sigma$. If a signal, however, contains information at different scales this will generally be reflected as a decrease in the number of features with increasing $|\sigma|$. If the signal has a single unique global maximum (minimum), then for sufficiently large positive (negative) scale, there remains only a single feature in the scale-space image.

Plotting $E_{\circledast}(\sigma) = E_{\max}(f \circledast g_\sigma) \cup E_{\min}(f \circledast g_\sigma)$ versus $\sigma$ gives the multiscale dilation-erosion "fingerprint" of the signal. Note that we use the term "fingerprint" because of the appearance of the graph and in analogy with the term introduced by Yuille & Poggio (1985) in the Gaussian scale-space case. We do not intend to imply any uniqueness properties by using this term. Such a plot for a smoothed uniform random noise signal is shown in figure 3.4.
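The nesting of extrema asserted by theorem 3.1 can be checked numerically, and the resulting sets are the rows of a sampled fingerprint. The sketch below is illustrative only: it assumes a 1-D signal on an integer grid, the parabolic structuring function $g_\sigma(t) = -t^2/|\sigma|$, non-strict local extrema evaluated at interior samples, and brute-force dilation and erosion. The helper names are hypothetical, and `erode(f, s)` takes the magnitude $s = |\sigma|$ of a negative scale.

```python
import random

def dilate(f, s):
    """(f dilated by g_s)(x) = max over t of f(x - t) - t**2 / s, for s > 0."""
    n = len(f)
    return [max(f[x - t] - t * t / s for t in range(x - n + 1, x + 1)) for x in range(n)]

def erode(f, s):
    """(f eroded by g_s)(x) = min over t of f(x + t) + t**2 / s; s is the magnitude |sigma|."""
    n = len(f)
    return [min(f[x + t] + t * t / s for t in range(-x, n - x)) for x in range(n)]

def local_maxima(F):
    """Interior samples that are (non-strict) local maxima."""
    return {x for x in range(1, len(F) - 1) if F[x] >= F[x - 1] and F[x] >= F[x + 1]}

def local_minima(F):
    """Interior samples that are (non-strict) local minima."""
    return {x for x in range(1, len(F) - 1) if F[x] <= F[x - 1] and F[x] <= F[x + 1]}

random.seed(0)
f = [random.random() for _ in range(80)]   # arbitrary noise signal (assumption)
scales = [0.5, 2.0, 8.0, 32.0]             # increasing |sigma|

# Extrema sets from fine to coarse: index 0 is the original signal (sigma = 0).
max_sets = [local_maxima(f)] + [local_maxima(dilate(f, s)) for s in scales]
min_sets = [local_minima(f)] + [local_minima(erode(f, s)) for s in scales]

# Causality: each coarser set is contained in the next finer one.
nested_max = all(coarse <= fine for fine, coarse in zip(max_sets, max_sets[1:]))
nested_min = all(coarse <= fine for fine, coarse in zip(min_sets, min_sets[1:]))

print(nested_max, nested_min)
print([len(s) for s in max_sets], [len(s) for s in min_sets])
```

Plotting the members of `max_sets` and `min_sets` against scale would give a coarse, sampled version of a fingerprint diagram.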

Figure 3.4: A multiscale dilation-erosion fingerprint: the plot of the extrema of $F(x, \sigma)$ for a random signal.

Theorem 3.1 provides one of the "scale-space axioms" of Lindeberg (1990):

"The essential requirement is that a signal at a coarser level of scale should contain less structure than a signal at a finer level of scale. If one regards the number of local extrema as one measure of smoothness it is thus necessary that the number of local extrema in space does not increase as we go from a finer to a coarser level of scale."

The other axioms are also satisfied with one modification:

• All signals should be defined on the same domain; in other words no pyramid representations will be used.

• An increasing value of the scale parameter $\sigma$ should correspond to coarser levels of scale and signals with less structure. Particularly, $\sigma = 0$ should represent the original signal.

• All representations should be generated by convolutions of the original signal with a kernel (linear shift-invariant smoothing). The "morphological" convolution is shift-invariant; however, it is increasing instead of being linear.

The axioms being satisfied, we are justified in claiming the construction of a new "scale-space".

3.4 Summary and Conclusion


In this chapter we have introduced a new scale-space based on a multiscale morphological dilation-erosion smoothing operation. We have taken care to motivate, justify, and explain the approach through the proof of a series of properties and propositions which lead to a scale-space causality theorem, the main result of this chapter. We then demonstrated that the new scale-space satisfies the scale-space axioms of Lindeberg (1990) and therefore may properly be called a "scale-space". Since this scale-space is novel, myriads of interesting research questions arise, some of which are addressed in the remainder of this thesis. However, before tackling properties and applications of the scale-space, we concentrate in the next chapter on structuring functions, a topic which has not been discussed in detail in this chapter.

Chapter 4 Structuring Functions for Multiscale Dilation-Erosion


4.1 Introduction

In the previous chapter, we have treated the structuring function in a general way, introducing constraints only where mathematically needed to obtain desired theoretical results. From those results we may conclude so far that, depending on which of the properties of the previous chapter we require, the structuring function may need to be continuous, anti-convex, and have a single local maximum at the origin. We begin our discussion of structuring functions in section 4.2 by looking at the "semi-group" property of scale-space and its implication for the allowable class of structuring functions. If we view the scale dependent dilation and erosion as operators on a signal, that is,
\[ D(\sigma) f = f \oplus g_\sigma, \tag{4.1} \]
\[ E(\sigma) f = f \ominus g_\sigma, \tag{4.2} \]
then the semi-group property (Butzer & Berens 1967) for the scale parameterised morphological operations is
\[ D(\sigma + \mu) f = D(\sigma) \left( D(\mu) f \right), \quad \sigma, \mu \ge 0, \tag{4.3} \]
\[ E(\sigma + \mu) f = E(\sigma) \left( E(\mu) f \right), \quad \sigma, \mu \ge 0, \tag{4.4} \]
\[ D(0) f = E(0) f = f. \tag{4.5} \]

Many authors consider this semi-group property one of the fundamental foundations of scale-space (Koenderink 1984, Lindeberg 1990); in fact Lindeberg (1988) makes it one of his axioms. This property enables the signal at scale $\sigma + \mu$ to be obtained directly from the previous signal at scale $\sigma$ by repeated dilation (or erosion) and specifies how the global structure of the scale-space is related to the local structure.

In applications, due to the physical nature of the problem to be addressed, more constraints on the desired behaviour of any scale-dependent operator may need to be imposed. The morphological operations depend on "shape" (from the Greek root morphe = shape). In section 4.3 we continue our discussions of structuring functions by showing how we require the introduction of an additional parameter for the notion of the "shape" of a function to be properly defined. This leads to a generalisation of the "umbra" concept of Sternberg (1986). Since this parameter has no physical significance, we would require any operations to be invariant to its value, which leads to a "shape invariance" requirement on the filtering. Rivest, Serra & Soille (1992) have expressed this condition more generally and called it "dimensionality", so we will use this established terminology.

The question is, what conditions must be imposed on the structuring functions to ensure the scale-space possesses dimensionality for images? We show that the answer has the effect of limiting practical consideration of structuring functions for intensity image processing to a family called the "elliptic poweroids". For range images, where all the dimensions are compatible, the most general structuring function is the scaled sphere or, if directionality is required, the scaled ellipsoid. We then consider other aspects of structuring function choice such as second derivative and curvature properties and further restrict the suitable functions to paraboloids. Finally, we strengthen the practical appeal of paraboloids by demonstrating a morphological separability property in 2-D which facilitates efficient computation.

Since the efficient computation of the morphological operations depends on properties of the structuring function, we include a brief discussion of computation in this chapter. Computation of the fingerprint itself will be covered in chapter 7 where we deal with applications. Parts of sections 4.3 to 4.7 of this chapter have been published in (Jackway 1994b) and accepted for publication in (Jackway 1994a).

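4.2 Semi-Group Properties

We begin our discussion of semi-group properties by looking at such properties of the structuring function itself. Then we will extend this property to the scale-space operators. Serra (1982) presents the following proposition:

A family $B_\lambda$ ($\lambda \ge 0$) of non-empty compact sets is a one-parameter continuous semi-group (i.e. $B_\lambda \oplus B_\mu = B_{\lambda+\mu}$; $\lambda, \mu \ge 0$) if and only if $B_\lambda = \lambda B$ where $B$ is a convex compact set.

We present a related result for our scaled structuring function which shows why we have chosen the scaling equation [equation (3.3), repeated here for convenience] as
\[ g_\sigma(x) = |\sigma|\, g\!\left(|\sigma|^{-1} x\right), \quad |\sigma|^{-1} x \in G,\ \sigma \ne 0, \tag{4.6} \]
and why the anti-convex property of structuring functions is necessary. We could obtain the "umbras" (Sternberg 1986) of our functions and directly use Serra's result above, but it is more informative to work from first principles to show how the definition of anti-convexity comes into play. We make the following proposition:

Proposition 4.1: A family $g_\sigma$ ($\sigma \ge 0$) of scaled structuring functions given by equation (4.6) and which are anti-convex, is a one-parameter continuous semi-group. That is, $g_\sigma \oplus g_\mu = g_{\sigma+\mu}$ for $\sigma, \mu \ge 0$.

Proof: (Since we consider only $\sigma, \mu \ge 0$, for clarity, we immediately drop the $|\cdot|$ signs in (4.6).) We start from the definition of anti-convexity:
\[ \lambda\, g(a) + (1 - \lambda)\, g(b) \le g(\lambda a + (1 - \lambda) b), \quad (0 \le \lambda \le 1). \tag{4.7} \]
Now make the following substitutions: $\lambda = \frac{\sigma}{\sigma+\mu}$; $(1 - \lambda) = \frac{\mu}{\sigma+\mu}$; $a = \frac{x - t}{\sigma}$; $b = \frac{t}{\mu}$, so that
\[ \frac{\sigma}{\sigma+\mu}\, g\!\left(\frac{x-t}{\sigma}\right) + \frac{\mu}{\sigma+\mu}\, g\!\left(\frac{t}{\mu}\right) \le g\!\left(\frac{(x - t) + t}{\sigma+\mu}\right). \tag{4.8} \]
This implies
\[ \sigma\, g\!\left(\frac{x-t}{\sigma}\right) + \mu\, g\!\left(\frac{t}{\mu}\right) \le (\sigma+\mu)\, g\!\left(\frac{x}{\sigma+\mu}\right). \tag{4.9} \]
And we also note that
\[ \left. \left\{ \sigma\, g\!\left(\frac{x-t}{\sigma}\right) + \mu\, g\!\left(\frac{t}{\mu}\right) \right\} \right|_{t = \frac{\mu x}{\sigma+\mu}} = (\sigma+\mu)\, g\!\left(\frac{x}{\sigma+\mu}\right). \tag{4.10} \]
So,
\[ \begin{aligned} (g_\sigma \oplus g_\mu)(x) &= \max_t \left\{ \sigma\, g\!\left(\frac{x-t}{\sigma}\right) + \mu\, g\!\left(\frac{t}{\mu}\right) \right\} \\ &= (\sigma+\mu)\, g\!\left(\frac{x}{\sigma+\mu}\right) \\ &= g_{\sigma+\mu}(x). \end{aligned} \tag{4.11} \]
Thus the semi-group property (Hille & Phillips 1957) of anti-convex structuring functions is proved. □

As our scaled structuring functions are dependent only on the magnitude of the scale parameter, we have the further result,
\[ g_\sigma \oplus g_\mu = g_{|\sigma|+|\mu|}, \quad \sigma, \mu \in \mathbb{R}. \tag{4.12} \]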
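The semi-group relation for structuring functions can be verified exactly for one anti-convex example. The sketch below assumes the 1-D parabola $g(x) = -x^2$, so that $g_\sigma(x) = -x^2/\sigma$, and evaluates the dilation by brute force over integer offsets with exact rational arithmetic. Equality with $g_{\sigma+\mu}$ is expected only where the maximising offset $t = \mu x/(\sigma+\mu)$ of (4.10) lies on the integer grid; elsewhere the bound (4.9) still holds. The function names and the search range `reach` are hypothetical.

```python
from fractions import Fraction

def g_scaled(x, sigma):
    """g_sigma(x) = sigma * g(x / sigma) for the parabola g(x) = -x**2, sigma > 0 (assumed)."""
    return -Fraction(x * x) / sigma

def dilate_functions(sigma, mu, x, reach=60):
    """(g_sigma dilated by g_mu)(x) = max over integer t in [-reach, reach]
    of g_sigma(x - t) + g_mu(t)."""
    return max(g_scaled(x - t, sigma) + g_scaled(t, mu) for t in range(-reach, reach + 1))

sigma, mu = 2, 3

# At multiples of sigma + mu = 5 the maximiser t = 3x/5 is an integer: equality as in (4.11).
exact = all(dilate_functions(sigma, mu, x) == g_scaled(x, sigma + mu)
            for x in range(-20, 21, 5))

# At every integer x the grid maximum cannot exceed g_{sigma+mu}(x): inequality (4.9).
bounded = all(dilate_functions(sigma, mu, x) <= g_scaled(x, sigma + mu)
              for x in range(-20, 21))

print(exact, bounded)
```

On a sampled grid the relation is therefore only approximate in general, which is one reason a discrete implementation needs some care when chaining dilations across scales.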

As indicated in the introduction, if we view the scale dependent dilation and erosion as operators on a signal, that is,
\[ D(\sigma) f = f \oplus g_\sigma, \tag{4.13} \]
\[ E(\sigma) f = f \ominus g_\sigma, \tag{4.14} \]
then the semi-group property for structuring functions and the chain rules for dilations and erosions, equations (2.15), (2.23) [repeated here]:
\[ f \oplus (g_\sigma \oplus g_\mu) = (f \oplus g_\sigma) \oplus g_\mu, \tag{4.15} \]
\[ f \ominus (g_\sigma \oplus g_\mu) = (f \ominus g_\sigma) \ominus g_\mu, \tag{4.16} \]
lead directly to the semi-group property for the scale parameterised morphological operations [see equations (4.3) to (4.5)].

Lindeberg (1988) considers the semi-group property to be very important in his construction of a (linear) discrete scale-space. In fact he makes it one of his scale-space axioms from which the whole structure of scale-space is developed. In practical terms the semi-group property ensures that the result of smoothing a signal at some scale is independent of the path taken to arrive at that smoothing; that is, a signal smoothed at scale $\sigma_3$ may be obtained by smoothing the original signal, $f_{\sigma_3} = f \circledast g_{\sigma_3}$, or by smoothing an already smoothed signal, $f_{\sigma_3} = f_{\sigma_1} \circledast g_{\sigma_2}$, where $\sigma_3 = \sigma_1 + \sigma_2$.

As stated in the introduction, when working with functions (rather than sets) we need to be careful with the concept of "shape", which is central to morphology.

4.3 A More General Umbra

Binary images are readily represented as sets (for example, the set of all white points) and mathematical morphology was originally set-based (Serra 1982). However, grey-scale images are more naturally represented as functions where the value of the function at a point represents the image intensity. The "umbra" (Sternberg 1986) provides a natural way to associate a set $U$ with a function $f$ and brings the operations of mathematical morphology to functions,
\[ U[f] = \{ (x, y, z) : z \le f(x, y) \}. \tag{4.17} \]
Since morphological operations are non-linear and depend on "shape", we can see a shortcoming of the umbra approach, as defined above, for physical signals. This is best illustrated with a more common function-to-set mapping: the drawing of a graph.

Consider the process of drawing the graph of a signal voltage $v(t)$ as a function of time. Firstly a scale is chosen for the x-axis to display the required time interval, then a scale is chosen for the y-axis (relative to that of the x-axis) to cause the shape of the resulting graph to convey the required information to the viewer. Mathematically, we can view this operation as choosing appropriate scaling constants $\alpha$ and $\beta$ to relate a physical time-varying signal voltage to a set of points $G$ in the plane (the graph). In symbols:
\[ G = \{ (x, y) : x = t/\alpha,\ y = v(t)/\beta \}. \tag{4.18} \]
In the digitisation of a physical signal (by sampling in time and quantising the samples) the signal is mapped into the digital space $\mathbb{Z}^2$. The scaling factors appear in the selection of sampling rate ($\alpha$) and the "gain" of the A-to-D converter ($\beta$). We wish to emphasise here that function shape depends on the values of $\alpha$ and $\beta$, see figure 4.1. More precisely, we can see that scale depends on $\alpha$ and shape on the ratio $\rho = \alpha/\beta$. Since morphological operations are scale invariant, $\lambda(G \oplus B) = \lambda G \oplus \lambda B$, we can, without loss of generality, take the scale to be unity; this suggests that (4.18) be re-parameterised as
\[ G = \{ (x, y) : x = t,\ y = \rho\, v(t) \}. \tag{4.19} \]
Thus, in applying mathematical morphology to grey-scale images we consider Sternberg's umbra to be under-parameterised (Sternberg 1986); a more general formulation would be:

A grey-scale image is a grey level function $f(x, y)$ on the points of Euclidean 2-space. A grey level function can be thought of in Euclidean 3-space as a set of points $[x, y, f(x, y)]$, imagined as a thin, undulating, not necessarily connected sheet. A grey-scale image $f(x, y)$ is represented in the mathematical morphology by an umbra $U[f, \rho]$ in Euclidean 3-space, where a point $p = (x, y, z)$ belongs to the umbra if and only if $z \le \rho f(x, y)$, where $\rho$ is a shape parameter.

This extra shape parameter, $\rho$, however presents a problem due to the dimensional inhomogeneity between the spatial dimensions and the intensity dimension of the grey-scale image. The value of $\rho$ is therefore undefined by the physical problem. It would be advantageous if the morphological operations were invariant with respect to the value of $\rho$. This fact has until recently been overlooked in the image analysis literature.
Lately, however, it has received some attention in Verbeek & Verwer (1989). We had earlier called this property "shape semi-invariance" (Jackway 1994b), but it is apparent that the newly published "dimensionality" property of Rivest et al. (1992) encompasses the same idea in a more precise and thorough way, which is to be preferred. We should therefore re-phrase the argument of this section in terms of dimensionality; this task is addressed in the next section.

Figure 4.1: Shape depends on X, Y scaling.


4.4 Dimensionality

Rivest et al. (1992) have recently shown the importance in image processing and analysis of a concept they call dimensional consistency or "dimensionality". Dimensional measurements on grey-scale images have a physical significance. Suppose that an intensity image is mathematically modelled as a function $f(x)$, $x \in \mathbb{R}^2$, into the closed segment $I = [0, 1]$. The intensity axis $I$ represents the irradiance (light intensity) at the image plane and is therefore not dimensionally homogeneous with the spatial dimensions. Any measurement with physical significance should not couple these physically different dimensions. Scaling in the spatial dimensions is known as "homothety" and scaling in the intensity direction as "affinity", indicating the fundamental physical difference between magnifying an image and brightening it (Rivest et al. 1992). Making a measurement on an image consists in applying a functional on the image, where a "functional" is a global parameter associated with a function. The property of "dimensionality" applies to functionals, that is:

Definition 4.1 (Dimensional Functionals): The functional $W$ is defined to be "dimensional" if there exist constants $k_1, k_2$ such that for all $\lambda_1, \lambda_2 > 0$:
\[ W(\lambda_1 f(\lambda_2 x)) = \lambda_1^{k_1} \lambda_2^{k_2}\, W(f(x)), \]
where $\lambda_1$ is the affinity and $\lambda_2$ the homothety (Rivest et al. 1992). □

This relation restricts the way in which affinities and homotheties of an image affect dimensional measurements on it, and results in a decoupling between affinity and homothety measurements. For example, the "volume" of $f$, $V(f) = \int f(x)\, dx$, is dimensional since $V(f') = \lambda_1 \lambda_2^{-2}\, V(f)$ with $f' = \lambda_1 f(\lambda_2 x)$. If $f(x)$ is the irradiance at $x$ then $V(f)$ has physical significance as the total radiant power.

Rivest et al. (1992) indicate that useful image processing operators should conserve dimensionality. It is commonly thought that "volumic" (non-flat) morphological structuring elements lead to the break-down of dimensionality in morphological operations (Rivest et al. 1992, Sternberg 1986). This is true for fixed scale structuring elements. However, we will show in the next section that if the morphological dilation-erosion scale-space is formed by using scaled "elliptic poweroid" structuring functions, which are in general volumic, any dimensional functional of the scale-space image is also a dimensional functional of the underlying image.
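The volume example of a dimensional functional can be confirmed numerically. The following sketch is an illustration under stated assumptions: the image is taken to be the 2-D Gaussian $f(x, y) = \exp(-(x^2 + y^2))$, whose volume is $\pi$, the integral is approximated by a midpoint rule on a square window, and the values of $\lambda_1$ and $\lambda_2$ are arbitrary. With $k_1 = 1$ and $k_2 = -2$ the ratio of the two volumes should equal $\lambda_1 \lambda_2^{-2}$.

```python
import math

def volume(func, half_width=6.0, n=240):
    """Midpoint-rule approximation of the integral of func over [-half_width, half_width]^2."""
    h = 2.0 * half_width / n
    centres = [-half_width + (i + 0.5) * h for i in range(n)]
    return sum(func(x, y) for x in centres for y in centres) * h * h

def f(x, y):
    """Test image (assumption): an isotropic Gaussian with total volume pi."""
    return math.exp(-(x * x + y * y))

lam1, lam2 = 3.0, 2.0   # affinity and homothety (arbitrary values)

def f_prime(x, y):
    """f'(x) = lam1 * f(lam2 * x): brightened by lam1 and spatially scaled by lam2."""
    return lam1 * f(lam2 * x, lam2 * y)

v = volume(f)
v_prime = volume(f_prime)
ratio = v_prime / v          # compare with lam1 * lam2 ** -2
print(v, v_prime, ratio)
```

A functional that mixed the two kinds of scaling, for instance the surface area of the graph of $f$, would not factor in this way.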


In image analysis it is often desirable to ensure isotropic properties in any filtering; this translates directly to the morphological structuring functions being circularly symmetric. Sternberg (1986) mentions in passing the use of parabolas as structuring functions, Verbeek & Verwer (1989) have considered in more depth the morphological opening with non-spherical structuring elements, in particular poweroids and paraboloids, and van den Boomgaard (1992) has shown some special properties of the paraboloids with respect to morphological dilation. Commonly used structuring functions are: hemispheres, cylinders, cones, and paraboloids (Sternberg 1986) (see figure 3.2). A useful family of isotropic structuring functions is given by power functions of the vector norm $\|x\| = \sqrt{x \cdot x}$. So we define the (negative) "poweroid" family of scaled structuring functions:

Definition 4.2 (Poweroid Structuring Functions): The poweroid scaled structuring functions are given by
\[ g_\sigma(x) = -|\sigma| \left( \|x\| / |\sigma| \right)^{\alpha}, \quad \alpha \ge 0,\ \sigma \ne 0. \]
□

Some representative members of the 2-D poweroid functions
\[ g(x, y) = -\left( \sqrt{x^2 + y^2} \right)^{\alpha} \tag{4.20} \]
are presented in figure 4.2. This family includes cones ($\alpha = 1$), paraboloids ($\alpha = 2$), quartoids ($\alpha = 4$), and cylinders ($\alpha = \infty$). To cater for non-isotropic (directional) filtering we can define a (negative) "elliptic poweroid" family of scaled structuring functions:
\[ g_\sigma(x) = -|\sigma| \left( \sqrt{x^{T} A x} / |\sigma| \right)^{\alpha}, \quad \alpha > 0,\ \sigma \ne 0, \tag{4.21} \]
where $A$ is a symmetric positive definite matrix. In $\mathbb{R}^2$ the contours $g_\sigma(x) = \text{constant}$ are ellipses of various orientation and eccentricity. If $A$ is the unit matrix then (4.21) reduces to the isotropic case of definition 4.2. This idea is an extension of van den Boomgaard's (1992) non-isotropic "quadratic structuring function".
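The elliptic poweroid family is simple to evaluate directly. The sketch below assumes 2-D points given as pairs and the matrix $A$ given as a nested tuple; the function name `elliptic_poweroid` and the sample values are hypothetical. With the default unit matrix it reduces to the isotropic poweroid, and the exponent selects the cone, paraboloid, or a near-cylindrical shape.

```python
import math

def elliptic_poweroid(x, sigma, alpha, A=((1.0, 0.0), (0.0, 1.0))):
    """g_sigma(x) = -|sigma| * (sqrt(x^T A x) / |sigma|) ** alpha for a 2-D point x.

    A must be symmetric positive definite and sigma non-zero (neither is checked here).
    """
    q = (x[0] * (A[0][0] * x[0] + A[0][1] * x[1])
         + x[1] * (A[1][0] * x[0] + A[1][1] * x[1]))   # quadratic form x^T A x
    s = abs(sigma)
    return -s * (math.sqrt(q) / s) ** alpha

point = (3.0, 4.0)                                   # Euclidean norm 5

cone = elliptic_poweroid(point, 2.0, 1.0)            # -2 * (5/2)    = -5.0
paraboloid = elliptic_poweroid(point, 2.0, 2.0)      # -2 * (5/2)**2 = -12.5

# A large exponent approximates the flat cylinder: nearly 0 inside radius |sigma|,
# strongly negative outside it.
inside = elliptic_poweroid((0.5, 0.0), 1.0, 200.0)
outside = elliptic_poweroid((2.0, 0.0), 1.0, 200.0)

# A directional example: contours are ellipses elongated along the y-axis.
stretched = elliptic_poweroid(point, 2.0, 2.0, A=((4.0, 0.0), (0.0, 1.0)))

print(cone, paraboloid, inside, outside, stretched)
```

Because $g_\sigma$ depends on $\sigma$ only through $|\sigma|$, the same function serves for both the dilation (positive scale) and erosion (negative scale) branches of the scale-space.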

4.5 Dimensionality Properties in Scale-Space

Extending the definition of dimensionality to scale-space images we define:

Definition 4.3 (Scale-Space Dimensionality): $W$ is a dimensional functional on $F$ if given $\lambda_1, \lambda_2, \lambda_3 > 0$ there exist constants $k_1, k_2, k_3$ such that:
\[ W(\lambda_1 F(\lambda_2 x, \lambda_3 \sigma)) = \lambda_1^{k_1} \lambda_2^{k_2} \lambda_3^{k_3}\, W(F(x, \sigma)). \]
□

Figure 4.2: Various structuring functions of the 2-D circular poweroid family, $g(x, y) = -(x^2 + y^2)^{\alpha/2}$: (a) cone ($\alpha = 1$); (b) paraboloid ($\alpha = 2$); (c) quartoid ($\alpha = 4$); (d) cylinder ($\alpha = \infty$).

Rivest et al. (1992) extend the discussion of dimensional consistency from functionals to image processing operators; however, a rigorous definition for operators (corresponding to definition 4.3 for functionals) is not given. Therefore we will use functionals on the signal as the mathematical vehicle to extend dimensionality to morphological scale-space. The same extension to operators applies as is outlined in Rivest et al. (1992).

In scale-space filtering we work with the scale-space image, which is of higher dimensionality than the signal (Witkin 1983). We should take care to ensure that any operations used preserve dimensionality in the scale-space image. As an illustrative (somewhat non-practical!) example, suppose we were to try matching 1-D signals by counting the number of closed loops in their morphological scale-space fingerprints. (A fingerprint in dilation-erosion scale-space is a plot of the positions of local extrema of the smoothed signal versus the smoothing parameter; see, for example, figure 4.3.)

Figure 4.3: A multiscale dilation-erosion fingerprint: the plot of the extrema of the scale-space image.

Since the existence and relative position of local extrema are invariant under homothety and affinity, the number of closed fingerprint loops is a dimensional functional on the scale-space image. Now, a moment's reflection shows that this is not the important point; the real question is whether our dimensional functional on the scale-space image $F$ (which after all is a construct) is a dimensional functional on the original signal $f$ and thus has a physical significance.


So we need to find out how to construct dimensional functionals on $f$. We now show that if the dilation-erosion scale-space is constructed via an appropriate class of structuring functions, all dimensional functionals on $F$ are also dimensional functionals on $f$. We first show the sufficiency of the elliptic poweroid structuring functions for this purpose. Without loss of generality we take $\sigma > 0$ [the case of $\sigma < 0$ follows by the symmetry (complementary properties) of morphological dilation and erosion]. Therefore we will consider the scale-space formed by the multiscale dilation $F(x, \sigma) = (f \oplus g_\sigma)(x)$, $\sigma > 0$.

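Proposition 4.2 (Jackway 1994a): With elliptic poweroid structuring functions, $g_\sigma(x) = -|\sigma| \left( \sqrt{x^{T} A x} / |\sigma| \right)^{\alpha}$, all dimensional functionals $W(F)$ are also dimensional functionals of $f$.

Proof: We first of all establish a necessary property of this structuring function. For any $\lambda_2, \lambda_3 > 0$,
\[ \begin{aligned} g_{\lambda_3 \sigma}(\lambda_2 t) &= -\lambda_3 |\sigma| \left( \frac{\sqrt{(\lambda_2 t)^{T} A (\lambda_2 t)}}{\lambda_3 |\sigma|} \right)^{\alpha} \\ &= \lambda_3^{1-\alpha} \left( -|\sigma| \left( \frac{\sqrt{(\lambda_2 t)^{T} A (\lambda_2 t)}}{|\sigma|} \right)^{\alpha} \right) \\ &= \lambda_2^{\alpha} \lambda_3^{1-\alpha} \left( -|\sigma| \left( \frac{\sqrt{t^{T} A t}}{|\sigma|} \right)^{\alpha} \right) \\ &= \frac{1}{\lambda_1}\, g_\sigma(t), \end{aligned} \tag{4.22} \]
where
\[ \lambda_1 = \lambda_2^{-\alpha} \lambda_3^{\alpha - 1}. \tag{4.23} \]
Remembering that $F(x, \sigma) = (f \oplus g_\sigma)(x)$, $\sigma > 0$, and $f' = \lambda_1 f(\lambda_2 x)$, we write:
\[ \begin{aligned} \lambda_1 F(\lambda_2 x, \lambda_3 \sigma) &= \lambda_1 \max_t \{ f(\lambda_2 x - \lambda_2 t) + g_{\lambda_3 \sigma}(\lambda_2 t) \} \\ &= \lambda_1 \max_t \left\{ f(\lambda_2 x - \lambda_2 t) + \frac{1}{\lambda_1}\, g_\sigma(t) \right\} \\ &= \max_t \{ \lambda_1 f(\lambda_2 (x - t)) + g_\sigma(t) \} \\ &= (f' \oplus g_\sigma)(x). \end{aligned} \tag{4.24} \]
Now, since by definition $W$ is a dimensional functional on $F$, for any $\lambda_1, \lambda_2, \lambda_3 > 0$ there exist $k_1, k_2, k_3$ such that definition 4.3 is satisfied. By using (4.23) and (4.24), we get:
\[ \begin{aligned} W(f' \oplus g_\sigma) &= \lambda_1^{k_1} \lambda_2^{k_2} \left( \lambda_1^{\frac{1}{\alpha-1}} \lambda_2^{\frac{\alpha}{\alpha-1}} \right)^{k_3} W(f \oplus g_\sigma) \\ &= \lambda_1^{k_1'} \lambda_2^{k_2'}\, W(f \oplus g_\sigma), \end{aligned} \tag{4.25} \]
where $k_1' = k_1 + \frac{k_3}{\alpha - 1}$ and $k_2' = k_2 + \frac{\alpha k_3}{\alpha - 1}$. So, if $W$ is seen as a functional on $f$, (4.25) shows this functional to be dimensional. □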
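The scaling identity (4.22) on which the proof rests can be checked numerically. The sketch below is a spot check under assumed values: the matrix `A`, the exponent `alpha`, the scale `sigma`, the factors `lam2` and `lam3`, and the test points are all arbitrary choices, and `g_scaled` is a hypothetical helper implementing the elliptic poweroid (4.21) in 2-D.

```python
import math

def g_scaled(t, sigma, alpha, A):
    """Elliptic poweroid g_sigma(t) = -|sigma| * (sqrt(t^T A t) / |sigma|) ** alpha in 2-D."""
    q = sum(t[i] * A[i][j] * t[j] for i in range(2) for j in range(2))
    s = abs(sigma)
    return -s * (math.sqrt(q) / s) ** alpha

A = ((2.0, 0.5), (0.5, 1.0))        # symmetric positive definite (assumed example)
alpha, sigma = 3.0, 1.5
lam2, lam3 = 0.7, 2.5               # homothety and scale factors (arbitrary)

# Equation (4.23): the affinity that compensates the other two factors.
lam1 = lam2 ** (-alpha) * lam3 ** (alpha - 1.0)

points = [(1.0, 0.0), (-0.4, 2.0), (3.0, -1.2)]

# Left and right sides of (4.22) at each test point.
lhs = [g_scaled((lam2 * t[0], lam2 * t[1]), lam3 * sigma, alpha, A) for t in points]
rhs = [g_scaled(t, sigma, alpha, A) / lam1 for t in points]

identity_holds = all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(lhs, rhs))
print(identity_holds)
```

The check passes for any positive `lam2` and `lam3` because the poweroid is homogeneous of degree `alpha` in its argument, which is the property the necessity argument goes on to characterise.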



We next show the necessity of the poweroid structuring functions (within the class of general functions of the form
\[ g(t) = g\!\left( \sqrt{t^{T} A t} \right), \quad g(t) \le 0, \quad \text{for all } t \in \mathbb{R}^n, \tag{4.26} \]
with the matrix $A$ symmetric and positive definite). From the preceding proof it is evident that proposition 4.2 will only hold for structuring functions for which there exists some $\lambda_3 > 0$ such that equation (4.22) is satisfied. In this case, using the scaling relation (4.6) we rewrite (4.22) as:
\[ g\!\left( \frac{t}{\sigma} \right) = \lambda_1 \lambda_3\, g\!\left( \frac{\lambda_2 t}{\lambda_3 \sigma} \right) \quad \text{for all } t \in \mathbb{R}^n,\ \sigma > 0. \tag{4.27} \]
Considering functions of the form of (4.26), the functional relation (4.27) can be simplified to a 1-D relation by letting $s = \sqrt{t^{T} A t} / \sigma$, and rearranging:
\[ g(s) = \lambda_1 \lambda_3\, g\!\left( \frac{\lambda_2}{\lambda_3}\, s \right) \quad \text{for all } s > 0. \tag{4.28} \]
We now have the following proposition:

Proposition 4.3 (Jackway 1994a): If, for any $\lambda_1, \lambda_2 > 0$ there exists some $\lambda_3 > 0$ such that (4.28) is satisfied, then $g(s) = C s^{\alpha}$, where $C$ is a constant.

Proof: Setting $s = \lambda_3 / \lambda_2$ in (4.28) and rearranging gives
\[ \lambda_1 \lambda_3 = \frac{1}{g(1)}\, g\!\left( \frac{\lambda_3}{\lambda_2} \right). \tag{4.29} \]
To simplify the notation, let $A = \lambda_3 / \lambda_2$; $B = s \lambda_2 / \lambda_3$; $C = g(1)$; then, substituting back in (4.28),
\[ g(A)\, g(B) = C\, g(AB) \quad \text{for all } A, B > 0. \tag{4.30} \]
Since $g(s) \le 0$ for all $s > 0$ we can take the natural logarithms of the negative of both sides of equation (4.30) to get:
\[ \ln(-g(A)) + \ln(-g(B)) = \ln(-C) + \ln(-g(AB)) \quad \text{for all } A, B > 0. \tag{4.31} \]
This is the defining relation for a logarithmic function, so
\[ \ln(-g(s)) = \ln(-C) + \log_K s \quad \text{for some constant } K > 0. \tag{4.32} \]
Thus,
\[ g(s) = C \exp(\log_K s) = C \exp\!\left( \frac{\ln s}{\ln K} \right) = C s^{\alpha}, \quad C < 0, \tag{4.33} \]
where $\alpha = (\ln K)^{-1}$ is a constant. This proves the proposition. □

As a point of interest, as $\alpha \to \infty$ the structuring function approaches the "flat" (non-volumic) cylindrical structuring element well known in grey-scale morphology and image processing (Nakagawa & Rosenfeld 1978, Sternberg 1986, Haralick et al. 1987).

It is rather difficult to demonstrate the effect of non-dimensionality in scale-spaces as the effects are likely to be small. One difference, however, can be seen in the connectivity of fingerprints. As an example of the dimensionality property in 1-D we present figure 4.4, which shows the differing effects of the use of dimensional (parabolic) and non-dimensional (spherical) structuring functions in the computation of the "multiscale dilation-erosion scale-space fingerprint" of a certain signal and that same signal with an affinity. A very close examination of figure 4.4 shows that with a non-dimensional structuring function the fingerprint of the stretched signal may be differently connected, whereas with the parabolic structuring function the fingerprint is merely compressed in the scale direction. We can obtain a functional by counting the closed loops of the fingerprints; in this case we have figure 4.4(c) with 19 loops, and figure 4.4(d) with 20 loops (the difference is in the 8th loop from the right), indicating the break-down of dimensionality. In contrast (with a parabolic structuring function) both figure 4.4(e) and figure 4.4(f) contain 19 loops.

If the signal was an intensity image and some image analysis operation, such as pattern recognition, was sensitive to the connectivity of the fingerprint, then with the non-dimensional structuring function, the output would depend on the arbitrary scale chosen to represent the intensity dimension relative to the spatial dimensions of the image.
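The statement that a parabolic fingerprint is merely compressed in the scale direction by an affinity can be checked exactly in 1-D. The sketch below assumes an integer-valued test signal on an integer grid, the parabola $g_\sigma(t) = -t^2/\sigma$, exact rational arithmetic, and brute-force dilation. For this structuring function $(\lambda f) \oplus g_{\sigma/\lambda} = \lambda (f \oplus g_\sigma)$, so the local maxima of the stretched signal at scale $\sigma/\lambda$ should coincide with those of the original at scale $\sigma$. The helper names and the chosen scales are hypothetical.

```python
from fractions import Fraction
import random

def dilate(f, sigma):
    """(f dilated by g_sigma)(x) = max over t of f(x - t) - t**2 / sigma (exact rationals)."""
    n = len(f)
    return [max(f[x - t] - Fraction(t * t) / sigma for t in range(x - n + 1, x + 1))
            for x in range(n)]

def local_maxima(F):
    """Interior samples that are (non-strict) local maxima."""
    return {x for x in range(1, len(F) - 1) if F[x] >= F[x - 1] and F[x] >= F[x + 1]}

random.seed(1)
f = [random.randint(0, 100) for _ in range(50)]   # arbitrary integer-valued signal
lam = 4                                           # affinity applied to the signal
stretched = [lam * v for v in f]

scales = [Fraction(1, 2), Fraction(3), Fraction(10), Fraction(40)]

# The maxima of the stretched signal at scale s / lam match those of f at scale s.
same_extrema = all(local_maxima(dilate(stretched, s / lam)) == local_maxima(dilate(f, s))
                   for s in scales)

# The smoothed signals themselves differ only by the factor lam.
proportional = all(a == lam * b
                   for s in scales
                   for a, b in zip(dilate(stretched, s / lam), dilate(f, s)))
print(same_extrema, proportional)
```

No such rescaling of $\sigma$ exists for a spherical structuring function, which is why the connectivity of its fingerprint can change under an affinity.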

Figure 4.4: An example of dimensionality in scale-space. (a) A random signal. (b) This signal with an affinity of size 4.0. (c) & (d) The multiscale dilation-erosion fingerprints of the above signals with a non-dimensional (spherical) structuring function. Note the connectivity and structure of the fingerprints differ because of the affinity: (c) has 19 closed loops and (d) 20 closed loops (the difference is in the 8th loop from the right), indicating the break-down of dimensionality. (e) & (f) The multiscale dilation-erosion fingerprints of the above signals with a dimensional (parabolic) structuring function. Note the structure of the fingerprints remains similar (with 19 closed loops), indicating the conservation of dimensionality.


We make the general observation that (for any fixed scale) the more "pointed" (low $\alpha$) structuring functions tend to give more emphasis to the local shape near a signal feature. There are no general criteria for selecting $\alpha$, although this parameter does have an effect on the "generalised frequency response" of the filter (this topic is explored in chapter 5 of this thesis). Other constraints or requirements may dictate the choice of $\alpha$ and hence the structuring function. The "flat" structuring function ($\alpha = \infty$) is commonly used because the morphological operations reduce to simply taking the maximum or minimum of the signal over some neighbourhood (Nakagawa & Rosenfeld 1978). That is, for flat structuring functions,

$$(f \oplus g)(x) = \max_{t \in G} \{f(x - t)\}, \qquad (f \ominus g)(x) = \min_{t \in G} \{f(x + t)\}. \tag{4.34}$$

The use of flat structuring functions is most appropriate for binary images where the morphological operations are identical to those on point-sets. However, on grey-scale images the use of flat structuring functions leads to flat regions in the output signal around the local extrema, and the local extrema are no longer exactly localised in position. This is certainly a disadvantage in multiscale dilation-erosion scale-space since it is the local extrema of the output signal which are our scale-space features and exact localisation is absolutely necessary. For other morphological scale-spaces the second derivative or curvature properties of the structuring function become important. Therefore we need to examine more closely the curvature properties of the various structuring functions of the poweroid family.
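The loss of localisation with flat structuring functions can be seen on a five-sample signal. The following is only a minimal sketch: the helper `dilate_1d`, the toy signal and the choice of skipping samples outside the signal are illustrative assumptions, not part of the thesis.

```python
def dilate_1d(f, h, r):
    """1-D grey-scale dilation: out[j] = max over |y| <= r of f[j - y] + h[|y|].

    h[k] holds the (even) structuring function at offset k >= 0.
    Samples that fall outside the signal are skipped (an assumption made here).
    """
    n = len(f)
    return [max(f[j - y] + h[abs(y)]
                for y in range(-r, r + 1) if 0 <= j - y < n)
            for j in range(n)]

f = [0, 1, 3, 1, 0]          # toy signal with a single maximum at index 2
r = 1

# Flat structuring function: zero on its support, as in equation (4.34).
flat = dilate_1d(f, [0.0] * (r + 1), r)

# Parabolic structuring function h(k) = -k^2 (unit scale).
parabolic = dilate_1d(f, [-float(k * k) for k in range(r + 1)], r)

flat_peaks = [j for j, v in enumerate(flat) if v == max(flat)]
parabolic_peaks = [j for j, v in enumerate(parabolic) if v == max(parabolic)]
```

With the flat function the maximum spreads over three samples, whereas the parabolic function leaves it at the original position.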

4.6 Second Derivative Properties

As we shall see in chapter 6, for scale-spaces formed using the zero-crossing of the second derivative of the morphological closing-opening operations, the second derivative or curvature of the structuring functions is an important property. Therefore we will now examine the second derivative of the powerbolic structuring functions. The above scale-space can only be defined for 1-D signals (see chapter 6) so we consider the 1-D structuring function $g(x) = -|x|^{\alpha}$ with $\alpha > 1$, which has a second derivative (with respect to $x$), $g''(x) = -\alpha(\alpha - 1)|x|^{\alpha - 2}$, $x \ne 0$. Examining this second derivative as $x \to 0$ we find:

$$\lim_{x \to 0} g''(x) = \begin{cases} -\infty, & \text{if } 1 < \alpha < 2; \\ -2, & \text{if } \alpha = 2; \\ 0, & \text{if } \alpha > 2. \end{cases} \tag{4.35}$$

Parabolic structuring functions ($\alpha = 2$) possess the advantage of a constant second derivative everywhere, $g''(x) = -2$ for all $x$. It may be preferable to ensure that a structuring function has finite second derivative, and in particular as $x \to 0$. From (4.35) we see that this requirement corresponds to choosing $\alpha \ge 2$.

We turn now to curvature, which is related to the second derivative by (Thomas, Jr. & Finney 1979),

$$\kappa(x) = \frac{g''(x)}{\left(1 + g'^2(x)\right)^{3/2}}. \tag{4.36}$$

The curvature of $g(x)$ is therefore (for $x > 0$),

$$\kappa(x) = \frac{-\alpha(\alpha - 1)\, x^{\alpha - 2}}{\left(1 + \alpha^2 x^{2(\alpha - 1)}\right)^{3/2}}. \tag{4.37}$$

The maximum curvature occurs at $\hat{x}$, where,

$$\hat{x} = \left( \frac{\alpha - 2}{\alpha^2 (2\alpha - 1)} \right)^{1/(2\alpha - 2)}. \tag{4.38}$$

We see that for $\alpha = 2$ the maximum curvature occurs at $\hat{x} = 0$. As $\alpha \to \infty$, $\hat{x} \to 1$. As $\alpha$ increases the maximum curvature increases and the point at which this occurs moves out from $x = 0$ towards $x = 1$. This behaviour can be seen graphically in figure 4.5 where various powerbolic (that is, 1-D poweroid) structuring functions are displayed.

Regions of high curvature on a curve are regions of high information content (Attneave 1954), and having the curvature decrease away from the origin in the structuring function is similar to having the weightings in a linear filter decrease towards the edges of the window (a common practice). If the structuring function is to possess its maximum curvature at the origin ($s = 0$) and this curvature is to be finite, then the only structuring function from the powerbolic family (4.4) is the parabolic stylus ($\alpha = 2$).

Before we leave this discussion of structuring functions, there is another reason why the paraboloid structuring functions are especially important.
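The location of maximum curvature in (4.38) can be checked numerically against the curvature formula (4.37). The sketch below is illustrative only; the function names and the grid search are assumptions introduced here, and the closed form is evaluated only for $\alpha \ge 2$.

```python
import math

def curvature(x, alpha):
    """Curvature of g(x) = -|x|**alpha at x > 0, following (4.36)-(4.37)."""
    g1 = -alpha * x ** (alpha - 1)                  # first derivative g'(x)
    g2 = -alpha * (alpha - 1) * x ** (alpha - 2)    # second derivative g''(x)
    return g2 / (1.0 + g1 * g1) ** 1.5

def x_hat(alpha):
    """Closed-form position of maximum |curvature|, equation (4.38), alpha >= 2."""
    return ((alpha - 2) / (alpha ** 2 * (2 * alpha - 1))) ** (1.0 / (2 * alpha - 2))

def x_hat_numeric(alpha, n=20000, x_max=2.0):
    """Grid search for the maximiser of |curvature| on (0, x_max]."""
    best_x, best_k = 0.0, -1.0
    for k in range(1, n + 1):
        x = x_max * k / n
        c = abs(curvature(x, alpha))
        if c > best_k:
            best_x, best_k = x, c
    return best_x
```

For example, `x_hat(2.0)` is zero, and `x_hat(alpha)` approaches 1 as `alpha` grows, in agreement with the discussion above.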

Figure 4.5: Various structuring functions of the powerbolic family $g(x) = |x|^{\alpha}$. (Plot of $h(s)$ against $s$ for $-2 \le s \le 2$; curve labels include 1.5, 2, 4, 10 and $\infty$.)

4.7 The Paraboloid Structuring Function

For a fixed scale, the 2-D morphological dilation can be computed most directly by the "morphological convolution"

$$(f \oplus g_\sigma)(i, j) = \max_{(x, y) \in G} \left\{ f(i - x, j - y) + g_\sigma(x, y) \right\}, \tag{4.39}$$

where $G$ is a square neighbourhood of $(i, j)$. If the size of this neighbourhood is $r^2$ then the computational burden of the direct implementation of equation (4.39) is $B_d = O(r^2)$. The Landau symbol $O$ is often used to indicate computational complexity: $R(x) = O(x)$ means that $R(x)/x$ is bounded as $x \to \infty$ (Lipschutz 1969). For the 2-D paraboloid structuring function, $g_\sigma(x, y) = -|\sigma|(x^2 + y^2)/\sigma^2$, we have a separability property:

$$g_\sigma(x, y) = g_\sigma^{(1)}(x) + g_\sigma^{(1)}(y), \tag{4.40}$$

where the 1-D structuring function is $g_\sigma^{(1)}(x) = -|\sigma| x^2/\sigma^2$. The "max" function has a similar property,

$$\max_{(x, y) \in G} \{ f(x, y) \} = \max_{x \in G^{(x)}} \left\{ \max_{y \in G^{(y)}} \{ f(x, y) \} \right\}, \tag{4.41}$$

where $G^{(x)}$ and $G^{(y)}$ are the projections of $G$ on the $x$ and $y$ axes. Combining these properties we get the desired result

$$(f \oplus g_\sigma)(i, j) = \max_{x \in G^{(x)}} \left\{ \gamma(i - x, j) + g_\sigma^{(1)}(x) \right\}, \tag{4.42}$$

where

$$\gamma(i, j) = \max_{y \in G^{(y)}} \left\{ f(i, j - y) + g_\sigma^{(1)}(y) \right\}. \tag{4.43}$$

That is, the computation has been reduced to a series of two 1-D morphological convolutions with a computational burden $B_1 = O(r)$. The cost is that additional storage is required for the intermediate result $\gamma$. This result has recently become known in the literature as the separable decomposition of structuring elements (Shih & Mitchell 1991, Gader 1991, van den Boomgaard 1992, Yang & Chen 1993).

There are actually two kinds of separability involved here. Firstly, additive separability, where $g(x, y) = g^{(1)}(x) + g^{(1)}(y)$; secondly, morphological separability, $g(x, y) = g^{(1)}(x) \oplus g^{(1)}(y)$. Note that in the result (4.42)-(4.43) we have used additive separability to obtain a morphological separability result. In a very recent work, Yang & Chen (1993) have shown that for square morphological templates (i.e. discrete structuring elements) the two kinds of separability are in fact equivalent. Further, Yang & Chen (1993) have shown in a theorem that if $g(x, y)$ is additively separable of size $(2r + 1) \times (2r + 1)$, and it is anti-convex (NB. called "convex" in (Yang & Chen 1993)), then it can be expressed as

$$g(x, y) = k_1^h \oplus k_2^h \oplus \cdots \oplus k_r^h \oplus \left( k_1^v \oplus k_2^v \oplus \cdots \oplus k_r^v \right),$$

where $k_i^h$ is a horizontal 1-D structuring element of size 3, and $k_i^v$ is a vertical 1-D structuring element of size 3. The importance of this result is that by the chain rules for dilations (and erosions), equations (2.15) & (2.23), if $g = k_1 \oplus k_2 \oplus \cdots \oplus k_r$ then

$$f \oplus g = \left( \left( (f \oplus k_1) \oplus k_2 \right) \cdots \right) \oplus k_r.$$

So the whole operation can be performed as a sequence of 1-D 3-point operations. We do not take these important practical aspects any further here. The efficient computation of the multiscale morphological operations is considered in more detail in section 4.8.

The point to stress here is that to obtain all these nice results we need additive separability of the structuring function. Writing the 2-D elliptic poweroids as

$$g(x, y) = -\left( \sqrt{\mathbf{x}' A \mathbf{x}} \right)^{\alpha} = -\left( \sqrt{a_{11} x^2 + 2 a_{12} x y + a_{22} y^2} \right)^{\alpha}, \tag{4.44}$$

where,

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix}. \tag{4.45}$$

The conditions necessary are therefore that (a) $a_{12} = 0$ ($A$ is a diagonal matrix) and (b) $\alpha = 2$. Therefore $g(x, y)$ must be a circular paraboloid ($a_{11} = a_{22}$), or an elliptic paraboloid with the major and minor axes of the ellipse aligned with the co-ordinate system $x, y$ axes (for $a_{11} \ne a_{22}$). In practical terms this is a very favourable property of the paraboloids.

Rein van den Boomgaard (1992) has also shown that the elliptic paraboloid structuring functions (which he called the "quadratic structuring functions", QSF) are closed with respect to morphological dilation (and erosion). This result is an extension of the semi-group property of section 4.2 to arbitrary QSF kernel matrices. In fact van den Boomgaard (1992) argues that the elliptic paraboloids can be considered to be the morphological equivalent of the Gaussian convolution kernels because this class is dimensionally separable and closed with respect to dilation (and erosion), thereby establishing an "equivalence" between the parabolic structuring function in mathematical morphology and the Gaussian kernel in convolution.

Leaving theory for the moment, let us make use of some of these properties and address the efficient computation of the morphological operations.
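The two conditions for additive separability of the elliptic poweroid (4.44) can be spot-checked numerically. This is a rough sketch under stated assumptions: the helper names and the test grid are hypothetical, and it relies on the fact that a function with $g(0, 0) = 0$ is additively separable along the axes exactly when $g(x, y) = g(x, 0) + g(0, y)$.

```python
import math

def poweroid(x, y, a11, a12, a22, alpha):
    """Elliptic poweroid g(x, y) = -(sqrt(x'Ax))**alpha of equation (4.44)."""
    q = a11 * x * x + 2.0 * a12 * x * y + a22 * y * y
    return -(math.sqrt(q) ** alpha)

def separability_defect(a11, a12, a22, alpha, pts=(-2, -1, 0, 1, 2)):
    """Largest |g(x, y) - g(x, 0) - g(0, y)| over a small grid.

    A value of zero (up to rounding) indicates additive separability
    along the co-ordinate axes.
    """
    worst = 0.0
    for x in pts:
        for y in pts:
            d = (poweroid(x, y, a11, a12, a22, alpha)
                 - poweroid(x, 0, a11, a12, a22, alpha)
                 - poweroid(0, y, a11, a12, a22, alpha))
            worst = max(worst, abs(d))
    return worst
```

An axis-aligned paraboloid gives a defect of zero, while a non-zero $a_{12}$ or an exponent other than 2 does not.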

4.8 Computation

Inspection of equation (4.39) shows that this smoother can, in principle, be implemented in a massively parallel fashion with a processing element for each pixel in $G$. Taking for example $\sigma > 0$, the processing element at $(i, j)$ takes as inputs the image values $\{f(i - x, j - y) : (x, y) \in G\}$ and returns the maximum of $f(i - x, j - y) + g_\sigma(x, y)$. This approach can be related to the concepts of cellular automata where each "cell" (pixel) is subject to a transformation based on the values of the cells in a finite neighbourhood (Duff, Watson, Fountain & Shaw 1973). Specialized hardware has been built along these lines for the closely related morphological filters, including: Golay Logic Processor (Golay 1969), diff3 (Graham & Norgren 1980), CLIP (Duff et al. 1973, Duff 1979), TAS (Klein & Serra 1972), DIP (Gerrit & Aardema 1981), and MPP (Batcher 1980, Batcher 1985). The prime example here is perhaps the pipeline architecture of the "Cytocomputer" patented by Sternberg and described in (Sternberg 1980, Sternberg 1983, Sternberg 1985). Efficient morphological algorithms have also been implemented within the framework of "Image Algebra" (Ritter & Gader 1987, Dougherty & Giardina 1988, Ritter, Wilson & Davidson 1990), which is a more formal approach to the algebra of image operations.

As the hardware required is usually of a rather specialised and expensive nature (Snyder, Jamieson, Gannon & Siegel 1985, Loui, Venetsanopoulos & Smith 1990), parallel implementations of the smoother will not be considered any further here. We therefore consider the implementation of equation (4.39) in a sequential manner, pixel by pixel, on a standard machine. Therefore all the algorithms will be of order $O(n^2)$ where $n^2$ is the number of pixels in the image. For exactness consider the case for $\sigma$ fixed and positive; the computational details for $\sigma$ negative are identical. We ignore for the moment edge-of-image effects, which complicate the algorithm without assisting the explanation, so we assume $(i - x, j - y) \in D$.

4.8.1 A naive approach

For $D$ discrete and square, equation (4.39) may be written,

$$(f \oplus g_\sigma)(i, j) = \max_{x, y = -r}^{+r} \left\{ f(i - x, j - y) + g_\sigma(x, y) \right\}. \tag{4.46}$$

To obtain the whole output image this operation is performed at each of the $n^2$ pixels of the input,

$$(f \oplus g_\sigma)(i, j) \,\Big|_{\; i = 1, 2, \ldots, n; \;\; j = 1, 2, \ldots, n} \tag{4.47}$$

The naive approach is the direct implementation of (4.46) and (4.47). (Note, computational algorithms will be given in a MODULA-2 like code which has a straightforward and easily understood syntax.) The input image is in the two dimensional array f[.,.], the structuring function is in g[.,.], and the output is placed in fout[.,.].

    for j := 1 to n do
      for i := 1 to n do
        max := 0
        for x := -r to r do
          for y := -r to r do
            temp := f[i-x, j-y] + g[x, y]
            if temp > max then
              max := temp
            endif
          endfor y
        endfor x
        fout[i, j] := max
      endfor i
    endfor j

Figure 4.6: The naive code

This will become the "standard" algorithm against which improvements will be measured. This program is basically four nested loops with the comparison inside the innermost loop. By multiplying together the dimensions of the loops, we expect the execution time $T_n$ to be proportional to $n^2 (2r + 1)^2$.


4.8.2 An improved algorithm

It can immediately be seen that for adjacent pixels in the image the support regions overlap except for a single new row or column, so one wonders whether use can be made of partial results computed for the previous region. Indeed this is the case. The first step forward is to use the separability property of the paraboloid structuring function. The max operation possesses the two following properties,

$$\max_{i, j = -r}^{+r} \{ a_{ij} \} = \max_{i = -r}^{+r} \left\{ \max_{j = -r}^{+r} \{ a_{ij} \} \right\} \tag{4.48}$$

and,

$$\max_{i = -r}^{+r} \{ \lambda + a_i \} = \lambda + \max_{i = -r}^{+r} \{ a_i \}, \quad \text{for all } \lambda \in \mathbb{R}. \tag{4.49}$$

Applying these properties and the separability of the structuring function to the problem at hand we have,

$$\begin{aligned}
(f \oplus g_\sigma)(i, j) &= \max_{x, y = -r}^{+r} \left\{ f(i - x, j - y) + g_\sigma(x, y) \right\} \\
&= \max_{x = -r}^{+r} \left\{ \max_{y = -r}^{+r} \left\{ f(i - x, j - y) + g_\sigma^{(1)}(y) \right\} + g_\sigma^{(1)}(x) \right\} \\
&= \max_{x = -r}^{+r} \left\{ \gamma(i - x, j) + g_\sigma^{(1)}(x) \right\},
\end{aligned} \tag{4.50}$$

where,

$$\gamma(i, j) = \max_{y = -r}^{+r} \left\{ f(i, j - y) + g_\sigma^{(1)}(y) \right\}. \tag{4.51}$$

Therefore we can break the algorithm into two steps, $f \to \gamma$ and $\gamma \to (f \oplus g_\sigma)$. The improved algorithm (figure 4.7) uses an additional array gamma[.,.] to hold the intermediate result and the array h[.] which contains the separated structuring function. This program contains two sections of three nested loops with the comparison inside the innermost loop. By multiplying together the dimensions of the loops, we expect the execution time $T_n$ to be proportional to $2 n^2 (2r + 1)$, which is $2/(2r + 1)$ (roughly $1/r$) of the time of the naive algorithm. This is an important saving as the algorithm is now $O(r)$ rather than $O(r^2)$.
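The two-pass scheme of (4.50)-(4.51) can be compared directly against the direct form (4.46). The following Python sketch is not the thesis implementation: the function names, the zero-based indexing and the choice of skipping samples outside the image are assumptions made for illustration.

```python
import math
import random

def parabola(x, sigma):
    """1-D parabolic structuring function g(x) = -|sigma| * x^2 / sigma^2."""
    return -abs(sigma) * (x * x) / (sigma * sigma)

def dilate_naive(f, sigma, r):
    """Direct 2-D dilation, equation (4.46); out-of-image samples are skipped."""
    n = len(f)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            best = -math.inf
            for x in range(-r, r + 1):
                for y in range(-r, r + 1):
                    if 0 <= i - x < n and 0 <= j - y < n:
                        v = f[i - x][j - y] + parabola(x, sigma) + parabola(y, sigma)
                        best = max(best, v)
            out[i][j] = best
    return out

def dilate_separable(f, sigma, r):
    """Two 1-D passes: f -> gamma (4.51), then gamma -> output (4.50)."""
    n = len(f)
    h = {k: parabola(k, sigma) for k in range(-r, r + 1)}
    gamma = [[max(f[i][j - y] + h[y]
                  for y in range(-r, r + 1) if 0 <= j - y < n)
              for j in range(n)] for i in range(n)]
    return [[max(gamma[i - x][j] + h[x]
                 for x in range(-r, r + 1) if 0 <= i - x < n)
             for j in range(n)] for i in range(n)]

random.seed(0)
image = [[random.randint(0, 255) for _ in range(12)] for _ in range(12)]
a = dilate_naive(image, 4.0, 3)
b = dilate_separable(image, 4.0, 3)
max_difference = max(abs(a[i][j] - b[i][j]) for i in range(12) for j in range(12))
```

The inner-loop work per pixel drops from $(2r + 1)^2$ to $2(2r + 1)$ evaluations, at the cost of the intermediate array `gamma`.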

4.8.3 Further improvements

    for j := 1 to N do
      for i := 1 to N do
        max := 0
        for y := -r to r do
          temp := f[i, j-y] + h[y]
          if temp > max then
            max := temp
          endif
        endfor y
        gamma[i, j] := max
      endfor i
    endfor j
    for j := 1 to N do
      for i := 1 to N do
        max := 0
        for x := -r to r do
          temp := gamma[i-x, j] + h[x]
          if temp > max then
            max := temp
          endif
        endfor x
        fout[i, j] := max
      endfor i
    endfor j

Figure 4.7: Improved algorithm (version 1)

Intuition would suggest that we would be unable to reduce the algorithm below $O(r)$; however, other improvements are possible. First we need the following property of the structuring function,

$$\begin{aligned}
g_\sigma^{(1)}(x - 1) &= -|\sigma| \frac{(x - 1)^2}{\sigma^2} \\
&= -|\sigma| \left( \frac{x^2}{\sigma^2} + \frac{1 - 2x}{\sigma^2} \right) \\
&= g_\sigma^{(1)}(x) - |\sigma| \frac{1 - 2x}{\sigma^2}, \qquad x = -r, \ldots, r.
\end{aligned} \tag{4.52}$$

Consider computing $\gamma(i, j + 1) = \max_{x = -r}^{r} \{ f(i, j - (x + 1)) + g_\sigma^{(1)}(x) \}$ having previously computed $\gamma(i, j) = \max_{x = -r}^{r} \{ f(i, j - x) + g_\sigma^{(1)}(x) \}$. Using a change of dummy variable $\xi = x + 1$ and equation (4.52), $\gamma(i, j + 1)$ can be rewritten,

$$\begin{aligned}
\gamma(i, j + 1) &= \max_{\xi = -r + 1}^{r + 1} \left\{ f(i, j - \xi) + g_\sigma^{(1)}(\xi - 1) \right\} \\
&= \max_{\xi = -r + 1}^{r + 1} \left\{ f(i, j - \xi) + g_\sigma^{(1)}(\xi) - |\sigma| \frac{1 - 2\xi}{\sigma^2} \right\}.
\end{aligned} \tag{4.53}$$

Now we have already made a comparison amongst all but one of these terms (with different weightings) when computing $\gamma(i, j)$; say this maximum occurred for $x = m$. Thus in particular we already know that $f(i, j - m) + g_\sigma^{(1)}(m) \ge f(i, j - x) + g_\sigma^{(1)}(x)$ for $x = -r, -r + 1, \ldots, m - 1$. Thus we can deduce that

$$f(i, j - m) + g_\sigma^{(1)}(m) - |\sigma| \frac{1 - 2m}{\sigma^2} \;\ge\; f(i, j - \xi) + g_\sigma^{(1)}(\xi) - |\sigma| \frac{1 - 2\xi}{\sigma^2}$$

for $\xi = -r + 1, -r + 2, \ldots, m - 1$, so we do not have to test for the maximum amongst these pixels. This same principle applies for the second part of the algorithm, that is, computing $(f \oplus g_\sigma)$ from $\gamma$. On average we reduce the number of comparisons, and thus the time, by half. A geometrical interpretation is that as the structuring function is moved from left to right across the function, the point of contact with the function must always appear at, or to the right of, the existing point of contact.

The improved algorithm follows. Care needs to be taken with the sequence of processing the array elements to avoid storing the position of the maximum for more than one element. The index corresponding to the maximum is stored in variable m.

    for i := 1 to N do
      for j := 1 to N do
        max := 0
        for y := m-1 to r do
          temp := f[i, j-y] + h[y]
          if temp >= max then
            max := temp
            mtemp := y
          endif
        endfor y
        gamma[i, j] := max
        m := mtemp
      endfor j
    endfor i
    for j := 1 to N do
      for i := 1 to N do
        max := 0
        for x := m-1 to r do
          temp := gamma[i-x, j] + h[x]
          if temp >= max then
            max := temp
            mtemp := x
          endif
        endfor x
        fout[i, j] := max
        m := mtemp
      endfor i
    endfor j

Figure 4.8: Improved algorithm (version 2)

We have recently discovered that van den Boomgaard (1992) has found very similar improvements for his "quadratic structuring functions."
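The geometrical interpretation above (the contact point never moves backwards as the structuring function slides along the signal) can be turned into a short 1-D routine and checked against a full window search. This is only a sketch under stated assumptions: it tracks the absolute position of the contact point rather than the offset m used in figure 4.8, it keeps the right-most maximum on ties, and it skips samples outside the signal. The names are hypothetical.

```python
import math
import random

def dilate_1d_brute(f, h, r):
    """Full search: out[j] = max over |p - j| <= r of f[p] + h[|p - j|]."""
    n = len(f)
    return [max(f[p] + h[abs(p - j)]
                for p in range(max(0, j - r), min(n - 1, j + r) + 1))
            for j in range(n)]

def dilate_1d_monotone(f, h, r):
    """Same dilation, but the search for pixel j starts at the previous contact point.

    Valid when h is concave in the offset (as for the parabola), because the
    right-most maximiser is then non-decreasing in j.
    """
    n = len(f)
    out = [0.0] * n
    contact = 0                        # absolute position of the previous maximum
    for j in range(n):
        lo = max(contact, j - r, 0)    # never look to the left of the old contact
        hi = min(j + r, n - 1)
        best, best_p = -math.inf, lo
        for p in range(lo, hi + 1):
            v = f[p] + h[abs(p - j)]
            if v >= best:              # ">=" keeps the right-most maximum
                best, best_p = v, p
        out[j] = best
        contact = best_p
    return out

random.seed(1)
signal = [random.randint(0, 255) for _ in range(200)]
r = 5
h = [-(k * k) / 4.0 for k in range(r + 1)]   # parabola with sigma = 4
full = dilate_1d_brute(signal, h, r)
fast = dilate_1d_monotone(signal, h, r)
```

Applying the same routine along rows and then columns gives the 2-D result by the separability property.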

4.8.4 The support region

Until now it has been assumed that the dimension of the support region $r \times r$ has been dependent only upon $\sigma$,

$$r = \left\lfloor \sqrt{|\sigma| M} \right\rfloor, \tag{4.54}$$

which can be evaluated when the structuring function $g_\sigma$ is initialised (note, $\lfloor x \rfloor$ denotes the largest integer not greater than $x$). This formula corresponds to the support region of equation (3.21). However, if we are willing to let the support region depend upon $f(i, j)$ then the smaller region,

$$r = \left\lfloor \sqrt{|\sigma| (M - f(i, j))} \right\rfloor, \tag{4.55}$$

may be used (see proposition 3.4 of section 3.3.1). As an alternative we can use partial results already computed in the inner loop to determine when to exit this loop. During the iteration of the inner loop, if the magnitude of the structuring function becomes greater than $M - \max$ then we should terminate iterations, since $f + g$ cannot be greater than max for any future iteration. The savings from this scheme are difficult to estimate, but it has one good property: as $\sigma$ increases, the likelihood of obtaining a high value for max early in the iteration count is higher, and therefore the likelihood of an earlier termination of the loop increases. This partially counteracts the greater $r$ due to greater $\sigma$. The algorithm follows.

The best way to compare these algorithms is to implement them on a computer and time them; this is done in the next section.
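The early-termination test can be sketched in 1-D as follows. This is an illustrative variant rather than the algorithm of the thesis: it scans offsets outwards from zero in both directions (so that the magnitude of the structuring function only grows), and the names `support_radius` and `dilate_1d_early_exit` are assumptions.

```python
import math
import random

def support_radius(sigma, M):
    """Support radius of equation (4.54): floor(sqrt(|sigma| * M))."""
    return int(math.floor(math.sqrt(abs(sigma) * M)))

def dilate_1d_brute(f, h, r):
    """Reference dilation with a full window search (edges skipped)."""
    n = len(f)
    return [max(f[p] + h[abs(p - j)]
                for p in range(max(0, j - r), min(n - 1, j + r) + 1))
            for j in range(n)]

def dilate_1d_early_exit(f, h, r, M):
    """Dilation that stops widening the window once -h[k] >= M - best.

    Since f <= M everywhere and h is non-increasing in k, no sample at
    offset k or beyond can then exceed the current maximum.
    """
    n = len(f)
    out = []
    for j in range(n):
        best = f[j] + h[0]
        k = 1
        while k <= r and -h[k] < M - best:
            for p in (j - k, j + k):
                if 0 <= p < n:
                    best = max(best, f[p] + h[k])
            k += 1
        out.append(best)
    return out

random.seed(2)
M = 255
sigma = 1.0
r = support_radius(sigma, M)
h = [-(k * k) / sigma for k in range(r + 1)]
signal = [random.randint(0, M) for _ in range(300)]
reference = dilate_1d_brute(signal, h, r)
early = dilate_1d_early_exit(signal, h, r, M)
```

For $\sigma = 1$ and $M = 255$ the radius evaluates to 15, the value quoted for the timing experiments.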

4.8.5 Illustrative run times of the algorithms

The following timings were made for a 256 × 255 image of uniform random image pixels with a greyscale range of 0 ... 255. The smoothing was carried out for $\sigma = 1.0$, which is equivalent to $r = 15$. The computer used was an IBM PC/AT clone¹ which uses an Intel 80286 CPU running at a clock speed of 16 MHz with an 80287 maths co-processor. The algorithms were coded in the language MODULA-2² using the "compact" memory model and with all compiler optimizations turned on. The timings are as follows.

¹ Model TP-VLSI from Total Peripherals Pty Ltd., 2 Short Street, Chatswood NSW 2067, Australia.
² Topspeed Modula-2, Jensen & Partners International, Inc., 1101 San Antonio Road, Suite 301, Mountain View, California 94043, USA.

    for i := 1 to N do
      for j := 1 to N do
        max := 0
        y := m-1
        hx := h[y]
        repeat
          temp := f[i, j-y] + hx
          if temp >= max then
            max := temp
            mtemp := y
          endif
          inc(y)
          hx := h[y]
        until (-hx >= M-max) OR (y > N-j)
        gamma[i, j] := max
        m := mtemp
      endfor j
    endfor i
    for j := 1 to N do
      for i := 1 to N do
        max := 0
        x := m-1
        hx := h[x]
        repeat
          temp := gamma[i-x, j] + hx
          if temp >= max then
            max := temp
            mtemp := x
          endif
          inc(x)
          hx := h[x]
        until (-hx >= M-max) OR (x > N-i)
        fout[i, j] := max
        m := mtemp
      endfor i
    endfor j

Figure 4.9: Improved algorithm (version 3)

Table 4.1: Running times for the algorithms

    Algorithm       Time
    Naive           559 sec
    Improved (1)     32 sec
    Improved (2)     14 sec

Table 4.2: Running times for the algorithms (seconds). IBM PC/AT 80286/7 16 MHz. Picture of "Lena" 256 × 256 × 8 bits. Language: Topspeed Modula-2 V3.01.

    Scale σ    alg 4.7    alg 4.8    alg 4.9
       1          32         18         18
       3          88         46         43
       4         113         58         52
       5         137         69         61
       6         158         79         70
       7         177         89         78
       8         193         98         86
       9         208        105         93
      10         221        111        100
      11         231        118        106
      12         256        105        111
      13         256        130        117
      14         256        129        121
      15         257        128        126
      16         256        129        130
      17         256        129        134
      18         256        128        139
      19         255        129        142
      20         256        129        146

4.9 Summary


We discuss the semi-group property of scale-spaces and show how this is a consequence of a similar property of the structuring functions under dilation if these structuring functions are anti-convex. By considering "shape" for functions we have been led to a shape invariance property and a generalisation of the umbra concept. Recognising that this is indeed the dimensionality principle of the literature, we have extended the dimensionality principle to scale-space. We have shown that morphological operations using elliptic poweroid structuring functions, which are in general volumic, retain dimensionality in morphological scale-space and enable us to ensure operators retain dimensionality on the original image.

Of the poweroids, the paraboloids have good second derivative and curvature properties. We show a computational benefit of paraboloids by demonstrating a morphological separability property in that a 2-D morphological convolution is reduced to a series of two 1-D convolutions. We mention the semi-group property of morphological dilation and erosion by elliptic paraboloids which has led to them being called the morphological equivalent of the Gaussian convolution kernels. As an application of the separability property of the paraboloid structuring function we have developed an efficient computer algorithm for implementing the morphological operations.

This chapter completes the theoretical development of the dilation-erosion scale-space. In the next chapter we will examine some of the statistical aspects of smoothing a signal by dilating and eroding with poweroids.

Chapter 5 Spectral and Statistical Properties of the Scaled Dilation-Erosion


5.1 Introduction


In the previous two chapters we have established a scale-space theory based on a non-linear filtering operation. Here we pause in this development to build some important general links between this new approach and more conventional (linear) filters. The reader will have noticed that all the work so far has been done in the time (or spatial) domain; we have not even mentioned any statistical properties of our operations. What does our filtering do to random signals? There are many well researched classes of statistical filters, some of them non-linear (for example, the median filters). How do our morphological operations fit in with these established filters? What are the links?

Likewise, the linear filters used in scale-space are low-pass filters in the frequency domain and their derivatives are therefore band-pass. Low-pass linear filters are closely associated with the smoothing of a signal. We have already established in the theorem of chapter 3 that our operation smooths a signal (by reducing the number of extrema); therefore, in what sense do we have a low-pass operation? More generally, does the concept of a frequency response make any sense at all for a non-linear operation?

We make a beginning in answering these questions in this chapter. It happens that we can define a "generalised frequency response" and, as long as care is taken, this approach provides yet another link with the linear filters. We have already indicated that this work is ancillary to the development of the scale-space but we feel it is important that any new technique not be presented in a vacuum. The links with existing methods are crucial if new work is to be accepted and understood. The insight gained by considering these non-linear filters in the frequency and statistical domains is also invaluable in choosing between methods and in developing areas of applicability for the new methods.

This chapter is in two main sections following this introduction. In section 5.2 the response of the filter to a single frequency input signal is developed. This response is developed through determining the "generalised frequency response", which is the ratio of the variances of the output and input signals [see Justusson (1981)]. The analysis here is carried out in the continuous domain and for a 1-D signal. In section 5.3 the response of the discrete form of the smoothing filter to noise input is developed. Specific results are obtained for the case of an input which consists of independent and identically distributed noise from a uniform distribution. Since we have a window with independent noise, this analysis applies as well for images (2-D) or higher dimensional signals. The treatment here follows broadly that used by Justusson (1981) in a paper on the statistical properties of median filters, to which this filter is related.

Kuhlmann & Wise (1981) have considered "second moment" properties of median filters, and later Bovik & Restrepo (1987) established the spectral properties of the more general "L-estimate" filters. As will be seen in section 5.3, the filter considered here is not in the class of L-estimate filters so its analysis is more cumbersome than these standard cases. The author is not aware of any other work on the spectral or statistical properties of the grayscale morphological operations.

5.2 Generalised Frequency Response

5.2.1 Definitions and Notations

Impulse response and frequency response functions are often used to describe filters; however, this usage is based on the linearity property of the filter. Let $\delta(x)$ denote the delta function, which can be defined (Papoulis 1977) as the limit of a family of functions $\delta_c(x)$ such that

$$\int_{-\infty}^{\infty} \delta_c(x)\, dx = 1, \qquad \lim_{c \to 0} \int_{-\infty}^{\infty} \delta_c(x) f(x)\, dx = f(0) \tag{5.1}$$

for any function $f(x)$ continuous at the origin. This leads to the defining identity,

$$\int_{-\infty}^{\infty} \delta(x) f(x)\, dx = f(0). \tag{5.2}$$

Now, we can express the input signal as a linear combination of scaled impulses,

$$f(t) = \int_{-\infty}^{\infty} f(x)\, \delta(t - x)\, dx, \tag{5.3}$$

so we can express the output $y(t)$ of a linear shift-invariant system $L[\cdot]$ as the convolution of the input signal with the filter impulse response, that is,

$$\begin{aligned}
y(t) = L[f(t)] &= L\left[ \int_{-\infty}^{\infty} f(x)\, \delta(t - x)\, dx \right] \\
&= \int_{-\infty}^{\infty} f(x)\, L[\delta(t - x)]\, dx \\
&= \int_{-\infty}^{\infty} f(x)\, h(t - x)\, dx,
\end{aligned} \tag{5.4}$$

where $h(x) = L[\delta(x)]$ is known as the filter impulse response. Since no such relationship holds for non-linear filters, the filter impulse response is of limited value. For example, the impulse response of a median filter is zero (Justusson 1981), which is not very informative! For what it is worth, from the definitions of the morphological operations, it is easy to obtain the following impulse response for the discrete scaled morphological dilation-erosion:

$$(\delta \circledast g_\sigma)(x) = \begin{cases} g_\sigma(x) & \text{if } \sigma > 0; \\ \delta(x) & \text{if } \sigma = 0; \\ 0 & \text{if } \sigma < 0, \end{cases} \tag{5.5}$$

where $x \in \mathbb{Z}^n$ and $\delta(x)$ is the multidimensional discrete impulse function,

$$\delta(x) = \begin{cases} 1 & \text{if } x = 0; \\ 0 & \text{otherwise.} \end{cases} \tag{5.6}$$

Because of the non-linearity of our filter, we do not consider this result very important except perhaps as an indication of the filter response to impulse noise. Note, in the continuous case the signal $\delta(x)$ is unbounded, that is, $\delta(x) \to \infty$ as $x \to 0$, so multiscale dilation-erosion is undefined for this input [see chapter 3].

In linear system theory, we often use the Fourier decomposition of the signal into linear sums of scaled exponentials (Papoulis 1977),

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{j\omega t}\, d\omega, \tag{5.7}$$

where $j = \sqrt{-1}$, and $F(\omega)$ is the Fourier transform of $f(t)$,

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt. \tag{5.8}$$

In particular, because of the symmetry properties of the above equations (5.7) and (5.8), if $f(t)$ is a real even function then we have the simpler versions,

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} R(\omega) \cos \omega t\, d\omega, \qquad R(\omega) = \int_{-\infty}^{\infty} f(t) \cos \omega t\, dt. \tag{5.9}$$

Again, we can express the output $y(t)$ of a linear shift-invariant system $L[\cdot]$ as the linear sum of its responses to the individual spectral components $\{R(\omega) \cos \omega t\}$,

$$\begin{aligned}
y(t) = L[f(t)] &= L\left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} R(\omega) \cos \omega t\, d\omega \right] \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} R(\omega)\, L[\cos \omega t]\, d\omega.
\end{aligned} \tag{5.10}$$

Therefore the "frequency response", or the response of the filter to $f(t) = \cos \omega t$, is important. For non-linear systems the frequency response is less useful but, as mentioned in the introduction, a generalised form of this concept has been used in the literature (Justusson 1981). As Tyan (1981) says:

    Although the law of superposition does not apply to nonlinear filters, a study of the power transferred and that transported to its harmonics still yields important information about the behaviour and merit of each nonlinear filter.

Therefore, in this section we consider the response of the scaled morphological dilation-erosion to 1-D input signals of the form,

$$f(t) = A \cos \omega t; \qquad A, \omega \ge 0, \tag{5.11}$$

where $A$ is known as the amplitude and $\omega$ the angular frequency of the input signal. In passing we note that $f(t)$ is continuous and bounded. In particular we shall be interested in the output of the filter with respect to $\omega$. In contrast to the scale-space approach we only consider the output at a single scale, so we can write the filtering equation as,

$$f_\sigma(t) = (f \circledast g_\sigma)(t). \tag{5.12}$$

For the arguments advanced in the previous chapter, we are interested in structuring functions of the powerbolic family, so we will develop the results for the powerbolic structuring functions

$$g_\sigma(x) = -|\sigma| \left( \frac{|x|}{|\sigma|} \right)^{\alpha}, \qquad \alpha > 1. \tag{5.13}$$

Because of the dilation-erosion duality with respect to negation, equation (2.31), it is easily shown that the statistical and spectral properties to be considered here depend only on the magnitude of $\sigma$. Therefore, to avoid unnecessary complication, throughout this chapter we assume $\sigma > 0$, consider only the dilation, and drop the $|\cdot|$ signs around $\sigma$. Combining equations (5.12) and (5.13), our class of filters is therefore defined by the following equation:

$$f_\sigma(t) = \max_{x \in \mathbb{R}} \left\{ f(t - x) - \sigma \left( \frac{|x|}{\sigma} \right)^{\alpha} \right\}, \qquad \sigma > 0, \; \alpha > 1, \; t \in \mathbb{R}. \tag{5.14}$$

Referring back to equation (5.11) we see that the input signal is periodic [$f(t) = f(t + nT)$, $n \in \mathbb{Z}$] with period $T = 2\pi/\omega$, and is an even function [$f(-t) = f(t)$]. From our knowledge of the cos function we know that (for $A, \omega > 0$) this signal possesses local maxima for $t = kT$, $k \in \mathbb{Z}$, and the signal value at these points is $A$. Putting $x = 0$ in equation (5.14) it is easy to see that the output signal $f_\sigma$ also has local maxima of height $A$ at $t = kT$, $k \in \mathbb{Z}$. Also, from the symmetries in equation (5.14), we see that the output signal $f_\sigma(t)$ is periodic with period $T$, and is even. Therefore it has a line spectrum and can be exactly represented by the "Fourier cosine series" (Picinbono 1988),

$$f_\sigma(t) = a_0 + \sum_{i=1}^{\infty} a_i \cos i\omega t, \tag{5.15}$$

where the coefficients $\{a_i\}$ are given by,

$$a_0 = \frac{2}{T} \int_0^{T/2} f_\sigma(t)\, dt, \qquad a_i = \frac{4}{T} \int_0^{T/2} f_\sigma(t) \cos i\omega t\, dt, \quad i = 1, 2, 3, \ldots \tag{5.16}$$

From these coefficients, we can find the one-sided power spectral density (PSD), which gives the output power at each frequency component (Taub & Schilling 1971),

$$P(0) = a_0^2, \qquad P(i) = \frac{a_i^2}{2}, \quad i = 1, 2, 3, \ldots \tag{5.17}$$

Adding these powers together we get the total power in the output signal,

$$P_{tot} = P(0) + \sum_{i=1}^{\infty} P(i). \tag{5.18}$$

The total output power may also be found directly by obtaining the mean-square of the output signal by integration,

$$P_{tot} = \frac{2}{T} \int_0^{T/2} f_\sigma^2(t)\, dt. \tag{5.19}$$

Note, the equality of equations (5.18) and (5.19) is a consequence of Parseval's theorem (Picinbono 1988). The DC (zero-frequency) power in the output conveys no information, so for signals with a non-zero DC component it is more useful to consider the variance, which is the output power not at DC, that is,

$$\mathrm{Var}_{f_\sigma} = P_{tot} - P(0). \tag{5.20}$$

Likewise for the input signal itself the variance is given by,

$$\mathrm{Var}_f = A^2/2. \tag{5.21}$$

These quantities are in general a function of the frequency of the input signal. We form the ratio,

$$\mathcal{F}(\omega) = \frac{\mathrm{Var}_{f_\sigma}(\omega)}{\mathrm{Var}_f(\omega)}, \tag{5.22}$$

to indicate the proportion of the input power that is passed by the filter. We refer to a graph of $\mathcal{F}(\omega)$ versus $\omega$ as the "generalised frequency response" [called simply the "frequency response" in Justusson (1981)]. For linear shift-invariant filters, where all the output power is at the fundamental frequency (since exponential signals are eigenfunctions of a linear shift-invariant filter), equation (5.22) reduces to the well known standard frequency response of the filter. The vertical axis of frequency response graphs is traditionally scaled in decibels (dB), $\mathcal{F}_{dB}(\omega) = 10 \log_{10} \mathcal{F}(\omega)$, and a logarithmic scale is normally used for the horizontal (frequency) axis, giving the so-called "Bode plot" (Picinbono 1988). Our task is to obtain a Bode plot of the generalised frequency response for the filter of equation (5.14).
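The ratio (5.22) can be estimated numerically by brute force, which gives a useful cross-check on the analytical results that follow. The sketch below is an illustration only: the function names, the grid sizes, and the direct grid search over the substituted variable $\phi = -\omega x$ on $[-\pi, \pi]$ are assumptions made here, and the variance is taken over one sampled period rather than by the integrals (5.16)-(5.19).

```python
import math

def dilated_cosine(t, A, omega, sigma, alpha=2.0, n_phi=801):
    """Filter output (5.14) for f(t) = A cos(omega t), by grid search.

    With phi = -omega * x, the penalty sigma * (|x| / sigma)**alpha becomes
    |phi|**alpha / (omega**alpha * sigma**(alpha - 1)).
    """
    best = -math.inf
    for k in range(n_phi):
        phi = -math.pi + 2.0 * math.pi * k / (n_phi - 1)
        penalty = abs(phi) ** alpha / (omega ** alpha * sigma ** (alpha - 1.0))
        best = max(best, A * math.cos(omega * t + phi) - penalty)
    return best

def generalised_response(A, omega, sigma, alpha=2.0, n_t=200):
    """Estimate of (5.22): output variance over one period divided by A^2 / 2."""
    T = 2.0 * math.pi / omega
    ys = [dilated_cosine(T * (k + 0.5) / n_t, A, omega, sigma, alpha)
          for k in range(n_t)]
    mean = sum(ys) / n_t
    var_out = sum((y - mean) ** 2 for y in ys) / n_t
    return var_out / (A * A / 2.0)

low = generalised_response(1.0, 0.05, 1.0)    # slowly varying input
high = generalised_response(1.0, 50.0, 1.0)   # rapidly varying input
```

Evaluating `10 * math.log10(generalised_response(...))` over a logarithmic range of `omega` would give points for a plot in the Bode style described above.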

5.2.2 The Output Waveform

Since the output signal is periodic we need only determine it over a single period, say, $-T/2 < t \le T/2$. First we rewrite the filter equation in a more convenient form, substituting for the input signal and using the change of variable $\phi = -\omega x$,

$$f_\sigma(t) = \max_{\phi \in \mathbb{R}} \left\{ A \cos(\omega t + \phi) - \frac{|\phi|^{\alpha}}{\omega^{\alpha} \sigma^{\alpha - 1}} \right\}, \qquad A, \omega, \sigma > 0, \; \alpha > 1, \; -T/2 < t \le T/2. \tag{5.23}$$

For each value of $t$, the maximum of the right hand side must occur for $-\pi < \phi \le \pi$, since $\cos(\omega t + \phi)$ must reach its maximum in this range and $-|\phi|^{\alpha}/(\omega^{\alpha} \sigma^{\alpha - 1})$ is a decreasing function of $|\phi|$. Let this maximum occur for $\phi = \phi_0(t)$, then,

$$f_\sigma(t) = \left\{ A \cos(\omega t + \phi) - \frac{|\phi|^{\alpha}}{\omega^{\alpha} \sigma^{\alpha - 1}} \right\}_{\phi = \phi_0(t)}, \qquad A, \omega, \sigma > 0, \; \alpha > 1, \; -T/2 < t \le T/2. \tag{5.24}$$

Now the quantity inside the braces on the RHS of equation (5.24) is differentiable with respect to $\phi$ and therefore the derivative is equal to zero at $\phi = \phi_0(t)$, that is,

$$-A \sin(\omega t + \phi_0(t)) - \mathrm{sgn}(\phi_0(t))\, \frac{\alpha}{\omega^{\alpha} \sigma^{\alpha - 1}}\, |\phi_0(t)|^{\alpha - 1} = 0, \qquad -\pi < \phi_0(t) \le \pi, \tag{5.25}$$

where "sgn" is the sign function [$\mathrm{sgn}(x) = 1$ if $x \ge 0$, $\mathrm{sgn}(x) = -1$ if $x < 0$]. Collecting the constants and rearranging we get

$$\sin(\phi_0 + \omega t) + \left( \frac{\omega}{\omega_c} \right)^{-\alpha} \mathrm{sgn}(\phi_0)\, |\phi_0|^{\alpha - 1} = 0, \qquad -\pi < \phi_0 \le \pi, \tag{5.26}$$

where we have defined,

$$\omega_c = \left( \frac{\alpha}{A} \right)^{1/\alpha} \left( \frac{1}{\sigma} \right)^{1 - 1/\alpha}. \tag{5.27}$$

A quick sketch will show that this equation possesses from one to three solutions for $\phi_0$ depending on the values of $\omega$, $t$ and $\omega_c$. However $t = 0$ implies $\phi_0 = 0$, and $0 < t \le T/2$ implies $-\omega t < \phi_0 \le 0$. Moreover, since the "sin" function is odd, that is, $\sin(\phi_0 + \omega t) = -\sin(-\phi_0 - \omega t)$, if $\phi_0$ corresponds to a certain $t$, then $-\phi_0$ corresponds to $-t$. These relations simplify the numerical determination of the solution of equation (5.26). Additionally, as equation (5.26) is well behaved, a numerical solution can be easily obtained by a method such as combined Newton-Raphson and bisection (Press, Flannery, Teukolsky & Vetterling 1986, sect. 9.4). We have implemented such an algorithm which, given the values for $\omega$, $t$ and $\omega_c$, returns $\phi_0$. Having found $\phi_0$, substituting back into equation (5.24) and rearranging gives,

$$f_\sigma(t) = A \left\{ \cos(\omega t + \phi_0(t)) + \frac{\phi_0(t)}{\alpha} \sin(\omega t + \phi_0(t)) \right\}. \tag{5.28}$$

Together, equations (5.26) and (5.28) give the required output waveform. We have plotted this output waveform in figure 5.1 for various values of $\sigma$ and $\alpha$, with $A$ and $\omega$ held constant. For comparison, the input signal $f(t) = A \cos \omega t$ is also shown. The shape of the output waveform depends on the behaviour of $\phi_0(t)$, which in turn depends on the ratio $\omega/\omega_c$. The nature of this dependence is examined further in the next section.
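As a consistency check on (5.26)-(5.28), the root $\phi_0$ can be found by plain bisection and the resulting waveform compared with a direct maximisation of (5.23). This is a sketch under stated assumptions, not the Newton-Raphson/bisection routine mentioned above: the function names are hypothetical, and bisection on $[-\omega t, 0]$ is only reliable when (5.26) has a single root there (for $\alpha = 2$ this holds when $\omega \le \omega_c$).

```python
import math

def omega_c(A, sigma, alpha):
    """Corner frequency of equation (5.27)."""
    return (alpha / A) ** (1.0 / alpha) * sigma ** (1.0 / alpha - 1.0)

def phi0_bisect(t, A, omega, sigma, alpha=2.0, iters=60):
    """Root of (5.26) on [-omega t, 0] for 0 < t <= T/2, by bisection."""
    c = (omega / omega_c(A, sigma, alpha)) ** (-alpha)

    def lhs(phi):
        return math.sin(phi + omega * t) + c * math.copysign(abs(phi) ** (alpha - 1.0), phi)

    lo, hi = -omega * t, 0.0           # lhs(lo) <= 0 <= lhs(hi) on this interval
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lhs(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def output_from_root(t, A, omega, sigma, alpha=2.0):
    """Output waveform from equation (5.28)."""
    phi = phi0_bisect(t, A, omega, sigma, alpha)
    u = omega * t + phi
    return A * (math.cos(u) + (phi / alpha) * math.sin(u))

def output_brute(t, A, omega, sigma, alpha=2.0, n=20001):
    """Direct grid maximisation of (5.23) over phi in [-pi, pi]."""
    best = -math.inf
    for k in range(n):
        phi = -math.pi + 2.0 * math.pi * k / (n - 1)
        v = A * math.cos(omega * t + phi) - abs(phi) ** alpha / (omega ** alpha * sigma ** (alpha - 1.0))
        best = max(best, v)
    return best
```

For $A = \sigma = 1$ and $\alpha = 2$, `omega_c` evaluates to $\sqrt{2}$, so the choice $\omega = 1$ lies in the single-root regime.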

5.2.3 Asymptotic Results

The word asymptotic in the section heading refers to the behaviour of the output as the angular frequency $\omega$ becomes very small and very large. More precisely, we shall examine the cases $\omega/\omega_c \ll 1$ and $\omega/\omega_c \gg 1$.

[Figure 5.1: plot of $f_\sigma(t)$ against $t$, with one curve for each combination of $\alpha = 2, 4$ and $\sigma = 1/3, 1, 3$.]

Figure 5.1: Output waveform of the scaled dilation $f_\sigma(t) = (f \circledast g_\sigma)(t)$. Input signal $f(t) = \cos t$, structuring function $g_\sigma(x) = -\sigma^{1-\alpha}|x|^{\alpha}$; $\alpha = 2, 4$; $\sigma = \tfrac{1}{3}, 1, 3$.

Low-frequency response

Assuming a parabolic structuring function, $\alpha = 2$, and using the trigonometric identity $\sin(\phi_0 + \omega t) = \sin\phi_0\cos\omega t + \cos\phi_0\sin\omega t$, we can rewrite equation (5.26) as

$$\sin\phi_0\cos\omega t + \cos\phi_0\sin\omega t + \phi_0\left(\frac{\omega}{\omega_c}\right)^{-2} = 0. \tag{5.29}$$

Now, using the first term of the Maclaurin series expansions, $\sin\phi_0 = \phi_0 + O(\phi_0^3)$ and $\cos\phi_0 = 1 + O(\phi_0^2)$, the equation becomes

$$\phi_0\cos\omega t + \sin\omega t + \phi_0\left(\frac{\omega}{\omega_c}\right)^{-2} = O(\phi_0^2),$$

$$\phi_0 \approx \frac{-\sin\omega t}{\cos\omega t + (\omega/\omega_c)^{-2}}. \tag{5.30}$$

We notice that $\phi_0 \to 0$ as $\omega/\omega_c \to 0$, which ensures that the ignored terms of the series expansions are negligible. Now, we see from equation (5.28) that

$$\omega/\omega_c \to 0 \;\Rightarrow\; f_\sigma(t) \to A\cos\omega t, \tag{5.31}$$

that is, the output signal equals the input signal. Or, in filter terms, the signal is passed without loss. Since the DC (zero-frequency) component is zero, the variance of the output is equal to the output power, that is, $\operatorname{Var} f_\sigma(\omega)\big|_{\omega \ll \omega_c} = A^2/2$.

High-frequency response

For high frequencies [$\omega/\omega_c \gg 1$], we use the first term in the Maclaurin series expansion of the sine, that is, $\sin(\omega t + \phi_0) = \omega t + \phi_0 + O((\omega t + \phi_0)^3)$. Rewriting equation (5.26) with $\alpha = 2$,

$$(\omega t + \phi_0) + \phi_0\left(\frac{\omega}{\omega_c}\right)^{-2} = O\left((\omega t + \phi_0)^3\right),$$

$$\phi_0 \approx \frac{-\omega t}{1 + (\omega/\omega_c)^{-2}}, \qquad -T/2 < t \le T/2. \tag{5.32}$$

We note that $\phi_0 + \omega t \to 0$ as $\omega/\omega_c \to \infty$, which ensures that the ignored terms of the series expansions are negligible. Now, from equation (5.28) we see that

$$\omega/\omega_c \to \infty \;\Rightarrow\; f_\sigma(t) \to A. \tag{5.33}$$

That is, the output signal is constant at the maximum value of the input signal. Since the signal is constant, the variance of the output is zero, $\operatorname{Var} f_\sigma(\omega)\big|_{\omega \gg \omega_c} = 0$, although the DC component is large. Recalling the previous result for the low-frequency variance, we see that, in terms of output variance, the smoother is acting as a low-pass filter. This fact is not unexpected since linear smoothers are also low-pass filters. It is now necessary to study the (generalised) frequency response of our filter to determine the shape of the frequency response curve and, in particular, the "corner" frequency and slope of the roll-off (Picinbono 1988).

Generalised frequency response

Before proceeding to establish by numerical methods the full frequency response, we can improve the asymptotic results of the previous section to establish an approximate frequency response. Again assuming $\omega/\omega_c \gg 1$ and $\alpha = 2$, equation (5.24) can be written as

$$\frac{f_\sigma(t)}{A} = \cos(\omega t + \phi_0) - \frac{\phi_0^2}{2}\left(\frac{\omega}{\omega_c}\right)^{-2}. \tag{5.34}$$

Substituting the first two terms in the Maclaurin expansion of $\cos(\omega t + \phi_0)$ we get

$$\frac{f_\sigma(t)}{A} \approx 1 - \frac{(\omega t + \phi_0)^2}{2} - \frac{\phi_0^2}{2}\left(\frac{\omega}{\omega_c}\right)^{-2}, \qquad -T/2 < t \le T/2. \tag{5.35}$$

Substituting for $\phi_0$ from equation (5.32) and simplifying,

$$\frac{f_\sigma(t)}{A} \approx 1 - \left(1 + \left(\frac{\omega}{\omega_c}\right)^{2}\right)^{-1}\frac{(\omega t)^2}{2} = 1 - G(\omega)\,\frac{(\omega t)^2}{2}, \qquad -T/2 < t \le T/2. \tag{5.36}$$

Here, the $(\omega t)^2/2$ term represents the approximate "waveform" of the output at high frequencies and $G(\omega) = (1 + (\omega/\omega_c)^2)^{-1}$ is the amplitude of this waveform. The variance of the output signal is proportional to $G^2(\omega)$, with the constant of proportionality depending upon the waveform. To find this constant for high frequencies we find the power about the mean in a signal of shape $(\omega t)^2/2$, $-T/2 < t \le T/2$:

$$C = \frac{2}{T}\int_0^{T/2} \frac{(\omega t)^4}{4}\,dt - \left(\frac{2}{T}\int_0^{T/2} \frac{(\omega t)^2}{2}\,dt\right)^{2} = \frac{\pi^4}{20} - \frac{\pi^4}{36} = \frac{\pi^4}{45} \approx 2.165. \tag{5.37}$$
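The constant in equation (5.37) is easy to confirm numerically. The following sketch integrates the waveform $u^2/2$ with $u = \omega t$ over $0 \le u \le \pi$ using the trapezoidal rule and compares the resulting variance with $\pi^4/45$, which is about 2.1646. The function name and the number of grid points are arbitrary choices for this illustration.

```python
import numpy as np

def waveform_constant(n=200001):
    """Variance of the shape u**2 / 2 for u = omega*t uniform on [0, pi]."""
    u = np.linspace(0.0, np.pi, n)
    du = np.diff(u)
    shape = u ** 2 / 2.0

    def mean_over_half_period(y):
        # Trapezoidal rule, divided by the interval length pi.
        # This equals (2/T) * integral from 0 to T/2 of y dt when u = omega*t.
        return np.sum((y[1:] + y[:-1]) * du) / 2.0 / np.pi

    return mean_over_half_period(shape ** 2) - mean_over_half_period(shape) ** 2

C = waveform_constant()
C_exact = np.pi ** 4 / 45.0
```

With this value of $C$, the high-frequency output variance in the approximation (5.36) is $A^2 C\,G^2(\omega)$.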

At low frequencies the waveform is close to a sine wave, so the constant reduces to $C \approx 0.5$. The variance of the input signal is $A^2/2$, so the generalised frequency response is

$$F_\sigma(\omega) = \frac{\operatorname{Var} f_\sigma(\omega)}{\operatorname{Var} f(\omega)} = 2C\,\frac{1}{(1 + (\omega/\omega_c)^2)}\,\frac{1}{(1 + (\omega/\omega_c)^2)}, \tag{5.38}$$

which can immediately be identified as the transfer function of a cascade of two first-order Butterworth filters (Johnson 1976) with corner frequencies of $\omega_c$. This is an interesting finding! This is also the justification for the previous choice of the notation $\omega_c$ for the quantity $(\alpha/A)^{1/\alpha}\sigma^{\frac{1}{\alpha}-1}$ at equation (5.27). For low-pass filters $\omega_c$ denotes the "corner frequency" where the frequency response begins to "roll off" with increasing frequency. A very important point here is that the corner frequency of our filter is dependent on $A$, the amplitude of the input signal. For linear filters the frequency response is independent of signal amplitude, so our finding is an outcome of the non-linearity of the filter, and is related directly to its origin as a "morphological" (shape based) filter. In the sense of section 4.3, we note that increasing the amplitude of a sine wave does change its shape. For high frequencies, $\omega/\omega_c \gg 1$, the generalised frequency response can be expressed in decibels as

$$F_{\mathrm{dB}} = 10\log_{10}\omega^{-4} = -40\log_{10}\omega, \tag{5.39}$$

which gives a roll-off slope of 40 dB per decade, that is, increasing the input frequency by a factor of 10 results in a decrease in the output variance by 40 dB.

One obvious question is whether a quartic ($\alpha = 4$) structuring function would lead to a transfer function approximated at high frequencies by a cascade of 4 Butterworth filters, with a roll-off of 80 dB/decade. Theoretically this problem becomes rather intractable, so an alternative approach is required. In any case we have used several asymptotic approximations which should be verified. Therefore we now seek to numerically determine the generalised frequency response of our filter. Since we are no longer constrained by algebra we shall obtain responses for both parabolic and quartic structuring functions. The method is quite straightforward: we numerically implement equations (5.16) to (5.20) on the function defined by (5.26) to (5.28). Integration is performed using the "Simpson's rule" integration procedure given in (Press et al. 1986, p.113). The results are shown in figure 5.2. Indeed the quartic structuring function does lead to a roll-off of 80 dB/decade in the generalised frequency response. In the case of flat structuring functions, $\alpha = \infty$, it is apparent that the output variance will be exactly zero for $\omega \ge \pi/\sigma$. This is not inconsistent with the notion of an infinite roll-off. Therefore it may be appropriate to refer to the exponent $\alpha$ in the powerbolic structuring functions as the "order" of the filter, although this usage is only a suggestion in a field crowded with inconsistent terminology [see, for example, section 2.3.1].

From chapter 3 we know that morphological scale-space deals with local extrema of the signal. If we are forming the scale-space of a 1-D signal it may be appropriate to work with the first derivative of the signal instead of the original. In this case the local extrema of the derivative correspond directly to edges in the signal (or zero-crossings of the second derivative), which provides a link with Gaussian scale-space where such zero-crossings are used. Therefore it is worthwhile to examine the generalised frequency response for the system comprising a differential operator followed by our scaled dilation-erosion.
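The numerical determination of the generalised frequency response can be sketched as follows for the parabolic case. This is not the procedure of equations (5.16) to (5.20) with Simpson's rule: as a simpler stand-in, it evaluates the maximisation of equation (5.24) on a grid of $\phi$ and takes the sample variance over one period. The function name, the grid sizes, and the parameter choice $A = 1$, $\sigma = 2$ (which gives $\omega_c = (\alpha/A)^{1/\alpha}\sigma^{(1-\alpha)/\alpha} = 1$ for $\alpha = 2$) are assumptions for this illustration.

```python
import numpy as np

def output_variance_ratio(omega, A=1.0, sigma=2.0, alpha=2.0, n_t=400, n_phi=4001):
    """Var(output) / Var(input) for f(t) = A*cos(omega*t) under the scaled dilation."""
    period = 2.0 * np.pi / omega
    # Midpoints of n_t equal cells covering one period, -T/2 < t < T/2.
    t = (np.arange(n_t) + 0.5) / n_t * period - period / 2.0
    phi = np.linspace(-np.pi, np.pi, n_phi)
    penalty = np.abs(phi) ** alpha / (omega ** alpha * sigma ** (alpha - 1.0))
    # Brute-force maximisation of the braces in equation (5.24) for every t.
    out = (A * np.cos(omega * t[:, None] + phi[None, :]) - penalty[None, :]).max(axis=1)
    return out.var() / (A ** 2 / 2.0)

# Relative frequencies omega / omega_c = 0.01, 10 and 100 (omega_c = 1 here).
ratios = {w: output_variance_ratio(w) for w in (0.01, 10.0, 100.0)}

# Change in output variance, in dB, over the decade from 10 to 100.
slope_db_per_decade = 10.0 * np.log10(ratios[100.0] / ratios[10.0])
```

Under these assumptions the ratio is close to 1 well below the corner frequency, and the slope over the last decade is close to the $-40$ dB per decade predicted by equation (5.39). Passing `alpha=4.0` (with a correspondingly finer grid) is one way the quartic case could be explored.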

[Figure 5.2: plot of relative output variance $V_{out}/V_{in}$ (in dB) against relative frequency $W/W_c$ (0.1 to 10), with curves labelled Butterworth, Parabolic sf and Quartic sf.]

Figure 5.2: Generalised frequency response of the scaled dilation-erosion.

Frequency response when preceded by a differentiator

For an input signal of $f(t) = A\sin\omega t$ the derivative (with respect to $t$) is $f'(t) = \omega A\cos\omega t$, which is the input used previously with the amplitude $A$ replaced by $\omega A$. If this signal is now passed through the filter with the response of equation (5.38), then the overall frequency response is given by

$$F_\sigma(\omega) = \frac{\operatorname{Var} f_\sigma(\omega)}{\operatorname{Var} f(\omega)} = 2C\,\frac{\omega^2}{(1 + (\omega/\omega_c)^2)^2}. \tag{5.40}$$

This is a bandpass response, shown graphically in figure 5.3. That this is a bandpass response with centre frequency dependent on a scale parameter has implications for its use in image processing, where it provides some parallel to the spatial-frequency-tuned channels in early human vision (Marr et al. 1979). We note, however, that for images (2-D signals) the equivalence between zero-crossings of Laplacians (used in Gaussian scale-space) and local extrema of first derivatives no longer exists as in the 1-D case. The use of multiscale dilation-erosion on the derivatives of multi-dimensional signals is a potential area for further research. At this point, we will leave the frequency-domain analysis of the scaled dilation-erosion. In the next section we will undertake a statistical analysis of this filter.

[Figure 5.3: plot of relative output variance $V_{out}/V_{in}$ (in dB) against relative frequency $W/W_c$ (0.1 to 10), with curves labelled Butterworth and Parabolic sf.]

Figure 5.3: Generalised frequency response of the scaled dilation-erosion for $\alpha = 2$ and $\alpha = 4$ when preceded by a differentiator.

5.3 Statistical Properties

In this section we view the discrete form of the multiscale dilation-erosion as a "statistical filter", that is, the value of the output, $Z_t$, $t \in \mathbb{Z}$, is a statistic based on the values of the input $X_i$, $i \in \mathbb{Z}$, in some "window" around the point $t$. We can therefore write,

$$Z_t = T(X_{t-n}, X_{t-n+1}, \ldots, X_{t+n}), \qquad t \in \mathbb{Z}, \tag{5.41}$$

where $T$ is any statistic and the window width is $2n + 1$. In image processing the most common forms of statistical filters are perhaps the moving average filter (where $T$ is the mean) and the median filter (where $T$ is the median). To place the scaled dilation-erosion in this framework we sample the scaled structuring function on the discrete grid to obtain "structuring function weights"

$$g_i = g_\sigma(i), \qquad i = -n, -n+1, \ldots, n. \tag{5.42}$$

For the discrete case the terminology of "window" for the support region, and structuring function "weights", is in accordance with the usual terminology for such moving statistical filters (Justusson 1981). Now we can write the discrete forms of the scaled dilation and scaled erosion as

$$Z_t = \max\{X_{t-n} + g_n,\; X_{t-n+1} + g_{n-1},\; \ldots,\; X_{t+n} + g_{-n}\}, \tag{5.43}$$

$$Z_t = \min\{X_{t-n} - g_{-n},\; X_{t-n+1} - g_{-n+1},\; \ldots,\; X_{t+n} - g_n\}, \qquad t \in \mathbb{Z}, \tag{5.44}$$

which are in the form of equation (5.41).

The links with established classes of statistical filters are worth briefly exploring. One approach is to recognise that the median, maximum, and minimum are examples of "order statistics" (David 1981), also commonly called "rank order statistics" (Nodes & Gallagher, Jr. 1982), or even "rank statistics" (Meddis 1984). The $r$-th order statistic of a set of data is obtained by ranking all the data into increasing size and then taking the $r$-th element.

Definition 5.1 (Order Statistics): The $r$-th order statistic $X_{(r)}$ of a set of data $\{X_i;\ i = 1, 2, \ldots, n\}$ is defined by

$$X_{(r)} = r\text{-th smallest value of } \{X_i\}, \qquad i = 1, 2, \ldots, n.$$

Therefore we have

$$X_{(1)} \le X_{(2)} \le \ldots \le X_{(r)} \le \ldots \le X_{(n)}. \tag{5.45}$$

So, in this way $X_{(n)}$ is the maximum of the data, $X_{(1)}$ is the minimum, and $X_{((n+1)/2)}$ is the median (for $n$ odd). The moving median filter was first introduced by Tukey (1971) for smoothing time-series and quickly became widely used both in statistics (Tukey 1977, Hartwig & Dearing 1979) and in image processing (Pratt 1978, Justusson 1981, Nodes & Gallagher, Jr. 1982). The main property of median filtering is that it removes impulse noise but retains sharp edges and constant areas in the signal. The moving maximum and minimum filters have been used to remove "salt-and-pepper" noise from images by Nakagawa & Rosenfeld (1978).

Many generalisations have been made to order statistics filters (Lee & Kassam 1985, Nieminen, Heinonen & Neuvo 1987) to obtain new properties or to unify the many types of filters (Maragos & Schafer 1987a, Maragos & Schafer 1987b). Many of these filters are based on well-known point estimators from statistics (Lehmann 1983). For example, the mean minimises the quantity $\sum (X_i - a)^2$ and the median minimises $\sum |X_i - a|$; therefore one way to generalise these estimators is to find the quantity $a$ that minimises $\sum \rho(X_i - a)$, where $\rho$ is a convex, even function (Lehmann 1983). This leads to the class of M-estimators, of which the mean and median are special cases; the corresponding moving window filters are called M-filters (Lee & Kassam 1985). Another point estimator is formed by taking linear combinations of order statistics, that is, $Z = \sum_{i=-n}^{n} A_i X_{(i)}$. This is the L-estimator (Lehmann 1983); again the corresponding moving window filters are called L-filters (Lee & Kassam 1985). If $A_i = \frac{1}{2n+1}$, $i = -n, -n+1, \ldots, n$, then we have the mean filter; if $A_0 = 1$, $A_i = 0$, $i \ne 0$, we have a median filter; if $A_{-n} = 1$, $A_i = 0$, $i \ne -n$, we have a minimum filter; if $A_n = 1$, $A_i = 0$, $i \ne n$, we have a maximum filter.

Another form of order statistic filter is the "weighted median filter" where, instead of taking the median of a set of data $\{X_{-n}, X_{-n+1}, \ldots, X_n\}$, we weight the data by repeating elements of this set $w_i$ times before taking the median (Loupas, McDicken & Allan 1989). For example, if $n = 1$ and we have weights $w_{-1} = 2$, $w_0 = 3$, $w_1 = 2$, we take the median of the extended set $\{X_{-1}, X_{-1}, X_0, X_0, X_0, X_1, X_1\}$. In this way the ability of the filter to preserve the signal increases but the noise suppression is reduced (Loupas et al. 1989). In this light we can see equations (5.43) and (5.44), the discrete dilation and erosion, as "additively weighted maximum" and "additively weighted minimum" operations, thereby providing a unification with order statistics filters.

As a generalisation, instead of taking the minimum and maximum (or 1-st and $n$-th order statistics), we may take the $k$-th and $(n - k)$-th order statistics in equations (5.43) and (5.44) to give a "soft" morphology where the structuring functions are soft and able to be penetrated by $k$ data points (Koskinen & Astola 1992). This would make the operations more robust to impulse noise and may well be worthy of further research. In the context of this thesis, we do not know whether such soft morphology would be suitable to construct a scale-space.

As a final comment we note that the classes of "stack filters" (Wendt, Coyle & Gallagher 1986) and "generalised stack filters" (Lin & Coyle 1990), which are filters that commute with thresholding, have been developed. These filters unify the classes of linear, order statistic, and morphological filters. Lin & Coyle (1990) have provided some results in selecting optimum filters from this very large class.
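The reading of equations (5.43) and (5.44) as additively weighted maximum and minimum filters can be sketched directly. In the code below the function names are hypothetical, the weights follow the powerbolic form $g_i = -|\sigma|^{1-\alpha}|i|^{\alpha}$, and samples outside the signal are simply ignored by padding with $\mp\infty$; that boundary rule is an assumption of this sketch, not something specified in the text.

```python
import numpy as np

def powerbolic_weights(sigma, alpha, n):
    """Sampled weights g_i = -|sigma|**(1 - alpha) * |i|**alpha for i = -n..n."""
    i = np.arange(-n, n + 1, dtype=float)
    return -(abs(float(sigma)) ** (1.0 - alpha)) * np.abs(i) ** alpha

def dilate(x, g):
    """Equation (5.43): Z_t = max over the window of X_{t+i} + g_{-i}."""
    x = np.asarray(x, dtype=float)
    n = (len(g) - 1) // 2
    xp = np.pad(x, n, constant_values=-np.inf)      # ignore samples outside x
    # The window xp[t : t+2n+1] holds X_{t-n}..X_{t+n}; pair it with g_n..g_{-n}.
    return np.array([np.max(xp[t:t + 2 * n + 1] + g[::-1]) for t in range(len(x))])

def erode(x, g):
    """Equation (5.44): Z_t = min over the window of X_{t+i} - g_i."""
    x = np.asarray(x, dtype=float)
    n = (len(g) - 1) // 2
    xp = np.pad(x, n, constant_values=np.inf)       # ignore samples outside x
    # The window holds X_{t-n}..X_{t+n}; pair it with g_{-n}..g_n.
    return np.array([np.min(xp[t:t + 2 * n + 1] - g) for t in range(len(x))])

g = powerbolic_weights(sigma=1.0, alpha=2.0, n=2)   # weights -4, -1, 0, -1, -4
x = np.array([0.0, 0.0, 5.0, 0.0, 0.0])             # a single impulse
z_dil = dilate(x, g)
z_ero = erode(x, g)
```

For the impulse input the dilation spreads the peak along the parabola, giving the values 1, 4, 5, 4, 1, while the erosion reduces it to 0, 0, 1, 0, 0.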
We will begin now by obtaining some statistical properties of the output of the discrete dilation when driven by noise. To allow the greatest generality we will for now allow the structuring function weights to be constrained only by being finite within some specified window. Later, in section 5.3.2, we use the powerbolic weights to obtain results corresponding to the filters of the previous section.

5.3.1 General Results for Noise Input

Let the inputs $X_i$ in equation (5.43) be independent and identically distributed with cumulative distribution function (cdf) $F_X(x) = \Pr\{X \le x\}$ and probability density function (pdf) $f_X(x) = \frac{d}{dx}F_X(x)$. Further, let $Y_i = X_{t+i} + g_{-i}$. The cdf of $Y_i$ is

$$\begin{aligned} F_{Y_i}(x) &= \Pr\{Y_i \le x\} \\ &= \Pr\{X_{t+i} + g_{-i} \le x\} \\ &= \Pr\{X_{t+i} \le x - g_{-i}\} \\ &= F_X(x - g_{-i}). \end{aligned} \tag{5.46}$$

The corresponding pdf is $f_{Y_i}(x) = f_X(x - g_{-i})$. Note that because the $X_i$ are independent the $Y_i$ are also independent. Now consider the cdf of the output of the filter $Z_t$,

$$F_{Z_t}(x) = \Pr\{Z_t \le x\} = \Pr\{\text{all } Y_i \le x\} = \prod_{i=-n}^{+n} F_X(x - g_{-i}). \tag{5.47}$$

Differentiating to get the pdf of $Z_t$,

$$f_{Z_t}(x) = \left[\prod_{i=-n}^{+n} F_X(x - g_{-i})\right] \sum_{i=-n}^{+n} \frac{f_X(x - g_{-i})}{F_X(x - g_{-i})}. \tag{5.48}$$

Note, this formula for the pdf of the maximum of independent non-identically distributed variates is given in (David 1981, Ex 2.3.2). Since $F_{Z_t}$ does not depend on $t$, $\{Z_t\}$ is a strictly stationary process, so the subscript $t$ will be dropped from now on. To clarify: the distribution of the output sequence depends only on the distribution of the input $F_X$, and the structuring function weights $\{g_i\}$.
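Equation (5.47) can be checked by simulation. The sketch below assumes, purely for illustration, uniform $U[0, 1]$ input noise and a hand-picked set of symmetric weights; neither choice comes from this sub-section, which is stated for general $F_X$ and $\{g_i\}$. It compares the empirical value of $\Pr\{Z \le x_0\}$ with the product formula at one point $x_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
g = np.array([-1.0, -0.25, 0.0, -0.25, -1.0])   # illustrative weights g_{-2}..g_{2}
n = 2
N = 200_000

X = rng.random(N + 2 * n)                        # i.i.d. U[0, 1) input samples
# Each row of `windows` holds X_{t-n}..X_{t+n}; pair it with g_n..g_{-n} as in (5.43).
windows = np.lib.stride_tricks.sliding_window_view(X, 2 * n + 1)
Z = (windows + g[::-1]).max(axis=1)

def F_X(x):
    """cdf of the U[0, 1] distribution."""
    return np.clip(x, 0.0, 1.0)

def F_Z(x):
    """Equation (5.47): product of F_X(x - g_{-i}) over the window."""
    return np.prod(F_X(x - g))

x0 = 0.5
empirical = np.mean(Z <= x0)
predicted = F_Z(x0)
```

For these weights the product at $x_0 = 0.5$ is $1 \times 0.75 \times 0.5 \times 0.75 \times 1 = 0.28125$, and the empirical frequency should agree to within sampling error.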

The Mean

The mean of the output signal is

$$\begin{aligned} \mu_Z = E(Z) &= \int_{-\infty}^{\infty} x f_Z(x)\,dx \\ &= \int_{-\infty}^{\infty} x \left[\prod_{j=-n}^{+n} F_X(x - g_{-j})\right] \sum_{i=-n}^{+n} \frac{f_X(x - g_{-i})}{F_X(x - g_{-i})}\,dx \\ &= \sum_{i=-n}^{+n} \int_{-\infty}^{\infty} x \prod_{\substack{j=-n \\ j \ne i}}^{+n} F_X(x - g_{-j})\; dF_X(x - g_{-i}). \end{aligned} \tag{5.49}$$

Following the method in David (1981, p. 34) we now show the conditions for the existence of $\mu_Z$. If $E(X)$ exists, and if $|g_i| < \infty$, then $E(Y_i) = E(X + g_{-i}) = E(X) + g_{-i}$ exists. Noting that

$$E(Y_i) = \int_{-\infty}^{\infty} x\, dF_X(x - g_{-i}), \tag{5.50}$$

and since $0 \le F_X \le 1$,

$$|\mu_Z| \le \sum_{i=-n}^{+n} \int_{-\infty}^{\infty} |x|\, dF_X(x - g_{-i}), \tag{5.51}$$

and $\mu_Z$ exists if $E(X)$ does.

The Autocovariance

To obtain $\gamma(p)$, the autocovariance of $Z$ at lag $p$, we first obtain the joint cdf, $F_{Z_t, Z_{t+p}}(x, y) = \Pr\{Z_t \le x, Z_{t+p} \le y\}$. The joint cdf is independent of $t$ because of stationarity and, if $p > 2n$, is the product of $F_{Z_t}(x)$ and $F_{Z_{t+p}}(y)$ because of independence. Thus it is sufficient to consider the case $0 < p \le 2n$. Expanding $Z_t$ and $Z_{t+p}$,

$$Z_t = \max(X_{t-n} + g_n,\; \ldots,\; X_{t-n+p} + g_{n-p},\; \ldots,\; X_{t+n} + g_{-n}), \tag{5.52}$$

$$Z_{t+p} = \max(X_{t-n+p} + g_n,\; \ldots,\; X_{t+n} + g_{-n+p},\; \ldots,\; X_{t+n+p} + g_{-n}). \tag{5.53}$$

From the definition of conditional probability (Hogg & Craig 1978), we invoke the identity

$$\Pr\{Z_t \le x, Z_{t+p} \le y\} = \Pr\{Z_t \le x\} \cdot \Pr\{Z_{t+p} \le y \mid Z_t \le x\}, \tag{5.54}$$

where $\mid$ denotes a conditional probability. The complications are due to the common random terms $\{X_i;\ i = t-n+p, \ldots, t+n\}$ in the overlapping parts of equations (5.52) and (5.53). These terms, however, have different structuring function weights in the two corresponding parts. Using equation (5.54), these terms lead to probabilities of the form $\Pr\{X \le \eta \mid X \le \xi\}$. Using conditional probability again, we have

$$\Pr\{X \le \eta \mid X \le \xi\} = \begin{cases} \dfrac{\Pr\{X \le \eta\}}{\Pr\{X \le \xi\}} = \dfrac{F_X(\eta)}{F_X(\xi)}, & \eta < \xi, \\[2ex] 1, & \eta \ge \xi. \end{cases} \tag{5.55}$$


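We now introduce some notation to denote a conditional sum or product,

$$\prod_{\substack{i=a \\ (cond)}}^{b} f_i, \qquad \sum_{\substack{i=a \\ (cond)}}^{b} f_i,$$

where the sum or product is over $i = a$ to $i = b$ subject to $cond = \text{true}$. By careful inspection of equations (5.52) and (5.53) we can write down the joint distribution function

$$\begin{aligned} F_{Z_t, Z_{t+p}}(x, y) &= \Pr\{Z_t \le x, Z_{t+p} \le y\} \\ &= \prod_{i=-n}^{n} F_X(x - g_i) \prod_{i=-n}^{-n+p-1} F_X(y - g_i) \prod_{\substack{i=-n+p \\ (x - g_{i-p} > y - g_i)}}^{n} \left\{\frac{F_X(y - g_i)}{F_X(x - g_{i-p})}\right\} \\ &= \prod_{i=n-p+1}^{n} F_X(x - g_i) \prod_{\substack{i=-n \\ (x - g_i \le y - g_{i+p})}}^{n-p} F_X(x - g_i) \prod_{i=-n}^{-n+p-1} F_X(y - g_i) \prod_{\substack{i=-n+p \\ (x - g_{i-p} > y - g_i)}}^{n} F_X(y - g_i). \end{aligned} \tag{5.56}$$

The joint pdf is obtained as usual by differentiating the joint cdf,

$$\begin{aligned} f_{Z_t, Z_{t+p}}(x, y) &= \frac{\partial^2 F_{Z_t, Z_{t+p}}(x, y)}{\partial x\, \partial y} \\ &= \prod_{i=n-p+1}^{n} F_X(x - g_i) \prod_{\substack{i=-n \\ (x - g_i \le y - g_{i+p})}}^{n-p} F_X(x - g_i) \prod_{i=-n}^{-n+p-1} F_X(y - g_i) \prod_{\substack{i=-n+p \\ (x - g_{i-p} > y - g_i)}}^{n} F_X(y - g_i) \\ &\quad \times \left[\sum_{i=n-p+1}^{n} \frac{f_X(x - g_i)}{F_X(x - g_i)} + \sum_{\substack{i=-n \\ (x - g_i \le y - g_{i+p})}}^{n-p} \frac{f_X(x - g_i)}{F_X(x - g_i)}\right] \\ &\quad \times \left[\sum_{i=-n}^{-n+p-1} \frac{f_X(y - g_i)}{F_X(y - g_i)} + \sum_{\substack{i=-n+p \\ (x - g_{i-p} > y - g_i)}}^{n} \frac{f_X(y - g_i)}{F_X(y - g_i)}\right]. \end{aligned} \tag{5.57}$$

If $p > 2n$, the expansions of $F_{Z_t}$ and $F_{Z_{t+p}}$ have no terms in common and the joint cdf is the product of the individual cdf's. In this case the autocovariance will be zero (because of the independence of $X_i$, $X_j$ for $i \ne j$).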


To complete this sub-section, we state the formula for the autocovariance of $Z$ (Hogg & Craig 1978). Using the joint density we have just found, the autocovariance is

$$\begin{aligned} \gamma_Z(p) &= E\left[(Z_t - \mu_Z)(Z_{t+p} - \mu_Z)\right] \\ &= E(Z_t Z_{t+p}) - \mu_Z^2 \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{Z_t, Z_{t+p}}(x, y)\,dx\,dy - \mu_Z^2. \end{aligned} \tag{5.58}$$

These results are completely general; they refer to the maximum of any set of independent, identically distributed, additively weighted variates. The first point to note is that these results also apply directly by symmetry to the minimum of the set and thus to the morphological erosion. Secondly, although the notation has implied a 1-D filter, the inputs $\{X_i\}$ may have been taken from a 2-D (or higher dimension) neighbourhood surrounding $X_t$. Having dealt with the general situation, we now choose a specific input noise distribution and suitable structuring function weights.
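The statement that the autocovariance vanishes for lags $p > 2n$ can be illustrated by simulation rather than by evaluating the double integral in (5.58). The sketch below assumes uniform $U[0, 1]$ input noise and an illustrative set of symmetric weights with $n = 2$; both are choices made for this example, and the estimator is the ordinary sample autocovariance.

```python
import numpy as np

rng = np.random.default_rng(1)
g = np.array([-1.0, -0.25, 0.0, -0.25, -1.0])   # illustrative symmetric weights, n = 2
n = 2
N = 200_000

X = rng.random(N + 2 * n)                        # i.i.d. U[0, 1) input samples
# Rows hold X_{t-n}..X_{t+n}; pairing with g_n..g_{-n} gives the dilation (5.43).
windows = np.lib.stride_tricks.sliding_window_view(X, 2 * n + 1)
Z = (windows + g[::-1]).max(axis=1)

def sample_autocovariance(z, p):
    """Sample estimate of gamma_Z(p) = E(Z_t Z_{t+p}) - mu_Z**2."""
    zc = z - z.mean()
    return np.mean(zc[:len(z) - p] * zc[p:])

# Lags 0 .. 2n+2: the last two lags lie beyond the window overlap.
acov = {p: sample_autocovariance(Z, p) for p in range(0, 2 * n + 3)}
```

In this sketch the estimates at lags $2n + 1$ and $2n + 2$ should be indistinguishable from zero at this sample size, while the lag-1 estimate is clearly positive because neighbouring outputs share input samples.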

5.3.2 Results for uniform noise and powerbolic structuring function weights

The multiscale dilation-erosion is defined for bounded input signals, so the distribution of the noise input should have finite support. The uniform distribution is the prototype and simplest of all such distributions, so we will let $X \sim U[0, 1]$, that is,

$$f_X(x) = \begin{cases} 1 & \text{if } 0 \le x < 1, \\ 0 & \text{elsewhere,} \end{cases} \tag{5.59}$$

$$F_X(x) = \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } 0 \le x < 1, \\ 1 & \text{if } x \ge 1. \end{cases} \tag{5.60}$$

The uniform distribution is commonly used in work on order statistics (see, for example, David (1981)). Inagaki (1980) recently used this distribution to calculate the autocovariance function for the moving maximum statistic; our filter reduces to this case for the flat structuring function, $g_i = 0$ for $i = -n, \ldots, n$, which corresponds to poweroid weights with $\alpha = \infty$. Now we consider the structuring function weights given by the discrete form of equation (5.13),

$$g_i = -|\sigma|^{1-\alpha}|i|^{\alpha}, \qquad i \in \mathbb{Z},\ 1 < \alpha. \tag{5.61}$$

For these weights, we have

$$g_0 = \max_{i \in \mathbb{Z}}(g_i) = 0, \tag{5.62}$$

$$0 > g_i > g_j \quad \text{for all } 0 < i < j, \tag{5.63}$$

$$g_{-i} = g_i. \tag{5.64}$$

Since the input signal is bounded by $[0, 1]$, from equation (3.21) we can obtain a formula for the necessary size of the window,

$$n = \lfloor |\sigma|^{(\alpha-1)/\alpha} \rfloor. \tag{5.65}$$

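In general, if $f_X(x) = 0$ for $x < a$ and $x > b$, then $n = \lfloor |\sigma|^{(\alpha-1)/\alpha}(b - a)^{1/\alpha} \rfloor$. We begin by finding the cdf of the output signal $Z$. Substituting equations (5.60) and (5.61) into equation (5.47), we have

$$F_Z(x) = \prod_{\substack{i=-n \\ (x + |\sigma|^{1-\alpha}|i|^{\alpha} < 1)}}^{+n} \left(x + |\sigma|^{1-\alpha}|i|^{\alpha}\right), \qquad 0 < x < 1. \tag{5.66}$$

[...]

[...] 0, if

$$X \circ B(r) \ne \emptyset \quad \text{and} \quad CN(X) = CN[X \circ B(r)] = 1,$$

we have the following relation:

$$Z\{\partial[X \circ B(r)]\} \le Z(\partial X),$$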

and $Z\{\partial[X \circ B(r)]\}$ is monotonic decreasing as $r$ increases. □

This leads to a scale-space causality property if we take zero-crossings of the curvature function along the boundary of image objects as the image feature of interest. This was the first paper to obtain any scale-space causality property for a morphological operation and has received some attention because of this fact. However, because of the method of proof, the Chen & Yan (1989) paper has several limitations. Firstly, the scale-space result is applicable only to binary images where the boundary of objects is a simple closed planar curve; secondly, scale can only be increased to the point where the image objects split into multiple components; thirdly, the theorem applies only to scaled disks (where radius is the scale parameter) as the structuring element. Fourthly, the theorem applies only to the multiscale opening operation (although it can be applied to the closing through the duality of opening and closing: equation (2.44)).

[Figure 6.1: sketches of $f$, $f \circ g_1$ and $f \circ g_2$, with the structuring functions $g_1$ and $g_2$ drawn alongside.]

Figure 6.1: A counter-example to the proof of the theorem in Chen & Yan (1989). $f$ has 3 irregular points; $f \circ g_1$ has 2 irregular points; while $f \circ g_2$ has 1 irregular point.

Note: there is an unimportant error in the proof of the theorem in (Chen & Yan 1989), where it is stated that "the number of irregular points on $\partial[X \circ B(r)]$ will not decrease as $r$ increases". The counter-example of figure 6.1 shows clearly that this is not the case. Fortunately this part of the proof is unnecessary, as the required result can be obtained more directly from the generalised idempotency property $[X \circ B(r + \epsilon)] = [X \circ B(r)] \circ B(r + \epsilon)$, and theorem 6.1 is not invalidated by this error. Another minor error in the Chen & Yan (1989) paper is the statement that,


(opening with a disk of radius $r$ satisfies the condition that:) ... The filter recovers the whole image at sufficiently small scale. Let $r = 0$, then $B(r)$ is a point, i.e., $B(r) = b_0$, thus $X \circ (b_0) = X$. (Chen & Yan 1989, p.695)

This statement shows a misunderstanding of the topological properties of the morphological operations as outlined, for example, in Serra (1982, chapter III). In particular, from proposition III-29 of Serra (1982), we see that the opening and closing are upper semi-continuous (not continuous), so that, for example, if $X$ has "threads" of zero thickness (see figure III.1 of Serra (1982)) then $\lim_{r \to 0}\{X \circ B(r)\} \ne X$. This is perhaps of no practical importance since we would not use such an $X$ to model an object in a physical image, but it indicates a certain lack of rigour in the paper of Chen & Yan (1989). More serious is another mathematical deficiency in the theorem itself, a deficiency that also occurs in the work of other authors. However, we will wait until after we treat this later, more general work to discuss these problems and their solutions.

This result of theorem 6.1 on binary images (sets in $\mathbb{R}^2$) can be applied to functions of a single variable through the use of the umbra concept of Sternberg (1986), as discussed in chapter 4. In fact, Chen & Yan (1989) use the "singular" function $f(x) = -xe^{-x^2} + x^2 e^{x^2/4}$ (NB. misprinted in the paper) in their discussion to demonstrate some advantages of their method over Gaussian scale-space. Advantages claimed are: that zero-crossings do not merge into one as in Gaussian filtering; that the theorem holds for a wider mathematical class of functions; that the fingerprint is more interpretive about the original image; and that the method is less computationally expensive than Gaussian filtering.

When applied to functions, zero-crossings of curvature are equivalent to zero-crossings of the second derivative, since

$$\kappa_f(x) = \frac{f''(x)}{(1 + f'(x)^2)^{3/2}}, \qquad \begin{aligned} \kappa_f(x) > 0 &\Rightarrow f''(x) > 0, && (6.1) \\ \kappa_f(x) < 0 &\Rightarrow f''(x) < 0. && (6.2) \end{aligned}$$

Note, the second derivative does not have to exist at all points on the function; that is why we speak of "zero-crossings". Therefore the Chen & Yan (1989) result can be applied directly to zero-crossings of the second derivative of 1-D functions, but only with the scaled semi-circular structuring function,

$$g_\sigma(x) = K + \sqrt{\sigma^2 - x^2}, \qquad 0 \le \sigma,\ |x| \le \sigma, \tag{6.3}$$

where $K$ is an arbitrary constant, the value of which does not affect the opening.

6.2 Background


The work of Chen & Yan (1989) has been improved and generalised by Jang & Chin (1991), who showed that convexity and compactness of the structuring element are the necessary and sufficient conditions for the monotonic property of the multiscale morphological opening filter. Their theorem is:

Theorem 6.2 [Monotonic Property of the Multiscale Opening (Jang & Chin 1991)]: Suppose $X$ is a compact set in $\mathbb{R}^2$. $Z[\partial X]$ denotes the finite number of zero-crossings of the curvature function along the contour $\partial X$, and $CN[X]$ is the number of connected components of $X$. For any $r > 0$ with

$$X \circ B(r) \ne \emptyset \quad \text{and} \quad CN[X] = CN[X \circ B(r)] = 1,$$

we have

$$Z(\partial[X \circ B(r)]) \le Z[\partial X],$$

and $Z(\partial[X \circ B(r)])$ is monotonic decreasing as $r$ increases, if and only if $B(r)$ is a compact convex set. □

By the duality of opening and closing, this result applies directly to the closing operation. Through the use of the umbra, this result is also extended to functions and, in particular, includes the opening by the powerbolic family of structuring functions, which are compact and anti-convex. Note: convex structuring elements correspond to anti-convex structuring functions. A disadvantage of this work is that there is no obvious extension to functions on higher dimensional spaces, while an advantage is that in using zero-crossings of curvature there is an obvious connection with the Gaussian approach, which also uses zero-crossings of the second derivative (Laplacian in 2-D) (Witkin 1984).

We have earlier foreshadowed that there are theoretical problems with theorem 6.2; we can illustrate this with a counter-example. Consider the function $f(x) = (|x| \bmod 2 - 1)^2$, $x \in \mathbb{R}$, which is graphed in figure 6.2. This function is continuous, and analytic everywhere except at the cusps, $x = 2i$, $i \in \mathbb{Z}$. The second derivative is $f''(x) = 2$, $x \ne 2i$, $i \in \mathbb{Z}$. Therefore the number of zero-crossings of the second derivative of this function is $Z[f''] = 0$. Now, if we open $f(x)$ with the unit parabola, $g(x) = -x^2$, we obtain

$$(f \circ g)(x) = \begin{cases} f(x) & \text{if } f(x) \le 0.25, \\ 0.5 + g(x + 2i),\ i \in \mathbb{Z} & \text{if } f(x) > 0.25. \end{cases} \tag{6.4}$$

Therefore the second derivative of the opening is

$$(f \circ g)''(x) = \begin{cases} 2 & \text{if } f(x) \le 0.25, \\ -2 & \text{if } f(x) > 0.25, \end{cases} \tag{6.5}$$

which clearly has zero-crossings wherever $f(x) = 0.25$, that is, at $x = 2i \pm \tfrac{1}{2}$, $i \in \mathbb{Z}$. Thus, $Z((f \circ g)'') > Z(f'')$, which proves the theorem of Jang & Chin (1991) false. A similar example using semi-circles would invalidate the theorem of Chen & Yan (1989). Note, the monotonic property still holds; only at zero-scale do problems occur.

[Figure 6.2: plots of $f(x)$ and $(f \circ g)(x)$ for $0 \le x \le 10$, together with their second derivatives $f''(x)$ and $(f \circ g)''(x)$.]

Figure 6.2: A counter-example to the theorem of Jang & Chin (1991). The opening of $f(x) = (|x| \bmod 2 - 1)^2$, $x \in \mathbb{R}$, by the structuring function $g(x) = -x^2$, and the second derivatives of these functions.

From figure 6.2, we can see the reason for the difficulty: the section of the function replaced by the arc of the structuring function during the opening operation does not contain any zero-crossings of the second derivative; however, two new zero-crossings are introduced where the structuring function (with negative second derivative) meets the function (with positive second derivative). Chen & Yan (1989) and Jang & Chin (1991) seem to have forgotten the possibility of new zero-crossings of the second derivative being introduced where the structuring function meets the signal function. Of course, if the derivative of $f(x)$ exists and is continuous everywhere (that is, the function is of class $C^1$ (Lipschutz 1969)) then the function would have to contain at least 2 zero-crossings of the second derivative in the section replaced by the arc of the structuring function. So the theorems would hold. Therefore the answer lies in restricting the input functions to those of class $C^1$. This will become clearer when we develop our version of this theorem in section 6.3.2.

A good comparison of Gaussian and morphological opening scale-spaces for shape analysis has recently appeared in the literature (Jang & Chin 1992). It is of interest that this review places a similar emphasis on the signal feature and smoothing filter aspects of scale-space, and on the importance of the scale-space causality or monotonicity property, as is found in this thesis. The next section addresses the task of obtaining important scale-space causality results for the multiscale closing-opening within the functional framework established in chapter 3, and deals with extending the theory to local extrema and multidimensional functions.
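The counter-example of figure 6.2 can be checked numerically. The sketch below computes a sampled opening of $f(x) = (|x| \bmod 2 - 1)^2$ by the unit parabola $g(x) = -x^2$ on a grid with spacing 0.01, and compares it with the piecewise form of equation (6.4). The grid, the truncation of the parabola to $|u| \le 1$, the padding rule at the ends of the interval and the helper names are all assumptions of this illustration; the sign pattern of the second derivative is taken from equation (6.5) rather than from finite differences, which would amplify the discretisation error.

```python
import numpy as np

x = np.arange(1001) / 100.0                     # grid on [0, 10], spacing 0.01
f = (np.mod(np.abs(x), 2.0) - 1.0) ** 2         # f(x) = (|x| mod 2 - 1)^2
u = np.arange(-100, 101) / 100.0                # offsets u in [-1, 1]
g = -(u ** 2)                                   # unit parabola g(u) = -u^2

def erode(s, g):
    """(s erode g)(x) = min over u of s(x + u) - g(u); outside samples ignored."""
    m = (len(g) - 1) // 2
    sp = np.pad(s, m, constant_values=np.inf)
    return np.min([sp[j:j + len(s)] - g[j] for j in range(len(g))], axis=0)

def dilate(s, g):
    """(s dilate g)(x) = max over u of s(x - u) + g(u); outside samples ignored."""
    m = (len(g) - 1) // 2
    sp = np.pad(s, m, constant_values=-np.inf)
    return np.max([sp[j:j + len(s)] + g[len(g) - 1 - j] for j in range(len(g))], axis=0)

opened = dilate(erode(f, g), g)                 # opening = erosion then dilation

centre = 2.0 * np.round(x / 2.0)                # nearest cusp position 2i
closed_form = np.where(f <= 0.25, f, 0.5 - (x - centre) ** 2)   # equation (6.4)
second_derivative = np.where(f <= 0.25, 2.0, -2.0)              # equation (6.5)

interior = (x >= 2.2) & (x <= 7.8)              # keep away from the ends of the grid
max_error = np.max(np.abs(opened - closed_form)[interior])
sign_changes = np.count_nonzero(np.diff(np.sign(second_derivative[interior])))
```

On this grid the sampled opening agrees with (6.4) to within the discretisation error, and the second derivative of the opening changes sign six times between $x = 2.2$ and $x = 7.8$ (at $x = 2.5, 3.5, \ldots, 7.5$), whereas $f''$ is equal to 2 wherever it exists.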

6.3 Scale-Space Properties of the Multiscale Closing-Opening

Throughout this chapter we will work with the multiscale closing-opening, denoted $\odot$ and defined in definition 3.4. In this chapter we will often obtain results for scale $\sigma < 0$, since this corresponds to the opening operation which is used in the literature (Chen & Yan 1989, Jang & Chin 1991). In these cases we then appeal to the duality principle of the opening and closing (equation (2.44)) to extend the results to the morphological closing and hence to the combined closing-opening operation.

6.3.1 General Properties of the Multiscale Closing-Opening

In order to work with scaled structuring functions we need the following proposition:

Proposition 6.1: If $g_\sigma(x)$ denotes an anti-convex scaled structuring function given by equation (4.6), and if $|\sigma_2| \ge |\sigma_1|$, then the scaled structuring function $g_{\sigma_2}(x)$ is morphologically open with respect to $g_{\sigma_1}(x)$.

Proof: From the property of the opening of the dilation, equation (2.46), we have

$$(f \oplus g) \circ g = f \oplus g, \tag{6.6}$$

and from the semi-group property for scaled structuring functions, equation (4.12), we have

$$g_{\sigma_2} = g_{|\sigma_2| - |\sigma_1|} \oplus g_{\sigma_1}. \tag{6.7}$$

Combining these equations we get the required result,

$$g_{\sigma_2} \circ g_{\sigma_1} = (g_{|\sigma_2| - |\sigma_1|} \oplus g_{\sigma_1}) \circ g_{\sigma_1} = g_{|\sigma_2| - |\sigma_1|} \oplus g_{\sigma_1} = g_{\sigma_2}. \tag{6.8}$$


□

Therefore, we have the following order properties with respect to scale.

Proposition 6.2: If $\sigma_1 < \sigma_2 < 0 < \sigma_3 < \sigma_4$, then

$$(f \odot g_{\sigma_1}) \le (f \odot g_{\sigma_2}) \le f \le (f \odot g_{\sigma_3}) \le (f \odot g_{\sigma_4}). \tag{6.9}$$

Proof: From the previous proposition, we find $g_{\sigma_1}$ is open with respect to $g_{\sigma_2}$, and $g_{\sigma_4}$ is open with respect to $g_{\sigma_3}$. Then, from equations (2.52) and (2.53) the result follows. □

Standard results from mathematical morphology theory (Haralick et al. 1987) show that both the closing and opening are idempotent, therefore,

$$(f \odot g_\sigma) \odot g_\sigma = f \odot g_\sigma. \tag{6.10}$$

The multiscale closing-opening filter also satisfies the following scale-related conditions (cf. Chen & Yan (1989)):

1. It is scale invariant, that is,

$$f(t) \odot g_\sigma(t) = \sigma \left[ \frac{1}{\sigma} f(t) \odot g_1(t) \right]; \tag{6.11}$$

2. The filter recovers the input signal for zero scale (by definition!),

$$(f \odot g_0)(t) = f(t); \tag{6.12}$$

3. As scale approaches positive (negative) infinity the output approaches the global maximum (minimum) of the input signal, that is,

$$\lim_{\sigma \to \infty} \{(f \odot g_\sigma)(t)\} = \max_t \{f(t)\}, \tag{6.13}$$
$$\lim_{\sigma \to -\infty} \{(f \odot g_\sigma)(t)\} = \min_t \{f(t)\}. \tag{6.14}$$

Therefore the multiscale closing-opening is suited to the formation of a scale-space, similar to that formed by the multiscale dilation-erosion (equation (3.15)). That is, we consider the "multiscale closing-opening scale-space" $F : D \subseteq \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ defined by,

$$F(x, \sigma) = (f \odot g_\sigma)(x). \tag{6.15}$$

We will obtain monotonic properties for signal features within this scale-space.

The results of both Chen & Yan (1989) and Jang & Chin (1991) depend on partitioning the result of the opening operation into arcs of the original set and arcs of the translated structuring element. We will extend this partitioning idea to work with functions on multi-dimensional spaces, and therefore with multi-dimensional arcs or "patches" of the structuring function. The first step is a basic morphological result, outlined in Haralick et al. (1987), which provides a geometrical interpretation of the opening and closing:

"To obtain the opening of f by a paraboloid structuring element, for example, take the paraboloid, apex up, and slide it under all the surface of f pushing it hard up against the surface. The apex of the paraboloid may not be able to touch all points of f. For example, if f has a spike narrower than the paraboloid, the top of the apex may only reach as far as the mouth of the spike. The opening is the surface of the highest points reached by any part of the paraboloid as it slides under all the surface of f. (...) To close f with a paraboloid structuring element, we take the reflection of the paraboloid in the sense of (equation (2.32)), turn it upside down (apex down), and slide it all over the top of the surface of f. The closing is the surface of all the lowest points reached by the sliding paraboloid." (Haralick et al. 1987)

In terms of the opening we have the following proposition.

Proposition 6.3 (Haralick et al. 1987):

$$f \circ g = T\left[ \bigcup_{\{z \,:\, U[g]_z \subseteq U[f]\}} U[g]_z \right],$$

where $U[g]$ is the umbra of $g$, that is, $U[g] = \{(x, y) : y \le g(x)\}$; $T[U[g]] : \mathbb{R}^n \to \mathbb{R}$ is the "top surface" of the umbra, that is, $T[U[g]](x) = \max\{y : (x, y) \in U[g]\}$; and $U[g]_z$ indicates the translate of $U[g]$ by $z \in \mathbb{R}^n \times \mathbb{R}$, $U[g]_z = \{u + z : u \in U[g]\}$.

Proof: This result is proved in proposition 71 of Haralick et al. (1987). ∎

From this geometrical interpretation of the opening (or closing) we see that the output signal can be partitioned. That is, with $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$,

$$(f \circ g_\sigma)(x) = \begin{cases} f(x) & \text{if } x \in S'(\sigma), \\ s(x) & \text{if } x \in S''(\sigma), \end{cases} \tag{6.16}$$

with,

$$S'(\sigma) \cup S''(\sigma) = D, \tag{6.17}$$
$$S'(\sigma) \cap S''(\sigma) = \emptyset, \tag{6.18}$$
$$s(x) < f(x), \quad x \in S''(\sigma), \tag{6.19}$$
$$s(x) = \bigcup_{i \in I} \mathrm{PATCH}[(g_\sigma)_{z_i}], \tag{6.20}$$
$$\mathrm{PATCH}[(g_\sigma)_{z_i}] \cap \mathrm{PATCH}[(g_\sigma)_{z_j}] = \emptyset \quad \text{for } i \ne j, \tag{6.21}$$

where $\mathrm{PATCH}[(g_\sigma)_{z_i}]$ is a patch on the structuring function $g_\sigma$ which has the origin translated to $z_i \in U[f]$. Note that $I$ is a finite index family (Jang & Chin 1991). In words, the opening of a signal consists of patches of the original signal combined with patches of translated structuring functions. By duality, the closing can be partitioned in a similar way. This geometrical interpretation of the opening and closing (on a 1-D function) is illustrated in figure 6.3.

Figure 6.3: Geometrical interpretation of the opening and closing with partitioning. (a) parabolic structuring function $g(x)$; (b) signal $f(x)$; (c) opening $f \circ g$; (d) closing $f \bullet g$.

Now we examine how this partitioning varies with scale and we obtain the following proposition:

Proposition 6.4: Given that $\sigma_1 < \sigma_2 < 0$,

$$f \circ g_{\sigma_1} = f(x)|_{x \in S'(\sigma_1)} \cup \bigcup_{i \in I} \mathrm{PATCH}[(g_{\sigma_1})_{z_i}], \tag{6.22}$$
$$f \circ g_{\sigma_2} = f(x)|_{x \in S'(\sigma_2)} \cup \bigcup_{i \in I} \mathrm{PATCH}[(g_{\sigma_2})_{z_i}], \tag{6.23}$$

and,

$$S'(\sigma_1) \subseteq S'(\sigma_2) \subseteq D. \tag{6.24}$$

Proof: Equations (6.22) and (6.23) follow directly from equation (6.16). Relation (6.24) follows from proposition 3.2, on the order (anti-extensive) properties of the opening. ∎

The above proposition states that with increasing scale, the opening replaces more and more of the original function with patches from the structuring element. Likewise, from the duality of opening and closing, we see that a corresponding result applies for the closing, with the function being replaced with patches from the inverted structuring function.
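The partition into unchanged points and patches can be explored numerically. The following is a minimal sketch, not the thesis implementation: it assumes a sampled 1-D signal and a parabolic structuring function of the assumed form $g_\sigma(t) = -t^2/|\sigma|$, and the function names (`erode_parabola`, `dilate_parabola`, `open_parabola`, `count_unchanged`) are hypothetical. It computes the opening as an erosion followed by a dilation and counts the points where the opening leaves the signal unchanged, which corresponds to the size of $S'(\sigma)$.

```c
#include <assert.h>
#include <float.h>

/* Assumed scaled parabolic structuring function g_sigma(t) = -t^2 / |sigma|. */
static double parabola(int t, double sigma)
{
    double s = sigma < 0.0 ? -sigma : sigma;
    return -(double)(t * t) / s;
}

/* Erosion on the finite grid 0..n-1: out[x] = min over y of f[y] - g(y - x). */
void erode_parabola(const double *f, double *out, int n, double sigma)
{
    assert(n > 0 && sigma != 0.0);
    for (int x = 0; x < n; x++) {
        double best = DBL_MAX;
        for (int y = 0; y < n; y++) {
            double v = f[y] - parabola(y - x, sigma);
            if (v < best) best = v;
        }
        out[x] = best;
    }
}

/* Dilation on the same grid: out[x] = max over y of f[y] + g(x - y). */
void dilate_parabola(const double *f, double *out, int n, double sigma)
{
    assert(n > 0 && sigma != 0.0);
    for (int x = 0; x < n; x++) {
        double best = -DBL_MAX;
        for (int y = 0; y < n; y++) {
            double v = f[y] + parabola(x - y, sigma);
            if (v > best) best = v;
        }
        out[x] = best;
    }
}

/* Opening = erosion followed by dilation; tmp is scratch space of length n. */
void open_parabola(const double *f, double *out, double *tmp, int n, double sigma)
{
    erode_parabola(f, tmp, n, sigma);
    dilate_parabola(tmp, out, n, sigma);
}

/* Number of samples the opening leaves unchanged (within eps): |S'(sigma)|. */
int count_unchanged(const double *f, const double *opened, int n, double eps)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (f[i] - opened[i] <= eps) count++;
    }
    return count;
}
```

Calling `open_parabola` with a larger $|\sigma|$ and comparing `count_unchanged` gives a quick empirical check of relation (6.24). On a sampled grid the semi-group property holds only approximately, so small deviations from the continuous theory are possible.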

6.3.2 Causality Theorems for the Multiscale Closing-Opening

The patches of the upright (inverted) structuring function possess several smoothness properties, namely that they: are anti-convex (convex), cannot contain a local minimum (local maximum), contain at most one local maximum (local minimum), and possess negative (positive) second derivative. These properties lead directly to the 1-D monotonic scale-space property for zero-crossings of curvature as already obtained via the umbra from Theorem 6.2. Restating this theorem in functional terms, we have:

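Theorem 6.3 [Monotonic Property of Zero-Crossings of Second-Derivative of the Multiscale Closing-Opening]: Let $f : D \subseteq \mathbb{R} \to \mathbb{R}$ denote a bounded function of class $C^1$, and $g_\sigma : G \subseteq \mathbb{R} \to \mathbb{R}$ a scaled anti-convex structuring function. Let $Z(f)$ denote the number of zero-crossings of the second derivative of $f$. Then, for any $\sigma_1 < \sigma_2 < 0 < \sigma_3 < \sigma_4$,

$$Z(f \odot g_{\sigma_2}) \le Z(f), \tag{6.25}$$
$$Z(f \odot g_{\sigma_3}) \le Z(f), \tag{6.26}$$
$$Z(f \odot g_{\sigma_1}) \le Z(f \odot g_{\sigma_2}), \tag{6.27}$$
$$Z(f \odot g_{\sigma_4}) \le Z(f \odot g_{\sigma_3}). \tag{6.28}$$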

Proof: The second derivative is negative on the patches, so,

$$Z\left( \bigcup_{i \in I} \mathrm{PATCH}[(g_{\sigma_1})_{z_i}] \right) = 0, \tag{6.29}$$
$$Z\left( \bigcup_{i \in I} \mathrm{PATCH}[(g_{\sigma_2})_{z_i}] \right) = 0. \tag{6.30}$$

The opening operation substitutes an anti-convex patch from $g_\sigma$ for a patch of $f(x)$. Let the points of $f(x)$ adjacent to $\mathrm{PATCH}[(g_\sigma)_{z_i}]$ be denoted by $f(x_1)$ and $f(x_2)$. We have $f(x_1) = (g_\sigma(x_1))_{z_i}$, $f(x_2) = (g_\sigma(x_2))_{z_i}$, and $f(x) > (g_\sigma(x))_{z_i}$ for $x_1 < x < x_2$. Since $f$ is of class $C^1$, there must be an anti-convex region of $f$ on the patch, that is, there exists $x_1 < \xi < x_2$ such that

$$f''(\xi) < 0. \tag{6.31}$$

Now, there are three cases:

1. $f''(x_1) < 0$ and $f''(x_2) < 0$. The substitution of the patch does not introduce any zero-crossings at $x_1$ or $x_2$.
2. $f''(x_1) > 0$ and $f''(x_2) > 0$. The substitution of the patch introduces two new zero-crossings at $x_1$ and $x_2$. However, from (6.31) there must have been two or more zero-crossings on $f(x)$, $x_1 < x < x_2$.
3. $f''(x_1) < 0$, $f''(x_2) > 0$ or $f''(x_1) > 0$, $f''(x_2) < 0$. The substitution of the patch introduces one zero-crossing; however, from (6.31), this replaces one or more zero-crossings on $f(x)$, $x_1 < x < x_2$.

In any case the number of zero-crossings does not increase with the opening, and relation (6.25) is proved. From the duality of opening and closing, relation (6.26) follows.

By relation (6.24), $S'(\sigma_1) \subseteq S'(\sigma_2)$, so points $x_1$ and $x_2$ move further apart. As one of these points moves from a region on $f$ where $f'' < 0$ to a region where $f'' > 0$, a zero-crossing is introduced by the opening at this point, but this merely replaces the zero-crossing on $f$ and the total number of zero-crossings does not change. Alternatively, as one of $x_1$ or $x_2$ moves from a region on $f$ where $f'' > 0$ to a region where $f'' < 0$, the zero-crossing introduced by the opening disappears at this point along with the zero-crossing on $f$, and the total number of zero-crossings decreases by two. Thus we have,

$$Z\left( f(x)|_{x \in S'(\sigma_1)} \right) \le Z\left( f(x)|_{x \in S'(\sigma_2)} \right), \tag{6.32}$$

and therefore,

$$Z(f \odot g_{\sigma_1}) \le Z(f \odot g_{\sigma_2}) \le Z(f). \tag{6.33}$$

By duality we easily obtain,

$$Z(f \odot g_{\sigma_4}) \le Z(f \odot g_{\sigma_3}) \le Z(f), \tag{6.34}$$

which completes the proof of the theorem. ∎

This theorem is analogous to that in Jang & Chin (1991), and therefore does not constitute a major new result. However, it does clearly show the use of partitioning and the method of proof appropriate to these problems. We also now have a framework for attempting to extend this result to functions in higher dimensions. The results of proposition 6.4 hold in higher dimensions, but to extend theorem 6.3 we need to find a higher-dimensional analogue to zero-crossings of the second derivative. A natural choice in 2-D would be zero-crossings of the Laplacian,

$$\nabla^2 f(x, y) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}, \tag{6.35}$$

which are much used in multiscale edge detection (Marr & Hildreth 1980). Since the structuring element is anti-convex, there can be no zero-crossings of the Laplacian on the patches. However, in 2-D, zero-crossings do not have to be interleaved (that is, two upward crossings must be separated by a downward crossing), which is the physical basis of equation (6.31). As a result, we can see no way, using second-derivative based features, to extend theorem 6.3 to two or higher dimensions. Thus, even in this way the result is closely related to the Gaussian scale-space of Witkin (1984), which has similar limitations.

In chapter 3, however, we found that using local extrema of a function as the signal feature of interest can lead to scale-space causality properties. Since we have already observed that the patches of the signal introduced by the opening operation possess no local minima, we have the following scale-space causality theorem.

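Theorem 6.4 [Monotonic Property of Local Extrema of the Multiscale Closing-Opening]: Let $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ denote a bounded function, $g_\sigma : G \subseteq \mathbb{R}^n \to \mathbb{R}$ a scaled anti-convex structuring function, and let the point sets,

$$E_{\max}(f) = \{x : f(x) \text{ is a local maximum}\},$$
$$E_{\min}(f) = \{x : f(x) \text{ is a local minimum}\},$$

denote the local extrema of $f$. Then, for any $\sigma_1 < \sigma_2 < 0 < \sigma_3 < \sigma_4$,

$$E_{\min}(f \odot g_{\sigma_1}) \subseteq E_{\min}(f \odot g_{\sigma_2}) \subseteq E_{\min}(f), \tag{6.36}$$
$$E_{\max}(f \odot g_{\sigma_4}) \subseteq E_{\max}(f \odot g_{\sigma_3}) \subseteq E_{\max}(f). \tag{6.37}$$

Proof: The patches for the opening contain no local minima,

$$E_{\min}\left( \bigcup_{i \in I} \mathrm{PATCH}[(g_{\sigma_1})_{z_i}] \right) = \emptyset, \tag{6.38}$$
$$E_{\min}\left( \bigcup_{i \in I} \mathrm{PATCH}[(g_{\sigma_2})_{z_i}] \right) = \emptyset. \tag{6.39}$$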

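The occurrence of a local minimum is a local property of a function; therefore the replacement of part of $f$ with a patch from $g_\sigma$ cannot affect the existence of local extrema outside the patch, except possibly at the patch boundary. However, from equation (6.19), the patch is everywhere less than the function it replaces, and no new minima can be created at the patch boundaries. Therefore,

$$E_{\min}(f \odot g_{\sigma_1}) = E_{\min}\left( f(x)|_{x \in S'(\sigma_1)} \right), \tag{6.41}$$
$$E_{\min}(f \odot g_{\sigma_2}) = E_{\min}\left( f(x)|_{x \in S'(\sigma_2)} \right). \tag{6.42}$$

But, by relation (6.24), $S'(\sigma_1) \subseteq S'(\sigma_2) \subseteq D$, so,

$$E_{\min}\left( f(x)|_{x \in S'(\sigma_1)} \right) \subseteq E_{\min}\left( f(x)|_{x \in S'(\sigma_2)} \right) \subseteq E_{\min}\left( f(x)|_{x \in D} \right), \tag{6.43}$$

and therefore,

$$E_{\min}(f \odot g_{\sigma_1}) \subseteq E_{\min}(f \odot g_{\sigma_2}) \subseteq E_{\min}(f). \tag{6.44}$$

And, as usual, by duality,

$$E_{\max}(f \odot g_{\sigma_4}) \subseteq E_{\max}(f \odot g_{\sigma_3}) \subseteq E_{\max}(f). \tag{6.45}$$

∎

This theorem is a major result, and is the analogue of theorem 3.1 for the closing-opening. The question now is how the two scale-spaces, $F^{\circledast}(x, \sigma) = (f \circledast g_\sigma)(x)$ and $F^{\odot}(x, \sigma) = (f \odot g_\sigma)(x)$, are related.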

6.4 Fingerprints in Morphological Scale-Space

6.4.1 Equivalency of Fingerprints

We start with some definitions and notation. The "scale-space fingerprints" are plots, versus scale, of the point sets of the positions of the appropriate signal feature.

Definition 6.1 (Multiscale Dilation-Erosion Scale-Space Fingerprint): The multiscale dilation-erosion scale-space fingerprint is a plot, versus scale, of the scale-dependent point-set:

$$E^{\circledast}(\sigma) = E_{\max}(f \circledast g_\sigma) \cup E_{\min}(f \circledast g_\sigma). \tag{6.46}$$

A subset of this set, the "reduced fingerprint", is defined as:

Definition 6.2 (Reduced Multiscale Dilation-Erosion Scale-Space Fingerprint): The reduced multiscale dilation-erosion scale-space fingerprint is defined as:

$$E_r^{\circledast}(\sigma) = \begin{cases} E_{\max}(f \oplus g_\sigma) & \text{if } \sigma > 0, \\ E_{\max}(f) \cup E_{\min}(f) & \text{if } \sigma = 0, \\ E_{\min}(f \ominus g_\sigma) & \text{if } \sigma < 0. \end{cases} \tag{6.47}$$

Likewise, for the multiscale closing-opening, we define:

Definition 6.3 (Multiscale Closing-Opening Scale-Space Fingerprint): The multiscale closing-opening scale-space fingerprint is a plot, versus scale, of the scale-dependent point-set:

$$E^{\odot}(\sigma) = E_{\max}(f \odot g_\sigma) \cup E_{\min}(f \odot g_\sigma). \tag{6.48}$$

A subset of this set, the "reduced fingerprint", is defined as:

Definition 6.4 (Reduced Multiscale Closing-Opening Scale-Space Fingerprint): The reduced multiscale closing-opening scale-space fingerprint is a plot, versus scale, of the scale-dependent point-set:

$$E_r^{\odot}(\sigma) = \begin{cases} E_{\max}(f \bullet g_\sigma) & \text{if } \sigma > 0, \\ E_{\max}(f) \cup E_{\min}(f) & \text{if } \sigma = 0, \\ E_{\min}(f \circ g_\sigma) & \text{if } \sigma < 0. \end{cases} \tag{6.49}$$

Figure 6.4: A comparison of fingerprints: (a) zero-crossings of second derivative in Gaussian scale-space; (b) zero-crossings of second derivative in multiscale closing-opening scale-space; (c) local extrema in multiscale dilation-erosion scale-space; (d) local extrema in multiscale closing-opening scale-space.

To illustrate these definitions we show in figure 6.4 fingerprints for the same random 1-D signal from the: zero-crossings of second derivative in Gaussian scale-space; zero-crossings of second derivative in multiscale closing-opening scale-space; local extrema in multiscale dilation-erosion scale-space; local extrema in multiscale closing-opening scale-space.

It is clear that the "surface" formed by the dilation is in general different from the surface formed by the closing. However, the fingerprints are concerned only with the local extrema of this surface. From proposition 3.5 in chapter 3 we see that (for $\sigma > 0$) a local maximum on the dilated surface corresponds directly in height and position to a local maximum on the underlying signal. From the geometrical interpretation of the closing we can

also see that the same applies to the closing operation. Furthermore, an identical condition, namely that the structuring function "touches" the signal at a local maximum, is responsible for the local maximum in both the dilated and the closed surface. Therefore, we have the following proposition:

Proposition 6.5: Let the structuring function have a single local maximum at the origin. The following two statements are equivalent:

1. $(f \oplus g_\sigma)(x)$ has a local maximum at $x = x_{\max}$
2. $(f \bullet g_\sigma)(x)$ has a local maximum at $x = x_{\max}$

Likewise, for the erosion and opening, the following two statements are equivalent:

1. $(f \ominus g_\sigma)(x)$ has a local minimum at $x = x_{\min}$
2. $(f \circ g_\sigma)(x)$ has a local minimum at $x = x_{\min}$

Proof: From proposition 3.5: $(f \oplus g_\sigma)(x_{\max})$ is a local maximum $\Rightarrow$ $f(x_{\max})$ is a local maximum, and $(f \oplus g_\sigma)(x_{\max}) = f(x_{\max})$. However, from property (2.48) we have a sandwich result:

$$f(x) \le (f \bullet g_\sigma)(x) \le (f \oplus g_\sigma)(x), \quad \forall x \in D, \tag{6.50}$$

therefore, $(f \bullet g_\sigma)(x_{\max})$ is also a local maximum and $(f \bullet g_\sigma)(x_{\max}) = f(x_{\max})$.

To show the reverse relation, we appeal to the geometric interpretation of the closing. If $(f \bullet g_\sigma)(x_{\max})$ is a local maximum then the origin of the translated (reflected) structuring element at $x_{\max}$ must be greater than the origin for the structuring element at all $x$ in some $\epsilon$-neighbourhood of $x_{\max}$. Since the locus of this origin forms the surface of the dilation operation, we have,

$$(f \oplus g_\sigma)(x) \le (f \circledast g_\sigma)(x_{\max}) \quad \text{for all } x \in N(x_{\max}, \epsilon), \tag{6.51}$$

which shows that $(f \circledast g_\sigma)(x_{\max})$ is a local maximum. This completes the proof of the first part of the proposition. Again, the second part follows from the morphological duality properties. ∎

This proposition shows that the reduced scale-space fingerprints for both the dilation-erosion and the closing-opening are identical,

$$E_r^{\circledast}(\sigma) = E_r^{\odot}(\sigma) \quad \text{for all } \sigma \in \mathbb{R}. \tag{6.52}$$

The natural question now is whether these results extend to the full fingerprints. We need the following proposition:

Proposition 6.6: Let the structuring function have a single local maximum at the origin. The following two statements are equivalent:

1. $(f \oplus g_\sigma)(x)$ has a local minimum at $x = x_{\min}$
2. $(f \bullet g_\sigma)(x)$ has a local minimum at $x = x_{\min}$

Likewise, for the erosion and opening, the following two statements are equivalent:

1. $(f \ominus g_\sigma)(x)$ has a local maximum at $x = x_{\max}$
2. $(f \circ g_\sigma)(x)$ has a local maximum at $x = x_{\max}$

Proof: If $(f \oplus g_\sigma)(x_{\min})$ is a local minimum, then the origin of the translated (reflected) structuring function is lower than in the surrounding neighbourhood. Then, since the reflected structuring function has a local minimum at the origin, and is convex, the union of the structuring functions in the neighbourhood of $x_{\min}$ has a minimum at $x_{\min}$, and, since this union is the closing, $(f \bullet g_\sigma)(x_{\min})$ is also a local minimum.

To show the reverse relation, we note that if $(f \bullet g_\sigma)(x_{\min})$ is a local minimum, then, since the reflected structuring function has a local minimum at the origin, and is convex, we have

$$(f \bullet g_\sigma)(x_{\min}) = (f \oplus g_\sigma)(x_{\min}). \tag{6.53}$$

Appealing to property (2.48):

$$(f \bullet g_\sigma)(x) \le (f \oplus g_\sigma)(x), \quad \forall x \in D, \tag{6.54}$$

we see that $(f \oplus g_\sigma)(x_{\min})$ must also be a local minimum. This completes the proof of the first part of the proposition. Once again, the second part follows from the morphological duality properties. ∎

This proposition together with the previous result shows that the full scale-space fingerprints for both the dilation-erosion and the closing-opening are identical,

$$E^{\circledast}(\sigma) = E^{\odot}(\sigma) \quad \text{for all } \sigma \in \mathbb{R}. \tag{6.55}$$

Thus the surfaces formed by the scaled dilation and closing of a multidimensional function, although different almost everywhere, have local extrema of the same height and at the same points.

We have not yet explained why we have introduced the reduced fingerprints in definitions 6.2 and 6.4. The reason lies in the availability of a particularly efficient representation, as outlined in the next subsection.

Figure 6.5: The reduced morphological scale-space fingerprint.

6.4.2 The Reduced Fingerprint as a List

Since, from equation (6.52), we find the reduced fingerprints from the closing-opening and dilation-erosion to be identical, we can drop the superscript and simply write

$$E_r(\sigma) = E_r^{\circledast}(\sigma) = E_r^{\odot}(\sigma). \tag{6.56}$$

Now, from theorem 3.1 (or alternatively, theorem 6.4), we find that in the reduced fingerprint,

$$\sigma_1 < \sigma_2 < 0 \Rightarrow E_r(\sigma_1) \subseteq E_r(\sigma_2) \subseteq E_r(0), \tag{6.57}$$
$$\sigma_4 > \sigma_3 > 0 \Rightarrow E_r(\sigma_4) \subseteq E_r(\sigma_3) \subseteq E_r(0). \tag{6.58}$$

Since at any scale $E_r(\sigma)$ is a point-set, the reduced fingerprint consists only of vertical lines, beginning at zero scale and extending into positive and negative scales until they end. An example of a reduced morphological scale-space fingerprint is shown in figure 6.5. This property makes the reduced fingerprint particularly easy to represent: all we need is to specify the position of each fingerprint line and the scale at which it ends. This is easily stored as a list. Suppose we have a signal $f : \mathbb{R}^n \to \mathbb{R}$ with $k$ local extrema at $f(x_i)$, $i = 1, 2, \ldots, k$. Then, since each local extremum is the origin of a fingerprint line, we can represent the reduced fingerprint as a list of $k$ $(n+1)$-tuples, $(x_i, \sigma_i)$, $i = 1, 2, \ldots, k$, where $\sigma_i$ is the scale associated with line $i$. This efficient representation will prove useful in applications in the next chapter.
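As an illustration of this list representation, the following is a minimal sketch for a 2-D signal ($n = 2$), so that each entry is a triple. The type and function names (`FingerprintLine`, `lines_at_scale`) are hypothetical, and the convention that maxima carry a positive end-scale and minima a negative one is an assumption made here for compactness, not something fixed by the text.

```c
#include <assert.h>
#include <stddef.h>

/* One line of the reduced fingerprint: the position of a local extremum and
   the scale at which its line ends. Assumption: scale > 0 marks a local
   maximum, scale < 0 a local minimum. */
typedef struct {
    int x;
    int y;
    double scale;
} FingerprintLine;

/* Number of fingerprint lines still present at scale sigma, that is, the
   size of the point-set E_r(sigma). At sigma = 0 every extremum is present. */
size_t lines_at_scale(const FingerprintLine *fp, size_t k, double sigma)
{
    assert(fp != NULL || k == 0);
    size_t count = 0;
    for (size_t i = 0; i < k; i++) {
        if (sigma == 0.0 ||
            (sigma > 0.0 && fp[i].scale >= sigma) ||
            (sigma < 0.0 && fp[i].scale <= sigma)) {
            count++;
        }
    }
    return count;
}
```

Because every line starts at zero scale and simply ends, the count returned is non-increasing in $|\sigma|$ on each side of zero, which mirrors relations (6.57) and (6.58).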

6.5 Conclusion

This chapter has established scale-space causality results for the multiscale closing-opening scale-space. We started from published results for the multiscale opening and discussed some deficiencies of this work. In addressing these problems we have obtained a result (theorem 6.3) which shows a scale-space causality for zero-crossings of the second derivative of a 1-D function. This result is only valid for 1-D functions of class $C^1$ and we have not been able to extend it to higher dimensional functions. Turning to local extrema as the signal feature, we were then able to obtain a scale-space causality theorem for the multiscale closing-opening (theorem 6.4). As with theorem 3.1, this result holds in general for higher dimensional functions. We have compared the scale-space fingerprints from the multiscale dilation-erosion (theorem 3.1) and the multiscale closing-opening (theorem 6.4) and found them to be identical. Finally, we have shown how the reduced fingerprint can be represented as a list of $(n+1)$-tuples.

To illustrate the results we have presented figures 6.4 and 6.5 to show the various fingerprints. As a practical note, these diagrams show discrete approximations to the theory, which has been developed in the continuous domain. Such discrete approximations involve: using the modified Bessel function approximation to the Gaussian (Lindeberg 1990); using extrema of first differences for zero-crossings of second derivatives; and using the eight-point neighbourhood on the rectangular grid for neighbourhood calculations such as the local extrema.

Since we have shown the fingerprints from the dilation-erosion and closing-opening to be identical, which scale-space should be used in practice? Morphological purists may prefer the closing-opening, since, as discussed in subsection 2.3.2, these operations are "filters" in the strict sense. However, if the aim is to obtain a signal description in the form of a full fingerprint, the dilation and erosion are computationally less expensive. If only the reduced fingerprint is required, then, as we will show in the next chapter, neither scale-space needs to be computed. We will develop there an algorithm to compute the reduced fingerprint directly without any smoothing of the signal.

This now completes the development of the theoretical aspects of morphological scale-space. We will complete the thesis with an example of the application of these results to several problems of 3-D object recognition in range images. The description of these applications follows in the next chapter.

Chapter 7 Object Recognition Using Morphological Scale-Space Fingerprints


7.1 Introduction

In the previous chapters we have developed a new scale-space theory; now is the time to apply this theory to the solution of some typical computer vision problems. In the selection of suitable problems we wish to demonstrate the features of our new theory, highlighting the areas where this theory differs from other approaches. These major differences are:

• the theory is valid for higher dimensional signals;
• the reduced fingerprints can be represented very efficiently;
• the morphological operations most naturally deal with surfaces (although dimensionality properties for functions have been developed).

Thus, for the best illustration of the applications of morphological scale-space, we seek a problem where we can represent a surface by the reduced scale-space fingerprints. A suitable class of problems arises under the title of 3-D object recognition in range images. These problems are two-dimensional since we regard a range image as being represented by a function $f : \mathbb{R}^2 \to \mathbb{R}$, where $f(x, y)$ represents the "height" or "depth" at point $(x, y)$ in the image. The objects appearing in range images are 3-dimensional objects, although we have only a single fixed view, or 2½-D representation, of the surface of the object.

As discussed in section 2.4, re-cognition implies that the total set of possible objects to be recognised in the image is known a priori; for example, in character recognition the character set is known, and in face recognition the set of possible faces is known. To avoid confusion, we will refer to this collection of objects as the "database". The range image to be analysed will be called the "scene". We will seek to perform object recognition in "fingerprint space", that is, we will represent both the scene and the database objects by their reduced morphological scale-space fingerprints, and then search for matches between scene objects and database objects.

Each match that is found and verified generates a transformation matrix and indicates a single object from the database. If we apply this transformation matrix to the specified database object, this object will match an object in the image with respect to location, rotation, and magnification. Therefore the "answer" to the object recognition problem is a list of these matches found for a particular scene.

Inasmuch as we are using a new type of feature in this chapter, our work is novel; we have kept the feature extraction and matching stages straightforward to better illustrate the use of our new representation. When used with range images in this way the representation deals with features on the surface of the object and so is a "surface description" similar to the "BCRD" of Radack & Badler (1989) or the "splash" of Stein & Medioni (1992).

In the next section we will present the fingerprint extraction algorithm. In section 7.3, we will more formally introduce the methodology to be used in our object recogniser. Then, in sections 7.4 and 7.5 we will discuss in some detail the application of our methodology to two object recognition problems: human face recognition, and digital elevation map (DEM) analysis. Finally, in section 7.6, we will summarise and draw conclusions. The material in this chapter has been presented at several conferences and workshops (Jackway, Boles & Deriche 1993a, Jackway, Boles & Deriche 1993b, Jackway, Deriche & Boles 1993, Jackway, Boles & Deriche 1994), and the recognition results, in summary form only, in a journal paper (Jackway & Deriche 1994).

7.2 Fingerprint Extraction

To perform efficient object recognition in fingerprint space, we require an algorithm for computing the reduced morphological scale-space fingerprint. It is very important that the algorithm does not involve actually computing the scale-space image, which would be computationally prohibitive. We now present such an algorithm.

In the previous chapter we introduced the reduced morphological scale-space fingerprint and indicated how, for a range image (2-D signal), it can be represented as a list of $k$ triples, $(x_i, y_i, \sigma_i)$, $i = 1, 2, \ldots, k$, where $k$ is the number of local extrema in the image. We can consider the triple $(x_i, y_i, \sigma_i)$ as assigning a scale $\sigma_i$ to the local extremum at position $(x_i, y_i)$ in the original signal. Now, $\sigma_i$ represents the scale value at which the fingerprint line at $(x_i, y_i)$ ends. From the geometrical interpretation of the morphological operations, as discussed extensively in the previous chapter, we see that this $\sigma_i$ is the value such that structuring functions of scale $\sigma \le \sigma_i$ "touch" the signal at $(x_i, y_i)$ while those of scale $\sigma > \sigma_i$ do not. This is the property used in the algorithm: we search for a structuring element satisfying this property and then assign its scale value to the fingerprint line.

It may help to visualise this operation as inflating a balloon resting on top of a local maximum, see figure 7.1. If the local maximum is lower than the global maximum, then eventually the balloon gets so big that it touches the surface elsewhere and, from then on, no longer rests on the local maximum. We take the radius of the balloon (structuring function) at "lift-off" as the critical scale of the local maximum. If the local maximum is equal to the global maximum we can immediately assign the scale to be $\infty$.

Figure 7.1: Finding the scale of a local maximum on a signal.

To formally develop the algorithm we will consider a general multidimensional signal $f : \mathbb{R}^n \to \mathbb{R}$ with $k$ local maxima at $f(x_i)$, $i = 1, 2, \ldots, k$. To translate the description of the previous paragraph into a practical algorithm, we note that at the critical scale $\sigma_i$ for the local maximum at position $x_i$, the structuring function with origin at $f(x_i)$ and of scale $\sigma_i$ has at least one other point in common with the surface. Let us denote any one of these other points by $f(x_p)$. We need to find $f(x_p)$, for by knowing this point we can immediately determine $\sigma_i$ by solving (for $\sigma_i$) the equation,

$$f(x_i) - g_{\sigma_i}(x_p - x_i) = f(x_p). \tag{7.1}$$

If $g$ is anti-convex, we can always find a function $g^{-1}$ such that if $g_\sigma(x) = z$ then $\sigma = g^{-1}(x, z)$. Using this function the solution to equation (7.1) is,

$$\sigma_i = g^{-1}\left( x_p - x_i,\ f(x_i) - f(x_p) \right). \tag{7.2}$$

As an example, for hemispherical structuring functions given by

$$g_\sigma(x) = -|\sigma| \left( 1 - \left( 1 - \|x/\sigma\|^2 \right)^{1/2} \right), \quad \|x\| \le \sigma, \tag{7.3}$$

we have,

$$g^{-1}(x, z) = -\frac{\|x\|^2 + z^2}{2z}, \tag{7.4}$$

giving,

$$\sigma_i = -\frac{\|x_p - x_i\|^2 + \left( f(x_i) - f(x_p) \right)^2}{2 \left( f(x_i) - f(x_p) \right)}. \tag{7.5}$$

The remaining step is to find the point $x_p$. We can apply equation (7.2) to all the points $x_j$, $j \ne i$, in the signal; from the balloon analogy we can see that point $x_p$ returns the minimum of all the resulting scales. That is, we can search for the point returning the minimum scale, so that,

$$\sigma_i = \min_j \left\{ g^{-1}\left( x_j - x_i,\ f(x_i) - f(x_j) \right) \right\}. \tag{7.6}$$

Fortunately, there is generally no need to search all the points of the signal. For a start, if we are calculating the scale for the local maximum at $x_i$, we can ignore all other points $x_j$ in the signal for which $f(x_j) < f(x_i)$, as the structuring function cannot touch there. Secondly, as we progressively compute equation (7.6), if we denote the minimum scale found so far by $\hat{\sigma}_i$ then we only have to search the points in the region defined by,

$$\{x : g_{\hat{\sigma}_i}(x - x_i) \ge f(x_i) - M\}, \tag{7.7}$$

where $M$ is the global maximum of $f$. This is because, outside of this region, the magnitude of the structuring function is such that it cannot touch the signal. Note that as the search in equation (7.6) progresses and the minimum scale decreases, the region defined by equation (7.7) also decreases, ensuring that the algorithm terminates. For this reason it is most efficient to search over points $x_j$ in order of increasing radius from $x_i$. In practice, it is easier to search along the sides of an expanding square around $x_i$. We can now write the algorithm:

Algorithm 7.1 (To find the scale of the local maximum at $x_i$): Let $M$ denote the global maximum of signal $f(x)$.

ENTRY POINT: $x_i$ is a local maximum of $f$
Step 1   $\sigma_i \leftarrow \infty$
Step 2   IF ($f(x_i) = M$); RETURN
Step 3   $R \leftarrow 0$
Step 4   $R \leftarrow R + 1$
Step 5   FOR all points $x_j$ of radius $R$ from $x_i$ DO
Step 6       $H \leftarrow f(x_i) - f(x_j)$
Step 7       IF ($H \ge 0$) GOTO Step 10
Step 8       $\sigma_j = g^{-1}(x_j - x_i, H)$
Step 9       IF ($\sigma_j < \sigma_i$); $\sigma_i \leftarrow \sigma_j$
Step 10  ENDFOR
Step 11  IF ($g_{\sigma_i}(R) \ge f(x_i) - M$); GOTO Step 4
RETURN: value in $\sigma_i$
∎

A similar algorithm finds the scale associated with the local minima of $f$. An examination of algorithm 7.1 shows that for each local extremum in the signal we search all the points in an $n$-dimensional volume given by

$$\{x : g_{\sigma_i}(x - x_i) \ge f(x_i) - M\}. \tag{7.8}$$

Therefore, the algorithm is approximately of order $O(\sigma_i^n)$. If we let $\bar{\sigma}^n = \frac{1}{k} \sum_{i=1}^{k} \sigma_i^n$ then the computation for the complete reduced morphological scale-space fingerprint is $O(k \bar{\sigma}^n)$. The code fragment of figure 7.2 shows the C implementation of this algorithm for a 2-D function (range image).

/* typedef struct {int x; int y; float scale;} scaleitemtype; */
/* float f[N][N] holds the signal.                            */
/* scaleitemtype FP[K] holds the reduced fingerprint,         */
/* on input FP[] contains the co-ordinates of the             */
/* local maxima; on output also contains the                  */
/* associated scales.                                         */
/* M is the global maximum of f[]                             */

for (i=1; i L) which are


7.3 Methodology

candidate origins for object. Form a list of features in the scene, $S_o^c$, $c = 1, 2, \ldots, C$, which are candidates to be origins of objects:

$$\{S_o\} = \{(x_i, y_i, \sigma_i) : \sigma_i \ge \sigma_L,\ i = 1, 2, \ldots, k\}. \tag{7.37}$$

This step requires a search of the $k$ scene features. ∎

For the next step of the procedure we have as input the following parameters: a magnitude range (loMag, hiMag), tolerance values tol and scaletol, and the number of features to test, $L$. The idea is to obtain, for the largest $L$ features from each object in the database, the ratio $\rho_{m,l} = r_{m,l} / \sigma_{m,l}$, and then test whether there are features in the scene surrounding any of the candidate object origins with ratios $\rho_c = r_c / \sigma_c$ within a tolerance. Each such match generates a hypothesis. The algorithm follows in pseudocode:

Algorithm 7.2 (To generate hypotheses):
ENTRY POINT: Parameters: loMag, hiMag, tol, scaletol, $L$
Step 1   FOR all candidate object origins in the scene feature list $(x_c, y_c)$, $c = 1, 2, \ldots, C$ DO
Step 2     FOR each database object $m = 1, 2, \ldots, n$ DO
Step 3       FOR the $L$ largest scale features in each database object, $(r_{m,l}, \theta_{m,l}, \sigma_{m,l})$ DO
Step 4         $\rho_{m,l} \leftarrow r_{m,l}/\sigma_{m,l}$
Step 5         FOR each scene feature $(x_j, y_j, \sigma_j)$, $j = 1, 2, \ldots, k$ for which
                 $r_{m,l}(\mathrm{loMag} - \mathrm{tol}) \leq \sqrt{(x_j - x_c)^2 + (y_j - y_c)^2} \leq r_{m,l}(\mathrm{hiMag} + \mathrm{tol})$ DO
Step 6           $\rho_j \leftarrow \sqrt{(x_j - x_c)^2 + (y_j - y_c)^2}\,/\,\sigma_j$
Step 7           IF $(\rho_{m,l} - \mathrm{scaletol}) \leq \rho_j \leq (\rho_{m,l} + \mathrm{scaletol})$ THEN
                   Generate Hypothesis:
                     object: $m$
                     position: $(X_c, Y_c)$
                     angle: $\theta = \tan^{-1}\left(\frac{y_j - y_c}{x_j - x_c}\right) - \theta_{m,l}$
                     magnification: $M = \sqrt{(x_j - x_c)^2 + (y_j - y_c)^2}\,/\,r_{m,l}$
                   Store Hypothesis in a list.
Step 8         ENDDO (Step 5)
Step 9       ENDDO (Step 3)
Step 10    ENDDO (Step 2)
Step 11  ENDDO (Step 1)



By multiplying together the FOR loops and realising that the number of features found at step 7 is approximately proportional to the area $(\mathrm{hiMag} - \mathrm{loMag} + 2\,\mathrm{tol})^2$, we find that the computation is of order $O(CLnk\,(\mathrm{hiMag} - \mathrm{loMag} + 2\,\mathrm{tol})^2)$. At the completion of this algorithm we have a list of hypothesised recognised objects to be verified in the next step.

Verify hypotheses

The next phase of the procedure is to test and verify each hypothesis. The important idea here is that each hypothesis specifies exactly a proposed relationship between a scene object and a specified database object. Now, given a proposed exact relationship (which is based on a very few matching features) it is easy to check whether enough further features match to verify the hypothesis. Since we have an exact hypothesis to check, we know where to look in feature-space for matching features, thus we can perform this step very efficiently. Further, we perform the checking in order of the large scale (major) features first, corresponding to the coarse-to-fine approach of scale-space (Witkin 1984). The number or proportion of matching features necessary for hypothesis verification, and the tolerances used in position and scale for the match, will depend upon the particular application, and details will be given when the particular applications are discussed in later sections. Once a hypothesis is verified, based on a number of matching features, we can use a regression procedure, which is described in appendix 1, to estimate more accurately the translation, rotation, and magnification parameters of the matching. A more detailed outline of the procedure is now given.

• For each hypothesis in the list, $(m, X_k, Y_k, \theta, M)$, transform each feature $(r_{m,i}, \theta_{m,i}, \sigma_{m,i})$ in database object $m$ by the specified transformation matrix $(X_k, Y_k, \theta, M)$ to give a hypothesised scene feature $(x'_i, y'_i, \sigma'_i)$. The transformation is:

$$x'_i = M r_{m,i} \cos(\theta_{m,i} + \theta) + X_k, \qquad (7.38)$$
$$y'_i = M r_{m,i} \sin(\theta_{m,i} + \theta) + Y_k, \qquad (7.39)$$
$$\sigma'_i = M \sigma_{m,i}. \qquad (7.40)$$

• Search the scene features to find a feature $(x_i, y_i, \sigma_i)$ with $(x'_i - \mathrm{tol}) \leq x_i \leq (x'_i + \mathrm{tol})$ AND $(y'_i - \mathrm{tol}) \leq y_i \leq (y'_i + \mathrm{tol})$ AND $(\sigma'_i - \mathrm{scaletol}) \leq \sigma_i \leq (\sigma'_i + \mathrm{scaletol})$. If such a feature is found, increment featurecount and record the correspondence $((x_i, y_i, \sigma_i), (x'_i, y'_i, \sigma'_i))$. If featurecount is sufficiently large (for example, featurecount $\geq 0.25\,m_m$) the hypothesis is verified, otherwise it is rejected.

• If there is more than one hypothesis for the same scene object we accept the one with the highest number of corresponding points and reject the others. Note that, in practice, it is most likely that such competing hypotheses are in fact the same object with slightly different values of transformation, so the choice of hypothesis is not important (given the next step).

• For each verified hypothesis, the parameters of the transformation matrix of the hypothesis are accurately estimated by a 2-D least squares method using the point-to-point correspondences already stored. The method used is based on that given in Stein & Medioni (1992) and is fully described in appendix 1. The estimation step returns the final output $(m, \hat{X}_k, \hat{Y}_k, \hat{\theta}, \hat{M})$.

• Print results.

7.3.3 Data Structures for the Method

From the above outlines of the matching processes we see that the scene feature-list needs to be searched to find features matching a given value $(x, y, \sigma)$ within a tolerance. This is equivalent to searching a certain volume in the 3-space $\mathbb{Z}^2 \times \mathbb{R}$. Stein & Medioni (1992) have used a method called "structural indexing" which in turn is based on a computer search method, "Coalesced Hashing" (Vitter & Chen 1987), to efficiently search a similar feature-space. In hashing, an "address" would be made by using a "hash function" $h : \{\text{all possible } (x, y, \sigma)\} \rightarrow \{1, 2, \ldots, H\}$ on the scene features. This converts the 3-space into a linear map. Then if we want to find a feature at some $(x, y, \sigma)$ we simply compute $h(x, y, \sigma)$ and look there in the map. The term "coalesced" refers to the way collisions in the hash table (that is, several features with the same hash address) are catered for. For more details see (Vitter & Chen 1987). Although such a hashing approach would probably be the most efficient approach for large object recognition applications, we have not used it in this thesis because of its extra complexity. We have instead opted for a more direct search strategy as described below.

From the fingerprint extraction, we already have the scene features in a list sorted by scale. Actually we use two lists, of negative and positive scale features. This means we can quickly find the features falling within a range of scales. To cater for searching in 2-D space, we also form an indicator array I[Ns][Ns] of the same dimensions as the scene. As the scene feature-list is read into the procedure, the scales are entered into their correct position in the array, that is, on reading feature $(x, y, \sigma)$ we set array I[x][y] = $\sigma$. Now we can efficiently search in the region of a scene point by searching the corresponding region in the indicator array.

This completes the full description of the feature extraction and matching procedures. Now we proceed to describe in detail the actual applications of the object recognition method. In the next section we describe an experiment in human face recognition and, in the following section, an application in region recognition in digital elevation maps.
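A minimal C sketch of the indicator-array search follows. It assumes a 512 × 512 scene, uses the value 0 to mean "no feature at this pixel" (reasonable only if no feature has scale exactly zero), and the names `indicator`, `add_feature` and `find_feature` are illustrative rather than taken from the thesis code.

```c
#define NS 512   /* assumed scene size Ns */

/* indicator[x][y] holds the scale of the feature at (x, y), or 0 if none. */
static float indicator[NS][NS];

/* Enter a feature (x, y, sigma) as the scene feature-list is read. */
void add_feature(int x, int y, float sigma)
{
    indicator[x][y] = sigma;
}

/* Search the square of half-width tol centred on (x0, y0) for a feature
 * whose scale lies within scaletol of sigma0. Returns 1 and writes the
 * feature position to (*fx, *fy) if one is found, otherwise returns 0. */
int find_feature(int x0, int y0, int tol, float sigma0, float scaletol,
                 int *fx, int *fy)
{
    for (int x = x0 - tol; x <= x0 + tol; x++) {
        if (x < 0 || x >= NS)
            continue;                       /* clip to the scene */
        for (int y = y0 - tol; y <= y0 + tol; y++) {
            if (y < 0 || y >= NS)
                continue;
            float s = indicator[x][y];
            if (s != 0.0f && s >= sigma0 - scaletol && s <= sigma0 + scaletol) {
                *fx = x;
                *fy = y;
                return 1;
            }
        }
    }
    return 0;
}
```

The cost of one lookup depends only on the window size $(2\,\mathrm{tol} + 1)^2$ and not on the number of scene features, which is the point of the direct search strategy.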

7.4 Face Recognition in Range Images


7.4.1 Introduction

Francis Galton (1888) was the first to attempt to codify the identification of humans, resulting in the suggestion to use the facial profile. [This fascinating paper also contains one of the first serious suggestions for the use of "marks left by blackened finger-tips upon paper", an idea which has since gained some popularity!] In later work, the position of 5 cardinal points on the facial profile was found to be important in identification (Galton 1910). The use of profiles, at first drawn by hand on paper, later became computerised (Harmon, Kuo, Ramig & Raudkivi 1978, Harmon, Khan, Lasch & Ramig 1981).

The recognition of human faces has had a long history in computer vision. We first present a very brief overview of the field, taken mainly from (Turk & Pentland 1991a).

    Although the ability to infer intelligence or character from faces is suspect, the human ability to recognize faces is remarkable. We can recognize thousands of faces learned throughout our lifetime and identify familiar faces at a glance even after years of separation. This skill is quite robust, despite large changes in the visual stimulus due to viewing conditions, expression, aging, and distractions such as glasses or changes in hairstyle or facial hair. As a consequence the visual processing of human faces has fascinated philosophers and scientists for centuries, including figures such as Aristotle and Darwin. Computational models of face recognition, in particular, are interesting because they can contribute not only to theoretical insights but also to practical applications.
        Turk & Pentland (1991a)

Much of the work in the computer recognition of faces has focused on detecting individual features such as the eyes, nose, mouth, and head outline, and defining a face model by the position, size, and relationships among these features.
We should note in passing that face recognition by humans does not wholly work this way (Carey & Diamond 1977), as shown by the poor performance of humans on upside-down faces (Rock 1974). Indeed there may be a single visual neurone or "gnostic unit" in the human brain tuned to recognise a particular face (the famous "grandmother cell"!) and recent research tends to support this hypothesis (Perrett, Mistlin & Chitty 1987).

In early work at Bell Laboratories, Goldstein, Harmon & Lesk (1971) constructed a feature-vector of 22 features which were "relative, distinctive, relatively independent measures, which could be judged reliably" (by humans!). Kaya & Kobayashi (1972) have measured the amount of information carried by 9 geometric parameters characterising a face. Fischler & Elschlager (1973) used a local template matching and a global goodness of fit criterion to perform matching experiments on faces and terrain. Yuille, Cohen & Hallinan (1989) use "deformable templates" to fit the shape of the eyes and mouths; the deformation parameters can be used in recognition. In a similar idea, Huang & Chen (1992) have used "active contours" to fit face, eyebrow, eyes, mouth, and nostril shapes.

More recently, connectionist approaches (neural networks) have been used to extract the dominant features (Fleming & Cottrell 1990) and categorize the faces. Another recent approach to extracting the dominant features from faces is the use of statistical "principal components analysis" (Sirovich & Kirby 1987, Kirby & Sirovich 1990), which has been extended to the "Eigenface" system of Turk & Pentland (1991a) (also (Turk & Pentland 1991b)). In this system any face can be approximately reconstructed by the weighted sum of about 40 eigenfaces (the 40 largest principal components). The vector of weights then forms an identifier for matching.

In a different, but related, problem Govindaraju, Sher, Srihari & Srihari (1989) have proposed a method for locating faces in a scene (newspaper photograph). This localisation may form the front-end of a recognition system which works on well-framed faces, or, for example, in photographic film processing, the location of faces may be used to automatically adjust the colours (Turk & Pentland 1991a).

Our approach is different from all the above in that we deal with the range image of the face, so we have the actual surface of the face to deal with. We will actually deal with a front view of the face. Further, we do not have the freedom to choose the dominant facial features for recognition as in the above methods; we are constrained in the choice of features (local extrema) by our method.
We should also emphasise that in our experiments we are somewhat restricted in the data that is available. In particular, we are not able to obtain multiple range images of the same face (on different days, with different expressions, with different digitisations, etc.). To cope with this we will re-digitise, after rotation, magnification, and translation, the small number of faces available in our database. We place several of these transformed faces together to form the "scene" and then try to recover the recognition parameters. In this way we solve both the localisation and recognition problem. A description of the experimental detail follows.


7.4.2 Methodology

The range data we have available for the face experiments is publicly available on the Internet¹ and was used and displayed in the paper by Pentland & Sclaroff (1991). There is data in separate files for the heads of 9 people. The native format of the data is cylindrical co-ordinates on a 512 × 256 grid with 8 bits per pixel, so, if we read the data into an array $D[i, j]$; $i = 1, 2, \ldots, 256$; $j = 1, 2, \ldots, 512$, we can recover the 3-dimensional shape of the heads, in cartesian co-ordinates $(x, y, z)$, by the relations:

$$k = j + 512\,i, \qquad (7.41)$$
$$x_k = D[i, j] \cos(j/512), \qquad (7.42)$$
$$y_k = D[i, j] \sin(j/512), \qquad (7.43)$$
$$z_k = i. \qquad (7.44)$$

For our experiments in recognition we require a front range view of each face. From the 3-D data as described above we can construct a front range view by computing the normal distance from a suitable plane to the closest point on the surface. From initial examination of the data it became apparent that the axis of the cylindrical co-ordinate system used was not always even approximately co-incident with the major axis of the head. As well, the heads were not all facing in the 0° direction. Ideally, we should construct an object centered co-ordinate system. One approach is by finding the moments of inertia of the head. This has the problem of being complicated, computer intensive and still dependent on where the head data is truncated at the neck. We were looking for an automated system.

The approach taken is to find the nose in the data; this is easily done by searching for the maximum point in the range data (within a prescribed region). We denote the $z$ co-ordinate of the nose by $z_n$. Then, we take the cross-section of the head at the nose level, that is $CS_n = \{(x, y, z) : z = z_n\}$. Next, we fit the circle of best fit (in the least squares sense) to this cross section. The algorithm, due to Thomas & Chan (1989), used for this purpose is shown in appendix 2. By this method we establish a suitable $z$-axis for the head, as the line perpendicular to the $xy$ plane and passing through the center point. A new nose point is then found as the point within the proximity of the original nose point with a maximum radius from the new $z$-axis. This new nose point and the new $z$-axis set up a head-based co-ordinate system which can provide a front-view of the face.

¹ Available via anonymous File Transfer Protocol (FTP) from the Media Laboratory at MIT at address whitechapel.media.mit.edu:/pub/range.


We compute such a front range view by establishing a reference plane through the head. This reference plane contains the $z$-axis as established in the previous paragraph and is oriented such that the projection of the nose point on the plane falls on this axis. The reference plane is divided into 256 × 256 pixels and the perpendicular distance from each pixel to the nearest point on the face is recorded. In this way a range image is established. The resulting 2½-D range images are placed in the object database. These database objects are shown in figure 7.3.

Figure 7.3: Models for matching with scene

The test scene was computer generated by scaling, rotating, re-sampling, and translating the database objects to form a test scene image. The rotation, scaling and re-sampling of the models ensures that "noise" is added to the recognition problem. Otherwise each scene object would match a database object exactly, an unfair test of any method! The scene objects are generated by using the inverse of the transformation, that is, if the scene $S[\cdot, \cdot]$ contains model $m$ at position $(X, Y)$, at angle $\theta$, at magnification $M$, we obtain the values in the scene from the relation:

$$S[x, y] = M A_m[x', y'] \quad \text{if } (x', y') \in A, \qquad (7.45)$$

where the co-ordinate $(x', y')$ is given by:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \frac{1}{M} \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x - X \\ y - Y \end{pmatrix}. \qquad (7.46)$$

We do the above for every pixel $(x, y) \in S$ and round the resulting $(x', y')$ to the nearest integer co-ordinates. In this way we ensure that a value is assigned to every pixel in the scene image of the object regardless of the magnification used; also, all pixel values in the scene object are $M$ times the corresponding database pixel value. Since the scene is computer generated from the database objects, the exact parameters for the object recognition problem are known. In this way the accuracy of the recovered parameters and the performance of the method can be determined. The test scene is a range image consisting of nine objects (heads) at magnifications of between 0.75 and 1.5 and angles of 0° to 315° in 45° steps. The values used are shown in table 7.1. The scene image itself is displayed in figure 7.4.

Table 7.1: Scene description for the face recognition problem

scene parameters
face  position   angle (°)  magnitude
1     300, 256   0          1.50
2     70, 256    0          0.75
3     100, 100   45         1.00
4     256, 70    90         0.75
5     412, 100   135        1.00
6     442, 256   180        0.75
7     412, 412   225        1.00
8     256, 442   270        0.75
9     100, 412   315        1.00

Figure 7.4: Range scene with nine faces for recognition.

For hypothesis generation, we selected the permissible range of magnifications to be $R_{mag} = (0.5, 2.0)$. This range more than covers the range of magnifications found in the scene objects. The larger this range, the more scene features must be searched for a hypothesised match at step 5 of Algorithm 7.2, and the more hypotheses will be generated.

The hypothesis verification step contains several parameters which govern the tolerances and criteria used for the matching of features. The remaining detail of the methodology is to explain the hypothesis verification parameters used. By experimentation with the data-set it was found that the following procedure gives reliable and robust recognition, with due allowance for the distortions of feature scale previously discussed, and any occlusion of the scene objects by the borders of the scene. To verify each hypothesis, the 50 largest features (in terms of scale magnitude) are considered. The criterion for acceptance is a matching featurecount of greater than 0.25 × 50 = 12.5, rounded down to 12. The area of the scene searched for a match was a square region of side 2 tol where tol = $1 + 0.1\,r'_i$ ($r'_i$ is the hypothesised scene feature radius). A feature is considered a match if its scale falls in the range $\sigma'_i \pm$ scaletol, where scaletol = $0.15\,\sigma'_i$ ($\sigma'_i$ is the hypothesised scene feature scale).

Table 7.2: Recovered scene description and true scene description parameters from the face recognition experiment

recognised parameters                          true scene parameters
face  position       angle (°)  magnitude      face  position   angle (°)  magnitude
1     299.9, 256.2   0.6        1.490          1     300, 256   0          1.50
2     70.5, 255.6    1.2        0.748          2     70, 256    0          0.75
3     100.0, 100.3   45.4       1.007          3     100, 100   45         1.00
4     256.0, 70.1    90.5       0.746          4     256, 70    90         0.75
5     412.1, 99.8    135.6      1.000          5     412, 100   135        1.00
6     441.9, 256.0   179.5      0.752          6     442, 256   180        0.75
7     411.8, 411.9   224.8      0.995          7     412, 412   225        1.00
8     256.1, 441.9   269.9      0.733          8     256, 442   270        0.75
9     100.0, 412.0   315.7      0.992          9     100, 412   315        1.00

7.4.3 Results

The recovered description and scene description (for comparison) are shown in table 7.2. The recognition procedure successfully and correctly identifies all the faces and accurately estimates the position, angle, and magnification parameters. It may be of interest to note the pertinent statistics from the various stages of the recognition process. The numbers of features in the fingerprint files are shown in table 7.3. The total number of hypotheses generated is 485. The approximate times taken for the various parts of the procedure, implemented in the C language on a Silicon Graphics Personal Iris 4D/35 computer, are shown in table 7.4.


Table 7.3: Feature numbers in the fingerprint files

file         scale > 0   scale < 0   total
scene file   3651        2690        6341
face 1       49          67          116
face 2       63          72          135
face 3       53          48          101
face 4       87          74          161
face 5       33          40          73
face 6       68          67          135
face 7       35          46          81
face 8       62          52          114
face 9       96          115         211

Table 7.4: Approximate times taken for the stages of the algorithm on a Silicon Graphics Personal Iris 4D/35 computer

operation                                         time
Obtain fingerprint of 9 faces (256 × 256)         1.1 s
Obtain fingerprint of range scene (512 × 512)     37.6 s
Generate hypotheses                               4.4 s
Verify hypotheses                                 4.2 s

In the next section we describe the second object recognition experiment, which uses data from a Digital Elevation Map.

7.5 Digital Elevation Map Analysis

7.5.1 Introduction

As a second application of the object recognition algorithm, the problem of identifying feature areas on a digital elevation map (DEM) is addressed. In the case of digital terrain data, the data itself naturally occurs as a 2½-D representation, where the value at position $(x, y)$ represents the elevation above sea level of the corresponding point on the surface of the earth. Therefore, we do not have to perform the considerable pre-processing required in the previous application. As another difference with the previous case, this time we are provided with the "scene" and we will construct a recognition problem by building a database from objects extracted from this scene. As in the previous case we will rotate, scale, and re-sample these scene objects to add "noise" to the problem and to make sure the scene and database objects do not match exactly.

7.5.2 Methodology

DEM data for a region in Colorado, USA, was obtained from the United States Geological Survey. This data is publicly available, with documentation (Elassal & Caruso 1983), via the Internet². This data is in the form of elevation measurements on a square grid with spacing of 30 metres. A 512 × 512 pixel region was selected as the test scene; this corresponds to an area of approximately 15 km × 15 km on the ground. We decided to extract 8 circular regions of radius 40 pixels as the scene "objects" to be recognised. Each region is centered on a local maximum (a mountain) in the scene, so we have, in effect, a mountain recogniser. The test scene and selected regions are shown in figure 7.5. These regions were re-sampled at 15 m spacing (by bilinear interpolation (Press et al. 1986)), which effectively magnifies the area by a factor of 2.0, rotated by −30° about the feature, magnified by a factor of 0.75 in both height and spatial extent (to give an effective total magnification of 1.25), re-sampled to a 30 m grid and smoothed by a Gaussian filter of scale 1.0. The resulting 2½-D range images are placed in the object database. These database objects are shown in figure 7.6. The exact transformation parameters between the database objects and the scene are shown in table 7.5. The aim is to efficiently recover these correspondence matrices.

Again, the rotation, scaling, and re-sampling of the objects ensures that "noise" is added to the recognition problem. The Gaussian filtering step was found to be necessary to help remove large numbers of spurious small scale features introduced by the re-sampling process. This indicates the sensitivity of the method to any type of noise or distortion which introduces large numbers of local extrema in the surface. This point is important, so we include a brief treatment in the next sub-section before presenting the results. To complete this description of the methodology, we remark that the method used is as outlined in detail in section 7.3.
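The bilinear interpolation used in the re-sampling step can be sketched in C as follows. This is a generic textbook formulation and not the routine from Press et al.; the row-major layout and the function name `bilinear` are assumptions.

```c
/* Bilinear interpolation of a row-major grid img (w columns, h rows,
 * w >= 2 and h >= 2) at the real-valued position (x, y), where
 * 0 <= x <= w-1 and 0 <= y <= h-1. */
double bilinear(const float *img, int w, int h, double x, double y)
{
    int x0 = (int)x, y0 = (int)y;
    /* Keep the 2x2 interpolation cell inside the grid. */
    if (x0 > w - 2) x0 = w - 2;
    if (y0 > h - 2) y0 = h - 2;
    if (x0 < 0) x0 = 0;
    if (y0 < 0) y0 = 0;

    double tx = x - x0, ty = y - y0;   /* fractional offsets in the cell */
    double a = img[y0 * w + x0];
    double b = img[y0 * w + x0 + 1];
    double c = img[(y0 + 1) * w + x0];
    double d = img[(y0 + 1) * w + x0 + 1];

    return (1.0 - tx) * (1.0 - ty) * a + tx * (1.0 - ty) * b
         + (1.0 - tx) * ty * c         + tx * ty * d;
}
```

Re-sampling a 30 m grid at 15 m spacing then amounts to evaluating `bilinear` at half-integer steps, for example at `(i / 2.0, j / 2.0)` for each output pixel `(i, j)`.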
As for the various parameters: for hypothesis generation, we selected the permissible range of magnifications to be $R_{mag} = (0.75, 0.85)$; this range covers the magnification found in the scene objects ($1/1.25 = 0.8$). The tolerances are: an additive scale tolerance of scaletol = 20 and a multiplicative tolerance of RonStol = 0.15. The 3 largest features next to the origin in each database object are used to form the hypotheses.

The hypothesis verification step contains several parameters which govern the tolerances and criteria used for the matching of features. By experimentation with the data-set it was found that the following values give reliable and robust recognition. To verify each hypothesis, the 20 largest features (in terms of scale magnitude) are considered. The criterion for acceptance is a matching featurecount of greater than 0.33 × 20 = 6.6, rounded down to 6. The area of the scene searched for a match was a square region of side 2 tol where tol = $1 + 0.15\,r'_i$ ($r'_i$ is the hypothesised scene feature radius). A feature is considered a match if its scale falls in the range $\sigma'_i \pm$ scaletol, where scaletol = $0.2\,\sigma'_i$ ($\sigma'_i$ is the hypothesised scene feature scale).

Figure 7.5: DEM scene with eight regions

Figure 7.6: Models for matching with DEM scene

Table 7.5: Mountain recognition: exact transformation parameters between the database objects and the test scene

scene parameters
object  position   angle (°)  magnitude
1       98, 76     30         0.8
2       226, 181   30         0.8
3       359, 46    30         0.8
4       455, 59    30         0.8
5       109, 351   30         0.8
6       192, 304   30         0.8
7       49, 406    30         0.8
8       308, 99    30         0.8

² Available via anonymous File Transfer Protocol (FTP) from the address spectrum.xerox.com:/pub/map. The region used is available in four files covering the area bounded by: Lat. 39° 22′ 30″ N to 39° 37′ 30″ N and Long. 105° 00′ W to 105° 15′ W. The data was supplied by Jon Junker [email protected].

7.5.3 The effect of noise

Any recognition method which depends on features is sensitive to noise or distortions which directly affect the existence of these features. Since our method uses local extrema in the surface, we are sensitive to the presence of additive noise, particularly that which is uncorrelated over small areas. Such effects are worse on flat level regions of the surfaces, since an infinitesimal change in the value of a pixel with respect to its neighbours can form a new local maximum or minimum at that point.

The effect of noise is not as bad as may be first thought, since most (but not all) features introduced by noise are at very small scales, which are regarded by the algorithm as the least important of all the features. Secondly, as long as noise does not appear in the database objects but only in the scene, it will not greatly affect the final matching process, only slow it down, since the database features will still correspond finally with the correct features in the scene. We can merely suggest here that if noise is a problem it should be removed a priori by some pre-filtering technique such as median filtering (Nodes & Gallagher, Jr. 1982) or averaging (Pratt 1978) (for example, Gaussian filtering).

This sensitivity to noise, particularly impulse noise, is not surprising since both the dilation and erosion depend on extreme value statistics (that is, the maximum and minimum). In this regard it may be useful to consider using a "soft" structuring element which allows, say, $\nu$ points to penetrate it. That is, instead of the minimum operation we take the $\nu$-th order statistic, and instead of the maximum operation we use the $(k - \nu)$-th order statistic, where $k$ is the number of points in the window. We have already raised this possibility in section 5.3 as a suggestion for further study; however, it is outside the scope of this thesis.
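To make the order-statistic idea concrete, here is a small C sketch of a 1-D rank filter. It is a simplification and not part of the method of this chapter: the window is flat (not a paraboloid structuring function), the window is clipped at the signal borders, and `rank_filter` with its `from_top` flag is a hypothetical interface.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort on floats (ascending order). */
static int cmp_float(const void *a, const void *b)
{
    float fa = *(const float *)a, fb = *(const float *)b;
    return (fa > fb) - (fa < fb);
}

/* Rank filter over a flat window of half-width hw.
 * from_top == 0: out[i] is the rank-th smallest value in the window
 *                (rank 0 is the minimum, i.e. a flat erosion);
 * from_top != 0: out[i] is the rank-th largest value in the window
 *                (rank 0 is the maximum, i.e. a flat dilation).
 * A small positive rank lets that many outlying points "penetrate". */
void rank_filter(const float *in, float *out, int n, int hw,
                 int rank, int from_top)
{
    float *win = malloc((size_t)(2 * hw + 1) * sizeof *win);
    if (win == NULL)
        return;
    for (int i = 0; i < n; i++) {
        int lo = (i - hw < 0) ? 0 : i - hw;          /* clip window to signal */
        int hi = (i + hw >= n) ? n - 1 : i + hw;
        int m = hi - lo + 1;                         /* points in this window */
        memcpy(win, in + lo, (size_t)m * sizeof *win);
        qsort(win, (size_t)m, sizeof *win, cmp_float);
        int r = (rank < m) ? rank : m - 1;           /* clamp rank to window */
        out[i] = from_top ? win[m - 1 - r] : win[r];
    }
    free(win);
}
```

On a signal containing a single impulse, the ordinary maximum (rank 0 from the top) spreads the impulse across the window, whereas rank 1 from the top ignores it.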

7.5.4 Results

The test scene fingerprint contains 713 maxima and 420 minima. With the parameters used, the hypothesis generation step generates 45 hypotheses. The recognition procedure correctly identifies all the regions and estimates the position, angle, and magnification parameters. The recovered region parameters are shown in table 7.6. The approximate times taken for the various parts of the procedure, implemented in the C language on a Silicon Graphics Personal Iris 4D/35 computer, are shown in table 7.7.

Table 7.6: Mountain recognition: recovered scene parameters, and true scene parameters from the recognition process

recovered scene parameters                       true scene parameters
object  position       angle (°)  magnitude      object  position   angle (°)  magnitude
1       98.0, 75.9     29.4       0.794          1       98, 76     30         0.8
2       226.1, 181.2   30.0       0.795          2       226, 181   30         0.8
3       358.7, 46.5    29.3       0.793          3       359, 46    30         0.8
4       454.9, 58.8    30.1       0.805          4       455, 59    30         0.8
5       109.0, 350.9   29.6       0.796          5       109, 351   30         0.8
6       193.0, 303.7   30.4       0.798          6       192, 304   30         0.8
7       48.4, 406.7    30.0       0.800          7       49, 406    30         0.8
8       307.9, 99.3    30.2       0.802          8       308, 99    30         0.8

Table 7.7: Approximate times taken for the stages of the algorithm on a Silicon Graphics Personal Iris 4D/35 computer

operation                                    time
Obtain fingerprint of object (128 × 128)     0.5 s
Obtain fingerprint of scene (512 × 512)      121 s
Generate hypotheses                          0.7 s
Verify hypotheses                            0.3 s

7.6 Summary and Conclusions

We have presented a novel approach using the morphological scale-space fingerprint for the recognition of multiple 3-D objects in 3-D scene data. The recognition is invariant to translation, rotation, scale, and partial occlusion. We have presented results showing the recognition of a scene containing nine human faces at various positions, angles and scales. In a second application we have efficiently recognised eight mountainous regions in a DEM scene. We have recovered the translation, rotation, and scale parameters which map the database objects into the scene.

The method of the present chapter depends on having a reasonable density of features to work with. Since we use local extrema of the surface as features, our method will only work for surfaces with a rich set of local extrema such as are commonly found in natural objects. Our feature extraction will not work successfully with simple shapes; for example, a sphere from any direction has only a single local maximum! To utilise our method for more general scenes it should be combined with a method using geometric features such as those used in 3D-POLY by Chen & Kak (1989). The limitation of our method with respect to additive noise has been discussed.

These object recognition experiments provide a demonstration of the use of reduced morphological scale-space fingerprints. This demonstration takes advantage of the fact that the fingerprints can be expressed as a list to achieve both efficient fingerprint extraction and matching. The coarse-to-fine tracking idea of scale-space is followed by storing and accessing the fingerprints in a list which is sorted by scale.

Given suitable data, other demonstrations could be envisioned, perhaps making use of the ability of the new scale-space theory to cater for higher dimensional signals. Problems from other domains entirely outside computer vision, which can be expressed as surface matching problems, may also be found. The author feels that the object recognition problems described in the present chapter do not fully display the advances made previously in the construction of a new class of scale-space theory; however, they do present a novel and interesting approach and lend a practical and "real-world" flavour to an otherwise theoretical thesis.

Chapter 8 Summary and Conclusions


8.1 Summary of the Thesis


In the preceding chapters we have presented the following work and obtained the following findings.

In Chapter 1 we introduced the field of computer vision and noted the nearly 2000 papers per year currently being published in this academic area. We then demonstrated that many methods in image and signal analysis depend on the value of a parameter which can be seen as a scale parameter. Theoretically, this can be seen as a consequence of the ill-posedness of the problems being addressed. Early researchers realised the usefulness of using a multi-scale analysis of signals to ensure that all of the information contained therein is captured for analysis. A potential problem with any multi-scale signal processing or analysis is how to relate the description of the signal at one scale to that at other scales. Witkin's scale-space filtering, which makes scale a continuous variable, allows for the tracing of features through scale, and therefore provides an elegant solution for this problem. This theory has created lasting research interest and still receives over 20 major citations per year. Unfortunately, Witkin's scale-space has several limitations, particularly for 2-D and higher dimensional signals:

• Zero-crossing contours in higher dimensional signals can split into two with increasing scale.
• There is no scale-space causality property for local extrema of a signal.
• Computation of the fingerprint involves the smoothing of the signal at various scales (although this can be done incrementally).
• Scale is non-negative only.

The research task is stated as: the construction and demonstration of a new scale-space theory which overcomes some of the above limitations.

Chapter 2 provided a more technical literature review of Gaussian scale-space, introducing the notation and concentrating on the limitations of this standard approach as outlined above. A formal definition of scale-space causality (the defining property of scale-space) was given.
A brief review of the related contemporary multi-resolution techniques of multiresolution image processing and pyramids, and wavelet theory was provided. We mentioned that wavelet theory includes as a special case Gaussian scale-space, but that in general it does not possess a causality result linking features across scale. To conclude this section we stated the major research problem addressed in this thesis:


How can a full scale-space theory be constructed that possesses a monotonic property in higher dimensional signals?

We then reviewed the background, principles, and major operations and results of mathematical morphology. We were careful with notation, as there has been much confusion in this area; we follow the notation of Sternberg throughout this thesis and justify this choice with theoretical arguments. Dilation and erosion were defined, along with the opening and closing, for both sets (binary images) and functions. A catalogue of the fundamental results of mathematical morphology for both sets and functions was presented. To close this section, we presented a discussion of the use of the term "morphological filter" in the literature. This review provided all the necessary results from mathematical morphology for the remainder of the thesis.

Finally, we undertook a review of the current state of the art in object recognition, particularly of objects in range images. The notation and terminology of this field were presented and a well known problem definition and framework was given. Key earlier work was cited and classified along the dimensions of industrial versus natural scenes, feature extraction versus surface description, and hypothesise-and-verify. We concluded that methods have become more complicated and the examples used more general over the last decade. This part of the review did not become relevant until Chapter 7 of the thesis, where the object recognition applications were undertaken; however, it was included in Chapter 2 to avoid disrupting the flow of the thesis at a later stage.

Chapter 3 contained the first major theoretical results and contribution. After establishing the foundation and notation for the chapter we introduced a scale-dependent morphology via a scaled structuring function. In the two main definitions of the whole thesis we then defined two new operations: the "multiscale dilation-erosion" and the "multiscale closing-opening".
Leaving the multiscale closing-opening until a later chapter, we then concentrated on the properties of the multiscale dilation-erosion and the scale-space derived from it. We sought to motivate, justify, and explain the approach through a sequence of propositions and proofs. We first presented properties of the filter support region and continuity and order properties of the scale-space image. Then, in a series of propositions relating to signal extrema, we built towards the major scale-space causality theorem which justified the use of the term scale-space for this operation. We demonstrated that this scale-space satisfies (with a minor modification) all the scale-space axioms of Lindeberg.

Chapter 4 presented a treatment of several topics which relate to the selection of a suitable morphological structuring function for the new multiscale operator. Firstly, we showed a semi-group property for the scaled structuring functions and then showed that this property transfers directly to the multiscale operator generated by these functions. Then we showed how the scale-space as a whole possesses the property of "dimensional consistency", or dimensionality, if and only if structuring functions from the elliptic poweroids are used. This property is of relevance if the operators are used on intensity images or functions. We demonstrated the advantages of using the paraboloid structuring functions, which are separable in 2-D and have been called the morphological equivalent of the Gaussian kernel. We then turned to the computation of the scaled smoothing operation and in several steps improved the naive algorithm to take advantage of the symmetry and separability properties of the structuring function.

Having established a new operation and proposed its use as a signal smoother, in Chapter 5 we performed spectral and statistical analyses on this smoother. These analyses provide a conceptual link with linear smoothers, which also have (for example) a low-pass frequency response. We found a generalised frequency response (ratio of variances) similar to that of a Butterworth filter of order equal to the exponent in the powerbolic function. We also showed that preceding the smoother by a differentiator gives a band-pass response. Since this smoother is a window operator and can be seen as an additively weighted order statistic filter, we obtained the mean, variance, and autocovariance of an i.i.d. uniform noise signal after smoothing. This work showed the difficulty of obtaining even straightforward results when dealing with non-linear operators.

Chapter 6 started with a critical review of some published work on the multiscale opening and discussed some deficiencies and limitations of this work. Most serious is the fact that the zero-crossing results cannot be meaningfully extended to 2-D and higher dimensional signals.
This is apart from the fact that we have constructed counter-examples that show the theorems to be false with certain realistic input signals! In addressing these problems we obtained a scale-space causality theorem for zero-crossings of the second derivative of a 1-D function. Note that this result is only valid for 1-D functions of class $C^1$ and has not been extended to higher dimensional functions. We then turned to local extrema as the signal feature, and obtained a scale-space causality theorem for the multiscale closing-opening which, like the result for the dilation-erosion, is valid in all dimensions. We then found the scale-space fingerprints from the multiscale dilation-erosion and the multiscale closing-opening to be identical. Finally, we introduced the "reduced fingerprint" and showed how this fingerprint can be represented as a list of $(n + 1)$-tuples.

Chapter 7 began with the development of an efficient algorithm for the extraction of the reduced fingerprint. We then outlined in detail the methodology to be followed in applying this reduced fingerprint to the recognition of objects in range images by the matching of surface features. Section 7.4


applied the methodology to the recognition of nine human faces in an artificially constructed range scene. The results showed this demonstration to be a success, with all the faces recognised in an efficient manner. Section 7.5 addressed the problem of recognising patches on a surface. In this case the surface was a digital elevation map of part of Colorado. The results showed that all 8 regions were recognised correctly and the transformation parameters recovered to a high degree of accuracy.
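As a rough illustration of the multiscale dilation-erosion summarised above for Chapters 3 and 4, the following is a minimal 1-D sketch in plain Python. It assumes a parabolic structuring function scaled as $g_\sigma(t) = -t^2/|\sigma|$ and treats the signal as undefined outside its support; both choices, and the function names, are assumptions made for this example rather than the exact definitions of the thesis. It uses the naive quadratic-time evaluation, not the improved algorithm of Chapter 4.

```python
def multiscale_dilation_erosion(f, sigma):
    """Smooth the 1-D sequence f at scale sigma (naive O(n^2) evaluation).

    sigma > 0 : dilation by the scaled parabola (local maxima are emphasised)
    sigma == 0: the signal itself
    sigma < 0 : erosion by the scaled parabola (local minima are emphasised)
    """
    if sigma == 0:
        return list(f)
    s = abs(sigma)
    n = len(f)
    if sigma > 0:
        # Dilation: max over y of f[y] + g_s(x - y), with g_s(t) = -t^2 / s (assumed).
        return [max(f[y] - (x - y) ** 2 / s for y in range(n)) for x in range(n)]
    # Erosion: min over y of f[y] - g_s(x - y).
    return [min(f[y] + (x - y) ** 2 / s for y in range(n)) for x in range(n)]


def count_strict_maxima(f):
    """Count interior samples strictly greater than both neighbours."""
    return sum(1 for i in range(1, len(f) - 1) if f[i - 1] < f[i] > f[i + 1])
```

On the test signal `[0, 3, 1, 2, 0, 5, 1, 0]`, the count of strict local maxima does not grow when the scale is raised from 0 to 1, which is the behaviour the causality property leads one to expect; this is an observation on one discrete example, not a proof.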

8.2 Thesis Contributions

The main contribution of this research is its introduction of a new and original scale-space theory. This scale-space has superior theoretical properties in many areas to the Gaussian scale-space. Although a few isolated findings have previously appeared in the literature (some with flawed proofs, as discussed in Chapter 6), there had been no coherent theory of scale-space other than the Gaussian scale-space presented by Witkin (1983) (and its discretised version by Lindeberg (1990)). Indeed, although the interest in scale-space has persisted since its inception due to its inherent theoretical appeal and elegance, no-one has even suggested the possibility of non-linearly generated scale-spaces with true causality properties.

Apart from the introduction of a new scale-space theory encompassed in Theorem 3.1, we have made a start in obtaining pertinent results related to the use of multiscale morphological smoothing.

• We have shown some continuity and order properties which justify the tripartite form of the smoothing equation in Definition 3.3.
• In Chapter 4 we have contributed dimensionality results which motivate the choice of scaled structuring function.
• We have also presented a semi-group property of the structuring function which carries over to the filtering itself.
• This chapter also introduces an algorithm to perform the smoothing and therefore generate the scale-space in practice.
• Chapter 5 contributes some initial results in the spectral analysis of this new signal smoothing operation.
• Further, a preliminary statistical treatment of this filter is presented.
• Chapter 6 contributes another class of morphological scale-space, that based on the multiscale closing-opening we have defined here.


• The theoretical relationship between these scale-spaces with respect to fingerprints is explained.
• The notion of reduced fingerprints is introduced and the desirable properties of this construction are outlined.
• Chapter 7 contributes an efficient algorithm for the extraction of the reduced fingerprints of signals.
• A novel, general purpose, object recognition system using reduced fingerprints is demonstrated.

Although the author claims an original and significant contribution to scale-space theory, there are some limitations to this work and many research questions which have been left unanswered. This thesis is the starting point for such research.

8.3 Limitations of the Approach

8.3.1 Continuous Theory - Discrete Implementation

To define local extrema of a signal we need the notion of the "neighbourhood" of a point. The notion of the neighbourhood, while intuitive and simple for a continuous domain, can become quite complicated for a digital grid. On a digital space a neighbourhood can be defined as the set of all points "connected" to the point in question, so in turn we have to define a "connectivity" structure on the grid (Rosenfeld 1970). For the experiments of Chapter 7 we used 8-connectivity; however, this is not a trivial issue (Latecki 1993) and other valid solutions are possible (including the use of a hexagonal grid, which has some advantages (Serra & Lay 1985, Bell et al. 1989)).

Another place where a digital approximation impacts on the theory is in the morphological operations themselves. For example, the notion of convexity of the structuring element in a digital space is not straightforward (Serra 1982). Also, the idea of derivatives is essentially a continuous notion, and in practice must be replaced by differences; in the experiments we have used the first and second differences to approximate the corresponding derivatives.

In common with Gaussian scale-space (Witkin 1983), the theory of morphological scale-space has been developed in the continuous domain, that is, we have taken as our signal $f : \mathbb{R}^n \to \mathbb{R}$. Note that in Chapter 4 we used a discrete domain for the computation of the smoothing, in Chapter 5 we considered a discrete domain for the statistical treatment of the filter, and in Chapter 7 we used in practice a digital approximation $f : \mathbb{Z}^2 \to \mathbb{R}$ to the continuous theory. In Gaussian scale-space theory, the work of Lindeberg (1990) was the first to properly treat the discrete case. It is therefore a limitation of this thesis that the various theorems have not been proved in the discrete case. However, in all the experiments conducted with the discrete approximations outlined above, we have not seen any violation of scale-space causality.
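To make the dependence on the connectivity structure concrete, the sketch below finds strict local maxima of a small 2-D grid under either 8- or 4-connectivity. The function name, the nested-list grid representation, and the treatment of border points (only neighbours inside the grid are compared) are assumptions of this example, not the implementation used in Chapter 7.

```python
def local_maxima(grid, connectivity=8):
    """Return (row, col) positions strictly greater than all of their neighbours.

    grid is a list of equal-length rows; connectivity is 8 or 4.
    Border points are compared only with the neighbours that lie inside the grid.
    """
    if connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    elif connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    rows, cols = len(grid), len(grid[0])
    maxima = []
    for r in range(rows):
        for c in range(cols):
            neighbours = [grid[r + dr][c + dc] for dr, dc in offsets
                          if 0 <= r + dr < rows and 0 <= c + dc < cols]
            if all(grid[r][c] > v for v in neighbours):
                maxima.append((r, c))
    return maxima
```

For the grid `[[1, 2, 1], [2, 5, 2], [1, 2, 4]]` the corner value 4 counts as a local maximum under 4-connectivity but not under 8-connectivity, because its diagonal neighbour 5 is only visible in the latter. The set of extrema, and hence any fingerprint built from them, therefore depends on this choice.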

8.3.2 Noise Tolerance

Being a scale-space theory, the approach of this thesis places great emphasis on the existence of a causality property. In achieving this property other properties must necessarily be sacrificed. Although we develop a smoothing operation, this smoothing does not involve averaging (as in linear smoothers). Therefore, in the presence of additive or impulse noise, the present method may not give good results. To the method, a noise impulse is a small-extent, high-amplitude signal component, and it may therefore be assigned a high scale and treated as an important part of the signal. The only cure for this problem is to pre-filter the signal (with, for example, an averaging or median filter) before performing analysis using morphological scale-space, thus separating the restoration and analysis steps.
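The effect described above can be seen on a toy signal. The sketch below is illustrative only: the 3-point running median and the parabolic dilation with structuring function $-t^2/\text{scale}$ are assumed stand-ins for "a median filter" and for the smoothing operator, and the function names are hypothetical.

```python
def median3(f):
    """3-point running median; the two end samples are copied unchanged."""
    out = list(f)
    for i in range(1, len(f) - 1):
        out[i] = sorted(f[i - 1:i + 2])[1]
    return out


def parabolic_dilation(f, scale):
    """Dilation of f by the parabola g(t) = -t^2 / scale (assumed form)."""
    n = len(f)
    return [max(f[y] - (x - y) ** 2 / scale for y in range(n)) for x in range(n)]


# A flat signal corrupted by a single noise impulse.
noisy = [0, 0, 0, 9, 0, 0, 0]

# Without pre-filtering the impulse dominates the dilated signal.
smoothed_raw = parabolic_dilation(noisy, 1)

# With a median pre-filter the impulse is removed before the analysis step.
smoothed_clean = parabolic_dilation(median3(noisy), 1)
```

Here `smoothed_raw` still peaks at the impulse amplitude, while `smoothed_clean` is flat, which is the separation of restoration and analysis that the text recommends.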

8.3.3 Scale and Human Perception

A possible limitation of the method lies in how it assigns importance (scales) to features of the signal and whether this relates to human perception of the same signal. Although the method possesses the intuitively good (nonlinear!) property that a small signal component close to a major component is given less importance than that same component in a "flat" region of the signal, it remains to be seen how this approach compares to human perception. In the worst case, some unimportant features in our method (small scales) may appear important to humans, while some of our major features may appear unimportant. Related to this question is whether it would be possible to find a transformation to apply to the signal before scale-space analysis to correct any perceptual distortion in the method. This may be a fruitful area for future research.

8.3.4 Comparisons with Other Methods

As this is the only known scale-space causality result in higher dimensions, it is difficult to provide direct meaningful theoretical comparisons. However, when applied to a particular problem, it should be possible to compare the resulting solution with other approaches within the problem domain. In Chapter 7 we have given only a limited comparison with the method of Stein & Medioni (1992), as we were seeking to demonstrate the use of morphological scale-space rather than solve the face or mountain recognition problems per se. The strength of any scale-space approach lies in the causality theory, and potential applications need to take full advantage of this. If strict causality is not needed, other linear approaches such as tracking the zero-crossings of the Laplacian of the Gaussian (Hummel & Moniot 1989), or zero-crossings of the wavelet transform (Mallat 1991), are possible. In these cases, the preferable method depends upon performance; we have not made any of these comparisons in this study, but this is a valid area for future research. Other recommendations for future work are listed in the next section.

8.4 Recommendations for Future Research

1. Much would be gained if the fingerprint could be inverted to obtain the original signal, since a "transform" or representation theory would result which would allow applications such as coding. This is not possible in general, since the fingerprint only retains partial information (bounds) on the signal behaviour between extrema. Nevertheless, even partial reconstruction of the signal into an equivalence class would be worthwhile for many applications. Perhaps the fingerprint could be augmented by additional information to make it unique.

2. The use of multiscale dilation-erosion on the derivatives of multidimensional signals is a potential area for further research. Preceding the smoothing by a differentiator gives a band-pass filter which may have good information-theoretic motivations.

3. Instead of taking the minimum and maximum (or 1-st and $n$-th order statistics), we may take the $k$-th and $(n - k)$-th order statistics in equations (5.43) and (5.44) to give a "soft" morphology, where the structuring functions are soft and able to be penetrated by $k$ data points. This will make the operations more robust to impulse noise and may well be worthy of further research. In the context of this thesis, we do not know whether such soft morphology would be suitable to construct a scale-space.

4. Application of the existing theory as presented here to practical problems, especially in multidimensional signal analysis and processing, should be a rich area for future research.


5. Now that the uniqueness status of the Gaussian scale-space has been broken, what other (necessarily non-linear) scale-spaces exist?
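The "soft" morphology of recommendation 3 can be sketched by replacing the maximum in a dilation with a lower order statistic. The code below is a hypothetical illustration: the parabolic structuring function $-t^2/\text{scale}$, the indexing convention (`k = 0` gives the ordinary dilation, `k = 1` lets one data point penetrate), and the function name are all assumptions of this example, and nothing here establishes whether such an operator yields a scale-space.

```python
def soft_parabolic_dilation(f, scale, k=0):
    """Order-statistic variant of dilation by the parabola g(t) = -t^2 / scale.

    For each position x the candidate values f[y] + g(x - y) are sorted in
    decreasing order and the (k + 1)-th largest is returned, so the k largest
    candidates are ignored. k = 0 reproduces the ordinary dilation.
    """
    n = len(f)
    out = []
    for x in range(n):
        candidates = sorted((f[y] - (x - y) ** 2 / scale for y in range(n)),
                            reverse=True)
        out.append(candidates[min(k, n - 1)])
    return out


impulse = [0, 0, 0, 9, 0, 0, 0]
hard = soft_parabolic_dilation(impulse, 1, k=0)  # ordinary dilation
soft = soft_parabolic_dilation(impulse, 1, k=1)  # one point may penetrate
```

With `k = 1` the isolated impulse no longer propagates into the output, which illustrates the robustness to impulse noise mentioned above; the price is that the output at the impulse location is no longer bounded below by the input.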

Appendix A Transformation Estimation by Least Squares


In this appendix we introduce a least squares method for estimating the transformation matrix between two point sets with correspondence in 2-D. This method is a reduced form of the 3-D method used by Stein & Medioni (1992) and has also appeared in Lamdan, Schwartz & Wolfson (1988). A more thorough treatment of the registration of 3-D shapes, including point sets without correspondence, is given in Besl & McKay (1992).

Problem: Let $u = (u_1, u_2)^T$ and $v = (v_1, v_2)^T$ be points in 2-D Euclidean space. Given the sequences $u_j$, $j = 1, 2, \ldots, n$ and $v_j$, $j = 1, 2, \ldots, n$ with the correspondence $u_j \leftrightarrow v_j$, find the transformation $T(u) = Au + b$ which minimises the distance between the sequences $(Tu)_j$, $j = 1, 2, \ldots, n$ and $v_j$, $j = 1, 2, \ldots, n$. The vector $b$ represents the translational part of the transformation and $A$ is the rotational and magnification part, that is

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} r\cos\theta & -r\sin\theta \\ r\sin\theta & r\cos\theta \end{pmatrix} \tag{A.1}$$

where $r$ is a scaling or magnification and $\theta$ is a rotation. The distance to be minimised is:

$$\begin{aligned} d &= \min_{T} \sum_{j=1}^{n} |Tu_j - v_j|^2 \\ &= \min_{A,b} \sum_{j=1}^{n} |Au_j + b - v_j|^2 \\ &= \min_{A,b} \left( \sum_{j=1}^{n} |b - v_j|^2 + \sum_{j=1}^{n} |Au_j|^2 + 2\sum_{j=1}^{n} b \cdot Au_j - 2\sum_{j=1}^{n} Au_j \cdot v_j \right) \end{aligned} \tag{A.2}$$

We are free to choose co-ordinates so that $\sum_{j=1}^{n} u_j = 0$. Then $\sum_{j=1}^{n} b \cdot Au_j = b \cdot A\left(\sum_{j=1}^{n} u_j\right) = 0$ and one second-order term disappears. Therefore we need to minimise

$$d = \min_{b} \left( \sum_{j=1}^{n} |b - v_j|^2 \right) + \min_{A} \left( \sum_{j=1}^{n} |Au_j|^2 - 2\sum_{j=1}^{n} Au_j \cdot v_j \right) \tag{A.3}$$

The solution for $b$ is

$$b = \frac{1}{n} \sum_{j=1}^{n} v_j. \tag{A.4}$$

To minimise over $A$, let

$$g(A) = g(a_{11}, a_{12}, a_{21}, a_{22}) = \sum_{j=1}^{n} |Au_j|^2 - 2\sum_{j=1}^{n} Au_j \cdot v_j. \tag{A.5}$$


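Expanding $Au_j$ we have

$$Au_j = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} u_{1j} \\ u_{2j} \end{pmatrix} = \begin{pmatrix} a_{11}u_{1j} + a_{12}u_{2j} \\ a_{21}u_{1j} + a_{22}u_{2j} \end{pmatrix} \tag{A.6}$$

So,

$$|Au_j|^2 = a_{11}^2 u_{1j}^2 + a_{12}^2 u_{2j}^2 + 2a_{11}a_{12}u_{1j}u_{2j} + a_{21}^2 u_{1j}^2 + a_{22}^2 u_{2j}^2 + 2a_{21}a_{22}u_{1j}u_{2j}, \tag{A.7}$$

and,

$$Au_j \cdot v_j = a_{11}u_{1j}v_{1j} + a_{12}u_{2j}v_{1j} + a_{21}u_{1j}v_{2j} + a_{22}u_{2j}v_{2j}. \tag{A.8}$$

We minimise $g$ by setting:

$$\frac{\partial g}{\partial a_{ik}} = 0 \quad \text{for } i = 1, 2;\ k = 1, 2. \tag{A.9}$$

This is a system of 4 equations in 4 unknowns:

$$\frac{\partial g}{\partial a_{11}} = 2a_{11}\sum_{j=1}^{n} u_{1j}^2 + 2a_{12}\sum_{j=1}^{n} u_{1j}u_{2j} - 2\sum_{j=1}^{n} v_{1j}u_{1j} = 0 \tag{A.10}$$

$$\frac{\partial g}{\partial a_{12}} = 2a_{12}\sum_{j=1}^{n} u_{2j}^2 + 2a_{11}\sum_{j=1}^{n} u_{1j}u_{2j} - 2\sum_{j=1}^{n} v_{1j}u_{2j} = 0 \tag{A.11}$$

$$\frac{\partial g}{\partial a_{21}} = 2a_{21}\sum_{j=1}^{n} u_{1j}^2 + 2a_{22}\sum_{j=1}^{n} u_{1j}u_{2j} - 2\sum_{j=1}^{n} v_{2j}u_{1j} = 0 \tag{A.12}$$

$$\frac{\partial g}{\partial a_{22}} = 2a_{22}\sum_{j=1}^{n} u_{2j}^2 + 2a_{21}\sum_{j=1}^{n} u_{1j}u_{2j} - 2\sum_{j=1}^{n} v_{2j}u_{2j} = 0. \tag{A.13}$$

This can be written in matrix form as:

$$A \left( \sum_{j=1}^{n} u_j u_j^T \right) = \left( \sum_{j=1}^{n} v_j u_j^T \right). \tag{A.14}$$

With the solution,

$$A = \left( \sum_{j=1}^{n} v_j u_j^T \right) \left( \sum_{j=1}^{n} u_j u_j^T \right)^{-1}. \tag{A.15}$$

In general the determinant of $A$ measures the volume change in the transformation, so we can obtain the scaling:

$$r = \sqrt{r^2(\cos^2\theta + \sin^2\theta)} = \sqrt{\det(A)} = \sqrt{a_{11}a_{22} - a_{12}a_{21}} \tag{A.16}$$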


The rotational angle is obtained from:

$$\theta = \tan^{-1}\left( \frac{2r\sin\theta}{2r\cos\theta} \right) = \tan^{-1}\left( \frac{a_{21} - a_{12}}{a_{11} + a_{22}} \right) \tag{A.17}$$
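The procedure of this appendix can be sketched in a few lines of plain Python. The function below centres the $u_j$, takes $b$ as the mean of the $v_j$ as in (A.4), solves the four normal equations (A.10) to (A.13) for the entries of $A$, and then reads off $r$ and $\theta$ as in (A.16) and (A.17). The function name and return layout are assumptions of this example; it also assumes the point set is not degenerate (not all $u_j$ collinear) and that $\det(A) > 0$. `math.atan2` is used in place of a plain arctangent so that the quadrant of $\theta$ is resolved.

```python
import math


def estimate_transformation(us, vs):
    """Least-squares fit of v ~ A (u - mean(u)) + b for paired 2-D points.

    us, vs: equal-length lists of (x, y) tuples in correspondence.
    Returns (A, b, r, theta) with A as ((a11, a12), (a21, a22)).
    """
    n = len(us)
    # Centre the u points so that sum(u_j) = 0, as the derivation assumes.
    mu = (sum(u[0] for u in us) / n, sum(u[1] for u in us) / n)
    uc = [(u[0] - mu[0], u[1] - mu[1]) for u in us]
    # Translation: mean of the v points (valid for centred u), eq. (A.4).
    b = (sum(v[0] for v in vs) / n, sum(v[1] for v in vs) / n)

    # Sums appearing in the normal equations (A.10)-(A.13).
    s11 = sum(u[0] * u[0] for u in uc)
    s12 = sum(u[0] * u[1] for u in uc)
    s22 = sum(u[1] * u[1] for u in uc)
    c11 = sum(v[0] * u[0] for u, v in zip(uc, vs))
    c12 = sum(v[0] * u[1] for u, v in zip(uc, vs))
    c21 = sum(v[1] * u[0] for u, v in zip(uc, vs))
    c22 = sum(v[1] * u[1] for u, v in zip(uc, vs))

    # Each row of A solves a 2x2 system with the same symmetric matrix S.
    det_s = s11 * s22 - s12 * s12
    a11 = (c11 * s22 - c12 * s12) / det_s
    a12 = (c12 * s11 - c11 * s12) / det_s
    a21 = (c21 * s22 - c22 * s12) / det_s
    a22 = (c22 * s11 - c21 * s12) / det_s

    r = math.sqrt(a11 * a22 - a12 * a21)      # scaling, eq. (A.16)
    theta = math.atan2(a21 - a12, a11 + a22)  # rotation, eq. (A.17)
    return ((a11, a12), (a21, a22)), b, r, theta
```

Note that the fitted $A$ is a general $2 \times 2$ matrix; it has the rotation-plus-scaling form of (A.1) exactly only when the data are consistent with such a transformation, so $r$ and $\theta$ should be read as summary parameters of the fit.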

Appendix B Circle of Best Fit by Least Squares


We have used the following algorithm, due to Thomas & Chan (1989), to find the circle of best fit through the cross-section of the head data. Note that this algorithm is exact (non-iterative), unlike the popular method most recently presented by Landau (1987) which, incidentally, has been shown to be biased (Berman 1989). The innovation in the Thomas & Chan (1989) method lies in the definition of the "error" to be minimised, so that the resulting equations can be solved explicitly.

Assume the data points $(x_i, y_i)$, $i = 1, 2, \ldots, N$, and that the circle of best fit is given by

$$(x - a)^2 + (y - b)^2 - r^2 = 0, \tag{B.1}$$

where $a, b, r$ are the parameters to be found. The error is defined to be the difference between the constant area $\pi r^2$ and the area of the circle centred at $(a, b)$ passing through $(x_i, y_i)$. Therefore the squared error at data point $(x_i, y_i)$ is:

$$\varepsilon_i^2(a, b, r) = \left[ \pi r^2 - \pi\left( (x_i - a)^2 + (y_i - b)^2 \right) \right]^2. \tag{B.2}$$

The task is to find values for the parameters $a, b, r$ to minimise the Sum of Squared Error:

$$SSE(a, b, r) = \sum \varepsilon_i^2(a, b, r) = \sum \left[ \pi r^2 - \pi\left( (x_i - a)^2 + (y_i - b)^2 \right) \right]^2. \tag{B.3}$$

As usual, to minimise, we find where the partial derivatives vanish:

$$\frac{\partial SSE(a, b, r)}{\partial r} = 0, \tag{B.4}$$

$$\frac{\partial SSE(a, b, r)}{\partial a} = 0, \tag{B.5}$$

$$\frac{\partial SSE(a, b, r)}{\partial b} = 0. \tag{B.6}$$

As shown in Thomas & Chan (1989), these equations are easily rearranged to give the matrix form:

$$\begin{pmatrix} \alpha_1 & \beta_1 \\ \alpha_2 & \beta_2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix}, \tag{B.7}$$

where,

$$\alpha_1 = 2\left[ \left( \sum x \right)^2 - N \sum x^2 \right] \tag{B.8}$$

$$\beta_1 = 2\left[ \sum x \sum y - N \sum xy \right] \tag{B.9}$$

$$\alpha_2 = 2\left[ \sum x \sum y - N \sum xy \right] \tag{B.10}$$

$$\beta_2 = 2\left[ \left( \sum y \right)^2 - N \sum y^2 \right] \tag{B.11}$$

$$\gamma_1 = \sum x^2 \sum x - N \sum x^3 + \sum x \sum y^2 - N \sum xy^2 \tag{B.12}$$

$$\gamma_2 = \sum y^2 \sum y - N \sum y^3 + \sum y \sum x^2 - N \sum yx^2. \tag{B.13}$$

After solving for $a, b$ we find $r$ by:

$$r^2 = \frac{1}{N}\left[ \sum x^2 - 2a \sum x + Na^2 + \sum y^2 - 2b \sum y + Nb^2 \right]. \tag{B.14}$$
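A direct transcription of this closed-form fit into plain Python might look as follows. It is a sketch only: the coefficient names `alpha1`, `beta1`, and so on are labels chosen for this example, the $2 \times 2$ system is solved by Cramer's rule, and the points are assumed not to be collinear (otherwise the determinant vanishes).

```python
import math


def fit_circle(points):
    """Non-iterative least-squares circle fit (area-difference error).

    points: list of (x, y) tuples, at least three and not collinear.
    Returns the centre coordinates and radius as (a, b, r).
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxxx = sum(x ** 3 for x, _ in points)
    syyy = sum(y ** 3 for _, y in points)
    sxyy = sum(x * y * y for x, y in points)
    sxxy = sum(x * x * y for x, y in points)

    # Coefficients of the 2x2 linear system for the centre (a, b).
    alpha1 = 2 * (sx * sx - n * sxx)
    beta1 = 2 * (sx * sy - n * sxy)
    alpha2 = beta1
    beta2 = 2 * (sy * sy - n * syy)
    gamma1 = sxx * sx - n * sxxx + sx * syy - n * sxyy
    gamma2 = syy * sy - n * syyy + sy * sxx - n * sxxy

    # Solve by Cramer's rule.
    det = alpha1 * beta2 - beta1 * alpha2
    a = (gamma1 * beta2 - beta1 * gamma2) / det
    b = (alpha1 * gamma2 - alpha2 * gamma1) / det

    # Radius: root of the mean squared distance from the fitted centre.
    r2 = (sxx - 2 * a * sx + n * a * a + syy - 2 * b * sy + n * b * b) / n
    return a, b, math.sqrt(r2)
```

For points lying exactly on a circle, for example `[(5, -1), (2, 2), (-1, -1), (2, -4)]`, the fit recovers that circle's centre and radius up to rounding.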

Bibliography

AMS (1991), User's Guide to AMSFonts Version 2.1, American Mathematical Society, Providence, RI.

Anh, V., Shi, J. Y. & Tsui, H. T. (1993), 'Scaling theorems for zero crossings of bandlimited signals', IEEE Transactions on Pattern Analysis and Machine Intelligence.

Arman, F. & Aggarwal, J. K. (1993), 'Model-based object recognition in dense-range images - A review', ACM Computing Surveys 25(1), 5–43.

Attneave, F. (1954), 'Some informational aspects of visual perception', Psychological Review 61, 183–193.

Babaud, J., Witkin, A. P., Baudin, M. & Duda, R. O. (1986), 'Uniqueness of the Gaussian kernel for scale-space filtering', IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8(1), 26–33.

Baraniuk, R. G. (1994), Beyond time-frequency analysis: Energy densities in one and many dimensions, in 'Proceedings ICASSP'94 1994 International Conference on Acoustics, Speech & Signal Processing', IEEE Signal Processing Society, Los Alamitos, CA, Adelaide, Australia, pp. III-357–III-360.

Bartle, R. G. (1964), The Elements of Real Analysis, John Wiley & Sons, New York.

Batcher, K. E. (1980), 'Design of a massively parallel processor', IEEE Transactions on Computers C-29(9), 836–840.

Batcher, K. E. (1985), MPP: A high-speed image processor, in L. Snyder, L. H. Jamieson, D. B. Gannon & H. J. Siegel, eds, 'Algorithmically Specialized Parallel Computers', Academic Press, London, pp. 59–68.

Bell, S. B., Holroyd, F. C. & Mason, D. C. (1989), 'A digital geometry for hexagonal pixels', Image and Vision Computing 7(3), 194–204.

Ben-Arie, J. & Rao, K. R. (1992), Image expansion by non-orthogonal wavelets for optimal template matching, in 'Proceedings 11th IAPR International Conference on Pattern Recognition', IEEE Computer Society Press, Los Alamitos, CA, The Hague, The Netherlands, pp. C650–654.

Berman, M. (1989), 'Large sample bias in least squares estimators of a circular arc center and its radius', Computer Vision, Graphics, and Image Processing 45, 126–128.

Bertero, M., Poggio, T. A. & Torre, V. (1988), 'Ill-posed problems in early vision', Proceedings of the IEEE 76(8), 869–889.

Besl, P. J. & Jain, R. C. (1985), 'Three-dimensional object recognition', ACM Computing Surveys 17(1), 75–145.

Besl, P. J. & McKay, N. D. (1992), 'A method for the registration of 3-D shapes', IEEE Transactions on Pattern Analysis and Machine Intelligence 14(2), 239–256.

Bhanu, B. (1984), 'Representation and shape matching of 3-D objects', IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-6(3), 340–351.

Bister, M., Cornelis, J. & Rosenfeld, A. (1990), 'A critical review of pyramid segmentation algorithms', Pattern Recognition Letters 11, 605–617.

Boashash, B. & O'Shea, P. (1989), 'Detection and classification of underwater transients by time-frequency analysis', Journal of Electrical and Electronics Engineering, Australia 9(3), 63–74.

Boles, W. W. & Tieng, Q. M. (1993), Three-dimensional curve representation based on dyadic wavelet transform, in 'Proceedings DICTA-93 Digital Image Computing: Techniques and Applications', Sydney, pp. 63–70.

Bolles, R. C. & Horaud, P. (1986), '3DPO: a three-dimensional part orientation system', The International Journal of Robotics Research 5(3), 3–26.

Bovik, A. C. & Restrepo, A. (1987), 'Spectral properties of moving L-estimates of independent data', Journal of the Franklin Institute 324(1), 125–137.

Brockwell, P. J. & Davis, R. A. (1987), Time Series: Theory and Methods, Springer Series in Statistics, Springer-Verlag, New York.

Burt, P. J. & Adelson, E. H. (1983), 'The Laplacian pyramid as a compact image code', IEEE Transactions on Communications COM-31(4), 532–540.

Butzer, P. L. & Berens, H. (1967), Semi-Groups of Operators and Approximation, Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen, Band 145, Springer-Verlag, Berlin.

Caelli, T. & Moraglia, G. (1987), 'The concept of spatial frequency channels cannot explain some visual masking effects', Human Neurobiology 6, 63–65.

Campbell, F. W. & Robson, J. G. (1968), 'Application of Fourier analysis to the visibility of gratings', Journal of Physiology 197, 551–556.

Campos, J. C., Linney, A. D. & Moss, J. P. (1993), 'The analysis of facial profiles using scale space techniques', Pattern Recognition 26(6), 819–824.

Canny, J. (1986), 'A computational approach to edge detection', IEEE Transactions on Pattern Analysis and Machine Intelligence 8(6), 679–698.

Carey, S. & Diamond, R. (1977), 'From piecemeal to configurational representation of faces', Science 195, 312–314.

Carlotto, M. J. (1987), 'Histogram analysis using a scale-space approach', IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-9(1), 121–129.

Chakravarty, I. & Freeman, H. (1982), Characteristic views as a basis for three-dimensional object recognition, in 'Proceedings of the SPIE Conference on Robot Vision', Vol. 336, SPIE, pp. 37–45.

Chang, T. & Kuo, C.-C. J. (1992), Texture classification with tree-structured wavelet transform, in 'Proceedings 11th IAPR International Conference on Pattern Recognition', IEEE Computer Society Press, Los Alamitos, CA, The Hague, The Netherlands, pp. B256–259.

Chen, C. H. & Kak, A. C. (1989), 'A robot vision system for recognizing 3-D objects in low-order polynomial time', IEEE Transactions on Systems, Man, and Cybernetics 19(6), 1535–1563.

Chen, M.-H. & Yan, P.-F. (1989), 'A multiscaling approach based on morphological filtering', IEEE Transactions on Pattern Analysis and Machine Intelligence 11(7), 694–700.

Chen, Y. & Dougherty, E. R. (1992), Texture classification by gray-scale morphological granulometries, in 'Visual Communications and Image Processing '92', Proceedings SPIE Vol. 1818, SPIE, pp. 931–942.

Chien, C. H. & Aggarwal, J. K. (1986), 'Volume/surface octrees for the representation of three-dimensional objects', Computer Vision, Graphics, and Image Processing pp. 100–113.

Chien, C. H., Sim, Y. B. & Aggarwal, J. K. (1988), Generation of volume/surface octree from range data, in 'CVPR'88: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition', IEEE Computer Society Press, Ann Arbor, Michigan, pp. 254–260.

Chin, R. T. & Dyer, C. R. (1986), 'Model-based recognition in robot vision', ACM Computing Surveys 18(1), 67–108.

David, H. A. (1981), Order Statistics, Wiley Series in Probability and Mathematical Statistics, second edn, John Wiley & Sons, New York.

DePree, J. D. & Swartz, C. W. (1988), Introduction to Real Analysis, John Wiley & Sons, New York.

Dougherty, E. R. (1990), Characterization of gray-scale morphological granulometries, in P. D. Gader, ed., 'Image Algebra and Morphological Image Processing', Vol. SPIE 1350, SPIE, pp. 129–137.

Dougherty, E. R. & Giardina, C. R. (1988), Closed-form representation of convolution, dilation, and erosion in the context of image algebra, in 'CVPR'88: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition', IEEE Computer Society Press, Ann Arbor, Michigan, pp. 754–759.

Dougherty, E. R., Newell, J. T. & Pelz, J. B. (1992), 'Morphological texture-based maximum-likelihood pixel classification based on local granulometric moments', Pattern Recognition 25(10), 1181–1198.

Duff, M. (1979), Parallel processors for digital image processing, in P. Stucki, ed., 'Advances in Digital Image Processing: Theory, Application, Implementation', Plenum Press, New York, pp. 265–276.

Duff, M. J. B., Watson, D. M., Fountain, T. J. & Shaw, G. K. (1973), 'A cellular logic array for image processing', Pattern Recognition 5, 229–247.

Dyer, C. R. (1987), Multiscale image understanding, in L. M. Uhr, ed., 'Parallel Computer Vision', Academic Press, London, pp. 171–213.

Elassal, A. A. & Caruso, V. M. (1983), Digital elevation models, USGS Circular 895-B, United States Geological Survey, 516 National Center, Reston, Virginia 22092.

Fan, T.-J. (1990), Describing and Recognizing 3-D Objects Using Surface Properties, Springer Series in Perception Engineering, Springer-Verlag, New York.

Fan, T.-J., Medioni, G. & Nevatia, R. (1989), 'Recognizing 3-D objects using surface descriptions', IEEE Transactions on Pattern Analysis and Machine Intelligence 11(11), 1140–1157.

Faugeras, O. D. & Hebert, M. (1986), 'The representation, recognition, and locating of 3-D objects', The International Journal of Robotics Research 5(3), 27–52.

Fischler, M. A. & Elschlager, R. A. (1973), 'The representation and matching of pictorial structures', IEEE Transactions on Computers C-22(1), 67–92.

Fleming, M. K. & Cottrell, G. W. (1990), Categorization of faces using unsupervised feature extraction, in 'Proceedings of the International Joint Conference on Neural Networks', Vol. 2, San Diego, CA, pp. 65–70.

Forbes, K., Jackway, P. T. & Anh, V. V. (1991), Automatic counting of nuclear tracks using a PC, in 'Proceedings of Statcomp - Biostats 91', Coolangatta, Queensland, pp. 278–281.

Gader, P. D. (1991), 'Separable decompositions and approximations of greyscale morphological templates', CVGIP: Image Understanding 53(3), 288–296.

Galton, F. (1888), 'Personal identification and description', Nature 38, 173–177.

Galton, F. (1910), 'Numeralised profiles for classification and recognition', Nature 83, 127–130.

Gerrit, F. A. & Aardema, L. G. (1981), 'Design and use of DIP-1: A fast, flexible and dynamically microprogrammable pipelined image processor', Pattern Recognition 14, 319–330.

Giardina, C. R. & Dougherty, E. R. (1988), Morphological Methods in Image and Signal Processing, Prentice-Hall, Englewood Cliffs, NJ.

Golay, M. J. E. (1969), 'Hexagonal parallel pattern transformations', IEEE Transactions on Computers C-18(8), 733–740.

Goldstein, A. J., Harmon, L. D. & Lesk, A. B. (1971), 'Identification of human faces', Proceedings of the IEEE 59(5), 748–760.

Govindaraju, V., Sher, D. B., Srihari, R. K. & Srihari, S. N. (1989), Locating human faces in newspaper photographs, in 'CVPR'89: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition', IEEE Computer Society Press, San Diego, CA, pp. 549–554.

Graham, M. D. & Norgren, P. E. (1980), The diff3t.m. analyser: A parallel/serial Golay image processor, in M. Onoe, K. Preston Jr. & A. Rosenfeld, eds, 'Real-Time Medical Image Processing', Plenum Press, New York, pp. 163–182.

Grimson, W. E. L. & Lozano-Perez, T. (1984), 'Model-based recognition and localization from sparse range or tactile data', The International Journal of Robotics Research 3(3), 2–35.

Grimson, W. E. L. & Lozano-Perez, T. (1987), 'Locating overlapping parts by searching the interpretation tree', IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-9(4), 469–482.

Grossmann, A. & Morlet, J. (1984), 'Decomposition of Hardy functions into square integrable wavelets of constant shape', SIAM Journal of Mathematical Analysis 15(4), 723–736.

Guralnik, D. B., ed. (1982), Webster's New World Dictionary, second college edn, Simon & Schuster, New York.

Hadamard, J. (1923), Lectures on Cauchy's Problem, Yale University Press, New Haven.

Hadwiger, H. (1950), 'Minkowskische Addition und Subtraktion beliebiger Punktmengen und die Theoreme von Erhard Schmidt', Mathematische Zeitschrift 53(3), 210–218.

Haralick, R. M., Sternberg, S. R. & Zhuang, X. (1987), 'Image analysis using mathematical morphology', IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-9(4), 532–550.

Harmon, L. D., Khan, M. K., Lasch, R. & Ramig, P. F. (1981), 'Machine identification of human faces', Pattern Recognition 13(2), 97–110.

Harmon, L. D., Kuo, S. C., Ramig, P. F. & Raudkivi, U. (1978), `Identification of human face profiles by computer', Pattern Recognition 10, 301–312.
Hartwig, F. & Dearing, B. E. (1979), Exploratory Data Analysis, Sage University Paper series on quantitative applications in the social sciences, 07-016, Sage Publications, Beverly Hills and London.
Heijmans, H. J. A. M. (1991), `Theoretical aspects of gray-level morphology', IEEE Transactions on Pattern Analysis and Machine Intelligence 13(6), 568–582.
Heijmans, H. J. A. M. & Ronse, C. (1990), `The algebraic basis of mathematical morphology: I. dilations and erosions', Computer Vision, Graphics, and Image Processing 50, 245–295.
Hille, E. & Phillips, R. S. (1957), Functional Analysis and Semi-Groups, American Mathematical Society Colloquium Publications Volume XXXI, American Mathematical Society, Providence, Rhode Island.
Hogg, R. V. & Craig, A. T. (1978), Introduction to Mathematical Statistics, fourth edn, Macmillan, New York.
Horaud, P. & Bolles, R. C. (1984), 3DPO's strategy for matching three-dimensional objects in range data, in `Proceedings of the International Conference on Robotics', IEEE Computer Society Press, Atlanta, Georgia, pp. 78–85.
Horn, B. K. P. (1986), Robot Vision, The MIT Electrical Engineering and Computer Science Series, McGraw-Hill, New York.
Huang, C.-L. & Chen, C.-W. (1992), `Human facial feature extraction for face interpretation and recognition', Pattern Recognition 25(12), 1435–1444.
Hummel, R. & Moniot, R. (1989), `Reconstructions from zero crossings in scale space', IEEE Transactions on Acoustics, Speech, and Signal Processing 37(12), 2111–2130.
Hummel, R. A. (1986), Representations based on zero-crossings in scale-space, in `CVPR'86: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition', IEEE Computer Society Press, Miami, FL., pp. 204–209.
IEEE (1991), `Special issue on interpretation of 3-D scenes - part I', IEEE Transactions on Pattern Analysis and Machine Intelligence 13(10), 969–1104.

IEEE (1992), `Special issue on interpretation of 3-D scenes - part II', IEEE Transactions on Pattern Analysis and Machine Intelligence 14(2), 97–307.
Inagaki, N. (1980), The distributions of moving order statistics, in K. Matusita, ed., `Recent Developments in Statistical Inference and Data Analysis', North-Holland, Amsterdam, pp. 137–142.
Jackins, C. L. & Tanimoto, S. L. (1980), `Oct-trees and their use in representing three-dimensional objects', Computer Vision, Graphics, and Image Processing 14, 249–270.
Jackway, P. T. (1991a), Parametric morphological smoothing: Statistical and spectral properties, Research Working Paper 91/S5, School of Mathematics, Queensland University of Technology, Brisbane, Australia.
Jackway, P. T. (1991b), Scale-space properties of a parametric morphological smoother, Research Working Paper 91/S2, School of Mathematics, Queensland University of Technology, Brisbane, Australia.
Jackway, P. T. (1992a), Morphological scale-space, in `Proceedings 11th IAPR International Conference on Pattern Recognition', IEEE Computer Society Press, Los Alamitos, CA, The Hague, The Netherlands, pp. C252–255.
Jackway, P. T. (1992b), Scale space properties of the multiscale morphological closing-opening filter, in `Proceedings of the 2nd Singapore International Conference on Image Processing (ICIP '92)', Singapore, pp. 278–281.
Jackway, P. T. (1993), `Multiscale image processing: A review and some recent developments', Journal of Electrical and Electronics Engineering, Australia 13(2), 88–98.
Jackway, P. T. (1994a), `On dimensionality in multiscale morphological scale-space with elliptic poweroid structuring functions', Journal of Visual Communication and Image Representation. to appear.
Jackway, P. T. (1994b), `Properties of multiscale morphological smoothing by poweroids', Pattern Recognition Letters 15(2), 135–140.
Jackway, P. T. & Deriche, M. (1994), `Scale-space properties of the multiscale morphological dilation-erosion', IEEE Transactions on Pattern Analysis and Machine Intelligence. to appear.

Jackway, P. T., Boles, W. W. & Deriche, M. (1993a), 3-D object recognition using morphological scale-space fingerprints, in `Proceedings of IEEE Workshop on Visual Signal Processing and Communications', Melbourne, pp. 291–294.
Jackway, P. T., Boles, W. W. & Deriche, M. (1993b), Morphological scale-space fingerprints and their use in 3-D object recognition, in `Proceedings DICTA-93 Digital Image Computing: Techniques and Applications', Sydney, pp. 382–389.
Jackway, P. T., Boles, W. W. & Deriche, M. (1994), Morphological scale-space fingerprints and their use in object recognition in range images, in `Proceedings of the 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, (ICASSP'94)', Vol. 8, Adelaide, pp. V5–8.
Jackway, P. T., Deriche, M. & Boles, W. W. (1993), Object recognition in range images using the morphological scale-space fingerprint, in `Proceedings of WoSPA 93: SPRC Workshop on Signal Processing and its Applications', Brisbane, Australia, pp. 89–95.
Jang, B. K. & Chin, R. T. (1991), Shape analysis using morphological scale space, in `Proceedings of the 25th Annual Conference on Information Sciences and Systems', pp. 1–4.
Jang, B. K. & Chin, R. T. (1992), `Gaussian and morphological scale space for shape analysis', Asia Pacific Engineering Journal (Part A) 2(2), 165–202.
Jaynes, E. T. (1968), `Prior probabilities', IEEE Transactions on Systems Science and Cybernetics SSC-4(3), 227–241.
Jaynes, E. T. (1984), Prior information and ambiguity in inverse problems, in D. W. McLaughlin, ed., `SIAM-AMS Proceedings of the Symposium in Applied Mathematics: Inverse Problems', Vol. 14, SIAM-AMS, American Mathematical Society, pp. 151–166.
Johnson, D. E. (1976), Introduction to Filter Theory, Prentice-Hall electrical engineering series, Prentice-Hall, New Jersey.
Jolion, J. M. & Montanvert, A. (1992), `The adaptive pyramid: A framework for 2D image analysis', CVGIP: Image Understanding 55(3), 339–348.
Justusson, B. I. (1981), Median filtering: Statistical properties, in T. S. Huang, ed., `Two-Dimensional Digital Signal Processing II', Vol. 43 of Topics in Applied Physics, Springer-Verlag, Berlin, pp. 161–196.

Kaya, Y. & Kobayashi, K. (1972), A basic study on human faces recognition, in S. Watanabe, ed., `Frontiers of Pattern Recognition', Academic Press, New York, pp. 265–289.
Kelly, M. D. (1971), Edge detection in pictures by computer using planning, in B. Meltzer & D. Michie, eds, `Machine Intelligence 6', Edinburgh University Press, Edinburgh, pp. 397–409.
Kirby, M. & Sirovich, L. (1990), `Application of the Karhunen-Loeve procedure for the characterization of human faces', IEEE Transactions on Pattern Analysis and Machine Intelligence 12(1), 103–108.
Klein, J. C. & Serra, J. (1972), `The texture analyser', Journal of Microscopy 95(2), 349–356.
Knuth, D. E. (1986), The TeXbook, Addison-Wesley Publishing Co., Reading, MA.
Koenderink, J. J. (1984), `The structure of images', Biological Cybernetics 50(5), 363–370.
Koskinen, L. & Astola, J. (1992), Statistical properties of soft morphological filters, in `Proceedings of the SPIE Conference on Nonlinear Image Processing III', Vol. 1658, SPIE, pp. 25–36.
Kuhlmann, F. & Wise, G. L. (1981), `On second moment properties of median filtered sequences of independent data', IEEE Transactions on Communications COM-29(9), 1374–1379.
Lamdan, Y., Schwartz, J. T. & Wolfson, H. J. (1988), On recognition of 3-D objects from 2-D images, in `IEEE International Conference on Robotics and Automation', IEEE, p. ?
Lamport, L. (1986), LaTeX: A Document Preparation System, Addison-Wesley Publishing Co., Reading, MA.
Landau, U. M. (1987), `Estimation of a circular arc center and its radius', Computer Vision, Graphics, and Image Processing 38, 317–326.
Latecki, L. (1993), `Topological connectedness and 8-connectedness in digital pictures', CVGIP: Image Understanding 57(2), 261–262.
L'Ecuyer, P. (1988), `Efficient and portable combined random number generators', Communications of the ACM 31(6), 742–774.

Lee, C. N. & Rosenfeld, A. (1986), Connectivity issues in 2D and 3D images, in `CVPR'86: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition', IEEE Computer Society Press, Miami, FL., pp. 278–285.
Lee, Y. H. & Kassam, S. A. (1985), `Generalized median filtering and related nonlinear filtering techniques', IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-33(3), 672–683.
Lehmann, E. L. (1983), Theory of Point Estimation, Wiley series in probability and mathematical statistics, Wiley, New York.
Levine, M. D. (1969), `Feature extraction: A survey', Proceedings of the IEEE 57(8), 1391–1407.
Levine, M. D. (1985), Vision in Man and Machine, McGraw-Hill Series in Electrical Engineering, McGraw-Hill, New York.
Lifshitz, L. M. & Pizer, S. M. (1990), `A multiresolution hierarchical approach to image segmentation based on intensity extrema', IEEE Transactions on Pattern Analysis and Machine Intelligence 12(6), 529–540.
Lin, J. H. & Coyle, E. J. (1990), `Minimum mean absolute error estimation over the class of generalized stack filters', IEEE Transactions on Acoustics, Speech, and Signal Processing 38(4), 663–678.
Lindeberg, T. (1988), On the construction of a scale-space for discrete images, Technical Report TRITA-NA-P8808, Royal Institute of Technology, Sweden.
Lindeberg, T. (1990), `Scale-space for discrete signals', IEEE Transactions on Pattern Analysis and Machine Intelligence 12(3), 234–254.
Lipschutz, M. M. (1969), Theory and Problems of Differential Geometry, Schaum's Outline Series, McGraw-Hill, New York.
Logan, B. F. (1977), `Information in the zero crossings of bandpass signals', Bell System Technical Journal 56(4), 487–510.
Loui, A., Venetsanopoulos, A. & Smith, K. (1990), High speed architectures for morphological image processing, in E. J. Delp, ed., `Nonlinear Image Processing', Vol. SPIE 1247, pp. 145–169.
Loupas, T., McDicken, W. N. & Allan, P. L. (1989), `An adaptive weighted median filter for speckle suppression in medical ultrasonic images', IEEE Transactions on Circuits and Systems 36(1), 129–135.

Mallat, S. (1991), `Zero-crossings of a wavelet transform', IEEE Transactions on Information Theory 37(4), 1019–1033.
Mallat, S. G. (1989), `A theory for multiresolution signal decomposition: The wavelet representation', IEEE Transactions on Pattern Analysis and Machine Intelligence 11(7), 674–693.
Maragos, P. (1989), `Pattern spectrum and multiscale shape representation', IEEE Transactions on Pattern Analysis and Machine Intelligence 11(7), 701–716.
Maragos, P. & Schafer, R. W. (1987a), `Morphological filters – part I: Their set-theoretic analysis and relations to linear shift-invariant filters', IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-35(8), 1153–1169.
Maragos, P. & Schafer, R. W. (1987b), `Morphological filters – part II: Their relations to median, order statistic, and stack filters', IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-35(8), 1170–1184.
Marr, D. (1976), `Early processing of visual information', Philosophical Transactions of the Royal Society of London B 275, 483–519.
Marr, D. (1982), Vision, Freeman, San Francisco.
Marr, D. & Hildreth, E. (1980), `Theory of edge detection', Proceedings of the Royal Society of London B 207, 187–217.
Marr, D. & Poggio, T. (1979), `A computational theory of human stereo vision', Proceedings of the Royal Society of London B 204, 301–328.
Marr, D., Poggio, T. & Hildreth, E. (1980), `Smallest channel in early human vision', Journal of the Optical Society of America 70(7), 868–870.
Marr, D., Ullman, S. & Poggio, T. (1979), `Bandpass channels, zero-crossings, and early visual information processing', Journal of the Optical Society of America 69(6), 914–916.
Martin, W. N. & Aggarwal, J. K. (1983), `Volumetric descriptions of objects from multiple views', IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-5(2), 150–158.
Matheron, G. (1975), Random Sets and Integral Geometry, Wiley, New York.
Maxwell, J. C. (1870), `On hills and dales', The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science XL (Fourth Series), 421–427.

Meddis, R. (1984), Statistics Using Ranks, Basil Blackwell, Oxford.
Minkowski, H. (1903), `Volumen und Oberfläche', Mathematische Annalen 57, 447–495.
Mokhtarian, F. & Mackworth, A. (1986), `Scale-based description of planar curves and two-dimensional shapes', IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8(1), 34–43.
Morales, A. & Acharya, R. (1990), Non-linear multiscale filtering using mathematical morphology, in E. J. Delp, ed., `Nonlinear Image Processing', Vol. SPIE 1247, Proc. SPIE 1247, pp. 169–179.
Morita, S., Kawashima, T. & Aoki, Y. (1991), `Pattern matching of 2-D shape using hierarchical descriptions', Systems and Computers in Japan 22(10), 40–49.
Nakagawa, Y. & Rosenfeld, A. (1978), `A note on the use of local min and max operations in digital picture processing', IEEE Transactions on Systems, Man, and Cybernetics SMC-8(8), 632–635.
Nalwa, V. S. & Binford, T. O. (1986), `On detecting edges', IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8(6), 699–714.
Newell, J. T. & Dougherty, E. R. (1992), Maximum-likelihood morphological granulometric classifiers, in `Image Processing Algorithms and Techniques III', Vol. Proceedings SPIE Vol. 1657, SPIE, pp. 386–395.
Nieminen, A., Heinonen, P. & Neuvo, Y. (1987), `A new class of detail-preserving filters for image processing', IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-9(1), 74–90.
Nodes, T. A. & Gallagher, Jr., N. C. (1982), `Median filters: Some modifications and their properties', IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-30(5), 739–746.
Oppenheim, A. V. & Schafer, R. W. (1989), Discrete-Time Signal Processing, Prentice Hall Signal Processing Series, Prentice-Hall, London.
O'Sullivan, F. (1986), `A statistical perspective on ill-posed inverse problems', Statistical Science 1(4), 502–527.
Papoulis, A. (1977), Signal Analysis, McGraw-Hill, New York.

Pentland, A. & Sclaroff, S. (1991), `Closed-form solutions for physically based shape modelling and recognition', IEEE Transactions on Pattern Analysis and Machine Intelligence 13(7), 715–729.
Perona, P. & Malik, J. (1990), `Scale-space and edge detection using anisotropic diffusion', IEEE Transactions on Pattern Analysis and Machine Intelligence 12(7), 629–639.
Perrett, D. I., Mistlin, A. J. & Chitty, A. J. (1987), `Visual neurones responsive to faces', Trends in Neurosciences 10(9), 358–364.
Picinbono, B. (1988), Principles of Signals and Systems: Deterministic Signals, Artech House, Inc., Norwood, MA.
Poggio, T., Torre, V. & Koch, C. (1985), `Computational vision and regularization theory', Nature 317(September), 314–319.
Poggio, T., Voorhees, H. & Yuille, A. (1988), `A regularized solution to edge detection', Journal of Complexity 4, 106–123.
Pratt, W. K. (1978), Digital Image Processing, John Wiley & Sons, New York.
Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T. (1986), Numerical Recipes: The Art of Scientific Computing, Cambridge University Press, Cambridge.
Radack, G. M. & Badler, N. I. (1989), `Local matching of surfaces using a boundary-centered radial decomposition', Computer Vision, Graphics, and Image Processing 45, 380–396.
Raman, S. V., Sarkar, S. & Boyer, K. L. (1991), `Tissue boundary refinement in magnetic resonance images using contour-based scale space matching', IEEE Transactions on Medical Imaging 10(2), 109–121.
Rangarajan, K., Allen, W. & Shah, M. (1993), `Matching motion trajectories using scale-space', Pattern Recognition 26(4), 595–610.
Ritter, G. X. & Gader, P. D. (1987), `Image algebra techniques for parallel image processing', Journal of Parallel and Distributed Computing 4, 7–44.
Ritter, G. X., Wilson, J. & Davidson, J. (1990), `Image algebra: An overview', Computer Vision, Graphics, and Image Processing 49, 297–331.

Rivest, J.-F., Serra, J. & Soille, P. (1992), `Dimensionality in image analysis', Journal of Visual Communication and Image Representation 3(2), 137–146.
Rock, I. (1974), `The perception of disoriented figures', Scientific American 230, 78–85.
Ronse, C. (1990), `Why mathematical morphology needs complete lattices', Signal Processing 21, 129–154.
Rosenfeld, A. (1970), `Connectivity in digital pictures', Journal of the Association for Computing Machinery 17(1), 146–160.
Rosenfeld, A. (1984), Multiresolution Image Processing and Analysis, Springer Series in Information Sciences; 12, Springer-Verlag, New York.
Rosenfeld, A. (1992), `Survey: Image analysis and computer vision: 1991', CVGIP: Image Understanding 55(3), 349–380.
Rosenfeld, A. (1993), `Survey: Image analysis and computer vision: 1992', CVGIP: Image Understanding 58(1), 85–135.
Rosenfeld, A. & Thurston, M. (1971), `Edge and curve detection for visual scene analysis', IEEE Transactions on Computers C-20(5), 562–569.
Rosenfeld, A., Thurston, M. & Lee, Y.-H. (1972), `Edge and curve detection: Further experiments', IEEE Transactions on Computers C-21(7), 677–714.
Rotem, D. & Zeevi, Y. Y. (1986), `Image reconstruction from zero crossings', IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-34(5), 1269–1277.
Sabata, B., Arman, F. & Aggarwal, J. K. (1993), `Segmentation of 3D range images using pyramidal data structures', CVGIP: Image Understanding 57(3), 373–387.
Scherk, P. (1951), `Hadwiger, H. Minkowskische addition und subtraktion beliebiger punktmengen und die theoreme von Erhard Schmidt', Mathematical Reviews 12(8), 631–632.
Serra, J. (1982), Image Analysis and Mathematical Morphology, Academic Press, London.
Serra, J. (1987), `Morphological optics', Journal of Microscopy 145(1), 1–22.

Serra, J. (1988), Image Analysis and Mathematical Morphology. Volume 2: Theoretical Advances, Academic Press, London.
Serra, J. & Lay, B. (1985), `Square to hexagonal lattices conversion', Signal Processing 9, 1–13.
Serra, J. & Vincent, L. (1992), `An overview of morphological filtering', Circuits Systems and Signal Processing 11(1), 48–108.
Shih, F. Y. & Mitchell, O. R. (1991), `Decomposition of gray-scale morphological structuring elements', Pattern Recognition 24(3), 195–203.
Sirovich, L. & Kirby, M. (1987), `Low-dimensional procedure for the characterization of human faces', Journal of the Optical Society of America, A 4(3), 519–524.
Snyder, L., Jamieson, L. H., Gannon, D. B. & Siegel, H. J., eds (1985), Algorithmically Specialized Parallel Computers, Academic Press, London.
Srihari, S. N. (1981), `Representation of 3D digital images', Computing Surveys 13, 399–424.
Stansfield, J. L. (1980), Conclusions from the commodity expert project, A.I. Lab Memo No. 601, Massachusetts Institute of Technology.
Stein, F. & Medioni, G. (1992), `Structural indexing: Efficient 3-D object recognition', IEEE Transactions on Pattern Analysis and Machine Intelligence 14(2), 125–145.
Sternberg, S. R. (1980), Cellular computers and biomedical image processing, in J. Sklansky & J.-C. Bisconte, eds, `Biomedical Images and Computers', Vol. 17 of Lecture Notes in Medical Informatics, Springer-Verlag, Berlin, pp. 294–319.
Sternberg, S. R. (1983), `Biomedical image processing', Computer 16(1), 22–34.
Sternberg, S. R. (1985), Computer architectures specialized for mathematical morphology, in L. Snyder, L. H. Jamieson, D. B. Gannon & H. J. Siegel, eds, `Algorithmically Specialized Parallel Computers', Academic Press, London, pp. 169–176.
Sternberg, S. R. (1986), `Grayscale morphology', Computer Vision, Graphics, and Image Processing 35(3), 333–355.
Suetens, P., Fua, P. & Hanson, A. J. (1992), `Computational strategies for object recognition', ACM Computing Surveys 24(1), 5–61.

Tanimoto, S. & Pavlidis, T. (1975), `A hierarchical data structure for picture processing', Computer Graphics and Image Processing 4(2), 104–119.
Taub, H. & Schilling, D. L. (1971), Principles of Communication Systems, McGraw-Hill Electrical and Electronic Engineering Series, McGraw-Hill, New York.
ter Haar Romeny, B. M., Florack, L. M. J., Koenderink, J. J. & Viergever, M. A. (1991), Scale space: Its natural operators and differential invariants, in `Proceedings of the 12th International Conference in Image Processing in Medical Imaging '91', Lecture Notes in Computer Science V.511, Wye, UK.
Thomas, Jr., G. B. & Finney, R. L. (1979), Calculus and Analytic Geometry, fifth edn, Addison-Wesley, Reading, Massachusetts.
Thomas, S. M. & Chan, Y. T. (1989), `A simple approach for the estimation of circular arc center and its radius', Computer Vision, Graphics, and Image Processing 45, 362–370.
Tikhonov, A. & Arsenin, V. (1977), Solutions of Ill-Posed Problems, John Wiley & Sons, New York.
Torre, V. & Poggio, T. A. (1986), `On edge detection', IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8(2), 147–163.
Tsui, H.-T., Choy, T. T.-C. & Ho, C.-W. (1988), Biomedical signal analysis by scale space technique, in `Proceedings of the 5th International Conference on Biomedical Engineering'.
Tukey, J. W. (1971), Exploratory Data Analysis, (preliminary edition), Addison-Wesley, Reading, Massachusetts.
Tukey, J. W. (1977), Exploratory Data Analysis, Addison-Wesley series in behavioral science: quantitative methods, Addison-Wesley, Reading, Massachusetts.
Turk, M. & Pentland, A. (1991a), `Eigenfaces for recognition', Journal of Cognitive Neuroscience 3(1), 71–86.
Turk, M. A. & Pentland, A. P. (1991b), Face recognition using eigenfaces, in `CVPR'91: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 1991', IEEE Computer Society Press, pp. 586–591.

Tyan, S. G. (1981), Median filtering: Deterministic properties, in T. S. Huang, ed., `Two-Dimensional Digital Signal Processing II', Vol. 43 of Topics in Applied Physics, Springer-Verlag, Berlin, pp. 197–217.
Uhr, L. M., ed. (1987), Parallel Computer Vision, Academic Press, London.
van den Boomgaard, R. (1992), Mathematical Morphology: Extensions Towards Computer Vision, PhD thesis, University of Amsterdam.
Vemuri, B. C. & Aggarwal, J. K. (1987), `Representation and recognition of objects from dense range maps', IEEE Transactions on Circuits and Systems CAS-34(11), 1351–1363.
Verbeek, P. W. & Verwer, B. J. H. (1989), `2-D adaptive smoothing by 3-D distance transformation', Pattern Recognition Letters 9, 53–65.
Vitter, J. S. & Chen, W.-C. (1987), The Design and Analysis of Coalesced Hashing, The International Series of Monographs on Computer Science, Oxford University Press, New York.
Voss, K. (1991), `Images, objects, and surfaces in Z^n', Journal of Pattern Recognition and Artificial Intelligence 5(5), 797–808.
Wendt, P. D., Coyle, E. J. & Gallagher, N. C. (1986), `Stack filters', IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-34, 898–911.
Wesley, M. & Markowsky, G. (1981), `Fleshing out projections', IBM Journal of Research and Development 25(6), 934–954.
Whitaker, R. T. & Pizer, S. M. (1993), `A multi-scale approach to nonuniform diffusion', CVGIP: Image Understanding 57(1), 99–110.
Wilson, H. R. & Bergen, J. R. (1979), `A four mechanism model for threshold spatial vision', Vision Research 19, 19–32.
Wilson, R. & Granlund, G. H. (1984), `The uncertainty principle in image processing', IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-6(6), 758–767.
Witkin, A. P. (1983), Scale-space filtering, in `Proceedings of the International Joint Conference on Artificial Intelligence', Kaufmann, Palo Alto, CA, pp. 1019–1022.
Witkin, A. P. (1984), Scale space filtering: a new approach to multi-scale description, in S. Ullman & W. Richards, eds, `Image Understanding 1984', Ablex, Norwood, NJ, pp. 79–95.

Witkin, A., Terzopoulos, D. & Kass, M. (1987), `Signal matching through scale space', The International Journal of Computer Vision pp. 133–144.
Wu, L. & Xie, Z. (1990), `Scaling theorems for zero-crossings', IEEE Transactions on Pattern Analysis and Machine Intelligence 12(11), 46–54.
Yang, J.-Y. & Chen, C.-C. (1993), `Decomposition of additively separable structuring elements with applications', Pattern Recognition 26(6), 867–875.
Young, R. A. (1987), `The Gaussian derivative model for spatial vision: I. retinal mechanisms', Spatial Vision 2(4), 273–293.
Yuille, A. L. & Poggio, T. (1985), `Fingerprints theorems for zero crossings', Journal of the Optical Society of America, A 2(5), 683–692.
Yuille, A. L. & Poggio, T. A. (1986), `Scaling theorems for zero crossings', IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8(1), 15–25.
Yuille, A. L., Cohen, D. S. & Hallinan, P. W. (1989), Feature extraction from faces using deformable templates, in `CVPR'89: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition', IEEE Computer Society Press, San Diego, CA, pp. 104–109.
