Comparison of Segmentation Methods for an Accurate Iris Extraction
by Asheer Kasar Bachoo
Submitted in fulfillment of the academic requirements for the degree of Master of Science in the School of Computer Science University of Kwazulu-Natal Westville Campus, Durban, South Africa October, 2006
© Copyright by Asheer Kasar Bachoo, 2006
UNIVERSITY OF KWAZULU-NATAL
FACULTY OF SCIENCE AND AGRICULTURE

The research described in this dissertation was performed at the University of Kwazulu-Natal under the supervision of Professor Jules-Raymond Tapamo. I hereby declare that all material incorporated in this thesis is my own original work except where acknowledgement is made by name or in the form of a reference. The work contained herein has not been submitted in part or whole for a degree at any other university.
Signed: Asheer Kasar Bachoo
Date: September 2006
As the candidate's supervisor I have approved this dissertation for submission.
Signed: J-R Tapamo
Date: September 2006
To my parents, supervisor, and Baba.
Abstract

Biometric identification systems recognize persons by a digital signature derived from a particular physiological attribute. One such attribute is the unique patterns that exist in the texture of an iris. These patterns provide sufficient information to uniquely identify an individual. Segmentation of the iris texture from an acquired digital image is not always accurate - the image contains noise elements such as skin, reflection and eyelashes that must be located and removed. We compare and contrast four texture description and two hybrid pattern classification methods for segmenting iris texture using a region based pattern classification approach. These techniques are evaluated by analyzing their segmentation accuracy.
Acknowledgements

A dissertation is a journey of self discovery and knowledge acquisition. I would like to thank the following people for their support during the formulation of this thesis:

• My supervisor, Prof. Jules-Raymond Tapamo, for his guidance and support during the difficult times of this research. His passion for scientific discovery and perseverance has become one of my most valued character traits.
• My parents and family, for their undying patience and faith in me.
• My fellow colleagues, Ming-Wei, Johan and Wayne, for their discussions and humor; Soren Greenwood, for his technical support and lively anecdotes; Deshen Moodley and Charlene Beirowski, for their sound advice; and all my friends for their support and care.
• The NRF, Armscor (SA) and the School of Computer Science, University of Kwazulu-Natal, Durban, for their financial support that assisted tremendously and lightened many burdens. Opinions expressed and conclusions arrived at in this thesis are those of the author and not necessarily to be attributed to the NRF.
• The National Laboratory of Pattern Recognition, Chinese Academy of Sciences, for the use of their CASIA Iris Database.
• And Sai Baba, the unseen force that gave me light in the darkest of times.
Table of Contents

Abstract
Acknowledgements

Chapter 1  Introduction
  1.1  Motivation
  1.2  Problem statement
  1.3  Investigated approach
  1.4  Thesis objectives
  1.5  Thesis layout

Chapter 2  Background and previous work
  2.1  The iris
  2.2  Iris recognition algorithm
  2.3  Implemented methods and technologies
  2.4  Current iris segmentation algorithms
  2.5  Analysis of iris segmentation

Chapter 3  Feature extraction
  3.1  Introduction
  3.2  Image formulation
  3.3  Texture in digital images
  3.4  Applications of texture analysis
  3.5  Texture analysis methods
    3.5.1  Statistical texture analysis
    3.5.2  Structural and geometrical texture analysis
    3.5.3  Spectral and signal processing approaches
    3.5.4  Model-based approaches
  3.6  Feature normalization
  3.7  The border problem

Chapter 4  Image segmentation
  4.1  Image segmentation
    4.1.1  Segmentation based on global image knowledge
    4.1.2  Edge-based segmentation
    4.1.3  Region based segmentation
    4.1.4  Watershed segmentation
  4.2  Pattern classification for region based segmentation
  4.3  Pattern classification methods
    4.3.1  Neural networks
    4.3.2  Syntactic methods for pattern classification
    4.3.3  Fuzzy systems
    4.3.4  Linear discriminant functions
    4.3.5  Statistical pattern classification
    4.3.6  Clustering algorithms
  4.4  Similarity measures

Chapter 5  Methods and results
  5.1  Experimental environment
    5.1.1  Image format
    5.1.2  Software development environment
    5.1.3  Dataset
    5.1.4  System overview
  5.2  Iris boundary localization
    5.2.1  PUPIL EDGE MAP computation
    5.2.2  SCLERA EDGE MAP computation
    5.2.3  Iris localization
    5.2.4  Eyelid localization
  5.3  Iris normalization
  5.4  Texture preprocessing
    5.4.1  Point spread functions
    5.4.2  The Retinex model
    5.4.3  Texture enhancement
  5.5  Parameter estimation and feature extraction
    5.5.1  GLCM
    5.5.2  GABOR
    5.5.3  DWT
    5.5.4  MRF
  5.6  Training
    5.6.1  Sample selection
    5.6.2  Classifier design
  5.7  Computing ground truths
  5.8  Iris segmentation using pattern classification
    5.8.1  Clustering of features for region growing
    5.8.2  Iris image region classification for segmentation
    5.8.3  Connected components filtering
  5.9  Iris texture extraction

Chapter 6  Future work and recommendations
  6.1  Summary
  6.2  Limitations of the system and recommendations for future work
  6.3  Conclusion

Bibliography
List of Figures

Figure 2.1  The iris
Figure 2.2  A segmented iris zone
Figure 2.3  Normalized iris region
Figure 2.4  Sub-bands produced by the Haar decomposition process
Figure 2.5  An iris image - 1) upper eyelid 2) eyelashes 3) uneven illumination 4) specular reflection 5) lower eyelid
Figure 2.6  Machine vision framework for iris texture extraction
Figure 3.1  An example of a simple scene
Figure 3.2  A scan line from A to B in the image shown in Figure 3.1
Figure 3.3  Examples of textures in an iris image
Figure 3.4  Visual texture patterns
Figure 3.5  Texture primitives
Figure 3.6  Texture calculations using a window
Figure 3.7  Test images (from top, left to right): EYELASH1, EYELASH2, IRIS1, IRIS2 and IRIS3
Figure 3.8  Pixel relations
Figure 3.9  Computing an un-normalized GLCM
Figure 3.10  Roberts operators
Figure 3.11  Prewitt operators
Figure 3.12  Sobel operators
Figure 3.13  Laplace operator
Figure 3.14  Application of spatial filters
Figure 3.15  Partitioning of the Fourier spectrum: a) ring filter, b) wedge filter
Figure 3.16  Fourier power spectrum
Figure 3.17  DWT tree
Figure 3.18  Haar kernel operators LL, LH, HL, HH
Figure 3.19  A single pass of a 2D DWT transform using the 2×2 HL Haar kernel
Figure 3.20  Haar decomposition process
Figure 3.21  Gabor surfaces with θ = 0, Bf = 1 and Bθ = 30
Figure 3.22  Gabor 2D plots (cross section through y = 0 with φ = π/2, θ = 0, Bf = 1 and Bθ = 30)
Figure 3.23  Gabor kernels (clockwise from top left): 0°, 45°, 90° and 135° angles. Note the orientation of the "stripes"
Figure 3.24  Neighbourhood systems
Figure 3.25  1st order cliques
Figure 3.26  2nd order cliques
Figure 3.27  Symmetric GMRF clique pairs
Figure 4.1  Thresholding
Figure 4.2  Edge detection
Figure 4.3  A scan line of an image showing altitudes. The vertical lines at the peaks are dam walls
Figure 4.4  A simple 2 class decision surface. Patterns below the line fall into class 2 while patterns above the line are in class 1
Figure 4.5  A single neuron
Figure 5.1  Image coordinate system
Figure 5.2  System overview
Figure 5.3  Computing an edge map for pupil detection
Figure 5.4  Noise reduction using closing followed by opening
Figure 5.5  Linear contrast stretching
Figure 5.6  Contrast enhancement on the image that is processed using closing and opening. The boundary gradients are sharper compared to standard histogram equalization
Figure 5.7  The edge preserving filter. The dot in the center is the pixel at (x, y) with value f(x, y)
Figure 5.8  Application of the EPF
Figure 5.9  A localized iris
Figure 5.10  Eyelid localization using RANSAC and learned parameters
Figure 5.11  Eyelid localization
Figure 5.12  Iris normalization
Figure 5.13  Using point spread functions for normalizing image illumination. Note that the PSF outputs have been contrast enhanced
Figure 5.14  Enhanced images, without and with illumination flattening using a PSF
Figure 5.15  Enhanced images, with PSF and Retinex
Figure 5.16  Contrast enhancement of Retinex output
Figure 5.17  The effect of texture window size on segmentation with window sizes of 8×8, 16×16 and 24×24
Figure 5.18  Input image and labelled image
Figure 5.19  Prediction accuracy for training images
Figure 5.20  Average prediction accuracy
Figure 5.21  A graphical view of the FLD projection using a two class discriminant. The two classes are separated by the line y = 0. However, there exists a class overlap
Figure 5.22  Input image and its ground truth
Figure 5.23  A labelled image produced by the clustering process
Figure 5.24  Average DB indices
Figure 5.25  Average number of classes for clustering
Figure 5.26  Average segmentation accuracy
Figure 5.27  Average weighted segmentation accuracy
Figure 5.28  Segmentation results
Figure 5.29  Segmentation results
Figure 5.30  Segmentation results
Figure 5.31  Segmentation results
Figure 5.32  Segmentation results
Figure 5.33  Segmentation results
Figure 5.34  Average segmentation accuracy using CC filtering
Figure 5.35  Average weighted segmentation accuracy using CC filtering
Figure 5.36  Segmentation results - successful connected components filtering
Figure 5.37  Segmentation results - poor connected components filtering
Figure 5.38  Input iris image and extracted iris texture
List of Tables

Table 5.1  Learning data for parabolas. Red denotes data that was rejected
Table 5.2  GLCM 2 class distributions based on FLD transform
Table 5.3  GABOR 2 class distributions based on FLD transform
Table 5.4  DWT 2 class distributions based on FLD transform
Table 5.5  MRF 2 class distributions based on FLD transform
Table 5.6  Average segmentation accuracies, with their standard deviations shown in brackets
Table 5.7  Average segmentation accuracies using connected components. The standard deviation is shown in brackets
Table 5.8  Difference between segmentation accuracies using connected components and no connected components filtering (CC - NO CC)
Chapter 1

Introduction

The human visual system and its cognitive abilities can easily identify objects in a scene - we can extract and recognize an item of interest that may be contained amongst other items. This ability rests on the power to comprehend what is being sensed visually through the use of reason and a priori knowledge stored in memory. It constitutes visual interpretation and provides a high level systematic structure for further knowledge extraction and decision making.

A physical record of objects in a scene can be captured through the acquisition of a digital image using electronic sensing equipment. More importantly, this digital record can be analyzed using computer executed image processing algorithms. Hence, the long desired goal of scientists to use machines to perform automatic visual processing of scenes in digital images is well realized today - vehicle identification through licence plate extraction and recognition, localization of cancer cells in mammograms and facial identification of persons are a few common technologies being developed and improved daily. This is machine vision and pattern classification - the ability to mimic human vision using software and digital equipment for object extraction, recognition and classification.
1.1 Motivation
An obvious application of machine vision and pattern classification is the identification of people. Recognizing an individual is a process that requires physical traits - patterns - that are uniquely discriminative. Humans have used characteristics of the body, such as the face and voice, for centuries to recognize each other. In the past century, robust properties such as fingerprints, retinal patterns and the iris have emerged for person identification. This use of physical traits and behaviours for recognizing people in an automatic system is called biometrics.
Iris recognition systems acquire images of the eye and use image processing techniques to locate and process the unique patterns in the iris texture. These unique spatial arrangements of tones and physiological elements in the visual texture constitute the input to complex algorithms that generate a digital code from them. This signature is different for every individual and is used to identify a person. Hence, the introduction of iris recognition systems into the commercial sector has signified an improvement in the security of people, environments, information and resources.

These systems have a high success rate; however, it is achieved by making assumptions about the state and nature of the acquired iris image. The digital image processing algorithms embedded in these systems consider useful zones of the iris texture to be located in particular sections of the image - a perfect segmentation is not performed. Consequently, the texture area that is processed may be corrupted or missing important information. This impacts on the performance and accuracy of the recognition component.

The above mentioned problem should not be viewed in isolation. The general problem of object segmentation is widespread in a number of applications. In addition, a solution is usually particular to the application and problem domain. Testing theories and practices on a wide variety of data and image types therefore serves to improve our understanding of them.
1.2 Problem statement
The research problem that this thesis addresses is that of accurate iris image segmentation for an improved texture extraction. Hence, we design an automatic segmentation algorithm that will identify the different regions in an iris image and will extract, if possible, an uncorrupted texture zone that can be processed further for computing a digital signature.
1.3 Investigated approach
Our power to distinguish different objects sometimes uses a very important visual cue - texture. Textures particular to objects of interest will have a distinguishing pattern. This thesis focuses on grey scale iris images and the textural patterns inherent to
them. We use several methods to compute texture properties in an iris image and, thereafter, identify different texture regions so that we may extract the relevant areas that are required by the digital signature generation component. The computation of texture properties - called features - is referred to as feature extraction. Extraction and identification of image regions are referred to as image segmentation and pattern classification respectively.

1.4 Thesis objectives
The objectives of this thesis are as follows:

• Present an extensive review of iris segmentation and recognition algorithms.
• Provide an extensive discussion of feature extraction techniques for digital image textures. This will focus on grey level co-occurrence matrices (GLCM), the discrete wavelet transform (DWT), Gabor filtering (GABOR) and Markov random fields (MRF).
• Review segmentation and pattern classification techniques.
• Compare and contrast the four techniques - GLCM, DWT, GABOR and MRF - in terms of the segmentation result that their feature sets generate and their separability of textures.
• Compare and contrast the performance of the K-means (KM) and fuzzy C-means (FCM) clustering techniques for segmenting iris images using each of GLCM, DWT, GABOR and MRF.
• Discuss the image processing algorithms implemented for segmenting iris images. Techniques include circle and curve fitting, image enhancement, connected components and shape analysis.

1.5 Thesis layout
Chapter 2 presents background on the iris and iris recognition algorithms, and discusses current iris segmentation methods. In Chapter 3, feature extraction methods are discussed. We also present image formulation and define features and feature extraction. Chapter 4 discusses image segmentation. We concentrate on pattern classification techniques for segmenting and identifying image regions. Implemented methods and experimental results are presented in Chapter 5. Future work and concluding remarks are presented in Chapter 6.
Chapter 2

Background and previous work

Automated person identification using patterns of the iris for digital signatures has developed substantially within the last decade. A number of researchers have contributed to the scientific knowledge available for iris image acquisition, segmentation and pattern analysis and classification. In recent times, our security has been threatened by terrorist groups and malicious individuals, highlighting the need for robust and secure person identification.

The automated authentication of individuals is a complex and sensitive task. Current methods of authentication - such as passwords and electronic cards - are susceptible to fraud and theft. An approach to this problem is an automated biometric identification system [91] - a system that uses a unique attribute of the human body to establish a digital signature for recognition of the individual. Our increasing knowledge of the human body's physiology has led to a number of authentication technologies - retinal, fingerprint, hand geometry, facial, voice, DNA and iris systems [9]. The most common ones are discussed below.

The most established biometric is fingerprint identification. Minutiae - such as ridge bifurcations and ridge endings - are localized during the processing stage. These characteristics are used to generate an orientation field of the fingerprint, which subsequently provides the discriminating details for authentication of persons. Fingerprint systems are favoured for their high recognition accuracy [31].

Authentication by hand geometry has gained widespread interest. The features vector incorporates information concerning the length and width of the fingers and hand shapes [8]. A drawback of this system is that the geometry of the hand is not very distinctive. It is useful for low-level security systems [117].

Authentication by facial appearance is perhaps the most logical yet complex of the biometrics. Facial images can be captured quite easily and stored in large databases
for retrieval. However, the authentication stage is hindered by aspects such as aging and facial expression (physical), lighting and camera rotation. A general solution to these problems has yet to be presented [64, 96].

Voice (speaker) recognition [33] exploits the differences in physiological and behavioural aspects of the human speech production system. The speech waves contain spectral components that are analyzed for unique features. This system is also susceptible to a number of problems such as imposters, noise and physiological changes of the person.

Iris recognition is a relatively new yet powerful solution to the problems presented above. It has low false rejection rates (FRRs) and false acceptance rates (FARs) and the digital signature established is highly unique. Replication of the iris and fraudulent efforts are highly unlikely [38]. We now discuss the iris and iris recognition.

2.1 The iris
The iris [6] begins its formation in the 3rd month of gestation. By the 8th month, its distinctive pattern is complete. However, pigmentation and even pupil size continue to change up to adolescence [69]. The iris is a multilayered texture - this combination of layers and color provides a highly distinctive pattern. An assortment of texture variations is possible. They include:

• Contractile lines related to the state of the pupil.
• Crypts - irregular atrophy of the border layer.
• Naevi - small elevations of the border layer.
• Freckles - collections of chromatophores.
• Color variation - an increase in pigmentation yields darker colored irides.

The iris and its texture variations are shown in Figure 2.1.

Figure 2.1: The iris (labelled structures: ciliary zone, collarette, crypt, pupillary zone, radial furrow, pupil, pupillary frill, iris, sclera)

Of the utmost importance in a biometric identification system is the stability and uniqueness of the object being analyzed. Ophthalmologists [50] and anatomists [54], during the course of clinical observations, have noted that the irises of individuals are highly distinctive. This extends to the left and right eye of the same person. Repeated observations over a period of time have highlighted little variation in the patterns. Developmental biology has also provided evidence of the uniqueness of the iris [69]. Although the general structure of the iris is genetically determined, the uniqueness of its minutiae is highly dependent on circumstances. As a result, replication is almost impossible. It has also been noted that, following adolescence, the iris remains stable and varies little for the remainder of the person's life. Development is continuous during the early and adolescent years (pigmentation continues as well as an increase in pupil size) [54, 69].
In 1936, Frank Burch, an ophthalmologist, proposed the idea of using iris patterns for personal identification. However, this was only documented by James Doggarts in 1949. The idea of iris identification for automated recognition was finally patented by Aran Safir and Leonard Flom in 1987. Although they had patented the idea, the two ophthalmologists were unsure as to a practical implementation for the system. They commissioned John Daugman to develop the fundamental algorithms in 1989. These algorithms were patented by Daugman in 1994 and now form the basis for all current commercial iris recognition systems. The Daugman algorithms are owned by Iridian Technologies and they are licensed to several other companies.

2.2 Iris recognition algorithm
An iris recognition algorithm has the following general structure:

• Iris image acquisition: Image acquisition is the process of capturing an iris image and storing it in digital format. It is usually performed with a CCD (charge-coupled device) camera or a digital optical sensor.

• Image preprocessing: Processing of the acquired iris image entails localizing the iris and subsequently extracting iris texture features. The processing stage must remove noise and elements that can affect the feature extraction process. The features extracted must not be corrupted - they must be an accurate reflection of the region of interest.

• Learning and recognition: When the iris image has been processed and texture features extracted, a digital signature for the individual must be computed from the texture properties and stored for comparing to unknown signatures. This is the learning process. If we compare an unknown iris code to known samples and make a decision as to its identity, we are performing recognition.

2.3 Implemented methods and technologies
John Daugman developed the fundamental iris algorithm and iris recognition system. Commissioned by Flom and Safir to conduct intensive and extensive research for implementing automated recognition, his work was presented in [36]. This was followed
by updates on his research [37, 38, 39]. Finding an iris in an image requires a search through the image domain for circles. Daugman uses an integro-differential operator - a form of active contour modelling - to search for maximal derivatives along circular arcs ds of radius r and center (x_0, y_0). This corresponds to circular edge detection. The operator is

$$\max_{(r, x_0, y_0)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{r, x_0, y_0} \frac{f(x, y)}{2\pi r}\, ds \right| \tag{2.1}$$

where G_σ is a Gauss filter [53] that smooths the image f(x, y) and removes noise. The spread (scale) of the filter is controlled by σ and can be used to assess fine and coarse image elements. The partial derivative corresponds to circular edges of radius r. It is also used to assess image quality: if the maximum derivative falls below a threshold, it implies poor focus or occlusion by eyelids. Eyelids are modelled using a similar method - derivatives are searched for along parabolic arcs.

Once the iris has been located in an image, it is transformed to compensate for changes in diameter, making it invariant to size. This is important since textural details change when the pupil dilates or constricts. Daugman implements a rubber sheet model to transform the iris to a dimensionless polar coordinate system. The idea behind the dimensionless polar system is to assign an r and θ value to each coordinate in the iris that will remain invariant to the possible stretching and skewing of the image. For the transformation, the r value ranges over [0, 1] and the angular value spans the interval [0, 2π]. The remapping is done according to the following formulas:

$$x(r, \theta) = (1 - r)\,x_p(\theta) + r\,x_i(\theta), \qquad y(r, \theta) = (1 - r)\,y_p(\theta) + r\,y_i(\theta) \tag{2.2}$$

$$x_p(\theta) = x_{p0} + r_p \cos(\theta), \qquad y_p(\theta) = y_{p0} + r_p \sin(\theta) \tag{2.3}$$

$$x_i(\theta) = x_{i0} + r_i \cos(\theta), \qquad y_i(\theta) = y_{i0} + r_i \sin(\theta) \tag{2.4}$$
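Equations 2.2-2.4 amount to sampling the annulus between the pupil and iris boundaries along radial lines and writing the samples into a fixed-size rectangular block. The following is a minimal illustrative sketch of that remapping, assuming the two circles have already been localized; the function name, resolution and nearest-neighbour sampling are our own choices, not Daugman's.

```python
import numpy as np

def rubber_sheet(image, pupil_xy, pupil_r, iris_xy, iris_r,
                 radial_res=64, angular_res=512):
    """Map the iris annulus to a radial_res x angular_res block (Eqs. 2.2-2.4)."""
    out = np.zeros((radial_res, angular_res), dtype=image.dtype)
    thetas = np.linspace(0.0, 2.0 * np.pi, angular_res, endpoint=False)
    rs = np.linspace(0.0, 1.0, radial_res)
    for j, theta in enumerate(thetas):
        # Boundary points on the pupil and iris circles for this angle (Eqs. 2.3, 2.4).
        xp = pupil_xy[0] + pupil_r * np.cos(theta)
        yp = pupil_xy[1] + pupil_r * np.sin(theta)
        xi = iris_xy[0] + iris_r * np.cos(theta)
        yi = iris_xy[1] + iris_r * np.sin(theta)
        for i, r in enumerate(rs):
            # Linear interpolation between the two boundaries (Eq. 2.2).
            x = (1.0 - r) * xp + r * xi
            y = (1.0 - r) * yp + r * yi
            out[i, j] = image[int(round(y)), int(round(x))]
    return out
```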
Figure 2.2: A segmented iris zone
Figure 2.3: Normalized iris region

The center of the pupil is denoted by (x_p0, y_p0) and (x_i0, y_i0) is the center of the iris; r_p is the radius of the pupil and r_i is the radius of the iris; and (x_p, y_p) and (x_i, y_i) are the coordinates of points bordering the pupil's and the iris' radii respectively.

The iris is a most unique phenotype and the randomness of its physical structure provides a highly distinctive human pattern for recognition purposes. This biological signal must be decomposed into fundamental time resolved frequencies effectively and with a high degree of information. In [35] Daugman presented 2D Gabor filters for providing optimal spectral and spatial information in images:

$$G(x, y) = e^{-\pi\left[\frac{(x - x_0)^2}{\alpha^2} + \frac{(y - y_0)^2}{\beta^2}\right]}\, e^{-2\pi i\left[u_0 (x - x_0) + v_0 (y - y_0)\right]} \tag{2.5}$$
The Gabor function is essentially sine and cosine functions modulated by a Gaussian window, where (x_0, y_0) specifies a center position in the image - this corresponds to the center of the "bell-shaped" Gaussian function. The effective aspect ratio of the Gaussian envelope is controlled by (α, β), with α determining the spread in the x direction and β the spread in the y direction. Hence, (α, β) controls the ellipticity of the Gaussian. The wavelengths of the sine and cosine functions are represented by (u_0, v_0) - the frequency coordinates in the Fourier domain. The spatial frequency is $w_0 = \sqrt{u_0^2 + v_0^2}$ and $\theta_0 = \arctan(v_0 / u_0)$ is the spatial direction of the sine and cosine waves. Daugman uses these 2D functions to describe a family of wavelets [84] for extracting discriminating information from iris textures:

$$\psi_{mpq\theta}(x, y) = 2^{-2m}\,\psi(x', y') \tag{2.6}$$

A family of wavelets is a set of self-similar functions that are localized in space. They differ by dilations (size) and translations (position) in spatial coordinates. The mother wavelet ψ is the generic function; it is dilated and translated to create the baby wavelets. The pair (x', y') dilates the wavelet in size by 2^{-m} and translates it to position (p, q) through a rotation of angle θ. The scale of analysis is m - as m increases, lower frequencies are analyzed. A high scale corresponds to low frequencies; a low scale corresponds to high frequency analysis. In Equation 2.6,

$$x' = 2^{-m}\left[x\cos(\theta) + y\sin(\theta)\right] - p \tag{2.7}$$

$$y' = 2^{-m}\left[-x\sin(\theta) + y\cos(\theta)\right] - q \tag{2.8}$$
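For concreteness, Equation 2.5 can be sampled directly on a pixel grid to obtain a complex-valued filter kernel; convolving the normalized iris with a bank of such kernels and keeping only the signs of the responses is the essence of the encoding described next. This sketch uses our own parameter values, not Daugman's published ones.

```python
import numpy as np

def gabor_kernel(size, u0, v0, alpha, beta):
    """Complex 2D Gabor kernel of Equation 2.5, centred at (x0, y0) = (0, 0)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-np.pi * (x**2 / alpha**2 + y**2 / beta**2))
    carrier = np.exp(-2j * np.pi * (u0 * x + v0 * y))
    return envelope * carrier

# Example: a 31x31 kernel tuned to a horizontal spatial frequency of 0.1 cycles/pixel.
kernel = gabor_kernel(31, u0=0.1, v0=0.0, alpha=6.0, beta=6.0)
```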
Wavelets perform a local analysis of texture at different frequencies and resolutions. This provides an optimal coverage of the spatial-frequency domain. The iris image is convolved with a set of filters that cover most of the Fourier domain. For each convolution, the result is binarized to denote an element in the features vector - a set of unique information for an individual. A total of 2048 bits (0s and 1s) are generated for the features vector (iris code). Matching is performed using a normalized Hamming function with the exclusive-or operator:

$$HD = \frac{1}{2048}\sum_{j=0}^{2047} A_j \oplus B_j \tag{2.9}$$

where A and B are the two codes being compared and A_j and B_j are bits in corresponding positions. If HD is less than or equal to 0.32 then A and B belong to the same person.
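A minimal sketch of the match score of Equation 2.9, assuming the two iris codes are already available as 2048-element binary arrays (the 0.32 acceptance threshold is the one quoted above):

```python
import numpy as np

def hamming_distance(code_a, code_b):
    """Normalized Hamming distance between two binary iris codes (Eq. 2.9)."""
    code_a = np.asarray(code_a, dtype=bool)
    code_b = np.asarray(code_b, dtype=bool)
    return np.count_nonzero(code_a ^ code_b) / code_a.size

# Example: two random 2048-bit codes give a distance near 0.5;
# genuine pairs are accepted when the distance is <= 0.32.
a = np.random.randint(0, 2, 2048)
b = np.random.randint(0, 2, 2048)
same_person = hamming_distance(a, b) <= 0.32
```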
Wildes [115] conducted substantial research into iris recognition. The system designed utilizes the Hough transform [53] for locating the iris and a multiresolution Laplacian of Gaussians [22] pyramid decomposition scheme to extract features from the iris. Localizing the iris involves generating an edge map by thresholding the
magnitude of the image intensity gradient given by

$$\left| \nabla\left(G(x, y) * f(x, y)\right) \right| \tag{2.10}$$

where

$$\nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right) \tag{2.11}$$

are the edge gradients in the x and y directions [53] and G(x, y) is a standard smoothing Gaussian. Equation 2.10 refers to smoothing the image f(x, y) with a Gauss filter G and computing the edge magnitudes. This is then thresholded to generate an edge map. Generating this edge map is followed by voting for contours that define the iris boundaries. The Hough transform is defined as

$$H(x_c, y_c, r) = \sum_{j=1}^{n} h(x_j, y_j, x_c, y_c, r) \tag{2.12}$$

where

$$h(x_j, y_j, x_c, y_c, r) = \begin{cases} 1, & \text{if } g(x_j, y_j, x_c, y_c, r) = 0 \\ 0, & \text{otherwise} \end{cases} \tag{2.13}$$

and

$$g(x_j, y_j, x_c, y_c, r) = (x_j - x_c)^2 + (y_j - y_c)^2 - r^2 \tag{2.14}$$

For every circle (x_c, y_c, r) that passes through (x_j, y_j), g(x_j, y_j, x_c, y_c, r) = 0. Therefore, the (x_c, y_c, r) that maximizes H represents the circle with the most edge points and is a good approximation of the contour of interest. The eyelid contours are modelled in a similar fashion.

Wildes performs image normalization and addresses size and rotation invariance by using image registration. An acquired image f_a is transformed into alignment with a database image f_d so that their intensity differences are minimal. The mapping function (u, v) minimizes

$$\int_x \int_y \left(f_d(x, y) - f_a(x - u, y - v)\right)^2 dx\, dy \tag{2.15}$$

while taking into account a transform of coordinates (x, y) to (x', y') defined by

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} - sR(\phi)\begin{pmatrix} x \\ y \end{pmatrix} \tag{2.16}$$

where s is a scaling factor and R(φ) is a matrix representing rotation by φ.
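The voting of Equations 2.12-2.14 can be sketched as a brute-force accumulator over candidate circle parameters; real implementations restrict the search ranges and use gradient information, but the structure is the same. The tolerance used to test g ≈ 0 on a discrete grid is our own choice.

```python
import numpy as np

def hough_circles(edge_points, radii, height, width, tol=1.0):
    """Accumulate votes H(xc, yc, r) for circles through the edge points (Eqs. 2.12-2.14)."""
    acc = np.zeros((height, width, len(radii)), dtype=np.int32)
    ys, xs = np.mgrid[0:height, 0:width]
    for (xj, yj) in edge_points:
        # Distance of the edge point to every candidate centre.
        d = np.sqrt((xj - xs) ** 2 + (yj - ys) ** 2)
        for k, r in enumerate(radii):
            acc[:, :, k] += (np.abs(d - r) < tol)   # g(xj, yj, xc, yc, r) approximately 0
    yc, xc, k = np.unravel_index(np.argmax(acc), acc.shape)
    return xc, yc, radii[k]
```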
Capturing the unique spatial and spectral characteristics of the iris is accomplished using a set of Laplacian of Gaussian filters that achieves a bandpass decomposition of the iris. The filters are represented by

$$\mathrm{LoG}_{\rho\sigma} = -\frac{1}{\pi\sigma^4}\left(1 - \frac{\rho^2}{2\sigma^2}\right)e^{-\rho^2/2\sigma^2} \tag{2.17}$$

where ρ refers to the radial distance of a point from the center of the filter and σ is the standard deviation of the Gaussian. The bandpass decomposition scheme is performed as follows: let W represent a filter and f be an image. f is convolved with W so as to produce a set of low-pass filtered images g_{k+1} according to

$$g_0 = f \tag{2.18}$$

$$g_{k+1} = (W * g_k)_{\downarrow 2}, \quad k = 0, 1, \ldots \tag{2.19}$$

with (·)_{↓2} entailing downsampling by a factor of 2 in each direction. The kth level of the Laplacian pyramid l_k is formed as the difference between g_k and g_{k+1}. In order to match the sampling rate of g_k, g_{k+1} is expanded before subtraction. This is done by upsampling and interpolation, denoted by

$$l_k = g_k - 4W * (g_{k+1})_{\uparrow 2} \tag{2.20}$$

where (·)_{↑2} indicates upsampling by a factor of 2 by inserting a row and column of zeros between each row and column of the original image. The factor of 4 is required since 3/4 of the samples are zeros. The resulting Laplacian pyramid is the foundation of all subsequent processing and analysis.

Wildes determines a goodness of match for two signals by computing a normalized correlation measure. This can be defined as follows: let f_1(x, y) and f_2(x, y) be two images of size n × m. Let

$$\mu_1 = \frac{1}{nm}\sum_{x=1}^{n}\sum_{y=1}^{m} f_1(x, y) \tag{2.21}$$

and

$$\sigma_1 = \sqrt{\sum_{x=1}^{n}\sum_{y=1}^{m}\left(f_1(x, y) - \mu_1\right)^2} \tag{2.22}$$

be the mean and standard deviation of f_1 respectively. µ_2 and σ_2 are similarly defined for f_2. The normalized correlation between f_1 and f_2 can be stated as

$$\mathrm{COR}(f_1, f_2) = \frac{\sum_{x=1}^{n}\sum_{y=1}^{m}\left(f_1(x, y) - \mu_1\right)\left(f_2(x, y) - \mu_2\right)}{nm\,\sigma_1\sigma_2} \tag{2.23}$$
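A compact sketch of the band-pass decomposition of Equations 2.18-2.20 together with a zero-mean normalized correlation in the spirit of Equation 2.23; the 5-tap binomial filter standing in for W and the zero-insertion upsampling are assumptions on our part, not Wildes' exact choices.

```python
import numpy as np
from scipy.ndimage import convolve

# Separable 5-tap binomial approximation to the generating kernel W (our choice).
w1 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
W = np.outer(w1, w1)

def reduce_level(g):
    return convolve(g, W, mode="nearest")[::2, ::2]        # Eq. 2.19: filter, then downsample

def expand_level(g, shape):
    up = np.zeros(shape)
    up[::2, ::2] = g                                       # zero insertion
    return 4.0 * convolve(up, W, mode="nearest")           # Eq. 2.20: 4 W * (.)_{up 2}

def laplacian_pyramid(f, levels=4):
    gauss, lap = [f.astype(float)], []
    for _ in range(levels):
        gauss.append(reduce_level(gauss[-1]))
    for k in range(levels):
        lap.append(gauss[k] - expand_level(gauss[k + 1], gauss[k].shape))
    return lap

def normalized_correlation(f1, f2):
    """Zero-mean normalized correlation between two bands (cf. Eq. 2.23)."""
    f1, f2 = f1 - f1.mean(), f2 - f2.mean()
    return (f1 * f2).sum() / (np.sqrt((f1**2).sum() * (f2**2).sum()) + 1e-12)
```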
Normalized correlation captures integrated similarity of corresponding points and also accounts for local variations in image intensity. Four measures are calculated - one for each band of the Laplacian pyramid. This correlation procedure is performed for the verification procedure only. Thereafter, Fisher linear discriminant analysis is used to make a decision, i.e. accept or reject. The discriminant is defined on a set of iris training data.

Boles [20] uses circular edge detection to locate the iris. Compensation for differences in size is performed by determining the ratio of the acquired iris diameter to that of the reference image and then using this ratio to create virtual circles - which have the same diameter - for extracting information. In addition, the virtual circles have the same number of points (a normalization value N which is a power of 2 and controls the number of decomposition levels of the wavelet transform). Iris characteristics are extracted using the zero crossings of the dyadic wavelet transform [85]. Two dissimilarity functions are tested for iris signal comparison. Denote the zero-crossing representation of a signature f at a particular resolution level j by Z_j f. Also, let P_j = {p_j(r); r = 1, ..., R_j} be the set containing the locations of zero-crossing points at level j, where R_j is the number of zero-crossings of this representation at level
j. Then Z_j f can be uniquely represented as a set of ordered complex numbers whose imaginary [ρ_j]_f and real [µ_j]_f parts indicate the zero-crossing position and magnitude of Z_j f between two adjacent zero-crossing points, respectively. The dissimilarity functions that compare the unknown signature g and template f at resolution j are

$$d_j^{(1)}(f, g) = \min_{m} \sum_{n=1}^{N} \left| Z_j f(n) - \Gamma Z_j g(n + m) \right|^2, \quad m \in [0, N - 1] \tag{2.24}$$

$$d_j^{(2)}(f, g) = \min_{m} \frac{\left\{\sum_{r=1}^{R_j}\left( [\mu_j(r)]_f\, [\rho_j(r)]_f - \Gamma\, [\mu_j(r + m)]_g\, [\rho_j(r + m)]_g \right)\right\}^2}{\Gamma \sum_{r=1}^{R_j} \left| [\mu_j(r)]_f\, [\rho_j(r)]_f \right|\, \left| [\mu_j(r)]_g\, [\rho_j(r)]_g \right|}, \quad m \in [0, R_j - 1] \tag{2.25}$$
where Γ is the scale factor and is equal to the ratio between the radius of the virtual circle and that of the unknown signature. Equation 2.24 uses all the points of the signal representation while Equation 2.25 uses only the zero-crossing points. The overall
dissimilarity value over a resolution interval is the average of the values determined at each resolution level.

Ma et al. [73] have contributed extensively to the field of iris recognition. Their works include multichannel Gabor filtering [63] for extracting texture details, circular symmetric filters [75] and key local variations in 1D signals [74]. In [63], the iris texture is unwrapped and transformed into a rectangle. This is divided into 8 sub-images and local information is extracted from each sub-image. A features vector is constructed as an ordered sequence of the sub-image information, with the sub-image information providing local content while the ordered sequence represents the global content. A 2D even Gabor filter extracts the textural details. The parameters of the Gabor function include 4 orientations and 5 frequencies, giving a total of 20 filters. Using these filters, 20 output images are produced from each sub-image, providing a total of 160 output images per iris. For each output image, a feature value is determined by computing the average absolute deviation (AAD)

$$\mathrm{AAD} = \frac{1}{N}\sum_{N} \left| f(x, y) - \mu \right| \tag{2.26}$$

where N is the number of pixels in the sub-image, µ is the mean of f and f(x, y) is the value at (x, y) in the image. Iris matching is performed using the weighted Euclidean distance (WED) for comparison of features vectors. The WED is

$$\mathrm{WED} = \sqrt{\sum_{i=1}^{BN} w_i \sum_{j=1}^{N} \left(f_{(i,j)}^{k} - f_{(i,j)}\right)^2} \tag{2.27}$$

with w_i denoting the ith weighting coefficient, BN the number of sub-images and N the number of features extracted from each sub-image. f_(i,j) is the jth feature component of the ith sub-image of the unknown iris and f^k_(i,j) is the jth feature component of the ith sub-image of the iris indexed by k. The w_i are determined from empirical results.
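A small sketch of the average absolute deviation feature of Equation 2.26 and the weighted Euclidean distance of Equation 2.27, assuming the Gabor-filtered sub-images are already available; the shapes and uniform weights shown are placeholders for the empirically determined w_i.

```python
import numpy as np

def aad(sub_image):
    """Average absolute deviation of one filtered sub-image (Eq. 2.26)."""
    return np.mean(np.abs(sub_image - sub_image.mean()))

def wed(features_unknown, features_known, weights):
    """Weighted Euclidean distance between two feature sets (Eq. 2.27).
    features_* have shape (BN, N): BN sub-images, N features each."""
    diff_sq = (features_known - features_unknown) ** 2
    return np.sqrt(np.sum(weights * diff_sq.sum(axis=1)))

# Example with placeholder shapes: 8 sub-images, 20 filter outputs each.
f_unknown = np.random.rand(8, 20)
f_known = np.random.rand(8, 20)
score = wed(f_unknown, f_known, weights=np.ones(8))
```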
Circular symmetric filters (CSF) [75] were developed on the basis of Gabor filters for feature extraction and do not have orientation selectivity. However, they can capture information in a particular frequency band. The CSF is defined as

$$G(x, y, f) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right)\right] M(x, y, f) \tag{2.28}$$

where

$$M(x, y, f) = \cos\left[2\pi f \sqrt{x^2 + y^2}\right] \tag{2.29}$$
The aspect ratio of the Gauss envelope is controlled by σ_x and σ_y along the x and y axes respectively. A modulating function is specified by M(x, y, f), with f denoting the frequency of the sinusoidal function. As mentioned earlier, the filter does not provide orientation information. The iris texture is mapped to a 64×512 rectangular block and the region for analysis is the top-most 75 percent (48×512) section - moving from bottom to top implies a radial shift from the pupil to the iris and sclera boundary. The texture is divided vertically into 3 local regions and each region is filtered with a CSF of different frequency from the others. Features are extracted from each 8×8 (non-overlapping) block by computing its AAD, providing a total of 384 features.

Iris codes are compared by implementing the nearest feature line (NFL) method [106]. The NFL method assumes that every class has at least two members. The line that passes through two feature members of the same class can extrapolate or interpolate to produce a feature line in the feature space. Consider two feature points f_i^s and f_j^s that belong to the same class s. The distance d between the feature line through f_i^s and f_j^s and a query feature point f_x is calculated as

$$d(f_x, \overline{f_i^s f_j^s}) = \left\| f_x - p_{i,j}^s \right\| \tag{2.30}$$

where p_{i,j}^s is the projection point of f_x onto the feature line. Hence, p_{i,j}^s can be expressed as

$$p_{i,j}^s = (1 - \mu)\, f_i^s + \mu\, f_j^s = f_i^s + \mu\left(f_j^s - f_i^s\right) \tag{2.31}$$

with

$$\mu = \frac{(f_x - f_i^s)^T \cdot (f_j^s - f_i^s)}{(f_j^s - f_i^s)^T \cdot (f_j^s - f_i^s)} \tag{2.32}$$

By traversing all i and j (i ≠ j), we can choose the feature line of minimal distance and name it the nearest feature line. The NFL is improved by limiting the extent of the feature line and redefining the distance if the point is out of a certain range of the feature line. The pattern similarity measure is

$$d(f_x, \overline{f_i^s f_j^s}) = \begin{cases} \left\| f_x - f_i^s \right\|, & \beta < T_1 \\ \left\| f_x - \tilde{f}_x \right\|, & T_1 \leq \beta \leq T_2 \\ \left\| f_x - f_j^s \right\|, & \beta > T_2 \end{cases} \tag{2.33}$$
where f̃_x is the projection of f_x on the feature line through f_i^s and f_j^s, β is a position parameter and T_1 and T_2 are two thresholds set at -0.5 and 1.5 respectively.

The system of Ma et al. [74] analyzed key local variations and achieved a correct recognition rate of 100% on a test set consisting of 306 different iris classes from 213 subjects. The iris texture is normalized to a 64×512 rectangular block - the vertical direction corresponds to a radial shift while the horizontal direction is an angular shift. Information density is more pronounced in the angular direction [37], which corresponds to horizontal lines in the normalized image. A set of 1D intensity signals S is generated from the normalized image I, of size K×L (64×512 in their case), as follows:

$$S_i = \frac{1}{M}\sum_{j=1}^{M} I_{(i-1)M + j}, \quad i = 1, 2, \ldots, N, \qquad I = \left(I_1^T, \ldots, I_x^T, \ldots, I_K^T\right)^T \tag{2.34}$$

where I_x are the grey values of the xth row in I, M is the total number of rows used to form a 1D signal and N is the total number of 1D signals. In their experiments, they choose M = 5 and N = 10. The dyadic wavelet - with a quadratic spline wavelet function - decomposes the signal at 2 scales. Local minima and maxima are computed to describe key variations in the texture. The positions of these sharp variations are noted and they form a set of features. For each S_i, the position sequences at the 2 scales are concatenated to form a set of features

$$f_i = \{d_1, d_2, \ldots, d_i, \ldots, d_m;\; d_{m+1}, d_{m+2}, \ldots, d_{m+n};\; p_1, p_2\} \tag{2.35}$$

The first m components are from the first scale, the next n components are from the second scale and d_i denotes the position of a key variation in the signal. p_1 and p_2 represent the property of the first local sharp variation point at the two scales respectively:
• 1 if it is a local minimum.
• -1 otherwise.

The f_i are concatenated to form an ordered features vector f. It must be pointed out that since every signal has distinct key variations, the dimensionality of f is not constant. To facilitate the matching of iris features vectors, the original vectors must be expanded to a sequence of zeros and ones since they are not of constant dimensionality. The transform is performed as follows:

• At each position represented by the features vector, the binary sequence changes from 1 to 0 or from 0 to 1. The start of the sequence is known for each scale since it is stored in p. If p is -1 then the first (d_1 − 1) components are set to 1, otherwise they are set to 0.
• Thereafter, the remaining sequence can be generated from the remaining d_i.

The similarity of iris codes is computed using the XOR operator.

Lim et al. [62] localize the iris using edge detection and the accumulator method of the Hough transform. Using the method of Daugman, the texture is normalized to a 450×60 grid. The width (450) corresponds to the angular direction while the height (60) is the radial direction. The Haar wavelet [84] is applied 4 times to the iris texture. At the 4th pass, the coefficients in the HH sub-band and the average of the coefficients in each of the 3 remaining HH sub-bands are extracted as features. Figure 2.4 illustrates the Haar decomposition process; the squares represent sub-band data at a particular scale. The feature set contains 87 elements. A quantization is performed on these elements by setting positive elements to 1 and negative ones to 0. New iris patterns are learned by employing a competitive learning neural network method called learning vector quantization (LVQ). 6000 samples were collected from Korean students in their twenties - half the samples were used to train the LVQ and the other half were for tests and experiments. In comparing the Haar and Gabor methods for feature extraction, their Haar implementation was superior.

Ali and Hassanien [13] use edge detection and search for maxima in the edge map to locate the iris. They incorporate filtering and edge linking for boundary detection.
Figure 2.4: Sub-bands produced by the Haar decomposition process (LL, LH, HL and HH sub-bands at successive scales)

Daugman's polar transform maps the iris to a 450×60 grid and, thereafter, they apply the Haar wavelet and feature extraction process described above. A normalized Hamming distance function is employed for measuring similarity of patterns. However, they do not mention a threshold for acceptance.
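One level of the Haar decomposition illustrated in Figure 2.4 can be written as averaging and differencing over 2×2 blocks; applying it recursively to the LL band four times yields the sub-bands described above. This is a generic sketch of one analysis level (sizes and normalization are our own choices), not the authors' code.

```python
import numpy as np

def haar_level(block):
    """One 2D Haar decomposition step: returns the LL approximation and three detail sub-bands."""
    a = block[0::2, 0::2].astype(float)   # top-left of each 2x2 block
    b = block[0::2, 1::2].astype(float)   # top-right
    c = block[1::2, 0::2].astype(float)   # bottom-left
    d = block[1::2, 1::2].astype(float)   # bottom-right
    ll = (a + b + c + d) / 4.0            # low-pass in both directions
    lh = (a - b + c - d) / 4.0            # detail sub-band
    hl = (a + b - c - d) / 4.0            # detail sub-band
    hh = (a - b - c + d) / 4.0            # diagonal detail sub-band
    return ll, lh, hl, hh

# Four passes over the LL band of a normalized texture (64x512 here, purely illustrative).
texture = np.random.rand(64, 512)
ll = texture
detail_bands = []
for _ in range(4):
    ll, lh, hl, hh = haar_level(ll)
    detail_bands.append((lh, hl, hh))
```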
In [57], the iris zone is modelled as a mixture of 3 Gaussian distributions, where each distribution represents the pupil, face and iris respectively. The probability density function is

$$p(x) = \sum_{k=1}^{3} \omega_k\, f(x \mid \mu_k, \sigma_k) \tag{2.36}$$
The variables µ and σ describe the particular form of the densities. These parameters are estimated using expectation maximization (EM). When the form of the density has been established, the iris region is extracted and circles are fitted to it to refine the boundaries. The EM algorithm sometimes fails in the presence of indistinguishable densities. This is addressed by using a predefined solution space to ensure correct estimation. Their experiments also demonstrated that wavelet components at low frequencies in the radial direction and low to mid range frequencies in the angular direction have a rich information content and are robust to noise. The feature extraction process is preceded by Gaussian low pass filtering of the normalized image in the radial direction, resulting in a smoothed image g(x, y). Thereafter, difference of Gaussian wavelets (DoG) decompose the texture in the angular direction. The
wavelets have a B-spline function

$$\varphi_{s(k)}(x) = \frac{1}{s(k)}\,\varphi\!\left(\frac{x}{s(k)}\right) \tag{2.37}$$

Given g(x, y), a set of lowpass filtered signals is yielded by convolving g(x, ·) with ϕ_s(k). The difference between adjacent smoothed signals ϕ_s(k) ∗ g(x, ·) and ϕ_s(k−1) ∗ g(x, ·) forms the kth band

$$W_k = \varphi_{s(k)} * g(x, \cdot) - \varphi_{s(k-1)} * g(x, \cdot) \tag{2.38}$$

The functions s(k) and s(k − 1) are experimentally chosen.
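A sketch of the band construction in Equation 2.38: each row of the smoothed texture is filtered with two low-pass kernels of different widths and the difference of the results forms one band. Gaussian kernels stand in for the B-spline ϕ_s(k) here, which is our simplification; the scale values are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def angular_band(g_row, scale_k, scale_k_minus_1):
    """k-th band of a 1D angular signal as a difference of smoothed versions (Eq. 2.38)."""
    low_k = gaussian_filter1d(g_row, sigma=scale_k)
    low_k_minus_1 = gaussian_filter1d(g_row, sigma=scale_k_minus_1)
    return low_k - low_k_minus_1

# Example: one row of a normalized iris texture, band between scales 4 and 2.
row = np.random.rand(512)
w_k = angular_band(row, scale_k=4.0, scale_k_minus_1=2.0)
```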
The signal content in the iris texture is decomposed into multiple bands which are thereafter approximated by piecewise linear curves. These curves are constructed using a set of node points such that the distortion (energy) in the signal is minimized. There are twenty node points that define the curve. The similarity between two curves is measured by computing a normalized cross correlation coefficient:

$$\langle \omega\tilde{u}_1, \tilde{u}_2 \rangle = \frac{\langle \omega\tilde{u}_1 \cdot \tilde{u}_2 \rangle}{\sigma(\omega\tilde{u}_1)\,\sigma(\tilde{u}_2)} \tag{2.39}$$

$$\langle \omega\tilde{u}_1 \cdot \tilde{u}_2 \rangle = \frac{1}{N}\sum_{x=1}^{N} \omega(x)\,\tilde{u}_1(x)\,\tilde{u}_2(x) \tag{2.40}$$
where ũ_1 and ũ_2 are zero mean signals. The inner product is represented by ⟨·⟩, σ is the standard deviation and N the number of points on the curve for which the correlation coefficient is computed. A weighting value is defined by

$$\omega(x) = 1 - \varphi_{s(k)}(x - c) - \varphi_{s(k-1)}(x - c) \tag{2.41}$$
with c denoting an iris location that has been detected and NN(c) denoting the neighbourhood of that location. The problem of head tilt is accommodated in the system by shifting the signal's pixels left or right and comparing it to stored patterns. The best match under a 1-NN (nearest neighbour) rule classifies the iris pattern. The system performs much better than those of Wildes, Boles and Sanchez-Avila [115, 20, 23].

Emergent frequency and instantaneous phase are discussed in [24] for iris code generation. These features are derived from the original signal and its Hilbert transform. Given any real signal x(t), we can construct the analytic signal Z_x(t) as follows:

$$Z_x(t) = x(t) + j\,H(x(t)) \tag{2.42}$$
where H(x(t)) is the Hilbert transform of x(t). The instantaneous phase is

$$\varphi_i(\rho, \theta) = \arctan\frac{\mathrm{Im}(Z_i(\rho, \theta))}{\mathrm{Re}(Z_i(\rho, \theta))} \tag{2.43}$$

with ρ and θ describing the polar coordinate system of the normalized iris as described by Daugman. The iris region is passed through a filterbank of 3 channels to isolate dominant components. These components are FM demodulated to identify the emergent frequencies a_i, where

$$a_i = \arccos\left[\frac{z_i(\rho, \theta + 1) + z_i(\rho, \theta - 1)}{2 z(\rho, \theta)}\right] \tag{2.44}$$
The emergent frequencies and instantaneous phase are thresholded to generate the iris code. Similar to other systems, iris rotation is accommodated by shifting the signal left or right and selecting the best match. Once again, a Hamming distance function computes a match ratio between two iris codes, with a threshold of 0.5 demarcating imposters from authentic persons. Their experiments show a FRR of 11%.

Multi-scale zero crossings are discussed in [23]. There are two versions of the algorithm presented which differ in the way the iris signal is represented. One representation uses the grey level values located on the contour of a virtual circle of radius r positioned at the center of the pupil. These grey values constitute a 1D signal. The values are selected at angular increments of 2π/L_s, where L_s is the length of the signal. This signal can be represented as

$$I_S = I_E(x_c + r\cos\theta,\; y_c + r\sin\theta) \tag{2.45}$$

where

$$2n\frac{\pi}{L_s} \leq \theta \leq 2(n + 1)\frac{\pi}{L_s}, \quad n \in \mathbb{N} \cup \{0\} \tag{2.46}$$
and r is a predefined radius with (x_c, y_c) being the center of the pupil in the iris image I_E. The second representation considers grey values on each virtual circle centered at (x_c, y_c) such that r_i ≤ r ≤ r_e, with increments of 2π/L_s and r_i and r_e specified. The iris signal is constructed as follows:

$$I_S = \frac{1}{r_e - r_i + 1}\sum_{r=r_i}^{r_e} I_E(x_c + r\cos\theta,\; y_c + r\sin\theta) \tag{2.47}$$
with θ defined as in the first representation. Feature extraction is accomplished using a multiscale zero crossing representation. Let f(x) ∈ L²(R) and {W_{2^j} f(x)} with j ∈ Z be its dyadic wavelet transform. Given a pair of consecutive zero crossings of W_{2^j} f with abscissae z_{n−1} and z_n, the integral

$$e_n = \int_{z_{n-1}}^{z_n} W_{2^j} f(x)\, dx \tag{2.48}$$

is computed. For any W_{2^j} f, the z_n (n ∈ Z) can be represented by a piecewise constant function

$$Z_{2^j} f(x) = \frac{e_n}{z_n - z_{n-1}}, \quad x \in [z_{n-1}, z_n] \tag{2.49}$$
A stable representation is computed by determining the zero crossing representation of the dyadic wavelet transform of the iris signature (scales 2^j, j ∈ Z, for the multiscale representation). If the finest scale is 1 and the largest is 2^J, the discrete zero crossing representation of I_S is

$$\{Z_{2^j}(I_S) \mid 1 \leq j \leq J\} \tag{2.50}$$
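A minimal sketch of the zero-crossing representation of Equations 2.48-2.49 for a single wavelet band, placing each crossing by linear interpolation between samples of opposite sign (as the following text describes); the trapezoidal integration is our own approximation.

```python
import numpy as np

def zero_crossing_representation(w):
    """Zero-crossing positions and the piecewise-constant values e_n / (z_n - z_{n-1})
    for one dyadic wavelet band w (Eqs. 2.48-2.49)."""
    idx = np.where(np.sign(w[:-1]) * np.sign(w[1:]) < 0)[0]
    # Linear interpolation between the two samples of opposite sign.
    crossings = idx + w[idx] / (w[idx] - w[idx + 1])
    values = []
    for z0, z1 in zip(crossings[:-1], crossings[1:]):
        lo, hi = int(np.ceil(z0)), int(np.floor(z1))
        e_n = np.trapz(w[lo:hi + 1])        # approximate integral of Eq. 2.48
        values.append(e_n / (z1 - z0))
    return crossings, np.array(values)
```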
Zero crossings of the wavelet transform are estimated from a change in sign of the samples. This is done by approximating the position with a linear interpolation between 2 samples of different sign. Three similarity measures - the Euclidean distance, normalized Hamming distance and the dissimilarity measure of Equation 2.25 - are compared. In addition, Daugman's Gabor method for feature extraction is compared to the multiscale zero-crossing method. The Hamming distance provided the best metric for similarity while the multiscale zero-crossing method performed better (99.6% classification accuracy) than the Gabor extraction (98.3% classification accuracy).

In [1], an indirect identification method using the human iris is investigated. A fundus camera acquires the iris image and a suitable sample is binarized and used for an optical Fourier analysis. The light intensity distribution of the optical Fourier spectrum is scanned and digitally stored as a feature image. Comparison of images is performed using simple correlation functions.

Zorski et al. [113] discuss using the Hough transform to search for unique patterns in the iris.
2.4 Current iris segmentation algorithms
In the previous section, we discussed in detail the many iris recognition algorithms in the literature. Included in this were the techniques used for segmenting iris images, with the two important approaches being integro-differential operators and the Hough transform or its variants. These techniques are successful but do not take into account corruption of texture regions by eyelashes and specular reflection. The assumption relied on is that image pixels located in a square or semi-circular region to the left and right of the pupil are uncorrupted. In the literature, two approaches are documented that account for poor or corrupted iris texture. Eyelash and reflection detection has been proposed in [67]. It divides the eyelash problem into two possibilities - separable eyelashes and multiple eyelashes. Separable eyelashes are treated as edges. The image f (x, y) is convolved with a Gabor filter and thresholded to segment the eyelashes. The Gabor function is defined as
G(x, u, \sigma) = \exp\left\{-\frac{x^2}{2\sigma^2}\right\} \cos(2\pi u x)    (2.51)

where u is the frequency of the sinusoidal wave and σ is the standard deviation of the Gaussian. If the resultant value of a point falls below a threshold, it belongs to an eyelash:

f(x, y) * G(x, u, \sigma) < K_1    (2.52)
where K_1 is a pre-determined threshold and * is the convolution operator. Multiple eyelashes are modelled using an intensity variation model - eyelashes overlapping in a small area have a low intensity variation. If the variance of intensity in the area is below a threshold, the center of the window is labelled an eyelash pixel. This can be stated as

\frac{1}{(2N+1)^2} \sum_{i=-N}^{N} \sum_{j=-N}^{N} \left( f(x+i, y+j) - M \right)^2 < K_2    (2.53)
where M is the mean intensity in the window, (2N + 1)^2 is the window size and K_2 is a threshold. A connected components algorithm is also implemented to avoid misclassification of pixels. K_1 and K_2 are empirically determined parameters. Their reflection model defines two types of reflections - strong and weak. A pixel of strong reflection has an intensity value greater than a specified threshold. A pixel of weak reflection is a transition region between strong reflection and the iris.
Strong reflection is recognized by an inequality, f(x, y) > K_3, where K_3 = 180. Weak reflections are detected using a statistical procedure. They show that the intensity of iris pixels follows a normal distribution with mean µ and standard deviation σ. The test is based on the equation

\mu + \alpha\sigma < f(x, y)    (2.54)
The parameter α controls the false acceptance and false rejection rates. They use an iterative approach to estimate µ and σ and, hence, locate weak reflection pixels. The method described above is effective. However, it uses thresholding for determining eyelash pixels and the threshold is not automatically determined. If the region being thresholded does not have two distinct grey level distributions - corresponding to iris and eyelash pixels - the algorithm may fail. Incorrect threshold selection will also affect the result negatively. The frequency distribution of iris images has been analyzed to determine occlusions by eyelids and eyelashes [73]. Frequencies in the Fourier domain provide an insight into the content of the iris region - frequencies outside a specified range signify occlusion by eyelids and/or eyelashes. This enables the system to accept or reject an image for processing. Although effective, the technique does not provide a solution to removing the useless regions.
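To make the eyelash and reflection tests of Equations 2.53 and 2.54 concrete, here is a minimal NumPy sketch of the variance-based multiple-eyelash test and the reflection thresholds. The window size, the values of K2 and α, and the use of the global image mean and standard deviation (instead of the iterative estimation of [67]) are assumptions for illustration only.

```python
import numpy as np

def eyelash_and_reflection_masks(image, N=2, K2=25.0, K3=180, alpha=2.5):
    """Label multiple-eyelash pixels (low local variance, Eq. 2.53) and
    strong/weak reflection pixels (Eq. 2.54) in a grey level image."""
    H, W = image.shape
    eyelash = np.zeros((H, W), dtype=bool)
    for y in range(N, H - N):
        for x in range(N, W - N):
            window = image[y - N:y + N + 1, x - N:x + N + 1]
            if window.var() < K2:                  # low intensity variation (Eq. 2.53)
                eyelash[y, x] = True
    strong = image > K3                            # strong reflection: f(x, y) > K3
    mu, sigma = image.mean(), image.std()          # crude estimates of mu and sigma
    weak = (image > mu + alpha * sigma) & ~strong  # weak reflection: mu + alpha*sigma < f(x, y)
    return eyelash, strong, weak

img = np.random.randint(0, 256, (64, 64)).astype(float)
lash_mask, strong_mask, weak_mask = eyelash_and_reflection_masks(img)
```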
2.5  Analysis of iris segmentation
Segmentation of an iris image is a classical image processing problem. The following occurrences are possible in the acquired iris image:
• Bright lighting can cause specular reflection off the eye, which makes the processing stage almost impossible at times.
• Poor lighting can hide the textural details and introduce an uneven illumination component.
• Atmospheric conditions and human emotion affect the state and size of the pupil. This causes the iris region to vary in size.
• The iris may be partially hidden. This can be caused by eyelashes, eyelids, contact lenses and glasses.
Figure 2.5: An iris image - 1) upper eyelid 2) eyelashes 3) uneven illumination 4) specular reflection 5) lower eyelid
Figure 2.5 shows some of the artifacts that manifest in a typical input image. These artifacts are noise components in the iris texture signal. They must be segmented from the image. The core focus of this research is to improve the iris segmentation process by considering the above problems as a normal occurrence in an iris image. As such we make no assumptions about the state and nature of the input data except that it may be corrupted. The iris region we wish to extract has texture properties that are different from those of pixels of eyelashes, reflection, pupil and eyelids. This provides the basis for a texture feature extraction and pattern classification approach for segmenting the different components from the iris image. In a machine vision framework, the following steps must be performed:
1. Image preprocessing: The input image is enhanced to improve edge contrast for finding the boundary contours. Noise elements such as tiny spurious points should be filtered out. The iris boundaries are located and the area of interest extracted.
2. Feature extraction: Texture properties are computed for each pixel. This will describe regions in the image.
3. Image segmentation: Pixels are grouped together to form regions with similar properties. These regions represent objects of interest in the image. This process is guided by the texture properties of the pixels.
4. Pattern (object) recognition: Regions are classified (recognized) and accepted or rejected for further processing. Hence, we can extract objects of our choice from the image. The region of interest is uncorrupted iris texture.
This process is reflected in Figure 2.6. The input box represents the iris image that will be processed. The output box represents the result of the algorithm - this is the extracted region of interest.
Figure 2.6: Machine vision framework for iris texture extraction (input → image preprocessing → feature extraction → image segmentation → pattern recognition → output)
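The four stages above can be read as a processing pipeline. The skeleton below is only an organisational sketch of that flow; the function names and their bodies are placeholders and not the algorithms developed later in this work.

```python
def preprocess(image):
    """Enhance contrast, filter spurious points, locate the iris boundaries."""
    return image  # placeholder

def extract_features(image):
    """Compute a texture feature vector for each pixel (Chapter 3)."""
    return [[0.0] for _ in image]  # placeholder feature map

def segment(feature_map):
    """Group pixels with similar texture properties into regions."""
    return []  # placeholder list of regions

def recognize(regions):
    """Accept regions of uncorrupted iris texture, reject the rest."""
    return list(regions)  # placeholder

def iris_texture_extraction(image):
    # Input -> preprocessing -> features -> segmentation -> recognition -> output
    prepared = preprocess(image)
    features = extract_features(prepared)
    regions = segment(features)
    return recognize(regions)
```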
Chapter 3
Feature extraction

3.1  Introduction
A digital image contains information about a scene that has an object or objects. An important part of machine vision is to recognize individual image regions that correspond to objects of interest. An object in the image consists of a set of pixels and we will assume that each pixel in this set has an associated pattern. A pattern is a set of descriptors that identifies the pixel. Hence, we can also associate a pattern with the image region for its description and recognition. We call the set of patterns features, feature vector or feature set. Features for a pixel are a set of numerical descriptors computed during the process of feature extraction [7]. In this chapter, we discuss methods for computing features that describe image texture. By extracting texture properties for each pixel we obtain descriptions that can be grouped together to form homogeneous regions. This will enable us to discard or accept a region for our specific needs. Machines only have a notion of numbers - at their lowest level these are bits. We attach semantics to these numbers to give them meaning for describing image texture regions. We must also remember that what is perceptible to man may not be perceptible to a machine and vice versa. We first discuss image formulation and how an image is represented by pixels. Thereafter, we provide some definitions of texture and then discuss various feature extraction techniques for computing texture properties of image pixels.
3.2  Image formulation
Our visual response to the world around us is stimulated by electromagnetic (EM) radiation called light. Light is the visible part of the spectrum that lies in the range 350nm to 780nm. Colour is ascribed to the frequency of the light waves. Some types
of perceived information can be brightness, edges, texture or a combination of these. The physical objects that make up a scene have different surface properties with varying degrees of optical reflectivity and absorbability. The visual information we perceive is determined by the content of the light reflected off the object's surface. Light frequencies reaching the eye represent the 3D perspective world and are transformed by the eye into a 2D intensity representation on the retina. An intensity value on a 2D image is visual information about the scene that encapsulates and integrates the entire optical and image formation process. Although humans are limited to the visual band of the EM spectrum, imaging sensors cover almost the entire EM range of frequencies. These sensors acquire images not visible to the human eye. Common examples include x-ray, gamma rays, radar and UV band imaging. We shall denote a continuous 2D image using the functional form f(x, y) where (x, y) are coordinates in the plane [53]. The value of f at (x, y) is a positive scalar quantity that has a physical meaning determined by the image source. We can represent f in the form of two components:
1. Illumination component i(x, y) - this is the amount of source illumination incident on the scene. It is dependent on the source illumination.
2. Reflectance component r(x, y) - the amount of illumination reflected by objects in the scene. This is dependent on the objects in the scene.
These two components are combined as follows:

f(x, y) = i(x, y)\, r(x, y)    (3.1)

where 0 < i(x, y) < ∞ and 0 < r(x, y) < 1. A 2D digital image is a finite array of discrete values associated with the continuous case. It is captured by an imaging sensor that samples the scene and generates a grid of pixels (picture elements). The sensor produces an output that is usually a continuous voltage relative to the illumination and reflectance in the scene. This waveform
is sampled and then quantized to create a digital image. Sampling entails digitizing the co-ordinate values while quantization digitizes the amplitudes. Spatial resolution of the image is determined by the sampling steps - small steps provide better object detail and resolution than larger steps. Consider Figures 3.1 and 3.2. Figure 3.1 shows a simple image with a gradient transition from left to right (or vice versa). It changes from light to dark and then light again. Consider a line passing through the image from A to B. The amplitudes from A to B are shown in Figure 3.2, where amplitude increases from dark to light. To represent the path from A to B as a line of pixels in a digital image, we must first sample from A to B to digitize the spatial coordinates, then quantize the amplitude at each sampled coordinate. The above process produces a matrix of real and finite numbers where sampling on f(x, y) gives (x, y) discrete values. We define f on the domain R such that R = {(x, y), 0 ≤ x ≤ N − 1, 0 ≤ y ≤ M − 1} and
f(x, y) =
\begin{pmatrix}
f(0, 0) & f(1, 0) & \cdots & f(N-1, 0) \\
f(0, 1) & f(1, 1) & \cdots & f(N-1, 1) \\
\vdots & \vdots & & \vdots \\
f(0, M-1) & f(1, M-1) & \cdots & f(N-1, M-1)
\end{pmatrix}
where M is the total number of rows and N the total number of columns. Each element in the above matrix is a pixel. An n-band image will have n such matrices. For simplicity, we assume that we are working with a single band image.

3.3  Texture in digital images
In real world problems, images do not demonstrate uniform intensities - they contain variations in tonal content. These variations constitute the textures in the image and are treated as parameters to be estimated during the processing stage. These parameters give texture primitives varying degrees of fineness, coarseness and periodicity of patterns. It is not easy to define primitives for texture since they can manifest at multiple scales. We also have tone and structure in texture.
Figure 3.1: An example of a simple scene

Figure 3.2: A scan line from A to B in the image shown in Figure 3.1
Tone is the intensity property of the primitive while structure is the spatial relationship between primitives. We can describe tone using min, max and average intensity. Primitives can be random or have a mutual dependence. The same number and type of primitive does not necessarily create similar textures. Similarly, the same spatial relations do not imply texture uniqueness. We must view tone and spatial relations as a dependency that gives texture uniqueness. Small primitives give fine texture; large ones make the texture coarse. Texture can also be described in terms of texture strength.
Figure 3.3: Examples of textures in an iris image

Figure 3.4: Visual texture patterns

Figure 3.5: Texture primitives
Strong texture has well defined primitives with a regular structure - elements and spatial relations are easily determined. Weak textures have primitives and spatial relations that are difficult to define - they are referred to as random [81]. There is no universal definition of texture since definitions are related to the method of analysis [14]. It can be regarded as a grouping of similarity in the image [2]; a repetition of basic structural patterns [7]; or as intensity variations that follow a particular periodicity [28]. These groupings, repetitive patterns or periodic variations are called texture primitives or texture elements (texels). A texture primitive or texel is the building block that makes a texture. Figure 3.3 shows the possible textures in an iris image. A texel is a set of contiguous pixels. A grouping of texels allows us to perceive characteristics such as tonal periodicity, orientation, structural shapes and primitive sizes. Figure 3.4 shows a grid of circles and we assume that a circle is a pixel with a
colour that is black or white. Figure 3.5 shows texture primitives that can be used to create the textures in Figure 3.4 by contiguously repeating their occurrence. It is obvious by looking at these texels that they have different sizes, tonal content and spatial arrangement that distinguish one texture from the other. In addition, Figure 3.4 (a) has a perceivable vertical orientation of elements that is not evident in its texel. This orientation can only be perceived by a machine if we process a set of pixels that show a vertical orientation of elements. From this simple study, we can deduce that texture characteristics are dependent on the primitives and their spatial arrangement. Real textures are more complex than the simple ones in the above example - they have a larger tonal content and more complicated primitives. However, they are all patterns that can be analyzed using texture analysis methods. We must also understand the concept of feature sets. Not all characteristics of textures will discriminate them effectively. In the above example, using grey level occurrences will not help recognize the textures since they both have the same grey level occurrences i.e. black and white. However, we can use directionality as a feature since one texture has vertical directionality while the other does not. Hence, a feature set must provide good separability of textures [99]. Our main aim in analyzing texture is to classify or segment it. We require precise numerical properties to make it machine recognizable. These properties are method dependent. Texture analysis [14] in machine vision is time and space intensive and draws ideas from a number of scientific disciplines, each with its own philosophical and theoretical framework. It is also sensitive to scale, resolution and orientation. The principal approaches are statistical, structural, spectral (signal processing) and model-based techniques [28]. They differ in the context in which image pixels are viewed.
3.4  Applications of texture analysis
Texture analysis methods can be used for classification and segmentation. In texture classification, a texture image or region must be identified from a given set of classes. In segmentation problems, texture boundaries must be identified so that regions can be separated. Every pixel is usually classified as belonging to a particular region class.
Figure 3.6: Texture calculations using a window

3.5  Texture analysis methods
Texture feature extraction is the process of computing image texture properties using estimated parameters that will assist in the discrimination of pixels for segmentation and classification purposes. We regard feature extraction as the process of parameter estimation and computation of feature sets using these parameters. A desired goal of the system that we are designing is to make it as automatic as possible by minimizing the number of tunable parameters. This improves the generalization of the problem. When we view textures, the human visual system can recognize the different patterns that discriminate and separate them. How can a machine perceive these different textures and separate them? In designing a feature extraction process, we must also take into account the fact that the technique implemented might work on a subset of textures rather than the entire set. Feature sets must be established for describing objects so that the separability and discrimination of these objects is (almost) optimal. Characteristics that describe a texture region must be computed using a neighbourhood operator - a function or method that takes its surrounding pixels into consideration to facilitate the process of meaningful data extraction. This is based on our definition of texels as sets of contiguous pixels. A solitary pixel does not provide sufficient information. To do this a window is centered at the pixel we want to compute features for. All the pixels within this window are then analyzed using any one of the techniques that will be discussed in the following subsections. Figure 3.6 shows an image with a window centered at a pixel for which we compute a set of texture features.
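The window-based scheme described above can be sketched directly: slide a window over the image and apply a pluggable feature function to the neighbourhood of each pixel. This is a minimal NumPy sketch; the square window, replication padding at the borders, and the placeholder feature function (mean and standard deviation) are assumptions standing in for the techniques discussed in the following subsections.

```python
import numpy as np

def window_feature_map(image, window=7, feature_fn=None):
    """Compute a feature vector for every pixel from its surrounding window."""
    if feature_fn is None:
        feature_fn = lambda w: (w.mean(), w.std())   # placeholder features
    half = window // 2
    padded = np.pad(image, half, mode='edge')        # replicate border pixels
    H, W = image.shape
    n_feats = len(feature_fn(padded[:window, :window]))
    features = np.zeros((H, W, n_feats))
    for y in range(H):
        for x in range(W):
            neighbourhood = padded[y:y + window, x:x + window]
            features[y, x] = feature_fn(neighbourhood)
    return features

img = np.random.randint(0, 256, (32, 32)).astype(float)
fmap = window_feature_map(img, window=7)             # shape (32, 32, 2)
```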
Figure 3.7: Test images (from top, left to right): EYELASH1, EYELASH2, IRIS1, IRIS2 and IRIS3

Texture analysis methods can be divided into the following four general groups:
1. Statistical texture analysis.
2. Structural and geometrical texture analysis.
3. Spectral and signal processing approaches.
4. Model-based approaches.
For purposes of discussion in some of the following chapters, we introduce five test images, shown in Figure 3.7. We denote them EYELASH1, EYELASH2, IRIS1, IRIS2 and IRIS3.
3.5.1  Statistical texture analysis
The spatial distribution of intensity in an image can be treated as a set of random variables or occurrences. This enables a statistical interpretation of texture for pattern recognition.

Grey level co-occurrence matrix features
Discriminating features for texture separation can be computed using the statistical approach of grey level co-occurrence matrices (GLCM) [56, 98]. Texels (texture elements) repeat themselves in an image area to produce a region of texture. These primitives contain a configuration of pixels that gives the texture its defining characteristics. A GLCM is a matrix of second order statistics that represents pixel configurations as probabilities of pairwise grey level occurrences. These pairwise occurrences must satisfy a particular relationship in order to contribute to the probability matrix. The pixel-pair relationship denotes a spatial dependency for a particular texture. These dependencies are observed in the GLCM, from which a number of features can be computed. This approach has been found to be very popular and effective [61, 78, 88, 101]. Let us suppose that our image region of analysis is an M × N grid with M rows and N columns. The grey scale pixel intensity at cell index (x, y), where y ∈ {0, 1, ..., M − 1} and x ∈ {0, 1, ..., N − 1}, is quantized to Ng levels. The grey level co-occurrence of grey tone i and grey tone j in an image, occurring in two neighbouring cells separated by distance d, can be specified in a matrix P_{ij} of relative frequencies where 0 ≤ i, j ≤ Ng − 1. There also exists an angular relationship between the neighbouring cells. Let us call the resolution cell with grey tone i the reference pixel and the resolution cell with grey tone j the neighbour pixel. If the reference and neighbour pixels have an angular relationship specified by θ and are separated by distance d, then the neighbour pixel occurs in the direction θ from the reference pixel at a separation of distance d. The matrix P is symmetric i.e. P(i, j, d, θ) = P(j, i, d, θ) and a function of i, j, θ and d. For most image processing problems, θ is quantized to 45 degree intervals where θ ∈ {0°, 45°, 90°, 135°}. Figure 3.8 shows pixel spatial relations for 45 degree intervals.
Matrices for θ ∈ {180°, 225°, 270°, 315°} do not contain any additional information and do not need to be computed. Let P^T(i, j, d, θ) denote the transpose of P(i, j, d, θ). Then

P(i, j, d, 0°) = P^T(i, j, d, 180°)    (3.2)
P(i, j, d, 45°) = P^T(i, j, d, 225°)    (3.3)
P(i, j, d, 90°) = P^T(i, j, d, 270°)    (3.4)
P(i, j, d, 135°) = P^T(i, j, d, 315°)    (3.5)
When θ is quantized to 45 degree intervals, unnormalized frequencies are defined by

P(i, j, d, 0°) = \#\{((k, l), (m, n)) \in (L_y \times L_x) \times (L_y \times L_x) \mid k - m = 0, \; |l - n| = d, \; I(k, l) = i, \; I(m, n) = j\}    (3.6)

P(i, j, d, 45°) = \#\{((k, l), (m, n)) \in (L_y \times L_x) \times (L_y \times L_x) \mid (k - m = d, l - n = -d) \text{ or } (k - m = -d, l - n = d), \; I(k, l) = i, \; I(m, n) = j\}    (3.7)

P(i, j, d, 90°) = \#\{((k, l), (m, n)) \in (L_y \times L_x) \times (L_y \times L_x) \mid |k - m| = d, \; l - n = 0, \; I(k, l) = i, \; I(m, n) = j\}    (3.8)

P(i, j, d, 135°) = \#\{((k, l), (m, n)) \in (L_y \times L_x) \times (L_y \times L_x) \mid (k - m = d, l - n = d) \text{ or } (k - m = -d, l - n = -d), \; I(k, l) = i, \; I(m, n) = j\}    (3.9)

where # denotes the number of elements in the set [98]. To compute features, the GLCM frequencies must be normalized. This is done by dividing each entry by the sum of all the entries in the matrix. Figure 3.9 shows a GLCM computed for the example image using Equations 3.6 - 3.9.
Figure 3.8: Pixel relations (reference pixel and its neighbours at 0°, 45°, 90° and 135° for distance d)
Figure 3.9: Computing an un-normalized GLCM (a small example image with grey levels 0-3 and its GLCM for d = 1 and θ = 0°)
In order to compute a GLCM describing a pixel relation specified by d and θ, we must look at all pixel pairs satisfying this relationship. Then we increment entries (i, j) and (j, i) (for a symmetrical GLCM where θ ∈ {0°, 45°, 90°, 135°}), where i and j are the grey level pair of the pixel pair. Note that the GLCM is an Ng × Ng matrix - it is dependent on the number of grey levels in the image. Practically, this poses a problem due to the large memory requirements. In addition, the matrix may be sparse and provide poor statistics for feature extraction. The occupancy of matrix entries can be improved by reducing the number of grey levels in the image. Serious consideration must be taken in this aspect so that textural details are not destroyed. Equal-probability quantization is discussed in [98] for reducing the number of grey levels. We present here a few common features computed from GLCMs. The interested reader can refer to [14, 98] for a broader perspective.
1. Angular Second Moment (ASM):

ASM = \sum_{i,j=0}^{N_g - 1} p_{\theta,d}^2(i, j)    (3.10)

ASM is a measure of the homogeneity in an image. The sum of squares for a homogeneous region is high because there are few entries in the GLCM and these entries have high values. From ASM we can compute the energy (ENER):

ENER = \sqrt{ASM}    (3.11)

2. Contrast (CONT):

CONT = \sum_{i,j=0}^{N_g - 1} (i - j)^2 \, p_{\theta,d}(i, j)    (3.12)

Contrast measures the local intensity variation in an image region. The weighting increases quadratically as the difference between i and j increases. This measure increases with an increase in contrast. Greater contrast is favoured in this equation i.e. provides a larger numerical value.

3. Entropy (ENTR):

ENTR = -\sum_{i,j=0}^{N_g - 1} p_{\theta,d}(i, j) \log(p_{\theta,d}(i, j))    (3.13)

Entropy measures the regularity or orderliness in an image. A regular (homogeneous) scene has low entropy while an irregular scene has high entropy. It is assumed that log(0) = 0. The negative sign makes the result positive since log(x), where 0 ≤ x ≤ 1, is always less than or equal to zero. The smaller the value of p_{θ,d}(i, j) the larger the absolute value of log(p_{θ,d}(i, j)).

4. Local homogeneity (LOHO):

LOHO = \sum_{i,j=0}^{N_g - 1} \frac{1}{1 + (i - j)^2} \, p_{\theta,d}(i, j)    (3.14)

Local homogeneity is also referred to as inverse difference moment. It has large values for low contrast (homogeneous) areas. The weighting factor 1/(1 + (i − j)^2) provides small contributions from inhomogeneous regions.
5. Correlation (CORR):

CORR = \frac{\sum_{i,j=0}^{N_g - 1} (ij) \, p_{\theta,d}(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y}    (3.15)

Correlation measures the grey level linear dependence between the pixels at the specified positions relative to each other. The properties µx, µy, σx and σy are descriptive statistics of the GLCMs. They can also be used as features.

1. GLCM means (µx, µy):

p_x(i) = \sum_{j=0}^{N_g - 1} p_{\theta,d}(i, j)    (3.16)

p_y(j) = \sum_{i=0}^{N_g - 1} p_{\theta,d}(i, j)    (3.17)

\mu_x = \sum_{i,j=0}^{N_g - 1} i \, p_{\theta,d}(i, j)    (3.18)

\mu_y = \sum_{i,j=0}^{N_g - 1} j \, p_{\theta,d}(i, j)    (3.19)

The marginal properties across the rows and columns of the GLCM are described by px and py. The GLCM means are computed using the frequency of occurrence of grey level values. The grey level frequency is taken in conjunction with the combination of another grey level value. The values of µx and µy are equal if the GLCM is symmetrical. We compute a feature called MEAN:

MEAN = \mu_x = \mu_y    (3.20)

for symmetrical GLCMs.

2. Standard deviation (σx, σy):

\sigma_x = \sqrt{\sum_{i=0}^{N_g - 1} p_x(i)\,(i - \mu_x)^2}    (3.21)

\sigma_y = \sqrt{\sum_{j=0}^{N_g - 1} p_y(j)\,(j - \mu_y)^2}    (3.22)

Standard deviation relies on the GLCM mean, and the dispersion around the mean of grey values within the GLCM. It deals specifically with combinations of pairs of grey values and is not the same as the standard deviation of grey levels in the original image. The standard deviations (σx, σy) are equal if the GLCM is symmetric. We use

SDEV = \sigma_x = \sigma_y    (3.23)
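To tie the definitions above together, the following sketch builds a symmetrical, normalized GLCM for a given offset and derives ASM, contrast and entropy from it (Equations 3.10, 3.12 and 3.13). It is a plain NumPy sketch that assumes the image has already been quantized to Ng grey levels; it is not the implementation evaluated later in this work.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=8):
    """Symmetrical, normalized grey level co-occurrence matrix for offset (dx, dy)."""
    P = np.zeros((levels, levels), dtype=float)
    H, W = image.shape
    for y in range(H):
        for x in range(W):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W:
                i, j = image[y, x], image[ny, nx]
                P[i, j] += 1.0                      # increment entry (i, j) ...
                P[j, i] += 1.0                      # ... and (j, i) for symmetry
    return P / P.sum()                              # normalize to probabilities

def glcm_features(P):
    i, j = np.indices(P.shape)
    asm = np.sum(P ** 2)                            # Eq. 3.10
    contrast = np.sum((i - j) ** 2 * P)             # Eq. 3.12
    nonzero = P[P > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))    # Eq. 3.13, with log(0) taken as 0
    return asm, contrast, entropy

img = np.random.randint(0, 8, (32, 32))             # already quantized to 8 levels
P = glcm(img, dx=1, dy=0, levels=8)                 # d = 1, theta = 0 degrees
print(glcm_features(P))
```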
Zhang et al. [95] provide evidence that GLCM texture features perform differently for particular images. They also showed that combinations of three or four features were sufficient for good accuracy. In addition, they also showed that MEAN performed the best as a single feature or as a combination with another GLCM texture measure. Their work was consistent with previous works that divided GLCM features into 3 groups:
1. The contrast group - contrast, dissimilarity and homogeneity.
2. The orderliness group - energy and entropy.
3. Descriptive statistics - mean, standard deviation and correlation.
and suggested combining one feature from group 1, one from group 2 and one or two from group 3. Groups 1 and 2 have their members strongly correlated with the others in their group.

Sum and difference histograms
Haralick's texture features are computationally intensive. A method proposed by Unser [110] estimates coefficients using first-order statistics. Sum and difference histograms are used to generate texture features. Two image texture windows with centers at f(x, y) and f(x', y') are displaced by a distance d = (d_1, d_2) such that

(x', y') = (x + d_1, \; y + d_2)    (3.24)

The sum and difference grey levels at (x, y) are defined as

sum(x, y) = f(x, y) + f(x', y')    (3.25)

diff(x, y) = f(x, y) - f(x', y')    (3.26)
Using these definitions, histograms for sum and difference grey levels are computed for d in directions 0°, 45°, 90° and 135° and for different distances. For each histogram the mean, energy, contrast and entropy can be computed for texture features. This provides 32 features for a single distance. Grey levels in the image are quantized to 32 levels in order to make the histogram statistics more meaningful.

Run-length features
Galloway's [51] run-length technique computes texture properties from grey level run-lengths in different directions. Coarse textures have a large number of neighbouring pixels of the same grey level while fine textures have fewer. The length and directions of these tonal primitives enable a way to describe textural characteristics. A run length matrix is

P_\theta(g, r) = a_{g,r}    (3.27)

where a_{g,r} is the frequency of a connected pixel interval of run-length r and grey level g in the direction θ. The four directions 0°, 45°, 90° and 135° are used. To obtain high run length values and good statistical descriptions, grey levels are reduced. Features computed from P_θ are short and long run-length emphasis, grey level distribution, run-length distribution and run percentage.

The autocorrelation function
It is possible to measure the size of texture primitives and regularity in an image using the autocorrelation function [28]. It can be defined for an image f(x, y) of size M × N as

\rho(p, q) = \frac{\sum_{y=0}^{M-q-1} \sum_{x=0}^{N-p-1} f(x, y)\, f(x + p, y + q)}{\sum_{y=0}^{M-1} \sum_{x=0}^{N-1} f^2(x, y)}    (3.28)

for different values of p and q. For coarse textures the autocorrelation drops slowly with distance otherwise it drops rapidly. For regular primitives the function fluctuates, displaying peaks and valleys.
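A direct NumPy sketch of Equation 3.28 is given below; it evaluates the normalized autocorrelation for a single displacement (p, q). The function name and the example displacements are assumptions made for illustration.

```python
import numpy as np

def autocorrelation(image, p, q):
    """Normalized autocorrelation rho(p, q) of a grey level image (Eq. 3.28)."""
    f = np.asarray(image, dtype=float)
    M, N = f.shape
    numerator = np.sum(f[:M - q, :N - p] * f[q:, p:])   # f(x, y) * f(x + p, y + q)
    denominator = np.sum(f ** 2)
    return numerator / denominator

img = np.random.randint(0, 256, (64, 64))
profile = [autocorrelation(img, p, 0) for p in range(1, 6)]   # drop-off with distance
```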
Direct statistical features
The simplest descriptor for textures is the use of direct statistics and gradient information computed from the grey levels in the texture window [103]. Common statistical features include mean, median, variance and standard deviation. The gradient features include derivatives in the x and y directions. This technique is computationally efficient and will suffice for a basic interpretation of image pixels before more complex analysis.
3.5.2  Structural and geometrical texture analysis
This approach to texture analysis defines texture as being composed of basic texture elements. These primitives can be used to form more complex patterns.
Chen's geometric features
An effective method provided by Chen [118] generates binary patterns from a grey level image that are then processed for geometric properties. A stack of N different thresholded binary images is generated from the original image, which contains N grey levels, each time using a higher threshold. For each binary image the 1-valued pixels, and then the 0-valued pixels, are grouped into connected regions. Examining each connected component, an irregularity measure is computed and weighted in relation to the total size of connected components. For each binary image, 4 discriminatory properties are now available - the number of 1- and 0-connected regions and 2 irregularity measures for 1- and 0-connected regions. For the N binary images, 4 features are derived from each property - maximum value, average value, mean and standard deviation - yielding 16 texture features.
Local binary pattern operator
A new and very promising method is the use of local binary pattern operators (LBPO) [107, 108] for generating textural feature sets. Texture T is defined as the joint distribution {g_c, g_0, ..., g_{P−1}} of the grey levels of a centre pixel c and the P pixels in its local neighbourhood N, where P > 1. The neighbourhood is a circle of radius R, centered at pixel c, with the P pixels spaced out equally on its circumference. The signs of the differences between the center pixel and a neighbourhood pixel define the binary pattern generated.
The local binary pattern (LBP) for (P, R) at pixel c is

LBP_{P,R} = \begin{cases} \sum_{p=0}^{P-1} s(g_p - g_c) & \text{if } U(N(P, R)) \le 2 \\ P + 1 & \text{otherwise} \end{cases}    (3.29)

where

s(x) = \begin{cases} 1, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}    (3.30)

and

U(N(P, R)) = \sum_{i=1}^{P} | s(g_i - g_c) - s(g_{i-1} - g_c) |    (3.31)
with g_0 = g_P. U corresponds to the number of 0-1 changes between successive bits in the pattern. The LBP defines uniform patterns when U(N(P, R)) ≤ 2. It has invariance against monotonic grey scale transforms and rotation, and can detect patterns for any quantization of angular space in the circular neighbourhood and for any spatial resolution. Multiple values of P and R can be used to generate a texture description.
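A minimal sketch of the uniform LBP code of Equations 3.29 - 3.31 for a single pixel is given below. It assumes the P circular neighbours are sampled with nearest-neighbour rounding; the bilinear interpolation used in the original operator is omitted for brevity.

```python
import numpy as np

def s(x):
    return 1 if x >= 0 else 0                          # thresholding function, Eq. 3.30

def lbp_code(image, y, x, P=8, R=1.0):
    """Uniform rotation-invariant LBP of the pixel at (y, x), Eqs. 3.29 - 3.31."""
    gc = float(image[y, x])
    angles = 2.0 * np.pi * np.arange(P) / P
    ys = np.rint(y + R * np.sin(angles)).astype(int)   # nearest-neighbour sampling
    xs = np.rint(x + R * np.cos(angles)).astype(int)
    bits = [s(float(gp) - gc) for gp in image[ys, xs]]
    # U: number of 0-1 transitions in the circular pattern (g_P = g_0)
    U = sum(abs(bits[i] - bits[i - 1]) for i in range(P))
    return sum(bits) if U <= 2 else P + 1              # Eq. 3.29

img = np.random.randint(0, 256, (32, 32))
code = lbp_code(img, y=10, x=10, P=8, R=1.0)
```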
Voronoi tessellation and graph theoretic approaches
Tuceryan and Jain [82] discuss using Voronoi tessellation to extract texture tokens. Consider a set S of 3 or more tokens (points) in the Euclidean plane such that these points are not collinear and no 4 points are co-circular. For every point P in S, a Voronoi polygon is constructed - it is a polygon region of all points closer to P than any other point. The set of complete polygons in S is called the Voronoi diagram of S. The Voronoi tessellation of a plane is its Voronoi diagram and all incomplete polygons. The Voronoi polygon describes the neighbourhood of P. To extract tokens from grey level textures, a Laplacian of Gaussian filter is applied and the resulting image is thresholded to determine local maxima. Then connected components are extracted from the binary image to determine tokens. The Voronoi tessellation of the tokens is constructed and features are computed for each polygon - moments, magnitude and direction of the vector from the token to polygon centroid, polygon elongation and orientation of polygon major axis. The shapes of the Voronoi polygons describe the spatial distribution of texture elements.
The graph theoretic approach is also explored by Urquhart [97] and Ahuja [89]. Tomita et al. [49] propose a recursive primitive grouping scheme that uses histograms of primitives to establish texture features.
Syntactic patterns
Syntactic texture pattern description draws an analogy between texel spatial structure and the structure of a formal language. During training, a grammar is inferred from the training set which contains words of the language describing the texture class. A grammar is constructed for each texture class in the training set. During the recognition process, words are extracted from the texture and the grammar that can complete the syntactic analysis of the description word determines the texture class. A strict grammar or placement rule cannot describe real textures since they are irregular, distorted and have structural variations - variable rules or non-deterministic grammars must be used. Zucker [105] regards real world textures as distortions of ideal and perfect ones. A placement rule is defined for the ideal texture by a graph that is similar in form to a semi-regular or regular tessellation. Fu [71] uses syntactic grammar and placement rules to generate texture models for pattern recognition.
3.5.3  Spectral and signal processing approaches
A frequency analysis of images provides suitable features for texture discrimination. The ability to filter an image into different frequency bands yields key signal variations for establishing features.
Spatial filters
Spatial domain filters, such as the Prewitt, Sobel, Roberts and Laplacian [53], are the simplest and most direct way to capture the basic pattern of a texture. They are edge detectors that locate abrupt changes in the intensity function of an image. An edge is a property of a pixel and is computed using a neighbourhood of the pixel. Operators that describe an edge are generally expressed as partial derivatives. The intensity change is a gradient that points in the direction of the largest growth of the
image function. The gradient of an image f(x, y) at (x, y) is the vector

\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (G_x, G_y)    (3.32)

The magnitude of this vector is

|\nabla f| = \sqrt{ \left( \frac{\partial f}{\partial x} \right)^2 + \left( \frac{\partial f}{\partial y} \right)^2 }    (3.33)

For digital images, it is more efficient to use

|\nabla f| \approx \left| \frac{\partial f}{\partial x} \right| + \left| \frac{\partial f}{\partial y} \right|    (3.34)

An important quantity is α(x, y), which is the direction angle of the vector ∇f. It can be computed using

\alpha(x, y) = \arctan\left( \frac{G_y}{G_x} \right)    (3.35)
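The gradient quantities of Equations 3.32 - 3.35 can be sketched with the Sobel masks of Figure 3.12. The hand-rolled convolution below is a minimal NumPy illustration; the window-wide edge statistics at the end (mean magnitude and edge frequency against an assumed threshold) are example features rather than a prescribed set.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def convolve3x3(image, kernel):
    """Valid 3x3 convolution (kernel flipped) of a grey level image."""
    k = np.flipud(np.fliplr(kernel))
    H, W = image.shape
    out = np.zeros((H - 2, W - 2))
    for y in range(H - 2):
        for x in range(W - 2):
            out[y, x] = np.sum(image[y:y + 3, x:x + 3] * k)
    return out

def gradient_features(image, edge_threshold=100.0):
    gx = convolve3x3(image, SOBEL_X)                # partial derivative in x (Gx)
    gy = convolve3x3(image, SOBEL_Y)                # partial derivative in y (Gy)
    magnitude = np.abs(gx) + np.abs(gy)             # Eq. 3.34 approximation
    direction = np.arctan2(gy, gx)                  # Eq. 3.35 (four-quadrant form)
    edge_frequency = np.mean(magnitude > edge_threshold)
    return magnitude.mean(), edge_frequency, direction

img = np.random.randint(0, 256, (32, 32)).astype(float)
mean_mag, edge_freq, _ = gradient_features(img)
```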
Figures 3.10 - 3.13 show some popular spatial filters for edge detection. The masks compute the partial derivatives, from which the gradient magnitude and orientation for each pixel can be derived. We can use the edge information to compute texture characteristics. Edge frequency within a window describes texture roughness - fine textures will have a high number of edges within the window while coarse ones will have fewer edges. Contrast is characterized by gradient magnitudes - high contrast areas will have high magnitudes. The randomness of a texture can be characterized by computing the entropy of the edge magnitude histogram. Other texture properties of edge maps and edge distributions are discussed in [49]. Figure 3.14 shows the output of some spatial filters after processing iris image regions.
Figure 3.10: Roberts operators

  -1   0        0  -1
   0   1        1   0

Figure 3.11: Prewitt operators

  -1  -1  -1      -1   0   1
   0   0   0      -1   0   1
   1   1   1      -1   0   1

Figure 3.12: Sobel operators

  -1  -2  -1      -1   0   1
   0   0   0      -2   0   2
   1   2   1      -1   0   1

Figure 3.13: Laplace operator

  -1  -1  -1
  -1   8  -1
  -1  -1  -1
Figure 3.14: Application of spatial filters - (a) IRIS3, (b) Sobel applied to IRIS3, (c) Laplace applied to IRIS3, (d) EYELASH2, (e) Sobel applied to EYELASH2, (f) Laplace applied to EYELASH2

Spatial moments
The spatial moments [109] of an image region can be used as texture features. The (p + q)th moment for a region R in image f is computed by

m_{pq} = \sum_{(x,y) \in R} x^p y^q f(x, y)    (3.36)
Spatial moments can, theoretically, uniquely describe an image region using all orders. If the moments are computed around each pixel in the image, it is equivalent to filtering by a combination of filter masks. The resulting filtered images are used as texture features. First order moments are related to features such as energy and contrast. Tuceryan [109] uses (p + q) ≤ 2 for texture description. Details for computing
moments using masks can be found in [28]. Moments invariant to scale, translation
and rotation are discussed in [53, 81].
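Equation 3.36 translates almost directly into code. The sketch below computes the moments with (p + q) ≤ 2 for a square window, following Tuceryan's suggested order limit; treating the whole window as the region R is an assumption made for illustration.

```python
import numpy as np

def spatial_moments(region, max_order=2):
    """Moments m_pq of Eq. 3.36 with p + q <= max_order for a grey level region."""
    region = np.asarray(region, dtype=float)
    ys, xs = np.indices(region.shape)               # pixel coordinates (y, x)
    moments = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            moments[(p, q)] = np.sum((xs ** p) * (ys ** q) * region)
    return moments

window = np.random.randint(0, 256, (9, 9))
m = spatial_moments(window)                          # keys (p, q): m[(0, 0)], m[(1, 0)], ...
```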
Laws' energy measures
Laws [76, 77] has presented a set of filtering coefficients for establishing textural energy measures. Three simple kernels are used to derive Laws' 5 × 5 masks - A3 = (1, 2, 1) represents averaging; S3 = (−1, 2, −1) corresponds to spots and E3 = (−1, 0, 1) is for locating edges. By convolving these three vectors with each other and themselves, five vectors result:

A5 = (1, 4, 6, 4, 1)    (3.37)
E5 = (−1, −2, 0, 2, 1)    (3.38)
S5 = (−1, 0, 2, 0, −1)    (3.39)
R5 = (1, −4, 6, −4, 1)    (3.40)
W5 = (−1, 2, 0, −2, −1)    (3.41)
These filters are able to assess average grey level (A5), edges (E5), spots (S5), ripples (R5) and waves (W5). The 5 × 5 masks for convolution are derived by mutually multiplying these vectors, treating the first term as a column vector and the second term as a row vector. A texture characteristic is calculated by convolving the image with the 5 × 5 masks and then computing an energy statistic on the result.
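As a sketch of the construction just described, the code below forms the 5 × 5 Laws masks as outer products of the vectors in Equations 3.37 - 3.41 and computes an energy statistic over the filtered window. The two-letter mask naming and the choice of mean absolute response as the energy statistic are assumptions for illustration.

```python
import numpy as np

VECTORS = {
    'A5': np.array([1, 4, 6, 4, 1], dtype=float),
    'E5': np.array([-1, -2, 0, 2, 1], dtype=float),
    'S5': np.array([-1, 0, 2, 0, -1], dtype=float),
    'R5': np.array([1, -4, 6, -4, 1], dtype=float),
    'W5': np.array([-1, 2, 0, -2, -1], dtype=float),
}

# 5x5 masks: first vector as a column, second as a row (outer product)
MASKS = {a + b: np.outer(VECTORS[a], VECTORS[b]) for a in VECTORS for b in VECTORS}

def laws_energy(image, mask):
    """Convolve (valid region) with a 5x5 Laws mask and return a mean-absolute energy."""
    flipped = np.flipud(np.fliplr(mask))
    H, W = image.shape
    responses = np.zeros((H - 4, W - 4))
    for y in range(H - 4):
        for x in range(W - 4):
            responses[y, x] = np.sum(image[y:y + 5, x:x + 5] * flipped)
    return np.mean(np.abs(responses))

img = np.random.randint(0, 256, (32, 32)).astype(float)
energy_E5E5 = laws_energy(img, MASKS['E5E5'])
```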
The Fourier transform
The human visual system is able to perform a multi-band analysis of texture images. It can decompose an image with particular selectivity for frequency and orientation [26]. The Fourier transform [53] of an image reveals its frequency and orientation distribution. Consider an image f(x, y) and its Fourier transform F(u, v). |F(u, v)|^2 is called the power spectrum and |·| is the modulus of a complex number. It is used for computing texture features by evaluating the following integrals:

f_{r_1, r_2} = \int_{0}^{2\pi} \int_{r_1}^{r_2} |F(u, v)|^2 \, dr \, d\theta    (3.42)

and

f_{\theta_1, \theta_2} = \int_{\theta_1}^{\theta_2} \int_{0}^{\infty} |F(u, v)|^2 \, dr \, d\theta    (3.43)

where

r = \sqrt{u^2 + v^2} \quad \text{and} \quad \theta = \arctan\left(\frac{v}{u}\right)
Figure 3.15: Partitioning of the Fourier spectrum: a) ring filter, b) wedge filter

The above integrals are able to divide the spectrum into specific rings and wedges for specific band and orientation processing. The average value of the energy contained within these rings and wedges characterizes texture and must be computed as features for description of the texture. The energy content of the rings provides a roughness estimation of the texture. High energy in rings with a large radius signifies a fine texture (high spatial frequency) while high energy in rings with small radii signifies a coarse texture (lower spatial frequencies). The energy content within wedges gives an indication of texture directionality. This technique is referred to as multiresolution processing since different channels tuned to different frequencies are analyzed. Figure 3.15 shows an example of a ring and wedge in the Fourier domain. Examples of the Fourier power spectrum for some example images are shown in Figure 3.16.
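A minimal NumPy sketch of the ring and wedge energies of Equations 3.42 and 3.43 is shown below, with the discrete power spectrum standing in for the continuous integrals. Averaging the energy inside each ring or wedge, and the particular radius and angle bounds in the example, are assumptions for illustration.

```python
import numpy as np

def ring_wedge_energy(image, r1, r2, theta1, theta2):
    """Average power-spectrum energy inside a ring [r1, r2] and a wedge [theta1, theta2]."""
    F = np.fft.fftshift(np.fft.fft2(image))         # centre the zero frequency
    power = np.abs(F) ** 2                          # |F(u, v)|^2
    H, W = image.shape
    v, u = np.indices((H, W))
    u = u - W // 2
    v = v - H // 2
    r = np.sqrt(u ** 2 + v ** 2)                    # r = sqrt(u^2 + v^2)
    theta = np.arctan2(v, u)                        # theta = arctan(v / u)
    ring = (r >= r1) & (r <= r2)
    wedge = (theta >= theta1) & (theta <= theta2)
    return power[ring].mean(), power[wedge].mean()

img = np.random.rand(64, 64)
ring_e, wedge_e = ring_wedge_energy(img, r1=5, r2=15, theta1=0.0, theta2=np.pi / 4)
```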
The wavelet and multiresolution analysis
The Fourier transform provides the global frequency decomposition of the image signal. In certain problems, a localized function may be required for good time resolution for feature estimation. An important development in signal processing was that of the wavelet. Mallat [84] was the first to show that wavelets form a powerful basis for multiresolution theory. Multiresolution theory incorporates ideas from subband coding, quadrature filters and pyramidal image processing.
Figure 3.16: Fourier power spectrum - (a) IRIS3, (b) Fourier power spectrum of IRIS3, (c) EYELASH2, (d) Fourier power spectrum of EYELASH2

Physical structures in an image manifest at different scales. At a coarse resolution, the details in the image correspond to large structures. A coarse to fine approach is useful for image and signal analysis - this is the multiresolution strategy of signal processing. This is done by arranging the image information into a set of details appearing at different resolutions. The details of an image for a particular resolution are defined as the difference of information between its approximation at the current level rj and its approximation at the previous level rj−1. Multiresolution analysis analyzes a signal at different frequencies with different resolutions of information. The continuous wavelet transform is

W_\psi(s, \tau) = \int f(x) \, \psi_{s,\tau}(x) \, dx    (3.44)

where

\psi_{s,\tau}(x) = \frac{1}{\sqrt{s}} \, \psi\!\left(\frac{x - \tau}{s}\right)    (3.45)

s and τ are called scale and translation parameters respectively. s is used to expand or dilate the function while τ is used to move the function to different locations in space. This, effectively, provides a localized scale variant analysis. The wavelet function is of finite length (compactly supported). Scale in the wavelet transform is represented by s. High scales correspond to a non-detailed global view of the signal or image while low scales correspond to a detailed view. With respect to frequency, low frequencies (high scales) correspond to global information in the signal. High frequencies (low scales) correspond to detailed data of the pattern. Scaling dilates or compresses a signal - large scales correspond to dilated signals and small scales to compressed signals. If s > 1 then f(sx) is a compressed version of f(x). It is a dilated version if s < 1. To apply the wavelet transform to digital images and to implement it in a computer, a discrete wavelet transform (DWT) is required. The DWT uses filters of different cutoff frequencies to analyze a signal at multiple scales. The resolution is changed by the filtering operations and scaling is controlled by upsampling or downsampling the signal. When we upsample, we add new samples to a signal to increase its sampling rate. Downsampling refers to reducing the sampling rate by removing some of the samples. Let us denote our 1D signal by x(n). The DWT begins by filtering x(n) with a half-band low-pass filter l(n). This corresponds to convolving x(n) with the impulse response of l(n):

x(n) * l(n) = \sum_{k=-\infty}^{+\infty} x(k) \cdot l(n - k)    (3.46)
l(n) removes all frequencies above half of the highest frequency in the signal. The result of the convolution is then downsampled by removing every second sample - this is referred to as subsampling by a factor of 2. This reduces the number of points in the signal by 2 and doubles the scale. The above operation removes high frequency content. Half-band filtering removes half the frequencies, which is loss of half of the information. Therefore, we halve the resolution. We represent the entire process as

y(n) = \sum_{k=-\infty}^{+\infty} x(k) \cdot l(2n - k)    (3.47)
The full DWT separates a signal into different frequency components and can be computed using a cascade of half-band high- and low-pass filters. Outputs for the filters h (high pass) and l (low pass) are given by:

a_{j+1}(n) = \sum_{k=-\infty}^{+\infty} a_j(k) \cdot l(2n - k)    (3.48)

d_{j+1}(n) = \sum_{k=-\infty}^{+\infty} a_j(k) \cdot h(2n - k)    (3.49)
The a_j are the approximation coefficients used for the next scale of the transform while the d_j are the wavelet (detail) coefficients from the high-pass filter. The DWT works on signals that have length equal to a power of two. At scale j + 1 there are half the number of elements as scale j. The DWT can be performed until only 2 elements remain. We achieve scaling by subsampling the signal; multiresolution is achieved by the filtering operations. The detail coefficients will have high amplitudes for the prominent frequencies in that part of the signal that contains these frequencies. Non-prominent frequencies will have low amplitudes. The DWT is extended to a 2D digital image f(x, y) by filtering across the rows and then the columns of the result. This is followed by downsampling to achieve the effect of scaling. Figure 3.17 shows a single pass (iteration) of the DWT on an image. The symbol ↓ implies downsampling by a factor of 2 at each scale. The filtering and downsampling operations can be represented using 2 × 2 kernels. Convolving every 2 × 2 non-overlapping block of pixels produces a result equivalent to applying a set of filters across the rows and columns and then subsampling. Kernels for a generalized Haar transform are shown in Figure 3.18. A complete image decomposition process is shown in Figure 3.19. For completeness, we reproduce the Haar decomposition process (Chapter 2) in Figure 3.20. Features computed from the wavelet (d_j) and approximation (a_j) coefficients are:

1. Mean:

MEAN = \frac{1}{N^2} \sum_{i,j=1}^{N} p(i, j)    (3.50)

The mean of the wavelet coefficients is the same as the common statistical property of a data set. Data in most real world scenarios tend to cluster around the mean.
Figure 3.17: DWT tree
Figure 3.18: Haar kernel operators LL, LH, HL, HH

2. Absolute average deviation (AAD):

AAD = \frac{1}{N^2} \sum_{i,j=1}^{N} | p(i, j) - \mu |    (3.51)
AAD is similar to standard deviation. It measures the dispersion of the wavelet coefficients around the mean µ. 3. Wavelet energy: N 1 X 2 p (i, j) ENER = 2 N i,j=1
(3.52)
Wavelet coefficients are a measure of the intensity of the local fluctuations in a signal at a particular scale and resolution. A regular signal has coefficients with negligible values; signals with large variations have larger wavelet coefficients. High energies correspond to high frequency variations in the signal. This provides a measure of variations in texture patterns at different resolutions and bandwidths. where p(i) is a coefficient at level j and N is the number of coefficients at level j. Gabor filtering A popular method for texture feature extraction is the multichannel filtering approach using Gabor filters. This technique is able to exhibit some characteristics of the human
55
8 × 8 input image
8
8
3
4
6
5
7
8
8
8
3
2
3
4
6
5
8
8
8
8
3
1
2
4
8
8
8
8
2
5
4
3
8
1
2
2
1
5
4
5
7
2
2
1
3
1
2
1
4
5
3
1
1
2
3
2
1
2
3
4
3
3
2
2
?
Split image into 2 × 2 sub blocks and convolve each sub block with the kernel
8
8
3
4
6
5
7
8
8
8
3
2
3
4
6
5
8
8
8
8
3
1
2
4
8
8
8
8
2
5
4
3
8
1
2
2
1
5
4
5
7
2
2
1
3
1
2
1
4
5
3
1
1
2
3
2
1
2
3
4
3
3
2
2
?
Result of convolution: Each sub block has been filtered and downsampled
0
0
0
0
0
0
1
1
-12
-1
2
0
2
-1
1
-1
?
Combine the results to produce the 4 × 4 output matrix (wavelet coefficients). The output dimensions are half that of the input dimensions - this is the effect of downsampling by a factor of 2.
0
0
0
0
0
0
1
1
-12 -1
2
0
1
-1
2
-1
Figure 3.19: A single pass of a 2D DWT transform using the 2×2 HL Haar kernel
Figure 3.20: Haar decomposition process

Gabor filtering
A popular method for texture feature extraction is the multichannel filtering approach using Gabor filters. This technique is able to exhibit some characteristics of the human visual system. It uses a multiresolution system to extract information that describes different characteristics of an image. These characteristics, as a whole, provide a description of image texture. The visual cortex of a human has various cells performing different types of processing. They have spatial frequency and orientation selectivity when processing image signals. These cells can be viewed as mechanisms and detectors tuned to particular frequencies and orientations. Each such detector is a single channel for signal processing. A set of such mechanisms tuned to a number of different frequencies and orientations constitutes a multichannel filtering system that models the human visual system. The Gabor function can be implemented as a multichannel filter that mimics the human visual system. Two important aspects must be considered before designing a bank of Gabor filters. Firstly, the functional form of the set of filters must be specified. In addition, parameters must be carefully selected so that the filter is tuned to detect the different features and structures present in the image. Secondly, the filter outputs must undergo feature extraction in order to improve the feature set. In the spatial domain, the Gabor function is a Gaussian modulated sinusoid. The complex Gabor filter impulse response is

h(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left\{ -\frac{1}{2}\left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} \right) \right\} \exp\left( \frac{j 2\pi x}{\lambda} \right)    (3.53)
where j = \sqrt{-1}. In the spatial-frequency domain, its representation is

H(u, v) = \exp\left\{ -2\pi^2 \left[ \left(u - \frac{1}{\lambda}\right)^2 \sigma_x^2 + v^2 \sigma_y^2 \right] \right\}    (3.54)

The frequency of the sinusoids is 1/λ. The spread of the Gaussian in the x and y directions is controlled by σx and σy respectively. The frequency bandwidth of the filter is represented by Bf and the angular bandwidth by Bθ. The x and y coordinates can be rotated spatially by a value θ to produce a filter for different orientations:

x' = x \cos\theta + y \sin\theta    (3.55)
y' = -x \sin\theta + y \cos\theta    (3.56)

and substituting x' for x and y' for y. By variation of θ in Equations 3.55 and 3.56, the filter can be tuned for orientation selectivity. σx can be computed by setting the frequency cutoff to -6 dB and σy by setting the cutoff in the angular direction to -6 dB as suggested in the literature [34]:

\sigma_x = \frac{\sqrt{\ln 2}\,(2^{B_f} + 1)}{\sqrt{2}\,\frac{\pi}{\lambda}\,(2^{B_f} - 1)}    (3.57)

\sigma_y = \frac{\sqrt{\ln 2}}{\sqrt{2}\,\frac{\pi}{\lambda}\,\tan\!\left(\frac{B_\theta}{2}\right)}    (3.58)

Generally, Bf and Bθ are selected to match psycho-visual data. A common technique is to use the real component of the complex Gabor function for filtering:

g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left\{ -\frac{1}{2}\left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} \right) \right\} \cos\left( \frac{2\pi x}{\lambda} \right)    (3.59)
This function has two symmetrically located Gaussians in the spatial frequency domain. The spatial-frequency representation is

G(u, v) = \exp\left\{ -2\pi^2 \left[ \left(u - \frac{1}{\lambda}\right)^2 \sigma_x^2 + v^2 \sigma_y^2 \right] \right\} + \exp\left\{ -2\pi^2 \left[ \left(u + \frac{1}{\lambda}\right)^2 \sigma_x^2 + v^2 \sigma_y^2 \right] \right\}    (3.60)
Figure 3.21: Gabor surfaces with θ = 0, Bf = 1 and Bθ = 30 (left: symmetric surface, φ = 0; right: asymmetric surface, φ = π/2)
Jain [12] and Kruizinga [70] implement filters with the following form:
r(x, y) = \exp\left\{ -\frac{1}{2}\left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} \right) \right\} \cos\left( \frac{2\pi x}{\lambda} + \phi \right)    (3.61)

where φ is the phase offset. If φ ∈ {0, π} the filter is symmetric; if φ ∈ {π/2, −π/2} the
filter is antisymmetric. The parameters for the function are the same as discussed
above. Figures 3.21 - 3.23 show some surfaces, 2D plots and kernels of a Gabor function using Equations 3.57, 3.58 and 3.61. Let G(x, y) be a Gabor function, with form analogous to any of those discussed above. The response of G(x, y) to a continuous input image f (x, y) is computed by
Figure 3.22: Gabor 2D plots for λ = 2 and λ = 4 (cross section through y = 0 with φ = π/2, θ = 0, Bf = 1 and Bθ = 30)

Figure 3.23: Gabor kernels (clockwise from top left): 0°, 45°, 90° and 135° angles. Note the orientation of the "stripes"
evaluating

R = G(x, y) * f(x, y)    (3.62)
  = \iint_{\Omega} G(x, y) f(x, y) \, dx \, dy    (3.63)

where R is the filter output and Ω the image domain. The operator * denotes convolution. For a discrete image with M rows and N columns, the integral is evaluated using

R = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} G(i, j) f(i, j)    (3.64)
where G and f are discretized functions. The Gabor filter response R undergoes further processing for feature extraction. Gabor filters must be tuned to detect the features in an image. For this purpose, the parameters must be carefully selected for filter design. Methods for Gabor filter selection and parameter estimation may be supervised or unsupervised. Bovik et al. [5] use empirical information based on the analysis of the power spectrum of individual textures. Filter locations are determined by significant peaks and dominant directions in the frequency domain for oriented textures. The lower fundamental frequencies discriminate periodic textures. If the texture is non-oriented, the center frequencies of the two largest maxima are suggested. This approach is unreliable since images will contain different peaks in the power spectrum. Dunn and Higgins [40] select optimal filter parameters based on known sets of textures. They identify boundaries between two textures using a single filter. The filter is selected using an exhaustive search to find the center frequency and then applied to an image to partition it. This is not effective since a single filter will not be able to identify variations of the known sample textures. The most common approach for parameter estimation is the automated approach of filter banks [10, 12, 43, 68]. Parameters are specified ad hoc and the filters are created so that they provide a reasonable coverage of the spatial-frequency domain. This avoids computing texture dependent parameters. An important aspect of Gabor filtering is the post processing of filter outputs for feature extraction. A number of methods are mentioned in the literature [34, 92]. They include:
1. Magnitude response: The simplest feature extraction method is to analyze the magnitude response of a set of filters [5]. Filters matching a particular texture will have large magnitudes. Their outputs are negligible when they do not match the texture.

2. Spatial smoothing: Outputs from a Gabor filter are generally sensitive to the noise in an image. Bovik et al. [5] apply spatial smoothing to the filter output using a Gaussian with a spatial extent greater than that of the filter. If the Gaussian of the filter is χ(x, y), then the smoothing function is χ(γx, γy) with 2/3 recommended for γ. Clausi [34] shows experimentally that spatial smoothing improves the performance of the filter for feature extraction.

3. Real component: The real component, g(x, y), is implemented as an even-symmetric bank of filters by Jain and Farrokhnia [10] and Jain et al. [12]. This reduces the computational burden and their experiments produce reasonable results.

4. Sigmoidal thresholding function: Jain et al. [10, 12] subject the filter outputs to a nonlinear transformation:

\varphi(R) = \tanh(\alpha R) = \frac{1 - e^{-2\alpha R}}{1 + e^{-2\alpha R}}    (3.65)

with an empirical value of 0.25 for α. This function changes the sinusoidal variations in the filtered image into square variations. Effectively, it behaves as a blob detector.

5. Gabor energy: The outputs of two real Gabor filters that differ in phase can be combined to yield a measure called the Gabor energy (E) [70]. Specifically, one filter is symmetric and the other antisymmetric:

E = \sqrt{R_{even}^2 + R_{odd}^2}    (3.66)
The filtering and post processing produce feature images and the data in these images can be used directly as features for texture discrimination. In addition, each feature image Fk can be processed further for feature extraction. The following features can be derived for a point in Fk centered within a square window of width W:
1. Statistical measures: Statistical measures can be derived from the feature images. They include the mean, standard deviation and AAD.

MEAN = \frac{1}{N} \sum_{x,y \in W} F_k(x, y)    (3.67)

ABS MEAN = \frac{1}{N} \sum_{x,y \in W} | F_k(x, y) |    (3.68)

VAR = \frac{1}{N - 1} \sum_{x,y \in W} [F_k(x, y) - MEAN]^2    (3.69)

STD DEV = \sqrt{VAR}    (3.70)

AAD = \frac{1}{N} \sum_{x,y \in W} | F_k(x, y) - MEAN |    (3.71)

2. Energy measure: In addition to deriving statistical features, we can compute a measure that is indicative of the energy in the variations of the Gabor responses:

ENER = \frac{1}{N} \sum_{x,y \in W} [F_k(x, y)]^2    (3.72)
where N is the number of points in the region W. The literature [12, 34] suggests using Gaussian weighted windows in the above computations.
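The following sketch assembles a small even-symmetric Gabor filter bank from Equations 3.57 - 3.59, filters an image with each kernel and reduces the responses to the window energy of Equation 3.72. The grid of wavelengths and orientations, the kernel size and the use of the whole image as the window are illustrative assumptions; this is not the bank designed later in this work.

```python
import numpy as np

def gabor_kernel(lam, theta, Bf=1.0, Btheta=np.deg2rad(30), size=15):
    """Even-symmetric Gabor kernel g(x, y) of Eq. 3.59 with sigmas from Eqs. 3.57 - 3.58."""
    sx = (np.sqrt(np.log(2)) * (2 ** Bf + 1)) / (np.sqrt(2) * (np.pi / lam) * (2 ** Bf - 1))
    sy = np.sqrt(np.log(2)) / (np.sqrt(2) * (np.pi / lam) * np.tan(Btheta / 2))
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)          # rotated coordinates, Eqs. 3.55 - 3.56
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (xr ** 2 / sx ** 2 + yr ** 2 / sy ** 2))
    return envelope * np.cos(2 * np.pi * xr / lam) / (2 * np.pi * sx * sy)

def filter_response(image, kernel):
    """Valid convolution of the image with a Gabor kernel."""
    k = np.flipud(np.fliplr(kernel))
    H, W = image.shape
    s = kernel.shape[0]
    out = np.zeros((H - s + 1, W - s + 1))
    for yy in range(out.shape[0]):
        for xx in range(out.shape[1]):
            out[yy, xx] = np.sum(image[yy:yy + s, xx:xx + s] * k)
    return out

img = np.random.rand(48, 48)
bank = [gabor_kernel(lam, th) for lam in (4.0, 8.0) for th in np.deg2rad([0, 45, 90, 135])]
features = [np.mean(filter_response(img, k) ** 2) for k in bank]   # energy per channel, Eq. 3.72
```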
3.5.4  Model-based approaches
The model-based approach aims to describe texture by constructing an image model with particular parameters. This model can also synthesize texture. Markov random fields A Markov random field (MRF) texture model represents the global intensity distribution of an image as the joint probability distribution of local conditional systems of each pixel in the image. The image intensity pixel depends only on a set of neighbourhood pixels. These neighbouring sites interact with each other and capture information for image modelling. The MRF represents global knowledge in terms of local information provided by a neighbourhood structure. We will associate a digital image with a random process X that contains random variables X(i, j). (i, j) are pixel indices in the I × J lattice system S. The notation
63 will be simplified so that (i, j) can be indexed as s = 1, 2, ..., I × J i.e. X(i, j) = X s .
Elements in S are referred to as sites. When we define the process X (a random image) on the lattice system S (pixel sites), an event may occur at s. In our case, this event on site s will represent an intensity level of a pixel and will assume discrete values in a set G of k labels (intensity values). The local relationship between a pixel and its neighbouring pixels is dependent on the neighbourhood system N for S, defined as

N = \{N_s \mid s \in S\}    (3.73)

The set of sites that neighbour s is denoted by N_s. The neighbourhood system has the following properties:

1. s \notin N_s

2. s \in N_r \Leftrightarrow r \in N_s

The first property states that a site is not a neighbour of itself. The second property says that the neighbourhood relationship is mutual. The neighbourhood system, when defined, has a particular order (o). An o-th order system is

N_s^o = \{r \in S \mid 0 < d(s, r)^2 \leq o\}    (3.74)

where d(s, r) is the Euclidean distance between sites s and r. The order describes the extent and size of the neighbourhood. Sites that occur at the boundary of the lattice have fewer neighbours than interior sites. Hence, boundaries are assumed to be toroidal to accommodate these pixels. Neighbourhood systems up to order 5 are shown in Figure 3.24. A clique c for (S, N) is defined as a subset of sites in S in which every pair of sites are neighbours. A clique can consist of a single site, a pair of neighbouring sites or even a triple of neighbouring sites and so forth. We will denote C as the set of all cliques for (S, N). Figures 3.25 and 3.26 show first and second order cliques respectively. Let Ω be the set of all possible configurations of X defined as

\Omega = \{\omega = (x_{s_1}, ..., x_{s_{IJ}}) \mid x_{s_i} \in G, 1 \leq i \leq IJ\}    (3.75)
Figure 3.24: Neighbourhood systems

Figure 3.25: 1st order cliques

We shall denote the event \{X_{s_1} = x_{s_1}, ..., X_{s_{IJ}} = x_{s_{IJ}}\} as \{X = \omega\} for simplicity. We call X a Markov random field with respect to N if
P(X = \omega) > 0 \quad \text{for all } \omega \in \Omega    (3.76)

and

P(X_s = x_s \mid X_r = x_r, r \neq s) = P(X_s = x_s \mid X_r = x_r, r \in N_s)    (3.77)
for all s ∈ S and (x_{s_1}, ..., x_{s_{IJ}}) ∈ Ω. The expression on the left hand side of Equation 3.77 is the local characteristics of the MRF. Positivity of the joint probability distribution is ensured by Equation 3.76. The MRF model uses cliques to estimate the local conditionals with respect to N. Critical to successful modelling of the texture is the choice of order for N and the form of the global and local distributions. In image texture, the repetitive nature of patterns implies that the grey level of a pixel is dependent on and related to the intensities of the neighbouring pixels. The MRF is a way to model this local dependence so that textures can be described. The Markovianity property depicts the local characteristics of image X. Image processing applications using MRFs include texture synthesis [30], image
restoration [52] and segmentation and classification [29, 55, 102].

Figure 3.26: 2nd order cliques

We are concerned with texture feature extraction. A MRF can be specified using the conditional probabilities approach P(X_s = x_s \mid X_r = x_r, r \neq s) or the joint probability approach P(X = \omega). There are two common methods for feature extraction using MRF - the Gibbs distribution approach and the Gauss Markov random field approach. A Gibbs distribution is specified by an energy function E(x) and can be written as

P(X = x) = \frac{1}{Z} \exp(-E(x))    (3.78)
where the partition function

Z = \sum_{x} \exp(-E(x))    (3.79)

is a normalizing constant and includes a summation over all the possible configurations of X. The energy function E(x) takes the form

E(x) = \sum_{c \in C} V_c(x)    (3.80)
The energy is the sum of clique potentials Vc (x) over all possible cliques c. The local configuration of the clique c determines the value of Vc (x). A particular function Vc (x) can incorporate a priori contextual information into the model. The Hammersley-Clifford theorem [17, 52] establishes the equivalence between the local property of a MRF and the global property of the Gibbs random field (GRF). Let X be a random process for a lattice system S with neighbourhood system N as defined earlier. This theorem states:
Theorem 1 X is an MRF on S with respect to N if and only if X is a GRF on S with respect to N.

From this theorem, the joint distribution P(X = x) can be explicitly specified using the energy function of the GRF. The Derin-Elliot model [55] and the auto-binomial model [17, 30] model textures using a Gibbs random field. They utilize single pixels and pairwise pixel cliques in the second order neighbourhood of a site. The conditional probabilities are

P(x_s \mid N_s) = \frac{1}{Z_s} \exp(-w(x_s, N_s)^T \theta)    (3.81)

where

Z_s = \sum_{g \in G} \exp(-w(g, N_s)^T \theta)    (3.82)
The energy is given by

E(x) = \frac{1}{2} \sum_{s=1}^{MN} w(x_s, N_s)^T \theta    (3.83)

where

w(x_s, N_s) = [w_1(x_s), w_2(x_s), w_3(x_s), w_4(x_s)]    (3.84)

and \theta = [\theta_1 \; \theta_2 \; \theta_3 \; \theta_4]. The two models differ in their definitions of w. In the Derin-Elliot model

w_i(x_s) = I(x_s, x_{s-r}) + I(x_s, x_{s+r}) \quad \text{where } 1 \leq i \leq 4    (3.85)

and for the auto-binomial model

w_i(x_s) = x_s (x_{s-r} + x_{s+r}).    (3.86)

The set of neighbouring sites is defined by r, which is an offset to a neighbouring site, while I(a, b) is defined as

I(a, b) = \begin{cases} -1, & \text{if } a = b \\ 1, & \text{otherwise} \end{cases}    (3.87)
Estimation of θ for a texture window produces a features vector that characterizes it. These parameters describe the Gibbs distribution of the texture. The Markov process can also be described by a symmetric difference equation [116]:

X(c) = \sum_{m} \beta_{c,m} [X(c + m) + X(c - m)] + e_c    (3.88)
Figure 3.27: Symmetric GMRF clique pairs

This is the Gauss Markov random field (GMRF) that describes the local relations of clique pairs to the center cell. e_c is a zero mean Gaussian distributed noise (estimation error), m is an offset from the center cell c and β_{c,m} are parameters that weigh a pair of symmetric neighbours to the center cell. The m belong to a set of shift vectors that correspond to the order of the model. Shift (offset) vectors for the second order model are {(0, 1), (1, 0), (1, 1), (−1, 1)}. Symmetric clique pairs are
shown in Figure 3.27. The βs and e_c form the features vector that describes the Markovian properties of the texture and govern the spatial interactions. GMRFs for texture characterization are discussed in [29, 102]. A region R of size w × w is defined together with the order of neighbourhood (N). For every pixel w_{ij} in R, its neighbouring pixels up to order N describe a spatial interaction with the pixel. These spatial interactions for all w_{ij}s in R are modelled using the Gauss model described above. We can represent the equation in matrix notation:

X(c) = \beta^T Q_c + e_c    (3.89)

where \beta is a vector consisting of all the \beta_{c,m} and Q_c is a vector defined by

Q_c = \begin{bmatrix} X(c + m_1) + X(c - m_1) \\ X(c + m_2) + X(c - m_2) \\ X(c + m_3) + X(c - m_3) \\ \vdots \end{bmatrix}    (3.90)

The βs can be estimated using a least squares approach:

\beta = \left[ \sum_{c \in N} Q_c Q_c^T \right]^{-1} \left[ \sum_{c \in N} Q_c X(c) \right]    (3.91)

For every pixel in R that is analyzed, a neighbourhood N is defined and a Q_c is computed. Then β is estimated and e_c computed, which provides the texture features. Features can also be computed by a modified method based on the one presented in [45]:

f_j = \frac{1}{w^2} \sum_{c \in R} | X(c) - \beta_j Q_{c_j} |    (3.92)
Prior to computing the texture features, the intensity image must be transformed to a zero mean image. To do this, for every pixel x_{ij} in the intensity image, compute a local mean µ in a windowed region centered at x_{ij}. Then the new value of x_{ij} in the zero mean image is x_{ij} − µ. The least squares estimate is sensitive to outliers. A smoothing operation or outlier removal is recommended for improved feature extraction.
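A minimal sketch of the least squares estimation of Equation 3.91, assuming a zero mean image region and the second order shift vectors listed above (illustrative only, not the thesis implementation):

```python
import numpy as np

# Second order GMRF shift vectors (offsets m in Equation 3.88).
SHIFTS = [(0, 1), (1, 0), (1, 1), (-1, 1)]

def gmrf_features(image, shifts=SHIFTS):
    """Least squares estimate of the GMRF parameters (Equation 3.91) over a
    zero mean image region; returns beta together with the residual variance."""
    h, w = image.shape
    margin = max(max(abs(dy), abs(dx)) for dy, dx in shifts)
    A, b = [], []
    for y in range(margin, h - margin):
        for x in range(margin, w - margin):
            q = [image[y + dy, x + dx] + image[y - dy, x - dx]
                 for dy, dx in shifts]                    # Q_c (Equation 3.90)
            A.append(q)
            b.append(image[y, x])                         # X(c)
    A, b = np.asarray(A), np.asarray(b)
    beta, *_ = np.linalg.lstsq(A, b, rcond=None)          # solves Equation 3.91
    residual = b - A @ beta                               # estimates of e_c
    return np.concatenate([beta, [residual.var()]])
```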
Fractal models

The fractal texture model [94] determines a correlation between surface roughness and the fractal dimension of the texture. A fractal is an object or quantity that displays a sense of self-similarity at different scales. Mandelbrot [86] proposed fractal geometry and its appearance in shapes and surfaces of the real world. A deterministic fractal has its definition based on the concept of self-similarity. Assume we have a bounded set S in Euclidean n-space. S is self-similar when S is the union of N_r distinct (non-overlapping) copies of itself where each copy has been scaled down by a ratio r. The fractal dimension is defined using N_r and r as

D = \frac{\log N_r}{\log \frac{1}{r}}    (3.93)
D provides an estimation of the roughness of a surface, where rougher textures are associated with a larger fractal dimension. It has been proposed for characterizing textures. Real world textures are not deterministic since they can be irregular and distorted. This makes the computation of D difficult. It can be estimated using Mandelbrot’s method [86], Fourier analysis [94] and box counting techniques [16, 60]. The discrete wavelet transform [84] can also analyze fractal properties of a signal. It has been shown in [60] that the fractal dimension may be the same for textures that are distinctly different. In these cases, it provides poor texture separability. Lacunarity [81] is suggested as a feature that differentiates textures in those instances.
It measures the discrepancy between the mass and the expected value of the mass of the fractal set.

The autoregressive model

Another common model for texture description is the autoregressive (AR) model which assumes a local interaction between image pixels in that pixel intensities are a weighted sum of surrounding pixel intensities. Consider an image f which is a zero mean random field. An AR causal model can be defined as

f_s = \sum_{r \in N_s} \theta_r f_r + e_s    (3.94)
where f_s is the image intensity at site s, e_s denotes an independent and identically distributed noise, N_s is a neighbourhood of s and θ is a vector of model parameters. Pixels in the neighbourhood of s are represented by r. The model is similar to the GMRF but they differ in the way the center pixel interacts with its neighbours. The method of least squares estimation is used to compute θ, which is then used as features for texture segmentation and classification.
3.6 Feature normalization
When a set of features vectors has been computed for an image, the elements must be normalized so that the numerical values have an equal effect in the region description. The different descriptors and operators produce outputs that fall within different ranges. Hence, the raw data must be standardized. Normalizing data ensures that the similarity measure for two patterns is unbiased. Two popular standardization measures are scaling and zero mean with unit variance normalization. We discuss these transforms for one dimensional data sets. Extension to n-dimensional vectors is straightforward. Scaling transforms a variable so that it lies within a particular range. If x_{new} is the scaled variable then usually x_{new} ∈ [0, 1]. This new value can be computed using

x_{new} = \frac{x_{old} - min}{max - min}    (3.95)

where min and max are the minimum and maximum values in the data set that x_{old} belongs to.
Normalizing a data set so that it has zero mean and unit variance is a statistical approach. The formula

x_{new} = \frac{x_{old} - \mu}{\sigma}    (3.96)

transforms the variable x_{old}, where µ and σ are the mean and standard deviation of the data set respectively. The data points tend to cluster around the mean (µ). The normal density Pr(x) that describes the standardized data set obeys

Pr[| x - \mu | \leq \sigma] \approx 0.68    (3.97)

Pr[| x - \mu | \leq 2\sigma] \approx 0.95    (3.98)

Pr[| x - \mu | \leq 3\sigma] \approx 0.997    (3.99)

We can see from above that 99.7% of the standardized data falls within the range [−3, 3] when σ = 1. Hence, we can threshold values that fall outside this range. This is particularly effective for outlier removal that may otherwise corrupt the data set or bias a classification process.
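Both normalization transforms, together with the ±3σ thresholding suggested above, can be sketched as follows (illustrative only, not the thesis implementation):

```python
import numpy as np

def min_max_scale(x):
    """Scale a feature to [0, 1] (Equation 3.95)."""
    return (x - x.min()) / (x.max() - x.min())

def standardize(x, clip=3.0):
    """Zero mean, unit variance normalization (Equation 3.96) with the
    +/- 3 sigma thresholding described above for outlier suppression."""
    z = (x - x.mean()) / x.std()
    return np.clip(z, -clip, clip)
```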
3.7 The border problem
The methods we have discussed generally process pixels lying within a texture analysis window. The center pixel and its neighbours must be analysed for texture properties. Image border pixels do not have a complete set of neighbours since a window centered at them may have portions falling outside the image region. This problem can be handled by generating a border of width n pixels around the image. This is merely a mirror reflection of the border pixels. Another way is to process only those pixels whose set of neighbours lies completely within the image. The latter produces an output image that has smaller dimensions than the input image since we do not process border pixels.
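A sketch of the mirrored border, assuming a numpy image array (illustrative only):

```python
import numpy as np

def mirror_pad(image, n):
    """Grow the image by a border of width n pixels that mirrors the pixels
    adjacent to each edge, so that every pixel gains a complete window."""
    return np.pad(image, n, mode="reflect")
```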
Chapter 4

Image segmentation

In the previous chapter, we discussed feature extraction techniques for computing the texture properties of an image. The result of this process is a texture features vector x for every pixel. We are now concerned with using these patterns as input to an algorithm that will segment the image. General image segmentation algorithms are first discussed. Thereafter, we present pattern classification methods and concentrate on some important techniques for image segmentation.
4.1 Image segmentation
An important step in machine vision is image segmentation - partitioning an image into disjoint sets where each set of pixels has similar properties and corresponds to an object of interest in the real world. At the same time, the segmentation process helps reduce the amount of information that will subsequently be processed and removes useless data. The methods available for image segmentation can be divided into three general groups with regard to the way image information is manipulated for separation of objects:

1. Segmentation based on global image knowledge.

2. Edge-based segmentation.

3. Region-based segmentation.

The information produced from these methods differs and so do their results. A fusion of these techniques generally provides a good segmentation result.
4.1.1 Segmentation based on global image knowledge
Global knowledge based segmentation schemes represent image data as a histogram of features [81]. It is assumed that the object of interest and the background that we wish to separate it from have very distinct grey level distributions. Pixels belonging to similar objects in an image usually have a grey level value that is related to a constant reflection or absorption of light by their surfaces. Different objects will have different reflectivity and absorptivity behaviours; similar objects will have similar grey levels. Using the above knowledge, a threshold T can be computed to segment an image into two groups - background and foreground (objects of interest). The grey levels of the background and foreground are, ideally, grouped into two modes in the grey level histogram. The threshold T generally lies between these two modes. Any grey level greater than or equal to T can be classified as foreground and any pixel less than T is the background. Thresholding is an image transform \Gamma_T : G^{N \times M} \to B^{N \times M} that transforms f into g as follows:
g(x, y) = \begin{cases} 1, & \text{if } f(x, y) \geq T \\ 0, & \text{otherwise} \end{cases}    (4.1)
where the input image f , with size N × M and n grey levels in the set G = {0, 1, 2, . . . , n − 1}, is transformed to a binary image g, with size N × M and grey
levels in the set B = {0, 1}, by the threshold T . Pixels of objects are represented by
g(x, y) = 1; background pixels are g(x, y) = 0. We have generalized thresholding to grey level image processing. It can, however, be extended to texture, gradient and colour information. Basic thresholding can also be modified to use multiple thresholds that produce an output image with a limited set of grey levels [81]:

g(x, y) = \begin{cases} 1 & \text{if } f(x, y) \in D_1 \\ 2 & \text{if } f(x, y) \in D_2 \\ \vdots \\ n & \text{if } f(x, y) \in D_n \\ 0 & \text{otherwise} \end{cases}    (4.2)

where D_i is a subset of grey levels. This is useful when the image histogram is multi-modal. However, computing the modes and valleys in a histogram is a complex problem; it also does not guarantee a successful thresholding algorithm. The histogram may be smoothed to improve results. Most images do not have uniform illumination and a constant threshold T produces undesirable results. The histogram of these types of images does not display
dominant modes. To accommodate this factor, an adaptive procedure can be employed for thresholding [53]. For each pixel, T is computed as a local property of the pixel. Hence, the threshold value varies over the entire image. A sliding window algorithm, or one that divides the image into sub-images, is often implemented and a threshold is computed for each window (or sub-image):

T = h(S) + c, \quad S = \{p \mid p = f(x, y), (x, y) \in W\}, \quad c \in \mathbb{R}    (4.3)
where W is a window or sub-image region in the image, S is the set of grey levels p in W and h is a function that computes a threshold based on the grey levels in W . Simple and effective functions for h include the mean or median of S. The constant c is image dependent and tunes the output of the thresholding function to a desired result. Optimal thresholding [53] is based on the approximation of the histogram of an image using a weighted sum of two or more probability densities with normal distributions. The difficulty with this method lies in estimating the normal distributions’ parameters and the assumption of the distributions being normal. The threshold is set as the closest grey level corresponding to the minimum probability between the maxima of two or more normal distributions. Figure 4.1 shows the results of some thresholding algorithms.
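The following sketch illustrates global thresholding (Equation 4.1) and a simple adaptive variant of Equation 4.3 with h taken as the local mean; it is illustrative only and not the implementation evaluated later:

```python
import numpy as np

def global_threshold(f, T):
    """Equation 4.1: binary image with 1 for foreground, 0 for background."""
    return (f >= T).astype(np.uint8)

def adaptive_threshold(f, window=15, c=0):
    """Equation 4.3 with h = local mean, computed over a sliding window."""
    g = np.zeros_like(f, dtype=np.uint8)
    half = window // 2
    padded = np.pad(f.astype(float), half, mode="reflect")
    rows, cols = f.shape
    for y in range(rows):
        for x in range(cols):
            local = padded[y:y + window, x:x + window]
            g[y, x] = 1 if f[y, x] >= local.mean() + c else 0
    return g
```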
4.1.2 Edge-based segmentation
Edge-based segmentation uses information about feature discontinuities in an image. These discontinuities represent grey level, texture or colour borders. By locating and processing these boundaries we can perform a partial or complete segmentation of an image. Simple edge detection must be supplemented by a variety of operations for constructing region borders. Hence, edge detection is usually supplemented by edge linking and boundary construction to improve the quality of borders.
Edge detection

In Section 3.5.3 we discussed spatial filters. Convolving an image with a set of these filters produces an edge image which shows the image discontinuities. The edge map
provides basic information for locating region boundaries, which can be processed further for object extraction.

Figure 4.1: Thresholding. (a) EYELASH2; (b) global thresholding of EYELASH2 with T = 100; (c) thresholding performed on EYELASH2 using the optimal algorithm; (d) IRIS3; (e) global thresholding of IRIS3 with T = 100; (f) optimal thresholding performed on IRIS3.

Zero crossings of second derivatives can be used to detect edges [32]. When an edge occurs in an image, computation of its first derivative at this point will show an extremum; the second derivative at this position is zero. The zero crossing remains the same even when the edge gradient changes and can be used to locate object contours. One way to approximate the second derivative of an image is to use the LoG (Laplacian of Gaussian) operator - it is the Laplacian of an image smoothed by a Gaussian. The convolution mask of a LoG operator can be specified by

LoG(x, y) = c \left( \frac{x^2 + y^2 - \sigma^2}{\sigma^4} \right) \exp\left( -\frac{x^2 + y^2}{2\sigma^2} \right)    (4.4)
where c normalizes the sum of mask elements to zero and σ is the Gaussian spread.
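A sketch of how a discrete LoG mask following Equation 4.4 might be built; the mask size of about ±3σ and the zero-sum adjustment by mean subtraction are assumptions made for illustration:

```python
import numpy as np

def log_kernel(sigma, size=None):
    """Discrete LoG mask (Equation 4.4), adjusted so its elements sum to zero."""
    if size is None:
        size = int(2 * np.ceil(3 * sigma)) + 1      # cover roughly +/- 3 sigma
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x ** 2 + y ** 2
    mask = ((r2 - sigma ** 2) / sigma ** 4) * np.exp(-r2 / (2 * sigma ** 2))
    return mask - mask.mean()                        # enforce a zero sum
```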
The result of a convolution with this filter must be thresholded to find edges. The Canny edge detector is an optimal edge detector [27] for edges corrupted by white noise. Canny uses three criteria for optimality:

1. The first criterion is a low error rate. Edges should not be missed and non-edges should not be detected.

2. The second criterion is that the edge points should be well localized. The distance between edge pixels as located by the detector and the actual edge is to be at a minimum.

3. The third criterion is to have only one response to a single edge.

Sonka et al. [81] highlight that effective Canny edge detection should be preceded by Gaussian smoothing. The Canny edge detector is robust against noise and produces edges that are a single pixel thick - a desirable result for most problems. Figure 4.2 shows the results of edge detection on some test images.

Edge image thresholding

An edge image provides information regarding discontinuities in the original image. The presence of noise means that there will be discontinuities that do not signify actual edges. These values must be suppressed for a better edge map. Simple thresholding, non-maximal suppression and hysteresis are useful methods that can be applied for improving the quality of the edge image [81]. Non-maximal suppression suppresses multiple responses in the neighbourhood of single boundaries. It uses information about gradient direction to perform this suppression. Multiple edge responses create a thickening where a simple contour should exist. To perform non-maximal suppression, the edge directions must be quantized in eight ways according to 8-connectivity. For every pixel with non-zero edge magnitude, the two adjacent pixels that are indicated by its edge direction are inspected. If the edge magnitude of either of these two pixels is greater than that of the pixel under inspection, the pixel is marked for deletion. Once all pixels have been inspected, the image is re-scanned and all pixels marked for deletion are set to zero. In some images, edge detection may have weak responses and simple thresholding will break up contours. The Canny edge detector uses hysteresis to address this
problem. If an edge response is above a high threshold t_1, it is a definite edge pixel. A weak response that is above a low threshold t_0 and connected to strong responses is also likely to be an edge. Hence, it is preserved. The thresholds t_0 and t_1 are computed using the image's signal to noise ratio [27].

Figure 4.2: Edge detection. (a) EYELASH2; (b) edges detected in EYELASH2 using the Canny edge detector; (c) edges detected in EYELASH2 using the method of zero crossings; (d) IRIS3; (e) Canny edge detection performed on IRIS3; (f) edge detection performed on IRIS3 using the zero crossings technique.
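A sketch of the hysteresis step described above (illustrative only; it is not Canny's full detector):

```python
import numpy as np
from collections import deque

def hysteresis(magnitude, t0, t1):
    """Keep strong edges (>= t1) and weak edges (>= t0) that are
    8-connected to a strong edge, as described above."""
    strong = magnitude >= t1
    weak = magnitude >= t0
    keep = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))
    rows, cols = magnitude.shape
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols \
                        and weak[ny, nx] and not keep[ny, nx]:
                    keep[ny, nx] = True
                    queue.append((ny, nx))
    return keep
```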
Edge relaxation

Edge relaxation uses context based information to improve the quality of boundaries in an edge image [4]. In the general case, it is an iterative approach that improves the edge quality at each step. Contextual information, for example, is a solitary edge without any supporting edge. A popular edge relaxation technique uses crack edges (edges located between pixels). Edge context is considered at both ends of an edge. Edge patterns that can occur are examined in order to produce continuous borders. The iterative process computes an edge confidence for each edge. Each edge is associated with a particular type and its confidence is updated accordingly. At the end of the iterative process, edge confidences converge to 0 (edge termination) or 1 (the edge forms a boundary). This is used to generate the final edge map.

Border tracing

If homogenous regions have been determined in an image, it may be important to compute the borders that enclose them. An inner boundary is a subset of the region - the boundary pixels belong to the region. Outer boundaries are not a subset of the region. In order to trace borders, the image must be binary or labelled. Deriving an outer region boundary is useful for describing the object's perimeter, circularity and other properties [81]. Border tracing for inner and outer boundaries uses 4- or 8-connectivity to trace along the border of a region. First, the image is scanned until a region pixel P_0 is located. Then the trace algorithm starts from P_0 and searches its 3 × 3 neighbourhood anti-clockwise. The first pixel found with the same label is a
new boundary element Pi . Then Pi is examined and the process is repeated. When
P_n = P_1 and P_{n−1} = P_0, the process stops. The boundary pixels are represented by P_0 . . . P_{n−2}. This algorithm finds inner boundaries. It can be easily adapted to find outer borders and borders of region holes. Pixels in the sequence P may be repeated. Extended border tracing is a technique that also locates single borders between adjacent regions [93]. It uses a lookup table for tracing the borders. Border tracing can also be extended to grey level images using edge data [41].

Edge linking and boundary construction

An edge map is usually noisy and contains fragmented borders. Methods exist for linking these borders to form complete contours and for finding closed curves that enclose an object. The Hough transform [53, 15] locates contours in an n-dimensional parameter space by examining whether they lie on curves of specified shape. It computes a
global description of features (curves) in an image. These features are described in terms of the parameters that describe the curve. It represents curves in an image f in terms of an accumulator array A, defined as follows:

A : P \to \mathbb{N} \quad \text{where} \quad P = P_1 \times P_2 \times \ldots \times P_n    (4.5)
where n determines the range of parameters of an n-dimensional space P. A is determined through the computation of partial values for points of f and adding them to previous ones. Peaks in the accumulator array correspond to curves of a desired shape. The classical Hough transform is used to locate lines, circles and ellipses. Consider locating circles in an image: a general circle is defined by

(x - a)^2 + (y - b)^2 = r^2    (4.6)
where (a, b) is the circle center and r its radius. (x, y) is an image edge coordinate and it serves as a constant in Equation 4.6; (a, b, r) become points in Hough space. To find curves in (a, b, r), we discretize this space into finite intervals. This corresponds to A in Equation 4.5. Then each (x, y) transforms Equation 4.6 to a discretized curve and accumulator cells along this curve are incremented. When all (x, y) have been considered, peaks in A represent an (a, b, r) which corresponds to a circle in the image. The input image f is a binary edge map.
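Because circle detection is closely related to locating pupil and limbus boundaries, a sketch of the circle accumulator is given below; the number of sampled angles and the candidate radii are illustrative choices, not values used in this work:

```python
import numpy as np

def hough_circles(edge_map, radii):
    """Accumulate votes for circle centres (a, b) at each candidate radius r
    (Equation 4.6); edge_map is a binary edge image."""
    rows, cols = edge_map.shape
    acc = np.zeros((len(radii), rows, cols), dtype=np.int32)
    ys, xs = np.nonzero(edge_map)
    thetas = np.linspace(0, 2 * np.pi, 120, endpoint=False)
    for k, r in enumerate(radii):
        for y, x in zip(ys, xs):
            a = np.round(x - r * np.cos(thetas)).astype(int)
            b = np.round(y - r * np.sin(thetas)).astype(int)
            ok = (a >= 0) & (a < cols) & (b >= 0) & (b < rows)
            acc[k, b[ok], a[ok]] += 1
    return acc   # peaks index (r, b, a) triples, i.e. circles in the image
```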
A technique similar to the Hough transform uses the Radon transform - an integral transform - for finding desired curves in an image. Lines are expressed in the form

\rho = x \cos\theta + y \sin\theta    (4.7)

where θ is the angle between the positive x-axis and the line joining (x, y) and the origin (0, 0). The smallest distance from (x, y) to the origin of the coordinate system is signified by ρ. The Radon transform for a set of parameters (ρ, θ) is the line integral through f(x, y), where the line position corresponds to the value of (ρ, θ):

\tilde{f}(\rho, \theta) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, \delta(\rho - x \cos\theta - y \sin\theta)\, dx\, dy    (4.8)
In a digital implementation, δ corresponds to the Kronecker delta. In the Radon domain, peaks correspond to lines in the image. Active contour models (ACM) [80] are closed parametric curves that find region boundaries by deforming under the influence of internal and external forces. To
delineate an object boundary, a closed curve must be placed around it and allowed to undergo an iterative process that deforms and shapes it to fit around the region border. The internal forces are related to the curve; the external forces are derived from the image and drive the curve towards the area of interest. A deformable model is governed by dynamic equations and aims to minimize an energy function. A wide variety of ACM properties can be specified through the energy function. The energy function consists of two parts: the internal and external energies. Hence,

E_{ACM} = E_{internal} + E_{external}    (4.9)

The ACM can also be implemented as a region based model [90]. The external energy is computed based on image region properties such as texture or grey level. Boundary points of the contour can traverse large homogenous regions of the image and locate the object contour. Vector fields [25] and geodesic [111] models are also popular. Boundary detection can also be implemented using graph searching methods [53]. The border is viewed as a set of nodes connected by edges that have associated costs. The boundary detection algorithm then becomes a search for an optimal path through the weighted graph from one node to another - this path minimizes (or maximizes) the cost. A graph is a structure consisting of a finite number of nodes and edges (arcs) between the nodes (n_i, n_j). Algorithms implemented generally require the weighted arcs to have an orientation component. Each image pixel corresponds to a graph node. Two nodes n_i and n_j correspond to two 8-connected pixels f(x_i, y_i) and f(x_j, y_j). The graph is constructed by connecting nodes n_i and n_j with arcs. The criteria for connecting two nodes depend on the edge magnitudes and orientations at these nodes. A number of algorithms use these two pieces of information to compute the border as a graph. An edge or boundary is viewed as a sequence of connected arcs.
4.1.3 Region based segmentation
Region based segmentation constructs image regions by using homogeneity criteria for region growing. Regions are grown by merging pixels. The criteria of homogeneity can be based on image grey level, color or texture features. The result of a region based segmentation algorithm is influenced by the properties that describe the pixels.
If region properties have been effectively computed, a good segmentation result will be produced; otherwise it will be poor. Image regions must satisfy the following conditions:

H(R_i) = TRUE \quad \text{for } i = 1, \ldots, S    (4.10)

H(R_i \cup R_j) = FALSE \quad \text{where } i \neq j \text{ and } R_i \text{ adjacent to } R_j    (4.11)
where S is the number of regions in the image and H(R_i) is a binary homogeneity evaluation of the region R_i.

Region merging (growing)

Region merging groups pixels or subregions into larger regions based on predefined criteria [53]. It begins with a set of seed points. Regions are grown by attaching to each seed point neighbouring pixels that have features similar to the seed. A set of one or more seed points must be selected. This is usually problem dependent since a priori knowledge can be used to select a set of points that produce good results. When a priori knowledge is unavailable, a general procedure is to compute a features vector for every pixel. This set of features is used to assign a pixel to a seed point during the growing process. If these features vectors are distributed as clusters of values, pixels near the cluster centroids can be selected as seed points. Assignment of a pixel to a region or cluster requires a measure of similarity. The pixel will be appended to the most similar seed point. The region growing process can also incorporate connectivity and shape analysis for improved segmentation. The growing process stops when no more pixels can be included in the region.

Region splitting

An alternate technique for region based segmentation is to initially view the entire image as a single region and then begin splitting the regions in the image until a stopping criterion is achieved [81]. This splitting process is done sequentially on existing image regions. Region splitting methods sometimes use the same homogeneity criteria as region merging. They differ only in the direction of the processing. Even if the homogeneity criteria are the same, the results from region merging and those of region splitting can differ. The criteria for region splitting can also include cluster analysis, shape analysis and pixel classification.
Split and merge algorithms

Region merging combines pixels to create homogenous objects; region splitting breaks up a set of pixels to produce homogenous regions. Split and merge algorithms [104] combine these two techniques in order to improve the segmentation result. The image is divided into a number of disjoint regions and then the algorithm merges and/or splits regions to satisfy the homogeneity criteria. These algorithms work using hierarchical pyramid image representations. If a level of a pyramid is not homogenous (excluding the lowest level), it is divided into four regions that are elements of higher resolution at the level below. If four regions at a level of the hierarchy have approximately the same homogeneity, they are merged into a single region in an upper level of the hierarchy. The pyramid structure corresponds to a segmentation quadtree where each leaf node is a homogenous region and is an element of some pyramid level. The number of leaf nodes corresponds to the number of segmented regions once the algorithm has executed. Splitting and merging is analogous to removing or building parts of the segmentation tree.
4.1.4 Watershed segmentation
Segmentation by watersheds is based on viewing an image as a topographic system in three dimensions: two spatial coordinates and grey levels. In topography, watershed lines divide individual catchment basins. By viewing grey levels (or any other feature) as altitudes, low altitudes will correspond to catchment basins and high altitudes to watershed lines. The catchment basins are homogenous regions - all the pixels in that region are connected to the basin's region of minimum altitude. Watersheds for digital image processing using mathematical morphology were investigated in [18, 87]. These methods, however, were computationally intensive. Vincent and Soille [112] present an efficient, fast and practical algorithm. The strategy they present starts from the altitude minima. Let us suppose that a hole is made in each regional minimum and the entire elevation grid (topography) is flooded by letting water rise through the holes in the minima from below at a constant rate. When the water from different basins is about to merge, a dam or wall is built to prevent the merging. Eventually, when the flooding is complete, only the tops of the dams will be
visible above the water's surface. These dam lines are the divide lines of the watersheds and are continuous boundaries. Watershed segmentation is generally applied to the gradient of an image. The performance of the system depends largely on the algorithm that computes the image gradient. Figure 4.3 shows a scan line through an image with its altitudes and dam walls that have been constructed.

Figure 4.3: A scan line of an image showing altitudes. The vertical lines at the peaks are dam walls.
4.2 Pattern classification for region based segmentation
In the previous section, we discussed various segmentation techniques. In our work, we aim to segment iris images using primarily a region based approach - pattern classification methods are used to merge pixels into homogenous regions. Each pixel is described by a pattern x that has been computed during the feature extraction process. Similar pixels (patterns) will be merged and the homogeneity criteria for regions will be satisfied using similarity measures for pattern comparison. Hence, a homogenous region will consist of features vectors that are highly similar while the regions themselves will have patterns that are dissimilar. In addition, we will identify regions using a classifier. This will enable us to discard irrelevant image regions. Pattern classification is the classic problem of finding structure in unlabelled data.
The unlabelled data in our problem are the image pixels. By labelling them, we can segment the image. The problem of classification can be stated as follows: given a features vector x, predict a label y ∈ {1, 2, . . . , k} that assigns x to class y such that the probability of error is minimum. We must compute a minimum error classification function g such that

y = g(x)    (4.12)
Classifiers running in supervised mode are trained to recognize particular patterns. They have parameters that have been computed using a particular set of labelled patterns and are, thus, able to classify unknown patterns using prior knowledge of the problem domain. Robust systems use training for classifier construction. This helps minimize the misclassification rate. We are concerned with finding decision boundaries - a boundary in pattern space that separates the training samples with the lowest error of classification. This decision boundary is then used to classify an unknown features vector x. Figure 4.4 shows two classes with a simple decision surface corresponding to a line. Patterns below the line fall into class 2 while patterns above the line are in class 1.
4.3 Pattern classification methods
We discuss here the following important and relevant pattern classification methods:

1. Neural networks.

2. Syntactic pattern recognition.

3. Fuzzy systems.

4. Linear discriminant functions.

5. Statistical pattern recognition.

6. Clustering.

Other techniques include Boltzmann learning and evolutionary methods [99].
Figure 4.4: A simple 2 class decision surface. Patterns below the line fall into class 2 while patterns above the line are in class 1.
4.3.1 Neural networks
Neural networks aim to mimic the behaviour of the neurons of the human nervous system. A neural network (NN) consists of combinations of basic processors (neurons), each of which has single or multiple inputs and a single output. Neurons can be interconnected to form many layers. A NN takes as input a pattern (a vector of numerical elements) and produces a decision (class) as output. The perceptron is the simplest NN and learns a linear decision function for recognizing two classes [100]. In 2 dimensions these patterns may be separated by a line; in higher dimensions a hyperplane separates them. Its response is based on a weighted sum of its inputs and a bias, and an activation function that maps the sum to the final output. The bias usually has a value of 1. The perceptron may have a single or multiple inputs. The weights are tunable parameters for a particular classification problem. Supervised training iteratively adjusts these weights using input vectors and produces the desired mapping for the required output. The training algorithm usually begins with initial weights and iteratively adjusts them until all training patterns are correctly classified or a maximum number of iterations have been reached. The activation function limits the output of the neuron. Activation functions may be thresholding functions, piece-wise linear, logistic functions or hyperbolic tangent functions to name a few. Perceptrons can be combined to produce more complex
networks for complex functions. Figure 4.5 shows a single neuron with b denoting the bias and f the activation function.

Figure 4.5: A single neuron

The most popular type of NN is a multilayer NN that performs multiclass pattern recognition [99]. It is also known as a feedforward neural network. It consists of an input and output layer. Contained between these two layers are multiple hidden layers with a number of neurons in parallel. The output of a neuron in one layer feeds into the input of every neuron in the next layer. The number of neurons in the output layer is equal to the number of classes that the NN has been trained to identify. The number of neurons in the input layer is usually equal to the dimension of the features vector. The activation function has a sigmoidal form. Training is the same as the iterative process described above. A highly popular training method is the error back-propagation algorithm. It uses a forward and backward pass through the layers for weight adjustment. Self-organizing maps (SOM) are based on competitive learning [66]. Neurons are placed at the nodes of a 2D lattice; higher dimensions are also possible. During the learning process the neurons become tuned to various input patterns and winning neurons tend to be ordered in their locations. Hence, a topographic map of the input patterns is formed. The spatial locations of the neurons correspond to intrinsic features of the input patterns.
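A sketch of the perceptron training rule described at the start of this subsection (illustrative only; the labels are assumed to be -1 and +1, and the learning rate is an illustrative choice):

```python
import numpy as np

def perceptron_train(X, y, epochs=100, lr=0.1):
    """Iteratively adjust the weights and bias for a two-class problem;
    X is an n x d array of patterns, y contains labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 1.0                                          # bias term
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            s = 1.0 if (w @ xi + b) >= 0 else -1.0   # thresholding activation
            if s != yi:                               # misclassified: update
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:                               # all patterns correct
            break
    return w, b
```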
4.3.2 Syntactic methods for pattern classification
An object’s structure can be contained in a syntactic description. This is a qualitative description rather than quantitative (numerical). Object descriptions (eg. boundaries) can be coded as strings and these strings can be compared by counting the number of symbols that match. Syntactic pattern recognition uses a set of pattern primitives, a set of rules (called a
grammar) that governs their interconnection and an automaton for recognizing strings (patterns) [71]. The structure of the automaton is determined by the grammar. An interconnection of primitives is called a word and represents the pattern. The set of words from a particular class is called the formal language. It is described by a corresponding grammar. The grammar can be represented by a mathematical model that generates syntactically correct words from the specific language. A grammar is the quadruplet

G = (V_n, V_t, P, S)    (4.13)

where

V_n = the set of non-terminal symbols
V_t = the set of terminal symbols
P = the set of substitution rules
S = the grammar start symbol

Terminal symbols are letters from the alphabet of symbols (primitives). A pattern belonging to a language generated by a grammar is recognized using an automaton. An automaton, A, determines whether a pattern belongs to the language with which the automaton is associated. It is a quintuplet

A = (\Sigma, \delta, Q, q_0, F)    (4.14)

where

Σ = a finite input alphabet
δ = a mapping from Q × Σ into the collection of all subsets of Q
Q = the set of finite states
q_0 = the starting state
F = the set of final (accepting) states

A pattern (word) is recognized by an automaton if, starting in state q_0, the sequence of symbols, read as it is scanned from left to right, causes the automaton to be in a final state after the last symbol is read. δ provides a mapping from one state to
another as a symbol is scanned. String grammars can also be extended to include tree descriptions of patterns. String grammars are learned directly from sample patterns. This process is referred to as grammatical inference.
4.3.3 Fuzzy systems
Fuzzy systems describe uncertain and imprecise data. They are based on the theory of fuzzy logic [19]. Fuzzy logic allows the simultaneous inclusion of patterns in different fuzzy sets with varying degrees of membership. A fuzzy set S in a fuzzy space X is a set of ordered pairs

S = \{(x, \mu_S(x)) \mid x \in X\}    (4.15)

where µ_S(x) is the degree of membership of x in S and is called the membership function. Most classification methods have a hard partition - a pattern can belong to only one set (class). Fuzzy sets allow soft partitions - a pattern can belong to multiple sets with varying degrees of membership. Fuzzy pattern recognition uses fuzzy set operators for combining different fuzzy sets and computing their combined membership function - these operators include fuzzy intersection, fuzzy union and fuzzy complement. Fuzzy reasoning enables information contained in individual fuzzy sets to be used for decision making. A fuzzy solution space is defined using the method of composition - it is the functional relationship that determines the degree of membership in related fuzzy regions. De-fuzzification is performed to arrive at the decision by determining a functional relationship between the fuzzy solution space and the decision.
4.3.4 Linear discriminant functions
A linear discriminant function can be used for a classification decision rule. It is linear in the components of x, or in some set of specified functions of x, and is computed by minimizing a criterion function. Its general form can be written as

g(x) = w^T x + w_0    (4.16)
where w is the weight vector and w0 the bias. A linear discriminant function separates the pattern space by a hyperplane decision surface. Let us assume that ω1 is the set of features that lies on the positive side of the decision surface and ω2 the set of features
vectors on the negative side. Then for a general two-class problem, we decide ω_1 if g(x) > 0 and ω_2 if g(x) < 0. If g(x) = 0, x can be assigned to any class. A support vectors machine (SVM) uses a linear discriminant function for pattern classification [21, 65]. The SVM computes a decision boundary for 2 classes by projecting the vector space to a higher dimension where the separability of patterns is better than in the lower dimension. The decision boundary produced is smooth and fits the training vectors. When nonlinearities are present, kernels are used to model them. The hyperplane

w^T x + w_0 = 0    (4.17)

that maximizes the margin within the learning set must be computed. This translates to an optimization problem whose solution is a set of support vectors that describe the hyperplane. The decision function is

g(x) = \text{sign}(w^T x + w_0)    (4.18)
where the sign (negative or positive) corresponds to a category. Fisher's linear discriminant (FLD) finds an optimal plane for separating two classes of labelled data. This method can also be extended to more than two classes [99]. FLD analysis reduces an n-dimensional vector space to 1 dimension so as to provide optimal separability. This is an efficient way for dimensionality reduction and fast classification since the data can be viewed in 1D. Suppose, for a two class problem, we have a set of d dimensional features vectors X = (x_i | i = 1, 2, ..., n). The dimensionality can be reduced to 1 dimension if the data is projected onto a line. This line must be oriented so that an optimal separation can be found for the samples. Suppose that n_1 of the samples are in class ω_1 and n_2 are in class ω_2. Forming a linear combination of the components of each x we obtain the scalar

y = w^T x    (4.19)

and a corresponding set of n samples y_1, y_2, ..., y_n divided into subsets α_1 and α_2. The selection of the direction of w is important for a good separation of the ys into the subsets α_1 and α_2. The separability of the projected points can be measured by the difference of the sample means. The sample mean in d-dimensional space is

m_i = \frac{1}{n_i} \sum_{x \in \omega_i} x    (4.20)

Hence, the sample mean for the projected points is

\tilde{m}_i = \frac{1}{n_i} \sum_{y \in \alpha_i} y = \frac{1}{n_i} \sum_{x \in \omega_i} w^T x = w^T m_i    (4.21)

We can see that | \tilde{m}_1 - \tilde{m}_2 | = | w^T (m_1 - m_2) |. This difference must be maximized
with respect to some measure of the standard deviations of each class. We define the scatter for the projected samples in α_i as

\tilde{s}_i^2 = \sum_{y \in \alpha_i} (y - \tilde{m}_i)^2    (4.22)

where an estimate of the variance for the pooled data is provided by \frac{1}{n}(\tilde{s}_1^2 + \tilde{s}_2^2) and \tilde{s}_1^2 + \tilde{s}_2^2 is called the total within-class scatter. We can now define the Fisher linear discriminant as the linear function w^T x for which the criterion function

J(w) = \frac{| \tilde{m}_1 - \tilde{m}_2 |^2}{\tilde{s}_1^2 + \tilde{s}_2^2}    (4.23)
is a maximum. The reader can refer to [99] for a full discussion.

S_i = \sum_{x \in \omega_i} (x - m_i)(x - m_i)^T    (4.24)

and

S_W = S_1 + S_2    (4.25)

define the scatter matrices S_i and S_W. It can be shown that

w = S_W^{-1} (m_1 - m_2)    (4.26)

is a solution to the criterion function for Fisher's linear discriminant [99]. We can then use this result to derive a function of the form in Equation 4.16 for the Fisher classifier.
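A sketch of the computation in Equations 4.24-4.26 (illustrative only); the choice of bias midway between the projected class means is an assumption made for illustration:

```python
import numpy as np

def fisher_discriminant(X1, X2):
    """Fisher's linear discriminant for two classes of d-dimensional patterns
    (Equations 4.24-4.26); returns the projection w and a bias w0."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)         # class scatter matrices
    S2 = (X2 - m2).T @ (X2 - m2)
    Sw = S1 + S2                         # within-class scatter
    w = np.linalg.solve(Sw, m1 - m2)     # w = Sw^{-1}(m1 - m2)
    w0 = -0.5 * w @ (m1 + m2)            # threshold midway between projected means
    return w, w0

# Decide class 1 if w @ x + w0 > 0, class 2 otherwise (cf. Equation 4.16).
```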
4.3.5 Statistical pattern classification
Bayesian decision theory is an important approach to statistical pattern classification [99]. Let us assume that we have a set of samples (patterns). We wish to assign each
sample x to a class. Let the class be ω, with ω = ω_i being a particular category and 1 ≤ i ≤ C where the number of classes is denoted by C. There exists a prior probability P(ω_i) that the next sample is of type ω_i, with the P(ω_i) summing to 1. x is considered to be a continuous random variable whose distribution depends on the state of nature and is expressed as the class conditional density function p(x | ω_i). It is the probability density function of x given that the class is ω_i.

Given x, we want to determine the probability that it belongs to class ω_i. Bayes formula states

P(\omega_i \mid x) = \frac{p(x \mid \omega_i) P(\omega_i)}{p(x)}    (4.27)

where

p(x) = \sum_{i=1}^{C} p(x \mid \omega_i) P(\omega_i)    (4.28)
is ωi given the sample x has been observed.
Training samples generally have a large class overlap and we must use a statistical approach to model this and derive an optimal classifier. The probabilities P (ω i ) and p(x | ωi ) can be estimated using the training samples. A pattern x is classified into class ωk if
P(\omega_k \mid x) = \max_{1 \leq i \leq C} P(\omega_i \mid x)    (4.29)
This is an optimal classifier. In most problems, p(x | ωi ) is assumed to be a normal density with mean µi and
covariance matrix Σi . This simplifies the problem from estimating p(x | ωi ) to one of
estimating µi and Σi . Two common approaches for estimating these parameters are maximum-likelihood estimation and Bayesian estimation [99]. When the underlying conditional densities of a feature space are unknown, known functional forms may not fit these distributions. Nonparametric techniques are used. They do not make any assumptions about these underlying densities. There are techniques for estimating p(x | ωj ) from training data and methods for directly estimating P (ωj | x) such as Parzen windows and kn -nearest-neighbour (KNN) estimation.
The KNN method uses the training data to guide the estimation of a probability
density function [99]. To estimate p(x) from n samples, we place a cell center at
x and let it expand until it encloses k_n samples. The class composition of these k_n samples indicates their distributions. We specify k_n as a function of n. These k_n samples or patterns are the k_n nearest neighbours of x. The cell volume is dependent on the distance of the k_n neighbours from x. If the neighbours are close, the volume is small; if they are far away, the volume is much larger. If n approaches infinity, the KNN classifier converges to the optimal Bayes classifier [99].
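A sketch of the KNN decision rule, classifying a pattern by a majority vote among its k nearest training patterns (illustrative only; the Euclidean distance is an assumed choice):

```python
import numpy as np
from collections import Counter

def knn_classify(x, train_X, train_y, k=5):
    """Assign x the majority label among its k nearest training patterns."""
    dists = np.linalg.norm(train_X - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of k closest samples
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```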
4.3.6 Clustering algorithms
Clustering is a pattern classification approach that aims to create clusters of features that are homogenous and similar, while the clusters themselves differ from each other. It is an iterative process and improves clusters at each iteration until convergence. Convergence is reached when a particular criterion is satisfied or a maximum number of iterations have been reached. We discuss the K-means (KM) and fuzzy C-means (FCM) algorithms. These are unsupervised techniques that require an input parameter that specifies the number of clusters to generate. Other methods such as hierarchical and agglomerative algorithms can be found in [11].

K-means clustering

MacQueen's K-means algorithm [83] is a popular clustering algorithm in which the number of clusters (K) is known. It is an iterative process that assigns patterns to the closest cluster using a distance function. Let X = {x_1, x_2, . . . , x_N} be a set
of N patterns that we wish to cluster. The K-means algorithm aims to minimize an objective function J with variables U and C by partitioning X into K clusters:

J(U, C) = \sum_{l=1}^{K} \sum_{i=1}^{N} \mu_{il}\, d(x_i, c_l)    (4.30)

subject to the condition

\sum_{l=1}^{K} \mu_{il} = 1, \quad 1 \leq i \leq N    (4.31)

where

• U is an N × K matrix and µ_{il} is a binary variable; µ_{il} = 1 indicates that pattern i is in cluster l.
• C = {c_1, c_2, . . . , c_K} is a set of K prototypes representing the K clusters.

• d(x_i, c_l) is a similarity measure (such as the Euclidean distance) between object i and prototype l.

This optimization problem is solved iteratively:

1. Define the number of clusters K.

2. Initialize the cluster starting points c_i for i = 1, . . . , K. This is done using random values.

3. Assign a pattern x_i to the nearest cluster: µ_{il} = 1 if d(x_i, c_l) ≤ d(x_i, c_t) for 1 ≤ t ≤ K; otherwise µ_{it} = 0 for t ≠ l.

4. Recompute the prototypes:

c_l = \frac{\sum_{i=1}^{N} \mu_{il} x_i}{\sum_{i=1}^{N} \mu_{il}} \quad \text{for } 1 \leq l \leq K    (4.32)

5. Repeat steps 3 and 4 until the centroids do not change or a maximum number of iterations have been reached.

The K-means algorithm treats all variables equally in deciding the class (cluster) membership of a pattern. It can also be described as an algorithm that computes a hard partition - each pattern can belong to one and only one cluster. A disadvantage of the algorithm is its sensitivity to the prototype (cluster) initialization. However, it has rapid convergence and is computationally efficient.
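A sketch of the iterative procedure above (illustrative only; random prototype initialization as in step 2):

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=None):
    """Hard K-means clustering (Equations 4.30-4.32); X is an N x d array.
    Returns the prototypes and the cluster label of each pattern."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]    # random initialization
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Step 3: assign each pattern to its nearest prototype.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each prototype as the mean of its members.
        new_centers = np.array([X[labels == l].mean(axis=0)
                                if np.any(labels == l) else centers[l]
                                for l in range(K)])
        if np.allclose(new_centers, centers):             # step 5: convergence
            break
        centers = new_centers
    return centers, labels
```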
Fuzzy C-means clustering

Fuzzy clustering allows data to belong to more than one class. This is reflected by their degree of membership in a particular cluster. The fuzzy C-means clustering algorithm was developed by Dunn [44] and improved by Bezdek [19]. It is based on the minimization of the objective function

J_m = \sum_{j=1}^{C} \sum_{i=1}^{N} u_{ij}^m \, \| x_i - c_j \|^2, \quad 1 \leq m \leq \infty    (4.33)
where m is a real number greater than 1, u_{ij} the degree of membership of x_i in the cluster j and x_i is the ith d-dimensional data point. The d-dimensional center of a cluster is denoted by c_j and \| \cdot \| is the norm.

Fuzzy partitioning is an iterative optimization process. The membership u_{ij} and the cluster centers c_j are computed by

u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\| x_i - c_j \|}{\| x_i - c_k \|} \right)^{\frac{2}{m-1}}}    (4.34)

c_j = \frac{\sum_{i=1}^{N} u_{ij}^m \, x_i}{\sum_{i=1}^{N} u_{ij}^m}    (4.35)
The algorithm terminates when | u_{ij}^{k+1} − u_{ij}^k | is less than ε, where ε is a threshold between zero and one and k an iteration step. The algorithm is as follows:

1. Initialize the membership matrix U = [u_{ij}] as U^0, and set k = 0. This is done using random values.

2. At step k, calculate the centers C^k = [c_j] with U^k using Equation 4.35.

3. Update U^{k+1} using Equation 4.34 and set k = k + 1.

4. If \| U^{k+1} − U^k \| < ε then stop, else go to step 2.

For the above algorithm, the Euclidean distance measure is a common similarity measure. A disadvantage of this algorithm is the computational burden experienced with computing the membership matrix U and the cluster centers c_j. There is also a large memory requirement for storing U.

Cluster validity

In some clustering problems, the number of classes or clusters is known. This makes the clustering result reliable and accurate. In some cases, however, the number of clusters is unknown. In an image processing problem, a single region in an image may be represented more accurately by two or more clusters rather than one. We are, therefore, interested in the problem of cluster validity i.e. determining how many clusters are present in the data. A clustering algorithm aims to keep clusters as compact as possible while maximizing the distance between the clusters. We discuss two cluster validity indices
- Dunn's index [44] and the Davies-Bouldin [42] index. These two indices identify clusters that are compact and well separated. Consider a partition

P = \bigcup_{i=1}^{c} X_i    (4.36)

where X_i is the ith cluster and c the number of clusters. Dunn's index is defined as

D(U) = \min_{1 \leq i \leq c} \left\{ \min_{\substack{1 \leq j \leq c \\ j \neq i}} \left\{ \frac{\delta(X_i, X_j)}{\max_{1 \leq k \leq c} \{\Delta(X_k)\}} \right\} \right\}    (4.37)

The intercluster distance between clusters X_i and X_j is denoted by δ(X_i, X_j) and the intracluster distance of cluster X_i by Δ(X_i). This measure maximizes intercluster distances while minimizing intracluster distances. Hence, large values of D correspond to good clusters. To identify a good clustering, a range of indices are computed for varying values of c. The value of c for the highest index is a good clustering result. The Davies-Bouldin (DB) index is defined as

DB(U) = \frac{1}{c} \sum_{i=1}^{c} \max_{i \neq j} \left\{ \frac{\Delta(X_i) + \Delta(X_j)}{\delta(X_i, X_j)} \right\}    (4.38)
A clustering configuration that minimizes the DB value is considered to be the optimal number of clusters c. Once again, we can compute different clusterings with a range of values for c and then select the clustering result with the lowest DB index. Intercluster metrics describe the distance between clusters. Single linkage defines the distance between two clusters S and T as the minimum distance between a pattern in S and a pattern in T : δ1 (S, T ) = min
(
d(s, t) s∈S,t∈T
)
(4.39)
Complete linkage defines the distance between two clusters S and T as the maximum distance between a pattern in S and a pattern in T:
\delta_2(S, T) = \max_{s \in S, \, t \in T} \{ d(s, t) \}    (4.40)
Average linkage computes the average of the distances between all pairs (s, t) such that s \in S and t \in T:
\delta_3(S, T) = \frac{1}{|S| \, |T|} \sum_{s \in S, \, t \in T} d(s, t)    (4.41)
Intracluster distances describe the compactness of a cluster. The complete diameter metric describes the cluster by the maximum distance between any two samples in the cluster:
\Delta_1(S) = \max_{s, t \in S} \{ d(s, t) \}    (4.42)
Average diameter is the average of the distances between every pair of distinct samples in the cluster:
\Delta_2(S) = \frac{1}{|S| \, (|S| - 1)} \sum_{x, y \in S, \, x \ne y} d(x, y)    (4.43)
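The following C++ sketch shows how the DB index of Equation 4.38 can be evaluated for a given partition. It assumes the Euclidean distance, the average diameter (Equation 4.43) as the intracluster distance and average linkage (Equation 4.41) as the intercluster distance; other combinations of the metrics above are equally valid.

```cpp
#include <vector>
#include <cmath>
#include <algorithm>

typedef std::vector<double> Vec;

static double dist(const Vec &a, const Vec &b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Average diameter of a cluster (Equation 4.43).
static double avgDiameter(const std::vector<Vec> &S) {
    if (S.size() < 2) return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < S.size(); ++i)
        for (size_t j = 0; j < S.size(); ++j)
            if (i != j) sum += dist(S[i], S[j]);
    return sum / (double)(S.size() * (S.size() - 1));
}

// Average linkage between two clusters (Equation 4.41).
static double avgLinkage(const std::vector<Vec> &S, const std::vector<Vec> &T) {
    double sum = 0.0;
    for (size_t i = 0; i < S.size(); ++i)
        for (size_t j = 0; j < T.size(); ++j) sum += dist(S[i], T[j]);
    return sum / (double)(S.size() * T.size());
}

// clusters[i] holds the feature vectors assigned to cluster i.
double daviesBouldin(const std::vector<std::vector<Vec> > &clusters) {
    const size_t c = clusters.size();
    double db = 0.0;
    for (size_t i = 0; i < c; ++i) {
        double worst = 0.0;
        for (size_t j = 0; j < c; ++j) {
            if (i == j) continue;
            double r = (avgDiameter(clusters[i]) + avgDiameter(clusters[j]))
                       / avgLinkage(clusters[i], clusters[j]);
            worst = std::max(worst, r);     // Equation 4.38: take the worst pairing
        }
        db += worst;
    }
    return db / (double)c;                  // lower values indicate better clusterings
}
```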
4.4 Similarity measures
The similarity (or dissimilarity) between two patterns can be measured using distance measures. Let us assume that we want to compare patterns x and y with dimensionality n. A popular similarity measure is the Euclidean distance
d(x, y) = \sqrt{ \sum_{i=1}^{n} (x_i - y_i)^2 }    (4.44)
This measure assumes that all components contribute equally to the similarity measure. This issue can be addressed using the weighted Euclidean distance
d(x, y) = \sqrt{ \sum_{i=1}^{n} w_i (x_i - y_i)^2 }    (4.45)
where w_i is a weighting component for the i-th variable. Another well known metric is the city-block distance
d(x, y) = \sum_{i=1}^{n} | x_i - y_i |    (4.46)
The Mahalanobis distance is a statistical measure between two points that are described by two or more correlated variables. Multivariate data with a Gaussian distribution tend to cluster about the mean vector \mu in a bell shaped cloud whose principal axes are eigenvectors of the covariance matrix \Sigma. The Mahalanobis distance function incorporates this covariance matrix \Sigma:
d(x, y) = \sqrt{ (x - y)^T \Sigma^{-1} (x - y) }    (4.47)
It is scale-invariant and takes into account the correlations of the data set.
For the above metrics, a small value implies a high similarity between the two patterns while large values correspond to high dissimilarity. In some problems, the distance measure used will not affect the final result of the classification process.
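As an illustration, the four distance measures can be implemented as follows. The Mahalanobis version takes a precomputed inverse covariance matrix as input, since matrix inversion itself (handled elsewhere, e.g. with Numerical Recipes routines) is outside the scope of this sketch.

```cpp
#include <vector>
#include <cmath>

typedef std::vector<double> Vec;

double euclidean(const Vec &x, const Vec &y) {                       // Eq. 4.44
    double s = 0.0;
    for (size_t i = 0; i < x.size(); ++i) s += (x[i] - y[i]) * (x[i] - y[i]);
    return std::sqrt(s);
}

double weightedEuclidean(const Vec &x, const Vec &y, const Vec &w) { // Eq. 4.45
    double s = 0.0;
    for (size_t i = 0; i < x.size(); ++i) s += w[i] * (x[i] - y[i]) * (x[i] - y[i]);
    return std::sqrt(s);
}

double cityBlock(const Vec &x, const Vec &y) {                       // Eq. 4.46
    double s = 0.0;
    for (size_t i = 0; i < x.size(); ++i) s += std::fabs(x[i] - y[i]);
    return s;
}

double mahalanobis(const Vec &x, const Vec &y,
                   const std::vector<Vec> &sigmaInv) {               // Eq. 4.47
    const size_t n = x.size();
    Vec d(n);
    for (size_t i = 0; i < n; ++i) d[i] = x[i] - y[i];
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) s += d[i] * sigmaInv[i][j] * d[j];
    return std::sqrt(s);
}
```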
Chapter 5
Methods and results
In this chapter, we present the methods, experiments and results of the designed system. Image enhancement techniques, such as the edge preserving filter and multi-scale Retinex, and their results are discussed. We also provide implementation details of the RANSAC algorithm for locating eyelid boundaries. Parameter values are specified for the various feature extraction and pattern classification methods, together with details of the classifier design and region extraction algorithm. Results comparing and contrasting the different texture analysis and clustering methods are presented.
5.1 Experimental environment
5.1.1 Image format
The input data for the experiments are 8-bit single band grey scale images. Grey scale intensity values lie in the range [0, 255]. These images are in Windows BMP format and have dimension 320 × 240. The origin (0, 0) of the image coordinate system is at the upper left hand corner. The positive x-axis extends to the right and the positive y-axis extends to the bottom of the image. Figure 5.1 shows the image coordinate system. Each cell in the grid represents a pixel with a grey scale value which, from Chapter 3, we denote f(x, y), where f is the image and (x, y) a cell (pixel) location.
Figure 5.1: Image coordinate system
5.1.2 Software development environment
The algorithms for the system were written and compiled in the Borland C++ Builder 5.0 environment. The entire system was designed using no external libraries or image analysis plugins. However, some routines for matrix inversion were adapted from Numerical Recipes in C [114].
5.1.3 Dataset
The images analyzed and processed during this research project were taken from the CASIA Iris Database. The database contains 108 sets of images. Each set comprises two subsets of 3 and 4 images, respectively, acquired a month apart from the same subject. This database is available from the National Laboratory of Pattern Recognition of the Chinese Academy of Sciences. We use subsets of these images for our experiments.
5.1.4 System overview
The design and implementation of a machine vision system requires an analysis of the image content and object characteristics. This helps to identify basic algorithms and processes that will take us closer to an effective system. We must consider what is required and find a transition from image pixels to a solution of the given problem. Our system for segmenting iris images has the structure shown in Figure 5.2:
Figure 5.2: System overview (INPUT -> IRIS LOCALIZATION -> IRIS IMAGE NORMALIZATION -> IRIS IMAGE ENHANCEMENT -> IRIS SEGMENTATION -> OUTPUT)
An iris image is provided as input to the system. The iris is located in the localization step. Circles representing its outer boundaries are detected and the upper and lower eyelid contours are also found using curve fitting. By finding these boundaries, we extract the iris region. We normalize the located iris portion using a transform that makes it invariant to size. Then the normalized image is enhanced and the segmentation algorithm is executed on it. The output is an extracted iris texture region.
5.2 Iris boundary localization
The first part of the system locates the iris boundaries. This refers to the inner and outer contours at the edge of the pupil and iris, and the edge of the iris and sclera respectively. These boundaries are modelled as circles for simplicity. They are,
however, elliptic at times. We then locate the upper and/or lower eyelid contours that may occlude the iris. These contours are modelled using parabolas. Before fitting these curves to the image, relevant edges in the image must be detected. Thereafter, the boundary detection is performed on these edge points. Edges in an image can be detected using the techniques discussed in Chapters 3 and 4. Prior to an edge detection scheme, it is advantageous to enhance the image contrast and remove noise. This has the following desirable results:
• Edges are enhanced and become easier to detect.
• The possibility of false boundaries is reduced.
• Clutter in the image, caused by noise, is reduced.
The boundary localization scheme proceeds as follows:
• Locate edges for pupil detection. We will refer to this as PUPIL EDGE MAP.
• Locate edges in the iris image for the sclera detection. This will be referred to as SCLERA EDGE MAP.
• Use PUPIL EDGE MAP to find the pupil boundary as a circle with defining parameters. We will call the circle that defines the pupil P.
• Use P to guide the location of L, the circle that defines the outer iris boundary, using SCLERA EDGE MAP.
• Use P and L to find the upper and lower eyelid curves, which we call E1 and E2 respectively.
Two edge maps are generated (PUPIL EDGE MAP and SCLERA EDGE MAP) for the circle detection to prevent the detection of false boundaries (circles). In each edge map, we search for a boundary circle denoted by the name of the edge map. First, we detect the pupil and fit a circle to its boundary using edge points in PUPIL EDGE MAP. Then, we use the pupil circle as a bounding circle for the center coordinates of the sclera and locate the sclera border using SCLERA EDGE MAP.
5.2.1 PUPIL EDGE MAP computation
The pupil in our iris data set occurs as a circular, sometimes elliptic, region of almost constant grey scale values in the lower range of [0, 255]. If we study the image histogram, it can be seen that a majority of the pupil pixels are represented by the mode. This is evident in almost all the images. The pupil is detected using global thresholding where the threshold value is equal to the mode of the image histogram. From an empirical analysis, we search for the mode in the range 0 to 70 of the histogram and then use it as the global threshold value. The thresholding is not perfect - it may also detect pixels not belonging to the pupil. However, the number of misclassified pixels is relatively low and is limited by the search range for the mode. Figure 5.3(a) shows an input image and (b) shows its histogram. The mode is located at grey scale value 41. In order to find the pupil edge, the Canny edge detector is applied to the thresholded image, with a lower threshold of 60 and an upper threshold of 80. The thresholded image may be noisy and contain some speckle and spots as shown in Figure 5.3(c). This, however, is suppressed by the Canny algorithm and (d) shows an image with just the pupil edge. This effectively generates PUPIL EDGE MAP.
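A minimal sketch of the mode search and global thresholding step is given below; the flat 8-bit image layout and the choice of 255 for candidate pupil pixels are assumptions of the illustration, and the subsequent Canny step is omitted.

```cpp
#include <vector>

// Find the histogram mode in [0, 70] and use it as a global threshold.
std::vector<unsigned char> thresholdPupil(const std::vector<unsigned char> &img)
{
    int hist[256] = {0};
    for (size_t i = 0; i < img.size(); ++i) hist[img[i]]++;

    int mode = 0;                              // search the mode in the range 0..70
    for (int g = 1; g <= 70; ++g)
        if (hist[g] > hist[mode]) mode = g;

    std::vector<unsigned char> out(img.size());
    for (size_t i = 0; i < img.size(); ++i)
        out[i] = (img[i] <= mode) ? 255 : 0;   // mark candidate pupil pixels
    return out;                                // Canny is applied to this image next
}
```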
(a) Input iris image.
(c) Input image thresholded for pupil detection
(b) The histogram of the input iris image with the mode located at value 41
(d) Edge map generated from input iris image using Canny edge detection
Figure 5.3: Computing an edge map for pupil detection
5.2.2 SCLERA EDGE MAP computation
The outer boundary of the iris (sclera) sometimes has a soft gradient. In addition, eyelashes and eyelids may occlude its contour. This impacts on the boundary detection algorithm. To address these issues, contrast enhancement and noise reduction are performed. Noise in an iris image manifests as eyelashes, specular reflection and other artifacts. The most common problem is the edge data provided by eyelashes, which can mislead a curve fitting technique. It can also make the curve fitting process inefficient by providing too many data points for the algorithm. Popular methods for noise reduction include Gauss and median filtering [81]. These methods generally blur the image in an effort to reduce noise. We use a different approach for noise filtering in order to retain important edge data that complements our search for the sclera boundary. Two methods are implemented for noise filtering. The first method uses mathematical morphology operators - erosion and dilation - for reducing clutter in the image caused by eyelashes, speckle and reflection. Thereafter, contrast enhancement and an edge preserving filter are applied for edge data improvement.
Mathematical morphology
Mathematical morphology is based on non-linear operators that operate on object shape. These operators can be used for image preprocessing, enhancement of object structure and segmentation of objects from the background. The majority of these operators process binary images. The interested reader can find a deeper perspective and discussion in [53, 81]. We limit our discussion to a basic explanation of binary primitive operators and their extension to grey scale images. Mathematical morphology uses concepts from set theory for image processing. Most algorithms involve two primitive operations: dilation and erosion. The basic effect of the dilation operator on a binary image is to enlarge the boundaries of object (foreground) pixels. The area of foreground pixels grows in size while the sizes of holes in these regions become smaller. The binary erosion operator does the opposite - it erodes away the boundaries of object pixels. Hence, objects become smaller and holes within these regions increase in size. Let us introduce a few basic set theory concepts. Let A be a set in Z^2 and a be
an element of A. The complement of a set A is defined as the set of elements not contained in A:
A^c = \{ w \mid w \notin A \}    (5.1)
The difference of two sets A and B is the set of those elements in A which are not in B, defined as
A - B = \{ w \mid w \in A, w \notin B \}    (5.2)
The reflection of the set B is defined as
\hat{B} = \{ w \mid w = -b, \text{ for } b \in B \}    (5.3)
and the translation of the set A by point z as
(A)_z = \{ c \mid c = a + z, \text{ for } a \in A \}    (5.4)
Given the sets A and B in Z^2, the dilation of A by B is denoted A \oplus B and defined as
A \oplus B = \{ z \mid [(\hat{B})_z \cap A] \subseteq A \}    (5.5)
A set B is usually called the structuring element in dilation and other morphological operations. The dilation of A by B is the set of all displacements z such that \hat{B} and A overlap by at least one element. The erosion of A by B is defined as
A \ominus B = \{ z \mid (B)_z \subseteq A \}    (5.6)
and states that it is the set of all points z such that B, translated by z, is contained in A. These definitions are taken from [53] and may differ from other texts. However, it is easier to understand since it can be implemented as a convolution process. For grey scale images, the erosion operator replaces the current pixel with the minimum of its neighbours; the dilation operator replaces the pixel with the maximum of its neighbours. Dilation brightens small dark areas while erosion darkens small bright areas. In our system, we use greyscale dilation and erosion operations to filter out reflection and eyelashes. The erosion operator darkens the area near reflections, making specular reflection spots smaller or removing them; the dilation operator removes eyelashes by replacing the dark pixels with lighter ones. Compared to the median filter and Gauss filter, mathematical morphology is preferred for noise
reduction. It is more efficient than the median filter and is more effective for specular reflection and eyelash reduction than the Gauss filter. To implement noise reduction, opening and closing are performed. Opening is an erosion followed by a dilation; closing is a dilation followed by an erosion. Opening smoothes the contour of an object and eliminates thin protrusions. Closing smoothes contours but, as opposed to opening, it eliminates small holes and fills gaps. We first perform closing by carrying out 2 dilations and then 2 erosions. Then opening is performed by doing 2 erosions and then 2 dilations on the previous result. A 3 × 3 neighbourhood operator is implemented. Figure 5.4 shows an input image and some processing results. The dilation and erosion operators produce sharper edge boundaries than the median and Gauss filters and with greater noise reduction.
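The greyscale operators described above reduce to local minimum and maximum filters, as the following sketch shows; the simple border handling (border pixels copied unchanged) is an assumption of the illustration.

```cpp
#include <vector>
#include <algorithm>

typedef std::vector<unsigned char> Img;   // row-major image, width w, height h

// Greyscale 3x3 morphology: dilation takes the neighbourhood maximum,
// erosion the neighbourhood minimum.
static Img morph(const Img &in, int w, int h, bool dilate) {
    Img out(in);
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            unsigned char v = in[y * w + x];
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    unsigned char n = in[(y + dy) * w + (x + dx)];
                    v = dilate ? std::max(v, n) : std::min(v, n);
                }
            out[y * w + x] = v;
        }
    return out;
}

// Closing (2 dilations, 2 erosions) followed by opening (2 erosions, 2 dilations).
Img reduceNoise(Img img, int w, int h) {
    for (int i = 0; i < 2; ++i) img = morph(img, w, h, true);    // dilate
    for (int i = 0; i < 2; ++i) img = morph(img, w, h, false);   // erode
    for (int i = 0; i < 2; ++i) img = morph(img, w, h, false);   // erode
    for (int i = 0; i < 2; ++i) img = morph(img, w, h, true);    // dilate
    return img;
}
```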
(a) Input image
(b) Closing performed on input image
(c) Opening performed on (b)
(d) Median filtered image (radius=3)
(e) Gauss filtered image (σ = 2)
Figure 5.4: Noise reduction using closing followed by opening
Contrast enhancement
Texture details in an iris image must be enhanced since they have poor contrast. We perform contrast enhancement on the image processed using mathematical morphology to increase the edge gradient magnitude. The approach used is histogram based. The discrete histogram h(g) of an image f(x, y) = g represents the frequency of occurrence of g in f, where g \in \{0, 1, ..., L - 1\} and L is the number of grey levels of
the image [53]. It provides a global view of the image data. The shape of a histogram provides basic insight into the image contrast. By looking at the components of the histogram, we have knowledge of the tonal content and can process the image according to that knowledge. For example, in a dark image the histogram frequencies are concentrated on the low side of the grey scale. Low contrast images typically have narrow histograms centered at the middle of the grey scale while high contrast images have an almost uniform distribution. A computationally efficient and effective method is histogram equalization. Denote the probability of occurrence of grey level r_k as
p_r(r_k) = \frac{n_k}{n}, \quad k = 0, 1, 2, \ldots, L - 1    (5.7)
where n_k is the number of pixels having grey level r_k and n is the total number of pixels in the image. This is a normalized histogram since the sum of all the probability components is equal to 1. Histogram equalization computes a transformation function
s = T(r) \quad \text{for} \quad 0 \le r \le 1    (5.8)
that produces a grey level s for every pixel with value r in the original image. In the above case, r is a normalized grey level. The transformation function for histogram equalization [53] is
s_k = T(r_k) = \sum_{j=0}^{k} p_r(r_j) = \sum_{j=0}^{k} \frac{n_j}{n}, \quad k = 0, 1, 2, \ldots, L - 1    (5.9)
The intention is to create an image with equally distributed brightness levels over the entire brightness scale. Histogram equalization was performed on input images and
the results produced were acceptable. However, it was felt that the limbic boundary was not well defined and another method was sought. We use a simple technique that performs linear contrast stretching based on the histogram statistics and is very effective. It incorporates thresholding and a linear normalization procedure. A Gaussian distribution of image grey level values is assumed. This distribution is transformed to zero mean and unit variance. A threshold t is specified and the zero mean image values z are transformed to a grey level g as follows:
g = T(z) = \begin{cases} 0 & z < -t \\ 255 \times \frac{z + t}{2t} & -t \le z \le t \\ 255 & z > t \end{cases}    (5.10)
From the above equation, we note that t must not equal zero. Figure 5.5 shows the output for different values of t. A large value of t has almost no effect on an input image while small values increase the contrast but also degrade image details. A value from 1 to 1.5 produces good contrast for the iris images. Figure 5.6 shows a result for the method proposed above and histogram equalization. This method is preferred to histogram equalization. It is effective if the object of interest lies close to the mean image grey level. If the object’s grey level is far from the mean, the thresholding function may, in fact, reduce its visual presence. An automatic process can also be designed for threshold selection and linear normalization by looking at the grey level frequencies and computing t based on the sum of probabilities across a range. In this case, it would be more effective to use two thresholds, t1 and t2 . This is discussed in a later subsection when the localized iris region is enhanced.
(a) Input image.
(b) t = 2
(c) t = 1.5
(d) t = 1
(e) t = 0.5
Figure 5.5: Linear contrast stretching
(a) Input image (which has been processed using opening and closing)
(b) Contrast enhanced input image with t=1
(c) Histogram equalized input image
Figure 5.6: Contrast enhancement on the image that is processed using closing and opening. The boundary gradients are sharper compared to standard histogram equalization
Figure 5.7: The edge preserving filter. The dot in the center is the pixel at (x, y) with value f(x, y)
Edge detection using the edge preserving filter
When denoising an image, edges can become blurred and sometimes lose their discriminating characteristics. If the edge that we want to locate has a particularly low magnitude compared to other edges, it may be completely filtered out and we lose important edge data. A good filter should remove noise while preserving edge content. Now that we have filtered and enhanced the iris image, the next step is to improve the limbic boundary gradient. An edge preserving filter (EPF) is used for this purpose [48]. Given an image f = {(x, y, f(x, y)), where (x, y) is the image point and f(x, y)
the grey level at (x, y)}, the filtering process starts by forming, for each pixel P =
(x, y, f (x, y)), a window W of size w × w centered at P . Within W , form 5 sub-
windows of equal size as shown in Figure 5.7 - they are the 4 quadrants and the square window centered at (x, y). The mean µ and standard deviation σ of the pixel intensities within each sub-window are then computed. The pixel is then assigned the mean of the window with the lowest variance. An application of this filter, however,
produces an image with speckle and jagged edges in some instances. To reduce this effect, we first filter the image with a Gaussian filter (σ = 2) and then apply the EPF. This produces a smoother and cleaner image. Thereafter, application of the Canny edge detector results in an accurate edge map. The EPF filter works on an 11 × 11
window. Results using the EPF are shown in Figure 5.8. The edge borders are less fragmented when the edge detector is applied to an EPF processed image.
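A sketch of the EPF is given below. The exact sub-window geometry used in [48] is not reproduced here; the five sub-windows are assumed to be the four quadrants and a centred square, each of side w/2 + 1, and the Gaussian pre-filtering step is omitted.

```cpp
#include <vector>

typedef std::vector<unsigned char> Img;   // row-major image, width w, height h

// Mean and variance of an s x s block with top-left corner (x0, y0).
static void meanVar(const Img &in, int w, int x0, int y0, int s,
                    double &mean, double &var) {
    double sum = 0.0, sq = 0.0;
    for (int y = y0; y < y0 + s; ++y)
        for (int x = x0; x < x0 + s; ++x) {
            double v = in[y * w + x];
            sum += v; sq += v * v;
        }
    double n = (double)(s * s);
    mean = sum / n;
    var = sq / n - mean * mean;
}

// Edge preserving filter: assign each pixel the mean of the least-variant
// sub-window of its win x win neighbourhood (win = 11 in the system).
Img edgePreservingFilter(const Img &in, int w, int h, int win) {
    Img out(in);
    const int r = win / 2, s = r + 1;      // sub-window side length (assumption)
    for (int y = r; y < h - r; ++y)
        for (int x = r; x < w - r; ++x) {
            // top-left corners of the four quadrants and the centred square
            int corners[5][2] = { {x - r, y - r}, {x, y - r},
                                  {x - r, y},     {x, y},
                                  {x - r / 2, y - r / 2} };
            double bestMean = in[y * w + x], bestVar = 1e300;
            for (int k = 0; k < 5; ++k) {
                double m, v;
                meanVar(in, w, corners[k][0], corners[k][1], s, m, v);
                if (v < bestVar) { bestVar = v; bestMean = m; }
            }
            out[y * w + x] = (unsigned char)(bestMean + 0.5);
        }
    return out;
}
```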
(a) Input image, denoised and enhanced image
(b) Application of the EPF to input image
(c) Canny edge detection performed on EPF processed input image
(d) Canny edge detection on input image. The edge map is fragmented
Figure 5.8: Application of the EPF
(a) Input image
(b) The localized iris
Figure 5.9: A localized iris
5.2.3 Iris localization
The edge maps that have been computed contain useful boundary data. This is used to extract useful contour information, namely the circular iris boundaries. This is accomplished in a two step fashion using the Hough transform discussed in Chapter 4. First, the pupil boundary is located (P). Then its location is used as an estimate for locating the outer limbic boundary (L). Location of the pupil is rapid since it contains little noise. Figure 5.9 shows an input image and a localized iris. The Hough transform implemented here is time and space intensive but very accurate.
5.2.4 Eyelid localization
Once P and L are located, they are used to reduce the search space for parabolas that fit the upper and lower eyelids. Edge points can be used from SCLERA EDGE MAP. In the system, however, we find that generating a new edge map with less noise and points is more successful. During the circle detection algorithm, we search the entire Hough space, which is computationally intensive. We combine the random sample consensus (RANSAC) algorithm, a voting procedure and simple learning to locate eyelids in a less computationally intensive fashion. RANSAC (Random Sample Consensus) [79] is a robust technique for efficiently
fitting curves through noisy data. It is an iterative algorithm that randomly selects points, fits a curve through them and then checks if a satisfaction criterion has been reached. If this has been achieved, then the fitted curve is accepted, else the process is repeated. If no curve can be fitted satisfactorily, then the algorithm terminates after a specified number of iterations. When fitting parabolas, the three parameters that define a parabola are estimated. The boundary of an eyelid can be approximated with a parabolic curve of the form given below:
y = ax^2 + bx + c
(5.11)
where (x, y) are coordinates in the plane and (a, b, c) are the defining curve parameters. Our algorithm works as follows: given a set S of (x, y) coordinates, randomly select n points. Estimate parameters for Equation 5.11 using the method of least squares [59] applied to these n points. The least squares estimate fits an approximated parabola through the n points. Let
b = \begin{bmatrix} y_0 \\ \vdots \\ y_{n-1} \end{bmatrix}    (5.12)
with n the number of points that we want to fit a curve through. Let
A = \begin{bmatrix} 1 & x_0 & x_0^2 \\ \vdots & \vdots & \vdots \\ 1 & x_{n-1} & x_{n-1}^2 \end{bmatrix}    (5.13)
and
r = \begin{bmatrix} c \\ b \\ a \end{bmatrix}    (5.14)
In the ideal case
b = Ar    (5.15)
Since our data is noisy, we want to approximate the solution so that
b \approx Ar    (5.16)
and the error is minimal. The solution vector r can be found by solving the following equation
r = (A^T A)^{-1} A^T b    (5.17)
When we have approximated r = [c b a], we have an equation describing a possible eyelid boundary. This equation is used to count the number of points in the edge map that actually lie on the curve or a distance d from the curve. In our case, d = 0. The curve with the maximum count corresponds to the eyelid. To prevent inaccurate and false boundary detection, a simple learning procedure is implemented. 30 test images were studied and the values estimated for r were noted. We then looked at the satisfactory fits and used the coefficients computed to determine limits of values for eyelid contours. This is built into the system so that curves that are not likely to be eyelids are discarded. Table 5.1 shows the parameters estimated for 30 images. The values in red denote poor curve fits that were rejected. For the upper eyelid we estimated the following limits: 0.0008 ≤ au ≤ 0.003 −1.6 ≤ bu ≤ −0.3 70 ≤ cu ≤ 250 and the lower eyelid: −0.0036 ≤ al ≤ 0.0009 0.1 ≤ bl ≤ 1.31 70 ≤ cl ≤ 250 where (au , bu , cu ) and (al , bl , cl ) are the parameters for the upper and lower eyelids respectively. Values outside this range are curves that are rejected to prevent a false fit. To implement RANSAC, a stopping criterion must also be specified. The algorithm is iterated until a maximum number of iterations are reached. At each step we keep track of the curve with the highest number of votes and also check the parameters against the limits that we computed for acceptance or rejection. At the end of the iterative process, a curve is produced as output or no solution is computed. Figures 5.10 and 5.11 show some curves fitted to the iris images for eyelid detection.
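The least squares fit of Equations 5.12 to 5.17 and the RANSAC voting loop can be sketched as follows. The 3 × 3 normal equations are solved here with a hand-written Gaussian elimination; the sample size, the sampling with replacement, the iteration count and the inlier tolerance are illustrative assumptions, and the check against the learned parameter limits is only indicated in a comment.

```cpp
#include <vector>
#include <cstdlib>
#include <cmath>
#include <algorithm>

struct Pt { double x, y; };
struct Parabola { double a, b, c; int votes; };

static bool solve3(double M[3][4], double r[3]) {        // Gauss-Jordan on 3x4
    for (int i = 0; i < 3; ++i) {
        int p = i;
        for (int j = i + 1; j < 3; ++j)
            if (std::fabs(M[j][i]) > std::fabs(M[p][i])) p = j;
        for (int k = 0; k < 4; ++k) std::swap(M[i][k], M[p][k]);
        if (std::fabs(M[i][i]) < 1e-12) return false;
        for (int j = 0; j < 3; ++j) {
            if (j == i) continue;
            double f = M[j][i] / M[i][i];
            for (int k = 0; k < 4; ++k) M[j][k] -= f * M[i][k];
        }
    }
    for (int i = 0; i < 3; ++i) r[i] = M[i][3] / M[i][i];
    return true;
}

// Least squares parabola through the points in S (Equations 5.12-5.17).
static bool fitParabola(const std::vector<Pt> &S, Parabola &p) {
    double M[3][4] = {{0}};
    for (size_t i = 0; i < S.size(); ++i) {
        double pw[3] = {1.0, S[i].x, S[i].x * S[i].x};
        for (int row = 0; row < 3; ++row) {
            for (int col = 0; col < 3; ++col) M[row][col] += pw[row] * pw[col]; // A^T A
            M[row][3] += pw[row] * S[i].y;                                      // A^T b
        }
    }
    double r[3];
    if (!solve3(M, r)) return false;
    p.c = r[0]; p.b = r[1]; p.a = r[2];
    return true;
}

Parabola ransacEyelid(const std::vector<Pt> &edges, int nSample, int maxIter) {
    Parabola best = {0, 0, 0, -1};
    for (int it = 0; it < maxIter; ++it) {
        std::vector<Pt> S;                                  // random sample (with replacement)
        for (int i = 0; i < nSample; ++i) S.push_back(edges[rand() % edges.size()]);
        Parabola p;
        if (!fitParabola(S, p)) continue;
        p.votes = 0;                                        // count edge points lying on the curve
        for (size_t i = 0; i < edges.size(); ++i) {
            double y = p.a * edges[i].x * edges[i].x + p.b * edges[i].x + p.c;
            if (std::fabs(y - edges[i].y) < 0.5) p.votes++;
        }
        // A full implementation would reject (a, b, c) outside the learned
        // limits for the upper or lower eyelid at this point.
        if (p.votes > best.votes) best = p;
    }
    return best;
}
```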
Table 5.1: Learning data for parabolas. Entries falling outside the limits given above were rejected

        UPPER EYELID                 LOWER EYELID
  a_u       b_u      c_u       a_l       b_l      c_l
  0.005    -1.942    213      -0.002     0.828    183
  0.002    -0.587    81       -0.002     0.45     197
  0.011    -4.11     405      -0.001     0.36     193
  0.002    -1.07     161      -0.003     1.31     91
  0.002    -0.709    130      -0.001     0.45     194
  0.002    -0.735    143       0.0002   -0.211    270
  0.0016   -0.505    122      -0.0016    0.436    207
  0.0023   -0.8      125      -0.001     0.666    177
  0.0015   -0.483    105      -0.009     3.05     -5.8
  0.0024   -0.824    132      -0.0019    0.479    189
 -5.24      0.024    53       -0.002     0.54     203
  0.0018   -0.533    95       -0.0022    0.665    177
  0.003    -1.518    214      -0.001     0.648    156
  0.0029   -0.82     95       -0.002     0.608    154
  0.016    -8.09     1073     -0.0014    0.42     212
  0.002    -0.635    99.2     -0.005     1.49     111
  0.0008   -0.245    75       -0.002     0.933    136
  0.003    -0.84     99        6.14     -0.021    220
  0.0006   -0.188    48       -0.001     0.45     194
  0.001    -0.384    81       -0.002     0.642    199
  0.0018   -0.606    121      -0.002     0.794    160
  0.0018   -0.49     76       -0.0009    0.1      217
  0.0015   -0.246    90       -0.0024    0.79     190
  0.0023   -1.139    188      -0.002     1.17     92
  0.002    -0.506    54       -0.002     0.71     157
  0.004    -1.15     140      -0.003     0.79     204
  0.0014   -0.62     115      -0.0036    0.8366   201
  0.0009   -0.37     84       -0.0016    0.58     143
  0.001    -0.69     128      -0.004     1.51     92
  0.002    -0.606    93       -0.002     0.566    177
(a) Input image 1
(b) Eyelid localization of input image 1
(c) Input image 2
(d) A curve fitted to input image 2
(e) A fitted curve on input image 2 using learned parabola parameters
Figure 5.10: Eyelid localization using RANSAC and learned parameters
(a) Input image
(b) Eyelid localization of input image
(c) Noisy input image
(d) A parabola fitted through the noisy input image
Figure 5.11: Eyelid localization
5.3 Iris normalization
Once the region of interest is isolated, it is transformed to a dimensionless polar system with its eyelids masked out. This form is standard irrespective of iris size, pupil diameter or resolution. The implemented algorithm is based on Daugman's [36] stretched polar co-ordinate system. The idea behind the dimensionless polar system is to assign an r and θ value to each co-ordinate in the iris that will remain invariant to the possible stretching and skewing of the image. For our transformation, the r value ranges over [0, 1] and the angular value spans the interval [0, 2π]. The remapping is done according to the following formulas, which we repeat for completeness:
x(r, \theta) = (1 - r) x_p(\theta) + r x_i(\theta), \quad y(r, \theta) = (1 - r) y_p(\theta) + r y_i(\theta)    (5.18)
x_p(\theta) = x_{p0} + r_p \cos(\theta), \quad y_p(\theta) = y_{p0} + r_p \sin(\theta)    (5.19)
x_i(\theta) = x_{i0} + r_i \cos(\theta), \quad y_i(\theta) = y_{i0} + r_i \sin(\theta)    (5.20)
The center of the pupil is denoted by (xp0 , yp0 ) and (xi0 , yi0 ) is the center of the iris; rp is the radius of the pupil and ri is the radius of the iris; and (xp , yp ) and (xi , yi ) are the coordinates of points bordering the pupil’s and the iris’ radii respectively. The remapping is done so that the transformed image is a rectangle with dimension 544×96. The transformation process changes the shape of the eyelashes while freckles and spots are not adversely affected. The eyelashes appear curved and circular. Figure 5.12 shows a localized iris, the transformed annular region and the transformed region with the eyelids masked out.
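A sketch of the remapping of Equations 5.18 to 5.20 is shown below; nearest-neighbour sampling and the handling of out-of-range coordinates are assumptions of the illustration, and the eyelid masking step is omitted.

```cpp
#include <vector>
#include <cmath>

typedef std::vector<unsigned char> Img;   // row-major image, width w, height h

// Map the annular iris region between the pupil circle (xp0, yp0, rp) and the
// iris circle (xi0, yi0, ri) to an outW x outH rectangle (544 x 96 here).
Img normalizeIris(const Img &src, int w, int h,
                  double xp0, double yp0, double rp,
                  double xi0, double yi0, double ri,
                  int outW, int outH) {
    const double PI = 3.14159265358979323846;
    Img out(outW * outH, 0);
    for (int col = 0; col < outW; ++col) {
        double theta = 2.0 * PI * col / outW;               // theta in [0, 2*pi)
        double xp = xp0 + rp * std::cos(theta);             // pupil boundary point (Eq. 5.19)
        double yp = yp0 + rp * std::sin(theta);
        double xi = xi0 + ri * std::cos(theta);             // iris boundary point (Eq. 5.20)
        double yi = yi0 + ri * std::sin(theta);
        for (int row = 0; row < outH; ++row) {
            double r = (double)row / (outH - 1);            // r in [0, 1]
            int x = (int)((1.0 - r) * xp + r * xi + 0.5);   // Equation 5.18
            int y = (int)((1.0 - r) * yp + r * yi + 0.5);
            if (x >= 0 && x < w && y >= 0 && y < h)
                out[row * outW + col] = src[y * w + x];
        }
    }
    return out;
}
```

A call such as normalizeIris(img, 320, 240, px, py, pr, ix, iy, ir, 544, 96) would produce the rectangular texture used in the following sections.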
(a) A localized iris
(b) The localized iris region normalized
(c) The normalized region with the eyelids masked out
Figure 5.12: Iris normalization
5.4 Texture preprocessing
In most machine vision segmentation applications, preprocessing is an important requirement for achieving good results. Preprocessing aims to enhance an image by improving the quality of the data present. It must be mentioned that the information content is not increased in any way. The normalized iris image must be enhanced to improve texture contrast and details. When attempting to enhance the image, we must also take care not to deteriorate the quality of the image due to the ad hoc nature of some approaches. The assumptions that facilitate some algorithmic approaches may not apply in some cases or produce undesirable results. Our first task is to compensate for uneven illumination. Methods are presented in the literature for reducing uneven lighting and improving the contrast of texture details [72, 74]. These techniques, however, can be viewed as simple approximations for the intensity gradient. Point spread functions and the Retinex model for lighting normalization, and then texture enhancement, are discussed.
5.4.1 Point spread functions
A simple formula for flattening illumination and improving contrast is
f_0 = \frac{\sigma_0}{\sigma_{w_{ij}}} (f_{ij} - \tilde{f}_{w_{ij}}) + \tilde{f}_0    (5.21)
where \sigma_0 is the image standard deviation, \sigma_{w_{ij}} the standard deviation of the pixels in window w_{ij}, \tilde{f}_{w_{ij}} the mean of the pixels in w_{ij} and \tilde{f}_0 the image mean [72]. This, however, is not very effective in reducing uneven illumination. The illumination degradation model can also be stated as
e \cdot f = g    (5.22)
where e is the degradation factor, f the true image and g the acquired image. Then the true image can be reconstructed as
f = \frac{g}{e}    (5.23)
if we can estimate e. One way to estimate e is to use a point spread function (PSF) to estimate the degradation. It is a function that defines the propagation of light from a point source. The point source is a pixel, which propagates energy in all directions to its neighbouring pixels. These functions can model the image formation process and estimate the degradation factor by taking into account the energy of surrounding pixels. A simple PSF is the Gaussian function. Convolving an image with a Gaussian of specified spatial extent will produce an estimated degradation factor, represented by the output image. An approximation of the true image can be derived by dividing the acquired image by the estimated degradation factor image. Results of this process are shown in Figure 5.13. A small sigma produces a grainy image while a large sigma has negligible effect. The contrast in the output images has been enhanced since the PSF has poor grey scale range output. The illumination in these outputs is flatter and the contrast much better - tiny texture details and the eyelashes are more pronounced - than in the input image. Figure 5.14 shows two images that have been contrast enhanced, one with PSF processing and the other without. The PSF flattens the illumination and also improves the range of grey scale values. Selecting a PSF is difficult because we do not know the illumination composition in an input image. This problem is solved by using the Retinex model.
(a) Input image
(b) PSF applied to input image using σ = 4
(c) PSF applied to input image using σ = 8
(d) PSF applied to input image using σ = 12
(e) PSF applied to input image using σ = 16
Figure 5.13: Using point spread functions for normalizing image illumination. Note that the PSF outputs have been contrast enhanced
(a) Input image
(b) Input image with its contrast enhanced
(c) Input image processed using a PSF with σ = 16 and then contrast enhanced
Figure 5.14: Enhanced images, without and with illumination flattening using a PSF
5.4.2 The Retinex model
Uneven illumination in an image causes discontinuities that break up objects when segmentation is performed. It may also degrade object and texture details, making different objects appear the same. The opposite can also occur - similar objects can appear very different. The multi-scale Retinex [46, 47] is used for illumination flattening. The Retinex is a class of point spread functions. The single scale Retinex is defined as r(x, y) = log[f (x, y)] − log[g(x, y) ∗ f (x, y)]
(5.24)
where r(x, y) is the Retinex output image for an input image f (x, y). Convolution is denoted by ∗ and g(x, y) is a Gaussian function. The Retinex algorithm improves and balances illumination in an image. It can be used to flatten the illumination, thereby
reducing shadow effects and uneven patches in the image. It is a general purpose algorithm that aims to produce a good representation of a scene. The multi-scale Retinex output is
m = \sum_{i=1}^{n} \omega_i r_i    (5.25)
where ri is a single scale Retinex output with σi for the Gaussian and weighting ωi . There are n scales, each defined by its σ value. In our implementation, n = 5 and σi ∈ {8, 16, 20, 24, 28}. Figure 5.15 shows outputs, from a PSF and the multi-scale
Retinex, that have been contrast enhanced. A disadvantage of the Retinex is that it
can sometimes desaturate the image. We address this problem in the next section by discussing the contrast enhancement method.
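The multi-scale Retinex of Equations 5.24 and 5.25 can be sketched as follows. Equal weights for the five scales and the separable Gaussian blur are assumptions of this illustration; the floating point output must still be rescaled and contrast enhanced as described in the next section.

```cpp
#include <vector>
#include <cmath>
#include <algorithm>

typedef std::vector<double> Buf;

// Separable Gaussian blur used for g(x, y) * f(x, y).
static Buf gaussBlur(const Buf &in, int w, int h, double sigma) {
    int r = (int)(3.0 * sigma);
    std::vector<double> k(2 * r + 1);
    double s = 0.0;
    for (int i = -r; i <= r; ++i) { k[i + r] = std::exp(-0.5 * i * i / (sigma * sigma)); s += k[i + r]; }
    for (size_t i = 0; i < k.size(); ++i) k[i] /= s;
    Buf tmp(in.size()), out(in.size());
    for (int y = 0; y < h; ++y)                      // horizontal pass
        for (int x = 0; x < w; ++x) {
            double v = 0.0;
            for (int i = -r; i <= r; ++i)
                v += k[i + r] * in[y * w + std::min(std::max(x + i, 0), w - 1)];
            tmp[y * w + x] = v;
        }
    for (int y = 0; y < h; ++y)                      // vertical pass
        for (int x = 0; x < w; ++x) {
            double v = 0.0;
            for (int i = -r; i <= r; ++i)
                v += k[i + r] * tmp[std::min(std::max(y + i, 0), h - 1) * w + x];
            out[y * w + x] = v;
        }
    return out;
}

// Multi-scale Retinex with n = 5 and sigma in {8, 16, 20, 24, 28}.
Buf multiScaleRetinex(const std::vector<unsigned char> &img, int w, int h) {
    const double sigmas[5] = {8, 16, 20, 24, 28};
    Buf f(img.size()), m(img.size(), 0.0);
    for (size_t i = 0; i < img.size(); ++i) f[i] = img[i] + 1.0;   // avoid log(0)
    for (int s = 0; s < 5; ++s) {
        Buf blurred = gaussBlur(f, w, h, sigmas[s]);
        for (size_t i = 0; i < f.size(); ++i)                      // Equation 5.24
            m[i] += (std::log(f[i]) - std::log(blurred[i])) / 5.0; // Equation 5.25
    }
    return m;
}
```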
(a) Input image
(b) PSF processed image with contrast enhancement
(c) Retinex processed image with contrast enhancement
Figure 5.15: Enhanced images, with PSF and Retinex
5.4.3 Texture enhancement
The application of a PSF or multi-scale Retinex to an image can desaturate its grey level range and reduce the image contrast. Figures 5.16 (a) and (b) show an input image and the Retinex output respectively, with the latter being desaturated. The technique discussed in Section 5.2.2 was effective for contrast enhancement but a parameter t had to be specified. To manipulate the histogram for automatic enhancement, we set a fixed threshold t that is a fraction (0 < t < 1) and then scan the histogram from each end, summing the normalized frequencies as we proceed. We stop the scanning as soon as the summed frequencies at each end are greater than or equal to t. The grey level values at these two termination points, g_lower and g_upper, are then noted. This provides us with upper and lower grey scale values for transforming the image. By treating t as a fraction, it is ensured that we are not making an assumption about the position of the histogram end points. This algorithm is effective if the histogram endpoints are not close to the ends of the grey scale range. Equation 5.10 is now revised so that it becomes
g = T(z) = \begin{cases} 0 & z < g_{lower} \\ 255 \times \frac{z - g_{lower}}{g_{upper} - g_{lower}} & g_{lower} \le z \le g_{upper} \\ 255 & z > g_{upper} \end{cases}    (5.26)
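The automatic enhancement procedure can be sketched as follows; the value of the fraction t shown in the usage comment is illustrative only.

```cpp
#include <vector>

typedef std::vector<unsigned char> Img;

// Automatic linear contrast stretch (Equation 5.26): scan the normalized
// histogram from both ends until the accumulated frequency reaches t,
// then stretch the grey levels between g_lower and g_upper.
// Typical usage: Img enhanced = autoContrastStretch(retinexOutput, 0.01);
Img autoContrastStretch(const Img &img, double t) {
    double hist[256] = {0.0};
    for (size_t i = 0; i < img.size(); ++i) hist[img[i]] += 1.0;
    for (int g = 0; g < 256; ++g) hist[g] /= (double)img.size();

    int gLower = 0, gUpper = 255;
    double sum = 0.0;
    for (int g = 0; g < 256; ++g) { sum += hist[g]; if (sum >= t) { gLower = g; break; } }
    sum = 0.0;
    for (int g = 255; g >= 0; --g) { sum += hist[g]; if (sum >= t) { gUpper = g; break; } }
    if (gUpper <= gLower) return img;                 // degenerate histogram, leave unchanged

    Img out(img.size());
    for (size_t i = 0; i < img.size(); ++i) {
        int z = img[i];
        if (z < gLower) out[i] = 0;
        else if (z > gUpper) out[i] = 255;
        else out[i] = (unsigned char)(255.0 * (z - gLower) / (gUpper - gLower));
    }
    return out;
}
```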
Figures 5.16 (c) - (e) show the results for different enhancement methods. Our
approach is far superior to traditional histogram equalization. Local histogram equalization computes a transform function in a local neighbourhood for each pixel. The result in (e) was produced using a 63 × 63 window. As can be seen, the methods of (d) and (e) actually degrade the image patterns and their distinctness. Grey level range is also inferior compared to (c).
(a) Input image
(b) Retinex output
(c) Retinex output with contrast enhancement
(d) Retinex output with histogram equalization
(e) Retinex output with local histogram equalization
Figure 5.16: Contrast enhancement of Retinex output
5.5 Parameter estimation and feature extraction
Once an iris has been localized, transformed and enhanced, we implement four texture analysis methods: 1. Grey level co-occurrence matrices (GLCM) 2. Gabor filtering (GABOR) 3. Discrete wavelet transform (DWT) 4. Markov random field (MRF) for feature extraction. This establishes a set of region properties for each pixel. First, there is a parameter estimation step where a window size is estimated for computing texture properties; texture parameters are also defined. Then we perform the feature extraction. The iris images being analyzed contain small structures - freckles, spots caused by reflection and eyelashes which manifest as edge like structures. Small windows are suited for these objects. Large windows will destroy the boundary information of these elements or may engulf them completely with the information of surrounding objects. This will make segmentation difficult since their discriminating characteristics will not be apparent. Using small windows for texture analysis implies that some operators will also be sensitive to noise and fluctuations. This problem is addressed by smoothing the features after feature extraction. An advantage of using small windows for analysis is that the computational burden is reduced drastically. This makes real time automated segmentation highly possible. A small window may also not capture sufficient texture information while a large window may include data from different textures. Using large windows destroys important boundary information, making the segmentation imprecise. It also introduces a third texture possibility at the borders. The large window and two types of texture present create this new element. In addition, tiny details will not be captured. From empirical results, we supply the following window sizes for the texture methods: 1. GLCM = 9 × 9. 2. DWT = 8 × 8.
3. MRF = 9 × 9. 4. The GABOR kernel size is dependent on the parameter λ. For comparative purposes, we ensure that the maximum kernel size is 9 × 9. This is performed
by carefully selecting the λs.
The window size for smoothing is 9 × 9 for the GABOR and MRF. We use 5 × 5 for
DWT and GLCM. The smoothing operator computes the average value within the subwindow for each feature type and assigns a pixel that value as a texture property. These parameters are estimated based on experimental results and a visual analysis of texture borders, fragmentation and accuracy. Figure 5.17 shows segmentation results using varying window sizes. A small window size, such as in (b), produces a fragmented result. It is also sensitive to borders and gradients and detects them easily. However, it does not capture enough image information. The result shown in (c) is much better but there is inaccuracy along the borders of the eyelashes. (d) detects the entire eyelash region but with poor accuracy. The borders are not accurate and there are eyelash pixels that are not detected. These results demonstrate the necessity to determine an effective texture window size for an algorithm.
(a) Input image
(b) 8 × 8 window
(c) 16 × 16 window
(d) 24 × 24 window
Figure 5.17: The effect of texture window size on segmentation with window sizes of 8 × 8, 16 × 16 and 24 × 24
5.5.1 GLCM
Four texture measures are computed from GLCMs generated from the input image. They are:
CONT = \sum_{i,j=1}^{N} (i - j)^2 \, p_{\theta,d}(i, j)    (5.27)
ENTR = - \sum_{i,j=1}^{N} p_{\theta,d}(i, j) \log p_{\theta,d}(i, j)    (5.28)
MEAN = \mu = \mu_x = \sum_{i=1}^{N} i \, P_x(i)    (5.29)
SDEV = \sqrt{ \sum_{i=1}^{N} P_x(i) (i - \mu)^2 }    (5.30)
where
P_x(i) = \sum_{j=1}^{N} p_{\theta,d}(i, j)    (5.31)
These features are selected by considering the guidelines presented in [95]. For computational efficiency and improved co-occurrence relations of pixels, we reduce the number of grey levels in the input image to 32. This is done using a simple linear transform. Let G be the desired number of grey levels and pixelold be the initial grey scale value of a pixel. Then pixelnew =
pixel_{new} = \frac{pixel_{old} \times G}{255}    (5.32)
Note that pixelnew is a whole number and not a fraction. Fractional parts should be dropped or rounded off. This transforms the input grey levels to the range {0, 1, . . . , G − 1}. We use θ ∈ {0◦ , 45◦ , 90◦ , 135◦ } and d ∈ {1}. Due to the small
size of the analysis window, it would be infeasible to consider more distances or val-
ues greater than 1. In addition, we want to keep the dimensionality of the features vector at a minimum. These parameters produce 16 features for each pixel in the input image. GLCMs are implemented dynamically so as to create matrices without zero entries. The grey level value for an index is stored in a list which is referenced efficiently. An input image of size 544 × 96 is processed in approximately 6 seconds.
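For one analysis window and one (θ, d) offset, the four features can be computed as in the sketch below. The dynamic, list-based GLCM mentioned above is replaced here by a dense G × G matrix for clarity, and the image is assumed to be already quantized to G = 32 levels with Equation 5.32.

```cpp
#include <vector>
#include <cmath>

struct GlcmFeatures { double cont, entr, mean, sdev; };

// GLCM features for the win x win window with top-left corner (x0, y0) and
// displacement (dx, dy); q is the quantized image (values 0..G-1), width w.
GlcmFeatures glcmFeatures(const std::vector<int> &q, int w,
                          int x0, int y0, int win, int dx, int dy, int G) {
    std::vector<double> p(G * G, 0.0);
    double total = 0.0;
    for (int y = y0; y < y0 + win; ++y)                    // accumulate co-occurrences
        for (int x = x0; x < x0 + win; ++x) {
            int x2 = x + dx, y2 = y + dy;
            if (x2 < x0 || x2 >= x0 + win || y2 < y0 || y2 >= y0 + win) continue;
            p[q[y * w + x] * G + q[y2 * w + x2]] += 1.0;
            total += 1.0;
        }
    for (int i = 0; i < G * G; ++i) p[i] /= total;         // normalize to probabilities

    GlcmFeatures f = {0.0, 0.0, 0.0, 0.0};
    std::vector<double> px(G, 0.0);
    for (int i = 0; i < G; ++i)
        for (int j = 0; j < G; ++j) {
            double v = p[i * G + j];
            f.cont += (double)(i - j) * (i - j) * v;       // Equation 5.27
            if (v > 0.0) f.entr -= v * std::log(v);        // Equation 5.28
            px[i] += v;                                    // Equation 5.31
        }
    for (int i = 0; i < G; ++i) f.mean += i * px[i];       // Equation 5.29
    for (int i = 0; i < G; ++i) f.sdev += px[i] * (i - f.mean) * (i - f.mean);
    f.sdev = std::sqrt(f.sdev);                            // Equation 5.30
    return f;
}
```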
5.5.2 GABOR
We use the GABOR function of Jain [12] and Kruizinga [70]
g(x, y) = \exp\left\{ -\frac{1}{2} \left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} \right) \right\} \cos\left( \frac{2 \pi x}{\lambda} + \phi \right)    (5.33)
with Bf set to 1 and Bθ to 30◦ as suggested by [34]. This resolves to the following equations σx = 0.562λ
(5.34)
σy = 0.699λ
(5.35)
when simplifying Equations 3.57 and 3.58. Features are extracted using the GABOR energy method [70]. The outputs of a symmetric and an antisymmetric filter pair are combined, yielding the GABOR energy quantity
E_{\lambda,\theta} = \sqrt{ R_{even}^2 + R_{odd}^2 }    (5.36)
We use \phi \in \{0, \frac{\pi}{2}\}, where \phi = 0 gives the symmetric filter and \phi = \frac{\pi}{2} the antisymmetric filter. Since B_\theta = 30°, we set \theta \in \{0°, 30°, 60°, 90°, 120°, 150°\}. To generate small kernels of
maximum size 9 × 9, the Gaussian spread must be examined. In a 1D Gaussian
centered at x = 0, approximately 99.7% of the data is contained in the range [−3σ, 3σ]. Using this
idea for the 2D case, we take max(σx , σy ) and compute a width for a window centered at the origin of the GABOR function. From Equation 5.34 and 5.35, we can see that the maximum spread is in the y direction. If a maximum kernel size of 9×9 is desired, then 6 × σy = 9
(5.37)
Using Equation 5.35 and 5.37, λ should not exceed 2.14. We use λ ∈ {1.41, 2.82},
providing 12 Gabor features. λ = 1 is sensitive to noise so it is omitted. The results are smoothed with a Gaussian that has spatial extent 1.3 times that of the corresponding filter due to the sensitivity of the kernels to noise. These results are then smoothed with the averaging filter.
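A sketch of the GABOR energy computation for a single (λ, θ) pair is given below. The usual coordinate rotation for the orientation θ is assumed here, since the rotated form of Equation 5.33 is given in Chapter 3 and not repeated; the kernel size and border handling are illustrative choices.

```cpp
#include <vector>
#include <cmath>

typedef std::vector<double> Buf;

// Gabor kernel of Equation 5.33 with sigma_x = 0.562*lambda, sigma_y = 0.699*lambda.
static Buf gaborKernel(double lambda, double theta, double phi, int size) {
    const double sx = 0.562 * lambda, sy = 0.699 * lambda, PI = 3.14159265358979;
    Buf k(size * size);
    int r = size / 2;
    for (int y = -r; y <= r; ++y)
        for (int x = -r; x <= r; ++x) {
            double xr =  x * std::cos(theta) + y * std::sin(theta);   // assumed rotation
            double yr = -x * std::sin(theta) + y * std::cos(theta);
            k[(y + r) * size + (x + r)] =
                std::exp(-0.5 * (xr * xr / (sx * sx) + yr * yr / (sy * sy)))
                * std::cos(2.0 * PI * xr / lambda + phi);
        }
    return k;
}

// Gabor energy at pixel (px, py): combine the symmetric (phi = 0) and the
// antisymmetric (phi = pi/2) responses as in Equation 5.36.
double gaborEnergy(const std::vector<unsigned char> &img, int w, int h,
                   int px, int py, double lambda, double theta, int size) {
    Buf even = gaborKernel(lambda, theta, 0.0, size);
    Buf odd  = gaborKernel(lambda, theta, 3.14159265358979 / 2.0, size);
    int r = size / 2;
    double re = 0.0, ro = 0.0;
    for (int y = -r; y <= r; ++y)
        for (int x = -r; x <= r; ++x) {
            int xx = px + x, yy = py + y;
            if (xx < 0 || xx >= w || yy < 0 || yy >= h) continue;
            double v = img[yy * w + xx];
            re += v * even[(y + r) * size + (x + r)];
            ro += v * odd[(y + r) * size + (x + r)];
        }
    return std::sqrt(re * re + ro * ro);
}
```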
5.5.3 DWT
10 features are extracted by a DWT algorithm. A generalized Haar [84, 58] algorithm is used to decompose the image, extracting detail and approximation coefficients by
applying the kernels illustrated in Figure 3.18 of Chapter 3. To perform feature extraction, an 8×8 window is centered at each pixel and 2 passes of the Haar algorithm are performed on windowed pixels. At each scale, 4 sub-images are produced from the detail and approximation coefficients - LL, LH, HL and HH. Features are computed for LL, LH and HH. For LL, the mean and AAD are computed. For LH and HL, the energy is computed. The wavelet energies describe the frequency content of the image. Thus, for each pass, 4 features are computed. In addition, the mean and AAD are computed for the original image. This provides a total of 10 features for the DWT transform of a texture.
5.5.4 MRF
An important parameter in the MRF texture model for successful segmentation is the neighbourhood order. Complex textures should be modelled using a high order. It is difficult to adapt the interaction structure for a particular texture and is not feasible for an automatic solution. We use order 2 and implement a GMRF that generates 4 texture features. A zero mean image is computed using a window with size 3 × 3. Features are computed using the method of Cesmeli [45], discussed in Chapter 3.
5.6 Training
The segmentation algorithm that is implemented uses supervised classification to identify image regions. It includes a training component that incorporates a priori knowledge for this purpose. During the training process, texture properties are computed for the pixels of a training image, which are labelled according to the image region that they belong to. Hence, given a set of training images, we have texture properties which are labelled. This constitutes the a priori knowledge of the problem. Ideally, the training images will effectively represent the structure of the feature space of unknown samples. This structure of the feature space is represented by the parameters that describe the classifier, which are computed during the training process. The classifier implemented in our system is the Fisher linear discriminant. The training process computes a set of weights w_i for pairs of classes (labels). These weights define the hyperplane of the Fisher classifier that separates 2 classes. We first discuss sample selection, which is important for reducing the number of training
samples for improved running time of the training process. Thereafter, the classifier design is explained.
5.6.1 Sample selection
A common issue in supervised classifier design is the large number of pattern samples used in the training phase. This large data set makes the training process extremely inefficient. We use a method of sample selection that discards features vectors using a KNN approach. We also compute a quantitative measure to assess the ability of the selected samples to strongly represent the original sample set. The number of features vectors for iris image is large - approximately 50 000 of dimension d. If we had to use data from 20 images for training, this amounts to 1 000 000 features vectors, which is too large for practical purposes. We use an ad hoc method to reduce the number of samples. This is based on a KNN partition of a set of features vectors. To evaluate the information content of the selected subset, we consider its ability to predict the labels in the original set using a minimum distance classifier. We select 20 images for training samples. In these images, we label the different image regions, which we denote as IRIS, PUPIL, REFLECTION, SKIN and EYELASH. The labelled image and the original image form the input to an algorithm that computes texture features and then uses the labelled image to construct sets of features vectors belonging to the image regions mentioned above. These pattern sets are then reduced separately, recombined and evaluated for information content. An input image and a labelled image are shown in Figure 5.18. Our sample set f is reduced by removing redundant samples using a KNN algorithm [99]. A k value is first specified and then the algorithm proceeds by selecting a pattern x from f and discarding its k nearest neighbours in f . The selected features vector is placed in a new set fnew . The initial set f will now contain neither x nor its k nearest neighbours. This process is repeated on f until it is empty. From an empirical evaluation, we use k = 20 which selects approximately 2500 vectors (about 5% of the total feature set in the image). It is recommended that the number of training vectors be 10% of the total sample set size [99]. However, we select 5% to ease the computational burden and also reduce the possibility of overtraining. To
evaluate the new sample set in terms of its ability to represent the data of the initial sample set effectively, the selected features vectors set is used to predict the class labels of the original image. Given an image pixel and its features vector, we assign to the pixel the label of the vector in the reduced set that is nearest to it (we use the Euclidean distance for feature comparison). The predicted labels are then compared to the actual class labels and a normalized prediction accuracy is computed - 1 denotes the ability to predict all labels correctly; 0 denotes no labels being predicted correctly at all. Figure 5.19 shows the prediction accuracy for the 20 training images. The accuracies for the GLCM, GABOR and DWT methods are similar. However, the MRF accuracy is significantly lower. This highlights the fact that more samples are required, as compared to the other methods, to effectively represent its pattern set. Since we are using a KNN approach to remove redundant samples, we can conclude that the features vectors of the GLCM, GABOR and DWT are generally more compact than the MRF. The KNN algorithm will retain important information if clusters are compact since a selected features vector has had neighbours discarded that are very close to it in feature space. In the case of the MRF, a pattern not similar to the selected one but included in the set of k nearest neighbours will be discarded.
(a) Input image
(b) Labelled image
Figure 5.18: Input image and labelled image
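The KNN reduction step described above can be sketched as follows; the brute-force neighbour search and the order in which patterns are selected are illustrative choices (k = 20 in our experiments).

```cpp
#include <vector>
#include <cmath>
#include <algorithm>
#include <utility>

typedef std::vector<double> Vec;

static double dist2(const Vec &a, const Vec &b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Pick a pattern, keep it, discard its k nearest neighbours, repeat until empty.
std::vector<Vec> reduceSamples(std::vector<Vec> f, int k) {
    std::vector<Vec> fnew;
    while (!f.empty()) {
        Vec x = f.back();                       // select a pattern (here: the last one)
        f.pop_back();
        fnew.push_back(x);
        // rank the remaining patterns by distance to x and drop the k nearest
        std::vector<std::pair<double, size_t> > d;
        for (size_t i = 0; i < f.size(); ++i) d.push_back(std::make_pair(dist2(x, f[i]), i));
        std::sort(d.begin(), d.end());
        std::vector<bool> drop(f.size(), false);
        for (size_t i = 0; i < d.size() && i < (size_t)k; ++i) drop[d[i].second] = true;
        std::vector<Vec> rest;
        for (size_t i = 0; i < f.size(); ++i)
            if (!drop[i]) rest.push_back(f[i]);
        f.swap(rest);
    }
    return fnew;
}
```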
137 Compactness is highly desirable for homogeneity. Figure 5.20 presents the average prediction accuracy for all four texture methods. The GLCM features vectors have the best discrimination since they have the highest prediction accuracy. This is followed by the GABOR and DWT patterns. The MRF has the poorest accuracy and is approximately 5% lower than the other methods, which is about 2600 pixels erroneously classified. Our method of subset selection of samples provides a good estimate of the original feature space since 88 to 95% of the patterns can be correctly predicted for the different texture analysis methods. We can assume that data integrity is maintained in the reduced sample set. The selected patterns are stored in a text file for each class and texture method. Hence, we have 20 text files of selected data (4 texture methods and 5 region classes). These text files provide the data for the classifier design process which is discussed next.
Figure 5.19: Prediction accuracy for training images
Figure 5.20: Average prediction accuracy
5.6.2 Classifier design
We design a simple hybrid classifier that uses the 2 class Fisher linear discriminant discussed in Chapter 4. Our sample selection algorithm has produced a reduced set of samples with corresponding image region labels. This reduced set consists of data from the 20 training images. We use these features and labels to compute a discriminant for every combination of two classes (producing 10 discriminants). This is done for each texture method to complete the learning process. Given the five image region classes (IRIS, PUPIL, REFLECTION, SKIN and EYELASH), we have the following 10 two category combinations:
• IRIS-EYELASH
• IRIS-SKIN
• IRIS-REFLECTION
• IRIS-PUPIL
• EYELASH-SKIN
• EYELASH-REFLECTION
• EYELASH-PUPIL
• SKIN-REFLECTION
• SKIN-PUPIL
• REFLECTION-PUPIL
For each pair, the Fisher method computes a weight vector w for the separating plane. As such, an n-dimensional sample is projected to a single dimension. If we view the projected points graphically, two classes can be seen separated by an imaginary line. However, there will be a number of points that overlap if the classes are not perfectly separable. Figure 5.21 shows this visualization. We have selected 200 sample points each for the EYELASH class and the IRIS class (for the DWT). They are computed using the IRIS-EYELASH discriminant. These points are then joined for each class, as shown in Figure 5.21, to view the projection as a signal.
Figure 5.21: A graphical view of the FLD projection using a two class discriminant. The two classes are separated by the line y = 0. However, there exists a class overlap
We then model the Fisher 1-dimensional projections for each class pair as two Gaussians (one for each class). For each set of projected 1-dimensional points, the mean \mu and standard deviation \sigma for that set are computed. The distribution for the set of points is assumed to be the univariate Gaussian function
g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -0.5 \left( \frac{x - \mu}{\sigma} \right)^2 \right]    (5.38)
Hence, for each pair there are 2 probability distributions. Given these 2 distributions, we use Bayesian decision theory to classify a feature. A pattern x is classified into class ωi if P (ωi | x) = max(P (ω1 | x), P (ω2 | x))
(5.39)
For the ten discriminants, there are ten outputs. We decide the class label of a feature as being the most common label in the ten outputs. In the case of a tie, a class is assigned randomly from those in the tie set. Tables 5.2 - 5.5 present the Gaussian parameters for discriminant pairs for each texture method.
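The classification of a single features vector by the hybrid classifier can be sketched as follows; equal priors are assumed in the Bayesian step and the random tie breaking is omitted for brevity.

```cpp
#include <vector>
#include <cmath>

typedef std::vector<double> Vec;

struct PairDiscriminant {
    Vec w;                       // Fisher weight vector for this class pair
    int class1, class2;          // region labels (e.g. IRIS = 0, EYELASH = 1, ...)
    double mu1, sigma1;          // Gaussian fitted to class-1 projections
    double mu2, sigma2;          // Gaussian fitted to class-2 projections
};

static double gaussian(double x, double mu, double sigma) {          // Equation 5.38
    double z = (x - mu) / sigma;
    return std::exp(-0.5 * z * z) / (std::sqrt(2.0 * 3.14159265358979) * sigma);
}

// Project the feature on each of the ten discriminants, pick the more probable
// class per pair (Equation 5.39), and return the majority label.
int classifyFeature(const Vec &x, const std::vector<PairDiscriminant> &pairs,
                    int numClasses) {
    std::vector<int> votes(numClasses, 0);
    for (size_t p = 0; p < pairs.size(); ++p) {
        double proj = 0.0;                                // 1-D Fisher projection
        for (size_t i = 0; i < x.size(); ++i) proj += pairs[p].w[i] * x[i];
        double p1 = gaussian(proj, pairs[p].mu1, pairs[p].sigma1);
        double p2 = gaussian(proj, pairs[p].mu2, pairs[p].sigma2);
        votes[p1 >= p2 ? pairs[p].class1 : pairs[p].class2]++;
    }
    int best = 0;
    for (int c = 1; c < numClasses; ++c)
        if (votes[c] > votes[best]) best = c;             // most common label wins
    return best;
}
```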
Table 5.2: GLCM 2 class distributions based on FLD transform
CLASS 1 (C1)   CLASS 2 (C2)   µC1        σC1        µC2         σC2
IRIS           EYELASH        0.000027   0.000053   -0.000132   0.000091
IRIS           SKIN           0.000033   0.000067   -0.000196   0.000120
IRIS           REFLECTION     0.000046   0.000098   -0.000610   0.000475
IRIS           PUPIL          0.000024   0.000155   -0.001058   0.000356
EYELASH        SKIN           0.000148   0.000176   -0.000236   0.000253
EYELASH        REFLECTION     0.000011   0.000192   -0.000511   0.000519
EYELASH        PUPIL          0.000355   0.000841   -0.005235   0.001412
SKIN           REFLECTION     0.000297   0.000448   -0.000822   0.000613
SKIN           PUPIL          0.000392   0.001101   -0.005177   0.001317
REFLECTION     PUPIL          0.001155   0.002504   -0.012093   0.002579
Table 5.3: GABOR 2 class distributions based on FLD transform
CLASS 1 (C1)   CLASS 2 (C2)   µC1        σC1        µC2         σC2
IRIS           EYELASH        0.000019   0.000051   -0.000123   0.000084
IRIS           SKIN           0.000007   0.000027   -0.000030   0.000043
IRIS           REFLECTION     0.000030   0.000095   -0.000402   0.000237
IRIS           PUPIL          0.000016   0.000126   -0.000685   0.000255
EYELASH        SKIN           0.000202   0.000158   -0.000025   0.000165
EYELASH        REFLECTION     0.000115   0.000192   -0.000479   0.000371
EYELASH        PUPIL          -0.000076  0.000450   -0.001678   0.000756
SKIN           REFLECTION     0.000027   0.000420   -0.000880   0.000516
SKIN           PUPIL          -0.000079  0.000667   -0.002569   0.001072
REFLECTION     PUPIL          0.001126   0.001137   -0.001976   0.001341
Table 5.4: DWT 2 class distributions based on FLD transform
CLASS 1 (C1)   CLASS 2 (C2)   µC1        σC1        µC2         σC2
IRIS           EYELASH        0.000025   0.000049   -0.000136   0.000109
IRIS           SKIN           0.000029   0.000061   -0.0000166  0.000114
IRIS           REFLECTION     0.000051   0.000110   -0.000626   0.000394
IRIS           PUPIL          0.000016   0.000151   -0.001072   0.000432
EYELASH        SKIN           0.000144   0.000156   -0.000132   0.000207
EYELASH        REFLECTION     -0.000038  0.000181   -0.000449   0.000442
EYELASH        PUPIL          -0.000003  0.000443   -0.001587   0.000770
SKIN           REFLECTION     0.000072   0.000368   -0.000882   0.000649
SKIN           PUPIL          -0.000327  0.000798   -0.003398   0.001352
REFLECTION     PUPIL          0.001664   0.001724   -0.002991   0.001135
Table 5.5: MRF 2 class distributions based on FLD transform
CLASS 1 (C1)   CLASS 2 (C2)   µC1        σC1        µC2         σC2
IRIS           EYELASH        0.000016   0.000047   -0.000109   0.000079
IRIS           SKIN           0.000005   0.000019   -0.000013   0.000030
IRIS           REFLECTION     0.000025   0.000080   -0.000270   0.000173
IRIS           PUPIL          0.000026   0.000090   -0.000336   0.000193
EYELASH        SKIN           0.000210   0.000153   -0.000025   0.000180
EYELASH        REFLECTION     -0.000062  0.000165   -0.000285   0.000251
EYELASH        PUPIL          -0.000063  0.000238   -0.000483   0.000358
SKIN           REFLECTION     0.000036   0.000388   -0.000731   0.000401
SKIN           PUPIL          -0.000114  0.000558   -0.001220   0.000577
REFLECTION     PUPIL          0.001139   0.001050   -0.000824   0.000865
Figure 5.22: Input image and its ground truth
5.7 Computing ground truths
A ground truth image represents the true segmentation of an image and is used to compute the accuracy of a segmentation result produced by an algorithm. However, it is computed manually by the user. As a result, a level of bias and uncertainty is present since different users will have different opinions of a good segmentation. This will affect our interpretation of the system accuracy. We have not computed a correction factor for this bias. It is assumed that the accuracy computed is within reasonable range of the true segmentation accuracy. Given an input image, we denote iris pixels by the color black (0) and non-iris pixels by the color white (255) for the ground truth image. 100 ground truths are computed, one for each input image in the test set, for segmentation accuracy evaluation.
5.8 Iris segmentation using pattern classification
Once feature extraction for an image is performed, the computed features vectors are used either as input to the training algorithm or input to the image segmentation algorithm. The method of segmentation that we use is region growing. Regions are grown by clustering the features vectors, where each cluster represents an image region. We discuss the clustering parameters and experiments. The algorithms we implement are the K-means (KM) and fuzzy C-means (FCM) clustering methods.
5.8.1 Clustering of features for region growing
The clustering techniques have parameters that need to be supplied and a stopping criterion to be defined. For simplicity and completeness, it is assumed that the stopping criterion for the iterative process is a binary function s that uses the change in centroids at the current iteration and the previous iteration to decide whether to proceed or not. Consider the prototype of each cluster to be the centroid of its members. Let the centroids at step k be Ck = {ck1 , ck2 , . . . , ckn } where n is the number of centroids.
Let the centroids at step k − 1 be Ck−1 = {c(k−1)1 , c(k−1)2 , . . . , c(k−1)n } where n is the
number of centroids. Then
s(C_k, C_{k-1}) = \begin{cases} 0 & | c_{ki} - c_{(k-1)i} | \le \epsilon, \ \forall i = 1 \ldots n \\ 1 & \text{otherwise} \end{cases}    (5.40)
The clustering process runs as long as s evaluates to 1. A maximum number of iterations is introduced so that the process is not exhaustive - this value is 80. We use the function s for the KM algorithm and also for FCM, since evaluating a stopping criterion on the membership matrix U is computationally burdensome. The fuzzy factor m for FCM is 2. From the experiments using the DB index, to be discussed later, we set the number of classes for the clustering algorithms to 6. The Euclidean distance is used as the measure of similarity. The running time of the clustering is improved by randomly selecting 1 sample from every subset of size 10. We cluster these approximately 5000 features, computing cluster prototypes. Then we assign the samples in the original set to the closest prototype. Through empirical observation it is noted that the segmentation results are not adversely affected. Figure 5.23 shows a labelled image produced by the clustering process.
Figure 5.23: A labelled image produced by the clustering process
Experimental results
In our clustering experiments we use 104 test images. The main purpose of these experiments is to:
1. Compare the K-means and fuzzy C-means clustering methods.
2. Compute a global value for the number of clusters in a typical iris image.
3. Evaluate the cluster separability of the four texture analysis techniques.
Central to our clustering experiments is a cluster validity measure. We use the Davies-Bouldin (DB) index. With an increase in sample size and the number of clusters, computation of a cluster validity index becomes time intensive. To alleviate this, n samples are randomly selected from each cluster - n is set to 1000. If the number of samples is less than n, we select all the samples. The DB index is then computed for a particular number of clusters for each of 10 runs and these results are then averaged. This is done for 104 images, with the number of clusters ranging from 2 to 9. This procedure is executed for each combination of texture method and clustering technique (8 outputs per image for each cluster number). The lowest DB index across a range of cluster numbers implies the best clustering result for that number of clusters. We select this (lowest) index to evaluate a clustering result for a texture technique using a particular clustering algorithm. In addition, the texture technique and clustering algorithm with the lowest DB index also imply good separability of clusters and can be used to compare the techniques. Figure 5.24 shows the average DB index for each texture method using a particular clustering algorithm. This average is computed for the 104 test images. A ranking of methods according to lowest index shows the GLCM to be the best. In addition, for most of the texture methods, the FCM algorithm provides the best separability in terms of the DB index. This tells us that the FCM is the better clustering method. We can also deduce that the GLCM method provides the best texture separability since it also has the lowest index amongst all the methods. GLCM produces more clusters for separability in terms of DB index, implying compact and well defined features. A low DB index also implies a higher number of classes for clustering, as
Based on the results in Figure 5.25, we set a global value of 6 for the number of clusters in the clustering algorithms.
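For reference, a direct NumPy sketch of the Davies-Bouldin index (the standard definition with Euclidean distances) is shown below; the function name and the model-selection snippet at the end are assumptions for illustration, not the evaluation code used in these experiments.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: lower values indicate compact, well separated clusters."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == i].mean(axis=0) for i in ids])
    # Within-cluster scatter S_i: mean Euclidean distance of members to their centroid.
    scatter = np.array([np.linalg.norm(X[labels == i] - c, axis=1).mean()
                        for i, c in zip(ids, centroids)])
    k = len(ids)
    ratios = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i != j:
                sep = np.linalg.norm(centroids[i] - centroids[j])   # between-cluster distance
                ratios[i, j] = (scatter[i] + scatter[j]) / sep
    return ratios.max(axis=1).mean()   # average of the worst-case ratio per cluster

# Model selection over the cluster number, e.g. with the kmeans sketch above:
# scores = {k: davies_bouldin(X, kmeans(X, n_clusters=k)[1]) for k in range(2, 10)}
# best_k = min(scores, key=scores.get)
```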
Figure 5.24: Average DB indices
Figure 5.25: Average number of classes for clustering
5.8.2 Iris image region classification for segmentation
As mentioned before, the iris segmentation algorithm is a region based approach. Firstly, the iris is localized and transformed to a size invariant representation. Thereafter, we compute texture features for each pixel and cluster these features, using K-means or fuzzy C-means, to produce homogeneous regions. Each region is represented by a prototype, a cluster centroid. Image regions are then identified by classifying the centroids, using the FLD and a voting process to establish labels. By computing a label for each region, we can distinguish iris texture from non-iris regions. The image is then relabelled by assigning 0 to iris pixels and 255 to non-iris pixels.

The segmentation experiments entail performing the above sequence of steps 8 times for each image: twice for each texture method, once with K-means and once with fuzzy C-means. The clustering algorithms are randomly initialized, which means the algorithm may settle in a local optimum. Therefore, 10 test runs are performed for each clustering method and the segmentation accuracy measure is averaged over them. The final accuracy measure for a particular texture method and clustering algorithm is the average over the 100 test images.

The most popular way of evaluating a segmentation algorithm is to compute the percentage of pixels that it classifies correctly. This measure, however, is not entirely adequate for our problem. The objects that we wish to locate are sometimes small relative to the whole image, so if a large fraction of an object's pixels is misclassified we obtain poor object extraction even though the overall segmentation accuracy remains high. Two segmentation accuracy measures are therefore computed for comparison. The first directly counts the number of correctly classified pixels in the segmentation result; we call this the segmentation accuracy (SA). The second computes a measure that is weighted by the size of the objects in the image; this is called the weighted segmentation accuracy (WSA). Only the weighted measure is described in detail, since the former is straightforward to implement. The weighted segmentation accuracy takes into account the different sizes of objects in the image. We first extract the locations of the pixels in each region Ii from the ground truth and then look at the segmentation result to see the percentage of pixels at these locations that are correctly classified in the result S.
Table 5.6: Average segmentation accuracies (%), with standard deviations shown in brackets
Method    KM(SA)    FCM(SA)    KM(WSA)    FCM(WSA)
GLCM 89.34 (5.48) 89.52 (5.07) 88.90 (4.65) 88.48 (5.07)
GABOR 89.06 (5.40) 89.17 (4.92) 88.01 (5.22) 88.25 (5.19)
DWT 85.15 (7.32) 85.88 (7.22) 86.32 (5.97) 86.42 (5.88)
MRF 86.22 (6.53) 88.34 (5.93) 87.17 (5.52) 87.71 (5.24)
The average over all regions Ii is then computed. It is possible, however, to obtain 100% accuracy for a region Ii while pixels from other regions are also classified as belonging to the label of Ii; in that case, extracting what we take to be region Ii from S yields all the pixels of Ii plus the misclassified pixels. This is accounted for in the evaluation of the other regions, which will have lower accuracies. Hence, the average remains a good estimate of segmentation accuracy. For the two-class case, it is

WSA = \frac{1}{2} \left( \frac{c_1}{n_1} + \frac{c_2}{n_2} \right) \qquad (5.41)
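To make the two measures concrete, the sketch below computes SA and the two-class WSA of (5.41) from a binary ground truth and a binary segmentation result, using the 0 (iris) / 255 (non-iris) labelling convention described above; the function itself is an illustrative assumption, not the evaluation code used in this work.

```python
import numpy as np

def segmentation_accuracies(ground_truth, result):
    """SA: fraction of all pixels labelled correctly.
    WSA (two-class case, eq. 5.41): per-region accuracies averaged, so that small
    regions carry the same weight as large ones."""
    sa = np.mean(ground_truth == result)

    region1 = ground_truth == 0       # iris pixels in the ground truth
    region2 = ground_truth == 255     # non-iris pixels in the ground truth
    c1 = np.sum(result[region1] == 0)      # correctly classified iris pixels
    c2 = np.sum(result[region2] == 255)    # correctly classified non-iris pixels
    n1, n2 = region1.sum(), region2.sum()
    wsa = 0.5 * (c1 / n1 + c2 / n2)
    return sa, wsa
```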
In (5.41), n1 is the size of region 1 as specified by the ground truth, and c1 is the number of pixels at region 1 locations (also specified by the ground truth) that are correctly classified in the segmentation result. The same applies to region 2.

Experimental results

Table 5.6 shows the average results, rounded to two decimal places, for the different algorithms under the two methods of segmentation accuracy evaluation. The standard deviations are shown in brackets next to the averages. Figures 5.26 and 5.27 summarize the performances of the different techniques. Considering both the FCM and KM techniques and using SA, we can rank the four texture methods from best to worst in terms of segmentation accuracy as follows: GLCM, GABOR, MRF and DWT. The FCM also outperforms the KM in all instances, with as much as a 2% increase in accuracy in the case of the MRF. From our experiments with the DB index and sample selection we can also see that, in the case of the DWT, a low DB index and good feature subset prediction accuracy do not necessarily imply a high segmentation accuracy.
Similarly, the MRF method, which generated the highest DB indices and the lowest feature subset prediction accuracies, is ranked third in terms of segmentation accuracy. It can also be seen that the methods with high segmentation accuracies have low standard deviations compared to the inferior methods; a typical segmentation result from these methods is therefore close to the mean accuracy and more stable.

The ranking of the texture methods remains the same when using WSA. However, the KM algorithm produces better results for the GLCM than FCM, while the FCM is still superior for the other texture representations. The segmentation accuracies of the GLCM and GABOR methods drop slightly while those of the DWT and MRF increase slightly. This, however, does not affect the ranking of the algorithms obtained using SA.

Figures 5.28 and 5.29 show an image with many texture variations and much noise. The eyelashes obstructing the texture in the right half of the image are almost indiscernible from the iris texture patterns. Individual eyelashes are difficult to detect since the iris texture itself also has sharp gradients and small shape patterns similar to the eyelashes. The GLCM and FCM produce the most promising results: the tiny pupil region is detected as well as the eyelash region and, most importantly, the detected eyelash area is contiguous and not fragmented as in the other methods. The DWT produces the poorest result, although in this case it is possible that the clustering algorithm settled in a local optimum. All of the methods produce small misclassified areas that fragment the result. These are high contrast pixels, caused by texture variations, which adversely affect the segmentation algorithm.

Figures 5.30 and 5.31 show an iris image with little noise and variation. The texture details of the iris are not as varied as in Figures 5.28 and 5.29, so the overlap between the iris and non-iris feature spaces is much smaller. The results produced by all the methods are acceptable; the MRF result has a few noisy areas. The pupil pixels are detected by all the texture and clustering methods.

Figures 5.32 and 5.33 show an image with texture variations near the pupil boundary. The iris texture region contains both textured and non-textured (almost constant grey tone) pixels. The grey tone of the eyelashes is also similar to some areas of iris texture.
In addition, since our texture window is small, misclassification is highly likely: at this scale, pixel grey levels and patterns are similar for iris and eyelashes, and the texture operators respond almost identically in iris and eyelash regions. This can be seen in the segmentation results. All four methods, while capable of identifying non-iris pixels, also falsely identify a large number of iris pixels as non-iris pixels. Another possible explanation for this result is that the classifier is not effective for these types of textures. This will be addressed in future experiments.
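The evaluation protocol described above (four texture methods, two clustering algorithms, ten randomly initialised runs per image, SA and WSA averaged over runs and then over images) can be summarised schematically as follows; extract_features, cluster and classify_regions are placeholders passed in as callables, not real functions from this work, and segmentation_accuracies is the sketch given earlier.

```python
import numpy as np

TEXTURE_METHODS = ["GLCM", "GABOR", "DWT", "MRF"]
CLUSTERING = ["KM", "FCM"]
N_RUNS = 10   # repeated runs to average out the random initialisation

def evaluate(images, ground_truths, extract_features, cluster, classify_regions):
    """Schematic of the experiment loop: returns average (SA, WSA) per combination."""
    scores = {}
    for tex in TEXTURE_METHODS:
        for clu in CLUSTERING:
            per_image = []
            for img, gt in zip(images, ground_truths):
                feats = extract_features(img, tex)               # one feature vector per pixel
                runs = [segmentation_accuracies(gt, classify_regions(cluster(feats, clu)))
                        for _ in range(N_RUNS)]                  # (SA, WSA) of each run
                per_image.append(np.mean(runs, axis=0))          # average over the 10 runs
            scores[(tex, clu)] = np.mean(per_image, axis=0)      # average over the images
    return scores
```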
Figure 5.26: Average segmentation accuracy
Figure 5.27: Average weighted segmentation accuracy
Figure 5.28: Segmentation results: (a) input iris image, (b) GLCM and KM, (c) GLCM and FCM, (d) GABOR and KM, (e) GABOR and FCM
Figure 5.29: Segmentation results: (a) input iris image, (b) DWT and KM, (c) DWT and FCM, (d) MRF and KM, (e) MRF and FCM
Figure 5.30: Segmentation results: (a) input iris image, (b) GLCM and KM, (c) GLCM and FCM, (d) GABOR and KM, (e) GABOR and FCM
Figure 5.31: Segmentation results: (a) input iris image, (b) DWT and KM, (c) DWT and FCM, (d) MRF and KM, (e) MRF and FCM
Figure 5.32: Segmentation results: (a) input iris image, (b) GLCM and KM, (c) GLCM and FCM, (d) GABOR and KM, (e) GABOR and FCM
Figure 5.33: Segmentation results: (a) input iris image, (b) DWT and KM, (c) DWT and FCM, (d) MRF and KM, (e) MRF and FCM
5.8.3 Connected components filtering
Pixels belonging to the objects of interest are usually connected, while noise pixels are disconnected and form small clusters. Properties derived from the connected pixels of an object, such as area, perimeter and location (centre of mass), can therefore be used to improve a segmentation result. We implement connected components (CC) labelling [3, 53] to locate these objects and discard noise. Given a binary output image f, where 255 represents non-iris pixels and 0 iris pixels, we define two pixels p, q in f, both with grey level 255, as equivalent if a path exists from p to q.
A path is a sequence p_1, p_2, ..., p_m such that the chessboard distance between p_i and p_{i+1} is 1 for i = 1, ..., m-1. Computing the connected components of an image f is thus analogous to finding the equivalence classes. The output of a connected component labelling algorithm is an image c in which each pixel is given a grey level unique to its class. For n classes there are n + 1 grey levels, with the extra grey level reserved for the background. Formally,

c = \left( \bigcup_{j=1}^{n} C_j \right) \cup B \qquad (5.42)
where Cj is a class, n the number of classes and B the background. Once the binary image is labelled, we can compute shape properties for each object, which is represented by a unique label. The non-iris objects in a normalized iris image are almost always connected to the top or bottom edge of the image region - they are usually attached to the eyelids, which are joined to the bottom edge of the image. Generally, objects not connected to the upper or lower image edge are noise components; in our test set, even specular reflection occurs in close proximity to the upper or lower eyelids. Hence, all objects not connected to the upper or lower edge are discarded. Thereafter, if an object is connected to the upper edge of the image, the height and aspect ratio of its bounding box are examined to account for misclassification caused by tiny freckles and other variations. In this case, objects with an aspect ratio less than 3 and a height greater than 8 pixels are rejected. The aspect ratio of the bounding box is given by

r = \frac{w}{h} \qquad (5.43)

where w and h are the width and height of the bounding box, respectively.
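A sketch of this filtering step is given below, using scipy.ndimage as a stand-in for the sequential labelling algorithm of [3]; the thresholds follow the rules above, while the function and parameter names are illustrative assumptions rather than the code used in this work.

```python
import numpy as np
from scipy import ndimage

def filter_components(binary, ratio_thresh=3, height_thresh=8):
    """Relabel non-iris components (value 255) that fail the rules above back to iris (0)."""
    mask = binary == 255
    labels, n = ndimage.label(mask, structure=np.ones((3, 3)))   # 8-connectivity (chessboard)
    out = binary.copy()
    rows = binary.shape[0]
    for obj_id, sl in enumerate(ndimage.find_objects(labels), start=1):
        top = sl[0].start == 0          # bounding box touches the upper edge
        bottom = sl[0].stop == rows     # bounding box touches the lower edge
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        r = w / h                       # aspect ratio of the bounding box, eq. (5.43)
        reject = not (top or bottom)    # isolated blobs are treated as noise
        if top and r < ratio_thresh and h > height_thresh:
            reject = True               # narrow, tall objects on the upper edge
        if reject:
            out[labels == obj_id] = 0   # relabel as iris
    return out
```

Any connected components routine that assigns one label per 8-connected object could be substituted for the scipy call.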
Table 5.7: Average segmentation accuracies (%) using connected components filtering, with standard deviations shown in brackets
Method    KM(SA)    FCM(SA)    KM(WSA)    FCM(WSA)
GLCM 90.49 (5.95) 90.09 (5.61) 88.81 (7.08) 88.01 (7.22)
GABOR 90.49 (5.20) 90.64 (4.99) 87.94 (7.11) 88.36 (7.01)
DWT 89.19 (5.92) 90.04 (5.16) 88.14 (6.63) 88.22 (6.58)
MRF 88.59 (5.45) 89.91 (5.52) 88.09 (6.19) 87.89 (6.88)
Table 5.8: Difference in segmentation accuracy (percentage points) with and without connected components filtering (CC - no CC)
Method    KM(SA)    FCM(SA)    KM(WSA)    FCM(WSA)
GLCM 1.15 0.57 -0.09 -0.47
GABOR 1.43 1.47 -0.07 0.11
DWT 4.04 4.16 1.82 1.80
MRF 2.37 1.57 0.92 0.18
Experimental results

Table 5.7 presents the segmentation accuracies using connected components filtering; they are summarized in Figures 5.34 and 5.35. Table 5.8 presents the differences in accuracy with and without CC filtering. The SA measure shows that connected components filtering improves a segmentation result and, in the case of the DWT, this increase is as much as 4%. For the GLCM and GABOR texture measures, the segmentation accuracy remains relatively constant (an increase or decrease of approximately 1.5%) even with CC filtering. We can therefore deduce that these methods produce few tiny fragments and yield more contiguous and superior results than the DWT and MRF, making them good general texture descriptors for iris images. The accuracy of the DWT and MRF increases by as much as 4.16%, which corresponds to approximately 2000 additional correctly segmented pixels. For these methods, however, this means that the removed objects are initially misclassified by the texture descriptors. The CC filtering locates these misclassified objects and relabels them, improving the segmentation result.
Thus, these two descriptors are not as powerful as the GLCM and GABOR for general texture characterization. Figures 5.36 and 5.37 show the effect of CC filtering on a segmentation result; note how the fragments are removed.
Figure 5.34: Average segmentation accuracy using CC filtering
Figure 5.35: Average weighted segmentation accuracy using CC filtering
Figure 5.36: Segmentation results, successful connected components filtering: (a) input iris image, (b) GLCM and FCM without connected components filtering, (c) GLCM and FCM with connected components filtering
Figure 5.37: Segmentation results, poor connected components filtering: (a) input iris image, (b) DWT and FCM without connected components filtering, (c) DWT and FCM with connected components filtering
Figure 5.38: (a) input iris image and (b) extracted iris texture

5.9 Iris texture extraction
The segmentation algorithm classifies each image pixel into one of two categories: iris or non-iris. Using these labels, the iris texture is extracted, as shown in Figure 5.38. This is the final segmentation of an iris image. Most of the artifacts in the image are removed: although the segmentation is not 100% accurate, it is very effective, and the eyelash, eyelid, reflection and pupil pixels have been removed to a great extent. The feasibility of a feature extraction and pattern classification approach for segmenting iris images has thus been demonstrated. We follow up this chapter with a discussion on future research and then provide concluding remarks on our work.
Chapter 6
Future work and recommendations

6.1 Summary
In this thesis, we focussed on the classic problem of image segmentation, applying texture analysis and pattern classification techniques to an image of the eye in order to extract iris texture. To our knowledge, this method of iris texture extraction is new to the field, and we have developed a feasible hybrid solution. The problem was formulated by surveying the literature on iris recognition and the algorithms implemented for iris texture segmentation. It was noted that several methods make assumptions about the image content and that a generic iris texture extraction method was not available. We then conducted an in-depth review of texture characterization and pattern classification techniques to provide a framework for developing a region based segmentation algorithm. To this end, we selected the grey level co-occurrence matrix, Gabor filtering, discrete wavelet transform and Markov random field techniques for texture characterization and comparison. For classification of features, the fuzzy C-means and K-means clustering algorithms were considered, with a Fisher linear discriminant incorporated for cluster labelling; this was used for region growing. These algorithms were then compared and contrasted in several controlled experiments. The segmentation results obtained are well documented and very promising.

In addition, we provided detailed analysis and discussion of image enhancement, curve fitting and feature set analysis. We implemented grey scale morphological operators for image filtering, which reduced the noise caused by reflection and eyelashes near the eyelid boundaries. This was complemented by an edge preserving filter for enhancing the eyelid border. Thereafter, the RANSAC algorithm was implemented to locate the upper and lower eyelids, and a simple learning procedure provided the parameter values used to compute criteria for accepting or rejecting curves.
We located the iris and pupil borders using the Hough transform for circle detection. The Retinex algorithm was then used to process and enhance the normalized iris images. A simple feature selection and feature subset evaluation method was provided in the section on feature selection. Our research has required the implementation and testing of a wide range of digital image processing techniques, and the outputs and behaviours of these implementations provide starting points for future work.

6.2 Limitations of the system and recommendations for future work
Most software systems have limitations that must be taken into consideration within the scope of the problem. This enables better utilization of the program's capabilities and an understanding of the improvements that may be possible. The following is a list of limitations and suggestions for improvement and future work regarding the texture extraction algorithm:

• Feature sets: The feature sets computed for texture characterization in our research may be inferior to others in the literature. The descriptive strength of a feature set is relative to the context of the object in the image; for example, if the object is surrounded by noise or ambiguous pixels, it will be difficult to identify when the discriminative power of the set is poor. It is important to use feature sets that are highly discriminative for the object of interest. Such sets can be determined by evaluating combinations of different features and assessing their segmentation or classification accuracy. They are not limited to a particular feature extraction method - hybrid feature sets (Fourier descriptors, etc.) are also powerful. This approach can select optimal features for representing the iris and non-iris classes.

• Running time: A limitation of the system is its processing time, which is relatively slow. This can be improved by using hardware suited to the processing requirements; with current trends in technology, this can be done easily. Optimized software is another possibility, including fast and efficient algorithms for fuzzy C-means and K-means clustering, feature extraction and object localization.

• Classifiers: Our approach to feature classification uses clustering and a linear classifier. An alternative is to implement non-linear classifiers. One powerful option is the support vector machine (SVM); others are neural networks and self-organizing maps.

• Data set: The data set on which we conducted our experiments is composed of people of the Chinese ethnic group. A broader range of ethnic groups would provide interesting insights into pattern types and texture characterization.

6.3 Conclusion
We have addressed iris texture segmentation using texture features and pattern recognition to facilitate the extraction of image regions. The results show that the grey level co-occurrence matrix approach is superior to Gabor filtering, the discrete wavelet transform and Markov random fields. They also demonstrate the feasibility of the approach for correctly characterizing and separating the relevant texture regions. Improvements have been made in image enhancement using the Retinex algorithm and histogram adjustment, and we have implemented a robust eyelid detection algorithm based on RANSAC.

Intensive testing has been performed using data clustering. We have analyzed "hard clustering" and "soft clustering" using the K-means and fuzzy C-means algorithms respectively. Clustering indices have been computed together with segmentation accuracy results on a data set of 100 images. The fuzzy C-means is superior to K-means, as shown by the several experiments that were undertaken. The average segmentation accuracies are high, with standard deviations of approximately 5 (and accuracies as high as 96% in some cases). Texture discrimination and object recognition are highly effective for iris image segmentation, but we recommend that they be used in conjunction with other features and methods. With the research undertaken, we have shown and substantiated how several machine vision approaches can be combined to design a solution to our problem.
Bibliography

[1] A. Muron, P. Kois and J. Pospisil. Identification of persons by means of the Fourier spectra of the optical transmission binary models of human irises. Optics Communications, (192):161-167, 2001.
[2] A. Rosenfeld and A. Kak. Digital picture processing, volume 1. Academic Press, 1982.
[3] A. Rosenfeld and J.L. Pfaltz. Sequential operations in digital image processing. Journal of the ACM, 13(4):471-494, 1966.
[4] A. Rosenfeld, R.A. Hummel and S.W. Zucker. Scene labelling by relaxation operations. IEEE Transactions on Systems, Man and Cybernetics, 6:420-433, 1976.
[5] A.C. Bovik, M. Clark and W.S. Geisler. Multichannel texture analysis using localized spatial filters. IEEE Trans. on Patt. Anal. and Mach. Intell., 12(1):55-73, 1990.
[6] F.H. Adler. Physiology of the eye. MO: Mosby, St. Louis, 1965.
[7] A.K. Jain. Fundamentals of digital image processing. Prentice-Hall, New Jersey, 1989.
[8] A.K. Jain, A. Ross and S. Pankanti. A prototype hand geometry-based verification system. In AVBPA, pages 166-171, 1999.
[9] A.K. Jain, A. Ross and S. Prabhakar. An introduction to biometric recognition. IEEE Trans. on Circ. and Sys. for Video Tech., 14(1):4-20, 2004.
[10] A.K. Jain and F. Farrokhnia. Unsupervised texture segmentation using Gabor filters. Pattern Recognition, 24(12):1167-1186, 1991.
[11] A.K. Jain and R.C. Dubes. Algorithms for clustering data. Prentice Hall, 1998.
[12] A.K. Jain, N.K. Ratha and S. Lakshmanan. Object detection using Gabor filters. Pattern Recognition, 30(2):295-309, 1997.
[13] J.M.H. Ali and A.E. Hassanien. An iris recognition system to enhance e-security environment based on wavelet theory. Advanced Modelling and Optimization, 5(2), 2003.
[14] B. Jähne, H. Haußecker and P. Geißler. Handbook of Computer Vision and Applications - Volume 2. Academic Press, 1999.
[15] D.H. Ballard. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, 13(2):111-122, 1981.
[16] B.B. Chaudhuri, N. Sarkar and P. Kundu. Improved fractal geometry based texture segmentation technique. IEE Proc.-E 140, pages 233-241, 1993.
[17] J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, 36(2):192-236, 1974.
[18] S. Beucher. Watersheds of functions and picture segmentation. In Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1928-1931, 1982.
[19] J.C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, 1981.
[20] L.P. Boles and B. Boashash. A human identification technique using images of the iris and wavelet transform. IEEE Trans. on Signal Processing, 46(4):1185-1188, 1998.
[21] C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121-167, 1998.
[22] P.J. Burt and E. Adelson. The Laplacian pyramid as a compact image code. IEEE Trans. Comput., 31(4):532-540, 1983.
[23] C. Sanchez-Avila and R. Sanchez-Reillo. Two different approaches for iris recognition using Gabor filters and multiscale zero-crossing representation. Pattern Recognition, 38:231-240, 2005.
[24] C. Tisse, L. Martin, L. Torres and M. Robert. Person identification technique using iris recognition. In Vision Interface, pages 294-299, 2002.
[25] C. Xu and J.L. Prince. Snakes, shapes and gradient vector flow. IEEE Transactions on Image Processing, 7(3):359-369, 1998.
[26] F.W. Campbell and J.G. Johnson. Application of Fourier analysis to the visibility of grating. Journal of Physiology, 197:551-566, 1968.
[27] J.F. Canny. A computational approach to edge detection. IEEE Trans. on Patt. Anal. and Mach. Intell., 8(6):679-698, 1986.
[28] C.H. Chen, L.F. Pau and P.S.P. Wang. The Handbook of Pattern Recognition and Computer Vision (2nd Edition). World Scientific Publishing Co., 1998.
[29] R. Chellappa and S. Chatterjee. Classification of textures using Gaussian Markov random fields. IEEE Trans. Acous., Speech Signal Proc., 33(4):959-963, 1985.
[30] G.R. Cross and A.K. Jain. Markov random field texture models. IEEE Trans. on Patt. Anal. and Mach. Intell., 5:25-39, 1983.
[31] D. Maio, D. Maltoni, R. Cappelli, J.L. Wayman and A.K. Jain. FVC2002: Fingerprint verification competition. In ICPR, pages 744-747, 2002.
[32] D. Marr and E. Hildreth. Theory of edge detection. Proceedings of the Royal Society, B207:187-217, 1980.
[33] D. Reynolds. An overview of automatic speaker recognition technology. In ICASSP, volume 4, pages 4072-4075, 2002.
[34] D.A. Clausi and M.E. Jernigan. Designing Gabor filters for optimal texture separability. Pattern Recognition, 33:1835-1849, 2000.
[35] J. Daugman. 2D spectral analysis of cortical receptive field profiles. Vision Res., 20:847-856, 1980.
[36] J. Daugman. High confidence visual recognition of persons by a test of statistical independence. IEEE Trans. on Patt. Anal. and Mach. Intell., 15(11):1148-1161, 1993.
[37] J. Daugman. Statistical richness of visual phase information: update on recognizing persons by iris patterns. Int. J. Comput. Vis., 45(1):25-38, 2001.
[38] J. Daugman. Demodulation by complex-valued wavelets for stochastic pattern recognition. Int. J. Wavelets, Multires. and Info. Processing, 1(1):1-17, 2003.
[39] J. Daugman. The importance of being random: statistical principles of iris recognition. Pattern Recognition, 36(2):279-291, 2003.
[40] D.F. Dunn and W.E. Higgins. Optimal Gabor filters for texture segmentation. IEEE Transactions on Image Processing, 4(7):947-964, 1995.
[41] D.H. Ballard and C.M. Brown. Computer Vision. Prentice-Hall, 1982.
[42] D.L. Davies and D.W. Bouldin. A cluster separation measure. IEEE Trans. on Patt. Anal. and Mach. Intell., 1(2):224-227, 1979.
[43] D.P. Casasent, J.-S. Smokelin and A. Ye. Wavelet and Gabor transforms for detection. Optical Engineering, 31:1893-1898, 1992.
[44] J.C. Dunn. A fuzzy relative of the ISODATA process and its use in detecting compact well separated clusters. Journal of Cybernetics, 3:32-57, 1974.
[45] E. Cesmeli and D. Wang. Texture segmentation using Gaussian-Markov random fields and neural oscillator networks. IEEE Trans. on Neural Networks, 12(2):394-404, 2001.
[46] E.H. Land. The Retinex theory of colour vision. Scientific American, pages 108-129, 1977.
[47] E.H. Land. Recent advances in Retinex theory. Vision Research, 26(1):7-21, 1986.
[48] F. Tomita and S. Tsuji. Extraction of multiple regions by smoothing on selected neighbourhoods. IEEE Transactions on Systems, Man and Cybernetics, SMC-7:107-109, 1977.
[49] F. Tomita and S. Tsuji. Computer analysis of visual textures. Kluwer, 1990.
[50] L. Flom and A. Safir. Iris recognition system. 1987.
[51] M.M. Galloway. Texture classification using grey level run-length. Computer Graphics and Image Processing, 4:172-179, 1975.
[52] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. on Patt. Anal. and Mach. Intell., 6:721-741, 1984.
[53] R.C. Gonzalez and R.E. Woods. Digital Image Processing. Addison-Wesley Publishing Company, 2002.
[54] H. Davson. Davson's Physiology of the Eye. MacMillan, London, 1990.
[55] H. Derin and H. Elliott. Modelling and segmentation of noisy and textured images using Gibbs random fields. IEEE Trans. on Patt. Anal. and Mach. Intell., 9(1):39-55, 1987.
[56] R.M. Haralick. Statistical and structural approaches to texture. Proceedings of the IEEE, 67(5):786-804, 1979.
[57] J. Kim, S. Cho and J. Choi. Iris recognition using wavelet features. Journal of VLSI Signal Processing, 38:147-156, 2004.
[58] J.-L. Starck, F. Murtagh and A. Bijaoui. Image Processing and Data Analysis: The Multiscale Approach. Cambridge University Press, 1998.
[59] J.B. Fraleigh and R.A. Beauregard. Linear Algebra. Addison-Wesley, 1995.
[60] J.M. Keller, S. Chen and R.M. Crownover. Texture description and segmentation through fractal geometry. Comput. Vision Graphics Image Process., 45:150-166, 1989.
[61] J.S. Weszka and A. Rosenfeld. An application of texture analysis to material inspection. Pattern Recognition, 8:195-199, 1976.
[62] K. Lim, K. Lee, O. Byeon and T. Kim. Efficient iris recognition through improvement of feature vector and classifier. ETRI Journal, 23(2):61-70, 2001.
[63] K. Lim, Y. Wang and T. Tan. Iris recognition based on multichannel Gabor filtering. In Fifth Asian Conference on Computer Vision, volume 1, pages 279-283, 2002.
[64] B. Kepenekci. Face recognition using Gabor wavelet transform. Master's thesis, The Middle East Technical University, 2001.
[65] K.I. Kim, K. Jung, S.H. Park and H.J. Kim. Support vector machines for texture classification. IEEE Trans. on Patt. Anal. and Mach. Intell., 24(11):1542-1550, 2002.
[66] T. Kohonen. Self-organizing Maps. Springer Verlag, 1995.
[67] W.K. Kong and D. Zhang. Detecting eyelash and reflection for accurate iris segmentation. International Journal of Pattern Recognition and Artificial Intelligence, 17(6):1025-1034, 2003.
[68] K.R. Namuduri, R. Mehrotra and N. Ranganathan. Efficient computation of Gabor filter based multiresolution responses. Pattern Recognition, 27:925-938, 1994.
[69] P.C. Kronfeld. The gross embryology of the eye. The Eye, 1:1-66, 1968.
[70] P. Kruizinga and N. Petkov. Nonlinear operator for oriented texture. IEEE Transactions on Image Processing, 8(10):1395-1407, 1999.
[71] K.S. Fu. Syntactic Pattern Recognition and Applications. Prentice-Hall, 1982.
[72] L. Hong, Y. Wan and A. Jain. Fingerprint image enhancement: algorithm and performance evaluation. IEEE Trans. on Patt. Anal. and Mach. Intell., 20(8):777-789, 1998.
[73] L. Ma, T. Tan, Y. Wang and D. Zhang. Personal identification based on iris texture analysis. IEEE Trans. on Patt. Anal. and Mach. Intell., 25(12):1519-1533, 2003.
[74] L. Ma, T. Tan, Y. Wang and D. Zhang. Efficient iris recognition by characterizing key local variations. IEEE Transactions on Image Processing, 13(6):739-750, 2004.
[75] L. Ma, Y. Wang and T. Tan. Iris recognition using circular symmetric filters. In ICPR, volume 2, pages 414-417, 2002.
[76] K.I. Laws. Rapid texture identification. In SPIE, pages 376-380, 1980.
[77] K.I. Laws. Textured Image Segmentation. PhD thesis, University of Southern California, 1982.
[78] L.S. Davies, S.A. Johns and J.K. Aggarwal. Texture analysis using generalized co-occurrence matrices. IEEE Trans. on Patt. Anal. and Mach. Intell., 1(3):251-259, 1979.
[79] M.A. Fischler and R.C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM, 24:381-395, 1981.
[80] M. Kass, A. Witkin and D. Terzopoulos. Snakes: Active contour models. Int. J. Computer Vision, 1(4):321-331, 1987.
[81] M. Sonka, V. Hlavac and R. Boyle. Image Processing: Analysis and Machine Vision. PWS Publishing Company, 1999.
[82] M. Tuceryan and A.K. Jain. Texture segmentation using Voronoi polygons. IEEE Trans. on Patt. Anal. and Mach. Intell., 12:211-216, 1990.
[83] J. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium-1, pages 281-297, 1967.
[84] S. Mallat. A theory of multiresolution signal decomposition. IEEE Trans. on Patt. Anal. and Mach. Intell., 11(7):674-693, 1989.
[85] S. Mallat. Zero-crossings of a wavelet transform. IEEE Trans. Inform. Theory, 37(14):1019-1033, 1991.
[86] B.B. Mandelbrot. The Fractal Geometry of Nature. Freeman, 1982.
[87] F. Meyer and S. Beucher. Morphological segmentation. Journal of Visual Communication and Image Representation, 1:21-46, 1990.
[88] M.M. Trivedi, R.M. Haralick, R.W. Conners and S. Goh. Object detection based on grey level co-occurrence. Computer Vision, Graphics and Image Processing, 28:199-219, 1984.
[89] N. Ahuja. Dot pattern processing using Voronoi neighbourhood. IEEE Trans. on Patt. Anal. and Mach. Intell., 4:336-343, 1982.
[90] N. Paragios and R. Deriche. Geodesic active regions for supervised texture segmentation. In International Conference on Computer Vision, volume 2, pages 926-932, 1999.
[91] N.K. Ratha, R.M. Senior and R.M. Bolle. Automated biometrics. In ICAPR, pages 445-474, 2001.
[92] P. Kruizinga, N. Petkov and S.E. Grigorescu. Comparison of texture features based on Gabor filters. In Proceedings of the 10th International Conference on Image Analysis and Processing, pages 142-147, 1999.
[93] T. Pavlidis. Structural Pattern Recognition. Springer Verlag, 1997.
[94] A.P. Pentland. Fractal-based description of natural scenes. IEEE Trans. on Patt. Anal. and Mach. Intell., 6:661-674, 1984.
[95] Q. Zhang, J. Wang, P. Gong and P. Shi. Study of urban spatial patterns from SPOT panchromatic imagery using textural analysis. Int. J. Remote Sensing, 24(21):4137-4160, 2003.
[96] R. Chellappa, C.L. Wilson and S. Sirohey. Human and machine recognition of faces: a survey. Proceedings of the IEEE, 83(5):705-740, 1995.
[97] R. Urquhart. Graph theoretical clustering based on limited neighbourhood sets. Pattern Recognition, 15:173-187, 1982.
[98] R.M. Haralick, K. Shanmugam and I. Dinstein. Texture features for image classification. IEEE Transactions on Systems, Man and Cybernetics, 8(6):610-621, 1973.
[99] R.O. Duda, P.E. Hart and D.G. Stork. Pattern classification. John Wiley and Sons, 2001.
[100] R. Rosenblatt. Principles of Neurodynamics. Spartan Books, 1962.
[101] R.W. Conners and C.A. Harlow. A theoretical comparison of texture algorithms. IEEE Trans. on Patt. Anal. and Mach. Intell., PAMI-2:204-222, 1980.
[102] S. Krishnamachari and R. Chellappa. Multiresolution Gauss-Markov random field models for texture segmentation. IEEE Transactions on Image Processing, 6(2):251-267, 1997.
[103] U. Schramm. Automatische Oberflächenprüfung mit neuronalen Netzen. Stuttgart: IRB-Verlag, 1994.
[104] S.L. Horowitz and T. Pavlidis. Picture segmentation by a directed split-and-merge procedure. In Proceedings 2nd International Joint Conference on Pattern Recognition, pages 424-433, 1974.
[105] S.W. Zucker. Toward a model of texture. Computer Graphics and Image Processing, 5:190-202, 1976.
[106] S.Z. Li and J. Lu. Face recognition using the nearest feature line method. IEEE Trans. on Neural Networks, 2(10):439-443, 1999.
[107] T. Ojala, M. Pietikäinen and D. Harwood. A comparative study of texture measures with classification based on feature distribution. Pattern Recognition, 29:51-59, 1996.
[108] T. Ojala, M. Pietikäinen and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. on Patt. Anal. and Mach. Intell., 24(7):971-987, 2002.
[109] M. Tuceryan. Moment based texture segmentation. Pattern Recognition Letters, 15:659-668, 1994.
[110] M. Unser. Sum and difference histograms for texture analysis. IEEE Trans. on Patt. Anal. and Mach. Intell., 8:118-125, 1986.
[111] V. Caselles, R. Kimmel and G. Sapiro. Geodesic active contours. International Journal of Computer Vision, 22(1):61-79, 1997.
[112] L. Vincent and P. Soille. Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Trans. on Patt. Anal. and Mach. Intell., 13(6):583-598, 1991.
[113] W. Zorski, B. Foxon, J. Blackledge and M. Turner. Fingerprint and iris identification method based on the Hough transform. In Proceedings of IMA Third Conference on Imaging and Digital Image Processing, 2002.
[114] S.A. Teukolsky, W.H. Press, B.P. Flannery and W.T. Vetterling. Numerical recipes in C - The art of scientific computing. Cambridge University Press, 2002.
[115] R.P. Wildes. Iris recognition: an emerging biometric technology. Proceedings of the IEEE, 85(9):1348-1362, 1997.
[116] J.W. Woods. Two-dimensional discrete Markovian fields. IEEE Trans. Info. Theory, 18(2):232-240, 1972.
[117] Y. Bulatov, S. Jambawalikar, P. Kumar and S. Sethia. Hand recognition using geometric classifiers. In ICBA, 2004.
[118] Y. Chen, M. Nixon and D. Thomas. Statistical geometrical features for texture classification. Pattern Recognition, 28:537-552, 1995.