Face Recognition System Ali Almuhamadi 1339214 29-08-2012
Master of Media and Knowledge Engineering Interactive Intelligence Group Faculty of Electrical Engineering, Mathematics and Computer Science
Graduation Committee Prof. Drs. Dr. Leon J.M. Rothkrantz Dr. ir. Pascal Wiggers Ir. H.J. Geers
Ali Almuhamadi Face Recognition System Student number: 1339214 Media and Knowledge Engineering 29-08-2012 Delft University of Technology Faculty of Electrical Engineering, Mathematics and Computer Science Master of Media and Knowledge Engineering Interactive Intelligence Group Mekelweg 4 2628 CD Delft The Netherlands
Abstract This thesis presents research on a face recognition system. The system uses a database of images to identify individuals. The starting point is a large database of “known faces”, the FERET database. When a new face is presented, the system has to decide whether this face is a member of the database or not. The matching of a face can be realized using the eigenfaces algorithm. By applying PCA (Principal Component Analysis), the database can be reduced to a finite number of eigenfaces, such that every face can be approximated by a weighted sum of eigenfaces. Every face is then represented by a column vector of weights, and a distance measure is used to compute the distance between these vectors. In this thesis a face recognition system has been implemented and tested. Two algorithms, PCA and ICA (Independent Component Analysis), have been implemented and their test results are compared with each other.
Table of contents
1 Introduction
1.1 General problems in face recognition
1.2 Problem definition
1.3 Societal relevance and background
1.4 Research goals
1.5 Methodology
1.6 Outline of the thesis
2 Face recognition theoretical background
2.1 Preprocessing
2.2 A generic face recognition system
2.2.1 Face detection and segmentation
2.2.2 Feature extraction
2.2.3 Face recognition
2.3 Face recognition using PCA eigenfaces
3 Test system architecture
3.1 PCA test system architecture
3.1.1 Training the system
3.1.2 Classification and recognition
3.2 Independent component analysis algorithm
3.2.1 ICA architectures
3.3 Differences between ICA & PCA
4 Implementation
4.1 PCA implementation
4.2 ICA implementation
4.2.1 Architecture I experiment
4.2.2 Other support classes
4.3 Distance measures
5 Tools
5.1 The FERET database
5.2 ImageMagick's 'convert' utility
5.3 CSU system preprocessing tool
5.4 Matlab
6 Experiment
6.1 Experiment approach
6.2 PCA evaluation results
6.3 ICA evaluation results
6.4 Evaluation results PCA vs. ICA
6.5 Error analysis of recognized image sets
7 Conclusion and future work
8 References
Appendix A FERET naming convention
Appendix B Snippet from the coords.3368 file
Appendix C Training list 1
Appendix D Training list 2
Appendix E UML diagrams
List of figures
Figure 1 Different appearances of a face
Figure 2 Surveillance face recognition systems
Figure 3 Preprocessed image
Figure 4 Configuration of a generic face recognition system
Figure 5 Different faces found by algorithms PCA eigenfaces row 1, ICA row 2, 3 and 4
Figure 6 Eigenface
Figure 7 System processes capturing, training and updating modes
Figure 8 System processes recognition mode
Figure 9 1D vector extracted from 2D image
Figure 10 Architecture I
Figure 11 Architecture II
Figure 12 FERET images
Figure 13 Variance captured by the eigenvectors
Figure 14 Sharpening of images, image on the most left is the original
Figure 15 Noise added to the images, original image in the middle
Figure 16 White noise added to the images
Figure 17 Unsharpened images, original at the most left
Figure 18 Blurred image, original at the most left
Figure 19 Contrast changed left is the original image
Figure 20 System class diagram
Figure 21 Use case diagram
Figure 22 Sequence diagram train system
Figure 23 Sequence diagram recognize image
List of tables
Table 1 Size of image sets
Table 2 Image sets
Table 3 Training set I recognition results
Table 4 PCA recognition results training set I subdim=200
Table 5 PCA recognition results training set I subdim=428
Table 6 PCA recognition results training set II subdim=200
Table 7 PCA recognition results training set II subdim=501
Table 8 ICA recognition results training set I, weight matrix size=200, block size=50 initial L=0.0005
Table 9 ICA recognition results training set I, weight matrix size=200, block size=50 initial L=0.001
Table 10 ICA recognition results training set I, weight matrix size=300, block size=50 initial L=0.0005
Table 11 ICA recognition results training set I, weight matrix size=200, block size=20 initial L=0.0005
Table 12 ICA recognition results training set I, weight matrix size=300, block size=20 initial L=0.0005
Table 13 ICA recognition results training set II, weight matrix size=300, block size=50 initial L=0.0005
Table 14 ICA recognition results training set II, weight matrix size=200, block size=50 initial L=0.001
Table 15 ICA recognition results training set II, weight matrix size=300, block size=50 initial L=0.001
Table 16 ICA recognition results training set II, weight matrix size=200, block size=50 initial L=0.0005
Table 17 PCA error analysis results of modified recognized image
Table 18 ICA error analysis results of modified recognized image
Table 19 PCA error analysis results of modified unrecognized image
Table 20 ICA error analysis results of modified unrecognized image
Table 21 PCA error analysis recognition results of an unrecognized image
Table 22 ICA error analysis recognition results of an unrecognized image
Table 23 PCA recognition results of unrecognized FB images
1 Introduction Man-machine interaction is a research area that focuses on enhancing computer support for humans by mimicking human abilities. Computers can provide support for tasks ranging from daily life tasks to highly skilled jobs. A good example of such a task is automated face recognition, which could be performed more efficiently by computers. Over 40 years of research in the area of face recognition have produced many theories and methods on how face recognition can be implemented using a computer system. In most applications there is a database of known faces. When a new face is presented, the system has to decide whether it is a known face. Using the pre-captured FERET image database [2], the possibility of implementing a face recognition system based on the PCA Eigenfaces approach [1] is evaluated in this thesis. The focus lies on studying the algorithm and on testing and verifying its performance with the FERET database, using a test setup created especially for this purpose. Using the same test setup, the performance of PCA is compared with that of a second algorithm, namely ICA (Independent Component Analysis) [18]. The results found during the evaluation were promising enough to use the algorithm as a basis to design and implement a face recognition system. Identifying the strong points and weaknesses of the algorithm produced a clear view of how further improvements and future work could be conducted to improve the system’s performance. Although video based recognition has common ground with the image based approach, video based face recognition is outside the scope of this research.
1.1 General problems in face recognition Many problems have been encountered in attempts to develop face recognition systems. To provide background knowledge and a quick snapshot of what can be expected when developing such a system, it is useful to list some of these problems and remarks:
• It is well known that for face detection the image size can be quite small, but what about face recognition? Clearly the image size cannot be too small for methods that depend heavily on accurate feature localization.
• Accurate feature location is critical for good recognition performance. This is true even for holistic matching methods, since accurate location of key facial features such as the eyes is required to normalize the detected face.
• Face recognition is not a dedicated process; the recent success of machine systems that are trained on large numbers of samples seems to confirm recent findings suggesting that extensive training is required.
• Developing face detection techniques that report not only the presence of a face but also the accurate locations of facial features under large pose and illumination variations remains a big challenge.
• Modeling face variation under realistic settings, for example outdoor environments and natural aging, is still challenging.
• Many years of research have produced a vast number of methods and systems, each with its own advantages and disadvantages. In the absence of a generic method, the selection of the best methods and systems is application dependent.
• Though machine recognition of faces from images has achieved a certain level of success, its performance is still far from that of human perception.
• Face recognition in an uncontrolled environment is still very challenging.
• Holistic approaches provide quick recognition methods, but the discriminant information that they provide may not be rich enough to handle very large databases.
• The illumination variation problem and the pose variation problem. Illumination variation: the same face appears different due to a change in lighting. The changes induced by illumination are often larger than the differences between individuals, causing systems based on comparing images to misclassify input images. Pose variation: the performance of face recognition systems drops significantly when large pose variations are present in the input images.
• Significant illumination or posture change can cause serious performance degradation. In some cases, face images can also be partially occluded.
• These problems are unavoidable when face images are acquired in an uncontrolled, uncooperative environment. A system that needs to recognize a person from an image in the database that was acquired some time ago has to deal with these issues.
1.2 Problem definition Face recognition is one of the abilities that people develop in their early years, starting in the first months after birth and improving with age. Some of us are very good at it and have an almost photographic memory, while others are not that good. The ability of humans to recognize faces varies considerably depending on many factors such as memory, IQ, age, environment and facial features. The performance of computer systems performing face recognition also depends on many factors and is even more sensitive to changes, especially changes in the environment and the input data. Figure 1 illustrates such changes. For these and other reasons, face recognition should be performed in a specific, controlled manner, based on limited, properly defined input data. The face recognition problem for static images has in general been formulated as recognizing three-dimensional (3D) faces from two-dimensional (2D) images. The recognition process depends on the implemented steps, each of which can affect the results dramatically. The process starts with capturing usable face images without missing essential facial features such as the nose, eyes, eyebrows, mouth and chin. The second step involves preprocessing the captured images to extract the facial features that are essential for the recognition process; at the same time, preprocessing discards (clips) unimportant parts of the image that could slow down the process or disturb the result. Depending on the recognition algorithm used, other steps such as training, classification and finally recognition are triggered to recognize the individual presented to the system. Sections 2.2 and 3 contain details about the face recognition system and explain the face recognition stages in detail.
Figure 1 Different appearances of a face
The aim is to develop an automatic system, using existing technology, that mimics the remarkable face recognition ability of humans while exploiting the capacity of a computer system to handle large numbers of face images. Many studies in psychophysics and neuroscience have been conducted, which have delivered knowledge with direct relevance for engineers developing algorithms in the domain of face recognition. Based on these studies, [7] summarizes the following topics and research questions:
• Is face recognition a dedicated process?
• Is face perception the result of holistic or feature analysis?
• Ranking of significance of facial features.
• Caricatures.
• Distinctiveness.
• The role of spatial frequency analysis.
• Viewpoint-invariant recognition.
• Effect of lighting change.
• Movement and face recognition.
• Facial expressions.
This research focuses on evaluating the PCA Eigenfaces [1] and the ICA [18] approaches. Using the evaluation results, a system design that uses these algorithms to perform face recognition is proposed. Holistic matching methods are methods that use the whole face region as the raw input to the face recognition system. PCA and ICA are two of the most important holistic methods.
1.3 Societal relevance and background The need for automated face recognition applications in areas such as biometrics, information security, law enforcement and surveillance, access control and many others has grown enormously over the last decade. Face recognition involving large data volumes and requiring a high level of concentration and speed would be an almost impossible task for humans. Computers, on the other hand, can support or even replace humans in performing face recognition tasks, but need to be as accurate as humans. Human visual and perceptual processing abilities are far more advanced than the best hardware available today and can achieve very good results in different or continuously changing environments.
Figure 2 Surveillance face recognition systems
For many years people have been trying different approaches to develop and improve the visual and perceptual processing abilities of computers, aiming to reach a level of accuracy as good as that of humans. Research on the face identification problem has a history of about 40 years, and many possibilities have been investigated to solve it. Research activity varied over time: it was active in the mid-1970s, when the problem was treated as a pattern classification problem, and remained dormant in the 1980s. Since the early 1990s, research interest in face recognition has again grown significantly. One can attribute this to several reasons:
• An increasing interest in commercial opportunities.
• The availability of real-time hardware and software, and an enormous improvement in the processing capacity of hardware.
• The increasing importance of surveillance-related applications.
Face recognition can also take HCI (human-computer interaction) to the next level. It could enhance man-machine communication by enabling a better interaction between the human and the machine. Intelligent software agents can be developed that adapt their interaction with users as they recognize the user. Possible applications are listed below:
• Video games, virtual reality, training programs.
• Human-robot interaction, human-computer interaction.
• Drivers’ licenses, entitlement programs.
• Immigration, national ID, passports, voter registration, welfare fraud.
• TV parental control, personal device logon, desktop logon.
• Information security, application security, database security, file encryption.
• Internet access, medical records.
• Secure trading terminals.
• Advanced video surveillance, CCTV control and surveillance portal control.
• Shoplifting, suspect tracking and investigation.
1.4 Research goals The need and demand for automated security and surveillance systems has grown enormously over the last decade, and automated face recognition plays a large part in identifying individuals. Many face recognition approaches are available that can be used to implement face recognition systems. Unfortunately, the available approaches lack validation, as there are no known openly documented systems based on these approaches. Commercial systems could use or combine them, but details usable for further research are not published. Therefore the goals of this thesis have been identified as follows:
• Study two of the best-known face recognition algorithms, namely PCA and ICA.
• Build a face recognition system using both PCA and ICA.
• Test both algorithms using the same test conditions and measures on various FERET database image sets and analyze the results.
1.5 Methodology To guide the research to a successful end, and based on the research goals provided in the previous section, the following milestones were set:
1. Literature research. The main goal of this stage was to explore the available state-of-the-art approaches used for face recognition on images of humans, and to select algorithms that can be implemented in an application or a system to perform face recognition tasks. The application tries to recognize the input image by matching it with existing images.
2. Collecting images and image set validation. Data to develop, train and test the system was an essential part of this research. The goal was to explore which possibilities were available to obtain or collect images that can be used in this research to evaluate and train the algorithms. The FERET database was selected as the source of images. Other tasks in this stage involved creating training sets and test sets of images for use during the experiment.
3. Experiment setup and algorithm evaluation. Using the information provided in the papers, algorithm descriptions and source code that were studied during the literature research stage, a more detailed analysis was done to verify whether the information provided in the literature was detailed enough to perform the experiment and achieve promising recognition results. Verification also included verifying all image sets collected for training and testing. The implemented system and the tests executed with it should provide sufficient detail to design a standalone application. Several validation and analysis rounds were performed, such as error analysis to identify why some images were not recognized correctly. The distance measures and the number of leading principal components were also analyzed to determine which performs best in combination with each algorithm.
4. Experiment execution, evaluation and system design. Based on scripts and code implemented in Matlab and on the results achieved in the experiment, a face recognition system was implemented in which an input image is provided and the system presents as output an image that should match the input image. Test cases with different image sets were executed to identify the recognition rate and the conditions under which the system can operate, such as the type and specifications of the input images. The same results were also used to identify failures and bugs. This stage includes creating architecture and UML diagrams and providing other design details.
5. Identify improvements and future work. Based on the achieved test results, an analysis was done to identify problems and weaknesses of the system and to provide a set of improvements that can be implemented to bring the system to a more mature state.
1.6 Outline of the thesis The subject discussed in this thesis is spread over the document as follows: Chapter 2 discusses the theoretical background of face recognition: how a face recognition system can be built and which architectures and algorithms are used. Chapter 3 discusses the system architecture and its heart, namely the PCA and ICA algorithms, in detail, together with the differences between them. Chapter 4 discusses the implementation details of the system and the implemented classes. Chapter 5 provides an overview of the tools and applications used during the research. Chapter 6 presents the executed tests and the performance results obtained during the experiment. Finally, chapter 7 discusses the conclusions and future work.
2 Face recognition theoretical background As different approaches are available to perform face recognition, this section gives an overview of approaches available for implementing a face recognition system. It also describes the generic steps of the face recognition process.
2.1 Preprocessing Images produced by an image capturing process typically contain not only the face but also objects and items from the environment. The FERET database [2] images were used for this research; these images were captured against an empty background containing no other objects. The images are gray scale and were made available in TIF format with a size of 256 x 384 pixels. Converting such an image to a 1D vector results in a vector of size 256 x 384 = 98304. Section 5.1 gives a detailed overview of the FERET database. The focus of this research lies only on the recognition of human face images. Hence any irrelevant information that could affect the recognition process should be removed or smoothed; this is where preprocessing becomes an essential part of the system. The key is to find the optimal balance between removing parts from the image to reduce the computational complexity and undesired noise, and maintaining the essential facial features provided as input to the recognition algorithm. Depending on the preprocessing stage, the results achieved by the algorithm and the required computational time can be affected dramatically. Preprocessing the standard FERET images involved several steps, which are listed and described below:
1. Convert TIF images to PGM: images provided by NIST are originally in TIF (Tagged Image File) format; these were converted to PGM (Portable Gray Map) format. More details on the tool used to convert the images are provided in section 5.2.
2. Geometric normalization: lines up the human-chosen eye coordinates. This is done by rotating the images so that the face is upright, and zooming in or out on the face so that all faces are exactly the same size.
3. Masking: crops the image and image borders using an elliptical mask, such that only the facial features from forehead to chin and cheek to cheek are visible. Removing unnecessary pixels outside of the face region reduces the computational processing time. The images were reduced to a size of 150x130 pixels.
4. Histogram equalization: equalizes the histogram of the unmasked part of the image, balancing the intensities to use the maximum pixel depth in the facial region.
5. Pixel normalization: scales the pixel values to have a mean of zero and a standard deviation of one.
Preprocessing the images in steps 2 to 5 was done using the exact coordinates of the eyes and a tool (CSUPreprocessNormalize) provided by the CSU Face Identification Evaluation System [23]. More details on the tool are provided in section 5.3.
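Steps 3 to 5 and the final conversion to a 1D vector can be illustrated with a minimal NumPy sketch. This is not the CSUPreprocessNormalize tool: the function names, the centred ellipse, the 150-row by 130-column orientation and the synthetic random image are all assumptions made for illustration, and geometric normalization (step 2) is assumed to have been done already.

```python
import numpy as np


def elliptical_mask(height, width):
    """Boolean mask that is True inside an ellipse centred in the image (assumed shape)."""
    rows, cols = np.ogrid[:height, :width]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    return ((rows - cy) / (height / 2.0)) ** 2 + ((cols - cx) / (width / 2.0)) ** 2 <= 1.0


def equalize_histogram(image, mask):
    """Histogram-equalize only the pixels inside the mask of an 8-bit grayscale image."""
    values = image[mask]
    hist = np.bincount(values, minlength=256)
    cdf = np.cumsum(hist) / values.size  # cumulative distribution of gray levels
    out = image.astype(np.float64)
    out[mask] = 255.0 * cdf[values]
    return out


def normalize_pixels(image, mask):
    """Scale pixels inside the mask to zero mean and unit standard deviation; zero the rest."""
    out = np.zeros(image.shape, dtype=np.float64)
    values = image[mask].astype(np.float64)
    std = values.std()
    out[mask] = (values - values.mean()) / (std if std > 0 else 1.0)
    return out


def preprocess(image):
    """Mask, equalize, normalize and flatten an already aligned face image."""
    mask = elliptical_mask(*image.shape)
    equalized = equalize_histogram(image, mask)
    normalized = normalize_pixels(equalized, mask)
    return normalized.reshape(-1)  # 2D image -> 1D vector


# Synthetic stand-in for an aligned 150x130 face image (not FERET data).
rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(150, 130), dtype=np.uint8)
vector = preprocess(face)
```

With these assumptions the resulting vector has 150 x 130 = 19500 elements, compared with 98304 for an unprocessed 256 x 384 image, which is where the reduction in computation comes from.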
The image below is an example of a preprocessed image.
Figure 3 Preprocessed image
Both algorithms PCA and ICA that are discussed in the next sections are based on two stages, namely: a training stage and a recognition or matching stage. In order to evaluate and compare recognition performance achieved by both algorithms, the same preprocessing approach and tool was used. Before any face can be provided as input either for training or matching it must be pre-processed and normalized. The same sets of images were used to train and verify the performance of both algorithms.
2.2 A generic face recognition system An automatic face recognition system involves three key steps during execution [7]:
1. Detection and rough normalization of faces.
2. Feature extraction and accurate normalization of faces.
3. Identification and verification.
In identification problems, the input to the system is an unknown face, and the system reports back the determined identity from a database of known individuals, whereas in verification problems, the system needs to confirm or reject the claimed identity of the input face. Figure 4 illustrates the generic configuration of a face recognition system.
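The difference between identification and verification can be made concrete with a small sketch. It assumes that faces have already been reduced to feature vectors and that Euclidean distance is used; the function names, the toy two-dimensional vectors, the labels and the threshold value are hypothetical and chosen only for illustration.

```python
import numpy as np


def identify(probe, gallery, labels):
    """Identification: return the label of the gallery vector closest to the probe."""
    distances = np.linalg.norm(gallery - probe, axis=1)
    best = int(np.argmin(distances))
    return labels[best], float(distances[best])


def verify(probe, gallery, labels, claimed_label, threshold):
    """Verification: accept the claim only if the probe is close enough to the claimed identity."""
    selector = np.array([label == claimed_label for label in labels], dtype=bool)
    claimed = gallery[selector]
    if claimed.shape[0] == 0:
        return False  # unknown identity: nothing to compare against
    distances = np.linalg.norm(claimed - probe, axis=1)
    return bool(distances.min() <= threshold)


# Toy gallery of known individuals (hypothetical feature vectors).
gallery = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
labels = ["anna", "bob", "carl"]
probe = np.array([0.9, 1.2])

who, dist = identify(probe, gallery, labels)
accepted = verify(probe, gallery, labels, "bob", threshold=0.5)
rejected = verify(probe, gallery, labels, "carl", threshold=0.5)
```

Identification always returns the best match, so a practical system would usually also apply a threshold to decide whether the probe is a known face at all; verification only compares against the claimed identity.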
Figure 4 Configuration of a generic face recognition system
A brief summary of each of the key steps of generic face recognition is given next.
2.2.1 Face detection and segmentation Segmentation focuses on single-face segmentation from a simple or complex background. Several approaches can be used, e.g. a whole-face template, a feature-based template, skin color, or neural networks. Significant advances have been made in achieving automatic face detection under various conditions. Compared with feature-based methods and template-matching methods, appearance-based or image-based methods [9] [10] that train systems on large numbers of samples have achieved the best results. This may not be surprising, since face objects are complicated, very similar to each other, and different from non-face objects. Through extensive training, computers can be quite good at detecting faces. Detection of faces under rotation has been studied in depth. One approach is based on training using multiple view samples [11] [12]. Compared with invariant-feature-based methods [8], multi-view based methods of face detection and recognition seem to be able to achieve better results when the angle of out-of-plane rotation is large (35°). Whether face recognition is viewpoint-invariant or not is still under discussion. Studies seem to support the idea that for small angles face perception is view-independent, while for large angles it is view-dependent. In a detection problem, two statistics are important: true positives (also referred to as the detection rate) and false positives (reported detections in non-face regions). An ideal system would have a very high true positive rate and a very low false positive rate. In practice, these two requirements conflict. Treating face detection as a two-class classification problem helps to reduce false positives dramatically [9] [10] while maintaining true positives. This is achieved by retraining systems with false positive samples that are generated by previously trained systems.
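The two detection statistics, and the selection of false positives for retraining, can be sketched as follows. The sketch assumes each candidate window has a boolean detector decision and a boolean ground-truth label; the function names and the toy data are hypothetical and do not come from the cited detectors.

```python
def detection_rates(predictions, ground_truth):
    """Return (true positive rate, false positive rate) for face / non-face decisions."""
    tp = sum(1 for p, g in zip(predictions, ground_truth) if p and g)
    fp = sum(1 for p, g in zip(predictions, ground_truth) if p and not g)
    positives = sum(1 for g in ground_truth if g)
    negatives = len(ground_truth) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


def false_positive_indices(predictions, ground_truth):
    """Indices of non-face windows reported as faces: candidates to add as negatives when retraining."""
    return [i for i, (p, g) in enumerate(zip(predictions, ground_truth)) if p and not g]


# Toy example: 3 face windows followed by 5 non-face windows.
truth = [True, True, True, False, False, False, False, False]
preds = [True, True, False, True, False, False, False, False]

tpr, fpr = detection_rates(preds, truth)
hard_negatives = false_positive_indices(preds, truth)
```

In a retraining loop of the kind described above, the windows listed in `hard_negatives` would be added to the non-face training set before the detector is trained again.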
2.2.2 Feature extraction
Many face recognition systems need facial features in addition to the holistic face. Even holistic matching methods, for example eigenfaces, need accurate locations of key facial features such as the center points of the eyes, nose, and mouth to normalize the detected face. Three types of feature extraction methods can be distinguished:
1. Generic methods based on edges, lines, and curves.
2. Feature-template-based methods that are used to detect facial features such as the eye centers, nose and mouth.
3. Structural matching methods that take into consideration geometrical constraints on the features.
More recent approaches have used structural matching methods, for example the Active Shape Model [11]. Compared to earlier methods, these statistical methods are much more robust in handling variations in image intensity and feature shape.

An even more challenging situation for feature extraction is feature “restoration”, which tries to recover features that are invisible due to large variations in head pose. The best solution here might be to hallucinate the missing features, either by using the bilateral symmetry of the face or by using learned information. For example, a view-based statistical method claims to be able to handle even profile views in which many local features are invisible [12].
2.2.3 Face recognition
In this section the different approaches to implementing face recognition systems are discussed. Following the guidelines suggested by psychological studies of how humans use holistic and local features to recognize faces, face recognition approaches are divided into four categories [7]:
1. Holistic matching methods. These methods use the whole face region as the raw input to a recognition system.
2. Feature-based (structural) matching methods. Typically, in these methods local features such as the eye centers, nose, and mouth are first extracted, and their locations and local statistics (geometric and/or appearance) are fed into a structural classifier.
3. Template-based methods. These methods match a predefined parameterized template to an image that contains a face region. Two templates are used for matching the eyes and the mouth respectively. An energy function is defined that links edges, peaks and valleys in the image intensity to the corresponding properties in the template, and this energy function is minimized by iteratively changing the parameters of the template to fit the image. A comparison of the template-based and feature matching methods is given in [15].
4. Hybrid methods. Just as the human perception system uses both local features and the whole face region to recognize a face, a machine recognition system should use both. These methods could potentially offer the best of the two types of method.
PCA and ICA are two important holistic methods.
2.3 Face recognition using PCA eigenfaces
Eigenfaces have been one of the major driving forces behind face representation, detection, and recognition. PCA (Principal Component Analysis) is one of the first approaches developed in the face recognition area. PCA is derived from the Karhunen-Loève transformation. Given an s-dimensional vector representation of each face in a training set of images, PCA finds a t-dimensional subspace whose basis vectors correspond to the directions of maximum variance in the original image space [24]. This new subspace is normally of lower dimension. If the image elements are considered as random variables, the PCA basis vectors are defined as the eigenvectors of the scatter matrix S.

Pentland et al. [1] presented an approach to detect and identify human faces and described a near-real-time face recognition system which tracks a subject’s head and then recognizes the person by comparing characteristics of the face to those of known individuals. The approach treats face recognition as a two-dimensional recognition problem, taking advantage of the fact that faces are normally upright and thus may be described by a small set of 2-D characteristic views. Face images are projected onto a feature space (“face space”) that best encodes the variation among known face images. The face space is defined by the “eigenfaces”, which are the eigenvectors of the set of faces; they do not necessarily correspond to isolated features such as eyes, ears, and noses. The framework provides the ability to learn to recognize new faces in an unsupervised manner. In information theory terms, the following steps have to be executed: extract the relevant information in a face image, encode it as efficiently as possible, and compare one face encoding with a database of models encoded similarly.
A simple approach to extracting the information contained in an image of a face is to capture the variation in a collection of face images, independent of any judgment of features, and use this information to encode and compare individual face images. In mathematical terms, this means finding the principal components of the distribution of faces, or the eigenvectors of the covariance matrix of the set of face images (training set). These eigenvectors can be thought of as a set of features which together characterize the variation between face images. Each image location contributes more or less to each eigenvector, so the eigenvectors can be displayed as a sort of ghostly faces (Figure 5), which are called eigenfaces.
Figure 5 Different faces found by the algorithms: PCA eigenfaces (row 1), ICA (rows 2, 3 and 4)
PCA [5] transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. To calculate eigenfaces with PCA [1], let a face image I(x, y) be a two-dimensional N by N array of intensity values, or equivalently a vector of dimension $N^2$. A typical image of size 256 by 256 describes a vector of dimension 65,536, or, equivalently, a point in 65,536-dimensional space. An ensemble of images, then, maps to a collection of points in this huge space. Images of faces, being similar in overall configuration, will not be randomly distributed in this huge image space and thus can be described by a relatively low-dimensional subspace. The main idea of principal component analysis (or Karhunen-Loève expansion) is to find the vectors which best account for the distribution of face images within the entire image space. These vectors define the subspace of face images, which is called “face space”. Each vector is of length $N^2$, describes an N by N image, and is a linear combination of the original face images. Because these vectors are the eigenvectors of the covariance matrix corresponding to the original face images, and because they are face-like in appearance, they are referred to as “eigenfaces”. An eigenface is shown in Figure 6.
PCA [4] is the most widely used subspace projection technique. In PCA, the basis vectors are obtained by solving the algebraic eigenvalue problem $\Lambda = R^T (X X^T) R$, where X is a data matrix whose columns are training samples, R is a matrix of the eigenvectors, and $\Lambda$ is the corresponding diagonal matrix of the eigenvalues. The projection of the data, $C_n = R_n^T X$, from the original p-dimensional space to a subspace spanned by the n principal eigenvectors is optimal in the mean squared error sense, i.e. the re-projection of $C_n$ back into the p-dimensional space has minimum reconstruction error.
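The projection and re-projection described above can be checked numerically. The following is a minimal sketch that assumes NumPy is available, that the columns of X are (centered) training samples, and that the projection is taken as the transposed eigenvector matrix times X; the sizes and the random data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 6, 20, 2                      # p dimensions, m samples, n retained eigenvectors (toy sizes)
X = rng.normal(size=(p, m))             # columns are training samples
X -= X.mean(axis=1, keepdims=True)      # centre the data

# Eigen-decomposition of X X^T; eigh returns eigenvalues in ascending order.
eigvals, R = np.linalg.eigh(X @ X.T)
order = np.argsort(eigvals)[::-1]       # largest eigenvalue first
eigvals, R = eigvals[order], R[:, order]

R_n = R[:, :n]                          # the n principal eigenvectors
C_n = R_n.T @ X                         # projection into the n-dimensional subspace
X_hat = R_n @ C_n                       # re-projection into the p-dimensional space

reconstruction_error = np.sum((X - X_hat) ** 2)
discarded_variance = np.sum(eigvals[n:])
print(reconstruction_error, discarded_variance)
```

The squared reconstruction error equals the sum of the discarded eigenvalues, so keeping the eigenvectors with the largest eigenvalues gives the smallest possible error for a given n.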
Figure 6 Eigenface
Each face image in the training set can be represented exactly as a linear combination of the eigenfaces. The number of possible eigenfaces is equal to the number of face images in the training set. However, the faces can also be approximated using only the “best” eigenfaces, i.e. those that have the largest eigenvalues and which therefore account for the most variance within the set of face images. The primary reason for using fewer eigenfaces is computational efficiency. The best M eigenfaces span an M-dimensional subspace, the “face space”, of all possible images.
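One possible way to pick the number of “best” eigenfaces is to keep adding eigenfaces until a chosen share of the total variance is covered. This criterion is an assumption made for illustration; the thesis does not prescribe a particular rule, and the eigenvalues below are made up.

```python
def eigenfaces_needed(eigenvalues, variance_fraction):
    """Smallest number of leading eigenfaces whose eigenvalues reach the requested share of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= variance_fraction * total:
            return count
    return len(ordered)


toy_eigenvalues = [50.0, 25.0, 15.0, 6.0, 3.0, 1.0]   # made-up values that sum to 100
print(eigenfaces_needed(toy_eigenvalues, 0.85))
```

With these toy values the first three eigenfaces already cover 90 percent of the variance, so a large part of the basis can be dropped at little cost.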
3 Test system architecture
The following steps summarize the face recognition process:
Mode 1
1. Capturing: This step involves capturing and storing images to introduce individuals to the system. As the FERET image database was used as the source of images, this step was not performed.
Mode 2
2. Initialization: Acquire the training set of n face images.
3. Training: Calculate the M eigenfaces in case of PCA, or the independent components in case of ICA, for a set of images which define the training face space.
Mode 3
4. Pre-classification: Determine whether the image is a face at all (whether known or unknown) by checking whether the image is sufficiently close to “face space”. This step can be applied in real-time or off-line systems to reduce processing time when the number of supplied images is very high.
5. Projection: When a new face image is encountered, calculate a set of weights based on the input image and the M eigenfaces by projecting the input image onto each of the eigenfaces. In case of ICA, calculate the ICs (independent components) of the provided image.
6. Classification: Classify the weight pattern as either a known person or as unknown, using the distance measure that works best for the specific algorithm.
7. Learning (optional): If the same unknown face is seen several times, calculate its characteristic weight pattern and incorporate it into the known faces database (i.e., learn to recognize it).
3.1 PCA test system architecture
The face recognition approach presented by Pentland et al. (PCA) [1] relies on modes two and three. Mode two (training mode) involves steps 2 and 3, which are executed only once, or again when improvements and updates of the face space need to be incorporated. Mode three (operating mode) involves repeating steps 4 to 6, and optionally step 7, every time a face is presented to the algorithm to determine whether it can be recognized or not. Figure 7 and Figure 8 illustrate the three modes and the order in which the various steps are involved.
Figure 7 System processes capturing, training and updating modes
Figure 8 System processes recognition mode
3.1.1 Training the system
The images provided to the system are two-dimensional (2D) images. At this stage each image I(x,y) from the data set is converted to a 1D vector of N elements (N = X*Y, the image width times the image height). The set of these vectors forms the system’s data space.
Figure 9 1D vector extracted from 2D image
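The conversion shown in Figure 9 can be sketched in a few lines. The helper name is hypothetical, and the row-by-row ordering is an assumption; any fixed ordering works as long as it is the same for every image (Matlab's reshape, for instance, walks through the columns first).

```python
def image_to_vector(image):
    """Concatenate the rows of a 2D image (a list of rows) into one 1D list."""
    return [pixel for row in image for pixel in row]


image = [[10, 20, 30],
         [40, 50, 60]]          # toy 2 x 3 "image"; real face images are far larger
vector = image_to_vector(image)
print(len(vector), vector)
```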
Performing recognition in a high-dimensional space raises problems and is computationally very expensive. The main idea behind PCA is to reduce the large dimensionality of the data space to the smaller intrinsic dimensionality of a feature space (independent variables), which is needed to describe the data economically. First of all, a set of M images is selected for training the system. Each face image I is converted to a 1D vector of N elements, giving the vectors $\Gamma_1, \Gamma_2, \ldots, \Gamma_M$. Each face is then represented by its difference from the mean face:
$$\Phi_i = \Gamma_i - \Psi \qquad (1)$$
where $\Psi$ is the mean face of the data set. The mean face is calculated using the following equation:
$$\Psi = \frac{1}{M}\sum_{i=1}^{M}\Gamma_i \qquad (2)$$
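As a small worked example of the mean face and the mean-subtracted faces, consider three made-up “images” of only two pixels each (the numbers are purely illustrative):

```latex
\Gamma_1 = \begin{pmatrix} 2 \\ 4 \end{pmatrix},\quad
\Gamma_2 = \begin{pmatrix} 4 \\ 6 \end{pmatrix},\quad
\Gamma_3 = \begin{pmatrix} 6 \\ 2 \end{pmatrix}
\quad\Longrightarrow\quad
\Psi = \frac{1}{3}\begin{pmatrix} 2+4+6 \\ 4+6+2 \end{pmatrix}
     = \begin{pmatrix} 4 \\ 4 \end{pmatrix}

\Phi_1 = \Gamma_1 - \Psi = \begin{pmatrix} -2 \\ 0 \end{pmatrix},\quad
\Phi_2 = \Gamma_2 - \Psi = \begin{pmatrix} 0 \\ 2 \end{pmatrix},\quad
\Phi_3 = \Gamma_3 - \Psi = \begin{pmatrix} 2 \\ -2 \end{pmatrix}
```

The mean-subtracted vectors sum to zero, which is a quick sanity check on any implementation of these two equations.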
The data set of very large vectors is then subjected to principal component analysis, which seeks a set of M orthonormal vectors $u_k$ and their associated eigenvalues $\lambda_k$ that best describe the distribution of the data set. The eigenvectors and eigenvalues are calculated for the covariance matrix:
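$$C = \frac{1}{M}\sum_{n=1}^{M}\Phi_n\Phi_n^T = \frac{1}{M}AA^T \qquad (3)$$
where $A = [\Phi_1\ \Phi_2\ \ldots\ \Phi_M]$. The N x N matrix $AA^T$ is very large, hence it is not practical to use it for computing the eigenvectors and eigenvalues (the constant factor 1/M only scales the eigenvalues and does not change the eigenvectors). As explained in [1], the much smaller M x M matrix $A^TA$ has up to M eigenvalues and eigenvectors, which correspond to the largest eigenvalues and eigenvectors of $AA^T$: if $v_i$ is an eigenvector of $A^TA$, then $Av_i$ is an eigenvector of $AA^T$ with the same eigenvalue. For the recognition process, the K eigenvectors corresponding to the largest K eigenvalues are considered for computation. Choosing K depends on the available data set. This set of vectors is the system’s eigenspace.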
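The computational shortcut described above, solving the small M x M problem instead of the large N x N one, can be sketched as follows. This is a minimal sketch assuming NumPy; the random “faces”, the sizes and the variable names are made up and do not come from the thesis code.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 400, 5, 3                       # N pixels, M training images, K retained eigenfaces (toy sizes)
faces = rng.uniform(0, 255, size=(N, M))  # columns play the role of the image vectors
psi = faces.mean(axis=1, keepdims=True)   # mean face
A = faces - psi                           # columns are the mean-subtracted faces

# Small M x M eigenproblem instead of the N x N matrix A A^T.
L = A.T @ A
mu, V = np.linalg.eigh(L)                 # eigenvalues in ascending order
order = np.argsort(mu)[::-1][:K]          # keep the K largest
mu, V = mu[order], V[:, order]

U = A @ V                                 # A v is an eigenvector of A A^T with the same eigenvalue
U /= np.linalg.norm(U, axis=0)            # normalise each eigenface to unit length
print(U.shape)
```

Here the decomposition works on a 5 x 5 matrix rather than a 400 x 400 one; for real images the saving is far larger.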
3.1.2 Classification and recognition
The previous section described how the recognition process is initialized by acquiring the training set of face images and calculating the eigenfaces, which define the eigenspace or face space. This section describes how these results are used for recognition. Whenever a new face image is encountered by the system, a set of weights based on the input image and the K eigenfaces is calculated by projecting the input image onto each of the eigenfaces. After subtracting the mean face from the input image as in (1), projection is as simple as point-by-point multiplication and summation:
$$\hat{\Phi} = \sum_{i=1}^{K}\omega_i u_i, \qquad \omega_i = u_i^T\Phi \qquad (4)$$
$\Phi$ is then represented by the weight vector:
$$\Omega = [\omega_1, \omega_2, \ldots, \omega_K]^T$$
To classify an image as a face or not (whether known or unknown), a check can be done to see whether it is sufficiently close to the face space. If it is a face, the weight pattern is classified as either a known person or as unknown by calculating the distance between its weight vector and those of all images available in the system. The image with the shortest distance is the most likely match for the input image. Several distance measures, such as the Euclidean distance, the Cosine distance or the Mahalanobis distance, can be considered to calculate the distance. Optionally, if the same unknown face is seen several times, the algorithm can calculate its characteristic weight pattern and incorporate it into the known faces database (i.e., learn to recognize it).
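The projection and nearest-neighbour classification described in this section can be sketched as below. This is a minimal sketch assuming NumPy; all function names are hypothetical, the orthonormal basis is a random stand-in rather than a set of real eigenfaces, and the Mahalanobis distance is left out because it additionally needs the eigenvalues.

```python
import numpy as np


def project(image_vec, psi, U):
    """Weights omega_i = u_i^T (Gamma - Psi) for the K basis vectors stored as columns of U."""
    return U.T @ (image_vec - psi)


def distance_to_face_space(image_vec, psi, U):
    """Norm of the part of (Gamma - Psi) that the basis cannot reconstruct."""
    phi = image_vec - psi
    return np.linalg.norm(phi - U @ (U.T @ phi))


def euclidean(a, b):
    return np.linalg.norm(a - b)


def cityblock(a, b):
    return np.sum(np.abs(a - b))


def cosine_distance(a, b):
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


def nearest(omega, gallery, distance=euclidean):
    """Index of the gallery weight vector (a column of gallery) closest to omega."""
    return int(np.argmin([distance(omega, gallery[:, j]) for j in range(gallery.shape[1])]))


rng = np.random.default_rng(2)
N, M, K = 100, 6, 3                                   # toy sizes
faces = rng.uniform(0, 255, size=(N, M))              # made-up image vectors as columns
psi = faces.mean(axis=1)
U, _ = np.linalg.qr(rng.normal(size=(N, K)))          # stand-in orthonormal basis, NOT real eigenfaces
gallery = np.column_stack([project(faces[:, j], psi, U) for j in range(M)])

probe = faces[:, 4] + rng.normal(scale=1.0, size=N)   # slightly noisy copy of image 4
match = nearest(project(probe, psi, U), gallery)
print(match, distance_to_face_space(probe, psi, U))
```

A threshold on the distance to face space (for the face / non-face check) and on the best matching distance (for known / unknown) would have to be chosen empirically; no values are suggested here.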
3.2 Independent component analysis algorithm
ICA is another subspace projection based algorithm used in the field of face recognition. It is a newer method that produces spatially localized and statistically independent basis vectors. ICA was originally developed for separating mixed audio signals into independent sources [16]. The goal of ICA is to minimize the statistical dependency between the basis vectors. The basis vectors in ICA are neither orthogonal nor ranked in order. Mathematically, this can be written as $WX^T = U$, where ICA searches for a linear transformation W that minimizes the statistical dependency between the rows of U, given a training set X. There is no closed-form expression to find W. Instead, many iterative algorithms have been proposed based on different search criteria. However, it has been shown that most of the criteria optimized by different ICA algorithms lead to similar or even identical algorithms [4]. InfoMax is one of the best-known ICA algorithms. The InfoMax algorithm as described in [18] by Bartlett et al. was analyzed and its recognition performance was compared with PCA. Details about the experiment are provided in section 6.

The easiest way to understand ICA is with an example [25]. The classic example is to picture a room with talking people (or different sound sources). In this room two microphones record all the sounds; this represents the output of the room. When ICA is performed on the output, it returns the independent sources (the different sound sources in the room). So the idea behind ICA is that the original components can be obtained as long as they are independent. In practice, however, this requirement does not have to be met entirely. To determine the ICs (independent components), Bell and Sejnowski’s InfoMax algorithm [18] is applied. The algorithm was derived from the principle of optimal information transfer in neurons with sigmoidal transfer functions.
It is motivated as follows [18]: Let X be an n-dimensional (n-D) random vector representing a distribution of inputs in the environment. Let W be an n x n invertible matrix, $U = WX$, and $Y = f(U)$ an n-D random variable representing the outputs of n neurons. Each component of $f = (f_1, \ldots, f_n)$ is an invertible squashing function, mapping real numbers into the [0, 1] interval. Typically, the logistic function is used:
$$f_i(u) = \frac{1}{1 + e^{-u}} \qquad (1)$$
The variables $U_1, \ldots, U_n$ are linear combinations of inputs and can be interpreted as presynaptic activations of n neurons. The variables $Y_1, \ldots, Y_n$ can be interpreted as postsynaptic activation rates and are bounded by the interval [0, 1]. The goal in Bell and Sejnowski’s algorithm is to maximize the mutual information between the environment X and the output of the neural network Y. This is achieved by performing gradient ascent on the entropy of the output with respect to the weight matrix W. The gradient update rule for the weight matrix W is as follows:
$$\Delta W = \alpha\,\nabla_W H(Y)\,W^TW = \alpha\,(I + Y'U^T)\,W \qquad (2)$$
where $\alpha$ is the learning rate, H(Y) is the entropy of the random vector Y, and $\nabla_W H(Y)$ is the gradient of the entropy in matrix form, i.e., the cell in row i, column j of this matrix is the derivative of H(Y) with respect to $W_{ij}$. Furthermore, $Y'_i = f''_i(U_i)/f'_i(U_i)$, which for the logistic function (1) gives $Y'_i = 1 - 2Y_i$. A detailed description of the learning rule can be found in [18].

The algorithm is sped up by including a “sphering” step prior to learning. The row means of X are subtracted, and X is then passed through the whitening matrix $W_z$, which is twice the inverse square root of the covariance matrix:
$$W_z = 2\,(\mathrm{Cov}(X))^{-1/2} \qquad (3)$$
This removes the first- and second-order statistics of the data: both the mean and the covariances are set to zero and the variances are equalized. When the inputs to ICA are the sphered data, the full transform matrix $W_I$ is the product of the sphering matrix and the matrix learned by ICA:
$$W_I = WW_z \qquad (4)$$
As discussed in [18], the ICA algorithm converges to the maximum likelihood estimate of $W^{-1}$ for the following generative model of the data:
$$X = W^{-1}S \qquad (5)$$
where $S = (S_1, \ldots, S_n)'$ is a vector of independent random variables, called the sources, with cumulative distributions equal to $f_i$. In other words, using logistic activation functions corresponds to assuming logistic random sources, and using the standard cumulative Gaussian distribution as activation function corresponds to assuming Gaussian random sources. Thus $W^{-1}$, the inverse of the weight matrix in Bell and Sejnowski’s algorithm, can be interpreted as the source mixing matrix, and the variables $U = WX$ can be interpreted as the maximum-likelihood (ML) estimates of the sources that generated the data.

3.2.1 ICA architectures
In this research the ICA algorithm provided by Bartlett et al. was evaluated. A number of algorithms for performing ICA have been introduced; in [18] an approach based on the InfoMax algorithm proposed by Bell and Sejnowski [17] was discussed. Bartlett et al. performed ICA on the image set under two architectures:
• Architecture I treated the images as random variables and the pixels as outcomes of the experiment (trials). Let X be a data matrix in which each column represents an outcome (independent trial) of a random experiment and each row represents the values taken by a random variable $X_i$ (an image). In this approach, it makes sense to talk about independence of images or functions of images. Two images i and j are independent if, when moving across pixels, it is not possible to predict the value taken by the pixel on image j based on the value taken by the same pixel on image i.
• Architecture II treated the pixels as random variables and the images as outcomes. By transposing X, images are organized in columns and pixels in rows. Here, it makes sense to talk about independence of pixels or functions of pixels. Pixels i and j are independent if, when moving across the entire set of images, it is not possible to predict the value taken by pixel i based on the corresponding value taken by pixel j on the same image.
Architecture I finds weight vectors in the directions of statistical dependencies among the pixel locations. Architecture II finds weight vectors in the directions of statistical dependencies among the face images.
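The difference between the two architectures comes down to the orientation of the data matrix handed to ICA. The following sketch, assuming NumPy and using made-up random “images” and hypothetical variable names, only shows the two layouts:

```python
import numpy as np

n_images, height, width = 4, 3, 2                 # toy sizes
rng = np.random.default_rng(4)
images = rng.uniform(0, 255, size=(n_images, height, width))

# Architecture I: one row per image (random variable), one column per pixel (trial).
X_arch1 = images.reshape(n_images, height * width)

# Architecture II: the transpose, so one row per pixel and one column per image.
X_arch2 = X_arch1.T

print(X_arch1.shape, X_arch2.shape)
```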
Figure 10 Architecture I
Figure 11 Architecture II
Architecture II uses ICA to find a representation in which the coefficients used to code images are statistically independent, i.e., a factorial face code. One advantage of such a code is that the probability of any combination of features can be obtained from their marginal probabilities. To achieve this goal, the data matrix was organized so that rows represent different pixels and columns represent different images. ICA attempts to make the outputs, U, as independent as possible. Hence, U is a factorial code for the face images.
The source images estimated by the rows of U are then used as basis images to represent faces. In order to have control over the number of ICs (independent components) extracted by the algorithm, ICA was not performed on the original $n_r$ images but on a set of m linear combinations of those images, where $m < n_r$. It is assumed in [18] that the images in X are a linear combination of a set of unknown statistically independent sources. The model is unaffected by replacing the original images with some other linear combination of the images. The first m PC (principal component) eigenvectors of the image set were chosen as these linear combinations. PCA was performed on the image set with the pixel locations treated as observations and each face image as a measure. The use of PCA vectors as input did not throw away the high-order relationships: these relationships still existed in the data but were not separated.
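The sphering step and the InfoMax update rule of section 3.2 can be sketched together as follows. This is a minimal sketch assuming NumPy, not the code of [18]: the function names, the toy sources, the mixing matrix, the learning rate and the number of iterations are all assumptions, and the update is averaged over the whole batch, which only rescales the learning rate.

```python
import numpy as np


def sphere(X):
    """Subtract the row means and apply W_z = 2 * Cov(X)^(-1/2)."""
    X0 = X - X.mean(axis=1, keepdims=True)
    cov = np.cov(X0)                                  # rows are variables, columns are observations
    eigvals, eigvecs = np.linalg.eigh(cov)
    Wz = 2.0 * eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return Wz @ X0, Wz


def infomax_step(W, X, alpha):
    """One batch update W <- W + alpha * (I + Y' U^T) W with logistic outputs."""
    n, batch = X.shape
    U = W @ X
    Y = 1.0 / (1.0 + np.exp(-U))                      # logistic squashing function
    Y_prime = 1.0 - 2.0 * Y                           # f''/f' for the logistic function
    return W + alpha * (np.eye(n) + (Y_prime @ U.T) / batch) @ W


rng = np.random.default_rng(3)
S = rng.laplace(size=(2, 2000))                       # two independent toy sources
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ S            # hypothetical mixing matrix
Xs, Wz = sphere(X)

W = np.eye(2)
for _ in range(300):
    W = infomax_step(W, Xs, alpha=0.02)

W_I = W @ Wz                                          # full transform: learned matrix times sphering matrix
print(np.round(np.cov(Xs), 6))
```

The printed covariance of the sphered data shows equal variances and zero covariances, which is the property the sphering step is meant to provide. No claim is made here about how well this toy run separates the sources.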
3.3 Differences between ICA & PCA
Bartlett et al. [18] provided a detailed description of the ICA algorithms and addressed their differences with PCA. The goal of PCA is to find a “better” set of basis images so that in this new basis the image coordinates, better known as the PCA coefficients, are uncorrelated, i.e., they cannot be linearly predicted from each other. Dependencies that show up in the joint distribution of pixels are separated out into the marginal distributions of PCA coefficients. However, PCA can only separate pairwise linear dependencies between pixels. High-order dependencies will still show in the joint distribution of PCA coefficients and, thus, will not be properly separated. Bartlett et al. state that in face recognition much of the important information may be contained in the high-order relationships among the image pixels, and thus it is important to investigate whether generalizations of PCA which are sensitive to high-order relationships, not just second-order relationships, are advantageous. Independent component analysis (ICA) [16] is one such generalization. Research has shown that ICA is successful for separating randomly mixed auditory signals (the cocktail party problem), and for separating electroencephalogram (EEG) signals [20] and functional magnetic resonance imaging (fMRI) signals [21]. Bartlett et al. show in [18] that the phase spectrum, not the power spectrum, contains the structural information in images that drives human perception. The fact that PCA is only sensitive to the power spectrum of images suggests that it might not be particularly well suited for representing natural images.
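A tiny numerical example illustrates why uncorrelated does not mean independent. The two toy variables below are made up: the second is completely determined by the first, yet their covariance, the only thing a second-order method such as PCA looks at, is zero.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)])


# x takes symmetric values; y is completely determined by x, so the two are dependent.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]

second_order = covariance(x, y)                    # what a second-order method can see
higher_order = covariance([v * v for v in x], y)   # dependency that only shows beyond second order
print(second_order, higher_order)
```

The first value is zero and the second is not, so a representation can be perfectly decorrelated while strong high-order dependencies remain.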
The potential advantages of ICA over PCA are as follows:
1. It provides a better probabilistic model of the data, which better identifies where the data concentrate in n-dimensional space.
2. It uniquely identifies the mixing matrix W, such that the rows of W are as statistically independent as possible, whereas in PCA the rows of W are, in fact, the eigenvectors of the covariance matrix of the data.
3. It finds a not-necessarily orthogonal basis which may reconstruct the data better than PCA in the presence of noise.
4. It is sensitive to high-order statistics in the data, not just the covariance matrix.
Since the ICA basis vectors are non-orthogonal, they change the relative distance between data points. This change in metric may be useful for classification algorithms, like nearest neighbor, that make decisions based on relative distances between points. The ICA basis also alters the angles between data points, which could affect similarity measures such as cosines. Moreover, if an undercomplete basis set is chosen, PCA and ICA may span different subspaces.

Bartlett et al. state that the metric induced by ICA is superior to PCA in the sense that it may provide a representation more robust to the effect of noise [22]. It was found that if only 12 bits are allowed to represent the PCA and ICA coefficients, linear reconstructions based on ICA are 3 dB better than reconstructions based on PCA (the noise power is reduced by more than half). A similar result was obtained for PCA and ICA subspaces: if only four bits are allowed to represent the first 2 PCA and ICA coefficients, ICA reconstructions are 3 dB better than PCA reconstructions. Variations in lighting and expression can be seen as noisy versions of the canonical image of a person. ICA can be seen as a theoretically sound probabilistic method to find interesting non-orthogonal rotations.

Bartlett et al. also state that in experiments to date, ICA performs significantly better using cosines rather than Euclidean distance as the similarity measure, whereas PCA performs the same for both. A cosine similarity measure is equivalent to length-normalizing the vectors prior to measuring Euclidean distance when doing nearest neighbor. Other studies such as [4] found that PCA outperforms ICA when the proper distance metric for each method is selected to maximize performance. In this research both algorithms were evaluated again, using the same preprocessing algorithm on the same sets of images and four different similarity measures with both approaches. The results are presented in section 6.
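The equivalence between the cosine measure and Euclidean distance on length-normalized vectors can be checked with a few made-up coefficient vectors. For unit-length vectors the squared Euclidean distance equals 2 - 2 cos, so both measures rank the candidates in the same order. All names and numbers below are hypothetical.

```python
import math


def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


probe = [1.0, 2.0, 2.0]
gallery = [[3.0, 6.0, 6.3], [1.5, 1.5, 2.5], [10.0, 0.0, 0.0]]   # toy coefficient vectors

by_cosine = max(range(len(gallery)), key=lambda j: cosine_similarity(probe, gallery[j]))
by_normalized = min(range(len(gallery)),
                    key=lambda j: euclidean(normalize(probe), normalize(gallery[j])))
by_raw = min(range(len(gallery)), key=lambda j: euclidean(probe, gallery[j]))
print(by_cosine, by_normalized, by_raw)
```

In this toy case the cosine measure and the normalized Euclidean distance pick the same candidate, while the plain Euclidean distance picks a different one because it is also sensitive to vector length.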
4 Implementation
4.1 PCA implementation
Using an algorithm implemented in Matlab, an experiment was performed to evaluate the algorithm and its usability for building a face recognition system. The main goal of the evaluation was to verify that the PCA algorithm provides the promised results, and that the level of detail of the information provided in the paper and in other information sources is sufficient to design a system that gives acceptable results when used for face recognition. Using the Matlab source code published by [3] and [24] as an initial setup for the test system, a test was executed using several sets of images from the FERET database: two training sets and four evaluation sets. The system was first trained with a training set of 428 images, and four different distance measures (Cityblock, Cosine, Euclidean and Mahalanobis) were evaluated to obtain the number of recognized images. The same test was repeated using the second training set, which contains two images for some individuals and has 501 images in total. Several test runs were executed to verify the results of the algorithm. The algorithm execution was split into two Matlab classes, PCA.m and Testpca.m. The PCA.m class is responsible for executing the steps related to initializing the algorithm, performing the training and projecting images onto a lower-dimensional subspace. Below is the prototype of the PCA function:
function pca (path1, datalist, path2, trainList, subDim)
Inputs
Each input argument is specified as follows:
Path1: full path to the preprocessed images available in datalist.
Datalist: the list of all images available in the system database to be used for recognition.
Path2: full path to the preprocessed images available in trainList.
TrainList: list of images to be used for training. Names should be given without extension; the .pgm extension is added automatically to the file name.
SubDim: number of dimensions to be retained (the desired subspace dimensionality). If this argument is omitted, the maximum number of non-zero dimensions is retained, i.e. (number of training images) - 1.
Outputs
The function generates the following outputs:
DATA: a matrix where each column is one image reshaped into a vector; the size of this matrix is (number of pixels) x (number of images).
ImSpace: same as DATA, but containing only the images in the training set.
Psi: mean face of the training images.
ZeroMeanSpace: a matrix with the mean face subtracted from each column (image) in imSpace.
PcaEigVals: eigenvalues.
W: lower-dimensional PCA subspace.
PcaProj: all images projected onto the sub-dimensional space.
The function PCA executes a number of steps to train the system:
1. Reads the images from the specified path, reshapes them into vectors, and creates the matrix DATA.
2. Creates the training image space (ImSpace matrix) by copying the images specified in the list of training images (TrainList).
3. Calculates the mean face (psi) from the training images.
4. Subtracts the mean face (psi) from each image in the imSpace matrix and saves the results in the matrix ZeroMeanSpace.
5. Calculates the eigenvectors and eigenvalues of the matrix L (L = zeroMeanSpace^T * zeroMeanSpace), as specified by Turk and Pentland in [1]; see section 3.1.1 for details.
6. Sorts the diagonal matrix of the eigenvalues in descending order.
7. Calculates the eigenfaces using the normalized eigenvectors.
8. Uses the best eigenvalues to choose the associated eigenvectors for creating the lower-dimensional face space (w). The number of chosen eigenfaces depends on the desired subspace dimensionality specified by the input argument SubDim.
9. Projects all images onto the new lower-dimensional subspace (w) to create the projection space.
Testpca.m is a class responsible for classifying and recognizing images. It reads the evaluation images and calculates the distance of each image to the other images in the database. The image with the shortest distance is then proposed as the recognized matching image. The function is available in two variants: the first can be used to evaluate the recognition of a single image, while the second can be used to evaluate a list of images. The functions have the following prototypes, inputs and outputs:
function Testpca (img, psi,w,pcaProj,list)
Inputs:
Img: full path to the image to be evaluated, e.g. (‘C:\images\00001fa010_930831.pgm’).
PSI: mean face of the training images (output of the PCA function).
W: lower-dimensional PCA subspace (output of the PCA function).
PcaProj: the projection of all images in the database onto the sub-dimensional space (output of the PCA function).
List: a list of the projected images.
Outputs:
The function prints the matching index and the associated image name from list.
function Testpca2 (path,psi,w,pcaProj,list,evallist)
Inputs:
Path: full path of the directory containing the images to be evaluated, e.g. (‘C:\images\’).
PSI: mean face of the training images (output of the PCA function).
W: lower-dimensional PCA subspace (output of the PCA function).
PcaProj: the projection of all images in the database onto the sub-dimensional space (output of the PCA function).
List: list of all projected images.
Evallist: list of images to be evaluated.
Outputs:
The function evaluates each image separately and compares the name of the matched image from ‘list’ with the name of the evaluated image from ‘evallist’. If both images belong to the same individual, the image is considered recognized and a counter for the number of matched images is increased. The function prints the number of matched images as the number of recognized images. The results of the experiment are discussed in section 6.
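The counting performed by the list variant of the evaluation can be sketched as follows. The sketch assumes that the leading five digits of a FERET-style file name identify the individual, as in the example name given above; the helper names, the file names and the matches are hypothetical and are not results of the experiment.

```python
def subject_id(image_name):
    """Assumed naming scheme: the leading five digits identify the individual."""
    return image_name[:5]


def count_recognized(eval_names, matched_names):
    """Number of evaluation images whose best match shows the same individual."""
    return sum(subject_id(e) == subject_id(m) for e, m in zip(eval_names, matched_names))


# Hypothetical file names and matches, for illustration only.
eval_names = ["00001fb010_930831", "00002fb010_930831", "00003fb010_930831"]
matched_names = ["00001fa010_930831", "00002fa010_930831", "00007fa010_930831"]
recognized = count_recognized(eval_names, matched_names)
print(f"{recognized} of {len(eval_names)} recognized")
```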
4.2 ICA implementation
The source code of the ICA algorithm as discussed in [18] was made publicly available for research purposes. In this research the source code developed in Matlab was re-used as a base to implement an algorithm to execute the experiment. As discussed in section 3.2, two ICA architectures were proposed by Bartlett et al., but the ICA experiment was performed using architecture I only. The Matlab classes and the steps executed during the experiment are listed and discussed below. As with PCA, the execution of each experiment was split into two stages: the training part, and the evaluation and recognition part. Bartlett et al. use the principal components found by PCA (i.e. eigenfaces and eigenvectors) to reduce the computational complexity of ICA. After all, ICA operates on the higher-order statistics while PCA operates on pairwise relationships between pixels, also known as second-order statistics. The same approach was followed in this experiment.
4.2.1 Architecture I experiment
The ICA architecture I training (learning) part was performed by running the Arch1Train class, which in turn uses the following classes:
• pcabigFn
• spherex
• zeroMn
• runica
• sep96
In this section a brief description of each class is given. The Matlab class Arch1Train (shown below) runs several steps to perform the training part:
1. Perform PCA, and determine the eigenvectors, eigenfaces and eigenvalues.
2. Select the first 200 eigenvectors, if the dimension of the data matrix is higher than 200.
3. Run ICA to determine the independent components based on the InfoMax algorithm.
% Perform the PCA calculations and determine the PCA coefficients
[V,R,E] = pcabigFn(C');
% Choose the first 200 eigenvectors, if bigger than 200.
% (If PCA generalizes better by dropping first few eigenvectors, ICA will too).
if size(V,2)