Multi-view segmented filter for multi-correlation: Application to 3D face recognition
A. Alfalou*a and C. Brosseaub
a ISEN Brest, L@bISEN, 20 rue Cuirassé Bretagne CS 42807, 29228 Brest Cedex 2, France
b Université Européenne de Bretagne, Université de Brest, Lab-STICC and Département de Physique, CS 93837, 6 avenue Le Gorgeu, 29238 Brest Cedex 3, France
ABSTRACT
A novel approach to realize an optimized multi-decision segmented filter for recognizing a 3D object in a given scene is suggested. For that purpose, a correlation filter suitable for multi-view images is proposed. Like 3D object reconstruction techniques from multi-view images, we use multiple cameras arranged in a well-defined pickup grid. Each multi-view image gives a unique perspective of the scene. Then, the elemental images of the 3D object are separately correlated using a multi-channel correlator with the newly designed optimized filter. This optimization was carried out to take into account possible changes of the 3D object in the scene, e.g. rotation and scale, because correlation techniques are very sensitive to this kind of noise. To solve this problem, the filter must be sufficiently flexible to include information about different rotations of the elemental image. We optimized the fusion of the different elemental reference images, e.g. those taken at different rotation angles. In addition, the different spectra were shifted in the filter plane to minimize the overlap between the elemental spectra. In this study, particular attention is paid to the choice of the lateral shift. With this correlation system, each channel gives a single decision concerning a single perspective of the 3D object. We demonstrate that the final correlation decision takes into account all these elemental decisions to determine the presence (or not) of the 3D object in the target scene.
Keywords: Optical correlation, Classical Matched filter, BPOF, Segmented composite filter, multi-view images, 3D object reconstruction, Fusion in the Fourier plane.
1. INTRODUCTION
Since the publication by Vander Lugt [1] in 1964 of a recognition technique based on a coherent optical processing system, this technique has been the focus of intensive interest. This is mainly due to four issues: (1) it is based on a very simple principle, i.e. an optical set-up called 4f [2], and on a comparison between the target image (the image to be recognized) and a reference image coming from a learning base; (2) the appearance of many correlation-filter algorithms with high discrimination ability based only on the phase information of the reference Fourier plane [3]; (3) the advent of optoelectronic interfaces (such as spatial light modulators, SLMs), allowing the re-configurability of correlation filters [4-6]; and (4) the growing need for recognition and identification in security systems. To implement this correlation method, two architectures have been proposed: the frequency-plane correlator (FCP, or Vander Lugt correlator, VLC) [1] and the joint transform correlator (JTC) [7]. Here, we shall be interested only in the first technique. For further information about correlation methods, the reader may consult the book chapter of Ref. [8]. We begin with an overview of the different correlation-filter design techniques that inspired our new filter, the shifted multi-view segmented filter for multi-correlation (SMVSF), used to recognize a 3D face. Then, we develop our technique to decompose a 3D face. Next, the fabrication of the SMVSF is described, which is followed by a numerical validation. Finally, a summary ends this work.
* [email protected]; phone +33 2 98 03 84 09; fax +33 2 98 03 84 10
Optics and Photonics for Counterterrorism and Crime Fighting V, edited by Colin Lewis, Proc. of SPIE Vol. 7486, 74860P · © 2009 SPIE · CCC code: 0277-786X/09/$18 · doi: 10.1117/12.831077
Background and Notations
The 1980s and 1990s saw a large number of correlation filters designed to improve upon the Classical Matched Filter (CMF) [1]. These improvements were intended to include the physical constraints of the SLMs which are required to display the filters in the optical set-up. A few words about the notation: I(x,y) and Ri(x,y) are the (2D) target image (to be recognized) and the 2D reference image number #i, respectively. In like fashion, SI(u,v) and SRi(u,v) denote the Fourier transforms of the target image and of the reference image, respectively.
• HCMF (Classical Matched Filter): defined as the conjugate of the reference spectrum. Without noise this filter can be written as $H_{CMF} = S^{*}_{R_i}(u,v)$. This filter is very robust, but has a low discriminating power [3,9].
• HPOF (Phase Only Filter): defined as $H_{POF} = S^{*}_{R_i}(u,v) / \left| S_{R_i}(u,v) \right|$. This filter has a very sharp peak and is very discriminating, but it is too sensitive to specific noise arising e.g. from object deformation [3,9].
• HBPOF (Binary Phase Only Filter): a binary version of the HPOF filter designed to account for the physical constraints imposed by using fast, binary SLMs [10-11]. Several techniques have been proposed to achieve this binarization, e.g. [12]. Here we consider that HBPOF = +1 if Real(HPOF) ≥ 0 and HBPOF = -1 if Real(HPOF) < 0.
• HROS_POF (Phase Only Filter with region of support): defined as the multiplication of the HPOF filter with a pass-band function P [12-13], $H_{ROS-POF} = H_{POF} \times P$. This filter increases the robustness of the classical HPOF while keeping a good discriminating power.
• HCOMP (Composite Filter): this filter allows one to merge multiple versions of a reference image. This type of filter is interesting for reducing the sensitivity of correlation filters to rotation and scale. It can be defined as a linear combination of different reference versions, $H_{COMP} = \sum_i a_i R_i$, where the $a_i$ are coefficients chosen to optimize a cost function defined for a desired application [14-15].
• HSeg (Segmented filter): another version of the composite filter, designed to overcome the saturation problem due to the combination of a large number of references [16-17]. HSeg optimizes the use of the space bandwidth product (SBWP) in the spectral plane of the filter.
Several criteria have been proposed to evaluate correlation performance, i.e. recognition and discrimination [18-19]: the peak-to-correlation energy (PCE), defined as the energy of the correlation peak normalized to the total energy of the correlation plane,

$$PCE = \frac{\sum_{i,j}^{N} E_{peak}(i,j)}{\sum_{i,j}^{M} E_{correlation\_plane}(i,j)}, \qquad (1)$$

where N denotes the size of the correlation-peak spot and M the size of the correlation plane, and the output signal-to-noise ratio SNRout, defined as the ratio of the energy of the correlation-peak spot to the noise energy in the correlation plane,

$$SNR_{out} = \frac{\sum_{i,j}^{N} E_{peak}(i,j)}{\sum_{i,j}^{M} E_{correlation\_plane}(i,j) - \sum_{i,j}^{N} E_{peak}(i,j)}. \qquad (2)$$
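To make these definitions concrete, the short Python sketch below (our own illustration, not part of the original paper) builds the POF and BPOF of a reference image with numpy and evaluates a correlation plane with the PCE of Eq. (1) and the SNRout of Eq. (2); the array names `reference` and `target` and the 5×5 peak-spot size are assumptions.

```python
import numpy as np

def pof_filter(reference):
    """Phase-only filter: conjugate spectrum divided by its modulus."""
    S = np.fft.fft2(reference)
    return np.conj(S) / (np.abs(S) + 1e-12)   # small term avoids division by zero

def bpof_filter(reference):
    """Binary phase-only filter: +1 where Real(H_POF) >= 0, -1 elsewhere."""
    return np.where(pof_filter(reference).real >= 0, 1.0, -1.0)

def correlate(target, H):
    """Frequency-plane (VLC-type) correlation: FFT, multiply by the filter, inverse FFT."""
    C = np.fft.ifft2(np.fft.fft2(target) * H)
    return np.fft.fftshift(np.abs(C) ** 2)    # correlation-plane energy

def pce(corr_plane, peak_radius=2):
    """Peak-to-correlation energy, Eq. (1): peak-spot energy over total plane energy."""
    i, j = np.unravel_index(np.argmax(corr_plane), corr_plane.shape)
    spot = corr_plane[i - peak_radius:i + peak_radius + 1,
                      j - peak_radius:j + peak_radius + 1]
    return spot.sum() / corr_plane.sum()

def snr_out(corr_plane, peak_radius=2):
    """Output SNR, Eq. (2): peak-spot energy over the remaining plane energy."""
    i, j = np.unravel_index(np.argmax(corr_plane), corr_plane.shape)
    spot = corr_plane[i - peak_radius:i + peak_radius + 1,
                      j - peak_radius:j + peak_radius + 1].sum()
    return spot / (corr_plane.sum() - spot)

# Hypothetical usage with random stand-ins for the reference and target images
rng = np.random.default_rng(0)
reference = rng.random((128, 128))
target = reference.copy()
plane = correlate(target, bpof_filter(reference))
print(pce(plane), snr_out(plane))
```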
2. ANALYSIS
In many practical applications, 3D objects, e.g. faces, need to be recognized. In order to identify them, they should first be converted into 2D images. Indeed, the SLMs used to display the input plane and the filters of a correlator are 2D devices.
The 3D-2D conversion is generally achieved with a CCD camera, resulting in a loss of information. To deal with this issue, we propose to use a basic correlator (FCP) and to merge information on both the object shape and the object depth in its Fourier plane.
Background on 3D object reconstruction
Two major methods for 3D object reconstruction can be found in the literature. The first is based on holography, i.e. the PSI technique [20-21]. Even though it can provide a reliable 3D object reconstruction, it is discarded here because it requires recording complex transmittances using several specific reference waves, and it further requires the registration of these transmittances. The second method, called the integral imaging (II) technique [22], is easier to implement since it requires neither complex amplitudes nor the acquisition of a large amount of information. Indeed, this technique is based on the registration of multiple views of a 3D target object. These different views are called elemental 2D images. They are taken using a matrix of fixed cameras (pickup grid). Each of these cameras captures one view of the 3D object (Figure 1). Then, the elemental 2D images are used to rebuild the 3D object by a specific process [22].
Our approach to recognize a 3D object
The first step in our approach consists in decomposing the 3D object into many elemental 2D images. To do this, we use a pickup grid, i.e. a symmetric arrangement with three rows of cameras: the first row is associated with a -15° view of the object, the second one with 0°, and the third one with +15°. In our arrangement, the cameras are spaced 3 cm apart in both length and width. To validate our approach, we used objects without noise: the subject is positioned in front of the camera placed at the center. To recognize the 3D object, we could correlate each 2D elemental image with its appropriate 2D correlation filter, thus giving an elemental decision, and take the final decision on the 3D object by combining all these elemental decisions. However, this is a very long process that requires a priori knowledge of the information on the object (especially its orientation). To deal with this problem, we propose to merge the different elemental images together to produce a single image that contains all the necessary information on the various views of the 3D object. Next, we compare this image with only one filter. Two fusion techniques were used: the first one produces the composite filter and the second one produces the segmented filter. Below, we detail the process used to merge the input elemental images and the fabrication of the filter.
Figure 1. Decomposition of the 3D object into several multi-view images.
3. MULTI-VIEW SEGMENTED FILTER FOR MULTI-CORRELATION
For the dual purpose of merging the information describing a 3D object and reducing it to a 2D image, we start by applying the technique used in the fabrication of the composite filter. The 3D/2D image obtained is a combination of different elemental images. However, one major drawback of this technique lies in its saturation problem, especially when the number of elemental images is large, which eventually leads to a decrease of the performance of the correlator. This saturation arises because the technique proposed here should be optically implementable: SLMs are used to display the input images and the filters, which requires 8-bit coding of the information for the SEIKO modulator (an amplitude modulator with 256 gray levels) and 2-bit coding for the Displaytech modulator. The former modulator is used to display the target image in the input plane of our correlator; the latter is used to display the various filters. To overcome the saturation, the number of merged elemental images is limited to three, and a fusion technique based on the segmentation of the Fourier plane is used [16].
Segmented filter
To improve the robustness of the HPOF filter, it was shown that multiplying this filter by a binary pass-band function (P(i) = 1 if i ≤ N and P(i) = 0 if i > N, where N is chosen according to the specific application) increases its robustness against noise. To improve the pass-band function, Ding and co-workers [13] proposed an iterative method. This technique requires a large computing time and depends on the desired application. Moreover, it is based on a binary function (0 or 1), leading to a loss of Horner efficiency which can be detrimental for optical implementation. A similar technique was proposed in Refs. [16-17] to improve the pass-band function; it is based on a non-iterative method and optimizes the use of the space bandwidth product in the Fourier plane. This technique was shown to give a Horner efficiency identical to that of a conventional HPOF. In this approach the Fourier plane is divided into several zones, each being allocated to a specific elemental image. As shown in Figure 2, the principle of this method consists in merging the spectra of the various elemental images according to a well-defined criterion. The segmentation criterion used in this work compares, for each pixel (k,l), the relative (normalized to the total energy) energy $E_i(k,l)$ associated with this pixel for image #i to the energies of the other images. From these comparisons a "winner-takes-all" decision assigns this pixel to a specific elemental image:

$$\frac{E_i(k,l)}{\sum_{m=0}^{N}\sum_{n=0}^{N} E_i(m,n)} = \max_{j}\left(\frac{E_j(k,l)}{\sum_{m=0}^{N}\sum_{n=0}^{N} E_j(m,n)}\right), \quad \forall\, i \neq j. \qquad (3)$$

Figure 2. Synoptic diagram of the segmented-filter fabrication.
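A minimal numerical sketch of this winner-takes-all rule is given below (our own hedged illustration; it assumes the elemental references are available as same-sized numpy arrays and builds a POF-type segmented filter, which is one plausible reading of Refs. [16-17] rather than the authors' exact code).

```python
import numpy as np

def segmented_filter(references):
    """Segmented filter: each Fourier-plane pixel is assigned, via Eq. (3), to the
    elemental image whose normalized spectral energy is largest at that pixel."""
    spectra = np.stack([np.fft.fft2(r) for r in references])       # shape (n, H, W)
    energy = np.abs(spectra) ** 2
    normalized = energy / energy.sum(axis=(1, 2), keepdims=True)   # E_i(k,l) / sum_{m,n} E_i(m,n)
    winner = np.argmax(normalized, axis=0)                         # winner-takes-all map
    pofs = np.conj(spectra) / (np.abs(spectra) + 1e-12)            # elemental POF values
    return np.take_along_axis(pofs, winner[None, ...], axis=0)[0]  # pixel-wise selection
```

The winner map also delimits the region of support granted to each elemental image, which is how the segmentation economizes the space bandwidth product of the filter plane.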
Optimized multi-view segmented filter
After this short presentation of the principle of the segmented filter technique, an extension of this technique to 3D recognition is now proposed. For that purpose, we first form a linear combination of the different elemental 2D images (Figure 3(a)) taken by one camera row of the pickup grid (with reference to Figure 3(b)). Next, the spectrum of each merged elemental image is shifted (Figure 3(c)). This shift is necessary to optimize the segmentation in the Fourier plane, because the elemental 2D images, and hence their spectra, are very close to one another. In practice, the shifting is realized optically by multiplying each merged elemental image by a specific phase term [16]. Afterwards, a segmentation is performed to obtain the SMVSF.
Figure 3. Principle of the optimized multi-view segmented filter.
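The construction of Figure 3 can be summarized by the following sketch (again our own illustration under stated assumptions: each camera row is a list of numpy images, the per-row fusion is a plain average, and the shift values are placeholders). Each merged row image is multiplied by a linear phase term so that its spectrum is translated, and the shifted spectra are then segmented with the same winner-takes-all rule as above to yield the SMVSF.

```python
import numpy as np

def shifted_spectrum(image, du, dv):
    """Spectrum of `image` translated by (du, dv) pixels in the Fourier plane:
    the image is multiplied by a linear phase term before the FFT [16]."""
    H, W = image.shape
    y, x = np.mgrid[0:H, 0:W]
    phase = np.exp(2j * np.pi * (du * y / H + dv * x / W))
    return np.fft.fft2(image * phase)

def smvsf(rows, shifts):
    """Shifted multi-view segmented filter built from camera rows of elemental images."""
    merged = [np.mean(row, axis=0) for row in rows]                  # linear combination per row
    spectra = [shifted_spectrum(m, du, dv) for m, (du, dv) in zip(merged, shifts)]
    energy = np.stack([np.abs(S) ** 2 for S in spectra])
    normalized = energy / energy.sum(axis=(1, 2), keepdims=True)     # Eq. (3) on the shifted spectra
    winner = np.argmax(normalized, axis=0)
    pofs = np.stack([np.conj(S) / (np.abs(S) + 1e-12) for S in spectra])
    return np.take_along_axis(pofs, winner[None, ...], axis=0)[0]
```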
Particular attention was paid to finding the appropriate value of the shifting parameter Δ that minimizes the overlap between the different spectra. For that purpose, we rely on previous work by Papoulis [24], who determined the necessary minimal size of a given spectrum by calculating the quantity called the RMS duration

$$\Delta = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \left|\nabla H(u,v)\right|^{2} du\, dv = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \left(x^{2}+y^{2}\right)\left|S_{I}(x,y)\right|^{2} dx\, dy, \qquad (4)$$

where ∇H(u,v) is the gradient of the spectrum, SI(u,v) denotes the spectrum of an image I, and 1/2π is a normalization factor.
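A discrete transcription of Eq. (4) might look as follows (our own sketch; we normalize by the total energy and take a square root so that the returned value is an RMS radius in pixels, which is how we would then pick the shift Δ between neighbouring spectra; these conventions are our assumptions, not the paper's).

```python
import numpy as np

def rms_duration(spectrum):
    """Energy-weighted RMS radius of the support of `spectrum`, a discrete analogue
    of the right-hand side of Eq. (4); used to estimate the minimal extent of one
    elemental spectrum and hence the shift needed to keep two spectra apart."""
    H, W = spectrum.shape
    y = np.arange(H) - H / 2
    x = np.arange(W) - W / 2
    yy, xx = np.meshgrid(y, x, indexing="ij")
    energy = np.abs(spectrum) ** 2
    return np.sqrt(((xx ** 2 + yy ** 2) * energy).sum() / energy.sum())
```

In a simulation one would, for instance, separate adjacent channels by a shift of the order of twice this radius so that their main spectral lobes do not overlap.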
Proposed 3D correlator set-up
The principle of our approach is presented in Figure 4. The first operation is to add the different elemental target images. For that purpose, we perform the same merging steps applied to fabricate our filter, as presented above. A multiplication with the SMVSF is followed by an inverse Fourier transformation, giving us the correlation plane. In the output plane, we get the result of the simultaneous correlation of all elemental images obtained from the 3D face.
Figure 4. Principle of the 3D Correlator set-up.
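Putting the previous sketches together, a hedged end-to-end simulation of the correlator of Figure 4 could read as follows (it reuses the illustrative `smvsf`, `shifted_spectrum` and `pce` functions defined above; the random stand-in images, angles and shift values are placeholders). The target elemental views are merged and shifted exactly as the references were, the summed spectrum is multiplied by the SMVSF, and an inverse Fourier transform yields the single correlation plane from which the decision on the 3D face is taken.

```python
import numpy as np

def correlate_3d(target_rows, reference_rows, shifts):
    """Single-filter 3D correlation: fuse/shift the target views, multiply by the
    SMVSF built from the reference views, and inverse-transform the product."""
    H_filter = smvsf(reference_rows, shifts)                   # filter from the 3D reference
    merged = [np.mean(row, axis=0) for row in target_rows]     # same fusion as for the filter
    S_target = sum(shifted_spectrum(m, du, dv)
                   for m, (du, dv) in zip(merged, shifts))
    corr = np.fft.ifft2(S_target * H_filter)
    return np.fft.fftshift(np.abs(corr) ** 2)

# Hypothetical usage: three camera rows (-15°, 0°, +15°) of three views each,
# here replaced by random stand-in images.
rng = np.random.default_rng(1)
views = [[rng.random((128, 128)) for _ in range(3)] for _ in range(3)]
shifts = [(-32, 0), (0, 0), (32, 0)]
plane = correlate_3d(views, views, shifts)
print(pce(plane))     # autocorrelation case: a single sharp peak is expected
```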
4. RESULTS AND DISCUSSION
The purpose of this section is to validate the principle of our approach by numerical simulations. These simulations were conducted without noise, but took into account the limitations imposed by the use of SLMs and their specific coding, i.e. 8-bit coding of the input image and 2-bit coding of the filter. We recall that the specific application we have in mind for these simulations is the recognition of a 3D face (with reference to Figure 5).
Figure 5. Example of decomposition of a 3D face into multi-view 2D elemental images.
If we apply our protocol to correlate the target image with a non-shifted multi-view segmented filter (Non-SMVSF), the correlation plane presented in Figure 6(a) is obtained. This figure shows the presence of a single correlation peak resulting from the autocorrelation of the elemental input images with their corresponding elemental filters. It must be noted that the large width of the peak and the high noise level present in the correlation plane (-38 dB) affect the performance of this filter. The segmentation of the different spectra is made from very similar spectra, leading to a strong spectral segmentation and consequently to the emergence of the isolated-pixel phenomenon.
Figure 6. Correlation plane (a) without spectra shifting, i.e. Non-SMVSF (PCE = 0.00127, SNR1 = -37.8 dB), and (b) with spectra shifting, i.e. SMVSF (PCE = 0.00221, SNR1 = -32 dB).
To overcome the problem of strong segmentation, we propose to optimize the use of the filter space bandwidth product. This is done by shifting the spectra to eliminate the highly overlapping areas. The shift value was calculated using the RMS-duration criterion. Figure 6(b) shows the correlation plane obtained with this optimization. It is seen that the correlation peak is higher and that the noise has been significantly reduced in the correlation plane (-32 dB). From the implementation standpoint, it is necessary to binarize the filter. Figures 7(a) and 7(b) show the correlation results obtained by binarizing the two filters used in Figure 6. The most compelling result is the significantly higher correlation peak observed in these figures.
Figure 7. Correlation plane obtained by binarizing (a) the non-shifted MVSF filter (PCE = 0.0051, SNR1 = -45.8 dB), and (b) the SMVSF filter (PCE = 0.00221, SNR1 = -41 dB).
5. CONCLUSION
In this work, a new approach for recognizing 3D objects has been presented and validated. This approach is based on the decomposition of the 3D target object and of the 3D reference images using the II method. Once the elemental images are obtained, they are merged together using composite and segmented techniques. To deal with the isolated-pixel problem, the different spectra were shifted; this shift was calculated using the RMS-duration criterion. The approach presented here allows one to correlate a 3D object with a single filter, to obtain a higher and sharper correlation peak, and to reduce the noise level in the correlation plane. It would be interesting to test the robustness of this filter against noise and also with respect to modifications of the 3D target object compared to the 3D reference images (rotation, scale).
REFERENCES
[1] A. Vander Lugt, "Signal detection by complex spatial filtering," IEEE Trans. Inf. Theory IT-10, 139 (1964).
[2] J. W. Goodman, [Introduction to Fourier Optics], McGraw-Hill, New York (1968).
[3] J. L. Horner and P. D. Gianino, "Phase-only matched filtering," Appl. Opt. 23, 812 (1984).
[4] G. Keryer, J. L. de Bougrenet de la Tocnaye, and A. Alfalou, "Performance comparison of ferroelectric liquid-crystal-technology-based coherent optical multichannel correlators," Appl. Opt. 36, 3043 (1997).
[5] R. M. Turner, K. M. Johnson, and S. Serati, [High Speed Compact Optical Correlator Design and Implementation], Cambridge University Press, New York (1995).
[6] M. Madec, W. Uhring, J. B. Fasquel, P. Joffre, and Y. Hervé, "Compatibility of temporal multiplexed spatial light modulator with optical image processing," Opt. Commun. 275, 27 (2007).
[7] B. Javidi, "Nonlinear joint power spectrum based optical correlation," Appl. Opt. 28, 2358 (1989).
[8] A. Alfalou and C. Brosseau, "Understanding correlation techniques for face recognition: from basics to applications," in [Face Recognition], In-Tech, ISBN 978-953-7619-00-X.
[9] B. Javidi, S. F. Odeh, and Y. F. Chen, "Rotation and scale sensitivities of the binary phase-only filter," Appl. Opt. 65, 233 (1988).
[10] J. L. Horner, B. Javidi, and J. Wang, "Analysis of the binary phase-only filter," Opt. Commun. 91, 189 (1992).
[11] A. Alfalou, [Implementation of Optical Multichannel Correlators: Application to Pattern Recognition], PhD thesis, Université de Rennes 1, Rennes, France (1999).
[12] B. V. K. Vijaya Kumar, "Partial information filters," Digital Signal Processing 4, 147 (1994).
[13] J. Ding, M. Itoh, and T. Yatagai, "Design of optimal phase-only filters by direct iterative search," Opt. Commun. 118, 90 (1995).
[14] J. L. de Bougrenet de la Tocnaye, E. Quémener, and Y. Pétillot, "Composite versus multichannel binary phase-only filtering," Appl. Opt. 36, 6646 (1997).
[15] B. V. K. Vijaya Kumar, "Tutorial survey of composite filter designs for optical correlators," Appl. Opt. 31, 4773 (1992).
[16] A. Alfalou, G. Keryer, and J. L. de Bougrenet de la Tocnaye, "Optical implementation of segmented composite filtering," Appl. Opt. 38, 6129 (1999).
[17] A. Alfalou, M. Elbouz, and H. Hamam, "Segmented phase-only filter binarized with a new error diffusion approach," J. Opt. A: Pure Appl. Opt. 7, 183 (2005).
[18] B. V. K. Vijaya Kumar and L. Hassebrook, "Performance measures for correlation filters," Appl. Opt. 29, 2997 (1990).
[19] J. L. Horner, "Metrics for assessing pattern-recognition performance," Appl. Opt. 31, 165 (1992).
[20] E. Darakis and J. J. Soraghan, "Reconstruction domain compression of phase-shifting digital holograms," Appl. Opt. 46, 351 (2007).
[21] T. J. Naughton, Y. Frauel, B. Javidi, and E. Tajahuerce, "Compression of digital holograms for three-dimensional object reconstruction and recognition," Appl. Opt. 41, 4124 (2002).
[22] B. Tavakoli, M. Daneshpanah, B. Javidi, and E. Watson, "Performance of 3D integral imaging with position uncertainty," Opt. Express 15, 11889 (2007).
[23] M. Farhat, A. Alfalou, H. Hamam, and C. Brosseau, "Double fusion filtering based multi-view face recognition," Opt. Commun. 282, 2136 (2009).
[24] A. Papoulis, [The Fourier Integral and its Applications], McGraw-Hill, New York (1962).