NOVEL PRESENTATION ATTACK DETECTION ALGORITHM FOR FACE RECOGNITION SYSTEM: APPLICATION TO 3D FACE MASK ATTACK

R. Raghavendra

Christoph Busch

Norwegian Biometric Laboratory, Gjøvik University College, Norway. Email:

{raghavendra.ramachandra, christoph.busch}@hig.no

ABSTRACT

Face biometric systems are highly vulnerable to presentation attacks carried out by presenting a photo, a video, or even a 3D mask. In this paper, we present a novel Presentation Attack Detection (PAD) algorithm that can accurately detect and mitigate 3D mask attacks on a face recognition system. The proposed scheme extracts both local and global features from the captured face image. The local features employed in this work correspond to the eye (periocular) and nose regions, which are expected to provide clues about the presence of a mask. In addition, we capture micro-texture variation as a global feature using Binarized Statistical Image Features (BSIF). We then train a linear Support Vector Machine (SVM) independently on these two kinds of features and fuse their scores using the weighted sum rule before deciding between a real face and an artefact. Extensive experiments carried out on the public 3D mask database 3DMAD show the superiority of the proposed scheme, with an outstanding performance of HTER = 0.03%.

Index Terms— Biometrics, security, face mask attack, countermeasure

1. INTRODUCTION

Biometrics have evolved as a promising solution for reliable subject identification and verification applications. Among the wide range of suitable biometric characteristics, the face is very popular because of its high acceptability amongst data subjects. Nevertheless, the wide deployment of low-cost capture devices on the one hand and the trend towards innovative presentation (or spoofing) attacks on the other have given rise to additional challenging problems for facial image capture processes. One major goal of a presentation attack is for an unauthorized user to gain access to the biometric system by presenting an artificially created biometric characteristic (i.e., an artefact) that impersonates an enrolled subject.
The face biometric system is highly vulnerable to these attacks: it is not only easy to spoof, but it is also very cost effective for the attacker to generate a face artefact of the target subject, given the availability of facial images via internet-based social networks and the ease of capturing a facial image in high resolution at a distance. In general, a face biometric system can be spoofed by presenting a photo [1], by replaying a video [2], or by presenting a 3D mask [3] [4]. There exists a wide range of PAD algorithms (also called countermeasures or anti-spoofing methods) to address these increasingly challenging attacks. Further, the results achieved in recently organized competitions [5] [6] demonstrated the successful detection of presentation attacks, especially when a photo or video was used. Recently, a new type of attack has been introduced that targets face recognition systems in a more sophisticated manner

978-1-4799-5751-4/14/$31.00 ©2014 IEEE


Fig. 1. Block diagram of the proposed scheme

using 3D masks [3] [4] [7]. The vulnerability of face recognition systems to printed 3D masks is demonstrated in [4]. Here, 3D masks are manufactured by scanning the subject's face using a 3D scanner and then printing the mask with a 3D printer. A PAD algorithm based on Local Binary Patterns (LBP) and an SVM is then introduced to accurately detect the mask attacks. However, this kind of mask generation is not only expensive but also demands subject cooperation. Recently, a cost-effective mask attack database was presented in [3] [7]. Here, masks are generated from ThatsMyFace.com and the attacks are recorded using a Kinect device, which provides additional information to analyze the mask attack not only from the normal 2D image but also from the depth image. Further, a PAD algorithm based on LBP and Linear Discriminant Analysis (LDA) is presented to effectively mitigate the vulnerability of the face recognition system to the 3D mask attack.

In this work, we propose a novel PAD algorithm to detect 3D mask attacks on face recognition systems. The proposed scheme is based on extracting both local and global features. The local features are extracted to capture the variations in the periocular and nose regions that can indicate the presence of a mask. The global features effectively analyze the micro-texture information by employing BSIF [8]. This is the first work that explores BSIF for mask attack detection, in addition to exploring the local features. Finally, a linear SVM classifier is trained separately on the local and global features, whose scores are combined using the weighted sum rule, based on which the decision about artefact (or mask) versus real is made. Extensive experiments carried out on the publicly available 3DMAD database show the efficacy of the proposed scheme over well-established state-of-the-art schemes [4] [7].
The rest of the paper is organized as follows: Section 2 describes the proposed PAD algorithm, Section 3 describes the experimental protocol and results, and Section 4 draws the conclusion.

2. PROPOSED PRESENTATION ATTACK RESISTANT FACE RECOGNITION SYSTEM

Figure 1 shows the block diagram of the proposed spoof-resistant face recognition system. The proposed system can be broadly structured into two functional units, namely: (1) the proposed spoof detection algorithm and (2) the baseline face recognition system.

ICIP 2014


Fig. 2. Block diagram of the proposed PAD algorithm

2.1. Proposed PAD algorithm

Figure 2 shows the block diagram of the proposed spoof detection algorithm, which is based on both local and global features extracted from the presented facial biometric characteristic. The local features are extracted from the periocular and nose regions of the captured face image, as explained below.

2.1.1. Local features from periocular and nose region

The core idea of using the periocular region is to extract the sharp variations and discontinuities that might result from the presence of a mask. The periocular region extracted from normal (or real) images shows strong edge features contributed by the eyelashes and eyelids. However, a face artefact (or 3D mask), in which the eye regions are left open to fit the eyes, is expected to hide the rich details of the eye region, resulting in a different set of variations and discontinuities compared to a real eye. Further, the corresponding depth image will also reflect this variation in terms of disparity information because of the presence of the 3D mask or artefact. Thus, it is our assertion that accurately capturing these variations can provide useful information to identify a 3D mask (or artefact) attack on the face recognition system. Hence, in this work, we propose to capture this information by computing a spectrogram of the eye region for both the 2D image and its corresponding depth image rendered by the Kinect camera [7]. Given the face image F captured using the Kinect, we first extract the periocular region using the Haar cascade detector mentioned in [9]. Let the extracted eye region be denoted as F_e; we then estimate the spectrogram of F_e by computing the squared magnitude of the Short Time Fourier Transform (STFT) as follows [10]:

S_e(m, ω) = | Σ_{n=1}^{N} F_e[n] w[n − m] e^{−jωn} |²   (1)

where F_e[n] denotes the periocular region, w[n] denotes the window function (the Hamming window is employed in this work), and N denotes the length of F_e. Thus, given F_e, we first convert it to a vector, on which we perform the STFT by running a Hamming window of 25 ms duration with 50% overlap. In this way, we capture the variations that reflect the artefacts caused by the presence of a mask. We then compute the Power Spectral Density (PSD) of F_e as follows:

P_e(m, ω) = S_e(m, ω) / ( F_s Σ_{n=1}^{N} |w[n]|² )   (2)

Fig. 3. Illustration of local and global features: (a) normal (or real) face sample, (b) artefact (or 3D mask) face sample


We finally compute a quantitative value by taking the ratio of the summed PSD in the low-frequency region to that in the high-frequency region as follows:

QF_e = P_e(m, 1:(ω/2)) / P_e(m, (ω/2+1):ω)   (3)
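As a rough illustration of Eqs. (1)-(3), the following sketch treats the flattened periocular region as a one-dimensional signal and computes the low/high PSD ratio with SciPy. The sampling rate `fs` (16 kHz), the way the region is serialized, and the small epsilon guard are illustrative assumptions; the paper does not specify these details.

```python
import numpy as np
from scipy.signal import spectrogram

def psd_ratio(periocular, fs=16000, win_ms=25):
    """Quantitative PSD feature of Eqs. (1)-(3): flatten the periocular
    region to a 1-D signal, compute a Hamming-windowed spectrogram with
    50% overlap (Eqs. 1-2), and return the ratio of summed PSD in the
    low-frequency half to the high-frequency half (Eq. 3)."""
    signal = np.asarray(periocular, dtype=float).ravel()
    nperseg = int(fs * win_ms / 1000)            # 25 ms Hamming window
    f, t, Pe = spectrogram(signal, fs=fs, window='hamming',
                           nperseg=nperseg, noverlap=nperseg // 2)
    half = Pe.shape[0] // 2                      # split the frequency axis
    return Pe[:half].sum() / (Pe[half:].sum() + 1e-12)
```

A smooth (low-frequency) signal yields a ratio well above one, while a rapidly oscillating signal pushes it below one, which is the discriminative behaviour the feature relies on.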

We repeat the above procedure to compute the quantitative PSD value for the corresponding depth image; let us denote this as QF_D. Figure 3 illustrates the qualitative results of the spectrogram obtained on the 2D image and its corresponding depth image for both a real face (Figure 3(a)) and a 3D mask (or spoof/artefact) face image (Figure 3(b)). Here, one can observe the high PSD for real images, contributed by the strong appearance of the eyelashes and eyelids. This justifies our choice of a spectrogram to represent the variations in the periocular region.

In the next step, we extract local features corresponding to the nose region of the face. An accurate 3D reconstruction of the nose in a face mask is challenging because of the limited number of landmark points present. This can be witnessed in Figure 4, which illustrates the nose region detected in both real and mask samples from different subjects. It can be observed that the size of the nose in real and mask samples differs considerably; thus, in this work, we explore the size of the nose to differentiate between real and mask images. Given the face image F, we first detect the nose region using the Haar cascade classifier described in [11]. We then compute the area of the detected region as the feature representing the variation in the nose region. Let this feature be denoted as QN. Note that we extract this feature only from the 2D image and not from the corresponding depth image rendered by the Kinect. In this way, we extract three different local features: the 2D image yields two features, QF_e and QN, while the corresponding depth image yields one local feature, QF_D.

Fig. 4. Illustration of the variation in nose size on both real and mask face samples

2.1.2. Global features

In addition to the local features, we also propose to extract global features to build an accurate presentation attack resistant (or anti-spoofing) system. In this work, we employ Binarized Statistical Image Features (BSIF) [8] to effectively capture the information from the presented face. The idea of BSIF is to represent each pixel as a binary code obtained by computing its response to a filter bank that is trained using the statistical properties of a set of natural images. Thus, the learning process is a crucial step in accurately constructing the filters, which in turn characterize the image via the BSIF descriptor. In this work, we employ the open-source filters [8] that are trained using 50,000 image patches randomly sampled from 13 different natural images [12]. The learning process to construct these statistically independent filters involves three main steps: (1) mean subtraction of each patch, (2) dimensionality reduction using Principal Component Analysis (PCA), and (3) estimation of statistically independent filters (or basis) using Independent Component Analysis (ICA). Thus, given a 2D face image patch I_F(x, y) and a filter h_i of the same size, the filter response is obtained as follows [8]:

r_i = Σ_{x,y} I_F(x, y) h_i(x, y)   (4)

where x and y index the 2D face image patch and h_i, i = {1, 2, . . . , n}, denotes the set of linear filters whose responses can be computed together. The filter response r_i is binarized to obtain the binary string as follows [8]:

b_i = 1 if r_i > 0, and b_i = 0 otherwise   (5)

Finally, the BSIF features are obtained as the histogram of the pixels' binary codes, which effectively characterizes the texture components in the 2D face image; let us denote this as GF_I. We repeat this procedure on the corresponding depth image to obtain its global features, denoted as GF_D. Figure 3 illustrates the qualitative results of the BSIF features on both real and mask samples. Even though the mask database used in this work is of very high quality and closely reflects real face characteristics, there still exists a minor difference in texture and smoothness that is effectively captured by the BSIF features. This further justifies the proposed use of BSIF features for this specific application of spoof detection.

2.1.3. Weighted score fusion

After extracting both local and global features from the 2D image and its corresponding depth image, we employ four different linear SVM classifiers whose comparison scores are combined using the weighted sum rule to make the decision about normal versus artefact (or 3D mask). Among these four SVM classifiers, we train the first SVM classifier (C1) on the concatenated local features from the 2D image (QF_e and QN), the second SVM classifier (C2) on the local feature (QF_D) computed on the depth image, the third SVM classifier (C3) on the global features obtained from the 2D image (GF_I), and the fourth SVM classifier (C4) on the global features extracted from the corresponding depth image (GF_D). The final decision is achieved by combining the decisions from these four classifiers as follows:

D_e = Σ_{k=1}^{4} w_k C_k   (6)
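The BSIF extraction of Eqs. (4)-(5) can be sketched as below. The actual descriptor uses the ICA-learned filter bank of Kannala and Rahtu [8]; here any filter bank of shape (n, k, k) can be passed in, and a random bank is used only as a stand-in for demonstration, which is an assumption of this sketch.

```python
import numpy as np
from scipy.signal import convolve2d

def bsif_histogram(image, filters):
    """BSIF descriptor sketch: convolve the image with each linear filter
    (Eq. 4), binarize the responses at zero (Eq. 5), pack the n bits of
    every pixel into an integer code, and return the normalized code
    histogram. `filters` is an (n, k, k) array; the paper uses the
    open-source ICA-learned bank of [8]."""
    n = len(filters)
    codes = np.zeros(image.shape, dtype=np.int64)
    for i, h in enumerate(filters):
        r = convolve2d(image, h, mode='same', boundary='symm')  # Eq. (4)
        codes += (r > 0).astype(np.int64) << i                  # Eq. (5)
    hist, _ = np.histogram(codes, bins=2 ** n, range=(0, 2 ** n))
    return hist / hist.sum()
```

With n = 8 filters, each pixel is mapped to one of 256 codes, so the descriptor is a 256-bin histogram regardless of image size.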

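A minimal sketch of the weighted sum-rule fusion of Eq. (6) follows. The paper derives the weights from the individual performance of each classifier on the development set [13]; the inverse-error weighting used here is an assumption in that spirit, not the exact assignment of [13].

```python
import numpy as np

def fuse_scores(scores, dev_eer):
    """Weighted sum-rule fusion (Eq. 6): combine the comparison scores of
    the four SVM classifiers with weights that sum to one.
    scores:  (4,) per-classifier comparison scores for one probe
    dev_eer: (4,) error of each classifier on the development set;
             lower error yields a higher weight (assumed scheme)."""
    inv = 1.0 / (np.asarray(dev_eer, dtype=float) + 1e-6)
    w = inv / inv.sum()                  # enforce sum_k w_k = 1
    return float(np.dot(w, scores))     # D_e = sum_k w_k C_k
```

When all four classifiers perform equally on the development set, the rule degenerates to the plain mean of the four scores.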

where w_k denotes the weights, with Σ_{k=1}^{4} w_k = 1. In this work, the weights are computed by measuring the individual performance of each C_k, as described in [13], on the development database; the fused system is then evaluated on the testing database.

2.2. Baseline face recognition system

In order to evaluate the spoofing performance of 3D masks on a 2D face recognition system, we employ a well-established face recognition baseline based on the Sparse Representation Classifier (SRC) [14]. The results obtained under the adopted protocols are presented in the following section.

3. EXPERIMENTS AND RESULTS

We evaluate the proposed presentation attack detection algorithm on the publicly available 3D Mask Attack Database (3DMAD) [7]. This database comprises real-access and mask-attack videos of 17 subjects recorded using a Microsoft Kinect device. The 3D face masks for all 17 subjects were generated from ThatsMyFace.com. The whole database was collected in three sessions: the first two sessions record real access attempts, while the third session records mask attacks. For each subject, 5 videos were recorded per session, resulting in a total of 255 color and depth videos with 300 frames each.

3.1. Evaluation protocol

The whole database is partitioned into three independent (non-overlapping) groups, namely the training, development, and testing datasets. The training dataset contains the samples of 5 subjects, which are used only to train the SVM classifiers on both local and global features from real as well as mask face samples. The development dataset consists of 5 subjects whose samples are used to tune the parameters (e.g., the weights in the fusion) of the proposed algorithm. Finally, the testing dataset comprises 7 subjects and is used solely to report the performance of the proposed scheme. We repeat this data partitioning 10 times, and the results are presented as the average over all 10 runs.

Further, in order to measure the vulnerability of the face recognition system to presentation attacks using 3D masks, we divide the development and testing databases into reference and probe samples. In this work, we use the videos from session 1 as references and the videos from session 2 as probes to represent the performance of the system on real faces, while the videos from session 3 are used as probes to measure the performance of the face recognition system under mask attack.

3.2. Results and discussion

All results are presented under the verification protocol. The performance of the proposed PAD algorithm is measured using
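The subject-disjoint partitioning of Sec. 3.1 can be sketched as follows: for each of the 10 runs, the 17 subject IDs are shuffled and assigned 5/5/7 to training, development, and testing. The random seed is an illustrative assumption; the paper does not state how the repetitions were randomized.

```python
import numpy as np

def subject_splits(n_subjects=17, n_train=5, n_dev=5, n_runs=10, seed=0):
    """Subject-disjoint partitioning: each run shuffles the subject IDs
    and assigns the first 5 to training, the next 5 to development, and
    the remaining 7 to testing, so no subject appears in two sets."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_runs):
        ids = rng.permutation(n_subjects)
        splits.append((ids[:n_train],
                       ids[n_train:n_train + n_dev],
                       ids[n_train + n_dev:]))
    return splits
```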


Fig. 5. Comparison score distribution of mask attack scores from the testing dataset overlaid on the impostor and genuine scores obtained from the development dataset (x-axis: score bins; y-axis: number of attempts)

two types of errors, namely [7]: (1) the False Fake Rate (FFR), the fraction of real accesses classified as mask. This error is also referred to as the Normal Presentation Classification Error Rate (NPCER) according to the ISO standard WD 30107-3:2014 [15]; and (2) the False Living Rate (FLR), the fraction of mask attacks classified as real accesses. This corresponds to the Attack Presentation Classification Error Rate (APCER) according to ISO WD 30107-3:2014 [15]. Finally, the performance of the overall face recognition system is presented in terms of the Half Total Error Rate (HTER), computed as (FFR + FLR)/2. In this work, the HTER is calculated on the testing database by fixing the threshold at the Equal Error Rate (EER) computed on the development database, i.e., the point where FFR and FLR are equal. We denote this threshold as EERd.

Figure 5 shows the distribution of the genuine and impostor scores obtained on the development database together with the mask attack scores obtained on the testing dataset. The comparison scores are obtained on the probe partitions of the development and testing sets. Here, one can observe the strong overlap of the mask scores with the genuine scores. Under these circumstances, placing the threshold at EERd results in 84.12% of the mask attacks in the testing dataset being classified as real faces. Thus, the proposed PAD algorithm should mitigate this spoof false acceptance rate and make the overall system robust against such attacks. Figure 6 shows the performance of the proposed PAD algorithm when the threshold is computed on the development dataset (EERd). It can be observed that the presence of the mask attack drastically increases the False Acceptance Rate (FAR) of the baseline system (indicated by the red line in Figure 6).

It is also interesting to observe that the proposed PAD algorithm reduces the FAR of the baseline system under attack by accurately rejecting the 3D mask attacks. This strongly supports the accuracy, robustness, and applicability of the proposed PAD algorithm.
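The EERd threshold and HTER described above can be sketched with a simple threshold sweep. This assumes higher scores indicate "real"; the sweep over observed score values is an illustrative implementation choice, not the paper's exact procedure.

```python
import numpy as np

def eer_threshold(real_scores, attack_scores):
    """Sweep candidate thresholds over the observed scores and return the
    one where FFR (real rejected as mask) and FLR (mask accepted as real)
    are closest, i.e. the EER operating point on the development set."""
    real = np.asarray(real_scores, dtype=float)
    attack = np.asarray(attack_scores, dtype=float)
    best_t, best_gap = 0.0, np.inf
    for t in np.unique(np.concatenate([real, attack])):
        ffr = np.mean(real < t)       # real access classified as mask
        flr = np.mean(attack >= t)    # mask attack classified as real
        if abs(ffr - flr) < best_gap:
            best_t, best_gap = t, abs(ffr - flr)
    return best_t

def hter(real_scores, attack_scores, t):
    """Half Total Error Rate at threshold t: (FFR + FLR) / 2."""
    ffr = np.mean(np.asarray(real_scores) < t)
    flr = np.mean(np.asarray(attack_scores) >= t)
    return (ffr + flr) / 2.0
```

For well-separated score distributions the threshold found on one set drives both error rates, and hence the HTER, to zero.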

Fig. 6. Verification performance (False Acceptance Rate vs. False Reject Rate, in %) with different FFR on the testing dataset

Table 1 shows the quantitative performance of the proposed PAD algorithm. For a comprehensive comparison, we also include the performance of the well-known state-of-the-art algorithms based on LBP-SVM [4] and LBP-LDA [7]. The performance of the state-of-the-art algorithms is computed under our experimental protocols, and the HTER is computed at the threshold EERd. It can be observed that the proposed PAD algorithm emerges as the best algorithm, with an outstanding performance of HTER = 0.03%. Thus, based on the above experiments, it is evident that the proposed PAD algorithm is more accurate and robust in addressing 3D mask attacks. Please refer to http://youtu.be/cxoznWbb6LA for additional comprehensive results.

4. CONCLUSION

In this paper, we address the problem of accurately detecting presentation attacks performed using 3D face masks on a face recognition system. We proposed a novel PAD algorithm based on both local and global features. The local features capture the artefacts in the eye and nose regions caused by the presence of the 3D mask, while the global features extract the micro-texture components using BSIF. We then fuse the comparison scores obtained from the local and global features on both the 2D color and depth images using the weighted sum rule. Extensive experiments conducted on the 3DMAD database show the outstanding performance of the proposed scheme, with the lowest HTER of 0.03%.

5. ACKNOWLEDGMENT

Table 1. Comparative performance of the proposed PAD algorithm (∗ reimplemented in Matlab)

Algorithms        HTER (%)
LBP-SVM∗ [4]      8.14
LBP-LDA∗ [7]      9.16
Proposed scheme   0.03


This work is funded under the 7th Framework Programme of the European Union, Project FIDELITY (SEC-2011-284862). All information is provided as is, and no guarantee or warranty is given that the information is fit for any particular purpose. The European Commission has no liability in respect of this document, which merely represents the authors' view.


6. REFERENCES

[1] André Anjos and Sébastien Marcel, "Counter-measures to photo attacks in face recognition: a public database and a baseline," in Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011, pp. 1–7.

[2] Gang Pan, Lin Sun, Zhaohui Wu, and Shihong Lao, "Eyeblink-based anti-spoofing in face recognition from a generic webcamera," in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, 2007, pp. 1–8.

[3] Nesli Erdogmus and Sébastien Marcel, "Spoofing 2D face recognition systems with 3D masks," in Biometrics Special Interest Group (BIOSIG), 2013 International Conference of the. IEEE, 2013, pp. 1–8.

[4] N. Kose and J.-L. Dugelay, "Countermeasure for the protection of face recognition systems against mask attacks," in Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, April 2013, pp. 1–6.

[5] Murali Mohan Chakka, André Anjos, Sébastien Marcel, Roberto Tronci, Daniele Muntoni, Gianluca Fadda, Maurizio Pili, Nicola Sirena, Gabriele Murgia, Marco Ristori, et al., "Competition on counter measures to 2-D facial spoofing attacks," in Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011, pp. 1–6.

[6] Muhammad-Adeel Waris, André Anjos, Iftikhar Ahmad, Dushyant Goyal, Shubham Khandelwal, Nicola Sirena, Anderson Rocha, Sébastien Marcel, Christian Glaser, Shubham Gupta, et al., "The 2nd competition on counter measures to 2D face spoofing attacks," in International Conference on Biometrics (ICB), 2013.

[7] N. Erdogmus and S. Marcel, "Spoofing in 2D face recognition with 3D masks and anti-spoofing with Kinect," in IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–8.

[8] Juho Kannala and Esa Rahtu, "BSIF: Binarized statistical image features," in Pattern Recognition (ICPR), 2012 21st International Conference on, 2012, pp. 1363–1366.

[9] "Eye detector," http://yushiqi.cn/research/eyedetection.html.

[10] S. Hamid Nawab and Thomas F. Quatieri, "Short-time Fourier transform," Advanced Topics in Signal Processing, vol. 6, no. 2, pp. 289–337, 1988.

[11] M. Castrillón, O. Déniz, C. Guerra, and M. Hernández, "ENCARA2: Real-time detection of multiple faces at different resolutions in video streams," Journal of Visual Communication and Image Representation, vol. 18, no. 2, pp. 130–140, 2007.

[12] Aapo Hyvärinen, Jarmo Hurri, and Patrick O. Hoyer, Natural Image Statistics, vol. 39, Springer, 2009.

[13] R. Raghavendra, G. H. Kumar, and A. Rao, "Qualitative weight assignment for multimodal biometric fusion," in Advances in Pattern Recognition (ICAPR), 2009 Seventh International Conference on, Feb 2009, pp. 193–196.

[14] R. Raghavendra, Bian Yang, and Christoph Busch, "Robust on-the-fly person identification using sparse representation," in IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2013, pp. 1–4.

[15] ISO/IEC JTC1 SC37 Biometrics, ISO/IEC WD 30107-3:2014, Information Technology - Presentation Attack Detection - Part 3: Testing and reporting and classification of attacks, International Organization for Standardization, 2014.
