Binary Watermarks: A Practical Method to Address Face Recognition Replay Attacks on Consumer Mobile Devices
Daniel F. Smith, Arnold Wiliem, and Brian C. Lovell
School of ITEE, The University of Queensland, Qld. 4072, Australia
[email protected] [email protected] [email protected]
Abstract
Mobile devices (laptops, tablets, and smart phones) are ideal for the wide deployment of biometric authentication, such as face recognition. However, their uncontrolled use and distributed management increase the risk of remote compromise of the device by intruders or malicious programs. Such compromises may result in the device being used to capture the user’s face image and replay it to gain unauthorized access to their online accounts, possibly from a different device. Replay attacks can be highly automated and are cheap to launch worldwide, as opposed to spoofing attacks, which are relatively expensive as they must be tailored to each individual victim. In this paper, we propose a technique to address replay attacks on a face recognition system by embedding a binary watermark into the captured video. Our monochrome watermark provides high contrast between the signal states, resulting in a robust signal that is practical in a wide variety of environmental conditions. It is also robust to different cameras and tolerates relative movement well. The proposed technique is validated on different subjects using several cameras in a variety of lighting conditions. In addition, we explore the limitations of current devices and environments that can negatively impact performance, and propose solutions to reduce the impact of these limitations.
1. Introduction
Imagine the near future. Most banks have implemented three-factor authentication for access to higher risk transactions. The usability of online banking has improved significantly: no more difficult-to-remember passwords [4, 5]. Accessing your account is now as simple as looking at your smart phone or desktop computer. The face verification system is smart enough to detect spoofing attempts, such as a photograph or 3D mask [7, 10]. Banks have greatly reduced disputed transactions because the system provides a strong audit trail of who performed each transaction: the video of the authenticated user performing the transaction is available.
You access online banking to pay a bill, only to find that there is no money in your account. After contacting the bank, you find that all of your money was transferred to a foreign bank, apparently by you! Indeed, your bank shows you the video of you actually transferring the funds. News reports indicate that thousands of other customers have also been affected. How could this happen?
Replay attacks are a significant threat to face recognition systems that use cameras on uncontrolled devices (such as desktop computers, laptops, tablets, or smart phones) to capture a person’s biometric data. A replay attack occurs when images of the user’s face are captured in digital form and then injected back into the system at a later time (and possibly from a different device). The attack can be created cheaply and launched worldwide in the form of malicious software such as Trojan horses or viruses [21, 22]. A major attack of this nature could completely undermine public trust in biometric systems.
Replay attacks are very different from spoofing attacks, which occur when the attacker creates a biometric facsimile, such as a photograph or 3D mask of the target victim, for use in front of the camera. Current smart phones possess fingerprint readers, which were quickly compromised through the use of fake fingerprints. Spoofing attacks are expensive to create as each attack must be crafted for the target victim. Yet the public is more aware of spoofing attacks as they have been portrayed in popular movies (e.g. the James Bond film “Never Say Never Again”, 1983). In our opening scenario, since the video was of a real person, the anti-spoofing mechanisms are unable to detect that it is fraudulent. By automating the attack, many victims could be compromised in a short space of time.
The root cause of the problem is that the bank cannot tell when the video was captured, or from which device. The communication path from the camera to the device is generally not protected, allowing the original video stream to be intercepted prior to any additional security (such as a timestamp) being added. If security is to be added to the video, it must be done before an intruder can gain access to the video data. Recently, Smith et al. [19] proposed a method of using a coded color sequence on the device screen to reflect from the user’s face.
These reflections were analyzed to determine the color patterns in the captured video of the user’s face. The method was very sensitive to small movements, and required a dark room to provide the necessary contrast.
In this paper, we propose significant improvements in both the utility and the robustness of this work. Instead of encoding a watermark signal as a series of colors, our proposed system uses high contrast monochromatic illumination. The proposed challenge signal is coded as a framewise binary stream. The use of monochromatic illumination provides higher contrast between the two illumination states in a wider variety of environments and mitigates color constancy problems. In addition, instead of using a Support Vector Machine and specific training data, we propose a classification algorithm that adaptively learns its model from the current environment. The improved system is far more tolerant of the small hand movements that naturally occur when the smart device is hand held.
Contributions: The contributions of this paper are:
• To propose a non-cooperative, watermark-based, anti-replay attack technique for face recognition on uncontrolled consumer devices that is robust in a wide and practical set of environmental conditions;
• To test the performance of our proposed system under different environmental conditions and different cameras, under natural usage conditions;
• To provide public datasets of our experiment to encourage independent validation and further research.
We continue this paper as follows: Section 2 examines previous work on determining the liveness of biometric subjects. Section 3 outlines our proposal to address replay attacks in face recognition systems. Section 4 details the experimental evaluation of the proposal, with Section 5 showing the results of that experiment. Section 6 covers conclusions and future work.
2. Related works
Ambalakat [2], Bolle et al. [3], and Khan et al. [13] proposed challenge-response systems that require an intelligent biometric sensor. Unfortunately, such intelligent sensors are not currently widely deployed on consumer smart devices. Frischholz and Werner [8] used head pose estimation to determine the liveness of the subject by directing the user to turn their head in a requested direction, but implementation details were omitted. Maltoni et al. [14] stated that the replay of fingerprint data “can be prevented by using standard cryptographic techniques”. Galbally et al. [9] indicated that replay attacks “exploit possible weak points in the communication channels of the system”. Shelton et al. [18] limit their discussion of replay attacks to data sent across a network. For non-specialized consumer devices, the internal communication channels are generally not well protected, and using cryptography to protect those channels, adding multiple sensors, or requiring multi-factor authentication is simply not feasible. Jee et al. [12] used eye movement and blinking to determine liveness. Pan et al. [16] also proposed eye blinking
for liveness detection. De Marsico et al. [6] used random head movements to determine the 3-D nature of the face. However, these undirected movements could be previously captured and replayed at a later time. Akhtar et al. [1] used a fusion of biometric modes and classification algorithms to determine liveness. Whilst these techniques demonstrate liveness, the timing of when the video was captured is not secured. Smith et al. [19] proposed to watermark the video signal captured using a webcam on an uncontrolled device, by displaying a color sequence on the screen that is reflected from the user’s face. Their proposal was limited to very dark environments, and was susceptible to small hand movements that naturally occur when using a hand held device.
3. Proposed approach
In this paper, we propose using high contrast monochromatic illumination to produce the reflected watermark. This high contrast signal, coupled with an adaptive analysis process, provides a robust and practical solution to the replay attack problem in a wide variety of environments. The watermark is a binary nonce challenge in a challenge-response system, and is inserted into the video before the intruder can gain access to the video signal. The watermark challenge signal is displayed on the entire screen of the device as a random sequence of illumination levels (Light or illuminated means the screen illumination is on; Dark or non-illuminated means the screen illumination is off). The challenge signal can include consecutive displays of the same illumination level. This displayed illumination reflects from the person or object positioned in front of the camera. Each time the next illumination level in the sequence is displayed, an image of the object is captured and later analyzed to determine the level of reflection from the object. These reflections form the response in the challenge-response system. If the reflection sequence is (mostly) the same as the displayed illumination sequence, then the video is deemed to have been captured by this device and at the time the illumination was displayed, thus defeating any attempt to replay this video later, as the next challenge sequence will be different. The next subsections discuss the binary watermark insertion process, highlighting limitations imposed by the current technology, followed by a description of the extraction process that classifies the reflected illumination states.
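To make the challenge-response check concrete, the following minimal sketch (not taken from the authors' implementation; the function name and the 0/1 bit encoding are illustrative) shows how a verifier could accept a response only when the extracted reflection sequence matches the displayed challenge within a small Hamming-distance tolerance, as described above. The tolerance of two errors matches the acceptance criterion used in Section 5.

# Illustrative verification sketch: 0 = Dark, 1 = Light.
def verify_watermark(challenge_bits, extracted_bits, max_errors=2):
    # Reject if the lengths differ; otherwise count mismatching positions
    # (the Hamming distance) and accept only a small number of errors.
    if len(challenge_bits) != len(extracted_bits):
        return False
    errors = sum(c != e for c, e in zip(challenge_bits, extracted_bits))
    return errors <= max_errors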
3.1. Binary watermark insertion process
In our binary watermark insertion process, we use a static Region Of Interest (ROI) window to assist with analysis. This window is 30% of the width by 50% of the height of the video frame, centered on the frame, which approximates the face region when a webcam is used normally. The camera is started for three seconds (3 s) to allow the automatic camera settings to settle. During this time, the user is requested to align with the ROI and to remain relatively stationary throughout the entire capture process. In practice, this ROI could be replaced with a robust face detector. Six frames of Dark are captured and recorded, followed by one frame of Light. These are used as analysis start and calibration signals. The challenge signal is then displayed as a randomized sequence of 31 illuminations (Dark or Light) while the images are captured and recorded. The capture protocol concludes with six frames of Dark. The proposed challenge signal therefore forms a 2^31-entropy challenge. Figure 1 shows a sequence of Light and Dark frames being captured, with the different illumination states representing the inserted binary watermark.
Figure 1. Video sequence of Light and Dark frames.
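As a concrete illustration of the capture protocol in Section 3.1, the following sketch (illustrative only; the function name and the 0/1 encoding are not from the paper) builds the displayed illumination sequence: six Dark calibration frames, one Light calibration frame, a 31-bit random challenge, and six trailing Dark frames.

import secrets

def build_illumination_sequence(challenge_length=31):
    # 0 = Dark, 1 = Light.
    challenge = [secrets.randbits(1) for _ in range(challenge_length)]  # random nonce bits
    preamble = [0] * 6 + [1]   # analysis start and calibration signals
    postamble = [0] * 6        # trailing Dark frames
    return preamble + challenge + postamble, challenge

Drawing the nonce from a cryptographically strong source ensures that an attacker cannot predict the next challenge sequence.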
During testing, it was observed that when the illumination state changed, captured frames would continue to display reflections from the previous illumination state, either fully or partially, for several frames. Therefore, the watermark insertion process implements a delay each time the illumination level changes, as defined in the next section.
3.2. Time delay between illumination changes
From testing, there is a delay between when the illumination state is requested to be updated and when that update is observed in the reflection by the camera. The root cause was identified as a delay in updating the screen illumination, not in when the cameras capture images. Empirical analysis (results not shown) determined that a fixed delay of 200 ms was sufficient to complete the screen update in most cases. If the illumination state is unchanged, the next frame can be used immediately. The current capture process therefore requires between 4.5 s (min.) and 9.9 s (max.) to fully complete (at 30 fps). Reducing the screen update time (e.g. using an LED for illumination instead of the screen) may reduce this 200 ms delay. Recent smart phones possess bright white LEDs, as well as high speed cameras capable of capturing 240 fps. The time to complete the capture process could then fall to 3.2 s (including the preamble of 3 s) for 2^31 entropy. Such a system could allow many more frames to be captured in a shorter time, significantly increasing the entropy and allowing for an increased tolerance to classification errors. Therefore, the current capture time could be significantly reduced in the near future, further reducing the problem of movement occurring during the capture process. Another possible way to address the delay was to measure the intensity of each frame as it was captured, and discard frames until the illumination state change was observed.
However, this technique could not reliably distinguish a frame whose illumination had only partially changed from one that had fully changed, so the fixed delay was retained.
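To illustrate the fixed-delay approach described in this subsection, here is a minimal capture-loop sketch (not the authors' code; the full-screen illumination callback and the camera index are assumptions) that pauses for the 200 ms settle time only when the illumination state changes.

import time
import cv2  # OpenCV, as used in the experimental setup described in Section 4

SETTLE_DELAY_S = 0.2  # empirically determined 200 ms screen-update delay

def capture_watermarked_frames(sequence, set_screen_illumination, camera_index=0):
    # set_screen_illumination is a hypothetical callback that switches the
    # full-screen illumination on (1 = Light) or off (0 = Dark).
    cap = cv2.VideoCapture(camera_index)
    time.sleep(3.0)                      # let the automatic camera settings settle
    frames, previous_state = [], None
    for state in sequence:
        set_screen_illumination(state)
        if state != previous_state:      # wait only when the illumination changes
            time.sleep(SETTLE_DELAY_S)
        ok, frame = cap.read()
        if ok:
            frames.append((state, frame))
        previous_state = state
    cap.release()
    return frames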
3.3. Binary watermark extraction
To extract the binary watermark from the video data stream, we propose an adaptive analysis technique that incrementally updates its model from that data stream. Unlike Smith et al. [19], who used multiple color states, we use only Light and Dark illumination states. To that end, we propose an intensity feature that represents the intensity of reflected illumination above the ambient background in each ROI window (see Section 3.1). We then define an illumination state classifier that determines each frame's illumination state while adaptively updating its model. The sequence of classified illumination states represents the extracted binary watermark signal.
Intensity feature extraction: We extract the intensity feature from the ROI window using a function θ : R^{p×q×3} → [0, 1]. More precisely, we define the ROI window in the nth frame, Wn ∈ R^{p×q×3}, as a static area, centered on the frame, where p and q are 30% of the width and 50% of the height of the frame. This is the most likely area within the captured frame to find the object of interest. It also removes extraneous background information from the frame, which may otherwise produce spurious results due to subtle movements. We first minimize the noise (ambient light and static scenery) by performing a clipped subtraction of a non-illuminated window, W0, from the subsequent windows, Wn, in the RGB color space:
\Delta W_n = \max(W_n - W_0, 0)    (1)
where ∆Wn is the result of the clipped subtraction. Next, we convert ∆Wn from the RGB color space to the HSV color space to isolate the intensity using the V channel (similar to Smith et al. [19]); our method is not concerned with which color (Hue) is used for illumination. Finally, we calculate the function θ as the normalized, weighted level of intensity, using Equation 2:
\theta(\Delta W_n) = \frac{\sum_{i=1}^{p} \sum_{j=1}^{q} M(\Delta W_n(i,j)[V])}{\alpha \times p \times q}    (2)
where
M(\Delta W_n(i,j)[V]) = \begin{cases} 0 & \text{if } 0 \le \Delta W_n(i,j)[V] < \beta \\ \beta & \text{if } \beta \le \Delta W_n(i,j)[V] < \alpha \\ \alpha & \text{if } \alpha \le \Delta W_n(i,j)[V] \end{cases}    (3)
∆Wn(i, j)[V] represents the value of the V channel of the pixel in the ith row and jth column of ∆Wn. α and β are parameters defined from experimental analysis. The weighting function in Equation 3 quantizes the V channel values to limit the effect of noise pixels caused by relative movements during the capture process. Other techniques could be explored here and may produce better results.
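For clarity, the following sketch (an illustration under stated assumptions, not the authors' code) computes θ for a single ROI window with OpenCV and NumPy, using the parameter values α = 30 and β = 3 reported in Section 4. OpenCV frames are BGR rather than RGB, but the V channel is unaffected by the channel order.

import numpy as np
import cv2

def intensity_feature(window, dark_reference, alpha=30, beta=3):
    # Clipped subtraction of the non-illuminated reference window (Eq. 1);
    # cv2.subtract saturates at zero for uint8 images.
    delta = cv2.subtract(window, dark_reference)
    # Isolate the intensity via the V channel of the HSV color space.
    v = cv2.cvtColor(delta, cv2.COLOR_BGR2HSV)[:, :, 2].astype(np.float64)
    # Quantized weighting of Eq. 3: 0 below beta, beta up to alpha, alpha above.
    weighted = np.zeros_like(v)
    weighted[(v >= beta) & (v < alpha)] = beta
    weighted[v >= alpha] = alpha
    p, q = v.shape
    # Normalized, weighted intensity of Eq. 2, in [0, 1].
    return weighted.sum() / (alpha * p * q)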
Illumination state classifier: The illumination state classifier determines whether the illumination state for the current frame will implicitly remain the same as for the previous frame, or be explicitly set to a particular state. To detect whether the illumination state should be updated, we first define the change in intensity between consecutive frames, ∆θn, as
\Delta\theta_n = \theta(\Delta W_n) - \theta(\Delta W_{n-1})    (4)
Small movements of the subject during the capture process result in small changes in ∆θn. Therefore, we define a threshold, τ1, such that the illumination state is only determined explicitly when |∆θn| > τ1. To calculate τ1, we maintain Light and Dark models, λL and λD, represented as the average intensity of the previously classified Light and Dark windows, respectively, and updated for each frame. We calculate τ1 as the square of the Euclidean distance between λL and λD:
\tau_1 = \max\left( \lVert \lambda_L - \lambda_D \rVert_2^2,\; \gamma \right)    (5)
By squaring the Euclidean distance, we place more emphasis on larger differences between λL and λD, and less emphasis on smaller differences. We enforce that τ1 is not less than γ to address the situation where the difference between λL and λD is small, which would otherwise allow the illumination state to be updated regardless of the magnitude of ∆θn.
In addition, as discussed in Section 3.2, a time delay of 200 ms is used when capturing frames to remove almost all of the partially illuminated frames. However, very occasionally, one partially illuminated frame is captured. As a result, neither ∆θn nor ∆θn+1 may exceed τ1, whereas if ∆θn had represented a completed illumination change, it would have exceeded τ1. Therefore, a further condition is defined to determine the illumination state, based on the intensity of the current frame, θ(Wn). This condition first requires that |∆θn| > γ, to ensure that some illumination change has occurred. If this is true, then the illumination state is determined by whether θ(Wn) is closer to λL or λD. To this end, a second threshold, τ2, is defined as the midpoint between λL and λD, i.e. τ2 = (λL + λD)/2. The illumination state of Wn is defined as Dark if θ(Wn) < τ2, Light if θ(Wn) > τ2, and unchanged if θ(Wn) = τ2 or |∆θn| ≤ γ.
The illumination state classifier therefore uses the results of the following four test conditions as input:
L1: sign(∆θn − τ1) = + (i.e. ∆θn > τ1);
L2: sign(∆θn − γ) = sign(θ(Wn) − τ2) = +;
D1: sign(∆θn + τ1) = − (i.e. ∆θn < −τ1);
D2: sign(∆θn + γ) = sign(θ(Wn) − τ2) = −.
If either condition L1 or L2 is met, then the illumination state for Wn is Light. If either condition D1 or D2 is met, then the illumination state for Wn is Dark. If no condition is met, the illumination state for Wn remains unchanged from Wn−1. Note that Li and Di are mutually exclusive. Due to the increased entropy in this system, it is possible to tolerate a small number of incorrect illumination state classifications without significantly reducing the security of the system, resulting in a more robust and practical solution.
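A compact sketch of this adaptive classifier is given below (illustrative only; the class name, the calibration interface, and γ = 0.01, i.e. 1% of the maximum value of θ as reported in Section 4, are assumptions). It keeps running Light and Dark models and applies conditions L1, L2, D1, and D2 to each new θ value.

class IlluminationStateClassifier:
    def __init__(self, gamma=0.01):
        self.gamma = gamma          # lower bound on tau1 (1% of theta's range)
        self.light_thetas = []      # theta values of frames classified as Light
        self.dark_thetas = []       # theta values of frames classified as Dark
        self.prev_theta = None
        self.state = 0              # 0 = Dark, 1 = Light

    def calibrate(self, dark_thetas, light_theta):
        # The six Dark frames and one Light frame at the start of the
        # capture protocol seed the two models (Section 3.1).
        self.dark_thetas = list(dark_thetas)
        self.light_thetas = [light_theta]
        self.prev_theta = light_theta
        self.state = 1

    def classify(self, theta):
        lam_l = sum(self.light_thetas) / len(self.light_thetas)    # lambda_L
        lam_d = sum(self.dark_thetas) / len(self.dark_thetas)      # lambda_D
        tau1 = max((lam_l - lam_d) ** 2, self.gamma)                # Eq. 5
        tau2 = (lam_l + lam_d) / 2.0                                # midpoint threshold
        d_theta = theta - self.prev_theta                           # Eq. 4
        if d_theta > tau1 or (d_theta > self.gamma and theta > tau2):      # L1 or L2
            self.state = 1
        elif d_theta < -tau1 or (d_theta < -self.gamma and theta < tau2):  # D1 or D2
            self.state = 0
        # Otherwise the state remains unchanged from the previous frame.
        (self.light_thetas if self.state == 1 else self.dark_thetas).append(theta)
        self.prev_theta = theta
        return self.state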
4. Experiment
During our test experiment, we set the weighting values in Equations 2 and 3 (α and β) to 30 and 3, respectively. We set γ from Equation 5 and Conditions L2 and D2 to 1% of the maximum possible value of the function θ. The experimental equipment consists of: a Microsoft Surface Pro 2 tablet; Windows 8.1 Pro; a VMware Player 6.0.4 virtual machine; Ubuntu 13.10; OpenCV 2.4.9; the internal Microsoft LifeCam webcam; an external Logitech QuickCam Pro 5000 webcam; an external Logitech QuickCam Pro 9000 webcam; a screen resolution of 1920 × 940 pixels; and a video capture resolution of 640 × 480 pixels at either 15 or 30 fps (selected automatically by the camera driver, depending on the lighting conditions).
Our experiment uses six environments: Dark Room; Office Light (using fluorescent lights); Natural Light indoors; Cloud Cover outdoors; Full Shade but under an open sky; and Full Sunlight. In Smith et al. [19], the environment was restricted to a darkened room, which is not a typical use case for online authentication. Although this technique is aimed at face recognition systems, any object in front of the camera is sufficient to generate suitable reflections for classification. All objects are positioned approximately 35 cm in front of the screen, which is a normal usage distance. The experiment is performed on three primary sets of objects. The first set consists of Soft Toys. A total of 20 Soft Toys are used, which provides a wide variety of shapes, sizes, textures, and colors. The Soft Toys are hand held in front of the tablet to simulate the hand holding of the mobile device. Each toy is recorded five times in each environment, resulting in 100 videos for each of the six environments. The second set of objects consists of faces printed on paper, with the face outline cut out and the background removed. This is done to contrast our results against those obtained by Smith et al. [19]. The Paper Faces are hand held in front of the tablet to simulate the hand holding of the mobile device. Five different faces are used, each recorded five times in each environment, resulting in 25 videos for each of the six environments. The final set of objects consists of real people. Five participants hold the tablet computer in their hands, which introduces additional movement relative to the background in the video. Each participant is recorded five times in the six environments, resulting in 25 videos per environment. The resulting datasets (FRAUD2) are available from: http://itee.uq.edu.au/sas/datasets/.
We then performed further analysis on the conditions of each environment. Our proposed system can be viewed as a typical communication system in a noisy environment [17]. The reflected light is the communication signal, and the ambient light is noise. Using this analogy, we can quantify the problem by calculating the Signal-to-Noise Ratio (SNR). Lux is the measure of illumination over a defined area. ISO 2720 [11] defines luminance, L, when calibrating reflected light meters as:
L = K_1 A^2 / (t S)    (6)
where A is the f-number, t is the exposure time (in seconds), and S is the arithmetic film speed. K1 is a calibration constant, defined by Padfield [15] as K1 = π × ρ × σ, where π converts luminance in cd/m² to lux, ρ is 12.4 (ISO 2720 defines the range as 10.6 to 13.4), and σ is a correction factor of 1.3 to account for lens absorption and diffuse reflectance. We used a dSLR camera to measure the reflected ambient light for each environment, and calculated the lux using Equation 6. These calculated lux values are a guide only as, for example, office lighting varies between locations, cloud cover is highly variable, and sun illumination varies with geographic location, time of day, and season. We also measured the lux of the reflection illuminated by the following devices: the Apple iPhone 6 and iPad 3 screens; the Apple iPhone 6 LED in light and flash modes; and the Microsoft Surface Pro 2 screen. Using the following equation, we calculated the SNR for the reflected illumination signal in each environment (results shown in Figure 2):
SNR_{dB} = 10 \log_{10}\left( P_{signal} / P_{noise} \right)    (7)
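As a small worked example of Equations 6 and 7 (a sketch under the stated constants; the helper names are not from the paper), the snippet below computes lux from camera exposure settings and the SNR in decibels. Applying Equation 7 to the measured values reported in Figure 2, e.g. the Surface Pro 2 screen reflection (51.6 lux) against Office ambient light (125 lux), gives roughly -3.8 dB, comfortably above the approximately -15 dB limit identified in Section 5.

import math

def reflected_lux(f_number, exposure_s, iso, rho=12.4, sigma=1.3):
    # Eq. (6) with K1 = pi * rho * sigma (Padfield's constants); a guide only.
    k1 = math.pi * rho * sigma
    return k1 * f_number ** 2 / (exposure_s * iso)

def snr_db(signal_lux, noise_lux):
    # Eq. (7): the device's reflected illumination is the signal,
    # the ambient light is the noise.
    return 10.0 * math.log10(signal_lux / noise_lux)

print(round(snr_db(51.6, 125.0), 1))  # approximately -3.8 dB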
Figure 2. Signal-to-Noise Ratio (SNR, in dB) for different devices in each environment. Environments (measured ambient lux): Dark (8 lux), Office (125 lux), Natural (800 lux), Cloud (9000 lux), Shade (12500 lux), Sunlight (60000 lux). Devices (measured reflected lux): iPhone 6 LED flash (363.0 lux), iPad 3 screen (62.3 lux), iPhone 6 LED (60.8 lux), Surface Pro 2 screen (51.6 lux), iPhone 6 screen (20.1 lux).
5. Results and analysis
Figure 3 shows the results of the experiment. In all cases, we accept up to two errors in frame illumination classification (i.e. a Hamming distance of two is acceptable) out of the total of 31 illumination classifications. As can be seen, for a typical indoor environment, our proposed technique performs exceptionally well, regardless of the object used in front of the camera. All cameras performed equally well (detailed analysis not shown here). Despite hand holding and simulated hand holding of the tablet computer during the video capture, which produces normal relative movements, our results demonstrate that our proposed technique is robust to these movements.
Figure 3. Correct Classification Rate (CCR, in %) for all objects (Soft Toys, Paper Faces, Live Faces) in each environment: Dark (8 lux), Office (125 lux), Natural (800 lux), Cloud (9000 lux), Shade (12500 lux), Sunlight (60000 lux).
For outdoor environments, the results are much less favorable. The Cloud Cover environment produced a significantly lower correct classification rate than all indoor environments. In particular, the performance for Live Faces was no better than in any other outdoor environment (i.e. 0%). The Full Shade environment produced even lower results. The measured ambient lux for this environment was 39% more than for the Cloud Cover environment, resulting in a significant decrease in performance. From Figure 2, this suggests that the minimum SNR required to perform adequately is approximately -15 dB. In Full Sunlight, it was not possible to determine any significant difference between when the subject was illuminated by the screen and when it was not. Figure 4 shows two examples of the signal strength of the reflected light. The signal strength was calculated as
\sum_{i=1}^{p} \sum_{j=1}^{q} \sum_{C_i \in \{R,G,B\}} W_n(i,j)[C_i]
where Ci is the Red, Green, or Blue color channel for pixel Wn(i, j). The total score for each frame was then normalized to the range [0, 100%].
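For reference, the per-frame score plotted in Figure 4 can be computed as below (a sketch; the normalization to [0, 100%] is assumed to be a simple scaling by each video's maximum score, which the paper does not specify).

import numpy as np

def raw_signal_strength(window):
    # Sum of all R, G and B values in the ROI window (the unprocessed score).
    return float(np.asarray(window, dtype=np.float64).sum())

def scale_to_percent(scores):
    # Assumed normalization: scale each video's scores so the maximum is 100%.
    m = max(scores)
    return [100.0 * s / m for s in scores] if m > 0 else list(scores)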
Figure 4. Input signal and unprocessed reflected signal strength (in %) for sample videos from the Natural Light and Full Sunlight environments (LifeCam, Subject 4; files 20141031151540.avi and 20141031153442.avi), taken from the FRAUD2 dataset.
As can be seen in Figure 4, there is a strong signal in the Natural Light environment, but the signal is more challenging to identify in the Full Sunlight environment. Based upon the results shown in Figure 3, we conclude that the proposed technique using the screen to illuminate the subject is limited to indoor environments that are not excessively lit. Possible causes for the failure to discriminate reflections when outdoors are that the camera may adjust its auto settings to a point where it cannot observe the reflections, or
that the reflection from the subject is simply not strong enough to be distinguished from the ambient light (i.e. the SNR is too low; see Figure 2). To investigate the former, we brought an object (in this case, a hand) closer to the screen and camera. Since light follows the inverse-square law [20], bringing the object closer to the screen should increase the signal (reflection). Holding a hand at a 5 cm distance in the Full Shade environment resulted in a 93.3% CCR (allowing zero or one error). To investigate the latter, we used a small mirror at a 35 cm distance in the Full Sunlight environment. This resulted in a 100% CCR (with zero errors). These investigations confirmed that the camera settings do not contribute to the classification problem, and that increasing the SNR has a positive effect. The Apple iPhone 6 already has a white LED that provides seven times the illumination of the Microsoft Surface Pro 2 screen (see Figure 2), and a camera that can record at 240 fps. These facilities may create a usable system in most environments (Full Sunlight may remain challenging, but most people will not use their phone in Full Sunlight as it is too difficult to read the screen).
6. Conclusions and future work
The results of our experiment show that our proposed technique of inserting a binary watermark into captured video using reflected Light and Dark illumination is highly effective and practical in most indoor environments. Although it is less effective in outdoor environments, we identified the SNR limitations that, if addressed, could improve this performance in the future. The provided FRAUD2 dataset may be used for further research in defeating replay attacks.
Acknowledgment: The experiments involving humans were performed in accordance with the School of ITEE Ethics Approval Number EC201303SMI.A1.
References
[1] Z. Akhtar, C. Micheloni, C. Piciarelli, and G. L. Foresti. Mobio livdet: Mobile biometric liveness detection. In 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 187–192. IEEE, 26–29 Aug 2014.
[2] P. Ambalakat. Security of biometric authentication systems. In 21st Annual Computer Science Seminar. Rensselaer Polytechnic Institute, 2005.
[3] R. M. Bolle, J. H. Connell, and N. K. Ratha. Biometric perils and patches. Pattern Recognition, 35(12):2727–2738, Dec 2002.
[4] J. Bonneau, C. Herley, P. C. van Oorschot, and F. Stajano. The quest to replace passwords: A framework for comparative evaluation of web authentication schemes. In IEEE Symposium on Security and Privacy, pages 553–567. IEEE, 20–23 May 2012.
[5] B. Coskun and C. Herley. Can “something you know” be saved? In T.-C. Wu, C.-L. Lei, V. Rijmen, and D.-T. Lee, editors, Information Security Conference (ISC08), volume 5222 of Lecture Notes in Computer Science, pages 421–440. Springer Berlin / Heidelberg, 16–18 Sep 2008.
[6] M. De Marsico, M. Nappi, D. Riccio, and J.-L. Dugelay. Moving face spoofing detection via 3D projective invariants. In 5th IAPR International Conference on Biometrics (ICB), pages 73–78. IEEE, 29 Mar – 1 Apr 2012.
[7] N. Erdogmus and S. Marcel. Spoofing face recognition with 3D masks. IEEE Transactions on Information Forensics and Security, 9(7):1084–1097, 2014.
[8] R. W. Frischholz and A. Werner. Avoiding replay-attacks in a face recognition system using head-pose estimation. In International Workshop on Analysis and Modeling of Faces and Gestures (AMFG), pages 234–235. IEEE, 17 Oct 2003.
[9] J. Galbally, J. Fierrez, and J. Ortega-Garcia. Vulnerabilities in biometric systems: Attacks and recent advances in liveness detection. In Spanish Workshop on Biometrics (SWB), volume 1, 5 Jun 2007.
[10] J. Galbally and S. Marcel. Face anti-spoofing based on general image quality assessment. In 22nd International Conference on Pattern Recognition (ICPR). IEEE, 24–28 Aug 2014.
[11] ISO 2720-1974. Photography – general purpose photographic exposure meters (photoelectric type) – guide to product specification, 15 Aug 1974.
[12] H.-K. Jee, S.-U. Jung, and J.-H. Yoo. Liveness detection for embedded face recognition system. International Journal of Biological and Medical Sciences, 1(4):235–238, 2006.
[13] M. K. Khan, J. Zhang, and K. Alghathbar. Challenge-response-based biometric image scrambling for secure personal identification. Future Generation Computer Systems, 27(4):411–418, 2011.
[14] D. Maltoni, D. Maio, A. K. Jain, and S. Prabhakar. Securing Fingerprint Systems, book section 9, pages 371–416. Springer-Verlag, London, UK, 2003.
[15] T. Padfield. Using a camera as a lux meter, 2003. Retrieved 06-Nov-2014 from http://www.conservationphysics.org/lightmtr/luxmtr1.php.
[16] G. Pan, L. Sun, Z. Wu, and S. Lao. Eyeblink-based anti-spoofing in face recognition from a generic webcamera. In 11th International Conference on Computer Vision (ICCV), pages 1–8. IEEE, 2007.
[17] C. E. Shannon. Communication in the presence of noise. Proceedings of the IRE, 37(1):10–21, 1949.
[18] J. Shelton, G. Dozier, J. Adams, and A. Alford. Permutation-based biometric authentication protocols for mitigating replay attacks. In IEEE Congress on Evolutionary Computation (CEC), pages 1–5. IEEE, 10–15 Jun 2012.
[19] D. F. Smith, A. Wiliem, and B. C. Lovell. Face recognition on consumer devices: Reflections on replay attacks. IEEE Transactions on Information Forensics and Security, 2015. In press.
[20] H. Stockman. Communication by means of reflected power. Proceedings of the IRE, 36(10):1196–1204, 1948.
[21] C. Xiao. Wirelurker: A new era in iOS and OS X malware. Report PAN WP U42 WL 0110514, Palo Alto Networks, 5 Nov 2014. Retrieved 13-Nov-2014.
[22] Y. Zhou and X. Jiang. Dissecting android malware: Characterization and evolution. In IEEE Symposium on Security and Privacy (SP), pages 95–109. IEEE, 2012.