2016 International Conference on Optoelectronics and Image Processing
Thermal Face Recognition Using Convolutional Neural Network
Zhan Wu 1,2, Min Peng 1,2, Tong Chen 1,2
1 School of Electronic and Information Engineering, Southwest University
2 Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, Southwest University
Chongqing, 400715, China
e-mail:
[email protected]
Abstract-With the rapid development of face recognition (FR), thermal face recognition has received increasing attention. However, the traditional methods for thermal face recognition mainly concentrate on hand-crafted feature design, which requires more human effort to select and extract features and usually yields a relatively low recognition rate. In this paper, we present a convolutional neural network (CNN) architecture for thermal face recognition. CNN is a new type of neural network method which can automatically learn effective features from the raw data. Experimental results on the RGB-D-T face database show that our proposed CNN architecture achieves a higher recognition rate compared with traditional recognition methods such as LBP, HOG and moment invariants.

Keywords-thermal face recognition; convolutional neural network; RGB-D-T face database; deep learning

I. INTRODUCTION

Face recognition has been a hot research topic and has received wide attention for many years [1]. A number of studies have investigated RGB face recognition, which has been applied to business [2], transportation [3], security [4], etc. However, face recognition based on RGB images is fragile to changes of illumination [5], [6], [7]. Therefore, other kinds of images have been used for recognition instead of RGB images, such as thermal, near infrared (NIR) [8], and hyperspectral images (HSI) [10]. Near infrared images are relatively robust to illumination change [8]. However, NIR imaging equipment can be damaged easily after long-term use [9], which brings more maintenance cost. Face recognition based on HSI is efficient [10]. However, hyperspectral equipment is expensive and the data processing is time-consuming [11]. Thermal IR imaging equipment absorbs the energy emitted by the objects in the scene and converts it into an electrical signal [12]. With the thermal imaging technique, face recognition can be performed in insufficiently lit or completely dark environments [13]. What is more, the image processing is faster than with hyperspectral equipment.

The traditional algorithms for thermal face recognition often take the following steps [15]: thermal face data are first preprocessed and the features are then selected or extracted; based on the useful features, the classifier is trained; the recognition is achieved by using the determined classifier. Goutam Majumder et al. [16] used the Gabor wavelet transform and fast independent component analysis (Fast ICA) to extract thermal face features, which can effectively improve recognition speed by dimension reduction. However, the recognition rate was not obviously improved. Naser Zaeri et al. [17] used moment invariants to extract features and employed the nearest neighbor classifier for the recognition, which achieved a higher recognition rate. But the moment invariant method is not capable enough to describe the details. Javier Ruiz-del-Solar et al. [18] analyzed the advantages and limitations of the traditional algorithms of thermal face recognition. The results showed that LBP improved the recognition rate to some extent and had good gray translation invariance and better robustness.

Nevertheless, traditional thermal face recognition methods do have some shortcomings [19], such as a complicated process and a low, inefficient recognition rate. An alternative method for face recognition would be a deep learning method. Deep learning has shown more advantages and has been applied to many different fields such as computer vision [20] and pattern recognition [21]. Deep learning achieves complex function approximation through a nonlinear network structure and shows powerful learning ability [22]. Compared with the traditional algorithms, deep learning combines feature selection or extraction and classifier determination into one step and can learn features, which reduces the workload of manual feature design [23].

In this paper, we present a convolutional neural network (CNN) architecture for thermal face recognition. CNN is a kind of deep learning algorithm. Thermal images are first obtained from the RGB-D-T face database [14]. Then, a CNN method is proposed, which can automatically learn effective features from the thermal face data. Compared with other state-of-the-art methods such as LBP, HOG and moment invariants, experimental results show that our proposed CNN architecture for face recognition has a higher recognition rate.

II. PROPOSED ARCHITECTURE

In a multilayer neural network structure, each node of each layer undertakes the forward calculation and serves as the input of the nodes of the next layer. The value of any node at a layer is relevant to the values of all the nodes
978-1-5090-0880-3/16/$31.00 ©2016 IEEE
connecting to it in the prior layer [24]. In this paper, we propose a CNN model that is designed for thermal face recognition. Different from the ordinary neural network, the proposed CNN model optimizes the neural network structure through the local receptive field, weight sharing, and sampling. The CNN has nine layers with rectified linear units appended, comprising an input layer, three convolution layers, three pooling layers, a norm layer and a softmax classifier layer. The CNN structure is shown in Fig. 1, where S denotes step length and P denotes padding.

Data are convolved in the convolution layer. Convolution kernels translate two pixels after convolution every time. The first convolution layer has a filter of size 5x5. The other two convolution layers have filters of size 3x3. The convolved data are pooled to reduce feature parameters. In this paper, three max pooling layers are used, each with a 2x2 receptive field. The convolution layers and max pooling layers are used alternately. The feature map is obtained by local convolution and the dimension is reduced through pooling. Rectified linear unit (ReLU) layers, which are placed after the three convolution layers, replace sigmoid as the activation function to improve the learning efficiency of our structure and give the network sparse characteristics. Finally, a softmax classification layer is used to classify the above features.

[Figure 1 layer stack, from input to output: Data; Convolution 1 (5×5, S2, P2); ReLU 1; Pooling 1 (2×2, S2, P2); Norm 1; Convolution 2 (3×3, S1, P1); ReLU 2; Pooling 2 (2×2, S2, P2); Convolution 3 (3×3, S1, P1); ReLU 3; Pooling 3 (2×2, S2, P2); Classifier]
Figure 1. The CNN architecture.

III. EXPERIMENT

A. Database and Preprocessing

The RGB-D-T face database [14] was used to test our CNN model. The images in the database are collected synchronously from an RGB camera, a Kinect camera, and a thermal imaging camera. The database contains images of 51 college students, with resolutions of 640x480, 640x480, and 384x288 for the RGB, depth, and thermal images, respectively.

Three factors influencing face recognition, namely head rotation, expression variation, and illumination variation, are considered in the database. When the images of a subject rotating his/her head were taken, his/her facial expression was required to be neutral and the illumination was required to be constant. The expression variation experiment requires the position of the head and the illumination to be constant. Images of neutral, happy, sad, angry, and surprised expressions were taken. The illumination variation experiment requires the expression to be neutral and the head position to be constant. In this paper, the thermal images of all subjects under the three conditions were employed. For one subject and one factor experiment, there are 100 images. Therefore, for 51 subjects there are 15300 (51x3x100) images.

In order to achieve recognition, the faces need to be located from upper body images by using binarization [25]. The pixels corresponding to the body region were set to zero, and then the central point of the upper body $(m_1, n_1)$ and the highest point $(m_2, n_2)$ were located, where $m_1$, $m_2$ and $n_1$, $n_2$ are the abscissas and ordinates of the pixels. A square area is constituted by setting $(n_2 - n_1)$ as the side length of the square and $(m_1, n_1)$ as the base midpoint. Finally, the images are normalized into 112x112 pixels.

A Matlab index function provided by the database for randomly selecting the training set and testing set was used in this paper, in order to realize random signal acquisition and improve the reliability of the experiment. For each subject and each condition, the training set consists of 10 images out of the 50 training images produced by the index function, and the testing set consists of the 50 testing images produced by the index function.

B. Parameter Setting

The proposed CNN has nine layers. Convolution layer 1 employs 64 different convolution kernels whose output image is 56x56 in size. The output data of max pooling layer 1 are decreased to 28x28 by down-sampling. The fourth layer mainly performs normalization, which is advantageous to data in the form of images. Then, the data are convolved and pooled twice, after which the dimension and size become 128 and 7x7. Finally, the data are grouped into 51 classes. The structure and the output size of each layer are summarized in Table I.

TABLE I. OUR PROPOSED CNN STRUCTURE AND THE OUTPUT SIZE

Layer                  Output
Images data            112x112
Convolution layer 1    64x56x56
Pooling layer 1        64x28x28
Norm layer 1           64x28x28
Convolution layer 2    128x28x28
Pooling layer 2        128x14x14
Convolution layer 3    128x14x14
Pooling layer 3        128x7x7
Classifier layer       51x1x1
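The output sizes in Table I can be checked with the usual size rule for a sliding window, out = floor((in + 2P - K) / S) + 1. The sketch below is only an illustration of that arithmetic, not the authors' implementation. The helper names `window_out` and `trace_shapes` are hypothetical, the single input channel is an assumption, the pooling layers are assumed to use no padding (that choice is what reproduces the sizes in Table I), and the norm layer is modeled as leaving the shape unchanged.

```python
def window_out(size, kernel, stride, pad):
    """Spatial output size of a sliding window (convolution or pooling)."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, output channels or None to keep them, kernel, stride, padding)
# Kernel sizes and strides follow Fig. 1; pooling padding of 0 is an assumption.
LAYERS = [
    ("Convolution layer 1", 64, 5, 2, 2),
    ("Pooling layer 1", None, 2, 2, 0),
    ("Norm layer 1", None, 1, 1, 0),  # normalization keeps the shape
    ("Convolution layer 2", 128, 3, 1, 1),
    ("Pooling layer 2", None, 2, 2, 0),
    ("Convolution layer 3", 128, 3, 1, 1),
    ("Pooling layer 3", None, 2, 2, 0),
]


def trace_shapes(input_size=112, input_channels=1):
    """Return (name, channels, height, width) for every layer in LAYERS."""
    shapes = []
    channels, size = input_channels, input_size  # one thermal channel assumed
    for name, out_channels, kernel, stride, pad in LAYERS:
        size = window_out(size, kernel, stride, pad)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, channels, size, size))
    return shapes


if __name__ == "__main__":
    for name, c, h, w in trace_shapes():
        print(f"{name:<22}{c}x{h}x{w}")
```

Under these assumptions the trace ends at 128x7x7 before the classifier, which matches the last pooling row of Table I.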
In this paper, we employed the RGB-D-T face database to test the proposed method. In order to show the superiority of the CNN, we compared it with three competitive algorithms for thermal face recognition: LBP-KNN [26], moment invariant-KNN [27], and HOG-KNN [28]. The parameters of the three compared algorithms were set according to the references [26-28]. The LBP was computed using 8 sampling points on a circle of radius 1. For the HOG algorithm, 8x8 cell units and 2x2 blocks were set. In the case of the moment invariant, there was no need to set any parameters.

C. Experimental Results and Discussion

Table II shows the recognition rate considering three factors: head rotation, expression, and illumination. It is observed that head rotation affects the recognition rate most. The recognition rate under this condition is the lowest, and can be as low as 59.37% for moment invariant. The illumination variation has the least effect on the recognition rate. The average rate of the four methods is over 98% (the lowest is 94.51%, the highest is 100%). These results support that thermal face recognition is robust to illumination variation.

TABLE II. FACE RECOGNITION RATE (%) CONSIDERING THREE FACTORS

Method              Head Rotation   Expression   Illumination
LBP                 79.33           96.27        98.35
Moment Invariant    59.37           91.76        94.51
HOG                 90.27           98.78        99.18
Our Method          98.00           99.40        100.00

The proposed CNN method outperforms the other three methods in all conditions. Even in the head rotation experiment, it can achieve a recognition rate of 98%. Compared with the second highest recognition rate, 90.27%, the recognition rate (98%) is a dramatic improvement. Under the other two conditions, i.e. expression and illumination variation, the CNN method again produces the best results (99.4% for expression variation, and 100% for illumination variation).

A line chart was made according to the data in Table II, which is shown in Fig. 2. This chart visually illustrates that the CNN has the best recognition performance and can largely improve the recognition rate under extreme conditions, such as head rotation.

[Figure 2: line chart of recognition rate (vertical axis, 50 to 100) against the three conditions Head Rotation, Expression, and Illumination; legend entries include LBP, Moment Invariant, and HOG]
Figure 2. Line chart of face recognition rate.

IV. CONCLUSION

In this paper, we present a CNN method for thermal face recognition. Three conditions that affect the recognition rate, i.e. head rotation, expression variation, and illumination variation, were considered. Compared with the traditional methods for thermal face recognition, the proposed method produces the best recognition results. This suggests that CNN is a promising method for thermal face recognition under extreme conditions, such as side face view and a rapidly changing illumination environment.

ACKNOWLEDGMENT

We would like to acknowledge the support from the National Natural Science Foundation of China (Grant No. 61301297 and No. 61472330), and the Fundamental Research Funds for the Central Universities (No. XDJK2013C124).

REFERENCES

[1] Drira, Hassen, "3D face recognition under expressions, occlusions, and pose variations." Pattern Analysis and Machine Intelligence, IEEE Transactions on 35.9, 2013, pp. 2270-2283.
[2] Jain, Anil, Ruud Bolle, and Sharath Pankanti, eds. Biometrics: personal identification in networked society. vol. 479. Springer Science & Business Media, 2006.
[3] Klin, Ami, "A normed study of face recognition in autism and related disorders." Journal of autism and developmental disorders 29.6, 1999, pp. 499-508.
[4] Karmakar, Dhiman, and C. A. Murthy. "Face Recognition using Face Autocropping and Facial Feature Points Extraction." Proceedings of the 2nd International Conference on Perception and Machine Intelligence. ACM, 2015.
[5] Wang, S., M. He, and Z. Gao, "Emotion recognition from thermal infrared images using deep Boltzmann machine." Frontiers of Computer Science 8.4, 2014, pp. 609-618.
[6] Chan, Chi Ho, et al. "Illumination invariant face recognition: a survey." Face Recognition in Adverse Conditions, 2014, pp. 147-166.
[7] Zhu, Jun-Yong, Wei-Shi Zheng, and Jian-Huang Lai. "Logarithm gradient histogram: A general illumination invariant descriptor for face recognition." Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on. IEEE, 2013.
[8] Ruiz-del-Solar, Javier, "Thermal Face Recognition in Unconstrained Environments Using Histograms of LBP Features." Local Binary Patterns: New Variants and Applications. Springer Berlin Heidelberg, 2014, pp. 219-243.
[9] Liu, Y., K. Ai, and J. Liu, "Dopamine-Melanin Colloidal Nanospheres: An Efficient Near-Infrared Photothermal Therapeutic Agent for In Vivo Cancer Therapy." Advanced Materials 25.9, 2013, pp. 1353-1359.
[10] Pan, Zhihong, "Face recognition in hyperspectral images." Pattern Analysis and Machine Intelligence, IEEE Transactions on 25.12, 2003, pp. 1552-1560.
[11] Gonzalez, Carlos, "Use of FPGA or GPU-based architectures for remotely sensed hyperspectral image processing." INTEGRATION, the VLSI journal 46.2, 2013, pp. 89-103.
[12] Lloyd, J. Michael. Thermal imaging systems. Springer Science & Business Media, 2013.
[13] Wang, Shangfei, "Emotion recognition from thermal infrared images using deep Boltzmann machine." Frontiers of Computer Science 8.4, 2014, pp. 609-618.
[14] Simon, Marc Oliu, "Improved RGB-D-T based Face Recognition." IET Biometrics, 2016.
[15] Glorot, Xavier, Antoine Bordes, and Yoshua Bengio, "Deep sparse rectifier neural networks." International Conference on Artificial Intelligence and Statistics, 2011.
[16] Majumder, Goutam, and Mrinal Kanti Bhowmik, "Gabor-Fast ICA Feature Extraction for Thermal Face Recognition Using Linear Kernel Support Vector Machine." Computational Intelligence and Networks (CINE), 2015 International Conference on. IEEE, 2015.
[17] Zaeri, Naser, Faris Baker, and Rabie Dib. "Thermal Face Recognition Using Moments Invariants." 2015.
[18] Xie, Zhihua, and Guodong Liu, "Infrared face recognition based on LBP co-occurrence matrix and partial least squares." International Journal of Wireless and Mobile Computing 8.1, 2015, pp. 90-94.
[19] Gui, Jie, "Discriminant sparse neighborhood preserving embedding for face recognition." Pattern Recognition 45.8, 2012, pp. 2884-2893.
[20] Bengio, Yoshua. "Learning deep architectures for AI." Foundations and trends® in Machine Learning 2.1, 2009, pp. 1-127.
[21] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553, 2015, pp. 436-444.
[22] Arel, Itamar, Derek Rose, and Robert Coop, "DeSTIN: A Scalable Deep Learning Architecture with Application to High-Dimensional Robust Pattern Recognition." AAAI Fall Symposium: Biologically Inspired Cognitive Architectures, 2009.
[23] Chen, Guang, "Combining unsupervised learning and discrimination for 3D action recognition." Signal Processing 110, 2015, pp. 67-81.
[24] Klepac, Goran, ed. Developing Churn Models Using Data Mining Techniques and Social Network Analysis, IGI Global, 2014.
[25] Pang, Ying-Han, Andrew Teoh Beng Jin, and David Ngo Chek Ling. "Binarized revocable biometrics in face recognition." Computational Intelligence and Security. Springer Berlin Heidelberg, 2005, pp. 788-795.
[26] Ruiz-del-Solar, Javier, "Thermal Face Recognition in Unconstrained Environments Using Histograms of LBP Features." Local Binary Patterns: New Variants and Applications. Springer Berlin Heidelberg, 2014, pp. 219-243.
[27] Zaeri, Naser, Faris Baker, and Rabie Dib. "Thermal Face Recognition Using Moments Invariants." 2015.
[28] Hermosilla, Gabriel, "Study of Local Matching-Based Facial Recognition Methods Using Thermal Infrared Imagery." International Journal of Pattern Recognition and Artificial Intelligence 29.08, 2015, 1556012.