The 3rd International Conference on Next Generation Computing (ICNGC2017b)

Face Detection using Selected Feature Maps from a Pre-trained CNN

Amin Ullah¹, Jamil Ahmad¹, Mi Young Lee¹, Jin-Taek Kim², Sung Wook Baik¹*

¹ Digital Contents Research Institute, Sejong University, Seoul, Republic of Korea
² Consortium of Cloud Computing Research, Seoul, Republic of Korea
[email protected], [email protected], [email protected], [email protected]
*Corresponding Author Email: [email protected]

Abstract― Face detection is one of the key visual information analysis tasks in computer vision. In this paper, we present an efficient and lightweight face detection technique based on spatial activations in the intermediate feature maps of a pre-trained convolutional neural network (CNN). Feature maps that respond to various facial features are first identified. The selected subset of feature maps is then combined to approximate the location of the face(s) in the image. The proposed method can efficiently detect face regions in images and generates fewer false positives than a state-of-the-art method.

Keywords― Face detection, Convolutional neural network, Binary operation.

I. Introduction

Face detection is a challenging task with many applications in computer vision, such as face recognition, face tagging in social media (Facebook, Instagram), facial expression analysis for human-computer interaction, and other face-related problems. Despite its importance, face detection still faces crucial challenges, including illumination, face pose, background scene, facial expressions, and similar problems [1]. Researchers have been working on fast and efficient face detection algorithms for the last two decades. The well-known Viola–Jones detector uses Haar-like features, which are computationally quick to calculate, but it still has problems detecting faces at different scales and angles. It is also weaker on images with blurred regions. An early attempt to solve this problem used a separate cascade classifier for each face orientation, or a decision tree for pose approximation and a matching cascade to validate the detection accuracy score [2].

Figure 1: Proposed Framework. (Panel labels: Image, ROI, 46 High Activation Feature Maps, Query Image, Sum of 46 Maps, Threshold, Results.)
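Haar-like features are quick to evaluate because the sum of the pixels inside any rectangle can be read from an integral image (summed-area table) with four lookups. The following is a minimal sketch of that idea, not the Viola–Jones implementation itself; the function names, the toy image, and the choice of a horizontal two-rectangle feature are assumptions made for illustration.

```python
import numpy as np


def integral_image(img):
    """Summed-area table with an extra zero row and column at the top/left."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]


def two_rect_haar(ii, top, left, height, width):
    """Horizontal two-rectangle feature: left-half sum minus right-half sum."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Toy 4x4 image (hypothetical data): bright left half, dark right half.
image = np.array([[9, 9, 1, 1],
                  [9, 9, 1, 1],
                  [9, 9, 1, 1],
                  [9, 9, 1, 1]])
ii = integral_image(image)
feature = two_rect_haar(ii, top=0, left=0, height=4, width=4)
print(feature)
```

Once the integral image is built, the cost of each rectangle sum is independent of the rectangle size, which is what makes evaluating many such features over a sliding window affordable.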

The adaptive boosting algorithm extracts Haar-like features, which are processed and fed to the classifier using a cascade approach. The cascade architecture, operating on the integral image, makes the face detector faster, whereas adaptive boosting is crucial for achieving a good detection accuracy score at each cascade node. Today, several researchers use Viola and Jones' cascade-based architecture, which first applies low-level classifiers to Haar-like features extracted from a sliding window at different scales and then combines those low-level classifiers into a robust classifier that predicts whether or not a face is in the window [3]. Theocharides et al. [4] present an ASIC model that exploits the parallelism of the adaptive boosting face detection method by parallelizing accesses to image data. This method achieves a throughput of 52 frames per second (FPS); however, the authors do not report the image size used in the experimental evaluation. Wei et al. [5] present an FPGA model that uses only a small portion of the complete algorithm. They achieve up to 15 FPS for 120x120 images.

II. Proposed Method

The proposed method for face detection is divided into two main steps. First, we crop the face region from an image and feed it to the pre-trained VGG [6] model to extract the feature maps of convolution layer 5 (conv5). After obtaining the 256 feature maps from conv5, we analyze them and keep those that generate high activations for the face region. Second, those feature maps are added together and normalized between 0 and 1. Finally, the face is detected by thresholding to find the high activations of the face in the image. Noise and wrong detections are removed using the binary operations opening and closing. The framework of the proposed technique is given in Fig. 1.

The VGG model is based on the VGG-Very-Deep19 CNN architecture, which is trained on the large ImageNet [7] dataset of one million images for classification.
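The summation, normalization, thresholding, and opening/closing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature maps are synthetic stand-ins for conv5 activations, the `selected` indices are hypothetical, and the threshold of 0.5 and the 3x3 square structuring element are choices the paper does not specify.

```python
import numpy as np


def combine_maps(feature_maps, selected):
    """Sum the selected feature maps and min-max normalize the sum to [0, 1]."""
    total = feature_maps[selected].sum(axis=0).astype(np.float64)
    span = total.max() - total.min()
    if span == 0:
        return np.zeros_like(total)
    return (total - total.min()) / span


def dilate(mask):
    """Binary dilation with a 3x3 square; outside pixels count as background."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(mask):
    """Binary erosion with a 3x3 square; outside pixels count as background."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def detect_face_mask(feature_maps, selected, threshold=0.5):
    """Threshold the combined map, then apply opening followed by closing."""
    mask = combine_maps(feature_maps, selected) > threshold
    opened = dilate(erode(mask))    # opening: removes small isolated regions
    return erode(dilate(opened))    # closing: fills small holes


def bounding_box(mask):
    """Return (top, bottom, left, right) of the detected region, or None."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max())


# Synthetic stand-in for conv5 activations (hypothetical data): 3 maps, 12x12.
maps = np.zeros((3, 12, 12))
maps[0, 2:9, 2:9] = 1.0    # face-like blob in map 0
maps[1, 2:9, 2:9] = 0.8    # same blob, weaker, in map 1
maps[0, 5, 5] = 0.0        # one-pixel hole inside the blob
maps[1, 5, 5] = 0.0
maps[0, 11, 0] = 1.5       # isolated noisy activation in a selected map
maps[2, 0, 11] = 5.0       # strong activation in a map that is not selected

mask = detect_face_mask(maps, selected=[0, 1], threshold=0.5)
print(bounding_box(mask))
```

In this toy case the opening removes the isolated activation and the closing fills the one-pixel hole. On real conv5 outputs the mask has the spatial resolution of the feature maps, so it would still need to be scaled back to image coordinates.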
The VGG model consists of sixteen convolutional layers and five pooling layers, followed by three fully-connected (FC) layers. The finely tuned weights of the convolution kernels generate highly informative feature maps, while the pooling layers reduce the dimensions of the feature maps by a factor of 2. The framework of the VGG-Face model is shown in Fig. 1. The pre-trained model contains thirteen convolutional layers, five pooling layers, and three fully connected layers. Training a new CNN model needs millions of images; therefore, we used a pre-trained model [8].

In the proposed method, the weights of convolution layer five are utilized for face detection. In the first step, we extract the face region from an image and pass it through the CNN model. The feature maps of convolution layer five are analyzed to find high activations for face regions. We selected 46 high-activation feature maps out of the 256 in this layer. After obtaining the high-activation feature maps, we compute their sum and apply basic thresholding to remove low activation values and keep only the high activations. Some noisy activations remain around the image; these are removed by the binary operations opening and closing. Opening is a composite of two operations: first, erosion is applied, which shrinks the detected region so that unwanted regions are removed, followed by dilation (with the same structuring element), which fills the holes in the detected region. Closing is the composite operation of dilation followed by erosion (with the same structuring element). These operations clean up the detection of the face in the image.

The proposed method is evaluated using a variety of face images taken from the Web. The images contain faces captured under different illumination and from different angles and scales. They are collected from different fields of life, such as sports, entertainment, and news. Fig. 2 shows the results of the proposed method compared with the well-known Viola–Jones face detection algorithm. The results show that the Viola–Jones detector produces incorrect detections in cases where the proposed method gives correct detections.

Figure 2: Comparison of proposed method with Viola–Jones detector. (Panel labels: Original Image, Proposed Method, Viola-Jones Method.)

III. Conclusion

In this paper, we have proposed a face detection method based on high activations in the feature maps of a pre-trained VGG model. The detected face region is post-processed using thresholding and the binary operations opening and closing. The proposed method is tested on different queries and compared with a state-of-the-art technique. The results of the proposed method can be improved by training a dedicated model for faces. However, even with a pre-trained model, the proposed method proves to be a good descriptor for face detection.

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIP) (No.2016R1A2B4011712).

References

1. Viola, P. and M.J. Jones, Robust real-time face detection. International Journal of Computer Vision, 2004. 57(2): p. 137-154.
2. Jones, M. and P. Viola, Fast multi-view face detection. Mitsubishi Electric Research Lab TR2003-96, 2003. 3: p. 14.
3. Pan, H., Y. Zhu, and L. Xia, Efficient and accurate face detection using heterogeneous feature descriptors and feature selection. Computer Vision and Image Understanding, 2013. 117(1): p. 12-28.
4. Theocharides, T., N. Vijaykrishnan, and M.J. Irwin. A parallel architecture for hardware face detection. in Emerging VLSI Technologies and Architectures, 2006. IEEE Computer Society Annual Symposium on. 2006. IEEE.
5. Wei, Y., X. Bing, and C. Chareonsak. FPGA implementation of AdaBoost algorithm for detection of face biometrics. in Biomedical Circuits and Systems, 2004 IEEE International Workshop on. 2004. IEEE.
6. Simonyan, K. and A. Zisserman, Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
7. Deng, J., et al. Imagenet: A large-scale hierarchical image database. in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. 2009. IEEE.
8. Sajjad, M., et al., Integrating salient colors with rotational invariant texture features for image representation in retrieval systems. Multimedia Tools and Applications, 2017: p. 1-21.
