A Pipeline to Improve Face Recognition Datasets and Applications
I. Gallo, S. Nawaz, A. Calefati (Università dell’Insubria, Varese, Italy)
G. Piccoli (Louisiana State University, LA, USA)
Abstract
We propose a pipeline to improve face recognition systems. The pipeline can either clean an existing face dataset to improve recognition performance or create a new one from scratch. We present detailed experiments that show the characteristics and performance of the pipeline. In addition, we present a small-scale face recognition application that makes use of the proposed cleaning process.

[Figure: pipeline overview. Dataset creation: frames from videos or still images; Multi-Task CNN [*] for face detection; aligned faces; pre-trained CNN model (Inception-ResNet-v1 trained with the center loss function); embedding of size 128; DBSCAN clusters; center loss removes false positives; an expert assigns identities (example labels: Michael, Alex, Nic).]
Dataset cleaning experiment
The MS-Celeb-1M dataset is strongly affected by noise. We randomly selected a subset of 10,000 identities and, for each identity, automatically kept only the biggest cluster obtained using DBSCAN, discarding the remaining images as noise.
Datasets used for training:
- VGGFace2
- CASIA-WebFace
- MS-Celeb (clean and unclean versions)
The proposed pipeline can be used to create a brand-new face recognition dataset from scratch.
Dataset cleaning
[Figure label: select the biggest cluster for each class]
Large-scale datasets created in a semi-supervised way from search engines are typically prone to noise. The pipeline can be used to remove noise from existing datasets: we feed aligned faces to the pre-trained CNN model and cluster the resulting embeddings to separate identities from noise.
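The cleaning step described here can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a small pure-Python DBSCAN written for this example, 2-D toy points in place of the 128-dimensional embeddings, and hypothetical values for `eps` and `min_samples` (the poster does not state the parameters used).

```python
from collections import Counter
from math import dist

NOISE = -1  # label given to points that belong to no cluster


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one cluster label per point (-1 means noise)."""
    n = len(points)
    # eps-neighbourhood of every point (a point is its own neighbour).
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # core point: keep expanding
        cluster += 1
    return labels


def keep_biggest_cluster(embeddings, eps=0.5, min_samples=3):
    """Indices of the embeddings in the largest DBSCAN cluster."""
    labels = dbscan(embeddings, eps, min_samples)
    counts = Counter(label for label in labels if label != NOISE)
    if not counts:
        return []  # everything was classified as noise
    biggest, _ = counts.most_common(1)[0]
    return [i for i, label in enumerate(labels) if label == biggest]


# Toy "identity": four consistent faces, two faces of another person, one outlier.
faces = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
         (5.0, 5.0), (5.1, 5.0),
         (9.0, 0.0)]
kept = keep_biggest_cluster(faces, eps=0.3, min_samples=2)
print(kept)
```

In practice a library implementation of DBSCAN would normally replace the hand-written one; the selection logic in `keep_biggest_cluster` stays the same.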
[Figure labels: Steve Jobs, Jennifer Aniston, Angelina Jolie; unclean dataset]

Applications
Taking attendance in a controlled environment:
• Schools
• Companies
• ...

Comparison of results obtained using the Inception-ResNet-v1 trained on both the cleaned and uncleaned versions of the MS-Celeb dataset.

Dataset creation experiment
In this experiment we check whether we can automatically obtain a single cluster per identity when merging various videos taken from heterogeneous sources with different environment settings. Input: 3 short videos of 5 celebrities.
[*] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, Joint face detection and alignment using multitask cascaded convolutional networks, IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016.
Accuracy (Acc), precision (P) and recall (R) of the cleaning process applied to 50 randomly selected identities of the MS-Celeb dataset. Positive images belong to the selected identity, while negatives are all the remaining images.
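As a reminder of how these three metrics relate to the positive/negative definition above, here is a small sketch. It assumes that an image kept by the cleaning process counts as a predicted positive and a discarded image as a predicted negative; the function name and the toy labels are hypothetical and are not taken from the experiments.

```python
def cleaning_metrics(is_positive, is_kept):
    """Accuracy, precision and recall of a cleaning decision.

    is_positive[i]: the image truly belongs to the selected identity.
    is_kept[i]:     the cleaning process kept the image (assumed to mean
                    "predicted positive").
    """
    pairs = list(zip(is_positive, is_kept))
    tp = sum(1 for p, k in pairs if p and k)          # correct images kept
    fp = sum(1 for p, k in pairs if not p and k)      # noise that was kept
    fn = sum(1 for p, k in pairs if p and not k)      # correct images discarded
    tn = sum(1 for p, k in pairs if not p and not k)  # noise discarded
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy example with six images of one identity folder.
truth = [True, True, True, False, False, True]
kept = [True, True, False, False, True, True]
acc, prec, rec = cleaning_metrics(truth, kept)
print(f"Acc={acc:.3f} P={prec:.3f} R={rec:.3f}")
```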
Running our pipeline for dataset creation we obtained:
• 2104 detected faces
• 1921 faces contained in the selected clusters
• 91.59% average accuracy (selecting only the biggest cluster)
• 8.41% average discarded faces for each identity
• 0 errors in each selected cluster
We are able to merge videos or images coming from various sources with zero false positives in the selected clusters.
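The pooled ratio of kept faces (1921 of 2104) is not identical to the reported 91.59% average. One plausible reading, stated here as an assumption, is that the reported figure is averaged per identity rather than computed over all faces at once. The sketch below shows the difference between the two kinds of average; the per-identity counts are hypothetical toy values, not data from the experiment.

```python
def pooled_kept_ratio(per_identity):
    """Kept faces divided by detected faces, over all identities together."""
    detected = sum(d for d, _ in per_identity)
    kept = sum(k for _, k in per_identity)
    return kept / detected


def mean_kept_ratio(per_identity):
    """Average of the per-identity kept/detected ratios."""
    return sum(k / d for d, k in per_identity) / len(per_identity)


# Totals reported on the poster: 2104 detected faces, 1921 kept.
print(f"pooled, reported totals: {100 * 1921 / 2104:.2f}%")

# Hypothetical (detected, kept) counts for two identities of unequal size.
toy = [(100, 95), (300, 240)]
print(f"pooled, toy counts: {100 * pooled_kept_ratio(toy):.2f}%")
print(f"per-identity mean, toy counts: {100 * mean_kept_ratio(toy):.2f}%")
```

With identities of unequal size the two averages diverge, which is enough to account for a small gap of this kind.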
IVCNZ 2018: Image and Vision Computing New Zealand conference - Auckland, New Zealand. 19-21 November, 2018