

European Commission Joint Research Centre Institute for the Protection and Security of the Citizen

Contact information
Laurent Beslay
Address: Joint Research Centre, Via Enrico Fermi 2749, TP xxx, 21027 Ispra (VA), Italy
E-mail: [email protected]
Tel.: +39 0332 78 6556

http://ipsc.jrc.ec.europa.eu/ http://www.jrc.ec.europa.eu/

This publication is a Technical Report by the Joint Research Centre of the European Commission.

Legal Notice
This publication is a Technical Report by the Joint Research Centre, the European Commission’s in-house science service. It aims to provide evidence-based scientific support to the European policy-making process. The scientific output expressed does not imply a policy position of the European Commission. Neither the European Commission nor any person acting on behalf of the Commission is responsible for the use which might be made of this publication.

JRC85864

Luxembourg: Publications Office of the European Union, 2013

© European Union, 2013

Reproduction is authorised provided the source is acknowledged.


Executive summary
The JRC has been cooperating fruitfully with the European Cybercrime Centre since its inception in January 2013. The first joint research topic selected is "Video and image database search tool for forensic purposes". The aim is to develop and integrate smart tools for the identification of victims and perpetrators of online child abuse in very large media (video and picture) databases through enhanced automated categorisation of these media.

In this regard, the present document analyses the state of the art in image and video analytics technologies that support investigators in the fight against on-line child pornography (Child Sexual Abuse, CSA), in particular those that contribute to the identification of perpetrators and victims of such crimes. The aim is to provide a better understanding of currently available solutions (both commercial and open-source), and of the present level of academic and industrial research in the field. This document will offer a platform for detailed technical dialogue with EU law enforcement bodies involved in the fight against CSA, and will ground all subsequent steps of the JRC project on the fight against CSA.

The document focuses primarily on five functionalities, namely face detection/recognition, age estimation, gender estimation, object detection/recognition, and context analysis, which the JRC has selected as being of potential interest because of the wide variety of applications they can support. For instance:

• Face detection can be helpful when examining a video, to jump quickly and directly to the points where the face of the aggressor and/or of the victim is visible.
• Face recognition techniques, coupled with a database of template images of known child molesters, can enable fast, machine-aided identification of the aggressor.
• Age estimation can be used to automatically distinguish child pornography from legal pornography, by detecting whether persons shown in a media file are below a certain age.
• Gender estimation can add value to search tools in media archives, where the gender can be used as a complementary query for the identification of a male or female perpetrator or victim, or to refine query results.
• Object detection and recognition tools could allow investigators to retrieve all the videos containing a specific object that was previously detected in another image or video (from media material produced by the same child molester, or in the same location in a lawful and neutral context), and enable cross-referencing between different cases or media contents.
• The analysis of contextual information such as metadata can provide further tools, e.g. to determine the model of the camera that acquired a certain image or video (by analysing Exif metadata), or to find other images acquired with the same device that produced a certain picture (by analysing the noise pattern left in the image by the camera's silicon sensor).

The next steps for the JRC project on the fight against CSA will be the performance assessment of existing solutions and, possibly, the development of new tools and/or the enhancement and integration of existing ones.


Table of Contents
Executive summary
1. Introduction
  1.1. Background and purpose
  1.2. Face detection and recognition
  1.3. Age estimation
  1.4. Gender estimation
  1.5. Object detection and recognition
  1.6. Context analysis
  1.7. Additional functionalities not covered by this study
  1.8. Outline of the following Chapters
2. Face detection and recognition
  2.1. Face recognition from still images
  2.2. Face recognition from video
  2.3. Challenges and benchmark data sets
3. Age estimation
  3.1. Facial age estimation
  3.2. Facial age modelling and simulation
  3.3. Automatic face recognition and age
  3.4. Benchmark data sets
4. Gender estimation
  4.1. Effect of age on gender estimation
5. Object detection and recognition
  5.1. Object detection
    5.1.1. Representation and modelling
      5.1.1.1. Window-based models
      5.1.1.2. Local features
      5.1.1.3. Part-based representations
    5.1.2. Detection
    5.1.3. Challenges
    5.1.4. Benchmark data sets and indicative performance
  5.2. Object recognition
    5.2.1. Local features detection and extraction
    5.2.2. Local features matching
    5.2.3. Verification of matches
    5.2.4. Challenges
    5.2.5. Benchmark data sets and indicative performance
  5.3. Detection and recognition of persons
    5.3.1. Recognising a person
6. Context analysis
  6.1. Semantic classification of images
    6.1.1. Benchmark data sets and indicative performance
  6.2. Sensor pattern noise for source camera identification
    6.2.1. Benchmark data sets and indicative performance
7. Conclusions
Annex A. Commercial and open-source solutions
  A.1. Proprietary solutions
    A.1.1. Comparative evaluation
  A.2. Commercial face recognition and detection tools
  A.3. Open-source solutions
    A.3.1. Computer Vision software libraries
    A.3.2. Machine Learning software libraries
Bibliography


1. Introduction

1.1. Background and purpose
This document analyses the state of the art in image and video analytics technologies that support investigators in the fight against on-line child pornography (Child Sexual Abuse, CSA), in particular those that contribute to the identification of perpetrators and victims of such crimes. The objective is to provide a better understanding of currently available solutions (both commercial and open-source), and of the present level of academic and industrial research in the field. The outcomes of this analysis will ground all subsequent steps of the JRC project on the fight against CSA.

In support of the action plan implementing the Stockholm Programme¹, the JRC has strengthened its links with DG HOME and EUROPOL, in particular with its recently created European Cybercrime Centre (EC3), with a view to providing strong scientific contributions to the following policy elements:

• The EU cyber-security strategy, "An Open, Safe and Secure Cyberspace", adopted in February 2013²;
• Communication COM(2012) 140³ on tackling crime in our digital age, establishing the European Cybercrime Centre (EC3);
• Directive 2011/92/EU on combating the sexual abuse and sexual exploitation of children and child pornography⁴;
• The Global Alliance to fight child sexual abuse online, launched by the US and the EU and signed by EU and non-EU countries in December 2012⁵.

The JRC has been cooperating fruitfully with the European Cybercrime Centre since its inception in January 2013. The first joint research topic selected is "Video and image database search tool for forensic purposes". The aim is to develop and integrate smart tools for the identification of victims and perpetrators of online child abuse in very large media (video and picture) databases through enhanced automated categorisation of these media. Although many video analytics techniques could be exploited to develop valuable forensic tools for law enforcement investigators, the present document focuses primarily on the five functionalities listed in Table 1.

1 http://europa.eu/legislation_summaries/human_rights/fundamental_rights_within_european_union/jl0034_en.htm
2 http://ec.europa.eu/information_society/newsroom/cf/dae/document.cfm?doc_id=1667
3 http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=COM:2012:0140:FIN:EN:PDF
4 http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2011:335:0001:0014:EN:PDF
5 http://ec.europa.eu/dgs/home-affairs/what-we-do/policies/organized-crime-and-human-trafficking/global-alliance-against-child-abuse/index_en.htm


• Face detection and recognition
• Age estimation
• Gender estimation
• Object detection and recognition
• Context analysis

Table 1: Subset of video-analytics functionalities considered in this report.

The JRC has selected these functionalities as being of potential interest because of the wide variety of applications they can support, e.g., multi-query search on media archives based on multiple, combined queries⁶, or supervised⁷ classification of media content to discriminate between legal and illegal material (i.e., legal pornography and child pornography). Some of these capabilities are already supported by commercial products, while others can be implemented using open-source software libraries for computer vision. The choice of these five functionalities is based on the conclusions of the December 2012 JRC workshop on “Emerging Surveillance Techniques for Crime Prevention & Prosecution”, on preliminary discussions with EUROPOL/EC3 and some EU national law enforcement services, and on an analysis of the scientific literature.

The rest of this Chapter is organised as follows. First, Sections 1.2 to 1.6 briefly describe the functionalities above. Section 1.7 then presents two further functionalities, namely copy detection and content-based image retrieval, which have been implemented in various commercial solutions but are not analysed in depth in this document. Finally, Section 1.8 outlines the rest of the document.
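The multi-query search mentioned above can be pictured as combining simple predicates over per-file annotations. The following is a minimal sketch only: the `MediaRecord` structure, its field names, and the helper functions are hypothetical and assume that upstream face and object recognition have already annotated each media file.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass
class MediaRecord:
    """Hypothetical per-file annotations produced by upstream analytics."""
    media_id: str
    face_ids: Set[str] = field(default_factory=set)       # identities matched by face recognition
    object_labels: Set[str] = field(default_factory=set)  # labels assigned by object recognition


# A query is any function that accepts a record and returns True or False.
Query = Callable[[MediaRecord], bool]


def has_face(identity: str) -> Query:
    """Query: the record contains a face matched to the given identity."""
    return lambda rec: identity in rec.face_ids


def has_object(label: str) -> Query:
    """Query: the record contains an object with the given label."""
    return lambda rec: label in rec.object_labels


def all_of(*queries: Query) -> Query:
    """Combine several queries with a logical AND."""
    return lambda rec: all(q(rec) for q in queries)


def search(archive: Iterable[MediaRecord], query: Query) -> List[str]:
    """Return the identifiers of all records that satisfy the query."""
    return [rec.media_id for rec in archive if query(rec)]


# Toy archive with made-up annotations.
archive = [
    MediaRecord("img_001", {"person_A"}, {"lamp", "poster"}),
    MediaRecord("img_002", {"person_B"}, {"lamp"}),
    MediaRecord("vid_003", {"person_A"}, {"sofa"}),
]

# "A specific object AND the face of a specific person".
print(search(archive, all_of(has_face("person_A"), has_object("lamp"))))
```

Representing each query as a predicate keeps the combination logic independent of how the individual annotations are produced, so further aspects of Table 1 could be added as extra predicates.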

6 I.e., queries involving multiple aspects of Table 1, e.g., videos that contain faces of female persons below the age of 10, or images that contain a specific object AND the face of a specific person.
7 In machine learning, “supervised” classification means that a classifier is first trained on a training set of labelled samples of the categories to be recognised (e.g., legal and illegal material). Human supervision is therefore needed in this phase to provide the true labels of the training samples.

1.2. Face detection and recognition
Face detection refers to the ability to automatically detect the presence and approximate position of faces (if any) in an image or video frame. This is achieved by using statistical detectors trained on common visual features that characterise a face. Face recognition, instead, is the capability to assign an identity to a detected face. Typically, this is achieved by matching the face image against a set of templates (faces already known to the system) and finding the most similar one.

With respect to the aim of providing tools for the fight against CSA, both technologies are potentially very valuable. Face detection can, for instance, be helpful when examining a video, to jump quickly and directly to the points where the face of the aggressor and/or of the victim is visible; this lets the investigator spend less time on examination and tends to mitigate the disturbing task of watching the entire video. Face recognition techniques coupled with a database of template images of known child molesters can enable fast, machine-aided identification of the aggressor. This technique can also facilitate the selection, from a large database, of all the media containing the same persons, helping to establish links between different cases or even with other lawful material collected elsewhere.
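The template-matching step described above (compare a face against known templates and pick the most similar one) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of any particular product: it assumes each face has already been converted into a numeric feature vector by some upstream extractor, uses cosine similarity as the similarity measure, and applies a hypothetical acceptance threshold of 0.8 so that a poor best match is reported as "unknown".

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(
    probe: Sequence[float],
    templates: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed value, to be tuned on real data
) -> Tuple[Optional[str], float]:
    """Return (identity, score) of the most similar template.

    The identity is None when even the best score is below the threshold.
    """
    best_id: Optional[str] = None
    best_score = -1.0
    for identity, template in templates.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_id, best_score = identity, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Toy templates with made-up three-dimensional feature vectors.
templates = {
    "id_1": [1.0, 0.0, 0.0],
    "id_2": [0.0, 1.0, 0.0],
}

print(identify([0.9, 0.1, 0.0], templates))  # close to the template of id_1
print(identify([0.5, 0.5, 0.7], templates))  # not close to any template
```

The threshold is what turns a nearest-template search into a machine-aided identification: without it, every face would be assigned to some known identity, however weak the match.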

1.3. Age estimation
Age estimation techniques aim to determine the approximate age of a person from an image of his/her face. Typically, this is achieved by extracting various types of features from the image and then using a pre-trained statistical (machine learning) model to estimate the age of the person. Coupled with face detection, age estimation can be used, for instance, to automatically distinguish child pornography from legal pornography, by detecting whether persons shown in a media file are below a certain age.
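The two-stage scheme described above (extract features, then apply a model trained on labelled examples) can be illustrated with a deliberately simple stand-in. The sketch below uses k-nearest-neighbour regression over hypothetical, already-extracted feature vectors; this technique is chosen here only for brevity and is not claimed to be what the surveyed systems use. The feature values, the training set, and the choice of k = 3 are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

# A labelled training sample: (feature vector, known age in years).
LabelledSample = Tuple[Sequence[float], float]


def estimate_age(
    features: Sequence[float],
    training_set: List[LabelledSample],
    k: int = 3,
) -> float:
    """Estimate age as the mean age of the k training samples nearest to `features`."""
    if not training_set:
        raise ValueError("training_set must not be empty")
    # Sort the training samples by Euclidean distance to the query features.
    nearest = sorted(training_set, key=lambda s: math.dist(features, s[0]))[:k]
    return sum(age for _, age in nearest) / len(nearest)


def is_below_age(
    features: Sequence[float],
    training_set: List[LabelledSample],
    age_limit: float,
    k: int = 3,
) -> bool:
    """Decide whether the estimated age falls below the given limit."""
    return estimate_age(features, training_set, k) < age_limit


# Toy training set with made-up two-dimensional features.
training_set: List[LabelledSample] = [
    ([0.10, 0.20], 6.0),
    ([0.15, 0.25], 8.0),
    ([0.20, 0.20], 10.0),
    ([0.80, 0.90], 35.0),
    ([0.85, 0.80], 40.0),
    ([0.90, 0.95], 45.0),
]

print(estimate_age([0.12, 0.22], training_set))
print(is_below_age([0.86, 0.88], training_set, age_limit=18.0))
```

Because the estimate is approximate, a hard threshold such as `age_limit` will misclassify persons whose true age lies near the limit; in practice such a decision would support, not replace, the judgement of an investigator.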

1.4. Gender estimation
Machine learning methods can also be used to estimate the gender of a person from certain biometric traits. Techniques for gender estimation can add value to search tools in media archives, where the gender can be used as a complementary query for the identification of a male or female perpetrator or victim, or to refine query results. Typically, gender estimation works on the face and must cope with the same issues as face detection and recognition (pose changes, illumination changes, occlusions, etc.). In addition, the application domain at hand raises a further problem: the effect of age on the feasibility of gender estimation. In fact, it may be difficult even for a human to determine the gender of a young child (