Feature
Zhihai He, Roland Kays, Zhi Zhang, Guanghan Ning, Chen Huang, Tony X. Han, Josh Millspaugh, Tavis Forrester, and William McShea
Visual Informatics Tools for Supporting Large-Scale Collaborative Wildlife Monitoring with Citizen Scientists

Digital Object Identifier 10.1109/MCAS.2015.2510200
Date of publication: 12 February 2016

Abstract
Collaborative wildlife monitoring and tracking at large scales will help us understand the complex dynamics of wildlife systems, evaluate the impact of human actions and environmental changes on wildlife species, and answer many important ecological and evolutionary research questions. To support collaborative wildlife monitoring and research, we need to develop integrated camera-sensor networking systems, deploy them at large scales, and develop advanced computational and informatics tools to analyze and manage the massive wildlife monitoring data. In this paper, we will cover various aspects of the design of such systems, including (1) long-lived integrated camera-sensor system design, (2) image processing and computer vision algorithms for animal detection, segmentation, tracking, species classification, and biometric feature extraction,
(3) cloud-based data management, (4) crowdsourcing-based image annotation with citizen scientists, and (5) applications to wildlife and ecological research.

1. Introduction
Wildlife monitoring is established as a standard method and has provided the core data for countless scientific advances and conservation decisions [1, 2, 3]. Collaborative wildlife monitoring at large geographical and time scales will help us understand the dynamic behaviors of wildlife systems, evaluate the impact of human actions and environmental changes on wildlife species, and help us answer many important questions in wildlife, ecological, and environmental research [3]. For example: (1) Do a few exurban houses scattered in a natural environment impact the native wildlife populations and change the surrounding wildlife population dynamics? (2) How do animal density and diversity change along vegetation and altitudinal gradients? (3) How do the seasonal activity patterns of carnivores and inter-specific interactions change due to landscape modifications or the presence of predators or prey? The geographic scale of these types of conservation, ecological, and environmental issues is often beyond the capability of any single study to tackle, although numerous studies have documented the phenomena at selected sites [4]. As human-induced environmental changes are expected to increase greatly over the coming decades, collaborative wildlife monitoring at large scales for effective wildlife resource management and protection has become an urgent task [4, 5]. To support collaborative wildlife monitoring at large scales, we need to develop a cyber-infrastructure which consists of an integrated set of tools for the collection, analysis, and management of massive multi-modal wildlife sensor data. In this paper, we use eMammal [28] as an example to explain how such a cyber-infrastructure can be designed and developed, what the major challenges and research issues are, and what existing methods, approaches, and emerging tools are needed to address these challenges.

The rest of the paper is organized as follows. In Section 2, we provide a comprehensive review of existing technologies for wildlife monitoring. Section 3 introduces the concept of collaborative wildlife monitoring at large scales with citizen scientists. Section 4 presents a recently developed cyber-infrastructure, called eMammal, for supporting collaborative wildlife monitoring with camera-traps.
Section 5 presents the research problem and the current status of visual informatics tools for automated content analysis of camera-trap data. In Section 6, we discuss the future work needed to improve the performance of automated camera-trap image analysis so as to achieve successful applications in large-scale collaborative wildlife monitoring. Section 7 concludes the paper.

2. Wildlife Monitoring Technologies
Engineers and wildlife researchers have been developing advanced sensing and communication technologies for wildlife monitoring [1, 8]. Existing technologies for wildlife monitoring include Very High Frequency (VHF) radio tracking [6, 7, 8], satellite tracking [9], Global Positioning System (GPS) tracking [10, 11], and sensor networks. Radio tracking uses a radio transmitter or receiver attached to an animal to collect its location information [8]. Since its early use in the mid-1960s, VHF radio tracking has been a traditional method for animal telemetry. It requires a user to receive the transmissions from a VHF transmitter, usually in a collar attached to the animal, via a hand-held antenna. The location of the transmitter, and hence of the animal, is determined using a triangulation method with three or more received VHF signals [8]. GPS-based animal tracking was first developed in the early 1990s. Since then, tremendous progress has been made to reduce the size and cost of the devices and improve their performance. The GPS location data can be transferred from the animal to the user using one of three approaches: (1) local point-to-point communication, (2) transmission over satellite links, and (3) on-board storage [10].

Recent technological advances in sensor hardware miniaturization, low-power microprocessor design, and wireless ad hoc networking have enabled the development and deployment of wireless sensor networks (WSNs) for wildlife monitoring [12, 13]. The WSN technology provides the research community with an enabling platform to simultaneously monitor a large group of free-ranging animals at granular scales [13, 14]. ZebraNet [15], developed at Princeton University, is one such example: it utilizes GPS-based radio tracking and sensor networks to track the movement of a group of zebras and study animal migrations and interspecies interactions. Embedded sensor networks have also been developed for habitat and health monitoring, using sensors to collect temperature, humidity, and other biological information about the research animals [16, 17].
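The triangulation step mentioned above has a simple least-squares formulation: each bearing defines a line through its receiver, and the transmitter estimate is the point that minimizes the squared distances to all bearing lines. The sketch below illustrates the idea under assumed conventions (planar coordinates, bearings measured counterclockwise from the +x axis); the function name and example values are illustrative only, not tied to any particular telemetry system.

import numpy as np

def triangulate(receivers, bearings_deg):
    """Least-squares intersection of bearing lines.

    receivers    -- (N, 2) array of known receiver positions (x, y)
    bearings_deg -- N bearings toward the transmitter, in degrees,
                    measured counterclockwise from the +x (east) axis
    Returns the (x, y) estimate of the transmitter position.
    """
    r = np.asarray(receivers, dtype=float)
    th = np.radians(np.asarray(bearings_deg, dtype=float))
    # Unit normals perpendicular to each bearing line.
    n = np.stack([-np.sin(th), np.cos(th)], axis=1)
    # Each bearing constrains n_i . p = n_i . r_i; solve the normal equations.
    A = n.T @ n                        # 2x2 normal matrix
    b = np.einsum("ij,ij->i", n, r)    # n_i . r_i for each receiver
    return np.linalg.solve(A, n.T @ b)

# Example: three receivers sighting a transmitter near (250, 180).
print(triangulate([(0, 0), (500, 0), (0, 400)], [35.8, 144.2, -41.3]))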
Zhihai He, Zhi Zhang, Guanghan Ning, Chen Huang, and Tony Han are with the Department of Electrical and Computer Engineering, University of Missouri, Columbia, Missouri. Roland Kays is with the North Carolina Museum of Natural Sciences and NC State University, Raleigh, North Carolina. Josh Millspaugh is with the Department of Wildlife and Fisheries, University of Missouri, Columbia, Missouri. Tavis Forrester and William McShea are with the Smithsonian Conservation Biology Institute, Front Royal, Virginia.
The aforementioned wildlife monitoring technologies do not offer visual information. Looking back at the history of wildlife research, we find that wildlife researchers have long pursued a dream: to see what the animals see in the field, without disturbing their natural behaviors, in a cost-effective way [18, 19]. Image/video-based wildlife monitoring has emerged as an increasingly important technique in wildlife research. Wildlife researchers have found that, for accurate behavior analysis and interaction modeling, it is imperative to obtain visual information about the animal's activity, its resource selection, and the environmental context of the behavior [3, 18, 20]. Otherwise, we remain "blind" to the animals and fail to understand some important behavioral attributes of wildlife species. From images or videos associated with multi-modal sensor data, we can extract a rich set of important information about animal appearance, biometric features, activities, behavior patterns, and resource selection, as well as important information about the environmental context [19, 21, 22].

There are two types of integrated camera-sensor systems for wildlife monitoring: animal-borne systems and camera-traps. Animal-borne systems, such as the CritterCam [23, 24] and DeerCam [25, 26], are light-weight camera-sensor systems mounted on animals, such as lions, tigers, turtles, whales, and penguins, to record their free-ranging behaviors. These animal-mounted systems follow the animals all the time, allowing us to see what they see and hear what they hear in the field [11, 12]. These compact systems allow scientists to study animal behavior without interference by human observers. However, they require professional knowledge and significant effort in animal capture, sedation, and system deployment, and are therefore not easily scalable for large-scale deployment [27]. They also need to be carefully designed to follow animal care protocols and to minimize their disturbance of animal behaviors. For example, the American Ornithologists' Union suggested that the transmitter package should be less than 5% of the body weight of the bird, and the U.S. Geological Survey Bird Banding Laboratory recommended that transmitters be less than 3% of the bird's body weight [27]. It has been observed that batteries occupy a major portion of the system weight; in DeerCam [25], for example, the batteries contribute more than 85% of the system weight. Therefore, minimizing the battery usage and energy consumption of the sensing system has become a critical problem in integrated camera-sensor system design [26].

Camera-traps are stationary camera-sensor systems attached to fixtures, such as trees, in the field. Triggered by animal motion through on-board infrared motion sensors, they record short image sequences or video clips of the animal's appearance and activities, together with other sensor data, such as light level, moisture, temperature, and GPS data, as shown in Figure 1. All animals move, but most are shy and quiet. Camera-traps are non-invasive and are able to capture the activities of shy and quiet animals. Motion-sensitive camera-traps are an important visual sensor for wildlife that can record animal appearance without disturbance [18, 21]. Due to their relatively low cost, rapid deployment, and easy maintenance, camera-traps are now being used extensively in wildlife monitoring [21], with the potential to be deployed at large scales in space and time.
Figure 1. A camera-trap image sequence of a white-nosed coati moving past the camera-trap. Triggered by motion, the camera-trap captured a sequence of images at a low frame rate (one frame per second, in this case for 10 seconds).
From camera-trap images, we can extract a rich set of information about animal appearance, biometric features, species, behaviors, and resource selection, as well as important environmental features about the surrounding habitats [18, 19].

3. Collaborative Wildlife Monitoring at Large Scales with Citizen Scientists
Many conservation, ecological, and environmental research questions require us to collect wildlife data at large geographic and time scales. The geographic scale of ecological phenomena is often far beyond the capability of any single study to tackle, although numerous studies have documented these phenomena at selected sites [28]. Currently, there is a lack of tools and infrastructure to support large-scale wildlife data collection, analysis, and management. From the data collection perspective, camera-traps are commercially available at relatively low cost, rapidly deployable, and easy to use, and are therefore suitable for large-scale deployment and data collection [29]. However, deploying a large number of camera-traps at broad spatiotemporal scales poses a significant logistical challenge. The citizen science approach offers a crowd-sourcing solution to this challenge. Citizen science, also known as crowd-sourced science, is scientific research conducted, in whole or in part, by amateur or nonprofessional scientists. It involves public participation in scientific research and has proven successful in weather monitoring, ecological observation, and birdwatching [28].

To support this new age of collaborative wildlife monitoring and research with citizen scientists, there are three critical issues that need to be carefully addressed. First, image and sensor data samples collected from collaborative wildlife monitoring networks at large spatiotemporal scales, together with existing legacy datasets, are often massive, far exceeding the data processing capabilities of humans. Currently, most image/video samples collected from camera-traps are manually processed by wildlife researchers, which is very labor-intensive. Therefore, it is imperative to develop advanced computation and informatics tools to automatically process these massive data samples, extract important parameters and metadata, and convert them into compact, searchable wildlife activity records in a database. Second, camera and sensor data collected by different individuals and groups of citizen scientists are often represented in various forms and isolated from each other. The lack of common image processing and informatics tools to produce scientific data at broad geographical and time scales significantly limits our capability to understand the dynamics of wildlife systems and to address important ecological and environmental issues. Third, we need to develop a cyber-infrastructure to archive and manage the massive camera-trap data. Specifically, the infrastructure needs to provide interfaces and tools for citizen scientists to process, analyze, and upload the camera-trap data and associated metadata. It needs to provide tools for experts to review the data samples uploaded by citizen scientists. It also needs to establish an online community to educate, train, support, motivate, and engage citizen scientists for sustainable camera-trap data collection.

4. eMammal Cyber-Infrastructure to Support Collaborative Wildlife Monitoring
eMammal is a biological informatics cyber-infrastructure which brings together citizen scientists and wildlife professionals to collect, analyze, and manage massive camera-trap data [28]. Figure 2 shows the basic framework and software architecture of eMammal.

Figure 2. The basic framework of our eMammal cyber-infrastructure.

During the past two years, we have developed a core set of web and informatics tools that operate in a scalable and sustainable way [28]. The Picture Ingest Tool (PIT) is a desktop app that runs on the volunteer's computer to analyze the camera-trap images and extract metadata. The Visual Informatics Tools inside the PIT are a set of automated image analysis and computer vision tools that detect, segment, and recognize animals from the camera-trap images and extract related animal biometric features, such as animal speed, body size, and group size. The automated analysis results are then reviewed and further annotated by citizen scientists.
The camera-trap images, along with the metadata, are then uploaded temporarily onto the Amazon Cloud. Our ecology experts then use an online Expert Review Tool (ERT) to review the annotation results of volunteers. Once data quality has been confirmed by experts, the images and metadata are archived long-term in the Smithsonian's Digital Repository. Our eMammal infrastructure has been demonstrated to be effective and critical to our success in classifying 2.6 million images in the past two years [28].

In eMammal, as illustrated in Figure 3, we follow a three-step process for camera-trap image annotation: computer recommendation, volunteer annotation, and expert review. First, our automated computer vision tools analyze the camera-trap samples, extract visual features of the animal and environment, and generate computer recommendations, for example, the top three most likely species IDs of the animal in the camera-trap sample. The volunteer verifies these computer recommendations, chooses the correct result, and provides any needed changes or annotations. The camera-trap samples annotated by volunteers are further reviewed by our experts to ensure the correctness of the final data product for scientific research. In our eMammal design, we aim to tightly couple the capability of automated tools in handling massive data, the collaborative efforts of citizen scientists in annotating large-scale image datasets, and expert knowledge in making correct decisions, to achieve the eMammal system design goals.

Figure 3. Overview of the eMammal work flow for camera-trap sample data analysis: camera-trap samples pass through computer recommendation (fine-grain animal recognition), volunteer annotation, and expert review (Expert Review Tools with quality control and volunteer reliability rating) to produce the final data product.

5. Visual Informatics Tools for Automated Content Analysis of Camera-trap Data
5.1. Overview
Figure 4 shows the overall framework of the eMammal image analysis for camera-trap data. It consists of multiple layers of image processing. At the object layer, we detect animals from the camera-trap images and segment the animal body from the background. At the feature layer, we extract appearance, motion, and biometric features, such as body size, moving speed, entry angle, and group size. Here, the body size is a useful indicator of the age of the animal. The entry angle tells us which direction the animal is coming from, providing critical information for estimating the animal population density in space. The group size represents the number of animals in the group. At the pattern layer, we need to recognize animal species. For a subset of species with unique pelage patterns, such as spots or stripes, individual animals can be recognized from the camera-trap images. These metadata, including the animal location and bounding box information extracted at the object layer, the appearance, motion, and biometric features extracted at the feature layer, and the animal species ID information extracted at the pattern layer, along with other environmental information, such as sample time, GPS location, and habitat type, are indexed and archived in a database for retrieval and analysis by biologists. Table 1 summarizes the set of features extracted from the camera-trap images and their biological use.

5.2. Related Work in Image Processing
Automated content analysis of camera-trap data falls in the research domain of image processing, computer vision, and machine learning.
Figure 4. The basic framework of automated processing of camera-trap images: the data layer (videos and sensor data) feeds the object layer (animal detection and tracking), the feature layer (appearance and motion features, biometric features, environmental features), the pattern layer (species and individual ID recognition), and finally the analysis layer (data summarization and database management).
Table 1. Major animal features and their biological use.
Animal species: required for all analyses; would be checked by a human.
Individual ID (when possible): used for mark/recapture estimates of animal density.
Individual sex (when possible): males and females often use habitat differently.
Group size: counting the number of animals that walk past a camera together.
Body size: helps in identifying species and classifying age categories.
Maximum and average moving speed; turns, time in the view: useful for animal movement models.
Distance and entry angle of first detection: to estimate the amount of area surveyed by the camera and motion sensor.
Habitat data: natural resources and habitat information in the study region from GIS.
Weather data: how rainfall, temperature, and other weather conditions affect behavior.
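To make the notion of a compact, searchable wildlife activity record concrete, the sketch below shows one possible way to represent the per-detection metadata summarized in Table 1. The field names and types are illustrative assumptions, not the actual eMammal schema.

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, Tuple

@dataclass
class DetectionRecord:
    """One hypothetical camera-trap detection event and its extracted metadata."""
    sequence_id: str                          # camera-trap image sequence this event came from
    species: str                              # species label (pattern layer)
    timestamp: datetime                       # trigger time of the sequence
    gps: Tuple[float, float]                  # (latitude, longitude) of the camera-trap
    bounding_box: Tuple[int, int, int, int]   # (x, y, width, height) from the object layer
    group_size: int = 1                       # number of animals passing together
    body_size_cm: Optional[float] = None      # estimated body size (age indicator)
    mean_speed_mps: Optional[float] = None    # average moving speed (movement models)
    entry_angle_deg: Optional[float] = None   # entry angle of first detection
    individual_id: Optional[str] = None       # individual ID when pelage patterns allow it
    sex: Optional[str] = None                 # individual sex when determinable
    habitat: Optional[str] = None             # habitat type from GIS layers
    weather: dict = field(default_factory=dict)  # e.g., rainfall, temperature

# Example record as it might be indexed for later queries by biologists.
rec = DetectionRecord(
    sequence_id="seq-0001", species="white-nosed coati",
    timestamp=datetime(2014, 6, 12, 14, 30), gps=(9.15, -79.85),
    bounding_box=(220, 140, 180, 120), group_size=2,
)
print(rec.species, rec.group_size)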
During the past decades, a wide variety of methods have been developed for background modeling, object detection, object classification, and verification [40, 51, 53]. They have found successful applications in surveillance, biomedical imaging, and bioinformatics. However, these existing methods cannot be directly and efficiently applied to camera-trap images. Figure 5 shows several examples of typical images captured by camera-traps. Compared to other images, camera-trap data from heavily wooded natural environments have their own characteristics: they are highly cluttered, have low contrast, and exhibit dramatic background motion. Animal detection needs to handle a wide variety of animal body sizes, appearances, and poses. Animal species classification needs to handle a large number of target classes, large intra-class variations caused by different animal poses, body orientations, and image noise, and strong inter-class ambiguity between closely related species.

In the literature, a few interesting methods have been developed for animal tracking and appearance modeling. For example, Burghardt and Calic from the University of Bristol developed an algorithm to detect a lion's face and track its locomotive behavior [30]. Ramanan et al. from the University of California, Berkeley tracked the body parts of zebras, tigers, and giraffes using temporal texture coherence and appearance models [31]. Pattern recognition and photo-identification methods have been developed that use photographs taken in the field to identify individual animals with unique surface patterns [32, 33, 34].
Figure 5. Examples of camera-trap images showing animals of different sizes with varying light conditions and background clutter.
These studies demonstrated the technical feasibility and great potential of automated image processing in wildlife research. However, an integrated suite of methods and tools for automated content analysis of camera-trap data is not yet available to support large-scale collaborative wildlife monitoring and research. The tools described in [35] and [36] manage camera-trap photos and data from multiple surveys, and Wildlife@Home describes a method for analyzing avian nest videos using crowd sourcing [37]. These systems do not provide methods and tools for automated image processing and are limited in the types and volume of images they support.

5.3. Animal Detection and Segmentation from Camera-Trap Images
Unlike many other image processing and vision analysis tasks, detecting and segmenting animals from camera-trap images is very challenging, since natural scenes in the wild are often highly cluttered due to heavy vegetation and highly dynamic due to waving trees, moving shadows, and sun spots. Our work [40] represents one of the very first approaches that can successfully work with real-world images in the wild. The basic flow of the proposed ensemble video object cut method is illustrated in Figure 6. We first scan the image sequence, perform an initial background-foreground image patch classification, and construct bag-of-words (BoW) background models with Histogram of Oriented Gradients (HOG) features. This BoW model is able to capture the background motion and texture dynamics. To segment the foreground object from the background, for each image patch we construct features to describe its texture and neighborhood image characteristics. Based on the BoW background models, we analyze its temporal salience. We also compare the image patch to its neighborhood patches to form a spatial salience measure. Based on these spatiotemporal salience analysis results, we construct the foreground salience graph. We then apply the graph-cut energy minimization method to obtain the foreground segmentation. The background-foreground classification results of neighboring frames are fused together to update the weights of the foreground salience graph. Shape prior information is extracted from the detected foreground objects and used as constraints to guide the graph-cut energy minimization procedure. This classification-fusion-refinement procedure is performed in an iterative manner to achieve the final video object segmentation results.
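The full ensemble video object cut algorithm is not reproduced here, but the following simplified sketch conveys the flavor of the pipeline on a camera-trap burst: a statistical background model provides a rough foreground estimate, which is then refined by a graph-cut-based segmenter (OpenCV's GrabCut is used as a stand-in for the foreground salience graph and energy minimization described above). Parameter values are illustrative, and this is not the implementation evaluated in [40, 41].

import cv2
import numpy as np

def segment_burst(frame_paths):
    """Rough animal masks for a motion-triggered image burst (illustrative only)."""
    frames = [cv2.imread(p) for p in frame_paths]
    bg = cv2.createBackgroundSubtractorMOG2(history=len(frames), detectShadows=True)
    # First pass: let the background model absorb the waving-vegetation dynamics.
    for f in frames:
        bg.apply(f)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    masks = []
    for f in frames:
        raw = bg.apply(f, learningRate=0)          # 0 / 127 / 255 = background / shadow / foreground
        raw = cv2.morphologyEx(raw, cv2.MORPH_OPEN, kernel)
        # Seed GrabCut: probable foreground where the subtractor fired, probable background elsewhere.
        gc_mask = np.full(raw.shape, cv2.GC_PR_BGD, np.uint8)
        gc_mask[raw == 255] = cv2.GC_PR_FGD
        if np.count_nonzero(raw == 255) > 500:     # skip frames with no plausible animal
            bgd = np.zeros((1, 65), np.float64)
            fgd = np.zeros((1, 65), np.float64)
            cv2.grabCut(f, gc_mask, None, bgd, fgd, 3, cv2.GC_INIT_WITH_MASK)
        fg = np.isin(gc_mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8) * 255
        masks.append(fg)
    return masks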
Our extensive experimental results and performance comparisons over a diverse set of challenging videos with dynamic scenes [41] demonstrate that our method outperforms various state-of-the-art algorithms by a significant margin. For example, we have achieved an average precision of 95.34%, much higher than the second best result of 83.26% [41]. This is because our method is able to effectively capture and model the highly dynamic background motion and to accurately locate the object boundary by sharing and fusing foreground-background classification information between frames. Figure 7 shows how our ensemble video cut method refines the segmentation results in an iterative manner by sharing and fusing foreground-background classification information between frames. Figure 8 shows a screen snapshot of the eMammal client software distributed to citizen scientists, which is able to detect and locate animals using the above method.

5.4. Animal Species Recognition from Camera-Trap Images
One of the greatest modern challenges to ecologists is to characterize and understand the dynamic relationship between the local abundance of a given animal species and the corresponding environmental variables. During our Phase 1 field studies of eMammal, we found that animal species recognition is the most time-consuming task for citizen scientists. Detailed species classification and accurate animal identification for supporting scientific research requires comprehensive knowledge of wildlife species taxonomy [28, 42].
Figure 6. Overview of the ensemble video object cut algorithm [40]. Bag-of-words background models and background estimation feed temporal and spatial salience analysis, which together build the foreground salience graph; graph cut then separates background patches from foreground objects, whose shape priors are fed back into the process.
Figure 7. The background-foreground (animal) classification information is fused across frames to refine the segmentation results in a collaborative and iterative manner.
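As a simplified illustration of this cross-frame fusion idea, the snippet below blends each frame's foreground probability map with a burst-wide consensus; it sketches the principle only and is not the weight-update rule of [40].

import numpy as np

def fuse_masks(prob_maps, prior_weight=0.5):
    """Blend each frame's foreground probability map with the burst-wide consensus.

    prob_maps -- list of HxW arrays in [0, 1], one per frame of the burst
    Returns refined per-frame probability maps of the same shape.
    """
    stack = np.stack(prob_maps)          # (T, H, W)
    consensus = stack.mean(axis=0)       # evidence shared across the whole burst
    refined = (1 - prior_weight) * stack + prior_weight * consensus
    return [np.clip(m, 0.0, 1.0) for m in refined]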
Figure 8. A screen shot of eMammal client software for camera-trap image analysis with automated animal detection. The detected animal is labelled with a bounding box.
This requires us to develop advanced visual informatics tools to assist citizen scientists in highly efficient and accurate animal species classification.

We use the ensemble video object cut method of Section 5.3 to detect and segment the animal from the background. This produces a bounding box around the animal. Figure 9 shows some samples of these animal image patches. We can see that they are very challenging for animal species recognition. The animal detection method, although achieving state-of-the-art performance, still has difficulty accurately extracting the animal body from the background with a correct and tight bounding box. This is because the camera-trap images are highly dynamic and cluttered. We can also see that animals in the images exhibit a wide range of poses with significant intra-class variations in their appearance. The major challenge is that we need to use these detected image patches for animal species recognition. In the following, we present two major methods for animal species recognition.

(A) Animal Species Classification Using a Multi-Class SVM
The work by Yu et al. [43] singles out the animal species classification task from animal detection and assumes that accurate and tight bounding boxes around the foreground animals are available. Their method uses advanced image features based on Scale Invariant Feature Transform (SIFT) descriptors, cell-structured Local Binary Patterns (cLBP), dictionary learning, max pooling, and linear spatial pyramid matching with sparse coding [44]. It uses linear multi-class SVMs (Support Vector Machines) for animal species classification. On camera-trap datasets with about 18-25 animal species, it achieved an average accuracy of 82%. This is believed to represent the best level of performance that can be achieved by state-of-the-art vision methods. Considering the challenging nature of camera-trap images, this performance is encouraging.

(B) Animal Species Recognition Using a Deep Convolutional Neural Network
The second work, by Chen et al. [45], considers direct animal species classification using the image patches generated by the animal detection method. Clearly, this task is much more challenging, and the classification algorithm needs to tolerate inaccurate bounding boxes of animals. Their work compares two image classification methods: (1) a conventional multi-class SVM with HOG (histogram of oriented gradients) and bag-of-words (BOW) models, and (2) a deep convolutional neural network (DCNN) method. These two image classification methods both have their own advantages and disadvantages.
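As a point of reference for the HOG-plus-linear-SVM baseline referred to above and as method (1) below, a minimal sketch might look as follows; the dictionary-learning, sparse-coding, and spatial-pyramid stages of [43] are omitted, and all parameters are placeholders rather than the settings used in either study.

import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def hog_feature(patch, size=(128, 128)):
    """Resize a grayscale animal patch and describe it with a HOG vector."""
    patch = resize(patch, size, anti_aliasing=True)
    return hog(patch, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

def train_species_svm(patches, labels):
    """Train a linear multi-class SVM on HOG features of detected animal patches."""
    X = np.array([hog_feature(p) for p in patches])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)
    clf = LinearSVC(C=1.0, max_iter=5000).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))
    return clf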
Figure 9. Samples of animal image patches extracted by the animal detection module with the ensemble video object cut method; these patches are very challenging for the subsequent species recognition.
Figure 10. The structure of the deep convolutional neural network for animal species classification: a 128 × 128 input patch passes through three stages of 9 × 9 convolution (32 kernels each) with 2 × 2 max pooling, followed by a fully connected layer and a 20-way softmax output.
Method (1) is simple and quite robust to deformation and part clipping, but it achieves only suboptimal results. The DCNN-based image classification method can achieve superior performance over most state-of-the-art image classification algorithms [53], but it requires a large amount of labeled training data, even when data augmentation techniques are applied.

The animal species classification method in [45] uses a DCNN with three convolutional layers and three max pooling layers, as illustrated in Figure 10. Each convolutional layer uses kernels of size 9 × 9, and each pooling layer uses a 2 × 2 kernel. The input image patch is normalized to 128 × 128 pixels. The first convolutional layer applies 2-D convolution to the 128 × 128 input, producing 120 × 120 output maps; since there are 32 kernels in this layer, there are 32 output maps. After 2 × 2 max pooling, the output of the first stage is 32 maps of size 60 × 60, which are the inputs to the second convolutional layer. Similarly, the second convolutional layer produces 32 maps of size 52 × 52, which max pooling reduces to 32 maps of size 26 × 26. At the third stage, after convolution and max pooling, the output is 32 maps of size 9 × 9, which are flattened into a 2592-dimensional vector. After that, a fully connected layer and a softmax layer are used. The softmax layer has 20 neurons, and the maximum output among these 20 neurons determines the label of the input image. A data augmentation process [30] is also used during the training stage.

The experiment has 20 species to be classified: Agouti, Collared Peccary, Paca, Red Brocket Deer, White-nosed Coati, Spiny Rat, Ocelot, Red Squirrel, Common Opossum, Bird spec, Great Tinamou, White-tailed Deer, Mouflon, Red Deer, Roe Deer, Wild Boar, Red Fox, European Hare, Wood Mouse, and Coiban Agouti. The training and testing images are randomly sampled from the total collection of images, which includes color, gray-scale, and infrared images with resolutions ranging from 320 × 240 to 1024 × 768. Each image contains only one animal of the aforementioned 20 species. Table 2 presents the species classification accuracy of the conventional BOW+SVM approach and the DCNN approach. The overall species recognition accuracy of the BOW is 33.507%, and the overall species recognition accuracy of the DCNN is 38.315%. Note that the learning capacity of the DCNN is very high, and therefore the performance of the DCNN can be further improved if more training data are made available.
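A modern rendering of the architecture just described might look like the following sketch (written in PyTorch purely for illustration; the single-channel input and ReLU activations are assumptions, and this is not the authors' implementation). The last lines also show how patch-level softmax outputs could be pooled into the sequence-level top-three recommendation discussed in Section 6.

import torch
import torch.nn as nn

class SpeciesCNN(nn.Module):
    """Three stages of (9x9 conv + 2x2 max pool), then a fully connected 20-way softmax, as in Figure 10."""
    def __init__(self, num_classes=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=9), nn.ReLU(), nn.MaxPool2d(2),   # 128 -> 120 -> 60
            nn.Conv2d(32, 32, kernel_size=9), nn.ReLU(), nn.MaxPool2d(2),  # 60 -> 52 -> 26
            nn.Conv2d(32, 32, kernel_size=9), nn.ReLU(), nn.MaxPool2d(2),  # 26 -> 18 -> 9
        )
        self.classifier = nn.Linear(32 * 9 * 9, num_classes)  # 2592-dimensional vector -> 20 classes

    def forward(self, x):                      # x: (batch, 1, 128, 128)
        x = self.features(x).flatten(1)
        return self.classifier(x)              # logits; apply softmax for probabilities

model = SpeciesCNN()
patches = torch.randn(7, 1, 128, 128)          # e.g., 7 patches of the same animal in one burst
probs = torch.softmax(model(patches), dim=1)
# Sequence-level recommendation: average the patch probabilities, then take the top 3 species.
top3 = torch.topk(probs.mean(dim=0), k=3).indices
print(top3)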
Table 2. Direct animal species recognition with detected animal bounding boxes (per-species accuracy, BOW / DCNN).
Agouti: 0.041 / 0.13
Collared Peccary: 0.108 / 0.122
Paca: 0.298 / 0.187
Red Brocket Deer: 0.01 / 0.02
White-nosed Coati: 0.333 / 0.243
Spiny Rat: 0.146 / 0.05
Ocelot: 0.398 / 0.224
Red Squirrel: 0.028 / 0.038
Common Opossum: 0.296 / 0.147
Bird spec: 0.011 / 0.001
Great Tinamou: 0.397 / 0.298
White-tailed Deer: 0.647 / 0.71
Mouflon: 0.746 / 0.82
Red Deer: 0.038 / 0.046
Roe Deer: 0.246 / 0.171
Wild Boar: 0.001 / 0.001
Red Fox: 0.143 / 0.02
European Hare: 0.746 / 0.873
Wood Mouse: 0.055 / 0.045
Coiban Agouti: 0.69 / 0.5
From this comparison, we can see that the proposed DCNN outperforms the traditional BOW+SVM method. It can also be seen that the current recognition accuracy of joint animal detection and recognition is still very low, especially for animal species with high ambiguity levels.

6. Future Work for Automated Camera-Trap Image Analysis
Automated animal detection, localization, and species recognition lie at the heart of automated camera-trap image analysis. From the above sections, we can see that there is still a significant gap between the current performance of animal species recognition and the performance goal for effective field use by citizen scientists. For animal species recognition to be effective in practice, it needs to achieve an accuracy of over 97% for the top-three recommendation: for a given camera-trap image sequence of one animal, when the system suggests the top three candidates for the animal species, the probability of the correct result being among these three should be larger than 0.97. Currently, the gap between the current performance level and this target is still tremendous.

Before we proceed to discuss various approaches and future tasks to improve the species classification accuracy, we would like to mention two points. (1) The animal species recognition accuracy reported in Section 5.4 is for each individual image patch. A camera-trap sample typically has 10-20 images and will typically contain multiple (e.g., 5-7) image patches of the same animal. We can perform species classification on all of these image patches and use voting to determine the species ID for the animal. This voting procedure will significantly improve the overall accuracy. (2) In field use of camera-trap image analysis and animal species recognition, the recognition module can suggest the top three candidates for the citizen scientists to choose from; otherwise, they need to manually choose the animal species ID from a long list of species names. The top-three hit ratio will be much higher than the original recognition accuracy for each individual image patch. With these two options, it becomes possible to approach the target 97% accuracy level for top-three recommendations by exploring further advanced machine learning and image classification methods. More specifically, within the context of camera-trap image analysis by citizen scientists, the following methods can be explored.

(1) Using the environmental features as context. Each camera-trap image sample is associated with specific environmental features, such as habitat type, time, season, and weather.
As we know, a specific set of environmental features corresponds to a subset of animal species. This dependence can be modeled with conditional probabilities obtained from previous wildlife camera-trap survey data. We expect that these conditional probabilities, once incorporated into our hierarchical animal species classifiers, will significantly reduce the classification errors.

(2) Using the wildlife taxonomy to construct fine-grain hierarchical animal species classification. Animals are organized in a taxonomic hierarchy based on Order, Suborder, Family, Subfamily, Genus, Species, and Subspecies [42]. One can train and learn a hierarchy of animal species classifiers based on this animal taxonomy tree. More specifically, with the environmental features as context, each node of the tree is associated with a conditional probability indicating the likelihood of this animal species appearing in this environment.

(3) Continuous online training and active learning of animal classifiers. As more citizen scientists are recruited and more concurrent camera-traps are deployed, eMammal will accumulate a vast amount of labeled animal image samples. This provides sufficient data for training the animal classifiers. Currently, data-driven approaches prevail in the computer vision research community [51]. Researchers recognize that a sufficient amount of high-quality labeled training data is the key to high-performance machine learning and classification. One can utilize the massive and growing set of expert-reviewed image samples to develop effective model boosting and adaptation methods for continuous online training and active learning of the animal classifiers [52]. As more data labels are confirmed by expert reviews, the incremental active learning will further improve the performance of our fine-grain animal species classification.

(4) Tracking and improving the eMammal performance of citizen scientists. We have observed that some volunteers, due to lack of knowledge and insufficient training, will make incorrect selections of computer-generated labels and wrong annotations of camera-trap image samples, especially for closely related and visually similar animal species. In addition, during the annotation process, a volunteer may get tired, lose his or her enthusiasm, or get distracted, affecting the volunteer's decisions and annotation performance. We rely on a large group (hundreds) of broadly recruited citizen scientists to remotely collect and annotate camera-trap images.
Effectively monitoring, tracking, and improving their performance becomes critical for the sustainable success of our eMammal cyber-infrastructure. Therefore, we need to develop a set of tools (1) to identify volunteers with insufficient training; (2) to evaluate and rate the annotation reliability of each volunteer; and (3) to report abnormal behaviors of volunteers during annotation.

(5) Transferring expert knowledge to citizen scientists and visual informatics tools. We envision that citizen scientists can learn online from experts to improve their knowledge and capabilities beyond their initial training. The key is to solicit and record the expert knowledge and transfer it to volunteers, and experts are often glad to share what they know. In eMammal, we can develop a friendly interface that lets the expert record his or her knowledge on difficult problems during the review process. For example, when an expert finds that volunteers and/or the computer module have difficulty in classifying two animal species, he or she can mark (with a bounding box) the salient image region that is important for discriminating these two species, and can also type additional text to describe the decision. On the volunteer side, the next time this scenario arises, the Picture Ingest Tool can display a tip to the volunteer explaining how the expert made the decision for these types of animal species, helping the volunteer make the right decision when annotating images. These types of expert tips are also very helpful for improving the performance of the visual informatics tools, especially the animal detector and species classifiers. More specifically, the locality information of the expert tip tells our machine learning module where to look and which subset of local features should be used when detecting or classifying the target species. To this end, we will investigate a local decision model adaptation approach [Cao10] for animal classification with expert tips.

(6) Visual informatics tools for supporting fast and efficient expert review. With tens of thousands of images being uploaded each week by volunteers, the primary bottleneck of our current eMammal system is the expert review of these images. A relatively small number of experts need to review all the camera-trap annotation results submitted by hundreds of citizen scientists. Therefore, we need to develop effective visual informatics tools to support fast and efficient expert review of massive annotation results. In addition, we will study how the expert knowledge can be effectively transferred to the computer module and to volunteers to improve the image annotation performance.

7. Concluding Remarks
In this paper, we have introduced the research work on collaborative wildlife monitoring and tracking at large geographical and time scales with citizen scientists using camera-traps.
To support collaborative wildlife monitoring and research, we need to develop integrated camera-sensor networking systems, deploy them at large scales, and develop advanced computational and informatics tools to analyze and manage the massive wildlife monitoring data. We have explained the key research questions and the current research status of automated camera-trap image analysis for animal detection, tracking, and species recognition. We have also discussed how advanced machine learning and image analysis methods can be explored to further improve the current system performance and achieve successful deployment in field practice.

Zhihai He (S'98-M'01-SM'06-F'15) received the B.S. degree from Beijing Normal University, Beijing, China, and the M.S. degree from the Institute of Computational Mathematics, Chinese Academy of Sciences, Beijing, China, in 1994 and 1997, respectively, both in mathematics, and the Ph.D. degree in electrical engineering from the University of California, Santa Barbara, CA, in 2001. In 2001, he joined Sarnoff Corporation, Princeton, NJ, as a Member of Technical Staff. In 2003, he joined the Department of Electrical and Computer Engineering, University of Missouri, Columbia, as an assistant professor. His current research interests include image/video processing and compression, network transmission, wireless communication, computer vision analysis, sensor networks, and embedded system design. He received the 2002 IEEE Transactions on Circuits and Systems for Video Technology Best Paper Award and the SPIE VCIP Young Investigator Award in 2004. He has served as an Associate Editor for the IEEE Transactions on Circuits and Systems for Video Technology, the IEEE Transactions on Multimedia, and the Journal of Visual Communication and Image Representation, and as a guest editor for the IEEE TCSVT Special Issue on Video Surveillance. He was the Co-Chair of the 2007 International Symposium on Multimedia over Wireless in Hawaii. He is a member of the Visual Signal Processing and Communication Technical Committee of the IEEE Circuits and Systems Society, and serves as a Technical Program Committee member or session chair for a number of international conferences.

Roland Kays received his B.S. degree in biology from Cornell University in 1993 and his Ph.D. degree in zoology from the University of Tennessee in 1993. He is a zoologist with a broad interest in mammal ecology, evolution, and conservation. He is a Research Associate Professor at North Carolina State University and the Director of the Biodiversity Lab at the NC Museum of Natural Sciences. He is an expert
in using new technologies to study free-ranging animals, especially to track their movement with telemetry, GPS, and remote camera-traps.

Zhi Zhang (S'13) is a Ph.D. student in electrical and computer engineering at the University of Missouri-Columbia. Prior to beginning the Ph.D. program, he received the B.S. degree in electronic and information technology from Beijing Jiaotong University, Beijing, China, and the M.S. degree in electrical and computer engineering from the University of Missouri-Columbia, in 2012 and 2014, respectively. His current research interests include multimedia content retrieval, object recognition/classification, wild animal segmentation, and person re-identification.

Guanghan Ning (S'13) received the B.S. degree in electrical engineering from Beijing Jiaotong University, Beijing, China, in 2012 and the M.S. degree in electrical engineering from the University of Missouri in 2014. He is currently working toward the Ph.D. degree in electrical and computer engineering at the University of Missouri. His main research interests include computer vision and machine learning. His current research focuses primarily on natural scene text detection, scene classification, and image retrieval. He is a student member of the IEEE.

Chen Huang received his B.S. degree in electrical and computer engineering from Beihang University, China, in 2010. He is now pursuing a Ph.D. degree in electrical and computer engineering at the University of Missouri-Columbia. His research interests include contour-based object detection, fine-grained image classification, and convolutional neural networks.

Tony X. Han (M'01) received the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign, Urbana, in 2007. He is currently an Associate Professor of electrical and computer engineering with the University of Missouri, Columbia. His specialties lie in computer vision and machine learning, with emphasis on human/object detection, large-scale image retrieval, object tracking, action recognition, video analysis, and biometrics. Dr. Han was a recipient of a CSE fellowship. His research team was a joint winner of the action recognition task in the worldwide grand challenge PASCAL 2010. The human detector developed by
his group was ranked second in the worldwide grand challenge PASCAL 2009. His research team, together with a joint UIUC team, also won first place in the Facial Expression Recognition and Analysis Challenge (FERA) in 2011.

Joshua Millspaugh received his B.S. degree in forest biology from the State University of New York in Syracuse, New York, an M.S. degree in wildlife ecology from South Dakota State University, and a Ph.D. in wildlife ecology from the University of Washington in 1999. Joshua is currently Professor and O'Connor Distinguished Professor of Wildlife Management and the Interim Director of the School of Natural Resources at the University of Missouri. Joshua is a Fellow of The Wildlife Society, and in 2013 he was the inaugural recipient of the Southeastern Athletic Conference Faculty Achievement Award at the University of Missouri, which "honors professors with outstanding records in teaching and scholarship who serve as role models for other faculty and students." His current research interests include the design and analysis of wildlife radio-tracking studies, vertebrate population ecology, and the application of sensor technology in ecology.

Tavis Forrester received his B.S. degree from Oregon State University in 1999 and his Ph.D. degree in ecology from the University of California, Davis in 2014. Since November 2012, he has been a conservation biologist at the Smithsonian Institution. His research areas include conservation biology, citizen science, and community ecology.

William McShea is a wildlife ecologist for the Smithsonian Institution based within the Conservation Biology Institute at Front Royal, VA. He has been at SCBI since 1986, after receiving his B.S. at Bucknell University, M.S. at the University of New Hampshire, and Ph.D. at the State University of New York at Binghamton. He is an expert on large mammal ecology with a primary emphasis on deer species, including white-tailed deer in eastern oak forests. He is co-chairman of the IUCN Deer Specialist Group as well as a member of the bear and Asian bovid specialist groups. He also studies large mammals in China and Southeast Asia along with their habitats.

References
[1] J. Millspaugh and J. Marzluff, Radio Tracking and Animal Populations. New York: Academic Press, 2001.
[2] I. R. Swingland and P. J. Greenwood, The Ecology of Animal Movement. Oxford, UK: Oxford Univ. Press, 1983.
[3] R. Kays and K. M. Slauson, "Remote cameras," in Noninvasive Survey Methods for North American Carnivores, R. A. Long, P. MacKay, J. Ray, and W. Zielinski, Eds. 2008, pp. 110-140.
[4] L. Markovchick-Nicholls, H. M. Regan, D. H. Deutschman, A. Widyanata, B. Martin, L. Noreke, and T. A. Hunt, "Relationships between human disturbance and wildlife land use in urban habitat fragments," Conserv. Biol., vol. 22, no. 1, pp. 99-109, 2008.
[5] C. T. Darimont, S. M. Carlson, M. T. Kinnison, P. C. Paquet, T. E. Reimchen, and C. C. Wilmers, "Human predators outpace other agents of trait change in the wild," Proc. Natl. Acad. Sci., vol. 106, pp. 952-954, 2009.
[6] W. D. Hamilton and R. M. May, "Dispersal in stable habitats," Nature, vol. 269, pp. 578-581, 1977.
[7] G. C. White and R. A. Garrott, Analysis of Wildlife Radio-Tracking Data. Elsevier, 2012.
[8] L. D. Mech, A Handbook of Animal Radio-Tracking. Univ. Minnesota Press, 1983.
[9] M. S. Coyne and B. J. Godley, "Satellite Tracking and Analysis Tool (STAT): An integrated system for archiving, analyzing and mapping animal tracking data," Mar. Ecol. Prog. Ser., vol. 301, pp. 1-7, 2005.
[10] J. Young and T. Morgan, Animal Tracking Basics. Stackpole Books, 2007.
[11] I. A. R. Hulbert and J. French, "The accuracy of GPS for wildlife telemetry and habitat mapping," J. Appl. Ecol., vol. 38, no. 4, pp. 869-878, Aug. 2001.
[12] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless sensor networks: A survey," Comput. Netw., vol. 38, no. 4, pp. 393-422, 2002.
[13] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, "Wireless sensor networks for habitat monitoring," in Proc. 1st ACM Int. Workshop on Wireless Sensor Networks and Applications, 2002, pp. 88-97.
[14] B. Kranstauber, A. Cameron, R. Weinzerl, T. Fountain, S. Tilak, M. Wikelski, and R. Kays, "The Movebank data model for animal tracking," Environ. Modelling Softw., vol. 26, no. 6, pp. 834-835, 2011.
[15] P. Juang, H. Oki, Y. Wang, M. Martonosi, L. S. Peh, and D. Rubenstein, "Energy-efficient computing for wildlife tracking: Design tradeoffs and early experiences with ZebraNet," ACM SIGPLAN Not., vol. 37, no. 10, pp. 96-107, 2002.
[16] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, "Habitat monitoring with sensor networks," Commun. ACM, vol. 47, no. 6, pp. 34-40, 2004.
[17] Z. J. Haas and T. Small, "A new networking model for biological applications of ad hoc sensor networks," IEEE/ACM Trans. Netw., vol. 14, no. 1, pp. 27-40, 2006.
[18] J. Beringer, J. J. Millspaugh, J. Sartwell, and R. Woeck, "Real-time video recording of food selection by captive white-tailed deer," Wildl. Soc. Bull., vol. 32, no. 3, pp. 648-654, 2004.
[19] R. Kays, B. Kranstauber, P. Jansen, C. Carbone, M. Rowcliffe, T. Fountain, and S. Tilak, "Camera traps as sensor networks for monitoring animal communities," in Proc. 34th IEEE Conf. Local Computer Networks, Oct. 2009, pp. 811-818.
[20] A. B. Cooper and J. J. Millspaugh, "Accounting for variation in resource availability and animal behavior in resource selection studies," in Radio Tracking and Animal Populations. Academic Press, 2001, pp. 243-274.
[21] R. Kays, S. Tilak, B. Kranstauber, P. A. Jansen, C. Carbone, M. Rowcliffe, and Z. He, "Monitoring wild animal communities with arrays of motion sensitive camera traps," Int. J. Res. Rev. Wireless Sens. Netw., vol. 1, pp. 19-29, 2011.
[22] E. Mendoza, P. R. Martineau, E. Brenner, and R. Dirzo, "A novel method to improve individual animal identification based on camera-trapping data," J. Wildl. Manage., vol. 75, no. 4, pp. 973-979, 2011.
[23] G. J. Marshall, "Crittercam: An animal-borne imaging and data logging system," Mar. Technol. Soc. J., vol. 32, no. 1, pp. 11-17, 1998.
[24] G. Marshall, M. Bakhtiari, M. Shepard, I. Tweedy, D. Rasch, K. Abernathy, B. Joliff, J. C. Carrier, and M. R. Heithaus, "An advanced solid-state animal-borne video and environmental data-logging device for marine research," Mar. Technol. Soc. J., vol. 41, no. 2, pp. 31-38, 2007.
[25] Z. He, W. Cheng, and X. Chen, "Extending the operational lifetime of portable video communication devices using power-rate-distortion optimization," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 5, pp. 596-608, May 2008.
[26] Z. He, J. Eggert, W. Cheng, X. Zhao, J. Millspaugh, R. Moll, J. Beringer, and J. Sartwell, "Energy-aware portable video communication system design for wildlife activity monitoring," IEEE Circuits Syst. Mag., vol. 8, no. 2, pp. 25-37, 2008.
[27] N. J. Silvy, The Wildlife Techniques Manual. Johns Hopkins Univ. Press, 2012.
[28] R. Kays, R. Costello, W. McShea, T. Forrester, M. Baker, A. Parsons, R. Montgomery, L. Kalies, and J. J. Millspaugh, "eMammal - citizen science camera trapping as a solution for broad-scale, long-term monitoring of wildlife populations," in Proc. North American Conservation Biology, Missouri, July 2014, pp. 80-86.
[29] Reconyx PC85 RapidFire Professional Color IR camera system webpage. [Online]. Available: http://www.reconyx.com/page.php?id=56
[30] T. Burghardt and J. Calic, "Analysing animal behaviour in wildlife videos using face detection and tracking," IEE Proc. Vis. Image Signal Process., vol. 153, no. 3, pp. 305-312, June 2006.
[31] D. Ramanan, D. A. Forsyth, and K. Barnard, "Building models of animals from video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 8, pp. 1319-1334, Aug. 2006.
[32] L. Gamble, S. Ravela, and K. McGarigal, "Multi-scale features for identifying individuals in large biological databases: An application of pattern recognition technology to the marbled salamander Ambystoma opacum," J. Appl. Ecol., vol. 45, no. 1, pp. 170-180, 2008.
[33] M. Lahiri, C. Tantipathananandh, R. Warungu, D. I. Rubenstein, and T. Y. Berger-Wolf, "Biometric animal databases from field photographs: Identification of individual zebra in the wild," in Proc. 1st ACM Int. Conf. Multimedia Retrieval (ICMR'11), Apr. 2011, pp. 6-10.
[34] L. Hiby, P. Lovell, N. Patil, N. S. Kumar, A. M. Gopalaswamy, and K. U. Karanth, "A tiger cannot change its stripes: Using a three-dimensional model to match images of living tigers and tiger skins," Biol. Lett., vol. 5, no. 3, pp. 383-386, June 2009.
[35] M. Tobler. CameraBase software webpage. [Online]. Available: http://www.atrium-biodiversity.org/tools/camerabase/
[36] H. Grant, R. Thompson, J. L. Childs, and J. G. Sanderson, "Automatic storage and analysis of camera trap data," Bull. Ecol. Soc. Amer., vol. 91, no. 3, pp. 352-360, 2010.
[37] T. Desell, R. Bergman, K. Goehner, R. Marsh, R. Vander-Clute, and S. Ellis-Felege, "Wildlife@Home: Combining crowd sourcing and volunteer computing to analyze avian nesting video," in Proc. 9th IEEE Int. Conf. eScience, Oct. 2013, pp. 107-115.
[38] S. R. Palumbi, "Humans as the world's greatest evolutionary force," Science, vol. 293, no. 5536, pp. 1786-1790, 2001.
[39] S. J. Riley, D. J. Decker, J. W. Enck, P. D. Curtis, D. Paul, T. Lauber, and T. L. Brown, "Deer populations up, hunter populations down: Implications of interdependence of deer and hunter population dynamics on management," Ecoscience, vol. 10, no. 4, pp. 455-461, 2003.
[40] X. Ren, T. Han, and Z. He, "Ensemble video object cuts in highly dynamic scenes," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), June 2013, pp. 1947-1954.
[41] N. Goyette, P. M. Jodoin, F. Porikli, J. Konrad, and P. Ishwar, "Changedetection.net: A new change detection benchmark dataset," in Proc. IEEE Computer Vision and Pattern Recognition Workshops (CVPRW), 2012, pp. 1-8.
[42] D. E. Wilson and D. Reeder, Mammal Species of the World, 3rd ed. Baltimore: Johns Hopkins Univ. Press, 2005.
[43] X. Yu, J. Wang, R. Kays, P. Jansen, T. Wang, and T. Huang, "Automated identification of animal species in camera trap images," EURASIP J. Image Video Process., vol. 1, pp. 1-10, 2013.
[44] P. Khorrami, J. Wang, and T. Huang, "Multiple animal species detection using robust principal component analysis and large displacement optical flow," in Proc. ICPR Workshop on Visual Observation and Analysis of Animal and Insect Behavior (VAIB), Tsukuba, Japan, Nov. 2012, pp. 32-35.
[45] G. Chen, T. X. Han, T. Forrester, R. Kays, and Z. He, "Deep convolutional neural network based species recognition for wild animal monitoring," in Proc. IEEE Int. Conf. Image Processing (ICIP), Sept. 2014, pp. 858-862.
[46] L. Markovchick-Nicholls, H. M. Regan, D. H. Deutschman, A. Widyanata, B. Martin, L. Noreke, and T. A. Hunt, "Relationships between human disturbance and wildlife land use in urban habitat fragments," Conserv. Biol., vol. 22, no. 1, pp. 99-109, 2008.
[47] N. M. Adimey, K. Abernathy, J. C. Gaspard III, and G. J. Marshall, "Meeting the manatee challenge: The feasibility of using CRITTERCAM on wild manatees," Mar. Technol. Soc. J., vol. 41, no. 4, pp. 14-17, 2007.
[48] P. J. Ponganis, R. P. van Dam, G. J. Marshall, T. Knower, and D. H. Levenson, "Sub-ice foraging behavior of emperor penguins," J. Exp. Biol., vol. 203, pp. 3275-3278, 2000.
[49] B. L. Sullivan, C. L. Wood, M. J. Iliff, R. E. Bonney, D. Fink, and S. Kelling, "eBird: A citizen-based bird observation network in the biological sciences," Biol. Conserv., vol. 142, no. 10, pp. 2282-2292, 2009.
[50] W. M. Hochachka, D. Fink, R. A. Hutchinson, D. Sheldon, W. K. Wong, and S. Kelling, "Spatiotemporal exploratory models for broad-scale survey data," Ecol. Appl., vol. 20, no. 8, pp. 2131-2147, 2010.
[51] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and F.-F. Li, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), June 2009, pp. 248-255.
[52] S. Hanneke, "A bound on the label complexity of agnostic active learning," in Proc. Int. Conf. Machine Learning (ICML), June 2007, pp. 353-360.
[53] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Advances in Neural Information Processing Systems (NIPS), 2012, pp. 1097-1105.