2018 21st International Conference on Information Fusion (FUSION)

Ship Classification using Deep Learning Techniques for Maritime Target Tracking

Maxime Leclerc (Thales Research and Technology, Quebec, QC, Canada, [email protected])
Ratnasingham Tharmarasa (TrackGen Solutions Inc., Mississauga, ON, Canada, [email protected])
Mihai Cristian Florea (Thales Research and Technology, Quebec, QC, Canada, [email protected])
Anne-Claire Boury-Brisset (Defence Research and Development Canada, Quebec, QC, Canada, [email protected])
Thiagalingam Kirubarajan (TrackGen Solutions Inc., Mississauga, ON, Canada, [email protected])
Nicolas Duclos-Hindié (Thales Research and Technology, Quebec, QC, Canada, [email protected])

Abstract—In the last five years, the state of the art in computer vision has improved greatly thanks to the increased use of deep convolutional neural networks (CNNs), advances in graphical processing unit (GPU) acceleration and the availability of large labelled datasets such as ImageNet. Obtaining datasets as comprehensively labelled as ImageNet for ship classification remains a challenge. As a result, we experiment with pre-trained CNNs based on the Inception and ResNet architectures to perform ship classification. Instead of training a CNN using random parameter initialization, we use transfer learning: we fine-tune pre-trained CNNs to perform maritime vessel image classification on a limited ship image dataset. We achieve a significant improvement in classification accuracy compared to the previous state-of-the-art results for the Maritime Vessel (MARVEL) dataset.

Index Terms—Maritime Domain Awareness (MDA), Ship Classification, Deep Learning, Intelligence, Surveillance and Reconnaissance (ISR), Automated Target Recognition and Identification (R&I)

I. INTRODUCTION

Over the last decades, an important effort has been deployed to improve the security and safety associated with the maritime domain. Technology advances in the Intelligence, Surveillance and Reconnaissance (ISR) domain have given rise to the development of more sophisticated sensor systems that are used to collect increasing volumes of heterogeneous data. A maritime patrol aircraft used for ISR operations generally carries a wide variety of sensors, including Electro-Optical/Infrared (EO/IR) cameras, radar and Electronic Support Measures (ESM). These sensors are used to capture environmental, individual and conventional signatures, and generate large volumes of collected data. Coping with this deluge of information has become one of the main issues today. The biggest challenge we face is to figure out how to make sense of it in a timely manner, by decreasing human interaction as much as possible and taking advantage of semi-autonomous and autonomous systems. Therefore, there is a need for novel architectures and fusion algorithms to better exploit available information sources for target recognition and identification, and ultimately provide


tactical operators and intelligence analysts with higher-quality fused information.

The objective of the current work is to obtain a high accuracy of vessel/ship classification based on a priori information about the ships (stored in databases), and also based on on-the-fly sensed data from multiple multimodal sensors. Due to the heterogeneous/multimodal nature of the sensed data (visual images or videos, kinematics, etc.), classification algorithms can be developed for specific types of data. Neural network approaches have been used for ship classification over the last decades using specific data such as Inverse Synthetic Aperture Radar (ISAR) images [1], Forward Looking Infrared (FLIR) images [2], or other imagery [3]. These conventional neural network techniques for image classification used hand-crafted features for specific tasks, which was both time-consuming and error-prone.

A big challenge in the development of target (vessel/ship) classification algorithms was the lack of processing power and the lack of real data (large volumes of labelled training data obtained in conditions/environments similar to the validation/test data). In recent years, the availability of larger datasets and higher processing power has led to the development of improved vision-based classification approaches, leveraging convolutional neural networks (CNNs) and deep learning techniques that do not use hand-crafted descriptors [4], [5].

Deep learning methods, also referred to as deep neural networks, are artificial neural networks with multiple hidden layers (as many as 150) between the input and output layers. Instead of hand-crafted features, the descriptors are learned from large collections of raw images using a neural network. As a result, deep learning models are able to learn a hierarchy of features that are invariant to geometric transformations from raw input images. Deep learning is an "end-to-end learning" technique, where raw data is fed to a classification algorithm which learns how to do the classification task automatically.

Training a CNN using random parameter initialization with a small target dataset usually results in lower performance



[6]. In such situations, transfer learning techniques have been developed to exploit specific smaller target datasets [7]. Transfer learning provides an effective way to train a large network without over-fitting: in a first step, the network is trained on a large number of images spanning a large number of classes. ImageNet is widely used as the source dataset for such problems. In a second step, transfer learning for CNNs usually involves replacing the last fully connected layer(s) of a network pre-trained on the large source dataset with new fully connected layer(s). This step is followed by a new training phase that uses the smaller target dataset.

Our main contribution is that we transfer the convolutional layers of CNN architectures for fast training and testing of ship images with increased classification accuracy. Previous solutions used shallower CNNs [8], resulting in lower accuracy. We demonstrate that the proposed method outperforms the state-of-the-art CNN-based method in ship image classification with limited data.

This paper is structured as follows. In Section II we introduce related work. Section III is an introduction to image classification and to the different CNN architectures that will be compared in this work for our ship classification problem. Section IV describes the experiments and compares the results obtained with the different architectures. We conclude in Section V and present some directions for future work.

II. RELATED WORK

A. Target Classification

In addition to obtaining track estimates, it is desirable, wherever possible, to identify and classify the targets and predict their intent. State estimation and classification go hand-in-hand in maritime surveillance (or any other surveillance problem), where classification results can play a significant role in the countermeasures against identified targets. This becomes even more crucial in view of the large surveillance area within which targets need to be tracked and classified in maritime or ground surveillance problems.

In [9], Bayesian target classification methods using radar and electronic support measure (ESM) data were considered. The limitation of this approach is that joint tracking and classification becomes computationally more challenging, and the performance degrades with an increasing number of targets and target classes. In [10], the application of a machine-learning approach based on the Partial Rules from Decision Trees (PART) algorithm to classify Air Traffic Control (ATC) trajectory segments (or modes of flight) from radar detections was addressed. In [11], target recognition is considered as a decision problem based on a number of features such as mean intensity, area of target and polarimetric power ratio. In [12], the application of a new joint decision and estimation (JDE) framework to a joint target tracking and classification problem is presented, but the limitation is that this method is not applicable to multitarget scenarios. In [13], an algorithm for the identification of air and sea targets in coastal radars, based on a maximum likelihood frequency classification algorithm followed by an artificial neural network, is presented. In [14], target

classification based on the Belief Function Theory was addressed. In [15], a system called the Intelligent Surface Threat Identification System (ISTIS), which improves the surface threat identification process, quality and efficiency using track data and multisensor feature information based on a multiple hypothesis maintenance and reasoning approach, was presented. In [16], a new joint target tracking and classification technique based on Observable Operator Models (OOM) was considered, but the major limitation of this approach was scalability to large-scale problems. In [17], a novel feature extraction approach based on low-rank matrix decomposition of acoustic signals was proposed for robust classification and identification of moving target vehicles; however, multiple target tracking and data association issues were not addressed in that paper. In [18], a new algorithm based on multiple Hidden Markov Models that can use kinematic and feature information for automatic classification, even when the number of target classes is unknown, was proposed. In [19], a systematic Bayesian approach for integrating classification information into association-based multitarget tracking algorithms was proposed.

Video data has been used extensively in the detection and tracking of maritime targets. A comprehensive review of work on this topic can be found in [20]. In [21], a systematic approach for video acquisition, vessel detection and activity labeling using standard and relational supervised machine learning for ontology-based vessel activity annotation was proposed. In [22], new algorithms for video-based maritime detection, tracking and classification from rapidly moving (e.g., airborne or spaceborne) platforms were proposed based on traditional image processing, segmentation and horizon detection techniques. In [23], a new ship detection algorithm that is robust against varying lighting conditions and sun reflections was proposed based on blob analysis techniques. In [24], various Bayesian, fuzzy and learning-based methods are presented in detail for background modeling and foreground object detection for maritime and other video surveillance problems. In [25], a new GrabCut algorithm for improved background modeling and ship detection was proposed. In [26], a framework for detecting and tracking ships based on classical image processing techniques, such as horizon detection and edge detection, followed by Kalman filtering, was proposed. In [27], traditional image processing approaches based on blob analysis, anomaly detection and constant-false-alarm-rate processing are used to mitigate the effects of wakes, glint and other background characteristics in the automatic detection and tracking of maritime objects from airborne video.

B. CNN Classification

Convolutional Neural Networks (CNNs) are a class of deep neural networks that has successfully been applied to image classification. Since 2012, CNNs have been used to win every round of the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), a yearly image recognition competition where the state-of-the-art classifiers compete. The performance of



Fig. 1. Transfer learning for ship classification and tracking in five steps

the winning CNN models has increased every year [28]. In general, deeper and wider networks can detect superior visual features. These techniques have also been applied in recent years to ship classification from SAR and optical images [29], [30], [8]. The results published so far were either on smaller datasets or obtained lower accuracies than what we achieved.

CNNs and transfer learning have already been used for a variety of computer vision tasks such as image classification, scene recognition, fine-grained recognition, attribute detection and image retrieval [31], [6], [32]. It was shown that the combination of CNNs with transfer learning, even when using a distant source task, achieves better performance than the alternative approach where the neural network weights are initialized with random values.

III. IMAGE CLASSIFICATION TECHNIQUES APPLIED TO SHIP CLASSIFICATION

From an ISR operation point of view, images of ships to be used in a classification algorithm could be obtained from airborne platforms using EO/IR cameras such as the MX-20, with a specific range of acquisition angles. There exists no large-scale labelled ship image dataset comparable to ImageNet, since data acquisition is difficult and obtaining high-quality annotated images is expensive. Consequently, we use a selected ship image dataset to develop and experiment with deep learning classification techniques, such as CNN training jointly with transfer learning. In this paper, we evaluate three different pre-trained CNN architectures with various depths: AlexNet, ResNet and Inception (Section III-D).

A. Ships Image Dataset

One of the main issues in realizing this study was the need for a large, comprehensively annotated, publicly available database of ship images. However, several relatively smaller ship databases are available, such as:
• MARVEL dataset [8] – publicly available large dataset of labelled visible images of ships (around 140,000 ships extracted from 2,000,000 images)
• VAIS dataset [33] – unregistered thermal and visible images of ships (around 3,000 images)
• IHS dataset [34] – subscription-based commercial ship registry information and imagery (190,000 merchant ships and 12,000 warships)
• FleetMon dataset [35] – subscription-based vessel photo database (around 440,000 images)
• MarineTraffic dataset [36] – subscription-based vessel photo database with associated fees (around 2,600,000 images)
• VesselFinder dataset [37] – subscription-based ship photos and vessel gallery (around 300,000 images)

Because of the limitations of the available ship datasets, we will consider a CNN deep learning classification method jointly with the transfer learning approach described in [7], instead of training a deep network from scratch.

B. Transfer Learning

We first use the large, openly available ImageNet dataset (Section III-C) as the source for training. This first step provides us with a neural network that has learned to do general image classification. The first CNN layers react to the different oriented edges and lines in the image, middle layers react to parts of an object such as corners and surface boundaries, while high layers react to larger object parts and even complete objects. As a result, the last layers of CNNs tend to be more specific to the datasets and tasks performed. As a second step, we learn to classify target maritime vessel images by fine-tuning this pre-trained network. This second phase of training is much faster than the first step and requires far fewer images. The proposed design for ship classification is described in Figure 1 and contains five steps.

In this paper, we use the pre-trained CNN architectures (Section III-D) since they provide among the best accuracies with the ImageNet dataset. These networks contain the following types of layers: convolutions, rectified linear unit activations, average-pooling, max-pooling, dropout, fully connected and softmax activation. To integrate the transfer learning into the pre-trained CNN architectures, we remove the last fully-connected layer from each pre-trained model and add a new fully connected output layer. This output layer corresponds to



O = softmax(W · X + B)    (1)

where X is the output of the second-to-last layer of the pre-trained model, W and B are the parameters to be trained, and softmax is the non-linear activation function that computes the probability predicted for each label. The size of the last layer equals the number of ship classes.
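The paper does not publish its training code; as a rough sketch of this head replacement (assuming PyTorch and torchvision, whose model zoo provides ImageNet pre-trained networks; the variable names and the choice of ResNet-34 here are ours):

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_SHIP_CLASSES = 26  # the MARVEL superclasses

# Load a ResNet-34 pre-trained on ImageNet (the transfer learning source task).
model = models.resnet34(pretrained=True)

# Replace the last fully connected layer: its input size is kept,
# but the output now has one unit per ship class, as in Eq. (1).
model.fc = nn.Linear(model.fc.in_features, NUM_SHIP_CLASSES)

# During fine-tuning, the softmax of Eq. (1) is folded into the loss:
criterion = nn.CrossEntropyLoss()  # applies log-softmax internally

# Example forward pass on a dummy batch of 256x256 RGB ship images
# (ResNet's adaptive pooling accepts this MARVEL-sized input).
x = torch.randn(8, 3, 256, 256)
probs = torch.softmax(model(x), dim=1)  # per-class probabilities, Eq. (1)
```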

C. ImageNet Dataset

ImageNet is a large-scale labelled image dataset based on the WordNet hierarchy [38]. It was constructed in part by collecting image labels with Amazon Mechanical Turk. ImageNet contains 1.2 million images split into 1,000 classes, including, for example, goldfish, koala, golden retriever, fly, airship, coffee mug, honeycomb and passenger car. The dataset has a rich variety of images and includes occluded, partial and small objects.

Fig. 2. Sample images from the ImageNet dataset [38]
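When fine-tuning models pre-trained on ImageNet, inputs are conventionally resized and normalized with the ImageNet channel statistics. The paper does not specify its preprocessing, so the following torchvision pipeline is only an assumed example:

```python
from torchvision import transforms

# Standard preprocessing for an ImageNet pre-trained model (our assumption).
# The mean/std values are the usual ImageNet channel statistics.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # crop size conventionally used for ResNet/AlexNet
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```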

D. CNN Architectures

A notable CNN architecture is AlexNet [39], the deep convolutional neural network proposed by Krizhevsky et al. This CNN was the best-performing method on the ImageNet 2012 classification task. AlexNet was composed of five convolutional layers, three fully connected layers and a final softmax activation. Krizhevsky et al. exploited GPU acceleration to train the CNN. AlexNet was 10.9% more accurate than the next best algorithm.

CNN accuracy can be increased by adding more layers. However, these additional layers make it harder to train CNNs [40]. A breakthrough was made when Residual Networks (ResNets) were introduced. Residual training allowed researchers to train very deep networks, up to 152 layers deep. ResNets won first place in the 2015 ImageNet classification task. The Inception architecture [41] introduced parallel network connections, increasing classification accuracy. These parallel connections have the effect of processing the image at multiple scales. GPU acceleration was once again used to speed up Inception training.
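As a minimal sketch of the residual idea (our own simplification, not the exact block of [40]): each block learns a residual mapping and adds its input back through a skip connection, which is what makes very deep networks trainable.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(F(x) + x).

    The skip connection lets gradients bypass the convolutions,
    which is what allows networks with 100+ layers to be trained.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return self.relu(residual + x)  # skip connection
```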

E. CNN Hyper-parameters

Hyper-parameters configure CNN models. These hyper-parameters are not estimated from the data during gradient descent training. They are instead fixed manually before training begins, and must be tuned using rules of thumb, prior experience and trial and error.

Number of Layers. As mentioned above, we can often increase CNN accuracy by adding more layers. Again, deeper networks make the training process harder. AlexNet has eight layers. ResNets can have between 18 and 152 layers. Inception-v1 has 22 layers while Inception-v3 has 48 layers. Since there is no guarantee that the deepest network will have the best performance for our task, we test the accuracy of different network depths.

L2 Regularization. To get the best accuracy, we need to balance model complexity. The parameter used to manage model complexity is the regularization strength: increasing regularization decreases the model complexity. The best-performing model should be of intermediate complexity, not too simple, not too complex. As a result, we test various regularization values to achieve the best model complexity and validation accuracy.

Learning Rates. Learning rates affect how the error rate, or loss, evolves over time during neural network training. Low learning rates reduce the error rate slowly; since they converge slowly, they may not achieve the highest accuracy. High learning rates can achieve even lower accuracy than low learning rates: they rapidly decrease the error rate when training starts, but quickly get stuck at higher error rates. Finding the ideal learning rate can be challenging. A common strategy is to start with a moderate learning rate and then gradually reduce that rate over time, in order to avoid getting stuck at suboptimal accuracies. This is the strategy that we use in this paper. The learning rate schedule is defined in Section IV-C.

F. Improve Tracker Estimate with Target Classification

Assume that each target belongs to one of s known classes c ∈ {1, 2, ..., s} and that the kinematic behavior of each target class is characterized by a set S_c of linear dynamical models with the Markov property and corresponding transition matrix P_c. The i-th dynamical model M_c^i ∈ S_c, i = 1, ..., r(c), can then be represented by

x(k) = F(M_c^i) x(k−1) + G(M_c^i) u(k, M_c^i) + v(k, M_c^i)    (2)

where x(k) ≜ [x(k) ẋ(k) ẍ(k) y(k) ẏ(k) ÿ(k)]′ is the target kinematic state at time k and v(k, M_c^i) is white independent identically distributed (iid) Gaussian process noise with zero mean and covariance Q(M_c^i). Then the measurement equation can be written as

z(k) = H x(k) + w(k)    (3)

where w(k) is assumed to be iid Gaussian noise with zero mean and covariance R.
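The paper leaves the specific model set unspecified; as a hedged illustration of Eqs. (2) and (3), a constant-acceleration model with position-only measurements could look as follows (the sampling interval T, the noise levels and the zero control input are our assumptions):

```python
import numpy as np

T = 1.0  # sampling interval in seconds (our illustrative choice)

# One possible linear dynamical model M_c^i for Eq. (2): a constant-
# acceleration model applied per axis to the state
# x(k) = [x, x_dot, x_ddot, y, y_dot, y_ddot]'.
F_axis = np.array([[1.0, T, T**2 / 2],
                   [0.0, 1.0, T],
                   [0.0, 0.0, 1.0]])
F = np.kron(np.eye(2), F_axis)  # block-diagonal over the x and y axes

# Measurement matrix H for Eq. (3), assuming position-only measurements.
H = np.zeros((2, 6))
H[0, 0] = 1.0  # x position
H[1, 3] = 1.0  # y position

# One prediction/measurement step with zero control input u(k).
x = np.array([0.0, 10.0, 0.5, 0.0, -5.0, 0.2])                    # example state
v = np.random.multivariate_normal(np.zeros(6), 0.01 * np.eye(6))  # process noise
w = np.random.multivariate_normal(np.zeros(2), np.eye(2))         # measurement noise
x_next = F @ x + v   # Eq. (2) with G(.)u(.) = 0
z = H @ x_next + w   # Eq. (3)
```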



Here it is assumed that there are s Interacting Multiple Model (IMM) estimators [42], corresponding to the s target classes, set up for each target. The state estimate x̂_i(k|k), the associated covariance matrix P_i(k|k), the posterior probability μ_i^j(k) of mode j being correct given the measurements up to time k−1, and the measurement likelihood function Λ_i^j(k) conditioned on mode j, obtained by each IMM estimator i = 1, ..., s, are provided to the subsequent recognition and fusion steps.

Given the measurement sequence Z^k = {z(0), ..., z(k)} up to time k, the posterior probability of the target being of class i, denoted P(c = i | Z^k), is given in the recursive Bayesian framework by

P(c = i | Z^k) = P(c = i | z(k), Z^{k−1})
               = P(z(k) | c = i, Z^{k−1}) P(c = i | Z^{k−1}) / Σ_{j=1}^{s} P(z(k) | c = j, Z^{k−1}) P(c = j | Z^{k−1})    (4)

where

P(z(k) | c = i, Z^{k−1}) = Σ_{j=1}^{r(i)} P(z(k) | u(k) = M_i^j, c = i, Z^{k−1}) × P(u(k) = M_i^j | c = i, Z^{k−1})
                         = Σ_{j=1}^{r(i)} μ_i^j(k) Λ_i^j(k)    (5)

Here, μ_i^j(k) is the posterior probability of mode j being correct given the measurements up to time k−1, and Λ_i^j(k) is the likelihood function conditioned on mode j of class i at time k. Finally, we have

P(c = i | Z^k) = Σ_{j=1}^{r(i)} μ_i^j(k) Λ_i^j(k) P(c = i | Z^{k−1}) / Σ_{j=1}^{s} Σ_{m=1}^{r(j)} μ_j^m(k) Λ_j^m(k) P(c = j | Z^{k−1})    (6)

and

P(c = i | Z^0) = P_0(c = i)    (7)

where P_0(c = i) is the prior probability that the target belongs to class i.
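A small numerical sketch of the recursion in Eqs. (4)-(7) (our own illustration; mu[i][j] and lam[i][j] stand for the μ_i^j(k) and Λ_i^j(k) produced by the s IMM estimators):

```python
import numpy as np

def update_class_posteriors(prior, mu, lam):
    """One step of the recursion in Eqs. (4)-(7).

    prior : array of shape (s,), P(c = i | Z^{k-1}); initialized to P0 (Eq. 7).
    mu    : list of s arrays, mu[i][j] = posterior probability of mode j of
            class i given measurements up to time k-1.
    lam   : list of s arrays, lam[i][j] = likelihood of mode j of class i at k.
    Returns P(c = i | Z^k) as in Eq. (6).
    """
    # Eq. (5): per-class measurement likelihood from the IMM mode outputs.
    class_likelihood = np.array(
        [np.dot(mu[i], lam[i]) for i in range(len(prior))]
    )
    # Eqs. (4)/(6): Bayes update and normalization over classes.
    unnormalized = class_likelihood * prior
    return unnormalized / unnormalized.sum()

# Example with s = 2 classes, r(1) = 2 and r(2) = 3 motion models.
p0 = np.array([0.5, 0.5])  # Eq. (7): uniform prior
mu = [np.array([0.7, 0.3]), np.array([0.2, 0.5, 0.3])]
lam = [np.array([0.9, 0.1]), np.array([0.4, 0.4, 0.2])]
print(update_class_posteriors(p0, mu, lam))
```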

IV. EXPERIMENTS AND RESULTS

In this section, we present the experimental results for our CNN image classifiers. We detail the procedure and rationale behind the experiments performed for this paper. We use the MARVEL dataset for our experiments.

A. MARVEL Ship Image Dataset

We selected the publicly available MARVEL dataset [8] for this work since it satisfies most of our requirements. The dataset is a collection of 140,000 unique labelled maritime vessel images, each of size 256 by 256 pixels. Figure 3 shows a sample of the MARVEL ship database. In [8], the authors grouped all the vessels into 26 superclasses (Table I). Each of these classes has 8,192 training images and 1,024 test images. As a result, the training and test sets contain 212,992 and 26,624 images, respectively. Different image croppings are used for classes with an insufficient number of images. The training and test sets do not overlap.

B. Vessel Taxonomy

The StatCode 5 ship type coding system is a standard method of describing ship types [43]. It makes it easier to see a logical breakdown and allocation of ships between groups. Table I shows a comparison between the MARVEL superclasses and the StatCode 5 standard proposed by IHS Markit. It can be noticed that all superclasses from the MARVEL dataset are part of the StatCode 5 standard, except three (marked N/A in Table I).

C. Experimental Parameters

We use a single NVIDIA GeForce GTX 1080 Ti GPU with 11 GB of GPU memory for training, together with an Intel Core i7-8700K processor and 32 GB of RAM. We use mini-batch gradient descent with momentum and a cross-entropy loss. Our hyper-parameters are as follows: the mini-batch size is 128, the momentum is 0.9 and the number of epochs is 300. Previous benchmarks have documented the number of images per second that different architectures can process during training [44]: 142 images/second for Inception-v3, 218 images/second for ResNet-50, 91 images/second for ResNet-152 and 2,890 images/second for AlexNet.
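A minimal sketch of this training configuration (assuming PyTorch; the random tensors below are only stand-ins for the MARVEL images and labels):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Hypothetical stand-in for the MARVEL training set: in practice this would
# be a Dataset yielding (256x256 RGB image, superclass label) pairs.
images = torch.randn(256, 3, 256, 256)
labels = torch.randint(0, 26, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=128, shuffle=True)

model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 26)

# Mini-batch gradient descent with momentum and cross-entropy loss;
# the L2 regularization of Section III-E maps to SGD weight decay.
optimizer = torch.optim.SGD(model.parameters(), lr=0.2,
                            momentum=0.9, weight_decay=0.0005)
criterion = nn.CrossEntropyLoss()

for epoch in range(300):  # 300 epochs, as in the paper's setup
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```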


TABLE I. VESSEL TAXONOMIES: COMPARISON OF MARVEL SHIP SUPERCLASSES [8] AND IHS MARKIT STATCODE 5 CLASSES [43]

No. | MARVEL Superclass | StatCode 5 Class | Level
1 | Container Ship | Container Ship | 4
2 | Bulk Carrier | Bulk Carrier | 4
3 | Passengers Ship | Passenger Ship | 4
4 | Ro-ro/passenger Ship | Passenger/Ro-Ro Cargo Ship | 4
5 | Ro-ro Cargo | Ro-Ro Cargo Ship | 4
6 | Tug | Tug | 4
7 | Vehicles Carrier | Vehicles Carrier | 4
8 | Reefer | Refrigerated Cargo Ship | 3
9 | Yacht | Leisure Vessels | 4
10 | Sailing Vessel | Sailing Vessel | 4
11 | Heavy Load Carrier | Heavy Load Carrier | 4
12 | Wood Chips Carrier | Wood Chips Carrier | 4
13 | Livestock Carrier | Livestock Carrier | 4
14 | Fire Fighting Vessel | Fire Fighting Vessel | 4
15 | Patrol Vessel | Patrol Vessel | 4
16 | Platform | Platform Supply Ship | 4
17 | Standby Safety Vessel | Standby Safety Vessel | 4
18 | Combat Vessel | N/A | N/A
19 | Training Ship | Training Ship | 3
20 | Icebreaker | Icebreaker | 3
21 | Replenishment Vessel | N/A | N/A
22 | Tankers | Tankers | 2
23 | Fishing Vessels | Fishing Vessel | 4
24 | Supply Vessels | Offshore Supply | 3
25 | Carrier/Floating | N/A | N/A
26 | Dredger | Dredging | 3


Fig. 3. Sample images from the Maritime Vessel dataset [8]

AlexNet Architecture: We include the results from previous experiments with AlexNet [8].

Inception Architecture: For the Inception architecture, two different configurations have been investigated for the ship classification problem: 22 layers (v1) and 48 layers (v3).

ResNet Architecture: We tested what happens when we increase the number of ResNet layers from 18, to 34, to 50, and to 152.

L2 Regularization: We varied the amount of L2 regularization to see the effect on test error. Table II presents the regularization values that were tested. We compared the accuracy for regularization values in the 0.0005 to 0.005 interval.

Learning Rates: We test various learning rates using the general strategy presented above: start with a moderate learning rate, then gradually reduce that rate over time.

TABLE II. VALIDATION ACCURACY RESULTS

Architecture | Layers (version) | Learning rate schedule | L2 reg. | Valid. acc.
AlexNet | 8 | N/A | N/A | 73.14% [8]
Inception | 22 (v1) | [0.2]*30 + [0.1] | 0.0005 | 71.89%
Inception | 22 (v1) | [0.4]*20 + [0.2]*20 + [0.1]*30 + [0.01]*20 + [0.001]*30 + [0.0001]*30 + [0.00001] | 0.0005 | 69.94%
Inception | 48 (v3) | [0.2]*10 + [0.1]*10 + [0.05]*30 + [0.005]*20 + [0.0005]*30 + [0.00005] | 0.0005 | 78.73%
ResNet | 18 | [0.2]*10 + [0.1] | 0.0005 | 74.23%
ResNet | 18 | [0.2]*30 + [0.1] | 0.0005 | 74.10%
ResNet | 18 | [0.4]*20 + [0.2]*20 + [0.1]*30 + [0.01]*20 + [0.001]*30 + [0.0001]*30 + [0.00001] | 0.0005 | 74.10%
ResNet | 34 | [0.2]*10 + [0.1]*10 + [0.05]*30 + [0.005]*20 + [0.0005]*30 + [0.00005] | 0.0005 | 75.84%
ResNet | 34 | [0.4]*30 + [0.2]*30 + [0.1]*30 + [0.01]*30 + [0.001]*30 + [0.0001]*30 + [0.00001] | 0.001 | 72.94%
ResNet | 34 | [0.4]*30 + [0.2]*30 + [0.1]*30 + [0.01]*30 + [0.001]*30 + [0.0001]*30 + [0.00001] | 0.002 | 72.92%
ResNet | 34 | [0.4]*30 + [0.2]*30 + [0.1]*30 + [0.01]*30 + [0.001]*30 + [0.0001]*30 + [0.00001] | 0.005 | 71.42%
ResNet | 50 | [0.4]*30 + [0.2]*30 + [0.1]*30 + [0.01]*30 + [0.001]*30 + [0.0001]*30 + [0.00001] | 0.0005 | 74.38%
ResNet | 152 | [0.4]*30 + [0.2]*30 + [0.1]*30 + [0.01]*30 + [0.001]*30 + [0.0001]*30 + [0.00001] | 0.0005 | 74.67%

D. Results

Results are presented in Tables II and III. Our optimal transfer learning architecture and parameter choices achieve 78.73% accuracy, a significant improvement over the current state of the art of 73.14% [8]. Here we use the following learning rate notation: [learning rate] * number of epochs; the last learning rate applies to the remaining epochs, up to a total of 300. After comparing different choices, we found that the best accuracy is obtained with the following schedule: [0.2]*10 + [0.1]*10 + [0.05]*30 + [0.005]*20 + [0.0005]*30 + [0.00005]. We found that a regularization value of 0.0005 gave the best accuracy. We compared the performance of various versions of ResNet and Inception networks. We found that Inception-v3 performed better than both ResNets and AlexNet for maritime vessel classification. For ResNets, the best accuracy is achieved with the ResNet-34 architecture. For the Inception architecture, 48 layers (v3) performed better than 22 layers (v1).
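The bracket notation above maps directly to a per-epoch learning rate lookup; a small helper (our own, hypothetical) that expands the best-performing schedule:

```python
def expand_schedule(segments, final_lr, total_epochs=300):
    """Expand [(lr, epochs), ...] into a per-epoch learning rate list.

    The final learning rate fills the remaining epochs, matching the
    notation [lr]*epochs used in Table II with an open-ended last term.
    """
    rates = []
    for lr, epochs in segments:
        rates.extend([lr] * epochs)
    rates.extend([final_lr] * (total_epochs - len(rates)))
    return rates

# Best schedule found: [0.2]*10 + [0.1]*10 + [0.05]*30 + [0.005]*20
#                      + [0.0005]*30 + [0.00005]
lrs = expand_schedule(
    [(0.2, 10), (0.1, 10), (0.05, 30), (0.005, 20), (0.0005, 30)],
    final_lr=0.00005,
)
assert len(lrs) == 300 and lrs[0] == 0.2 and lrs[-1] == 0.00005
```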

V. CONCLUSION AND FUTURE WORKS

In this project, we exploited the existing deep convolutional neural network architectures Inception and ResNet to solve a new classification problem. We performed transfer learning experiments to find the ideal architecture, number of layers, learning rate and regularization parameters to classify maritime vessel images. We achieved state-of-the-art results on the MARVEL maritime vessel image classification dataset. This work demonstrates that features from the first layers of convolutional neural networks can be transferred to the problem of classifying maritime vessels.

Future experiments will apply a ship detector to the ship database in order to analyze the impact of the background outside the ships' bounding boxes on the classification accuracy. Additional classifiers based on ESM and kinematic measurements will also be used to improve the classification accuracy. The new ship classifier will be integrated with a software toolset for analysis, visualization, real-time object tracking and multi-sensor data fusion. Using our results, the software will be able to perform image classification and exploit the classification results to improve ship tracking.



TABLE III. NORMALIZED CONFUSION MATRIX. EACH CELL CONTAINS A PERCENTAGE OF IMAGES CLASSIFIED IN EACH CATEGORY.
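A row-normalized confusion matrix of the kind reported in Table III can be computed from test-set predictions as follows (a minimal NumPy sketch, our own):

```python
import numpy as np

def normalized_confusion_matrix(y_true, y_pred, num_classes=26):
    """Row-normalized confusion matrix: entry (i, j) is the percentage of
    class-i test images that were classified as class j."""
    cm = np.zeros((num_classes, num_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)
```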

REFERENCES

[1] M. M. Menon, E. R. Boudreau, and P. J. Kolodzy, "An Automatic Ship Classification System for ISAR Imagery," The Lincoln Laboratory Journal, vol. 6, no. 2, 1993.
[2] P. Withagen, K. Schutte, A. Vossepoel, and M. Breuers, "Automatic classification of ships from infrared (FLIR) images," in SPIE AeroSense Orlando, Signal Processing, Sensor Fusion, and Target Recognition VIII, vol. 3720, 1999.
[3] Q. Zhongliang and W. Wenjun, "Automatic ship classification by superstructure moment invariants and two-stage classifier," in Communications on the Move, Singapore ICCS/ISITA, 1992.
[4] Y. LeCun, Y. Bengio, and G. E. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015. [Online]. Available: https://doi.org/10.1038/nature14539
[5] W. Rawat and Z. Wang, "Deep convolutional neural networks for image classification: A comprehensive review," Neural Computation, vol. 29, no. 9, pp. 2352–2449, 2017.
[6] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, Quebec, Canada, December 8-13, 2014, pp. 3320–3328. [Online]. Available: http://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks
[7] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, "Learning and transferring mid-level image representations using convolutional neural networks," in 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1717–1724.

[8] E. Gundogdu, B. Solmaz, V. Yücesoy, and A. Koç, "MARVEL: A large-scale image dataset for maritime vessels," in Asian Conference on Computer Vision. Springer, Cham, 2016, pp. 165–180.
[9] S. Challa and G. Pulford, "Joint target tracking and classification using radar and ESM sensors," IEEE Transactions on Aerospace and Electronic Systems, vol. 37, no. 3, pp. 1039–1055, 2001.
[10] J. Garcia, O. Concha, J. Molina, and G. Miguel, "Trajectory classification based on machine-learning techniques over tracking data," in International Conference on Information Fusion, 2006.
[11] B. Hughes and L. Soldani, "Automatic target recognition: The problems of data separability and decision making," in IET High Resolution Imaging and Target Classification Seminar, 2006.
[12] X. R. Li and M. Yang, "Joint tracking and classification based on Bayes joint decision and estimation," in International Conference on Information Fusion, 2007.
[13] M. Malboubi, J. Akhlaghi, M. Saraf, and H. Sadeghi, "The intelligent identification of air and sea targets in coastal radars," in Third European Radar Conference, 2006.
[14] B. Ristic and P. Smets, "Target classification approach based on the belief function theory," IEEE Transactions on Aerospace and Electronic Systems, vol. 41, no. 2, pp. 574–583, 2005.
[15] R. Stottler, B. Ball, and R. Richards, "Intelligent surface threat identification system (ISTIS)," in IEEE Aerospace Conference, 2007.
[16] S. Sutharsan, A. Sinha, T. Kirubarajan, and M. Farooq, "Observable operator model based joint target tracking and classification," in Signal Processing, Sensor Fusion, and Target Recognition XV, Proc. SPIE, vol. 6235, 2006.
[17] T. Viangteeravat, A. Shirkhodaie, and H. Rababaah, "Multiple target vehicles detection and classification based on low-rank decomposition," in Automatic Target Recognition XVII, Proc. SPIE, vol. 6566, 2007.
[18] X. He, R. Tharmarasa, A.-L. Jousselme, P. Valin, and T. Kirubarajan, "Joint Class Identification and Target Classification Using Multiple Hidden Markov Models," IEEE Transactions on Aerospace and Electronic Systems, vol. 50, no. 2, pp. 1269–1282, 2014.



[19] Y. Bar-Shalom, T. Kirubarajan, and C. Gokberk, "Tracking with Classification-Aided Multiframe Data Association," IEEE Transactions on Aerospace and Electronic Systems, vol. 41, no. 3, pp. 868–878, 2005.
[20] R. Moreira, N. Ebecken, A. Alves, F. Livernet, and A. Campillo-Navetti, "A survey on video detection and tracking of maritime vessels," IJRRAS, vol. 20, no. 1, pp. 37–50, 2014.
[21] K. Gupta, D. Aha, R. Hartley, and P. Moore, "Adaptive Maritime Video Surveillance," in Proceedings of SPIE: Visual Analytics for Homeland Defense and Security, vol. 7346, 2009.
[22] S. Fefilatyev, "Algorithms for Visual Maritime Surveillance with Rapidly Moving Camera," Ph.D. dissertation, University of South Florida, 2012. [Online]. Available: http://scholarcommons.usf.edu/etd/4037
[23] J. Marques, A. Bernardino, G. Cruz, and M. Bento, "An algorithm for the detection of vessels in aerial images," in IEEE International Conference on Advanced Video and Signal Based Surveillance, 2014.
[24] T. Bouwmans, F. Porikli, B. Höferlin, and A. Vacavant, Background Modeling and Foreground Detection for Video Surveillance. Chapman and Hall/CRC, 2014.
[25] C. Xu, D. Zhang, Z. Zhang, and Z. Feng, "BgCut: Automatic Ship Detection from UAV Images," The Scientific World Journal, vol. 2014, Article ID 171978, 2014.
[26] S. Fefilatyev, "Detection of marine vehicles in images and video of open sea," Master's thesis, University of South Florida, 2008.
[27] S. Parameswaran, C. Lane, B. Bagnall, and H. Buck, "Marine Object Detection in UAV Full-Motion Video," in Airborne Intelligence, Surveillance, Reconnaissance (ISR) Systems and Applications XI, Proc. SPIE, vol. 9076, 2014.
[28] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
[29] N. Ødegaard, A. O. Knapskog, C. Cochin, and J.-C. Louvigne, "Classification of Ships Using Real and Simulated Data in a Convolutional Neural Network," in 2016 IEEE Radar Conference (RadarConf), 2016.
[30] C. Dao-Duc, H. Xiaohui, and O. Morère, "Maritime vessel images classification using deep convolutional neural networks," in Sixth International Symposium on Information and Communication Technology, 2015, pp. 276–281.
[31] A. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: an astounding baseline for recognition," in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2014, pp. 512–519.

[32] H. Azizpour, A. Razavian, J. Sullivan, A. Maki, and S. Carlsson, "From generic to specific deep representations for visual recognition," in CVPRW DeepVision Workshop, Boston, MA, USA, 2015.
[33] M. M. Zhang, J. Choi, K. Daniilidis, M. T. Wolf, and C. Kanan, "VAIS: A dataset for recognizing maritime imagery in the visible and infrared spectrums," in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, USA, 2015.
[34] IHS Markit. (2018) IHS Maritime World Register of Ships. [Online]. Available: https://ihsmarkit.com/products/maritime-world-ship-register.html
[35] FleetMon. (2018) Vessel Photo Database. [Online]. Available: https://www.fleetmon.com/community/photos
[36] MarineTraffic. (2018) Vessel photos. [Online]. Available: http://www.marinetraffic.com/en/photos/of/ships
[37] VesselFinder. (2018) User contributed ship photos and vessels gallery. [Online]. Available: https://www.vesselfinder.com/gallery
[38] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009, pp. 248–255.
[39] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in NIPS'12 Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, 2012, pp. 1097–1105.
[40] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[41] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[42] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software. Wiley, NY, 2001.
[43] IHS Markit. (2018) StatCode 5 Shiptype Coding System - A Categorisation of Ships By Type - Cargo Carrying Ships. [Online]. Available: https://cdn.ihs.com/www/pdf/Statcode-Shiptype-Coding-System.pdf
[44] Google. (2018) Benchmarks. [Online]. Available: https://www.tensorflow.org/performance/benchmarks

