Article

Multiscale Rotated Bounding Box-Based Deep Learning Method for Detecting Ship Targets in Remote Sensing Images

Shuxin Li, Zhilong Zhang *, Biao Li and Chuwei Li

ATR National Key Laboratory, National University of Defense Technology, Changsha 410073, China; [email protected] (S.L.); [email protected] (B.L.); [email protected] (C.L.)
* Correspondence: [email protected]

Received: 9 July 2018; Accepted: 14 August 2018; Published: 17 August 2018

Sensors 2018, 18, 2702; doi:10.3390/s18082702
Abstract: Since remote sensing images are captured from above the target, such as from a satellite or airborne platform, ship targets can appear at any orientation. When detecting ship targets using horizontal bounding boxes, background clutter is included in the box. This clutter makes it harder to detect the ship and find its precise location, especially when the targets are in close proximity or close to the shore. To solve these problems, this paper proposes a deep learning algorithm using multiscale rotated bounding boxes to detect ship targets in a complex background and obtain the location and orientation information of the ship. When labeling the oriented targets, we use the five-parameter method to ensure that the box shape remains rectangular. The algorithm uses a pretrained deep network to extract features and two divided flow paths to output the result. One flow path predicts the target class, while the other predicts the location and angle information. In the training stage, we match the prior multiscale rotated bounding boxes to the ground-truth bounding boxes to obtain the positive sample information and use it to train the deep learning model. When matching the rotated bounding boxes, we narrow down the selection scope to reduce the amount of calculation. In the testing stage, we use the trained model to predict and obtain the final result after comparing with the score threshold and nonmaximum suppression post-processing. Experiments conducted on a remote sensing dataset show that the algorithm is robust in detecting ship targets under complex conditions, such as wave clutter background, targets in close proximity, ships close to the shore, and multiscale varieties. Compared with other algorithms, our algorithm not only exhibits better performance in ship detection but also obtains the precise location and orientation information of the ship.

Keywords: multiscale rotated bounding box; deep learning; ship detection; remote sensing image
1. Introduction

Ships are important maritime targets and monitoring them is valuable in civil and military applications [1]. With the development of optical remote sensing, the resolution of remote sensing images has gradually increased, enabling the detection and recognition of ship targets. A conventional approach to interpreting optical remote sensing images relies on human expertise, which, however, cannot satisfy the need to process massive amounts of image data. Therefore, it is necessary to study the problems involved in ship detection.

There are three difficulties in detecting ships in optical remote sensing images. First, ships vary in appearance and size. The basic shape of a ship target is an elongated spindle, but ships of different categories have different scales, aspect ratios, and appearances. Second, different ship targets have different orientations.
As remote sensing images are captured above the target from a sky platform, ship targets can have many orientations, which makes ship detection harder. Third, there is complex background clutter around a ship target, such as wave interference, ships in close proximity, and land background when the ships are close to the shore.

There are two conventional approaches for detecting ships. The first is to detect the ship target through heuristic, artificially designed multistep methods [1–4]. Relying on the difference between the ship and the background, these algorithms analyze region geometric features, such as aspect ratio and scale, to decide which region contains the ship target after segmentation of ocean and land. However, artificially designed steps result in a lack of robustness. The other methods [5–8] rely on machine learning to train a classifier using features extracted from positive and negative samples. When testing a new image, sliding windows or regions of interest produced by a pre-detection method, such as Selective Search, are sent to the classifier to predict the class score, and a threshold is then set to obtain the final detection result. By using a large set of positive and negative samples to train the classifier, the detector becomes more robust in complex backgrounds, and such classifiers are usually used in the target verification stage [9]. However, most methods use horizontal bounding boxes to detect the target, which is not suitable for more complex conditions such as targets in close proximity and ships close to the shore.

Recently, deep learning-based methods have made significant progress in computer vision applications. Combining large-scale datasets and high-performance GPU computing hardware, deep learning methods have undergone a leap in development in areas such as target detection [10–14], target classification [15,16], and semantic segmentation [17,18]. Deep learning-based detection methods exhibit high performance in detecting common objects in everyday images. For example, the Faster R-CNN algorithm [12] adopts a two-stage pipeline to detect objects. The first stage produces regions of interest from the feature maps extracted by a pretrained network, and the second stage selects the features of the regions of interest from the shared feature maps and predicts a more precise classification and localization. The SSD algorithm [13] uses an end-to-end training network to predict the class of the target and regress the bounding box information from multiscale feature maps, which are produced by the hierarchical downsampling structure of the deep network. Combined with a data augmentation strategy, the performance of SSD is close to that of Faster R-CNN, and SSD is faster. Though exhibiting good performance, the above two algorithms use horizontal bounding boxes to detect objects, which is not suitable for rotated objects in close proximity. In the area of ship detection in remote sensing images, there are also deep learning-based methods [19–23], which use deep neural networks to extract robust features from the image. However, most of them are also based on horizontal bounding boxes.

Recently, deep learning-based methods using a rotated bounding box have been proposed to detect rotated remote sensing targets. Liu et al. proposed the DRBox algorithm [24], which combines the rotated bounding box with deep learning methods to realize rotation-invariant detection.
This algorithm can effectively detect airplane, vehicle, and ship targets in remote sensing images and output the precise location and orientation information of the target. However, the approximate calculation in matching the rotated bounding boxes makes its performance unstable. Xia et al. proposed an improved Oriented Bounding Box (OBB)-based Faster R-CNN algorithm [25] that uses eight parameters, the coordinates of the four corner points (x1, y1, x2, y2, x3, y3, x4, y4), to record the rotated bounding box on top of the Faster R-CNN algorithm. The algorithm can effectively detect oriented targets, but the predicted bounding box sometimes becomes deformed as there is no constraint to maintain the rectangular shape of the box. Figure 1 shows a comparison of targets with horizontal and rotated bounding boxes. When using horizontal bounding boxes to label targets in close proximity, it is difficult to distinguish the rotated targets, while with rotated boxes they can be clearly distinguished.
Figure 1. Comparison of target with horizontal and rotated bounding boxes. (a) Input image, (b) targets with horizontal bounding boxes, and (c) targets with rotated bounding boxes.
This paper proposes a rotated bounding box-based deep learning algorithm to detect ship targets in remote sensing images. We aim to detect rotated ships in a complex background more effectively. Based on the deep networks, we combine the rotated bounding box with the convolutional feature maps to simultaneously predict the location and orientation of the ship target. When labeling the ship targets, we add an angle parameter and use five parameters, including two center point coordinates, height, width, and angle of the box, to record a rotated box, which limits the box to a rectangular shape. In the training stage, we first set prior rotated boxes at each location of the feature maps produced by the network and match them with the ground-truth boxes to obtain the positive sample information. Through the range control strategy, we narrow down the selection scope and reduce the amount of calculation required for the matching. In the testing stage, the test image is sent to the network to predict the class, location, and orientation information of the target. By comparing the scores with the threshold and post-processing with nonmaximum suppression, we can obtain the final detection and location result. This paper has the following contributions. (1) Combining the rotated bounding boxes with the deep learning methods to simultaneously detect and locate the rotated ship targets in a complex background, (2) adding an angle parameter to the original four parameters to record a rotated bounding box that maintains the rectangular shape of the box, and (3) using a range control strategy to calculate the matching between the prior rotated boxes and ground-truth boxes to reduce the amount of calculation.

2. Method to Label the Rotated Targets
To label the rotated targets, we need to add a parameter, besides the original four parameters, to record the angle information of the rotated target. The original labeling method records the coordinates of the top-left and bottom-right corners to uniquely determine a horizontal bounding box. In deep learning object detection methods, the parameters are transformed as (xc, yc, w, h), where (xc, yc) is the coordinate of the center point, w is the width of the box, and h is the height of the box. These four parameters better describe the box and ensure that the detection algorithm converges effectively. Based on these four parameters, we add another angle parameter theta to uniquely determine a rotated bounding box (xc, yc, w, h, theta), where (xc, yc) is still the coordinate of the center point; h represents the long side of the box; w represents the short side of the box, which is perpendicular to h; and theta represents the angle between h in the upward direction and the horizontal right direction, and is in the range of [0, 180) degrees. The labeling parameters are shown in Figure 2.
Figure 2. Comparison of parameters used in horizontal and rotated bounding boxes. (a) Parameters used in horizontal bounding boxes and (b,c) parameters used in rotated bounding boxes.
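To make the five-parameter labeling concrete, the following is a minimal Python sketch (not taken from the paper's code) that converts an (xc, yc, w, h, theta) label into the four corner points of the rotated box. The function name and the example values are illustrative only, and the sketch assumes the usual image convention that theta is measured from the horizontal right direction.

```python
# A minimal sketch of the five-parameter rotated-box labeling described above:
# (xc, yc, w, h, theta), with h the long side and theta in [0, 180) degrees.
import math

def rbox_to_corners(xc, yc, w, h, theta_deg):
    """Return the four corner points of a rotated box.

    h is the long side, w the short side perpendicular to it, and theta_deg
    is the angle (degrees) between h and the horizontal right direction.
    """
    t = math.radians(theta_deg)
    # Unit vectors along the long side h and along the short side w.
    hx, hy = math.cos(t), math.sin(t)
    wx, wy = -math.sin(t), math.cos(t)
    corners = []
    for sh, sw in [(+1, +1), (+1, -1), (-1, -1), (-1, +1)]:
        corners.append((xc + sh * hx * h / 2 + sw * wx * w / 2,
                        yc + sh * hy * h / 2 + sw * wy * w / 2))
    return corners

# Example: a 100 x 20 box centered at (50, 50), rotated 45 degrees.
print(rbox_to_corners(50, 50, 20, 100, 45))
```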
3. Multiscale Rotated Bounding Box-Based Deep Learning Detection Model

Our algorithm is based on deep learning methods and uses multiscale rotated bounding boxes to detect and locate ships precisely. This section gives a comprehensive explanation of the algorithm from the perspectives of network structure, design of the key parts, and implementation details.
3.1. Network Structure

The network structure is similar to that of the region proposal network of the Faster R-CNN method [12]. First, we use layers from conv1_1 to conv5_3 of VGG16 [16] to extract the features of the image. The VGG16 model is pretrained on the ImageNet dataset [15] and has high performance in object classification. Many target detection methods, such as Faster R-CNN and SSD, are fine-tuned from it, so we choose this model and fine-tune it to obtain better performance. Then, based on the feature maps, we design two flow paths with two convolutional layers to simultaneously predict the class label and regress the location-angle offsets. Finally, we calculate the loss with the loss layer and use backpropagation to train the model. The network structure is shown in Figure 3 and the diagram of the detection stage is shown in Figure 4.

Figure 3. Diagram of network structure.
Figure 4. Diagram of target detection part.
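As a rough illustration of the two output branches described above, the following Python sketch shows the shapes involved: for k prior boxes per feature-map position, the class branch outputs 2k scores (ship / background) and the location-angle branch outputs offsets for each box. This is an assumption about shapes, not the paper's Caffe implementation; the feature-map size, the 5k regression channels (the diagram above, adapted from the region proposal network, mentions 4k coordinates, while this method regresses five offsets per box), and the weight initialization are placeholders.

```python
# A framework-agnostic sketch (an assumption, not the paper's Caffe code) of the
# two output branches: 2k class scores and 5k location-angle offsets per position.
import numpy as np

feat_h, feat_w, feat_c = 38, 38, 512      # conv5_3-like feature map (assumed size)
num_scales, num_angles = 4, 4             # {32, 64, 128, 256} x {0, 45, 90, 135}
k = num_scales * num_angles               # prior boxes per position

# Stand-ins for the learned 1x1 convolution weights of the two branches.
rng = np.random.default_rng(0)
w_cls = rng.standard_normal((feat_c, 2 * k)) * 0.01
w_reg = rng.standard_normal((feat_c, 5 * k)) * 0.01

feature_map = rng.standard_normal((feat_h, feat_w, feat_c))
cls_scores = feature_map @ w_cls          # (38, 38, 2k) class scores
loc_offsets = feature_map @ w_reg         # (38, 38, 5k) location-angle offsets
print(cls_scores.shape, loc_offsets.shape)
```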
3.2. Rotated Prior Box

After obtaining the convolutional feature maps, we set the rotated prior boxes on each position of the feature maps to locate the ship targets. The rotated prior boxes are shown in blue in Figure 4. Based on the rotated prior boxes, we can predict the class name and location information of the target in various orientations. When training the network, we first match the prior boxes with the ground-truth bounding boxes to obtain the positive sample information, which will be used to calculate the loss. The diagram of rotated prior boxes is shown in Figure 5, where the one matched with the ground-truth bounding box is shown in red.
Figure 5. Diagrams of the ground-truth bounding box and the best matching prior box. (a) The image and the ground-truth box. (b) Prior boxes with dotted line and the best matching prior box with the bigger ship, which is in red. (c) Prior boxes with dotted line and the best matching prior box with the smaller ship, which is in red.
After splitting the large-scale images, we feed the network with images of size 600 × 600 pixels. The scale of the prior box is set as {32, 64, 128, 256} and the aspect ratio, which means the ratio of h to w, is set as 5:1, so the long side h of the box is {71, 142, 284, 568}; the step size of the box is 16. Using multiscale prior boxes, we can detect multiscale targets better. The rotation angle is set as {0, 45, 90, 135} degrees, which ignores the head or tail direction as it is difficult to identify when the ship scale is small. The rotated prior boxes of one scale are shown in Figure 6. These parameters are chosen based on the characteristics of the ship targets in the dataset and the network structure, which will be explained in detail in Section 4.3.
Figure 6. Diagram of the selected angles of the rotated bounding boxes.
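The following Python sketch illustrates one plausible way to lay out the rotated prior boxes described above (scales {32, 64, 128, 256}, aspect ratio 5:1, angles {0, 45, 90, 135}, step 16). It is a sketch under these stated assumptions rather than the authors' implementation; the derived long sides come out close to the {71, 142, 284, 568} values quoted in the text, and the feature-map size used in the example is only a placeholder.

```python
# A minimal sketch of laying rotated prior boxes on a feature-map grid.
import math

SCALES = [32, 64, 128, 256]
ANGLES = [0, 45, 90, 135]
ASPECT = 5.0          # h / w
STEP = 16             # stride of the feature map in input pixels

def make_prior_boxes(feat_h, feat_w):
    """Return a list of (xc, yc, w, h, theta) prior boxes for a feature map."""
    priors = []
    for i in range(feat_h):
        for j in range(feat_w):
            xc, yc = (j + 0.5) * STEP, (i + 0.5) * STEP
            for s in SCALES:
                h = s * math.sqrt(ASPECT)   # long side, e.g. 32 * sqrt(5) ~ 71.6
                w = s / math.sqrt(ASPECT)   # short side
                for theta in ANGLES:
                    priors.append((xc, yc, w, h, theta))
    return priors

priors = make_prior_boxes(38, 38)           # 38 x 38 x 16 = 23,104 boxes
print(len(priors), priors[0])
```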
3.3. Rotated Box Matching

Rotated box matching plays an important role in target detection; it is used for choosing positive samples in training, for nonmaximum suppression in testing, and for deciding whether a detection is correct in evaluation. Let us assume two rotated boxes, recorded as box1 and box2. We use a matrix with all zeros except ones in the box region to record each box, so we can obtain the area of the box by computing the sum of the matrix elements. The two matrices are recorded as I1 and I2, and the areas of box1 and box2 are recorded as Area1 and Area2. To obtain the overlap region, we calculate the element-wise product of I1 and I2. The result is recorded as matrix I3, and the overlap area is the sum of its elements, recorded as Area∩. Thus, the matching degree of the two boxes is calculated as

$$\text{matching degree} = \frac{Area_{\cap}}{Area_{\cup}} = \frac{Area_{\cap}}{Area_{1} + Area_{2} - Area_{\cap}} \quad (1)$$

When matching the rotated boxes, computing the matching degree between every prior box and every ground-truth box would lead to a large amount of calculation. Thus, we first narrow down the selection scope. We set a selection range $t_c$ and calculate the distance between the center points $d_c = \|(x_1^c, y_1^c) - (x_2^c, y_2^c)\|$, where $(x_1^c, y_1^c)$ and $(x_2^c, y_2^c)$ are the center point coordinates of the two boxes. If $d_c < t_c$, we select the prior box with the nearest angle to the ground-truth box angle to calculate the matching degree. By implementing these strategies, the amount of calculation is significantly reduced.
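A simple Python sketch of the mask-based matching degree of Equation (1) and the range-control prefilter is given below. The rasterization grid, the selection range tc, and the naive angle-difference test are illustrative assumptions, not values from the paper.

```python
# Sketch of the matching degree in Equation (1) via binary masks, plus the
# range-control prefilter. Rasterization makes the overlap approximate.
import math
import numpy as np

def rbox_mask(box, grid_h, grid_w):
    """Binary mask of a rotated box (xc, yc, w, h, theta in degrees)."""
    xc, yc, w, h, theta = box
    ys, xs = np.mgrid[0:grid_h, 0:grid_w] + 0.5
    t = math.radians(theta)
    # Project pixel centres onto the box axes (long side h, short side w).
    dh = (xs - xc) * math.cos(t) + (ys - yc) * math.sin(t)
    dw = -(xs - xc) * math.sin(t) + (ys - yc) * math.cos(t)
    return (np.abs(dh) <= h / 2) & (np.abs(dw) <= w / 2)

def matching_degree(box1, box2, grid=600):
    i1, i2 = rbox_mask(box1, grid, grid), rbox_mask(box2, grid, grid)
    inter = np.sum(i1 & i2)                       # overlap area Area_inter
    union = np.sum(i1) + np.sum(i2) - inter       # Area1 + Area2 - overlap
    return inter / union if union > 0 else 0.0

def candidate_priors(gt_box, priors, tc=64.0):
    """Range control: keep priors whose centre is within tc of the ground truth,
    then keep, per centre and scale, the one with the nearest angle (naive
    difference, ignoring wrap-around)."""
    gx, gy, _, _, gtheta = gt_box
    near = [p for p in priors if math.hypot(p[0] - gx, p[1] - gy) < tc]
    best = {}
    for p in near:
        key = (p[0], p[1], p[3])                  # same centre and scale
        if key not in best or abs(p[4] - gtheta) < abs(best[key][4] - gtheta):
            best[key] = p
    return list(best.values())

gt = (300, 300, 20, 100, 30)
print(matching_degree(gt, (305, 298, 20, 100, 45)))
```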
3.4. Loss Function

There are two flow paths to output the result: one predicts the class information of the target and the other predicts the location and angle information. The two layers are usually called the class label output layer and the location-angle regression layer. When calculating the loss, we combine the loss of the class label output layer and the location-angle regression layer into a weighted sum. We use $x_{ij} \in \{1, 0\}$ to record whether the $i$th prior box is matched with the $j$th ground-truth box, where 1 means matched and 0 means mismatched. The loss is calculated as

$$L(x, c, l, g) = \frac{1}{N}\left(L_{cls}(x, c) + \alpha L_{loc}(x, l, g)\right), \quad (2)$$

where $x$ is the matching status, $c$ is the class label output, $l$ is the output of location and angle offsets, $g$ is the ground-truth information, and $N$ is the number of prior boxes matched with the ground-truth boxes. If $N = 0$, the loss is 0. $L_{cls}$ is the loss of the class label output layer, $L_{loc}$ is the loss of the location and angle offset regression, and $\alpha$ is the weight used to balance the two losses; here we set $\alpha$ to 1 experimentally.

The loss of the class label output layer is calculated as

$$L_{cls}(x, c) = -\sum_{i \in Pos} x_{ij} \log(\hat{c}_i^1) - \sum_{i \in Neg} \log(\hat{c}_i^0), \quad (3)$$

$$\hat{c}_i^1 = \frac{\exp(c_i^1)}{\sum_p \exp(c_i^p)}, \quad (4)$$

where $c_i$ is the output of the class label layer, $\hat{c}_i$ is the class score, $\hat{c}_i^1$ is the probability of having a ship target in the box, and $\hat{c}_i^0$ is the probability of being the background. When calculating the weighted sum of the log function, we should check whether a prior box is positive or negative after matching with the ground-truth boxes.

The loss of regression of the location and angle offsets uses the SmoothL1 loss [11] for the calculation:
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \; \sum_{m \in \{cx, cy, w, h, \theta\}} x_{ij} \, \text{smooth}_{L1}(l_i^m - \hat{g}_j^m), \quad (5)$$

$$\text{smooth}_{L1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise,} \end{cases} \quad (6)$$
where $l$ is the predicted offsets of the five parameters $(x_c, y_c, h, w, theta)$ and $\hat{g}_j$ is the offset between the $i$th prior box $d_i$ and the $j$th ground-truth box $g_j$, as shown in Equations (7) and (8):

$$\hat{g}_j^{cx} = (g_j^{cx} - d_i^{cx})/d_i^{w}, \quad \hat{g}_j^{cy} = (g_j^{cy} - d_i^{cy})/d_i^{h}, \quad (7)$$

$$\hat{g}_j^{w} = \log\!\left(\frac{g_j^{w}}{d_i^{w}}\right), \quad \hat{g}_j^{h} = \log\!\left(\frac{g_j^{h}}{d_i^{h}}\right), \quad \hat{g}_j^{\theta} = \tan\!\left(g_j^{\theta} - d_i^{\theta}\right). \quad (8)$$

When calculating the loss, the sum is calculated only if the prior box is matched with a ground-truth box.
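The following Python sketch encodes the regression targets of Equations (7) and (8) and applies the smooth L1 function of Equation (6) for one matched prior box. Treating the angles as degrees and converting them to radians before the tangent is an assumption, as the paper does not state the unit; parameter names are illustrative, not taken from the paper's code.

```python
# Offset encoding per Equations (7)-(8) and smooth L1 loss per Equation (6).
import math

def encode_offsets(d, g):
    """Offsets of ground truth g relative to prior d; both are (xc, yc, w, h, theta)."""
    dxc, dyc, dw, dh, dtheta = d
    gxc, gyc, gw, gh, gtheta = g
    return (
        (gxc - dxc) / dw,                          # g_hat^cx
        (gyc - dyc) / dh,                          # g_hat^cy
        math.log(gw / dw),                         # g_hat^w
        math.log(gh / dh),                         # g_hat^h
        math.tan(math.radians(gtheta - dtheta)),   # g_hat^theta (degrees assumed)
    )

def smooth_l1(x):
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def loc_loss(pred_offsets, target_offsets):
    """Location-angle regression loss for one matched prior box."""
    return sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))

prior = (300.0, 300.0, 14.3, 71.6, 45.0)
gt_box = (305.0, 298.0, 20.0, 100.0, 30.0)
targets = encode_offsets(prior, gt_box)
print(targets, loc_loss((0.0,) * 5, targets))
```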
3.5. Implementation Details

We use Caffe [26] to build the network structure and train our deep neural network. In the training stage, we process one image per iteration. In the image, we select the prior boxes with a matching degree larger than 0.7, or those having the largest matching degree with a ground-truth box, as the positive samples. The positive number Np is not more than 64; the negative samples are randomly selected from those with a matching degree below 0.3. The negative number Nf is 128 − Np, so the total number is 128. We use the stochastic gradient descent method to optimize the training. The initial learning rate is 0.001 and the number of iterations is 80,000. After 60,000 iterations, the learning rate is reduced 10-fold. In the testing stage, we set the score threshold at 0.25 to decide whether there is a target in the box. Then, we perform nonmaximum suppression to obtain the final result. We set the nonmaximum suppression threshold at 0.2 to remove boxes whose matching degree with another box of a higher score is larger than 0.2.

4. Experiments and Analysis

The algorithm was tested on images with ship targets in complex backgrounds to evaluate the detection and localization performance. We tested on a properly selected dataset and compared our algorithm with those using horizontal bounding boxes, such as Faster R-CNN and SSD, to show the advantage of detection using rotated boxes. We also carried out a comparison with existing algorithms using rotated bounding boxes, such as DRBox and OBB Faster R-CNN, to show the robustness of our algorithm.
4.1. Dataset

We collected 640 remote sensing images from Google Earth. The long side of each image was shorter than 1000 pixels. There were 4297 ship targets with complex backgrounds in the images. The complex clutter included wave clutter, ships in close proximity, and ships close to the shore.
4.2. Performance Evaluation Index

We used the average precision (AP) value to evaluate the performance of the algorithm. AP is the area under the precision-recall curve. Precision is the ratio of correct detections to the total number of targets detected. Recall is the ratio of correct detections to the total number of ground-truth targets. When we change the score threshold, we obtain a different precision and recall. The precision-recall curve shows how precision varies with recall, so it can reflect the overall performance of the algorithm. When considering whether a detection is correct, we calculated the matching degree between the detected box and the ground-truth box, as in Equation (1). If the matching degree is higher than 0.5, we consider it a correct detection. When a ground-truth bounding box has been matched with a detected box, other detected boxes matched with it are considered false alarms.
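A minimal Python sketch of this evaluation protocol is shown below. It assumes detections have already been assigned a matching degree and a ground-truth index, and the simple rectangle integration of the precision-recall curve stands in for whatever interpolation the authors used.

```python
# Sketch of AP evaluation: detections with matching degree > 0.5 against an
# unmatched ground-truth box count as correct; duplicates are false alarms.
import numpy as np

def average_precision(detections, num_gt, match_thresh=0.5):
    """detections: list of (score, matching_degree, gt_id); gt_id is None for
    detections that overlap no ground-truth box."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    matched, tps, fps = set(), [], []
    for score, mdeg, gt_id in detections:
        if gt_id is not None and mdeg > match_thresh and gt_id not in matched:
            matched.add(gt_id)
            tps.append(1); fps.append(0)
        else:                      # duplicate or low-overlap detection: false alarm
            tps.append(0); fps.append(1)
    tp, fp = np.cumsum(tps), np.cumsum(fps)
    if len(tp) == 0:
        return 0.0
    recall = tp / max(num_gt, 1)
    precision = tp / np.maximum(tp + fp, 1)
    # Rectangle integration of the precision-recall curve.
    prev_r, ap = 0.0, 0.0
    for r, p in zip(recall, precision):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

dets = [(0.9, 0.8, 0), (0.8, 0.6, 0), (0.7, 0.3, None), (0.6, 0.7, 1)]
print(average_precision(dets, num_gt=2))   # 0.75 for this toy example
```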
4.3. Performance of the Algorithm and Parameter Analysis

As the ships vary in scale and appearance, the algorithm should select suitable multiscale parameters and augment the data to learn more about the variety.

4.3.1. Dataset Characteristics Analysis and Model Parameter Selection

First, we analyzed the characteristics of the ship targets in the dataset to help select the model parameters. We recorded the long side of the rotated box as h, the short side as w, the aspect ratio as h/w, and the angle as theta. Figure 7 shows the main characteristics distribution of the targets in the dataset. The long side h is distributed mainly from 20 to 150 pixels. The longest is about 500 pixels, while the other large ones are distributed between 200 and 300. The mean length is 50.6. The aspect ratio h/w is distributed mostly between 3 and 7. The largest value is 12, while the other large values are distributed between 8 and 10. The mean value is 4.3. The angle of the ships is mainly distributed around 0° and uniformly distributed at other angles.
Figure 7. Main characteristics distribution of the targets in the dataset. (a) The long side h of all 4279 targets, (b) distribution of long side h, (c) aspect ratio h/w of all 4279 targets, (d) distribution of aspect ratio h/w, and (e) distribution of angle theta.
After analyzing the characteristics of the targets in the dataset, we can use them to help select the parameters. We set the aspect ratio to 5 so that the rotated prior boxes can match the ground-truth boxes well. When choosing the scales of the prior box, we consider the long side h to be mainly distributed between 20 and 150 pixels and the larger ones to be mainly between 200 and 300. Therefore, we select the scales {32, 64, 128, 256} to better match the characteristics of the real targets when the shorter side of the input image is bigger than 600 pixels, the longer side is smaller than 1000 pixels, and the image is not preprocessed. When performing data augmentation, we rotate the images and label information to increase the variability of the targets. Since the size of the input image to the deep network is limited to 600–1000 pixels, images with a larger or smaller scale will be preprocessed and the target scale is changed. Therefore, we should pad the small images with zeros to expand them and split the large-scale images to retain the original size of the target.

4.3.2. Model Parameter Analysis

We trained the deep network with different parameters and compared the performances to analyze the influence of the parameters.

We trained on a dataset of 148 images with 80,000 iterations and an aspect ratio of 3. The performance AP was 0.091 when testing on all 640 images. When changing the aspect ratio to 5 and retaining all other parameters, the performance AP remained at 0.091. When the training dataset contained 200 images and the aspect ratio was 5, the performance AP increased to 0.212. These results show that training with less data does not help learn the ship's features well, resulting in poor performance. When training with 200 images, the algorithm is more robust because of a closer distribution of the training and testing datasets.

Then, we considered the influence of input image scale and preprocessed the images to keep the target within the original scale. After the preprocessing, we trained with a training dataset of 400 images with 80,000 iterations and an aspect ratio of 5. This caused the performance AP to increase to 0.362. The result shows that having more training samples and maintaining the original target scale helps the algorithm to learn more robustly and obtain better performance. Performances with different parameters are shown in Table 1.

Table 1. Performance comparison of our method with different parameters.

Number of Training Samples   Aspect Ratio   Keep Original Scale   AP
148                          3:1            no                    0.091
148                          5:1            no                    0.091
200                          5:1            no                    0.212
400                          5:1            yes                   0.362
4.4. Performance Comparison with Algorithms Using Horizontal Bounding Boxes

The commonly applied methods in the target detection area use horizontal bounding boxes to search for and detect targets, such as SSD and Faster R-CNN. Here, we compare the performances of the two algorithms with our method to show the advantage of using rotated bounding boxes. We trained on a dataset of 400 images using the SSD algorithm and tested all 640 images. In this case, the performance AP was 0.206. When using the Faster R-CNN algorithm with the same parameters, the performance AP was 0.293. In our algorithm, the performance AP was 0.362. These results show that the method using rotated bounding boxes can learn the target characteristics better and detect the rotated ships in close proximity better. Figure 8 presents a comparison of the detection results between algorithms using horizontal bounding boxes and our method. These results show that both the method using horizontal boxes and that using rotated boxes can detect the off-shore ships well. When the ships are in close proximity or close to the shore, our method using rotated bounding boxes can detect the target better and locate it more precisely. A comparison of the performance AP values is shown in Table 2.
Figure 8. Comparison with the algorithm using horizontal bounding boxes. (a) Detection results of SSD, (b) detection results of Faster R-CNN, and (c) detection results of our method.

Table 2. Performance comparison of different algorithms.

Method               AP
SSD                  0.206
Faster R-CNN         0.293
DRBox                0.305
Faster R-CNN (OBB)   0.278
Our method           0.362
4.5. Performance Comparison with Other Algorithms Using Rotated Bounding Boxes

We compared the performance of our algorithm with that of algorithms using rotated bounding boxes to show the advantage of our method. One of the methods is DRBox, which is based on SSD and combined with rotated bounding boxes to detect oriented targets. We used the trained model provided by the author to test on all 640 images and found the performance AP to be 0.241. The detection results indicated that the model detects large-scale ships better while many small ships were missed. Hence, we added a larger resolution scale to detect the smaller ships, and the performance AP increased to 0.305. We continued by adding another larger resolution scale, and the performance AP dropped to 0.277. These results indicate that both the correct detection number and the false alarm number increased. Overall, the performance of the DRBox algorithm is lower than that of our method since its accuracy in detecting small targets is low and more false alarms are detected. Another method is the oriented bounding box (OBB) Faster R-CNN algorithm, which uses a rotated bounding box with eight parameters to detect the oriented targets. We tested the provided trained model on all 640 images and found the performance AP to be 0.278, which is also lower than that of our method. The main reason for the performance drop is the missing of several small ships.
The last column shows some performs better, while other methods miss some ships. The last column shows some false false alarms land background both DRBox method and method, while OBB Faster alarms on on thethe land background in in both DRBox method and ourour method, while thethe OBB Faster R-CNN method misses AP values valuesofofthe thedifferent differentmethods methods mentioned R-CNN method missessome someships. ships.The The performance performance AP mentioned above areare shown inin Table above shown Table2.2.
Figure 9. Comparison with other algorithms using rotated bounding boxes. Detection results of (a) DRBox, (b) OBB Faster R-CNN, and (c) our method.
4.6. Discussion

The above experiments and analysis show that our method using multiscale rotated bounding boxes can detect ships and locate them precisely when the ships are in complex backgrounds. Compared with the methods using horizontal bounding boxes, our method detects better when ships are in close proximity or close to the shore; compared with other methods using rotated bounding boxes, our method detects ships with higher accuracy and a lower missed detection rate, and in particular detects small ships better. However, there are still some problems, such as false alarms; as a result, the performance AP still has room for improvement.
5. Conclusions
We proposed a multiscale rotated bounding box-based deep learning method to detect oriented ships in complex backgrounds in remote sensing images. The algorithm uses five parameters to determine a rotated bounding box, which limits the box to a rectangular shape. At different positions of the feature maps extracted by a pretrained deep network, the algorithm predicts the class label and the location-angle offset regression based on the multiscale rotated prior boxes. In the training stage, we search for the prior boxes best matched with the ground-truth boxes to obtain the positive sample information, which is used to learn the ship features. When matching the rotated boxes, we first narrow down the search scope to reduce the amount of calculation. In the testing stage, we input the testing image to the network and perform a forward pass to obtain the class labels and the location-angle offsets at each position. Then, we set a threshold on the class score to determine whether there is a ship and perform post-processing, such as nonmaximum suppression, to obtain the final detection result. The experimental results showed that our method can detect ships robustly under complex conditions, such as ships in wave clutter, in close proximity, close to the shore, and at different scales, and obtain an accurate location of the ships. Our method performs better than the other methods. However, the performance AP still has room for improvement, so we will study the target missing and false alarm problems in the future to improve the algorithm.

Author Contributions: S.L. and Z.Z. conceived and designed the experiments; S.L. performed the experiments; S.L. and C.L. analyzed the data; B.L. contributed material/analysis tools; S.L. and Z.Z. wrote the paper.

Funding: This project is supported by the National Natural Science Foundation of China (Grant No. 61101185) and Ministerial level Foundation (Grant No. 61425030301).

Acknowledgments: The authors express their gratitude to the reviewers and editors for their kind help.

Conflicts of Interest: The authors declare no conflicts of interest.
References

1. Zhu, C.; Zhou, H.; Wang, R.; Guo, J. A novel hierarchical method of ship detection from spaceborne optical image based on shape and texture features. IEEE Trans. Geosci. Remote Sens. 2010, 48, 3446–3456. [CrossRef]
2. Shi, Z.; Yu, X.; Jiang, Z.; Li, B. Ship detection in high-resolution optical imagery based on anomaly detector and local shape feature. IEEE Trans. Geosci. Remote Sens. 2014, 52, 4511–4523.
3. Xu, F.; Liu, J.; Dong, C.; Wang, X. Ship detection in optical remote sensing images based on wavelet transform and multi-level false alarm identification. Remote Sens. 2017, 9, 985. [CrossRef]
4. Liu, G.; Zhang, Y.; Zheng, X.; Sun, X.; Fu, K.; Wang, H. A new method on inshore ship detection in high-resolution satellite images using shape and context information. IEEE Geosci. Remote Sens. Lett. 2014, 11, 617–621. [CrossRef]
5. Xu, F.; Liu, J.; Sun, M.; Zeng, D.; Wang, X. A hierarchical maritime target detection method for optical remote sensing imagery. Remote Sens. 2017, 9, 280. [CrossRef]
6. Nie, T.; He, B.; Bi, G.; Zhang, Y.; Wang, W. A method of ship detection under complex background. ISPRS Int. J. Geo-Inf. 2017, 6, 159. [CrossRef]
7. Sui, H.; Song, Z. A novel ship detection method for large-scale optical satellite images based on visual LBP feature and visual attention model. In Proceedings of the 23rd International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences Congress (ISPRS 2016), Prague, Czech Republic, 12–19 July 2016; pp. 917–921.
8. Liu, Z.; Wang, H.; Weng, L.; Yang, Y. Ship rotated bounding box space for ship extraction from high-resolution optical satellite images with complex backgrounds. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1074–1078. [CrossRef]
9. Zou, Z.; Shi, Z. Ship detection in spaceborne optical image with SVD networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 5832–5845. [CrossRef]
10. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
11. Girshick, R. Fast R-CNN. In Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile, 11–18 December 2015; pp. 1440–1448.
12. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
13. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
14. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
15. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS 2012), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
16. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
17. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 28th IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
18. He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
19. Zhang, R.; Yao, J.; Zhang, K.; Feng, C.; Zhang, J. S-CNN-based ship detection from high-resolution remote sensing images. In Proceedings of the 23rd International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences Congress (ISPRS 2016), Prague, Czech Republic, 12–19 July 2016; pp. 423–430.
20. Tang, J.; Deng, C.; Huang, G.-B.; Zhao, B. Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine. IEEE Trans. Geosci. Remote Sens. 2014, 53, 1174–1185. [CrossRef]
21. Liu, Z.; Hu, J.; Weng, L.; Yang, Y. Rotated region based CNN for ship detection. In Proceedings of the 24th IEEE International Conference on Image Processing (ICIP 2017), Beijing, China, 17–20 September 2017; pp. 900–904.
22. Liu, Z.; Liu, Y.; Weng, L.; Yang, Y. A high resolution optical satellite image dataset for ship recognition and some new baselines. In Proceedings of the International Conference on Pattern Recognition Applications and Methods, Porto, Portugal, 24–26 February 2017; pp. 324–331.
23. Shao, X.; Li, H.; Lin, H.; Kang, X.; Lu, T. Ship detection in optical satellite image based on RX method and PCAnet. Sens. Imaging 2017, 18, 21. [CrossRef]
24. Liu, L.; Pan, Z.; Lei, B. Learning a rotation invariant detector with rotatable bounding box. arXiv 2017, arXiv:1711.10398.
25. Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. arXiv 2017, arXiv:1711.10398.
26. Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 2014 ACM Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).