Fully convolutional networks for the classification of aerial VHR imagery

Nicholus Mboga, Stefanos Georganos, Tais Grippa, Moritz Lennert, Sabine Vanhuysse, Eléonore Wolff

Department of Geoscience, Environment and Society, Université Libre de Bruxelles (ULB), Avenue Franklin Roosevelt 50, 1050 Brussels, Belgium
nmboga, sgeorgan, tgrippa, mlennert, svhuysse, [email protected]

ABSTRACT. Remote sensing images provide the capability to obtain information on various land cover classes. This information is useful for urban planning, socio-economic modelling and population studies. There is a need for accurate and efficient methods to obtain it, particularly in regions where reference data are scarce. In this work, we develop a methodology based on fully convolutional networks (FCN) that is trained end-to-end using aerial RGB images as the only input. The experiments are conducted on the city of Goma in the Democratic Republic of Congo. We compare the results to a state-of-the-art approach based on a semi-automatic geographic object-based image analysis (GEOBIA) processing chain. State-of-the-art classification accuracies are obtained by both methods, with the FCN and the best baseline method reaching overall accuracies of 91.24% and 89.34% respectively. The maps have good visual quality, and the use of an FCN skip architecture minimizes the rounded edges that are characteristic of FCN maps. Finally, additional experiments refine the FCN classified maps using segments obtained from GEOBIA. This improved edge delineation in the FCN maps, and future work will involve explicitly incorporating boundary information from the GEOBIA segmentation into the FCN pipeline in an end-to-end fashion.

KEYWORDS: fully convolutional networks, deep learning, very high resolution aerial imagery, GEOBIA
GEOBIA'2018 – Montpellier, 18-22 June 2018
1. Introduction

Advances in remote sensing technology have led to an increase in the availability of large volumes of remote sensing data with high spatial resolution. Consequently, the capability of making precise and accurate land cover maps of urban areas has increased. Land cover (LC) maps are useful for various applications such as urban planning, population modelling and socio-economic analysis. In developing countries, such spatial data are often lacking. In addition, their heterogeneous urban structure, where buildings have roofs of various sizes made of different materials, increases the challenge of mapping. Very high spatial resolution (VHR) remote sensing images are desirable for urban mapping because they allow a higher level of thematic detail of land cover to be mapped and provide a synoptic view of the earth's surface. The enormous amount of data and the challenging urban fabric of developing countries create the need for efficient processing algorithms.

VHR imagery often contains limited spectral information. Thus, standard pixel-based classifiers perform poorly because of the high intra-class and low inter-class variance. Alternatively, the pixels can first be grouped into segments according to some homogeneity criteria, which are then labelled using a trained classifier (Hay & Castilla, 2008). This is the principle of geographic object-based image analysis (GEOBIA) methods, which usually perform better than pixel-based classifiers because segments carry additional information compared to individual pixels (Blaschke, 2010). The extraction of spatial-contextual features has been shown to improve the classification performance of both approaches. Hand-crafting of features is limited by the high number of free parameters that need to be optimized, the need for domain knowledge and the amount of effort involved. Deep learning, on the other hand, allows the spatial-contextual features to be learnt automatically from the input data (LeCun et al., 2015).

Convolutional neural networks (CNNs) have performed well in image classification tasks in the computer vision domain (Krizhevsky et al., 2012) and are now being applied to the analysis of remote sensing data. CNNs learn spatial-contextual features in a hierarchical fashion, from simple features in the lower layers to abstract features in the deeper layers. In patch-based architectures, the central pixel of a given input patch is assigned a label (Bergado et al., 2016). Alternatively, a fully labelled image patch is used to train a fully convolutional network (FCN) architecture (Shelhamer et al., 2017). The FCN is more computationally efficient when training and testing on tiles because no redundant operations are performed on neighbouring patches. During inference, the FCN can take an input of varying spatial dimensions and make a dense prediction, i.e. a per-pixel classification (Sherrah, 2016).

In this paper, we investigate a fully convolutional network with skip architecture for an urban LC mapping application in the city of Goma, Democratic Republic of Congo. Such an architecture aims to recover the primitive features learnt in the lower convolutional layers of the network for better edge definition (Shelhamer et al., 2017). In addition, instead of using downsampling layers, we use dilated convolutions, which enlarge the receptive field without increasing the number of parameters.
As a result, the output of each convolutional layer has the same spatial resolution as the input image (Persello & Stein, 2017; Yu & Koltun, 2016). Secondly, we perform a classification using a state-of-the-art semi-automatic GEOBIA processing chain and use its results as the baseline approach (Grippa et al., 2017). We then compare the classification results of both approaches, as few works compare CNNs to GEOBIA in a detailed fashion. For instance, Guirado & Tabik (2017) perform object detection of a protected vegetation species from Google Earth® images by fine-tuning existing patch-based CNN architectures. An FCN with downsampling layers has been investigated for the detection of an invasive grass species (Cogon grass) in a wetland, where the main contributions are an evaluation of the effect of the background information surrounding an object of interest on the FCN classification accuracy and of the impact of several data augmentation strategies. The FCN classification is then refined by assigning to each underlying segment its majority class (Liu et al., 2018; Liu & Abd-Elrahman, 2018). Similarly, an urban mapping application is evaluated in Längkvist et al. (2016) using a patch-based CNN, with the results post-processed using segments derived from the SLIC algorithm.

In this work, we investigate post-processing techniques to improve the boundary definition in FCN maps, motivated by the observation that CNN maps usually have smoothed straight edges and rounded sharp corners (Marmanis et al., 2018). The work of Zhao et al. (2017) is close to ours, as it also uses segmentation to minimize this edge-rounding effect. However, they use an FCN with downsampling, which reduces the spatial resolution of the feature maps. We thus extend the concept of using segments obtained from GEOBIA to improve the crispness of the boundaries of the classification map obtained from the FCN. The contribution of our work is that our FCN makes use of dilated convolutions and a skip architecture, and that the post-processing aims to minimize the rounded corners and smoothed straight edges in the FCN maps. In addition, to the best of our knowledge, this is the first application of FCN, and comparison with GEOBIA, for multi-class LC classification of an urban environment in an African city.

The rest of the paper is organized as follows: Section 2 describes the data used and the methodology, Section 3 presents the results, and Section 4 provides the discussion, summary and future work.

2. Methodology

2.1. Data description

For the experiments, an aerial VHR multispectral image acquired over Goma City, the Democratic Republic of Congo, in 2018 is used. The image has been orthorectified, comprises three bands (Red, Green and Blue) and has a spatial resolution of 0.175 m. Goma is a city in the North Kivu province with an approximate population of 800,000 inhabitants, and there is limited availability of high spatial resolution land cover products for the area (Michellier et al., 2016). The study area is presented in FIGURE 1, which shows Tile A and Tile B, from which the training and testing samples are drawn respectively. Each tile covers an area of 1361 × 1635 m.
We consider five land cover classes, namely Buildings (BD), Vegetation (VG), Bare Soil (BS), Impervious Surface (IS) and Shadows (SH). 3000 sampling locations are used to provide the training data for both the FCN and the GEOBIA approach. In the FCN approach, an image patch of 33 × 33 pixels is extracted around each location, and a corresponding ground reference image is prepared through visual image interpretation. For the GEOBIA approach, a segment is extracted and labelled for each location. During testing, 1000 randomly selected and labelled pixels from Tile B are used to evaluate the classification methods.
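For illustration, the following is a minimal sketch of the patch extraction step, assuming the training tile is loaded as an (H, W, 3) NumPy array and the sampling locations are (row, column) pixel coordinates at least 16 pixels from the tile borders (the array and function names are ours, not the authors'):

```python
import numpy as np

def extract_patch(tile, row, col, size=33):
    """Crop a size x size RGB patch centred on pixel (row, col)."""
    half = size // 2
    return tile[row - half: row + half + 1,
                col - half: col + half + 1, :]

# e.g. patches = np.stack([extract_patch(tile_a, r, c) for r, c in locations])
```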
FIGURE 1: Map illustrating the study area of Goma, Democratic Republic of Congo. Tile A and Tile B are the training and testing tiles respectively.

2.2. FCN with dilated convolutions and skip architecture

The architecture of the implemented FCN is shown in FIGURE 2a. The building blocks of the architecture are convolutional layers. We make use of dilated (atrous) convolutions, which intersperse a convolution filter with zeros to increase the receptive field without raising the number of parameters (Persello & Stein, 2017; Yu & Koltun, 2016). For a filter with spatial dimensions f × f, a dilated convolution with rate r introduces r−1 zeros between consecutive elements of the filter, so the effective filter dimensions become $f_e = f + (f-1)(r-1)$; for example, a 3 × 3 filter with r = 2 has an effective size of 5 × 5. Convolutional layers with dilation rates of 1 and 2 are illustrated in FIGURE 2b.
FIGURE 2: (a) Illustration of the implemented FCN and (b) dilated convolution with r=1 and r=2 for a filter size f=3. Key: MP (max pooling), RELU (rectified linear unit), BN (batch normalization).

Each convolutional layer comprises k filters with a spatial dimension of f × f. During a convolution, the filters are shifted by a given number of steps, called the stride s of the convolution. Weights and biases are the learnable parameters of the filters. The FCN takes as input a raw image patch of dimension m × m × d, where m is the spatial dimension and d is the number of channels in the input. A rectified linear unit (RELU) activation function, defined as f(x) = max(0, x), is applied to introduce nonlinearities into the output of the convolution (Glorot & Bengio, 2010). Batch normalization is then applied to minimize the internal covariate shift during training (Ioffe & Szegedy, 2015). Pooling ensures that a dominant signal is propagated to the subsequent layer of the FCN network. In our FCN, we use max pooling (MP) with a window size of 2 × 2 and stride s=1, which ensures that the feature maps are not downsampled; the receptive field is instead increased by using dilated convolutions with a dilation rate of r=2 in each of the dilated convolutional layers. The spatial dimensions of the feature maps (i.e. the outputs of the convolutional layers) are controlled using zero padding, giving a dimension of

$$\frac{m - f_e + 2z}{s} \times \frac{m - f_e + 2z}{s} \times k$$

where z is the number of zeros used to pad the input, f_e is the effective filter dimension after dilation, s is the stride of the convolution and k is the number of channels in the feature map. In the FCN implementation of this paper, the first two convolutional layers have filters of dimension 64 × 5 × 5, while the subsequent four have filters of dimension 64 × 3 × 3, similar to the architecture in Sherrah (2016). During training, the classification loss is given by the cross-entropy function:

$$L(w) = -\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{c} y_k^{(i)} \ln\!\left(\hat{y}_k^{(i)}\right) \qquad (1)$$
where n is the number of pixels in a mini-batch (i.e. a small subset of the training data), c is the number of classes, $y_k^{(i)}$ is the vector of true labels and $\hat{y}_k^{(i)}$ is the vector of predicted labels. We use stochastic gradient descent (SGD) with the backpropagation algorithm to optimize the learning, using a mini-batch size of 128, 100 epochs and a momentum of 0.9 (Bottou, 2010; Goodfellow et al., 2016). The learning rate is set to 0.1 for the first 50 epochs and 0.001 for the subsequent epochs. Training is done from scratch and takes approximately 10 minutes on an NVIDIA® GTX 1080 GPU with 8 GB of memory. Skip connections are used to concatenate the feature maps of the first six convolutional layers and use them as the input to the last convolutional layer. Fusing features from the low layers with abstract features from the higher layers helps recover the primitive features learnt in the lower convolutional layers and yields more regular edges in the classified maps: CNNs learn features in a hierarchical fashion, and the abstract features alone poorly detect object contours and edges (Chen et al., 2018; Marmanis et al., 2018). The last convolutional layer has filters of dimension 5 × 1 × 1 and is followed by a softmax activation function that gives the distribution of class scores for each input pixel $x_i$:

$$p(y_i \mid x_i) = \frac{\exp(x_i)}{\sum_{i=1}^{c} \exp(x_i)} \qquad (2)$$
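For concreteness, the following is a minimal sketch of such an architecture in Keras (the paper's implementation uses Keras with a Theano backend; the tf.keras layer names, the 'same' padding scheme and the choice to dilate only the four 3 × 3 layers below are our assumptions, not the authors' exact code):

```python
# Minimal sketch of the described FCN in tf.keras; padding choices and the
# decision to dilate only the 3x3 layers are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, callbacks

def conv_block(x, kernel, dilation):
    """Convolution -> ReLU -> batch normalization -> 2x2 max pooling (stride 1)."""
    x = layers.Conv2D(64, kernel, dilation_rate=dilation,
                      padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    # Stride-1 pooling with 'same' padding propagates a dominant signal
    # without downsampling the feature maps.
    return layers.MaxPooling2D(pool_size=2, strides=1, padding="same")(x)

inputs = layers.Input(shape=(None, None, 3))   # RGB input, any spatial size
x, skips = inputs, []
# Two 5x5 layers followed by four dilated (r=2) 3x3 layers, 64 filters each.
for kernel, rate in [(5, 1), (5, 1), (3, 2), (3, 2), (3, 2), (3, 2)]:
    x = conv_block(x, kernel, rate)
    skips.append(x)

# Skip architecture: the feature maps of all six layers are concatenated and
# fed to a final 1x1 convolution with softmax over the five classes.
merged = layers.Concatenate()(skips)
outputs = layers.Conv2D(5, 1, activation="softmax")(merged)

model = models.Model(inputs, outputs)
model.compile(optimizer=optimizers.SGD(learning_rate=0.1, momentum=0.9),
              loss="categorical_crossentropy")

# Paper's schedule: learning rate 0.1 for the first 50 epochs, then 0.001.
lr_schedule = callbacks.LearningRateScheduler(
    lambda epoch: 0.1 if epoch < 50 else 0.001)
# model.fit(patches, one_hot_labels, batch_size=128, epochs=100,
#           callbacks=[lr_schedule])
```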
At inference, we split the input image into strips of 100 × 7780 pixels that can be loaded onto the GPU memory, in order to obtain a fully labelled map of the test tile. Inference for Tile B takes approximately ten minutes. Our FCN is implemented using the open-source libraries Keras and Theano (Theano Development Team, 2016).

2.3. GEOBIA semi-automatic processing chain

The GEOBIA methodology is implemented using an open-source semi-automated processing chain (Grippa et al., 2017), which integrates GRASS GIS with the Python and R programming languages (Neteler et al., 2012; R Core Team, 2015). The GRASS module "i.segment", based on a region-growing algorithm, is used for segmentation (Momsen et al., 2015). The quality of the selected segmentation can have a significant impact on the accuracy of the classification (Espindola et al., 2006); as such, a robust and time-efficient unsupervised segmentation parameter optimization (USPO) approach was undertaken (Georganos et al., 2018b). Ideally, an optimal segmentation should minimize both over- and under-segmentation. In the literature, Moran's I (MI) describes the spectral variability between a segment and its neighbours and is considered an over-segmentation goodness metric, while weighted variance (WV) describes the variability within a segment and is considered an under-segmentation goodness metric. From a set of candidate segmentations, we selected the one that maximized the F-score objective function (Johnson & Xie, 2011), which maximizes inter-segment heterogeneity and minimizes intra-segment heterogeneity, using the normalized global Moran's I and weighted variance defined as:

$$MI_n = \frac{MI_{max} - MI}{MI_{max} - MI_{min}} \qquad (3)$$

$$WV_n = \frac{WV_{max} - WV}{WV_{max} - WV_{min}} \qquad (4)$$

where WV_n is the normalized WV (or MI), WV_max is the highest WV (or MI) value of all examined segmentations, WV_min is the lowest WV (or MI) value of all examined segmentations and WV is the WV (or MI) value of the current segmentation (Espindola et al., 2006). The F-measure is given as:

$$F_{opt} = (1 + a^2)\,\frac{WV_n \cdot MI_n}{a^2 \cdot WV_n + MI_n} \qquad (5)$$

where a is a parameter weighting the relative influence of the two normalized metrics.
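As an illustration of equations (3)-(5), a small Python sketch of the USPO scoring (the function name and array inputs are hypothetical; in the processing chain this step is handled by the "i.segment.uspo" GRASS module):

```python
import numpy as np

def uspo_fscore(wv, mi, a=1.0):
    """Score candidate segmentations with eqs. (3)-(5): normalize WV and MI,
    then combine them; the candidate with the highest F-score is selected."""
    wv, mi = np.asarray(wv, float), np.asarray(mi, float)
    mi_n = (mi.max() - mi) / (mi.max() - mi.min())   # eq. (3)
    wv_n = (wv.max() - wv) / (wv.max() - wv.min())   # eq. (4)
    denom = a**2 * wv_n + mi_n
    # eq. (5); guard against the zero denominator of the worst candidate
    return np.divide((1 + a**2) * wv_n * mi_n, denom,
                     out=np.zeros_like(denom), where=denom > 0)

# e.g. best = int(np.argmax(uspo_fscore(wv_values, mi_values)))
```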
3000 objects were randomly sampled and visually labelled over the training region. The F-measure calculations for the candidate segmentations were performed through the "i.segment.uspo" module in GRASS (Lennert et al., 2016). Several descriptive spectral features (minimum, maximum, range, mean, median, first and third quartiles, standard deviation, variance) were calculated on each segment for each of the RGB bands, along with geometrical covariates (compactness, fractal dimension, perimeter, area). Feature selection was performed to retain the most informative features for training the classification models (Georganos et al., 2017). The selected features were then used as input to several state-of-the-art supervised machine learning (ML) classifiers, namely Extreme Gradient Boosting (Xgboost), Random Forest (RF) and Support Vector Machines (SVM). The classifiers' parameters were optimized through Bayesian optimization for Xgboost (Georganos et al., 2018a), minimization of the out-of-bag (OOB) error for RF, and 10-fold cross-validation for SVM. For brevity, the segment classifications with SVM, RF and Xgboost are abbreviated as SVM-SEG, RF-SEG and XGB-SEG respectively.

2.4. Refining FCN maps using GEOBIA segments

This step involves overlaying the segments generated in Section 2.3 on the classified map from the FCN. Each segment is then labelled with the majority class of the pixels within it (Zhao et al., 2017). This approach, abbreviated as FCN-SEG, is illustrated in FIGURE 3, with a sketch of the majority-vote step after the figure.
FIGURE 3: Illustration of the FCN-SEG approach. In (a), the segments from GEOBIA are overlaid on the raw image; in (b), the majority class of pixels from the classified FCN map is assigned to each segment; and in (c), the final refined map is shown. Classes: BD-building, VG-vegetation, BS-bare soil, IS-impervious surface, SH-shadows.
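The majority-vote step can be expressed compactly; the sketch below assumes the FCN class map and the GEOBIA segments are co-registered integer rasters, with segment identifiers that are contiguous and 0-based (the function name is ours):

```python
import numpy as np

def refine_with_segments(fcn_labels, segment_ids, n_classes=5):
    """Relabel each GEOBIA segment with the majority FCN class of its pixels."""
    seg = segment_ids.ravel()
    lab = fcn_labels.ravel()
    n_segments = int(seg.max()) + 1
    # counts[s, k] = number of pixels of class k falling inside segment s
    counts = np.bincount(seg * n_classes + lab,
                         minlength=n_segments * n_classes
                         ).reshape(n_segments, n_classes)
    majority = counts.argmax(axis=1)      # majority class per segment
    return majority[segment_ids]          # broadcast back onto the raster
```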
3. Results

Accuracy assessment is carried out on an independent test set of 1000 points drawn from Tile B. In TABLE 1, the fully convolutional network with skip architecture has a producer accuracy (PA) of 94.78% for buildings, while the highest PA for the GEOBIA approach (i.e. XGB-SEG) is 90.23%. The FCN and FCN-SEG methods have high PA in the building and impervious surface classes. We observe that refining the FCN with segments slightly improves the PA of the building and impervious surface classes, although it does not result in a significant change in the overall accuracy (OA). On the other hand, SVM-SEG, RF-SEG and XGB-SEG have high PA for the bare soil and shadows classes. All evaluated approaches reach high overall accuracies on the independent test set; the lowest, 87.82%, is obtained by SVM-SEG.

TABLE 1: Producer accuracy and overall accuracy (OA) of the evaluated methods
Method | Building (BD) % | Vegetation (VG) % | Bare Soil (BS) % | Imp. Surface (IS) % | Shadows (SH) % | OA %
FCN | 94.78 | 96.55 | 75.21 | 93.27 | 83.84 | 91.24
FCN-SEG | 95.11 | 93.11 | 65.81 | 96.30 | 75.76 | 89.70
SVM-SEG | 87.62 | 92.52 | 80.34 | 88.88 | 85.86 | 87.82
RF-SEG | 89.58 | 93.68 | 82.91 | 87.88 | 92.93 | 89.33
XGB-SEG | 90.23 | 92.53 | 84.61 | 88.21 | 90.91 | 89.34
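For reference, producer and overall accuracy can be computed directly from a confusion matrix; a minimal sketch, assuming the 1000 test points are available as integer label arrays (the function name is ours):

```python
import numpy as np

def producer_and_overall_accuracy(y_true, y_pred, n_classes=5):
    """Producer accuracy per class and overall accuracy on a test set."""
    cm = np.zeros((n_classes, n_classes), int)
    np.add.at(cm, (y_true, y_pred), 1)         # cm[i, j]: true i, predicted j
    producer = cm.diagonal() / cm.sum(axis=1)  # correct / reference totals
    overall = cm.diagonal().sum() / cm.sum()
    return producer, overall
```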
Sample scenes from the classification maps of XGB-SEG, FCN and FCN-SEG, the methods with the highest overall accuracies, are presented in FIGURE 4 for illustration purposes. Visually, the maps from the three approaches are of high quality, as the classes are well distinguished. The FCN maps show some rounding of corners and smoothing of straight edges; however, these effects are less pronounced thanks to the use of a skip architecture. The GEOBIA maps (i.e. XGB-SEG) have better defined edges and corners, but are sensitive to the quality of the segmentation step, as shown in Rows (c) and (d). Moreover, our GEOBIA-based approach is sensitive to the corrugated iron roof structure and its orientation, as shown by the stripes present in the building class in Row (d). We also evaluated the impact of refining the FCN classified maps using segments from GEOBIA. In Row (c), it is evident that smoothed edges and rounded corners become well defined after refinement with segments. However, in Row (d), we observe that refining the FCN classified map with segments introduces stripes. The FCN map also misclassifies some building pixels as bare soil, as observed in Row (b).
[FIGURE 4 panels: rows (a)–(d) show sample scenes; columns show the RGB image and the XGB-SEG, FCN and FCN-SEG classifications.]
FIGURE 4: Classification maps of sample scenes for the XGB-SEG, FCN and FCN-SEG methods. BD-Building, VG-Vegetation, BS-Bare Soil, IS-Impervious surfaces, SH-Shadows.

4. Discussion and Conclusion

From the experiments conducted in this paper, high classification accuracy and good visual quality of the classified maps are observed for all evaluated approaches. The FCN learns spatial-contextual features directly from the raw input image in a hierarchical fashion, which gives it good generalization capability. The effect of smoothed edges and rounded corners in the FCN maps was minimized by using a skip architecture within the FCN. Skip architectures can recover the basic features learnt in the low convolutional layers by fusing the primitive features from the low layers with the abstract features of the deeper layers, hence the better edge delineation in the classified map (Chen et al., 2018).
The baseline method performed well because segments have homogeneous characteristics that are more descriptive than individual pixels (Blaschke et al., 2014). The heterogeneous roof structure, with different roofing materials and surface deposits such as dust and rust, could explain some of the misclassification in the GEOBIA approach.

Lastly, we investigated the effect of refining the FCN classification result using segments generated from GEOBIA. In most cases, the boundary definition of classes such as buildings improved. However, the refinement is sensitive to the quality of the segmentation, especially in areas of under-segmentation, and hence to the tuning of the segmentation parameters (Blaschke et al., 2014). In addition, better performance could be achieved if the FCN classification itself contained fewer misclassified pixels. In Marmanis et al. (2018), an edge detector is explicitly incorporated into the FCN classification pipeline. We suggest that future work investigate the explicit incorporation of information from the segmentation into the FCN classification.

In this work, we have investigated the utility of deep fully convolutional networks for the classification of VHR aerial images of an urban environment in Goma, the Democratic Republic of Congo. To the best of our knowledge, it is the first application of FCN, and comparison with GEOBIA, for multi-class classification of an urban environment in an African city. We also compared the classification results to a state-of-the-art semi-automatic GEOBIA processing chain, evaluating accuracy on an independent tile from which no training samples were drawn. Lastly, we demonstrated how to improve the boundary definition of FCN classification results by using segments, which was beneficial for the building and impervious surface classes. However, the impact of the quality of the segmentation as a tool for improving the FCN results needs further research. With some post-processing, the classified maps can be used in subsequent tasks such as socio-economic mapping. Future work will investigate how to explicitly incorporate segmentation results into the deep learning framework of an FCN, and will further evaluate strategies for the transferability of the approach.

Acknowledgements: We acknowledge the Royal Museum for Central Africa (RMCA) for providing the imagery of Goma. The work presented in this paper is funded by the Belgian Science Policy Office (BELSPO) in the frame of the PASTECA project (BR/165/A3/PASTECA).

References

Bergado, J. R., Persello, C., & Gevaert, C. (2016). A deep learning approach to the classification of sub-decimeter resolution aerial images. In 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) (pp. 1516–1519). Beijing.

Blaschke, T. (2010). Object based image analysis for remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 65(1), 2–16.

Blaschke, T., Hay, G. J., Kelly, M., Lang, S., Hofmann, P., Addink, E., … Tiede, D. (2014). Geographic Object-Based Image Analysis – Towards a new paradigm. ISPRS Journal of Photogrammetry and Remote Sensing, 87, 180–191.

Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010 (pp. 177–186). Physica-Verlag HD.

Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2018). DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 1–14.

Espindola, G. M., Camara, G., Reis, I. A., Bins, L. S., & Monteiro, A. M. (2006). Parameter selection for region-growing image segmentation algorithms using spatial autocorrelation. International Journal of Remote Sensing, 27(14), 3035–3040.

Georganos, S., Grippa, T., Vanhuysse, S., Lennert, M., Shimoni, M., Kalogirou, S., & Wolff, E. (2017). Less is more: Optimizing classification performance through feature selection in a very-high-resolution remote sensing object-based urban application. GIScience & Remote Sensing, 1–22.

Georganos, S., Grippa, T., Vanhuysse, S., Lennert, M., Shimoni, M., & Wolff, E. (2018a). Very high resolution object-based land use-land cover urban classification using extreme gradient boosting. IEEE Geoscience and Remote Sensing Letters, 15(4).

Georganos, S., Lennert, M., Grippa, T., Vanhuysse, S., Johnson, B., & Wolff, E. (2018b). Normalization in unsupervised segmentation parameter optimization: A solution based on local regression trend analysis. Remote Sensing, 10(2), 222.

Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 9, 249–256.

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Retrieved from http://www.deeplearningbook.org

Grippa, T., Lennert, M., Beaumont, B., Vanhuysse, S., & Stephenne, N. (2017). An open-source semi-automated processing chain for urban object-based classification. Remote Sensing, 9(4).

Guirado, E., & Tabik, S. (2017). Deep-learning versus OBIA for scattered shrub detection with Google Earth imagery: Ziziphus lotus as case study. Remote Sensing, 9(12), 1–22.

Hay, G. J., & Castilla, G. (2008). Geographic Object-Based Image Analysis (GEOBIA): A new name for a new discipline. In T. Blaschke, S. Lang, & G. J. Hay (Eds.), Object-Based Image Analysis. Lecture Notes in Geoinformation and Cartography. Springer, Berlin, Heidelberg.

Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 1–11.

Johnson, B., & Xie, Z. (2011). Unsupervised image segmentation evaluation and refinement using a multi-scale approach. ISPRS Journal of Photogrammetry and Remote Sensing, 66(4), 473–483.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25 (pp. 1097–1105). Curran Associates, Inc.
Längkvist, M., Kiselev, A., Alirezaie, M., & Loutfi, A. (2016). Classification and segmentation of satellite orthoimagery using convolutional neural networks. Remote Sensing, 8(4), 329.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

Lennert, M., & GRASS Development Team. (2016). Addon i.segment.uspo. Geographic Resources Analysis Support System (GRASS) Software, Version 7.3.

Liu, T., & Abd-Elrahman, A. (2018). An object-based image analysis method for enhancing classification of land covers using fully convolutional networks and multi-view images of small unmanned aerial system. Remote Sensing, 10(3).

Liu, T., Abd-Elrahman, A., Morton, J., & Wilhelm, V. L. (2018). Comparing fully convolutional networks, random forest, support vector machine, and patch-based deep convolutional neural networks for object-based wetland mapping using images from small unmanned aircraft system. GIScience & Remote Sensing, 55(2), 243–264.

Marmanis, D., Schindler, K., Wegner, J. D., Galliani, S., Datcu, M., & Stilla, U. (2018). Classification with an edge: Improving semantic image segmentation with boundary detection. ISPRS Journal of Photogrammetry and Remote Sensing, 135, 158–172.

Michellier, C., Pigeon, P., Kervyn, F., & Wolff, E. (2016). Contextualizing vulnerability assessment: A support to geo-risk management in central Africa. Natural Hazards, 82(1), 27–42.

Momsen, E., Metz, M., & GRASS Development Team. (2015). Module i.segment. Geographic Resources Analysis Support System (GRASS) Software, Version 7.0. Open Source Geospatial Foundation.

Neteler, M., Bowman, M. H., Landa, M., & Metz, M. (2012). GRASS GIS: A multi-purpose open source GIS. Environmental Modelling & Software, 31, 124–130.

Persello, C., & Stein, A. (2017). Deep fully convolutional networks for the detection of informal settlements in VHR images. IEEE Geoscience and Remote Sensing Letters, 14(12), 2325–2329.

R Core Team. (2015). R: A language and environment for statistical computing. Vienna, Austria. Retrieved from https://www.r-project.org/

Shelhamer, E., Long, J., & Darrell, T. (2017). Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 640–651.

Sherrah, J. (2016). Fully convolutional networks for dense semantic labelling of high-resolution aerial imagery, 1–22. Retrieved from http://arxiv.org/abs/1606.02585

Theano Development Team. (2016). Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs/1605.02688. Retrieved from http://arxiv.org/abs/1605.02688

Yu, F., & Koltun, V. (2016). Multi-scale context aggregation by dilated convolutions. In International Conference on Learning Representations (pp. 1–13).

Zhao, W., Du, S., Wang, Q., & Emery, W. J. (2017). Contextually guided very-high-resolution imagery classification with semantic segments. ISPRS Journal of Photogrammetry and Remote Sensing, 132, 48–60.