In: Stilla U, Gamba P, Juergens C, Maktav D (Eds) JURSE 2011 - Joint Urban Remote Sensing Event --- Munich, Germany, April 11-13, 2011

Automatic generation of land-use maps for a spatial decision support system for Puerto Rico

Johannes van der Kwast, Josefien Delrue, Luc Bertels, Inge Uljee, Stijn Van Looy, Joan Schepens and Guy Engelen
Unit Environmental Modelling, VITO, Mol, Belgium. Email: [email protected]

Elias Gutiérrez
Graduate School of Planning, University of Puerto Rico, San Juan, Puerto Rico. Email: [email protected]

Glenda Román
Geographic Mapping Technologies Corp., San Juan, Puerto Rico. Email: [email protected]

Abstract— This study proposes an automatic image processing procedure to facilitate regular updating of the land-use map of Puerto Rico, which is a key dataset for the Xplorah Planning Support Systems. The procedure is based on the contextual reclassification of digital high-resolution aerial photographs that were preclassified using a decision tree classifier. For the contextual reclassification, the Optimized Spatial Reclassification Kernel (OSPARK) is used, which is able to discriminate functional land-use classes and land cover based on the configuration of objects in a kernel. A unique property of OSPARK is that it automatically adapts the kernel size as a function of spatial variation in the neighborhood of each pixel to be classified. The processing chain has been implemented on a computer cluster, which enables parallel processing. Classification results were evaluated using independent land-use data derived from visual interpretation. The procedure gives good classification results for the tiles that are used to train the algorithm, but extrapolation to other tiles resulted in much lower accuracies. Error sources have been identified and suggestions for improvement are given.

I. INTRODUCTION

The Xplorah Planning Support Systems, developed for the Puerto Rico Planning Board, enables planners and policy makers to forecast land-use changes as the result of various scenarios and to assess alternative planning and policy options in their fully integrated, dynamic and spatial context. The quality of the land use predicted by Xplorah, as by other land-use change models, relies heavily on the availability of high quality geographically referenced data. A high quality time series of land-use maps is necessary for calibration, validation and updating of the model. Land-use maps, however, are often lacking. Even if time series are available, inconsistencies in mapping methodologies, legends and scales often induce measured land-use changes that do not represent actual changes in land-use patterns. Furthermore, land-use maps are mainly derived from manual mapping, which is time-consuming and expensive.

This study evaluates the feasibility of using an automatic image processing procedure to update the land-use map to be used in Xplorah. The aim is to automatically derive land-use maps at 60 m resolution from digital aerial photographs with a classification accuracy of ≥66%. The procedure proposed in this study uses a contextual reclassification algorithm applied to a preliminary classification of digital aerial photographs. The processing chain has been implemented on a computer cluster, which enables parallel processing.

II. REMOTE SENSING AND GIS DATA

In the period from October to December 2009, thousands of multispectral images were acquired over Puerto Rico using the ADS40 SH52 digital image sensor of Fugro Earthdata, Inc. Each frame covers 10K by 10K pixels in four spectral bands (red, green, blue and near-infrared). Flying at an altitude of 2900 m, a ground resolution of 0.3 m was obtained. Histogram matching was applied during image pre-processing to ensure that all images have a comparable reflectance.

The reference land-use data consist of the Xplorah 2010 land-use map at 60 m resolution, which has been developed as part of the Xplorah project [1]. The map is derived by means of visual interpretation of remote sensing data, supplemented with ancillary datasets. The reported accuracy of the Xplorah 2010 land-use map is 97%, although it should be noted that this land-use map is a representation of reality with inherent uncertainties that are difficult to quantify. The goal of the remote sensing based classification, however, is to produce land-use maps similar to the Xplorah land-use map with higher temporal availability and at lower cost. Therefore, the statistics derived from the comparison between the automatic classification and the reference map do not necessarily reflect disagreement with reality.
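The histogram matching step mentioned above is not specified further in this paper. As a minimal sketch of the general technique, assuming a single band given as a flat list of grey values and a simple lookup based on cumulative frequencies (the function names `cumulative_frequencies` and `match_histogram` are illustrative, not part of the processing chain), it could look as follows in Python:

```python
from bisect import bisect_left
from collections import Counter


def cumulative_frequencies(values):
    """Return the sorted grey levels and their cumulative relative frequencies."""
    counts = Counter(values)
    levels = sorted(counts)
    total = len(values)
    running = 0
    cdf = []
    for level in levels:
        running += counts[level]
        cdf.append(running / total)
    return levels, cdf


def match_histogram(source, reference):
    """Map each source grey level to the first reference level whose
    cumulative frequency is at least that of the source level."""
    src_levels, src_cdf = cumulative_frequencies(source)
    ref_levels, ref_cdf = cumulative_frequencies(reference)
    lookup = {}
    for level, p in zip(src_levels, src_cdf):
        i = min(bisect_left(ref_cdf, p), len(ref_levels) - 1)
        lookup[level] = ref_levels[i]
    return [lookup[v] for v in source]


# A dark image is mapped onto the grey levels of a brighter reference image.
dark = [10, 10, 20, 30]
bright = [100, 100, 150, 200]
print(match_histogram(dark, bright))  # [100, 100, 150, 200]
```

For a four-band image, the same mapping would be applied to each band separately. The lookup rule used here is only one of several common variants.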

978-1-4244-8657-1/11/$26.00 ©2011 IEEE


III. THE OSPARK ALGORITHM

The Optimized SPARK (OSPARK) algorithm [2] is a contextual reclassifier based on the Spatial Reclassification Kernel (SPARK) [3]. Contextual reclassifiers are based on the concept that information captured in neighboring cells, or information about patterns surrounding the pixel of interest, may provide useful supplementary information in the classification process [4]. Previous research [3] has demonstrated a strong relationship between the spatial structure of urban areas and their functional characteristics. The SPARK algorithm examines the local spatial patterns of land cover in a square kernel, or moving window, and classifies the center pixel based on the arrangement of adjacent pixels. OSPARK extends SPARK in that it automatically adapts the kernel size to the spatial variation detected around the pixel to be classified.

The classification consists of three phases [3]:
1) Producing a land-cover map from a remotely sensed image using any type of pixel-based spectral classifier, further referred to as the 'initial land-cover map';
2) Defining decision rules based on local spatial patterns of land cover in typical land-use types;
3) Reclassifying the initial land-cover map into land-use types based on the decision rules of phase 2.

Fig. 1 shows the flowchart of the OSPARK algorithm. The algorithm derives adjacency event matrices $\mathbf{M}$ by counting the frequency of the pixel-based classes positioned next to each other, as well as diagonally, within each template kernel. Next, the $\mathbf{M}$-matrices are compared with template matrices $\mathbf{T}_k$ that are derived from kernels that are representative of the land-use classes to be derived. The similarity index, $\Delta_k$, is used as a goodness-of-fit measure:

$$\Delta_k = 1 - \sqrt{0.5 \cdot N^{-2} \cdot \sum_{i=1}^{c} \sum_{j=1}^{c} \left( m_{ij} - t_{kij} \right)^2} \tag{1}$$

where $m_{ij}$ is the adjacency event in a $c$ by $c$ matrix $\mathbf{M}$, $t_{kij}$ is the adjacency event in a $c$ by $c$ matrix $\mathbf{T}_k$, which is a template matrix for land-use class $k$, $N$ is the total number of adjacency events in the kernel and $c$ is the number of classes in the per-pixel classified input map. $\Delta_k$ can range from 0 to 1. If $\Delta_k$ equals 0, $\mathbf{M}$ is completely different from $\mathbf{T}_k$, while a value of 1 means that they are identical.

OSPARK iteratively calculates the similarity index for kernel sizes with an apothem, i.e. the distance from the center pixel to a side of a square kernel, from 1 to $W$ pixels. The resulting stack of $W$ similarity maps is analyzed by an integration operator, which assigns to each pixel the class that corresponds with the optimal $\Delta_k$-value. The optimal $\Delta_k$-value is determined based on two possible cases for the evolution of $\Delta_k$ with increasing kernel size [2]:
1) If local maxima are present, the first local maximum above a user-defined minimum $\Delta_k$-threshold value is determined and the corresponding land-use class is assigned;
2) If local maxima are absent, the curve converges to $\Delta_k \approx 1$ and the integration operator assigns the class to the pixel when the $\Delta_k$-value changes by less than 0.05 between consecutive iterations and is higher than the threshold value.

The threshold prevents classification of pixels with too low a $\Delta_k$-value. The outputs of the algorithm are the derived land-use map and a map containing, for each pixel, the $\Delta_k$-value corresponding to the optimal kernel size.

Fig. 1. Flowchart of the OSPARK algorithm. The shaded part shows the original SPARK algorithm that is iterated for a range of kernel sizes in the OSPARK algorithm [5].

IV. THE PROCESSING CHAIN

Fig. 2 shows the workflow for the automatic classification of the orthorectified aerial photographs of 2009. The procedure consists of preprocessing, building the template database, running OSPARK in batch on a computer cluster, post-processing and accuracy assessment.

Fig. 2. Flowchart of the classification procedure. [Flowchart boxes: Orthophotos 2010, 10k x 10k tiles @ 0.3 m; Resampling, conversion to IDL, histogram matching; Initial land-cover map, 1k x 1k tiles @ 3 m; Initial land-cover map of training tiles; Conversion to PCRaster, GDAL retile; Initial land-cover map, 3k x 3k tiles @ 3 m; Sample templates; Template database; OSPARK algorithm; OSPARK land-use map, 3k x 3k tiles @ 3 m; OSPARK land-use map of training tiles; Resampling, reclass; Mosaick; OSPARK land-use map @ 15 m, @ 60 m, @ 240 m; Reference land-use map @ 15 m, @ 60 m, @ 240 m; Validation.]

A. Preprocessing

First, the 1500 orthophoto tiles were converted in batch to the IDL ENVI image format and resampled to tiles of 1000 by 1000 pixels with 3 m resolution. Next, each tile, containing blue, green, red and near-infrared channels, was classified using a decision tree classification. The decision tree classifier is an unsupervised classification method that performs a multistage classification by using a series of binary decisions in order to cluster pixels. This procedure resulted in 25 classes. A resolution of 3 m was considered optimal for the initial land-cover map, as objects could be clearly defined at this resolution while noise introduced by unnecessary spatial detail was avoided.

The initial land-cover map was retiled to tiles of 3000 by 3000 pixels and converted to the PCRaster format, which is the input format for the OSPARK algorithm. Open-source utilities distributed with the Geospatial Data Abstraction Library (GDAL, http://www.gdal.org) were used for this step. This tile size was considered optimal, since smaller tiles would result in many missing values after the OSPARK classification, while larger tiles could cause the system to run out of memory. Separate tiles were calculated to cover the tile edges that have missing values after the OSPARK classification. In total, 178 tiles of 3000 by 3000 pixels, 150 tiles of 100 by 3000 pixels (rows by columns) and 166 tiles of 3000 by 100 pixels were used to classify the entire Commonwealth of Puerto Rico.

B. Building the template database

The OSPARK algorithm needs a database of representative template matrices. For this purpose, three tiles of 3000 by 3000 pixels were selected (Fig. 3). These training tiles were selected to include the most important land-use classes involved in urban dynamics, but also to represent natural and rural land-use classes. The center coordinates of the template kernels were derived by stratified random sampling of 50 points within each class of the land-use map. The same procedure was followed to derive an independent set of pixels for evaluation of the contextual classification of the tiles.

In order to check the quality of the selected templates and their transferability to different areas, different combinations of templates were used in the OSPARK classifications of the three tiles. The resulting maps were evaluated using contingency matrices with the independent reference data sampled from the Xplorah 2010 land-use map. Based on the quality of the derived land-use maps, templates were selected or removed from the database. The final set of templates was used to classify all tiles.

Fig. 3. Training tiles for building the OSPARK template database. 1 = urbanized area (San Juan), 2 = natural/rural area (around Bosque Estatal de Monte Guilarte), 3 = urbanized area (Mayagüez). The background map shows the OSPARK classification result. [Legend: Natural; Beach; Forest; Coral reef; Agriculture; Water resources; Construction; Public and recreation; Mining; Utilities; Industrial; Infrastructure; High-density trade and services; Rocky cliffs and shelves; High-density residential; Rangelands; Forest reserves; Low-density trade and services; Mangroves and swamps; Low-density residential; Sea.]

C. Implementation on a computer cluster

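The unit of work that is distributed over the cluster is the SPARK similarity computation of Eq. (1), repeated for every kernel and template. The following Python sketch illustrates that computation for a single kernel. It assumes that class labels are integers from 0 to c-1 and that each pair of adjacent pixels is counted once regardless of order; this counting convention and the function names are assumptions made for illustration, not details taken from the OSPARK implementation.

```python
import math

# Offsets to the right, down and the two downward diagonals, so that every
# pair of adjacent pixels in the kernel is visited exactly once.
NEIGHBOUR_OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def adjacency_matrix(kernel, n_classes):
    """Count adjacency events between class labels in a square kernel.

    The count for an unordered pair of classes (a, b) is stored at
    m[min(a, b)][max(a, b)]; this storage convention is an assumption.
    """
    size = len(kernel)
    m = [[0] * n_classes for _ in range(n_classes)]
    for r in range(size):
        for c in range(size):
            for dr, dc in NEIGHBOUR_OFFSETS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < size and 0 <= cc < size:
                    a, b = sorted((kernel[r][c], kernel[rr][cc]))
                    m[a][b] += 1
    return m


def similarity(m, t):
    """Similarity index of Eq. (1) between a kernel matrix m and a template t."""
    n = sum(map(sum, m))  # total number of adjacency events N
    squared = sum((mij - tij) ** 2
                  for m_row, t_row in zip(m, t)
                  for mij, tij in zip(m_row, t_row))
    return 1.0 - math.sqrt(0.5 * squared / n ** 2)


# Two homogeneous 3 by 3 kernels (apothem 1) in a two-class map.
all_class_0 = adjacency_matrix([[0] * 3] * 3, 2)
all_class_1 = adjacency_matrix([[1] * 3] * 3, 2)
print(similarity(all_class_0, all_class_0))  # 1.0, identical matrices
print(similarity(all_class_0, all_class_1))  # 0.0, completely different
```

With this convention a 3 by 3 kernel contains 20 adjacency events (12 between edge neighbours and 8 between diagonal neighbours), and the two extreme cases reproduce the bounds of 1 and 0 stated for the similarity index.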

The OSPARK algorithm was applied to all tiles covering Puerto Rico using the template database derived with the procedure described in the previous section. After general preprocessing, consisting of preparing the input tiles and obtaining a good set of templates ($\mathbf{T}_k$) for the database, OSPARK is run on a computer cluster. The cluster hardware consists of a server with a dual core Intel Xeon CPU (2.8 GHz) and 1 GB of RAM. The 19 nodes of the cluster each consist of two Intel Xeon CPUs and between 4 and 12 GB of RAM, which allows the parallel execution of up to 144 jobs. In the current set-up of the algorithm, the maximum kernel apothem ($W$) was set to 30 pixels, which is a trade-off between calculation time and classification accuracy. With this configuration, four tiles can be processed in parallel on the cluster.

The OSPARK algorithm applied to each tile consists of:
1) Loading the proper tile and template database;
2) Parallel execution of SPARK for apothems ranging from 1 to $W$ pixels, where $W$ = 30 in this case;
3) Running the integration operator that estimates the optimal class for each cell based on the stack of similarity maps, and resampling the output from 3 m to 60 m cells using a majority filter of 120 m.

In step 3, the ocean and forest reserves classes are also copied from the Xplorah 2010 land-use map to the OSPARK classification, because the ocean class does not show much dynamics and the forest reserves class is determined by policy decisions and zoning documents rather than by morphology or reflective properties of the landscape. Therefore, it is not feasible to derive the latter class by means of remote sensing techniques.

D. Postprocessing

After all tiles of all four sections are calculated, a general postprocessing routine mosaics all the classified tiles into land-use maps of Puerto Rico at 60 m resolution.

V. RESULTS

A. OSPARK results for training tiles

Analysis of the contingency matrices of the classification of the three tiles shows that the kappa and overall accuracy of the classification of the training tiles are not always higher than 66%. The producer's and user's accuracies of the individual classes show that some classes can be retrieved at an accuracy higher than 66%, while others are classified with a lower accuracy. The results vary per training tile. In training tile 1, the classes construction, mining, residential, sea, beach, water resources and utilities have a producer's and user's accuracy higher than 0.5. Other classes show a higher level of confusion. Training tile 2 shows a better result, but many classes are not present in this scene, which mainly covers an agricultural and forested area. Good results were obtained for the classes forest, trade and services, residential, water resources, public and recreation, and rangelands. For training tile 3, good results were obtained for the urban classes construction, industry, residential, public and recreation, utilities, and infrastructure. In addition, good results were obtained for the non-urban classes forest, agriculture, mangroves and swamps, sea, beach, water resources and rangelands.

An optimal database of templates was derived by trial and error based on the analysis of these three tiles. The optimal database was used to classify the entire Commonwealth of Puerto Rico.

B. OSPARK results for all tiles

All tiles were processed by the computer cluster in approximately one month (Fig. 3). The overall accuracy is 66% and the kappa value is 0.57. These figures are, however, biased upwards by the large area of sea and forest reserves, which are not classified by OSPARK but taken directly from the Xplorah 2010 land-use map. A more detailed analysis of the accuracy reveals that most classes have a low user's and producer's accuracy. Exceptions are the relatively high user's and producer's accuracies for the forest and residential classes. Water resources and public and recreation facilities can be derived with an acceptable user's accuracy, although their producer's accuracy is low.

VI. DISCUSSION AND CONCLUSIONS

In this study the feasibility of using a fully automated land-use classification procedure applied to high resolution remote sensing images has been investigated.
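For reference, the accuracy measures reported in Section V (overall accuracy, kappa, and producer's and user's accuracy) can all be derived from a contingency matrix. The Python sketch below uses the standard definitions and assumes that rows hold the classified map and columns the reference map; the function name and the two-class example are illustrative only and do not reproduce data from this study.

```python
def accuracy_measures(matrix):
    """Compute accuracy measures from a square contingency matrix.

    matrix[i][j] is the number of samples classified as class i whose
    reference class is j (rows = classification, columns = reference).
    Kappa is undefined when the chance agreement equals 1.
    """
    n_classes = len(matrix)
    total = sum(map(sum, matrix))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(row[j] for row in matrix) for j in range(n_classes)]
    diagonal = [matrix[i][i] for i in range(n_classes)]

    overall = sum(diagonal) / total
    # Agreement expected by chance, from the marginal totals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - chance) / (1 - chance)
    # User's accuracy: share of each mapped class that is correct.
    users = [d / r if r else None for d, r in zip(diagonal, row_totals)]
    # Producer's accuracy: share of each reference class that is retrieved.
    producers = [d / c if c else None for d, c in zip(diagonal, col_totals)]
    return overall, kappa, users, producers


overall, kappa, users, producers = accuracy_measures([[40, 10], [10, 40]])
print(overall)           # 0.8
print(round(kappa, 3))   # 0.6
```

Because kappa corrects the overall accuracy for chance agreement, large classes that are copied from the reference map raise both figures without reflecting the performance of the classifier itself.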
A processing chain has been described for (1) preprocessing the aerial photographs, (2) performing a pre-classification of the blue, green, red and near-infrared channels of the orthomosaic based on a decision tree classification, (3) training the OSPARK algorithm using three training tiles covering important land-use types, and (4) running the algorithm on a computer cluster in order to reduce calculation times by parallel processing of the kernels.

Results of the classification procedure were compared with the Xplorah 2010 land-use classification, which has a reported overall accuracy of 97%. Although the results for the individual training tiles were promising and acceptable for most land-use classes, the application of the algorithm to the entire Commonwealth of Puerto Rico resulted in a much lower accuracy for most classes. Classes that can be inferred with an acceptable accuracy using the proposed procedure are forest, residential, water resources, and public and recreation. The overall accuracy was 66%. This value is, however, biased by the sea and forest reserves classes, which were not derived by the OSPARK classification but copied from the reference map.

The errors in the classification can be attributed to different sources. The main source of error is the template database. Although the templates in the database gave good results for the three training tiles, the results for the entire Commonwealth of Puerto Rico indicate that the templates were not representative of all tiles and could not be extrapolated. Further research should focus on better training of the template database, using statistical or machine learning techniques. It should also be investigated whether it is feasible to classify Puerto Rico with only one representative set of templates or whether a spatial stratification would yield better classification results.

Another source of error could be the maximum kernel size, the choice of which is a trade-off between calculation time and accuracy. Furthermore, the resolution of 3 m chosen for the initial land-cover map has an impact on the detection of homogeneous objects and consequently on the configuration of objects within a kernel to be classified by OSPARK. This problem is aggravated by the comparison of the automatically interpreted land-use map with the Xplorah 2010 land-use map, which is generated at 15 m resolution by means of visual interpretation. Based on human insight, the visual interpretation generalizes areas featuring a salt-and-pepper structure into the most meaningful land uses covering larger, contiguous areas, while the automatic classification considers the individual cells as meaningful contributors to each template analyzed. Examples of such generalizations are described in [1]. Other errors could be introduced by the histogram matching of the aerial photographs, which might cause a different illumination in the different regions. Future studies should also investigate these causes of inaccuracy.

In general, it can be concluded that the automatic derivation of 18 land-use classes by means of remote sensing techniques remains a challenge. The proposed processing chain, however, can contribute to more advanced methods of classification that can shorten the time interval between successive land-use maps, while reducing production costs compared to labor-intensive manual map production.
ACKNOWLEDGMENT

The research presented in this paper is funded by the Graduate School of Planning / University of Puerto Rico in the frame of the Xplorah project. The reference land-use data were made available by GMT Corp.

REFERENCES

[1] G. Román, A. Castro, and E. Carreras, "Generation of land-use maps required for the implementation phase of a spatial decision support system for Puerto Rico: Xplorah 2010 land-use map," Geographic Mapping Technologies Corporation, San Juan, Puerto Rico, Tech. Rep., 2010.
[2] J. van der Kwast, T. van de Voorde, F. Canters, I. Uljee, S. van Looy, and G. Engelen, "Inferring urban land use using the optimised spatial reclassification kernel (OSPARK)," Environmental Modelling & Software, in review.
[3] M. Barnsley and S. Barr, "Inferring urban land use from satellite sensor images using kernel-based analysis and classification," Photogramm. Eng. Rem. S., vol. 62, no. 8, pp. 949–958, 1996.
[4] S. M. de Jong and F. van der Meer, Remote sensing image analysis, including the spatial domain, ser. Remote Sensing and Digital Image Processing, vol. 5. Kluwer Academic Publishers, 2004.
[5] J. van der Kwast, T. van de Voorde, F. Canters, G. Engelen, and C. Lavalle, "Using remote sensing derived spatial metrics for the calibration of land-use change models," in IEEE Proceedings of the 7th International Urban Remote Sensing Conference (URS 2009). Shanghai: IEEE, 2009.