AUTOMATED BATCH PROCESSING OF MASS REMOTE SENSING AND GEOSPATIAL DATA TO MEET THE NEEDS OF END USERS

Jianjun Zhao1, Yeqiao Wang2,*, Hongyan Zhang1

1. School of Urban and Environmental Science, Northeast Normal University, Changchun, China
2. Department of Natural Resources Science, University of Rhode Island, Kingston, RI 02881, USA
* Corresponding author: [email protected]

ABSTRACT

In this paper we describe a framework for a GIS-based system that automates the processing of mass remote sensing and geospatial data products, as a step in preparing the data for the needs of end users. In particular, we employed the GDAL data model, Python and the GDAL library to convert HDF5 format data into standard GIS formats. We then batch processed all the data into a targeted data type using Python code. Finally, we integrated all related statistics of the data into Microsoft Excel worksheets or ASCII files using the C# programming language.

Index Terms - Mass remote sensing data, GDAL, HDF5, TOPS, Appalachian Trail.

1. INTRODUCTION

The rapid development of remote sensing science and technology, and the mass geospatial data it generates, pose challenges for the delivery of data from data providers to end users and for the use of those data. End users with limited data processing resources are particularly challenged by data in formats that are packaged for effective mass delivery but are not user friendly for simple office operations. For example, mass remote sensing data and geospatial data products are routinely packaged and delivered in the Hierarchical Data Format (HDF). Although HDF is very effective for managing extremely large and complex data, it can be a major challenge for end users who only need to perform simple routines, such as on-screen display, visualization and GIS mapping, using selected data from the package.
Working with HDF directly would require end users to own multiple specialized software packages and to have programming skills in data extraction and preparation. End users may prefer existing, widely used data analysis tools and software frameworks, even if those seem inefficient. To bridge this gap, we developed a set of geospatial data processing tools that integrate Python, GDAL, ArcGIS Engine and C# to extract data packaged and delivered in HDF. This paper explains the principal strategies we used for automated batch processing that converted and prepared mass data from HDF5 into user-preferred formats that common GIS and remote sensing image processing software systems can handle.

978-1-4577-1005-6/11/$26.00 ©2011 IEEE

2. DATA, GDAL AND PYTHON

2.1. Data

As part of the effort to develop a decision support system for monitoring, reporting and forecasting ecological conditions of the Appalachian Trail region [1], the project team employed multi-platform remote sensing data and data products provided by the Terrestrial Observation and Prediction System (TOPS) [2]. The mass data for the study area [1] were packaged in HDF5 format and include the following:

• Land Cover Dynamics (MOD12Q2), 2001-2006
• Snow Cover (MOD10A2), 2000-2009
• Land Cover Type (MOD12Q1), 2001-2004
• Vegetation Indices (MOD13A2), 2000-2009
• Leaf Area Index / FPAR (MOD15A2), 2000-2009
• NDVI (MOD13Q1), 2000-2009
• Land Surface Temperature (MOD11A2), 2000-2009
• GIMMS (Global Inventory Modeling & Mapping Studies) NDVI, 1981-2006
• NACP (North American Carbon Program) GPP, NPP and NEP, 1982-2006
• SOGS (Surface Observation Gridding System) 8-km resolution meteorological data, 1976-2009

Most users of such a decision support system have only routine office GIS skills and cannot work with the HDF5 data listed above directly.

2.2. HDF5

HDF, and HDF5 in particular, comes with a suite of supporting libraries for reading and writing data, together with working utilities.

IGARSS 2011

It makes possible the management of extremely large and complex data collections, as well as data preparation, delivery and packaging. For example, a sub-dataset in the SOGS data mentioned above contains more than ten thousand individual files that would otherwise be difficult to manage. However, end users who rely on basic remote sensing and GIS software for resource management and mapping would have difficulty handling HDF5 data directly. Among widely used GIS and image processing software systems, ArcGIS could not display HDF5 data until its most recent release, ArcGIS 10. Even with this update, end users may find it troublesome to extract one or more specific sub-datasets of interest from an HDF5 package. Therefore, upon receiving the mass data, we needed to develop a data processing routine that could extract the data embedded in the HDF5 files and convert them into a format popular with end users for simple uses such as visualization, mapping and analysis.

The HDF5 datasets we received include data of different origins with different spatial and temporal resolutions. For example, the SOGS package includes datasets of dewpoint temperature (DEWP), precipitation (PRECIP), daytime average shortwave radiation (SRAD), maximum temperature (TMAX), minimum temperature (TMIN) and daytime average vapor pressure deficit (VPD) from 1976 to 2009. In total, the HDF5 data package contains 365 data files for each year (one per calendar day) and about 12,000 data files for the 33 years. Before extraction, we therefore needed to know the names of the sub-datasets, the projection system used and the coordinate ranges.
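The sub-dataset names, projection and coordinate ranges can be collected with GDAL's Python bindings. The following is a minimal sketch and not the procedure used in this study. It assumes the osgeo.gdal module is installed, and the helper names list_subdatasets and describe_hdf5 are hypothetical. GDAL reports the sub-datasets of a container file in its "SUBDATASETS" metadata domain as numbered SUBDATASET_n_NAME / SUBDATASET_n_DESC pairs.

```python
def list_subdatasets(metadata):
    """Return (gdal_name, description) pairs from a SUBDATASETS metadata dict.

    GDAL numbers the keys from 1: SUBDATASET_1_NAME, SUBDATASET_1_DESC, ...
    """
    pairs = []
    n = 1
    while "SUBDATASET_%d_NAME" % n in metadata:
        pairs.append((metadata["SUBDATASET_%d_NAME" % n],
                      metadata.get("SUBDATASET_%d_DESC" % n, "")))
        n += 1
    return pairs


def describe_hdf5(path):
    """Print each sub-dataset's projection and x/y ranges (requires GDAL)."""
    from osgeo import gdal  # imported here so list_subdatasets works without GDAL

    container = gdal.Open(path)
    for name, desc in list_subdatasets(container.GetMetadata("SUBDATASETS")):
        sub = gdal.Open(name)
        # geotransform: (x origin, pixel width, rotation, y origin, rotation, pixel height)
        x0, dx, _, y0, _, dy = sub.GetGeoTransform()
        print(desc)
        print("  projection:", sub.GetProjection() or "(none reported)")
        print("  x range:", x0, x0 + dx * sub.RasterXSize)
        print("  y range:", y0, y0 + dy * sub.RasterYSize)
```

Running describe_hdf5 on a single SOGS file before writing a batch loop shows the sub-dataset names in the form that gdal_translate accepts. It also shows whether GDAL can read a projection from the file at all.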

2.3. GDAL and Python

The Geospatial Data Abstraction Library (GDAL) is a translator library for raster geospatial data formats. The Open Source Geospatial Foundation releases it under an X/MIT-style open source license. As a library, it presents a single abstract data model to the calling application for all supported formats. It also comes with a variety of useful command-line utilities for data translation and processing. GDAL bindings are available for Perl, Python, VB6, Java, C# and .NET. The FWTools distribution includes OpenEV, GDAL, MapServer, PROJ.4 and OGDI, as well as some supporting components [3]. Python offers a number of tools for working with GDAL. Python is also an open source language that has been integrated into ArcGIS. When Python-based geoprocessing tools are developed for ArcGIS, all tools share a common graphical user interface, and developers need to implement only the geospatial analysis tasks each tool performs [3]. We therefore selected Python as the programming language for processing the data.

3. METHODS

3.1 Batch Processing

We used GDAL to read and convert files and Python code to batch process the data. Figure 1 illustrates the flow and steps of automated batch processing of mass remote sensing and geospatial data. The GDAL library was used to batch convert HDF5 data into other GIS data formats, and ArcGIS Toolbox was used for batch geoprocessing in spatial statistics and analysis.
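The batch conversion described above can be sketched as follows. This is an illustration, not the authors' code. The extension-to-driver table, the helper names (hdf5_source, translate_args, convert_all) and the dry-run default are all assumptions. The sketch uses GDAL's HDF5:"file"://subdataset naming. It calls gdal_translate through subprocess with an argument list instead of a concatenated shell string.

```python
import os
import subprocess

# Hypothetical table: target file extension -> GDAL output driver.
# ".grd" is left out because several GDAL drivers write that extension.
DRIVERS = {".img": "HFA", ".tif": "GTiff", ".tiff": "GTiff", ".asc": "AAIGrid"}


def hdf5_source(h5_path, subdataset):
    """GDAL name of one HDF5 sub-dataset; quoting the file name lets GDAL
    parse paths that contain a colon, such as Windows drive letters."""
    return 'HDF5:"%s"://%s' % (h5_path, subdataset)


def translate_args(src, dst):
    """gdal_translate argument list, with the driver chosen from dst's extension."""
    ext = os.path.splitext(dst)[1].lower()
    return ["gdal_translate", "-of", DRIVERS[ext], src, dst]


def convert_all(jobs, dry_run=True):
    """Build one gdal_translate call per (src, dst) pair; run them unless dry_run."""
    commands = [translate_args(src, dst) for src, dst in jobs]
    if not dry_run:
        for cmd in commands:
            # an argument list (no shell) avoids quoting problems with spaces in paths
            subprocess.check_call(cmd)
    return commands
```

The dry-run default is a design choice of this sketch, not something the paper describes. It lets the full list of commands be reviewed before a batch that may run for days is started.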


Figure 1. The flow chart for automated batch processing.

Three steps were therefore taken to extract the data and complete the format conversions. The following code extracts datasets from the HDF5 files into ENVI format files.

Step 1: Extract sub-datasets from HDF5

1) Traverse all the files:

    import glob
    import os
    import subprocess

    def walk_dir(top_dir, subdataset="DEWP"):
        # os.walk visits top_dir itself and every folder below it
        for root, _dirs, _files in os.walk(top_dir):
            for name in glob.glob(os.path.join(root, "*.h5")):
                name_SOGS = name[:-3]  # path without the ".h5" suffix
                extract(name, name_SOGS, subdataset)

2) Extract the sub-dataset:

    def extract(name, name_SOGS, subdataset="DEWP"):
        src = 'HDF5:"%s"://%s' % (name, subdataset)
        dst = name_SOGS + subdataset + ".img"
        subprocess.check_call(["gdal_translate", "-of", "ENVI", src, dst])

The sub-dataset path "://DEWP" selects the DEWP sub-dataset. Other sub-datasets and other HDF5 data can be extracted the same way.

3) Output header file information: We used the following code to write the header information of the data in ENVI format with the identified projection system and coordinates, e.g., the "Lambert Azimuthal Equal Area" projection in this case.

    # append the projection to the ENVI header written by gdal_translate
    with open(name_SOGS + "DEWP.hdr", "a") as f:
        f.write("projection info = {11, 6378137.0, 6356752.3, 40.000000, "
                "-77.000000, 0.0, 0.0, WGS-84, Lambert Azimuthal Equal Area, "
                "units=Meters}\n")

Step 2: Extract daily data from yearly data

We used Python code to extract the daily data embedded in the yearly SOGS data and converted them into .img format.

1) Traverse the yearly files:

    def walk_daily(top_dir):
        for root, _dirs, _files in os.walk(top_dir):
            for name in glob.glob(os.path.join(root, "*DEWP.img")):
                extract_day(name, band=1)

2) Extract daily files:

    def extract_day(name, band=1):
        name_without_suffix = name[:-4]  # strip ".img"
        # -b selects one band of the yearly file; -of HFA writes Erdas Imagine .img
        subprocess.check_call(["gdal_translate", "-of", "HFA", "-b", str(band), name,
                               name_without_suffix + str(band) + ".img"])

Step 3: Run the Python code

Once the code was complete, the program was run in FWTools, which required specifying the path of the data folder. The Python code traversed all, or a specified subset of, the data files in a directory to automate the process [4][5].

3.2 ArcToolbox Batch Processing

We used ArcGIS ModelBuilder and ArcToolbox to build the model and create Python scripts. The example code uses the "Zonal Statistics As Table" tool in the Toolbox model to batch process the data. The following code lists all the raster data that need to be processed.

    import arcgisscripting

    gp = arcgisscripting.create()    # ArcGIS 9.x geoprocessor
    gp.CheckOutExtension("Spatial")  # the *_sa tools need Spatial Analyst
    # gp.Workspace, outws (output folder) and type_shp (subsection polygons) are set beforehand
    try:
        fcs = gp.ListRasters("", "img")
        fcs.reset()
        fc = fcs.next()
        while fc:
            TrimShp = fc[:-4]  # raster name without ".img"
            outputfc = outws + "/" + TrimShp + ".dbf"
            gp.ZonalStatisticsAsTable_sa(type_shp, "SUBSECTION", fc, outputfc, "DATA")
            fc = fcs.next()
    except:
        print(gp.GetMessages())

We used the geoprocessor's ListRasters method to list all the raster files. We implemented other batch processing steps, such as removing the background value, with the following code.

    gp.SingleOutputMapAlgebra_sa("con(" + raster + " 0," + raster + ")", outputfc, "")

Other tools we used included clipping, raster-to-ASCII conversion and zonal statistics, all available in the toolbox. End users could invoke these tools through the geoprocessing interface of ArcToolbox. Python programs invoked a tool by importing its *.py module and calling it directly.

3.3 Other Batch Processing

Some calculations are difficult to complete within the Toolbox. For example, to extract the average temperature for each subsection of the study area from the SOGS data, we first used the Zonal Statistics As Table function in the automated processing program to obtain a table for each day. However, end users would not want to deal with the tens of thousands of tables this produces. We therefore developed a program to merge the tables, compute statistics and generate data in common file formats such as Microsoft Excel or ASCII (Figure 2).
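The paper merged the per-day zonal statistics tables in C#. The sketch below is a Python stand-in that rests on several assumptions. Each day's table has already been read into a list of dicts with a SUBSECTION field and a MEAN field; MEAN is one of the statistics Zonal Statistics As Table writes. The function names are hypothetical. The output is CSV text, which Excel opens, rather than a native .xlsx workbook.

```python
import csv
import io


def merge_daily_tables(daily_rows, key="SUBSECTION", value="MEAN"):
    """Merge per-day zonal statistics rows into one wide table.

    daily_rows maps a day label to a list of dicts, one per zone, e.g.
    {"1976001": [{"SUBSECTION": "A", "MEAN": 3.2}, ...]}.
    Returns (header, rows): one row per zone, one column per day.
    """
    days = sorted(daily_rows)
    zones = sorted({row[key] for rows in daily_rows.values() for row in rows})
    table = {zone: {} for zone in zones}
    for day in days:
        for row in daily_rows[day]:
            table[row[key]][day] = row[value]
    header = [key] + days
    # a zone with no value on a given day gets an empty cell
    rows = [[zone] + [table[zone].get(day, "") for day in days] for zone in zones]
    return header, rows


def to_csv_text(header, rows):
    """Render the merged table as CSV text with one header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

The text can then be saved to a file such as a hypothetical DEWP_subsections.csv opened with open(..., "w", newline=""). One row per subsection and one column per day gives a wide layout that is convenient for charting time series in a spreadsheet.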


Figure 2. An example of merging all the tables into an Excel file.

4. RESULTS

The following examples illustrate the results of the automated batch processing of mass remote sensing and geospatial data, covering format conversion, sub-dataset extraction, pre-processing, spatial analysis and statistics.

Processing the SOGS data at the subsection level from Step 1 to Step 3 (Figure 3) took about 3 days on a workstation running Windows XP with an Intel(R) Core(TM) Quad CPU and 2 GB of RAM. The automated batch processing was repeated for the rest of the datasets for the study area. Processing times varied with the complexity of the data and with their spatial and temporal resolutions.

[Figure 3: SOGS (*.h5) → Step 1 → DEWP, PRECIP, SRAD, TMAX, TMIN, VPD (*.img) → Step 2 → daily data → Step 3 → statistical data (*.xlsx)]

Figure 3. The data processing scheme.

After data conversion and extraction, the raster data layers could be used for visualization and mapping, and the statistical data could be used in research analyses and summaries. For example, using the approaches developed in Step 3, we obtained information that reveals and displays the spatial-temporal characteristics of the study area in LST (Figure 4A), LAI (Figure 4B), NDVI (Figure 4C) and Snow Cover (Figure 4D), among others.

5. CONCLUSION

We presented a batch processing method that converts mass remote sensing data from HDF5 format into simple raster formats such as .img, .grd and .tiff. The method serves end users who are familiar with basic GIS operations but have little experience in data format conversion. This automated batch processing bridges the gap between data providers and end users, enabling effective use of mass remote sensing and geospatial data. It also better prepares end users for continued updates and deliveries of mass data in the future. Converting mass data from HDF5 into simple user-preferred formats may be inefficient for data processing and modeling. However, it helps end users extract information in GIS operations for management practices.

6. ACKNOWLEDGEMENTS

This study contributed to the data preparation for the development of a decision support system for monitoring, reporting and forecasting ecological conditions of the Appalachian Trail [1]. TOPS data were developed and provided by project team members Ramakrishna Nemani, Hirofumi Hashimoto, Forrest Melton and Samuel Hiatt of the Ames Research Center. Other project team members, including those listed as coauthors in reference [1], contributed significantly to data development and preparation.

7. REFERENCES

[1] Wang, Y., R. Nemani, F. Dieffenbach, K. Stolte, G. Holcomb, M. Robinson, C.C. Reese, M. McNiff, R. Duhaime, G. Tierney, B. Mitchell, P. August, P. Paton, C. LaBash (2010). "Development of a Decision Support System for Monitoring, Reporting and Forecasting Ecological Conditions of the Appalachian Trail." In Proceedings of the 2010 IEEE International Geoscience and Remote Sensing Symposium, IEEE Xplore, entry: 978-1-4244-9566-5, pp. 2095-2098.
[2] Nemani, R., H. Hashimoto, et al. (2009). "Monitoring and forecasting ecosystem dynamics using the Terrestrial Observation and Prediction System (TOPS)." Remote Sensing of Environment 113(7): 1497-1509.
[3] Roberts, J. J., B. D. Best, et al. (2010). "Marine Geospatial Ecology Tools: An integrated framework for ecological geoprocessing with ArcGIS, Python, R, MATLAB, and C++." Environmental Modelling and Software 25(10): 1197-1207.
[4] Xie, H., X. Zhou, et al. (2005). "GIS-based NEXRAD Stage III precipitation database: Automated approaches for data processing and visualization." Computers and Geosciences 31(1): 65-76.
[5] Zhao, S., T. Yu, et al. (2010). "GDAL-based extend ArcGIS engine's support for HDF file format." 18th International Conference on Geoinformatics (Geoinformatics 2010), art. no. 5567583.

Figure 4. Examples of spatial and temporal data for end users in LST (A), LAI (B), NDVI (C) and Snow Cover (D), showing the spatial distribution at a particular time and trends over time.
