Proc. HPCN Europe '97, Vienna, 28-30 April 1997, Technical Note DHPC-003.

Geographic Information Systems Applications on an ATM-Based Distributed High Performance Computing System

November 1996

K.A.Hawick*, H.A.James, K.J.Maciunas, F.A.Vaughan, A.L.Wendelborn
Department of Computer Science, University of Adelaide, SA 5005, Australia
&
M.Buchhorn, M.Rezny, S.R.Taylor, M.D.Wilson
Australian National University, Canberra, ACT 0200, Australia
RDN CRC Distributed High Performance Computing Project

Abstract. We present a distributed geographic information system (DGIS) built on a distributed high performance computing environment using a number of software infrastructural building blocks and computational resources interconnected by an ATM-based broadband network. Archiving, access and processing of scientific data are discussed in the context of geographic and environmental applications with special emphasis on the potential for local-area weather, agriculture, soil and land management products. Software technologies such as tiling and caching techniques can be used to optimise storage requirements and response time for applications requiring very large data sets such as multi-channel satellite data. Distributed High Performance Computing hardware technology underpins our proposed system. In particular, we discuss the capabilities of a distributed hardware environment incorporating: high bandwidth communications networks such as Telstra's Experimental Broadband Network (EBN); large capacity hierarchical storage systems; and high performance parallel computing resources.

* Author for correspondence, Email: [email protected], Fax: +61 8 8303 4366, Tel: +61 8 8303 4519.

Introduction

This document describes a distributed Geographic Information System (DGIS). We believe the hardware and software technologies currently available to us can be integrated to provide a glimpse of the sort of system that will become routinely available to end-users in a few years. We have considered a number of end-user niches and are focusing on environmental applications including: agricultural decision support; climate and weather prediction; and soil and land care management. We also expect to gain insights into a number of more technologically driven aspects of distributed geographic information systems, such as the distributed management of very large scientific data repositories. We believe this project to be a worthwhile directed application of current DHPC technology [1] against the background of global climatic change and the need for improved management of scarce natural resources and increased automation of traditionally labour intensive activities like crop production.

The project is built around applications access to data from the Geostationary Meteorological Satellites. In particular we consider data from the Japanese Meteorological Agency's (JMA) satellite "Himawari 5", known as GMS-5 [11]. This provides visible and Infra-Red (IR) spectra data for a large region of the earth including Australia.

In summary, the proposed system will employ a distributed hierarchical file store with component parts communicating across Telstra's high bandwidth Asynchronous Transfer Mode (ATM) [5] Experimental Broadband Network (EBN) [9]. Even with the bandwidths of current telecommunications infrastructure, a DGIS can make use of very high bandwidths along the backbone of a DHPC system which connects the major computational and storage nodes. End-users may access derived-data products along lower bandwidth lines to service points on the backbone. Furthermore, we believe our project is prototyping for a future in which farmers and other end-users may routinely have ISDN or even 10Mbit modem lines. Backbone bandwidths will probably increase by similar levels over the same time.
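The gap between backbone and end-user bandwidths can be made concrete with some rough arithmetic. The sketch below estimates the ideal time to move one day of raw GMS-5 data (the approximately 204MByte daily volume quoted in the Satellite Data Repository section) over different links. Only the 10Mbit figure comes from the text; the ISDN rate (128 kbit/s, two bearer channels) and the ATM rate (155 Mbit/s) are assumed nominal values, and protocol overhead is ignored.

```python
# Rough transfer-time arithmetic for one day's raw GMS-5 data.
DAILY_VOLUME_MBYTES = 204  # daily raw volume quoted in this paper

# Link rates in Mbit/s. Only the 10 Mbit/s figure is from the text;
# the ISDN and ATM rates are assumed nominal values for illustration.
LINK_RATES_MBIT = {
    "ISDN basic rate (assumed 0.128 Mbit/s)": 0.128,
    "10 Mbit line": 10.0,
    "ATM backbone (assumed 155 Mbit/s)": 155.0,
}


def transfer_seconds(volume_mbytes: float, rate_mbit_per_s: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and contention."""
    return volume_mbytes * 8.0 / rate_mbit_per_s


if __name__ == "__main__":
    for name, rate in LINK_RATES_MBIT.items():
        secs = transfer_seconds(DAILY_VOLUME_MBYTES, rate)
        print(f"{name}: {secs / 3600:.2f} hours")
```

Under these assumptions the raw daily volume takes hours over ISDN but only seconds on the backbone, which is consistent with the design choice of shipping small derived products, rather than raw data, to end-users.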

Applications of a DGIS

We have considered the potential end users of a DGIS and have identified the following groups in environmental application areas, to whom we believe a DGIS would be of particular value:

- farmers and soil and land resource managers interested in agriculture and highly localized weather prediction capabilities;
- weather forecasters and nowcasters² interested in detailed short term atmospheric phenomena at global, national and local levels;
- climate prediction scientists interested in trends such as warming and pollution intensity changes over relatively long periods of time.

² Nowcasting is the rapid analysis of data in assessing current weather conditions. This is difficult due to limitations in gathering multi-source data rapidly enough together in one place.

Within each user category there is likely to be a spectrum of actual users with different needs. For example, in agricultural resource management the government scientist may wish to prepare information products based on many individual farms or stations and may have a budget for local fast computing and a direct connection to the DGIS network. Individual farmers may have only desktop PC or Macintosh resources on their own premises. There is a corresponding range of delivered information products that will be of interest to different end-users. The DGIS we propose will serve end-users, such as farmers, who do not have expensive computing hardware by automatically carrying out the costly (in computing and storage resources) data processing within the system to yield the desired information products.

One possible user profile is derived from agricultural land management. The task is managing and predicting the crop-yield performance and possible degradation of arable land. Figure 1 illustrates the feeding chain of data in a GIS landcare system. The data chains from satellite to repository to applications programs, via intermediate level information products of use to scientists and analysts, and finally to end-user products of use to individual station managers or farmers.

[Figure 1 diagram. Box labels: GMS Satellite Data, First Level Processing; ftp Site, One Day maximum capacity; 60 Tape DLT 4000 Stacker, 1.2 TB; Cut out Australia, Georectify Data, Radiometric Correction; Cloud Recognition, Rainfall Prediction, Finite Element Simulation; Cache; Land Response Prediction; End User.]

Fig. 1. GIS/Landcare Feeding Chain

The GMS data sets, in concert with other meteorological data, allow for highly localised prediction of rainfall and sunlight levels [7, 3]. Precipitation data is derived from the GMS data by automatic assimilation of moisture channel data into local atmospheric dynamic models, the integration of which with soil/vegetation models can provide an hour-by-hour profile of rainfall and evapotranspiration. With detailed soil, vegetation and land topological data, possibly derived from visible channel GMS data, a detailed finite element water runoff and crop growth prediction system may be developed. Some agricultural crop yield and landcare models are described in [13, 8].

Coarse grained weather models which rely upon sub-grid scale approximations for local weather prediction have proved unreliable in accurately modelling vegetation growth. Averages over time and space in such systems cannot differentiate between steady rainfall and short heavy storms, two extremes which give widely different crop yields. This temporal and spatial resolution limitation is the major obstacle to applying such models to crop production. The assimilation of satellite data into a weather model is computationally non-trivial [6].

We envisage a scenario in which such a DGIS would provide soil management scientists as well as individual station managers and farmers with highly localised decision support data.
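The point about averaging can be illustrated with a toy calculation. The two hourly rainfall records below are invented for illustration, and the runoff rule (any rain above a fixed hourly infiltration capacity runs off) is a deliberately crude assumption, not one of the models cited above. Both records have the same daily mean, yet they behave very differently once hourly structure is taken into account.

```python
# Two hypothetical 24-hour rainfall records (mm per hour) with equal totals.
steady = [1.0] * 24                  # light rain all day
storm = [0.0] * 22 + [12.0, 12.0]    # two hours of heavy rain


def daily_mean(hourly):
    """Mean hourly rainfall over the record: all a coarse model would see."""
    return sum(hourly) / len(hourly)


def peak_intensity(hourly):
    """Largest hourly rainfall in the record."""
    return max(hourly)


def runoff(hourly, infiltration_capacity=4.0):
    """Toy runoff: rain above an assumed fixed hourly infiltration capacity."""
    return sum(max(0.0, r - infiltration_capacity) for r in hourly)


if __name__ == "__main__":
    for name, record in (("steady", steady), ("storm", storm)):
        print(name, daily_mean(record), peak_intensity(record), runoff(record))
```

A model that only retains the daily mean cannot separate these two cases, whereas hour-by-hour data can.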

Satellite Data Repository

In this section we describe the satellite data that will constitute the primary data repository content of our prototype DGIS. We discuss the Japanese GMS-5 satellite primarily, although it is probable that some storage and partial processing of other satellites' data may form part of our DGIS applications.

The Geostationary Meteorological Satellite (GMS) provides more than 24 full hemisphere multichannel images per day, requiring approximately 204MBytes storage capacity per day, or 75GBytes per year. This raw data must be the primary repository format since processing loses some information and may not be reversible. However, various lossless compression algorithms (such as those implemented in the compress or gzip programs - see for example [15]) can be applied to this data at the cost of changing the random access time for a given subset. The data can be compressed losslessly in various ways - for example, compressing an entire week of data at once allows for greater compression efficiency than does compressing a single image file. The tradeoff is that the granule of data manipulation is then forced to be a week rather than a single image, and the random access time for an arbitrary image is increased.
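The granularity tradeoff can be sketched with Python's standard zlib module. The "images" here are small synthetic byte strings that share most of their content, which is an assumption made purely so that the effect is visible; real GMS-5 images are far larger and their compressibility will differ. The function names are illustrative only.

```python
import random
import zlib


def make_images(count=7, size=4096, seed=0):
    """Synthetic stand-ins for a week of images: one base pattern, lightly varied."""
    rng = random.Random(seed)
    base = bytearray(rng.randrange(256) for _ in range(size))
    images = []
    for _ in range(count):
        img = bytearray(base)
        for _ in range(32):  # perturb a few bytes per image
            img[rng.randrange(size)] = rng.randrange(256)
        images.append(bytes(img))
    return images


def per_image_size(images):
    """Total compressed size when every image is its own granule."""
    return sum(len(zlib.compress(img)) for img in images)


def bulk_size(images):
    """Compressed size when the whole set is one granule."""
    return len(zlib.compress(b"".join(images)))


def read_one_from_bulk(blob, index, size):
    """Random access into a bulk granule: the whole blob must be inflated first."""
    data = zlib.decompress(blob)
    return data[index * size:(index + 1) * size]


if __name__ == "__main__":
    imgs = make_images()
    print("per-image total:", per_image_size(imgs))
    print("bulk total:     ", bulk_size(imgs))
```

The bulk granule is smaller because the compressor can reuse patterns across images, but fetching a single image then requires decompressing the whole granule. Note that zlib's DEFLATE window is only 32 KBytes, so for full-size images a real system would need a compressor or data layout able to exploit redundancy over longer distances.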

The GMS-5 Satellite

The GMS-5 satellite [11] was launched in June 1995 and provides visual and infrared data in various wavelength channels from a Visible and Infra-Red Spin Scan Radiometer (VISSR). The channels and resolution are shown in Table 1. Some typical images are shown (at 1/16th size) in Figure 2. Each data set consists of 4 image files and a number of documentation files. The GMS-5 documentation files are in plain ASCII and list relevant information such as calibration data and satellite schedules. One image represents albedo in the visible part of the spectrum and the other three are in the infrared range. Each image is a full-disk photograph of the Earth from an altitude of 35,800km and a position almost directly over the equator at 140 degrees east (close to Adelaide's longitude). These images are 2291x2291 pixels (approximately 3km per pixel edge), with the resolving power varying between channels.

Channel              Wavelength (µm)   Resolution (km)
Visual               0.5 - 0.75        1.5
Thermal IR 1         10.5 - 11.5       5.0
Thermal IR 2         11.5 - 12.5       5.0
Water Vapour IR 3    6.5 - 7.0         5.0

Table 1. GMS-5 VISSR Channels

The images and metadata are provided in Hierarchical Data Format (HDF) [12], an NCSA file format which is supported by a set of utilities and programming libraries. We regard this collection of HDF data as our "raw" data, although it has been calibrated and preprocessed to some extent before distribution.

The Visible and Infra-Red Spin Scan Radiometer (VISSR) records visible radiation through a photomultiplier tube and infra-red radiation through a HgCdTe detector, using a scanning mirror system. Signals are quantized into 64 levels (6 bits, visual) and 256 levels (8 bits, infra-red) prior to transmission to an earth based receiving station. The satellite takes approximately 27.5 minutes to record visual and IR data in a 20x20 degree area which includes the earth disk image, using approximately 2500 mirror scan steps. Imaging swathes are 5 km by 1 scan line for infra-red and 1.25 km by 4 scan lines for visual data. The satellite is spin-stabilized and scans are synchronised with the spin rate. The observation schedule is hourly full earth disk images, with images labeled 0000 Universal Time (UT) actually observed between 2330UT and 0000UT. Variations in the observation schedule are made for periods when the satellite is eclipsed, for periods of solar interference, typhoon special observation periods and occasional satellite maintenance periods. More information on the GMS satellite and its sensors is given in [11].

Fig. 2. GMS-5 Data: i) Visible Spectra and ii) Water Vapour IR Spectra

Data Storage and Tiling

The raw satellite data can be stored in a number of ways, and there are tradeoffs between the storage capacity required and the access speed obtained for a given application. An 80GByte disk array could hold approximately one full year's worth of multichannel data. Although this is convenient for short range process applications, it is significantly complicated by the caching strategies necessary to optimise access response patterns. The need for some applications to access more than one year of data further complicates matters and suggests the need for a carefully optimised hierarchical storage scheme. We do not discuss this further here, but note that we are constructing a hierarchical file store incorporating distributed tape silos and RAID technology as well as optimised data mapping or striping on the devices. Some technologies for carrying out these operations are described in [2], which builds on earlier work on parallel data transforms [4].
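As a minimal sketch of how tiling and caching interact, the code below splits a 2291x2291 image into fixed-size tiles, works out which tiles cover a requested pixel window, and keeps recently used tiles in a small in-memory cache. The tile size, the cache size and the `load_tile` stand-in for a slow tape or disk read are all assumptions for illustration; this is not the K-tiling structure of [2] nor the scheme used in our file store. The helper `days_of_data` simply repeats the capacity arithmetic from the text, taking 1 GByte as 1000 MBytes.

```python
from functools import lru_cache

TILE = 256          # tile edge in pixels (assumed value)
IMAGE_EDGE = 2291   # GMS-5 image edge in pixels, from the text


def days_of_data(capacity_gbytes, daily_mbytes=204):
    """How many days of raw data fit on a store (1 GByte taken as 1000 MBytes)."""
    return capacity_gbytes * 1000 / daily_mbytes


def tiles_for_region(x0, y0, x1, y1, tile=TILE):
    """Tile indices (col, row) covering the pixel box [x0, x1) x [y0, y1)."""
    cols = range(x0 // tile, (x1 - 1) // tile + 1)
    rows = range(y0 // tile, (y1 - 1) // tile + 1)
    return [(c, r) for r in rows for c in cols]


FETCHES = []  # records every slow fetch, so cache hits can be observed


@lru_cache(maxsize=64)
def load_tile(image_id, col, row):
    """Hypothetical stand-in for reading one tile from tape or disk."""
    FETCHES.append((image_id, col, row))
    return (image_id, col, row)


if __name__ == "__main__":
    print(f"80 GBytes holds about {days_of_data(80):.0f} days of raw data")
    print(len(tiles_for_region(0, 0, IMAGE_EDGE, IMAGE_EDGE)), "tiles per image")
```

With these numbers a full image is covered by 81 tiles, while a small window touches only a handful, so repeated requests for the same region can be served from the cache instead of the slower storage tiers.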

Satellite Data Processing

A range of primary and derived processes are needed to prepare the GMS data for use in a particular application. Generally these may be divided into: navigational; radiometric; geometric; and application-specific. Navigational correction is already applied in the HDF level 1.0 data format we receive, and this corrects the satellite signal data to yield a hemispherical disk view of the earth. Radiometric correction is then necessary to compensate for changes in satellite sensitivity due to thermal and other operational variations in the remote sensing instrumentation. Geometric correction is needed to compensate for the curvature of the earth and produce data that can be manipulated by a Geographic Information System (GIS) and other applications. The raw HDF 1.0 data we receive incorporates information to allow radiometric correction and geometric correction to be applied.

GMS Image Format and Pre-Processing

Only subsets of the satellite image data are needed for agricultural modelling purposes. The images also need to be warped to allow for the earth's curvature and possibly remapped so that Australia appears flattened. The images also have to be registered to allow for variations in the received data so that pixels correspond to known ground positions. Since Australia, in particular the southern region, is so far below the equator, the warping effect is quite significant. When the Australian image area is warped or rectified, the shape of the pixels will have to be handled carefully. Pixels may need to be smoothed or interpolated. Figure 3 illustrates that the warping effect for Australia is significant. Also shown is one of the simple image processing algorithms that can be applied to raw data to enhance contrast.
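One common way to interpolate pixels during warping is bilinear resampling: each output pixel is mapped back to a fractional position in the source image and its value is blended from the four surrounding source pixels. The sketch below shows the technique only; the `mapping` function is a hypothetical placeholder, and a real rectification would derive it from the satellite viewing geometry, which is not attempted here.

```python
def bilinear(image, x, y):
    """Sample a 2-D list-of-rows image at fractional (x, y), clamped to the edges."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend along x on the two bracketing rows, then blend those along y.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def resample(image, mapping, out_w, out_h):
    """Build an output image by pulling each pixel (u, v) from mapping(u, v) -> (x, y)."""
    return [[bilinear(image, *mapping(u, v)) for u in range(out_w)]
            for v in range(out_h)]


if __name__ == "__main__":
    src = [[0, 10], [20, 30]]
    # Hypothetical mapping: stretch the 2x2 source onto a 3x3 grid.
    stretched = resample(src, lambda u, v: (u * 0.5, v * 0.5), 3, 3)
    for row in stretched:
        print(row)
```

Pulling from the source for every output pixel avoids holes in the rectified image, at the cost of some smoothing.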

Fig. 3. Visible Spectrum GMS-5 Data for Australia: i) unprocessed and ii) histogram equalised
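Histogram equalisation, the contrast enhancement shown in Figure 3, remaps grey levels so that their cumulative distribution becomes roughly linear. The following is a minimal pure-Python sketch of the textbook algorithm operating on a flat list of integer pixel values; it is not the implementation used to produce the figure, and the `levels` parameter would need to match the quantization of the channel being processed.

```python
def equalise(pixels, levels=256):
    """Histogram-equalise a flat list of integer grey levels in [0, levels)."""
    # Count occurrences of each grey level.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of grey levels.
    cdf = []
    total = 0
    for count in hist:
        total += count
        cdf.append(total)

    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # constant image: nothing to stretch
        return list(pixels)

    # Look-up table mapping each old level onto the full output range.
    scale = (levels - 1) / (n - cdf_min)
    lut = [round((c - cdf_min) * scale) if c >= cdf_min else 0 for c in cdf]
    return [lut[p] for p in pixels]


if __name__ == "__main__":
    print(equalise([10, 10, 20, 30, 30]))
```

Grey levels that were bunched together in a narrow range are spread across the whole output range, which is why faint cloud and land features become easier to see.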

Application-Specific Processing

Application-specific uses of the radiometrically and geometrically corrected GMS satellite data include:

- direct visualisation of fine resolution visual channel and water vapour channel imagery;
- combining different channels of data to derive other analysis fields;
- histogramming or otherwise averaging primary or derived fields as percentage indices for various pollution, land usage or vegetative growth measures;
- grouping together of various time slice sequences of imagery into historical movie sequences;
- artificial colour enhancement of particular features in the data;
- automatic feature recognition and extraction in the data;
- combining primary or derived GMS data with other aerial or satellite data with different observation channels;
- combining primary or derived channels with other spatial data, such as other observed fields, or with vector data such as political or infrastructural data (for example roadways or agricultural boundaries);
- interpolating spatial data or time-sequence data into an atmospheric or oceanic simulation.

Our prototype system implements a selection of these, and our plans to implement more are dependent upon the needs of our collaborators.

There are a number of data assimilation processes that are applicable to meteorological satellite data. The low-level processing required for the US National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) data includes calibration, navigation, re-mapping and atmospheric correction. High level products built on these processes include: land and sea surface temperatures; vegetation and various atmosphere content indices; and also integration with TIROS (TV and IR Observation Satellite) Operational Vertical Sounder (TOVS) data. Various agencies hold world-class algorithms for these remote sensing processes and work is underway to develop calibrated equivalent algorithms for GMS-5 data. There is scope for deriving: cloud distribution; cloud amount; cloud height; wind field; cyclone development tendencies as well as sea and land surface temperatures. Other derived information products such as Atmosphere-corrected Normalized Difference Vegetation Index (ANDVI); Volcanic Ash Cloud Index (VACI); Fuel Moisture Index (FMI) [10] are all feasible given suitable applications demand.
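As an example of a derived index, the standard (uncorrected) Normalized Difference Vegetation Index is computed per pixel as (NIR - red) / (NIR + red). The sketch below shows only this plain formula on assumed reflectance values; it omits the atmospheric correction that distinguishes ANDVI, and it presumes a sensor with separate red and near infra-red bands such as AVHRR, since the GMS-5 channels in Table 1 do not include a near infra-red band.

```python
def ndvi(red, nir):
    """Per-pixel NDVI = (NIR - red) / (NIR + red); None where the sum is zero."""
    out = []
    for r, n in zip(red, nir):
        denom = n + r
        out.append((n - r) / denom if denom else None)
    return out


if __name__ == "__main__":
    # Hypothetical reflectances: vegetated, bare and no-signal pixels.
    red_band = [0.25, 0.30, 0.0]
    nir_band = [0.75, 0.30, 0.0]
    print(ndvi(red_band, nir_band))
```

Values near 1 indicate dense green vegetation, values near 0 indicate bare ground, and pixels with no signal in either band are left undefined rather than assigned a misleading value.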

Satellite Data Access System

We have built a prototype data manipulation and access delivery package (known as Eric) for our satellite data repository. This prototype uses the Common Gateway Interface (CGI) mechanisms to allow a World Wide Web server to provide access services to our GMS-5 data. Services provided by the package include: browsing of the repository at various resolutions; delivery of an image at a user-selected resolution; cutting of the main image to sub areas such as the whole hemisphere, Australia or South Australia; concatenation of a sequence of images into an MPEG video format; and selection of a particular sequence of images staggered over weeks or months to allow viewing of seasonal trends. The user is also able to select a particular channel of data: visible, infra-red or water-vapour. The system is illustrated in Figures 4 and 5, which show the hypertext query-interface form and a typical resulting image, respectively.

Fig. 4. Query Interface for Satellite Image Retrieval from Archive

The package is implemented in the Perl interpreted scripting language. It runs entirely on the server side and generates its own invoking form, which is down-loaded to the user. The main script invokes a number of image manipulation programs that are currently all run on high performance workstations within our distributed high performance computing environment. We are currently developing an interface to allow more complex remote execution on high performance computers such as our Connection Machine (CM5) and SGI Power Challenge. These platforms will be able to carry out compute intensive multi-source data-fusion and other processing operations which would be impossible in interactive time on a workstation.

We are also integrating a number of the functions provided by the GRASS Geographic Information System (GIS) [14], which is in the public domain. This approach allows us to build upon the considerable collection of GIS functions already available in GRASS to make more complex compound operations and hence provide very powerful services to a remote user. Our distributed high performance computing environment [1] also gives us the capability of integrating computationally-expensive operations such as kriging, which are run on specialist computing resources such as a parallel computer or task-farmed network of computers.
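The kind of request handling performed by such a CGI front end can be sketched as follows. This is written in Python for illustration, whereas the prototype itself is a Perl script, and every parameter name, default value and region or channel identifier below is hypothetical rather than taken from Eric. The sketch validates a query string and generates the time stamps for a sequence of images staggered by a fixed number of days.

```python
from datetime import datetime, timedelta
from urllib.parse import parse_qs

# Hypothetical identifiers; the real form fields are not documented here.
REGIONS = {"hemisphere", "australia", "south_australia"}
CHANNELS = {"visible", "ir1", "ir2", "water_vapour"}


def parse_request(query_string):
    """Validate a CGI-style query string and return the selected options."""
    q = parse_qs(query_string)
    region = q.get("region", ["hemisphere"])[0]
    channel = q.get("channel", ["visible"])[0]
    scale = int(q.get("scale", ["16"])[0])  # 1/scale of full resolution
    if region not in REGIONS or channel not in CHANNELS or scale < 1:
        raise ValueError("unsupported request")
    return {"region": region, "channel": channel, "scale": scale}


def staggered_times(start, step_days, count):
    """Time stamps of images at the same hour of day, step_days apart."""
    return [start + timedelta(days=step_days * i) for i in range(count)]


if __name__ == "__main__":
    print(parse_request("region=australia&channel=ir1&scale=4"))
    for t in staggered_times(datetime(1996, 1, 1, 3), 7, 3):
        print(t.isoformat())
```

Selecting images at the same hour on successive weeks keeps the solar illumination comparable across the sequence, which is what makes seasonal trends visible when the frames are assembled into a movie.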

Discussion

We have discussed a distributed geographic information system, including the market sectors that would benefit from such a system and our early prototype implementation. In parallel with construction of this application prototype, we are developing a software infrastructure to allow best use of the distributed computing resources available to us and to our collaborators. We are working in particular with the Cooperative Research Centre for Soil and Land Management (SLM CRC) in South Australia to develop a DGIS infrastructure and interface that will be of direct use to SLM CRC scientists and ultimately to their own end-user community - namely farmers and land resource managers who may wish to use information products for decision support and planning purposes.

Fig. 5. Results from Satellite Image Retrieval from Archive

Acknowledgments

Distributed High Performance Computing (DHPC) is a project of the Research Data Networks Cooperative Research Center (RDN CRC), is managed by the Advanced Computational Systems CRC, and is a joint activity of the University of Adelaide and the Australian National University. Some of the computational resources we have used in this work are owned by the South Australian Centre for Parallel Computing (SACPC), located in Adelaide. We thank Telstra for provision of the Experimental Broadband Network. The idea for parallel computational enhancements to a geographic information system originated during discussions with Geoffrey Fox and Kim Mills at the Northeast Parallel Architectures Center, Syracuse University. KAH wishes to thank Geoffrey and Kim for this.

References

1. "An ATM-based Distributed High Performance Computing System", K.A.Hawick, H.A.James, K.J.Maciunas, F.A.Vaughan, A.L.Wendelborn, M.Buchhorn, M.Rezny, S.Taylor, and M.Wilson, Submitted to HPCN 97.
2. "K-Tiling: A Structure to Support Regular Ordering and Mapping of Image Data", Oscar Bosman, Peter Fletcher and Kenneth Tsui, Proc. Australian Pattern Recognition Society Workshop on Two and Three Dimensional Spatial Data: Representation and Standards, December 1992, Perth, Western Australia.
3. "Atmospheric Data Analysis", Roger Daley, Pub. Cambridge University Press, 1991, ISBN 0-521-45825-0.
4. "Parallel Data Transforms", P.M.Flanders and S.F.Reddaway, DAP Series, Technical Note, Active Memory Technology, 1988.
5. "ATM Networks - Concepts, Protocols, Applications", R.Handel, M.N.Huber, S.Schroder, Pub. Addison-Wesley, 1994, ISBN 0-201-42274-3.
6. "Parallelisation of the Unified Model Data Assimilation Scheme", K.A.Hawick, R.S.Bell, A.Dickinson, P.D.Surry and B.J.N.Wylie, Proc. Workshop of Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology, European Centre for Medium Range Weather Forecasting (ECMWF), Reading, November 1992. (Invited paper)
7. "The Physics of Atmospheres", John T. Houghton, Pub. Cambridge University Press, Second Edition, 1986, ISBN 0-521-33956-1.
8. "Modelling of agricultural production: weather, soils and crops", H. van Keulen and J. Wolf, Pub. Centre for Agricultural Publishing and Documentation, Wageningen, 1986, ISBN 90-220-0858-4.
9. "Telstra's Experimental Broadband Network", D.Kirkham, Telecommunications Journal of Australia, Vol 45, No 2, 1995.
10. "Applied Remote Sensing", C. P. Lo, Pub. Longman, 1986, ISBN 0-582-30132-7.
11. "The GMS User's Guide", Pub. Meteorological Satellite Center, 3-235 Nakakiyoto, Kiyose, Tokyo 204, Japan. Second Edition, 1989.
12. "Getting started with HDF - User Manual", NCSA, University of Illinois at Urbana-Champaign, May 1993.
13. "Climate System Modelling", Edited by Kevin E. Trenberth, Pub. Cambridge University Press, 1992, ISBN 0-521-43261-6.
14. "Geographic Resource Analysis Support System (GRASS 4.1) User Reference Manual", United States Army Corps of Engineers, Construction Engineering Research Laboratories (CERL), Champaign, Illinois, Spring 1993.
15. "Managing Gigabytes: Compressing and Indexing Documents and Images", I.H. Witten, A. Moffat, and T.C.Bell, Pub. Van Nostrand Reinhold, 1994, ISBN 0-442-01863-0.