Disaster Management & Assessment System Using Interfaced Satellite and Terrestrial Grids

Pankaj Ojha1*, Mangala N.1, Prahlada Rao B.B.1, Manavalan R.1, Tapan Mishra2, V. Manavala Ramanujam2, Haresh Bhat2

1 Centre for Development of Advanced Computing (C-DAC), Bangalore, India. e-mail: {pankajo*, mangala, prahladab, rmanavalan}@cdacb.ernet.in
2 Space Application Centre (SAC)-ISRO, Ahmedabad, India. e-mail: [email protected]
ABSTRACT

Flood assessment is one of the challenging applications of a multidisciplinary disaster management system. In this paper the authors present a flood assessment system implemented on the Indian national grid, GARUDA, together with satellite grid technologies. DMSAR is a collaborative project between ISRO's Space Application Centre and C-DAC, and the purpose of this application is to demonstrate the important interfaces between satellite grids such as GSAT-3/EDUSAT and C-DAC's terrestrial GARUDA grid. Disaster Management using Synthetic Aperture Radar (DMSAR) is a project to delineate the distribution of open flood water in an affected region, captured using an airborne Synthetic Aperture Radar. The raw radar data is voluminous and processing it requires large compute power. In the grid-based DMSAR, the raw data is distributed using both the satellite and terrestrial network channels of GARUDA. The workflow splits the raw data into multiple parts and distributes them to the different computing clusters available on the GARUDA grid for processing. Because the raw data volume is large, the digital signal processing component exploits the coarse-grain and fine-grain parallelism inherent in the application to reduce the total cycle time. The application scaled well on the grid with the availability of high performance clusters on GARUDA. Post-processing collates the partial results at the grid head node.
Keywords: Grid-enabled, Disaster Management, DMSAR, Satellite grid, Terrestrial grid, GARUDA

1. INTRODUCTION

Disaster management needs high speed reliable networks, large data storage, efficient distributed processing algorithms, and data dissemination with fast reaction times. Technological advancements in these fields have made grid computing a reality that the scientific community can use for various real-time disaster simulations, and several attempts are being made worldwide to apply grid computing technology to disaster management. The Network for Earthquake Engineering Simulation (NEES)[14] of the NSF[15] uses grid technologies to link earthquake engineering researchers across the USA with shared engineering research equipment, data resources and leading-edge computing resources. The CrossGrid project [17] has developed a flood disaster management support system in which flood crisis teams can predict the risk of a flood on the basis of historical records and current hydrological and meteorological data; it supports flood monitoring, forecasting and simulation, and feeds information to the crisis team's decision support system.

Flood is one of the most frequently occurring natural calamities, causing huge damage, and a flood assessment system helps to monitor how a flood is spreading. Disaster Management Using Synthetic Aperture Radar (DMSAR)[9] is a project to delineate the distribution of open flood water; data over the affected region is captured using an airborne Synthetic Aperture Radar. The GARUDA Flood Assessment System [24] involves several components. The focus of our experiment is to run the DMSAR application using the resources of the SatGrid and the GARUDA [16] grid. The SatGrid is formed using the GSAT-3/EDUSAT[12] satellite developed by ISRO[8], while GARUDA is the Indian national grid computing initiative anchored by C-DAC[7]. The interconnection between the SatGrid and the GARUDA grid is shown in Figure 1. The GARUDA grid consists of heterogeneous HPC[18] resources (Linux/AIX clusters) at different geographical locations connected through a high speed interconnect; it aggregates compute clusters from research institutions across 17 cities of India and currently has over 600 CPUs. The flood assessment system runs on this integrated satellite and terrestrial grid.

The data gathered by the Synthetic Aperture Radar (SAR) while monitoring the earth's surface is sent to a Satellite Interface Terminal (SIT). The SIT at Bangalore receives data from the SIT at the Space Application Centre (SAC)-ISRO via GSAT-3/EDUSAT and transfers it to the GARUDA grid. There the voluminous input data captured by the SAR is split into as many parts as there are available clusters, and the smaller split data sets are submitted to the remote clusters of the GARUDA grid using job submission APIs developed by C-DAC.
Figure 1: Architecture and Resources of SatGrid and GARUDA Grid. (The figure shows the SITs at SAC Ahmedabad, Bangalore and Pune linked over GSAT-3; the Bangalore grid portal and submit node (gridfs machine); cluster head nodes and compute nodes at Chennai (Linux), Bangalore (AIX), IIT-Guwahati (Linux), Pune (Linux) and Hyderabad (Linux); and video/audio conferencing and ERDAS/LVE visualization facilities.)
At the remote clusters, optimized mathematical libraries are required for the digital signal processing and the Fast Fourier Transforms applied to the raw data. Application Program Interfaces (APIs) are provided for job submission to perform functions such as:
• Copy the raw data from the space agency's database to the head node of the grid.
• Split the raw data and send the parts to the heterogeneous computing resources, so that the computing elements available at the different clusters can be used.
• Get the input processing information from the user through a friendly graphical user interface.
• Get the results back from across the grid and place them on the grid head node for post-processing.

Since the raw data is split at the grid head node and copied to the heterogeneous locations, and all required libraries are available at the computing elements, the raw radar data can be processed by the executables installed at the clusters. The job submission APIs are invoked to submit the jobs to the compute clusters of GARUDA. The last step of the flood assessment application uses the image processing software ERDAS[19] to extract water body information, mosaic/merge the partial images returned by the remote clusters, and derive other flood related information. The same information can be shared with other stakeholders using the Leica Virtual Explorer (LVE) [19] software, so that rescue plans can be carried out by the different agencies.

2. APPLICATION SYSTEM ARCHITECTURE

The flood assessment system comprises the steps shown in Figure 2, starting from data acquisition using the airborne synthetic aperture radar up to remote visualization of the image. Figure 2 shows an airplane fitted with an airborne Synthetic Aperture Radar (SAR) gathering data over the flood affected area and then transferring the raw data to the central satellite location for uploading to the GSAT-3/EDUSAT satellite. Processing the gathered raw data is a complex, compute intensive task involving several mathematical calculations based on the velocity of the airplane, turbulence, and the latitude/longitude of the location obtained from the inbuilt Geographic Information System (GIS). The program that processes this data has been ported, optimized and made available on the heterogeneous clusters of the GARUDA grid.

Figure 2: DMSAR Application System

In grid-enabling the DMSAR application [20], several parallelization and optimization techniques have been used, including hybrid MPI/OpenMP[22,23] programming and the Math Kernel Library (MKL)[13]. Once the data is processed, the results are merged and the flood features are extracted and analyzed using the image processing software ERDAS. The resultant data is disseminated to the remote sites, where the decision making procedure can proceed further with the help of the Leica Virtual Explorer (LVE) software.
The various steps of the DMSAR application described above can be conceptually grouped into the four parts shown in Figure 3.

Figure 3: Flow Diagram for Flood Assessment System. (1) Data acquisition and transmission: gather raw data from the satellite terminal; move the data to the storage resource broker. (2) Data management using SRB: split the raw data at the SRB; copy the split data from the SRB to the clusters. (3) Job submission and monitoring: copy the required input files to the grid cluster locations; submit the job to the grid scheduler. (4) Merging of images: send the results back to the grid head node; move the data to the storage resource broker.

3. EXISTING APPLICATION ARCHITECTURE

This section describes the application's graphical user interface and system requirements.

3.1 Data Acquisition and Transmission

This first part of the application is activated when a natural disaster occurs, and consists of two steps: data collection using the SAR, and data transmission to the satellite. The Synthetic Aperture Radar developed at SAC, ISRO is routinely used for flood monitoring as part of ISRO's disaster management program. The raw data collected over four minutes of flight time, covering a flight length of 30-35 km, amounts to approximately 9-10 gigabytes. The raw data captured by the SAR is sent via GSAT-3/EDUSAT to the Satellite Interface Terminal (SIT) at Bangalore, which is connected to the GARUDA grid. This raw data transfer (in either direction) may take some time depending on the available bandwidth.

3.2 Data Management Using SRB & Splitter Program

Because huge amounts of raw data flow through the system, it is useful to manage this data with storage solutions such as the Storage Resource Broker (SRB)[21]. The data downloaded from the satellite at the grid head node is uploaded into the SRB data server, where it is divided into a number of parts.

3.2.1 Significance of the SRB Data Server

The storage requirement becomes huge for temporal data collected by the SAR at different time instances (T1, T2, ..., Tn), and the Storage Resource Broker offers features that are very useful in such situations. The SRB federates high-value data located among heterogeneous and distributed storage systems: it places the data at a centralized location and shares it among diverse and distributed systems. SRB allows the virtual organization of data and presents a simplified view to end users, and SRB containers help reduce access times and latencies. Further, the SRB supports different authentication mechanisms for its clients and servers, including GSI authentication based on the GSSAPI, and Kerberos authentication.
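SRB is typically driven through its Scommand client utilities. As a rough illustration of the staging steps in Figure 3 (the collection path and file names below are invented for the example, not the actual GARUDA layout), a session might look like:

    Sinit                                        # authenticate to the SRB server
    Sput raw_9gb.dat /garuda/dmsar/raw           # upload raw SAR data into a collection
    Sls  /garuda/dmsar/raw                       # verify the upload
    Sget /garuda/dmsar/raw/raw_9gb.dat           # pull data back out at a cluster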
3.2.2 Significance of the Splitter Application Developed for SAR Data

The data collected by the SAR is organized into frames and range gates. It follows a specific structure and accesses information from previous frames/range gates, so the raw data cannot simply be cut into arbitrary parts. The splitter program takes as input parameters the name of the raw data file and the number of parts, and splits the raw data in a manner that suits the application. Several constraints are taken care of in the splitter program, such as the minimum number of frames and range gates, and the locations of synchronization bytes and headers/footers in the raw data.
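The following sketch illustrates the core idea of frame-aligned splitting. It is a simplified stand-in for the actual splitter: the FRAME_BYTES constant is hypothetical, and the real program additionally validates sync bytes, headers/footers and the minimum frame and range-gate counts mentioned above.

    #include <stdio.h>
    #include <stdlib.h>

    #define FRAME_BYTES 65536L   /* hypothetical size of one SAR frame record */

    /* Split `infile` into `nparts` pieces, cutting only on frame boundaries
       so that no frame (and its range-gate context) is torn apart. */
    static int split_raw(const char *infile, int nparts) {
        FILE *in = fopen(infile, "rb");
        if (!in) return -1;
        fseek(in, 0, SEEK_END);
        long nframes = ftell(in) / FRAME_BYTES;
        fseek(in, 0, SEEK_SET);

        long per_part = nframes / nparts;      /* last part takes the rest */
        char *buf = malloc(FRAME_BYTES);
        for (int p = 0; p < nparts; p++) {
            char name[512];
            snprintf(name, sizeof name, "%s.part%02d", infile, p);
            FILE *out = fopen(name, "wb");
            if (!out) break;
            long count = (p == nparts - 1) ? nframes - p * per_part : per_part;
            for (long f = 0; f < count; f++) {
                /* the real splitter checks sync bytes in each frame here */
                if (fread(buf, 1, FRAME_BYTES, in) != (size_t)FRAME_BYTES) break;
                fwrite(buf, 1, FRAME_BYTES, out);
            }
            fclose(out);
        }
        free(buf);
        fclose(in);
        return 0;
    }

    int main(int argc, char **argv) {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <rawfile> <nparts>\n", argv[0]);
            return 1;
        }
        return split_raw(argv[1], atoi(argv[2])) ? 1 : 0;
    }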
3.3 Job Submission and Monitoring

The most important step of the application is job submission, since the application has been grid-enabled so that it can run simultaneously on the different clusters of the grid. This grid enablement involves hybrid MPI[22]/OpenMP[23] programming and porting the application to the various (32/64-bit, Linux/AIX) clusters of the grid.

The application can be submitted directly from the SIT terminal using the GARUDA portal APIs, either from a simple stand-alone Java program or through the GE-MINDA [11] workflow tool, built on the Triana workflow system to make the process user friendly. Different APIs developed for the C-DAC portal take care of tasks such as submitting a job to the desired cluster and copying the input files to the desired location. These tasks are performed in three steps:
1. Take the input parameters.
2. Create the RSL [3] file and copy the RSL and input files to the grid head node.
3. Submit the job from the grid head node.

The first step takes the input files needed by the application and the name of the cluster where the application is to run. The second step is handled internally by the APIs: the resource specification language (RSL) file is generated automatically and copied to the grid head node along with the supplied input files. The third step submits the job to the Globus [1] middleware using the RSL file, which names the remote scheduler and the remote cluster where the job is to run. A unique job id is generated once the job is submitted to the scheduler of the remote cluster, and as soon as the job finishes, the processing status is reported in the stdout and stderr files.
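As a rough illustration of steps 2 and 3, the sketch below writes a minimal Globus RSL v1.0 file [3] and hands it to the globusrun client. The executable path, arguments, queue and cluster contact string are placeholders for the example, not the actual GARUDA configuration.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* Step 2: generate an RSL file for one split of the raw data.
           All values below are illustrative placeholders. */
        FILE *rsl = fopen("dmsar.rsl", "w");
        if (!rsl) return 1;
        fprintf(rsl,
            "&(executable=/home/dmsar/bin/sar_proc)\n"
            " (arguments=raw.part00 out.part00)\n"
            " (count=32)\n"
            " (jobType=mpi)\n"
            " (queue=workq)\n");
        fclose(rsl);

        /* Step 3: submit from the grid head node to the remote cluster's
           GRAM jobmanager (here a PBS scheduler) via the globusrun client. */
        return system("globusrun -r cluster.example.in/jobmanager-pbs "
                      "-f dmsar.rsl");
    }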
If the job completes successfully, the output images are copied back to the grid head node and the flood extent is extracted from the generated output using the image processing tool. The extracted features, along with the image, are then sent to the competent authority for the decision making process over the satellite terminal grid.

4. PERFORMANCE RESULTS OF THE APPLICATION

The raw data can be split into as many parts as there are available/up clusters in the grid, which makes the application efficient: the split parts of the raw data are forwarded to different clusters, effectively reducing the time taken to process each smaller piece. The results in Table 1 were measured on one of the clusters of the GARUDA grid, composed of 64-bit AMD Opteron processors at 2200 MHz, with each node an 8-way SMP.

    Nodes used    Total processors    Time taken
    4             32                  15 h 20 min
    8             64                  9 h 45 min
    16            128                 8 h 30 min

Table 1: Execution time for the 9 GB data set

Since the data can be divided into N parts (subject to the constraint of the logical binding of the data), the smaller the data size, the smaller the processing time, which eventually increases the efficiency of the application. Figure 4 compares the performance for the different data sets, and it is evident that performance can be improved further as the number of available clusters increases; as a rule of thumb, throughput grows with the number of available clusters.

Figure 4: Processing Time vs. Data Size: performance curves for the 2.5 GB and 9 GB data sets on 4, 8 and 16 nodes (processing time in hours).
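As a back-of-the-envelope check of the scaling in Table 1, the following sketch computes speedup and parallel efficiency relative to the 4-node run:

    #include <stdio.h>

    int main(void) {
        /* Timings from Table 1 for the 9 GB data set, in hours. */
        const int    nodes[] = { 4, 8, 16 };
        const double hours[] = { 15.0 + 20.0/60.0, 9.0 + 45.0/60.0,
                                 8.0 + 30.0/60.0 };

        for (int i = 1; i < 3; i++) {
            double speedup    = hours[0] / hours[i];          /* vs. 4 nodes  */
            double ideal      = (double)nodes[i] / nodes[0];  /* linear scale */
            double efficiency = 100.0 * speedup / ideal;
            printf("%2d nodes: speedup %.2fx (ideal %.0fx), efficiency %.0f%%\n",
                   nodes[i], speedup, ideal, efficiency);
        }
        return 0;
    }

This gives a speedup of about 1.57x at 8 nodes (79% efficiency) and 1.80x at 16 nodes (45% efficiency). The falling efficiency on a single cluster is consistent with the observations above about data movement and the non-parallel portions of the pipeline, and is why distributing the split parts across several clusters pays off.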
In this application we use 10 m resolution radar data of 9 GB covering an area of about 30-35 km. Executed on a single Pentium 4 CPU with a 2 GHz clock, processing takes around 35 hours, whereas the same run executed in parallel on 16 nodes of 8 processors each takes only about 8 hours and 30 minutes. When the raw data is split with the splitter program and transferred to different clusters, the same task can be completed in much less time, for example in about 2 hours with five clusters of 16 nodes (8 processors per node). In the GE-MINDA workflow, the raw data can be divided according to the number of clusters available for execution and distributed across the remote clusters. The total time taken by the workflow also depends on the speed of the network, since the data is copied from the grid head node to the remote clusters for execution; if increased bandwidth speeds up the network, the performance of the application will be even better.

5. CONCLUSIONS AND FUTURE WORK

GARUDA is one of the first national grids in the Asia-Pacific region to have an interface with a satellite grid. On this test bed, the C-DAC and SAC teams collaborated to grid-enable the disaster management application, which processes airborne SAR raw data and produces images of disaster-affected regions. The encouraging results and experience gained in this experiment will help to deploy a full-fledged operational DMSAR system in the main production phase of the GARUDA project. The different components of the grid-enabled version of DMSAR have been explained in this paper. Future work is to bring together the work components of the different disaster management bodies into a full-fledged disaster management application, which is quite a challenging task. ISRO is coming up with the Radar Imaging Satellite (RISAT), in which the airborne radars will be replaced by space-borne radars; some steps of this application, such as data capturing and transmission to the satellite grid, will change accordingly. The point-to-point network bandwidths will also be enhanced to gigabits per second, from the current 100 MB/sec. With the future GARUDA grid architecture moving to a service oriented architecture, the application can take advantage of the open distributed services available in service oriented architectures.
REFERENCES:

[1] The Globus Project, http://www.globus.org/
[2] Sriram Krishnan, Patrick Wagstrom, Gregor von Laszewski, GSFL: A Workflow Framework for Grid Services, http://www.globus.org/cog/papers/gsfl-paper.pdf
[3] Globus Resource Specification Language (RSL) v1.0, http://www.globus.org/gram/rsl_spec1.html
[4] Ian Taylor, Matthew Shields, Ian Wang and Andrew Harrison, Visual Grid Workflow in Triana.
[5] Matthew Shields (Schools of Physics and Astronomy and Computer Science, Cardiff University) and Ian Taylor (School of Computer Science, Cardiff University), Programming Scientific and Distributed Workflow with Triana Services.
[6] GARUDA Grid Computational Resources, http://www.garudaindia.in/comp_res.asp
[7] Centre for Development of Advanced Computing, http://www.cdac.in/
[8] Space Application Centre (SAC)-ISRO, http://www.sac.gov.in
[9] Disaster Management Using Synthetic Aperture Radar (DMSAR), http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04558001
[10] Dal, J., Jorgensen, J.H., Christensen, E.L. and Madsen, S.N., A real-time processor for the Danish airborne SAR, IEE Proceedings-F, Vol. 139, No. 2, April 1992, pp. 115-121.
[11] Grid Enabled Workflow for Disaster Management and Assessment (GE-MINDA), http://www.dlr.de/Portaldata/32/Resources/dokumente/ceos2008/ceos08_program.pdf
[12] GSAT-3 / EDUSAT, http://space.skyrocket.de/index_frame.htm?http://www.skyrocket.de/space/doc_sdat/gsat-3.htm
[13] Intel Math Kernel Library, http://www.dcsc.sdu.dk/docs/intel-mkl/mklman52.pdf
[14] Laura Pearlman, Carl Kesselman, Sridhar Gullapalli, B.F. Spencer, Jr., Joe Futrelle, Kathleen Ricker, Ian Foster, Paul Hubbard, Charles Severance, Distributed Hybrid Earthquake Engineering Experiments: Experiences with a Ground-Shaking Grid Application, 13th IEEE International Symposium on High Performance Distributed Computing (HPDC-13 '04).
[15] National Science Foundation, http://www.nsf.gov
[16] Ram, N., Ramakrishnan, S., "GARUDA: India's National Grid Computing Initiative," CTWatch Quarterly, Volume 2, Number 1, February 2006. http://www.ctwatch.org/quarterly/articles/2006/02/garuda-indias-national-grid-computing-initiative/
[17] CrossGrid Project, http://www.crossgrid.org/main.html
[18] High Performance Computing, http://en.wikipedia.org/wiki/High-performance_computing
[19] ERDAS and LVE, http://www.erdas.com/; http://gi.leicageosystems.com/LGISub1x251x0.aspx
[20] Mangala N. and Prahlada Rao B.B., Tutorial on Applications Enablement on Grid, IEEE International Conference on e-Science, 2007.
[21] Storage Resource Broker, http://en.wikipedia.org/wiki/Storage_Resource_Broker
[22] Message Passing Interface, http://en.wikipedia.org/wiki/Message_Passing_Interface
[23] OpenMP, http://en.wikipedia.org/wiki/OpenMP; http://openmp.org/wp/
[24] GARUDA Flood Assessment System (G-FAS), http://www.isrsindia.org/sessdets.pdf